Semiautomatic Text Baseline Detection in Large Historical Handwritten Documents

UPV Authors

Bosch Campos Vicente, Toselli Alejandro Héctor, Vidal Ruiz Enrique

Year

2014

CONFERENCE

Semiautomatic Text Baseline Detection in Large Historical Handwritten Documents

Abstract

A semiautomatic, iterative process for detecting text baselines in historical handwritten document images is presented. It relies on Hidden Markov Models (HMMs) to provide initial text baseline hypotheses, which users then review in order to produce ground-truth quality results. Using the set of revised baselines as ground truth, the HMMs are re-trained before processing the next batch of pages. This process has been evaluated in the context of a real transcription task which, as a by-product, has produced line-detection ground truth. We show that a formal, HMM-based line-detection approach, although it requires training data, not only yields good detection results but is also of practical use in large handwritten image collections. Through experiments with real users we show that the proposed approach has interesting features, namely accuracy, scalability and ease of use, as well as low overall human effort requirements.
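
The batch-wise cycle described in the abstract (hypothesize, review, re-train, move to the next batch) can be outlined as a short Python sketch. This is only an illustration of the control flow, not the authors' implementation: the function names `semiautomatic_loop` and `batches`, and the callables `detect`, `review` and `train`, are hypothetical placeholders for the HMM-based detector, the user correction step and the HMM training step, none of which are specified in the abstract.

```python
from typing import Any, Callable, List, Sequence, Tuple


def batches(pages: Sequence[Any], size: int) -> List[Sequence[Any]]:
    """Split the page collection into consecutive batches of at most `size` pages."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def semiautomatic_loop(
    pages: Sequence[Any],
    batch_size: int,
    model: Any,
    detect: Callable[[Any, Any], Any],   # hypothetical: (model, page) -> baseline hypotheses
    review: Callable[[Any, Any], Any],   # hypothetical: (page, hypotheses) -> corrected baselines
    train: Callable[[List[Tuple[Any, Any]]], Any],  # hypothetical: ground truth -> new model
) -> Tuple[Any, List[Tuple[Any, Any]]]:
    """Process pages batch by batch, re-training after each reviewed batch."""
    ground_truth: List[Tuple[Any, Any]] = []
    for batch in batches(pages, batch_size):
        for page in batch:
            # The current model proposes baselines for the page.
            hypotheses = detect(model, page)
            # A user corrects the proposal to ground-truth quality.
            corrected = review(page, hypotheses)
            ground_truth.append((page, corrected))
        # All baselines reviewed so far become training data for the next batch.
        model = train(ground_truth)
    return model, ground_truth
```

In this sketch the model is re-trained on all ground truth accumulated so far; whether re-training uses the full set or only the latest batch is an assumption, since the abstract does not say.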