Volver atrás Publicación

Character-Based Handwritten Text Recognition of Multilingual Documents

Imprimir

¿Quieres contarnos tu reto? Pincha aquí y te ayudamos a encontrar una solución

Autores UPV

Agua Teba Miguel Ángel del, Serrano Martínez-Santos Nicolás, Civera Saiz Jorge, Juan Císcar Alfonso

Año

2012

Revista

Communications in Computer and Information Science

Abstract

An effective approach to transcribe handwritten text documents is to follow a sequential interactive approach. During the supervision phase, user corrections are incorporated into the system through an ongoing retraining process. In the case of multilingual documents with a high percentage of out-of-vocabulary (OOV) words, two principal issues arise. On the one hand, a minor yet important matter for this interactive approach is to identify the language of the current text line image to be transcribed, as a language dependent recognisers typically performs better than a monolingual recogniser. On the other hand, word-based language models suffer from data scarcity in the presence of a large number of OOV words, degrading their estimation and affecting the performance of the transcription system. In this paper, we successfully tackle both issues deploying character-based language models combined with language identification techniques on an entire 764-page multilingual document. The results obtained significantly reduce previously reported results in terms of transcription error on the same task, but showed that a language dependent approach is not effective on top of character-based recognition of similar languages.

Más Información