Multimodal Computer-Assisted Transcription of Text Images at Character-Level Interaction

Authors UPV


Currently, automatic handwriting recognition systems are ineffective on unconstrained handwritten documents. Therefore, to obtain perfect transcriptions, heavy human intervention is required to validate and correct the output of such systems. Given that this post-editing process is inefficient and uncomfortable, a multimodal interactive approach has been proposed in previous works, which aims at obtaining correct transcriptions with minimum human effort. In this approach, the user interacts with the system by means of an e-pen and/or more traditional devices such as the keyboard or mouse. This user feedback improves system accuracy, while multimodality increases system ergonomics and user acceptability. Until now, multimodal interaction has been studied only at the whole-word level. In this work, multimodal interaction at the character level is studied, which may lead to more effective interactivity, since writing a single character is faster and easier than writing a whole word. Here we study this kind of fine-grained multimodal interaction and present developments that allow taking advantage of interaction-derived context to significantly improve feedback decoding accuracy. Empirical tests on three cursive handwriting tasks suggest that, despite losing the deterministic accuracy of traditional peripherals, this approach can save significant amounts of user effort with respect to fully manual transcription as well as to non-interactive post-editing correction.
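To illustrate the idea of exploiting interaction-derived context, the following sketch combines an e-pen character recognizer's scores with a prefix-conditioned character language model when decoding a single corrected character. This is a minimal toy example, not the paper's actual system; all names, probabilities, and the bigram model are hypothetical assumptions.

```python
# Hypothetical sketch: decode one character of e-pen feedback by
# combining pen-recognition scores with a character bigram LM
# conditioned on the already-validated transcription prefix.
import math

def decode_feedback_char(pen_scores, prefix, char_bigram):
    """Return the character maximizing log P(pen|c) + log P(c|prefix)."""
    prev = prefix[-1] if prefix else " "  # last validated character as context
    best_char, best_score = None, -math.inf
    for c, p_pen in pen_scores.items():
        p_lm = char_bigram.get((prev, c), 1e-6)  # smoothed LM probability
        score = math.log(p_pen) + math.log(p_lm)
        if score > best_score:
            best_char, best_score = c, score
    return best_char

# Toy data: the pen recognizer slightly prefers 'o', but the validated
# prefix "th" makes 'e' far more likely under the bigram model.
pen_scores = {"o": 0.55, "e": 0.45}
bigram = {("h", "e"): 0.30, ("h", "o"): 0.05}
print(decode_feedback_char(pen_scores, "th", bigram))  # prints: e
```

The point of the sketch is that the prefix context can overturn the recognizer's first choice, which is how interaction-derived context can reduce feedback decoding errors.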