An Empirical Analysis of Data Selection Techniques in Statistical Machine Translation

Authors UPV
Year
Journal PROCESAMIENTO DEL LENGUAJE NATURAL

Abstract

Domain adaptation has recently gained interest in statistical machine translation (SMT). One adaptation technique is based on data selection, which aims to choose, from an available pool of bilingual sentences, the best subset with which to train an SMT system. In this paper, we study how the bilingual corpora used by data selection methods affect translation quality.