UPV authors
Gupta Parth Alokkumar,
Bali Kalika,
Banchs Rafael,
Choudhury Monojit,
Rosso Paolo
Abstract
In this paper, we formally introduce the concept of mixed-script
information retrieval (IR) and, through an analysis of the query logs of the Bing
search engine, estimate its prevalence and thereby establish
the importance of this problem. We also propose a principled solution
to mixed-script term matching and spelling variation, in which terms
written in different scripts are modelled jointly in a deep-learning architecture
and can be compared in a low-dimensional abstract space. We present an extensive
empirical analysis of the proposed method, along with
evaluation results in an ad-hoc retrieval setting for mixed-script
IR, where the proposed method achieves significantly
better results (a 12% increase in MRR and a 29% increase in
MAP) than other state-of-the-art baselines.
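The abstract reports its gains in terms of MRR (mean reciprocal rank) and MAP (mean average precision). As a reminder of what these two standard ranking metrics measure, here is a minimal Python sketch of their usual textbook definitions. The function names and the toy rankings are illustrative assumptions, and the abstract does not state the paper's exact evaluation protocol (for example, any rank cutoff).

```python
from typing import Callable, List, Sequence, Set, Tuple

# One evaluated query: (ranked document IDs returned by the system, set of relevant IDs).
Run = Tuple[Sequence[str], Set[str]]


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents for the query."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_over_queries(
    metric: Callable[[Sequence[str], Set[str]], float], runs: List[Run]
) -> float:
    """Average a per-query metric over all queries (gives MRR or MAP)."""
    if not runs:
        return 0.0
    return sum(metric(ranking, relevant) for ranking, relevant in runs) / len(runs)


# Toy example with made-up document IDs (not data from the paper).
runs: List[Run] = [
    (["d3", "d1", "d2"], {"d1"}),              # first relevant document at rank 2
    (["d5", "d4", "d6", "d7"], {"d5", "d7"}),  # relevant documents at ranks 1 and 4
]

mrr = mean_over_queries(reciprocal_rank, runs)
map_score = mean_over_queries(average_precision, runs)
print(f"MRR = {mrr:.3f}, MAP = {map_score:.3f}")  # MRR = 0.750, MAP = 0.625
```

MRR rewards only the position of the first relevant result for each query, whereas MAP takes every relevant document in the ranking into account, so the two metrics can change by different amounts for the same system.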