Volver atrás Publicación

Cross-Language High Similarity Search Using a Conceptual Thesaurus

Imprimir

¿Quieres contarnos tu reto? Pincha aquí y te ayudamos a encontrar una solución

Autores UPV

Gupta Parth Alokkumar, Barrón Cedeño Luis Alberto, Rosso Paolo

Año

2012

Revista

Lecture Notes in Computer Science

Abstract

This work addresses the issue of cross-language high similarity and near-duplicates search, where, for the given document, a highly similar one is to be identified from a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs English-German and English-Spanish using the Eurovoc conceptual thesaurus. Our model is compared with two state-of-the-art models and we find, though the proposed model is very generic, it produces competitive results and is significantly stable and consistent across the corpora.

Más Información