IDRAAQ: New Arabic Question Answering system based on Query Expansion and Passage Retrieval

UPV authors
Journal CLEF Conference on Multilingual and Multimodal Information Access Evaluation


Arabic has received comparatively little attention from researchers in the field of Question Answering (QA). This paper presents the core modules of a new Arabic Question Answering system called IDRAAQ. These modules aim to enhance the quality of the passages retrieved for a given question. Experiments were conducted in the framework of the main task of QA4MRE@CLEF 2012, which in that year included the Arabic language. Two runs were submitted. Both runs use only the reading test documents to answer questions. The difference between the two runs lies in the answer validation process, which is more relaxed in the second run. The Passage Retrieval (PR) module of our system applies multiple levels of processing in order to improve the quality of the returned passages and, in turn, the performance of the whole system. The PR module of IDRAAQ is based on a keyword-based level and a structure-based level, which respectively consist of: (i) a Query Expansion (QE) process relying on Arabic WordNet semantic relations; (ii) a passage retrieval system based on the Distance Density N-gram Model. The latter level takes the passages retrieved with the QE queries and re-ranks them according to a structure-based similarity score. Named Entities are recognized by means of a mapping between the YAGO ontology and Arabic WordNet. The experiments that we conducted show that, with respect to the accuracy and c@1 measures, IDRAAQ achieved encouraging performance, in particular on factoid questions. The same experiments allowed us to identify the weaknesses of the system, especially when processing non-factoid questions and at the Answer Validation stage. The IDRAAQ system, which is still under construction, will integrate a Conceptual Graph-based passage re-ranking step that introduces a semantic level into its PR module.
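
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of c@1 used in the QA4MRE evaluations, c@1 = (n_R + n_U * n_R / n) / n, where n is the number of questions, n_R the number answered correctly, and n_U the number left unanswered. The function names and the counts in the usage lines are hypothetical and are not results from the paper.

```python
def accuracy(n_correct: int, n_total: int) -> float:
    """Fraction of all questions answered correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_correct / n_total


def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1: accuracy extended with credit for unanswered questions.

    Each unanswered question is credited with the system's overall
    accuracy (n_correct / n_total), so leaving a question unanswered
    scores better than answering it incorrectly.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0:
        raise ValueError("counts must be non-negative")
    if n_correct + n_unanswered > n_total:
        raise ValueError("n_correct + n_unanswered exceeds n_total")
    return (n_correct + n_unanswered * n_correct / n_total) / n_total


# Hypothetical counts, chosen only to show how the two measures differ.
strict_run = c_at_1(n_correct=3, n_unanswered=2, n_total=10)   # about 0.36
relaxed_run = c_at_1(n_correct=3, n_unanswered=0, n_total=10)  # equals accuracy
print(strict_run, relaxed_run, accuracy(3, 10))
```

With no unanswered questions, c@1 reduces to plain accuracy. This is why a more relaxed answer validation step, which answers more questions, affects c@1 only through the number of answers that turn out to be correct.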