Chinese-English Cross-Lingual Information Retrieval in Biomedicine Using Ontology-Based Query Expansion

UoM administered thesis: Phd

  • Authors:
  • Xinkai Wang

Abstract

In this thesis, we propose a new approach to Chinese-English Biomedical cross-lingual information retrieval (CLIR) using query expansion based on the eCMeSH Tree, a Chinese-English ontology extended from the Chinese Medical Subject Headings (CMeSH) Tree. The CMeSH Tree is not designed for information retrieval (IR), since it only includes heading terms and has no term weighting scheme for these terms. Therefore, we design an algorithm, which employs a rule-based parsing technique combined with the C-value term extraction algorithm and a filtering technique based on mutual information, to extract Chinese synonyms for the corresponding heading terms. We also develop a term-weighting mechanism. Following the hierarchical structure of CMeSH, we extend the CMeSH Tree to the eCMeSH Tree with synonymous terms and their weights. We propose an algorithm to implement CLIR using the eCMeSH Tree terms to expand queries. In order to evaluate the retrieval improvements obtained from our approach, the results of the query expansion based on the eCMeSH Tree are individually compared with the results of the experiments of query expansion using the CMeSH Tree terms, query expansion using pseudo-relevance feedback, and document translation. We also evaluate the combinations of these three approaches. This study also investigates the factors which affect the CLIR performance, including a stemming algorithm, retrieval models, and word segmentation.

Details

Original languageEnglish
Awarding Institution
Supervisors/Advisors
Award date1 Aug 2012