Use of Radical Features in Chinese Medical Text Mining

UoM administered thesis: PhD

  • Authors: Yifei Wang

Abstract

The radical is an important feature of Chinese characters: it can indicate a character's meaning or its pronunciation. However, radical features have received little attention in Chinese text mining. This study focuses on the use of radical features in two tasks, named entity recognition and terminology extraction. A review of the structure of Chinese characters shows that phono-semantic characters are closely related to their radicals: a phono-semantic character has a primary radical representing its meaning and a phonetic radical representing its pronunciation. A new method is proposed to identify phono-semantic characters by looking for the phonetic radical. Tested on Shuowen Jiezi, an ancient dictionary, the method achieves an F-measure of 0.802, showing that it correctly identifies most phono-semantic characters. Experiments with radical features on named entity recognition were carried out using both a basic machine learning method and a deep learning method. For the deep learning method, three different embedding models using radical features are proposed. The results show that the model using the primary radical and pinyin performs best, with an F-measure of 0.709. For the terminology extraction task, RC-value, an advanced version of C-value, is proposed. RC-value achieves higher F-measures than C-value when tested on two different data sets.
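For readers unfamiliar with the C-value baseline mentioned above, the following is a minimal sketch of the commonly cited C-value formulation: a candidate term's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by the base-2 logarithm of its length. It is not the thesis's implementation, and it does not attempt RC-value, whose definition is not given in this abstract. The names `c_value` and `is_nested`, the representation of a candidate as a tuple of tokens, and the toy frequencies are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

# Assumption: a candidate term is a tuple of tokens (words or characters).
Term = Tuple[str, ...]


def is_nested(short: Term, long: Term) -> bool:
    """Return True if `short` is a contiguous sub-sequence of the strictly longer `long`."""
    n, m = len(short), len(long)
    if n >= m:
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """Score each candidate with a simplified C-value.

    Non-nested candidate:  log2(|a|) * f(a)
    Nested candidate:      log2(|a|) * (f(a) - mean frequency of longer
                           candidates that contain a)
    With this weighting, single-token candidates score 0 because log2(1) = 0.
    """
    scores: Dict[Term, float] = {}
    for a, f_a in freq.items():
        # Frequencies of all longer candidates in which `a` appears.
        containers = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containers:
            adjusted = f_a - sum(containers) / len(containers)
        else:
            adjusted = float(f_a)
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Toy, invented frequencies (not data from the thesis).
toy_freq = {
    ("肾", "衰竭"): 8,
    ("慢性", "肾", "衰竭"): 3,
    ("急性", "肾", "衰竭"): 1,
}
toy_scores = c_value(toy_freq)
```

In the toy data, the two-token candidate is nested in two longer candidates, so its frequency of 8 is reduced by their mean frequency of 2 before weighting. A radical-aware variant such as RC-value would presumably alter this scoring, but how it does so is not stated here.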

Details

Original language: English
Awarding Institution
Supervisors/Advisors
Award date: 1 Aug 2021