
Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

  • Publisher
    한국정보기술응용학회 (The Korea Society of Information Technology Applications)
  • Journal
    JITAM (KCI-indexed)
  • Issue
    Vol.23 No.2 (2016.06)
  • Pages
    pp.11-28
  • Authors
    Thaileang Sung, Insoo Hwang
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A280896


Abstract (English)
In this paper, we propose a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionaries in use are not fully reliable and cannot cover every word of the Khmer language, which leads to the problem of unknown, or out-of-vocabulary, words. Our approach extends the original dictionary with new words to make it more reliable. In addition, we use ternary decomposition for the segmentation process. We also use the invisible space character of Khmer Unicode text (U+200B, ZERO WIDTH SPACE) to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and the invisible space, we extract new words from the training text and add them to the dictionary. To test on unannotated text, we used the extended wordlist together with a segmentation algorithm that does not depend on the invisible space. Our results clearly outperformed the other approaches compared, with a precision of 88.8%, a recall of 91.8%, and an F-measure of 90.6%.
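The abstract does not spell out the algorithm, so the Python sketch below is only one hypothetical reading, not the authors' implementation. It assumes that word boundaries in the training corpus are marked with U+200B (or ordinary spaces), that any delimited token missing from the dictionary counts as a new word, and that "ternary decomposition" means splitting a string into three parts (left remainder, longest dictionary match, right remainder) and recursing on the two remainders. The names `extract_new_words` and `ternary_decompose` are illustrative, and the examples use Latin placeholder strings so the splits are easy to read.

```python
from typing import List, Set

ZWSP = "\u200b"  # ZERO WIDTH SPACE, assumed to mark word boundaries in the training corpus


def extract_new_words(training_text: str, dictionary: Set[str]) -> Set[str]:
    """Return tokens delimited by U+200B or spaces that the dictionary does not contain."""
    tokens = training_text.replace(" ", ZWSP).split(ZWSP)
    return {t for t in tokens if t and t not in dictionary}


def ternary_decompose(text: str, dictionary: Set[str]) -> List[str]:
    """Hypothetical three-way split: left | longest dictionary match | right, recursing on both sides.

    Ties between equally long matches go to the leftmost one. A stretch that
    contains no dictionary word is returned as a single unknown segment.
    """
    if not text:
        return []
    best_start, best_len = -1, 0
    for start in range(len(text)):
        # Try the longest candidates first; stop once they cannot beat the best match.
        for end in range(len(text), start, -1):
            length = end - start
            if length <= best_len:
                break
            if text[start:end] in dictionary:
                best_start, best_len = start, length
                break
    if best_len == 0:
        return [text]  # no known word anywhere: keep the stretch as one out-of-vocabulary segment
    left = text[:best_start]
    middle = text[best_start:best_start + best_len]
    right = text[best_start + best_len:]
    return ternary_decompose(left, dictionary) + [middle] + ternary_decompose(right, dictionary)
```

For example, `ternary_decompose("xabcdy", {"ab", "abc", "d"})` returns `["x", "abc", "d", "y"]`. The longest match `abc` is split out first, and the right remainder `dy` is then split around `d`. Under these assumptions, extending the dictionary reduces to `dictionary | extract_new_words(corpus, dictionary)`.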

Contents

Abstract
 1. Introduction
 2. Khmer Language Overview
  2.1 Khmer Language
  2.2 Chuon Nath Dictionary
  2.3 Problems in Khmer Word Segmentation
 3. Research Reviews
  3.1 KCC Bigram
  3.2 Trainable Rule-based
 4. Proposed Approach
  4.1 Initialization
  4.2 Decomposition
  4.3 New Word Extraction
  4.4 Extended Dictionary
 5. Experiment
  5.1 Experimental Setup
  5.2 Experimental Results
  5.3 Discussion
 6. Conclusion
 References

Keywords

Word Segmentation, Decomposition, Natural Language Processing, Khmer

Authors

  • Thaileang Sung [Graduate Student, Dept. of Information Systems, Jeonju University]
  • Insoo Hwang [Professor of Smart Media, Jeonju University] (corresponding author)


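Publication Information

Publisher

  • Name
    한국정보기술응용학회 [The Korea Society of Information Technology Applications]
  • Founded
    1999
  • Field
    Social Sciences > Business Administration
  • About
    The society aims to contribute to the development of national and corporate informatization by promoting research and exchange in fields related to information technology.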

Journal

  • Title
    JITAM [Journal of Information Technology Applications and Management]
  • Frequency
    Bimonthly
  • pISSN
    1598-6284
  • eISSN
    2508-1209
  • Coverage
    1999-2026
  • Decimal classification
    KDC 005, DDC 005
