Earticle

A Lexical and Syntactic Analysis System for Chinese Electronic Medical Record

  • Publisher
    Science & Engineering Research Support Center, Republic of Korea (IJUNESST)
  • Journal
    International Journal of u- and e- Service, Science and Technology
  • Volume/Issue
    Vol.9 No.9 (2016.09)
  • Pages
    pp.305-318
  • Authors
    Zhipeng Jiang, Xue Dai, Yi Guan, Fangfang Zhao
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A285064

※ The agreement period with the full-text provider has ended, so access to the full text may be restricted.

Abstract

Lexical and syntactic analysis, including word segmentation, part-of-speech (POS) tagging, shallow parsing and full parsing, is essential for medical language processing (MLP). However, because no annotated corpus of Chinese electronic medical records (CEMR) exists, research on full parsing of CEMR has not been carried out, nor even research on shallow parsing or POS tagging. In this paper, we built a corpus of 5,024 CEMR sentences annotated with word segmentation, POS tags and phrase tags, of which 2,553 are also annotated with full parse trees. Inter-annotator agreement was 97.56% for Chinese word segmentation, 93.34% for POS tagging, 96.5% for shallow parsing and 91.22% for full parsing. Based on this corpus, we developed and evaluated a lexical and syntactic analysis system for CEMR. For word segmentation and POS tagging, we proposed a joint model that uses a transformation-based error-driven model as a post-processing correction step to alleviate error accumulation; it achieved F1-scores of 94.39% for word segmentation and 93.2% for POS tagging. We also developed a shallow parsing model within our proposed group learning framework, which enriches word features with word embeddings learned from a large collection of unlabeled CEMRs and achieves an F1-score of 96.3%. Finally, we present a state-of-the-art full parser that combines the Berkeley parser and the Stanford parser and outperforms the best single parser by 3.68%. The evaluation results show that statistical machine learning models benefit substantially from the annotated CEMR corpus. This work lays the foundation for applying natural language processing (NLP) technologies to CEMR.
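The abstract reports F1-scores for word segmentation (94.39%) and POS tagging (93.2%) but does not say how they are computed. A common convention, assumed here rather than taken from the paper, treats each word as a character-offset span: a predicted word is correct only if both of its boundaries match a gold word, and for POS tagging its tag must match as well. The minimal Python sketch below implements that convention; the function names `to_spans` and `prf`, the sample sentence and its tags are hypothetical illustrations, not the paper's tag set or evaluation code.

```python
from typing import Optional, Sequence, Set, Tuple


def to_spans(words: Sequence[str], tags: Optional[Sequence[str]] = None) -> Set[tuple]:
    """Map a segmented sentence to (start, end) character offsets,
    or to (start, end, tag) triples when POS tags are supplied."""
    spans: Set[tuple] = set()
    start = 0
    for i, word in enumerate(words):
        end = start + len(word)
        spans.add((start, end) if tags is None else (start, end, tags[i]))
        start = end
    return spans


def prf(gold: Set[tuple], pred: Set[tuple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching spans (hypothetical helper)."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example sentence "患者无发热咳嗽" (the patient has no fever or cough);
# the tags "n" and "v" are illustrative only.
gold_words, gold_tags = ["患者", "无", "发热", "咳嗽"], ["n", "v", "v", "v"]
pred_words, pred_tags = ["患者", "无发热", "咳嗽"], ["n", "v", "n"]

seg_f1 = prf(to_spans(gold_words), to_spans(pred_words))[2]
pos_f1 = prf(to_spans(gold_words, gold_tags), to_spans(pred_words, pred_tags))[2]
print(f"segmentation F1 = {seg_f1:.4f}")  # → 0.5714 (P = 2/3, R = 2/4)
print(f"POS F1 = {pos_f1:.4f}")           # → 0.2857 (P = 1/3, R = 1/4)
```

Under this convention a single boundary error, such as merging 无 and 发热 into one word, counts against both metrics, which illustrates how segmentation mistakes carry over into POS-tagging scores.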

Table of Contents

Abstract
 1. Introduction
 2. Background
 3. System Description
  3.1. Corpus
  3.2. Tokenizer and POS Tagger
  3.3. Shallow Parser
  3.4. Full Parser
 4. Experiments and Analysis
  4.1. Corpus
  4.2. Tokenizer and POS Tagger
  4.3. Shallow Parser
  4.4. Full Parser
 5. Discussion
 6. Conclusion
 References

Keywords

CEMR, Chinese word segmentation, part-of-speech tagging, shallow parsing, full parsing

Authors

  • Zhipeng Jiang [ Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China ]
  • Xue Dai [ Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China ]
  • Yi Guan [ Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China ]
  • Fangfang Zhao [ Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China ]

Publication Information

Publisher

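  • Publisher name
    Science & Engineering Research Support Center, Republic of Korea (IJUNESST)
  • Year founded
    2006
  • Field
    Engineering > Computer Science
  • About
    1. Surveys and research on security engineering
    2. Research on and presentation of applied security engineering technologies
    3. Hosting academic conferences and exhibitions on security engineering
    4. Mutual cooperation and information exchange on security engineering technologies
    5. Standardization work and the development of specifications for security engineering
    6. Promoting industry-academia-research cooperation in security engineering
    7. International academic exchange and technical cooperation
    8. Publishing journals on security engineering
    9. Other activities necessary to achieve the organization's objectives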

Journal

  • Journal title
    International Journal of u- and e- Service, Science and Technology
  • Frequency
    Bimonthly
  • pISSN
    2005-4246
  • Coverage
    2008-2016
  • Decimal classification
    KDC 505, DDC 605

Times cited: 0 (source: Naver Academic Information)
