A Study on the Analysis of Historical Korean Corpus Using Deep Learning (딥러닝을 활용한 국어사 말뭉치 분석 방안 연구)

General article

Abstract: This study surveys how historical Korean corpora have been built and attempts deep learning-based morphological analysis of the National Institute of Korean Language’s Historical Korean Corpus (‘국어 역사 말뭉치’). While contemporary Korean linguistics has widely adopted computational and digital humanities approaches, research on historical Korean has been limited in applying them, owing to the complexity of older language forms and the scarcity of annotated data. To prepare for deep learning-based morphological analysis, the study examines the construction status of the Sejong Corpus, extracts the vernacular translations (eonhae, 언해문) and their original texts, and confirms the need to reflect the morphological characteristics of Korean in each century. Accurate part-of-speech tagging still requires more training data and more refined models. Even so, as an early attempt to apply deep learning to historical Korean linguistics, the work explores the feasibility of such morphological analysis and points toward century-specific morphological analysis models and broader use of historical Korean linguistic information.
