トピックモデルを用いた日本語テキスト マイニングの研究 -旧JLPTの読解の既出問題に対する分析を中心に-
Research on Japanese text mining using the Topic Model - Focusing on the analysis of past test reading comprehension questions in the previous format of JLPT -
In this paper, as one attempt to make effective use of the vast amount of available text data, I introduce a text mining technique called the topic model into the field of Japanese studies. Specifically, I collected the reading comprehension texts of the previous-format JLPT from the past 20 years and carried out a topic model analysis. The study made the following points clear. First, the data confirm that the makers of the previous-format JLPT tried to avoid bias toward particular topics when selecting and producing the texts for the questions. Second, the texts can be statistically classified into four main topics: “Private relationships such as family and work,” “Communications related to schedules,” “Public relations related to the country and society,” and “Economic activity.” The techniques and results presented here come from an empirical analysis of actual past questions, and they are significant in that they can be applied to any field of Japanese studies that needs them. Of course, the discussion is limited to the texts of the previous-format JLPT and does not cover the new format, and the amount of data is relatively small even though it covers all the data for the past 20 years. In addition, a comparative analysis with other texts was not possible. There is therefore still room for improvement, which I would like to address in future work.
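As a rough illustration of what a topic model analysis of this kind involves, the sketch below fits a small latent Dirichlet allocation (LDA) model by collapsed Gibbs sampling and reports the share of tokens assigned to each topic. It is only a sketch under stated assumptions: the abstract does not say which topic model or software was used, so LDA, the function names (`fit_lda`, `top_words`), the hyperparameters, and the toy documents are all hypothetical. Real Japanese text would also need to be split into words by a morphological analyzer first; the English tokens here merely stand in for that step.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts. Hypothetical helper, not the paper's method.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]


# Hypothetical toy corpus; not data from the paper.
docs = [
    ["family", "mother", "work", "colleague", "family", "work"],
    ["work", "family", "father", "colleague", "mother"],
    ["meeting", "time", "tomorrow", "schedule", "time"],
    ["schedule", "meeting", "tomorrow", "time", "schedule"],
    ["country", "society", "law", "government", "society"],
    ["government", "country", "law", "society", "country"],
    ["price", "market", "company", "sales", "price"],
    ["company", "sales", "market", "price", "market"],
]

doc_topic, topic_word = fit_lda(docs, num_topics=4)

# Share of all tokens assigned to each topic, a crude check on topic balance.
total_tokens = sum(len(doc) for doc in docs)
shares = [sum(row[k] for row in doc_topic) / total_tokens for k in range(4)]

for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: share={shares[k]:.2f}, top words={words}")
```

Roughly even shares across topics would be consistent with a corpus that avoids favoring any one topic, although on a corpus this small the sampler's output should be read as illustrative only.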
한국일본언어문화학회 [Japanese Language & Culture Association of Korea]
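Year founded
2001
Field
Humanities > Japanese Language and Literature
About
The Association covers Japanese linguistics and Japanese literature as well as Japanese studies in general, including Japanese politics, economics, culture, and society, together with comparative research on Korea conducted through the medium of Japanese language and culture. Its main aims are to give members opportunities to present research and exchange information and, further, to establish a sound approach to Japanese studies in Korea.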