Corpus Linguistics Research Vol. 9 No. 1::한국코퍼스언어학회

1

최지선, 김일환

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.1-12

This study aims to extract grammatical collocation patterns from the learner corpus constructed by the National Institute of the Korean Language (2015–2019) and to investigate the usage patterns of grammatical collocations among Korean language learners. Grammatical collocations are particularly interesting because they involve two or more linguistic units that function as a single, integrated entity and because these units include forms responsible for grammatical functions. In particular, grammatical collocations are not only a challenging aspect for foreign learners of Korean but also a crucial component in language instruction. Therefore, extracting grammatical collocations from a large-scale learner corpus and analyzing their characteristics is of utmost importance. To achieve this, this study extracted grammatical collocations according to proficiency levels from a morphologically analyzed learner corpus consisting of approximately 2.6 million word tokens and examined their characteristics. Furthermore, to highlight the distinctive features of grammatical collocations in the learner corpus, a comparative analysis was conducted with a native Korean corpus. Through this analysis, the study quantitatively and qualitatively examined the usage patterns of grammatical collocations among Korean learners based on their proficiency levels, while also explicitly identifying distinctions between learners' grammatical collocations and those of native speakers.
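The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of one way such patterns could be counted. It assumes the morphologically analyzed corpus is available as lists of (morpheme, POS tag) pairs grouped by proficiency level, and it treats any n-gram containing a particle (J*) or ending (E*) tag as a candidate grammatical collocation. The toy sentences, the `corpus` layout, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical toy input: each sentence is a list of (morpheme, POS tag) pairs,
# grouped by proficiency level. Tags follow Sejong-style conventions
# (J* = particles, E* = endings); the sentences are invented for illustration.
corpus = {
    "beginner": [
        [("학교", "NNG"), ("에", "JKB"), ("가", "VV"), ("고", "EC"), ("싶", "VX"), ("다", "EF")],
        [("밥", "NNG"), ("을", "JKO"), ("먹", "VV"), ("고", "EC"), ("싶", "VX"), ("다", "EF")],
    ],
    "advanced": [
        [("가", "VV"), ("ㄹ", "ETM"), ("수", "NNB"), ("있", "VA"), ("다", "EF")],
    ],
}


def is_grammatical(tag):
    """Assumed criterion: particles and endings count as grammatical forms."""
    return tag.startswith(("J", "E"))


def extract_ngrams(sentences, n=2):
    """Count n-grams of morphemes that contain at least one grammatical form."""
    counts = Counter()
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            window = sent[i:i + n]
            if any(is_grammatical(tag) for _, tag in window):
                counts[tuple(f"{form}/{tag}" for form, tag in window)] += 1
    return counts


# One frequency table per proficiency level.
counts_by_level = {
    level: extract_ngrams(sents, n=2) for level, sents in corpus.items()
}

for level, counts in counts_by_level.items():
    print(level, counts.most_common(3))
```

Running the same function over a native-speaker corpus would yield a table that could be compared against the learner tables, although the comparison method used in the study itself is not stated here.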

2

A Study on Data Processing Methodology for Analyzing Korean Learners' Use of Affixes

최정도

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.13-30

The purpose of this study is to present a corpus processing methodology that lays the foundation for comprehensively observing how learners of Korean use Korean affixes. As basic data for developing this methodology, the study uses the <Korean learner corpus> distributed by the National Institute of the Korean Language. However, Korean affixes are not fully analyzed or reflected in the <Korean learner corpus>, which inherits the construction system of the <Sejong Corpus>. To compensate for this, the study analyzes the list of all affixes given in the <Standard Korean Dictionary> of the National Institute of the Korean Language. Only after this process can the full extent of Korean learners' affix usage be observed. In other words, it becomes possible to analyze the usage patterns of the Korean affixes used by learners, to analyze quantitatively and qualitatively the types of roots combined with each affix, and to lay the foundation for a thorough description by examining errors in learners' affix use and root errors. The results of analyzing Korean learners' affix usage patterns based on these data are expected to be useful for learners' vocabulary education.

3

A Study on Lexical Discrimination of the Auxiliary Verbs ‘-어 버리다’ and ‘-고 말다’ for Vocabulary Education

김선혜

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.31-50

This study examines the semantic distinction between malda and beorida through corpus analysis. The non-substitutability of the two verbs is largely due to the syntactic and semantic constraints of malda, which are more restrictive than those of beorida. Despite similar conceptual meanings and syntactic combinations, differences emerge in morphological usage and modal nuances perceived by speakers. These findings suggest that malda and beorida form a challenging synonym pair for both teaching and learning, requiring careful semantic analysis. The identified constraints and differences may inform more effective materials for Korean language learners.

4

A Study of Vocabulary in North Korea-Related News Coverage According to Newspapers' Political Orientation

안수빈, 강범일

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.51-66

The press not only delivers a wide range of news to the public but also plays a crucial role in shaping public opinion on various issues and situations. Depending on their interests, newspapers may interpret the same issue differently. One of the major topics consistently covered by the South Korean press is North Korea. Since the division of the Korean Peninsula, issues related to North Korea have remained a focal point in South Korean society. This study analyzes and discusses the lexical characteristics of North Korea-related news coverage according to the political orientation of newspapers. Politically charged high-frequency words were selected from both progressive and conservative newspapers. An analysis of the usage examples of these words reveals that progressive and conservative newspapers tend to view the same topic from differing perspectives when reporting on North Korea.

5

A Study on Discriminating Synonymous Adverbs Using Co-occurring Words: Focusing on ‘고작’, ‘기껏’, ‘겨우’, ‘불과’, and ‘기껏해야’

이진

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.67-89

This study examined the semantic distinctions among the synonymous Korean adverbs gojak (‘고작’), gikkeot (‘기껏’), gyewu (‘겨우’), bulgwa (‘불과’), and gikkeot-haeya (‘기껏해야’) through a quantitative analysis based on co-occurrence data from the Sejong Corpus. Using hierarchical clustering and correspondence analysis, the study visualized the degree of semantic proximity among these adverbs. The hierarchical clustering results show that gikkeot and gikkeot-haeya form the closest semantic cluster, followed by gojak and bulgwa. In contrast, gyewu emerged as a semantically independent adverb, forming a distinct cluster. Correspondence analysis further confirmed these patterns by illustrating that gojak, gikkeot, and gikkeot-haeya are located near the origin and share similar directional vectors, indicating overlapping co-occurrence profiles. Meanwhile, bulgwa and gyewu are clearly separated in different quadrants of the plot, reflecting their distinct semantic and syntactic properties. By integrating co-occurrence patterns with statistical analysis, this study supplements intuition-based and dictionary-driven synonym classifications. The findings affirm that a corpus-based approach is effective in distinguishing subtle semantic differences among synonymous adverbs. Further research is needed to expand this analysis to a wider range of adverbs and to incorporate pragmatic and discourse-level factors into the investigation.
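As a rough illustration of the clustering step only, the sketch below runs agglomerative clustering over co-occurrence profiles. It assumes cosine distance and average linkage, neither of which is specified in the abstract, and it does not cover correspondence analysis. The labels `w1` to `w4` are placeholders and the counts are invented toy numbers, not data from the Sejong Corpus.

```python
import math
from itertools import combinations

# Invented co-occurrence counts (one row per target word, one column per
# context word). Placeholder labels only; these are NOT results from the study.
profiles = {
    "w1": [8, 1, 0, 3],
    "w2": [7, 2, 0, 4],
    "w3": [0, 6, 5, 1],
    "w4": [1, 5, 6, 0],
}


def cosine_distance(u, v):
    """1 - cosine similarity between two count vectors (assumes non-zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(cosine_distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def agglomerate(labels):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        merges.append((c1, c2, average_linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = agglomerate(sorted(profiles))
for left, right, dist in merges:
    print(left, right, round(dist, 3))
```

The merge history plays the role of a dendrogram: words merged early at small distances have similar co-occurrence profiles, while a word that joins only at the final step would correspond to a semantically independent item.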

6

A Study on the Detailed Classification of Korean Four-Character Idioms Based on the Sejong Corpus

Li Fei

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.91-110

In the modern Korean vocabulary system, compared to other idiomatic expressions, four-character idioms can be considered a remarkable variety equipped with several distinctive functions in various linguistic fields, although they account for a relatively small proportion. To expand the classification framework of four-character idioms in Korean with detailed morphological and syntactic criteria, this paper aims to conduct a corpus-based quantitative analysis on four-character idioms, not just collecting thousands of realistic samples, but also extracting adequate morphemic co-occurrence frequency information from both the Sejong colloquial and written POS Tagged Corpus. The key contributions of this paper are threefold: (1) a comprehensive review of existing definitions and classifications of four-character idioms, with particular attention to identifying distinctive morphological and semantic traits; (2) the application of corpus-linguistic methodologies to propose an improved framework supplemented with frequency-based idiom lists; and (3) a multi-stage frequency analysis to explore the internal linguistic and contextual factors affecting the deployment of four-character idioms, culminating in a set of syntactic and pragmatic usage constraints and corresponding classification strategies.

7

Measuring Cultural Bias in Large Language Models

정가연, 강채안, 김민선, 노소현, 최혜지, 김한샘

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.111-137

This study analyzes cultural biases in major large language models from the United States, South Korea, and China (GPT-4, CLOVA X, and Qwen1.5) through story generation tasks using culture-specific names. Morphological analysis of the generated stories revealed that all models exhibited certain cultural biases. GPT-4 did not show negative biases toward Korean and Chinese cultures but tended to prefer traditional and rural settings when describing these cultures. In contrast, CLOVA X and Qwen1.5, which are specialized for their respective national languages, portrayed their own cultures in modern and positive terms while using a relatively higher proportion of negative adjectives and unrealistic settings when describing Western contexts. These findings are significant because they go beyond the conventional focus on the biases of Western-centric models toward non-Western contexts and show that East Asian-based models can exhibit similar biases when representing Western cultures. This research suggests that current language models have fundamental limitations in achieving cultural neutrality and highlights balanced training on, and reflection of, diverse cultural contexts as a crucial challenge in language model development.

8

Bylaws of 한국코퍼스언어학회 and Other Documents

한국코퍼스언어학회

한국코퍼스언어학회 Corpus Linguistics Research Vol. 9 No. 1 2024.06 pp.138-154
