How Close Are Gemini and Google to Human Translation Style? A Five-Algorithm Machine Learning Analysis Across Five Languages
This study quantitatively examines stylistic differences between human and machine translation of literary texts. A parallel corpus was constructed from five source languages (German, Arabic, Indonesian, Japanese, and Chinese), each translated into Korean by professional human translators and by two machine translation systems, Google Translate and Gemini. Using five machine learning algorithms (Decision Tree, Random Forest, XGBoost, t-SNE, and PCA), we analyzed a range of stylometric features to determine how separable human and machine outputs are. The results show that human and machine translations are highly distinguishable, with classification models achieving over 90% accuracy. Gemini's stylistic profile aligned more closely with that of Google Translate than with that of human translators. Comma ratio was the strongest discriminator, followed by sentence count and pronoun ratio. These findings provide empirical evidence of a persistent stylistic gap between human and machine translation in the literary domain and identify specific linguistic features that mark this divide.
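Table of Contents
Abstract
I. Introduction
II. Style
III. Style in Translation
IV. Style in Machine Translation
IV. Experiment
  1. Research Questions
  2. Data
  3. Methods
V. Results and Analysis
  1. Experimental Results
  2. Analysis of Results
VI. Conclusion
References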
Keywords
machine translation, translation style, literary translation, machine learning, generative AI