Finding an Optimal Classification Model for Analyzing Linguistic Data

  • Journal
    인문언어 (KCI-listed)
  • Volume/Issue (Year)
    Vol. 26, No. 2 (2024.12)
  • Pages
    pp.205-236
  • Author
    Wonbin Kim
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A459546

Full-Text Information

Abstract

English
This study aims to identify an AI classification model that is optimal for the classification of linguistic data. For this purpose, three commonly used classification models (XGBoost classifier, Random Forest Classifier, and SVM classifier) are compared in terms of their performance. Specifically, the three models are trained to classify the input data into essays and dialogues based on the syntactic complexity-related characteristics that distinguish between essays and dialogues. To determine if a model performing well on balanced data also performs well on imbalanced data, the three models’ performances are measured under two conditions: when the training dataset is balanced and when it is imbalanced. The performances of the trained models on the first test dataset are evaluated using accuracy, F1-score, normalized confusion matrix, and the area under the receiver operating characteristic curve. The performances on the second test dataset are assessed in terms of accuracy, confusion matrix, precision, and recall. The results demonstrate that the Random Forest Classifier has the best performance among the three models regardless of the balance of training data.
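The threshold-based metrics named in the abstract (accuracy, precision, recall, F1-score, and the normalized confusion matrix) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, not the author's implementation. The label strings, the choice of "essay" as the positive class, and the toy predictions are hypothetical and do not come from the study; the area under the ROC curve is omitted because it requires predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive="essay"):
    """Count (tp, fp, fn, tn) for a binary task, treating `positive` as the target class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive="essay"):
    """Return accuracy, precision, recall, and F1 as a dict (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def normalized_confusion_matrix(y_true, y_pred, positive="essay"):
    """Row-normalized matrix: rows are true classes (positive first), columns are predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    pos_total = tp + fn
    neg_total = fp + tn
    pos_row = [tp / pos_total, fn / pos_total] if pos_total else [0.0, 0.0]
    neg_row = [fp / neg_total, tn / neg_total] if neg_total else [0.0, 0.0]
    return [pos_row, neg_row]


# Hypothetical imbalanced test set: 2 essays and 8 dialogues (not data from the study).
y_true = ["essay"] * 2 + ["dialogue"] * 8
y_pred = ["essay", "dialogue"] + ["dialogue"] * 7 + ["essay"]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
print(normalized_confusion_matrix(y_true, y_pred))
```

On this toy input the accuracy is 0.8 while precision, recall, and F1 for the minority class are each 0.5, which illustrates why accuracy alone can be misleading when the classes are imbalanced.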

Table of Contents

1. Introduction
2. Background
2.1 XGBoost Classifier
2.2 Random Forest Classifier
2.3 Support Vector Machine Classifier
3. Method
3.1 Data
3.2 Procedure
4. Results
4.1 Results from XGBoost Classifier
4.2 Results from Random Forest Classifier
4.3 Results from Support Vector Machine Classifier
5. Discussion and Conclusion
References
[Abstract]

Author

  • Wonbin Kim [김원빈 | Yonsei University]

    Journal Information

    • Journal
      인문언어 [LINGUA HUMANITATIS]
    • Frequency
      Semiannual
    • pISSN
      1598-2130
    • Coverage
      2000~2025
    • Indexing status
      KCI-listed
    • Decimal classification
      KDC 705 DDC 405