
Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • Publisher
    The Korea Society of Management Information Systems
  • Journal
    Asia Pacific Journal of Information Systems (KCI-listed, SCOPUS)
  • Issue
    Vol. 29, No. 4 (December 2019)
  • Pages
    pp.789-816
  • Authors
    William Xiu Shun Wong, Donghoon Lee, Namgyu Kim
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A367258


Abstract

Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Instead of proposing a modified version of an existing algorithm or creating a new algorithm, we attempt to change how the data are used. Classifier performance is usually affected by the quality of the training data, since the classifier is built from those data. We assume that data from different domains may have different noise characteristics, which can be exploited when training the classifier. We therefore attempt to enhance the robustness of the classifier by artificially injecting heterogeneous data into the learning process in order to improve classification accuracy. A semi-supervised approach was applied to utilize the heterogeneous data when training the document classifier. However, the unlabeled data may degrade the performance of the document classifier. We therefore further propose an algorithm that extracts only the documents that contribute to improving the classifier's accuracy.
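The abstract describes self-training on heterogeneous unlabeled data, followed by a step that keeps only documents that help accuracy. The sketch below illustrates that general idea under stated assumptions. It is not the authors' algorithm, and the abstract does not specify their base classifier, confidence threshold or selection rule.

The following choices are hypothetical stand-ins:

- A small multinomial Naive Bayes classifier serves as the base model.
- A confidence cutoff (`threshold=0.8`) filters pseudo-labels.
- An "accept only if validation accuracy does not drop" rule decides which documents to keep.

The names `NaiveBayes`, `accuracy` and `self_train_with_filter` are all illustrative.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical stand-in base classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d.split()}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d.split())
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in doc.split():
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        # Softmax over log scores, shifted by the max for numerical stability.
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, doc):
        proba = self.predict_proba(doc)
        return max(proba, key=proba.get)


def accuracy(model, pairs):
    """Fraction of (doc, label) pairs the model predicts correctly."""
    return sum(model.predict(d) == y for d, y in pairs) / len(pairs)


def self_train_with_filter(labeled, unlabeled, validation, threshold=0.8):
    """Pseudo-label unlabeled docs, keeping only those that do not lower validation accuracy.

    The confidence threshold and the accept-if-not-worse rule are assumptions
    for illustration, not the selection algorithm proposed in the paper.
    """
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    model = NaiveBayes().fit(docs, labels)
    best = accuracy(model, validation)
    accepted = []
    for doc in unlabeled:
        proba = model.predict_proba(doc)
        label = max(proba, key=proba.get)
        if proba[label] < threshold:
            continue  # skip low-confidence pseudo-labels
        candidate = NaiveBayes().fit(docs + [doc], labels + [label])
        score = accuracy(candidate, validation)
        if score >= best:  # keep the document only if it does not hurt accuracy
            docs.append(doc)
            labels.append(label)
            model, best = candidate, score
            accepted.append((doc, label))
    return model, accepted, best


# Toy usage: a small labeled set plus unlabeled documents from "another domain".
labeled = [("goal match team win", "sports"), ("player score league", "sports"),
           ("stock market price", "finance"), ("bank interest rate", "finance")]
unlabeled = ["team league match", "market bank stock", "score player goal", "price rate interest"]
validation = [("team score win", "sports"), ("stock bank rate", "finance")]
model, accepted, best = self_train_with_filter(labeled, unlabeled, validation)
```

The sketch refits the classifier once per candidate document. That keeps the filtering rule easy to read, but it scales poorly to large corpora. A practical variant might score documents in batches or use a separate held-out set for the acceptance test.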

Table of Contents

ABSTRACT
Ⅰ. Introduction
Ⅱ. Related Work
2.1. Data Heterogeneity and Robustness
2.2. Semi-Supervised Learning
2.3. Ensemble Learning
Ⅲ. Research Methodology
3.1. Research Overview
3.2. Module 1: Heterogeneity Injection
3.3. Module 2: Classification Rule Selection
Ⅳ. Data Analysis and Results
4.1. Data Description
4.2. Data Preparation
4.3. Experiments and Results
Ⅴ. Conclusion

Keywords

Text Mining, Text Classification, Heterogeneity Learning, Semi-Supervised Learning, Ensemble Learning

Authors

  • William Xiu Shun Wong [ Senior Consultant, Biz Consulting Team, Datasolution Inc., Korea ]
  • Donghoon Lee [ Staff, BI LAB, Cafe24 Corp., Korea ]
  • Namgyu Kim [ Professor, School of Management Information Systems, Kookmin University, Korea ] Corresponding author


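Journal Information

Publisher

  • Publisher name
    The Korea Society of Management Information Systems
  • Year founded
    1989
  • Field
    Social Sciences > Business Administration
  • About
    The society aims to promote research and exchange in management information systems and to contribute to the advancement and application of the discipline.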

Journal

  • Title
    Asia Pacific Journal of Information Systems
  • Publication frequency
    Quarterly
  • pISSN
    2288-5404
  • eISSN
    2288-6818
  • Coverage
    1990-2026
  • Indexing
    KCI-listed, SCOPUS
  • Decimal classification
    KDC 325, DDC 658


Times cited: 0 (source: Naver Academic Information)
