[Articles]

A Gender Identification of Korean Blog Writers through Machine Learning

  • Publisher
    한국코퍼스언어학회 (Korean Association for Corpus Linguistics)
  • Journal
    Corpus Linguistics Research
  • Issue
    Vol. 7 No. 2 (2022.12)
  • Pages
    pp.71-89
  • Author
    Ji-Myoung Choi
  • Language
    English (ENG)
  • URL
    https://www.earticle.net/Article/A426583

Abstract

English
Choi, J.M. (2023). A gender identification of Korean blog writers through machine learning. Gender identification of texts is a subfield of author analysis, specifically author profiling. This study is a preliminary experiment on an automatic gender detection model using 1,162 posts by 13 blog owners. As linguistic features, four types of n-grams (word, function word, character, and POS), phoneme frequency, and four lexical sets were chosen, and a support vector machine was adopted as the classifier. Classification accuracy ranged from 54% to 99% depending on the feature type, but the best-performing model was obtained when all features except word n-grams were combined. The most salient features distinguishing female from male writers were first-person pronouns (‘나’ (I, me) and ‘내’ (+*) for females vs. ‘저’ (-*) and ‘제’ (-) for males) and sentence endings (‘다’, ‘ᄂ다’, and ‘었다’ for females vs. ‘습니다’, ‘ᄇ니다’, and ‘네요’ for males). This preliminary study could lead to further research into gender-based language variation and contribute to the development of a stable and robust author profiling system.
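
As a rough illustration of the n-gram features the abstract mentions, the sketch below extracts character and word n-grams from a text and turns them into relative frequencies. It is not the paper's pipeline: the function names and the two toy sentences are hypothetical, only the Python standard library is used, and the support vector machine step is not shown.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every overlapping character n-gram in text (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return every overlapping n-gram over whitespace-separated tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_freq(items):
    """Map each distinct item to its share of the total count."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()} if total else {}


# Hypothetical toy sentences (not taken from the study's data):
# one with a plain '-다' ending, one with a polite '-습니다' ending.
plain = "나는 오늘 학교에 갔다"
polite = "저는 오늘 학교에 갔습니다"

plain_bigrams = relative_freq(char_ngrams(plain, 2))
polite_bigrams = relative_freq(char_ngrams(polite, 2))

# Character bigrams that occur in only one of the two sentences.
only_in_polite = sorted(set(polite_bigrams) - set(plain_bigrams))
print(only_in_polite)
```

Feature dictionaries of this kind would typically be converted to fixed-length vectors before being passed to a classifier; how the study did this is not stated in the abstract.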

Table of Contents

ABSTRACT
1. Introduction
2. Related research
3. Method
3.1. Data
3.2. Linguistic features
3.3. Procedures
4. Result
5. Discussion: what are the distinguishing features in gender differentiation?
6. Conclusion
References

Keywords

Gender detection, SVM, Machine learning, N-grams, Author analysis

Author

  • Ji-Myoung Choi [ Yonsei University ]

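Publication Information

Publisher

  • Publisher name
    한국코퍼스언어학회 [Korean Association for Corpus Linguistics]
  • Year founded
    2012
  • Field
    Humanities > Linguistics
  • About
    The association consists of researchers who study corpora or use corpora to explain linguistic phenomena. Its founding purpose is to explain a wide range of linguistic phenomena using corpora.

Journal

  • Journal title
    Corpus Linguistics Research
  • Frequency
    Quarterly
  • pISSN
    2465-812X
  • Coverage
    2015~2025
  • Decimal classification
    KDC 701 DDC 410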
