A Gender Identification of Korean Blog Writers through Machine Learning

Ji-Myoung Choi


Publisher: Korean Association for Corpus Linguistics (한국코퍼스언어학회)
Journal: Corpus Linguistics Research
Issue: Vol. 7 No. 2 (2022.12)
Pages: pp.71-89
Author: Ji-Myoung Choi
Language: English (ENG)
URL: https://www.earticle.net/Article/A426583


Full-Text Information

Abstract

Choi, J.M. (2023). A gender identification of Korean blog writers through machine learning. Gender identification of texts is a subfield of author analysis, namely author profiling. This study is a preliminary experiment on an automatic gender detection model using 1,162 posts by 13 blog owners. As linguistic features, four types of n-grams (word, function word, character, and POS), phoneme frequency, and four lexical sets were chosen, and a support vector machine was adopted as the classifier. Classification accuracy ranged from 54% to 99% depending on the feature type, but the best-performing model was obtained when all features except word n-grams were combined. The most salient features distinguishing female from male writers were first-person pronouns (‘나’ (I, me) and ‘내’ for females vs. ‘저’ and ‘제’ for males) and sentence endings (‘다’, ‘ㄴ다’, and ‘었다’ for females vs. ‘습니다’, ‘ㅂ니다’, and ‘네요’ for males). This preliminary study could lead to further research into gender-based language variation and contribute to the development of a stable and robust author profiling system.
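To make the character n-gram feature type concrete, the following is a minimal sketch of how such counts might be extracted from a post. It is not the study's pipeline: the function names `char_ngrams` and `relative_frequencies` and the two example sentences are hypothetical, and the classifier step (a support vector machine in the study) is deliberately left out. The sketch assumes precomposed Hangul syllables, so each syllable is one character.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in `text` (whitespace is kept)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A text shorter than n yields an empty range, hence an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Normalize raw counts so posts of different lengths are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical example sentences, NOT data from the study.
formal_post = "오늘은 날씨가 좋습니다. 산책을 했습니다."
plain_post = "오늘은 날씨가 좋다. 산책을 했다."

formal_counts = char_ngrams(formal_post, 3)
plain_counts = char_ngrams(plain_post, 3)

# The formal ending appears as a trigram only in the first post.
print(formal_counts["습니다"], plain_counts["습니다"])
```

Vectors built this way (one dimension per n-gram) could then be passed to any standard classifier; the relative-frequency step is one possible normalization, chosen here only so that long and short posts are comparable.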

Keywords

Gender detection, SVM, Machine learning, N-grams, Author analysis

Author

Ji-Myoung Choi [ Yonsei University ]


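Journal Information

Publisher

Name: Korean Association for Corpus Linguistics (한국코퍼스언어학회)
Founded: 2012
Field: Humanities > Linguistics
About: The association is made up of researchers who study corpora or who use corpora to explain linguistic phenomena. Its founding purpose is to explain a wide range of linguistic phenomena using corpora.

Journal

Title: Corpus Linguistics Research
Frequency: Quarterly
pISSN: 2465-812X
Coverage: 2015~2025
Classification: KDC 701, DDC 410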
