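Detection System for Harmful Expressions in Korean LLM Training Data
(LLM 학습 데이터의 한국어 유해표현 검출 체계 연구)

Yong Hyun Jo; Choon Seong Leem; Seong Woong Lee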

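Publisher: International Next-generation Convergence technology Association (국제차세대융합기술학회)
Journal: The Journal of Next-generation Convergence Technology Association (차세대융합기술학회논문지), KCI-listed
Issue: Vol. 9, No. 11 (2025.11)
Pages: pp. 2831-2842
Authors: Yong Hyun Jo, Choon Seong Leem, Seong Woong Lee
Language: Korean (KOR)
URL: https://www.earticle.net/Article/A476297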

Article Information

Abstract

This study proposes a two-stage detection system for filtering harmful expressions out of Korean LLM training data before training, and validates its effectiveness on related datasets. Harmful expressions are classified into 11 categories: insult, profanity, sexism, obscenity, racial/regional discrimination, disability, age discrimination, religion, political ideology, occupational disparagement, and incitement of violence/crime. Public Korean corpora were collected and preprocessed through deduplication, normalization, and quality filtering, and multi-label annotations were preserved so that contextual and implicit harms could be captured. The final corpus consists of 200,000 sentences (100,000 harmful and 100,000 non-harmful). Stage 1 uses a binary classifier to rapidly separate harmful from non-harmful sentences; Stage 2 performs fine-grained multi-label classification across the 11 categories. Four models were compared: KoGPT-2, KrMedium (KR-BERT), KoELECTRA, and KcELECTRA. All four achieved F1-scores of 0.99 or higher in Stage 1, and KcELECTRA outperformed the others in Stage 2 with a micro-F1 of 0.8291 and a ROC-AUC of 0.9122. The study thus presents a detection framework for harmful expressions in Korean and validates it experimentally.
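The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `detect`, `Detection`, `toy_stage1`, and `toy_stage2` are hypothetical, the keyword-based scorers merely stand in for fine-tuned models such as KcELECTRA, and the 0.5 thresholds are assumptions, since the abstract does not state decision thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The 11 categories named in the abstract (short labels chosen for this sketch).
CATEGORIES = [
    "insult", "profanity", "sexism", "obscenity", "racial/regional",
    "disability", "age", "religion", "political ideology", "occupation",
    "violence/crime",
]

# Hypothetical model interfaces; both return probabilities in [0, 1].
BinaryModel = Callable[[str], float]                 # P(sentence is harmful)
MultiLabelModel = Callable[[str], Dict[str, float]]  # P(category) per label


@dataclass
class Detection:
    harmful: bool
    labels: List[str]


def detect(text: str, stage1: BinaryModel, stage2: MultiLabelModel,
           t1: float = 0.5, t2: float = 0.5) -> Detection:
    """Run the multi-label Stage 2 only on sentences that Stage 1 flags."""
    if stage1(text) < t1:  # assumed threshold, not taken from the paper
        return Detection(harmful=False, labels=[])
    scores = stage2(text)
    # Multi-label: every category at or above the threshold is kept.
    labels = [c for c in CATEGORIES if scores.get(c, 0.0) >= t2]
    return Detection(harmful=True, labels=labels)


# Toy stand-ins so the sketch runs without any trained model.
def toy_stage1(text: str) -> float:
    return 0.9 if "idiot" in text.lower() else 0.1


def toy_stage2(text: str) -> Dict[str, float]:
    scores = {c: 0.0 for c in CATEGORIES}
    lowered = text.lower()
    if "idiot" in lowered:
        scores["insult"] = 0.8
    if "old" in lowered:
        scores["age"] = 0.7
    return scores


flagged = detect("You old idiot", toy_stage1, toy_stage2)
clean = detect("Nice weather today", toy_stage1, toy_stage2)
```

Under these assumptions, filtering a corpus amounts to keeping only the sentences for which `detect(...).harmful` is false. Running the cheap binary stage first means the costlier 11-way classifier only sees sentences already flagged as harmful.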

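Table of Contents

Abstract (Korean)
Abstract (English)
Ⅰ. Introduction
Ⅱ. Theoretical Background and Related Work
2.1 Research Trends in Harmful Expressions
2.2 Research on Korean Harmful Expressions
Ⅲ. Research Method
3.1 Definition and Classification of Harmful Expressions
3.2 Dataset Construction
3.3 Harmful Expression Models
Ⅳ. Experiments and Evaluation
4.1 Experimental Overview
4.2 Performance Metrics
4.3 Results
Ⅴ. Conclusion and Future Plans
5.1 Conclusion
5.2 Suggestions and Expected Effects
5.3 Limitations and Future Research
REFERENCES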

Keywords

Harmful expression, Hate speech, LLM dataset, Dataset bias, Harmful expression detection

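Authors

Yong Hyun Jo (조용현) [ Ph.D. candidate, Interdisciplinary Program in Technology Policy, Yonsei University ]
Choon Seong Leem (임춘성) [ Professor, Department of Industrial Engineering, Yonsei University ] Corresponding Author
Seong Woong Lee (이성웅) [ Ph.D. in Technology Policy, Yonsei University ]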

Publication Information

Publisher

Name: International Next-generation Convergence technology Association (국제차세대융합기술학회)
Established: 2017
Field: Interdisciplinary Studies > Technology Policy
About: Ever since next generation convergence technology became one of the most important industries in the nation, computing professionals have encountered a growing number of challenges. Along with scholars and colleagues in related fields, they have gathered in a variety of forums and meetings over the last few decades to share their knowledge, experiences, and the outcomes of their research. These exchanges led to the founding of the International Next-generation Convergence technology Association (INCA) on December 1, 2015. INCA was registered as an incorporated association under the Ministry of Information and Communications. The main purpose of the organization is to improve our society by achieving the highest capability possible in next generation convergence technology.

Journal

Title: The Journal of Next-generation Convergence Technology Association (차세대융합기술학회논문지)
Frequency: Monthly
pISSN: 2508-8270
Coverage: 2017-2026
Indexing: KCI-listed
Decimal classification: KDC 506, DDC 606
