Multiple Feature Fusion for Speaker Recognition from Limited-bandwidth Speech

Chanjun Chun; Chandhwan Go; Oc-Yeub Jeon; Nam In Park

제한대역 음성에서의 화자 인식을 위한 다중 특징 융합 방법
Multiple Feature Fusion for Speaker Recognition from Limited-bandwidth Speech

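Publisher: Korean Institute of Next Generation Computing
Journal: The Journal of Korean Institute of Next Generation Computing (KCI-indexed)
Issue: Vol.21 No.5 (2025.10)
Pages: pp.22-44
Authors: Chanjun Chun, Chandhwan Go, Oc-Yeub Jeon, Nam In Park
Language: English (ENG)
URL: https://www.earticle.net/Article/A475394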

Abstract

English: Speaker recognition technology identifies individuals from their unique voice characteristics, such as timbre, pitch, formants, and prosody, independently of the speech content. These vocal traits are used to authenticate or differentiate speakers in various applications, offering a robust approach to secure and efficient identity verification. The technology determines whether a voice belongs to a registered speaker, which matters especially where fraud such as voice phishing is common. Speaker recognition models are typically trained on datasets such as VoxCeleb, which are sampled at 16 kHz. Phone communication, however, often uses a lower sampling rate of 8 kHz. Robust speaker recognition in these environments requires models that function effectively even with limited-bandwidth speech. In this study, we aim to mitigate the performance degradation of speaker recognition systems on limited-bandwidth speech by extracting and combining various speech features. Specifically, we extract multiple spectrogram forms (vanilla, mel, linear, and MFCC), the constant-Q transform (CQT), and the CCTZ set, which comprises chroma, contrast, tonnetz, and the zero-crossing rate. These features are fused in various configurations to enhance the robustness of the model. The experimental results show that fusing multiple features outperforms using any single feature alone. Moreover, when the model trained on 16 kHz data was tested on 8 kHz speech, we observed an improvement of approximately 0.65% in the equal error rate (EER) compared with its performance without feature fusion. These findings highlight the effectiveness of feature fusion for speaker recognition on limited-bandwidth speech in real-world telecommunication environments.
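As an illustration of what fusing frame-level features can look like, here is a minimal NumPy sketch. It is not the authors' pipeline: the abstract does not state how the features are fused, so simple concatenation along the feature axis is an assumption, and only two stand-in features (a log-magnitude spectrogram and the zero-crossing rate) are computed. The function names, the 25 ms frame length, and the 10 ms hop are hypothetical choices for this example.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames (n_frames x frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_spectrogram(frames, n_fft=512):
    """Log-magnitude spectrum of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))
    return np.log(mag + 1e-8)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1, keepdims=True)

def fuse_features(x):
    """Assumed fusion scheme: concatenate per-frame features side by side."""
    frames = frame_signal(x)
    spec = log_spectrogram(frames)    # shape (n_frames, n_fft // 2 + 1)
    zcr = zero_crossing_rate(frames)  # shape (n_frames, 1)
    return np.concatenate([spec, zcr], axis=1)

# One second of a 440 Hz tone at 16 kHz as a stand-in for speech.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
fused = fuse_features(x)
print(fused.shape)
```

Each row of `fused` is one frame; the last column holds the zero-crossing rate and the remaining columns hold the spectrogram bins. Other features named in the abstract (mel, MFCC, CQT, chroma, contrast, tonnetz) could be appended the same way, provided they share the same frame grid.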

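Korean (translated): Speaker recognition technology identifies a person from individual voice characteristics such as timbre, pitch, formants, and prosody, regardless of the speech content. These characteristics can be used effectively to authenticate or distinguish speakers in many fields and provide reliable personal authentication, particularly where fraud such as voice phishing is frequent. Speaker recognition models are generally trained on datasets sampled at 16 kHz, such as VoxCeleb. In real telephone communication, however, limited-bandwidth speech with a lower sampling rate, mainly 8 kHz, is used, so it is essential to develop speaker recognition models that can operate in such environments without performance degradation. This paper proposes a method that extracts and combines various speech features to mitigate the performance degradation of speaker recognition systems on limited-bandwidth speech. Specifically, we extracted several spectrogram features (basic spectrogram, mel spectrogram, linear spectrogram, and MFCC), the constant-Q transform (CQT), and the CCTZ feature set comprising chroma, contrast, tonnetz, and zero-crossing rate. In the experiments, these features were fused in various combinations to improve model robustness. The results show that models fusing multiple features outperformed models using a single feature. In particular, when a model trained at 16 kHz was evaluated on 8 kHz speech, the equal error rate (EER) improved by approximately 0.65% compared with no feature fusion. These results suggest that multiple feature fusion is highly effective for improving speaker recognition on limited-bandwidth speech in real telephone communication environments.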
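The abstract reports its result as an equal error rate (EER), the operating point where the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to approximate it from verification scores. The function name, the threshold sweep over observed scores, and the toy scores are assumptions for illustration, not the evaluation code or data used in the paper.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 = same-speaker trial (target), 0 = different-speaker trial.
    A trial is accepted when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    targets = scores[labels == 1]
    impostors = scores[labels == 0]

    best_gap, eer = np.inf, None
    for thr in np.unique(scores):          # np.unique returns sorted values
        far = np.mean(impostors >= thr)    # false acceptance rate
        frr = np.mean(targets < thr)       # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:                 # keep the point where FAR ~ FRR
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy trial list: four target trials followed by four impostor trials.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%}")  # prints "EER = 25.00%"
```

With few trials the sweep is coarse; on a large trial list, interpolating between adjacent thresholds gives a smoother estimate.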

Contents

Abstract (Korean)
Abstract
1. Introduction
2. Feature Extraction and Fusion for Speaker Recognition from Limited-bandwidth Speech
2.1 Spectrogram, Mel Spectrogram, Linear Spectrogram, and MFCC
2.2 Constant Q Transform
2.3 Chroma, Contrast, Tonnetz, and Zero-Crossing Rate
2.4 Proposed Framework for Speaker Recognition from Limited-bandwidth Speech
3. Performance Experiment
4. Conclusions
Acknowledgement
References

Keywords

speaker recognition, limited-bandwidth speech, feature fusion, speaker embedding

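Authors

Chanjun Chun (전찬준), Department of Computer Engineering, Chosun University
Chandhwan Go (고창환), Department of Computer Engineering, Chosun University
Oc-Yeub Jeon (전옥엽), Digital Division, National Forensic Service
Nam In Park (박남인), Digital Division, National Forensic Service (corresponding author)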

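Journal Information

Publisher

Name: Korean Institute of Next Generation Computing (한국차세대컴퓨팅학회)
Founded: 2005
Field: Engineering > Computer Science
About: The society aims to advance the science and technology of next-generation PCs through scholarly activity in next-generation PCs and related fields, and to promote industrial development and international cooperation.

Journal

Title: The Journal of Korean Institute of Next Generation Computing (한국차세대컴퓨팅학회 논문지)
Frequency: Bimonthly
pISSN: 1975-681X
Coverage: 2005~2026
Indexing: KCI-indexed
Classification: KDC 566, DDC 004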
