An Improved Conversion Model for Enhancing Silent Speech Performance

Article Information

Abstract

English
Among individuals who have difficulty phonating due to laryngectomy or voice disorders, the need for silent-speech-based communication technologies is steadily increasing. Recent studies reconstruct acoustic speech from silent speech by extracting audio features from electromyography (EMG) signals with a transduction model, aligning these features with those from phonated speech, and decoding the aligned representations. Speech generated by this approach typically contains substantial noise and exhibits weak articulation and indistinct phonation. In addition, because speaker-specific voice information is modeled as a whole rather than disentangled, personalized adaptation is difficult. To improve the naturalness and articulation of synthesized speech, we adopt Diff-HierVC, a diffusion-based hierarchical voice conversion architecture. The original design predicts targets using only phonated speech; we modify it so that target acoustic representations are predicted from EMG signals. We train the model with three disentangled features, content (w2v), mel-spectrogram, and pitch (f0), enabling voice conversion for silent speech. In a listening test, we compare the proposed model with a baseline that does not use Diff-HierVC. The results show that the proposed model significantly improves perceived speech naturalness over the baseline.

Table of Contents

Abstract
I. INTRODUCTION
II. ARCHITECTURE OF THE PROPOSED METHOD
A. Pitch/Mel-spectrogram feature
B. Contents feature
C. Convolution Block
D. CBAM (Convolutional Block Attention Module)
E. Transformer Encoder
III. RESULTS
IV. CONCLUSION
ACKNOWLEDGMENT
REFERENCES

Authors

  • Chae-Yeon Song [ Department of Electrical and Computer Engineering, Inha University, Incheon, South Korea ]
  • Srinidhi Kanagachalam [ Department of Electrical and Computer Engineering, Inha University, Incheon, South Korea ]
  • Deok-Hwan Kim [ Department of Electrical and Computer Engineering, Inha University, Inha University IPCC, Incheon, South Korea ] Corresponding Author

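    Publication Information

    • Publication
      한국차세대컴퓨팅학회 학술대회 (Conference of the Korean Institute of Next Generation Computing)
    • Frequency
      Semiannual
    • Coverage period
      2021~2025
    • Decimal classification
      KDC 566 DDC 004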