
Utterance-Level Speech Emotion Recognition using Parallel Convolutional Neural Network with Self-Attention Module


Abstract
Automated speech emotion recognition (SER) through efficient long-term temporal context modeling is a challenging task in the digital audio signal processing domain. Typically, a recurrent neural network (RNN) is employed to incorporate temporal dependencies and investigate the relationships among sequences and features. In this study, we instead design a parallel convolutional neural network (PCNN) for SER by combining a squeeze-and-excitation network (SEnet) with a self-attention module. Additionally, we adopt a residual learning strategy in both modules, SEnet and self-attention, which further improves the performance of the network. Our proposed SER system takes a speech spectrogram as input and extracts utterance-level discriminative features using the PCNN model. We experimentally evaluated the proposed system on the standard speech corpus, interactive emotional dyadic motion capture (IEMOCAP). The prediction results demonstrate the significance and robustness of the proposed PCNN system, which achieved a recognition rate of 72.01%, surpassing state-of-the-art (SOTA) methods.
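The SEnet module mentioned in the abstract recalibrates channel responses in three steps: squeeze (global average pooling per channel), excitation (a small bottleneck network ending in a sigmoid that produces one weight per channel), and scale (multiplying each channel by its weight). As a rough illustration of this squeeze-excitation-scale idea only — not the authors' implementation; the toy feature maps and weight matrices below are hypothetical — a minimal pure-Python sketch:

```python
import math

def se_reweight(feature_maps, w1, w2):
    """Squeeze-and-excitation channel reweighting on toy 2D feature maps.

    feature_maps: list of C channels, each a list of rows of floats.
    w1: bottleneck weights of shape (C/r x C); w2: shape (C x C/r).
    Both weight matrices are hypothetical stand-ins for learned parameters.
    """
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid -> channel weights.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    weights = [1.0 / (1.0 + math.exp(-x)) for x in scores]
    # Scale: multiply every value in a channel by that channel's weight.
    return [[[v * wt for v in row] for row in ch]
            for ch, wt in zip(feature_maps, weights)]

# Two 2x2 channels; reduction ratio r=2, so the bottleneck has one unit.
fm = [[[1.0, 1.0], [1.0, 1.0]],
      [[2.0, 2.0], [2.0, 2.0]]]
reweighted = se_reweight(fm, w1=[[0.5, 0.5]], w2=[[1.0], [1.0]])
```

In the paper's PCNN, such recalibrated maps would then be combined with the self-attention branch via residual connections; this sketch shows only the channel-attention step.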

Table of Contents

Abstract
I. INTRODUCTION
II. PROPOSED PCNN-BASED SER SYSTEM
A. SEnet Module
B. Self-Attention
III. RESULTS & DISCUSSION
IV. CONCLUSION & FUTURE DIRECTION
ACKNOWLEDGEMENT
REFERENCES

Authors

  • Mustaqeem [ Interaction Technology Laboratory, Department of Software, Sejong University ]
  • Muhammad Ishaq [ Interaction Technology Laboratory, Department of Software, Sejong University ]
  • Guiyoung Son [ Interaction Technology Laboratory, Department of Software, Sejong University ]
  • Soonil Kwon [ Interaction Technology Laboratory, Department of Software, Sejong University ] (Corresponding Author)


    Journal Information

    • Publication
      한국차세대컴퓨팅학회 학술대회 (Conference of the Korean Society for Next Generation Computing)
    • Frequency
      Semiannual
    • Coverage
      2021~2025
    • Decimal Classification
      KDC 566, DDC 004