The 10th International Conference on Next Generation Computing 2024 (2024.11)
Pages
pp.187-190
Authors
Deoghwa KIM, Han Wang, Deok-Hwan Kim
Language
English (ENG)
URL
https://www.earticle.net/Article/A468840
Full-Text Information
Abstract
English
This paper proposes a feature-level fusion technique that combines facial expression and audio modalities for multimodal emotion recognition. The learning model uses a hybrid CNN-LSTM approach to learn the spatiotemporal characteristics of the video and audio modalities effectively. In the unimodal setting, speech emotion recognition achieved 74% accuracy and facial emotion recognition achieved 83% accuracy, whereas the proposed multimodal approach achieved 93% accuracy, demonstrating that multimodal emotion recognition is more accurate than unimodal emotion recognition. Furthermore, in tests on the RAVDESS dataset, the proposed model achieved higher emotion recognition rates than related studies. This study demonstrates the feasibility of multimodal emotion recognition and designs a model capable of recognizing emotions in various environments and situations. Through this work, we aim to contribute to the advancement of emotion recognition technology.
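As a rough illustration of what feature-level fusion means in the abstract above, the sketch below concatenates a facial feature vector and an audio feature vector before a single classifier sees them. It does not reproduce the paper's CNN-LSTM encoders: the short vectors stand in for encoder outputs, and every name, weight, and value here (`fuse_features`, `classify`, the toy numbers) is a hypothetical assumption chosen for illustration.

```python
import math


def fuse_features(face_feat, audio_feat):
    """Feature-level fusion: join the per-modality feature vectors into one."""
    return list(face_feat) + list(audio_feat)


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """One linear layer over the fused vector, followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical stand-ins for encoder outputs (not values from the paper).
face_feat = [0.2, 0.9]   # assumed output of a facial-expression encoder
audio_feat = [0.7]       # assumed output of an audio encoder

fused = fuse_features(face_feat, audio_feat)

# Hypothetical weights for two emotion classes over the 3-dim fused vector.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
biases = [0.0, 0.0]

probs = classify(fused, weights, biases)
predicted_class = probs.index(max(probs))
```

The point of the sketch is only that the classifier operates on the joint vector, so it can weigh cues from both modalities together, in contrast to decision-level fusion, where each modality is classified separately and the predictions are merged afterwards.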
Table of Contents
Abstract
I. INTRODUCTION
II. RELATED WORK
  A. Facial Emotion Recognition
  B. Speech Emotion Recognition
  C. Multimodal (Speech + Facial Emotion Recognition, Facial + EEG Emotion Recognition)
III. PROPOSED METHOD
  A. Preprocessing Process for Video and Audio Data
  B. Structure of Proposed Model
IV. EXPERIMENTS
  A. Used DATASET
  B. EXPERIMENTS RESULTS
V. CONCLUSION
ACKNOWLEDGMENT
REFERENCES