Facial emotion recognition (FER) has gained increasing attention in human–computer interaction and affective computing; however, existing methods often suffer from high computational cost and limited generalization, especially in real-world scenarios with subtle expressions and noisy inputs. To address these issues, this study proposes a knowledge distillation-based framework for FER. The teacher network employs NasNetMobile and MobileNet in parallel as dual backbones for comprehensive feature extraction; the extracted features are then enhanced by a Deformable Attention (DA) module, which refines spatial feature representations. To transfer this rich knowledge effectively, we introduce a lightweight student model, TinyNasNet, inspired by the internal architecture of NasNetMobile. In this framework, the student model is trained to mimic the behavior of the teacher network, aiming to achieve high recognition performance while keeping computational complexity low. Extensive experiments were conducted on two benchmarks, namely FER and KDEF. The results show that the proposed network outperforms various competitive networks, demonstrating a highly efficient yet robust solution for real-time FER.
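Teacher–student training of the kind described above is commonly implemented with a Hinton-style distillation objective: a hard-label cross-entropy term plus a temperature-softened divergence between teacher and student outputs. The abstract does not give the exact loss used; the temperature `T`, weight `alpha`, and function names below are illustrative assumptions, shown here as a minimal dependency-free sketch:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Hinton-style KD loss (assumed form, not the paper's exact objective):
    alpha * CE(student, hard label)
    + (1 - alpha) * T^2 * KL(teacher_soft || student_soft)."""
    # Hard-label cross-entropy against the ground-truth class index.
    ce = -math.log(softmax(student_logits)[label])
    # Temperature-softened class distributions for teacher and student.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL divergence, scaled by T^2 to keep gradient magnitudes comparable.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

In practice the same loss would be computed batch-wise on framework tensors (e.g. with a KL-divergence loss on log-softmax outputs), with the teacher's dual-backbone logits frozen and only the TinyNasNet student updated.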
Table of Contents
Abstract
1. Introduction
2. Proposed Method
3. Results and Discussion
3.1. Comparative analysis of various techniques
5. Conclusion
Acknowledgement
References
Authors
Taimoor Khan [ Department of Computer Engineering, IT Convergence, Gachon University ]
Abdul Hai Karimi [ Department of Computer Engineering, IT Convergence, Gachon University ]
Chang Choi [ Department of Computer Engineering, IT Convergence, Gachon University ]
Corresponding Author