The 10th International Conference on Next Generation Computing 2024 (2024.11)
Pages
pp. 102-104
Authors
Amjid Ali, Noman Khan, Zulfiqar Ahmad Khan, Su Min Lee, Min Je Kim, Sung Wook Baik
Language
English (ENG)
URL
https://www.earticle.net/Article/A468810
Full-text Information
Abstract
English
Recognizing anomalies in surveillance footage is crucial for public safety, as it identifies events that deviate from normal patterns. Visual information is essential for effective anomaly recognition; however, audio data can improve recognition accuracy by providing additional context. Despite this, most existing systems rely solely on visual information, overlooking the potential of the audio modality for anomaly recognition. This paper introduces a multi-modal framework for anomaly recognition through active learning, integrating audio and visual modalities to enhance anomaly prediction. The framework extracts features from the visual and audio data using a pretrained ResNet-50 convolutional neural network (CNN). The extracted features are forwarded to a Bi-Directional Long Short-Term Memory (Bi-LSTM) network for temporal feature learning, then fused and fed into a classification layer for the final prediction. The proposed framework is evaluated on a benchmark dataset and yields promising results.
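The pipeline described above (pretrained CNN features → per-modality Bi-LSTM → fusion → classifier) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name, hidden size, number of classes, and the use of concatenation for fusion are all assumptions; the 2048-dimensional inputs stand in for pooled ResNet-50 features, which the paper extracts from real video frames and audio spectrograms.

```python
import torch
import torch.nn as nn

class AudioVisualAnomalyNet(nn.Module):
    """Hypothetical sketch of the abstract's pipeline: two Bi-LSTMs
    over precomputed ResNet-50 features, late fusion, classification."""

    def __init__(self, feat_dim=2048, hidden=256, num_classes=2):
        super().__init__()
        # One Bi-LSTM per modality for temporal feature learning
        self.vis_lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.aud_lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Fused vector = concat of both modalities' final forward+backward states
        self.classifier = nn.Linear(4 * hidden, num_classes)

    def forward(self, vis_feats, aud_feats):
        # h_n has shape (num_directions, batch, hidden); take both directions
        _, (vh, _) = self.vis_lstm(vis_feats)
        _, (ah, _) = self.aud_lstm(aud_feats)
        v = torch.cat([vh[0], vh[1]], dim=1)   # (batch, 2*hidden)
        a = torch.cat([ah[0], ah[1]], dim=1)   # (batch, 2*hidden)
        fused = torch.cat([v, a], dim=1)       # late fusion by concatenation
        return self.classifier(fused)

# Dummy inputs: 4 clips, 16 time steps, 2048-dim ResNet-50-style features
model = AudioVisualAnomalyNet()
vis = torch.randn(4, 16, 2048)
aud = torch.randn(4, 16, 2048)
logits = model(vis, aud)
print(logits.shape)  # torch.Size([4, 2])
```

In practice the visual features would come from video frames and the audio features from log-mel spectrograms passed through the shared pretrained ResNet-50, and the active-learning loop the paper mentions would sit outside this model, selecting which samples to label.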