

A Modified Vision Transformer-based Anomaly Recognition using Audio Data

Article Information

Abstract

English
In recent years, audio-based anomaly recognition has attracted the attention of the research community as the number of abnormal situations continues to grow. Previous research has mainly focused on video-based anomaly recognition. However, occlusion is one of the main reasons an anomalous object can go unidentified in video. Therefore, in this paper, we propose a modified vision transformer that uses Shifted Patch Tokenization (SPT) and a Local Self-Attention (LSA) mechanism, with a reduced number of multilayer perceptrons in the head, enabling the model to capture rich spatial information within the spectrogram of anomalous data. The proposed model is evaluated on the Sound Events for Surveillance Applications (SESA) dataset, where it obtains 87% testing accuracy. These results suggest that the proposed model is an efficient and effective solution for audio-based anomaly recognition.
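
The two mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes SPT in its commonly described form (the input is concatenated with four half-patch diagonal shifts of itself before being split into patches) and LSA as a softmax with a temperature and a masked diagonal. The function names `shift`, `shifted_patch_tokens`, and `masked_attention_weights` are hypothetical, the temperature is a fixed number rather than a learned parameter, and the layer normalization and linear projection that would normally follow tokenization are omitted.

```python
import math


def shift(image, dy, dx):
    """Shift a 2D list by (dy, dx) cells, filling vacated cells with 0.0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


def shifted_patch_tokens(image, patch):
    """Return one flat token per patch, built from the image and 4 diagonal shifts."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    half = patch // 2
    # Original view, then up-left, up-right, down-left, down-right shifts.
    offsets = [(0, 0), (-half, -half), (-half, half), (half, -half), (half, half)]
    channels = [shift(image, dy, dx) for dy, dx in offsets]
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            token = []
            for channel in channels:
                for y in range(py, py + patch):
                    token.extend(channel[y][px:px + patch])
            tokens.append(token)
    return tokens


def masked_attention_weights(scores, temperature):
    """Row-wise softmax of scores / temperature with each token's own score masked."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two tokens when the diagonal is masked")
    weights = []
    for i in range(n):
        scaled = [scores[i][j] / temperature for j in range(n) if j != i]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        row = [e / total for e in exps]
        row.insert(i, 0.0)  # a token places no weight on itself
        weights.append(row)
    return weights


# Toy 4x4 "spectrogram" with values 1..16, split into 2x2 patches.
spectrogram = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
tokens = shifted_patch_tokens(spectrogram, patch=2)
weights = masked_attention_weights([[0.0] * 4 for _ in range(4)], temperature=0.5)
```

Each token here carries five views of its patch (the original plus four shifted copies), so a 2x2 patch yields a 20-value token. With the diagonal masked, every row of `weights` distributes its attention over the other tokens only.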

Table of Contents

Abstract
1. Introduction
2. Research methodology
3. Results and discussion
3.1. Dataset
3.2. Experiment setup
3.3. Experiment results
4. Conclusions
Acknowledgment
References

Authors

  • Hikmat Yar [ Sejong University ]
  • Amjid Ali [ Sejong University ]
  • Zulfiqar Ahmad Khan [ Sejong University ]
  • Noman Khan [ Sejong University ]
  • Min Je Kim [ Sejong University ]
  • Su Min Lee [ Sejong University ]
  • Sung Wook Baik [ Sejong University ] Corresponding Author

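    Publication Information

    • Publication
      Korean Institute of Next Generation Computing Conference (한국차세대컴퓨팅학회 학술대회)
    • Frequency
      Semiannual
    • Coverage period
      2021~2025
    • Decimal classification
      KDC 566 DDC 004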