ICNGC 2025 The 11th International Conference on Next Generation Computing 2025 (2025.12)
Pages
pp.111-112
Authors
Yousung Yeon, Chang Choi
Language
English (ENG)
URL
https://www.earticle.net/Article/A478472
Full-Text Information
Abstract
English
Video surveillance is widely used for public safety, but anomalous behaviors often resemble normal ones, which makes detection difficult. Conventional approaches reconstruct the full set of frames into a 3D representation to learn global structure, but redundant information in adjacent frames greatly increases the computational cost. This paper proposes a method that reduces the number of frames by powers of two and compares its accuracy and training efficiency with the full-frame approach. We trained a Video Vision Transformer (ViViT) on the UCF-Crime trimmed dataset; compared to the full-frame baseline, accuracy changed by between −0.74% and +1.27%, while training time was shortened by up to 3.8×. These results suggest that, within the range of reductions that preserves global structure, frame reduction can serve as an efficient alternative for video anomaly detection.
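The frame reduction described in the abstract can be illustrated with a minimal sketch. The abstract does not state how frames are selected, so uniform stride sampling is assumed here; the function names `subsample_frames` and `frame_count_schedule` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def subsample_frames(frames: Sequence[T], factor: int) -> List[T]:
    """Keep every `factor`-th frame of a clip.

    Assumption: uniform stride sampling. The paper may select frames
    differently; this only illustrates reduction by a power of two.
    """
    if not is_power_of_two(factor):
        raise ValueError("factor must be a positive power of two")
    return list(frames[::factor])


def frame_count_schedule(total: int, min_frames: int = 1) -> List[int]:
    """List the frame counts obtained by repeatedly halving `total`,
    stopping before the count drops below `min_frames`."""
    if total < 1 or min_frames < 1:
        raise ValueError("total and min_frames must be at least 1")
    counts = []
    n = total
    while n >= min_frames:
        counts.append(n)
        n //= 2
    return counts


if __name__ == "__main__":
    clip = list(range(32))  # stand-in for 32 decoded frames
    print(subsample_frames(clip, 4))
    print(frame_count_schedule(32, min_frames=4))
```

In a comparison like the one the abstract reports, each entry of such a schedule would correspond to one training run whose accuracy and training time are measured against the full-frame baseline.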
Table of Contents
Abstract
I. INTRODUCTION
II. RELATED WORK
A. Dataset Preprocessing
B. Model Training
C. Evaluation Metrics
IV. EXPERIMENTAL RESULTS
V. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
Keywords
Video anomaly detection, Frame reduction, Vision Transformer, Training efficiency
Authors
Yousung Yeon [ Department of Computer Engineering, Gachon University, Seongnam-si, Republic of Korea ]
Chang Choi [ Department of Computer Engineering, Gachon University, Seongnam-si, Republic of Korea ]
Corresponding Author