
Reversing Attention Mechanisms in Transformers to Improve Object Detection Performance

Article Information

Abstract

English
Recent advances in object detection increasingly leverage end-to-end transformer architectures. However, many studies in this domain have applied transformer structures, originally designed for natural language processing, directly to object detection models. This direct application can cause issues such as the skipping of self-attention in the first decoder layer and the prediction of duplicate objects during training. In this study, we propose a novel approach that addresses these challenges by reversing the attention order in the transformer decoder from a self-cross to a cross-self structure. This modification structurally prevents the initial attention skip and, by delaying self-attention, mitigates the issue of predicting the same object multiple times. Experimental results demonstrate that reversing the attention order in the decoder improves both the training loss and test performance across all stages of the learning process.
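The reordering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, toy dimensions, and the bare scaled-dot-product attention are assumptions for illustration only. It shows a DETR-style decoder layer in which cross-attention runs before self-attention ("cross-self"), rather than the conventional "self-cross" order.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of d-dimensional vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, values))
                    for i in range(d)])
    return out

def decoder_layer_cross_self(object_queries, encoder_memory):
    """One decoder layer with the reversed (cross-self) attention order."""
    # 1) Cross-attention first: each object query immediately attends to
    #    the encoder's image features, so the first decoder layer cannot
    #    skip a self-attention step over still-uninformative queries.
    x = attention(object_queries, encoder_memory, encoder_memory)
    # 2) Self-attention second: queries, now carrying image evidence,
    #    exchange information, which helps suppress duplicate predictions.
    x = attention(x, x, x)
    return x

# Toy example: 3 object queries and 2 encoder tokens, both 2-dimensional.
queries = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
memory = [[1.0, 0.0], [0.0, 1.0]]
out = decoder_layer_cross_self(queries, memory)
print(len(out), len(out[0]))  # 3 2
```

In a full model each attention call would also include residual connections, layer normalization, and a feed-forward block; they are omitted here to keep the ordering itself visible.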

Table of Contents

Abstract
I. INTRODUCTION
II. METHODOLOGY
A. RT-DETR
B. Reversing Attention Mechanisms
C. Experimental Results
III. CONCLUSION
ACKNOWLEDGMENT
REFERENCES

Authors

  • Chan-Young Choi [ School of Computing Gachon University ]
  • Sung-Yoon Ahn [ School of Computing Gachon University ]
  • Abrar Alabdulwahab [ School of Computing Gachon University ]
  • Joo-Hee Oh [ School of Computing Gachon University ]
  • Sang-Woong Lee [ School of Computing Gachon University ] Corresponding Author


    Publication Information

    • Publication
      한국차세대컴퓨팅학회 학술대회
    • Frequency
      Semiannual
    • Coverage
      2021~2025
    • Decimal Classification
      KDC 566, DDC 004