Challenges in Implementing Vision Transformer as a Detection Transformer

Session Ⅴ: Best papers

Publication

Korean Institute of Next Generation Computing Conference
Volume/Issue (Year)

The 9th International Conference on Next Generation Computing 2023 (2023.12)
Pages

pp.174-177
Authors

Chan-Young Choi, Sung-Yoon Ahn, Sang-Woong Lee
Language

English (ENG)
URL

https://www.earticle.net/Article/A448144

English: In recent object detection research, there has been a growing focus on Detection Transformers, which predict bounding boxes directly. However, Detection Transformers suffer from slow convergence and difficulty detecting small objects. We attribute these issues to the insufficient feature extraction capability of the backbone. We therefore adopt a high-performing backbone, the Pyramid Pooling Transformer, in a Detection Transformer. However, we observe that, despite rapid initial convergence, the model fails to converge effectively after a certain point in training. In this study, we discuss the underlying causes of this issue.

Data provided by: Naver Academic

Earticle