ICNGC 2025: The 11th International Conference on Next Generation Computing 2025 (2025.12)
Pages
pp.113-115
Authors
JaeBong Choi, NamGyu Jung, Chang Choi
Language
English (ENG)
URL
https://www.earticle.net/Article/A478473
Full-Text Information
Abstract
English
Visual Question Answering (VQA) models suffer from a language bias problem: they rely excessively on textual correlations instead of grounding their answers in the image. This study proposes Plausible Counterfactual Data Generation (PCDG), a counterfactual data generation method that uses Grad-CAM-based visual importance to replace significant objects in a contextually appropriate manner. By synthesizing more contextually relevant samples than existing augmentation methods, PCDG effectively strengthens visual-language alignment. In experiments on the VQA-CP v2 benchmark, the method achieved significant performance improvements, notably a 10.56% increase in the 'Num' category and a 2.78% increase in the 'Other' category. This indicates that the proposed method enhances the VQA model's generalization ability and robustness through debiasing.
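The abstract does not spell out how the object to be replaced is chosen, so the following is only a minimal sketch of one plausible reading: given an importance map (standing in for a Grad-CAM output) and a set of object bounding boxes, pick the object whose region has the highest mean importance. The function names, the box format, and the toy values are all assumptions made for illustration and are not taken from the paper; the contextually appropriate replacement step itself is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with exclusive upper bounds.
Box = Tuple[int, int, int, int]


def mean_importance(heatmap: List[List[float]], box: Box) -> float:
    """Average importance value inside a bounding box (0.0 for an empty box)."""
    x0, y0, x1, y1 = box
    values = [heatmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values) if values else 0.0


def most_important_object(heatmap: List[List[float]], boxes: Dict[str, Box]) -> str:
    """Return the label of the object whose region has the highest mean importance."""
    return max(boxes, key=lambda label: mean_importance(heatmap, boxes[label]))


# Toy 4x4 importance map; the values are made up and only stand in for Grad-CAM.
heatmap = [
    [0.1, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.2],
]
boxes = {"dog": (1, 1, 3, 3), "ball": (3, 3, 4, 4)}

# The selected object would be the candidate for counterfactual replacement.
target = most_important_object(heatmap, boxes)
```

In this toy case the "dog" region dominates the importance map, so it would be the object selected for replacement under the stated assumptions.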
Table of Contents
Abstract
I. INTRODUCTION
II. RELATED WORK
  A. Retrieval Visual Contrastive Decoding
  B. Counterfactual sample synthesis
III. METHOD
  A. Visual Importance
  B. Dynamic Counterfactual Image Generation
IV. EXPERIMENTS
  A. Experimental Settings
  B. Training
  C. Results
V. CONCLUSION
VI. FUTURE WORK
ACKNOWLEDGMENT
Keywords
Visual Question Answering, Explainable AI, Data Augmentation, Debiasing
Authors
JaeBong Choi [ Department of Computer Engineering, Gachon University, Seongnam 1342, Gyeonggi, Republic of Korea ]
NamGyu Jung [ Department of Computer Engineering, Gachon University, Seongnam 1342, Gyeonggi, Republic of Korea ]
Chang Choi [ Department of Computer Engineering, Gachon University, Seongnam 1342, Gyeonggi, Republic of Korea ]
Corresponding Author