Abstract
I. INTRODUCTION
II. RELATED WORK
A. Vision–Language Models for Image Captioning
B. Medical Image Captioning
C. Optimization with Reinforcement Learning
D. Motivations for this Work
III. METHODOLOGY
A. MedSwinGPT Model Architecture
B. Self-Critical Sequence Training (SCST)
IV. EXPERIMENTS
A. Dataset
B. Training Details
C. Evaluation Setup
D. Evaluation Metrics
V. RESULTS AND DISCUSSION
A. Quantitative Analysis
B. Qualitative Analysis
VI. CONCLUSION
ACKNOWLEDGMENT
REFERENCES