Unstable Prompt Sensitivity in Few-shot Disease Classification with Small Language Model

Oral Session B-3 : Biomedical Applications

Publication

Conference of the Korean Institute of Next Generation Computing
Volume/Issue (Year)

ICNGC 2025 The 11th International Conference on Next Generation Computing 2025 (2025.12)
Pages

pp.267-270
Authors

Sihyung Kim, Jaehyun Cha, Siyoung Kim, Yoojoong Kim
Language

English (ENG)
URL

https://www.earticle.net/Article/A478509

English: Although Small Language Models are competitive without large-scale infrastructure, their performance is highly contingent on prompt design. This study analyzes the sensitivity of BitNet b1.58-2B-4T to label exposure and few-shot exemplar composition on a 36-class medical query classification task. We generated 504 items, consisting of 6 direct and 8 indirect questions for each disease; after removing cross-exemplar leakage, the final evaluation set contained 494 items. With no parameter updates, 0/1/2/5/10-shot prompting was evaluated using accuracy. Under the no-label-exposure setting, accuracy increased as more exemplars were provided. However, these gains were accompanied by growing prediction concentration on exemplar labels. In contrast, with label exposure, zero-shot prompting achieved the highest accuracy, while the inclusion of exemplars reduced accuracy and amplified label bias. These results show that prompt structure can shift few-shot effects from beneficial to detrimental. This highlights the importance of controlled prompt design and domain-adaptive training to ensure trustworthy performance.
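
The setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the prompt template, the function names (`build_prompt`, `accuracy`, `exemplar_label_share`), and the way prediction concentration is measured are all assumptions made for illustration. Only the dataset counts (36 classes, 6 direct and 8 indirect questions per class, 494 items after filtering) come from the abstract.

```python
# Dataset arithmetic taken from the abstract.
NUM_CLASSES = 36
DIRECT_PER_CLASS = 6
INDIRECT_PER_CLASS = 8
FINAL_EVAL_ITEMS = 494

generated_items = NUM_CLASSES * (DIRECT_PER_CLASS + INDIRECT_PER_CLASS)
removed_for_leakage = generated_items - FINAL_EVAL_ITEMS


def build_prompt(query, exemplars, label_set=None):
    """Assemble a k-shot prompt (hypothetical template, not the paper's).

    exemplars: list of (question, label) pairs; an empty list gives zero-shot.
    label_set: list of class names to expose in the prompt, or None for the
               no-label-exposure setting.
    """
    parts = []
    if label_set is not None:
        parts.append("Choose one label from: " + ", ".join(label_set))
    for question, label in exemplars:
        parts.append(f"Question: {question}\nDisease: {label}")
    parts.append(f"Question: {query}\nDisease:")
    return "\n\n".join(parts)


def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold labels."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equal length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def exemplar_label_share(preds, exemplar_labels):
    """Fraction of predictions that land on a label shown in the exemplars.

    This is one assumed proxy for 'prediction concentration on exemplar
    labels'; the paper may define the quantity differently.
    """
    if not preds:
        raise ValueError("preds must be non-empty")
    shown = set(exemplar_labels)
    return sum(p in shown for p in preds) / len(preds)
```

Comparing `exemplar_label_share` across shot counts against the share expected from the gold labels alone would give a rough indication of whether accuracy gains coincide with predictions collapsing onto the exemplar labels.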

Sihyung Kim [ Department of Computer Engineering The Catholic University of Korea Bucheon, South Korea ]
Jaehyun Cha [ Department of Computer Engineering The Catholic University of Korea Bucheon, South Korea ]
Siyoung Kim [ Department of Computer Engineering The Catholic University of Korea Bucheon, South Korea ]
Yoojoong Kim [ School of Computer Science and Information Engineering The Catholic University of Korea Bucheon, South Korea ] Corresponding Author

Data provided by: Naver Academic Information
