ICNGC 2025 The 11th International Conference on Next Generation Computing 2025 (2025.12)
Pages
pp.59-60
Authors
Jiho Jun, Junhee Seok
Language
English (ENG)
URL
https://www.earticle.net/Article/A478460
Full-text information
Abstract
English
We investigate whether Large Language Models (LLMs) can learn strategic reasoning and social deception abilities through Reinforcement Learning (RL) finetuning in a multi-agent “Mafia Game” simulation environment. We finetune a baseline 7B model using Proximal Policy Optimization (PPO) with sparse binary rewards based on game outcomes. Training samples are collected through an opponent pool consisting of different versions of the finetuned model. Our experimental results show that the finetuned model outperforms the baseline model by a significant margin and suggest that strategic capabilities unseen in baseline models emerge.
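The two training ingredients named in the abstract, a sparse binary outcome reward and an opponent pool of earlier model versions, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class and function names are hypothetical, the reward is assumed to be 1.0 for a win and 0.0 otherwise, the pool is assumed to drop its oldest snapshot when full, and opponents are assumed to be sampled uniformly with replacement. The PPO update itself is not shown.

```python
import random
from dataclasses import dataclass, field


@dataclass
class OpponentPool:
    """Hypothetical pool of earlier policy versions (here just labels)."""
    max_size: int = 5
    snapshots: list = field(default_factory=list)

    def add(self, snapshot):
        self.snapshots.append(snapshot)
        # Assumption: evict the oldest snapshot once the pool is full.
        if len(self.snapshots) > self.max_size:
            self.snapshots.pop(0)

    def sample(self, k, rng):
        # Assumption: uniform sampling with replacement, so a small pool
        # can still fill every opponent seat in a game.
        return [rng.choice(self.snapshots) for _ in range(k)]


def sparse_binary_reward(learner_team, winning_team):
    """Assumed reward: 1.0 if the learner's team won the game, else 0.0."""
    return 1.0 if learner_team == winning_team else 0.0


def episode_rewards(num_steps, learner_team, winning_team):
    """Per-step rewards for one game: zero everywhere except the last step."""
    rewards = [0.0] * num_steps
    if num_steps > 0:
        rewards[-1] = sparse_binary_reward(learner_team, winning_team)
    return rewards


if __name__ == "__main__":
    rng = random.Random(0)
    pool = OpponentPool(max_size=2)
    for version in ["v0", "v1", "v2"]:
        pool.add(version)
    print(pool.snapshots)             # the two most recent versions
    print(len(pool.sample(3, rng)))   # one sampled opponent per seat
    print(episode_rewards(4, "mafia", "mafia"))
```

In a sketch like this, the reward list would feed an advantage estimator inside a PPO trainer, and `pool.add` would be called whenever a new checkpoint of the finetuned model is saved; both of those steps are outside what the abstract specifies.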
Table of Contents
Abstract
I. INTRODUCTION
II. METHODOLOGY
  A. Game Environment Setup
  B. Training Setup
  C. Opponent Pool Design
III. EXPERIMENTS AND RESULTS
IV. CONCLUSION
ACKNOWLEDGMENT
REFERENCES
Keywords
Large Language Models, Reinforcement Learning, Proximal Policy Optimization, Multi-Agent System, Strategic Reasoning
Authors
Jiho Jun [ School of Electrical Engineering, Korea University, Seoul, Korea ]
Junhee Seok [ School of Electrical Engineering, Korea University, Seoul, Korea ]
Corresponding Author