This study compares two embedding-based natural language processing techniques, Sentence-BERT (SBERT) combined with HDBSCAN clustering and BERTopic modeling, for detecting complex emotions in short Korean online comments. Using 33,531 comments collected from a YouTube relationship counseling channel, we examined how each method captures nuanced and overlapping sentiments such as affection, avoidance, and conflict. Both methods used identical SBERT embeddings and UMAP-based dimensionality reduction, and their clustering performance was evaluated quantitatively using the Silhouette Score, Davies–Bouldin Index (DBI), and Calinski–Harabasz Index (CHI), where higher Silhouette and CHI values and lower DBI values indicate better-separated clusters. BERTopic achieved higher coherence and clearer topic boundaries (Silhouette = 0.40, DBI = 0.85, CHI = 15,157) than SBERT–HDBSCAN (Silhouette = -0.23, DBI = 1.49, CHI = 1,230). Although both methods yielded high noise ratios because of the leaf-based density clustering, BERTopic reclassified semantically relevant comments through its class-based TF-IDF (c-TF-IDF) weighting, which improved topic stability and interpretability. These findings suggest that BERTopic performs better on short, emotion-rich Korean text and offers methodological insight for future sentiment analysis research.
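To make the reported Silhouette values easier to interpret, the following minimal sketch computes the mean silhouette coefficient from its textbook definition on toy 2-D points. It is an illustration only, not the evaluation code used in the study: the function name `silhouette`, the `drop_noise` flag, and the toy data are hypothetical, and excluding points labeled -1 (the label HDBSCAN-style clusterers conventionally give to noise) is an assumption, since the abstract does not state how noise was handled during evaluation.

```python
import math

NOISE = -1  # assumed noise label, as conventionally used by HDBSCAN-style clusterers


def silhouette(points, labels, drop_noise=True):
    """Mean silhouette coefficient with Euclidean distance (illustrative sketch)."""
    data = [(p, l) for p, l in zip(points, labels)
            if not (drop_noise and l == NOISE)]
    clusters = {}
    for p, l in data:
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, l in data:
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != l
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Two well-separated toy blobs (hypothetical data, not the study's embeddings)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
good_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # labels follow the blobs
bad_labels = [0, 1, 0, 1, 0, 1, 0, 1]   # labels cut across the blobs

print("labels follow blobs:", silhouette(points, good_labels))
print("labels cut across blobs:", silhouette(points, bad_labels))
```

With labels that follow the blobs the score is close to 1, while labels that cut across the blobs give a negative score, because each point is on average nearer to the other cluster than to its own. A negative value such as the -0.23 reported for SBERT–HDBSCAN can be read in the same way.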
Table of Contents
Abstract
I. INTRODUCTION
II. METHODS
   A. Data Composition and Preprocessing
   B. Analytical Procedure
   C. Evaluation
III. RESULTS
   A. HDBSCAN Clustering Based on Sentence-BERT
   B. BERTopic Modeling
IV. CONCLUSION
REFERENCES
Authors
Jiwon Kim, Data and Knowledge Service Engineering, Dankook University, Gyeonggi-do, South Korea
Eungkyo Suh, Data and Knowledge Service Engineering, Dankook University, Gyeonggi-do, South Korea