머신러닝을 활용한 기업 신용평가모형 및 주요 재무변수 분석
Analysis on Corporate Credit Scoring Models and Key Financial Variables Using Machine Learning

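  • Publisher
    The Korean Finance Association (한국재무학회)
  • Journal
    Asian Review of Financial Research (재무연구), KCI-indexed, SCOPUS
  • Issue
    Vol. 38, No. 3 (August 2025)
  • Pages
    pp. 1-36
  • Authors
    Saebom Jeon, Tae Yeon Kwon
  • Language
    Korean (KOR)
  • URL
    https://www.earticle.net/Article/A471309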


Abstract

English
Credit scoring is essential for assessing financial soundness and serves as a fundamental tool for loan screening, capital allocation, and risk management in financial institutions. The accuracy and reliability of credit scoring models are directly linked to financial system stability, making their continuous improvement essential. Traditional models primarily rely on Generalized Linear Models (GLM), particularly Logistic Regression. While these models provide interpretable relationships between financial variables and default risk, they are constrained by their linear functional form and reliance on a limited set of features. This restricts their adaptability to evolving financial markets and the increasing availability of unstructured data sources.

Advancements in machine learning (ML) and artificial intelligence (AI) have introduced various models to enhance predictive accuracy and address the limitations of conventional credit scoring models. ML-based approaches such as Random Forest, Support Vector Machines (SVM), XGBoost, and LightGBM, along with deep learning techniques, have been widely applied to credit risk modeling. These methods process large volumes of financial and transactional data, capturing complex patterns in credit risk assessment. However, their adoption requires further validation regarding interpretability and regulatory compliance.

This study makes four key contributions to credit scoring research. First, unlike previous studies that relied on subjectively selected financial variables, we incorporate all financial features collected by credit agencies and adopt a data-driven selection approach, minimizing researcher bias and ensuring greater objectivity. This enables us to identify the most relevant predictors based on empirical evidence rather than predetermined assumptions.

Second, we address the class imbalance issue, a common challenge in credit risk modeling. Since default cases are rare, traditional logistic regression models often suffer from biased estimates, where the model underweights defaulting firms. To mitigate this, we apply the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset before applying ML techniques.

Third, we integrate multiple ML techniques to derive a comprehensive interpretation of feature importance. Specifically, we compare classification performance across Random Forest, Extreme Gradient Boosting (XGBoost), and Category Boosting (CatBoost). Unlike prior studies that analyze a single ML model independently, our approach integrates feature importance rankings across multiple models, providing a more robust estimation of the importance of financial variables in credit risk.

Fourth, while ML models enhance predictive accuracy, their complexity can hinder interpretability, making adoption challenging for financial institutions. This study emphasizes the importance of explainable AI (XAI) in credit scoring. By applying Shapley Additive Explanations (SHAP), we provide insights into how key financial variables influence credit risk and default probabilities, offering practical guidance on the appropriateness of financial variables and threshold settings used in credit scoring.

This study analyzes credit scoring data of manufacturing firms evaluated by Korea Enterprise Assessment from 2010 to 2024. By applying multiple ML techniques, we identify key financial variables influencing credit risk and integrate results for a comprehensive interpretation. Our analysis highlights differences between realized credit risk, which reflects actual defaults and missed payments, and implied credit risk, which is assessed by the current credit risk model. Realized credit risk is primarily driven by short-term liquidity and profitability indicators, such as inventory turnover period, current ratio, return on equity, and return on capital employed. In contrast, implied credit risk is largely influenced by firm size and long-term financial stability, with key variables including EBITDA, cost-to-sales ratio, pre-tax continuous operating income, total sales, and total liabilities. These findings suggest that while current credit scoring models emphasize long-term financial health, actual credit events are more influenced by short-term financial constraints. This discrepancy underscores the need to supplement credit scoring models by incorporating financial variables, particularly those related to short-term liquidity, especially for high-risk firms.

Further analysis reveals that the importance of financial variables varies across rating levels. For A-level firms, short-term financial stability and debt repayment capacity are critical, emphasizing the importance of liquidity management. In contrast, B-level firms are more affected by structural financial indicators such as the debt-to-equity ratio and capital adequacy ratio, highlighting the significance of long-term solvency and debt management. These differences underscore the need to tailor credit scoring criteria based on risk levels. SHAP results indicate that while higher debt-to-equity and capital growth ratios generally reduce the likelihood of default, their impact on credit risk is nonlinear. This suggests that simple threshold-based classification may be insufficient for credit scoring. Instead, a more nuanced approach that accounts for interactions between financial indicators and their varying effects across credit risk levels is needed.

Beyond feature importance analysis, we examine credit transitions. Credit scores evolve based on firms' financial conditions. Our findings show that while most firms maintain stable credit scores, downgrades occur more frequently than upgrades, particularly within the B-level category between 2022 and 2023. While some A-level firms experienced rating upgrades between 2019 and 2022, the trend shifted toward downgrades from 2022 to 2023. These patterns highlight the need for dynamic credit transition models that account for temporal changes in creditworthiness.
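
The credit transition analysis mentioned at the end of the abstract can be illustrated by counting rating moves between two years. The following is a minimal sketch, not the authors' procedure: the firm identifiers, the rating grades, the grade ordering, and the function names `transition_counts`, `transition_probabilities`, and `direction_totals` are all hypothetical, and the toy data is made up for illustration.

```python
from collections import Counter, defaultdict


def transition_counts(ratings_by_firm, year_from, year_to):
    """Count rating moves for firms that were rated in both years."""
    counts = defaultdict(Counter)
    for by_year in ratings_by_firm.values():
        if year_from in by_year and year_to in by_year:
            counts[by_year[year_from]][by_year[year_to]] += 1
    return counts


def transition_probabilities(counts):
    """Turn each row of counts into empirical transition frequencies."""
    probs = {}
    for start, row in counts.items():
        total = sum(row.values())
        probs[start] = {end: n / total for end, n in row.items()}
    return probs


def direction_totals(counts, grade_order):
    """Return (upgrades, downgrades, unchanged); grade_order lists best grade first."""
    rank = {grade: i for i, grade in enumerate(grade_order)}
    up = down = same = 0
    for start, row in counts.items():
        for end, n in row.items():
            if rank[end] < rank[start]:
                up += n
            elif rank[end] > rank[start]:
                down += n
            else:
                same += n
    return up, down, same


# Hypothetical toy data: firm -> {year: rating}. Not from the study.
toy_ratings = {
    "firm_1": {2022: "A", 2023: "A"},
    "firm_2": {2022: "BBB", 2023: "BB"},
    "firm_3": {2022: "BB", 2023: "B"},
    "firm_4": {2022: "BBB", 2023: "BBB"},
    "firm_5": {2022: "BB", 2023: "BBB"},
    "firm_6": {2022: "A"},  # no 2023 rating, so it is skipped
}
GRADE_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B"]  # assumed ordering

counts = transition_counts(toy_ratings, 2022, 2023)
probs = transition_probabilities(counts)
up, down, same = direction_totals(counts, GRADE_ORDER)
print(f"upgrades={up}, downgrades={down}, unchanged={same}")
```

Repeating the count for each pair of consecutive years gives a sequence of matrices whose rows can be compared over time, which is one simple way to see whether downgrades start to outnumber upgrades in a given period.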
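Korean (English translation)
In modern financial markets, credit rating is essential not only for assessing financial soundness but also for loan screening and risk management at financial institutions. However, as the financial environment changes rapidly and machine learning techniques advance, the limitations of existing credit rating models are becoming apparent. This paper discusses directions for improving credit rating models, based on credit rating data for manufacturing firms assigned by Korea Enterprise Assessment (한국기업평가) from 2010 to 2024. Through data exploration, the paper identifies problems in existing credit rating models and applies a range of machine learning techniques to improve them. Using Random Forest, XGBoost, and CatBoost, it analyzes the importance of key financial variables, with a focus on improving the predictive power for credit risk. In addition, SMOTE is applied to address data imbalance, and SHAP, an XAI technique, is used to evaluate the appropriateness of the financial variables and threshold settings used in assigning credit ratings. The analysis confirms that the key financial variables explaining realized credit risk differ from those explaining the implied credit risk determined under the existing rating approach. This points to the need to re-establish the evaluation criteria, particularly for firms with high credit risk. This study presents the potential for improving machine learning-based credit rating models and can help financial institutions develop more refined credit risk management strategies.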
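
The abstract describes combining feature importance results from Random Forest, XGBoost, and CatBoost, but it does not state the aggregation rule. One plausible approach, assumed here purely for illustration, is to convert each model's importance scores to ranks and average the ranks per feature. The function names `importance_to_ranks` and `mean_rank`, the feature names, and all scores below are hypothetical and are not results from the paper.

```python
def importance_to_ranks(importances):
    """Map each feature to its rank within one model (1 = most important)."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def mean_rank(per_model_importances):
    """Average each feature's rank across models, best (lowest) first.

    Assumes every model reports a score for the same set of features.
    """
    rank_tables = [
        importance_to_ranks(scores) for scores in per_model_importances.values()
    ]
    features = list(rank_tables[0])
    averaged = {
        feature: sum(table[feature] for table in rank_tables) / len(rank_tables)
        for feature in features
    }
    return sorted(averaged.items(), key=lambda item: item[1])


# Made-up importance scores for three hypothetical fitted models.
toy_importances = {
    "random_forest": {"current_ratio": 0.40, "roe": 0.35, "total_sales": 0.25},
    "xgboost": {"current_ratio": 0.30, "roe": 0.50, "total_sales": 0.20},
    "catboost": {"current_ratio": 0.45, "roe": 0.30, "total_sales": 0.25},
}

combined = mean_rank(toy_importances)
for feature, avg_rank in combined:
    print(f"{feature}: mean rank {avg_rank:.2f}")
```

Averaging ranks rather than raw scores sidesteps the fact that different models report importance on different scales, at the cost of discarding how large the gaps between features are.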

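Table of Contents

Summary
Abstract
Ⅰ. Introduction
Ⅱ. Data Exploration
1. Financial Feature Variables
2. All-Industry Data
Ⅲ. Credit Risk Models and the Application of Machine Learning Methods
1. Model Specification
2. Application of Machine Learning Methods
Ⅳ. Comparison Results for Feature Importance
1. Comparison of Feature Importance
2. SHAP
Ⅴ. Conclusions and Suggestions
References
<Appendix>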

Keywords

Credit Rating (Credit Scoring), Random Forest, Extreme Gradient Boosting (XGBoost), Category Boosting (CatBoost), Shapley Value (SHAP)

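Authors

  • Saebom Jeon (전새봄) [ Associate Professor, Department of Marketing and Big Data, Mokwon University ]
  • Tae Yeon Kwon (권태연) [ Associate Professor, Department of International Finance, Hankuk University of Foreign Studies ] Corresponding author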

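Publication Information

Publisher

  • Publisher name
    The Korean Finance Association (한국재무학회)
  • Founded
    1988
  • Field
    Social Sciences > Business Administration
  • About
    The association aims to advance finance and related fields and to foster fellowship among its members.

Journal

  • Journal title
    Asian Review of Financial Research (재무연구)
  • Frequency
    Quarterly
  • pISSN
    1229-0351
  • eISSN
    2713-6531
  • Coverage period
    1988~2026
  • Indexing
    KCI-indexed, SCOPUS
  • Decimal classification
    KDC 325 DDC 330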
