Purpose: This study applies machine learning to classify Intrusion Detection System (IDS) data and identifies the variables that contribute most to classification performance. Method: First, we classified 'R2L (Remote to Local)' and 'U2R (User to Root)' attacks in the NSL-KDD dataset, which are difficult to detect because of class imbalance, using seven machine learning models, including Logistic Regression (LR) and K-Nearest Neighbor (KNN). Next, we applied SHapley Additive exPlanation (SHAP) to the two best-performing classification models, Random Forest (RF) and Light Gradient-Boosting Machine (LGBM), to examine the importance of the variables that affect classification in each model. Result: The most important variable was 'service' for RF and 'dst_host_srv_count' for LGBM. These variables are the key factors driving classification performance in their respective models. Conclusion: This paper identifies RF and LGBM as the best-performing of the seven models for classifying 'R2L' and 'U2R' attacks and determines the most important variables for each of them.
Table of Contents
ABSTRACT Introduction Datasets and Related Research NSL-KDD Dataset XAI (eXplainable Artificial Intelligence) - SHAP Experiment and Results Experimental Dataset Preprocessing Create and Evaluate Classification Models Variable Importance by Model Interpretation of Results Conclusion References
Keywords
NSL-KDD; Remote to Local (R2L); User to Root (U2R); eXplainable Artificial Intelligence (XAI); SHapley Additive exPlanation (SHAP)
Authors
이상덕 [ Sang-duk Lee | Ph.D. Candidate, Big Data Collaborative. 138, Central Police Academy, Suhoeri-ro, Suanbo-myeon, Chungju-si, Chungcheongbuk-do, Republic of Korea ]
김대규 [ Dae-gyu Kim | Ph.D. Candidate, Department of IT Convergence and Application Engineering, Pukyong National University, Busan, Republic of Korea ]
김창수 [ Chang Soo Kim | Professor, Department of IT Convergence and Application Engineering, Pukyong National University, Busan, Republic of Korea ]
Corresponding Author