Comparative Study of Feature Selection Methods for Cat Boost-Based Heart Disease Prediction

Authors

  • NAJEM A FARAJ 1*, MUHAMMED S HAMEDr2, Akram Gihedan3, Faculty of Science, Department of Computer Science, University of Derna, Libya (Author)

Keywords:

Heart Disease Prediction, Feature Selection, CatBoost, Machine Learning, Clinical Decision Support, Cardiovascular Informatics

Abstract

Cardiovascular disease remains one of the world's leading causes of mortality, so precise diagnostic tools are vital. Although machine learning models such as CatBoost continue to develop and hold promise for cardiac prediction, the optimal feature selection strategy for them remains underexplored. To determine the best strategy for enhancing CatBoost-based heart disease prediction, this work performs a thorough comparative analysis of several feature selection techniques. We evaluated six feature selection methods: filter methods (information gain, chi-square), a wrapper method (redundant feature removal), and embedded methods (LASSO, Random Forest feature importance, CatBoost feature importance), using the publicly available Cleveland Cardiology dataset. The dataset was preprocessed, and the performance of the CatBoost classifier with each feature subset was evaluated using standard metrics including accuracy, precision, recall, and F1 score. Our results demonstrate that feature selection significantly improves model performance over the baseline (all 13 features). With just seven features selected, the embedded approach using CatBoost feature importance (CB-FI) performed best, reaching the highest accuracy of 88.8% and an F1 score of 89.8%. This approach outperformed the filter-based approaches and LASSO (accuracy of 87.6%). The best-performing methods agreed on a core set of clinically relevant features: chest pain type (cp), thallium scan (thal), number of major vessels (ca), ST-segment depression (oldpeak), maximum heart rate (thalach), and exercise-induced angina (exang). The study demonstrates that feature selection, particularly using classifier-intrinsic importance measures (CB-FI), is critical for developing high-performance and efficient heart disease prediction models.
Built on a clinically interpretable, reduced feature set, the resulting economical model offers a strong basis for developing dependable and affordable clinical decision support systems to aid the early diagnosis of heart disease.
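
The two mechanical steps the abstract relies on, keeping the top-ranked features by importance and scoring predictions with accuracy, precision, recall, and F1, can be sketched in plain Python. This is only an illustrative sketch, not the authors' pipeline: the helper names `top_k_features` and `classification_metrics` are hypothetical, and the importance scores and labels below are made-up toy values, not results from the study. In practice the scores would come from a fitted model (for example, CatBoost models expose `get_feature_importance()`).

```python
from typing import Dict, List, Sequence


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy importance scores (invented for illustration only, NOT values from the paper).
toy_importances = {
    "cp": 20.0, "thal": 18.0, "ca": 16.0, "oldpeak": 12.0,
    "thalach": 10.0, "exang": 8.0, "age": 6.0, "chol": 4.0, "fbs": 1.0,
}
selected = top_k_features(toy_importances, k=3)

# Toy labels and predictions (invented for illustration only).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(toy_true, toy_pred)

if __name__ == "__main__":
    print("selected features:", selected)
    print("metrics:", metrics)
```

A comparison like the one described would repeat this for each selection method: rank or choose features, retrain the classifier on the chosen subset, and compare the resulting metrics against the all-features baseline.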

Published

2025-09-16

How to Cite

Comparative Study of Feature Selection Methods for Cat Boost-Based Heart Disease Prediction. (2025). مجلة العلوم الشاملة, 9(36), 95-103. https://cjos.histr.edu.ly/index.php/journal/article/view/531