EGE 12TH INTERNATIONAL CONFERENCE ON APPLIED SCIENCES
THE PERFORMANCE COMPARISON OF BOOSTING-BASED MACHINE LEARNING METHODS IN BREAST CANCER RISK ANALYSIS
Publisher:
Academy Global Publishing House
Breast cancer is one of the most common cancers in women, and early diagnosis and identification of risk factors are critical for preventing and managing the disease. In this study, two machine learning models, Extreme Gradient Boosting (XGBoost) and Stochastic Gradient Boosting (SGB), were developed to estimate breast cancer risk and to identify possible risk factors. The data were taken from the UCI Machine Learning Repository and consisted of the following clinical features: age (years), BMI (kg/m²), glucose (mg/dL), insulin (µU/mL), HOMA, leptin (ng/mL), adiponectin (µg/mL), resistin (ng/mL) and MCP-1 (pg/dL). Five-fold cross-validation, a resampling method, was applied in the study. In predicting the presence of breast cancer, the XGBoost model achieved an accuracy of 91.38%, whereas the SGB model achieved 87.93%. Based on the XGBoost model, which classified breast cancer more accurately, the variables in the model were ordered by their importance values as glucose (mg/dL), resistin (ng/mL), age (years), BMI (kg/m²), HOMA, adiponectin (µg/mL), leptin (ng/mL), insulin (µU/mL) and MCP-1 (pg/dL). These findings indicate that integrating machine learning methods with biomedical data can be an effective tool for estimating breast cancer risk and identifying risk factors. In particular, the higher accuracy of XGBoost is promising for individualized risk assessment systems. The findings may contribute to the development of disease prevention strategies and clinical decision support systems.
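The evaluation procedure described above can be illustrated with a minimal, standard-library Python sketch of five-fold cross-validation. It is not the study's implementation. The function names (`k_fold_indices`, `cross_validated_accuracy`, `train_threshold_stump`), the shuffling seed, the pooling of held-out predictions into a single accuracy figure, and the synthetic data are all assumptions made for illustration. The toy threshold learner is only a stand-in; in practice a boosting model such as XGBoost or SGB would take its place.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], int]
Trainer = Callable[[List[Row], List[int]], Predictor]


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(
    X: List[Row], y: List[int], train: Trainer, k: int = 5, seed: int = 0
) -> float:
    """Train on k-1 folds, predict the held-out fold, and pool the results.

    Assumption: accuracy is computed over all pooled held-out predictions.
    Averaging per-fold accuracies is an equally common alternative.
    """
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(model(X[i]) == y[i] for i in held_out)
    return correct / len(X)


def train_threshold_stump(X: List[Row], y: List[int]) -> Predictor:
    """Hypothetical stand-in learner: split feature 0 midway between class means."""
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: int(row[0] > threshold)


# Synthetic, well-separated toy data (NOT the data used in the study).
X_demo = [[float(v)] for v in list(range(10)) + list(range(100, 110))]
y_demo = [0] * 10 + [1] * 10

demo_accuracy = cross_validated_accuracy(X_demo, y_demo, train_threshold_stump)
print(f"5-fold CV accuracy on toy data: {demo_accuracy:.2%}")
```

Because every sample is held out exactly once, each prediction that counts toward the accuracy comes from a model that did not see that sample during training, which is the reason for using a resampling scheme of this kind on a small clinical data set.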