Abstract— A compact and efficient machine learning (ML) model that predicts the outcome of mechanically ventilated patients can reduce financial burden and shorten decision-making among healthcare professionals. Three ML algorithms were used for the prediction task: eXtreme Gradient Boosting (XGB), Random Forest (RF), and Support Vector Machine (SVM). The SHapley Additive exPlanations (SHAP) method was adopted to identify the features most important for discriminating patient outcomes. The study used the publicly available MIMIC-III dataset, with relevant features retrieved following the study by Nguyen et al. (2024). The resulting dataset comprised 12,489 patients and 68 variables and showed an imbalanced class distribution. Two approaches were used to address the imbalance. Random undersampling (RUS) was applied first, which significantly reduced the size of the training and test sets. Class weighting (CW) was then adopted to handle the imbalance while keeping the number of observations unchanged. The XGB model yielded the highest AUC score under both approaches. Furthermore, after nearly 60% of the features were removed, XGB still yielded an AUC score similar to that of the model using all 67 features. With CW, the calibration curve for XGB showed the closest alignment to the perfect-calibration line.
Keywords— Machine learning models, SHAP, Feature selection, Model performance
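The two imbalance-handling strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `random_undersample` and `balanced_class_weights`, the toy labels, and the choice of the common "balanced" weighting heuristic (n_samples / (n_classes * class_count)) are all assumptions made for illustration, since the abstract does not state which weights were used.

```python
import random
from collections import Counter


def random_undersample(labels, seed=0):
    """Hypothetical helper: keep every class at the minority-class count.

    Returns the sorted indices of the retained observations.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        # Sample without replacement so no observation is duplicated.
        kept.extend(rng.sample(idxs, n_min))
    return sorted(kept)


def balanced_class_weights(labels):
    """Hypothetical helper: 'balanced' heuristic, an assumption here.

    weight_c = n_samples / (n_classes * count_c)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Toy labels (not MIMIC-III data): 8 majority-class and 2 minority-class cases.
labels = [0] * 8 + [1] * 2

kept = random_undersample(labels)
weights = balanced_class_weights(labels)

print(len(labels), "->", len(kept), "observations after RUS")
print("class weights:", weights)
```

The sketch shows the trade-off the abstract describes: undersampling discards majority-class observations to equalize the classes, whereas class weighting keeps every observation and instead makes errors on the minority class count more during training.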