| Abstract: |
Peptides are potent antimicrobial agents that offer promising alternatives to conventional antibiotics in addressing the global challenge of antibiotic resistance. Owing to the remarkable diversity of antimicrobial peptides (AMPs) and recent advances in computational biology, the development of robust machine learning algorithms for AMP classification has become increasingly important. In this study, we introduce AMP-zGSM, a novel statistical feature-ranking model designed to improve AMP classification. The AMP-zGSM framework ranks feature groups by q-values derived from statistical z-scores. The model was trained on three large peptide datasets containing 3,145, 12,022, and 8,346 peptides, respectively. Multiple machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes, XGBoost, AdaBoost, CatBoost, GradientBoost, and their ensemble variants, were trained and evaluated on feature subsets selected according to these q-values. Compared with models using feature subsets obtained through traditional feature selection techniques (XGBoost, SelectKBest, XGB+SHAP, Information Gain Ratio, and mRMR), the AMP-zGSM model achieved comparable performance on all datasets. Notably, when compared with three competing models, AMP-zGSM consistently performed best on the evaluated datasets, achieving the highest AUC values of 0.9737, 0.8846, and 0.97 on Datasets 1, 2, and 3, respectively. The datasets and AMP-zGSM model code developed in this study are publicly available at [https://github.com/DemetParlakSonmez/amp-zGSM]. |
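The abstract says only that feature groups are ranked by q-values derived from z-scores. The sketch below illustrates one plausible reading of that idea and is not the AMP-zGSM implementation. It assumes a two-sample z statistic per feature (AMP vs. non-AMP), two-sided normal p-values, Benjamini-Hochberg q-values, and a hypothetical rule that scores each feature group by its smallest q-value. All function names, the aggregation rule, and the toy data are illustrative assumptions.

```python
import math
from statistics import mean, variance


def z_score(pos, neg):
    """Two-sample z statistic for the difference in means (unpooled standard error).
    Assumes each sample has at least two values and nonzero combined variance."""
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / se


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def rank_feature_groups(features, groups):
    """features: name -> (amp_values, non_amp_values); groups: group -> [feature names].
    Hypothetical rule: a group's score is the smallest q-value among its features."""
    names = list(features)
    pvals = [two_sided_p(z_score(*features[n])) for n in names]
    q = dict(zip(names, bh_qvalues(pvals)))
    group_q = {g: min(q[f] for f in members) for g, members in groups.items()}
    return sorted(group_q.items(), key=lambda item: item[1])


# Toy data (invented for illustration): AMP values first, non-AMP values second.
features = {
    "charge": ([5, 6, 7, 6, 5], [1, 0, 2, 1, 0]),
    "hydrophobicity": ([0.4, 0.5, 0.45, 0.6, 0.55], [0.3, 0.5, 0.4, 0.45, 0.35]),
    "length": ([20, 22, 25, 19, 21], [21, 23, 24, 20, 22]),
}
groups = {"physicochemical": ["charge", "hydrophobicity"], "composition": ["length"]}
ranked = rank_feature_groups(features, groups)
print(ranked)
```

Groups with smaller q-values appear first, so a downstream classifier could be trained on the top-k groups. The exact statistic, correction procedure, and aggregation used by AMP-zGSM should be checked against the linked repository.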