PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
Breast cancer is the foremost contributor to cancer-related mortality among women on a global scale. However, its treatment encounters challenges compounded by the disease's complexity. A promising avenue in the quest for effective therapeutics lies within the realm of phytomolecules, which are...
Main Authors: | Agneesh Pratim Das, Subhash M. Agarwal |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-06-01 |
Series: | Current Plant Biology |
Subjects: | Machine learning; Ensemble docking; Breast cancer; Drug discovery; HER2 |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2214662825000039 |
_version_ | 1825199444020166656 |
---|---|
author | Agneesh Pratim Das; Subhash M. Agarwal |
author_facet | Agneesh Pratim Das; Subhash M. Agarwal |
author_sort | Agneesh Pratim Das |
collection | DOAJ |
description | Breast cancer is the foremost contributor to cancer-related mortality among women on a global scale. However, its treatment encounters challenges compounded by the disease's complexity. A promising avenue in the quest for effective therapeutics lies within the realm of phytomolecules, which are characterized by their chemical diversity and biological potential. Thus, in the current study, a machine learning (ML) model was created using phytomolecules having inhibitory activity against breast cancer cell lines. Multiple ML techniques, viz. k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB), were combined with two molecular fingerprints (MACCS and Morgan2) to develop multiple predictive models. Among these models, the RF algorithm coupled with the MACCS fingerprint emerged as the best-performing model. Mean decrease in impurity, t-SNE analysis, and k-means clustering were used to determine the important features and understand chemical space diversity. Further, to predict potential breast cancer inhibitors, ADMET-adherent natural products (NPs) of plant origin (identified from the COCONUT database) were screened using the developed ML model. NPs predicted as active were further screened via an ensemble virtual screening (eVS) technique against erb-b2 receptor tyrosine kinase 2 (HER2) to identify high-affinity molecules against this breast cancer drug target. In summary, the validated machine learning model developed in this study has been incorporated into a freely available standalone package named PhyIndBC (https://github.com/subhashmagarwal/PhyIndBC), which can be used for virtual screening and predicting breast cancer inhibitors of plant origin. |
format | Article |
id | doaj-art-b1cc40e1f9494489b3e817f93124c28e |
institution | Kabale University |
issn | 2214-6628 |
language | English |
publishDate | 2025-06-01 |
publisher | Elsevier |
record_format | Article |
series | Current Plant Biology |
spelling | doaj-art-b1cc40e1f9494489b3e817f93124c28e; 2025-02-08T05:00:30Z; eng; Elsevier; Current Plant Biology; 2214-6628; 2025-06-01; 42; 100435; PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals; Agneesh Pratim Das (0), Subhash M. Agarwal (1); (0) Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, Uttar Pradesh 201301, India; (1) Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, Uttar Pradesh 201301, India; The Academy of Scientific and Innovative Research, AcSIR, India; Corresponding author at: Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, Uttar Pradesh 201301, India; http://www.sciencedirect.com/science/article/pii/S2214662825000039; Machine learning; Ensemble docking; Breast cancer; Drug discovery; HER2 |
spellingShingle | Agneesh Pratim Das; Subhash M. Agarwal; PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals; Current Plant Biology; Machine learning; Ensemble docking; Breast cancer; Drug discovery; HER2 |
title | PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals |
title_full | PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals |
title_fullStr | PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals |
title_full_unstemmed | PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals |
title_short | PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals |
title_sort | phyindbc development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals |
topic | Machine learning; Ensemble docking; Breast cancer; Drug discovery; HER2 |
url | http://www.sciencedirect.com/science/article/pii/S2214662825000039 |
work_keys_str_mv | AT agneeshpratimdas phyindbcdevelopmentofamachinelearningtoolforscreeningofpotentialbreastcancerinhibitorsfromphytochemicalsgithub AT subhashmagarwal phyindbcdevelopmentofamachinelearningtoolforscreeningofpotentialbreastcancerinhibitorsfromphytochemicalsgithub |
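
The description mentions mean decrease in impurity as the way important fingerprint features were identified. As a rough illustration of the quantity involved, and not the authors' implementation, the sketch below computes the Gini impurity decrease obtained by splitting a labelled set on a single binary fingerprint bit; a random forest accumulates decreases of this kind over its trees to rank features. The toy fingerprints, the labels, and the function names `gini` and `impurity_decrease` are all hypothetical and are not taken from PhyIndBC.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(bit_column, labels):
    """Gini decrease from splitting the samples on one binary fingerprint bit."""
    off = [y for b, y in zip(bit_column, labels) if b == 0]
    on = [y for b, y in zip(bit_column, labels) if b == 1]
    n = len(labels)
    # Child impurities are weighted by the fraction of samples in each child.
    weighted = (len(off) / n) * gini(off) + (len(on) / n) * gini(on)
    return gini(labels) - weighted


# Hypothetical toy data: 6 molecules, 3 fingerprint bits; label 1 = active.
fingerprints = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
labels = [1, 1, 1, 0, 0, 0]

# Score each bit by its impurity decrease, then rank bits from most to least useful.
scores = {
    bit: impurity_decrease([fp[bit] for fp in fingerprints], labels)
    for bit in range(3)
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy set, bit 0 separates actives from inactives perfectly, so it receives the largest decrease. A real workflow would compute such scores over many trees and over actual MACCS keys rather than a hand-written matrix.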