PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals

Bibliographic Details
Main Authors: Agneesh Pratim Das, Subhash M. Agarwal
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Current Plant Biology
Subjects: Machine learning; Ensemble docking; Breast cancer; Drug discovery; HER2
Online Access: http://www.sciencedirect.com/science/article/pii/S2214662825000039
_version_ 1825199444020166656
author Agneesh Pratim Das
Subhash M. Agarwal
author_facet Agneesh Pratim Das
Subhash M. Agarwal
author_sort Agneesh Pratim Das
collection DOAJ
description Breast cancer is the leading cause of cancer-related mortality among women worldwide, and its treatment is complicated by the complexity of the disease. Phytomolecules, which are characterized by their chemical diversity and biological potential, are a promising source of effective therapeutics. In the current study, a machine learning (ML) model was therefore created using phytomolecules with inhibitory activity against breast cancer cell lines. Multiple ML techniques, viz. k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB), were combined with molecular fingerprints (MACCS and Morgan2) to develop multiple predictive models. Among these models, the RF algorithm coupled with the MACCS fingerprint emerged as the best-performing model. Mean decrease in impurity, t-SNE analysis, and k-means clustering were used to determine the important features and to understand chemical space diversity. Further, to predict potential breast cancer inhibitors, ADMET-adherent natural products (NPs) of plant origin (identified from the COCONUT database) were screened using the developed ML model. NPs predicted as active were further screened via an ensemble virtual screening (eVS) technique against erb-b2 receptor tyrosine kinase 2 (HER2) to identify high-affinity molecules against this breast cancer drug target. In summary, the validated ML model developed in this study has been incorporated into a freely available standalone package named PhyIndBC (https://github.com/subhashmagarwal/PhyIndBC), which can be used for virtual screening and for predicting breast cancer inhibitors of plant origin.
format Article
id doaj-art-b1cc40e1f9494489b3e817f93124c28e
institution Kabale University
issn 2214-6628
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Current Plant Biology
spelling doaj-art-b1cc40e1f9494489b3e817f93124c28e
2025-02-08T05:00:30Z
eng
Elsevier
Current Plant Biology
2214-6628
2025-06-01
42
100435
PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
Agneesh Pratim Das (0)
Subhash M. Agarwal (1)
(0) Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, Uttar Pradesh 201301, India
(1) Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, Uttar Pradesh 201301, India; The Academy of Scientific and Innovative Research, AcSIR, India; Corresponding author at: Bioinformatics Division, ICMR-National Institute of Cancer Prevention and Research, I-7, Sector-39, Noida, Uttar Pradesh 201301, India.
Breast cancer is the leading cause of cancer-related mortality among women worldwide, and its treatment is complicated by the complexity of the disease. Phytomolecules, which are characterized by their chemical diversity and biological potential, are a promising source of effective therapeutics. In the current study, a machine learning (ML) model was therefore created using phytomolecules with inhibitory activity against breast cancer cell lines. Multiple ML techniques, viz. k-nearest neighbor (KNN), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB), were combined with molecular fingerprints (MACCS and Morgan2) to develop multiple predictive models. Among these models, the RF algorithm coupled with the MACCS fingerprint emerged as the best-performing model. Mean decrease in impurity, t-SNE analysis, and k-means clustering were used to determine the important features and to understand chemical space diversity. Further, to predict potential breast cancer inhibitors, ADMET-adherent natural products (NPs) of plant origin (identified from the COCONUT database) were screened using the developed ML model. NPs predicted as active were further screened via an ensemble virtual screening (eVS) technique against erb-b2 receptor tyrosine kinase 2 (HER2) to identify high-affinity molecules against this breast cancer drug target. In summary, the validated ML model developed in this study has been incorporated into a freely available standalone package named PhyIndBC (https://github.com/subhashmagarwal/PhyIndBC), which can be used for virtual screening and for predicting breast cancer inhibitors of plant origin.
http://www.sciencedirect.com/science/article/pii/S2214662825000039
Machine learning
Ensemble docking
Breast cancer
Drug discovery
HER2
spellingShingle Agneesh Pratim Das
Subhash M. Agarwal
PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
Current Plant Biology
Machine learning
Ensemble docking
Breast cancer
Drug discovery
HER2
title PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
title_full PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
title_fullStr PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
title_full_unstemmed PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
title_short PhyIndBC: Development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
title_sort phyindbc development of a machine learning tool for screening of potential breast cancer inhibitors from phytochemicals
topic Machine learning
Ensemble docking
Breast cancer
Drug discovery
HER2
url http://www.sciencedirect.com/science/article/pii/S2214662825000039
work_keys_str_mv AT agneeshpratimdas phyindbcdevelopmentofamachinelearningtoolforscreeningofpotentialbreastcancerinhibitorsfromphytochemicalsgithub
AT subhashmagarwal phyindbcdevelopmentofamachinelearningtoolforscreeningofpotentialbreastcancerinhibitorsfromphytochemicalsgithub
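
The description field of this record mentions that mean decrease in impurity was used to identify important fingerprint features for the random forest model. As a rough illustration only, the sketch below computes the Gini impurity decrease for a single split on one binary fingerprint bit, which is the per-split quantity that a random forest accumulates (weighted by node size) and averages over trees to obtain the mean decrease in impurity. The toy fingerprints, labels, and function names are hypothetical; this is not the PhyIndBC code or its data, and it uses only the Python standard library rather than a real cheminformatics toolkit.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels (0.0 means the set is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(bits: Sequence[int], labels: Sequence[int]) -> float:
    """Weighted Gini decrease from splitting the samples on one binary bit."""
    on = [y for b, y in zip(bits, labels) if b == 1]
    off = [y for b, y in zip(bits, labels) if b == 0]
    n = len(labels)
    weighted_children = (len(on) / n) * gini(on) + (len(off) / n) * gini(off)
    return gini(labels) - weighted_children


# Hypothetical toy data: 6 molecules, 3 fingerprint bits each.
# Label 1 = active (inhibitor), label 0 = inactive.
fingerprints = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 1],
]
labels = [1, 1, 1, 0, 0, 0]

# Impurity decrease for a root-level split on each bit.
decreases = [
    impurity_decrease([fp[i] for fp in fingerprints], labels)
    for i in range(3)
]

# Rank bits from most to least informative under this single-split view.
ranking = sorted(range(3), key=lambda i: decreases[i], reverse=True)
print(decreases, ranking)
```

In this toy set, bit 0 separates actives from inactives perfectly, so it receives the largest decrease. A real analysis would read the importances from a fitted forest rather than from a single root split.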