Creation and interpretation of machine learning models for aqueous solubility prediction

Aim: Solubility prediction is an essential factor in rational drug design, and many models have been developed with machine learning (ML) methods to enhance predictive ability. However, most ML models are hard to interpret, which limits the insights they can give in the lead optimization pr...

Bibliographic Details
Main Authors:	Minyi Su, Enric Herrero
Format:	Article
Language:	English
Published:	Open Exploration 2023-10-01
Series:	Exploration of Drug Science
Subjects:	aqueous solubility machine learning fragment-coloring property prediction
Online Access:	https://www.explorationpub.com/uploads/Article/A100826/100826.pdf

_version_	1825199620631822336
author	Minyi Su Enric Herrero
collection	DOAJ
description	Aim: Solubility prediction is an essential factor in rational drug design, and many models have been developed with machine learning (ML) methods to enhance predictive ability. However, most ML models are hard to interpret, which limits the insights they can give in the lead optimization process. Here, an approach to construct and interpret solubility models with a combination of physicochemical properties and ML algorithms is presented. Methods: The models were trained, optimized, and tested on a dataset containing 12,983 compounds from two public datasets and further evaluated on two external test sets. More importantly, SHapley Additive exPlanations (SHAP) and heat map coloring approaches were used to explain the predictive models and assess their suitability to guide compound optimization. Results: Among the different ML methods, random forest (RF) models obtained the best performance across the test sets. From the interpretability perspective, fragment-based coloring offers a more robust interpretation than atom-based coloring, and normalizing the values further improves it. Conclusions: Overall, for certain applications, simple ML algorithms such as RF work well and can outperform more complex methods, and combining them with fragment-coloring can offer guidance for chemists to modify a structure toward a desired property. This interpretation strategy is publicly available at https://github.com/Pharmacelera/predictive-model-coloring and could be applied to other property predictions to improve the interpretability of ML models.
format	Article
id	doaj-art-eeeb6bbebd384a46a00b23cda7539746
institution	Kabale University
issn	2836-7677
language	English
publishDate	2023-10-01
publisher	Open Exploration
record_format	Article
series	Exploration of Drug Science
spelling	doaj-art-eeeb6bbebd384a46a00b23cda7539746 | 2025-02-08T03:49:05Z | eng | Open Exploration | Exploration of Drug Science | 2836-7677 | 2023-10-01 | vol. 1, no. 5, pp. 388-404 | 10.37349/eds.2023.00026 | Creation and interpretation of machine learning models for aqueous solubility prediction | Minyi Su (https://orcid.org/0000-0001-5830-059X), Pharmacelera, 08028 Barcelona, Spain | Enric Herrero (https://orcid.org/0000-0001-7837-3593), Pharmacelera, 08028 Barcelona, Spain | https://www.explorationpub.com/uploads/Article/A100826/100826.pdf | aqueous solubility; machine learning; fragment-coloring; property prediction
title	Creation and interpretation of machine learning models for aqueous solubility prediction
topic	aqueous solubility machine learning fragment-coloring property prediction
url	https://www.explorationpub.com/uploads/Article/A100826/100826.pdf

Similar Items