PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports

Bibliographic Details
Main Authors: Javier Corvi, Nicolás Díaz-Roussel, José M. Fernández, Francesco Ronzano, Emilio Centeno, Pablo Accuosto, Celine Ibrahim, Shoji Asakura, Frank Bringezu, Mirjam Fröhlicher, Annika Kreuchwig, Yoko Nogami, Jeong Rih, Raul Rodriguez-Esteban, Nicolas Sajot, Joerg Wichard, Heng-Yi Michael Wu, Philip Drew, Thomas Steger-Hartmann, Alfonso Valencia, Laura I. Furlong, Salvador Capella-Gutierrez
Format: Article
Language: English
Published: BMC 2025-02-01
Series: Journal of Cheminformatics
Subjects: Natural language processing, Text mining, Toxicology, Adverse effect, Preclinical, Animal model
Online Access: https://doi.org/10.1186/s13321-024-00925-x
collection DOAJ
description Abstract: Over the last few decades the pharmaceutical industry has generated a vast corpus of knowledge on the safety and efficacy of drugs. Much of this information is contained in toxicology reports, which summarise the results of animal studies designed to analyse the effects of the tested compound, including unintended pharmacological and toxic effects, known as treatment-related findings. Despite the potential of this knowledge, the fact that most of this relevant information is only available as unstructured text with variable degrees of digitisation has hampered its systematic access, use and exploitation. Text mining technologies have the ability to automatically extract, analyse and aggregate such information, providing valuable new insights into the drug discovery and development process. In the context of the eTRANSAFE project, we present PretoxTM (Preclinical Toxicology Text Mining), the first system specifically designed to detect, extract, organise and visualise treatment-related findings from toxicology reports. The PretoxTM tool comprises three main components: PretoxTM Corpus, PretoxTM Pipeline and PretoxTM Web App. The PretoxTM Corpus is a gold standard corpus of preclinical treatment-related findings annotated by toxicology experts. This corpus was used to develop, train and validate the PretoxTM Pipeline, which extracts treatment-related findings from preclinical study reports. The extracted information is then presented for expert visualisation and validation in the PretoxTM Web App.

Scientific Contribution: While text mining solutions have been widely used in the clinical domain to identify adverse drug reactions from various sources, no similar systems exist for identifying adverse events in animal models during preclinical testing. PretoxTM fills this gap by efficiently extracting treatment-related findings from preclinical toxicology reports. This provides a valuable resource for toxicology research, enhancing the efficiency of safety evaluations, saving time, and leading to more effective decision-making in the drug development process.
id doaj-art-c0f1a6247ef343cab56757ffe7449629
institution Kabale University
issn 1758-2946
spelling doaj-art-c0f1a6247ef343cab56757ffe7449629 2025-02-09T12:52:17Z eng
BMC, Journal of Cheminformatics, ISSN 1758-2946, 2025-02-01, volume 17, issue 1, pages 1-23
10.1186/s13321-024-00925-x
PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports
Author affiliations:
Javier Corvi: Life Sciences Department, Barcelona Supercomputing Center (BSC)
Nicolás Díaz-Roussel: Life Sciences Department, Barcelona Supercomputing Center (BSC)
José M. Fernández: Life Sciences Department, Barcelona Supercomputing Center (BSC)
Francesco Ronzano: Hospital del Mar Medical Research Institute (IMIM)
Emilio Centeno: Hospital del Mar Medical Research Institute (IMIM)
Pablo Accuosto: MedBioInformatics Solutions
Celine Ibrahim: Bayer AG, In Vitro Safety
Shoji Asakura: Eisai
Frank Bringezu: Chemical and Preclinical Safety, Merck Healthcare KGaA
Mirjam Fröhlicher: Translational Medicine, Preclinical Safety, Novartis Biomedical Research
Annika Kreuchwig: Bayer AG, In Vitro Safety
Yoko Nogami: Eisai
Jeong Rih: Ipsen Innovation
Raul Rodriguez-Esteban: Roche Innovation Center Basel
Nicolas Sajot: Servier
Joerg Wichard: Bayer AG, In Vitro Safety
Heng-Yi Michael Wu: Genentech Research and Early Development (gRED) Computational Sciences, Genentech, Inc.
Philip Drew: PDS Consultants
Thomas Steger-Hartmann: Bayer AG, In Vitro Safety
Alfonso Valencia: Life Sciences Department, Barcelona Supercomputing Center (BSC)
Laura I. Furlong: MedBioInformatics Solutions
Salvador Capella-Gutierrez: Life Sciences Department, Barcelona Supercomputing Center (BSC)
https://doi.org/10.1186/s13321-024-00925-x
Keywords: Natural language processing; Text mining; Toxicology; Adverse effect; Preclinical; Animal model
topic Natural language processing
Text mining
Toxicology
Adverse effect
Preclinical
Animal model