Closed-form interpretation of neural network classifiers with symbolic gradients
I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In con...
Main Author: | Sebastian J Wetzel |
---|---|
Format: | Article |
Language: | English |
Published: | IOP Publishing, 2025-01-01 |
Series: | Machine Learning: Science and Technology |
Subjects: | artificial neural networks; symbolic regression; interpretation of neural networks |
Online Access: | https://doi.org/10.1088/2632-2153/ad9fd0 |
_version_ | 1823858321404198912 |
---|---|
author | Sebastian J Wetzel |
collection | DOAJ |
description | I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In contrast to neural network-based regression, for classification, it is in general impossible to express the neural network in the form of a symbolic equation even if the neural network itself bases its classification on a quantity that can be written as a closed-form equation. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. I interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces. |
format | Article |
id | doaj-art-2390bebbaa5f4264b47d8a87aacb24a6 |
institution | Kabale University |
issn | 2632-2153 |
language | English |
publishDate | 2025-01-01 |
publisher | IOP Publishing |
record_format | Article |
series | Machine Learning: Science and Technology |
spelling | doaj-art-2390bebbaa5f4264b47d8a87aacb24a6; 2025-02-11T12:16:53Z; eng; IOP Publishing; Machine Learning: Science and Technology; 2632-2153; 2025-01-01; 6; 1; 015035; 10.1088/2632-2153/ad9fd0; Closed-form interpretation of neural network classifiers with symbolic gradients; Sebastian J Wetzel (https://orcid.org/0000-0002-2939-9081); University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada; Homes Plus Magazine Inc., Waterloo, Ontario N2V 2B1, Canada; https://doi.org/10.1088/2632-2153/ad9fd0; artificial neural networks; symbolic regression; interpretation of neural networks |
title | Closed-form interpretation of neural network classifiers with symbolic gradients |
topic | artificial neural networks symbolic regression interpretation of neural networks |
url | https://doi.org/10.1088/2632-2153/ad9fd0 |