Closed-form interpretation of neural network classifiers with symbolic gradients

Bibliographic Details
Main Author: Sebastian J Wetzel
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Machine Learning: Science and Technology
Subjects: artificial neural networks; symbolic regression; interpretation of neural networks
Online Access: https://doi.org/10.1088/2632-2153/ad9fd0
_version_ 1823858321404198912
author Sebastian J Wetzel
author_facet Sebastian J Wetzel
author_sort Sebastian J Wetzel
collection DOAJ
description I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In contrast to neural network-based regression, for classification, it is in general impossible to express the neural network in the form of a symbolic equation even if the neural network itself bases its classification on a quantity that can be written as a closed-form equation. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. I interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces.
format Article
id doaj-art-2390bebbaa5f4264b47d8a87aacb24a6
institution Kabale University
issn 2632-2153
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Machine Learning: Science and Technology
spelling doaj-art-2390bebbaa5f4264b47d8a87aacb24a6
2025-02-11T12:16:53Z
eng
IOP Publishing
Machine Learning: Science and Technology
2632-2153
2025-01-01
6
1
015035
10.1088/2632-2153/ad9fd0
Closed-form interpretation of neural network classifiers with symbolic gradients
Sebastian J Wetzel
0
https://orcid.org/0000-0002-2939-9081
University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada; Homes Plus Magazine Inc., Waterloo, Ontario N2V 2B1, Canada
I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In contrast to neural network-based regression, for classification, it is in general impossible to express the neural network in the form of a symbolic equation even if the neural network itself bases its classification on a quantity that can be written as a closed-form equation. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. I interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces.
https://doi.org/10.1088/2632-2153/ad9fd0
artificial neural networks
symbolic regression
interpretation of neural networks
spellingShingle Sebastian J Wetzel
Closed-form interpretation of neural network classifiers with symbolic gradients
Machine Learning: Science and Technology
artificial neural networks
symbolic regression
interpretation of neural networks
title Closed-form interpretation of neural network classifiers with symbolic gradients
title_full Closed-form interpretation of neural network classifiers with symbolic gradients
title_fullStr Closed-form interpretation of neural network classifiers with symbolic gradients
title_full_unstemmed Closed-form interpretation of neural network classifiers with symbolic gradients
title_short Closed-form interpretation of neural network classifiers with symbolic gradients
title_sort closed form interpretation of neural network classifiers with symbolic gradients
topic artificial neural networks
symbolic regression
interpretation of neural networks
url https://doi.org/10.1088/2632-2153/ad9fd0
work_keys_str_mv AT sebastianjwetzel closedforminterpretationofneuralnetworkclassifierswithsymbolicgradients
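
Illustrative sketch (not part of the catalogued record). The description above speaks of an equivalence class of functions that encode the same concept, and the title mentions symbolic gradients. One possible reading, assumed here and not confirmed by the record, is that two functions encode the same concept when one is an invertible function of the other, in which case their gradients point along the same direction everywhere. The Python sketch below checks that property numerically. Every name in it (`classifier`, `candidate_circle`, `candidate_linear`, `gradient_alignment`) is hypothetical, the "classifier" is a hand-written stand-in for a trained network, and finite differences replace whatever gradient computation the article actually uses.

```python
import numpy as np


def numerical_gradient(f, points, eps=1e-5):
    """Central-difference gradient of a scalar function f at each row of points."""
    grads = np.zeros_like(points)
    for i in range(points.shape[1]):
        step = np.zeros(points.shape[1])
        step[i] = eps
        grads[:, i] = (f(points + step) - f(points - step)) / (2 * eps)
    return grads


def gradient_alignment(f, g, points):
    """Mean absolute cosine similarity between the gradients of f and g."""
    gf = numerical_gradient(f, points)
    gg = numerical_gradient(g, points)
    gf /= np.linalg.norm(gf, axis=1, keepdims=True)
    gg /= np.linalg.norm(gg, axis=1, keepdims=True)
    return float(np.mean(np.abs(np.sum(gf * gg, axis=1))))


def classifier(x):
    """Stand-in for a trained classifier (assumption): sigmoid of x^2 + y^2 - 1."""
    return 1.0 / (1.0 + np.exp(-(x[:, 0] ** 2 + x[:, 1] ** 2 - 1.0)))


def candidate_circle(x):
    """Hypothetical symbolic candidate that shares the classifier's concept."""
    return x[:, 0] ** 2 + x[:, 1] ** 2


def candidate_linear(x):
    """Hypothetical symbolic candidate that encodes a different concept."""
    return x[:, 0] + x[:, 1]


rng = np.random.default_rng(0)
# Sample away from the origin, where the gradient of x^2 + y^2 vanishes.
points = rng.uniform(0.3, 1.2, size=(200, 2))

score_circle = gradient_alignment(classifier, candidate_circle, points)
score_linear = gradient_alignment(classifier, candidate_linear, points)
print("alignment with x^2 + y^2:", score_circle)
print("alignment with x + y:    ", score_linear)
```

Under these assumptions, the candidate that is an invertible function of the classifier's underlying quantity should score close to 1, while the unrelated candidate scores lower. The classifier output itself is a sigmoid and never equals `x^2 + y^2`, which mirrors the description's point that a classifier need not be expressible as the symbolic equation it relies on.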