Closed-form interpretation of neural network classifiers with symbolic gradients

I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In con...

Bibliographic Details
Main Author:	Sebastian J Wetzel
Format:	Article
Language:	English
Published:	IOP Publishing 2025-01-01
Series:	Machine Learning: Science and Technology
Subjects:	artificial neural networks symbolic regression interpretation of neural networks
Online Access:	https://doi.org/10.1088/2632-2153/ad9fd0

author	Sebastian J Wetzel (https://orcid.org/0000-0002-2939-9081)
affiliation	University of Waterloo, Waterloo, Ontario N2L 3G1, Canada; Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada; Homes Plus Magazine Inc., Waterloo, Ontario N2V 2B1, Canada
title	Closed-form interpretation of neural network classifiers with symbolic gradients
description	I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In contrast to neural network-based regression, for classification, it is in general impossible to express the neural network in the form of a symbolic equation even if the neural network itself bases its classification on a quantity that can be written as a closed-form equation. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. I interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces.
topic	artificial neural networks; symbolic regression; interpretation of neural networks
format	Article
language	English
publisher	IOP Publishing
publishDate	2025-01-01
series	Machine Learning: Science and Technology
citation	Machine Learning: Science and Technology, vol. 6, no. 1, 015035 (2025)
issn	2632-2153
url	https://doi.org/10.1088/2632-2153/ad9fd0
collection	DOAJ
id	doaj-art-2390bebbaa5f4264b47d8a87aacb24a6
institution	Kabale University
record_updated	2025-02-11T12:16:53Z

Similar Items