Closed-form interpretation of neural network classifiers with symbolic gradients

Bibliographic Details
Main Author:	Sebastian J Wetzel
Format:	Article
Language:	English
Published:	IOP Publishing 2025-01-01
Series:	Machine Learning: Science and Technology
Subjects:	artificial neural networks; symbolic regression; interpretation of neural networks
Online Access:	https://doi.org/10.1088/2632-2153/ad9fd0

Description
Summary:	I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework, I demonstrate how to interpret neural network classifiers to reveal closed-form expressions of the concepts encoded in their decision boundaries. In contrast to neural network-based regression, in classification it is generally impossible to express the neural network as a symbolic equation, even if the network bases its classification on a quantity that can be written as a closed-form equation. The interpretation framework is based on embedding trained neural networks into an equivalence class of functions that encode the same concept. I interpret these neural networks by finding an intersection between the equivalence class and human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces.
ISSN:	2632-2153
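The summary does not spell out how the equivalence class is defined or searched. As a hedged illustration only, assume that two functions encode the same concept when one is a strictly monotonic transformation of the other, g = φ(f). Then ∇g = φ'(f)∇f, so their gradients are parallel wherever they are nonzero. The title's mention of symbolic gradients suggests a comparison of this kind, but the scoring rule, the stand-in classifier, the three-expression search space and every function name below (numerical_gradient, gradient_alignment, classifier, candidates) are hypothetical and not taken from the article. The sketch uses only the Python standard library, estimates gradients by central differences, and ranks candidate expressions by the mean absolute cosine between their gradients and the classifier's gradient.

```python
import math
import random


def numerical_gradient(f, point, eps=1e-5):
    """Central-difference estimate of the gradient of scalar f at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += eps
        down[i] -= eps
        grad.append((f(*up) - f(*down)) / (2 * eps))
    return grad


def gradient_alignment(f, g, points):
    """Mean |cosine| between grad f and grad g over the sample points.

    A value of 1.0 means the gradients are parallel at every sampled point,
    which is what the assumed equivalence g = phi(f) implies.
    """
    total = 0.0
    for p in points:
        gf = numerical_gradient(f, p)
        gg = numerical_gradient(g, p)
        dot = sum(a * b for a, b in zip(gf, gg))
        norm = math.sqrt(sum(a * a for a in gf)) * math.sqrt(sum(b * b for b in gg))
        total += abs(dot) / norm if norm > 0 else 0.0
    return total / len(points)


# Hypothetical stand-in for a trained classifier: a sigmoid of a monotonic
# function of x^2 + y^2, so its decision boundary is a circle.
def classifier(x, y):
    return 1.0 / (1.0 + math.exp(-(3.0 * (x**2 + y**2) - 2.0)))


# Hypothetical, hand-written symbolic search space for illustration only.
candidates = {
    "x + y": lambda x, y: x + y,
    "x * y": lambda x, y: x * y,
    "x**2 + y**2": lambda x, y: x**2 + y**2,
}

random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]

# Score every candidate and keep the one whose gradients align best.
scores = {name: gradient_alignment(classifier, g, samples) for name, g in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

Under these assumptions, the circular decision boundary of the stand-in classifier is matched by x**2 + y**2, whose gradient alignment is close to 1, while x + y and x * y score well below it even though none of the candidates equals the classifier's output. A real search would replace the fixed dictionary with a symbolic regression engine exploring a much larger expression space.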
