Empirical fits and the theories that explain them:
• Kepler's third law, $P^2 \propto a^3$: explained by Newton's law of gravitation
• Planck's law, $B = \frac{2h\nu^3}{c^2}\left(\exp\left(\frac{h\nu}{k_B T}\right) - 1\right)^{-1}$: explained (partially) by quantum mechanics
• Neural network weights: explained by ???
Problem: we have no equivalent theory for neural network weights.
How can I try this?
• Open-source
• Extensible Python API compatible with scikit-learn
• Can be distributed over 1000s of cores (w/ Slurm, PBS, LSF, or Kubernetes)
• Custom operators, losses, constraints
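A rough sketch of the scikit-learn-style workflow (the toy dataset, operator choices, and iteration count here are illustrative, not defaults):

```python
import numpy as np
from pysr import PySRRegressor

# Toy dataset: y depends on only two of the five input features.
X = np.random.randn(200, 5)
y = 2.5382 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 0.5

# Configure the search; operators and niterations are illustrative choices.
model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)

model.fit(X, y)   # scikit-learn-style interface
print(model)      # shows the discovered equations (accuracy/complexity Pareto front)
```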
But, there's a problem:
• Genetic algorithms, like PySR, scale terribly with expression complexity.
• One must search over (permutations of operators) × (permutations of variables + possible constants): the space explodes combinatorially (a rough count is sketched below).
• But we know that neural networks can efficiently find very complex functions!
• Can we exploit this?
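A back-of-envelope count of that search space (the tree-counting formula is my own illustration, not from the slides): with only binary operators, an expression with n operator nodes has Catalan(n) tree shapes, k choices per operator node, and v choices per leaf, so the number of candidate expressions grows roughly like Catalan(n) · k^n · v^(n+1).

```python
from math import comb

def catalan(n: int) -> int:
    """Number of distinct binary-tree shapes with n internal nodes."""
    return comb(2 * n, n) // (n + 1)

def expression_count(n_ops: int, k_operators: int, v_leaves: int) -> int:
    """Rough count of binary-operator expression trees with n_ops operator nodes."""
    return catalan(n_ops) * k_operators**n_ops * v_leaves ** (n_ops + 1)

# e.g. 4 binary operators, 5 variables (ignoring constants entirely):
for n in range(1, 9):
    print(n, expression_count(n, k_operators=4, v_leaves=5))
# The count explodes combinatorially with expression size, which is why
# genetic search struggles as the target expression gets more complex.
```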
How this works:
1. Train the neural network normally, and freeze its parameters.
2. Record the inputs/outputs of the network over the training set.
3. Fit the recorded input/output pairs with PySR (a minimal sketch follows).
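A minimal sketch of that three-step recipe, assuming a trained PyTorch model `net` and training inputs `X_train` (both hypothetical names):

```python
import numpy as np
import torch
from pysr import PySRRegressor

# 1. `net` is assumed to be already trained; freeze its parameters.
net.eval()
for p in net.parameters():
    p.requires_grad_(False)

# 2. Record the network's input/output pairs over the training set.
with torch.no_grad():
    inputs = torch.as_tensor(X_train, dtype=torch.float32)
    outputs = net(inputs).cpu().numpy()

# 3. Fit the recorded input/output pairs with PySR.
model = PySRRegressor(
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["cos", "exp"],
)
model.fit(np.asarray(X_train), outputs)
print(model)
```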
Full Symbolic Distillation
$(g \circ f)(x_1, x_2, x_3, x_4)$: a fully-interpretable approximation of the original neural network!
• Easier to interpret and compare with existing models in the domain-specific language
• Easier to impose symbolic priors (can potentially get better generalization!)
Inductive bias
• Introducing some form of inductive bias is needed to eliminate the functional degeneracy. For example:
• The latent space between $f$ and $g$ could have some aggregation over a set, e.g. a sum $\sum_i$!
Inductive bias
• Other inductive biases to eliminate the degeneracy:
• Sparsity on the latent space (⇒ fewer equations, fewer variables)
• (Also see related work of Sebastian Wetzel & Roger Melko; and Steve Brunton & Nathan Kutz!)
• Smoothness penalty (to encourage expression-like behavior)
• "Disentangled sparsity"
• Disentangled Sparsity:
• Want few latent features AND want each latent feature to have few dependencies
• This makes things much easier for the genetic algorithm! (One way to encode this as a training penalty is sketched below.)
Miles Cranmer, Can Cui, et al. "Disentangled Sparsity Networks for Explainable AI"
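One plausible way to encode both goals as a penalty; this is my own sketch, not necessarily the formulation in the cited paper. An L1 term on the latent activations encourages few active latent features, and an L1 term on the Jacobian of each latent with respect to the inputs encourages each latent to depend on few inputs.

```python
import torch

def disentangled_sparsity_penalty(encoder, x, lam_latent=1e-3, lam_jac=1e-3):
    """Sketch of a penalty encouraging (1) few active latent features and
    (2) each latent feature depending on few input features."""
    x = x.detach().clone().requires_grad_(True)
    z = encoder(x)                      # shape: (batch, n_latent)

    # (1) L1 on latent activations -> few latent features used overall.
    latent_term = z.abs().mean()

    # (2) L1 on d z_k / d x -> each latent has few input dependencies.
    jac_term = 0.0
    for k in range(z.shape[1]):
        grads = torch.autograd.grad(z[:, k].sum(), x, create_graph=True)[0]
        jac_term = jac_term + grads.abs().mean()

    return lam_latent * latent_term + lam_jac * jac_term
```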
Example: graph neural network activations = forces, under a sparsity regularization
Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho
$$\ddot{x}_i \approx \frac{1}{v_i} \sum_{j \neq i} f(x_i - x_j, v_i, v_j)$$
• $\ddot{x}_i$: known acceleration
• $f$: learned force law
• $v_i, v_j$: learned parameters for planets $i$, $j$
• Newton's laws of motion (assumed)
• Learn via gradient descent (a rough sketch follows).
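A rough PyTorch sketch of fitting the learned force law $f$ (an MLP, here named `force_mlp`) and the per-planet parameters $v_i$ jointly by gradient descent; all names, shapes, and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

n_bodies, dim = 8, 3

# Learned force law f(x_i - x_j, v_i, v_j) -> force vector (illustrative MLP).
force_mlp = nn.Sequential(nn.Linear(dim + 2, 128), nn.ReLU(), nn.Linear(128, dim))

# Learned per-planet parameters v_i (one scalar per body).
v = torch.nn.Parameter(torch.ones(n_bodies))

opt = torch.optim.Adam(list(force_mlp.parameters()) + [v], lr=1e-3)

def predicted_acceleration(x):
    """x: (n_bodies, dim) positions -> (n_bodies, dim) predicted accelerations."""
    accs = []
    for i in range(n_bodies):
        total = torch.zeros(dim)
        for j in range(n_bodies):
            if i == j:
                continue
            feats = torch.cat([x[i] - x[j], v[i:i + 1], v[j:j + 1]])
            total = total + force_mlp(feats)
        accs.append(total / v[i])  # 1/v_i times the summed pairwise forces
    return torch.stack(accs)

# One training step against known accelerations `acc_true` (assumed available):
# loss = ((predicted_acceleration(x_batch) - acc_true) ** 2).mean()
# loss.backward(); opt.step(); opt.zero_grad()
```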
This allows us to find both $f$ and the $v_i$ simultaneously.
Solution: re-optimize the $v_i$!
• The $v_i$ were optimized for the neural network.
• The symbolic formula is not a *perfect* approximation of the network.
• Thus: we need to re-optimize the $v_i$ for the symbolic function $f$ (a minimal sketch follows)!
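A minimal sketch of that re-optimization step, assuming the symbolic force law has been exported as a plain Python function `f_symbolic` returning a force vector, and that `positions`, `acc_true`, and the network-fitted values `v_from_nn` are available (all hypothetical names); `scipy.optimize.minimize` is just one reasonable choice of optimizer here:

```python
import numpy as np
from scipy.optimize import minimize

def predicted_acc(v, positions):
    """positions: (n_bodies, dim). Sum the symbolic force law over pairs."""
    n = len(v)
    acc = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i != j:
                acc[i] += f_symbolic(positions[i] - positions[j], v[i], v[j])
        acc[i] /= v[i]
    return acc

def loss(v, positions, acc_true):
    return np.mean((predicted_acc(v, positions) - acc_true) ** 2)

# Re-fit the per-planet parameters against the *symbolic* f,
# starting from the values learned alongside the neural network.
result = minimize(loss, x0=v_from_nn, args=(positions, acc_true))
v_refit = result.x
```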
Learned Coarse Models for Efficient Turbulence Simulation (ICLR 2022)
Kimberly Stachenfeld, Drummond B. Fielding, Dmitrii Kochkov, Miles Cranmer, Tobias Pfaff, Jonathan Godwin, Can Cui, Shirley Ho, Peter Battaglia, Alvaro Sanchez-Gonzalez
Example: trained to reproduce turbulence simulations at lower resolution
1000x speedup
How did the model actually achieve this?