Semi-Parametric Inducing Point Networks and Neural Processes

We introduce semi-parametric inducing point networks (SPIN), a general-purpose architecture that can query the training set at inference time in a compute-efficient manner. Semi-parametric architectures are typically more compact than parametric models, but their computational complexity is often quadratic. In contrast, SPIN attains linear complexity via a cross-attention mechanism between datapoints inspired by inducing point methods. Querying large training sets can be particularly useful in meta-learning, as it unlocks additional training signal, but often exceeds the scaling limits of existing models. We use SPIN as the basis of the Inducing Point Neural Process, a probabilistic model which supports large contexts in meta-learning and achieves high accuracy where existing models fail. In our experiments, SPIN reduces memory requirements, improves accuracy across a range of meta-learning tasks, and improves state-of-the-art performance on an important practical problem, genotype imputation.

Richa Rastogi

November 10, 2025

Transcript

  1. Semi-Parametric Inducing Point Networks and Neural Processes, May 2023.

    Richa Rastogi, Yair Schiff, Alon Hacohen, Zhaozhi Li, Ian Lee, Yuntian Deng, Mert R. Sabuncu, Volodymyr Kuleshov.
  2. Semi-parametric setup: we have access to the training set at inference time,

    𝒟_train = {(x^(i), y^(i))}_{i=1}^n, and the goal is to learn a parametric mapping conditioned on this dataset: y = f_θ(x; 𝒟_train). (A minimal interface sketch follows this item.)
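As a minimal sketch of this interface, assuming a PyTorch setting (the names `SemiParametricModel`, `context_x`, and `context_y` are placeholders for illustration, not the paper's code), the model receives the training set alongside the query input:

```python
import torch
import torch.nn as nn

class SemiParametricModel(nn.Module):
    """y = f_theta(x; D_train): predictions are conditioned on the training set."""
    def __init__(self, encoder: nn.Module, predictor: nn.Module):
        super().__init__()
        self.encoder = encoder      # summarizes D_train into a representation
        self.predictor = predictor  # maps (x, summary) -> y

    def forward(self, x, context_x, context_y):
        # (context_x, context_y) play the role of D_train and are available at inference time.
        summary = self.encoder(context_x, context_y)
        return self.predictor(x, summary)
```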
  3. Meta-learning setup: 𝒟_c → f(x; 𝒟_c)

    Image credit: Dubois, Yann; Gordon, Jonathan; and Foong, Andrew YK. "Neural Process Family" (2020). http://yanndubs.github.io/Neural-Process-Family
  4. Neural Processes: 𝒟_c → p(y | x, 𝒟_c), a predictive distribution rather than the point prediction 𝒟_c → f(x; 𝒟_c). (A sketch of such a predictive head follows this item.)

    Image credit: Dubois, Yann; Gordon, Jonathan; and Foong, Andrew YK. "Neural Process Family" (2020). http://yanndubs.github.io/Neural-Process-Family
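To make the distributional output concrete, here is a hedged sketch of the kind of Gaussian predictive head that neural-process-style models commonly use; the layer sizes and names below are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class GaussianPredictiveHead(nn.Module):
    """Maps a per-query representation to p(y | x, D_c) as a Normal distribution."""
    def __init__(self, in_dim: int, y_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * y_dim))

    def forward(self, h: torch.Tensor) -> torch.distributions.Normal:
        mean, raw_scale = self.net(h).chunk(2, dim=-1)
        scale = nn.functional.softplus(raw_scale) + 1e-4  # keep the std strictly positive
        return torch.distributions.Normal(mean, scale)    # p(y | x, D_c)
```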
  5. Most parametric models scale superlinearly in the size of the dataset (e.g.,

    attention between attributes scales quadratically), while meta-learning tasks benefit from conditioning on larger contexts.
  6. Motivating example: a long sequence, such as a time series, biological sequence, or text

    sequence, where missing chunks of information need to be retrieved from a reference dataset. Parametric models are a poor fit for long-sequence imputation and cannot scale to larger reference datasets.
  7. Semi-Parametric Inducing Point Networks (SPIN): inducing points for attention between

    datapoints, in addition to attention between attributes; linear time and space complexity in the size and the dimension of the data during training; a Neural Process architecture that supports larger context sizes. (A sketch of the inducing-point cross-attention follows this item.)
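A minimal sketch of the inducing-point idea as described on this slide: a small set of m learned inducing vectors H cross-attends to the n datapoint embeddings, so the cost is O(n·m) rather than the O(n²) of full self-attention between datapoints. Module names and hyperparameters here are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class InducingCrossAttention(nn.Module):
    def __init__(self, embed_dim: int = 64, num_inducing: int = 16, num_heads: int = 4):
        super().__init__()
        # Learned inducing points H, updated during training.
        self.H = nn.Parameter(torch.randn(num_inducing, embed_dim) * 0.02)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, datapoints: torch.Tensor) -> torch.Tensor:
        # datapoints: (batch, n, embed_dim) -- one embedding per training example.
        batch = datapoints.shape[0]
        queries = self.H.unsqueeze(0).expand(batch, -1, -1)   # (batch, m, embed_dim)
        # Cross-attention: m inducing queries attend over n datapoints -> O(n * m).
        out, _ = self.attn(queries, datapoints, datapoints)
        return out  # (batch, m, embed_dim): a fixed-size summary of the dataset
```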
  8. SPIN Overview • During training, learn the inducing points H. • The encoder module

    maps 𝒟 → H. • At inference, discard 𝒟 and keep only H. • The predictor module is parametric and maps (X_query, H) → Y_query. (An end-to-end sketch follows this item.)
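A rough end-to-end sketch of this pipeline, reusing `InducingCrossAttention` from the previous block; the embedding and prediction layers are simplified placeholders, and mean-pooling H before prediction is a simplification of how the queries would typically interact with H (e.g., via cross-attention):

```python
import torch
import torch.nn as nn

class SpinSketch(nn.Module):
    def __init__(self, x_dim: int, y_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Linear(x_dim + y_dim, embed_dim)   # embed each (x, y) datapoint
        self.encoder = InducingCrossAttention(embed_dim)   # D -> H (fixed-size summary)
        self.predict = nn.Sequential(
            nn.Linear(x_dim + embed_dim, 128), nn.ReLU(), nn.Linear(128, y_dim))

    def encode(self, context_x, context_y):
        d = self.embed(torch.cat([context_x, context_y], dim=-1))  # (batch, n, embed_dim)
        return self.encoder(d)                                     # (batch, m, embed_dim)

    def forward(self, x_query, H):
        # At inference only H is needed; the raw context D can be discarded.
        summary = H.mean(dim=1, keepdim=True).expand(-1, x_query.shape[1], -1)
        return self.predict(torch.cat([x_query, summary], dim=-1))  # (X_query, H) -> Y_query
```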
  9. Applying SPIN to Neural Processes…

    Image credit: Dubois, Yann; Gordon, Jonathan; and Foong, Andrew YK. "Neural Process Family" (2020). http://yanndubs.github.io/Neural-Process-Family
  10. SOTA results on genotype imputation: SPIN outperforms the state of the art and is

    more efficient than alternative Transformer-based approaches (Non-Parametric Transformers, Set Transformers).
  11. Summary: SPIN has linear time and space complexity in the

    size and the dimension of the data. SPIN learns a compact encoding of the training set for downstream applications. At inference time, computational complexity does not depend on the training set size. IPNP is an uncertainty-aware meta-learning algorithm that scales to larger context sizes. (A usage sketch follows this item.)
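For completeness, a usage sketch tying the blocks above together, with illustrative shapes only; an IPNP-style model would additionally return a predictive distribution (e.g., via the Gaussian head sketched earlier) rather than a point estimate:

```python
import torch

# Assumes SpinSketch (and InducingCrossAttention) from the earlier blocks are in scope.
model = SpinSketch(x_dim=8, y_dim=1)
context_x, context_y = torch.randn(2, 500, 8), torch.randn(2, 500, 1)  # the context D
H = model.encode(context_x, context_y)     # (2, 16, 64): fixed size, independent of n
del context_x, context_y                   # at inference, only H needs to be kept
y_pred = model(torch.randn(2, 10, 8), H)   # (2, 10, 1): (X_query, H) -> Y_query
```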