
Understanding Neural Network Architectures with Attention and Diffusion

Neural networks have revolutionized AI, enabling machines to learn from data and make intelligent decisions. In this talk, we'll explore two popular architectures: Attention models and Diffusion models.

First up, we'll discuss Attention models and how they've contributed to the success of large language models like ChatGPT. We'll explore how the Attention mechanism helps GPT focus on specific parts of a text sequence and how this mechanism has been applied to different tasks in natural language processing.

Next, we'll dive into Diffusion models, a class of generative models that have shown remarkable performance in image synthesis. We'll explain how they work and their potential applications in the creative industry.

By the end of the talk, you'll have a better understanding of these cutting-edge neural network architectures.

Michał Karzyński

July 20, 2023

Transcript

1. THE TALK: MODELS
   - Transformers (natural language: GPT, BERT, T5), built on attention
   - Diffusion (images: Stable Diffusion, Midjourney, DALL-E), built on convolution & attention
2. THE TALK: OPERATIONS
   - Linear, a.k.a. dense or fully-connected
   - Convolution: filters scan the input to produce feature maps
   - Attention: a key-value store lookup
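To make the first two operations concrete, here is a minimal NumPy sketch of linear and convolution; attention gets its own example under slide 5. All tensors and shapes here are illustrative, not from the talk.

```python
import numpy as np

# Linear (a.k.a. dense / fully-connected): every output is a
# weighted sum of all inputs, plus a bias.
x = np.random.randn(4)                           # input vector
W, b = np.random.randn(3, 4), np.random.randn(3)
y = W @ x + b                                    # shape (3,)

# Convolution: a small filter scans across the input,
# producing a feature map of local responses.
signal = np.random.randn(10)
kernel = np.array([1.0, 0.0, -1.0])              # simple edge-detecting filter
feature_map = np.convolve(signal, kernel, mode="valid")  # shape (8,)
```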
3. [Diagram: ResNet-18, a stack of convolutions and max pooling with residual "add" skip connections, followed by average pooling and a linear layer]
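The "add" arrows in the diagram are residual skip connections. A minimal PyTorch sketch of one such block, simplified by omitting batch normalization and downsampling; the channel count is illustrative:

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Sketch of a ResNet-style basic block (batch norm omitted for brevity)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # the "add": the input skips past the convolutions

block = ResidualBlock(channels=16)
x = torch.randn(1, 16, 32, 32)     # (batch, channels, height, width)
assert block(x).shape == x.shape   # residual blocks preserve the shape
```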
4. [Diagram: Convolutional U-Net. Convolutions and max pooling on the way down, deconvolutions and unpooling on the way up, with feature maps concatenated ("c") across matching levels]
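A minimal PyTorch sketch of the U-Net idea with a single down/up level; channel sizes are illustrative. The key detail is the "c" in the diagram: the encoder's feature map is concatenated with the upsampled one, so fine spatial detail survives the bottleneck.

```python
import torch
from torch import nn

class TinyUNet(nn.Module):
    """Sketch of a convolutional U-Net with one down/up level."""
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Conv2d(16, 16, kernel_size=3, padding=1)
        self.up = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)  # "deconvolution"
        self.out = nn.Conv2d(32, 1, kernel_size=3, padding=1)  # 32 = 16 skip + 16 upsampled

    def forward(self, x):
        skip = torch.relu(self.down(x))                  # encoder feature map
        h = torch.relu(self.bottleneck(self.pool(skip)))
        h = self.up(h)                                   # back to the skip's resolution
        h = torch.cat([skip, h], dim=1)                  # the "concatenate" skip connection
        return self.out(h)

net = TinyUNet()
x = torch.randn(1, 1, 32, 32)
assert net(x).shape == x.shape   # U-Nets map images to same-sized images
```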
5. OPERATION: ATTENTION

       store = {
           'key1': 'value1',
           'key2': 'value2',
           'key3': 'value3',
       }
       query = 'key1'
       value = store[query]
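The dict lookup above is exact: the query must equal one key. Neural attention relaxes this into a soft, differentiable lookup, where the query scores every key and the result is a weighted mix of all the values. A minimal NumPy sketch of scaled dot-product attention; shapes are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: a soft version of store[query].
    Instead of matching one key exactly, the query scores *all* keys
    and returns a weighted mix of their values."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how well each query matches each key
    weights = softmax(scores)        # rows sum to 1, like probabilities
    return weights @ V

K = np.random.randn(3, 8)   # 3 keys   (cf. 'key1', 'key2', 'key3')
V = np.random.randn(3, 8)   # 3 values (cf. 'value1', 'value2', 'value3')
Q = np.random.randn(1, 8)   # 1 query
out = attention(Q, K, V)    # shape (1, 8): a blend of the three values
```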
6. [Diagram: Transformer. Token embedding plus positional encoding feeds Nx stacked blocks of multi-head attention (K, V, Q) with residual "add" connections and linear layers; each generated word is appended to the output and fed back in as input]
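The "append word to output" loop is autoregressive decoding: the Transformer predicts one token, appends it to its own input, and repeats. A minimal sketch, assuming a hypothetical `model` that maps a token sequence to next-token logits; the token and stop-token IDs are illustrative:

```python
import torch

def generate(model, prompt_ids, max_new_tokens=20, eos_id=0):
    """Greedy autoregressive decoding: the 'append word to output' loop."""
    tokens = list(prompt_ids)
    for _ in range(max_new_tokens):
        inp = torch.tensor([tokens])           # (1, sequence_length)
        logits = model(inp)                    # (1, sequence_length, vocab_size)
        next_id = int(logits[0, -1].argmax())  # most likely next token
        tokens.append(next_id)                 # append word to output...
        if next_id == eos_id:                  # ...until the model says stop
            break
    return tokens
```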
7. [Diagram: the two diffusion processes. FORWARD: repeatedly add generated noise. BACKWARD: repeatedly subtract the noise estimated by the network]
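A minimal sketch of one step of each process, assuming a hypothetical `noise_predictor` network and a single fixed noise scale; real diffusion models use a carefully tuned schedule of scales across many timesteps:

```python
import torch

def forward_step(x, noise_scale=0.1):
    """FORWARD: corrupt the image by adding generated (random Gaussian) noise."""
    noise = torch.randn_like(x)
    return x + noise_scale * noise

def backward_step(x, noise_predictor, noise_scale=0.1):
    """BACKWARD: restore the image by subtracting the noise the network estimates."""
    estimated_noise = noise_predictor(x)
    return x - noise_scale * estimated_noise
```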
8. [Diagram: Latent Diffusion. The prompt "Logo for EuroPython in Prague" is embedded by a BERT encoder; a U-Net of ResBlocks and Spatial Transformers (multi-head attention with K, V, Q) denoises in latent space, guided by a timestep/positional encoding, with convolutions, up/down sampling, deconvolution, and concatenated skip connections]
9. "Social media was the first contact between A.I. and humanity, and humanity lost." (Yuval Harari)