Slide 1

ATTENTION AND DIFFUSION Understanding Neural Network Architectures Michał Karzyński EuroPython 2023

Slide 2

THE SPEAKER Michał Karzyński (@postrational), Software Architect, Open Neural Network Exchange Operators group

Slide 3

THE TALK: MODELS
Transformers: Natural Language (GPT, BERT, T5), built on Attention
Diffusion: Images (Stable Diffusion, Midjourney, DALL-E), built on Convolution & Attention

Slide 4

CHATGPT

Slide 5

DIFFUSION MODELS: Stable Diffusion, Midjourney, DALL-E

Slide 6

THE TALK: OPERATIONS
Linear (a.k.a. Dense, Fully-connected)
Convolution: filter scan to produce feature maps
Attention: key-value store lookup

Slide 7

MULTI-LAYER PERCEPTRON

Slide 8

Diagram: Multi-layer Perceptron (input layer followed by linear layers)

Slide 9

Diagram: Multi-layer Perceptron (stacked linear layers)
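
A minimal PyTorch sketch (not from the slides) of the multi-layer perceptron idea: two linear (fully-connected) layers with a non-linearity in between; the layer sizes are illustrative assumptions.

import torch
from torch import nn

# Two stacked linear layers, as in the diagram; 784/128/10 are example sizes.
mlp = nn.Sequential(
    nn.Linear(784, 128),   # input to hidden layer
    nn.ReLU(),
    nn.Linear(128, 10),    # hidden to output layer
)
print(mlp(torch.randn(1, 784)).shape)   # torch.Size([1, 10])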

Slide 10

OPERATION: CONVOLUTION

Slide 11

OPERATION: CONVOLUTION
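
A minimal sketch of the convolution operation (image and filter sizes are assumptions): a small filter is scanned across the image, and each position's response forms the feature map.

import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 6, 6)                   # one 6x6 grayscale image
edge_filter = torch.tensor([[[[-1.0, 0.0, 1.0],   # a hand-made vertical-edge filter
                              [-1.0, 0.0, 1.0],
                              [-1.0, 0.0, 1.0]]]])
feature_map = F.conv2d(image, edge_filter, padding=1)   # scan the filter over the image
print(feature_map.shape)                          # torch.Size([1, 1, 6, 6])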

Slide 12

CONVOLUTIONAL NETWORKS

Slide 13

Diagram: VGG-16 (stacked convolution and max pool layers followed by linear layers)
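
A minimal sketch of one VGG-style stage (not the full 16-layer network; channel and image sizes are assumptions): stacked convolutions, a max pool to shrink the feature maps, then a linear classifier.

import torch
from torch import nn

vgg_stage = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # halve the spatial resolution
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 10),          # linear layer produces the class scores
)
print(vgg_stage(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])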

Slide 14

VISUALIZATION OF CONVOLUTIONAL NETWORKS Zeiler, Matthew D., and Rob Fergus. "Visualizing and Understanding Convolutional Networks." arXiv:1311.2901v3 [cs.CV], 2013

Slide 15

Zeiler and Fergus, 2013

Slide 16

ENCODER-DECODER ARCHITECTURE

Slide 17

Diagram: Convolutional Autoencoder (convolution and max pool in the encoder, unpooling and deconvolution in the decoder)
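
A minimal PyTorch sketch of a convolutional autoencoder in the spirit of the diagram (layer sizes are assumptions): convolution and max pool encode the image, unpooling and deconvolution decode it back.

import torch
from torch import nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, return_indices=True)    # remember which positions were kept
        self.unpool = nn.MaxUnpool2d(2)                      # "unpooling" in the diagram
        self.dec_conv = nn.ConvTranspose2d(8, 1, kernel_size=3, padding=1)  # "deconvolution"

    def forward(self, x):
        z, idx = self.pool(torch.relu(self.enc_conv(x)))     # encoder
        return self.dec_conv(self.unpool(z, idx))            # decoder

print(ConvAutoencoder()(torch.randn(1, 1, 28, 28)).shape)    # torch.Size([1, 1, 28, 28])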

Slide 18

SKIP CONNECTIONS AND RESIDUAL NETWORKS

Slide 19

Diagram: ResNet-18 (convolution and max pool layers with add skip connections, followed by average pool and a linear layer)
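
A minimal sketch of a residual block with a skip connection (the channel count is an assumption): the block's input is added back to its output, the "+" nodes in the diagram.

import torch
from torch import nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        y = torch.relu(self.conv1(x))
        y = self.conv2(y)
        return torch.relu(x + y)      # skip connection: add the input to the output

print(ResidualBlock(64)(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])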

Slide 20

U-NET

Slide 21

Diagram: Convolutional U-Net (convolution and max pool in the encoder, unpooling and deconvolution in the decoder, with concatenate skip connections between matching levels)
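
A minimal sketch of the U-Net idea with a single level (sizes are assumptions): the decoder upsamples and then concatenates the matching encoder feature map, the "c" nodes in the diagram.

import torch
from torch import nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(1, 8, 3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.bottom = nn.Conv2d(8, 8, 3, padding=1)
        self.up = nn.ConvTranspose2d(8, 8, 2, stride=2)   # deconvolution back to full resolution
        self.out = nn.Conv2d(16, 1, 3, padding=1)         # 16 channels = 8 upsampled + 8 skipped

    def forward(self, x):
        skip = torch.relu(self.down(x))                   # encoder feature map to be reused
        z = torch.relu(self.bottom(self.pool(skip)))
        z = self.up(z)
        z = torch.cat([z, skip], dim=1)                   # concatenate skip connection
        return self.out(z)

print(TinyUNet()(torch.randn(1, 1, 32, 32)).shape)        # torch.Size([1, 1, 32, 32])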

Slide 22

OPERATION: ATTENTION

store = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3',
}
query = 'key1'
value = store[query]
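
Neural attention turns this dictionary lookup into a "soft" one: instead of returning a single value, it returns a weighted mix of all values, weighted by how well the query matches each key. A minimal sketch of scaled dot-product attention (the formulation from the paper cited on the next slide; tensor sizes are assumptions):

import torch
import torch.nn.functional as F

def attention(query, keys, values):
    # score the query against every key, scaled by the square root of the key dimension
    scores = query @ keys.transpose(-2, -1) / keys.shape[-1] ** 0.5
    weights = F.softmax(scores, dim=-1)     # match scores become probabilities
    return weights @ values                 # weighted mix of the values

q = torch.randn(1, 1, 64)   # one query
k = torch.randn(1, 5, 64)   # five keys
v = torch.randn(1, 5, 64)   # five values
print(attention(q, k, v).shape)   # torch.Size([1, 1, 64])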

Slide 23

Vaswani, Ashish, et al. "Attention Is All You Need." arXiv:1706.03762v5 [cs.CL], 2017

Slide 24

TRANSFORMER ARCHITECTURE

Slide 25

Diagram: Transformer (inputs pass through embedding and positional encoding, then Nx stacked blocks of multi-head attention over Q, K, V with residual add connections and linear layers; each generated word is appended to the output)
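
A minimal sketch of one Transformer-style block (layer normalization and the decoder side are omitted; sizes are assumptions): multi-head self-attention followed by a linear layer, each with a residual add.

import torch
from torch import nn

embed_dim, num_heads, seq_len = 64, 4, 10
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
linear = nn.Linear(embed_dim, embed_dim)

x = torch.randn(1, seq_len, embed_dim)   # embedded tokens (positional encoding already added)
attn_out, _ = attn(x, x, x)              # self-attention: Q, K and V all come from x
x = x + attn_out                         # residual add
x = x + linear(x)                        # feed-forward with another residual add
print(x.shape)                           # torch.Size([1, 10, 64])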

Slide 26

FORWARD AND BACKWARD DIFFUSION

Slide 27

Diagram: Forward diffusion adds generated noise step by step; backward diffusion subtracts estimated noise step by step.
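
A purely illustrative sketch of the two directions (the step size and the noise estimator are placeholders, not a real DDPM schedule):

import torch

image = torch.randn(1, 3, 64, 64)

# Forward diffusion: add generated noise, one small step at a time.
noise = torch.randn_like(image)
noisy = image + 0.1 * noise

# Backward diffusion: a trained network estimates the noise so it can be subtracted again.
def noise_estimator(x):                  # placeholder standing in for a trained U-Net
    return torch.randn_like(x)

denoised = noisy - 0.1 * noise_estimator(noisy)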

Slide 28

LATENT DIFFUSION

Slide 29

Diagram: Latent Diffusion (a denoising U-Net operating in latent space, built from ResBlocks and Spatial Transformers with convolution, up/down sampling, deconvolution, and concatenate skip connections; the prompt "Logo for EuroPython in Prague" is embedded by a BERT encoder and injected through multi-head attention, with K and V coming from the text and Q from the image latent; timestep/positional encoding is added at each step)
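
A usage sketch with the Hugging Face diffusers library (not part of the talk; the checkpoint name is only an example): the pipeline runs the full latent-diffusion loop of text encoding, U-Net denoising in latent space, and decoding back to pixels.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",    # example checkpoint name
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("Logo for EuroPython in Prague").images[0]   # the prompt from the slide
image.save("europython_logo.png")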

Slide 30

CLOSING REMARKS

Slide 31

Social media was the first contact between A.I. and humanity, and humanity lost. YUVAL HARARI

Slide 32

THANK YOU