Slide 1

Differentiable programming in Gluon
For (not only medical) image analysis
Jan Margeta | March 9, 2018
[email protected] | @jmargeta

Slide 2

Congenital heart diseases
HVSMR 2016: MICCAI Workshop on Whole-Heart and Great Vessel Segmentation from 3D Cardiovascular MRI in Congenital Heart Disease

Slide 3

Fontan procedure

Slide 4

Learning from images

Slide 5

How to do it

Slide 6

How NOT to do it!

Slide 7

Building with smaller reusable modules
Well-defined functions (testability)
Composable like normal functions (reusability)
Injecting knowledge where possible (less data needed)
Quicker to prototype

Slide 8

How does the output change when parameter x changes?
Differentiable programming
Differentiable reusable blocks + chain rule:

    df/dx = df/du · du/dv · dv/dx

    class Sin(mx.autograd.Function):
        def forward(self, x):
            self.save_for_backward(x)
            y_np = np.sin(x.asnumpy())
            return mx.nd.array(y_np)

        def backward(self, dy):
            x, = self.saved_tensors
            y_np = np.cos(x.asnumpy())
            return dy * mx.nd.array(y_np)
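
For illustration (not on the original slide), a minimal check that the custom Sin block plugs into autograd as expected:

    import numpy as np
    import mxnet as mx
    from mxnet import nd

    x = nd.array([0.0, np.pi / 2])
    x.attach_grad()
    sin = Sin()
    with mx.autograd.record():
        y = sin(x)
    y.backward()
    print(x.grad)  # approximately cos(x): [1.0, 0.0]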

Slide 9

z = f(x, y) = x · y
df/dx = y, df/dy = x
A differentiable block:

    class Multiply(Function):
        def forward(self, x, y):
            self.save_for_backward(x, y)
            return x * y

        def backward(self, dz):
            x, y = self.saved_tensors
            return [y * dz, x * dz]
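
A quick sanity check of the gradients (a sketch, assuming Function was imported from mx.autograd):

    x, y = nd.array([2.0]), nd.array([3.0])
    x.attach_grad()
    y.attach_grad()
    mul = Multiply()
    with mx.autograd.record():
        z = mul(x, y)
    z.backward()
    print(x.grad, y.grad)  # [3.] and [2.], i.e. y and x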

Slide 10

Define-and-run: TensorFlow, MXNet, CNTK, Keras
Oops, be careful about your placeholders

    import numpy as np
    import tensorflow as tf

    rng = np.random

    X = tf.placeholder("float")
    W = tf.Variable(rng.randn(2, 1), dtype=np.float32, name="weight")
    b = tf.Variable(rng.randn(1), dtype=np.float32, name="bias")
    pred = tf.matmul(X, W) + b

    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        # the gotcha: the placeholder is X, not the string 'x'
        out = sess.run(pred, feed_dict={'x': np.array([[2.0, 1.0]])})

Slide 11

Define-by-run: Chainer, PyTorch, Gluon, TensorFlow Eager
Imperative execution, just like Python

    import numpy as np
    from mxnet import nd

    W = nd.array(np.random.randn(2, 1))
    b = nd.array(np.random.randn(1))
    X = nd.array([[2.0, 1.0]])
    out = nd.dot(X, W) + b

Slide 12

Advantages
Flow control ops are pure Python (no tf.scan, tf.while_loop, ...)
And recursion works too!

    def funky_function(a):
        b = a * 2
        while (nd.norm(b) < 10).asscalar():
            b = b ** 2
        if (mx.nd.sum(b) > 1).asscalar():
            c = b
        else:
            c = 10 * b
        return c

    a = nd.random_normal(shape=3)
    c = funky_function(a)

Slide 13

Errors where you would expect
Debugging just works!

    nd.dot(nd.zeros(shape=3), nd.zeros(shape=2))

    MXNetError                    Traceback (most recent call last)
    ----> 1 nd.dot(nd.zeros(shape=3), nd.zeros(shape=2))
    ...
    MXNetError: [09:39:23] src/operator/tensor/./dot-inl.h:998:
    Check failed: lshape[0] == rshape[0] (3 vs. 2) dot shape error: [3] X [2]

    import pdb

    def looper(a):
        for i in range(5):
            a = a ** 2
            if (mx.nd.sum(a) > 1).asscalar():
                pdb.set_trace()
                return a
        return a / 2

    a = nd.random_normal(shape=3)
    c = looper(a)

Slide 14

Gluon
Announced Oct. 2017
API largely inspired by Chainer and PyTorch
Resource efficient + static compilation
Export models to Python, R, Julia, Scala, Go, JavaScript
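
"Static compilation + export" in one sketch, not from the original slide (assuming a Gluon version that provides HybridBlock.export; the resulting symbol/params files are what the other language bindings load):

    net = gluon.nn.HybridSequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(10))
    net.collect_params().initialize()
    net.hybridize()             # build a static graph on the next call
    net(nd.zeros((1, 20)))      # one forward pass to trace shapes
    net.export('my_model')      # my_model-symbol.json + my_model-0000.params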

Slide 15

Computing gradients with autograd
Record execution flow (define-by-run)
Attach gradients to variables that are optimizable
Now, compute gradients!
Gradients are computed and stored in x.grad, y.grad

    yy, xx = np.indices((20, 20))
    x, y = nd.array(xx), nd.array(yy)
    x.attach_grad()
    y.attach_grad()

    with mx.autograd.record():
        z = (x - 5) ** 2 + (y - 10) ** 2

    z.backward()
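
A quick check (not on the original slide): after backward, the gradients sit on the arrays themselves and match the analytic derivatives of this quadratic:

    # dz/dx = 2 * (x - 5), dz/dy = 2 * (y - 10)
    assert (x.grad == 2 * (x - 5)).asnumpy().all()
    assert (y.grad == 2 * (y - 10)).asnumpy().all()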

Slide 16

Check this out, TensorFlow...

    def funky_function(a):
        b = a * 2
        while (nd.norm(b) < 10).asscalar():
            b = b ** 2
        if (mx.nd.sum(b) > 1).asscalar():
            c = b
        else:
            c = 10 * b
        return c

    a = nd.random_normal(shape=3)
    a.attach_grad()
    with mx.autograd.record():
        c = funky_function(a)
    c.backward()

Slide 17

Shapes are computed on first run!

    conv = gluon.nn.Conv2D(16, kernel_size=(3, 3), padding=(1, 1))
    conv.collect_params().initialize()
    conv(nd.zeros((1, 3, 256, 256))).shape   # (1, 16, 256, 256)
    conv(nd.zeros((4, 3, 128, 128))).shape   # (4, 16, 128, 128)

    dense = gluon.nn.Dense(16)
    dense.collect_params().initialize()
    dense(nd.zeros((10, 128, 256))).shape    # (10, 16)
    dense(nd.zeros((5, 128, 256))).shape     # (5, 16)

Slide 18

Building reusable blocks in Gluon

Slide 19

gluon.nn.Sequential
Almost like Keras

    net = nn.Sequential()
    net.add(nn.Conv2D(3, kernel_size=5, padding=2))
    net.add(nn.Conv2D(6, kernel_size=5, padding=2))
    net.add(nn.Flatten())
    net.add(nn.Dense(12))
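
A usage sketch (the input size here is assumed, not from the slide):

    net.collect_params().initialize()
    out = net(nd.zeros((1, 3, 32, 32)))   # e.g. one 3-channel 32x32 image
    out.shape                             # (1, 12)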

Slide 20

gluon.nn.Block
Infinite flexibility!

    class MLP(gluon.nn.Block):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            with self.name_scope():
                self.dense0 = nn.Dense(128)
                self.dense1 = nn.Dense(64)
                self.dense2 = nn.Dense(10)

        def forward(self, x):
            x = nd.relu(self.dense0(x))
            x = nd.relu(self.dense1(x))
            return self.dense2(x)
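
Usage is the same as for any block (input size assumed here):

    net = MLP()
    net.collect_params().initialize()
    out = net(nd.zeros((4, 784)))   # e.g. four flattened 28x28 digits
    out.shape                       # (4, 10)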

Slide 21

Hybrid blocks
Construct symbolic graphs
Allow JIT compilation for faster execution

    class HybridNet(gluon.nn.HybridBlock):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.conv = nn.Conv2D(3, kernel_size=5, padding=2)

        # see the F?
        def hybrid_forward(self, F, x):
            return self.conv(x)

    net = HybridNet()
    net.collect_params().initialize()
    net.hybridize()
    net(arr)  # arr: an input NDArray

    # the same block also accepts symbols
    x = mx.sym.Variable('x')
    y = net(x)

Slide 22

Mix and match anything you like

    class BigNet(gluon.nn.HybridBlock):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            with self.name_scope():
                self.convin = nn.Conv2D(3, kernel_size=5, padding=2)
                self.iterative_process = HybridNet(prefix='iter_')
                self.convout = nn.Conv2D(6, kernel_size=5, padding=2)

        def hybrid_forward(self, F, x):
            x = x_in = self.convin(x)
            for _ in range(2):
                x = self.iterative_process(x)
            x = F.concat(x_in, x, dim=1)
            return self.convout(x)

Slide 23

Uniform access to data
Dataset
Batching with a loader
Iterating over batches

    class MyDataset(gluon.data.Dataset):
        def __getitem__(self, index):
            im, y0, y1 = load_image(index, ...)
            return im, y0, y1

        def __len__(self):
            return 10

    data = mx.gluon.data.DataLoader(
        MyDataset(), batch_size=4, shuffle=True, num_workers=4)

    for X, Y0, Y1 in data:
        update(model, X, Y0, Y1)
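
The same loader works for Gluon's built-in datasets; a quick sketch with gluon.data.vision.MNIST:

    mnist = mx.gluon.data.vision.MNIST(train=True)
    loader = mx.gluon.data.DataLoader(mnist, batch_size=32, shuffle=True)
    for X, y in loader:
        print(X.shape, y.shape)   # (32, 28, 28, 1) (32,)
        break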

Slide 24

Loss function
Definition of the module's performance:
high for bad model parameters, low for better ones.
Many are already in gluon.loss; defining new ones is easy:

    def squared_diff_loss(x, y):
        return ((x - y.reshape_like(x)) ** 2).sum()
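
Hand-rolled and built-in losses are called the same way (a sketch; values worked out by hand):

    pred = nd.array([[0.5, 2.0]])
    target = nd.array([[1.0, 1.0]])
    squared_diff_loss(pred, target)   # 0.25 + 1.0 = 1.25

    l2 = gluon.loss.L2Loss()          # note: halves and averages per sample
    l2(pred, target)                  # 0.5 * (0.25 + 1.0) / 2 = 0.3125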

Slide 25

Define model

    def make_segmentation_model(num_output_classes=4):
        net = gluon.nn.Sequential()
        # kernel sizes/padding not on the original slide;
        # chosen here to keep spatial dimensions
        net.add(nn.Conv2D(10, kernel_size=3, padding=1, activation='relu'))
        # ...
        net.add(nn.Conv2D(num_output_classes, kernel_size=(1, 1)))
        return net
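
A forward-pass sketch (input size assumed):

    net = make_segmentation_model(num_output_classes=4)
    net.collect_params().initialize()
    logits = net(nd.zeros((1, 1, 64, 64)))   # one gray 64x64 slice
    logits.shape                             # (1, 4, 64, 64) with padded convs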

Slide 26

Trainer
Initialize parameters
ctx - a device or a list of devices where the model will live
Trainer updates only a selected subset of parameters

    net.collect_params().initialize(mx.init.Normal(), ctx=mx.gpu(0))
    trainer = gluon.Trainer(
        net.collect_params(), 'sgd', {'learning_rate': 0.1})
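
"Only a selected subset" in practice: hand the Trainer just those parameters; e.g. for the fine-tuning example later in the deck (a sketch):

    trainer = gluon.Trainer(
        finetuned_net.output.collect_params(),   # only the new head is updated
        'sgd', {'learning_rate': 0.1})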

Slide 27

Training loop

    for epoch in range(num_epochs):
        for batch in train_data:
            data = batch.data[0].as_in_context(computation_context)
            label = batch.label[0].as_in_context(computation_context)

            # record the computational graph
            with mx.autograd.record():
                prediction = net(data)
                # compute the loss
                loss = loss_function(prediction, label)

            # compute the gradients
            loss.backward()
            # run one optimization step
            trainer.step(data.shape[0])

Slide 28

Pretrained model zoo
Choose the right tradeoff between speed and accuracy:
AlexNet, DenseNet, Inception v3, MobileNet, ResNet, SqueezeNet, VGG

    class MobileNet(gluon.nn.HybridBlock):
        def __init__(self, multiplier=1.0, classes=1000, **kwargs):
            super().__init__(**kwargs)
            with self.name_scope():
                self.features = nn.HybridSequential(prefix='')
                # ...
                self.output = nn.Dense(classes)

        def hybrid_forward(self, F, x):
            x = self.features(x)
            return self.output(x)
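
Pulling one from the zoo is a one-liner:

    from mxnet.gluon.model_zoo.vision import get_model
    net = get_model('mobilenet1.0', pretrained=True)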

Slide 29

Instant image recognition with a pretrained model

    from mxnet.gluon.model_zoo import vision

    image_recognizer = vision.resnet18_v1(pretrained=True)

    image_normalized = mx.image.color_normalize(
        image / 255.0,
        mean=mx.nd.array([0.485, 0.456, 0.406]),
        std=mx.nd.array([0.229, 0.224, 0.225]))

    predictions = image_recognizer(image_normalized)
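
Decoding the raw scores into probabilities and top guesses (a sketch; the class-name lookup table is assumed, not shown):

    probabilities = nd.softmax(predictions)
    top5 = nd.topk(probabilities, k=5)   # indices into an ImageNet labels list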

Slide 30

Fine-tuning from a pretrained network

    class FineTunedClassifier(gluon.nn.HybridBlock):
        def __init__(self, donor, classes=20, **kwargs):
            super().__init__(**kwargs)
            self.features = donor.features
            self.output = gluon.nn.HybridSequential('output')
            with self.output.name_scope():
                self.output.add(gluon.nn.Flatten())
                self.output.add(gluon.nn.Dense(classes))

        def hybrid_forward(self, F, x):
            x = self.features(x)
            return self.output(x)

    donor = get_model(name='mobilenet1.0', pretrained=True)
    finetuned_net = FineTunedClassifier(donor=donor, classes=4)
    finetuned_net.output.initialize()

Slide 31

Image colorizer

Slide 32

Handwritten digit... calculator

Slide 33

No content

Slide 34

Image preprocessor
Map an image onto a new image
1 input channel (gray image)
1 output channel (new gray image)

    class Preprocessor(nn.HybridSequential):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            with self.name_scope():
                # ...
                self.add(nn.Conv2D(1, kernel_size=(1, 1)))

    loss = custom_similarity_function

Slide 35

Landmark estimator
Predict locations of landmarks
1 input channel (gray image)
(num_landmarks * 2)-dimensional output

    class LandmarkEstimator(nn.HybridSequential):
        def __init__(self, num_landmarks, **kwargs):
            super().__init__(**kwargs)
            with self.name_scope():
                # ...
                self.add(nn.Conv2D(num_landmarks * 2, kernel_size=(1, 1)))

    loss = gluon.loss.L2Loss()

Slide 36

Segmentator
Predict the class label of each voxel
1 input channel (gray image) + landmark channels
One output channel per target class

    class Segmentator(nn.HybridBlock):
        def __init__(self, num_output_classes, **kwargs):
            super().__init__(**kwargs)
            # ...
            self.conv_out = nn.Conv2D(
                num_output_classes, kernel_size=(1, 1))

        def hybrid_forward(self, F, x):
            x = self.conv0(x)
            # ...
            return self.conv_out(x)

    loss = gluon.loss.SoftmaxCrossEntropyLoss() + custom  # combined loss (sketch)

Slide 37

View estimator
Predict the transformation to get a desired view
(num_landmarks * 2)-dimensional input
2 output heads (3 translations, 3 rotations)

    class ViewEstimator(nn.HybridBlock):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.translation = nn.HybridSequential()
            self.translation.add(gluon.nn.Flatten())
            self.translation.add(gluon.nn.Dense(3))
            self.rotation = nn.HybridSequential()
            self.rotation.add(gluon.nn.Flatten())
            self.rotation.add(gluon.nn.Dense(3))

        def hybrid_forward(self, F, x):
            return [self.translation(x), self.rotation(x)]

    loss = gluon.loss.L2Loss()

Slide 38

3D printing
*Using mock segmentation

Slide 39

No content

Slide 40

No content

Slide 41

(Embedded WebGL demo)

Slide 42

Conclusions
Build smaller trainable, reusable and testable blocks
Optimize the whole at the end
Iterate faster with standard tools
Divide ML work by specialty if needed
An ecosystem of pretrained differentiable modules is needed (just like async)

Slide 43

Thanks
Mehdi Hedjazi Moghari, Boston Children's Hospital, MIT, Harvard Medical School
D.F. Pace, A.V. Dalca, T. Geva, A.J. Powell, M.H. Moghari, P. Golland, "Interactive whole-heart segmentation in congenital heart disease", Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Lecture Notes in Computer Science, 9351:80-88, 2015.
[email protected]
twitter.com/jmargeta

Slide 44

References
HVSMR 2016: MICCAI Workshop on Whole-Heart and Great Vessel Segmentation from 3D Cardiovascular MRI in Congenital Heart Disease
Gluon home
MXNet: the straight dope
PyTorch vs Gluon comparison
Deep learning in Apache Gluon
Deep Learning est mort, vive Differentiable Programming
Software 2.0

Slide 45

References
Understanding CNNs
MMdnn - framework interoperability
MXNet Model Server
Backpropagation

Slide 46

References
Introducing Gluon
Deep learning framework benchmark
Compare top DL libraries
Autograd in Gluon

Slide 47

References
Differentiable Programming: https://pseudoprofound.wordpress.com/2016/08/03/differentiable-programming/
Neural Networks, Types, and Functional Programming