Slide 1

Slide 1 text

Making Computations Execute Very Quickly

Dr Russel Winder
email: [email protected]
twitter: @russel_winder
Web: http://www.russel.org.uk

Copyright © 2015 Russel Winder

Slide 2

Slide 2 text

Python is slow…

Slide 3

Slide 3 text

…at computation.

Slide 4

Slide 4 text

CPU Bound vs I/O Bound (1/2)

● Python is entirely fine for essentially I/O-bound activity:
  ● Managing user interfaces via native code widgets (Qt, GTK, Wx, …).
  ● Managing networking activity.

Common theme here: the use of an event loop.
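As a minimal sketch of the event-loop style mentioned here, the following uses the standard library's asyncio. The coroutine name `fake_io` and the delays are hypothetical stand-ins for real network or user-interface waits; they are not from the talk.

```python
import asyncio


async def fake_io(name, delay):
    """Hypothetical stand-in for an I/O wait (network read, UI event)."""
    await asyncio.sleep(delay)  # hands control back to the event loop while "waiting"
    return name


async def main():
    # Both waits overlap: the event loop switches between them while each is idle.
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(fake_io("slow", 0.2), fake_io("fast", 0.1))


results = asyncio.run(main())
print(results)
```

Very little Python code runs while the tasks wait, which is why interpreter speed matters little for this kind of workload.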

Slide 5

Slide 5 text

CPU Bound vs I/O Bound (2/2)

● Python uses hardware floating point, but via the Python heap.
● Python uses hardware integers for small integer values, but via the Python heap.

Result: non-trivial numerical activity is slow.
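A small illustration of "via the Python heap", assuming the standard CPython interpreter: every float and integer is a full object, so even simple arithmetic allocates and dereferences objects. The variable names are illustrative only.

```python
import sys

x = 1.5
y = x + 2.0  # the addition is done in hardware, but the result is a new Python object

# A float object carries an object header as well as the 8-byte double itself,
# so it occupies more than the 8 bytes of the raw hardware value.
float_size = sys.getsizeof(x)
print(type(y).__name__, float_size)

# Integers are objects too, and grow beyond one machine word when needed.
small = 1
big = 2 ** 100
small_size = sys.getsizeof(small)
big_size = sys.getsizeof(big)
print(small_size, big_size)
```

The exact sizes depend on the interpreter build, so the sketch prints them instead of assuming particular values.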

Slide 6

Slide 6 text


Slide 7

Slide 7 text


Slide 8

Slide 8 text


Slide 9

Slide 9 text

What is the value of π?

Slide 10

Slide 10 text

Well that's easy, it's…

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Exactly.

Slide 13

Slide 13 text

It's simples.

Александр Орлов, 2009

Slide 14

Slide 14 text

Albeit irrational.

Slide 15

Slide 15 text

Approximating π

● What is its value represented as a floating point number?
● We can only obtain an approximation.
● A plethora of possible algorithms to choose from; a popular one is to employ the following integral equation:

$$\frac{\pi}{4} = \int_{0}^{1} \frac{1}{1+x^{2}} \, dx$$
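The integral equation can be checked by hand, since the integrand is the derivative of the arctangent:

$$
\int_{0}^{1} \frac{1}{1+x^{2}} \, dx
= \Big[ \arctan x \Big]_{0}^{1}
= \arctan 1 - \arctan 0
= \frac{\pi}{4} - 0
= \frac{\pi}{4}
$$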

Slide 16

Slide 16 text

One possible algorithm

● Use quadrature to estimate the value of the integral, which is the area under the curve:

$$\pi \approx \frac{4}{n} \sum_{i=1}^{n} \frac{1}{1+\left(\frac{i-0.5}{n}\right)^{2}}$$

With n = 3 there is not much to do, but potentially lots of error. Use n = 10^7 or n = 10^9?

Embarrassingly parallel.
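The quadrature sum can be sketched as a plain sequential Python loop. This is an illustrative sketch, not the code shown in the talk; the function name `estimate_pi` and the chosen values of n are assumptions.

```python
import math


def estimate_pi(n):
    """Midpoint-rule estimate of pi using n rectangles on [0, 1]."""
    delta = 1.0 / n  # width of each rectangle
    total = 0.0
    for i in range(1, n + 1):
        x = (i - 0.5) * delta  # midpoint of the i-th rectangle
        total += 1.0 / (1.0 + x * x)
    return 4.0 * delta * total


# Print n, the estimate, and the absolute error against math.pi.
for n in (3, 1_000, 100_000):
    estimate = estimate_pi(n)
    print(n, estimate, abs(estimate - math.pi))
```

The loop body runs n times inside the interpreter, so raising n towards 10^7 or 10^9 is where the slowness described earlier becomes visible.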

Slide 17

Slide 17 text

Code!

Slide 18

Slide 18 text

C++, D, Chapel

Slide 19

Slide 19 text

Because addition is commutative and associative, the expression can be decomposed into sums of partial sums.

Slide 20

Slide 20 text

a + b + c + d + e + f = (a + b) + (c + d) + (e + f)
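The regrouping can be tried directly. In this sketch the six values are hypothetical stand-ins for a to f, chosen so that the arithmetic is exact.

```python
# Hypothetical terms standing in for a, b, c, d, e, f.
terms = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

whole = sum(terms)

# Group the terms into pairs, sum each pair, then sum the partial sums.
pairs = [terms[i:i + 2] for i in range(0, len(terms), 2)]
partial_sums = [sum(pair) for pair in pairs]
regrouped = sum(partial_sums)

print(whole, partial_sums, regrouped)
```

With general floating point values the regrouped total can differ from the sequential one in the last few bits, because floating point addition is only approximately associative.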

Slide 21

Slide 21 text

Scatter Gather

● map reduce
● data parallel
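A minimal sketch of the scatter/gather shape applied to the π sum, using `concurrent.futures` from the standard library. The names `partial_sum` and `estimate_pi_scatter_gather`, and the choice of four tasks, are assumptions for illustration. In standard CPython builds the threads share one interpreter lock, so this shows the structure of the computation, not a speed-up; `ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def partial_sum(bounds):
    """Sum the quadrature terms for i in [start, end): one task's share of the work."""
    start, end, delta = bounds
    total = 0.0
    for i in range(start, end):
        x = (i - 0.5) * delta
        total += 1.0 / (1.0 + x * x)
    return total


def estimate_pi_scatter_gather(n, task_count):
    delta = 1.0 / n
    slice_size = n // task_count  # assumes task_count divides n evenly
    # Scatter: one (start, end, delta) triple per task, together covering i = 1 .. n.
    slices = [(1 + k * slice_size, 1 + (k + 1) * slice_size, delta)
              for k in range(task_count)]
    with ThreadPoolExecutor(max_workers=task_count) as executor:
        partials = list(executor.map(partial_sum, slices))  # map
    return 4.0 * delta * sum(partials)  # reduce (gather)


pi_estimate = estimate_pi_scatter_gather(100_000, 4)
print(pi_estimate, abs(pi_estimate - math.pi))
```

Each task touches only its own slice and returns a single number, which is what makes the problem embarrassingly parallel.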

Slide 22

Slide 22 text

Code!

Slide 23

Slide 23 text

C++, D, Chapel

Slide 24

Slide 24 text

The Python data model and its GIL make Python unsuitable for parallel computation.

Slide 25

Slide 25 text

PyPy and NumPy do not help, nor do Cython, Numba, etc., as much as they perhaps should.

Slide 26

Slide 26 text

Native code languages, e.g. C++, D, Chapel, are the way forward for CPU-bound components of a Python-based system.

Slide 27

Slide 27 text

And then there are OpenCL and OpenGL, soon to be replaced by Vulkan.

Slide 28

Slide 28 text

Making Computations Execute Very Quickly

Dr Russel Winder
email: [email protected]
twitter: @russel_winder
Web: http://www.russel.org.uk