Slide 1


Use Case / Experiment TsundereChen

Slide 2

Test Environment • Vagrant, Ubuntu 16.04 • Benchmark results on the host OS and the guest OS are very close, so I use a VM to get the results • (By the way, it is really easy to get your VM dirty)

Slide 3

Test Version • CPython2 2.7.12 • CPython3 3.5.2 • PyPy2 5.1.2 (installed from apt-get) • PyPy2 5.6.0 (compiled from source) • PyPy3 5.7.1 (compiled from source)

Slide 4

Why only CPython & PyPy? • Cython • You need to learn Cython's syntax, which mixes C and Python • Jython • The latest Jython release, 2.7.0, dates from May 2015, so it is outdated

Slide 5

Some notes • PyPy3 is still in beta, so it is no surprise if it is slower than CPython 3 • Also, not every module runs faster on PyPy than on CPython; there will be examples later

Slide 6

Why is PyPy3 in beta? • CPython and PyPy are developed in different ways • CPython • Focuses on Python 3 and maintains Python 2 only when a security issue pops up • PyPy • Focuses on PyPy2; PyPy3 is also updated, but it is not the main line of development

Slide 7


How to run PyPy

Slide 8

PyPy Installation • pypy.org/download.html • If you download a binary • run bin/pypy

Slide 9

PyPy Installation • If you want to compile PyPy from scratch • First, install the dependencies • http://doc.pypy.org/en/latest/build.html • Then cd to pypy/goal

Slide 10

PyPy Installation • No JIT • ../../rpython/bin/rpython --opt=2 • JIT enabled • ../../rpython/bin/rpython --opt=jit

Slide 11

PyPy Installation • Notes • Compiling PyPy takes a lot of time, and compiling with the JIT enabled takes even more • It usually takes 30 minutes or more • You also need at least 4 GB of RAM to compile it on a 64-bit machine; make sure you have enough RAM, or the build may be killed by the system

Slide 12


Mandelbrot — For Fun

Slide 13


Benchmark result

Slide 14


gcbench

Slide 15


json_bench

Slide 16


django_template

Slide 17


nqueens

Slide 18


regex_v8

Slide 19


richards

Slide 20


scimark

Slide 21


sqlalchemy_declarative

Slide 22


sqlalchemy_imperative

Slide 23


So, why do we still need CPython?

Slide 24

Not every case should use PyPy • For example, with the code below, CPython is faster than PyPy

 myStr = ""
 for x in xrange(1, 10**6):
     myStr += str(x)
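The effect can be measured directly. This is an illustrative sketch, not from the original slides: build_string is a hypothetical name for the loop above, and range is used so it runs on Python 3.

```python
import timeit

def build_string(n):
    # Naive string concatenation. CPython can often resize the string
    # in place when it holds the only reference, while PyPy's immutable
    # strings historically forced a copy on every +=, making this slow.
    s = ""
    for x in range(1, n):
        s += str(x)
    return s

elapsed = timeit.timeit(lambda: build_string(10**4), number=10)
print("10 runs of build_string(10**4): %.3fs" % elapsed)
```

Run this under both interpreters (remembering JIT warm-up, discussed later) to reproduce the comparison.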

Slide 25

Can it run on PyPy? • http://packages.pypy.org/

Slide 26

Enough benchmarks, let's get to the DSL

Slide 27

Example • Language: Brainf*ck • 8 commands
 +  mem[ptr] += 1        -  mem[ptr] -= 1
 <  ptr -= 1             >  ptr += 1
 ,  mem[ptr] = input()   .  print(mem[ptr])
 [  while (mem[ptr]) {   ]  }
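The eight commands above fit in a few dozen lines of plain Python. This is a minimal sketch, not the repo's actual code: run_bf is a hypothetical name, and a 30000-cell tape with byte wraparound is an assumption.

```python
def run_bf(code, input_bytes=b""):
    """Tiny Brainf*ck interpreter; returns the program's output as bytes."""
    # Precompute matching brackets so [ and ] can jump in O(1)
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    mem = [0] * 30000          # the tape; cells wrap around at 256
    ptr = pc = pos = 0
    out = bytearray()
    while pc < len(code):
        c = code[pc]
        if c == '+':
            mem[ptr] = (mem[ptr] + 1) % 256
        elif c == '-':
            mem[ptr] = (mem[ptr] - 1) % 256
        elif c == '>':
            ptr += 1
        elif c == '<':
            ptr -= 1
        elif c == '.':
            out.append(mem[ptr])
        elif c == ',':
            mem[ptr] = input_bytes[pos] if pos < len(input_bytes) else 0
            pos += 1
        elif c == '[' and mem[ptr] == 0:
            pc = jumps[pc]     # skip the loop body
        elif c == ']' and mem[ptr] != 0:
            pc = jumps[pc]     # jump back to the loop head
        pc += 1
    return bytes(out)
```

For example, run_bf("+++[->++++<]>.") multiplies 3 by 4 in the second cell and outputs that byte.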

Slide 28


Repo for Brainf*ck experiment • https://github.com/TsundereChen/bf_to_py

Slide 29

Our Goal • Build a Brainf*ck interpreter • Build a Brainf*ck-to-Python translator, and compile the result with PyPy

Slide 30

Interpreter • Just read in the file and execute the commands • But we can add a JIT here

Slide 31

What to do to add a JIT • We need to identify the "reds" and the "greens" • Greens -> the variables that define where we are in the interpreted program (the instructions) • Reds -> everything that is being manipulated at runtime

Slide 32

What to do to add a JIT • from rpython.rlib.jit import JitDriver • jitdriver = JitDriver(greens=[], reds=[]) • then add a jit_merge_point call to your main loop
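Put together, the hints look like the sketch below, following the PyPy tutorial's green/red choices for a Brainf*ck interpreter. Outside a PyPy source checkout the rpython package is not importable, so a no-op stand-in keeps the sketch runnable on plain CPython, where the hints simply do nothing.

```python
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        # No-op stand-in so the sketch runs outside the RPython toolchain
        def __init__(self, greens=None, reds=None):
            pass
        def jit_merge_point(self, **kwargs):
            pass

jitdriver = JitDriver(greens=['pc', 'program'],  # identify a position in the code
                      reds=['ptr', 'mem'])       # runtime state being mutated

def mainloop(program):
    pc, ptr, mem = 0, 0, [0] * 30000
    while pc < len(program):
        # Tell the JIT this is the top of the interpreter's main loop
        jitdriver.jit_merge_point(pc=pc, program=program, ptr=ptr, mem=mem)
        op = program[pc]
        if op == '+':
            mem[ptr] += 1
        elif op == '>':
            ptr += 1
        # ... the remaining six opcodes are elided here ...
        pc += 1
    return mem, ptr
```

Translating this with --opt=jit is what turns the hints into an actual tracing JIT.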

Slide 33

Difference • Hmm... not very good, right? • Look at the seconds: it's 2.73… vs. 2.61…

Slide 34

The JIT alone is not enough...
 How about some optimizations?

Slide 35

Optimize • Speed up the loop • Every iteration looks up an address in a dictionary, but that dictionary is static, so we can mark the lookup function with the @elidable decorator to speed it up
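In the tutorial's interpreter, the static dictionary is the bracket map. A sketch of the @elidable helper, with a fallback identity decorator so it also runs outside the RPython toolchain:

```python
try:
    from rpython.rlib.jit import elidable
except ImportError:
    def elidable(func):
        # Plain-CPython stand-in: the hint only matters under RPython
        return func

@elidable
def get_matching_bracket(bracket_map, pc):
    # bracket_map never changes after parsing, so the JIT is allowed to
    # constant-fold this call inside a trace instead of repeating the
    # dictionary lookup on every loop iteration.
    return bracket_map[pc]
```

@elidable is a promise to the JIT that the function is pure for given arguments; breaking that promise leads to wrong results, not just slow code.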

Slide 36

Difference • Hmm... better!


Slide 40

Okay... enough about the interpreter
 Let's talk about the compiler

Slide 41

Basic Knowledge • The compiler reads in a Brainf*ck file, then turns it into an IR • You can then optionally optimize the IR • Finally, the IR is turned into Python code, which is compiled with PyPy into a binary file • Pipeline: Brainf*ck code -> IR -> Python code -> binary file

Slide 42

Architecture • ir.py -> Brainf*ck to IR, and IR to Python • trans.py -> the main program • python trans.py • optmode: 1 enables optimization, 0 disables it • opt.py -> the optimization tricks

Slide 43

Optimizations • opt_contract (contraction) • An operation like "+++++" means we have to do "mem[p] += 1" five times • But because we have an IR, we can change the instruction to "mem[p] += 5" • This trick applies to "+ - > <"
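A contraction pass can be sketched in a few lines; this is illustrative only, and the repo's actual opt_contract may differ. Here the IR is assumed to be (opcode, count) tuples:

```python
from itertools import groupby

def opt_contract(code):
    """Collapse runs of +, -, >, < into single (op, count) IR instructions."""
    ir = []
    for op, run in groupby(code):
        n = len(list(run))
        if op in "+-><":
            ir.append((op, n))                    # "+++++" becomes ('+', 5)
        else:
            ir.extend((op, 1) for _ in range(n))  # other ops stay unit-sized
    return ir
```

Loops and I/O commands must not be merged, which is why only "+ - > <" are contracted.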

Slide 44

Optimizations • opt_clearloop (clear loop) • A command like [-] means: while (mem[p]) do mem[p] -= 1 • We know what the result is, so we can set mem[p] to zero directly
 mem[p] = 0
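As a sketch of the clear-loop rewrite (not the repo's actual code), replace the idiom with a single invented IR token; here 'Z' stands for mem[p] = 0:

```python
def opt_clearloop(code):
    # [-] zeroes the current cell one decrement at a time; emit a single
    # "zero this cell" token instead. [+] also zeroes the cell because
    # cells wrap around at 256, so it gets the same rewrite.
    return code.replace("[-]", "Z").replace("[+]", "Z")
```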

Slide 45

Optimizations • opt_multiloop & opt_copyloop (multiplication and copy) • A command like [->+>+<<] copies mem[p]'s value to mem[p+1] and mem[p+2], then sets mem[p] to zero • If we know what the loop is doing, we can shorten it

Slide 46

Optimizations • opt_multiloop & opt_copyloop (multiplication and copy) • The same trick applies to [->++<]: make
 mem[p+1] = 2 * mem[p] and set mem[p] = 0 • Which is a multiplication
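A pattern matcher for the simplest multiplication shape can be sketched as below. This is illustrative only: mul(N) is an invented IR token meaning mem[p+1] += N * mem[p]; mem[p] = 0, and the repo's opt_multiloop/opt_copyloop handle more shapes than this.

```python
import re

def opt_multiloop(code):
    # Recognize loops of the exact shape [->N+<] and rewrite each as a
    # single mul(N) token; anything else is left untouched.
    return re.sub(r"\[->(\++)<\]",
                  lambda m: "mul(%d)" % len(m.group(1)),
                  code)
```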

Slide 47

Optimizations • opt_offsetops (operation offsets) • In Brainf*ck, a pointer indicates where we are, and that pointer usually moves a lot • If we compute an offset for each instruction directly, we don't need to move the pointer around
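The offset idea can be sketched for a straight-line block (no loops or I/O); this is an assumption-laden illustration, not the repo's actual opt_offsetops. Pointer moves are folded into per-instruction offsets, with one 'move' at the end:

```python
def opt_offsetops(ops):
    """Attach offsets to ops so the pointer moves at most once per block."""
    offset = 0
    out = []
    for op in ops:
        if op == '>':
            offset += 1
        elif op == '<':
            offset -= 1
        else:
            out.append((op, offset))    # e.g. ('+', 2) means mem[p+2] += 1
    if offset:
        out.append(('move', offset))    # single net pointer move
    return out
```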

Slide 48

Optimizations • opt_cancel (cancel instructions) • ++++-->>+-<<< does the same thing as ++< • So why waste time on these instructions?
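Cancellation of adjacent opposing commands can be sketched by rewriting to a fixed point; this is an illustration, not the repo's actual opt_cancel:

```python
def opt_cancel(code):
    """Delete adjacent opposing pairs (+- -+ >< <>) until nothing changes."""
    pairs = ["+-", "-+", "><", "<>"]
    prev = None
    while prev != code:
        prev = code
        for p in pairs:
            code = code.replace(p, "")
    return code
```

Adjacent pairs always act on the same cell (or undo the same move), so each deletion preserves the program's meaning.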

Slide 49

Can it run faster? Yes: with a JIT

Slide 50


Result


Slide 54

Great! So I'll use the JIT from now on

Slide 55

Wait a sec... • Not every case can use a JIT • A JIT needs to warm up and analyze the code
 The warm-up can take more time than your code actually runs • And it is important not to record the warm-up time when you do benchmarking

Slide 56

Wait a sec... • And do you really need a JIT? • Introducing a JIT into a project can cost a lot • Sometimes buying more servers is a better choice than introducing a JIT into your project

Slide 57

Wait a sec... • But if you have analyzed your project and know how difficult it would be to introduce PyPy and the JIT, then you're good to go! • By the way, the executable built with the JIT enabled is bigger than the one built without it

Slide 58

Questions?

Slide 59

References • Tutorial: Writing an Interpreter with PyPy, Part 1 • https://morepypy.blogspot.tw/2011/04/tutorial-writing-interpreter-with-pypy.html • PyPy - Tutorial for Brainf*ck Interpreter • http://wdv4758h.github.io/posts/2015/01/pypy-tutorial-for-brainfuck-interpreter/ • matslina/bfoptimization • https://github.com/matslina/bfoptimization/ • Virtual Machine Constructions for Dummies • https://www.slideshare.net/jserv/vm-construct