
Optimizing Python programs, PyPy to the rescue by Richard Plangger

Pycon ZA
October 06, 2016


In this talk I want to show how you can use PyPy to your benefit. It will kick off with a short introduction covering PyPy and its just-in-time compiler. PyPy is the most advanced Python interpreter around, and while it should generally just speed up your programs, the performance you get out of PyPy can vary widely.

Throughout the talk, statements from developers and examples of big applications will motivate why PyPy is a viable option for optimizing your Python programs. In addition, I will present the value a company gained after switching to PyPy.

The first part will cover why one should write programs in Python and spend only a fraction of the development time on optimizing them. The second part of this session is about that small fraction of time: in cases where you need it, I'll show tools that help you inspect and change your program to improve it. We will also dive into one tool in more detail: VMProf, a platform to inspect your program while it is running, imposing very little overhead.

As a result of this talk, audience members should be equipped with tools that help them understand performance issues and optimize programs.


Transcript

  1. MORE "GENERAL" PYPY TALK

    Goals: an approach to optimize Python programs; examples; how not to start optimizing; what is PyPy up to now?
  2. PYPY IS A ...

    ... fast virtual machine for Python, developed by researchers, freelancers and many contributors.
  3. $ python yourprogram.py

    $ pypy yourprogram.py
  4. ABOUT ME

    Working on PyPy (1.5+ years); master thesis → GSoC 2015 → PyPy; living and working in Austria
  5. FOR EXAMPLE?

    CPU time, peak heap memory, requests per second, latency, ... Dissatisfaction with one criterion of your program!
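CPU time and latency, two of the criteria above, are not the same measurement: time spent waiting (on I/O, locks or sleep) adds to latency but not to CPU time. A minimal standard-library sketch that reports both for a single call; the helper name `measure` is hypothetical.

```python
import time


def measure(func, *args, **kwargs):
    """Call func once and return (result, wall_seconds, cpu_seconds).

    Wall-clock time approximates latency; process CPU time counts only the
    time this process actually spent computing.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


if __name__ == "__main__":
    # Sleeping stands in for blocking I/O: it costs latency, not CPU time.
    _, wall, cpu = measure(time.sleep, 0.05)
    print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")

    # A single-threaded computation uses roughly as much CPU time as wall time.
    _, wall, cpu = measure(sum, range(2_000_000))
    print(f"sum:   wall={wall:.3f}s cpu={cpu:.3f}s")
```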
  6. a = 3  # O(1)

       [x + 1 for x in range(n)]  # O(n)
       [[x + y for x in range(n)]
        for y in range(m)]  # O(n*m), which is O(n) only if m is bounded by a constant
  7. ONLY OPTIMIZE A ROUTINE IF ...

    ... you know that its complexity cannot be reduced any further
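As an illustration of reducing complexity before tuning anything (this example is not from the talk): testing membership in a list is O(n), so doing it once per element makes a loop O(n²), while a set makes each lookup O(1) on average and the loop O(n). Both function names are hypothetical.

```python
def has_duplicates_quadratic(items):
    """O(n^2): each `in` test scans the list of items seen so far."""
    seen = []
    for item in items:
        if item in seen:
            return True
        seen.append(item)
    return False


def has_duplicates_linear(items):
    """O(n) on average: set membership is a hash lookup.

    Unlike the list version, this requires hashable items.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

For large inputs, switching to the second version changes the growth rate itself, which no interpreter speedup applied to the first version can match.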
  8. Written in Python; moved to vmprof.com

    Log files can easily take up to 40 MB uncompressed; takes ~10 seconds to parse with CPython; complexity is linear in the size of the log file
  9. Takes too long to parse; parsing is done on each request

    Our criteria: CPU time too long + requests per second (many objects are allocated)
  10. Caching: easily done with your favourite caching framework

    Reduce CPU time: PyPy seems to be good at that?
  11. LET'S RUN IT...

    $ cpython2.7 parse.py 40mb.log    ~10 seconds
    $ pypy2 parse.py 40mb.log         ~2 seconds
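A comparison like this can be reproduced by running the same script under both interpreters and letting it report which one it runs on. The sketch below assumes a hypothetical allocation-heavy `workload` in place of `parse.py` and takes the best of several runs with `timeit.repeat`, since PyPy's earlier runs include JIT warm-up time.

```python
import platform
import timeit


def workload():
    # Hypothetical CPU-bound stand-in for parse.py: builds many small objects,
    # similar in spirit to a parser that allocates a lot.
    records = [{"id": i, "value": i * 2} for i in range(200_000)]
    return sum(r["value"] for r in records)


def best_time(func, repeat=5):
    """Best wall-clock time over several single runs of func."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


if __name__ == "__main__":
    print(f"{platform.python_implementation()} {platform.python_version()}: "
          f"{best_time(workload):.3f}s")
```

Saved under any file name, it can be run with `python3` and with `pypy3`; the reported times depend on the machine and interpreter versions. Taking the minimum reflects steady-state speed; if startup and warm-up matter for your program, time a single cold run instead.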
  12. VMPROF

    $ pip install vmprof
    $ python -m vmprof --web parse.py
    → link
  13. A SIMPLIFIED VIEW

    1. Start interpretation
    2. Loops trigger recording
    3. Optimization stage
    4. Machine code generation
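The four stages above can be illustrated with a deliberately tiny model. This is not how PyPy is built (PyPy's JIT traces its RPython interpreter and emits real machine code); the instruction set, the `HOT_THRESHOLD` value and all function names are hypothetical, "machine code generation" is imitated by generating Python source and compiling it with `exec`, and only simple, non-nested loops are handled.

```python
# Toy model of a tracing JIT, not PyPy's implementation. Instructions are
# tuples run by a tiny interpreter over a dict of registers.
HOT_THRESHOLD = 2  # hypothetical; real JITs use much larger thresholds


def execute(op, regs):
    """Interpret one non-branch instruction."""
    if op[0] == "add":        # ("add", dst, constant)
        regs[op[1]] += op[2]
    elif op[0] == "add_reg":  # ("add_reg", dst, src)
        regs[op[1]] += regs[op[2]]
    # ("nop",) does nothing


def optimize(trace):
    """Stage 3: drop no-ops and fold adjacent constant adds to one register."""
    out = []
    for op in trace:
        if op[0] == "nop":
            continue
        if op[0] == "add" and out and out[-1][0] == "add" and out[-1][1] == op[1]:
            out[-1] = ("add", op[1], out[-1][2] + op[2])
        else:
            out.append(op)
    return out


def compile_trace(trace, exit_pc):
    """Stage 4 (toy): emit and compile Python source instead of machine code."""
    lines = ["def loop(regs):", "    while True:"]
    for op in trace:
        if op[0] == "add":
            lines.append(f"        regs[{op[1]!r}] += {op[2]!r}")
        elif op[0] == "add_reg":
            lines.append(f"        regs[{op[1]!r}] += regs[{op[2]!r}]")
        elif op[0] == "loop_lt":  # guard: leave the loop when it would exit
            lines.append(f"        if not regs[{op[1]!r}] < regs[{op[2]!r}]:")
            lines.append(f"            return {exit_pc}")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["loop"]


def interpret(program, regs):
    """Stage 1: interpret; hot backward jumps trigger recording (stage 2)."""
    counters, compiled = {}, {}
    recording, trace = None, []
    pc = 0
    while pc < len(program):
        if recording is None and pc in compiled:
            pc = compiled[pc](regs)  # run compiled loop until its guard fails
            continue
        op = program[pc]
        if recording is not None:
            trace.append(op)
        if op[0] == "loop_lt":  # ("loop_lt", reg, limit_reg, target)
            _, reg, limit, target = op
            taken = regs[reg] < regs[limit]
            if recording == target:  # one full iteration has been recorded
                compiled[target] = compile_trace(optimize(trace), pc + 1)
                recording, trace = None, []
            elif taken and target < pc:  # backward jump: count iterations
                counters[target] = counters.get(target, 0) + 1
                if counters[target] == HOT_THRESHOLD:
                    recording = target
            pc = target if taken else pc + 1
        else:
            execute(op, regs)
            pc += 1
    return regs, set(compiled)


if __name__ == "__main__":
    program = [
        ("add_reg", "acc", "i"),   # 0: acc += i
        ("nop",),                  # 1: dead instruction, removed by optimize()
        ("add", "i", 1),           # 2: i += 1
        ("loop_lt", "i", "n", 0),  # 3: if i < n: jump back to 0
    ]
    regs, hot = interpret(program, {"acc": 0, "i": 0, "n": 10})
    print(regs["acc"], hot)  # 45 {0}
```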
  14. JITVIEWER

    Tool to inspect PyPy internals; helps you learn and understand PyPy; provided at vmprof.com
  15. PROPERTIES & TRICKS

    Type specialization; object unboxing; GC scheme; dicts; dynamic class creation (instance maps); function calls (+ inlining)
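The "instance maps" entry refers to PyPy sharing attribute-layout information between objects that gain the same attributes in the same order, an idea similar to hidden classes in JavaScript engines. The toy model below, with hypothetical class and method names, shows why attribute order matters; it is a sketch of the technique, not PyPy's code.

```python
class Map:
    """Shared layout: attribute name -> slot index, plus transitions."""

    def __init__(self, attrs=()):
        self.attrs = {name: i for i, name in enumerate(attrs)}
        self.transitions = {}  # attribute name -> successor Map

    def with_attr(self, name):
        # Every object that adds `name` next gets the same successor Map,
        # so objects built the same way end up sharing one Map instance.
        if name not in self.transitions:
            self.transitions[name] = Map(list(self.attrs) + [name])
        return self.transitions[name]


EMPTY_MAP = Map()


class Obj:
    """Stores attribute values in a flat list, indexed through its Map."""

    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def set_attr(self, name, value):
        index = self.map.attrs.get(name)
        if index is None:  # new attribute: move to the successor Map
            self.map = self.map.with_attr(name)
            self.storage.append(value)
        else:
            self.storage[index] = value

    def get_attr(self, name):
        return self.storage[self.map.attrs[name]]


if __name__ == "__main__":
    a, b, c = Obj(), Obj(), Obj()
    a.set_attr("x", 1); a.set_attr("y", 2)
    b.set_attr("x", 3); b.set_attr("y", 4)
    c.set_attr("y", 5); c.set_attr("x", 6)
    print(a.map is b.map, a.map is c.map)  # True False
```

The practical advice that usually follows from this is to assign all attributes in `__init__`, in the same order, so that instances of a class share one map and a JIT can turn attribute lookups into fixed slot accesses.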
  16. Q: WHAT DOES YOUR SERVICE DO?

    A: ... allow generally large companies to send targeted marketing (e.g. serve ads) to people based on data we have learned
  17. Q: PYPY, WHERE WAS IT MOST HELPFUL?

    A: ... ~30% speedups immediately from switching to PyPy ...
  18. Q: PYPY ISSUES?

    A: ... we had to solve for rolling deploys ... but that's ok, that's fairly easy ...
  19. Q: VALUE TO YOUR COMPANY?

    A: Latency speedup was somewhere around 10% ... But that number is deceiving. It's very valuable for us, obviously. But it's only 10%, because even this app that I'm talking about, which is fairly high volume (500,000 QPS), is a WSGI app. So it spends lots of time blocking.
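The answer above is an instance of Amdahl's law: speeding up only the CPU-bound part of a request cannot remove the time spent blocking. A small helper makes the arithmetic explicit; `latency_after` is a hypothetical name, and the numbers in the example are made up, not the company's figures.

```python
def latency_after(cpu_fraction, cpu_speedup):
    """Relative latency after speeding up only the CPU-bound share of a request.

    Amdahl's law: new = (1 - p) + p / s, where p is the CPU-bound fraction of
    the original latency and s is the speedup of that part.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return (1.0 - cpu_fraction) + cpu_fraction / cpu_speedup


if __name__ == "__main__":
    # Hypothetical: half of the request time is CPU work, which becomes 2x faster.
    new = latency_after(0.5, 2.0)
    print(f"latency reduced by {1 - new:.0%}")  # latency reduced by 25%
```

The same CPU speedup yields a smaller latency gain the more of each request is spent waiting, which is why a WSGI app that blocks a lot sees only a modest latency improvement.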