
Python Profiling and Performance - Elementary to Enterprise

Delivered at PyBay 2016 (http://pybay.com/) on August 21, 2016 in UCSF Mission Bay's Convention Center.

If you enjoyed this, you will likely enjoy my course on Enterprise Software with Python, published by O'Reilly:


Shortform on Twitter: @mhashemi
Longform on my blog: http://sedimental.org
Code on GitHub: https://github.com/mahmoud

See you next year!

Mahmoud Hashemi

August 21, 2016

  1. About Me • Lead Developer, PayPal Python Infrastructure • Python

    programmer since 2009 • Built services taking 10¹ to 10⁹ requests per day • Enjoys: ◦ Coding open-source Python ◦ Reading Wikipedia ◦ occu.py-ing PayPal • @mhashemi & github.com/mahmoud
  2. Python Profiling and Performance - Mahmoud Hashemi Profiling and performance

    The engineer’s delight and/or doom. • Types of performance • Ground rules • Profiling tools • Scaling strategies GOTTA GO FAST
  3. Python Profiling and Performance - Mahmoud Hashemi Performance types What

    is fast? 1. Latency - Response time “200 millisecond client roundtrip” 2. Throughput - Successful traffic flow “200 requests per second” 3. Efficiency - Utilization and return on investment “2000 users per 4-core VM at 50% CPU” X. Scalability - Not a type of performance Complex, and often more about scaling people than software
  4. Python Profiling and Performance - Mahmoud Hashemi Safety first! Always

    wear your seatbelt, helmet, and check your mirrors: 1. Predictability is power 2. Good work takes cycles 3. Abide by Amdahl’s Law Performance ground rules
  5. Python Profiling and Performance - Mahmoud Hashemi Predictability is power

    • Automated tests first, then benchmarks • Test after every optimization • Remember, optimized code is: ◦ Harder to write and read ◦ Less maintainable ◦ Buggier, more brittle • See CPython’s consistency: ◦ Fast startup ◦ No JIT, no warmup ◦ No complex GC Performance ground rule #1
  6. Python Profiling and Performance - Mahmoud Hashemi Good work takes

    cycles • Healthy enterprise applications have big bones • Gather requirements ◦ Security ◦ Instrumentation ◦ Compatibility • Establish percentile SLAs: 50, 95, 99, max • Stick to the budget and put down the ping pong paddles • Good enough is good enough! Performance ground rule #2
  7. Python Profiling and Performance - Mahmoud Hashemi Abide by Amdahl’s

    Law • Speedups are relative to task significance • Keep perspective, maximize impact • Focus on one part at a time • Recheck proportions after every optimization Performance ground rule #3 https://commons.wikimedia.org/wiki/File:Optimizing-different-parts.svg
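Amdahl's Law can be stated as overall speedup = 1 / ((1 - p) + p / s), where p is the fraction of runtime affected and s is the local speedup; a quick worked illustration (not from the talk):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# Making 10% of the program 100x faster barely moves the needle...
print(round(amdahl_speedup(0.10, 100), 2))   # ~1.11
# ...while a 2x speedup on 80% of the runtime helps far more.
print(round(amdahl_speedup(0.80, 2), 2))     # ~1.67
```

This is why rechecking proportions after every optimization matters: p changes each time you speed something up.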
  8. Python Profiling and Performance - Mahmoud Hashemi Performance ground rules

    Always wear your seatbelt, helmet, and check your mirrors: 1. Predictability is power 2. Good work takes cycles 3. Abide by Amdahl’s Law Recap
  9. Python Profiling and Performance - Mahmoud Hashemi Python profiling tools

    1. Casual profiling ◦ time.time() ◦ timeit module 2. Offline profiling ◦ cProfile ◦ Other advanced options 3. Online profiling
  10. Python Profiling and Performance - Mahmoud Hashemi time.time() Trusty, print-based

    debugging will never die:

        import time
        start = time.time()
        do_work()
        print time.time() - start, 'seconds'

    Casual Python profiling: easy to use and explain. But:
    • A single measurement might misrepresent
    • The expense of measuring can exceed that of the operation measured
    • Must switch to time.clock() on Windows for better resolution
    • Gets tedious
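On Python 3, time.perf_counter() sidesteps the Windows time.clock() caveat entirely; a small reusable sketch, where the timed() helper and its label are illustrative, not from the talk:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # perf_counter() is monotonic and high-resolution on all
    # platforms, so no per-OS special-casing is needed
    start = time.perf_counter()
    try:
        yield
    finally:
        print('%s: %.6f seconds' % (label, time.perf_counter() - start))

with timed('sum'):
    total = sum(range(1000000))
```

The context manager also cuts down on the tedium: the timing boilerplate lives in one place.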
  11. Python Profiling and Performance - Mahmoud Hashemi The timeit module

    Built-in, and fixes the main problems of time.time()-based profiling:
    • Does multiple runs to offset system variability
    • Each run does thousands or millions of repetitions, compensating for very fast operations
    • Automatically uses the platform's best timing function
    • Multiple easy usage patterns

    Command line:

        $ python -m timeit -s "import json" "json.dumps({})"
        1000000 loops, best of 3: 1.73 usec per loop

    Python:

        import timeit
        print timeit.repeat('json.dumps({})', setup='import json',
                            number=1000000, repeat=3)

    Casual Python profiling

    Other timeit facts:
    • It disables garbage collection for even more consistency
    • It represents one of the few valid uses of Python code generation
    • Jupyter notebook has built-in support (%%timeit)
    • Less than 300 lines of code

    Notable successor: perf
    • Runs separate processes
    • Somewhat more statistical
    • Still easy to use
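For programmatic use on Python 3.6+, Timer.autorange() picks the repetition count for you; a small sketch reusing the slide's json.dumps target:

```python
import timeit

t = timeit.Timer('json.dumps({})', setup='import json')

# autorange() loops until the total run takes at least 0.2 seconds,
# then reports (number_of_loops, total_seconds)
number, total = t.autorange()
per_loop = total / number
print('%d loops, %.3g sec per loop' % (number, per_loop))

# As on the slide, take the best of several repeats to reduce noise
best = min(t.repeat(number=number, repeat=3)) / number
```

Taking the minimum of repeats (rather than the mean) is the conventional choice, since system noise only ever makes timings slower.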
  12. Python Profiling and Performance - Mahmoud Hashemi The cProfile module

    Simpler tools assume you know which code you want to measure, whereas the built-in cProfile module:
    • Measures a whole thread of execution
    • Identifies time-consuming functions
    • Command-line and programmatic interfaces

    Command line:

        $ python -m cProfile target_code.py

    Python:

        import cProfile, pstats
        pr = cProfile.Profile()
        pr.enable()
        # call the code you want to measure
        pr.disable()
        pstats.Stats(pr).sort_stats('time').print_stats()

    Offline Python profiling

        624865 function calls (517453 primitive calls) in 0.289 seconds

        Ordered by: internal time
        List reduced from 44 to 10 due to restriction <10>

        ncalls     tottime  percall  cumtime  percall  filename:lineno(function)
        2989/7     0.054    0.000    0.186    0.027    serdes.py:365(vo_tree)
        12618/45   0.040    0.000    0.106    0.002    serdes_bin.py:78(append_field)
        16393      0.037    0.000    0.044    0.000    serdes.py:315(_uniq_field)
        84450/48   0.028    0.000    0.186    0.004    serdes.py:348(field_tree)
        84450      0.023    0.000    0.037    0.000    {getattr}
        68389      0.014    0.000    0.014    0.000    field.py:116(__get__)
        26182      0.012    0.000    0.017    0.000    serdes_bin.py:128(append_int)
        2757       0.009    0.000    0.010    0.000    serdes.py:336(_uniq_vo)
        2489/43    0.008    0.000    0.105    0.002    serdes_bin.py:205(append_vo)
        5103/94    0.007    0.000    0.103    0.001    serdes_bin.py:192(append_list)

    The cProfile + pstats text output is utilitarian, but interesting visualizations can be created with RunSnakeRun, SnakeViz, Gprof2dot, pyprof2calltree, and pyinstrument.
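On Python 3.8+, Profile is also a context manager, which trims the enable()/disable() boilerplate; a minimal sketch, where the profiled sort call is just an illustrative stand-in:

```python
import cProfile
import io
import pstats

# Python 3.8+ context-manager form of the enable/disable pattern
with cProfile.Profile() as pr:
    sorted(range(100000), key=lambda x: -x)

# Capture the report into a string instead of stdout
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('cumulative').print_stats(5)
print(buf.getvalue())
```

Capturing into a StringIO is handy when the report should go to a log rather than the terminal.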
  13. Python Profiling and Performance - Mahmoud Hashemi Advanced tooling Offline

    Python profiling

    • line_profiler
      ◦ Line-by-line profiling inside of functions
    • yep
      ◦ Function-oriented profiler that crosses the Python-C boundary
    • GreenletProfiler
      ◦ A concurrency-friendly profiler for applications using greenlet
    • memory_profiler
      ◦ Process and line-by-line memory consumption measurement

    A variety of measurement and tuning tools can be built out of the following modules:
    • sys.setprofile()
    • sys.settrace()
    • sys._getframe()
    • gc
    • resource
    • psutil (not built-in)

    Just run a search for “python <module>” and check out the docs. Also, the built-in dis module works well for getting the most out of line_profiler.
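As an illustration of how far those building blocks go, here is a minimal call-counting profiler built on sys.setprofile(); the fib() workload is just a stand-in:

```python
import sys
from collections import Counter

call_counts = Counter()

def count_calls(frame, event, arg):
    # setprofile() fires on Python call/return and C call/return
    # events; here we tally only Python-level calls
    if event == 'call':
        code = frame.f_code
        call_counts['%s:%s' % (code.co_filename, code.co_name)] += 1

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

sys.setprofile(count_calls)
try:
    fib(10)
finally:
    sys.setprofile(None)  # always uninstall the hook

for name, count in call_counts.most_common(3):
    print(count, name)
```

cProfile is essentially an industrial-strength C implementation of this same hook.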
  14. Python Profiling and Performance - Mahmoud Hashemi Online profiling Offline

    profiling involves hypothetical scenarios, historical data, and performance sacrifice. Online profiling offers: • Imperceptible performance impact • Live data collection, in real time Online profiling requires more time to be accurate. Still, ending speculation is worth the work. At PayPal we use sampro for its lightweight online collection process. We also use lithoxyl for semantic instrumentation, including performance measurements. github.com/doublereedkurt/sampro github.com/mahmoud/lithoxyl
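For flavor, here is a toy Unix-only sampling profiler built on interval timers; it sketches the general idea behind lightweight online collectors, not how sampro actually works internally:

```python
import signal
from collections import Counter

samples = Counter()

def sample(signum, frame):
    # Record the currently executing function; over many samples,
    # hot functions dominate the counts
    samples[frame.f_code.co_name] += 1

# Sample CPU time ~100 times per second via SIGPROF (Unix-only)
signal.signal(signal.SIGPROF, sample)
signal.setitimer(signal.ITIMER_PROF, 0.01, 0.01)

def busy():
    x = 0
    for i in range(5000000):
        x += i * i
    return x

busy()
signal.setitimer(signal.ITIMER_PROF, 0)  # stop sampling
print(samples.most_common(3))
```

Because the program only pays for the occasional signal, overhead stays near-imperceptible, which is exactly the online-profiling trade-off described above: cheap, but needing time to accumulate accurate counts.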
  15. Python Profiling and Performance - Mahmoud Hashemi Scaling strategies There

    are eight ways to scale Python projects. 1. Add more hardware 2. Rearchitect to divide work 3. Adopt the asynchronous approach 4. Use a smarter algorithm 5. Write faster Python 6. Build native Python extensions 7. Use a library with a faster implementation 8. Use a different Python runtime At PayPal we’ve used all 8 of these, and most multi-engineer projects will use at least four or five.
  16. Python Profiling and Performance - Mahmoud Hashemi Add more hardware

    Python scaling strategy #1 Adding more hardware is something enterprises often do best. It’s the obvious choice, and is either a “happy path” solution or a last-ditch desperation move. The Good • Solves certain problems • Easy to explain • The essence of scalability The Bad • Only solves certain problems • Provisioning & deployment must scale • Budget limits If adding more machines solves all your problems in a reliable and cost-effective manner, congratulations on achieving scalability!
  17. Python Profiling and Performance - Mahmoud Hashemi Rearchitect to divide

    work Python scaling strategy #2 Whether split across machines or CPUs, reworking your application to redistribute the work can make sense technically and organizationally. The Good • Easy-to-explain principle • Many SOA technologies The Bad • Easy to mess up ◦ Short-term correctness ◦ Long-term extensibility • “Now you have n² problems” Scaling through abstractions like services and queues is such a common enterprise practice that there are bound to be replicable success stories within your organization.
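At the smallest scale, dividing work means fanning independent units out to worker processes; a minimal multiprocessing sketch, where crunch() is a hypothetical stand-in for real CPU-heavy work:

```python
from multiprocessing import Pool

def crunch(n):
    # Stand-in for a CPU-heavy, independent unit of work
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # Independent units mapped across processes sidestep the GIL;
    # the same shape scales up to queues and services across machines
    with Pool(processes=4) as pool:
        results = pool.map(crunch, [10000, 20000, 30000, 40000])
    print(results)
```

The `__main__` guard matters: on start methods that re-import the module (spawn, forkserver), omitting it breaks worker startup.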
  18. Python Profiling and Performance - Mahmoud Hashemi Adopt the asynchronous

    approach Python scaling strategy #3 In IO-bound scenarios, this is the go-to method for getting more done. The Good • Many libraries exist • Great way to learn about operating systems and Python The Bad • Drastically changes application • Complicates code, especially debugging • Limits “off-the-shelf” library usage Make sure you need it. Ask: What are the utilization requirements? How many requests per process are being handled now? Applications of substance are rarely only IO-bound, and the limitations brought by async can complicate feature development.
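A minimal asyncio illustration of why async wins for IO-bound work (Python 3.7+; asyncio.sleep stands in for a network or database call):

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for an IO-bound call (network, disk, database)
    await asyncio.sleep(0.1)
    return i

async def main():
    # Ten 100ms "requests" overlap, finishing in ~0.1s instead of ~1s
    return await asyncio.gather(*[fetch(i) for i in range(10)])

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, '%.2fs' % elapsed)
```

The catch, as the slide warns: the moment one of those coroutines does real CPU work, every other request in the process waits behind it.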
  19. Python Profiling and Performance - Mahmoud Hashemi Use a smarter

    algorithm Python scaling strategy #4 The ideal way to reduce work and speed up projects in CPU-bound scenarios. The Good • Python is runnable pseudocode! • Many books and Wikipedia articles • Good for those interview questions The Bad • Might be hard to find • Often takes time to understand • Language-specific algorithms Python has many built-in examples of this approach. bisect and heapq for list operations, itertools for iterators, re for string operations, among others. Even naive implementations can be made smarter through careful caching. Always make sure your queues and caches are bounded!
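A small sketch combining two of the ideas above: bisect for O(log n) lookups in a sorted list, and a bounded lru_cache (all names here are illustrative, not from the talk):

```python
import bisect
from functools import lru_cache

haystack = sorted(range(0, 1000000, 7))

def contains_sorted(sorted_seq, x):
    # O(log n) membership test on a sorted list,
    # versus O(n) for a plain `x in seq` scan
    i = bisect.bisect_left(sorted_seq, x)
    return i < len(sorted_seq) and sorted_seq[i] == x

# maxsize bounds the cache -- "always make sure your caches are bounded!"
@lru_cache(maxsize=1024)
def cached_lookup(x):
    return contains_sorted(haystack, x)

print(contains_sorted(haystack, 700), contains_sorted(haystack, 701))
```

On a million-element list, the bisect version does ~20 comparisons where the naive scan does up to a million.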
  20. Python Profiling and Performance - Mahmoud Hashemi Write faster Python

    Python scaling strategy #5 Python has obvious ways of doing things, but they aren't always the fastest. The Good • Packaging & deployment stay the same • Easy to measure and iterate • Same debugging and profiling tools • Python builtins are fast! The Bad • Gains can be small • Minimal parallelization • Code can be less clear Relatively low-risk, high-reward, and easy to get started using aforementioned profiling techniques. Becomes second nature pretty readily.
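A classic instance of "faster Python": replacing repeated string concatenation with str.join (Python 3 syntax; exact numbers will vary by machine):

```python
import timeit

def concat_naive(words):
    # Worst case quadratic: each += may copy the accumulated string
    s = ''
    for w in words:
        s += w + ' '
    return s

def concat_fast(words):
    # str.join is the idiomatic, linear-time approach
    return ' '.join(words) + ' '

words = ['spam'] * 10000
assert concat_naive(words) == concat_fast(words)

for fn in (concat_naive, concat_fast):
    print('%s: %.4fs' % (fn.__name__, timeit.timeit(lambda: fn(words), number=100)))
```

Note the assert: per ground rule #1, verify the optimized version gives identical output before trusting its timing.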
  21. Python Profiling and Performance - Mahmoud Hashemi Build a native

    Python extension Python scaling strategy #6 Python’s close relationship with C is one of its greatest strengths. The Good • C has 2-10x less overhead • Integrate with C open-source ecosystem • CPython is clean, idiomatic C • Cython is mature and greatly helpful The Bad • C has riskier bugs • Takes longer to get right • Complicates build and deployment Not for the faint of heart, but the partnership of Python and C is a key to performance and scale found in almost every Python enterprise.
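Short of writing a full extension in C or Cython, ctypes is the lowest-effort way to cross the Python-C boundary; a Unix-flavored sketch calling libm's sqrt, where the library names are platform assumptions:

```python
import ctypes
import ctypes.util

# Load the C math library; the fallback name assumes glibc-style Linux
libm = ctypes.CDLL(ctypes.util.find_library('m') or 'libm.so.6')

# Declare the C signature: double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))
```

Declaring restype/argtypes is the step people forget; without it, ctypes silently truncates floats to ints and returns garbage, a taste of the "riskier bugs" listed above.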
  22. Python Profiling and Performance - Mahmoud Hashemi Use a faster

    implementation Python scaling strategy #7 Python has a huge community, so let’s leverage those open-source libraries. The Good • Python standards • Often “just works” • Many libraries are drop-in The Bad • Availability and maturity • Architectural compatibility • Build and deployment constraints There are so many success stories here. Built-in, ElementTree vs cElementTree, and external, json vs ujson. Not to mention all the libraries based on NumPy for fast math. See the “Choosing Dependencies” segment for more specific guidance.
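The drop-in quality of libraries like ujson enables a common guarded-import pattern; a minimal sketch that works whether or not ujson is installed:

```python
# Prefer the faster implementation when available,
# fall back to the stdlib otherwise
try:
    import ujson as json
except ImportError:
    import json

doc = json.dumps({'ok': True, 'n': [1, 2, 3]})
print(doc)
```

The rest of the codebase just uses the `json` name, so swapping implementations is a one-line change with no build or deployment constraints when the fallback suffices.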
  23. Python Profiling and Performance - Mahmoud Hashemi Use a different

    runtime Python scaling strategy #8 While CPython reigns supreme, Python does have alternate implementations, like PyPy and Jython, which can offer some scalability benefits in specific cases. The Good • 2-10x speedups, depending on the workload • Easy to test for simple projects • Work with cool new features • Great for blog posts The Bad • Drastically changes deployment • Architecture and environment limits • Libraries are often incompatible • All large projects will need code rework While some experimentation is fine, this type of change almost never happens, making it a very risky scaling strategy for substantial projects.
  24. Python Profiling and Performance - Mahmoud Hashemi Quick recap Remember

    and reference: • How to define “performant” • How to stay safe when optimizing • How to measure • How to scale 8 ways from Sunday Profiling and performance GOTTA GO FAST
  25. Python Profiling and Performance - Mahmoud Hashemi Thanks! @mhashemi github.com/mahmoud

    sedimental.org paypal-engineering.com/tag/python O’Reilly’s Enterprise Software with Python