Higher Performance Python

ianozsvald
November 16, 2019

Higher Performance Python, a talk given at PyDataCambridge 2019. It covers evaluating two different OLS approaches using line_profiler, applying one with a set of Pandas options (iloc, apply, apply with raw=True), compiling with Numba and going multi-core with Dask, along with some advice on "being a highly performant developer".
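The three Pandas options named above (iloc, apply, apply with raw=True) can be illustrated with a minimal sketch. It is not the talk's actual code: the function name `ols_lstsq`, the random data and the 14-column window width are assumptions made for illustration, and `np.linalg.lstsq` stands in for the lstsq-based OLS approach.

```python
import numpy as np
import pandas as pd


def ols_lstsq(row):
    """Return the OLS slope of one row of evenly spaced observations.

    Hypothetical helper: assumes x is simply 0, 1, 2, ... per column.
    """
    y = np.asarray(row, dtype=float)
    x = np.arange(y.shape[0], dtype=float)
    A = np.vstack([x, np.ones_like(x)]).T  # design matrix with columns [x, 1]
    slope, _intercept = np.linalg.lstsq(A, y, rcond=None)[0]
    return slope


# Made-up data: 1,000 rows, each a short time series of 14 values.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 14)))

# Option 1: explicit Python loop, fetching each row as a Series with iloc.
slopes_iloc = np.array([ols_lstsq(df.iloc[i]) for i in range(len(df))])

# Option 2: apply along rows; each row still arrives as a Series.
slopes_apply = df.apply(ols_lstsq, axis=1)

# Option 3: apply with raw=True; each row arrives as a plain NumPy array,
# which avoids building a Series object per row.
slopes_raw = df.apply(ols_lstsq, axis=1, raw=True)
```

All three options compute the same slopes; they differ only in how much per-row overhead Pandas adds around the function call.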

Transcript

  1. Introductions
     - Interim Chief Data Scientist
     - 19+ years experience
     - Quickly build strategic data science plans
     - Team coaching & public courses
     - 2nd Edition April 2020
     By [ian]@ianozsvald[.com] Ian Ozsvald
  2. Today’s goal
     - Introduce profiling, faster Pandas, multi-core
     - Reflect on “good practice” so you can be highly performant
  3. A typical higher-performance task
     - Calculate features including slope
     - Ordinary Least Squares on time series
     - 100,000 rows in a DataFrame to trial
     - Estimate for 100x (52wk*2 time windows)
     - Lots of CPU time – can we do better?
  4. Dask
     - Pandas and NumPy distributed computing
     - Bag (standard Python collections), Array (NumPy) and Distributed DataFrame (Pandas)
     - Super-easy parallelised Pandas functions
  5. Costs on the “big problem”
     - Sklearn & iloc – 90 minutes
     - Lstsq & apply raw – 10 minutes
     - With Numba – 1 minute
     - With Dask – 30 seconds (180x theoretical speed-up)
     - Don’t go crazy – remember maintenance
  6. Being “highly performant”
     - iloc & apply are fine
     - Sklearn in dev, lstsq for prod? Maintenance cost...
     - Numba (and Dask) need team buy-in
     - Profile before trying ideas (else you’re guessing)
     - Test everything! Bulwark is nice
  7. Thank your organisers
     - Your organisers are volunteers
     - Thank your volunteers & speakers please
     - Get a free (1st ed) signed book later
  8. Upcoming public courses
     - Jan: Successful Data Science Projects
     - Feb: Software Engineering for Data Scientists (2 day)
     - Mar: (planned) High Performance Python
     - https://IanOzsvald.com/training
  9. Summary
     - Measure – don’t guess
     - Test everything
     - I’d love a postcard if you learned something new
     - Join my thoughts+jobs list for tips and my training list
     - Lots of past talks on ianozsvald.com
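Slide 5's "With Numba" step might look roughly like the sketch below. It is an assumption-laden illustration, not the code from the talk: `ols_slope` is a hypothetical name, the closed-form slope (covariance of x and y over the variance of x, with x taken as 0, 1, 2, ...) replaces the lstsq call because plain loops are easy for Numba to compile, and the no-op fallback decorator exists only so the example still runs, without any speed-up, where Numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, fall back to a no-op
    # decorator so the code still runs (uncompiled, so no speed-up).
    def njit(func):
        return func


@njit
def ols_slope(y):
    """Closed-form OLS slope of y against x = 0, 1, 2, ... (hypothetical helper)."""
    n = y.shape[0]
    x_mean = (n - 1) / 2.0
    y_mean = 0.0
    for i in range(n):
        y_mean += y[i]
    y_mean /= n
    num = 0.0  # sum of (x - x_mean) * (y - y_mean)
    den = 0.0  # sum of (x - x_mean) ** 2
    for i in range(n):
        dx = i - x_mean
        num += dx * (y[i] - y_mean)
        den += dx * dx
    return num / den


# Tiny made-up DataFrame: one short time series per row.
df = pd.DataFrame(
    [[1.0, 3.0, 5.0, 7.0], [2.0, 2.0, 2.0, 2.0], [4.0, 3.0, 2.0, 1.0]]
)

# raw=True hands each row over as a NumPy array, which is what a
# Numba-compiled function expects.
slopes = df.apply(ols_slope, axis=1, raw=True)
```

The maintenance point from the slides applies here: a hand-written loop is more code to own than a library call, so this trade is only worth making once profiling shows the slope calculation dominates the run time.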
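"Measure – don’t guess" and "Test everything" can be combined in one small habit: check that a faster candidate matches a trusted reference, then time both. The sketch below is illustrative only. The talk profiles with line_profiler and mentions Bulwark for checks; this example swaps in the standard library's `timeit` and NumPy's `assert_allclose` so it runs without extra installs, and the function names and random data are assumptions.

```python
import timeit

import numpy as np


def slope_lstsq(y):
    """Reference implementation: OLS slope via np.linalg.lstsq."""
    x = np.arange(y.shape[0], dtype=float)
    A = np.vstack([x, np.ones_like(x)]).T
    return np.linalg.lstsq(A, y, rcond=None)[0][0]


def slope_closed_form(y):
    """Candidate implementation: covariance(x, y) / variance(x)."""
    x = np.arange(y.shape[0], dtype=float)
    dx = x - x.mean()
    return float(dx @ (y - y.mean()) / (dx @ dx))


# Made-up data: 200 short time series of 14 points each.
rng = np.random.default_rng(42)
rows = rng.normal(size=(200, 14))

# Test first: the candidate must agree with the reference on every row.
expected = np.array([slope_lstsq(r) for r in rows])
actual = np.array([slope_closed_form(r) for r in rows])
np.testing.assert_allclose(actual, expected, rtol=1e-9, atol=1e-12)

# Then measure: time both on the same data rather than guessing.
t_lstsq = timeit.timeit(lambda: [slope_lstsq(r) for r in rows], number=5)
t_closed = timeit.timeit(lambda: [slope_closed_form(r) for r in rows], number=5)
print(f"lstsq: {t_lstsq:.4f}s  closed form: {t_closed:.4f}s")
```

The timings depend on the machine and library versions, so the numbers are only meaningful relative to each other on the same run.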