Higher Performance Python (ODSC 2019)

ianozsvald
November 21, 2019

Making Pythonic data science faster with line_profiler, more efficient Pandas, dask, Swifter and compilation with Numba. Get a 100x speed-up on common operations using smarter tool choices.

Transcript

  1. Introductions
      - Interim Chief Data Scientist
      - 19+ years experience
      - Quickly build strategic data science plans
      - Team coaching & public courses
      - 2nd Edition April 2020
      By [ian]@ianozsvald[.com] Ian Ozsvald
  2. Today’s goal
      - Introduce profiling, faster Pandas, multi-core
      - Reflect on “good practice” so you can be highly performant
  3. A typical higher-performance task
      - Calculate features including slope
      - Ordinary Least Squares on time series
      - 100,000 rows in a DataFrame to trial
      - Estimate for 100x (52wk*2 time windows)
      - Lots of CPU time – can we do better?
  4. A typical task – need slope of the line
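
The slope on this slide is the gradient of an Ordinary Least Squares line fitted through one time series. The slides do not show the code, so the following is only a minimal sketch of one way to compute it with `numpy.linalg.lstsq`. The function name `ols_slope`, the evenly spaced x-values and the sample data are assumptions made for illustration.

```python
import numpy as np


def ols_slope(y):
    """Return the slope of the least-squares line through (0, y[0]), (1, y[1]), ..."""
    y = np.asarray(y, dtype=float)
    # Assumption: observations are evenly spaced, so x is simply 0, 1, 2, ...
    x = np.arange(len(y), dtype=float)
    # Design matrix: one column for the slope, one column of ones for the intercept.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope


# Made-up series that rises by 2 units per step.
weekly_values = [1.0, 3.0, 5.0, 7.0]
print(ols_slope(weekly_values))
```

For the made-up series above, the fitted slope should be close to 2, since each value is exactly 2 larger than the one before it.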
  5. Dask
      - Python, Pandas and NumPy distributed computing
      - Bag (standard Python collections), Array (NumPy) and Distributed DataFrame (Pandas)
      - dask-ml for distributed sklearn machine learning
      - Super-easy parallelised Pandas functions
  6. Costs on the “big problem”
      - Sklearn & iloc – 90 minutes
      - Lstsq & apply raw – 10 minutes
      - With Numba – 1 minute
      - Add Dask – 30 seconds (180x theoretical speed-up)
      - Don’t go crazy – remember maintenance
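
The first two rows of this slide contrast row-by-row access through `iloc` with `apply(..., raw=True)`. The sketch below is an illustration under stated assumptions, not the talk's benchmark: it uses `numpy.linalg.lstsq` in both paths (so it isolates the row-access pattern, not the Sklearn-versus-lstsq difference), and the DataFrame shape, the random data and the name `ols_slope` are all hypothetical. The Numba and Dask steps are not shown.

```python
import numpy as np
import pandas as pd


def ols_slope(row):
    """Slope of the least-squares line through one row of evenly spaced values."""
    x = np.arange(len(row), dtype=float)
    A = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, row, rcond=None)
    return coeffs[0]


# Hypothetical data: 1,000 rows, each holding 14 observations of a time series.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1_000, 14)))

# Row-by-row with iloc: every access builds a pandas Series before the fit.
slopes_iloc = pd.Series(
    [ols_slope(df.iloc[i].to_numpy()) for i in range(len(df))],
    index=df.index,
)

# apply with raw=True passes each row to the function as a plain NumPy array,
# which avoids constructing a Series per row.
slopes_apply = df.apply(ols_slope, axis=1, raw=True)

print(np.allclose(slopes_iloc, slopes_apply))
```

Both paths compute the same slopes, so any difference between them is purely a matter of overhead. How large that difference is depends on the data and the machine, which is why it is worth timing on the real workload.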
  7. On being “highly performant”
      - iloc & apply are fine, cache where possible
      - Sklearn in dev, lstsq for prod? Maintenance cost...
      - Numba (and Dask) need team buy-in
      - Profile before trying ideas (else you’re guessing)
      - Test everything! Bulwark is nice
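
As an illustration of "profile before trying ideas", here is a minimal sketch that uses the standard library's `cProfile` and `pstats`. This is a swapped-in tool: the talk's description names line_profiler, which reports per-line timings, while `cProfile` reports per-function timings. The workload and the function names are hypothetical.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for an expensive feature calculation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_workload():
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Collect timings only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
results = run_workload()
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The report should show where the time actually goes (here, inside `slow_sum_of_squares`), which is the evidence to gather before reaching for Numba or Dask.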
  8. Book signing (1st ed) later today
      - We publish the 2nd edition next April
      - Thanks to O’Reilly for free copies to sign
      - I run training courses – come chat and tell me your needs please!
  9. Summary
      - Measure – don’t guess
      - Test everything
      - I’d love a postcard if you learned something new
      - Join my Thoughts & Jobs email list for tips via my blog
      - Lots of past talks on ianozsvald.com