
Making Pandas Fly (PyDataAmsterdam 2020)

June 18, 2020

Another variant of my recent talks. This one focuses on making Pandas faster by digging into NumPy, using my `dtype_diet` memory-saving tool, and understanding what's going on inside some of Pandas' low-level functions. See https://ianozsvald.com/ for more.
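To give a feel for the kind of memory saving the talk is about, here is a minimal sketch that uses only pandas itself rather than `dtype_diet`. The column names and data are made up for illustration; the idea is simply that smaller integer types and the category type can hold the same values in less RAM.

```python
import numpy as np
import pandas as pd

# Hypothetical data: small non-negative integers and a low-cardinality text column.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "count": rng.integers(0, 100, size=n),  # stored as 64-bit integers by default
    "city": rng.choice(["Amsterdam", "London", "Paris"], size=n),  # stored as strings
})

before = df.memory_usage(deep=True).sum()

slim = df.copy()
# Values 0..99 fit in an unsigned 8-bit integer.
slim["count"] = pd.to_numeric(slim["count"], downcast="unsigned")
# Three distinct strings repeated many times compress well as a category.
slim["city"] = slim["city"].astype("category")

after = slim.memory_usage(deep=True).sum()
print(f"before: {before / 1e6:.2f} MB, after: {after / 1e6:.2f} MB")
```

Whether a narrower dtype is safe depends on the range of values you expect in future data, so treat any such conversion as an assumption worth testing.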




  1. Making Pandas Fly (live from London). Ian Ozsvald (@IanOzsvald, ianozsvald.com), PyDataAmsterdam 2020
  2. Introductions: Interim Chief Data Scientist; 19+ years experience; team coaching & public courses (Higher Performance!); 2nd Edition! By [ian]@ianozsvald[.com] Ian Ozsvald
  3. Thank the organisers! All volunteers: go say thank you in #lobby. NumFOCUS benefits us all.
  4. Today's goal: Pandas (saving RAM; calculating faster by dropping to NumPy) and advice for "being highly performant"
  5. Demo: go to the Notebook for the demo

  6. NumPy vs Pandas overhead (ser.sum()): 25 files, 83 functions. Very few NumPy calls! Thanks!
  7. Overhead...

  8. Overhead with ser.values.sum(): 18 files, 51 functions. Many fewer Pandas calls (but still a lot!)
  9. Is Pandas unnecessarily slow? Missing? The bottleneck library! This certainly helps.
  10. Is Pandas unnecessarily slow? NO! See https://github.com/pandas-dev/pandas/issues/34773: the truth is a bit complicated!
  11. Being highly performant: install optional (but great!) Pandas dependencies (bottleneck, numexpr); investigate https://github.com/ianozsvald/dtype_diet; investigate my ipython_memory_usage (PyPI/Conda). See https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html
  12. Being highly performant: mistakes slow us down (PAY ATTENTION!). Try nullable Int64 & boolean, forthcoming Float64; write tests (unit & end-to-end); codify your assumptions (bulwark library); https://github.com/ianozsvald/notes_to_self
  13. Summary: make it right, then make it fast; think about being performant; see blog for my classes; I'd love a postcard if you learned something new!
  14. Covid 19 UK economic impact?
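Slides 6 to 8 compare `ser.sum()` with `ser.values.sum()`, and slide 12 suggests the nullable `Int64` dtype. The following is a minimal sketch of both ideas, not the notebook from the talk: the Series contents are invented, and the timings it prints will vary by machine, pandas version, and whether bottleneck is installed.

```python
import timeit

import numpy as np
import pandas as pd

# Whole numbers stored as floats, so every partial sum is exact
# and both routes give an identical answer.
ser = pd.Series(np.arange(1_000_000, dtype=np.float64))

total_pandas = ser.sum()        # goes through Pandas' own machinery first
total_numpy = ser.values.sum()  # asks NumPy directly on the underlying array
print(total_pandas == total_numpy)

# Rough timing of each route; treat the numbers as indicative only.
for label, fn in [
    ("ser.sum()", lambda: ser.sum()),
    ("ser.values.sum()", lambda: ser.values.sum()),
]:
    seconds = timeit.timeit(fn, number=20)
    print(f"{label:<18} {seconds / 20 * 1e3:.3f} ms per call")

# Nullable integers: a missing value no longer forces the column to float.
plain = pd.Series([1, 2, None])                    # becomes float64 with NaN
nullable = pd.Series([1, 2, None], dtype="Int64")  # stays integer, missing is pd.NA
print(plain.dtype, nullable.dtype)
```

Dropping to `.values` skips Pandas features such as index alignment and its missing-data handling, so it is only a fair swap when you know the data does not need them.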