Making Pandas Fly (PyDataAmsterdam 2020)

ianozsvald
June 18, 2020

Another variant of the recent talks. This one focuses on making Pandas faster by digging into NumPy, using my `dtype_diet` memory-saving tool, and understanding what's going on inside some of Pandas' low-level functions. See https://ianozsvald.com/ for more.





  Making Pandas Fly (live from London) @IanOzsvald – ianozsvald.com Ian Ozsvald PyDataAmsterdam 2020

  Introductions
    Interim Chief Data Scientist
    19+ years experience
    Team coaching & public courses – Higher Performance!
    2nd Edition!
  Thank the organisers!
    All volunteers – go say thank you in #lobby
    NumFOCUS benefits us all
  4. Today’s goal
    Pandas
      – Saving RAM
      – Calculating faster by dropping to NumPy
    Advice for “being highly performant”
    By [ian]@ianozsvald[.com] Ian Ozsvald
  Demo
    Go to Notebook for demo

  NumPy vs Pandas overhead (ser.sum())
    25 files, 83 functions
    Very few NumPy calls!
    Thanks!
  Overhead...

  Overhead with ser.values.sum()
    18 files, 51 functions
    Many fewer Pandas calls (but still a lot!)
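
The two call paths compared on these slides can be tried side by side. What follows is a minimal sketch, not the notebook from the talk: the random data, the two Series sizes and the `best_of` helper are all assumptions made for illustration, and the timings will differ by machine and by Pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def best_of(fn, number=20, repeat=3):
    """Hypothetical helper: best per-call time in seconds over a few repeats."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number


rng = np.random.default_rng(0)
results = {}

# Illustrative sizes only: fixed call overhead matters most on small Series.
for n_items in (1_000, 1_000_000):
    ser = pd.Series(rng.random(n_items))

    total_pandas = ser.sum()          # reduction dispatched through Pandas
    total_values = ser.values.sum()   # take the ndarray out, then let NumPy sum it

    results[n_items] = {
        "totals_match": bool(np.isclose(total_pandas, total_values)),
        "ser.sum()": best_of(ser.sum),
        "ser.values.sum()": best_of(lambda: ser.values.sum()),
    }

for n_items, row in results.items():
    print(
        f"n={n_items:>9,}  "
        f"ser.sum(): {row['ser.sum()'] * 1e6:9.1f} us  "
        f"ser.values.sum(): {row['ser.values.sum()'] * 1e6:9.1f} us"
    )
```

Both routes should return the same total. Only the amount of Python-level work done before NumPy is reached differs, which is what the file and function counts on the slides measure.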
  Is Pandas unnecessarily slow?
    Missing? The bottleneck library!
    This certainly helps
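
One way to check whether bottleneck is available and switched on is sketched below. It uses the `compute.use_bottleneck` Pandas option and, only when the package is importable, calls `bottleneck.nanmean` directly. The sample values are hypothetical. Which reductions Pandas actually routes through bottleneck varies by version, so treat this as a quick check rather than a description of Pandas internals.

```python
import importlib.util

import numpy as np
import pandas as pd

# Is the optional bottleneck package importable in this environment?
has_bottleneck = importlib.util.find_spec("bottleneck") is not None

# Pandas exposes an option controlling whether it may use bottleneck.
pandas_option = pd.get_option("compute.use_bottleneck")
print(f"bottleneck installed: {has_bottleneck}, compute.use_bottleneck: {pandas_option}")

# Hypothetical data containing missing values.
values = np.array([1.0, np.nan, 2.5, np.nan, 4.0])
ser = pd.Series(values)

numpy_mean = np.nanmean(values)   # NumPy's nan-aware mean
pandas_mean = ser.mean()          # Pandas skips NaN by default

bottleneck_mean = None
if has_bottleneck:
    import bottleneck as bn

    bottleneck_mean = bn.nanmean(values)  # bottleneck's nan-aware mean

print(numpy_mean, pandas_mean, bottleneck_mean)
```

If the first printed line reports that bottleneck is not installed, installing it is the step the next slides recommend.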
  10. Is Pandas unnecessarily slow – NO!
    https://github.com/pandas-dev/pandas/issues/34773 - the truth is a bit complicated!
  11. Being highly performant
    Install optional (but great!) Pandas dependencies
      – bottleneck
      – numexpr
    Investigate https://github.com/ianozsvald/dtype_diet
    Investigate my ipython_memory_usage (PyPI/Conda)
    https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html
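
The kind of RAM saving that a dtype review can surface can be sketched by hand with plain Pandas. This is not the `dtype_diet` API, which is not shown here. The DataFrame, its column names and the chosen target dtypes are all hypothetical, and note that `float32` is a lossy choice that only suits data tolerating reduced precision.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; the columns are illustrative, not from the talk.
rng = np.random.default_rng(42)
n_rows = 100_000
df = pd.DataFrame(
    {
        "count": rng.integers(0, 100, size=n_rows),                          # small ints in int64
        "price": rng.random(n_rows),                                         # float64
        "city": rng.choice(["Amsterdam", "London", "Paris"], size=n_rows),   # repeated strings
    }
)
bytes_before = df.memory_usage(deep=True).sum()

slim = df.copy()
# Values 0..99 fit in the smallest unsigned integer type.
slim["count"] = pd.to_numeric(slim["count"], downcast="unsigned")
# Lossy: halves the storage but drops precision.
slim["price"] = slim["price"].astype("float32")
# Few distinct strings repeated many times suit the category dtype.
slim["city"] = slim["city"].astype("category")
bytes_after = slim.memory_usage(deep=True).sum()

print(slim.dtypes)
print(f"before: {bytes_before:,} bytes, after: {bytes_after:,} bytes")
```

Passing `deep=True` makes `memory_usage` count the Python string objects as well, which is usually where most of the saving comes from.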
  12. Being highly performant
    Mistakes slow us down (PAY ATTENTION!)
      – Try nullable Int64 & boolean, forthcoming Float64
      – Write tests (unit & end-to-end)
      – Codify your assumptions
      – bulwark library
      – https://github.com/ianozsvald/notes_to_self
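
A short sketch of the nullable dtypes, together with one way to codify an assumption as a plain assertion. The `check_order_counts` function and its rules are hypothetical and stand in for whatever checks suit your data. The bulwark library's own API is not shown here.

```python
import numpy as np
import pandas as pd

# With the default dtypes, one missing value turns an integer column into floats.
classic = pd.Series([1, 2, None])
print(classic.dtype)

# The nullable extension dtypes keep the intended type and mark the gap as <NA>.
nullable_int = pd.Series([1, 2, None], dtype="Int64")
nullable_bool = pd.Series([True, False, None], dtype="boolean")
print(nullable_int.dtype, nullable_bool.dtype)


def check_order_counts(ser):
    """Hypothetical check: counts must be integers and never negative."""
    assert pd.api.types.is_integer_dtype(ser), f"expected integers, got {ser.dtype}"
    assert (ser.dropna() >= 0).all(), "negative count found"
    return ser


check_order_counts(nullable_int)  # passes

try:
    check_order_counts(classic)   # fails: the column silently became float64
except AssertionError as err:
    print(f"assumption violated: {err}")
```

Calling a check like this at the start of a pipeline step surfaces a silent dtype change early, before it becomes a slow-to-diagnose bug downstream.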
  13. Summary
    Make it right then make it fast
    Think about being performant
    See blog for my classes
    I’d love a postcard if you learned something new!
  Covid 19 UK economic impact?