Making Pandas Fly (PyDataAmsterdam 2020)

Another variant of the recent talks. This one focuses on making Pandas faster by digging into NumPy, using my `dtype_diet` memory-saving tool, and understanding what's going on inside some of Pandas' low-level functions. See for more.



June 18, 2020


  1. 2. Introductions

     - Interim Chief Data Scientist
     - 19+ years experience
     - Team coaching & public courses
       - Higher Performance!
     - 2nd Edition!

     By [ian]@ianozsvald[.com] Ian Ozsvald
  2. 3. Thank the organisers!

     - All volunteers
       - go say thank you in #lobby
     - NumFOCUS benefits us all
  3. 4. Today’s goal

     - Pandas
       - Saving RAM
       - Calculating faster by dropping to NumPy
     - Advice for “being highly performant”
  4. 6. NumPy vs Pandas overhead (ser.sum())

     - 25 files, 83 functions
     - Very few NumPy calls!
     - Thanks!
  5. 8. Overhead with ser.values.sum()

     - 18 files, 51 functions
     - Many fewer Pandas calls (but still a lot!)
  6. 10. Is Pandas unnecessarily slow – NO!

     - The truth is a bit complicated!
  7. 11. Being highly performant

     - Install optional (but great!) Pandas dependencies
       - bottleneck
       - numexpr
     - Investigate
     - Investigate my ipython_memory_usage (PyPI/Conda)
  8. 12. Being highly performant

     - Mistakes slow us down (PAY ATTENTION!)
       - Try nullable Int64 & boolean, forthcoming Float64
       - Write tests (unit & end-to-end)
       - Codify your assumptions
       - bulwark library
  9. 13. Summary

     - Make it right then make it fast
     - Think about being performant
     - See blog for my classes
     - I’d love a postcard if you learned something new!
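
The "Today’s goal" slide lists saving RAM as the first aim. The slides themselves show no code, so what follows is only a minimal sketch of one common way to do it: choosing smaller dtypes by hand with plain Pandas calls. It does not use `dtype_diet`, and the DataFrame, its column names, and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: small integers and a low-cardinality string column.
n = 100_000
df = pd.DataFrame(
    {
        "visits": np.arange(n, dtype="int64") % 100,  # values 0..99
        "city": np.tile(["Amsterdam", "London"], n // 2),
    }
)
before = df.memory_usage(deep=True).sum()

# int8 holds -128..127, so 0..99 fits; category stores each distinct string once.
slim = df.astype({"visits": "int8", "city": "category"})
after = slim.memory_usage(deep=True).sum()

# The integer values are unchanged, only their storage is smaller.
assert (slim["visits"] == df["visits"]).all()
print(f"before: {before:,} bytes, after: {after:,} bytes")
```

Downcasting is only safe when the values fit the narrower type, which is why the sketch checks that the column still compares equal afterwards.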
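
The two overhead slides compare `ser.sum()` with `ser.values.sum()`. The sketch below is one way to try that comparison locally. The Series contents and the repeat count are arbitrary assumptions, and the timings depend on the machine and on library versions, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: one million random floats with no missing values.
rng = np.random.default_rng(0)
ser = pd.Series(rng.random(1_000_000))

# Both routes give the same total, up to floating-point rounding.
total_pandas = ser.sum()
total_numpy = ser.values.sum()
assert np.isclose(total_pandas, total_numpy)

# Average time per call for each route.
repeats = 50
t_pandas = timeit.timeit(lambda: ser.sum(), number=repeats) / repeats
t_numpy = timeit.timeit(lambda: ser.values.sum(), number=repeats) / repeats
print(f"ser.sum():        {t_pandas * 1e3:.3f} ms per call")
print(f"ser.values.sum(): {t_numpy * 1e3:.3f} ms per call")

# The two calls are not interchangeable once NaN appears:
# Pandas skips NaN by default, NumPy propagates it.
with_gap = pd.Series([1.0, np.nan, 2.0])
pandas_total = with_gap.sum()
numpy_total = with_gap.values.sum()
print(pandas_total, numpy_total)
```

That difference in NaN handling is part of what the extra Pandas machinery does, so dropping to NumPy is only a fair swap when the data is known to be complete.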
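
The first "Being highly performant" slide recommends installing bottleneck and numexpr. A small check like the one below, which is an illustration rather than anything from the talk, reports whether each package can be imported in the current environment and whether the matching Pandas option is switched on.

```python
import importlib.util

import pandas as pd

# Check whether each optional accelerator is importable, without importing it.
status = {
    name: importlib.util.find_spec(name) is not None
    for name in ("bottleneck", "numexpr")
}
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")

# These options say whether Pandas may use each library when it is available;
# an option can read True even if the package itself is not installed.
use_bottleneck = pd.get_option("compute.use_bottleneck")
use_numexpr = pd.get_option("compute.use_numexpr")
print(f"compute.use_bottleneck = {use_bottleneck}")
print(f"compute.use_numexpr = {use_numexpr}")
```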
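
The second "Being highly performant" slide suggests trying the nullable `Int64` and `boolean` dtypes and codifying assumptions. As a rough sketch with made-up values, the example below shows what the nullable dtypes change when a value is missing, and uses plain `assert` statements as a stand-in for codified assumptions. It does not use the bulwark library.

```python
import numpy as np
import pandas as pd

# With the default NumPy-backed dtype, a missing value turns integers into floats.
classic = pd.Series([1, 2, None])
print(classic.dtype)

# The nullable extension dtypes keep integers and booleans intact alongside <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")
flags = pd.Series([True, False, None], dtype="boolean")
print(nullable.dtype, flags.dtype)

# Codify assumptions about the data so that mistakes fail early and loudly.
assert classic.dtype == np.float64
assert str(nullable.dtype) == "Int64"
assert str(flags.dtype) == "boolean"
assert nullable.isna().sum() == 1  # exactly one missing value expected
assert nullable.sum() == 3  # missing values are skipped by default
```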