pandas: Powerful data analysis tool for Python

Talk by Wes McKinney, co-founder of Lambda Foundry, at the Data Science London meetup on 18 October 2012

Data Science London

November 12, 2012

Transcript

  1. pandas: Powerful data analysis tools for Python
    Wes McKinney, Lambda Foundry, Inc. (@wesmckinn)
    Data Science London Meetup, 2012-10-18
  2. Me
    • Recovering mathematician
    • 3 years in the quant finance industry
    • Last 2: statistics + freelance + open source
    • My new company: Lambda Foundry
    • Python for data analysis/science
  3. Book
    • In print this month
    • ~470 pages
    • IPython
    • pandas
    • NumPy
    • matplotlib
    • Case studies
  4. pandas?
    • http://pandas.pydata.org
    • Rich relational data tool built on top of NumPy
    • Like R’s data.frame on steroids
    • Excellent performance
    • Easy-to-use, highly consistent API
    • A foundation for data analysis in Python
  5. pandas
    • In heavy production in many places: finance, web analytics, ...
    • Generally much better performance than other open source alternatives (e.g. R)
    • Hope: basis for the “next generation” statistical computing and analysis environment
  6. Simplifying data wrangling
    • Data munging / preparation / cleaning / integration is slow, error prone, and time consuming
    • Everyone already <3’s Python for data wrangling: pandas takes it to the next level
  7. Dev roadmap
    • Most recent major/disruptive effort was the time series overhaul (0.8.0+)
    • Major initiatives
      • High perf CSV parser engine
      • “NDFrame”: truly N-dimensional labeled array object
      • Faster, compressible serialization
      • Expand data type support
  8. Thanks!
    • Follow me on Twitter: @wesmckinn
    • GitHub: wesm
    • Blog: http://blog.wesmckinney.com
    • Exciting Python things ahead
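
Slides 4 and 6 describe pandas as a relational, NumPy-backed tool for data wrangling. The sketch below is not from the talk. It is a minimal illustration of that idea, assuming a recent pandas release (the API may differ from the 2012-era version the slides refer to). The data, column names, and variable names are all invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
trades = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "AAA", "BBB"],
        "price": [10.0, np.nan, 12.0, 20.0],
    }
)
sectors = pd.DataFrame(
    {"ticker": ["AAA", "BBB"], "sector": ["tech", "energy"]}
)

# A column can be viewed as a plain NumPy array.
prices = trades["price"].to_numpy()

# Cleaning: drop rows whose price is missing.
clean = trades.dropna(subset=["price"])

# Relational step: join on the shared "ticker" column, much like a SQL join.
merged = clean.merge(sectors, on="ticker", how="left")

# Aggregation: mean price per sector.
mean_by_sector = merged.groupby("sector")["price"].mean()
print(mean_by_sector)
```

The same three steps (clean, join, aggregate) are the kind of munging work slide 6 calls slow and error prone when done by hand.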