
Introduction to scientific programming in python

Olivier Hervieu
September 13, 2014

Talk given at PyCon JP 2014 (2014/09/13)


Transcript

  1. Why this talk?
     • born on Twitter
     • not my initial proposal for PyCon JP*!
       * speakerdeck.com/ohe/shit-happens-dot-dot-dot-v2
  2. What to expect?
     • a tour of tools that can (must?) be used by the everyday scientific programmer
     • some guidelines on how to industrialize your stack
  3. A little about me
     • software engineer at @tinyclues
     • 10 years of experience, working every day with Python for 6 years (I know, I'm old)
     • first conference in Japan (yeah!)
     • more at about.me/ohe
     • slides can be found on speakerdeck.com/ohe
  4. • IPython is a must-use (for every Pythonista)
     • if you don't use it, install it now (pip install ipython)
     • IPython provides:
       • a powerful interactive shell
       • a browser-based notebook with support for code, text, mathematical expressions, inline plots and other rich media (as described on their website)
       • easy-to-use, high-performance tools for parallel computing
  5. • notebook mode supports literate programming and reproducible sessions
     • a notebook stores chunks of Python code alongside their results and additional comments (HTML, LaTeX, Markdown)
     • a notebook can be exported to various file formats
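
To make "code stored alongside its results" concrete, here is a sketch of what a notebook file looks like on disk: a JSON document holding a list of cells. It assumes the nbformat 4 schema used by Jupyter, which is newer than the IPython 2.x notebooks of this talk (those used nbformat 3, which nested cells under `worksheets`). The cell contents are made up, and only the standard-library `json` module is used.

```python
import json

# A minimal notebook following the nbformat 4 JSON schema (assumption: a newer
# format than the nbformat 3 files written by IPython 2.x in 2014).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {   # a text cell: Markdown (with LaTeX math) stored as plain source
            "cell_type": "markdown",
            "metadata": {},
            "source": "Mean of $x = (1, 2, 3)$:",
        },
        {   # a code cell keeps its source *and* the output it produced
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "sum([1, 2, 3]) / 3",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": "2.0"},
                    "metadata": {},
                }
            ],
        },
    ],
}

# Round-trip through JSON text, as when saving and reloading an .ipynb file
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

# Walk the cells: code cells come back together with their recorded outputs
for cell in loaded["cells"]:
    if cell["cell_type"] == "code":
        print("in: ", cell["source"])
        for out in cell["outputs"]:
            print("out:", out["data"]["text/plain"])
```

Because the outputs live in the file next to the code, a notebook can be rendered (for example by nbviewer) without re-running it.
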
  6. • IPython is the de facto standard for sharing Python sessions (see nbviewer.ipython.org)
     • the project is well maintained and very stable (no surprises when you upgrade)
  7. • numpy provides a powerful N-dimensional array object
     • methods on these arrays are fast because they run in compiled code and, for linear algebra, rely on well-optimised libraries (BLAS, ATLAS, MKL)
     • numpy is tolerant of Python lists: most functions accept a list wherever an array is expected
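
A short sketch of the three points above using only core NumPy calls: building an N-dimensional array from nested lists, applying vectorized operations without a Python loop, and passing plain lists straight to a NumPy function. The values are arbitrary.

```python
import numpy as np

# Build a 2-D array directly from nested Python lists
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape, a.ndim)  # shape (2, 3), 2 dimensions

# Vectorized arithmetic and reductions run in compiled code, not in a Python loop
b = a * 10 + 1              # applied to every element at once
col_sums = a.sum(axis=0)    # sum down each column
row_means = a.mean(axis=1)  # mean across each row

# Tolerance to lists: plain lists are converted to arrays on the fly
d = np.dot([[1, 0], [0, 1]], [[2, 3], [4, 5]])  # identity times a 2x2 matrix
print(b, col_sums, row_means, d, sep="\n")
```
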
  8. python vs numpy

         def matmult(a, b):
             zip_b = list(zip(*b))  # columns of b; list() so every row can reuse them on Python 3
             return [[sum(ele_a * ele_b for ele_a, ele_b in zip(row_a, col_b))
                      for col_b in zip_b]
                     for row_a in a]

         shape          matmult          np.dot      speedup
         (10, 10)       936 µs           2 µs        450x
         (100, 100)     693,000 µs       53 µs       13,000x
         (1000, 1000)   744,000,000 µs   13,900 µs   53,000x
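
The comparison in the table above can be reproduced, at least in spirit, with the standard `timeit` module. This sketch assumes Python 3 and a NumPy recent enough to have `np.random.default_rng`; `compare` is a hypothetical helper written for this example, and it checks that both implementations agree before timing them. Absolute numbers will differ from the slide (they depend on the CPU, the Python version and the BLAS NumPy is linked against), and the 1000x1000 case is left out because the pure-Python version takes several minutes.

```python
import timeit

import numpy as np


def matmult(a, b):
    """Pure-Python product of two matrices given as lists of lists."""
    zip_b = list(zip(*b))  # columns of b, materialized so every row can reuse them
    return [[sum(ele_a * ele_b for ele_a, ele_b in zip(row_a, col_b))
             for col_b in zip_b]
            for row_a in a]


def compare(n, repeat=3):
    """Hypothetical helper: best-of-`repeat` timings (in seconds) of matmult and
    np.dot on random n x n float matrices. Returns (python_s, numpy_s)."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    a_list, b_list = a.tolist(), b.tolist()
    # Sanity check: both implementations must compute the same product
    if not np.allclose(matmult(a_list, b_list), np.dot(a, b)):
        raise RuntimeError("matmult and np.dot disagree")
    py_s = min(timeit.repeat(lambda: matmult(a_list, b_list), number=1, repeat=repeat))
    np_s = min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=repeat))
    return py_s, np_s


for n in (10, 100):
    py_s, np_s = compare(n)
    print(f"({n}, {n}): matmult {py_s * 1e6:.0f} µs, "
          f"np.dot {np_s * 1e6:.0f} µs, speedup {py_s / np_s:.0f}x")
```
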
  9. • you don't want to implement your own matrix multiplication :-)
     • numpy benefits from years of computer-based numerical problem solving
     • don't blindly believe benchmarks about Python performance (who said Julia?)
  10. • scipy provides numerous numerical routines that run efficiently on top of numpy arrays, for:
        • optimization
        • signal processing
        • linear algebra
        • …
      • it also provides convenient data structures such as compressed sparse matrices and spatial data structures
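
A small tour of the routines listed above, assuming a reasonably recent SciPy: `scipy.optimize.minimize` on the Rosenbrock test function that SciPy ships (`rosen` and its gradient `rosen_der`), a compressed sparse row (CSR) matrix from `scipy.sparse`, and a k-d tree from `scipy.spatial` as one example of a spatial data structure. The input values are arbitrary.

```python
import numpy as np
from scipy import optimize, sparse
from scipy.spatial import KDTree

# Optimization: minimize the Rosenbrock function, whose minimum is at (1, 1, 1)
res = optimize.minimize(optimize.rosen, x0=[1.3, 0.7, 0.8],
                        jac=optimize.rosen_der, method="BFGS")
print(res.x)  # close to [1, 1, 1]

# Compressed sparse row (CSR) matrix: only the non-zero entries are stored
dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 0]])
m = sparse.csr_matrix(dense)
y = m @ np.array([1, 2, 3])  # sparse matrix-vector product
print(m.nnz, y)              # number of stored values, then the product

# Spatial data structure: a k-d tree answers nearest-neighbour queries quickly
points = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
tree = KDTree(points)
dist, idx = tree.query([0.9, 0.8])  # distance to, and index of, the closest point
print(dist, idx)
```
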
  11. • if you have already used some scikits (scikit-learn, scikit-image), you have already used scipy extensively
      • in other words, scipy is a toolbox for mathematicians; it contains many hidden treasures for them
      • for the programmer, the APIs can feel a bit harsh, especially the method names (which are perfectly explicit to mathematicians)
  12. • matplotlib: the ultimate plotting library, rendering high-quality 2D and 3D plots for Python (I think other languages are a bit jealous too ;)
      • the API mimics MATLAB's in many ways, easing the transition for MATLAB users moving to Python
      • once again, no surprises: matplotlib is a very stable and mature project (expect one major release per year)
      • I recommend watching “Introduction to Numpy and Matplotlib” (4 hours!) on YouTube*
        * https://www.youtube.com/watch?v=3Fp1zn5ao2M
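
A minimal sketch of the MATLAB-like `pyplot` interface: MATLAB-style format strings such as `"r-"` and `"b--"`, with state kept by the "current" figure. It assumes a headless environment, so it selects the non-interactive `Agg` backend and writes a PNG file; the file name `trig.png` is an arbitrary choice.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to files (no window)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# MATLAB-like calls: pyplot tracks the "current" figure and axes for you
plt.figure()
plt.plot(x, np.sin(x), "r-", label="sin(x)")   # red solid line
plt.plot(x, np.cos(x), "b--", label="cos(x)")  # blue dashed line
plt.xlabel("x")
plt.ylabel("y")
plt.title("Two periodic functions")
plt.legend()

out = Path("trig.png")  # arbitrary output file name
plt.savefig(out, dpi=100)
plt.close()
```
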
  13. • scikit-learn is one of the numerous scikits developed in recent years (there are also scikit-image, scikits.statsmodels, etc.)
      • it provides a ready-to-use environment to play with standard machine learning algorithms
      • expect a very clean API
      • the project is very active and has an awesome community
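
One concrete sign of the clean API is that every estimator follows the same construct / `fit` / `predict` (or `score`) pattern, so swapping algorithms is a one-line change. This sketch assumes scikit-learn 0.18 or later (it uses `sklearn.model_selection`, which replaced the older `sklearn.cross_validation` module) and the iris dataset bundled with the library; the choice of estimators and hyperparameters is illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the small iris dataset bundled with scikit-learn (no download needed)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two different algorithms, one identical workflow: construct, fit, score
scores = {}
for clf in (LogisticRegression(max_iter=1000), KNeighborsClassifier(n_neighbors=5)):
    clf.fit(X_train, y_train)                               # learn from training data
    scores[type(clf).__name__] = clf.score(X_test, y_test)  # mean accuracy on held-out data

print(scores)
predictions = clf.predict(X_test[:5])  # the last fitted model labels new samples
```
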
  14. • pandas: a fairly “new” project (open-sourced in 2009), but development has been really active since 2012
      • a data manipulation library based on numpy
      • provides a DataFrame data structure with methods for accessing, merging/grouping and indexing data easily
      • doesn't play well (yet?) with scikits (there are some attempts, such as sklearn-pandas)
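
A sketch of the DataFrame operations mentioned above (merging, grouping, indexing). The shop and region data is made up for illustration; the calls themselves (`DataFrame.merge`, `groupby`, boolean masks, `.loc`) are standard pandas API.

```python
import pandas as pd

# Illustrative data: sales per shop, and which region each shop belongs to
sales = pd.DataFrame({
    "shop": ["tokyo", "osaka", "tokyo", "kyoto"],
    "amount": [120, 80, 200, 50],
})
shops = pd.DataFrame({
    "shop": ["tokyo", "osaka", "kyoto"],
    "region": ["kanto", "kansai", "kansai"],
})

# Merging: SQL-style left join on the "shop" column
merged = sales.merge(shops, on="shop", how="left")

# Grouping: total amount per region
per_region = merged.groupby("region")["amount"].sum()

# Indexing: a boolean mask to select rows, .loc to select rows and a column
big = merged[merged["amount"] > 100]
tokyo_total = merged.loc[merged["shop"] == "tokyo", "amount"].sum()

print(merged, per_region, big, sep="\n\n")
```
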
  15. • numpy/scipy/scikit-learn rely on many low-level Fortran/C libraries such as BLAS, ATLAS, the Intel MKL…
      • most of these libraries are shipped unoptimized by your favorite OS (well, this is not the case for Mac OS)
      • you may want to recompile these libraries
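
Before recompiling anything, it is worth checking which BLAS/LAPACK your NumPy was built against: `np.show_config()` prints that build information. The timing below is only a rough sanity check (the 500x500 size is an arbitrary assumption): a float64 matrix product is handed to the BLAS library when NumPy is built with one, so its speed varies widely between an unoptimized and an optimized BLAS.

```python
import time

import numpy as np

# Print build information, including the BLAS/LAPACK libraries NumPy links against
np.show_config()

# Rough speed check: a float64 matrix product is delegated to BLAS when available,
# so this timing reflects how well optimized that library is
a = np.random.default_rng(0).random((500, 500))
start = time.perf_counter()
c = a @ a
elapsed = time.perf_counter() - start
print(f"500x500 float64 matmul: {elapsed * 1e3:.1f} ms")
```
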
  16. • recompiling is the (very) long way!
      • we did that at tinyclues for two years; we're now using a packaged Python distribution. Some of them:
        • Anaconda (by Continuum Analytics)
        • Canopy (by Enthought)
  17. • sadly, these distributions come with yet another package management tool (conda, enpkg), which sometimes doesn't play nicely with pip and/or virtualenv
      • this adds a new step to that famous tweet about Python package managers :)
  18. We're not done
      • libraries for performance: numba, cython, …
      • domain-specific libraries: sympy, nltk, …
      • bindings: rpy2, …
      • storage: pytables, …
  19. Free (as in free beer)
      • all these libraries come for free and are developed by passionate developers
      • please be grateful; help them!
        • by finding and filing bugs (we always love to see that our code is really used by someone)
        • by fixing bugs or buying the developers a beer
        • by supporting them financially
        • by hosting one of their sprints (if your office is big enough)
  20. Recommended
      • API design for machine learning software: experiences from the scikit-learn project: http://arxiv.org/abs/1309.0238
      • Programming Collective Intelligence: http://shop.oreilly.com/product/9780596529321.do
      • PyData channel on Vimeo: http://vimeo.com/pydata