A Pythonic Whirlwind

Python caught on as a language for steering computational science applications long before it enjoyed its current mainstream popularity. Today, Python has a mature ecosystem of tools for writing and interacting with high-performance codes, and IPython/Jupyter notebooks are an increasingly common method of developing and documenting analyses. In this talk, we will discuss Python for computational science, including some of the standard scientific software packages available in Python.

Presented at the Cornell Scientific Software Club (cornell-ssw.github.io).

David Bindel

October 03, 2016

Transcript

  1. Goals and non-goals
     • Non-goal: Teach Python in one hour!
     • Goal: Give some perspective
       • Why Python?
       • What to learn about?
       • Where to learn more?
  2. Syntactic features
     • Shell-style comments (start with #)
     • Leading whitespace is significant
     • Colons introduce code blocks
     • Style guide is PEP 8 (PEP = Python Enhancement Proposal)
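To make slide 2 concrete, here is a minimal sketch (the function `classify` is a made-up example, not from the talk) showing a comment, a colon-opened block, and indentation doing the job that braces do in C:

```python
# Shell-style comment: everything after '#' on a line is ignored.
def classify(n):
    # The colon opens a block; indentation (not braces) defines where it ends.
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


for k in range(3):
    print(k, classify(k))  # PEP 8: four-space indentation, a space after each comma
```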
  3. Deeper features: data
     • Basic data types
       • Integers, floating point, complex, string, boolean, None
       • Dictionary, tuple, list
       • Objects, functions
     • Supports higher-order functions (and decorators)
     • Support for matrices (NumPy) and tables (pandas)
     • MATLAB-ish array operations (e.g. slicing)
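A small sketch of the data features from slide 3, assuming NumPy is installed. The variable names and the helper `twice` are illustrative only. Note that NumPy slicing is 0-based and half-open, unlike MATLAB's 1-based, inclusive ranges.

```python
import numpy as np

# Scalars: int, float, complex, str, bool, None
count, ratio, z, name, flag, missing = 3, 0.5, 1 + 2j, "pi", True, None
print([type(v).__name__ for v in (count, ratio, z, name, flag, missing)])
# ['int', 'float', 'complex', 'str', 'bool', 'NoneType']

# Containers: dict, tuple, list
info = {"name": name, "digits": (3, 1, 4)}
values = [1.0, 2.0, 3.0]


# Higher-order function: takes a function and returns a new one.
def twice(f):
    return lambda x: f(f(x))


add_three_twice = twice(lambda x: x + 3)
print(add_three_twice(1))  # 7

# NumPy arrays support MATLAB-style slicing (0-based, end index excluded).
A = np.arange(12).reshape(3, 4)
first_row = A[0, :]     # [0, 1, 2, 3]
last_column = A[:, -1]  # [3, 7, 11]
block = A[1:, 1:3]      # rows 1-2, columns 1-2: [[5, 6], [9, 10]]
```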
  4. Deeper features: modules and packages
     • Docstrings everywhere
     • File-based module system
     • Packaging systems (distutils, PyPI)
     • "Batteries included" standard library
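A sketch of docstrings at work. The function `hypotenuse` is hypothetical; if it lived in a file `geometry.py` (also a made-up name), the file-based module system would let you `import geometry` and read the same docstring with `help(geometry.hypotenuse)`.

```python
import math


def hypotenuse(a, b):
    """Return the length of the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a * a + b * b)


# Docstrings are ordinary attributes, so help(), IDEs, and doc generators can read them.
print(hypotenuse.__doc__)

# Modules carry docstrings too; here we read the one on the stdlib math module.
print(math.__doc__)

# "Batteries included": the standard library already provides this function.
print(hypotenuse(3, 4), math.hypot(3, 4))  # 5.0 5.0
```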
  5. Deeper features: functions and objects
     • Can manipulate functions, objects, and classes as data
       • Especially useful with decorators
     • Obeys "duck typing"
       • Pro: Don't have to satisfy a type checker to run
       • Con: Hard to catch errors by static analysis¹
     ¹ This may be changing; see e.g. PEP 484 and mypy.
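Functions as data, decorators, and duck typing fit in a few lines. This is a sketch; the decorator `timed` and its `last_elapsed` attribute are invented for illustration and are not part of any library.

```python
import functools
import time


def timed(func):
    """Decorator: wrap func so each call records its elapsed wall-clock time."""
    @functools.wraps(func)  # copy func's name and docstring onto the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper


@timed  # equivalent to: total_length = timed(total_length)
def total_length(items):
    # Duck typing: works for any iterable of objects that support len().
    return sum(len(x) for x in items)


print(total_length(["ab", "cde"]))           # strings: 5
print(total_length([[1, 2], (3,), {4: 5}]))  # list, tuple, dict: 4
print(total_length.__name__)                 # total_length
```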
  6. Deeper features: iteration and mapping
     • Lots of ways to do functional mapping
       • The map function
       • Python generators for generalized for loops
       • List comprehensions and dictionary comprehensions
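A short sketch of generators and dictionary comprehensions alongside map; the generator function `running_total` is invented for illustration.

```python
def running_total(xs):
    """Generator function: lazily yields partial sums, one per input value."""
    total = 0
    for x in xs:
        total += x
        yield total


xs = [1, 2, 3, 4, 5]
print(list(running_total(xs)))  # [1, 3, 6, 10, 15]

# map applies a function to every element (lazily, in Python 3).
print(list(map(str, xs)))       # ['1', '2', '3', '4', '5']

# Dictionary comprehension: build a dict from an iterable in one expression.
squares = {x: x * x for x in xs}
print(squares)                  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Generator expression: like a list comprehension, but nothing is stored.
print(sum(x * x for x in xs))   # 55
```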
  7. Deeper features: iteration and mapping

       xs = [1, 2, 3, 4, 5]

       # map returns a lazy iterator over [2, 3, 4, 5, 6]
       ys = map(lambda x: x + 1, xs)

       # List comprehension builds the list [2, 3, 4, 5, 6]
       zs = [x+1 for x in xs]

       # Iterate over the collections with generalized for
       for x, y, z in zip(xs, ys, zs):
           print(x, y, z)
  8. Logistical advice
     • Python 2 vs 3: Use Python 3 if possible (it usually is now)
     • Distribution: Use Anaconda or the Intel Distribution for Python
  9. Learning Python: A whirlwind tour
     This free O'Reilly report, A Whirlwind Tour of Python, inspired the name of this tutorial. Jake VanderPlas is one of the most visible leaders in the scientific Python stack.
  10. Learning Python: Python and the scientific stack
      Katy Huff was a founding member of The Hacker Within at the University of Wisconsin-Madison and has long been active in Software Carpentry. I have seen this book described more than once as "Software Carpentry in book form."
  11. Learning Python: Understanding the language
      David Beazley is a long-time Python exponent, teacher, and consultant. [This book] is my favored reference for understanding what Python is really doing behind the scenes, particularly with advanced features like decorators.
  12. Learning Python: Making it fast
      High-Performance Python is about all the tricks you can use to make Python run fast, including specialized interpreters like PyPy and compilers like Cython, among others. Some of the content will be obsolete soon; the technology moves fast. Nonetheless, I think this is valuable stuff to learn.
  13. Learning Python: Or search for Python on O'Reilly
      • 275 books
      • 41 webcasts
      • 40 books
      • 40 articles/blogs
      • 15 conferences
      • 11 reports (free)
  14. The old new thing?
      • Guido van Rossum (Python's BDFL, "Benevolent Dictator For Life") started work in 1989
      • Release 0.9.0 was February 1991
      • Current release series (Python 3) started in 2008
      • Compare:
        • "C with classes": 1979
        • C++ 2.0: 1989
        • First public Java release: 1995
  15. Evolution of SciPy/NumPy
      • 1995: matrix-sig and Numeric start
      • 1997: Paul Dubois takes over Numeric
      • 2000: Oliphant, Jones, and Peterson start SciPy (from MultiPack)
      • 2002: numarray competes with Numeric
      • 2006: Oliphant introduces NumPy
  16. Recent events
      • Python 3.0 launches in 2008
      • Python becomes a "language of data science"
      • Widespread adoption in intro programming classes
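  17. Which Python? There are many Pythons
      • CPython: the main (reference) implementation
      • PyPy: JIT-compiling implementation, written in RPython (a restricted, statically typable subset of Python)
      • Jython: runs on the JVM
      • IronPython: integrates with .NET
      • PythonNet: also integrates with .NET
      We assume CPython.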
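To confirm which interpreter you are actually running (and that it is Python 3), the standard library's `platform` and `sys` modules are enough. A minimal sketch:

```python
import platform
import sys

# Which implementation? Typical values include "CPython", "PyPy", "Jython", "IronPython".
impl = platform.python_implementation()

# Which language version? The talk recommends Python 3.
is_py3 = sys.version_info >= (3,)

print(impl, sys.version_info.major, sys.version_info.minor, is_py3)
```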
  18. Batteries included
      Python offers good-enough replacements for
      • Control scripts in shell or Tcl
      • Perl/Awk/etc. data scripts
      • Plotters (with Matplotlib and company)
      • MATLAB (with SciPy/NumPy)
      • Mathematica-style notebooks (with Jupyter)
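As an example of replacing a small Awk data script, here is a sketch equivalent to `awk '{ s += $2 } END { print s }'` using only the standard library. The helper `sum_column` and the in-memory sample data are hypothetical.

```python
import io


def sum_column(lines, column=1):
    """Sum one whitespace-separated column (0-based index; awk's $2 is column 1)."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # skip blank or short lines, as awk would add nothing
            total += float(fields[column])
    return total


# Hypothetical input; in practice you would pass an open file, e.g. open("data.txt").
data = io.StringIO("alpha 1.5\nbeta 2.0\ngamma 0.5\n")
print(sum_column(data))  # 4.0
```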
  19. Crazy glue
      Long history of tools to bridge the performance gap:
      • SWIG and f2py (mid-90s): wrapper generators
      • Cython (previously Pyrex): C/Python hybrid
      • Numba: recent LLVM-based JIT accelerator
      • And many others
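One of the "many others" is `ctypes`, which ships with the standard library and calls functions in existing shared libraries without any generated wrapper code. A minimal sketch, assuming a POSIX system where the C math library can be located (library lookup works differently on Windows):

```python
import ctypes
import ctypes.util

# Locate the C math library. If find_library returns None, CDLL(None) falls back
# to symbols already loaded into the running process (POSIX only).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature double cos(double) so arguments convert correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```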
  20. NumPy and SciPy
      These are the base of the Python scientific ecosystem.
      • NumPy is about array manipulation and basic linear algebra
      • SciPy has higher-level scientific codes
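A sketch of the division of labor, assuming NumPy and SciPy are installed (for instance via Anaconda): NumPy solves a small linear system, and SciPy's adaptive quadrature routine `integrate.quad` computes the integral of sin(t) from 0 to pi, which is exactly 2.

```python
import numpy as np
from scipy import integrate

# NumPy: arrays and basic linear algebra. Solve A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)  # approximately [2. 3.]

# SciPy: higher-level numerical routines; quad returns (value, error estimate).
value, abs_error = integrate.quad(np.sin, 0.0, np.pi)
print(value, abs_error)  # value is approximately 2.0
```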
  21. Visualization tools
      • Matplotlib is the default
      • pandas plotting builds on Matplotlib
      • Seaborn also builds on Matplotlib
      • Bokeh for interactive web plots
      • Altair is the new kid on the block
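A minimal Matplotlib sketch, assuming Matplotlib and NumPy are installed. It selects the non-interactive `Agg` backend so it runs on a headless machine and writes the figure to a temporary file; the file name is arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only; no display needed
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(t, np.sin(t), label="sin")
ax.plot(t, np.cos(t), label="cos")
ax.set_xlabel("t")
ax.legend()

out_path = os.path.join(tempfile.gettempdir(), "whirlwind_demo.png")  # arbitrary name
fig.savefig(out_path)
plt.close(fig)  # free the figure's memory
print(out_path)
```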
  22. pandas
      The pandas library is Python's answer to the R data frame object. There is a book, too.
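A sketch of the data-frame workflow, assuming pandas is installed. The table of solver timings is made up purely for illustration.

```python
import pandas as pd

# A small, made-up table with columns of different types, like an R data frame.
df = pd.DataFrame({
    "solver": ["cg", "cg", "gmres", "gmres"],
    "n": [100, 200, 100, 200],
    "seconds": [0.5, 1.5, 0.75, 2.25],
})

# Split-apply-combine: mean run time per solver.
mean_times = df.groupby("solver")["seconds"].mean()
print(mean_times)

# Boolean-mask row selection, similar to subset() in R.
large = df[df["n"] == 200]
print(large)
```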
  23. Scikits
      • Too specialized for the main SciPy distribution
      • Many domains: finance, audio, geoscience, vision, ML, ...
      • scikit-learn may be the best known
  24. Cython
      • Compiles Python down to C
      • C type annotations for faster code
      • Or syntax to call C routines
      • O'Reilly inevitably has a book
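Besides `.pyx` files, Cython offers a "pure Python mode" in which C types are written as ordinary Python annotations, so the same file runs unmodified under CPython and can also be compiled for speed. A sketch assuming the `cython` package is installed; `harmonic` is a made-up example.

```python
import cython


def harmonic(n: cython.int) -> cython.double:
    """Sum 1/i for i = 1..n.

    Run uncompiled, this is ordinary Python. Compiled by Cython, the annotations
    become C int/double declarations, so the loop no longer uses Python objects.
    """
    total: cython.double = 0.0
    i: cython.int
    for i in range(1, n + 1):
        total += 1.0 / i
    return total


print(harmonic(4))  # 1 + 1/2 + 1/3 + 1/4
```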
  25. Numba
      Add @jit before your function defs, and then a miracle occurs.

        from numba import jit
        from numpy import arange

        # jit decorator tells Numba to compile this function.
        # Arg types inferred by Numba when function is called.
        @jit
        def sum2d(arr):
            M, N = arr.shape
            result = 0.0
            for i in range(M):
                for j in range(N):
                    result += arr[i,j]
            return result

        a = arange(9).reshape(3,3)
        print(sum2d(a))
  26. SymPy and SAGE
      • SAGE: Symbolic tools à la Magma, Maple, Mathematica
      • SymPy: Lightweight symbolic math library
      • SAGE includes SymPy
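A sketch of everyday SymPy use, assuming SymPy is installed: symbolic differentiation, integration, and series expansion of made-up expressions. The comments state the mathematical results; SymPy's printed form may order terms differently.

```python
import sympy as sp

x = sp.symbols("x")
expr = x * sp.sin(x)

derivative = sp.diff(expr, x)           # mathematically: x*cos(x) + sin(x)
antiderivative = sp.integrate(expr, x)  # mathematically: sin(x) - x*cos(x)
series = sp.series(sp.exp(x), x, 0, 4)  # 1 + x + x**2/2 + x**3/6 + O(x**4)

print(derivative)
print(antiderivative)
print(series)
```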
  27. Package managers
      There have been many package managers over time: distutils, setuptools, distribute, pip/virtualenv, and conda. Outside of the scientific community, pip packages seem to be the standard. But building numerical codes, which often depend on compiled C, C++, or Fortran libraries, is generally a problem, and this is part of why conda was created.
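  28. Parallelism
      • Python threads support concurrency, not parallelism (in CPython, the global interpreter lock lets only one thread execute Python bytecode at a time)
      • For parallelism on one node: multiprocessing
      • But there are many other options, whether on one node or a cluster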
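A sketch of single-node parallelism with the standard `multiprocessing` module. It explicitly requests the `fork` start method (POSIX only), an assumption that keeps the example working even when functions are defined interactively; `square` stands in for real CPU-bound work, and `parallel_map` is a hypothetical helper.

```python
import multiprocessing as mp


def square(x):
    # Stand-in for a CPU-bound task; each call runs in a worker process.
    return x * x


def parallel_map(func, items, processes=4):
    """Apply func to items using a pool of worker processes.

    Separate processes sidestep CPython's global interpreter lock, which keeps
    threads from executing Python bytecode in parallel.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(parallel_map(square, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```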
  29. Jupyter
      • Mix text, code, and output
      • Interact via a web interface
      • Works with many languages
        • Started with Python (as IPython)
        • Then Julia, Python, and R (hence the name)
        • Now many others as well
  30. Hosted services
      • GitHub: automatic rendering from repos
      • http://notebooks.azure.com
      • http://mybinder.org/
      • http://wakari.io/
      • http://cloud.sagemath.com/
  31. Jupyter in the news!
      The LIGO Open Science Center (LOSC) released tutorial notebooks on the signal processing of the LIGO data, so you can reproduce the analysis for yourself!
  32. Role of notebooks
      • Notebooks are great for notes, tutorials, one-off analyses
      • Less useful for
        • Developing programs or modules
        • Running large-scale simulations
      • Do not replace Make, Snakemake, PyDoIt, etc.
  33. Why Python?
      • Simple to learn
        • There is a reason we use it for CS 1110!
      • Widely adopted
      • Lots of libraries (especially for computational science)
      • Many collaborators/students will already know it
      • You can find answers to common questions easily
      • Serves as a useful glue language
  34. Why not Python?
      • Too slow?
        • Maybe too slow for numerical kernels...
        • But it makes a nice interface
        • And Cython/Numba make it faster than you might think
      • I wish there were better static analysis tools
        • But this may be changing
        • And I only wish for this sometimes!
  35. What I think
      You should learn some Python
      • It is easy to get started
      • It is ubiquitous
      • It makes a good programming "Swiss army knife"