Slide 1

Slide 1 text

A Pythonic Whirlwind

Slide 2

Slide 2 text

Goals and Non-Goals • Non-goal: Teach Python in one hour! • Goal: Give some perspective • Why Python? • What to learn about? • Where to learn more?

Slide 3

Slide 3 text

Python basics

#!/usr/bin/env python
import sys

def main(name="world"):
    print("Hello, {0}!".format(name))

if __name__ == "__main__":
    main(*sys.argv[1:])

Slide 4

Slide 4 text

Syntactic features • Shell style comments (start with #) • Leading white space significant • Colons for code blocks • Style guide is PEP8 (Python Enhancement Proposal)
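A minimal sketch of these syntactic features. The function name classify is made up for illustration and is not part of the original slides.

```python
# Shell-style comment: everything after '#' on a line is ignored.

def classify(n):
    """Return a label for the number n (hypothetical example function)."""
    # A colon opens each block; the indentation that follows delimits it.
    # PEP 8 recommends four spaces per indentation level.
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


labels = [classify(n) for n in (-2, 0, 7)]
print(labels)
```

Changing the indentation of any return line changes which block it belongs to, which is why leading white space matters.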

Slide 5

Slide 5 text

Deeper features: data • Basic data types • Integers, floating point, complex, string, boolean, None • Dictionary, tuple, list • Objects, functions • Supports higher-order functions (and decorators) • Support for matrices (NumPy) and tables (pandas) • MATLAB-ish array ops (e.g. slicing)
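A short sketch of the built-in data types, using only the standard library. All variable names and the helper apply_twice are assumptions chosen for illustration. Slicing is shown on a plain list and a string; NumPy arrays use the same start:stop:step notation.

```python
# Basic scalar types
count = 42            # integer
ratio = 0.5           # floating point
z = 1 + 2j            # complex
name = "python"       # string
flag = True           # boolean
nothing = None        # None

# Built-in containers
ages = {"ada": 36, "alan": 41}   # dictionary
point = (3, 4)                   # tuple (immutable)
primes = [2, 3, 5, 7, 11]        # list (mutable)

# Slicing uses start:stop:step
first_three = primes[:3]
every_other = primes[::2]
reversed_name = name[::-1]

# Functions are values, so they can be passed to other functions.
def apply_twice(f, x):
    """Apply f to x two times (hypothetical helper)."""
    return f(f(x))

result = apply_twice(lambda v: v * 2, 5)
print(first_three, every_other, reversed_name, result)
```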

Slide 6

Slide 6 text

Deeper features: modules/packages • Docstrings everywhere • File-based module system • Packaging systems (distutils, PyPI) • "Batteries included" stdlib
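A small sketch of docstrings and the standard library in use. The module name greetings and the function greet are hypothetical; json is a real stdlib module. Saved as greetings.py, this file could be loaded elsewhere with import greetings, which is all the file-based module system requires.

```python
"""Hypothetical module greetings: a docstring can document a whole file."""

import json


def greet(name="world"):
    """Return a greeting for the given name."""
    return "Hello, {0}!".format(name)


# Docstrings are kept at run time; help(greet) prints the same text.
print(greet.__doc__)

# "Batteries included": JSON support ships with the interpreter.
encoded = json.dumps({"greeting": greet()})
print(encoded)
```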

Slide 7

Slide 7 text

Deeper features: functions and objects • Can manipulate functions, objects, classes as data • Especially useful with decorators • Obeys "duck typing" • Pro: Don't have to satisfy type checker to run • Con: Hard to catch errors by static analysis (this may be changing; see e.g. PEP 484 and mypy)
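A sketch of a decorator together with duck typing. The names logged, area, Rect, and Screen are invented for this example; functools.wraps is the standard library helper that preserves the wrapped function's name and docstring.

```python
import functools


def logged(func):
    """Decorator: record the arguments of every call in wrapper.calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@logged
def area(shape):
    """Return width * height for any object that has both attributes."""
    return shape.width * shape.height


class Rect:
    def __init__(self, width, height):
        self.width = width
        self.height = height


class Screen:
    # Unrelated to Rect, but it "quacks" the same way.
    width = 1920
    height = 1080


print(area(Rect(2, 3)))
print(area(Screen()))
print(len(area.calls))
```

The downside mentioned above shows up if area is called with an object lacking width or height: nothing complains until the call runs and raises AttributeError.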

Slide 8

Slide 8 text

Deeper features: iteration and mapping • Lots of ways to do functional mapping • map function • Python generators for generalized for • List comprehension and dictionary comprehension

Slide 9

Slide 9 text

Deeper features: iteration and mapping

xs = [1, 2, 3, 4, 5]

# Produce a lazy iterator over the values 2, 3, 4, 5, 6
ys = map(lambda x: x + 1, xs)

# Produce the list [2, 3, 4, 5, 6]
zs = [x+1 for x in xs]

# Iterate over the collections with generalized for
for x, y, z in zip(xs, ys, zs):
    print(x, y, z)
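A further sketch covering generators and dictionary comprehensions. The generator name countdown and the variable names are made up for illustration. The last lines show that, in Python 3, the result of map is a one-shot iterator.

```python
def countdown(n):
    """Generator: lazily yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


# A generator works anywhere a for loop expects something iterable.
for k in countdown(3):
    print(k)

# Dictionary comprehension: map each x to its square.
squares = {x: x * x for x in range(1, 4)}

# Generator expression: like a list comprehension, but lazy.
lazy_sum = sum(x * x for x in range(1, 4))

# A map object can be consumed only once.
ys = map(lambda x: x + 1, [1, 2, 3])
first_pass = list(ys)
second_pass = list(ys)   # empty, because ys is already exhausted
print(squares, lazy_sum, first_pass, second_pass)
```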

Slide 10

Slide 10 text

Python as an acceptable Lisp A comparison: http://norvig.com/python-lisp.html

Slide 11

Slide 11 text

Logistical advice • 2 vs 3: Use Python 3 if possible (it usually is now) • Distro: Use Anaconda or Intel

Slide 12

Slide 12 text

Where to learn?

Slide 13

Slide 13 text

Learning Python: First steps • https://www.python.org • https://developers.google.com/edu/python/ • https://swcarpentry.github.io/python-second-language/ • CS 1133: Transition to Python

Slide 14

Slide 14 text

Learning Python A whirlwind tour This free O'Reilly report inspired the name of this tutorial. Jake VanderPlas is one of the most visible leaders in the scientific Python community.

Slide 15

Slide 15 text

Learning Python Python and the scientific stack Katy Huff was a founding member of The Hacker Within at University of Wisconsin, Madison, and has long been active in Software Carpentry. I have seen this book described more than once as "Software Carpentry in book form."

Slide 16

Slide 16 text

Learning Python Understanding the language David Beazley is a long-time Python exponent, teacher, and consultant. [This book] is my favored reference for understanding what Python is really doing behind the scenes, particularly with advanced features like decorators.

Slide 17

Slide 17 text

Learning Python Making it fast High-Performance Python is about all the tricks you can use to make Python run fast, including specialized interpreters like PyPy and compilers like Cython, among others. Some of the content will be obsolete soon; the technology moves fast. Nonetheless, I think this is valuable stuff to learn.

Slide 18

Slide 18 text

Learning Python Or search for Python on O'Reilly • 275 books • 41 webcasts • 40 books • 40 articles/blogs • 15 conferences • 11 reports (free)

Slide 19

Slide 19 text

Some history

Slide 20

Slide 20 text

The old new thing? • Guido van Rossum (Python BDFL) started work in 1989 • Release 0.9.0 was February 1991 • Current release series (Python 3) started 2008 • Compare: • "C with classes": 1979 • C++ 2.0 in 1989 • Java project started in 1991 (first public release in 1995)

Slide 21

Slide 21 text

Evolution of SciPy/NumPy • 1995: matrix-sig and Numeric start • 1997: Paul Dubois takes over Numeric • 2000: Oliphant, Jones, Peterson start SciPy (from MultiPack) • 2002: numarray competes with Numeric • 2006: Oliphant introduces NumPy

Slide 22

Slide 22 text

Recent events • Python 3.0 launches in 2008 • Python becomes a "language of data science" • Widespread adoption in intro programming classes

Slide 23

Slide 23 text

Role of Python

Slide 24

Slide 24 text

Which Python? There are many Pythons • CPython - main version • PyPy - JIT-compiling implementation, written in RPython (a statically typed subset of Python) • Jython - runs on JVM • IronPython - integrates with .NET • PythonNet - same We assume CPython.

Slide 25

Slide 25 text

Batteries included Python offers good-enough replacements for • Control scripts in shell or Tcl • Perl/Awk/etc data scripts • Plotters (with Matplotlib and company) • MATLAB (with SciPy/NumPy) • Mathematica-style notebooks (with Jupyter)

Slide 26

Slide 26 text

Crazy glue Long history of tools to bridge performance gap: • SWIG and f2py (mid 90s): wrapper generator • Cython (previously Pyrex): C/Python hybrid • Numba: recent LLVM-based JIT accelerator • and many others

Slide 27

Slide 27 text

What to learn?

Slide 28

Slide 28 text

NumPy and SciPy These are the base of the Python scientific ecosystem. • NumPy is about array manipulation and basic LA • SciPy has more high-level scientific codes

Slide 29

Slide 29 text

Visualization tools • Matplotlib is default • Pandas plot builds on matplotlib • Seaborn also builds on matplotlib • Bokeh for interactive web plots • Altair is the new kid on the block

Slide 30

Slide 30 text

pandas The pandas library is Python's answer to the R data frame object. There is a book, too.

Slide 31

Slide 31 text

Scikits • Too specialized for the main SciPy distribution • Many domains: finance, audio, geoscience, vision, ML, ... • scikit-learn may be the best known

Slide 32

Slide 32 text

Cython • Compiles Python down to C • C type annotations for faster code • Or syntax to call C routines • O'Reilly inevitably has a book

Slide 33

Slide 33 text

Numba Add @jit before your function defs, and then a miracle occurs.

from numba import jit
from numpy import arange

# jit decorator tells Numba to compile this function.
# Arg types inferred by Numba when function is called.
@jit
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

a = arange(9).reshape(3,3)
print(sum2d(a))

Slide 34

Slide 34 text

SymPy and SAGE • SAGE: Symbolic tools a la Magma, Maple, Mathematica • SymPy: Lightweight symbolic math library • SAGE includes SymPy

Slide 35

Slide 35 text

Package managers There have been many package managers over time: distutils, setuptools, distribute, pip/virtualenv, and conda. Outside of the scientific community, pip packages seem to be the standard. But building numerical codes is generally a problem, and this is part of why conda was created.

Slide 36

Slide 36 text

Parallelism • Python threads support concurrency, not parallelism • For parallelism on one node: multiprocessing • But many other options, whether one node or cluster

Slide 37

Slide 37 text

Jupyter notebooks

Slide 38

Slide 38 text

Jupyter • Mix text, code, output • Interact via web interface • Works with many languages • Started with Python • Then Julia, Python, and R • Now many others as well

Slide 39

Slide 39 text

Hosted services • GitHub: automatic rendering from repos • http://notebooks.azure.com • http://mybinder.org/ • http://wakari.io/ • http://cloud.sagemath.com/

Slide 40

Slide 40 text

Jupyter in the news! The LIGO Open Science Center (LOSC) released tutorial notebooks on the signal processing applied to the LIGO data -- you can reproduce the analysis for yourself!

Slide 41

Slide 41 text

Role of notebooks • Notebooks are great for notes, tutorials, one-off analyses • Less useful for • Developing programs or modules • Running large-scale simulations • Does not replace Make, SnakeMake, PyDoIt, etc

Slide 42

Slide 42 text

Concluding thoughts

Slide 43

Slide 43 text

Why Python? • Simple to learn • There is a reason we use it for CS 1110! • Widely adopted • Lots of libraries (especially for computational science) • Many collaborators/students will already know it • You can find answers to common questions easily • Serves as a useful glue language

Slide 44

Slide 44 text

Why not Python? • Too slow? • Maybe too slow for numerical kernels... • But it makes a nice interface • And Cython/Numba make it faster than you might think • I wish there were better static analysis tools • But this may be changing • And I only wish for this sometimes!

Slide 45

Slide 45 text

What I think You should learn some Python • It is easy to get started • It is ubiquitous • It makes a good programming "Swiss army knife"