Slide 1

Slide 1 text

Python Data Visualization: What Now?

Slide 2

Slide 2 text

Hi. I’m Rob Story.
wrobstory.github.io
github.com/wrobstory/pdxdatasci2014
@oceankidbilly

Slide 3

Slide 3 text

I work @simple. We’re hiring Data Engineers, Scientists, and Analysts. Great company. Great team. Interesting data.

Slide 4

Slide 4 text

Question:

Slide 5

Slide 5 text

I have data. It’s November 2014. I want to make a chart. What library should I use?

Slide 6

Slide 6 text

As with programming languages, databases, or web frameworks, it depends on what you’re trying to do.

Slide 7

Slide 7 text

Quick Web Visualization
  D3 Wrappers: NVD3, C3, Dimple, Vega
Custom Web Visualization
  “Raw” D3
Exploratory Visualization
  R -> ggplot, rCharts
  Python -> Pandas, Seaborn, ggplot, Bokeh
Custom Python Visualization
  “Raw” Matplotlib or Bokeh

Slide 8

Slide 8 text

What about creating journal-quality figures? Libraries built on the Matplotlib and Bokeh renderers are flexible enough for both data exploration and the customization that publishing requires.

Slide 9

Slide 9 text

First rule of Python data vis: Use the IPython Notebook.
Second rule of Python data vis: Use the IPython Notebook.

Slide 10

Slide 10 text

IPython Notebook:
  Flexible
  Reproducible
  Customizable
  Publishable
  Interactive-able
  all-the-other-ibles-and-ables

Slide 11

Slide 11 text

The Toolset:
  IPython Notebook
  Pandas
  Matplotlib
  Seaborn
  ggplot
  Bokeh?

Slide 12

Slide 12 text

NOTEBOOK DEMO!
Q: What happens when data exploration reveals that your data is boring?
A: Make a bar chart and move on.
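
A minimal sketch of the "make a bar chart and move on" step, assuming Pandas and Matplotlib from the toolset. The category labels and counts are hypothetical placeholders, not data from the demo.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import pandas as pd

# Hypothetical placeholder counts, not data from the talk.
counts = pd.Series({"a": 3, "b": 7, "c": 5}, name="count")

# Series.plot delegates to Matplotlib and returns the Axes it drew on,
# so any further tweaks are plain Matplotlib calls.
ax = counts.plot(kind="bar", title="Counts by category")
ax.set_xlabel("category")
ax.set_ylabel("count")
ax.get_figure().tight_layout()
```

In the IPython Notebook the backend line can be dropped in favor of inline plotting.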

Slide 13

Slide 13 text

Statistical Visualization
With Seaborn and the ggplot port, we now have a great toolset.
DEMO!
Q: Is it more likely to snow on Mt. Hood in December or March?

Slide 14

Slide 14 text

What now?
Q: How should developers interested in building Python data vis libraries proceed?
Why didn’t you bring up Vincent.py, the thing you made?

Slide 15

Slide 15 text

We should be building on common visualization “kernels”. Building rendering engines is a lot of work.

Slide 16

Slide 16 text

High-Level Tool -> Kernel
  Bokeh -> Bokeh (Bokeh.js)
  Seaborn -> Matplotlib/Bokeh
  ggplot -> Matplotlib/Bokeh
  Vincent -> Vega -> D3
The first three can leverage everything that the Bokeh and Matplotlib teams are working on: publishing to different formats, rendering widgets, etc.

Slide 17

Slide 17 text

Let’s build better abstractions around common toolsets:
1. Build on kernel “primitives” (Bokeh glyphs, Matplotlib line/bar, etc.)
2. Document your data interfaces
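
Both points can be sketched in a few lines, assuming Matplotlib as the kernel. The `bar_chart` helper below is hypothetical (it is not from any library named in the talk): it does no rendering of its own, calls only the `Axes.bar` primitive, and states its data interface in the docstring.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def bar_chart(data, ax=None):
    """Draw one bar per entry of a mapping (hypothetical helper).

    Data interface
    --------------
    data : dict mapping label (str) -> height (int or float).
        Bars are drawn in the dict's insertion order.
    ax : matplotlib Axes, optional.
        A new figure and Axes are created if omitted.

    Returns the Axes, so callers can keep customizing with plain Matplotlib.
    """
    if ax is None:
        _, ax = plt.subplots()
    labels = list(data)
    positions = range(len(labels))
    # The only drawing call is the kernel primitive, Axes.bar.
    ax.bar(positions, [data[label] for label in labels])
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    return ax


# Placeholder values for illustration only.
ax = bar_chart({"a": 4, "b": 6})
```

Returning the `Axes` is the design choice that matters here: users get the high-level call for exploration and the full kernel API for publication tweaks.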

Slide 18

Slide 18 text

That one is worth repeating: DOCUMENT YOUR DATA INTERFACES!

Slide 19

Slide 19 text

If users struggle to get data into a chart, either your API is broken, your docs are broken, or both are broken.

Slide 20

Slide 20 text

Ingest common data formats
  lists, dicts, NumPy arrays, Pandas DataFrames & Series
Document the shape of those formats
  Should my data be “long” or “wide”?
  Should I have a list of dicts, or a dict of lists?
  Does my DataFrame need a specific structure?
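
To make the shape questions concrete, here is a small sketch using Pandas. The column names (`group`, `x`, `y`) and the values are hypothetical placeholders; the point is only that the same data can arrive in several layouts, and a library's docs should say which one it expects.

```python
import pandas as pd

# List of dicts: one dict per row.
rows = [
    {"group": "a", "x": 1, "y": 10},
    {"group": "a", "x": 2, "y": 20},
    {"group": "b", "x": 1, "y": 30},
    {"group": "b", "x": 2, "y": 40},
]

# Dict of lists: one list per column.
columns = {
    "group": ["a", "a", "b", "b"],
    "x": [1, 2, 1, 2],
    "y": [10, 20, 30, 40],
}

# Both layouts build the same "long" DataFrame: one row per observation.
long_from_rows = pd.DataFrame(rows)
long_from_columns = pd.DataFrame(columns)

# "Wide": one row per x value, one column per group.
wide = long_from_rows.pivot(index="x", columns="group", values="y")

# And back to long again with melt.
long_again = wide.reset_index().melt(id_vars="x", var_name="group", value_name="y")
```

A charting function that accepts any of these should either normalize them internally or document exactly one as its required input.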

Slide 21

Slide 21 text

Q: Should I just contribute to MPL/Bokeh core?
Bokeh: Yes! Bokeh has high-level Chart interfaces built on their glyphs. I’m sure they would appreciate PRs building on those interfaces.
MPL: …maybe! Seaborn has shown that a lib built on MPL with a focused API can work *really* well. Model for other libraries?

Slide 22

Slide 22 text

What does the future look like?
Bokeh
  Write once, render everywhere
  Out-of-core “big data” visualization
  Crossfilter-like linked brushing
but…
Can’t quite recommend it yet for everyday analysis. It is moving very fast, the documentation is hard to follow, and data input is inconsistent. It’s still a young project.
DEMO!

Slide 23

Slide 23 text

Last thing: is there a place for D3 in the IPython Notebook? Yes, I think so. One topic that wasn’t covered at all here was interactivity outside of IPython widgets: data transitions, etc. Bokeh is working on it, but D3 already has many libs with these features. Sticky.py? Maybe one day…

Slide 24

Slide 24 text

FIN! THANK YOU!