Slide 1

Slide 1 text

Visualization in Python with Altair
Jake VanderPlas (@jakevdp)
Puget Sound Python, Nov 9, 2016

Slide 2

Slide 2 text

Statistical Visualization in Python with Altair
Jake VanderPlas (@jakevdp)
Puget Sound Python, Nov 9, 2016

Slide 3

Slide 3 text

Declarative Statistical Visualization in Python with Altair
Jake VanderPlas (@jakevdp)
Puget Sound Python, Nov 9, 2016

Slide 4

Slide 4 text

Python Viz is a bit Painful...
"I have been using Matplotlib for a decade now, and I still have to look most things up"
"I love Python but I switch to R for making plots"
"I do viz in Python, but switch from matplotlib to seaborn to bokeh depending on what I need to do"

Slide 5

Slide 5 text

Problem: where would you tell beginners to start?
- Matplotlib
- Bokeh
- Plotly
- Seaborn
- Holoviews
- VisPy
- ggplot
- pandas plot
- Lightning
Each library has strengths, but arguably none is yet the “killer viz app” for Data Science.

Slide 6

Slide 6 text

Some examples . . .

Slide 7

Slide 7 text

Plotting with Matplotlib

    import matplotlib.pyplot as plt
    from numpy.random import rand

    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 200.0 * rand(100)
        plt.scatter(x, y, c=color, s=size, label=color,
                    alpha=0.3, edgecolor='none')

    plt.legend(frameon=True)
    plt.show()

Slide 8

Slide 8 text

Plotting with Matplotlib
Advantages:
- Matlab-like API
- Well-tested, standard tool for over a decade
- LOADS of rendering backends
- Can reproduce just about any plot… if you have time
Disadvantages:
- Matlab-like API
- Often poor stylistic defaults (though see 2.0 release)
- Imperative model: lots of manual tweaking required (though see Seaborn & ggplot)
- Poor support for web/interactive graphs (though see http://mpld3.github.io/)
- Often slow for large & complicated data

Slide 9

Slide 9 text

Matplotlib Gallery

Slide 10

Slide 10 text

Plotting with Bokeh

    from numpy.random import rand  # needed for rand() below
    from bokeh.plotting import figure, show
    from bokeh.models import LinearAxis, Range1d

    p = figure()

    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 0.03 * rand(100)
        # legend= is the 2016-era keyword; newer Bokeh releases use legend_label=
        p.circle(x, y, fill_color=color, radius=size, legend=color,
                 fill_alpha=0.3, line_color=None)

    show(p)

Slide 11

Slide 11 text

Plotting with Bokeh
Advantages:
- Web view/interactivity
- Imperative and Declarative layers
- Handles large and/or streaming datasets
- Modern default plot styles
Disadvantages:
- No vector output (need PDF/EPS? Sorry)
- Newer tool with a smaller user-base than matplotlib

Slide 12

Slide 12 text

Bokeh Gallery

Slide 13

Slide 13 text

Moving to Statistical Visualization

Slide 14

Slide 14 text

Statistical Visualization

    from altair import load_dataset
    iris = load_dataset('iris')
    iris.head()

Data in Tidy Format: i.e. rows are samples, columns are features
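To make "rows are samples, columns are features" concrete, here is a small sketch that reshapes a per-species table into tidy records using only the standard library. The numbers and the names `wide` and `tidy` are made up for illustration; they are not taken from the iris dataset or from Altair.

```python
# Made-up petal lengths keyed by species: one entry per species, which is NOT tidy.
wide = {
    "setosa": [1.4, 1.3],
    "versicolor": [4.7, 4.5],
}

# Tidy form: one record per sample, one key per feature.
tidy = [
    {"species": species, "petalLength": value}
    for species, values in wide.items()
    for value in values
]

for row in tidy:
    print(row)
```

In the tidy form every feature has a name, so a plotting library can refer to it by that name when mapping data to the plot.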

Slide 15

Slide 15 text

Statistical Visualization: Grouping

    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))

    for species, group in iris.groupby('species'):
        plt.scatter(group['petalLength'], group['sepalWidth'],
                    color=color_map[species], alpha=0.3,
                    edgecolor=None, label=species)

    plt.legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')  # label matches the plotted y column

Slide 16

Slide 16 text

Statistical Visualization: Faceting

    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)

    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)

    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species], alpha=0.3,
                      edgecolor=None, label=species)
        ax[i].legend(frameon=True, title='species')

    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')  # label matches the plotted y column

Slide 17

Slide 17 text

Statistical Visualization: Faceting

    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)

    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)

    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species], alpha=0.3,
                      edgecolor=None, label=species)
        ax[i].legend(frameon=True, title='species')

    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')  # label matches the plotted y column

Problem: We’re mixing the what with the how

Slide 18

Slide 18 text

Most Useful for Data Science is Declarative Visualization
Declarative
- Specify What should be done
- Details determined automatically
- Separates Specification from Execution
Imperative
- Specify How something should be done
- Must manually specify plotting steps
- Specification & Execution intertwined
Declarative visualization lets you think about data and relationships, rather than incidental details.
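As a rough illustration of separating specification from execution, the sketch below keeps the specification (a plain dict saying what maps to each channel) apart from the execution (a function that decides how to split the rows into series). The names `spec`, `rows`, and `plan_groups` are hypothetical and belong to none of the libraries in this talk; the data values are made up.

```python
from collections import defaultdict

# Made-up tidy records: one row per sample, one key per feature.
rows = [
    {"petalLength": 1.4, "sepalWidth": 3.5, "species": "setosa"},
    {"petalLength": 4.7, "sepalWidth": 3.2, "species": "versicolor"},
    {"petalLength": 1.3, "sepalWidth": 3.0, "species": "setosa"},
]

# Declarative part: states WHAT is mapped to each channel, nothing more.
spec = {"x": "petalLength", "y": "sepalWidth", "color": "species"}


def plan_groups(rows, spec):
    """Execution part (hypothetical): work out HOW to split rows into series."""
    groups = defaultdict(list)
    for row in rows:
        point = (row[spec["x"]], row[spec["y"]])
        groups[row[spec["color"]]].append(point)
    return dict(groups)


groups = plan_groups(rows, spec)
for label, points in groups.items():
    print(label, points)
```

Changing the grouping means editing one entry in `spec`; the loop inside `plan_groups` stays untouched.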

Slide 19

Slide 19 text

Seaborn: Declarative Visualization . . . Almost

    import seaborn as sns
    g = sns.FacetGrid(iris, col="species", hue="species")
    g.map(plt.scatter, "petalLength", "sepalWidth", alpha=0.3)
    g.add_legend();

Slide 20

Slide 20 text

Altair for Declarative Visualization

    from altair import Chart

    Chart(iris).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    )

Slide 21

Slide 21 text

Altair
A declarative statistical visualization library for Python, driven by Vega-Lite
http://github.com/altair-viz/altair
A collaboration between Brian Granger (Jupyter team), myself, and UW’s Interactive Data Lab

Slide 22

Slide 22 text

Changing the Encoding is Trivial

    from altair import Chart

    Chart(iris).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
    )

Slide 23

Slide 23 text

Changing the Encoding is Trivial

    from altair import Chart

    Chart(iris).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
        column='species'
    )

Slide 24

Slide 24 text

So What Is Altair?

Slide 25

Slide 25 text

D3 is Everywhere . . . (click for live version)

Slide 26

Slide 26 text

But working in D3 can be challenging . . .

Slide 27

Slide 27 text

Bar Chart: d3

    var margin = {top: 20, right: 20, bottom: 30, left: 40},
        width = 960 - margin.left - margin.right,
        height = 500 - margin.top - margin.bottom;

    var x = d3.scale.ordinal()
        .rangeRoundBands([0, width], .1);

    var y = d3.scale.linear()
        .range([height, 0]);

    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left")
        .ticks(10, "%");

    var svg = d3.select("body").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    d3.tsv("data.tsv", type, function(error, data) {
      if (error) throw error;

      x.domain(data.map(function(d) { return d.letter; }));
      y.domain([0, d3.max(data, function(d) { return d.frequency; })]);

      svg.append("g")
          .attr("class", "x axis")
          .attr("transform", "translate(0," + height + ")")
          .call(xAxis);

      svg.append("g")
          .attr("class", "y axis")
          .call(yAxis)
        .append("text")
          .attr("transform", "rotate(-90)")
          .attr("y", 6)
          .attr("dy", ".71em")
          .style("text-anchor", "end")
          .text("Frequency");

      svg.selectAll(".bar")
          .data(data)
        .enter().append("rect")
          .attr("class", "bar")
          .attr("x", function(d) { return x(d.letter); })
          .attr("width", x.rangeBand())
          .attr("y", function(d) { return y(d.frequency); })
          .attr("height", function(d) { return height - y(d.frequency); });
    });

    function type(d) {
      d.frequency = +d.frequency;
      return d;
    }

D3 is a Javascript package that streamlines manipulation of objects on a webpage.

Slide 28

Slide 28 text

Bar Chart: Vega

    {
      "width": 400,
      "height": 200,
      "padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
      "data": [
        {
          "name": "table",
          "values": [
            {"x": 1, "y": 28}, {"x": 2, "y": 55},
            {"x": 3, "y": 43}, {"x": 4, "y": 91},
            {"x": 5, "y": 81}, {"x": 6, "y": 53},
            {"x": 7, "y": 19}, {"x": 8, "y": 87},
            {"x": 9, "y": 52}, {"x": 10, "y": 48},
            {"x": 11, "y": 24}, {"x": 12, "y": 49},
            {"x": 13, "y": 87}, {"x": 14, "y": 66},
            {"x": 15, "y": 17}, {"x": 16, "y": 27},
            {"x": 17, "y": 68}, {"x": 18, "y": 16},
            {"x": 19, "y": 49}, {"x": 20, "y": 15}
          ]
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "range": "width",
          "domain": {"data": "table", "field": "x"}
        },
        {
          "name": "y",
          "type": "linear",
          "range": "height",
          "domain": {"data": "table", "field": "y"},
          "nice": true
        }
      ],
      "axes": [
        {"type": "x", "scale": "x"},
        {"type": "y", "scale": "y"}
      ],
      "marks": [
        {
          "type": "rect",
          "from": {"data": "table"},
          "properties": {
            "enter": {
              "x": {"scale": "x", "field": "x"},
              "width": {"scale": "x", "band": true, "offset": -1},
              "y": {"scale": "y", "field": "y"},
              "y2": {"scale": "y", "value": 0}
            },
            "update": {
              "fill": {"value": "steelblue"}

Vega is a detailed declarative specification for visualizations, built on D3.

Slide 29

Slide 29 text

Bar Chart: Vega-Lite

    {
      "description": "A simple bar chart with embedded data.",
      "data": {
        "values": [
          {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
          {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
          {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
      }
    }

Vega-Lite is a simpler declarative specification aimed at statistical visualization.

Slide 30

Slide 30 text

Bar Chart: Altair
Altair is a Python API for creating Vega-Lite specifications.
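A minimal sketch of the idea, and explicitly not Altair's actual API: generating a Vega-Lite specification from Python amounts to building a dictionary and serializing it to JSON. The helper name `bar_chart_spec` is hypothetical; the records are the first three from the Vega-Lite bar chart slide.

```python
import json


def bar_chart_spec(records, x_field, y_field):
    """Hypothetical helper: wrap records in a minimal Vega-Lite bar-chart spec."""
    return {
        "data": {"values": records},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


records = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]
spec = bar_chart_spec(records, "a", "b")

# The JSON text is what a Vega-Lite renderer would consume.
print(json.dumps(spec, indent=2))
```

A real library adds validation, type inference, and rendering on top, but the output is still a specification of this shape.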

Slide 31

Slide 31 text

From Declarative API to Declarative Grammar

    url = load_dataset('iris', url_only=True)

    chart = Chart(url).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N',
    )

    chart.display()

Slide 32

Slide 32 text

From Declarative API to Declarative Grammar

    >>> chart.to_dict()
    {'config': {'mark': {'opacity': 0.3}},
     'data': {'url': 'https://vega.github.io/vega-datasets/data/iris.json'},
     'encoding': {'color': {'field': 'species', 'type': 'nominal'},
                  'x': {'field': 'petalLength', 'type': 'quantitative'},
                  'y': {'field': 'sepalWidth', 'type': 'quantitative'}},
     'mark': 'circle'}
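The `:Q` and `:N` suffixes in the encoding strings line up with the `'type'` entries in the dictionary output. The sketch below shows one way such a shorthand could be expanded; `parse_shorthand` and `TYPE_CODES` are hypothetical names written for this illustration, not Altair's internal implementation, and the `O` and `T` codes are assumed by analogy with the ordinal and temporal types.

```python
# Single-letter codes mapped to the full type names used in the spec.
# Q and N appear in the encoding above; O and T are assumed by analogy.
TYPE_CODES = {
    "Q": "quantitative",
    "N": "nominal",
    "O": "ordinal",
    "T": "temporal",
}


def parse_shorthand(shorthand):
    """Hypothetical parser: 'field:Q' -> {'field': 'field', 'type': 'quantitative'}."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        # No type code given: return the field name alone.
        return {"field": shorthand}
    return {"field": field, "type": TYPE_CODES[code]}


channels = {"x": "petalLength:Q", "y": "sepalWidth:Q", "color": "species:N"}
encoding = {channel: parse_shorthand(s) for channel, s in channels.items()}
print(encoding)
```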

Slide 33

Slide 33 text

Key Features of Altair:
- Designed with Statistical Visualizations in mind
- Data specified in Tidy Format & linked to a declared type: Quantitative, Nominal, Ordinal, Temporal
- Well-defined set of marks to represent data
- Encoding Channels map data features (i.e. columns) to visual encodings (e.g. x, y, color, size, etc.)
- Simple data transformations supported natively

Slide 34

Slide 34 text

But why another plotting library?
Teaching: students can learn visualization concepts with minimal syntactic distraction.
Publishing: instead of publishing pixels, you can publish data + plot specification for greater flexibility & reproducibility.
Cross-Pollination: Vega-Lite has the potential to provide a cross-platform lingua franca of statistical visualization.
- Matplotlib
- Bokeh
- Plotly
- Seaborn
- Holoviews
- VisPy
- ggplot
- pandas plot
- Lightning
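The publishing point can be sketched with the standard library alone: a specification is plain JSON, so it can be saved as text, reloaded, and modified by a reader without touching the original plotting code. The spec below copies the iris chart specification from the talk; the variable names and the choice of modification (adding a `column` facet) are only for illustration.

```python
import json

# The published artifact: data location + plot specification, no pixels.
spec = {
    "data": {"url": "https://vega.github.io/vega-datasets/data/iris.json"},
    "mark": "circle",
    "encoding": {
        "x": {"field": "petalLength", "type": "quantitative"},
        "y": {"field": "sepalWidth", "type": "quantitative"},
        "color": {"field": "species", "type": "nominal"},
    },
}

# Serialize for publication, then reload as a reader would.
text = json.dumps(spec, indent=2, sort_keys=True)
restored = json.loads(text)

# A reader can re-encode the same data, e.g. facet by species.
restored["encoding"]["column"] = {"field": "species", "type": "nominal"}
print(json.dumps(restored["encoding"], indent=2))
```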

Slide 35

Slide 35 text

Altair/Vega-Lite supports many plot types:

Slide 36

Slide 36 text

Altair/Vega-Lite supports many plot types:

Slide 37

Slide 37 text

Altair/Vega-Lite supports many plot types:

Slide 38

Slide 38 text

Altair/Vega-Lite supports many plot types:

Slide 39

Slide 39 text

Altair/Vega-Lite supports many plot types:

Slide 40

Slide 40 text

Altair/Vega-Lite supports many plot types:

Slide 41

Slide 41 text

(Visualizations from jakevdp/altair-examples).

Slide 42

Slide 42 text

Some Live Examples . . .
See the notebook at https://github.com/jakevdp/talks/blob/master/2016-11-9-Altair.ipynb

Slide 43

Slide 43 text

Try Altair: http://github.com/ellisonbg/altair/

    $ conda install altair --channel conda-forge

or

    $ pip install altair
    $ jupyter nbextension install --sys-prefix --py vega

For a Jupyter notebook tutorial, type

    import altair
    altair.tutorial()

Slide 44

Slide 44 text

Altair’s Development is Active!
- More plot types
- Higher-level Statistical routines
- Improve layering API
- Vega-Tooltip interaction
- Vega-Lite's Grammar of Interaction (See [1])
[1] http://idl.cs.washington.edu/papers/vega-lite/

Slide 45

Slide 45 text

Thank You!
Email: [email protected]
Twitter: @jakevdp
Github: jakevdp
Web: http://vanderplas.com
Blog: http://jakevdp.github.io