Slide 1

Slide 1 text

Bespoke Visualizations with a Declarative Twist
Jake VanderPlas (@jakevdp)
Convoy Tech, Feb 15, 2018

Slide 2

Slide 2 text

Python Viz is a bit Painful...
"I have been using Matplotlib for a decade now, and I still have to look most things up"
"I love Python but I switch to R for making plots"
"I do viz in Python, but switch from matplotlib to seaborn to bokeh depending on what I need to do"

Slide 3

Slide 3 text

Python’s Visualization Landscape
matplotlib, seaborn, pandas, ggpy, scikit-plot, Yellowbrick, networkx, basemap/cartopy, javascript, pythreejs, bqplot, bokeh, toyplot, plotly, ipyvolume, cufflinks, holoviews, datashader, d3js, mpld3, Altair, Vincent, OpenGL, Glumpy, Vispy, ipyleaflet, Lightning, GlueViz, YT, d3po, Vega-Lite, Vega, MayaVi, graphviz, GR framework, PyQTgraph, pygal, chaco, Vaex, graph-tool

Slide 4

Slide 4 text

Problem: where would you tell beginners to start?
- Matplotlib
- Bokeh
- Plotly
- Seaborn
- Holoviews
- VisPy
- ggplot
- pandas plot
- Lightning
Each library has strengths, but arguably none is yet the “killer viz app” for Data Science.

Slide 5

Slide 5 text

Some examples . . .

Slide 6

Slide 6 text

http://matplotlib.org/

Slide 7

Slide 7 text

Plotting with Matplotlib

    import matplotlib.pyplot as plt
    from numpy.random import rand

    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 200.0 * rand(100)
        plt.scatter(x, y, c=color, s=size, label=color,
                    alpha=0.3, edgecolor='none')

    plt.legend(frameon=True)
    plt.show()

Slide 11

Slide 11 text

Plotting with Matplotlib
Strengths:
- Designed like MATLAB: switching was easy
- Many rendering backends
- Can reproduce just about any plot (with a bit of effort)
- Well-tested, standard tool for over a decade

Slide 12

Slide 12 text

Matplotlib Gallery

Slide 13

Slide 13 text

Plotting with Matplotlib
Strengths:
- Designed like MATLAB: switching was easy
- Many rendering backends
- Can reproduce just about any plot with a bit of effort
- Well-tested, standard tool for over a decade
Weaknesses:
- API is imperative & often overly verbose
- Sometimes poor stylistic defaults
- Poor support for web/interactive graphs
- Often slow for large & complicated data

Slide 14

Slide 14 text

http://bokeh.pydata.org/

Slide 15

Slide 15 text

Plotting with Bokeh

    from bokeh.plotting import figure, show
    from numpy.random import rand  # needed for rand(); not imported on the slide

    p = figure()

    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 0.03 * rand(100)
        # legend_label is the newer spelling of the slide's legend= keyword
        p.circle(x, y, fill_color=color, radius=size, legend_label=color,
                 fill_alpha=0.3, line_color=None)

    show(p)

Slide 16

Slide 16 text

Plotting with Bokeh

Slide 17

Slide 17 text

Bokeh Gallery

Slide 18

Slide 18 text

Plotting with Bokeh
Advantages:
- Web view/interactivity
- Imperative and Declarative layer
- Handles large and/or streaming datasets
- Geographical visualization
- Fully open source
Disadvantages:
- No vector output (need PDF/EPS? Sorry)
- Newer tool with a smaller user-base than matplotlib

Slide 19

Slide 19 text

http://plot.ly/

Slide 20

Slide 20 text

Basic Plotting with Plotly

Slide 21

Slide 21 text

Plotly Gallery

Slide 22

Slide 22 text

Plotting with Plotly
Advantages:
- Web view/interactivity
- Multi-language support
- 3D plotting capability
- Animation capability
- Geographical visualization
Disadvantages:
- Some features require a paid plan

Slide 23

Slide 23 text

Moving to Statistical Visualization

Slide 24

Slide 24 text

Statistical Visualization

    from altair import load_dataset
    iris = load_dataset('iris')
    iris.head()

Data in Tidy Format: i.e. rows are samples, columns are features
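To make the tidy-format idea concrete, here is a minimal sketch in plain Python. The four records are made-up values in the shape of the iris table, not rows copied from the dataset, and `column` and `group_by` are hypothetical helpers written only for this illustration.

```python
from collections import defaultdict

# Illustrative records only: one dict per sample (row), one key per feature (column).
tidy_rows = [
    {"petalLength": 1.4, "sepalWidth": 3.5, "species": "setosa"},
    {"petalLength": 1.3, "sepalWidth": 3.0, "species": "setosa"},
    {"petalLength": 4.7, "sepalWidth": 3.2, "species": "versicolor"},
    {"petalLength": 6.0, "sepalWidth": 3.3, "species": "virginica"},
]


def column(rows, name):
    """Return one feature (column) across all samples."""
    return [row[name] for row in rows]


def group_by(rows, name):
    """Split the samples into groups keyed by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[name]].append(row)
    return dict(groups)


petal_lengths = column(tidy_rows, "petalLength")
by_species = group_by(tidy_rows, "species")
```

Because every row is one sample and every key is one feature, both selecting a feature and grouping by a feature reduce to a single generic operation on a column name.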

Slide 25

Slide 25 text

Statistical Visualization: Grouping

    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))

    for species, group in iris.groupby('species'):
        # edgecolor='none' (the string) turns marker edges off
        plt.scatter(group['petalLength'], group['sepalWidth'],
                    color=color_map[species], alpha=0.3,
                    edgecolor='none', label=species)

    plt.legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')  # label matches the plotted y column

Slide 26

Slide 26 text

Statistical Visualization: Faceting

    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)

    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)

    for i, (species, group) in enumerate(iris.groupby('species')):
        # edgecolor='none' (the string) turns marker edges off
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species], alpha=0.3,
                      edgecolor='none', label=species)
        ax[i].legend(frameon=True, title='species')
        ax[i].set_xlabel('petalLength')  # label every panel, not only the last

    ax[0].set_ylabel('sepalWidth')  # shared y axis; label matches the plotted column

Slide 27

Slide 27 text

Statistical Visualization: Faceting
(same code as on the previous slide)
Problem: We’re mixing the what with the how

Slide 28

Slide 28 text

Most Useful for Data Science is Declarative Visualization
Declarative
- Specify What should be done
- Details determined automatically
- Separates Specification from Execution
Imperative
- Specify How something should be done
- Must manually specify plotting steps
- Specification & Execution intertwined
Declarative visualization lets you think about data and relationships, rather than incidental details.
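The separation of specification from execution can be sketched without any plotting library. In this toy example the `spec` dictionary is the "what" and the hypothetical `resolve` function is the "how". Neither is part of Altair or Vega-Lite, and the rows are made-up values.

```python
# The "what": a specification that is plain data and contains no plotting steps.
spec = {
    "mark": "circle",
    "encoding": {"x": "petalLength", "y": "sepalWidth", "color": "species"},
}

# Made-up tidy rows for illustration.
rows = [
    {"petalLength": 1.4, "sepalWidth": 3.5, "species": "setosa"},
    {"petalLength": 4.7, "sepalWidth": 3.2, "species": "versicolor"},
]


def resolve(spec, rows):
    """The "how": map each row's columns onto the channels named in the spec."""
    encoding = spec["encoding"]
    return [
        {"mark": spec["mark"],
         **{channel: row[field] for channel, field in encoding.items()}}
        for row in rows
    ]


marks = resolve(spec, rows)
```

Changing the chart means editing `spec` only. The execution code in `resolve` stays the same, which is the property the imperative faceting example lacks.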

Slide 29

Slide 29 text

Seaborn: Declarative Visualization . . . Almost

    import seaborn as sns

    g = sns.FacetGrid(iris, col="species", hue="species")
    g.map(plt.scatter, "petalLength", "sepalWidth", alpha=0.3)
    g.add_legend();

Slide 30

Slide 30 text

http://altair-viz.github.io/

Slide 31

Slide 31 text

Altair for Declarative Visualization

    from altair import Chart
    from vega_datasets import data

    iris = data.iris()

    Chart(iris).mark_circle().encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    )

Slide 32

Slide 32 text

Altair for Declarative Visualization

    from altair import Chart
    from vega_datasets import data

    iris = data.iris()

    Chart(iris).mark_circle().encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    ).interactive()

Slide 33

Slide 33 text

Encodings are Flexible:

    from altair import Chart
    from vega_datasets import data

    iris = data.iris()

    Chart(iris).mark_circle().encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
        column='species'
    )

Slide 34

Slide 34 text

Altair
Declarative statistical visualization library for Python, driven by Vega-Lite
http://github.com/altair-viz/altair
A collaboration between Brian Granger (Jupyter team), myself, and UW’s Interactive Data Lab

Slide 35

Slide 35 text

So What Is Altair?

Slide 36

Slide 36 text

D3 is Everywhere . . . (live version at NYT)

Slide 37

Slide 37 text

But working in D3 can be challenging . . .

Slide 38

Slide 38 text

Bar Chart: d3

    var margin = {top: 20, right: 20, bottom: 30, left: 40},
        width = 960 - margin.left - margin.right,
        height = 500 - margin.top - margin.bottom;

    var x = d3.scale.ordinal()
        .rangeRoundBands([0, width], .1);

    var y = d3.scale.linear()
        .range([height, 0]);

    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left")
        .ticks(10, "%");

    var svg = d3.select("body").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    d3.tsv("data.tsv", type, function(error, data) {
      if (error) throw error;

      x.domain(data.map(function(d) { return d.letter; }));
      y.domain([0, d3.max(data, function(d) { return d.frequency; })]);

      svg.append("g")
          .attr("class", "x axis")
          .attr("transform", "translate(0," + height + ")")
          .call(xAxis);

      svg.append("g")
          .attr("class", "y axis")
          .call(yAxis)
        .append("text")
          .attr("transform", "rotate(-90)")
          .attr("y", 6)
          .attr("dy", ".71em")
          .style("text-anchor", "end")
          .text("Frequency");

      svg.selectAll(".bar")
          .data(data)
        .enter().append("rect")
          .attr("class", "bar")
          .attr("x", function(d) { return x(d.letter); })
          .attr("width", x.rangeBand())
          .attr("y", function(d) { return y(d.frequency); })
          .attr("height", function(d) { return height - y(d.frequency); });
    });

    function type(d) {
      d.frequency = +d.frequency;
      return d;
    }

D3 is a Javascript package that streamlines manipulation of objects on a webpage.

Slide 39

Slide 39 text

Bar Chart: Vega

    {
      "width": 400,
      "height": 200,
      "padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
      "data": [
        {
          "name": "table",
          "values": [
            {"x": 1, "y": 28}, {"x": 2, "y": 55}, {"x": 3, "y": 43}, {"x": 4, "y": 91},
            {"x": 5, "y": 81}, {"x": 6, "y": 53}, {"x": 7, "y": 19}, {"x": 8, "y": 87},
            {"x": 9, "y": 52}, {"x": 10, "y": 48}, {"x": 11, "y": 24}, {"x": 12, "y": 49},
            {"x": 13, "y": 87}, {"x": 14, "y": 66}, {"x": 15, "y": 17}, {"x": 16, "y": 27},
            {"x": 17, "y": 68}, {"x": 18, "y": 16}, {"x": 19, "y": 49}, {"x": 20, "y": 15}
          ]
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "range": "width",
          "domain": {"data": "table", "field": "x"}
        },
        {
          "name": "y",
          "type": "linear",
          "range": "height",
          "domain": {"data": "table", "field": "y"},
          "nice": true
        }
      ],
      "axes": [
        {"type": "x", "scale": "x"},
        {"type": "y", "scale": "y"}
      ],
      "marks": [
        {
          "type": "rect",
          "from": {"data": "table"},
          "properties": {
            "enter": {
              "x": {"scale": "x", "field": "x"},
              "width": {"scale": "x", "band": true, "offset": -1},
              "y": {"scale": "y", "field": "y"},
              "y2": {"scale": "y", "value": 0}
            },
            "update": {
              "fill": {"value": "steelblue"}

Vega is a detailed declarative specification for visualizations, built on D3.

Slide 40

Slide 40 text

Bar Chart: Vega-Lite

    {
      "description": "A simple bar chart with embedded data.",
      "data": {
        "values": [
          {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
          {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
          {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
      }
    }

Vega-Lite is a simpler declarative specification aimed at statistical visualization.

Slide 41

Slide 41 text

Bar Chart: Altair
Altair is a Python API for creating Vega-Lite specifications.

Slide 42

Slide 42 text

From Declarative API to declarative Grammar

    url = load_dataset('iris', url_only=True)

    chart = Chart(url).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N',
    )

    chart.display()

Slide 43

Slide 43 text

From Declarative API to declarative Grammar

    >>> chart.to_dict()
    {'config': {'mark': {'opacity': 0.3}},
     'data': {'url': 'https://vega.github.io/vega-datasets/data/iris.json'},
     'encoding': {'color': {'field': 'species', 'type': 'nominal'},
                  'x': {'field': 'petalLength', 'type': 'quantitative'},
                  'y': {'field': 'sepalWidth', 'type': 'quantitative'}},
     'mark': 'circle'}

Slide 44

Slide 44 text

Key Features of Altair:
- Designed with Statistical Visualizations in mind
- Data specified in Tidy Format & linked to a declared type: Quantitative, Nominal, Ordinal, Temporal
- Well-defined set of marks to represent data
- Encoding Channels map data features (i.e. columns) to visual encodings (e.g. x, y, color, size, etc.)
- Simple data transformations supported natively
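The link between a column and its declared type shows up in the `'petalLength:Q'` shorthand used on the earlier slides. The following sketch shows how such strings could expand into the field/type pairs seen in the `to_dict()` output. `parse_shorthand` and `TYPE_CODES` are hypothetical names for this illustration, not Altair's own parser.

```python
# One-letter codes for the four declared types listed above.
TYPE_CODES = {
    "Q": "quantitative",
    "N": "nominal",
    "O": "ordinal",
    "T": "temporal",
}


def parse_shorthand(shorthand):
    """Split 'column:CODE' into a Vega-Lite style field/type dictionary."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        raise ValueError("expected 'column:CODE', got %r" % shorthand)
    return {"field": field, "type": TYPE_CODES[code]}


# Encoding channels (x, y, color) mapped to typed columns.
channels = {"x": "petalLength:Q", "y": "sepalWidth:Q", "color": "species:N"}
encoding = {channel: parse_shorthand(s) for channel, s in channels.items()}
```

With these inputs, `encoding` has the same structure as the `'encoding'` entry printed by `chart.to_dict()` on the earlier slide.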

Slide 45

Slide 45 text

But why another plotting library?
- Matplotlib
- Bokeh
- Plotly
- Seaborn
- Holoviews
- VisPy
- ggplot
- pandas plot
- Lightning
Teaching: students can learn visualization concepts with minimal syntactic distraction.
Publishing: instead of publishing pixels, you can publish data + plot specification for greater flexibility & reproducibility.
Cross-Pollination: Vega-Lite has the potential to provide a cross-platform lingua franca of statistical visualization.
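As a rough illustration of the Publishing point, a chart that is data plus specification can be serialized, shipped, and re-styled by a reader. This sketch uses only the standard library and a shortened copy of the Vega-Lite bar chart; the variable names are assumptions made for the example.

```python
import json

# A shortened version of the Vega-Lite bar chart: data and specification together.
spec = {
    "data": {"values": [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

# "Publishing" here just means writing the spec out as JSON text.
published = json.dumps(spec, indent=2)

# A reader can load the text back and change the presentation without touching the data.
restored = json.loads(published)
restyled = {**restored, "mark": "point"}
```

A rendered image offers no equivalent of the last step: once only pixels are published, the underlying values and encodings are gone.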

Slide 46

Slide 46 text

Altair/Vega-Lite supports many plot types:

Slide 52

Slide 52 text

(Visualizations from jakevdp/altair-examples).

Slide 53

Slide 53 text

Altair 2.0: a Grammar of Interaction

Slide 54

Slide 54 text

Some Live Examples . . .
See the notebook at https://github.com/jakevdp/talks/blob/master/2016-11-9-Altair.ipynb

Slide 55

Slide 55 text

Try Altair: http://github.com/ellisonbg/altair/

    $ conda install altair --channel conda-forge

or

    $ pip install altair
    $ jupyter nbextension install --sys-prefix --py vega

For a Jupyter notebook tutorial, type

    import altair
    altair.tutorial()

Slide 56

Slide 56 text

Altair’s Development is Active!
- More plot types
- Higher-level Statistical routines
- Improve layering API
- Vega-Tooltip interaction
- Vega-Lite's Grammar of Interaction

Slide 57

Slide 57 text

Thank You!
Email: [email protected]
Twitter: @jakevdp
Github: jakevdp
Web: http://vanderplas.com
Blog: http://jakevdp.github.io