Visualization in Python with Altair

Introducing Altair for declarative statistical visualization in Python. Talk given at the Puget Sound Python meetup, Nov 9, 2016

Jake VanderPlas

November 09, 2016

Transcript

  1. @jakevdp
    Jake VanderPlas
    Jake VanderPlas @jakevdp
    Puget Sound Python
    Nov 9, 2016
    Visualization in Python
    with Altair

  2. Statistical
    Visualization in Python
    with Altair

  3. Declarative Statistical
    Visualization in Python
    with Altair

  4. Python Viz is a bit Painful...
    “I have been using Matplotlib for a decade now, and I still have to look most things up”
    “I love Python but I switch to R for making plots”
    “I do viz in Python, but switch from matplotlib to seaborn to bokeh depending on what I need to do”

  5. Problem: where would you tell beginners to start?
    - Matplotlib
    - Bokeh
    - Plotly
    - Seaborn
    - Holoviews
    - VisPy
    - ggplot
    - pandas plot
    - Lightning
    Each library has strengths, but arguably none is yet the “killer viz app” for Data Science.

  6. Some examples . . .

  7. Plotting with Matplotlib

    import matplotlib.pyplot as plt
    from numpy.random import rand

    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 200.0 * rand(100)
        plt.scatter(x, y, c=color, s=size, label=color,
                    alpha=0.3, edgecolor='none')
    plt.legend(frameon=True)
    plt.show()

  8. Plotting with Matplotlib
    Advantages:
    - Matlab-like API
    - Well-tested, standard tool for over a decade
    - LOADS of rendering backends
    - Can reproduce just about any plot… if you have time
    Disadvantages:
    - Matlab-like API
    - Often poor stylistic defaults (though see 2.0 release)
    - Imperative model: lots of manual tweaking required (though see Seaborn & ggplot)
    - Poor support for web/interactive graphs (though see http://mpld3.github.io/)
    - Often slow for large & complicated data

  9. Matplotlib Gallery

  10. Plotting with Bokeh

    from bokeh.plotting import figure, show
    from bokeh.models import LinearAxis, Range1d
    from numpy.random import rand  # rand() is used below

    p = figure()
    for color in ['red', 'green', 'blue']:
        x, y = rand(2, 100)
        size = 0.03 * rand(100)
        # legend= is the 2016 spelling; newer Bokeh releases use legend_label=
        p.circle(x, y, fill_color=color, radius=size,
                 legend=color, fill_alpha=0.3,
                 line_color=None)
    show(p)

  11. Plotting with Bokeh
    Advantages:
    - Web view/interactivity
    - Imperative and Declarative layer
    - Handles large and/or streaming datasets
    - Modern default plot styles
    Disadvantages:
    - No vector output (need PDF/EPS? Sorry)
    - Newer tool with a smaller user-base than matplotlib

  12. Bokeh Gallery

  13. Moving to Statistical Visualization

  14. Statistical Visualization

    from altair import load_dataset
    iris = load_dataset('iris')
    iris.head()

    Data in Tidy Format: i.e. rows are samples, columns are features
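
To make the tidy-format idea concrete, here is a small sketch in plain Python. It assumes nothing beyond the standard library; the numbers are invented for illustration and are not rows from the iris dataset, and `to_tidy` is a hypothetical helper. A "wide" table with one column per species is reshaped so that each row is one sample and each key is one feature.

```python
# Hypothetical "wide" table: one column per species, values are petal lengths.
# The numbers are made up for illustration only.
wide = {
    "setosa": [1.4, 1.3],
    "versicolor": [4.7, 4.5],
}


def to_tidy(wide_table, key_name, value_name):
    """Return a list of row dicts: one row per sample, one key per feature."""
    rows = []
    for key, values in wide_table.items():
        for value in values:
            rows.append({key_name: key, value_name: value})
    return rows


tidy = to_tidy(wide, key_name="species", value_name="petalLength")
for row in tidy:
    print(row)
```

In the tidy form every row carries its own `species` label, which is what lets a plotting library group, color, or facet by that column without extra bookkeeping.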

  15. Statistical Visualization: Grouping

    import matplotlib.pyplot as plt

    color_map = dict(zip(iris.species.unique(),
                         ['blue', 'green', 'red']))
    for species, group in iris.groupby('species'):
        plt.scatter(group['petalLength'], group['sepalWidth'],
                    color=color_map[species],
                    alpha=0.3, edgecolor=None,
                    label=species)
    plt.legend(frameon=True, title='species')
    plt.xlabel('petalLength')
    plt.ylabel('sepalWidth')  # label matches the column plotted on y

  16. Statistical Visualization: Faceting

    import matplotlib.pyplot as plt

    color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
    n_panels = len(color_map)
    fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                           sharex=True, sharey=True)
    for i, (species, group) in enumerate(iris.groupby('species')):
        ax[i].scatter(group['petalLength'], group['sepalWidth'],
                      color=color_map[species],
                      alpha=0.3, edgecolor=None,
                      label=species)
        ax[i].legend(frameon=True, title='species')
        ax[i].set_xlabel('petalLength')  # label every panel, not only the last
    ax[0].set_ylabel('sepalWidth')  # shared y axis; label matches the plotted column

  17. Statistical Visualization: Faceting
    (same code as the previous slide)
    Problem:
    We’re mixing the what with the how

  18. Most Useful for Data Science is Declarative Visualization
    Declarative
    - Specify What should be done
    - Details determined automatically
    - Separates Specification from Execution
    Imperative
    - Specify How something should be done.
    - Must manually specify plotting steps
    - Specification & Execution intertwined.
    Declarative visualization lets you think about data and relationships, rather than incidental details.
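
The contrast on this slide can be sketched without any plotting library. In the hypothetical example below, `spec` only names what to show, while a separate `plan_scatter` function decides the how by finding the groups and assigning colors automatically. `plan_scatter`, `PALETTE`, and the data rows are illustrative assumptions, not part of Matplotlib or Altair.

```python
# Declarative part: *what* to show. Field names follow the iris examples;
# the rows are made-up illustration values.
spec = {"mark": "circle", "x": "petalLength", "y": "sepalWidth", "color": "species"}

rows = [
    {"petalLength": 1.4, "sepalWidth": 3.5, "species": "setosa"},
    {"petalLength": 4.7, "sepalWidth": 3.2, "species": "versicolor"},
    {"petalLength": 1.3, "sepalWidth": 3.0, "species": "setosa"},
]

PALETTE = ["blue", "green", "red"]  # hypothetical default palette


def plan_scatter(spec, rows):
    """Execution part: turn a spec plus data into concrete drawing steps."""
    # Discover the groups from the data instead of asking the user to list them.
    groups = sorted({row[spec["color"]] for row in rows})
    colors = {group: PALETTE[i % len(PALETTE)] for i, group in enumerate(groups)}
    return [
        {"x": row[spec["x"]], "y": row[spec["y"]], "fill": colors[row[spec["color"]]]}
        for row in rows
    ]


steps = plan_scatter(spec, rows)
for step in steps:
    print(step)
```

Pointing `"x"` at a different column changes the result without touching `plan_scatter`, which is the separation of specification from execution that the slide describes.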

  19. Seaborn: Declarative Visualization . . . Almost

    import matplotlib.pyplot as plt
    import seaborn as sns

    g = sns.FacetGrid(iris, col="species", hue="species")
    g.map(plt.scatter, "petalLength", "sepalWidth", alpha=0.3)
    g.add_legend();

  20. Altair for Declarative Visualization

    from altair import Chart

    Chart(iris).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength',
        y='sepalWidth',
        color='species'
    )

  21. Altair.
    Declarative statistical visualization library for Python, driven by Vega-Lite
    http://github.com/altair-viz/altair
    A collaboration between Brian Granger (Jupyter team), myself, and UW’s Interactive Data Lab

  22. Changing the Encoding is Trivial

    from altair import Chart

    Chart(iris).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
    )

  23. Changing the Encoding is Trivial

    from altair import Chart

    Chart(iris).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength',
        y='sepalWidth',
        color='species',
        column='species'
    )
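
At the specification level, faceting amounts to one extra encoding channel. The rough sketch below shows this with plain dictionaries laid out like a Vega-Lite `encoding` block; the `add_channel` helper is hypothetical and is not an Altair function.

```python
import copy

# Encoding for the single-panel scatter plot, written as a plain dict.
base_encoding = {
    "x": {"field": "petalLength", "type": "quantitative"},
    "y": {"field": "sepalWidth", "type": "quantitative"},
    "color": {"field": "species", "type": "nominal"},
}


def add_channel(encoding, channel, field, type_):
    """Return a copy of `encoding` with one more channel; the input is untouched."""
    updated = copy.deepcopy(encoding)
    updated[channel] = {"field": field, "type": type_}
    return updated


faceted = add_channel(base_encoding, "column", "species", "nominal")
print(sorted(set(faceted) - set(base_encoding)))  # channels that were added
```

Nothing about loops, subplots, or axis sharing appears in the change: the renderer is left to work those out from the new channel.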

  24. So What Is Altair?

  25. D3 is Everywhere . . .
    (click for live version)

  26. But working in D3 can be challenging . . .

  27. Bar Chart: d3

    var margin = {top: 20, right: 20, bottom: 30, left: 40},
        width = 960 - margin.left - margin.right,
        height = 500 - margin.top - margin.bottom;

    var x = d3.scale.ordinal()
        .rangeRoundBands([0, width], .1);

    var y = d3.scale.linear()
        .range([height, 0]);

    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left")
        .ticks(10, "%");

    var svg = d3.select("body").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    d3.tsv("data.tsv", type, function(error, data) {
      if (error) throw error;

      x.domain(data.map(function(d) { return d.letter; }));
      y.domain([0, d3.max(data, function(d) { return d.frequency; })]);

      svg.append("g")
          .attr("class", "x axis")
          .attr("transform", "translate(0," + height + ")")
          .call(xAxis);

      svg.append("g")
          .attr("class", "y axis")
          .call(yAxis)
        .append("text")
          .attr("transform", "rotate(-90)")
          .attr("y", 6)
          .attr("dy", ".71em")
          .style("text-anchor", "end")
          .text("Frequency");

      svg.selectAll(".bar")
          .data(data)
        .enter().append("rect")
          .attr("class", "bar")
          .attr("x", function(d) { return x(d.letter); })
          .attr("width", x.rangeBand())
          .attr("y", function(d) { return y(d.frequency); })
          .attr("height", function(d) { return height - y(d.frequency); });
    });

    function type(d) {
      d.frequency = +d.frequency;
      return d;
    }

    D3 is a JavaScript package that streamlines manipulation of objects on a webpage.

  28. Bar Chart: Vega

    {
      "width": 400,
      "height": 200,
      "padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
      "data": [
        {
          "name": "table",
          "values": [
            {"x": 1, "y": 28}, {"x": 2, "y": 55},
            {"x": 3, "y": 43}, {"x": 4, "y": 91},
            {"x": 5, "y": 81}, {"x": 6, "y": 53},
            {"x": 7, "y": 19}, {"x": 8, "y": 87},
            {"x": 9, "y": 52}, {"x": 10, "y": 48},
            {"x": 11, "y": 24}, {"x": 12, "y": 49},
            {"x": 13, "y": 87}, {"x": 14, "y": 66},
            {"x": 15, "y": 17}, {"x": 16, "y": 27},
            {"x": 17, "y": 68}, {"x": 18, "y": 16},
            {"x": 19, "y": 49}, {"x": 20, "y": 15}
          ]
        }
      ],
      "scales": [
        {
          "name": "x",
          "type": "ordinal",
          "range": "width",
          "domain": {"data": "table", "field": "x"}
        },
        {
          "name": "y",
          "type": "linear",
          "range": "height",
          "domain": {"data": "table", "field": "y"},
          "nice": true
        }
      ],
      "axes": [
        {"type": "x", "scale": "x"},
        {"type": "y", "scale": "y"}
      ],
      "marks": [
        {
          "type": "rect",
          "from": {"data": "table"},
          "properties": {
            "enter": {
              "x": {"scale": "x", "field": "x"},
              "width": {"scale": "x", "band": true, "offset": -1},
              "y": {"scale": "y", "field": "y"},
              "y2": {"scale": "y", "value": 0}
            },
            "update": {
              "fill": {"value": "steelblue"}
    [specification cut off on the slide]

    Vega is a detailed declarative specification for visualizations, built on D3.

  29. Bar Chart: Vega-Lite

    {
      "description": "A simple bar chart with embedded data.",
      "data": {
        "values": [
          {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
          {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
          {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
        ]
      },
      "mark": "bar",
      "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
      }
    }

    Vega-Lite is a simpler declarative specification aimed at statistical visualization.

  30. Bar Chart: Altair
    Altair is a Python API for creating Vega-Lite specifications.
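
The code shown on this slide is not in the transcript. As a stand-in, the sketch below builds a Vega-Lite-style bar chart dictionary from Python using only the standard library. `bar_chart_spec` is a hypothetical helper written for illustration and is not Altair's API; it only shows the general idea of generating the JSON specification from Python objects.

```python
import json


def bar_chart_spec(values, x_field, y_field):
    """Build a minimal Vega-Lite-style bar chart spec as a plain dict."""
    return {
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


values = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]
spec = bar_chart_spec(values, x_field="a", y_field="b")

# The dict serializes directly to the JSON a Vega-Lite renderer would consume.
print(json.dumps(spec, indent=2))
```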

  31. From Declarative API to declarative Grammar

    from altair import Chart, load_dataset

    url = load_dataset('iris', url_only=True)
    chart = Chart(url).mark_circle(
        opacity=0.3
    ).encode(
        x='petalLength:Q',
        y='sepalWidth:Q',
        color='species:N',
    )
    chart.display()

  32. From Declarative API to declarative Grammar

    >>> chart.to_dict()
    {'config': {'mark': {'opacity': 0.3}},
     'data': {'url': 'https://vega.github.io/vega-datasets/data/iris.json'},
     'encoding': {'color': {'field': 'species', 'type': 'nominal'},
                  'x': {'field': 'petalLength', 'type': 'quantitative'},
                  'y': {'field': 'sepalWidth', 'type': 'quantitative'}},
     'mark': 'circle'}
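
The `:Q` and `:N` suffixes in the encoding strings turn into the `type` entries of the dictionary above. A minimal sketch of that expansion follows, assuming the single-letter codes Q, N, O, and T stand for quantitative, nominal, ordinal, and temporal. `expand_shorthand` is an illustrative helper, not Altair's own parser.

```python
# Assumed mapping from one-letter codes to Vega-Lite type names.
TYPE_CODES = {"Q": "quantitative", "N": "nominal", "O": "ordinal", "T": "temporal"}


def expand_shorthand(shorthand):
    """Expand 'field:Q' into {'field': 'field', 'type': 'quantitative'}."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        # No type suffix given: keep only the field name.
        return {"field": shorthand}
    if code not in TYPE_CODES:
        raise ValueError("unknown type code: %r" % code)
    return {"field": field, "type": TYPE_CODES[code]}


channels = {"x": "petalLength:Q", "y": "sepalWidth:Q", "color": "species:N"}
encoding = {channel: expand_shorthand(text) for channel, text in channels.items()}
print(encoding)
```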

  33. Key Features of Altair:
    - Designed with Statistical Visualizations in mind
    - Data specified in Tidy Format & linked to a declared type: Quantitative, Nominal, Ordinal, Temporal
    - Well-defined set of marks to represent data
    - Encoding Channels map data features (i.e. columns) to visual encodings (e.g. x, y, color, size, etc.)
    - Simple data transformations supported natively
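
As one illustration of a simple transformation, a Vega-Lite field definition can carry an `aggregate` entry such as `"mean"`. The sketch below writes such an encoding as a plain dictionary and computes the matching per-group values by hand with the standard library. The rows are made-up values, and `apply_aggregate` and `AGGREGATES` are hypothetical names used for illustration; this is not how Vega-Lite itself is implemented.

```python
from collections import defaultdict
from statistics import mean

# Encoding that asks for the mean petal length per species.
encoding = {
    "x": {"field": "species", "type": "nominal"},
    "y": {"field": "petalLength", "type": "quantitative", "aggregate": "mean"},
}

# Made-up rows in tidy format.
rows = [
    {"species": "setosa", "petalLength": 1.0},
    {"species": "setosa", "petalLength": 2.0},
    {"species": "versicolor", "petalLength": 4.0},
]

AGGREGATES = {"mean": mean, "sum": sum, "min": min, "max": max}


def apply_aggregate(encoding, rows):
    """Group rows by the x field and reduce the y field with the named aggregate."""
    group_field = encoding["x"]["field"]
    value_field = encoding["y"]["field"]
    reduce_values = AGGREGATES[encoding["y"]["aggregate"]]
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {key: reduce_values(values) for key, values in groups.items()}


result = apply_aggregate(encoding, rows)
print(result)
```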

  34. But why another plotting library?
    Teaching: students can learn visualization concepts with minimal syntactic distraction.
    Publishing: Instead of publishing pixels, can publish data + plot specification for greater flexibility & reproducibility.
    Cross-Pollination: Vega-Lite has the potential to provide a cross-platform lingua franca of statistical visualization.
    - Matplotlib
    - Bokeh
    - Plotly
    - Seaborn
    - Holoviews
    - VisPy
    - ggplot
    - pandas plot
    - Lightning

  35-40. Altair/Vega-Lite supports many plot types:
    (six slides of example charts; the images are not part of this transcript)

  41. (Visualizations from jakevdp/altair-examples).

  42. Some Live Examples . . .
    See the notebook at
    https://github.com/jakevdp/talks/blob/master/2016-11-9-Altair.ipynb

  43. Try Altair:
    http://github.com/ellisonbg/altair/

    $ conda install altair --channel conda-forge
    or
    $ pip install altair
    $ jupyter nbextension install --sys-prefix --py vega

    For a Jupyter notebook tutorial, type
    import altair
    altair.tutorial()

  44. Altair’s Development is Active!
    - More plot types
    - Higher-level Statistical routines
    - Improve layering API
    - Vega-Tooltip interaction
    - Vega-Lite's Grammar of Interaction (See [1])
    [1] http://idl.cs.washington.edu/papers/vega-lite/

  45. Thank You!
    Email: [email protected]
    Twitter: @jakevdp
    Github: jakevdp
    Web: http://vanderplas.com
    Blog: http://jakevdp.github.io
