Jake VanderPlas
November 09, 2016

# Visualization in Python with Altair

Introducing Altair for declarative statistical visualization in Python. Talk given at the Puget Sound Python meetup, Nov 9, 2016.

## Transcript

1. @jakevdp
Jake VanderPlas
Jake VanderPlas @jakevdp
Puget Sound Python
Nov 9, 2016
Visualization in Python
with Altair

2. @jakevdp
Jake VanderPlas
Statistical
Visualization in Python
with Altair
Jake VanderPlas @jakevdp
Puget Sound Python
Nov 9, 2016

3. @jakevdp
Jake VanderPlas
Declarative Statistical
Visualization in Python
with Altair
Jake VanderPlas @jakevdp
Puget Sound Python
Nov 9, 2016

4. @jakevdp
Jake VanderPlas
Python Viz is a bit Painful...
“I have been using Matplotlib for a decade
now, and I still have to look most things up”
“I love Python but I switch to R for
making plots”
“I do viz in Python, but switch from
matplotlib to seaborn to bokeh
depending on what I need to do”

5. @jakevdp
Jake VanderPlas
Problem: where would you tell
beginners to start?
- Matplotlib
- Bokeh
- Plotly
- Seaborn
- Holoviews
- VisPy
- ggplot
- pandas plot
- Lightning
Each library has strengths, but
arguably none is yet the “killer
viz app” for Data Science.

6. @jakevdp
Jake VanderPlas
Some examples . . .

7. @jakevdp
Jake VanderPlas
import matplotlib.pyplot as plt
from numpy.random import rand
# Three random point clouds, one per color, with random marker sizes
for color in ['red', 'green', 'blue']:
    x, y = rand(2, 100)
    size = 200.0 * rand(100)
    plt.scatter(x, y, c=color, s=size, label=color,
                alpha=0.3, edgecolor='none')
plt.legend(frameon=True)
plt.show()
Plotting with Matplotlib

8. @jakevdp
Jake VanderPlas
Plotting with Matplotlib
Strengths:
- Matlab-like API
- Well-tested, standard tool for over a decade
- Can reproduce just about any plot… if you have time
Weaknesses:
- Matlab-like API
- Often poor stylistic defaults (though see 2.0 release)
- Imperative model: lots of manual tweaking required (though see Seaborn & ggplot)
- Poor support for web/interactive graphs (though see http://mpld3.github.io/)
- Often slow for large & complicated data

9. @jakevdp
Jake VanderPlas
Matplotlib Gallery

10. @jakevdp
Jake VanderPlas
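from bokeh.plotting import figure, show
from bokeh.models import LinearAxis, Range1d
from numpy.random import rand
p = figure()
for color in ['red', 'green', 'blue']:
    x, y = rand(2, 100)
    size = 0.03 * rand(100)
    # The start of this call is missing from the transcript; p.circle with
    # radius=size and fill_color=color is an assumed reconstruction.
    # Newer Bokeh releases spell the legend keyword `legend_label`.
    p.circle(x, y, radius=size, fill_color=color,
             legend=color, fill_alpha=0.3,
             line_color=None)
show(p)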
Plotting with Bokeh

11. @jakevdp
Jake VanderPlas
Plotting with Bokeh
Strengths:
- Web view/interactivity
- Imperative and Declarative layer
- Handles large and/or streaming datasets
- Modern default plot styles
Weaknesses:
- No vector output (need PDF/EPS? Sorry)
- Newer tool with a smaller user-base than matplotlib

12. @jakevdp
Jake VanderPlas
Bokeh Gallery

13. @jakevdp
Jake VanderPlas
Moving to Statistical
Visualization

14. @jakevdp
Jake VanderPlas
Data in Tidy Format: i.e. rows are samples, columns are
features
Statistical Visualization
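
As a rough illustration of the tidy layout described on this slide, the sketch below stores each sample as one row (a dict) and each feature as one column (a key). The measurement values and the `group_by` helper are made up for this example and are not taken from the talk; only the column names follow the iris examples on the later slides.

```python
from collections import defaultdict

# Illustrative values only; each dict is one sample (row),
# each key is one feature (column).
tidy_rows = [
    {"species": "setosa", "petalLength": 1.4, "sepalWidth": 3.5},
    {"species": "setosa", "petalLength": 1.3, "sepalWidth": 3.0},
    {"species": "versicolor", "petalLength": 4.7, "sepalWidth": 3.2},
    {"species": "virginica", "petalLength": 6.0, "sepalWidth": 3.3},
]


def group_by(rows, column):
    """Split tidy rows into {value: [rows...]} keyed by one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


by_species = group_by(tidy_rows, "species")
for species, rows in by_species.items():
    print(species, len(rows))
```

Because every row carries its own category label, grouping (and later faceting or coloring) reduces to picking a column name.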

15. @jakevdp
Jake VanderPlas
# iris: tidy DataFrame with columns petalLength, sepalWidth, species
color_map = dict(zip(iris.species.unique(),
                     ['blue', 'green', 'red']))
# One scatter call per species so each group gets its own color and legend entry
for species, group in iris.groupby('species'):
    plt.scatter(group['petalLength'], group['sepalWidth'],
                color=color_map[species],
                alpha=0.3, edgecolor=None,
                label=species)
plt.legend(frameon=True, title='species')
plt.xlabel('petalLength')
plt.ylabel('sepalWidth')
Statistical Visualization: Grouping

16. @jakevdp
Jake VanderPlas
color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
n_panels = len(color_map)
# One panel per species, with shared axes so the panels are comparable
fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                       sharex=True, sharey=True)
for i, (species, group) in enumerate(iris.groupby('species')):
    ax[i].scatter(group['petalLength'], group['sepalWidth'],
                  color=color_map[species],
                  alpha=0.3, edgecolor=None,
                  label=species)
    ax[i].legend(frameon=True, title='species')
plt.xlabel('petalLength')
plt.ylabel('sepalWidth')
Statistical Visualization: Faceting

17. @jakevdp
Jake VanderPlas
color_map = dict(zip(iris.species.unique(), ['blue', 'green', 'red']))
n_panels = len(color_map)
# One panel per species, with shared axes so the panels are comparable
fig, ax = plt.subplots(1, n_panels, figsize=(n_panels * 5, 3),
                       sharex=True, sharey=True)
for i, (species, group) in enumerate(iris.groupby('species')):
    ax[i].scatter(group['petalLength'], group['sepalWidth'],
                  color=color_map[species],
                  alpha=0.3, edgecolor=None,
                  label=species)
    ax[i].legend(frameon=True, title='species')
plt.xlabel('petalLength')
plt.ylabel('sepalWidth')
Statistical Visualization: Faceting
Problem:
We’re mixing the what with the how

18. @jakevdp
Jake VanderPlas
Most Useful for Data Science is
Declarative Visualization
Declarative
- Specify What should be
done
- Details determined
automatically
- Separates Specification
from Execution
Imperative
- Specify How something
should be done.
- Must manually specify
plotting steps
- Specification &
Execution intertwined.
Declarative visualization lets you think about data
and relationships, rather than incidental details.
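
One way to picture the separation of specification from execution is the toy sketch below. The user writes only a small `spec` (the "what"), and a helper works out grouping and color assignment (the "how"). The names `spec`, `plan_draw_calls`, and `PALETTE` are hypothetical and belong to no plotting library; the data values are illustrative.

```python
# Hypothetical names throughout; this is not any library's API.
PALETTE = ["blue", "green", "red"]


def plan_draw_calls(spec, rows):
    """Turn a declarative spec into the imperative steps a renderer would run."""
    x, y, color = spec["x"], spec["y"], spec.get("color")
    if color is None:
        # No color channel: a single draw call covers every row.
        return [{"label": None, "color": PALETTE[0],
                 "x": [r[x] for r in rows], "y": [r[y] for r in rows]}]
    # Details the user never specified: group order and color assignment.
    categories = sorted({r[color] for r in rows})
    calls = []
    for i, category in enumerate(categories):
        members = [r for r in rows if r[color] == category]
        calls.append({
            "label": category,
            "color": PALETTE[i % len(PALETTE)],
            "x": [r[x] for r in members],
            "y": [r[y] for r in members],
        })
    return calls


rows = [
    {"species": "setosa", "petalLength": 1.4, "sepalWidth": 3.5},
    {"species": "virginica", "petalLength": 6.0, "sepalWidth": 3.3},
    {"species": "setosa", "petalLength": 1.3, "sepalWidth": 3.0},
]

# The "what": which columns map to which visual channels.
spec = {"x": "petalLength", "y": "sepalWidth", "color": "species"}

# The "how": derived automatically from the spec.
calls = plan_draw_calls(spec, rows)
for call in calls:
    print(call["label"], call["color"], call["x"], call["y"])
```

Changing the plot means editing one entry in `spec`; the loop over groups never has to be rewritten by hand.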

19. @jakevdp
Jake VanderPlas
Seaborn: Declarative Visualization
. . . Almost
import seaborn as sns
g = sns.FacetGrid(iris, col="species", hue="species")
g.map(plt.scatter, "petalLength", "sepalWidth", alpha=0.3)

20. @jakevdp
Jake VanderPlas
Altair for Declarative Visualization
from altair import Chart
Chart(iris).mark_circle(
    opacity=0.3
).encode(
    x='petalLength',
    y='sepalWidth',
    color='species'
)

21. @jakevdp
Jake VanderPlas
Altair.
Declarative statistical visualization library for Python,
driven by Vega-Lite
http://github.com/altair-viz/altair
A collaboration between Brian Granger (Jupyter team), myself,
and UW’s Interactive Data Lab

22. @jakevdp
Jake VanderPlas
Changing the Encoding is Trivial
from altair import Chart
Chart(iris).mark_circle(
    opacity=0.3
).encode(
    x='petalLength',
    y='sepalWidth',
    color='species',
)

23. @jakevdp
Jake VanderPlas
Changing the Encoding is Trivial
from altair import Chart
Chart(iris).mark_circle(
    opacity=0.3
).encode(
    x='petalLength',
    y='sepalWidth',
    color='species',
    column='species'
)

24. #JSM2016
Jake VanderPlas
So What Is Altair?

25. #JSM2016
Jake VanderPlas
D3 is Everywhere . . .
(click for live version)

26. #JSM2016
Jake VanderPlas
But working in D3 can
be challenging . . .

27. #JSM2016
Jake VanderPlas
Bar Chart: d3
var margin = {top: 20, right: 20, bottom: 30, left: 40},
    width = 960 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;
var x = d3.scale.ordinal()
    .rangeRoundBands([0, width], .1);
var y = d3.scale.linear()
    .range([height, 0]);
var xAxis = d3.svg.axis()
    .scale(x)
    .orient("bottom");
var yAxis = d3.svg.axis()
    .scale(y)
    .orient("left")
    .ticks(10, "%");
var svg = d3.select("body").append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
d3.tsv("data.tsv", type, function(error, data) {
  if (error) throw error;
  x.domain(data.map(function(d) { return d.letter; }));
  y.domain([0, d3.max(data, function(d) { return d.frequency; })]);
  svg.append("g")
      .attr("class", "x axis")
      .attr("transform", "translate(0," + height + ")")
      .call(xAxis);
  svg.append("g")
      .attr("class", "y axis")
      .call(yAxis)
    .append("text")
      .attr("transform", "rotate(-90)")
      .attr("y", 6)
      .attr("dy", ".71em")
      .style("text-anchor", "end")
      .text("Frequency");
  svg.selectAll(".bar")
      .data(data)
    .enter().append("rect")
      .attr("class", "bar")
      .attr("x", function(d) { return x(d.letter); })
      .attr("width", x.rangeBand())
      .attr("y", function(d) { return y(d.frequency); })
      .attr("height", function(d) { return height - y(d.frequency); });
});
function type(d) {
  d.frequency = +d.frequency;
  return d;
}
D3 is a JavaScript package that
streamlines manipulation of
objects on a webpage.

28. #JSM2016
Jake VanderPlas
Bar Chart: Vega
{
"width": 400,
"height": 200,
"padding": {"top": 10, "left": 30, "bottom": 30, "right": 10},
"data": [
{
"name": "table",
"values": [
{"x": 1, "y": 28}, {"x": 2, "y": 55},
{"x": 3, "y": 43}, {"x": 4, "y": 91},
{"x": 5, "y": 81}, {"x": 6, "y": 53},
{"x": 7, "y": 19}, {"x": 8, "y": 87},
{"x": 9, "y": 52}, {"x": 10, "y": 48},
{"x": 11, "y": 24}, {"x": 12, "y": 49},
{"x": 13, "y": 87}, {"x": 14, "y": 66},
{"x": 15, "y": 17}, {"x": 16, "y": 27},
{"x": 17, "y": 68}, {"x": 18, "y": 16},
{"x": 19, "y": 49}, {"x": 20, "y": 15}
]
}
],
"scales": [
{
"name": "x",
"type": "ordinal",
"range": "width",
"domain": {"data": "table", "field": "x"}
},
{
"name": "y",
"type": "linear",
"range": "height",
"domain": {"data": "table", "field": "y"},
"nice": true
}
],
"axes": [
{"type": "x", "scale": "x"},
{"type": "y", "scale": "y"}
],
"marks": [
{
"type": "rect",
"from": {"data": "table"},
"properties": {
"enter": {
"x": {"scale": "x", "field": "x"},
"width": {"scale": "x", "band": true, "offset": -1},
"y": {"scale": "y", "field": "y"},
"y2": {"scale": "y", "value": 0}
},
"update": {
"fill": {"value": "steelblue"}
Vega is a detailed declarative
specification for visualizations,
built on D3.

29. #JSM2016
Jake VanderPlas
Bar Chart: Vega-Lite
{
  "description": "A simple bar chart with embedded data.",
  "data": {
    "values": [
      {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
      {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
      {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "a", "type": "ordinal"},
    "y": {"field": "b", "type": "quantitative"}
  }
}
Vega-Lite is a simpler
declarative specification aimed
at statistical visualization.
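
Since a Vega-Lite specification is plain JSON, it can be produced from any language. The sketch below rebuilds the bar chart specification from this slide as a Python dict and round-trips it through the standard `json` module. The variable names `spec`, `text`, and `restored` are only for illustration, and the block does not render anything.

```python
import json

# The same specification as on the slide, written as a Python dict.
spec = {
    "description": "A simple bar chart with embedded data.",
    "data": {
        "values": [
            {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
            {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
            {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52},
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"},
    },
}

# Serialize to JSON text, as one would before handing it to a renderer
# or saving it next to the data.
text = json.dumps(spec, indent=2)

# Reading it back gives an identical structure: nothing is lost in transit.
restored = json.loads(text)
print(restored["mark"], len(restored["data"]["values"]))
```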

30. #JSM2016
Jake VanderPlas
Bar Chart: Altair
Altair is a Python API for creating
Vega-Lite specifications.
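
To give a feel for what "a Python API for creating Vega-Lite specifications" can mean, here is a deliberately tiny stand-in. `MiniChart` is a hypothetical class written for this sketch; it is not Altair and does not reflect Altair's implementation. It only shows how chained method calls can accumulate a specification dict like the one on the Vega-Lite slide.

```python
import copy


class MiniChart:
    """Toy chart builder; NOT Altair, just an illustration of the idea."""

    def __init__(self, values):
        # Embed the data rows directly in the specification.
        self._spec = {"data": {"values": list(values)}}

    def mark_bar(self):
        """Record the mark type and return self so calls can be chained."""
        self._spec["mark"] = "bar"
        return self

    def encode(self, **channels):
        """Map channel names (x, y, ...) to field definitions."""
        self._spec["encoding"] = {
            name: dict(definition) for name, definition in channels.items()
        }
        return self

    def to_dict(self):
        """Return a copy of the accumulated specification."""
        return copy.deepcopy(self._spec)


values = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]

chart = MiniChart(values).mark_bar().encode(
    x={"field": "a", "type": "ordinal"},
    y={"field": "b", "type": "quantitative"},
)
print(chart.to_dict())
```

The Python side never draws anything; its only job is to emit the specification, which a separate renderer then interprets.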

31. @jakevdp
Jake VanderPlas
From Declarative API
to declarative Grammar
chart = Chart(url).mark_circle(
    opacity=0.3
).encode(
    x='petalLength:Q',
    y='sepalWidth:Q',
    color='species:N',
)
chart.display()

32. @jakevdp
Jake VanderPlas
From Declarative API
to declarative Grammar
>>> chart.to_dict()
{'config': {'mark': {'opacity': 0.3}},
'data':
{'url': 'https://vega.github.io/vega-datasets/data/iris.json'},
'encoding': {'color': {'field': 'species', 'type': 'nominal'},
'x': {'field': 'petalLength', 'type': 'quantitative'},
'y': {'field': 'sepalWidth', 'type': 'quantitative'}},
'mark': 'circle'}
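
The step from shorthand strings such as `'petalLength:Q'` to the `field`/`type` entries in the dictionary above can be sketched as follows. `parse_shorthand` and `TYPE_CODES` are hypothetical names for this illustration, not Altair's internals; the letter codes follow the four declared types named in the talk (Quantitative, Nominal, Ordinal, Temporal).

```python
# Assumed mapping from one-letter codes to Vega-Lite type names.
TYPE_CODES = {
    "Q": "quantitative",
    "N": "nominal",
    "O": "ordinal",
    "T": "temporal",
}


def parse_shorthand(shorthand):
    """Split 'field:CODE' into a Vega-Lite style field definition."""
    field, sep, code = shorthand.rpartition(":")
    if not sep:
        # No type code given: keep only the field name.
        return {"field": shorthand}
    if code not in TYPE_CODES:
        raise ValueError(f"unknown type code {code!r} in {shorthand!r}")
    return {"field": field, "type": TYPE_CODES[code]}


channels = {"x": "petalLength:Q", "y": "sepalWidth:Q", "color": "species:N"}
encoding = {name: parse_shorthand(value) for name, value in channels.items()}
print(encoding)
```

The resulting `encoding` dict has the same shape as the `'encoding'` entry in the `to_dict()` output on this slide.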

33. #JSM2016
Jake VanderPlas
Key Features of Altair:
- Designed with Statistical Visualizations in mind
- Data specified in Tidy Format & linked to a
declared type: Quantitative, Nominal, Ordinal,
Temporal
- Well-defined set of marks to represent data
- Encoding Channels map
data features (i.e. columns) to
visual encodings (e.g. x, y, color, size, etc.)
- Simple data transformations supported
natively
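
As an example of a simple transformation expressed inside the encoding, the sketch below writes a Vega-Lite style encoding whose y channel asks for the mean of a field, then computes in plain Python what a renderer would have to compute from it. The `aggregate_mean` helper and the data values are made up for illustration; only the `aggregate` key in the field definition is Vega-Lite vocabulary.

```python
from collections import defaultdict
from statistics import mean

# Encoding that requests one mark per species, at the mean petal length.
encoding = {
    "x": {"field": "species", "type": "nominal"},
    "y": {"field": "petalLength", "type": "quantitative", "aggregate": "mean"},
}

# Illustrative tidy rows, not real measurements.
rows = [
    {"species": "setosa", "petalLength": 1.0},
    {"species": "setosa", "petalLength": 2.0},
    {"species": "virginica", "petalLength": 6.0},
]


def aggregate_mean(rows, group_field, value_field):
    """Mean of value_field within each distinct value of group_field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {key: mean(values) for key, values in groups.items()}


bars = aggregate_mean(rows, encoding["x"]["field"], encoding["y"]["field"])
print(bars)
```

The user states the transformation once, in the encoding, instead of reshaping the data by hand before plotting.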

34. #JSM2016
Jake VanderPlas
But why another plotting library?
Teaching: students can learn
visualization concepts with minimal
syntactic distraction.
Publishing: rather than publishing only
pixels, can publish data + plot
specification for greater flexibility &
reproducibility.
Cross-Pollination: Vega-Lite has the
potential to provide a cross-platform
lingua franca of statistical visualization.
- Matplotlib
- Bokeh
- Plotly
- Seaborn
- Holoviews
- VisPy
- ggplot
- pandas plot
- Lightning

35. @jakevdp
Jake VanderPlas
Altair/Vega-Lite supports many plot types:

36. @jakevdp
Jake VanderPlas
Altair/Vega-Lite supports many plot types:

37. @jakevdp
Jake VanderPlas
Altair/Vega-Lite supports many plot types:

38. @jakevdp
Jake VanderPlas
Altair/Vega-Lite supports many plot types:

39. @jakevdp
Jake VanderPlas
Altair/Vega-Lite supports many plot types:

40. @jakevdp
Jake VanderPlas
Altair/Vega-Lite supports many plot types:

41. #JSM2016
Jake VanderPlas
(Visualizations from
jakevdp/altair-examples).

42. @jakevdp
Jake VanderPlas
Some Live Examples . . .
See the notebook at
https://github.com/jakevdp/talks/blob/master/2016-11-9-Altair.ipynb

43. @jakevdp
Jake VanderPlas
Try Altair:
http://github.com/ellisonbg/altair/
$ conda install altair --channel conda-forge
or
$ pip install altair
$ jupyter nbextension install --sys-prefix --py vega
For a Jupyter notebook tutorial, type
import altair
altair.tutorial()

44. @jakevdp
Jake VanderPlas
Altair’s Development is Active!
- More plot types
- Higher-level Statistical routines
- Improve layering API
- Vega-Tooltip interaction
- Vega-Lite's Grammar of Interaction
(See [1])
[1] http://idl.cs.washington.edu/papers/vega-lite/

45. @jakevdp
Jake VanderPlas
Email: [email protected]