Pretty Pictures Please - Hannah Aizenman

The Python visualization landscape has several really great libraries for data visualization, but most people default to using the same library for all their pictures. This talk will give an overview of the philosophies underpinning matplotlib, chaco, bokeh, vispy, vincent, and d3py and discuss which applications each library is best suited for.

PyGotham 2014

August 17, 2014

Transcript

  1. Pretty Pictures Please

    Hannah Aizenman
    Department of Computer Science, City College of New York & CUNY Graduate Center
    story645@gmail.com, @story645
  2. ggplot

    ggplot is not a good fit for people trying to make highly customized data visualizations. While you can make some very intricate, great looking plots, ggplot sacrifices high customization in favor of generally doing “what you’d expect”. - how it works
  3. ggplot

        from ggplot import *
        ggplot(diamonds, aes(x='price', fill='cut')) +\
            geom_density(alpha=0.25) +\
            facet_wrap("clarity")
  4. Mayavi

    3D scientific data visualization and plotting in Python - tagline
  5. Mayavi

        # Create the data.
        #..trunc
        # View it.
        from mayavi import mlab
        s = mlab.mesh(x, y, z)
        mlab.show()
  6. Vispy

    Vispy is a high-performance interactive 2D/3D data visualization library. Vispy leverages the computational power of modern Graphics Processing Units (GPUs) through the OpenGL library to display very large datasets.... As of today (July 2014), using Vispy requires knowing OpenGL. - homepage
  7. vispy

        from vispy import app, gloo

        VERT_SHADER = """
        // simple vertex shader
        attribute vec3 a_position;
        void main (void) {
            gl_Position = vec4(a_position, 1.0);
        }
        """

        FRAG_SHADER = """
        // simple fragment shader
        uniform vec4 u_color;
        void main() {
            gl_FragColor = u_color;
        }
        """

        # vPosition (the vertex array) is not shown on the slide.

        class Canvas(app.Canvas):
            def __init__(self):
                app.Canvas.__init__(self, close_keys='escape')
                # Create program
                self._program = gloo.Program(VERT_SHADER, FRAG_SHADER)
                # Set uniform and attribute
                self._program['u_color'] = 0.2, 1.0, 0.4, 1
                self._program['a_position'] = gloo.VertexBuffer(vPosition)

            def on_initialize(self, event):
                gloo.set_clear_color((1, 1, 1, 1))

            def on_resize(self, event):
                width, height = event.size
                gloo.set_viewport(0, 0, width, height)

            def on_draw(self, event):
                gloo.clear()
                self._program.draw('triangle_strip')
  8. chaco

    Chaco is a plotting application toolkit. This means that it can build both static plots and dynamic data visualizations that let you interactively explore your data. - tutorial
  9. chaco

        import numpy
        from traits.api import HasTraits, Instance
        from traitsui.api import View, UItem
        from enable.api import ComponentEditor
        from chaco.api import ArrayPlotData, Plot

        class PlotExample(HasTraits):
            plot = Instance(Plot)

            traits_view = View(UItem('plot', editor=ComponentEditor()),
                               width=400, height=400, resizable=True,)

            def __init__(self, index, series_a, **kw):
                super(PlotExample, self).__init__(**kw)
                plot_data = ArrayPlotData(index=index)
                plot_data.set_data('series_a', series_a)
                #...trunc
                self.plot = Plot(plot_data)
                self.plot.plot(('index', 'series_a'), type='bar',
                               bar_width=0.8, color='auto')
                #..trunc
                self.plot.value_range.low = 0
                # replace the index values with some nicer labels
                #..trunc

        index = numpy.array([1,2,3,4,5])
        # Note: as written above, __init__ accepts only index and series_a.
        demo = PlotExample(index, index*10, index*5, index*2)

        if __name__ == "__main__":
            demo.configure_traits()
  10. Bokeh

    ...however, the main goal of Bokeh is to provide approachable capability for novel interactive visualizations in the browser. If you would like to have the benefits of HTML canvas rendering, dynamic downsampling, abstract rendering, server plot hosting, and the possibility of interacting from languages besides python, please consider Bokeh for your project. - Bokeh FAQ
  11. Bokeh

        import numpy as np
        from bokeh.plotting import *

        N = 100
        x = np.linspace(0, 4*np.pi, N)
        y = np.sin(x)

        output_file("legend.html", title="legend.py example")

        figure(tools="pan,wheel_zoom,box_zoom,reset,previewsave,select")

        scatter(x, y, legend="sin(x)", name="legend_example")
        line(x, y, legend="sin(x)")
        line(x, 2*y, line_dash=[4, 4], line_color="orange",
             line_width=2, legend="2*sin(x)")
        square(x, 3*y, fill_color=None, line_color="green", legend="3*sin(x)")
        line(x, 3*y, fill_color=None, line_color="green", legend="3*sin(x)")

        show()  # open a browser
  12. d3py

    You probably don’t want to stop reading here, though. Instead, you should go check out vincent which is a much nicer take on this idea, created using vega, and is in general a much more gentlemanly way to go about this sort of thing. It’s also being properly updated and developed, unlike the code below. - d3py
  13. Vincent

    The data capabilities of Python. The visualization capabilities of JavaScript - concept
  14. Vincent

        import random
        import vincent

        cats = ['y1', 'y2', 'y3', 'y4']
        index = range(1, 21, 1)
        multi_iter1 = {'index': index}
        for cat in cats:
            multi_iter1[cat] = [random.randint(10, 100) for x in index]

        lines = vincent.Line(multi_iter1, iter_idx='index')
        lines.legend(title='Categories')
        lines.axis_titles(x='Index', y='Data Value')
  15. Matplotlib

    matplotlib is designed with the philosophy that you should be able to create simple plots with just a few commands, or just one! If you want to see a histogram of your data, you shouldn't need to instantiate objects, call methods, set properties, and so on; it should just work. - matplotlib intro
  16. Matplotlib: Pylab

        import matplotlib.pyplot as plt
        plt.figure()
        plt.plot([1,2,3,4])
        plt.ylabel('some numbers')
        plt.show()
  17. Matplotlib: API

        import matplotlib.pyplot as plt
        fig = plt.figure()
        ax = fig.add_subplot(1,1,1)
        ax.plot([1,2,3,4])
        ax.set_ylabel('some numbers')
        fig.savefig("fig.png")
  18. Matplotlib: Backend

        from matplotlib.backends.backend_agg import (
            FigureCanvasAgg as FigureCanvas)
        from matplotlib.figure import Figure

        fig = Figure()
        canvas = FigureCanvas(fig)
        ax = fig.add_subplot(1,1,1)
        ax.plot([1,2,3,4])
        ax.set_ylabel('some numbers')
        canvas.print_figure('test')
  19. seaborn

    If matplotlib tries to make easy things easy and hard things possible, seaborn aims to make a well-defined set of hard things easy too. - intro
  20. seaborn

        import seaborn as sns
        sns.set(style="ticks")
        df = sns.load_dataset("anscombe")
        sns.lmplot("x", "y", col="dataset", hue="dataset", data=df,
                   col_wrap=2, ci=None, palette="muted", size=4,
                   scatter_kws={"s": 50, "alpha": 1})
  21. Basemap

    Basemap is geared toward the needs of earth scientists, particularly oceanographers and meteorologists... Over the years, the capabilities of Basemap have evolved as scientists in other disciplines (such as biology, geology and geophysics) requested and contributed new features. - Jeff Whitaker (intro)
  22. Basemap

        import matplotlib.pyplot as plt
        from mpl_toolkits.basemap import Basemap

        fig = plt.figure()
        ax = fig.add_subplot(1,1,1)
        m = Basemap(projection='cyl', ax=ax, resolution='l',
                    llcrnrlat=10, urcrnrlat=40,
                    llcrnrlon=100, urcrnrlon=140)
        m.drawcoastlines(color='.8')
        m.drawcountries(color='.8')
        m.drawmapboundary(color='.8')
        m.drawrivers(color='lightblue', linewidth=.5)
        x, y = m(113.7333, 22.5333)
        m.scatter(x, y, s=50, c='red', zorder=100)
        ax.text(x+2, y, "Pearl")
  23. Cartopy

    Cartopy was originally developed at the UK Met Office to allow scientists to visualize their data on maps quickly, easily and most importantly, accurately. - intro
  24. Cartopy

        import matplotlib.pyplot as plt
        import cartopy.crs
        import cartopy.feature

        fig = plt.figure()
        ax = fig.add_subplot(111, projection=cartopy.crs.PlateCarree())
        ax.add_feature(cartopy.feature.LAND)
        ax.add_feature(cartopy.feature.OCEAN)
        ax.add_feature(cartopy.feature.COASTLINE)
        ax.add_feature(cartopy.feature.BORDERS, linestyle=':')
        ax.add_feature(cartopy.feature.LAKES, alpha=0.5)
        ax.add_feature(cartopy.feature.RIVERS)
        ax.set_extent([-20, 60, -40, 40])
  25. mpld3

    The mpld3 project brings together Matplotlib, the popular Python-based graphing library, and D3js, the popular Javascript library for creating interactive data visualizations for the web. The result is a simple API for exporting your matplotlib graphics to HTML code - intro
  26. mpld3

        import numpy as np
        import matplotlib.pyplot as plt
        import mpld3

        fig, ax = plt.subplots()
        scatter = ax.scatter(np.random.normal(size=100),
                             np.random.normal(size=100),
                             s=1000*np.random.random(size=100),
                             c=np.random.random(size=100),
                             alpha=0.3, cmap=plt.cm.jet)
        ax.grid(color='white', linestyle='solid')
        ax.set_title("Scatter Plot (with tooltips!)", size=20)

        labels = ['point {0}'.format(i + 1) for i in range(100)]
        tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels=labels)
        mpld3.plugins.connect(fig, tooltip)

        mpld3.show()
  27. plotly

    Publish your Matplotlib figures to the web with one line! - Python API tagline
  28. plotly

        import numpy as np
        import matplotlib.pyplot as plt
        import plotly.plotly as py

        py.sign_in('story645', 'abcd')

        n = 50
        x, y, z, s, ew = np.random.rand(5, n)
        c, ec = np.random.rand(2, n, 4)
        area_scale, width_scale = 500, 5

        fig, ax = plt.subplots()
        sc = ax.scatter(x, y, c=c, s=np.square(s)*area_scale,
                        edgecolor=ec, linewidth=ew*width_scale)
        ax.grid()

        plot_url = py.plot_mpl(fig)
  29. Why Matplotlib?

    • Science!
    • Publication quality
    • GUI embeddable
    • Extendable
        • Seaborn, Basemap, Cartopy, mpld3, plotly
    • Largest community in the Python viz ecosystem
  30. Acknowledgments

    99% of the figures, code and descriptions came from the various projects' pages, so thank you to all their authors and contributors for providing documentation and examples.
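
Slide 3 builds its figure by adding pieces together with `+`: a base plot holding data and aesthetic mappings, then a density geom, then a facet specification. The sketch below shows one way such an operator-based grammar can be wired up in plain Python. The `Plot` and `Layer` classes and the toy `geom_density`/`facet_wrap` constructors are hypothetical; this is not how the ggplot package is implemented, only an illustration of why `+` reads naturally for layering.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One grammar-of-graphics component, such as a geom or a facet spec."""
    kind: str
    params: dict = field(default_factory=dict)


class Plot:
    """Toy stand-in for a ggplot object: data, aesthetic mappings, and layers."""

    def __init__(self, data, aes, layers=()):
        self.data = data
        self.aes = dict(aes)
        self.layers = tuple(layers)

    def __add__(self, other):
        # Adding a Layer returns a *new* Plot, so a base plot can be reused
        # to build several variants without being mutated.
        if not isinstance(other, Layer):
            return NotImplemented
        return Plot(self.data, self.aes, self.layers + (other,))


# Hypothetical toy constructors named after the ones used on slide 3.
def geom_density(**params):
    return Layer("geom_density", params)


def facet_wrap(facets):
    return Layer("facet_wrap", {"facets": facets})


# "diamonds" is just a placeholder string here, not the real dataset.
base = Plot(data="diamonds", aes={"x": "price", "fill": "cut"})
p = base + geom_density(alpha=0.25) + facet_wrap("clarity")
print([layer.kind for layer in p.layers])  # ['geom_density', 'facet_wrap']
```

Returning a new object from `__add__` instead of mutating is a design choice of this sketch: it lets `base` be reused for several variants of the same plot.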
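
Slide 7 ends with `self._program.draw('triangle_strip')`. In OpenGL's triangle-strip mode, each vertex after the first two adds one triangle made from it and the two vertices before it, and every second triangle has its first two vertices swapped so that all triangles keep the same winding. The helper below (a hypothetical name, for illustration only) expands a strip into explicit index triples. The slide does not show `vPosition`, so the four-vertex case used here is only an example: four vertices give the two triangles of a quad.

```python
def strip_to_triangles(n_vertices):
    """Expand a triangle strip of n_vertices into explicit index triples.

    Triangle i uses vertices i, i+1, i+2; for odd i the first two indices are
    swapped (v2, v1, v3, ...) so every triangle keeps the same orientation.
    """
    triangles = []
    for i in range(n_vertices - 2):  # empty range when fewer than 3 vertices
        if i % 2 == 0:
            triangles.append((i, i + 1, i + 2))
        else:
            triangles.append((i + 1, i, i + 2))
    return triangles


# Four vertices: two triangles sharing the edge between vertices 1 and 2.
print(strip_to_triangles(4))  # [(0, 1, 2), (2, 1, 3)]
```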
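
Slide 8 describes Chaco as able to build dynamic visualizations, and slide 9 feeds its plot through `ArrayPlotData.set_data`. Chaco is built on the Traits library, whose change notifications let plots react when their data changes. The sketch below shows that general idea as a plain observer pattern. `ObservableData` and `ToyBarPlot` are hypothetical classes invented for illustration; none of this is Chaco's or Traits' actual API.

```python
class ObservableData:
    """Hypothetical container that calls listeners whenever a named array is set.

    A generic observer pattern for illustration; not Chaco's or Traits' API.
    """

    def __init__(self):
        self._arrays = {}
        self._listeners = []

    def on_change(self, callback):
        """Register callback(name), called after any array is (re)set."""
        self._listeners.append(callback)

    def set_data(self, name, values):
        self._arrays[name] = list(values)
        for callback in self._listeners:
            callback(name)

    def get_data(self, name):
        return self._arrays[name]


class ToyBarPlot:
    """Hypothetical plot that 'redraws' (here: caches and counts) on relevant changes."""

    def __init__(self, data, series):
        self.data = data
        self.series = series
        self.redraws = 0
        self.heights = []
        data.on_change(self._on_data_changed)

    def _on_data_changed(self, name):
        if name == self.series:  # ignore arrays this plot does not display
            self.heights = self.data.get_data(name)
            self.redraws += 1


data = ObservableData()
plot = ToyBarPlot(data, 'series_a')
data.set_data('series_a', [10, 20, 30])  # relevant series: one redraw
data.set_data('series_b', [5, 10, 15])   # unrelated series: no redraw
```

The plot never polls for new data; it only reacts to notifications, which is what makes updating a live view cheap to wire up.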
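
The Bokeh FAQ quoted on slide 10 lists dynamic downsampling among its benefits, that is, sending the browser fewer points than the raw data contains. One common technique for line data is min/max decimation, sketched below in plain Python. This is a generic illustration of that technique, not Bokeh's own algorithm, and `minmax_downsample` is a hypothetical name.

```python
import math


def minmax_downsample(ys, n_buckets):
    """Reduce ys to at most 2 * n_buckets points by keeping each bucket's extremes.

    Returns (index, value) pairs in index order. Keeping both the minimum and
    the maximum of every bucket preserves spikes that plain striding
    (keeping every k-th sample) can skip entirely.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(ys)
    if n <= 2 * n_buckets:
        return list(enumerate(ys))  # already small enough: keep everything
    out = []
    for b in range(n_buckets):
        # Integer arithmetic splits [0, n) into n_buckets contiguous, non-empty ranges.
        bucket = range(b * n // n_buckets, (b + 1) * n // n_buckets)
        i_min = min(bucket, key=lambda i: ys[i])
        i_max = max(bucket, key=lambda i: ys[i])
        for i in sorted({i_min, i_max}):
            out.append((i, ys[i]))
    return out


ys = [math.sin(i / 50) for i in range(10_000)]
small = minmax_downsample(ys, 100)
print(len(small))  # at most 200 points instead of 10,000
```

Choosing the bucket count from the on-screen width in pixels is a natural fit for this approach, since no more than a couple of points per pixel column can be seen anyway.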
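
Vincent's concept on slide 13 splits the work: Python prepares the data, JavaScript (through Vega) draws it, and the hand-off between the two is JSON. The stdlib-only sketch below reshapes the dictionary built on slide 14 into row records inside a Vega-style named data entry. The record field names and the `to_records` helper are assumptions made for illustration; this is not Vincent's own serializer.

```python
import json
import random

# Same input shape as slide 14: an index column plus one column per category.
cats = ['y1', 'y2', 'y3', 'y4']
index = list(range(1, 21))
multi_iter1 = {'index': index}
for cat in cats:
    multi_iter1[cat] = [random.randint(10, 100) for _ in index]


def to_records(columns, idx_key):
    """Reshape column lists into long-format rows, one row per (index, category).

    The field names 'idx', 'col' and 'val' are illustrative choices for this sketch.
    """
    rows = []
    for pos, idx in enumerate(columns[idx_key]):
        for name, values in columns.items():
            if name != idx_key:
                rows.append({'idx': idx, 'col': name, 'val': values[pos]})
    return rows


# A Vega-style named data set; the browser-side JavaScript only ever sees this JSON.
data_entry = {'name': 'table', 'values': to_records(multi_iter1, 'index')}
payload = json.dumps(data_entry)
print(payload[:60])
```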
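
The matplotlib quote on slide 15 promises that seeing a histogram of your data should just work; in pyplot that is a single `plt.hist(data)` call. The sketch below spells out the bookkeeping that one call hides: splitting the data range into equal-width bins and counting. It is a pure-Python illustration with a hypothetical `histogram` helper; matplotlib itself does this computation with NumPy.

```python
def histogram(data, bins=10):
    """Count data into `bins` equal-width bins spanning [min(data), max(data)].

    Returns (counts, edges). Every bin is half-open [lo, hi) except the last,
    which also includes the maximum value.
    """
    if not data:
        raise ValueError("data must be non-empty")
    lo, hi = min(data), max(data)
    if lo == hi:  # degenerate case: widen the range so bins have non-zero width
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        i = int((x - lo) / width)
        counts[min(i, bins - 1)] += 1  # clamp the maximum into the last bin
    return counts, edges


counts, edges = histogram([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], bins=4)
print(counts)  # [1, 2, 3, 4]
print(edges)   # [1.0, 1.75, 2.5, 3.25, 4.0]
```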
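
Slides 16 and 17 draw the same plot two ways. The pyplot calls on slide 16 act on an implicit "current" figure and axes, while the calls on slide 17 are methods on objects you hold yourself. The toy module below shows how the first style can be a thin layer over the second: each module-level function looks up the current axes and forwards the call. Everything here is hypothetical and heavily simplified (one axes per figure); the names echo pyplot's, but this is not matplotlib's code.

```python
class Axes:
    """Hypothetical, minimal axes: records plotted lines and a y-label."""

    def __init__(self):
        self.lines = []
        self.ylabel = None

    def plot(self, ys):
        self.lines.append(list(ys))

    def set_ylabel(self, text):
        self.ylabel = text


class Figure:
    """Hypothetical figure holding a single Axes (a simplification)."""

    def __init__(self):
        self.axes = [Axes()]


# Implicit, state-machine layer: module-level state plus forwarding functions.
_current_figure = None


def figure():
    """Create a new Figure and make it current."""
    global _current_figure
    _current_figure = Figure()
    return _current_figure


def gca():
    """Return the current Axes, creating a figure first if there is none."""
    if _current_figure is None:
        figure()
    return _current_figure.axes[0]


def plot(ys):
    gca().plot(ys)


def ylabel(text):
    gca().set_ylabel(text)


# Implicit style, as on slide 16: calls go to whatever figure is current.
fig1 = figure()
plot([1, 2, 3, 4])
ylabel('some numbers')

# Explicit style, as on slide 17: you hold the objects and call their methods.
fig2 = Figure()
ax = fig2.axes[0]
ax.plot([1, 2, 3, 4])
ax.set_ylabel('some numbers')
```

Creating `fig2` directly does not change which figure is current, so a later `plot(...)` call would still land on `fig1`; that hidden state is the trade-off between the two styles.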
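
Slide 20 loads seaborn's `anscombe` dataset and uses `lmplot` to draw one panel per dataset, each with a fitted linear regression. Anscombe's quartet is constructed so that its four datasets share nearly identical means and regression lines while looking very different when plotted, which is why a faceted plot is the point of the example. The sketch below computes the per-dataset least-squares fits by hand from the standard published values of the quartet, using only the standard library; `QUARTET` and `least_squares` are names chosen for this sketch.

```python
from statistics import mean

# Anscombe's quartet: datasets I-III share the same x values; dataset IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


# One fit per dataset, mirroring the one-regression-per-panel layout of lmplot.
fits = {name: least_squares(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, (a, b) in fits.items():
    print(f"{name}: y = {a:.2f} + {b:.3f} x")
```

All four fits come out at roughly y = 3.00 + 0.50x, so the summary numbers alone cannot tell the datasets apart; the faceted plot can.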
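
Slide 22 uses Basemap's 'cyl' (cylindrical equidistant, also called plate carrée) projection, converts a longitude/latitude pair with `m(113.7333, 22.5333)`, and places the label at `x+2`. The sketch below assumes, as that two-unit label offset suggests, that map coordinates in this projection are plain degrees, and shows the spherical-Earth conversion to metres for comparison. `plate_carree`, `in_extent`, and the mean-radius constant are illustrative assumptions, not Basemap APIs.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical-Earth model)


def plate_carree(lon_deg, lat_deg, units="degrees"):
    """Equirectangular (plate carree) projection centred on lon 0, lat 0.

    In degree units the map coordinates are just (lon, lat); in metres they are
    arc lengths along the equator and along a meridian of a spherical Earth.
    """
    if units == "degrees":
        return lon_deg, lat_deg
    if units == "metres":
        return (EARTH_RADIUS_M * math.radians(lon_deg),
                EARTH_RADIUS_M * math.radians(lat_deg))
    raise ValueError(f"unknown units: {units!r}")


def in_extent(lon, lat, llcrnrlon, urcrnrlon, llcrnrlat, urcrnrlat):
    """True if (lon, lat) lies in the box given by lower-left/upper-right corners."""
    return llcrnrlon <= lon <= urcrnrlon and llcrnrlat <= lat <= urcrnrlat


# The point and map corners from slide 22.
x, y = plate_carree(113.7333, 22.5333)
label_x = x + 2  # two degrees east of the marker, as in ax.text(x+2, y, "Pearl")
print(in_extent(x, y, 100, 140, 10, 40))  # True
```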