Slide 1

Slide 1 text

Scientific Visualization By Eitan Lees

Slide 2

Slide 2 text

Scientific Visualization* By Eitan Lees
* Mostly in Python

Slide 3

Slide 3 text

Scientific Visualization* By Eitan Lees
* Mostly in Python
Data

Slide 4

Slide 4 text

Scientific Visualization* By Eitan Lees
* Mostly in Python
Data Plots

Slide 5

Slide 5 text

By Eitan Lees Data Plots

Slide 6

Slide 6 text

Scientific Visualization By Eitan Lees

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Acknowledgements
- Sachin Shanbhag for letting me explore
- Nathan Crock for putting it all together
- Much of today’s talk comes from Jake VanderPlas’ PyCon 2017 talk
- Brian Granger and the Jupyter project

Slide 9

Slide 9 text

Show and Tell

Slide 10

Slide 10 text

So you want to make a plot?

Slide 11

Slide 11 text

- Visit
- Mongo
- Vesta
- Gnuplot
- TikZ
- MATLAB
- Matplotlib
- Adobe Illustrator

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Government Solutions

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Advantages:
- All-in-one platform
- Developed with scientific applications in mind
- Very popular for physical modeling
Weaknesses:
- Too much for simple plots
- GUI-forward
- Stability

Slide 17

Slide 17 text

Commercial Solutions

Slide 18

Slide 18 text

Data Visualization

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Python Visualization Landscape

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Strengths:
- Designed like MATLAB: switching was easy

Slide 23

Slide 23 text

Strengths:
- Designed like MATLAB: switching was easy
- Many rendering backends

Slide 24

Slide 24 text

Strengths:
- Designed like MATLAB: switching was easy
- Many rendering backends
- Can reproduce just about any plot (with a bit of effort)

Slide 25

Slide 25 text

Strengths:
- Designed like MATLAB: switching was easy
- Many rendering backends
- Can reproduce just about any plot (with a bit of effort)
- Well-tested, standard tool for over a decade

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

Strengths:
- Designed like MATLAB: switching was easy
- Many rendering backends
- Can reproduce just about any plot (with a bit of effort)
- Well-tested, standard tool for over a decade
Weaknesses:
- API is imperative & often overly verbose
- Poor support for web/interactive graphics
- Often slow for large & complicated data

Slide 28

Slide 28 text

Objective: Improve on the weaknesses of matplotlib (without sacrificing the strengths!)

Slide 29

Slide 29 text

Building on Matplotlib
Common Idea: Keep matplotlib as a versatile, well-tested backend, and provide a new domain-specific API.

Slide 30

Slide 30 text

Key Features:
- Provides the DataFrame object
- Also provides a simple API for plotting

Slide 31

Slide 31 text

Key Features:
- Provides the DataFrame object
- Also provides a simple API for plotting
- More sophisticated statistical visualization tools have been added recently

Slide 32

Slide 32 text

Key Features:
- Like pandas, wraps matplotlib
- Nice set of color palettes & plot styles
- Focus on statistical visualization & modeling

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

But what about the internet?!?

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Common Idea: build a new API that produces a plot serialization (often JSON) that can be displayed in the browser (often in Jupyter notebooks)

Slide 40

Slide 40 text

Bokeh

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Bokeh
Advantages:
- Web view/interactivity
- Handles large data and/or streaming datasets
- Geographical visualizations
- Fully open source
Weaknesses:
- Plotly has some paid features
- Limited output formats
- Smaller (but growing) community

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

The elephant in the room

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

D3 is everywhere

Slide 48

Slide 48 text

But working with D3 can be challenging ...

Slide 49

Slide 49 text

“Simple” Barchart

Slide 50

Slide 50 text

D3 is a JavaScript package that streamlines the manipulation of objects on a webpage

Slide 51

Slide 51 text

Vega is a detailed declarative specification for visualizations, built on D3

Slide 52

Slide 52 text

Vega-Lite is a simpler declarative specification aimed at statistical visualization

Slide 53

Slide 53 text

Altair is a Python API for creating Vega-Lite specifications
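
A minimal sketch of what such a specification can look like, written as a plain Python dict and serialized with the standard json module rather than generated by Altair itself. The field names, the three sample rows, and the schema version in the URL are illustrative assumptions, not taken from the talk.

```python
import json

# Illustrative rows (made-up values, not from the talk).
rows = [
    {"petalLength": 1.4, "sepalLength": 5.1, "species": "setosa"},
    {"petalLength": 4.7, "sepalLength": 7.0, "species": "versicolor"},
    {"petalLength": 6.0, "sepalLength": 6.3, "species": "virginica"},
]

# A Vega-Lite style specification: inline data, a mark type, and
# encodings that map data fields to visual channels.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",  # assumed version
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        "x": {"field": "petalLength", "type": "quantitative"},
        "y": {"field": "sepalLength", "type": "quantitative"},
        "color": {"field": "species", "type": "nominal"},
    },
}

# The serialized text is what a browser-side renderer would consume.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The Python side only describes the chart; drawing is left to whatever renderer reads the JSON.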

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

Bridging the gap

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

- Aggregates data and sends pixels
- Can handle interactive visualization of billions of rows
- Datasets themselves stored in objects that automatically produce intelligent visualizations
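
The "aggregate, then send pixels" idea can be sketched in plain Python: bin every point into a fixed-size grid, so the amount of data sent to the display depends on the grid size and not on the number of rows. The function name `aggregate_to_grid`, the grid dimensions, and the random test data are assumptions for illustration; this is not the API of any particular library.

```python
import random

def aggregate_to_grid(points, width, height, x_range, y_range):
    """Count how many points fall into each cell of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the viewport
        # Scale to a cell index; clamp so the upper edge lands in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid

# Hypothetical data: 100,000 random points stand in for a large dataset.
rng = random.Random(0)
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100_000)]

grid = aggregate_to_grid(points, width=40, height=20,
                         x_range=(-4.0, 4.0), y_range=(-4.0, 4.0))
total_binned = sum(sum(row) for row in grid)
print(len(grid) * len(grid[0]), "cells summarize", total_binned, "points")
```

Only the 800 cell counts would need to be colored and shipped to the browser, however many rows went in.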

Slide 58

Slide 58 text

Don’t care about the internet?

Slide 59

Slide 59 text

Want more power?

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

The Python Visualization Landscape

Slide 64

Slide 64 text

Toward Declarative Visualization

Slide 65

Slide 65 text

What do you mean by declarative visualization?

Slide 66

Slide 66 text

Example: Statistical Data
Tidy data: i.e. rows are samples, columns are features

Slide 67

Slide 67 text

Example: Statistical Data
Tidy data: i.e. rows are samples, columns are features
“I want to scatter petal length vs. sepal length, and color by species”
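
As a rough illustration of the tidy layout, here is a sketch using only built-in Python types. The column names and the four rows are made-up stand-ins for an iris-like table, and `column` is a hypothetical helper.

```python
# Tidy layout: each dict is one sample (row), each key is one feature (column).
# Values are illustrative, not taken from the talk.
tidy_rows = [
    {"sepal_length": 5.1, "petal_length": 1.4, "species": "setosa"},
    {"sepal_length": 4.9, "petal_length": 1.4, "species": "setosa"},
    {"sepal_length": 7.0, "petal_length": 4.7, "species": "versicolor"},
    {"sepal_length": 6.3, "petal_length": 6.0, "species": "virginica"},
]

def column(rows, name):
    """Pull one feature out of the table as a list of values."""
    return [row[name] for row in rows]

# The plotting request only needs to name columns:
x = column(tidy_rows, "petal_length")                     # horizontal position
y = column(tidy_rows, "sepal_length")                     # vertical position
groups = sorted(set(column(tidy_rows, "species")))        # categories to color by
print(x, y, groups)
```

Because every feature is a column, the sentence "scatter petal length vs. sepal length, and color by species" translates directly into three column names.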

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

Problem: We’re mixing the what with the how

Slide 71

Slide 71 text

Toward a well-motivated Declarative Visualization
Imperative
- Specify How something should be done.
- Specification & Execution intertwined.
- “Put a red circle here and a blue circle here”
Declarative
- Specify What should be done.
- Separates Specification from Execution
- “Map <x> to a position, and <y> to a color”
Declarative visualization lets you think about the data and relationships, rather than incidental details

Slide 72

Slide 72 text

Toward a well-motivated Declarative Visualization
Imperative
- Specify How something should be done.
- Specification & Execution intertwined.
- “Put a red circle here and a blue circle here”
Declarative
- Specify What should be done.
- Separates Specification from Execution
- “Map <x> to a position, and <y> to a color”
Declarative visualization lets you think about the data and relationships, rather than incidental details
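
The contrast can be sketched without any plotting library. In the snippet below, "marks" are just (x, y, color) tuples; the sample rows, the color names, and the `render` function are hypothetical and exist only to show where the decisions live in each style.

```python
# Illustrative rows (made-up values).
rows = [
    {"petal_length": 1.4, "sepal_length": 5.1, "species": "setosa"},
    {"petal_length": 4.7, "sepal_length": 7.0, "species": "versicolor"},
]

# Imperative: the user's loop decides how each mark is built,
# including which color goes with which species.
palette = {"setosa": "red", "versicolor": "blue"}
imperative_marks = []
for row in rows:
    color = palette[row["species"]]
    imperative_marks.append((row["petal_length"], row["sepal_length"], color))

# Declarative: the user states what maps to what ...
spec = {"x": "petal_length", "y": "sepal_length", "color": "species"}

# ... and a separate renderer decides how to carry it out.
def render(spec, rows, colors=("red", "blue", "green")):
    """Hypothetical mini renderer: turn a spec plus rows into marks."""
    categories = sorted({row[spec["color"]] for row in rows})
    color_of = dict(zip(categories, colors))  # assumes no more categories than colors
    return [
        (row[spec["x"]], row[spec["y"]], color_of[row[spec["color"]]])
        for row in rows
    ]

declarative_marks = render(spec, rows)
print(declarative_marks)
```

Swapping the axes in the declarative version means editing two values in `spec`; in the imperative version it means editing the loop body.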

Slide 73

Slide 73 text

Altair
Declarative visualization in Python
Based on the Vega and Vega-Lite grammars
https://altair-viz.github.io/

Slide 74

Slide 74 text

Altair for Statistical Visualization

Slide 75

Slide 75 text

Encodings are Flexible:

Slide 76

Slide 76 text

Altair is Interactive

Slide 77

Slide 77 text

...

Slide 78

Slide 78 text

... Altair Tutorial THIS Friday in THIS room from 3:30-4:30 pm