Slide 1

Data Visualisation in Python
Quick and easy routes to plotting magic
Shane Lynn Ph.D., @shane_a_lynn
EdgeTier: COMPLEX DECISIONS SIMPLIFIED
www.edgetier.com | info@edgetier.com | @TeamEdgeTier

Slide 2

Outline
• Data Visualisation Basics
• Basic Python Setup & Core Libraries
• Code examples and comparisons
• What to avoid

Slide 3

EdgeTier
EdgeTier specialise in data and artificial intelligence products for customer contact centres.
• Commercially focused SaaS to increase revenue and reduce costs
• Focus on data science, machine learning, and automation
• AI system works alongside customer service agents to increase efficiency by 100%

Slide 4

Data Visualisation
Data visualisation is a general term that describes any effort to help people understand the significance of data by placing it in a visual context.

Slide 5

Data Visualisation
The choice of data visualisation tool is important. Look for:
• Iteration speed
• Unobtrusiveness
• Flexibility
• Aesthetically pleasing output

Slide 6

Chart Choice
Source: www.extremepresentation.com

Slide 7

Chart Choice – Fearsome Foursome
HISTOGRAM: An accurate graphical representation of the distribution of numeric data.
BARPLOT: Represents the value of entities using bars of various lengths.
SCATTER PLOT: Shows the relationship between two numeric variables.
LINE CHART: Shows the evolution of numeric variables.
Icons: www.data-to-viz.com

Slide 8

Chart Choice – Special Mentions
BOXPLOT: Summarises the distribution of numeric variables.
SANKEY DIAGRAM: Shows flows with smooth links.
CHOROPLETH MAP: Displays an aggregated value for each region of a map.
Icons: www.data-to-viz.com

Slide 9

Data Visualisation in Python
- Lots of libraries to choose from
- Many tools, with varied APIs & outputs
- Best to master one or two and get to know them well
(Diagram: the Python visualisation stack, with an interactive environment, a data manipulation library, and a visualisation library such as seaborn)

Slide 10

Matplotlib: Granddaddy of Python Plotting
Low-level plotting library with a Matlab-like API
+ Very flexible, complete control
- Verbose plots, aesthetically lacking, sometimes difficult to use with Pandas
...but you need to know enough of it to debug...

Slide 11

Pandas / Seaborn / Altair: Higher-level plotting
Pandas: Visualisation API built into DataFrame & Series objects; an interface to Matplotlib.
Seaborn: Extends Matplotlib and provides a high-level API with improved styling.
Altair: Built on the “Vega-Lite” visualisation grammar. Allows some interactive plots in Jupyter Notebooks.

Slide 12

Basic Notebook Setup
Top of the notebook: choose inline vs notebook-style output. The theme can also be chosen here.
Library imports, including Matplotlib.
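A minimal setup cell along these lines is one way to do it. The theme ("ggplot") and figure size are arbitrary choices. The `matplotlib.use("Agg")` line is only there so the sketch also runs as a plain script; in a notebook you would use one of the `%matplotlib` magics shown in the comments instead.

```python
# Outside Jupyter, pick a non-interactive backend so this also runs as a script.
import matplotlib
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd  # used for data loading / manipulation in later cells

# In a notebook, the first cell would instead contain one IPython magic:
#   %matplotlib inline    -> static images rendered below each cell
#   %matplotlib notebook  -> interactive figures (zoom, pan) in the classic notebook

# Choose a theme once; later Matplotlib and Pandas plots pick it up.
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (10, 5)  # default size for new figures

print(plt.style.available)  # other built-in themes to try
```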

Slide 13

Sample Data
An EdgeTier-relevant sample dataset on chat system performance: agents answering customer chats from different websites and in different languages, with 5477 chats over 100 agents.
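The original dataset is not included here, so a synthetic stand-in of the same shape can be used to follow along. Everything in this generator is hypothetical: the column names (`agent`, `website`, `language`, `start_time`, `handle_time`), the category values, the distributions, and the assumption of exactly 100 agents.

```python
import numpy as np
import pandas as pd

def make_sample_chats(n_chats=5477, n_agents=100, seed=0):
    """Build a synthetic chat table (hypothetical columns and values)."""
    rng = np.random.default_rng(seed)
    start = pd.Timestamp("2019-01-01")
    return pd.DataFrame({
        "agent": rng.choice([f"agent_{i:03d}" for i in range(n_agents)], size=n_chats),
        "website": rng.choice(["site_a", "site_b", "site_c", "site_d"], size=n_chats),
        "language": rng.choice(["English", "French", "German", "Spanish"], size=n_chats),
        # Chat start times spread uniformly over roughly 30 days.
        "start_time": start + pd.to_timedelta(
            rng.uniform(0, 30 * 24 * 3600, size=n_chats), unit="s"
        ),
        # Handling time in minutes; exponential is just a convenient skewed shape.
        "handle_time": rng.exponential(scale=8.0, size=n_chats),
    })

chats = make_sample_chats()
print(chats.head())
```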

Slide 14

The Bar Plot

Slide 15

The Bar Plot - Matplotlib
Bar plot of chats per user.
Python visualisation libraries often require the data to be pre-formatted for plotting. With Pandas and Matplotlib, the library usually only presents the values; it does not do the calculations.
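Here is a sketch of that pre-formatting step. It assumes a hypothetical DataFrame with one row per chat and an `agent` column, and does the counting in Pandas before any plotting library is involved:

```python
import pandas as pd

# Hypothetical raw data: one row per chat.
chats = pd.DataFrame({"agent": ["ann", "bob", "ann", "cat", "ann", "bob"]})

# Matplotlib (and Pandas plotting) draw the numbers they are given,
# so aggregate first: chats per agent, largest first.
chats_per_agent = chats["agent"].value_counts()

# Equivalent groupby form, useful when other aggregates are needed too.
chats_per_agent_alt = chats.groupby("agent").size().sort_values(ascending=False)

print(chats_per_agent)
```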

Slide 16

The Bar Plot - Matplotlib
The .bar() function does the work; the 'x' labels and positions are set manually. Most of the code here is formatting and display.
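A minimal Matplotlib version might look like the code below. It uses hypothetical data and is not the slide's own code. Only one line actually draws; the rest positions and labels things:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

chats = pd.DataFrame({"agent": ["ann", "bob", "ann", "cat", "ann", "bob"]})  # hypothetical
counts = chats["agent"].value_counts()  # pre-aggregated values to plot

fig, ax = plt.subplots(figsize=(8, 4))
x_positions = np.arange(len(counts))   # Matplotlib wants numeric x positions
ax.bar(x_positions, counts.values)     # the only line that draws the bars
ax.set_xticks(x_positions)             # ...everything else is labelling/formatting
ax.set_xticklabels(counts.index, rotation=45, ha="right")
ax.set_xlabel("Agent")
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
fig.tight_layout()
```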

Slide 17

The Bar Plot - Pandas
The plot output is Matplotlib, so it is formatted the same way. Slightly simpler API and data access.
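With Pandas, the same chart can be sketched as a single `.plot()` call on the aggregated Series, again using hypothetical data. The returned object is a Matplotlib `Axes`, so the usual formatting calls still apply:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import pandas as pd

chats = pd.DataFrame({"agent": ["ann", "bob", "ann", "cat", "ann", "bob"]})  # hypothetical

# Aggregate, then plot straight from the Series; x positions and tick labels
# come from the index automatically.
ax = chats["agent"].value_counts().plot(
    kind="bar", rot=45, figsize=(8, 4), title="Chats per agent"
)
ax.set_xlabel("Agent")          # plain Matplotlib formatting on the returned Axes
ax.set_ylabel("Number of chats")
```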

Slide 18

The Bar Plot - Seaborn
Simpler data access again. Same Matplotlib formatting functions.
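One way to get the same chart in Seaborn is `countplot`, which counts the raw rows itself. The data is hypothetical, and `countplot` is one choice among several; the slide may have used a different function. The explicit `order` sorts the bars by count:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import pandas as pd
import seaborn as sns

chats = pd.DataFrame({"agent": ["ann", "bob", "ann", "cat", "ann", "bob"]})  # hypothetical

order = chats["agent"].value_counts().index.tolist()     # largest first
ax = sns.countplot(data=chats, x="agent", order=order)   # seaborn does the counting
ax.set_xlabel("Agent")                                   # same Matplotlib formatting calls
ax.set_ylabel("Number of chats")
ax.set_title("Chats per agent")
```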

Slide 19

The Bar Plot - Altair
Not Matplotlib-based, so the syntax and formatting are very different. Ordering was difficult here. A single command does everything, with a JSON format behind it.

Slide 20

The Bar Plot - Altair

Slide 21

The Bar Plot

Slide 22

Prettier Pandas Plots
Once set, Seaborn styles are applied to all Matplotlib plots. Cheat your way to nicer-looking Pandas plots!
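Here is a sketch of the trick. It assumes seaborn 0.11 or later for `sns.set_theme` (older versions use `sns.set()`), and the counts are hypothetical. One call changes Matplotlib's global settings, so an ordinary Pandas plot afterwards picks up the Seaborn look:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Restyles every subsequent Matplotlib figure, including Pandas plots.
sns.set_theme(style="whitegrid")

counts = pd.Series({"ann": 3, "bob": 2, "cat": 1}, name="chats")  # hypothetical
ax = counts.plot(kind="bar", rot=0, title="Chats per agent (seaborn theme)")
```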

Slide 23

More Challenging Bar Plot
For the top 20 agents, what was the split across the top websites? We want a 'stacked bar' for this visualisation.

Slide 24

Stacked Bar - Matplotlib

Slide 25

Stacked Bar - Matplotlib

Slide 26

Stacked Bar - Matplotlib
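In plain Matplotlib, a stacked bar is typically built one layer at a time. Each layer passes the running total as `bottom=`. This is a minimal sketch on hypothetical agent/website data, and it skips the top-20 filtering for brevity:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

chats = pd.DataFrame({
    "agent":   ["ann", "ann", "ann", "bob", "bob", "cat", "ann", "bob"],
    "website": ["site_a", "site_b", "site_a", "site_a", "site_c", "site_b", "site_c", "site_a"],
})  # hypothetical

# Rows: agents, columns: websites, values: number of chats.
table = pd.crosstab(chats["agent"], chats["website"])

fig, ax = plt.subplots(figsize=(8, 4))
x = np.arange(len(table.index))
bottom = np.zeros(len(table.index))        # running height of each stack
for website in table.columns:
    values = table[website].to_numpy()
    ax.bar(x, values, bottom=bottom, label=website)  # draw this layer on top
    bottom += values                                 # raise the base for the next layer
ax.set_xticks(x)
ax.set_xticklabels(table.index)
ax.set_ylabel("Number of chats")
ax.legend(title="Website")
```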

Slide 27

Stacked Bar - Pandas
The plotting code is simple, but data manipulation is required.
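The split between manipulation and plotting might look like the sketch below. The data is randomly generated and hypothetical, and "top 3 websites" is an assumed cut-off. Most of the work is filtering and reshaping into one column per website; the plot is then one call with `stacked=True`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
chats = pd.DataFrame({
    "agent": rng.choice([f"agent_{i:02d}" for i in range(40)], size=2000),
    "website": rng.choice(["site_a", "site_b", "site_c", "site_d", "site_e"], size=2000),
})  # hypothetical

# Data manipulation: top 20 agents, top 3 websites (assumed cut-off).
top_agents = chats["agent"].value_counts().head(20).index
top_sites = chats["website"].value_counts().head(3).index
subset = chats[chats["agent"].isin(top_agents) & chats["website"].isin(top_sites)]

# Reshape: one row per agent (busiest first), one column per website.
table = (
    pd.crosstab(subset["agent"], subset["website"])
    .reindex(top_agents, fill_value=0)
)

# Plotting: a single call.
ax = table.plot(kind="bar", stacked=True, figsize=(10, 5), rot=90)
ax.set_ylabel("Number of chats")
```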

Slide 28

Stacked Bar - Seaborn
Elegant API, simple code structure, but... embarrassingly, no stacked-bar chart support!

Slide 29

Stacked Bar - Seaborn

Slide 30

Stacked Bar - Altair
Simple output, short code. There are some issues around data storage and JSON formats, and sorting is difficult.

Slide 31

Seaborn - Estimators
Calculations are done as part of plotting, with no prior data manipulation. This keeps the data code and the visualisation code separate.
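As an illustration, the raw rows can go straight into `sns.barplot`. Seaborn groups them by the x column and applies its default estimator, the mean. The `language` and `handle_time` columns and their values are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import pandas as pd
import seaborn as sns

chats = pd.DataFrame({
    "language":    ["English", "English", "French", "French", "German", "German"],
    "handle_time": [4.0, 6.0, 10.0, 2.0, 7.0, 9.0],  # minutes
})  # hypothetical

order = ["English", "French", "German"]
# No groupby beforehand: seaborn computes the mean per language while plotting.
ax = sns.barplot(data=chats, x="language", y="handle_time", order=order)
ax.set_ylabel("Mean handle time (minutes)")
```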

Slide 32

Seaborn - Estimators
It is very simple to change the estimator function to calculate different statistics. Similar functionality is available in Altair.
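Swapping the statistic is a single keyword argument. In this sketch (hypothetical data), `estimator=np.median` replaces the default mean. The values are chosen so that the median and mean differ for English:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import numpy as np
import pandas as pd
import seaborn as sns

chats = pd.DataFrame({
    "language":    ["English"] * 3 + ["French"] * 3 + ["German"] * 3,
    "handle_time": [4.0, 6.0, 20.0, 10.0, 2.0, 3.0, 7.0, 9.0, 8.0],  # minutes
})  # hypothetical

order = ["English", "French", "German"]
# Same call as before; only the estimator changes (np.max, np.std, etc. also work).
ax = sns.barplot(data=chats, x="language", y="handle_time",
                 order=order, estimator=np.median)
ax.set_ylabel("Median handle time (minutes)")
```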

Slide 33

Histograms

Slide 34

Histograms
All libraries are good at visualising univariate distributions.
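For a single distribution, Matplotlib and Pandas are both one-liners. This sketch plots the same hypothetical handle-time values side by side, with 30 bins as an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
handle_time = pd.Series(rng.exponential(scale=8.0, size=1000), name="handle_time")  # hypothetical

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax1.hist(handle_time, bins=30)                  # Matplotlib: pass the raw values
ax1.set_title("Matplotlib")
handle_time.plot(kind="hist", bins=30, ax=ax2)  # Pandas: same thing via the Series
ax2.set_title("Pandas")
for ax in (ax1, ax2):
    ax.set_xlabel("Handle time (minutes)")
```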

Slide 35

Histograms

Slide 36

Histograms
Unfortunately, layering or comparison is achieved by building up the histograms in place.
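Here is a sketch of building the layers up in place with Matplotlib. Each `ax.hist` call adds one semi-transparent layer. Shared bin edges keep the layers aligned. The per-language data is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
chats = pd.DataFrame({
    "language": rng.choice(["English", "French", "German"], size=900),
    "handle_time": rng.exponential(scale=8.0, size=900),
})  # hypothetical

bins = np.linspace(0, chats["handle_time"].max(), 21)  # 21 edges -> 20 shared bins
fig, ax = plt.subplots(figsize=(8, 4))
for language, group in chats.groupby("language"):
    # One call per category, drawn on top of the previous layers.
    ax.hist(group["handle_time"], bins=bins, alpha=0.5, label=language)
ax.set_xlabel("Handle time (minutes)")
ax.legend(title="Language")
```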

Slide 37

Histograms - Seaborn
Seaborn has some really nice options for adding impressive, informative hints to its graphs.

Slide 38

Scatter Plots - Pandas

Slide 39

Scatter Plots - Pandas
Pandas: Good for quick single-coloured scatter visualisations. Messy with multiple categories.
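The contrast is easiest to see side by side. A single-colour scatter is one call. Colouring by category typically means one call per group onto a shared `Axes`. The `wait_time` and `handle_time` columns are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300
chats = pd.DataFrame({
    "language": rng.choice(["English", "French", "German"], size=n),
    "wait_time": rng.exponential(scale=2.0, size=n),    # hypothetical, minutes
    "handle_time": rng.exponential(scale=8.0, size=n),  # hypothetical, minutes
})

# Single colour: one line.
ax1 = chats.plot(kind="scatter", x="wait_time", y="handle_time", alpha=0.5)

# Several categories: loop over groups, drawing each onto the same axes.
fig, ax2 = plt.subplots()
colours = ["C0", "C1", "C2"]
for (language, group), colour in zip(chats.groupby("language"), colours):
    group.plot(kind="scatter", x="wait_time", y="handle_time",
               ax=ax2, color=colour, label=language, alpha=0.5)
```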

Slide 40

Scatter Plots - Pandas

Slide 41

Scatter Plots - Seaborn
Seaborn / Altair: Better higher-level representation, and better for multi-category scatters.
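In Seaborn, the multi-category version is a single call, with `hue` doing the colouring and the legend. The data and column names below are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 300
chats = pd.DataFrame({
    "language": rng.choice(["English", "French", "German"], size=n),
    "wait_time": rng.exponential(scale=2.0, size=n),    # hypothetical, minutes
    "handle_time": rng.exponential(scale=8.0, size=n),  # hypothetical, minutes
})

# One call: colour by category, legend included.
ax = sns.scatterplot(data=chats, x="wait_time", y="handle_time",
                     hue="language", alpha=0.6)
```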

Slide 42

Scatter Plots - Altair

Slide 43

Line Plots

Slide 44

Line Plots
Plot chats per language over time.
Pandas: Needs data manipulation, simple thereafter.
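Here is a sketch of the Pandas route. First count chats per day and language (using `pd.Grouper` on a hypothetical `start_time` column) and unstack to one column per language. The plot is then a single call that draws one line per column:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs as a script
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1500
chats = pd.DataFrame({
    "language": rng.choice(["English", "French", "German"], size=n),
    "start_time": pd.Timestamp("2019-01-01")
    + pd.to_timedelta(rng.uniform(0, 30, size=n), unit="D"),
})  # hypothetical

# Data manipulation: chats per (day, language), one column per language.
daily = (
    chats.groupby([pd.Grouper(key="start_time", freq="D"), "language"])
    .size()
    .unstack(fill_value=0)
)

# Plotting: simple thereafter.
ax = daily.plot(figsize=(10, 4), title="Chats per language per day")
ax.set_xlabel("Date")
ax.set_ylabel("Number of chats")
```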

Slide 45

Line Plots

Slide 46

Line Plots

Slide 47

Line Plots
Seaborn / Altair: Operate directly on raw data.

Slide 48

More Options!
Geospatial Viz
• Folium: Generates interactive maps using leaflet.js
• Matplotlib: Basemap plugin
Interactive Plots
• Bokeh: Builds interactive visualisations for the web browser.
• Plotly: Online, interactive visualisations (versions before 4.0 rendered via the Plotly cloud by default)

Slide 49

What to Avoid – Angles?
Pie Charts: Use radial angle for comparison. Humans are very bad at accurate radial comparisons; we've evolved for speedy length / distance comparisons.
https://blog.funnel.io/why-we-dont-use-pie-charts-and-some-tips-on-better-data-visualizations

Slide 50

What to Avoid – Angles?

Slide 51

What to Avoid – Area?
Area: We're bad at judging area. Try ranking these bubbles by area and comparing them to each other.

Slide 52

What to Avoid – Area?
https://www.data-to-viz.com/caveat/area_hard.html

Slide 53

What to Avoid – 3D?
3D: In general, 3D is “fake fancy”: impractical but gee-whizz. Avoid!
Caveat: interactive scatters?

Slide 54

Conclusions
• There is a wide variety of tools available in Python.
• Get familiar with Pandas syntax for quick & simple exploration, and use it with Seaborn themes.
• Learn one more high-level library in detail (Seaborn or Altair) for publishing output and for more flexibility.
“Simplicity is the ultimate sophistication.” Leonardo da Vinci

Slide 55

Data Visualisation in Python
Quick and easy routes to plotting magic
Shane Lynn Ph.D., @shane_a_lynn | @TeamEdgeTier
www.edgetier.com | info@edgetier.com | @TeamEdgeTier

Slide 56

More? Resources
• Tour of Python’s Data Landscape: https://dsaber.com/2016/10/02/a-dramatic-tour-through-pythons-data-visualization-landscape-including-ggplot-and-altair/
• Python Graph Gallery: https://python-graph-gallery.com/
• From Data to Viz: https://www.data-to-viz.com/