A Spectrum of Data
Quantitative Qualitative
Nominal
•City
•Gender
Ordinal
•Months
•Seasons
•Agreement
Interval
•Latitude
•Longitude
•Dates
Ratio
•Amount
•Age
•Height
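One way to make the four levels concrete is to list which comparisons are meaningful at each one. The sketch below tags the examples from this slide with a level; the names `Level`, `LEVEL_OF`, and `allowed_operations` are illustrative assumptions, not part of the talk.

```python
from enum import IntEnum


class Level(IntEnum):
    """Measurement levels, ordered from most qualitative to most quantitative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The examples from the slide, tagged with their level (lookup table is hypothetical).
LEVEL_OF = {
    "city": Level.NOMINAL, "gender": Level.NOMINAL,
    "months": Level.ORDINAL, "seasons": Level.ORDINAL, "agreement": Level.ORDINAL,
    "latitude": Level.INTERVAL, "longitude": Level.INTERVAL, "dates": Level.INTERVAL,
    "amount": Level.RATIO, "age": Level.RATIO, "height": Level.RATIO,
}


def allowed_operations(level):
    """Each level keeps the operations of the levels before it and adds one."""
    ops = ["equality"]
    if level >= Level.ORDINAL:
        ops.append("ordering")
    if level >= Level.INTERVAL:
        ops.append("differences")
    if level >= Level.RATIO:
        ops.append("ratios")
    return ops


print(allowed_operations(LEVEL_OF["city"]))
print(allowed_operations(LEVEL_OF["age"]))
```

For example, it is meaningful to say one person is twice as old as another (ratio), but not that one date is "twice" another (interval).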
Slide 6
Slide 6 text
Another Spectrum of Data
Structured Unstructured
•Books
•Blogs
•Video
•Images
•Music
•Databases
•Spreadsheets
•Etc
Slide 7
Slide 7 text
What is structure?
Unstructured Structured
Slide 8
Slide 8 text
Unstructured Data
Examples: blog posts,
books, images, video
Not easily searchable
Has no consistent
underlying
organization
Slide 9
Slide 9 text
Structured Data
Searchable
Has a consistent
underlying
organization
Examples:
spreadsheets,
.csv files, XML
files, databases
Slide 10
Slide 10 text
What Can Be Visualized?
Quantitative
Qualitative
Unstructured Structured
Slide 14
Slide 14 text
Unstructured data can be
transformed into
structured data
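As a minimal sketch of such a transformation, free text (a blog post, say) can be turned into rows of word and count. The sample text and the function name `word_counts` are made up for illustration, and word counting is only one of many possible structures.

```python
import re
from collections import Counter


def word_counts(text):
    """Turn free text into (word, count) rows, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


# Made-up sample text standing in for an unstructured blog post.
post = "Data is everywhere. Structured data is searchable; unstructured data is not."
rows = word_counts(post)

for word, count in rows:
    print(word, count)
```

Every row now has the same two fields, which is what makes the result searchable and chartable.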
Slide 15
Slide 15 text
What Can Be Visualized?
Quantitative
Qualitative
Unstructured Structured
Slide 16
Slide 16 text
ANY DATA CAN BE
STRUCTURED.
Slide 17
Slide 17 text
ANY DATA CAN BE
VISUALIZED.
Slide 18
Slide 18 text
Structuring Data:
Abstract Data Models
This is a person.
This is a person data
model, which is an
abstraction of a person.
o Person ID
o Name
o Age
o Height
o Profession
o Hobbies
o Nationality
o Native Language
Person
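The person model above could be sketched as a Python dataclass. The field types, the centimetre unit for height, and the sample values are assumptions added for illustration; the slide only names the fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """An abstraction of a person: only the attributes we chose to record."""
    person_id: int
    name: str
    age: int
    height_cm: float          # unit is an assumption; the slide only says "Height"
    profession: str
    nationality: str
    native_language: str
    hobbies: List[str] = field(default_factory=list)


# Made-up example record.
p = Person(
    person_id=1,
    name="Ada",
    age=36,
    height_cm=165.0,
    profession="Engineer",
    nationality="British",
    native_language="English",
    hobbies=["chess"],
)
print(p)
```

Anything about the person that is not a field here is simply not part of the model, which is what makes it an abstraction.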
Slide 19
Slide 19 text
An OSBridge Attendee
Data Model
• Name
• Email address
• Home address
• Company/organization
• Twitter/identi.ca
• Website
• Age
• Gender
• Years in open source
• Food preference
• Favorite language
• Current projects
• Favorite color
Slide 20
Slide 20 text
One Rule for Transforming
Unstructured Data into
Structured Data
CONSISTENCY.
CONSISTENCY.
CONSISTENCY.
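A small sketch of what consistency can mean in practice: the same answer typed several ways should end up as one value. The raw answers and the alias table below are hypothetical.

```python
def normalize_language(raw):
    """Map differently typed answers to a single consistent value."""
    cleaned = raw.strip().lower()
    aliases = {"js": "javascript", "py": "python"}  # hypothetical alias table
    return aliases.get(cleaned, cleaned)


# Made-up free-text answers to "favorite language".
raw_answers = ["Python", " python", "PYTHON ", "py", "JavaScript", "JS"]
normalized = [normalize_language(a) for a in raw_answers]
distinct = sorted(set(normalized))

print(distinct)  # ['javascript', 'python']
```

Without this step, a chart of favorite languages would show six categories instead of two.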
Slide 21
Slide 21 text
What you need for
data visualization:
Structured Data
Slide 22
Slide 22 text
But what do you need for a
good data visualization?
Slide 23
Slide 23 text
The
Dimensionality
Problem
Slide 24
Slide 24 text
A dimension is a
variable in the data
• Name
• Email address
• Home address
• Company/organization
• Twitter/identi.ca
• Website
• Age
• Gender
• Years in open source
• Food preference
• Favorite language
• Current projects
• Favorite color
These are all
dimensions of
the data
Slide 25
Slide 25 text
Dimensionality Increases Quickly
Email
Home address
Company/Org
Twitter
Website
Age
Gender
Years in OS
Food preference
Favorite language
Current projects
Favorite color
78 pairwise
relationships!
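The count of 78 follows from the 13 dimensions of the attendee data model: every unordered pair of dimensions is one possible relationship. A quick check, with snake_case dimension names chosen here for illustration:

```python
from itertools import combinations
from math import comb

# The 13 dimensions of the attendee data model.
dimensions = [
    "name", "email", "home_address", "company_org", "twitter", "website",
    "age", "gender", "years_in_open_source", "food_preference",
    "favorite_language", "current_projects", "favorite_color",
]

pairs = list(combinations(dimensions, 2))
print(len(dimensions))            # 13
print(len(pairs))                 # 78
print(comb(len(dimensions), 2))   # 78, i.e. 13 * 12 / 2
```

Adding a fourteenth dimension would raise the count to 91, which is why the number of relationships grows so quickly.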
Slide 26
Slide 26 text
Questions Reduce Dimensionality
Email
Home address
Company/Org
Twitter
Website
Age
Gender
Years in OS
Food preference
Favorite language
Current projects
Favorite color
How are food preferences
distributed among
languages and projects?
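A sketch of how a question narrows the data: keep only the dimensions the question mentions, then summarize them. The attendee records and the helper name `project` are made up for illustration.

```python
from collections import Counter

# Made-up attendee records with more dimensions than the question needs.
attendees = [
    {"name": "A", "age": 31, "favorite_color": "blue", "food_preference": "vegetarian",
     "favorite_language": "Python", "current_projects": "docs"},
    {"name": "B", "age": 45, "favorite_color": "red", "food_preference": "vegan",
     "favorite_language": "Ruby", "current_projects": "web"},
    {"name": "C", "age": 28, "favorite_color": "green", "food_preference": "vegetarian",
     "favorite_language": "Python", "current_projects": "web"},
    {"name": "D", "age": 39, "favorite_color": "blue", "food_preference": "omnivore",
     "favorite_language": "Ruby", "current_projects": "docs"},
]


def project(records, dimensions):
    """Keep only the named dimensions of each record."""
    return [{d: r[d] for d in dimensions} for r in records]


# The question mentions three dimensions, so the rest can be dropped.
question_dims = ["food_preference", "favorite_language", "current_projects"]
reduced = project(attendees, question_dims)

# Count food preferences within each favorite language.
by_language = Counter((r["favorite_language"], r["food_preference"]) for r in reduced)
for (language, food), n in sorted(by_language.items()):
    print(language, food, n)
```

The same `project` step with a different list of dimensions would serve a different question.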
Slide 27
Slide 27 text
Questions Reduce Dimensionality
Email
Home address
Company/Org
Twitter
Website
Age
Gender
Years in OS
Food preference
Favorite language
Current projects
Favorite color
How are languages and
projects distributed
geographically?
Slide 28
Slide 28 text
Multiple comparison
problem
The more relationships you look
at, the more likely you are to
find a pattern that only exists
because of random chance
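A rough way to put a number on this, assuming each relationship is tested independently at the same significance level (a simplification), is the chance of at least one false pattern across all tests. The Bonferroni threshold shown alongside is a common correction that is not part of the talk; both function names are illustrative.

```python
def chance_of_false_pattern(comparisons, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_threshold(comparisons, alpha=0.05):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / comparisons


for m in (1, 10, 78):
    print(m, round(chance_of_false_pattern(m), 3), bonferroni_threshold(m))
```

With all 78 pairwise relationships tested at the 5% level, the chance of at least one spurious pattern is above 98% under these assumptions.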
Slide 29
Slide 29 text
The View: The Smallest
Unit of Visualization
Slide 33
Slide 33 text
Why is reducing
dimensionality
especially important
for visualization?
Slide 34
Slide 34 text
Dimension count: 1
Date dimension
encoded as
position along
the horizontal
axis
Slide 35
Slide 35 text
Dimension count: 2
Time dimension
encoded as
position along
the vertical
axis
Slide 36
Slide 36 text
Dimension count: 3
Type of training
dimension
encoded as color
Slide 37
Slide 37 text
Dimension count: 4
Number of stat
points gained
from training
dimension
encoded as
bubble size
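The four encodings above can be sketched as a function that maps one training record to the visual attributes of one bubble. The field names, color table, size scale, and sample record are assumptions; the slides only name the dimensions and their channels.

```python
from datetime import date, time

# Hypothetical color for each type of training.
COLOR_BY_TYPE = {"strength": "red", "speed": "blue", "stamina": "green"}


def encode(session):
    """Map the four data dimensions of one session to four visual channels."""
    return {
        "x": session["date"].toordinal(),                          # date -> horizontal position
        "y": session["time"].hour + session["time"].minute / 60,   # time -> vertical position
        "color": COLOR_BY_TYPE[session["training_type"]],          # type of training -> color
        "size": 10 * session["stat_points"],                       # stat points -> bubble size
    }


# Made-up training session.
session = {
    "date": date(2013, 6, 18),
    "time": time(18, 30),
    "training_type": "strength",
    "stat_points": 3,
}
mark = encode(session)
print(mark)
```

Each extra dimension uses up another visual channel, and the available channels run out quickly. A plotting library such as Matplotlib could draw these marks with a scatter plot, but that step is left out here.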
Slide 38
Slide 38 text
So what do you need for a
good data visualization?
Slide 39
Slide 39 text
What you need:
Structured Data + Questions
Slide 40
Slide 40 text
Choosing the right
views for your data
Slide 41
Slide 41 text
Quantitative Data
Slide 42
Slide 42 text
Spatial (Location) Data
Slide 43
Slide 43 text
Temporal (Time) Data
Slide 44
Slide 44 text
Relational (Network) Data
Slide 45
Slide 45 text
What if one view
isn’t enough?
Slide 46
Slide 46 text
Multiple Coordinated
Views to the rescue!
Slide 47
Slide 47 text
Filtering, Brushing +
Linking
Slide 48
Slide 48 text
Focus + Context
This view shows the
context, or overview
of the data
Slide 49
Slide 49 text
Focus + Context
This view shows the details
of the data selected in the
context view
Selection
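The link between the two views can be sketched without any drawing code: the context view holds every record, a brushed range acts as the selection, and the focus view receives only the matching records. The records and the function name `brush` are made up for illustration.

```python
# Made-up records shared by both views.
records = [
    {"name": "A", "age": 22, "favorite_language": "Python"},
    {"name": "B", "age": 29, "favorite_language": "Ruby"},
    {"name": "C", "age": 34, "favorite_language": "Python"},
    {"name": "D", "age": 51, "favorite_language": "C"},
]


def brush(all_records, dimension, low, high):
    """Return the records whose value on one dimension falls inside the brushed range."""
    return [r for r in all_records if low <= r[dimension] <= high]


# Context view: an overview of every record along one dimension.
context_view = sorted(r["age"] for r in records)

# Selection: a range brushed in the context view.
selection = (25, 35)

# Focus view: full details, but only for the selected records.
focus_view = brush(records, "age", *selection)

print(context_view)
for r in focus_view:
    print(r)
```

Changing `selection` and recomputing `focus_view` is the linking step; in an interactive tool this would happen on every brush movement.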
Slide 50
Slide 50 text
Tools
Slide 51
Slide 51 text
D3.js
Limited support for
multiple coordinated
views and large data
sets
Beautiful browser-based
visualizations
JavaScript
visualization
library
Slide 52
Slide 52 text
Improvise
Few resources
for learning
Java-based desktop
design environment
Powerful, flexible
tool for creating
multiple coordinated
view visualizations
Slide 53
Slide 53 text
Python
Great for data
wrangling and rapid
prototyping
Check out Matplotlib
and ggplot
visualization
libraries
Slide 54
Slide 54 text
Go forth and
visualize!
Slide 55
Slide 55 text
Credits
Liz Mc, CC 2.0, via Flickr
Kent K. Barns, CC 2.0,
kentkb.com
Jerome Collins, CC 2.0,
via Flickr
Cristinacosta, CC 2.0,
via Flickr
Thinking Machine 4, Mid-game,
by Martin Wattenberg
and Marek Walczak