Seeing at the
speed of thought
Empowering others through data exploration
Greg Goltsov
Senior Data Engineer
@gregoltsov
www.gregory.goltsov.info
(will have link to slides)
Slide 2
Slide 2 text
Seeing at the
speed of thought
Empowering others through data exploration
Slide 3
Slide 3 text
Seeing at the
speed of thought
Empowering others through data exploration
Slide 4
Slide 4 text
Seeing at the
speed of thought
Empowering others through data exploration
yourself
Slide 5
Slide 5 text
Seeing at the
speed of thought
Empowering others through data exploration
yourself
your team
Slide 6
Slide 6 text
Seeing at the
speed of thought
Empowering others through data exploration
yourself
your team
your company
Slide 7
Slide 7 text
Touch Surgery
Built marketing/sales dashboards for
Fortune 10 companies
Built educational dashboards for 4 of the
top 10 world-rated medical universities
All from scratch
Slide 8
Slide 8 text
Appear Here
World’s biggest online marketplace
for retail spaces
Internal recommendation system
Highly visual debug interface for
non-tech people
Slide 9
Slide 9 text
Southern Cross
Austereo
Modernising the data pipeline
Spearheading data-driven culture
throughout the company
Datasets covering 80% of Australians
weekly
Slide 10
Slide 10 text
BI/DW
tools
Slide 11
Slide 11 text
BI/DW
tools
Slide 12
Slide 12 text
Remove barriers
Make feedback fast
Remove yourself
Slide 13
Slide 13 text
Remove
barriers
Slide 14
Slide 14 text
Remove
barriers
Catalogued datasets
with one-line
import in Python
Messy dataset
in PDFs
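A minimal sketch of what a "one-line import" from a dataset catalogue could look like. Everything here is hypothetical: the `register` and `load` helpers, the `weekly_listeners` dataset name, and the inline CSV are made up for illustration and are not the catalogue used in the talk.

```python
import csv
import io

# Hypothetical in-memory catalogue: dataset name -> loader function.
_CATALOGUE = {}


def register(name):
    """Decorator that adds a loader function to the catalogue under `name`."""
    def decorator(loader):
        _CATALOGUE[name] = loader
        return loader
    return decorator


@register("weekly_listeners")
def _weekly_listeners():
    # Made-up sample data; a real loader would read from a warehouse or file store.
    raw = "week,listeners\n2017-01,120\n2017-02,135\n"
    return list(csv.DictReader(io.StringIO(raw)))


def load(name):
    """Return the rows of a catalogued dataset, or fail with the list of known names."""
    try:
        loader = _CATALOGUE[name]
    except KeyError:
        raise KeyError(
            f"Unknown dataset {name!r}; available: {sorted(_CATALOGUE)}"
        ) from None
    return loader()


# The "one line" an analyst would actually type:
rows = load("weekly_listeners")
```

The point of the design is that the analyst never needs to know where the data lives or how it is parsed; the catalogue owner maintains the loaders.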
Slide 15
Slide 15 text
Remove
barriers
Dashboard with right
filters, Excel export
“Can you run a
query?”
Slide 16
Slide 16 text
Remove
barriers.
Foster
curiosity.
Slide 17
Slide 17 text
Make feedback
fast
Slide 18
Slide 18 text
Make feedback
fast
Found a new trend
via tinkering
“Tomorrow I’ll see
results of the batch job”
Slide 19
Slide 19 text
Make feedback
fast
“Check the dash in
15 mins”
“I put your request
into the backlog”
Slide 20
Slide 20 text
Make feedback
fast. Let people
tinker.
Slide 21
Slide 21 text
Remove
yourself
Slide 22
Slide 22 text
Remove
yourself
Data pipeline +
products
Ad-hoc
Slide 23
Slide 23 text
No content
Slide 24
Slide 24 text
No content
Slide 25
Slide 25 text
Remove
yourself.
Don’t stand
in the way.
Slide 26
Slide 26 text
Remove
barriers
Make
feedback
fast
Remove
yourself
Slide 27
Slide 27 text
The goal is to turn data
into information, and
information into insight.
– Carly Fiorina, former HP CEO
Slide 28
Slide 28 text
Insight
Information
Data
Slide 29
Slide 29 text
Insight
Information
Data
Value
↑
Abundance
Slide 30
Slide 30 text
Insight
Information
Data
Fraud
Access pattern
Logs
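The Logs, Access pattern, Fraud ladder above can be sketched in a few lines. This is an illustration only: the log format, the user names, and the "more than two distinct IPs" rule are assumptions made up for the example, not a real fraud model.

```python
# Data: raw log lines (made-up sample in an assumed key=value format).
logs = [
    "2017-03-01T10:00:01 user=alice action=login ip=10.0.0.1",
    "2017-03-01T10:05:12 user=alice action=view ip=10.0.0.1",
    "2017-03-01T10:06:40 user=bob action=login ip=10.0.0.7",
    "2017-03-01T10:06:55 user=bob action=login ip=172.16.4.2",
    "2017-03-01T10:07:03 user=bob action=login ip=192.168.9.9",
]


def parse(line):
    """Split one log line into a dict of its key=value fields plus the timestamp."""
    timestamp, *pairs = line.split()
    record = dict(pair.split("=", 1) for pair in pairs)
    record["timestamp"] = timestamp
    return record


def access_pattern(lines):
    """Information: number of distinct IP addresses seen per user."""
    ips_by_user = {}
    for line in lines:
        record = parse(line)
        ips_by_user.setdefault(record["user"], set()).add(record["ip"])
    return {user: len(ips) for user, ips in ips_by_user.items()}


def suspicious(pattern, max_ips=2):
    """Insight: users seen from more than `max_ips` addresses (illustrative rule)."""
    return sorted(user for user, count in pattern.items() if count > max_ips)


pattern = access_pattern(logs)
flagged = suspicious(pattern)
```

Each step shrinks the volume and raises the value: five log lines become two per-user counts, which become one flagged account.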
Slide 31
Slide 31 text
Insight
Information
Data
Key influencers
Month-over-month trends
Tweets
Slide 32
Slide 32 text
Ad-hoc
queries
Data
pipeline
Fast to develop
Every query gets
thrown away after
Upfront investment
Every integration
builds foundations
Slide 33
Slide 33 text
Visualise your ETL.
Augment your Data
Warehouses with
Data Lakes.
Slide 34
Slide 34 text
No content
Slide 35
Slide 35 text
Extract Transform Load
Sources
Data
Warehouse
Slide 36
Slide 36 text
Extract Transform Load
Sources
Data
Warehouse
Data Insight
Time
Slide 37
Slide 37 text
Volume
Variety
Velocity
“3D Data Management: Controlling Data Volume, Velocity and Variety”, Gartner Inc. 2001
Slide 38
Slide 38 text
Volume
Variety
Velocity
“3D Data Management: Controlling Data Volume, Velocity and Variety”, Gartner Inc. 2001
Slide 39
Slide 39 text
Analysis of Unstructured Data: Applications of Text Analytics and Sentiment Mining
~80%
of all data is unstructured
Slide 40
Slide 40 text
~80%
of your data is unstructured
Slide 41
Slide 41 text
http://www.ft.com/cms/s/0/de15414e-ebad-11e1-985a-00144feab49a.html#axzz2F3CM6G7g
“Making sense of
unstructured data isn’t
about technology, it’s a
business challenge”
Slide 42
Slide 42 text
Aberdeen Group research
                                        Don’t use            Use
                                        unstructured data    unstructured data
Happy with the ability to share data    18%                  60%
Pleased with the accessibility          20%                  50%
Slide 43
Slide 43 text
Volume
Variety
Velocity
Machine learning
“3D Data Management: Controlling Data Volume, Velocity and Variety”, Gartner Inc. 2001
Collect Store
Process/
Analyse
Sources
Data
Warehouse
Data Insight
Insight
Time
Slide 46
Slide 46 text
Collect Store
Process/
Analyse
Slide 47
Slide 47 text
Collect Store
Process/
Analyse
Slide 48
Slide 48 text
Collect Store
Process/
Analyse
Slide 49
Slide 49 text
No content
Slide 50
Slide 50 text
Look at data.
A lot.
Slide 51
Slide 51 text
Look at data.
A lot.
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says
Slide 52
Slide 52 text
No content
Slide 53
Slide 53 text
No content
Slide 54
Slide 54 text
Scale computation and
storage separately
Go from non-trivial data to
dashboard in minutes
Spark is 20-100x faster than
MapReduce
Turnkey solution: www.databricks.com
OSS: Apache Zeppelin on AWS EMR Spark
Slide 55
Slide 55 text
We made it!
Now what?
Slide 56
Slide 56 text
We made it!
Now what?
Human scale.
Slide 57
Slide 57 text
AirBnB Scaling Tribal Knowledge
Slide 58
Slide 58 text
AirBnB Scaling Tribal Knowledge
Slide 59
Slide 59 text
AirBnB Scaling Tribal Knowledge
Slide 60
Slide 60 text
AirBnB Scaling Tribal Knowledge
Slide 61
Slide 61 text
AirBnB Scaling Tribal Knowledge
Slide 62
Slide 62 text
No content
Slide 63
Slide 63 text
THANK
YOU
Speaker Name: Greg Goltsov
Organized by
UNICOM Trainings & Seminars Pvt. Ltd.
http://www.unicomlearning.com/2017/Big_Data_Visualization_Summit_Sydney