Become a Data Scientist
Francesco Tisiot
Analytics Tech Lead
Slide 2
Slide 2 text
Verona, Italy
http://ritt.md/ftisiot
Over 10 Years in Analytics
@FTisiot
Oracle ACE Director
ITOUG Board President
Data Engineering Analytics Data Science
www.rittmanmead.com
@rittmanmead
Slide 5
Slide 5 text
Agenda
•OAC
•Data Scientist
•Become a Data Scientist
Slide 6
Slide 6 text
Oracle Analytics Cloud
• Platform Services (PaaS)
• Delivered entirely in the cloud:
•No infrastructure footprint
•Flexibility
•Simplified, metered licensing
• Several options to suit your needs:
•BYOL
•Functionality bundled into 2 editions
•Professional
•Enterprise
Slide 7
Slide 7 text
Functions
OAC supports every type of analytics
Classic Modern
Slide 8
Slide 8 text
Classic Enterprise BI
• Similar to OBIEE 12c
•Centrally maintained & governed
•Semantic model
• Interactive Dashboards
•KPI measurement & monitoring
•Guided navigation paths
• BI Publisher
•Highly formatted, burst outputs
• Action Framework
•Navigation actions
•Scheduled agents
Slide 9
Slide 9 text
Modern Data Discovery
• Data Preparation
•Acquire data
•Clean/Enrich
•Transform
•Repeatable Flows
• Data Visualisation
•Create visual insights rapidly
•Construct narrated storyboards
•Share findings
Slide 10
Slide 10 text
Unique Source of Truth
Raw Data To Insights
Specific Access Control
Data Enrichment and Cleaning
Unified Analytics
Free Discovery
Centralised Reporting
https://speakerdeck.com/ftisiot/become-an-equilibrista-find-the-right-balance-in-the-analytics-tech-ecosystem
Slide 11
Slide 11 text
Augmented Analytics
Data Enrichment Suggestions
Explain
One-Click Advanced Analytics
Advanced Machine Learning
Natural Language Processing
Slide 12
Slide 12 text
Data Scientist
Slide 13
Slide 13 text
https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
Data Scientist
Is a person who has the knowledge and skills to conduct sophisticated and systematic analyses of data. A data scientist extracts insights from data sets, and evaluates and identifies strategic opportunities.
Slide 14
Slide 14 text
https://bigdata-madesimple.com/what-is-a-data-scientist-14-definitions-of-a-data-scientist/
Data Scientist
Is a Data Analyst who lives in California!
Slide 15
Slide 15 text
Data Scientist Skills
Slide 16
Slide 16 text
https://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
Brendan Tierney
Oracle Ace Director
Slide 17
Slide 17 text
Data Scientist …Company Missing a Data Scientist
Slide 18
Slide 18 text
Low Hanging Fruit Theory
Democratise Data Science
Slide 19
Slide 19 text
Basic Operations
What are the Drivers for My Sales?
Based on my Experience I can Guess…
Statistically Significant Drivers for Sales Are …
Augmented Analytics
Slide 20
Slide 20 text
Basic Operations
Is this Client going to accept the Offer?
YES/NO
50%
70%
Basic ML Model
Slide 21
Slide 21 text
Become a Data Scientist with OAC
Slide 22
Slide 22 text
Before Starting… Define the Problem!
Slide 23
Slide 23 text
Problem Definition:
Predicting Wine Quality
Slide 24
Slide 24 text
TEP
Task: Classify Good/Bad Wine
Experience: Corpus of Wine Descriptions with Rating
Performance: Accuracy
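The Task / Experience / Performance framing above can be sketched in a few lines of Python. The sample descriptions, the 90-point cut-off between good and bad, and the keyword rule standing in for a model are all illustrative assumptions, not taken from the talk.

```python
# Hypothetical mini-corpus: (description, rating) pairs standing in for the
# "corpus of wine descriptions with rating" (the Experience).
corpus = [
    ("Elegant and complex, with a long finish", 94),
    ("Thin and flat, with a short finish", 82),
    ("Rich, elegant fruit and fine tannins", 91),
    ("Simple and flat, slightly bitter", 84),
]

GOOD_THRESHOLD = 90  # assumed cut-off; the slides do not state one


def label(rating: int) -> str:
    """Task: turn a numeric rating into the Good/Bad class to predict."""
    return "Good" if rating >= GOOD_THRESHOLD else "Bad"


def naive_classifier(description: str) -> str:
    """Placeholder model: a keyword rule, only here to produce predictions."""
    return "Good" if "elegant" in description.lower() else "Bad"


def accuracy(real: list[str], predicted: list[str]) -> float:
    """Performance: share of predictions that match the real class."""
    hits = sum(r == p for r, p in zip(real, predicted))
    return hits / len(real)


real = [label(rating) for _, rating in corpus]
predicted = [naive_classifier(text) for text, _ in corpus]
print(accuracy(real, predicted))
```

Fixing the performance measure before any modelling starts is what makes later model comparisons meaningful.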
Slide 25
Slide 25 text
Become a Data Scientist with OAC
Connect
Slide 26
Slide 26 text
Connection Options in OAC
Pre-Defined Data Models
External Data Sources
Slide 27
Slide 27 text
Select Relevant Columns and Apply Filters
Slide 28
Slide 28 text
Become a Data Scientist with OAC
Connect Clean
Slide 29
Slide 29 text
What Everybody Thinks a Data Scientist Does
What He Really Does
Become a Data Scientist with OAC
Connect Clean
Transform & Enrich
Slide 38
Slide 38 text
Feature Engineering
Location -> ZIP Code
2 Locations -> Distance
Name -> Sex
Day/Month/Year -> Date
Data Flow
Additional Data Sources?
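Two of the transformations listed above (Day/Month/Year -> Date and 2 Locations -> Distance) can be sketched outside OAC with the Python standard library. This is a minimal illustration, not how OAC Data Flows implement them: the function names are hypothetical, and the haversine great-circle formula is an assumed choice of distance measure.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def to_date(day: int, month: int, year: int) -> date:
    """Day/Month/Year -> Date: combine three columns into one date value."""
    return date(year, month, day)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """2 Locations -> Distance: great-circle distance in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates for Verona and Milan, used only as sample input.
print(to_date(1, 2, 2020))
print(round(haversine_km(45.44, 10.99, 45.46, 9.19), 1))
```

Derived columns like these often carry more signal for a model than the raw fields they come from.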
Slide 39
Slide 39 text
Data Preparation Recommendations
Slide 40
Slide 40 text
Spatial Enrichment
Oracle Spatial Studio
http://ritt.md/spatial-studio
Slide 41
Slide 41 text
Become a Data Scientist with OAC
Connect Clean
Transform & Enrich
Analyse
Slide 42
Slide 42 text
Data Overview
Slide 43
Slide 43 text
Explain
Slide 44
Slide 44 text
Explain - Key Drivers
Slide 45
Slide 45 text
Become a Data Scientist with OAC
Connect Clean
Transform & Enrich
Analyse
Train & Evaluate
Slide 46
Slide 46 text
What Problem are we Trying to Solve?
Supervised: “I want to predict the value of Y, here are some examples”
•Classification
•Regression
Unsupervised: “Here is a dataset, make sense out of it!”
•Clustering
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
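The difference between the two families can be shown with a toy example in plain Python. Both functions are deliberately simplistic stand-ins (a 1-nearest-neighbour lookup and a one-dimensional two-means split), chosen for brevity and not taken from the talk or from OAC; the ratings are made-up sample data.

```python
def predict_1nn(examples, x):
    """Supervised: examples are (x, y) pairs; return the y of the closest x."""
    _, nearest_y = min(examples, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two groups.

    Assumes at least two distinct values, so neither group ends up empty.
    """
    low, high = min(values), max(values)  # initial group centres
    for _ in range(iterations):
        near_low, near_high = [], []
        for v in values:
            if abs(v - low) <= abs(v - high):
                near_low.append(v)
            else:
                near_high.append(v)
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high


# Supervised: "I want to predict Y, here are some examples".
labeled = [(82, "Bad"), (84, "Bad"), (91, "Good"), (94, "Good")]
print(predict_1nn(labeled, 92))

# Unsupervised: "Here is a dataset, make sense out of it!"
print(two_means([82, 84, 91, 94]))
```

The supervised function needs the labels to answer at all, while the clustering function only finds structure and leaves the naming of the groups to the analyst.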
Slide 47
Slide 47 text
Easy Models
Slide 48
Slide 48 text
NLP
Slide 49
Slide 49 text
DataFlow Train Model
Slide 50
Slide 50 text
Which Model - Parameters?
Slide 51
Slide 51 text
Select, Try, Save, Change, Try, Save …..
Slide 52
Slide 52 text
Compare - Classification
Real Value
Predicted Value
Good
Bad
Bad
Good
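The Real Value versus Predicted Value grid is a confusion matrix. A minimal sketch of how such a grid is counted, using made-up labels and a hypothetical helper name:

```python
from collections import Counter


def confusion_matrix(real, predicted):
    """Count how often each (real, predicted) label pair occurs."""
    return Counter(zip(real, predicted))


# Made-up labels for illustration only.
real = ["Good", "Good", "Bad", "Bad", "Good"]
predicted = ["Good", "Bad", "Bad", "Good", "Good"]

matrix = confusion_matrix(real, predicted)
for r in ("Good", "Bad"):
    for p in ("Good", "Bad"):
        print(f"real={r:<4} predicted={p:<4} count={matrix[(r, p)]}")
```

Cells where the real and predicted labels agree are correct predictions; the other two cells are the two kinds of error.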
Slide 53
Slide 53 text
There is No Single Truth…
896/(502+896) = 64.09%
866/(471+866) = 64.77%
Precision
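Precision is the share of positive predictions that are actually positive: true positives divided by the sum of true and false positives. The sketch below recomputes the two figures from the slide. Reading 896 and 866 as the true positives and 502 and 471 as the false positives is an assumption, chosen because it is the reading that reproduces the quoted percentages.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of positive predictions that are really positive."""
    return true_positives / (true_positives + false_positives)


# Assumed reading of the slide's counts: (true positives, false positives).
first_model = precision(896, 502)
second_model = precision(866, 471)

print(f"{first_model:.2%}")   # 64.09%
print(f"{second_model:.2%}")  # 64.77%
```

The second model has the higher precision despite fewer correct positives, because it also produces fewer false positives. This is why no single number settles which model is better.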
Slide 54
Slide 54 text
Compare - Regression
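For regression models the comparison is made on error size instead of on right-or-wrong counts. The slide does not name the metrics used, so mean absolute error (MAE) and root mean squared error (RMSE) are assumed here as two common choices, with made-up ratings as input.

```python
from math import sqrt


def mae(real, predicted):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(r - p) for r, p in zip(real, predicted)) / len(real)


def rmse(real, predicted):
    """Root mean squared error: penalises large errors more than MAE."""
    squared = sum((r - p) ** 2 for r, p in zip(real, predicted))
    return sqrt(squared / len(real))


# Made-up wine ratings for illustration only.
real_ratings = [90, 85, 92]
predicted_ratings = [88, 85, 95]

print(mae(real_ratings, predicted_ratings))
print(rmse(real_ratings, predicted_ratings))
```

Lower is better for both metrics, and a large gap between RMSE and MAE hints at a few big misses instead of many small ones.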
Slide 55
Slide 55 text
Become a Data Scientist with OAC
Connect Clean
Transform & Enrich
Analyse
Train & Evaluate
Predict
Slide 56
Slide 56 text
Use On the Fly
Slide 57
Slide 57 text
Step of a Data Flow
Slide 58
Slide 58 text
Congratulations!
…You are now a Data Scientist!
Slide 59
Slide 59 text
Nearly There
Slide 60
Slide 60 text
97%
95%
90%
80%
60%
50%
Required Knowledge
Slide 61
Slide 61 text
…But
80% > 50%
Data Cleaning
Model Creation & Evaluation
Feature Engineering
Feature Selection
Slide 62
Slide 62 text
ML Production Deployment
Data Scientist
ML -> Data
Oracle Machine Learning
Slide 63
Slide 63 text
Become a Data Scientist with OAC
http://ritt.md/OAC-datascience