
Introduction to Exploratory v6.0

Kan Nishida

June 10, 2020

Transcript

  1. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory

    Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams that built various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. The Third Wave

    Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for everyone to do Data Science.
  3. Data Science Workflow

    Questions, Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning), Data Analysis, Communication
  4. Exploratory: Modern & Simple UI

    Questions, Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning), Data Analysis, Communication (Dashboard, Note, Slides)
  5. EDA (Exploratory Data Analysis)

    An exploratory and iterative process of asking many questions and finding answers in data in order to build better hypotheses for Explanation, Prediction, and Control.
  7. "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." (John Tukey)
  8. Two Principal Questions for EDA

    • How do the values vary within each variable? • How are the variables associated or correlated with one another?
  9. Association and Correlation

    A relationship in which changes in one variable occur together with changes in another variable according to a certain rule.
  10. Association: Any type of relationship between two variables.

    Correlation: A certain type of (usually linear) association between two continuous variables.
  11. Association

    Chart: Monthly Income by Country (US, UK, Japan). The variation of Monthly Income differs among countries.
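One way to look at an association between a categorical variable (Country) and a continuous one (Monthly Income) is to summarize the continuous values per group and compare them. The Python sketch below does this with the standard library; the income figures are hypothetical, not the values behind the slide's chart, and `summarize_by_group` is an illustrative helper, not an Exploratory feature.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (country, monthly income) rows; not data from the slide.
rows = [
    ("US", 5200), ("US", 3100), ("US", 4800),
    ("UK", 3000), ("UK", 2800), ("UK", 3300),
    ("Japan", 2600), ("Japan", 2500), ("Japan", 2700),
]

def summarize_by_group(pairs):
    """Group numeric values by category and return count, mean, and sample stdev per group."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {
        category: {"n": len(vals), "mean": mean(vals), "stdev": stdev(vals)}
        for category, vals in groups.items()
    }

summary = summarize_by_group(rows)
for country, stats in summary.items():
    print(f"{country}: mean={stats['mean']:.0f}, stdev={stats['stdev']:.0f}")
```

If the per-country centers and spreads differ noticeably, Country and Monthly Income are associated in the broad sense used on the slide, even though "correlation" in the linear sense does not apply to a categorical variable.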
  12. Correlation

    Chart: Monthly Income by Age. The higher the Age, the higher the Monthly Income.
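Correlation in the linear sense is commonly measured with the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations, which gives a value between -1 and 1. A minimal sketch, assuming hypothetical Age and Monthly Income values (`pearson_r` is an illustrative name):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance / (std_x * std_y).

    The 1/n factors cancel, so raw sums of deviations are used directly.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (sx * sy)

# Hypothetical (age, monthly income) values; not taken from the slide.
ages = [25, 32, 41, 48, 55]
incomes = [2100, 2900, 3600, 4300, 4800]
print(round(pearson_r(ages, incomes), 3))
```

A value close to 1 means the two variables rise together along a roughly straight line. Pearson's r only captures linear association, so a strong curved relationship can still produce a value near 0.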
  14. • Map: Standard • Map: More Granular Level Zoom In/Out

    • Limit Values for Color / Repeat By • Multiple Reference Lines
  15. Various types of names can be used for the column assignment.

    For example, a country can be specified by its name (US, United States, etc.), its ISO2 code, or its ISO3 code.
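A sketch of how such a column might be normalized so that any of these spellings resolves to the same country, assuming a hypothetical, deliberately tiny lookup table (a real implementation needs the full ISO 3166-1 list). The helper names are illustrative, not Exploratory's API. Note that the ISO2 code for the United Kingdom is GB, so "UK" is handled here as an alternative name.

```python
# Hypothetical lookup table: (ISO3, ISO2, accepted names). Real data needs every country.
COUNTRIES = [
    ("USA", "US", ["United States", "United States of America"]),
    ("GBR", "GB", ["United Kingdom", "UK", "Great Britain"]),
    ("JPN", "JP", ["Japan"]),
]

def build_index(countries):
    """Map every accepted spelling (name, ISO2, ISO3), case-insensitively, to its ISO3 code."""
    index = {}
    for iso3, iso2, names in countries:
        for key in [iso3, iso2, *names]:
            index[key.strip().lower()] = iso3
    return index

INDEX = build_index(COUNTRIES)

def to_iso3(value):
    """Return the ISO3 code for a country given as a name, ISO2, or ISO3 code; None if unknown."""
    return INDEX.get(value.strip().lower())

print([to_iso3(v) for v in ["US", "United States", "USA", "gb", "Japan", "Atlantis"]])
```

Normalizing to one code first means the map layer only ever has to match a single key per country.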
  16. You can use ‘Limit’ to show only the Top N

    countries or only the countries that match a given condition.
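The ‘Limit’ idea can be sketched as two filters over per-country values: keep the top N by value, or keep the entries that satisfy a condition. The values and function names below are hypothetical; in Exploratory this is a chart setting rather than code you write.

```python
# Hypothetical per-country totals.
sales = {"US": 980, "UK": 410, "Japan": 620, "Germany": 550, "France": 300}

def limit_top_n(values, n):
    """Keep the n entries with the largest values (ties broken by name for a stable order)."""
    ranked = sorted(values.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])

def limit_where(values, condition):
    """Keep only the entries whose value satisfies the given predicate."""
    return {k: v for k, v in values.items() if condition(v)}

print(limit_top_n(sales, 3))
print(limit_where(sales, lambda v: v >= 500))
```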
  17. • Model Interpretability • Survival Analysis: Random Survival Forest

    • Survival Analysis: Cox Regression (Survival Curve, Prediction) • ROC Curve
  19. Build Prediction Model: the training data below is given to an Algorithm, which produces a Model.

    Job         Age  Good Looking  Nationality  School
    Politician  60s  FALSE         Japanese     Law School
    Actor       40s  TRUE          American     Actor’s School
    Actor       50s  TRUE          American     Actor’s School
    Politician  40s  TRUE          Canadian     Law School
    Politician  50s  TRUE          American     Law School
    Actor       50s  TRUE          American     Actor’s School
  20. Predict: the Model built from the training table on the previous slide predicts the unknown Job for new data.

    New data (Job unknown):
    Job         Age  Good Looking  Nationality  School
    ?           60s  FALSE         Japanese     Law School
    ?           40s  TRUE          American     Actor’s School
    ?           50s  TRUE          American     Actor’s School

    Predicted:
    Job         Age  Good Looking  Nationality  School
    Politician  60s  FALSE         Japanese     Law School
    Actor       40s  TRUE          American     Actor’s School
    Actor       50s  TRUE          American     Actor’s School
  21. The algorithm has learned the patterns it needs to predict.

    Algorithm → Model, learned from the same training table (Job, Age, Good Looking, Nationality, School) as the previous slides.
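To make "learning the patterns" concrete, here is a sketch using OneR, a deliberately simple rule learner chosen only for illustration (nothing in the slides says Exploratory uses it). For each column it maps each value to the most common Job, then keeps the single column with the fewest training errors. The training and new rows are transcribed from the slides; `one_r` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict

FEATURES = ["Age", "Good Looking", "Nationality", "School"]

# Training rows transcribed from the slides: (Job, Age, Good Looking, Nationality, School).
TRAIN = [
    ("Politician", "60s", False, "Japanese", "Law School"),
    ("Actor", "40s", True, "American", "Actor's School"),
    ("Actor", "50s", True, "American", "Actor's School"),
    ("Politician", "40s", True, "Canadian", "Law School"),
    ("Politician", "50s", True, "American", "Law School"),
    ("Actor", "50s", True, "American", "Actor's School"),
]

# New rows with unknown Job: (Age, Good Looking, Nationality, School).
NEW = [
    ("60s", False, "Japanese", "Law School"),
    ("40s", True, "American", "Actor's School"),
    ("50s", True, "American", "Actor's School"),
]

def one_r(rows):
    """OneR: per feature, map value -> most common Job; keep the feature with fewest errors."""
    best = None  # (feature index, rule, training errors)
    for i in range(len(FEATURES)):
        counts = defaultdict(Counter)
        for job, *features in rows:
            counts[features[i]][job] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[features[i]] != job for job, *features in rows)
        if best is None or errors < best[2]:
            best = (i, rule, errors)
    return best

def predict(model, features):
    """Apply the learned single-feature rule; None if the value never appeared in training."""
    i, rule, _ = model
    return rule.get(features[i])

model = one_r(TRAIN)
print(FEATURES[model[0]], [predict(model, row) for row in NEW])
```

On this table the School column alone separates the two jobs without error, so the learned rule is Law School → Politician and Actor’s School → Actor, which reproduces the predictions on the Predict slide. Richer algorithms differ mainly in how many and what kinds of such patterns they can represent.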
  22. The difference among prediction model algorithms (Statistical Learning, Machine Learning) lies in

    what kinds of relationships they can recognize. What kinds of relationships have the algorithms found?
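A small illustration of this point, assuming synthetic data where y = x² over a symmetric range: a straight line fitted by ordinary least squares finds essentially no relationship (R² of about 0), while a piecewise-constant model that predicts the mean y within each x-bin (a crude stand-in for a tree-based learner, chosen here for simplicity) captures most of the curve. All names and bin edges are illustrative.

```python
from bisect import bisect_right

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_bins(xs, ys, edges):
    """Piecewise-constant fit: predict the mean y of the x-bin (bins split at `edges`)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        k = bisect_right(edges, x)
        sums[k] = sums.get(k, 0.0) + y
        counts[k] = counts.get(k, 0) + 1
    means = {k: sums[k] / counts[k] for k in sums}
    overall = sum(ys) / len(ys)
    # Fall back to the overall mean for a bin that had no training points.
    return lambda x: means.get(bisect_right(edges, x), overall)

def r_squared(predict, xs, ys):
    """Share of the variance in y explained by the model."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic data with a strong but non-linear (U-shaped) relationship.
xs = [k / 2 for k in range(-10, 11)]
ys = [x * x for x in xs]

line = fit_line(xs, ys)
bins = fit_bins(xs, ys, edges=[-3, -1, 1, 3])
print(round(r_squared(line, xs, ys), 3), round(r_squared(bins, xs, ys), 3))
```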
  23. Analytics Grammar

    • Variable Importance • Prediction by Variable • Summary: Model Quality • Predicted Data
  24. Variable Importance

    It uses the ‘Permutation Importance’ method, which scores the importance of each variable by evaluating how much the prediction quality degrades when that variable's values are randomly shuffled, breaking the variable's relationship with the outcome.
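A minimal sketch of permutation importance, assuming a hypothetical, already-trained model (a rule that looks only at the first column) and accuracy as the quality metric. Each column is shuffled several times, and its importance is the average drop in accuracy; the function names are illustrative, not Exploratory's internals.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows where the model's prediction equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """For each column j: baseline accuracy minus mean accuracy after shuffling column j."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j replaced by its shuffled values.
            X_perm = [row[:j] + (value,) + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: the label depends only on the first column.
X = [(i / 10, (i * 7 % 10) / 10) for i in range(10)]
y = [x0 > 0.5 for x0, _ in X]

def model(row):
    """Hypothetical already-trained model that uses column 0 only."""
    return row[0] > 0.5

print(permutation_importance(model, X, y))
```

Shuffling keeps each column's distribution intact while destroying its link to the outcome, so the model does not have to be retrained. Because this model ignores the second column, shuffling it costs nothing and its importance is exactly 0.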
  25. Chart: survival curve calculated by Kaplan-Meier vs. predicted by the Cox Regression model.

    Cox Regression is not good at capturing relationships that change over time, because of its constraint that the hazard ratio is constant.
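The constraint can be written out. In the standard Cox proportional hazards model, the hazard for a subject with covariates $x$ is a shared baseline hazard $h_0(t)$ scaled by $\exp(\beta^\top x)$; the notation is the usual textbook one, not taken from the slides. The baseline cancels in the ratio of two subjects' hazards, so the ratio does not depend on $t$:

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\,\exp(\beta^\top x) \\
\frac{h(t \mid x_1)}{h(t \mid x_2)}
  &= \frac{h_0(t)\,\exp(\beta^\top x_1)}{h_0(t)\,\exp(\beta^\top x_2)}
   = \exp\!\big(\beta^\top (x_1 - x_2)\big)
\end{aligned}
```

A covariate whose effect strengthens or fades over time therefore cannot be represented by a single $\beta$, which is why the Cox curve can drift away from the Kaplan-Meier curve.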
  26. Random Survival Forest can capture relationships that change over time because it is a machine learning algorithm with fewer constraints.

    Chart: survival curve calculated by Kaplan-Meier vs. predicted by the Random Survival Forest model.
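Both comparisons use the Kaplan-Meier estimate as the reference curve. At each time $t_i$ where an event is observed, it multiplies the survival probability by $(1 - d_i / n_i)$, where $d_i$ is the number of events at $t_i$ and $n_i$ the number of subjects still at risk just before $t_i$. A minimal Python sketch, assuming hypothetical follow-up data in which `False` marks a censored subject (one who left the study without an observed event). Censored subjects are counted as at risk at their own censoring time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each subject
    events: True if the event was observed, False if the subject was censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every subject (event or censored) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up data.
times = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```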
  27. Data Visualization Workshop

    • Part 1 - Basics: Visualizing Summarized Data • Part 2 - Visualizing Time Series Data • Part 3 - Visualizing Variance & Correlation • Part 4 - Visualizing Uncertainty - 6/17 (Wed) • Part 5 - Data Wrangling for Data Visualization