
How to choose the right charts for Exploratory Data Analysis

Kan Nishida
September 04, 2019

This deck shows how to choose the right charts for Exploratory Data Analysis and how to use charts such as histograms, density plots, boxplots, scatter plots, and stacked bar charts in Exploratory.


Transcript

  1. Speaker: Kan Nishida (@KanAugust), co-founder/CEO, Exploratory. Summary: At the beginning of 2016, launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. The Third Wave: Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.
  3. Democratization of Data Science. First Wave (1976): Theme is Monetization; Users are Statisticians. Second Wave (2000): Theme is Commoditization; Users are Data Scientists. Third Wave (2016): Theme is Democratization; Users are Business Users. Remaining row labels on the slide: Algorithms, Experience, Tools, with entries Proprietary, Open Source, UI & Programming, Programming, and UI & Automation.
  4. Exploratory Data Analysis: Questions, Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning), Communication (Dashboard, Note, Slides).
  5. EDA (Exploratory Data Analysis): An exploratory and iterative process of asking many questions and finding answers from data in order to build better hypotheses for Explanation, Prediction, and Control.
  6. Prediction: Predict how many customers we will have, e.g. Customers will be 1000 by the end of this year.
  7. Control: e.g. We want to grow Customers to 1000. What can we do to make that happen?
  8. Hypothesis. Prediction: If the weather is warm, we will have more customers. Control: If we offer a 10% discount, we will have more customers.
  10. Goal: • Explain how the salary is decided. • Predict the salary based on the attributes. • Find what to control to increase the salary, if possible!
  11. EDA (Exploratory Data Analysis): An exploratory and iterative process of asking many questions and finding answers from data in order to build better hypotheses for Prediction and Control.
  12. “Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” John Tukey
  13. Two Principal Questions for EDA: • How do the values of each variable vary? • How are the variables associated (or correlated) with one another?
  14. “Since the aim of exploratory data analysis is to learn what seems to be, it should be no surprise that pictures play a vital role in doing it well. There is nothing better than a picture for making you think of questions you had forgotten to ask (even mentally).” John Tukey
  15. Questions for Variance: • What are the typical values? • Are there any outliers compared to the general trend in the variance? • How is the data distributed? • Are there any patterns you can spot in the variance?
  17. Numerical: Continuous and ordinal relationship among values (e.g. 11, 22, 45 on a scale from 0 to 50).
  18. A histogram splits numerical values into a set of ‘bins’ with equal range and shows the size (or number of rows) for each ‘bin’.
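The equal-range binning described here can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Exploratory computes its histograms; the income values, the function name `histogram_counts`, and the choice of four bins are all assumptions made for the example.

```python
# Hypothetical monthly incomes; not data from the deck.
monthly_income = [1200, 2500, 2700, 3100, 4800, 5200, 9800, 15000, 19500]

def histogram_counts(values, bin_count):
    """Split values into `bin_count` equal-width bins and count rows per bin.

    Assumes the values are not all identical (otherwise the bin width is zero).
    """
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value goes into the last bin instead of opening a new one.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

edges, counts = histogram_counts(monthly_income, bin_count=4)

# Print one text bar per bin: the bar length is the number of rows in the bin.
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:>8.0f} to {right:>8.0f}: {'#' * n} ({n})")
```

Changing `bin_count` changes how coarse or fine the picture of the variance is, which is why it is worth trying a few values.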
  19. Visualizing Variance with Histogram: 1. Visualize the variance of Monthly Income. 2. Find if there is a difference in the Monthly Income variance between Male and Female. 3. Find if there is a difference in the Monthly Income variance among Job Roles.
  22. Visualize the Variance with Density Plot: 1. Visualize the variance of Monthly Income. 2. Find if there is a difference in the Monthly Income variance between Male and Female. 3. Find if there is a difference in the Monthly Income variance among Job Roles.
  24. The Manager’s Monthly Income range seems to be higher, while those of Sales Rep and Research Scientist are lower.
  25. Density Plot: • Draws a smooth curve to visualize the distribution of data. • The height shows the estimated data density at any given point.
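One common way to obtain such a smooth curve is a Gaussian kernel density estimate, sketched below in plain Python. The deck does not say which estimator Exploratory uses, so treat this as an assumed technique; the income values, the helper name `gaussian_kde`, and the bandwidth of 1500 are hypothetical.

```python
import math

# Hypothetical monthly incomes; not data from the deck.
monthly_income = [1200, 2500, 2700, 3100, 4800, 5200, 9800, 15000, 19500]

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the data density at a point x.

    Each observation contributes a Gaussian bump centered on its value;
    the estimate is the average of those bumps.
    """
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# The bandwidth controls smoothness; 1500 is an arbitrary choice for this sketch.
density = gaussian_kde(monthly_income, bandwidth=1500)

# The curve is higher where observations cluster and lower where they are sparse.
for x in range(0, 20001, 2500):
    print(f"{x:>6}: {density(x):.7f}")
```

A smaller bandwidth follows the data more closely and produces a bumpier curve; a larger one smooths more and can hide real structure.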
  26. Visualizing Variance with Density Plot: 1. Visualize the variance of Monthly Income. 2. Find if there is a difference in the Monthly Income variance between Male and Female. 3. Find if there is a difference in the Monthly Income variance among Job Roles.
  31. Categorical (e.g. California, Texas, New York, Florida, Oregon): • No continuous relationship • Limited set of values • Ordinal relationship is NOT necessary
  32. Visualize the Variance with Bar Chart: 1. Visualize the variation of Job Role. 2. Find if there is a difference in the variation of Job Role between Male and Female. 3. Find if there is a difference in the variation of Job Role by Attrition status.
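For a categorical variable, the bar heights are simply the number of rows per category. A minimal sketch using the standard library follows; the Job Role values are made up for illustration and are not taken from the deck's dataset.

```python
from collections import Counter

# Hypothetical Job Role values, one per employee row; not data from the deck.
job_roles = [
    "Sales Rep", "Manager", "Research Scientist", "Sales Rep",
    "Research Scientist", "Sales Rep", "Manager", "Research Scientist",
    "Research Scientist",
]

# Count how many rows fall into each category.
role_counts = Counter(job_roles)

# Print one text bar per category, largest first.
for role, n in role_counts.most_common():
    print(f"{role:<20} {'#' * n} ({n})")
```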
  38. Association and Correlation: A relationship where changes in one variable happen together with changes in another variable, following a certain rule.
  39. Association: Any type of relationship between two variables. Correlation: A certain type of (usually linear) association between two variables.
  40. Association: Monthly Income variances are different among countries. [Chart: Monthly Income (0 to 5000) by Country (US, UK, Japan)]
  41. Correlation: The bigger the Age is, the bigger the Monthly Income is. [Chart: Monthly Income by Age]
  42. Variance: How much would the income be in this company? [Chart: Monthly Income, $1,000 to $20,000]
  43. If we can find a correlation between Monthly Income and Working Years… [Chart: Monthly Income ($1,000 to $20,000) by Working Years (0 to 30)]
  44. If Working Years is 20 years, Monthly Income would be around $15,000. [Chart: Monthly Income ($1,000 to $20,000) by Working Years (0 to 30)]
  45. Correlation reduces Uncertainty caused by Variance. [Chart: Monthly Income ($1,000 to $20,000) by Working Years (0 to 30), with the Correlation and the Variance marked]
  46. If we can find strong correlations, it makes it easier to explain how Monthly Income changes and to predict what Monthly Income will be.
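To make this concrete, the sketch below measures a linear correlation with the Pearson coefficient, fits a least-squares line, and compares the spread of Monthly Income before and after accounting for Working Years. The numbers are invented for illustration (they are not the deck's dataset), and the helper names `pearson_r`, `fit_line`, and `spread` are hypothetical.

```python
import math

# Hypothetical data; not taken from the deck.
working_years = [1, 3, 5, 8, 10, 15, 20, 25, 30]
monthly_income = [1500, 2500, 4000, 6000, 8000, 11000, 15000, 17500, 20000]

def mean(xs):
    return sum(xs) / len(xs)

def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def spread(values):
    """Population standard deviation, used here as a measure of uncertainty."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

r = pearson_r(working_years, monthly_income)
slope, intercept = fit_line(working_years, monthly_income)
predicted = intercept + slope * 20  # predicted income at 20 working years

# Residuals are what is left unexplained once Working Years is known.
residuals = [
    y - (intercept + slope * x) for x, y in zip(working_years, monthly_income)
]

print(f"correlation: {r:.3f}")
print(f"predicted income at 20 years: {predicted:.0f}")
print(f"spread of income: {spread(monthly_income):.0f}")
print(f"spread after using working years: {spread(residuals):.0f}")
```

With a strong correlation, the residual spread is much smaller than the raw spread, which is the sense in which correlation reduces the uncertainty caused by variance.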
  47. Correlation is not equal to Causation. Causation is a special type of Correlation. If we can confirm a given Correlation is Causation, then we can control the outcome.
  48. Combination of Data Types: • Category vs. Numerical • Numerical vs. Numerical • Category vs. Category
  50. Boxplot: • Displays the distribution of numerical values by Category. • The Y Axis represents the range of values; the X Axis represents each Category.
  51. • 3Q (3rd Quartile / 75th Percentile) • 2Q (2nd Quartile / 50th Percentile / Median) • 1Q (1st Quartile / 25th Percentile)
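The three quartiles that define the box can be computed per category with Python's `statistics.quantiles`. This is a sketch with made-up incomes; the `"inclusive"` method is one of several quartile conventions, and the deck does not state which one Exploratory uses.

```python
import statistics
from collections import defaultdict

# Hypothetical (Job Role, Monthly Income) rows; not data from the deck.
rows = [
    ("Manager", 15000), ("Manager", 16000), ("Manager", 17000),
    ("Manager", 18000), ("Manager", 19000),
    ("Sales Rep", 2000), ("Sales Rep", 2500), ("Sales Rep", 3000),
    ("Sales Rep", 3500), ("Sales Rep", 4000),
]

# Group the numerical values by category (one box per category).
incomes_by_role = defaultdict(list)
for role, income in rows:
    incomes_by_role[role].append(income)

# For n=4, quantiles() returns the three cut points [1Q, 2Q (median), 3Q].
quartiles = {
    role: statistics.quantiles(values, n=4, method="inclusive")
    for role, values in incomes_by_role.items()
}

for role, (q1, q2, q3) in quartiles.items():
    print(f"{role:<10} 1Q={q1:.0f}  2Q={q2:.0f}  3Q={q3:.0f}")
```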
  52. Visualize the relationship between Categorical and Numerical: 1. Visualize the relationship between Monthly Salary and Job Role. 2. Visualize the relationship between Monthly Salary and Gender.
  55. Combination of Data Types: • Category vs. Numerical • Numerical vs. Numerical • Category vs. Category
  56. Each data point (row) is positioned at the intersection of two numeric variables. [Chart: Numeric by Numeric]
  57. Visualize the relationship between Numerical and Numerical: 1. Visualize the relationship between Monthly Salary and Age. 2. Visualize the relationship between Monthly Salary and Total Working Years. 3. Find if the correlations are different among Job Roles.
  63. Combination of Data Types: • Category vs. Numerical • Numerical vs. Numerical • Category vs. Category
  64. Category vs. Category: Calculate the size (number of rows) for each pair and/or calculate the ratio against the total size.
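The pair counts and ratios described here are what a stacked bar chart draws. A minimal sketch with the standard library follows; the (Job Role, Attrition) rows are hypothetical and not taken from the deck.

```python
from collections import Counter

# Hypothetical (Job Role, Attrition) rows; not data from the deck.
rows = [
    ("Sales Rep", "Yes"), ("Sales Rep", "Yes"), ("Sales Rep", "No"),
    ("Manager", "No"), ("Manager", "No"), ("Manager", "No"),
    ("Manager", "Yes"), ("Research Scientist", "No"),
]

# Size (number of rows) for each pair of categories.
pair_counts = Counter(rows)

# Ratio of each pair against the total number of rows.
total = len(rows)
ratios = {pair: n / total for pair, n in pair_counts.items()}

for (role, attrition), n in sorted(pair_counts.items()):
    share = ratios[(role, attrition)]
    print(f"{role:<20} {attrition:<4} {n}  {share:.1%}")
```

Dividing by the number of rows in each Job Role instead of by the grand total would give the within-category ratio, which is often easier to compare across categories of different sizes.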
  65. Visualize the relationship between Categorical and Categorical: 1. Visualize the relationship between Job Role and Education. 2. Visualize the relationship between Job Role and Attrition. 3. Visualize the relationship between Job Role and Monthly Salary. 4. Visualize the relationship between Monthly Salary and Total Working Years.
  68. • Logical is a special case of Categorical. • It can have only two unique values. • They are TRUE or FALSE.
  69. 1. Visualize the relationship between Job Role and Education. 2. Visualize the relationship between Job Role and Attrition. 3. Visualize the relationship between Job Role and Monthly Salary. 4. Visualize the relationship between Monthly Salary and Total Working Years.
  70. Visualize the relationship between Job Role and Monthly Income. Job Role (Categorical) vs. Monthly Income (Numerical)
  71. Visualize the relationship between Job Role and Monthly Income. Job Role (Categorical) vs. Monthly Income (Numerical), which Binning converts into Monthly Income (Categorical).
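Binning turns a numerical column into a categorical one, after which the Category vs. Category counting applies unchanged. The sketch below assumes fixed bin edges and labels chosen only for illustration, and whole-number incomes; the rows are hypothetical.

```python
import bisect
from collections import Counter

# Assumed bin edges and labels for this sketch; any cut points could be used.
edges = [5000, 10000, 15000]
labels = ["under 5,000", "5,000 to 9,999", "10,000 to 14,999", "15,000 and over"]

def income_bin(value):
    """Map a numerical income to the label of the bin it falls into."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect.bisect_right(edges, value)]

# Hypothetical (Job Role, Monthly Income) rows; not data from the deck.
rows = [
    ("Sales Rep", 2500), ("Sales Rep", 4800),
    ("Research Scientist", 3100), ("Research Scientist", 5200),
    ("Manager", 9800), ("Manager", 15000), ("Manager", 19500),
]

# After binning, both variables are categorical, so pairs can be counted.
pair_counts = Counter((role, income_bin(income)) for role, income in rows)

for (role, label), n in sorted(pair_counts.items()):
    print(f"{role:<20} {label:<18} {n}")
```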
  72. 1. Visualize the relationship between Job Role and Education. 2. Visualize the relationship between Job Role and Attrition. 3. Visualize the relationship between Job Role and Monthly Salary. 4. Visualize the relationship between Monthly Salary and Total Working Years.