Data Visualization Workshop Part 4 - Visualizing Uncertainty

This is part of the Data Visualization Workshop. In this seminar, we'll focus on how to visualize uncertainty with Error Bar charts.

* Introduction to Error Bar chart
* Introduction to Confidence Interval

Kan Nishida

June 17, 2020

Transcript

  1. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory

    Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. The Third Wave: Data Science is not just for Engineers and Statisticians.

    Exploratory makes it possible for Everyone to do Data Science.
  3. Data Science Workflow

    [Diagram labels: Questions, Communication, Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning), Data Analysis]
  4. Exploratory: Modern & Simple UI

    [Diagram labels: Questions, Communication (Dashboard, Note, Slides), Data Access, Data Wrangling, Visualization, Analytics (Statistics / Machine Learning), Data Analysis]
  5. I asked the audience to rate it.

    [Rating scale: 1 (Very Bad) to 5 (Very Good)]
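  6. [Chart: audience ratings on the 1 to 5 scale; vertical axis from 0 to 5] Average score: 3.6
  7. [Chart: audience ratings on the 1 to 5 scale; vertical axis from 0 to 5] Average score: 3.4
  8. [Chart: audience ratings on the 1 to 5 scale; vertical axis from 0 to 5] Average score: 3.3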
  9. • The numbers vary.

    • The average is sensitive: it can be influenced significantly by extreme values, especially when the sample size is small.
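
To make the point about extreme values concrete, here is a minimal sketch. The ratings are hypothetical (they are not the seminar's survey data), and `mean_shift` is a helper name invented for this illustration. It adds a single rating of 1 to a small group and to a group ten times larger with the same mix of ratings, then compares how far the average moves.

```python
from statistics import mean


def mean_shift(scores, extreme):
    """Return how much the mean moves when one extreme score is added."""
    return mean(scores + [extreme]) - mean(scores)


# Hypothetical ratings on the 1 to 5 scale (not the seminar's actual data).
small_group = [4, 4, 3, 4]        # 4 responses
large_group = small_group * 10    # 40 responses with the same mix

small_shift = mean_shift(small_group, 1)
large_shift = mean_shift(large_group, 1)

print(f"small group: mean moves by {small_shift:+.2f}")
print(f"large group: mean moves by {large_shift:+.2f}")
```

The same single low rating pulls the small group's average down far more than the large group's, which is why an average from a bigger crowd tends to be more stable.
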
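  10. [Chart: audience ratings on the 1 to 5 scale; vertical axis from 0 to 14] Average score: 3.4
  11. [Chart: audience ratings on the 1 to 5 scale; vertical axis from 0 to 14] Average score: 3.5
  12. [Chart: audience ratings on the 1 to 5 scale; vertical axis from 0 to 14] Average score: 3.6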
  13. Which is my average score: 3.3, 3.4, 3.6, or 3.5?

    I would take the number from the biggest crowd more seriously, because outliers have less impact on its average.
  14. • The numbers vary.

    • The average is sensitive: it can be influenced significantly by extreme values, especially when the sample size is small.
    • Intuitively speaking, the bigger the sample size, the more we want to trust the average.
  15. Average Score: ?

    Ideally, I want to give the presentation to as large an audience as possible and get survey results from all of them.
  16. • We have no way of knowing the ‘True mean’ of the entire potential audience (Population), because not all of them joined the seminar, for whatever reason. (It’s impossible!)

    • We know the mean score of this group (Sample) is 3.6.
    • Most likely, this ‘sample mean’ is different from the ‘True mean’. But can we find a range around 3.6 within which we can assume the ‘True mean’ resides? If so, what would that range be?
  17. • We have no way of knowing the ‘True mean’ weight of all Americans.

    • We know the mean weight of a given sample is 84kg.
    • Most likely, this ‘sample mean’ is different from the ‘True mean’. But can we find a range around 84kg within which we can assume the ‘True mean’ resides? If so, what would that range be?

    Confidence Interval!
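
One common way to compute such a range can be sketched as follows. This is an illustration under stated assumptions, not necessarily the calculation Exploratory performs: it uses a normal approximation (sample mean plus or minus z times the standard error), the function name `mean_confidence_interval` is invented here, and the scores are hypothetical. For small samples a t-distribution is usually preferred, which gives a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Return (lower, upper) bounds for the mean, using a normal approximation."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / sqrt(n)     # sample std dev / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% level
    margin = z * standard_error
    return center - margin, center + margin


# Hypothetical survey scores on the 1 to 5 scale (not the seminar's data).
scores = [5, 4, 4, 3, 4, 2, 5, 3, 4, 3]

low, high = mean_confidence_interval(scores)
low99, high99 = mean_confidence_interval(scores, level=0.99)

print(f"sample mean: {mean(scores):.2f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")
```

The interval narrows as the sample grows (the standard error divides by the square root of n) and widens as the data becomes more spread out or the requested level increases.
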
  18. 95% of these confidence intervals should include the true mean of the population.

    [Figure labels: True Mean, Sample]
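
The "95% of these confidence intervals" claim can be checked with a small simulation. The sketch below is only an illustration: the population (normally distributed around a true mean of 3.5), the sample size, and the number of trials are all assumed values, and the interval again uses the normal approximation. It draws many samples, builds an interval from each, and counts how often the interval contains the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean=3.5, spread=1.0, sample_size=50, trials=2000, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.975)     # about 1.96
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, spread) for _ in range(sample_size)]
        margin = z * stdev(sample) / sqrt(sample_size)
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / trials


rate = coverage()
print(f"{rate:.1%} of the simulated intervals include the true mean")
```

The printed share should land close to 95%. In practice we only ever see one of these samples, so we cannot tell whether our particular interval is one of the roughly 5% that miss.
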
  19. We happen to be looking at one of the samples, with its mean and its confidence interval.

    [Figure labels: True Mean, Sample]
  21. Once the data is imported, the Summary view automatically generates a chart for each column along with metrics to describe the data.
  22. Each row represents one of the 1,470 employees.

    There are 27 variables describing each employee.
  23. Exercise

    1. Compare the average (mean) Monthly Income between Male and Female.
    2. Compare it for each Job Role and find whether there is a disparity between Male and Female in any Job Role.
  24. Create an Error Bar chart, assign Gender to X-Axis and Monthly Income to Y-Axis.

    This will create the chart comparing the mean of Monthly Income by Gender.
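
Outside of the Exploratory UI, the numbers behind such a chart can be sketched in a few lines. Everything here is an assumption for illustration: the rows are hypothetical (they are not the employee dataset used in the seminar), `error_bars` and `intervals_overlap` are invented helper names, and the interval uses a normal approximation rather than whatever method the Error Bar chart applies internally.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical (Gender, Monthly Income) rows, not the seminar's dataset.
rows = [
    ("Female", 6200), ("Female", 7100), ("Female", 5400), ("Female", 6900),
    ("Male", 6000), ("Male", 7300), ("Male", 5200), ("Male", 6600),
]


def error_bars(rows, level=0.95):
    """Group values by category and return {category: (low, mean, high)}."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    bars = {}
    for category, values in groups.items():
        center = mean(values)
        margin = z * stdev(values) / sqrt(len(values))
        bars[category] = (center - margin, center, center + margin)
    return bars


def intervals_overlap(a, b):
    """True when two (low, mean, high) bars share any part of their range."""
    return a[0] <= b[2] and b[0] <= a[2]


bars = error_bars(rows)
for category, (low, center, high) in bars.items():
    print(f"{category}: mean {center:.0f}, interval {low:.0f} to {high:.0f}")
print("overlap:", intervals_overlap(bars["Female"], bars["Male"]))
```

When the two intervals overlap heavily, as they do for this made-up data, the difference between the two means is not clear enough to act on without further investigation.
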
  25. Exercise 1

    1. Compare the average (mean) Monthly Income between Male and Female.
    2. Compare it for each Job Role and find whether there is a disparity between Male and Female in any Job Role.
  27. Example: We observe how many men and women are in this organization by counting them outside the office.
  28. Even with categorical data, the variance (the ratio of male / female) and the sample size are important factors when considering the differences among the categories.
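
The effect of sample size on a ratio can be sketched the same way as for a mean. The example below assumes the normal-approximation (Wald) interval for a proportion, which is one common choice and not necessarily what Exploratory uses. The counts are hypothetical and `ratio_confidence_interval` is an invented name.

```python
from math import sqrt
from statistics import NormalDist


def ratio_confidence_interval(count, total, level=0.95):
    """Normal-approximation (Wald) interval for the ratio count / total."""
    p = count / total
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sqrt(p * (1 - p) / total)
    # Clamp to the valid range of a ratio.
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical counts: the same 60% ratio of men, seen in two sample sizes.
small_low, small_high = ratio_confidence_interval(6, 10)
large_low, large_high = ratio_confidence_interval(600, 1000)

print(f"6 of 10:      {small_low:.1%} to {small_high:.1%}")
print(f"600 of 1000:  {large_low:.1%} to {large_high:.1%}")
```

Both samples show the same ratio, but counting only 10 people leaves a much wider range of plausible values than counting 1,000.
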
  29. Exercise 2

    1. Compare the ratio of Male and Female.
    2. Compare it among the Job Roles and find if there are any different patterns.
  31. Create an Error Bar chart, assign Gender to X-Axis and keep ‘Number of Rows’ for the Y-Axis.

    Then, switch the Calculation Type to ‘Ratio (%)’. This will create the chart comparing the ratio of Female and Male.
  32. Exercise 3

    Find if there are any differences in the ratio of Attrition among the Job Roles.
  35. Create an Error Bar chart, assign Job Role to X-Axis and keep ‘Number of Rows’ for the Y-Axis.

    Then, switch the Calculation Type to ‘Ratio (%)’. This will create the chart comparing the ratios among the Job Roles.
  36. This Error Bar chart visualizes the ratio of employees by Job Role.

    |       | Sales Executive | Research Scientist | Manager | Sales Rep |
    | All   | 326             | 292                | 102     | 83        |
    | Ratio | 22.18%          | 19.86%             | 6.94%   | 5.65%     |
  37. How can we compare the ratios of employees who left the company among the Job Roles?

    Attrition = Whether a given employee left (True) or not (False).
  38. TRUE = Those who left the company.

    |       | Sales Executive | Research Scientist | Manager | Sales Rep |
    | All   | 326             | 292                | 102     | 83        |
    | TRUE  | 57              | 47                 | 5       | 33        |
    | FALSE | 269             | 245                | 97      | 50        |
  39. We want to visualize the ratio of those who left the company.

    |       | Sales Executive | Research Scientist | Manager | Sales Rep |
    | All   | 326             | 292                | 102     | 83        |
    | TRUE  | 57              | 47                 | 5       | 33        |
    | Ratio | 40%             | 33%                | 3.5%    | 23%       |
    | FALSE | 269             | 245                | 97      | 50        |
  40. The original question: Find if there are any differences in the ratio of Attrition among the Job Roles.

    We need the Attrition Rate, not the Number of Attrition.
  41. Attrition rate should be calculated within each Job Role.

    |       | Sales Executive | Research Scientist | Manager | Sales Rep |
    | TRUE  | 57              | 47                 | 5       | 33        |
    | FALSE | 269             | 245                | 97      | 50        |
  42. Attrition Rate for each Job Role:

    |                | Sales Executive | Research Scientist | Manager | Sales Rep |
    | TRUE           | 57              | 47                 | 5       | 33        |
    | FALSE          | 269             | 245                | 97      | 50        |
    | Attrition Rate | 17.48%          | 16.1%              | 4.9%    | 39.76%    |
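
The difference between the two calculations can be reproduced from the counts in the slides. The sketch below computes, for each Job Role, its share of all the leavers among these four roles (the 40% / 33% / 3.5% / 23% figures) and its attrition rate within the role (the figure the exercise asks for). The added margin is an assumption for illustration: it uses the normal-approximation interval for a proportion, which is rough for small counts such as the 5 Managers who left, and may differ from what the Error Bar chart shows.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the slides: (left the company, stayed) for each Job Role.
attrition = {
    "Sales Executive": (57, 269),
    "Research Scientist": (47, 245),
    "Manager": (5, 97),
    "Sales Rep": (33, 50),
}

z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
total_left = sum(left for left, _ in attrition.values())

summary = {}
for role, (left, stayed) in attrition.items():
    headcount = left + stayed
    share = left / total_left     # share of all leavers: depends on role size
    rate = left / headcount       # attrition rate within the Job Role
    margin = z * sqrt(rate * (1 - rate) / headcount)
    summary[role] = (share, rate, margin)
    print(f"{role}: share of leavers {share:.1%}, "
          f"attrition rate {rate:.2%} (+/- {margin:.2%})")
```

Sales Executives account for the largest share of leavers simply because it is the largest role, while Sales Reps have by far the highest attrition rate, with a wider margin because only 83 employees hold that role.
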
  43. The Attrition Rates for these 4 Job Roles seem to be in the same range.

    There is not much difference among these Job Roles. However, they are significantly different from the other 5 Job Roles.
  44. Conclusion

    • The numbers vary.
    • The average (mean) is sensitive: it can be influenced significantly by extreme values, especially when the sample size is small.
    • When comparing categorical values we can use the ratio, but the ratio can also vary, especially when the sample size is small.
  45. Conclusion

    • To compare the means or the ratios, we should take into account the variance in the data and the size of the data.
    • The Confidence Interval is a useful tool that gives us context around the mean and the ratio.
    • It helps us compare the means and the ratios and conclude whether there are any differences that should be investigated further.
  46. If you want to compare the means or the ratios with confidence intervals, the Error Bar chart is your friend!
  47. Data Visualization Workshop

    • Part 1 - Basics: Visualizing Summarized Data
    • Part 2 - Visualizing Time Series Data
    • Part 3 - Visualizing Variance & Correlation
    • Part 4 - Visualizing Uncertainty
    • Part 5 - Data Wrangling for Data Visualization - 7/1 (Wed)