Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Exploratory Seminar #38 - Why Excel Users Love Exploratory? - Part 2

Exploratory Seminar #38 - Why Excel Users Love Exploratory? - Part 2

We have interviewed our users who used to be using Excel on why they have transitioned to Exploratory, and we've come up with a list of 9 reasons we often hear.

* Intuitive Understanding of Data at Every Step
* Analytical Capability
* Reproducibility for Reporting
* Easy to Share, But in a Managed way

In this seminar (Part 2), Kan will be discussing the second half of the topics.

19fc8f6113c5c3d86e6176362ff29479?s=128

Kan Nishida
PRO

March 17, 2021
Tweet

Transcript

  1. EXPLORATORY Online Seminar #38 Why Excel Users Love Exploratory -

    Part 2 ❤
  2. Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  3. Vision Everyone will be making informed decisions based on data.

  4. Mission Democratize Data Science

  5. 5 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics

    / Machine Learning) Data Analysis Data Science Workflow
  6. 6 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling

    Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  7. EXPLORATORY Online Seminar #38 Why Excel Users Love Exploratory -

    Part 2 ❤
  8. 1. Better Performance & Bigger Data Size 2. Lower Learning

    Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way 9 Reasons Why Excel Users Love Exploratory
  9. 9 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way Part 1
  10. Check out the Part 1 recording!

  11. None
  12. 9 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way Part 2
  13. 9 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way
  14. 14 Without an intuitive understanding of data… • Fail to

    realize the problems with your data. • Not sure what you get is what you have expected. • Fail to find the patterns in your data.
  15. 15 Without an intuitive understanding of data… • Fail to

    realize the problems with your data. • Not sure what you get is what you have expected. • Fail to find the patterns in your data.
  16. Typical Problems with Data 16 1. Missing Values 2. Outliers

    3. Not Sufficient Data
  17. You can see only what is in the window. 17

  18. 18 There might be missing values, but you won’t notice

    unless you scroll and check all the rows.
  19. There might be unexpected outlier values but you won’t notice

    unless you scroll and check all the rows. 19
  20. The data is missing some periods (dates), but it’s hard

    to find that out. 20
  21. 21 Top 100 Customers based on Sales Failing to realize

    the problems with data at the beginning doesn’t mean that the problems don’t exist. They just will be carried through the downstream. 1. Remove/Impute NAs 2. Remove outlier values 3. Clean up the customer names 4. Summarize the Sales by Customer 5. Extract the first and the last order date 6. Extract the customers country 7. Filter for Northern America customers 8. Calculate the customer life time period 9. Calculate the sales per day 10. Keep only the top 100 customers Is this really top 100?
  22. If you realize the problems after the data wrangling tasks

    in Excel, you will need to fix the problems and adjust all the following tasks as required. 22
  23. With Exploratory

  24. 24 Summary view is automatically generated when you import data.

    It gives you a quick and intuitive way to identify the data problems if there is any.
  25. 25 You can quickly know how many NAs (missing values)

    are.
  26. 26 You can also quickly see if there are any

    outlier values.
  27. 27 You can quickly check if it has all the

    data as you expect. If it is date and time data, you can check the beginning and the ending of period.
  28. 28 Without an intuitive understanding of data… • Fail to

    realize the problems with your data. • Not sure what you get is what you have expected. • Fail to find the patterns in your data.
  29. Order Data Country Population Data Let’s say you want to

    join two data sets. 29
  30. 30 With Excel, you can use the ‘vlookup’ function.

  31. After the join operation, it looks good. 31

  32. But, if you look closely there are some rows that

    didn’t get joined as expected. 32
  33. With Exploratory

  34. You can Join multiple data frames together with UI. 34

  35. After the Join operation, you can quickly check how the

    data looks under the Summary view. 35 Looks there is 1 row that doesn’t get joined.
  36. You can use the Table View Filter to show only

    the rows with NAs. 36
  37. The row with ‘US’ country didn’t get joined with the

    population data correctly. 37
  38. You can convert country code right before the Join step.

    38
  39. Convert from Country Name to Country Name. This will standardize

    the country names. 39
  40. You can check if there are any NAs in the

    Summary view and make sure all the rows are joined as expected. 40
  41. In Excel, everything is shown as text and it’s hard

    to realize the problems in your data. With Exploratory, you can quickly realize the problems thanks to visual presentation of your data at each step of data wrangling operations. Exploratory Excel 41
  42. 42 Without an intuitive understanding of data… • Fail to

    realize the problems with your data. • Not clear if the result of data wrangling is what you have expected. • Fail to find the patterns in your data.

  43. 43 Let’s say you’re visualizing the number of returned items

    over a given period.
  44. 44 Looks every December has similar spike.

  45. 45 By quickly switching the data aggregation and grouping you

    can find seasonal patterns in your data.
  46. Original Data Summarize Visualize 46 With Excel, you need to

    summarize the data first, then visualize.
  47. 47 Switching the column assignments is complicated Need to assign

    the area that contains the target data…
  48. 48 Bar Chart Scatter Chart Configuring the chart data assignment

    is different among the chart types, which makes it harder to quickly try various ways to find patterns in data.
  49. 49 The experience of data visualization in Excel is not

    designed to explore data and discover patterns and trends. It is rather designed for presenting the data.
  50. With Exploratory

  51. You can quickly create charts by assigning the columns and

    setting the data aggregation. 51
  52. For Date and Time data, you can set the aggregation

    level flexibly by rounding the date or extracting only a part of Date and Time. 52
  53. A data visualization grammar based design lets you use the

    same framework to configure all the charts. For example, the Color is to create multiple groups. 53
  54. You can use Repeat By to separate into multiple charts.

    54 Looks that the return rate spike in December is coming from the North America.
  55. 55 Being able to understand the data intuitively and visualy

    is critical in any steps of data analysis. Recognize the problems. Make sure you get what you expected at each step of Data Wrangling. Find the patterns and trends quickly. Data Wrangling Data Visualization Summary
  56. 9 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way
  57. Excel is not designed for Analytics.

  58. Well… Ok, you can do Analytics in Excel. But, the

    most of you won’t do it because it’s hard to do it. $BOEP 8JMMEP 㱠
  59. 59 For example, let’s look at Correlation…

  60. 60 Age Monthly Income The bigger the Age is, the

    bigger the Monthly Income is. Correlation
  61. Strong Negative Correlation No Correlation Strong Positive Correlation 0 1

    -1 -0.5 0.5 Correlation
  62. In Excel, you can calculate the correlation coefficient with ‘CORREL’

    function. 62
  63. But, what does 0.81 really mean? 63

  64. Even with the same correlation coefficient the data can look

    very different. Correlation Coefficientɿ0.81 Anscombe’s Quartet
  65. You could visualize the relationship, but creating the chart in

    Excel is …
  66. Also, it’s cumbersome to try every single combination… 66

  67. With Exploratory

  68. You can quickly visualize the relationship between a given two

    variables and see a set of correlation related metrics such as the correlation coefficient. 68
  69. You can quickly calculate the correlation coefficients between all the

    combinations of the numeric variables. 69
  70. Also, if you have a variable of your interest, you

    can use the Correlation Mode under Summary view to … 70
  71. Quickly explore the relationships with all the other variables. 71

  72. You can sort the variables based on the correlation coefficient

    or R Squared. 72
  73. You can evaluate if the relationship is significant with the

    statistical test result. 73
  74. When it is not significant… Correlation coefficient is close to

    0. Not enough data to prove. 74
  75. Now that we know Sales and Sales Comp are correlated,

    but is that because an increase in Sales Comp causes the increase in Sales? 75
  76. 76 $PSSFMBUJPO $BVTBUJPO 㱠

  77. None
  78. 78 Shark Attack Ice Cream Sales

  79. 79 Hot Confounding Shark Attack Ice Cream Sales

  80. Multivariate Analysis

  81. You can build a model to perform the Multivariate Analysis.

    81
  82. You can quickly understand the relationship with various pre-built visualization

    charts. 82
  83. 83

  84. 84 Again, you can do such analysis in Excel, too,

    but will you?
  85. 85 $BOEP 8JMMEP 㱠

  86. 9 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way
  87. Typically, you create charts and copy and paste them to

    Power Point (or Keynote) in order to create a report or present to others. 87
  88. 3 Problems for Reproducibility of Reporting 88 1. Need to

    copy the charts, one by one. 2. If any problems with data you’ll need to do it again. 3. The charts pasted in the report are not guaranteed to reproduce.
  89. 3 Problems for Reproducibility of Reporting 89 1. Need to

    copy the charts, one by one. 2. If any problems with data you’ll need to do it again. 3. The charts pasted in the report are not guaranteed to reproduce.
  90. Need to copy the charts, one by one. 90

  91. It becomes harder to find and manage the charts when

    they’re created in various sheets. 91
  92. With Exploratory

  93. You can create Dashboard, Note, and Slides directly inside Exploratory.

    93
  94. 94 You can add the charts embedded directly inside the

    Note.
  95. If you have added comments to charts… 95

  96. The chart gets inserted along with the comment, which will

    make the note creation quicker. 96
  97. 3 Problems for Reproducibility of Reporting 97 1. Need to

    copy the charts, one by one. 2. If any problems with data you’ll need to do it again. 3. The charts pasted in the report are not guaranteed to reproduce.
  98. What if there was an error in the data? 98

  99. With Excel, a single problem in the upstream of the

    data preparation flow will require all the subsequent charts to be adjusted. 99
  100. And, you will need to copy and paste all the

    charts again… 100
  101. With Exploratory

  102. When you change or fix the data in the upstream

    of the data preparation flow, all the subsequent steps will be automatically updated to accommodate the change. 102
  103. And, all the charts that reference to the subsequent steps

    will also be automatically updated. 103
  104. You don’t need to revisit all the charts one by

    one, you just click the Run button, then all the charts will be regenerated to accommodate the changes automatically. 104
  105. If the data in the original Excel files have been

    updated, then you can click ‘Re-Import’ button, which will not only import the data but also apply all the data wrangling steps and update the charts automatically. 105
  106. 3 Problems for Reproducibility of Reporting 106 1. Need to

    copy the charts, one by one. 2. If any problems with data you’ll need to do it again. 3. The charts pasted in the report are not guaranteed to reproduce.
  107. I want to customize the report to see how my

    product is doing. Report Author You
  108. 108 • When you are given the report (Excel and

    Powerpoint) you don’t know which charts are supposed to be coming from which Excel sheets. • It’s hard to know how the chart data was prepared since there is no data wrangling steps and chart creation being recorded. Blackbox of Report Creation
  109. With Exploratory

  110. &%' You can share your report as an EDF (Exploratory

    Data Format) which contains everything you need to reproduce the report. 110 Report Author You
  111. Export as EDF. 111

  112. You can import the EDF. 112

  113. Just by importing the EDF you’ll be able to reproduce

    exactly the same thing of the original report. 113
  114. You need Data, Data Wrangling Steps, and Chart configuration to

    reproduce the Note, and the EDF contains all of them. 114
  115. 115 Not only reproducing the original report, you can also

    check how the data and the charts were created by looking at the UIs. By sharing the reports as reproducible formats it makes it easier to share the works such as data wrangling and analysis that is needed to produce the reports with other members. You can scale the team productivity by collaborating with others instead of concentrating all the reporting works on a single person.
  116. 9 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 9. Easy to Share, But in a Managed way
  117. When you share the Excel files via mail… 117

  118. Which one is the right Sales data?

  119. “There are many spreadsheet data flying around via Emails, Slack,

    Google Docs, or random folders at document sharing servers. But, nobody is really sure which ones are the right ones to look at.” - a person we hear very often
  120. • Not sure which data file is the correct one.

    • Not clear what’s in the data. • Nobody knows how the data has been transformed or manipulated. • Someone has to keep updating the data manually. 120
  121. With Exploratory

  122. You can publish the data you have prepared to Exploratory

    Server. 122
  123. You can share with others by inviting them. 123

  124. You can open the data in browsers and see it

    with the Summary view. 124
  125. You can quickly search and sort the data. 125

  126. You can see the data information and the data dictionary.

    126
  127. You can schedule the data to keep it always up-to-date.

    127
  128. You can download the data as CSV. 128

  129. Or, you can use an API to download the data.

    129
  130. By downloading EDF, you can import the data along the

    data wrangling steps required to reproduce the data in Exploratory. 130
  131. That’s it for today! 131

  132. EXPLORATORY Online Seminar #39 3/24/2021 (Wed) 11AM PT Introduction to

    Parameter - SQL, Filter, & Calculation -
  133. None
  134. Information Email kan@exploratory.io Website https://exploratory.io Twitter @ExploratoryData Seminar https://exploratory.io/online-seminar

  135. Q & A 135

  136. EXPLORATORY 136