Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Seminar #37 - Why Excel Users Love Exploratory? - Part 1

Seminar #37 - Why Excel Users Love Exploratory? - Part 1

We have found 8 top reasons on why our users have chosen Exploratory over Excel for performing their day-to-day data analysis based on the interviews.

In this seminar, Kan will discuss the first 4 of them along with a live demo.

* Data Size Limitation & Performance
* Learning Cost
* Debuggability
* Data Reproducibility & Automation
* Specialization with Excel God
* Lack of Intuitive Understanding
* Lack of Analytical Capability
* Reproducibility for Reporting

UI Tool: Exploratory(https://exploratory.io/)
Exploratory Online Seminar: https://exploratory.io/online-seminar

Kan Nishida

March 10, 2021
Tweet

More Decks by Kan Nishida

Other Decks in Technology

Transcript

  1. Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  2. 4 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics

    / Machine Learning) Data Analysis Data Science Workflow
  3. 5 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling

    Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  4. • More than 50,000 potentially infectious people may have been

    missed by contact tracers and not told to self-isolate because of the data blunder. • Nearly 16,000 coronavirus cases went unreported in England. • Public Health England (PHE) admitted it was to blame for the IT error that meant 15,841 positive test results were left off the official daily figures between 25 September and 2 October.
  5. “It was down to a combination of human error and

    IT, that is a human using IT. It’s fundamentally an IT error but there is a bit of human error involved in this too.”
  6. • PHE was responsible for collating the test results from

    public and private labs, and publishing the daily updates on case count and tests performed. • Much of the work is still done manually, with individual labs sending PHE spreadsheets containing their results. • One lab had sent its daily test report to PHE in the form of a CSV file – the simplest possible database format, just a list of values separated by commas. That report was then loaded into Microsoft Excel, and the new tests at the bottom were added to the main database.
  7. • While CSV files can be any size, Microsoft Excel

    files can only be 1,048,576 rows long – or, in older versions which PHE may have still been using, a mere 65,536. • When a CSV file longer than that is opened, the bottom rows get cut off and are no longer displayed. That means that, once the lab had performed more than a million tests, it was only a matter of time before its reports failed to be read by PHE.
  8. ”The error was the latest in a “pandemic of incompetence

    from the government. There are mistakes and there are really serious mistakes. This is a highly significant mistake that tells me the system is not fit for purpose.” Joe AndersonɺLiverpool’s mayor
  9. • In 2013, an Excel error at JPMorgan masked the

    loss of almost $6bn (£4.6bn), after a cell mistakenly divided by the sum of two interest rates, rather than the average. • “There is no way to trace where your data comes from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. • The biggest problem is that anyone can create Excel spreadsheets – badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well- documented way,”
  10. 1. Better Performance & Bigger Data Size 2. Lower Learning

    Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting 8 Reasons Why Excel Users Love Exploratory
  11. 8 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting
  12. 21 • Even if you can import the data (without

    being left out), as the data size grows the slower it takes to import. • Once it goes beyond a few hundred rows it can freeze or shutdown.
  13. 23 Initial Import From the 2nd times (Restarting the app,

    reopening the projects, etc.) 26 secs. 5 secs. Excel Exploratory Excel Exploratory 0.5 secs 500,000 Rows (11 Columns) Data 26 secs.
  14. 24 57 secs. 7 secs. Excel Exploratory Excel Exploratory 1ඵ

    0.6 secs. 57 secs. 1,000,000 Rows (11 Columns) Data Initial Import From the 2nd times (Restarting the app, reopening the projects, etc.)
  15. 25 Excel Exploratory 0.5 M 1 M 26ඵ 57ඵ 5ඵ

    7ඵ 26 secs. 57 secs. 5 secs. 7 secs. Initial Import Rows 0.5 M 1 M
  16. 26 Excel Exploratory 26ඵ 57ඵ 26 secs. 57 secs. 0.5

    secs. 0.6 secs. 0.5 M 1 M Rows 0.5 M 1 M From the 2nd times (Restarting the app, reopening the projects, etc.)
  17. 8 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting
  18. Easy to enter data, but… As you start adding more

    calculations or data wrangling tasks it becomes dramatically more complicated and more error prone.
  19. • Conditional Calculation with IF function • Joining Multiple Data

    Sets with VLOOKUP function • Clean and manipulate Text data • Visualizing Data with Charts For example…
  20. • Conditional Calculation with IF function • Joining Multiple Data

    Sets with VLOOKUP function • Clean and manipulate Text data • Visualizing Data with Charts For example…
  21. 34 =IF(J2 > 500, "High", IF(J2 > 100, "Medium", “Low"))

    IF inside another IF IF function starts getting ‘nested’, and that’s nasty…
  22. 37

  23. 40 Create conditions with multiple columns. e.g. When ‘Sales’ is

    NA or ‘Returned’ is TRUE, ‘Sales’ should be 0.
  24. • Conditional Calculation with IF function • Joining Multiple Data

    Sets with VLOOKUP function • Clean and manipulate Text data • Visualizing Data with Charts For example…
  25. 49 Order Data Price Data Want to bring in the

    pricing information from Price data based on the product ID.
  26. 51 Order Data Price Data If you want to use

    multiple columns as the keys to join 2 data sets, sorry but I don’t know what to tell you…
  27. You can select from various Join operation types and set

    multiple columns as the Join keys. 55
  28. • Conditional Calculation with IF function • Joining Multiple Data

    Sets with VLOOKUP function • Clean and manipulate Text data • Visualizing Data with Charts For example…
  29. People love entering names in various ways. We celebrate Diversity!

    59 • UPPERCAE, lowercase, Title Case • Half-width / Full-width (Japanese) • Extra Spaces, Tabs, New Lines • Unnecessary Text Letters
  30. 65

  31. • Conditional Calculation with IF function • Joining Multiple Data

    Sets with VLOOKUP function • Clean and manipulate Text data • Visualizing Data with Charts For example…
  32. 1. Need to summarize (aggregate) the data first 2. Switching

    the column assignments is complicated 3. Different charts ask you different ways to configure Top 3 Annoyances when creating charts
  33. 68 Order Data Summarize Create a Chart 1. Need to

    summarize (aggregate) the data first
  34. 69 2. Switching the column assignments is complicated Need to

    assign the area that contains the target data…
  35. 73 The chart takes care of the summarization (aggregation). You

    can configure how you want to summarize easily.
  36. 76 ‘Visualization Grammar’ based design provides a consistent way to

    configure the charts, and the configuration is maintained even when switching the chart types.
  37. 77 ‘Visualization Grammar’ based design provides a consistent way to

    configure the charts, and the configuration is maintained even when switching the chart types.
  38. 79 Just by assigning a column, you can group the

    data and aggregate the numbers.
  39. 80 By assigning a column to Repeat By you can

    create a chart for each category quickly.
  40. 8 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting
  41. Data Analyst Business People Can you create a list for

    north American customers who are top 100 based on Sales per day
  42. 1. Removed NA rows 2. Removed outlier values 3. Cleaned

    up the customer names 4. Summarized the Sales by Customer 5. Extracted the first and the last order date for each customer 6. Extracted the customers country information 7. Filtered only for Northern America customers 8. Calculated the customer life time period for each customer 9. Calculated the sales per day for each customer 10. Keep only the top 100 customers 85 Where did it go wrong? If you have recorded what you have done to get the result…
  43. Many of the data wrangling operations are done in different

    sheets, which makes it hard to understand what and how they are done.
  44. Need to click each cell to see what kind of

    calculation is done there.
  45. If the data is filtered those cells that need to

    be looked at might be hidden…
  46. 1. Removed NA rows 2. Removed outlier values 3. Cleaned

    up the customer names 4. Summarized the Sales by Customer 5. Extracted the first and the last order date for each customer 6. Extracted the customers country information 7. Filtered only for Northern America customers 8. Calculated the customer life time period for each customer 9. Calculated the sales per day for each customer 10. Keep only the top 100 customers 90 After all the works, finally you have found where the problem was made!
  47. 1. Removed NA rows 2. Removed outlier values 3. Cleaned

    up the customer names 4. Summarized the Sales by Customer 5. Extracted the first and the last order date for each customer 6. Extracted the customers country information 7. Filtered only for Northern America customers 8. Calculated the customer life time period for each customer 9. Calculated the sales per day for each customer 10. Keep only the top 100 customers 91 Sheet 1 Once you fixed the problem, you will need to review and fix the following steps manually, one by one. Sheet 2 Sheet 3
  48. Once you fixed the problem, you will need to review

    and fix the following steps manually, one by one.
  49. 94 All the data wrangling steps you do in Exploratory

    are recorded as Steps. 1. Removed NA rows 2. Removed outlier values 3. Cleaned up the customer names 4. Summarized the Sales by Customer 5. Extracted the first and the last order date 6. Extracted the customers country 7. Filtered for Northern America customers 8. Calculated the customer life time period 9. Calculated the sales per day 10. Keep only the top 100 customers
  50. 95 The data wrangling steps are managed and ordered in

    a way the data should be reproduced.
  51. 96 Each step has its own title that explains what

    kind of task is, and you can customize it along with comments.
  52. 97 You can click each step to open what the

    operation is with an intuitive UI.
  53. The summary view is generated automatically for each step so

    that you can see how the data is changed before and after the step, in an intuitive way. 98
  54. Found that this calculation is subtracting the last date from

    the first date where it is supposed to be the opposite. 100
  55. 8 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting
  56. Hey, sorry, the file I gave you had some errors.

    Here is the correct one! Data Analyst Business People
  57. If the same file has been updated, then all you

    need to do is to click the Import button. 103
  58. Data Analyst Business People Can you create the same list

    you created before, but with this month’s data?
  59. You will need to replicate what you have done before,

    and this for sure will cause unexpected errors. 109
  60. 113 If you need to switch the data file (e.g.

    Excel) then you can open the data source step.
  61. Once the new data is imported, you can click the

    last step, which will reproduce all the data wrangling operations for the new data.
  62. 117 How about when I have another data frame that

    I want to apply the same data wrangling steps?
  63. 118 I want to apply the same data wrangling steps

    that I have created for another data frame.
  64. 119 I want to apply the same data wrangling steps

    that I have created for another data frame. A data frame I have wrangled before. A data frame I need to wrangle now.
  65. 8 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting
  66. ʮ๺ถސ٬Ұ೔͋ͨΓͷച্τοϓ100ʯσʔλ 1. Removed NA rows 2. Removed outlier values 3.

    Cleaned up the customer names 4. Summarized the Sales by Customer 5. Extracted the first and the last order date 6. Extracted the customers country 7. Filtered for Northern America customers 8. Calculated the customer life time period 9. Calculated the sales per day 10. Keep only the top 100 customers
  67. You Boss The guy who has been taking care of

    the Excel left the company. Can you take care of it from now on? One day…
  68. 127 • Nobody knows how the data was cleaned and

    transformed because there is no history or steps. • Each calculations with special Excel functions is not each to understand for others. Blackbox
  69. 128 • Always the Excel works go to the same

    one person. It becomes a bottleneck, and it doesn’t scale. • Transferring such works is hard, and usually never happens. Bottleneck / Burnout
  70. &%' EDF - Exploratory Data Format You can export the

    data along with the steps to reproduce the data as an EDF, and you can share it with others so that they can import the data along with the steps to prepare the data.
  71. Select a step you want to export the data for

    and select ‘EDF’ under the Export menu.
  72. 134 You can click each step to open the related

    UI dialog and intuitively understand how the data was created, and update it if needed.
  73. 135 By exporting your data in a reproducible way as

    EDF and sharing it with other members, you can collaborate with others more effectively.
  74. 8 Reasons Why Excel Users Love Exploratory 1. Better Performance

    & Bigger Data Size 2. Lower Learning Cost 3. Better Debuggability 4. Data Reproducibility & Automation 5. No Dependency on Excel God 6. Intuitive Understanding with Visualization 7. Better and Quicker Analytical Capability 8. Reproducibility for Reporting Part 2