Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Seminar #35 - How to Start Exploratory for Excel Users Part 1

Kan Nishida
February 24, 2021

Seminar #35 - How to Start Exploratory for Excel Users Part 1

There are so many things you can do with Exploratory and they are much more capable and flexible than the things you would do with Excel.

But, there are a few things you want to know before you get started especially in the area of Data Wrangling so that you can smoothly transition to the Exploratory way.

We have compiled a list of such topics based on the feedback from our users. This is the 1st half of this series.

- Convert Data Type
- Create Calculations
- Filter
- Summarize / Aggregate
- Join Columns
- Replace Values

UI Tool: Exploratory(https://exploratory.io/)

Kan Nishida

February 24, 2021
Tweet

More Decks by Kan Nishida

Other Decks in Technology

Transcript

  1. Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  2. 4 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics

    / Machine Learning) Data Analysis Data Science Workflow
  3. 5 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling

    Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  4. 7 We have compiled a list of the things that

    our Exploratory users wished they would have known when they transitioned into Exploratory from Excel. By knowing these basic things, you can start Exploratory smoothly and take advantage of what Exploratory offers to improve your productively in visualizing and analyzing data.
  5. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 8
  6. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 9 Part 1
  7. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 10
  8. 11 We’ll use this Sales data. Each row represents an

    order of a product from a customer.
  9. 14 Once you select a file, you can see the

    data in the preview table.
  10. • Empty Rows at the Beginning • Error While Importing

    • Time Zone 15 A few things to check when importing data…
  11. • Empty Rows at the Beginning • Error While Importing

    • Time Zone 16 A few things to check when importing data…
  12. A few things to check when importing data… • Empty

    Rows at the Beginning • Error While Importing • Time Zone 21
  13. 22 We tend to see this problem for data with

    non-Latin languages on Windows PC. You might get an error like the one below when importing the data. This happens when the encoding of your data is not appropriately handled.
  14. A few things to check when importing data… • Empty

    Rows at the Beginning • Error While Importing • Time Zone 24
  15. 28 It assumes that the data is in the default

    time zone. The default timezone is determined based on the locale setting on your PC.
  16. 31 Click the ‘Save’ button to import the data as

    a Data Frame. You can use only alphabet, number, dash, and underscore.
  17. 32 Once it’s imported you will see a summary view

    is automatically generated for your data. It gives you a summary information for each column.
  18. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 33
  19. Convert Data • Convert from Character to Date / Time

    • Convert from Character to Numeric 34
  20. Convert Data • Convert from Character to Date / Time

    • Convert from Character to Numeric 35
  21. 36 In general, it automatically assign appropriate data types with

    heuristic guess based on the first 1,000 rows.
  22. 37 But, sometimes it doesn’t come as expected due to

    the ways the data was formatted in the original data file.
  23. 38 With this example data, the original data uses ‘/

    (slash)’ as the separator, which prevents it from recognizing these columns as Date columns.
  24. 2017-01-01 year month day 41 ZNE This function converts a

    given data from Character type to Date type as long as the data is in an order of Year, Month, and Day.
  25. ymd("2015-10-01") ymd("2015/10/01") ymd("Created on 2015 October 1st") 42 ymd What

    it matters is the order of Year, Month, and Day. What comes between them is not important at all!
  26. 43 Once it’s converted to Date, you can see the

    data distribution along the time and check the summary metrics.
  27. 44 But, the data is not always presented in Year,

    Month, Day… How about Month, Day, and Year?
  28. Convert Data • Convert from Character to Date / Time

    • Convert from Character to Numeric 52
  29. 54 When you have a mix of character and numeric

    letters in a single column then it is recognized as Character type. In this case, the ‘$’ sign makes it a Character type column.
  30. 56 The ‘parse_number’ function strips out all the non-numeric characters

    and symbols, and converts it to Numeric type. It opens a Calculation Editor dialog with ‘parse_number’ function being populated.
  31. 57 Once it’s converted to Numeric type, you can see

    how the Sales data is distributed and check the summary metrics for Sales.
  32. 59 You can select multiple columns with Command (or Control)

    key and select ‘Change Data Type’ / ‘Convert to Numeric’ from the column header menu.
  33. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 62
  34. 63 We want to calculate ‘Unit Price’ by dividing the

    Sales by the Quantity. Unit Price = Sales / Quantity
  35. 67 A newly calculated column is being added, and the

    definition of the calculation is registered as a new step.
  36. 69 Now, we want to know how much of the

    orders are greater than $500.
  37. 70 We want to flag TRUE for the rows with

    ‘Greater than $500 Sales’, otherwise FALSE.
  38. 71 Sales >= 500 Here is a simple calculation. It’s

    a logical calculation that returns either TRUE (when matches) or FALSE (when not matches).
  39. 75 We can quickly see that about 13% of the

    orders are greater than $300 for the Sales.
  40. 77

  41. 79 Logical Operator Operator Meaning Example Equal Category == “Funiture”

    != Not Equal Category != “Furniture” > Greater Than Sales > 10000 >= Greater Than or Equal Sales >= 10000 < Less Than Sales < 10000 <= Less Than or Equal Sales < 10000 %in% Contain Category %in% c(“Furniture”, “Technology”) %nin% Not Contain Category %nin% c(“Furniture”, “Technology”) ==
  42. 80 For the Equal, you need to use ‘==‘ as

    the operator, instead of ‘=‘.
  43. 82 You can use the same ‘==‘ operator, but make

    sure to surround the character value with either double quotes or single quotes
  44. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 83
  45. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 85
  46. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 86
  47. 87 We want to filter to keep only 2020 data

    based on the Order Date column.
  48. 89 Type 2020 as the value. This makes the filter

    condition to be ‘Order Date is 2020’.
  49. 90 Now the data is only for 2020. And the

    filter definition is registered as a new Step.
  50. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 91
  51. 103

  52. 104 Now, the chart is ‘pinned’ to the ‘Step 5

    - Filter’. This means it is taking only the 2020 data.
  53. 105 If you want to see the whole data without

    the 2020 filter you can move the ‘pinned’ position to the previous step by drag-and-drop.
  54. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 106
  55. 107 If you are looking for a particular set of

    data you might want to use Table View filter.
  56. 108 Just like the Chart filter, this won’t create a

    Step (right-hand-side). It lives only inside the Table view. It’s best to use when you are looking for rows with specific values.
  57. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 109
  58. 111 There are ‘Tech’ and ‘Technology’ in the Category column.

    We want to combine these two into a single value of ‘Technology’.
  59. 114 Once it’s done, we can confirm that ‘Tech’ and

    ‘Technology’ have been merged together.
  60. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 115
  61. 117 We have a customer attribute data, and want to

    bring this information into the Sales data.
  62. 121 UPPER CASE letters and lower case letter are considered

    different, so they don’t get joined! $VTUPNFS*% 0SEFS%BUF 4BMFT ""   ##   $$   $VTUPNFS*% $PVOUSZ .BSLFU "" +BQBO "TJB1BDJpD CC 'SBODF &VSPQF $$ 64 64$" Join Key $VTUPNFS*% 0SEFS%BUF 4BMFT $PVOUSZ .BSLFU ""   +BQBO "TJB1BDJpD ##   /" /" $$   64 64$"
  63. 122 But, you can make them joined by checking the

    ‘Case Insensitive Join’ parameter.
  64. 124 You can check if all the rows are joined

    as expected under the Summary view. When you see the same number of NA rows across the new columns that is usually when some rows are not joined. Looks all the rows from the main data frame have found the corresponding rows in the target data frame.
  65. • You can select multiple columns as Join Key Columns.

    • You can select from various types of Join methods. 125 A few things about Join…
  66. 126 /BNF $PVOUSZ 0SEFS%BUF 4BMFT 5PN 64   5PN

    6,   +PIO 64   /BNF $PVOUSZ "HF 5PN 64  5PN 6,  +PIO 64  Join Key /BNF $PVOUSZY $PVOUSZZ 0SEFS%BUF 4BMFT "HF 5PN 64 64    5PN 64 6,    5PN 6, 64    5PN 6, 6,    +PIO 64 64    If the values of the Join Key in the main data frame find multiple corresponding values in the target data frame it will try to join with all of them. This would create ‘duplicated’ rows that might not be as you expect. Tom in US is repeated twice, so as Tom in UK.
  67. /BNF $PVOUSZ "HF 5PN 64  5PN 6,  +PIO

    64  /BNF $PVOUSZ 0SEFS%BUF 4BMFT 5PN 64   5PN 6,   +PIO 64   127 /BNF $PVOUSZ 0SEFS%BUF 4BMFT "HF 5PN 64    5PN 6,    +PIO 64    You can select multiple columns as Join Key columns, then each row in the main data frame finds only one corresponding row in the target data frame, hence it won’t need to duplicate itself.
  68. • You can select multiple columns as Join Key Columns.

    • You can select from various types of Join methods. 129 A few things about Join…
  69. 131 There are 2 notes that talk about various methods.

    Part 1: https://exploratory.io/note/kanaugust/Introduction-to-Join-Adding-Columns-from-Another-Data-Frame-iUm5YNI7RK Part 2: https://exploratory.io/note/kanaugust/Introduction-to-Join-Part-2-Filter-Data-based-on-Another-Data-Frame-Ehb9AnJ6se
  70. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 132 Part 1
  71. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 134 Part 2