Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Seminar #35 - How to Start Exploratory for Excel Users Part 1

Seminar #35 - How to Start Exploratory for Excel Users Part 1

There are so many things you can do with Exploratory and they are much more capable and flexible than the things you would do with Excel.

But, there are a few things you want to know before you get started especially in the area of Data Wrangling so that you can smoothly transition to the Exploratory way.

We have compiled a list of such topics based on the feedback from our users. This is the 1st half of this series.

- Convert Data Type
- Create Calculations
- Filter
- Summarize / Aggregate
- Join Columns
- Replace Values

UI Tool: Exploratory(https://exploratory.io/)

19fc8f6113c5c3d86e6176362ff29479?s=128

Kan Nishida
PRO

February 24, 2021
Tweet

Transcript

  1. EXPLORATORY Online Seminar #35 How to Start Exploratory for Excel

    Users Part 1
  2. Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  3. Mission Democratize Data Science

  4. 4 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics

    / Machine Learning) Data Analysis Data Science Workflow
  5. 5 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling

    Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  6. EXPLORATORY Online Seminar #35 How to Start Exploratory for Excel

    Users Part 1
  7. 7 We have compiled a list of the things that

    our Exploratory users wished they would have known when they transitioned into Exploratory from Excel. By knowing these basic things, you can start Exploratory smoothly and take advantage of what Exploratory offers to improve your productively in visualizing and analyzing data.
  8. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 8
  9. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 9 Part 1
  10. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 10
  11. 11 We’ll use this Sales data. Each row represents an

    order of a product from a customer.
  12. 12 Select ‘File Data’ from the Data Frame menu.

  13. 13 Select ‘Text File’ for CSV data or ‘Excel File’

    for Excel file data.
  14. 14 Once you select a file, you can see the

    data in the preview table.
  15. • Empty Rows at the Beginning • Error While Importing

    • Time Zone 15 A few things to check when importing data…
  16. • Empty Rows at the Beginning • Error While Importing

    • Time Zone 16 A few things to check when importing data…
  17. 17 The column names look strange…

  18. 18 There are empty rows at the beginning in the

    original data.
  19. 19 Enter 2 for the ‘Skip first N rows’ parameter.

  20. 20 Click ‘Apply’ button to see if it has fixed

    the problem.
  21. A few things to check when importing data… • Empty

    Rows at the Beginning • Error While Importing • Time Zone 21
  22. 22 We tend to see this problem for data with

    non-Latin languages on Windows PC. You might get an error like the one below when importing the data. This happens when the encoding of your data is not appropriately handled.
  23. 23 Make sure to select an appropriate encoding for your

    data.
  24. A few things to check when importing data… • Empty

    Rows at the Beginning • Error While Importing • Time Zone 24
  25. 25 When your data includes Time…

  26. 26 Some data includes timezone information…

  27. 27 But some don’t include the timezone information…

  28. 28 It assumes that the data is in the default

    time zone. The default timezone is determined based on the locale setting on your PC.
  29. 29 If you know a correct timezone for your data

    you can change it.
  30. 30 Also, you can set your default timezone from the

    Configuration dialog.
  31. 31 Click the ‘Save’ button to import the data as

    a Data Frame. You can use only alphabet, number, dash, and underscore.
  32. 32 Once it’s imported you will see a summary view

    is automatically generated for your data. It gives you a summary information for each column.
  33. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 33
  34. Convert Data • Convert from Character to Date / Time

    • Convert from Character to Numeric 34
  35. Convert Data • Convert from Character to Date / Time

    • Convert from Character to Numeric 35
  36. 36 In general, it automatically assign appropriate data types with

    heuristic guess based on the first 1,000 rows.
  37. 37 But, sometimes it doesn’t come as expected due to

    the ways the data was formatted in the original data file.
  38. 38 With this example data, the original data uses ‘/

    (slash)’ as the separator, which prevents it from recognizing these columns as Date columns.
  39. 39 You can convert the data type from the column

    header menu.
  40. 40 It opens a Calculation Editor dialog with ‘ymd’ function

    being populated.
  41. 2017-01-01 year month day 41 ZNE This function converts a

    given data from Character type to Date type as long as the data is in an order of Year, Month, and Day.
  42. ymd("2015-10-01") ymd("2015/10/01") ymd("Created on 2015 October 1st") 42 ymd What

    it matters is the order of Year, Month, and Day. What comes between them is not important at all!
  43. 43 Once it’s converted to Date, you can see the

    data distribution along the time and check the summary metrics.
  44. 44 But, the data is not always presented in Year,

    Month, Day… How about Month, Day, and Year?
  45. 45 You can select the one that matches with the

    way your data is presented.
  46. 46 In this case, it will populate ‘mdy’ function. 01-01-2017

    year month day NEZ
  47. 47 How about when we have both Date and Time?

  48. 48 Select the one that matches with the way your

    data is presented!
  49. 49 In this case, it will populate ‘ymd_hms’ function.

  50. 2017-01-01 09:12:00 hour 50 ZNE@INT second minute

  51. Date POSIXct Date and Time information. Only Date information.

  52. Convert Data • Convert from Character to Date / Time

    • Convert from Character to Numeric 52
  53. 53 The ‘Sales’ column is registered as Character, but this

    is Numeric data!
  54. 54 When you have a mix of character and numeric

    letters in a single column then it is recognized as Character type. In this case, the ‘$’ sign makes it a Character type column.
  55. 55 You can select ‘Convert to Numeric’ from the column

    header menu.
  56. 56 The ‘parse_number’ function strips out all the non-numeric characters

    and symbols, and converts it to Numeric type. It opens a Calculation Editor dialog with ‘parse_number’ function being populated.
  57. 57 Once it’s converted to Numeric type, you can see

    how the Sales data is distributed and check the summary metrics for Sales.
  58. 58 There are still other columns that should be Numeric

    but are currently Character.
  59. 59 You can select multiple columns with Command (or Control)

    key and select ‘Change Data Type’ / ‘Convert to Numeric’ from the column header menu.
  60. 60 ‘Create Calculation for Multiple Columns’ opens, simply click ‘Run’

    button.
  61. 61 Both columns are now converted as Numeric.

  62. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 62
  63. 63 We want to calculate ‘Unit Price’ by dividing the

    Sales by the Quantity. Unit Price = Sales / Quantity
  64. 64 In Excel, you type the calculation in the cell.

  65. 65 In Exploratory, you can create a Calculation.

  66. 66 Type the calculation and enter the new column name,

    then click ‘Run’ button.
  67. 67 A newly calculated column is being added, and the

    definition of the calculation is registered as a new step.
  68. 68 You can quickly check the summary information for the

    new column.
  69. 69 Now, we want to know how much of the

    orders are greater than $500.
  70. 70 We want to flag TRUE for the rows with

    ‘Greater than $500 Sales’, otherwise FALSE.
  71. 71 Sales >= 500 Here is a simple calculation. It’s

    a logical calculation that returns either TRUE (when matches) or FALSE (when not matches).
  72. 72 Select ‘Create Calculation’ from the column header menu.

  73. 73 Check ‘Create New Column’ so that it will create

    a new column to hold the result.
  74. 74 A new column is created with TRUE or FALSE

    values.
  75. 75 We can quickly see that about 13% of the

    orders are greater than $300 for the Sales.
  76. 76 You can use ‘Highlight Mode’ to quickly see where

    the TRUE data is.
  77. 77

  78. 78 ࿦ཧԋࢉࢠʹʮҎ্ʯҎ֎Λ࢖͍͍ͨ࣌ʹ͸ʁ

  79. 79 Logical Operator Operator Meaning Example Equal Category == “Funiture”

    != Not Equal Category != “Furniture” > Greater Than Sales > 10000 >= Greater Than or Equal Sales >= 10000 < Less Than Sales < 10000 <= Less Than or Equal Sales < 10000 %in% Contain Category %in% c(“Furniture”, “Technology”) %nin% Not Contain Category %nin% c(“Furniture”, “Technology”) ==
  80. 80 For the Equal, you need to use ‘==‘ as

    the operator, instead of ‘=‘.
  81. 81 If we want to use Character data type column…

  82. 82 You can use the same ‘==‘ operator, but make

    sure to surround the character value with either double quotes or single quotes
  83. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 83
  84. 84 How ‘Filter’ works in Exploratory?

  85. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 85
  86. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 86
  87. 87 We want to filter to keep only 2020 data

    based on the Order Date column.
  88. 88 Select ‘Filter’ -> ‘Equal to’ -> ‘Year’ from the

    column header menu.
  89. 89 Type 2020 as the value. This makes the filter

    condition to be ‘Order Date is 2020’.
  90. 90 Now the data is only for 2020. And the

    filter definition is registered as a new Step.
  91. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 91
  92. 92 Create a chart with the Order Date column.

  93. 93 A chart is created to show the number of

    rows for each month.
  94. 94 You can change the aggregation level.

  95. 95 You can assign a column to ‘Color’ to separate

    the line to multiple groups.
  96. 96 This chart is visualizing the data from the Step

    5.
  97. 97 Now, we can filter the chart data with the

    Chart Filter.
  98. 98 This will filter the data only for this chart,

    without creating a new Step.
  99. 99 We can duplicate this chart and save as a

    new chart.
  100. 100 This time, we can update the existing filter to

    ‘Office Supplies’.
  101. 101 Data Import Create Calculations Filter: Year == 2020

  102. 102 Furniture Office Supplies Data Import Create Calculations Filter: Year

    == 2020
  103. 103

  104. 104 Now, the chart is ‘pinned’ to the ‘Step 5

    - Filter’. This means it is taking only the 2020 data.
  105. 105 If you want to see the whole data without

    the 2020 filter you can move the ‘pinned’ position to the previous step by drag-and-drop.
  106. There are 3 types of Filters in Exploratory. • Step

    Filter • Chart (or Analytics) Filter • Table View Filter 106
  107. 107 If you are looking for a particular set of

    data you might want to use Table View filter.
  108. 108 Just like the Chart filter, this won’t create a

    Step (right-hand-side). It lives only inside the Table view. It’s best to use when you are looking for rows with specific values.
  109. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 109
  110. 110 Sometimes, you want to replace the values with certain

    words.
  111. 111 There are ‘Tech’ and ‘Technology’ in the Category column.

    We want to combine these two into a single value of ‘Technology’.
  112. 112 Select ‘Replace Values’ -> ‘With New Values’ from the

    column header menu.
  113. 113 Replace ‘Tech’ with ‘Technology’.

  114. 114 Once it’s done, we can confirm that ‘Tech’ and

    ‘Technology’ have been merged together.
  115. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 115
  116. 116 In Excel, you use ‘VLOOKUP’ function.

  117. 117 We have a customer attribute data, and want to

    bring this information into the Sales data.
  118. 118 Select ‘Join’ from the column header menu of ‘Customer

    ID’ column.
  119. 119 Select the customer attribute data frame.

  120. 120 Select the customer ID column as the Key column.

  121. 121 UPPER CASE letters and lower case letter are considered

    different, so they don’t get joined! $VTUPNFS*% 0SEFS%BUF 4BMFT ""   ##   $$   $VTUPNFS*% $PVOUSZ .BSLFU "" +BQBO "TJB1BDJpD CC 'SBODF &VSPQF $$ 64 64$" Join Key $VTUPNFS*% 0SEFS%BUF 4BMFT $PVOUSZ .BSLFU ""   +BQBO "TJB1BDJpD ##   /" /" $$   64 64$"
  122. 122 But, you can make them joined by checking the

    ‘Case Insensitive Join’ parameter.
  123. 123 Once it’s joined, you’ll see the new columns being

    added from the target data frame.
  124. 124 You can check if all the rows are joined

    as expected under the Summary view. When you see the same number of NA rows across the new columns that is usually when some rows are not joined. Looks all the rows from the main data frame have found the corresponding rows in the target data frame.
  125. • You can select multiple columns as Join Key Columns.

    • You can select from various types of Join methods. 125 A few things about Join…
  126. 126 /BNF $PVOUSZ 0SEFS%BUF 4BMFT 5PN 64   5PN

    6,   +PIO 64   /BNF $PVOUSZ "HF 5PN 64  5PN 6,  +PIO 64  Join Key /BNF $PVOUSZY $PVOUSZZ 0SEFS%BUF 4BMFT "HF 5PN 64 64    5PN 64 6,    5PN 6, 64    5PN 6, 6,    +PIO 64 64    If the values of the Join Key in the main data frame find multiple corresponding values in the target data frame it will try to join with all of them. This would create ‘duplicated’ rows that might not be as you expect. Tom in US is repeated twice, so as Tom in UK.
  127. /BNF $PVOUSZ "HF 5PN 64  5PN 6,  +PIO

    64  /BNF $PVOUSZ 0SEFS%BUF 4BMFT 5PN 64   5PN 6,   +PIO 64   127 /BNF $PVOUSZ 0SEFS%BUF 4BMFT "HF 5PN 64    5PN 6,    +PIO 64    You can select multiple columns as Join Key columns, then each row in the main data frame finds only one corresponding row in the target data frame, hence it won’t need to duplicate itself.
  128. 128 Click the green plus button and assign multiple columns

    as Join Key columns.
  129. • You can select multiple columns as Join Key Columns.

    • You can select from various types of Join methods. 129 A few things about Join…
  130. 130 Various types of Join methods are supported.

  131. 131 There are 2 notes that talk about various methods.

    Part 1: https://exploratory.io/note/kanaugust/Introduction-to-Join-Adding-Columns-from-Another-Data-Frame-iUm5YNI7RK Part 2: https://exploratory.io/note/kanaugust/Introduction-to-Join-Part-2-Filter-Data-based-on-Another-Data-Frame-Ehb9AnJ6se
  132. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 132 Part 1
  133. EXPLORATORY Online Seminar #36 How to Start Exploratory for Excel

    Users Part 2
  134. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 134 Part 2
  135. None
  136. Information Email kan@exploratory.io Website https://exploratory.io Twitter @ExploratoryData Seminar https://exploratory.io/online-seminar

  137. Q & A 137

  138. EXPLORATORY 138