Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Exploratory Seminar #36 - How to Start Exploratory for Excel Users Part 2

Exploratory Seminar #36 - How to Start Exploratory for Excel Users Part 2

There are so many things you can do with Exploratory and they are much more capable and flexible than the things you would do with Excel.

But, there are a few things you want to know before you get started especially in the area of Data Wrangling so that you can smoothly transition to the Exploratory way.

We have compiled a list of such topics based on the feedback from our users. This is the second half of this series.

* Summarize (Aggregate) Data
* If_Else / Case_When
* Merge Rows
* Transform Wide Data to Long Data
* Text Data Wrangling

19fc8f6113c5c3d86e6176362ff29479?s=128

Kan Nishida
PRO

March 03, 2021
Tweet

Transcript

  1. 🚀 EXPLORATORY Online Seminar #36 How to Start Exploratory for

    Excel Users - Part 2
  2. Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  3. Mission Democratize Data Science

  4. 4 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics

    / Machine Learning) Data Analysis Data Science Workflow
  5. 5 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling

    Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  6. 🚀 EXPLORATORY Online Seminar #36 How to Start Exploratory for

    Excel Users - Part 2
  7. 7 We have compiled a list of the operations in

    Exploratory that makes the transition from Excel to Exploratory as smooth as possible, based on all the questions we have received from the users in the past.
  8. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 8
  9. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 9 Part 1
  10. 10 We have done the first half of this series

    last week. We’d highly recommend you check that out first. February 24th, 2021
  11. You can find it from our Online Seminar page.

  12. 12 Click the ‘Past Seminar’ link.

  13. 13

  14. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 14 Part 2
  15. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 15
  16. 16 We’ll use the same Sale data we used in

    the Part 1 seminar.
  17. 17 Each row represents an order of a product from

    a customer at a given date at this store.
  18. Country ച্ US 100 US 120 US 200 Japan 300

    Japan 200 Japan 190 Country Counts Sales US 3 140 Japan 3 230 18 Summarize
  19. 19 In Excel, you use Pivot Table to summarize data.

  20. 20 ExploratoryͰ͸ɺνϟʔτɾϏϡʔʹ͋ΔूܭςʔϒϧΛ࢖ͬͯूܭɺ·ͨ͸ σʔλϥϯάϦϯάͷεςοϓͱͯ͠σʔλࣗମΛूܭ͢Δ͜ͱ͕Ͱ͖Δɻ Chart View: Summarize Table Wrangling Step: Summarize

  21. Summarize Chart View: Summarize Table Wrangling Step: Summarize

  22. Chart View: Summarize Table

  23. 23 Select ‘Summarize Table’.

  24. 24 Assign columns to Group By to group the data.

  25. 25 Assign Columns to Value to summarize the values with

    summarize functions (e.g. Sum, Mean, etc.)
  26. 26 You can format the table.

  27. 27 Set the font style / color and configure the

    Visual Formatting.
  28. 28 Sometimes, you want to do something on top of

    the summarized data such as creating new calculations, joining with other data frames, etc.
  29. You can Summarize data as the Step

  30. 30 Select ‘Summarize’ from the column header menu.

  31. 31 Select the columns for Group by and Value, the

    same way we do with the Summarize Table under Chart view.
  32. 32 You can change the order of the columns with

    drag-and-drop.
  33. 33 You can rename the column names inside the Summarize

    dialog.
  34. 34 Once it’s done, you’ll see the result and a

    new step being added to the Step section.
  35. 35 You can create new calculations or add other wrangling

    steps on top of the data.
  36. 36 Once you add a new step you can check

    how the data is under the Summary view.
  37. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 37
  38. 38 In Excel, you use ‘IF’ function.

  39. 39 We want to label the regions whose sales are

    greater than $1,000 as ‘High’, otherwise ‘Low’. Market Region Sales Marketing Volume Europe UK 1500 200 High Europe France 200 3 Low Europe Germany 1100 150 High Europe Italy 300 50 Low North America US 3000 500 High
  40. 40 Select ‘Replace Values’ -> ‘By Setting Conditions’ from the

    column header menu.
  41. 41 You can create the conditions with UI.

  42. 42 It’s the same Filter UI for creating the condition.

  43. 43 Click the New Value button to open a dialog,

    and enter the word ‘High’. This is actually the calculation editor, so you can type column names or create calculations, too!
  44. 44 You can enter a default value in case the

    condition doesn’t match.
  45. 45 You can select either create a new column or

    overwrites the existing column.
  46. 46 A new column with the labels (High and Low)

    is created and a new step is added to the Wrangling Step area.
  47. 47 Multiple Conditions

  48. 48 What if we want to create multiple conditions? Sales

    > 200,000 AND Marketing > 30,000
  49. You can use ‘Custom’ tab and write the multiple conditions.

  50. You can create a single condition with the UI and

    copy the syntax so that you can paste it in the Custom tab to start quickly!
  51. None
  52. 52 AND OR Sales > 200000 & Marketing > 30000

    Sales > 200000 | Marketing > 30000
  53. None
  54. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 54
  55. 55 Sometimes, you want to add new rows from other

    data frames.
  56. 56 There is a new data being imported for the

    latest month.
  57. 57 Go to the main data frame to which you

    want to add the other data frames.
  58. 58 You can add the data at any of the

    steps. This time, we’ll add it right after the data source step so that we can apply all the data wrangling steps we have created so far for the merged data together.
  59. 59 Select ‘Merge’ from the Step menu.

  60. 60 Select the data frames you want to merge.

  61. 61 Once it’s merged you’ll notice that the number of

    rows is increased.
  62. 62 The ‘ID’ column holds the information about which of

    the original data frame each row is coming from.
  63. 63 You can see how many rows are coming from

    each of the original data frames.
  64. 64 When you click on the last step it will

    automatically apply all the data wrangling steps to the newly merged data.
  65. 65 You can find more detail information at our How-to

    page.
  66. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 66
  67. 67 Apple Map Mobility Trend Data

  68. 68 You can see how the movement of people has

    changed since 1/13/2020.
  69. 69 There are a lot of columns…

  70. 70 Each day of the mobility data is added as

    a column.
  71. 71 But, this way of having the data makes it

    difficult to visualize. Ideally, you want to select a single column that holds the date information.
  72. 72 This type of data is called ‘Wide’ data. New

    data grows to the horizontal direction.
  73. 73 On the other hand, if the new observations are

    added in a vertical direction we call it ‘Long’ data.
  74. 74 The ‘Long’ data is easy to visualize.

  75. 75 We want to transform the ‘Wide’ data to the

    ‘Long’ data. Gather
  76. 76 Select the multiple ‘date’ columns with the Shift key

    and select ‘Gather’ from the column header menu. You don’t need to select all the columns, you can adjust the column selection in the Gather dialog.
  77. 77 Inside the Gather dialog, make sure that the appropriate

    columns are selected for the Start and the End.
  78. Alternatively, you can select the columns by excluding the ones

    from the target for the Gather operation. This is useful when you expect that new columns will be added for the new dates in future because you don’t need to hard-code the column names for the target.
  79. 79 You can set the column names for the newly

    created columns.
  80. 80 Once you run it, the data is transformed to

    the ‘Long’ format.
  81. 81 If you find that the data type is not

    appropriate you can fix it.
  82. 82 Select the order that matches with the data.

  83. 83 It will open the ‘Create Calculation’ dialog with the

    ‘ymd’ function. Click the Run button
  84. 84 The ‘Date’ columns is now registered as ‘Date’ data

    type.
  85. 85 Go to the Chart view, and move the Pin

    button to the last step by drag- and-drop.
  86. 86 Assign the ‘Date’ column to X-Axis and the ‘Change’

    column to Y-Axis to see the mobility trend.
  87. 87 Use the Chart Filter to keep only the United

    States.
  88. 88 And, assign ‘sub-region’ column to Color so that we

    can see the mobility trend by State.
  89. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 89
  90. 90 When you want to clean or transform text data

    you can use Text Wrangling UI, which can be accessed from the column header menu.
  91. Text Data Wrangling UI

  92. The Preview area shows you the original values in an

    aggregated format so that you can see how many rows there are for each value.
  93. You can search values that contains a certain text.

  94. You can visualize the spaces, tabs, and new lines.

  95. There are many operations you can do with this UI!

  96. Unicorn Data

  97. https://www.cbinsights.com/research-unicorn-companies

  98. 98

  99. 99 • Replace • Remove • Extract • Convert Text

    Data Wrangling Operation Types:
  100. 100 • Replace • Remove • Extract • Convert Text

    Data Wrangling Operation Types:
  101. 101 Fintech E-Commerce Edtech Artificial Intelligence FinTechnology E-Commerce EdTechnology Artificial

    Intelligence Before After Replace
  102. 102 We want to replace ‘CapitalG’ with ‘Google’ in the

    text data.
  103. 103 Select ‘Work with Text Data’ -> ‘Text’ -> ‘Replace’.

  104. 104

  105. 105 • Replace • Remove • Extract • Convert Text

    Data Wrangling Operation Types:
  106. Preparation

  107. Multiple investor names are presented together in a single column.

    We want to separate them so that each investor will be in each row.
  108. Select ‘Separate’ -> ‘Separate to Rows by’ -> ‘Comma’.

  109. Now, each investor is in each row.

  110. There are multiple ‘Softbank’ related entries. We want to clean

    this up.
  111. 111 Softbank Group Softbank Remove ‘Group’ We want to make

    ‘Softbank Group’ to be ‘Softbank’ by removing ‘Group’ letters.
  112. 112 Select ‘Work with Text Data’ -> ‘Remove’ -> ‘Text’.

  113. 113 Type ‘Softbank’ to search the data that contains the

    letters.
  114. 114 Type ‘Group’ as the letters to be removed, and

    click ‘Preview’ button.
  115. 115 There are other letters we want to remove to

    make all the ‘Softbank’ related entries to be ‘Softbank’.
  116. 116 *OWFTUPST 4PGUCBOL(SPVQ 4PGUCBOL$PSQ 4PGUCBOL$BQJUBM *OWFTUPST 4PGUCBOL 4PGUCBOL 4PGUCBOL We

    can remove unnecessary letters all at once!
  117. 117 Select ‘Text (Multiple Candidates)’ as the operation, type ‘Group,

    Corp., Capit as the letters to be removed, and click the ‘Preview’ button.
  118. 118 Clear the search text.

  119. 119 Sometimes, unintended removals of letters can happen.

  120. 120 *OWFTUPST 4PGUCBOL$BQJUBM 4FRVPJB$BQJUBM$IJOB *OWFTUPST 4PGUCBOL 4FRVPJB$BQJUBM$IJOB Let’s say we

    want to remove ‘Capital’ only when it is located at the end of the value.
  121. 121 By selecting ‘End’ position, you can remove the specified

    letters only when it is located at the end.
  122. 122 Position "OZXIFSF #FHJOOJOH &OE 4PGUCBOL$BQJUBM 4PGUCBOL$BQJUBM 4PGUCBOL$BQJUBM 4FRVPJB$BQJUBM$IJOB 4FRVPJB$BQJUBM$IJOB

    4FRVPJB$BQJUBM$IJOB $BQJUBM( $BQJUBM( $BQJUBM(
  123. 123

  124. None
  125. 125 You can check the result by using the Table

    filter under the Table view.
  126. 126 There are 31 investors that contain ‘Softbank’.

  127. 127 But we see only 27 for Softbank under Summary

    view.
  128. 128 This can happen when there are extra spaces at

    the end or the beginning.
  129. 129 You can remove such unnecessary spaces with the Text

    Data Wrangling UI.
  130. 130 By checking the ‘Show Invisible Characters’ we can see

    where the extra spaces are and how they are about to be removed.
  131. 131 Once you run it, now you have 31 Softbank

    entries!
  132. Extra Spaces

  133. 133 • Replace • Remove • Extract • Convert Text

    Data Wrangling Operation Types:
  134. Remove or Replace URL.

  135. You can extract (or remove or replace) the text inside

    specified characters.
  136. 136 • Replace • Remove • Extract • Convert Text

    Data Wrangling Operation Types:
  137. 137 You can convert Country names or codes.

  138. Select ‘Work with Text Data’ -> ‘Convert’ -> ‘Country’

  139. You want to make sure to select the right type

    that matches with your data.
  140. You can convert to make the cases to be consistent.

  141. None
  142. 142 There are many tutorials about Text Data Wrangling!

  143. ❤ Why Excel Users Love Exploratory? Part 1 EXPLORATORY Online

    Seminar #37
  144. None
  145. Information Email kan@exploratory.io Website https://exploratory.io Twitter @ExploratoryData Seminar https://exploratory.io/online-seminar

  146. Q & A 146

  147. EXPLORATORY 147