Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Exploratory Seminar #36 - How to Start Explorat...

Exploratory Seminar #36 - How to Start Exploratory for Excel Users Part 2

There are so many things you can do with Exploratory and they are much more capable and flexible than the things you would do with Excel.

But, there are a few things you want to know before you get started especially in the area of Data Wrangling so that you can smoothly transition to the Exploratory way.

We have compiled a list of such topics based on the feedback from our users. This is the second half of this series.

* Summarize (Aggregate) Data
* If_Else / Case_When
* Merge Rows
* Transform Wide Data to Long Data
* Text Data Wrangling

Kan Nishida

March 03, 2021
Tweet

More Decks by Kan Nishida

Other Decks in Technology

Transcript

  1. Kan Nishida CEO/co-founder Exploratory Summary In Spring 2016, launched Exploratory,

    Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle leading teams to build various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data. @KanAugust Speaker
  2. 4 Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics

    / Machine Learning) Data Analysis Data Science Workflow
  3. 5 Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling

    Visualization Analytics (Statistics / Machine Learning) Data Analysis ExploratoryɹModern & Simple UI
  4. 7 We have compiled a list of the operations in

    Exploratory that makes the transition from Excel to Exploratory as smooth as possible, based on all the questions we have received from the users in the past.
  5. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 8
  6. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 9 Part 1
  7. 10 We have done the first half of this series

    last week. We’d highly recommend you check that out first. February 24th, 2021
  8. 13

  9. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 14 Part 2
  10. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 15
  11. 17 Each row represents an order of a product from

    a customer at a given date at this store.
  12. Country ച্ US 100 US 120 US 200 Japan 300

    Japan 200 Japan 190 Country Counts Sales US 3 140 Japan 3 230 18 Summarize
  13. 25 Assign Columns to Value to summarize the values with

    summarize functions (e.g. Sum, Mean, etc.)
  14. 28 Sometimes, you want to do something on top of

    the summarized data such as creating new calculations, joining with other data frames, etc.
  15. 31 Select the columns for Group by and Value, the

    same way we do with the Summarize Table under Chart view.
  16. 34 Once it’s done, you’ll see the result and a

    new step being added to the Step section.
  17. 36 Once you add a new step you can check

    how the data is under the Summary view.
  18. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 37
  19. 39 We want to label the regions whose sales are

    greater than $1,000 as ‘High’, otherwise ‘Low’. Market Region Sales Marketing Volume Europe UK 1500 200 High Europe France 200 3 Low Europe Germany 1100 150 High Europe Italy 300 50 Low North America US 3000 500 High
  20. 43 Click the New Value button to open a dialog,

    and enter the word ‘High’. This is actually the calculation editor, so you can type column names or create calculations, too!
  21. 45 You can select either create a new column or

    overwrites the existing column.
  22. 46 A new column with the labels (High and Low)

    is created and a new step is added to the Wrangling Step area.
  23. You can create a single condition with the UI and

    copy the syntax so that you can paste it in the Custom tab to start quickly!
  24. 52 AND OR Sales > 200000 & Marketing > 30000

    Sales > 200000 | Marketing > 30000
  25. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 54
  26. 57 Go to the main data frame to which you

    want to add the other data frames.
  27. 58 You can add the data at any of the

    steps. This time, we’ll add it right after the data source step so that we can apply all the data wrangling steps we have created so far for the merged data together.
  28. 62 The ‘ID’ column holds the information about which of

    the original data frame each row is coming from.
  29. 63 You can see how many rows are coming from

    each of the original data frames.
  30. 64 When you click on the last step it will

    automatically apply all the data wrangling steps to the newly merged data.
  31. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 66
  32. 71 But, this way of having the data makes it

    difficult to visualize. Ideally, you want to select a single column that holds the date information.
  33. 72 This type of data is called ‘Wide’ data. New

    data grows to the horizontal direction.
  34. 73 On the other hand, if the new observations are

    added in a vertical direction we call it ‘Long’ data.
  35. 76 Select the multiple ‘date’ columns with the Shift key

    and select ‘Gather’ from the column header menu. You don’t need to select all the columns, you can adjust the column selection in the Gather dialog.
  36. 77 Inside the Gather dialog, make sure that the appropriate

    columns are selected for the Start and the End.
  37. Alternatively, you can select the columns by excluding the ones

    from the target for the Gather operation. This is useful when you expect that new columns will be added for the new dates in future because you don’t need to hard-code the column names for the target.
  38. 81 If you find that the data type is not

    appropriate you can fix it.
  39. 83 It will open the ‘Create Calculation’ dialog with the

    ‘ymd’ function. Click the Run button
  40. 85 Go to the Chart view, and move the Pin

    button to the last step by drag- and-drop.
  41. 86 Assign the ‘Date’ column to X-Axis and the ‘Change’

    column to Y-Axis to see the mobility trend.
  42. 88 And, assign ‘sub-region’ column to Color so that we

    can see the mobility trend by State.
  43. Agenda 1. Import CSV / Excel Data 2. Convert Data

    Type 3. Create Calculations 4. Filter 5. Replace Values 6. Join - Add Columns from Another Data Frame 7. Summarize (Aggregate) Data 8. Condition - IF_ELSE / CASE_WHEN 9. Merge - Add Rows from Other Data Frames 10. Transform Wide Data to Long Data 11. Text Data Wrangling - Extract, Replace, Remove, Convert 89
  44. 90 When you want to clean or transform text data

    you can use Text Wrangling UI, which can be accessed from the column header menu.
  45. The Preview area shows you the original values in an

    aggregated format so that you can see how many rows there are for each value.
  46. 98

  47. 104

  48. Multiple investor names are presented together in a single column.

    We want to separate them so that each investor will be in each row.
  49. 111 Softbank Group Softbank Remove ‘Group’ We want to make

    ‘Softbank Group’ to be ‘Softbank’ by removing ‘Group’ letters.
  50. 115 There are other letters we want to remove to

    make all the ‘Softbank’ related entries to be ‘Softbank’.
  51. 117 Select ‘Text (Multiple Candidates)’ as the operation, type ‘Group,

    Corp., Capit as the letters to be removed, and click the ‘Preview’ button.
  52. 120 *OWFTUPST 4PGUCBOL$BQJUBM 4FRVPJB$BQJUBM$IJOB *OWFTUPST 4PGUCBOL 4FRVPJB$BQJUBM$IJOB Let’s say we

    want to remove ‘Capital’ only when it is located at the end of the value.
  53. 121 By selecting ‘End’ position, you can remove the specified

    letters only when it is located at the end.
  54. 123

  55. 125 You can check the result by using the Table

    filter under the Table view.
  56. 130 By checking the ‘Show Invisible Characters’ we can see

    where the extra spaces are and how they are about to be removed.
  57. You want to make sure to select the right type

    that matches with your data.