How to Use Exploratory

This is a walkthrough of how to use Exploratory, covering the following topics.

- Working with Chart, Time Series, Pin
- Data Wrangling Introduction
- How to Use the Step
- Branch Data Frame
- Creating Dashboard and Note

I have collected the features that are most useful and that can make your data analysis work more efficient and productive.

Kan Nishida

April 08, 2020

Transcript

  1. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory. Summary: At the beginning of 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  2. The Third Wave: Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.
  3. First Wave Second Wave Third Wave Proprietary Open Source UI

    & Programming Programming 2016 2000 1976 Monetization Commoditization Democratization Statisticians Data Scientists Democratization of Data Science Algorithms Experience Tools Open Source UI & Automation Business Users Theme Users
  4. Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling Visualization

    Analytics (Statistics / Machine Learning) Data Analysis Exploratory: Modern & Simple UI
  5. 1. Create a Project 2. Import Data 3. Quick Insights with Summary View 4. Convert Data Type (Character, Numeric, Date, etc.) 5. Create Charts (Limit Values, Grouping, Trend Line, etc.) 6. Create Calculations 7. Filter Data 8. Edit, Move, Disable, & Delete Steps 9. Introduction to Chart Pinning 10. Introduction to Branch 11. Summarize (Aggregate) Data 12. Data Reproducibility 13. Create Dashboard and Publish
  6. Import Data: 1. Open the Airbnb New York Data Page 2. Download the Data 3. Import the Data
  7. A different type of chart and summary statistics are shown for each column, depending on its data type.
  8. Data Type: Exploratory supports many data types, but these 6 types are the most common and are enough for most cases: numeric, character, Date, POSIXct, logical, and Factor.
  9. Numeric: A histogram chart is used to show the distribution of data. Numeric values are grouped into a set of bars that cover equal ranges, and each bar shows the number of rows within its range.
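The equal-range grouping behind a histogram can be sketched in Python as follows. This is only an illustration: the `histogram_counts` helper, the number of bins, and the sample prices are hypothetical, and the slides do not show how Exploratory itself chooses its ranges.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values are identical, so everything lands in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical nightly prices, not taken from the Airbnb data.
prices = [40, 55, 60, 75, 80, 95, 120, 150, 200, 240]
counts = histogram_counts(prices, bins=4)
print(counts)
```

Each entry in `counts` corresponds to one bar: the number of rows whose value falls inside that bar's range.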
  10. Numeric - Summary Statistics: Underneath the histogram chart, you can see a series of summary statistics such as Average and Median.
  11. NA (Not Available / Missing Values): You can also see how many NA rows each column has and what percentage of the rows they represent. In this case, the ‘review_scores_rating’ column has 11,162 missing values, which is 22.06% of the total rows.
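The NA count and percentage for a column can be computed along these lines. This is a minimal sketch with made-up values: `None` stands in for NA, and the `na_summary` helper is hypothetical rather than anything Exploratory exposes.

```python
def na_summary(column):
    """Return (number of missing values, percentage of rows that are missing)."""
    missing = sum(1 for value in column if value is None)
    percentage = round(100 * missing / len(column), 2)
    return missing, percentage

# Hypothetical review scores; None marks a missing value.
review_scores = [95, None, 88, 100, None, 91, 97, None]
missing, percentage = na_summary(review_scores)
print(missing, percentage)
```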
  12. Character: A horizontal bar chart shows the most frequent categorical values along with the number of rows for each.
  13. Logical (TRUE / FALSE): A horizontal bar chart shows how many rows there are for the TRUE and FALSE values.
  14. Date / POSIXct (Date & Time): A histogram chart is used to show the distribution of data for the Date and POSIXct columns. You can also find the date range by looking at the Min and Max of the data.
  15. Factor: The Factor data type is similar to the Character data type, except that it can carry ‘order’ information. The horizontal bar chart shows the number of rows for each value. The bars are sorted in the order defined inside the column.
  16. This opens the ‘Mutate (Create Calculation)’ dialog with the ‘parse_number’ function pre-populated in the Calculation Editor. Simply click the ‘Run’ button.
  17. The ‘price’ column is converted to ‘numeric’ data type and

    it now shows a histogram chart with the summary statistics.
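To give a feel for what a number-parsing step does, here is a rough Python analogue. The function shares its name with `parse_number` from the R readr package, but this regex-based sketch is not that implementation, and the assumed input format (price strings such as "$1,200.00") is a guess, since the slides do not show the raw ‘price’ values.

```python
import re

def parse_number(text):
    """Extract the first number from a string, ignoring symbols and commas."""
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # nothing numeric found; treated here as missing
    return float(match.group().replace(",", ""))

# Hypothetical price strings.
print(parse_number("$1,200.00"))
print(parse_number("$85"))
print(parse_number("n/a"))
```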
  18. You can change the data type for multiple columns at once. Select multiple columns while holding the Command key (Mac) or the Control key (Windows).
  19. The selected columns are listed and ‘Convert Data Type’ is selected as the Calculation Type in the dialog. Simply click the ‘Run’ button.
  20. Notice that these operations are recorded on the right-hand side as the Data Wrangling Steps. We’ll cover this in more detail later.
  21. Create the 1st Chart: 1. Create a Bar Chart 2. Sort the Bars 3. Limit X-Axis Values 4. Group with Color
  22. Let’s create a chart to answer this question. “Which neighborhoods have more places on Airbnb?”
  23. Select ‘Top’ for the limit type and set it to 50 so that only the 50 neighborhoods with the highest ‘Number of Rows’ are shown.
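The ‘Top’ limit amounts to counting rows per category and keeping the N largest. A minimal Python sketch, using invented listing rows and a hypothetical `top_n_by_rows` helper:

```python
from collections import Counter

def top_n_by_rows(values, n):
    """Return the n most frequent values with their row counts, largest first."""
    return Counter(values).most_common(n)

# Made-up rows: one neighborhood name per listing.
listings = ["Harlem", "Williamsburg", "Harlem", "Astoria",
            "Williamsburg", "Williamsburg"]
top2 = top_n_by_rows(listings, 2)
print(top2)
```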
  24. Visualize Time Series Data: 1. Create a Line Chart 2. Change Date Aggregation Level 3. Show a Trend Line 4. Show a Cumulative Sum with Window Calculation 5. Create Groups with Color
  25. Let’s create a chart to answer this question. “How many places have been added to Airbnb over the years?”
  29. The numbers go up and down. To make the overall trend easier to see, let’s show the ‘trend line’.
  31. What if we want to know how many places have been added to Airbnb cumulatively, instead of how many places were added in each given month?
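The difference between per-month counts and a cumulative sum is easy to see in a small Python sketch. The monthly numbers below are invented for illustration.

```python
from itertools import accumulate

# Hypothetical number of places added in each of four consecutive months.
monthly_new = [3, 5, 2, 8]

# Running total: each entry is the sum of all months up to and including it.
cumulative = list(accumulate(monthly_new))
print(cumulative)
```

The last entry of `cumulative` equals the total number of places added over the whole period.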
  33. You can assign the ‘neighborhood_group’ column to Color to see how the places have been added to Airbnb cumulatively in each of the 5 boroughs.
  34. There are too many decimal digits. Let’s round the values and keep only two decimal places.
  35. You can go to the Summary view to check the distribution of the ‘availability_rate’ values.
  36. Let’s say you want to filter the data to keep only the places in Manhattan and Brooklyn.
  37. Operations of the same type are added to the same step. Here, the 2 calculations have been added to the same ‘Mutate’ step.
  38. You can click the token to open the dialog to update the configuration.
  39. Let’s say you want to move step 5 (Filter) to right before step 4 (Mutate).
  40. Each token is now in its own step. This is useful when you want to see how the data changes with each operation.
  41. You can also combine multiple steps into one step as long as they are the same type of operation (e.g. Create Calculation).
  42. We want to assign the ‘availability_rate’ column, which we created in the previous step, to Y-Axis. But the column is not there in the dropdown!
  43. To use the data at step 5, you need to move the ‘Pin’ to step 5.
  44. Now that this chart is ‘Pinned’ to step 5, it will always reference the data at step 5 regardless of which step you select on the right-hand side.
  45. The data at step 5 has only Manhattan and Brooklyn, hence the chart shows only the two boroughs with two colors (Blue & Orange).
  46. What if we want to see all the neighborhoods, not just those in Manhattan and Brooklyn?
  47. Now you can compare the average availability rates of the top 50 neighborhoods from all 5 boroughs.
  48. You can click the ‘Enable Step’ icon to enable the step. But for this tutorial, we’ll continue with this step disabled.
  49. It looks like there are a lot of places in Staten Island (Purple) with high availability rates. For example, ‘Fort Wadsworth’ is 100% available. That’s crazy!
  50. There is only one place in this neighborhood! And this

    place happens to have 1 (100%) for the availability rate.
  51. Instead of the ‘Top 50’ neighborhoods, we want to show only the neighborhoods with at least 100 places listed.
  52. Select ‘Condition’ for Type, select ‘Number of Rows’ for Based

    on, ‘greater than’ for Operator, and type 100 for Value.
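A ‘Condition’ limit based on ‘Number of Rows’ can be thought of as counting rows per group and dropping groups below the threshold. The Python sketch below uses a hypothetical `keep_groups_over` helper and a tiny made-up data set with a threshold of 2 instead of 100. Note that ‘greater than’ is a strict comparison, so a group with exactly the threshold number of rows is dropped.

```python
from collections import Counter

def keep_groups_over(rows, key, threshold):
    """Keep only rows whose group (by `key`) has more than `threshold` rows."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] > threshold]

# Made-up listings: 3 in Harlem, 2 in Astoria, 1 in Fort Wadsworth.
rows = (
    [{"neighborhood": "Harlem"}] * 3
    + [{"neighborhood": "Astoria"}] * 2
    + [{"neighborhood": "Fort Wadsworth"}]
)
kept = keep_groups_over(rows, "neighborhood", 2)
print(sorted({row["neighborhood"] for row in kept}))
```

With this kind of limit, a neighborhood with a single listing can no longer dominate the chart just because its one place has an extreme value.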
  53. The neighborhoods in Queens (Red) tend to be more available

    while the neighborhoods with lower availability rates are in Brooklyn or Manhattan.
  54. Sometimes, you might want to create different versions of data from the same data. For example, you might want to create a data frame that aggregates the data by city or property_type while keeping the original data unaggregated. Creating different data frames that are separated from one another will create a maintenance nightmare. Instead, you can use the Branch feature to ‘branch off’ from the original data frame and create multiple data frames that share the original data frame.
  55. Create Branch. Main Data Frame: Data Import → Convert Data Type → Create Calculations → Filter. Branch Data Frame: Aggregate by City.
  56. You can create multiple branches from any step. Main Data Frame: Import Excel Data → Convert Data Type → Create Calculations → Filter. Branch Data Frame 1: Aggregate by City → Top 10 Cities. Branch Data Frame 2: Aggregate by Host → Clustering with K-Means.
  57. Changes in the Main Data Frame automatically propagate only to the related branches. Main Data Frame: Import Excel Data → Convert Data Type → Create Calculations → Filter. Branch Data Frame 1: Aggregate by City → Top 10 Cities. Branch Data Frame 2: Aggregate by Host → Clustering with K-Means.
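One way to picture branching is as a list of steps where a branch reuses the main steps up to its branch point and then applies its own. The Python sketch below is a conceptual model only: `run_steps`, `evaluate_branch`, the step functions, and the sample rows are all hypothetical, and nothing here reflects how Exploratory stores or executes steps internally.

```python
def run_steps(data, steps):
    """Apply each step function to the data in order."""
    for step in steps:
        data = step(data)
    return data

def evaluate_branch(source, main_steps, branch_point, branch_steps):
    """Run the first `branch_point` main steps, then the branch's own steps."""
    shared = run_steps(source, main_steps[:branch_point])
    return run_steps(shared, branch_steps)

def convert_price(rows):
    return [{**row, "price": float(row["price"])} for row in rows]

def drop_cheap(rows):
    return [row for row in rows if row["price"] >= 50]

def count_by_city(rows):
    counts = {}
    for row in rows:
        counts[row["city"]] = counts.get(row["city"], 0) + 1
    return counts

# Made-up source data with prices still stored as text.
source = [
    {"city": "Tokyo", "price": "80"},
    {"city": "Tokyo", "price": "40"},
    {"city": "Osaka", "price": "120"},
]
main_steps = [convert_price, drop_cheap]
before = evaluate_branch(source, main_steps, 2, [count_by_city])

# Editing a main step changes the branch result the next time it is evaluated.
main_steps[1] = lambda rows: [row for row in rows if row["price"] >= 100]
after = evaluate_branch(source, main_steps, 2, [count_by_city])
print(before, after)
```

Because the branch never copies the main steps, there is only one place to edit the shared logic, which is the maintenance benefit the slides describe.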
  58. You can see that the branch data frame is branched off from step 5 of the main data frame.
  59. What if you want this branch data frame to branch off from step 3 instead of step 5?
  60. Summarize: Let’s say we want to summarize the data by property_type so that each row represents one property_type.
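Summarizing by a column means collapsing all rows that share a value into a single row of aggregates. A minimal Python sketch, assuming made-up listings and a hypothetical `summarize` helper that reports a row count and an average per group:

```python
from collections import defaultdict
from statistics import mean

def summarize(rows, group_key, value_key):
    """Return one summary row per distinct value of `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return [
        {group_key: group, "rows": len(values), "average": mean(values)}
        for group, values in groups.items()
    ]

# Invented listings; the prices are not from the Airbnb data.
rows = [
    {"property_type": "Apartment", "price": 100},
    {"property_type": "House", "price": 250},
    {"property_type": "Apartment", "price": 140},
]
summary = summarize(rows, "property_type", "price")
print(summary)
```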
  61. 2 scenarios: • The data is updated and saved as the same file. • The data is updated but saved as a separate file.
  63. This will re-read the data from the same file and run all the data wrangling steps automatically.
  65. Click the ‘Change File’ button, select the newly updated data file, and click the ‘Apply’ button.
  66. The data will be imported and all the data wrangling steps will be applied automatically.
  67. Dashboard: • Create a new Dashboard • Add Charts and Analytics • Add Numbers • Publish & Share
  72. Go back to the data frame and add a new chart to create the Numbers.
  74. Private Mode: Only the people you explicitly share with can open your dashboard. Public Mode: Anybody can open your dashboard.
  75. Once it’s published, a unique URL is assigned to the Dashboard. Click the ‘Open in Browser’ link to open the Dashboard in the web browser.
  76. Share with Invite: Type the email address and click the ‘Share’ button. This will send an invitation email.
  77. Share with Invite: The invited person can log in with their Exploratory account and open the Dashboard. If the person doesn’t have an Exploratory account, they can create one for free. Viewers can continue to view any content on Exploratory Cloud as long as they are invited to view it.
  78. Share with URL: You can also share your Dashboard with a URL. This allows anyone with the URL to open the Dashboard without logging into Exploratory Cloud.
  79. Schedule Dashboard: You can schedule the dashboard so that its data stays up-to-date. Each scheduled run queries the data sources and applies all the data wrangling steps automatically. Note that you can schedule only dashboards whose remote data sources can be accessed by Exploratory Cloud.