Exploratory v6.3 - Introducing New Features

In this seminar, I'm going to introduce some of the new features of Exploratory v6.3.

- Performance improvements
- Summary View Enhancements
- Analytics - Prediction
- Chart - Summary Table
- Chart - Repeat By with Multiple Y-Axis
- Data Wrangling - Text Wrangling UI Enhancements

Kan Nishida

January 13, 2021

Transcript

  1. EXPLORATORY Online Seminar #29 Exploratory v6.3

  2. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory. Summary: In Spring 2016, Kan launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams that built various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, and Big Data. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  3. The Third Wave: Data Science is not just for Engineers and Statisticians. Exploratory makes it possible for Everyone to do Data Science.
  4. Data Science Workflow: Questions • Communication • Data Access • Data Wrangling • Visualization • Analytics (Statistics / Machine Learning) • Data Analysis
  5. Exploratory - Modern & Simple UI: Questions • Communication (Dashboard, Note, Slides) • Data Access • Data Wrangling • Visualization • Analytics (Statistics / Machine Learning) • Data Analysis
  6. EXPLORATORY Online Seminar #29 Exploratory v6.3

  7. v6.3 New Features: • Performance • Summary View • Analytics • Chart • Data Wrangling
  8. Performance

  9. Performance Improvements in UI Rendering: • Switching between Summary / Table / Chart / Analytics views • Switching between Data Frames • Moving between the Data Wrangling Steps • Rendering of the Chart graphics • Opening and Closing Projects
  10. Summary View - Correlation Mode: Create the following directly from the Summary View’s Correlation Mode: • Charts • Prediction Models • Statistical Tests
  11. Summary View - Correlation Mode

  12. You can now create Analytics, Stats Tests, and Charts directly from the Summary view.
  13. For example, you can create a scatter chart by selecting the menu.
  14. You can explore the correlation between the two variables in detail.
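
As a rough illustration of what exploring the correlation between two variables involves, the sketch below computes a Pearson correlation coefficient in plain Python. The column names and values are made up, and Pearson's r is an assumption here: the slides do not say which correlation measure Exploratory uses.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical columns: years of experience vs. monthly income.
years = [1, 3, 4, 7, 10]
income = [2500, 4100, 3900, 8000, 9800]
r = pearson(years, income)
print(f"correlation = {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, which is the kind of pair a scatter chart makes easy to inspect.
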
  15. You can also select multiple columns to build prediction models.

  16. The selected columns will become the predictor variables.

  17. v6.3 New Features: • Performance • Summary View • Analytics • Chart • Data Wrangling
  18. Analytics: • Prediction with Another Data • Setup for Base Level • Hypothesis Test - Probability Distribution • Test Mode & Summary Metrics for Cox Regression & Survival Forest
  19. Prediction! Now, you can use the models you have built under the Analytics view to predict with another data frame!
  20. 1. Build a Prediction Model under the Analytics view.

  21. 2. Open a data frame you want to predict.

  22. 3. Select ‘Predict with Model (Analytics)’ from the Step menu.

  23. 4. Select the model you have created under the Analytics view.
  24. A new column with predicted values will be added.
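
The four steps above (build a model, open another data frame, apply the model, get a new column) can be sketched conceptually in plain Python. This is not how Exploratory implements the step: `fit_line` and `predict_with_model` are hypothetical helpers, a one-variable least-squares line stands in for whatever model was built in the Analytics view, and the data is invented.

```python
def fit_line(rows, x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(r[x] for r in rows) / n
    mean_y = sum(r[y] for r in rows) / n
    sxy = sum((r[x] - mean_x) * (r[y] - mean_y) for r in rows)
    sxx = sum((r[x] - mean_x) ** 2 for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_with_model(rows, model, x, new_col="predicted_value"):
    """Return copies of the rows with a new column holding the predictions."""
    slope, intercept = model
    return [{**r, new_col: slope * r[x] + intercept} for r in rows]

# Step 1: build the model on one data frame (hypothetical training data).
train = [
    {"years": 1, "income": 3000},
    {"years": 2, "income": 5000},
    {"years": 3, "income": 7000},
]
model = fit_line(train, "years", "income")

# Steps 2-4: open another data frame and apply the model to it.
new_data = [{"years": 4}, {"years": 10}]
scored = predict_with_model(new_data, model, "years")
print(scored)
```

The original rows are left untouched; the prediction appears only as an extra column, mirroring the behaviour described on the last slide of this sequence.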

  25. Analytics: • Prediction with Another Data • Setup for Base Level • Hypothesis Test - Probability Distribution • Test Mode & Summary Metrics for Cox Regression & Survival Forest
  26. With the Statistical Learning models, a coefficient of a given categorical variable can be interpreted in comparison to the base level category. “The monthly income of the Research Director would be about $4,096 higher compared to the base level, Sales Executive.”
  27. The most frequent value becomes the base level by default, but you can now change this quickly inside the Analytics view.
  28. Click the Edit icon for the variable whose base level you want to change.
  29. Select a value for the base level.

  30. Now, the coefficients for the Job Role variable are interpreted in comparison to ‘Manager.’ “The monthly income of the Healthcare Rep. would be about $4,119 lower compared to Manager (base level).”
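
To see why changing the base level changes the coefficients, consider the simplest case: a linear regression with one categorical predictor. There, each level's coefficient equals that level's mean minus the base level's mean. The sketch below relies on that fact; `level_coefficients` is a hypothetical helper, the incomes are invented (they are not the figures quoted on the slides), and real models with several predictors will not reduce to plain mean differences.

```python
from collections import Counter, defaultdict

def level_coefficients(rows, cat, target, base=None):
    """Coefficients of a one-predictor dummy-coded regression.

    With a single categorical predictor, each coefficient is the level's
    mean minus the base level's mean. If no base is given, the most
    frequent value is used, matching the default described above.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[r[cat]].append(r[target])
    if base is None:
        base = Counter(r[cat] for r in rows).most_common(1)[0][0]
    means = {level: sum(v) / len(v) for level, v in groups.items()}
    coefs = {level: m - means[base] for level, m in means.items() if level != base}
    return base, coefs

# Hypothetical data.
rows = [
    {"job_role": "Sales Executive", "monthly_income": 5000},
    {"job_role": "Sales Executive", "monthly_income": 6000},
    {"job_role": "Sales Executive", "monthly_income": 7000},
    {"job_role": "Manager", "monthly_income": 15000},
    {"job_role": "Manager", "monthly_income": 17000},
    {"job_role": "Research Director", "monthly_income": 10000},
]

default_base, default_coefs = level_coefficients(rows, "job_role", "monthly_income")
manager_base, manager_coefs = level_coefficients(
    rows, "job_role", "monthly_income", base="Manager"
)
print(default_base, default_coefs)
print(manager_base, manager_coefs)
```

The underlying group means never change; only the reference point that each coefficient is measured against does.
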
  31. Analytics: • Prediction with Another Data • Setup for Base Level • Hypothesis Test - Probability Distribution • Test Mode & Summary Metrics for Cox Regression & Survival Forest
  32. You can use the statistical tests to see if there is a significant difference in your data. For example, the ‘t Test’ evaluates whether a given difference between the means (averages) of two groups is significant or due to chance.
  33. What is P Value? The probability of getting the observed t value (or more extreme values) under the ‘null hypothesis’, which assumes no difference in Monthly Income between Male and Female.
  34. t Value: You can now visualize where the given t value falls on the underlying ‘t distribution’ (probability distribution) curve.
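
The relationship between the t value, the t distribution curve, and the P value can be sketched in plain Python. This assumes Welch's unequal-variance form of the t test (the slides do not state which variant Exploratory runs) and approximates the area under the curve numerically. The two income samples are invented.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution (the curve shown in the chart)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10000):
    """P value = area under the curve outside [-|t|, |t|].

    The central area is integrated with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3
    return max(0.0, 1.0 - central)

# Hypothetical monthly incomes for two groups.
male = [5200, 6100, 4800, 7000, 5600]
female = [5900, 6400, 5100, 6800, 6000]
t, df = welch_t(male, female)
p = two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.3f}")
```

The further the t value sits out in the tails of the curve, the smaller the remaining tail area, and therefore the smaller the P value.
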
  35. Significance Level for P Value

  36. You can run the test for multiple groups at once by using Repeat By.
  37. Can you spot whether any of the Job Roles has a t value that is inside the ‘significant’ area?
  38. Chart: • Repeat By with Multiple Y-Axis • ‘Repeat By’ Layout Setting • Quick Window Calculation • New Component: Summarize Table • Trend Line: Logistic Regression, Poisson Regression
  39. Chart: • Repeat By with Multiple Y-Axis • ‘Repeat By’ Layout Setting • Quick Window Calculation • New Component: Summarize Table • Trend Line: Logistic Regression, Poisson Regression
  40. It’s hard to compare multiple measures when they are on different scales.
  41. You can assign a measure to the Y2 Axis, but that still leaves a problem when you have many measures on different scales.
  42. Now, you can separate the measures into different charts by selecting ‘Each Y Axis’ from the ‘Repeat By’ menu.
  43. Each Y-Axis measure gets its own separate chart.
  44. Chart: • Repeat By with Multiple Y-Axis • ‘Repeat By’ Layout Setting • Quick Window Calculation • New Component: Summarize Table • Trend Line: Logistic Regression, Poisson Regression
  45. You can change the layout directly inside the chart area. You no longer need to open a separate configuration dialog!
  46. You can assign a different marker to each Y-Axis chart (e.g. Bar, Line).
  47. You can apply a different Window Calculation to each Y-Axis chart. When you are looking at a ratio, it’s convenient to see the actual values alongside it.
  48. Chart: • Repeat By with Multiple Y-Axis • ‘Repeat By’ Layout Setting • Quick Window Calculation • New Component: Summarize Table • Trend Line: Logistic Regression, Poisson Regression
  49. You can apply the window calculations quickly from the menu.
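
For readers unfamiliar with window calculations, the sketch below shows a few common ones in plain Python. The function names and the selection of calculations are illustrative assumptions, not Exploratory's menu items, and the sales figures are invented.

```python
from itertools import accumulate

def cumulative_sum(values):
    """Running total up to and including each row."""
    return list(accumulate(values))

def percent_of_total(values):
    """Each value as a percentage of the column total."""
    total = sum(values)
    return [100 * v / total for v in values]

def difference_from_previous(values):
    """Change from the previous row; the first row has no predecessor."""
    return [None] + [b - a for a, b in zip(values, values[1:])]

def moving_average(values, window):
    """Mean of the current row and the preceding window - 1 rows."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough rows yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

# Hypothetical monthly sales.
sales = [100, 150, 250, 500]
print(cumulative_sum(sales))            # [100, 250, 500, 1000]
print(percent_of_total(sales))          # [10.0, 15.0, 25.0, 50.0]
print(difference_from_previous(sales))  # [None, 50, 100, 250]
print(moving_average(sales, 2))         # [None, 125.0, 200.0, 375.0]
```

Showing a ratio such as percent of total next to the raw values is the situation where a different calculation per Y-Axis chart is most useful.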

  50. Chart: • Repeat By with Multiple Y-Axis • ‘Repeat By’ Layout Setting • Quick Window Calculation • New Component: Summarize Table • Trend Line: Logistic Regression, Poisson Regression
  51. Pivot Table

  52. Limitations with Pivot Table: • You can’t apply a different number format (currency, percent, etc.) for each column. • You can’t apply a different Color Setting for each column. • The format option is basic. • Only Grand Total is supported; Sub Total is not.
  53. These limitations are mainly due to the fact that the Pivot Table is designed to support assigning a column to ‘Column’ so that each value of the assigned column becomes its own column of the output.
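
The point about ‘Column’ can be made concrete with a small sketch: in a pivot, the set of output columns is not known in advance, because it is derived from the values found in the data. That is why per-column settings are hard to attach. The `pivot` helper and the sample rows below are hypothetical.

```python
from collections import defaultdict

def pivot(rows, row_key, col_key, value):
    """Sum `value` so that each distinct `col_key` value becomes a column."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value]
    # The output columns come from the data itself.
    columns = sorted({r[col_key] for r in rows})
    body = {k: [cells.get(c, 0.0) for c in columns] for k, cells in table.items()}
    return columns, body

rows = [
    {"region": "East", "year": 2019, "sales": 10},
    {"region": "East", "year": 2020, "sales": 20},
    {"region": "West", "year": 2020, "sales": 5},
]
columns, body = pivot(rows, "region", "year", "sales")
print(columns)  # [2019, 2020]
print(body)     # {'East': [10.0, 20.0], 'West': [0.0, 5.0]}
```
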
  54. Summarize Table

  55. Pivot Table without ‘Column’, but with more features!

  56. Each column has a ‘Format’ menu with which you can configure each column.
  57. You can configure the number format and the background color for each column.
  58. You can apply various color formatting including ‘Conditional Color Formatting’!

  59. Not only Grand Total, but also Sub Total!
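
As a conceptual sketch of a summary with both Sub Total and Grand Total rows, the code below groups by two keys, sums a measure, and emits a subtotal after each outer group. `summarize_with_totals` is a hypothetical helper and the data is invented; it does not reflect how the Summarize Table is built internally.

```python
from itertools import groupby

def summarize_with_totals(rows, outer, inner, value):
    """Sum `value` per (outer, inner) group, adding Sub Total and Grand Total rows."""
    ordered = sorted(rows, key=lambda r: (r[outer], r[inner]))
    out = []
    grand_total = 0
    for outer_val, outer_rows in groupby(ordered, key=lambda r: r[outer]):
        sub_total = 0
        for inner_val, items in groupby(outer_rows, key=lambda r: r[inner]):
            s = sum(r[value] for r in items)
            out.append((outer_val, inner_val, s))
            sub_total += s
        out.append((outer_val, "Sub Total", sub_total))
        grand_total += sub_total
    out.append(("Grand Total", "", grand_total))
    return out

rows = [
    {"region": "East", "product": "A", "sales": 10},
    {"region": "East", "product": "A", "sales": 5},
    {"region": "East", "product": "B", "sales": 20},
    {"region": "West", "product": "A", "sales": 7},
]
for line in summarize_with_totals(rows, "region", "product", "sales"):
    print(line)
```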

  60. Chart: • Repeat By with Multiple Y-Axis • ‘Repeat By’ Layout Setting • Quick Window Calculation • New Component: Summarize Table • Trend Line: Logistic Regression, Poisson Regression
  62. Data Wrangling: • Summarize Step: Selecting All Numerical Columns • Updates for Text Data Wrangling UI
  63. Data Wrangling: • Summarize Step: Selecting All Numerical Columns • Updates for Text Data Wrangling UI
  64. You can now select all the numerical columns at once and summarize them with the same function (e.g. mean).
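
The idea of summarizing every numerical column with one function can be sketched in plain Python as follows. `summarize_all_numeric` is a hypothetical helper; it detects numeric columns from the first row, which is a simplifying assumption, and the data is invented.

```python
from collections import defaultdict
from numbers import Number
from statistics import mean

def summarize_all_numeric(rows, group_key, func=mean):
    """Apply `func` to every numeric column within each group."""
    # Detect numeric columns from the first row (bool is excluded on purpose).
    numeric_cols = [
        k for k, v in rows[0].items()
        if k != group_key and isinstance(v, Number) and not isinstance(v, bool)
    ]
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r)
    return {
        g: {c: func([r[c] for r in members]) for c in numeric_cols}
        for g, members in groups.items()
    }

rows = [
    {"dept": "Sales", "name": "A", "age": 30, "income": 5000},
    {"dept": "Sales", "name": "B", "age": 40, "income": 7000},
    {"dept": "R&D", "name": "C", "age": 50, "income": 9000},
]
summary = summarize_all_numeric(rows, "dept")
print(summary)
```

Passing a different `func` (for example `max` or `statistics.median`) applies that function to all numeric columns instead.
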
  65. Data Wrangling: • Summarize Step: Selecting All Numerical Columns • Updates for Text Data Wrangling UI
  66. Updates for Text Data Wrangling UI: • Easier Access - Reorganized the Column Header Menu • Better Experience - UI Updates • More Capability - New Functions
  67. Easier Access - Reorganized the Column Header Menu

  68. Better Experience - UI Updates

  69. It shows you the original values in an aggregated format so that you can see how many rows there are for each value.
  70. You can search for values that contain certain text.
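
The aggregated view and the search described in the last two slides amount to counting rows per distinct value and then filtering those values. A minimal sketch, with hypothetical helper names and invented data, and assuming a case-insensitive search:

```python
from collections import Counter

def value_counts(values):
    """Distinct values with their row counts, most frequent first."""
    return Counter(values).most_common()

def search_values(counts, text):
    """Keep only the values that contain `text` (case-insensitive)."""
    needle = text.lower()
    return [(v, n) for v, n in counts if needle in v.lower()]

cities = ["New York", "new york", "Boston", "New York", "York"]
counts = value_counts(cities)
print(counts)
print(search_values(counts, "york"))
```

Working on distinct values rather than raw rows also makes inconsistencies such as "New York" versus "new york" easy to spot.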

  71. Pagination: performance no longer suffers when there are too many rows!

  72. You can visualize the spaces, tabs, and new lines.
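
Making whitespace visible can be imitated in a few lines of Python. The marker symbols below are arbitrary choices for this sketch, not the ones the UI displays.

```python
def show_whitespace(text):
    """Replace spaces, tabs, and new lines with visible marker symbols."""
    return (
        text.replace(" ", "·")
            .replace("\t", "→")
            .replace("\n", "↵")
    )

print(show_whitespace("New  York\tNY\n"))
```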

  73. Once you click the Preview button, you can check whether it’s working as expected.
  74. More Capability - New Functions

  75. You can now use Regular Expression to match the text, if you are familiar with it.
  76. Something like the above can be done much more easily with a new ‘Text (Multiple Candidates)’ option, without needing to use a Regular Expression!
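
The difference between the two approaches can be sketched in Python: a regular expression needs its syntax written (and special characters escaped) by hand, whereas a list of candidate texts can be matched literally. Both helpers are hypothetical and say nothing about how the ‘Text (Multiple Candidates)’ option is implemented; the values are invented.

```python
import re

def matches_regex(values, pattern):
    """Keep values matching a hand-written regular expression."""
    rx = re.compile(pattern)
    return [v for v in values if rx.search(v)]

def matches_any(values, candidates):
    """Keep values containing any of the candidate texts, matched literally."""
    if not candidates:
        return []
    # re.escape lets candidates such as "C++" be used without regex knowledge.
    rx = re.compile("|".join(re.escape(c) for c in candidates))
    return [v for v in values if rx.search(v)]

values = ["apple pie", "banana split", "cherry tart", "C++ notes"]
print(matches_regex(values, r"apple|banana"))  # ['apple pie', 'banana split']
print(matches_any(values, ["apple", "C++"]))   # ['apple pie', 'C++ notes']
```
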
  77. Remove or Replace URL.

  78. Emoji

  79. Extra Spaces
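
The three clean-up operations named on the last slides (URLs, emoji, extra spaces) can be approximated with Python's `re` module. The patterns are simplified assumptions: the URL pattern only covers `http`/`https` links, and the emoji ranges cover common blocks rather than every emoji.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Partial coverage: main pictograph blocks plus miscellaneous symbols and dingbats.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def remove_urls(text, replacement=""):
    """Remove URLs, or replace them when a replacement is given."""
    return URL_RE.sub(lambda m: replacement, text)

def remove_emoji(text):
    """Remove characters in the emoji ranges defined above."""
    return EMOJI_RE.sub("", text)

def squeeze_spaces(text):
    """Collapse runs of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

raw = "  Great tool \U0001F600  see https://exploratory.io   for more  "
clean = squeeze_spaces(remove_emoji(remove_urls(raw)))
print(clean)  # Great tool see for more
```

Running the whitespace clean-up last tidies the gaps left behind by the removed URLs and emoji.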

  80. You can extract (or remove or replace) the text inside specified characters.
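
A minimal sketch of extracting or removing text between two specified characters, again using Python's `re` module. The helper names are hypothetical, and removing the delimiters together with the enclosed text is an assumption about the desired behaviour.

```python
import re

def extract_inside(text, begin, end):
    """Return every piece of text found between `begin` and `end`."""
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text)

def replace_inside(text, begin, end, replacement=""):
    """Replace the delimiters and the text between them (removes by default)."""
    pattern = re.escape(begin) + r".*?" + re.escape(end)
    return re.sub(pattern, lambda m: replacement, text)

label = "Tokyo (Japan) and Paris (France)"
print(extract_inside(label, "(", ")"))  # ['Japan', 'France']
print(replace_inside(label, "(", ")"))
```

The non-greedy `.*?` keeps each match within its own pair of delimiters instead of spanning from the first opening character to the last closing one.
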
  81. Q & A

  82. EXPLORATORY Online Seminar #30 Exploratory Data Analysis

  84. Information: Email: kan@exploratory.io • Website: https://exploratory.io • Twitter: @ExploratoryData • Seminar: https://exploratory.io/online-seminar

  85. EXPLORATORY