Exploratory Data Catalog - Democratizing Data within Organizations

Kan Nishida
December 11, 2019

Kan presents Exploratory Data Catalog, a new solution from Exploratory that helps you democratize data within your organization.

Kan discusses the common challenges of democratizing data within organizations and demonstrates how Exploratory Data Catalog can address them.

Transcript

  1. Exploratory Seminar #23 Exploratory Data Catalog Democratize Data

  2. EXPLORATORY

  3. Speaker: Kan Nishida (@KanAugust), CEO/co-founder, Exploratory. Summary: At the beginning of 2016, launched Exploratory, Inc. to democratize Data Science. Prior to Exploratory, Kan was a director of product development at Oracle, leading teams building various Data Science products in areas including Machine Learning, BI, Data Visualization, Mobile Analytics, Big Data, etc. While at Oracle, Kan also provided training and consulting services to help organizations transform with data.
  4. Mission Make Data Science Available for Everyone

  5. Questions Communication Data Access Data Wrangling Visualization Analytics (Statistics /

    Machine Learning) Data Analysis Data Science Workflow
  6. Questions Communication (Dashboard, Note, Slides) Data Access Data Wrangling Visualization

    Analytics (Statistics / Machine Learning) Data Analysis What you can do with Exploratory
  7. EXPLORATORY

  8. Exploratory Seminar #23 Exploratory Data Catalog Democratize Data

  9. Mission Make Data Science Available for Everyone

  10. Give a Man a Fish, and You Feed Him for a Day. Teach a Man to Fish, and You Feed Him for a Lifetime.
  11. We build a tool that makes Data Science easier, and we teach how to use Data Science to gain deeper insights from data.
  12. Tool: Exploratory Desktop. Teach: Data Science Booster Training, Online Seminar, Tutorials.
  13. But…

  14. Democratizing Data Science is still Hard!

  15. Common Problems • We don’t have access to data sources, so we need to ask someone to get the data for us. • We don’t know which data is the right one; too many spreadsheets are flying around by email. • Data wrangling takes up most of our time, so we don’t have enough time left for analyzing data with Statistics and Machine Learning algorithms.
  16. Before Democratizing Data Science…

  17. We need to democratize Data!

  18. 3 Problems for Democratizing Data • Data Access • Data

    Governance • Data Readiness
  19. 3 Problems for Democratizing Data • Data Access • Data

    Governance • Data Readiness
  20. Data Access “We want everyone to do customer retention analysis by using data from our payment system, but we can’t expose our customers’ detailed information to everyone.”
  21. Payment Data Customer Detail Customer Visit Data Who has Access

    to Data Source?
  22. Payment Data Customer Detail Customer Visit Data Who has Access

    to Data Source?
  23. Payment Detail Aggregated by Customer Anonymize Customer Detail Payment Data What level of data do users have access to?
  24. Payment Detail Aggregated by Customer Anonymize Customer Detail Payment Data

  25. Payment Detail Aggregated by Customer Anonymize Customer Detail Payment Data

  26. Payment Detail Aggregated by Customer Anonymize Customer Detail Payment Data

  27. Data Access • We want to create an environment where everyone can access any data. • But, in reality, we can’t let everyone access any data source. • It is dangerous to allow anyone to share our customers’ private information with anyone without any oversight.
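
The idea on the slides above, sharing payment data aggregated by customer with the customer detail anonymized instead of exposing the raw data source, can be sketched in a few lines. This is only an illustration and not how Exploratory implements it: the column names (`customer_id`, `name`, `amount`), the `SECRET_KEY`, and both helper functions are hypothetical. A keyed hash is strictly pseudonymization, so whether it is sufficient depends on your own privacy requirements.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; it would be kept away from the people receiving the data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(customer_id: str) -> str:
    """Replace a customer ID with a keyed hash so the real ID is not shared."""
    digest = hmac.new(SECRET_KEY, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_by_customer(payments):
    """Summarize payment rows per pseudonymized customer, dropping personal columns."""
    totals = defaultdict(lambda: {"payment_count": 0, "total_amount": 0.0})
    for row in payments:
        key = pseudonymize(row["customer_id"])
        totals[key]["payment_count"] += 1
        totals[key]["total_amount"] += row["amount"]
    return dict(totals)


# Made-up rows standing in for the payment system's detail data.
payments = [
    {"customer_id": "C001", "name": "Alice", "amount": 120.0},
    {"customer_id": "C002", "name": "Bob", "amount": 40.0},
    {"customer_id": "C001", "name": "Alice", "amount": 80.0},
]

shared = aggregate_by_customer(payments)
for customer_key, summary in shared.items():
    print(customer_key, summary)
```

Only `shared` would be published; the raw `payments` rows, including the names, stay with whoever has access to the data source.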
  28. 3 Problems for Democratizing Data • Data Access • Data

    Governance • Data Readiness
  29. Data Governance “Many spreadsheets are flying around via email, Slack, Google Docs, or random folders on document sharing servers. But nobody is really sure which ones are the right ones to look at.”
  30. Which one is the right Sales data?

  31. What did they do to the data?

  32. Data Governance • There are many similar data sets, and we don’t know which one to analyze. • Spreadsheets get copied and updated, but we don’t know who updated them or how. • We don’t know the context of the data or the meaning of each column.
  33. 3 Problems for Democratizing Data • Data Access • Data

    Governance • Data Readiness
  34. Data Readiness “Every time we try to analyze data, we end up spending so much time cleaning and transforming it that we often run out of time before getting to the analysis.”
  35. Data Readiness: Data Wrangling • Most data is not ready for visualizing & analyzing without cleaning & transforming. • We always want to use the latest data.
  36. This is an old problem. And there is an old solution.
  37. Data Warehouse!

  38. IT / Data Engineers

  39. IT / Data Engineers Bottleneck

  40. IT / Data Engineers • Slow, expensive, hard to maintain. • Don’t have enough resources. • Dependency on IT & Data Engineers. • Requirements for data continue to evolve.
  41. Business needs to move quickly, and so do you!

  42. Why do you want to Democratize Data Science again?

  43. We want to analyze data by ourselves to gain better insights from data quickly.
  44. Data Access Data Governance Data Readiness Self-service

  45. Exploratory Data Catalog

  46. Exploratory BI Excel R / Python / JS DB Cloud

    Files Web Pages Exploratory Data Catalog Schedule Data Catalog Web UI REST API
  47. 1. Prepare Data 2. Publish 4. Schedule Life Cycle of

    Data Catalog 3. Share 6. Reproduce & Extend 5. Discover
  48. 1. Prepare Data 2. Publish 4. Schedule 1. Prepare Data

    3. Share 6. Reproduce & Extend 5. Discover
  49. Exploratory Data Import Exploratory Data Catalog Schedule Data Catalog Web

    UI REST API 1. Prepare Data - Import Data
  50. Import Data from Cloud Apps

  51. Import Data from Database

  52. Import File Data

  53. Exploratory Data Wrangling Exploratory Data Catalog Schedule Data Catalog Web

    UI REST API 1. Prepare Data - Data Wrangling
  54. Clean and Transform Data

  55. 1. Prepare Data 2. Publish 4. Schedule Life Cycle of

    Data Catalog 3. Share 6. Reproduce & Extend 5. Discover
  56. Exploratory 2. Publish Exploratory Data Catalog Schedule Data Catalog Web

    UI REST API
  57. 2. Publish

  58. Browse Published Data - 3 Views Summary Table Metadata

  59. Summary Stats and charts give you a quick summary of your data.
  60. Filter, Sort, and Visual Bar! Table

  61. Metadata • You can describe your data with Markdown text. • With Data Dictionary, you can provide a description for each column.
  62. 1. Prepare Data 2. Publish 4. Schedule 3. Share 3.

    Share 6. Reproduce & Extend 5. Discover
  63. 3. Share

  64. 3. Share • Share in Private or Public mode. • For privately shared data, an invitation is sent to those you have shared it with. • Those you have shared with can create FREE accounts to browse and download the data. Exploratory Exploratory Data Catalog Schedule Data Catalog Web UI REST API BI Excel R / Python / JS
  65. 1. Prepare Data 2. Publish 4. Schedule 4. Schedule 3.

    Share 6. Reproduce & Extend 5. Discover
  66. DB Cloud Files Web Pages Schedule - Automate Data Extraction

    and Wrangling BI Excel R / Python / JS Exploratory Data Catalog Schedule Data Catalog Web UI REST API Exploratory
  67. 4. Schedule Automate extracting data from the data sources and transforming it by scheduling. Your data is always up to date, even without opening Exploratory Desktop.
  68. 1. Prepare Data 2. Publish 4. Schedule 5. Discover 3.

    Share 6. Reproduce & Extend 5. Discover
  69. BI Excel R / Python / JS • My Insight

    • Insight Page • Tag Page 5. Discover Data Exploratory Data Catalog Schedule Data Catalog Web UI REST API Exploratory
  70. 5. Discover - My Insight All your data and the data others have shared with you, in one place.
  71. 5. Discover - Insight Page

  72. Search Rank Tag Author Insight Page

  73. Tag Page

  74. 1. Prepare Data 2. Publish 4. Schedule 6. Reproduce &

    Extend 3. Share 6. Reproduce & Extend 5. Discover
  75. BI Excel R / Python / JS Import Directly from

    Data Catalog Exploratory Schedule Data Catalog Web UI REST API Exploratory Data Catalog
  76. • Import as EDF (Exploratory Data Format) • Import the

    Final Result as CSV 6. Reproduce & Extend
  77. • Import as EDF (Exploratory Data Format) • Import the

    Final Result as CSV 6. Reproduce & Extend
  78. EDF (Exploratory Data Format) Reproduce all the data wrangling steps.

  79. EDF (Exploratory Data Format) All associated data are reproduced.

  80. • Import as EDF (Exploratory Data Format) • Import the

    Final Result as CSV 6. Reproduce & Extend
  81. BI Excel R / Python / JS Import Directly from

    Data Catalog Exploratory Schedule Data Catalog Web UI REST API Exploratory Data Catalog
  82. Data Catalog Data Source Access all the data you have access to directly inside Exploratory Desktop.
  83. Import!

  84. Re-Import Click the Re-Import button to re-import the latest data when the shared data is updated on the Exploratory Server.
  85. Public Data Various data sets, such as GDP, Population, and Unemployment, are shared publicly on the Exploratory Server.
  86. Data Wrangling Extend Exploratory Data Catalog Exploratory

  87. Sales Sales x GDP
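
Extending your own data with a public data set, as in the "Sales x GDP" slide, amounts to a join on a shared key. The sketch below shows the idea in plain Python rather than in Exploratory itself; the `left_join` helper, the column names, and all figures are made up for illustration.

```python
def left_join(left_rows, right_rows, key):
    """Attach columns from right_rows to each left row that shares the same key value."""
    right_columns = [c for c in (right_rows[0] if right_rows else {}) if c != key]
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        # Rows without a match keep their own columns and get None for the added ones.
        extra = {c: (match[c] if match else None) for c in right_columns}
        joined.append({**row, **extra})
    return joined


# Placeholder values only; these are not real sales or GDP figures.
sales = [
    {"country": "A", "sales": 100},
    {"country": "B", "sales": 50},
    {"country": "C", "sales": 5},
]
gdp = [
    {"country": "A", "gdp": 10},
    {"country": "B", "gdp": 20},
]

sales_x_gdp = left_join(sales, gdp, key="country")
for row in sales_x_gdp:
    print(row)
```

A left join keeps every sales row even when the public data set has no matching entry, which makes gaps in the added column easy to spot.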

  88. 3 Problems for Democratizing Data • Data Access • Data

    Governance • Data Readiness
  89. Exploratory Data Catalog • Data Access • Data Governance •

    Data Readiness
  90. Data Access

  91. Files Manage Data Access BI Excel R / Python /

    JS Exploratory Data Catalog User Access Management Exploratory Desktop Exploratory Desktop
  92. Data Access • You can decide: • Which data to share. • Which level of data to share. • There is no need to share the data source, but you can if you want to.
  93. Data Governance

  94. DB Cloud Files Web Pages Single Source of Truth /

    Reproducible Data Sharing BI Excel R / Python / JS Exploratory Data Catalog Exploratory Desktop Exploratory Desktop Schedule Data Catalog Web UI REST API
  95. Data Governance • All the data in a single place. • Easier to discover data. • Reproduce the data and know how it was prepared.
  96. Data Readiness

  97. DB Cloud Files Web Pages Data Wrangling as Service BI

    Excel Exploratory Data Catalog Exploratory Schedule Connection Wrangling Exploratory
  98. Data Readiness • Easy to prepare data and share. • Automate the data extraction and wrangling. • Not everyone needs to do the same data wrangling again and again.
  99. One more thing…

  100. How often is my shared data used?

  101. Monitor Data Access

  102. Demo

  103. How can we start?

  104. You have it already!

  105. Exploratory Data Catalog Schedule Data Catalog Web UI REST API

  106. Exploratory Desktop BI Excel R / Python / JS Exploratory

    Desktop Exploratory Cloud Exploratory Data Catalog Go to https://exploratory.io/insight Schedule Data Catalog Web UI REST API
  107. We can’t publish data to Exploratory Cloud…

  108. Exploratory Collaboration Server

  109. Exploratory Cloud Exploratory Data Catalog Exploratory Desktop BI Excel R

    / Python / JS Exploratory Desktop https://exploratory.io Schedule Data Catalog Web UI REST API
  110. Exploratory Data Catalog Schedule Data Catalog Web UI REST API

    Exploratory Desktop Scheduled Auto Data Wrangling BI Excel Discover - Data View - Dictionary - API R / Python / JS Exploratory Desktop Exploratory Cloud Exploratory Data Catalog Exploratory Collaboration Server Linux Server / AWS / GCP / Azure / etc. Firewall
  111. Exploratory Desktop Scheduled Auto Data Wrangling BI Excel Discover -

    Data View - Dictionary - API R / Python / JS Exploratory Desktop Exploratory Cloud Exploratory Data Catalog Exploratory Collaboration Server Linux Server / AWS / GCP / Azure / etc. Firewall
  112. Exploratory Collaboration Server is not just for Data Catalog

  113. Exploratory Desktop BI Excel R / Python / JS Exploratory

    Desktop Scheduled Auto Data & Insight Generation Discover - Data View - Dictionary - API Exploratory Collaboration Server Data Catalog Linux Server / AWS / GCP / Azure / etc. Insight Catalog Discover - Insight View - Dashboard, Note User Management Sharing Management
  114. Insight Catalog

  115. 1. Create Insights 2. Publish 4. Schedule Data Life Cycle

    of Insights 3. Share 6. Reproduce & Extend 5. Discover
  116. 1. Create Insight with Note

  117. 1. Create Insight with Dashboard

  118. 2. Publish Insight

  119. 3. Share Insight

  120. 4. Schedule Insight

  121. 5. Discover Insights - My Insight

  122. Search Rank Tag Author 5. Discover Insights - Insight Page

  123. 6. Reproduce & Extend

  124. Price Starts from $2,498

  125. Q & A

  126. Information: Email kan@exploratory.io; Website https://exploratory.io; Exploratory Collaboration Server https://exploratory.io/collaboration-server; Twitter @KanAugust
  127. EXPLORATORY