
Data Governance 101

Data compliance, privacy and security are hard because:
* There is too much data
* There is too much complexity
* There is no context for data usage

Automation is the only hope.

This talk introduces the first steps toward automating data governance tasks that answer:
* Where is my data?
* Who has access to the data?
* How is the data used?

We will discuss data governance automation examples for AWS Redshift, Snowflake and MySQL from the Tokern data governance projects (https://tokern.io).
The information gathered by these tasks lays the foundation for an effective compliance, privacy and security strategy.

vrajat

June 27, 2020

Transcript

  1. Data Governance 101

  2. Agenda

  3. What is Data Governance?

  4. Why is Data Governance hard?

  5. There is too much data.
    • 1.7 megabytes every second, per person
    • 1/3 of all data cloud
    https://www.nodegraph.se/big-data-facts/
    https://gawker.com/the-public-nyc-taxicab-database-that-accidentally-track-1646724546

  6. There is too much complexity
    • ~1500 Data Technologies
    • >7000 Marketing Tech Companies
    • >700 Sales Tech Companies

  7. There is no context for data usage

  8. Automation to ease data governance

  9. Where is my sensitive data?

  10. PIICatcher - Data Catalog of PII data https://github.com/tokern/piicatcher

  11. Data Lineage to track sensitive data
    https://github.com/tokern/data-lineage
    https://tokern.io/docs/data-lineage/example/

  12. Who has access to sensitive data?

  13. How is sensitive data used?

  14. Query History in Snowflake

  15. Query History for MySQL
    https://tokern.io/blog/proxysql-database-audit/

  16. Summary
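Slide 10 (PIICatcher) builds a catalog of columns that hold PII. The sketch below illustrates the general idea of such a scan: flag a column when its name or a few sampled values match simple patterns. It assumes the column names and sample values have already been fetched from the database. `Column`, `scan_column`, `build_pii_catalog` and the regex rules are hypothetical and are not PIICatcher's API or its detectors.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection rules for illustration; they are not PIICatcher's own.
NAME_RULES = {
    "EMAIL": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "PHONE": re.compile(r"phone|mobile", re.IGNORECASE),
    "SSN": re.compile(r"(?:^|_)ssn(?:$|_)|social[-_]?security", re.IGNORECASE),
}
VALUE_RULES = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$"),
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


@dataclass
class Column:
    table: str
    name: str
    samples: list = field(default_factory=list)  # a few sampled values


def scan_column(col):
    """Return the PII types suggested by a column's name or sampled values."""
    found = {pii for pii, rule in NAME_RULES.items() if rule.search(col.name)}
    for pii, rule in VALUE_RULES.items():
        if any(rule.match(str(v)) for v in col.samples):
            found.add(pii)
    return found


def build_pii_catalog(columns):
    """Map (table, column) -> detected PII types, keeping only flagged columns."""
    catalog = {}
    for col in columns:
        pii_types = scan_column(col)
        if pii_types:
            catalog[(col.table, col.name)] = pii_types
    return catalog


cols = [
    Column("users", "email_address", ["alice@example.com"]),
    Column("users", "user_ssn", ["123-45-6789"]),
    Column("orders", "amount", ["12.50"]),
]
print(build_pii_catalog(cols))
# {('users', 'email_address'): {'EMAIL'}, ('users', 'user_ssn'): {'SSN'}}
```

Checking both names and values catches columns with misleading names. Regex rules still produce false positives and negatives, so treat the catalog as a starting point for review.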
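Slide 11 uses data lineage to follow sensitive data as it is copied into other tables. Once column-level lineage edges are known, finding every column that may carry PII reduces to a graph traversal. The sketch below assumes the edges are already available as hypothetical `(upstream, downstream)` pairs. It does not use the data-lineage project's API, and the function name `downstream_of` is illustrative.

```python
from collections import deque


def downstream_of(edges, sensitive):
    """Return columns that (transitively) derive from any sensitive column.

    edges: iterable of (upstream, downstream) column pairs, a hypothetical
           input format, e.g. extracted from INSERT ... SELECT or CTAS queries.
    sensitive: set of columns already flagged as PII.
    """
    children = {}
    for up, down in edges:
        children.setdefault(up, []).append(down)

    # Breadth-first search starting from every sensitive column at once.
    seen = set(sensitive)
    queue = deque(sensitive)
    while queue:
        col = queue.popleft()
        for child in children.get(col, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - set(sensitive)


edges = [
    ("users.email", "staging.contacts.email"),
    ("staging.contacts.email", "marketing.leads.email"),
    ("orders.amount", "reports.revenue.total"),
]
print(sorted(downstream_of(edges, {"users.email"})))
# ['marketing.leads.email', 'staging.contacts.email']
```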
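Slide 12 asks who has access to sensitive data. In role-based systems such as Snowflake, a user can reach a table through roles that inherit from other roles, so the answer requires expanding the role hierarchy. This sketch assumes the grants have already been exported into plain dictionaries, for example from `SHOW GRANTS` output. The input shapes and the functions `effective_roles` and `users_with_access` are hypothetical.

```python
def effective_roles(direct_roles, role_grants):
    """Expand roles into themselves plus every role they inherit, transitively."""
    stack, seen = list(direct_roles), set()
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            stack.extend(role_grants.get(role, ()))
    return seen


def users_with_access(table, table_grants, role_grants, user_roles):
    """Return users who can reach `table` through direct or inherited roles.

    table_grants: {role: tables the role holds privileges on}
    role_grants:  {role: roles granted to it, whose privileges it inherits}
    user_roles:   {user: roles granted directly to the user}
    """
    return {
        user
        for user, roles in user_roles.items()
        if any(table in table_grants.get(r, ()) for r in effective_roles(roles, role_grants))
    }


table_grants = {"pii_reader": {"users"}, "analyst": {"orders"}}
role_grants = {"analyst": {"pii_reader"}}  # pii_reader is granted to analyst
user_roles = {"alice": {"analyst"}, "bob": {"pii_reader"}, "carol": set()}
print(sorted(users_with_access("users", table_grants, role_grants, user_roles)))
# ['alice', 'bob']
```

The `seen` set guards against cycles in the role graph, so the expansion terminates even if grants are misconfigured.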
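Slide 13 asks how sensitive data is used. One simple approach is to match logged query text against the tables flagged as sensitive and count reads per user. The sketch below assumes the query log is a hypothetical list of `(user, sql_text)` pairs. It uses a deliberately naive regex in place of a real SQL parser, and `sensitive_reads` is an illustrative name.

```python
import re
from collections import Counter

# Naive extraction of table names that follow FROM or JOIN. It ignores CTEs,
# quoted identifiers and subqueries; a real tool would use a SQL parser.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def sensitive_reads(query_log, sensitive_tables):
    """Count, per (user, table), the queries that read a sensitive table.

    query_log: iterable of (user, sql_text) pairs (hypothetical format).
    sensitive_tables: lowercase table names flagged by a PII scan.
    """
    counts = Counter()
    for user, sql in query_log:
        tables = {name.lower() for name in TABLE_REF.findall(sql)}
        for table in tables & sensitive_tables:
            counts[(user, table)] += 1
    return counts


log = [
    ("alice", "SELECT email FROM users WHERE id = 1"),
    ("bob", "select o.amount from orders o join users u on u.id = o.user_id"),
    ("alice", "SELECT * FROM orders"),
]
print(dict(sensitive_reads(log, {"users"})))
# {('alice', 'users'): 1, ('bob', 'users'): 1}
```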
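Slide 14 covers query history in Snowflake. One possible source is the `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` view. Reading it requires a role with access to the shared `SNOWFLAKE` database (for example through `IMPORTED PRIVILEGES`), and the view lags behind real time. The helper below only builds the SQL text. The name `snowflake_query_history_sql` and the chosen columns are illustrative, and the statement could be executed with a cursor from the Snowflake Python connector.

```python
def snowflake_query_history_sql(days=7):
    """Build SQL that reads recent successful queries from Snowflake's
    SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view."""
    # Reject anything but a positive int so inlining it cannot inject SQL.
    if type(days) is not int or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT user_name, role_name, query_text, start_time\n"
        "FROM snowflake.account_usage.query_history\n"
        f"WHERE start_time >= DATEADD(day, -{days}, CURRENT_TIMESTAMP())\n"
        "  AND execution_status = 'SUCCESS'\n"
        "ORDER BY start_time"
    )


print(snowflake_query_history_sql(1))
```

The returned `user_name` and `query_text` columns are the inputs a usage check like the one for slide 13 needs.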
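Slide 15 links a post on auditing MySQL through ProxySQL. The sketch below is a simpler alternative and not necessarily the method in that post. ProxySQL records normalized query digests with execution counts in its `stats_mysql_query_digest` table, available through the admin interface. Assuming those rows have been fetched, the hypothetical `executions_by_user` sums how often each user ran queries mentioning a sensitive table.

```python
from collections import defaultdict

# Query for the ProxySQL admin interface (port 6032 by default).
DIGEST_SQL = (
    "SELECT username, schemaname, digest_text, count_star "
    "FROM stats_mysql_query_digest"
)


def executions_by_user(digest_rows, sensitive_tables):
    """Sum executions per user of query digests that mention a sensitive table.

    digest_rows: (username, schemaname, digest_text, count_star) tuples, in
    the column order selected by DIGEST_SQL. Matching is a naive,
    case-insensitive substring test, so "users" also matches "superusers".
    """
    totals = defaultdict(int)
    for username, _schema, digest_text, count_star in digest_rows:
        text = digest_text.lower()
        if any(table in text for table in sensitive_tables):
            totals[username] += int(count_star)
    return dict(totals)


rows = [
    ("alice", "app", "SELECT email FROM users WHERE id = ?", 42),
    ("bob", "app", "SELECT * FROM orders", 10),
    ("bob", "app", "SELECT name FROM users", 3),
]
print(executions_by_user(rows, {"users"}))
# {'alice': 42, 'bob': 3}
```

Digests aggregate repeated queries, so weighting by `count_star` shows how heavily each user touches sensitive tables. The trade-off is that digests do not capture individual query times or parameter values.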