Data Governance in the World of Privacy

In this presentation I set out a problem statement and discuss how a data governance framework can deal with all kinds of unstructured data, along with agile processes for implementing privacy controls either from the ground up or in an existing system.


Toufiq Ali

August 01, 2020


  1. Data governance in the world of privacy: Privacy by Design Blind Spots
    Toufiq Ali, Principal Cybersecurity Engineer • www.linkedin.com/in/toufiq-ali • 1st August 2020 • Data Governance Meetups at Hasgeek
  2. About me
    • Cybersecurity professional with over a decade of experience in the field
    • Passionate about tailoring offensive and defensive practices in cybersecurity
    • Active member of https://null.community
    • Recent project: https://jobs.null.community
    • LinkedIn: https://www.linkedin.com/in/toufiq-ali
    • Twitter: https://twitter.com/p0wnsauc3
  3. Key Takeaways
    • Unstructured data: privacy implications
    • Privacy by Design blind spots, with examples
    • Agile practices for implementing privacy controls
  4. What is data governance?
    Data governance is a set of principles and practices that ensure high quality through the complete lifecycle of your data.
    Ref: https://profisee.com/data-governance-what-why-how-who/
  5. Data Governance Framework Ref: https://profisee.com/data-governance-what-why-how-who/

  6. Blind spots
    • Unstructured data
    • Inventory of systems that store/process/transmit data
    • Masking / Tokenizing / Anonymizing / Encrypting
    • You cannot protect something that you don’t know exists!
  7. Unstructured Data
    • Anything that does not have a pre-defined data model or schema
    • For example:
      • Sensitive data exchanged over emails, documents, text files, etc.
      • Sensitive data stored in public S3 buckets
      • Data shared on social media
      • Data shared indirectly with suppliers / partners
  8. Inventory of systems that store/process/transmit data
    • You think your data is only stored in an RDBMS
    • What about:
      • Poorly designed APIs
      • Query strings --> e.g. password reset tokens
      • Data analytics systems (especially the ones that record sessions)
      • Logs --> web server / proxy / LB / errors / WAF
      • Support systems
  9. Case Study: Business Requirement
    • An API designed for partners that, after successful authentication, returns all the attributes about a user stored in your DB
    • Partner 1 needs 5 attributes
    • Partner 2 needs 6 attributes
    • Partner 3 needs 2 attributes
    • …
    • Partner n needs 7 attributes
  10. Potential designs/1
    • Create one API and send all the user details upon successful authentication. Let each partner choose the fields they want
      • Pros: time to market, less overhead, quicker turnarounds
      • Cons: not data privacy friendly
    • Create a separate version of the API for each partner and send each of them only what they have signed up for
      • Pros: data privacy friendly
      • Cons: time to market, more overhead, longer turnarounds
  11. Potential designs/2
    • Create one API with a view for each partner, thereby sharing only the data each partner needs (caution: remember the Cambridge Analytica / Facebook scandal)
      • Pros: efficient, scalable, provides only the intended access
      • Cons: needs governance and oversight
    Blind spot: Poorly designed APIs can leak sensitive customer data that was never intended for sharing.
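
The per-partner views described above can be sketched as an allow-list applied to the user record before it leaves the API. This is a minimal illustration, not the speaker's implementation: the partner IDs, attribute names and the `partner_view` helper are all hypothetical.

```python
from typing import Any, Mapping

# Hypothetical allow-list: the attributes each partner has signed up for.
PARTNER_VIEWS: dict[str, frozenset[str]] = {
    "partner_1": frozenset({"user_id", "name", "email", "city", "plan"}),
    "partner_3": frozenset({"user_id", "plan"}),
}


def partner_view(partner_id: str, user: Mapping[str, Any]) -> dict[str, Any]:
    """Return only the attributes this partner is allowed to see."""
    allowed = PARTNER_VIEWS.get(partner_id)
    if allowed is None:
        # Deny by default: an unregistered partner gets nothing.
        raise PermissionError(f"no view registered for {partner_id!r}")
    return {key: value for key, value in user.items() if key in allowed}
```

Keeping the allow-list in one reviewable place is what makes the governance and oversight cost manageable: adding an attribute for a partner becomes an explicit, auditable change rather than a side effect of a schema update.
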
  12. Data analytics systems (especially the ones that record sessions)/1
    • When the page loads --> the entire link, along with the code it carries, is submitted to any tracker implemented on the page
    • When you click on the Save button --> the Referrer sent to the next page is the URL you see in the browser
    Blind spot: Any sensitive data sent in the query string will be present in web server logs, leaked via the Referer header, submitted to analytics trackers (if implemented), and recorded in network device logs (if the request is visible in clear text).
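
One way to limit this leak is to redact sensitive query parameters before a URL is logged or handed to a tracker. The sketch below uses only the Python standard library; the parameter names in `SENSITIVE_PARAMS` and the `redact_url` helper are assumptions chosen for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that must never reach logs or trackers.
SENSITIVE_PARAMS = {"token", "code", "password", "email"}


def redact_url(url: str, placeholder: str = "REDACTED") -> str:
    """Replace the values of sensitive query parameters with a placeholder."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [
        (key, placeholder if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(cleaned)))


print(redact_url("https://example.com/reset?token=abc123&lang=en"))
# https://example.com/reset?token=REDACTED&lang=en
```

Redaction only covers the systems you control. Keeping secrets out of the query string in the first place (for example, sending them in a POST body) avoids the Referer and third-party tracker paths as well.
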
  13. Query String --> e.g. Password Reset Tokens Ref: https://youtu.be/i-VXFqQb3yg

  14. Data analytics systems (especially the ones that record sessions)/2
    • Glassbox --> https://techcrunch.com/2019/02/06/iphone-session-replay-screenshots/
    • FullStory --> https://www.wired.com/story/the-dark-side-of-replay-sessions-that-record-your-every-move-online/
    Blind spot: When recording user sessions on your website or apps, you are also collecting and storing data on a platform you don’t own. You trust but don’t verify.
  15. Logs --> web server / proxy / LB / errors / WAF
    • Web servers log the URL plus the query string (everything visible in the address bar)
    • Corporate proxies intercept your SSL/TLS traffic
    • Application load balancers can see your traffic in clear text
    • Web Application Firewalls (WAF) can see your traffic in clear text
    • Packet capturing devices
  16. Masking / Tokenizing / Anonymizing / Encrypting
    • When do you mask the data?
    • When do you tokenize / anonymize the data?
    • When do you encrypt the data?
  17. Data Masking
    • Consistency: data masked in one use case and not masked in another
    • Patterns: disparate masking rules
    • Client-side masking: defeated by inspect element or view-source
    • Unmasked data in logged transactions
    • Options depend on the tech stack used:
      • https://discuss.elastic.co/t/mask-data/124694
      • https://www.elastic.co/blog/kibana-custom-field-formatters
      • https://www.elastic.co/guide/en/kibana/current/scripted-fields.html
      • https://docs.splunk.com/Documentation/DSP/1.1.0/User/Masking
      • https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/877103639/Mask
      • https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html
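
The consistency and logging problems above can be reduced by routing every use case through a single server-side masking routine. The following is a minimal sketch under that assumption; `mask_value`, `mask_in_text` and the 12 to 19 digit card-number pattern are illustrative choices, not rules from the talk.

```python
import re

# Assumption for illustration: runs of 12 to 19 digits are treated as card numbers.
CARD_RE = re.compile(r"\b\d{12,19}\b")


def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask everything except the last `visible` characters."""
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_in_text(text: str) -> str:
    """Apply the same masking rule to card-like numbers inside a log line."""
    return CARD_RE.sub(lambda match: mask_value(match.group()), text)


print(mask_value("4111111111111111"))
# ************1111
print(mask_in_text("paid with 4111111111111111 ok"))
# paid with ************1111 ok
```

Because the masking happens before the value is rendered or logged, inspecting the page source or reading the transaction log reveals only the masked form.
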
  18. Data Tokenization / Anonymization
    • Genuine need to capture sensitive data
    • Genuine need to share data with vendors / suppliers / partners (digital marketing, hackathons)
    • Data science: machine learning and AI
    • Rules of thumb:
      • When you want to rebuild the data set, tokenize it (e.g. email address)
      • When you only care about the data structure, anonymize it (e.g. passport number, PAN number)
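
The two rules of thumb can be contrasted in a short sketch: tokenization keeps a protected mapping so the original value can be recovered, while anonymization keeps only the shape of the value and discards the original. The in-memory `TokenVault` class and the `anonymize` helper are hypothetical; a real token vault would be a separately secured service.

```python
import secrets
import string


class TokenVault:
    """Illustrative in-memory vault; the mapping is what must be protected."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so joins on the tokenized column still work.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


def anonymize(value: str) -> str:
    """Swap each letter or digit for a random one, keeping only the structure."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(secrets.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(secrets.choice(pool))
        else:
            out.append(ch)  # keep separators such as "-" or "@"
    return "".join(out)
```

A data set shared with a partner or used for a hackathon would carry the output of `tokenize` or `anonymize`; only the owner of the vault can call `detokenize`, and nothing can reverse `anonymize`.
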
  19. Data Encryption
    • Data in transit: SSL, TLS
    • Data at rest: backups, data stores, data warehouses, data lakes
    • Encryption is not a silver bullet that protects everything sensitive
    • Encoding data and calling it encryption
    • Choosing the wrong type of cryptography (symmetric vs. asymmetric)
    • Writing your own encryption routines
    • You can't have 100 percent security and then have 100 percent privacy and zero inconvenience
    • Ref: https://guardtime.com/blog/6-reasons-why-encryption-isnt-working
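
The "encoding is not encryption" pitfall is easy to demonstrate: Base64 output looks scrambled, but anyone can reverse it because no key is involved. The sample value below is made up for illustration.

```python
import base64

# Made-up sample value; Base64 is an encoding, not a cipher.
secret = "passport=Z1234567"

encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
# Anyone holding `encoded` can recover the original without any secret key.
recovered = base64.b64decode(encoded).decode("utf-8")

print(recovered == secret)
# True
```

If a value can be recovered without a key, it should be classified and handled exactly like the plain text.
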
  20. [Diagram labels: Business • IT/Engineering • Data Privacy Officer • Cybersecurity • Legal • Audit • Data Catalog • Data Classification • Data Governance • Continuous Assurance]
  21. Lessons Learnt
    • Catalog your privacy-related data
    • Discover privacy-related data in your existing systems
    • Don’t store data you don’t need
    • Define technical controls to protect sensitive data
    • Build – Measure – Learn feedback loop
    • Trust but always verify!
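
Discovering privacy-related data in existing systems can start with simple pattern matching over text exports, logs or documents. The sketch below is a deliberately small example: the two patterns (email address, and the 5 letters + 4 digits + 1 letter PAN layout) and the `find_pii` helper are illustrative assumptions, and real scanners use far richer detection.

```python
import re

# Illustrative patterns only; extend per data classification policy.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "pan": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
}


def find_pii(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, label, matched text) for every pattern hit."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, label, match.group()))
    return findings


sample = "ok\ncontact a@example.com pan ABCDE1234F"
print(find_pii(sample))
# [(2, 'email', 'a@example.com'), (2, 'pan', 'ABCDE1234F')]
```

Running a check like this on a schedule, and feeding the findings back into the data catalog, is one concrete form of the build, measure, learn loop.
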
  22. Some open source scripts
    • Open source projects that you can extend:
      • https://github.com/dxa4481/truffleHog
      • https://github.com/ezekg/git-hound
      • https://github.com/Yelp/detect-secrets
      • https://github.com/michenriksen/gitrob
      • https://github.com/microsoft/presidio
  23. Thank you! Any questions?