Protecting the Data Lake with Open Policy Agent

Ash
May 23, 2019

Talk from KubeCon EU 2019, showing how the Open Policy Agent can help to enforce custom security policies in a Data Lake Platform.

Transcript

  1. Protecting the Data Lake

  2. A bit about Ash Narkar ! @ashtalk

  3. Agenda
    • Data Lake Overview
    • Open Policy Agent
      ◦ Community
      ◦ Features
      ◦ Use Cases
    • Use case deep dive
      ◦ Ceph Data Protection
  4. Data is King !

  5. Data is King ! • Pervasive • Abundant • Customer Experience • Revenue Growth
  6. Data is King ! • Pervasive • Abundant • Customer Experience • Revenue Growth • Cyber Attacks • Breaches • Fines • Loss of Customer Trust
  7. What Is A Data Lake?

  8. Data Lake Features • Centralized Content • Scalability • Multiple data type support • Resource optimization
  9. Data Lake Platform Sources Consumers

  10. Data Lake Platform: Kafka
    Features
    • Distributed streaming platform
    • Building real-time streaming data pipelines and applications
    Security Challenges
    • Authorization using Access Control Lists (ACLs)
    • How to authorize requests based on context, like user, IP, common name in certificate
    Security Policies
    • Consumers of topics containing PII must be whitelisted
    • Producers to topics with high fanout must be whitelisted
  11. Data Lake Platform: Ceph
    Features
    • Unified distributed storage system
    • Delivers object, block, and file storage
    Security Challenges
    • Security protocol handles only Ceph clients and servers. NO human users or applications
    Security Policies
    • Users can access only those buckets belonging to the same geographical region as them
    • Access based on a user’s Business Unit, Department etc.
  12. Data Lake Platform: Elasticsearch
    Features
    • Full-text search and analytics engine
    • Store, search and analyze
    Security Challenges
    • Authorization is not considered as part of job
    • User responsible for implementing access control
    Security Policies
    • Access control policies for a patient’s PHI
  13. Security Challenge Overview
    • Distinct systems • Changing security requirements
    ❌ Hardcoding policy ❌ Tight coupling
    ✅ Expressiveness ✅ Speed and performance ✅ Unified Solution
  14. Who can solve the Security Challenge ?

  15. What Is OPA?

  16. OPA: Community
    Inception: Project started in 2016 at Styra.
    Goal: Unify policy enforcement across the stack.
    Use Cases: Admission control • Authorization • ACLs • RBAC • IAM • ABAC • Risk management • Data Protection • Data Filtering
    Users: Netflix • Medallia • Chef • Cloudflare • State Street • Pinterest • Intuit • Capital One ...and many more.
    Today: CNCF project (Incubating) • 59 contributors • 800+ Slack members • 2000+ stars • 20+ integrations
  17. OPA: General-purpose policy engine
    Diagram labels: Service, OPA, Policy (Rego), Data (JSON), Request, Decision, Query
  18. OPA: Features
    • Declarative Policy Language (Rego)
      ◦ Can user X do operation Y on resource Z?
      ◦ What invariants does workload W violate?
      ◦ Which records should bob be allowed to see?
    • Library, sidecar, host-level daemon
      ◦ Policy and data are kept in-memory
      ◦ Zero decision-time dependencies
    • Management APIs for control & observability
      ◦ Bundle service API for sending policy & data to OPA
      ◦ Status service API for receiving status from OPA
      ◦ Log service API for receiving audit log from OPA
    • Tooling to build, test, and debug policy
      ◦ opa run, opa test, opa fmt, opa deps, opa check, etc.
      ◦ VS Code plugin, Tracing, Profiling, etc.
    Diagram labels: Service, OPA, Policy (Rego), Data (JSON), Request, Decision, Query
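The Service, Query, and Decision labels on this slide describe a service asking OPA for a decision. A minimal Python sketch of that exchange follows, assuming OPA runs as a sidecar on `localhost:8181` and is reached through its REST Data API (`POST /v1/data/<path>` with an `input` document). The policy path passed in and the helper names `build_query` and `is_allowed` are hypothetical and do not come from the talk.

```python
import json
import urllib.request

# Assumption: OPA is running locally as a sidecar on port 8181.
OPA_URL = "http://localhost:8181"


def build_query(policy_path: str, input_doc: dict) -> urllib.request.Request:
    """Build a POST request for OPA's Data API (/v1/data/<policy_path>)."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{policy_path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(policy_path: str, input_doc: dict) -> bool:
    """Send the query and read the decision.

    OPA omits the "result" key when the rule is undefined,
    so a missing result is treated as a deny.
    """
    with urllib.request.urlopen(build_query(policy_path, input_doc)) as resp:
        decision = json.load(resp)
    return decision.get("result", False) is True
```

A service would call something like `is_allowed("example/allow", {"user": "bob"})` before serving a request; the policy itself stays in OPA, so it can change without redeploying the service.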
  19. How does OPA work?

  20. How does OPA work?
    Diagram labels: Salary Service V1, OPA, Policy (Rego), Data (JSON), Request, Decision, Query
    Example policy: "Employees can read their own salary and the salary of anyone they manage."
  21. How does OPA work?
    Example policy: Employees can read their own salary and the salary of anyone they manage.
    Input Data
      method: "GET"
      path: ["salary", "bob"]
      user: "bob"
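To make the example policy concrete, here is a rough Python sketch of the decision logic applied to the input shown on this slide. In OPA the rule would be written in Rego; this only illustrates the reasoning. The `MANAGES` table and the `allow` function are hypothetical stand-ins for the data (JSON) that OPA would hold.

```python
# Hypothetical reporting data: manager -> list of direct reports.
MANAGES = {"alice": ["bob"], "bob": []}


def allow(input_doc: dict, manages: dict = MANAGES) -> bool:
    """Allow GET /salary/<employee> for the employee or their manager."""
    method = input_doc.get("method")
    path = input_doc.get("path", [])
    user = input_doc.get("user")

    # Only reads of a well-formed salary path are considered.
    if method != "GET" or len(path) != 2 or path[0] != "salary":
        return False

    employee = path[1]
    # Employees can read their own salary ...
    if user == employee:
        return True
    # ... and the salary of anyone they manage.
    return employee in manages.get(user, [])
```

With the slide's input (`method: "GET"`, `path: ["salary", "bob"]`, `user: "bob"`), the first rule matches because bob is reading his own salary.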
  22. 3 Steps to OPA Step 1: Clone OPA Repo

  23. 3 Steps to OPA Step 1: Clone OPA Repo Step 2: Build OPA binary
  24. 3 Steps to OPA Step 1: Clone OPA Repo Step 2: Build OPA binary Step 3: Execute OPA binary
  25. None
  26. Use Cases CLOUD Host DB Host sshd App Container HTTP API Microservice APIs Orchestrator Admission Control Container Execution, SSH, sudo Linux Risk Management Data Protection and Data Filtering
  27. OPA Use Case: Ceph Data Protection

  28. Ceph Architecture • CEPH STORAGE CLUSTER (RADOS) • LIBRADOS • RBD • CEPH FS • RADOSGW
  29. Ceph Data Protection: Setup
    Diagram labels: OPA, Node Port Service, Incoming Request, HTTP, S3 API, RADOSGW, RADOS, user, method, bucket name, Allow / Deny, Policy (Rego), Data (JSON)
  30. Example policy "Users can access only those buckets belonging to the same geographical region as them."
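The region policy can be sketched in Python as a check over the user, method, and bucket name that the setup slide lists as the inputs passed along for an Allow / Deny decision. In the talk the policy lives in OPA as Rego; this is only an illustration of the rule. The lookup tables `USER_REGION` and `BUCKET_REGION`, the user and bucket names, and the function name are all hypothetical.

```python
# Hypothetical data that OPA would hold as JSON: where users and buckets live.
USER_REGION = {"alice": "us", "bob": "eu"}
BUCKET_REGION = {"us-logs": "us", "eu-logs": "eu"}


def allow_bucket_access(request: dict,
                        user_region: dict = USER_REGION,
                        bucket_region: dict = BUCKET_REGION) -> bool:
    """Allow access only when the user and the bucket share a region.

    The request carries user, method, and bucket name; this rule does not
    look at the method, because it restricts access by region alone.
    """
    region_of_user = user_region.get(request.get("user"))
    region_of_bucket = bucket_region.get(request.get("bucket"))

    # Unknown users or unknown buckets are denied by default.
    if region_of_user is None or region_of_bucket is None:
        return False
    return region_of_user == region_of_bucket
```

Denying unknown users and buckets is a design choice made for this sketch, so that missing data never results in an allow.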
  31. Demo: Ceph Data Protection https://katacoda.com/styra

  32. Data is King ! • Pervasive • Abundant • Customer Experience • Revenue Growth • Cyber Attacks • Breaches • Fines • Loss of Customer Trust
  33. Thank You! • github.com/open-policy-agent/opa • openpolicyagent.org • slack.openpolicyagent.org • Booth S20