Handling GDPR with Apache Kafka: How to Comply Without Freaking Out? (Kafka Summit London 2019)

Do you wonder how to cope with the right to be forgotten? Do you wonder how to process only the events of individuals who have consented to the processing of their data? Do you wonder how to protect your users' PII? Or do you wonder how to implement all of this across your heterogeneous languages, clients and processing frameworks without re-implementing your streaming services? This talk is for you!

In this talk, we will answer these questions and show you
1) how transparent end-to-end encryption can be implemented on top of Apache Kafka;
2) how crypto-shredding can be used to forget individuals; and
3) how record-based access control can be implemented on top of Apache Kafka.

Above all, we will show how this can be done without touching any applications by using an out-of-process architecture (à la service-mesh).

David Jacot

May 14, 2019

Transcript

  1. Handling GDPR with Apache Kafka: How to comply without freaking out? David Jacot (@davidjacot), Kafka Summit London, May 14, 2019
  2. Kafka: What are the challenges? Encryption: Kafka does not have a real encryption story. Right to be Forgotten: Kafka only knows how to expire or compact events. Consent: Kafka only provides topic authorization.
  3. Fortunately, solutions exist! Encryption: Kafka does not have a real encryption story; solution: E2E Encryption. Right to be Forgotten: Kafka only knows how to expire or compact events; solution: Crypto-Shredding. Consent: Kafka only provides topic authorization; solution: Record-based ACLs.
  4. End-to-End (Symmetric) Encryption (diagram: a log of records 1 to 10. Producer: Encrypt, Produce. Consumer: Decrypt, Consume.)
  5. How do they exchange keys? Key Management Service (KMS): Master Key or Key Encryption Key (KEK). (diagram: KMS, Producer, Consumer)
  6. Envelope Encryption (DEK: Data Encryption Key; KEK: Key Encryption Key). Initial record: Key: clearkey, Value: clearvalue, Headers: (empty). Generate a DEK locally: Key: clearkey, Value: DEK(clearvalue), Headers: (empty). Use a KEK to wrap the DEK: Key: clearkey, Value: DEK(clearvalue), Headers: KEK(DEK), Ref. to the KEK.
  7. Envelope Decryption (DEK: Data Encryption Key; KEK: Key Encryption Key). Initial record: Key: clearkey, Value: DEK(clearvalue), Headers: KEK(DEK), Ref. to the KEK. Use the KEK to unwrap the DEK. Use the DEK to decrypt the Value: Key: clearkey, Value: clearvalue, Headers: (empty).
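
The envelope encryption and decryption steps on the two slides above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: `ToyKms`, `toy_cipher`, and the header names `wrapped-dek` and `kek-ref` are hypothetical, and the XOR keystream is a stand-in chosen only so the sketch runs with the standard library. A real deployment would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream derived from key.

    Toy stand-in, NOT secure (no nonce, no authentication).
    Applying it twice with the same key returns the input.
    """
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class Record:
    key: bytes
    value: bytes
    headers: dict = field(default_factory=dict)


class ToyKms:
    """Hypothetical in-memory KMS: KEKs stay inside, it only wraps and unwraps DEKs."""

    def __init__(self):
        self._keks = {}

    def create_kek(self, kek_id: str) -> None:
        self._keks[kek_id] = secrets.token_bytes(32)

    def wrap(self, kek_id: str, dek: bytes) -> bytes:
        return toy_cipher(self._keks[kek_id], dek)

    def unwrap(self, kek_id: str, wrapped: bytes) -> bytes:
        return toy_cipher(self._keks[kek_id], wrapped)


def envelope_encrypt(kms: ToyKms, kek_id: str, record: Record) -> Record:
    dek = secrets.token_bytes(32)                 # generate a DEK locally
    return Record(
        key=record.key,                           # the record key stays in clear
        value=toy_cipher(dek, record.value),      # Value: DEK(clearvalue)
        headers={
            "wrapped-dek": kms.wrap(kek_id, dek), # KEK(DEK)
            "kek-ref": kek_id.encode(),           # reference to the KEK
        },
    )


def envelope_decrypt(kms: ToyKms, record: Record) -> Record:
    kek_id = record.headers["kek-ref"].decode()
    dek = kms.unwrap(kek_id, record.headers["wrapped-dek"])  # unwrap the DEK
    return Record(key=record.key, value=toy_cipher(dek, record.value))
```

Only the wrapped DEK and a reference to the KEK travel with the record, so a consumer needs access to the KMS, and nothing else, to read the value.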
  8. … or a Master Key per Topic ... (diagram: KMS; Topic A, Topic B, Topic C)
  9. … or a Master Key per User (diagram: KMS; Topic A, Topic B, Topic C; User 1, User 2, User 3, User 4)
  10. Need to forget a User? Crypto-shred him! (diagram: KMS; Topic A, Topic B, Topic C; User 1, User 2, User 3, User 4)
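
Crypto-shredding with one key per user can be illustrated as follows. This is a sketch under assumptions: `PerUserKms`, `KeyUnavailable`, and `toy_cipher` are hypothetical names, and the XOR keystream is an insecure stand-in for a real cipher. The point it shows is that the log is never rewritten: deleting the user's key is what makes the user's records unreadable.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Insecure XOR keystream, used only to keep the sketch stdlib-only."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyUnavailable(Exception):
    """Raised when a user's key does not exist (never created, or shredded)."""


class PerUserKms:
    """Hypothetical KMS holding one master key per user."""

    def __init__(self):
        self._keys = {}
        self._forgotten = set()

    def encrypt(self, user_id: int, clear: bytes) -> bytes:
        if user_id in self._forgotten:
            raise KeyUnavailable(user_id)       # do not re-create a shredded key
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return toy_cipher(self._keys[user_id], clear)

    def decrypt(self, user_id: int, data: bytes) -> bytes:
        if user_id not in self._keys:
            raise KeyUnavailable(user_id)
        return toy_cipher(self._keys[user_id], data)

    def shred(self, user_id: int) -> None:
        """Forget a user by destroying their key; stored records are untouched."""
        self._keys.pop(user_id, None)
        self._forgotten.add(user_id)


kms = PerUserKms()
# An append-only log of (user_id, encrypted value) pairs, standing in for a topic.
log = [(1, kms.encrypt(1, b"click by user 1")),
       (2, kms.encrypt(2, b"click by user 2"))]

kms.shred(1)  # user 1 asks to be forgotten

for user_id, data in log:
    try:
        print(user_id, kms.decrypt(user_id, data))
    except KeyUnavailable:
        print(user_id, "<forgotten>")
```

The same idea applies to every copy of the data, including backups and replicas, because none of them is readable without the destroyed key.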
  11. Record (or User) based ACLs (diagram: KMS; Topic A, Topic B, Topic C; User 1, User 2, User 3, User 4)
  12. Record (or User) based ACLs (diagram: KMS; Topic A, Topic B, Topic C; User 1, User 2, User 3, User 4)
  13. Record (or User) based ACLs (diagram: KMS; Topic A, Topic B, Topic C; User 1, User 2, User 3, User 4)
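
The slides do not spell out how record-based ACLs are evaluated, so the following is only one possible reading: a policy store maps each client to the users whose records it may read (for example, the users who gave consent), and fetched records are filtered against it. `PolicyStore`, `filter_fetch`, and the `Record` fields are hypothetical; the client names are borrowed from the Kubernetes setup slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    topic: str
    user_id: int   # the individual the record is about
    value: bytes


class PolicyStore:
    """Hypothetical policy store: client name -> user ids it may read."""

    def __init__(self):
        self._allowed = {}

    def grant(self, client: str, user_id: int) -> None:
        self._allowed.setdefault(client, set()).add(user_id)

    def revoke(self, client: str, user_id: int) -> None:
        self._allowed.get(client, set()).discard(user_id)

    def may_read(self, client: str, user_id: int) -> bool:
        return user_id in self._allowed.get(client, set())


def filter_fetch(policies: PolicyStore, client: str, records: list) -> list:
    """Keep only the records whose user the client is allowed to read."""
    return [r for r in records if policies.may_read(client, r.user_id)]


policies = PolicyStore()
policies.grant("client-analytics", 1)
policies.grant("client-analytics", 2)
policies.grant("client-marketing", 2)   # only user 2 consented to marketing

fetched = [Record("clickstream", 1, b"GET /"),
           Record("clickstream", 2, b"GET /cart")]

print([r.user_id for r in filter_fetch(policies, "client-analytics", fetched)])
print([r.user_id for r in filter_fetch(policies, "client-marketing", fetched)])
```

With per-user keys, the same decision could instead be enforced at the KMS by refusing to unwrap a key for a client that lacks permission; the filter above is the simpler variant to show.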
  14. Checkpoint. Encryption: Kafka does not have a real encryption story; solution: E2E Encryption. Right to be Forgotten: Kafka only knows how to expire or compact events; solution: Crypto-Shredding. Consent: Kafka only provides topic authorization; solution: Record-based ACLs.
  15. Setup on Kubernetes. Data Plane: Kafka (clickstream); client-eventsgen (ksql-gen, L7 Proxy); client-marketing (console-consumer, L7 Proxy); client-analytics (console-consumer, L7 Proxy). Control Plane: control-plane (KMS, Policies, ACL). Legend: Clear, mTLS.
  16. clickstream: { "ip": "122.152.45.245", "userid": 38, "time": 1556469798309, "request": "GET / HTTP/1.1", "status": "302", "bytes": "4096", "agent": "Mozilla/5.0 ..." }
  17. Transparent L7 Proxy for Apache Kafka (diagram: a Pod / VM with the App (Kafka Client) and the Proxy (Interceptors), in front of Kafka Brokers 1, 2 and 3). 1: TCP connections going to Kafka are redirected to the proxy. 2: Each connection is proxied to its real destination. 3: Requests & Responses are intercepted and possibly altered. Legend: Clear, mTLS.
  18. What is intercepted by the proxy? ApisRequest / ApisResponse; ProduceRequest / ProduceResponse; FetchRequest / FetchResponse.
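
To decide which requests to intercept, a proxy only needs the first bytes of each frame. The sketch below reads the 4-byte size prefix and the start of the Kafka request header (api_key, api_version, correlation_id, all big-endian) and classifies the request. The API keys 0 (Produce), 1 (Fetch) and 18 (ApiVersions) come from the Kafka protocol; reading the slide's "ApisRequest" as ApiVersions is an assumption, and `classify_request` is a hypothetical helper, not part of any Kafka client.

```python
import struct

# API keys from the Kafka protocol.
# Mapping the slide's "ApisRequest" to ApiVersions (18) is an assumption.
PRODUCE, FETCH, API_VERSIONS = 0, 1, 18
INTERCEPTED = {PRODUCE: "Produce", FETCH: "Fetch", API_VERSIONS: "ApiVersions"}

# size: int32, api_key: int16, api_version: int16, correlation_id: int32
FRAME_PREFIX = struct.Struct(">ihhi")


def classify_request(frame: bytes):
    """Return (api_name, api_version, correlation_id) if the request is
    one the proxy intercepts, or None if it should pass through untouched."""
    if len(frame) < FRAME_PREFIX.size:
        raise ValueError("incomplete frame")
    _size, api_key, api_version, correlation_id = FRAME_PREFIX.unpack_from(frame)
    name = INTERCEPTED.get(api_key)
    if name is None:
        return None
    return name, api_version, correlation_id


# A frame prefix for a Produce request, version 7, correlation id 42.
produce = FRAME_PREFIX.pack(8, PRODUCE, 7, 42)
print(classify_request(produce))

# A Metadata request (API key 3) is not in the intercepted set.
metadata = FRAME_PREFIX.pack(8, 3, 5, 43)
print(classify_request(metadata))
```

The correlation id is kept because responses carry no API key of their own, so a proxy has to remember it per request in order to match a response to the request type it answers.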
  19. RecordBatch (v2): well thought out! baseOffset: int64; batchLength: int32; partitionLeaderEpoch: int32; magic: int8 (current magic value is 2); crc: int32; attributes: int16; lastOffsetDelta: int32; firstTimestamp: int64; maxTimestamp: int64; producerId: int64; producerEpoch: int16; baseSequence: int32; records: [Record]. https://kafka.apache.org/documentation/#recordbatch
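
The fixed-size part of the batch header listed on this slide can be decoded with Python's `struct` module. This sketch uses the field order and widths exactly as given on the slide, in big-endian order as used by the Kafka protocol; `parse_batch_header` and `BatchHeader` are hypothetical names, and the CRC is not verified here.

```python
import struct
from collections import namedtuple

# Field order and widths as listed on the slide, big-endian, no padding:
# q = int64, i = int32, h = int16, b = int8
HEADER = struct.Struct(">qiibihiqqqhi")

BatchHeader = namedtuple(
    "BatchHeader",
    "baseOffset batchLength partitionLeaderEpoch magic crc attributes "
    "lastOffsetDelta firstTimestamp maxTimestamp producerId producerEpoch "
    "baseSequence",
)


def parse_batch_header(buf: bytes) -> BatchHeader:
    """Decode the fields that precede `records` in a v2 RecordBatch."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than a RecordBatch header")
    header = BatchHeader._make(HEADER.unpack_from(buf))
    if header.magic != 2:
        raise ValueError(f"unsupported magic value: {header.magic}")
    return header


# Build a sample header and decode it again.
sample = HEADER.pack(100, 61, 0, 2, 0, 0, 9, 1556469798309, 1556469798400, -1, -1, -1)
header = parse_batch_header(sample)
print(header.baseOffset, header.lastOffsetDelta, header.firstTimestamp)
```

Offsets and timestamps live in the batch header while the records follow as a separate array, so record values can be rewritten without touching the batch-level metadata, apart from fields such as batchLength and crc that depend on the record bytes.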
  20. Pros (+): Privacy & Compliance; Transparent Integration; Solved once and for all. Cons (-): Encryption comes with a cost; Latency is increased (KMS); Message Format >v2.
  21. What’s next? External KMS; Field level encryption; Field level ACL; Avro support; Schema validation; Lineage; ...
  22. Summary. Encryption: Kafka does not have a real encryption story; solution: E2E Encryption. Right to be Forgotten: Kafka only knows how to expire or compact events; solution: Crypto-Shredding. Consent: Kafka only provides topic authorization; solution: Record-based ACLs.