
Towards Client-Side Field-Level Cryptography for Streaming Data Pipelines @ Current 2022, Austin Texas

Abstract:

Apache Kafka offers several security features, ranging from authentication and authorization mechanisms to over-the-wire encryption. Even so, end-to-end encryption between Kafka-based client applications, which fully protects payloads from fraudulent access on the broker side, can still be considered a blind spot. After highlighting the main benefits of explicit data-at-rest protection, this session discusses in depth how to selectively encrypt and decrypt sensitive payload fields in streaming data pipelines built upon Apache Kafka Connect and ksqlDB apps. In particular, it introduces Kryptonite for Kafka, an ecosystem community project written and open-sourced by the speaker. During this demo-driven talk, you will experience how to benefit from:

* a configurable single message transformation (SMT) that lets you perform encryption and decryption operations in Kafka Connect worker nodes without any additional code
* a custom user-defined function (UDF) for ksqlDB to conveniently encrypt and decrypt specific columns in your SQL-based stream processing apps

Client-side field-level cryptography makes streaming data pipelines more secure by safeguarding your most sensitive and precious data against any form of uncontrolled or illegal access once it hits the Apache Kafka brokers.

Kryptonite for Kafka Project Repository:

https://github.com/hpgrahsl/kryptonite-for-kafka/

Live Demo Scenario Repository:

https://github.com/hpgrahsl/current22-k4k-demo

Recording:

https://www.confluent.io/events/current-2022/towards-client-side-field-level-cryptography/

Hans-Peter Grahsl

October 04, 2022

Transcript

  1. Towards
    Client-Side Field-Level
    Cryptography
    for Streaming Data Pipelines
    @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022

  2. Why should we care?

  3. 61 %
    of breaches involved
    credential data [1]
    [1] Verizon DBIR 2021 - https://www.verizon.com/dbir

  4. 85 %
    of breaches involved
    the human element [1]
    [1] Verizon DBIR 2021 - https://www.verizon.com/dbir

  6. compromised external
    cloud assets
    more common than
    on-premises assets [1]
    [1] Verizon DBIR 2021 - https://www.verizon.com/dbir

  7. Let's not
    forget about the price tag
    of data breaches.

  9. $4.24M
    average cost of data
    breach [2]
    [2] IBM Cost of Data Breach Report - https://www.ibm.com/security/data-breach

  10. $180
    per record cost of
    customer PII [2]
    [2] IBM Cost of Data Breach Report - https://www.ibm.com/security/data-breach

  11. It's me ... Hans-Peter
    • Developer Advocate @ Red Hat
    • Open-Source Enthusiast
    • Confluent Community Catalyst since 2019
    • MongoDB Champion since 2020
    • based in Graz, Austria

  13. But Kafka related? Yes! [3]
    [3] https://spectralops.io/blog/misconfigured-kafdrop-puts-companies-apache-kafka-completely-exposed/

  14. unhappy

  15. Core Kafka
    Security Mechanisms

  16. over-the-wire encryption

  17. authentication

  18. authorization

  19. table stakes

  22. disturbing

  23. core security
    necessary!

  24. core security
    sufficient?

  28. ?
    data in use by
    BROKERS

  29. BROKERS
    see everything ...
    and so does
    any legitimate
    Kafka client

  31. human promise
    is NOT
    technical promise

  32. ? ? ?
    end-to-end
    encryption
    ? ? ?

  33. Open-Source
    Community Project
    Kryptonite for Kafka

  34. client-side
    field level
    cryptography

  35. Client-Side Cryptography

  37. Field Level Cryptography
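
The idea named on this slide, encrypting only selected fields of a record on the client before it reaches the brokers, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Kryptonite for Kafka's implementation: it uses the JDK's built-in AES-GCM instead of Tink, and the class name, method names, and the map-based record are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch; not part of Kryptonite for Kafka.
final class FieldLevelCryptoSketch {
    private static final int NONCE_BYTES = 12; // common nonce size for GCM
    private static final int TAG_BITS = 128;   // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts one value; the result is Base64(nonce || ciphertext+tag).
    static String encryptValue(SecretKey key, String plaintext) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(nonce.length + ct.length).put(nonce).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decryptValue(SecretKey key, String encoded) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, NONCE_BYTES));
            byte[] pt = cipher.doFinal(in, NONCE_BYTES, in.length - NONCE_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns a copy of the record in which only the named fields are encrypted.
    // Simplification: field values are converted to strings before encryption.
    static Map<String, Object> encryptFields(SecretKey key, Map<String, Object> record, Set<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>(record);
        for (String field : fields) {
            Object value = out.get(field);
            if (value != null) {
                out.put(field, encryptValue(key, String.valueOf(value)));
            }
        }
        return out;
    }

    // Reverses encryptFields; decrypted values come back as strings.
    static Map<String, Object> decryptFields(SecretKey key, Map<String, Object> record, Set<String> fields) {
        Map<String, Object> out = new LinkedHashMap<>(record);
        for (String field : fields) {
            Object value = out.get(field);
            if (value != null) {
                out.put(field, decryptValue(key, String.valueOf(value)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 1234);
        record.put("name", "Jane Doe");
        record.put("creditCard", "4111-1111-1111-1111");

        Map<String, Object> protectedRecord = encryptFields(key, record, Set.of("creditCard"));
        System.out.println(protectedRecord); // only creditCard is unreadable
        System.out.println(decryptFields(key, protectedRecord, Set.of("creditCard")));
    }
}
```

Because the non-sensitive fields stay in the clear, consumers without the key can still route, filter, or aggregate on them, while the protected field remains opaque to the brokers and to any client that lacks the key.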

  39. Kafka Connect Sources
    Single Message
    Transform

  40. CSFLC with Source Connectors

  41. Demo Part 1

  43. ksqlDB
    User-Defined Function

  44. CSFLC with Streaming SQL

  46. Demo Part 2

  47. Here be
    Dragons

  49. Kafka Connect Sinks
    Single Message
    Transform

  50. CSFLC with Sink Connectors

  51. Demo Part 3

  55. Behind the Scenes?

  56. Cryptography
    • Tink by Google
    • AEAD based on AES-GCM
    • DAEAD based on AES-SIV
    • key rotation support
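
To illustrate the AEAD bullet above, the following sketch uses the JDK's AES-GCM directly. The project itself relies on Tink, whose API is not shown here, and all names in the sketch are hypothetical. It demonstrates two properties: encryption with a random nonce is probabilistic, so equal plaintexts yield different ciphertexts, and decryption fails when the associated data does not match. The first property is the reason a deterministic scheme such as AES-SIV is useful when encrypted fields must still support equality comparisons, for example as join or grouping keys. AES-SIV is not available in the JDK and is not implemented here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch of AEAD with the JDK's AES-GCM; not Tink and not the project's code.
final class AeadSketch {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns nonce || ciphertext+tag. The associated data is authenticated but not encrypted.
    static byte[] encrypt(SecretKey key, byte[] plaintext, byte[] associatedData) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce); // a fresh random nonce makes the output probabilistic
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            cipher.updateAAD(associatedData);
            byte[] ct = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(nonce.length + ct.length).put(nonce).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throws IllegalArgumentException if the ciphertext or associated data was altered.
    static byte[] decrypt(SecretKey key, byte[] message, byte[] associatedData) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, NONCE_BYTES));
            cipher.updateAAD(associatedData);
            return cipher.doFinal(message, NONCE_BYTES, message.length - NONCE_BYTES);
        } catch (AEADBadTagException e) {
            throw new IllegalArgumentException("authentication failed", e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] plaintext = "4111-1111-1111-1111".getBytes(StandardCharsets.UTF_8);
        byte[] context = "payments.creditCard".getBytes(StandardCharsets.UTF_8);

        byte[] first = encrypt(key, plaintext, context);
        byte[] second = encrypt(key, plaintext, context);
        System.out.println("same ciphertext twice: " + Arrays.equals(first, second));

        byte[] roundTrip = decrypt(key, first, context);
        System.out.println("round trip ok: " + Arrays.equals(plaintext, roundTrip));

        try {
            decrypt(key, first, "other.context".getBytes(StandardCharsets.UTF_8));
        } catch (IllegalArgumentException e) {
            System.out.println("wrong associated data rejected");
        }
    }
}
```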

  57. Keyset Management
    • within SMT config (not recommended)
    • externalized to separate file (okayish)
    • remote / cloud KMS (recommended)
    • preliminary Azure Key Vault support
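
How a keyset supports key rotation can be pictured with a small sketch. The assumptions are loud ones: the keyset here is just an in-memory map from an integer key id to an AES key plus a "primary" id, and each ciphertext is prefixed with the id of the key that produced it. This only conveys the general idea; it is not Tink's keyset format or API, nor how Kryptonite for Kafka stores or loads keys. In a real setup the key material would come from one of the sources listed above (ideally a KMS) rather than being generated in process.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical keyset with rotation; message layout: keyId (4 bytes) || nonce (12 bytes) || ciphertext+tag.
final class KeysetSketch {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private final Map<Integer, SecretKey> keys = new HashMap<>();
    private int primaryId = -1;

    // Adds a new key and makes it the primary; older keys stay available for decryption.
    int rotate() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            int id = keys.size() + 1;
            keys.put(id, gen.generateKey());
            primaryId = id;
            return id;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Always encrypts with the current primary key.
    byte[] encrypt(byte[] plaintext) {
        if (keys.isEmpty()) {
            throw new IllegalStateException("keyset is empty, call rotate() first");
        }
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, keys.get(primaryId), new GCMParameterSpec(TAG_BITS, nonce));
            byte[] ct = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(4 + nonce.length + ct.length)
                    .putInt(primaryId).put(nonce).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Picks the decryption key based on the id stored in the message prefix.
    byte[] decrypt(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        int id = buf.getInt();
        SecretKey key = keys.get(id);
        if (key == null) {
            throw new IllegalArgumentException("unknown key id " + id);
        }
        byte[] nonce = new byte[NONCE_BYTES];
        buf.get(nonce);
        byte[] ct = new byte[buf.remaining()];
        buf.get(ct);
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            return cipher.doFinal(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalArgumentException("decryption failed", e);
        }
    }

    static int keyIdOf(byte[] message) {
        return ByteBuffer.wrap(message).getInt();
    }

    public static void main(String[] args) {
        KeysetSketch keyset = new KeysetSketch();
        keyset.rotate();
        byte[] older = keyset.encrypt("before rotation".getBytes(StandardCharsets.UTF_8));
        keyset.rotate();
        byte[] newer = keyset.encrypt("after rotation".getBytes(StandardCharsets.UTF_8));

        System.out.println("key ids: " + keyIdOf(older) + ", " + keyIdOf(newer));
        System.out.println(new String(keyset.decrypt(older), StandardCharsets.UTF_8));
        System.out.println(new String(keyset.decrypt(newer), StandardCharsets.UTF_8));
    }
}
```

The design point is that rotation never invalidates data already sitting in a topic: new records are written with the new primary key, while records encrypted earlier remain decryptable for as long as their key stays in the keyset.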

  58. !
    Little Ideas
    !
    • wildcard / regex matching for field names
    • dynamic keyset selection based on payload
    • additional KMS providers (GCP, AWS, ...)
    @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022
    58

    View Slide

  59. !
    Bigger Ideas
    !
    • add cryptography options (e.g. FPE)
    • extend scope beyond Kafka Connect and ksqlDB
    • make CSFLC language / runtime agnostic
    @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022
    59

    View Slide

  60. @hpgrahsl
    Let's stay in touch
    !
    on Twitter
    @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022
    60

    View Slide

  61. !
    TRY IT
    "
    • Project Code
    https://bit.ly/current22-k4k
    • Demo Scenarios
    https://bit.ly/current22-demo
    @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022
    61

    View Slide

  62. Data should continue
    to be a valuable
    asset, not become
    a costly liability.
    @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022
    62

    View Slide

  63. @hpgrahsl | #Current22 - Austin, Texas | Oct 4-5, 2022

    View Slide

  64. Photo Credits
    in order of appearance
    (c) John Salvino - https://unsplash.com/photos/bqGBbLq_yfc
    (c) Wolf Zimmermann - https://unsplash.com/photos/6sf5rf8QYFE
    (c) Jason Leung - https://unsplash.com/photos/SAYzxuS1O3M
    (c) Dev Asangbam - https://unsplash.com/photos/sh9vkVbVgo
    (c) Keenan Constance - https://unsplash.com/photos/VTLcvV6UVaI
    (c) Steve Johnson - https://unsplash.com/photos/hokONTrHIAQ
    (c) Pete Linforth - https://pixabay.com/illustrations/biometrics-access-identification-4503187/
    (c) Miguel Á. Padriñán - https://www.pexels.com/photo/close-up-shot-of-keys-on-a-red-surface-2882687/
    (c) Camila Quintero Franco - https://unsplash.com/photos/mC852jACK1g
    (c) Gerd Altmann - https://pixabay.com/illustrations/board-excuse-me-excuse-discharge-1848736/
    (c) Vijaya narasimha - https://pixabay.com/photos/crevasse-sand-stone-hills-rock-399957/
    (c) Gerd Altmann - https://pixabay.com/photos/trust-man-hood-map-prompt-4321822/
    (c) Matheo JBT - https://unsplash.com/photos/HLhvZ9HRAwo
    (c) Rob Laughter - https://unsplash.com/photos/WW1jsInXgwM
    (c) Markus Spiske - https://unsplash.com/photos/iar-afB0QQw
    (c) Nerene Grobler - https://unsplash.com/photos/sLxcfdsqLQ
    (c) Wilhelm Gunkel - https://unsplash.com/photos/L04Kczg_Jvs
    (c) Matt Walsh - https://unsplash.com/photos/tVkdGtEe2C4
