
Role Based Access Control in Real-Time Streaming Data: What, Why and How (Hojjat Jafarpour, DeltaStream) | RTA Summit 2023

Data streaming platforms such as Apache Kafka and AWS Kinesis have become a foundational part of real-time data processing. Because streaming data plays an increasingly important role in mission-critical applications, these systems must keep that data secure. Role-Based Access Control (RBAC) is one of the most common ways to secure data in motion. The access control privileges defined in an RBAC service determine which roles can access and perform operations on specific resources. In this talk, we first present the state of the art in Role-Based Access Control for streaming data in platforms such as Apache Kafka and AWS Kinesis. We then discuss the shortcomings of the current solutions and present a novel approach that brings the RBAC concepts of relational systems to the data-in-motion space, and we explain how it addresses those shortcomings.

Attendees will learn about the state of the art in security and Role-Based Access Control in data streaming technologies and understand the shortcomings and challenges of these approaches. They will also learn a novel approach that they can use in their organizations to secure access to streaming data regardless of which system stores it, whether that is Apache Kafka, AWS Kinesis or a hybrid of these systems.

StarTree

May 23, 2023

Transcript

  1. Role Based Access Control in Real-Time Streaming Data: What, Why and How
    Hojjat Jafarpour, Founder & CEO @ DeltaStream, Inc. @hojjat
  2. Streaming Storage
    • Apache Kafka, AWS Kinesis, Apache Pulsar, ...
    • Backbone of streaming data
    • Decouple producers and consumers
    • Make data available in real-time (low latency)
    [Diagram: Streaming Storage (Kafka, Kinesis, …)]
  3. Access Control
    • Access Control Lists (ACLs)
      ◦ Good for smaller scale
    • Role-based Access Control (RBAC)
      ◦ Access control privileges
      ◦ Roles
      ◦ Resources
      ◦ Privileges determine which role can access and perform operations on specific resources.
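The RBAC vocabulary on this slide (roles, resources, privileges) can be illustrated with a minimal Python sketch. It is not taken from the talk or from any product; the class `Rbac`, its method names, and the sample principal, role and resource strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rbac:
    """Toy RBAC store: privileges attach to roles, roles attach to principals."""

    # role name -> set of (operation, resource) privileges (hypothetical layout)
    role_privileges: dict = field(default_factory=dict)
    # principal name -> set of role names
    principal_roles: dict = field(default_factory=dict)

    def grant_privilege(self, role: str, operation: str, resource: str) -> None:
        self.role_privileges.setdefault(role, set()).add((operation, resource))

    def assign_role(self, principal: str, role: str) -> None:
        self.principal_roles.setdefault(principal, set()).add(role)

    def is_allowed(self, principal: str, operation: str, resource: str) -> bool:
        # A principal is allowed if any of its roles holds the privilege.
        return any(
            (operation, resource) in self.role_privileges.get(role, set())
            for role in self.principal_roles.get(principal, set())
        )


rbac = Rbac()
rbac.grant_privilege("analyst", "read", "topic:orders")
rbac.assign_role("alice", "analyst")
```

With an ACL, each principal would be listed on each resource directly; here a new user only needs a role assignment, which is why the role indirection tends to scale better as users and resources grow.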
  4. Example1: Confluent Cloud
    • Allows access control to
      ◦ Organization, environment, cluster, granular Kafka resources (topics, consumer groups, and transactional IDs), Schema Registry (SR) and ksqlDB resources
    https://docs.confluent.io/cloud/current/access-management/access-control/cloud-rbac.html
  5. Example1: Confluent Cloud
    • Allows access control to
      ◦ Organization, environment, cluster, granular Kafka resources (topics, consumer groups, and transactional IDs), Schema Registry (SR) and ksqlDB resources
    • Roles are predefined
      ◦ As of now there are 13 available roles
        ▪ OrganizationAdmin, EnvironmentAdmin, CloudClusterAdmin, Operator, ...
    https://docs.confluent.io/cloud/current/access-management/access-control/cloud-rbac.html
  6. Example1: Confluent Cloud
    • Allows access control to
      ◦ Organization, environment, cluster, granular Kafka resources (topics, consumer groups, and transactional IDs), Schema Registry (SR) and ksqlDB resources
    • Roles are predefined
      ◦ As of now there are 13 available roles
        ▪ OrganizationAdmin, EnvironmentAdmin, CloudClusterAdmin, Operator, ...
    • Each role has View Scope and Admin Scope
    https://docs.confluent.io/cloud/current/access-management/access-control/cloud-rbac.html
  7. Example1: Confluent Cloud
    • Role bindings
      ◦ Permissions for principals
        ▪ Which roles are assigned to a given user
        ▪ e.g., grant permissions to a new user
  8. Example1: Confluent Cloud
    • Role bindings
      ◦ Permissions for principals
        ▪ Which roles are assigned to a given user
        ▪ e.g., grant permissions to a new user
      ◦ Permissions on resources
        ▪ Which roles can access a given resource
        ▪ e.g., grant permissions on a resource such as a new cluster to roles
  9. Example1: Confluent Cloud
    • Example role bindings

        confluent iam rbac role-binding create --principal User:u-a03bcd --role CloudClusterAdmin \
          --environment env-nx5jd --cloud-cluster lkc-xyxmz

        +--------------+-------------------+
        | Principal    | User:u-a03bcd     |
        | Role         | CloudClusterAdmin |
        | ResourceType | Cluster           |
        +--------------+-------------------+
  10. Example1: Confluent Cloud
    • Example role bindings

        confluent iam rbac role-binding create --principal User:u-e03vqq --role ResourceOwner \
          --environment env-nx5jd --cloud-cluster lkc-xyxmz --kafka-cluster-id lkc-xyxmz \
          --resource Topic:connect-config

        +----------------+----------------+
        | Principal      | User:u-e03vqq  |
        | Email          |                |
        | Role           | ResourceOwner  |
        | Environment    |                |
        | CloudCluster   |                |
        | ClusterType    |                |
        | LogicalCluster |                |
        | ResourceType   | Topic          |
        | Name           | connect-config |
        | PatternType    | LITERAL        |
        +----------------+----------------+
  11. Example1: Confluent Cloud
    • Need to use Confluent Cloud Console, CLI or API
    • Confluent-only commands
    • Limited roles available
  12. Example2: AWS MSK
    • AWS MSK uses AWS Identity and Access Management (IAM) for authentication and authorization
  13. Example2: AWS MSK
    • AWS MSK uses AWS Identity and Access Management (IAM) for authentication and authorization
    • Uses Authorization Policies
      ◦ Specifies which actions to allow or deny on a resource for a role
  14. Example2: AWS MSK
    • There is a list of available actions
      ◦ kafka-cluster:Connect, kafka-cluster:DescribeCluster, ...
      ◦ Some actions depend on others
        ▪ Actions with dependencies should include those dependencies
  15. Example2: AWS MSK
    • There is a list of available actions
      ◦ kafka-cluster:Connect, kafka-cluster:DescribeCluster, ...
      ◦ Some actions depend on others
        ▪ Actions with dependencies should include those dependencies
    • Four types of resources that can be used in authorization policy
      ◦ Cluster, Topic, Group and Transactional ID
  16. Example2: AWS MSK

        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "kafka-cluster:Connect",
                "kafka-cluster:AlterCluster",
                "kafka-cluster:DescribeCluster"
              ],
              "Resource": [
                "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1"
              ]
            }
          ]
        }
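The point that dependent actions must be listed together can be sketched in Python. This is only an illustration under stated assumptions: the `REQUIRED` table covers just the three actions from the example policy and reflects our reading of the MSK documentation (check it against the current docs before relying on it), and the helper names `with_dependencies` and `build_policy` are hypothetical. The policy language version string `2012-10-17` is the current value defined by IAM.

```python
import json

# Illustrative subset only: action -> actions it is assumed to require.
REQUIRED = {
    "kafka-cluster:Connect": [],
    "kafka-cluster:DescribeCluster": ["kafka-cluster:Connect"],
    "kafka-cluster:AlterCluster": [
        "kafka-cluster:Connect",
        "kafka-cluster:DescribeCluster",
    ],
}


def with_dependencies(actions):
    """Return the actions plus everything they require, dependencies first."""
    ordered = []

    def visit(action):
        for dep in REQUIRED.get(action, []):
            visit(dep)
        if action not in ordered:
            ordered.append(action)

    for action in actions:
        visit(action)
    return ordered


def build_policy(actions, resource_arn):
    """Build an Allow policy document as a plain dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": with_dependencies(actions),
                "Resource": [resource_arn],
            }
        ],
    }


policy = build_policy(
    ["kafka-cluster:AlterCluster"],
    "arn:aws:kafka:us-east-1:0123456789012:cluster/MyTestCluster/abcd1234-0123-abcd-5678-1234abcd-1",
)
print(json.dumps(policy, indent=2))
```

Requesting only `kafka-cluster:AlterCluster` yields a statement that also lists `kafka-cluster:Connect` and `kafka-cluster:DescribeCluster`, matching the three actions in the slide's policy.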
  17. Example2: AWS MSK
    • Need to use AWS IAM
    • AWS IAM requests only
    • Can be used through AWS Management Console, the API, or the AWS CLI
  18. RBAC in Relational Systems
    • The most familiar model for data at rest
    • Widely used and large user base
      ◦ From open source relational databases such as Postgres to commercial cloud data warehouses
  19. RBAC in Relational Systems
    • The most familiar model for data at rest
    • Widely used and large user base
      ◦ From open source relational databases such as Postgres to commercial cloud data warehouses
    • Why not use the same model for data in motion (streaming data)?
  20. Example
    [Diagram: CC_1, MSK_1, orders, customers]
    • Two Kafka clusters, one on Confluent Cloud (CC_1) and one on AWS (MSK_1)
    • Two topics, orders and customers
  21. Hierarchical Namespacing
    • Same as relational systems
      ◦ Relation: static or dynamic set of tuples
        ▪ STREAM
        ▪ CHANGELOG
        ▪ MATERIALIZED VIEW
        ▪ TABLE
  22. Hierarchical Namespacing
    • Same as relational systems
      ◦ Relation: static or dynamic set of tuples
        ▪ STREAM
        ▪ CHANGELOG
        ▪ MATERIALIZED VIEW
        ▪ TABLE
      ◦ Schema: a logical grouping of relational objects such as Streams, Changelogs, Materialized Views, and Tables.
  23. Hierarchical Namespacing
    • Same as relational systems
      ◦ Relation: static or dynamic set of tuples
        ▪ STREAM
        ▪ CHANGELOG
        ▪ MATERIALIZED VIEW
        ▪ TABLE
      ◦ Schema: a logical grouping of relational objects such as Streams, Changelogs, Materialized Views, and Tables.
      ◦ Database: a logical grouping of schemas.
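The database / schema / relation hierarchy implies three-part names such as `online_store.public.enriched_orders`, which appears in the GRANT examples later in the deck. A minimal Python sketch of splitting such a name follows; the `QualifiedName` type and `parse_qualified_name` function are hypothetical, and the sketch assumes plain dot-separated identifiers with no quoting.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    database: str  # logical grouping of schemas
    schema: str    # logical grouping of relations
    relation: str  # stream, changelog, materialized view, or table


def parse_qualified_name(name: str) -> QualifiedName:
    """Split 'database.schema.relation'; quoted identifiers are not handled."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected database.schema.relation, got {name!r}")
    return QualifiedName(*parts)


qualified = parse_qualified_name("online_store.public.enriched_orders")
```

Because every relation resolves to a database and a schema, a privilege can be attached at any of the three levels rather than only to individual topics.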
  24. Example
    [Diagram: CC_1, MSK_1, orders, customers, enriched_orders]
    • Run continuous queries and build materialized views
      ◦ Create enriched_orders by joining orders with customers.
      ◦ Build a materialized view to compute daily order value per customer
  25. Key Concepts
    • User: represents an authenticated identity.
    • Objects (Securable Objects): represents an entity to which access can be granted.
  26. Key Concepts
    • User: represents an authenticated identity.
    • Objects (Securable Objects): represents an entity to which access can be granted.
    • Privilege: is a defined level of access to an object.
  27. Key Concepts
    • User: represents an authenticated identity.
    • Objects (Securable Objects): represents an entity to which access can be granted.
    • Privilege: is a defined level of access to an object.
    • Role: is an entity to which privileges can be granted.
  28. RBAC for Streaming Data
    • Use the same SQL syntax as the relational systems
      ◦ CREATE/DROP/ALTER
      ◦ GRANT/REVOKE
  29. Example
    • Create a new role to access the enriched_orders stream and daily_spent materialized view

        CREATE ROLE test_role_1;
        GRANT ROLE test_role_1 TO USER user_1;
        GRANT SELECT ON RELATION online_store.public.enriched_orders TO ROLE test_role_1;
        GRANT SELECT ON RELATION online_store.public.daily_spent TO ROLE test_role_1;
  30. Example
    • Create a new role to access all objects in online_store database

        CREATE ROLE test_role_2;
        GRANT ROLE test_role_2 TO USER user_2;
        GRANT USAGE ON DATABASE online_store TO ROLE test_role_2;
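To make the effect of the two GRANT examples concrete, here is a toy Python model of how such grants might be evaluated. It is a simplified sketch, not DeltaStream's implementation: the `Catalog` class and its methods are hypothetical, and it assumes, following the slide's wording, that `USAGE` on a database is enough to read every relation inside it. Real systems may require additional privileges at the schema or relation level.

```python
class Catalog:
    """Toy evaluator for role grants over database.schema.relation names."""

    def __init__(self):
        self.user_roles = {}  # user -> set of role names
        self.grants = {}      # role -> set of (privilege, object name)

    def grant_role(self, role, user):
        self.user_roles.setdefault(user, set()).add(role)

    def grant(self, privilege, obj, role):
        self.grants.setdefault(role, set()).add((privilege, obj))

    def can_select(self, user, relation):
        database = relation.split(".")[0]
        for role in self.user_roles.get(user, ()):
            held = self.grants.get(role, set())
            # Direct SELECT on the relation, or (assumed) database-wide USAGE.
            if ("SELECT", relation) in held or ("USAGE", database) in held:
                return True
        return False


catalog = Catalog()

# Mirrors slide 29: relation-level grants for test_role_1.
catalog.grant_role("test_role_1", "user_1")
catalog.grant("SELECT", "online_store.public.enriched_orders", "test_role_1")
catalog.grant("SELECT", "online_store.public.daily_spent", "test_role_1")

# Mirrors slide 30: database-level grant for test_role_2.
catalog.grant_role("test_role_2", "user_2")
catalog.grant("USAGE", "online_store", "test_role_2")
```

In this model `user_1` can read only the two relations named in the grants, while `user_2` can read any relation whose name starts with `online_store`, regardless of which streaming store backs it.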
  31. Relational RBAC for Streaming Data
    • Works across streaming stores
      ◦ Apache Kafka, AWS Kinesis, …
    • Familiar syntax
      ◦ No need to learn new syntax/concepts
  32. Relational RBAC for Streaming Data
    • Works across streaming stores
      ◦ Apache Kafka, AWS Kinesis, …
    • Familiar syntax
      ◦ No need to learn new syntax/concepts
    • Hierarchical namespacing
  33. Relational RBAC for Streaming Data
    • Works across streaming stores
      ◦ Apache Kafka, AWS Kinesis, …
    • Familiar syntax
      ◦ No need to learn new syntax/concepts
    • Hierarchical namespacing
    • Unified view of all streaming data by abstracting the streaming stores
  34. Relational RBAC for Streaming Data
    • Works across streaming stores
      ◦ Apache Kafka, AWS Kinesis, …
    • Familiar syntax
      ◦ No need to learn new syntax/concepts
    • Hierarchical namespacing
    • Unified view of all streaming data by abstracting the streaming stores
    • No limitations on roles
      ◦ Can build as many custom roles as needed
  35. We have built exactly this at DeltaStream
    Try all of these and much more for free at www.deltastream.io