Understanding Streaming Data and Analytics with Apache Kafka®

Ricardo Ferreira

October 01, 2020

Transcript

  1. Understanding Streaming Data and Analytics with Apache Kafka®
    @riferrei | @apachekafka | @elastic
  2. About me: Ricardo Ferreira
    • Developer advocate • Elastic community team • Kafka Summit PC member • riferrei@elastic.co • riferrei@riferrei.com
  3. (No text on this slide.)
  4. (No text on this slide.)
  5. Origins of Apache Kafka
    “There were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle continuous flows of data.” – Jay Kreps
  6. (No text on this slide.)
  7. Event-driven architecture
    Job change → recommendation engine, search engine, email service
  8. Implement with a database
    The recommendation engine, search engine, and email service each use SQL against a database that holds the log.
  9. Databases can't handle events
    Transactional events plus non-transactional events (1000x more volume), all flowing into the database log.
  10. Databases 30 years ago...

  11. Databases these days

  12. Databases are limited

  13. Limited? Are you kidding me?

  14. Are databases limited? Yes, they are. Why do we have to move data from one DB to another just for analytics?
  15. What then?

  16. “The truth is the log. The database is a cache of a subset of the log.” – Pat Helland, Immutability Changes Everything
    http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
  17. Log as a first-class citizen
    Writes are appended to the log (offsets 0 to 8) and reads come from the log; the database sits alongside it. Destination system A is at time = 1, destination system B is at time = 3.
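A minimal sketch of the log-as-first-class-citizen idea, assuming a single in-memory list instead of Kafka's partitioned, replicated, on-disk log. The class names `Log` and `Reader` are hypothetical. It shows that appends never change earlier entries and that each destination system keeps its own position, like systems A and B on the slide.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """A toy append-only log: records are only ever added at the end."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        # A record's offset is simply its position in the log.
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        # Reads never modify the log, so any number of readers can share it.
        return self.records[offset:offset + max_records]


@dataclass
class Reader:
    """A hypothetical destination system that tracks its own position."""
    log: Log
    position: int = 0

    def poll(self, max_records: int = 10) -> List[Any]:
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch


log = Log()
for i in range(9):  # offsets 0 to 8, as on the slide
    log.append(f"event-{i}")

system_a = Reader(log)
system_b = Reader(log)
system_a.poll(max_records=1)  # system A has caught up to "time = 1"
system_b.poll(max_records=3)  # system B has caught up to "time = 3"
print(system_a.position, system_b.position)  # 1 3
```

Because the readers only hold an offset, a slow system never blocks a fast one, and a new system can start from offset 0 and replay everything.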
  18. Solution: build a commit log
    Connected through the commit log: user tracking, historical data, operational metrics, NoSQL database, graph database, SQL database, ..., Hadoop, Elasticsearch, Grafana, machine learning, recommendation engine, search, security, email, social graph, microservices
  19. Origins of Apache Kafka
    “We’ve come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be.” – Jay Kreps
  20. Origins of Apache Kafka
    Databases, messaging, batch: expensive, time consuming, difficult to scale, no persistence after consumption, no replay
    Desired properties: highly scalable, durable, persistent, ordered, fast (low latency)
  21. Origins of Apache Kafka
    Databases, messaging, batch: expensive, time consuming, difficult to scale, no persistence after consumption, no replay
    Highly scalable, durable, persistent, ordered, fast (low latency) → distributed commit log
  22. Origins of Apache Kafka
    Databases, messaging, batch: expensive, time consuming, difficult to scale, no persistence after consumption, no replay
    Highly scalable, durable, persistent, ordered, fast (low latency), plus stream processing, continuous flows, and scalable integration → distributed streaming platform
  23. Origins of Apache Kafka
    “The ability to combine these three areas – to bring all the streams of data together across all the use cases – is what makes the idea of a streaming platform so appealing to people.” – Jay Kreps
  24. (No text on this slide.)

  25. Distributed streaming platform
    01 Data streams with messaging
    02 Data analytics with stream processing
    03 Sophisticated storage system
  26. Data streams with messaging

  27. Messaging as you know it
    The producer writes to the broker; the broker pushes to the consumer.
  28. Kafka does messaging differently
    The producer writes to the broker; the consumer pulls from the broker.
  29. Kafka does messaging differently
    Groups 1, 2, and 3 each pull from the broker: queueing within a group, pub/sub across groups.
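A small simulation of the two delivery models on the slide, assuming plain lists instead of a broker. The function `deliver` and the group and consumer names are hypothetical. Every group receives the full stream (pub/sub), while inside a group each record goes to exactly one consumer (queueing). Spreading records round-robin is a simplification; Kafka actually hands whole partitions to consumers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def deliver(records: List[str],
            groups: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Return what each (group, consumer) pair receives.

    Across groups: every group sees every record (pub/sub).
    Within a group: each record goes to one consumer (queueing).
    """
    received: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for group, consumers in groups.items():
        for i, record in enumerate(records):
            consumer = consumers[i % len(consumers)]  # simplified round-robin
            received[(group, consumer)].append(record)
    return dict(received)


records = ["r0", "r1", "r2", "r3"]
groups = {
    "group-1": ["c1"],        # a single consumer gets everything
    "group-2": ["c1", "c2"],  # two consumers share the work
    "group-3": ["c1"],
}
result = deliver(records, groups)
print(result[("group-2", "c1")], result[("group-2", "c2")])  # ['r0', 'r2'] ['r1', 'r3']
```

Adding a group costs the others nothing, because each group only needs its own position in the same stored data.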
  30. Kafka does messaging differently
    A topic holding records 0 to 7 is split into Partition 1 (records 0 to 3) and Partition 2 (records 4 to 7).
  31. Kafka does messaging differently
    Partition 1 (records 0 to 3), Partition 2 (records 4 to 7), Partition 3 (records 8 and 9). The producer writes; three consumers each pull.
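A sketch of how the three partitions might be split among the consumers of one group, assuming a simple round-robin strategy. Kafka's built-in assignors (for example the range and round-robin strategies) differ in detail, and `assign_partitions` is a hypothetical helper. The key property it shows: each partition has exactly one owner in the group, so a consumer beyond the partition count sits idle.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin the partitions of one topic over the consumers of one group."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


three = assign_partitions([1, 2, 3], ["consumer-a", "consumer-b", "consumer-c"])
four = assign_partitions([1, 2, 3], ["consumer-a", "consumer-b", "consumer-c", "consumer-d"])
print(three)               # {'consumer-a': [1], 'consumer-b': [2], 'consumer-c': [3]}
print(four["consumer-d"])  # []
```

This is why the partition count caps the parallelism of a single consumer group.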
  32. Kafka does messaging differently
    Same three partitions; the producer writes a record with key 002.
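A minimal sketch of key-based partitioning: hash the key, then take it modulo the number of partitions. Note the swapped-in hash: `zlib.crc32` is used here only because it is in the standard library, whereas Kafka's default partitioner hashes the serialized key with murmur2, so real partition numbers will differ. `partition_for` is a hypothetical name.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition: hash(key) modulo partition count.

    zlib.crc32 is a stand-in; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


p = partition_for("002", 3)
# The same key always lands in the same partition, which is what keeps
# all records for key 002 in order relative to each other.
print(p == partition_for("002", 3), 0 <= p < 3)  # True True
```

One consequence of this design: changing the number of partitions changes the modulo, so existing keys can map to different partitions afterwards.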
  33. Kafka does messaging differently
    The producer serializes records to bytes and writes them; the consumer pulls the bytes and deserializes them.
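The broker stores values as opaque bytes, so producer and consumer must agree on a format. A sketch using JSON, assuming a hypothetical `PageView` event type; formats such as Avro or Protobuf are common alternatives and plug into the same serialize/deserialize boundary.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PageView:  # hypothetical event type for illustration
    user_id: str
    url: str


def serialize(event: PageView) -> bytes:
    """Producer side: turn an object into the bytes the broker stores."""
    return json.dumps(asdict(event)).encode("utf-8")


def deserialize(data: bytes) -> PageView:
    """Consumer side: turn the stored bytes back into an object."""
    return PageView(**json.loads(data.decode("utf-8")))


payload = serialize(PageView(user_id="002", url="/jobs"))
print(type(payload).__name__)  # bytes
print(deserialize(payload))    # PageView(user_id='002', url='/jobs')
```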
  34. Kafka does messaging differently: data is always persistent
    The producer writes to the broker, which keeps partitions of 250 GB, 250 GB, and 500 GB on disk.
  35. Data analytics with stream processing

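  36. How to process data streams?
    1) The consumer pulls records from the broker. 2) It processes them, e.g. by conditions such as “number of records < 4” and “number of records > 5” (the diagram shows the values 12 and 9). 3) It writes the results back to the broker.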
  37. How to process data streams?
    Same flow between consumer and broker: 1) pull, 2) process (“number of records < 4”, “number of records > 5”), 3) write. What if we could have a processing layer for the data streams?
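The pull, process, write loop from these two slides, sketched in plain Python without a Kafka client. The field name `number_of_records` and the output topic names are hypothetical; the thresholds mirror the slide. Stream processors such as Kafka Streams or ksqlDB exist so that this loop does not have to be hand-written in every consumer.

```python
from typing import Dict, Iterable, List


def process_stream(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Route each record to a hypothetical output topic by threshold."""
    topics: Dict[str, List[dict]] = {"fewer-than-4": [], "more-than-5": []}
    for record in records:                        # 1) pull
        n = record["number_of_records"]
        if n < 4:                                 # 2) process
            topics["fewer-than-4"].append(record)  # 3) write
        elif n > 5:
            topics["more-than-5"].append(record)
        # records with 4 or 5 match neither condition and are dropped
    return topics


out = process_stream([{"number_of_records": n} for n in (2, 4, 7, 12)])
print([r["number_of_records"] for r in out["more-than-5"]])  # [7, 12]
```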
  38. Using stream processors
    The producer writes to the broker and the consumer pulls from it, with stream processors working on the data streams in between.
  39. Using stream processors: Kafka Streams

  40. Using stream processors: ksqlDB

  41. Scalable data integration
    Broker, stream processors, and connectors
  42. Sophisticated storage system

  43. Kafka as a storage system: elastic storage
    Broker 1: 250 GB + 250 GB + 500 GB = 1 TB
    Broker 2: 500 GB + 500 GB + 500 GB = 1.5 TB
    Cluster storage: 2.5 TB
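The capacity arithmetic from the slide, as a short check. Partition sizes are taken from the slide and use the slide's convention of 1 TB = 1000 GB; `cluster_capacity` is a hypothetical helper.

```python
from typing import Dict, List

# Partition sizes in GB per broker, from the slide.
brokers: Dict[str, List[int]] = {
    "broker-1": [250, 250, 500],
    "broker-2": [500, 500, 500],
}


def cluster_capacity(layout: Dict[str, List[int]]) -> Dict[str, int]:
    """Per-broker storage plus the cluster total, in GB."""
    totals = {name: sum(sizes) for name, sizes in layout.items()}
    totals["cluster"] = sum(totals.values())
    return totals


print(cluster_capacity(brokers))
# {'broker-1': 1000, 'broker-2': 1500, 'cluster': 2500}
```

Adding another broker entry to `brokers` raises the cluster total without touching the existing ones, which is the sense in which the storage is elastic.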
  44. Kafka as a storage system: partition-level replication
    Broker 1 (1 TB) and Broker 2 (1.5 TB) hold Partition 1 and Partition 2; Partition 2 has a replica on both brokers.
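A sketch of partition-level replica placement, assuming simple round-robin with a shifted starting broker per partition. Kafka's own assignment logic differs in detail (for example, it can spread replicas across racks); `place_replicas` is hypothetical. Like Kafka, it refuses a replication factor larger than the number of brokers, since two replicas of one partition on the same broker would add no fault tolerance.

```python
from typing import Dict, List


def place_replicas(num_partitions: int, brokers: List[str],
                   replication_factor: int) -> Dict[int, List[str]]:
    """Put each partition's replicas on distinct brokers (simplified)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


print(place_replicas(2, ["broker-1", "broker-2"], 2))
# {0: ['broker-1', 'broker-2'], 1: ['broker-2', 'broker-1']}
```

Shifting the start per partition spreads the first replica of each partition across brokers instead of piling them all on one.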
  45. Kafka as a storage system: constant-time performance
    A consumer polling 100 records from a 5 KB commit log spends 1 ms; a consumer polling 100 records from a 5 TB commit log also spends 1 ms.
  46. Kafka as a storage system: optimized for massive reads
    Broker 1 (partitions of 250 GB, 250 GB, and 500 GB; 1 TB storage) serves Partition 1 and Partition 2 to the consumer from the pagecache through the NIC.
    Kafka uses the sendfile API (zero-copy), so data moves from the pagecache to the socket inside the kernel. This skips the copy from kernel space into a user-space buffer and the copy from that buffer back into a kernel socket buffer.
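Python exposes the same system call, which makes it easy to try the zero-copy path locally. A minimal sketch, assuming a local socket pair stands in for the broker-to-consumer connection: `socket.sendfile()` uses `os.sendfile()` where the platform supports it and otherwise falls back to a regular read/send loop. The helpers `send_file` and `demo` are hypothetical.

```python
import os
import socket
import tempfile
from typing import Tuple


def send_file(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket.

    Where os.sendfile() is available, the kernel moves bytes from the page
    cache to the socket without a user-space buffer.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo(payload: bytes) -> Tuple[int, bytes]:
    # Keep the payload small: this single thread sends everything before
    # reading, so it must fit in the socket buffer.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file(path, sender)
        sender.shutdown(socket.SHUT_WR)  # signal end of data
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)


sent, received = demo(b"kafka" * 200)
print(sent, received == b"kafka" * 200)  # 1000 True
```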
  47. Kafka as a storage system: file management in Kafka
    Each partition (Partition 0, 1, 2) is split into segments (Segment 0, Segment 1, Segment 2, ...); each segment has a log file and an index file, e.g. 0000Seg1.log and 0000Seg1.index.
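Segments help explain the constant-time reads from slide 45: to serve an offset, the broker only needs to find the right segment and then look inside it. Real segment files are named after their base offset, zero-padded to 20 digits, and each segment's .index file maps offsets to byte positions in the .log file. A sketch of the segment lookup using binary search, assuming hypothetical base offsets; `find_segment` and `segment_file_name` are illustrative names.

```python
import bisect
from typing import Sequence


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets: Sequence[int], offset: int) -> int:
    """Return the base offset of the segment containing `offset`.

    `base_offsets` must be sorted ascending (one entry per segment).
    Binary search keeps the lookup cheap however many segments exist.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise KeyError(f"offset {offset} precedes the first segment")
    return base_offsets[i]


segments = [0, 1000, 2000]  # hypothetical segment base offsets
base = find_segment(segments, 1500)
print(base, segment_file_name(base), segment_file_name(base, ".index"))
# 1000 00000000000000001000.log 00000000000000001000.index
```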
  48. Putting the pieces together

  49. Streaming Pac-Man

  50. Streaming Pac-Man
    API Gateway → Lambda function → Kafka (MSK) → ksqlDB (ECS) → Kafka (MSK) → scoreboard
    https://github.com/riferrei/streaming-pacman-aws
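In the demo, ksqlDB turns the stream of game events into a scoreboard. A plain-Python analogue of that kind of running aggregation, assuming hypothetical event fields `user` and `score` and keeping each player's best score; this is not the code from the linked repository.

```python
from typing import Dict, Iterable, List, Tuple


def update_scoreboard(board: Dict[str, int], events: Iterable[dict]) -> Dict[str, int]:
    """Fold game events into each player's best score (a running aggregate)."""
    for event in events:
        player, score = event["user"], event["score"]  # hypothetical fields
        board[player] = max(score, board.get(player, score))
    return board


def top_players(board: Dict[str, int], n: int = 3) -> List[Tuple[str, int]]:
    """Highest scores first; ties broken by name for a stable order."""
    return sorted(board.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


board: Dict[str, int] = {}
update_scoreboard(board, [
    {"user": "ana", "score": 100},
    {"user": "bob", "score": 250},
    {"user": "ana", "score": 300},
])
print(top_players(board))  # [('ana', 300), ('bob', 250)]
```

Because the aggregate is updated per event, the scoreboard is always current without re-reading the whole history.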
  51. Streaming Pac-Man
    1. Get the game
    2. Name yourself
  52. Making data available

  53. From Kafka to the world
    The scoreboard is pushed to a Redis cache, which a Lambda function behind API Gateway reads.
  54. From Kafka to the world
    The scoreboard is pushed to a Redis cache, which a Lambda function reads for Amazon Alexa.
  55. From Kafka to the world
    Your code pulls from ksqlDB (ECS), which runs on Kafka (MSK).
  56. How can I learn more?

  57. Use professional books

  58. Thank you