Boston Meetup, Feb 2020

Ricardo Ferreira

February 19, 2020

Transcript

  1. About me @riferrei | @kafkameetup | @CONFLUENTINC

    • Ricardo Ferreira
    • Works for Confluent
    • Developer advocate
    • [email protected]
    • https://riferrei.net
  2. Origins of Apache Kafka

    “There were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle continuous flows of data.” – Jay Kreps
  3. First realization

    Event: I changed my job from Oracle to Confluent
    State: I work at Confluent
  4. Let’s implement this!

    [Diagram labels: database; LOG; SQL (x3); recommendation engine; search engine; email service]
  5. Second realization

    [Diagram labels: database; LOG; transactional events; non-transactional events; 1000x more volume]
  6. Are databases limited?

    Yes, they are. Why do we have to move data from one DB to another just to do analytics?
  7. Shared state = more DBs

    [Diagram labels: business line 1; business line 2; business line 3]
  8. Third realization

    [Diagram labels: user tracking; historical data; operational metrics; NoSQL database; graph database; SQL database; microservices; ...; Hadoop; Elasticsearch; Grafana; machine learning; rec. engine; search; security; email; social graph]
  9. “The truth is the log. The database is a cache of a subset of the log.” – Pat Helland

    Immutability Changes Everything: http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
  10. Log as first-class citizen

    [Diagram labels: database; LOG with positions 0 1 2 3 4 5 6 7 8; reads; writes; Destination System A (time = 1); Destination System B (time = 3)]
  11. Solution: build a commit log

    [Diagram labels: commit log; user tracking; historical data; operational metrics; NoSQL database; graph database; SQL database; microservices; ...; Hadoop; Elasticsearch; Grafana; machine learning; rec. engine; search; security; email; social graph]
  12. Streams and tables duality

    Stream:
    {"user":"riferrei","score":"1001"}
    {"user":"riferrei","score":"1002"}
    {"user":"riferrei","score":"1003"}
    {"user":"riferrei","score":"1004"}
    {"user":"riferrei","score":"1005"}

    Table:
    {"user":"riferrei","score":"1005"}
  13. Origins of Apache Kafka

    “We’ve come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be.” – Jay Kreps
  14. Origins of Apache Kafka

    [Slide labels: databases; messaging; batch; expensive; time consuming; difficult to scale; no persistence after consumption; no replay; highly scalable; durable; persistent; ordered; fast (low latency)]
  15. Origins of Apache Kafka

    [Slide labels: databases; messaging; batch; expensive; time consuming; difficult to scale; no persistence after consumption; no replay; highly scalable; durable; persistent; ordered; fast (low latency); distributed commit log]
  16. Origins of Apache Kafka

    [Slide labels: databases; messaging; batch; expensive; time consuming; difficult to scale; no persistence after consumption; no replay; highly scalable; durable; persistent; ordered; fast (low latency); stream processing; continuous flows; scalable integration; distributed streaming platform]
  17. Origins of Apache Kafka

    “The ability to combine these three areas – to bring all the streams of data together across all the use cases – is what makes the idea of a streaming platform so appealing to people.” – Jay Kreps
  18. Complete scoreboard

    [Diagram labels: USER_GAME; USER_losses; Stats_per_user; losses_per_user; SCOREBOARD; stages: storage, process, storage, process, storage]
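
The ideas on slides 10 and 12 (a log read independently from different positions, and a table derived from a stream) can be illustrated with a small Python sketch. This is not from the deck and does not use Kafka: it assumes an in-memory list standing in for the log, and the names `CommitLog`, `read_from`, and `materialize` are hypothetical, chosen only for illustration. The score records are the ones shown on slide 12.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CommitLog:
    """Hypothetical in-memory append-only log (not a Kafka API)."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, record: Dict[str, Any]) -> int:
        """Append a record and return its offset (position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Dict[str, Any]]:
        """Return every record at or after the given offset."""
        return self.records[offset:]


def materialize(stream: List[Dict[str, Any]], key: str) -> Dict[Any, Dict[str, Any]]:
    """Fold a stream into a table: the latest record per key wins."""
    table: Dict[Any, Dict[str, Any]] = {}
    for record in stream:
        table[record[key]] = record
    return table


log = CommitLog()
for score in ["1001", "1002", "1003", "1004", "1005"]:
    log.append({"user": "riferrei", "score": score})

# Two destination systems read the same log from their own positions,
# without affecting each other or the writer.
system_a = log.read_from(1)
system_b = log.read_from(3)

# The table is a view derived from the stream; replaying the log rebuilds it.
table = materialize(log.read_from(0), key="user")
print(table)
```

In this sketch the stream keeps all five score events, while the table holds only the latest score per user. Because the log is never modified, the table can be rebuilt at any time by replaying it from offset 0.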