Calculating the Value of Pie: Real-Time Survey Analysis With Apache Kafka (Danica Fine, Confluent) | RTA Summit 2023

These days, it’s difficult to keep up with friends and family, even more so when important things are involved––like pie. Do you find yourself wishing that you could quickly and efficiently poll your loved ones, peers, and colleagues for their pastry preferences? Look no further. This talk explores how to create an interactive, real-time survey bot with Telegram and Apache Kafka® and how to analyze the survey responses using ksqlDB.

Kafka Producers and Consumers are the main ingredients that come together to make an interactive Telegram bot that issues on-demand surveys to friends and family. Survey responses are written to Kafka in real time, where ksqlDB helps to measure, mix, and knead the data––serving up a fresh and delicious result that we can consume immediately! In this session, you’ll see the entire pipeline from start to finish, learning how to think about data and use schemas, how to configure Producers and Consumers, and how to use ksqlDB for both stateless and stateful processing.

Come along to see how this recipe for real-time survey analysis comes together––and how you, too, can calculate the value of pie!

StarTree

May 23, 2023

Transcript

  1. Surveying the Problem Landscape
     • Surveys are EVERYWHERE
     • Great business value
       ◦ External: what do your customers think of your product?
       ◦ Internal: how are your employees doing?
     • Often batch-processed
  2. Paradigm Shift
     [diagram: batch processing feeds a processor periodically and emits one batch result per run; real-time processing feeds the processor continuously and emits a stream of real-time updates]
  3. Thinking in Events
     • Natural way to reason about things
     • Indicate that something has happened
       ◦ When
       ◦ What/Who
     • Immutable pieces of information
  4. Kafka Storage
     • Topics
       ◦ Basic storage unit
       ◦ Read/Write:
         ▪ Producer and consumer clients
         ▪ Completely decoupled
     • Partitions
       ◦ Immutable, append-only logs
       ◦ Data is replicated at this level
     [diagram: a Kafka topic composed of partitions P0, P1, P2]
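     To make the storage model concrete, here is a minimal sketch that creates a topic with three partitions, mirroring P0/P1/P2 above. It assumes the confluent-kafka Python package; the broker address, topic name, and replication factor are illustrative.

     from confluent_kafka.admin import AdminClient, NewTopic

     # connect to the cluster (address is an assumption for this sketch)
     admin = AdminClient({"bootstrap.servers": "localhost:9092"})

     # create a topic with three partitions, like the P0/P1/P2 diagram
     futures = admin.create_topics(
         [NewTopic("survey-entries", num_partitions=3, replication_factor=1)]
     )

     # block until the broker confirms creation
     for topic, future in futures.items():
         future.result()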
  5. Kafka, but make it simple.
     • Fully-managed, cloud-based Kafka
     • Auxiliary tools:
       ◦ Kafka Connect
       ◦ ksqlDB
       ◦ Schema management
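     Pointing a client at a fully-managed cluster is mostly configuration. A minimal sketch with typical SASL_SSL settings; the endpoint and credentials are placeholders.

     from confluent_kafka import Producer

     # typical settings for a cloud-hosted Kafka cluster secured with SASL/PLAIN;
     # the endpoint and credentials below are placeholders
     conf = {
         "bootstrap.servers": "<BOOTSTRAP_ENDPOINT>",
         "security.protocol": "SASL_SSL",
         "sasl.mechanisms": "PLAIN",
         "sasl.username": "<API_KEY>",
         "sasl.password": "<API_SECRET>",
     }

     producer = Producer(conf)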
  6. survey-entries
     {
       "type": "record",
       "name": "surveyEntry",
       "namespace": "survey.bot",
       "doc": "Contains individual survey entries for a user and survey question.",
       "fields": [
         { "name": "survey_id", "type": "string" },
         { "name": "user_id", "type": "string" },
         { "name": "name", "type": "string" },
         { "name": "company", "type": ["null", "string"], "default": null },
         { "name": "location", "type": ["null", "string"], "default": null },
         { "name": "response", "type": "string" }
       ]
     }
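     The producer code on slide 14 calls SurveyEntry.dict_to_entry, which the deck doesn't show; a minimal sketch of a class matching this schema (its exact shape is an assumption):

     from dataclasses import dataclass, asdict
     from typing import Optional

     @dataclass
     class SurveyEntry:
         # fields mirror the surveyEntry Avro record above
         survey_id: str
         user_id: str
         name: str
         response: str
         company: Optional[str] = None
         location: Optional[str] = None

         @staticmethod
         def dict_to_entry(data: dict) -> "SurveyEntry":
             # build an entry from the data captured by the conversation handlers
             return SurveyEntry(**data)

         def to_dict(self) -> dict:
             # shape expected by the Avro serializer
             return asdict(self)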
  7. survey-respondents
     {
       "type": "record",
       "name": "surveyRespondent",
       "namespace": "survey.bot",
       "doc": "Respondents to survey questions.",
       "fields": [
         { "name": "user_id", "type": "string" },
         { "name": "name", "type": "string" },
         { "name": "company", "type": ["null", "string"], "default": null },
         { "name": "location", "type": ["null", "string"], "default": null }
       ]
     }
  8. survey-responses
     {
       "type": "record",
       "name": "surveyResponse",
       "namespace": "survey.bot",
       "doc": "Contains individual survey responses for a user and survey question.",
       "fields": [
         { "name": "survey_id", "type": "string" },
         { "name": "user_id", "type": "string" },
         { "name": "response", "type": "string" }
       ]
     }
  9. survey-questions
     {
       "type": "record",
       "name": "surveyQuestion",
       "namespace": "survey.bot",
       "doc": "Details for a single-question survey.",
       "fields": [
         { "name": "survey_id", "type": "string" },
         { "name": "question", "type": "string" },
         { "name": "summary", "type": "string" },
         { "name": "options", "type": { "type": "array", "items": "string" } },
         { "name": "enabled", "type": "boolean" }
       ]
     }
  10. Creating a Survey
      {
        "survey_id": "1",
        "question": "Which Thanksgiving Pie is your favorite?",
        "summary": "Thanksgiving Pie",
        "options": [
          "Pumpkin Pie",
          "Pecan Pie",
          "Apple Pie",
          "Thanksgiving Leftover Pot Pie",
          "Other"
        ],
        "enabled": true
      }
  11. Telegram as a Producer
      • python-telegram-bot library
        ◦ Wrapper for the Telegram API
        ◦ Define conversation handlers
      • Produce data to Kafka
        ◦ Capture data using conversation handlers
        ◦ Create survey-entry message, serialize, and produce
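      A minimal sketch of wiring the bot together with python-telegram-bot's v20-style API, using the survey_handler defined on the next slide; the token placeholder is an assumption.

      from telegram.ext import ApplicationBuilder

      # build the bot application and register the conversation handler
      app = ApplicationBuilder().token("<BOT_TOKEN>").build()
      app.add_handler(survey_handler)

      # long-poll Telegram for updates
      app.run_polling()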
  12. Conversation Handler
      survey_handler = ConversationHandler(
          entry_points=[CommandHandler('survey', survey_command)],
          states={
              SURVEY_STATE.NAME: [
                  MessageHandler(filters.TEXT & ~filters.COMMAND, name_command),
                  CommandHandler('cancel', cancel_command)
              ],
              ...
              SURVEY_STATE.RESPONSE: [
                  MessageHandler(filters.Regex(
                      "^(Pumpkin Pie|Pecan Pie|Apple Pie|Thanksgiving Leftover Pot Pie|Other)$"),
                      response_command),
                  CommandHandler('cancel', cancel_command)
              ],
              SURVEY_STATE.CONFIRM: [
                  CommandHandler('y', confirm_command),
                  CommandHandler('n', cancel_command)
              ]
          },
          fallbacks=[CommandHandler('cancel', cancel_command)]
      )
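      The SURVEY_STATE values aren't defined on the slide; a plausible sketch, with the exact members inferred from the handlers shown (an assumption):

      from enum import IntEnum, auto

      class SURVEY_STATE(IntEnum):
          # conversation states returned by the handlers to drive the flow
          NAME = auto()
          COMPANY = auto()
          LOCATION = auto()
          RESPONSE = auto()
          CONFIRM = auto()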
  13. Command Handler
      async def name_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> int:
          # capture and store name data
          context.user_data['name'] = update.message.text
          # get chat_id as user_id
          context.user_data['user_id'] = update.message.chat_id
          # prompt for company information
          await update.message.reply_text(
              "Enter the company that you work for or use /skip to go to the next question."
          )
          return SURVEY_STATE.COMPANY
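      The /skip branch mentioned in the prompt isn't shown in the deck; a hypothetical pair of handlers continuing the same pattern (names and wording are assumptions):

      from telegram import Update
      from telegram.ext import ContextTypes

      async def company_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> int:
          # capture and store the company, then move on
          context.user_data['company'] = update.message.text
          await update.message.reply_text("Where are you located? Use /skip to skip.")
          return SURVEY_STATE.LOCATION

      async def skip_company_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> int:
          # leave company unset and move on
          context.user_data['company'] = None
          await update.message.reply_text("Where are you located? Use /skip to skip.")
          return SURVEY_STATE.LOCATION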
  14. Producer Code
      def send_entry(entry):
          # send survey entry message
          producer = None
          try:
              # set up Kafka producer for survey entries
              producer = clients.producer(clients.entry_serializer())
              # prep key and value for message
              k = str(metadata.get('survey_id'))
              value = SurveyEntry.dict_to_entry(entry)
              logger.info("Publishing survey entry message for key %s", k)
              producer.produce(config['topics']['survey-entries'], key=k, value=value)
          except Exception as e:
              logger.error("Got exception %s", e)
              raise
          finally:
              # flush only if the producer was actually created
              if producer is not None:
                  producer.poll()
                  producer.flush()
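      The clients helper module isn't shown in the deck. A minimal sketch of what producer() and entry_serializer() might look like, assuming confluent-kafka's SerializingProducer and Schema Registry; the endpoints and subject name are assumptions.

      from confluent_kafka import SerializingProducer
      from confluent_kafka.schema_registry import SchemaRegistryClient
      from confluent_kafka.schema_registry.avro import AvroSerializer
      from confluent_kafka.serialization import StringSerializer

      def entry_serializer():
          # build an Avro serializer from the registered surveyEntry schema
          sr = SchemaRegistryClient({"url": "http://localhost:8081"})
          schema_str = sr.get_latest_version("survey-entries-value").schema.schema_str
          return AvroSerializer(sr, schema_str, lambda entry, ctx: entry.to_dict())

      def producer(value_serializer):
          # producer that serializes keys as strings and values with Avro
          return SerializingProducer({
              "bootstrap.servers": "localhost:9092",
              "key.serializer": StringSerializer("utf_8"),
              "value.serializer": value_serializer,
          })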
  15. Extracting Respondents
      CREATE STREAM survey_respondents WITH (
          kafka_topic = 'survey-respondents',
          value_format = 'AVRO'
      ) AS
          SELECT user_id, name, company, location
          FROM survey_entries
          EMIT CHANGES;
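      The derived survey-respondents topic can be read like any other Kafka topic. A minimal consumer sketch, assuming confluent-kafka with Schema Registry; the endpoints and group id are assumptions.

      from confluent_kafka import DeserializingConsumer
      from confluent_kafka.schema_registry import SchemaRegistryClient
      from confluent_kafka.schema_registry.avro import AvroDeserializer
      from confluent_kafka.serialization import StringDeserializer

      sr = SchemaRegistryClient({"url": "http://localhost:8081"})

      consumer = DeserializingConsumer({
          "bootstrap.servers": "localhost:9092",
          "group.id": "respondent-reader",
          "auto.offset.reset": "earliest",
          "key.deserializer": StringDeserializer("utf_8"),
          "value.deserializer": AvroDeserializer(sr),
      })
      consumer.subscribe(["survey-respondents"])

      while True:
          msg = consumer.poll(1.0)
          if msg is None:
              continue
          # dict with user_id, name, company, location
          print(msg.value())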
  16. Extracting Responses
      CREATE STREAM survey_responses WITH (
          kafka_topic = 'survey-responses',
          value_format = 'AVRO'
      ) AS
          SELECT user_id, survey_id, response
          FROM survey_entries
          EMIT CHANGES;
  17. Questions Table
      CREATE TABLE survey_questions (
          id STRING PRIMARY KEY
      ) WITH (
          kafka_topic = 'survey-questions',
          value_format = 'AVRO'
      );
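      The survey-questions topic behind this table is written when a survey is created (slide 10). A minimal sketch of producing that message, using a plain JSON value for illustration; the deck's schemas suggest Avro would be used in practice, and the broker address is an assumption.

      import json
      from confluent_kafka import Producer

      producer = Producer({"bootstrap.servers": "localhost:9092"})

      question = {
          "survey_id": "1",
          "question": "Which Thanksgiving Pie is your favorite?",
          "summary": "Thanksgiving Pie",
          "options": ["Pumpkin Pie", "Pecan Pie", "Apple Pie",
                      "Thanksgiving Leftover Pot Pie", "Other"],
          "enabled": True,
      }

      # key by survey_id so all updates to a survey land in the same partition
      producer.produce("survey-questions", key=question["survey_id"],
                       value=json.dumps(question))
      producer.flush()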
  18. Analysis
      CREATE TABLE survey_results_live WITH (
          kafka_topic = 'survey-results-live',
          value_format = 'AVRO'
      ) AS
          SELECT question AS question,
                 HISTOGRAM(response) AS results
          FROM survey_entries
          GROUP BY question
          EMIT CHANGES;
  19. Results
      What is your Favorite Thanksgiving Pie?
      { Pumpkin Pie=1, Pecan Pie=1, ... Apple Pie=3 }
      { Pumpkin Pie=1, Pecan Pie=1, ... Apple Pie=4 }
      { Pumpkin Pie=2, Pecan Pie=1, ... Apple Pie=4 }
      { Pumpkin Pie=2, Pecan Pie=2, ... Apple Pie=4 }
      ...
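      ksqlDB's HISTOGRAM builds a map of value to count, and each row above is one changelog update from the survey_results_live table. The same running aggregation in plain Python, shown for intuition only (the sample responses are made up):

      from collections import Counter

      results = Counter()

      # each survey entry updates the running histogram, mirroring EMIT CHANGES
      for response in ["Apple Pie", "Pumpkin Pie", "Apple Pie", "Pecan Pie"]:
          results[response] += 1
          print(dict(results))   # one update emitted per input event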
  20. Analysis
      CREATE TABLE survey_results_final WITH (
          kafka_topic = 'survey-results-final',
          value_format = 'AVRO'
      ) AS
          SELECT question AS question,
                 HISTOGRAM(response) AS results
          FROM survey_entries
          WINDOW TUMBLING (SIZE 48 HOURS, GRACE PERIOD 10 MINUTES)
          GROUP BY question
          EMIT FINAL;
  21. Alerting with Telegram
      • Kafka Connect
        ◦ Connects Kafka and external systems
        ◦ Independent framework
        ◦ Configuration-based
      • Kafka Connect HTTP Sink Connector
        ◦ Fully-managed in Confluent Cloud
        ◦ Forwards events from the survey_results_final topic as they happen!
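      On self-managed Kafka Connect, the equivalent connector would be registered through the Connect REST API. A sketch: the connector class matches Confluent's HTTP Sink Connector, but the worker URL, connector name, property values, and Telegram wiring are assumptions (in practice the payload would also need formatting for Telegram's sendMessage API).

      import requests

      # connector config posted to a Kafka Connect worker (worker URL is an assumption)
      config = {
          "connector.class": "io.confluent.connect.http.HttpSinkConnector",
          "topics": "survey-results-final",
          # Telegram's sendMessage endpoint; the bot token is a placeholder
          "http.api.url": "https://api.telegram.org/bot<BOT_TOKEN>/sendMessage",
          "request.method": "POST",
      }

      # create or update the connector named telegram-alerts
      resp = requests.put(
          "http://localhost:8083/connectors/telegram-alerts/config",
          json=config,
      )
      resp.raise_for_status()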