Event Driven Architectures with Apache Kafka on Heroku

Chris Castle
November 03, 2016

Apache Kafka is the backbone for building architectures that deal with billions of events a day. Chris Castle, Developer Advocate, will show you where it might fit in your roadmap.

- What Apache Kafka is and how to use it on Heroku
- How Kafka enables you to model your data as immutable streams of events, introducing greater parallelism into your applications
- How you can use it to solve scale problems across your stack, such as managing high-throughput inbound events and building data pipelines

Learn more at https://www.heroku.com/kafka

Reveal.js version of slides: http://slides.com/christophercastle/deck#/

Transcript

  1. Event Driven Architectures with Apache Kafka on Heroku

    Chris Castle, Developer Advocate
    Rand Fitzpatrick, Director of Product
    November 3, 2016
  2. Agenda

    - What problems does Apache Kafka solve?
    - What are the core concepts of Kafka?
    - Why Apache Kafka on Heroku?
  3. Forward-Looking Statements

    Statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  4. Event-Driven Architecture

    Event-driven architecture (EDA), also known as message-driven architecture, is a software architecture pattern promoting the production, detection, consumption of, and reaction to events. Source: Wikipedia
  5. What Are Events?

    Context
    - When was the event? (event time, process time)
    - What produced the event? (causal history, device, etc.)
    - Where did the event occur? (system location, geo location)

    Operation
    - What function was applied? (create, update, delete, etc.)
    - What are the characteristics of the function?

    State
    - What is the data involved in the event?
    - How is that data identified?

    "Contextualized operation on state"
  6. Event Examples

    - Product views
    - Completed sales
    - Page visits
    - Site logins
    - Shipping notifications
    - Inventory received
    - IoT sensor values
    - Weather data
    - Traffic data
    - Tweets
    - Election polling data!

    Example: Completed sale
    - Context: 2016-11-03T15:13:27Z; Retail www site; referrer Google search
    - Operation: Inventory item purchased
    - State: Amazon Echo, Black; $179.99; ID B00X4WHP5E
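The "contextualized operation on state" idea from slides 5 and 6 can be sketched as a small immutable record. This is only an illustration: the class names, field names, and the `to_json` helper are hypothetical and do not come from the talk or from any Kafka client library. The values are the completed-sale example from slide 6.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Context:
    """Hypothetical context fields: when, what produced it, where it came from."""
    event_time: datetime
    source: str
    referrer: str


@dataclass(frozen=True)
class Event:
    """Hypothetical event shape: context + operation + state."""
    context: Context
    operation: str   # what function was applied
    state: dict      # the data involved, including its identifier


def to_json(event: Event) -> str:
    """Serialize an event to JSON so it could be written to a stream."""
    payload = asdict(event)
    # datetime is not JSON-serializable, so store it as an ISO 8601 string.
    payload["context"]["event_time"] = event.context.event_time.isoformat()
    return json.dumps(payload, sort_keys=True)


# The completed-sale example from slide 6.
sale = Event(
    context=Context(
        event_time=datetime(2016, 11, 3, 15, 13, 27, tzinfo=timezone.utc),
        source="retail-www-site",
        referrer="google-search",
    ),
    operation="inventory_item_purchased",
    state={"id": "B00X4WHP5E", "name": "Amazon Echo, Black", "price_usd": "179.99"},
)

print(to_json(sale))
```

Freezing the dataclasses mirrors the point made later in the deck that events are immutable facts; consumers derive new events rather than editing old ones.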
  7. Why Should I Care?

    - Scaling too slowly leads to dropped data
    - Overprovisioning leads to inefficient systems
    - Dataflow between processing stages requires coordination
    - Parallel pipelines with the same data can be nontrivial
    - Service discovery must support current and future processes
    - Sequencing service availability is critical to system function
    - Possible loss of state when individual services fail
  8. Why Should I Care?

    Inbound Streams
    - Scaling too slowly leads to dropped data
    - Overprovisioning leads to inefficient systems
    - Backpressure and other coordination is hard!

    Data Pipelines
    - Dataflow between processing stages requires coordination
    - Parallel pipelines with the same data can be nontrivial
    - Provenance and auditability!?

    Microservices
    - Service discovery must support current and future processes
    - Sequencing service availability is critical to system function
    - Possible loss of state when individual services fail
  9. Why Should I Care?

    Inbound Streams
    - Event streams in Kafka allow other resources to pull when ready
    - Resources can fail and reconnect without dropping events
    - Kafka provides elasticity, reducing the need for backpressure

    Data Pipelines
    - Dataflow coordination is reduced via event stream structure
    - The immutability of data allows for trivial parallel processing
    - Tracking provenance and lineage of data becomes possible

    Microservices
    - Services now only need to discover topics in Kafka
    - Service availability sequencing is relaxed
    - Inter-service communication is more robust
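Slide 9's claim that consumers can "pull when ready" and "fail and reconnect without dropping events" rests on two ideas: the stream is an append-only log, and each consumer remembers the offset it has reached. The sketch below models that in memory. `EventLog` and `PullConsumer` are hypothetical names invented for illustration; this is not a Kafka client and it ignores brokers, networking, and replication.

```python
class EventLog:
    """Append-only, in-memory stand-in for a single stream of events."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to the new event

    def read(self, offset, max_events=10):
        # Reading never removes anything; it just returns a slice of the log.
        return self._events[offset:offset + max_events]


class PullConsumer:
    """Reads at its own pace and tracks the next offset it needs."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.next_offset = start_offset
        self.seen = []

    def poll(self, max_events=10):
        batch = self.log.read(self.next_offset, max_events)
        self.seen.extend(batch)
        self.next_offset += len(batch)
        return batch


log = EventLog()
for n in range(5):
    log.append(f"event-{n}")

consumer = PullConsumer(log)
consumer.poll(max_events=2)  # reads event-0 and event-1, then "crashes"

# While the consumer is down, the producer keeps appending.
for n in range(5, 8):
    log.append(f"event-{n}")

# A replacement consumer resumes from the saved offset; nothing was dropped.
resumed = PullConsumer(log, start_offset=consumer.next_offset)
resumed.poll()
print(resumed.seen)
```

Because the log absorbs bursts and the consumer decides when to read, the producer never needs to be told to slow down, which is the sense in which the slide says the need for backpressure is reduced.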
  10. Apache Kafka Core Concepts

    - Brokers: The instances running Kafka and managing streams of events in a cluster.
    - Producers + Consumers: Clients that write to or read from a Kafka cluster.
    - Topics: Streams of events that are replicated across the brokers. Configured with time-based retention or log compaction.
    - Partitions: Discrete subsets of topics, and important tuning points for parallelism and ordering.

    [Diagram labels: PRODUCERS, CONSUMERS, BROKER, TOPIC, PARTITION]
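Slide 10 calls partitions "tuning points for parallelism and ordering". One common way to get both is to route each event by a key, so that all events with the same key land in the same partition and stay in order there, while different keys can be processed in parallel. The sketch below assumes a hypothetical three-partition topic and uses CRC32 purely because it is deterministic and in the standard library; it is not the hash function Kafka's own clients use.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is modeled as an ordered list of (key, value) records.
partitions = defaultdict(list)

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "view"),
    ("user-1", "purchase"),
    ("user-2", "logout"),
]

for key, value in events:
    partitions[partition_for(key)].append((key, value))

for number in sorted(partitions):
    print(number, partitions[number])
```

Ordering is only guaranteed within a partition, so the choice of key decides which events are ordered relative to each other.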
  11. Example Producers

    Event type: producer
    - Product views: Web server
    - Completed sales: Payment processor
    - Page visits: Browser
    - Site logins: Authentication service
    - Shipping notifications: Shipping provider
    - Inventory received: Warehouse
    - IoT data: Motion sensor
    - Weather data: Rain gauge
    - Traffic data: Vehicle sensor
    - Tweets: Twitter
    - Election polling data!: Online/phone survey
  12. Example Consumers

    Event type: consumer
    - Product views: Personalization engine
    - Completed sales: Accounting system
    - Page visits: Reporting dashboard
    - Site logins: Security audit service
    - Shipping notifications: Shipping provider
    - Inventory received: Inventory database
    - IoT data: Actuator
    - Weather data: Climate model
    - Traffic data: Traffic map
    - Tweets: Analytics dashboard
    - Election polling data!: Election forecast
  13. Complex Controls

    Other Kafka primitives to provide structure to Kafka event streams:
    - Retention
    - Log compaction
    - Replication factor
    - Delivery guarantees

    [Diagram labels: TOPIC, PARTITION]
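Two of the primitives on slide 13, retention and log compaction, are easiest to contrast side by side: time-based retention discards records by age, while compaction keeps the latest record for each key regardless of age. The sketch below shows only the idea on an in-memory list of `(timestamp, key, value)` tuples. The function names and sample data are hypothetical, and Kafka's real implementation (which works on log segments in the background) is considerably more involved.

```python
def apply_time_retention(log, now, max_age_seconds):
    """Keep only records whose age is within the retention window."""
    return [record for record in log if now - record[0] <= max_age_seconds]


def compact(log):
    """Keep only the most recent record per key, preserving log order."""
    latest_index = {}
    for index, (_, key, _) in enumerate(log):
        latest_index[key] = index  # later records overwrite earlier ones
    return [
        record
        for index, record in enumerate(log)
        if latest_index[record[1]] == index
    ]


# Hypothetical inventory-count updates: (timestamp, key, value).
log = [
    (100, "sku-1", 5),
    (110, "sku-2", 9),
    (120, "sku-1", 4),
    (130, "sku-3", 1),
    (140, "sku-1", 3),
]

print(apply_time_retention(log, now=150, max_age_seconds=30))
print(compact(log))
```

Retention suits streams where old events lose value over time; compaction suits streams where the log acts as a changelog and only the current value per key matters.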
  14. Kafka Connect

    Some examples: HDFS, JDBC, Elasticsearch, Couchbase, Oracle, MS SQL Server, Cassandra, DynamoDB, Salesforce Streaming API, Splunk

    Image credit: Confluent Kafka Connect announcement blog post
  15. Without Heroku

    Myriad Moving Pieces
    - Apache Kafka: The heart of the event management system, with a broad variety of configurations and options.
    - Apache Zookeeper: The system’s consensus and coordination cluster is vital for Kafka’s operation.
    - OS + JVM Tuning: Tuning the cluster runtimes can be an art.
    - Instances + Networking: Physical or virtual, the infrastructure behind clusters must be well considered.
  16. Apache Kafka on Heroku

    - Experienced Staff
    - Self-Healing
    - Current Version
    - No-Downtime Upgrades

    Heroku engineers have contributed patches to the core open source Kafka project.
  17. Let's Review... ...and get you started with Kafka!

    - Apache Kafka is a valuable tool for building architectures to support inbound event streams, data processing pipelines, and microservices coordination.
    - The primitives provided by Kafka (topics, partitions, retention duration, log compaction, and replication) provide the tools to manage structured event streams.
    - Apache Kafka on Heroku reduces operational complexity so that any developer can get started quickly and feel confident that their application is supported by a rock-solid production service.

    Get started at hrku.co/use-kafka
  18. Q&A

    Rand Fitzpatrick, Director of Product
    Chris Castle, Developer Advocate

    But first, please take one minute to answer a few quick questions so we can make webinars like this even better for you.
  19. Learn More

    - Apache Kafka on Heroku: https://www.heroku.com/kafka
    - Get Started: https://elements.heroku.com/addons/heroku-kafka
    - Documentation: https://devcenter.heroku.com/articles/kafka-on-heroku
    - Kafka Event Stream Modeling: https://devcenter.heroku.com/articles/kafka-event-stream-modeling
    - Podcast: Managed Kafka with Heroku Engineer Tom Crayford: http://softwareengineeringdaily.com/2016/10/25/managed-kafka-with-tom-crayford/