The Big Whoop! Apache Kafka mit Protobuf und Symfony

Nowadays, splitting up and modularizing large monolithic "legacy" applications and databases is usually done with HTTP microservices. trivago went a different way and uses Apache Kafka, Debezium, and stream processing. They decided to adopt Symfony2 even before the release of Symfony 2.0.0, in January 2012, and their PHP applications have been based on Symfony ever since.

The talk shows how a streaming architecture is structured and how to efficiently create Google Protocol Buffer messages with PHP and Symfony and write them into a Kafka stream. It also shows why trivago is so convinced by streaming architectures and why this approach is a million times faster than HTTP microservices.

René Kerner

May 04, 2018

Transcript

  1. https://lparchive.org/The-Secret-of-Monkey-Island/Update%201/1-somi_001.gif
     Up Next: “The Big Whoop! Apache Kafka mit Protobuf und Symfony”
     René Kerner, Software-Engineer/Architect at trivago since 2011, @rk3rn3r
  2. Apache Kafka
     Semantically:
     - Everything is an event
     - “Tim ordered one pair of shoes.” (fact / event)
     - “Stock for shoes is low.” (fact)
     - “We have product A for 5.99€.” (new state / add product to inventory)
     - “Price of product A is 4.99€.” (state change / change product price)
     - Event → Message
     - Message → Offset, CRC, Magic Byte, Attributes, Timestamp, Key byte[], Value byte[] (prior to Kafka 0.11)
     https://steemit-production-imageproxy-upload.s3.amazonaws.com/DQmekFUCSWKUEZ9aKSdUJSpa9FdQwHGpxu3D81GmQt2J6Xq
  3. Apache Kafka
     Semantically:
     - Everything is an event
     - “Tim ordered one pair of shoes.” (fact / event)
     - “Stock for shoes is low.” (fact)
     - “We have product A for 5.99€.” (new state / add product to inventory)
     - “Price of product A is 4.99€.” (state change / change product price)
     - Event → Message
     - Message → Offset, CRC, Magic Byte, Attributes, Timestamp, Key byte[], Value byte[] (prior to Kafka 0.11)
     Technically:
     - Everything is a LOG
     - SQL datastore writes: lock → write to the (append-only) commit LOG → acknowledge the write → process storage + index updates → release the lock
     - NoSQL writes: write to an immutable (append-only) commit LOG (e.g. Cassandra SSTable files)
     - APP: (Debug) LOG
     - “Turning the database inside-out” (Martin Kleppmann)
       https://martin.kleppmann.com/2015/11/05/database-inside-out-at-oredev.html
     https://steemit-production-imageproxy-upload.s3.amazonaws.com/DQmekFUCSWKUEZ9aKSdUJSpa9FdQwHGpxu3D81GmQt2J6Xq
  4. Kafka Architecture
     - Cluster of multiple brokers
     - Binary protocol
     - HTTP REST API
     - Replication (ISR)
     - Zookeeper (highly consistent, highly available datastore) for metadata storage, leader election, etc.
     - Producer
     - Consumer
     - CQRS (Command Query Responsibility Segregation): decouple writes and reads
     - Kafka Connect (DB to Kafka // Kafka to DB)
     - Kafka Streams ([stateful] stream processing high-level API)
     - Processor API
     https://dzone.com/storage/temp/5639205-kafka1.png
  5. Let’s dig one level deeper...
     - Kafka Topic == stream
     - 1 topic consists of 1 to n partitions, distributed over the cluster (scaling)
     - One partition is one stream of strictly ordered messages (same key → same partition)
     - Every message has a number == offset
     - Leader election
     - ISR (In-Sync Replica)
     https://media.giphy.com/media/AQE9xUzw6bwrK/200.gif
     https://webassets.mongodb.com/_com_assets/cms/image00-umnm8strpv.png
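To make the partition/offset model above concrete, here is a minimal consumer sketch using the php-rdkafka extension covered later in the deck; the broker address, group id, and topic name are placeholder values, not taken from the talk.

    <?php
    // Minimal high-level consumer sketch (ext-rdkafka / librdkafka).
    // Broker address, group id and topic name are placeholders.

    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', 'kafka-broker:9092');
    $conf->set('group.id', 'example-consumer-group');

    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['shop.orders']); // one topic == 1..n partitions

    while (true) {
        $message = $consumer->consume(10000); // timeout in ms

        if ($message->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
            // Messages with the same key land in the same partition;
            // within a partition they are strictly ordered by offset.
            printf(
                "topic=%s partition=%d offset=%d key=%s\n",
                $message->topic_name,
                $message->partition,
                $message->offset,
                $message->key
            );
        } elseif ($message->err !== RD_KAFKA_RESP_ERR__PARTITION_EOF
            && $message->err !== RD_KAFKA_RESP_ERR__TIMED_OUT) {
            throw new \RuntimeException($message->errstr());
        }
    }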
  6. Why Kafka? (1) The Monolith
     - Big monorepo with all customer-facing and B2B apps, admin, etc. inside (SVN)
     - Big Java monorepo with backend services (SVN)
     - Hard coupled
     - Big central database/s
     - 1 dev location, 10-20 devs, 2 ops, few Product Managers
     - Everyone knows + owns everything → No one owns + knows anything
  7. Why Kafka? (2) The Monsterlith
     - Microservices architecture
     - Even bigger, heavily growing central database/s
     - 4 dev locations, >300 devs, 2 big ops teams, many teams spread over 6 buildings (3 buildings + 3 locations), many POs with very different responsibilities / goals
     - Everyone knows + owns everything → No one owns + knows anything
     - Still hard coupled by the central datastore/s
     http://i.imgur.com/kXruc8k.gif
  8. How to …
     - change a monolithic database into a bounded-context architecture?
     - integrate the different teams?
     - enable independent, cross-functional teams?
     - technically focus knowledge / expertise?
     - integrate all these microservices?
     - get ownership of data to the responsible teams?
     - share data (read-only) with others?
     - migrate to a new architecture?
     - set up “bounded context/s” and migrate data from central datastore/s without breaking legacy apps?
     - decouple writers and readers? CQRS (Command Query Responsibility Segregation)?
     http://rs534.pbsrc.com/albums/ee347/Defalto/navigatorhead.gif~c400
  9. Our solution (2)
     - CDC: Change Data Capture the big central databases using Debezium (debezium.io) / Kafka Connect
     - Sink streams into bounded-context datastore/s using Kafka Connect sinks
     - Migration scenario: central DB → Kafka topic → bounded context → processing → write to a new table + CDC, or directly emit the processed data to another Kafka topic → sink the new data into the central DB until the legacy apps are migrated
     - Consume/produce directly (without CDC)
     - “Every asynchronous process can be done as a Stream Processor (SP) using Kafka Streams.” -me
     - Apps read from database/s or caches, or directly from streams
     - Message format: Google Protocol Buffers (Protobuf)
     https://lparchive.org/The-Secret-of-Monkey-Island/Update%2036/3-somi_1625.gif
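As a rough illustration of the “every asynchronous process can be done as a stream processor” idea in PHP, here is a hedged consume-transform-produce sketch using the php-rdkafka extension from the following slides; the topic names, broker address, and the JSON price logic are invented for the example and are not trivago's actual pipeline (the talk's own message format is Protobuf).

    <?php
    // Consume → transform → produce loop sketch (ext-rdkafka).
    // Topic names, broker address and the price logic are invented for illustration.

    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', 'kafka-broker:9092');
    $conf->set('group.id', 'price-processor');

    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['inventory.prices.raw']); // e.g. a CDC topic filled by Debezium

    $producer = new RdKafka\Producer(new RdKafka\Conf());
    $producer->addBrokers('kafka-broker:9092');
    $outTopic = $producer->newTopic('inventory.prices.processed');

    while (true) {
        $message = $consumer->consume(10000);
        if ($message->err !== RD_KAFKA_RESP_ERR_NO_ERROR) {
            continue; // skip timeouts/EOF in this sketch
        }

        // "Processing" step: decode, apply some business rule, re-encode.
        $record = json_decode($message->payload, true);
        $record['price_eur'] = round($record['price_eur'] * 0.95, 2);

        // Re-use the original key so the result stays ordered per logical entity.
        $outTopic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode($record), $message->key);
        $producer->poll(0); // serve delivery callbacks
    }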
  10. Protobuf with PHP
      - Google Protocol Buffers: an Interface Definition Language (IDL)
      - Optimized serialization format for structured data
      - Optimized for size (low memory usage) and speed
      - Best for storage, network transport, etc.
      - Proto2, Proto3
      - Google PHP package + C extension: https://github.com/google/protobuf/tree/master/php
        - installable with pecl
        - poor feature set, no proto2 support
      - DrSlump Protobuf-PHP: https://github.com/drslump/Protobuf-PHP
        - installable with composer
        - supports everything the original binary Google protoc compiler supports, via a protoc plugin
        - class generator → plain PHP
      https://martin.kleppmann.com/2012/12/protobuf.png
      Person.proto
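To show what the Person.proto mentioned on the slide looks like in use, here is a hedged sketch based on the Google protobuf PHP package (the proto3 path); the message definition, field names, and values are assumptions, not the actual proto from the talk, and the talk itself relied on DrSlump's generator for proto2 support.

    <?php
    // Sketch of serializing/deserializing a Protobuf message in PHP with the
    // Google protobuf package (composer require google/protobuf, optionally
    // accelerated by the protobuf C extension from pecl).
    //
    // Assumed Person.proto (not the actual one from the talk):
    //   syntax = "proto3";
    //   message Person {
    //       string name  = 1;
    //       int32  id    = 2;
    //       string email = 3;
    //   }
    // `protoc --php_out=src/ Person.proto` generates the Person class used below.

    require __DIR__ . '/vendor/autoload.php';

    $person = new Person();
    $person->setName('Guybrush Threepwood');
    $person->setId(42);
    $person->setEmail('guybrush@melee-island.example');

    // Compact binary wire format, usable directly as a Kafka message value.
    $binary = $person->serializeToString();

    // Deserializing on the consumer side.
    $copy = new Person();
    $copy->mergeFromString($binary);
    echo $copy->getName(), PHP_EOL;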
  11. PHP to Kafka Connection (1)
      PHP extension based connection:
      - Based on the PHP C extension rdkafka and librdkafka
      - Supports Kafka 0.8, 0.9, 0.10 (0.11 branch available)
      - Installable with pecl
      - https://github.com/arnaud-lb/php-rdkafka
      - PHP 5.3+, PHP 7.x
      - Many Symfony bundles available as composer packages (using ext-rdkafka)
      Plain PHP connector (PHP 7.1+): https://github.com/weiboad/kafka-php
      - Supports Kafka 0.8+, 0.9+ consumers, 0.10
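And the producer side with ext-rdkafka, roughly as it could be wired into a Symfony service; the broker list, topic name, payload, and key are placeholders, and the payload is assumed to be a Protobuf-serialized string like the one sketched after the previous slide.

    <?php
    // Minimal php-rdkafka producer sketch (ext-rdkafka / librdkafka).
    // Broker list, topic name, payload and key are placeholders.

    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', 'kafka-broker-1:9092,kafka-broker-2:9092');

    $producer = new RdKafka\Producer($conf);
    $topic = $producer->newTopic('shop.orders');

    // $payload would typically be the binary Protobuf bytes from serializeToString().
    $payload = '...protobuf bytes...';
    $key     = '42'; // same key → same partition → strict ordering per entity

    $topic->produce(RD_KAFKA_PARTITION_UA, 0, $payload, $key);
    $producer->poll(0); // serve delivery callbacks

    // Drain the out-queue before shutdown (newer versions also offer flush()).
    while ($producer->getOutQLen() > 0) {
        $producer->poll(50);
    }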
  12. PHP to Kafka Connection (2)
      PHP extension based connection:
      - Based on the PHP C extension rdkafka and librdkafka
      - Supports Kafka 0.8, 0.9, 0.10 (0.11 branch available)
      - Installable with pecl
      - https://github.com/arnaud-lb/php-rdkafka
      - PHP 5.3+, PHP 7.x
      - Some Symfony bundles available as composer packages (using ext-rdkafka)
      Plain PHP connector (PHP 7.1+): https://github.com/weiboad/kafka-php
      - Supports Kafka 0.8+, 0.9+ consumers, 0.10
      Using a socket-based (log) multiplexer:
      - Producer: create Protobuf messages within PHP, send the data (message) to a socket
      - Consumer: read messages from a socket
      - The socket is a (log) (de-)multiplexer forwarding messages from/to Kafka
      - An envelope is needed for keyed data
      - https://github.com/trivago/gollum
      - Supports Kafka 0.8+, including 0.10, 0.11, 1.x, using sarama
      - https://github.com/Shopify/sarama
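For the multiplexer approach, the PHP side only needs to write the serialized message to a local socket and let gollum forward it to Kafka; the socket address and the newline-delimited base64 framing below are pure assumptions for illustration, since the actual envelope and framing (including how the key is carried) depend on how the gollum consumer is configured.

    <?php
    // Socket-based producer sketch: PHP writes the (Protobuf) message to a local
    // socket; a multiplexer such as gollum forwards it to Kafka.
    // The address and the newline-delimited base64 framing are assumptions;
    // the real envelope depends on the gollum consumer configuration.

    $socket = stream_socket_client('tcp://127.0.0.1:5880', $errno, $errstr, 1.0);
    if ($socket === false) {
        throw new \RuntimeException("Cannot reach the multiplexer: $errstr ($errno)");
    }

    $payload = '...binary Protobuf bytes, e.g. from Person::serializeToString()...';
    fwrite($socket, base64_encode($payload) . "\n");
    fclose($socket);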