Codemotion
November 12, 2019

Fast Cars, Big Data - How Streaming Can Help Formula 1 (Tugdual Grall, Codemotion Berlin 2019)

Modern race cars produce a lot of data, all of it in real time. In this presentation I will show you how data can be generated and used by various applications in the car, on the track or at the team headquarters. The demonstration will show how to move data using messaging systems like Apache Kafka, process the data using Apache Spark, and use various storage techniques: distributed file system, NoSQL database. This presentation is a great opportunity to see how to build a "near real time big data application". The code of the demo is available as open source.

About: Tugdual Grall, Technical Account Manager - Redis Labs

Tugdual Grall is a Technical Account Manager at Redis Labs, where he helps customers and the community adopt Redis. Before joining Redis Labs, Tug was a PM at Red Hat and a Developer Advocate at MapR, MongoDB and Couchbase. Tug has also worked as CTO at eXo Platform, and as Java EE product manager and software engineer at Oracle. Tugdual is a co-founder of the Nantes JUG (Java User Group), which has held monthly meetings about the Java ecosystem since 2008.


Transcript

  1. Fast Cars, Fast Data: How Streaming Can Help Formula 1

    Tugdual Grall, 12-13 November 2019
  2. About me

    @tgrall • Tugdual “Tug” Grall
    • Redis Labs
    • Red Hat (PM Dev Experience)
    • MapR (DevRel & PM)
    • MongoDB (DevRel)
    • Couchbase (DevRel)
    • eXo Platform (CTO)
    • Oracle (PM & Software Engineer)
    • https://tgrall.github.io
    • Pet Projects
      • https://promoglisse-speed-challenge.com
      • Promoglisse Mobile App (iOS/Android)
  3. Got examples

    • Around 200 sensors per car
    • Up to 2000 channels
    • Sensor data is sent to the paddock in 2ms or less
    • 1.5 billion data points for a race
    • 5 billion for a full race weekend
    • 2Gb+ of data per car per lap
    • 3Tb of data over a full race
    Source: Intel
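To get a feel for what these totals mean as a stream, the rough calculation below turns the slide's figures into an average rate. The 90-minute race duration is an assumption (the slide does not give one), and so is reading the 1.5 billion data points as belonging to a single car.

```python
# Back-of-the-envelope rates derived from the slide's figures.
DATA_POINTS_PER_RACE = 1.5e9   # from the slide
SENSORS_PER_CAR = 200          # from the slide ("around 200")
RACE_DURATION_S = 90 * 60      # ASSUMPTION: a 90-minute race


def points_per_second(total_points: float, duration_s: float) -> float:
    """Average number of data points produced per second."""
    return total_points / duration_s


rate = points_per_second(DATA_POINTS_PER_RACE, RACE_DURATION_S)
# ASSUMPTION: the race total refers to one car with ~200 sensors.
per_sensor = rate / SENSORS_PER_CAR

print(f"{rate:,.0f} points/s overall, {per_sensor:,.0f} points/s per sensor")
```

Under these assumptions the average is in the hundreds of thousands of data points per second, which is the kind of load that motivates a streaming pipeline rather than periodic batch uploads.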
  4. Data in Motorsports

    Telemetry channels shown: RPM, Speed, Lateral acceleration, Gear, Throttle, Brake pressure
    Source: F1 Framework - http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html
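One way to picture a single telemetry event built from these channels is the sketch below. It is not taken from the demo code: the class name, field names, units and sample values are all hypothetical, and JSON is just one possible wire format for turning an event into the bytes a messaging system carries.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TelemetrySample:
    """One reading of the channels named on the slide (hypothetical schema)."""
    car: str
    timestamp_ms: int
    rpm: int
    speed_kmh: float
    lateral_g: float
    gear: int
    throttle_pct: float
    brake_pressure_bar: float

    def to_bytes(self) -> bytes:
        # Serialize to UTF-8 JSON so the event can be sent as a byte payload.
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def from_bytes(payload: bytes) -> "TelemetrySample":
        # Rebuild the sample on the consumer side.
        return TelemetrySample(**json.loads(payload.decode("utf-8")))


# Made-up values, for illustration only.
sample = TelemetrySample("car-1", 1573560000000, 11500, 287.4, 3.2, 7, 98.0, 0.0)
payload = sample.to_bytes()
decoded = TelemetrySample.from_bytes(payload)
print(decoded)
```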
  5. Data Streaming: Apache Kafka

    • http://kafka.apache.org
    • Open sourced by LinkedIn in 2011
    • Distributed messaging system
    • Built to scale
    • Implemented in Scala/Java

        // Build a record for the "/mycar" topic, keyed by the event name
        ProducerRecord<String, byte[]> rec = new ProducerRecord<>(
            "/mycar",
            eventName,
            value.toString().getBytes());
        // Send asynchronously; the callback receives the error if the send fails
        producer.send(rec, (recordMetadata, e) -> {
            if (e != null) { … }
        });
        producer.flush();
  6. Organize events into “Topics” and decouple Producers from Consumers

    [Diagram: producers write through the Producer API to a Kafka cluster holding the topics Ferrari, Mercedes and Red Bull; consumers read each topic through the Consumer API]
  7. Topics are partitioned for scalability

    [Diagram: a cluster of three nodes; the topics Ferrari, Red Bull and Mercedes are each split into partitions 1 to 3, which are read by consumers]
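The idea of spreading one topic over several partitions can be sketched as a function from message key to partition index. This is a simplified illustration, not Kafka's implementation: CRC32 stands in for the hash Kafka itself uses, and the car keys and event values are made up.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # matches the three partitions per topic in the diagram


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition index with a stable hash.

    CRC32 is a stand-in here, not the hash function Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Route a few hypothetical events; each partition keeps its arrival order.
partitions = defaultdict(list)
events = [("car-16", "speed=301"), ("car-5", "speed=298"),
          ("car-16", "gear=8"), ("car-5", "gear=7")]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

for index in sorted(partitions):
    print(index, partitions[index])
```

Because the same key always hashes to the same partition, all events for one car stay in order relative to each other, while different cars can be processed in parallel.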
  8. Partition message order is like a Queue

    • In a Kafka partition, messages are
      • appended to the end
      • consumed in the order received
    • The offset is the sequential id of a message
    [Diagram: Topic Ferrari, partitions 1 to 3; producers append numbered messages and consumers read them, ordered from old message to new message]
  9. Partition message order is like a Queue

    • Producers append messages to the end
    • Consumers read from the front
    • Read cursor: offset ID of the most recently read message
    [Diagram: Topic Ferrari, partition 1, messages 1 to 6; producers append, and consumer groups App-1 and App-2 each read the partition]
  10. Unlike a Queue, events are still persisted after being delivered

    • Messages remain on the partition, available to other consumers
    [Diagram: Topic Ferrari, partition 1, messages 1 to 6; a client application's consumer polls to get its unread events (messages 1 to 3 shown as unread)]
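The behaviour described on the last three slides (append at the end, read in order, one cursor per consumer group, messages kept after delivery) can be modelled in a few lines. This is an in-memory sketch, not Kafka client code; the class and method names are hypothetical, and the cursor here stores the next offset to read rather than the last one read.

```python
from collections import defaultdict


class PartitionLog:
    """Append-only log with an independent read cursor per consumer group."""

    def __init__(self):
        self._messages = []
        # group name -> next offset to read (starts at 0 for a new group)
        self._cursors = defaultdict(int)

    def append(self, message) -> int:
        """Add a message at the end and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def poll(self, group: str, max_messages: int = 10):
        """Return unread (offset, message) pairs and advance the group's cursor.

        Messages are never removed, so other groups can still read them.
        """
        start = self._cursors[group]
        batch = self._messages[start:start + max_messages]
        self._cursors[group] = start + len(batch)
        return list(enumerate(batch, start))


log = PartitionLog()
for lap in range(1, 7):
    log.append(f"lap-{lap}")

first = log.poll("app-1", max_messages=3)   # app-1 reads offsets 0..2
second = log.poll("app-2", max_messages=6)  # app-2 still sees everything
rest = log.poll("app-1", max_messages=10)   # app-1 resumes at offset 3
print(first, second, rest, sep="\n")
```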
  11. Processing of the same message for different applications

    [Diagram: the partitions of the topics Ferrari, Red Bull and Mercedes, spread over the cluster nodes, are read by separate consumers for Leaderboards, Streaming Processing and Real Time Analytics]
  12. Data Streaming: Redis Streams

    • https://redis.io/topics/streams-intro
    • A new Redis Data Structure
    • Distributed messaging system

        > XADD mycar * sensor-id 1234 temperature 19.8
        > XRANGE mycar - +
        > XREADGROUP GROUP speed-analyser c1 STREAMS mycar >
  13. Data Streaming: Redis Streams

    • Asynchronous data exchange between producers and consumers
    • Scale with consumer groups
    • Rich choice of options for consumers to read streaming and static data
    • Automatic eviction of data based on an upper limit
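The last bullet, eviction based on an upper limit, corresponds to capping a stream's length (in Redis, the `MAXLEN` option of `XADD`). The sketch below models only that behaviour in memory; it does not talk to Redis, the class name is hypothetical, and entry IDs are simplified to a counter instead of Redis's `<milliseconds>-<sequence>` format.

```python
from collections import deque
from itertools import count


class CappedStream:
    """In-memory stand-in for a length-capped stream."""

    def __init__(self, maxlen: int):
        # A deque with maxlen drops the oldest entry when a new one overflows it.
        self._entries = deque(maxlen=maxlen)
        self._ids = count(1)  # simplified, monotonically increasing entry IDs

    def add(self, fields: dict) -> int:
        """Append an entry and return its ID, evicting the oldest if full."""
        entry_id = next(self._ids)
        self._entries.append((entry_id, fields))
        return entry_id

    def range(self):
        """Return all retained entries, oldest first."""
        return list(self._entries)


stream = CappedStream(maxlen=3)
for temperature in (19.8, 20.1, 20.4, 20.9, 21.3):
    stream.add({"sensor-id": 1234, "temperature": temperature})

remaining_ids = [entry_id for entry_id, _ in stream.range()]
print(remaining_ids)
```

Only the three most recent entries survive, which keeps memory bounded when sensors write continuously.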
  14. Organize events into “Streams” (Keys) and decouple Producers from Consumers

    [Diagram: producers write through the Producer API to a Redis cluster holding the streams Ferrari, Mercedes and Red Bull; consumers read each stream through the Consumer API]
  15. Processing of the same message for different stores/models

    [Diagram: new events flow into the stream of events, which feeds different stores: Key-Value, Time Series, RDBMS, Graph, Search, DW]
  16. Storing Data

    • You have to choose depending on the “consuming application” needs:
      • Keep the data in the event log (Apache Kafka, Redis Streams)
      • Put it into a NoSQL engine, RDBMS
      • …
    • Some of the common “technical” requirements:
      • Highly available
      • Flexible schema
      • Easy to use (for developers and sysadmins)
      • …
  17. Processing Data

    • Request / Response
    • Batch
    • Stream Processing
    • Real-time reaction to events
    • Continuous applications
    • Process both real-time and historical data
  18. Data Processing using Spark’s Structured Streaming

    • Request / Response
    • Batch
    • Stream Processing
    • Real-time reaction to events
    • Continuous applications
    • Process both real-time and historical data
  19. Redis Streams & Apache Spark

        // Spark session configured to reach the Redis instance
        val spark = SparkSession.builder
          .appName("Redis Racing Application")
          .master("local[*]")
          .config("spark.redis.host", "localhost")
          .config("spark.redis.port", "12000")
          .getOrCreate()

        // carEventStream is the streaming DataFrame of car events (not shown on the slide)
        carEventStream.createOrReplaceTempView("events")

        // Count events per car
        val q = spark.sql("select car, count(*) from events group by car")

        // Print the full result table to the console after each update
        val query = q
          .writeStream
          .outputMode("complete")
          .format("console")
          .start()

        query.awaitTermination()
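What the streaming query computes can be illustrated without Spark: a per-car event count that is updated as each micro-batch arrives, with the whole result table emitted every time (which is what the "complete" output mode does). The sketch below is plain Python, not Spark code; the function name, batch contents and car names are made up.

```python
from collections import Counter


def update_counts(totals: Counter, micro_batch):
    """Fold one micro-batch into the running counts and return the full table.

    Mirrors `select car, count(*) ... group by car` with "complete" output:
    every update returns all rows, not only the ones that changed.
    """
    for event in micro_batch:
        totals[event["car"]] += 1
    return sorted(totals.items())


totals = Counter()
batch_1 = [{"car": "ferrari"}, {"car": "mercedes"}, {"car": "ferrari"}]
batch_2 = [{"car": "red-bull"}, {"car": "mercedes"}]

table_1 = update_counts(totals, batch_1)
table_2 = update_counts(totals, batch_2)
print(table_1)
print(table_2)
```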