Fast Cars, Big Data - How Streaming Can Help Formula 1 (Tugdual Grall, Codemotion Berlin 2019)

Fast Cars, Fast Data: How Streaming Can Help Formula 1
Tugdual Grall, 12-13 November 2019

About me
• Tugdual “Tug” Grall
• Redis Labs
• Red Hat (PM Dev Experience)
• MapR (DevRel & PM)
• MongoDB (DevRel)
• Couchbase (DevRel)
• eXo Platform (CTO)
• Oracle (PM & Soft Engineer)
• @tgrall, https://tgrall.github.io
• Pet Projects
  • https://promoglisse-speed-challenge.com
  • Promoglisse Mobile App (iOS/Android)

What’s the point of data in motorsports?

Some of the most heavily instrumented objects in the world
Source: Sauber Team

Got examples
• Around 200 sensors per car
• Up to 2000 channels
• Sensor data is sent to the paddock in 2ms or less
• 1.5 billion data points per race
• 5 billion for a full race weekend
• 2Gb+ of data per car per lap
• 3Tb of data over a full race
Source: Intel

Data Communication

Data in Motorsports
• RPM
• Speed
• Lateral acceleration
• Gear
• Throttle
• Brake pressure
Source: F1 Framework - http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html

Data Streaming
Moving millions of events per hour | minute | second

Data Streaming: Apache Kafka
• http://kafka.apache.org
• Open sourced by LinkedIn in 2011
• Distributed messaging system
• Built to scale
• Implemented in Scala/Java

    import org.apache.kafka.clients.producer.ProducerRecord;

    // Assumes an already configured KafkaProducer<String, byte[]> named "producer"
    ProducerRecord<String, byte[]> rec = new ProducerRecord<>(
        "/mycar",                      // topic
        eventName,                     // key
        value.toString().getBytes());  // value
    producer.send(rec, (recordMetadata, e) -> {
        if (e != null) { … }           // handle the send failure
    });
    producer.flush();

Organize events into “Topics” and decouple Producers from Consumers
[Diagram: Producer API, Kafka Cluster (Topic: Ferrari, Topic: Mercedes, Topic: Red Bull), Consumer API, Consumers]

Topics are partitioned for scalability
[Diagram: cluster nodes hosting partitions 1 to 3 of the topics Ferrari, RedBull, and Mercedes, each topic read by its own Consumers]
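
To make the partitioning idea concrete, here is a minimal sketch of routing events to partitions by key. It is only an illustration: the class name, the car keys, and the `hashCode`-based routing are assumptions made for this example, and the routing is a simplified stand-in, not the hash that Kafka itself applies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: spread the events of one topic over several partitions.
class PartitionRouterSketch {

    // Simplified stand-in for a partitioner (assumption: NOT Kafka's real hash).
    // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }

        // Made-up events: {key, payload}. The same key always maps to the same partition.
        String[][] events = {
            {"car-5", "speed=312"}, {"car-16", "speed=305"}, {"car-5", "gear=8"}
        };
        for (String[] event : events) {
            int p = partitionFor(event[0], numPartitions);
            partitions.get(p).add(event[0] + ":" + event[1]);
        }

        for (int i = 0; i < numPartitions; i++) {
            System.out.println("partition " + i + " -> " + partitions.get(i));
        }
    }
}
```

Because every event for a given key lands in the same partition, the order of events per car is kept while different cars can be processed in parallel.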

Partition message order is like a Queue
• In a Kafka partition, messages are
  • appended to the end
  • consumed in the order received
• The offset is the sequential id of a message
[Diagram: Topic:Ferrari partitions 1 to 3, each holding numbered messages from old to new; Producers write to them and Consumers read from them]
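
The append-and-offset behaviour can be sketched with a plain in-memory list. This is a toy model, not Kafka code: the class and method names are hypothetical, and the sketch assumes offsets start at 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single partition.
class PartitionLogSketch {
    private final List<String> messages = new ArrayList<>();

    // Append to the end; the returned offset is the message's sequential id
    // (assumption: the first offset is 0).
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Return all messages from the given offset onward, in the order received.
    List<String> readFrom(long offset) {
        int start = (int) Math.min(offset, messages.size());
        return new ArrayList<>(messages.subList(start, messages.size()));
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        long first = log.append("lap=1");
        long second = log.append("lap=2");
        System.out.println("offsets: " + first + ", " + second);
        System.out.println("from offset 1: " + log.readFrom(1));
    }
}
```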

Partition message order is like a Queue
• Producers append messages to the end
• Consumers read from the front
• Read cursor: offset ID of the most recently read message
[Diagram: Topic:Ferrari:Partition 1 written by Producers and read by Consumer Group App-1 and Consumer Group App-2]

Unlike a Queue, events are still persisted after being delivered
• Messages remain on the partition, available to other consumers
[Diagram: a Client Application's Consumer polls Topic:Ferrari:Partition 1 and gets the unread events]
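
A small sketch can show why a persisted log differs from a queue: each consumer group keeps its own read cursor, and reading does not remove anything. The class below is hypothetical and in-memory only; it stores the next offset to read per group, which is a simplification chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one partition read by several independent consumer groups.
class ConsumerGroupsSketch {
    private final List<String> partition = new ArrayList<>();
    // Per-group cursor (assumption: we store the next offset to read).
    private final Map<String, Integer> nextOffsetByGroup = new HashMap<>();

    void produce(String message) {
        partition.add(message);
    }

    // Return the messages this group has not read yet and advance only its cursor.
    // Nothing is deleted, so other groups still see the same messages.
    List<String> poll(String group) {
        int from = nextOffsetByGroup.getOrDefault(group, 0);
        List<String> unread = new ArrayList<>(partition.subList(from, partition.size()));
        nextOffsetByGroup.put(group, partition.size());
        return unread;
    }

    int size() {
        return partition.size();
    }

    public static void main(String[] args) {
        ConsumerGroupsSketch topic = new ConsumerGroupsSketch();
        topic.produce("speed=312");
        topic.produce("gear=8");
        System.out.println("App-1 reads: " + topic.poll("App-1"));
        System.out.println("App-2 reads: " + topic.poll("App-2"));
        System.out.println("App-1 reads again: " + topic.poll("App-1"));
        System.out.println("messages still stored: " + topic.size());
    }
}
```

Both groups receive both messages, a second poll by App-1 returns nothing new, and the partition still holds two messages.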

Processing of the same message for different applications
[Diagram: partitions 1 to 3 of the topics Ferrari, RedBull, and Mercedes spread across the cluster nodes; consumers: Leaderboards, Streaming Processing, Real Time Analytics]

Data Streaming: Redis Streams
• https://redis.io/topics/streams-intro
• A new Redis Data Structure
• Distributed messaging system

    > XADD mycar * sensor-id 1234 temperature 19.8
    > XRANGE mycar - +
    > XREADGROUP GROUP speed-analyser c1 STREAMS mycar >

Data Streaming: Redis Streams
• Asynchronous data exchange between producers and consumers
• Scale with consumer groups
• Rich choice of options for consumers to read streaming and static data
• Automatic eviction of data based on an upper limit
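
Eviction based on an upper limit can be pictured as a capped log that drops its oldest entries once the limit is exceeded. In Redis this is what the MAXLEN option of XADD asks for. The Java class below is only a hypothetical in-memory illustration of that behaviour, not Redis client code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical in-memory model of a stream capped at a maximum length.
class CappedStreamSketch {
    private final Deque<String> entries = new ArrayDeque<>();
    private final int maxLen;

    CappedStreamSketch(int maxLen) {
        this.maxLen = maxLen;
    }

    // Append the new entry, then evict the oldest entries beyond the cap.
    void add(String entry) {
        entries.addLast(entry);
        while (entries.size() > maxLen) {
            entries.removeFirst();
        }
    }

    int length() {
        return entries.size();
    }

    // Oldest entry still stored, or null when the stream is empty.
    String oldest() {
        return entries.peekFirst();
    }

    public static void main(String[] args) {
        CappedStreamSketch stream = new CappedStreamSketch(3);
        for (int i = 1; i <= 5; i++) {
            stream.add("temperature-" + i);
        }
        System.out.println("length=" + stream.length() + ", oldest=" + stream.oldest());
    }
}
```

With a cap of 3 and five entries added, the two oldest entries are evicted and the stream keeps entries 3 to 5.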

Organize events into “Streams” (Keys) and decouple Producers from Consumers
[Diagram: Producer API, Redis Cluster (streams: Ferrari, Mercedes, Red Bull), Consumer API, Consumers]

Decoupled and Flexible Architecture

Processing of the same message for different stores/models
[Diagram labels: New events, Events; stores: Key-Value, Time Series, RDBMS, Graph, Search, DW]

Storing Data
• You have to choose depending on the needs of the “consuming application”:
  • Keep the data in the event log (Apache Kafka, Redis Streams)
  • Store it in a NoSQL engine or an RDBMS
  • …
• Some of the common “technical” requirements:
  • Highly available
  • Flexible schema
  • Easy to use (for developers and sysadmins)
  • …

Processing Data
• Request / Response
• Batch
• Stream Processing
• Real-time reaction to events
• Continuous applications
• Process both real-time and historical data

Data Processing using Spark’s Structured Streaming
• Request / Response
• Batch
• Stream Processing
• Real-time reaction to events
• Continuous applications
• Process both real-time and historical data

Add new “applications” on the same events
Decouple Producers from Consumers
[Diagram: Events Store]

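Redis Streams & Apache Spark

    import org.apache.spark.sql.SparkSession

    // Spark session configured to reach the Redis server
    val spark = SparkSession.builder
      .appName("Redis Racing Application")
      .master("local[*]")
      .config("spark.redis.host", "localhost")
      .config("spark.redis.port", "12000")
      .getOrCreate()

    // carEventStream is the streaming DataFrame of car events;
    // its definition is not shown on the slide
    carEventStream.createOrReplaceTempView("events")

    // Count events per car
    val q = spark.sql("select car, count(*) from events group by car")

    // Print the running counts to the console
    val query = q
      .writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()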

Demonstration

My personal Formula 1 championship…
[Diagram label: Web Socket]

Thank you!
