Tugdual Grall
September 16, 2016

Fast Cars, Big Data - How Streaming Can Help Formula 1

Modern cars produce data. Lots of data. And Formula 1 cars produce more than their share. I will present a working demonstration of how modern data streaming can be applied to the data acquisition and analysis problem posed by modern motorsports.

Instead of bringing multiple Formula 1 cars to the talk, I will show how we instrumented a high-fidelity, physics-based automotive simulator to produce realistic data from simulated cars running on the Spa-Francorchamps track. We move data from the cars, to the pits, to the engineers back at HQ.

The result is near-real-time visualization and comparison of performance, and a demonstration of how to move data with messaging systems like Kafka, process it in real time with Apache Spark, and analyse it using SQL with Apache Drill.

Code available here: https://github.com/mapr-demos/racing-time-series

Transcript

  1. Fast Cars, Big Data: How Streaming Can Help Formula 1

    Tugdual Grall, @tgrall
    © 2016 MapR Technologies, MapR Confidential
  2. {"about" : "me"}

    Tugdual “Tug” Grall
    • MapR: Technical Evangelist
    • MongoDB: Technical Evangelist
    • Couchbase: Technical Evangelist
    • eXo: CTO
    • Oracle: Developer/Product Manager, mainly Java/SOA
    • Developer in consulting firms
    • Web: @tgrall, http://tgrall.github.io, tgrall
    • NantesJUG co-founder
    • Pet project: http://www.resultri.com
  3. Agenda

    • What’s the point of data in motorsports?
    • Live demo
    • Architecture
    • What’s next?
  4. Real Analytics as Well as Visualization

    • Inputs
      – Predictive analysis of consumables and tires
      – Physical models of car + driver performance
      – Tire wear slows lap times, lower fuel weight speeds lap times
      – Competitors’ options
      – Weather conditions
      – Current GP points status
    • Outputs
      – Tactical options, outcome distributions
  5. Data for Marketing as Well

    http://formula1.ferrari.com/en/inforacing-hungarian-gp-2015/
  6. Some Examples?

    • Up to 300 sensors per car
    • Up to 2000 channels
  7. Some Examples?

    • Up to 300 sensors per car
    • Up to 2000 channels
    • Sensor data are sent to the paddock in 2 ms
  8. Some Examples?

    • Up to 300 sensors per car
    • Up to 2000 channels
    • Sensor data are sent to the paddock in 2 ms
    • 1.5 billion data points for a race
  9. Some Examples?

    • Up to 300 sensors per car
    • Up to 2000 channels
    • Sensor data are sent to the paddock in 2 ms
    • 1.5 billion data points for a race
    • 5 billion for a full race weekend
  10. Some Examples?

    • Up to 300 sensors per car
    • Up to 2000 channels
    • Sensor data are sent to the paddock in 2 ms
    • 1.5 billion data points for a race
    • 5 billion for a full race weekend
    • 5/6 Gb of compressed data per car for 90 min
  11. Some Examples?

    • Up to 300 sensors per car
    • Up to 2000 channels
    • Sensor data are sent to the paddock in 2 ms
    • 1.5 billion data points for a race
    • 5 billion for a full race weekend
    • 5/6 Gb of compressed data per car for 90 min
    US Grand Prix 2014: 243 Tb (race teams combined)
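To get a feel for these figures, the totals can be turned into average rates. The sketch below is only back-of-the-envelope arithmetic and rests on assumptions that are not stated on the slide: that the 1.5 billion data points cover a race of about 90 minutes, and that "5/6 Gb" means up to 6 gigabytes (not gigabits) per car. The class and method names are made up for illustration.

```java
final class TelemetryRates {
    // Average number of data points per second for a total count over a duration in minutes.
    static double pointsPerSecond(double totalPoints, double minutes) {
        return totalPoints / (minutes * 60.0);
    }

    // Average compressed throughput in megabytes per second (1 GB taken as 1000 MB).
    static double megabytesPerSecond(double gigabytes, double minutes) {
        return gigabytes * 1000.0 / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // Assumption: the 1.5 billion points are spread over a 90-minute race.
        System.out.printf("points per second: %.0f%n", pointsPerSecond(1.5e9, 90));
        // Assumption: upper bound of 6 GB of compressed data per car in 90 minutes.
        System.out.printf("MB per second per car: %.2f%n", megabytesPerSecond(6, 90));
    }
}
```

Under those assumptions the stream averages a few hundred thousand points per second, which is the kind of sustained rate the messaging layer has to absorb.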
  12. Simplified Demo System Outline

    Diagram labels: Archive, MapR DB, Jetty / Bootstrap / d3, Apache Drill (SQL access), TORCS race simulator, MapR Streams
  13. TORCS for Cars, Physics and Drivers

    TORCS is a pseudo-physics-based racing simulator with full graphics output and pluggable control modules. TORCS is commonly used for AI research, but the control model can just as well collect data.
  14. IoT : Racing Cars

    Diagram labels: Producers, Consumers, sensors data, Real Time Analytics
    https://github.com/mapr-demos/racing-time-series
  15. IoT : Racing Cars

    Diagram labels: sensors data, Kafka Producer (Java), Kafka Consumer + OJAI (Java), Kafka Consumer + WebSocket (Java + JS), SQL
    https://github.com/mapr-demos/racing-time-series
  16. Big Datastore

    • Distributed File System: HDFS/MapR-FS
    • NoSQL Database: HBase/MapR-DB
    • ….
  17. Store data as File or Row?

    HDFS / MapR-FS
    • Data stored as “files”
    • Fast with large scans
    • Slow random reads/writes
    NoSQL (HBase/MapR-DB)
    • Data stored as rows/documents
    • Fast with random reads/writes
  18. What is Kafka?

    • http://kafka.apache.org/
    • Created at LinkedIn, open sourced in 2011
    • Implemented in Scala / Java
    • Distributed messaging system built to scale
  19. Key Concepts

    • Feeds of messages are organised in topics
    • Processes that publish messages are called producers
    • Processes that subscribe to topics and process messages are called consumers
    • A Kafka cluster is made of one or more brokers (broker == node)
  20. Topics and Partitions

    • Split topics into partitions for scalability
    Diagram (message offsets per partition, with a "Writes" label):
    Partition 0: 0 1 2 3 4 5 6 7 8
    Partition 1: 0 1 2 3 4 5
    Partition 2: 0 1 2 3 4 5 6 7
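How a keyed message ends up in one partition can be sketched in a few lines. This is a simplified illustration, not Kafka's code: Kafka's default partitioner hashes the serialized key with murmur2, while the sketch below substitutes `String.hashCode()`. The class name, the car keys and the partition count are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {
    // Map a message key to a partition index in [0, numPartitions).
    // Stand-in hash: String.hashCode() replaces Kafka's murmur2 on the key bytes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Hypothetical keys: one per car, so all events of a car stay in order in one partition.
        String[] keys = {"car1", "car2", "car3", "car1", "car2", "car1"};
        for (String key : keys) {
            List<String> log = partitions.get(partitionFor(key, numPartitions));
            // The offset of a message is its position in the partition's append-only log.
            log.add(key + "@offset" + log.size());
        }
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("Partition " + p + ": " + partitions.get(p));
        }
    }
}
```

Because the same key always maps to the same partition, ordering is preserved per key while different keys can be written and read in parallel.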
  21. Consumer Groups

    • Single consumer abstraction for scalability
    • Max 1 consumer per partition
    • Any number of consumer groups
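The "max 1 consumer per partition" rule applies within one consumer group. A minimal sketch of such an assignment follows; it uses a plain round-robin over partitions, which is an assumption for illustration and not the assignor Kafka or MapR Streams actually runs. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {
    // Give every partition to exactly one consumer of the group, round-robin.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two consumers share three partitions: one of them reads two partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 3));
        // Four consumers, three partitions: the fourth consumer stays idle.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4"), 3));
    }
}
```

Adding consumers to a group beyond the partition count therefore brings no extra parallelism, while a second group (for example a dashboard next to an archiver) gets its own independent assignment over the same partitions.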
  22. Produce Messages

    ProducerRecord<String, byte[]> rec = new ProducerRecord<>(
        "/stream/car_1_topic",
        eventName,
        value.toString().getBytes());
    // Send asynchronously; the callback receives the error, if any
    producer.send(rec, (recordMetadata, e) -> {
        if (e != null) { … }
    });
    producer.flush();
  23. Consume Messages

    long pollTimeOut = 800;
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(pollTimeOut);
        if (!records.isEmpty()) {
            Iterable<ConsumerRecord<String, String>> iterable = records::iterator;
            StreamSupport.stream(iterable.spliterator(), false).forEach((record) -> {
                // work with record object … record.value();
                …
            });
            consumer.commitAsync();
        }
    }
  24. More real life Kafka …

    Diagram labels: Zookeeper; Broker 1, Broker 2 and Broker 3, each with Topic A and Topic B; three Producers; three Consumers
  25. • Distributed messaging system built to scale

    • Uses the Apache Kafka 0.9.0 API
    • No code change
    • Does not use the same “broker” architecture
    • Log stored in MapR Storage (Scalable, Secured, Fast, Multi DC)
    • No Zookeeper
  26. Kafka

    Diagram labels: Zookeeper; Broker 1, Broker 2 and Broker 3, each with Topic A and Topic B; three Producers; three Consumers
  27. MapR Streams

    Diagram labels: three Streams, each with Topic A and Topic B; three Producers; three Consumers
  28. Sensor Data V1

    • 3 main data points:
      – Speed (m/s)
      – RPM
      – Distance (m)
    • Buffered

    {
      "_id": "1.458141858E9/0.324",
      "car": "car1",
      "timestamp": 1458141858,
      "racetime": 0.324,
      "records": [
        {
          "sensors": { "Speed": 3.588583, "Distance": 2003.023071, "RPM": 1896.575806 },
          "racetime": 0.324,
          "timestamp": 1458141858
        },
        {
          "sensors": { "Speed": 6.755624, "Distance": 2004.084717, "RPM": 1673.264526 },
          "racetime": 0.556,
          "timestamp": 1458141858
        },
  29. Sensor Data V2

    • 3 main data points:
      – Speed (m/s)
      – RPM
      – Distance (m)
    • Throttle
    • Gear
    • …
    • Buffered

    {
      "_id": "1.458141858E9/0.324",
      "car": "car1",
      "timestamp": 1458141858,
      "racetime": 0.324,
      "records": [
        {
          "sensors": { "Speed": 3.588583, "Distance": 2003.023071, "RPM": 1896.575806, "gear": 2 },
          "racetime": 0.324,
          "timestamp": 1458141858
        },
        {
          "sensors": { "Speed": 6.755624, "Distance": 2004.084717, "RPM": 1673.264526, "gear": 2 },
          "racetime": 0.556,
          "timestamp": 1458141858
        },
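The buffered document shape shown on the slides can be reproduced with a small amount of code. The sketch below is an assumption about how such a document could be assembled, not the demo's actual implementation: it builds the `_id` by concatenating the first sample's timestamp (as a double) and race time, which happens to yield the same form as the ID on the slide. The class, its fields and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class SensorBufferSketch {
    // One reading from the car; field names follow the JSON on the slides.
    static final class Sample {
        final long timestamp;   // epoch seconds
        final double raceTime;  // seconds since the start of the race
        final double speed, distance, rpm;

        Sample(long timestamp, double raceTime, double speed, double distance, double rpm) {
            this.timestamp = timestamp;
            this.raceTime = raceTime;
            this.speed = speed;
            this.distance = distance;
            this.rpm = rpm;
        }

        String toJson() {
            return "{\"sensors\":{\"Speed\":" + speed + ",\"Distance\":" + distance
                + ",\"RPM\":" + rpm + "},\"racetime\":" + raceTime
                + ",\"timestamp\":" + timestamp + "}";
        }
    }

    // Assumed ID scheme: "<timestamp as double>/<racetime>" of the first buffered sample.
    static String documentId(Sample first) {
        return (double) first.timestamp + "/" + first.raceTime;
    }

    // Wrap a non-empty buffer of samples into one document for a given car.
    static String toDocument(String car, List<Sample> buffer) {
        Sample first = buffer.get(0);
        StringBuilder sb = new StringBuilder();
        sb.append("{\"_id\":\"").append(documentId(first)).append("\",")
          .append("\"car\":\"").append(car).append("\",")
          .append("\"timestamp\":").append(first.timestamp).append(',')
          .append("\"racetime\":").append(first.raceTime).append(',')
          .append("\"records\":[");
        for (int i = 0; i < buffer.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(buffer.get(i).toJson());
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<Sample> buffer = new ArrayList<>();
        buffer.add(new Sample(1458141858L, 0.324, 3.588583, 2003.023071, 1896.575806));
        buffer.add(new Sample(1458141858L, 0.556, 6.755624, 2004.084717, 1673.264526));
        System.out.println(toDocument("car1", buffer));
    }
}
```

Buffering several samples into one message trades a little latency for fewer, larger writes; adding a field such as `gear` in V2 only extends the `sensors` object, so existing consumers of the document can keep working.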
  30. • It works, is available on GitHub, ASL 2

    • Data collected is unrealistically limited, lacks
      – Tire pressure, temperature x 4
      – Brake usage, temperature x 8
      – Engine monitoring is primitive (RPMs only, no KERS)
      – Data rate is fixed, real data comes in at highly variable rates
      – Real data has variable delays due to RF dropout + buffering
    • Data collected is in pure JSON
      – Real data is columnar compressed blobs
  31. Next Steps

    • Near Real Time Data Processing
    • Aggregation
    • Machine Learning
    • Alerts
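Aggregation and alerting over a short window of readings can be illustrated without any streaming engine. The sketch below averages the RPM values of one window and raises an alert above a limit. The threshold is an arbitrary number chosen for the example, and the class and method names are hypothetical; in the demo this kind of logic would run inside a stream processor instead.

```java
final class RpmAlertSketch {
    // Mean of the readings in one window; an empty window is rejected.
    static double average(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty window");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Alert when the window's average RPM exceeds the given threshold.
    static boolean shouldAlert(double[] rpmWindow, double thresholdRpm) {
        return average(rpmWindow) > thresholdRpm;
    }

    public static void main(String[] args) {
        // RPM readings taken from the sample document on the sensor data slides.
        double[] window = {1896.575806, 1673.264526};
        double threshold = 1700.0; // arbitrary limit, for illustration only
        System.out.println("average RPM: " + average(window));
        System.out.println("alert: " + shouldAlert(window, threshold));
    }
}
```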
  32. • Cluster Computing Platform

    • Extends “MapReduce” with
      – Streaming
      – Interactive Analytics
    • Runs in memory
    • http://spark.apache.org/
  33. • Streaming Dataflow Engine

    • Datastream/Dataset APIs
    • CEP, Graph, ML
    • Runs in memory
    • https://flink.apache.org/
  34. IoT : Racing Cars V2.0

    Diagram labels: sensors data, Alerts
    https://github.com/mapr-demos/racing-time-series
  35. Spark & Streams

    val topics = "/app/racing/stream:all_cars"
    val sparkConf = new SparkConf().setAppName("SensorStream")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String](
      ConsumerConfig.GROUP_ID_CONFIG -> "race1",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG ->
        "org.apache.kafka.common.serialization.StringDeserializer",
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG ->
        "org.apache.kafka.common.serialization.StringDeserializer",
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "earliest",
      ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> "false",
      "spark.kafka.poll.time" -> "1000"
    )
    val messages = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet)
    val sensorDStream = messages.map(_._2).map(parseSensor)
    sensorDStream.foreachRDD { rdd =>
      // There exists at least one element in RDD
      if (!rdd.isEmpty) {
        …..
      }
    }
  36. Streaming Architecture & Formula 1

    • Stream data in real time
    • Big Data Store to deal with the scale
      – NoSQL Database, Distributed File System
    • Decouple the source from the consumer(s)
      – Dashboard, Analytics, Machine Learning
      – Add new use case….
  37. Streaming Architecture & Formula 1

    • Stream data in real time
    • Big Data Store to deal with the scale
      – NoSQL Database, Distributed File System
    • Decouple the source from the consumer(s)
      – Dashboard, Analytics, Machine Learning
      – Add new use case….
    This is not only about Formula 1! (Telco, Finance, Retail, Content, IT)
  38. MapR Converged Data Platform

    Diagram labels: Open Source Engines & Tools; Commercial Engines & Applications; Utility-Grade Platform Services; Data Processing; Enterprise Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams); Global Namespace; High Availability; Data Protection; Self-healing; Unified Security; Real-time; Multi-tenancy; Search & Others; Cloud & Managed Services; Custom Apps; Unified Management and Monitoring
  39. MapR Platform Services: Open API Architecture

    Assures Interoperability, Avoids Lock-in
    • MapR-FS (Enterprise Storage): HDFS API, POSIX, NFS
    • MapR-DB (NoSQL Database): SQL, HBase API, JSON API
    • MapR Streams (Global Event Streaming): Kafka API
  40. Q & A

    @mapr | @tgrall
    Engage with us! MapR, maprtech, mapr-technologies