Fast Cars, Big Data - How Streaming Can Help Formula 1

Tugdual Grall
September 16, 2016

Modern cars produce data. Lots of data. And Formula 1 cars produce more than their share. I will present a working demonstration of how modern data streaming can be applied to the data acquisition and analysis problem posed by modern motorsports.

Instead of bringing multiple Formula 1 cars to the talk, I will show how we instrumented a high-fidelity, physics-based automotive simulator to produce realistic data from simulated cars running on the Spa-Francorchamps track. We move data from the cars to the pits, and on to the engineers back at HQ.

The result is near-real-time visualization and comparison of performance, and a good demonstration of how to move data with messaging systems like Kafka, process it in real time with Apache Spark, and analyse it using SQL with Apache Drill.

Code available here: https://github.com/mapr-demos/racing-time-series

Transcript

  1. © 2016 MapR Technologies
    Fast Cars, Big Data
    How Streaming Can Help Formula 1
    Tugdual Grall
    @tgrall

  2. {"about" : "me"}
    Tugdual “Tug” Grall
    • MapR
    – Technical Evangelist
    • MongoDB
    – Technical Evangelist
    • Couchbase
    – Technical Evangelist
    • eXo
    – CTO
    • Oracle
    – Developer/Product Manager
    – Mainly Java/SOA
    • Developer in consulting firms
    • Web
    – @tgrall
    – http://tgrall.github.io
    – tgrall
    • NantesJUG co-founder
    • Pet Project:
    – http://www.resultri.com

  3. Agenda
    • What’s the point of data in motorsports?
    • Live demo
    • Architecture
    • What’s next?

  4. How data plays in F1 motorsports

  6. Data in Motorsports
    http://f1framework.blogspot.de/2013/08/short-guide-to-f1-telemetry-spa-circuit.html

  7. Difference is due to later and sharper braking

  8. Real Analytics as Well as Visualization
    • Inputs
    – Predictive analysis of consumables and tires
    – Physical models of car + driver performance
    – Tire wear slows lap times; lower fuel weight speeds them up
    – Competitors’ options
    – Weather conditions
    – Current GP points status
    • Outputs
    – Tactical options, outcome distributions

  9. Data for Marketing as well
    http://formula1.ferrari.com/en/inforacing-hungarian-gp-2015/

  17. Some Examples?
    • Up to 300 sensors per car
    • Up to 2,000 channels
    • Sensor data reaches the paddock in 2 ms
    • 1.5 billion data points per race
    • 5 billion for a full race weekend
    • 5 to 6 Gb of compressed data per car over 90 min
    • US Grand Prix 2014: 243 Tb (race teams combined)
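
As a rough back-of-the-envelope check on the figures above, the sketch below turns them into average rates. It assumes, which the slide does not state, that the 1.5 billion data points and the 5 Gb figure both describe one car over a 90-minute race, that the 2,000 channels are sampled evenly, and that "Gb" means gigabytes. The class and method names are illustrative only.

```java
final class TelemetryRates {
    /** Average data points per second over a session of the given length. */
    static double pointsPerSecond(double totalPoints, double sessionMinutes) {
        return totalPoints / (sessionMinutes * 60.0);
    }

    /** Average compressed bytes per data point. */
    static double bytesPerPoint(double compressedBytes, double totalPoints) {
        return compressedBytes / totalPoints;
    }

    public static void main(String[] args) {
        double points = 1.5e9; // data points per race (from the slide)
        double minutes = 90;   // session length (from the slide)
        int channels = 2000;   // upper bound on channels (from the slide)

        double perSecond = pointsPerSecond(points, minutes);
        System.out.printf("%.0f points/s overall%n", perSecond);
        System.out.printf("%.1f samples/s per channel%n", perSecond / channels);
        // Assumption: 5 Gb is read as 5e9 bytes.
        System.out.printf("%.2f bytes per point%n", bytesPerPoint(5e9, points));
    }
}
```

Under those assumptions the averages come to roughly 278,000 points per second, or about 139 samples per second per channel. Real channels are sampled at very different rates, so this is only an order-of-magnitude guide.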

  18. So how does that work?
    Especially for real-time data?

  19. Production System Outline

  20. Simplified Demo System Outline
    [Diagram labels: TORCS race simulator; MapR Streams; Archive; MapR DB; Jetty / Bootstrap / d3; Apache Drill (SQL access)]

  21. TORCS for Cars, Physics and Drivers
    TORCS is a pseudo-physics-based racing simulator with full graphics output and pluggable control modules.
    TORCS is commonly used for AI research, but the control model can just as well collect data.

  22. Let’s see it work!

  23. IoT : Racing Cars
    [Diagram labels: Producers; sensor data; Consumers; Real Time Analytics]
    https://github.com/mapr-demos/racing-time-series

  24. IoT : Racing Cars
    [Diagram labels: sensor data; Kafka Producer (Java); Kafka Consumer + OJAI (Java); Kafka Consumer + WebSocket (Java + JS); SQL]
    https://github.com/mapr-demos/racing-time-series

  25. Big Datastore
    Distributed File System
    HDFS/MapR-FS
    NoSQL Database
    HBase/MapR-DB
    ….

  26. Store data as File or Row?
    HDFS / MapR-FS
    • Data stored as “files”
    • Fast for large scans
    • Slow random reads/writes
    NoSQL (HBase/MapR-DB)
    • Data stored as rows/documents
    • Fast random reads/writes

  27. Kafka & MapR Streams

  28. What is Kafka?
    • http://kafka.apache.org/
    • Created at LinkedIn, open sourced in 2011
    • Implemented in Scala / Java
    • Distributed messaging system built to scale

  29. Key Concepts
    • Feeds of messages are organised in topics
    • Processes that publish messages are called producers
    • Processes that subscribe to topics and process messages are called consumers
    • A Kafka cluster is made of one or more brokers (broker == node)

  30. Topics and Partitions
    • Split topics into partitions for scalability
    [Diagram: writes are appended to three partitions. Partition 0: 0 1 2 3 4 5 6 7 8; Partition 1: 0 1 2 3 4 5; Partition 2: 0 1 2 3 4 5 6 7]
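
The idea of splitting a topic into partitions can be sketched in plain Java, with no Kafka dependency. Each partition is modelled as an append-only list, and a message's offset is its position in that list. The `PartitionSketch` class, the car keys, and the use of `String.hashCode` are illustrative assumptions; Kafka's default partitioner hashes the serialized key bytes with a different function.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {
    // Illustrative hash only; Kafka's default partitioner uses a different
    // hash over the serialized key bytes.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }

        String[] keys = {"car1", "car2", "car3", "car1", "car2", "car1"};
        for (String key : keys) {
            int p = partitionFor(key, numPartitions);
            List<String> log = partitions.get(p);
            // The offset of a new message is the current length of the log.
            System.out.println(key + " -> partition " + p + ", offset " + log.size());
            log.add(key);
        }
    }
}
```

Because the partition depends only on the key, all messages for one car land in the same partition and keep their relative order, while different cars can be written and read in parallel.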

  31. Consumer Groups
    • Single consumer abstraction for scalability
    • Max 1 consumer per partition within a group
    • Any number of consumer groups
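
The rule that a partition has at most one consumer within a group can be illustrated with a simplified round-robin assignment in plain Java. This is a sketch of the idea only, not Kafka's actual assignor; `GroupAssignmentSketch` and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {
    /** Gives each partition to exactly one consumer of the group, round-robin. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two consumers share three partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 3));
        // Four consumers, three partitions: one consumer stays idle.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3", "c4"), 3));
    }
}
```

With two consumers and three partitions, `c1` owns partitions 0 and 2 and `c2` owns partition 1. With four consumers, `c4` receives nothing, which is why adding consumers beyond the partition count does not add throughput within one group.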

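  32. Produce Messages
    // Generic types were lost in the transcript; <String, byte[]> is assumed
    ProducerRecord<String, byte[]> rec = new ProducerRecord<>(
        "/stream/car_1_topic",          // topic
        eventName,                      // key
        value.toString().getBytes());   // value
    // Send asynchronously; the callback receives any send error
    producer.send(rec, (recordMetadata, e) -> {
        if (e != null) { … }
    });
    producer.flush();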

  33. Consume Messages
    long pollTimeOut = 800;
    while (true) {
        // Generic types were lost in the transcript; <String, byte[]> is assumed
        ConsumerRecords<String, byte[]> records = consumer.poll(pollTimeOut);
        if (!records.isEmpty()) {
            Iterable<ConsumerRecord<String, byte[]>> iterable = records::iterator;
            StreamSupport.stream(iterable.spliterator(), false).forEach((record) -> {
                // work with record object
                … record.value();
            });
            // Commit offsets once the batch has been processed
            consumer.commitAsync();
        }
    }

  34. Big Picture
    [Diagram labels: three producers; three consumers]

  35. More real life Kafka …
    [Diagram labels: Zookeeper; Broker 1, Broker 2, Broker 3 (each with Topic A and Topic B); three producers; three consumers]

  36. MapR Streams
    • Distributed messaging system built to scale
    • Uses the Apache Kafka 0.9.0 API
    – No code change
    • Does not use the same “broker” architecture
    – Log stored in MapR Storage (Scalable, Secured, Fast, Multi DC)
    – No Zookeeper

  37. Kafka
    [Diagram labels: Zookeeper; Broker 1, Broker 2, Broker 3 (each with Topic A and Topic B); three producers; three consumers]

  38. MapR Streams
    [Diagram labels: three streams (each with Topic A and Topic B); three producers; three consumers]

  39. What’s next?

  40. Sensor Data V1
    • 3 main data points:
    – Speed (m/s)
    – RPM
    – Distance (m)
    • Buffered
    { "_id":"1.458141858E9/0.324",
      "car":"car1",
      "timestamp":1458141858,
      "racetime":0.324,
      "records":
      [
        {
          "sensors":{
            "Speed":3.588583,
            "Distance":2003.023071,
            "RPM":1896.575806
          },
          "racetime":0.324,
          "timestamp":1458141858
        },
        {
          "sensors":{
            "Speed":6.755624,
            "Distance":2004.084717,
            "RPM":1673.264526
          },
          "racetime":0.556,
          "timestamp":1458141858
        },
        …

  41. Sensor Data V2
    • Main data points:
    – Speed (m/s)
    – RPM
    – Distance (m)
    – Throttle
    – Gear
    – …
    • Buffered
    { "_id":"1.458141858E9/0.324",
      "car":"car1",
      "timestamp":1458141858,
      "racetime":0.324,
      "records":
      [
        {
          "sensors":{
            "Speed":3.588583,
            "Distance":2003.023071,
            "RPM":1896.575806,
            "gear":2
          },
          "racetime":0.324,
          "timestamp":1458141858
        },
        {
          "sensors":{
            "Speed":6.755624,
            "Distance":2004.084717,
            "RPM":1673.264526,
            "gear":2
          },
          "racetime":0.556,
          "timestamp":1458141858
        },
        …
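
A minimal sketch of how buffered samples might be packed into one document of the shape shown on the sensor data slides, using plain Java string building. The `SensorBatchSketch` and `Sample` names are hypothetical, and the way `_id` is built here (first sample's timestamp rendered as a double, a slash, then its racetime) is an assumption inferred from the example value; the demo's own code may do this differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class SensorBatchSketch {
    /** One buffered reading; field names follow the JSON on the slides. */
    static final class Sample {
        final long timestamp;
        final double racetime, speed, distance, rpm;

        Sample(long timestamp, double racetime, double speed, double distance, double rpm) {
            this.timestamp = timestamp;
            this.racetime = racetime;
            this.speed = speed;
            this.distance = distance;
            this.rpm = rpm;
        }

        String toJson() {
            return String.format(Locale.ROOT,
                "{\"sensors\":{\"Speed\":%f,\"Distance\":%f,\"RPM\":%f},"
                    + "\"racetime\":%s,\"timestamp\":%d}",
                speed, distance, rpm, racetime, timestamp);
        }
    }

    // Assumption: "_id" is "<timestamp as double>/<racetime>" of the first sample.
    static String documentId(long timestamp, double racetime) {
        return (double) timestamp + "/" + racetime;
    }

    /** Wraps a non-empty buffer of samples into a single JSON document. */
    static String toJson(String car, List<Sample> samples) {
        Sample first = samples.get(0);
        StringBuilder sb = new StringBuilder();
        sb.append("{\"_id\":\"").append(documentId(first.timestamp, first.racetime)).append("\",");
        sb.append("\"car\":\"").append(car).append("\",");
        sb.append("\"timestamp\":").append(first.timestamp).append(",");
        sb.append("\"racetime\":").append(first.racetime).append(",");
        sb.append("\"records\":[");
        for (int i = 0; i < samples.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(samples.get(i).toJson());
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<Sample> buffer = Arrays.asList(
            new Sample(1458141858L, 0.324, 3.588583, 2003.023071, 1896.575806),
            new Sample(1458141858L, 0.556, 6.755624, 2004.084717, 1673.264526));
        System.out.println(toJson("car1", buffer));
    }
}
```

Buffering several readings into one document trades a little latency for far fewer writes to the database, which matters at the data rates quoted earlier in the deck.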

  42.
    • It works and is available on GitHub under the Apache License 2.0
    • Data collected is unrealistically limited, lacks
    – Tire pressure, temperature x 4
    – Brake usage, temperature x 8
    – Engine monitoring is primitive (RPMs only, no KERS)
    – Data rate is fixed, real data comes in at highly variable rates
    – Real data has variable delays due to RF dropout + buffering
    • Data collected is in pure JSON
    – Real data is columnar compressed blobs

  43. Next Steps
    • Near Real Time Data Processing
    • Aggregation
    • Machine Learning
    • Alerts
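
To make the aggregation and alert steps concrete, here is a small plain-Java sketch of a moving average over the last few RPM readings that raises an alert when the mean exceeds a threshold. The `RpmAlertSketch` class, the window size, and the threshold are all hypothetical; a real pipeline would run this kind of logic inside a stream processor rather than a single object.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RpmAlertSketch {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int size;
    private final double threshold;
    private double sum;

    RpmAlertSketch(int size, double threshold) {
        this.size = size;
        this.threshold = threshold;
    }

    /**
     * Adds a reading and returns true when the window is full and the
     * mean of the readings in it exceeds the threshold.
     */
    boolean add(double rpm) {
        window.addLast(rpm);
        sum += rpm;
        if (window.size() > size) {
            sum -= window.removeFirst(); // drop the oldest reading
        }
        return window.size() == size && sum / size > threshold;
    }

    public static void main(String[] args) {
        // Hypothetical values: a window of 3 readings and a limit of 1000 RPM.
        RpmAlertSketch alert = new RpmAlertSketch(3, 1000.0);
        double[] readings = {900, 1100, 1200, 500};
        for (double rpm : readings) {
            System.out.println(rpm + " -> alert=" + alert.add(rpm));
        }
    }
}
```

Averaging over a window rather than alerting on single readings reduces false alarms from one-off spikes, at the cost of reacting a few samples later.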

  44. Apache Spark
    • Cluster Computing Platform
    • Extends the “MapReduce” model with
    – Streaming
    – Interactive Analytics
    • Runs in memory
    • http://spark.apache.org/

  45. Apache Flink
    • Streaming Dataflow Engine
    • Datastream/Dataset APIs
    • CEP, Graph, ML
    • Runs in memory
    • https://flink.apache.org/

  46. IoT : Racing Cars V2.0
    [Diagram labels: sensor data; Alerts]
    https://github.com/mapr-demos/racing-time-series

  47. Spark & Streams
    val topics = "/app/racing/stream:all_cars"
    val sparkConf = new SparkConf().setAppName("SensorStream")
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String](
      ConsumerConfig.GROUP_ID_CONFIG -> "race1",
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer",
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> "org.apache.kafka.common.serialization.StringDeserializer",
      ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "earliest",
      ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> "false",
      "spark.kafka.poll.time" -> "1000"
    )
    val messages = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet)
    // Keep only the message value and parse it into a sensor record
    val sensorDStream = messages.map(_._2).map(parseSensor)
    sensorDStream.foreachRDD { rdd =>
      // There exists at least one element in RDD
      if (!rdd.isEmpty) { ….. }
    }

  49. Streaming Architecture & Formula 1
    • Stream data in real time
    • Big Data Store to deal with the scale
    – NoSQL Database, Distributed File System
    • Decouple the source from the consumer(s)
    – Dashboard, Analytics, Machine Learning
    – Add new use cases…
    This is not only about Formula 1!
    (Telco, Finance, Retail, Content, IT)

  50. MapR Converged Data Platform
    [Diagram labels: Open Source Engines & Tools; Commercial Engines & Applications; Data Processing; Search & Others; Cloud & Managed Services; Custom Apps; Utility-Grade Platform Services: Enterprise Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams); Global Namespace, High Availability, Data Protection, Self-healing, Unified Security, Real-time, Multi-tenancy; Unified Management and Monitoring]

  51. MapR Platform Services: Open API Architecture
    Assures Interoperability, Avoids Lock-in
    • MapR-FS (Enterprise Storage): HDFS API, POSIX, NFS
    • MapR-DB (NoSQL Database): SQL, HBase API, JSON API
    • MapR Streams (Global Event Streaming): Kafka API

  53. Q & A
    @mapr | @tgrall
    Engage with us!
    MapR, maprtech, mapr-technologies
