Slide 1

Slide 1 text

The F1 Demo: Streaming Real-time Telemetry Using Apache Kafka and StreamSets
DataOps Summit SF - September 5, 2019

Slide 2

Slide 2 text

Randy Zwitch
Senior Director of Developer Advocacy
@randyzwitch /in/randyzwitch/ /randyzwitch

Slide 3

Slide 3 text

Volume
Agility
Spatio-Temporal

Slide 4

Slide 4 text

(You’re welcome, Dima)

Slide 5

Slide 5 text

OmniSciDB: Compiled, Columnar and (Lots of) Cores

Traditional DBs can be highly inefficient
- Each operator in SQL treated as a separate function
- Incurs tremendous overhead and prevents vectorization

OmniSci compiles queries w/ LLVM to create one custom function
- Queries run at speeds approaching hand-written functions
- LLVM enables generic targeting of different architectures (GPUs, x86, ARM, etc.)
- Code can be generated to run query on CPU and GPU simultaneously
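The contrast between "one function per operator" and "one custom function per query" can be sketched in plain Python. This is a toy illustration only: it uses Python's `exec` as a stand-in for code generation and has nothing to do with OmniSci's actual LLVM pipeline. The query, the function names, and the single-column row format are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def filter_op(rows: Sequence[float], keep: Callable[[float], bool]) -> List[float]:
    """One operator: a function call per row plus an intermediate list."""
    return [x for x in rows if keep(x)]


def project_op(rows: Sequence[float], expr: Callable[[float], float]) -> List[float]:
    """A second operator, again with its own intermediate list."""
    return [expr(x) for x in rows]


def run_operator_at_a_time(rows: Sequence[float]) -> float:
    # Roughly: SELECT SUM(x * 2) FROM t WHERE x > 100 (hypothetical query)
    kept = filter_op(rows, lambda x: x > 100)
    doubled = project_op(kept, lambda x: x * 2)
    return float(sum(doubled))


def compile_query(predicate: str, expression: str) -> Callable[[Sequence[float]], float]:
    """Generate source for a single fused loop, then compile it once."""
    source = (
        "def fused(rows):\n"
        "    total = 0.0\n"
        "    for x in rows:\n"
        f"        if {predicate}:\n"
        f"            total += {expression}\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(source, namespace)  # toy stand-in for a real code generator
    return namespace["fused"]


if __name__ == "__main__":
    rows = [50.0, 150.0, 200.0]
    fused = compile_query("x > 100", "x * 2")
    print(run_operator_at_a_time(rows), fused(rows))
```

The fused version makes one pass over the data with no intermediate lists, which is the general idea behind compiling a whole query into a single function.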

Slide 6

Slide 6 text

The F1 Demo at NVIDIA GTC 2019

Slide 7

Slide 7 text

“We need to build something cool for our booth...”

Slide 8

Slide 8 text

Step 1: Write UDP stream to Kafka
https://raw.githubusercontent.com/omnisci/vehicle-telematics-analytics-demo/master/dataengineering/pipelines/UDP736c69c5-0b2b-4e9a-8263-85d8bd5e5fd2.json
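The demo does this step with a StreamSets pipeline (the JSON linked above). As a rough Python stand-in for the same idea, the sketch below reads raw datagrams from a UDP socket and hands each one, unparsed, to a `publish` callable. The `publish` callable, the topic name `f1-raw`, and the buffer size are assumptions; in a real setup `publish` would wrap a Kafka producer's send call.

```python
import socket
from typing import Callable


def forward_udp(
    sock: socket.socket,
    publish: Callable[[str, bytes], None],
    topic: str,
    max_packets: int,
    bufsize: int = 2048,
) -> int:
    """Forward up to max_packets raw datagrams to publish(topic, payload)."""
    forwarded = 0
    while forwarded < max_packets:
        payload, _sender = sock.recvfrom(bufsize)  # blocks until a datagram arrives
        publish(topic, payload)  # bytes are passed through without parsing
        forwarded += 1
    return forwarded


if __name__ == "__main__":
    # Loopback demo: send one datagram to ourselves and forward it.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(5)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"\x01\x02\x03", receiver.getsockname())
    forward_udp(receiver, lambda t, p: print(t, p), "f1-raw", max_packets=1)
    sender.close()
    receiver.close()
```

Keeping this stage free of parsing means the raw stream is captured even if a downstream parser has a bug.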

Slide 9

Slide 9 text

Step 2: Parse UDP Packets, Write to Kafka
https://raw.githubusercontent.com/omnisci/vehicle-telematics-analytics-demo/master/dataengineering/pipelines/01ParsemessagestoJSONcopy3d6023cc-0620-4312-9957-01f0d91b8302.json
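A minimal Python sketch of the parsing idea, assuming a made-up fixed-width packet. The layout and field names below are hypothetical and are NOT the real game packet format; the actual parsing logic lives in the StreamSets pipeline linked above.

```python
import json
import struct

# Hypothetical layout, not the real packet format: little-endian
# float64 session time, float32 speed, uint16 engine RPM, int8 gear.
PACKET = struct.Struct("<dfHb")
FIELDS = ("session_time", "speed_kph", "engine_rpm", "gear")


def parse_packet(payload: bytes) -> str:
    """Unpack one binary packet and return it as a JSON string."""
    if len(payload) != PACKET.size:
        raise ValueError(f"expected {PACKET.size} bytes, got {len(payload)}")
    values = PACKET.unpack(payload)
    return json.dumps(dict(zip(FIELDS, values)))


if __name__ == "__main__":
    raw = PACKET.pack(12.5, 312.5, 11000, 7)
    print(parse_packet(raw))
```

Rejecting packets of the wrong length up front keeps malformed datagrams from producing misaligned fields downstream.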

Slide 10

Slide 10 text

Step 3: Parse JSON, Write to OmniSci
https://raw.githubusercontent.com/omnisci/vehicle-telematics-analytics-demo/master/dataengineering/pipelines/02LoadF1messagestoOmniSci269b8b03-6dd1-4744-980b-bf7008ff714b.json
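One way to picture this step outside StreamSets is to parse each JSON message into a row and load rows in batches rather than one at a time. The sketch below assumes a hypothetical column list and a `load_rows` callable standing in for whatever bulk-load call the database client provides. It does not reproduce the demo's pipeline.

```python
import json
from typing import Callable, Iterable, List, Tuple

# Hypothetical column order for the target table.
COLUMNS = ("session_time", "speed_kph", "engine_rpm", "gear")


def load_in_batches(
    messages: Iterable[str],
    load_rows: Callable[[List[Tuple]], None],
    batch_size: int = 1000,
) -> int:
    """Parse JSON messages into rows and flush them in batches."""
    batch: List[Tuple] = []
    loaded = 0
    for message in messages:
        record = json.loads(message)
        batch.append(tuple(record[name] for name in COLUMNS))
        if len(batch) >= batch_size:
            load_rows(batch)
            loaded += len(batch)
            batch = []  # start a fresh list; the flushed one is left untouched
    if batch:  # flush the final partial batch
        load_rows(batch)
        loaded += len(batch)
    return loaded


if __name__ == "__main__":
    msgs = [
        json.dumps({"session_time": t, "speed_kph": 300.0, "engine_rpm": 11000, "gear": 7})
        for t in (0.0, 0.5, 1.0)
    ]
    print(load_in_batches(msgs, lambda rows: print(len(rows), "rows"), batch_size=2))
```

Batching trades a little latency for far fewer round trips to the database, which usually matters for high-rate telemetry.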

Slide 11

Slide 11 text

What I Learned
- Build large pipelines as a series of smaller pipelines
- Watch your defaults when developing!
- Avoid serializing to plain text to improve throughput
- Watch out for Jython issues in multi-threaded pipelines; use Groovy instead
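To make the plain-text point concrete, here is a small sketch comparing the same record encoded as JSON text and as a fixed-width binary struct. The record layout and field names are hypothetical, and this compares payload sizes only; it is not a throughput benchmark of the demo.

```python
import json
import struct

# Hypothetical record layout: float64 time, float32 speed, uint16 RPM, int8 gear.
RECORD = struct.Struct("<dfHb")


def encode_text(session_time: float, speed_kph: float, engine_rpm: int, gear: int) -> bytes:
    """Serialize the record as UTF-8 JSON text."""
    return json.dumps(
        {
            "session_time": session_time,
            "speed_kph": speed_kph,
            "engine_rpm": engine_rpm,
            "gear": gear,
        }
    ).encode("utf-8")


def encode_binary(session_time: float, speed_kph: float, engine_rpm: int, gear: int) -> bytes:
    """Serialize the same record as a fixed-width binary struct."""
    return RECORD.pack(session_time, speed_kph, engine_rpm, gear)


def decode_binary(payload: bytes) -> tuple:
    """Recover the field values from the binary form."""
    return RECORD.unpack(payload)


if __name__ == "__main__":
    sample = (12.5, 312.5, 11000, 7)
    print(len(encode_text(*sample)), "bytes as JSON text")
    print(len(encode_binary(*sample)), "bytes as binary")
```

The binary form is smaller and skips text parsing on the consumer side. The cost is that both ends must agree on the layout.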
