Scaling a data team in a hyper growth and high tech environment

by Marketing OGZ

Slide 1

Slide 1 text

Big Data Expo 2022
Scaling a data team in a hyper growth and high tech environment

Slide 2

Slide 2 text

Confidential

Slide 3

Slide 3 text

2022: Lightyear 0. Show the potential, build the brand.
2025: Lightyear 2. Mass market, setting new standards.
Lightyear 0 to pave the way

Slide 4

Slide 4 text

Data @ Lightyear

- Optimised thermal system for highly efficient EVs
- Most efficient automotive solar panels
- Most efficient inverters
- Most aerodynamic production car design
- Lightweight body and chassis
- Highest specific energy battery pack
- 4 in-wheel motors
- Maximising solar yield
- Minimising energy consumption

Slide 5

Slide 5 text

Strategy

Slide 6

Slide 6 text

Infrastructure

Slide 7

Slide 7 text

- Maintainable
- Accessible
- Real-time
- Scalable

Slide 8

Slide 8 text

Use Case: Vehicle Monitoring

Important notice: the following use case is about collecting vehicle telemetry data from our prototypes! Data collection from the actual customer vehicles will be different and will not contain any PII data, such as the vehicle location.

Slide 9

Slide 9 text

Vehicle Telemetry One Year Ago

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Problem Statement

- Output in proprietary formats
- Files stored all over the place
- Knowledge of Matlab required to analyse data
- Resulting analyses are snapshots
- Shared as screenshots via mail or chat

Slide 12

Slide 12 text

Ideal Situation

- Over-the-air telemetry logging
- (Near) real-time availability
- Exploratory analysis
- Interactive dashboards
- Data management and governance

Slide 13

Slide 13 text

Initial Solution

Slide 14

Slide 14 text

Benefits

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Drawbacks

- InfluxDB to Databricks export is expensive and very slow
- Data in Databricks always lags behind by at least several hours
- No fallback mechanism if InfluxDB is unreachable
- No notice of signals that could not be translated/processed
- No easy way to integrate other systems

Slide 17

Slide 17 text

Updated Solution

Slide 18

Slide 18 text

Slide 19

Slide 19 text

Benefits

- Data is first sent to Kafka, which is built for handling large amounts of real-time data
- Databricks and InfluxDB can consume data from Kafka simultaneously and in real time
- Kafka consumers can be built in such a way that failed inserts to the other system are automatically retried
- Other systems, like the core platform, can easily start consuming from Kafka as well
- Untranslatable messages are sent to a separate topic, from where we can pick them up and investigate the issue
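
The pattern described on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not the pipeline shown in the talk: it uses plain Python lists as stand-ins for Kafka topics rather than a real Kafka client, and every name in it (`Topic`, `SIGNAL_NAMES`, `translate`, `route`, `consume`, `FlakySink`, and the topic names) is a hypothetical chosen for the example. It shows three ideas: untranslatable messages are parked on a separate topic, each sink reads the same log independently, and a failed insert is retried without advancing that sink's offset.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signal dictionary: raw signal id -> readable signal name.
SIGNAL_NAMES = {0x10: "battery_voltage", 0x11: "vehicle_speed"}


@dataclass
class Topic:
    """In-memory stand-in for a Kafka topic: an append-only log."""
    name: str
    log: list = field(default_factory=list)

    def produce(self, message: dict) -> None:
        self.log.append(message)


def translate(raw: dict) -> dict:
    """Map a raw frame to a named signal; raises KeyError for unknown ids."""
    return {"signal": SIGNAL_NAMES[raw["id"]], "value": raw["value"]}


def route(raw_topic: Topic, translated: Topic, dead_letter: Topic) -> None:
    """Translate every raw message; park untranslatable ones separately."""
    for raw in raw_topic.log:
        try:
            translated.produce(translate(raw))
        except KeyError:
            dead_letter.produce(raw)  # kept for later investigation


def consume(topic: Topic, insert: Callable[[dict], None],
            offset: int = 0, max_attempts: int = 3) -> int:
    """Insert messages into a sink, retrying failures.

    Returns the offset to resume from. If a message still fails after
    max_attempts, the offset is not advanced, so nothing is skipped.
    """
    while offset < len(topic.log):
        for _ in range(max_attempts):
            try:
                insert(topic.log[offset])
                break
            except ConnectionError:
                continue  # a real consumer would back off before retrying
        else:
            return offset  # sink still unreachable: resume here later
        offset += 1
    return offset


class FlakySink:
    """Hypothetical sink that rejects a fixed number of inserts, then recovers."""

    def __init__(self, failures: int):
        self.failures = failures
        self.rows: list = []

    def insert(self, message: dict) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("sink unreachable")
        self.rows.append(message)


raw = Topic("telemetry.raw")
for frame in ({"id": 0x10, "value": 398.2},
              {"id": 0x99, "value": 1},      # unknown id: cannot be translated
              {"id": 0x11, "value": 87.0}):
    raw.produce(frame)

translated = Topic("telemetry.translated")
dead_letter = Topic("telemetry.untranslatable")
route(raw, translated, dead_letter)

# Two independent consumers read the same topic, each with its own offset.
time_series_db = FlakySink(failures=2)   # recovers within the retry budget
lakehouse = FlakySink(failures=0)
time_series_offset = consume(translated, time_series_db.insert)
lakehouse_offset = consume(translated, lakehouse.insert)
```

With a real broker, the offset bookkeeping done by hand in `consume` would be handled by consumer groups and offset commits, and the two sinks would run as separate consumer groups so that an outage in one does not hold back the other.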

Slide 20

Slide 20 text

Circling Back

Slide 21

Slide 21 text

Thank you.

V2.0 220609