Scaling a data team in a hyper growth and high tech environment

Big Data Expo 2022

Confidential

Lightyear 0 to pave the way
— 2022: Lightyear 0. Show the potential, build the brand
— 2025: Lightyear 2. Mass market, setting new standards

Data @ Lightyear
— Optimised thermal system for highly efficient EVs
— Most efficient automotive solar panels
— Most efficient inverters
— Most aerodynamic production car design
— Lightweight body and chassis
— Highest specific energy battery pack
— 4 in-wheel motors
— Maximising solar yield
— Minimising energy consumption

Strategy

Infrastructure

Maintainable, Accessible, Real-time, Scalable

Use Case: Vehicle Monitoring
Important notice: the following use case is about collecting vehicle telemetry data from our prototypes! Data collection from the actual customer vehicles will be different and will not contain any PII, such as the vehicle location.

Vehicle Telemetry One Year Ago

Problem Statement
— Output in proprietary formats
— Files stored all over the place
— Knowledge of MATLAB required to analyse data
— Resulting analyses are snapshots
— Shared as screenshots via mail or chat

Ideal Situation
— Over-the-air telemetry logging
— (Near) real-time availability
— Exploratory analysis
— Interactive dashboards
— Data management and governance

Initial Solution

Benefits

Drawbacks
— InfluxDB to Databricks export is expensive and very slow
— Data in Databricks always lags behind by at least several hours
— No fallback mechanism if InfluxDB is unreachable
— No notice of signals that could not be translated or processed
— No easy way to integrate other systems

Updated Solution

Benefits
— Data is first sent to Kafka, which is built for handling large amounts of real-time data
— Databricks and InfluxDB can consume data from Kafka simultaneously and in real time
— Kafka consumers can be built in such a way that failed inserts to the other system are automatically retried
— Other systems, like the core platform, can easily start consuming from Kafka as well
— Untranslatable messages are sent to a separate topic, from where we can pick them up and investigate the issue
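The retry and separate-topic behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual consumer: it uses in-memory lists in place of Kafka topics, and every name in it (`translate`, `consume`, `InsertError`, `FlakySink`, the JSON message layout) is a hypothetical stand-in chosen for the example.

```python
import json


class InsertError(Exception):
    """Raised by a sink when a write fails (hypothetical error type)."""


def translate(raw: bytes) -> dict:
    """Decode one raw telemetry message; raise ValueError if it cannot be translated."""
    record = json.loads(raw)  # invalid JSON raises a ValueError subclass
    if not isinstance(record, dict) or "signal" not in record or "value" not in record:
        raise ValueError("message lacks 'signal' or 'value'")
    return record


def consume(messages, insert, dead_letters, max_attempts=3):
    """Process messages in order and return how many records were inserted.

    Untranslatable messages are appended to dead_letters (standing in for a
    separate topic). Failed inserts are retried up to max_attempts times.
    """
    inserted = 0
    for raw in messages:
        try:
            record = translate(raw)
        except ValueError:
            dead_letters.append(raw)  # keep the raw bytes for later investigation
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                insert(record)
                inserted += 1
                break
            except InsertError:
                if attempt == max_attempts:
                    # Give up without marking the message as handled, so a
                    # real consumer would read it again after a restart.
                    raise
    return inserted


class FlakySink:
    """Hypothetical sink that fails a fixed number of times before accepting writes."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = []

    def __call__(self, record):
        if self.failures > 0:
            self.failures -= 1
            raise InsertError("sink unreachable")
        self.rows.append(record)


if __name__ == "__main__":
    sink, dead = FlakySink(failures=2), []
    batch = [b'{"signal": "battery_temp", "value": 31.5}', b"not-json"]
    print(consume(batch, sink, dead), "inserted,", len(dead), "sent to dead letters")
```

In a real deployment the lists would be replaced by a Kafka client, a wait between retries would normally be added, and each target system (Databricks, InfluxDB) would run its own consumer so that a failure in one does not hold up the other.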

Circling Back

Thank you. V2.0 220609


Marketing OGZ
