

Scaling a data team in a hyper growth and high tech environment

Marketing OGZ

September 19, 2022

Transcript

  1. Big Data Expo 2022 Scaling a data team in a

    hyper growth and high tech environment
  2. 2022 — Lightyear 0: show the potential, build the brand.

    2025 — Lightyear 2: mass market, setting new standards. Lightyear 0 to pave the way.
  3. Data @ Lightyear — Optimised thermal system for highly efficient

    EVs — Most efficient automotive solar panels — Most efficient inverters — Most aerodynamic production car design — Lightweight body and chassis — Highest specific-energy battery pack — 4 in-wheel motors — Maximising solar yield — Minimising energy consumption
  4. Use Case Vehicle Monitoring Important notice: the following use case

    is about collecting vehicle telemetry data from our prototypes! Data collection from the actual customer vehicles will be different and will not contain any PII, such as the vehicle location.
  6. — Output in proprietary formats — Files stored all over

    the place — Knowledge of Matlab required to analyse data — Resulting analyses are snapshots — Shared as screenshots via email or chat Problem Statement
  7. — Over-the-air telemetry logging — (Near) real-time

    availability — Exploratory analysis — Interactive dashboards — Data management and governance Ideal Situation
  9. — InfluxDB to Databricks export is expensive and very slow

    — Data in Databricks always lags behind by at least several hours — No fallback mechanism if InfluxDB is unreachable — No notice of signals that could not be translated/processed — No easy way to integrate other systems Drawbacks
  11. — Data is first sent to Kafka, which is built

    for handling large amounts of real-time data — Databricks and InfluxDB can consume data from Kafka simultaneously and in real time — Kafka consumers can be built so that failed inserts into the downstream system are automatically retried — Other systems, like the core platform, can easily start consuming from Kafka as well — Untranslatable messages are sent to a separate topic, from where we can pick them up and investigate the issue Benefits
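The consumer behaviour described on this slide — independent sinks reading the same topic, retrying failed inserts, and dead-lettering untranslatable messages — can be sketched as follows. This is a minimal in-memory illustration of the pattern, not Lightyear's actual implementation: the topic names (`vehicle-telemetry`, `telemetry-dead-letter`), the signal catalogue, and the sink classes are all hypothetical stand-ins for Kafka, Databricks, and InfluxDB.

```python
from collections import defaultdict

class Topic:
    """In-memory stand-in for a Kafka topic: an append-only message log."""
    def __init__(self):
        self.messages = []
    def produce(self, msg):
        self.messages.append(msg)

topics = defaultdict(Topic)

# Hypothetical catalogue of signals the pipeline knows how to translate.
KNOWN_SIGNALS = {"battery_soc", "solar_yield_w"}

def translate(msg):
    """Translate a raw telemetry message; raise if the signal is unknown."""
    if msg["signal"] not in KNOWN_SIGNALS:
        raise ValueError(f"unknown signal: {msg['signal']}")
    return {"name": msg["signal"], "value": float(msg["value"])}

def consume(topic, sink, max_retries=3):
    """One consumer per sink (e.g. Databricks, InfluxDB) reads the same
    topic independently. Untranslatable messages go to a dead-letter
    topic for later investigation; failed inserts are retried."""
    for msg in topics[topic].messages:
        try:
            record = translate(msg)
        except ValueError:
            topics["telemetry-dead-letter"].produce(msg)
            continue
        for _ in range(max_retries):
            try:
                sink.insert(record)
                break
            except ConnectionError:
                continue  # retry the failed insert

class FlakySink:
    """Toy sink whose first insert fails, to exercise the retry path."""
    def __init__(self):
        self.records, self.calls = [], 0
    def insert(self, record):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("sink unreachable")
        self.records.append(record)

topics["vehicle-telemetry"].produce({"signal": "battery_soc", "value": "87"})
topics["vehicle-telemetry"].produce({"signal": "cabin_temp", "value": "21"})

sink = FlakySink()
consume("vehicle-telemetry", sink)
# battery_soc lands in the sink after one retry; cabin_temp, being
# untranslatable, ends up on the dead-letter topic instead.
```

Because each sink keeps its own read position (in real Kafka, its own consumer group offset), a second sink could replay the same topic without affecting the first — which is what lets Databricks and InfluxDB, or later the core platform, consume the same stream simultaneously.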