TECHNICAL NEEDS
• High availability & resilient system
• QoS 2 for devices
• Aggregate data
• Monitoring (Infra & fleet)
• Expose data to Data Scientists (API)

PROBLEMS TO SOLVE
• Storage
  ◦ Use a single DB & remove TTL
  ◦ Use a full time series DB
  ◦ Compression
• Grafana integration
  ◦ Datasource plugin for MongoDB
• Spring XD EOL
  ◦ Use Kafka broker
  ◦ Use Kafka streams to enrich data in real-time
• Security implementation
  ◦ ACL per device
• Historical data (only for migration purposes)
  ◦ Load data from HDFS to DB
  ◦ Performance issue

MONGODB TEST & LIMITATIONS
• Limitations
  ◦ Time Series DB
  ◦ Field name length
  ◦ Aggregate data per minute
  ◦ Upsert (Order & minute)
  ◦ Flatten data
  ◦ Aggregations are needed most of the time (avg, count, …)
  ◦ Time precision

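The "aggregate data per minute" and "upsert" items can be illustrated with a minimal sketch. It only builds the filter and update documents for a per-minute bucket; the field names (`device`, `minute`, `count`, `sum`, `min`, `max`) and the helper names are hypothetical and not taken from the tested schema. `$inc`, `$min` and `$max` are standard MongoDB update operators, and since they are commutative the resulting bucket does not depend on the order in which points arrive.

```python
from datetime import datetime, timezone


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def build_minute_upsert(device_id: str, ts: datetime, value: float):
    """Return (filter, update) documents for one data point.

    Field names are illustrative assumptions, kept short because
    MongoDB stores field names inside every document.
    """
    filter_doc = {"device": device_id, "minute": minute_bucket(ts)}
    update_doc = {
        "$inc": {"count": 1, "sum": value},  # avg = sum / count at read time
        "$min": {"min": value},
        "$max": {"max": value},
    }
    return filter_doc, update_doc


point_time = datetime(2020, 1, 1, 12, 34, 56, 789000, tzinfo=timezone.utc)
filter_doc, update_doc = build_minute_upsert("dev-1", point_time, 2.5)
```

With pymongo, these documents would typically be passed as `collection.update_one(filter_doc, update_doc, upsert=True)`, so the first point of a minute creates the bucket and later points update it.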
QUASARDB TEST - USED QUERY FEATURES
• Query by tags
• ACL associated to tags
• Retrieved data are ready to use
• Python pandas integration
• Range queries

BENCHMARK SUMMARY

                             HDFS       QuasarDB       MongoDB
Storage size                 1.47 GB    2.80 GB        5.60 GB
Storage ratio (vs. HDFS)                1.9            3.81
Time precision                          Nanoseconds    Milliseconds
Loading (1 day ~ 1.47 GB)               1h 30m         2h 25m

Querying
• With index scan: QuasarDB is 2.3 times faster
• Without index scan: QuasarDB is 18 times faster

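As a quick check of the storage figures, the ratios can be recomputed from the sizes in the summary. The attribution of the two loading times to QuasarDB and MongoDB follows the column order of the slide and should be read as an assumption.

```python
# Sizes from the benchmark summary, in GB for one day of data.
hdfs_gb = 1.47
sizes_gb = {"QuasarDB": 2.80, "MongoDB": 5.60}

# Storage ratio relative to the raw HDFS size.
ratios = {name: round(size / hdfs_gb, 2) for name, size in sizes_gb.items()}


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours/minutes duration to minutes."""
    return hours * 60 + minutes


# Loading times; the mapping to each database is assumed from column order.
load_minutes = {"QuasarDB": to_minutes(1, 30), "MongoDB": to_minutes(2, 25)}
load_ratio = round(load_minutes["MongoDB"] / load_minutes["QuasarDB"], 2)

print(ratios)      # {'QuasarDB': 1.9, 'MongoDB': 3.81}
print(load_ratio)  # 1.61
```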
ABOUT MIGRATION
• Kafka
  ◦ Connect vs consumer/producer
  ◦ Partitioning
  ◦ Expose JVM metrics from all consumers/producers/streams
• Fork real time data
  ◦ No downtime
• Load historical data
  ◦ Cluster stress
  ◦ Use HDFS for historical raw data storage
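The partitioning point can be sketched as key-based routing: if records are keyed by device id (an assumption, the slides do not state the key), every record of a device lands in the same partition and keeps its order there. Kafka's Java default partitioner hashes the key bytes with murmur2; the sketch below swaps in CRC32 from the standard library purely for illustration, so the partition numbers will not match Kafka's.

```python
import zlib


def partition_for(device_id: str, num_partitions: int) -> int:
    """Map a device id to a partition with a stable hash.

    CRC32 is a stand-in for Kafka's murmur2; only the principle
    (same key -> same partition) is the same.
    """
    return zlib.crc32(device_id.encode("utf-8")) % num_partitions


# Hypothetical device ids and partition count.
devices = ["device-001", "device-002", "device-003"]
assignment = {d: partition_for(d, 6) for d in devices}
```

Changing the partition count changes the mapping for most keys, which is one reason to size the topic before forking real-time traffic onto it.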