Upgrade to Pro — share decks privately, control downloads, hide ads and more …

[HeatWavejp Meetup #05] トヨタにおけるMySQL HeatWave Lakehouseを用いた大規模データロードと分析の評価 [根山 亮 氏(トヨタ自動車)]

[HeatWavejp Meetup #05] トヨタにおけるMySQL HeatWave Lakehouseを用いた大規模データロードと分析の評価 [根山 亮 氏(トヨタ自動車)]

HeatWavejp Meetup #05 年忘れ MySQL HeatWave 事例祭り & オフライン忘年会もあるよ!(2023/12/05 開催)の講演資料です。

【講演内容】
トヨタにおけるMySQL HeatWave Lakehouseを用いた大規模データロードと分析の評価

コネクテッドカー台数の拡大に伴い、ユーザー体験の向上や付加価値サービスの需要が高まり、その中でAIが重要な役割を果たすようになっている。AIモデル開発には、これらの車から得られる多数のセンサーデータを効率的に前処理する必要がある。
オラクルのMySQL HeatWave、特にLakehouse機能は、オブジェクトストレージから直接メモリにデータをロードすることで、データの入出力操作を強化し、効率的なソリューションを提供する。
本発表では、コネクテッドカーからの大量データを効率的にロードするためのLakehouse機能の有効性を評価し、その結果を発表するとともに、柔軟なデータ分析プラットフォーム構築への示唆を考察する。

【発表者】
根山 亮 氏
トヨタ自動車株式会社 社会システムPF開発部 InfoTech-AS データ流通基盤グループ.
グループ長兼シニアリサーチャー

【イベント情報】
HeatWavejp Meetup #05
https://heatwavejp.connpass.com/event/300343/

More Decks by HeatWavejp(MySQL HeatWave Japan User Group)

Other Decks in Technology

Transcript

  1. Evaluating Large Data Loads and Analysis with MySQL HeatWave Lakehouse

    at Toyota Ryo NEYAMA TOYOTA MOTOR CORPORATION HeatWavejp Meetup, Tokyo
  2. About Me 2 Remote Presentation in Oracle CloudWorld 2022 (The

    photo is edited for privacy protection) Ryo Neyama Current Position: Group Manager & Senior Researcher, Social System PF Development Div., TOYOTA MOTOR CORPORATION Professional Journey: 1999~2007: IBM Tokyo Research Laboratory 2007~2008: A startup company 2008~ : Toyota's subsidiary → Toyota (in Tokyo) Research Interests: • Distributed Computing and Databases • Advanced Mobility Platform Technical Contributions & Productization: • Web Services & Security (published two books) • Transactional Cache for Java • Partitioning Facility for J2EE App Server • Data Collection for HD Map Generation (below) And... two-daughter father, outdoor lover
  3. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 3
  4. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 4
  5. Automotive Industry: Once-in-a-century Transformation 1908~: Ford Model-T Today (2023): CASE*,

    SDGs**, etc. 5 *CASE: Connected, Autonomous, Shared & Services, Electric (advocated by Mercedes-Benz Group AG) **SDGs: Sustainable Development Goals This photo is licensed under CC BY-SA.
  6. Transformation into a Mobility Company 6 Electrification Intelligence "Traditional" AUTOMOTIVE

    COMPANY "All New" MOBILITY COMPANY Diversification Toyota Mobility Concept For details: "New Management Policy & Direction Announcement - Presentation Message from Management Koji Sato", Apr. 07, 2023. https://global.toyota/en/newsroom/corporate/39013233.html
  7. Why Does Toyota Mobility Concept Matter? 7 Electrification Intelligence Diversification

    Driving Efficiency People Mobility Safety Goods Mobility Community Design / Regional Energy Management Service Mobility Safety Assurance Values Services Because we aim to provide these values and services for society
  8. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 11
  9. ML & Dev Ops for All People's Happiness 13 ML

    & Dev Ops Connectivity and AI/ML play essential roles for this process.
  10. C A S E Connected Autonomous Shared Electric Connected Company

    A virtual company leading "Connected" We Use Connectivity to Make a Better World 14
  11. Data Volume Predictions for Connected Cars Shifts in the total

    number of connected cars Shifts in the volume of data transfer per-vehicle EB Location data up to several dozen KB per month [constant] CAN data up to hundred MB per month [constant] Dynamic map generation up to several GB per month [as needed] Peripheral sensing data up to several dozen GB per month [as needed] Time and scale of data on a per-vehicle basis Overall volume of connected car data transfer Data from cars continuously grow huge 15
  12. End-to-End Dataflow Overview 16 Location data Sensor data Video data

    LiDAR data Data Communication Module Data Platform (a.k.a. TBDC; Toyota Big Data Center) Batch Processing Data Ingestion Stream Processing AI/ML Cellular or Wi-Fi End-user Services Customers Developers AI/ML is a key enabler for value-added services
  13. Question: For carbon neutrality, is it possible to reduce the

    total amount of travel time and fuel consumption by traffic dispersion when abrupt traffic events happen? ▪ Motivation ▪ Demonstration AI-improved route from Haneda Airport IC to Misato-Chuo IC Improvement summary •Distance: +7.5km •Travel time: -1hour (see Figure-1) •Fuel consumption: -25% (see Figure-2) Normal traffic situation Congested traffic situation [Detour route] non-shortest travel time [Usual Route] the shortest travel time [Detour route] shortest travel time with low fuel consumption Congestion [Usual route] congestion happens Usual route (40.1km) Detour route (47.6km) Time Figure-1: Travel time Detour Route Usual Route Travel time -1 hour 燃料消費 Time Figure-2: Fuel consumption Usual Route Detour Route -25% Fuel consumption Example) Traffic Dispersion in Abrupt Traffic Events 17 We need to calculating average speed and fuel consumption per road link.
  14. Example) Cruising Area Prediction for Battery EV 18 Question: Given

    the current location and remaining battery level, how far could the battery EV reach? ▪ Motivation ▪ Demonstration current location reachable area Combine the following three methods: 1. 1. Route search on road network 2. 2. Predict energy consumption along the routes (physical model) 3. 3. Draw contour lines for edge nodes (computational geometry) battery consumption low high How far could we reach? Where should we charge the battery if the battery level would be short ? Example results We need to calculate avg. speed and gradient per road link in the procedure #2.
  15. Location data Sensor data Video data LiDAR data Data Communication

    Module Data Platform (a.k.a. Toyota Big Data Center) Batch Processing Data Ingestion Stream Processing AI/ML Cellular or Wi-Fi End-user Services Customers Developers A Challenge in AI Algorithm Development 19 Discovering needs Formulating hypothesis Gathering data Reviewing algorism Prototyping MVP Functional suitability evaluation Developing prototype infra Evaluation, Cost estimation, Performance analysis Adjustment, Review of use cases Judgement of productization Development of infrastructure Evaluation, Cost estimation, Performance analysis Demonstration, proposal of productization Development process of AI algorism Development Process of Platform To develop new AI algorism, agility of preprocessing, machine learning, and evaluation is one of top priorities. Challenge: How to process large amounts of data efficiently and at an affordable cost
  16. MySQL HeatWave & Lakehouse 20 Source:https://dev.mysql.com/doc/heatwave/en/heatwave-introduction.html ✓ Scalability: Capable of

    large data loads in object storage, i.e. HeatWave Lakehouse* ✓ Efficiency and Affordability: HTAP-ready in-memory columnar database with multiple level parallelization, which is faster and cheaper preprocessing ✓# of cars in each "mesh" ✓Travel time per links ✓・・・ Features AI Models Machine Learning *OCI: Oracle Cloud Infrastructure Data Communication Module * A large amount of data from connected cars We expect that HeatWave & Lakehouse could use for preprocessing in development of new AI algorism *GA in July 2023 Advantages of HeatWave and Lakehouse:
  17. Future use cases 21 Object Storage HeatWave Object Storage HeatWave

    Stream Processing (data conversion etc.) MDS Common data Use case 1: On-demand Cluster Create HeatWave cluster and load data only when needed Use case 2: Dedicated Cluster Share a HeatWave cluster and data in an organization Use case 3: Common Cluster Keep ingesting common data into a HeatWave cluster and share data across organizations HeatWave Output data Input data MDS: MySQL Database Service  - Lower cost. - Suitable for experimental analytics.  - Waiting time for cluster creation and data loads.  - The common preprocessing task and resulting data can be shared. - Suitable for design & development teams sharing domain-specific concerns.  - Higher cost.  - Suitable for real-time data analytics for most recent data  - Managing long-term data from numerous vehicles can incur significant expenses. We will explore how we can incorporate these use cases into our platform
  18. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 22
  19. Performance Comparison with Other Products 23 Type of test cases

    Evaluation items Targets of comparison Results OLAP Benchmark TPC-H Fundamental performance A HeatWave is 2 to 30 times faster for most queries and 6+ times better in cost performance Dynamic Map Range Search Fundamental performance Redis HeatWave is thousands of times faster than Redis for key scans, which Redis is not good at Key Search Fundamental performance Redis Redis is 3 times faster than HeatWave, but HeatWave is also fast enough Feature extraction (Calculation of statistics of multivariate) Aggregation (GROUP BY) Performance of extracting features from data in data store B HeatWave is 2 to 60 times faster Pattern mining (Trajectory data mining) Composition (JOIN) Performance of JOIN between a set of VIN (Vehicle Identification Number) and logs travel data of vehicles B, D HeatWave is 2 to 5 times faster Legends - A: DWH Service B: Data Search Service C: Data Processing Service D: ETL Service HeatWave showed its advantages in most test cases In CloudWorld 2022, we presented the results in our talk*. * "Data Collection and Management Platform for Advanced Mobility Services at Toyota"
  20. Assessment of HeatWave Updates since 2022 24 • HeatWave data

    load performance from Object Storage with Lakehouse* • HeatWave's data compression capability** * We evaluated HeatWave Lakehouse in the beta program. ** GA in March 2022 Assessment targets:
  21. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 25
  22. Evaluation #1 Large-data Loads Performance using HeatWave Lakehouse 26 Objectives:

    Evaluate data-load performance: ① MDS vs. Lakehouse with CSV files. ② CSV files vs. native-format files, i.e. suspend and resume capability, with Lakehouse. Conclusions: ① Lakehouse was 65x faster than MDS: 11 HeatWave nodes could load 11TB CSV- format* sensor data in 88 minutes, which was >4 days in MDS (see Figure-1.1). ② Native format was 11.7 times faster than CSV format**: 11 HeatWave nodes could load or store 11TB native-format sensor data both in 7.5 minutes (see Figure-1.2). Results: Input Data: • 18 columns CSV sensor data (1,280 files, ~11TB in total), which corresponds to 1 day data collected from 10 million vehicles. * We evaluated HeatWave Lakehouse in the beta program, and as a result, there was no support for the Parquet file format during our testing. ** Native format files are not accessible to users, so we cannot manipulate the native-format files ourselves. Evaluation Environment: • MySQL shape: MySQL.HeatWave.VM.Standard.E3 • MySQL version: 8.0.31-HeatWave-Preview 7.5 min (resume) 7.5 min (suspend) input data (native-format files) Figure-1.2: Data load time of native-format files in Lakehouse >4 days MySQL client Lakehouse Figure-1.1: Lakehouse vs. MDS with CSV files 88 min MDS input data (CSV files) input data (CSV files) Source : https://www.oracle.com/mysql/heatwave/#data-recovery (Note: the figures are edited by the author.)
  23. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 27
  24. Evaluation #2 Data compression capability in HeatWave 28 Objectives: Evaluate:

    ① how much memory consumption differs with and without data compression. ② how much of a performance penalties are imposed by data compression. Conclusions: ① Compression reduced memory consumption by 64.5%: With the original 25.15GB CSV files, HeatWave consumed 16.27GB and 5.77GB memory respectively with and without data compression (see Figure-2.1). ② The performance penalties varied between 50% and 100% for different queries. However, for more compute- intensive queries, the impact was less pronounced (see Table 2.1 and Figure 2.2). Results: 25.15 19.19 16.27 5.77 0 10 20 30 CSV files MySQL HeatWave (w/o compression) HeatWave (w/ compression) Data Size [GB] Figure-2.1: Memory consumption reduction 64.5% # Query Data Size 1 Simple Aggregation Calculate average speed by geographical mesh. 100GB 2 Aggregation (GROUP BY) Calculate average speed by geographical mesh and VIN (Vehicle Identification Number). 100GB 3 Composition (JOIN) JOIN between a set of VIN and logs travel data of vehicles. 100GB 4 Composition (JOIN) with more VINs Same as #3 except larger number of vehicles. 250GB Table-2.1: Queries and data size 5.88 8.7 7.33 11.28 13.62 12.4 0 5 10 15 #1 #2 #3 Processing Time [sec] w/o compression w/ compression 165 187 0 50 100 150 200 #4 Figure-2.2: Processing time 13.3% performance penalties 91.8% 56.6% 69.2%
  25. Reference) SQL Queries 29 SELECT COUNT(*), AVG(CAST(Speed_TypeA AS DOUBLE)) AS

    ps_speed_avg gridsquarecode3rd AS mesh_id FROM `random_csv_data` WHERE (Speed_TypeA != "") and (length(gridsquarecode3rd) = 10) GROUP BY mesh_id; Query #1 SELECT /*+ SET_VAR(USE_SECONDARY_ENGINE=FORCED) */ vin, gridsquarecode3rd, AVG(Speed_TypeA) FROM probe WHERE pt_dt LIKE "202001%" AND vin <> "0" GROUP BY vin, gridsquarecode3rd; Query #2 WITH top100vin AS( SELECT /*+ SET_VAR(USE_SECONDARY_ENGINE=FORCED) */ vin, (max(Odometerkm)-min(Odometerkm)) as Odometer_km_month FROM probe WHERE pt_dt LIKE "202001%" AND (gridsquarecode3rd like '5339-45%' OR gridsquarecode3rd like '5339-46%' OR gridsquarecode3rd like '5339-35%' OR gridsquarecode3rd like '5339-36%') GROUP BY vin ORDER BY Odometer_km_month DESC LIMIT 100 /* for #3, or 1000000 for #4 */ ) SELECT /*+ SET_VAR(USE_SECONDARY_ENGINE=FORCED) */ a.vin, a.gps_timestamp, a.mmlongitude, a.gridsquarecode3rd, a.Speed_TypeA, b.Odometer_km_month FROM top100vin AS b STRAIGHT_JOIN probe AS a ON b.vin = a.vin Query #3 and #4
  26. Agenda 1. Introduction • The state of the automotive industry

    and Toyota's aim for a mobility society • Challenges of building a data platform to support advanced mobility services 2. Evaluation of MySQL HeatWave and Lakehouse for Efficient and Cost-effective Analysis of Vehicle Big Data • Performance comparison with other products (OCW '22 recap) • HeatWave data load performance from Object Storage with Lakehouse • HeatWave's data compression capability 3. Summary and Future Work 30
  27. Summary and Future Work Summary • Background • Toyota is

    evolving into a mobility company with the aim of creating a better world. • Motivation • The preprocessing of extensive data gathered from connected cars to enhance our services through AI/ML. • Findings • Our research indicates that MySQL HeatWave and Lakehouse represent some of the most effective solutions for our specific use cases. Future work • Our feature requests to HeatWave • Lakehouse support in HeatWave on AWS • AWS S3 support as data source • Load and store between HeatWave and object storage in various data formats including native format • Predicate pushdown support with SQL "WHERE" clause in Lakehouse including filter optimization using partitions or metadata in Parquet files • Collaboration with our product team to make HeatWave available on our data platform 31
  28. Mission Producing Happiness for All We make the happiness of

    others our first priority. We make better products more affordable. We value every second and every cent. We give all our effort and offer all our ingenuity. We look forward, not backward. We believe the impossible is possible. 32
  29. Creating Mobility for All In a diverse and uncertain world,

    Toyota strives to raise the quality and availability of mobility. We wish to create new possibilities for all humankind, and support a sustainable relationship with our planet. Vision 33
  30. Applying imagination to improve society through a people-first design philosophy.

    Practicing Genchi Genbutsu to understand operations at their essence. Creating a physical platform to enable the mobility of people and things. A flexible system that changes with the software. Expanding our abilities by uniting the strength of partners, communities, customers and employees to produce mobility and happiness for all. 【 Software 】 【 Hardware 】 【 Partnership 】 Combining software, hardware and partnerships to create unique value that comes from the Toyota Way. The Toyota Way Value 34