
A Concrete Example of Pipeline Development with Lakeflow SDP and Asset Bundles

Ryota Ikeuchi
April 22, 2026
Transcript

  1. Data Ingestion

     Source App DB — DynamoDB table `user_setting`:
     - id: "amzn...123"
     - attributes (Map): pref_name: "大阪府", pop_start: "12:00", pop_end: "18:00", ...

     Source Storage — S3:
     - Full export: s3://{bucket_name}/user_setting/full/{date}/AWSDynamoDB/{export_id}/data/{file_id}.json.gz

       { "Item": { "id": "amzn...123", "attributes": { "pop_start": "12:00", ... } } }

     - Incremental export: s3://{bucket_name}/user_setting/incremental/{date}/AWSDynamoDB/data/{file_id}.jsonl.gz

       { "Keys": { "id": "amzn...123" }, "OldImage": { "attributes": { "pop_start": "12:00", ... } }, "NewImage": { "attributes": { "pop_start": "06:00", ... } }, ... }

     Bronze (ST):
     - Full table b_user_setting_full: record ({"Item": ...}), source_path (s3://.../full/04-16/...), ...
     - Incremental table b_user_setting_incremental: record ({"Keys": ...}), source_path (s3://.../incremental/04-17/...), ...

     DynamoDB to S3 — Lambda (incremental):

     ```python
     response = dynamodb.export_table_to_point_in_time(
         TableArn=TABLE_ARN,
         S3Bucket=S3_BUCKET,
         S3Prefix=f"user_setting/incremental/{now_jst.strftime('%Y-%m-%d')}",
         ExportFormat="DYNAMODB_JSON",
         ExportType="INCREMENTAL_EXPORT",
         IncrementalExportSpecification={
             "ExportFromTime": export_from,
             "ExportToTime": export_to,
         },
     )
     ```

     S3 to Bronze — Lakeflow SDP (incremental):

     ```sql
     CREATE STREAMING TABLE b_user_setting_incremental AS
     SELECT
       value AS record,
       _metadata.file_path AS source_path,
       current_timestamp() AS ingest_timestamp
     FROM cloud_files(
       's3://{bucket_name}/user_setting/incremental/*/AWSDynamoDB/data/',
       'text'
     );
     ```
  2. Change Data Capture

     Bronze (ST) — Raw Input Events:
     - b_user_setting_full: record = {"Item": {"id": "amzn...123", "attributes": {"pop_start": "12:00", ...}}}, source_path = .../04-16/...
     - b_user_setting_incremental: record = {"Keys": {"id": "amzn...123"}, "OldImage": {"attributes": {"pop_start": "12:00"}}, "NewImage": {"attributes": {"pop_start": "06:00"}}}, source_path = .../04-17/...

     Silver (ST) — CDC Source s_cdcs_customize_setting (full and incremental consolidated into a single CDC source):

     | user_id    | pop_start | ... | event_name | last_updated |
     |------------|-----------|-----|------------|--------------|
     | amzn...123 | 12:00     | ... | INSERT     | 04-16        |
     | amzn...123 | 06:00     | ... | MODIFY     | 04-17        |

     Silver (ST) — SCD Type 2 s_customize_setting:

     | user_id    | pop_start | ... | event_name | last_updated | __START_AT | __END_AT |
     |------------|-----------|-----|------------|--------------|------------|----------|
     | amzn...123 | 12:00     | ... | INSERT     | 04-16        | 04-16      | 04-17    |
     | amzn...123 | 06:00     | ... | MODIFY     | 04-17        | 04-17      | NULL     |

     Bronze to CDC Source — Lakeflow SDP:

     ```sql
     CREATE FLOW f_cdcs_customize_setting_incremental AS
     INSERT INTO s_cdcs_customize_setting BY NAME
     SELECT user_id, ...
     FROM (
       SELECT
         COALESCE(
           try_cast(r:NewImage.id.S AS STRING),
           try_cast(r:Keys.id.S AS STRING)
         ) AS user_id,
         ...
         CASE
           WHEN r:OldImage IS NOT NULL AND r:NewImage IS NOT NULL THEN 'MODIFY'
           WHEN r:OldImage IS NULL     AND r:NewImage IS NOT NULL THEN 'INSERT'
           WHEN r:OldImage IS NOT NULL AND r:NewImage IS NULL     THEN 'REMOVE'
           ELSE 'UNKNOWN'
         END AS event_name,
         ...
       FROM (
         SELECT parse_json(record) AS r, ingest_timestamp
         FROM STREAM(live.b_user_setting_incremental)
       ) ...
     ) ...

     CREATE FLOW f_cdcs_customize_setting_full ...
     ```

     CDC Source to SCD Type 2 — Lakeflow SDP:

     ```sql
     CREATE FLOW f_sync_customize_setting AS
     AUTO CDC INTO live.s_customize_setting
     FROM STREAM(live.s_cdcs_customize_setting)
     KEYS (user_id)
     APPLY AS DELETE WHEN event_name = 'REMOVE'
     SEQUENCE BY last_updated
     STORED AS SCD TYPE 2;
     ```
  3. Environment Separation — Asset Bundles

     Define the catalog, schema, bucket, and paths as variables in YAML, and use placeholders in the SQL so that the values switch automatically depending on the deployment target.

     databricks.yml:

     ```yaml
     targets:
       dev:
         variables:
           catalog: ct_dev
           schema: silver_${workspace.current_user.short_name}
           input_bucket_user_setting: ...
           input_path_user_setting_full: ...
           input_path_user_setting_incremental: ...
       prod:
         variables:
           catalog: ct_prod
           schema: silver
           input_bucket_user_setting: ...
           input_path_user_setting_full: ...
           input_path_user_setting_incremental: ...
     ```

     ct_de_etl.pipeline.yml:

     ```yaml
     resources:
       pipelines:
         ct_de_etl:
           catalog: ${var.catalog}
           schema: ${var.schema}
           configuration:
             input.bucket.user_setting: ${var.input_bucket_user_setting}
             input.path.user_setting.full: ${var.input_path_user_setting_full}
             input.path.user_setting.incremental: ${var.input_path_user_setting_incremental}
     ```

     b_user_setting.sql:

     ```sql
     CREATE STREAMING TABLE b_user_setting_full AS
     SELECT ...
     FROM cloud_files(
       's3://${input.bucket.user_setting}/${input.path.user_setting.full}',
       'text'
     );

     CREATE STREAMING TABLE b_user_setting_incremental AS
     SELECT ...
     FROM cloud_files(
       's3://${input.bucket.user_setting}/${input.path.user_setting.incremental}',
       'text'
     );
     ```
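The effect of this indirection can be illustrated with a tiny substitution routine: the same SQL template yields different S3 paths per target. The bucket and path values below are hypothetical stand-ins for the elided "..." entries, and this is a sketch of the mechanism, not how the Databricks CLI actually resolves variables:

```python
import re

def resolve(template: str, variables: dict) -> str:
    """Replace ${name} placeholders in a template with per-target values."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: variables[m.group(1)], template)

# Hypothetical per-target configuration values (stand-ins, not the deck's real buckets)
targets = {
    "dev":  {"input.bucket.user_setting": "my-dev-bucket",
             "input.path.user_setting.full": "user_setting/full"},
    "prod": {"input.bucket.user_setting": "my-prod-bucket",
             "input.path.user_setting.full": "user_setting/full"},
}

# The placeholder string as it appears in b_user_setting.sql
sql_path = "s3://${input.bucket.user_setting}/${input.path.user_setting.full}"
print(resolve(sql_path, targets["dev"]))   # s3://my-dev-bucket/user_setting/full
print(resolve(sql_path, targets["prod"]))  # s3://my-prod-bucket/user_setting/full
```

Because only the variable table differs between targets, the SQL files stay identical across dev and prod, which is the point of the placeholder scheme.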
  4. Closing

     This installment — data engineering:
     - Lakeflow SDP: ingesting DynamoDB data; consolidating full and incremental exports
     - Asset Bundles: easy environment separation

     Next installment — analytics engineering:
     - Which customization settings are most common? ☀️ 🔊
     - Building dashboards and more