A talk on SRE for a data platform, given at the CNDF2023 pre-event.
Cloud-native data platform operations in practice, from the Genkai Sea ~It's hot!! It's summer!!! I want to get to a beer garden already!!1 edition~
Yamashita (@pyama)
GMO Pepabo, Technology Infrastructure Team, Senior Principal
Hobbies: camping, travel, Hilton hopping, Soigner hopping
Stack: GCP, Airflow, PubSub, Dataflow
Today's topics
- System overview
- The challenges we found there
- How we solved them
Hosting business / EC support business / Handmade and other businesses
System architecture
Diagram labels: source, DB, Pub/Sub, Ingest Pipeline, DataFlow, Cloud Composer, Extract, BigQuery, Analytics, ML, Vertex AI, monitor, Cloud Logging, Cloud Monitoring
DB data and logs are aggregated into BigQuery.
How the data platform is used
- Similar image search
- Recommendations
- Measuring productivity metrics
Why me???
Quoted from https://note.com/udzura/n/n5c8647d38fff
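The challenges we found there
- DAGs that keep failing and chronic toil
- Incidents that only surface through end-user feedback
- Inefficient development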
DAGs that keep failing and chronic toil
DAGs that keep failing and chronic toil
DAG = Directed Acyclic Graph
Quoted from https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html
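Causes of DAG failures
1. Various timeouts
2. Missing data due to consecutive holidays
3. Changes to the structure or attributes of the source data
4. Regressions introduced by changes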
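Why this counts as toil
1. Various timeouts → retry, change the timeout threshold
2. Missing data due to consecutive holidays → retry, skip
3. Changes to the structure or attributes of the source data → apply the change and move on
4. Regressions introduced by changes → fix it and move on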
First move: defining SLIs/SLOs
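Defining SLIs/SLOs
"What do we start with?" - the priority of improvements
"How far do we go?" - decide the degree of improvement with numbers
By building the metrics first, decisions can be driven by numbers.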
Defining SLIs/SLOs
How do we decide on them?
- SRE Workbook, Ch. 13: Data Processing Pipelines
  https://sre.google/workbook/data-processing/
- Google Cloud operations suite, data processing services
  https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring/sli-metrics/data-proc-metrics?hl=ja
Defining SLIs/SLOs
They are easy to turn into a dashboard with Grafana.
Defining SLIs/SLOs
1. DAG success rate
2. Error rate in DataFlow jobs
3. Data freshness at each step of DataFlow jobs
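As a rough illustration of the first indicator, a DAG success rate can be computed from run results and compared with a target. This is a minimal sketch, not the speaker's implementation: `DagRun`, `success_rate`, `meets_slo`, and the 99% target are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DagRun:
    """Hypothetical record of one DAG run."""
    dag_id: str
    succeeded: bool


def success_rate(runs: list[DagRun]) -> float:
    """Return the fraction of successful runs (1.0 when there are no runs)."""
    if not runs:
        return 1.0
    return sum(r.succeeded for r in runs) / len(runs)


def meets_slo(runs: list[DagRun], target: float) -> bool:
    """True when the observed success rate reaches the target."""
    return success_rate(runs) >= target


# 98 successes and 2 failures against an assumed 99% target.
runs = [DagRun("example", True)] * 98 + [DagRun("example", False)] * 2
print(success_rate(runs))
print(meets_slo(runs, target=0.99))
```

In practice the run results would come from the scheduler's metadata or from monitoring metrics rather than a hand-built list.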
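Improvement measures
1. Various timeouts
- Foster a culture of removing the delays that cause timeouts
- Check against metrics whether each timeout value is reasonable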
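One way to check a timeout value against metrics is to compare it with a high percentile of observed task durations. The sketch below assumes durations are already available as a list of seconds; `p95`, `timeout_is_reasonable`, and the 1.5x headroom are hypothetical choices for illustration, not values from the talk.

```python
import statistics


def p95(durations_sec: list[float]) -> float:
    """95th percentile of observed durations (needs at least two samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(durations_sec, n=20)[-1]


def timeout_is_reasonable(
    durations_sec: list[float], timeout_sec: float, headroom: float = 1.5
) -> bool:
    """True when the timeout leaves the assumed headroom above the p95 duration."""
    return timeout_sec >= p95(durations_sec) * headroom


# Hypothetical sample: durations of 1 to 100 seconds.
observed = [float(s) for s in range(1, 101)]
print(timeout_is_reasonable(observed, timeout_sec=120))
print(timeout_is_reasonable(observed, timeout_sec=150))
```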
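Improvement measures
2. Missing data due to consecutive holidays
- Used holiday-jp to implement handling for consecutive holidays
- Made it possible to define the consecutive-holiday condition from BigQuery with a UDF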
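The consecutive-holiday condition can be pictured as "how many days off in a row end on this date". The following is a sketch in Python rather than the BigQuery UDF mentioned on the slide; `HOLIDAYS` is a hand-written sample standing in for data that would come from holiday-jp, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical sample; in practice the dates would come from holiday-jp.
HOLIDAYS = {date(2023, 7, 17)}  # Marine Day 2023, a Monday


def is_day_off(d: date, holidays: set[date]) -> bool:
    """Weekends and listed public holidays count as days off."""
    return d.weekday() >= 5 or d in holidays


def consecutive_days_off(d: date, holidays: set[date]) -> int:
    """Length of the run of days off ending at d (0 if d is a business day)."""
    count = 0
    while is_day_off(d, holidays):
        count += 1
        d -= timedelta(days=1)
    return count


print(consecutive_days_off(date(2023, 7, 17), HOLIDAYS))  # Sat, Sun, Mon
print(consecutive_days_off(date(2023, 7, 18), HOLIDAYS))  # ordinary Tuesday
```

A DAG could use such a value to decide whether an empty source table is expected or should be treated as a failure.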
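Improvement measures
3. Changes to the structure or attributes of the source data
- When Debezium detects DDL, notify Slack
- When the ratio of NULLs in the data increases, send a notification
https://github.com/pyama86/debezium-ddl-notifier/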
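The NULL-ratio check can be sketched as a comparison between a baseline sample and the current sample of a column. This is an illustrative assumption about how such a check might look, not the speaker's code; `null_ratio`, `should_notify`, and the 0.1 tolerance are hypothetical.

```python
from typing import Optional, Sequence


def null_ratio(values: Sequence[Optional[object]]) -> float:
    """Fraction of None values in a column sample (0.0 for an empty sample)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def should_notify(
    current: Sequence[Optional[object]],
    baseline: Sequence[Optional[object]],
    tolerance: float = 0.1,
) -> bool:
    """True when the NULL ratio grew by more than the assumed tolerance."""
    return null_ratio(current) - null_ratio(baseline) > tolerance


baseline = [1, 2, None, 4]      # 25% NULL
current = [None, None, 3, 4]    # 50% NULL
print(should_notify(current, baseline))
```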
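Improvement measures
4. Regressions introduced by changes
- Run every DAG each morning and afternoon so that data inconsistencies are detected earlier than in the production run
- Expand unit tests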
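Lately my job is just asking nicely and pressing TAB

## Input specification
- The input is an Airflow DAG file

## Sample implementation 1
```python
{sample1}
```

## Sample implementation 2
```python
{sample2}
```

## Instructions
- Implement test code with pytest for the DAG passed as input.
- Mock classes using spymock.
- Two sample implementations are shown. Use them as a reference.
- The test you generate is saved to a file as-is and executed. No explanation of the output is needed.

## Input
File name: {dag_file_path}
```python
{dag_file_content}
```
...
$ poetry run python bin/test_generator.py dags/example.py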
Incidents that only surface through end-user feedback
???: "Today's data doesn't seem to be in yet. Did something happen?"
Believe it or not, there was almost no monitoring (only a simple data-consistency test).
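A monitoring society
- Where ECS is used on AWS, introduced the Mackerel Container Agent
- Implemented Cloud Monitoring alert definitions in Terraform, aligned with the SLIs
- Our AWS usage is small compared with GCP, so using Mackerel, which is already used company-wide, lowers the learning and adoption cost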
Monitoring the data pipeline
Injecting golden data
<source>
  @type dummy
  @label @INPUT
  tag example
  dummy {"accessed_at": "2022-01-01T00:00:00Z", "account_id": 1, "client_id": "12345abcde", "event": "example_event"}
</source>
https://docs.fluentd.org/v/0.12/input/dummy
Dummy data is sent into the pipeline every second and monitored at the endpoint.
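On the receiving side, monitoring golden data comes down to checking how long ago the last dummy record arrived. A minimal sketch, assuming the endpoint can report the arrival time of the most recent golden record; `is_pipeline_healthy` and the 60-second threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_pipeline_healthy(
    last_seen: datetime,
    now: datetime,
    max_lag: timedelta = timedelta(seconds=60),
) -> bool:
    """True when the newest golden record arrived within the allowed lag."""
    return now - last_seen <= max_lag


now = datetime(2022, 1, 1, 0, 5, 0, tzinfo=timezone.utc)
print(is_pipeline_healthy(now - timedelta(seconds=5), now))
print(is_pipeline_healthy(now - timedelta(minutes=3), now))
```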
Inefficient development
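Development environments, money, and power
- To run integration tests on a Cloud Composer environment, test environments were created automatically every morning and deleted automatically every night
- Considering cost and team structure, two environments were created each morning and shared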
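Development environments, money, and power
- To run integration tests on a Cloud Composer environment, test environments were created automatically every morning and deleted automatically every night
  → cannot be used outside working hours, for example during an incident
- Because of cost and team structure, two environments were created and shared
  → people wait for tests, and output results get mixed together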
I'm Kubernetes Rock Star
Create an execution environment for each PR on on-premises Kubernetes
https://tech.pepabo.com/2023/07/07/data-platform-ci/
People and checklists
DAG A, DAG B
The implementation was being checked by people
EmptyOperator(task_id="external_task_sensor_target")
A wandering Ruby Rock Star: "Why not just look at the AST?"
Abstract Syntax Tree

from airflow.operators.empty import EmptyOperator
EmptyOperator(task_id="child_task3")

% python -m ast example.py
Module(
   body=[
      ImportFrom(
         module='airflow.operators.empty',
         names=[
            alias(name='EmptyOperator')],
         level=0),
      Expr(
         value=Call(
            func=Name(id='EmptyOperator', ctx=Load()),
            args=[],
            keywords=[
               keyword(
                  arg='task_id',
                  value=Constant(value='child_task3'))]))],
   type_ignores=[])

What grep cannot do, static analysis can solve.
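The idea of replacing a human checklist with the AST can be sketched using Python's standard `ast` module: parse the DAG file without importing Airflow, collect every string passed as `task_id=`, and report expected task IDs that are missing. The helper names `find_task_ids` and `missing_targets` are hypothetical, and this is an illustration of the approach rather than the tool used in the talk.

```python
import ast

SOURCE = '''
from airflow.operators.empty import EmptyOperator

EmptyOperator(task_id="child_task3")
'''


def find_task_ids(source: str) -> set[str]:
    """Collect string literals passed as task_id= to any call in the source."""
    task_ids: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            value = kw.value
            if (
                kw.arg == "task_id"
                and isinstance(value, ast.Constant)
                and isinstance(value.value, str)
            ):
                task_ids.add(value.value)
    return task_ids


def missing_targets(source: str, expected: list[str]) -> set[str]:
    """Return expected task IDs that the source never defines."""
    return set(expected) - find_task_ids(source)


print(find_task_ids(SOURCE))
print(missing_targets(SOURCE, ["child_task3", "external_task_sensor_target"]))
```

Because only the keyword name `task_id` is matched, a misspelled keyword or a missing target task shows up as a missing ID instead of relying on someone to spot it in review.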
What I have covered so far
Challenges in developing and operating the data platform were improved with SRE skills.
Retirement failed!!1
We are hiring across the board.
We want to do more ML, and we want to make SRE more fun.
Check the latest hiring information → @pb_recruit
Out of nowhere, let's talk about food.
Hobby: Soigner hopping
https://www.web-soigner.jp/magazine_hakata_gourmet_guide/
Welcome to Fukuoka!!1
Izakaya
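目利きのたか志 (Mekiki no Takashi)
Average spend per person: 4,000 to 6,000 yen
Location: Gion
Recommended party size: 2 to 4
Why I recommend it: From the fish on down, everything is delicious, and there is a wide variety of sake. You can also book on the web. My personal No. 1.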
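せいもん払い (Seimon Barai)
Average spend per person: 6,000 to 8,000 yen
Location: Nakasu
Recommended party size: 2 to 8
Why I recommend it: Not cheap, but whatever you order is delicious. The fried shrimp is unbelievably huge, and the sashimi is delicious too.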
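鯖 [name partly illegible]
Average spend per person: 4,000 to 8,000 yen
Location: Nakasu
Recommended party size: 2 to 8
Why I recommend it: All about mackerel and sake. I think it is mackerel you rarely get to eat on Honshu.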
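魚三徳
Average spend per person: 3,000 to 5,000 yen
Location: Gion
Recommended party size: 2 to 8
Why I recommend it: Fish and sake are cheap. Above all cheap, and delicious.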
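鷹勝
Average spend per person: 4,000 to 6,000 yen
Location: Nakasu
Recommended party size: 2 to 8
Why I recommend it: The rooms are named after Hawks players. Not amazingly delicious, but they have everything. Convenient.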
Ramen
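一双 (Issou)
Average spend per person: 1,000 yen
Location: Gion, Hakata
Why I recommend it: The tonkotsu of tonkotsu. I'm usually drunk by the time I eat it, so I had no photos.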
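名島亭 (Najimatei)
Average spend per person: 1,000 yen
Location: Hakata, Najima
Why I recommend it: The smell is not that strong and it is well balanced. I like it.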
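兼虎 (Kanetora)
Average spend per person: 1,000 yen
Location: Tenjin, Hakata
Why I recommend it: The tsukemen that is popular in Fukuoka. The PARCO branch probably has the shortest line.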
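長浜ナンバーワン (Nagahama Number One)
Average spend per person: 1,000 yen
Location: Tenjin, Hakata, Gion
Why I recommend it: My favorite of the Nagahama-style shops you can eat at around here. I recommend the regular ramen with sesame and red pickled ginger added.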
Others
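ウエスト (West)
Average spend per person: 1,000 yen
Location: everywhere
Why I recommend it: A local udon chain. For udon I like thin noodles with extra green onion. It also works as an izakaya, with unbelievable value for money.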
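夜間飛行 (Yakan Hiko)
Average spend per person: 3,000 to 5,000 yen
Location: Hakata
Why I recommend it: A hotel bar inside the Nikko. The chairs are good and you can talk at leisure. I like the cocktail called Twilight.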
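ピエトロ (Pietro)
Average spend per person: 1,000 to 3,000 yen
Location: everywhere
Why I recommend it: For lunch or dinner, sparkling wine with pasta or pizza, dirt cheap and delicious.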
Let's make this the best CNDF2023!!!