Building a Serverless Data Analytics Platform on AWS (AWSで作る、サーバーレスデータ分析基盤構築) / jawsug-niigata-11
kasacchiful
January 15, 2022
Slides presented at JAWS-UG Niigata #11.
Transcript
Building a Serverless Data Analytics Platform on AWS
JAWS-UG Niigata #11, 2022-01-15, @kasacchiful

Hiroshi Kasahara (@kasacchiful)
Classmethod, Inc. Solutions Architect / Software Developer
Community:
• JAWS-UG Niigata
• Python ML in Niigata
• JaSST Niigata
• ASTER
• SWANII
• etc.

Serverless analytics platforms

AWS services for data analytics

AWS Lambda can also handle data processing and analysis

For complex or large-scale workloads, use AWS Step Functions

Serverless patterns: there is a pattern for each use case
https://aws.amazon.com/jp/serverless/patterns/serverless-pattern/

For details of each pattern, see the AWS Black Belt material
• AWS Black Belt Online Seminar "Serverless Usecase Patterns" (PDF hosted on awsstatic.com under webinars/jp/pdf/services/)
• An explanatory video is available on YouTube

Once the data is in S3, the rest can be worked out

Pitfalls we hit when integrating data with serverless services
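
Processing data with a Lambda workflow controlled by a Step Functions state machine
• Fetch the data from its source and store it in S3
• Apply the minimum necessary processing to each file and store the result in S3
• Combine the data from multiple files and shape it appropriately for QuickSight (BI)

Four of the pitfalls we ran into

1. One data file holds far too much data
Problem: one particular data file is abnormally large
• A single file contains 5 minutes of data
• The contents are records at millisecond granularity
• Only one specific processing step takes a long time
Fix: move only that data file to hourly processing
• The high-volume data consisted of fields that are not output to BI
• It was split off from the daily processing and moved to hourly processing
• This removed the bottleneck from the daily processing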
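
2. Athena quotas
Problem: hitting the Athena quota on concurrent query executions
• During data migration, the last Lambda function in the daily processing fails
• The limit is hit where the data processed in earlier steps is fetched with SQL queries through Athena
• Each Lambda function calls the start-query-execution API 5 times
• The limit can temporarily burst up to 80, but during the migration it topped out at 20
• The limit can be raised by requesting a quota increase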
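https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/limits-overview.html

Fix: set the maximum concurrency of the Step Functions Map state
• Set the maximum concurrency of the Map state (think of it as taking an array and processing its elements concurrently) so that Athena start-query-execution API calls stay at 20 or fewer

For the Map state, see the following:
• DevelopersIO (dev.classmethod.jp) article on the Step Functions Map state update
• https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/amazon-states-language-map-state.html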
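
The Map state fix can be sketched in Python as a helper that derives `MaxConcurrency` from the quota and emits a Map state in Amazon States Language. The values 20 and 5 come from the slides. The state name, the `$.files` input field, and the function name `run-athena-query` are hypothetical, and the sketch assumes each iteration may issue all of its Athena calls at the same moment.

```python
import json

ATHENA_CONCURRENT_QUERY_LIMIT = 20  # ceiling seen during the migration (from the slides)
QUERIES_PER_LAMBDA = 5              # start-query-execution calls per Lambda (from the slides)


def max_map_concurrency(query_limit: int, queries_per_item: int) -> int:
    """Largest number of parallel iterations whose queries stay within the limit."""
    if queries_per_item <= 0:
        raise ValueError("queries_per_item must be positive")
    return max(1, query_limit // queries_per_item)


def build_map_state(concurrency: int) -> dict:
    """Return a Map state definition; all names in it are placeholders."""
    return {
        "Type": "Map",
        "ItemsPath": "$.files",         # hypothetical input field holding the array
        "MaxConcurrency": concurrency,  # 0 would mean "no limit", so pass 1 or more
        "Iterator": {
            "StartAt": "QueryWithAthena",
            "States": {
                "QueryWithAthena": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                        "FunctionName": "run-athena-query",  # hypothetical name
                        "Payload.$": "$",
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


concurrency = max_map_concurrency(ATHENA_CONCURRENT_QUERY_LIMIT, QUERIES_PER_LAMBDA)
map_state = build_map_state(concurrency)
print(json.dumps(map_state, indent=2))
```

Deriving the value from the two numbers rather than hard-coding it keeps the setting correct if the quota is raised or the number of queries per function changes.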
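
3. Step Functions quotas
Problem: the Step Functions execution event history hits its quota
• On one particular day, the number of files for the hourly processing was abnormally large
• The execution ended abnormally after 1 hour because it reached the Step Functions event history limit (25,000 events)
• This limit cannot be raised

{
  "error": "States.Runtime",
  "cause": "The execution reached the maximum number of history events (25000)."
}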
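https://docs.aws.amazon.com/ja_jp/step-functions/latest/dg/limits-overview.html

Fix: nest the Step Functions state machines
• Nesting the state machines keeps any single execution from hitting the event history limit
• Lambda concurrency increases considerably, so the following were added:
  ✓ A quota increase request for Lambda concurrent executions
  ✓ A maximum concurrency setting on the Step Functions Map state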
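
One way to picture the nesting is as a chunking problem: the parent hands each child execution only as many files as fit under the 25,000-event limit. The sketch below is an illustration under assumptions, not the configuration used in the talk. The events-per-file figure, the overhead allowance, the safety margin, and the file keys are all hypothetical and would need to be measured on a real workflow.

```python
from typing import Iterator, List, Sequence

HISTORY_EVENT_LIMIT = 25_000  # per-execution limit quoted in the slides


def chunk_size_for(events_per_item: int, overhead_events: int = 100,
                   safety_percent: int = 80) -> int:
    """Number of items one child execution can process under the limit."""
    budget = HISTORY_EVENT_LIMIT * safety_percent // 100 - overhead_events
    if events_per_item <= 0 or budget < events_per_item:
        raise ValueError("a single item does not fit in one execution")
    return budget // events_per_item


def split_into_chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


files = [f"data/file-{n:05d}.csv" for n in range(3_000)]  # hypothetical keys
size = chunk_size_for(events_per_item=30)                 # assumed measurement
child_inputs = [{"files": chunk} for chunk in split_into_chunks(files, size)]
print(size, len(child_inputs))
```

Each dictionary in `child_inputs` would be the input of one child execution, so the parent's own history grows with the number of chunks rather than the number of files.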

[Diagrams: the state machine before and after the change]
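
4. Lambda scaling cannot keep up
Problem: a Lambda rate-limit error occurred, just once
• It looks like a concurrent-executions error, but...
• The concurrency limit had already been raised, and the monitoring for each function shows that concurrency never reached the limit

{
  "error": "Lambda.TooManyRequestsException",
  "cause": "Rate Exceeded. (Service: Lambda, Status Code: 429, Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, Extended Request ID: null)"
}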
https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/invocation-scaling.html

Fix: review the Retry settings for the Lambda functions
• Review the maximum concurrency of the Step Functions Map state
• Review the Retry settings for Lambda defined in Step Functions

The following article explains the Retry intervals in detail:
https://dev.classmethod.jp/articles/wait_time_and_params_in_step_function_retry/

$ node -e '((i,m,b)=>{for(let w=i,c=0;c<m;c++){console.log(w+=(c==0?0:b**c))}})(2,7,1.85)'
2
3.85
7.272500000000001
13.604125000000002
25.317631250000005
46.987617812500005
87.07709295312502
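
For readers who prefer Python, the Node.js one-liner can be ported directly. The function below reproduces the same running total for the same three arguments (2, 7, 1.85). The `retry_config` block shows where such values would sit in a Step Functions Retry definition. Pairing these exact values with `Lambda.TooManyRequestsException` is an assumption, since the slides do not state the final settings.

```python
from typing import List


def cumulative_waits(interval_seconds: float, max_attempts: int,
                     backoff_rate: float) -> List[float]:
    """Port of the one-liner: the running total it prints on each iteration."""
    total = interval_seconds
    totals = []
    for attempt in range(max_attempts):
        if attempt > 0:
            total += backoff_rate ** attempt
        totals.append(total)
    return totals


# Retry block for a Task state. The field names are Amazon States Language;
# the values mirror the one-liner's arguments and are an assumption here.
retry_config = [{
    "ErrorEquals": ["Lambda.TooManyRequestsException"],
    "IntervalSeconds": 2,
    "MaxAttempts": 7,
    "BackoffRate": 1.85,
}]

for seconds in cumulative_waits(2, 7, 1.85):
    print(seconds)
```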
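
Lambda Provisioned Concurrency was not configured this time
(see the DevelopersIO article on Lambda Provisioned Concurrency and cold starts at dev.classmethod.jp)

Isn't there a service called AWS Glue for data processing?

For data processing there is Glue. Is it really necessary to build with Step Functions + Lambda instead of using Glue?
• With Step Functions + Lambda, widely used development frameworks are available, which makes development by multiple people easier.
  ✓ This time the Serverless Framework was used.
• If there are many data files but each item is not that large, Lambda can handle the processing with some ingenuity.
• For data volumes or execution times that Lambda cannot handle, Glue is the better choice.
  ✓ Lambda limits: maximum memory allocation 10240 MB, maximum execution time 15 minutes, /tmp directory size 512 MB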
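
The Lambda-versus-Glue rule of thumb can be written as a small check against the three limits quoted on the slide. This is only a sketch: the `JobEstimate` fields and the example figures are hypothetical, and a real decision would also weigh cost and team familiarity.

```python
from dataclasses import dataclass

# Limits as quoted on the slide (January 2022).
LAMBDA_MAX_MEMORY_MB = 10_240
LAMBDA_MAX_SECONDS = 15 * 60
LAMBDA_TMP_MB = 512


@dataclass
class JobEstimate:
    peak_memory_mb: int   # estimated peak memory for one invocation
    runtime_seconds: int  # estimated wall-clock time for one invocation
    scratch_disk_mb: int  # temporary files written to /tmp


def fits_in_lambda(job: JobEstimate) -> bool:
    """True when every estimate is within the quoted Lambda limits."""
    return (job.peak_memory_mb <= LAMBDA_MAX_MEMORY_MB
            and job.runtime_seconds <= LAMBDA_MAX_SECONDS
            and job.scratch_disk_mb <= LAMBDA_TMP_MB)


def suggest_engine(job: JobEstimate) -> str:
    """Return "lambda" when the job fits, otherwise "glue"."""
    return "lambda" if fits_in_lambda(job) else "glue"


small_job = JobEstimate(peak_memory_mb=2_048, runtime_seconds=120, scratch_disk_mb=100)
long_job = JobEstimate(peak_memory_mb=2_048, runtime_seconds=3_600, scratch_disk_mb=100)
print(suggest_engine(small_job), suggest_engine(long_job))
```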

Bonus: AWS Data Wrangler is handy
https://github.com/awslabs/aws-data-wrangler

An open-source Python library that extends pandas to AWS
• Connects pandas DataFrames neatly with AWS data services
  ✓ Redshift / Glue / Athena / EMR, etc.
• Provides the functions needed for typical ETL tasks

Caution: the package is large and does not fit in Lambda as-is
• A Lambda deployment package must be 250 MB or less uncompressed
  ✓ Installing AWS Data Wrangler with a plain pip install exceeds 250 MB
• Use the zip file for Lambda Layers from the GitHub Releases page

Summary
• A data analytics platform can be built by making full use of serverless services
• Make good use of the common serverless architecture patterns

The end