AWSにおけるデータ分析入門 / Introduction To Data Analytics In AWS
hedgehog051
October 06, 2021
Transcript
Introduction to Data Analytics on AWS
Relic Inc., Kumada
Self-introduction
• Kan Kumada
• Infrastructure engineer since autumn […]
• Joined Relic Inc. in […]
I want to do data analysis…
I want to do business intelligence…
Momentum for data analysis keeps building
But before that
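What does data analysis actually involve?
• Collect data such as results and track records (for example, how much was sold and how much traffic there was)
• Draw some kind of insight from the collected data (for example, what is selling, when, and to whom)
• Act on the insights gained (for example, adjust prices, adjust what is displayed, or expand the target audience)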
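As a minimal sketch of the collect → insight → action loop above, the following Python snippet summarizes a handful of sales records. The records, the field names, and the `summarize` helper are all hypothetical and are not taken from the deck; they only illustrate what "what is selling, when, and to whom" could look like in code.

```python
from collections import Counter

# Hypothetical collected data: one dict per sale.
sales = [
    {"item": "mug", "hour": 9, "customer": "student", "amount": 1200},
    {"item": "mug", "hour": 20, "customer": "office worker", "amount": 1200},
    {"item": "tote bag", "hour": 20, "customer": "office worker", "amount": 2500},
    {"item": "mug", "hour": 21, "customer": "office worker", "amount": 1200},
]


def most_common(values):
    """Return the single most frequent value."""
    return Counter(values).most_common(1)[0][0]


def summarize(records):
    """Turn raw records into simple insights: what sells, when, and to whom."""
    return {
        "total": sum(r["amount"] for r in records),
        "top_item": most_common(r["item"] for r in records),
        "peak_hour": most_common(r["hour"] for r in records),
        "top_segment": most_common(r["customer"] for r in records),
    }


insights = summarize(sales)
print(insights)
```

An action would then follow from the insight, for example promoting the top item to the top customer segment around the peak hour.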
It is important to be clear about what you want to achieve through data analysis
Data analytics services on AWS
I see. I have no idea.
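When building a data analytics platform
• Collect data such as results and track records → how to collect it, and where to store it
• Draw some kind of insight from the collected data → process it so it is easy to analyze, analyze it, and visualize it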
A rough classification
Collection: Amazon Kinesis, Amazon Kinesis Video Streams, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, Amazon Managed Streaming for Apache Kafka, AWS Data Pipeline, AWS Data Exchange
Storage: Amazon Redshift, AWS Lake Formation, Amazon S3
Processing: Amazon EMR, AWS Glue, AWS Glue Elastic Views, AWS Glue DataBrew, Amazon Kinesis Data Analytics
Analysis: Amazon EMR, Amazon Athena, Amazon Kinesis Data Analytics, Amazon Redshift, Amazon QuickSight, Amazon OpenSearch Service
Visualization: Amazon Elasticsearch Service, Amazon QuickSight, Amazon OpenSearch Service
There is a lot here, but it feels like the direction is starting to come into view
A quick look at each
Collection
Real-time streaming
• Amazon Kinesis: umbrella name for the Kinesis services
• Amazon Kinesis Video Streams: capture, process, and store streaming video
• Amazon Kinesis Data Streams: capture, process, and store stream data
• Amazon Kinesis Data Firehose: load stream data into AWS data stores
• Amazon Managed Streaming for Apache Kafka: managed Apache Kafka for sending and receiving stream data
Other
• AWS Data Pipeline: scheduled data movement and transformation
• AWS Data Exchange: subscriptions to third-party data, such as article data provided by Reuters
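Storage
• Amazon Redshift: data warehouse; a warehouse that collects and organizes large volumes of "structured data" from systems
• AWS Lake Formation: builds a data lake; stores raw data whose use has not yet been decided
• Amazon S3: object storage; stores "structured data", "unstructured data", and more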
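Processing and analysis
• Amazon EMR: big data framework; combines related OSS to run ETL, streaming processing, and analysis on large volumes of data
• AWS Glue: serverless ETL (extract/transform/load)
• AWS Glue Elastic Views (preview): builds materialized views; accesses multiple data stores to combine and copy data
• AWS Glue DataBrew: no-code data cleanup and normalization
• Amazon Athena: runs ad hoc queries against S3
• Amazon Kinesis Data Analytics: transforms and analyzes streaming data
• Amazon Redshift: data warehouse; runs complex SQL queries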
Visualization
• Amazon QuickSight: serverless BI tool / visualization
• Amazon OpenSearch Service & Kibana: real-time data search / visualization
When to use which (the ones that seem most important)
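Amazon Kinesis Video Streams
• You have devices or applications that generate video data
• You want to stream live or recorded video over HLS to browsers and smartphones
• You want real-time two-way media streaming or web browser streaming
• You want to feed video data into Rekognition Video (video recognition) or SageMaker (ML)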
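Amazon Kinesis Data Streams
• You want to collect logs and event data generated by servers and devices at high speed in real time
• You want to collect data with a latency of one second or less
• You want to process streaming data with Lambda
• You want to forward streaming data to EC2
• You want to forward streaming data to Kinesis Data Analytics for real-time analysis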
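A minimal sketch of sending one log event to Kinesis Data Streams, assuming Python with boto3. The stream name `app-logs`, the sample event, and the `build_put_record_args` helper are hypothetical; the keyword arguments follow the boto3 `put_record` call. The actual send is left as a comment because it needs AWS credentials and an existing stream.

```python
import json


def build_put_record_args(stream_name, event, partition_key):
    """Build keyword arguments for the Kinesis Data Streams PutRecord call."""
    return {
        "StreamName": stream_name,
        # The record payload is sent as bytes; here the event is JSON-encoded.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": partition_key,
    }


# Hypothetical stream name and log event.
args = build_put_record_args(
    "app-logs", {"level": "INFO", "msg": "user signed in"}, "server-01"
)
print(args["StreamName"], args["PartitionKey"], len(args["Data"]))

# With credentials configured, the record could be sent like this:
#   import boto3
#   boto3.client("kinesis").put_record(**args)
```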
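Amazon Kinesis Data Firehose
• You want to deliver stream data to S3, Redshift, or OpenSearch Service
• You want to deliver data to those data stores in near real time (within 60 seconds)
• You want to deliver data to service providers such as Datadog, New Relic, and MongoDB
• You want to convert data to Apache Parquet or Apache ORC before delivering it to a data store
• You want to deliver to a data store without developing an application or managing infrastructure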
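Amazon Kinesis Data Analytics
• You want to query streaming data in real time with standard SQL
• You want to analyze streaming data in real time with sub-second latency
• You want to run streaming ETL with Apache Flink, integrated with various AWS services
• You want to build analytics applications in SQL, Java, Scala, or Python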
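AWS Data Pipeline
• Non-real-time
• You want to periodically move data between AWS storage and compute services and on-premises systems
• You want to apply simple transformations while moving data
• You want to move data, for example from RDS to DynamoDB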
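The latency figures quoted for the collection services above (one second or less for Kinesis Data Streams, within 60 seconds for Kinesis Data Firehose, non-real-time for AWS Data Pipeline) can be read as a rough decision rule. The sketch below encodes only that reading; the `pick_ingestion_service` function is hypothetical, and a real choice would also weigh destinations, transformation needs, and operational effort.

```python
def pick_ingestion_service(max_latency_seconds):
    """Map a latency requirement to the service the slides associate with it."""
    if max_latency_seconds <= 1:
        # Data must be available within about a second.
        return "Amazon Kinesis Data Streams"
    if max_latency_seconds <= 60:
        # Near real time: delivery within 60 seconds is acceptable.
        return "Amazon Kinesis Data Firehose"
    # Non-real-time: scheduled, periodic data movement is enough.
    return "AWS Data Pipeline"


for latency in (0.5, 30, 3600):
    print(latency, "->", pick_ingestion_service(latency))
```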
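Amazon Redshift
• You want to analyze structured and semi-structured data
• You want to run complex SQL queries against large-scale (petabyte) data
• There are no continuous writes or updates, and you want to analyze large-scale data in bulk
• You want to run SQL queries against data in S3 using Redshift Spectrum
• You want to save query results to S3 and use them from other AWS services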
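Amazon Athena
• Your data is in S3 and you want to run simple ad hoc queries
• You want to query files in formats such as CSV, JSON, ORC, and Parquet
• You want to run queries serverlessly
• No ETL required
• You want to output query results as CSV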
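A minimal sketch of an ad hoc Athena query over data in S3, assuming Python with boto3. The database `analytics`, the table `sales`, the result bucket, and the `build_athena_query_args` helper are hypothetical; the keyword arguments follow the boto3 `start_query_execution` call. Athena writes the results of a `SELECT` as a CSV file under the given output location.

```python
def build_athena_query_args(sql, database, output_location):
    """Build keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # S3 prefix where Athena stores the query results.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical table, database, and bucket names.
sql = (
    "SELECT item, COUNT(*) AS orders "
    "FROM sales GROUP BY item ORDER BY orders DESC LIMIT 10"
)
query_args = build_athena_query_args(
    sql, "analytics", "s3://example-athena-results/adhoc/"
)
print(query_args["QueryExecutionContext"], query_args["ResultConfiguration"])

# With credentials configured, the query could be started like this:
#   import boto3
#   response = boto3.client("athena").start_query_execution(**query_args)
#   print(response["QueryExecutionId"])
```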
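AWS Lake Formation
• You want to build a data lake easily
• You want to store raw data centrally, regardless of scale, for future data analysis
• You want to keep the raw data even after it has been processed
• Various departments in the organization each want to use the data for their own analysis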
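Amazon EMR
• You want to process data with OSS that you can customize flexibly
• You want to run ETL (extract/transform/load) on large datasets
• You want to do ML with Apache Spark MLlib, TensorFlow, or Apache MXNet
• You want to analyze clickstream data in S3 with Apache Spark or Apache Hive
• You want to do real-time streaming with Apache Flink and Apache Spark Streaming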
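AWS Glue
• You want to run medium-scale ETL (extract/transform/load) serverlessly
• You want to run ETL on data in Redshift, S3, RDS, DynamoDB, and similar stores
• You want to crawl data sources periodically, update the Data Catalog, and transform the data automatically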
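Amazon OpenSearch Service & Kibana
• You want to set up an OpenSearch cluster easily and analyze application log data
• You want to make applications, websites, or data lake catalogs searchable
• You want to collect infrastructure logs and metrics and visualize them in real time
• You want to visualize stream data in real time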
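Amazon QuickSight
• You want to use a serverless BI tool
• You want to visualize data from various data sources (S3, RDS, Athena, Redshift, OpenSearch, CSV, JSON, and so on)
• You want periodic reports with graphs and data rather than real-time views
• You want to analyze data using a variety of charts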
QuickSight visualization examples: https://aws.amazon.com/jp/quicksight/gallery
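Points to consider when selecting services
Summary
• The purpose affects the granularity of collection, analysis, and visualization, so be clear about what the analysis is for
Thank you very much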