Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Speaker Deck
PRO
Sign in
Sign up for free
Azure Databricks Learning Series #2 - Databricks SQL
Hiroyuki Nakazato / äžé 浩ä¹
January 25, 2023
Technology
0
72
Azure Databricks Learning Series #2 - Databricks SQL
Hiroyuki Nakazato / äžé 浩ä¹
January 25, 2023
Tweet
Share
More Decks by Hiroyuki Nakazato / äžé 浩ä¹
See All by Hiroyuki Nakazato / äžé 浩ä¹
Microsoft Build 2022 Recap Party!! Azure ã®ããŒã¿ & åæãµãŒã㹠泚ç®ã¢ããããŒã / microsoft-build-2022-recap-azure-data-and-analytics
nakazax
0
440
Microsoft Build 2022 - Azure ã®ããŒã¿ & åæãµãŒãã¹ ææ°ã¢ããããŒã / Microsoft Build 2022 Updates on Azure Data and Analytics Services
nakazax
1
660
Architecture patterns of Azure Cosmos DB & Azure Synapse Analytics
nakazax
1
180
CY2021 Updates for ADF & Synapse & Cosmos
nakazax
0
66
Ignite 2020 Update Azure Synapse Analytics
nakazax
0
920
Ignite 2020 Update - Azure DB for MySQL and Postgres - Azure Arc - and more
nakazax
2
330
Build 2020 Update Azure Cosmos DB and Azure Synapse Analytics
nakazax
0
890
Ignite 2019 Update Microsoft Data Platform
nakazax
0
59
1æ¥100å以äžã®Hadoopã¯ã©ã¹ã¿ãŒãäœ¿ãæšãŠãæ¹æ³ / How to throw away 100 Hadoop clusters a day
nakazax
0
4.3k
Other Decks in Technology
See All in Technology
230125 ã¢ãã¿ãŒããŠã³ãLT ITã¬ãžã§ããç¿(Ryu.Cyber)ãã
comucal
PRO
0
4.7k
DNSæš©åšãµãŒãã®ã¯ã©ãŠããµãŒãã¹åãã«è¡ãããæ»æããã³å¯Ÿç / DNS Pseudo-Random Subdomain Attack and mitigations
kazeburo
5
1.3k
SmartHRããOktaãžã®SCIM飿ºã§äœãåºãHRããªãã³ã®ã¢ã«ãŠã³ã管ç
jousysmiler
1
120
OPENLOGI Company Profile
hr01
0
12k
ããæ¬çªãããã¯ãŒã¯ããŸãããšä»®æ³ç°å¢ã«âã³ããŒâã§ãããããããã§ãã? / janog51
corestate55
0
380
ã©ãºãã€ãšGASã§å 湿åšã®æ¶ãå¿ããLINEã§ãªãã€ã³ãïŒæäœ
minako__ph
0
150
OVN-Kubernetes-Introduction-ja-2023-01-27.pdf
orimanabu
1
390
01_ãŠãŒã¶ãŒãªãµãŒã宿œã®é²ãæ¹
kouzoukaikaku
0
530
ã¹ã¯ã©ã å°å ¥ããŠå€ãã£ãããŒã ãçµç¹ã®ãããã
yumechi
0
190
ECããã¯ã«ã³ãã¡ã¬ã³ã¹2023 ECäºæ¥éšã®ã¢ãã€ã«éçº2023
tatsumi0000
0
300
ãÎïŒããã ïŒãWinActorããå§ãããã€ã®ãŸã«ãªã¹ããªã³ã° / WinAtorã©ã€ããã³ã°ããŒã¯å€§äŒ20230123
lambda
0
110
SPAã»SSGã§SSRã®ãããªOGP察å¿ïŒ
simo123
2
150
Featured
See All Featured
Facilitating Awesome Meetings
lara
33
4.6k
Automating Front-end Workflow
addyosmani
1351
200k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
657
120k
Atom: Resistance is Futile
akmur
256
24k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
349
27k
Thoughts on Productivity
jonyablonski
49
2.7k
Building Adaptive Systems
keathley
27
1.3k
It's Worth the Effort
3n
177
26k
How to train your dragon (web standard)
notwaldorf
66
4.3k
In The Pink: A Labor of Love
frogandcode
132
21k
Music & Morning Musume
bryan
37
4.6k
Adopting Sorbet at Scale
ufuk
65
7.8k
Transcript
Azure Databricks Learning Series ~ #2: Databricks SQL ~
ã¹ããŒã«ãŒ Hiroyuki Nakazato äžé æµ©ä¹ æ¥æ¬ãã€ã¯ããœããæ ªåŒäŒç€Ÿ ã«ã¹ã¿ã㌠ãµã¯ã»ã¹äºæ¥æ¬éš ã¯ã©ãŠã ãœãªã¥ãŒã·ã§ã³
ã¢ãŒããã¯ã Yici Chen é³ ãã€ æ¥æ¬ãã€ã¯ããœããæ ªåŒäŒç€Ÿ ã«ã¹ã¿ã㌠ãµã¯ã»ã¹äºæ¥æ¬éš ã«ã¹ã¿ã㌠ãšã³ãžãã¢
ã¢ãžã§ã³ã 1. Databricks SQL (DB SQL) ã®æŠèŠ 2. DB SQL
ã®ç¹é· 3. DB SQL ã®ææ°æ å ± 4. DB SQL ã®äŸ¡æ Œã®èãæ¹ 5. DB SQL ãš Azure ãµãŒãã¹ãçµã¿åããã ããŒã³ãŒã ETL ãšããŒã¿åæã® ã¢ãŒããã¯ãã£ãŒçŽ¹ä» & ãã¢
Databricks SQL ã®æŠèŠ â¢ ã¬ã€ã¯ããŠã¹ ãã©ãããã©ãŒã ãšã¯ ⢠Databricks SQL ãšã¯
ããŒã¿ãšåæã«é¢ããçŸåšã®ãŠãŒã¶ãŒ ããŒãº çããŒã¿ æŽçãããããŒã¿ ãããããŠãŒã¹ ã±ãŒã¹åãã® ã³ã©ãã¬ãŒã·ã§ã³ ãã©ãããã©ãŒã ⢠ããŒã¿
ãšã³ãžãã¢ãããŒã¿ ãµã€ãšã³ãã£ã¹ããã¢ããªã¹ã åãã®ãã€ãã£ã ãµããŒã ⢠䞻èŠãªèšèªãšãã¬ãŒã ã¯ãŒã¯ã®å©çš ãã¹ãŠã®ããŒã¿ã«å¯Ÿããå°çšã®ããŒã¿ ã¬ã€ã¯ ⢠äœã³ã¹ãã§èä¹ æ§ã®é«ãã¯ã©ãŠã ã¹ãã¬ãŒãž ⢠ããããããŒã¿ã®çš®é¡ã«å¯Ÿãããã€ãã£ã ãµããŒã ⢠ããŒã¿ã¯ãŠãŒã¶ãŒ ã¢ã«ãŠã³ãå ã«ä¿æ ã¬ã€ã¯äžã®æ§é åããã ãã©ã³ã¶ã¯ã·ã§ã³ ã¬ã€ã€ãŒ ⢠ããããšã¹ããªãŒãã³ã°ã®ãµããŒã ⢠ããã¯ã€ã³ãªãã®ããŒã¿ ã¬ã€ã¯äžã®ãªãŒãã³åœ¢åŒ ⢠ã¬ããã³ã¹ãç³»åãã»ãã¥ãªã㣠ããŒã¿ ãµã€ãšã³ã¹ãš æ©æ¢°åŠç¿ åæã BIãAI ããŒã¿ ãšã³ãžãã¢ãªã³ã°ãš ã¹ããªãŒãã³ã° å€§èŠæš¡/å°èŠæš¡ é«é »åºŠ/äœé »åºŠ æ§é å/åæ§é å /éæ§é å
ããŒã¿ ãŠã§ã¢ããŠã¹ ããŒã¿ ã¬ã€ã¯ ã¬ã€ã¯ããŠã¹ ããããããŒã¿ãåæãAI ã¯ãŒã¯ããŒãã çµ±åããåäžã®ãã©ãããã©ãŒã ãã©ãã€ã ã·ãã:
ã¬ã€ã¯ããŠã¹
Microsoft Azure äžã§ã®ã¬ã€ã¯ããŠã¹ ãã©ãããã©ãŒã Delta Lake Azure Data Lake Storage
Gen 2 æ§é åããŒã¿ åæ§é åããŒã¿ éæ§é åããŒã¿ çããŒã¿ æŽçãããããŒã¿ BIã»ããã·ã¥ããŒã ããŒã³ãŒã/ããŒã³ãŒã ããŒã¿ ãã€ãã©ã€ã³ SQL ããŒã¹ã®åæ æ©æ¢°åŠç¿ ããŒã¿ ãšã³ãžãã¢ãªã³ã° ããŒã¿ ãµã€ãšã³ã¹ ã¬ã€ã¯ ããŠã¹ Microsoft Azure ã¹ããªãŒãã³ã° ããŒã¿
Databricks SQL ããŒã¿ ã¬ã€ã¯ãæšæºç㪠SQL ã§åæã ãªãããªããã·ã¥ããŒãã§å¯èŠåã§ããæ©èœ Unity Catalog SQL
ãŠã§ã¢ããŠã¹ Photon ãšã³ãžã³ ãã£ãã·ã¥ (ã¯ãšãªçµæ / ãã£ã¹ã¯ / UI) Curated data æŽçãããããŒã¿ SQL ãšãã£ã¿ãŒ ããã·ã¥ããŒã ã¢ããªã¹ã ãšã¯ã¹ããªãšã³ã¹ 管çè ãšã¯ã¹ããªãšã³ã¹ â ã¢ããªã¹ããšç®¡çè åãã®åªãã UI/UX â ãã«ã ã¯ã©ã¹ã¿ãŒã®é«æ§èœãªã³ã³ãã¥ãŒãã£ã³ã° (Photon ãšã³ãžã³ & è±å¯ãªãã£ãã·ã¥) â ããŒã¿ ã¢ã¯ã»ã¹ãšã¡ã¿ããŒã¿ãäžå 管çã§ãã ããŒã¿ ã¬ããã³ã¹ ãœãªã¥ãŒã·ã§ã³ â ãªãŒãã³ã§é«æ§èœãACID ãå®çŸãã ææ°ã®ããŒã¿ ãããžã¡ã³ãã»ãã¯ãããžãŒ â Premium ã¬ãã«ã®ã¯ãŒã¯ã¹ããŒã¹ã§å©çšå¯èœ
Databricks SQL ã®æŽå² ï 2020 幎 4 æ Databricks 瀟ã
Redash 瀟ãè²·å ï 2020 幎 11 æ Databricks SQL ã®ãããªã㯠ãã¬ãã¥ãŒãéå§ ï 2021 幎 12 æ Databricks SQL ã®äžè¬æäŸãéå§ OSS ã®ã¡ãžã£ãŒãªããã·ã¥ããŒã ããŒã« å€ãã®ããŒã¿ ãœãŒã¹ãžã®æ¥ç¶ãã¯ãšãªã§ã®åæã ãªãããªããã·ã¥ããŒãã«ããå¯èŠå
Databricks SQL ã®äž»èŠæ©èœ SQL åæãããã·ã¥ããŒããã¢ã©ãŒãæ©èœã«å ããŠäž»èŠ BI ããŒã«ãš JDBC/ODBC æ¥ç¶ã«å¯Ÿå¿ æ©èœ
æŠèŠ SQL åæ Azure Data Lake Storage Gen2 ã Amazon S3 ãªã©ã®ã¯ã©ãŠã ã¹ãã¬ãŒãžã« æ ŒçŽãããŠããããŒã¿ã ANSI æšæº SQL æºæ ã® SQL ã§åæ ããã·ã¥ããŒã SQL ã®å®è¡çµæããªãããªããã·ã¥ããŒãã§å¯èŠå ã¢ã©ãŒã SQL ãã¹ã±ãžã¥ãŒã«å®è¡ããç¹å®ã®ãã£ãŒã«ããäºåå®çŸ©ããéŸå€ã è¶ ãããã¢ã©ãŒãã«ãã£ãŠéç¥ BI ããŒã« ãµããŒã Power BI, Tableau, Looker, Qlik Sense, MicroStrategy ãªã©ã® äž»èŠãª BI ããŒã«ããæ¥ç¶å¯èœ (äžèŠ§ã¯ Databricks ããŒãã㌠ãåç §) JDBC/ODBC ãã©ã€ã㌠ãµããŒã JDBC/ODBC ãã©ã€ããŒãå©çšããŠä»»æã®ããã°ã©ã ããæ¥ç¶å¯èœ
ã¢ããªã¹ã ãšã¯ã¹ããªãšã³ã¹ â ããŒã¿ããŒã¹ãšããŒãã«ãç°¡åã«æ¢çŽ¢ãã äœ¿ãæ £ãã ANSI SQL ã§ããŒã¿ã åæãã â
ã€ã³ã¿ã©ã¯ãã£ããªããžã¥ã¢ã©ã€ãŒãŒã·ã§ã³ ã§çµæããã°ããçè§£ãã â ã¯ãšãªãä¿åã»å ±æã»åå©çšããŠã ããè¿ éã«çµæãåŸããã æ°ããã€ã³ãµã€ãããã°ããèŠã€ãã
SQL ãŠã§ã¢ããŠã¹ â SQL åæãš BI ã«æé©åããã èšç®ãªãœãŒã¹ãçŽ æ©ãã»ããã¢ãã â èªåã¹ã±ãŒã«ã«ããé«ãã³ã³ã«ã¬ã³ã·ãŒ
â èªå忢ãã¹ããã ã€ã³ã¹ã¿ã³ã¹ã® 掻çšã«ããã³ã¹ãæé©å â ãµãŒããŒã¬ã¹ (ãã¬ãã¥ãŒ) ã§å©çšå¯èœ 髿§èœãã€ã¹ã±ãŒã©ãã«ãã¹ãã¬ãŒãžãšå®å šã«åé¢ãããèšç®ãªãœãŒã¹
管çè ãšã¯ã¹ããªãšã³ã¹ â ãŠã§ã¢ããŠã¹ããšã«åŠçã¯ãšãªæ°ã ã¯ã©ã¹ã¿ãŒæ°ã®å¢æžã®æšç§»ãç£èŠ â 䜿çšç¶æ³ã®é©åãªçè§£ãšãµã€ãžã³ã°ã® æé©åã«åœ¹ç«ãŠããã SQL ãŠã§ã¢ããŠã¹ã®ç£èŠãšãµã€ãžã³ã°
管çè ãšã¯ã¹ããªãšã³ã¹ â SQL ãŠã§ã¢ããŠã¹ã§å®è¡ããå šã¯ãšãªã ã¯ãšãªå±¥æŽã«èšé²ããã â ã¯ãšãªããšã®å®è¡æéãåŠçããŒã¿éã è¿åŽè¡æ°ã I/O ããã©ãŒãã³ã¹ã確èª
â ã¯ãšãª ãããã¡ã€ã« (ã¯ãšãªå®è¡ã®è©³çŽ°ã® èŠèŠå) ã Spark Web UI ã«ãã 詳现ãªãã¬ãŒã¹ â ããã«ããã¯ãã³ã¹ãã®ãããæäœãç¹å® ããŠã¯ãšãªãæ¹å ã¯ãšãªã®çè§£ãšæé©å
管çè ãšã¯ã¹ããªãšã³ã¹ â æ°ãããŠãŒã¶ãŒã®å©çšéå§ãããŒã¿ã®æ€ åºã»ä¿è·ã»ç®¡çã確å®ã«å®è¡ â ãŠã§ã¢ããŠã¹ã®ç£èŠãšã¯ãšãªå±¥æŽã«ãã ã³ã¹ããšäœ¿çšç¶æ³ãå¹ççã«ç®¡ç â çµèŸŒã®ç£æ»èšŒè·¡ã«ããã³ã³ãã©ã€ã¢ã³ã¹ã® ããŒãºã«å¯Ÿå¿
ã»ã«ã ãµãŒãã¹åæã®ã¬ããã³ã¹ã容æã«è¡ãã
â¢ é«æ§èœ ⢠é«ã¹ã±ãŒã©ããªã㣠⢠é«ã³ã¹ãæ§èœ Databricks SQL ã®ç¹é·
Databricks SQL ã® 3 ã€ã®ç¹é· 1 é«ã¹ã±ãŒã©ããªã㣠2 髿§èœ 3
é«ã³ã¹ãæ§èœ ⢠ã¹ãã¬ãŒãžãšã³ã³ãã¥ãŒãã® å®å šãªåé¢ â¢ ãã«ã ã¯ã©ã¹ã¿ãŒ ⢠èªåã¹ã±ãŒã« ⢠Photon ãšã³ãžã³ â¢ è€æ°ã®ãã£ãã·ã¥ ⢠Delta Lake ã«æé©å ⢠ã¹ããã VM ã®æŽ»çš â¢ èªå忢 ⢠DBU ã®äºåè³Œå ¥å²åŒ ⢠ãµãŒããŒã¬ã¹ (ãã¬ãã¥ãŒ)
Databricks SQL ã®ã¢ãŒããã¯ãã£ãŒ â Azure Databricks ã¯ãŒã¯ã¹ããŒã¹å ã« è€æ°ã® SQL ãŠã§ã¢ããŠã¹ãäœæå¯èœ
â ãŠã§ã¢ããŠã¹ã¯ 1 ã€ä»¥äžã®ã¯ã©ã¹ã¿ãŒã§æ§æ â ãŠã§ã¢ããŠã¹ãæ§æããã¯ã©ã¹ã¿ãŒæ°ãå¢ãã ããšã§ã¯ãšãªã®åæå®è¡æ§èœãåäž â ã¯ã©ã¹ã¿ãŒ ãµã€ãºãäžããããšã§åŠçæ§èœã åäžããã¯ãšãªã®åŸ æ©æéãççž® â èªåã¹ã±ãŒã«ãèšå®ããããšã§ã¯ãšãªã®åŠç ç¶æ³ã«å¿ããŠã¯ã©ã¹ã¿ãŒæ°ãèªåçã«å¢æž ã¹ãã¬ãŒãžãšã³ã³ãã¥ãŒãã£ã³ã°ãå®å šã«åé¢ãé«ãã¹ã±ãŒã©ããªãã£ãæã€ SQL Warehouse #1 Cluster #N Driver Node ⊠SQL Warehouse #N Cluster #N Driver Node ⊠⊠⊠Cluster #1 Driver Node ⊠Worker Nodes Worker Nodes Worker Nodes ADLS Gen2 ãªã©ã®ã¯ã©ãŠã ãªããžã§ã¯ã ã¹ãã¬ãŒãž Application or User connection Application or User connection
ã¯ã©ã¹ã¿ãŒ ãµã€ãºãšä»®æ³ãã·ã³ (VM) ã®å¯Ÿå¿ ï ã¯ã©ã¹ã¿ãŒ ãµã€ãº: 2X-Small ãã 4X-Large
(T ã·ã£ã ãµã€ãžã³ã°ãšè¡šçŸ) ï ãã©ã€ã㌠ããŒã: ã¯ã©ã¹ã¿ãŒ ãµã€ãºã«å¿ã㊠VM ã®ã¹ããã¯ãå€åãå°æ°ã¯ 1 åºå® ï ã¯ãŒã«ãŒ ããŒã: ã¯ã©ã¹ã¿ãŒ ãµã€ãºã«å¿ã㊠VM ã®å°æ°ãå€åãã¹ããã¯ã¯ Standard_E8ds_v4 åºå® ã¯ã©ã¹ã¿ãŒ ãµã€ãº ãã©ã€ã㌠ããŒã ã¹ãã㯠ãã©ã€ã㌠ããŒã å°æ° ã¯ãŒã«ãŒ ããŒã ã¹ãã㯠ã¯ãŒã«ãŒ ããŒã å°æ° DBU 2X-Small Standard_E8ds_v4 1 Standard_E8ds_v4 1 4 X-Small Standard_E8ds_v4 1 Standard_E8ds_v4 2 6 Small Standard_E16ds_v4 1 Standard_E8ds_v4 4 12 Medium Standard_E32ds_v4 1 Standard_E8ds_v4 8 24 Large Standard_E32ds_v4 1 Standard_E8ds_v4 16 40 X-Large Standard_E64ds_v4 1 Standard_E8ds_v4 32 80 2X-Large Standard_E64ds_v4 1 Standard_E8ds_v4 64 144 3X-Large Standard_E64ds_v4 1 Standard_E8ds_v4 128 272 4X-Large Standard_E64ds_v4 1 Standard_E8ds_v4 256 528
ãã«ã ã¯ã©ã¹ã¿ãŒã®ã¯ãšãªå²ãåœãŠã«é¢ãã仿§ â 1 ã¯ã©ã¹ã¿ãŒã«å²ãåœãŠãããã¯ãšãªæ° ï 以å㯠Docs ã« 10
ã¯ãšãª / ã¯ã©ã¹ã¿ãŒã®èšèŒããã£ãããçŸåšã¯ã¯ãšãªã®èšç®ã³ã¹ãã«åºã¥ãå²ãåœãŠã«å€æŽ â ã¯ãšãªã®ã«ãŒãã£ã³ã° ï æ°ããã»ãã·ã§ã³ïŒæãè² è·ãäœãã¯ã©ã¹ã¿ãŒã«ã«ãŒãã£ã³ã° ï æ¢åã®ã»ãã·ã§ã³ïŒãã®ã»ãã·ã§ã³ã®åã®ã¯ãšãªãå®è¡ããã¯ã©ã¹ã¿ãŒã«ã«ãŒãã£ã³ã° (ãã®ã¯ã©ã¹ã¿ãŒãå©çš ã§ããªãå Žåã¯æãè² è·ãäœãã¯ã©ã¹ã¿ãŒã«ã«ãŒãã£ã³ã°) â ã¯ãšãªã®ãã¥ãŒã€ã³ã° ï ãŠã§ã¢ããŠã¹ã STARTING ç¶æ ãŸãã¯ããã¹ãŠã®ã¯ã©ã¹ã¿ãŒãåŠçèœåã®éçãŸã§ã¯ãšãªãå®è¡ããŠãããšãã ã¯ãšãªã¯ãã¥ãŒã«ç»é²ããã â» ãŠã§ã¢ããŠã¹ã STARTING ç¶æ ã®å Žåãé€ããã¡ã¿ããŒã¿ ã¯ãšãª (DESCRIBE table ãªã©) ã ç¶æ ã®å€æŽã¯ãšãª (SET ãªã©) ã¯ãã¥ãŒã«ç»é²ãããªã Databricks SQL ãŠã§ã¢ããŠã¹ãšã¯ - Azure Databricks - Databricks SQL | Microsoft Learn
ã¯ã©ã¹ã¿ãŒæ°ã®èªåã¹ã±ãŒã«ã®ä»æ§ â èªåã¹ã±ãŒã« ã¢ãŠãã®ä»æ§ ï å®è¡äžã®ã¯ãšãª & ãã¥ãŒå ã®ã¯ãšãª & 次ã®
2 åéã«äºæ³ãããåä¿¡ã¯ãšãªã®äºæž¬åŠçæéãå ã«å€å®ãã ï äžèšã«ãããããããã¥ãŒå ã§ 5 å以äžåŸ æ©ããŠããã¯ãšãªãååšããå Žåãã¯ã©ã¹ã¿ãŒã远å ãã â èªåã¹ã±ãŒã« ã€ã³ã®ä»æ§ ï äœè² è·ã®ç¶æ ã 15 åéç¶ããå Žåã«ã¯ã©ã¹ã¿ãŒãçž®å°ãã ï ãã®å Žåãéå» 15 åéã®ããŒã¯è² è·ãåŠçããã®ã«ååãªã¯ã©ã¹ã¿ãŒãç¶æããã ï äŸïŒããŒã¯è² è·ã 25 ã®åæã¯ãšãªã®å Žåã3 ã€ã®ã¯ã©ã¹ã¿ãŒãç¶æããã Databricks SQL ãŠã§ã¢ããŠã¹ãšã¯ - Azure Databricks - Databricks SQL | Microsoft Learn äºæž¬åŠçæé èªåã¹ã±ãŒã« ã¢ãŠãã®ä»æ§ 2 åæªæºã®å Žå ã¯ã©ã¹ã¿ãŒã远å ããªã 2 åãã 6 åã®å Žå 1 ã¯ã©ã¹ã¿ãŒã远å 6 åãã 12 åã®å Žå 2 ã¯ã©ã¹ã¿ãŒã远å 12 åãã 22 åã®å Žå 3 ã¯ã©ã¹ã¿ãŒã远å äžèšä»¥å€ã®å Žå 3 ã¯ã©ã¹ã¿ãŒ + äºæž¬åŠçæéã 15 åå¢å ããããšã« 1 ã¯ã©ã¹ã¿ãŒã远å
SQL ãŠã§ã¢ããŠã¹ã®ã¯ã©ã¹ã¿ãŒ ãµã€ãºãšã¯ã©ã¹ã¿ãŒæ°ã®å¢å ã¯ã©ã¹ã¿ãŒ #1 ãã©ã€ããŒ: Standard_E16ds_v4 à 1 ã¯ãŒã«ãŒ:
Standard_E8ds_v4 à 4 SQL ãŠã§ã¢ããŠã¹ ãµã€ãº: Small, ã¯ã©ã¹ã¿ãŒæ°: 1 ã¯ã©ã¹ã¿ãŒ #1 ãã©ã€ããŒ: Standard_E32ds_v4 à 1 ã¯ãŒã«ãŒ: Standard_E8ds_v4 à 8 ãµã€ãº: Medium, ã¯ã©ã¹ã¿ãŒæ°: 1 ã¯ã©ã¹ã¿ãŒ ãµã€ãº ã¢ãã ãã©ã€ããŒã®ã¹ãã㯠ã¢ãããšã¯ãŒã«ãŒå°æ°å¢å â æ§èœã¢ããã«ããã¯ãšãªã®åŸ æ©æéãççž® â ãã£ã¹ã¯ ãã£ãã·ã¥ã®å®¹éãå¢å ãµã€ãº: Small, ã¯ã©ã¹ã¿ãŒæ°: 2 ã¯ã©ã¹ã¿ãŒ #1 ãã©ã€ããŒ: Standard_E16ds_v4 à 1 ã¯ãŒã«ãŒ: Standard_E8ds_v4 à 4 ã¯ã©ã¹ã¿ãŒ #2 ãã©ã€ããŒ: Standard_E16ds_v4 à 1 ã¯ãŒã«ãŒ: Standard_E8ds_v4 à 4 ã¯ã©ã¹ã¿ãŒæ° å¢å åãæ§æã®ã¯ã©ã¹ã¿ãŒã远å â ã¯ãšãªã®åæå®è¡æ§èœãåäž
Databricks SQL ã TPC-DS 100TB ã§ã®äžçèšé²ãæŽæ° 2021 幎 11 æã®çºè¡š
- TPC-DS 100TB ã¯ã©ã¹ã§ä»ç€Ÿ DWH ãšæ¯èŒã㊠2.7 åé«éãäŸ¡æ Œæ§èœã«ãã㊠12 ååªããŠãããšã®çµæãå ¬åŒã«å ¬é (åºå ž) Databricks ã DWH ããã©ãŒãã³ã¹ã®å ¬åŒèšé²ãæŽæ° - Databricks ããã°
Photon ãšã³ãžã³ â Databricks 瀟ããã€ãã£ã ã³ãŒã (C++) ã§ æ°ããéçºãããã¯ãã«åã¯ãšãª ãšã³ãžã³
â Apache Spark ãš 100% ã®äºææ§ãæã€ â çããŒã¿ãšåæå圢åŒã®æ§é åããŒã¿ã® äž¡è ã«å¯ŸããŠåªããæ§èœãçºæ® â 以åã® Databricks ã©ã³ã¿ã€ã ãã 2 å 以äžã®é«éå â Databricks SQL 㯠Photon ãšã³ãžã³ã ããã©ã«ãã§æå¹å ã¬ã€ã¯ããŠã¹ ãã©ãããã©ãŒã ã®ããã®æ¬¡äžä»£é«éã¯ãšãª ãšã³ãžã³
SQL ãŠã§ã¢ããŠã¹ã®ãã£ãã·ã¥ è€æ°ã®ãã£ãã·ã¥ãèªåçã«é©çšãããå€§å¹ ãªæ§èœåäžã«å¯äž User BI Application / SQL Connection SQL
Warehouse #1 Cluster #1 Driver Node Worker Nodes Worker Nodes ãã£ã¹ã¯ ãã£ãã·ã¥ (æ§ç§° Delta ãã£ãã·ã¥) ã¯ã©ãŠã ã¹ãã¬ãŒãžã® Delta Lake & Parquet ã®ããŒã¿ã ã¯ãŒã«ãŒ ããŒãã®ããŒã«ã« SSD ã«ãã£ãã·ã¥ åäžã®ããŒã¿ã® 2 åç®ä»¥éã®èªã¿åããå€§å¹ ã«é«éå ã¯ãšãªçµæã®ãã£ãã·ã¥ SQL ãŠã§ã¢ããŠã¹ãä»ããå šã¯ãšãªã®çµæã»ãã ãã£ãã·ã¥ ⢠å°ããªçµæã»ãã (1MB æªæº) â On Cluster (Driver Node) ⢠倧ããªçµæã»ãã (1MB 以äž) â On Cloud Storage åäžã®ã¯ãšãªã® 2 åç®ã®çµæè¿åŽãå€§å¹ ã«é«éå Databricks SQL UI ãã£ãã·ã¥ ã¯ãšãªãšããã·ã¥ããŒãã®ãŠãŒã¶ãŒããšã®ãã£ãã·ã¥ (on DBFS) SQL ãŠã§ã¢ããŠã¹ã忢ããŠããŠãããã·ã¥ããŒã衚瀺ãå¯èœ Delta Table ã¯ãšãªãã£ãã·ã¥-Azure Databricks - Databricks SQL | Microsoft Docs How to Extract Large Query Results Through Cloud Object Stores - The Databricks Blog
ãã£ã¹ã¯ ãã£ãã·ã¥ ã¯ã©ãŠã ã¹ãã¬ãŒãžäžã® Delta Lake ããã³ Parquet 圢åŒã®ããŒã¿ã®æåã®èªã¿åãæã« ã¯ãŒã«ãŒ
ããŒãã®ããŒã«ã« SSD ã«ãã£ãã·ã¥ã2 åç®ä»¥éã®èªã¿åããå€§å¹ ã«é«éå é ç® èª¬æ ãã£ãã·ã¥å¯Ÿè±¡ã®ããŒã¿ ã¯ãšãªã§åŠçããããŒã¿ (ã¯ãšãªçµæãã£ãã·ã¥ã§ã¯ãªã) ãã£ãã·ã¥å¯Ÿè±¡ã®ããŒã¿åœ¢åŒ Delta Lake ããã³ Parquet 圢åŒã®ããŒã¿ ãã£ãã·ã¥ã®ããªã¬ãŒ æåã®èªã¿åãæã«èªåçã«å®è¡ ãã£ãã·ã¥ã®åŒ·å¶ CACHE SELECT ã³ãã³ãã§ããŒã¿ããã£ãã·ã¥ã«äºåã«èªã¿èŸŒãã ããã©ãŒãã³ã¹ ã€ã³ã¡ã¢ãªã® Spark ãã£ãã·ã¥ãããé«éã«èªã¿åãå¯èœïŒé«éãªäžéããŒã¿åœ¢åŒã§æ ŒçŽ & å¹çç㪠ã¢ã«ãŽãªãºã ã§å±é ãã£ãã·ã¥ã®äžè²«æ§ ãã£ã¹ã¯ ãã£ãã·ã¥ãããŒã¿ ãã¡ã€ã«ã®äœæã»åé€ã»æŽæ°ãèªåçã«æ€ç¥ããã£ãã·ã¥ã«åæ ïŒãŠãŒã¶ãŒã æç€ºçã«ãã£ãã·ã¥ã®ç¡å¹åãæŽæ°ãè¡ãå¿ èŠã¯ãªã ãã£ãã·ã¥ã®åé€ LRU (Least Recently Used; æåŸã®äœ¿çšããæãæéãçµéãã) ããŒã¿ãèªåçã«åé€ ã¯ã©ã¹ã¿ãŒã®åèµ·åã«ãã£ãŠããã£ãã·ã¥ã¯åé€ããã ãã£ãã·ã¥ã®å®¹é ã¯ãŒã«ãŒ ããŒãã®ããŒã«ã« SSD ã®äœ¿çšå¯èœãªé åã®ååãå©çš (Standard_E8ds_v4 ã®ããŒã«ã« SSD 㯠300 GiB = 150 GiB ããã£ã¹ã¯ ãã£ãã·ã¥ã«å©çš) Azure Databricks ã§ãã£ãã·ã¥ã䜿çšããŠããã©ãŒãã³ã¹ãæé©åãã - Azure Databricks | Microsoft Learn
ãã£ã¹ã¯ ãã£ãã·ã¥ ã¯ã©ãŠã ã¹ãã¬ãŒãžäžã® Delta Lake ããã³ Parquet 圢åŒã®ããŒã¿ã®æåã®èªã¿åãæã« ã¯ãŒã«ãŒ
ããŒãã®ããŒã«ã« SSD ã«ãã£ãã·ã¥ã2 åç®ä»¥éã®èªã¿åããå€§å¹ ã«é«éå é ç® èª¬æ ãã£ãã·ã¥å¯Ÿè±¡ã®ããŒã¿ ã¯ãšãªã§åŠçããããŒã¿ (ã¯ãšãªçµæãã£ãã·ã¥ã§ã¯ãªã) ãã£ãã·ã¥å¯Ÿè±¡ã®ããŒã¿åœ¢åŒ Delta Lake ããã³ Parquet 圢åŒã®ããŒã¿ ãã£ãã·ã¥ã®ããªã¬ãŒ æåã®èªã¿åãæã«èªåçã«å®è¡ ãã£ãã·ã¥ã®åŒ·å¶ CACHE SELECT ã³ãã³ãã§ããŒã¿ããã£ãã·ã¥ã«äºåã«èªã¿èŸŒãã ããã©ãŒãã³ã¹ ã€ã³ã¡ã¢ãªã® Spark ãã£ãã·ã¥ãããé«éã«èªã¿åãå¯èœïŒé«éãªäžéããŒã¿åœ¢åŒã§æ ŒçŽ & å¹çç㪠ã¢ã«ãŽãªãºã ã§å±é ãã£ãã·ã¥ã®äžè²«æ§ ãã£ã¹ã¯ ãã£ãã·ã¥ãããŒã¿ ãã¡ã€ã«ã®äœæã»åé€ã»æŽæ°ãèªåçã«æ€ç¥ããã£ãã·ã¥ã«åæ ïŒãŠãŒã¶ãŒã æç€ºçã«ãã£ãã·ã¥ã®ç¡å¹åãæŽæ°ãè¡ãå¿ èŠã¯ãªã ãã£ãã·ã¥ã®åé€ LRU (Least Recently Used; æåŸã®äœ¿çšããæãæéãçµéãã) ããŒã¿ãèªåçã«åé€ ã¯ã©ã¹ã¿ãŒã®åèµ·åã«ãã£ãŠããã£ãã·ã¥ã¯åé€ããã ãã£ãã·ã¥ã®å®¹é ã¯ãŒã«ãŒ ããŒãã®ããŒã«ã« SSD ã®äœ¿çšå¯èœãªé åã®ååãå©çš (Standard_E8ds_v4 ã®ããŒã«ã« SSD 㯠300 GiB = 150 GiB ããã£ã¹ã¯ ãã£ãã·ã¥ã«å©çš) Medium ãµã€ãºã®ã¯ã©ã¹ã¿ãŒã®å Žåã ã¯ãŒã«ãŒ ããŒã à 8 å° = èš 1.2TiB ã®ããŒã«ã« SSD ã ãã£ã¹ã¯ ãã£ãã·ã¥ã«å©çšå¯èœ Azure Databricks ã§ãã£ãã·ã¥ã䜿çšããŠããã©ãŒãã³ã¹ãæé©åãã - Azure Databricks | Microsoft Learn
SQL ãŠã§ã¢ããŠã¹ã®åãæ¹ã®ã¢ã€ãã£ã¢ ï è€æ°ããŒã ã§ãŠã§ã¢ããŠã¹ã掻çšããå Žåãåäžã®å·šå€§ãªãŠã§ã¢ããŠã¹ãéçšããããã çšéãå©çšè ãªã©ã®èгç¹ã§ãŠã§ã¢ããŠã¹ãåããæ¹ã管çã容æ ï ç°ãªããŠã§ã¢ããŠã¹ã§ãåäžã®ããŒã¿ã«ã¢ã¯ã»ã¹å¯èœãªã®ããã€ã³ã ï ã¡ãªããïŒè² è·ã®éäžåé¿ãã¿ã°ã«ãã課éã®æç¢ºåã忢ãªã©ã®èª¿æŽã容æ å šç€Ÿããã·ã¥ããŒãçšãŠã§ã¢ããŠã¹
(Small, Min=1, Max=2) ããŒã¿ ã¢ããªã¹ã ããŒã çšãŠã§ã¢ããŠã¹ (Medium, Min=1, Max=4) BI ããŒã çšãŠã§ã¢ããŠã¹ (Large, Min=1, Max=2) ADLS Gen2 ãªã©ã®ã¯ã©ãŠã ãªããžã§ã¯ã ã¹ãã¬ãŒãž
⢠Databricks SQL ãµãŒããŒã¬ã¹ (ãã¬ãã¥ãŒ) ⢠Unity Catalog Databricks SQL
ã®ææ°æ å ±
SQL ãŠã§ã¢ããŠã¹ã®ã¿ã€ããšæäŸæ©èœ SQL ãŠã§ã¢ããŠã¹ æ©èœæŠèŠ æäŸæ©èœ Classic Self managed, introductory
SKU, compute in your account Pro Self managed, advanced SKU, compute in your account Serverless* Fully managed, elastic, best value ã¢ããã㯠SQL ã¯ãšãª SQL ãšãã£ã¿ ⢠⢠⢠ANSI SQL ⢠⢠⢠ããŒã¿ ãããžã¡ã³ã ã¬ããã³ã¹ ã¯ãšãªå±¥æŽ & ã¯ãšãªãããã¡ã€ã« ⢠⢠⢠Unity Catalog åã Data Explorer ⢠⢠⢠Managed Data Sharing ⢠⢠⢠æ¥ç¶æ§ Partner Connect ⢠⢠⢠SQL Rest API, Python, Node.js, Go* ⢠⢠⢠ããã©ãŒãã³ã¹ Photon ⢠⢠⢠Predictive I/O ⢠⢠SQL ETL/ELT Query ãã§ãã¬ãŒã·ã§ã³* ⢠⢠Materialized View * ⢠⢠Workflows çµ±å* ⢠⢠ããŒã¿ ãµã€ãšã³ã¹ ML Geospatial 颿° ⢠⢠Python UDF* ⢠⢠Notebook Integration* ⢠⢠ãµãŒããŒã¬ã¹ SQL ããŒã¿ ãŠã§ã¢ããŠã¹ ãã« ãããŒãžã ã³ã³ãã¥ãŒã ⢠Intelligent Workload Management* ⢠Serverless Query Result Caching* ⢠What are the SQL warehouse types? - Azure Databricks - Databricks SQL | Microsoft Learn Databricks SQL Azure Pricing - Databricks * ãã¬ãã¥ãŒæ©èœ
Databricks SQL ãµãŒããŒã¬ã¹ã®ç¹é· 1 é«ãçç£æ§ 2 ãã« ãããŒãžã 3 ã³ã¹ãåæž
⢠ãŠãŒã¶ãŒã®ã¯ãšãªã¯ããã«éå§ã ã¯ã©ã¹ã¿èµ·åãåŸ ã€å¿ èŠãªã â¢ å³æã®ã¯ã©ã¹ã¿ ã¹ã±ãŒãªã³ã°ã§ æŽãªãåæå©çšãŠãŒã¶ãŒã«å¯Ÿå¿ â¢ æ§æäžèŠ â¢ ããã©ãŒãã³ã¹ ãã¥ãŒãã³ã°äžèŠ â¢ ãã£ãã·ãã£ç®¡çäžèŠ â¢ èªåã¢ããã°ã¬ãŒã / ãããé©çš ⢠䜿ã£ãåã ãã®æ¯æã; ã¯ã©ã¹ã¿ã®ã¢ã€ãã«æéãåæž ⢠ãªãœãŒã¹ã®éå°ããããžã§ãã³ã°ã ææ¢ ⢠æçµã¯ãšãªå®è¡ãã 10 åã® ã¢ã€ãã«åŸ ã¡å®¹éãåé€
ãªã Databricks SQL ãµãŒããŒã¬ã¹ãäœãããã®ãïŒ ã¯ã©ã¹ã¿ãŒã®èµ·åæéãççž®ããããšãã倿°ã®ãŠãŒã¶ãŒèŠæã«å¿ãããã Quicker cluster startup times ·
Community (azure.com)
Databricks SQL ãµãŒããŒã¬ã¹ã§äœãå€ããã®ãïŒ ãããŒãžããªãµãŒããŒçŸ€ åžžæèµ·åããŠãããµãŒããŒçŸ€ã¯ãèªåçã«ãããé©çš / ã¢ããã°ã¬ãŒãããã ... æé©åããããã£ãã·ã㣠ããã©ã«ãã§ã¯æåŸã®ã¯ãšãªãã
10 åçµéããã¢ã€ãã« ã¯ã©ã¹ã¿ã¯ é€å»ããã Serverless SQL Compute ã»ãã¥ã¢ ããŒã¿æå·åãå«ãã 3 ã¬ã€ã€ãŒã®åé¢ å³æã®ã³ã³ãã¥ãŒã ãŠãŒã¶ãŒã¯ã¯ã©ã¹ã¿ãŒã®èµ·å / ã¹ã±ãŒã« ã¢ããæã« 10 ç§ä»¥å ã« èšç®ãªãœãŒã¹ãå²ãåœãŠãããããåŸ ã¡æéãã»ãŒãªã
åŸæ¥ã® Databricks SQL ã®ã³ã³ãã¥ãŒãã£ã³ã° ãªãœãŒã¹ã®é 眮 ããŒã¿ ãã¬ãŒã³ã¯ãŠãŒã¶ãŒã管çãã Azure ãµãã¹ã¯ãªãã·ã§ã³ã«é 眮 Azure
Databricks ã¯ãŒã¯ã¹ããŒã¹ VNET ã¯ãŒã¯ã¹ããŒã¹ ã¯ãŒã¯ã¹ããŒã¹ Customers Account ãŠãŒã¶ãŒç®¡çã® Azure ãµãã¹ã¯ãªãã·ã§ã³ Azure Databricks ãµãŒãã¹ ãŠãŒã¶ãŒç®¡çã®ã¯ã©ãŠã ã¹ãã¬ãŒãž ããŒã¿ ãã¬ãŒã³ (ã¯ã©ã¹ã¿ãŒãæ§æãã VM 矀ãšåšèŸºãªãœãŒã¹) ã³ã³ãããŒã« ãã¬ãŒã³
Databricks SQL ãµãŒããŒã¬ã¹ã®ã³ã³ãã¥ãŒãã£ã³ã° ãªãœãŒã¹ã®é 眮 ããŒã¿ ãã¬ãŒã³ã¯ Azure Databricks ã管çãããµãã¹ã¯ãªãã·ã§ã³ã«é 眮 Azure
Databricks ã¯ãŒã¯ã¹ããŒã¹ ã¯ãŒã¯ã¹ããŒã¹ ã¯ãŒã¯ã¹ããŒã¹ Customers Account ãŠãŒã¶ãŒç®¡çã® Azure ãµãã¹ã¯ãªãã·ã§ã³ Azure Databricks ãµãŒãã¹ ãŠãŒã¶ãŒç®¡çã®ã¯ã©ãŠã ã¹ãã¬ãŒãž Databricks SQL ãµãŒããŒã¬ã¹çšããŒã¿ ãã¬ãŒã³ ã³ã³ãããŒã« ãã¬ãŒã³
åŸæ¥ã®ã¯ã©ã¹ã¿ãŒãšãµãŒããŒã¬ã¹ã®ååå¿çãŸã§ã®æéæ¯èŒ â Classic ãŸã㯠Pro ã®å Žåã ã¯ã©ã¹ã¿ãŒã®èµ·åå®äºãŸã§ (VM ã®å°æ° ã«ããã)
5-7 åçšåºŠåŸ ã€å¿ èŠããã â ãµãŒããŒã¬ã¹ã¯ãŠãŒã¶ãŒ ãªã¯ãšã¹ããåã㊠æŠã 10 ç§ä»¥å ã«ã³ã³ãã¥ãŒãã£ã³ã° ãªãœãŒã¹ã®æºåãå®äº â ãããªãæéççž® (2-4 ç§ãŸã§ççž®) ã ç®æšã«éçºãé²è¡äž ãµãŒããŒã¬ã¹ã¯ååã®ã¯ãšãªå¿çé床ãå€§å¹ ã«åäž ~10s ~2-4s ~7m
Databricks SQL ãµãŒããŒã¬ã¹ã¯ Azure ã¹ãã¬ãŒãž ã¢ã«ãŠã³ãã® ä»®æ³ãããã¯ãŒã¯ ãµãŒãã¹ ãšã³ããã€ã³ãã«å¯Ÿå¿ ãµãŒãã¹
ãšã³ããã€ã³ã: Azure ãªãœãŒã¹ãžã®ã¢ã¯ã»ã¹ãç¹å®ã® Vnet ã®ãµããããããã®éä¿¡ã« éå®ã§ããæ©èœãéä¿¡ã¯åžžã« Azure ã®ããã¯ããŒã³ ãããã¯ãŒã¯ãéããªã©ã®ç¹é·ãæã€ ãµãŒããŒã¬ã¹ãå©çšãã VNet ãšãµããããã®äžèЧãä»¥äž Docs ã§å ¬éãããŠããããããã ã¹ãã¬ãŒãž ã¢ã«ãŠã³ãã®ãã¡ã€ã¢ ãŠã©ãŒã«ã§èš±å¯ãã圢 Configure Azure storage firewalls to allow access from serverless SQL warehouses https://learn.microsoft.com/ja-jp/azure/databricks/sql/admin/serverless-firewall NW ã»ãã¥ãªãã£ãå¿ é ã®çµç¹ã§ããµãŒããŒã¬ã¹ãå®å¿ããŠå©çšã§ãã
Databricks SQL ãµãŒããŒã¬ã¹ã®å©çšéå§ã«ããã£ãŠã®èæ ®ç¹ 2023 幎 1 æçŸåšã®æ å ± ï ãããªã㯠ãã¬ãã¥ãŒã®ã¹ããŒã¿ã¹
ï å©çšå¯èœãªãªãŒãžã§ã³ã¯ç±³åœæ±éšãç±³åœæ±éš 2ã西ãšãŒããã ï å©çšéå§ã«ããã£ãŠç³è«ãå¿ èŠ (ç³è«ãã©ãŒã ã«ã€ããŠãä»¥äž URL ã«èšèŒãã) ææ°ã®æ å ±ã«ã€ããŠã¯ä»¥äž URL ãåç § https://learn.microsoft.com/ja-jp/azure/databricks/serverless-compute/
Unity Catalog: ã¡ã¿ããŒã¿ãšãŠãŒã¶ãŒã®äžå 管ç ããŒã¿ ãšã¹ããŒãã®çµ±äžãã¥ãŒãäœæ Unity Catalog ãªã Databricks Workspace
2 Databricks Workspace 1 User Management Metastore Clusters SQL Warehouses User Management Metastore Clusters SQL Warehouses Access Controls Access Controls ã¯ãŒã¯ã¹ããŒã¹åäœã§ ãŠãŒã¶ãŒãšã¡ã¿ããŒã¿ã管ç Databricks Workspace Databricks Workspace Unity Catalog User Management Metastore Clusters SQL Warehouses Clusters SQL Warehouses Access Controls Unity Catalog ãã ãŠãŒã¶ãŒãšã¡ã¿ããŒã¿ç®¡çã Unity Catalog ã«äžå å
Unity Catalog ã«ããããŒã¿ ã¬ããã³ã¹ 以äžã® 4 ã€ã®åéãã«ã㌠åé æŠèŠ ããŒã¿
ã¢ã¯ã»ã¹ ã³ã³ãããŒã« 誰ãã©ã®ããŒã¿ã«ã¢ã¯ã»ã¹ããããã³ã³ãããŒã« ããŒã¿ã®ãªããŒãž äžæµã®ããŒã¿ãœãŒã¹ãšäžæµããŒã¿ãœãŒã¹ã远跡ãã ããŒã¿ ãã£ã¹ã«ããªãŒ è³ç£ãæ€çŽ¢ããŠçºèŠããæ©èœ ããŒã¿ ã¢ã¯ã»ã¹ç£æ» ããŒã¿ãžã®ãã¹ãŠã®ã¢ã¯ã»ã¹ãææããèšé²ãã
3 éå±€ã®ããŒã ã¹ããŒã¹ æ¢åã®ã¡ã¿ã¹ãã¢ãžã®ã·ãŒã ã¬ã¹ãªã¢ã¯ã»ã¹ 41 SELECT * FROM main.paul.red_wine; --
<catalog>.<database>.<table> SELECT * FROM hive_metastore.default.customers; Unity Catalog Catalog 2 Catalog 1 Database 2 Database 1 External Table Views External Tables Managed Tables hive_metastore (legacy) default (database) customers (table)
ããŒã¿ ã¢ã¯ã»ã¹ ã³ã³ãããŒã«ã®äžå 管ç ã¢ã¯ã»ã¹èš±å¯ã®æ¿è«Ÿ (Grant) ãšç®¡çã Unity Catalog ã§äžå çã«è¡ãã 42
GRANT <privilege> ON <securable_type> <securable_name> TO `<principal>` GRANT SELECT ON iot.events TO engineers Choose permission level Sync groups from your identity provider âTableâ= collection of files in ADLS/S3 Using ANSI SQL DCL Using UI
⢠Azure Databricks ã®äŸ¡æ Œ ⢠ãµãŒããŒã¬ã¹ä»¥å€ (Classic ãš Pro) â¢
ãµãŒããŒã¬ã¹ ⢠ã³ã¹ãåæž Databricks SQL ã®äŸ¡æ Œã®èãæ¹
Azure Databricks ã®æ§æ â« ãœãããŠã§ã¢ ã¬ã€ã€ãŒ : Databricks Unit (DBU)
åäœã«ããæé課é â« ã€ã³ãã© ã¬ã€ã€ãŒ : Azure ãªãœãŒã¹èª²é
Azure VM ã³ã¹ã â« ãµãŒããŒã¬ã¹ 以å€ã®ã€ã³ãã©èª²é㯠9 å²ã VM 課é
â« VM 課éã¯äœ¿çšéé¢ä¿ãªããå©çšæéã§ç®åº èšç®åŒ : VM ã³ã¹ã = VM å䟡 à ã€ã³ã¹ã¿ã³ã¹æ° à 皌åæé æ³šïŒ â¢ ã€ã³ãã© ãªãœãŒã¹ : VM ããããŒãžã ãã£ã¹ã¯ã Blob Storage ããããªã㯠IP ã¢ãã¬ã¹ãªã© ⢠ãµãŒããŒã¬ã¹ã®ã³ã³ãã¥ãŒãã£ã³ã°ã¯ Azure ã§ã¯ãªã Databricks ãµãã¹ã¯ãªãã·ã§ã³ã«ååšãã ã€ã³ãã© ã¬ã€ã€ãŒã® VM 課éã¯çºçããããœãããŠã§ã¢ ã¬ã€ã€ãŒèª²éã«å å«
Azure Databricks ã®èª²é 課éé¡å šäœ ã¿ã€ã ⢠VM ⢠ãããŒãžã ãã£ã¹ã¯
⢠Blob Storage ⢠ãããªã㯠IP ã¢ãã¬ã¹ ãœãããŠã§ã¢ ã¬ã€ã€ãŒ : Databricks ã€ã³ãã© ã¬ã€ã€ãŒ : Azure Data Science & Engineering SQL ⢠Classic ⢠Pro ⢠ãµãŒããŒã¬ã¹
DBU ã³ã¹ã â« ãµãŒããŒã¬ã¹ä»¥å€ã® DBU ã³ã¹ãã¯äœ¿çšéé¢ä¿ãªããçµéæéã§ç®åº â« ã¯ã©ã¹ã¿ãŒç«ã¡äžãåŸããŒã¿ããã§ããçµéæéã§è²»çšãçºç èšç®åŒïŒ DBU
ã³ã¹ã ïŒ ã¯ã©ã¹ã¿ãŒã® DBU æ° Ã ã¯ã©ã¹ã¿ãŒæ° à DBU ã®æé à 皌åæé æ³šïŒ â¢ ã¯ã©ã¹ã¿ãŒã® DBU æ°ã¯ã¯ã©ã¹ã¿ãŒ ãµã€ãºã«ãã ⢠ã¯ã©ã¹ã¿ãŒæ°ã¯ã¹ã±ãŒãªã³ã°ã§æå€§æå°ãæå® ⢠DBU ã®æéã¯ãªãŒãžã§ã³ãã¯ãŒã¯ããŒããäŸ¡æ Œã¬ãã«ã«ãã ⢠Classic ãš Pro ã®äŸ¡æ Œã¬ãã«ã¯ Premium ã®ã¿
Databricks SQL ã®ã¯ã©ã¹ã¿ãŒ
DBU ã®æé (æ±æ¥æ¬ãªãŒãžã§ã³) https://azure.microsoft.com/ja-jp/pricing/details/databricks/ (2023 幎 1 æ 10 æ¥æç¹ã®æ å ±)
(Type: Classic ) (Type: Pro)
DBU ã³ã¹ã (èšç®äŸ) èšç®åŒ : (æ±æ¥æ¬ã®å Žå) DBU ã³ã¹ã ïŒ ã¯ã©ã¹ã¿ãŒã®
DBU æ° Ã ã¯ã©ã¹ã¿ãŒæ° à DBU ã®æé à 皌åæé 24 1 0.22 5.28 $/hour =
DBU ã³ã¹ã â« ãµãŒããŒã¬ã¹ã® DBU ã³ã¹ãã¯äœ¿çšéã§ç®åº â« ã¯ã©ã¹ã¿ãŒã¯ã¯ãšãªåŸ 10 ç§ä»¥å ã«èµ·å
â« ããã©ã«ãã§æçµã¯ãšãª 10 åçµéã§ã¯ã©ã¹ã¿ãŒé€å» èšç®åŒ : DBU ã³ã¹ã ïŒ ã¯ã©ã¹ã¿ãŒã® DBU æ° Ã ã¯ã©ã¹ã¿ãŒæ° à DBU ã®æé à 皌åæé 皌åæé ïŒã¯ãšãªæé + ã¢ã€ãã«æé æ³šïŒ ⢠ã¯ã©ã¹ã¿ãŒã® DBU æ°ã¯å šã¯ãŒã¯ããŒãå ±é ⢠ãµãŒããŒã¬ã¹å¯Ÿå¿ (ãããªã㯠ãã¬ãã¥ãŒ) ãªãŒãžã§ã³ : ç±³åœæ±éš (eastus)ãç±³åœæ±éš 2 (eastus2)ã西ãšãŒããã (westeurope) (2023 幎 1 æ 10 æ¥æç¹) ⢠DBU ã®æéã¯ã¯ãŒã¯ããŒããšäŸ¡æ Œã¬ãã«ã«ãã â¢ äŸ¡æ Œã¬ãã«ã¯ã¯ãŒã¯ã¹ããŒã¹äœæææå® ⢠ãµãŒããŒã¬ã¹ã®äŸ¡æ Œã¬ãã«ã¯ Premium ã®ã¿
DBU ã®æé (æ±æ¥æ¬ãªãŒãžã§ã³) æ³šïŒ â¢ https://azure.microsoft.com/ja-jp/pricing/details/databricks/ (2023 幎 1 æ
10 æ¥æç¹ã®æ å ±) â¢ æ±æ¥æ¬ã»è¥¿æ¥æ¬ãªãŒãžã§ã³ã§ã¯ãµãŒããŒã¬ã¹ã¯æªå¯Ÿå¿ (Type: Classic) (Type: Pro)
æ¹æ³ 1 : DBU ã®äºåè³Œå ¥ â« DBU ã®æéã¯äºåè³Œå ¥ã§ã³ã¹ãåæžå¯èœ â« 1
幎ã 3 幎ã®äºåè³Œå ¥ã§æå€§ 37% ç¯çŽ DBU ã³ã¹ãã®èšç®åŒ : DBU ã³ã¹ã ïŒ ã¯ã©ã¹ã¿ãŒã® DBU æ° Ã ã¯ã©ã¹ã¿ãŒæ° à DBU ã®æé à 皌åæé
äºåè³Œå ¥ãã©ã³ https://azure.microsoft.com/ja-jp/pricing/details/databricks/ (2023 幎 1 æ 10 æ¥æç¹æ å ±) 1 幎éãã©ã³
3 幎éãã©ã³
æ¹æ³ 2 : èªåã¹ã±ãŒã« (ãµãŒããŒã¬ã¹ä»¥å€ã®å Žå) â« ã¹ã±ãŒãªã³ã°èšå®ã§ã¯ã©ã¹ã¿ãŒã®æå€§ã»æå°ãæå® â« è² è·ãäœãç¶æ ãç¶ããšã¯ã©ã¹ã¿ãŒãèªååæž â«
ç¡é§ãªãªãœãŒã¹ãæžããã³ã¹ãåæžãå¯èœ ãµãŒããŒã¬ã¹ä»¥å€ã® DBU ã³ã¹ãã®èšç®åŒ : DBU ã³ã¹ã ïŒ ã¯ã©ã¹ã¿ãŒã® DBU æ° Ã ã¯ã©ã¹ã¿ãŒæ° à DBU ã®æé à 皌åæé
æ¹æ³ 3 : ã¹ããã ã€ã³ã¹ã¿ã³ã¹ ããªã·ãŒ (ãµãŒãã¬ã¹ä»¥å€ã®å Žå) ã» ã³ã¹ãæé©å (æ¢å®)
ï ã¯ãŒã«ãŒ ããŒããã¹ããã VM ã§èµ·å ( ãã©ã€ããŒã¯ãªã³ããã³ã ) ï ã¹ããã VM ã¯ãªã³ããã³ãã® 80%+ ã®ã³ã¹ãç¯çŽ ï èæ ®äºé ï èµ·åæã«ãªãŒãžã§ã³ã«äœå°ã® VM ããªãå Žåããªã³ããã³ãã§èµ·å ï èµ·åããŠããéã«ãªãŒãžã§ã³ã«äœå°ã® VM ããªããªã£ãå Žåããªã³ããã³ãã§åèµ·å ã» ä¿¡é Œæ§æé©å ï ãã©ã€ããŒãã¯ãŒã«ãŒããªã³ããã³ãã§èµ·å ï ã¯ã©ã¹ã¿ãŒã確å®ã«èµ·åãããå Žåãã VM ã®äºçŽå®¹éé©çšã®éã¯ãã¡ããéžæ
Azure Databricks ã®èª²é (åæ²) 課éé¡å šäœ Type ⢠VM ⢠ãããŒãžã
ãã£ã¹ã¯ ⢠Blob Storage ⢠ãããªã㯠IP ã¢ãã¬ã¹ ãœãããŠã§ã¢ ã¬ã€ã€ãŒ : Databricks ã€ã³ãã© ã¬ã€ã€ãŒ : Azure Data Science & Engineering SQL ⢠Classic ⢠Pro ãµãŒããŒã¬ã¹
Databricks SQL ãš Azure ãµãŒãã¹ãçµã¿åããã ããŒã³ãŒã ETL ãšããŒã¿åæ ã¢ãŒããã¯ãã£ãŒçŽ¹ä» &
ãã¢
Azure Databricks ãäžå¿ãšããããŒã¿ ãã€ãã©ã€ã³ Azure Data Lake Storage Gen2 Azure
Databricks ãŽãŒã«ã ããã³ãº ã·ã«ã㌠ããŒããã㯠Azure Databricks ããã°ã©ãã³ã°ãçšããããŒã¿å å·¥ 1 Databricks ã§æºãããããŒãº SQL ãçšããããŒã¿åæã»å¯èŠå 2 ã¯ã©ãŠã ã¹ãã¬ãŒãžãžã®çããŒã¿æ ŒçŽ 1 ãã£ã¬ã³ãž ã㌠ã³ãŒãã®ããŒã¿å å·¥ã®ããŒãºãžã®å¯Ÿå¿ 2 ã»ã«ã ãµãŒãã¹ BI ã®ããŒãºãžã®å¯Ÿå¿ 3 Python, JAR etc. Databricks SQL ããŒã¿å å·¥ ããŒã¿åæ ãªã³ãã¬ãã¹ ã¯ã©ãŠã ããŒã¿ ïŒ çããŒã¿ 1 ïŒ 2 ïŒ 3
Databricks + Azure ãµãŒãã¹ãçµã¿åãããããŒã¿ ãã€ãã©ã€ã³ Azure Data Lake Storage Gen2
Azure Databricks ãŽãŒã«ã ããã³ãº ã·ã«ã㌠ããŒããã㯠Azure Databricks Python, JAR etc. Databricks SQL ããŒã¿å å·¥ ããŒã¿åæ Azure Data Factory Synapse ãã€ãã©ã€ã³ ãããã³ã° ããŒã¿ ãã㌠Power BI PBI Desktop PBI Services ãªã³ãã¬ãã¹ ã¯ã©ãŠã ããŒã¿ ã³ã㌠ã¢ã¯ãã£ãã㣠çããŒã¿ Azure Data Factory Synapse ãã€ãã©ã€ã³ ããã°ã©ãã³ã°ãçšããããŒã¿å å·¥ 1 Databricks ã§æºãããããŒãº SQL ãçšããããŒã¿åæã»å¯èŠå 2 ã¯ã©ãŠã ã¹ãã¬ãŒãžãžã®çããŒã¿æ ŒçŽ 1 ãã£ã¬ã³ãž ã㌠ã³ãŒãã®ããŒã¿å å·¥ã®ããŒãºãžã®å¯Ÿå¿ 2 ã»ã«ã ãµãŒãã¹ BI ã®ããŒãºãžã®å¯Ÿå¿ 3 å©çšãã Azure ãµãŒãã¹ Azure Data Factory / Azure Synapse Analytics ã® ãã€ãã©ã€ã³æ©èœ Microsoft Power BI
Databricks + Azure ãµãŒãã¹ãçµã¿åãããããŒã¿ ãã€ãã©ã€ã³ Azure Data Lake Storage Gen2
Azure Databricks ãŽãŒã«ã ããã³ãº ã·ã«ã㌠ããŒããã㯠Azure Databricks Python, JAR etc. Databricks SQL ããŒã¿å å·¥ ããŒã¿åæ Azure Data Factory Synapse ãã€ãã©ã€ã³ ãããã³ã° ããŒã¿ ãã㌠Power BI PBI Desktop PBI Services ãªã³ãã¬ãã¹ ã¯ã©ãŠã ããŒã¿ ã³ã㌠ã¢ã¯ãã£ãã㣠çããŒã¿ Azure Data Factory Synapse ãã€ãã©ã€ã³ Azure Databricks ã§ å°éã®ããŒã¿ ãšã³ãžã㢠ã®ããŒãºãå è¶³ã§ãã Azure ãµãŒãã¹ãçµã¿åãããããšã§ åžæ°ããŒã¿ ãšã³ãžã㢠ã®ããŒãºãå è¶³ã§ãã
Azure Data Factory / Synapse ãã€ãã©ã€ã³ ããŒã¿ã®ã³ããŒã ETL åŠçãã¹ã±ãžã¥ãŒã«ãã€ãã³ã ããŒã¹ã®ããªã¬ãŒã§å®è¡ãã
PaaS åãµãŒã㹠⢠100 以äžã®çµã¿èŸŒã¿ã®ã³ãã¯ã¿ãŒãAzure å å€ãåããå¯Ÿå¿ â¢ ãªã³ãã¬ãã¹ãå€éšã¯ã©ãŠããšã®ãã€ããªããæ¥ç¶ã«å¯Ÿå¿ ⢠ãã€ãã©ã€ã³ã®ãªãŒã±ã¹ãã¬ãŒã·ã§ã³: ã³ããŒãããŒã¿ ãããŒã åçš® Azure ãµãŒãã¹ã®å®è¡ãåå²ãç¹°ãè¿ããªã©ã®å¶åŸ¡ â¢ çŽæçãªå®è¡ã¢ãã¿ãªã³ã°
ãããã³ã° ããŒã¿ ãããŒ â¢ å€æ (Transformation) ãšåŒã°ããããŒããçµã¿åãããŠããŒã³ãŒãã§ ETL åŠçãäœæ â¢
Spark ã¯ã©ã¹ã¿ãŒäžã§ åŠçãå®è¡ãããããå€§èŠæš¡ãªããŒã¿ã«å¯Ÿå¿å¯èœ åå è±å 説æ éçŽ Aggregate æ¢åã®åãŸãã¯èšç®åã«ãã£ãŠã°ã«ãŒãåããããSUMã MINãMAXãCOUNT ãªã©ã®ããŸããŸãªçš®é¡ã®éèšã å®çŸ©ã§ããŸãã æŽŸçå Derived column ããŒã¿ ãããŒã®èšèªã䜿çšããŠãæ°ããåãçæãããã æ¢åã®ãã£ãŒã«ãã倿ŽããŸãã ãã©ããå Flatten JSON ãªã©ã®éå±€æ§é äœã®äžã§é åå€ãååŸãããã ããåã ã®è¡ã«å±éããŸãã çµå Join 2 ã€ã®ãœãŒã¹ãŸãã¯ã¹ããªãŒã ã®ããŒã¿ãçµåããŸãã éžæ Select å¥ååãšã¹ããªãŒã åãåã®ãããããŸãã¯äžŠã¹æ¿ã ã·ã³ã¯ Sink ã䜿ãã®ããŒã¿ã®æçµåä¿¡å ãœãŒã¹ Source ããŒã¿ ãããŒã®ããŒã¿ ãœãŒã¹ åéå Union è€æ°ã®ããŒã¿ ã¹ããªãŒã ãåçŽæ¹åã«çµåãã (代衚çãªå€æã®äŸ) (倿ã®äžèЧ) https://docs.microsoft.com/ja-jp/azure/data-factory/data-flow-transformation-overview
Power BI ãã©ãããã©ãŒã ããŒã¿ã®ååŸã»å å·¥ã»å¯èŠåãŸã§ äžè²«ãããã¹ã¯ãããããŒã« Power BI ã³ã³ãã³ãã å ±æã»ç®¡çããåºç€ç°å¢ Power
BI ã³ã³ãã³ãã Web / ã¢ãã€ã«ããã€ã¹ã§åç §/åæ Power BI ãµãŒãã¹ Power BI Desktop ãã©ãŠã¶ Power BI ã¢ãã€ã«
ã㢠ã·ããªãª CSV 泚æ CSV 泚ææçް Zip å°å£² ããŒã¿ã»ãã Delta
Lake 泚æ Delta Lake 泚ææçް Delta Lake 泚æãµããªãŒ Azure Data Lake Storage Gen2 Azure Databricks Power BI Desktop Databricks SQL ãŠã§ã¢ããŠã¹ ã¬ããŒã Synapse ãã€ãã©ã€ã³ ã³ã㌠& Zip å±é CSV ãã Delta Lake ã«å€æ 泚æãæ¥å¥ã« éèš ããŒã¿ ãã㌠#2 ããŒã¿ ãã㌠#1 ã³ã㌠ã¢ã¯ãã£ãã㣠GitHub (å¿å HTTPS ã¢ã¯ã»ã¹ãå¯èœãª ãããªã㯠ãªããžããªã«æ ŒçŽ) Synapse ãã€ãã©ã€ã³ã®ã³ã㌠ã¢ã¯ãã£ãã㣠1 Synapse ãã€ãã©ã€ã³ã®ãããã³ã° ããŒã¿ ãã㌠2 Databricks SQL ãŠã§ã¢ããŠã¹ã«ããåæ 3 Power BI Desktop ã®ã¬ããŒãã«ããåæ 4
ã㢠ã·ããªãª Synapse ãã€ãã©ã€ã³ã®ã³ã㌠ã¢ã¯ãã£ãã㣠1 Synapse ãã€ãã©ã€ã³ã®ãããã³ã° ããŒã¿ ãããŒ
2 Databricks SQL ãŠã§ã¢ããŠã¹ã«ããåæ 3 Power BI Desktop ã®ã¬ããŒãã«ããåæ 4 CSV 泚æ CSV 泚ææçް Zip å°å£² ããŒã¿ã»ãã Delta Lake 泚æ Delta Lake 泚ææçް Delta Lake 泚æãµããªãŒ Azure Data Lake Storage Gen2 Azure Databricks Power BI Desktop Databricks SQL ãŠã§ã¢ããŠã¹ ã¬ããŒã Synapse ãã€ãã©ã€ã³ ã³ã㌠& Zip å±é CSV ãã Delta Lake ã«å€æ 泚æãæ¥å¥ã« éèš ããŒã¿ ãã㌠#2 ããŒã¿ ãã㌠#1 ã³ã㌠ã¢ã¯ãã£ãã㣠GitHub (å¿å HTTPS ã¢ã¯ã»ã¹ãå¯èœãª ãããªã㯠ãªããžããªã«æ ŒçŽ)
ã㢠ã·ããªãª Synapse ãã€ãã©ã€ã³ã®ã³ã㌠ã¢ã¯ãã£ãã㣠1 Synapse ãã€ãã©ã€ã³ã®ãããã³ã° ããŒã¿ ãããŒ
2 Databricks SQL ãŠã§ã¢ããŠã¹ã«ããåæ 3 Power BI Desktop ã®ã¬ããŒãã«ããåæ 4 CSV 泚æ CSV 泚ææçް Zip å°å£² ããŒã¿ã»ãã Delta Lake 泚æ Delta Lake 泚ææçް Delta Lake 泚æãµããªãŒ Azure Data Lake Storage Gen2 Azure Databricks Power BI Desktop Databricks SQL ãŠã§ã¢ããŠã¹ ã¬ããŒã Synapse ãã€ãã©ã€ã³ ã³ã㌠& Zip å±é CSV ãã Delta Lake ã«å€æ 泚æãæ¥å¥ã« éèš ããŒã¿ ãã㌠#2 ããŒã¿ ãã㌠#1 ã³ã㌠ã¢ã¯ãã£ãã㣠GitHub (å¿å HTTPS ã¢ã¯ã»ã¹ãå¯èœãª ãããªã㯠ãªããžããªã«æ ŒçŽ)
ã㢠ã·ããªãª Synapse ãã€ãã©ã€ã³ã®ã³ã㌠ã¢ã¯ãã£ãã㣠1 Synapse ãã€ãã©ã€ã³ã®ãããã³ã° ããŒã¿ ãããŒ
2 Databricks SQL ãŠã§ã¢ããŠã¹ã«ããåæ 3 Power BI Desktop ã®ã¬ããŒãã«ããåæ 4 CSV 泚æ CSV 泚ææçް Zip å°å£² ããŒã¿ã»ãã Delta Lake 泚æ Delta Lake 泚ææçް Delta Lake 泚æãµããªãŒ Azure Data Lake Storage Gen2 Azure Databricks Power BI Desktop Databricks SQL ãŠã§ã¢ããŠã¹ ã¬ããŒã Synapse ãã€ãã©ã€ã³ ã³ã㌠& Zip å±é CSV ãã Delta Lake ã«å€æ 泚æãæ¥å¥ã« éèš ããŒã¿ ãã㌠#2 ããŒã¿ ãã㌠#1 ã³ã㌠ã¢ã¯ãã£ãã㣠GitHub (å¿å HTTPS ã¢ã¯ã»ã¹ãå¯èœãª ãããªã㯠ãªããžããªã«æ ŒçŽ)
Databricks SQL on Azure ã¯ãŒã¯ã·ã§ãã https://microsoft.github.io/azure-databricks-sql-workshop-ja/
Thank You ãå¿ãããšããæåŸãŸã§ãèŠèŽããã ãã èª ã«æé£ãããããŸããã æ¬ãŠã§ãããŒãžã®ãææ³ãæ¯éãèãããã ããã ãã¡ãã® QR ã³ãŒãããã¢ã³ã±ãŒãã«ã¢ã¯ã»ã¹ã§ããŸãã æ¬¡å以éã®ãŠã§ãããŒå 容æ¹åã®åèãšãããŠããã ãããã ãååã®ã»ã©ãããããé¡ãããããŸãã
© Copyright Microsoft Corporation. All rights reserved.