Slide 1

Slide 1 text

How Presto is used by CDP Yuichiro Kaneko(@yui-knk) 2019/06/04 Repro Tech: ࣮ફɾฒྻ෼ࢄॲཧج൫

Slide 2

Slide 2 text

ࣗݾ঺հ • Yuichiro Kaneko • Arm Treasure Data • CDP team (Writing Rails application) • CRuby Committer 2015/12~ • GitHub (yui-knk)

Slide 3

Slide 3 text

https://jobs.lever.co/treasure-data/ 8FBSFIJSJOH

Slide 4

Slide 4 text

1SFTUP$POGFSFODF5PLZP https://techplay.jp/event/733772

Slide 5

Slide 5 text

1SFTUP$POGFSFODF5PLZP • 2019೥7݄11೔(໦): 13:30- • PrestoΫϦΤʔλʔͰ͋ΓPresto Software Foundationͷ૑࢝ऀͰ΋͋ ΔMartin Traverso, Dain Sundstrom, David Philipsͷ3໊Λট͍ͯ… https://techplay.jp/event/733772

Slide 6

Slide 6 text

σʔλࢦ޲ΞϓϦέʔγϣϯσβΠϯ https://twitter.com/taroleo/status/1131217986000285696

Slide 7

Slide 7 text

5PEBZTUPQJD • ͬ͘͟ΓͱTDͷઆ໌ • ͪΐͬͱৄ͘͠CDPͷઆ໌ (Ϣʔεέʔεฤ) • ͪΐͬͱৄ͘͠CDPͷઆ໌ (ٕज़ฤ)

Slide 8

Slide 8 text

TDͷશମ૾

Slide 9

Slide 9 text

Plazma DB SDKs Hive Presto

Slide 10

Slide 10 text

1MB[NB%# • ࣮σʔλ + ϝλσʔλ • σϑΥϧτͰ͸time columnͰύʔςΟγϣϯ͞Ε͍ͯΔ • https://www.slideshare.net/treasure-data/td-techplazma • https://qiita.com/xerial/items/959d2b928353b595f31e • User Defined Partitioning (UDP)Λࢦఆ͢Δ͜ͱ΋Մೳ • https://support.treasuredata.com/hc/en-us/articles/360009798714-User-Defined- Partitioning-for-Presto

Slide 11

Slide 11 text

*OQVU%BUB • FluentdʹΑΔετϦʔϛϯάΞοϓϩʔυ • EmbulkʹΑΔόϧΫΞοϓϩʔυ • SDKʹΑΔετϦʔϛϯάΞοϓϩʔυ (td-js-sdkͳͲ) • UI͔ΒͷόϧΫΞοϓϩʔυ

Slide 12

Slide 12 text

2VFSZ&OHJOF • Query Engineͱͯ͠HiveͱPresto͕ଘࡏ͍ͯ͠Δ

Slide 13

Slide 13 text

Plazma DB SDKs Hive Presto

Slide 14

Slide 14 text

TDͷCDPͱ͸ (Ϣʔεέʔεฤ)

Slide 15

Slide 15 text

https://www.treasuredata.co.jp/learn/

Slide 16

Slide 16 text

σʔλͷ੔ཧ • ސ٬ʹؔ͢Δ༷ʑͳσʔλΛอ༗͍ͯ͠Δ • ސ٬Ϛελʔσʔλ • ଐੑσʔλ • ߦಈϩά ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά

Slide 17

Slide 17 text

σʔλͷ੔ཧ • ސ٬σʔλʹ͸ؔ܎͕ଘࡏ͍ͯ͠Δ ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά

Slide 18

Slide 18 text

ηάϝϯτͷఆٛ • ଐੑ΍ߦಈΛ΋ͱʹηάϝϯτΛఆٛ͢Δ ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά ηάϝϯτ உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ

Slide 19

Slide 19 text

ηάϝϯτΛ׆༻͢Δ • ηάϝϯτ৘ใΛ΋ͱʹࣗࣾͷweb contentsΛग़͠෼͚Δ • ଞͷϚʔέςΟϯάπʔϧͳͲʹ݁ՌΛग़ྗ ηάϝϯτ உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation

Slide 20

Slide 20 text

TDͷCDPͱ͸ (ٕज़ฤ)

Slide 21

Slide 21 text

ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά customers table behaviors tables ηάϝϯτ உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation

Slide 22

Slide 22 text

ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά customers table behaviors tables ηάϝϯτ உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation

Slide 23

Slide 23 text

$%1Ͱ࢖͏ςʔϒϧͷੜ੒ • workflow(digdag)Λ࢖ͬͯCDPͰ࢖͏֤छςʔϒϧΛੜ੒͢Δ • ্ྲྀσʔλΛऔΓࠐΉλΠϛϯάʹ΋ґଘ͢Δ͕ɺྫ͑͹1೔1ճ workflowΛ࣮ߦ͢Δ • جຊతʹ͸PrestoͰ࣮ߦɻσʔλαΠζ͕େ͖͍έʔε΍Ϣʔβʔଆ ͷઃఆʹΑͬͯ͸HiveͰ࣮ߦ͢Δ͜ͱ΋͋Δ • workflow͸erb + ࣮ߦ࣌ʹparameterΛserver͔Βऔಘ࣮ͯ͠ߦ

Slide 24

Slide 24 text

ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά customers table behaviors tables ηάϝϯτ உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation

Slide 25

Slide 25 text

ηάϝϯτΛઃఆ͢Δ ଐੑͷ৚݅ logͷ৚݅ ΞυϗοΫʹΫΤϦΛ࣮ߦ

Slide 26

Slide 26 text

+40/͔ΒΫΤϦΛੜ੒ { "conditions": [ { "array_matching": null, "left_value": { "name": "country" }, "operator": { "not": false, "right_value": "japan", "type": "Equal" }, "type": "Value" }, { "array_matching": null, "left_value": { "aggregation": { "grouping_columns": [], "type": "Count" }, "filter": { "conditions": [ { "array_matching": null, "column": “city”,… select a."cdp_customer_id" from "cdp_audience_21821"."customers" a where a."country" = 'japan' and a."cdp_customer_id" in ( select _r1."cdp_customer_id" from "cdp_audience_21821"."behavior_behavior_1" _r1 where _r1."city" = 'Mexico City9' ) -- set session distributed_join = 'true'

Slide 27

Slide 27 text

ηάϝϯτΫΤϦͷಛ௃ • ΞυϗοΫʹ࣮ߦ͞ΕΔ͜ͱ͕͋Δ • ΞυϗοΫ࣮ߦͨ͠ͱ͖ͷମݧΛଛͳ͍ͨ͘ͳ͍ • ݱঢ়͸͢΂ͯPrestoͰରԠ • ͱ͸͍͑৭ʑͳ໰୊͕… • ϝϞϦ • େྔͷtable scan • Timeout

Slide 28

Slide 28 text

αϯϓϧσʔλ • customers: 10,000,000Ϩίʔυ • behavior_1: 49,006,085Ϩίʔυ +-----------------+-------+---------+--------+ | cdp_customer_id | email | country | time | +-----------------+-------+---------+--------+ | 479c0e0f-c2ed.. | a@... | japan | 583200 | | 7ba0998c-774f.. | b@... | japan | 583201 | | 65bd4227-000e.. | c@... | usa | 583202 | +-----------------+-------+---------+--------+ +-----------+-------+-----------------+--------+ | item_name | count | cdp_customer_id | time | +-----------+-------+-----------------+--------+ | hci | 3 | 479c0e0f-c2ed.. | 583200 | | eml | 9 | 7ba0998c-774f.. | 583201 | | llu | 10 | 65bd4227-000e.. | 583202 | +-----------+-------+-----------------+--------+

Slide 29

Slide 29 text

αϯϓϧηάϝϯτ

Slide 30

Slide 30 text

αϯϓϧηάϝϯτͷΫΤϦ select a."cdp_customer_id" from "kaneko_reprotalk_sample"."customers" a where ( select count(*) from "kaneko_reprotalk_sample"."behavior_1" _r1 where _r1."cdp_customer_id" = a."cdp_customer_id" ) >= 2 -- set session distributed_join = 'true'

Slide 31

Slide 31 text

https://www.slideshare.net/treasure-data/2014-1113prestodbtechshowcase P.48

Slide 32

Slide 32 text

https://www.slideshare.net/treasure-data/2014-1113prestodbtechshowcase P.49

Slide 33

Slide 33 text

ৄ͘͠͸ • “PrestoͰ࣮ݱ͢ΔΠϯλϥΫςΟϒΫΤϦ - dbtech showcase 2014 Tokyo” • https://www.slideshare.net/treasure-data/ 2014-1113prestodbtechshowcase • “Presto: SQL on Everything” • https://research.fb.com/publications/presto-sql-on-everything/

Slide 34

Slide 34 text

໰୊ ϝϞϦ select a."cdp_customer_id" from "kaneko_reprotalk_sample"."customers" a where ( select count(*) from "kaneko_reprotalk_sample"."behavior_1" _r1 where _r1."cdp_customer_id" = a."cdp_customer_id" ) >= 2 -- set session distributed_join = 'true' https://support.treasuredata.com/hc/en-us/articles/360001450908-Presto-Performance-Tuning

Slide 35

Slide 35 text

໰୊ ϝϞϦ • Without distributed_join • Duration: 1 min • Result count: 9,962,475 records, 1 column • Peak memory: 7.53GB • With distributed_join • Duration: 1 min • Result count: 9,962,475 records, 1 column • Peak memory: 1.81GB

Slide 36

Slide 36 text

EJTUSJCVUFE@KPJO • Distributed joins require redistributing both tables using a hash of the join key. This can be slower (sometimes substantially) than broadcast joins, but allows much larger joins. Broadcast joins require that the tables on the right side of the join after filtering fit in memory on each node, whereas distributed joins only need to fit in distributed memory across all nodes. This can also be specified on a per-query basis using the distributed_join session property. https://github.com/prestosql/presto/commit/506f45764040322d244ee6487b12e313b9a9d00f

Slide 37

Slide 37 text

8JUIPVUEJTUSJCVUFE@KPJO 11 tasks Stage 1 1 task Stage 0 18 tasks Stage 2 11 tasks Stage 3 1 task Stage 4

Slide 38

Slide 38 text

8JUIPVUEJTUSJCVUFE@KPJO Stage 3: In 49,006,085 / Out 48,652,188 Output partitioning: HASH - Aggregate(PARTIAL)[cdp_customer_id_0] - ScanProject[table behavior_1] Stage 2: In 48,652,188 / Out 9,997,841 Output partitioning: BROADCAST [] - Aggregate(FINAL)[cdp_customer_id_0] - RemoteSource[3]

Slide 39

Slide 39 text

8JUIPVUEJTUSJCVUFE@KPJO • 9,997,841 * 11 = 109,976,251 Stage 1: In 119,976,262 / Out 9,962,475 Output partitioning: SINGLE [] - FilterProject(>= BIGINT '2') - CrossJoin - LeftJoin - ScanProject[table customers] - LocalExchange - RemoteSource[2]: Output: 109,976,251

Slide 40

Slide 40 text

8JUIEJTUSJCVUFE@KPJO 18 tasks Stage 1 1 task Stage 0 11 tasks Stage 2 9 task Stage 3 1 task Stage 4

Slide 41

Slide 41 text

8JUIEJTUSJCVUFE@KPJO Stage 3: In 49,006,085 / Out 48,651,697 Output partitioning: HASH - Aggregate(PARTIAL)[cdp_customer_id_0] - ScanProject[table behavior_1] Stage 2: In 10,000,000 / Out 10,000,000 Output partitioning: HASH - ScanProject[table customers]

Slide 42

Slide 42 text

8JUIEJTUSJCVUFE@KPJO Stage 1: In 58,651,715 / Out 9,962,475 Output partitioning: SINGLE [] - FilterProject(>= BIGINT '2') - CrossJoin - LeftJoin - RemoteSource[2]: Output: 10,000,000 - Aggregate(FINAL)[cdp_customer_id_0] - RemoteSource[3]: Output: 48,651,697

Slide 43

Slide 43 text

6TFS%FpOFE1BSUJUJPOJOH • All tables in Treasure Data are partitioned based on the time column. … User-defined partitioning (UDP) provides hash partitioning for a table on one or more columns in addition to the time column. https://support.treasuredata.com/hc/en-us/articles/360009798714-User-Defined-Partitioning- for-Presto

Slide 44

Slide 44 text

6TFS%FpOFE1BSUJUJPOJOH • With distributed_join / Without UDP • Duration: 1 min • Result count: 9,962,475 records, 1 column • Peak memory: 1.81GB • With distributed_join / With UDP • Duration: 4 mins • Result count: 9,962,475 records, 1 column • Peak memory: 982.00MB

Slide 45

Slide 45 text

8JUI6%1 18 tasks Stage 1 1 task Stage 0 18 tasks Stage 2 1 task Stage 3

Slide 46

Slide 46 text

8JUI6%1 Stage 2: In 49,006,085 / Out 9,997,841 Output partitioning: td-presto:[cdp_customer_id]/512 [cdp_customer_id_0] - Aggregate[cdp_customer_id_0] - ScanProject[table behavior_1]

Slide 47

Slide 47 text

8JUI6%1 Stage 1: In 19,997,859 / Out 9,962,475 Output partitioning: SINGLE [] - FilterProject(>= BIGINT '2') - CrossJoin - LeftJoin - ScanProject[table customers_udp] - RemoteSource[2]: Output: 9,997,841

Slide 48

Slide 48 text

6TFS%FpOFE1BSUJUJPOJOH • cdp_customer_idʹΑΔbehavior_1ςʔϒϧͷAggregation͕1ͭͷ stageͰ׬݁͢Δ • ࠓ·ͰͷྫͰ͸Aggregate(PARTIAL)ͱAggregate(FINAL)͕ଘࡏ͍ͯ͠ ͨ

Slide 49

Slide 49 text

ෳࡶͳηάϝϯτఆٛ • ηάϝϯτΛఆٛ͢Δͱ͖ʹଞͷηάϝϯτΛࢦఆ͢Δ͜ͱ͕Ͱ͖ Δ

Slide 50

Slide 50 text

select a."cdp_customer_id" from "cdp_audience_21597"."customers" a where ( ( select count(*) from "cdp_audience_21597"."behavior_weblogs" _r1 where _r1."cdp_customer_id" = a."cdp_customer_id" ) > 2 ) or ( ( select count(*) from "cdp_audience_21597"."behavior_weblogs" _r2 where _r2."td_ip_city_latitude" = 0 and _r2."cdp_customer_id" = a."cdp_customer_id" ) > 3 )

Slide 51

Slide 51 text

ෳࡶͳηάϝϯτఆٛ • ۃ୺ʹෳࡶͳηάϝϯτఆ͕ٛ͋Δͱ… • େྔͷtable scan • Timeout

Slide 52

Slide 52 text

վળࡦ • “Materialized View”ͷಋೖ • customersςʔϒϧͷ࡞੒࣌ʹ͍͔ͭ͘ͷηάϝϯτΛબΜͰܭࢉΛ ͓ͯ͘͠ • Queryͷੜ੒࣌ʹ͸࡞੒ࡁΈͷ“view”Λࢀর͢ΔΑ͏ʹมߋ͢Δ • ಉ͡ςʔϒϧʹର͢ΔαϒΫΤϦΛmergeͨ͠QueryΛੜ੒͢Δ (ΞΠ σΞ)

Slide 53

Slide 53 text

Thank you !!