How Presto is used by CDP

How Presto is used by CDP

B3ba3ccedfbf4d605f00bafd1a732529?s=128

yui-knk

June 04, 2019
Tweet

Transcript

  1. How Presto is used by CDP Yuichiro Kaneko(@yui-knk) 2019/06/04 Repro

    Tech: ࣮ફɾฒྻ෼ࢄॲཧج൫
  2. ࣗݾ঺հ • Yuichiro Kaneko • Arm Treasure Data • CDP

    team (Writing Rails application) • CRuby Committer 2015/12~ • GitHub (yui-knk)
  3. https://jobs.lever.co/treasure-data/ 8FBSFIJSJOH

  4. 1SFTUP$POGFSFODF5PLZP https://techplay.jp/event/733772

  5. 1SFTUP$POGFSFODF5PLZP • 2019೥7݄11೔(໦): 13:30- • PrestoΫϦΤʔλʔͰ͋ΓPresto Software Foundationͷ૑࢝ऀͰ΋͋ ΔMartin Traverso,

    Dain Sundstrom, David Philipsͷ3໊Λট͍ͯ… https://techplay.jp/event/733772
  6. σʔλࢦ޲ΞϓϦέʔγϣϯσβΠϯ https://twitter.com/taroleo/status/1131217986000285696

  7. 5PEBZTUPQJD • ͬ͘͟ΓͱTDͷઆ໌ • ͪΐͬͱৄ͘͠CDPͷઆ໌ (Ϣʔεέʔεฤ) • ͪΐͬͱৄ͘͠CDPͷઆ໌ (ٕज़ฤ)

  8. TDͷશମ૾

  9. Plazma DB SDKs Hive Presto

  10. 1MB[NB%# • ࣮σʔλ + ϝλσʔλ • σϑΥϧτͰ͸time columnͰύʔςΟγϣϯ͞Ε͍ͯΔ • https://www.slideshare.net/treasure-data/td-techplazma

    • https://qiita.com/xerial/items/959d2b928353b595f31e • User Defined Partitioning (UDP)Λࢦఆ͢Δ͜ͱ΋Մೳ • https://support.treasuredata.com/hc/en-us/articles/360009798714-User-Defined- Partitioning-for-Presto
  11. *OQVU%BUB • FluentdʹΑΔετϦʔϛϯάΞοϓϩʔυ • EmbulkʹΑΔόϧΫΞοϓϩʔυ • SDKʹΑΔετϦʔϛϯάΞοϓϩʔυ (td-js-sdkͳͲ) • UI͔ΒͷόϧΫΞοϓϩʔυ

  12. 2VFSZ&OHJOF • Query Engineͱͯ͠HiveͱPresto͕ଘࡏ͍ͯ͠Δ

  13. Plazma DB SDKs Hive Presto

  14. TDͷCDPͱ͸ (Ϣʔεέʔεฤ)

  15. https://www.treasuredata.co.jp/learn/

  16. σʔλͷ੔ཧ • ސ٬ʹؔ͢Δ༷ʑͳσʔλΛอ༗͍ͯ͠Δ • ސ٬Ϛελʔσʔλ • ଐੑσʔλ • ߦಈϩά ސ٬Ϛελʔ

    ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά
  17. σʔλͷ੔ཧ • ސ٬σʔλʹ͸ؔ܎͕ଘࡏ͍ͯ͠Δ ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά

  18. ηάϝϯτͷఆٛ • ଐੑ΍ߦಈΛ΋ͱʹηάϝϯτΛఆٛ͢Δ ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά ηάϝϯτ உੑ

    ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ
  19. ηάϝϯτΛ׆༻͢Δ • ηάϝϯτ৘ใΛ΋ͱʹࣗࣾͷweb contentsΛग़͠෼͚Δ • ଞͷϚʔέςΟϯάπʔϧͳͲʹ݁ՌΛग़ྗ ηάϝϯτ உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ

    ࠷ۙαΠτΛ๚Εͨ Profiles API Activation
  20. TDͷCDPͱ͸ (ٕज़ฤ)

  21. ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά customers table behaviors tables ηάϝϯτ

    உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation
  22. ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά customers table behaviors tables ηάϝϯτ

    உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation
  23. $%1Ͱ࢖͏ςʔϒϧͷੜ੒ • workflow(digdag)Λ࢖ͬͯCDPͰ࢖͏֤छςʔϒϧΛੜ੒͢Δ • ্ྲྀσʔλΛऔΓࠐΉλΠϛϯάʹ΋ґଘ͢Δ͕ɺྫ͑͹1೔1ճ workflowΛ࣮ߦ͢Δ • جຊతʹ͸PrestoͰ࣮ߦɻσʔλαΠζ͕େ͖͍έʔε΍Ϣʔβʔଆ ͷઃఆʹΑͬͯ͸HiveͰ࣮ߦ͢Δ͜ͱ΋͋Δ •

    workflow͸erb + ࣮ߦ࣌ʹparameterΛserver͔Βऔಘ࣮ͯ͠ߦ
  24. ސ٬Ϛελʔ ଐੑσʔλ webϩά ߪങཤྺ ΞϓϦϩά customers table behaviors tables ηάϝϯτ

    உੑ ੡඼AΛ2ͭҎ্ߪೖ ΞϓϦΛ࢖͍ͬͯΔ ࠷ۙαΠτΛ๚Εͨ Profiles API Activation
  25. ηάϝϯτΛઃఆ͢Δ ଐੑͷ৚݅ logͷ৚݅ ΞυϗοΫʹΫΤϦΛ࣮ߦ

  26. +40/͔ΒΫΤϦΛੜ੒ { "conditions": [ { "array_matching": null, "left_value": { "name":

    "country" }, "operator": { "not": false, "right_value": "japan", "type": "Equal" }, "type": "Value" }, { "array_matching": null, "left_value": { "aggregation": { "grouping_columns": [], "type": "Count" }, "filter": { "conditions": [ { "array_matching": null, "column": “city”,… select a."cdp_customer_id" from "cdp_audience_21821"."customers" a where a."country" = 'japan' and a."cdp_customer_id" in ( select _r1."cdp_customer_id" from "cdp_audience_21821"."behavior_behavior_1" _r1 where _r1."city" = 'Mexico City9' ) -- set session distributed_join = 'true'
  27. ηάϝϯτΫΤϦͷಛ௃ • ΞυϗοΫʹ࣮ߦ͞ΕΔ͜ͱ͕͋Δ • ΞυϗοΫ࣮ߦͨ͠ͱ͖ͷମݧΛଛͳ͍ͨ͘ͳ͍ • ݱঢ়͸͢΂ͯPrestoͰରԠ • ͱ͸͍͑৭ʑͳ໰୊͕… •

    ϝϞϦ • େྔͷtable scan • Timeout
  28. αϯϓϧσʔλ • customers: 10,000,000Ϩίʔυ • behavior_1: 49,006,085Ϩίʔυ +-----------------+-------+---------+--------+ | cdp_customer_id

    | email | country | time | +-----------------+-------+---------+--------+ | 479c0e0f-c2ed.. | a@... | japan | 583200 | | 7ba0998c-774f.. | b@... | japan | 583201 | | 65bd4227-000e.. | c@... | usa | 583202 | +-----------------+-------+---------+--------+ +-----------+-------+-----------------+--------+ | item_name | count | cdp_customer_id | time | +-----------+-------+-----------------+--------+ | hci | 3 | 479c0e0f-c2ed.. | 583200 | | eml | 9 | 7ba0998c-774f.. | 583201 | | llu | 10 | 65bd4227-000e.. | 583202 | +-----------+-------+-----------------+--------+
  29. αϯϓϧηάϝϯτ

  30. αϯϓϧηάϝϯτͷΫΤϦ select a."cdp_customer_id" from "kaneko_reprotalk_sample"."customers" a where ( select count(*)

    from "kaneko_reprotalk_sample"."behavior_1" _r1 where _r1."cdp_customer_id" = a."cdp_customer_id" ) >= 2 -- set session distributed_join = 'true'
  31. https://www.slideshare.net/treasure-data/2014-1113prestodbtechshowcase P.48

  32. https://www.slideshare.net/treasure-data/2014-1113prestodbtechshowcase P.49

  33. ৄ͘͠͸ • “PrestoͰ࣮ݱ͢ΔΠϯλϥΫςΟϒΫΤϦ - dbtech showcase 2014 Tokyo” • https://www.slideshare.net/treasure-data/

    2014-1113prestodbtechshowcase • “Presto: SQL on Everything” • https://research.fb.com/publications/presto-sql-on-everything/
  34. ໰୊  ϝϞϦ select a."cdp_customer_id" from "kaneko_reprotalk_sample"."customers" a where (

    select count(*) from "kaneko_reprotalk_sample"."behavior_1" _r1 where _r1."cdp_customer_id" = a."cdp_customer_id" ) >= 2 -- set session distributed_join = 'true' https://support.treasuredata.com/hc/en-us/articles/360001450908-Presto-Performance-Tuning
  35. ໰୊  ϝϞϦ • Without distributed_join • Duration: 1 min

    • Result count: 9,962,475 records, 1 column • Peak memory: 7.53GB • With distributed_join • Duration: 1 min • Result count: 9,962,475 records, 1 column • Peak memory: 1.81GB
  36. EJTUSJCVUFE@KPJO • Distributed joins require redistributing both tables using a

    hash of the join key. This can be slower (sometimes substantially) than broadcast joins, but allows much larger joins. Broadcast joins require that the tables on the right side of the join after filtering fit in memory on each node, whereas distributed joins only need to fit in distributed memory across all nodes. This can also be specified on a per-query basis using the distributed_join session property. https://github.com/prestosql/presto/commit/506f45764040322d244ee6487b12e313b9a9d00f
  37. 8JUIPVUEJTUSJCVUFE@KPJO 11 tasks Stage 1 1 task Stage 0 18

    tasks Stage 2 11 tasks Stage 3 1 task Stage 4
  38. 8JUIPVUEJTUSJCVUFE@KPJO Stage 3: In 49,006,085 / Out 48,652,188 Output partitioning:

    HASH - Aggregate(PARTIAL)[cdp_customer_id_0] - ScanProject[table behavior_1] Stage 2: In 48,652,188 / Out 9,997,841 Output partitioning: BROADCAST [] - Aggregate(FINAL)[cdp_customer_id_0] - RemoteSource[3]
  39. 8JUIPVUEJTUSJCVUFE@KPJO • 9,997,841 * 11 = 109,976,251 Stage 1: In

    119,976,262 / Out 9,962,475 Output partitioning: SINGLE [] - FilterProject(>= BIGINT '2') - CrossJoin - LeftJoin - ScanProject[table customers] - LocalExchange - RemoteSource[2]: Output: 109,976,251
  40. 8JUIEJTUSJCVUFE@KPJO 18 tasks Stage 1 1 task Stage 0 11

    tasks Stage 2 9 task Stage 3 1 task Stage 4
  41. 8JUIEJTUSJCVUFE@KPJO Stage 3: In 49,006,085 / Out 48,651,697 Output partitioning:

    HASH - Aggregate(PARTIAL)[cdp_customer_id_0] - ScanProject[table behavior_1] Stage 2: In 10,000,000 / Out 10,000,000 Output partitioning: HASH - ScanProject[table customers]
  42. 8JUIEJTUSJCVUFE@KPJO Stage 1: In 58,651,715 / Out 9,962,475 Output partitioning:

    SINGLE [] - FilterProject(>= BIGINT '2') - CrossJoin - LeftJoin - RemoteSource[2]: Output: 10,000,000 - Aggregate(FINAL)[cdp_customer_id_0] - RemoteSource[3]: Output: 48,651,697
  43. 6TFS%FpOFE1BSUJUJPOJOH • All tables in Treasure Data are partitioned based

    on the time column. … User-defined partitioning (UDP) provides hash partitioning for a table on one or more columns in addition to the time column. https://support.treasuredata.com/hc/en-us/articles/360009798714-User-Defined-Partitioning- for-Presto
  44. 6TFS%FpOFE1BSUJUJPOJOH • With distributed_join / Without UDP • Duration: 1

    min • Result count: 9,962,475 records, 1 column • Peak memory: 1.81GB • With distributed_join / With UDP • Duration: 4 mins • Result count: 9,962,475 records, 1 column • Peak memory: 982.00MB
  45. 8JUI6%1 18 tasks Stage 1 1 task Stage 0 18

    tasks Stage 2 1 task Stage 3
  46. 8JUI6%1 Stage 2: In 49,006,085 / Out 9,997,841 Output partitioning:

    td-presto:[cdp_customer_id]/512 [cdp_customer_id_0] - Aggregate[cdp_customer_id_0] - ScanProject[table behavior_1]
  47. 8JUI6%1 Stage 1: In 19,997,859 / Out 9,962,475 Output partitioning:

    SINGLE [] - FilterProject(>= BIGINT '2') - CrossJoin - LeftJoin - ScanProject[table customers_udp] - RemoteSource[2]: Output: 9,997,841
  48. 6TFS%FpOFE1BSUJUJPOJOH • cdp_customer_idʹΑΔbehavior_1ςʔϒϧͷAggregation͕1ͭͷ stageͰ׬݁͢Δ • ࠓ·ͰͷྫͰ͸Aggregate(PARTIAL)ͱAggregate(FINAL)͕ଘࡏ͍ͯ͠ ͨ

  49. ෳࡶͳηάϝϯτఆٛ • ηάϝϯτΛఆٛ͢Δͱ͖ʹଞͷηάϝϯτΛࢦఆ͢Δ͜ͱ͕Ͱ͖ Δ

  50. select a."cdp_customer_id" from "cdp_audience_21597"."customers" a where ( ( select count(*)

    from "cdp_audience_21597"."behavior_weblogs" _r1 where _r1."cdp_customer_id" = a."cdp_customer_id" ) > 2 ) or ( ( select count(*) from "cdp_audience_21597"."behavior_weblogs" _r2 where _r2."td_ip_city_latitude" = 0 and _r2."cdp_customer_id" = a."cdp_customer_id" ) > 3 )
  51. ෳࡶͳηάϝϯτఆٛ • ۃ୺ʹෳࡶͳηάϝϯτఆ͕ٛ͋Δͱ… • େྔͷtable scan • Timeout

  52. վળࡦ • “Materialized View”ͷಋೖ • customersςʔϒϧͷ࡞੒࣌ʹ͍͔ͭ͘ͷηάϝϯτΛબΜͰܭࢉΛ ͓ͯ͘͠ • Queryͷੜ੒࣌ʹ͸࡞੒ࡁΈͷ“view”Λࢀর͢ΔΑ͏ʹมߋ͢Δ •

    ಉ͡ςʔϒϧʹର͢ΔαϒΫΤϦΛmergeͨ͠QueryΛੜ੒͢Δ (ΞΠ σΞ)
  53. Thank you !!