Upgrade to Pro — share decks privately, control downloads, hide ads and more …

How Presto is used by CDP

How Presto is used by CDP

yui-knk

June 04, 2019
Tweet

More Decks by yui-knk

Other Decks in Programming

Transcript

  1. How Presto is used by CDP
    Yuichiro Kaneko(@yui-knk)
    2019/06/04
    Repro Tech: ࣮ફɾฒྻ෼ࢄॲཧج൫

    View Slide

  2. ࣗݾ঺հ
    • Yuichiro Kaneko

    • Arm Treasure Data

    • CDP team (Writing Rails application)

    • CRuby Committer 2015/12~

    • GitHub (yui-knk)

    View Slide

  3. https://jobs.lever.co/treasure-data/
    8FBSFIJSJOH

    View Slide

  4. 1SFTUP$POGFSFODF5PLZP
    https://techplay.jp/event/733772

    View Slide

  5. 1SFTUP$POGFSFODF5PLZP
    • 2019೥7݄11೔(໦): 13:30-

    • PrestoΫϦΤʔλʔͰ͋ΓPresto Software Foundationͷ૑࢝ऀͰ΋͋
    ΔMartin Traverso, Dain Sundstrom, David Philipsͷ3໊Λট͍ͯ…
    https://techplay.jp/event/733772

    View Slide

  6. σʔλࢦ޲ΞϓϦέʔγϣϯσβΠϯ
    https://twitter.com/taroleo/status/1131217986000285696

    View Slide

  7. 5PEBZTUPQJD
    • ͬ͘͟ΓͱTDͷઆ໌

    • ͪΐͬͱৄ͘͠CDPͷઆ໌ (Ϣʔεέʔεฤ)

    • ͪΐͬͱৄ͘͠CDPͷઆ໌ (ٕज़ฤ)

    View Slide

  8. TDͷશମ૾

    View Slide

  9. Plazma DB
    SDKs
    Hive Presto

    View Slide

  10. 1MB[NB%#
    • ࣮σʔλ + ϝλσʔλ

    • σϑΥϧτͰ͸time columnͰύʔςΟγϣϯ͞Ε͍ͯΔ

    • https://www.slideshare.net/treasure-data/td-techplazma

    • https://qiita.com/xerial/items/959d2b928353b595f31e

    • User Defined Partitioning (UDP)Λࢦఆ͢Δ͜ͱ΋Մೳ

    • https://support.treasuredata.com/hc/en-us/articles/360009798714-User-Defined-
    Partitioning-for-Presto

    View Slide

  11. *OQVU%BUB
    • FluentdʹΑΔετϦʔϛϯάΞοϓϩʔυ

    • EmbulkʹΑΔόϧΫΞοϓϩʔυ

    • SDKʹΑΔετϦʔϛϯάΞοϓϩʔυ (td-js-sdkͳͲ)

    • UI͔ΒͷόϧΫΞοϓϩʔυ

    View Slide

  12. 2VFSZ&OHJOF
    • Query Engineͱͯ͠HiveͱPresto͕ଘࡏ͍ͯ͠Δ

    View Slide

  13. Plazma DB
    SDKs
    Hive Presto

    View Slide

  14. TDͷCDPͱ͸ (Ϣʔεέʔεฤ)

    View Slide

  15. https://www.treasuredata.co.jp/learn/

    View Slide

  16. σʔλͷ੔ཧ
    • ސ٬ʹؔ͢Δ༷ʑͳσʔλΛอ༗͍ͯ͠Δ

    • ސ٬Ϛελʔσʔλ

    • ଐੑσʔλ

    • ߦಈϩά
    ސ٬Ϛελʔ
    ଐੑσʔλ
    webϩά
    ߪങཤྺ
    ΞϓϦϩά

    View Slide

  17. σʔλͷ੔ཧ
    • ސ٬σʔλʹ͸ؔ܎͕ଘࡏ͍ͯ͠Δ
    ސ٬Ϛελʔ
    ଐੑσʔλ
    webϩά
    ߪങཤྺ
    ΞϓϦϩά

    View Slide

  18. ηάϝϯτͷఆٛ
    • ଐੑ΍ߦಈΛ΋ͱʹηάϝϯτΛఆٛ͢Δ
    ސ٬Ϛελʔ
    ଐੑσʔλ
    webϩά
    ߪങཤྺ
    ΞϓϦϩά
    ηάϝϯτ
    உੑ
    ੡඼AΛ2ͭҎ্ߪೖ
    ΞϓϦΛ࢖͍ͬͯΔ
    ࠷ۙαΠτΛ๚Εͨ

    View Slide

  19. ηάϝϯτΛ׆༻͢Δ
    • ηάϝϯτ৘ใΛ΋ͱʹࣗࣾͷweb contentsΛग़͠෼͚Δ

    • ଞͷϚʔέςΟϯάπʔϧͳͲʹ݁ՌΛग़ྗ
    ηάϝϯτ
    உੑ
    ੡඼AΛ2ͭҎ্ߪೖ
    ΞϓϦΛ࢖͍ͬͯΔ
    ࠷ۙαΠτΛ๚Εͨ
    Profiles API
    Activation

    View Slide

  20. TDͷCDPͱ͸ (ٕज़ฤ)

    View Slide

  21. ސ٬Ϛελʔ
    ଐੑσʔλ
    webϩά
    ߪങཤྺ
    ΞϓϦϩά
    customers
    table
    behaviors
    tables
    ηάϝϯτ
    உੑ
    ੡඼AΛ2ͭҎ্ߪೖ
    ΞϓϦΛ࢖͍ͬͯΔ
    ࠷ۙαΠτΛ๚Εͨ
    Profiles API
    Activation

    View Slide

  22. ސ٬Ϛελʔ
    ଐੑσʔλ
    webϩά
    ߪങཤྺ
    ΞϓϦϩά
    customers
    table
    behaviors
    tables
    ηάϝϯτ
    உੑ
    ੡඼AΛ2ͭҎ্ߪೖ
    ΞϓϦΛ࢖͍ͬͯΔ
    ࠷ۙαΠτΛ๚Εͨ
    Profiles API
    Activation

    View Slide

  23. $%1Ͱ࢖͏ςʔϒϧͷੜ੒
    • workflow(digdag)Λ࢖ͬͯCDPͰ࢖͏֤छςʔϒϧΛੜ੒͢Δ

    • ্ྲྀσʔλΛऔΓࠐΉλΠϛϯάʹ΋ґଘ͢Δ͕ɺྫ͑͹1೔1ճ
    workflowΛ࣮ߦ͢Δ

    • جຊతʹ͸PrestoͰ࣮ߦɻσʔλαΠζ͕େ͖͍έʔε΍Ϣʔβʔଆ
    ͷઃఆʹΑͬͯ͸HiveͰ࣮ߦ͢Δ͜ͱ΋͋Δ

    • workflow͸erb + ࣮ߦ࣌ʹparameterΛserver͔Βऔಘ࣮ͯ͠ߦ

    View Slide

  24. ސ٬Ϛελʔ
    ଐੑσʔλ
    webϩά
    ߪങཤྺ
    ΞϓϦϩά
    customers
    table
    behaviors
    tables
    ηάϝϯτ
    உੑ
    ੡඼AΛ2ͭҎ্ߪೖ
    ΞϓϦΛ࢖͍ͬͯΔ
    ࠷ۙαΠτΛ๚Εͨ
    Profiles API
    Activation

    View Slide

  25. ηάϝϯτΛઃఆ͢Δ
    ଐੑͷ৚݅
    logͷ৚݅
    ΞυϗοΫʹΫΤϦΛ࣮ߦ

    View Slide

  26. +40/͔ΒΫΤϦΛੜ੒
    {
    "conditions": [
    {
    "array_matching": null,
    "left_value": {
    "name": "country"
    },
    "operator": {
    "not": false,
    "right_value": "japan",
    "type": "Equal"
    },
    "type": "Value"
    },
    {
    "array_matching": null,
    "left_value": {
    "aggregation": {
    "grouping_columns": [],
    "type": "Count"
    },
    "filter": {
    "conditions": [
    {
    "array_matching": null,
    "column": “city”,…
    select
    a."cdp_customer_id"
    from "cdp_audience_21821"."customers" a
    where a."country" = 'japan'
    and a."cdp_customer_id" in (
    select
    _r1."cdp_customer_id"
    from "cdp_audience_21821"."behavior_behavior_1" _r1
    where _r1."city" = 'Mexico City9'
    )
    -- set session distributed_join = 'true'

    View Slide

  27. ηάϝϯτΫΤϦͷಛ௃
    • ΞυϗοΫʹ࣮ߦ͞ΕΔ͜ͱ͕͋Δ

    • ΞυϗοΫ࣮ߦͨ͠ͱ͖ͷମݧΛଛͳ͍ͨ͘ͳ͍

    • ݱঢ়͸͢΂ͯPrestoͰରԠ

    • ͱ͸͍͑৭ʑͳ໰୊͕…

    • ϝϞϦ

    • େྔͷtable scan

    • Timeout

    View Slide

  28. αϯϓϧσʔλ
    • customers: 10,000,000Ϩίʔυ

    • behavior_1: 49,006,085Ϩίʔυ
    +-----------------+-------+---------+--------+
    | cdp_customer_id | email | country | time |
    +-----------------+-------+---------+--------+
    | 479c0e0f-c2ed.. | a@... | japan | 583200 |
    | 7ba0998c-774f.. | b@... | japan | 583201 |
    | 65bd4227-000e.. | c@... | usa | 583202 |
    +-----------------+-------+---------+--------+
    +-----------+-------+-----------------+--------+
    | item_name | count | cdp_customer_id | time |
    +-----------+-------+-----------------+--------+
    | hci | 3 | 479c0e0f-c2ed.. | 583200 |
    | eml | 9 | 7ba0998c-774f.. | 583201 |
    | llu | 10 | 65bd4227-000e.. | 583202 |
    +-----------+-------+-----------------+--------+

    View Slide

  29. αϯϓϧηάϝϯτ

    View Slide

  30. αϯϓϧηάϝϯτͷΫΤϦ
    select
    a."cdp_customer_id"
    from "kaneko_reprotalk_sample"."customers" a
    where (
    select
    count(*)
    from "kaneko_reprotalk_sample"."behavior_1" _r1
    where _r1."cdp_customer_id" = a."cdp_customer_id"
    ) >= 2
    -- set session distributed_join = 'true'

    View Slide

  31. https://www.slideshare.net/treasure-data/2014-1113prestodbtechshowcase P.48

    View Slide

  32. https://www.slideshare.net/treasure-data/2014-1113prestodbtechshowcase P.49

    View Slide

  33. ৄ͘͠͸
    • “PrestoͰ࣮ݱ͢ΔΠϯλϥΫςΟϒΫΤϦ - dbtech showcase 2014
    Tokyo”

    • https://www.slideshare.net/treasure-data/
    2014-1113prestodbtechshowcase

    • “Presto: SQL on Everything”

    • https://research.fb.com/publications/presto-sql-on-everything/

    View Slide

  34. ໰୊
    ϝϞϦ
    select
    a."cdp_customer_id"
    from "kaneko_reprotalk_sample"."customers" a
    where (
    select
    count(*)
    from "kaneko_reprotalk_sample"."behavior_1" _r1
    where _r1."cdp_customer_id" = a."cdp_customer_id"
    ) >= 2
    -- set session distributed_join = 'true'
    https://support.treasuredata.com/hc/en-us/articles/360001450908-Presto-Performance-Tuning

    View Slide

  35. ໰୊
    ϝϞϦ
    • Without distributed_join

    • Duration: 1 min

    • Result count: 9,962,475 records, 1 column

    • Peak memory: 7.53GB

    • With distributed_join

    • Duration: 1 min

    • Result count: 9,962,475 records, 1 column

    • Peak memory: 1.81GB

    View Slide

  36. EJTUSJCVUFE@KPJO
    • Distributed joins require redistributing both tables using a hash of the
    join key. This can be slower (sometimes substantially) than broadcast
    joins, but allows much larger joins. Broadcast joins require that the
    tables on the right side of the join after filtering fit in memory on each
    node, whereas distributed joins only need to fit in distributed memory
    across all nodes. This can also be specified on a per-query basis
    using the distributed_join session property.
    https://github.com/prestosql/presto/commit/506f45764040322d244ee6487b12e313b9a9d00f

    View Slide

  37. 8JUIPVUEJTUSJCVUFE@KPJO
    11 tasks
    Stage 1
    1 task
    Stage 0
    18 tasks
    Stage 2
    11 tasks
    Stage 3
    1 task
    Stage 4

    View Slide

  38. 8JUIPVUEJTUSJCVUFE@KPJO
    Stage 3: In 49,006,085 / Out 48,652,188
    Output partitioning: HASH
    - Aggregate(PARTIAL)[cdp_customer_id_0]
    - ScanProject[table behavior_1]
    Stage 2: In 48,652,188 / Out 9,997,841
    Output partitioning: BROADCAST []
    - Aggregate(FINAL)[cdp_customer_id_0]
    - RemoteSource[3]

    View Slide

  39. 8JUIPVUEJTUSJCVUFE@KPJO
    • 9,997,841 * 11 = 109,976,251
    Stage 1: In 119,976,262 / Out 9,962,475
    Output partitioning: SINGLE []
    - FilterProject(>= BIGINT '2')
    - CrossJoin
    - LeftJoin
    - ScanProject[table customers]
    - LocalExchange
    - RemoteSource[2]: Output: 109,976,251

    View Slide

  40. 8JUIEJTUSJCVUFE@KPJO
    18 tasks
    Stage 1
    1 task
    Stage 0
    11 tasks
    Stage 2
    9 task
    Stage 3
    1 task
    Stage 4

    View Slide

  41. 8JUIEJTUSJCVUFE@KPJO
    Stage 3: In 49,006,085 / Out 48,651,697
    Output partitioning: HASH
    - Aggregate(PARTIAL)[cdp_customer_id_0]
    - ScanProject[table behavior_1]
    Stage 2: In 10,000,000 / Out 10,000,000
    Output partitioning: HASH
    - ScanProject[table customers]

    View Slide

  42. 8JUIEJTUSJCVUFE@KPJO
    Stage 1: In 58,651,715 / Out 9,962,475
    Output partitioning: SINGLE []
    - FilterProject(>= BIGINT '2')
    - CrossJoin
    - LeftJoin
    - RemoteSource[2]: Output: 10,000,000
    - Aggregate(FINAL)[cdp_customer_id_0]
    - RemoteSource[3]: Output: 48,651,697

    View Slide

  43. 6TFS%FpOFE1BSUJUJPOJOH
    • All tables in Treasure Data are partitioned based on the time column.
    … User-defined partitioning (UDP) provides hash partitioning for a
    table on one or more columns in addition to the time column.
    https://support.treasuredata.com/hc/en-us/articles/360009798714-User-Defined-Partitioning-
    for-Presto

    View Slide

  44. 6TFS%FpOFE1BSUJUJPOJOH
    • With distributed_join / Without UDP

    • Duration: 1 min

    • Result count: 9,962,475 records, 1 column

    • Peak memory: 1.81GB

    • With distributed_join / With UDP

    • Duration: 4 mins

    • Result count: 9,962,475 records, 1 column

    • Peak memory: 982.00MB

    View Slide

  45. 8JUI6%1
    18 tasks
    Stage 1
    1 task
    Stage 0
    18 tasks
    Stage 2
    1 task
    Stage 3

    View Slide

  46. 8JUI6%1
    Stage 2: In 49,006,085 / Out 9,997,841
    Output partitioning: td-presto:[cdp_customer_id]/512
    [cdp_customer_id_0]
    - Aggregate[cdp_customer_id_0]
    - ScanProject[table behavior_1]

    View Slide

  47. 8JUI6%1
    Stage 1: In 19,997,859 / Out 9,962,475
    Output partitioning: SINGLE []
    - FilterProject(>= BIGINT '2')
    - CrossJoin
    - LeftJoin
    - ScanProject[table customers_udp]
    - RemoteSource[2]: Output: 9,997,841

    View Slide

  48. 6TFS%FpOFE1BSUJUJPOJOH
    • cdp_customer_idʹΑΔbehavior_1ςʔϒϧͷAggregation͕1ͭͷ
    stageͰ׬݁͢Δ

    • ࠓ·ͰͷྫͰ͸Aggregate(PARTIAL)ͱAggregate(FINAL)͕ଘࡏ͍ͯ͠
    ͨ

    View Slide

  49. ෳࡶͳηάϝϯτఆٛ
    • ηάϝϯτΛఆٛ͢Δͱ͖ʹଞͷηάϝϯτΛࢦఆ͢Δ͜ͱ͕Ͱ͖
    Δ

    View Slide

  50. select
    a."cdp_customer_id"
    from "cdp_audience_21597"."customers" a
    where (
    (
    select
    count(*)
    from "cdp_audience_21597"."behavior_weblogs" _r1
    where _r1."cdp_customer_id" = a."cdp_customer_id"
    ) > 2
    )
    or (
    (
    select
    count(*)
    from "cdp_audience_21597"."behavior_weblogs" _r2
    where _r2."td_ip_city_latitude" = 0
    and _r2."cdp_customer_id" = a."cdp_customer_id"
    ) > 3
    )

    View Slide

  51. ෳࡶͳηάϝϯτఆٛ
    • ۃ୺ʹෳࡶͳηάϝϯτఆ͕ٛ͋Δͱ…

    • େྔͷtable scan

    • Timeout

    View Slide

  52. վળࡦ
    • “Materialized View”ͷಋೖ

    • customersςʔϒϧͷ࡞੒࣌ʹ͍͔ͭ͘ͷηάϝϯτΛબΜͰܭࢉΛ
    ͓ͯ͘͠

    • Queryͷੜ੒࣌ʹ͸࡞੒ࡁΈͷ“view”Λࢀর͢ΔΑ͏ʹมߋ͢Δ

    • ಉ͡ςʔϒϧʹର͢ΔαϒΫΤϦΛmergeͨ͠QueryΛੜ੒͢Δ (ΞΠ
    σΞ)

    View Slide

  53. Thank you !!

    View Slide