Slide 1

Slide 1 text

Repro ʹ͓͚Δ Presto Cassandra Connector վ଄ൿ࿩ Presto Conference Tokyo 2020 (2020-11-20) Reproגࣜձࣾ Takeshi Arabiki (@a_bicky)

Slide 2

Slide 2 text

• Twitter: @a_bicky • Blog: ͋Βͼ͖೔ه • ॴଐ: Repro גࣜձࣾ (2017 ೥ 8 ݄ʙ) • SRE ͬΆ͍ʢʁʣνʔϜॴଐ • σʔλऩूɾॲཧؔ࿈౳ͷػೳ։ൃɾΞʔΩςΫνϟ࡮৽ͨ͠Γ • Presto, Cassandra, Kafka, Fluentd ౳ͷ໘౗ΛݟͨΓ • AWS ౳ͷΠϯϑϥपΓͷ໘౗ΛݟͨΓ • etc. ͳνʔϜ ࣗݾ঺հ

Slide 3

Slide 3 text

ຊൃදʹ͍ͭͯ https://speakerdeck.com/a_bicky/repro-tech-meetup-number-9

Slide 4

Slide 4 text

• Repro ʹ͓͚Δ Presto ͷ׆༻Օॴ • Presto Cassandra Connector ͷߴ଎Խ • վ଄൛ Presto Cassandra Connector ͷ EMR ΁ͷಋೖ ΞδΣϯμ

Slide 5

Slide 5 text

Repro ʹ͓͚Δ Presto ͷ׆༻Օॴ

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

഑৴ର৅ઃఆ

Slide 8

Slide 8 text

• ഑৴ର৅৚݅ʢϢʔβηάϝϯςʔγϣϯʣ͔Β SQL Λػցతʹੜ੒ • ಛఆͷ SQL Λ࠷దԽ͢Δ͜ͱ͕೉͍͠ • ϓογϡ௨஌ͷ഑৴௚લʹ഑৴ର৅ϢʔβΛܾఆ • ௕͔͔࣌ؒΔ SQL ͕ൃߦ͞ΕΔͱ༧ఆͷ഑৴࣌ؒʹؒʹ߹Θͳ͍ • Repro Λಋೖ͍ͯ͠ΔαʔϏεʹΑͬͯσʔλͷن໛΋ಛੑ΋ҟͳΔ • ಉ͡Α͏ͳ഑৴৚݅Ͱ΋αʔϏεʹΑͬͯ SQL ͷ࣮ߦ͕࣌ؒେ͖͘ҟͳΔ ಛघͳ Presto ͷ༻్

Slide 9

Slide 9 text

ϓογϡ௨஌ؔ࿈ͷΞʔΩςΫνϟུ֓ਤ event data processor scheduled job ᶃ Insert data ᶄ SQL ᶅ CQL ᶆ Data ᶇ user IDs etc. ᶈ Invoke push notification application

Slide 10

Slide 10 text

Cassandra ಋೖཧ༝ http://joker1007.hatenablog.com/entry/2018/06/29/201400

Slide 11

Slide 11 text

Cassandra ಋೖཧ༝ http://joker1007.hatenablog.com/entry/2018/06/29/201400 • ॻ͖ࠐΈͷεέʔϥϏϦςΟΛ୲อ͍ͨ͠ • MySQL ͩͱ sharding ͕ඞཁʹͳΔ͠؆୯ʹ writer Λ૿΍ͤͳ͍ • σʔλΛߋ৽͍ͨ͠ • Hive ͸ߋ৽Ͱ͖ͳ͍ͷͰ SQL Λ޻෉ͯ͠࠷৽ͷϨίʔυͷ஋Λऔಘ͢Δඞཁ͕͋Δ • σʔλͷ൓өΛ΄΅ϦΞϧλΠϜͰߦ͍͍ͨ • Hive ͩͱ͋Δఔ౓σʔλΛ·ͱΊ͔ͯΒ HDFS, S3 ౳ʹॻ͖ࠐ·ͳ͍ͱඇޮ཰ • Presto ͔ΒେྔͷσʔλΛߴ଎ʹऔಘ͍ͨ͠ • MySQL ͩͱ 1 worker ͚ͩͰσʔλΛऔಘ͢Δ͜ͱʹͳΓ௿଎ • Cassandra ͷ༻్ʹ͸޲͍͍ͯͳ͍͕ MySQL ʹൺ΂Δͱߴ଎

Slide 12

Slide 12 text

Bucketing ʹΑΔ Cassandra ͷෛՙ෼ࢄ CREATE TABLE main.event_counts ( service_id bigint, event_id bigint, dt text, bucket int, user_id bigint, occurrence bigint, PRIMARY KEY ((service_id, event_id, dt, bucket), user_id) ); partition key murmur3(user_id) % bucket num

Slide 13

Slide 13 text

഑৴৚݅ʹϚον͢ΔϢʔβͷநग़ͷྫ

Slide 14

Slide 14 text

഑৴৚݅ʹϚον͢ΔϢʔβͷநग़ͷྫ SELECT user_id FROM ( SELECT user_id FROM cassandra.main.event_counts WHERE service_id = 1 AND event_id = 2 AND bucket IN (0, 1, ...) AND dt IN ('2020-11-18', '2002-11-19', '2020-11-20') AND occurrence >= 1 ) AS condition_0 LEFT OUTER JOIN ( SELECT user_id FROM cassandra.main.event_counts WHERE service_id = 1 AND event_id = 3 AND bucket IN (0, 1, ...) AND dt IN ('2020-11-18', '2002-11-19', '2020-11-20') AND occurrence >= 1 ) AS condition_1 ON condition_0.user_id = condition_1.user_id WHERE condition_1.user_id IS NULL; -- 08_ΞΠςϜߪೖը໘Λ 3 ೔Ҏ಺ʹ 1 ճҎ্࣮ߦͨ͠ -- ߪೖΛ 3 ೔Ҏ಺ʹ 1 ճҎ্࣮ߦͨ͠ -- AND NOT

Slide 15

Slide 15 text

Presto Cassandra Connector ͷߴ଎Խ

Slide 16

Slide 16 text

஗͍ Planning: ͋Δ SQL ͷ౷ܭ৘ใ { "elapsed_time": "80620", "execution_time": "6833", "distributed_planning_time": "9", "analysis_time": "73698" } 73 ඵʂʂ

Slide 17

Slide 17 text

Presto ͷ Planning https://www.oreilly.com/library/view/presto-the-definitive/9781492044260/ Figure 4-1. Presto architecture overview with coordinator and workers

Slide 18

Slide 18 text

ݪҼௐࠪ: σόοάϩάͷ֬ೝ • /path/to/log.properties ʹ com.facebook.presto=DEBUG Λ௥Ճͯ͠࠶ىಈ • prestosql ൛ͷ৔߹͸ io.prestosql=DEBUG • JMX ΤʔδΣϯτ͕༗ޮͰ͋Ε͹࠶ىಈͤͣʹมߋՄೳ • cf. Amazon Elastic MapReduce (EMR) Ͱ͸͡ΊΔ Presto ೖ໳#σόοάϩάͷ༗ޮԽ 2018-09-19T11:03:24.824Z ...QueryStateMachine Query .... is PLANNING 2018-09-19T11:03:27.325Z ...CassandraPartitionManager... #partitions: 512 2018-09-19T11:03:28.795Z ...CassandraPartitionManager... #partitions: 256 2018-09-19T11:03:30.449Z ...CassandraPartitionManager... #partitions: 512 2018-09-19T11:03:38.877Z ...CassandraPartitionManager... #partitions: 2048 ... 2018-09-19T11:04:38.419Z ...CassandraPartitionManager... #partitions: 256

Slide 19

Slide 19 text

ιʔείʔυϦʔσΟϯά (Presto 0.203) • CassandraPartitionManager#getCassandraPartitions • WHERE ۟ʹࢦఆ͞ΕͨύʔςΟγϣϯͷ૊Έ߹Θͤͷ਺͚ͩ NativeCassandraSession#getPartitions Λݺͼग़͢ • cf. CassandraPartitionManager.java#L125-L127 • NativeCassandraSession#getPartitions • ࢦఆ͞ΕͨύʔςΟγϣϯͷ༗ແΛ SELECT DISTINCT ... Λ࣮ߦ͢Δ͜ͱͰ֬ೝ • cf. NativeCassandraSession.java#L418-L420

Slide 20

Slide 20 text

ιʔείʔυϦʔσΟϯά (Presto 0.203) • CassandraPartitionManager#getCassandraPartitions • WHERE ۟ʹࢦఆ͞ΕͨύʔςΟγϣϯͷ૊Έ߹Θͤͷ਺͚ͩ NativeCassandraSession#getPartitions Λݺͼग़͢ • cf. CassandraPartitionManager.java#L125-L127 • NativeCassandraSession#getPartitions • ࢦఆ͞ΕͨύʔςΟγϣϯͷ༗ແΛ SELECT DISTINCT ... Λ࣮ߦ͢Δ͜ͱͰ֬ೝ • cf. NativeCassandraSession.java#L418-L420 WHERE 句に指定されたパーティションの 組み合わせの数だけ SELECT DISTINCT を発⾏

Slide 21

Slide 21 text

ύʔςΟγϣϯͷ૊Έ߹Θͤྫ SELECT user_id FROM cassandra.main.event_counts WHERE service_id = 1 AND event_id = 2 AND dt IN ('2020-11-18', '2020-11-19', '2020-11-20') AND bucket IN (0, 1, ..., 63) AND occurrence >= 1; SELECT DISTINCT service_id, event_id, dt, bucket FROM main.event_counts WHERE service_id = 1 AND event_id = 2 AND dt = '2020-11-18' AND bucket = 0; SELECT DISTINCT service_id, event_id, dt, bucket FROM main.event_counts WHERE service_id = 1 AND event_id = 2 AND dt = '2020-11-19' AND bucket = 0; ... 1 x 1 x 3 x 64 = 192 queries Presto SQL CQL NativeCassandraSession#getPartitions

Slide 22

Slide 22 text

Presto ͷ Planningʢ࠶ܝʣ https://www.oreilly.com/library/view/presto-the-definitive/9781492044260/ Figure 4-1. Presto architecture overview with coordinator and workers

Slide 23

Slide 23 text

Presto ͷ Planningʢ࠶ܝʣ https://www.oreilly.com/library/view/presto-the-definitive/9781492044260/ Figure 4-1. Presto architecture overview with coordinator and workers • Planning ʹ͸ύʔςΟγϣϯ৘ใ͕ඞཁ • Hive ͸ Hive metastore ʹύʔςΟγϣϯ৘ใͳͲΛอ͍࣋ͯ͠Δ • Hive connector ͸ Hive metastore Λར༻͢Δ͜ͱͰύʔςΟγϣϯͷ༗ແΛ֬ೝ • Cassandra ͸ύʔςΟγϣϯͷ༗ແʹ͍ͭͯͷ৘ใΛอ͍࣋ͯ͠ͳ͍ • Cassandra connector ͸࣮ࡍʹ CQL Λ౤͛Δ͜ͱͰ༗ແΛ֬ೝ

Slide 24

Slide 24 text

IN ԋࢉࢠΛ࢖ͬͨ࠷దԽ (Presto 0.204) https://github.com/prestodb/presto/pull/10720 SELECT DISTINCT service_id, event_id, dt, bucket FROM main.event_counts WHERE service_id = 1 AND event_id = 2 AND dt IN ('2020-11-18', '2020-11-19', '2020-11-20') AND bucket IN (0, 1, ..., 63); CQL

Slide 25

Slide 25 text

IN ԋࢉࢠΛ࢖ͬͨ࠷దԽ (Presto 0.204) https://github.com/prestodb/presto/pull/10720 SELECT DISTINCT service_id, event_id, dt, bucket FROM main.event_counts WHERE service_id = 1 AND event_id = 2 AND dt IN ('2020-11-18', '2020-11-19', '2020-11-20') AND bucket IN (0, 1, ..., 63); CQL 3x faster in our use cases but not enough!!

Slide 26

Slide 26 text

• Bucketed ςʔϒϧͰύʔςΟγϣϯͷଘࡏ༗ແΛ֬ೝ͢Δҙຯ͕΄ͱΜͲͳ͍ • Ϩίʔυ਺͕ଟ͍Πϕϯτ౳͸Ͳͷ bucket ͷύʔςΟγϣϯ΋ଘࡏ͢ΔՄೳੑ͕ߴ͍ • Ϩίʔυ਺͕গͳ͍Πϕϯτ౳͸࠷దԽ͠ͳͯ͘΋े෼ߴ଎ • WHERE ۟ʹࢦఆ͞ΕͨύʔςΟγϣϯ͸શͯଘࡏ͢Δ΋ͷͱΈͳͤ͹Ұॠ • ύʔςΟγϣϯͷଘࡏ༗ແΛ֬ೝ͢ΔॲཧΛεΩοϓ͢ΔΦϓγϣϯΛ௥Ճ • prestodb/presto Λ fork ͯ͠վ଄ͨ͠ presto-cassandra.jar Λຊ൪Ͱ࢖༻ Planning ߴ଎ԽͷΞΠσΞ

Slide 27

Slide 27 text

Presto Cassandra Connector ͷվ଄ https://github.com/reproio/presto/pull/1

Slide 28

Slide 28 text

վ଄൛ Presto Cassandra Connector ͷޮՌ skip-partition-check Analysis time (ms) CPU time (ms) false 22,829 6,946 true 1,060 8,057 ※ com.facebook.presto.spi.eventlistener.QueryStatistics ͔ΒಘΒΕΔϝτϦΫε

Slide 29

Slide 29 text

վ଄൛ Presto Cassandra Connector ͷ EMR ΁ͷಋೖ

Slide 30

Slide 30 text

• ໰୊఺ • JVM ͷઃఆΛมߋͰ͖ͳ͍ • Resource groups ͷઃఆΛมߋͰ͖ͳ͍ • Plugin ͷಋೖ΍ event-listener ͷઃఆ͕Ͱ͖ͳ͍ • Connector ʹಠࣗͷ jar Λ࢖͑ͳ͍ • ղܾࡦ • Bootstrap actions ͰؤுΔ • ΧελϜ AMI Λ࢖͏ EMR Ͱ Presto ΛΧελϚΠζ͢Δࡍͷ໰୊఺ͱղܾࡦ

Slide 31

Slide 31 text

• Presto ౳ΛΠϯετʔϧ͢Δલʹ࣮ߦ͢ΔεΫϦϓτ • Bootstrap actions ͕ऴΘΒͳ͍ͱΠϯετʔϧ͕࢝·Βͳ͍ • Presto ΛΠϯετʔϧͨ͠ޙʹεΫϦϓτΛ࣮ߦ͢Δ͜ͱ͸Ͱ͖ͳ͍ Bootstrap actions # Execute the script in background and exit immediately # because Presto is not installed until bootstrap actions end # cf. https://forums.aws.amazon.com/thread.jspa?threadID=220183 if [ $(whoami) != "root" ]; then sudo "$0" "$@" & exit 0 fi # Commands to execute after installing Presto is here

Slide 32

Slide 32 text

վ଄൛ Presto Cassandra Connector ͷઃஔ until curl -s localhost:8889/v1/info; do sleep 1; done presto_ver=$(curl -s localhost:8889/v1/info | jq -r '.nodeVersion.version') plugin_dir=/usr/lib/presto/plugin presto_cassandra=$plugin_dir/cassandra/presto-cassandra-$presto_ver.jar mv $presto_cassandra $presto_cassandra.orig aws s3 cp $s3_uri/plugin/ $plugin_dir/ --recursive if [ ! -f $presto_cassandra ]; then echo "$presto_cassandra doesn't exist. Please upload it to S3." >&2 exit 1 fi # Don't specify this prop in EMR configurations to avoid UNUSED property error echo "cassandra.skip-partition-check=true" \ >> /etc/presto/conf/catalog/cassandra.properties

Slide 33

Slide 33 text

Background process ͷҟৗऴྃݕ஌ onexit() { exit_code=$? [ $exit_code -eq 0 ] && return payload="{...}" res=$(curl -sX POST --data-urlencode "payload=$payload" \ $SLACK_WEBHOOK_URL) if [ "$res" != "ok" ]; then echo "Failed to notify slack: res: ${res}, payload: ${payload}" >&2 fi } trap 'onexit' EXIT

Slide 34

Slide 34 text

εΫϦϓτͰมߋͨ͠ઃఆͷ൓ө stop presto-server start presto-server

Slide 35

Slide 35 text

·ͱΊ

Slide 36

Slide 36 text

• Repro Ͱ͸ϓογϡ௨஌ͷ഑৴ର৅நग़ͳͲʹ Presto Λ׆༻͍ͯ͠Δ • ෼ੳ΍ूܭ༻్Ͱ͸ͳ͘ Repro ͷαʔϏεͦͷ΋ͷΛࢧ͑ΔॏཁͳҰ෦ • σʔλιʔεͱͯ͠ओʹ Cassandra Λ࢖༻ • Bucketing ʹΑΓෛՙΛ෼ࢄ • Cassandra connector ͷ planning ʹ͸͕͔͔࣌ؒΔ • σʔλͷಛੑʹΑͬͯ͸ύʔςΟγϣϯ༗ແͷ֬ೝΛεΩοϓ͢ΔϝϦοτ͕େ͖͍ • ϘτϧωοΫಛఆͷͨΊʹσόοάϩάΛ༗ޮԽ͢Δͱେ͖ͳώϯτ͕ಘΒΕΔ • վ଄ͨ͠ίωΫλΛ EMR Ͱར༻͢Δʹ͸Ұ޻෉ඞཁ • Repro Ͱ͸ bootstrap actions Ͱ background process Λ্ཱͪ͛Δ͜ͱͰઃఆมߋ͍ͯ͠Δ ·ͱΊ

Slide 37

Slide 37 text

͓·͚

Slide 38

Slide 38 text

VisualVM ͷ Sampler ʹΑΔܭଌ 2020-11-20T01:04:12.310Z ... :: elapsed 7778ms :: planning 7571ms ... ͲΕ΋ߦ͖ண͘ઌ͸ CassandraPartitionManager#getCassandraPartitions ͕ͩ ෼ذ͕େมଟ͍ɾɾɾ