Upgrade to Pro — share decks privately, control downloads, hide ads and more …

位置情報を使ったサービス「スマポ」をPostgreSQLで作ってみた db tech showcase 2013 Tokyo

位置情報を使ったサービス「スマポ」をPostgreSQLで作ってみた db tech showcase 2013 Tokyo

Yoshiyuki Asaba

November 13, 2013
Tweet

More Decks by Yoshiyuki Asaba

Other Decks in Technology

Transcript

  1. Ґஔ৘ใΛ࢖ͬͨαʔϏεʮεϚ ϙʯΛPostgreSQLͰ࡞ͬͯΈͨ גࣜձࣾεϙοτϥΠτ ઙӋٛ೭

  2. ͸͡Ίʹ • ͜ͷࢿྉ͸εϚϙͰ͸͜͏΍ͬͯӡ༻͍ͯ͠·͢ͱ͍ ͏ࣄྫΛ঺հ͠·͢ɻ • ଞͷ؀ڥͰ΋ͦͷ··࢖͑Δ͔ʹ͍ͭͯ͸Έͳ͞Μͷ ͝൑அͷԼɺࢀߟʹ͍͚ͯͨͩ͠Δͱ޾͍Ͱ͕͢ɺԿ ͔͋ͬͯ΋ແอূͰ͓ئ͍͠·͢ɻ

  3. ձࣾ঺հ

  4. ձࣾ঺հ http://www.smapo.jp/

  5. εϚϙ঺հ

  6. None
  7. None
  8. ͳͥPostgreSQL͔

  9. PostgreSQLબఆཧ༝ • Ґஔ৘ใΛѻ͍͍ͨ • ʢಛʹॳظஈ֊Ͱ͸ʣαʔϏεӡӦͷϥϯχϯάίε τΛ཈͍͑ͨ • ϚχϡΞϧ͕ॆ࣮͍ͯ͠Δɺ౳

  10. POSTGIS

  11. ݱࡏ஍͔Β ͍ۙॱʹιʔτ shop A shop B shop C 100m 200m

    300m
  12. http://postgis.net/docs/manual-2.0/ST_Distance.html

  13. ӡ༻ฤ

  14. αʔόߏ੒ • AWSΛར༻ • Instance • m1.large * 2 (master/slave,

    availability zoneΛ෼͚͍ͯΔ) • ELB→App Servers→DB Servers • EBS • provisioned IOPS volume (IOPS=1000) • ෼ੳ؀ڥ͸ผͷσʔληϯλʔʹ഑ஔ
  15. MONITORING

  16. NEWRELIC • EnterpriseDB ͕ެ։͍ͯ͠ Δnewrelic pluginΛར༻ • webappͷύ ϑΥʔϚϯε ؂ࢹ΋Մೳ

  17. VACUUM/ANALYZE

  18. VACUUM/ANALYZE • VACUUM/ANALYZE͸ࠓͷͱ͜Ζautovacuum೚ͤ • UPDATEͷଟ͍ςʔϒϧ͸FILLFACTORΛઃఆ • HOT UPDATEͱݺ͹ΕΔ࠷దԽΛޮ͔ͤΔͨΊ • http://lets.postgresql.jp/documents/tutorial/hot_1/

  19. ྫɿFILLFACTOR=90 • ͬ͘͘͢͟͝Γͨ͠આ໌Ͱ͢ 8KB 8KB … 8KB 8KB PageHeader lineptr1(lp)

    lp2 lp4 lp3 … … lpN SpecialSpace tuple1 tuple2 tuple3 tuple4 … tupleN FreeSpace (90%௒͑ͨΒ࣍ͷϖʔδ)
  20. REPLICATION

  21. REPLICATION != BACKUP • replication͸৑௕Խɾෛՙ෼ࢄ͕໨త • backup͸Φϖϛεͨ࣌͠ʹ΋෮چͰ͖ͳ͍ͱ͍͚ͳ͍ • PITR(Point In

    Time Recovery)
  22. STREAMING REPLICATION • PostgreSQL 9.0͔Βಋೖ͞Εͨػೳ • SLAVE͸Read-onlyͳDBͱͯ͠ಈ͔͢͜ͱ΋Մೳ • hot standby

    wal_level = hot_standby max_wal_senders = 5 wal_keep_segments = 16 hot_standby = on max_standby_streaming_delay = 90s standby_mode = 'on' primary_conninfo = 'host=x.x.x.x port=5432 user=repl_user password=XXXXXXX' restore_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch "%f" "%p"' masterͷ postgresql.conf slaveͷpostgresql.conf slaveͷrecovery.conf
  23. archive_mode = on archive_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-push

    %p' WAL-E • https://github.com/wal-e/wal-e standby_mode = 'on' restore_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-fetch "%f" "%p"' postgresql.conf recovery.conf Amazon S3 PostgreSQL (master) PostgreSQL (slave) PostgreSQL (analytics) backup-push backup-fetch wal-push wal-fetch
  24. REPLICATION؂ࢹ • select * from pg_stat_replication; • replication͕ਖ਼ৗʹಈ͍͍ͯΕ͹slaveͷ਺͚ͩ݁Ռ͕ ؼͬͯ͘Δ •

    masterͰ࣮ߦ
  25. ANALYTICS

  26. ANALYTICS DB • fluentdͰϩάσʔλΛճऩ͠ɺ෼ੳઐ༻ͷPostgreSQL ΁֨ೲ • ֤छKPIΛPostgreSQLͰूܭ • Ϛγϯύϫʔ͕ඞཁͳͷͰɺ෺ཧαʔόͰߏங

  27. WARMUP

  28. σʔλΛΩϟογϡʹࡌͤΔ • cache? • PostgreSQLͷshared_buffer • OSͷϖʔδΩϟογϡ(free -mͰ֬ೝ) • ଞʹ΋͋Δ͕লུ

    • αʔό࠶ىಈ࣌ͳͲ͸Ωϟογϡ͕ΫϦΞ͞Ε͍ͯΔ

  29. ࡌͤํ • shared_buffer • SELECT count(*) FROM sample_table • ΠϯσοΫε͸ࡌΒͳ͍ͷͰ஫ҙ

    • page cache • (ionice -c 3) cat ෺ཧϑΝΠϧ > /dev/null
  30. ϑΝΠϧͷ৔ॴ test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as

    filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE relname = 'sample_table'; ! relname | filepath | filesize --------------+-----------------------------------------------+---------- sample_table | /var/lib/postgresql/9.2/test/base/16385/18572 | 1351680 (1 row) test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE oid IN (SELECT indexrelid FROM pg_index WHERE indrelid = (SELECT oid FROM pg_class WHERE relname = ‘sample_table')); ! relname | filepath | filesize --------------------+----------------------------------------------- +---------- sample_table_pkey | /var/lib/postgresql/9.2/test/base/16385/22203 | 16384 idx_sample_table_b | /var/lib/postgresql/9.2/test/base/16385/22205 | 16384 idx_sample_table_c | /var/lib/postgresql/9.2/test/base/16385/22206 | 16384
  31. ӡ༻τϥϒϧू

  32. #1 SLOW QUERY

  33. ԿΛݟΑ͏͔ʁ • newrelicͷϨεϙϯελΠϜ • slow query log • log_min_duration_statement =

    1s • newrelic/cloudwatchͷαʔόεςʔλε
  34. ࢀরͷνϡʔχϯά • EXPLAIN ANALYZEͰͲͷ࣮ߦϓϥϯ͕஗͍͔֬ೝ • table scan͕஗͍ • index͕ແ͍ or

    index͕ෆద੾ʁ • table join͕஗͍ • ANALYZE͕଍Γͳ͍͔֬ೝ • limit͕۟͋Δ৔߹͸subqueryʹͯ͠JOINͷճ਺ΛݮΒͤΔ͔ࢼ͢
  35. EXPLAIN ANALYZE • ࣮ߦܭը͕ͦΕͧΕͲΕ͘Β͍͔͔͔࣌ؒͬͨݟΔ͜ ͱ͕Մೳ test=# EXPLAIN ANALYZE SELECT t1.a,

    t2.a FROM t1, t2 WHERE t1.a = t2.a and t1.a < 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.85..805.31 rows=95 width=8) (actual time=0.038..3.319 rows=99 loops=1) -> Index Only Scan using t1_pkey on t1 (cost=0.42..10.09 rows=95 width=4) (actual time=0.012..0.449 rows=99 loops=1) Index Cond: (a < 100) Heap Fetches: 99 -> Index Only Scan using idx_t2_a on t2 (cost=0.42..8.36 rows=1 width=4) (actual time=0.006..0.011 rows=1 loops=99) Index Cond: (a = t1.a) Heap Fetches: 99 Total runtime: 3.782 ms (8 rows) ϓϥϯϊʔυ ίετ ࣮ߦ࣌ؒ
  36. ΍ͬͨ͜ͱ • ࢖ͬͯ΄͍͠ΠϯσοΫε͕࢖ΘΕ͍ͯͳ͍ͷͰΫΤ ϦΛॻ͖׵͑ • ΠϯσοΫεͳ͍΋ͷ͸௥Ճ • JOINͨ͋͠ͱʹLIMIT͍ͯͨ͠Օॴ͸ɺઌʹLIMIT͔ͯ͠ ΒJOIN͢ΔΑ͏ʹมߋ SELECT

    *FROM a, b WHERE a.id = b.id and a.hoge >= 1000 LIMIT10; ↓ SELECT* FROM (SELECT * FROM a WHERE a.hoge >= 1000 LIMIT10) as aa, b WHERE aa.id = b.id
  37. #2 MAJOR VERSION UP

  38. PostgreSQL 9.1→9.2 PostGIS 1.5→2.0 tool down time comment pg_upgrade ˚

    postgisͷupgradeʹ ରԠ͍ͯ͠ͳ͍ʁ pg_dump/ pg_restore ☓ ఀࢭ͕࣌ؒ௕͍ slony-I ̋ ໘౗
  39. MASTER੾Γସ͑ PostgreSQL (چmaster) PostgreSQL (چslave) PostgreSQL (৽master) PostgreSQL (৽slave) AppServer

    slonyͷreplication streaming replication
  40. MASTER੾Γସ͑ PostgreSQL (چmaster) PostgreSQL (چslave) PostgreSQL (৽master) PostgreSQL (৽slave) AppServer

    slonyͷreplication ఀࢭ streaming replication
  41. ݕূ؀ڥͰ࣮ݧ

  42. ໰୊ͳ͠ʂ

  43. ੾Γସ͑౰೔

  44. SIGSEGV!!!

  45. ݁ہpg_dump/pg_restore΍Γ· ͨ͠ɻ͢Έ·ͤΜɻ 9.2→9.3ͰϦϕϯδ༧ఆɻ

  46. #3 LOCK଴ͪ໰୊

  47. SELECT WAITING • DB migration࣌ͳͲʹϩοΫ଴͕ͪൃੜ • psίϚϯυ • ps aux

    | grep postgres | grep waiting • SELECT * FROM pg_locks; • ΋͏গ͠ࡉ͔͘ϩοΫͷ֫ಘঢ়گΛ֬ೝͰ͖Δ
  48. ΫΤϦΩϟϯηϧ • ΫΤϦΛΩϟϯηϧ • SELECT pg_cancel_backend(pid); • or kill -INT

    pid • ΫΤϦΛΩϟϯηϧͯ͠όοΫΤϯυϓϩηεΛམͱ͢ • SELECT pg_terminate_backend(pid) • or kill -TERM pid
  49. #4 DEADLOCK

  50. ERROR: deadlock detected

  51. σουϩοΫͷݪҼ • ϩοΫΛऔΔॱ൪͕ҟͳΔͨΊ • ςʔϒϧϩοΫɺߦϩοΫͳͲ • ϩοΫ֫ಘ • ໌ࣔతͳϩοΫ •

    ҉໧తͳϩοΫ
  52. ࣮ࡍʹى͖ͨέʔεʢ؆ུ൛ʣ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock

    on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (1), (2), (3) ) except ( select x from foo where date = '...')) ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (2), (3), (1) ) except( select x from foo where date = '...')) !
  53. ௐࠪ • લఏɿ։ൃ؀ڥͰ΍Δ͜ͱ • postgresql.conf • log_statement = all •

    log_line_prefixʹ%pΛ͚ͭΔ • COMMIT௚લͷpg_locksΛ֬ೝ
  54. Ұ࣍ௐࠪͰΘ͔ͬͨ͜ͱ • ςʔϒϧϩοΫͷॱ൪͸ಉ͡ • ͦ΋ͦ΋ڧ͍ϩοΫϨϕϧͰςʔϒϧϩοΫऔͬͯͳ͍ • INSERTͳͷͰfooʹର͢ΔߦϩοΫ͸͋Δͷ͔ʁ • ཁ͸Α͘Θ͔Βͳ͔ͬͨ •

    ͨͩɺ͔֬ʹinsertͰยํ͕଴ͨ͞ΕΔ
  55. ࢭ·͍ͬͯΔՕॴΛௐ΂Δ • PostgreSQLΛσόοάϏϧυ • CFLAGS=-O0 ./configure —prefix=$HOME —enable- debug •

    ยํͰSELECT pg_backend_pid() • gdb -p backendͷpid
  56. BACK TRACE (gdb) bt … #5 0x00000000006f0ada in LockAcquire (locktag=0x7fff7ce28d20,

    lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:662 #6 0x00000000006effec in XactLockTableWait (xid=768) at lmgr.c:495 #7 0x00000000004890e4 in _bt_doinsert (rel=0x7fa0c1634838, itup=0x2122468, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7fa0c162f820) at nbtinsert.c:168 #8 0x000000000048f3b4 in btinsert (fcinfo=0x7fff7ce28e40) at nbtree.c:257 #9 0x0000000000819bb5 in FunctionCall6Coll (flinfo=0x2115650, collation=0, arg1=140328416004152, arg2=140735288611488, arg3=140735288611840, arg4=34743148, arg5=140328415983648, arg6=1) at fmgr.c:1439 #10 0x0000000000487c64 in index_insert (indexRelation=0x7fa0c1634838, values=0x7fff7ce292a0, isnull=0x7fff7ce29400 "", heap_t_ctid=0x212236c, heapRelation=0x7fa0c162f820, checkUnique=UNIQUE_CHECK_YES) at indexam.c:216 #11 0x00000000005f29aa in ExecInsertIndexTuples (slot=0x21167c0, tupleid=0x212236c, estate=0x2115e60) at execUtils.c:1087 #12 0x0000000000605273 in ExecInsert (slot=0x21167c0, planSlot=0x21167c0, estate=0x2115e60, canSetTag=1 '\001') at nodeModifyTable.c:248
  57. None
  58. create table tt(a text unique); select pg_backend_pid(); begin; insert into

    tt values (‘aa’), (‘bb’) select pg_backend_pid(); begin; insert into tt values (‘bb’), (‘aa’) (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘aa’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘bb’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c psql psql gdb gdb
  59. ճආํ๏ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock

    on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (1), (2), (3) ) except ( select x from foo where date = ‘…’)) ORDER BY x ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (2), (3), (1) ) except( select x from foo where date = ‘...')) ORDER BY x !
  60. #5 TV์ө

  61. 9/9 19:00-21:00 NTV ༗٢θϛ

  62. AWS • instanceΛscale up • m1.large → m3.2xlarge • app

    server૿ڧ • Elastic LoadBalancerͷpre-warming
  63. PERFORMANCE TEST • pgbenchΛ࢖༻ • tps(transaction per second)Λ஌Δ͜ͱ͕Ͱ͖Δ • -f

    ࣮ߦ͍ͨ͠SQLϑΝΠϧ • -c ฒྻ਺ • -t τϥϯβΫγϣϯ਺ • New Relic/CloudWatchͰύϑΥʔϚϯεΛ؂ࢹ
  64. ౰೔ • warmupΛ௚લͰ࣮ࢪ • master/slave • ͋ͱ͸فΔͷΈ

  65. ৼΓฦΓ • 3000over req/sec • app server͕࣋ͪͨ͑͜ΒΕͳ͍࣌ؒଳ(਺෼)͕ൃੜ • ΰʔϧσϯ์ૹͷΞΫηε͸͛͢ʔ •

    DBʹ͍ͭͯ͸ɺcheckpointͷνϡʔχϯά͕؁͘ɺ diskॻ͖ࠐΈ͕݁ߏൃੜͨ͠
  66. #6 TV์ө(࠶์ૹ)

  67. 10/614:00-16:00 NTV ༗٢θϛ(࠶)

  68. ࠶์ૹΛͨ·ͨ·ؾ͕͍ͭͨ ͷ͕2೔લ ʢڭ͑ͯΑɻɻɻʣ

  69. postgresql.conf • memory • shared_buffer=౥ࡌϝϞϦͷ1/4ఔ౓ • wal_buffer=16MB • checkpoint •

    checkpoint_segments = 64 • checkpoint_timeout = 1h • checkpoint_completion_target = 0.9 ! • planner • effective_cache_size=page cacheͷαΠ ζ • random_page_cost=2.0 • lock • deadlock_timeout = 10s
  70. ৼΓฦΓ • ໰୊ͳ͘ࡹ͚ͨ • ࠶์ૹ͸࿈བྷ͜ͳ͍ͷͰཁ஫ҙ 4000%?

  71. ·ͱΊ

  72. PostgreSQL࢖ͬͯΈͯ • ελʔτΞοϓͷαʔϏεͰ࢖͏RDBMSͱͯ͠Α͍ • ػೳ໘ɺίετ໘ • ΋ͪΖΜMySQL΋͍͢͝ͱࢥ͍·͢ • ӡ༻΋޻෉࣍ୈͰ৭ʑͰ͖Δ •

    Ұ୆Ͱ݁ߏͳϦΫΤετ਺Λ଱͑ΒΕΔʂ
  73. એ఻

  74. ΤϯδχΞઈࢍืूதʂ (ಛʹΠϯϑϥ΍Γ͍ͨਓ) http://www.smapo.jp/recruit/index.html

  75. ͓ΘΓ