位置情報を使ったサービス「スマポ」をPostgreSQLで作ってみた db tech showcase 2013 Tokyo

位置情報を使ったサービス「スマポ」をPostgreSQLで作ってみた db tech showcase 2013 Tokyo

82f797c1864841a4aca94079b8508004?s=128

Yoshiyuki Asaba

November 13, 2013
Tweet

Transcript

  1. 6.
  2. 7.
  3. 10.
  4. 13.
  5. 14.

    αʔόߏ੒ • AWSΛར༻ • Instance • m1.large * 2 (master/slave,

    availability zoneΛ෼͚͍ͯΔ) • ELB→App Servers→DB Servers • EBS • provisioned IOPS volume (IOPS=1000) • ෼ੳ؀ڥ͸ผͷσʔληϯλʔʹ഑ஔ
  6. 19.

    ྫɿFILLFACTOR=90 • ͬ͘͘͢͟͝Γͨ͠આ໌Ͱ͢ 8KB 8KB … 8KB 8KB PageHeader lineptr1(lp)

    lp2 lp4 lp3 … … lpN SpecialSpace tuple1 tuple2 tuple3 tuple4 … tupleN FreeSpace (90%௒͑ͨΒ࣍ͷϖʔδ)
  7. 22.

    STREAMING REPLICATION • PostgreSQL 9.0͔Βಋೖ͞Εͨػೳ • SLAVE͸Read-onlyͳDBͱͯ͠ಈ͔͢͜ͱ΋Մೳ • hot standby

    wal_level = hot_standby max_wal_senders = 5 wal_keep_segments = 16 hot_standby = on max_standby_streaming_delay = 90s standby_mode = 'on' primary_conninfo = 'host=x.x.x.x port=5432 user=repl_user password=XXXXXXX' restore_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch "%f" "%p"' masterͷ postgresql.conf slaveͷpostgresql.conf slaveͷrecovery.conf
  8. 23.

    archive_mode = on archive_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-push

    %p' WAL-E • https://github.com/wal-e/wal-e standby_mode = 'on' restore_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-fetch "%f" "%p"' postgresql.conf recovery.conf Amazon S3 PostgreSQL (master) PostgreSQL (slave) PostgreSQL (analytics) backup-push backup-fetch wal-push wal-fetch
  9. 25.
  10. 27.
  11. 29.
  12. 30.

    ϑΝΠϧͷ৔ॴ test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as

    filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE relname = 'sample_table'; ! relname | filepath | filesize --------------+-----------------------------------------------+---------- sample_table | /var/lib/postgresql/9.2/test/base/16385/18572 | 1351680 (1 row) test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE oid IN (SELECT indexrelid FROM pg_index WHERE indrelid = (SELECT oid FROM pg_class WHERE relname = ‘sample_table')); ! relname | filepath | filesize --------------------+----------------------------------------------- +---------- sample_table_pkey | /var/lib/postgresql/9.2/test/base/16385/22203 | 16384 idx_sample_table_b | /var/lib/postgresql/9.2/test/base/16385/22205 | 16384 idx_sample_table_c | /var/lib/postgresql/9.2/test/base/16385/22206 | 16384
  13. 34.

    ࢀরͷνϡʔχϯά • EXPLAIN ANALYZEͰͲͷ࣮ߦϓϥϯ͕஗͍͔֬ೝ • table scan͕஗͍ • index͕ແ͍ or

    index͕ෆద੾ʁ • table join͕஗͍ • ANALYZE͕଍Γͳ͍͔֬ೝ • limit͕۟͋Δ৔߹͸subqueryʹͯ͠JOINͷճ਺ΛݮΒͤΔ͔ࢼ͢
  14. 35.

    EXPLAIN ANALYZE • ࣮ߦܭը͕ͦΕͧΕͲΕ͘Β͍͔͔͔࣌ؒͬͨݟΔ͜ ͱ͕Մೳ test=# EXPLAIN ANALYZE SELECT t1.a,

    t2.a FROM t1, t2 WHERE t1.a = t2.a and t1.a < 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.85..805.31 rows=95 width=8) (actual time=0.038..3.319 rows=99 loops=1) -> Index Only Scan using t1_pkey on t1 (cost=0.42..10.09 rows=95 width=4) (actual time=0.012..0.449 rows=99 loops=1) Index Cond: (a < 100) Heap Fetches: 99 -> Index Only Scan using idx_t2_a on t2 (cost=0.42..8.36 rows=1 width=4) (actual time=0.006..0.011 rows=1 loops=99) Index Cond: (a = t1.a) Heap Fetches: 99 Total runtime: 3.782 ms (8 rows) ϓϥϯϊʔυ ίετ ࣮ߦ࣌ؒ
  15. 36.

    ΍ͬͨ͜ͱ • ࢖ͬͯ΄͍͠ΠϯσοΫε͕࢖ΘΕ͍ͯͳ͍ͷͰΫΤ ϦΛॻ͖׵͑ • ΠϯσοΫεͳ͍΋ͷ͸௥Ճ • JOINͨ͋͠ͱʹLIMIT͍ͯͨ͠Օॴ͸ɺઌʹLIMIT͔ͯ͠ ΒJOIN͢ΔΑ͏ʹมߋ SELECT

    *FROM a, b WHERE a.id = b.id and a.hoge >= 1000 LIMIT10; ↓ SELECT* FROM (SELECT * FROM a WHERE a.hoge >= 1000 LIMIT10) as aa, b WHERE aa.id = b.id
  16. 38.

    PostgreSQL 9.1→9.2 PostGIS 1.5→2.0 tool down time comment pg_upgrade ˚

    postgisͷupgradeʹ ରԠ͍ͯ͠ͳ͍ʁ pg_dump/ pg_restore ☓ ఀࢭ͕࣌ؒ௕͍ slony-I ̋ ໘౗
  17. 47.

    SELECT WAITING • DB migration࣌ͳͲʹϩοΫ଴͕ͪൃੜ • psίϚϯυ • ps aux

    | grep postgres | grep waiting • SELECT * FROM pg_locks; • ΋͏গ͠ࡉ͔͘ϩοΫͷ֫ಘঢ়گΛ֬ೝͰ͖Δ
  18. 48.

    ΫΤϦΩϟϯηϧ • ΫΤϦΛΩϟϯηϧ • SELECT pg_cancel_backend(pid); • or kill -INT

    pid • ΫΤϦΛΩϟϯηϧͯ͠όοΫΤϯυϓϩηεΛམͱ͢ • SELECT pg_terminate_backend(pid) • or kill -TERM pid
  19. 52.

    ࣮ࡍʹى͖ͨέʔεʢ؆ུ൛ʣ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock

    on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (1), (2), (3) ) except ( select x from foo where date = '...')) ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (2), (3), (1) ) except( select x from foo where date = '...')) !
  20. 53.

    ௐࠪ • લఏɿ։ൃ؀ڥͰ΍Δ͜ͱ • postgresql.conf • log_statement = all •

    log_line_prefixʹ%pΛ͚ͭΔ • COMMIT௚લͷpg_locksΛ֬ೝ
  21. 56.

    BACK TRACE (gdb) bt … #5 0x00000000006f0ada in LockAcquire (locktag=0x7fff7ce28d20,

    lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:662 #6 0x00000000006effec in XactLockTableWait (xid=768) at lmgr.c:495 #7 0x00000000004890e4 in _bt_doinsert (rel=0x7fa0c1634838, itup=0x2122468, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7fa0c162f820) at nbtinsert.c:168 #8 0x000000000048f3b4 in btinsert (fcinfo=0x7fff7ce28e40) at nbtree.c:257 #9 0x0000000000819bb5 in FunctionCall6Coll (flinfo=0x2115650, collation=0, arg1=140328416004152, arg2=140735288611488, arg3=140735288611840, arg4=34743148, arg5=140328415983648, arg6=1) at fmgr.c:1439 #10 0x0000000000487c64 in index_insert (indexRelation=0x7fa0c1634838, values=0x7fff7ce292a0, isnull=0x7fff7ce29400 "", heap_t_ctid=0x212236c, heapRelation=0x7fa0c162f820, checkUnique=UNIQUE_CHECK_YES) at indexam.c:216 #11 0x00000000005f29aa in ExecInsertIndexTuples (slot=0x21167c0, tupleid=0x212236c, estate=0x2115e60) at execUtils.c:1087 #12 0x0000000000605273 in ExecInsert (slot=0x21167c0, planSlot=0x21167c0, estate=0x2115e60, canSetTag=1 '\001') at nodeModifyTable.c:248
  22. 57.
  23. 58.

    create table tt(a text unique); select pg_backend_pid(); begin; insert into

    tt values (‘aa’), (‘bb’) select pg_backend_pid(); begin; insert into tt values (‘bb’), (‘aa’) (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘aa’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘bb’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c psql psql gdb gdb
  24. 59.

    ճආํ๏ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock

    on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (1), (2), (3) ) except ( select x from foo where date = ‘…’)) ORDER BY x ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (2), (3), (1) ) except( select x from foo where date = ‘...')) ORDER BY x !
  25. 62.

    AWS • instanceΛscale up • m1.large → m3.2xlarge • app

    server૿ڧ • Elastic LoadBalancerͷpre-warming
  26. 63.

    PERFORMANCE TEST • pgbenchΛ࢖༻ • tps(transaction per second)Λ஌Δ͜ͱ͕Ͱ͖Δ • -f

    ࣮ߦ͍ͨ͠SQLϑΝΠϧ • -c ฒྻ਺ • -t τϥϯβΫγϣϯ਺ • New Relic/CloudWatchͰύϑΥʔϚϯεΛ؂ࢹ
  27. 69.

    postgresql.conf • memory • shared_buffer=౥ࡌϝϞϦͷ1/4ఔ౓ • wal_buffer=16MB • checkpoint •

    checkpoint_segments = 64 • checkpoint_timeout = 1h • checkpoint_completion_target = 0.9 ! • planner • effective_cache_size=page cacheͷαΠ ζ • random_page_cost=2.0 • lock • deadlock_timeout = 10s
  28. 71.
  29. 73.
  30. 75.