Slide 1

Slide 1 text

Ґஔ৘ใΛ࢖ͬͨαʔϏεʮεϚ ϙʯΛPostgreSQLͰ࡞ͬͯΈͨ גࣜձࣾεϙοτϥΠτ ઙӋٛ೭

Slide 2

Slide 2 text

͸͡Ίʹ • ͜ͷࢿྉ͸εϚϙͰ͸͜͏΍ͬͯӡ༻͍ͯ͠·͢ͱ͍ ͏ࣄྫΛ঺հ͠·͢ɻ • ଞͷ؀ڥͰ΋ͦͷ··࢖͑Δ͔ʹ͍ͭͯ͸Έͳ͞Μͷ ͝൑அͷԼɺࢀߟʹ͍͚ͯͨͩ͠Δͱ޾͍Ͱ͕͢ɺԿ ͔͋ͬͯ΋ແอূͰ͓ئ͍͠·͢ɻ

Slide 3

Slide 3 text

ձࣾ঺հ

Slide 4

Slide 4 text

ձࣾ঺հ http://www.smapo.jp/

Slide 5

Slide 5 text

εϚϙ঺հ

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

ͳͥPostgreSQL͔

Slide 9

Slide 9 text

PostgreSQLબఆཧ༝ • Ґஔ৘ใΛѻ͍͍ͨ • ʢಛʹॳظஈ֊Ͱ͸ʣαʔϏεӡӦͷϥϯχϯάίε τΛ཈͍͑ͨ • ϚχϡΞϧ͕ॆ࣮͍ͯ͠Δɺ౳

Slide 10

Slide 10 text

POSTGIS

Slide 11

Slide 11 text

ݱࡏ஍͔Β ͍ۙॱʹιʔτ shop A shop B shop C 100m 200m 300m

Slide 12

Slide 12 text

http://postgis.net/docs/manual-2.0/ST_Distance.html

Slide 13

Slide 13 text

ӡ༻ฤ

Slide 14

Slide 14 text

αʔόߏ੒ • AWSΛར༻ • Instance • m1.large * 2 (master/slave, availability zoneΛ෼͚͍ͯΔ) • ELB→App Servers→DB Servers • EBS • provisioned IOPS volume (IOPS=1000) • ෼ੳ؀ڥ͸ผͷσʔληϯλʔʹ഑ஔ

Slide 15

Slide 15 text

MONITORING

Slide 16

Slide 16 text

NEWRELIC • EnterpriseDB ͕ެ։͍ͯ͠ Δnewrelic pluginΛར༻ • webappͷύ ϑΥʔϚϯε ؂ࢹ΋Մೳ

Slide 17

Slide 17 text

VACUUM/ANALYZE

Slide 18

Slide 18 text

VACUUM/ANALYZE • VACUUM/ANALYZE͸ࠓͷͱ͜Ζautovacuum೚ͤ • UPDATEͷଟ͍ςʔϒϧ͸FILLFACTORΛઃఆ • HOT UPDATEͱݺ͹ΕΔ࠷దԽΛޮ͔ͤΔͨΊ • http://lets.postgresql.jp/documents/tutorial/hot_1/

Slide 19

Slide 19 text

ྫɿFILLFACTOR=90 • ͬ͘͘͢͟͝Γͨ͠આ໌Ͱ͢ 8KB 8KB … 8KB 8KB PageHeader lineptr1(lp) lp2 lp4 lp3 … … lpN SpecialSpace tuple1 tuple2 tuple3 tuple4 … tupleN FreeSpace (90%௒͑ͨΒ࣍ͷϖʔδ)

Slide 20

Slide 20 text

REPLICATION

Slide 21

Slide 21 text

REPLICATION != BACKUP • replication͸৑௕Խɾෛՙ෼ࢄ͕໨త • backup͸Φϖϛεͨ࣌͠ʹ΋෮چͰ͖ͳ͍ͱ͍͚ͳ͍ • PITR(Point In Time Recovery)

Slide 22

Slide 22 text

STREAMING REPLICATION • PostgreSQL 9.0͔Βಋೖ͞Εͨػೳ • SLAVE͸Read-onlyͳDBͱͯ͠ಈ͔͢͜ͱ΋Մೳ • hot standby wal_level = hot_standby max_wal_senders = 5 wal_keep_segments = 16 hot_standby = on max_standby_streaming_delay = 90s standby_mode = 'on' primary_conninfo = 'host=x.x.x.x port=5432 user=repl_user password=XXXXXXX' restore_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch "%f" "%p"' masterͷ postgresql.conf slaveͷpostgresql.conf slaveͷrecovery.conf

Slide 23

Slide 23 text

archive_mode = on archive_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-push %p' WAL-E • https://github.com/wal-e/wal-e standby_mode = 'on' restore_command = 'envdir /etc/wal-e.d/ env /usr/local/bin/wal-e wal-fetch "%f" "%p"' postgresql.conf recovery.conf Amazon S3 PostgreSQL (master) PostgreSQL (slave) PostgreSQL (analytics) backup-push backup-fetch wal-push wal-fetch

Slide 24

Slide 24 text

REPLICATION؂ࢹ • select * from pg_stat_replication; • replication͕ਖ਼ৗʹಈ͍͍ͯΕ͹slaveͷ਺͚ͩ݁Ռ͕ ؼͬͯ͘Δ • masterͰ࣮ߦ

Slide 25

Slide 25 text

ANALYTICS

Slide 26

Slide 26 text

ANALYTICS DB • fluentdͰϩάσʔλΛճऩ͠ɺ෼ੳઐ༻ͷPostgreSQL ΁֨ೲ • ֤छKPIΛPostgreSQLͰूܭ • Ϛγϯύϫʔ͕ඞཁͳͷͰɺ෺ཧαʔόͰߏங

Slide 27

Slide 27 text

WARMUP

Slide 28

Slide 28 text

σʔλΛΩϟογϡʹࡌͤΔ • cache? • PostgreSQLͷshared_buffer • OSͷϖʔδΩϟογϡ(free -mͰ֬ೝ) • ଞʹ΋͋Δ͕লུ • αʔό࠶ىಈ࣌ͳͲ͸Ωϟογϡ͕ΫϦΞ͞Ε͍ͯΔ


Slide 29

Slide 29 text

ࡌͤํ • shared_buffer • SELECT count(*) FROM sample_table • ΠϯσοΫε͸ࡌΒͳ͍ͷͰ஫ҙ • page cache • (ionice -c 3) cat ෺ཧϑΝΠϧ > /dev/null

Slide 30

Slide 30 text

ϑΝΠϧͷ৔ॴ test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE relname = 'sample_table'; ! relname | filepath | filesize --------------+-----------------------------------------------+---------- sample_table | /var/lib/postgresql/9.2/test/base/16385/18572 | 1351680 (1 row) test=# SELECT relname, current_setting('data_directory') || '/' || pg_relation_filepath(oid) as filepath, pg_relation_size(oid) as filesize FROM pg_class WHERE oid IN (SELECT indexrelid FROM pg_index WHERE indrelid = (SELECT oid FROM pg_class WHERE relname = ‘sample_table')); ! relname | filepath | filesize --------------------+----------------------------------------------- +---------- sample_table_pkey | /var/lib/postgresql/9.2/test/base/16385/22203 | 16384 idx_sample_table_b | /var/lib/postgresql/9.2/test/base/16385/22205 | 16384 idx_sample_table_c | /var/lib/postgresql/9.2/test/base/16385/22206 | 16384

Slide 31

Slide 31 text

ӡ༻τϥϒϧू

Slide 32

Slide 32 text

#1 SLOW QUERY

Slide 33

Slide 33 text

ԿΛݟΑ͏͔ʁ • newrelicͷϨεϙϯελΠϜ • slow query log • log_min_duration_statement = 1s • newrelic/cloudwatchͷαʔόεςʔλε

Slide 34

Slide 34 text

ࢀরͷνϡʔχϯά • EXPLAIN ANALYZEͰͲͷ࣮ߦϓϥϯ͕஗͍͔֬ೝ • table scan͕஗͍ • index͕ແ͍ or index͕ෆద੾ʁ • table join͕஗͍ • ANALYZE͕଍Γͳ͍͔֬ೝ • limit͕۟͋Δ৔߹͸subqueryʹͯ͠JOINͷճ਺ΛݮΒͤΔ͔ࢼ͢

Slide 35

Slide 35 text

EXPLAIN ANALYZE • ࣮ߦܭը͕ͦΕͧΕͲΕ͘Β͍͔͔͔࣌ؒͬͨݟΔ͜ ͱ͕Մೳ test=# EXPLAIN ANALYZE SELECT t1.a, t2.a FROM t1, t2 WHERE t1.a = t2.a and t1.a < 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.85..805.31 rows=95 width=8) (actual time=0.038..3.319 rows=99 loops=1) -> Index Only Scan using t1_pkey on t1 (cost=0.42..10.09 rows=95 width=4) (actual time=0.012..0.449 rows=99 loops=1) Index Cond: (a < 100) Heap Fetches: 99 -> Index Only Scan using idx_t2_a on t2 (cost=0.42..8.36 rows=1 width=4) (actual time=0.006..0.011 rows=1 loops=99) Index Cond: (a = t1.a) Heap Fetches: 99 Total runtime: 3.782 ms (8 rows) ϓϥϯϊʔυ ίετ ࣮ߦ࣌ؒ

Slide 36

Slide 36 text

΍ͬͨ͜ͱ • ࢖ͬͯ΄͍͠ΠϯσοΫε͕࢖ΘΕ͍ͯͳ͍ͷͰΫΤ ϦΛॻ͖׵͑ • ΠϯσοΫεͳ͍΋ͷ͸௥Ճ • JOINͨ͋͠ͱʹLIMIT͍ͯͨ͠Օॴ͸ɺઌʹLIMIT͔ͯ͠ ΒJOIN͢ΔΑ͏ʹมߋ SELECT *FROM a, b WHERE a.id = b.id and a.hoge >= 1000 LIMIT10; ↓ SELECT* FROM (SELECT * FROM a WHERE a.hoge >= 1000 LIMIT10) as aa, b WHERE aa.id = b.id

Slide 37

Slide 37 text

#2 MAJOR VERSION UP

Slide 38

Slide 38 text

PostgreSQL 9.1→9.2 PostGIS 1.5→2.0 tool down time comment pg_upgrade ˚ postgisͷupgradeʹ ରԠ͍ͯ͠ͳ͍ʁ pg_dump/ pg_restore ☓ ఀࢭ͕࣌ؒ௕͍ slony-I ̋ ໘౗

Slide 39

Slide 39 text

MASTER੾Γସ͑ PostgreSQL (چmaster) PostgreSQL (چslave) PostgreSQL (৽master) PostgreSQL (৽slave) AppServer slonyͷreplication streaming replication

Slide 40

Slide 40 text

MASTER੾Γସ͑ PostgreSQL (چmaster) PostgreSQL (چslave) PostgreSQL (৽master) PostgreSQL (৽slave) AppServer slonyͷreplication ఀࢭ streaming replication

Slide 41

Slide 41 text

ݕূ؀ڥͰ࣮ݧ

Slide 42

Slide 42 text

໰୊ͳ͠ʂ

Slide 43

Slide 43 text

੾Γସ͑౰೔

Slide 44

Slide 44 text

SIGSEGV!!!

Slide 45

Slide 45 text

݁ہpg_dump/pg_restore΍Γ· ͨ͠ɻ͢Έ·ͤΜɻ 9.2→9.3ͰϦϕϯδ༧ఆɻ

Slide 46

Slide 46 text

#3 LOCK଴ͪ໰୊

Slide 47

Slide 47 text

SELECT WAITING • DB migration࣌ͳͲʹϩοΫ଴͕ͪൃੜ • psίϚϯυ • ps aux | grep postgres | grep waiting • SELECT * FROM pg_locks; • ΋͏গ͠ࡉ͔͘ϩοΫͷ֫ಘঢ়گΛ֬ೝͰ͖Δ

Slide 48

Slide 48 text

ΫΤϦΩϟϯηϧ • ΫΤϦΛΩϟϯηϧ • SELECT pg_cancel_backend(pid); • or kill -INT pid • ΫΤϦΛΩϟϯηϧͯ͠όοΫΤϯυϓϩηεΛམͱ͢ • SELECT pg_terminate_backend(pid) • or kill -TERM pid

Slide 49

Slide 49 text

#4 DEADLOCK

Slide 50

Slide 50 text

ERROR: deadlock detected

Slide 51

Slide 51 text

σουϩοΫͷݪҼ • ϩοΫΛऔΔॱ൪͕ҟͳΔͨΊ • ςʔϒϧϩοΫɺߦϩοΫͳͲ • ϩοΫ֫ಘ • ໌ࣔతͳϩοΫ • ҉໧తͳϩοΫ

Slide 52

Slide 52 text

࣮ࡍʹى͖ͨέʔεʢ؆ུ൛ʣ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (1), (2), (3) ) except ( select x from foo where date = '...')) ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where id in (( values (2), (3), (1) ) except( select x from foo where date = '...')) !

Slide 53

Slide 53 text

ௐࠪ • લఏɿ։ൃ؀ڥͰ΍Δ͜ͱ • postgresql.conf • log_statement = all • log_line_prefixʹ%pΛ͚ͭΔ • COMMIT௚લͷpg_locksΛ֬ೝ

Slide 54

Slide 54 text

Ұ࣍ௐࠪͰΘ͔ͬͨ͜ͱ • ςʔϒϧϩοΫͷॱ൪͸ಉ͡ • ͦ΋ͦ΋ڧ͍ϩοΫϨϕϧͰςʔϒϧϩοΫऔͬͯͳ͍ • INSERTͳͷͰfooʹର͢ΔߦϩοΫ͸͋Δͷ͔ʁ • ཁ͸Α͘Θ͔Βͳ͔ͬͨ • ͨͩɺ͔֬ʹinsertͰยํ͕଴ͨ͞ΕΔ

Slide 55

Slide 55 text

ࢭ·͍ͬͯΔՕॴΛௐ΂Δ • PostgreSQLΛσόοάϏϧυ • CFLAGS=-O0 ./configure —prefix=$HOME —enable- debug • ยํͰSELECT pg_backend_pid() • gdb -p backendͷpid

Slide 56

Slide 56 text

BACK TRACE (gdb) bt … #5 0x00000000006f0ada in LockAcquire (locktag=0x7fff7ce28d20, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:662 #6 0x00000000006effec in XactLockTableWait (xid=768) at lmgr.c:495 #7 0x00000000004890e4 in _bt_doinsert (rel=0x7fa0c1634838, itup=0x2122468, checkUnique=UNIQUE_CHECK_YES, heapRel=0x7fa0c162f820) at nbtinsert.c:168 #8 0x000000000048f3b4 in btinsert (fcinfo=0x7fff7ce28e40) at nbtree.c:257 #9 0x0000000000819bb5 in FunctionCall6Coll (flinfo=0x2115650, collation=0, arg1=140328416004152, arg2=140735288611488, arg3=140735288611840, arg4=34743148, arg5=140328415983648, arg6=1) at fmgr.c:1439 #10 0x0000000000487c64 in index_insert (indexRelation=0x7fa0c1634838, values=0x7fff7ce292a0, isnull=0x7fff7ce29400 "", heap_t_ctid=0x212236c, heapRelation=0x7fa0c162f820, checkUnique=UNIQUE_CHECK_YES) at indexam.c:216 #11 0x00000000005f29aa in ExecInsertIndexTuples (slot=0x21167c0, tupleid=0x212236c, estate=0x2115e60) at execUtils.c:1087 #12 0x0000000000605273 in ExecInsert (slot=0x21167c0, planSlot=0x21167c0, estate=0x2115e60, canSetTag=1 '\001') at nodeModifyTable.c:248

Slide 57

Slide 57 text

No content

Slide 58

Slide 58 text

create table tt(a text unique); select pg_backend_pid(); begin; insert into tt values (‘aa’), (‘bb’) select pg_backend_pid(); begin; insert into tt values (‘bb’), (‘aa’) (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘aa’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c (gdb) b _bt_doinsert Breakpoint 1 at 0x488fc7: file nbtinsert.c, line 106. (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 (gdb) c Continuing. ! Breakpoint 1, _bt_doinsert (….) at nbtinsert.c:106 ‘bb’Λinsertͨ͠ͱ͜ΖͰࢭΊΔ ! (gdb) c psql psql gdb gdb

Slide 59

Slide 59 text

ճආํ๏ 2013-10-30 00:11:22 JST DETAIL: Process 3225 waits for ShareLock on transaction 11759339; blocked by process 3040. Process 3040 waits for ShareLock on transaction 11759337; blocked by process 3225. Process 3225: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (1), (2), (3) ) except ( select x from foo where date = ‘…’)) ORDER BY x ! Process 3040: insert into foo (x,y,z) select x, 12345, z from bar where x in (( values (2), (3), (1) ) except( select x from foo where date = ‘...')) ORDER BY x !

Slide 60

Slide 60 text

#5 TV์ө

Slide 61

Slide 61 text

9/9 19:00-21:00 NTV ༗٢θϛ

Slide 62

Slide 62 text

AWS • instanceΛscale up • m1.large → m3.2xlarge • app server૿ڧ • Elastic LoadBalancerͷpre-warming

Slide 63

Slide 63 text

PERFORMANCE TEST • pgbenchΛ࢖༻ • tps(transaction per second)Λ஌Δ͜ͱ͕Ͱ͖Δ • -f ࣮ߦ͍ͨ͠SQLϑΝΠϧ • -c ฒྻ਺ • -t τϥϯβΫγϣϯ਺ • New Relic/CloudWatchͰύϑΥʔϚϯεΛ؂ࢹ

Slide 64

Slide 64 text

౰೔ • warmupΛ௚લͰ࣮ࢪ • master/slave • ͋ͱ͸فΔͷΈ

Slide 65

Slide 65 text

ৼΓฦΓ • 3000over req/sec • app server͕࣋ͪͨ͑͜ΒΕͳ͍࣌ؒଳ(਺෼)͕ൃੜ • ΰʔϧσϯ์ૹͷΞΫηε͸͛͢ʔ • DBʹ͍ͭͯ͸ɺcheckpointͷνϡʔχϯά͕؁͘ɺ diskॻ͖ࠐΈ͕݁ߏൃੜͨ͠

Slide 66

Slide 66 text

#6 TV์ө(࠶์ૹ)

Slide 67

Slide 67 text

10/614:00-16:00 NTV ༗٢θϛ(࠶)

Slide 68

Slide 68 text

࠶์ૹΛͨ·ͨ·ؾ͕͍ͭͨ ͷ͕2೔લ ʢڭ͑ͯΑɻɻɻʣ

Slide 69

Slide 69 text

postgresql.conf • memory • shared_buffer=౥ࡌϝϞϦͷ1/4ఔ౓ • wal_buffer=16MB • checkpoint • checkpoint_segments = 64 • checkpoint_timeout = 1h • checkpoint_completion_target = 0.9 ! • planner • effective_cache_size=page cacheͷαΠ ζ • random_page_cost=2.0 • lock • deadlock_timeout = 10s

Slide 70

Slide 70 text

ৼΓฦΓ • ໰୊ͳ͘ࡹ͚ͨ • ࠶์ૹ͸࿈བྷ͜ͳ͍ͷͰཁ஫ҙ 4000%?

Slide 71

Slide 71 text

·ͱΊ

Slide 72

Slide 72 text

PostgreSQL࢖ͬͯΈͯ • ελʔτΞοϓͷαʔϏεͰ࢖͏RDBMSͱͯ͠Α͍ • ػೳ໘ɺίετ໘ • ΋ͪΖΜMySQL΋͍͢͝ͱࢥ͍·͢ • ӡ༻΋޻෉࣍ୈͰ৭ʑͰ͖Δ • Ұ୆Ͱ݁ߏͳϦΫΤετ਺Λ଱͑ΒΕΔʂ

Slide 73

Slide 73 text

એ఻

Slide 74

Slide 74 text

ΤϯδχΞઈࢍืूதʂ (ಛʹΠϯϑϥ΍Γ͍ͨਓ) http://www.smapo.jp/recruit/index.html

Slide 75

Slide 75 text

͓ΘΓ