
How We Got Stuck on GCP Networking


16th Elasticsearch Study Group https://elasticsearch.doorkeeper.jp/events/46539

Daichi Hirata

June 27, 2016

Transcript

  1. Self-introduction
    DAICHI HIRATA
    ▸ @daichild
      daichirata
    ▸ CyberAgent, Inc.
      Ad Tech Division
      CA Reward
    ▸ Golang, Ruby
    ▸ ✂Secateurs (ES IndexTemplate DSL in Ruby)
    ▸ Keyboard style: HHKB2, two at once (dual-wielding)
  2. An unstable cluster
    ▸ Pings to the master node fail roughly every 2 hours
    ▸ OS: CentOS 7.2
    ▸ Elasticsearch: 2.3.1

    [INFO ][discovery.gce ] [elasticsearch-1] master_left [{elasticsearch-2}{4TPArCtHQMKgWaLod3ZMjA}{10.2.101.5}{10.2.101.5:9300}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
    [WARN ][discovery.gce ] [elasticsearch-1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{elasticsearch-3}{JtcxuuucRXiClrl6q7qL8A}{10.2.101.5}{10.2.101.5:9300},{elasticsearch-1}{RQvtZKAJTfGmbmWETYY0fw}{10.2.101.4}{elasticsearch-1.c.cyberagent-013.internal/10.2.101.4:9300},}
    [INFO ][cluster.service ] [elasticsearch-1] removed {{elasticsearch-2}{4TPArCtHQMKgWaLod3ZMjA}{10.2.101.5}{10.2.101.5:9300},}, reason: zen-disco-master_failed ({elasticsearch-2}{4TPArCtHQMKgWaLod3ZMjA}{10.2.101.5}{10.2.101.5:9300})
  3. An unstable cluster
    [DEBUG][action.admin.cluster.health] [elasticsearch-1] connection exception while trying to forward request with action name [cluster:monitor/health] to master node [{elasticsearch-2}{4TPArCtHQMKgWaLod3ZMjA}{10.2.101.5}{10.2.101.5:9300}], scheduling a retry. Error: [org.elasticsearch.transport.NodeDisconnectedException: [elasticsearch-2][10.2.101.5:9300] [cluster:monitor/health] disconnected]
    [INFO][discovery.gce ] [elasticsearch-1] master_left [{elasticsearch-2}{Xa2Cq98mQie1WcaXFfHraQ}{10.2.101.5}{10.2.101.5:9300}], reason [transport disconnected]
    [WARN][discovery.gce ] [elasticsearch-1] master left (reason = transport disconnected), current nodes: {{elasticsearch-1}{fjLqVUoxRB6RRNCecJSAaw}{10.2.101.4}{10.2.101.4:9300},}
    [INFO][cluster.service] [elasticsearch-1] removed {{elasticsearch-2}{Xa2Cq98mQie1WcaXFfHraQ}{10.2.101.5}{10.2.101.5:9300},}, reason: zen-disco-master_failed ({elasticsearch-2}{Xa2Cq98mQie1WcaXFfHraQ}{10.2.101.16}{10.2.101.16:9300})
  4. TRANSPORT MODULE
    ▸ Turn transport-related logging up to TRACE level
    ▸ curl -XPUT localhost:9200/_cluster/settings -d '
      {
        "transient" : {
          "logger.transport" : "TRACE",
          "logger.org.elasticsearch.transport" : "TRACE"
        }
      }'
  5. TRANSPORT MODULE
    [2016-04-27 16:07:43,207][TRACE][transport.netty ] [elasticsearch-1] close connection exception caught on transport layer [[id: 0xa2b52d5c, /10.2.101.4:40290 => /10.2.101.5:9300]], disconnecting from relevant node
    java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
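The TRACE line names the exact socket that timed out, which is what lets you look that connection up at the OS level. A small Go sketch that extracts the Netty channel id and both endpoints; the regular expression is written against this one sample line and is an assumption about the general format.

```go
package main

import (
	"fmt"
	"regexp"
)

// connRe captures the channel id, local endpoint and remote endpoint from a
// Netty channel description such as
// "[id: 0xa2b52d5c, /10.2.101.4:40290 => /10.2.101.5:9300]".
var connRe = regexp.MustCompile(`\[id: (0x[0-9a-f]+), /([0-9.]+:\d+) => /([0-9.]+:\d+)\]`)

// endpoints returns ok=false when the line carries no channel description.
func endpoints(line string) (id, local, remote string, ok bool) {
	m := connRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[2], m[3], true
}

func main() {
	line := "[2016-04-27 16:07:43,207][TRACE][transport.netty ] [elasticsearch-1] close connection exception caught on transport layer [[id: 0xa2b52d5c, /10.2.101.4:40290 => /10.2.101.5:9300]], disconnecting from relevant node"
	if id, local, remote, ok := endpoints(line); ok {
		fmt.Printf("channel %s: %s -> %s\n", id, local, remote)
	}
}
```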
  6. NETSTAT
    $ netstat --tcp -t -o -n | grep 9300 | sort -k5

    ▸ Connections from node A to node B
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37638  ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37637  ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37636  ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37635  ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37634  ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37633  ESTABLISHED  keepalive (5221.58/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37632  ESTABLISHED  keepalive (5172.43/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37631  ESTABLISHED  keepalive (5172.43/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37630  ESTABLISHED  keepalive (5188.81/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37629  ESTABLISHED  keepalive (5188.82/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37628  ESTABLISHED  keepalive (5221.58/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37627  ESTABLISHED  keepalive (4205.77/0/0)
    tcp6  0  0  10.2.101.4:9300   10.2.101.5:37626  ESTABLISHED  keepalive (5319.89/0/0)

    ▸ Connections from node B to node A
    tcp6  0  0  10.2.101.4:42254  10.2.101.5:9300   ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:42253  10.2.101.5:9300   ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:42252  10.2.101.5:9300   ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:42251  10.2.101.5:9300   ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:42250  10.2.101.5:9300   ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:42249  10.2.101.5:9300   ESTABLISHED  keepalive (4107.47/0/1)
    tcp6  0  0  10.2.101.4:42248  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
    tcp6  0  0  10.2.101.4:42247  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
    tcp6  0  0  10.2.101.4:42246  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
    tcp6  0  0  10.2.101.4:42245  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
    tcp6  0  0  10.2.101.4:42244  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
    tcp6  0  0  10.2.101.4:42243  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
    tcp6  0  0  10.2.101.4:42242  10.2.101.5:9300   ESTABLISHED  keepalive (5319.89/0/0)
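Read by eye, the connections whose timer ends in `/1` stand out, and in both directions they share the same `4107.47` value. A Go sketch that parses rows of this shape and picks those connections out. The meaning of the three timer numbers is an assumption: net-tools prints them without documenting them, and we read them as seconds until the timer fires, retransmission count, and unanswered probe count.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// conn is one row of `netstat -t -o -n` output with a keepalive timer.
type conn struct {
	Local, Remote string
	Remaining     float64 // first timer number (assumed: seconds until the timer fires)
	Retrans       int     // second timer number (assumed: retransmissions)
	Probes        int     // third timer number (assumed: unanswered probes)
}

// parseRow parses a row such as
// "tcp6 0 0 10.2.101.4:9300 10.2.101.5:37638 ESTABLISHED keepalive (4107.47/0/1)".
func parseRow(row string) (conn, error) {
	f := strings.Fields(row)
	if len(f) != 8 || f[6] != "keepalive" {
		return conn{}, fmt.Errorf("unexpected row: %q", row)
	}
	parts := strings.Split(strings.Trim(f[7], "()"), "/")
	if len(parts) != 3 {
		return conn{}, fmt.Errorf("unexpected timer: %q", f[7])
	}
	rem, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return conn{}, err
	}
	retr, err := strconv.Atoi(parts[1])
	if err != nil {
		return conn{}, err
	}
	probes, err := strconv.Atoi(parts[2])
	if err != nil {
		return conn{}, err
	}
	return conn{Local: f[3], Remote: f[4], Remaining: rem, Retrans: retr, Probes: probes}, nil
}

// withProbes returns the rows whose third timer number is non-zero.
func withProbes(rows []string) ([]conn, error) {
	var out []conn
	for _, r := range rows {
		c, err := parseRow(r)
		if err != nil {
			return nil, err
		}
		if c.Probes > 0 {
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	rows := []string{ // three rows copied from the netstat output above
		"tcp6 0 0 10.2.101.4:9300 10.2.101.5:37638 ESTABLISHED keepalive (4107.47/0/1)",
		"tcp6 0 0 10.2.101.4:9300 10.2.101.5:37626 ESTABLISHED keepalive (5319.89/0/0)",
		"tcp6 0 0 10.2.101.4:42254 10.2.101.5:9300 ESTABLISHED keepalive (4107.47/0/1)",
	}
	flagged, err := withProbes(rows)
	if err != nil {
		panic(err)
	}
	for _, c := range flagged {
		fmt.Printf("%s -> %s remaining=%.2fs probes=%d\n", c.Local, c.Remote, c.Remaining, c.Probes)
	}
}
```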
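  7. TCP KEEPALIVE
    ▸ A mechanism in which, while a connection carries no traffic, probe packets are sent and answered at fixed intervals so that both ends can confirm to each other that the TCP connection is still alive
    ▸ Enabled by default in Elasticsearch
    ▸ net.ipv4.tcp_keepalive_time=7200 (2 hours)
      net.ipv4.tcp_keepalive_intvl=75
      net.ipv4.tcp_keepalive_probes=9
    ▸ In the first place, TCP keepalive probe packets should only be sent when the connection has been idle
    ▸ Why only some of the connections fail to exchange data is still unknown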