incident-debugging.pdf

Rafael Jesus
December 13, 2019

Transcript

  1. Workflow
     - Instrument the service and infrastructure layers.
     - Set up SLIs that reflect the overall health of the application.
     - Set up SLOs that reflect the amount of downtime you are willing to tolerate.
     - Respond to alerts when they are triggered.
     - After recovery, assess how effective the instrumentation was and plan to refine it if necessary.
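
The link between an SLO and "the amount of downtime" is simple arithmetic, sketched below. The function name `allowed_downtime_minutes` and the 30-day window are assumptions made for illustration; they do not come from the slides.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an availability SLO.

    slo_target is a fraction such as 0.999; window_days is an assumed
    evaluation window (30 days here, purely as an example).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        budget = allowed_downtime_minutes(target)
        print(f"SLO {target} over 30 days allows about {budget:.1f} minutes of downtime")
```

A stricter target shrinks the budget quickly, which is why the SLO should be chosen from the downtime the team can actually tolerate rather than the other way around.
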
  2. Debugging latency & 5xx errors
     The SloLatencyTooHigh alert goes off. First steps:
     - Get the 99th and 95th percentile response times, broken down by service, HTTP method and status_code.
     - Check the latency and resource utilization of upstream service dependencies.
     - Check outbound TCP connections, request size and request rate.
     - Check the resource usage of the node the service runs on.
     - If the node is saturated, DNS queries will be either slow or fail.
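
The first step, percentiles broken down by service, HTTP method and status_code, can be sketched offline as follows. This is a minimal illustration using the nearest-rank method on raw samples; the sample dictionary layout and the names `percentile` and `latency_breakdown` are assumptions, and a real setup would more likely query the monitoring system directly.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_breakdown(samples, pcts=(95, 99)):
    """Group duration samples by (service, method, status_code) and compute percentiles per group."""
    groups = defaultdict(list)
    for sample in samples:
        key = (sample["service"], sample["method"], sample["status_code"])
        groups[key].append(sample["duration_ms"])
    return {key: {p: percentile(vals, p) for p in pcts} for key, vals in groups.items()}


if __name__ == "__main__":
    # Hypothetical request log entries.
    samples = [
        {"service": "checkout", "method": "GET", "status_code": 200, "duration_ms": d}
        for d in range(1, 101)
    ]
    samples.append({"service": "checkout", "method": "POST", "status_code": 500, "duration_ms": 2000})
    for key, stats in latency_breakdown(samples).items():
        print(key, stats)
```

Splitting by status_code matters here because slow 5xx responses and slow 2xx responses usually point at different causes.
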
  3. DNS issues in k8s

         lookup("server", "A") // all lookups below are triggered by a single lookup
         lookup("server", "AAAA")
         lookup("server.platform.svc.cluster.local", "A")
         lookup("server.platform.svc.cluster.local", "AAAA")
         lookup("server.svc.cluster.local", "A")
         lookup("server.svc.cluster.local", "AAAA")
         lookup("server.cluster.local", "A")
         lookup("server.cluster.local", "AAAA")

         # cat /etc/resolv.conf
         nameserver 169.254.20.10
         search platform.svc.cluster.local svc.cluster.local cluster.local
         options ndots:5 timeout:1 attempts:5
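
The fan-out above follows from the `search` list and the `ndots` option: a name with fewer dots than `ndots` is tried with every search domain appended, and each candidate is queried for both A and AAAA records. The sketch below enumerates the worst case, where every candidate has to be tried. The function name `candidate_queries` is hypothetical, and the ordering (search domains first for short names) reflects common resolver behaviour and is an assumption; only the count is taken from the slide.

```python
def candidate_queries(name, search, ndots, record_types=("A", "AAAA")):
    """Enumerate the (fqdn, record_type) queries a stub resolver may send for one lookup.

    Worst case only: a real resolver stops as soon as a candidate resolves.
    """
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search expansion.
        fqdns = [name]
    else:
        expanded = [f"{name}.{domain}" for domain in search]
        if name.count(".") >= ndots:
            fqdns = [name] + expanded  # enough dots: try the name as-is first
        else:
            fqdns = expanded + [name]  # too few dots: try the search domains first
    return [(fqdn, rtype) for fqdn in fqdns for rtype in record_types]


if __name__ == "__main__":
    search = ["platform.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    queries = candidate_queries("server", search, ndots=5)
    for fqdn, rtype in queries:
        print(f'lookup("{fqdn}", "{rtype}")')
    print(len(queries), "queries for a single lookup")
```

Passing a fully qualified name with a trailing dot, such as `server.platform.svc.cluster.local.`, skips the search expansion in this sketch and leaves only the A and AAAA queries.
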
  4. Conntrack
     - Kernel module that tracks inbound and outbound network connections (TCP, UDP and others).
     - Table format: src + dst IP, src + dst port and connection state.

         conntrack -L
         tcp ESTABLISHED src=172.24.110.67 dst=172.24.124.188 sport=29290 dport=31520 src=172.24.44.248 dst=172.24.124.188 sport=80 dport=29290
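
Each conntrack entry carries two tuples: the original direction and the expected reply direction. The sketch below parses an entry in the abbreviated form shown on the slide into those two tuples. The `Flow` class and `parse_entry` function are hypothetical helpers, and the parser assumes the first four `src`/`dst`/`sport`/`dport` fields belong to the original direction and the next four to the reply.

```python
from dataclasses import dataclass

TUPLE_KEYS = {"src", "dst", "sport", "dport"}


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    sport: int
    dport: int


def parse_entry(line):
    """Return (protocol, state, original_flow, reply_flow) for one conntrack entry."""
    tokens = line.split()
    proto = tokens[0]
    # The state is the first purely alphabetic upper-case token; entries without one yield None.
    state = next((t for t in tokens[1:] if t.isalpha() and t.isupper()), None)
    fields = [
        (key, value)
        for key, value in (t.split("=", 1) for t in tokens if "=" in t)
        if key in TUPLE_KEYS
    ]
    if len(fields) < 8:
        raise ValueError("expected an original and a reply tuple")

    def build(chunk):
        d = dict(chunk)
        return Flow(d["src"], d["dst"], int(d["sport"]), int(d["dport"]))

    return proto, state, build(fields[:4]), build(fields[4:8])


if __name__ == "__main__":
    entry = (
        "tcp ESTABLISHED src=172.24.110.67 dst=172.24.124.188 sport=29290 dport=31520 "
        "src=172.24.44.248 dst=172.24.124.188 sport=80 dport=29290"
    )
    proto, state, original, reply = parse_entry(entry)
    print(proto, state)
    print("original:", original)
    print("reply:   ", reply)
```
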
  5. The Bug
     - There is a race condition in conntrack.
     - It occurs when two packets are sent via the same socket at the same time.
     - Packets get dropped.
     - The DNS lookup remains in a waiting state until it times out.
     - It only happens for UDP.
  6. Watch out: AWS VPC limits
     - Max of 1024 packets per second per network interface to the AWS DNS server.
     - Remember that a single DNS resolution produced 8 lookups.
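
Combining the two numbers gives a rough ceiling on resolutions per second. The sketch below assumes every lookup costs exactly one packet against the limit and ignores caching; the `attempts` parameter models the worst case where every lookup is retried up to the `attempts:5` value from resolv.conf. The function name and these simplifications are assumptions made for illustration.

```python
AWS_DNS_PACKETS_PER_SEC = 1024  # per network interface, as stated on the slide
LOOKUPS_PER_RESOLUTION = 8      # A + AAAA across the search list, from the DNS slide


def max_resolutions_per_second(
    packet_limit=AWS_DNS_PACKETS_PER_SEC,
    lookups_per_resolution=LOOKUPS_PER_RESOLUTION,
    attempts=1,
):
    """Rough upper bound on name resolutions per second before hitting the packet limit.

    Assumes one packet per lookup and per retry, with no caching in between.
    """
    if lookups_per_resolution < 1 or attempts < 1:
        raise ValueError("lookups_per_resolution and attempts must be >= 1")
    return packet_limit // (lookups_per_resolution * attempts)


if __name__ == "__main__":
    print("no retries:       ", max_resolutions_per_second())
    print("5 attempts each:  ", max_resolutions_per_second(attempts=5))
```

Under these assumptions the budget is 128 resolutions per second per interface without retries, and far fewer once dropped packets trigger retries.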