DNS is troublesome
DNS-related failures (13 of 50)
• Total DNS outage in Kubernetes cluster (Zalando)
  ◦ https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md
Table of Contents
• Conntrack races
• NodeLocal DNSCache
• Deep dive into NodeLocal DNSCache
• Colopl-specific NodeLocal DNSCache implementation
• Highly available NodeLocal DNSCache
• Future work
DNS in Cloud Native
  ◦ When a VIP changes, the services that use it must be reconfigured
  ◦ With a domain name, no service reconfiguration is needed
• DNS-based service discovery
  ◦ DNS-SD (RFC 6763)
  ◦ Consul
  ◦ Cloud Map (AWS), Service Directory (GCP), ...
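The point about domain names can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `resolve_endpoints` is hypothetical, and `localhost` stands in for a real service name. The client stores only a name and resolves it when it connects, so a changed address behind the name requires no client-side configuration change.

```python
import socket


def resolve_endpoints(name, port):
    """Return the sorted, de-duplicated IP addresses currently behind `name`."""
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" is a placeholder for a real service name (assumption).
    print(resolve_endpoints("localhost", 53))
```

Resolving at connect time rather than caching the address indefinitely is what lets the operator move the service without touching its clients.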
Kubernetes Networking
• Container Engine and Kubernetes (Google Cloud Next '17)
  ◦ https://www.youtube.com/watch?v=y2bhV81MfKQ
• Life of a Packet [I] - Michael Rubin, Google
  ◦ https://www.youtube.com/watch?v=0Omvgd7Hg1I
Kubernetes Networking
• Kubernetes and Networks - Why is this so dang hard
  ◦ https://speakerdeck.com/thockin/kubernetes-and-networks-why-is-this-so-dang-hard
• kube-proxy iptables "nat" control flow
  ◦ https://docs.google.com/drawings/d/1MtWL8qRTs6PlnJrW4dh8135_S9e2SaawT410bJuoBPk/edit
Conntrack Races (in Kubernetes)
• 5 – 15s DNS lookups on Kubernetes?
  ◦ https://blog.quentin-machu.fr/2018/06/24/5-15s-dns-lookups-on-kubernetes/
• A reason for unexplained connection timeouts on Kubernetes/Docker
  ◦ https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02
Fix for Conntrack Races (2)
same-origin entries

int nf_conntrack_tuple_taken(...)
{
    ...
    if (nf_ct_key_equal(h, tuple, zone, net)) {
        /* If the *original tuples* are identical, then both
         * conntracks refer to the same flow.
         *
         * Let nf_ct_resolve_clash() deal with this later.
         */
        if (nf_ct_tuple_equal(&ignored_conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
                              &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple))
            continue;
        ...
    }
    ...
}
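The check in the kernel excerpt above can be illustrated with a toy model in Python. This is only a sketch of the idea, not the kernel's data structures: `FlowTuple`, `Conntrack`, `tuple_taken`, and all addresses are hypothetical names and values chosen for illustration. It shows that an existing entry with the same original tuple is skipped (same flow, left for clash resolution later), while an entry from a different flow still counts as "taken".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowTuple:
    src: str
    sport: int
    dst: str
    dport: int


@dataclass(frozen=True)
class Conntrack:
    original: FlowTuple  # packet as sent by the client
    reply: FlowTuple     # expected reply after NAT


def tuple_taken(table, wanted, ignored):
    """Return True if `wanted` is already claimed by a different flow."""
    for ct in table:
        if wanted in (ct.original, ct.reply):
            # Same original tuple: same flow, so this is not a real clash.
            if ct.original == ignored.original:
                continue
            return True
    return False


# Two DNS queries sent from the same socket race to create an entry.
client = FlowTuple("10.0.0.5", 40000, "10.96.0.10", 53)  # pod -> DNS Service VIP
backend = FlowTuple("10.0.1.2", 53, "10.0.0.5", 40000)   # DNS backend -> pod
first = Conntrack(client, backend)
second = Conntrack(client, backend)

# Contrived different flow that NAT happens to map onto the same reply tuple.
other = Conntrack(FlowTuple("10.0.0.6", 40000, "10.96.0.10", 53), backend)

if __name__ == "__main__":
    print(tuple_taken([first], second.reply, second))
    print(tuple_taken([first], other.reply, other))
```

In this model the second query from the same socket is no longer reported as a collision, which is the behavior the comment in the excerpt describes.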
Binding to Dummy Interface
  ◦ hostNetwork must be enabled so that it can be reached

$ sudo ip addr list nodelocaldns
(...)
38: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 4e:b0:eb:e8:19:e2 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.10/32 brd 10.96.0.10 scope global nodelocaldns
       valid_lft forever preferred_lft forever
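Single Point of Failure
(...) rules are removed
    ▪ If the process terminates abnormally (for example, by OOMKill), the teardown does not run and DNS queries fail to resolve
    ▪ Besides, with the way Colopl uses it, even if the teardown falls back to the default kube-dns, the backend is unreachable and the result is an outage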
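Multi-cluster Services API
  ◦ ServiceImport selects which Services to import from other clusters
  ◦ Services are discoverable as <service>.<namespace>.svc.clusterset.local
• Only the API is specified; there is no official controller implementation
  ◦ The API is published as CRDs, so it can be used independently of the Kubernetes release
  ◦ Build your own implementation with CoreDNS or similar, or use a third-party product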
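The naming scheme above can be sketched as a pair of Python helpers. This is an illustration only: the function names are hypothetical, and `cluster.local` is assumed to be the cluster domain (the Kubernetes default). The only difference between the single-cluster name and the multi-cluster name is the zone suffix.

```python
def cluster_local_name(service, namespace):
    """DNS name of a Service inside one cluster (assumes the default cluster domain)."""
    return f"{service}.{namespace}.svc.cluster.local"


def clusterset_name(service, namespace):
    """DNS name of an imported Service in the <service>.<namespace>.svc.clusterset.local form."""
    return f"{service}.{namespace}.svc.clusterset.local"


if __name__ == "__main__":
    # "backend" and "game" are placeholder names, not taken from the slides.
    print(cluster_local_name("backend", "game"))
    print(clusterset_name("backend", "game"))
```

A client that should reach a Service across clusters would switch from the first name to the second; which DNS server answers for the clusterset.local zone depends on the chosen implementation.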