Slide 1

Slide 1 text

A Hidden Pitfall of K8s DNS with Spring Webflux
Kotaro INOUE
X (formerly Twitter)/GitHub: @musaprg
Kubernetes Meetup Tokyo #69
© LY Corporation

Slide 2

Slide 2 text

Summary

1. Spring Webflux’s default DNS resolver behaves differently from the JVM’s built-in implementation.
2. Make use of known workarounds when working with Kubernetes DNS.
   1. Use an FQDN (with trailing dot) for queries
      • example.com. (with trailing dot)
      • example.com (without trailing dot)
   2. Set ndots to 1 or 2 in the Pod spec field .spec.dnsConfig.options[*]

Slide 3

Slide 3 text

HELP! DNS request timed out!

Application pods cannot resolve cluster-local domains.

Slide 4

Slide 4 text

Overview of name resolution in our cluster

/etc/resolv.conf
• Primary nameserver
  • Node-local DNS (DaemonSet)
• Secondary nameserver
  • Upstream DNS

Slide 5

Slide 5 text

What happened?

• The target cluster was in the middle of an in-place migration.
• The Cluster IP of the Upstream DNS differed before and after the migration.

In-place migration: Rancher (Internal Fork), Cluster API (kubeadm)
https://youtu.be/BDjhGEVJ0Gs

Slide 6

Slide 6 text

What happened?

/etc/resolv.conf
• Primary nameserver
  • Node-local DNS (DaemonSet) with the correct Upstream DNS IP
• Secondary nameserver
  • Upstream DNS with the wrong IP = unreachable due to our bug

Slide 7

Slide 7 text

What happened?

/etc/resolv.conf
• Primary nameserver
  • Node-local DNS (DaemonSet) with the correct Upstream DNS IP
• Secondary nameserver
  • Upstream DNS with the wrong IP = unreachable due to our bug

The domain should still be resolvable.

Slide 8

Slide 8 text

Possible cause #1: ndots=5 + search domain

• Common pitfall of K8s DNS
• A non-FQDN (PQDN) query can take a long time
• Query “example.org” will look like: (shown as a figure on the slide)

https://speakerdeck.com/toversus/reliable-and-performant-dns-resolution-with-high-available-nodelocal-dnscache
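To make the ndots=5 pitfall concrete, here is a minimal sketch of resolv.conf-style search list expansion. It is illustrative only and is not the code of glibc, the JVM, or Netty. The class name `SearchListExpansion` and the search list (a typical one for a Pod in the `default` namespace) are assumptions, not taken from the slides.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of resolv.conf-style search list expansion.
// Not the actual implementation of any resolver.
final class SearchListExpansion {

    // Returns the names a stub resolver would try, in order, for the given host.
    static List<String> candidates(String host, List<String> searchDomains, int ndots) {
        List<String> names = new ArrayList<>();
        if (host.endsWith(".")) {
            // A trailing dot marks an absolute name: the search list is skipped.
            names.add(host);
            return names;
        }
        long dots = host.chars().filter(c -> c == '.').count();
        if (dots >= ndots) {
            // Enough dots: the name is tried as-is before the search list.
            names.add(host + ".");
        }
        for (String domain : searchDomains) {
            names.add(host + "." + domain + ".");
        }
        if (dots < ndots) {
            // Too few dots: the name as-is is tried only after the search list.
            names.add(host + ".");
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed search list for a Pod in the "default" namespace.
        List<String> search = List.of(
                "default.svc.cluster.local", "svc.cluster.local", "cluster.local");
        System.out.println(candidates("example.org", search, 5));
        System.out.println(candidates("example.org.", search, 5));
    }
}
```

Under these assumptions, `example.org` with ndots=5 is tried against every search domain before it is tried as-is, while `example.org.` yields a single candidate.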

Slide 9

Slide 9 text

Possible cause #2: Netty DNS Resolver

• Spring Webflux (spring-boot-starter-webflux) 2.4.0 started using reactor-netty v1.x
• reactor-netty v1.0.0 switched its default DNS resolver from the JVM’s to its own Netty DNS Resolver.

https://github.com/reactor/reactor-netty/pull/1252

Slide 10

Slide 10 text

Possible cause #2: Netty DNS Resolver

• When we query nginx.default, Netty DNS Resolver behaves like:
  1. (1st search domain) Try the primary nameserver
  2. (1st search domain) If the response is NXDOMAIN, try the secondary nameserver
  3. Proceed with the next search domain
  4. …

tcpdump logs (modified)
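The ordering described above can be sketched as a small simulation. This is a toy model of the behavior stated on the slide, not Netty's code: the class name `SearchDomainFirstOrder`, the nameserver labels, and the search list are assumptions, and the secondary nameserver is modeled as unreachable to mirror the incident.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model: for each search domain, ask every nameserver before moving on.
// Hypothetical names; not Netty's implementation.
final class SearchDomainFirstOrder {

    // Returns the log of queries sent until a name resolves.
    static List<String> resolve(String host, List<String> searchDomains,
                                List<String> nameservers, Set<String> existingNames,
                                Set<String> unreachable) {
        List<String> log = new ArrayList<>();
        for (String domain : searchDomains) {
            String name = host + "." + domain + ".";
            for (String ns : nameservers) {
                if (unreachable.contains(ns)) {
                    // An unreachable server costs a full timeout.
                    log.add(ns + " " + name + " -> TIMEOUT");
                    continue;
                }
                if (existingNames.contains(name)) {
                    log.add(ns + " " + name + " -> OK");
                    return log;
                }
                // NXDOMAIN makes the model fall through to the next nameserver.
                log.add(ns + " " + name + " -> NXDOMAIN");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = resolve(
                "nginx.default",
                List.of("default.svc.cluster.local", "svc.cluster.local", "cluster.local"),
                List.of("primary", "secondary"),
                Set.of("nginx.default.svc.cluster.local."),
                Set.of("secondary"));
        log.forEach(System.out::println);
    }
}
```

In this model the resolver waits on the unreachable secondary once for every search domain that returns NXDOMAIN, before the primary is ever asked about the name that exists.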

Slide 11

Slide 11 text

Reference: JVM’s built-in resolver

1. Try the primary nameserver for all search domains
2. Try the secondary nameserver for all search domains

tcpdump logs (modified)

Slide 12

Slide 12 text

Reference: cURL (glibc)

1. First, try the primary nameserver for all search domains
2. Next, try the secondary nameserver for all search domains

tcpdump logs (modified)
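For contrast, the nameserver-first ordering listed above can be modeled the same way. This is a literal toy model of the two steps on the slide, not glibc's or the JVM's code. The class name `NameserverFirstOrder`, the nameserver labels, and the search list are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model: walk the whole search list on one nameserver before trying the next.
// Hypothetical names; not glibc's or the JVM's implementation.
final class NameserverFirstOrder {

    // Returns the log of queries sent until a name resolves.
    static List<String> resolve(String host, List<String> searchDomains,
                                List<String> nameservers, Set<String> existingNames,
                                Set<String> unreachable) {
        List<String> log = new ArrayList<>();
        for (String ns : nameservers) {
            for (String domain : searchDomains) {
                String name = host + "." + domain + ".";
                if (unreachable.contains(ns)) {
                    log.add(ns + " " + name + " -> TIMEOUT");
                    continue;
                }
                if (existingNames.contains(name)) {
                    log.add(ns + " " + name + " -> OK");
                    return log;
                }
                log.add(ns + " " + name + " -> NXDOMAIN");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = resolve(
                "nginx.default",
                List.of("default.svc.cluster.local", "svc.cluster.local", "cluster.local"),
                List.of("primary", "secondary"),
                Set.of("nginx.default.svc.cluster.local."),
                Set.of("secondary"));
        log.forEach(System.out::println);
    }
}
```

With the same inputs as the incident (a healthy primary and an unreachable secondary), this ordering resolves the name on the primary without ever querying the secondary.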

Slide 13

Slide 13 text

Workaround

1. Explicitly use the JVM’s built-in resolver in Spring Webflux
2. Use an FQDN (with trailing dot) for queries
   • example.com. (with trailing dot)
   • example.com (without trailing dot)
3. Set ndots to 1 or 2 in the Pod spec field .spec.dnsConfig.options[*]
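As a small illustration of the trailing-dot workaround, a helper like the following could normalize hostnames before they are handed to a client. The class and method names (`AbsoluteName`, `toAbsolute`) are hypothetical, and the sketch assumes the input is a DNS hostname rather than an IP literal.

```java
// Hypothetical helper: turns a hostname into an absolute name (FQDN with trailing dot).
// Assumes the input is a DNS hostname, not an IP address literal.
final class AbsoluteName {

    static String toAbsolute(String host) {
        // Leave names that already end with a dot untouched.
        return host.endsWith(".") ? host : host + ".";
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("nginx.default.svc.cluster.local"));
        System.out.println(toAbsolute("example.com."));
    }
}
```

An absolute name skips search-domain expansion entirely, so the resolver sends a single query regardless of the ndots setting.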

Slide 14

Slide 14 text

Workaround

1. Explicitly use the JVM’s built-in resolver in Spring Webflux
2. Use an FQDN (with trailing dot) for queries
   • example.com. (with trailing dot)
   • example.com (without trailing dot)
3. Set ndots to 1 or 2 in the Pod spec field .spec.dnsConfig.options[*]

Make use of these workarounds to avoid DNS issues.

Slide 15

Slide 15 text
