
Next Hop Group & Netlink on Linux (JP)

ebiken
December 01, 2022


An overview of Next Hop Group & Netlink on Linux.

(More detailed research notes are published and kept up to date on GitHub.)
https://github.com/ebiken/nsdevnotes/tree/main/linux



Transcript

  1. Next Hop Group & Netlink on Linux

    Twitter: @ebiken
    Next Hop Group & Netlink on Linux | @ebiken | 2022/12/01
  2. Contents

    • References
    • Next Hop Group (NHG) overview
      • Route / Next Hop / Next Hop Group object data structures
      • Benefits of Next Hop Group (motivation)
    • Next Hop Group on Linux
      • History (Multipath Route)
      • Linux Kernel Data Structures
      • Multipath configuration with iproute2
    • netlink overview
    • netlink & nexthop (group): concrete examples
    • Appendix
      • struct & enum of netlink / rtnetlink
  3. Reference: Linux Next Hop (Group)

    • Linux research notes by @ebiken: https://github.com/ebiken/nsdevnotes/tree/main/linux
    • 2022-11-23 "Netlinkと友達になろう" (an accessible Japanese-language blog post explaining Netlink): https://eniyo0.hatenablog.com/entry/2022/11/23/180135
    • List of the Types and Enums used in netlink: https://wiki.slank.dev/book/types.html
    • Improving Route Scalability: Nexthops as Separate Objects
      • Linux Plumbers Conf 2019: David Ahern @Cumulus
      • nexthop-objects-talk.pdf: https://lpc.events/event/4/contributions/434/attachments/251/436/nexthop-objects-talk.pdf
      • https://www.youtube.com/watch?v=HIqvUiwDHGk
    • lwn.net, "net: Improve route scalability via support for nexthop objects": https://lwn.net/Articles/763950/
    • Blog: Multipath Routing in Linux - part 1 (Sat 24 June 2017) (also covers the history and code paths)
      • https://codecave.cc/multipath-routing-in-linux-part-1.html
    • RFC5549 Advertising IPv4 Network Layer Reachability Information with an IPv6 Next Hop: https://datatracker.ietf.org/doc/rfc5549/
    • RFC8950 Advertising IPv4 Network Layer Reachability Information (NLRI) with an IPv6 Next Hop: https://datatracker.ietf.org/doc/rfc8950/
    • iproute2 man: ip nexthop: https://man.archlinux.org/man/core/iproute2/ip-nexthop.8.en
    • FRR Docs:
      • Nexthop Groups: https://docs.frrouting.org/en/latest/nexthop_groups.html
      • Multiple nexthop static route: https://docs.frrouting.org/en/latest/static.html#multiple-nexthop-static-route
    • Resilient Next-Hop Groups in Linux (went into Linux 5.13)
      • NetDevConf 0x15: Resilient Next-Hop Groups in Linux, NVIDIA, Ido Schimmel, Petr Machata
      • July 2021: https://netdevconf.info/0x15/session.html?Resilient-nexthop-groups
      • https://www.youtube.com/watch?v=PwWlZGhUgUU&list=PLrninrcyMo3L-hsJv23hFyDGRaeBY1EJO
      • https://docs.kernel.org/networking/nexthop-group-resilient.html
  4. Next Hop Group (NHG) Overview
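  5. Route Entry & Next Hop

    • A route holds the forwarding information for a destination (prefix): a gateway (gw) and a device (dev).
    • The prefix here is used for Longest Prefix Match and includes a mask (len).
    • dev does not always have to be specified; it can be resolved by the OS, for example when the gw (address) determines the dev.
    • The MAC address of the gw is resolved separately and stored in the neighbor table (on Linux).
    • How this information is stored (the data structure) varies by implementation.

    [Figure: nodes A, B, C and D with interfaces ens0 and ens1; the route entry shown reads prefix: D, dev: ens0, gw: B]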
  6. Route Entry & Next Hop data structures

    ① nexthop information embedded in the route entry (Linux 5.2 and earlier)

        [Figure: a route box holding prefix, dev, gateway]

        Linux commands:
        ip route add <prefix> via <gw> [dev <device>]
        ip route add <prefix> nexthop via <gw> [dev <device>]

    ② route entry separated from the nexthop object (Linux 5.3 and later)

        [Figure: a route box (prefix, nexthop) pointing at a nexthop box (dev, gateway); a route box (group) pointing at a nexthop group box (nexthop[N]), which in turn points at several nexthop boxes (dev, gateway)]

    Note: Linux 5.2 includes refactoring in preparation for nexthop support. However, fib_info still contains only fib_nh; nexthop is not included yet.
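  7. Benefits of Next Hop Group (motivation)

    • Benefits of Next Hop objects
      • Route entries and nexthops are separated (each can be added and updated independently)
        • Shorter time needed for additions and updates
        • Savings in resources (memory, SRAM/TCAM)
      • Support for RFC5549 and RFC8950 (IPv4 routes with IPv6 nexthops)
      • Without Next Hop objects, the following operations are needed every time a route is added
        • Checking (lookup) that the gateway address + dev are valid
        • Checking the interface state, in the case of a tunnel interface
        • Comparing and searching Next Hops (does it already exist, or is it new?)
    • Benefits of Next Hop Group
      • Without Next Hop Groups, adding, changing or deleting a Next Hop requires updating every route entry that uses it
    • A benefit that is not implemented in Linux / switch ASICs but may become feasible in the future
      • Configuring a backup nexthop by nesting NHGs ⇒ Fast Re-Routing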
  8. Next Hop Group on Linux
  9. History (Multipath Route)

    • A route entry with multiple nexthops is called a multipath route
      • ECMP (Equal Cost Multi Path): the case where all paths have the same cost
    • Linux v5.2 => refactoring, centered on IPv6 (ip6_info)
      • Preparation for nexthop object support
      • IPv4 routes with IPv6 nexthops
        • (a feature usable for BGP unnumbered: RFC5549/RFC8950)
    • Linux v5.3 => support for nexthop objects
    • Linux v5.13 => support for Resilient Next-Hop Groups
  10. Linux Kernel Data Structures (overview)

    Legacy scheme (Linux v5.2 and earlier)
      ① fib_info (the route entry) holds an array of fib_nh, which contains the nexthop information

    Scheme using Next Hop Objects (Linux v5.3 and later)
      ② The nexthop is referenced through the pointer to nexthop that was added to fib_info
      ③ a. From nexthop, fib_nh is reached via nh_info
      ③ b. When there are multiple nexthops, each member nexthop is reached from the (group) nexthop via nh_grp_entry

    Note: nexthop information ⇒ output device (nhc_oif) & gateway address (nhc_gw)

    Structures shown in the diagram (abbreviated):

        fib_info { nexthop *nh; fib_nh fib_nh[0]; };
        fib_nh { }
        fib_nh_common {
            ...
            int nhc_oif;
            u8 nhc_family;
            u8 nhc_gw_family;
            union {
                __be32 ipv4;
                struct in6_addr ipv6;
            } nhc_gw;
        }
        nexthop { nh_info *nh_info; };     (single nexthop)
        nexthop { *nh_grp; };              (nexthop group)
        nh_info { fib_nh fib_nh; };
        nh_grp_entry { nexthop *nh; }
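The pointer relationships ①, ②, ③a and ③b can be mimicked with a toy model. The following is only an illustrative Python sketch, not kernel code: the class names echo the kernel structs, the fields are reduced to what the slide shows, and the second prefix 10.99.99.98/32 and the replacement gateway are made-up values.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FibNh:
    """Stands in for fib_nh / fib_nh_common: output device and gateway."""
    oif: str
    gw: str


@dataclass
class NhGrpEntry:
    """Stands in for nh_grp_entry: points at one member nexthop."""
    nh: "Nexthop"
    weight: int = 1


@dataclass
class Nexthop:
    """Stands in for struct nexthop: either a single nh_info or a group."""
    id: int
    nh_info: Optional[FibNh] = None
    nh_grp: List[NhGrpEntry] = field(default_factory=list)


@dataclass
class FibInfo:
    """Stands in for fib_info (a route entry)."""
    prefix: str
    fib_nh: List[FibNh] = field(default_factory=list)  # (1) legacy embedded array
    nh: Optional[Nexthop] = None                        # (2) pointer to a nexthop object


def resolve(fi: FibInfo) -> List[Tuple[str, str]]:
    """Return the (oif, gw) pairs that a route forwards to."""
    if fi.nh is None:                 # (1) legacy: read the embedded fib_nh array
        return [(n.oif, n.gw) for n in fi.fib_nh]
    if fi.nh.nh_grp:                  # (3b) group: follow every nh_grp_entry
        return [(e.nh.nh_info.oif, e.nh.nh_info.gw) for e in fi.nh.nh_grp]
    return [(fi.nh.nh_info.oif, fi.nh.nh_info.gw)]      # (3a) single nexthop


nh1 = Nexthop(1, FibNh("eno1", "172.20.105.172"))
nh2 = Nexthop(2, FibNh("eno1", "172.20.105.173"))
grp = Nexthop(3, nh_grp=[NhGrpEntry(nh1), NhGrpEntry(nh2)])
routes = [FibInfo("10.99.99.99/32", nh=grp), FibInfo("10.99.99.98/32", nh=grp)]

# Updating one nexthop object is visible through every route that shares it,
# without touching the route entries themselves.
nh2.nh_info.gw = "172.20.105.200"
```

In this model both routes pick up the new gateway after a single assignment, which is the update-cost argument made for nexthop objects earlier in the deck.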
  11. Linux Kernel Data Structures (IPv4): Multipath with No NextHop Object

    > linux-5.2/include/net/ip_fib.h

        struct fib_info {
            ...
            int fib_nhs;
            bool fib_nh_is_v6;
            struct rcu_head rcu;
            struct fib_nh fib_nh[0];
        };

    > include/net/ip_fib.h

        struct fib_nh {
            ...
            struct fib_nh_common nh_common;
        };

        struct fib_nh_common {
            ...
            struct net_device *nhc_dev;
            int nhc_oif;
            u8 nhc_family;
            u8 nhc_gw_family;
            struct lwtunnel_state *nhc_lwtstate;
            union {
                __be32 ipv4;
                struct in6_addr ipv6;
            } nhc_gw;
        };
  12. Linux Kernel Data Structures (IPv4): with NextHop Object

    > linux-5.3/include/net/ip_fib.h

        struct fib_info {
            ...
            int fib_nhs;
            bool fib_nh_is_v6;
            bool nh_updated;
            struct nexthop *nh;
            struct rcu_head rcu;
            struct fib_nh fib_nh[0];
        };

    > linux-5.3/include/net/nexthop.h

        struct nexthop {
            ...
            union {
                struct nh_info __rcu *nh_info;
                struct nh_group __rcu *nh_grp;
            };
        };

        struct nh_info {
            ...
            u8 family;
            ...
            union {
                struct fib_nh_common fib_nhc;
                struct fib_nh fib_nh;
                struct fib6_nh fib6_nh;
            };
        };

    > include/net/ip_fib.h

        struct fib_nh and struct fib_nh_common are the same as on slide 11.
  13. Linux Kernel Data Structures (IPv4): Multipath with NextHop Object

    > linux-5.3/include/net/nexthop.h

        struct nh_group {
            u16 num_nh;
            bool mpath;
            bool has_v4;
            struct nh_grp_entry nh_entries[0];
        };

        struct nh_grp_entry {
            struct nexthop *nh;
            u8 weight;
            atomic_t upper_bound;
            struct list_head nh_list;
            struct nexthop *nh_parent;
        };

    struct fib_info, struct nexthop, struct nh_info, struct fib_nh and struct fib_nh_common are the same as on slide 12. Here the nh_grp member of the nexthop union is used: it points at an nh_group, whose nh_entries each point at a member nexthop.
  14. Multipath configuration with iproute2

    • nexthops can be created with the ip nexthop command
    • Check whether the iproute2 in your environment supports nexthop
      • How to check ⇒ type $ ip ne and press [tab]; nexthop should appear among the candidates
    • Support was added by the following patch from September 2018
      • [iproute2-next] ip: Add support for nexthop objects
      • https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/

        $ ip -V
        ip utility, iproute2-ss200127
        $ ip ne [tab]
        neigh    netconf  netns    nexthop
  15. Multipath configuration with iproute2

    Method using Next Hop Objects

        > Configure the nexthops
        $ ip nexthop add id 1 via 172.20.105.172 dev eno1
        $ ip nexthop add id 2 via 172.20.105.173 dev eno1

        > Configure a nexthop group from the IDs of the nexthops
        > already configured, written as <nh1>/<nh2>/...
        $ ip nexthop add id 3 group 1/2

        > Add a route, giving the group id as nhid <id>
        $ ip route add 10.99.99.99/32 nhid 3
        $ ip route
        10.99.99.99 nhid 3
                nexthop via 172.20.105.172 dev eno1 weight 1
                nexthop via 172.20.105.173 dev eno1 weight 1

        > Confirm that the nexthops are configured
        $ ip nexthop list
        id 1 via 172.20.105.172 dev eno1 scope link
        id 2 via 172.20.105.173 dev eno1 scope link
        id 3 group 1/2

    Legacy method (no Next Hop Object)

        > Repeat nexthop via <gw> dev <dev>
        $ ip route add 10.11.11.11/32 \
            nexthop via 172.20.105.174 dev eno1 \
            nexthop via 172.20.105.175 dev eno1
        $ ip route
        default via 172.20.104.1 dev eno1 proto static
        10.11.11.11
                nexthop via 172.20.105.174 dev eno1 weight 1
                nexthop via 172.20.105.175 dev eno1 weight 1

        > Confirm that no nexthop object is configured
        $ ip nexthop list
        (nothing is displayed)
  16. netlink overview
  17. netlink / rtnetlink overview

    • netlink is an interface (API) used to exchange information with the kernel
      • or the subsystem that provides that interface
    • It uses sockets
      • The learning cost is low for engineers familiar with TCP/UDP socket programming
    • Messages are grouped by function into what are called protocols
    • What is rtnetlink?
      • The messages that belong to protocol == NETLINK_ROUTE
      • Provides functions related to IP routing / neighbors
  18. netlink message format

    • netlink message hdr
      • Contains the "message type" (nlmsg_type)
    • netlink message
      • Its format differs for each "message type" in the message header
    • netlink attribute
      • An array of { length, type } + value
      • The format of value differs for each type
      • The attrs that can be used are determined per message type
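The header / message / attribute layering can be sketched with Python's struct module. This is a minimal illustration rather than a complete netlink library: it assumes host byte order, 4-byte alignment of attributes, and a message body whose size is already a multiple of 4; the helper names are invented for this example.

```python
import struct

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLA_HDR = struct.Struct("=HH")       # nla_len, nla_type


def align4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def pack_attr(nla_type: int, value: bytes) -> bytes:
    """{ length, type } + value; nla_len excludes the trailing padding."""
    nla_len = NLA_HDR.size + len(value)
    return (NLA_HDR.pack(nla_len, nla_type) + value).ljust(align4(nla_len), b"\0")


def pack_message(msg_type: int, flags: int, seq: int, body: bytes, attrs) -> bytes:
    """Header + type-specific body (e.g. a packed rtmsg) + attribute array."""
    payload = body + b"".join(pack_attr(t, v) for t, v in attrs)
    return NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), msg_type, flags, seq, 0) + payload


def parse_attrs(data: bytes):
    """Walk an attribute array and return a list of (nla_type, value)."""
    attrs, off = [], 0
    while off + NLA_HDR.size <= len(data):
        nla_len, nla_type = NLA_HDR.unpack_from(data, off)
        if nla_len < NLA_HDR.size:
            break  # malformed attribute, stop walking
        attrs.append((nla_type, data[off + NLA_HDR.size: off + nla_len]))
        off += align4(nla_len)
    return attrs
```

For example, pack_message(24, 0x1, 1, bytes(12), [(1, bytes([10, 11, 11, 99]))]) yields a 16-byte header, a 12-byte zeroed body and one 8-byte attribute.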
  19. nlmsg_type: RTM_NEWROUTE & RTM_NEWNEXTHOP

    nlmsg_type        message struct    netlink attr
    RTM_NEWROUTE      struct rtmsg      RTA_* (Routing Message Attributes)
    RTM_NEWNEXTHOP    struct nhmsg      NHA_* (Next Hop Attributes)

    Configuring (adding) an ip route uses RTM_NEWROUTE or RTM_NEWNEXTHOP as the nlmsg_type.
    Each message follows the format defined by struct rtmsg or struct nhmsg respectively, and its attributes are RTA_* or NHA_*.

        // include/uapi/linux/rtnetlink.h
        struct rtmsg {
            unsigned char rtm_family;
            unsigned char rtm_dst_len;
            unsigned char rtm_src_len;
            unsigned char rtm_tos;
            unsigned char rtm_table;
            unsigned char rtm_protocol;
            unsigned char rtm_scope;
            unsigned char rtm_type;
            unsigned rtm_flags;
        };

        // include/uapi/linux/nexthop.h
        struct nhmsg {
            unsigned char nh_family;
            unsigned char nh_scope;
            unsigned char nh_protocol;
            unsigned char resvd;
            unsigned int nh_flags;
        };
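As a rough illustration of these two layouts, both structs can be packed with Python's struct module: rtmsg comes to 12 bytes and nhmsg to 8. The numeric constants are the values from the kernel UAPI headers as far as I know them; the function names and defaults are assumptions made for this sketch.

```python
import struct

RTMSG = struct.Struct("=8BI")  # struct rtmsg: eight unsigned chars + unsigned rtm_flags
NHMSG = struct.Struct("=4BI")  # struct nhmsg: four unsigned chars + unsigned int nh_flags

AF_UNSPEC, AF_INET = 0, 2
RT_TABLE_MAIN = 254
RTPROT_UNSPEC, RTPROT_BOOT = 0, 3
RT_SCOPE_UNIVERSE = 0
RTN_UNICAST = 1


def pack_rtmsg(family, dst_len, table=RT_TABLE_MAIN, protocol=RTPROT_BOOT,
               scope=RT_SCOPE_UNIVERSE, rtype=RTN_UNICAST, flags=0):
    """Pack struct rtmsg; rtm_src_len and rtm_tos are left at 0."""
    return RTMSG.pack(family, dst_len, 0, 0, table, protocol, scope, rtype, flags)


def pack_nhmsg(family, scope=RT_SCOPE_UNIVERSE, protocol=RTPROT_UNSPEC, flags=0):
    """Pack struct nhmsg; the resvd byte is always 0."""
    return NHMSG.pack(family, scope, protocol, 0, flags)
```

The defaults mirror the field values that appear in the strace output later in the deck (RT_TABLE_MAIN, RTPROT_BOOT, RT_SCOPE_UNIVERSE, RTN_UNICAST).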
  20. Main RTA_* and NHA_* attributes

    Used with RTM_NEWROUTE etc.

        RTA_*            Description
        RTA_DST          Destination address (the type depends on rtm_family)
        RTA_OIF          Output interface (device ID)
        RTA_GATEWAY      Gateway address (IPv4, IPv6, etc.)
        RTA_MULTIPATH    Array of "rtnexthop + RTA_GATEWAY"
        RTA_NH_ID        ID of a Next Hop Object created with RTM_NEWNEXTHOP etc.

    Used with RTM_NEWNEXTHOP etc.

        NHA_*            Description
        NHA_ID           id == 0 means the ID is assigned automatically (auto-assign)
        NHA_GROUP        Array of nexthop_grp; each nexthop_grp contains a nexthop id and a weight
        NHA_OIF          Output interface (device ID)
        NHA_GATEWAY      Gateway address (IPv4, IPv6, etc.)
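Because the attribute number space depends on the message type, the same nla_type value means different things in a route message and in a nexthop message. The lookup below is a small sketch of that idea; the numeric values are the ones I understand the kernel UAPI headers to define, and only the attributes from the tables above are listed.

```python
from enum import IntEnum

RTM_NEWROUTE = 24
RTM_NEWNEXTHOP = 104


class RTA(IntEnum):
    """Subset of enum rtattr_type_t (rtnetlink.h)."""
    RTA_DST = 1
    RTA_OIF = 4
    RTA_GATEWAY = 5
    RTA_MULTIPATH = 9
    RTA_NH_ID = 30


class NHA(IntEnum):
    """Subset of the NHA_* attributes (nexthop.h)."""
    NHA_ID = 1
    NHA_GROUP = 2
    NHA_OIF = 5
    NHA_GATEWAY = 6


# The message type selects which attribute namespace applies.
ATTR_SPACE = {RTM_NEWROUTE: RTA, RTM_NEWNEXTHOP: NHA}


def attr_name(nlmsg_type: int, nla_type: int) -> str:
    """Translate an attribute number using the namespace of its message type."""
    try:
        return ATTR_SPACE[nlmsg_type](nla_type).name
    except ValueError:
        return f"unknown({nla_type})"
```

With this table, attribute 5 reads as RTA_GATEWAY inside RTM_NEWROUTE but as NHA_OIF inside RTM_NEWNEXTHOP, which is why a decoder has to look at nlmsg_type first.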
  21. netlink & nexthop (group): concrete examples
  22. Concrete netlink/rtnetlink examples (configuring ip route)

    The netlink messages of the following four patterns are compared:
    (legacy | using Next Hop Object) × (single Next Hop | Multipath)

    ① No nexthop object (legacy)
        ip route add 10.11.11.99/32 via 172.20.104.1 dev eno1

    ② No nexthop object, Multipath (legacy)
        ip route add 10.11.11.11/32 \
            nexthop via 172.20.105.174 dev eno1 \
            nexthop via 172.20.105.175 dev eno1

    ③ Using a nexthop
        ip nexthop add id 11 via 172.20.105.173 dev eno1
        ip route add 10.11.12.13/32 nhid 11

    ④ Using a nexthop group, Multipath
        ip nexthop add id 1 via 172.20.105.172 dev eno1
        ip nexthop add id 2 via 172.20.105.173 dev eno1
        ip nexthop add id 3 group 1/2
        ip route add 10.11.12.13/32 nhid 3
  23. Checking netlink messages with strace

    • strace is a tool that can monitor the netlink messages exchanged between a command and the kernel
    • RTM_NEWNEXTHOP, the message for Next Hop Objects, is supported from strace v5.15 (2021-10-14)

    How to build and install strace (when it cannot be updated with the yum / apt commands)

        download strace-6.0.tar.xz from https://github.com/strace/strace/releases/tag/v6.0
        $ tar xf strace-6.0.tar.xz
        $ cd strace-6.0
        $ ./configure --disable-mpers
        $ make
        $ sudo make install
        $ which strace
        /usr/local/bin/strace
        $ strace --version
        strace -- version 6.0
        Copyright (c) 1991-2022 The strace developers <https://strace.io>.
        This is free software; see the source for copying conditions. There is NO
        warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

        Optional features enabled: stack-trace=libunwind no-m32-mpers no-mx32-mpers
  24. ① No nexthop object (legacy)

        # ip route add 10.11.11.99/32 via 172.20.104.1 dev eno1
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base={{len=52, type=RTM_NEWROUTE, flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, seq=1669690078, pid=0},
          {rtm_family=AF_INET, rtm_dst_len=32, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_BOOT, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0},
          [{{nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.11.99")},
          {{nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("172.20.104.1")},
          {{nla_len=8, nla_type=RTA_OIF}, if_nametoindex("eno1")}]}, iov_len=52}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 52

    Netlink Message Type: RTM_NEWROUTE

    RT Message:
        rtm_family=AF_INET
        rtm_dst_len=32
        rtm_src_len=0
        rtm_tos=0
        rtm_table=RT_TABLE_MAIN
        rtm_protocol=RTPROT_BOOT
        rtm_scope=RT_SCOPE_UNIVERSE
        rtm_type=RTN_UNICAST
        rtm_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.11.99")
        {nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("172.20.104.1")
        {nla_len=8, nla_type=RTA_OIF}, if_nametoindex("eno1")

    With the legacy method, which does not use nexthop objects, the information about the nexthop is carried inside RTM_NEWROUTE.
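To connect the trace to the byte layout, the same request can be rebuilt by hand. The following is a sketch under a few assumptions: host byte order, an interface index of 2 for eno1 (a made-up value; the real one comes from if_nametoindex), and numeric constants taken from the kernel UAPI headers as I understand them. It only builds the bytes and does not send anything.

```python
import socket
import struct

RTM_NEWROUTE = 24
NLM_F_REQUEST, NLM_F_ACK, NLM_F_EXCL, NLM_F_CREATE = 0x001, 0x004, 0x200, 0x400
RTA_DST, RTA_OIF, RTA_GATEWAY = 1, 4, 5
AF_INET, RT_TABLE_MAIN, RTPROT_BOOT, RT_SCOPE_UNIVERSE, RTN_UNICAST = 2, 254, 3, 0, 1


def attr(nla_type: int, value: bytes) -> bytes:
    """{nla_len, nla_type} + value; all values used here are exactly 4 bytes."""
    return struct.pack("=HH", 4 + len(value), nla_type) + value


def build_newroute(dst: str, dst_len: int, gateway: str, ifindex: int, seq: int) -> bytes:
    """RTM_NEWROUTE carrying the nexthop inline as RTA_GATEWAY + RTA_OIF."""
    rtmsg = struct.pack("=8BI", AF_INET, dst_len, 0, 0, RT_TABLE_MAIN,
                        RTPROT_BOOT, RT_SCOPE_UNIVERSE, RTN_UNICAST, 0)
    attrs = (attr(RTA_DST, socket.inet_aton(dst))
             + attr(RTA_GATEWAY, socket.inet_aton(gateway))
             + attr(RTA_OIF, struct.pack("=i", ifindex)))
    flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE
    hdr = struct.pack("=IHHII", 16 + len(rtmsg) + len(attrs), RTM_NEWROUTE, flags, seq, 0)
    return hdr + rtmsg + attrs


IFINDEX_ENO1 = 2  # hypothetical interface index for eno1
msg = build_newroute("10.11.11.99", 32, "172.20.104.1", IFINDEX_ENO1, 1669690078)
```

The result is 16 (nlmsghdr) + 12 (rtmsg) + 3 × 8 (attributes) = 52 bytes, which matches the len=52 that strace reports.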
  25. ② No nexthop object, Multipath (legacy)

        # ip route add 10.11.11.11/32 nexthop via 172.20.105.174 dev eno1 nexthop via 172.20.105.175 dev eno1
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=72, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669864316, nlmsg_pid=0},
          {rtm_family=AF_INET, rtm_dst_len=32, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_BOOT, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0},
          [[{nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.11.11")],
          [{nla_len=36, nla_type=RTA_MULTIPATH},
          [[{rtnh_len=16, rtnh_flags=0, rtnh_hops=0, rtnh_ifindex=if_nametoindex("eno1")}, [{nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("172.20.105.174")]],
          [{rtnh_len=16, rtnh_flags=0, rtnh_hops=0, rtnh_ifindex=if_nametoindex("eno1")}, [{nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("172.20.105.175")]]]]]], iov_len=72}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 72

    Netlink Message Type: RTM_NEWROUTE

    RT Message:
        rtm_family=AF_INET
        rtm_dst_len=32
        rtm_src_len=0
        rtm_tos=0
        rtm_table=RT_TABLE_MAIN
        rtm_protocol=RTPROT_BOOT
        rtm_scope=RT_SCOPE_UNIVERSE
        rtm_type=RTN_UNICAST
        rtm_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.11.11")
        {nla_len=36, nla_type=RTA_MULTIPATH}
            {rtnh_len=16, rtnh_flags=0, rtnh_hops=0, rtnh_ifindex=if_nametoindex("eno1")}
                {nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("172.20.105.174")
            {rtnh_len=16, rtnh_flags=0, rtnh_hops=0, rtnh_ifindex=if_nametoindex("eno1")}
                {nla_len=8, nla_type=RTA_GATEWAY}, inet_addr("172.20.105.175")

        // include/uapi/linux/rtnetlink.h
        struct rtnexthop {
            unsigned short rtnh_len;
            unsigned char rtnh_flags;
            unsigned char rtnh_hops;
            int rtnh_ifindex;
        };

    • Multiple nexthops (Multipath) can also be configured with the legacy method, without nexthop objects.
    • The value of RTA_MULTIPATH contains one rtnexthop structure per nexthop.
    • rtnh_ifindex is set to the output interface ID (the equivalent of RTA_OIF), and the gateway address is set as an RTA_GATEWAY attribute in the value of each rtnexthop.

    Note: rtnh_ifindex is a number (interface ID).
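The nesting inside RTA_MULTIPATH (an rtnexthop header followed by its own RTA_GATEWAY attribute, repeated per path) can be reproduced as follows. This is an illustrative sketch assuming host byte order and a made-up interface index of 2 for eno1; the helper names are not from any library.

```python
import socket
import struct

RTA_DST, RTA_GATEWAY, RTA_MULTIPATH = 1, 5, 9


def attr(nla_type: int, value: bytes) -> bytes:
    """{nla_len, nla_type} + value; every value here is a multiple of 4 bytes."""
    return struct.pack("=HH", 4 + len(value), nla_type) + value


def rtnexthop(gateway: str, ifindex: int) -> bytes:
    """struct rtnexthop (8 bytes) followed by a nested RTA_GATEWAY attribute."""
    gw = attr(RTA_GATEWAY, socket.inet_aton(gateway))
    # rtnh_len covers the rtnexthop header plus its nested attributes.
    return struct.pack("=HBBi", 8 + len(gw), 0, 0, ifindex) + gw


def multipath_attr(hops) -> bytes:
    """RTA_MULTIPATH whose value is the concatenation of all rtnexthop entries."""
    return attr(RTA_MULTIPATH, b"".join(rtnexthop(gw, ifindex) for gw, ifindex in hops))


IFINDEX_ENO1 = 2  # hypothetical interface index for eno1
mp = multipath_attr([("172.20.105.174", IFINDEX_ENO1), ("172.20.105.175", IFINDEX_ENO1)])

# nlmsghdr (16) + rtmsg (12) + RTA_DST (8) + RTA_MULTIPATH
total_len = 16 + 12 + len(attr(RTA_DST, socket.inet_aton("10.11.11.11"))) + len(mp)
```

Each entry is 8 + 8 = 16 bytes (rtnh_len=16), the attribute is 4 + 2 × 16 = 36 bytes (nla_len=36), and the whole message comes to 72 bytes, as in the trace.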
  26. ③ Using a nexthop

        > ip nexthop add id 11 via 172.20.105.173 dev eno1
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=48, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669695871, nlmsg_pid=0},
          {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0},
          [[{nla_len=8, nla_type=NHA_ID}, 11],
          [{nla_len=8, nla_type=NHA_GATEWAY}, inet_addr("172.20.105.173")],
          [{nla_len=8, nla_type=NHA_OIF}, if_nametoindex("eno1")]]], iov_len=48}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48

    Netlink Message Type: RTM_NEWNEXTHOP

    Next Hop Message:
        nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=NHA_ID}, 11
        {nla_len=8, nla_type=NHA_GATEWAY}, inet_addr("172.20.105.173")
        {nla_len=8, nla_type=NHA_OIF}, if_nametoindex("eno1")

        > ip route add 10.11.12.13/32 nhid 11
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=44, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669710575, nlmsg_pid=0},
          {rtm_family=AF_INET, rtm_dst_len=32, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_BOOT, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0},
          [[{nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.12.13")],
          [{nla_len=8, nla_type=RTA_NH_ID}, "\x0b\x00\x00\x00"]]], iov_len=44}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 44

    Netlink Message Type: RTM_NEWROUTE

    RT Message:
        rtm_family=AF_INET
        rtm_dst_len=32
        rtm_src_len=0
        rtm_tos=0
        rtm_table=RT_TABLE_MAIN
        rtm_protocol=RTPROT_BOOT
        rtm_scope=RT_SCOPE_UNIVERSE
        rtm_type=RTN_UNICAST
        rtm_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.12.13")
        {nla_len=8, nla_type=RTA_NH_ID}, "\x0b\x00\x00\x00"

    When a Next Hop Object is used, an RTM_NEWNEXTHOP message is sent first to create the nexthop. Its ID is then sent in RTA_NH_ID, an attribute of RTM_NEWROUTE, in place of RTA_GATEWAY and RTA_OIF.
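The two-step exchange can be sketched as a pair of byte builders. Everything below is illustrative: it assumes host byte order for packing, a made-up interface index of 2 for eno1, and, for decoding the raw RTA_NH_ID bytes that strace prints, that the trace was taken on a little-endian host.

```python
import socket
import struct

RTM_NEWROUTE, RTM_NEWNEXTHOP = 24, 104
FLAGS = 0x001 | 0x004 | 0x200 | 0x400  # NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE
RTA_DST, RTA_NH_ID = 1, 30
NHA_ID, NHA_OIF, NHA_GATEWAY = 1, 5, 6


def attr(nla_type: int, value: bytes) -> bytes:
    """{nla_len, nla_type} + value; all values here are exactly 4 bytes."""
    return struct.pack("=HH", 4 + len(value), nla_type) + value


def message(msg_type: int, seq: int, body: bytes, attrs: bytes) -> bytes:
    """Prefix body + attributes with a struct nlmsghdr."""
    return struct.pack("=IHHII", 16 + len(body) + len(attrs), msg_type, FLAGS, seq, 0) + body + attrs


def new_nexthop(nh_id: int, gateway: str, ifindex: int, seq: int) -> bytes:
    """Step 1: RTM_NEWNEXTHOP with NHA_ID, NHA_GATEWAY and NHA_OIF."""
    nhmsg = struct.pack("=4BI", 2, 0, 0, 0, 0)  # AF_INET, scope universe, RTPROT_UNSPEC
    attrs = (attr(NHA_ID, struct.pack("=I", nh_id))
             + attr(NHA_GATEWAY, socket.inet_aton(gateway))
             + attr(NHA_OIF, struct.pack("=I", ifindex)))
    return message(RTM_NEWNEXTHOP, seq, nhmsg, attrs)


def new_route_via_nhid(dst: str, dst_len: int, nh_id: int, seq: int) -> bytes:
    """Step 2: RTM_NEWROUTE that references the nexthop only by RTA_NH_ID."""
    rtmsg = struct.pack("=8BI", 2, dst_len, 0, 0, 254, 3, 0, 1, 0)
    attrs = attr(RTA_DST, socket.inet_aton(dst)) + attr(RTA_NH_ID, struct.pack("=I", nh_id))
    return message(RTM_NEWROUTE, seq, rtmsg, attrs)


def decode_nh_id(raw: bytes) -> int:
    """Interpret strace's raw RTA_NH_ID bytes as a little-endian u32."""
    return int.from_bytes(raw, "little")


nh_msg = new_nexthop(11, "172.20.105.173", 2, 1669695871)      # ifindex 2 is hypothetical
rt_msg = new_route_via_nhid("10.11.12.13", 32, 11, 1669710575)
```

The nexthop message is 16 + 8 + 3 × 8 = 48 bytes and the route message 16 + 12 + 2 × 8 = 44 bytes, matching the trace, and "\x0b\x00\x00\x00" decodes to 11, the ID passed to ip nexthop add.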
  27. ④ Using a nexthop group, Multipath (1/2)

    1. Send RTM_NEWNEXTHOP messages to create the nexthops (two or more)

        > ip nexthop add id 1 via 172.20.105.172 dev eno1
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=48, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669711458, nlmsg_pid=0},
          {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0},
          [[{nla_len=8, nla_type=NHA_ID}, 1],
          [{nla_len=8, nla_type=NHA_GATEWAY}, inet_addr("172.20.105.172")],
          [{nla_len=8, nla_type=NHA_OIF}, if_nametoindex("eno1")]]], iov_len=48}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48

    Netlink Message Type: RTM_NEWNEXTHOP

    Next Hop Message:
        nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=NHA_ID}, 1
        {nla_len=8, nla_type=NHA_GATEWAY}, inet_addr("172.20.105.172")
        {nla_len=8, nla_type=NHA_OIF}, if_nametoindex("eno1")

        > ip nexthop add id 2 via 172.20.105.173 dev eno1
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=48, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669711492, nlmsg_pid=0},
          {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0},
          [[{nla_len=8, nla_type=NHA_ID}, 2],
          [{nla_len=8, nla_type=NHA_GATEWAY}, inet_addr("172.20.105.173")],
          [{nla_len=8, nla_type=NHA_OIF}, if_nametoindex("eno1")]]], iov_len=48}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48

    Netlink Message Type: RTM_NEWNEXTHOP

    Next Hop Message:
        nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=NHA_ID}, 2
        {nla_len=8, nla_type=NHA_GATEWAY}, inet_addr("172.20.105.173")
        {nla_len=8, nla_type=NHA_OIF}, if_nametoindex("eno1")
  28. ④ Using a nexthop group, Multipath (2/2)

    2. Send an RTM_NEWNEXTHOP message with the IDs of the nexthops created in step 1 set in NHA_GROUP, to create the nexthop group

        > ip nexthop add id 3 group 1/2
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=52, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669711532, nlmsg_pid=0},
          {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0},
          [[{nla_len=8, nla_type=NHA_ID}, 3],
          [{nla_len=20, nla_type=NHA_GROUP}, [{id=1, weight=0}, {id=2, weight=0}]]]], iov_len=52}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 52

    Netlink Message Type: RTM_NEWNEXTHOP

    Next Hop Message:
        nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=NHA_ID}, 3
        {nla_len=20, nla_type=NHA_GROUP}, [ {id=1, weight=0}, {id=2, weight=0} ]

    3. Send RTM_NEWROUTE with the ID of the nexthop group set in RTA_NH_ID

        > ip route add 10.11.12.13/32 nhid 3
        sendmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12,
          msg_iov=[{iov_base=[{nlmsg_len=44, nlmsg_type=RTM_NEWROUTE, nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE, nlmsg_seq=1669711569, nlmsg_pid=0},
          {rtm_family=AF_INET, rtm_dst_len=32, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_BOOT, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0},
          [[{nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.12.13")],
          [{nla_len=8, nla_type=RTA_NH_ID}, "\x03\x00\x00\x00"]]], iov_len=44}],
          msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 44

    Netlink Message Type: RTM_NEWROUTE

    RT Message:
        rtm_family=AF_INET
        rtm_dst_len=32
        rtm_src_len=0
        rtm_tos=0
        rtm_table=RT_TABLE_MAIN
        rtm_protocol=RTPROT_BOOT
        rtm_scope=RT_SCOPE_UNIVERSE
        rtm_type=RTN_UNICAST
        rtm_flags=0

    Netlink Attribute:
        {nla_len=8, nla_type=RTA_DST}, inet_addr("10.11.12.13")
        {nla_len=8, nla_type=RTA_NH_ID}, "\x03\x00\x00\x00"
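The group-creation message in step 2 can be rebuilt in the same way. The sketch below assumes the 8-byte struct nexthop_grp layout from include/uapi/linux/nexthop.h (a u32 id, a u8 weight and three reserved bytes). It also assumes that the on-wire weight is the configured weight minus one; that is my reading of iproute2's behaviour, and it is consistent with the trace showing weight=0 while ip route prints weight 1, but treat it as an assumption.

```python
import struct

RTM_NEWNEXTHOP = 104
FLAGS = 0x001 | 0x004 | 0x200 | 0x400  # NLM_F_REQUEST|NLM_F_ACK|NLM_F_EXCL|NLM_F_CREATE
NHA_ID, NHA_GROUP = 1, 2
AF_UNSPEC = 0

NEXTHOP_GRP = struct.Struct("=IBBH")  # id, weight, resvd1, resvd2


def attr(nla_type: int, value: bytes) -> bytes:
    """{nla_len, nla_type} + value; values here are multiples of 4 bytes."""
    return struct.pack("=HH", 4 + len(value), nla_type) + value


def group_attr(members) -> bytes:
    """NHA_GROUP from (nexthop_id, weight) pairs, with weight given as 1 or more."""
    entries = b"".join(NEXTHOP_GRP.pack(nh_id, weight - 1, 0, 0) for nh_id, weight in members)
    return attr(NHA_GROUP, entries)


def new_group(group_id: int, members, seq: int) -> bytes:
    """RTM_NEWNEXTHOP that creates a group; nh_family is AF_UNSPEC as in the trace."""
    nhmsg = struct.pack("=4BI", AF_UNSPEC, 0, 0, 0, 0)
    payload = nhmsg + attr(NHA_ID, struct.pack("=I", group_id)) + group_attr(members)
    return struct.pack("=IHHII", 16 + len(payload), RTM_NEWNEXTHOP, FLAGS, seq, 0) + payload


grp_msg = new_group(3, [(1, 1), (2, 1)], 1669711532)
```

Two 8-byte entries plus the 4-byte attribute header give nla_len=20, and the full message is 16 + 8 + 8 + 20 = 52 bytes, as reported by strace.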
  29. Appendix
  30. struct & enum of netlink / rtnetlink

    include/uapi/linux/netlink.h

        struct nlmsghdr { // netlink.h
            __u32 nlmsg_len;    /* Len of message inc. hdr */
            __u16 nlmsg_type;   /* RTM_* (depends on protocol specified at socket open) */
            __u16 nlmsg_flags;  /* Additional flags */
            __u32 nlmsg_seq;    /* Sequence number */
            __u32 nlmsg_pid;    /* Sending process port ID */
        };

        #define NETLINK_ROUTE 0 /* Routing/device hook */

    • An NLMSG_ERROR message indicates an error; its payload contains an nlmsgerr structure.
    • An NLMSG_DONE message signals the end of a multipart message.

    Standard flag bits in nlmsg_flags

        NLM_F_REQUEST    Must be set on all request messages.
        NLM_F_MULTI      The message is part of a multipart message. A multipart message is terminated by NLMSG_DONE.
        NLM_F_ACK        Request an acknowledgement on success.
        NLM_F_ECHO       Echo this request.

        struct sockaddr_nl {
            __kernel_sa_family_t nl_family; /* AF_NETLINK */
            unsigned short nl_pad;          /* zero */
            __u32 nl_pid;                   /* port ID (0 for Kernel) */
            __u32 nl_groups;                /* mask of RTNLGRP_*, unicast if 0 */
        };

        > man socket
        int socket(int domain, int type, int protocol);
            domain = AF_NETLINK
            type = SOCK_RAW
            protocol = NETLINK_ROUTE (belongs to domain)

    include/uapi/linux/rtnetlink.h

        enum {
            RTM_NEWLINK = 16,
            RTM_DELLINK,
            RTM_GETLINK,
            RTM_SETLINK,
            RTM_NEWADDR = 20,
            RTM_DELADDR,
            RTM_GETADDR,
            RTM_NEWROUTE = 24,
            RTM_DELROUTE,
            RTM_GETROUTE,
            ...snip...
        };

        /* RTM_MULTIPATH --- array of struct rtnexthop.
           At the moment it is impossible to set different prefsrc, mtu, window
           and rtt for different paths from multipath. */
        struct rtnexthop {
            unsigned short rtnh_len;
            unsigned char rtnh_flags;
            unsigned char rtnh_hops;
            int rtnh_ifindex;
        };

        struct rtmsg { // rtnetlink.h
            unsigned char rtm_family;
            unsigned char rtm_dst_len;
            unsigned char rtm_src_len;
            unsigned char rtm_tos;
            unsigned char rtm_table;
            unsigned char rtm_protocol;
            unsigned char rtm_scope;
            unsigned char rtm_type;
            unsigned rtm_flags;
        };

        /* rtm_type */
        enum {
            RTN_UNSPEC,
            RTN_UNICAST,     /* Gateway or direct route */
            RTN_LOCAL,       /* Accept locally */
            RTN_BROADCAST,   /* Accept locally as broadcast, send as broadcast */
            RTN_ANYCAST,     /* Accept locally as broadcast, but send as unicast */
            RTN_MULTICAST,   /* Multicast route */
            RTN_BLACKHOLE,   /* Drop */
            RTN_UNREACHABLE, /* Destination is unreachable */
            RTN_PROHIBIT,    /* Administratively prohibited */
            RTN_THROW,       /* Not in this table */
            RTN_NAT,         /* Translate this address */
            RTN_XRESOLVE,    /* Use external resolver */
            __RTN_MAX
        };

        /* rtm_protocol */
        #define RTPROT_KERNEL 2   /* kernel */
        #define RTPROT_STATIC 4   /* admin */
        #define RTPROT_GATED  8   /* GateD */
        #define RTPROT_RA     9   /* RDISC/ND */
        #define RTPROT_ZEBRA  11  /* Zebra */
        #define RTPROT_BGP    186 /* BGP */
        #define RTPROT_ISIS   187 /* ISIS */
        #define RTPROT_OSPF   188 /* OSPF */
        #define RTPROT_RIP    189 /* RIP */

        /* RTA_VIA */
        struct rtvia {
            __kernel_sa_family_t rtvia_family;
            __u8 rtvia_addr[];
        };

        /* RTnetlink multicast groups */
        enum rtnetlink_groups
            RTNLGRP_LINK         // 1
            RTNLGRP_NEIGH        // 3
            RTNLGRP_IPV4_IFADDR  // 5
            RTNLGRP_IPV4_MROUTE  // 6
            RTNLGRP_IPV4_ROUTE   // 7
            RTNLGRP_IPV4_RULE    // 8
            RTNLGRP_IPV6_IFADDR  // 9
            RTNLGRP_IPV6_MROUTE  // 10
            RTNLGRP_IPV6_ROUTE   // 11

        struct rtattr {
            unsigned short rta_len;
            unsigned short rta_type;
        };

        /* Routing message attr */
        enum rtattr_type_t {
            RTA_UNSPEC,   // 0
            RTA_DST,      // 1
            RTA_SRC,      // 2
            RTA_IIF,      // 3
            RTA_OIF,      // 4
            RTA_GATEWAY,  // 5
            RTA_PRIORITY, // 6
            ...snip...
        }
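To tie the socket() parameters and the NLMSG_ERROR / NLMSG_DONE notes together, here is a small Python sketch. open_rtnetlink() only works on Linux, where Python's socket module exposes AF_NETLINK and NETLINK_ROUTE. classify_reply() assumes that a request sent with NLM_F_ACK is answered by an NLMSG_ERROR message whose nlmsgerr payload starts with an int error field, 0 on success and a negative errno on failure. Both function names are made up for this example.

```python
import socket
import struct

NLMSG_ERROR, NLMSG_DONE = 2, 3
NLMSG_HDR = struct.Struct("=IHHII")  # nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid


def open_rtnetlink() -> socket.socket:
    """socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE), bound as (nl_pid, nl_groups)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, 0))  # pid 0: let the kernel pick the port ID; groups 0: unicast only
    return sock


def classify_reply(data: bytes):
    """Classify the first netlink message found in a receive buffer."""
    nlmsg_len, nlmsg_type, _flags, seq, _pid = NLMSG_HDR.unpack_from(data)
    if nlmsg_type == NLMSG_ERROR:
        # struct nlmsgerr begins with "int error", followed by the original header.
        (error,) = struct.unpack_from("=i", data, NLMSG_HDR.size)
        return ("ack", seq) if error == 0 else ("error", -error)
    if nlmsg_type == NLMSG_DONE:
        return ("done", seq)
    return ("data", nlmsg_type)


def fake_error_reply(error: int, seq: int) -> bytes:
    """Build an NLMSG_ERROR reply by hand, for illustration only."""
    original_hdr = NLMSG_HDR.pack(44, 24, 0x605, seq, 0)
    payload = struct.pack("=i", error) + original_hdr
    return NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), NLMSG_ERROR, 0, seq, 0) + payload
```

In a real exchange, the bytes passed to classify_reply() would come from sock.recv() after sending one of the request messages shown earlier; adding routes or nexthops that way normally requires root privileges.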