Slide 1

Slide 1 text

Packet Filtering Inside BitVisor Using BPF +α
Masanori Misono (味曽野 雅史), Graduate School of Information Science and Technology, The University of Tokyo
BitVisor Summit6 | 2017-12-5

Slide 2

Slide 2 text

I want a lightweight packet filtering mechanism that does not depend on the guest OS.

Slide 3

Slide 3 text

A lightweight packet filtering mechanism that does not depend on the guest OS? That is easy to do with BitVisor.

Slide 4

Slide 4 text

A lightweight, guest-OS-independent packet filter is easy to build with BitVisor.
• Open questions
  – Where to filter
  – How to filter

Slide 5

Slide 5 text

Ordinary network driver
Diagram labels: Descriptor Ring Buffer Base Address DMA Engine NIC Physical Memory DMA Buffer Address

Slide 6

Slide 6 text

BitVisor's para-pass-through driver
Diagram labels: Shadow Descriptor Shadow Buffer Base Address DMA Engine NIC Physical Memory Copy Buffer Address Descriptor Buffer Buffer Address Managed by BitVisor Managed by Guest

Slide 7

Slide 7 text

Transmission Flow
1. The guest sets up the Tx ring (configures the buffers).
2. The guest updates TDT → VMExit. The MMIO access is hooked through an EPT violation.
3. BitVisor copies the data to the shadow buffer.
4. BitVisor updates the real TDT. The device then transmits.
Diagram labels: Guest Tx Ring, Shadow Tx Ring, ①, ② VMExit, ③ Copy, ④ Update TDT

Slide 8

Slide 8 text

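Receiving Flow
1. The guest receives an interrupt. Interrupts themselves are basically passed through (this differs when virtio is used).
2. The guest accesses an MMIO register → VMExit. It accesses the register to check the interrupt.
3. BitVisor copies the data to the guest buffer.
4. BitVisor updates RDT. The guest then consults its own descriptors and receives the packet.
Diagram labels: Guest Rx Ring, Shadow Rx Ring, ① Interrupt, ② VMExit, ③ Copy, ④ Update RDT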

Slide 9

Slide 9 text

BitVisor's para-pass-through driver
Diagram labels: Shadow Descriptor Shadow Buffer Base Address DMA Engine NIC Physical Memory Copy Buffer Address Descriptor Buffer Buffer Address Managed by BitVisor Managed by Guest
Filter here and the job is done!

Slide 10

Slide 10 text

Network drivers in BitVisor
Diagram labels: Guest Buffer, Shadow Buffer, Intermediate Buffer, A, B
Defined by the network driver.
The network API callback function is called at timing A or timing B.

Slide 11

Slide 11 text

Network API
Diagram labels: null pass ip lwIP ippass lwIP

Slide 12

Slide 12 text

pass module
Diagram labels: Guest Buffer, Shadow Buffer, Intermediate Buffer, A, B, send_phys(), send_virt()

static void
netapi_net_pass_recv_callback (…)
{
	struct net_pass_data2 *p = param;

	p->func->send (p->handle, num_packets, packets, packet_sizes, true);
}

send_phys() is called for A, and send_virt() for B.

In the pro driver (points A and B):

static void
receive_physnic (…)
{
	…
	d2->recvphys_func (…);
	…
}

static int
process_tdesc (…)
{
	…
	d2->recvvirt_func (…);
	…
}

Slide 13

Slide 13 text

Filtering point
Diagram labels: Guest Buffer, Shadow Buffer, Intermediate Buffer, A, B, send_phys(), send_virt()

static void
netapi_net_pass_recv_callback (…)
{
	struct net_pass_data2 *p = param;

	// Filter here!!
	p->func->send (p->handle, num_packets, packets, packet_sizes, true);
}
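The idea of filtering just before the send call can be modelled in a few lines of Python. This is only a sketch: `filter_and_forward`, `accept` and the `send(num_packets, packets, packet_sizes)` signature are hypothetical stand-ins that mirror the arguments on the slide, not BitVisor's actual API.

```python
def filter_and_forward(packets, accept, send):
    """Forward only the packets for which accept(packet) is true.

    packets: list of bytes objects (one per frame)
    accept:  hypothetical predicate, e.g. a BPF program wrapped in a function
    send:    hypothetical stand-in for p->func->send on the slide
    Returns the number of dropped packets.
    """
    kept = [p for p in packets if accept(p)]
    if kept:
        # Rebuild the count and size arrays so they match the kept packets.
        send(len(kept), kept, [len(p) for p in kept])
    return len(packets) - len(kept)
```

The point of the sketch is that dropping a packet means compacting the packet and size arrays before they reach the next buffer, so the guest never sees the dropped frames.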

Slide 14

Slide 14 text

How to Filter

Slide 15

Slide 15 text

How to Filter
Use BPF!
✅ Efficiency
✅ Safety
✅ Flexibility

Slide 16

Slide 16 text

% tcpdump -d host 127.0.0.1 and port 80
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 18
(002) ld       [26]
(003) jeq      #0x7f000001      jt 6    jf 4
(004) ld       [30]
(005) jeq      #0x7f000001      jt 6    jf 18
(006) ldb      [23]
(007) jeq      #0x84            jt 10   jf 8
(008) jeq      #0x6             jt 10   jf 9
(009) jeq      #0x11            jt 10   jf 18
(010) ldh      [20]
(011) jset     #0x1fff          jt 18   jf 12
(012) ldxb     4*([14]&0xf)
(013) ldh      [x + 14]
(014) jeq      #0x50            jt 17   jf 15
(015) ldh      [x + 16]
(016) jeq      #0x50            jt 17   jf 18
(017) ret      #262144
(018) ret      #0

• Berkeley Packet Filter [USENIX Winter '93]
• A virtual register machine for packet operations
• libpcap, BSD, Linux, …
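To make the listing concrete, here is a minimal Python sketch of a classic BPF interpreter that runs exactly the instructions printed above. The tuple encoding, the opcode names `ldxb_msh` and `ldh_ind`, and the helper `make_tcp_packet` are assumptions made for this illustration; real implementations use the binary `struct bpf_insn` format.

```python
import struct

# (opcode, operand, jump-if-true, jump-if-false); jump targets are the
# absolute instruction indices that tcpdump prints.
PROGRAM = [
    ("ldh", 12, None, None),        # 000: EtherType
    ("jeq", 0x800, 2, 18),          # 001: IPv4?
    ("ld", 26, None, None),         # 002: source address
    ("jeq", 0x7F000001, 6, 4),      # 003
    ("ld", 30, None, None),         # 004: destination address
    ("jeq", 0x7F000001, 6, 18),     # 005
    ("ldb", 23, None, None),        # 006: IP protocol
    ("jeq", 0x84, 10, 8),           # 007: SCTP
    ("jeq", 0x6, 10, 9),            # 008: TCP
    ("jeq", 0x11, 10, 18),          # 009: UDP
    ("ldh", 20, None, None),        # 010: flags + fragment offset
    ("jset", 0x1FFF, 18, 12),       # 011: reject non-first fragments
    ("ldxb_msh", 14, None, None),   # 012: X = 4 * (IP header length field)
    ("ldh_ind", 14, None, None),    # 013: source port
    ("jeq", 0x50, 17, 15),          # 014
    ("ldh_ind", 16, None, None),    # 015: destination port
    ("jeq", 0x50, 17, 18),          # 016
    ("ret", 262144, None, None),    # 017: accept
    ("ret", 0, None, None),         # 018: drop
]

def load(pkt, off, size):
    """Read a big-endian value, or None if the access is out of bounds."""
    if off < 0 or off + size > len(pkt):
        return None
    return int.from_bytes(pkt[off:off + size], "big")

def run(program, pkt):
    """Run the filter; the return value is the number of bytes to accept (0 = drop)."""
    a = x = pc = 0
    while True:
        op, k, jt, jf = program[pc]
        pc += 1
        if op in ("ld", "ldh", "ldb", "ldh_ind", "ldxb_msh"):
            size = {"ld": 4, "ldh": 2, "ldb": 1, "ldh_ind": 2, "ldxb_msh": 1}[op]
            value = load(pkt, (x + k) if op == "ldh_ind" else k, size)
            if value is None:
                return 0            # out-of-bounds loads reject the packet
            if op == "ldxb_msh":
                x = 4 * (value & 0xF)
            else:
                a = value
        elif op == "jeq":
            pc = jt if a == k else jf
        elif op == "jset":
            pc = jt if a & k else jf
        elif op == "ret":
            return k
        else:
            raise ValueError("unsupported opcode: " + op)

def make_tcp_packet(src_ip, dst_ip, sport, dport):
    """Build a minimal Ethernet + IPv4 + TCP frame (checksums left at zero)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     bytes(src_ip), bytes(dst_ip))
    tcp = struct.pack("!HHIIBBHHH", sport, dport, 0, 0, 0x50, 0x02, 0, 0, 0)
    return eth + ip + tcp
```

For example, `run(PROGRAM, make_tcp_packet([127, 0, 0, 1], [10, 0, 0, 1], 1234, 80))` follows the path 000 → 001 → 002 → 003 → 006 → … → 017 and accepts the packet, while a frame with neither address equal to 127.0.0.1 falls through to 018.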

Slide 17

Slide 17 text

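Why BPF
• Efficiency
  – Only the minimum necessary packet accesses
  – Easy to JIT-compile
• Flexibility
  – An instruction set with sufficient expressive power
• Safety
  – No backward jumps (cBPF)
  – Safety is easy to verify (e.g. memory accesses)
  ※ Safety is checked by a verifier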
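The safety argument for cBPF can be sketched as a tiny checker. The rules below (forward-only, in-range jumps and a final `ret`) are a simplified assumption for illustration, and the tuple format `(opcode, operand, jt, jf)` with absolute jump targets is hypothetical; real verifiers check considerably more, such as valid opcodes and scratch-memory indices.

```python
def verify(program):
    """Return True if the program passes these simplified safety rules.

    program: list of (opcode, operand, jt, jf) tuples, where jt and jf are
             absolute instruction indices for jump instructions.
    """
    n = len(program)
    # The program must end in a return so execution cannot run off the end.
    if n == 0 or program[-1][0] != "ret":
        return False
    for pc, (op, _operand, jt, jf) in enumerate(program):
        if op in ("jeq", "jset"):
            for target in (jt, jf):
                # Forward-only jumps guarantee termination: the program
                # counter strictly increases on every step.
                if not (pc < target < n):
                    return False
    return True
```

Because every step moves the program counter forward, a verified program of n instructions executes at most n instructions, which is what makes it reasonable to run inside the hypervisor.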

Slide 18

Slide 18 text

Design Overview
Diagram labels: BPF (SandBoxed), Shadow Buffer, Guest Buffer, Drop, pass / ippass module
(Transmission works the same way.)

Slide 19

Slide 19 text

Types of BPF

cBPF (classic BPF)
  Overview: the original, long-standing BPF
  Compiled with: libpcap
  Program example:
    host = 127.0.0.1 and port = 80

eBPF (extended BPF)
  Overview: the BPF mainly used in current Linux
  Compiled with: clang, bcc, etc...
  Program example:
    int filter(struct __sk_buff *skb) {
        u8 *cursor = 0;
        struct ethernet_t *ethernet = cursor_advance(cursor, sizeof(*ethernet));
        if (!(ethernet->type == 0x0800)) {
            goto DROP;
        }
        …

Example implementations: libpcap, Linux Kernel, ubpf

Slide 20

Slide 20 text

How to install a BPF Program
• Add it via VMCall
• Use the lwIP feature
  ※ With ippass, this can be done with a single NIC
Diagram labels: Guest OS, BPF Program, BPF Server, Verifier ❌, BPF

Slide 21

Slide 21 text

Throughput
netperf -l3; the server is the BitVisor machine
Measured 10 times, nohz=off, pinning

Slide 22

Slide 22 text

Ping round trip time
ping, measured 30 times, nohz=off

Slide 23

Slide 23 text

Ping round trip time
ping, measured 30 times, nohz=off

Slide 24

Slide 24 text

Netperf Latency
netperf -l3 -t omni -- -d rr -T UDP -m 64
nohz=off, pinning

Slide 25

Slide 25 text

Related Works

Filtering in VMM                        | Filtering Level  | Stateful Filtering | Supported OS | Lightness (⇄ complexity)
nwfilter (KVM)                          | Packet           | ○ (conntrack)      | Linux        | ○
SDN (Open vSwitch, VMware NSX)          | Packet           | ○                  | All          | △
VMI (VMwall [1], xFilter [2], AL-Safe [3]) | Packet & Process | ○(?)            | Linux        | △
AWS Security Group                      | Packet           | △                  | All          | ○
BPF in BitVisor                         | Packet           | △*                 | All          | ◎

* An approach using eBPF Maps is conceivable. A simplified check for connections initiated from the guest's own side is already possible, for example by looking at the TCP ACK field.

[1] A. Srivastava and J. Giffin. Tamper-resistant, application-aware blocking of malicious network connections. In RAID, pages 39-58. Springer, 2008.
[2] K. Kourai, T. Azumi, and S. Chiba. Efficient and fine-grained vmm-level packet filtering for self-protection. IJARAS, 5(2):83-100, Apr. 2014.
[3] A. Giannakou, L. Rilling, J.-L. Pazat, and C. Morin. AL-SAFE: A secure self-adaptable application-level firewall for IaaS clouds. In CloudCom, pages 383-390. IEEE, 2016.
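The footnote's stateless heuristic can be sketched as follows. The assumption here is that an inbound TCP segment with SYN set and ACK clear is a connection attempt from outside, so dropping those leaves only connections the guest itself initiated. The function names and the frame builder are hypothetical, and the sketch ignores VLAN tags, IPv6 and fragments.

```python
def is_unsolicited_tcp_syn(frame):
    """True if frame is an IPv4/TCP segment with SYN set and ACK clear."""
    if len(frame) < 14 + 20:
        return False
    if int.from_bytes(frame[12:14], "big") != 0x0800:   # not IPv4
        return False
    if frame[23] != 6:                                   # not TCP
        return False
    tcp = 14 + (frame[14] & 0x0F) * 4                    # skip the IP header
    if len(frame) < tcp + 14:
        return False
    flags = frame[tcp + 13]
    return bool(flags & 0x02) and not (flags & 0x10)     # SYN and not ACK

def make_frame(tcp_flags):
    """Build a minimal Ethernet + IPv4 + TCP frame with the given TCP flags."""
    eth = bytes(12) + b"\x08\x00"
    ip = bytes([0x45]) + bytes(8) + bytes([6]) + bytes(10)
    tcp = bytes(13) + bytes([tcp_flags]) + bytes(6)
    return eth + ip + tcp
```

Applied on the receive path only, a filter built on this predicate would drop `make_frame(0x02)` (a bare SYN) but pass `make_frame(0x12)` (the SYN+ACK reply to a connection opened by the guest). It keeps no state, which is why the table rates stateful filtering as △ rather than ○.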

Slide 26

Slide 26 text

Beyond The Packet Filtering

Slide 27

Slide 27 text

Tracing with eBPF

Slide 28

Slide 28 text

extended BPF
• Registers are 64 bits wide
• More instructions
• An ISA that is easier to JIT-compile
• Calls to functions registered in advance

Slide 29

Slide 29 text


Slide 30

Slide 30 text

Basics of Tracing with eBPF
1. An event occurs.
2. The eBPF program is invoked (argument: context).
3. As needed, the program updates data through a BPF Call.
4. The data structure (an associative array) is read later.

Slide 31

Slide 31 text

eBPF Tracing in BitVisor
Diagram labels: Guest OS, Program, Attach, fetch data, BitVisor, eBPF, HashMap, event
※ Using lwIP is also conceivable.

Slide 32

Slide 32 text

Example: Tracing the VMExit Reason

struct args {
	u32 exit_reason;
	u32 exit_qualification;
	u64 data;
};

static void
vt__exit_reason (void)
{
	…
	int ret = bpf_exec (&args, sizeof (struct args));
	…
}

Slide 33

Slide 33 text

Count exit reasons and show at most the top 10, using bcc (BPF Compiler Collection)

import time
import bcc

# attach_program, detach_program, get_hashmap and EXIT_REASON are helpers
# that are not shown on the slide.
prog = bcc.BPF(text="""
int entry(struct args* args){
    update(args->exit_reason);  // increment the associative-array entry by 1
    return 0;
}
""")
attach_program(prog)  # attach the BPF program
print("Tracing... Hit Ctrl-C to end.")
try:
    time.sleep(9999999)
except KeyboardInterrupt:
    print()
detach_program()
r = get_hashmap()  # fetch the associative-array data
for k, v in sorted(r.items(), key=lambda x: x[1], reverse=True)[:10]:
    print("{:17s}: {}".format(EXIT_REASON[k], v))

Slide 34

Slide 34 text

(Same script as on slide 33.) Output:

% ./monitor_exit_reason.py
Tracing... Hit Ctrl-C to end.
^C
EPT_VIOLATION    : 3135
CPUID            : 16
IO_INSTRUCTION   : 8
EXCEPTION_OR_NMI : 3
VMCALL           : 2

Slide 35

Slide 35 text

Monitoring MMIO accesses

import subprocess
import bcc

# attach_program, detach_program and get_hashmap are helpers that are not
# shown on the slide.
prog = bcc.BPF(text="""
int entry(struct args* args){
    if(args->exit_reason == EPT_VIOLATION){
        u64 guest_address = args->data;
        update(guest_address);
        return 1;
    }
    return 0;
}
""")
attach_program(prog)
subprocess.call("netperf -H 192.168.20.1 -l3", stdout=subprocess.PIPE, shell=True)
detach_program()
r = get_hashmap()
for k, v in sorted(r.items(), key=lambda x: x[1], reverse=True)[:10]:
    print("{:016X}: {}".format(k, v))

Slide 36

Slide 36 text

(Same script as on slide 35.) Output, with the NIC hooked by ippass:

% ./monitor_ept_violation.py
00000000EFC000C0: 8790
00000000EFC00008: 8790
00000000EFC000D0: 8790
00000000EFC000C4: 6967
00000000EFC03818: 5649
00000000EFC02818: 947
00000000EFC040FC: 2
00000000EFC040A0: 2
00000000EFC04054: 2
00000000EFC040A4: 2

Slide 37

Slide 37 text

prog = """
int entry(struct args* args){
    if(args->exit_reason == EPT_VIOLATION){
        u64 id = get_cpu_id();
        u64 time = get_time();
        if(args->entry == 1){
            // On entry: remember the start time for this CPU.
            put((char*)&id, sizeof(u64), (char*)&time, sizeof(u64));
        }else{
            void* value = get((char*)&id, sizeof(u64));
            if (value == NULL){
                return 1;
            }
            u64 st = *(u64*)value;
            // Key = elapsed time in the upper 32 bits, address in the lower 32 bits.
            update((time-st)<<32 | args->data);
        }
    }
    return 0;
}
"""

※ With the NIC hooked by ippass. On receive (netperf server):

% ./measure_mmio_latency.py
Tracing... Hit Ctrl-C to end.
^C
00000000EFC000C0: 34.6
00000000EFC04000: 17.0
00000000EFC03818: 10.2
00000000EFC04074: 8.5
00000000EFC00008: 7.9
00000000EFC000D0: 7.7
00000000EFC02818: 7.6
00000000EFC0403C: 4.0
00000000EFC040BC: 4.0
00000000EFC040F4: 4.0

Slide 38

Slide 38 text

(Same program as on slide 37.)

※ With the NIC hooked by ippass. On transmit (netperf client):

% ./measure_mmio_latency.py
Tracing... Hit Ctrl-C to end.
^C
00000000EFC03818: 92.0
00000000EFC000C0: 18.6
00000000EFC04000: 12.5
00000000EFC04074: 9.0
00000000EFC000C4: 8.3
00000000EFC00008: 8.1
00000000EFC000D0: 7.9
00000000EFC04088: 4.5
00000000EFC0408C: 4.0
00000000EFC040AC: 4.0
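The slides do not show how measure_mmio_latency.py turns the hash map into the per-address numbers above. One plausible post-processing step, sketched here as an assumption, is to split each key back into elapsed time (upper 32 bits) and address (lower 32 bits) and compute a hit-weighted average per address. The function name `average_latency` is hypothetical, and the time unit is whatever `get_time()` returns.

```python
from collections import defaultdict

def average_latency(hashmap):
    """Average elapsed time per MMIO address.

    hashmap: dict mapping (elapsed << 32 | address) -> number of hits,
             matching the key built by update() in the eBPF program.
    """
    total = defaultdict(int)
    count = defaultdict(int)
    for key, hits in hashmap.items():
        address = key & 0xFFFFFFFF      # lower 32 bits
        elapsed = key >> 32             # upper 32 bits
        total[address] += elapsed * hits
        count[address] += hits
    return {address: total[address] / count[address] for address in total}
```

Note that this packing only works while the address fits in 32 bits, as the 0xEFC0.... addresses in the output do; wider addresses would spill into the elapsed-time field.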

Slide 39

Slide 39 text

Ping round trip time
ping, measured 30 times, nohz=off

Slide 40

Slide 40 text

Netperf Latency
netperf -l3 -t omni -- -d rr -T UDP -m 64
nohz=off, pinning

Slide 41

Slide 41 text

Summary
1. With BitVisor, it is easy to hook packet transmission and reception and manipulate packets.
2. We actually implemented a lightweight packet filtering mechanism.
3. We described how to trace VMExits with eBPF.
4. Future work (filtering): authentication, Adaptive Filtering, Stateful filtering, …
5. Future work (tracing): hook points, argument configuration, …

Slide 42

Slide 42 text


Slide 43

Slide 43 text

Notes on programming in eBPF
When calling a function that is not a BPF Call, make sure it is expanded inline.

inline int g(int x){
    return x;
}

int f(int x){
    int r = g(x);
    return r;
}

Slide 44

Slide 44 text

Notes on programming in eBPF
Make sure the data ends up embedded in the code.

Example: IP address matching

int f(int a){
    int x[N] = { list of IP addresses };
    int i = 0;
    for(i = 0; i < N; i++){
        if(a == x[i]){
            return 1;
        }
    }
    return 0;
}

When N is small ⇨ OK

f:
    …
    jeqi r2, …, goto L1
    jeqi r2, …, goto L1
    jeqi r2, …, goto L1
    …
L1: mov r0, 1
    ret
L2: mov r0, 0
    ret

Slide 45

Slide 45 text

Notes on programming in eBPF
Make sure the data ends up embedded in the code.

Example: IP address matching (same f() as on slide 44)

When N is large ⇨ the data is placed in the .rodata section, which BPF cannot handle directly!

f:
    …
    ld_64 r3, …
L0: …
    addi r3, 4
    ldw r4, 0(r3)
    jne r4, r1, goto L0
    …
    .section .rodata
.LF.x:
    .long …
    .long …
    …

Slide 46

Slide 46 text

Notes on programming in eBPF
Make sure the data ends up embedded in the code.

Solution 1: write code that compares directly, without using an array

int f(int a){
    if(a == …){ return 1; }
    if(a == …){ return 1; }
    …
    return 0;
}

Solution 2: use #pragma unroll

#pragma unroll
for(i = 0; i < N; i++){
    if(a == x[i]){
        return 1;
    }
}

※ In the first place, a loop that is not unrolled may be rejected by the verifier.
※ To compare against a huge IP blacklist, prepare the blacklist in advance and provide a BPF Call that checks whether an IP is in it (it only needs to look up the table and check whether the value is present).
c.f. https://github.com/netoptimizer/prototype-kernel/blob/6923acb545dff86a4b6bcfe503cfeff3bd61bb88/kernel/samples/bpf/xdp_ddos01_blacklist_kern.c
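For Solution 1, the comparison chain does not have to be written by hand. The following Python sketch generates the C source from a list of addresses. The generator `generate_matcher` is hypothetical and not part of the talk's tooling, and it assumes that `a` holds the address as a host-order 32-bit value.

```python
import ipaddress

def generate_matcher(addresses, name="f"):
    """Emit C source that compares `a` against each address directly,
    so that no array (and therefore no .rodata data) is needed."""
    lines = [f"int {name}(int a){{"]
    for address in addresses:
        value = int(ipaddress.IPv4Address(address))
        lines.append(f"    if(a == 0x{value:08x}){{ return 1; }}")
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines)

print(generate_matcher(["127.0.0.1", "10.0.0.1"]))
```

The generated function grows linearly with the list, so this only suits short lists; for large blacklists the table-lookup BPF Call mentioned on the slide remains the better fit.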

Slide 47

Slide 47 text

Notes on programming in eBPF
Make sure the data ends up embedded in the code.

Example: string constants

int f(int x){
    …
    printf("Hello %d\n", x);
    …
    return 0;
}
(In this case the string constant is placed in the .rodata section.)

Instead, write:

int f(){
    …
    char _fmt[] = "Hello %d\n";
    printf(_fmt, x);
    …
    return 0;
}

Slide 48

Slide 48 text

Notes on programming in eBPF
Make sure the data ends up embedded in the code.

Example: string constants

int f(){
    …
    char _fmt[] = "Hello %d\n";
    printf(_fmt, x);
    …
    return 0;
}

Code is generated that first places the string on the stack (r10):

f:
    …
    ld_64 r2, 0x6425206f6c6c6548    // "Hello %d"
    std -16(r10), r2
    mov r1, r10
    addi r1, -16
    call printf
    …
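The immediate in the `ld_64` instruction is simply the first eight bytes of the format string read as a little-endian 64-bit integer. The Python sketch below reproduces it; the helper name `pack_words` is hypothetical, and padding the tail with zeros is an assumption made for illustration (a compiler may store the remaining bytes with narrower instructions).

```python
import struct

def pack_words(data):
    """Split bytes into little-endian 64-bit words, zero-padding the tail."""
    padded = data + bytes(-len(data) % 8)
    return [struct.unpack_from("<Q", padded, off)[0]
            for off in range(0, len(padded), 8)]

words = pack_words(b"Hello %d\n\x00")
print([hex(w) for w in words])
```

The first word is 0x6425206f6c6c6548: reading the bytes from least to most significant gives 48 65 6c 6c 6f 20 25 64, which is "Hello %d".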