Packet Filtering Inside BitVisor Using BPF (+α)

mmisono
December 05, 2017

@BitVisor Summit 6 (2017-12-5)

Transcript

  1. Packet Filtering Inside BitVisor Using BPF (+α)

    Masanori Misono, Graduate School of Information Science and Technology, The University of Tokyo
    BitVisor Summit6 | 2017-12-5

  2. We want a lightweight packet filtering mechanism that does not depend on the guest OS.

  3. We want a lightweight packet filtering mechanism that does not depend on the guest OS.
    You can do that easily with BitVisor.

  4. We want a lightweight packet filtering mechanism that does not depend on the guest OS.
    You can do that easily with BitVisor.
    • Challenges
    – Where to filter
    – How to filter

  5. An ordinary network driver
    [Figure labels: Descriptor Ring, Buffer, Base Address, Buffer Address, Physical Memory, NIC, DMA Engine, DMA]

  6. BitVisor's para-pass-through driver
    [Figure labels: Shadow Descriptor and Shadow Buffer (managed by BitVisor); Descriptor and Buffer (managed by guest); Base Address, Buffer Address, Copy, Physical Memory, NIC, DMA Engine]

  7. Transmission Flow
    1 The guest sets up the Tx ring
      (configures the buffers)
    2 The guest updates TDT → VM exit
      (the MMIO access is hooked via an EPT violation)
    3 BitVisor copies the data to the shadow buffer
    4 BitVisor updates the real TDT
      (the device then transmits)
    [Figure labels: Guest Tx Ring, Shadow Tx Ring, ① ② VMExit, ③ Copy, ④ Update TDT]
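    The four transmission steps can be modeled in a few lines. This is a toy Python simulation, not BitVisor code: the class `ShadowTxRing` and its methods are hypothetical names, and the trapped MMIO write to TDT is modeled as an ordinary method call.

```python
class ShadowTxRing:
    """Toy model of the para-pass-through Tx path (all names are hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.guest_ring = [b""] * size    # buffers managed by the guest
        self.shadow_ring = [b""] * size   # buffers managed by the hypervisor
        self.guest_tail = 0               # last TDT value seen from the guest
        self.nic_tail = 0                 # TDT value written to the real device
        self.wire = []                    # stands in for the physical link

    def guest_write_tdt(self, new_tail):
        """Step 2: the guest's TDT write is intercepted (a VM exit in reality)."""
        while self.guest_tail != new_tail:
            index = self.guest_tail
            # Step 3: copy the guest buffer into the shadow buffer.
            self.shadow_ring[index] = bytes(self.guest_ring[index])
            self.guest_tail = (index + 1) % self.size
        # Step 4: only now is the real TDT updated.
        self._nic_write_tdt(new_tail)

    def _nic_write_tdt(self, new_tail):
        """The device transmits every shadow buffer up to the new tail."""
        while self.nic_tail != new_tail:
            self.wire.append(self.shadow_ring[self.nic_tail])
            self.nic_tail = (self.nic_tail + 1) % self.size


ring = ShadowTxRing(4)
ring.guest_ring[0] = b"pkt0"   # step 1: the guest fills its Tx ring
ring.guest_ring[1] = b"pkt1"
ring.guest_write_tdt(2)        # step 2: the guest updates TDT
print(ring.wire)
```

    Because the device only ever sees the shadow ring, the copy in step 3 is a natural place to inspect or drop packets.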

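  8. Receiving Flow
    1 The guest receives an interrupt
      (interrupts themselves are basically passed through; this differs when virtio is used)
    2 The guest accesses an MMIO register → VM exit
      (the guest accesses the register to check the interrupt)
    3 BitVisor copies the data to the guest buffer
    4 BitVisor updates RDT
      (the guest then reads its own descriptors to receive the packet)
    [Figure labels: Guest Rx Ring, Shadow Rx Ring, ① Interrupt, VMExit, ③ Copy, ④ Update RDT]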

  9. BitVisor's para-pass-through driver
    [Figure labels: Shadow Descriptor and Shadow Buffer (managed by BitVisor); Descriptor and Buffer (managed by guest); Base Address, Buffer Address, Copy, Physical Memory, NIC, DMA Engine]
    Filtering at the copy between the shadow buffer and the guest buffer is all we need!

  10. Network drivers in BitVisor
    Guest Buffer
    Shadow Buffer
    Intermediate Buffer (defined by the network driver)
    At timing A or B in the figure, the driver calls the callback function of the network API.

  11. Network API
    [Figure labels: null, pass, ip (lwIP), ippass (lwIP)]

  12. pass module
    [Figure labels: Guest Buffer, Shadow Buffer, Intermediate Buffer, A, B, send_phys(), send_virt()]

    static void
    netapi_net_pass_recv_callback (…)
    {
        struct net_pass_data2 *p = param;
        p->func->send (p->handle, num_packets,
                       packets, packet_sizes, true);
    }

    send_phys() is called for A, and send_virt() for B.

    In the case of pro1000:

    /* A */
    static void
    receive_physnic (…){

        d2->recvphys_func (…);

    }

    /* B */
    static int
    process_tdesc (…){

        d2->recvvirt_func (…);

    }

  13. Where to filter
    [Figure labels: Guest Buffer, Shadow Buffer, Intermediate Buffer, A, B, send_phys(), send_virt()]

    static void
    netapi_net_pass_recv_callback (…)
    {
        struct net_pass_data2 *p = param;
        // Filter here!!
        p->func->send (p->handle, num_packets,
                       packets, packet_sizes, true);
    }
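    The idea of filtering inside the receive callback can be sketched as follows. This is an illustrative Python model, not the BitVisor API: `make_recv_callback`, `send`, and `packet_filter` are hypothetical names, and the filter is any function that returns true for packets to forward.

```python
def make_recv_callback(send, packet_filter):
    """Build a callback that forwards only the packets accepted by packet_filter."""

    def recv_callback(packets):
        accepted = [pkt for pkt in packets if packet_filter(pkt)]
        if accepted:
            send(accepted)                      # forward the surviving packets
        return len(packets) - len(accepted)     # number of dropped packets

    return recv_callback


forwarded = []
# Assumed toy policy: drop frames shorter than an Ethernet header (14 bytes).
callback = make_recv_callback(forwarded.extend, lambda pkt: len(pkt) >= 14)
dropped = callback([bytes(60), b"short"])
print(len(forwarded), dropped)
```

    Since both directions go through the same callback shape, one filter hook covers transmission and reception.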

  14. How to Filter?

  15. How to Filter?
    Use BPF!
    Efficiency, Safety, Flexibility

  16.
    % tcpdump -d host 127.0.0.1 and port 80
    (000) ldh [12]
    (001) jeq #0x800 jt 2 jf 18
    (002) ld [26]
    (003) jeq #0x7f000001 jt 6 jf 4
    (004) ld [30]
    (005) jeq #0x7f000001 jt 6 jf 18
    (006) ldb [23]
    (007) jeq #0x84 jt 10 jf 8
    (008) jeq #0x6 jt 10 jf 9
    (009) jeq #0x11 jt 10 jf 18
    (010) ldh [20]
    (011) jset #0x1fff jt 18 jf 12
    (012) ldxb 4*([14]&0xf)
    (013) ldh [x + 14]
    (014) jeq #0x50 jt 17 jf 15
    (015) ldh [x + 16]
    (016) jeq #0x50 jt 17 jf 18
    (017) ret #262144
    (018) ret #0
    • Berkeley Packet Filter
    [USENIX Winter '93]
    • A virtual register machine for packet processing
    • libpcap, BSD, Linux, …
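    To make the "virtual register machine" concrete, here is a minimal Python sketch of an interpreter that runs the `tcpdump -d` listing above. It is not libpcap or kernel code: the tuple encoding, the opcode names `ldxb_msh` and `ldh_ind`, and the helper `make_tcp_frame` are assumptions made for this illustration. Only the instructions that appear in the listing are supported.

```python
import struct

# The listing above as (opcode, operands...) tuples. Jump targets are
# absolute instruction indices, exactly as tcpdump -d prints them.
PROGRAM = [
    ("ldh", 12), ("jeq", 0x800, 2, 18),
    ("ld", 26), ("jeq", 0x7F000001, 6, 4),
    ("ld", 30), ("jeq", 0x7F000001, 6, 18),
    ("ldb", 23), ("jeq", 0x84, 10, 8), ("jeq", 0x6, 10, 9), ("jeq", 0x11, 10, 18),
    ("ldh", 20), ("jset", 0x1FFF, 18, 12),
    ("ldxb_msh", 14),                      # x = 4 * ([14] & 0xf)
    ("ldh_ind", 14), ("jeq", 0x50, 17, 15),
    ("ldh_ind", 16), ("jeq", 0x50, 17, 18),
    ("ret", 262144), ("ret", 0),
]
LOAD_SIZES = {"ldb": 1, "ldh": 2, "ld": 4, "ldh_ind": 2}


def load(packet, offset, size):
    """Read a big-endian integer, or return None if the read is out of bounds."""
    if offset + size > len(packet):
        return None
    return int.from_bytes(packet[offset:offset + size], "big")


def run(program, packet):
    """Run the filter; the result is the number of bytes to accept (0 = drop)."""
    a = x = pc = 0                         # accumulator, index register, counter
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "ret":
            return args[0]
        if op in ("jeq", "jset"):
            k, jt, jf = args
            taken = (a == k) if op == "jeq" else (a & k) != 0
            pc = jt if taken else jf
        elif op == "ldxb_msh":
            value = load(packet, args[0], 1)
            if value is None:
                return 0
            x = 4 * (value & 0xF)          # IPv4 header length in bytes
        else:
            offset = args[0] + (x if op.endswith("_ind") else 0)
            value = load(packet, offset, LOAD_SIZES[op])
            if value is None:
                return 0                   # out-of-bounds load drops the packet
            a = value


def make_tcp_frame(src_ip, dst_ip, sport, dport):
    """Build a minimal Ethernet + IPv4 + TCP frame (checksums left at zero)."""
    eth = bytes(12) + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     bytes(src_ip), bytes(dst_ip))
    tcp = struct.pack("!HH", sport, dport) + bytes(16)
    return eth + ip + tcp


match = run(PROGRAM, make_tcp_frame((127, 0, 0, 1), (10, 0, 0, 2), 12345, 80))
other = run(PROGRAM, make_tcp_frame((10, 0, 0, 1), (10, 0, 0, 2), 12345, 80))
print(match, other)
```

    The whole machine state is two registers and a program counter, which is what keeps both interpretation and verification simple.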

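  17. Why BPF?
    1 Efficiency
      Only the minimum necessary packet accesses
      Easy to JIT-compile
    2 Flexibility
      An instruction set with sufficient expressive power
    3 Safety
      No backward jumps (cBPF)
      Safety is easy to verify (e.g. memory accesses)
      Note: safety is checked by a verifier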
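    The "no backward jumps" property is what makes termination easy to check. The following Python sketch shows such a check on a toy instruction encoding; the tuple format and the function name `verify` are assumptions for illustration, and a real verifier checks more than this (for example opcode validity and memory accesses).

```python
def verify(program):
    """Accept a program only if it must terminate.

    Instructions are tuples: ("jeq" | "jset", k, jt, jf) with absolute jump
    targets, ("ret", k), or any other non-branching instruction.
    """
    n = len(program)
    # Falling off the end must be impossible, so the last instruction returns.
    if n == 0 or program[-1][0] != "ret":
        return False
    for pc, (op, *args) in enumerate(program):
        if op in ("jeq", "jset"):
            _, jt, jf = args
            # Both targets must lie strictly ahead of the jump and in range.
            if not (pc < jt < n and pc < jf < n):
                return False
    return True


good = [("ldh", 12), ("jeq", 0x800, 2, 3), ("ret", 262144), ("ret", 0)]
loop = [("ldh", 12), ("jeq", 0x800, 0, 2), ("ret", 0)]   # jumps back to 0
print(verify(good), verify(loop))
```

    With only forward jumps, the program counter strictly increases, so a program of n instructions executes at most n steps.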

  18. Design Overview
    [Figure labels: Shadow Buffer, BPF (sandboxed), Guest Buffer, Drop, pass / ippass module]
    (Transmission works the same way.)

  19. Types of BPF

    cBPF (classic BPF)
      Overview: the long-established BPF
      How to compile: libpcap
      Example program: host = 127.0.0.1 and port = 80
      Example implementation: libpcap

    eBPF (extended BPF)
      Overview: the BPF mainly used in current Linux
      How to compile: clang, bcc, etc...
      Example program (the slide shows only the beginning of the function):
        int filter(struct __sk_buff *skb) {
            u8 *cursor = 0;
            struct ethernet_t *ethernet =
                cursor_advance(cursor, sizeof(*ethernet));
            if (!(ethernet->type == 0x0800)) {
                goto DROP;
            }
      Example implementations: Linux Kernel, ubpf

  20. How to set the BPF program
    • Add it via VMCall
    • Use the lwIP feature
    Note: with ippass, this can be done with a single NIC.
    [Figure labels: Guest OS, BPF Server, BPF Program, Verifier, ❌, BPF]

  21. Throughput
    netperf -l3, server is the BitVisor machine
    10 measurements, nohz=off, pinning

  22. Ping round trip time
    ping, 30 measurements, nohz=off

  23. Ping round trip time
    ping, 30 measurements, nohz=off

  24. Netperf Latency
    netperf -l3 -t omni -- -d rr -T UDP -m 64
    nohz=off, pinning

  25. Related Works

    Filtering in VMM                            Filtering level    Stateful filtering   Supported OS   Lightness (⇄ complexity)
    nwfilter (KVM)                              Packet             ○ (conntrack)        Linux          ○
    SDN (Open vSwitch, VMware NSX)              Packet             ○                    All            △
    VMI (VMwall [1], xFilter [2], AL-SAFE [3])  Packet & Process   ○ (?)                Linux          △
    AWS Security Group                          Packet             △                    All            ○
    BPF in BitVisor                             Packet             △ *                  All            ◎

    For BPF in BitVisor, stateful filtering using an eBPF map is conceivable.
    * It is possible to roughly identify connections that the guest itself initiated, for example by looking at the TCP ACK field.

    [1] A. Srivastava and J. Giffin. Tamper-resistant, application-aware blocking of malicious network connections. In RAID, pages 39-58. Springer, 2008.
    [2] K. Kourai, T. Azumi, and S. Chiba. Efficient and fine-grained VMM-level packet filtering for self-protection. IJARAS, 5(2):83-100, Apr. 2014.
    [3] A. Giannakou, L. Rilling, J.-L. Pazat, and C. Morin. AL-SAFE: A secure self-adaptable application-level firewall for IaaS clouds. In CloudCom, pages 383-390. IEEE, 2016.
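    The footnote's idea of using the TCP ACK field as a stateless approximation of connection tracking can be sketched like this. It is an illustrative Python check, not part of the presented implementation: the function names are hypothetical, and the policy (treat an inbound segment as part of an existing connection only if ACK is set) is the assumed reading of the footnote.

```python
TCP_FLAG_SYN = 0x02
TCP_FLAG_ACK = 0x10


def has_tcp_ack(frame):
    """Return True if the Ethernet frame carries an IPv4/TCP segment with ACK set."""
    if len(frame) < 14 + 20 or frame[12:14] != b"\x08\x00":
        return False                       # not IPv4
    if frame[23] != 6:
        return False                       # not TCP
    ip_header_len = (frame[14] & 0x0F) * 4
    flags_offset = 14 + ip_header_len + 13  # TCP flags byte
    if len(frame) <= flags_offset:
        return False
    return bool(frame[flags_offset] & TCP_FLAG_ACK)


def make_frame(tcp_flags):
    """Build a minimal IPv4/TCP frame with the given flags (other fields zero)."""
    eth = bytes(12) + b"\x08\x00"
    ip = bytes([0x45]) + bytes(8) + bytes([6]) + bytes(10)
    tcp = bytes(13) + bytes([tcp_flags]) + bytes(6)
    return eth + ip + tcp


# A bare SYN is a new inbound connection attempt; SYN+ACK answers our own SYN.
print(has_tcp_ack(make_frame(TCP_FLAG_SYN)),
      has_tcp_ack(make_frame(TCP_FLAG_SYN | TCP_FLAG_ACK)))
```

    This keeps no state, so it can be fooled by forged ACK segments, which is why the table marks stateful filtering as only partially supported.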

  26. Beyond The Packet Filtering

  27. Tracing with eBPF

  28. extended BPF
    • Registers are 64 bits wide
    • More instructions
    • An ISA that is easier to JIT-compile
    • Calls to functions registered in advance

  30. Basics of tracing with eBPF
    • An event occurs
    • The eBPF program is invoked (argument: a context)
    • As needed, the program updates a data structure (an associative array) via a BPF call
    • The data is read out later
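    The event, program, and associative-array cycle can be modeled in plain Python. This is a conceptual sketch only: the `Tracer` class and its methods are hypothetical names, the "program" is an ordinary Python function rather than eBPF bytecode, and `update` stands in for a BPF call.

```python
from collections import Counter


class Tracer:
    """Toy model: events invoke an attached program that updates a hash map."""

    def __init__(self):
        self.table = Counter()    # the associative array
        self.program = None

    def attach(self, program):
        self.program = program

    def detach(self):
        self.program = None

    def update(self, key):
        """Stand-in for a BPF call that increments table[key]."""
        self.table[key] += 1

    def fire(self, context):
        """Called at the hook point each time the event occurs."""
        if self.program is not None:
            self.program(context, self.update)


tracer = Tracer()
tracer.attach(lambda ctx, update: update(ctx["exit_reason"]))
for reason in ["EPT_VIOLATION", "CPUID", "EPT_VIOLATION"]:
    tracer.fire({"exit_reason": reason})
tracer.detach()
tracer.fire({"exit_reason": "CPUID"})   # ignored: no program attached
print(tracer.table.most_common())
```

    Keeping the aggregation inside the hook and reading the table afterwards avoids shipping every single event out of the hypervisor.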

  31. eBPF Tracing in BitVisor
    [Figure labels: Guest OS, Program Attach, data retrieval, BitVisor, event, eBPF, HashMap]
    Note: using lwIP instead is also conceivable.

  32. Example: tracing VM exit reasons
    struct args {
        u32 exit_reason;
        u32 exit_qualification;
        u64 data;
    };

    static void vt__exit_reason (void){

        int ret = bpf_exec(
            &args, sizeof(struct args));

    }

  33. Counting exit reasons and displaying the top 10, using bcc (BPF Compiler Collection)

    import time
    import bcc

    # attach_program(), detach_program(), get_hashmap() and EXIT_REASON
    # are not shown on the slide.
    prog = bcc.BPF(text="""
    int entry(struct args* args){
        update(args->exit_reason);  // increment the associative array entry by 1
        return 0;
    }
    """)
    attach_program(prog)  # attach the BPF program
    print("Tracing... Hit Ctrl-C to end.")
    try:
        time.sleep(9999999)
    except KeyboardInterrupt:
        print()
    detach_program()
    r = get_hashmap()  # retrieve the associative array data
    for k, v in sorted(r.items(),
                       key=lambda x: x[1], reverse=True)[:10]:
        print("{:17s}: {}".format(EXIT_REASON[k], v))

  34. Output of the exit-reason script from the previous slide:

    % ./monitor_exit_reason.py
    Tracing... Hit Ctrl-C to end.
    ^C
    EPT_VIOLATION : 3135
    CPUID : 16
    IO_INSTRUCTION : 8
    EXCEPTION_OR_NMI : 3
    VMCALL : 2

  35. Monitoring MMIO accesses

    import subprocess
    import bcc

    # attach_program(), detach_program() and get_hashmap()
    # are not shown on the slide.
    prog = bcc.BPF(text="""
    int entry(struct args* args){
        if(args->exit_reason == EPT_VIOLATION){
            u64 guest_address = args->data;
            update(guest_address);  // count accesses per guest address
            return 1;
        }
        return 0;
    }
    """)
    attach_program(prog)
    subprocess.call("netperf -H 192.168.20.1 -l3",
                    stdout=subprocess.PIPE, shell=True)
    detach_program()
    r = get_hashmap()
    for k, v in sorted(r.items(),
                       key=lambda x: x[1], reverse=True)[:10]:
        print("{:016X}: {}".format(k, v))

  36. Output of the MMIO monitoring script from the previous slide:

    % ./monitor_ept_violation.py
    00000000EFC000C0: 8790
    00000000EFC00008: 8790
    00000000EFC000D0: 8790
    00000000EFC000C4: 6967
    00000000EFC03818: 5649
    00000000EFC02818: 947
    00000000EFC040FC: 2
    00000000EFC040A0: 2
    00000000EFC04054: 2
    00000000EFC040A4: 2
    Note: measured with the NIC hooked by ippass.

  37. Measuring MMIO latency

    prog = """
    int entry(struct args* args){
        if(args->exit_reason == EPT_VIOLATION){
            u64 id = get_cpu_id();
            u64 time = get_time();
            if(args->entry == 1){
                /* on entry: remember the start time for this CPU */
                put((char*)&id, sizeof(u64),
                    (char*)&time, sizeof(u64));
            }else{
                void* value = get((char*)&id,
                                  sizeof(u64));
                if (value == NULL){
                    return 1;
                }
                u64 st = *(u64*)value;
                /* key: elapsed time in the upper 32 bits, address in the lower */
                update((time-st)<<32 | args->data);
            }
        }
        return 0;
    }
    """

    Note: measured with the NIC hooked by ippass.
    On receive (netperf server):
    % ./measure_mmio_latency.py
    Tracing... Hit Ctrl-C to end.
    ^C
    00000000EFC000C0: 34.6
    00000000EFC04000: 17.0
    00000000EFC03818: 10.2
    00000000EFC04074: 8.5
    00000000EFC00008: 7.9
    00000000EFC000D0: 7.7
    00000000EFC02818: 7.6
    00000000EFC0403C: 4.0
    00000000EFC040BC: 4.0
    00000000EFC040F4: 4.0
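    The program packs the elapsed time and the MMIO address into a single map key, `(time-st)<<32 | args->data`. How the author's script turns that map into per-address figures is not shown on the slide; the following Python sketch is one plausible post-processing step, assuming the map holds hit counts per packed key and that addresses fit in 32 bits. The function name and the sample values are made up for illustration.

```python
from collections import defaultdict


def mean_latency_by_address(hashmap):
    """Unpack (latency << 32 | address) keys and average latency per address."""
    total = defaultdict(int)
    count = defaultdict(int)
    for key, hits in hashmap.items():
        latency = key >> 32               # upper 32 bits: elapsed time
        address = key & 0xFFFFFFFF        # lower 32 bits: guest address
        total[address] += latency * hits
        count[address] += hits
    return {address: total[address] / count[address] for address in total}


# Made-up sample data in the packed format.
sample = {
    (30 << 32) | 0xEFC000C0: 2,
    (40 << 32) | 0xEFC000C0: 1,
    (8 << 32) | 0xEFC00008: 4,
}
for address, mean in sorted(mean_latency_by_address(sample).items()):
    print("{:016X}: {:.1f}".format(address, mean))
```

    Packing both values into the key lets a single counter-style map double as a latency histogram, at the cost of one entry per distinct (latency, address) pair.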

  38. Measuring MMIO latency (same program as on the previous slide)

    Note: measured with the NIC hooked by ippass.
    On transmit (netperf client):
    % ./measure_mmio_latency.py
    Tracing... Hit Ctrl-C to end.
    ^C
    00000000EFC03818: 92.0
    00000000EFC000C0: 18.6
    00000000EFC04000: 12.5
    00000000EFC04074: 9.0
    00000000EFC000C4: 8.3
    00000000EFC00008: 8.1
    00000000EFC000D0: 7.9
    00000000EFC04088: 4.5
    00000000EFC0408C: 4.0
    00000000EFC040AC: 4.0

  39. Ping round trip time
    ping, 30 measurements, nohz=off

  40. Netperf Latency
    netperf -l3 -t omni -- -d rr -T UDP -m 64
    nohz=off, pinning

  41. Summary
    1 With BitVisor, you can easily hook packet transmission and reception and manipulate the packets
    2 We implemented a lightweight packet filtering mechanism
    3 We described a way to trace VM exits using eBPF
    4 Future work (filtering): authentication, adaptive filtering, stateful filtering, …
    5 Future work (tracing): hook points, argument setup, …

  43. Points to note when programming in eBPF
    When calling a function that is not a BPF call, make sure it gets inlined.

    inline
    int g(int x){
        return x;
    }
    int f(int x){
        int r = g(x);
        return r;
    }

  44. Points to note when programming in eBPF
    Make sure the data ends up embedded in the code.
    Example: IP address matching

    int f(int a){
        int x[N] = { list of IP addresses };
        int i = 0;
        for(i = 0; i < N; i++){
            if(a == x[i]){
                return 1;
            }
        }
        return 0;
    }

    When N is small ⇒ OK (the immediates are elided on the slide):

    f:

        jeqi r2, goto L1
        jeqi r2, goto L1
        jeqi r2, goto L1

    L1:
        mov r0, 1
        ret
    L2:
        mov r0, 0
        ret

  45. Points to note when programming in eBPF
    Make sure the data ends up embedded in the code.
    Example: IP address matching (same function f as on the previous slide)

    When N is large ⇒ the data is placed in the .rodata section,
    which BPF cannot handle directly!

    f:

        ld_64 r3,

    L0:

        addi r3, 4
        ldw r4, 0(r3)
        jne r4, r1, goto L0

    .section .rodata
    .LF.x:
        .long
        .long
        ….

  46. Points to note when programming in eBPF
    Make sure the data ends up embedded in the code.

    Solution 1: write code that compares directly, without using an array

    int f(int a){
        if(a == ){
            return 1;
        }
        if(a == ){
            return 1;
        }

        return 0;
    }

    Note: in the first place, a loop may be rejected by the verifier unless it is unrolled.
    Note: to compare against a huge IP blacklist, prepare the blacklist in advance
    and provide a BPF call that checks whether an IP is contained in it
    (it only has to look up a table and check whether the value is present).
    c.f. https://github.com/netoptimizer/prototype-kernel/blob/6923acb545dff86a4b6bcfe503cfeff3bd61bb88/kernel/samples/bpf/xdp_ddos01_blacklist_kern.c

    Solution 2: use #pragma unroll

    #pragma unroll
    for(i = 0; i < N; i++){
        if(a == x[i]){
            return 1;
        }
    }
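    The suggestion of delegating large blacklists to a BPF call can be illustrated as follows. This Python sketch only models the division of labor: `make_blacklist_helper` and `filter_program` are hypothetical names, the helper plays the role of the pre-registered BPF call, and the return convention (0 = drop, 1 = pass) is an assumption.

```python
import ipaddress


def make_blacklist_helper(addresses):
    """Build the lookup table once, outside the filter program."""
    table = frozenset(int(ipaddress.IPv4Address(a)) for a in addresses)

    def in_blacklist(ip):
        """Stand-in for a BPF call: a single table lookup."""
        return 1 if ip in table else 0

    return in_blacklist


def filter_program(src_ip, in_blacklist):
    """The filter itself stays tiny: one helper call, no data, no loop."""
    return 0 if in_blacklist(src_ip) else 1   # assumed: 0 = drop, 1 = pass


helper = make_blacklist_helper(["192.0.2.1", "198.51.100.7"])
blocked = filter_program(int(ipaddress.IPv4Address("192.0.2.1")), helper)
allowed = filter_program(int(ipaddress.IPv4Address("203.0.113.9")), helper)
print(blocked, allowed)
```

    The program no longer embeds any table, so neither the .rodata problem nor the loop-unrolling concern arises, regardless of how large the blacklist grows.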

  47. Points to note when programming in eBPF
    Make sure the data ends up embedded in the code.
    Example: string constants

    int f(int x){

        printf("Hello %d\n", x);

        return 0;
    }
    (In this case the string constant is placed in the .rodata section.)

    Using a local array instead:

    int f(int x){

        char _fmt[] = "Hello %d\n";
        printf(_fmt, x);

        return 0;
    }

  48. Points to note when programming in eBPF
    Make sure the data ends up embedded in the code.
    Example: string constants

    int f(int x){

        char _fmt[] = "Hello %d\n";
        printf(_fmt, x);

        return 0;
    }

    f:

        ld_64 r2, 0x6425206f6c6c6548
        std -16(r10), r2
        mov r1, r10
        addi r1, -16
        call printf

    The compiler generates code that first places the string on the stack (r10).
    The 64-bit immediate holds the first eight bytes of the string, "Hello %d".
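    That the immediate `0x6425206f6c6c6548` really is the start of the format string can be checked directly. The short Python snippet below decodes it as little-endian ASCII and encodes the string back; nothing here is BitVisor-specific.

```python
imm = 0x6425206F6C6C6548

# eBPF stores the 64-bit immediate in little-endian byte order on the stack.
text = imm.to_bytes(8, "little").decode("ascii")
print(repr(text))

# The reverse direction: pack the first eight characters into one immediate.
packed = int.from_bytes(b"Hello %d", "little")
print(hex(packed))
```

    Longer strings simply need more such load-and-store pairs, which is why this approach only suits short constants.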