bitvisor.ko : BitVisor as a module

mmisono
November 28, 2018

Transcript

  1. bitvisor.ko : BitVisor as a module
    Masanori Misono, The University of Tokyo
    [email protected]
    2018-11-28 BitVisor Summit 7

  2. BitVisor Development Flow
    Coding → Install → Reboot

  3. Virtualization Overhead

  4. GOAL
    1. Improve development speed
    2. Reduce performance overhead

  5. Is BitVisor needed from the beginning / always?
    • Situations where it is necessary
      • Device concealment
      • Device encryption
      • …
    • Not always necessary
      • EPT hook

  6. Approach: On-demand Virtualization
    [Figure: OS on HW → (virtualization) → OS on VMM on HW → (de-virtualization) → OS on HW]
    ◎ Easy to use
    ◎ No overhead when de-virtualized

  7. Related Work
    • The idea is not new
      • VMX rootkits (Blue Pill [Rutkowska, 2006], SubVirt [King et al., 2006])
      • Late launch (Intel TXT) [Gebhardt et al., 2009] [Srinivasan et al., 2011]
    • Loading a hypervisor from a kernel module and virtualizing the running system itself
      • ksm
      • bareflank
      • ShadowBox
      • HyperPlatform
    • BitVisor-related
      • De-virtualization after network booting [Omote et al., 2015]
      • On-demand de/re-virtualization for live migration [Im et al., 2017]
      • Booting BitVisor first

  8. Base Idea
    • Load BitVisor from a kernel module
    [Figure: Linux on HW → Linux on BitVisor on HW → Linux on HW]
    • Approach
      1. Make BitVisor entirely a kernel module
      2. Load bitvisor.elf from a kernel module ← Take this

  9. BitVisor Boot Sequence (UEFI)
    [Figure: boot timeline with the labels Power on, Firmware, loadvmm.elf, 2nd-loader, Trampoline Code, bitvisor.elf (Initialize, Start VMM), VMENTRY, return to loader, Continue to Boot. Execution is non-virtualized before VMENTRY and in VMX non-root mode afterwards.]

  10. Boot Sequence detailed (UEFI, BSP)
    loader.elf:entry_func()
    => entry.s:entry
    => entry.s:uefi64_entry
    => uefi.c:uefi_init
    => entry.s:uefi_entry_start
    => entry.s:callmain64
    => main.c:vmm_main
    => call_initfunc("global")
    => main.c:start_all_processors
    => main.c:bsp_continue
    => (change stack)
    => main.c:bsp_proc
    => call_initfunc("bsp")
    => call_initfunc("para")
    => call_initfunc("pcpu")
    => main.c:create_pass_vm (from "pcpu" initfunc)
    => bsp == true
    => vcpu.c:load_new_vcpu()
    => call_initfunc("pass")
    => vmmcall_boot.c:vmmcall_boot_enable()
    => create vmmcall_boot_thread
    => wait_for_boot_continue
    => continue_flag = false
    => wait until continue_flag becomes true
    => schedule => vmmcall_boot_thread
    => (scheduled from vmmcall_boot_thread)
    => vmctrl.start_vm (vt_main.c:vt_start_vm)
    => vt_mainloop()

    vmmcall_boot.c:vmmcall_boot_thread
    => main.c:bsp_init_thread
    => main.c:initregs
    => calluefi.copy_uefi_bootcode()
    => if booted from uefi-loader-login
    => vmmcall_boot_continue
    => continue_flag = true
    => schedule => return to create_pass_vm
    => (run the VM w/o driver initialization)
    => return to loader (boot/uefi-loader-login/loadvmm.c)
    => authentication is performed in the loader
    => loadvmm.c:decrypt_intl
    => vmmcall_loadcfg64_intl => vmmcall_boot.c:loadcfg64
    => vmmcall_boot_intel => vmmcall_boot.c:boot_guest
    => wait_for_boot_continue
    => continue_flag = false
    => as a result, bsp_init_thread wakes up
    => the vmmcall handler waits until continue_flag becomes true
    => call_initfunc("config0")
    => call_initfunc("drivers")
    => call_initfunc("config0")
    => continue_flag = true
    => thread exit
    => schedule => return to create_pass_vm

  11. Challenges
    • UEFI firmware call
    • ACPI
    • Memory allocation
    • Paging
    • Virtualizing Application Processors (APs)
    • Real mode booting
    • Unsupported instructions

  12. UEFI function call
    • During boot, BitVisor calls several UEFI functions
      • [function name garbled in extraction]
        • For debugging
      • [function name garbled in extraction]
        • Disconnect firmware drivers
      • [three lines garbled in extraction]
        • Get the ACPI RSDP table
      • [function name garbled in extraction]
        • Memory allocation

  13. UEFI function call (cont’d)
    • Same list as the previous slide, with each function marked either "No need to support" or "Need to support" (the markings are not recoverable from the transcript)

  14. ACPI Table
    • Why is it needed?
      • Conceal the IOMMU (DMAR)
      • PCIe extended configuration (MCFG)
      • Power management (FACP, DSDT/SSDT)
        • Suspend, reset, etc.
        • Cope with some power-related troubles,
          e.g., prevent the firmware from turning off the power of a device that BitVisor uses
    • How to get the RSDP (Root System Description Pointer)
      • Search the EBDA (Extended BIOS Data Area) (BIOS)
        • 0x40e: EBDA base address >> 4
      • EFI Configuration Table (UEFI)
        • No need to search memory
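
The BIOS-side lookup can be illustrated with a small sketch. It assumes the caller has already copied the candidate memory region (for example the first 1 KiB of the EBDA) into a buffer; the function names are hypothetical and are not BitVisor's. It relies only on the ACPI rules that the RSDP starts with the 8-byte signature "RSD PTR ", sits on a 16-byte boundary, and that its first 20 bytes sum to zero modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIGNATURE "RSD PTR "  /* 8 bytes, note the trailing space */
#define RSDP_V1_LENGTH 20          /* bytes covered by the ACPI 1.0 checksum */

/* The word at physical address 0x40e holds the EBDA real-mode segment,
 * i.e. the EBDA base address shifted right by 4. */
uint32_t ebda_base_from_segment(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* An ACPI structure is valid when its bytes sum to zero modulo 256. */
int acpi_checksum_ok(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/* Scan a copy of a memory region on 16-byte boundaries.
 * Returns the offset of the RSDP inside buf, or -1 if none is found. */
long find_rsdp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + RSDP_V1_LENGTH <= len; off += 16) {
        if (memcmp(buf + off, RSDP_SIGNATURE, 8) == 0 &&
            acpi_checksum_ok(buf + off, RSDP_V1_LENGTH))
            return (long)off;
    }
    return -1;
}
```

On UEFI none of this scanning is needed, because the firmware hands over the RSDP address through the EFI Configuration Table.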

  15. ACPI Table (Cont’d)
    • Possible approaches
      • Pass a copy of the ACPI tables that the OS has
        • e.g., /sys/firmware/acpi/tables (Linux)
      • Find the ACPI tables in memory
        • Is the configuration table available after ( ) ?
    • Currently, the ACPI-related code is simply ignored :(
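
If a copy of a table is passed in from the OS, it is worth validating before use. The following is a minimal sketch under that assumption: the buffer is taken to hold the raw bytes of one table (for example as read from a file under /sys/firmware/acpi/tables), and the function names are hypothetical. It uses only the standard ACPI table header layout: a 4-byte signature at offset 0, a little-endian 32-bit length at offset 4, and a checksum that makes all bytes of the table sum to zero modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HEADER_SIZE 36  /* size of the standard ACPI table header */

/* Read the little-endian 32-bit Length field at offset 4 of the header. */
uint32_t acpi_table_length(const uint8_t *tbl)
{
    return (uint32_t)tbl[4] | (uint32_t)tbl[5] << 8 |
           (uint32_t)tbl[6] << 16 | (uint32_t)tbl[7] << 24;
}

/* Returns 1 if buf holds a complete table with the expected 4-character
 * signature and a valid checksum, otherwise 0. */
int acpi_table_valid(const uint8_t *buf, size_t avail, const char *signature)
{
    assert(buf != NULL && signature != NULL);
    if (avail < ACPI_HEADER_SIZE || memcmp(buf, signature, 4) != 0)
        return 0;
    uint32_t len = acpi_table_length(buf);
    if (len < ACPI_HEADER_SIZE || len > avail)
        return 0;
    uint8_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum == 0;
}
```

A caller would pass the signature of the table it expects, such as "DMAR" or "MCFG", and reject the copy when the function returns 0.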

  16. Memory Allocation
    • loadvmm.elf allocates only 64 KiB of memory
    • The 2nd-loader calls a firmware function to allocate the entire VMM memory (128 MiB)
    [Figure: the boot timeline of slide 9 (Power on, Firmware, loadvmm.elf, 2nd-loader, Trampoline Code, Initialize, Start VMM, VMENTRY, return to loader, Continue to Boot) with the AllocPages() call marked]

  17. Memory Allocation (cont’d)
    • It would be possible to create a kernel function call mechanism in the same way as the UEFI function call, but...
      • Linux does not have a generic mechanism for allocating large physically contiguous memory
        • At most the max slab size, generally <= 2 MB
      • BitVisor is not relocatable
        • It assumes physically contiguous memory

  18. Memory Allocation (cont’d)
    • Approach
      • Make BitVisor relocatable
      • Reserve physically contiguous memory at boot time and use it
        • Boot option ← Current solution
        • CMA (Contiguous Memory Allocator)

  19. Paging
    • Memory map of BitVisor
      • 0x40000000-0x7FFFFFFF (1 GB)
    • Need to switch
    [Figure: virtual address layout with 2 MB pages: PML4 index = bits 47-39, PDPT index = bits 38-30, PD index = bits 29-21, Offset = bits 20-0]
    CR3 → PML4E[0]
      PDPTE[0]: 0x00000000-3FFFFFFF (Identity Mapping)
      PDPTE[1] (entry_pd): 0x40000000-7FFFFFFF
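
To make the address layout concrete, here is a small sketch that splits a virtual address into its page-table indices for 2 MB pages. The struct and function names are hypothetical; the bit positions follow the standard x86-64 4-level paging format, and the sketch only handles lower-half (user-range) addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Index fields of a 48-bit virtual address mapped with 2 MB pages. */
struct va_fields {
    unsigned pml4;    /* bits 47-39 */
    unsigned pdpt;    /* bits 38-30 */
    unsigned pd;      /* bits 29-21 */
    uint32_t offset;  /* bits 20-0  */
};

struct va_fields split_va_2m(uint64_t va)
{
    struct va_fields f;

    /* This sketch ignores canonical upper-half addresses. */
    assert((va >> 47) == 0);
    f.pml4 = (unsigned)((va >> 39) & 0x1ff);
    f.pdpt = (unsigned)((va >> 30) & 0x1ff);
    f.pd = (unsigned)((va >> 21) & 0x1ff);
    f.offset = (uint32_t)(va & 0x1fffff);
    return f;
}
```

With this split, every address in 0x40000000-0x7FFFFFFF yields PML4 index 0 and PDPT index 1, which is why a single PDPT entry covers the whole BitVisor region.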

  20. Switching Address Space
    [Figure: the same physical memory (phys) seen from the OS virtual address space and from the BitVisor virtual address space, each with a 1 GiB mark]
    • Create a context-handover page table in BitVisor
    • Create a context-handover page table in Linux

  21. Virtualizing Application Processors (APs)
    • Delayed AP initialization during UEFI boot
      • In UEFI boot, only the BSP is virtualized at first
      • APs are virtualized when the guest OS tries to initialize them
        • Trap the access to the local APIC area (Startup IPI)
      • To cope with some firmware problems [, 2013]
    • When virtualizing from the kernel module, the APs are already running!
    • Approach
      • Just virtualize each core
      • In the kernel module, send an IPI and virtualize in the handler
      • Synchronize?
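
As an illustration of what trapping the Startup IPI involves, the sketch below decodes a write to the local APIC. It is not BitVisor's localapic.c; the function names are hypothetical. It assumes xAPIC mode, where the low dword of the Interrupt Command Register is at offset 0x300 of the APIC MMIO page, delivery mode is in bits 10:8 (110b means Start-up), and the vector in bits 7:0 gives the 4 KiB page where the AP starts executing.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ICR_LOW_OFFSET 0x300u  /* xAPIC ICR, low dword */
#define ICR_VECTOR_MASK     0xffu
#define ICR_DELIVERY_MODE(icr) (((icr) >> 8) & 0x7u)
#define ICR_MODE_INIT       0x5u
#define ICR_MODE_STARTUP    0x6u

/* Returns 1 if a 32-bit write of `value` at `offset` within the APIC page
 * sends a Startup IPI, otherwise 0. */
int is_startup_ipi(uint32_t offset, uint32_t value)
{
    return offset == APIC_ICR_LOW_OFFSET &&
           ICR_DELIVERY_MODE(value) == ICR_MODE_STARTUP;
}

/* Physical address at which the target AP begins execution in real mode. */
uint32_t sipi_start_address(uint32_t value)
{
    assert(ICR_DELIVERY_MODE(value) == ICR_MODE_STARTUP);
    return (value & ICR_VECTOR_MASK) << 12;
}
```

Because the start address comes from an 8-bit vector shifted by 12, it always lies below 1 MB, which is the real-mode constraint discussed on the next slide.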

  22. Real mode booting
    • The trampoline code starts in real mode
      • Need to allocate memory for real mode
      • The memory must be below 1 MB!
    • Approach
      • Reserve memory in the same way as mentioned previously
      • Change the trampoline code so that it starts in long mode
    [Figure: loadvmm.elf → Trampoline Code → VMLAUNCH → return to loader]

  23. Unsupported Instructions
    • The guest should not execute instructions that BitVisor does not support
      • e.g.,
        • VMX instructions
        • PCID (INVPCID)
          • (partially supported now)
      • (BitVisor conceals unsupported features at boot time)
    • Approach
      • Do not use them in the guest OS
        • Boot option
      • Patch BitVisor
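
Concealing a feature usually means clearing its bit in the CPUID result that the guest sees. The sketch below shows that idea for the features named above; it is an illustration under stated assumptions, not BitVisor's actual CPUID handler, and the struct and function names are hypothetical. The bit positions are the architectural ones: VMX is CPUID.01H:ECX bit 5, PCID is CPUID.01H:ECX bit 17, and INVPCID is CPUID.(EAX=07H,ECX=0):EBX bit 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_1_ECX_VMX        (1u << 5)
#define CPUID_1_ECX_PCID       (1u << 17)
#define CPUID_7_0_EBX_INVPCID  (1u << 10)

struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Clear the feature bits that the guest must not see.
 * `r` holds the values the real CPUID instruction returned for the
 * given leaf and subleaf; it is modified in place. */
void conceal_unsupported(uint32_t leaf, uint32_t subleaf, struct cpuid_regs *r)
{
    assert(r != NULL);
    if (leaf == 1)
        r->ecx &= ~(CPUID_1_ECX_VMX | CPUID_1_ECX_PCID);
    else if (leaf == 7 && subleaf == 0)
        r->ebx &= ~CPUID_7_0_EBX_INVPCID;
}
```

This only works if the guest reads CPUID after the hypervisor is running. An OS that is already booted has cached the feature bits, which is why the kernel-module approach needs guest cooperation such as a boot option.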

  24. Miscellaneous
    • Save the guest state for the VMENTRY
      • TR, GDTR, IDTR
      • IA32_SYSENTER_CS, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP
      • FS_BASE, GS_BASE
    • Entry point
      • Placed in a specific ELF section
    • SMEP / XD bit
      • Supervisor Mode Execution Prevention (CR4[20])
      • Prevent execution from user memory in the kernel code (PML4E[63] = 1)
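
The two protection bits can be expressed as simple masks. The helpers below are a hypothetical sketch for inspecting and adjusting them on values that have already been read; they do not touch CR4 or the page tables themselves. The positions are architectural: SMEP is bit 20 of CR4, and the execute-disable (XD) bit is bit 63 of a paging-structure entry.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_SMEP         (1ull << 20)  /* Supervisor Mode Execution Prevention */
#define PAGING_ENTRY_P   (1ull << 0)   /* entry is present */
#define PAGING_ENTRY_XD  (1ull << 63)  /* execute-disable */

/* Returns 1 if SMEP is enabled in the given CR4 value. */
int smep_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMEP) != 0;
}

/* Returns 1 if code fetches through this paging entry are allowed. */
int entry_is_executable(uint64_t entry)
{
    return (entry & PAGING_ENTRY_XD) == 0;
}

/* Returns a copy of a present entry with the XD bit cleared. */
uint64_t entry_allow_execute(uint64_t entry)
{
    assert(entry & PAGING_ENTRY_P);
    return entry & ~PAGING_ENTRY_XD;
}
```

Both checks matter when hypervisor code is mapped through user-space page-table entries: either protection alone is enough to fault on the first instruction fetch.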

  25. (Current) Virtualization Flow
    • Reserve memory for BitVisor at boot time
    • Load BitVisor to the reserved region
    • Send an IPI to each core to run the virtualization code
    • Create a page table to run the BitVisor code
      • mmap + modify page table
    • Jump to the entry code
    • Save the guest state
    • Initialize
    • Return to the guest (VMENTRY)

  26. De-virtualization
    1. Issue VMCALL
    2. Save hypervisor state
    3. Jump to the guest without VMENTRY
      • Load guest state registers

  27. Implementation detail
    • Base system
      • Intel x86-64
      • Linux 4.18
      • BitVisor changeset 244 (2018-10-4)

  28. Development Environment
    • Host: VMware Workstation ()
      • Development VM
        - Write code
      • Experiment VM
        - Load / unload the kernel module
        - Share a disk with the dev VM
        - Same Linux kernel as the dev VM
        - Serial port to file
    • There is an AVX512-related problem(?)

  29. Comparison with other hypervisors (incomplete)
                 BIOS boot   UEFI boot   Kernel module   Kernel module   Device Driver   AMD   License
                                         (Linux)         (Windows)
    ksm          ×           ×           ○               ○               ×               ×     GPL
    bareflank    ×           △           ○               ○               ×               ×     LGPL
    BitVisor     ○           ○           developing      ×               ○               ○     BSD

  30. Conclusion
    • Proposed an on-demand virtualization scheme for BitVisor
      • Guest cooperation is necessary
    • Useful in some cases
      • Research & development

  31. Future Work
    • Finish the implementation
    • Use the vmalloc memory region
      • BitVisor assumes physically contiguous memory
    • Device support
    • Other architecture / OS support
      • AMD, Windows, Mac
    • Any advice/comments welcome!

  32. References
    • J. Rutkowska, Introducing Blue Pill, SyScan’06.
    • S.T. King et al., SubVirt: implementing malware with virtual machines, S&P’06.
    • C. Gebhardt et al., LaLa: A Late Launch Application, STC’09.
    • R. Srinivasan et al., MIvmm: A micro VMM for development of a trusted code base, 2011.
    • Y. Omote et al., Improving agility and elasticity in bare-metal clouds, ASPLOS’15.
    • J. Im et al., On-demand Virtualization for Live Migration in Bare Metal Cloud, SoCC’17.
    • , BitVisorUEFI, BitVisor Summit 2, 2013.

  33. Boot Sequence detailed (UEFI, BSP)
    (Identical to slide 10.)

  34. Boot Sequence detailed (UEFI, AP)
    AP
    receive SIPI
    => entry.s:cpuinit_start
    => entry.s:call_main64
    => ap.c:apinitproc0
    => (change stack)
    => ap.c:apinitproc1
    => int_init_ap() (initialize IDT)
    => initproc_ap (= ap_initproc = ap_proc)
    => call_initfunc("ap")
    => call_initfunc("para")
    => call_initfunc("pcpu")
    => create_pass_vm
    => load_new_vcpu
    => vmctrl.vminit
    => call_initfunc("pass")
    => initregs
    => vmctrl.init_signal
    => start_vm

    BSP
    localapic.c:mmio_apic
    => handle_ap_start
    => ap.c:ap_start
    - copy cpuinit_start code to apinit_addr
    => ap_start_addr
    => apic_send_startup_ipi
    => start AP
    => call_initfunc("dbsp")
    => time.c:time_init_dbsp
    => main.c:wait_for_create_pass_vm
    - call sync_all_processors() to sync with AP

    Data initialization for AP
    - bspinitproc1
    - allocate apinit_addr by alloc_realmodemem()
    - localapic_delayed_ap_start()
    - set ap_start function pointer to ap_start

  35. uefi-loader
    [Figure: interaction between create_pass_vm and vmmcall_boot_thread]
    create_pass_vm
    => create vmmcall_boot_thread
    => wait_for_boot_continue
    => continue_flag = false
    => wait while continue_flag == false (schedule())
    => start_vm

    vmmcall_boot_thread
    => bsp_init_thread
    => init_regs()
    => copy_uefi_bootcode()
    => call_initfunc("config0")
    => call_initfunc("drivers")
    => call_initfunc("config0")
    => continue_flag = true
    => thread exit (schedule())

  36. uefi-loader-login
    [Figure: interaction between loadvmm.elf, create_pass_vm, vmmcall_boot_thread, and the vmmcall_boot handler]
    loadvmm.elf
    => load bitvisor.elf
    => (BitVisor returns to the loader w/o drivers initialization; from here, in the VMX non-root mode)
    => password authentication (vmmcall_loadcfg)
    => vmmcall_boot
    => Continue to boot

    create_pass_vm
    => create vmmcall_boot_thread
    => wait_for_boot_continue
    => continue_flag = false
    => wait while continue_flag == false (schedule())
    => start_vm

    vmmcall_boot_thread
    => bsp_init_thread
    => init_regs()
    => copy_uefi_bootcode()
    => vmmcall_boot_continue()
    => continue_flag = true
    => wait while continue_flag == true (schedule())
    => call_initfunc("config0")
    => call_initfunc("drivers")
    => call_initfunc("config0")
    => continue_flag = true
    => thread exit

    vmmcall_boot
    => boot_guest
    => wait_for_boot_continue
    => continue_flag = false
    => wait while continue_flag == false (schedule())

  37. Page Table
    [Figure: phys → virt mappings used by the 1st-loader (loadvmm.elf), the 2nd-loader, and BitVisor; identity mapping, with marks at 1 GiB and 4 GiB]
    • Virt addr: 0x40000000 - 0x47FFFFFF (128 MiB)
      • 0x40000000 - 0x400FFFFF: Heap
      • 0x40100000 - code end: Code
      • code end - 0x47FFFFFF: Heap
    • Use 2 MB paging
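
Mapping the VMM region with 2 MB pages amounts to filling consecutive page-directory entries. The sketch below shows the arithmetic; it is not BitVisor's mm.c:create_pd, and the function name and the physical base used in the usage note are assumptions. The flag bits are the architectural ones for a page-directory entry that maps a 2 MB page: present (bit 0), writable (bit 1), and page size (bit 7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_2M        0x200000ull
#define PDE_PRESENT    (1ull << 0)
#define PDE_WRITABLE   (1ull << 1)
#define PDE_PAGE_SIZE  (1ull << 7)  /* entry maps a 2 MB page directly */
#define PD_ENTRIES     512

/* Map `bytes` of physically contiguous memory starting at phys_base,
 * beginning at entry 0 of the page directory.
 * Returns the number of entries written. */
size_t fill_pd_2m(uint64_t pd[PD_ENTRIES], uint64_t phys_base, uint64_t bytes)
{
    assert(pd != NULL);
    assert(phys_base % PAGE_2M == 0);
    assert(bytes % PAGE_2M == 0 && bytes / PAGE_2M <= PD_ENTRIES);

    size_t n = (size_t)(bytes / PAGE_2M);
    for (size_t i = 0; i < n; i++)
        pd[i] = (phys_base + i * PAGE_2M) |
                PDE_PRESENT | PDE_WRITABLE | PDE_PAGE_SIZE;
    return n;
}
```

For a 128 MiB VMM region this writes 64 entries. Installing that page directory as PDPTE[1] makes the memory visible at 0x40000000 - 0x47FFFFFF, and the loop only works because the physical memory is one contiguous block.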

  38. Page Table Change
    [Figure: two phys → virt mappings, each with a 1 GiB mark]
    • At the entry (entry.s:uefi_entry): identity mapping plus the head 64 KiB of BitVisor, using entry_pd
    • Afterwards (mm.c:create_pd): map the entire VMM

  39. Page Table Configuration (entry_pml4)
    [Figure: virtual address layout: PML4 index = bits 47-39, PDPT index = bits 38-30, PD index = bits 29-21, Offset = bits 20-0]
    [Code listing garbled in extraction]
    CR3 → PML4E[0]
      PDPTE[0]: 0x00000000-3FFFFFFF (Identity Mapping)
      PDPTE[1]: 0x40000000-7FFFFFFF
    • entry_pml4: used at the UEFI entry
    • vmm_pml4: VMM page table
    • uefi_entry_cr3: original UEFI CR3 (identity mapping)
    • calluefi_uefi_cr3: used for UEFI function calls (= uefi_entry_cr3)

  40. Page Table Configuration (vmm_pml4)
    [Figure: virtual address layout: PML4 index = bits 47-39, PDPT index = bits 38-30, PD index = bits 29-21, Offset = bits 20-0]
    [Code listing garbled in extraction]
    CR3 → PML4E[0]
      PDPTE[0]: entry_pd0 = identity mapping (0x00000000-3FFFFFFF)
      PDPTE[1]: VMM kernel space (0x40000000-47FFFFFF)
      PDPTE[2]: vmm_pd1 = entry_pd0 (0x80000000-BFFFFFFF)
      PDPTE[3]: Unset (for dynamic allocation) (0xC0000000-FFFFFFFF)

  41. Switching Address Space
    (Identical to slide 20.)

  42. Linux Address Space
    [Table: Linux x86-64 virtual memory map; the table body is garbled in extraction]
    https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt