
bitvisor.ko : BitVisor as a module

mmisono
November 28, 2018

Transcript

  1. bitvisor.ko : BitVisor as a module

    Masanori Misono, The University of Tokyo
    misono@os.ecc.u-tokyo.ac.jp
    2018-11-28, BitVisor Summit 7
  2. Background

  3. BitVisor Development Flow

    Coding → Install → Reboot
  4. Virtualization Overhead

  5. Goal

    1. Improve development speed
    2. Reduce performance overhead

  6. Is BitVisor needed at the beginning/always?

    • Situations where it is necessary
      • Device concealment
      • Device encryption
      • …
    • Not always necessary
      • EPT hook
  7. Approach: On-demand Virtualization

    [Diagram: OS on HW → (virtualization) → OS on VMM on HW → (de-virtualization) → OS on HW]
    ◎ Easy to use
    ◎ No overhead when de-virtualized
  8. Related Works

    • The idea is not new
      • VMX rootkit (Blue Pill [Rutkowska, 2006], SubVirt [King et al., 2006])
      • Late Launch (Intel TXT) [Gebhardt et al., 2009] [Srinivasan et al., 2011]
      • Loading a hypervisor from a kernel module and virtualizing the running system
        • ksm
        • bareflank
        • ShadowBox
        • HyperPlatform
    • BitVisor-related
      • De-virtualization after network booting [Omote et al., 2015]
      • On-demand de/re-virtualization for live migration [Im et al., 2017]
      • Booting BitVisor first
  9. Design

  10. Base Idea

    • Loading BitVisor from a kernel module
    [Diagram: Linux on HW → Linux on BitVisor on HW → Linux on HW]
    • Approach
      1. Make BitVisor entirely a kernel module
      2. Load bitvisor.elf from a kernel module ← take this
  11. BitVisor Boot Sequence (UEFI)

    [Diagram labels: Power on, Firmware, loadvmm.elf, 2nd-loader, Trampoline Code, bitvisor.elf (Initialize, Start VMM), VMENTRY, return to loader, Continue to Boot; phases: Non-virtualized, VMX non-root mode]
  12. Boot Sequence detailed (UEFI, BSP)

    loader.elf:entry_func()
    => entry.s:entry
    => entry.s:uefi64_entry
    => uefi.c:uefi_init
    => entry.s:uefi_entry_start
    => entry.s:callmain64
    => main.c:vmm_main
    => call_initfunc("global")
    => main.c:start_all_processors
    => main.c:bsp_continue
    => (change stack)
    => main.c:bsp_proc
    => call_initfunc("bsp")
    => call_initfunc("para")
    => call_initfunc("pcpu")
    => main.c:create_pass_vm (from "pcpu" initfunc)
    => bsp == true
    => vcpu.c:load_new_vcpu()
    => call_initfunc("pass")
    => vmmcall_boot.c:vmm_call_boot_enable()
    => create vmmcall_boot_thread
    => wait_for_boot_continue
    => continue_flag = false
    => wait until continue_flag becomes true
    => schedule
    => vmmcall_boot_thread
    => (scheduled from vmmcall_boot_thread)
    => vmctrl.start_vm (vt_main.c:vt_start_vm)
    => vt_mainloop()

    vmmcall_boot.c:vmmcall_boot_thread
    => main.c:bsp_init_thread
    => main.c:initregs
    => calluefi.copy_uefi_bootcode()
    => if boot from uefi-loader-login
    => vmmcall_boot_continue
    => continue_flag = true
    => schedule
    => return to create_pass_vm
    => (run vm w/o driver initialization)
    => return to loader (boot/uefi-loader-login/loadvmm.c)
    => authentication is performed in the loader
    => loadvmm.c:decrypt_intl
    => vmmcall_loadcfg64_intl
    => vmmcall_boot.c:loadcfg64
    => vmmcall_boot_intel
    => vmmcall_boot.c:boot_guest
    => wait_for_boot_continue
    => continue_flag = false
    => as a result, bsp_init_thread wakes up
    => vmmcall handler waits until continue_flag becomes true
    => call_initfunc("config0")
    => call_initfunc("drivers")
    => call_initfunc("config0")
    => continue_flag = true
    => thread exit
    => schedule
    => return to create_pass_vm
  13. Challenges

    • UEFI firmware call
    • ACPI
    • Memory allocation
    • Paging
    • Virtualizing Application Processors (APs)
    • Real mode booting
    • Unsupported instructions
  14. UEFI function call

    • During boot, BitVisor calls several UEFI functions
      • [function name garbled]: for debugging
      • [function name garbled]: disconnect firmware drivers
      • [several entries garbled]
      • [function name garbled]: get the ACPI RSDP table
      • [function name garbled]: memory allocation
  15. UEFI function call

    • During boot, BitVisor calls several UEFI functions
      • Same list as on the previous slide, with each function marked as "No need to support" or "Need to support" (the per-function marking is garbled)
  16. ACPI Table

    • Why needed?
      • Conceal IOMMU (DMAR)
      • PCIe extended configuration (MCFG)
      • Power management (FACP, DSDT/SSDT)
        • Suspend, reset, etc.
        • Cope with some power-related troubles, e.g., prevent the firmware from turning off the power of a device that BitVisor uses
    • How to get the RSDP (pointer to the ACPI root table)
      • Search the EBDA (Extended BIOS Data Area) (BIOS)
        • 0x40e: EBDA base address >> 4
      • EFI Configuration Table (UEFI)
        • No need to search memory
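The BIOS-style RSDP lookup on this slide can be sketched in user-space C. This is only an illustration under stated assumptions: the function names are hypothetical, the search runs over a buffer that is assumed to be a copy of the relevant memory region, and only the ACPI 1.0 part of the RSDP (20 bytes, signature "RSD PTR " on a 16-byte boundary, byte sum of zero) is checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The word at physical address 0x40e holds (EBDA base address >> 4),
 * so shifting it left by 4 recovers the EBDA base. */
uint32_t ebda_base_from_bda_word(uint16_t word_at_0x40e)
{
    return (uint32_t)word_at_0x40e << 4;
}

/* The first 20 bytes of an RSDP must sum to 0 modulo 256. */
int rsdp_checksum_ok(const uint8_t *p)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < 20; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/* Scan a copy of a memory region for an RSDP.
 * Returns the offset inside buf, or -1 if none is found. */
long find_rsdp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + 20 <= len; off += 16) {
        if (memcmp(buf + off, "RSD PTR ", 8) == 0 &&
            rsdp_checksum_ok(buf + off))
            return (long)off;
    }
    return -1;
}
```

On UEFI systems this scan is unnecessary, since the EFI Configuration Table hands out the RSDP address directly.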
  17. ACPI Table (Cont'd)

    • Possible approaches
      • Pass a copy of the ACPI tables that the OS has
        • e.g., /sys/firmware/acpi/tables (Linux)
      • Find the ACPI tables in memory
        • Is the configuration table available after [function name garbled]() ?
    • Currently the ACPI-related code is simply ignored :(
  18. Memory Allocation

    • loadvmm.elf allocates only 64KiB of memory
    • The 2nd-loader calls a firmware function to allocate the entire VMM memory (128MiB)
    [Diagram labels: Power on, Firmware, loadvmm.elf, 2nd-loader, Trampoline Code, Initialize, Start VMM, VMENTRY, return to loader, Continue to Boot; annotated with AllocPages()]
  19. Memory Allocation (cont'd)

    • A kernel function call mechanism could be created in the same way as the UEFI function call, but:
      • Linux does not have a generic mechanism for allocating large physically contiguous memory
        • At most the maximum slab size, generally <= 2MB
      • BitVisor is not relocatable
        • It assumes physically contiguous memory
  20. Memory Allocation (cont'd)

    • Approach
      • Make BitVisor relocatable
      • Reserve physically contiguous memory at boot time and use it
        • Boot option ← current solution
        • CMA (Contiguous Memory Allocator)
  21. Paging

    • Memory map of BitVisor
      • 0x40000000-0x7FFFFFFF (1GB)
      • Need to switch the address space
    [Diagram: virtual address split into PML4 index (bits 47-39), PDPT index (bits 38-30), PD index (bits 29-21), and offset (bits 20-0)]
    CR3 → PML4E[0] → PDPTE[0]: 0x00000000-0x3FFFFFFF (identity mapping)
                   → PDPTE[1]: 0x40000000-0x7FFFFFFF (entry_pd)
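The address split shown in the paging diagram can be reproduced with a few shifts and masks. The following is a minimal sketch assuming 4-level paging with 2MB pages (so the low 21 bits are the page offset); `struct va_parts`, `split_va_2m`, and `check_bitvisor_ranges` are hypothetical names, not BitVisor code.

```c
#include <assert.h>
#include <stdint.h>

/* Index fields of an x86-64 virtual address when 2MB pages are used. */
struct va_parts {
    unsigned pml4;    /* bits 47-39 */
    unsigned pdpt;    /* bits 38-30 */
    unsigned pd;      /* bits 29-21 */
    uint32_t offset;  /* bits 20-0  */
};

struct va_parts split_va_2m(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1ff);
    p.pdpt   = (unsigned)((va >> 30) & 0x1ff);
    p.pd     = (unsigned)((va >> 21) & 0x1ff);
    p.offset = (uint32_t)(va & 0x1fffff);
    return p;
}

/* Check the two ranges named on the slide: the identity-mapped first
 * gigabyte uses PDPTE[0], the BitVisor range uses PDPTE[1]. */
void check_bitvisor_ranges(void)
{
    assert(split_va_2m(0x00000000ULL).pdpt == 0);
    assert(split_va_2m(0x3FFFFFFFULL).pdpt == 0);
    assert(split_va_2m(0x40000000ULL).pdpt == 1);
    assert(split_va_2m(0x7FFFFFFFULL).pdpt == 1);
}
```

Because each PDPT entry covers exactly 1GB, the whole BitVisor range fits behind a single PDPTE, which is why one page directory (entry_pd) is enough for it.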
  22. Switching Address Space

    [Diagram: physical memory, OS virtual address space, and BitVisor virtual address space, with 1GiB regions]
    • Create a context-handover page table in BitVisor
    • Create a context-handover page table in Linux
  23. Virtualizing Application Processors (APs)

    • Delayed AP initialization during UEFI boot
      • In UEFI boot, only the BSP is virtualized at first
      • APs are virtualized when the guest OS tries to initialize them
        • Trap accesses to the local APIC area (Startup-IPI)
      • To cope with some firmware problems [BitVisorUEFI, 2013]
    • When virtualizing from the kernel module, the APs are already running!
    • Approach
      • Just virtualize each core
        • In the kernel module, send an IPI and virtualize in the handler
        • Synchronize?
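Trapping the Startup-IPI means recognizing it in the value the guest writes to the local APIC Interrupt Command Register. The sketch below decodes the low ICR word as laid out in the xAPIC register format (vector in bits 7-0, delivery mode in bits 10-8, where 110b is Start-up and 101b is INIT). The helper names are hypothetical and this is not BitVisor's handler.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_VECTOR_MASK       0xffu
#define ICR_DELIVERY_SHIFT    8
#define ICR_DELIVERY_MASK     0x7u
#define ICR_DELIVERY_INIT     0x5u
#define ICR_DELIVERY_STARTUP  0x6u

/* True if a write of icr_low to the ICR would send a Startup-IPI. */
int icr_is_startup_ipi(uint32_t icr_low)
{
    return ((icr_low >> ICR_DELIVERY_SHIFT) & ICR_DELIVERY_MASK)
           == ICR_DELIVERY_STARTUP;
}

/* A Startup-IPI makes the target AP begin executing in real mode at
 * physical address (vector << 12), i.e. on a 4KiB boundary below 1MiB. */
uint32_t sipi_start_address(uint32_t icr_low)
{
    assert(icr_is_startup_ipi(icr_low));
    return (icr_low & ICR_VECTOR_MASK) << 12;
}
```

For example, `sipi_start_address(0x000C4608u)` yields `0x8000`, while `0x000C4500u` is an INIT IPI and is not treated as a start request.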
  24. Real mode booting

    • The trampoline code starts in real mode
      • Need to allocate memory for real mode
      • The memory must be below 1MB!
    • Approach
      • Reserve memory in the same way as mentioned previously
      • Change the trampoline code so that it starts in long mode
    [Diagram labels: loadvmm.elf, Trampoline Code, VMLAUNCH, return to loader]
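The 1MB limit follows from real-mode addressing, where a physical address is formed as segment * 16 + offset. A small sketch of the resulting constraint on a trampoline buffer, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

#define REAL_MODE_LIMIT 0x100000u  /* 1MiB */

/* Physical address reached by a real-mode segment:offset pair. */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* True if [base, base + len) lies entirely below 1MiB. */
int fits_below_1mib(uint64_t base, uint64_t len)
{
    return len <= REAL_MODE_LIMIT && base <= REAL_MODE_LIMIT - len;
}

/* Segment value that addresses a 16-byte-aligned trampoline base. */
uint16_t trampoline_segment(uint32_t base)
{
    assert((base & 0xf) == 0 && base < REAL_MODE_LIMIT);
    return (uint16_t)(base >> 4);
}
```

Memory handed out by a running Linux kernel will normally not satisfy `fits_below_1mib`, which is one reason the slide lists starting the trampoline directly in long mode as an alternative.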
  25. Unsupported Instructions

    • The guest should not execute instructions that BitVisor does not support
      • e.g.,
        • VMX instructions
        • PCID (INVPCID)
          • (partially supported now)
      • (BitVisor conceals unsupported features at boot time)
    • Approach
      • Do not use them in the guest OS
        • Boot option
      • Patch BitVisor
  26. Miscellaneous

    • Preserve the guest state for the VMENTRY
      • TR, GDTR, IDTR
      • IA32_SYSENTER_CS, IA32_SYSENTER_EIP, IA32_SYSENTER_RSP
      • FS_BASE, GS_BASE
    • Entry point
      • Place it in a specific ELF section
    • SMEP / XD bit
      • Supervisor Mode Execution Prevention (CR4 bit 20)
      • Prevent execution from user memory in the kernel code (PML4E bit 63 = 1)
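Why SMEP and the XD bit matter when kernel code jumps into freshly mapped BitVisor code can be shown with a simplified permission check. This is a sketch, not the full architectural rule: it looks at a single paging-structure entry, whereas the hardware combines the U/S and XD bits of every level, and XD only takes effect when IA32_EFER.NXE is set. In the Intel SDM, CR4.SMEP is bit 20 and XD is bit 63 of a paging-structure entry. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_SMEP     (1ull << 20)  /* Supervisor Mode Execution Prevention */
#define PTE_PRESENT  (1ull << 0)
#define PTE_USER     (1ull << 2)   /* U/S: page is user-accessible */
#define PTE_XD       (1ull << 63)  /* execute-disable */

/* Would an instruction fetch by supervisor-mode code through this
 * (present) entry be refused?  Simplified to one paging level. */
int kernel_fetch_blocked(uint64_t cr4, uint64_t entry)
{
    assert((entry & PTE_PRESENT) != 0);
    if (entry & PTE_XD)
        return 1;  /* XD blocks fetches regardless of privilege */
    return (cr4 & CR4_SMEP) != 0 && (entry & PTE_USER) != 0;
}
```

A mapping used for the hand-over therefore has to be a supervisor mapping with XD clear, or SMEP has to be dealt with before the jump.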
  27. (Current) Virtualization Flow

    • Reserve memory for BitVisor at boot time
    • Load BitVisor into the reserved region
    • Send an IPI to each core to run the virtualization code
    • Create a page table to run the BitVisor code
      • mmap + modify the page table
    • Jump to the entry code
    • Save the guest state
    • Initialize
    • Return to the guest (VMENTRY)
  28. De-virtualization

    1. Issue VMCALL
    2. Save the hypervisor state
    3. Jump to the guest without VMENTRY
      • Load the guest state registers
  29. Implementation detail

    • Base system
      • Intel x86-64
      • Linux 4.18
      • BitVisor changeset 244 (2018-10-4)
  30. Demo

  31. Development Environment

    • Host: VMware Workstation
      • Development VM
        - Write code
      • Experiment VM
        - Load / unload the kernel module
        - Share the disk with the dev VM
        - Same Linux kernel as the dev VM
      • Serial port to file
    • There is an AVX512-related problem(?)
  32. Comparison with other hypervisors

                BIOS   UEFI   Kernel module   Kernel module   Device   AMD   License
                boot   boot   (Linux)         (Windows)       driver
    ksm         ×      ×      ◦               ◦               ×        ×     GPL
    bareflank   ×      △      ◦               ◦               ×        ×     LGPL
    BitVisor    ◦      ◦      developing      ×               ◦        ◦     BSD

    (incomplete)
  33. Conclusion

    • Propose an on-demand virtualization scheme for BitVisor
      • Guest cooperation is necessary
    • Useful in some cases
      • Research & development
  34. Future Work

    • Finish the implementation
    • Use the vmalloc memory region
      • BitVisor assumes physically contiguous memory
    • Device support
    • Other architecture / OS support
      • AMD, Windows, Mac
    • Any advice/comments welcome!
  35. References

    • J. Rutkowska, Introducing Blue Pill, SyScan'06.
    • S.T. King et al., SubVirt: implementing malware with virtual machines, S&P'06.
    • C. Gebhardt et al., LaLa: A Late Launch Application, STC'09.
    • R. Srinivasan et al., MIvmm: A micro VMM for development of a trusted code base, 2011.
    • Y. Omote et al., Improving agility and elasticity in bare-metal clouds, ASPLOS'15.
    • J. Im et al., On-demand Virtualization for Live Migration in Bare Metal Cloud, SoCC'17.
    • BitVisorUEFI, BitVisor Summit 2, 2013.
  38. Boot Sequence detailed (UEFI, AP)

    BSP:
    localapic.c:mmio_apic
    => handle_ap_start
    => ap.c:ap_start
       - copy cpuinit_start code to apinit addr
    => ap_start_addr
    => apic_send_startup_ipi
    => start AP
    => call_initfunc("dbsp")
    => time.c:time_init_dbsp
    => main.c:wait_for_create_pass_vm
       - call sync_all_processors() to sync with AP

    AP:
    receive SIPI
    => entry.s:cpuinit_start
    => entry.s:call_main64
    => ap.c:apinitproc0
    => (change stack)
    => ap.c:apinitproc1
    => int_init_ap() (initialize IDT)
    => initproc_ap (= ap_initproc = ap_proc)
    => call_initfunc("ap")
    => call_initfunc("para")
    => call_initfunc("pcpu")
    => create_pass_vm
    => load_new_vcpu
    => vmctl.vminit
    => call_initfunc("pass")
    => initregs
    => vmctrl.init_signal
    => start_vm

    Data initialization for AP:
    - bspinitproc1
    - allocate apinit_addr() by alloc_realmodemem()
    - localapic_delayed_ap_start()
    - set ap_start function pointer to ap_start
  39. uefi-loader

    [Sequence diagram; recoverable labels per thread]
    • create_pass_vm
      • create vmmcall_boot_thread
      • wait_for_boot_continue: continue_flag = false; wait while continue_flag == false; schedule()
      • start_vm
    • vmmcall_boot_thread → bsp_init_thread
      • init_regs()
      • copy_uefi_bootcode()
      • call_initfunc("config0")
      • call_initfunc("drivers")
      • call_initfunc("config0")
      • continue_flag = true
      • thread exit; schedule()
  40. uefi-loader-login

    [Sequence diagram; recoverable labels per thread]
    • create_pass_vm
      • create vmmcall_boot_thread
      • wait_for_boot_continue: continue_flag = false; wait while continue_flag == false; schedule()
      • start_vm
    • vmmcall_boot_thread → bsp_init_thread
      • init_regs()
      • copy_uefi_bootcode()
      • vmmcall_boot_continue(): continue_flag = true; wait while continue_flag == true; schedule()
      • call_initfunc("config0")
      • call_initfunc("drivers")
      • call_initfunc("config0")
      • continue_flag = true
      • thread exit
    • loadvmm.elf (from here, in VMX non-root mode; return to the loader w/o driver initialization)
      • load bitvisor.elf
      • password authentication (vmmcall_loadcfg)
      • vmmcall_boot
    • boot_guest
      • wait_for_boot_continue: continue_flag = false; wait while continue_flag == false; schedule()
      • Continue to boot
  41. Page Table

    [Diagram: phys → virt mappings for the 1st-loader (loadvmm.elf), the 2nd-loader, and BitVisor; labels: Identity Mapping, 1GiB, 4GiB]
    • Virt addr: 0x40000000-0x47FFFFFF (128MiB)
      • 0x40000000-0x400FFFFF: Heap
      • 0x40100000-code end: Code
      • code end-0x47FFFFFF: Heap
    • Use 2MB paging
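The layout above can be captured in a small classifier. This is a sketch with hypothetical names; `code_end` is passed as a parameter because its real value depends on the linked bitvisor.elf image and is not given on the slide.

```c
#include <assert.h>
#include <stdint.h>

#define VMM_VIRT_START 0x40000000ull
#define VMM_VIRT_END   0x47FFFFFFull  /* inclusive */
#define VMM_CODE_START 0x40100000ull
#define PAGE_2M        0x200000ull

enum vmm_region { VMM_OUTSIDE, VMM_HEAP_LOW, VMM_CODE, VMM_HEAP_HIGH };

/* Classify a virtual address; code_end is the first address after the code. */
enum vmm_region vmm_classify(uint64_t va, uint64_t code_end)
{
    assert(code_end > VMM_CODE_START && code_end <= VMM_VIRT_END);
    if (va < VMM_VIRT_START || va > VMM_VIRT_END)
        return VMM_OUTSIDE;
    if (va < VMM_CODE_START)
        return VMM_HEAP_LOW;
    if (va < code_end)
        return VMM_CODE;
    return VMM_HEAP_HIGH;
}

/* Number of 2MB page-directory entries needed to map the whole range. */
unsigned vmm_pde_count(void)
{
    return (unsigned)((VMM_VIRT_END - VMM_VIRT_START + 1) / PAGE_2M);
}
```

With 2MB pages the range needs 64 page-directory entries, so it fits well within one 512-entry page directory.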
  42. Page Table Change

    [Diagram: phys → virt mappings before and after the change, 1GiB regions]
    • Identity mapping to entry_pd (entry.s:uefi_entry)
      • Maps the first 64KiB of BitVisor
    • Map the entire VMM (mm.c:create_pd)
  43. Page Table Configuration (entry_pml4)

    [Diagram: virtual address split into PML4 index (bits 47-39), PDPT index (bits 38-30), PD index (bits 29-21), and offset (bits 20-0); the source listing on the slide is not recoverable]
    CR3 → PML4E[0] → PDPTE[0]: 0x00000000-0x3FFFFFFF (identity mapping)
                   → PDPTE[1]: 0x40000000-0x7FFFFFFF
    • entry_pml4: used at the UEFI entry
    • vmm_pml4: VMM page table
    • uefi_entry_cr3: original UEFI cr3 (identity mapping)
    • calluefi_uefi_cr3: used for UEFI function calls (= uefi_entry_cr3)
  44. Page Table Configuration (vmm_pml4)

    [Diagram: virtual address split into PML4 index (bits 47-39), PDPT index (bits 38-30), PD index (bits 29-21), and offset (bits 20-0); the source listing on the slide is not recoverable]
    CR3 → PML4E[0] → PDPTE[0]: entry_pd0 = identity mapping (0x00000000-0x3FFFFFFF)
                   → PDPTE[1]: VMM kernel space (0x40000000-0x47FFFFFF)
                   → PDPTE[2]: vmm_pd1 = entry_pd0 (0x80000000-0xBFFFFFFF)
                   → PDPTE[3]: unset, for dynamic allocation (0xC0000000-0xFFFFFFFF)
  46. Linux Address Space - 1

    [Memory map table; contents not recoverable from the transcript]
    https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt