Slide 1

Slide 1 text

bitvisor.ko: BitVisor as a module
Masanori Misono, The University of Tokyo
2018-11-28, BitVisor Summit 7

Slide 2

Slide 2 text

Background

Slide 3

Slide 3 text

BitVisor Development Flow
Coding => Install => Reboot

Slide 4

Slide 4 text

Virtualization Overhead

Slide 5

Slide 5 text

Goal
1. Improve development speed
2. Reduce performance overhead

Slide 6

Slide 6 text

Is BitVisor needed from the beginning, or always?
• Situations where it is necessary
  • Device concealment
  • Device encryption
  • …
• Not always necessary
  • EPT hook

Slide 7

Slide 7 text

Approach: On-demand Virtualization
[Diagram: an OS running directly on HW is virtualized (a VMM is inserted between the OS and HW) and later de-virtualized (the VMM is removed)]
◎ Easy to use
◎ No overhead when de-virtualized

Slide 8

Slide 8 text

Related Work
• The idea is not new
  • VMX rootkit (Blue Pill [Rutkowska, 2006], SubVirt [King et al., 2006])
  • Late Launch (Intel TXT) [Gebhardt et al., 2009] [Srinivasan et al., 2011]
• Loading a hypervisor from a kernel module and virtualizing itself
  • ksm
  • bareflank
  • ShadowBox
  • HyperPlatform
• BitVisor-related
  • De-virtualization after network booting [Omote et al., 2015]
  • On-demand de/re-virtualization for live migration [Im et al., 2017]
  • Booting BitVisor first

Slide 9

Slide 9 text

Design

Slide 10

Slide 10 text

Base Idea
• Load BitVisor from a kernel module
[Diagram: stacks of HW, Linux, and BitVisor before and after loading]
• Approach
  1. Make BitVisor entirely a kernel module
  2. Load bitvisor.elf from a kernel module => take this

Slide 11

Slide 11 text

BitVisor Boot Sequence (UEFI)
[Diagram: Power on => Firmware => loadvmm.elf => 2nd-loader => bitvisor.elf (Trampoline Code, Initialize, Start VMM) => VMENTRY => return to loader => Continue to Boot. Execution is non-virtualized before VMENTRY and in VMX non-root mode after it.]

Slide 12

Slide 12 text

Boot Sequence detailed (UEFI, BSP)
loader.elf:entry_func()
=> entry.s:entry
=> entry.s:uefi64_entry
=> uefi.c:uefi_init
=> entry.s:uefi_entry_start
=> entry.s:callmain64
=> main.c:vmm_main
=> call_initfunc("global")
=> main.c:start_all_processors
=> main.c:bsp_continue
=> (change stack)
=> main.c:bsp_proc
=> call_initfunc("bsp")
=> call_initfunc("para")
=> call_initfunc("pcpu")
=> main.c:create_pass_vm (from "pcpu" initfunc)
=> bsp == true
=> vcpu.c:load_new_vcpu()
=> call_initfunc("pass")
=> vmmcall_boot.c:vmm_call_boot_enable()
=> create vmmcall_boot_thread
=> wait_for_boot_continue
=> continue_flag = false
=> wait until continue_flag becomes true
=> schedule
=> vmmcall_boot_thread
=> (scheduled from vmmcall_boot_thread)
=> vmctrl.start_vm (vt_main.c:vt_start_vm)
=> vt_mainloop()
vmmcall_boot.c:vmmcall_boot_thread
=> main.c:bsp_init_thread
=> main.c:initregs
=> calluefi.copy_uefi_bootcode()
=> if booted from uefi-loader-login
=> vmmcall_boot_continue
=> continue_flag = true
=> schedule
=> return to create_pass_vm
=> (run VM w/o driver initialization)
=> return to loader (boot/uefi-loader-login/loadvmm.c)
=> authentication is performed in the loader
=> loadvmm.c:decrypt_intl
=> vmmcall_loadcfg64_intl
=> vmmcall_boot.c:loadcfg64
=> vmmcall_boot_intel
=> vmmcall_boot.c:boot_guest
=> wait_for_boot_continue
=> continue_flag = false
=> as a result, bsp_init_thread wakes up
=> vmmcall handler waits until continue_flag becomes true
=> call_initfunc("config0")
=> call_initfunc("drivers")
=> call_initfunc("config0")
=> continue_flag = true
=> thread exit
=> schedule
=> return to create_pass_vm

Slide 13

Slide 13 text

Challenges
• UEFI firmware call
• ACPI
• Memory allocation
• Paging
• Virtualizing Application Processors (APs)
• Real mode booting
• Unsupported instructions

Slide 14

Slide 14 text

UEFI function call
• During boot, BitVisor calls several UEFI functions
  • For debugging
  • Disconnect firmware drivers
  • Get ACPI RSDP Table
  • Memory allocation
[The UEFI function names on this slide are not recoverable from the extraction]

Slide 15

Slide 15 text

UEFI function call
• During boot, BitVisor calls several UEFI functions (same list as Slide 14)
• Each call is marked either "No need to support" or "Need to support"

Slide 16

Slide 16 text

ACPI Table
• Why is it needed?
  • Conceal IOMMU (DMAR)
  • PCIe extended configuration (MCFG)
  • Power management (FACP, DSDT/SSDT)
    • Suspend, reset, etc.
  • Cope with some power-related troubles, e.g., prevent the firmware from turning off the power of a device that BitVisor uses
• How to get the RSDP (ACPI Root System Description Pointer)
  • Search the EBDA (Extended BIOS Data Area) (BIOS)
    • 0x40e: EBDA base address >> 4
  • EFI Configuration Table (UEFI)
    • No need to search memory
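
The BIOS-style lookup above can be sketched in user-space C. This is only an illustration under stated assumptions: the function names are hypothetical (not BitVisor's), the caller is assumed to have already copied the memory window to scan into a buffer, and only the ACPI 1.0 checksum over the first 20 bytes is verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The word at physical 0x40E holds (EBDA base address >> 4),
 * so shifting it left by 4 recovers the EBDA base. */
uint32_t ebda_base_from_segment(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* ACPI 1.0 rule: the first 20 bytes of the RSDP sum to 0 modulo 256. */
int rsdp_checksum_ok(const uint8_t *rsdp)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < 20; i++)
        sum = (uint8_t)(sum + rsdp[i]);
    return sum == 0;
}

/* Scan a buffer for the "RSD PTR " signature on 16-byte boundaries.
 * Returns the offset of the first valid RSDP, or -1 if none is found. */
long find_rsdp(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + 20 <= len; off += 16) {
        if (memcmp(buf + off, "RSD PTR ", 8) == 0 &&
            rsdp_checksum_ok(buf + off))
            return (long)off;
    }
    return -1;
}
```

On UEFI systems no scan is needed because the RSDP address is published in the EFI Configuration Table, which is why the slide distinguishes the two cases.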

Slide 17

Slide 17 text

ACPI Table (Cont'd)
• Possible approaches
  • Pass a copy of the ACPI tables that the OS has
    • e.g., /sys/firmware/acpi/tables (Linux)
  • Find the ACPI tables in memory
    • Is the configuration table available after ( ) ?
• Currently, ACPI-related code is simply ignored :(

Slide 18

Slide 18 text

Memory Allocation
• loadvmm.elf allocates only 64KiB of memory
• The 2nd-loader calls a firmware function to allocate the entire VMM memory (128MiB)
[Diagram: the UEFI boot sequence (Power on => Firmware => loadvmm.elf => 2nd-loader => Trampoline Code => Initialize => Start VMM => VMENTRY => return to loader => Continue to Boot), with AllocPages() called from the 2nd-loader]

Slide 19

Slide 19 text

Memory Allocation (cont'd)
• It would be possible to create a kernel function call mechanism in the same way as the UEFI function call, but...
  • Linux does not have a generic mechanism for allocating large physically contiguous memory
    • At most the maximum slab size, generally <= 2MB
  • BitVisor is not relocatable
    • It assumes physically contiguous memory
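
To make the contiguity requirement concrete, here is a minimal sketch with hypothetical helper names. It assumes 4 KiB pages and that the caller already has the page frame numbers (PFNs) backing a candidate buffer. It only shows what "physically contiguous" means for a 128MiB region and is not code from BitVisor or Linux.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 4 KiB pages needed to back a region of the given size. */
size_t pages_needed(uint64_t bytes)
{
    return (size_t)((bytes + 4095) / 4096);
}

/* Returns 1 if the page frame numbers form one physically contiguous run,
 * which is what a non-relocatable VMM image needs; 0 otherwise. */
int pfns_are_contiguous(const uint64_t *pfn, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (pfn[i] != pfn[i - 1] + 1)
            return 0;
    }
    return 1;
}
```

A 128MiB VMM region corresponds to 32768 consecutive 4 KiB frames, far more than a single slab allocation provides.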

Slide 20

Slide 20 text

Memory Allocation (cont'd)
• Approach
  • Make BitVisor relocatable
  • Reserve physically contiguous memory at boot time and use it
    • Boot option => current solution
    • CMA (Contiguous Memory Allocator)

Slide 21

Slide 21 text

Paging
• Memory map of BitVisor
  • 0x40000000-0x7FFFFFFF (1GB)
• Need to switch address spaces
[Diagram: virtual address fields with 2MB pages: Offset = bits 0-20, PD = bits 21-29, PDPT = bits 30-38, PML4 = bits 39-47]
[Diagram: CR3 => PML4E[0] => PDPTE[0]: 0x00000000-0x3FFFFFFF (identity mapping); PDPTE[1] (entry_pd): 0x40000000-0x7FFFFFFF]
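
The field boundaries in the diagram can be checked with a small sketch. It assumes 4-level paging with 2MB pages, as the slide shows; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Table indexes and page offset of a virtual address under 2MB paging. */
struct va_fields {
    unsigned pml4;    /* bits 39-47 */
    unsigned pdpt;    /* bits 30-38 */
    unsigned pd;      /* bits 21-29 */
    uint32_t offset;  /* bits 0-20  */
};

struct va_fields split_va_2m(uint64_t va)
{
    struct va_fields f;
    f.pml4 = (unsigned)((va >> 39) & 0x1FF);
    f.pdpt = (unsigned)((va >> 30) & 0x1FF);
    f.pd = (unsigned)((va >> 21) & 0x1FF);
    f.offset = (uint32_t)(va & 0x1FFFFF);
    return f;
}
```

For example, 0x40000000 splits into PML4 index 0, PDPT index 1, PD index 0, which is why the BitVisor region hangs off PDPTE[1] while the identity-mapped first 1GiB uses PDPTE[0].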

Slide 22

Slide 22 text

Switching Address Space
[Diagram: physical memory, the OS virtual address space, and the BitVisor virtual address space, with 1GiB marks]
• Create a context-handover page table in BitVisor
• Create a context-handover page table in Linux

Slide 23

Slide 23 text

Virtualizing Application Processors (APs)
• Delayed AP initialization during UEFI boot
  • In UEFI boot, only the BSP is virtualized at first
  • APs are virtualized when the guest OS tries to initialize them
    • Trap the access to the local APIC area (Startup-IPI)
  • To cope with some firmware problems [BitVisorUEFI, 2013]
• When virtualizing from the kernel module, the APs are already running!
• Approach
  • Just virtualize each core
  • In the kernel module, send an IPI and virtualize in the handler
  • Synchronize?

Slide 24

Slide 24 text

Real mode booting
• The trampoline code starts in real mode
  • Need to allocate memory for real mode
  • The memory must lie below 1MB!
• Approach
  • Reserve memory in the same way as mentioned previously
  • Change the trampoline code so that it starts in long mode
[Diagram: loadvmm.elf => Trampoline Code => VMLAUNCH => return to loader]
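
The 1MB constraint follows from real-mode addressing, where a linear address is segment * 16 + offset. The sketch below uses hypothetical helper names (not BitVisor's) and checks whether a candidate trampoline buffer is usable from real mode.

```c
#include <stdint.h>

#define REALMODE_LIMIT 0x100000u /* 1 MiB */

/* Linear address formed by a real-mode segment:offset pair. */
uint32_t realmode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Returns 1 if [base, base + size) lies entirely below 1 MiB. */
int fits_in_realmode(uint64_t base, uint64_t size)
{
    return size <= REALMODE_LIMIT && base <= REALMODE_LIMIT - size;
}

/* Segment value for a 16-byte-aligned base, so code can start at offset 0. */
uint16_t realmode_segment(uint32_t base)
{
    return (uint16_t)(base >> 4);
}
```

Memory handed out by an ordinary kernel allocator will usually fail this check, which is one motivation for starting the trampoline directly in long mode.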

Slide 25

Slide 25 text

Unsupported Instructions
• The guest should not execute instructions that BitVisor does not support
  • e.g.,
    • VMX instructions
    • PCID (INVPCID)
      • (partially supported now)
  • (BitVisor conceals unsupported features at boot time)
• Approach
  • Don't use them in the guest OS
    • Boot option
  • Patch BitVisor
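
As an illustration of why guest cooperation matters, the sketch below assumes that concealing a feature means clearing its CPUID bit, which the slide does not spell out. The bit positions are the architectural ones from the Intel SDM; the function names are hypothetical. A guest that booted before the hypervisor was loaded has already read the unmasked values.

```c
#include <stdint.h>

#define CPUID1_ECX_VMX     (1u << 5)  /* CPUID.01H:ECX[5]  */
#define CPUID1_ECX_PCID    (1u << 17) /* CPUID.01H:ECX[17] */
#define CPUID7_EBX_INVPCID (1u << 10) /* CPUID.(EAX=07H,ECX=0):EBX[10] */

/* Value of CPUID.01H:ECX with VMX and PCID hidden from the guest. */
uint32_t conceal_cpuid1_ecx(uint32_t ecx)
{
    return ecx & ~(CPUID1_ECX_VMX | CPUID1_ECX_PCID);
}

/* Value of CPUID.(EAX=07H,ECX=0):EBX with INVPCID hidden from the guest. */
uint32_t conceal_cpuid7_ebx(uint32_t ebx)
{
    return ebx & ~CPUID7_EBX_INVPCID;
}

/* Returns 1 if a guest that saw these raw CPUID values may already rely
 * on a feature listed above, so a boot option or a patch is needed. */
int guest_may_use_unsupported(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx)
{
    return (cpuid1_ecx & (CPUID1_ECX_VMX | CPUID1_ECX_PCID)) != 0 ||
           (cpuid7_ebx & CPUID7_EBX_INVPCID) != 0;
}
```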

Slide 26

Slide 26 text

Miscellaneous
• Save the guest state for VMENTRY
  • TR, GDTR, IDTR
  • IA32_SYSENTER_CS, IA32_SYSENTER_EIP, IA32_SYSENTER_RSP
  • FS_BASE, GS_BASE
• Entry point
  • Place it in a specific ELF section
• SMEP / XD bit
  • Supervisor Mode Execution Prevention (CR4[20])
  • Prevents kernel code from executing user memory (PML4E[63] = 1)
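
The two protection bits can be tested as in the following sketch. It relies on the architectural definitions in the Intel SDM (SMEP is CR4 bit 20; XD is bit 63 of a paging-structure entry and takes effect only when IA32_EFER.NXE is set). The helper names are hypothetical.

```c
#include <stdint.h>

#define CR4_SMEP      (1ull << 20) /* Supervisor Mode Execution Prevention */
#define ENTRY_PRESENT (1ull << 0)  /* P bit of a paging-structure entry */
#define ENTRY_XD      (1ull << 63) /* Execute-disable bit */

/* Returns 1 if SMEP is enabled in the given CR4 value. */
int smep_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMEP) != 0;
}

/* Returns 1 if a present entry does not forbid instruction fetches. */
int entry_allows_execute(uint64_t entry)
{
    return (entry & ENTRY_PRESENT) != 0 && (entry & ENTRY_XD) == 0;
}

/* Copy of the entry with the XD bit cleared, e.g. for a handover mapping. */
uint64_t clear_xd(uint64_t entry)
{
    return entry & ~ENTRY_XD;
}
```

Code that jumps from the kernel module into the hypervisor entry point has to run from a mapping that passes both checks, otherwise the instruction fetch faults.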

Slide 27

Slide 27 text

(Current) Virtualization Flow
• Reserve memory for BitVisor at boot time
• Load BitVisor into the reserved region
• Send an IPI to each core to run the virtualization code
• Create a page table to run BitVisor code
  • mmap + modify page table
• Jump to the entry code
• Save the guest state
• Initialize
• Return to the guest (VMENTRY)

Slide 28

Slide 28 text

De-virtualization
1. Issue VMCALL
2. Save hypervisor state
3. Jump to the guest without VMENTRY
   • Load guest state registers

Slide 29

Slide 29 text

Implementation detail
• Base system
  • Intel x86-64
  • Linux 4.18
  • BitVisor changeset 244 (2018-10-04)

Slide 30

Slide 30 text

Demo

Slide 31

Slide 31 text

Development Environment
• Host: VMware Workstation
  • Development VM
    - Write code
  • Experiment VM
    - Load / unload the kernel module
    - Shares a disk with the dev VM
    - Same Linux kernel as the dev VM
    - Serial port output to a file
• There is an AVX512-related problem(?)

Slide 32

Slide 32 text

Comparison with other hypervisors
          | BIOS boot | UEFI boot | Kernel module (Linux) | Kernel module (Windows) | Device Driver | AMD | License
ksm       | ×         | ×         | ○                     | ○                       | ×             | ×   | GPL
bareflank | ×         | △         | ○                     | ○                       | ×             | ×   | LGPL
BitVisor  | ○         | ○         | developing            | ×                       | ○             | ○   | BSD
(incomplete)

Slide 33

Slide 33 text

Conclusion
• Proposed an on-demand virtualization scheme for BitVisor
  • Guest cooperation is necessary
• Useful in some cases
  • Research & development

Slide 34

Slide 34 text

Future Work
• Finish the implementation
• Use a vmalloc memory region
  • BitVisor assumes physically contiguous memory
• Device support
• Other architecture / OS support
  • AMD, Windows, Mac
• Any advice/comments welcome!

Slide 35

Slide 35 text

References
• J. Rutkowska, Introducing Blue Pill, SyScan’06.
• S.T. King et al., SubVirt: implementing malware with virtual machines, S&P’06.
• C. Gebhardt et al., LaLa: A Late Launch Application, STC’09.
• R. Srinivasan et al., MIvmm: A micro VMM for development of a trusted code base, 2011.
• Y. Omote et al., Improving agility and elasticity in bare-metal clouds, ASPLOS’15.
• J. Im et al., On-demand Virtualization for Live Migration in Bare Metal Cloud, SoCC’17.
• BitVisorUEFI, BitVisor Summit 2, 2013.

Slide 37

Slide 37 text

Boot Sequence detailed (UEFI, BSP): same content as Slide 12

Slide 38

Slide 38 text

Boot Sequence detailed (UEFI, AP)
AP
receive SIPI
=> entry.s:cpuinit_start
=> entry.s:call_main64
=> ap.c:apinitproc0
=> (change stack)
=> ap.c:apinitproc1
=> int_init_ap() (initialize IDT)
=> initproc_ap (= ap_initproc = ap_proc)
=> call_initfunc("ap")
=> call_initfunc("para")
=> call_initfunc("pcpu")
=> create_pass_vm
=> load_new_vcpu
=> vmctl.vminit
=> call_initfunc("pass")
=> initregs
=> vmctrl.init_signal
=> start_vm
BSP
localapic.c:mmio_apic
=> handle_ap_start
=> ap.c:ap_start
- copy cpuinit_start code to apinit addr
=> ap_start_addr
=> apic_send_startup_ipi
=> start AP
=> call_initfunc("dbsp")
=> time.c:time_init_dbsp
=> main.c:wait_for_create_pass_vm
- call sync_all_processors() to sync with AP
Data initialization for AP
- bspinitproc1
- allocate apinit_addr() by alloc_realmodemem()
- localapic_delayed_ap_start()
- set ap_start function pointer to ap_start

Slide 39

Slide 39 text

uefi-loader
[Diagram: two threads]
create_pass_vm: create vmmcall_boot_thread => wait_for_boot_continue (continue_flag = false; wait while continue_flag == false, calling schedule()) => start_vm
vmmcall_boot_thread: bsp_init_thread => init_regs() => copy_uefi_bootcode() => call_initfunc("config0") => call_initfunc("drivers") => call_initfunc("config0") => continue_flag = true => thread exit (schedule())

Slide 40

Slide 40 text

uefi-loader-login
[Diagram: two threads plus the loader]
create_pass_vm: create vmmcall_boot_thread => wait_for_boot_continue (continue_flag = false; wait while continue_flag == false, calling schedule()) => start_vm => return to the loader w/o drivers initialization (from here, in VMX non-root mode)
vmmcall_boot_thread: bsp_init_thread => init_regs() => copy_uefi_bootcode() => vmmcall_boot_continue() (continue_flag = true; wait while continue_flag == true, calling schedule()) => call_initfunc("config0") => call_initfunc("drivers") => call_initfunc("config0") => continue_flag = true => thread exit
loadvmm.elf: load bitvisor.elf => password authentication (vmmcall_loadcfg) => vmmcall_boot => boot_guest => wait_for_boot_continue (config_flag = false; wait while config_flag == false, calling schedule()) => Continue to boot

Slide 41

Slide 41 text

Page Table
[Diagram: phys-to-virt mappings for the 1st-loader (loadvmm.elf), the 2nd-loader, and BitVisor; identity mapping; 1GiB and 4GiB marks]
• BitVisor virtual addresses: 0x40000000-0x47FFFFFF (128MiB)
  • 0x40000000-0x400FFFFF: Heap
  • 0x40100000-code end: Code
  • code end-0x47FFFFFF: Heap
• Uses 2MB paging

Slide 42

Slide 42 text

Page Table Change
1. Switch to entry_pd (entry.s:uefi_entry): identity mapping, plus the first 64KiB of BitVisor mapped at 1GiB
2. Map the entire VMM at 1GiB (mm.c:create_pd)
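
Mapping the whole VMM with 2MB pages amounts to filling consecutive page-directory entries. The following is a minimal sketch under stated assumptions, not BitVisor's actual mm.c:create_pd: the function name is hypothetical, the physical base is assumed to be 2MB-aligned, and only the Present, Read/Write, and Page Size bits are set.

```c
#include <stddef.h>
#include <stdint.h>

#define PDE_PRESENT (1ull << 0)
#define PDE_RW      (1ull << 1)
#define PDE_PS      (1ull << 7)  /* entry maps a 2MB page directly */
#define PAGE_2M     (1ull << 21)

/* Fill pd[] so that consecutive 2MB virtual pages map to physical memory
 * starting at phys_base. Returns the number of entries written, or 0 if
 * the base is unaligned, size is 0, or the region exceeds one PD (1GiB). */
size_t fill_pd_2m(uint64_t pd[512], uint64_t phys_base, uint64_t size)
{
    if (phys_base % PAGE_2M != 0 || size == 0)
        return 0;
    size_t n = (size_t)((size + PAGE_2M - 1) / PAGE_2M);
    if (n > 512)
        return 0;
    for (size_t i = 0; i < n; i++)
        pd[i] = (phys_base + i * PAGE_2M) | PDE_PRESENT | PDE_RW | PDE_PS;
    return n;
}
```

A 128MiB VMM region needs 64 such entries, so it fits comfortably in the single page directory referenced from PDPTE[1].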

Slide 43

Slide 43 text

Page Table Configuration (entry_pml4)
[Diagram: virtual address fields with 2MB pages: Offset = bits 0-20, PD = bits 21-29, PDPT = bits 30-38, PML4 = bits 39-47]
[Diagram: CR3 => PML4E[0] => PDPTE[0]: 0x00000000-0x3FFFFFFF (identity mapping); PDPTE[1]: 0x40000000-0x7FFFFFFF]
• entry_pml4: used at the UEFI entry
• vmm_pml4: VMM page table
• uefi_entry_cr3: original UEFI CR3 (identity mapping)
• calluefi_uefi_cr3: used for UEFI function calls (= uefi_entry_cr3)

Slide 44

Slide 44 text

Page Table Configuration (vmm_pml4)
[Diagram: virtual address fields with 2MB pages: Offset = bits 0-20, PD = bits 21-29, PDPT = bits 30-38, PML4 = bits 39-47]
[Diagram: CR3 => PML4E[0] => PDPT]
• PDPTE[0]: entry_pd0 = identity mapping (0x00000000-0x3FFFFFFF)
• PDPTE[1]: VMM kernel space (0x40000000-0x47FFFFFF)
• PDPTE[2]: vmm_pd1 = entry_pd0 (0x80000000-0xBFFFFFFF)
• PDPTE[3]: unset, for dynamic allocation (0xC0000000-0xFFFFFFFF)

Slide 45

Slide 45 text

Switching Address Space: same content as Slide 22

Slide 46

Slide 46 text

Linux Address Space
[Table: Linux x86-64 virtual memory map, taken from https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt]
