Related Works
new
• VMX rootkit (Blue Pill [Rutkowska, 2006], SubVirt [King et al., 2006])
• Late Launch (Intel TXT) [Gebhardt et al., 2009] [Srinivasan et al., 2011]
• Loading hypervisor from kernel module and virtualizing self
  • ksm
  • bareflank
  • ShadowBox
  • HyperPlatform
• BitVisor-related
  • De-virtualization after network booting [Omote et al., 2015]
  • On-demand de/re-virtualization for live migration [Im et al., 2017]
  • Booting BitVisor first
Base Idea
module
(Figure: layer diagrams with labels HW, Linux, BitVisor)
• Approach
  1. Make BitVisor entirely a kernel module
  2. Load bitvisor.elf from a kernel module (take this)
UEFI function call
several UEFI functions
• For debugging
• Disconnect firmware drivers
• Get ACPI RSDP Table
• Memory allocation
(Slide labels: "No need to support" / "Need to support". The UEFI function names and their grouping are illegible in the source.)
ACPI Table
• IOMMU (DMAR)
• PCIe extended configuration (MCFG)
• Power management (FACP, DSDT/SSDT)
  • Suspend, Reset, etc.
  • Cope with some power-related troubles, e.g., prevent the firmware from turning off the power of a device that BitVisor uses
• How to get RSDP (ACPI root table)
  • Search EBDA (Extended BIOS Data Area) (BIOS)
    • 0x40e: EBDA base address >> 4
  • EFI Configuration Table (UEFI)
    • No need to search memory
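The BIOS-side RSDP search can be sketched as follows. This is a minimal user-space illustration, not BitVisor's code: the function names are hypothetical, and the caller is assumed to have already copied the relevant physical memory into a buffer. It relies on two facts from the ACPI specification: the RSDP starts with the 8-byte signature "RSD PTR " on a 16-byte boundary, and the first 20 bytes sum to zero modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 16-bit word at physical address 0x40e holds (EBDA base >> 4),
 * i.e. a real-mode segment value. Shift it back to get the address. */
uint32_t ebda_base_from_bda(uint16_t word_at_40e)
{
    return (uint32_t)word_at_40e << 4;
}

/* ACPI 1.0 checksum: the first 20 bytes of the RSDP must sum to 0 mod 256. */
int rsdp_checksum_ok(const uint8_t *p)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < 20; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/* Scan a copy of a memory region (hypothetical helper) on 16-byte
 * boundaries. Returns the offset of the RSDP, or -1 if none is found. */
long find_rsdp(const uint8_t *mem, size_t len)
{
    assert(mem != NULL);
    for (size_t off = 0; off + 20 <= len; off += 16) {
        if (memcmp(mem + off, "RSD PTR ", 8) == 0 &&
            rsdp_checksum_ok(mem + off))
            return (long)off;
    }
    return -1;
}
```

On a BIOS system the scan would typically cover the first 1 KiB of the EBDA and then the 0xE0000-0xFFFFF region. On UEFI none of this is needed because the EFI Configuration Table points to the RSDP directly.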
ACPI Table (Cont’d)
the copy of the ACPI table that the OS has
  • e.g., /sys/firmware/acpi/tables (Linux)
• Find ACPI table in the memory
  • Is the configuration table available after ( ) ?
• Currently just ignore ACPI-related code :(
Memory Allocation
memory
• The 2nd-loader calls a firmware function to allocate the entire VMM memory (128 MiB)
(Figure: boot sequence with labels Firmware, loadvmm.elf, Continue to Boot, Power on, VMENTRY, 2nd-loader, return to loader, Initialize, Start VMM, Trampoline Code, AllocPages())
Memory Allocation (cont’d)
to create a kernel function call mechanism in the same way as the UEFI function call, but...
• Linux does not have a generic mechanism for allocating large physically-contiguous memory
  • At most max slab size, generally <= 2 MB
• BitVisor is not relocatable
  • Assumes physically-contiguous memory
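The size mismatch can be made concrete with a little arithmetic. The sketch below assumes 4 KiB base pages and uses hypothetical helper names. It computes the power-of-two page order a buddy-style allocator would need for a request, and compares the request with a given contiguous-allocation limit (the 2 MB figure from the slide is passed in as a parameter, not derived from any kernel header).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB base pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
unsigned int order_for_size(size_t size)
{
    assert(size > 0);
    unsigned int order = 0;
    size_t span = SKETCH_PAGE_SIZE;
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Nonzero if a request fits within the given contiguous-allocation limit. */
int fits_contiguous_limit(size_t size, size_t limit_bytes)
{
    return size <= limit_bytes;
}
```

For the 128 MiB VMM region this gives order 15 (32768 pages), while a 2 MB limit corresponds to order 9, so the region cannot come from the ordinary allocation path and has to be reserved some other way.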
Memory Allocation (cont’d)
relocatable
• Reserve physically-contiguous memory at boot time and use it
  • boot option (current solution)
• CMA (Contiguous Memory Allocator)
Virtualizing Application Processors (APs)
UEFI boot
• In UEFI boot, only the BSP is virtualized at first
• Virtualize APs when the guest OS tries to initialize them
  • Trap the access to the local APIC area (Startup-IPI)
  • To cope with some firmware problems [, 2013]
• When virtualizing from the kernel module, APs are already running!
• Approach
  • Just virtualize each core
  • In the kernel module, send IPI and virtualize in the handler
  • Synchronize?
Real mode booting
the real mode
• Need to allocate memory for the real mode
  • Max memory limit is < 1 MB!
• Approach
  • Reserve memory in the same way as mentioned previously
  • Change the trampoline code so that it starts in long mode
(Figure: labels loadvmm.elf, return to loader, Trampoline Code, VMLAUNCH)
Unsupported Instructions
execute instructions that BitVisor does not support
• e.g.,
  • VMX instructions
  • PCID (INVPCID)
    • (partially supported now)
  • (BitVisor conceals unsupported features at boot time)
• Approach
  • Don’t use in the guest OS
    • Boot option
  • Patch BitVisor
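Concealing a feature usually means clearing its CPUID bit before the guest sees it. The following is a minimal sketch of that idea, not BitVisor's actual handler: the struct and function names are hypothetical, and the choice of bits to hide is an assumption based on the examples in the list above. The bit positions themselves come from the Intel SDM: CPUID.01H:ECX bit 5 (VMX), bit 17 (PCID), and CPUID.(EAX=07H,ECX=0):EBX bit 10 (INVPCID).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID1_ECX_VMX     (1u << 5)   /* CPUID.01H:ECX[5]          */
#define CPUID1_ECX_PCID    (1u << 17)  /* CPUID.01H:ECX[17]         */
#define CPUID7_EBX_INVPCID (1u << 10)  /* CPUID.(07H,0):EBX[10]     */

/* Register values returned by a CPUID invocation (hypothetical type). */
struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

/* Clear the feature bits that the guest should not see.
 * Meant to run on the real CPUID result inside a CPUID exit handler. */
void conceal_unsupported(uint32_t leaf, uint32_t subleaf,
                         struct cpuid_regs *r)
{
    assert(r != NULL);
    if (leaf == 1)
        r->ecx &= ~(CPUID1_ECX_VMX | CPUID1_ECX_PCID);
    else if (leaf == 7 && subleaf == 0)
        r->ebx &= ~CPUID7_EBX_INVPCID;
}
```

This only helps if the guest checks CPUID before using a feature. When the hypervisor is loaded under an already-running kernel, the kernel has made those checks long ago, which is why the slide falls back to boot options or patching BitVisor.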
Miscellaneous
for the VMENTRY
• TR, GDTR, IDTR
• IA32_SYSENTER_CS, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP
• FS_BASE, GS_BASE
• Entry point
  • Place in the specific ELF section
• SMEP / XD Bit
  • Supervisor Mode Execution Prevention (CR4.[20])
  • Prevent execution from user memory in the kernel code (PML4.[63] = 1)
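The state listed above and the two protection bits can be sketched in C. Everything here is illustrative: the struct layout and all names are hypothetical and do not come from BitVisor. The bit positions are from the Intel SDM, where SMEP is CR4 bit 20 and XD (execute-disable) is bit 63 of a paging-structure entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR4_SMEP_BIT 20 /* CR4.SMEP */
#define PTE_XD_BIT   63 /* execute-disable bit in a paging entry */

/* Guest state captured before entering the VMM (hypothetical layout). */
struct guest_state_sketch {
    uint16_t tr_selector;
    uint64_t gdtr_base, idtr_base;
    uint16_t gdtr_limit, idtr_limit;
    uint64_t sysenter_cs, sysenter_eip, sysenter_esp;
    uint64_t fs_base, gs_base;
};

/* Nonzero if SMEP is enabled in the given CR4 value. */
int smep_enabled(uint64_t cr4)
{
    return (int)((cr4 >> CR4_SMEP_BIT) & 1u);
}

/* Nonzero if the paging entry forbids instruction fetches. */
int entry_is_no_execute(uint64_t entry)
{
    return (int)((entry >> PTE_XD_BIT) & 1u);
}

/* Clear XD so code in the mapped range can be executed. */
void make_entry_executable(uint64_t *entry)
{
    assert(entry != NULL);
    *entry &= ~(1ull << PTE_XD_BIT);
}
```

Both bits presumably matter here because the kernel module jumps into freshly mapped BitVisor code: with SMEP on, the mapping must be a supervisor page, and XD must be clear at every paging level that covers it.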
(Current) Virtualization Flow
at boot time
• Load BitVisor to the reserved region
• Send IPI to each core to run virtualization code
• Create page table to run BitVisor code
  • mmap + modify page table
• Jump to the entry code
• Save the guest state
• Initialize
• Return to the guest (VMENTRY)
(Figure: Host running a Development VM and an Experiment VM; Serial Port to File)
Experiment VM
- Load / Unload kernel module
- Share disk with the dev VM
- Same Linux kernel as the dev VM
Development VM
- Write code
There is an AVX512-related problem(?)
Future Work
vmalloc memory region
  • BitVisor assumes physically contiguous memory
• Device support
• Other architecture / OS support
  • AMD, Windows, Mac
• Any advice/comments welcome!
References
Pill, SyScan’06.
• S.T. King et al., SubVirt: implementing malware with virtual machines, S&P’06.
• C. Gebhardt et al., LaLa: A Late Launch Application, STC’09.
• R. Srinivasan et al., MIvmm: A micro VMM for development of a trusted code base, 2011.
• Y. Omote et al., Improving agility and elasticity in bare-metal clouds, ASPLOS’15.
• J. Im et al., On-demand Virtualization for Live Migration in Bare Metal Cloud, SoCC’17.
• , BitVisorUEFI, BitVisor Summit 2, 2013.
(Table: Linux x86_64 virtual memory map; the entries are illegible in the source.)
https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt