Introduction to bhyve

Takuya ASADA

March 12, 2014

  1. Introduction to bhyve
    Takuya ASADA / @syuu1228


  2. What is bhyve?


  3. What is bhyve?
    • bhyve is a hypervisor included in FreeBSD

    • Similar to Linux KVM, it runs on a host OS

    • BSD licensed

    • Developed by Peter Grehan and Neel Natu


  4. bhyve features
    • Requires Intel VT-x and EPT (Nehalem or later);

    AMD support is in progress

    • Does not support BIOS/UEFI for now;

    UEFI support is in progress

    • Minimal device emulation support:

    virtio-blk, virtio-net, COM port + α

    • Supported guest OSes:

    FreeBSD/amd64, i386, Linux/x86_64, OpenBSD/amd64


  5. How to use it?
    kldload vmm.ko

    /usr/sbin/bhyveload -m ${mem} -d ${disk} ${name}

    /usr/sbin/bhyve -c ${cpus} -m ${mem} \
      -s 0,hostbridge -s 2,virtio-blk,${disk} \
      -s 3,virtio-net,${tap} -s 31,lpc -l com1,stdio ${name}


  6. How to run Linux?
    • bhyve's OS loader (/usr/sbin/bhyveload) only supports
    FreeBSD;

    you need another OS loader to boot other OSes

    • grub2-bhyve is the solution (see the example below)

    • It's a modified version of grub2 that runs on the host OS (FreeBSD)

    • It can load Linux and OpenBSD

    • Available in ports & pkg!
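
    A typical invocation might look like this sketch (exact flags can
    vary by version; the device.map file, which maps (hd0) to your disk
    image, and the ${mem}/${name} variables are assumptions written in
    the style of slide 5; after the loader finishes, run bhyve as usual):

    grub-bhyve -m device.map -r hd0,msdos1 -M ${mem} ${name}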


  7. Virtualization in general


  8. Difference between container
    and hypervisor
    • Jail is a container
    • It virtualizes the OS environment at the kernel level

    • bhyve is a hypervisor
    • It virtualizes a whole machine

    • Totally different approaches


  9. Container
    • A process in a jail is just a normal
    process for the kernel

    • The kernel does some tricks to
    isolate the environments of different jails

    • Lightweight, less overhead

    • One kernel is shared by all jails

    → If the kernel panics, all jails

    will die

    • You cannot install another OS

    (No Windows, No Linux!)

    (Diagram: processes in jail1 and jail2 all run on a single kernel,
    which owns the disk and NIC)


  10. Hypervisor
    • A hypervisor virtualizes a machine

    • From the guest OS, it looks like real
    hardware

    • A virtual machine is a normal
    process for the host OS

    • The kernel is not shared; each VM is
    completely isolated

    • You can run a full OS inside
    the VM → Windows! Linux!

    (Diagram: vm1 and vm2 each run their own kernel, disk, and NIC on
    top of the hypervisor, alongside ordinary host processes)


  11. How does a hypervisor
    virtualize a machine?
    • To make a complete virtual machine, you need to
    virtualize the following things:

    • CPU

    • Memory (Address Space)

    • I/O


  12. CPU Virtualization:

    Emulate entire CPU?
    • Like QEMU

    • You can emulate the entire CPU operation in a normal process
    (sketched below)

    • Very slow, not a really useful choice for virtualization

    (Diagram: a CPU emulator in QEMU interprets guest instructions such
    as "mov dx,3FBh / mov al,128 / out dx,al", routing I/O to virtual
    devices backed by the host OS, the physical devices, and the
    physical CPU)
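
    To make the idea concrete, here is a toy fetch-decode-execute loop
    in C. Everything in it (the opcode values, the register file, the
    virtual_device_out() hook) is invented for this sketch; it is not
    QEMU's code:

    #include <stdint.h>

    struct vcpu { uint16_t ip; uint8_t regs[8]; };

    /* hypothetical opcode encodings, invented for this sketch */
    enum { OP_MOV_IMM = 0x01, OP_OUT = 0x02, OP_HALT = 0xff };

    /* stand-in for a virtual device model (assumed, not a real API) */
    void virtual_device_out(uint8_t port, uint8_t val);

    void emulate(struct vcpu *cpu, const uint8_t *mem)
    {
        for (;;) {
            uint8_t op = mem[cpu->ip++];        /* fetch */
            switch (op) {                       /* decode, then execute */
            case OP_MOV_IMM:                    /* mov reg, imm8 */
                cpu->regs[mem[cpu->ip] & 7] = mem[cpu->ip + 1];
                cpu->ip += 2;
                break;
            case OP_OUT:                        /* I/O goes to a virtual
                                                   device, never real HW */
                virtual_device_out(mem[cpu->ip++], cpu->regs[0]);
                break;
            default:                            /* OP_HALT and the rest */
                return;
            }
        }
    }

    Every guest instruction costs many host instructions here, which is
    why pure emulation is slow.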


  13. CPU Virtualization:

    Direct execution?
    • You want to run guest instructions directly on the real
    CPU, since you are virtualizing x86 on x86

    • You need to avoid executing instructions that
    modify global system state or perform I/O (called
    sensitive instructions)

    • If you execute these instructions on the real CPU, they
    may break host OS state, for example by directly accessing
    a HW device


  14. Perform I/O on VM
    • You need to prevent the VM from accessing real HW

    • Need to prevent execution of the instruction (see the outb
    sketch below)

    (Diagram: an outb executed by the guest on the virtual CPU would
    otherwise reach the real display)
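
    For reference, this is the kind of sensitive instruction at stake:
    a one-byte write to an x86 I/O port. The C wrapper below is the
    conventional inline-assembly idiom, shown only to illustrate what
    must be intercepted; the port number is an example:

    #include <stdint.h>

    /* write one byte to an x86 I/O port; executed natively this
       touches real hardware, so a hypervisor must trap it */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    /* e.g. outb(0x3f8, 'A') writes 'A' to COM1's data register */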


  15. Perform I/O on VM
    • You can trap such instructions by running the guest in a lower
    privileged mode

    • However, on x86 there are some instructions which are impossible
    to trap,

    because they are nonprivileged instructions

    (Diagram: the guest's outb on the virtual CPU is trapped and routed
    to a virtual display instead)


  16. Software techniques to
    virtualize x86
    • Binary translation (old VMware): interpret & modify the
    guest OS's instructions on the fly

    → Runs fast, but the implementation is very complex

    • Paravirtualization (old Xen): modify the guest OS for

    the hypervisor

    → Runs fast, but cannot run unmodified OSes

    • We want an easier & better solution

    → HW-assisted virtualization!


  17. Hardware assisted
    virtualization (Intel VT-x)
    • New CPU modes:

    VMX root mode (hypervisor) / VMX non-root mode (guest)

    • If some event needs to be emulated by the hypervisor,

    the CPU stops the guest and exits to the hypervisor → VMExit

    • You don't need complex software techniques

    You don't have to modify the guest OS

    (Diagram: User (Ring 3) and Kernel (Ring 0) exist in both VMX root
    mode and VMX non-root mode; VMEntry switches into the guest,
    VMExit returns to the hypervisor)


  18. Memory Virtualization
    • If you run a guest OS natively, memory address translation
    becomes problematic

    • If Guest B loads page table A, virtual page 1 translates to host
    physical page 1,

    but you meant host physical page 5

    (Diagram: Guest A and Guest B each run Process A and Process B with
    page tables A and B mapping virtual pages to guest physical memory;
    each guest's physical pages must in turn be placed somewhere in
    host physical memory)


  19. Shadow Paging
    • Trap page table loads/modifications, create a “Shadow Page
    Table”, and give the MMU the real physical page numbers

    • A software trick that works well, but is slow

    (Diagram: guest page tables A and B map processes to guest physical
    memory; the hypervisor maintains shadow page tables A and B that
    map the same virtual pages directly to host physical memory)


  20. Nested Paging (Intel EPT)
    • HW-assisted memory virtualization!

    • You have a Guest physical : Host physical translation table

    • The MMU translates addresses in two steps (nested), as modeled
    below

    (Diagram: guest page tables translate process-virtual pages to
    guest physical pages; EPT A translates guest physical pages to
    host physical pages)
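
    The two-step translation can be modeled in a few lines of C. This
    is a toy model with made-up page numbers, using flat arrays in
    place of real multi-level page tables and EPT structures:

    #include <stdint.h>

    #define NPAGES 4

    /* guest page table: guest-virtual page -> guest-physical page */
    static const uint64_t guest_pt[NPAGES] = { 2, 3, 1, 0 };

    /* EPT-like table: guest-physical page -> host-physical page */
    static const uint64_t ept[NPAGES] = { 5, 6, 7, 8 };

    uint64_t translate(uint64_t gva_page)
    {
        uint64_t gpa_page = guest_pt[gva_page]; /* step 1: guest PT */
        return ept[gpa_page];                   /* step 2: nested/EPT */
    }

    With nested paging the MMU walks both tables in hardware, so
    ordinary guest memory accesses need no VMExit at all.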


  21. I/O Virtualization
    • To run unmodified OSes, you need to emulate all the
    devices you have on real hardware

    • SATA, NIC (e1000), USB (EHCI), VGA (Cirrus),
    interrupt controllers (LAPIC, I/O APIC),
    clock (HPET), COM port…

    • Emulating real devices is not very fast because it
    causes lots of VMExits; not ideal for virtualization


  22. Paravirtual I/O
    • A virtual I/O device designed for VM use

    • Much faster than emulating real devices

    • Requires a device driver in the guest OS

    • De-facto standard: virtio-blk, virtio-net (see the sketch below)
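
    As a taste of what “designed for VM use” means: instead of poking
    emulated device registers, a virtio-blk guest driver queues small
    request descriptors like the one below. This mirrors the request
    header in the virtio specification, though field names may differ
    slightly between spec versions:

    #include <stdint.h>

    #define VIRTIO_BLK_T_IN  0  /* read */
    #define VIRTIO_BLK_T_OUT 1  /* write */

    /* one block request, handed to the host through a shared ring */
    struct virtio_blk_outhdr {
        uint32_t type;   /* VIRTIO_BLK_T_IN or VIRTIO_BLK_T_OUT */
        uint32_t ioprio; /* request priority */
        uint64_t sector; /* 512-byte sector offset */
    };

    One descriptor can cover a large transfer, so far fewer VMExits are
    needed than with register-level device emulation.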


  23. PCI Device passthrough
    • If you attach a real HW device to a VM, you will have a problem
    with DMA

    • The device requires physical addresses for DMA, but the guest OS
    doesn't know the host physical addresses

    • Address translator for devices: IOMMU (Intel VT-d)
    • It translates guest physical to host physical using a translation
    table

    (Diagram: a PCI device's DMA passes through the IOMMU translation
    table, which remaps guest physical pages to host physical pages,
    just as EPT does for CPU accesses)


  24. bhyve internals


  25. How does bhyve virtualize the machine?
    • CPU: HW-assisted virtualization (Intel VT-x)

    • Memory: HW-assisted memory virtualization (Intel
    EPT)

    • I/O: virtio, PCI passthrough, + α

    • It relies on the HW-assisted features throughout


  26. bhyve overview
    • bhyveload: loads the guest OS

    • bhyve: userland part of the
    hypervisor, emulates
    devices

    • bhyvectl: a management
    tool

    • libvmmapi: userland API

    • vmm.ko: kernel part of the
    hypervisor

    (Diagram: bhyveload creates a VM instance and loads the guest
    kernel; bhyve runs the VM instance, connecting the console to
    stdin/stdout, a disk image, and a tap device; bhyvectl destroys
    the VM instance; all of them use libvmmapi to reach vmm.ko in the
    FreeBSD kernel through /dev/vmm/${vm_name} via mmap/ioctl)


  27. vmm.ko
    • VT-x features are only accessible in kernel
    mode; vmm.ko handles them

    • The most important work of vmm.ko is CPU
    mode switching between hypervisor and guest

    • Provides an interface for userland via
    /dev/vmm/${vmname}

    • Each vmm device file holds the state of one VM
    instance


  28. /dev/vmm/${vmname}
    interfaces
    • create/destroy

    Device files are created/destroyed via the sysctls

    hw.vmm.create, hw.vmm.destroy (see the sketch after this list)

    • read/write/mmap

    Guest memory can be accessed with standard
    syscalls (which means you can even dump
    guest memory with the dd command)

    • ioctl

    Provides various operations on the VM
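
    A minimal sketch of the create/destroy path from C, roughly what
    libvmmapi's vm_create() does under the hood (error handling
    omitted; the helper names are mine):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <string.h>

    /* writing a name to hw.vmm.create makes /dev/vmm/<name> appear */
    int create_vm(const char *name)
    {
        return sysctlbyname("hw.vmm.create", NULL, NULL,
            name, strlen(name));
    }

    int destroy_vm(const char *name)
    {
        return sysctlbyname("hw.vmm.destroy", NULL, NULL,
            name, strlen(name));
    }

    After create_vm("vm0"), the guest memory of vm0 can be inspected
    with ordinary tools, e.g. dd reading /dev/vmm/vm0, as the slide
    notes.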


  29. /dev/vmm/${vmname}
    ioctls
    • VM_MAP_MEMORY: maps a guest memory
    area of the requested size

    • VM_SET/GET_REGISTER: accesses registers

    • VM_RUN: runs the guest machine until a virtual
    device is accessed (or some other trap
    happens)


  30. libvmmapi
    • A wrapper library for /dev/vmm operations (see the sketch below)

    • vm_create(name) → sysctl(“hw.vmm.create”, name)

    • vm_set_register(reg, val) →
    ioctl(VM_SET_REGISTER, reg, val)
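
    Putting the pieces together, a hedged sketch of a libvmmapi caller
    setting up a VM; these function names appear in vmmapi.h, but the
    exact signatures vary between FreeBSD versions, so treat this as
    pseudocode with types rather than a copy-paste example:

    #include <sys/param.h>
    #include <machine/vmm.h>
    #include <vmmapi.h>

    void setup(void)
    {
        struct vmctx *ctx;

        vm_create("vm0");            /* sysctl hw.vmm.create=vm0 */
        ctx = vm_open("vm0");        /* opens /dev/vmm/vm0 */
        vm_setup_memory(ctx, 256 * 1024 * 1024, VM_MMAP_ALL);
        vm_set_register(ctx, 0, VM_REG_GUEST_RIP, 0x100000);
        /* ...then loop on vm_run(), as in the main loop on slide 33 */
    }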


  31. bhyveload
    • bhyve uses an OS loader instead of BIOS/UEFI to load the guest OS

    • The FreeBSD bootloader ported to userland: userboot

    • bhyveload runs on the host OS to initialize the guest OS

    • Once invoked, it does the following things:

    • Parses the UFS on the disk image, finds the kernel

    • Loads the kernel into the guest memory area

    • Initializes the page table

    • Creates the GDT, IDT, LDT

    • Initializes special registers to get ready for 64-bit mode

    • The guest machine can then start from the kernel entry point, in
    64-bit mode


  32. bhyve
    • The bhyve command is the userland part of the hypervisor

    • It invokes ioctl(VM_RUN) to run the guest OS

    • Emulates virtual devices

    • Provides the user interface (no GUI for now)


  33. main loop in bhyve
    /* simplified pseudocode */
    while (1) {
        ioctl(VM_RUN, &vmexit);
        switch (vmexit.exit_code) {
        case IOPORT_ACCESS:
            emulate_device(vmexit.ioport);
            break;
        /* ... other VMExit reasons ... */
        }
    }


  34. Q&A?
