(outdated) Glimpse at RISC-V H extension draft

Disclaimer:
Due to LinkedIn's break-up with SlideShare, I'm moving my slides to SpeakerDeck ...

This was a glimpse at the RISC-V Hypervisor Extension back in June 2019.
I discussed the design of this extension and QEMU's emulation of it.

Ruinland

June 27, 2019

Transcript

  1. Disclaimer
    • The H extension is not frozen and is only at its version 0.3 draft for the time being.
      – Nothing is guaranteed. Use at your own risk.
    • The materials used in these slides can be found at:
      – https://content.riscv.org/wp-content/uploads/2017/12/Tue0942-riscv-hypervisor-waterman.pdf
      – https://lkml.org/lkml/2019/5/30/714
  2. Guideline
    • Virtualization/hypervisor 101
    • Quick walkthrough for RISC-V Hypervisor extension
    • RV-H emulation on QEMU
    • Hypervisor adoption
  3. What is “virtualization”?
    Formally, virtualization is the construction of an isomorphism between a guest system and a host.
    “Formal Requirements for Virtualizable Third Generation Architectures”, Popek and Goldberg, 1974
    [Figure: guest states S_i, S_j with e_i(S_i); host states S'_i, S'_j with e_i'(S'_i); the two sides are related by V(S_i) and V(S_j).]
    A virtual machine map (VM map) V : guest → host is a one-one homomorphism with respect to all the operators e_i in the instruction sequence set L. That is, for any state S_i in the guest and any instruction sequence e_i, there exists an instruction sequence e_i' such that $V(e_i(S_i)) = e_i'(V(S_i))$.
  4. “Hypervisor”
    • A software layer that provides a “logical partitioning” for a computer system.
    The complex microcoded instructions that support logical partitioning on the IBM System/390 are replaced by programs that use the basic ISA of the host platform and run in a special mode that is more privileged than all other software on the system. Thus the definition of a new mode of operation is what distinguishes this class of partitioning. This new mode is used by the hardware vendors to provide partitioning capability. If the mode is not exposed in the ISA, then the software that runs in this mode can be viewed essentially as an extension of the hardware itself, very much like the VMM software in a codesigned virtual machine. The common name given to this piece of software is the hypervisor.
    “Virtual Machines: Versatile Platforms for Systems and Processes”, by James E. Smith and Ravi Nair
  5. Virtualizing what?
    • Memory, e.g. page tables for address translation
    • I/O & interrupts
    • CSRs (control/status regs)
  6. RV-H Extension Spec
    • misa[7] indicates whether the H extension is available.
      – It should not be hardwired if RV-H is supported by this particular piece of hardware. Let the VMM/OS decide.
    • S mode becomes Hypervisor-extended S mode (HS mode).
    • Virtualized S mode (VS) and Virtualized U mode (VU) are introduced.
      – Simply put, these are ordinary S/U modes caged in a virtualization environment.
    • Hypervisor CSRs
      – Ones for the VMM/hypervisor to function, e.g. hstatus, hgatp (Hypervisor Guest Address Translation and Protection register), hideleg, hedeleg ...
      – Ones for (VS/VU)-HS mode switching, e.g. bsstatus, bsatp ... (b means background)
    Trivia: satp is roughly RISC-V's equivalent of x86's CR3 (a.k.a. PDBR).
  7. RV-H Extension Spec (cont.)
    • When “V” transitions (0-to-1 or 1-to-0), the HW implementation swaps the foreground and background supervisor CSRs.
    • When V=1, 2-level address translation is enabled.
    [Figure: privilege modes. Without RV-H: M mode, S mode, U mode. With RV-H: M mode, HS mode, U mode (V = 0) and VS mode, VU mode (V = 1).]
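The foreground/background swap can be pictured with a small model. This is only a sketch under stated assumptions: the `Hart` class and `set_virt()` are hypothetical names, and only the two CSR pairs named on the slides (sstatus/bsstatus and satp/bsatp) are modelled, whereas the draft defines more background CSRs.

```python
from dataclasses import dataclass, field

# Foreground/background pairs named on the slides. The draft spec defines
# more background CSRs; this list is deliberately incomplete.
SWAPPED_PAIRS = [("sstatus", "bsstatus"), ("satp", "bsatp")]


@dataclass
class Hart:
    """Hypothetical hart model: just the V bit and a CSR dictionary."""
    virt: bool = False
    csrs: dict = field(default_factory=dict)

    def set_virt(self, enable: bool) -> None:
        """Change V and swap foreground/background CSRs on a real transition."""
        if enable == self.virt:
            return  # V does not change, so nothing is swapped
        for fg, bg in SWAPPED_PAIRS:
            self.csrs[fg], self.csrs[bg] = self.csrs[bg], self.csrs[fg]
        self.virt = enable


# Made-up CSR values: before entering the guest, the guest's values sit in
# the background copies.
hart = Hart(csrs={"sstatus": 0x2, "bsstatus": 0x100,
                  "satp": 0x1000, "bsatp": 0x2000})
hart.set_virt(True)  # 0-to-1 transition: satp now holds the guest's value
```

Calling `hart.set_virt(False)` afterwards swaps the pairs back, so the HS-mode values return to the foreground.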
  8. RV-H Extension Spec (cont.)
    • Support for 2-level translation.
      – Without Hypervisor extension support: guest virtual addr → (1st level pgt) → guest physical addr; guest virtual addr → (shadow page table) → host physical addr
      – With Hypervisor extension support: guest virtual addr → (1st level pgt) → guest physical addr → (2nd level pgt) → host physical addr
    Remember bsatp and satp? If V=1, satp holds the guest OS’s pgt base addr while bsatp holds the host OS’s pgt base addr. Background (bsatp) and foreground (satp) swap during V mode transitions.
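To make the contrast concrete, here is a toy sketch in which each page table is flattened into a Python dict from page number to page number. All names (`translate`, `two_stage`, `build_shadow`) and the example mappings are illustrative assumptions, not anything taken from the spec or from QEMU.

```python
PAGESIZE = 4096


def translate(table: dict, addr: int) -> int:
    """Translate addr through one flattened page table (page no. -> page no.)."""
    page, offset = divmod(addr, PAGESIZE)
    return table[page] * PAGESIZE + offset


def two_stage(gva: int, stage1: dict, stage2: dict) -> int:
    """With the H extension: GVA -> GPA (guest table), then GPA -> HPA (host table)."""
    gpa = translate(stage1, gva)
    return translate(stage2, gpa)


def build_shadow(stage1: dict, stage2: dict) -> dict:
    """Without it: the VMM precomputes a GVA -> HPA table in software."""
    return {vpn: stage2[gppn] for vpn, gppn in stage1.items() if gppn in stage2}


guest_pgt = {0x10: 0x80}    # guest virtual page 0x10 -> guest physical page 0x80
host_pgt = {0x80: 0x200}    # guest physical page 0x80 -> host physical page 0x200
shadow = build_shadow(guest_pgt, host_pgt)
hpa = two_stage(0x10123, guest_pgt, host_pgt)
```

Both paths give the same host physical address. The difference is who does the composition: hardware on every walk, or the VMM whenever the guest edits its page table.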
  9. Trap handling w/ RV-H
    • Trap delegation: in RISC-V’s world, by default, all traps are handled in M-mode. Yet when an OS/VMM is present, we often delegate the handling to S-mode with the help of the mideleg and medeleg CSRs. (For instance, BBL/OpenSBI delegates most traps to Linux.)
    So, when a trap occurs in VU/VS mode, we might need to delegate it further to VS-mode code (where the guest OS kernel / guest VMM resides). Thus, the hedeleg and hideleg CSRs are provided by the H extension.
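The delegation chain can be sketched as a small decision function. This is a simplified model that assumes synchronous exceptions only (interrupts would consult mideleg/hideleg in the same way) and a trap raised below M-mode; `trap_target` is a hypothetical helper, while the cause numbers are the standard RISC-V exception codes.

```python
ILLEGAL_INSTRUCTION = 2   # standard RISC-V exception codes
ECALL_FROM_U = 8
LOAD_PAGE_FAULT = 13


def trap_target(cause: int, virt: bool, medeleg: int, hedeleg: int) -> str:
    """Pick the mode that handles a synchronous exception raised below M-mode."""
    if not (medeleg >> cause) & 1:
        return "M"    # not delegated at all: M-mode handles it
    if virt and (hedeleg >> cause) & 1:
        return "VS"   # raised with V=1 and delegated a second time
    return "HS"       # delegated once: the hypervisor handles it


# Example: firmware delegates all three causes, the hypervisor forwards only
# guest system calls and keeps illegal instructions for itself.
medeleg = (1 << ILLEGAL_INSTRUCTION) | (1 << ECALL_FROM_U) | (1 << LOAD_PAGE_FAULT)
hedeleg = 1 << ECALL_FROM_U
```

With these values, a guest `ecall` from VU-mode lands in VS-mode, while a guest illegal instruction lands in HS-mode.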
  10. Trap handling w/ RV-H (cont.)
    In hstatus, there are the SPV, STL, SP2P and SP2V fields.
      – SPV (Supervisor Previous Virtualization Mode) records the value of V before a trap into HS-mode. When an SRET instruction is executed with V=0, V is set to SPV.
      – When a trap is taken into HS-mode, bits SP2V and SP2P are set to the values that SPV and the HS-level SPP had before the trap. (Before the trap, the HS-level SPP is sstatus.SPP if V=0, or bsstatus.SPP if V=1.) When an SRET instruction is executed with V=0, the reverse assignments occur: after SPV and sstatus.SPP have supplied the new virtualization and privilege modes, they are written with SP2V and SP2P, respectively.
      – The STL bit (Supervisor Translation Level), which indicates which address-translation level caused an access-fault or page-fault exception, is also written by the implementation whenever a trap is taken into HS-mode.
    The SRET path could be used as the way to launch or resume a guest.
    SPP stands for Supervisor Previous Privilege, which might be S or U.
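The SPV/SPP and SP2V/SP2P bookkeeping behaves like a two-entry stack. The sketch below models only those fields. `HartState` and its method names are hypothetical, `spp` stands for the HS-level SPP (abstracting away whether it currently lives in sstatus or bsstatus), and STL is left out.

```python
from dataclasses import dataclass

U, S = 0, 1  # privilege encodings used by SPP


@dataclass
class HartState:
    """Hypothetical model of V, privilege, and the hstatus stacking fields."""
    v: int = 0       # current virtualization mode
    priv: int = S    # current privilege (S or U)
    spv: int = 0     # hstatus.SPV
    spp: int = U     # HS-level SPP
    sp2v: int = 0    # hstatus.SP2V
    sp2p: int = U    # hstatus.SP2P

    def trap_to_hs(self) -> None:
        """Trap into HS-mode: push the old SPV/SPP, then record V and priv."""
        self.sp2v, self.sp2p = self.spv, self.spp
        self.spv, self.spp = self.v, self.priv
        self.v, self.priv = 0, S

    def sret(self) -> None:
        """SRET executed with V=0: the reverse assignments."""
        if self.v != 0:
            raise ValueError("this sketch only models SRET executed with V=0")
        self.v, self.priv = self.spv, self.spp
        self.spv, self.spp = self.sp2v, self.sp2p


# Launching a guest: the hypervisor sets SPV=1 and SPP=S, then executes SRET.
hart = HartState(spv=1, spp=S)
hart.sret()   # the hart is now in VS-mode (V=1, privilege S)
```

A later `hart.trap_to_hs()` brings the hart back to HS-mode with SPV=1 and SPP=S, so another `sret()` resumes the guest where it left off.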
  11. Ordinary Addressing Modes
    • RV32
      – Bare (no translation)
      – Sv32: takes a 32-bit VA, addresses a 34-bit PA
    • RV64
      – Bare (no translation)
      – Sv39: takes a 39-bit VA, addresses a 56-bit PA
      – Sv48: takes a 48-bit VA, addresses a 56-bit PA
  12. Ordinary address translation
    • Let’s take Sv39 for example: given a virtual address,
    [Figures: Sv39 PTE layout and VA layout]
      – ‘V’ stands for “valid.”
      – ‘U’ means user-accessible.
      – ‘D’ indicates “written” (dirty).
      – ‘G’ hints at a global mapping.
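Since the PTE figure did not survive in this transcript, a short sketch of decoding an Sv39 PTE may help. The bit positions follow the RISC-V privileged spec (flags in bits 7:0, PPN in bits 53:10). The `decode_pte` helper and the example value are made up for illustration.

```python
# Flag positions in the low byte of a PTE, from bit 0 upwards.
FLAG_NAMES = ["V", "R", "W", "X", "U", "G", "A", "D"]
PPN_MASK = (1 << 44) - 1  # Sv39 PTEs carry a 44-bit PPN in bits 53:10


def decode_pte(pte: int) -> dict:
    """Split a 64-bit Sv39 PTE into its PPN and a set of flag names."""
    flags = {name for bit, name in enumerate(FLAG_NAMES) if (pte >> bit) & 1}
    return {"ppn": (pte >> 10) & PPN_MASK, "flags": flags}


# A kernel read/write/execute leaf that was already accessed and written.
example = decode_pte((0x80000 << 10) | 0xCF)
```

Here `example["flags"]` contains V, R, W, X, A and D but neither U nor G, so the page is not user-accessible and the mapping is not global.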
  13. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
  14. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
      – Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
  15. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
      – Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
      – If pte.v = 0, or if pte.r = 0 and pte.w = 1, the PTE is invalid or uses an encoding reserved for future use; stop and raise a page-fault exception.
  16. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
      – Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
      – If neither pte.r = 1 nor pte.x = 1, this PTE is a pointer to the next level of the page table.
  17. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
      – Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
      – If neither pte.r = 1 nor pte.x = 1, this PTE is a pointer to the next level of the page table. Let i = i − 1. If i < 0, stop and raise a page-fault exception corresponding to the original access type. Otherwise, let base = pte.ppn × 2^12 (PAGESIZE) and go to step 2.
  18. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
      – Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
      – If pte.r = 1 or pte.x = 1, a leaf PTE has been found. Determine whether the requested memory access is allowed by the pte.r, pte.w, pte.x, and pte.u bits, given the current privilege mode. If not, stop and raise a page-fault exception corresponding to the original access type.
  19. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
      – Step 1: let base be satp.ppn × 2^12 (PAGESIZE), and let i = 3 (LEVELS) − 1, which is 2.
      – Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
      – If pte.r = 1 or pte.x = 1, a leaf PTE has been found. If i > 0 and pte.ppn[i − 1 : 0] != 0, this is a misaligned superpage; stop and raise a page-fault exception corresponding to the original access type.
    [Figure: pte.ppn[1:0] when i = 2]
  20. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
    Finally, the translation succeeds. The translated physical address is given as follows:
      • pa.pgoff = va.pgoff.
      • If i > 0, then this is a superpage translation and pa.ppn[i − 1 : 0] = va.vpn[i − 1 : 0].
      • pa.ppn[3 (LEVELS) − 1 : i] = pte.ppn[3 (LEVELS) − 1 : i].
    [Figure: resulting physical address, assuming i = 2]
  21. Ordinary address translation (cont.)
    • Let’s take Sv39 for example: given a virtual address,
    Finally, the translation succeeds. The translated physical address is given as follows:
      • pa.pgoff = va.pgoff.
      • If i > 0, then this is a superpage translation and pa.ppn[i − 1 : 0] = va.vpn[i − 1 : 0].
      • pa.ppn[3 (LEVELS) − 1 : i] = pte.ppn[3 (LEVELS) − 1 : i].
    [Figure: resulting physical address, assuming i = 0]
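The walk described over the last few slides can be collected into one function. This is a minimal sketch, not QEMU's get_physical_address(). Physical memory is a plain dict from address to 64-bit PTE, the permission check assumes SUM=0 and MXR=0, and accessed/dirty handling is omitted. `walk_sv39`, `PageFault` and the example table are all made up for illustration.

```python
PAGESIZE = 1 << 12
LEVELS = 3
PTESIZE = 8
PTE_V, PTE_R, PTE_W, PTE_X, PTE_U = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4
PPN_MASK = (1 << 44) - 1


class PageFault(Exception):
    """Stands in for the page-fault exception of the original access type."""


def walk_sv39(mem, satp_ppn, va, access="r", user=False):
    """Translate va with an Sv39 walk over mem (dict: address -> PTE)."""
    base = satp_ppn * PAGESIZE                      # step 1
    i = LEVELS - 1
    while True:
        vpn_i = (va >> (12 + 9 * i)) & 0x1FF
        pte = mem.get(base + vpn_i * PTESIZE, 0)    # step 2
        if not pte & PTE_V or (not pte & PTE_R and pte & PTE_W):
            raise PageFault("invalid or reserved PTE")
        if pte & (PTE_R | PTE_X):
            break                                   # leaf PTE found
        i -= 1                                      # pointer to the next level
        if i < 0:
            raise PageFault("no leaf PTE found")
        base = ((pte >> 10) & PPN_MASK) * PAGESIZE
    # Simplified permission check (assumes SUM=0, MXR=0; ignores A/D bits).
    needed = {"r": PTE_R, "w": PTE_W, "x": PTE_X}[access]
    if not pte & needed or bool(pte & PTE_U) != user:
        raise PageFault("access not permitted")
    ppn = (pte >> 10) & PPN_MASK
    low_mask = (1 << (9 * i)) - 1                   # covers ppn[i-1:0]
    if ppn & low_mask:
        raise PageFault("misaligned superpage")
    vpn_low = (va >> 12) & low_mask                 # va.vpn[i-1:0]
    return ((ppn | vpn_low) << 12) | (va & 0xFFF)


# Example table rooted at physical page 1: one 4 KiB page reached through
# three levels, and one 1 GiB superpage mapped directly in the root.
mem = {
    0x1008: (2 << 10) | PTE_V,                          # root[1] -> table at page 2
    0x2010: (3 << 10) | PTE_V,                          # L1[2]   -> table at page 3
    0x3018: (0x80 << 10) | PTE_V | PTE_R | PTE_W,       # L0[3]   -> leaf, page 0x80
    0x1010: (0x40000 << 10) | PTE_V | PTE_R | PTE_X,    # root[2] -> 1 GiB leaf
}
pa_small = walk_sv39(mem, satp_ppn=1, va=0x40403234)    # stops at i = 0
pa_super = walk_sv39(mem, satp_ppn=1, va=0x80123456)    # stops at i = 2
```

For the superpage, the low 18 bits of the physical page number come straight from the virtual address, which is the i = 2 case on the slide.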
  22. Address translation w/ H ext.
    • Since we’re now translating a Guest Physical Address (GPA), the ordinary VPN partition scheme needs an alteration: the top VPN field is widened by 2 bits.
  23. Address translation w/ H ext. (cont.)
    • Note that Sv32x4 can take a 34-bit-wide GPA directly, while Sv39x4 and Sv48x4 only take 41 and 50 bits of GPA respectively.
      – According to the spec, the remaining guest address bits (63~41 in Sv39x4 and 63~50 in Sv48x4) should be zeroes, or a page-fault exception will occur, attributed to GPA translation.
    • The translation algorithm is identical to the ordinary one, except in step 1: let base be hgatp.ppn × 2^12 (PAGESIZE).
    Note that (only) the root page table needs to be 16 KiB aligned, which forces the low 2 bits of hgatp.ppn to be 0s. So even though GPA.vpn[2] × 8 (PTESIZE) might overlap with bits of the pgt base address, it still won’t cause trouble.
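The alignment argument can be checked with a little arithmetic. The sketch below assumes Sv39x4, where the widened top field vpn[2] is 11 bits (GPA bits 40:30). The helper name is hypothetical, and a `ValueError` stands in for the guest page fault.

```python
PAGESIZE = 1 << 12
PTESIZE = 8
ROOT_TABLE_BYTES = (1 << 11) * PTESIZE   # 2048 entries of 8 bytes = 16 KiB


def sv39x4_root_pte_addr(hgatp_ppn: int, gpa: int) -> int:
    """Address of the root-level PTE consulted for gpa under Sv39x4."""
    if gpa >> 41:
        raise ValueError("GPA bits 63:41 must be zero (guest page fault)")
    if hgatp_ppn & 0x3:
        raise ValueError("root page table must be 16 KiB aligned")
    vpn2 = (gpa >> 30) & 0x7FF            # 11 bits instead of the usual 9
    return hgatp_ppn * PAGESIZE + vpn2 * PTESIZE


# The largest index offset stays inside the 16 KiB root table, so adding it
# to an aligned base never carries into higher address bits.
max_offset = 0x7FF * PTESIZE
last_pte = sv39x4_root_pte_addr(0x4, 0x7FF << 30)
```

Because the low two bits of hgatp.ppn are zero, the base has bits 13:12 clear, and the 14-bit offset simply fills them in.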
  24. QEMU RV-H extension patch
    • About 2,000 lines
      – target/riscv/cpu.h, target/riscv/cpu_bits.h
        • Macros & constants ...
      – target/riscv/cpu_helper.c
      – target/riscv/op_helper.c
      – target/riscv/csr.c
      – target/riscv/insn_trans/trans_privileged.inc.c
  25. QEMU RV-H extension patch (cont.)
    • riscv_cpu_virt_enabled() / riscv_cpu_set_virt_enabled()
    • riscv_cpu_swap_background_regs()
    • helper_sret()
    • riscv_cpu_do_interrupt()
    • riscv_cpu_tlb_fill()
      – get_physical_address()
        • Besides riscv_cpu_tlb_fill(), load/store op helpers also call this function.
  26. QEMU RV-H extension patch (cont.)
    • Handle SRET in helper_sret():
    [Figure: sret switches from HS mode (V = 0) to the guest’s VS mode / VU mode (V = 1).]
  27. Extra: Trap back to HS, case WFI
    • According to the spec, executing WFI in VS-mode or VU-mode causes an illegal instruction exception.
    • In RISC-V and many other architectures, WFI is used to ask the CPU to “sleep.” So, as the guest CPU is about to sleep, it makes sense to trap back to HS-mode and let the VMM decide what to do, e.g. schedule another guest VCPU ...
    [Figure: trap from VS mode / VU mode (V = 1) to HS mode (V = 0).]
  28. QEMU RV-H extension patch (cont.)
    • In riscv_cpu_do_interrupt():
      (1) VS/VU→VS
      (2) VS/VU→HS, e.g. WFI
      (3) HS→HS
    [Figure: HS mode (V = 0) and VS mode / VU mode (V = 1) with arrows (1), (2) and (3); WFI goes through (2).]
  29. Missing parts
    • In the address translation scheme, I intentionally ignored the MPRV (Modify PRiVilege) feature and its related issues. In RISC-V, MPRV can be set by M-mode code in order to support unaligned memory access: the MMU will step in and translate the given address.
      – So, with hypervisor extension support, the HW implementation should be able to translate VS- and VU-mode addresses while SPRV in hstatus is set.