version 0.3 draft for the time being. – Nothing is guaranteed. Use at your own risk. • The materials used in these slides can be found at: – https://content.riscv.org/wp-content/uploads/2017/12/Tue0942-riscv-hypervisor-waterman.pdf – https://lkml.org/lkml/2019/5/30/714
an isomorphism between a guest system and a host. “Formal Requirements for Virtualizable Third Generation Architectures”, Popek and Goldberg, 1974 [Figure: guest states $S_i \to S_j$ under $e_i$ map under $V$ to host states $S'_i = V(S_i) \to S'_j = V(S_j)$ under $e'_i$] A virtual machine map (VM map) $V$: guest → host is a one-one homomorphism with respect to all the operators $e_i$ in the instruction sequence set $L$. That is, for any state $S_i$ in the guest and any instruction sequence $e_i$, there exists an instruction sequence $e'_i$ such that $V(e_i(S_i)) = e'_i(V(S_i))$.
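To make the condition concrete, here is a toy check in Python. It is only an illustration under made-up assumptions: the "guest" is a 16-state counter, the hypothetical map `V` places guest state s at host state `BASE + s`, and `e_prime` is a host instruction sequence that emulates the guest increment. The check verifies that `V` is one-to-one and that V(e(S)) == e'(V(S)) for every guest state.

```python
# Toy illustration of the VM-map condition V(e(S)) == e'(V(S)).
# Hypothetical setup: the guest is a 16-cell counter; the host keeps that
# counter at an offset BASE inside its own, larger state space.
BASE = 100
GUEST_STATES = range(16)


def e(s: int) -> int:
    """Guest instruction sequence: increment the counter modulo 16."""
    return (s + 1) % 16


def V(s: int) -> int:
    """VM map: place guest state s at host state BASE + s (one-to-one)."""
    return BASE + s


def e_prime(h: int) -> int:
    """Host instruction sequence emulating e on the mapped state."""
    return BASE + ((h - BASE) + 1) % 16


def is_vm_map() -> bool:
    """Check injectivity of V and the homomorphism condition for every state."""
    injective = len({V(s) for s in GUEST_STATES}) == len(GUEST_STATES)
    commutes = all(V(e(s)) == e_prime(V(s)) for s in GUEST_STATES)
    return injective and commutes


print(is_vm_map())  # True
```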
for a computer system. The complex microcoded instructions that support logical partitioning on the IBM System/390 are replaced by programs that use the basic ISA of the host platform and run in a special mode that is more privileged than all other software on the system. Thus the definition of a new mode of operation is what distinguishes this class of partitioning. This new mode is used by the hardware vendors to provide partitioning capability. If the mode is not exposed in the ISA, then the software that runs in this mode can be viewed essentially as an extension of the hardware itself, very much like the VMM software in a codesigned virtual machine. The common name given to this piece of software is the hypervisor. – Virtual Machines: Versatile Platforms for Systems and Processes, by James E. Smith and Ravi Nair
is available. – It should not be hardwired if RV-H is supported by this particular piece of hardware. Let the VMM/OS decide. • S mode becomes HS mode (Hypervisor-extended S mode). • Virtualized S mode (VS mode) and Virtualized U mode (VU mode) are introduced. – Simply put, these are ordinary S/U modes caged in a virtualized environment. • Hypervisor CSRs – Ones the VMM/hypervisor needs in order to function, e.g. hstatus, hgatp (Hypervisor Guest Address Translation and Protection register), hideleg, hedeleg, ... – Ones for switching between VS/VU and HS mode, e.g. bsstatus, bsatp, ... (b means background) Trivia: satp is roughly the RISC-V equivalent of x86's CR3 (a.k.a. PDBR).
1-to-0), the HW implementation swaps the foreground and background supervisor CSRs. • When V=1, two-level address translation is enabled. [Figure: modes without RV-H (M, S, U) vs. with RV-H (V = 0: M, HS, U; V = 1: VS, VU)]
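A minimal sketch of the foreground/background swap, assuming a toy model with only two CSR pairs (sstatus/bsstatus and satp/bsatp). The class and method names and the satp values are hypothetical, not taken from the spec; the point is only that the two banks are exchanged when V actually changes.

```python
from dataclasses import dataclass, field


@dataclass
class SupervisorCsrFile:
    """Toy model of the draft-0.3 swap of foreground and background S-level CSRs."""
    v: int = 0
    # Foreground copies: what sstatus/satp read as right now.
    fg: dict = field(default_factory=lambda: {"sstatus": 0, "satp": 0})
    # Background copies: bsstatus/bsatp.
    bg: dict = field(default_factory=lambda: {"sstatus": 0, "satp": 0})

    def set_v(self, new_v: int) -> None:
        # On a 0-to-1 or 1-to-0 transition, exchange the two banks.
        if new_v != self.v:
            self.fg, self.bg = self.bg, self.fg
            self.v = new_v


csrs = SupervisorCsrFile()
csrs.fg["satp"] = 0x8000000000080000  # example host (HS-mode) satp value
csrs.bg["satp"] = 0x8000000000090000  # example guest satp, parked in bsatp
csrs.set_v(1)                          # enter the guest
print(hex(csrs.fg["satp"]))            # 0x8000000000090000
csrs.set_v(0)                          # trap back to HS-mode
print(hex(csrs.fg["satp"]))            # 0x8000000000080000
```

Writing V with its current value is a no-op in this model, so no swap happens.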
world, by default, all traps are handled in M-mode. Yet when an OS/VMM is present, we often delegate the handling to S-mode with the help of the mideleg and medeleg CSRs. (For instance, BBL/OpenSBI delegates most traps to Linux.) So, when a trap occurs in VU/VS mode, we might need to delegate it further to VS mode code (where the guest OS kernel / guest VMM resides). Thus, the H-extension provides the hedeleg and hideleg CSRs.
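The routing decision for synchronous exceptions can be sketched as below. This is a simplified model with a hypothetical function name, assuming a trap is never delegated to a mode less privileged than the one it was taken from; interrupts would follow the same pattern with mideleg/hideleg.

```python
def exception_target(cause: int, priv: str, v: int, medeleg: int, hedeleg: int) -> str:
    """Return "M", "HS" or "VS": the mode that handles a synchronous exception.

    priv is the privilege the trap was taken from ("M", "S" or "U");
    when v == 1, "S" and "U" stand for VS and VU.
    """
    bit = 1 << cause
    if priv == "M" or not (medeleg & bit):
        return "M"   # default: every trap is handled in M-mode
    if v == 1 and (hedeleg & bit):
        return "VS"  # delegated once more, to the guest OS kernel / guest VMM
    return "HS"      # delegated to the host OS / VMM


ECALL_FROM_U = 8  # "environment call from U-mode" cause code in the privileged spec
deleg = 1 << ECALL_FROM_U
print(exception_target(ECALL_FROM_U, "U", 1, deleg, deleg))  # VS
print(exception_target(ECALL_FROM_U, "U", 1, deleg, 0))      # HS
print(exception_target(ECALL_FROM_U, "U", 0, 0, deleg))      # M
```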
STL, SP2P and SP2V fields. – SPV (Supervisor Previous Virtualization mode) holds the value of V before the trap into HS-mode. When an SRET instruction is executed while V=0, V is set to the value of SPV. – When a trap is taken into HS-mode, bits SP2V and SP2P are set to the values that SPV and the HS-level SPP had before the trap. (Before the trap, the HS-level SPP is sstatus.SPP if V=0, or bsstatus.SPP if V=1.) When an SRET instruction is executed while V=0, the reverse assignments occur: after SPV and sstatus.SPP have supplied the new virtualization and privilege modes, they are written with SP2V and SP2P, respectively. – The STL bit (Supervisor Translation Level), which indicates which address-translation level caused an access-fault or page-fault exception, is also written by the implementation whenever a trap is taken into HS-mode. Setting SPV and then executing SRET from HS-mode can be used to launch or resume a guest. SPP stands for Supervisor Previous Privilege, which can be S or U.
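The SPV/SP2V/SP2P bookkeeping behaves like a two-deep stack of (V, SPP) pairs. The sketch below is a simplified model with hypothetical names: it ignores the sstatus/bsstatus swap (treating the HS-level SPP as a single field) and models only the V=0 SRET path.

```python
from dataclasses import dataclass


@dataclass
class Hart:
    v: int = 0        # current virtualization mode
    priv: str = "S"   # current privilege, "S" or "U" (VS/VU when v == 1)
    spv: int = 0      # hstatus.SPV
    spp: str = "S"    # HS-level SPP (sstatus.SPP or bsstatus.SPP; swap not modeled)
    sp2v: int = 0     # hstatus.SP2V
    sp2p: str = "S"   # hstatus.SP2P

    def trap_to_hs(self) -> None:
        # Save the old (SPV, SPP) pair in (SP2V, SP2P), then record where we came from.
        self.sp2v, self.sp2p = self.spv, self.spp
        self.spv, self.spp = self.v, self.priv
        self.v, self.priv = 0, "S"

    def sret(self) -> None:
        if self.v != 0:
            raise NotImplementedError("only SRET executed with V=0 is modeled")
        # SPV and SPP supply the new modes, then get refilled from SP2V and SP2P.
        self.v, self.priv = self.spv, self.spp
        self.spv, self.spp = self.sp2v, self.sp2p


hart = Hart()
hart.spv, hart.spp = 1, "S"   # the VMM prepares to enter VS-mode
hart.sret()                   # launch the guest
print(hart.v, hart.priv)      # 1 S
hart.trap_to_hs()             # some trap taken into HS-mode
print(hart.v, hart.spv)       # 0 1
hart.sret()                   # resume the guest
print(hart.v, hart.priv)      # 1 S
```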
Sv32: 32-bit VA, translated to a 34-bit PA • RV64 – Bare (no translation) – Sv39: 39-bit VA, translated to a 56-bit PA – Sv48: 48-bit VA, translated to a 56-bit PA
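The VA widths in this list follow from the page-table geometry. The sketch below assumes the level counts, VPN field widths and PTE sizes of the RISC-V privileged spec (they are not stated on this slide) and checks that VA bits = 12 + levels × VPN bits, and that each page table fills exactly one 4 KiB page.

```python
# mode: (levels, VPN bits per level, PTE size in bytes, PA bits)
MODES = {
    "Sv32": (2, 10, 4, 34),
    "Sv39": (3, 9, 8, 56),
    "Sv48": (4, 9, 8, 56),
}
PAGE_OFFSET_BITS = 12  # 4 KiB pages


def va_bits(mode: str) -> int:
    levels, vpn_bits, _, _ = MODES[mode]
    return PAGE_OFFSET_BITS + levels * vpn_bits


def table_bytes(mode: str) -> int:
    _, vpn_bits, pte_size, _ = MODES[mode]
    return (1 << vpn_bits) * pte_size  # entries per table times PTE size


for name in MODES:
    print(name, va_bits(name), table_bytes(name))
# Sv32 32 4096
# Sv39 39 4096
# Sv48 48 4096
```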
Given a virtual address va: Step 1: let base = satp.ppn × 2^12 (PAGESIZE), and let i = LEVELS − 1 = 3 − 1 = 2. Step 2: let pte be the value of the PTE at address base + va.vpn[i] × 8 (PTESIZE).
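Steps 1 and 2 can be sketched in Python for Sv39. The helper names are hypothetical; the 9-bit VPN fields starting at bit 12 follow the Sv39 layout in the privileged spec.

```python
PAGESIZE = 1 << 12  # 4 KiB
PTESIZE = 8         # bytes per Sv39 PTE
LEVELS = 3


def vpn(va: int, i: int) -> int:
    """Return va.vpn[i]: the 9-bit field starting at bit 12 + 9*i."""
    return (va >> (12 + 9 * i)) & 0x1FF


def root_pte_address(satp_ppn: int, va: int) -> int:
    base = satp_ppn * PAGESIZE           # step 1
    i = LEVELS - 1                       # i = 2
    return base + vpn(va, i) * PTESIZE   # step 2


va = 0xC0201ABC
print([vpn(va, i) for i in (2, 1, 0)])     # [3, 1, 1]
print(hex(root_pte_address(0x80000, va)))  # 0x80000018
```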
Step 3: if pte.v = 0, or if pte.r = 0 and pte.w = 1, the PTE is invalid or uses an encoding reserved for future use; stop and raise a page-fault exception.
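Step 3 as a predicate, assuming the Sv39 PTE flag layout from the privileged spec (V, R, W, X, U in bits 0 to 4); the function name is hypothetical.

```python
PTE_V, PTE_R, PTE_W, PTE_X, PTE_U = (1 << b for b in range(5))


def pte_invalid(pte: int) -> bool:
    """True if step 3 must raise a page fault: V=0, or the reserved W=1/R=0 encoding."""
    not_valid = not (pte & PTE_V)
    reserved = not (pte & PTE_R) and bool(pte & PTE_W)
    return not_valid or reserved


print(pte_invalid(0))                      # True  (V = 0)
print(pte_invalid(PTE_V | PTE_W))          # True  (W without R is reserved)
print(pte_invalid(PTE_V | PTE_R | PTE_W))  # False
```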
Otherwise, the PTE is valid. If pte.r = 0 and pte.x = 0, this PTE is a pointer to the next level of the page table: let i = i − 1. If i < 0, stop and raise a page-fault exception corresponding to the original access type. Otherwise, let base = pte.ppn × 2^12 (PAGESIZE) and go to step 2.
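Step 4 for a non-leaf PTE, sketched with hypothetical names; the 44-bit PPN field at bits 53:10 follows the Sv39 PTE layout in the privileged spec.

```python
PAGESIZE = 1 << 12
PTE_V, PTE_R, PTE_X = 1 << 0, 1 << 1, 1 << 3


class PageFault(Exception):
    """Raised where the walk would raise a page-fault exception."""


def pte_ppn(pte: int) -> int:
    return (pte >> 10) & ((1 << 44) - 1)  # pte.ppn, 44 bits in Sv39


def descend(pte: int, i: int) -> tuple:
    """Return the (i, base) pair for the next iteration of step 2."""
    if pte & (PTE_R | PTE_X):
        raise ValueError("leaf PTE: handled by the following steps")
    i -= 1
    if i < 0:
        raise PageFault("pointer PTE at the last level")
    return i, pte_ppn(pte) * PAGESIZE


pointer = (0x80001 << 10) | PTE_V
i, base = descend(pointer, 2)
print(i, hex(base))  # 1 0x80001000
```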
If pte.r = 1 or pte.x = 1, a leaf PTE has been found. Determine whether the requested memory access is allowed by the pte.r, pte.w, pte.x and pte.u bits, given the current privilege mode. If not, stop and raise a page-fault exception corresponding to the original access type.
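A simplified step 5 check, assuming sstatus.SUM = 0 and sstatus.MXR = 0 (the spec lets those bits relax the rules, which this sketch ignores); the function name is hypothetical.

```python
PTE_R, PTE_W, PTE_X, PTE_U = 1 << 1, 1 << 2, 1 << 3, 1 << 4
NEEDED = {"load": PTE_R, "store": PTE_W, "fetch": PTE_X}


def access_allowed(pte: int, access: str, priv: str) -> bool:
    """Check a leaf PTE's R/W/X/U bits for an access made in priv ("S" or "U")."""
    if not (pte & NEEDED[access]):
        return False
    user_page = bool(pte & PTE_U)
    # With SUM = 0: U-mode may only touch U pages, S-mode may not touch them.
    return user_page if priv == "U" else not user_page


kernel_rw = PTE_R | PTE_W
print(access_allowed(kernel_rw, "store", "S"))  # True
print(access_allowed(kernel_rw, "load", "U"))   # False
print(access_allowed(kernel_rw, "fetch", "S"))  # False
```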
If i > 0 and pte.ppn[i − 1 : 0] ≠ 0, this is a misaligned superpage; stop and raise a page-fault exception corresponding to the original access type. (For example, if i = 2, pte.ppn[1:0] must be 0.)
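Step 6 sketched for Sv39 with a hypothetical name: since each ppn field below ppn[2] is 9 bits wide, pte.ppn[i − 1 : 0] is simply the low 9 × i bits of the PPN.

```python
def pte_ppn(pte: int) -> int:
    return (pte >> 10) & ((1 << 44) - 1)


def misaligned_superpage(pte: int, i: int) -> bool:
    """True if a leaf found at level i > 0 has nonzero pte.ppn[i-1:0]."""
    low_mask = (1 << (9 * i)) - 1
    return i > 0 and (pte_ppn(pte) & low_mask) != 0


print(misaligned_superpage(0x80000 << 10, 2))  # False: 1 GiB aligned
print(misaligned_superpage(0x80001 << 10, 2))  # True
print(misaligned_superpage(0x80001 << 10, 0))  # False: 4 KiB page
```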
Finally, translation succeeds. The translated physical address is given as follows: • pa.pgoff = va.pgoff. • If i > 0, this is a superpage translation and pa.ppn[i − 1 : 0] = va.vpn[i − 1 : 0]. • pa.ppn[LEVELS − 1 : i] = pte.ppn[LEVELS − 1 : i]. [Figure: example assuming i = 2]
[Figure: the same translation, assuming i = 0]
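The final step can be sketched as a bit-mask merge (hypothetical name, Sv39 layout assumed): the low 12 + 9 × i bits of pa come from va, the remaining bits from the PTE's PPN.

```python
def pte_ppn(pte: int) -> int:
    return (pte >> 10) & ((1 << 44) - 1)


def physical_address(va: int, pte: int, i: int) -> int:
    """Compose pa from va and the leaf PTE found at level i."""
    low_mask = (1 << (12 + 9 * i)) - 1   # pgoff plus vpn[i-1:0]
    return ((pte_ppn(pte) << 12) & ~low_mask) | (va & low_mask)


va = 0xC0201ABC
print(hex(physical_address(va, 0x80000 << 10, 2)))  # 0x80201abc (gigapage)
print(hex(physical_address(va, 0x80123 << 10, 0)))  # 0x80123abc (4 KiB page)
```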
could take a 34-bit wide GPA directly, while Sv39x4 and Sv48x4 take only 41-bit and 50-bit GPAs, respectively. – According to the spec, the remaining guest address bits (63:41 in Sv39x4 and 63:50 in Sv48x4) must be zero, or a page-fault exception attributed to the GPA translation will occur. • The translation algorithm is identical to the ordinary ones, except in step 1: let base be hgatp.ppn × 2^12 (PAGESIZE). Note that only the root page table needs 16 KiB alignment (it is four times the usual size, since vpn[2] has 2 extra bits), which forces the low 2 bits of hgatp.ppn to be 0s. So even though GPA.vpn[2] × 8 (PTESIZE) may overlap with the bit positions of the page-table base address, the addition still causes no trouble.
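The alignment argument can be checked numerically. The sketch below (hypothetical names) computes the Sv39x4 root PTE address with an 11-bit vpn[2] at GPA bits 40:30, and confirms that for a 16 KiB-aligned root table, adding vpn[2] × 8 never carries into the base address bits.

```python
PAGESIZE, PTESIZE = 1 << 12, 8


class PageFault(Exception):
    """Stands in for the page-fault exception attributed to GPA translation."""


def sv39x4_root_pte_address(hgatp_ppn: int, gpa: int) -> int:
    if gpa >> 41:
        raise PageFault("GPA bits 63:41 must be zero")
    vpn2 = (gpa >> 30) & 0x7FF  # 11 bits: 2 more than in Sv39
    return hgatp_ppn * PAGESIZE + vpn2 * PTESIZE


def never_carries(hgatp_ppn: int) -> bool:
    """True if base + offset == base | offset for every possible vpn[2]."""
    base = hgatp_ppn * PAGESIZE
    return all(base + v * PTESIZE == base | (v * PTESIZE) for v in range(1 << 11))


print(hex(sv39x4_root_pte_address(0x80000, 2047 << 30)))  # 0x80003ff8
print(never_carries(0x80000))  # True: low 2 bits of ppn are 0
print(never_carries(0x80001))  # False: root table not 16 KiB aligned
```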
to spec, executing WFI in VS-mode or VU-mode causes an illegal instruction exception. • In RISC-V and many other architectures, WFI is used to ask the CPU to “sleep.” So, as the guest CPU is now sleeping, it makes sense to trap back to HS-mode and let the VMM decide what to do, e.g. schedule another guest VCPU. [Figure: WFI executed in VS/VU mode (V = 1) traps to HS mode (V = 0)]
MPRV (Modify PRiVilege) feature and its related issues. In RISC-V, MPRV can be set by M-mode code, e.g. to emulate misaligned memory accesses on behalf of less privileged software: loads and stores are then translated and protected as though the current privilege mode were MPP, so the MMU steps in and translates the given address. – So, with hypervisor extension support, the HW implementation should likewise be able to translate addresses as VS or VU mode would while SPRV in hstatus is set.