vIOMMU implementation in BitVisor

mmisono
December 12, 2019

@BitVisor Summit 8




  1. vIOMMU Implementation in BitVisor Masanori Misono, The University of Tokyo

    @BitVisor Summit 8 2019-12-12
  2. NOTE

    - This talk is mainly about Intel VT-d.
    - AMD IOMMU has a different architecture.
  3. Introduction: What is the problem?

  4. What is IOMMU?

    - MMU for I/O devices, obviously
    - Address Translation Service (ATS) from the PCI-SIG specification point of view
    - Implementations
      - Intel VT-d
      - AMD IOMMU
  5. Address Translation Service (ATS)

  6. Usage of IOMMU

    1. Intra-OS protection: the IOMMU (with its IOTLB) translates a device's IOVA (bus address) to a PA, so the device can reach only the mapped region of main memory.
    2. Inter-OS protection: the host passes a device through to the guest with a static mapping onto guest memory. (To limit the DMA-able region within guest memory, a vIOMMU is needed.)
  7. BitVisor and Intel VT-d

    - BitVisor conceals the VT-d information in the ACPI table (DMAR)
    - Reasons
      1. The guest will not configure the IOMMU for devices BitVisor uses
         - This would result in an IOMMU #PF
      2. BitVisor uses the IOMMU to protect itself (if VTD_TRANS = 1)
    - [Figure labels: BitVisor, VT-d, Guest, Devices]
  8. BitVisor and AMD IOMMU

    - BitVisor currently passes the AMD IOMMU through to the guest!
    - This will cause problems if the guest uses it…
      - (Currently, most OSes do not use the IOMMU by default)
  9. Motivation

    - The guest wants to use the IOMMU!
      - DMA attacks are now common
        - Thunderclap [NDSS’19]
      - Protection from buggy device drivers
      - For virtualization
        - BitVisor now supports (unsafe) nested virtualization
      - (For my research)
    - VTD_TRANS only protects BitVisor, not the guest
    - (Image credit: CC-BY Brett Gutstein 2019)
  10. Goal

    - Let the guest use the IOMMU while BitVisor manages some devices!
    - The target is VT-d
      - Because that is what I have
  11. Approach

    - There is no “EPT” for VT-d
      - Actually, “Scalable Mode Address Translation” enables nested translation, but no hardware is currently available
    - We need IOMMU emulation
      - i.e., a vIOMMU
    - This is BitVisor. Emulate only the necessary parts!
  12. Design: How does it work?

  13. Intel VT-d

    - VT-d is more than just an “IOMMU”
      - DMA remapping
      - Interrupt remapping
      - Interrupt posting (posted interrupts)
      - Address translation fault reporting
  14. Intel VT-d

    - VT-d is more than just an “IOMMU”
      - DMA remapping (this is today’s topic)
      - Interrupt remapping
      - Interrupt posting (posted interrupts)
      - Address translation fault reporting
  15. DMA Remapping (Legacy Mode)

    [Figure: the Root Table Address Register points to the Root Table (entries for Bus = 0 to Bus = 255). The entry for Bus = b points to the Context Table for Bus = b (entries for Dev=0, Fun=0 to Dev=31, Fun=7). The entry for Dev=d, Fun=f points to the Second-Level Page Table Structures.]

    - This is what current HW supports
  16. DMA Remapping (Legacy Mode)

    [Figure: same structure as the previous slide. The root entry has P=1; the context entry has P=1 and TT=0, where TT is the Translation Type.]
  17. DMA Remapping (Legacy Mode)

    [Figure: same structure as the previous slide. The root entry has P=1; the context entry has P=1 and TT=10, where TT is the Translation Type.]

    - If TT=10, untranslated requests are processed as pass-through (i.e., no address translation is performed)
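The lookup drawn in the legacy-mode figures can be sketched as a small software model. This is only an illustration: the class and function names (`RootEntry`, `ContextEntry`, `translate`) are hypothetical, the second-level page tables are flattened into a single dictionary, and any TT value other than 10 is treated as "translate", which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

TT_TRANSLATE = 0b00     # untranslated requests use the second-level page tables
TT_PASS_THROUGH = 0b10  # untranslated requests are processed as pass-through


@dataclass
class ContextEntry:
    present: bool = False
    tt: int = TT_TRANSLATE
    # Stand-in for the second-level page tables: IOVA page number -> PA page number.
    second_level: Dict[int, int] = field(default_factory=dict)


@dataclass
class RootEntry:
    present: bool = False
    # Context table for one bus, indexed by (dev << 3) | fun (256 slots).
    context_table: Dict[int, ContextEntry] = field(default_factory=dict)


def translate(root_table: Dict[int, RootEntry], bus: int, dev: int, fun: int,
              iova: int) -> Optional[int]:
    """Return the PA for an untranslated request, or None if it is blocked."""
    assert 0 <= bus < 256 and 0 <= dev < 32 and 0 <= fun < 8
    root = root_table.get(bus)
    if root is None or not root.present:
        return None
    ctx = root.context_table.get((dev << 3) | fun)
    if ctx is None or not ctx.present:
        return None
    if ctx.tt == TT_PASS_THROUGH:
        return iova  # no address translation is performed
    page, offset = iova >> 12, iova & 0xFFF
    if page not in ctx.second_level:
        return None  # real hardware would report a translation fault here
    return (ctx.second_level[page] << 12) | offset
```

The bus number selects the root entry, and the device and function numbers together select the context entry, which mirrors the two table levels in the figure.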
  18. Relation between Translation Type (TT) and Address Type (AT) in TLP

    AT \ TT                  | 00        | 01                                    | 10
    Untranslated (00)        | Translate | Translate? (implementation dependent) | Pass through
    Translation Request (01) | Block     | Allow                                 | Block
    Translated (10)          | Block     | Allow                                 | Block

    TLP: PCI Transaction Layer Packet
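The TT/AT relation above can also be written as a lookup table, which is roughly what an emulator has to consult for each request. A minimal sketch; the names `AT`, `ACTIONS`, and `action` are made up for illustration, and the strings simply restate the slide.

```python
from enum import Enum


class AT(Enum):
    """Address Type field carried in the TLP."""
    UNTRANSLATED = 0b00
    TRANSLATION_REQUEST = 0b01
    TRANSLATED = 0b10


# ACTIONS[address type][translation type] -> what the IOMMU does with the request.
ACTIONS = {
    AT.UNTRANSLATED: {
        0b00: "translate",
        0b01: "translate (implementation dependent)",
        0b10: "pass through",
    },
    AT.TRANSLATION_REQUEST: {0b00: "block", 0b01: "allow", 0b10: "block"},
    AT.TRANSLATED: {0b00: "block", 0b01: "allow", 0b10: "block"},
}


def action(at: AT, tt: int) -> str:
    """Look up the handling of a request with address type `at` under TT `tt`."""
    return ACTIONS[at][tt]
```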
  19. (Unsafe) vIOMMU Overview

    - Let the guest configure VT-d directly
      - Expose the DMAR in the ACPI table
      - Allow access to the VT-d registers
    - Shadow the DMA remapping table so that devices managed by BitVisor can still perform DMA
    - If BitVisor itself needs no protection, simply
      - Copy the root and context tables; there is no need to copy the second-level page structures
      - Set TT=10 for devices managed by BitVisor
  20. Remapping Table Shadowing

    [Figure: the guest-created Root Table and Context Table (entries marked P=1, P=0, and P=1 TT=0) point to the Second-Level Page Table Structures. The Root Table Address Register points to the BitVisor-managed Root Table and Context Table, which carry the same entries plus a P=1, TT=10 entry for the device BitVisor manages.]
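The unsafe shadowing scheme can be sketched as follows, assuming a toy representation in which a root table is a dictionary of context tables and each context entry is a small dictionary. The function name `build_shadow` and the field names `P`, `TT`, and `SLPTPTR` are illustrative only; this is not BitVisor's actual code.

```python
import copy
from typing import Dict, Set, Tuple

# table[bus][devfn] = {"P": 1, "TT": 0b00, "SLPTPTR": <page-table address>}
Table = Dict[int, Dict[int, dict]]

TT_PASS_THROUGH = 0b10


def build_shadow(guest_root: Table,
                 bitvisor_devices: Set[Tuple[int, int]]) -> Table:
    """Copy the guest's root/context entries and force pass-through
    entries for the (bus, devfn) pairs that BitVisor manages."""
    # Only the two table levels are copied. SLPTPTR is a plain address, so
    # the second-level page structures remain the guest's own.
    shadow = copy.deepcopy(guest_root)
    for bus, devfn in bitvisor_devices:
        context_table = shadow.setdefault(bus, {})
        context_table[devfn] = {"P": 1, "TT": TT_PASS_THROUGH, "SLPTPTR": 0}
    return shadow
```

The hardware's Root Table Address Register would then point at the shadow rather than at the guest's table, so the guest never sees the extra pass-through entries.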
  21. The Problem: Tracking guest modifications of entries

    - The VT-d specification does not require explicit IOMMU TLB invalidation when an entry is not in the cache
      - That is, the guest may add new entries without any IOMMU TLB invalidation
    - Configuring EPT for all entry pages (to trap guest writes) is troublesome
    - The savior
      - Caching Mode (CM)
  22. Caching Mode (CM)

    From the VT-d spec:
    - If CM=1, “Any software updates to the remapping structures […] require explicit invalidation.”
    - “Hardware implementations of this architecture must support a value of 0 in this field.”
  23. Tracking the modification of entries

    - Emulate CM=1 by intercepting the guest's reads of the VT-d registers
    - Trap IOMMU TLB invalidations by monitoring VT-d register accesses
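A minimal sketch of this interception, assuming the register layout from the VT-d specification as I understand it (Capability Register at offset 0x08 with CM at bit 7, Context Command Register at offset 0x28 with ICC at bit 63). The class `VtdRegisterFilter` and its callback are hypothetical, and a real implementation would also have to watch the IOTLB invalidate register and the queued-invalidation interface.

```python
CAP_REG = 0x08       # Capability Register offset
CAP_CM = 1 << 7      # Caching Mode bit in CAP_REG
CCMD_REG = 0x28      # Context Command Register offset
CCMD_ICC = 1 << 63   # Invalidate Context-Cache command bit


class VtdRegisterFilter:
    """Sits between the guest and the VT-d registers (modelled as a dict)."""

    def __init__(self, hw_regs, on_invalidate):
        self.hw = hw_regs                   # offset -> 64-bit register value
        self.on_invalidate = on_invalidate  # called to resync the shadow tables

    def guest_read(self, offset):
        value = self.hw.get(offset, 0)
        if offset == CAP_REG:
            value |= CAP_CM  # report CM=1 to the guest; hardware is untouched
        return value

    def guest_write(self, offset, value):
        if offset == CCMD_REG and value & CCMD_ICC:
            # With CM=1 the guest must invalidate after every update, so this
            # is the point at which the shadow tables can be refreshed.
            self.on_invalidate()
        self.hw[offset] = value
```

Because the guest believes CM=1, every update it makes to the remapping structures is followed by an explicit invalidation, and that invalidation is visible as a register write.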
  24. Safe vIOMMU

    - Shadow all entries, including the second-level page structures
      - Ensure that the mappings do not contain BitVisor’s memory region
      - Shadowing is necessary even if the mappings do not contain BitVisor’s memory region, because the guest might ignore caching mode and update an entry implicitly
    - Create a remapping table for BitVisor-managed devices (as VTD_TRANS does)
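The check that a shadowed mapping stays out of BitVisor's memory comes down to an interval-overlap test applied while copying the second-level entries. A sketch under simplifying assumptions: 4 KiB pages only, the page tables flattened into one dictionary, and hypothetical names (`overlaps`, `shadow_mappings`).

```python
PAGE_SIZE = 4096


def overlaps(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) intersects [start_b, start_b+len_b)."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def shadow_mappings(guest_mappings, bitvisor_start, bitvisor_len):
    """Copy guest mappings (IOVA page -> PA page), dropping any page that
    would let a device reach BitVisor's memory region."""
    shadow = {}
    for iova_page, pa_page in guest_mappings.items():
        if overlaps(pa_page * PAGE_SIZE, PAGE_SIZE, bitvisor_start, bitvisor_len):
            continue  # leave the entry not-present in the shadow table
        shadow[iova_page] = pa_page
    return shadow
```

Since the hardware only ever walks the shadow copy, a guest that edits its own tables without invalidating cannot widen a device's access behind BitVisor's back.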
  25. Evaluation: It’s showtime!

  26. Performance

    - TBE

  27. Future work

    - Support other VT-d features
      - Posted interrupts
        - Inject interrupts without a VM exit
  28. Conclusion

    - Presented a way to use the IOMMU in BitVisor alongside the guest
  29. References

    - Intel® Virtualization Technology for Directed I/O Architecture Specification, architecture-specification
    - AMD I/O Virtualization Technology (IOMMU) Specification
    - QEMU Wiki, Features/VT-d
    - A. Kegel et al., Virtualizing IO through the IO Memory Management Unit (IOMMU), ASPLOS’16 Tutorial
  30. References (cont’d)

    - N. Amit et al., vIOMMU: Efficient IOMMU Emulation, ATC’11
    - M. Malka et al., rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers, ASPLOS’15
    - O. Peleg et al., Utilizing the IOMMU Scalably, ATC’15
    - A. Markuze et al., True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy, ASPLOS’16
    - A. Markuze et al., DAMN: Overhead-free IOMMU protection for networking, ASPLOS’18
    - B. Morgan et al., IOMMU protection against I/O attacks: a vulnerability and a proof of concept, Journal of the Brazilian Computer Society, 2018
    - A. Theodore Markettos et al., Thunderclap: Exploring Vulnerabilities in Operating System IOMMU Protection via DMA from Untrustworthy Peripherals, NDSS’19
  31. References (cont’d)

    - Lan Tianyu, [Xen-devel] Xen virtual IOMMU high level design doc V3, 2016
    - Jason Wang and Peter Xu, Vhost and VIOMMU, KVM Forum 2016, pdf
    - Eric Auger, vIOMMU/ARM: full emulation and virtio-iommu approaches, KVM Forum 2017
    - Peter Xu, Device Assignment with Nested Guest and DPDK, KVM Forum 2018
  32. Backups: Anything left?

  33. VT-d Root Entry & Context Entry

    [Figure: a Root Entry in the Root Table holds P=1 and the Context Table Pointer; a Context Entry in the Context Table holds P=1, TT=0, and the Second Level Page Translation Pointer. Some fields are omitted.]
  34. How to use DMA remapping

    - Configure the root table, context table, and page table entries
    - Set GCMD_REG.TE = 1 (Global Command Register)
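The enable sequence can be sketched against a fake register window. The offsets and bit positions (RTADDR_REG at 0x20, GCMD_REG at 0x18, GSTS_REG at 0x1C, SRTP/RTPS at bit 30, TE/TES at bit 31) follow the VT-d specification as I understand it. `FakeVtd` is a hypothetical stand-in for MMIO that completes commands instantly, and the sketch ignores cache invalidation and the need to preserve other already-enabled GCMD bits.

```python
RTADDR_REG = 0x20     # Root Table Address Register
GCMD_REG = 0x18       # Global Command Register
GSTS_REG = 0x1C       # Global Status Register
GCMD_SRTP = 1 << 30   # Set Root Table Pointer (status: GSTS.RTPS, same bit)
GCMD_TE = 1 << 31     # Translation Enable (status: GSTS.TES, same bit)


class FakeVtd:
    """Hypothetical MMIO model: status bits follow commands immediately."""

    def __init__(self):
        self.regs = {GSTS_REG: 0}

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value
        if offset == GCMD_REG:
            status = self.regs[GSTS_REG]
            if value & GCMD_SRTP:
                status |= GCMD_SRTP  # root table pointer latched
            status = (status & ~GCMD_TE) | (value & GCMD_TE)
            self.regs[GSTS_REG] = status


def enable_dma_remapping(dev, root_table_pa):
    """Point the hardware at the root table, then turn translation on."""
    dev.write(RTADDR_REG, root_table_pa)
    dev.write(GCMD_REG, GCMD_SRTP)
    while not (dev.read(GSTS_REG) & GCMD_SRTP):
        pass  # wait for GSTS.RTPS
    dev.write(GCMD_REG, GCMD_TE)
    while not (dev.read(GSTS_REG) & GCMD_TE):
        pass  # wait for GSTS.TES
```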
  35. VT-d Scalable Mode Address Translation

    [Figure from the VT-d spec]
  36. IOMMU Group in Linux

    - The Linux IOMMU driver creates the same mapping for all devices in the same IOMMU group
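On a Linux system with the IOMMU enabled, the groups are visible under `/sys/kernel/iommu_groups`, with one directory per group and a `devices` subdirectory inside each. The helper below is an illustrative sketch (the function name `iommu_groups` is made up); it returns an empty result when the directory does not exist.

```python
import os
from typing import Dict, List


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> Dict[str, List[str]]:
    """Map each IOMMU group number to the PCI addresses of its devices."""
    groups: Dict[str, List[str]] = {}
    if not os.path.isdir(root):
        return groups  # no IOMMU groups exposed on this system
    for group in sorted(os.listdir(root)):
        dev_dir = os.path.join(root, group, "devices")
        if os.path.isdir(dev_dir):
            groups[group] = sorted(os.listdir(dev_dir))
    return groups


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(group, " ".join(devices))
```

Devices listed under the same group number share one mapping, so they cannot be isolated from each other by the IOMMU.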
  37. How to enable IOMMU in Linux

    - Use boot options
      - intel_iommu=on iommu=nopt
        - Linux configures the IOMMU for each IOMMU group
      - intel_iommu=on iommu=pt
        - Only VFIO uses the IOMMU, for device pass-through
    - To check whether Linux uses the IOMMU: […]
  38. Mapping table created by VTD_TRANS

    - Devices BitVisor uses
      - Can DMA only to a specific region of BitVisor’s memory
    - Devices the guest uses
      - Cannot DMA to BitVisor’s memory
  39. DISABLE_VTD option in BitVisor

    - Recent Mac firmware enables VT-d at boot time
    - The DISABLE_VTD option disables VT-d by sending commands to the Global Command Register
      - Otherwise, the Mac fails to boot: it assumes there is no VT-d (BitVisor conceals it) while VT-d is actually enabled, and something goes wrong