Friday, May 25, 2018

ARM Virtualization Extensions

"All problems in computer science can be solved by another level of indirection"
  -- David Wheeler


ARM introduced virtualization extensions to its architecture in ARMv7. A VMM can thus virtualize the entire instruction set by implementing the trap-and-emulate model with hardware support rather than purely in software. The ARM virtualization extensions include the following:
  • Hypervisor execution mode (HYP), a mode with higher privilege than supervisor mode.
  • Virtual interrupts.
  • System MMU, which supports multiple translation contexts for multiple DMA masters, two-stage address translation, and hardware acceleration and abstraction.

HYP mode

ARMv7 introduces a new privilege level (PL2) and a new HYP mode for hypervisor execution. HYP mode has higher privilege than SVC mode. With HYP mode in place, most sensitive instructions run natively at non-secure PL1 without trap-and-emulate, while the rest, e.g. guest loads/stores that the hypervisor must emulate, trap into HYP mode. HYP mode can only be enabled by software running in the secure state.

With the virtualization extensions, sensitive instructions that can't be executed natively trap automatically when they run at PL1. The Hyp Syndrome Register (HSR), part of the virtualization extensions, records information about the trap, e.g. the reason for entry. HVC is an instruction that lets a guest OS enter HYP mode explicitly.
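As a rough illustration of how a hypervisor might read the "reason of entry" from the HSR, the sketch below splits a raw 32-bit value into its exception class (EC, bits [31:26]), instruction length bit (IL, bit 25), and instruction-specific syndrome (ISS, bits [24:0]). The function name `decode_hsr` and the small `EC_NAMES` table are assumptions made for this example, and the table lists only a handful of exception classes.

```python
# Field layout of the ARMv7 Hyp Syndrome Register (HSR).
EC_SHIFT = 26                 # EC  = bits [31:26], exception class
IL_BIT = 25                   # IL  = bit 25, instruction length
ISS_MASK = (1 << 25) - 1      # ISS = bits [24:0], class-specific syndrome

# A few exception classes only; this table is deliberately incomplete.
EC_NAMES = {
    0x01: "trapped WFI/WFE",
    0x03: "trapped CP15 MCR/MRC",
    0x12: "HVC executed",
    0x20: "prefetch abort routed to Hyp mode",
    0x24: "data abort routed to Hyp mode",
}


def decode_hsr(hsr):
    """Split a raw HSR value into its EC, IL, and ISS fields."""
    ec = (hsr >> EC_SHIFT) & 0x3F
    return {
        "ec": ec,
        "il": (hsr >> IL_BIT) & 1,
        "iss": hsr & ISS_MASK,
        "reason": EC_NAMES.get(ec, "not covered by this sketch"),
    }


# Example: the value a hypervisor might see after a guest runs HVC #0x10.
info = decode_hsr((0x12 << 26) | (1 << 25) | 0x10)
print(info["reason"], hex(info["iss"]))
```

A real handler would dispatch on `ec` first and interpret `iss` only according to that class, since the ISS encoding differs from one class to the next.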

The hypervisor also has its own vector table. The base address of the HYP mode vector table is held in HVBAR, which is accessible only from monitor or HYP mode. VBAR holds the base address of the PL0/PL1 vector table.

Memory Virtualization

ARMv7 also introduces two-stage address translation, where Stage 1 maps a virtual address (VA) to an intermediate physical address (IPA) and Stage 2 maps the IPA to a physical address (PA). The guest OS maintains the Stage 1 page tables and the hypervisor has full control of Stage 2.
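The two stages can be pictured with a toy model in which each stage is a flat dictionary from page number to page number. This is only a sketch: real tables are multi-level and carry permissions, and the 4 KiB page size, the function names, and the example addresses are all assumptions chosen for illustration.

```python
PAGE_SHIFT = 12                      # 4 KiB pages, assumed for illustration
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationFault(Exception):
    """Raised when a stage has no mapping for the requested page."""


def walk(table, addr, stage):
    """Translate one address through a single flat page table."""
    page, offset = addr >> PAGE_SHIFT, addr & PAGE_MASK
    if page not in table:
        raise TranslationFault(f"stage {stage} fault at {addr:#x}")
    return (table[page] << PAGE_SHIFT) | offset


def translate(va, stage1, stage2):
    ipa = walk(stage1, va, 1)        # guest-owned table: VA -> IPA
    return walk(stage2, ipa, 2)      # hypervisor-owned table: IPA -> PA


stage1 = {0x00400: 0x00010}          # guest maps VA page 0x400 to IPA page 0x10
stage2 = {0x00010: 0x80234}          # hypervisor maps IPA page 0x10 to PA page 0x80234
print(hex(translate(0x00400ABC, stage1, stage2)))
```

Because the guest only ever edits `stage1`, it cannot reach a physical page that the hypervisor has left out of `stage2`; such an access ends in a Stage 2 fault instead.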

Translation Lookaside Buffer (TLB) entries are also tagged with a virtual machine ID (VMID), so the TLBs do not require explicit invalidation when switching between virtual machines.
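A minimal sketch of why the VMID tag helps: if the lookup key includes the VMID, two guests can cache translations for the same virtual page without colliding, and a world switch needs no flush. The `TaggedTlb` class and its method names are hypothetical, and the model ignores capacity, ASIDs, and replacement policy.

```python
class TaggedTlb:
    """Toy TLB whose entries are keyed by (vmid, virtual page)."""

    def __init__(self):
        self.entries = {}            # (vmid, va_page) -> pa_page
        self.flushes = 0             # counts explicit invalidations

    def fill(self, vmid, va_page, pa_page):
        self.entries[(vmid, va_page)] = pa_page

    def lookup(self, vmid, va_page):
        # Returns None on a miss; a hit requires a matching VMID.
        return self.entries.get((vmid, va_page))

    def invalidate_vmid(self, vmid):
        # Only needed when a VMID is retired or reused, not on every switch.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != vmid}
        self.flushes += 1


tlb = TaggedTlb()
tlb.fill(1, 0x400, 0x80234)          # VM 1 caches VA page 0x400
tlb.fill(2, 0x400, 0x90111)          # VM 2 caches the same VA page
print(hex(tlb.lookup(1, 0x400)), hex(tlb.lookup(2, 0x400)), tlb.flushes)
```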
 

System MMU

A system MMU (SMMU) is a hardware device that provides address translation and protection to any DMA-capable agent in the system other than the CPU.

Stage 1 SMMU translation enables a DMA-capable device to operate on fragmented physical memory, which is simpler and more efficient than the software approach of DMA scatter-gather. It also lets devices that can't address the full range of memory, such as 16-bit or 24-bit devices on a 32-bit architecture, access any address in the system without the help of bounce buffers.

Stage 2 SMMU translation removes the need for the hypervisor to maintain shadow tables. Since a guest OS can only program DMA-capable devices with IPAs, a DMA attack cannot corrupt the memory of another guest OS.

Interrupt Virtualization

The Generic Interrupt Controller (GIC) is the standard interrupt controller in the ARM architecture. The interrupt distributor, which can be configured at boot time, holds the interrupt routing information. The Virtualization Extensions provide a separate register set for virtual interrupts so that the guest OS's ISR can interact directly with the virtual GIC. The hypervisor can configure the Hyp Configuration Register (HCR) so that a physical interrupt generates a hypervisor trap, and so that a virtual interrupt is delivered to a CPU running a guest.


References

  1. https://linux.globallogic.com/materials2017/presentations/Main%20stage/Julien%20Grall%20Hypervisors%20on%20ARM%20Overview%20and%20Design%20choices.pdf
  2. https://www.slideshare.net/xen_com_mgr/hardware-accelerated-virtualization-in-the-arm-cortex-processors
  3. http://www.csd.uoc.gr/~hy428/reading/vm-support-ARM-may20-2014.pdf
  4. http://www.hotchips.org/wp-content/uploads/hc_archives/hc22/HC22.23.220-1-Brash-ARMv7A.pdf
  5. https://blog.linuxplumbersconf.org/2012/wp-content/uploads/2012/09/2012-lpc-arm-zyngier.pdf
  6. http://www.cs.nthu.edu.tw/~ychung/slides/Virtualization/VM-Lecture-7-2-Hardware%20support-ARM.pptx
  7. https://www.slideshare.net/jserv/embedded-hypervisor-for-arm

Wednesday, May 23, 2018

ARM Architecture


Similarities to RISC Design

  • Reduced instruction set with single-cycle, fixed-length instructions
  • One-stage decoding pipeline, no microcode
  • A large set of general-purpose (GP) registers
  • 32-bit load-store architecture

With the following differences

  • Multiple-register load/store with variable-cycle execution
  • Inline barrel shifter, leading to more complex instructions
  • 16-bit Thumb instructions
  • Conditional execution
  • DSP instructions

Processor Modes

Each mode has access to its own stack space and a different subset of registers - banked registers.   
  • User mode: The only non-privileged mode.
  • FIQ mode: A privileged mode that is entered whenever the processor accepts a fast interrupt request.
  • IRQ mode: A privileged mode that is entered whenever the processor accepts an interrupt.
  • Supervisor (svc) mode: A privileged mode entered whenever the CPU is reset or when an SVC instruction is executed.
  • Abort mode: A privileged mode that is entered whenever a prefetch abort or data abort exception occurs.
  • Undefined mode: A privileged mode that is entered whenever an undefined instruction exception occurs.
  • System mode (ARMv4 and above): The only privileged mode that is not entered by an exception. It can only be entered by executing an instruction that explicitly writes to the mode bits of the Current Program Status Register (CPSR) from another privileged mode (not from user mode).
  • Monitor mode (ARMv6 and ARMv7 Security Extensions, ARMv8 EL3): A monitor mode is introduced to support TrustZone extension in ARM cores.
  • Hyp mode (ARMv7 Virtualization Extensions, ARMv8 EL2): A hypervisor mode that supports virtualization.
  • Thread mode (ARMv6-M, ARMv7-M, ARMv8-M): A mode that can be configured as either privileged or unprivileged. Privileged code also selects, through the CONTROL register, whether Thread mode uses the Main Stack Pointer (MSP) or the Process Stack Pointer (PSP). This mode is designed for user tasks in an RTOS environment, but in bare-metal systems it typically runs the super-loop.
  • Handler mode (ARMv6-M, ARMv7-M, ARMv8-M): A mode dedicated to exception handling (except reset, which is handled in Thread mode). Handler mode always uses the MSP and runs at the privileged level.
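For the A-profile modes in the list above, the current mode is encoded in the low five bits (M[4:0]) of the CPSR. The sketch below decodes those bits; the helper names `mode_name` and `is_privileged` are made up for this example, and the M-profile Thread and Handler modes are not covered because they do not use this encoding.

```python
# CPSR.M[4:0] encodings for the ARMv7-A processor modes.
MODE_BITS = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10110: "Monitor",
    0b10111: "Abort",
    0b11010: "Hyp",
    0b11011: "Undefined",
    0b11111: "System",
}
USER_MODE = 0b10000


def mode_name(cpsr):
    """Return the mode named by the low five bits of a CPSR value."""
    return MODE_BITS.get(cpsr & 0x1F, "reserved")


def is_privileged(cpsr):
    """Every valid mode except User is privileged."""
    mode = cpsr & 0x1F
    return mode in MODE_BITS and mode != USER_MODE


print(mode_name(0x600001D3), is_privileged(0x600001D3))
```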

Register organization




Interrupts


AArch64


  • Fixed 32-bit instruction width
  • 31 64-bit GP registers, X0-X30, with 32-bit subregisters W0-W30, plus SP, PC, and the zero register (ZR)
  • FPU with 32 registers, each 128 bits wide
  • 64-bit addresses
  • Paired load/store, no STM/LDM


References

  1. https://www.youtube.com/watch?v=7LqPJGnBPMM
  2. https://www.csie.ntu.edu.tw/~cyy/courses/assembly/12fall/lectures/handouts/lec08_ARMarch.pdf
  3. https://en.wikipedia.org/wiki/ARM_architecture
  4. https://events.static.linuxfound.org/images/stories/pdf/lcna_co2012_marinas.pdf

Sunday, May 20, 2018

Device and I/O Virtualization

Device Emulation

All I/O operations, e.g. PIO, MMIO, and DMA, are privileged and trap, which makes I/O naturally virtualizable with the trap-and-emulate technique. This approach requires many guest-host switches and usually performs poorly.

Paravirtualized I/O

In a modern operating system environment, it is possible to install device drivers that communicate directly with the hypervisor's emulation code. With this approach, the front-end driver in a guest VM forwards I/O requests to the back-end driver in the host, which issues them to the hardware through the native driver. Paravirtualized I/O reduces the number of context switches and yields major performance improvements.
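The front-end/back-end split can be sketched as two objects sharing a request queue. This is a toy model rather than any real paravirtual interface: `SharedRing`, `FrontEnd`, `BackEnd`, and `fake_native_driver` are hypothetical names, and the `notifications` counter merely stands in for guest-to-host switches.

```python
from collections import deque


class SharedRing:
    """Memory shared between the guest front end and the host back end."""

    def __init__(self):
        self.requests = deque()
        self.responses = deque()
        self.notifications = 0       # each one stands for a guest-host switch


class FrontEnd:
    """Guest-side driver: queues requests and notifies the host once."""

    def __init__(self, ring):
        self.ring = ring

    def submit(self, batch):
        self.ring.requests.extend(batch)
        self.ring.notifications += 1  # one notification for the whole batch


class BackEnd:
    """Host-side driver: drains the ring through the native driver."""

    def __init__(self, ring, native_driver):
        self.ring = ring
        self.native_driver = native_driver

    def process(self):
        while self.ring.requests:
            request = self.ring.requests.popleft()
            self.ring.responses.append(self.native_driver(request))


def fake_native_driver(request):
    # Stand-in for real hardware access through the host's native driver.
    return ("done", request)


ring = SharedRing()
FrontEnd(ring).submit(["read blk 0", "read blk 1", "write blk 7"])
BackEnd(ring, fake_native_driver).process()
print(ring.notifications, list(ring.responses))
```

With pure device emulation, each of the three requests would typically cost several traps on individual register accesses; in this model the whole batch costs one notification.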

Direct Device Assignment

Device assignment allows a guest OS to access the underlying device directly. This approach is often used when a device is used by only one VM. However, even with device assignment, the host intercepts all interrupts, so there is still a performance cost.

IOMMU

Direct device assignment is vulnerable to DMA attacks. DMA is carried out with machine addresses, which means a device assigned to one VM can access another VM's machine memory. An IOMMU secures direct I/O access by presenting a virtual address space to the I/O device. Without IOMMU support, every DMA request must be monitored by applying memory protection to the DMA descriptor region.
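To make the protection concrete, here is a minimal sketch of an IOMMU as a per-device translation table: a DMA access succeeds only if the hypervisor mapped that I/O page for that device. The `Iommu` class, the device names, and the 4 KiB page size are assumptions for illustration; real IOMMUs use multi-level tables and permission bits.

```python
PAGE_SHIFT = 12                      # 4 KiB pages, assumed for illustration


class DmaFault(Exception):
    """Raised when a device touches an address outside its own domain."""


class Iommu:
    def __init__(self):
        self.domains = {}            # device id -> {io page: machine page}

    def map(self, device, io_page, machine_page):
        # Called by the hypervisor when it assigns memory to a device.
        self.domains.setdefault(device, {})[io_page] = machine_page

    def translate(self, device, io_addr):
        page = io_addr >> PAGE_SHIFT
        offset = io_addr & ((1 << PAGE_SHIFT) - 1)
        table = self.domains.get(device, {})
        if page not in table:
            raise DmaFault(f"{device}: blocked DMA to {io_addr:#x}")
        return (table[page] << PAGE_SHIFT) | offset


iommu = Iommu()
iommu.map("nic-of-vm1", 0x10, 0x80234)   # only VM 1's buffer is mapped
print(hex(iommu.translate("nic-of-vm1", 0x10040)))
```

A device assigned to another VM, or the same device using an unmapped address, gets a `DmaFault` instead of reaching machine memory.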

References


  1. https://www.slideshare.net/HwanjuKim/5io-virtualization
  2. https://compas.cs.stonybrook.edu/~nhonarmand/courses/sp17/cse506/slides/io_virtualization.pdf
  3. Yassour, Ben-Ami & Ben-Yehuda, Muli & Wasserman, Orit. (2008). Direct device assignment for untrusted fully-virtualized virtual machines. 
  4. https://queue.acm.org/detail.cfm?id=2071256

Thursday, May 17, 2018

Interrupt Virtualization

I/O activity is a key performance factor in a virtualized environment. While a CPU running in a virtual machine can achieve 90-98% of native performance, I/O can only reach 40-75%. Even with direct device assignment, devices are still unable to reach bare-metal performance because the hypervisor/host intercepts all interrupts, including those generated by assigned devices to signal the completion of I/O requests to the guest OS. The overhead includes guest-host context switches, which significantly degrade the performance of I/O-intensive applications. It is possible to adjust the devices and their drivers to generate fewer interrupts; however, doing so may have a negative effect on latency and throughput.

Full Software-based Virtualization

In a fully software-based environment, each virtual CPU (VCPU) has a virtualized LAPIC associated with it. The virtual LAPIC emulates the LAPIC registers and operations.

x86 Interrupt Virtualization

x86 hardware virtualization provides two operating modes: guest mode and host mode. The host runs in host mode and creates the context for the guest. Each mode uses its own Interrupt Descriptor Table (IDT). A device can raise an interrupt while the CPU is running in either guest or host mode. If the CPU is running in guest mode, the CPU forces an exit and delivers the interrupt to the host. The host may then inject virtual interrupts into the guest.

It is also possible to assign physical interrupts to the guest operating system. However, with the current x86 virtualization implementation, either all or none of the physical interrupts are delivered to the currently running guest.

ARM Interrupt Virtualization

In the ARM architecture, the Generic Interrupt Controller (GIC) handles the priority and distribution of all interrupts coming into the system. The GIC is programmed through MMIO accesses. The hypervisor reroutes interrupts to the correct VM and sets up the corresponding virtual CPU interface of the GIC.

In GICv2, the virtual CPU interface allows a guest to acknowledge (ACK) an IRQ and signal end of interrupt (EOI) without a VM exit. The hypervisor sets up virtual IRQs in List Registers (LRs). GICv4 allows direct injection of virtual LPIs: software describes to the Interrupt Translation Service (ITS) how physical events map to virtual interrupts.
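The List Register flow can be approximated with a small state machine: the hypervisor places a virtual IRQ in a free List Register as pending, and the guest's acknowledge and EOI then change that entry's state without involving the hypervisor. This is a simplified sketch, not the GIC programming interface: the `VirtualCpuInterface` class is hypothetical, four List Registers is an assumed count, and priorities and maintenance interrupts are ignored.

```python
SPURIOUS_ID = 1023                   # ID the GIC returns when nothing is pending


class VirtualCpuInterface:
    """Toy model of the List Registers behind a GICv2 virtual CPU interface."""

    def __init__(self, num_list_registers=4):
        # Each slot is None (free) or a [virq, state] pair.
        self.list_registers = [None] * num_list_registers

    def inject(self, virq):
        """Hypervisor side: place a virtual IRQ in a free List Register."""
        for i, slot in enumerate(self.list_registers):
            if slot is None:
                self.list_registers[i] = [virq, "pending"]
                return True
        return False                 # all in use; a real hypervisor would queue it

    def acknowledge(self):
        """Guest side: ACK the first pending virtual IRQ, with no VM exit."""
        for slot in self.list_registers:
            if slot is not None and slot[1] == "pending":
                slot[1] = "active"
                return slot[0]
        return SPURIOUS_ID

    def end_of_interrupt(self, virq):
        """Guest side: EOI frees the List Register, again with no VM exit."""
        for i, slot in enumerate(self.list_registers):
            if slot is not None and slot[0] == virq and slot[1] == "active":
                self.list_registers[i] = None
                return


vgic = VirtualCpuInterface()
vgic.inject(42)                      # hypervisor forwards IRQ 42 to the guest
irq = vgic.acknowledge()             # guest ISR reads the pending interrupt
vgic.end_of_interrupt(irq)           # guest ISR completes it
print(irq, vgic.list_registers)
```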

References


  1. ELI: Bare-Metal Performance for I/O Virtualization. Abel Gordon, Nadav Amit, Nadav Har'El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster.
  2. https://sites.google.com/site/masumzh/articles/hypervisor-based-virtualization/io-and-interrupt-virtualization
  3. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0176c/ar01s03s01.html
  4. http://schd.ws/hosted_files/xendeveloperanddesignsummit2017/3e/arm_vgic_xensummit_2017.pdf
  5. http://infocenter.arm.com/help/topic/com.arm.doc.dai0492b/GICv3_Software_Overview_Official_Release_B.pdf

Monday, May 14, 2018

Memory Virtualization

Virtual Memory

Virtual memory, implemented in both hardware and software, is a memory management technique that maps memory addresses, called virtual or logical addresses, onto physical memory in the computer hardware.

The benefits of virtual memory include freeing applications from managing shared memory, increased security due to memory isolation, and the ability to use more memory than is physically present by paging.

VM Memory Virtualization

The VMM/hypervisor manages machine memory and assigns part of it to virtual machines. The guest OS continues to control the mapping from virtual address (VA) to guest physical address (PA); however, it doesn't have access to the underlying machine memory, which is addressed by machine address (MA). The VMM is responsible for the PA-to-MA mapping.

Shadow Page Table

Shadow page tables map guest virtual addresses (VA) directly to machine addresses (MA). The VMM manages both a virtual PTBR and the real PTBR, which is known as MMU virtualization. When the guest OS modifies the PTBR, the write is intercepted by the VMM for emulation.

Each process on the guest OS consumes two page tables: one in the guest OS and the other its shadow. A page fault caused by the guest OS launches a walk of these tables, which adds considerable overhead.
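A shadow table is, in effect, the composition of the guest's VA-to-PA table with the VMM's PA-to-MA table. The sketch below rebuilds it whenever the guest's PTBR write traps. The names `build_shadow` and `ShadowMmu` are hypothetical, the tables are flat dictionaries of page numbers, and a real VMM would update the shadow incrementally rather than rebuild it.

```python
def build_shadow(guest_table, vmm_table):
    """Compose guest VA->PA with VMM PA->MA into a shadow VA->MA table."""
    return {
        va: vmm_table[pa]
        for va, pa in guest_table.items()
        if pa in vmm_table           # skip pages the VMM has not backed
    }


class ShadowMmu:
    def __init__(self, vmm_table):
        self.vmm_table = vmm_table   # PA page -> MA page, owned by the VMM
        self.shadow = {}             # VA page -> MA page, used by the real MMU
        self.traps = 0

    def guest_sets_ptbr(self, guest_table):
        # The guest's PTBR write is intercepted; the VMM emulates it by
        # rebuilding the shadow table that the hardware actually walks.
        self.traps += 1
        self.shadow = build_shadow(guest_table, self.vmm_table)


mmu = ShadowMmu({0x2: 0x8A, 0x3: 0x8B})
mmu.guest_sets_ptbr({0x10: 0x2, 0x11: 0x3})
print(mmu.shadow, mmu.traps)
```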

Hardware-Assisted Memory Virtualization

Some CPUs introduce two-level page address translation: the first level of page tables stores guest virtual-to-physical translations while the second level stores guest physical-to-machine translations. Hardware-assisted memory virtualization eliminates the overhead of software memory virtualization and is preferable for applications that incur a large number of page table misses when executing.

References

1. https://www.d.umn.edu/~gshute/os/virtual-memory.xhtml
2. https://pubs.vmware.com/vsphere-51/index.jsp#com.vmware.vsphere.resmgmt.doc/GUID-69CDC049-8B42-4D26-8B47-94961B1777A4.html
3. http://www.cs.nthu.edu.tw/~ychung/slides/Virtualization/VM-Lecture-2-2-SystemVirtualizationMemory.pptx

Saturday, May 12, 2018

Popek and Goldberg Requirements for Virtualization

VMM Definition

Three fundamental requirements have to be met when a virtual machine monitor creates a virtual environment that provides the abstraction of a virtual machine:

  • Equivalence: The virtual hardware needs to be sufficiently equivalent to the underlying hardware.
  • Safety: The virtual machine is completely isolated from other virtual machines and from the virtual machine monitor.
  • Performance: The overhead of virtualization must be sufficiently small.

Instruction Classification

  • Privileged instructions: those that trap in user mode and do not trap in kernel mode
  • Control-sensitive instructions: those that change control state of the architecture
  • Behavior-sensitive instructions: those whose behavior depends on the configuration of resources

Theorem

If the set of sensitive instructions, i.e. the union of control-sensitive and behavior-sensitive instructions, is a subset of the privileged instructions, a virtual machine monitor can be constructed.

The theorem provides a simple technique for implementing a VMM, trap-and-emulate virtualization: all sensitive instructions always trap and pass control to the VMM, while non-privileged instructions execute natively.
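The condition in the theorem is a plain subset test, which the sketch below applies to a tiny, hand-picked instruction list. The classification is an illustrative assumption, not a complete ISA model; it uses POPF as the sensitive-but-unprivileged case because on classic x86 it silently ignores interrupt-flag changes in user mode instead of trapping.

```python
def is_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction must trap."""
    return sensitive <= privileged


def offending(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return sensitive - privileged


# Toy classification of a few classic x86 instructions.
privileged = {"lgdt", "hlt", "mov_to_cr3"}
control_sensitive = {"lgdt", "mov_to_cr3", "popf"}
behavior_sensitive = {"popf"}
sensitive = control_sensitive | behavior_sensitive

print(is_virtualizable(sensitive, privileged), offending(sensitive, privileged))
```

A non-empty result from `offending` is exactly the set of instructions that trap-and-emulate alone cannot handle.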

References

1. https://en.wikipedia.org/wiki/Popek_and_Goldberg_virtualization_requirements
2. Popek, G. J.; Goldberg, R. P. (July 1974). "Formal requirements for virtualizable third generation architectures". Communications of the ACM. 17 (7): 412–421. doi:10.1145/361011.361073.

Wednesday, May 9, 2018

CPU Virtualization

The Challenge 

A CPU usually offers several privilege levels that operating systems and applications use to manage access to computer hardware, e.g. x86 has 4 rings. Virtualizing a CPU architecture requires placing a virtualization layer, running at the most privileged level, to create and manage the virtual machines. Some sensitive instructions have different semantics when they are not executed at the most privileged level. Trapping and translating these sensitive and privileged instructions at runtime was a challenge.

Solutions

Full virtualization

Full virtualization uses a combination of binary translation and direct execution. Binary translation translates kernel code on the fly, replacing nonvirtualizable instructions with new sequences of instructions that have the intended effect on the virtual hardware. User-level code is executed directly.
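The replacement step can be sketched on textual instructions: anything on a list of nonvirtualizable opcodes is swapped for a call into the VMM, and everything else passes through unchanged. Real translators work on machine code and cache translated blocks; the opcode list and the `vmm_emulate_*` targets here are hypothetical.

```python
# Hypothetical mapping from nonvirtualizable opcodes to VMM emulation calls.
NONVIRTUALIZABLE = {
    "popf": "call vmm_emulate_popf",
    "sgdt": "call vmm_emulate_sgdt",
}


def translate_block(instructions):
    """Return a copy of a kernel code block that is safe to run directly."""
    translated = []
    for insn in instructions:
        parts = insn.split()
        opcode = parts[0] if parts else ""
        # Rewrite only the instructions that would misbehave unprivileged;
        # operands are dropped here because the toy emulation calls take none.
        translated.append(NONVIRTUALIZABLE.get(opcode, insn))
    return translated


kernel_block = ["mov eax, 1", "popf", "ret"]
print(translate_block(kernel_block))
```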

Full virtualization requires no modification of guest OS. It simplifies migration and portability.

Examples: VMware, Microsoft Virtual Server

Paravirtualization

With paravirtualization, the guest OS kernel is modified to replace nonvirtualizable instructions with hypercalls that communicate directly with the virtualization layer, the hypervisor. The hypervisor also provides hypercall interfaces for other critical kernel operations.

Paravirtualization does not support unmodified guest OS kernels and thus provides poor compatibility and portability. Support and maintainability are also an issue in production environments, as it requires deep kernel modifications.

Examples: Xen


Hardware assisted virtualization

CPU vendors also support virtualization in hardware with a new CPU privilege mode, hypervisor mode. Privileged and sensitive instructions are set to automatically trap to the hypervisor.

Examples: Intel VT-x, AMD-V, ARM Virtualization Extensions

References

https://www.vmware.com/techpapers/2007/understanding-full-virtualization-paravirtualizat-1008.html