Towards Support for Heterogeneous Architectures on Managed Runtime Systems

Virtual Machine Meetup

October 12, 2017

Transcript

  1. Towards Support for Heterogeneous Architectures on Managed Runtime Systems
     Andy Nisbet, Research Fellow, Advanced Processor Technologies (APT) Group, School of Computer Science, University of Manchester, UK
  2. Future Computing Challenges
     • Greater performance demands
       – within lower power budgets
     • Emergent diverse computational workloads
       – Computer vision, machine learning, big data
     • Increasing architecture heterogeneity/diversity
  3. Meeting Computing Challenges
     • Need vertical full-stack optimisation to meet the demand for increased performance within lower power budgets
     • Full-stack optimisation
       – provide abstractions and mechanisms for efficient exploitation of hardware diversity
  4. Traditional VMs and Heterogeneity
     • Write once, run anywhere (WORA) on heterogeneous platforms?
     • CPUs: heterogeneity in power and performance
     • GPUs: programmable compute accelerators
     • FPGAs: fine-grained grid of logic
     [Figure: programming, optimisation & development effort, from lowest (CPUs) through GPUs to highest (FPGAs, long “compilation”)]
  5. Traditional VMs and Heterogeneity
     • Write once, run anywhere (WORA) does not work at the moment with heterogeneous diversity!
     • We need abstractions & mechanisms to enable efficient exploitation of diversity
     • We need tools to help identify “vertical stack” optimisation opportunities
     • We need co-design (modifying/developing sw/hw) to evaluate different sw/hw co-operations and optimise for these opportunities
  6. Project BeeHive
     • Provide abstractions & mechanisms for efficient vertical stack exploitation routes
     • Tools to make it feasible to identify optimisation opportunities
       – simulation/profiling
     • To prototype and implement solutions
       – Full stack solutions, from application down to hardware
       – BeeHive will reduce the level of expert knowledge required to perform full stack optimisation
     • To investigate co-design tradeoffs between hardware versus software solutions
     • https://github.com/beehive-lab
  7. Java/OpenCL Acceleration Overview
     • Enables Java to target architectures with OpenCL compute devices (James Clarkson) [1]
     • JIT-generated OpenCL C from Java bytecode
     [1] Heterogeneous Managed Runtime Systems: A Computer Vision Case Study, VEE16
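The slide does not show what kind of Java code is a candidate for JIT translation to OpenCL C. As a rough illustration only, the sketch below is plain Java with no accelerator API. The class name `VectorScale`, the method `scaleAdd`, and the OpenCL C kernel shown in the comment are hypothetical and are not taken from the Beehive code base. It shows a data-parallel loop whose iterations are independent, which is the shape that maps naturally onto one OpenCL work-item per index.

```java
import java.util.Arrays;

// Hypothetical example class; not part of any Beehive project.
final class VectorScale {

    // Each iteration reads and writes only index i, so iterations are
    // independent and could in principle run as separate work-items.
    static void scaleAdd(float alpha, float[] x, float[] y, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = alpha * x[i] + y[i];
        }
    }

    // A hand-written OpenCL C equivalent (an assumption about what a
    // generated kernel might resemble, not actual compiler output):
    //
    //   __kernel void scaleAdd(float alpha,
    //                          __global const float* x,
    //                          __global const float* y,
    //                          __global float* out) {
    //       int i = get_global_id(0);
    //       out[i] = alpha * x[i] + y[i];
    //   }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {10f, 20f, 30f};
        float[] out = new float[3];
        scaleAdd(2f, x, y, out);
        System.out.println(Arrays.toString(out)); // [12.0, 24.0, 36.0]
    }
}
```

The loop body becomes the kernel body and the loop index becomes the global work-item id; loops with cross-iteration dependencies would not translate this directly.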
  8. APTsim: co-design Java/FPGAs
     • MAST: simple mechanism to use FPGAs [1]
       – Evaluate entire end-to-end usage of FPGAs in a system, not just trace simulation
       – Catch design errors as early as possible
       – 43x speedup on OpenJDK for pre-processing stages of kfusion SLAM via JNI
     • Fast event-driven simulation of Java hotpath
       – Instrumented execution generates events
       – FPGA models produce statistics/timing information
     [Figure: MAST, the scaling and bilateral filter pre-processing stages]
     [1] Heterogeneous Managed Runtime Systems: A Computer Vision Case Study, VEE16
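The event flow described on this slide (instrumented execution emits events, a model consumes them and reports timing) can be sketched in software. This is a minimal illustration under stated assumptions, not APTsim itself: the class `HotpathEventSim`, the event kinds, and the per-event cycle costs (4, 2, 1) are all invented for the example, and the "model" here is ordinary Java rather than an FPGA design.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; names and latencies are illustrative assumptions.
final class HotpathEventSim {

    enum Kind { LOAD, STORE, ALU }

    static final class Event {
        final Kind kind;
        Event(Kind kind) { this.kind = kind; }
    }

    // Stand-in for a hardware model: charges a fixed cost per event kind.
    static final class TimingModel {
        long cycles;
        long events;

        void consume(Event e) {
            events++;
            switch (e.kind) {
                case LOAD:  cycles += 4; break; // assumed load latency
                case STORE: cycles += 2; break; // assumed store latency
                default:    cycles += 1;        // assumed ALU latency
            }
        }
    }

    // "Instrumented" hot path: does the real work and emits one LOAD and
    // one ALU event per array element.
    static int instrumentedSum(int[] data, Queue<Event> out) {
        int sum = 0;
        for (int v : data) {
            out.add(new Event(Kind.LOAD));
            sum += v;
            out.add(new Event(Kind.ALU));
        }
        return sum;
    }

    // Runs the hot path, then drains the event queue into the model.
    static TimingModel run(int[] data) {
        Queue<Event> queue = new ArrayDeque<>();
        instrumentedSum(data, queue);
        TimingModel model = new TimingModel();
        while (!queue.isEmpty()) {
            model.consume(queue.remove());
        }
        return model;
    }

    public static void main(String[] args) {
        TimingModel m = run(new int[] {1, 2, 3});
        System.out.println(m.events + " events, " + m.cycles + " cycles");
    }
}
```

Separating event generation from the timing model is what lets the model be swapped out, for example moved onto an FPGA, without touching the instrumented code.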
  9. APTsim: Hw/Sw Codesign (See [2])
     MAST is the glue that enables easy exploitation of FPGA resources
     • Acceleration
     • Simulation
     [2] The potential of dynamic binary modification and CPU/FPGA SoCs for simulation, FCCM17
  10. Java/FPGA Accelerators Roadmap
      • Earlier slides
        – manual use of JNI and MAST [1]
      • Java/OpenCL: high-level synthesis to FPGA
        – Can already use the mechanism of JNI & MAST
        – Need abstractions & mechanisms to use MAST FPGA accelerators directly & automatically
      • Accelerators for SLAM computer vision
      [1] Heterogeneous Managed Runtime Systems: A Computer Vision Case Study, VEE16
  11. MaxSim/Co-Design
      • Relates micro-architectural simulation events to VM and language features
      • Enables research, development & evaluation of novel hardware extensions
        – Codesign for power & performance analysis of new VM features
      [3] MaxSim: A Simulation Platform for Managed Applications, Rodchenko et al., ISPASS17 (best paper award)
      [4] Type Information Elimination from Objects on Architectures with Tagged Pointers Support, Rodchenko et al., IEEE Trans. Comp. (in press)
  12. MaxSim/Codesign
      • Evaluate co-design benefits
        – VM/language aspects, max heap size
        – Performance and power with McPAT
      • Demonstrates use-cases of pointer tagging
      • Lightweight micro-architectural profiling
        – Pointer tags used to store class type or object allocation site information (site estimation and class ID)
        – Fast simulation of microarchitecture performance and power estimation of all DaCapo benchmarks in < 1 day
      • Profiling to identify co-design opportunities
        – Dynamic load elimination for array length retrieval
        – Store array length in pointer tag, use hardware support
      • Supports object field layout optimisations
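Pointer tagging, as used above for class IDs and array lengths, can be illustrated with plain bit manipulation. The sketch below is an assumption-laden illustration, not MaxSim's actual layout: it assumes a 48-bit address in a 64-bit word with the upper 16 bits free for a tag, and the class `TaggedPointer` and its methods are hypothetical. Real hardware support would ignore or strip the tag on dereference; here that step is done in software.

```java
// Hypothetical sketch of pointer tagging on a 64-bit word.
final class TaggedPointer {

    // Assumption: addresses fit in the low 48 bits, leaving 16 tag bits.
    static final int ADDRESS_BITS = 48;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final int MAX_TAG = 0xFFFF;

    // Packs a small value (e.g. a class ID or a short array length)
    // into the unused upper bits of the address.
    static long tag(long address, int tag) {
        if ((address & ~ADDRESS_MASK) != 0) {
            throw new IllegalArgumentException("address exceeds 48 bits");
        }
        if (tag < 0 || tag > MAX_TAG) {
            throw new IllegalArgumentException("tag exceeds 16 bits");
        }
        return ((long) tag << ADDRESS_BITS) | address;
    }

    // Reads the tag without touching memory: this is the load that the
    // slide's "dynamic load elimination" avoids.
    static int tagOf(long tagged) {
        return (int) (tagged >>> ADDRESS_BITS);
    }

    // Recovers the plain address for dereferencing.
    static long addressOf(long tagged) {
        return tagged & ADDRESS_MASK;
    }

    public static void main(String[] args) {
        long ref = tag(0x7f0012345678L, 42);
        System.out.println("tag     = " + tagOf(ref));
        System.out.println("address = 0x" + Long.toHexString(addressOf(ref)));
    }
}
```

With the length or class ID carried in the reference itself, a bounds check or type check needs no extra memory access, at the cost of limiting the value to the available tag bits.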
  13. MaxSim/CoDesign
      • Micro-architecture characterization (statistics)
        – L2/L3 cache misses per kilo-instruction
        – Energy spent in GC versus non-GC
        – Instructions per clock (IPC) …
      • Fine-grained text output (see Rodchenko’s thesis)
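The statistics listed on this slide are simple ratios over raw counters. The sketch below shows how they could be derived; the class `MicroArchStats`, its method names, and all counter values in `main` are illustrative assumptions and are not results from the talk or from MaxSim's output format.

```java
// Hypothetical helper; counter values below are made up for illustration.
final class MicroArchStats {

    // Cache misses per kilo-instruction (MPKI) = misses * 1000 / instructions.
    static double missesPerKiloInstruction(long misses, long instructions) {
        if (instructions <= 0) {
            throw new IllegalArgumentException("instructions must be positive");
        }
        return misses * 1000.0 / instructions;
    }

    // Instructions per clock (IPC) = instructions / cycles.
    static double instructionsPerClock(long instructions, long cycles) {
        if (cycles <= 0) {
            throw new IllegalArgumentException("cycles must be positive");
        }
        return (double) instructions / cycles;
    }

    // Fraction of total energy spent in GC, given GC and non-GC energy.
    static double gcEnergyShare(double gcJoules, double nonGcJoules) {
        double total = gcJoules + nonGcJoules;
        if (total <= 0) {
            throw new IllegalArgumentException("total energy must be positive");
        }
        return gcJoules / total;
    }

    public static void main(String[] args) {
        long instructions = 2_000_000L; // assumed counter value
        long cycles = 1_600_000L;       // assumed counter value
        long l2Misses = 9_000L;         // assumed counter value

        System.out.println("L2 MPKI  = " + missesPerKiloInstruction(l2Misses, instructions));
        System.out.println("IPC      = " + instructionsPerClock(instructions, cycles));
        System.out.println("GC share = " + gcEnergyShare(2.0, 6.0));
    }
}
```

Normalising misses per thousand instructions makes workloads of different lengths comparable, which is why it is a common way to report cache behaviour.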
  14. MaxSim
      • HSS using class information pointer elimination via pointer tagging
      • Array length elimination using tagging
  15. Underlying MaxineVM Status/Stability
      • MaxineVM simplicity/flexibility means it is feasible to prototype new ideas
        – APTsim hotpath instrumentation
        – Integration with ZSim to enable MaxSim
      • x86-64, ARMv7 32-bit
      • SPECjvm2008 and DaCapo-9.12-bach
      • x86-64 inspector: examine runtime VM state
      • ARMv7 low-level debugging support: method-id with gdb (“to appear in VMIL17”)
      • Performance circa 2x slower than HotSpot
  16. Beehive: https://github.com/beehive-lab
      • MaxineVM
        – x86-64, ARMv7, ARMv8 in progress …
        – Updating to JDK8/Graal for performance boost
      • OpenCL for heterogeneous execution
        – GPUs and FPGAs in progress
      • MaxSim: MaxineVM + ZSim
        – Software simulation/codesign for managed workloads
      • MAMBO (Pin-like tool for ARM)
        – Dynamic binary instrumentation for ARMv7 & ARMv8
      • APTsim = MAST + instrumentation
        – FPGA accelerated simulation
        – MAST support for FPGA IP/accelerators
  17. Team
      • Compilers & Runtimes: Ioanna-Maria Alifieraki, James Clarkson, Timothy Hartley, Andy Nisbet, Andrey Rodchenko, Foivos Zakkak, Cosmin Gorgovan
      • Hardware & Simulators: Nikos Foutris, Will Toms, Oscar Palomar, John Mawer
      • Academic Staff: Javier Navaridas, Christos Kotselidis, Jim Garside, Antoniu Pop, John Goodacre, Mikel Lujan
      • FPGAs: John Mawer, Oscar Palomar, Athanasios Stratikopoulos