Andy Nisbet, Research Fellow
Advanced Processor Technologies (APT) Group
School of Computer Science, University of Manchester, UK
power budgets
• Emergent diverse computational workloads
  – Computer vision, machine learning, big data
• Increasing architecture heterogeneity/diversity
meet the demand for increased performance and lower power budgets
• Full stack optimisation
  – Provide abstractions and mechanisms for efficient exploitation of hardware diversity
doesn’t work at the moment with heterogeneous diversity!
• We need abstractions & mechanisms to enable efficient exploitation of diversity
• We need tools to help identify “vertical stack” optimisation opportunities
• We need codesign (modifying/developing software and hardware) to evaluate different software/hardware co-operations that optimise for these opportunities
stack exploitation routes
• Tools to make it feasible to identify optimisation opportunities
  – Simulation/profiling
• To prototype and implement solutions
  – Full stack solutions, from application down to hardware
  – BeeHive will reduce the level of expert knowledge required to perform full stack optimisation
• To investigate co-design tradeoffs between hardware and software solutions
• https://github.com/beehive-lab
FPGAs [1]
  – Evaluate entire end-to-end usage of FPGAs in a system, not just trace simulation
  – Catch design errors as early as possible
  – 43x speedup on OpenJDK for pre-processing stages of kfusion SLAM via JNI
• Fast event-driven simulation of Java hotpath
  – Instrumented execution generates events
  – FPGA models produce statistics/timing information
[1] Heterogeneous Managed Runtime Systems: A Computer Vision Case Study, VEE16
MAST the scaling and bilateral filter pre-processing stages
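The event-driven simulation bullets above (instrumented execution generates events, FPGA models turn them into statistics and timing information) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Event` fields, the `AcceleratorModel` class, and its latency formula (fixed setup cost plus transfer time) are hypothetical and are not taken from MAST or the BeeHive tools.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """One event emitted by instrumented code (hypothetical format)."""
    time_ns: int                                   # simulated time the event was issued
    kind: str = field(compare=False)               # e.g. "accel_call"
    payload_bytes: int = field(compare=False, default=0)


class AcceleratorModel:
    """Toy timing model: fixed setup cost plus payload / bandwidth (assumed numbers)."""

    def __init__(self, setup_ns, bytes_per_ns):
        self.setup_ns = setup_ns
        self.bytes_per_ns = bytes_per_ns
        self.busy_until_ns = 0
        self.calls = 0
        self.total_busy_ns = 0

    def handle(self, event):
        # A request waits if the accelerator is still busy with an earlier one.
        start = max(event.time_ns, self.busy_until_ns)
        duration = self.setup_ns + event.payload_bytes // self.bytes_per_ns
        self.busy_until_ns = start + duration
        self.calls += 1
        self.total_busy_ns += duration
        return self.busy_until_ns


def simulate(events, model):
    """Process events in timestamp order and return summary statistics."""
    queue = list(events)
    heapq.heapify(queue)              # ordered by time_ns only
    finish_ns = 0
    while queue:
        finish_ns = max(finish_ns, model.handle(heapq.heappop(queue)))
    return {"calls": model.calls,
            "busy_ns": model.total_busy_ns,
            "finish_ns": finish_ns}
```

A run would feed the events recorded during instrumented execution into `simulate` together with one model instance. Swapping in a different model class is how alternative hardware designs could be compared without changing the instrumented application.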
that enables easy exploitation of FPGA resources
• Acceleration
• Simulation
[2] The potential of dynamic binary modification and CPU/FPGA SoCs for simulation, FCCM17
MAST [1]
• Java/OpenCL high-level synthesis to FPGA
  – Can already use the JNI & MAST mechanism
  – Need abstractions & mechanisms to use MAST FPGA accelerators directly & automatically
• Accelerators for SLAM computer vision
[1] Heterogeneous Managed Runtime Systems: A Computer Vision Case Study, VEE16
features
• Enables research, development & evaluation of novel hardware extensions
  – Codesign for power & performance analysis of new VM features
[3] MaxSim: A Simulation Platform for Managed Applications, Rodchenko et al., ISPASS17 (best paper award)
[4] Type Information Elimination from Objects on Architectures with Tagged Pointers Support, Rodchenko et al., IEEE Trans. Comp. (in press)
size
  – Performance and power with McPAT
• Demonstrates use-cases of pointer tagging
• Lightweight micro-architectural profiling
  – Pointer tags used to store class type or object allocation site information (site estimation and class ID)
  – Fast simulation of microarchitecture performance and power estimation of all DaCapo benchmarks in < 1 day
• Profiling to identify co-design opportunities
  – Dynamic load elimination for array length retrieval
  – Store array length in pointer tag, use hardware support
• Supports object field layout optimisations
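The pointer-tag bullets above (class ID or array length kept in the pointer itself, so a header load can be skipped) can be sketched as bit manipulation on a 64-bit reference. The layout used here is an assumption for illustration only: a 16-bit tag in the upper bits above a 48-bit address, with the tag value `0xFFFF` reserved to mean "length does not fit, load it from the object header". The slide does not state the tag width or encoding that MaxSim uses, and the `heap` dictionary is a stand-in for memory.

```python
TAG_BITS = 16                       # assumed tag width
ADDR_BITS = 48                      # assumed address width
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1
LENGTH_IN_HEADER = TAG_MASK         # reserved tag: fall back to a memory load


def tag_pointer(address, tag):
    """Pack a tag into the unused upper bits of a reference."""
    if address & ~ADDR_MASK:
        raise ValueError("address does not fit in the assumed address bits")
    if tag & ~TAG_MASK:
        raise ValueError("tag does not fit in the assumed tag bits")
    return (tag << ADDR_BITS) | address


def pointer_tag(pointer):
    return (pointer >> ADDR_BITS) & TAG_MASK


def pointer_address(pointer):
    return pointer & ADDR_MASK


def tag_array_ref(address, length):
    """Store the array length in the tag when it is small enough."""
    return tag_pointer(address, min(length, LENGTH_IN_HEADER))


def array_length(pointer, heap):
    """Return (length, loaded_from_memory) for a tagged array reference."""
    tag = pointer_tag(pointer)
    if tag != LENGTH_IN_HEADER:
        return tag, False                        # load eliminated
    return heap[pointer_address(pointer)], True  # fall back to the header
```

In this sketch a short array answers a length query without touching `heap`, which is the load that the profiling bullet proposes to eliminate. The same `tag_pointer` helper could carry a class ID or allocation-site ID instead of a length.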
to prototype new ideas
  – APTsim hotpath instrumentation
  – Integration with ZSim to enable MaxSim
• x86-64, ARMv7 32-bit
• SPECjvm2008 and DaCapo-9.12-bach
• x86-64 inspector to examine runtime VM state
• ARMv7 low-level debugging support: method-id with gdb (“to appear in VMIL17”)
• Performance circa 2x slower than HotSpot
Nisbet, Andrey Rodchenko, Foivos Zakkak, Cosmin Gorgovan
Hardware & Simulators: Nikos Foutris, Will Toms, Oscar Palomar, John Mawer
Academic Staff: Javier Navaridas, Christos Kotselidis, Jim Garside, Antoniu Pop, John Goodacre, Mikel Lujan
FPGAs: John Mawer, Oscar Palomar, Athanasios Stratikopoulos