The AMD Fusion APU

Presentation for the High Performance Computing and Architecture course at KAUST

Emaad Manzoor

April 22, 2014

Transcript

  1. The AMD Fusion APU

    CS 280: High Performance Computing and Architecture
    Emaad Ahmed Manzoor
    April 22, 2014
  2. - Llano: AMD Fusion APU: Llano. Branover, Alexander, Denis Foley, and Maurice Steinman. Micro, IEEE '12.
     - Efficacy: On the Efficacy of an APU for Parallel Computing. Daga et al. SAAHPC '11.
     - Trade-Offs: The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
     - Patterns: Characterizing the Impact of Memory Access Patterns on AMD Fusion. Lee et al. SC '12.
  3. Top500.org Performance Energy Efficiency
  4. On the Efficacy of an APU for Parallel Computing. Daga et al. SAAHPC '11.
  5. Top500.org

    - GPUs: ~50% of theoretical peak
    - CPUs: ~78% of theoretical peak
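
The percentages on this slide are most naturally read as the ratio of sustained LINPACK performance (Rmax) to theoretical peak (Rpeak), the two figures Top500.org lists for each system. A minimal sketch of that ratio follows. The function name and the input numbers are assumptions chosen only to reproduce the slide's ~50% and ~78%; they are not taken from any real Top500 entry.

```python
def linpack_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak that is actually sustained (Rmax / Rpeak)."""
    if rpeak_tflops <= 0:
        raise ValueError("rpeak_tflops must be positive")
    return rmax_tflops / rpeak_tflops


# Hypothetical systems; the values are illustrative assumptions only.
gpu_system = linpack_efficiency(rmax_tflops=500.0, rpeak_tflops=1000.0)
cpu_system = linpack_efficiency(rmax_tflops=780.0, rpeak_tflops=1000.0)

print(f"GPU-accelerated system: {gpu_system:.0%} of peak")
print(f"CPU-only system:        {cpu_system:.0%} of peak")
```
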
  6. On the Efficacy of an APU for Parallel Computing. Daga et al. SAAHPC '11.
  7. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  8. Tradeoffs

    CPU, CPU + Discrete GPU, APU?
    - Cache coherency VS scalability
    - Latency VS throughput
    - Power VS performance
  9. Cache Coherency VS Scalability

    - GPUs: Flat, simple, incoherent
      - Scales well with number of processor tiles
      - Relaxed consistency for groups of cores
      - Hard to program for
    - CPUs: Multi-level, high-capacity, coherent
      - Much less scalable
    - Key tradeoff: Couple CPU and GPU caches to enforce coherence, while preserving scalability to a large number of cores
  10. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  11. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  12. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  13. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  14. Fusion Memory Hierarchy

    - CPU-like accesses: Traditional caches
    - GPU-like accesses: Radeon Memory Bus
    - Cache coherence: Fusion Compute Link
  15. Latency VS Throughput

    Allocation of transistors and die space
    - CPUs: Optimizing latency in single-threaded programs
      - Caches, instruction-level parallelism, branch predictors
    - GPUs: Simple, many floating point units
      - Schedule thousands of threads onto these cores
    - Question: How much of the resources should we dedicate to serial VS parallel processing units?
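
One common way to reason about the serial VS parallel question is Amdahl's law. The slides do not name it; it is brought in here only as a standard first-order model, and the function name and the unit count of 400 are assumptions for illustration. The sketch shows how quickly the serial fraction of a program caps the benefit of adding more throughput-oriented units.

```python
def amdahl_speedup(parallel_fraction: float, n_parallel_units: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the work
    can be spread across `n_parallel_units` identical units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_parallel_units < 1:
        raise ValueError("n_parallel_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_parallel_units)


# 400 is an arbitrary unit count chosen for illustration.
for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p:.2f}: speedup {amdahl_speedup(p, 400):.1f}x")
```

Under this model, a program that is only half parallel cannot reach a 2x speedup however many parallel units are added, which is why die area spent on fast serial cores still matters.
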
  16. GPU is the computational workhorse
  17. Capacity VS Bandwidth

    Type of physical memory used
    - CPUs: Optimizing latency to high-capacity DDR3 memory
    - GPUs: Concerned with repeatedly streaming a fixed-size buffer
      - Maximise bandwidth by using GDDR3
      - Wider memory bus, higher clock speed
      - Lower capacity (same power budget)
    - Fused: GPU and CPU cores must use the same type of physical memory
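
The bandwidth side of this tradeoff follows from simple arithmetic: theoretical peak bandwidth is the bus width in bytes multiplied by the effective transfer rate. A minimal sketch is below. The two configurations are assumptions picked to be representative of CPU-style and GPU-style memory; they are not the measured testbed from the cited papers.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s:
    (bytes moved per transfer) x (transfers per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Assumed configurations, for illustration only.
# Dual-channel DDR3-1600: 2 x 64-bit channels, 1600 MT/s.
ddr3 = peak_bandwidth_gbs(bus_width_bits=128, transfers_per_sec=1600e6)
# 128-bit GDDR5 at an effective 4000 MT/s.
gddr5 = peak_bandwidth_gbs(bus_width_bits=128, transfers_per_sec=4000e6)

print(f"DDR3 (assumed):  {ddr3:.1f} GB/s")
print(f"GDDR5 (assumed): {gddr5:.1f} GB/s")
```

With the same bus width, the gap comes entirely from the transfer rate, which is the resource a fused design gives up when the GPU cores have to share the CPU's memory type.
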
  19. GDDR5: High bandwidth at lower clock speed, low capacity
  20. HD5670: Only 30% higher clock speed
  22. Performance improvements commensurate with or greater than the increase in power consumption
  23. Performance increase for the Llano, despite its slower shader clock and lower memory bandwidth
  24. Power VS Performance

    - Fusion APU:
      - Slower shader clock
      - Tighter integration leading to fine-grained DVFS
      - Less data movement across wires
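
Why a slower clock and DVFS help the power budget can be seen from the usual first-order model of CMOS dynamic power, P = C · V² · f. The sketch below applies it. The capacitance, voltage and frequency values are made-up assumptions, and the model ignores static (leakage) power.

```python
def dynamic_power_w(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """First-order CMOS dynamic power: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Illustrative operating points (assumed values, not Llano's real ones).
baseline = dynamic_power_w(capacitance_f=1e-9, voltage_v=1.2, frequency_hz=800e6)
# DVFS step: lower both voltage and frequency by 20%.
scaled = dynamic_power_w(capacitance_f=1e-9, voltage_v=1.2 * 0.8, frequency_hz=800e6 * 0.8)

ratio = scaled / baseline
print(f"Dynamic power after scaling: {ratio:.1%} of baseline")
```

Because voltage enters quadratically, scaling voltage and frequency together cuts dynamic power roughly with the cube of the scaling factor, while throughput drops only linearly with frequency.
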
  25. - GPUs benefit HPC performance for data-parallel problems
      - Limited by the PCIe bandwidth
      - APUs replace the PCIe with a unified northbridge
      - Tradeoffs
        - Cache Coherency VS Scalability
        - Capacity VS Bandwidth
        - Power VS Performance
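
The PCIe limitation in this summary can be sketched with a two-term cost model: time spent moving data over the link plus time spent in the kernel. Everything numeric below is an assumption for illustration. The 8 GB/s link is the theoretical per-direction peak of PCIe 2.0 x16, the kernel times are invented, and the fused case is idealised as needing no transfer at all.

```python
def offload_time_s(kernel_time_s: float,
                   bytes_over_link: float = 0.0,
                   link_gbs: float = 8.0) -> float:
    """Total offload time = transfer time over the link + kernel time."""
    transfer_time_s = bytes_over_link / (link_gbs * 1e9)
    return transfer_time_s + kernel_time_s


# Hypothetical workload moving 1 GB to the device.
# Discrete GPU: faster kernel, but the data crosses PCIe.
discrete = offload_time_s(kernel_time_s=0.05, bytes_over_link=1e9, link_gbs=8.0)
# APU: slower kernel, idealised as zero-copy (no link transfer).
fused = offload_time_s(kernel_time_s=0.12)

print(f"Discrete GPU: {discrete:.3f} s")
print(f"Fused APU:    {fused:.3f} s")
```

With these assumed numbers the slower fused GPU still finishes first because the transfer dominates. For kernels that run long relative to the data they move, the discrete GPU would come out ahead.
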