The AMD Fusion APU

Presentation for the High Performance Computing and Architecture course at KAUST

Emaad Manzoor

April 22, 2014

Transcript

  1. The AMD Fusion APU

    CS 280: High Performance Computing and Architecture. Emaad Ahmed Manzoor, April 22, 2014
  2. Llano: AMD Fusion APU: Llano. Branover, Alexander, Denis Foley, and Maurice Steinman. IEEE Micro '12.

    Efficacy: On the Efficacy of an APU for Parallel Computing. Daga et al. SAAHPC '11.
    Trade-Offs: The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
    Patterns: Characterizing the Impact of Memory Access Patterns on AMD Fusion. Lee et al. SC '12.
  3. GPUs

  4. Top500.org: Performance, Energy Efficiency
  5. GPUs VS CPUs
  6. On the Efficacy of an APU for Parallel Computing. Daga et al. SAAHPC '11.
  7. Top500.org

    - GPUs: ~50% of theoretical peak
    - CPUs: ~78% of theoretical peak
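The percentages on this slide are ratios of achieved performance to theoretical peak. A minimal sketch of that calculation follows; the function name and the TFLOPS values are hypothetical, chosen only to reproduce the ~50% and ~78% figures, and are not taken from Top500.org.

```python
def peak_efficiency(achieved_tflops: float, peak_tflops: float) -> float:
    """Return the fraction of theoretical peak that a system sustains."""
    if peak_tflops <= 0:
        raise ValueError("peak_tflops must be positive")
    return achieved_tflops / peak_tflops


# Hypothetical systems: the numbers are illustrative assumptions only.
gpu_system = peak_efficiency(achieved_tflops=500.0, peak_tflops=1000.0)
cpu_system = peak_efficiency(achieved_tflops=780.0, peak_tflops=1000.0)

print(f"GPU-accelerated: {gpu_system:.0%} of peak")
print(f"CPU-only:        {cpu_system:.0%} of peak")
```

A lower ratio means more of the installed floating-point capacity sits idle, which is the gap the later slides attribute in part to data movement.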
  8. On the Efficacy of an APU for Parallel Computing. Daga et al. SAAHPC '11.
  9. AMD Fusion APU

  10. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  11. Tradeoffs: CPU, CPU + Discrete GPU, APU?

    - Cache coherency VS scalability
    - Latency VS throughput
    - Power VS performance
  12. Cache Coherency VS Scalability

    - GPUs: Flat, simple, incoherent
      - Scales well with number of processor tiles
      - Relaxed consistency for groups of cores
      - Hard to program for
    - CPUs: Multi-level, high-capacity, coherent
      - Much less scalable
    - Key tradeoff: Couple CPU and GPU caches to enforce coherence, while preserving scalability to a large number of cores
  13. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  14. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  15. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  16. The tradeoffs of fused memory hierarchies in heterogeneous computing architectures. Spafford, Kyle L., et al. CF '12.
  17. Fusion Memory Hierarchy

    - CPU-like accesses: Traditional caches
    - GPU-like accesses: Radeon Memory Bus
    - Cache coherence: Fusion Compute Link
  18. Latency VS Throughput

    Allocation of transistors and die space
    - CPUs: Optimizing latency in single-threaded programs
      - Caches, instruction-level parallelism, branch predictors
    - GPUs: Many simple floating-point units
      - Schedule thousands of threads onto these cores
    - Question: How much of the resources should we dedicate to serial VS parallel processing units?
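One common way to reason about the closing question is Amdahl's law. The slides do not name it, so it is used here only as an assumed illustration. The sketch computes the overall speedup when a given fraction of the work can run on parallel units; `parallel_fraction` and `parallel_units` are hypothetical parameters, and 64 units is an arbitrary choice.

```python
def amdahl_speedup(parallel_fraction: float, parallel_units: int) -> float:
    """Speedup over one serial unit when only part of the work parallelises."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if parallel_units < 1:
        raise ValueError("parallel_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_units)


# Assumed workloads with increasing parallel content, on 64 parallel units.
for fraction in (0.5, 0.9, 0.99):
    speedup = amdahl_speedup(fraction, parallel_units=64)
    print(f"parallel fraction {fraction:.2f}: speedup {speedup:.1f}x")
```

Under this model the serial fraction bounds the speedup no matter how many throughput units are added, which is one argument for keeping some latency-optimised cores on the die.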
  21. GPU is the computational workhorse
  24. Memory hierarchy performance
  25. Capacity VS Bandwidth

    Type of physical memory used
    - CPUs: Optimizing latency to high-capacity DDR3 memory
    - GPUs: Concerned with repeatedly streaming a fixed-size buffer
      - Maximise bandwidth by using GDDR3
      - Wider memory bus, higher clock speed
      - Lower capacity (same power budget)
    - Fused: GPU and CPU cores must use the same type of physical memory
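The bullets on bus width and clock speed combine into peak bandwidth as bytes per transfer multiplied by transfers per second. A rough sketch follows; the bus widths and transfer rates are illustrative assumptions, not the specifications of any particular part.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and transfer rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Assumed CPU-style memory: 128-bit bus at 1.6 billion transfers per second.
cpu_like = peak_bandwidth_gb_s(bus_width_bits=128, transfers_per_sec=1.6e9)

# Assumed GPU-style memory: same bus width at a higher effective transfer rate.
gpu_like = peak_bandwidth_gb_s(bus_width_bits=128, transfers_per_sec=4.0e9)

print(f"CPU-style memory: {cpu_like:.1f} GB/s")
print(f"GPU-style memory: {gpu_like:.1f} GB/s")
```

Because a fused design shares one memory type, both kinds of core see whichever of these two figures the chosen memory delivers.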
  28. GDDR5: High bandwidth at lower clock speed, low capacity
  30. HD5670: Only 30% higher clock speed
  32. Bandwidth-limited benchmarks
  33. Performance improvements commensurate with or greater than the increase in power consumption
  34. PCIe-bound benchmarks
  35. Performance increase for the Llano, despite its slower shader clock and lower memory bandwidth
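A simple time model suggests why a fused design can win on PCIe-bound work even with a slower GPU. In this sketch a discrete GPU pays a PCIe transfer cost on top of its kernel time, while the fused design is assumed to skip the copy but run the kernel more slowly. All names and numbers below are hypothetical.

```python
def discrete_gpu_time(bytes_moved: float, pcie_bytes_per_sec: float,
                      kernel_sec: float) -> float:
    """Total time on a discrete GPU: PCIe transfer plus kernel execution."""
    return bytes_moved / pcie_bytes_per_sec + kernel_sec


def fused_time(kernel_sec: float, slowdown: float) -> float:
    """Total time on a fused part: no PCIe copy, but a slower kernel."""
    return kernel_sec * slowdown


data_bytes = 1 * 1024**3   # assumed 1 GiB moved across the bus
pcie_rate = 8e9            # assumed effective PCIe rate in bytes per second
kernel = 0.05              # assumed kernel time on the discrete GPU, seconds

discrete = discrete_gpu_time(data_bytes, pcie_rate, kernel)
fused = fused_time(kernel, slowdown=2.0)  # assumed 2x slower kernel

print(f"discrete GPU: {discrete:.3f} s, fused: {fused:.3f} s")
```

With these assumed values the transfer dominates the discrete GPU's total, so removing it outweighs the slower kernel; a compute-heavy kernel would reverse the result.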
  37. Power VS Performance

    - Fusion APU:
      - Slower shader clock
      - Tighter integration leading to fine-grained DVFS
      - Less data movement across wires
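The appeal of DVFS (dynamic voltage and frequency scaling) can be illustrated with the standard CMOS dynamic power approximation, power proportional to capacitance times voltage squared times frequency. The slides do not state this formula, so it is an assumption here, and the capacitance, voltage, and frequency values are hypothetical.

```python
def dynamic_power(capacitance_f: float, voltage_v: float,
                  frequency_hz: float) -> float:
    """Approximate dynamic power in watts as C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Assumed baseline operating point.
base = dynamic_power(capacitance_f=1e-9, voltage_v=1.2, frequency_hz=1.0e9)

# Assumed scaled-down point: 20% lower frequency at a lower voltage.
scaled = dynamic_power(capacitance_f=1e-9, voltage_v=1.0, frequency_hz=0.8e9)

ratio = scaled / base
print(f"scaled point uses {ratio:.0%} of baseline dynamic power")
```

Because voltage enters quadratically, lowering it together with the clock saves proportionally more power than the frequency drop alone.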
  41. Summary

  42. - GPUs benefit HPC performance for data-parallel problems
      - Limited by the PCIe bandwidth
      - APUs replace the PCIe with a unified northbridge
      - Tradeoffs
        - Cache Coherency VS Scalability
        - Capacity VS Bandwidth
        - Power VS Performance