AndiH
March 21, 2014

GPU Implementations of Online Track Finding Algorithms at PANDA

A 12-minute talk I gave at the 2014 spring meeting of the German Physical Society in Frankfurt. It covers the status of my PhD thesis, more or less.

Transcript

  1. GPU Implementations of Online Track Finding Algorithms at PANDA

    HK 57.2, DPG Spring Meeting 2014 (DPG-Frühjahrstagung), Frankfurt, 21 March 2014
    Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration
    Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)
  3. PANDA: Event Reconstruction

    • Triggerless readout
      – Many benchmark channels
      – Background & signal similar
    • Event rate: 2 × 10^7/s
    • Raw data rate: 200 GB/s
    • Disk storage space for offline analysis: 3 PB/y
    • Reduce by ~1/1000 (reject background events, save interesting physics events)
    • Reduction step: GPUs
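The numbers on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and treats the quoted figures as exact. The constant and function names are illustrative and do not come from the talk.

```python
# Figures quoted on the slide; decimal units are an assumption.
EVENT_RATE = 2e7          # events per second
RAW_DATA_RATE = 200e9     # bytes per second
REDUCTION_FACTOR = 1000   # keep roughly 1 event in 1000
STORAGE_PER_YEAR = 3e15   # bytes per year
SECONDS_PER_DAY = 86400.0


def bytes_per_event(raw_rate=RAW_DATA_RATE, event_rate=EVENT_RATE):
    """Average raw event size implied by data rate and event rate."""
    return raw_rate / event_rate


def reduced_rate(raw_rate=RAW_DATA_RATE, factor=REDUCTION_FACTOR):
    """Data rate left after rejecting background events."""
    return raw_rate / factor


def days_to_fill(storage=STORAGE_PER_YEAR, rate=None):
    """Days of continuous data taking needed to fill the yearly storage."""
    if rate is None:
        rate = reduced_rate()
    return storage / rate / SECONDS_PER_DAY
```

Under these assumptions an average event is about 10 kB, the reduced stream is about 200 MB/s, and 3 PB corresponds to roughly 174 days of continuous data taking at that rate.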
  13. PANDA: Tracking, Online Tracking

    • Figure: trigger and detector layers in a usual HEP experiment compared with PANDA
    • PANDA: No hardware-based trigger
    • But computationally intensive software trigger → Online Tracking
  14. GPUs @ PANDA: Online Tracking

    • Port tracking algorithms to GPU
      – Serial → parallel
      – C++ → CUDA
    • Investigate suitability for online performance
    • But also: Find & invent tracking algorithms…
    • Under investigation:
      – Hough Transform
      – Riemann Track Finder
      – Triplet Finder
  16. Algorithm: Hough Transform

    • Idea: Transform $(x, y)_i \to (\alpha, r)_{ij}$, find lines via $(\alpha, r)$ space
    • Solve the line equation for $r_{ij}$ for
      – Lots of hits $(x, y, \rho)_i$ and
      – Many $\alpha_j \in [0^\circ, 360^\circ)$ each
    • Fill histogram
    • Extract track parameters

    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$

    – $i$: ~100 hits/event (STT)
    – $j$: every 0.2°
    – $r_{ij}$: 180 000 values

    Figure (labels cut off in the transcript): Hough transform principle, hits in $(x, y)$ space and the corresponding bins in $(r, \alpha)$ space.
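The steps on this slide (evaluate the line equation for every hit and angle, fill a histogram, read off the strongest bin) can be sketched on the CPU as follows. This is a minimal illustration, not the GPU code from the talk: the r range, the number of r bins, and all function names are assumptions, and a dictionary stands in for the 2D histogram.

```python
import math

ANGLE_STEP_DEG = 0.2        # from the slide: one angle every 0.2 degrees
R_MIN, R_MAX = -1.0, 1.0    # assumed r range (arbitrary units)
N_R_BINS = 200              # assumed number of r bins
BIN_WIDTH = (R_MAX - R_MIN) / N_R_BINS


def hough_accumulate(hits):
    """Fill a sparse (angle index, r bin) histogram from (x, y, rho) hits."""
    n_angles = round(360.0 / ANGLE_STEP_DEG)
    histogram = {}
    for x, y, rho in hits:
        for j in range(n_angles):
            alpha = math.radians(j * ANGLE_STEP_DEG)
            # Line equation from the slide.
            r = math.cos(alpha) * x + math.sin(alpha) * y + rho
            if R_MIN <= r < R_MAX:
                key = (j, int((r - R_MIN) / BIN_WIDTH))
                histogram[key] = histogram.get(key, 0) + 1
    return histogram


def strongest_line(histogram):
    """Return (angle in degrees, r at bin centre, votes) of the fullest bin."""
    (j, r_bin), votes = max(histogram.items(), key=lambda item: item[1])
    return j * ANGLE_STEP_DEG, R_MIN + (r_bin + 0.5) * BIN_WIDTH, votes
```

For example, five hits with x = 0.305, different y, and rho = 0 all vote for the same bin at angle 0°, so `strongest_line(hough_accumulate(hits))` reports five votes there. Each (hit, angle) pair is independent of the others, which is what makes the accumulation step a natural fit for one GPU thread per pair.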
  18. Algorithm: Hough Transform

    Figure: Hough-transformed hits for PANDA STT+MVD. 68 (x, y) points, r plotted against the angle α (0° to 180°) on a 1800 x 1800 grid.
  19. Algorithm: Hough Transform, Two Implementations

    Thrust
    • Performance: 3 ms/event
      – Independent of α granularity
      – Reduced to set of standard routines
        • Fast (uses Thrust's optimized algorithms)
        • Inflexible (has its limits, hard to customize)
      – No peak finding included
        • Even possible?
        • Adds to time!

    Plain CUDA
    • Performance: 0.5 ms/event
      – Built completely for this task
        • Can be fitted to any problem
        • Customizable
        • A bit more complicated in parts
      – Simple peak finder implemented (threshold)
        • Using: Dynamic Parallelism, Shared Memory
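A threshold peak finder of the kind mentioned for the plain CUDA version could look like the sketch below. The slide gives no details, so the histogram layout (a list of rows), the inclusive comparison, and the function name are all assumptions.

```python
def find_peaks(histogram, threshold):
    """Return (row, column, count) for every bin with count >= threshold.

    `histogram` is a list of rows, each row a list of bin counts.
    """
    peaks = []
    for row_index, row in enumerate(histogram):
        for col_index, count in enumerate(row):
            if count >= threshold:
                peaks.append((row_index, col_index, count))
    return peaks
```

Every bin is tested independently, so the same check maps directly onto one GPU thread per bin.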
  20. Algorithm: Riemann Track Finder

    • Idea: Don't fit lines (in 2D), fit planes (in 3D)!
    • Create seeds
      – All possible three-hit combinations
    • Grow seeds to tracks: continuously test whether the next hit fits
      – Use mapping to Riemann paraboloid
    • Summer student project (J. Timcheck)
    • Figure: hits in the (x, y) plane and their images on the paraboloid (axes x, y, z')
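The mapping to the paraboloid can be illustrated with a short sketch. It relies on a standard geometric fact: lifting (x, y) to (x, y, x² + y²) places all points of a circle on one plane, so testing whether a hit belongs to a seed becomes a point-to-plane distance check. The function names and the tolerance parameter are assumptions, not taken from the talk.

```python
import math


def lift_to_paraboloid(x, y):
    """Map a 2D hit onto the paraboloid z' = x^2 + y^2."""
    return (x, y, x * x + y * y)


def plane_through(p, q, s):
    """Return (unit normal n, offset d) of the plane n.v = d through 3 points."""
    u = tuple(b - a for a, b in zip(p, q))
    w = tuple(b - a for a, b in zip(p, s))
    n = (u[1] * w[2] - u[2] * w[1],
         u[2] * w[0] - u[0] * w[2],
         u[0] * w[1] - u[1] * w[0])
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("seed needs three distinct hits")
    n = tuple(c / length for c in n)
    return n, sum(a * b for a, b in zip(n, p))


def distance_to_plane(plane, point):
    """Perpendicular distance of a 3D point from the plane."""
    n, d = plane
    return abs(sum(a * b for a, b in zip(n, point)) - d)


def hit_fits(seed_hits, candidate, tolerance):
    """Test whether a candidate hit lies on the circle defined by the seed."""
    plane = plane_through(*(lift_to_paraboloid(x, y) for x, y in seed_hits))
    return distance_to_plane(plane, lift_to_paraboloid(*candidate)) <= tolerance
```

With the seed (4, 2), (1, 5), (-2, 2), which lies on a circle of radius 3 around (1, 2), the hit (1, -1) passes the test and the hit (1, 0) does not.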
  21. Algorithm: Riemann Track Finder

    • GPU optimization: unfolding loops
      – Nested loops: for () {for () {for () {}}}
      – Flat thread index: int ijk = threadIdx.x + blockIdx.x * blockDim.x;
      – Index formulas shown on the slide:
        $\mathrm{nLayer}_x = \frac{1}{2}\left(\sqrt{8x + 1} - 1\right)$
        $\mathrm{pos}(\mathrm{nLayer}_x) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2 - 1} + 27x}} - 1$
    • → 100 × faster than CPU version
    • Time for one event (Tesla K20X): ~0.6 ms
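The two formulas are the closed-form inverses of the triangular numbers n(n+1)/2 and the tetrahedral numbers n(n+1)(n+2)/6. One way they can be used to unfold a triple loop over three-hit combinations into a single flat index is sketched below. How exactly the talk's code assigns indices is not recoverable from the transcript, so the ordering, the function names, and the integer correction steps (added to guard against floating-point rounding) are assumptions.

```python
import math


def triangular_root(x):
    """Largest n with n(n+1)/2 <= x, from the closed-form inverse."""
    n = int((math.sqrt(8 * x + 1) - 1) / 2)
    while (n + 1) * (n + 2) // 2 <= x:   # correct float rounding
        n += 1
    while n * (n + 1) // 2 > x:
        n -= 1
    return n


def tetrahedral_root(x):
    """Largest n with n(n+1)(n+2)/6 <= x, from the closed-form inverse."""
    if x == 0:
        return 0                         # the closed form needs x >= 1
    t = (math.sqrt(3) * math.sqrt(243 * x * x - 1) + 27 * x) ** (1 / 3)
    n = int(t / 3 ** (2 / 3) + 1 / (3 ** (1 / 3) * t) - 1)
    while (n + 1) * (n + 2) * (n + 3) // 6 <= x:
        n += 1
    while n * (n + 1) * (n + 2) // 6 > x:
        n -= 1
    return n


def unfold(index):
    """Map a flat index to a hit triple (i, j, k) with i < j < k."""
    m = tetrahedral_root(index)
    k = m + 2
    remainder = index - m * (m + 1) * (m + 2) // 6
    t = triangular_root(remainder)
    j = t + 1
    i = remainder - t * (t + 1) // 2
    return i, j, k
```

On a GPU, each thread would call the equivalent of `unfold(ijk)` with its own flat index, so every three-hit combination is handled by exactly one thread and no thread has to walk the nested loops.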
  22. Algorithm: Triplet Finder

    • Idea: Use only a subset of the detector as seed
      – Combine 3 hits into a triplet
      – Calculate circle from 3 triplets (no fit)
    • Features
      – Tailored for PANDA
      – Fast & robust algorithm, no t0
    • Ported to GPU together with NVIDIA Application Lab
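Calculating a circle from three points without a fit is a closed-form computation. The sketch below assumes that each triplet has already been reduced to a single (x, y) position, which the slide does not spell out, and uses the standard circumcircle formula. The function name is illustrative.

```python
import math


def circle_from_three_points(a, b, c):
    """Return (centre_x, centre_y, radius) of the circle through a, b, c.

    Returns None when the points are collinear and no circle exists.
    """
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0.0:
        return None
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)
```

Because there is no iterative fit, the cost per candidate is a fixed, small number of arithmetic operations, which suits a GPU thread well.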
  28. Triplet Finder: Optimizations

    • Bunching wrapper
      – Hits from one event have similar timestamps
      – Combine hits into sets (bunches) that make best use of the GPU
    • Figure: hits grouped into events, events grouped into bunches
    • Complexity: $\mathcal{O}(N^2) \to \mathcal{O}(N)$
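A minimal sketch of the bunching idea, assuming fixed-length time windows: hits are sorted by timestamp and grouped by window, and hit pairs are then only tested within a bunch. The window length, the function names, and the reading that this is what reduces the number of pair tests are assumptions made for illustration.

```python
def bunch_hits(timestamps, bunch_length):
    """Group hit timestamps into bunches of fixed time length."""
    bunches = {}
    for t in sorted(timestamps):
        bunches.setdefault(int(t // bunch_length), []).append(t)
    return [bunches[key] for key in sorted(bunches)]


def pair_tests(bunches):
    """Number of hit pairs that must be compared within the bunches."""
    return sum(len(b) * (len(b) - 1) // 2 for b in bunches)
```

For six hits at times 5, 1020, 7, 1003, 2500, and 12 with a bunch length of 1000, the hits fall into three bunches and only 4 pair tests remain, compared with 15 when all hits are treated as one set.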
  29. Triplet Finder: Optimizations

    • Compare kernel launch strategies (diagram with CPU and GPU sides; each strategy runs TF Stage #1 to TF Stage #4)
      – Dynamic Parallelism: 1 thread/bunch (calling kernel)
      – Joined Kernel: 1 block/bunch (joined kernel)
      – Host Streams: 1 stream/bunch (calling stream, combining stream)
  30. Triplet Finder: Kernel Launches

    Preliminary (in publication)
  31. Triplet Finder: Clock Speed / Chipset

    Preliminary (in publication)

    GPU    Memory Clock   Core Clock   GPU Boost
    K40    3004 MHz       745 MHz      875 MHz
    K20X   2600 MHz       732 MHz      784 MHz
  33. Summary

    • Investigated different tracking algorithms
      – Best performance: 20 µs/event
      → Online Tracking a feasible technique for PANDA
    • Multi-GPU system needed
      – $\mathcal{O}(100)$ GPUs
    • Still much optimization necessary (efficiency)
    • Collaboration with NVIDIA Application Lab

    Thank you!
    Andreas Herten
    [email protected]
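The step from 20 µs/event to a system with on the order of 100 GPUs can be checked with a rough estimate. The sketch assumes that 20 µs/event is the throughput of a single GPU, that it scales perfectly across GPUs, and that data transfer and other overheads are ignored. The function name is illustrative.

```python
import math


def gpus_needed(event_rate_per_s, time_per_event_us):
    """Number of GPUs needed to keep up with the event rate."""
    busy_us_per_second = event_rate_per_s * time_per_event_us
    return math.ceil(busy_us_per_second / 1_000_000)
```

With the event rate of 2 × 10^7/s quoted earlier in the talk and 20 µs/event, this gives 400 GPUs, which is consistent with the order-of-magnitude figure in the summary.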