Slide 1

Slide 1 text

GPU Implementations of Online Track Finding Algorithms at PANDA
HK 57.2, DPG-Frühjahrstagung (DPG Spring Meeting) 2014, Frankfurt, 21 March 2014
Andreas Herten (Institut für Kernphysik, Forschungszentrum Jülich) for the PANDA Collaboration
Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft)

Slide 3

Slide 3 text

PANDA: The Experiment
[Figure: PANDA detector overview with a 13 m scale marker; labeled components: Magnet, STT, MVD]

Slide 5

Slide 5 text

PANDA: Event Reconstruction
• Triggerless readout
  – Many benchmark channels
  – Background and signal are similar
• Event rate: 2 · 10^7/s
• Raw data rate: 200 GB/s
• Disk storage space for offline analysis: 3 PB/y
• Reduce by ~1/1000 (reject background events, save interesting physics events) → GPUs
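The quoted rates can be cross-checked with a few integer divisions. The sketch below is only a back-of-the-envelope calculation: it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and takes the reduction factor as exactly 1000; all constant and function names are hypothetical.

```cpp
#include <cstdint>

// Figures quoted on the slide, expressed in integer base units (decimal prefixes assumed).
constexpr std::uint64_t kEventRatePerSecond  = 20000000ULL;          // 2 * 10^7 events/s
constexpr std::uint64_t kRawBytesPerSecond   = 200000000000ULL;      // 200 GB/s
constexpr std::uint64_t kReductionFactor     = 1000ULL;              // reduce by ~1/1000
constexpr std::uint64_t kStorageBytesPerYear = 3000000000000000ULL;  // 3 PB/y

// Average raw data volume of a single event.
constexpr std::uint64_t bytes_per_event() {
    return kRawBytesPerSecond / kEventRatePerSecond;
}

// Data rate that remains after the online selection.
constexpr std::uint64_t stored_bytes_per_second() {
    return kRawBytesPerSecond / kReductionFactor;
}

// Seconds of data taking per year that fit into the yearly storage budget.
constexpr std::uint64_t seconds_of_data_per_year() {
    return kStorageBytesPerYear / stored_bytes_per_second();
}
```

Under these assumptions an average event is about 10 kB, the stored stream is 200 MB/s, and the 3 PB/y budget corresponds to roughly 1.5 · 10^7 s of data taking per year.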

Slide 12

Slide 12 text

PANDA: Tracking, Online Tracking
• PANDA: no hardware-based trigger
• Instead: computationally intensive software trigger → online tracking
[Figure: detector layers and trigger, comparing a usual HEP experiment with PANDA]

Slide 16

Slide 16 text

GPUs @ PANDA: Online Tracking
• Port tracking algorithms to GPU
  – Serial → parallel
  – C++ → CUDA
• Investigate suitability for online performance
• But also: find and invent tracking algorithms
• Under investigation:
  – Hough Transformation
  – Riemann Track Finder
  – Triplet Finder

Slide 18

Slide 18 text

Algorithm: Hough Transform
• Idea: transform (x,y)_i → (α,r)_ij, find lines via (α,r) space
• Solve the line equation for r_ij for
  – Lots of hits (x,y,ρ)_i and
  – Many α_j ∈ [0°,360°) each
• Fill histogram
• Extract track parameters

$r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$

i: ~100 hits/event (STT)
j: every 0.2°
r_ij: 180 000 values per event

[Figure: Hough Transform principle, points in (x, y) space and the corresponding bins in (α, r) space]
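The transformation step above can be sketched on the host in plain C++. This is a serial illustration under stated assumptions, not the talk's GPU code: the `Hit` struct and the function name are hypothetical, and ρ_i is simply treated as a per-hit offset added to the line equation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { double x, y, rho; };  // rho: per-hit offset term of the line equation

// Returns r[i * n_angles + j] = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i
// for alpha_j = j * step_deg, covering [0 deg, 360 deg).
std::vector<double> hough_transform(const std::vector<Hit>& hits, double step_deg) {
    const double pi = std::acos(-1.0);
    const std::size_t n_angles =
        static_cast<std::size_t>(std::llround(360.0 / step_deg));
    std::vector<double> r(hits.size() * n_angles);
    for (std::size_t i = 0; i < hits.size(); ++i) {       // each (i, j) pair is independent,
        for (std::size_t j = 0; j < n_angles; ++j) {      // so a GPU can assign one thread per pair
            const double alpha = static_cast<double>(j) * step_deg * pi / 180.0;
            r[i * n_angles + j] =
                std::cos(alpha) * hits[i].x + std::sin(alpha) * hits[i].y + hits[i].rho;
        }
    }
    return r;
}
```

Calling `hough_transform(hits, 0.2)` with 100 hits yields 100 × 1800 = 180 000 values, matching the count on the slide.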

Slide 19

Slide 19 text

Algorithm: Hough Transform
[Figure: Hough-transformed histogram of 68 (x,y) points, PANDA STT+MVD, on an 1800 × 1800 grid. Horizontal axis: angle α / °, 0 to 180; vertical axis: r, −0.4 to 0.6; colour scale ticks: 0 to 25. Statistics box: Entries 2.2356e+08, Mean x 90, Mean y 0.02905, RMS x 51.96, RMS y 0.1063]

Slide 21

Slide 21 text

Algorithm: Hough Transform
Two Implementations

Thrust
• Performance: 3 ms/event
  – Independent of α granularity
  – Reduced to a set of standard routines
    • Fast (uses Thrust's optimized algorithms)
    • Inflexible (has its limits, hard to customize)
  – No peak finding included
    • Even possible?
    • Adds to time!

Plain CUDA
• Performance: 0.5 ms/event
  – Built completely for this task
    • Adaptable to every problem
    • Customizable
    • A bit more complicated in parts
  – Simple peak finder implemented (threshold)
    • Using: Dynamic Parallelism, Shared Memory
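A threshold peak finder of the kind mentioned for the plain CUDA version can be sketched as follows. This is a serial host-side illustration with hypothetical names; the memory layout `r[i * n_alpha + j]`, the histogram range, and the "count at or above threshold" criterion are assumptions, and the slides do not say how the actual implementation bins or selects peaks.

```cpp
#include <cstddef>
#include <vector>

struct Peak { std::size_t alpha_bin, r_bin; unsigned count; };

// Fill an (n_alpha x n_r) histogram from r values laid out as r[i * n_alpha + j],
// then report every bin whose content reaches `threshold`.
std::vector<Peak> find_peaks(const std::vector<double>& r, std::size_t n_alpha,
                             std::size_t n_r, double r_min, double r_max,
                             unsigned threshold) {
    std::vector<unsigned> hist(n_alpha * n_r, 0);
    const double bin_width = (r_max - r_min) / static_cast<double>(n_r);
    for (std::size_t idx = 0; idx < r.size(); ++idx) {
        if (r[idx] < r_min || r[idx] >= r_max) continue;   // outside histogram range
        const std::size_t j = idx % n_alpha;               // angle bin
        std::size_t k = static_cast<std::size_t>((r[idx] - r_min) / bin_width);
        if (k >= n_r) k = n_r - 1;                         // guard against rounding
        ++hist[j * n_r + k];
    }
    std::vector<Peak> peaks;
    for (std::size_t j = 0; j < n_alpha; ++j)
        for (std::size_t k = 0; k < n_r; ++k)
            if (hist[j * n_r + k] >= threshold)
                peaks.push_back({j, k, hist[j * n_r + k]});
    return peaks;
}
```

Each reported bin corresponds to one (α, r) pair, that is, one line candidate shared by at least `threshold` hits.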

Slide 22

Slide 22 text

Algorithm: Riemann Track Finder
• Idea: don't fit lines (in 2D), fit planes (in 3D)!
• Create seeds
  – All possible three-hit combinations
• Grow seeds to tracks: continuously test whether the next hit fits
  – Use mapping to Riemann paraboloid
• Summer student project (J. Timcheck)
[Figure: hits in the (x, y) plane and their mapping onto the paraboloid, axes x, y, z′]
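The geometric idea behind the seed-and-grow steps can be illustrated in a few lines. The sketch assumes the paraboloid z′ = x² + y², on which every circle in the (x, y) plane becomes the intersection with a plane; all type and function names are hypothetical, and a real track finder would fit the plane to many hits rather than solve it from exactly three.

```cpp
#include <cmath>

struct Point2 { double x, y; };
struct Plane { double a, b, c; };   // z = a*x + b*y + c

// Lift a hit onto the paraboloid z = x^2 + y^2.
inline double lift(const Point2& p) { return p.x * p.x + p.y * p.y; }

// Plane through three lifted hits (the seed). Assumes the hits are not collinear.
inline Plane plane_through(const Point2& p, const Point2& q, const Point2& s) {
    const double det = (q.x - p.x) * (s.y - p.y) - (s.x - p.x) * (q.y - p.y);
    const double dzq = lift(q) - lift(p);
    const double dzs = lift(s) - lift(p);
    const double a = (dzq * (s.y - p.y) - dzs * (q.y - p.y)) / det;
    const double b = ((q.x - p.x) * dzs - (s.x - p.x) * dzq) / det;
    return {a, b, lift(p) - a * p.x - b * p.y};
}

// Vertical distance of a lifted hit from the plane; close to zero when the
// hit lies on the seed's circle, so it can serve as the "does it fit" test.
inline double residual(const Plane& pl, const Point2& h) {
    return lift(h) - (pl.a * h.x + pl.b * h.y + pl.c);
}

// The plane corresponds to the circle x^2 + y^2 = a*x + b*y + c.
inline Point2 circle_center(const Plane& pl) { return {pl.a / 2.0, pl.b / 2.0}; }
inline double circle_radius(const Plane& pl) {
    return std::sqrt(pl.c + (pl.a * pl.a + pl.b * pl.b) / 4.0);
}
```

Growing a seed then amounts to evaluating `residual` for the next candidate hit and accepting it when the absolute value stays below a chosen cut.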

Slide 23

Slide 23 text

Algorithm: Riemann Track Finder
• GPU optimization: unfolding loops → 100 × faster than CPU version
  – Nested loops: for () {for () {for () {}}}
  – Flat index per thread: int ijk = threadIdx.x + blockIdx.x * blockDim.x;
  – Loop indices recovered from the flat index x:

$n_{\mathrm{Layer}x} = \frac{1}{2}\left(\sqrt{8x+1} - 1\right)$

$\mathrm{pos}(n_{\mathrm{Layer}x}) = \frac{\sqrt[3]{\sqrt{3}\,\sqrt{243x^{2}-1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\,\sqrt{243x^{2}-1} + 27x}} - 1$

• Time for one event (Tesla K20X): ~0.6 ms
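The two closed forms are the inverses of the triangular numbers n(n+1)/2 and the tetrahedral numbers n(n+1)(n+2)/6, which is what is needed to turn one flat index into three nested loop indices. The sketch below shows one way to do this on the host; the enumeration order (i < j < k, innermost index running fastest), the integer correction after the floating-point evaluation, and all names are assumptions rather than the talk's actual code.

```cpp
#include <cmath>
#include <cstdint>

struct Triplet { std::uint64_t i, j, k; };  // always i < j < k

// Largest n with n(n+1)/2 <= x, from the closed form plus a rounding fix-up.
inline std::uint64_t triangular_root(std::uint64_t x) {
    auto n = static_cast<std::uint64_t>(
        (std::sqrt(8.0 * static_cast<double>(x) + 1.0) - 1.0) / 2.0);
    while (n * (n + 1) / 2 > x) --n;
    while ((n + 1) * (n + 2) / 2 <= x) ++n;
    return n;
}

// Largest n with n(n+1)(n+2)/6 <= x, from the closed form plus a rounding fix-up.
inline std::uint64_t tetrahedral_root(std::uint64_t x) {
    if (x == 0) return 0;  // the radicand 243x^2 - 1 is negative for x = 0
    const double xd = static_cast<double>(x);
    const double c =
        std::cbrt(std::sqrt(3.0) * std::sqrt(243.0 * xd * xd - 1.0) + 27.0 * xd);
    auto n = static_cast<std::uint64_t>(
        c / std::cbrt(9.0) + 1.0 / (std::cbrt(3.0) * c) - 1.0);
    while (n * (n + 1) * (n + 2) / 6 > x) --n;
    while ((n + 1) * (n + 2) * (n + 3) / 6 <= x) ++n;
    return n;
}

// Map a flat index (e.g. a global thread index) to the idx-th triple of
// for k in [2, n) { for j in [1, k) { for i in [0, j) { ... } } }.
inline Triplet unrank_triplet(std::uint64_t idx) {
    const std::uint64_t m = tetrahedral_root(idx);            // k = m + 2
    const std::uint64_t rem = idx - m * (m + 1) * (m + 2) / 6;
    const std::uint64_t t = triangular_root(rem);             // j = t + 1
    const std::uint64_t i = rem - t * (t + 1) / 2;
    return {i, t + 1, m + 2};
}
```

With such a mapping every thread computes its own hit combination directly, so no thread has to iterate over the loop nest.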

Slide 24

Slide 24 text

Algorithm: Triplet Finder
• Idea: use only a subset of the detector as seed
  – Combine 3 hits to a Triplet
  – Calculate circle from 3 Triplets (no fit)
• Features
  – Tailored for PANDA
  – Fast and robust algorithm, no t0
• Ported to GPU together with NVIDIA Application Lab
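Calculating a circle from three points without a fit is a closed-form computation. The sketch below shows the standard circumcircle formula; representing a triplet by the centre of gravity of its three hits is an assumption made for illustration, and all names are hypothetical.

```cpp
#include <cmath>

struct Vec2 { double x, y; };
struct Circle { Vec2 center; double radius; };

// Represent a triplet by the centre of gravity of its three hits
// (an assumption of this sketch).
inline Vec2 triplet_position(const Vec2& a, const Vec2& b, const Vec2& c) {
    return {(a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0};
}

// Circle through three points, computed directly (no fit).
// Assumes the points are not collinear, otherwise d is zero.
inline Circle circle_from_three(const Vec2& a, const Vec2& b, const Vec2& c) {
    const double d =
        2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    const Vec2 m{(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
                 (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
    return {m, std::hypot(a.x - m.x, a.y - m.y)};
}
```

Because no iterative fit is involved, the cost per candidate is a fixed handful of arithmetic operations, which suits a GPU thread well.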

Slide 25

Slide 25 text

Triplet Finder: Time
[Figure: timing plot; no values survive in the extracted text]

Slide 31

Slide 31 text

Triplet Finder: Optimizations
• Bunching Wrapper
  – Hits from one event have similar timestamps
  – Combine hits into sets (bunches) that fill up the GPU best
  – Complexity: O(N²) → O(N)
[Figure labels: Hit, Event, Bunch]
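A bunching step can be sketched as a single pass over time-sorted hits. The code below is an illustration with hypothetical names; the fixed time window as bunching criterion is an assumption, since the slides only state that bunches are chosen to fill the GPU best. One plausible reading of the O(N²) → O(N) note, also an assumption, is that combinatorial work grows with the square of the hits handled together, so bounded bunches keep the total work linear in the number of hits.

```cpp
#include <cstddef>
#include <vector>

// Half-open index range [first, last) into the time-sorted hit array.
struct Bunch { std::size_t first, last; };

// Split time-sorted hits into consecutive bunches that each span less than
// `bunch_length` (same unit as the timestamps).
std::vector<Bunch> make_bunches(const std::vector<double>& sorted_times,
                                double bunch_length) {
    std::vector<Bunch> bunches;
    std::size_t first = 0;
    for (std::size_t i = 1; i <= sorted_times.size(); ++i) {
        const bool at_end = (i == sorted_times.size());
        if (at_end || sorted_times[i] - sorted_times[first] >= bunch_length) {
            bunches.push_back({first, i});  // close the current bunch
            first = i;                      // hit i opens the next one
        }
    }
    return bunches;
}
```

Each resulting range can then be handed to one thread, block, or stream, depending on the launch strategy.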

Slide 32

Slide 32 text

Triplet Finder: Bunching Performance
[Figure: performance plot; no values survive in the extracted text]

Slide 33

Slide 33 text

Triplet Finder: Optimizations
• Compare kernel launch strategies
  – Dynamic Parallelism: calling kernel with 1 thread/bunch; each thread launches the Triplet Finder (TF stages #1 to #4) on the GPU
  – Joined Kernel: one joined kernel with 1 block/bunch that runs TF stages #1 to #4
  – Host Streams: 1 stream/bunch; the CPU issues TF stages #1 to #4 into each calling/combining stream
[Figure: the three launch schemes side by side, split into CPU and GPU parts]

Slide 34

Slide 34 text

Triplet Finder: Kernel Launches
Preliminary (in publication)
[Figure: comparison plot of the kernel launch strategies; no values survive in the extracted text]

Slide 35

Slide 35 text

Triplet Finder: Clock Speed / Chipset
Preliminary (in publication)

GPU     Memory Clock   Core Clock   GPU Boost
K40     3004 MHz       745 MHz      875 MHz
K20X    2600 MHz       732 MHz      784 MHz

[Figure: performance plot for the listed clock settings; no values survive in the extracted text]

Slide 37

Slide 37 text

Summary
• Investigated different tracking algorithms
  – Best performance: 20 µs/event
  → Online tracking is a feasible technique for PANDA
• Multi-GPU system needed
  – O(100) GPUs
• Still much optimization necessary (efficiency)
• Collaboration with NVIDIA Application Lab

Thank you!
Andreas Herten
[email protected]
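The GPU count can be estimated from the numbers in the talk. The sketch below combines the event rate of 2 · 10^7/s quoted earlier with the best time of 20 µs/event; it assumes perfect scaling across GPUs and that the per-event time applies unchanged at full rate, and the function name is hypothetical.

```cpp
#include <cstdint>

// Number of GPUs needed to keep up with the event rate, assuming perfect
// scaling and that each GPU spends `nanoseconds_per_event` on every event.
// Integer arithmetic avoids floating-point rounding; the result is rounded up.
constexpr std::uint64_t gpus_needed(std::uint64_t events_per_second,
                                    std::uint64_t nanoseconds_per_event) {
    return (events_per_second * nanoseconds_per_event + 999999999ULL) /
           1000000000ULL;
}
```

`gpus_needed(20000000, 20000)` evaluates to 400, which is in line with the O(100) GPUs stated in the summary.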