Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction

AndiH
March 26, 2014

Talk I gave about our current research at the GPU Technology Conference 2014: http://registration.gputechconf.com/quicklink/dFuC9Rm

Transcript

  1. GPU Technology Conference 2014, 26 March 2014
    Andreas Herten (Institute for Nuclear Physics, Forschungszentrum Jülich, Germany)
    Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction
    Member of the Helmholtz Association
  2. Outline
    • High Energy Physics
    • PANDA Experiment
    • Particle Tracking
    • GPUs at PANDA
    • Algorithms
      – Hough Transform
      – Riemann Track Finder
      – Triplet Finder
  3. HEP: High Energy Physics

  11. High Energy Physics
    • High Energy Physics (HEP) in a nutshell: the HEP recipe
      1. Accelerate particles (e, p, …)
      2. Accelerate particles more!
      3. Smash them into each other
      4. Look at the resulting particles
      5. Understand the world ✓
    [Illustration: E = mc²]
    • GPUs are interesting for HEP
      – Many events due to the high collision rate
      – Events are independent and can be divided into subsets
      – Many features can be extracted (computationally intensive)
  12. PANDA

  15. PANDA — FAIR
    • Anti-Proton Annihilation at Darmstadt
    • FAIR: Facility for Antiproton and Ion Research
      – Accelerator complex at GSI Darmstadt
      – Currently under construction
  19. PANDA — The Experiment
    [Figure: the PANDA detector, 13 m (43 ft); p̄p collisions; labelled components: Magnet, STT (Straw Tube Tracker), MVD (Micro Vertex Detector)]
  21. PANDA — Event Reconstruction
    • Continuous read-out
      – Background & signal similar
      – Novel feature
    • Event rate: 2 × 10⁷/s
    • Raw data rate: 200 GB/s
    • Disk storage space for offline analysis: 2 PB/y
    → Reduce by ~1/1000 (reject background events, save interesting physics events) → GPUs
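A quick consistency check of these numbers, written as a few C++ helpers (the function names are ours, not from the talk). The raw data rate and event rate give the average raw event size, the ~1/1000 reduction gives the rate written to disk, and the 2 PB/y budget then implies how many seconds of data taking per year it corresponds to. That last number is derived here, not stated on the slide.

```cpp
#include <cassert>

// Rough bookkeeping for the data flow quoted on the slide.
// All sizes in bytes (SI prefixes: 1 GB = 1e9 B, 1 PB = 1e15 B).

// Average raw data volume per event.
double bytesPerEvent(double rawBytesPerSecond, double eventsPerSecond) {
    assert(eventsPerSecond > 0.0);
    return rawBytesPerSecond / eventsPerSecond;
}

// Data rate left after keeping only 1 out of `reductionFactor` bytes.
double reducedBytesPerSecond(double rawBytesPerSecond, double reductionFactor) {
    assert(reductionFactor >= 1.0);
    return rawBytesPerSecond / reductionFactor;
}

// Seconds of data taking that fill `storageBytes` at a given rate.
double secondsToFill(double storageBytes, double bytesPerSecond) {
    assert(bytesPerSecond > 0.0);
    return storageBytes / bytesPerSecond;
}
```

With the slide's values, `bytesPerEvent(200e9, 2e7)` gives 10⁴ B (about 10 kB per event), `reducedBytesPerSecond(200e9, 1000)` gives 2 × 10⁸ B/s (200 MB/s), and `secondsToFill(2e15, 2e8)` gives 10⁷ s, i.e. roughly 116 days of data taking per year.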
  26. PANDA — Online Tracking Example
    The physics side: antiproton-proton event
    p̄p → ψ′ → ψ π⁺ π⁻, with ψ → e⁺ e⁻
    [Figure: decay chain with the final-state particles π⁺, π⁻, e⁺, e⁻]
  41. PANDA — Online Tracking Example
    The detector side: everything in reverse
    • Particle tracks are curves* → find curves connecting the hit points!
    • Sort by track quality: are the hits well matched? How many hits?
    • Identify the final-state particles (curvature, length, …): π⁺, π⁻, e⁺, e⁻, ?
    • Identify intermediate particles (mass constraints, geometry, …): ψ′
    • Identify the process: p̄p → ψ′ → e⁺e⁻ π⁺ π⁻
    * actually: 3D helices
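To make "curvature" concrete: for a particle with charge q (in units of e) in a homogeneous magnetic field B, the radius R of the circle that its helix projects to in the bending plane fixes the transverse momentum, p_T [GeV/c] ≈ 0.3 · |q| · B [T] · R [m]. Below is a minimal sketch of this textbook relation; the function name is hypothetical, and the 2 T used in the example call is only an illustrative value, not taken from this talk.

```cpp
#include <cassert>

// Transverse momentum (GeV/c) of a particle with charge `chargeUnits` (in units
// of the elementary charge) moving on a circle of radius `radiusMeters` in a
// homogeneous magnetic field of `fieldTesla`: p_T = 0.2998 * |q| * B * R.
double transverseMomentumGeV(double fieldTesla, double radiusMeters, int chargeUnits = 1) {
    assert(fieldTesla > 0.0 && radiusMeters > 0.0 && chargeUnits != 0);
    const double kGeVPerTeslaMeter = 0.299792458;  // speed of light / 1e9 in SI units
    const double q = chargeUnits < 0 ? -chargeUnits : chargeUnits;
    return kGeVPerTeslaMeter * q * fieldTesla * radiusMeters;
}
```

For example, `transverseMomentumGeV(2.0, 1.0)` returns about 0.60 GeV/c. The sign of the charge follows from the direction in which the track bends, which this helper ignores.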
  54. PANDA — Triggering
    • Usual HEP experiment: fast detector layer(s) provide a trigger that starts the data acquisition
    • PANDA: interesting events (e.g. ψ′ with final state π⁺ π⁻ e⁺ e⁻) are selected by online tracking instead!
  55. GPUs at PANDA

  56. GPUs @ PANDA — Online Tracking
    • Port tracking algorithms to the GPU
      – Serial → parallel
      – C++ → CUDA
    • Investigate suitability for online performance
    • But also: find & invent tracking algorithms…
    • Under investigation:
      – Hough Transform
      – Riemann Track Finder
      – Triplet Finder
  57. ALGORITHMS #1
    Hough Transform • Riemann Track Finder • Triplet Finder
  58. Hough Transform
    • Established method for edge detection in images (from 1970s HEP experiments!)
    • New challenges for a particle tracking algorithm
      – Only a limited number of pixels per edge
    • Easily parallelizable method
    Original algorithm by Hough, adapted by Duda & Hart
  60. Hough Transform — Method
    • Idea: transform $(x,y)_i \to (\alpha,r)_{ij}$, find lines via the $(\alpha,r)$ space
    • Solve the line equation $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i$ for
      – Many hits $(x,y)_i$; i: ~100 hits/event (STT)
      – Many $\alpha_j \in [0°, 360°)$ each; j: steps of 0.2°
      → $r_{ij}$: 180 000 values per event
    • Fill histogram
    • Extract track parameters: the bin with the highest multiplicity gives the track parameters
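A serial C++ sketch of the method above (hypothetical names, illustrative parameters): each hit is evaluated at every angle step, each r_ij is binned into an (α, r) histogram, and the fullest bin is returned. On the GPU every (i, j) pair is independent, so one thread per pair with atomic histogram increments is the natural mapping; the two nested loops below stand in for that. The angle range, r range, and bin count are parameters here, not the values used in the PANDA code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { double x, y; };

struct HoughPeak {
    double alphaDeg;  // angle of the line normal
    double r;         // distance of the line from the origin (bin centre)
    int count;        // number of (hit, angle) entries in the bin
};

// Fill an (alpha, r) histogram with r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i
// and return the bin with the highest multiplicity.
HoughPeak houghTransform(const std::vector<Hit>& hits, double alphaStepDeg,
                         double alphaMaxDeg, int nRBins, double rMin, double rMax) {
    assert(alphaStepDeg > 0.0 && nRBins > 0 && rMax > rMin);
    const double kPi = 3.14159265358979323846;
    const int nAlpha = static_cast<int>(std::lround(alphaMaxDeg / alphaStepDeg));
    assert(nAlpha > 0);
    const double rBinWidth = (rMax - rMin) / nRBins;
    std::vector<int> histogram(static_cast<std::size_t>(nAlpha) * nRBins, 0);

    for (const Hit& h : hits) {             // i: hits
        for (int j = 0; j < nAlpha; ++j) {  // j: angle steps
            const double alpha = j * alphaStepDeg * kPi / 180.0;
            const double r = std::cos(alpha) * h.x + std::sin(alpha) * h.y;
            const int rBin = static_cast<int>(std::floor((r - rMin) / rBinWidth));
            if (rBin < 0 || rBin >= nRBins) continue;  // outside histogram range
            ++histogram[static_cast<std::size_t>(j) * nRBins + rBin];
        }
    }

    // The bin with the most entries gives the track parameters.
    std::size_t best = 0;
    for (std::size_t b = 1; b < histogram.size(); ++b)
        if (histogram[b] > histogram[best]) best = b;
    const int j = static_cast<int>(best / nRBins);
    const int rBin = static_cast<int>(best % nRBins);
    return {j * alphaStepDeg, rMin + (rBin + 0.5) * rBinWidth, histogram[best]};
}
```

For four hits on the line x = 0.5, a 1° step over [0°, 180°) and 100 r bins on [−2, 2] put all four entries into the bin at α = 0°, r ≈ 0.5.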
  61. Hough Transform — Example
    [Figure: 10 (x,y) points from the PANDA STT, Hough transformed on a 180 × 180 grid; axes: angle α / ° vs. r]
  63. Hough Transform — Example
    [Figure: 68 (x,y) points from the PANDA STT+MVD, Hough transformed on an 1800 × 1800 grid; axes: angle α / ° vs. r]
  65. Hough Transform — Remarks
    Two implementations:
    • Thrust
      – Performance: 3 ms/event
      – Independent of angular granularity
      – Reduced to a set of standard routines
        • Fast (uses Thrust's optimized algorithms)
        • Inflexible (has its limits, hard to customize)
      – No peakfinding included
        • Even possible?
        • Adds to the time!
    • Plain CUDA
      – Performance: 0.5 ms/event
      – Built completely for this task
        • Can be fitted to every problem
        • Customizable
        • A bit more complicated in parts
      – Simple peakfinder implemented (threshold)
      – Uses Dynamic Parallelism and Shared Memory
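The plain CUDA version's simple threshold peakfinder can be sketched as follows, assuming the (α, r) histogram is stored as a flat, row-major array with one row per angle step (an assumption about the layout, not a detail from the talk). Every bin whose count reaches the threshold is reported. On a GPU, one thread per bin with an atomic counter for the output list would be a natural mapping; the loop here is sequential. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BinIndex {
    int alphaBin;  // row: angle step
    int rBin;      // column: r bin
    int count;     // histogram entries in this bin
};

// Threshold peakfinder: report every bin whose count reaches `threshold`.
// `histogram` is row-major with `nRBins` columns (one row per angle step).
std::vector<BinIndex> thresholdPeaks(const std::vector<int>& histogram,
                                     int nRBins, int threshold) {
    assert(nRBins > 0 && histogram.size() % static_cast<std::size_t>(nRBins) == 0);
    std::vector<BinIndex> peaks;
    for (std::size_t b = 0; b < histogram.size(); ++b) {
        if (histogram[b] >= threshold) {
            peaks.push_back({static_cast<int>(b / nRBins),
                             static_cast<int>(b % nRBins), histogram[b]});
        }
    }
    return peaks;
}
```

A plain threshold reports neighbouring bins of the same track separately; merging them into one candidate is not part of this sketch.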
  66. ALGORITHMS #2
    Hough Transform • Riemann Track Finder • Triplet Finder
  67. Riemann Track Finder
    • Algorithm in use in PANDA's offline analysis framework for a long time
      – Good results
      – Well understood
      – Handling of uncertainties
    • Work by Jonathan Timcheck
      – Summer student at Jülich
    Based on work by Strandlie et al.
  68. Riemann Track Finder — Method
    • Idea: don't fit lines (in 2D), fit planes (in 3D)!
    • Create seeds
      – All possible three-hit combinations
    • Grow seeds to tracks
      – Continuously test whether the next hit fits
      – Use mapping to the Riemann paraboloid
    [Figure: hits in the (x,y) plane mapped onto the paraboloid along z′]
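The geometric idea behind the mapping can be sketched in a few lines of C++. Each hit (x, y) is lifted to (x, y, x² + y²) on the paraboloid; a circle (x − a)² + (y − b)² = R² then becomes the plane z − 2ax − 2by + (a² + b² − R²) = 0, so three seed hits define a plane, and a new hit fits the seed's circle if its lifted point lies close to that plane. This is a minimal sketch of the geometry only (names hypothetical), without the uncertainty handling of the real Riemann track finder.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Lift a 2D hit onto the paraboloid z = x^2 + y^2.
Vec3 liftToParaboloid(double x, double y) { return {x, y, x * x + y * y}; }

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Plane n . p = d with unit normal n.
struct Plane { Vec3 n; double d; };

// Plane through three lifted seed hits.
Plane planeThroughSeed(const Vec3& p1, const Vec3& p2, const Vec3& p3) {
    Vec3 n = cross(sub(p2, p1), sub(p3, p1));
    const double len = std::sqrt(dot(n, n));
    assert(len > 0.0 && "lifted seed hits must not be collinear");
    n = {n.x / len, n.y / len, n.z / len};
    return {n, dot(n, p1)};
}

// Distance of a lifted candidate hit from the seed plane; small values mean
// the hit lies close to the circle defined by the seed.
double distanceToPlane(const Plane& plane, const Vec3& p) {
    return std::abs(dot(plane.n, p) - plane.d);
}
```

The distance to the plane in 3D is not the same as the 2D distance to the circle, so a cut on it has to be chosen accordingly.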
  69. Riemann Algorithm — GPU Version
    • GPU optimization: unfolding loops → 100 × faster than the CPU version
      – CPU: for (…) { for (…) { for (…) { … } } }
      – GPU: int ijk = threadIdx.x + blockIdx.x * blockDim.x;
      – Index mapping: $n_\text{Layer}(x) = \frac{1}{2}\left(\sqrt{8x+1} - 1\right)$ and $\text{pos}\left(n_\text{Layer}(x)\right) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}} - 1$
    • Time for one event (Tesla K20X):
      Time(%)  Time      Calls  Avg       Min       Max       Name
      75.55%   439.49us  1      439.49us  439.49us  439.49us  extend_cut_hit_triplets_k
      5.96%    34.656us  4      8.6640us  2.3360us  22.432us  [CUDA memcpy DtoH]
      4.36%    25.344us  1      25.344us  25.344us  25.344us  cut_hit_triplets_k
      4.26%    24.800us  6      4.1330us  3.7760us  5.3440us  [CUDA memset]
      2.57%    14.976us  1      14.976us  14.976us  14.976us  generate_hit_triplet
      2.44%    14.176us  1      14.176us  14.176us  14.176us  generate_layer_triplets
      1.30%    7.5520us  1      7.5520us  7.5520us  7.5520us  void thrust
      1.11%    6.4640us  1      6.4640us  6.4640us  6.4640us  void thrust
      1.11%    6.4640us  1      6.4640us  6.4640us  6.4640us  void thrust
      0.89%    5.1520us  5      1.0300us  928ns     1.3440us  [CUDA memcpy HtoD]
      0.45%    2.6240us  1      2.6240us  2.6240us  2.6240us  project_onto_paraboloid_k
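The two closed forms on this slide invert the triangular numbers (n_Layer) and the tetrahedral numbers (pos), which is what turning one thread index `ijk` back into three nested loop indices requires. A host-side C++ sketch of that unfolding follows. The index convention (combinations i < j < k enumerated as ijk = C(k,3) + C(j,2) + i) and the names `Triplet`, `unflattenTriplet`, `tri`, and `tet` are our assumptions; the slide does not show the exact indexing of the GPU code. Since the closed forms are evaluated in floating point, the estimate is corrected with exact integer arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Triangular numbers Tri(m) = m(m+1)/2 and tetrahedral numbers Tet(m) = m(m+1)(m+2)/6.
std::int64_t tri(std::int64_t m) { return m * (m + 1) / 2; }
std::int64_t tet(std::int64_t m) { return m * (m + 1) * (m + 2) / 6; }

// Closed-form inverse of the triangular numbers: Tri(nLayer(x)) == x.
double nLayer(double x) { return 0.5 * (std::sqrt(8.0 * x + 1.0) - 1.0); }

// Closed-form inverse of the tetrahedral numbers: Tet(pos(x)) == x.
double pos(double x) {
    if (x < 1.0) return 0.0;  // integer inputs below 1 can only be 0, and Tet(0) = 0
    const double a = std::cbrt(std::sqrt(3.0) * std::sqrt(243.0 * x * x - 1.0) + 27.0 * x);
    return a / std::cbrt(9.0) + 1.0 / (std::cbrt(3.0) * a) - 1.0;  // cbrt(9) = 3^(2/3)
}

struct Triplet { std::int64_t i, j, k; };  // hypothetical: loop indices with i < j < k

// Map a flat index (one GPU thread per combination) to the combination it
// stands for, replacing three nested loops over i < j < k.
Triplet unflattenTriplet(std::int64_t ijk) {
    assert(ijk >= 0);
    // Largest m with Tet(m) <= ijk; then k = m + 2.
    std::int64_t m = static_cast<std::int64_t>(std::floor(pos(static_cast<double>(ijk))));
    while (tet(m + 1) <= ijk) ++m;
    while (m > 0 && tet(m) > ijk) --m;
    const std::int64_t rest = ijk - tet(m);
    // Largest n with Tri(n) <= rest; then j = n + 1 and i = rest - Tri(n).
    std::int64_t n = static_cast<std::int64_t>(std::floor(nLayer(static_cast<double>(rest))));
    while (tri(n + 1) <= rest) ++n;
    while (n > 0 && tri(n) > rest) --n;
    return {rest - tri(n), n + 1, m + 2};
}
```

In a CUDA kernel, each thread would apply the same mapping to `ijk = threadIdx.x + blockIdx.x * blockDim.x` instead of running the triple loop.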
  70. ALGORITHMS #3
    Hough Transform • Riemann Track Finder • Triplet Finder
  71. Triplet Finder
    • Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
      – Original algorithm by Marius Mertens et al.
    • Ported to GPU by Andrew Adinetz
      – NVIDIA Application Lab Jülich
      – CUDA, Dynamic Parallelism, Thrust
    http://www.fz-juelich.de/ias/jsc/
    [Figure: STT, 1.5 m]
  72. Triplet Finder
    • Idea: use only a subset of the detector as seed
      – Combine 3 hits into a triplet
      – Calculate a circle through three points (two triplets and the interaction point); no fit
    • Features
      – Fast & robust algorithm, no t0 needed
      – Many tuning possibilities
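Computing a circle from three points needs no fit: the centre is the intersection of the perpendicular bisectors and has a closed form. A minimal C++ sketch (names hypothetical); according to the backup slides, the three points in the Triplet Finder are two virtual hits (triplets) and the interaction point.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x, y; };
struct Circle { Point centre; double radius; };

// Circle through three points (circumcircle), computed in closed form.
// Returns false if the points are (nearly) collinear, i.e. the track is straight.
bool circleThroughThreePoints(const Point& a, const Point& b, const Point& c, Circle& out) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::abs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    const Point centre{(a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d,
                       (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d};
    out = {centre, std::hypot(a.x - centre.x, a.y - centre.y)};
    assert(out.radius > 0.0);
    return true;
}
```

Returning `false` for (nearly) collinear points corresponds to a straight track with infinite radius; the tolerance `1e-12` is an arbitrary choice for this sketch.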
  73. Triplet Finder — Display
    [Event display; legend: triplet, isochrone early, isochrone early & skewed, isochrone close, isochrone late, MVD hit, track timed out, track current]
  74. Triplet Finder — Times

  81. Triplet Finder — Optimizations
    • Bunching Wrapper
      – Hits from one event have similar timestamps
      – Combine hits into sets (bunches) which occupy the GPU best
    [Figure: hits → events → bunches]
    $\mathcal{O}(N^2) \rightarrow \mathcal{O}(N)$
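Why bunching turns O(N²) into O(N): if time-sorted hits are cut into bunches of roughly constant size B and combinations are only formed inside a bunch, the pairwise work is about (N/B) · B² = N · B, which is linear in N for fixed B. The C++ sketch below uses a cut rule of our own (close a bunch once it holds at least `targetHits` hits and the next hit comes more than `minGap` later, so hits of one event stay together); the slide does not say how the wrapper chooses bunch boundaries, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Group hit timestamps into bunches: sort by time, then close the current bunch
// once it holds at least `targetHits` hits and the next hit follows after a gap
// larger than `minGap` (so hits of one event are not split).
std::vector<std::vector<double>> makeBunches(std::vector<double> timestamps,
                                             std::size_t targetHits, double minGap) {
    assert(targetHits > 0);
    std::sort(timestamps.begin(), timestamps.end());
    std::vector<std::vector<double>> bunches;
    std::vector<double> current;
    for (double t : timestamps) {
        if (current.size() >= targetHits && t - current.back() > minGap) {
            bunches.push_back(std::move(current));
            current.clear();
        }
        current.push_back(t);
    }
    if (!current.empty()) bunches.push_back(std::move(current));
    return bunches;
}

// Number of hit pairs to compare when combinations are only formed inside a
// bunch: sum over bunches of n(n-1)/2.
std::size_t pairComparisons(const std::vector<std::vector<double>>& bunches) {
    std::size_t pairs = 0;
    for (const auto& b : bunches) pairs += b.size() * (b.size() - 1) / 2;
    return pairs;
}
```

For 10 events of 10 hits each, spaced far apart in time, `makeBunches(ts, 20, 10.0)` yields 5 bunches of 20 hits, and the number of pair comparisons drops from 4950 (all hits together) to 950.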
  82. Triplet Finder — Bunching Performance

  83. Triplet Finder — Optimizations
    • Sector Row testing
      – After a track is found, hits are not associated by testing all hits of the current window, but only a subset (first test the rows of a sector, then the hits of a row)
  88. Triplet Finder — Sector Rows
    [Figure] Preliminary (in publication)
  89. Triplet Finder — Optimizations
    • Compare kernel launch strategies (diagram: CPU and GPU running Triplet Finder stages TF #1 to TF #4)
      – Dynamic Parallelism: 1 thread/bunch; calling kernel
      – Joined Kernel: 1 block/bunch; joined kernel
      – Host Streams: 1 stream/bunch; calling stream, combining stream
  90. Triplet Finder — Kernel Launches
    [Figure] Preliminary (in publication)
  91. Triplet Finder — Optimizations
    • Impact of chipset
                                Tesla K40     Tesla K20X
      Peak double performance   1.46 TFLOPS   1.31 TFLOPS
      Peak single performance   4.29 TFLOPS   3.95 TFLOPS
      GPU chipset               GK110B        GK110
      # CUDA cores              2880          2688
      Memory size               12 GB         6 GB
      Memory bandwidth          288 GB/s      250 GB/s
    Source: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
  92. Triplet Finder — Clock Speed / GPU
    [Figure] Preliminary (in publication)
    • K40: memory clock 3004 MHz; core clock 745 MHz / 875 MHz (GPU Boost)
    • K20X: memory clock 2600 MHz; core clock 732 MHz / 784 MHz (GPU Boost)
  93. Triplet Finder — Optimizations
    • Many optimizations possible
      – Most important: the bunching wrapper
      – More float, less double: cards à la K10 are a viable alternative
    • Best performance: 20 µs/event
      → Online tracking is a feasible technique for PANDA
      – Multi-GPU system needed
      – $\mathcal{O}(100)$ GPUs
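The O(100) GPUs estimate can be reproduced from the event rate quoted earlier in the talk (2 × 10⁷ events/s) and the best time per event: 2 × 10⁷ s⁻¹ × 20 µs = 400, i.e. about 400 GPUs' worth of processing time per second of data, assuming the 20 µs/event is the throughput of a single GPU. The helper below (hypothetical name) only does this throughput arithmetic and ignores data transfer, other reconstruction steps, and headroom.

```cpp
#include <cassert>
#include <cmath>

// Number of GPUs needed to keep up with `eventsPerSecond` when one GPU spends
// `microsecondsPerEvent` per event (pure throughput estimate, rounded up).
long gpusNeeded(double eventsPerSecond, double microsecondsPerEvent) {
    assert(eventsPerSecond > 0.0 && microsecondsPerEvent > 0.0);
    const double gpuSecondsPerSecond = eventsPerSecond * microsecondsPerEvent / 1e6;
    return static_cast<long>(std::ceil(gpuSecondsPerSecond));
}
```

`gpusNeeded(2e7, 20.0)` returns 400, which is O(100). The time is passed in microseconds so that the product stays exact in floating point and the rounding up is not thrown off by representation error.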
  94. Summary
    • GPUs are very interesting for HEP
    • PANDA investigates GPUs as a central element in the experiment's design
    • Algorithms in active evaluation and optimization
    • Collaboration with NVIDIA Application Lab
  95. Thank you!
    Andreas Herten
    a.herten@fz-juelich.de
    @AndiH #GTC14
  96. List of Resources Used
    • #4: Earth icon by Francesco Paleari from The Noun Project
    • #4: Einstein icon by Roman Rusinov from The Noun Project
    • #6: FAIR vector logo from the official FAIR website
    • #6: FAIR rendering from the official website
    • #11: Flare Gun icon by Jop van der Kroef from The Noun Project
    • #27: STT event animation by Marius C. Mertens
    • #35: Graphics card images from NVIDIA promotional material
    • #35: GPU specifications
      – Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
      – Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
      – Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
  97. BACKUP

  110. Hough Transform — Principle
    [Figure: hits in the (x,y) plane and their curves in the (α, r) plane]
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
    → The bin with the highest multiplicity gives the track parameters
  113. Riemann Algorithm — Procedure
    ① Create triplets of hit points
      – All possible three-hit combinations need to become triplets
    ② Grow triplets to tracks: continuously test whether the next hit fits the triplet's track
      – Use the Riemann paraboloid for a circle fit of the track
        • Test closeness of the new hit: good → add hit; bad → dismiss hit
        • Continue with the next hit
      – Helix fit: arc length s vs. z position
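Once the circle in the xy plane is known, the helix fit reduces to a straight line: along a helix, z grows linearly with the arc length s in the bending plane, z = z₀ + s · tan λ, with λ the dip angle. Below is a minimal unweighted least-squares sketch of that fit (names hypothetical); the real algorithm also accounts for hit uncertainties, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineFit { double z0, tanLambda; };  // z = z0 + s * tanLambda

// Unweighted least-squares fit of z versus arc length s.
LineFit fitZvsArcLength(const std::vector<double>& s, const std::vector<double>& z) {
    assert(s.size() == z.size() && s.size() >= 2);
    const double n = static_cast<double>(s.size());
    double sumS = 0.0, sumZ = 0.0, sumSS = 0.0, sumSZ = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        sumS += s[i];
        sumZ += z[i];
        sumSS += s[i] * s[i];
        sumSZ += s[i] * z[i];
    }
    const double denom = n * sumSS - sumS * sumS;
    assert(denom != 0.0 && "all hits at the same arc length");
    const double slope = (n * sumSZ - sumS * sumZ) / denom;
    return {(sumZ - slope * sumS) / n, slope};
}
```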
  114. Riemann Algorithm — ① Triplets
    [Figure: hits on detector layers 1 to 5 combined into triplets; axis: layer number]
  123. Riemann Algorithm — ② Expansion
    [Figure: hits in the (x,y) plane expanded to z′, i.e. mapped onto the Riemann surface (paraboloid)]
  145. Triplet Finder — Method
    • STT hit in pivot straw
    • Find surrounding hits → create a virtual hit (triplet) at their center of gravity (cog)
    • Combine with
      1. a second STT pivot-cog virtual hit
      2. the interaction point
    • Calculate the circle through these three points → track candidate
    [Figure: STT with interaction point]
  146. Triplet Finder — Optimizations
    • Sector Row testing
      – Thicken the track; shrink the sector-row layer to a line
      – Find the intersection
    [Figure: Sector-Row Testing; track and sector row]
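One way to express "thicken track; shrink sector-row layer to line; find intersection" in code: treat the track circle as an annulus of half-width w and the sector row as a line segment, and only test the hits of that row if the two intersect. This C++ sketch is our interpretation of the slide, not the Triplet Finder implementation; the tolerance w and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct P2 { double x, y; };

// Smallest distance from point c to the segment [a, b].
double distanceToSegment(const P2& c, const P2& a, const P2& b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len2 = dx * dx + dy * dy;
    double t = 0.0;  // degenerate segment: use endpoint a
    if (len2 > 0.0) {
        const double proj = ((c.x - a.x) * dx + (c.y - a.y) * dy) / len2;
        t = std::max(0.0, std::min(1.0, proj));  // clamp to the segment
    }
    return std::hypot(a.x + t * dx - c.x, a.y + t * dy - c.y);
}

// Does the sector row, shrunk to the segment [a, b], cross the track circle
// (centre, radius) thickened to an annulus of half-width w?
bool rowCrossesTrack(const P2& a, const P2& b, const P2& centre, double radius, double w) {
    assert(radius > 0.0 && w >= 0.0);
    const double dMin = distanceToSegment(centre, a, b);
    const double dMax = std::max(std::hypot(a.x - centre.x, a.y - centre.y),
                                 std::hypot(b.x - centre.x, b.y - centre.y));
    // Distances along the segment cover [dMin, dMax] continuously, so the segment
    // touches the annulus [radius - w, radius + w] iff the two intervals overlap.
    return dMin <= radius + w && dMax >= radius - w;
}
```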
  147. Triplet Finder — Kernel Launch Strategies
    • Joined Kernel (JK): slowest
      – High # of registers → low occupancy
    • Dynamic Parallelism (DP) / Host Streams (HS): comparable performance
      – Performance
        • HS faster for a small # of processed hits, DP faster for > 45 000 hits
        • HS stagnates there, while DP continues rising
      – Limiting factors
        • High # of required kernel calls
        • Kernel launch latency
        • Memcopy
      – HS is more affected by this, because of
        • More PCI-E transfers (launch configurations for kernels)
        • Lower launch throughput, so kernel launch latency becomes more important
        • False dependencies of launched kernels
          – A single CPU thread handles all CUDA streams (multi-threading is possible, but the synchronization overhead is too high for good performance)
          – Grid scheduling is done in hardware (Grid Management Unit) (DP: in software)
          » False dependencies when N(streams) > N(device connections) = 32 (compute capability 3.5)
  148. Triplet Finder — Host Stream Connections
    [Figure] Preliminary (in publication)
  149. Triplet Finder — Bunch Sizes
    [Figure] Preliminary (in publication)
  150. Forschungszentrum Jülich & Me
    [Map: Berlin, Munich, Cologne, Jülich]
    • Research center
      – Founded 1956; federal research center
      – Budget: 730 million USD/year
      – 5300 employees
        • Thereof 1700 scientists (600 PhD students)
      – Topics: Health, Energy, Environment; Physics; Supercomputing
      – Many large-scale facilities
    • Me
      – Diploma in physics from RWTH Aachen University (CMS experiment)
      – PhD researcher since 2011: GPU Online Tracking for PANDA