GPU-based Online Tracking for the PANDA Experiment [NV App Lab 2014]

AndiH
October 17, 2014

Talk for the NVIDIA Application Lab workshop 2014 at the Jülich Supercomputing Centre.

Transcript

  1. NVIDIA Application Lab Workshop 2014, 8 October 2014
    Andreas Herten: GPU-based Online Tracking for the PANDA Experiment
    Member of the Helmholtz Association
  2. Outline
    • PANDA
      – HEP
      – Our Experiment
      – Online Event Filter
    • Algorithms
      – Hough Transform
      – Riemann Track Finder
      – Triplet Finder
  10. High Energy Physics
    • High Energy Physics (HEP):
      – HEP recipe
        1. Accelerate particles (e, p, …)
        2. Accelerate particles more!
        3. Smash into each other
        4. Look at resulting particles
        5. Understand universe ✓
      – GPUs are interesting for HEP
        • Many events due to high collision rate
        • Events independent, dividable into subsets
        • Many features extractable (computationally intensive)
    [Annotation: E = mc²]
  14. FAIR
    • Facility for Antiproton and Ion Research
      – New accelerator complex (Darmstadt, Germany)
      – Next to GSI laboratory
      – Construction in progress, ending 2018
      – Four pillars of research:
        • APPA: Atom & plasma physics
        • NUSTAR: Nuclear structure, astrophysics
        • CBM: Hadron physics
        • PANDA: Hadron physics
    [Image credits: © Google Maps; fair-center.eu]
  15. FAIR Accelerator Complex
    [Figure: accelerator layout showing existing facilities, new facilities and experiments]
    • High Energy Storage Ring
      – Beam momentum: p = 1.5–15 GeV/c
      – High resolution: L = 10³¹ cm⁻²s⁻¹, Δp/p < 4·10⁻⁵
      – High luminosity: L = 2·10³² cm⁻²s⁻¹, Δp/p < 10⁻⁴
  16. PANDA: MVD & STT
    • Micro Vertex Detector
      – Silicon-based pixel + strip detector
      – 10 000 000 + 200 000 channels
      – Vertex resolution: < 100 μm
    • Straw Tube Tracker
      – 4636 small drift tubes (ø 1 cm)
      – Drift times: < 250 ns
      – 26 layers, 8 skewed
      – Material budget: 1.2 % radiation length
  18. PANDA: Physics
    • Meson spectroscopy
      – Light mesons
      – Charmonium
      – Open charm
      – Exotic states
        • Glueballs
        • Hybrids
        • Molecules/multiquarks
    • Baryon production
    • Nucleon structure, e.m. processes
    • Charm in nuclei
    • Strangeness physics
    → Broad physics program
    [Figure: accessible states versus antiproton momentum (0 to 15 GeV/c) and mass (1 to 6 GeV/c²): ΛΛ, ΣΣ, ΞΞ, ΩΩ, ΛcΛc, ΣcΣc, ΞcΞc, ΩcΩc, DD, DsDs; light qq (π, ρ, ω, f2, K, K*); cc (J/ψ, ηc, χcJ); ggg, gg; nng, ssg, ccg; qqqq, ccqq]
  19. PANDA: Event Reconstruction
    • Continuous read out
      – Novel feature
      – Background & signal similar
      – No hardware trigger based on few sub-detectors, but online event reconstruction using full detector information
    • Rate: 2 × 10⁷ events/s, 200 GB/s raw data
    • Reduction (reject background events, save interesting events): amount ~1/1000, time 50 ns/evt
    • Storage space for offline analysis: 3 PB/y
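The rate figures on this slide are mutually consistent, which a short worked calculation makes visible. It uses only the two numbers quoted above and assumes decimal units (1 GB = 10⁹ bytes); the resulting event size is a derived average, not a number from the slide.

```latex
\[
\frac{200\ \mathrm{GB/s}}{2\times10^{7}\ \mathrm{events/s}} = 10\ \mathrm{kB/event},
\qquad
\frac{1}{2\times10^{7}\ \mathrm{events/s}} = 50\ \mathrm{ns/event}
\]
```

The second value matches the 50 ns/evt time budget quoted for the reduction step.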
  21. PANDA: Read Out Scheme
    Requirements for online tracking:
    • Fast
    • Sophisticated algorithms possible; reprogrammable
    • Parallelism beyond single devices
    • Fast
    • Limited precision ok
    → GPUs
  29. PANDA: Online Tracking Example
    The physics side: antiproton-proton event
    p̄p → ψ′ → ψ π⁺π⁻ → e⁺e⁻ π⁺π⁻
    [Animation: the ψ′ decays into π⁺, π⁻, e⁺ and e⁻]
  39. PANDA: Online Tracking Example
    The detector side: everything in reverse
    • Particle tracks are curves (actually: 3D helices) → Find curves connecting hit points!
    • Sort by track quality: Hits well matched? How many hits?
    • Identify final particles: curvature, length, … (π⁺ π⁻ e⁺ e⁻ ?)
    • Identify intermediate particles: mass constraints, geometry, … (ψ′)
    • Identify process: p̄p → ψ′ → e⁺e⁻ π⁺π⁻
  49. PANDA: Triggering
    • Usual HEP experiment: a trigger from fast detector layer(s) triggers data acquisition
    • PANDA: online tracking!
    [Diagram: ψ′ event with π⁺, π⁻, e⁺ and e⁻ tracks]
  51. Algorithm: Hough Transform
    • Idea: Transform (x,y)_i → (α,r)_ij, find lines via (α,r) space
    • Solve the line equation $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$ for
      – Lots of hits (x,y,ρ)_i and
      – Many α_j ∈ [0°,360°) each
    • Fill histogram
    • Extract track parameters
    • Numbers: i: ~100 hits/event (STT); j: every 0.2°; r_ij: 180 000 values
    [Inset: Hough Transform principle, (x, y) space to (r, α) space]
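The fill step can be sketched on the CPU as follows. This is a minimal illustration of the line equation above, not the Thrust or plain CUDA implementation from the talk; the `Hit` and `HoughHistogram` types, the binning parameters and the naive `maxBin` peak search are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One detector hit: position (x, y) and isochrone radius rho (0 for point-like hits).
struct Hit {
    double x, y, rho;
};

// 2D accumulator over (alpha, r); the binning parameters are illustrative.
struct HoughHistogram {
    std::size_t nAlpha, nR;
    double rMin, rMax;
    std::vector<int> counts;  // row-major: counts[alphaBin * nR + rBin]

    HoughHistogram(std::size_t nAlpha_, std::size_t nR_, double rMin_, double rMax_)
        : nAlpha(nAlpha_), nR(nR_), rMin(rMin_), rMax(rMax_), counts(nAlpha_ * nR_, 0) {
        assert(nAlpha > 0 && nR > 0 && rMax > rMin);
    }
};

// Fill the histogram with r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i
// for every hit i and every angle alpha_j = j * 360 deg / nAlpha.
void fillHough(const std::vector<Hit>& hits, HoughHistogram& h) {
    const double pi = std::acos(-1.0);
    const double binWidth = (h.rMax - h.rMin) / static_cast<double>(h.nR);
    for (const Hit& hit : hits) {
        for (std::size_t j = 0; j < h.nAlpha; ++j) {
            const double alpha = 2.0 * pi * static_cast<double>(j) / static_cast<double>(h.nAlpha);
            const double r = std::cos(alpha) * hit.x + std::sin(alpha) * hit.y + hit.rho;
            if (r < h.rMin || r >= h.rMax) continue;  // outside the histogram range
            const std::size_t rBin = static_cast<std::size_t>((r - h.rMin) / binWidth);
            ++h.counts[j * h.nR + rBin];
        }
    }
}

// Flat index of the first fullest bin (a naive stand-in for real peak finding).
std::size_t maxBin(const HoughHistogram& h) {
    std::size_t best = 0;
    for (std::size_t b = 1; b < h.counts.size(); ++b)
        if (h.counts[b] > h.counts[best]) best = b;
    return best;
}
```

With the slide's numbers (about 100 hits and 0.2° steps over 360°), the two loops evaluate 100 × 1800 = 180 000 values of r per event. Every (i, j) pair is independent, which is what makes the transform a natural fit for one GPU thread per pair.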
  57. Hough Transform: Visualization, Lines
    • Create lines going through hit point (x,y)_i
      – Line parameterized by $r_{ij} = \cos(\alpha_j) \cdot x_i + \sin(\alpha_j) \cdot y_i$
    • Fill line parameters (α,r)_ij into histogram
      – Rasterize for many α_j ∈ [0°,180°)
    [Animation: α = 0°, 10°, 20°, …]
  63. Hough Transform: Visualization, Points
    • Create lines going through hit point (x,y)_i
    • Repeat for every hit point i
    [Animation: α = 0°, 10°, 20°, …]
  78. Hough Transform: Granularity
    • Choice of α granularity determines resolution
    [Animation: α = 0°, 2°, 4°, …]
  80. Algorithm: Hough Transform
    [Figure: Hough-transformed histogram, r versus angle (0° to 180°), for 68 (x,y) points from the PANDA STT+MVD on an 1800 × 1800 grid; 2.2356e+08 entries]
  81. Hough Transform: Remarks
    Two implementations:
    • Thrust (CUDA's STL)
      – Performance: 3 ms/evt
      – Reduce to set of standard routines
        • Fast (uses Thrust's optimized algorithms)
        • Inflexible (hard to customize)
        • Not yet at performance maximum
    • Plain CUDA
      – Performance: 0.5 ms/evt
      – Built completely for this task
        • Fitting for PANDA; customizable
        • A bit more complicated in parts
    Peak finding is challenging
  82. Hough Transform: Remarks
    Peak finding is challenging: Hill Climber
    [Figure: HT histogram HoughHist, r versus α (0° to 180°), 9000 entries]
  84. Hough Transform: Remarks
    Peak finding is challenging: Iterative Maximum Deleter (current research)
    [Figure: HT histograms for successive iterations, r versus α (0° to 180°): houghIt0 (9000 entries), houghIt1 (5580 entries), houghIt2 (2700 entries)]
  85. Riemann Track Finder: Method
    • Idea: Don't fit lines (in 2D), fit planes (in 3D)!
    • Step 1: Create seeds
      – All possible three hit combinations
    • Step 2: Grow seeds to tracks: continuously test whether the next hit fits
      – Use mapping to Riemann paraboloid (+ s-z fit, det. layer)
    [Figure: hits in the x-y plane mapped onto the paraboloid (z′ axis)]
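Why fitting planes helps can be shown with three hits. Lifting a point (x, y) to (x, y, x² + y²) turns every circle in the plane into a plane in 3D, so a seed of three hits fixes a plane by solving one 3×3 linear system. The sketch below is a simplified CPU illustration under that assumption; the type and function names are hypothetical, and the s-z fit and detector-layer logic mentioned on the slide are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point2D { double x, y; };
struct Circle { double cx, cy, r; bool valid; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Determinant of a 3x3 matrix.
double det3(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Lift three hits onto the paraboloid z = x^2 + y^2 and find the plane
// z = A*x + B*y + C through them (Cramer's rule). That plane corresponds to
// the circle with centre (A/2, B/2) and radius sqrt(C + (A/2)^2 + (B/2)^2).
Circle circleFromSeed(const Point2D& p0, const Point2D& p1, const Point2D& p2) {
    const std::array<Point2D, 3> p = {{p0, p1, p2}};
    Mat3 m{};
    std::array<double, 3> z{};
    for (int i = 0; i < 3; ++i) {
        m[i] = {{p[i].x, p[i].y, 1.0}};
        z[i] = p[i].x * p[i].x + p[i].y * p[i].y;
    }
    const double d = det3(m);
    if (std::fabs(d) < 1e-12) return {0.0, 0.0, 0.0, false};  // collinear hits
    std::array<double, 3> coeff{};
    for (int c = 0; c < 3; ++c) {
        Mat3 mc = m;
        for (int i = 0; i < 3; ++i) mc[i][c] = z[i];  // replace column c by z
        coeff[c] = det3(mc) / d;
    }
    const double cx = coeff[0] / 2.0, cy = coeff[1] / 2.0;
    return {cx, cy, std::sqrt(coeff[2] + cx * cx + cy * cy), true};
}

// Accept a further hit if its distance to the circle is below the tolerance.
bool hitFits(const Circle& c, Point2D hit, double tolerance) {
    assert(c.valid);
    const double dist = std::hypot(hit.x - c.cx, hit.y - c.cy);
    return std::fabs(dist - c.r) < tolerance;
}
```

A growing step would call `hitFits` for each candidate hit and refit once a hit is accepted; with more than three hits the plane becomes a least-squares fit rather than an exact solve.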
  88. Riemann Track Finder: GPU Adaptations
    • Step 1: seed generation
      – CPU: 3 loops to generate seeds serially
          for (int i = 0; i < hitsInLayerOne.size(); i++) {
            for (int j = 0; j < hitsInLayerTwo.size(); j++) {
              for (int k = 0; k < hitsInLayerThree.size(); k++) {
                /* Triplet Generation */
              }
            }
          }
      – GPU: Needed: mapping of inherent GPU indexing variable to triplet index
          int ijk = threadIdx.x + blockIdx.x * blockDim.x;
          $n_{\mathrm{Layer}_x} = \frac{1}{2}\left(\sqrt{8x + 1} - 1\right)$
          $\mathrm{pos}(n_{\mathrm{Layer}_x}) = \frac{\sqrt[3]{\sqrt{3}\,\sqrt{243x^2 - 1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\,\sqrt{243x^2 - 1} + 27x}} - 1$
    • Step 2: growing
      – Port of CPU code; parallelism on seed base
      – Only easy computations; e.g. 3x3 matrices
    → 100 × faster than CPU version: ~0.6 ms/event
    Still needs implementing into PandaRoot
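For the three nested loops shown on the slide, the simplest way to recover (i, j, k) from a flat thread index is division and modulo. The sketch below shows that mapping on the CPU; it is an assumption for illustration and not the closed-form mapping from the slide, whose formulas look like inverses of the triangular and tetrahedral numbers (which would suit combinations drawn from a single hit list). The `TripletIndex` type and `unflatten` function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Indices into the three per-layer hit lists that make up one seed.
struct TripletIndex {
    std::size_t i, j, k;
};

// Decompose a flat index ijk in [0, n1*n2*n3) into (i, j, k), with k running
// fastest, so that ijk == (i * n2 + j) * n3 + k as in the nested CPU loops.
TripletIndex unflatten(std::size_t ijk, std::size_t n2, std::size_t n3) {
    assert(n2 > 0 && n3 > 0);
    TripletIndex t;
    t.k = ijk % n3;
    t.j = (ijk / n3) % n2;
    t.i = ijk / (n2 * n3);
    return t;
}
```

In a kernel, `ijk` would come from `threadIdx.x + blockIdx.x * blockDim.x`, and threads with `ijk >= n1 * n2 * n3` would return before touching any hit.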
  89. Triplet Finder
    • Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
      – Original algorithm by Marius Mertens et al.
    • Ported to GPU by Andrew Adinetz
      – CUDA, Dynamic Parallelism, Thrust
      – Quality of tracks comparable to CPU
    http://www.fz-juelich.de/ias/jsc/
    [Figure with 1.5 m scale bar]
  90. Triplet Finder
    • Idea: Use only subset of detector as seed
      – Don't use STT isochrones (drift times)
      – Calculate circle from 3 points (no fit)
  96. STT: Drift Tubes and t0
    • Particle ionizes gas atoms in drift tubes
    • Electrons drift to anode wire, ions to wall
    • Signal only when electrons arrive at wire
    • No information about drift duration! For that, start time (t0) needed:
      – t_arrival − t0 ≈ t_drift
      – v_drift = const → t_drift · v_drift = r_isochrone
    • Resolution without t0: O(0.1 cm) (r_tube)
    • Resolution with t0: O(0.015 cm)
    • Usual HEP experiment: t0 by trigger. But PANDA has no trigger…
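The isochrone relation can be written as a small helper. This is a sketch under the slide's constant-drift-velocity approximation; the function name and the clamping to the tube radius are assumptions. The example velocity of 0.002 cm/ns is only what follows from dividing the 0.5 cm tube radius by the 250 ns upper bound on drift times, not a measured value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Isochrone radius from the drift time, assuming a constant drift velocity:
// r_isochrone = v_drift * (t_arrival - t0), limited to the range [0, tube radius].
double isochroneRadius(double tArrivalNs, double t0Ns,
                       double vDriftCmPerNs, double tubeRadiusCm) {
    assert(vDriftCmPerNs > 0.0 && tubeRadiusCm > 0.0);
    const double tDriftNs = tArrivalNs - t0Ns;
    const double r = vDriftCmPerNs * tDriftNs;
    return std::min(std::max(r, 0.0), tubeRadiusCm);
}
```

For example, a signal arriving 125 ns after t0 gives `isochroneRadius(1125.0, 1000.0, 0.002, 0.5)`, about 0.25 cm, that is, halfway between wire and wall. Without t0 the drift time is unknown and only the tube itself localizes the hit.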
  98. Triplet Finder
    • Idea: Use only subset of detector as seed
      – Don't use STT isochrones (drift times)
      – Calculate circle from 3 points (no fit)
    • Features
      – Fast & robust algorithm, no event time needed
      – Many tuning possibilities
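Computing a circle from three points needs no fit because three non-collinear points determine the circle exactly. The sketch below uses the closed-form circumcentre; which three points the Triplet Finder actually combines is not stated on the slide, so the inputs here are simply three arbitrary 2D points, and `Vec2`, `SeedCircle` and `circleThrough` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };
struct SeedCircle { double cx, cy, r; };

// Circle through three points via the closed-form circumcentre.
// Returns false if the points are (nearly) collinear and no circle exists.
bool circleThrough(const Vec2& a, const Vec2& b, const Vec2& c, SeedCircle& out) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return false;
    const double a2 = a.x * a.x + a.y * a.y;  // squared distances from the origin
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    out.cx = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    out.cy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    out.r = std::hypot(a.x - out.cx, a.y - out.cy);
    return true;
}
```

The whole computation is a handful of multiplications and one division per seed, with no iteration and no dependence on drift times, which fits the "fast & robust, no event time needed" claim.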
  99. Triplet Finder: Animation
    [Animation legend: triplet; isochrone early; isochrone early & skewed; isochrone close; isochrone late; MVD hit; track timed out; track current]
  105. Triplet Finder: Optimizations
    • Bunching Wrapper
      – Hits from one event have similar timestamps
      – Combine hits into sets (bunches) that occupy the GPU best
      – O(N²) → O(N)
    [Diagram labels: hit, event, bunch]
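One way to picture bunching is to cut the time-sorted hit stream into consecutive groups of bounded duration and size. The sketch below does exactly that; the two limits and the `Bunch` and `makeBunches` names are hypothetical, and the real wrapper's criteria for what occupies the GPU best are not given on the slide. A plausible reading of O(N²) → O(N) is that pairwise work then stays inside bunches of bounded size, so the total grows linearly with the number of hits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open index range [begin, end) into a time-sorted hit array.
struct Bunch {
    std::size_t begin, end;
};

// Split time-sorted hit timestamps into consecutive bunches, each spanning at
// most maxDurationNs and holding at most maxHits hits (both hypothetical knobs).
std::vector<Bunch> makeBunches(const std::vector<double>& sortedTimesNs,
                               double maxDurationNs, std::size_t maxHits) {
    assert(maxDurationNs > 0.0 && maxHits > 0);
    std::vector<Bunch> bunches;
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= sortedTimesNs.size(); ++i) {
        const bool atEnd = (i == sortedTimesNs.size());
        const bool full = !atEnd &&
            (i - begin >= maxHits ||
             sortedTimesNs[i] - sortedTimesNs[begin] > maxDurationNs);
        if (atEnd || full) {
            bunches.push_back({begin, i});  // close the current bunch
            begin = i;
        }
    }
    return bunches;
}
```

Each resulting range could then be handed to one thread, block or stream, which is the choice compared in the processing strategies.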
  106. Triplet Finder: Optimizations
    • Compare data processing strategies
      – Dynamic Parallelism: 1 thread/bunch, calling kernel
      – Joined Kernel: 1 block/bunch, joined kernel
      – Host Streams: 1 stream/bunch, calling stream and combining streams
    [Diagram: CPU and GPU sides; each strategy runs Triplet Finder (TF) stages #1 to #4]
  111. Triplet Finder: Binning, Sector Rows
    • Sector Row testing
      – After a track is found: hit association not with all hits of the current window, but only with a subset (first test rows of sector, then hits of row)
  112. Triplet Finder: Binning, Sector Rows
    [Figure (K20X, in publication): All Tubes (No Binning) versus Sector-Row Binning]
  113. Triplet Finder: Binning, Skewlets
    [Figure (K20X, in publication): All Skewlets (No Binning) versus Skewlet Binning]
  114. Triplet Finder: Clock Speed / GPU
    [Figure (in publication); legend gives memory clock, core clock / GPU Boost]
    • K40: 3004 MHz, 745 MHz / 875 MHz
    • K20X: 2600 MHz, 732 MHz / 784 MHz
  115. Triplet Finder: Summary
    • Best performance: 14 µs/event
      – 14·10⁻⁶ s/event × 2·10⁷ events/s ⇒ 280 GPUs (2014)
      – PANDA (2019): Multi GPU system
      – O(100) GPUs
    • Optimizations possible & needed
      – ε needs to be improved
      – Speed, €:
        • More float, less double: cards à la K10
        • Consumer-grade cards à la GTX
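The GPU count follows directly from the per-event time and the event rate. The worked lines below repeat the slide's arithmetic and then invert it for a system of 100 GPUs; they assume ideal linear scaling across GPUs, and the 5 µs figure is a derived target, not a number from the talk.

```latex
\[
N_{\mathrm{GPU}} = t_{\mathrm{event}} \cdot R
= 14\times10^{-6}\ \mathrm{s/event} \cdot 2\times10^{7}\ \mathrm{events/s} = 280
\]
\[
N_{\mathrm{GPU}} = 100 \;\Rightarrow\;
t_{\mathrm{event}} = \frac{100}{2\times10^{7}\ \mathrm{events/s}} = 5\ \mu\mathrm{s/event}
\]
```

So reaching the order of 100 GPUs would take roughly a threefold speed-up over the 14 µs/event quoted above, through faster code, faster hardware, or both.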
  117. Summary
    • PANDA is investigating GPUs as part of its online event reconstruction scheme
    • Algorithms in active evaluation and optimization
      – Triplet Finder performance-optimized
    • Data transfer to GPU under investigation: FairMQ
    Thank you! Andreas Herten [email protected]
  118. List of Resources Used
    • #3: Jupiter icon by Nikki Rodriguez from The Noun Project
    • #3: Einstein icon by Roman Rusinov from The Noun Project
    • #6: FAIR vector logo from official FAIR website
    • #6: FAIR rendering from official website
    • #11: Flare Gun icon by Jop van der Kroef from The Noun Project
    • #27: STT event animation by Marius C. Mertens
    • #35: Graphics cards images by NVIDIA promotion
    • #35: GPU Specifications
      – Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
      – Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
      – Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
  128. Hough Transform — Principle

    [Figure: hits (*) in the x-y plane and the corresponding Hough space with axes r and α]
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
    → Bin with highest multiplicity gives track parameters
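The voting scheme on this slide can be sketched in a few lines. For every hit i and every sampled angle α_j the formula gives an r value, each (α, r) pair votes for a bin, and the fullest bin is returned. This is a CPU-only illustration, not the GPU implementation from the talk. The function name `hough_peak`, the angle sampling and the bin width are assumptions, and ρ_i is treated simply as a per-hit offset as it appears in the formula.

```python
import math
from collections import Counter


def hough_peak(hits, n_angles=180, r_bin_width=0.5):
    """Return (alpha, r) of the most populated Hough bin.

    hits: iterable of (x, y, rho) tuples; rho is the per-hit offset from
    the slide's formula (assumed here, e.g. 0 for point-like hits).
    """
    votes = Counter()
    for x, y, rho in hits:
        for j in range(n_angles):
            alpha = math.pi * j / n_angles
            # r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i
            r = math.cos(alpha) * x + math.sin(alpha) * y + rho
            votes[(j, round(r / r_bin_width))] += 1
    (j, r_bin), _count = votes.most_common(1)[0]
    return math.pi * j / n_angles, r_bin * r_bin_width


# Five point-like hits on the line with alpha = 45 degrees and r = 2.
a = math.pi / 4
hits = [(2 * math.cos(a) - d * math.sin(a), 2 * math.sin(a) + d * math.cos(a), 0.0)
        for d in (-40, -20, 0, 20, 40)]
alpha, r = hough_peak(hits)
print(math.degrees(alpha), r)
```

The inner loop over angles is independent for every hit, which is what makes the transform a natural fit for one-thread-per-hit or one-thread-per-angle parallelisation.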
  130. Riemann Algorithm — Procedure

    • (1) Create triplets of hit points
      – All possible three-hit combinations need to become triplets
    • (2) Grow triplets to tracks: continuously test whether the next hit fits the triplet track
      – Use Riemann paraboloid for the circle fit of the track
        • Test closeness of new hit: good → add hit; bad → dismiss hit
        • Continue with next hit
      – Helix fit: arc length s vs. z position
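The grow step can be illustrated with a small sketch. Lifting each hit (x, y) onto the paraboloid z' = x² + y² turns a circle in the plane into a plane in 3D, because (x − a)² + (y − b)² = R² is equivalent to z' = 2a·x + 2b·y + (R² − a² − b²). The sketch fits that plane with ordinary least squares, which is a simplification chosen here for brevity; the talk does not state which fit variant is used. All names (`solve3`, `fit_circle`, `grow_track`, `max_dist`) are hypothetical, and the helix fit in s vs. z is left out.

```python
import math


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_circle(points):
    """Fit a circle by fitting the plane z' = A*x + B*y + C on the paraboloid."""
    rows = [(x, y, 1.0) for x, y in points]
    zs = [x * x + y * y for x, y in points]
    # Normal equations of the least-squares plane fit.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    big_a, big_b, big_c = solve3(ata, atz)
    cx, cy = big_a / 2.0, big_b / 2.0
    return cx, cy, math.sqrt(big_c + cx * cx + cy * cy)


def grow_track(seed, candidates, max_dist):
    """Add each candidate hit that lies within max_dist of the fitted circle."""
    track = list(seed)
    for hit in candidates:
        cx, cy, radius = fit_circle(track)
        if abs(math.hypot(hit[0] - cx, hit[1] - cy) - radius) <= max_dist:
            track.append(hit)  # good: add hit
        # bad: dismiss hit and continue with the next one
    return track
```

A seed of three non-collinear hits defines the plane exactly, so the first test is against the circle through the triplet, and every accepted hit refines the fit for the next test.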
  131. Riemann Track Finder — Seeds (step 1)

    [Animation, slides 131-137: seed triplets are built from hit combinations across layers 1 to 5; axis label: Layer number]
  138. Riemann Algorithm — Expansion (step 2)

    [Animation, slides 138-147: hits (x) in the x-y plane are expanded to z′ and land on the Riemann surface (paraboloid)]
  155. Triplet Finder — Method

    • STT hit in pivot straw
    • Find surrounding hits → create virtual hit (triplet) at center of gravity (cog)
    • Combine with
      1. Second STT pivot-cog virtual hit
      2. Interaction point
    • Calculate circle through three points → track candidate
    [Figure: STT with interaction point]
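The geometric core of the method is a center of gravity followed by a circle through three points. The sketch below uses the standard circumcenter formula. The function names and the sample coordinates are illustrative assumptions, and the interaction point is placed at the origin only for the example.

```python
import math


def center_of_gravity(points):
    """Virtual hit: mean position of the pivot hit and its surrounding hits."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def circle_through(p1, p2, p3):
    """Return (cx, cy, radius) of the circle through three points.

    Returns None if the points are (numerically) collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


# Two virtual hits (made-up coordinates) combined with the interaction point.
interaction_point = (0.0, 0.0)
cog_a = center_of_gravity([(1.9, 0.1), (2.1, 0.1), (2.0, -0.2)])
cog_b = center_of_gravity([(-0.1, 2.0), (0.1, 2.1), (0.0, 1.9)])
print(circle_through(interaction_point, cog_a, cog_b))
```

Each pair of virtual hits yields one independent track candidate, so the combinations can be evaluated in parallel.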
  156. Triplet Finder — Optimizations

    • Sector-row testing
      – Thicken track; shrink sector-row layer to line
      – Find intersection
    [Figure: track crossing a sector row]
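One way to read the sector-row test is as an intersection check between a thickened circle (an annulus around the track candidate) and a line segment standing in for the sector row. The sketch below follows that reading. It is an interpretation of the slide, not the implementation from the talk, and the function name and parameters are hypothetical.

```python
import math


def segment_hits_thick_circle(center, radius, half_width, a, b):
    """True if segment a-b intersects the annulus radius +/- half_width.

    The distance from the circle center varies continuously along the
    segment between its minimum (closest point) and maximum (farther
    endpoint), so it is enough to compare that range with the annulus.
    """
    cx, cy = center
    ax, ay = a[0] - cx, a[1] - cy
    bx, by = b[0] - cx, b[1] - cy
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Parameter of the point on the segment closest to the center, clamped to [0, 1].
    t = 0.0 if seg_len2 == 0.0 else max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len2))
    d_min = math.hypot(ax + t * dx, ay + t * dy)
    d_max = max(math.hypot(ax, ay), math.hypot(bx, by))
    return d_min <= radius + half_width and d_max >= radius - half_width


# A sector row crossing the track, one fully inside and one fully outside.
print(segment_hits_thick_circle((0.0, 0.0), 5.0, 0.5, (4.0, -1.0), (6.0, 1.0)))
print(segment_hits_thick_circle((0.0, 0.0), 5.0, 0.5, (0.0, 0.0), (1.0, 0.0)))
print(segment_hits_thick_circle((0.0, 0.0), 5.0, 0.5, (10.0, 0.0), (12.0, 0.0)))
```

Only hits in sector rows that pass this test need to be compared against the track, which reduces the number of hit-track tests per candidate.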
  157. Triplet Finder — Kernel Launch Strategies

    • Joined Kernel (JK): slowest
      – High # registers → low occupancy
    • Dynamic Parallelism (DP) / Host Streams (HS): comparable performance
      – Performance
        • HS faster for small # of processed hits, DP faster for > 45000 hits
        • HS stagnates there, while DP continues rising
      – Limiting factors
        • High # of required kernel calls
        • Kernel launch latency
        • Memcopy
      – HS more affected by this, because of
        • More PCI-E transfers (launch configurations for kernels)
        • Lower launch throughput; kernel launch latency becomes more important
        • False dependencies of launched kernels
          – Single CPU thread handles all CUDA streams (multi-threading possible, but synchronization overhead too high for good performance)
          – Grid scheduling done in hardware (Grid Management Unit) (DP: software)
            » False dependencies when N(streams) > N(device connections) = 32 (compute capability 3.5)