GPU-based Online Tracking for the PANDA Experiment [NV App Lab 2014]

AndiH
October 17, 2014

Talk for the NVIDIA Application Lab workshop 2014 at the Jülich Supercomputing Centre.

Transcript

  1. GPU-based Online Tracking for the PANDA Experiment
    NVIDIA Application Lab Workshop 2014
    8 October 2014, Andreas Herten
    Member of the Helmholtz Association

  2. Outline
    • PANDA
    – HEP
    – Our Experiment
    – Online Event Filter
    • Algorithms
    – Hough Transform
    – Riemann Track Finder
    – Triplet Finder

  10. High Energy Physics
    • High Energy Physics (HEP):
    – GPUs are interesting for HEP
    • Many events due to the high collision rate
    • Events are independent and divisible into subsets
    • Many features to extract (computationally intensive)
    HEP Recipe
    1. Accelerate particles (e, p, …)
    2. Accelerate particles more!
    3. Smash them into each other
    4. Look at the resulting particles
    5. Understand the universe
    E = mc²

  11. FAIR
    • Facility for Antiproton and Ion Research
    – New accelerator complex (Darmstadt, Germany)
    – Next to the GSI laboratory
    – Construction in progress, ending 2018
    – Four pillars of research:
    • APPA: atomic & plasma physics
    • NUSTAR: nuclear structure, astrophysics
    • CBM: hadron physics
    • PANDA: hadron physics
    (Images: © Google Maps; fair-center.eu)

  16. FAIR Accelerator Complex
    [Figure: FAIR accelerator complex; legend: existing facilities, new facilities, experiments]
    High Energy Storage Ring
    • Beam momentum: p = 1.5–15 GeV/c
    • High resolution: L = 10³¹ cm⁻²s⁻¹, Δp/p < 4·10⁻⁵
    • High luminosity: L = 2·10³² cm⁻²s⁻¹, Δp/p < 10⁻⁴

  20. PANDA — The Experiment
    [Figure: PANDA detector, 13 m long, with p̄ and p marked; labelled components: Magnet, STT, MVD]

  21. PANDA — MVD & STT
    Micro Vertex Detector (MVD)
    • Silicon-based pixel + strip detector
    • 10 000 000 + 200 000 channels
    • Vertex resolution: < 100 μm
    Straw Tube Tracker (STT)
    • 4636 small drift tubes (ø 1 cm)
    • Drift times: < 250 ns
    • 26 layers, 8 of them skewed
    • Material budget: 1.2 % of a radiation length

  23. PANDA — Physics
    • Meson spectroscopy
    – Light mesons
    – Charmonium
    – Open charm
    – Exotic states
    • Glueballs
    • Hybrids
    • Molecules/multiquarks
    • Baryon production
    • Nucleon structure, e.m. processes
    • Charm in nuclei
    • Strangeness physics
    → Broad physics program
    [Figure: accessible mass (1–6 GeV/c²) versus p̄ momentum (0–15 GeV/c), marking light quark–antiquark states (π, ρ, ω, f₂, K, K*), charmonium (J/ψ, η_c, χ_cJ), open charm (DD, Ds), glueballs (gg, ggg), hybrids (nng, ssg, ccg), multiquarks (qqqq, ccqq) and baryon–antibaryon pairs (ΛΛ, ΣΣ, ΞΞ, ΩΩ, Λc, Σc, Ξc, Ωc)]

  24. PANDA — Event Reconstruction
    • Continuous read-out
    – Novel feature
    – Background & signal similar
    – No hardware trigger based on a few sub-detectors,
    but online event reconstruction using the full detector information
    (reject background events, save interesting events)
    Rate: events 2 × 10⁷/s; raw data 200 GB/s
    → Reduction: amount ~1/1000; time 50 ns/evt
    → Storage space for offline analysis: 3 PB/y
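The rates above fix the budget of the online filter. As a small C++ sketch (C++ being the language of the code later in this talk), the per-event numbers can be derived from the slide's figures alone; decimal units (1 GB = 10⁹ bytes) are an assumption, and the function names are ours:

```cpp
// Back-of-the-envelope numbers for the online event filter, using only the
// rates quoted on the slide. Decimal units are assumed (1 GB = 10^9 bytes).
constexpr double kEventsPerSecond = 2e7;      // 2 x 10^7 events/s
constexpr double kRawBytesPerSecond = 200e9;  // 200 GB/s raw data
constexpr double kReductionFactor = 1000.0;   // keep roughly 1 event in 1000

// Average raw data volume of one event, in bytes.
constexpr double bytesPerEvent() { return kRawBytesPerSecond / kEventsPerSecond; }

// Mean time between two events, in nanoseconds: the average per-event budget
// a single serial processing stream would have to meet.
constexpr double nanosecondsPerEvent() { return 1e9 / kEventsPerSecond; }

// Data rate left after the online reduction, in bytes per second.
constexpr double keptBytesPerSecond() { return kRawBytesPerSecond / kReductionFactor; }

// Stored volume for a given effective data-taking time (seconds), in petabytes.
constexpr double petabytesFor(double seconds) { return keptBytesPerSecond() * seconds / 1e15; }
```

This gives about 10 kB per event, one event every 50 ns on average, and about 200 MB/s after the ~1/1000 reduction. Because events are independent, N devices working in parallel may each spend roughly N × 50 ns per event. A full calendar year (about 3.16 × 10⁷ s) at 200 MB/s would be about 6.3 PB, so the quoted 3 PB/y presumably assumes roughly half a year of effective data taking; the slide does not say.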

  28. PANDA — Read Out Scheme
    Requirements for online tracking
    • Fast
    • Sophisticated algorithms possible; reprogrammable
    • Parallelism beyond single devices
    • Limited precision ok
    → GPUs

  35. PANDA — Online Tracking Example
    The physics side: antiproton–proton event
    p̄p → ψ′ → ψ π⁺π⁻, with ψ → e⁺e⁻
    [Figure: event with tracks labelled π⁺, π⁻, e⁺, e⁻ and the ψ′ decay]

  37. PANDA — Online Tracking Example
    The detector side: everything in reverse
    • Particle tracks are curves* → find curves connecting hit points!
    • Sort by track quality: hits well matched? How many hits?
    • Identify final particles (π⁺, π⁻, e⁺, e⁻, ?): curvature, length
    • Identify intermediate particles (ψ′): mass constraints, geometry
    • Identify process: p̄p → ψ′ → e⁺e⁻ π⁺π⁻
    * actually: 3D helices

  61. PANDA — Triggering
    • Usual HEP experiment: fast detector layer(s) trigger the data acquisition
    • PANDA: online tracking!
    [Figure: event with π⁺, π⁻, e⁺, e⁻ from a ψ′ decay, shown for both approaches]

  63. ALGORITHMS #1
    Hough Transform
    Riemann Track Finder
    Triplet Finder

  65. Algorithm: Hough Transform
    • Idea: transform (x, y)ᵢ → (α, r)ᵢⱼ and find lines via the (α, r) space
    • Solve the rᵢⱼ line equation for
    – lots of hits (x, y, ρ)ᵢ and
    – many αⱼ ∈ [0°, 360°) each
    • Fill a histogram
    • Extract the track parameters
    $$r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$$
    i: ~100 hits/event (STT)
    j: every 0.2°, i.e. 360°/0.2° = 1800 angles per hit
    → rᵢⱼ: ~100 × 1800 = 180 000 values per event
    [Inset: Hough transform principle, hits in (x, y) and the binned (α, r) space]
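A minimal CPU sketch of filling the Hough histogram with the line equation rᵢⱼ = cos αⱼ · xᵢ + sin αⱼ · yᵢ + ρᵢ. The names (Hit, HoughHistogram, fillHough) and the binning scheme are our own; ρᵢ is carried through as the slide's per-hit term, which we assume to be the drift radius for STT hits and set to 0 for plain points. This is an illustration, not the PANDA GPU code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One hit in the transverse (x, y) plane. rho is the additive per-hit term of the
// slide's line equation (assumed: drift radius for STT hits, 0 for plain points).
struct Hit {
    double x, y, rho;
};

// Dense (alpha, r) histogram: nAlpha angle bins over [0, alphaMaxDeg) and
// nR radius bins over [rMin, rMax), stored row-major as counts[a * nR + rBin].
struct HoughHistogram {
    std::size_t nAlpha, nR;
    double alphaMaxDeg, rMin, rMax;
    std::vector<int> counts;

    HoughHistogram(std::size_t nA, std::size_t nRBins, double aMaxDeg, double lo, double hi)
        : nAlpha(nA), nR(nRBins), alphaMaxDeg(aMaxDeg), rMin(lo), rMax(hi),
          counts(nA * nRBins, 0) {}

    int at(std::size_t a, std::size_t rBin) const { return counts[a * nR + rBin]; }
};

// For every hit i and every sampled angle alpha_j compute
// r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i and increment its bin.
// All (i, j) pairs are independent, which is what makes the transform GPU friendly;
// on a GPU, concurrent increments of the same bin would need an atomic add.
void fillHough(const std::vector<Hit>& hits, HoughHistogram& h) {
    constexpr double kPi = 3.14159265358979323846;
    const double rWidth = (h.rMax - h.rMin) / static_cast<double>(h.nR);
    for (const Hit& hit : hits) {
        for (std::size_t a = 0; a < h.nAlpha; ++a) {
            const double alpha = h.alphaMaxDeg * static_cast<double>(a) /
                                 static_cast<double>(h.nAlpha) * kPi / 180.0;
            const double r = std::cos(alpha) * hit.x + std::sin(alpha) * hit.y + hit.rho;
            const double pos = (r - h.rMin) / rWidth;
            if (pos < 0.0 || pos >= static_cast<double>(h.nR)) continue;  // outside the r range
            ++h.counts[a * h.nR + static_cast<std::size_t>(pos)];
        }
    }
}
```

For example, four hits on the vertical line x = 2 all vote for the same bin at α = 0°, r = 2; that shared peak is what marks a track candidate.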

  66. Hough Transform — Visualization Lines
    • Create lines going through hit point (x, y)ᵢ
    – Line parameterized by rᵢⱼ = cos(αⱼ) ⋅ xᵢ + sin(αⱼ) ⋅ yᵢ
    • Fill the line parameters (α, r)ᵢⱼ into a histogram
    – Rasterize for many αⱼ ∈ [0°, 180°)
    [Figure: lines through a hit point for α = 0°, 10°, 20°, …]

  72. Hough Transform — Visualization Points
    • Create lines going through hit point (x, y)ᵢ
    • Repeat for every hit point i
    [Figure: resulting (α, r) curves for α = 0°, 10°, 20°, …]

  78. Hough Transform — Granularity
    • The choice of α granularity determines the resolution
    [Figure: the same hits rasterized with α = 0°, 2°, 4°, …]
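The cost side of the granularity choice is plain arithmetic: every hit yields one r value per sampled angle, so halving the step doubles the work. A small sketch (function names are ours) that reproduces the 180 000 values per event quoted for ~100 hits at 0.2° steps over 360°:

```cpp
#include <cmath>
#include <cstddef>

// Number of sampled angles for a given angular range and step, both in degrees.
// Rounding guards against results like 1799.9999... from binary floating point.
std::size_t anglesFor(double rangeDeg, double stepDeg) {
    return static_cast<std::size_t>(std::lround(rangeDeg / stepDeg));
}

// Number of r values (histogram entries) one event produces: hits x angles.
std::size_t houghValuesPerEvent(std::size_t hits, double rangeDeg, double stepDeg) {
    return hits * anglesFor(rangeDeg, stepDeg);
}
```

With 100 hits, going from 2° to 0.2° steps raises the count from 18 000 to 180 000 values per event, in exchange for a finer α resolution.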

  93. Algorithm: Hough Transform
    [Figure: "Hough transformed" histogram of 68 (x, y) points, PANDA STT+MVD, 1800 × 1800 grid; axes: angle α / ° (0 to 180) and r (−0.4 to 0.6)]

  95. Hough Transform — Remarks
    Two implementations
    • Thrust (CUDA's STL)
    – Performance: 3 ms/evt
    – Reduced to a set of standard routines
    • Fast (uses Thrust's optimized algorithms)
    • Inflexible (hard to customize)
    • Not yet at performance maximum
    • Plain CUDA
    – Performance: 0.5 ms/evt
    – Built completely for this task
    • Fitting for PANDA; customizable
    • A bit more complicated in parts
    Peak finding is challenging
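Thrust offers generic parallel primitives such as thrust::transform, thrust::sort and thrust::reduce_by_key, and a Hough fill can be phrased in terms of such building blocks. The sketch below uses the C++ standard library on the CPU as a stand-in to show that decomposition; the slide does not say which Thrust routines the actual implementation used, and all names here are hypothetical:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point {
    double x, y;
};

// Hough fill expressed with generic building blocks instead of a hand-written
// histogram: (1) map every (hit, angle) pair to a flattened bin key, (2) sort the
// keys, (3) count runs of equal keys. Returns {most populated key, its count};
// key / nR is the alpha bin and key % nR the r bin. Empty input gives {-1, 0}.
std::pair<long, int> houghPeakByStandardRoutines(const std::vector<Point>& hits, int nAlpha,
                                                 double rMin, double rWidth, int nR) {
    constexpr double kPi = 3.14159265358979323846;
    // Step 1 (a "transform"): one key per (hit, angle) pair, alpha in [0, pi).
    std::vector<long> keys;
    keys.reserve(hits.size() * static_cast<std::size_t>(nAlpha));
    for (const Point& p : hits) {
        for (int a = 0; a < nAlpha; ++a) {
            const double alpha = kPi * a / nAlpha;
            const double r = std::cos(alpha) * p.x + std::sin(alpha) * p.y;
            const long rBin = static_cast<long>(std::floor((r - rMin) / rWidth));
            if (rBin < 0 || rBin >= nR) continue;  // outside the histogram range
            keys.push_back(static_cast<long>(a) * nR + rBin);
        }
    }
    // Step 2: sort, so that equal bins become contiguous.
    std::sort(keys.begin(), keys.end());
    // Step 3 (a "reduce by key"): length of each run of equal keys; keep the longest.
    long bestKey = -1;
    int bestCount = 0;
    for (std::size_t i = 0; i < keys.size();) {
        std::size_t j = i;
        while (j < keys.size() && keys[j] == keys[i]) ++j;
        if (static_cast<int>(j - i) > bestCount) {
            bestCount = static_cast<int>(j - i);
            bestKey = keys[i];
        }
        i = j;
    }
    return {bestKey, bestCount};
}
```

Sorting all keys does more work than incrementing a histogram directly, which is one plausible reason a purpose-built kernel can be faster; this is our reading, not a claim made in the talk.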

  96. Hough Transform — Remarks
    [Figure: HT histogram ("HoughHist", 9000 entries) over α / ° (0 to 180) and r (−30 to 40); peaks searched with a Hill Climber]
    Peak finding is challenging
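The hill climber named in the figure can be sketched as a local search on the histogram: start at some bin and keep moving to a strictly larger neighbour. This generic version is written for illustration (the names, the 8-neighbourhood and the non-periodic treatment of α are our assumptions); its weakness, stopping at the nearest local maximum, is one way to see why peak finding is challenging:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hill climber on a dense 2D histogram h (row-major, nA rows for alpha, nR columns
// for r): starting at bin (a, r), repeatedly step to the largest of the up to 8
// neighbouring bins while it is strictly larger than the current one.
// Returns the local maximum reached, which need not be the global maximum.
// Simplification (assumption): alpha is not treated as periodic here.
std::pair<std::size_t, std::size_t> hillClimb(const std::vector<int>& h, std::size_t nA,
                                              std::size_t nR, std::size_t a, std::size_t r) {
    while (true) {
        std::size_t bestA = a, bestR = r;
        for (long da = -1; da <= 1; ++da) {
            for (long dr = -1; dr <= 1; ++dr) {
                const long na = static_cast<long>(a) + da;
                const long nr = static_cast<long>(r) + dr;
                if (na < 0 || nr < 0 || na >= static_cast<long>(nA) || nr >= static_cast<long>(nR))
                    continue;  // neighbour outside the histogram
                const std::size_t ua = static_cast<std::size_t>(na);
                const std::size_t ur = static_cast<std::size_t>(nr);
                if (h[ua * nR + ur] > h[bestA * nR + bestR]) {
                    bestA = ua;
                    bestR = ur;
                }
            }
        }
        if (bestA == a && bestR == r) return {a, r};  // no strictly larger neighbour
        a = bestA;
        r = bestR;
    }
}
```

Started next to a small side peak, the climber stays there even if a much higher peak exists elsewhere, so in practice it needs good start bins or several restarts.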

  98. Hough Transform — Remarks
    [Figure: Iterative Maximum Deleter; HT histograms over α / ° (0 to 180) and r for successive iterations: houghIt0 (9000 entries), houghIt1 (5580 entries), houghIt2 (2700 entries)]
    Peak finding is challenging: current research
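The entry counts in the figure shrink from one iteration to the next (9000, 5580, 2700), which suggests that hits attributed to a found peak are removed before the histogram is refilled. The sketch below assumes exactly that reading; the names, the tolerance test and the stopping rule are hypothetical choices, not the authors' implementation:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x, y;
};

// A line found in Hough space: r = cos(alpha) * x + sin(alpha) * y.
struct Line {
    double alphaRad, r;
    std::size_t nHits;
};

// Iterative maximum deleter (one possible reading): fill the (alpha, r) histogram
// from the hits still unassigned, take its global maximum as a track candidate,
// remove the hits within tol of that line, refill, and repeat until the highest
// bin has fewer than minVotes entries.
std::vector<Line> iterativeMaximumDeleter(std::vector<Point> hits, std::size_t nAlpha,
                                          double rMin, double rWidth, std::size_t nR,
                                          int minVotes, double tol) {
    constexpr double kPi = 3.14159265358979323846;
    std::vector<Line> found;
    while (!hits.empty()) {
        std::vector<int> h(nAlpha * nR, 0);
        for (const Point& p : hits) {
            for (std::size_t a = 0; a < nAlpha; ++a) {
                const double al = kPi * static_cast<double>(a) / static_cast<double>(nAlpha);
                const double pos = (std::cos(al) * p.x + std::sin(al) * p.y - rMin) / rWidth;
                if (pos >= 0.0 && pos < static_cast<double>(nR))
                    ++h[a * nR + static_cast<std::size_t>(pos)];
            }
        }
        std::size_t best = 0;  // index of the global maximum (first one on ties)
        for (std::size_t k = 1; k < h.size(); ++k)
            if (h[k] > h[best]) best = k;
        if (h[best] < minVotes) break;  // no significant peak left

        const double alpha = kPi * static_cast<double>(best / nR) / static_cast<double>(nAlpha);
        const double r = rMin + (static_cast<double>(best % nR) + 0.5) * rWidth;  // bin centre
        std::vector<Point> rest;
        std::size_t used = 0;
        for (const Point& p : hits) {
            if (std::fabs(std::cos(alpha) * p.x + std::sin(alpha) * p.y - r) < tol) ++used;
            else rest.push_back(p);
        }
        if (used == 0) break;  // guard against looping forever on a peak that claims no hit
        found.push_back({alpha, r, used});
        hits.swap(rest);
    }
    return found;
}
```

Removing the hits of each found track keeps a dominant track from masking weaker ones, at the price of one full histogram fill per iteration.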

  99. ALGORITHMS #2
    Hough Transform
    Riemann Track Finder
    Triplet Finder

  100. Riemann Track Finder — Method
    • Idea: don't fit lines (in 2D), fit planes (in 3D)!
    • Create seeds
    – All possible three-hit combinations
    • Grow seeds to tracks
    – Continuously test whether the next hit fits
    – Use mapping to the Riemann paraboloid (+ s-z fit, detector layer)
    [Figure: hits in the (x, y) plane and their mapping onto the paraboloid (axis z′)]
    More on: Seeds; Growing
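Why planes: on the paraboloid z = x² + y², a circle (x − a)² + (y − b)² = R² becomes x² + y² − 2ax − 2by + a² + b² − R² = 0, i.e. the plane z − 2ax − 2by + (a² + b² − R²) = 0. Testing whether a hit continues a circular seed therefore reduces to a point-to-plane distance in 3D. A minimal sketch of that test (names are ours; the s-z fit and the detector-layer logic from the slide are left out):

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Map a transverse hit (x, y) onto the paraboloid z = x^2 + y^2.
Vec3 toParaboloid(double x, double y) { return {x, y, x * x + y * y}; }

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Plane n . p = d with unit normal n.
struct Plane {
    Vec3 n;
    double d;
};

// Plane through three mapped seed hits. For three distinct hits the normal is
// non-zero, because no straight line meets the paraboloid in three points.
Plane planeFromSeed(const Vec3& p1, const Vec3& p2, const Vec3& p3) {
    Vec3 n = cross(sub(p2, p1), sub(p3, p1));
    const double len = std::sqrt(dot(n, n));
    for (double& c : n) c /= len;
    return {n, dot(n, p1)};
}

// Distance of a mapped candidate hit from the seed plane; a small value means the
// hit lies close to the circle defined by the seed.
double distanceToPlane(const Plane& pl, const Vec3& p) { return std::fabs(dot(pl.n, p) - pl.d); }
```

Hits that lie on a straight line in (x, y) give a plane parallel to the z axis, so the infinite-radius limit needs no special case.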

  104. Mitglied der Helmholtz-Gemeinschaft
    28
    Riemann Track Finder — GPU Adaptations
    CPU
    • 3 loops to generate seeds serially:
    for (int i = 0; i < hitsInLayerOne.size(); i++) {
      for (int j = 0; j < hitsInLayerTwo.size(); j++) {
        for (int k = 0; k < hitsInLayerThree.size(); k++) {
          /* Triplet Generation */
        }
      }
    }
    GPU
    • Needed: mapping of the inherent GPU indexing variable to a triplet index:
    int ijk = threadIdx.x + blockIdx.x * blockDim.x;
    $$ n_{\mathrm{Layer}_x} = \frac{1}{2}\left(\sqrt{8x+1} - 1\right) $$
    $$ \mathrm{pos}\left(n_{\mathrm{Layer}_x}\right) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}} - 1 $$
    • Port of CPU code; parallelism on seed basis
    • Only easy computations, e.g. 3x3 matrices
    → 100 × faster than CPU version: ~0.6 ms/event
    • Still needs implementing in PandaRoot

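The two closed-form expressions on this slide are the inverses of the triangular numbers T(n) = n(n+1)/2 and the tetrahedral numbers Te(n) = n(n+1)(n+2)/6. The talk does not spell out its exact indexing scheme, so the C++ sketch below is an assumption: it uses these inverses to decode a linear thread index (such as `ijk`) into a three-hit combination i < j < k, and, for comparison, decodes the index of the three independent CPU loops directly. All function names are hypothetical; in a CUDA port they could be marked `__device__`, and they are written as plain C++ here so they can be checked on the host.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using u64 = std::uint64_t;

u64 triangular(u64 n) { return n * (n + 1) / 2; }            // T(n)  = C(n+1, 2)
u64 tetrahedral(u64 n) { return n * (n + 1) * (n + 2) / 6; } // Te(n) = C(n+2, 3)

// Largest n with T(n) <= x. Seeded with the closed form (sqrt(8x + 1) - 1) / 2,
// then corrected step by step because the double result may be off after rounding.
u64 triangularRoot(u64 x) {
    const double xd = static_cast<double>(x);
    u64 n = static_cast<u64>(std::floor(0.5 * (std::sqrt(8.0 * xd + 1.0) - 1.0)));
    while (triangular(n + 1) <= x) ++n;
    while (n > 0 && triangular(n) > x) --n;
    return n;
}

// Largest n with Te(n) <= x, seeded with the slide's closed form
// cbrt(A) / 3^(2/3) + 1 / (cbrt(3) cbrt(A)) - 1,  A = sqrt(3) sqrt(243 x^2 - 1) + 27 x.
u64 tetrahedralRoot(u64 x) {
    if (x == 0) return 0;  // the closed form needs 243 x^2 >= 1
    const double xd = static_cast<double>(x);
    const double cbrtA = std::cbrt(std::sqrt(3.0) * std::sqrt(243.0 * xd * xd - 1.0) + 27.0 * xd);
    const double r = cbrtA / std::cbrt(9.0) + 1.0 / (std::cbrt(3.0) * cbrtA) - 1.0;
    u64 n = static_cast<u64>(std::floor(r));
    while (tetrahedral(n + 1) <= x) ++n;
    while (n > 0 && tetrahedral(n) > x) --n;
    return n;
}

// Decode a linear index into a 3-combination i < j < k (colexicographic order).
std::array<u64, 3> decodeCombination(u64 x) {
    const u64 k = tetrahedralRoot(x) + 2;  // largest k with C(k, 3) <= x
    x -= tetrahedral(k - 2);               // subtract C(k, 3)
    const u64 j = triangularRoot(x) + 1;   // largest j with C(j, 2) <= x
    x -= triangular(j - 1);                // subtract C(j, 2)
    return {x, j, k};                      // the remaining offset is i
}

// For three independent layers (the CPU loops): i outermost, k innermost.
// n2 and n3 are the hit counts of layers two and three.
std::array<u64, 3> decodeCartesian(u64 ijk, u64 n2, u64 n3) {
    return {ijk / (n2 * n3), (ijk / n3) % n2, ijk % n3};
}
```

Each thread would compute its `ijk`, skip it if it exceeds the number of combinations, and decode it; the integer correction loops make the result exact even where the floating-point closed form is slightly off.
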
  105. Mitglied der Helmholtz-Gemeinschaft
    29
    ALGORITHMS #3
    Hough Transform
    Riemann Track Finder
    Triplet Finder

  106. Mitglied der Helmholtz-Gemeinschaft
    30
    Triplet Finder
    • Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
    • Ported to GPU by Andrew Adinetz
    – CUDA, Dynamic Parallelism, Thrust
    – Quality of tracks comparable to CPU
    http://www.fz-juelich.de/ias/jsc/
    Original algorithm by Marius Mertens et al.
    [Figure: STT, scale 1.5 m]

  107. Mitglied der Helmholtz-Gemeinschaft
    31
    Triplet Finder
    • Idea: Use only a subset of the detector as seed
    – Don't use STT isochrones (drift times)
    – Calculate circle from 3 points (no fit)

  114. Mitglied der Helmholtz-Gemeinschaft
    STT — Drift Tubes and t0
    32
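    Particle ionizes gas atoms in drift tubes
    Electrons drift to anode wire, ions to wall
    Signal only when electrons arrive at wire
    No information about drift duration!
    For that, start time ($t_0$) needed:
    $t_{\mathrm{drift}} \approx t_{\mathrm{arrival}} - t_0$
    $v_{\mathrm{drift}} = \mathrm{const} \;\rightarrow\; t_{\mathrm{drift}} \cdot v_{\mathrm{drift}} = r_{\mathrm{isochrone}}$
    [Figure: drift tube with isochrone radius $r_{\mathrm{isochrone}}$]
    Resolution without $t_0$: $\mathcal{O}(0.1\,\mathrm{cm})$ ($r_{\mathrm{tube}}$)
    Resolution with $t_0$: $\mathcal{O}(0.015\,\mathrm{cm})$
    Usual HEP experiment: $t_0$ by trigger
    But PANDA has no trigger…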

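A minimal C++ sketch of the drift-time relation on this slide, under its simplification of a constant drift velocity. The function name, the units (ns, cm) and the values used in the example are illustrative assumptions, not PANDA STT parameters. Returning no value marks a time that is inconsistent with the assumed $t_0$, which is one way to see why a wrong or missing $t_0$ spoils the isochrone.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical helper: isochrone radius r = v_drift * (t_arrival - t0),
// assuming a constant drift velocity as on the slide.
// Assumed units: times in ns, vDrift in cm/ns, lengths in cm.
std::optional<double> isochroneRadius(double tArrival, double t0, double vDrift, double rTube) {
    assert(vDrift > 0.0 && rTube > 0.0);
    const double tDrift = tArrival - t0;
    if (tDrift < 0.0) return std::nullopt;  // signal before the assumed start time
    const double r = tDrift * vDrift;
    if (r > rTube) return std::nullopt;     // radius larger than the tube: t0 cannot be right
    return r;
}
```

For example, with an assumed drift velocity of 0.005 cm/ns, a signal 50 ns after $t_0$ corresponds to an isochrone radius of 0.25 cm.
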
  116. Mitglied der Helmholtz-Gemeinschaft
    33
    Triplet Finder
    • Idea: Use only a subset of the detector as seed
    – Don't use STT isochrones (drift times)
    – Calculate circle from 3 points (no fit)
    • Features
    – Fast & robust algorithm, no event time needed
    – Many tuning possibilities

  117. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Animation
    34
    [Animation legend: Triplet; Isochrone early; Isochrone early & skewed; Isochrone close; Isochrone late; MVD hit; Track timed out; Track current]

  118. Mitglied der Helmholtz-Gemeinschaft
    35
    Triplet Finder — Times
    [Plot: K20X; in publication]

  125. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Optimizations
    • Bunching Wrapper
    – Hits from one event have similar timestamps
    – Combine hits into sets (bunches) that occupy the GPU best
    36
    [Figure: hits grouped into events, events grouped into bunches]
    $\mathcal{O}(N^2) \rightarrow \mathcal{O}(N)$

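The slide states the effect of bunching as O(N²) → O(N) without spelling out the rule. One way to read it: if every hit may be combined with every other hit, the work grows like N², whereas combining hits only inside bunches of roughly B hits costs about (N/B)·B² = N·B, which is linear in N for a fixed bunch size. The C++ sketch below forms such bunches from time-sorted hit timestamps. The cutting rule (close a bunch once it holds at least `targetSize` hits and the next hit is at least `minGap` later) and all names are assumptions for illustration, not the published bunching wrapper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed bunching rule: walk through time-sorted timestamps and close the
// current bunch once it holds at least targetSize hits AND the gap to the next
// hit is at least minGap. Hits closer together than minGap (e.g. one event)
// therefore always end up in the same bunch.
std::vector<std::vector<double>> makeBunches(const std::vector<double>& times,
                                             std::size_t targetSize, double minGap) {
    assert(std::is_sorted(times.begin(), times.end()));
    std::vector<std::vector<double>> bunches;
    std::vector<double> current;
    for (std::size_t i = 0; i < times.size(); ++i) {
        current.push_back(times[i]);
        const bool last = (i + 1 == times.size());
        if (last || (current.size() >= targetSize && times[i + 1] - times[i] >= minGap)) {
            bunches.push_back(current);
            current.clear();
        }
    }
    return bunches;
}
```

A bunch can grow beyond `targetSize` when hits arrive without a large enough gap; tuning `targetSize` trades GPU occupancy against the per-bunch combinatorics.
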
  126. Mitglied der Helmholtz-Gemeinschaft
    37
    Triplet Finder — Bunching Performance
    [Plot: K20X; in publication]

  127. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Optimizations
    • Compare data processing strategies
    38
    [Diagram: three ways to run Triplet Finder (TF) Stages #1 to #4 per bunch, split into CPU and GPU:
    Dynamic Parallelism: calling kernel with 1 thread/bunch launches the TF stages on the GPU;
    Joined Kernel: 1 block/bunch runs all TF stages inside one joined kernel;
    Host Streams: 1 stream/bunch (calling and combining streams) on the CPU launches the TF stages]

  128. Mitglied der Helmholtz-Gemeinschaft
    39
    Triplet Finder — Bunch Sizes
    [Plot; in publication]

  129. Mitglied der Helmholtz-Gemeinschaft
    40
    Triplet Finder — Data Processing
    [Plot: K20X; in publication]

  130. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Binning: Sector Rows
    41
    • Sector Row testing
    – After a track is found: hit association not with all hits of the current window, but only with a subset
    (first test rows of sector, then hits of row)

  135. Mitglied der Helmholtz-Gemeinschaft
    42
    Triplet Finder — Binning: Sector Rows
    [Plot: K20X; All Tubes (No Binning) vs. Sector-Row Binning; in publication]

  136. Mitglied der Helmholtz-Gemeinschaft
    43
    Triplet Finder — Binning: Skewlets
    [Plot: K20X; All Skewlets (No Binning) vs. Skewlet Binning; in publication]

  138. Mitglied der Helmholtz-Gemeinschaft
    44
    Triplet Finder — AoS vs. SoA
    [Plot: K20X; in publication]
    → Andrew's talk

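The slide only names the comparison and defers the details to Andrew's talk. For readers unfamiliar with the terms, a minimal C++ sketch of the two layouts for a hit container follows; the field set (`x`, `y`, `z`, `tubeId`) and all names are hypothetical, and which layout the Triplet Finder ended up using is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AoS): all fields of one hit are adjacent in memory.
struct HitAoS {
    float x, y, z;
    int tubeId;
};

// Structure of Arrays (SoA): one contiguous array per field. If GPU thread k
// reads x[k], neighbouring threads read neighbouring addresses, which allows
// coalesced memory transactions; with AoS the same reads are strided.
struct HitsSoA {
    std::vector<float> x, y, z;
    std::vector<int> tubeId;
};

// Convert an AoS container into the SoA layout.
HitsSoA toSoA(const std::vector<HitAoS>& hits) {
    HitsSoA soa;
    soa.x.reserve(hits.size());
    soa.y.reserve(hits.size());
    soa.z.reserve(hits.size());
    soa.tubeId.reserve(hits.size());
    for (const HitAoS& h : hits) {
        soa.x.push_back(h.x);
        soa.y.push_back(h.y);
        soa.z.push_back(h.z);
        soa.tubeId.push_back(h.tubeId);
    }
    return soa;
}
```
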
  139. Mitglied der Helmholtz-Gemeinschaft
    45
    Triplet Finder — CUDA Versions
    [Plot: K20X; in publication]

  140. Mitglied der Helmholtz-Gemeinschaft
    46
    Triplet Finder — Clock Speed / GPU
    GPU    Memory Clock    Core Clock / GPU Boost
    K40    3004 MHz        745 MHz / 875 MHz
    K20X   2600 MHz        732 MHz / 784 MHz
    [Plot; in publication]

  141. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Summary
    • Best performance: 14 µs/event
    – $14 \cdot 10^{-6}\,\mathrm{s/event} \cdot 2 \cdot 10^{7}\,\mathrm{events/s} \Rightarrow 280\ \mathrm{GPUs}_{2014}$
    – PANDA$_{2019}$: multi-GPU system with $\mathcal{O}(100)$ GPUs
    • Optimizations possible & needed
    – Efficiency ε needs to be improved
    – Speed, cost (€):
    • Cards with more single- and less double-precision performance, à la K10
    • Consumer-grade cards à la GTX
    47

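Written out, the estimate on this slide counts how many GPU-seconds are needed per second of PANDA data, assuming the 14 µs/event of a single 2014 GPU scales linearly across GPUs:

```latex
\underbrace{14 \cdot 10^{-6}\,\mathrm{s/event}}_{\text{GPU time per event}}
\times
\underbrace{2 \cdot 10^{7}\,\mathrm{events/s}}_{\text{event rate}}
= 280
\qquad\text{equivalently}\qquad
\frac{2 \cdot 10^{7}\,\mathrm{events/s}}{1 / (14 \cdot 10^{-6}\,\mathrm{s/event})}
\approx \frac{2 \cdot 10^{7}}{7.1 \cdot 10^{4}} \approx 280\ \mathrm{GPUs}_{2014}
```
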
  142. Mitglied der Helmholtz-Gemeinschaft
    Summary
    • PANDA is researching the use of GPUs as part of its online event reconstruction scheme
    • Algorithms in active evaluation and optimization
    – Triplet Finder performance-optimized
    • Data transfer to GPU under investigation: FairMQ
    48

  143. Thank you!
    Andreas Herten

  144. Mitglied der Helmholtz-Gemeinschaft
    List of Resources Used
    • #3: Jupiter icon by Nikki Rodriguez from The Noun Project
    • #3: Einstein icon by Roman Rusinov from The Noun Project
    • #6: FAIR vector logo from official FAIR website
    • #6: FAIR rendering from official website
    • #11: Flare Gun icon by Jop van der Kroef from The Noun Project
    • #27: STT event animation by Marius C. Mertens
    • #35: Graphics cards images by NVIDIA promotion
    • #35: GPU Specifications
    – Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
    – Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
    – Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf
    49

  145. Mitglied der Helmholtz-Gemeinschaft
    BACKUP
    50

  158. Mitglied der Helmholtz-Gemeinschaft
    Hough Transform — Principle
    51
    [Figure: hits in the x-y plane and the corresponding curves in (r, α) Hough space]
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
    → Bin with highest multiplicity gives track parameters

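The formula above maps every hit i to one value r_ij per angle α_j; filling a histogram with these values and taking its fullest bin is the whole transform. A minimal C++ sketch, assuming α_j is sampled uniformly in [0, π) and that ρ_i is a per-hit offset (0 for a point-like hit); the names and the binning are hypothetical. Each (i, j) pair is independent, which is what makes the method attractive for GPUs: in a CUDA port each pair could be handled by one thread, with the histogram increment done via `atomicAdd`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One hit: position (x, y) plus the rho_i term of the formula (0 for a point hit).
struct HoughHit { double x, y, rho; };

struct HoughPeak { std::size_t alphaBin, rBin; int count; };

// Fill an (alpha, r) accumulator with r_ij = cos(alpha_j) x_i + sin(alpha_j) y_i + rho_i
// for nAlpha angles in [0, pi) and nR bins in [rMin, rMax); return the fullest bin.
HoughPeak houghMaximum(const std::vector<HoughHit>& hits, std::size_t nAlpha,
                       std::size_t nR, double rMin, double rMax) {
    assert(nAlpha > 0 && nR > 0 && rMax > rMin);
    const double pi = std::acos(-1.0);
    const double rWidth = (rMax - rMin) / static_cast<double>(nR);
    std::vector<int> acc(nAlpha * nR, 0);
    for (std::size_t j = 0; j < nAlpha; ++j) {
        const double alpha = pi * static_cast<double>(j) / static_cast<double>(nAlpha);
        const double c = std::cos(alpha), s = std::sin(alpha);
        for (const HoughHit& h : hits) {
            const double r = c * h.x + s * h.y + h.rho;
            if (r < rMin || r >= rMax) continue;  // outside the histogram range
            const auto bin = std::min(nR - 1, static_cast<std::size_t>((r - rMin) / rWidth));
            ++acc[j * nR + bin];
        }
    }
    const auto it = std::max_element(acc.begin(), acc.end());
    const auto idx = static_cast<std::size_t>(it - acc.begin());
    return {idx / nR, idx % nR, *it};
}
```

For three hits on the line x = 2, all curves meet in the bin at α = 0, r = 2, which therefore collects the highest multiplicity.
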
  161. Mitglied der Helmholtz-Gemeinschaft
    52
    Riemann Algorithm — Procedure
    • Create triplets of hit points (1)
    – All possible three-hit combinations need to become triplets
    • Grow triplets to tracks (2): continuously test whether the next hit fits the triplet track
    – Use the Riemann paraboloid to circle-fit the track
    • Test closeness of new hit: good → add hit; bad → dismiss hit
    • Continue with next hit
    – Helix fit: arc length s vs. z position

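Step (2) can be sketched in C++ as two pieces, both under assumptions: a closeness test that measures how far a new hit's paraboloid image lies from the seed plane (the cut value `tolerance` is hypothetical), and the s-z part of the helix fit written as an ordinary straight-line least-squares fit of z against arc length s. All names are illustrative, and refitting the plane after each accepted hit is left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Plane n . p = d in Riemann space (x, y, z' = x^2 + y^2); n is not normalized.
struct Plane { Vec3 n; double d; };

// Plane through the paraboloid images of the three seed hits.
Plane seedPlane(const double x[3], const double y[3]) {
    Vec3 p[3];
    for (int i = 0; i < 3; ++i) p[i] = {x[i], y[i], x[i] * x[i] + y[i] * y[i]};
    const Vec3 a{p[1][0] - p[0][0], p[1][1] - p[0][1], p[1][2] - p[0][2]};
    const Vec3 b{p[2][0] - p[0][0], p[2][1] - p[0][1], p[2][2] - p[0][2]};
    const Vec3 n{a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
    return {n, n[0] * p[0][0] + n[1] * p[0][1] + n[2] * p[0][2]};
}

// Closeness test: distance of the new hit's paraboloid image to the plane.
bool hitFits(const Plane& pl, double x, double y, double tolerance) {
    const double z = x * x + y * y;
    const double norm = std::sqrt(pl.n[0] * pl.n[0] + pl.n[1] * pl.n[1] + pl.n[2] * pl.n[2]);
    return std::abs(pl.n[0] * x + pl.n[1] * y + pl.n[2] * z - pl.d) / norm < tolerance;
}

// Helix part: least-squares straight line z = z0 + slope * s.
struct SZFit { double z0, slope; };

SZFit fitSZ(const std::vector<double>& s, const std::vector<double>& z) {
    assert(s.size() == z.size() && s.size() >= 2);
    const double n = static_cast<double>(s.size());
    double sumS = 0, sumZ = 0, sumSS = 0, sumSZ = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        sumS += s[i];
        sumZ += z[i];
        sumSS += s[i] * s[i];
        sumSZ += s[i] * z[i];
    }
    const double den = n * sumSS - sumS * sumS;
    assert(den != 0.0);  // all s equal: slope undefined
    const double slope = (n * sumSZ - sumS * sumZ) / den;
    return {(sumZ - slope * sumS) / n, slope};
}
```
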
  168. Mitglied der Helmholtz-Gemeinschaft
    53
    Riemann Track Finder — Seeds (1)
    [Figure: hits on layers 1 to 5 ("Layer number") combined into seed triplets, e.g. (11, 21, 31), (11, 31, 41), (11, 31, 32)]

  180. Mitglied der Helmholtz-Gemeinschaft
    54
    Riemann Algorithm — Expansion (2)
    [Figure: hits in the x-y plane expanded to z′, i.e. mapped onto the Riemann surface (paraboloid)]

  193. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Method
    • STT hit in pivot straw
    • Find surrounding hits
    → Create virtual hit (triplet) at center of gravity (cog)
    • Combine with
    1. Second STT pivot-cog virtual hit
    2. Interaction point
    • Calculate circle through these three points
    → Track candidate
    55
    [Figure: STT with pivot straws, virtual hits and the interaction point]

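The triplet construction above can be sketched in C++ under simplifying assumptions: the virtual hit is the unweighted centre of gravity of the tube centres of the pivot straw and its hit neighbours (isochrones ignored, as the method intends), the interaction point sits at the origin, and the track candidate is the circumcircle of the three points. Names, the equal weighting and the collinearity tolerance are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

struct Point { double x, y; };
struct Circle { double cx, cy, r; };

// Virtual hit: unweighted centre of gravity (cog) of the tube centres of the
// pivot straw and its neighbouring hit straws (equal weights are an assumption).
Point centerOfGravity(const std::vector<Point>& tubes) {
    assert(!tubes.empty());
    Point cog{0.0, 0.0};
    for (const Point& t : tubes) {
        cog.x += t.x;
        cog.y += t.y;
    }
    cog.x /= static_cast<double>(tubes.size());
    cog.y /= static_cast<double>(tubes.size());
    return cog;
}

// Circle through three points (circumcircle); no fit involved.
std::optional<Circle> circleThrough(Point a, Point b, Point c) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::abs(d) < 1e-12) return std::nullopt;  // collinear: straight track
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    const double ux = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    const double uy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    return Circle{ux, uy, std::hypot(a.x - ux, a.y - uy)};
}
```

A track candidate is then `circleThrough(interactionPoint, cog1, cog2)`, where `cog1` and `cog2` come from two different pivot straws.
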
  194. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Optimizations
    • Sector Row testing
    – Thicken track; shrink sector-row layer to a line
    – Find intersection
    56
    [Figure: Sector-Row Testing: track and sector row, before and after thickening the track and shrinking the row]

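A minimal C++ sketch of the sector-row test as described on this slide, assuming a circular track in the x-y plane and a sector row represented by a straight segment: thickening the track turns it into an annulus of half-width `halfWidth`, and the row is crossed if some point of the segment lies inside that annulus. The names and the half-width parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

double pointDistance(Pt a, Pt b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Closest distance from point c to the segment [a, b].
double minDistanceToSegment(Pt c, Pt a, Pt b) {
    const double dx = b.x - a.x, dy = b.y - a.y;
    const double len2 = dx * dx + dy * dy;
    double t = len2 > 0.0 ? ((c.x - a.x) * dx + (c.y - a.y) * dy) / len2 : 0.0;
    t = std::clamp(t, 0.0, 1.0);
    return pointDistance(c, {a.x + t * dx, a.y + t * dy});
}

// Along the segment the distance to the track centre varies continuously
// between its minimum and its maximum (reached at an endpoint), so the
// segment meets the annulus [radius - w, radius + w] iff the two ranges overlap.
bool trackCrossesSectorRow(Pt centre, double radius, double halfWidth, Pt rowStart, Pt rowEnd) {
    assert(radius > 0.0 && halfWidth >= 0.0);
    const double dMin = minDistanceToSegment(centre, rowStart, rowEnd);
    const double dMax = std::max(pointDistance(centre, rowStart), pointDistance(centre, rowEnd));
    return dMin <= radius + halfWidth && dMax >= radius - halfWidth;
}
```

Only rows that pass this test need their individual hits checked, which is the saving the sector-row binning aims for.
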
  195. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Kernel Launch Strategies
    • Joined Kernel (JK): slowest
    – High # registers → low occupancy
    • Dynamic Parallelism (DP) / Host Streams (HS): comparable performance
    – Performance
    • HS faster for small numbers of processed hits, DP faster above 45000 hits
    • HS stagnates there, while DP continues rising
    – Limiting factors
    • High # of required kernel calls
    • Kernel launch latency
    • Memcopy
    – HS more affected by this, because of
    • More PCI-E transfers (launch configurations for kernels)
    • Lower launch throughput, so kernel launch latency matters more
    • False dependencies of launched kernels
    – Single CPU thread handles all CUDA streams (multi-threading possible, but synchronization overhead too high for good performance)
    – Grid scheduling done in hardware (Grid Management Unit) (DP: software)
    » False dependencies when N(streams) > N(device connections) = 32 (compute capability 3.5)
    57

  196. Mitglied der Helmholtz-Gemeinschaft
    58
    Triplet Finder — Host Stream Connections
    [Plot; in publication]