Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction

AndiH
March 26, 2014

A talk I gave about our current research at the GPU Technology Conference 2014: http://registration.gputechconf.com/quicklink/dFuC9Rm


Transcript

  1. Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction
    GPU Technology Conference 2014
    26 March 2014, Andreas Herten (Institute for Nuclear Physics, Forschungszentrum Jülich, Germany)
    Member of the Helmholtz Association

  2. Outline
    • High Energy Physics
    • PANDA Experiment
    • Particle Tracking
    • GPUs at PANDA
    • Algorithms
      – Hough Transform
      – Riemann Track Finder
      – Triplet Finder

  3. High Energy Physics (HEP)

  11. High Energy Physics
    • High Energy Physics (HEP) in a nutshell:
      HEP Recipe
      1. Accelerate particles (e, p, …)
      2. Accelerate particles more!
      3. Smash into each other
      4. Look at resulting particles
      5. Understand world
      E = mc²
    – GPUs are interesting for HEP
      • Many events due to high collision rate
      • Events independent, dividable into subsets
      • Many features extractable (computationally intensive)

  12. PANDA

  15. PANDA: FAIR
    • PANDA: Antiproton Annihilation at Darmstadt
    • FAIR: Facility for Antiproton and Ion Research
      – Accelerator complex at GSI Darmstadt
      – Currently under construction

  19. PANDA: The Experiment
    [Figure: PANDA detector, length 13 m (43 ft); labels: p̄, p, Magnet, STT, MVD]

  21. PANDA: Event Reconstruction
    • Continuous read out
      – Background & signal similar
      – Novel feature
    • Event rate: 2 · 10⁷/s
    • Raw data rate: 200 GB/s
    • Disk storage space for offline analysis: 2 PB/y
    • Reduce by ~1/1000 (reject background events, save interesting physics events) → GPUs
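
The headline numbers on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal units (1 GB = 10⁹ bytes) and treats the reduction factor as exactly 1000. The struct and function names are made up for this example and do not come from the talk.

```cpp
#include <cassert>

// Quantities derived from the slide's headline numbers (names are illustrative).
struct OnlineBudget {
    double bytes_per_event;     // average raw size of one event
    double stored_bytes_per_s;  // rate written to disk after online selection
    double live_seconds_per_y;  // running time that fills the yearly disk budget
};

// raw_bytes_per_s:  raw data rate from the detector
// events_per_s:     event rate
// reduction:        online rejection factor (assumed exact here)
// disk_bytes_per_y: storage budget per year for offline analysis
OnlineBudget online_budget(double raw_bytes_per_s, double events_per_s,
                           double reduction, double disk_bytes_per_y) {
    assert(raw_bytes_per_s > 0.0 && events_per_s > 0.0 && reduction > 0.0);
    OnlineBudget b;
    b.bytes_per_event = raw_bytes_per_s / events_per_s;
    b.stored_bytes_per_s = raw_bytes_per_s / reduction;
    b.live_seconds_per_y = disk_bytes_per_y / b.stored_bytes_per_s;
    return b;
}
```

Called as `online_budget(200e9, 2e7, 1000.0, 2e15)`, this gives about 10 kB per event, 200 MB/s to disk, and a disk budget that corresponds to roughly 10⁷ s of data taking per year under these assumptions.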

  28. PANDA: Online Tracking Example
    The physics side: antiproton-proton event
    p̄p → ψ′ → ψ π⁺ π⁻, with ψ → e⁺e⁻
    [Figure: event with the particles π⁺, π⁻, e⁺, e⁻ and ψ′ labeled]

  41. PANDA: Online Tracking Example
    The detector side: everything in reverse
    • Particle tracks are curves (actually: 3D helices) → find curves connecting hit points!
    • Sort by track quality: Hits well matched? How many hits?
    • Identify final particles: curvature, length
    • Identify intermediate particles: mass constraints, geometry
    • Identify process: p̄p → ψ′ → e⁺e⁻ π⁺ π⁻
    [Figure: reconstructed tracks labeled π⁺, π⁻, e⁺, e⁻; the intermediate particle, first marked "?", is identified as ψ′]

  54. PANDA: Triggering
    • Usual HEP experiment: fast detector layer(s) trigger data acquisition
    • PANDA: Online Tracking!
    [Figure: both trigger schemes shown on the example event with π⁺, π⁻, e⁺, e⁻ and ψ′]

  55. GPUs at PANDA

  56. GPUs @ PANDA: Online Tracking
    • Port tracking algorithms to GPU
      – Serial → parallel
      – C++ → CUDA
    • Investigate suitability for online performance
    • But also: Find & invent tracking algorithms…
    • Under investigation:
      – Hough Transform
      – Riemann Track Finder
      – Triplet Finder

  57. Algorithms #1: Hough Transform

  58. Hough Transform
    • Established method for edge detection in images (from 1970s HEP experiments!)
    • New challenges for particle tracking algorithm
      – Only limited pixels per edge
    • Easily parallelizable method
    Original algorithm by Hough, adapted by Duda & Hart

  60. Hough Transform: Method
    • Idea: Transform (x,y)_i → (α,r)_ij, find lines via (α,r) space
    • Solve the line equation for r_ij for
      – Many hits (x,y)_i
      – Many α_j ∈ [0°,360°) each
    • Fill histogram
    • Extract track parameters
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i$
    i: ~100 hits/event (STT); j: steps of 0.2°; r_ij: 180 000
    Principle: → Bin with highest multiplicity gives track parameters
    [Figure: hits in the (x, y) plane and the corresponding (α, r) histogram]
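
The method above can be sketched in serial C++ as follows. This is a minimal illustration, not the Thrust or CUDA implementation from the talk: the names `Hit`, `HoughPeak` and `hough_peak`, the histogram range `r_max` and the simple "highest bin wins" peak search are assumptions made for the example. It scans α over [0°, 180°) with signed r, which covers the same set of lines as [0°, 360°).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { double x, y; };
struct HoughPeak { double alpha_deg; double r; int count; };

// Fills an (alpha, r) histogram with r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i
// and returns the centre of the bin with the highest multiplicity.
HoughPeak hough_peak(const std::vector<Hit>& hits, int n_alpha, int n_r, double r_max) {
    assert(n_alpha > 0 && n_r > 0 && r_max > 0.0);
    const double pi = std::acos(-1.0);
    const double r_width = 2.0 * r_max / n_r;  // histogram covers [-r_max, r_max)
    std::vector<int> histogram(static_cast<std::size_t>(n_alpha) * n_r, 0);
    HoughPeak best = {0.0, 0.0, 0};
    for (int j = 0; j < n_alpha; ++j) {
        const double alpha = pi * j / n_alpha;  // alpha_j in [0, 180 deg)
        const double c = std::cos(alpha), s = std::sin(alpha);
        for (std::size_t i = 0; i < hits.size(); ++i) {
            const double r = c * hits[i].x + s * hits[i].y;
            const int bin = static_cast<int>(std::floor((r + r_max) / r_width));
            if (bin < 0 || bin >= n_r) continue;  // outside the histogram range
            int& cell = histogram[static_cast<std::size_t>(j) * n_r + bin];
            ++cell;
            if (cell > best.count) {  // keep the first bin that reaches the maximum
                best.alpha_deg = 180.0 * j / n_alpha;
                best.r = -r_max + (bin + 0.5) * r_width;
                best.count = cell;
            }
        }
    }
    return best;
}
```

Every (hit, angle) pair is independent of all others, which is why the method parallelizes easily; only the histogram increment would need an atomic operation on a GPU.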

  62. Hough Transform: Example
    [Figure: Hough-transformed histogram, r vs. angle α (0° to 180°); PANDA STT, 10 (x,y) points, 180 × 180 grid, 324 000 entries]

  64. Hough Transform: Example
    [Figure: Hough-transformed histogram, r vs. angle α (0° to 180°); PANDA STT+MVD, 68 (x,y) points, 1800 × 1800 grid, 2.2356 × 10⁸ entries]

  65. Hough Transform: Remarks
    Two implementations
    Thrust
    • Performance: 3 ms/event
      – Independent of angular granularity
      – Reduced to set of standard routines
        • Fast (uses Thrust's optimized algorithms)
        • Inflexible (has its limits, hard to customize)
      – No peakfinding included
        • Even possible?
        • Adds to time!
    Plain CUDA
    • Performance: 0.5 ms/event
      – Built completely for this task
        • Can be fitted to every problem
        • Customizable
        • A bit more complicated in parts
      – Simple peakfinder implemented (threshold)
    • Using: Dynamic Parallelism, Shared Memory

  66. Algorithms #2: Riemann Track Finder

  67. Riemann Track Finder
    • Algorithm in use in PANDA's offline analysis framework for a long time
      – Good results
      – Well-understood
      – Handling of uncertainties
    • Work by Jonathan Timcheck
      – Summer student at Jülich
    Based on work by Strandlie et al.

  68. Riemann Track Finder: Method
    • Idea: Don't fit lines (in 2D), fit planes (in 3D)!
    • Create seeds
      – All possible three-hit combinations
    • Grow seeds to tracks: continuously test whether the next hit fits
      – Use mapping to Riemann paraboloid
    [Figure: hits (×) in the (x, y) plane and the same hits mapped onto the paraboloid along z′]
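
The idea of fitting planes instead of circles can be sketched as follows. This is a simplified illustration under stated assumptions: it takes the paraboloid to be z′ = x² + y², builds the exact plane through a three-hit seed instead of a weighted fit with uncertainties, and uses names (`Point3`, `Plane`, `plane_through`, ...) invented for this example, apart from the kernel-like name `project_onto_paraboloid`.

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double x, y, z; };
struct Plane { double nx, ny, nz, c; };  // nx*x + ny*y + nz*z + c = 0
struct Circle { double cx, cy, radius; };

// Map a hit in the (x, y) plane onto the paraboloid z' = x^2 + y^2.
Point3 project_onto_paraboloid(double x, double y) {
    Point3 p = {x, y, x * x + y * y};
    return p;
}

// Plane through three projected hits (the seed); normal = (b - a) x (d - a).
Plane plane_through(const Point3& a, const Point3& b, const Point3& d) {
    const double ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    const double vx = d.x - a.x, vy = d.y - a.y, vz = d.z - a.z;
    Plane p;
    p.nx = uy * vz - uz * vy;
    p.ny = uz * vx - ux * vz;
    p.nz = ux * vy - uy * vx;
    p.c = -(p.nx * a.x + p.ny * a.y + p.nz * a.z);
    return p;
}

// Circle in the (x, y) plane along which the plane cuts the paraboloid.
Circle circle_from_plane(const Plane& p) {
    assert(p.nz != 0.0);  // nz == 0: the three hits are collinear in (x, y)
    Circle c;
    c.cx = -p.nx / (2.0 * p.nz);
    c.cy = -p.ny / (2.0 * p.nz);
    c.radius = std::sqrt(c.cx * c.cx + c.cy * c.cy - p.c / p.nz);
    return c;
}

// Distance of a further projected hit from the seed plane (small = hit fits).
double distance_to_plane(const Plane& p, const Point3& q) {
    const double norm = std::sqrt(p.nx * p.nx + p.ny * p.ny + p.nz * p.nz);
    return std::fabs(p.nx * q.x + p.ny * q.y + p.nz * q.z + p.c) / norm;
}
```

Growing a seed then amounts to projecting each candidate hit and comparing `distance_to_plane` with a cut value, a linear test that replaces a nonlinear distance-to-circle calculation.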

  69. Riemann Algorithm: GPU Version
    • GPU Optimization: Unfolding loops → 100 × faster than CPU version
      for () {for () {for () {}}}
      int ijk = threadIdx.x + blockIdx.x * blockDim.x;
      $n_{\mathrm{Layer}_x} = \frac{1}{2}\left(\sqrt{8x+1} - 1\right)$
      $\mathrm{pos}(n_{\mathrm{Layer}_x}) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2-1} + 27x}} - 1$
    • Time for one event (Tesla K20X)
    Time(%) Time Calls Avg Min Max Name
    75.55% 439.49us 1 439.49us 439.49us 439.49us extend_cut_hit_triplets_k
    5.96% 34.656us 4 8.6640us 2.3360us 22.432us [CUDA memcpy DtoH]
    4.36% 25.344us 1 25.344us 25.344us 25.344us cut_hit_triplets_k
    4.26% 24.800us 6 4.1330us 3.7760us 5.3440us [CUDA memset]
    2.57% 14.976us 1 14.976us 14.976us 14.976us generate_hit_triplet
    2.44% 14.176us 1 14.176us 14.176us 14.176us generate_layer_triplets
    1.30% 7.5520us 1 7.5520us 7.5520us 7.5520us void thrust
    1.11% 6.4640us 1 6.4640us 6.4640us 6.4640us void thrust
    1.11% 6.4640us 1 6.4640us 6.4640us 6.4640us void thrust
    0.89% 5.1520us 5 1.0300us 928ns 1.3440us [CUDA memcpy HtoD]
    0.45% 2.6240us 1 2.6240us 2.6240us 2.6240us project_onto_paraboloid_k
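
One way to read "unfolding loops" is that each GPU thread converts its linear index into one combination i < j < k, so that three nested loops become a single flat index. The sketch below shows that idea in plain C++; the slide does not spell out the exact mapping used, so the combinatorial ordering, the function name `unfold` and the helpers are assumptions for illustration. The closed-form roots only serve as a first guess, which integer steps then correct to guard against floating-point rounding.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct TripletIndex { std::uint64_t i, j, k; };  // always i < j < k

inline std::uint64_t pairs_below(std::uint64_t n) {    // n choose 2
    return n < 2 ? 0 : n * (n - 1) / 2;
}
inline std::uint64_t triples_below(std::uint64_t n) {  // n choose 3
    return n < 3 ? 0 : n * (n - 1) * (n - 2) / 6;
}

// Turn one linear index x (for example a CUDA thread index) into the x-th
// combination i < j < k, with x = C(k,3) + C(j,2) + i.
TripletIndex unfold(std::uint64_t x) {
    // Largest k with C(k,3) <= x: cube-root guess, then integer correction.
    std::uint64_t k =
        static_cast<std::uint64_t>(std::cbrt(6.0 * static_cast<double>(x))) + 2;
    while (triples_below(k) > x) --k;
    while (triples_below(k + 1) <= x) ++k;
    const std::uint64_t rest = x - triples_below(k);
    // Largest j with C(j,2) <= rest: triangular root (sqrt(8*rest + 1) + 1) / 2.
    std::uint64_t j = static_cast<std::uint64_t>(
        (std::sqrt(8.0 * static_cast<double>(rest) + 1.0) + 1.0) / 2.0);
    while (pairs_below(j) > rest) --j;
    while (pairs_below(j + 1) <= rest) ++j;
    TripletIndex t = {rest - pairs_below(j), j, k};
    return t;
}
```

With n hits, the indices 0 to C(n,3) − 1 enumerate every three-hit combination exactly once, so a kernel can be launched with one thread per combination.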

  70. Algorithms #3: Triplet Finder

  71. Triplet Finder
    • Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
      – Original algorithm by Marius Mertens et al.
    • Ported to GPU by Andrew Adinetz
      – NVIDIA Application Lab Jülich
      – CUDA, Dynamic Parallelism, Thrust
    http://www.fz-juelich.de/ias/jsc/
    [Figure: Straw Tube Tracker, scale 1.5 m]

  72. Triplet Finder
    • Idea: Use only subset of detector as seed
      – Combine 3 hits to Triplet
      – Calculate circle from 3 Triplets (no fit)
    • Features
      – Fast & robust algorithm, no t0 (event start time) needed
      – Many tuning possibilities
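
"Calculate circle from 3 Triplets (no fit)" means that three points determine a circle exactly. A minimal sketch follows, assuming that a triplet is represented by the centre of gravity of its three hits; this representation is an assumption for the example, since the slide does not say how a triplet position is defined, and the names `Vec2`, `CircleFit`, `triplet_position` and `circle_through` are illustrative.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };
struct CircleFit { bool valid; Vec2 center; double radius; };

// Assumption: a triplet is represented by the centre of gravity of its hits.
Vec2 triplet_position(const Vec2& a, const Vec2& b, const Vec2& c) {
    Vec2 p = {(a.x + b.x + c.x) / 3.0, (a.y + b.y + c.y) / 3.0};
    return p;
}

// Exact circle through three points (circumcircle); no fit involved.
CircleFit circle_through(const Vec2& a, const Vec2& b, const Vec2& c) {
    CircleFit out = {false, {0.0, 0.0}, 0.0};
    const double d =
        2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return out;  // (nearly) collinear: no circle
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    out.center.x = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    out.center.y = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    out.radius = std::hypot(a.x - out.center.x, a.y - out.center.y);
    out.valid = true;
    return out;
}
```

Because the circle follows from a closed formula, the cost per track candidate is constant, which fits the "fast & robust" feature named on the slide.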

  73. Triplet Finder: Display
    [Figure: event display; legend: Triplet, Isochrone early, Isochrone early & skewed, Isochrone close, Isochrone late, MVD hit, Track timed out, Track current]

  75. Triplet Finder: Times

  81. Triplet Finder: Optimizations
    • Bunching Wrapper
      – Hits from one event have similar timestamp
      – Combine hits to sets (bunches) which occupy GPU best
      – Complexity: O(N²) → O(N)
    [Figure: hits grouped into events, events grouped into bunches]
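
The bunching idea can be sketched as follows: time-sorted hits are cut into bunches, and combinatorial work is done only inside a bunch. This is an illustration under assumptions, not the wrapper from the talk: it uses a fixed time window `window_ns` as the only bunching criterion, and the names `Bunch`, `make_bunches` and `pair_tests` are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Bunch { std::size_t begin, end; };  // half-open range of hit indices

// Cut time-sorted hit timestamps into bunches spanning less than window_ns each.
std::vector<Bunch> make_bunches(const std::vector<double>& t_ns, double window_ns) {
    assert(window_ns > 0.0);
    assert(std::is_sorted(t_ns.begin(), t_ns.end()));
    std::vector<Bunch> bunches;
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= t_ns.size(); ++i) {
        // Close the current bunch at the end of the data or when hit i is too late.
        if (i == t_ns.size() || t_ns[i] - t_ns[begin] >= window_ns) {
            Bunch b = {begin, i};
            bunches.push_back(b);
            begin = i;
        }
    }
    return bunches;
}

// Number of hit pairs that have to be compared when pairing only inside bunches.
std::size_t pair_tests(const std::vector<Bunch>& bunches) {
    std::size_t total = 0;
    for (std::size_t b = 0; b < bunches.size(); ++b) {
        const std::size_t n = bunches[b].end - bunches[b].begin;
        total += n * (n - 1) / 2;
    }
    return total;
}
```

If bunches hold at most m hits, the number of pair tests is bounded by about N·m/2 instead of N²/2, which is the sense in which the work goes from O(N²) to O(N); each bunch can also be handed to its own thread, block or stream.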

  82. Triplet Finder: Bunching Performance

  87. Triplet Finder: Optimizations
    • Sector Row testing
      – After a track is found: hit association not with all hits of the current window, but only with a subset (first test rows of sector, then hits of row)

  88. Triplet Finder: Sector Rows
    Preliminary (in publication)

  89. Triplet Finder: Optimizations
    • Compare kernel launch strategies
      – Dynamic Parallelism: calling kernel, 1 thread/bunch; Triplet Finder stages (TF Stage #1 to #4)
      – Joined Kernel: joined kernel, 1 block/bunch; Triplet Finder
      – Host Streams: 1 stream/bunch, calling stream and combining stream; Triplet Finder stages (TF Stage #1 to #4)
    [Figure: the three launch schemes drawn across CPU and GPU]

  90. Triplet Finder: Kernel Launches
    Preliminary (in publication)

  91. Triplet Finder: Optimizations
    • Impact of chipset

                               Tesla K40      Tesla K20X
    Peak double performance    1.46 TFLOPS    1.31 TFLOPS
    Peak single performance    4.29 TFLOPS    3.95 TFLOPS
    GPU chipset                GK110B         GK110
    # CUDA cores               2880           2688
    Memory size                12 GB          6 GB
    Memory bandwidth           288 GByte/s    250 GByte/s

    Source: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf

  92. Triplet Finder: Clock Speed / GPU
    Preliminary (in publication)

            Memory clock    Core clock    GPU Boost
    K40     3004 MHz        745 MHz       875 MHz
    K20X    2600 MHz        732 MHz       784 MHz

  93. Triplet Finder: Optimizations
    • Many optimizations possible
      – Most important: Bunching wrapper
      – More float, less double: cards à la K10 are a viable alternative
    • Best performance: 20 µs/event
      → Online Tracking a feasible technique for PANDA
      – Multi-GPU system needed: O(100) GPUs
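
As a rough cross-check of the GPU count, the best time per event can be combined with the event rate of 2 · 10⁷/s quoted earlier in the talk. The estimate assumes perfect scaling across GPUs and that the Triplet Finder is the only per-event cost; both are simplifying assumptions, not statements from the slides.

```latex
N_{\mathrm{GPU}} \approx \text{event rate} \times \text{time per event}
  = 2 \cdot 10^{7}\,\mathrm{s^{-1}} \times 20 \cdot 10^{-6}\,\mathrm{s}
  = 400
```

A few hundred devices is in line with the order of magnitude of O(100) GPUs stated on the slide.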

  95. Summary
    • GPUs are very interesting for HEP
    • PANDA investigates GPUs as central element in experiment's design
    • Algorithms in active evaluation and optimization
    • Collaboration with NVIDIA Application Lab
    Thank you!
    Andreas Herten
    [email protected]
    @AndiH
    #GTC14

  96. List of Resources Used (# = page number printed on the original slides)
    • #4 (High Energy Physics): Earth icon by Francesco Paleari from The Noun Project
    • #4 (High Energy Physics): Einstein icon by Roman Rusinov from The Noun Project
    • #6 (PANDA: FAIR): FAIR vector logo from official FAIR website
    • #6 (PANDA: FAIR): FAIR rendering from official website
    • #11 (PANDA: Triggering): Flare Gun icon by Jop van der Kroef from The Noun Project
    • #27 (Triplet Finder: Times): STT event animation by Marius C. Mertens
    • #35 (Triplet Finder: Clock Speed / GPU): Graphics cards images by NVIDIA promotion
    • #35 (Triplet Finder: Clock Speed / GPU): GPU Specifications
      – Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
      – Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
      – Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf

  97. Backup

  110. Hough Transform: Principle
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
    → Bin with highest multiplicity gives track parameters
    [Figure: lines through hit points in the (x, y) plane, each described by a pair (r, α), e.g. (r, α)₁ and (r, α)₂, and the resulting entries in the (α, r) histogram]

  111. Mitglied der Helmholtz-Gemeinschaft
    41
    Riemann Algorithm — Procedure

    View Slide

  112. Mitglied der Helmholtz-Gemeinschaft
    41
    Riemann Algorithm — Procedure
    • Create triplet of hit points
    – All possible three hit combinations need to become triplets
    1

    View Slide

  113. Mitglied der Helmholtz-Gemeinschaft
    Riemann Algorithm — Procedure
    • Step 1: Create triplets of hit points
    – All possible three-hit combinations need to become triplets
    • Step 2: Grow triplets to tracks:
    continuously test whether the next hit fits the triplet track
    – Use the Riemann paraboloid to fit a circle to the track
    • Test closeness of new hit: good → add hit; bad → dismiss hit
    • Continue with next hit
    – Helix fit: arc length s vs. z position
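The final helix fit (arc length s vs. z position) can be illustrated as a straight-line fit. The sketch below assumes the circle (center and radius) is already known from the circle fit, that hits are ordered along the track, and that consecutive hits are less than half a turn apart; `arc_lengths` and `fit_line` are hypothetical helper names.

```python
import math

def arc_lengths(hits_xy, center, radius):
    """Signed arc length of each hit along the circle, measured from the first hit."""
    cx, cy = center
    phis = [math.atan2(y - cy, x - cx) for x, y in hits_xy]
    s = [0.0]
    for prev, cur in zip(phis, phis[1:]):
        # Wrap the angle difference into [-pi, pi) before accumulating.
        dphi = (cur - prev + math.pi) % (2.0 * math.pi) - math.pi
        s.append(s[-1] + radius * dphi)
    return s

def fit_line(s, z):
    """Least-squares fit of z = z0 + slope * s; returns (z0, slope)."""
    n = len(s)
    s_mean = sum(s) / n
    z_mean = sum(z) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxz = sum((si - s_mean) * (zi - z_mean) for si, zi in zip(s, z))
    slope = sxz / sxx
    return z_mean - slope * s_mean, slope

# Hypothetical helix: radius 2 around the origin, z rising by 0.5 per unit arc length.
radius, center = 2.0, (0.0, 0.0)
phis = [0.0, 0.3, 0.6, 0.9]
hits_xy = [(radius * math.cos(p), radius * math.sin(p)) for p in phis]
z = [1.0 + 0.5 * radius * p for p in phis]
s = arc_lengths(hits_xy, center, radius)
z0, slope = fit_line(s, z)
```

On a helix, z is linear in s, so the slope of this line gives the pitch of the track.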

  120. Mitglied der Helmholtz-Gemeinschaft
    [Figure: hits arranged by layer number (1 to 5), with example triplet labels (11, 21, 31), (11, 31, 41) and (11, 31, 32)]
    Riemann Algorithm — 1 Triplets
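Triplet creation is a plain enumeration of three-hit combinations. A minimal Python sketch, assuming hits are identified by the labels used in the figure (the label list and the name `make_triplets` are hypothetical):

```python
from itertools import combinations

def make_triplets(hits):
    """Return every unordered three-hit combination as a tuple."""
    return list(combinations(hits, 3))

# Hypothetical hit labels, in the style of the figure.
hits = ["11", "21", "31", "41", "32"]
triplets = make_triplets(hits)
```

For n hits this yields n(n-1)(n-2)/6 triplets, so the number of seeds grows cubically with the hit count. Each triplet can be built independently of the others, which suits a parallel implementation.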

  132. Mitglied der Helmholtz-Gemeinschaft
    Riemann Algorithm — 1 Expansion
    [Figure: hits (x) in the x-y plane are expanded to a third coordinate z′ and mapped onto the Riemann surface (paraboloid)]
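The expansion step can be made concrete with a small sketch. It assumes the usual paraboloid mapping z′ = x² + y², under which hits on a circle in the x-y plane lie on a common plane. For simplicity it uses the exact plane through one triplet rather than a least-squares plane fit over many hits. All function names and the tolerance are hypothetical.

```python
import math

def lift(x, y):
    """Expand a hit (x, y) to the paraboloid: z' = x^2 + y^2."""
    return (x, y, x * x + y * y)

def plane_through(p1, p2, p3):
    """Plane n . p = d through three 3D points (normal from a cross product)."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    d = sum(n[i] * p1[i] for i in range(3))
    return n, d

def circle_from_plane(n, d):
    """Circle in the x-y plane where the plane cuts the paraboloid.

    Fails with a division by zero if the hits are collinear (nz == 0).
    """
    nx, ny, nz = n
    cx, cy = -nx / (2.0 * nz), -ny / (2.0 * nz)
    radius = math.sqrt(d / nz + cx * cx + cy * cy)
    return (cx, cy), radius

def is_close(hit, center, radius, tol):
    """Closeness test: accept the hit if it lies within tol of the circle."""
    dist = math.hypot(hit[0] - center[0], hit[1] - center[1])
    return abs(dist - radius) <= tol

# Hypothetical triplet on a circle with center (1, 2) and radius 5.
triplet = [(6.0, 2.0), (1.0, 7.0), (-4.0, 2.0)]
n, d = plane_through(*(lift(x, y) for x, y in triplet))
center, radius = circle_from_plane(n, d)
good = is_close((4.0, 6.0), center, radius, tol=0.1)  # lies on the circle
bad = is_close((1.0, 2.5), center, radius, tol=0.1)   # far from the circle
```

The mapping turns the nonlinear circle fit into a linear plane fit, and the closeness test decides whether a candidate hit is added to the growing track or dismissed.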

  145. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Method
    • STT hit in pivot straw
    • Find surrounding hits
    → Create virtual hit (triplet) at the center of gravity (cog)
    • Combine with
    1. Second STT pivot-cog virtual hit
    2. Interaction point
    • Calculate circle through the three points
    → Track candidate
    [Figure: STT with pivot straws, virtual hits and the interaction point]
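The method above can be sketched as follows. The example assumes that a virtual hit is the center of gravity of a pivot hit and its two neighbors, and that the interaction point sits at the origin. The coordinates and function names are hypothetical.

```python
import math

def center_of_gravity(points):
    """Mean position of a group of hits (the virtual hit)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def circle_through(a, b, c):
    """Circumcircle of three points; returns (center, radius)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    det = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if det == 0.0:
        raise ValueError("points are collinear, no finite circle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / det
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / det
    return (ux, uy), math.hypot(ax - ux, ay - uy)

# Hypothetical hit groups: pivot hit plus two neighbors each.
inner = [(2.9, 0.9), (3.0, 1.0), (3.1, 1.1)]
outer = [(4.9, 5.0), (5.0, 5.0), (5.1, 5.0)]
interaction_point = (0.0, 0.0)  # assumed to be at the origin

virtual_inner = center_of_gravity(inner)
virtual_outer = center_of_gravity(outer)
center, radius = circle_through(virtual_inner, virtual_outer, interaction_point)
```

Because only three points enter the circle, the track candidate follows from a closed-form expression and needs no iterative fit.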

  146. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Optimizations
    • Sector-row testing
    – Thicken track; shrink sector-row layer to a line
    – Find intersection
    [Figure: Sector-Row Testing, two panels, each showing a track and a sector row]
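One possible reading of this test, as a geometric sketch: the track is a circle thickened into a ring of half-width `half_width`, the sector row is shrunk to a line segment, and the test asks whether the segment touches the ring. The geometry, the parameter names and the numbers are assumptions for illustration and are not taken from the talk.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def track_hits_sector_row(center, radius, half_width, row_start, row_end):
    """True if the segment intersects the ring radius +/- half_width."""
    d_min = point_segment_distance(center, row_start, row_end)
    d_max = max(
        math.hypot(row_start[0] - center[0], row_start[1] - center[1]),
        math.hypot(row_end[0] - center[0], row_end[1] - center[1]),
    )
    # Distances along the segment cover [d_min, d_max] continuously.
    return d_min <= radius + half_width and d_max >= radius - half_width

center, radius, half_width = (0.0, 5.0), 5.0, 0.5
crossing = track_hits_sector_row(center, radius, half_width, (2.0, 0.0), (2.0, 3.0))
outside = track_hits_sector_row(center, radius, half_width, (8.0, 0.0), (8.0, 3.0))
inside = track_hits_sector_row(center, radius, half_width, (-1.0, 4.0), (1.0, 6.0))
```

A coarse test like this lets whole sector rows be skipped before any individual hit is compared with the track.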

  147. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Kernel Launch Strategies
    • Joined Kernel (JK): slowest
    – High number of registers → low occupancy
    • Dynamic Parallelism (DP) / Host Streams (HS): comparable performance
    – Performance
    • HS faster for small numbers of processed hits, DP faster for > 45000 hits
    • HS stagnates there, while DP continues rising
    – Limiting factors
    • High number of required kernel calls
    • Kernel launch latency
    • Memcopy
    – HS more affected by this, because of
    • More PCI-E transfers (launch configurations for kernels)
    • Lower launch throughput, so kernel launch latency becomes more important
    • False dependencies of launched kernels
    – Single CPU thread handles all CUDA streams (multi-threading possible, but synchronization overhead too high for good performance)
    – Grid scheduling done in hardware (Grid Management Unit) (DP: software)
    » False dependencies when N(streams) > N(device connections) = 32 (compute capability 3.5)

  148. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Host Stream Connections
    [Figure: preliminary results (in publication)]

  149. Mitglied der Helmholtz-Gemeinschaft
    Triplet Finder — Bunch Sizes
    [Figure: preliminary results (in publication)]

  150. Mitglied der Helmholtz-Gemeinschaft
    [Figure: map marking Berlin, Munich, Cologne and Jülich]
    Forschungszentrum Jülich & Me
    • Research Center
    – Founded 1956; federal center
    – Budget: 730 million USD/year
    – 5300 employees
    • Of these, 1700 scientists (600 PhD students)
    – Topics: Health, Energy, Environment; Physics; Supercomputing
    – Many large-scale facilities
    • Me
    – Diploma in physics from RWTH Aachen University (CMS experiment)
    – PhD researcher since 2011: GPU Online Tracking for PANDA