Online Tracking on GPUs at PANDA

AndiH
May 23, 2014

Talk at FIAS Tracking Workshop

Transcript

  1. Member of the Helmholtz Association
    5th International Workshop for Future
    Challenges in Tracking and Trigger Concepts,
    FIAS Frankfurt
    13 May 2014, Andreas Herten
    Online Tracking on GPUs at PANDA

  2. Outline
    • GPUs & PANDA
    • Algorithms
    – Hough Transform
    – Riemann Track Finder
    – Triplet Finder

  4. Graphics Processing Units
    CPU: tasks run one after another
    a1 → b1 → c1; a2 → b2 → c2; a3 → …
    GPU: tasks run side by side
    a1 → b1 → c1
    a2 → b2 → c2
    a3 → …

  5. PANDA — The Experiment
    [Figure: PANDA detector, scale 13 m; labels: p, p, Magnet, STT, MVD]

  7. PANDA — Event Reconstruction
    • Continuous read out
    – Background & signal similar
    – Novel feature
    • Event Rate: 2 · 10^7/s
    • Raw Data Rate: 200 GB/s
    • Disk Storage Space for Offline Analysis: 3 PB/y
    → Reduce to ~1/1000 (reject background events, save interesting physics events)
    → GPUs
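
The rates on this slide can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the talk: the constant names are hypothetical, decimal prefixes are assumed, and running for a full calendar year is an assumption made only for illustration.

```cpp
// Figures quoted on the slide (decimal prefixes assumed: 1 GB = 1e9 B, 1 PB = 1e15 B).
constexpr double kEventsPerSecond = 2.0e7;       // 2 · 10^7 events/s
constexpr double kRawBytesPerSecond = 200.0e9;   // 200 GB/s
constexpr double kStorageBytesPerYear = 3.0e15;  // 3 PB/y

// Assumption for illustration only: data taking runs for a full calendar year.
constexpr double kSecondsPerYear = 365.0 * 24.0 * 3600.0;

// Average raw size of one event in bytes.
constexpr double averageEventSizeBytes() {
    return kRawBytesPerSecond / kEventsPerSecond;
}

// Factor by which the raw yearly volume exceeds the storage budget.
constexpr double requiredReductionFactor() {
    return kRawBytesPerSecond * kSecondsPerYear / kStorageBytesPerYear;
}
```

Under these assumptions an event averages 10 kB, and the reduction factor comes out at roughly 2100, the same order of magnitude as the ~1/1000 quoted on the slide. A shorter yearly running time lowers the factor accordingly.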

  8. ALGORITHMS #1
    Hough Transform
    Riemann Track Finder
    Triplet Finder

  10. Algorithm: Hough Transform
    • Idea: Transform (x,y)_i → (α,r)_ij, find lines via (α,r) space
    • Solve line equation r_ij for
    – Lots of hits (x,y,ρ)_i
    and
    – Many α_j ∈ [0°,360°) each
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
    i: ~100 hits/event (STT); j: every 0.2°; r_ij: 180 000
    • Fill histogram
    • Extract track parameters
    [Figure: Hough Transform — Principle: hits in the (x,y) plane and their (r,α) histogram; → bin with highest multiplicity gives track parameters]
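
The transform, histogram fill, and parameter extraction listed on this slide can be sketched on the CPU as follows. This is a minimal serial illustration under stated assumptions, not the Thrust or plain CUDA implementation from the talk: the names `Hit`, `HoughPeak`, and `houghPeak`, the histogram range, and the bin counts are all hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Hit { double x, y, rho; };  // rho: the ρ_i term of the slide's line equation

struct HoughPeak { double alphaDeg; double r; unsigned count; };

// Fills an (alpha, r) histogram with
//   r_ij = cos(alpha_j) * x_i + sin(alpha_j) * y_i + rho_i
// and returns the centre of the first bin with the highest multiplicity.
HoughPeak houghPeak(const std::vector<Hit>& hits, std::size_t nAlpha,
                    std::size_t nR, double rMin, double rMax) {
    std::vector<unsigned> histogram(nAlpha * nR, 0u);
    if (histogram.empty()) return {0.0, 0.0, 0u};

    const double pi = std::acos(-1.0);
    const double alphaStepDeg = 360.0 / static_cast<double>(nAlpha);
    const double rBinWidth = (rMax - rMin) / static_cast<double>(nR);

    for (const Hit& hit : hits) {
        for (std::size_t j = 0; j < nAlpha; ++j) {
            const double alpha = static_cast<double>(j) * alphaStepDeg * pi / 180.0;
            const double r = std::cos(alpha) * hit.x + std::sin(alpha) * hit.y + hit.rho;
            if (r < rMin || r >= rMax) continue;  // outside the histogram range
            const auto rBin = static_cast<std::size_t>((r - rMin) / rBinWidth);
            if (rBin >= nR) continue;             // guard against rounding at the edge
            ++histogram[j * nR + rBin];
        }
    }

    // Linear scan for the fullest bin (ties keep the first one found).
    std::size_t best = 0;
    for (std::size_t bin = 1; bin < histogram.size(); ++bin) {
        if (histogram[bin] > histogram[best]) best = bin;
    }
    const std::size_t j = best / nR;
    const std::size_t rBin = best % nR;
    return {static_cast<double>(j) * alphaStepDeg,
            rMin + (static_cast<double>(rBin) + 0.5) * rBinWidth,
            histogram[best]};
}
```

With the slide's numbers (about 100 hits and an angle step of 0.2°, so `nAlpha = 1800`), the inner statement runs 180 000 times per event. Every (i, j) pair is independent of the others, which is what makes the fill step a natural fit for one GPU thread per pair.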

  12. Algorithm: Hough Transform
    [Figure: Hough-transformed hits, PANDA STT+MVD; 68 (x,y) points on a 1800 x 1800 grid; horizontal axis: angle α / ° (0 to 180); vertical axis: r (−0.4 to 0.6); entries: 2.2356e+08]

  13. Hough Transform — Remarks
    Two Implementations
    Thrust (CUDA's STL)
    • Performance: 3 ms/event
    – Reduced to set of standard routines
    • Fast (uses Thrust's optimized algorithms)
    • Inflexible (has its limits, hard to customize)
    – Not yet at performance maximum
    – No peakfinding included
    • Even possible?
    • Adds to time!
    • Ideas in exploration
    Plain CUDA
    • Performance: 0.5 ms/event
    – Built completely for this task
    • Fitting to this problem
    • Customizable
    • A bit more complicated at parts
    – Simple peakfinder implemented (threshold)
    • Using: Dynamic Parallelism, Shared Memory

  14. ALGORITHMS #2
    Hough Transform
    Riemann Track Finder
    Triplet Finder

  16. Riemann Track Finder — Method
    • Idea: Don't fit lines (in 2D), fit planes (in 3D)!
    • Create seeds (step 1)
    – All possible three hit combinations
    • Grow seeds to tracks (step 2)
    – Continuously test next hit if it fits
    – Use mapping to Riemann paraboloid (+ s-z fit, det. layer)
    [Figure: hits in the (x,y) plane expanded to z' on the Riemann paraboloid]
    More on: Seeds; Growing
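
The "fit planes instead of lines" idea can be made concrete for the simplest case of a three-hit seed. Lifting each hit to z' = x² + y² puts all points of a circle with centre (a, b) and radius R onto the plane z' = 2a·x + 2b·y + (R² − a² − b²), so the plane through three lifted hits yields the circle directly. The sketch below assumes hypothetical names throughout and is not the talk's code. Its closeness test compares distances in the (x,y) plane, a simplification standing in for whatever criterion the track finder actually uses.

```cpp
#include <cmath>

struct Point2 { double x, y; };
struct Point3 { double x, y, z; };
struct Circle { double cx, cy, radius; bool valid; };

// Map a hit onto the paraboloid z' = x^2 + y^2.
Point3 liftToParaboloid(const Point2& p) {
    return {p.x, p.y, p.x * p.x + p.y * p.y};
}

// Plane through three lifted hits -> circle through the three original hits.
Circle circleFromRiemannPlane(const Point2& h1, const Point2& h2, const Point2& h3) {
    const Point3 p1 = liftToParaboloid(h1);
    const Point3 p2 = liftToParaboloid(h2);
    const Point3 p3 = liftToParaboloid(h3);
    const Point3 u{p2.x - p1.x, p2.y - p1.y, p2.z - p1.z};
    const Point3 v{p3.x - p1.x, p3.y - p1.y, p3.z - p1.z};
    // Plane normal n = u x v; n.z vanishes when the hits are collinear.
    const Point3 n{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    if (std::fabs(n.z) < 1e-12) return {0.0, 0.0, 0.0, false};  // threshold is unit dependent
    // Plane: z' = 2a x + 2b y + c, read off from n . p = n . p1.
    const double a = -n.x / (2.0 * n.z);
    const double b = -n.y / (2.0 * n.z);
    const double c = (n.x * p1.x + n.y * p1.y + n.z * p1.z) / n.z;
    return {a, b, std::sqrt(c + a * a + b * b), true};
}

// Simplified closeness test for growing a seed: distance of a hit to the circle.
bool hitFitsCircle(const Circle& circle, const Point2& hit, double tolerance) {
    const double distance = std::hypot(hit.x - circle.cx, hit.y - circle.cy);
    return std::fabs(distance - circle.radius) <= tolerance;
}
```

For example, the hits (6, 2), (1, 7), and (−4, 2) give a circle centred at (1, 2) with radius 5. With more than three hits, the plane would be obtained from a least-squares fit instead of a cross product.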

  23. Riemann Track Finder — Seeds (step 1)
    [Figure: hits arranged by layer number (1 to 5); hit labels on the three example seeds: (11, 21, 31), (11, 31, 41), (11, 31, 32)]

  27. Riemann Track Finder — GPU Adaptations
    → 100 × faster than CPU version: ~0.6 ms/event
    Step 1 (seeds)
    CPU: 3 loops to generate seeds serially
    for (int i = 0; i < hitsInLayerOne.size(); i++) {
      for (int j = 0; j < hitsInLayerTwo.size(); j++) {
        for (int k = 0; k < hitsInLayerThree.size(); k++) {
          /* Triplet Generation */
        }
      }
    }
    GPU: Needed: Mapping of inherent GPU indexing variable to triplet index
    int ijk = threadIdx.x + blockIdx.x * blockDim.x;
    $n_{\mathrm{Layer}\,x} = \frac{1}{2}\left(\sqrt{8x+1} - 1\right)$
    $\mathrm{pos}(n_{\mathrm{Layer}\,x}) = \frac{\sqrt[3]{\sqrt{3}\sqrt{243x^2-1}+27x}}{3^{2/3}} + \frac{1}{\sqrt[3]{3}\,\sqrt[3]{\sqrt{3}\sqrt{243x^2-1}+27x}} - 1$
    Step 2 (growing)
    Port of CPU code; parallelism on seed base
    Only easy computations; e.g. 3x3 matrices
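
The two closed-form expressions on this slide appear to be the inverses of the triangular numbers x = n(n+1)/2 and of the tetrahedral numbers x = n(n+1)(n+2)/6. Those are what one needs to turn a flat thread index into a triplet of indices i < j < k. The sketch below shows that unranking on the host, as an illustration only: the function names are hypothetical, the index layout is an assumption, and how the talk's kernel assigns the result to detector layers is not shown in the slides. Instead of evaluating the closed forms directly, it starts from a floating-point estimate and corrects it with integer comparisons, which avoids off-by-one results from rounding.

```cpp
#include <cmath>
#include <cstdint>

struct TripletIndex { std::uint64_t i, j, k; };

// Number of pairs and triples that can be drawn from n items.
constexpr std::uint64_t choose2(std::uint64_t n) { return n < 2 ? 0 : n * (n - 1) / 2; }
constexpr std::uint64_t choose3(std::uint64_t n) { return n < 3 ? 0 : n * (n - 1) * (n - 2) / 6; }

// Inverse of x = choose3(k) + choose2(j) + i with i < j < k.
// On a GPU, x would be the flat index threadIdx.x + blockIdx.x * blockDim.x.
TripletIndex unrankTriplet(std::uint64_t x) {
    // k: largest value with choose3(k) <= x (tetrahedral root, estimate then correct).
    auto k = static_cast<std::uint64_t>(std::cbrt(6.0 * static_cast<double>(x))) + 1;
    while (choose3(k) > x) --k;
    while (choose3(k + 1) <= x) ++k;

    // j: largest value with choose2(j) <= remainder (triangular root).
    const std::uint64_t rest = x - choose3(k);
    auto j = static_cast<std::uint64_t>(
        (1.0 + std::sqrt(1.0 + 8.0 * static_cast<double>(rest))) / 2.0);
    while (choose2(j) > rest) --j;
    while (choose2(j + 1) <= rest) ++j;

    return {rest - choose2(j), j, k};
}
```

Index 0 maps to (0, 1, 2), index 1 to (0, 1, 3), and index 3 to (1, 2, 3). Each thread can therefore derive its own triplet without any loop over the other threads' work, which is the point of replacing the three nested CPU loops.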

  28. ALGORITHMS #3
    Hough Transform
    Riemann Track Finder
    Triplet Finder

  29. Triplet Finder
    • Algorithm specifically designed for the PANDA Straw Tube Tracker (STT)
    – Original algorithm by Marius Mertens et al.
    • Ported to GPU by Andrew Adinetz
    – CUDA, Dynamic Parallelism, Thrust
    – Quality of tracks comparable to CPU
    http://www.fz-juelich.de/ias/jsc/
    [Figure: STT, scale 1.5 m]

  30. Triplet Finder
    • Idea: Use only subset of detector as seed
    – Don't use STT isochrones (drift times)
    – Calculate circle from 3 points (no fit)
    • Features
    – Fast & robust algorithm, no t0
    – Many tuning possibilities

  43. Triplet Finder — Method
    • STT hit in pivot straw
    • Find surrounding hits
    → Create virtual hit (triplet) at center of gravity (cog)
    • Combine with
    1. Second STT pivot-cog virtual hit
    2. Interaction point
    • Calculate circle through three points
    → Track Candidate
    [Figure: STT cross section with interaction point]
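
The two geometric steps named on this slide, the center of gravity of a hit group and the circle through three points, are short enough to sketch. The following assumes hypothetical type and function names and plain 2D coordinates; it uses the standard circumcircle formula and is not taken from the Triplet Finder code.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };
struct Circle { Point center; double radius; bool valid; };

// Virtual hit: center of gravity of a pivot hit and its neighbours.
Point centerOfGravity(const std::vector<Point>& hits) {
    if (hits.empty()) return {0.0, 0.0};
    Point sum{0.0, 0.0};
    for (const Point& hit : hits) {
        sum.x += hit.x;
        sum.y += hit.y;
    }
    const double n = static_cast<double>(hits.size());
    return {sum.x / n, sum.y / n};
}

// Circle through three points, computed directly (no fit).
Circle circleThrough(const Point& a, const Point& b, const Point& c) {
    const double d = 2.0 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
    if (std::fabs(d) < 1e-12) return {{0.0, 0.0}, 0.0, false};  // collinear points
    const double a2 = a.x * a.x + a.y * a.y;
    const double b2 = b.x * b.x + b.y * b.y;
    const double c2 = c.x * c.x + c.y * c.y;
    const double ux = (a2 * (b.y - c.y) + b2 * (c.y - a.y) + c2 * (a.y - b.y)) / d;
    const double uy = (a2 * (c.x - b.x) + b2 * (a.x - c.x) + c2 * (b.x - a.x)) / d;
    return {Point{ux, uy}, std::hypot(a.x - ux, a.y - uy), true};
}
```

Calling `circleThrough` with the interaction point at the origin and two virtual hits at (5, 5) and (0, 10) returns a circle centred at (0, 5) with radius 5. Because this is a closed-form calculation on three points rather than an iterative fit, it is cheap enough to evaluate for many triplet combinations.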

  44. Triplet Finder — Animation
    Legend:
    Triplet
    Isochrone early
    Isochrone early & skewed
    Isochrone close
    Isochrone late
    MVD hit
    Track timed out
    Track current

  46. Triplet Finder — Times

  52. Triplet Finder — Optimizations
    • Bunching Wrapper
    – Hits from one event have similar timestamp
    – Combine hits to sets (bunches) which occupy GPU best
    – O(N²) → O(N)
    [Figure: hits grouped into events, events grouped into bunches]
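
One way to picture the bunching wrapper is as a time-ordered split of the hit stream. The sketch below is an assumption-laden illustration, not the wrapper from the talk: it cuts bunches by a fixed time length, and the names `Hit`, `makeBunches`, and `bunchLengthNs` are hypothetical. The slides do not state how bunch boundaries are actually chosen.

```cpp
#include <algorithm>
#include <vector>

struct Hit { double timestampNs; int strawId; };

// Sorts hits by time and cuts the stream into bunches that each span
// less than bunchLengthNs, measured from the first hit of the bunch.
std::vector<std::vector<Hit>> makeBunches(std::vector<Hit> hits, double bunchLengthNs) {
    std::vector<std::vector<Hit>> bunches;
    if (hits.empty() || bunchLengthNs <= 0.0) return bunches;

    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.timestampNs < b.timestampNs; });

    double bunchStart = hits.front().timestampNs;
    bunches.emplace_back();
    for (const Hit& hit : hits) {
        if (hit.timestampNs - bunchStart >= bunchLengthNs) {
            bunchStart = hit.timestampNs;  // this hit opens the next bunch
            bunches.emplace_back();
        }
        bunches.back().push_back(hit);
    }
    return bunches;
}
```

If an algorithm compares every hit with every other hit, processing all N hits at once costs on the order of N² comparisons. With bunches of bounded size b the cost is about N·b, which is linear in N and matches the scaling noted on the slide.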

  53. Triplet Finder — Bunching Performance

  58. Triplet Finder — Optimizations
    • Sector Row testing
    – After found track: Hit association not with all hits of current window, but only with subset (first test rows of sector, then hits of row)

  59. Triplet Finder — Sector Rows
    Preliminary (in publication)

  60. Triplet Finder — Optimizations
    • Compare kernel launch strategies
    [Diagram: three launch strategies, each running TF Stage #1 to #4 of the Triplet Finder; rows: CPU, GPU]
    – Dynamic Parallelism: calling kernel, 1 thread/bunch
    – Joined Kernel: joined kernel, 1 block/bunch
    – Host Streams: 1 stream/bunch (calling stream, combining stream)

  61. Triplet Finder — Kernel Launches
    Preliminary (in publication)

  62. Triplet Finder — Optimizations
    • Impact of chipset
                              Tesla K40      Tesla K20X
    Peak double performance   1.46 TFLOPS    1.31 TFLOPS
    Peak single performance   4.29 TFLOPS    3.95 TFLOPS
    GPU Chipset               GK110B         GK110
    # CUDA Cores              2880           2688
    Memory size               12 GB          6 GB
    Memory bandwidth          288 GByte/s    250 GByte/s
    Source: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf

  63. Triplet Finder — Clock Speed / GPU
    Preliminary (in publication)
    Legend: Memory Clock, Core Clock / GPU Boost
    K40: 3004 MHz, 745 MHz / 875 MHz
    K20X: 2600 MHz, 732 MHz / 784 MHz

  64. Triplet Finder — Summary
    • Best performance: 20 µs/event
    – 20·10^-6 s/event × 2·10^7 event/s ⇒ 400 GPUs (2014)
    – PANDA 2019: Multi GPU system – O(100) GPUs
    • Optimizations possible & needed
    – ε needs to be improved
    – Speed, €: More float, less double-cards à la K10
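
The GPU count on this slide is the product of processing time per event and event rate. A minimal sketch of that estimate follows, with a hypothetical function name and integer arithmetic chosen to keep the result exact. It assumes perfect scaling across GPUs and ignores data transfer, which the slide does not address.

```cpp
#include <cstdint>

// Number of GPUs needed to keep up with the event stream, rounded up.
// Each second of data taking brings timePerEventNs * eventsPerSecond
// nanoseconds of GPU work; one GPU can supply 1e9 ns of work per second.
constexpr std::uint64_t gpusNeeded(std::uint64_t timePerEventNs,
                                   std::uint64_t eventsPerSecond) {
    constexpr std::uint64_t nsPerSecond = 1'000'000'000ULL;
    return (timePerEventNs * eventsPerSecond + nsPerSecond - 1) / nsPerSecond;
}

// 20 µs/event at 2 · 10^7 events/s, as quoted on the slide.
static_assert(gpusNeeded(20'000, 20'000'000) == 400, "matches the slide's estimate");
```

Read the other way round, reaching the order of 100 GPUs at the same event rate would require about 5 µs/event, since `gpusNeeded(5'000, 20'000'000)` evaluates to 100.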

  66. Summary
    • PANDA is researching the use of GPUs as part of its online event reconstruction scheme
    • Algorithms in active evaluation and optimization
    – Triplet Finder performance-optimized
    Thank you!
    Andreas Herten

  67. List of Resources Used
    • #4: Earth icon by Francesco Paleari from The Noun Project
    • #4: Einstein icon by Roman Rusinov from The Noun Project
    • #6: FAIR vector logo from official FAIR website
    • #6: FAIR rendering from official website
    • #11: Flare Gun icon by Jop van der Kroef from The Noun Project
    • #27: STT event animation by Marius C. Mertens
    • #35: Graphics cards images by NVIDIA promotion
    • #35: GPU Specifications
    – Tesla K20X Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K20X-BD-06397-001-v07.pdf
    – Tesla K40 Specifications: http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf
    – Tesla Family Overview: http://www.nvidia.com/content/tesla/pdf/NVIDIA-Tesla-Kepler-Family-Datasheet.pdf

  68. BACKUP

  81. Hough Transform — Principle
    [Figure: lines through the hits in the (x,y) plane, labeled (r, α)1 and (r, α)2, are entered into the (r, α) histogram]
    $r_{ij} = \cos\alpha_j \cdot x_i + \sin\alpha_j \cdot y_i + \rho_i$
    → Bin with highest multiplicity gives track parameters

  84. Riemann Algorithm — Procedure
    • Create triplet of hit points (step 1)
    – All possible three hit combinations need to become triplets
    • Grow triplets to tracks (step 2): Continuously test next hit if it fits to triplet track
    – Use Riemann paraboloid to circle fit track
    • Test closeness of new hit: good → add hit; bad → dismiss hit
    • Continue with next hit
    – Helix fit: arc length s vs. z position

  96. Riemann Algorithm — Expansion (step 2)
    [Figure: hits in the (x,y) plane are expanded to z' and lie on the Riemann surface (paraboloid)]

  97. Triplet Finder — Optimizations
    • Sector Row testing
    – Thicken track; shrink sector row layer to line
    – Find intersection
    [Figure: Sector-Row Testing, track crossing a sector row]

  98. Triplet Finder — Kernel Launch Strategies
    • Joined Kernel (JK): slowest
    – High # registers → low occupancy
    • Dynamic Parallelism (DP) / Host Streams (HS): comparable performance
    – Performance
    • HS faster for small # processed hits, DP faster for > 45000 hits
    • HS stagnates there, while DP continues rising
    – Limiting factor
    • High # of required kernel calls
    • Kernel launch latency
    • Memcopy
    – HS more affected by this, because
    • More PCI-E transfers (launch configurations for kernels)
    • Less launch throughput, kernel launch latency gets more important
    • False dependencies of launched kernels
    – Single CPU thread handles all CUDA streams (Multi-thread possible, but synchronization overhead too high for good performance)
    – Grid scheduling done on hardware (Grid Management Unit) (DP: software)
    » False dependencies when N(streams) > N(device connections) = 32 (compute capability 3.5)

  99. Triplet Finder — Host Stream Connections
    Preliminary (in publication)

  100. Triplet Finder — Bunch Sizes
    Preliminary (in publication)