Digging Into Cutting-Edge Lane Detection Methods 2021

Hiroto Honda
December 09, 2021

A survey of cutting-edge lane detection methods proposed in 2021: LaneATT and CondLaneNet.
(presented at the DeNA / Mobility Technologies computer vision reading group on Dec. 3, 2021.)


Transcript

  1. About Me

    Hiroto Honda
    - Mobility Technologies Co., Ltd. (Japan)
    - homepage: https://hirotomusiker.github.io/
    - blogs: Digging Into Detectron 2 (thanks for >2,500 claps👏!!)
    - recent presentation: Digging Into Sample Assignment Methods for Object Detection
    - recent paper: End-to-End Monocular Vanishing Point Detection Exploiting Lane Annotations
    - google scholar, linkedin
    - kaggle master
    - Interests: Object Detection, Image Restoration, Autonomous Driving
  2. Lane Detection Definition

    - Detect the positions of lane markers.
    - The number of lane classes is typically one.
    - The number of lane instances depends on the image, typically 0-4.
    (All the images in this presentation are Hiroto Honda’s vacation photos 🏖)
  3. Lane Detection Benchmarks

    dataset        # of images  resolution  curve images  fork lanes  urban scenes  papers with code
    TuSimple       13K          1280x720    30%           -           -             link
    CULane         133K         1640x590    10%           -           ✓             link
    CurveLanes 🆕  150K         2650x1440   90%           ✓           ✓

    - CULane is the most popular benchmark as of 2021.
    - CurveLanes, which features various curve images and fork lanes, is a new arrival.
  4. CULane and CurveLanes Metric

    - For evaluation, predicted and ground-truth lane points are each drawn as line segments with cv2.line, and every prediction / GT pair is compared by the IoU of the drawn segments.
    - TP = IoU matrix[linear sum assignment indices] > iou_threshold (otherwise FP or FN)
    - precision = TP / (TP + FP), recall = TP / (TP + FN)
    - F1 = 2 * (prec * rec) / (prec + rec)
    - F1 is used as the primary metric.
    (figure: pred and GT lane drawings with an example IoU matrix; one pair is labeled iou=0.80)
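The matching and F1 steps can be sketched in a few lines. This is a minimal illustration, not the benchmark's evaluation code: it assumes the IoU matrix has already been computed, uses a brute-force search over one-to-one assignments instead of a proper solver (a real evaluator would typically call something like `scipy.optimize.linear_sum_assignment`), and the names `count_matches`, `f1_score`, the 0.5 threshold and the IoU values are all made up for the example.

```python
from itertools import permutations

def count_matches(iou, num_pred, num_gt, iou_threshold=0.5):
    """Return (TP, FP, FN) for one image from a num_pred x num_gt IoU matrix."""
    if num_pred == 0 or num_gt == 0:
        return 0, num_pred, num_gt
    if num_pred <= num_gt:
        # Give every prediction a distinct ground truth.
        candidates = (list(enumerate(perm))
                      for perm in permutations(range(num_gt), num_pred))
    else:
        # Give every ground truth a distinct prediction.
        candidates = ([(p, g) for g, p in enumerate(perm)]
                      for perm in permutations(range(num_pred), num_gt))
    best_pairs, best_total = [], float("-inf")
    for pairs in candidates:
        total = sum(iou[p][g] for p, g in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total
    # A matched pair only counts as a true positive above the IoU threshold.
    tp = sum(1 for p, g in best_pairs if iou[p][g] > iou_threshold)
    return tp, num_pred - tp, num_gt - tp

def f1_score(tp, fp, fn):
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (precision * recall) / (precision + recall)

# Illustrative values: 3 predictions (rows) vs 4 ground-truth lanes (columns).
iou_matrix = [
    [0.80, 0.11, 0.01, 0.02],
    [0.01, 0.09, 0.35, 0.43],
    [0.11, 0.99, 0.01, 0.22],
]
tp, fp, fn = count_matches(iou_matrix, num_pred=3, num_gt=4)
f1 = f1_score(tp, fp, fn)
```

In this toy case two predictions clear the threshold, one is a false positive, and two ground-truth lanes are missed.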
  5. Lane Detection Type 1: Segmentation

    Multi-class segmentation: each class corresponds to one lane marker (e.g. SCNN [1]).
    - pros: simple network; various segmentation methods are exploitable
    - cons: lane classes are fixed; per-pixel training is excessive for the task
    - postprocess: row-wise argmax
    (figure: input -> CNN -> Lane 1 ... Lane N)
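The row-wise argmax postprocess can be sketched as follows. This is a simplified illustration under assumptions: it takes the probability map of a single lane class as nested lists, and the function name and the `min_prob` cutoff are hypothetical rather than taken from any particular implementation.

```python
def lane_points_from_prob_map(prob_map, min_prob=0.5):
    """Pick one (x, y) point per image row for a single lane class.

    prob_map is an H x W list of per-pixel probabilities for that lane.
    Rows whose best score is below min_prob yield no point.
    """
    points = []
    for y, row in enumerate(prob_map):
        x = max(range(len(row)), key=row.__getitem__)  # argmax along the row
        if row[x] >= min_prob:
            points.append((x, y))
    return points

# Tiny 3 x 4 example map (values are illustrative).
example_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 0.7, 0.2, 0.1],
    [0.1, 0.1, 0.9, 0.3],
]
example_points = lane_points_from_prob_map(example_map)
```

The first row has no confident pixel, so the lane starts at the second row.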
  6. Lane Detection Type 2: Polynomial Representation

    The model learns polynomial coefficients as the lane representation [2].
    - pros: fulfills the continuity of lane shapes
    - cons: low accuracy due to the limited representation freedom
    (figure: input -> CNN -> per-lane outputs a0, a1, ..., aN, offset, conf)
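To make the representation concrete, a lane can be decoded by evaluating the polynomial at a set of row positions. The sketch below assumes x = a0 + a1*y + ... + aN*y^N with a0 as the constant term; the slide does not spell out the coefficient order or the exact meaning of `offset` and `conf`, so those two outputs are left out and `decode_polynomial_lane` is a hypothetical helper.

```python
def decode_polynomial_lane(coeffs, ys):
    """Evaluate x = a0 + a1*y + ... + aN*y**N at each y (Horner's rule)."""
    xs = []
    for y in ys:
        x = 0.0
        for a in reversed(coeffs):
            x = x * y + a
        xs.append(x)
    return xs

# Illustrative second-order lane in normalized image coordinates.
example_coeffs = [0.2, 0.5, 0.1]  # a0, a1, a2 (assumed order)
example_xs = decode_polynomial_lane(example_coeffs, [0.0, 0.5, 1.0])
```

Because every row's x comes from the same smooth function, the decoded lane is continuous by construction, which is the "pros" item above; the small number of coefficients is also what limits the shapes it can express.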
  7. Lane Detection Type 3: Anchor-based Detection

    Use lane shape priors (anchors) and regress the lane points [3].
    - pros: no need for a high-resolution feature map as output
    - cons: lane shape representation ability depends on the anchor priors
    (figure: input -> CNN -> per-anchor vectors -> per-anchor regression, with anchor and GT overlaid)
  8. Lane Detection Type 4: Row-wise Detection

    Detect the x-coordinate of a lane row by row [4].
    - pros: can represent the continuity of lane shape effectively
    - cons: the network tends to be more complex than an anchor-based one
    (figure labels: row-wise classification, vertical classification)
  9. LaneATT [3] (CVPR’21)

    The method explanation is based on the authors’ implementation: https://github.com/lucastabelini/LaneATT
    The intermediate images are from the actual tensors.
  10. LaneATT [3] (CVPR 2021)

    - Anchor-based single-shot detection (like YOLO)
    - Pool the feature map with anchors into per-anchor vectors
    (figure: input (360, 640) -> resnet -> feature map (11, 20) -> Anchor-based Pooling -> 64x11 ch x 1000 anchors -> Self-Attention -> 64x11 ch -> concat -> fc -> cls (1000 x 2ch), reg (1000 x 73ch); Non-Max Suppression at test time)
  11. LaneATT: Anchors

    Pre-defined anchors:
    - left and right: 6 angles * 72 offsets on each side
    - bottom: 15 angles * 128 offsets
    - each anchor has 72 points on y=0.0-1.0
    - 2784 in total -> 1000 frequent anchors
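The anchor count above can be checked with a few lines of arithmetic, together with a rough sketch of how one straight anchor is sampled at 72 heights. The geometry here is an assumption made for illustration (y measured upward from the bottom edge, angle measured from the horizontal axis, bottom-edge anchors only), and `anchor_xs` is a hypothetical helper, not the authors' code.

```python
import math

SIDE_ANGLES, SIDE_OFFSETS = 6, 72          # per side (left and right)
BOTTOM_ANGLES, BOTTOM_OFFSETS = 15, 128
POINTS_PER_ANCHOR = 72

n_side = SIDE_ANGLES * SIDE_OFFSETS        # 432 anchors per side
n_bottom = BOTTOM_ANGLES * BOTTOM_OFFSETS  # 1920 anchors from the bottom edge
n_total = 2 * n_side + n_bottom            # 2784 anchors in total

def anchor_xs(x_start, angle_deg, n_points=POINTS_PER_ANCHOR):
    """x positions of a straight anchor leaving the bottom edge at x_start.

    Assumed convention: y runs from 0.0 (bottom) to 1.0 (top) and the
    angle is measured from the horizontal axis, so 90 degrees is vertical.
    """
    ys = [i / (n_points - 1) for i in range(n_points)]
    slope = 1.0 / math.tan(math.radians(angle_deg))
    return [x_start + y * slope for y in ys]

vertical_anchor = anchor_xs(0.5, 90.0)
diagonal_anchor = anchor_xs(0.0, 45.0)
```

So 2 * 432 + 1920 = 2784 candidate anchors, of which the 1000 most frequent are kept.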
  12. LaneATT: Anchor-based Pooling

    ‘Crop’ the feature map pixels using 1,000 pre-defined anchor indices.
    (figure: feature map, 64 ch x 11 hw -> 64x11 ch for each of the 1000 anchors)
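The 'crop' can be pictured as reading one feature-map pixel per row along the anchor, for every channel. The sketch below is a simplified, pure-Python illustration with nested lists; `pool_anchor` is a hypothetical name, and filling out-of-image positions with zero is an assumption made for the example.

```python
def pool_anchor(feature_map, anchor_cols):
    """Gather a flat per-anchor vector of length C * H.

    feature_map is C x H x W; anchor_cols holds one column index per
    feature row (length H). Positions outside the map contribute 0.0.
    """
    pooled = []
    for channel in feature_map:
        for y, x in enumerate(anchor_cols):
            inside = 0 <= x < len(channel[y])
            pooled.append(channel[y][x] if inside else 0.0)
    return pooled

# Toy feature map: 2 channels, 3 rows, 4 columns.
toy_map = [
    [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]],
    [[100.0, 101.0, 102.0, 103.0], [104.0, 105.0, 106.0, 107.0],
     [108.0, 109.0, 110.0, 111.0]],
]
toy_vector = pool_anchor(toy_map, anchor_cols=[1, 2, 4])  # last row leaves the map

# With 64 channels and 11 rows, each anchor vector has 64 * 11 entries.
vector_length = 64 * 11
```

This is why each of the 1000 anchors ends up with a 704-dimensional vector.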
  13. LaneATT: Self Attention

    Simple self-attention over the per-anchor vectors before the reg / cls branch.
    (figure: 1000 anchors x 64x11 ch -> FC -> attention matrix (1000 x 1000); attention matrix (1000 x 1000) applied to the features (1000 x 704) gives attended features (1000 x 704))
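The attention step amounts to a row-wise softmax followed by a weighted sum over all anchors. The sketch below is a reduced illustration: in the slide the attention logits come from an FC layer, whereas here they are passed in directly, and concatenating the attended features with the originals follows the "concat" box in the architecture figure. The function names are hypothetical.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, logits):
    """Mix per-anchor vectors with an N x N attention matrix.

    features is N x D, logits is N x N (one row of scores per anchor).
    Returns N x 2D: each original vector concatenated with its attended one.
    """
    attention = [softmax(row) for row in logits]
    dim = len(features[0])
    mixed = [
        [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
        for weights in attention
    ]
    return [orig + extra for orig, extra in zip(features, mixed)]

# Two anchors with 2-D features and uniform attention (all logits zero).
toy_features = [[1.0, 0.0], [0.0, 1.0]]
toy_output = attend(toy_features, logits=[[0.0, 0.0], [0.0, 0.0]])
```

With uniform attention every anchor simply receives the mean feature vector; learned logits let an anchor borrow evidence from other, more visible anchors.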
  14. LaneATT: Sample Assignment

    For each anchor, sample the ground-truth target that has the minimum distance, provided it is below a threshold, to impose losses on the per-anchor vectors.

    L1 distance matrix of x positions (GT vs anchor):

              pr0   pr1   pr2   ...   pr999
    gt0       273    44    82            72
    gt1        13   124   277           247
    gt2       352    52    11            28
    gt3       321   324    22           263
    assigned  gt1     -   gt2             -
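The assignment rule can be reproduced on the four columns shown in the distance matrix. This is a minimal sketch: `assign_targets` is a hypothetical helper, and the threshold of 15 is an assumed value picked so that the result matches the "assigned" row of the slide.

```python
def assign_targets(dist, max_dist=15.0):
    """For each anchor (column), return the index of the nearest ground
    truth (row), or None when even the nearest one is not below max_dist."""
    num_gt = len(dist)
    num_anchors = len(dist[0]) if num_gt else 0
    assigned = []
    for p in range(num_anchors):
        column = [dist[g][p] for g in range(num_gt)]
        nearest = min(range(num_gt), key=column.__getitem__)
        assigned.append(nearest if column[nearest] < max_dist else None)
    return assigned

# Columns pr0, pr1, pr2 and pr999 of the slide's L1 distance matrix.
slide_dist = [
    [273, 44, 82, 72],    # gt0
    [13, 124, 277, 247],  # gt1
    [352, 52, 11, 28],    # gt2
    [321, 324, 22, 263],  # gt3
]
slide_assignment = assign_targets(slide_dist)
```

Anchors without an assigned ground truth serve as background samples for the classification loss.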
  15. LaneATT: Regression

    One-on-one comparison between each prediction and its assigned ground truth, for each per-anchor vector.
    - x-coordinate regression: the predicted pred_x0, pred_x1, ..., pred_x71 are added to anchor_x0, anchor_x1, ..., anchor_x71, and a length is predicted alongside; ground truth e.g. 175.61, 180.28, ..., 399.45 with length 30.0; Lreg: smooth L1 loss
    - foreground / background classification: pred0, pred1 against a one-hot ground truth (0 1 / 1 0); Lcls: focal loss
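The two losses can be written out in a few lines. The sketch below uses the textbook forms of smooth L1 and focal loss with assumed hyperparameters (beta = 1, gamma = 2, alpha = 0.25); the values used by the actual implementation may differ, and the helper names are hypothetical.

```python
import math

def decode_xs(anchor_xs, offsets):
    """Predicted lane x positions = anchor x positions + regressed offsets."""
    return [a + o for a, o in zip(anchor_xs, offsets)]

def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic near zero, linear for large errors."""
    losses = []
    for p, t in zip(pred, target):
        d = abs(p - t)
        losses.append(0.5 * d * d / beta if d < beta else d - 0.5 * beta)
    return sum(losses) / len(losses)

def focal_loss(p_foreground, is_foreground, gamma=2.0, alpha=0.25):
    """Binary focal loss for one anchor; easy examples are down-weighted."""
    p_t = p_foreground if is_foreground else 1.0 - p_foreground
    a_t = alpha if is_foreground else 1.0 - alpha
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Illustrative numbers for two of the 72 points of one anchor.
pred_xs = decode_xs([170.0, 178.0], [5.5, 2.0])
reg_loss = smooth_l1(pred_xs, [175.0, 183.0])
easy_cls_loss = focal_loss(0.9, is_foreground=True)
hard_cls_loss = focal_loss(0.1, is_foreground=True)
```

The regression loss only applies to anchors with an assigned ground truth, while the classification loss covers all anchors.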
  16. CondLaneNet [4] (ICCV’21)

    The method explanation is based on the authors’ implementation: https://github.com/aliyun/conditional-lane-detection
    The intermediate images are from the actual tensors.
  17. CondLaneNet [4] (ICCV 2021)

    - Lane start point detection
    - Row-wise classification + x-regression
    - The RNN module for fork lanes is omitted in this survey.
    (figure labels: Input image, resnet, FPN, Transformer, Proposal: start point heatmap (20, 50), SOTA weights, row-wise location, Offset (40, 100))
  18. CondLaneNet: Heatmap-based Starting Point Proposal Detection

    Learn the starting points of the lane instances using heatmaps, as in human pose estimation.
    - train time: impose focal loss (Lpoint) on the predicted and GT heatmaps; the ground truth is 2D Gaussian heatmaps for the start points
    - test time: pick the peak points as lane instances by 2D max-pooling
    (figure: (64ch, 20, 50) from FPN -> conv x 2 -> heatmap branch (1ch, 20, 50); conv x 2 -> parameter branch (67+67ch, 20, 50))
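Test-time peak picking by max-pooling keeps a pixel only when it equals the maximum of its neighbourhood. The sketch below illustrates the idea on nested lists; the 3 x 3 window (`radius=1`), the 0.5 score threshold and the name `pick_peaks` are assumptions for the example, and a tensor library's 2D max-pooling would normally do this in one call.

```python
def pick_peaks(heatmap, threshold=0.5, radius=1):
    """Return (row, col) positions that are local maxima above threshold."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value < threshold:
                continue
            # Max over the (2 * radius + 1)^2 window, clipped at the borders.
            window_max = max(
                heatmap[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            )
            if value >= window_max:
                peaks.append((y, x))
    return peaks

# Toy 4 x 5 start-point heatmap with two lane instances.
toy_heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.6],
    [0.0, 0.0, 0.0, 0.4, 0.8],
]
toy_peaks = pick_peaks(toy_heatmap)
```

The 0.6 pixel passes the threshold but is suppressed by its 0.8 neighbour, so each blob yields a single start point.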
  19. CondLaneNet: Extract Weights for Conditional Convolution

    - Randomly pick feature points (i, j) ∈ Ω around the ground-truth start points as the training samples.
    - Extract the conv weights at Ω from the parameter branch to use later.
    (figure: heatmap branch (1ch, 20, 50); parameter branch (67+67ch, 20, 50) -> conv weight parameters, 67+67ch x NΩ, where NΩ is the number of training samples)
  20. CondLaneNet: Location / Offset Branch

    - Row-wise and offset branches give the lane shape heatmap and the sub-pixel offset map.
    - The channel-lane correspondence is guaranteed by the conditional convolution, whose parameters have been picked from the parameter branch.
    (figure: features from FPN (64ch, 40, 100) with position encodings, repeated NΩ times -> conv -> conditional conv using the 67+67ch parameters of the NΩ training samples -> row-wise map (row-wise branch) and offset map (offset branch))
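Conditional convolution means the kernel is not a fixed learned tensor but is read from the parameter branch at each start point, then applied to the shared feature map. The sketch below reduces this to a single 1x1 convolution with one output channel per lane instance; that layout (C weights followed by one bias) and the helper names are assumptions for illustration, not a description of the authors' exact layers.

```python
def extract_params(param_map, y, x):
    """Read the per-instance parameter vector at one start-point location.

    param_map is P x H x W, so the result has P entries.
    """
    return [channel[y][x] for channel in param_map]

def conditional_conv1x1(features, params):
    """Apply instance-specific 1x1 conv weights to a shared C x H x W map.

    params holds C weights followed by one bias (assumed layout).
    Returns an H x W map for this lane instance.
    """
    *weights, bias = params
    assert len(weights) == len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [bias + sum(w * ch[y][x] for w, ch in zip(weights, features))
         for x in range(width)]
        for y in range(height)
    ]

# Toy example: 2 feature channels on a 1 x 2 map, parameters on a 1 x 1 map.
toy_features = [[[1.0, 2.0]], [[3.0, 4.0]]]
toy_param_map = [[[0.5]], [[-1.0]], [[2.0]]]  # weight, weight, bias
toy_params = extract_params(toy_param_map, y=0, x=0)
toy_instance_map = conditional_conv1x1(toy_features, toy_params)
```

Because each start point brings its own weights, each output map belongs to exactly one lane instance, which is the channel-lane correspondence mentioned above.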
  21. CondLaneNet: Row-wise Head

    - Row-wise expected x positions of the lane instance are calculated from the row-wise map, and an L1 loss (Lrow) is imposed on each row.
    - The vertical range is also calculated by reducing the x axis (fc -> vertical classification, Lrange: SCE loss).
    - train time: calculate losses for NΩ instances
    - test time: form the lane shape for each predicted start point
    (figure: row-wise map (NΩ, 40, 100) -> row-wise expected location, vertical range)
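The expected x position of a row can be computed as a softmax-weighted average of the column indices. The sketch below assumes the row-wise map holds unnormalized scores and that a softmax over each row turns them into a distribution; `expected_x_per_row` and `l1_loss` are hypothetical helper names.

```python
import math

def expected_x_per_row(row_map):
    """Soft-argmax along x for each row of an H x W score map."""
    expected = []
    for row in row_map:
        m = max(row)
        exps = [math.exp(v - m) for v in row]  # numerically stable softmax
        total = sum(exps)
        expected.append(sum(i * e for i, e in enumerate(exps)) / total)
    return expected

def l1_loss(pred, target):
    """Mean absolute error over rows."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# Toy 2 x 3 row-wise map: a sharp peak at x=2, then a flat row.
toy_row_map = [
    [0.0, 0.0, 50.0],
    [1.0, 1.0, 1.0],
]
toy_expected = expected_x_per_row(toy_row_map)
toy_row_loss = l1_loss(toy_expected, [2.0, 1.5])
```

Unlike a hard argmax, the expectation is differentiable, so an L1 loss on it can be backpropagated through the row-wise map.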
  22. CondLaneNet: Offset Head

    - The x-coordinate offset is added to the row-wise x positions calculated at the row-wise head.
    - An L1 loss (Loffset) is imposed within the offset mask range.
    - train time: calculate losses for NΩ instances within the offset mask
    - test time: add the offset values to the row-wise x positions for each start point
    (figure labels: offset map (NΩ, 40, 100), offset GT, offset mask, 20 pixels)
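The refinement and the masked loss can be sketched as below. Reading the offset at the column nearest to each row's x position is an assumption made for the example, as are the helper names; the slide only states that the offset is added to the row-wise x positions and that the loss is restricted to the mask.

```python
import math

def refine_xs(row_xs, offset_map):
    """Add the offset found at each row's nearest column to its x position."""
    refined = []
    for y, x in enumerate(row_xs):
        col = int(math.floor(x + 0.5))                   # nearest column
        col = min(max(col, 0), len(offset_map[y]) - 1)   # clamp to the map
        refined.append(x + offset_map[y][col])
    return refined

def masked_l1(pred_map, gt_map, mask):
    """Mean absolute error over the pixels where mask is 1."""
    errors = [
        abs(p - g)
        for pred_row, gt_row, mask_row in zip(pred_map, gt_map, mask)
        for p, g, m in zip(pred_row, gt_row, mask_row)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0

# Toy example: two rows, four columns.
toy_offsets = [
    [0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, -0.1],
]
toy_refined = refine_xs([1.2, 2.6], toy_offsets)
toy_offset_loss = masked_l1(
    [[0.0, 0.5], [1.0, 0.0]],
    [[0.0, 0.0], [0.5, 9.0]],
    [[0, 1], [1, 0]],
)
```

Pixels outside the mask, such as the large error at the bottom-right of the toy example, do not contribute to the loss.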
  23. LaneATT vs CondLaneNet

    - Both models are highly accurate and achieve real-time inference.
    - Both are far more lightweight than the segmentation-based SCNN.

                      LaneATT [3] (resnet34)  CondLaneNet [4] (resnet34)  SCNN [1]
    year              2021                    2021                        2018
    F1 score          76.68                   78.74                       71.6
    MACs              18G                     19.6G                       328.4G
    FPS               171                     152                         7.5
    Input resolution  640x360                 800x320 (crop)
  24. Conclusion

    - Lane detection is a special branch of detection.
    - The continuity of lanes requires a specific representation.
    - Anchor-based and row-wise approaches are more efficient than segmentation-based ones.
    - Shape representation and instance discrimination are key.
    - SHOUTOUT to the authors for their amazing papers and implementations [1][2][3][4] !!
  25. References

    [1] Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang, “Spatial As Deep: Spatial CNN for Traffic Scene Understanding,” AAAI 2018, https://arxiv.org/abs/1712.06080
    [2] Jonah Philion, “FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network,” CVPR 2019, https://arxiv.org/abs/1905.04354
    [3] Lucas Tabelini, Rodrigo Berriel, Thiago M. Paixão, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos, “Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection,” CVPR 2021, https://arxiv.org/abs/2010.12035, https://github.com/lucastabelini/LaneATT
    [4] Lizhe Liu, Xiaohao Chen, Siyu Zhu, Ping Tan, “CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution,” ICCV 2021, https://arxiv.org/abs/2105.05003, https://github.com/aliyun/conditional-lane-detection