Digging Into Cutting-Edge Lane Detection Methods 2021
Survey of the cutting-edge lane detection methods proposed in 2021: LaneATT and CondLaneNet.
(presented at the DeNA / Mobility Technologies computer vision reading group on Dec. 3, 2021.)
Lane Detection Definition
markers
• number of lane classes is typically one
• number of lane instances depends on the image, typically 0-4
All the images in this presentation are Hiroto Honda’s vacation photos 🏖
Dataset        Images  Resolution  Curve images  Fork lanes  Urban scenes  Papers with Code
TuSimple       13K     1280x720    30%           -           -             link
CULane         133K    1640x590    10%           -           ✓             link
CurveLanes 🆕  150K    2650x1440   90%           ✓           ✓             -

- CULane is the most popular benchmark as of 2021
- CurveLanes, which features various curve images and fork lanes, has newly appeared
evaluation, prediction and ground-truth lane points are compared by the intersection over union (IoU) of their segments, which are drawn by cv2.line
TP = IOUmatrix[linear sum assignment indices] > iou_threshold (otherwise FP or FN)
precision = TP / (TP + FP), recall = TP / (TP + FN)
F1 = 2 * (prec * rec) / (prec + rec)
F1 is used as the primary metric
[Figure: pred and GT lane segments with the resulting IoU matrix; example entries: iou=0.80, 0.11, 0.01, 0.02, 0.01, 0.09, 0.35, 0.43, 0.11, 0.99, 0.01, 0.22]
[Code link]
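The matching step can be sketched in a few lines. This is a minimal illustration, not the benchmark's official evaluation code: it assumes the IoU matrix has already been computed (rows are predictions, columns are ground-truth lanes), the function name `lane_f1` is hypothetical, and the default threshold of 0.5 is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lane_f1(iou_matrix, iou_threshold=0.5):
    """Return (precision, recall, F1) for one image.

    iou_matrix: 2D array, rows = predicted lanes, columns = GT lanes.
    iou_threshold: assumed default; pass the value your benchmark uses.
    """
    iou = np.asarray(iou_matrix, dtype=float)
    num_pred, num_gt = iou.shape

    # One-to-one matching that maximizes total IoU (cost = 1 - IoU).
    rows, cols = linear_sum_assignment(1.0 - iou)

    # A matched pair counts as TP only if its IoU exceeds the threshold.
    tp = int((iou[rows, cols] > iou_threshold).sum())
    fp = num_pred - tp  # predictions without a valid match
    fn = num_gt - tp    # GT lanes without a valid match

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: 3 predictions, 2 GT lanes.
example = [[0.80, 0.11],
           [0.02, 0.35],
           [0.01, 0.09]]
print(lane_f1(example))  # one TP, two FP, one FN
```

Only the first prediction clears the threshold here, so precision is 1/3, recall is 1/2 and F1 is 0.4.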
Lane Detection Type 1: Segmentation
to each lane marker (e.g. SCNN [1])
pros: simple network; various segmentation methods can be exploited
cons: lane classes are fixed / per-pixel training is excessive for the task
postprocess: row-wise argmax
[Figure: input → CNN → Lane 1 … Lane N]
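The row-wise argmax postprocess can be sketched as follows. This is an illustrative version only: it assumes one probability map of shape (H, W) per lane class, and the name `rowwise_argmax` and the `min_prob` cutoff are hypothetical.

```python
import numpy as np


def rowwise_argmax(lane_prob, min_prob=0.5):
    """Turn one lane's probability map (H, W) into a list of (x, y) points.

    For every image row, take the column with the highest probability and
    keep it only if that probability reaches min_prob (assumed cutoff).
    """
    lane_prob = np.asarray(lane_prob, dtype=float)
    best_x = lane_prob.argmax(axis=1)  # most likely column per row
    best_p = lane_prob.max(axis=1)     # its probability
    return [(int(x), y)
            for y, (x, p) in enumerate(zip(best_x, best_p))
            if p >= min_prob]


prob = np.zeros((4, 5))
prob[1, 3] = 0.9
prob[2, 2] = 0.8
prob[3, 1] = 0.3  # below the cutoff, so this row yields no point
print(rowwise_argmax(prob))
```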
Lane Detection Type 2: Polynomial Representation
lane representation [2]
pros: fulfills the continuity of lane shapes
cons: low accuracy due to the limited representation freedom
[Figure: input → CNN → one vector per lane: polynomial coefficients (a0, a1, ..., aN), offset, conf]
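To make the output format concrete, here is a small sketch of how one predicted vector could be turned into lane points. The semantics are assumptions, not taken from the slides: x is modeled as a0 + a1*y + ... + aN*y^N, "offset" is treated as the vertical start position of the lane, and "conf" gates whether the lane is kept. The function name is hypothetical.

```python
import numpy as np


def decode_polynomial_lane(coeffs, y_start, conf, ys, conf_threshold=0.5):
    """Decode one lane from (a0..aN, offset, conf).

    coeffs: [a0, a1, ..., aN], lowest order first (assumed convention).
    y_start: the 'offset' output, assumed to be where the lane begins.
    ys: row coordinates at which to sample the lane.
    """
    if conf < conf_threshold:
        return []  # low-confidence lanes are discarded
    ys = np.asarray(ys, dtype=float)
    ys = ys[ys >= y_start]  # keep only rows covered by the lane
    xs = np.polynomial.polynomial.polyval(ys, coeffs)  # a0 + a1*y + ...
    return list(zip(xs.tolist(), ys.tolist()))


print(decode_polynomial_lane([1.0, 2.0, 0.5], y_start=1.0, conf=0.9, ys=[0, 1, 2]))
```

Because every x comes from the same smooth function of y, the decoded lane is continuous by construction, which is also why its shape freedom is limited.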
Lane Detection Type 3: Anchor-based Detection
regress the lane points [3]
pros: no need for a high-res feature map as output
cons: lane shape representation ability depends on anchor priors
[Figure: input → CNN → per-anchor regression (per-anchor vectors); anchor vs. GT]
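A rough sketch of per-anchor decoding is shown below. The geometry is a simplifying assumption made for illustration (a straight anchor line defined by an origin and an angle, with one predicted horizontal offset per sampled row); it is not copied from any implementation, and all names are hypothetical.

```python
import math


def decode_anchor_lane(x_origin, y_origin, angle_deg, x_offsets, ys):
    """Combine a straight anchor line with per-row regressed x offsets.

    The anchor starts at (x_origin, y_origin) and rises through the image
    (y decreases) at angle_deg measured from the horizontal axis.
    x_offsets[i] is the predicted horizontal correction at row ys[i].
    """
    tan_a = math.tan(math.radians(angle_deg))
    points = []
    for y, dx in zip(ys, x_offsets):
        anchor_x = x_origin + (y_origin - y) / tan_a  # x of the anchor at row y
        points.append((anchor_x + dx, y))
    return points


# A 45-degree anchor starting at (100, 50), sampled at three rows.
print(decode_anchor_lane(100.0, 50.0, 45.0, [0.0, 1.5, -2.0], [50, 40, 30]))
```

The network only has to predict the small corrections, so a coarse feature map suffices, but lanes that are far from every anchor prior are hard to express.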
Lane Detection Type 4: Row-wise Detection
row by row [4]
pros: can represent the continuity of lane shapes effectively
cons: network tends to be more complex than an anchor-based one
[Figure: row-wise classification and vertical classification]
CondLaneNet: Heatmap-based Starting Point Proposal Detection
lane instances using heatmaps, like human pose estimation
train time: impose Focal loss on pred and GT heatmaps
test time: pick the peak points as lane instances by 2D max-pooling
[Figure: from FPN (64ch, 20, 50) → conv x 2 → heatmap branch (1ch, 20, 50), Lpoint: Focal Loss against the ground-truth 2D Gaussian heatmaps for start points; conv x 2 → parameter branch (67+67ch, 20, 50)]
[Code link]
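The test-time peak picking can be sketched with plain NumPy. This is an illustrative stand-in for the max-pooling trick, not the authors' code: the 3x3 kernel, the score threshold and the function name `pick_peaks` are assumptions.

```python
import numpy as np


def pick_peaks(heatmap, threshold=0.5, kernel=3):
    """Return (row, col) positions that are local maxima above threshold.

    Emulates stride-1 max pooling: a pixel is a peak if it equals the
    maximum of its kernel x kernel neighborhood.
    """
    heat = np.asarray(heatmap, dtype=float)
    h, w = heat.shape
    pad = kernel // 2
    padded = np.pad(heat, pad, mode="constant", constant_values=-np.inf)

    # Max over all shifted views == max pooling with stride 1.
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy in range(kernel) for dx in range(kernel)]
    pooled = np.max(shifted, axis=0)

    keep = (heat == pooled) & (heat > threshold)
    rows, cols = np.nonzero(keep)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


heat = np.zeros((5, 6))
heat[1, 1] = 0.9  # peak
heat[1, 2] = 0.6  # suppressed by its stronger neighbor
heat[3, 4] = 0.7  # peak
heat[4, 0] = 0.2  # below threshold
print(pick_peaks(heat))
```

Each surviving peak is treated as the start point of one lane instance, so no separate non-maximum suppression step is needed.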
CondLaneNet: Extract Weights for Conditional Convolution
j) ∈ Ω around the ground-truth start points, as the training samples
- Extract the conv weights from Ω to use later
[Figure: heatmap branch (1ch, 20, 50) and parameter branch (67+67ch, 20, 50); conv weight parameters (67+67ch) are taken at the points in Ω; NΩ: num of training samples]
[Code link]
CondLaneNet: Location / Offset Branch
- Row-wise and offset branches give the lane shape heatmap and sub-pixel offset map.
- The channel-lane correspondence is guaranteed by the conditional convolution, whose parameters have been picked from the parameter branch.
[Figure: from FPN (64ch, 40, 100) + position encodings → conv → repeat x(NΩ) → conditional conv → row-wise map (row-wise branch); conditional conv → offset map (offset branch)]
[Code link]
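The idea of a conditional convolution, where each lane instance gets its own kernel read from the parameter branch, can be sketched as a per-instance 1x1 convolution. This is a simplified illustration under stated assumptions: the parameter vector at each start point is laid out as C weights followed by one bias, the shapes are tiny toy values, and both function names are hypothetical.

```python
import numpy as np


def extract_params(param_map, start_points):
    """Gather one parameter vector per start point.

    param_map: (C + 1, Hp, Wp) output of the parameter branch.
    start_points: list of (row, col) positions on that map.
    Returns an array of shape (N, C + 1).
    """
    rows, cols = zip(*start_points)
    return param_map[:, list(rows), list(cols)].T


def conditional_conv1x1(features, params):
    """Apply a different 1x1 kernel to the shared features for each instance.

    features: (C, H, W) shared feature map.
    params: (N, C + 1), assumed layout = C weights then 1 bias.
    Returns (N, H, W): one output map per lane instance.
    """
    weights, bias = params[:, :-1], params[:, -1]
    return np.einsum("nc,chw->nhw", weights, features) + bias[:, None, None]


features = np.array([[[1.0, 2.0]], [[3.0, 4.0]]])  # C=2, H=1, W=2
param_map = np.zeros((3, 1, 2))
param_map[:, 0, 0] = [1.0, 0.0, 0.5]   # instance 0: copy channel 0, add 0.5
param_map[:, 0, 1] = [0.0, 2.0, -1.0]  # instance 1: 2 * channel 1, minus 1
params = extract_params(param_map, [(0, 0), (0, 1)])
print(conditional_conv1x1(features, params))
```

Because output map n is produced only by the kernel taken at start point n, the mapping between output channels and lane instances is fixed by construction.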
CondLaneNet: Row-wise Head
positions of the lane instance are calculated from the row-wise map and L1 loss is imposed on each row.
- Vertical range is also calculated by reducing the x axis.
train time: calculate losses for NΩ instances
test time: form the lane shape for each predicted start point
[Figure: row-wise map (NΩ, 40, 100) → row-wise expected location, Lrow: L1 Loss; fc → vertical classification → vertical range, Lrange: SCE Loss]
[Code link]
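The "row-wise expected location" can be illustrated as a softmax expectation over the x axis of each row. This is a sketch assuming the row-wise map holds unnormalized scores (logits) for one lane instance; the function name is hypothetical.

```python
import numpy as np


def rowwise_expected_x(row_map):
    """Expected x position per row of a (H, W) row-wise score map.

    Each row is turned into a probability distribution over columns with a
    softmax, and the expectation of the column index is returned.
    """
    row_map = np.asarray(row_map, dtype=float)
    shifted = row_map - row_map.max(axis=1, keepdims=True)  # numerical stability
    prob = np.exp(shifted)
    prob /= prob.sum(axis=1, keepdims=True)
    columns = np.arange(row_map.shape[1])
    return prob @ columns  # shape (H,)


scores = np.array([[0.0, 0.0, 0.0],    # flat row: expectation is the center
                   [0.0, 50.0, 0.0],   # sharp peak at column 1
                   [50.0, 0.0, 0.0]])  # sharp peak at column 0
print(rowwise_expected_x(scores))
```

Unlike a hard argmax, the expectation is differentiable, which is what allows an L1 loss on the x position of each row.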
CondLaneNet: Offset Head
- X coordinate offset is added to the row-wise x positions calculated at the row-wise head
- L1 loss is imposed within the offset mask range
train time: calculate losses for NΩ instances within the offset mask
test time: add the offset values to the row-wise x positions for each start point
[Figure: GT offset mask, annotated "20 pixels"; Loffset: L1 Loss]
[Code link]
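The test-time refinement can be sketched as a lookup and an addition. How the offset map is sampled is an assumption here (the nearest integer column in each row); the function name is hypothetical.

```python
import numpy as np


def refine_with_offsets(row_xs, offset_map):
    """Add sub-pixel offsets to the per-row x positions of one lane.

    row_xs: (H,) x positions from the row-wise head.
    offset_map: (H, W) predicted x offsets for the same lane instance.
    The offset is read at the nearest column in each row (assumed rule).
    """
    row_xs = np.asarray(row_xs, dtype=float)
    offset_map = np.asarray(offset_map, dtype=float)
    h, w = offset_map.shape
    cols = np.clip(np.rint(row_xs).astype(int), 0, w - 1)
    return row_xs + offset_map[np.arange(h), cols]


offsets = np.array([[0.1, 0.2, 0.3],
                    [-0.4, -0.5, -0.6]])
print(refine_with_offsets([1.2, 1.6], offsets))
```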
special branch of detection
- Continuity of lanes requires specific representation
- Anchor-based and row-based approaches are more efficient than the segmentation-based one
- Shape representation and instance discrimination are key
- SHOUTOUT to the authors for their amazing papers and implementations [1][2][3][4] !!
Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally, “SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks,” AAAI2018, https://arxiv.org/abs/1708.04485
[2] Jonah Philion, “FastDraw: Addressing the Long Tail of Lane Detection by Adapting a Sequential Prediction Network,” CVPR 2019, https://arxiv.org/abs/1905.04354
[3] Lucas Tabelini, Rodrigo Berriel, Thiago M. Paixão, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos, “Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection,” CVPR 2021, https://arxiv.org/abs/2010.12035, https://github.com/lucastabelini/LaneATT
[4] Lizhe Liu, Xiaohao Chen, Siyu Zhu, Ping Tan, “CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution,” ICCV 2021, https://arxiv.org/abs/2105.05003, https://github.com/aliyun/conditional-lane-detection