
ICPRAI 2022 - STM

Olivier Lézoray

June 02, 2022

Transcript

  1. Space-Time Memory networks for multi-person skeleton body part detection Rémi

    DUFOUR (PhD Student FCS Railenium), Cyril Meurie, Olivier Lézoray, Ankur Mahtani ICPRAI 2022
  2. • Context • Related works • Experiments • Skeleton confidence

    maps • Skeleton edges tracking • Conclusion 2 Plan
  3. • Autonomous train prototype project directed by Railenium • Camera

    surveillance is needed to enable services and provide security without onboard staff • Pose tracking would be useful as a base for action recognition Context 3
  4. • Most existing pose trackers are two-stage • Pose estimation

    in frames • Linking the poses over time • The first stage is usually performed using a top-down pose detector. • A pose detector first detects human bounding boxes • A pose estimation method is used inside these bounding boxes 4 Related works
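As a rough illustration of the two-stage, top-down pipeline described above, here is a minimal sketch; the function bodies and the greedy linking step are placeholders for illustration, not the authors' code or any specific library.

```python
# Sketch of a generic two-stage, top-down pose tracker (placeholder stubs).
import numpy as np

def detect_people(frame: np.ndarray) -> list:
    """Stage 1a: human bounding boxes (x, y, w, h). Dummy stub."""
    return [(0, 0, frame.shape[1], frame.shape[0])]

def estimate_pose(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Stage 1b: keypoints inside one box, here a dummy (17, 2) array."""
    return np.zeros((17, 2))

def link_poses(prev_poses: list, cur_poses: list) -> list:
    """Stage 2: greedy temporal association by mean keypoint distance."""
    ids = []
    for cur in cur_poses:
        if prev_poses:
            dists = [np.linalg.norm(cur - p, axis=1).mean() for p in prev_poses]
            ids.append(int(np.argmin(dists)))
        else:
            ids.append(-1)  # new track
    return ids

def track(frames):
    prev = []
    for frame in frames:
        poses = [estimate_pose(frame, b) for b in detect_people(frame)]
        ids = link_poses(prev, poses)
        prev = poses
        yield list(zip(ids, poses))
```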
  5. • Existing top-down pose estimation methods are not designed

    for video: they have no memory of past frames • Advances in the task of Video Object Segmentation (VOS) have led to a tracker that makes use of long-term memory, called STM (Space-Time Memory) 5 Related works
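For reference, the core of STM is a space-time memory read in which the query frame attends over features memorized from past frames. Below is a minimal PyTorch sketch of that read step; the tensor shapes and the plain dot-product matching are illustrative assumptions and may differ from the original implementation.

```python
# Minimal sketch of an STM-style space-time memory read: the query frame's
# keys attend over the keys of all memorized frames, and the retrieved
# memory values are concatenated with the query values for the decoder.
import torch
import torch.nn.functional as F

def memory_read(mem_key, mem_val, qry_key, qry_val):
    """
    mem_key: (B, Ck, T, H, W)  keys of memorized frames
    mem_val: (B, Cv, T, H, W)  values of memorized frames
    qry_key: (B, Ck, H, W)     key of the current (query) frame
    qry_val: (B, Cv, H, W)     value of the current (query) frame
    """
    B, Ck, T, H, W = mem_key.shape
    Cv = mem_val.shape[1]

    mk = mem_key.flatten(2)                            # (B, Ck, T*H*W)
    mv = mem_val.flatten(2)                            # (B, Cv, T*H*W)
    qk = qry_key.flatten(2)                            # (B, Ck, H*W)

    # Space-time attention: every query location matches every memory location.
    affinity = torch.einsum('bcm,bcq->bmq', mk, qk)    # (B, T*H*W, H*W)
    weights = F.softmax(affinity, dim=1)               # normalize over memory

    read = torch.einsum('bcm,bmq->bcq', mv, weights)   # (B, Cv, H*W)
    read = read.view(B, Cv, H, W)

    # The decoder receives the retrieved memory concatenated with the query value.
    return torch.cat([read, qry_val], dim=1)           # (B, 2*Cv, H, W)
```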
  6. 6 Related works • Our work seeks to answer the

    question: "Can the STM architecture, conceived for Video Object Segmentation, be adapted to pose estimation in videos?"
  7. • First, we tried training STM to perform skeleton/background binary

    segmentation • The original STM weights are finetuned on this new task • First on the MS-COCO[1] image dataset (64114 images annotated with keypoints) • We generate short videos (2 to 4 frames) by translating/rotating individual images 7 Experiments : skeleton confidence map
  8. • Quantitative results (CC-loss is Pearson's Correlation Coefficient) • Results

    obtained by initialising with ground truth for the first frame and then tracking for the rest of the video sequence 8 Experiments : skeleton confidence map
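A possible implementation of a Pearson correlation-coefficient loss between predicted and ground-truth confidence maps is shown below; the exact formulation used in the talk (sign convention, per-channel averaging) is not given, so treat this as an assumption.

```python
# Pearson correlation-coefficient ("CC") loss between confidence maps:
# 1 - CC, so that a perfectly correlated prediction gives zero loss.
import torch

def cc_loss(pred, target, eps=1e-8):
    """pred, target: (B, C, H, W) confidence maps."""
    p = pred.flatten(1)
    t = target.flatten(1)
    p = p - p.mean(dim=1, keepdim=True)
    t = t - t.mean(dim=1, keepdim=True)
    cc = (p * t).sum(dim=1) / (p.norm(dim=1) * t.norm(dim=1) + eps)
    return 1.0 - cc.mean()
```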
  9. • Qualitative results on PoseTrack18 • Results are encouraging because

    the model was not trained on PoseTrack18_Train 9 Experiments : skeleton confidence map
  10. • The STM architecture was modified so that it has

    multiple input and output channels, one for each skeleton edge type 10 Experiments : video skeleton edges tracking
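The simplest way to obtain one output channel per skeleton edge type is to widen the final prediction layer of the decoder; the module below and its channel counts (e.g. 19 edges for a COCO-style skeleton) are illustrative assumptions, not the authors' exact modification.

```python
# Illustrative prediction head with one confidence map per skeleton edge
# type instead of a single foreground/background mask.
import torch.nn as nn

class EdgeHead(nn.Module):
    def __init__(self, in_channels=256, n_edges=19):
        super().__init__()
        self.predict = nn.Conv2d(in_channels, n_edges, kernel_size=3, padding=1)

    def forward(self, decoder_features):
        # decoder_features: (B, in_channels, H, W) -> (B, n_edges, H, W)
        return self.predict(decoder_features)
```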
  11. 14 Experiments : Video pose estimation • Finally, we modify

    the STM architecture so that it can handle each keypoint and each edge in a particular confidence map.
  12. 16 Experiments : Video pose estimation • Finally, we use

    an STM architecture that handles each keypoint and each edge in a particular confidence map. • We try out multiple data augmentation methods that aim at improving performance for long-term tracking. without augmentations "rand" augmentation "baits" augmentation "jitter" augmentation "dull_clouds" augmentation
  13. 17 Experiments : Video pose estimation • Finally, we use

    an STM architecture that handles each keypoint and each edge in a particular confidence map. • We try out multiple data augmentation methods that aim at improving performance for long-term tracking. without augmentations "rand" augmentation "baits" augmentation "jitter" augmentation
  14. • Then, we finetune on the PoseTrack[1] dataset for 5

    epochs 18 1. Andriluka et al., PoseTrack: A Benchmark for Human Pose Estimation and Tracking, CVPR 2018 Experiments : skeleton confidence map
  15. 19 1. Lin TY. et al., Microsoft COCO: Common Objects in

    Context, ECCV 2014 Experiments : Video pose estimation • Finally, we use an STM architecture that handles each keypoint and each edge in a particular confidence map. • We try out multiple data augmentation methods that aim at improving performance for long-term tracking. • Cyclic training: during training, the model is repeatedly fed back its own prediction for the previous frame
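A hedged sketch of the cyclic-training idea described above: the model's own prediction for frame t is written back into the memory used for frame t+1, instead of always memorizing the ground truth. The model interface (init_memory, update_memory) and the detach on the fed-back prediction are placeholders and assumptions, not the actual STM code.

```python
# One training step over a short clip with cyclic feedback of predictions.
import torch

def cyclic_step(model, clip, gt_maps, loss_fn, optimizer):
    """clip: list of frames; gt_maps: ground-truth confidence maps per frame."""
    # The first frame is memorized with its ground-truth maps.
    memory = model.init_memory(clip[0], gt_maps[0])
    total = 0.0
    for frame, gt in zip(clip[1:], gt_maps[1:]):
        pred = model(frame, memory)
        total = total + loss_fn(pred, gt)
        # Feed the prediction (not the ground truth) back into the memory.
        memory = model.update_memory(memory, frame, pred.detach())
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return float(total)
```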
  16. 20 1. Lin TY. et al., Microsoft COCO: Common Objects in

    Context, ECCV 2014 Experiments : Video pose estimation
  17. 21 1. Lin TY. et al., Microsoft COCO: Common Objects in

    Context, ECCV 2014 Experiments : Video pose estimation • Verification that the gains from finetuning on PoseTrack18 and from the data augmentations lie in long-term tracking performance
  18. • We first experimentally demonstrated the capacity of STM to

    handle skeletons rather than segmenting the contour of the tracked object • We then showed that the STM architecture can be modified to track each skeleton edge individually • Finally, we tested a new architecture that can track skeleton keypoints and edges, and experimented with different training procedures and data augmentation methods. • Finetuning on the train set of PoseTrack18 • Cyclic training • The "baits" and "jitter" data augmentations were shown to help • The next step should be to compare with other pose tracking methods Conclusion 22