Skeleton-based Human Action Recognition with Recurrent Neural Network

University of Science, VNU-HCM
Faculty of Information Technology
Advanced Program in Computer Science
Võ Trần Thanh Lương 1551020
Vũ Hoàng Quân 1551026
Thesis Advisors: Dr. Trần Thái Sơn
Ho Chi Minh City, Aug 18th 2019

Outlines

Introduction
Every human action serves a purpose. Machines should be able to learn to recognize and understand these actions.

Introduction

Introduction

Introduction

Motivation

Motivation

Contributions

Related Work

Related Work
An example of local representations for human action. Zhuowen Lv, Xianglei Xing, Kejun Wang, and Donghai Guan, "Class Energy Image Analysis for Video Sensor-Based Gait Recognition: A Review."

Related Work
Georgios D. Evangelidis, Gurkirt Singh, and Radu Horaud, "Continuous Gesture Recognition from Articulated Poses."
Spatio-temporal interest points. S.F. Wong, T.-K. Kim, and R. Cipolla, "Learning motion categories using both semantic and structural information."

Related Work
Example of 3D convolution. Karen Simonyan and Andrew Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos."

Related Work
Hybrid network for temporal modeling. Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, and Trevor Darrell, "Long-term recurrent convolutional networks for visual recognition and description."
Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, and Alex C. Kot, "Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks."

Proposed Method
Temporal RNN
Spatial RNN

Temporal RNN

Temporal RNN

Spatial RNN

Spatial RNN

Spatial RNN

Spatial RNN

3D Transformation

Training Flow
Data sources: NTU Dataset; Kinetics Dataset (Raw video -> Extract Skeleton Data Using OpenPose)
Pipeline: Skeleton Dataset -> Training Set / Testing Set -> Initialize RNN -> Feature Extraction -> Training -> Softmax Classification -> Trained Model
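
The recurrent and softmax stages named in the training flow can be illustrated with a forward pass only. The sketch below is not the network from the slides: it assumes a plain (Elman-style) recurrent cell, random untrained weights, and hypothetical sizes (25 joints with 3 coordinates each, 16 hidden units, 5 classes). The names `rnn_classify`, `random_matrix`, and `matvec` are made up for this example.

```python
import math
import random


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_classify(frames, w_in, w_rec, w_out):
    """Run a plain recurrent cell over the frames, then classify.

    frames: list of per-frame feature vectors (flattened joint coordinates).
    Returns one probability per action class.
    """
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    # Softmax classification on the final hidden state.
    return softmax(matvec(w_out, hidden))


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would load learned values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Assumed sizes for illustration only.
rng = random.Random(0)
n_features, n_hidden, n_classes = 25 * 3, 16, 5
w_in = random_matrix(n_hidden, n_features, rng)
w_rec = random_matrix(n_hidden, n_hidden, rng)
w_out = random_matrix(n_classes, n_hidden, rng)

# A synthetic 10-frame skeleton sequence standing in for real data.
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(10)]
probabilities = rnn_classify(sequence, w_in, w_rec, w_out)
predicted_class = probabilities.index(max(probabilities))
```

In an actual training run the weights would be updated by backpropagation through time against the training set; that step is omitted here to keep the sketch short.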

Prediction Flow
Raw Video -> Extract Skeleton Data Using OpenPose -> Load Trained Model -> Predicted Value

Experiments

NTU RGB+D and NTU RGB+D 120 Dataset

Skeleton Joints Position (NTU Dataset)

Kinetics Dataset

Sample frames of Kinetics Dataset

Skeleton Joints Position (COCO Model)

Proposed Method Result

Comparison with the state-of-the-art

Accuracy Calculation
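
The slide text for this step did not survive extraction, so the following is only a sketch under the assumption that accuracy means the usual top-1 measure: the fraction of test samples whose predicted class equals the ground-truth class. The function name `top1_accuracy` and the sample labels are made up for illustration.

```python
def top1_accuracy(predicted_labels, true_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


# Made-up labels: four of the five predictions match the ground truth.
predictions = [3, 1, 4, 1, 5]
ground_truth = [3, 1, 4, 2, 5]
accuracy = top1_accuracy(predictions, ground_truth)
```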

Problems with Kinetics dataset

                  NTU RGB+D   NTU RGB+D 120   Kinetics
Raw videos        Yes         Yes             No (can be obtained from the given URLs)
3D skeleton data  Yes         Yes             No
Depth maps        Yes         Yes             No

Problems with Kinetics dataset

Conclusion

Future Work
Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, and Alex C. Kot, "Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks."
APSR Framework. Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C. Kot, "NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding."

THANK YOU FOR YOUR ATTENTION
