Slide 1

Improving Accuracy of AI Algorithms Using Sensor Fusion
Zain Fuad
[email protected]
Lahore
zainfuad2

Slide 2

Contents
- Introduction
- Problem Definition
- Proposed Solution
- Results
- Future Work

Slide 3

Human Action Recognition (HAR)
➢ HAR refers to acquiring a person’s body movements and understanding the performed action

Slide 4

HAR – Applications
- Health Care
- Sports
- Security
- Automation
- Robotics
- Education

Slide 5

HAR – Sensors Utilized
- RGB-Camera
- MoCap System
- Depth Camera
- Wearable (MEMS) Inertial Sensors

Slide 6

HAR – Comparison of Sensors
RGB-Camera
+ Can capture visual cues (color, shape, motion)
+ Widely available
- Illumination affects performance
- Occlusions affect performance
- Subject needs to be in field of view
- No depth information
MoCap System
+ Very accurate
- Expensive
- Needs a lot of space

Slide 7

HAR – Comparison of Sensors
Depth Camera
+ Can capture 3D information
+ Can work under low light
+ Availability of skeletal joint positions (Microsoft Kinect)
- Unrealistic skeletal joint positions
- Subject should be in field of view
- Privacy concerns

Slide 8

HAR – Comparison of Sensors
Wearable (MEMS) Inertial Sensors
+ Easy to wear
+ Provide little or no hindrance
+ Very accurate with high sampling rate
- Limit on the number of sensors that can be worn
- Unwillingness to wear a sensor

Slide 9

HAR – Sensor Fusion
Why sensor fusion? The limitations of one sensor can be compensated for by other sensor(s).

Slide 10

Contents
- Introduction
- Problem Definition
- Proposed Solution
- Results
- Future Work

Slide 11

Problem Definition
(Figure: original scene, depth-sensor skeletal joint positions, and inertial sensor measurements)

Slide 12


Slide 13

Problem Statement
Recognize human actions by assigning them a class label.
Performed Action → Data Acquisition → Feature Classification → Class Label (e.g., Eating, Knock, Squat, Smoking)

Slide 14

Problem Definition
Assign a class label
(Figure: Depth Sensor and Inertial Sensor data streams)

Slide 15

Contents
- Introduction
- Problem Definition
- Proposed Solution
- Results
- Future Work

Slide 16

Proposed Solution

Slide 17

Feature Extraction – Skeletal Data
Dividing rows by their norm → Savitzky-Golay filter → Bi-cubic interpolation → Savitzky-Golay filter → Features stacked column-wise
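
The skeletal branch can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes the skeleton sequence is a NumPy array of shape (num_frames, num_joints * 3), and the function name and filter parameters are illustrative. The interpolation to a common number of frames (shown in a later sketch) would sit between the filtering steps.

```python
import numpy as np
from scipy.signal import savgol_filter

def skeletal_features(skeleton_seq, sg_window=9, sg_order=3):
    """Sketch of the skeletal branch: per-row normalization, temporal
    smoothing, and column-wise stacking into one feature vector.

    skeleton_seq: array of shape (num_frames, num_joints * 3),
                  one flattened skeleton per frame (assumed layout).
    """
    # Divide each row (frame) by its norm to reduce subject/joint dependency
    norms = np.linalg.norm(skeleton_seq, axis=1, keepdims=True)
    normalized = skeleton_seq / np.maximum(norms, 1e-12)

    # Savitzky-Golay filter along the temporal axis to suppress spikes
    smoothed = savgol_filter(normalized, window_length=sg_window,
                             polyorder=sg_order, axis=0)

    # Stack the per-frame feature columns into a single vector
    return smoothed.flatten(order='F')
```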

Slide 18

Feature Extraction – Inertial Data
Temporal windows of size W × 6 → µ and σ from windows (per direction) → Bi-cubic interpolation → Features stacked column-wise
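
A minimal sketch of the inertial branch under assumed names: the signal is an (num_samples, 6) array (3-axis acceleration and 3-axis angular velocity), split into temporal windows of size W, with the mean and standard deviation computed per direction in each window and stacked into one vector. The interpolation step that brings sequences to a common length is shown later.

```python
import numpy as np

def inertial_features(signal, window_size=3):
    """signal: array of shape (num_samples, 6) -- 3-axis acceleration and
    3-axis angular velocity. Returns per-window mean/std statistics per
    direction, stacked into a single feature vector (sketch)."""
    num_windows = signal.shape[0] // window_size
    feats = []
    for w in range(num_windows):
        window = signal[w * window_size:(w + 1) * window_size]  # (W, 6)
        mu = window.mean(axis=0)      # mean per direction
        sigma = window.std(axis=0)    # standard deviation per direction
        feats.append(np.concatenate([mu, sigma]))
    # Stack the per-window statistics into one vector
    return np.concatenate(feats)
```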

Slide 19

Proposed Solution – Feature Extraction
(Figure: feature-extraction pipelines for the Depth Sensor and the Inertial Sensor)

Slide 20

Proposed Solution – Feature Extraction
Depth Sensor: Divide each row by its norm to reduce subject and joint dependency → Bicubic interpolation (to the least number of frames from the training set) → Stack features column-wise and use Savitzky-Golay filter [2] to reduce noise (spikes)
Inertial Sensor: Partition into windows (win size = 3) and calculate µ and σ for each direction → Bicubic interpolation → Stack features column-wise
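
One way to realize the bicubic interpolation step is to resize each per-sequence feature matrix to a common number of frames, with the target taken as the least number of frames seen in the training set. The sketch below uses OpenCV's bicubic resize; the function name, array layout, and `target_frames` variable are assumptions for illustration.

```python
import cv2
import numpy as np

def resample_to_fixed_length(feature_matrix, target_frames):
    """Resize a (num_frames, num_features) matrix to
    (target_frames, num_features) with bicubic interpolation, so sequences
    of different durations yield feature vectors of equal size."""
    resized = cv2.resize(feature_matrix.astype(np.float32),
                         (feature_matrix.shape[1], target_frames),  # (width, height)
                         interpolation=cv2.INTER_CUBIC)
    return resized

# Illustrative usage: target is the least number of frames in the training set
# target_frames = min(seq.shape[0] for seq in training_sequences)
```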

Slide 21

Proposed Solution – Feature Classification
- Individual neural network classifiers are used as the classifiers, one per sensor, each with a softmax output layer
- One network has 1 hidden layer with 86 neurons; the other has 1 hidden layer with 90 neurons
- Both are trained using conjugate gradient with Polak-Ribière updates [3]
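
As a rough illustration of the classification stage, the sketch below builds one single-hidden-layer network per sensor. scikit-learn's MLPClassifier is used only for brevity: it does not offer conjugate-gradient training with Polak-Ribière updates, so the optimizer differs from the slides, and the assignment of 86/90 neurons to the skeletal/inertial networks is an assumption (the slide does not say which size belongs to which sensor).

```python
from sklearn.neural_network import MLPClassifier

# One network per sensor, each with a single hidden layer (86 and 90 neurons,
# as on the slide; the sensor assignment below is assumed) and a softmax
# output over the action classes for multi-class prediction.
skeletal_clf = MLPClassifier(hidden_layer_sizes=(86,), max_iter=1000)
inertial_clf = MLPClassifier(hidden_layer_sizes=(90,), max_iter=1000)

# X_skel / X_inert: per-sequence feature vectors; y: action labels (assumed names)
# skeletal_clf.fit(X_skel, y)
# inertial_clf.fit(X_inert, y)
# predict_proba then yields the class posteriors consumed by the fusion step.
```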

Slide 22

Proposed Solution
Logarithmic Opinion Pool: the classifiers' class posteriors are combined as a weighted product (a sum of log-probabilities); assuming a uniform distribution between sensors, each weight is 1/S, where S is the number of sensors.
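
A minimal sketch of a logarithmic opinion pool with uniform sensor weights: the per-sensor class posteriors are combined by averaging their logarithms (equivalently, taking a geometric mean) and renormalizing. Variable names are illustrative.

```python
import numpy as np

def log_opinion_pool(posteriors, eps=1e-12):
    """posteriors: list of arrays, each of shape (num_samples, num_classes),
    one per sensor. Uniform weights 1/S, where S is the number of sensors."""
    S = len(posteriors)
    # Sum of log-posteriors weighted by 1/S (uniform distribution between sensors)
    log_pool = sum(np.log(p + eps) for p in posteriors) / S
    fused = np.exp(log_pool)
    # Renormalize so each row is a valid probability distribution
    fused /= fused.sum(axis=1, keepdims=True)
    return fused

# predicted_labels = np.argmax(log_opinion_pool([p_skel, p_inert]), axis=1)
```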

Slide 23

Contents
- Introduction
- Problem Definition
- Proposed Solution
- Results
- Future Work

Slide 24

Results
University of Texas at Dallas Multimodal Human Action Dataset
• 1 inertial and 1 depth sensor
  - IMU to capture 3-axis linear acceleration, 3-axis angular velocity, 3-axis magnetic strength
  - IMU placed on right wrist for 21 actions, right thigh for 6 actions
  - Microsoft Kinect to track movement of 20 joints
• Total size of the dataset: 861 entries
  - 27 registered actions
  - 8 subjects (4 males, 4 females)
  - Each action performed 4 times by each subject
  - 3 corrupt sequences were removed

Slide 25

Results

Slide 26

Results – Comparison with the state-of-the-art implementation

Table 1. Recognition accuracies for the subject-generic experiment

Subject-Generic Test      Skeletal Accuracy (%)   Inertial Accuracy (%)   Fusion Accuracy (%)
Chen et al. [5]           74.7                    76.4                    91.5
Implemented Algorithm     74.8                    81.2                    95.0

- 8-fold cross-validation performed (for each subject)
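
One plausible reading of the 8-fold subject-generic protocol is that each fold holds out one of the 8 subjects for testing and trains on the remaining seven. The sketch below assumes that reading; `train_and_fuse` and `accuracy_fn` are placeholders for the pipeline sketched in the previous slides.

```python
import numpy as np

def subject_generic_cv(features, labels, subjects, train_and_fuse, accuracy_fn):
    """Assumed protocol: each fold holds out one subject (subject-generic).
    features: per-sequence feature array; labels: action labels;
    subjects: subject ID per sequence (illustrative names)."""
    accuracies = []
    for held_out in np.unique(subjects):
        train_idx = subjects != held_out
        test_idx = subjects == held_out
        # Train on the remaining subjects, predict on the held-out subject
        predictions = train_and_fuse(features[train_idx], labels[train_idx],
                                     features[test_idx])
        accuracies.append(accuracy_fn(labels[test_idx], predictions))
    return float(np.mean(accuracies))
```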

Slide 27

Contents
- Introduction
- Problem Definition
- Proposed Solution
- Results
- Future Work

Slide 28

Future Work
- Try with a range of different sensors
- Test at what point adding more sensors stops improving accuracy

Slide 29

Thank you
Zain Fuad
[email protected]
Lahore
zainfuad2

Slide 30

Paper Reference
Fuad, Zain, and Mustafa Unel. "Human action recognition using fusion of depth and inertial sensors." International Conference on Image Analysis and Recognition. Springer, Cham, 2018.

CVR – Control, Vision and Robotics Research Group