Slide 1

YOUR SMARTPHONE KNOWS WHAT YOU’RE DOING
C. Todd Lombardo

Slide 2

OUTLINE
● Problem statement
● Performance summary
● Process
  ○ Exploratory data analysis
  ○ Model selection
  ○ Feature selection
  ○ Model tuning
  ○ Error analysis
  ○ Tried and failed
● Next steps

Slide 3

Problem Statement

Slide 4

Problem statement

Hypothesis: By examining accelerometer and gyroscope sensor data from a smartphone, a model can classify which activity a person performed.

Goals: Accurately predict a human movement from accelerometer and gyroscope smartphone data, both from the provided data repo and from data captured by a mobile app.

Risks and limitations: Feature engineering may need to go beyond the scope of the course content; one possibility is exploring principal component analysis and other feature-reduction techniques.

Slide 5

Performance Summary

Slide 6

Results: Classification by logistic regression
● Logistic regression accuracy = 0.98640 on the test data set
● With PCA, the 561 features could be reduced to 120 principal components
● Sitting and Standing were the most difficult to discern: 20 false positives between them
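The approach summarized above (PCA to compress the feature space, then logistic regression) can be sketched as a pipeline. This is a minimal illustration on synthetic data, not the project's actual code; the variable names and dimensions are stand-ins for the real 561-feature windows.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))            # stand-in for the 561-feature windows
X[:, :2] *= 5                             # give two informative features dominant variance
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in for the activity label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA reduces the feature space before the classifier sees it
model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Wrapping PCA and the classifier in one pipeline keeps the component fitting inside the training fold, which avoids leaking test data into the projection.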

Slide 7

Process

Slide 8

Target variable: Activity
The model will need to accurately predict one of these movements:
● Walking (dynamic)
● Walking_upstairs (dynamic)
● Walking_downstairs (dynamic)
● Sitting (static)
● Standing (static)
● Laying (static)

Slide 9

Exploratory Data Analysis

Slide 10

About the dataset
Human Activity Recognition with Smartphones (sources: Kaggle, UC Irvine)
1. Inertial sensor data
   a. Raw triaxial signals from the accelerometer and gyroscope of all the trials with participants
   b. The labels of all the performed activities
2. Records of activity windows, each composed of:
   a. A 561-feature vector with time- and frequency-domain variables
   b. Its associated activity label
   c. An identifier of the subject who carried out the experiment

The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50 Hz. The dataset was randomly partitioned into two sets: 70% of the volunteers were selected for generating the training data and 30% for the test data.

Slide 11

About the dataset
Features will be selected and/or engineered from:

Time-series signals
▸ tBodyAcc-XYZ
▸ tGravityAcc-XYZ
▸ tBodyAccJerk-XYZ
▸ tBodyGyro-XYZ
▸ tBodyGyroJerk-XYZ
▸ tBodyAccMag
▸ tGravityAccMag
▸ tBodyAccJerkMag
▸ tBodyGyroMag
▸ tBodyGyroJerkMag

Fourier-transformed signals
▸ fBodyAcc-XYZ
▸ fBodyAccJerk-XYZ
▸ fBodyGyro-XYZ
▸ fBodyAccMag
▸ fBodyAccJerkMag
▸ fBodyGyroMag
▸ fBodyGyroJerkMag

Features are normalized and bounded within [-1, 1]. Each feature vector is a row in the text file. Some feature derivations are included in the dataset. The units for the accelerations (total and body) are g's (gravity of Earth, 9.80665 m/s²). The gyroscope units are rad/s.

Slide 12

About the dataset: Count of the features

Slide 13

Data: Basic statistics

hua.describe()

Slide 14

So about those correlations...

Slide 15

Data: By activity

hua.groupby('Activity').count()

Slide 16

Which correlations matter?

Feature_1                        Feature_2                        Correlation  Abs_Corr
tBodyAccJerk-energy()-X          fBodyAccJerk-energy()-X          0.999999     0.999999
fBodyAccJerk-energy()-X          tBodyAccJerk-energy()-X          0.999999     0.999999
fBodyAcc-bandsEnergy()-1,24      fBodyAcc-energy()-X              0.999878     0.999878
fBodyAcc-energy()-X              fBodyAcc-bandsEnergy()-1,24      0.999878     0.999878
fBodyGyro-energy()-X             fBodyGyro-bandsEnergy()-1,24     0.999767     0.999767
fBodyGyro-bandsEnergy()-1,24     fBodyGyro-energy()-X             0.999767     0.999767
fBodyAcc-bandsEnergy()-1,24.1    fBodyAcc-energy()-Y              0.999661     0.999661
fBodyAcc-energy()-Y              fBodyAcc-bandsEnergy()-1,24.1    0.999661     0.999661
tBodyAccJerkMag-mean()           tBodyAccJerk-sma()               0.999656     0.999656
tBodyAccJerk-sma()               tBodyAccJerkMag-mean()           0.999656     0.999656

corr_values = hua[feature_cols].corr().stack().reset_index()
corr_values.columns = ['Feature_1', 'Feature_2', 'Correlation']
corr_values[(corr_values.Correlation > 0.9) & (corr_values.Correlation < 1.0)]
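A self-contained sketch of this correlation scan, on a small synthetic frame (the column names `a` through `d` are hypothetical, not signals from the dataset): compute the pairwise correlation matrix, flatten it into (feature_1, feature_2, correlation) rows, and keep the near-duplicate pairs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])
df["d"] = df["a"] * 0.99 + rng.normal(scale=0.01, size=100)  # near-duplicate of "a"

# Flatten the square correlation matrix into long form
corr = df.corr().stack().reset_index()
corr.columns = ["feature_1", "feature_2", "correlation"]
corr["abs_corr"] = corr["correlation"].abs()

# Keep strongly correlated pairs, excluding each feature paired with itself
high = corr[(corr["abs_corr"] > 0.9) & (corr["abs_corr"] < 1.0)]
```

Note that each correlated pair appears twice (a,d and d,a), exactly as in the table above; deduplicating would require keeping only one triangle of the matrix.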

Slide 17

What do the activities look like for one subject?

Slide 18

What do the activities look like for all subjects?

Slide 19

What does one activity look like for different subjects?

Slide 20

Can we separate static and dynamic activities? Yes:
If tBodyAccMag < -0.5, it’s a static activity.
If tBodyAccMag > -0.5, it’s a dynamic activity.
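The separation rule above reduces to a one-line threshold check. A minimal sketch (the function name is mine, not from the project code):

```python
def activity_type(t_body_acc_mag: float) -> str:
    """Label a window static or dynamic from its body-acceleration magnitude.

    Features are normalized to [-1, 1], so windows from resting activities
    (Sitting, Standing, Laying) cluster near -1.
    """
    return "static" if t_body_acc_mag < -0.5 else "dynamic"
```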

Slide 21

Is it possible to separate specific activities?

Slide 22

Is it possible to separate all the activities? (t-SNE)
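The t-SNE view of the slide comes from embedding the high-dimensional feature windows into two dimensions. A minimal sketch on synthetic data (shapes and parameters are illustrative, not the project's actual settings):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20))   # stand-in for the feature windows

# Embed into 2-D; on the real data each point would be colored by activity
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```

With the real dataset, well-separated clusters in the embedding suggest the activities are distinguishable, though t-SNE distances are not directly meaningful.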

Slide 23

Model Selection

Slide 24

Model Selection: “Naive” (no feature selection, no tuning)

Algorithm                 Train Cross-Validation Score    Test Accuracy Score
DecisionTreeClassifier    0.840870                        0.852392
RandomForestClassifier    0.917985                        0.922973
KNeighborsClassifier      0.897175                        0.900238
LogisticRegression        0.933495                        0.957923

“Naive” running means fitting the models with no feature selection, tuning, or optimization. Yes, I ran all 561 features, and yes, it took a while. The task requires a classification algorithm, so I ran four of them.
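The naive comparison can be sketched as a loop over default-parameter classifiers scored by cross-validation. Synthetic data stands in for the 561-feature training set; only the four estimator classes match the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the real feature matrix and activity labels
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

models = {
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validation accuracy for each untuned model
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
```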

Slide 25

Feature Selection

Slide 26

Principal Component Analysis: Visualization of PC1 & PC2

Slide 27

PCA

Slide 28

PCA – Project Back to the Features

Slide 29

Map back to actual features? Which are the most predictive signals?

Signal                                   Value
angle(tBodyGyroJerkMean,gravityMean)     0.479798
tBodyAccJerk-mean()-X                    0.201355
tGravityAcc-energy()-Y                   0.115365
fBodyAcc-kurtosis()-Z                    0.098975
fBodyAcc-skewness()-Z                    0.098076
tGravityAcc-correlation()-X,Y            0.085849
angle(tBodyGyroMean,gravityMean)         0.075399
angle(Z,gravityMean)                     0.074471
tGravityAcc-min()-Y                      0.059661
tGravityAcc-mean()-Y                     0.055262

pd.Series(pc1, index=hua_pca.columns).sort_values(ascending=False)
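Projecting a component back to the features works because each row of `pca.components_` holds one loading per input feature; sorting the loadings by absolute value surfaces the most influential signals. A minimal sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
columns = [f"signal_{i}" for i in range(8)]   # hypothetical signal names
X = pd.DataFrame(rng.normal(size=(100, 8)), columns=columns)
X["signal_0"] *= 4                            # one signal with dominant variance

pca = PCA(n_components=2).fit(X)

# Loadings of PC1, indexed by the original feature names
pc1_loadings = pd.Series(pca.components_[0], index=columns)
most_predictive = pc1_loadings.abs().sort_values(ascending=False)
```

Sorting by absolute value matters: a large negative loading is just as influential as a large positive one.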

Slide 30

This matches up to the corr heatmap...

Slide 31

Model Tuning

Slide 32

How many PCA features are needed to increase accuracy?
Plotting the accuracy scores while increasing the number of PCA components in the model
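The tuning loop behind that plot can be sketched as refitting the PCA + logistic-regression pipeline for an increasing number of components and recording test accuracy each time. Synthetic data and the component counts here are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in for the real feature matrix and activity labels
X, y = make_classification(n_samples=300, n_features=30, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

accuracy_by_n = {}
for n in (2, 5, 10, 20):
    pipe = make_pipeline(PCA(n_components=n), LogisticRegression(max_iter=1000))
    accuracy_by_n[n] = pipe.fit(X_train, y_train).score(X_test, y_test)
```

Plotting `accuracy_by_n` shows where accuracy plateaus, which is how a cut-off like 120 components can be chosen.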

Slide 33

Which are the most important PCA features?

variable    coef        abscoef
PC3          6.015424   6.015424
PC22         3.503071   3.503071
PC48         2.330216   2.330216
PC32         1.976726   1.976726
PC26        -1.844982   1.844982
PC23         1.678715   1.678715
PC45         1.291059   1.291059
PC13         1.269009   1.269009
PC6         -1.153241   1.153241
PC24         1.136382   1.136382

coefs_vars.sort_values('abscoef', ascending=False, inplace=True)

Slide 34

Error Analysis

Slide 35

Confusion matrix: 120 PCA features, with 20 false positives
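The error analysis behind that slide is a confusion matrix on held-out predictions: the off-diagonal cells count misclassifications, which is how the Sitting/Standing confusion shows up. A toy sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Toy held-out labels; Sitting vs Standing is the hard pair in the real model
y_true = ["Sitting", "Sitting", "Standing", "Standing", "Walking", "Walking"]
y_pred = ["Sitting", "Standing", "Standing", "Standing", "Walking", "Walking"]
labels = ["Sitting", "Standing", "Walking"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
# Rows are true labels, columns are predictions; off-diagonal cells are errors
```

Here `cm[0, 1]` counts Sitting windows predicted as Standing; summing all off-diagonal cells gives the total number of misclassifications.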

Slide 36

Tried and Failed

Slide 37

These didn’t work so well
1. Scaling the data with StandardScaler: accuracy dropped to 0.911 (the features are already normalized)
2. Further feature reduction of PCA or signal features
   a. It was a challenge to find specific signals; the combination of signals seems far more useful
   b. Even when using the top 20 principal components
3. Ridge regression: I struggled to get this to work properly

Slide 38

Next Steps

Slide 39

With more time
1. Horn’s parallel analysis
2. Optimize Lasso and Ridge: would a penalty improve the model?
3. Dig further into t-SNE for feature selection & reduction
4. Run an experiment with my own phone
   a. Perform the six activities with a sensor-recorder app
   b. Process the raw data, run the model, and score it
5. Try this with data from a smartwatch!

Slide 41

References & credits
https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones
http://www.cis.fordham.edu/wisdm/public_files/sensorKDD-2010.pdf
https://upcommons.upc.edu/bitstream/handle/2117/101769/IWAAL2012.pdf
https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2013-11.pdf
https://github.com/anas337/Human-Activity-Recognition-Using-Smartphones
https://github.com/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping
http://www.cis.fordham.edu/wisdm/dataset.php

With code lovingly borrowed from many GA Jupyter and Kaggle notebooks