Slide 29
Slide 29 text
CASE STUDY - MINE 29
Data Warehousing
Defining Key Performance Metrics
Analytics and BI
Data
Science
id user_id target_user_id action_id age is_female is_vet is_south created_at signed_petition
1 13215827 NULL 5390823 19 0 1 1 28-nov-2012 1
2 1614524 120583933 5390823 25 1 1 0 28-nov-2012 1
3 11058392 NULL 6729301 29 1 0 0 28-nov-2012 0
4 9529371 168920339 8950132 28 0 0 1 28-nov-2012 0
5 10385293 NULL 5390823 24 1 1 1 28-nov-2012 1
‣ This is a subset of the total number of columns we have in our dataset.
‣ The actual dataset has approximately 110 columns, representing either a
demographic feature (age, gender, location), or a psychographic/behavioral feature
(is a veteran, number of petitions signed on Causes, number of related FB likes).
For this example, we’ll only use the above features.
‣ Our response variable (the feature we’re predicting) is whether or not they signed
the actual petition.