
Tech Talk at Animoto - ALL DEM MODELS (and R)

podopie
January 26, 2012


Some work I did outside of work, showcasing machine learning basics for both supervised and unsupervised learning.


Transcript

  1. Predicting Student Test Performance

    The data
    Mixed Linear Models
    Data Parsing/filtering
    Clustering and model optimization
    Overall performance
    Questions?
  2. The Data: “What do you know?”

    Problem: Not knowing what to study wastes time and focus.
    Question: How well can we predict areas of difficulty for students so they can study smarter?
    Data: 4 million samples, 93,100 to be predicted.
    Goal: Predict outcome % per question.

  4. Predicting Student Test Performance

    The data
    Mixed Linear Models
    Data Parsing/filtering
    Clustering and model optimization
    Overall performance
    Questions?
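  5. library(lme4)  # provides glmer()
     track_models = list()
     for (track in unique(training$track_name)) {
       print(sprintf("Starting model for track %s.", track))
       # Rasch model: a random intercept per user and per question.
       # The slide called lmer(..., family = binomial, REML = F); lme4 >= 1.0
       # uses glmer() for binomial models, and REML does not apply there.
       rasch = glmer(correct ~ 1 + (1 | user_id) + (1 | question_id),
                     data = training[training$track_name == track,
                                     c("correct", "user_id", "question_id")],
                     family = binomial)
       # Assumed step (the slide cuts off here): keep the fit for this track.
       track_models[[as.character(track)]] = rasch
     }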
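  6. library(lme4)  # provides glmer()
     track_models = list()
     for (track in unique(training$track_name)) {
       print(sprintf("Starting model for track %s.", track))
       # The slide called lmer(..., family = binomial, REML = F); lme4 >= 1.0
       # uses glmer() for binomial models, and REML does not apply there.
       rasch = glmer(correct ~ 1 + (1 | user_id) + (1 | question_id),
                     data = training[training$track_name == track,
                                     c("correct", "user_id", "question_id")],
                     family = binomial)
       # Assumed step (the slide cuts off here): keep the fit for this track.
       track_models[[as.character(track)]] = rasch
     }

    0/1 predictions for user and question
    Builds a model per track (0 - 8)
    Each row is independent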
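As a rough illustration of what the per-track model predicts: a formula of the form `correct ~ 1 + (1|user_id) + (1|question_id)` with a binomial family puts an overall intercept, a per-user effect and a per-question effect on the log-odds scale, and the predicted probability of a correct answer is the logistic transform of their sum. The sketch below uses base R only. All numbers and the names `intercept`, `user_effect`, `question_effect` and `p_correct` are hypothetical, chosen for illustration; they are not values from the talk.

```r
# Hypothetical fitted values on the log-odds scale (not from the talk's data).
intercept <- 0.3                               # overall log-odds of a correct answer
user_effect <- c(alice = 1.2, bob = -0.5)      # per-user random intercepts
question_effect <- c(q1 = -0.8, q2 = 0.4)      # per-question random intercepts

# Predicted probability that `user` answers `question` correctly:
# logistic(intercept + user effect + question effect).
p_correct <- function(user, question) {
  eta <- intercept + user_effect[[user]] + question_effect[[question]]
  plogis(eta)
}

p_alice_q1 <- p_correct("alice", "q1")  # stronger user, harder question
p_bob_q1 <- p_correct("bob", "q1")      # weaker user, same question
```

Under these assumed values, the stronger user gets a higher predicted probability on the same question, which is the kind of per-question outcome estimate the stated goal asks for.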

  8. Predicting Student Test Performance

    The data
    Mixed Linear Models
    Data Parsing/filtering
    Clustering and model optimization
    Overall performance
    Questions?
  9. Predicting Student Test Performance

    The data
    Mixed Linear Models
    Data Parsing/filtering
    Clustering and model optimization
    Overall performance
    Questions?
  10. Poor Man’s K-Means

    Data: Edwin Chen, “Quick Introduction to ggplot2”, blog.echen.me
    Poorly drawn circles: Thomson Nguyen, “Introduction to R (And Machine Learning)”, Jan 24 2012, Lookout
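  11. K-means Clustering

    m <- read.csv("gtagsall.csv", header = TRUE)
    # One (row, tag id) pair per cell of m, in row-major order.
    id <- cbind(rowid = as.vector(t(row(m))), colid = as.vector(t(m)))
    # Drop the pairs that come from empty (NA) cells.
    id <- id[complete.cases(id), ]
    # Indicator matrix: tag.matrix[i, j] is 1 when row i carries tag j.
    tag.matrix <- matrix(0, nrow = nrow(m), ncol = max(m, na.rm = TRUE))
    tag.matrix[id] <- 1
    # Within-cluster sum of squares (WSS) for a single cluster.
    # The slide used nrow(mydata), which is never defined; tag.matrix is meant.
    wss <- (nrow(tag.matrix) - 1) * sum(apply(tag.matrix, 2, var))
    # WSS for 2 to 281 clusters.
    for (i in 2:281) {
      wss[i] <- sum(kmeans(tag.matrix, centers = i)$withinss)
    }
    plot(1:281, wss, type = "b", xlab = "Clusters", ylab = "WSS")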
  12. Predicting Student Test Performance

    The data
    Mixed Linear Models
    Data Parsing/filtering
    Clustering and model optimization
    Overall performance
    Questions?
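The WSS loop on the K-means slide is the usual "elbow" search: fit k-means for a range of cluster counts and look for the point where the within-cluster sum of squares stops dropping quickly. A minimal sketch of the same idea on simulated data, assuming three well-separated 2-D blobs in place of the talk's tag matrix, might look like this. The names `pts`, `max_k` and `wss_drop`, the simulated data, and the `nstart = 10` setting are assumptions for illustration only.

```r
# Simulated stand-in data: three 2-D blobs of 50 points each
# (hypothetical; not the talk's tag matrix).
set.seed(42)
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)

max_k <- 8
wss <- numeric(max_k)

# k = 1: the WSS equals the total sum of squares around the column means.
wss[1] <- (nrow(pts) - 1) * sum(apply(pts, 2, var))

# k = 2..max_k: total within-cluster sum of squares from kmeans();
# nstart = 10 restarts each fit to reduce the effect of a bad random start.
for (k in 2:max_k) {
  wss[k] <- kmeans(pts, centers = k, nstart = 10)$tot.withinss
}

# How much WSS falls when one more cluster is added.
wss_drop <- -diff(wss)
```

Plotting `wss` against `1:max_k` with `type = "b"`, as the slide does, should show the curve flattening after the number of clusters that matches the structure in the data.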