Flock: Hybrid Crowd-Machine Learning Classifiers

Presented at CSCW 2015

Hybrid crowd-machine learning classifiers are classification models that start with a written description of a learning goal, use the crowd to suggest predictive features and label data, and then weigh these features using machine learning to produce models that are accurate and use human-understandable features. These hybrid classifiers enable fast prototyping of machine learning models that can improve on both algorithm performance and human judgment, and accomplish tasks where automated feature extraction is not yet feasible. Flock, an interactive machine learning platform, instantiates this approach.

Justin Cheng

March 17, 2015
Transcript

  1. We rely on predictions every day. Today’s weather is… If you liked…, you may also like… The hourly trending topics are… You may know these people… Is this email spam?
  2. It’s time-consuming to figure out which features work: orange, contains the word “cat”, # of page-views, # of likes, time between edits, head looking down, pastel colors, positive sentiment, punctuation, capitalization, repetition. Domingos, P. (CACM 2012)
  3. Flock: embedding crowds inside machine learning architectures. Works in domains where machines alone fail, allows for faster prototyping, is automatically self-improving (and is more accurate).
  4. Where we’re going: Flock, a crowd-machine classifier; generating, labeling, and evaluating features; evaluating human and machine performance.
  5. Flock: automating the learning process. Input: examples. Process: feature engineering — feature generation, example annotation, model evaluation. Output: model.
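The input → process → output loop on this slide can be sketched end to end. Everything below is invented for illustration (the questions, answers, and labels are not Flock's data), and plain logistic regression trained by SGD stands in for whatever learner the system actually uses; the point is that crowd-nominated questions become binary features and the machine only weighs them.

```python
# Hypothetical sketch of the Flock loop: crowd questions -> binary features
# -> learned weights. All data here is made up for illustration.
import math

# Crowd-nominated, human-readable feature questions.
questions = ["Is this article well-organized?",
             "Does this article have photos?",
             "Does this article have a strong introduction?"]

# Each row: crowd answers (1 = yes, 0 = no) plus the gold label.
examples = [([1, 1, 1], 1), ([1, 0, 1], 1), ([0, 1, 0], 0), ([0, 0, 0], 0)]

def train_logistic(data, epochs=200, lr=0.5):
    """Weigh the crowd-nominated features with logistic regression (SGD)."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = y - p
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

w, b = train_logistic(examples)
# The learned weights stay interpretable: one weight per crowd question.
for q, wi in sorted(zip(questions, w), key=lambda t: -t[1]):
    print(f"{wi:+.2f}  {q}")
```

Because each feature is a question a person answered, the fitted weights read directly as "how much this question matters," which is the interpretability argument the talk makes.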
  6. Why use people at all? Good at generating diverse ideas. Andre, P., et al. (CSCW 2014); Yu, L., et al. (CHI 2014)
  7. Why use people at all? Poor at aggregating information. Hammond, K. R., et al. (Psych. Rev. 1964); Dawes, R. (Am. Psych. 1971)
  8. People: good at generating diverse ideas, poor at aggregating information, can annotate arbitrary data. Machines: great at weighing multiple factors, limited in feature expressiveness, can only annotate certain types of data.
  11. Why not directly ask the crowd to do the prediction task? Because we can be 10% more accurate using Flock.
  12. “It makes me feel pleasant.” What do you think makes this Wikipedia article a “Good Article”?
  13. Analogical encoding: compare a “Good” article with a “Bad” article. Gentner, D., et al. (J. Ed. Psych. 2003)
  14. “Broken down into organized sections.” “Thorough and well-organized.” “First article was poorly organized.” “First article has more in-depth photos.” “One article offers more photos.” “There are insufficient images.” “More pictures and descriptions.” “More historically reliable references.” “First article offers more in-depth photos.” … Too many!
  15. “Broken down into organized sections.” “Thorough and well-organized.” “First article was poorly organized.” “First article has more in-depth photos.” “One article offers more photos.” “There are insufficient images.” → Cluster 1, Cluster 2, Cluster 3, …
  16. “Broken down into organized sections.” “Thorough and well-organized.” “First article was poorly organized.” “First article has more in-depth photos.” “One article offers more photos.” “There are insufficient images.” → Is this article well-organized? Does this article have photos? Are there insufficient images? …
  17. Feature generation: compare positive and negative examples, cluster similar suggestions, then generate a feature for each cluster.
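The clustering step above can be approximated very simply. Word-overlap (Jaccard) similarity with greedy assignment is a stand-in here, not the method Flock actually uses, and the threshold is an arbitrary choice; the suggestions are the ones quoted on the slides.

```python
# Sketch: group similar free-text suggestions so each cluster can become one
# feature question. Jaccard similarity + greedy assignment are assumptions.
def words(s):
    return set(s.lower().replace(".", "").split())

def cluster(suggestions, threshold=0.25):
    clusters = []  # each cluster is a list of suggestion strings
    for s in suggestions:
        best, best_sim = None, threshold
        for c in clusters:
            rep = words(c[0])  # compare against the cluster's first member
            sim = len(words(s) & rep) / len(words(s) | rep)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([s])
        else:
            best.append(s)
    return clusters

suggestions = [
    "Broken down into organized sections.",
    "Thorough and well-organized.",
    "First article was poorly organized.",
    "First article has more in-depth photos.",
    "One article offers more photos.",
    "There are insufficient images.",
]
for c in cluster(suggestions):
    print(c)
```

Each resulting cluster is then rewritten (by the crowd, in Flock) as a single yes/no question such as "Does this article have photos?".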
  18. A learning algorithm aggregates features. Feature generation produces a list of features; example annotation fills in a feature matrix; machine learning weighs the columns.
  19. Three possible learning algorithms: logistic regression (interpretable, scalable); decision trees (most interpretable, prone to over-fitting); random forests (least interpretable, slower, tends to be most accurate).
  20. [Decision tree diagram: root split on “Short paragraphs?”, with yes/no branches leading to further yes/no splits.]
  21. [Decision tree diagram with labeled splits — “Short paragraphs?”, “Strong intro?”, “Attractive images?”, “Sounds complicated?” — each branching yes/no.]
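The tree on these slides picks the most informative crowd question first. A minimal sketch of that choice, assuming invented binary features and labels, is to compute which single split (a decision "stump") has the highest information gain:

```python
# Sketch: pick the root split of a decision tree by information gain.
# Features and data are hypothetical.
import math

features = ["Short paragraphs?", "Strong intro?", "Attractive images?"]
# rows of (feature answers, label: 1 = good article)
data = [([0, 1, 1], 1), ([0, 1, 0], 1), ([1, 1, 1], 0),
        ([1, 0, 0], 0), ([0, 0, 1], 1), ([1, 0, 1], 0)]

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 0.0 if p in (0, 1) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(data, i):
    labels = [y for _, y in data]
    yes = [y for x, y in data if x[i] == 1]
    no = [y for x, y in data if x[i] == 0]
    split = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(data)
    return entropy(labels) - split

best = max(range(len(features)), key=lambda i: information_gain(data, i))
print("Root split:", features[best])
```

Applying the same rule recursively to each branch yields the full tree; because every split is a human-readable question, the tree stays interpretable.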
  22. Evaluation on six domains: Paintings, Hotel Reviews, Wikipedia, Jokes, and StackExchange (200 examples each), and Lying (400 examples).
  23. Metrics. Baselines — Guessing: the crowd is directly asked the prediction question; Automatic ML: training a classifier using the best features from prior work. Flock: training a classifier with crowd-nominated features. Flock + ML: training a classifier with crowd-nominated and machine features.
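The four conditions differ only in which feature set the classifier sees, and "Flock + ML" is simply the union of the crowd and machine feature sets. The feature names and values below are invented stand-ins:

```python
# Sketch: the evaluated conditions as feature-set choices.
# Feature names/values are hypothetical.
crowd_features = {"well_organized": 1, "has_photos": 1}             # from crowd questions
machine_features = {"bigram_count": 412, "avg_sentence_len": 17.3}  # from prior work

conditions = {
    "Flock": dict(crowd_features),
    "Automatic ML": dict(machine_features),
    "Flock + ML": {**crowd_features, **machine_features},  # union of both sets
}
for name, feats in conditions.items():
    print(name, "->", sorted(feats))
```

The same learner is then trained on each feature set, so any accuracy difference between conditions is attributable to the features rather than the algorithm.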
  24. [Chart: accuracy (0.5–0.9) of Guessing, Automatic ML, Flock, and Flock + ML on Lying, Jokes, StackExchange, Paintings, Reviews, and Wikipedia, plus the median.] Baseline performance is decent. Sen, S., et al. (CSCW 2015)
  25. [Same chart.] Flock improves on both humans and machines.
  26. [Same chart.] Flock + ML is 10% more accurate.
  27. What were the most predictive features? Paintings (Monet or Sisley?): Monets are more likely to have flowers. Hotel Reviews (truthful or deceptive?): truthful reviews have negative content. Wikipedia (good or bad article?): good articles have strong introductions. Jokes (popular or not?): popular jokes use repetition. StackExchange (answer selected?): selected answers are well-written. Lying (lie or truth?): a lying person has shifty eyes.
  28. Learning how the crowd labels features. Learned Features: 200 examples annotated by the crowd, then 800 examples annotated using a bigram model trained on the crowd annotations. Baseline: 1000 examples, all annotated by the crowd.
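The self-labeling idea on this slide can be sketched with a tiny Naive Bayes classifier over word uni/bigrams: train it on the crowd-annotated seed set, then let it answer the feature question for the remaining examples. The model choice, the feature question, and the texts below are all assumptions for illustration.

```python
# Sketch: learn to imitate crowd feature labels with a bigram model.
# Naive Bayes and all texts/labels here are hypothetical stand-ins.
import math
from collections import Counter

def ngrams(text):
    toks = text.lower().split()
    return toks + [" ".join(p) for p in zip(toks, toks[1:])]  # unigrams + bigrams

def train_nb(labeled):
    counts = {0: Counter(), 1: Counter()}
    totals = Counter(y for _, y in labeled)
    for text, y in labeled:
        counts[y].update(ngrams(text))
    return counts, totals

def predict(model, text):
    counts, totals = model
    vocab = len(set(counts[0]) | set(counts[1])) or 1
    scores = {}
    for y in (0, 1):
        n = sum(counts[y].values())
        scores[y] = math.log((totals[y] + 1) / (sum(totals.values()) + 2))
        for g in ngrams(text):
            scores[y] += math.log((counts[y][g] + 1) / (n + vocab))  # Laplace smoothing
    return max(scores, key=scores.get)

# Crowd-annotated seed set for one (hypothetical) feature question:
# "Does the answer include a code example?"
seed = [("here is a code example", 1), ("see the code example below", 1),
        ("no idea about this", 0), ("this question is unclear", 0)]
model = train_nb(seed)
# The model now annotates new examples in place of the crowd.
print(predict(model, "a code example helps"))
```

Once the model's labels are close enough to the crowd's, the remaining examples can be annotated for free, which is the cost saving the next slide quantifies.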
  29. [Chart: accuracy of the two annotation conditions, 0.74 and 0.78 — Learned Features (200 examples annotated by the crowd, 800 annotated using a bigram model trained on the crowd annotations) vs. Baseline (1000 examples, all annotated by the crowd).]