
NEXT: Crowdsourcing, machine learning and cartoons

Scott Sievert

January 09, 2018

Transcript

  1. NEXT: Crowdsourcing, machine learning and cartoons Scott Sievert UW–Madison ECE

    @stsievert + #pydata = https://speakerdeck.com/stsievert/next-crowdsourcing-machine-learning-and-cartoons https://tinyurl.com/next-pydata
  2. NEXT enables research by producing better results for the same time or money

    NEXT is implemented in pure Python
  3. Goal: adapt to previously collected responses

    Existing crowdsourcing systems are passive. One solution: adapting to previous responses finds the best questions.
  4. Kevin Jamieson, Prof. Rob Nowak, Lalit Jain, Daniel Ross (nextml.org)

    Homepage: http://nextml.org
    Source: https://github.com/nextml/NEXT
    Documentation: https://github.com/nextml/NEXT/wiki
  5. NEXT users

    Users span theory to practice: ML researchers, experimentalists and practitioners. UW Psychology uses NEXT to find the best algorithms for adaptive data collection in cognitive science. The New Yorker uses NEXT to crowdsource the weekly cartoon caption contest. Air Force Research Lab uses NEXT for active image classification.
  6. Example problem

    Bob Mankoff at The New Yorker has to find the funniest caption from ~5,000 captions (comic by P. C. Vey).
  7. Dashboard

    Panels: Experiment Info, histogram of responses, histogram of the times responses were received. Data from contests: https://github.com/nextml/caption-contest-data
  8. Experimentalist Benefits

    Ratings: 0 (not funny), 1 (somewhat funny), 2 (funny). BR-lilUCB vs. Random: papers + software enhancements mean 4x fewer ratings are needed!
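The gain from adaptive sampling on slide 8 comes from spending more ratings on the captions that might be the funniest. The sketch below illustrates the idea with a plain UCB1-style rule on simulated 0/1/2 ratings. It is not NEXT's BR-lilUCB; the caption probabilities, the `scale` parameter and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def ucb_select(counts: Sequence[int], sums: Sequence[float], t: int, scale: float = 2.0) -> int:
    """Pick the caption with the largest upper confidence bound on its mean rating.

    counts[i]: ratings caption i has received so far
    sums[i]:   total of those ratings (each rating is 0, 1 or 2)
    scale:     width of the rating range (2 for a 0-2 scale); scales the bonus
    """
    # Show every caption once before comparing estimates.
    for i, c in enumerate(counts):
        if c == 0:
            return i

    def upper_bound(i: int) -> float:
        mean = sums[i] / counts[i]
        bonus = scale * math.sqrt(2 * math.log(t) / counts[i])
        return mean + bonus

    return max(range(len(counts)), key=upper_bound)


def run(true_probs: Sequence[Tuple[float, float, float]], n_ratings: int,
        seed: int = 0) -> Tuple[List[int], List[int]]:
    """Simulate adaptive rating collection.

    true_probs[i] = (p0, p1, p2): hypothetical chance that caption i is rated
    0 (not funny), 1 (somewhat funny) or 2 (funny).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    counts, sums = [0] * k, [0] * k
    for t in range(1, n_ratings + 1):
        i = ucb_select(counts, sums, t)
        rating = rng.choices([0, 1, 2], weights=true_probs[i])[0]
        counts[i] += 1
        sums[i] += rating
    return counts, sums


# Nine weak captions and one strong one (index 9); all numbers are made up.
captions = [(0.8, 0.15, 0.05)] * 9 + [(0.2, 0.3, 0.5)]
counts, sums = run(captions, n_ratings=2000)
```

Uniform random sampling would give each of the ten captions about 200 of the 2,000 ratings; the adaptive rule instead concentrates most of them on caption 9, which is what makes the winner's estimate sharper for the same budget.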
  9. Goal: enable this feedback loop

    Crowdsourcing provides real-world data and problems seen; adaptive sampling algorithms need fewer responses. Enabling this feedback loop requires software that is useful and easy to use for both parties: NEXT.
  10. Default uses: by default, NEXT has adaptive algorithms for the 3 default question types

    • Cardinal bandits (caption contest, comic by P. C. Vey)
    • Dueling bandits: "Select the street that looks safer"
    • Pool-based triplets: "Select the expression on the bottom most similar to the face on top"
    NEXT also has a REST API.
  11. Algorithm developer and mathematician

    $\Pr\left(\|y - \hat{y}\|_2^2 < \epsilon\right) \approx 1$
  12. Launching NEXT via Amazon EC2 AMI

    See https://github.com/nextml/NEXT/wiki for details and more launching options (more detail in the SciPy 2017 proceedings and in the docs).
  13. Algorithm developer and mathematician

    $\Pr\left(\|y - \hat{y}\|_2^2 < \epsilon\right) \approx 1$
  14. Algorithm design decisions (more detail in the SciPy 2017 proceedings and in the docs)

    0. Use a high-level language (Python)
    1. Treat algorithms as black boxes
       • For each function, inputs and outputs are documented and type-checked
    2. Use a wrapper to allow easy access to experiment information and background jobs
    3. Objects are abstracted to integers
       • There is a mapping from integers to complete object details
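The design decisions on slide 14 can be sketched as a small Python class. This is a minimal illustration under stated assumptions: `getQuery` is the name used on slide 24, while `initExp`, `processAnswer`, `getModel`, the `ExperimentWrapper` class and the `targets` mapping are hypothetical stand-ins, not NEXT's actual API.

```python
import random
from typing import Any, Dict, List


class ExperimentWrapper:
    """Hypothetical stand-in for the wrapper (decision 2): it holds shared
    experiment state so the algorithm never talks to a database directly."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}


class RandomPairs:
    """Black-box algorithm (decision 1) that only sees integer target ids
    (decision 3). Method names other than getQuery are assumed."""

    def initExp(self, wrapper: ExperimentWrapper, n: int) -> bool:
        # Type-check the documented input before using it.
        if not isinstance(n, int) or n < 2:
            raise TypeError("n must be an int >= 2")
        wrapper.state["n"] = n
        wrapper.state["wins"] = [0] * n
        return True

    def getQuery(self, wrapper: ExperimentWrapper, rng: random.Random) -> List[int]:
        # Passive baseline: two distinct target ids chosen uniformly at random.
        return rng.sample(range(wrapper.state["n"]), 2)

    def processAnswer(self, wrapper: ExperimentWrapper, winner: int) -> bool:
        wrapper.state["wins"][winner] += 1
        return True

    def getModel(self, wrapper: ExperimentWrapper) -> List[int]:
        # Rank target ids by number of wins, most wins first.
        wins = wrapper.state["wins"]
        return sorted(range(len(wins)), key=lambda i: -wins[i])


# Decision 3: integer ids map to full object details kept outside the algorithm.
targets = {0: {"file": "street_0.jpg"}, 1: {"file": "street_1.jpg"}, 2: {"file": "street_2.jpg"}}

wrapper, alg = ExperimentWrapper(), RandomPairs()
alg.initExp(wrapper, len(targets))
rng = random.Random(1)
for _ in range(300):
    pair = alg.getQuery(wrapper, rng)
    alg.processAnswer(wrapper, min(pair))  # simulated answer: the lower id always wins
ranking = [targets[i]["file"] for i in alg.getModel(wrapper)]
```

Because the algorithm only ever handles integer ids, the same class would work unchanged whether the targets are street photos, faces or captions.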
  15. Use cases enabled by this implementation

    • Street Score: "Select the street that looks safer"
    • Psychology triplets: "Select the expression on the bottom most similar to the face on top"
    • Image search
  16. Psychology triplets

    Problem: use humans to generate a "similarity" map of facial emotions. This is the motivation for NEXT.
  17. People

    Prof. Tim Rogers (Psychology), April Murphy (Psychology), Kevin Jamieson (ECE), Prof. Robert Nowak (ECE), Lalit Jain (Math)
  18. Psychology triplets

    Best question for a similarity map: "Select the expression on the bottom most similar to the face on top"
  19. Psychology triplets

    The number of questions scales poorly: finding the best questions gets more difficult (≈ n³).
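To see where the ≈ n³ on slide 19 comes from, assume a triplet question shows one head item and an unordered pair of two other items. Then the number of distinct questions for $n$ items is $n\binom{n-1}{2} = \frac{n(n-1)(n-2)}{2} \approx \frac{n^3}{2}$. A quick check in Python (the counting convention is our assumption):

```python
from itertools import combinations
from math import comb


def num_triplet_questions(n: int) -> int:
    """Questions of the form (head, {a, b}) over n items: n * C(n - 1, 2)."""
    return n * comb(n - 1, 2)


def brute_force_count(n: int) -> int:
    """Enumerate every (head, {a, b}) question directly, for small n."""
    return sum(
        1
        for head in range(n)
        for _ in combinations([i for i in range(n) if i != head], 2)
    )


for n in (10, 100, 1000):
    q = num_triplet_questions(n)
    print(n, q, round(q / n**3, 3))
# 10 360 0.36
# 100 485100 0.485
# 1000 498501000 0.499
```

Going from 100 to 1,000 faces multiplies the number of possible questions by roughly 1,000, which is why choosing the most informative ones matters.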
  20. Street score

    Goal: find humans' perception of the safest streets in Chicago
  21. Street score

    Question: pairwise comparisons using Google Street View images ("Select the street that looks safer")
  22. Street score

    Issues: • Clustered ranking (safe, somewhat safe, unsafe)
    Advantages: • Similar to sorting
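One simple way to turn "Select the street that looks safer" answers into the clustered ranking of slide 22 is to score each street by the fraction of its comparisons it won and then threshold the scores. The sketch below assumes that approach, with hypothetical cut-offs at 1/3 and 2/3; it is not the method from the AISTATS 2018 paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def win_fractions(comparisons: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Score each street by the fraction of its comparisons it won.

    comparisons: (winner, loser) pairs, where the winner is the street a
    participant selected as looking safer.
    """
    wins: Dict[int, int] = defaultdict(int)
    games: Dict[int, int] = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {street: wins[street] / games[street] for street in games}


def cluster(scores: Dict[int, float], low: float = 1 / 3, high: float = 2 / 3) -> Dict[int, str]:
    """Split scores into three clusters using hypothetical cut-offs."""
    labels = {}
    for street, score in scores.items():
        if score >= high:
            labels[street] = "safe"
        elif score > low:
            labels[street] = "somewhat safe"
        else:
            labels[street] = "unsafe"
    return labels


# Street 0 beats both others; street 1 beats street 2.
labels = cluster(win_fractions([(0, 1), (0, 2), (1, 2)]))
```

Only the cluster each street lands in matters here, not its exact position, which is why this task resembles a coarse sort.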
  23. Paper (AISTATS 2018)

    Sumeet Katariya (ECE), Nandana Sengupta (Sociology), Prof. James Evans (Sociology), Prof. Robert Nowak (ECE), Lalit Jain (Math)
  24. Layman’s adaptive algorithm

    getQuery: choose the streets whose cluster we are most uncertain about
    [Figure: score vs. street picture index, with a boundary between safe and unsafe]
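The getQuery rule on slide 24 can be sketched as uncertainty sampling: ask about the streets whose estimated score is closest to the safe/unsafe boundary, measured relative to how much data each street has. The confidence width `1 / sqrt(count + 1)`, the boundary of 0.5 and the function name `get_query` are assumptions for illustration.

```python
import math
from typing import List, Sequence


def get_query(scores: Sequence[float], counts: Sequence[int],
              boundary: float = 0.5, k: int = 2) -> List[int]:
    """Return the k streets whose cluster (safe vs. unsafe) is most uncertain.

    scores[i]: current estimate of street i's safety score (e.g. win fraction)
    counts[i]: number of responses involving street i
    The margin to the boundary is divided by a hypothetical confidence width
    1 / sqrt(count + 1), so streets with little data look more uncertain.
    """
    def margin(i: int) -> float:
        width = 1 / math.sqrt(counts[i] + 1)
        return abs(scores[i] - boundary) / width

    return sorted(range(len(scores)), key=margin)[:k]


print(get_query([0.9, 0.52, 0.1, 0.45], [10, 10, 10, 10]))  # [1, 3]
print(get_query([0.9, 0.52], [0, 1000], k=1))              # [0]
```

The second call shows why the width matters: a street with no responses is queried even though its current estimate looks far from the boundary.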
  25. Results

    No gains from active learning! Why? Crowdsourcing responses are too noisy.
    [Figure: error (fraction of inverted pairs) vs. number of responses for passive and adaptive sampling; actual vs. predicted clusters]
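The error metric on slide 25, the fraction of inverted pairs, counts how many pairs of streets the predicted scores order differently from the actual scores. A minimal sketch, assuming ties count as not inverted:

```python
from itertools import combinations
from typing import Sequence


def fraction_inverted_pairs(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of item pairs ordered differently by `predicted` than by `actual`.

    A pair tied under either score is counted as not inverted (an assumption).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(combinations(range(len(actual)), 2))
    if not pairs:
        return 0.0
    inverted = sum(
        1 for i, j in pairs
        if (actual[i] - actual[j]) * (predicted[i] - predicted[j]) < 0
    )
    return inverted / len(pairs)


print(fraction_inverted_pairs([1, 2, 3, 4], [2, 1, 3, 4]))  # one of six pairs swapped
```

A perfect prediction scores 0 and a fully reversed ranking scores 1, so lower is better on the plot's error axis.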
  26. But if we dig a little deeper with simulations…

    [Figure: actual vs. predicted safe/unsafe labels for adaptive and passive sampling]
  27. Image search problems

    • Large dataset from Zappos
    • 50,000 shoes with 1,000 features each
    [f(shoe, <image>, <image>) for shoe in <all shoes>] ↑ entire dataset
  28. Image search solution

    Exhaustively evaluate shoes ↓ selectively evaluate shoes
    [f(shoe, <image>, <image>) for shoe in <a few shoes>]
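Slides 27 and 28 contrast evaluating a score `f` on every shoe with evaluating it on only a few. The sketch below counts evaluations for both. The scoring function is hypothetical (the two images are folded into it), and uniform subsampling is only a stand-in for an adaptive selection rule such as the methods compared on slide 29.

```python
import random
from typing import Callable, Sequence


class CountingScore:
    """Wraps a scoring function f(shoe) and counts how often it is evaluated."""

    def __init__(self, f: Callable[[int], float]) -> None:
        self.f = f
        self.calls = 0

    def __call__(self, shoe: int) -> float:
        self.calls += 1
        return self.f(shoe)


def exhaustive_best(f: Callable[[int], float], shoes: Sequence[int]) -> int:
    # Mirrors [f(shoe, ...) for shoe in entire dataset]: one call per shoe.
    scores = [f(shoe) for shoe in shoes]
    return shoes[max(range(len(shoes)), key=scores.__getitem__)]


def selective_best(f: Callable[[int], float], shoes: Sequence[int],
                   budget: int, rng: random.Random) -> int:
    # Evaluate only `budget` shoes; uniform subsampling stands in for a smarter rule.
    candidates = rng.sample(list(shoes), budget)
    return max(candidates, key=f)


shoes = list(range(50_000))  # integer ids for the 50,000 shoes

full = CountingScore(lambda shoe: -abs(shoe - 42))  # hypothetical score: closer to shoe 42 is better
best_full = exhaustive_best(full, shoes)

cheap = CountingScore(lambda shoe: -abs(shoe - 42))
best_cheap = selective_best(cheap, shoes, budget=500, rng=random.Random(0))
```

The exhaustive search is exact but costs 50,000 evaluations; the selective one costs 500 and may miss the best shoe, which is the gap an adaptive method tries to close by choosing which shoes to evaluate from earlier answers.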
  29. Number of answers for red boots

    [Figure: cumulative rewards vs. time; methods compared: OFUL, Lazy LTS, OFUL Light, QOFUL, NN, "Their method"]
  30. Software improvements

    • Internal redesign
    • {Experimentalist, developer} ease of use
    • Fast in-memory database support with Redis
    • Documentation improvements
    • Many bug fixes
  31. Key messages

    1. Adaptive sampling reduces data collection cost.
    2. NEXT is a crowdsourcing data collection tool that can use adaptive sampling techniques.
    3. NEXT is easy* for experimentalists, algorithm developers and practitioners to use, and a mathematical background is not required.
    4. NEXT developers use experimentalist engagement to aid research and to gain feedback to improve the software.
    * NEXT has been created by an academic research group in collaboration with psychologists.
  32. Algorithm inputs and outputs

    • Documented exactly in apps/[app-id]/algs/Algs.yaml
    • Function implementation
    Depends on a library we developed: https://github.com/daniel3735928559/pijemont
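Slide 32 says each algorithm function's inputs and outputs are documented and type-checked. The sketch below shows the general idea with a hand-written spec dict; the spec format, the argument names and the `check_args` helper are hypothetical and are not the `Algs.yaml` schema or the pijemont API.

```python
from typing import Any, Dict, Mapping

# Hypothetical spec for one function's inputs (NOT the real Algs.yaml format).
PROCESS_ANSWER_SPEC: Dict[str, type] = {
    "target_winner": int,
    "response_time": float,
}


def check_args(spec: Mapping[str, type], args: Mapping[str, Any]) -> Mapping[str, Any]:
    """Reject missing, unexpected or wrongly typed arguments before the algorithm runs."""
    missing = set(spec) - set(args)
    unexpected = set(args) - set(spec)
    if missing or unexpected:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(unexpected)}")
    for name, expected in spec.items():
        value = args[name]
        if not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return args
```

Checking arguments at the boundary lets the algorithm body stay a black box: it can assume its inputs already match the documentation.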
  33. Requirements (more detail in the documentation: https://github.com/nextml/NEXT/wiki)

    0. Web browser
    1. Amazon AWS account
    2. ZIP of targets (e.g., images)
    3. Experiment description (which has good documentation!)
    Result: after the NEXT link is sent to a crowdsourcing service, results can be generated!
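Requirement 2 on slide 33 is a ZIP of targets. A minimal sketch using Python's standard `zipfile` module, assuming a flat archive of image files is acceptable; the layout NEXT actually expects is described in its documentation. The function name `zip_targets` and the accepted suffixes are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, List, Union


def zip_targets(image_dir: Union[str, Path], zip_path: Union[str, Path],
                suffixes: Iterable[str] = (".jpg", ".jpeg", ".png")) -> List[str]:
    """Write every image in image_dir into a flat ZIP and return the archived names.

    A flat archive of image files is an assumption, not NEXT's documented layout.
    """
    wanted = {s.lower() for s in suffixes}
    files = sorted(p for p in Path(image_dir).iterdir()
                   if p.is_file() and p.suffix.lower() in wanted)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)  # store by file name only, no directories
    return [p.name for p in files]


# Example: a throwaway folder with two images and a stray text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.jpg", "notes.txt"):
        (Path(tmp) / name).write_bytes(b"placeholder bytes")
    packed = zip_targets(tmp, Path(tmp) / "targets.zip")
    with zipfile.ZipFile(Path(tmp) / "targets.zip") as zf:
        archived = zf.namelist()
```

The stray `notes.txt` is skipped, so only the images end up in the archive that would be uploaded as targets.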
  34. Dashboard

    Panels: Experiment Info, histogram of responses, histogram of the times responses were received. Data from contests: https://github.com/nextml/caption-contest-data
    tinyurl.com/scipy-next