Human Powered Rails: Automated Crowdsourcing In Your Rails App

RailsConf 2018 talk on integrating Amazon Mechanical Turk into a Ruby on Rails application

Andy Glass

April 18, 2018

Transcript

  1. Andy Glass | RailsConf 2018 | Human Powered Rails

    This is not a talk about Mechanical Turk
  2. This is not a talk about Mechanical Turk (yet)
  3. This is not a talk about Crowdsourcing
  4. This is a talk about being an imposter
  5. Imposter Syndrome

    A concept describing individuals who are marked by an inability to internalize their accomplishments and have a persistent fear of being exposed as a "fraud".
    Pauline R. Clance and Suzanne A. Imes, 1978
  6. We’re in good company…

    Maya Angelou, Tom Hanks, Neil Gaiman, Sheryl Sandberg, Sonia Sotomayor
  7. Why Imposter Syndrome is Good for you

    Shane Ferro, Huffington Post, January 15, 2016
  8. If you are interested in personal growth and development, by definition you are always going to be pushing yourself into something which is new. And when things are new, of course we don’t feel as comfortable in our skin as when we are doing something which is deeply familiar to us, and which we’ve been doing for five or 10 years.

    Caroline Webb
  9. This is not a talk about Mechanical Turk
  10. This is a talk about learning to fake shit
  11. Amazon Mechanical Turk is about faking shit
  12. Amazon Mechanical Turk is about faking shit (ok, now let's talk about Turk)
  13. How did Mechanical Turk Start?

    Internal Amazon tool for categorizing products & identifying duplicates
    Publicly launched as an AWS product in 2005
    Marketplace for online micro-jobs
  14. I told you it was about faking shit…
  15. What can we use mTurk for?

    • Image/Video Processing
    • Data verification & Clean up
    • Information Gathering
    • Data processing
  16. ~1,500: Groups of HITs (Questions)

    ~300,000: HITs (Assignments)
    ~15,000: Most # of HITs In A Single HIT Group
    $0.01: Lowest HIT Reward
    $150: Highest HIT Reward (2+ Hours Audio Transcription)
  17. My requirements for this talk

    ✓ Not simple for Artificial Intelligence*
    ✓ Social Media Content
    ✓ Pittsburgh-centric
    ✓ Make attendees excited for dinner/happy hour
    *Yes- AI can do a lot
  18. Does this sandwich have french fries in it?
  19. Manual mTurk process

    ✓ Assemble sample data (IG Posts)
    ✓ Create new mTurk project
    ✓ Load mTurk batch
    ✓ Review Results
  20. 4 seconds: Min time

    702 seconds: Max time
    42 seconds: Median time
    239 seconds: Avg time
  21. 1: Min assignments/worker

    10: Max assignments/worker
    3: Avg assignments/worker
  22. 100%: Consensus (both workers agreed)

    95.2%: Accuracy (20/21 sandwiches)
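The two figures on this slide can be computed from raw answers along the following lines. This is a minimal sketch in plain Ruby; the `ANSWERS` and `TRUTH` samples and the helper names are made up for illustration and are not the talk's data or code.

```ruby
# Hypothetical sample: two worker answers per post plus a hand-checked answer.
# None of these values come from the talk's actual batch.
ANSWERS = {
  "post_a" => %w[yes yes],
  "post_b" => %w[no no],
  "post_c" => %w[yes no],
  "post_d" => %w[no no]
}.freeze

TRUTH = {
  "post_a" => "yes", "post_b" => "no", "post_c" => "yes", "post_d" => "yes"
}.freeze

# Share of items on which every worker gave the same answer.
def consensus_rate(answers)
  agreed = answers.count { |_id, votes| votes.uniq.size == 1 }
  agreed.to_f / answers.size
end

# Share of unanimous items whose agreed answer matches the known truth.
def accuracy(answers, truth)
  unanimous = answers.select { |_id, votes| votes.uniq.size == 1 }
  correct = unanimous.count { |id, votes| votes.first == truth[id] }
  correct.to_f / unanimous.size
end

# Formats a ratio as a percentage with one decimal place.
def percent(ratio)
  (ratio * 100).round(1)
end

puts "consensus: #{percent(consensus_rate(ANSWERS))}%"
puts "accuracy:  #{percent(accuracy(ANSWERS, TRUTH))}%"
```

With the slide's own numbers, `percent(20.0 / 21)` gives 95.2.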
  23. There are tips + tricks to getting accurate results
  24. Automation

    ✓ Create mTurk RoR service
    ✓ Create processes for loading tasks
    ✓ Create processes for approving results/re-inputting tasks
    ✓ Serve results via API
  25. Gems I used

    Turkee by Jim Jones (aantix)
    Built on top of RTurk; DB Models; Easily converts forms; Launch/Import tasks
    https://github.com/aantix/turkee

    on top of:

    RTurk by Ryan Tate (ryantate)
    A simple wrapper and library for Amazon's Mechanical Turk
    https://github.com/ryantate/rturk
  26. Batch: name, title, description, instructions, output_field_name, output_field_opts (hash)

    BatchItem: batch_id, post_id, reward_in_cents, status, result, confidence
    BatchItemResult: batch_item_id, result (hash)
  27. batch = Batch.create(
        name: "Sandwich Wizard",
        title: "Look at a picture of a delicious sandwich and determine…",
        description: "You will be provided with an image of a sandwich. Please analyze…",
        instructions: "<p>Please look at the image of a sandwich provided and…",
        output_field_name: "category",
        output_field_opts: {
          yes: "Yes- it has french fries inside",
          no: "No- it does not have french fries inside"
        }
      )

      post_ids = ['BJ8ohZEAjWQ', 'BOn4nJvFAf7', 'BQirlsNFAd5', …]
      post_ids.each do |post_id|
        batch.batch_items.create(post_id: post_id, reward_in_cents: 15)
      end
  28. rails g turkee

    1. Creates config file (fill with AWS credentials)
    2. Creates 2 DB models
  29. Batch: name, title, description, instructions, output_field_name, output_field_opts (hash)

    BatchItem: batch_id, post_id, reward_in_cents, status, result, confidence
    BatchItemResult: batch_item_id, result (hash)
    Turkee::TurkeeImportedAssignment: turkee_task_id, assignment_id (from mturk), worker_id (from mturk), result_id
    Turkee::TurkeeTask: hit_url (from mturk), task_type, hit_id, complete (boolean), number_completed_assignments (integer)
  30. host = "https://sandwichspotter.herokuapp.com"
      hit_title = batch.title
      hit_description = batch.description
      model_name = :BatchItemResult
      num_assignments = 2
      reward = batch_item.reward_in_cents / 100.0  # dollars
      hit_lifetime = 5  # days
      duration = 1  # hours
      qualifications = { approval_rate: { gt: 95 } }
      params = {}
      opts = { form_url: "https://sandwichspotter.herokuapp.com/batches/#{batch.id}/batch_items/#{batch_item.id}/batch_item_results/new" }
  35. Turkee::TurkeeTask.create_hit(host, hit_title, hit_description, model_name,
                                    num_assignments, reward, hit_lifetime, duration,
                                    qualifications, params, opts)
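One way to keep the eleven positional arguments in order is to build them in a single helper. The sketch below is plain Ruby and does not call Turkee. The `Batch` and `BatchItem` Structs are stand-ins for the app's models, and `create_hit_arguments` is a hypothetical helper name, not part of the gem.

```ruby
# Stand-ins for the app's ActiveRecord models (assumed shapes, plain Structs here).
Batch = Struct.new(:id, :title, :description)
BatchItem = Struct.new(:id, :batch, :reward_in_cents)

HOST = "https://sandwichspotter.herokuapp.com"

# Hypothetical helper: returns the positional arguments in the order the
# slides pass them to Turkee::TurkeeTask.create_hit.
def create_hit_arguments(batch_item)
  batch = batch_item.batch
  form_url = "#{HOST}/batches/#{batch.id}/batch_items/#{batch_item.id}/batch_item_results/new"
  [
    HOST,
    batch.title,
    batch.description,
    :BatchItemResult,                    # model_name
    2,                                   # num_assignments
    batch_item.reward_in_cents / 100.0,  # reward, in dollars
    5,                                   # hit_lifetime, in days
    1,                                   # duration, in hours
    { approval_rate: { gt: 95 } },       # qualifications
    {},                                  # params
    { form_url: form_url }               # opts
  ]
end

batch = Batch.new(7, "Sandwich Wizard", "Spot the fries")
item = BatchItem.new(42, batch, 15)
args = create_hit_arguments(item)
puts args.inspect
```

Inside the Rails app the result could then be splatted into the call from the slide: `Turkee::TurkeeTask.create_hit(*create_hit_arguments(batch_item))`.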
  36. batch_item_results/new.html.erb

    <h1><%= @batch.title %></h1>
    <h3><%= @batch.description %></h3>
    <h3><%= @batch.instructions.html_safe %></h3>
    <%= turkee_form_for(@batch_item_result, params) do |f| %>
      <% @batch.output_field_opts.each do |label, instruction| %>
        <p><%= instruction %></p>
        <input name="result[<%= @batch.output_field_name %>]" type="radio" value="<%= label %>" />
      <% end %>
      <%= f.submit "submit" %>
    <% end %>
  37. batch_item_results/new.html.erb: what turkee_form_for does

    - Sets form url to https://www.mturk.com/mturk/externalSubmit
    - Adds hidden fields for capture (worker id, assignment ids, etc)
    Lets your form mimic a regular Rails form
  38. Importing result data

    Turkee::TurkeeTask.process_hits
    ✓ Creates Turkee::TurkeeImportedAssignment records
    ✓ Creates BatchItemResult records
  39. Importing result data

    #<Turkee::TurkeeImportedAssignment id: 111, assignment_id: "36U2A8VAG27ZEFQZ3NAC8G6ZFE9YKF", turkee_task_id: 6288, worker_id: "A2ARFO21LWE88K", result_id: 32>
    #<BatchItemResult id: 32, batch_item: 2, result: {"category"=>"No"}>
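Before the answers can be compared, the imported result hashes need to be flattened into one array per batch item. The following is a rough sketch of that step in plain Ruby. `ResultRow` is a hypothetical stand-in for a BatchItemResult record, and the case normalisation is an assumption added here because the sample above stores "No" while the form options are keyed `yes`/`no`.

```ruby
# Hypothetical stand-in for a BatchItemResult row; only `result` matters here.
ResultRow = Struct.new(:batch_item_id, :result)

# Flattens the per-assignment result hashes into one array of answers
# for the given output field, normalising case and whitespace.
def bundle_results(rows, field_name)
  rows.map { |row| row.result[field_name] }
      .compact
      .map { |answer| answer.to_s.strip.downcase }
end

rows = [
  ResultRow.new(2, { "category" => "No" }),
  ResultRow.new(2, { "category" => " no" }),
  ResultRow.new(2, {}) # a submission with no answer for this field
]
p bundle_results(rows, "category")
```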
  40. Batch: name, title, description, instructions, output_field_name, output_field_opts (hash)

    BatchItem: batch_id, post_id, reward_in_cents, status, result, confidence
    BatchItemResult: batch_item_id, result (hash)
    Turkee::TurkeeImportedAssignment: turkee_task_id, assignment_id (from mturk), worker_id (from mturk), result_id
    Turkee::TurkeeTask: hit_url (from mturk), task_type, hit_id, complete (boolean), number_completed_assignments (integer)
  42. We still need to process and confirm the data
  43. Batch Item (with results completed) → Adjudicator

    Approved: Batch Item Complete
    Not approved: Reprocess in Turk
  44. def process
        adjudicator = Adjudicator.new(self)
        if adjudicator.approved?
          update_attributes(status: "complete", result: adjudicator.result, confidence: adjudicator.confidence)
        else
          queue_for_reprocessing
        end
      end
  45. queue_for_reprocessing: Send back to Turk for two more opinions
  46. class Adjudicator
        # bundle_results returns a flat array of the results accumulated from
        # the BatchItem's BatchItemResult records, e.g. ['yes', 'no']
        def initialize(batch_item)
          @results_array = batch_item.bundle_results
          @best_response_with_frequency = best_response_with_frequency
        end

        # "high" only when every worker gave the winning response
        def confidence
          @best_response_with_frequency[1] == @results_array.count ? "high" : "low"
        end

        # Approved when the winning response holds a strict majority
        def approved?
          @best_response_with_frequency[1].to_f / @results_array.count > 0.5
        end

        def result
          @best_response_with_frequency[0] if approved?
        end

        private

        # to_histogram is assumed to return a { response => count } hash
        def best_response_with_frequency
          @results_array.to_histogram.sort_by { |_resp, freq| -freq }.first
        end
      end
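To see how majority voting and the "two more opinions" loop interact, here is a self-contained sketch in plain Ruby. It is not the talk's code: `majority` and `adjudicate` are hypothetical names, the answers are invented, and `Array#tally` assumes Ruby 2.7 or newer.

```ruby
# Returns [winning_answer, share_of_votes] for a flat array of answers.
def majority(answers)
  answer, count = answers.tally.max_by { |_answer, votes| votes }
  [answer, count.to_f / answers.size]
end

# Hypothetical driver: `rounds` lists answer pairs in the order they would
# arrive from Turk. Stops as soon as one answer holds a strict majority.
def adjudicate(rounds, threshold: 0.5)
  answers = []
  rounds.each do |pair|
    answers.concat(pair)
    answer, share = majority(answers)
    next unless share > threshold

    confidence = share == 1.0 ? "high" : "low"
    return { result: answer, confidence: confidence, votes: answers.size }
  end
  # No answer ever reached a majority: the item stays unresolved.
  { result: nil, confidence: nil, votes: answers.size }
end

p adjudicate([%w[yes yes]])             # both workers agree on the first pass
p adjudicate([%w[yes no], %w[yes yes]]) # a split, then two more opinions
```

With two workers a 1-1 split is exactly 50%, which does not clear a strict-majority threshold, so a second round is needed; after it the item resolves with "low" confidence because the vote was not unanimous.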
  48. task :import_and_process do
        Turkee::TurkeeTask.process_hits
        batch_items_to_process = BatchItem.incomplete.all_results_in
        batch_items_to_process.each(&:process)
      end
  49. Basic Use Case (sandwich spotter)

    Extensible (any use case)
  50. How could we extend the app?

    ✓ Multiple batch items in a TurkeeTask
    ✓ Complex reprocessing flow
    ✓ Multiple inputs/outputs for a batch item
    ✓ Different output types
  51. How should we extend the app?

    ✓ Multiple batch items in a TurkeeTask
    ✓ Complex reprocessing flow
    ✓ Multiple inputs/outputs for a batch item
    ✓ Different output types
  52. Multiple Inputs/Outputs

    BatchItem: batch_id, post_id, reward_in_cents, status, result, confidence
    Batch: name, title, description, instructions, output_field_name, output_field_opts (json)
  54. BatchItem: batch_id, post_id, reward_in_cents, status, result, confidence, input_data (json), results (json), confidence_levels (json)

    Batch: name, title, description, instructions, output_field_name, output_field_opts (hash)
    BatchInput: batch_id, key, format, settings (json)
    BatchOutput: batch_id, key, format, display_settings (json), adjudicator, adjudicator_settings (json)
  55. batch = Batch.create(name: "Sandwich Wizard")
      batch.batch_inputs.create(
        key: "sandwich_ig",
        format: "instagram_post"
      )
  56. batch.batch_items.create(
        input_data: { sandwich_ig: "BJ8ohZEAjWQ" }
      )
  57. batch.batch_outputs.create(
        key: "has_fries",
        format: "categories",
        display_settings: {
          category_opts: {
            yes: "Yes- it has french fries inside",
            no: "No- it does not have french fries inside"
          }
        },
        adjudicator: "mode",
        adjudicator_settings: { acceptance: 0.5 }
      )
  58. display_settings: How we display the output

    adjudicator, adjudicator_settings: How we confirm the output data
  59. batch.batch_outputs.create(
        key: "fry_count",
        format: "counter",
        display_settings: {
          instruct: "if yes, how many fries",
          category_opts: { min: 0, max: 50 }
        },
        adjudicator: "number",
        adjudicator_settings: { variance: 0.20 }
      )
  60. display_settings: How we display the output

    adjudicator, adjudicator_settings: How we confirm the output data
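The slides do not show how the "number" adjudicator applies its `variance` setting. One possible reading, sketched below in plain Ruby, is to approve when every answer lies within that fraction of the median and to report the median as the result. The `NumberAdjudicator` class and this interpretation of `variance` are assumptions, not the talk's implementation.

```ruby
# Hypothetical "number" adjudicator: approves when every answer lies within
# `variance` (a fraction, e.g. 0.20) of the median, and reports the median.
class NumberAdjudicator
  def initialize(answers, variance:)
    @answers = answers.map(&:to_f).sort
    @variance = variance
  end

  # Middle value of the sorted answers (mean of the two middle values
  # when the count is even).
  def median
    mid = @answers.size / 2
    @answers.size.odd? ? @answers[mid] : (@answers[mid - 1] + @answers[mid]) / 2.0
  end

  def approved?
    return false if @answers.empty?

    tolerance = median.abs * @variance
    @answers.all? { |answer| (answer - median).abs <= tolerance }
  end

  # The agreed value, or nil when the answers are too far apart.
  def result
    median if approved?
  end
end

close = NumberAdjudicator.new([10, 12], variance: 0.20)
apart = NumberAdjudicator.new([5, 12], variance: 0.20)
puts "close: approved=#{close.approved?} result=#{close.result.inspect}"
puts "apart: approved=#{apart.approved?} result=#{apart.result.inspect}"
```

A relative tolerance like this is strict for small counts (answers of 1 and 2 fries would never agree), so a real adjudicator might also want an absolute floor.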
  61. Possible input formats

    ✓ Business listing information
    ✓ Image
    ✓ Social posts (tweets, grams)
    ✓ Twilio embed (phone calls)
    ✓ Video (with JS helpers for splicing, etc)
  62. Possible output formats

    ✓ Text (phone, email, website, etc)
    ✓ Radio buttons
    ✓ Multi-select categories
    ✓ Number
    ✓ Video data
  63. Adjudicator Types

    ✓ Single text output
    ✓ Multiple text output
    ✓ Email
    ✓ URL
    ✓ Number
  64. Tips for Turk Accuracy

    ✓ Explicit instructions
    ✓ UX
    ✓ Simple and straightforward tasks
    ✓ “Gold” Data
    ✓ Approved worker pools/worker qualifications
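"Gold" data usually means mixing in items whose answers are already known and scoring workers against them. The slides give no implementation, so the following is only an illustrative sketch in plain Ruby: the gold set, the `[worker_id, post_id, answer]` submission format, the helper names, and the 0.8 cut-off are all assumptions.

```ruby
# Hypothetical gold set: post id => answer already verified by hand.
GOLD = { "gold_1" => "yes", "gold_2" => "no" }.freeze

# Each submission is [worker_id, post_id, answer].
# Returns worker_id => share of that worker's gold items answered correctly;
# workers who saw no gold items are left out.
def gold_accuracy(submissions, gold)
  scored = submissions.select { |_worker, post_id, _answer| gold.key?(post_id) }
  scored.group_by(&:first).transform_values do |rows|
    correct = rows.count { |_worker, post_id, answer| gold[post_id] == answer }
    correct.to_f / rows.size
  end
end

# Workers whose gold accuracy falls below `minimum`.
def untrusted_workers(submissions, gold, minimum: 0.8)
  gold_accuracy(submissions, gold).select { |_worker, score| score < minimum }.keys
end

submissions = [
  ["w1", "gold_1", "yes"],
  ["w1", "gold_2", "no"],
  ["w1", "post_9", "yes"], # not a gold item, ignored when scoring
  ["w2", "gold_1", "no"],
  ["w2", "gold_2", "no"]
]
p gold_accuracy(submissions, GOLD)
p untrusted_workers(submissions, GOLD)
```

Results from workers flagged this way could then be excluded before adjudication or sent back for another opinion.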
  65. Task Speed

    Rate = e * r * n
    e = ease of task, r = reward, n = number of tasks
  66. Task Speed

    Rate = e * r * n
    Andy’s Theorem™
  67. Ethics of mTurk

    "Amazon Mechanical Turk: Gold Mine or Coal Mine?" University of Colorado, 2011
    "Amazon's Mechanical Turk workers protest: 'I am a human being, not an algorithm'" The Guardian, December 3, 2014
  68. Culture

    http://turkernation.com/
    https://turkopticon.ucsd.edu/
    /r/mturk
    “Turking isn't beer money for me, it's rent money and food money”
    “12 hours a day on here to make an average of $100 a day”
    “A higher-end worker on MTurk can expect to make $6-$12 per hour”
  69. Cambridge Analytica + mTurk

    "How Amazon Helped Cambridge Analytica Harvest Americans’ Facebook Data" Fast Company, March 27, 2018
  70. So, why did you listen to me talk about mTurk?
  71. Thanks for listening to me talk about mTurk.