RailsConf 2013: No traffic, no users, no problem!

Usability and A/B testing with Rails and Mechanical Turk.

1) Sample code for quickly integrating your Rails site with Mechanical Turk. (slide #22)

2) How to structure your HITs (Human Intelligence Tasks) so that you solicit detailed feedback from the workers. (slide #26)

3) Integrating A/B testing so that you can quickly decide which is the better design component. (slide #36)

4) Tactics for stopping automated bots from ruining your usability tests. (slide #47)

Jim Jones

April 29, 2013

Transcript

  1. No Traffic, No Users, No Problem!
     Jim Jones, Ruby on Rails Consultant
     http://www.aantix.com [email protected] @aantix
     Thursday, May 2, 13
  2. (slide #8)
     • HITs - Human Intelligence Tasks
     • Turkers - The workers performing the tasks
     • Assignment - A HIT has multiple assignments
  3. Demographics (slide #11)
     • United States: 46.80%
     • India: 34.00%
     • Miscellaneous: 19.20%
     * http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html
  4. How can I use it? (slide #17)
     • Basic Web Interface
     • API (e.g. RTurk, Ruby-aws, Turkee)
     • Command-line tool (XML)
  5. The Turker interacts with your website directly. Your site is displayed within an iFrame. (slide #20)
  6. ...allows a developer to easily integrate with Mechanical Turk using a Rails-like form helper method. (slide #21)
  7. Easily import the data posted by the Mechanical Turk workers back into your data models. (slide #23)
  8. How can we gather meaningful feedback quickly, so that we can make better daily decisions? (slide #26)
  9. First tool of choice: the ‘forward’ gem. (slide #28)
     > gem install forward
     > forward 3000
     Forwarding port 3000 to https://aantix.fwd.wf
     Ctrl-C to stop forwarding
  10. Soliciting Feedback using Turkee (slide #29)
      Gemfile
      ...
      gem 'turkee' # http://www.github.com/aantix/turkee
  11. Creating the Study HIT (slide #31)
      rake turkee:create_study[params...]
      url, title, description, num_assignments, reward, lifetime
      e.g.
      rake turkee:create_study['https://aantix.fwd.wf/',"Can you help me test my website?","Tell me what you don’t like about my site’s homepage? Click submit below when you're finished.",30,0.20,20]
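The six positional arguments are easy to mix up, so a minimal Ruby sketch can help when planning a study. It names the arguments in the order listed on the slide and works out the reward total for the example above. `StudyParams` and `max_reward_total` are hypothetical names introduced here, not part of Turkee, and the total ignores any Mechanical Turk fees.

```ruby
# Hypothetical helper (not part of Turkee): names the six positional
# arguments of turkee:create_study in the order shown on the slide.
StudyParams = Struct.new(:url, :title, :description,
                         :num_assignments, :reward, :lifetime) do
  # Reward total if every assignment is completed and approved.
  # Assumption: Mechanical Turk fees are not included.
  def max_reward_total
    (num_assignments * reward).round(2)
  end
end

study = StudyParams.new(
  "https://aantix.fwd.wf/",
  "Can you help me test my website?",
  "Tell me what you don't like about my site's homepage?",
  30, 0.20, 20
)

puts "Assignments: #{study.num_assignments}"
puts "Reward per assignment: $#{format('%.2f', study.reward)}"
puts "Maximum reward total: $#{format('%.2f', study.max_reward_total)}"
```

Running this before creating the HIT gives a quick check on how much the study can cost in rewards.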
  12. Turker Feedback (slide #34)
      • “Your should make the logo bolder than the menu. They blend in too much. The image could be lighten up because the saturation is too much for my eyes. The grey box over the image could contain more information or a better call to action...”
      • “I had problems using your site with the sign in , when I clicked sign in with facebook it gave me an error "page cannot be displayed. other than that the site is doing great. loads fast and its easy to understand. you should add more content and find more bands willing to do the online web chat.”
  13. PROTIP(s): When soliciting feedback... (slide #35)
      • My appeal to the Turker is a personal appeal. Personal appeals generate more effort from the Turkers.
      • Ensure that you invite the Turkers to give negative feedback.
  14. Vanity Gem (slide #37)
      # Gemfile
      # http://vanity.labnotes.org/
      gem 'vanity'
      # project/experiments/homepage_photo_options.rb
      ab_test "Homepage photo options" do
        description "Which is the best photo to show on the home page?"
        alternatives "crowd_surfing_blue2.jpg", "girls_with_tickets3.jpg"
        metrics :event_view
      end
      # project/app/controllers/events_controller.rb
      class EventsController < ApplicationController
        def show
          ....
          track! :event_view
        end
      end
  15. Intent is hard to test with Mechanical Turk. These are paid workers, not customers interested in your product. (slide #38)
  16. Example Test: (slide #40)
      Given two large images on the homepage, under which image are Turkers more successful in reaching the event page?
  17. Test Assumption: (slide #41)
      The image under which more Turkers reach the event page is the more compelling image.
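To judge whether one image really led more Turkers to the event page, the two conversion rates can be compared with a two-proportion z-test. The sketch below uses a pooled standard error. The counts are hypothetical, `z_score` is a name introduced here, and this is not necessarily how Vanity scores its experiments.

```ruby
# Two-proportion z-score with a pooled standard error.
# conv_* = Turkers who reached the event page, n_* = Turkers shown that image.
# Note: the result is not finite if nobody (or everybody) converts.
def z_score(conv_a, n_a, conv_b, n_b)
  p_a = conv_a.to_f / n_a
  p_b = conv_b.to_f / n_b
  pooled = (conv_a + conv_b).to_f / (n_a + n_b)
  se = Math.sqrt(pooled * (1 - pooled) * (1.0 / n_a + 1.0 / n_b))
  (p_b - p_a) / se
end

# Hypothetical counts for the two homepage images (30 Turkers each).
z = z_score(9, 30, 18, 30)

puts format("z = %.2f", z)
if z.abs > 1.96
  puts "Difference is significant at roughly the 95% level"
else
  puts "Not enough evidence to prefer either image yet"
end
```

With only a few dozen assignments per image, a large gap in conversion rate is needed before the difference is distinguishable from noise.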
  18. Turker Instructions: (slide #44)
      “Look through my site and find your way to the band's event page. Then give feedback below on what you didn't like about my site. Click submit below when you're finished.”
  19. By some estimates, as many as 40% of all data submissions on Mechanical Turk are spam or come from bots. (slide #48)
  20. (slide #50)
      • Even better: ask a validating question about their submission, e.g. “How many words in your second sentence?”
      • This causes the Turker to slow down and re-read their work, and it serves as a validator.
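The validating question can be checked automatically when submissions are imported. This is a rough sketch, assuming the feedback and the Turker's claimed count arrive as plain strings; the method names are hypothetical and the sentence splitting is deliberately naive (it breaks on whitespace after `.`, `!` or `?`).

```ruby
# Returns the number of words in the second sentence of text,
# or nil if the text has fewer than two sentences.
def second_sentence_word_count(text)
  sentences = text.strip.split(/(?<=[.!?])\s+/)
  return nil if sentences.length < 2
  sentences[1].split.length
end

# True when the Turker's claimed count matches the actual count.
def passes_validation?(feedback, claimed_count)
  actual = second_sentence_word_count(feedback)
  !actual.nil? && actual == claimed_count.to_i
end

feedback = "The logo blends into the menu. I could not sign in with Facebook."

puts passes_validation?(feedback, "7")  # second sentence has seven words
puts passes_validation?(feedback, "3")
```

Submissions that fail the check are candidates for manual review rather than automatic rejection, since abbreviations and odd punctuation can throw off the sentence split.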
  21. Financial Incentives and the Performance of Crowds (slide #53)
      Winter Mason, Yahoo! Research
      Figure 2 reveals two main findings: first, that across all difficulty levels participants chose to complete more tasks on average when the pay was higher (F(3,607) = 15.73, p < 0.001); and second, that across all payment levels, the number of completed tasks decreased with increasing difficulty. As Figure 3 indicates, however, increasing compensation did not improve accuracy, which we measured in two ways…
  22. Toward Automatic Task Design: A Progress Report (slide #54)
      Eric Huang, School of Engineering and Applied Sciences, Harvard University
  23. Task Search in a Human Computation Market (slide #55)
      Lydia B. Chilton, University of Washington
      We found strong evidence that Turkers sort by the most number of HITs available (so they can find one task, and then do 100 instances of them in a row) and the most recently posted HITs (so they get the latest and greatest HITs).
  24. 10,000 Sheep (slide #58)
      • http://www.thesheepmarket.com
      • External HIT; hosted Flash app allowed Turkers to draw each sheep
  25. Dr. Seussify the News (slide #59)
      • http://groups.csail.mit.edu/uid/deneme/?p=671
      • Russia to launch 520-day mock mission to Mars → Oh, The Places Russia Will Go!
      • BP tries again to cap well; protests set to start → BP tries to cap with a hat while protesters start to raise hell about the well
  26. Micro bonuses: using the bonus as a lever (slide #61)
      • The user is awarded a varying amount depending upon how much time they take to answer a question or how accurate it is (relative to the other users' answers)
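One way to make "accuracy relative to the other users' answers" concrete is to scale the bonus by how many other workers gave the same answer. The rule below is purely illustrative: `agreement_bonus_cents`, the 20-cent ceiling and the sample answers are all assumptions, not something taken from the talk.

```ruby
# Hypothetical bonus rule: pay a share of max_bonus_cents equal to the
# fraction of other workers whose answer matches this worker's answer.
# Amounts are integer cents to avoid floating-point currency issues.
def agreement_bonus_cents(answer, other_answers, max_bonus_cents)
  return 0 if other_answers.empty?
  agreement = other_answers.count(answer).to_f / other_answers.length
  (max_bonus_cents * agreement).round
end

# Hypothetical answers from four other workers on the same question.
others = ["positive", "positive", "negative", "positive"]

puts agreement_bonus_cents("positive", others, 20)  # agrees with 3 of 4
puts agreement_bonus_cents("negative", others, 20)  # agrees with 1 of 4
```

A time-based variant would follow the same shape, with the fraction derived from how long the worker took instead of from agreement.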
  27. Gambling (slide #62)
      • Sentiment Analysis
      • Show the user all of the recent headlines for a given stock. Have them speculate as to whether the stock will close out higher or lower than the prior day. If they’re correct in their assessment, they get their fee + a bonus. Otherwise, their HIT is rejected.
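The settlement rule on this slide is simple enough to sketch directly. The method name, the cent amounts and the prices below are hypothetical, and treating an unchanged close as "lower" is an assumption made here to keep the example short.

```ruby
# Settles one prediction: a correct call earns fee plus bonus,
# an incorrect call means the HIT is rejected and nothing is paid.
# Assumption: an unchanged close counts as :lower.
def settle_prediction(prediction, prior_close, close, fee_cents, bonus_cents)
  actual = close > prior_close ? :higher : :lower
  if prediction == actual
    { action: :approve, payout_cents: fee_cents + bonus_cents }
  else
    { action: :reject, payout_cents: 0 }
  end
end

# Hypothetical prices and amounts.
puts settle_prediction(:higher, 41.20, 42.05, 20, 10).inspect
puts settle_prediction(:lower, 41.20, 42.05, 20, 10).inspect
```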