Paul Verbeek-Mast (Booking.com), Bad Evidence in Testing Product Hypothesis, CodeFest 2017

Bad Evidence

Paul Verbeek-Mast

workingatbooking.com

All A/B tests and data shown in this presentation are
not based on real experiments. They are made up just for this presentation.

How much do you want to create “Bad Evidence”? How willing are you to find evidence that proves you wrong?

You don’t want to do something if it is going to go against your theory of the case.

Rather than trying to get to the truth, what you’re trying to do is build your case, and make it the strongest case possible.

What does verification bias cause you to do? Ignore it and push it to the side.

Base: 5234 searches
Variant: 6252 searches (+19.45%)

Making the search box hotpink will result in more searches

Variant: 6252 searches (+19.45%), 242 bookings (-4.7%)

?

Because of (why), we believe that changing (what) for (who) will result in (outcome)
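The template above can be kept honest by treating it as a small data structure, so that a hypothesis with a missing part is rejected before the test starts. This is only a minimal sketch: the `Hypothesis` class and its `sentence` method are hypothetical names introduced for illustration, not something shown in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical container for the four parts of the template."""
    why: str      # objective reason, based on data
    what: str     # accurate, short description of the change
    who: str      # realistic description of the target group
    outcome: str  # measurable expected change

    def __post_init__(self) -> None:
        # Reject a hypothesis that leaves any part blank.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"'{name}' must not be empty")

    def sentence(self) -> str:
        # Render the hypothesis in the template's wording.
        return (
            f"Because of {self.why}, we believe that changing {self.what} "
            f"for {self.who} will result in {self.outcome}."
        )


example = Hypothesis(
    why="user research",
    what="the background of the search box on the homepage to pink",
    who="users who visit the homepage",
    outcome="an increase in bookings",
)
print(example.sentence())
```

Writing the sentence down before the experiment starts makes it harder to swap in a different metric afterwards.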

Why: objective and based on data
Bad examples:
• Based on a gut feeling, I believe (…)
• Because I like it better, I believe (…)
• Because I saw it on another website, I believe (…)

Why: objective and based on data
Good examples:
• Because of research described in article (…), we believe (…)
• After doing user research, we believe (…)
• Based on a previous experiment doing (…), we believe (…)

What: an accurate, short description of your change
Bad examples:
• we make it pink
• we move it to a different place
• we change the title

What: an accurate, short description of your change
Good examples:
• we make the search box on the homepage pink
• we open pictures on the search page in a lightbox when they are clicked

Who: a realistic, accurate description of your target group
Bad examples:
• everyone
• some people
• users booking a hotel in Novosibirsk, named Paul, from Amsterdam, with a big beard

Who: a realistic, accurate description of your target group
Good examples:
• users visiting the home page
• users searching for a property in Novosibirsk
• users who are logged in

Outcome: measurable, expected changes
Bad examples:
• users feeling better
• the site looking prettier
• an increase in loyalty

Outcome: measurable, expected changes
Good examples:
• an increase in earnings
• a decrease in returned products
• an increase in sign-ups

Because of user research, we believe that changing the background of the search box on the homepage to pink for users who visit the homepage will result in an increase in bookings

Because of studies done by x and y that show the positive effect of green, we believe that changing our booking buttons to green for users who visit our product page will result in more bookings

Day 1: Base 812 bookings, Variant 825 bookings (+1.6%)

Day 2: Base 1,924 bookings, Variant 1,920 bookings (-0.2%)

Day 3: Base 2,714 bookings, Variant 2,925 bookings (+7.8%)

Day 20: Base 16,623 bookings, Variant 16,324 bookings (-1.8%)
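The percentages on these slides are plain relative differences between variant and base. A short sketch that recomputes them from the (made-up) booking counts above shows how the sign flips from day to day; `relative_change` is a helper name chosen here for illustration.

```python
def relative_change(base: float, variant: float) -> float:
    """Relative difference of variant versus base, in percent."""
    if base == 0:
        raise ValueError("base must be non-zero")
    return (variant - base) / base * 100.0


# Booking counts as shown on the slides: day -> (base, variant)
bookings_by_day = {
    1: (812, 825),
    2: (1924, 1920),
    3: (2714, 2925),
    20: (16623, 16324),
}

for day, (base, variant) in bookings_by_day.items():
    print(f"Day {day:>2}: {relative_change(base, variant):+.1f}%")
```

Stopping on Day 3 would have suggested a clear win, while Day 20 points the other way.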

How long should you run your A/B test?
• Number of visitors
• How big of a change you want to measure
• How confident you want to be that your test is correct
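These three inputs can be turned into a rough run time. The sketch below uses the common normal-approximation formula for comparing two conversion rates; the talk does not say which method Booking.com uses, so the formula, the default `alpha` and `power`, and the example numbers are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def visitors_per_group(baseline_rate: float, relative_lift: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in each group (two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # confidence
    z_beta = NormalDist().inv_cdf(power)           # chance to detect the change
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_needed(n_per_group: int, daily_visitors: int) -> int:
    """Days to fill both groups when traffic is split evenly."""
    return math.ceil(2 * n_per_group / daily_visitors)


# Hypothetical inputs: 4% baseline conversion, 5% relative lift to detect,
# 20,000 visitors per day entering the experiment.
n = visitors_per_group(baseline_rate=0.04, relative_lift=0.05)
print(n, "visitors per group,", days_needed(n, 20_000), "days")
```

A smaller change to detect or a higher required confidence both push the run time up, which is why the duration should be fixed before the test starts.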

You can never be 100% confident that your test is correct

The more things you measure, the higher the chance some
test metrics are incorrect
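A back-of-the-envelope calculation makes this concrete. Assuming, for simplicity, that the metrics are independent and that the change has no real effect on any of them, the chance that at least one metric looks significant by accident grows quickly with the number of metrics. The function names and the 5% threshold are assumptions for illustration; real metrics are usually correlated, so treat the result as a rough guide.

```python
def chance_of_false_positive(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_metrics independent metrics
    crosses the significance threshold purely by chance."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics: int, alpha: float = 0.05) -> float:
    """Per-metric threshold that keeps the overall error rate at alpha
    (Bonferroni correction, a conservative choice)."""
    return alpha / num_metrics


for m in (1, 5, 18):
    print(f"{m:>2} metrics: {chance_of_false_positive(m):.0%} chance of a "
          f"false positive, corrected threshold {bonferroni_alpha(m):.4f}")
```

With 18 metrics, the number listed on a later slide, the chance of at least one accidental "result" is roughly 60% under these assumptions.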

Variant: bookings -1.8%, price of bookings +4.3%

Metrics measured: clicks on button, hover over button, bookings, visits on page, scrolled to button, bookings from IE8, bookings from Malaysia, users going to search results, logins, sign ups, clicks on logo, time on page, returning visitors, price of booking, number of rooms booked, language changes, calls to customer service, buys with credit card

Changes measured: +0.1%, -0.2%, +2.3%, +0.3%, +4.7%, -3.1%, +0.0%, +3.5%, -1.1%, -2.1%, +0.3%, +2.1%, -1.8%, -0.3%, +0.0%, +0.5%, +4.3%, -0.2%

Focus on your defined metrics, but also keep an eye on your health metrics

Metrics that are not in the hypothesis:
“price is going up, so it must be doing well”
vs.
“price is going down, so it must be a false negative”

Newly implemented metrics:
“this new metric is positive, it’s working great!”
vs.
“this new metric is negative, there must be a bug”

Sample size:
“it’s positive after 5 days, let’s put it in production”
vs.
“it’s negative after 5 days, let’s run it for another few days”
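Stopping as soon as the numbers look good is a form of verification bias, and a small simulation shows why it produces bad evidence. The sketch below runs A/A tests (base and variant are identical, so every "significant" result is a false positive) and compares checking every day with checking once at the end. The two-proportion z-test, the 10% conversion rate and the traffic figures are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def is_significant(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   alpha: float = 0.05) -> bool:
    """Two-sided two-proportion z-test with pooled variance."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate(num_tests: int = 200, days: int = 20, daily_visitors: int = 200,
             rate: float = 0.10, seed: int = 1) -> tuple[float, float]:
    """Return (false positive rate when peeking daily,
               false positive rate when looking only on the last day)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(num_tests):
        conv_a = conv_b = visitors = 0
        seen_significant = False
        for _day in range(days):
            # Both groups convert at the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            visitors += daily_visitors
            significant = is_significant(conv_a, visitors, conv_b, visitors)
            seen_significant = seen_significant or significant
        peeking_hits += seen_significant
        final_hits += significant  # result on the last day only
    return peeking_hits / num_tests, final_hits / num_tests


peeking_rate, final_rate = simulate()
print(f"false positives when peeking daily: {peeking_rate:.0%}")
print(f"false positives when waiting:       {final_rate:.0%}")
```

Because the last day is one of the days checked, peeking can only match or exceed the false positive rate of waiting for the planned sample size.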

How long should you run your A/B test?
• Number of visitors
• How big of a change you want to measure
• How confident you want to be that your test is correct

Paul Verbeek-Mast

@_paulverbeek

Questions?
