How we've built Yahoo Fantasy Football (Droidcon Italy '15)

HOW WE’VE BUILT YAHOO FANTASY FOOTBALL Alex Florescu Yahoo UK
#droidconit April 10th, 2015

OVERVIEW Intro Principles & practices Testing Internationalisation Instrumentation & A/B
testing Performance

INTRO London team started in January 2014 Fantasy Football (Fantasy
Calcio) launched in July 2014 Android / iOS / Web clients + back-end team

THE APP 100k+ MAUs (on Android), ★★★★☆ Premier League, Campionato
Italiano, Ligue 1©, Bundesliga, La Liga, MLS

KEY PRINCIPLES Automate everything Short release cycle Performance, stability, quick
changes Track and measure everything Data-driven product decisions Stress and enforce principles, not process

ENGINEERING PRACTICES - CI CI pipeline from day one CD
up to internal deployment Unit testing & UI testing Automatic APK generation and signing Compile time conﬁgs for dev, dogfood & production builds

ENGINEERING PRACTICES - CI Git ﬂow: Work on a branch,
do a pull request to merge Short lived branches, keep PRs brief Master always builds, always shippable All code must be reviewed Compile-time feature toggles “disable” code that is not ready

TESTING CI without automated testing is … Different levels of
testing On commit hook: robolectric suite Next stage, smoke suite of UI tests Nightly: full suite of UI tests, performance tests, monkey tests

ROBOLECTRIC TESTING Robolectric tests run on JVM, no devices needed
Slower than plain JUnit tests, but signiﬁcantly faster than UI tests Very useful as unit tests With architectures such as MVP, can also be acceptance tests

ROBOLECTRIC PROBLEMS Not all Android framework functionality is replicated Differences
between JVM and Dalvik VM Difﬁcult to test complex user ﬂows over multiple screens Custom views sometimes problematic

OUR NUMBERS 700+ tests 50-60% coverage (higher in biz logic,
lower in UI) 2’ to run, 6’ full build from scratch

UI TESTS Good: Proper integration tests Run on device Most
closely resembling real user ﬂows Can catch device speciﬁc issues

UI TESTS Bad: Synchronisation problems (e.g. Button “OK” not found)
Brittle, hard to maintain Very slow to run Requires a device lab to be setup for CI

SMOKE SUITE VS FULL SUITE Even small suites can take
hours to run because of sync issues For sanity checking, a smoke suite will do Relatively fast (10-15min) & simple UI test Ensure app runs and can see all screens

FULL SUITE For enhanced testing, a nightly full suite In-depth
user ﬂow tests, can run for hours Make sure someone checks it daily! Should be a release blocker

CI PIPELINE

MONKEY TESTING Useful for stability testing Catches crashes and memory
leaks Could be included in automated nightly runs Make sure app activity is restricted Lock monkey in app (e.g. Surelock) Consider removing certain features when monkey runs

TRACKING TESTING Coverage useful for analysis (e.g. what areas get
the least testing and why?), but should not enforce a coverage target Reasonable to expect acceptance tests with features Enforce testing through code review Tests are code! Refactoring, good architecture, documentation, still apply

I18N, L10N … Translation: strings only Localisation: adapting content for
language, culture and region Internationalisation: designing a product to allow localisation

CALCIO, SOCCER, FUßBALL… We shipped to 20+ locales from day
one Challenges: All strings needs to be translated Number formatting, currency formatting etc. Support, reviews, release notes Testing load increased — UI issues with some locales only

I18N — DEALING WITH IT Externalise all strings and enforce
no lint errors on build Collect all strings early for translation before they block release Have standard release notes saved & translated for emergencies Some test devices permanently on tricky locales

I13N — INSTRUMENTATION What Collecting data to understand how an
app performs and how it is used Why Key to understanding what the users are doing

WHAT TO INSTRUMENT Time spent in app Buttons tapped Loading
time, network performance Anything you want!

WHAT TO DO WITH DATA How long does it take
a user to create a team? What are the best triggers for a user to sign in? How often do users share something with friends? Signs of frustration: e.g. repeating identical action

13N CHALLENGES Collecting the data is the easy part (and
it’s not easy) Don’t reinvent the wheel, use 3rd party tools for this We use Flurry Real challenge: What does user engagement mean? How do you measure it?

A/B TESTING — WHY? What makes users more likely to
invite or share with friends? What makes users more likely to be engaged? Happy? What features do we add or remove? Is a new feature supporting our high level goals? Goal: maximum user satisfaction and engagement with minimum number of features

EXPERIMENTS Build-up an MVP of your new feature Enabled the
feature in a test bucket (e.g. only for 10% of users) Data is collected for all users, bucket-aware and results are compared across test and control bucket Results can be used to guide product decisions

EXPERIMENT EXAMPLE Hypothesis: A prompt to share the newly created
league will increase the number of shares

EXPERIMENT RESULTS Succesful! 71% of users that see the prompt
share the league

EXPERIMENT EXAMPLE Hypothesis: A tutorial will increase the number of
completed teams

EXPERIMENT RESULTS Completion team was actually unaffected: hypothesis rejected But,
signiﬁcantly more likely that they will complete the team in the same session

EXPERIMENTS “Guesses” are not necessarily right “Obvious” improvements may not
be Used correctly, real world data provides proof

PERFORMANCE Caring is measuring What numbers we track Cold start
time FPS Automated measurements (e.g. nightly build to track progress) Track production numbers — this is what matters

PERFORMANCE Numbers will vary wildly in different regions Slower networks,
older devices When we started monitoring our world average for load time was ~2-3x our US/UK one

PERFORMANCE

WRAP-UP CI & automated testing are key for quality and
stability Instrument everything, use data to experiment and guide product A/B testing can conﬁrm product hypothesis You should localise your apps, but know what you’re getting into Performance needs prod monitoring and on-going measurement

Q & A yahoo-mep.tumblr.com www.ﬂorescu.org @ﬂor3scu

How we've built Yahoo Fantasy Football (Droidco...

How we've built Yahoo Fantasy Football (Droidcon Italy '15)

More Decks by Alex Florescu

Other Decks in Programming

Featured

Transcript