How we've built Yahoo Fantasy Football (Droidcon Italy '15)

How we've built Yahoo Fantasy Football in the Yahoo UK office. Presented at Droidcon Italy 2015, in Turin, on April 10th.

Video of the talk here: https://www.youtube.com/watch?v=hKmUZR6ZI28

Alex Florescu

April 10, 2015

Transcript

  1. HOW WE’VE BUILT YAHOO FANTASY FOOTBALL
    Alex Florescu, Yahoo UK
    #droidconit, April 10th, 2015
  2. OVERVIEW
    Intro
    Principles & practices
    Testing
    Internationalisation
    Instrumentation & A/B testing
    Performance
  3. INTRO
    London team started in January 2014
    Fantasy Football (Fantasy Calcio) launched in July 2014
    Android / iOS / Web clients + back-end team
  4. THE APP
    100k+ MAUs (on Android), ★★★★☆
    Premier League, Campionato Italiano, Ligue 1©, Bundesliga, La Liga, MLS
  5. KEY PRINCIPLES
    Automate everything
    Short release cycle
    Performance, stability, quick changes
    Track and measure everything
    Data-driven product decisions
    Stress and enforce principles, not process
  6. ENGINEERING PRACTICES - CI
    CI pipeline from day one
    CD up to internal deployment
    Unit testing & UI testing
    Automatic APK generation and signing
    Compile-time configs for dev, dogfood & production builds
  7. ENGINEERING PRACTICES - CI
    Git flow: work on a branch, do a pull request to merge
    Short-lived branches, keep PRs brief
    Master always builds, always shippable
    All code must be reviewed
    Compile-time feature toggles “disable” code that is not ready
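
A compile-time feature toggle of the kind mentioned on this slide could look roughly like the sketch below. The class and flag names are hypothetical and the constants are hard-coded to keep the example self-contained; in an Android project they would more likely be generated per build type (dev, dogfood, production).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toggle holder; names are illustrative, not from the talk.
final class FeatureToggles {
    // Compile-time constants: branches guarded by a false constant can be
    // left out of the compiled class by javac.
    static final boolean SHARE_PROMPT_ENABLED = true;  // assumed finished feature
    static final boolean LEAGUE_CHAT_ENABLED = false;  // assumed unfinished feature

    private FeatureToggles() {}
}

final class HomeScreen {
    /** Returns the menu entries that this build exposes. */
    static List<String> menuEntries() {
        List<String> entries = new ArrayList<>();
        entries.add("My Team");
        if (FeatureToggles.SHARE_PROMPT_ENABLED) {
            entries.add("Share League");
        }
        if (FeatureToggles.LEAGUE_CHAT_ENABLED) {
            entries.add("League Chat"); // merged to master, but not reachable yet
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(menuEntries()); // prints [My Team, Share League]
    }
}
```

With this pattern unfinished code can be merged early and reviewed in small PRs while master stays shippable, since the disabled branch is never reachable in a production build.
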
  8. TESTING
    CI without automated testing is …
    Different levels of testing
    On commit hook: Robolectric suite
    Next stage: smoke suite of UI tests
    Nightly: full suite of UI tests, performance tests, monkey tests
  9. ROBOLECTRIC TESTING
    Robolectric tests run on the JVM, no devices needed
    Slower than plain JUnit tests, but significantly faster than UI tests
    Very useful as unit tests
    With architectures such as MVP (Model-View-Presenter), can also be acceptance tests
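
To illustrate why an MVP split makes JVM-only tests workable, here is a minimal sketch with a presenter that only talks to a view interface, plus a recording fake view. All names (`TeamView`, `TeamPresenter`, `RecordingTeamView`) are hypothetical; the talk does not show the app's real classes, and a real suite would use JUnit and Robolectric rather than hand-written checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical view contract: the presenter never touches Android classes.
interface TeamView {
    void showPlayers(List<String> players);
    void showError(String message);
}

// Hypothetical presenter holding the screen logic.
final class TeamPresenter {
    private final TeamView view;

    TeamPresenter(TeamView view) {
        this.view = view;
    }

    /** Decides what the screen should display for the loaded team. */
    void onTeamLoaded(List<String> players) {
        if (players.isEmpty()) {
            view.showError("No players selected yet");
        } else {
            view.showPlayers(players);
        }
    }
}

// Fake view that records calls so a test can assert on them.
final class RecordingTeamView implements TeamView {
    final List<String> shown = new ArrayList<>();
    String error;

    @Override
    public void showPlayers(List<String> players) {
        shown.addAll(players);
    }

    @Override
    public void showError(String message) {
        error = message;
    }

    public static void main(String[] args) {
        RecordingTeamView view = new RecordingTeamView();
        new TeamPresenter(view).onTeamLoaded(Arrays.asList("Keeper", "Striker"));
        System.out.println(view.shown); // prints [Keeper, Striker]
    }
}
```

Because the presenter depends only on the interface, a test can drive a whole screen's behaviour without a device or emulator.
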
  10. ROBOLECTRIC PROBLEMS
    Not all Android framework functionality is replicated
    Differences between JVM and Dalvik VM
    Difficult to test complex user flows over multiple screens
    Custom views sometimes problematic
  11. OUR NUMBERS
    700+ tests
    50-60% coverage (higher in business logic, lower in UI)
    2 minutes to run, 6 minutes for a full build from scratch
  12. UI TESTS
    Good:
    Proper integration tests
    Run on device
    Most closely resembling real user flows
    Can catch device-specific issues
  13. UI TESTS
    Bad:
    Synchronisation problems (e.g. Button “OK” not found)
    Brittle, hard to maintain
    Very slow to run
    Require a device lab to be set up for CI
  14. SMOKE SUITE VS FULL SUITE
    Even small suites can take hours to run because of sync issues
    For sanity checking, a smoke suite will do
    Relatively fast (10-15 min) & simple UI test
    Ensure the app runs and all screens can be seen
  15. FULL SUITE
    For enhanced testing, a nightly full suite
    In-depth user flow tests, can run for hours
    Make sure someone checks it daily!
    Should be a release blocker
  16. CI PIPELINE
  17. MONKEY TESTING
    Useful for stability testing
    Catches crashes and memory leaks
    Could be included in automated nightly runs
    Make sure app activity is restricted
    Lock monkey in app (e.g. Surelock)
    Consider removing certain features when monkey runs
  18. TRACKING TESTING
    Coverage is useful for analysis (e.g. what areas get the least testing and why?), but should not be enforced as a target
    Reasonable to expect acceptance tests with features
    Enforce testing through code review
    Tests are code! Refactoring, good architecture and documentation still apply
  19. I18N, L10N …
    Translation: strings only
    Localisation: adapting content for language, culture and region
    Internationalisation: designing a product to allow localisation
  20. CALCIO, SOCCER, FUßBALL…
    We shipped to 20+ locales from day one
    Challenges:
    All strings need to be translated
    Number formatting, currency formatting etc.
    Support, reviews, release notes
    Testing load increased: UI issues with some locales only
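
The number and currency formatting challenge can be illustrated with the JDK's own `java.text.NumberFormat`, which picks separators and currency placement from the locale. This is only a sketch: the helper names and the sample value are made up, and the exact currency output depends on the locale data shipped with the runtime, so no currency output is shown.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; method names are illustrative only.
final class LocaleFormatting {
    /** Formats a plain number (e.g. fantasy points) for the given locale. */
    static String formatPoints(double points, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(points);
    }

    /** Formats an amount using the locale's default currency. */
    static String formatBudget(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        Locale[] locales = {Locale.UK, Locale.ITALY, Locale.GERMANY, Locale.FRANCE};
        for (Locale locale : locales) {
            // Same value, different grouping and decimal separators per locale.
            System.out.println(locale + ": "
                    + formatPoints(1234.5, locale) + " | "
                    + formatBudget(1234.5, locale));
        }
    }
}
```

For example, `formatPoints(1234.5, Locale.UK)` gives `1,234.5` while `Locale.GERMANY` gives `1.234,5`, which is why hard-coded separators or string concatenation tend to break in some locales only.
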
  21. I18N: DEALING WITH IT
    Externalise all strings and enforce no lint errors on build
    Collect all strings early for translation before they block release
    Have standard release notes saved & translated for emergencies
    Some test devices permanently on tricky locales
  22. I13N: INSTRUMENTATION
    What: collecting data to understand how an app performs and how it is used
    Why: key to understanding what the users are doing
  23. WHAT TO INSTRUMENT
    Time spent in app
    Buttons tapped
    Loading time, network performance
    Anything you want!
  24. WHAT TO DO WITH DATA
    How long does it take a user to create a team?
    What are the best triggers for a user to sign in?
    How often do users share something with friends?
    Signs of frustration: e.g. repeating an identical action
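
One way the "repeating an identical action" signal could be derived from an event stream is sketched below. The event names, the threshold and the class itself are assumptions for illustration; this is not the Flurry API and not necessarily how the team analysed its data.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical analysis helper over a session's ordered event names.
final class FrustrationDetector {
    /** Returns the length of the longest run of consecutive identical events. */
    static int longestRepeatRun(List<String> events) {
        int longest = 0;
        int current = 0;
        String previous = null;
        for (String event : events) {
            // Extend the run if the event repeats, otherwise start a new run.
            current = event.equals(previous) ? current + 1 : 1;
            longest = Math.max(longest, current);
            previous = event;
        }
        return longest;
    }

    /** Flags a session when any action is repeated at least `threshold` times in a row. */
    static boolean looksFrustrated(List<String> events, int threshold) {
        return longestRepeatRun(events) >= threshold;
    }

    public static void main(String[] args) {
        List<String> session = Arrays.asList(
                "open_team", "tap_save", "tap_save", "tap_save", "open_league");
        System.out.println(longestRepeatRun(session));   // prints 3
        System.out.println(looksFrustrated(session, 3)); // prints true
    }
}
```
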
  25. I13N CHALLENGES
    Collecting the data is the easy part (and it’s not easy)
    Don’t reinvent the wheel, use third-party tools for this
    We use Flurry
    Real challenge: What does user engagement mean? How do you measure it?
  26. A/B TESTING: WHY?
    What makes users more likely to invite or share with friends?
    What makes users more likely to be engaged? Happy?
    What features do we add or remove?
    Is a new feature supporting our high-level goals?
    Goal: maximum user satisfaction and engagement with minimum number of features
  27. EXPERIMENTS
    Build up an MVP (minimum viable product) of your new feature
    Enable the feature in a test bucket (e.g. only for 10% of users)
    Data is collected for all users, tagged with their bucket, and results are compared across the test and control buckets
    Results can be used to guide product decisions
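
Assigning users to a test bucket such as the 10% mentioned above is often done by hashing a stable user identifier, so the same user always lands in the same bucket. The sketch below assumes that approach; the class, the experiment name and the use of CRC32 are illustrative choices, not details from the talk.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical bucketing helper: deterministic, no server round trip needed.
final class ExperimentBuckets {
    /** Maps a user to a stable slot in [0, 100) for the given experiment. */
    static int slot(String experiment, String userId) {
        CRC32 crc = new CRC32();
        // Salting with the experiment name keeps buckets independent across experiments.
        crc.update((experiment + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** Returns "test" for roughly testPercent% of users, "control" otherwise. */
    static String bucket(String experiment, String userId, int testPercent) {
        return slot(experiment, userId) < testPercent ? "test" : "control";
    }

    public static void main(String[] args) {
        // Hypothetical experiment and user identifiers.
        System.out.println(bucket("share_prompt", "user-42", 10));
    }
}
```

The returned bucket name would then be attached to every tracked event, which is what lets results be compared across test and control afterwards.
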
  28. EXPERIMENT EXAMPLE
    Hypothesis: A prompt to share the newly created league will increase the number of shares
  29. EXPERIMENT RESULTS
    Successful! 71% of users that see the prompt share the league
  30. EXPERIMENT EXAMPLE
    Hypothesis: A tutorial will increase the number of completed teams
  31. EXPERIMENT RESULTS
    Team completion was actually unaffected: hypothesis rejected
    But users were significantly more likely to complete the team in the same session
  32. EXPERIMENTS
    “Guesses” are not necessarily right
    “Obvious” improvements may not be
    Used correctly, real-world data provides proof
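
Comparing a test bucket against a control bucket usually needs a significance check before a difference is treated as real. A common choice is the two-proportion z-test sketched below; the talk does not say which method the team used, and the counts in the example are invented purely for illustration.

```java
// Hypothetical helper implementing a standard pooled two-proportion z-test.
final class AbComparison {
    /** z-score for the difference between test and control conversion rates. */
    static double zScore(long testConversions, long testUsers,
                         long controlConversions, long controlUsers) {
        if (testUsers <= 0 || controlUsers <= 0) {
            throw new IllegalArgumentException("both buckets need at least one user");
        }
        double pTest = (double) testConversions / testUsers;
        double pControl = (double) controlConversions / controlUsers;
        // Pooled rate under the null hypothesis that both buckets convert equally.
        double pooled = (double) (testConversions + controlConversions)
                / (testUsers + controlUsers);
        double standardError = Math.sqrt(
                pooled * (1 - pooled) * (1.0 / testUsers + 1.0 / controlUsers));
        if (standardError == 0.0) {
            return 0.0; // no variation at all, so no measurable difference
        }
        return (pTest - pControl) / standardError;
    }

    /** Two-sided check at the conventional 5% level (|z| > 1.96). */
    static boolean significantAt95(double z) {
        return Math.abs(z) > 1.96;
    }

    public static void main(String[] args) {
        // Invented counts: 150 of 1000 test users vs 100 of 1000 control users.
        double z = zScore(150, 1000, 100, 1000);
        System.out.println(significantAt95(z)); // prints true
    }
}
```

A smaller gap, such as 120 versus 100 conversions per 1000 users, gives a z-score of about 1.43 and would not pass the same threshold, which is one reason "obvious" wins need enough data behind them.
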
  33. PERFORMANCE
    Caring is measuring
    What numbers we track:
    Cold start time
    FPS
    Automated measurements (e.g. nightly build to track progress)
    Track production numbers: this is what matters
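
Once cold start times and frame timestamps are being collected, they still have to be reduced to a few numbers that can be tracked build over build. The sketch below shows one possible reduction (nearest-rank percentiles and average FPS); the class name, sample values and choice of statistics are assumptions, not the team's actual tooling.

```java
import java.util.Arrays;

// Hypothetical aggregation helper for nightly or production measurements.
final class PerformanceStats {
    /** Nearest-rank percentile (0 < p <= 100) of the samples, in milliseconds. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Average frames per second from frame timestamps in nanoseconds. */
    static double averageFps(long[] frameTimesNanos) {
        if (frameTimesNanos.length < 2) {
            return 0.0;
        }
        long elapsed = frameTimesNanos[frameTimesNanos.length - 1] - frameTimesNanos[0];
        if (elapsed <= 0) {
            return 0.0;
        }
        // N timestamps delimit N - 1 frame intervals.
        return (frameTimesNanos.length - 1) * 1_000_000_000.0 / elapsed;
    }

    public static void main(String[] args) {
        long[] coldStartsMs = {900, 1200, 800, 3000, 1000}; // invented samples
        System.out.println(percentile(coldStartsMs, 50)); // prints 1000
        System.out.println(percentile(coldStartsMs, 90)); // prints 3000
    }
}
```

Percentiles are used here rather than a mean because a few very slow devices or networks can otherwise dominate the number.
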
  34. PERFORMANCE
    Numbers will vary wildly in different regions
    Slower networks, older devices
    When we started monitoring, our world average load time was ~2-3x our US/UK one
  35. PERFORMANCE
  36. WRAP-UP
    CI & automated testing are key for quality and stability
    Instrument everything, use data to experiment and guide product
    A/B testing can confirm product hypotheses
    You should localise your apps, but know what you’re getting into
    Performance needs production monitoring and ongoing measurement
  37. Q & A
    yahoo-mep.tumblr.com
    www.florescu.org
    @flor3scu