
Breaking Bug

Breaking Bug is a post-mortem about how we handled, at Chicisimo, a bug that went largely undetected and whose scale was unknown.

We will explain how it affected us, and which techniques and tools we used to:

1. Investigate what was going on.
2. Properly diagnose it and narrow it down to the affected part of the app.
3. Analyze its impact and affected people.
4. Finally isolate it and fix it.

Finally, we will draw some conclusions about the real cost of fixing it, and about how the process gave us insight and helped us understand and improve our quality processes.

Saúl Díaz

October 21, 2017

Transcript

  1. Seems like it doesn’t have internet, but internet is working!

    Please help! Since the last update you cannot enter the app anymore, everything is white. Please fix it or I will uninstall the app. It was great at the beginning, but now it is not loading. I uninstalled and reinstalled several times, but it does not improve. The app is so good I didn't want to stop using it. Please improve it. Nothing works, I uninstalled and installed several times; at first I thought it was a problem with my cellphone, but it isn't. What do I do now? The app is blocked and it does not load. I already updated and nothing changed.
  2. THE FIRST LINE OF DEFENSE: Testing + CI

    TESTING: Not perfect, but a solid suite. Domain: JUnit E2E, ~65% coverage. UI: JUnit & Robolectric, ~43% coverage. Focused on key components.
  3. THE FIRST LINE OF DEFENSE: MANUAL QA & DOGFOODING

    Manual testing: the team tests the app. We ship early internally, on a variety of devices and API levels.
  4. THE FIRST LINE OF DEFENSE: STAGED ROLLOUTS

    At least 24h with constant monitoring before full rollouts, going up to 99% so a rollback is still possible. CONTINUOUS DELIVERY: Shipping is a major asset.
  5. NETWORKING: OkHttp + Retrofit

    USE CASES: Covered by tests. PRESENTATION: Bus-backed UI, ProgressSwitcher.
  7. Heisenbug

    In computer jargon, a “heisenbug” is a software bug that seems to disappear or alter its behavior when one attempts to study it. (Source: Wikipedia)
  8. Oct 12 08:21:15 android INFO MainActivity [Data successful] hello world! userId: sefford, installation:1234512345 Samsung Sm-j700f 6.0.1(api 23) build:110105

    Labelled fields: date, log tag, component, info, user ID, installation ID, device info, API level, build number.
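
The log line above attaches the same context (user ID, installation ID, device, API level, build) to every entry, so a single remote entry tells you who was affected and on which build. Here is a minimal Java sketch of a formatter that does this. It assumes a hypothetical `LogContext` class; it is not Chicisimo's logger, and the transport that ships lines to a remote log service is left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical formatter: every line carries the same user/installation/device/build context.
final class LogContext {
    // Renders e.g. "Oct 12 08:21:15"
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("MMM dd HH:mm:ss", Locale.ENGLISH);

    private final String userId;
    private final String installationId;
    private final String device;
    private final String osVersion;
    private final int apiLevel;
    private final int build;

    LogContext(String userId, String installationId, String device,
               String osVersion, int apiLevel, int build) {
        this.userId = userId;
        this.installationId = installationId;
        this.device = device;
        this.osVersion = osVersion;
        this.apiLevel = apiLevel;
        this.build = build;
    }

    // Date, tag, level, component and message first, then the fixed context fields.
    String format(LocalDateTime when, String level, String component, String message) {
        return String.format(Locale.ROOT,
                "%s android %s %s %s userId: %s, installation:%s %s %s(api %d) build:%d",
                TIMESTAMP.format(when), level, component, message,
                userId, installationId, device, osVersion, apiLevel, build);
    }

    public static void main(String[] args) {
        LogContext context = new LogContext("sefford", "1234512345", "Samsung Sm-j700f", "6.0.1", 23, 110105);
        System.out.println(context.format(LocalDateTime.of(2017, 10, 12, 8, 21, 15),
                "INFO", "MainActivity", "[Data successful] hello world!"));
    }
}
```

Building the context once per installation keeps call sites down to level, component and message.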
  9. Jul 03 17:31:25 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402

    Jul 03 17:42:38 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 17:54:44 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:02:56 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:04:11 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:09:21 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:23:57 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:37:33 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:38:03 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:45:38 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
    Jul 03 18:54:49 android chicisimo: ERROR MainActivity [ProgressSwitcher] feed arrived successfully with 0 elements Build:109402
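
The log above records responses that succeed but carry no elements, which a user experiences as the white screen described in the reviews. The following is a hypothetical sketch of a ProgressSwitcher-style holder that logs that case at ERROR level. The class name `FeedStateSwitcher`, its states and its in-memory log are assumptions, because the deck does not show the real ProgressSwitcher.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical ProgressSwitcher-style view-state holder; the deck does not show the real class.
final class FeedStateSwitcher {
    enum State { LOADING, CONTENT, EMPTY, ERROR }

    private final List<String> errorLog = new ArrayList<>(); // stands in for a remote logger
    private final int build;
    private State state = State.LOADING;

    FeedStateSwitcher(int build) {
        this.build = build;
    }

    State onFeedArrived(List<?> items) {
        if (items.isEmpty()) {
            // "Successful" but empty: the user only sees a blank screen, so report it as an error.
            errorLog.add("ERROR [ProgressSwitcher] feed arrived successfully with 0 elements Build:" + build);
            state = State.EMPTY;
        } else {
            state = State.CONTENT;
        }
        return state;
    }

    State onFeedFailed(Exception cause) {
        errorLog.add("ERROR [ProgressSwitcher] feed failed: " + cause + " Build:" + build);
        state = State.ERROR;
        return state;
    }

    State state() {
        return state;
    }

    List<String> errorLog() {
        return Collections.unmodifiableList(errorLog);
    }

    public static void main(String[] args) {
        FeedStateSwitcher switcher = new FeedStateSwitcher(109402);
        switcher.onFeedArrived(Collections.emptyList());
        System.out.println(switcher.state() + " " + switcher.errorLog());
    }
}
```

Treating "empty" as a separate state from "content" is what makes the anomaly visible in the logs instead of silently rendering nothing.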
  10. --- a/app/src/main/java/com/chicisimo/common/injection/ApiModule.java

    +++ b/app/src/main/java/com/chicisimo/common/injection/ApiModule.java
    @@ -104,7 +104,7 @@ public class ApiModule {
                 .addNetworkInterceptor(responseInterceptor)
    -            .connectTimeout(60, TimeUnit.SECONDS)
    +            .connectTimeout(00, TimeUnit.SECONDS)
                 .readTimeout(60, TimeUnit.SECONDS);
         }
  11. IMPROVING THE TEST SUITE

    ▰ Replaced Retrofit mocks with real requests against MockWebServer
    ▰ Segmented new connection error conditions, like timeouts and SSL handshake errors
    ▰ Gained a better understanding of how the app behaves in the networking area
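
MockWebServer is not part of the JDK. As a stand-in, here is a plain-socket sketch of one of the segmented conditions, a read timeout: a local server accepts the connection but never answers, so the client's socket timeout fires. `SilentServerTimeoutDemo` and its `Outcome` values are hypothetical names and do not come from MockWebServer's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// JDK-only stand-in for a "server never responds" test case (not the MockWebServer API).
final class SilentServerTimeoutDemo {
    enum Outcome { GOT_DATA_OR_EOF, READ_TIMEOUT }

    static Outcome readWithTimeout(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port. The server never calls accept() or writes;
        // the kernel still completes the TCP handshake (backlog 1), so only the read stalls.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 1_000);
            client.setSoTimeout(readTimeoutMillis); // a value of 0 would block forever
            try {
                client.getInputStream().read();
                return Outcome.GOT_DATA_OR_EOF;
            } catch (SocketTimeoutException e) {
                return Outcome.READ_TIMEOUT;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithTimeout(200)); // READ_TIMEOUT after roughly 200 ms
    }
}
```

Giving each failure mode its own outcome makes it possible to test what the UI does for a timeout separately from what it does for an empty but successful response.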
  12. VYSOR.IO

    “I wish there was a way to fly to Brazil and hook their phones to my computer to properly debug what is going on.”
  13. INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms

    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
    INFO: connection from chicisimo.api failed:SocketTimeoutException: Socket closed after 60000ms
  14. Send request, then:

    ▰ The server returns its UTC time
    ▰ The app logs the time taken by every request
    ▰ The app sends the server the real time frame
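
The following Java sketch shows one reading of the scheme above. The app times each request on the device with a monotonic clock, keeps the UTC timestamp the server returned, and queues a report to send back to the server with a later request. `RequestTimer`, `Response` and `TimingReport` are hypothetical names, and how reports are attached to requests (header, body field) is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical client-side timing: measure on the device, report the real elapsed time later.
final class RequestTimer {
    // What the app gets back; only the server's UTC timestamp matters here.
    static final class Response {
        final int status;
        final long serverUtcMillis;

        Response(int status, long serverUtcMillis) {
            this.status = status;
            this.serverUtcMillis = serverUtcMillis;
        }
    }

    // One measurement: which request, the server's UTC stamp, and the elapsed time seen by the app.
    static final class TimingReport {
        final String path;
        final long serverUtcMillis;
        final long elapsedMillis;

        TimingReport(String path, long serverUtcMillis, long elapsedMillis) {
            this.path = path;
            this.serverUtcMillis = serverUtcMillis;
            this.elapsedMillis = elapsedMillis;
        }

        @Override
        public String toString() {
            return path + " server_utc=" + serverUtcMillis + " took=" + elapsedMillis + "ms";
        }
    }

    // Monotonic clock (System::nanoTime in production); wall-clock time on a phone can jump.
    private final LongSupplier monotonicNanos;
    private final List<TimingReport> pending = new ArrayList<>();

    RequestTimer(LongSupplier monotonicNanos) {
        this.monotonicNanos = monotonicNanos;
    }

    Response timed(String path, Supplier<Response> call) {
        long start = monotonicNanos.getAsLong();
        Response response = call.get();
        long elapsedMillis = (monotonicNanos.getAsLong() - start) / 1_000_000L;
        pending.add(new TimingReport(path, response.serverUtcMillis, elapsedMillis));
        return response;
    }

    // Reports to piggyback on the next request; the queue is emptied once they are taken.
    List<TimingReport> drainReports() {
        List<TimingReport> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    public static void main(String[] args) {
        RequestTimer timer = new RequestTimer(System::nanoTime);
        timer.timed("/albums/82603", () -> new Response(200, System.currentTimeMillis()));
        System.out.println(timer.drainReports());
    }
}
```

Injecting the clock keeps the measurement testable with a fake time source.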
  15. # NumSamples = 453938; Min = 0.00; Max = 97885.00

    # Mean = 1642.091535; Variance = 18645635.742440; SD = 4318.059256; Median 765.000000
    # each ∎ represents a count of 5360
        0 -  2000 [402006]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (88.56%)
     2000 -  3000 [ 21361]: ∎∎∎ (4.71%)
     3000 -  4000 [  5650]: ∎ (1.24%)
     4000 -  5000 [  2682]: (0.59%)
     5000 -  6000 [  1575]: (0.35%)
     6000 -  7000 [  1514]: (0.33%)
     7000 -  8000 [   931]: (0.21%)
     8000 -  9000 [   620]: (0.14%)
     9000 - 10000 [   356]: (0.08%)
    10000 - 20000 [  1678]: (0.37%)
    20000 - 97885 [ 15565]: ∎∎ (3.43%)

    https://github.com/bitly/data_hacks
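
Here is a sketch of the bucketing behind a histogram like this one: fixed bucket edges, a count per bucket, and each bucket's share of the total. It imitates the shape of the data_hacks output but is not a port of that tool. `LatencyHistogram` is a hypothetical name, and the lower-inclusive bucket boundaries are a choice made for the sketch.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical fixed-edge histogram for request times.
final class LatencyHistogram {
    private final long[] upperEdges; // ascending, exclusive upper bounds; the last bucket is open-ended
    private final long[] counts;

    LatencyHistogram(long... upperEdges) {
        this.upperEdges = upperEdges.clone();
        this.counts = new long[upperEdges.length + 1];
    }

    void add(long millis) {
        int bucket = 0;
        while (bucket < upperEdges.length && millis >= upperEdges[bucket]) {
            bucket++;
        }
        counts[bucket]++;
    }

    int buckets() {
        return counts.length;
    }

    long total() {
        return Arrays.stream(counts).sum();
    }

    double percent(int bucket) {
        long total = total();
        return total == 0 ? 0.0 : 100.0 * counts[bucket] / total;
    }

    // One line per bucket: "lower - upper [count]: (share%)"
    String describe(int bucket) {
        String lower = bucket == 0 ? "0" : Long.toString(upperEdges[bucket - 1]);
        String upper = bucket < upperEdges.length ? Long.toString(upperEdges[bucket]) : "max";
        return String.format(Locale.ROOT, "%s - %s [%d]: (%.2f%%)", lower, upper, counts[bucket], percent(bucket));
    }

    public static void main(String[] args) {
        LatencyHistogram h = new LatencyHistogram(2000, 3000, 4000, 5000, 10000, 20000);
        for (long ms : new long[] {120, 765, 1999, 2000, 3500, 61000, 97885}) {
            h.add(ms);
        }
        for (int i = 0; i < h.buckets(); i++) {
            System.out.println(h.describe(i));
        }
    }
}
```

With the counts from the slide, the first bucket's share is 402006 / 453938 ≈ 88.56%, and the last bucket holds 15565 / 453938 ≈ 3.43% of the samples.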
  16. DRAWING OF THE WEB

    https://github.com/square/okhttp/issues/270
    ▰ Hidden API in OkHttp 3.8.0 for network introspection
    ▰ In discussion since 2013
    ▰ Available in OkHttp 3.9.0
  17. Jul 17 18:21:48 android chicisimo: INFO [Network] get '/albums/82603' server:255.255.255.255 dns:253ms ssl:752ms connection:670ms installation:1234512345 user:sefford

    Labelled fields: endpoint, server, DNS, SSL, connection.
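
The network introspection API that became available in OkHttp 3.9.0 is `okhttp3.EventListener`. It exposes callbacks around the phases logged above, such as `dnsStart`/`dnsEnd`, `secureConnectStart`/`secureConnectEnd` and `connectStart`/`connectEnd`. So that the example does not depend on OkHttp, the sketch below shows only the bookkeeping. A hypothetical `PhaseTimer` receives start and end events and produces a line shaped like the one on the slide, using an injectable monotonic clock (`System::nanoTime` in production).

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical per-call phase timer; listener callbacks would forward their start/end events here.
final class PhaseTimer {
    enum Phase { DNS, SSL, CONNECTION }

    private final LongSupplier nanoClock;
    private final Map<Phase, Long> starts = new EnumMap<>(Phase.class);
    private final Map<Phase, Long> durationsMillis = new EnumMap<>(Phase.class);

    PhaseTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    void start(Phase phase) {
        starts.put(phase, nanoClock.getAsLong());
    }

    void end(Phase phase) {
        Long begin = starts.remove(phase);
        if (begin != null) { // an end without a matching start is ignored
            durationsMillis.put(phase, (nanoClock.getAsLong() - begin) / 1_000_000L);
        }
    }

    // EnumMap iterates in declaration order, so phases always print as dns, ssl, connection.
    String format(String method, String path, String serverIp) {
        StringBuilder line = new StringBuilder("[Network] ")
                .append(method).append(" '").append(path).append("' server:").append(serverIp);
        for (Map.Entry<Phase, Long> e : durationsMillis.entrySet()) {
            line.append(' ').append(e.getKey().name().toLowerCase(Locale.ROOT))
                .append(':').append(e.getValue()).append("ms");
        }
        return line.toString();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer(System::nanoTime);
        timer.start(Phase.DNS);
        timer.end(Phase.DNS);
        System.out.println(timer.format("get", "/albums/82603", "127.0.0.1"));
    }
}
```

Keeping one timer per call means concurrent requests do not overwrite each other's measurements.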
  18. “Dude, this alpha you sent me is going like s**t, everything takes a century to load.”

    J. Barroso, Chicisimo’s Designer
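  19. THE ACTUAL FIX

    import java.net.Inet4Address;
    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;

    import okhttp3.Dns;

    // OkHttp Dns that returns IPv4 addresses before IPv6 ones, so OkHttp tries IPv4 routes first.
    // Registered with new OkHttpClient.Builder().dns(new IpV4PriorizerDns()).
    public class IpV4PriorizerDns implements Dns {

        public static final Comparator<InetAddress> INET_ADDRESS_COMPARATOR = new Comparator<InetAddress>() {
            @Override
            public int compare(InetAddress left, InetAddress right) {
                if (left instanceof Inet4Address) {
                    // Same family ranks equal, which keeps the comparator consistent
                    return right instanceof Inet4Address ? 0 : -1;
                } else if (right instanceof Inet4Address) {
                    return 1;
                }
                return 0;
            }
        };

        @Override
        public List<InetAddress> lookup(String hostname) throws UnknownHostException {
            if (hostname == null) {
                throw new UnknownHostException("hostname == null");
            }
            final List<InetAddress> inetAddresses =
                    new ArrayList<>(Arrays.asList(InetAddress.getAllByName(hostname)));
            // Stable sort: IPv4 first, resolver order preserved within each family
            Collections.sort(inetAddresses, INET_ADDRESS_COMPARATOR);
            return inetAddresses;
        }
    }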
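
The same IPv4-first ordering can be tried without OkHttp. This is a JDK-only sketch that uses a hypothetical helper class, `Ipv4First`. Each address gets a rank (0 for IPv4, 1 otherwise) and the comparator compares ranks, so it is consistent by construction. Because `List.sort` is stable, the resolver's order is kept within each family. The example uses IP literals, so no real DNS lookup takes place.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical JDK-only helper showing the IPv4-first ordering.
final class Ipv4First {
    // Rank 0 for IPv4, 1 for anything else (IPv6); equal ranks compare as equal.
    static final Comparator<InetAddress> IPV4_FIRST =
            Comparator.comparingInt(a -> a instanceof Inet4Address ? 0 : 1);

    static List<InetAddress> order(List<InetAddress> resolved) {
        List<InetAddress> sorted = new ArrayList<>(resolved);
        sorted.sort(IPV4_FIRST); // stable: keeps resolver order inside each family
        return sorted;
    }

    // getByName only validates the format when given an IP literal, so no DNS query is made.
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ip, e);
        }
    }

    public static void main(String[] args) {
        List<InetAddress> resolved = Arrays.asList(
                literal("2001:db8::1"), literal("192.0.2.10"), literal("2001:db8::2"), literal("192.0.2.20"));
        System.out.println(order(resolved));
    }
}
```

In an OkHttp client, this ordering would live in a `Dns.lookup` implementation registered through `OkHttpClient.Builder.dns(...)`.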
  20. 4 weeks of dev time: that’s a lot of money.

    Rating down by 0.03: we have only recovered 0.012 since then.
    15,656 net users lost: that’s a lot of users.
  21. Improved our tests, and our UX!

    ▰ Gained a lot of insight: if something happens, we’ll know it.
    ▰ Got a lot of tools, which increases the strategies we can use!
  22. WRAPPING UP

    ▰ Keep calm and be scientific
    ▰ Engineering comes from the Latin “ingenium”
    ▰ Failin’ fine
  23. THANKS! Any questions?

    You can find me at @sefford & [email protected]
    https://speakerdeck.com/sefford/breaking-bug
  24. CREDITS

    Special thanks to all the people who made and released these awesome resources for free:
    ▰ Presentation template by SlidesCarnival
    ▰ All the people in the Android community who helped during this endeavor