Waiting is boring

Filipe Freire
September 30, 2017

A talk about waiting in software testing: problems and solutions found while testing async scenarios in services with messaging systems, frontend apps, and distributed stream-processing apps.


Transcript

  1. Waiting is boring
    Filipe Freire - 30 September 2017

  2. Quick intro
    Husband, learner, tester, developer

    (new) OS contributor
    2y work as a “coding” tester

    1y work as a developer
    2
    Currently @

  3. Disclaimer
    This talk is about waiting in software testing.
    Expect an imperfect personal account on
    problems and solutions.
    3

  4. What you can take from this
    Tips on testing async scenarios in:
    A) services with messaging systems

    B) frontend apps

    C) distributed stream processing apps.
    4

  5. 1) Intro
    5

  6. –Venkat Subramaniam (twitter, 1 September 2017)
    “It’s foolish to wait forever for something to
    happen. Both in life and in asynchronous
    programming, timeouts are critical.”
    6

  7. 7

  8. What about software?
    (testing-wise)
    8

  9. Why do we test?
    9

  10. “To verify something

    can work”
    10

  11. (Or “Fact checks”)
    [Diagram labels: “Automated tests”, “Testing effort”]
    11

  12. 12
    source: martinfowler.com

  13. There’s more to it, but let’s focus
    on time
    13

  14. Syncing a test’s actions with its
    results: the typical first approach…
    14

  15. Thread.sleep(ms);
    usleep(microseconds);
    time.sleep(seconds)
    System.Threading.Thread.Sleep(msecs);
    15

  16. We let the computer wait
    a) 5 seconds
    b) 10 seconds
    c) 1 minute
    d) 1 hour

    16

  17. If we’re lucky with our “estimate”
    17

  18. 18

  19. It’s never just X seconds.
    19

  20. How do we avoid it?
    20

  21. 2) Example: API & messaging
    21

  22. The context
    Warehousing & Logistics project
    60 person project
    Tons of Microservices & Monoliths
    (different teams)
    Comms: REST and AMQP (RabbitMQ)
    22

  23. Typical “End-to-End”
    [Diagram: Producer → Exchange → Queue → Consumer, all live;
    event triggered on one side, expected state checked on the other]
    23

  24. Different teams & test tools
    Team A: design patterns, containers,
    dependency injection, magic
    Team B: java “scripts”, magic
    Shared: hammered sleeps
    24

  25. One more thing…
    Inverted test pyramid
    And the Hammered
    sleeps
    25

  26. “Inverted pyramid?

    Not my money!”
    26

  27. So the sleeps then…
    27

  28. “Typical day”
    20 different async comm flows
    each w/ 20 variations in payload

    (+ “inf” minor variations)
    28

  29. “Typical day”
    Some math:
    20 flows times 

    20 variations times 

    5 sec sleep =
    half hour (+/-)
    29

  30. “Typical day”
    Some math:
    20 flows times 

    20 variations times 

    10 sec sleep = (not uncommon)
    1 hour (+/-)
    30

  31. Solutions
    1. Invert the damn pyramid
    2. Don’t hammer a wait
    31

  32. Solutions
    1. Invert the damn pyramid
    2. Don’t hammer a sleep
    32

  33. How do we avoid a hammered
    sleep?
    33

  34. The “scholar”-way
    34

  35. The “scholar”-way
    35
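    The code shown on these two slides isn’t captured in the transcript. The “scholar” way is to write the polling loop yourself: check the condition at a short interval until it holds or a deadline passes. A minimal sketch in Java (the class and method names are mine, not the slide’s):

    ```java
    import java.util.function.BooleanSupplier;

    public class PollingWait {
        // Poll condition every pollMs until it holds or timeoutMs elapses.
        // Returns true as soon as the condition is met, false on timeout.
        public static boolean waitUntil(BooleanSupplier condition,
                                        long timeoutMs, long pollMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (true) {
                if (condition.getAsBoolean()) {
                    return true;              // met: no time wasted
                }
                if (System.currentTimeMillis() >= deadline) {
                    return false;             // give up: timeouts are critical
                }
                try {
                    Thread.sleep(pollMs);     // short poll, not a hammered sleep
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
    }
    ```

    A test then asserts on `waitUntil(...)` with, say, a 5 s timeout and 100 ms poll instead of sleeping a flat 5 seconds: the happy path returns in milliseconds, and only a genuine failure pays the full timeout.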

  36. The “do it for me”-way
    36

  37. The “do it for me”-way
    + AssertJ
    37
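    The slide’s code isn’t in the transcript. The usual “do it for me” library on the JVM is Awaitility, paired here with AssertJ; a sketch assuming both are on the classpath, with `OrderRepository` and the `"PROCESSED"` status invented for illustration:

    ```java
    import static org.assertj.core.api.Assertions.assertThat;
    import static org.awaitility.Awaitility.await;

    import java.time.Duration;
    import java.util.List;

    class OrderFlowTest {
        // Hypothetical read side -- stands in for whatever your consumer
        // writes to after handling the message.
        interface OrderRepository {
            List<String> findByStatus(String status);
        }

        void messageIsEventuallyProcessed(OrderRepository repository) {
            // Awaitility retries the assertion until it passes or 10 s
            // elapse; the test resumes the instant the state appears.
            await().atMost(Duration.ofSeconds(10))
                   .pollInterval(Duration.ofMillis(200))
                   .untilAsserted(() ->
                       assertThat(repository.findByStatus("PROCESSED")).isNotEmpty());
        }
    }
    ```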

  38. 3) Example: Frontend
    38

  39. The context
    Back-office projects ( >1 project)
    “Latest frameworks & tools”
    No FE unit tests. No FE tests.
    “Let’s make end-to-end FE tests”
    39

  40. Sorry, what?
    Back-office projects ( >1 project)
    “Latest frameworks & tools”
    No FE unit tests. No FE tests.
    “Let’s make end-to-end FE tests”
    40

  41. ?
    No unit tests.
    No tests.
    41

  42. ???
    No unit tests.
    No tests.
    42

  43. Scream, disappointed.
    43

  44. Now let’s pretend we don’t have
    that problem, and focus 

    on end-to-end tests
    44

  45. And the other problem
    Element visibility takes time
    Interactions take time
    Animations take time
    Requests take time
    45

  46. How do we fight this “time”?
    46

  47. The “spit upwards” approach
    >60 tests
    Hammer big waits
    Wait for everything to be ready
    Result: 2h test suite
    47

  48. Should we run the tests in 

    parallel?
    Not yet.
    48

  49. The “moderate” approach
    45 tests (clean-up)
    Small individual fixed waits
    Result: 40min test suite
    49

  50. The “surgeon” approach
    30 tests (keep the “good” ones)
    Small individual waits
    Waits are counter-like
    Result: 20min test suite
    50

  51. The “Duke Nukem” approach
    10 tests (less is more)
    Fluent waits
    Result: < 6min test suite
    51

  52. Fluent waits?
    “Implicit… explicit… mayonnaise”
    We don’t need to reinvent the
    wheel.
    Selenium libs have had this for ages.
    52

  53. Fluent waits!
    Note: If you’re using WebDriverWait

    you’re using FluentWait
    53
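    As the slide says, `WebDriverWait` is just a preconfigured `FluentWait`. A hedged sketch of configuring one directly (Selenium 4 API; the `result` element id is invented):

    ```java
    import java.time.Duration;

    import org.openqa.selenium.By;
    import org.openqa.selenium.NoSuchElementException;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.support.ui.FluentWait;

    class FluentWaitExample {
        // Poll for the element every 250 ms for up to 30 s, ignoring
        // "not found" while the page renders; a TimeoutException is
        // thrown only if it never shows up.
        WebElement waitForResult(WebDriver driver) {
            return new FluentWait<>(driver)
                    .withTimeout(Duration.ofSeconds(30))
                    .pollingEvery(Duration.ofMillis(250))
                    .ignoring(NoSuchElementException.class)
                    .until(d -> d.findElement(By.id("result"))); // id invented
        }
    }
    ```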

  54. Expected conditions
    Check the doc: https://goo.gl/J17NVD
    54
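    A sketch of using the built-in conditions the linked doc describes (Selenium 4 signatures; the `save` element id is invented):

    ```java
    import java.time.Duration;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    class ExpectedConditionsExample {
        void saveWhenReady(WebDriver driver) {
            WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
            // Built-in conditions cover visibility, clickability,
            // staleness, URL/title changes, frame availability...
            // no hand-rolled polling needed.
            wait.until(ExpectedConditions.elementToBeClickable(By.id("save"))) // id invented
                .click();
        }
    }
    ```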

  55. Google it today.
    It’s worth it.
    55

  56. 4) Example: Stream Processing

    (e.g. Apache Storm)
    56

  57. The context
    (new) project
    10 person team
    Process half a million
    critical messages per day
    This is just the
    beginning (MVP)
    57

  58. A recent problem
    Storm + Scala project
    1 small integration (e2e) test suite
    Critical sanity/regression checks
    Needs to take less than 2-3 min.
    How do we handle waits?
    58

  59. How a typical e2e test works
    [Diagram: event triggered → Topology → Consumer → expected state?]
    59

  60. Do we hammer 10 sec waits?
    Maybe 20 sec?
    60

  61. Nope. We use what we have at
    hand.
    61

  62. Scalatest’s “eventually”
    62

  63. How you can do it
    63
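    The slide’s code isn’t captured in the transcript. Scalatest’s `eventually` retries a block of assertions until it stops throwing or a timeout expires; the same mechanism in plain Java, as a sketch (all names are mine):

    ```java
    public class Eventually {
        // Retry an assertion block until it stops throwing, or rethrow the
        // last failure once timeoutMs has elapsed. Same shape as Scalatest's
        // eventually { ... }.
        public static void eventually(Runnable assertion,
                                      long timeoutMs, long pollMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (true) {
                try {
                    assertion.run();
                    return;                      // assertions passed
                } catch (AssertionError | RuntimeException e) {
                    if (System.currentTimeMillis() >= deadline) {
                        throw e;                 // out of time: surface the failure
                    }
                }
                try {
                    Thread.sleep(pollMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", ie);
                }
            }
        }
    }
    ```

    A test wraps its asserts in `eventually(...)`, so the suite pays only as much wall-clock time as each flow actually needs, which is what keeps the whole e2e suite under the 2-3 minute budget.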

  64. Why is time that important?
    64

  65. If e2e tests take 3 min,
    our current pipeline takes 10 min
    65

  66. If e2e tests take 6 min (+3),
    our CI/CD pipeline takes 13 min

    10 runs per day
    Time wasted per day: 30 mins
    Per month: 10 hours
    66

  67. 5) Conclusions
    67

  68. –Michael Bolton (twitter, September 2017)
    “If you’re not scrutinising test code, why do
    you trust it any more than your production
    code? Especially when no problems are
    reported?”
    68

  69. –Michael Bolton (twitter, September 2017)
    “Lesson: don’t blindly trust your test code
    or your automated checks, lest they fail to
    reveal important problems in production
    code.”
    69

  70. Smarter waits don’t solve…
    70

  71. Immutable poor culture
    71

  72. Immutable poor decisions
    72

  73. Waiting for change is boring.
    Stop waiting.
    73

  74. Thank you. Questions?
    filfreire filrfreire
    74
