
Nondeterministic Software for the Rest of Us

A talk given at BuildStuff 2018 in Vilnius, Lithuania.

Classically-trained (if you can call it that) software engineers are used to clear problem statements and clear success and acceptance criteria. Need a mobile front-end for your blog? Sure! Support instant messaging for a million concurrent users? No problem! Store and serve 50TB of JSON blobs? Presto!

Unfortunately, it turns out modern software often includes challenges that we have a hard time with: problems with no clear criteria for correctness, no easy way to measure performance, and where success is about more than green dashboards. Your blog platform had better have a spam filter, your instant messaging service has to have search, and your blobs will inevitably be fed into some data scientist's crazy contraption.

In this talk I'll share my experiences of learning to deal with nondeterministic problems, what made the process easier for me, and what I've learned along the way. With any luck, you'll have an easier time of it!

Tomer Gabel

November 14, 2018
Transcript

  1. NONDETERMINISTIC SOFTWARE
    FOR THE REST OF US
    An exercise in frustration by Tomer Gabel
    BuildStuff 2018, Lithuania
    Follow along!
    https://tinyurl.com/nondeterminism

  2. Case Study #1
    • Delver, circa 2007
    • We built a search engine
    • What’s expected?
    – Performant (<1 sec)
    – Reliable
    – Useful

  3. Let me take you back…
    Spec
    Tests
    Code
    Deployment
    • We applied good old-fashioned engineering
    • It was kind of great!
    – Reliability
    – Fast iteration
    – Built-in regression suite

  4. Let me take you back…
    • So yeah, we coded it
    • And it worked… sort of
    – It was highly available
    – It responded within SLA
    – … but with crap results
    • Green tests aren’t
    everything!

  5. Furthermore
    • Not all software can be
    acceptance-tested
    – Qualitative/subjective
    (e.g. search, social feed)

  6. Furthermore
    • Not all software can be
    acceptance-tested
    – Qualitative/subjective
    (e.g. search, social feed)
    – Huge input space
    (e.g. machine vision)
    Image: Cristian David

  7. Furthermore
    • Not all software can be
    acceptance-tested
    – Qualitative/subjective
    (e.g. search, social feed)
    – Huge input space
    (e.g. machine vision)
    – Resource-constrained
    (e.g. Lyft or Uber)
    Image: rideshareapps.com

  8. “CORRECT” AND “GOOD”
    ARE SEPARATE DIMENSIONS
    Takeaway #1

  9. Getting Started
    • For any product of any
    scale, always ask:
    – What does success look like?
    Image: Hole in the Wall, FremantleMedia North America

  10. Getting Started
    • For any product of any
    scale, always ask:
    – What does success look like?
    – How can I measure success?
    Image: Hole in the Wall, FremantleMedia North America

  11. Getting Started
    • For any product of any
    scale, always ask:
    – What does success look like?
    – How can I measure success?
    • You’re an engineer!
    – Intuition can’t replace data
    – QA can’t save your butt
    Image: Hole in the Wall, FremantleMedia North America

  12. What should you measure?
    • (Un)fortunately, you
    have customers
    • Analyze their behavior
    – What do they want?
    – What influences your
    quality of service?
    • For a search engine…
    (Diagram: user flow Query → Skim → Decide → Follow, with Refinement and Paging loops)

  13. USERS ARE PART OF YOUR SYSTEM
    Takeaway #2

  14. What should you measure?
    • (Un)fortunately, you
    have customers
    • Analyze their behavior
    – What do they want?
    – What influences your
    quality of service?
    • For a search engine…
    (Diagram: user flow Query → Skim → Decide → Follow, with Refinement and Paging loops; three of the steps marked as signals)

  15. What should you measure?
    Paging
    – “Not relevant enough”
    (Diagram: user flow Query → Skim → Decide → Follow, with Refinement and Paging loops)

  16. What should you measure?
    Paging
    – “Not relevant enough”
    Refinement
    – “Not what I meant”
    (Diagram: user flow Query → Skim → Decide → Follow, with Refinement and Paging loops)

  17. What should you measure?
    Paging
    – “Not relevant enough”
    Refinement
    – “Not what I meant”
    Clickthrough
    – “Bingo!”
    (Diagram: user flow Query → Skim → Decide → Follow, with Refinement and Paging loops)

  18. What should you measure?
    Paging
    – “Not relevant enough”
    Refinement
    – “Not what I meant”
    Clickthrough
    – “Bingo!”
    Bonus: Abandonment
    – “You suck”
    (Diagram: user flow Query → Skim → Decide → Follow, with Refinement and Paging loops)
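The signal taxonomy above (paging, refinement, clickthrough, abandonment) can be sketched as a per-session classifier over a query log. This is an illustrative sketch, not the talk's actual system; the event names and session shape are invented for the example.

```python
def classify_session(events):
    """Map one search session's event sequence to a quality signal.

    Precedence matters: a click means the user eventually found
    something, regardless of how much paging/refining came first.
    """
    kinds = [e["kind"] for e in events]
    if "click" in kinds:
        return "clickthrough"   # "Bingo!"
    if "refine" in kinds:
        return "refinement"     # "Not what I meant"
    if "page" in kinds:
        return "paging"         # "Not relevant enough"
    return "abandonment"        # "You suck" - the user gave up

sessions = [
    [{"kind": "query"}, {"kind": "page"}, {"kind": "click"}],
    [{"kind": "query"}, {"kind": "refine"}],
    [{"kind": "query"}],
]
print([classify_session(s) for s in sessions])
# -> ['clickthrough', 'refinement', 'abandonment']
```

Aggregating these labels over all sessions turns user behavior into the continuous quality metric the deck argues for.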

  19. Is this starting to look familiar?
    It should.

  20. Well now!
    • We’ve been having this
    conversation for years
    • Mostly with…
    – Product managers
    – Business analysts
    – Data engineers
    • Guess what?
    (Diagram: cycle of Product Changes → R&D → Deployment → Measurement → Analysis)

  21. Well now!
    • We’ve been having this
    conversation for years
    • Mostly with…
    – Product managers
    – Business analysts
    – Data engineers
    • Guess what?
    (Diagram: cycle of Product Changes → R&D → Deployment → Measurement → Analysis, informed by BI)

  22. What can we learn from
    BI?
    • Analysis
    • Experimentation
    • Iteration
    → Be mindful of your users
    → Talk to your analysts!

  23. What can we learn from
    BI?
    • Analysis
    • Experimentation
    • Iteration
    → Invest in A/B tests
    → Prove your improvements!

  24. What can we learn from
    BI?
    • Analysis
    • Experimentation
    • Iteration
    → Establish your baseline
    → Invest in metric collection
    and dashboards

  25. SYSTEMS ARE NOT SNAPSHOTS.
    MEASURE CONTINUOUSLY
    Takeaway #3

  26. Hold on to your hats
    … this isn’t about search engines

  27. Case Study #2
    • newBrandAnalytics,
    circa 2011
    • A social listening platform
    – Finds user-generated
    content (e.g. reviews)
    – Provides operational
    analytics

  28. Social Listening Platform
    • A three-stage pipeline
    Acquisition: 3rd-party ingestion, BizDev, web scraping
    Analysis: manual tagging/training, NLP/ML models
    Analytics: dashboards, ad-hoc query/drilldown, reporting

  29. Social Listening Platform
    • A three-stage pipeline
    • My team focused on data
    acquisition
    • Let’s discuss web scraping
    – Structured data extraction
    – At scale
    – Reliability is paramount
    Acquisition: 3rd-party ingestion, BizDev, web scraping
    Analysis: manual tagging/training, NLP/ML models
    Analytics: dashboards, ad-hoc query/drilldown, reporting

  30. Large-Scale Scraping
    • A two-pronged problem
    • Target sites…
    – Can change at the drop of a hat
    – Actively resist scraping!
    • Both are external constraints
    • Neither can be unit-tested

  31. Optimizing for User
    Happiness
    • Users consume reviews
    • What do they want?
    – Completeness
    (no missed reviews)
    – Correctness
    (no duplicates/garbage)
    – Timeliness
    (near real-time)
    (Diagram: TripAdvisor, Twitter, Yelp → Data Acquisition → Data Lake → Reports, Notifications)

  32. Putting It Together
    • How do we measure
    completeness?
    • Manually
    – Costly, time consuming
    – Sampled (by definition)
    Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)

  33. Putting It Together
    • How do we measure
    completeness?
    • Manually
    – Costly, time consuming
    – Sampled (by definition)
    • Automatically
    – Re-scrape a known subset
    – Produce similarity score

  34. Putting It Together
    • How do we measure
    completeness?
    • Manually
    – Costly, time consuming
    – Sampled (by definition)
    • Automatically
    – Re-scrape a known subset
    – Produce similarity score
    • Same with correctness
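The re-scrape approach can be sketched as a set-similarity score: compare the review IDs from a hand-verified reference crawl against what the scraper re-acquired. The Jaccard index below is one reasonable choice, not necessarily what newBrandAnalytics used; the IDs are invented.

```python
def completeness_score(reference_ids, scraped_ids):
    """Jaccard similarity between a verified reference set of review IDs
    and the re-scraped set. 1.0 means a perfect match; misses AND extras
    (duplicates, garbage) both drag the score down, so the same metric
    also speaks to correctness.
    """
    ref, got = set(reference_ids), set(scraped_ids)
    if not ref and not got:
        return 1.0
    return len(ref & got) / len(ref | got)

reference = {"r1", "r2", "r3", "r4"}
rescraped = {"r1", "r2", "r3", "r5"}   # missed r4, picked up garbage r5
print(completeness_score(reference, rescraped))  # -> 0.6
```

Tracked over time per target site, a score like this turns "did the scraper silently break?" into an alertable metric.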

  35. Putting It Together
    • Targets do not want
    to be scraped
    • Major sites employ:
    – IP throttling
    – Traffic fingerprinting
    • 3rd party proxies are
    expensive
    Image from the movie “UHF", Metro-Goldwyn-Mayer

  36. Putting It Together
    • What of timeliness?
    • It’s an optimization
    problem
    – Polling frequency
    determines latency
    – But polling has a cost
    – “Good” is a tradeoff

  37. Putting It Together
    • So then, timeliness…?
    • First, build a cost
    model
    – Review acquisition cost
    – Break it down by source
    • Next, put together SLAs
    – Reflect cost in pricing!
    – Adjust scheduler by SLA
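A minimal sketch of such a cost model, assuming a per-poll cost and a daily budget per source (the function, parameters, and all numbers are invented for illustration): poll as often as the SLA demands, but never more often than the budget permits.

```python
def polling_interval_hours(sla_hours, cost_per_poll, daily_budget):
    """Pick a polling interval: the SLA sets the desired latency,
    but the budget sets a floor on how often we can afford to poll."""
    budget_floor = 24 * cost_per_poll / daily_budget  # min affordable interval
    return max(sla_hours, budget_floor)

sources = {
    # Cheap API: the 1h SLA is affordable, so the SLA wins
    "cheap_api":     {"sla_hours": 1, "cost_per_poll": 0.01, "daily_budget": 1.0},
    # Expensive scrape: budget forces an 8h interval, blowing the 4h SLA -
    # exactly why acquisition cost has to be reflected in pricing
    "pricey_scrape": {"sla_hours": 4, "cost_per_poll": 2.00, "daily_budget": 6.0},
}
for name, source in sources.items():
    print(name, polling_interval_hours(**source))
```

When the computed interval exceeds the promised SLA, either the budget or the price is wrong, which is the scheduler-meets-pricing tradeoff the slide describes.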

  38. Recap
  1. “Correct” and “Good” are
    separate dimensions
    2. Users are part of your
    system
    3. Systems are not
    snapshots.
    Measure continuously
    Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)

  39. QUESTIONS?
    Thank you for listening
    [email protected]
    @tomerg
    http://www.tomergabel.com
    This work is licensed under a Creative
    Commons Attribution-ShareAlike 4.0
    International License.
