Nondeterministic Software for the Rest of Us

Tomer Gabel
November 14, 2018

A talk given at BuildStuff 2018 in Vilnius, Lithuania.

Classically-trained (if you can call it that) software engineers are used to clear problem statements and clear success and acceptance criteria. Need a mobile front-end for your blog? Sure! Support instant messaging for a million concurrent users? No problem! Store and serve 50TB of JSON blobs? Presto!

Unfortunately, it turns out modern software often includes challenges that we have a hard time with: problems with no clear criteria for correctness, no easy way to measure performance, and where success is about more than green dashboards. Your blog platform had better have a spam filter, your instant messaging service has to have search, and your blobs will inevitably be fed into some data scientist's crazy contraption.

In this talk I'll share my experience of learning to deal with nondeterministic problems, what made the process easier for me, and what I've learned along the way. With any luck, you'll have an easier time of it!

Transcript

  1. NONDETERMINISTIC SOFTWARE FOR THE REST OF US An exercise in frustration by Tomer Gabel BuildStuff 2018, Lithuania Follow along! https://tinyurl.com/nondeterminism
  2. Case Study #1 • Delver, circa 2007 • We built a search engine • What’s expected? – Performant (<1 sec) – Reliable – Useful
  3. Let me take you back… Spec Tests Code Deployment • We applied good old fashioned engineering • It was kind of great! – Reliability – Fast iteration – Built-in regression suite
  4. Let me take you back… • So yeah, we coded it • And it worked… sort of – It was highly available – It responded within SLA – … but with crap results • Green tests aren’t everything!
  5. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed) – Huge input space (e.g. machine vision) Image: Cristian David
  6. Furthermore • Not all software can be acceptance-tested – Qualitative/subjective (e.g. search, social feed) – Huge input space (e.g. machine vision) – Resource-constrained (e.g. Lyft or Uber) Image: rideshareapps.com
  7. Getting Started • For any product of any scale, always ask: – What does success look like? Image: Hole in the Wall, FremantleMedia North America
  8. Getting Started • For any product of any scale, always ask: – What does success look like? – How can I measure success? Image: Hole in the Wall, FremantleMedia North America
  9. Getting Started • For any product of any scale, always ask: – What does success look like? – How can I measure success? • You’re an engineer! – Intuition can’t replace data – QA can’t save your butt Image: Hole in the Wall, FremantleMedia North America
  10. What should you measure? • (Un-)fortunately, you have customers • Analyze their behavior – What do they want? – What influences your quality of service? • For a search engine… Query Skim Decide Follow Refinement Paging
  11. What should you measure? • (Un-)fortunately, you have customers • Analyze their behavior – What do they want? – What influences your quality of service? • For a search engine… Query Skim Decide Follow Refinement Paging Signal Signal Signal
  12. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Query Skim Decide Follow Refinement Paging
  13. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Clickthrough – “Bingo!” Query Skim Decide Follow Refinement Paging
  14. What should you measure? Paging – “Not relevant enough” Refinement – “Not what I meant” Clickthrough – “Bingo!” Bonus: Abandonment – “You suck” Query Skim Decide Follow Refinement Paging
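
As an aside, here is a minimal sketch (my own illustration, not from the deck) of turning the behavioral signals on the slides above into measurable search-quality metrics. The event names and log format are hypothetical; a real pipeline would read them from an analytics store.

    # Hypothetical session log: each session is a list of events with a "type" field.
    from collections import Counter

    def session_signal(events):
        """Classify a search session by its strongest quality signal."""
        kinds = {event["type"] for event in events}
        if "click_result" in kinds:
            return "clickthrough"  # "Bingo!"
        if "refine_query" in kinds:
            return "refinement"    # "Not what I meant"
        if "next_page" in kinds:
            return "paging"        # "Not relevant enough"
        return "abandonment"       # "You suck"

    def quality_report(sessions):
        """Aggregate signal rates across many sessions."""
        counts = Counter(session_signal(s) for s in sessions)
        total = sum(counts.values()) or 1
        return {signal: count / total for signal, count in counts.items()}

    # Two example sessions: one ends in a click, the other is abandoned.
    sessions = [
        [{"type": "query"}, {"type": "next_page"}, {"type": "click_result"}],
        [{"type": "query"}],
    ]
    print(quality_report(sessions))  # {'clickthrough': 0.5, 'abandonment': 0.5}
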
  15. Well now! • We’ve been having this conversation for years • Mostly with… – Product managers – Business analysts – Data engineers • Guess what? Product Changes R&D Deployment Measurement Analysis
  16. Well now! • We’ve been having this conversation for years • Mostly with… – Product managers – Business analysts – Data engineers • Guess what? Product Changes R&D Deployment Measurement Analysis Informed by BI
  17. What can we learn from BI? • Analysis • Experimentation • Iteration → Be mindful of your users → Talk to your analysts!
  18. What can we learn from BI? • Analysis • Experimentation • Iteration → Invest in A/B tests → Prove your improvements!
  19. What can we learn from BI? • Analysis • Experimentation • Iteration → Establish your baseline → Invest in metric collection and dashboards
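
To make "prove your improvements" concrete, here is a small sketch of an A/B comparison; the test choice (two-proportion z-test on clickthrough rates) and all figures are my assumptions for illustration, not something prescribed in the talk.

    # Compare clickthrough rates of control (A) and variant (B).
    from math import sqrt
    from statistics import NormalDist

    def two_proportion_z(successes_a, total_a, successes_b, total_b):
        """Return (absolute lift, two-sided p-value) for variant B vs. control A."""
        p_a = successes_a / total_a
        p_b = successes_b / total_b
        pooled = (successes_a + successes_b) / (total_a + total_b)
        se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return p_b - p_a, p_value

    # Made-up numbers: 4.20% vs. 4.65% clickthrough over 10,000 sessions each.
    lift, p = two_proportion_z(successes_a=420, total_a=10_000,
                               successes_b=465, total_b=10_000)
    print(f"lift={lift:+.4f}, p={p:.3f}")  # only ship if the lift is statistically real
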
  20. Case Study #2 • newBrandAnalytics, circa 2011 • A social listening platform – Finds user-generated content (e.g. reviews) – Provides operational analytics
  21. Social Listening Platform • A three-stage pipeline Acquisition • 3rd party ingestion • BizDev • Web scraping Analysis • Manual tagging/training • NLP/ML models Analytics • Dashboards • Ad-hoc query/drilldown • Reporting
  22. Social Listening Platform • A three-stage pipeline • My team focused on data acquisition • Let’s discuss web scraping – Structured data extraction – At scale – Reliability is paramount Acquisition • 3rd party ingestion • BizDev • Web scraping Analysis • Manual tagging/training • NLP/ML models Analytics • Dashboards • Ad-hoc query/drilldown • Reporting
  23. Large-Scale Scraping • A two-pronged problem • Target sites… – Can change at the drop of a hat – Actively resist scraping! • Both are external constraints • Neither can be unit-tested
  24. Optimizing for User Happiness • Users consume reviews • What do they want? – Completeness (no missed reviews) – Correctness (no duplicates/garbage) – Timeliness (near real-time) TripAdvisor Twitter Yelp … Data Acquisition Reports Notifications Data Lake
  25. Putting It Together • How do we measure completeness? • Manually – Costly, time consuming – Sampled (by definition) Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)
  26. Putting It Together • How do we measure completeness? • Manually – Costly, time consuming – Sampled (by definition) • Automatically – Re-scrape a known subset – Produce similarity score
  27. Putting It Together • How do we measure completeness? • Manually – Costly, time consuming – Sampled (by definition) • Automatically – Re-scrape a known subset – Produce similarity score • Same with correctness
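
A rough sketch of the "re-scrape a known subset and produce a similarity score" idea; the scoring details below are my assumption, not the platform's actual algorithm. It compares the review IDs already stored against a fresh, trusted scrape of the same venue.

    def coverage_scores(stored_ids, rescraped_ids):
        """Completeness: share of freshly scraped reviews we already have.
        Correctness proxy: share of stored reviews confirmed by the re-scrape."""
        stored, fresh = set(stored_ids), set(rescraped_ids)
        overlap = stored & fresh
        completeness = len(overlap) / len(fresh) if fresh else 1.0
        correctness = len(overlap) / len(stored) if stored else 1.0
        return completeness, correctness

    stored = ["r1", "r2", "r3", "r5"]   # what the acquisition pipeline captured
    fresh = ["r1", "r2", "r3", "r4"]    # trusted re-scrape of the same venue
    print(coverage_scores(stored, fresh))  # (0.75, 0.75)
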
  28. Putting It Together • Targets do not want to be scraped • Major sites employ: – IP throttling – Traffic fingerprinting • 3rd party proxies are expensive Image from the movie “UHF”, Metro-Goldwyn-Mayer
  29. Putting It Together • What of timeliness? • It’s an optimization problem – Polling frequency determines latency – But polling has a cost – “Good” is a tradeoff
  30. Putting It Together • So then, timeliness…? • First, build a cost model – Review acquisition cost – Break it down by source • Next, put together SLAs – Reflect cost in pricing! – Adjust scheduler by SLA
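
A back-of-the-envelope sketch of that trade-off; the scheduling rule and the numbers are my assumptions, not the actual scheduler. It picks the longest (cheapest) polling interval that still meets a source's latency SLA, and checks that it fits the per-source budget implied by the cost model.

    def polling_interval(sla_minutes, cost_per_poll, budget_per_day):
        """Worst-case latency of a poll-based scraper is one full interval,
        so the interval must not exceed the SLA; it must also fit the budget."""
        min_affordable_interval = 24 * 60 * cost_per_poll / budget_per_day
        if min_affordable_interval > sla_minutes:
            raise ValueError("SLA not achievable within budget: reprice or relax it")
        return sla_minutes  # poll just often enough; polling more often only adds cost

    # Example: a premium source with a 30-minute SLA, $0.02 per poll, $5/day budget.
    interval = polling_interval(sla_minutes=30, cost_per_poll=0.02, budget_per_day=5.0)
    print(interval, (24 * 60 / interval) * 0.02)  # 30 minutes at roughly $0.96/day
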
  31. Recap 1. “Correct” and “Good” are separate dimensions 2. Users are part of your system 3. Systems are not snapshots. Measure continuously Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)
  32. QUESTIONS? Thank you for listening [email protected] @tomerg http://www.tomergabel.com This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.