Slide 1

NONDETERMINISTIC SOFTWARE FOR THE REST OF US
An exercise in frustration by Tomer Gabel
BuildStuff 2018, Lithuania
Follow along! https://tinyurl.com/nondeterminism

Slide 2

Case Study #1
• Delver, circa 2007
• We built a search engine
• What’s expected?
  – Performant (<1 sec)
  – Reliable
  – Useful

Slide 3

Let me take you back…
[Diagram: Spec → Tests → Code → Deployment]
• We applied good old-fashioned engineering
• It was kind of great!
  – Reliability
  – Fast iteration
  – Built-in regression suite

Slide 4

Let me take you back…
• So yeah, we coded it
• And it worked… sort of
  – It was highly available
  – It responded within SLA
  – … but with crap results
• Green tests aren’t everything!

Slide 5

Furthermore
• Not all software can be acceptance-tested
  – Qualitative/subjective (e.g. search, social feed)

Slide 6

Furthermore
• Not all software can be acceptance-tested
  – Qualitative/subjective (e.g. search, social feed)
  – Huge input space (e.g. machine vision)
Image: Cristian David

Slide 7

Furthermore
• Not all software can be acceptance-tested
  – Qualitative/subjective (e.g. search, social feed)
  – Huge input space (e.g. machine vision)
  – Resource-constrained (e.g. Lyft or Uber)
Image: rideshareapps.com

Slide 8

“CORRECT” AND “GOOD” ARE SEPARATE DIMENSIONS
Takeaway #1

Slide 9

Getting Started
• For any product of any scale, always ask:
  – What does success look like?
Image: Hole in the Wall, FremantleMedia North America

Slide 10

Getting Started
• For any product of any scale, always ask:
  – What does success look like?
  – How can I measure success?
Image: Hole in the Wall, FremantleMedia North America

Slide 11

Getting Started
• For any product of any scale, always ask:
  – What does success look like?
  – How can I measure success?
• You’re an engineer!
  – Intuition can’t replace data
  – QA can’t save your butt
Image: Hole in the Wall, FremantleMedia North America

Slide 12

What should you measure?
• (Un)fortunately, you have customers
• Analyze their behavior
  – What do they want?
  – What influences your quality of service?
• For a search engine…
[Diagram: Query → Skim → Decide → Follow, with Refinement and Paging paths]

Slide 13

USERS ARE PART OF YOUR SYSTEM
Takeaway #2

Slide 14

What should you measure?
• (Un)fortunately, you have customers
• Analyze their behavior
  – What do they want?
  – What influences your quality of service?
• For a search engine…
[Diagram: same flow, with Refinement, Paging and Follow each marked as a signal]

Slide 15

What should you measure?
Paging – “Not relevant enough”
[Diagram: Query → Skim → Decide → Follow, with Refinement and Paging paths]

Slide 16

What should you measure?
Paging – “Not relevant enough”
Refinement – “Not what I meant”
[Diagram: Query → Skim → Decide → Follow, with Refinement and Paging paths]

Slide 17

What should you measure?
Paging – “Not relevant enough”
Refinement – “Not what I meant”
Clickthrough – “Bingo!”
[Diagram: Query → Skim → Decide → Follow, with Refinement and Paging paths]

Slide 18

What should you measure?
Paging – “Not relevant enough”
Refinement – “Not what I meant”
Clickthrough – “Bingo!”
Bonus: Abandonment – “You suck”
[Diagram: Query → Skim → Decide → Follow, with Refinement and Paging paths]
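
These signals fall out of ordinary clickstream events. Below is a minimal sketch in Python, assuming hypothetical event kinds ("query", "page", "click") and a per-session event log; the precedence between signals is my own judgment call, not something the talk prescribes.

    from dataclasses import dataclass

    @dataclass
    class Event:
        session_id: str
        kind: str        # "query", "page" (next results page) or "click"
        query: str = ""

    def classify(events: list[Event]) -> str:
        """Reduce one session's event log to a single quality signal."""
        kinds = [e.kind for e in events]
        queries = {e.query for e in events if e.kind == "query"}
        if "click" in kinds:
            return "clickthrough"  # "Bingo!"
        if len(queries) > 1:
            return "refinement"    # "Not what I meant"
        if "page" in kinds:
            return "paging"        # "Not relevant enough"
        return "abandonment"       # "You suck"

    session = [Event("s1", "query", "build stuff"),
               Event("s1", "page"),
               Event("s1", "query", "buildstuff 2018")]
    print(classify(session))  # refinement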

Slide 19

Is this starting to look familiar? It should.

Slide 20

Well now!
• We’ve been having this conversation for years
• Mostly with…
  – Product managers
  – Business analysts
  – Data engineers
• Guess what?
[Diagram: Product Changes → R&D → Deployment → Measurement → Analysis, and around again]

Slide 21

Well now!
• We’ve been having this conversation for years
• Mostly with…
  – Product managers
  – Business analysts
  – Data engineers
• Guess what?
[Diagram: Product Changes → R&D → Deployment → Measurement → Analysis, and around again, informed by BI]

Slide 22

What can we learn from BI?
• Analysis
• Experimentation
• Iteration
➢ Be mindful of your users
➢ Talk to your analysts!

Slide 23

What can we learn from BI?
• Analysis
• Experimentation
• Iteration
➢ Invest in A/B tests
➢ Prove your improvements!
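
To make "prove your improvements" concrete: a minimal sketch of a two-proportion z-test comparing clickthrough rates between control (A) and treatment (B). The function name and all numbers are illustrative, not from the talk.

    from math import sqrt
    from statistics import NormalDist

    def ab_significance(conv_a, total_a, conv_b, total_b):
        """Uplift and two-tailed p-value for B vs. A conversion rates."""
        p_a, p_b = conv_a / total_a, conv_b / total_b
        pooled = (conv_a + conv_b) / (total_a + total_b)
        se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
        return p_b - p_a, p_value

    uplift, p = ab_significance(conv_a=420, total_a=10_000,
                                conv_b=480, total_b=10_000)
    print(f"uplift={uplift:.4%}, p={p:.3f}")  # ship only if p clears your bar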

Slide 24

What can we learn from BI?
• Analysis
• Experimentation
• Iteration
➢ Establish your baseline
➢ Invest in metric collection and dashboards
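
One way to "establish your baseline", sketched under my own assumptions (a rolling window and a 3-sigma threshold; the talk prescribes neither): track a metric continuously and flag samples that fall far outside recent history.

    from collections import deque
    from statistics import mean, stdev

    class Baseline:
        """Rolling baseline of a metric; flags samples far outside it."""
        def __init__(self, window: int = 30):
            self.samples = deque(maxlen=window)

        def observe(self, value: float) -> bool:
            """Record a sample; return True if it breaks the baseline."""
            deviant = False
            if len(self.samples) >= 2:
                mu, sigma = mean(self.samples), stdev(self.samples)
                deviant = sigma > 0 and abs(value - mu) > 3 * sigma
            self.samples.append(value)
            return deviant

    ctr = Baseline()
    for day in [0.042, 0.043, 0.041, 0.044, 0.042, 0.031]:
        if ctr.observe(day):
            print(f"CTR {day} broke baseline")  # fires on the last sample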

Slide 25

SYSTEMS ARE NOT SNAPSHOTS. MEASURE CONTINUOUSLY
Takeaway #3

Slide 26

Hold on to your hats
… this isn’t about search engines

Slide 27

Case Study #2
• newBrandAnalytics, circa 2011
• A social listening platform
  – Finds user-generated content (e.g. reviews)
  – Provides operational analytics

Slide 28

Social Listening Platform
• A three-stage pipeline
[Diagram: Acquisition (3rd-party ingestion, BizDev, web scraping) → Analysis (manual tagging/training, NLP/ML models) → Analytics (dashboards, ad-hoc query/drilldown, reporting)]

Slide 29

Social Listening Platform
• A three-stage pipeline
• My team focused on data acquisition
• Let’s discuss web scraping
  – Structured data extraction
  – At scale
  – Reliability is paramount
[Diagram: Acquisition (3rd-party ingestion, BizDev, web scraping) → Analysis (manual tagging/training, NLP/ML models) → Analytics (dashboards, ad-hoc query/drilldown, reporting)]

Slide 30

Large-Scale Scraping
• A two-pronged problem
• Target sites…
  – Can change at the drop of a hat
  – Actively resist scraping!
• Both are external constraints
• Neither can be unit-tested

Slide 31

Optimizing for User Happiness
• Users consume reviews
• What do they want?
  – Completeness (no missed reviews)
  – Correctness (no duplicates/garbage)
  – Timeliness (near real-time)
[Diagram: TripAdvisor, Twitter, Yelp, … → Data Acquisition → Reports, Notifications, Data Lake]

Slide 32

Putting It Together
• How do we measure completeness?
• Manually
  – Costly, time-consuming
  – Sampled (by definition)
Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)

Slide 33

Putting It Together
• How do we measure completeness?
• Manually
  – Costly, time-consuming
  – Sampled (by definition)
• Automatically
  – Re-scrape a known subset
  – Produce similarity score

Slide 34

Putting It Together
• How do we measure completeness?
• Manually
  – Costly, time-consuming
  – Sampled (by definition)
• Automatically
  – Re-scrape a known subset
  – Produce similarity score
• Same with correctness
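
A minimal sketch of that automatic check: re-scrape a known subset, then score it against the stored copy. The review IDs are made up, and Jaccard similarity is one reasonable scoring choice, not necessarily the one we used.

    def similarity(fresh: set[str], stored: set[str]) -> float:
        """Jaccard similarity between a re-scraped sample and the stored copy."""
        if not (fresh | stored):
            return 1.0
        return len(fresh & stored) / len(fresh | stored)

    stored = {"rev-1", "rev-2", "rev-3", "rev-4"}  # what the data lake holds
    fresh = {"rev-1", "rev-2", "rev-4"}            # the re-scrape missed rev-3
    print(f"completeness ~ {similarity(fresh, stored):.2f}")  # 0.75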

Slide 35

Putting It Together
• Targets do not want to be scraped
• Major sites employ:
  – IP throttling
  – Traffic fingerprinting
• 3rd-party proxies are expensive
Image from the movie “UHF”, Metro-Goldwyn-Mayer

Slide 36

Putting It Together
• What of timeliness?
• It’s an optimization problem
  – Polling frequency determines latency
  – But polling has a cost
  – “Good” is a tradeoff

Slide 37

Putting It Together
• So then, timeliness…?
• First, build a cost model
  – Review acquisition cost
  – Break it down by source
• Next, put together SLAs
  – Reflect cost in pricing!
  – Adjust scheduler by SLA
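
A minimal sketch of what "adjust scheduler by SLA" might look like, with made-up sources, SLAs and per-poll costs: a poller's worst-case detection latency is one full interval, so polling at least as often as the SLA latency keeps you within bounds, and the cost model prices that choice per source.

    def polling_interval_hours(sla_latency_hours: float) -> float:
        # Worst-case detection latency equals one polling interval.
        return sla_latency_hours

    def monthly_cost(cost_per_poll: float, interval_hours: float) -> float:
        polls_per_month = 30 * 24 / interval_hours
        return cost_per_poll * polls_per_month

    for source, sla_hours, cost in [("source-a", 24, 0.05), ("source-b", 4, 0.08)]:
        interval = polling_interval_hours(sla_hours)
        print(f"{source}: poll every {interval}h, "
              f"~${monthly_cost(cost, interval):.2f}/month")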

Slide 38

Recap
1. “Correct” and “Good” are separate dimensions
2. Users are part of your system
3. Systems are not snapshots. Measure continuously
Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)

Slide 39

QUESTIONS?
Thank you for listening
tomer@tomergabel.com
@tomerg
http://www.tomergabel.com
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.