NONDETERMINISTIC SOFTWARE
FOR THE REST OF US
An exercise in frustration by Tomer Gabel
BuildStuff 2018, Lithuania
Follow along!
https://tinyurl.com/nondeterminism
Slide 2
Case Study #1
• Delver, circa 2007
• We built a search engine
• What’s expected?
– Performant (<1 sec)
– Reliable
– Useful
Slide 3
Let me take you back…
[Diagram: Spec → Tests → Code → Deployment]
• We applied good old-fashioned engineering
• It was kind of great!
– Reliability
– Fast iteration
– Built-in regression suite
Slide 4
Let me take you back…
• So yeah, we coded it
• And it worked… sort of
– It was highly available
– It responded within SLA
– … but with crap results
• Green tests aren’t
everything!
Slide 5
Furthermore
• Not all software can be
acceptance-tested
– Qualitative/subjective
(e.g. search, social feed)
Slide 6
Furthermore
• Not all software can be
acceptance-tested
– Qualitative/subjective
(e.g. search, social feed)
– Huge input space
(e.g. machine vision)
Image: Cristian David
Slide 7
Furthermore
• Not all software can be
acceptance-tested
– Qualitative/subjective
(e.g. search, social feed)
– Huge input space
(e.g. machine vision)
– Resource-constrained
(e.g. Lyft or Uber)
Image: rideshareapps.com
Slide 8
“CORRECT” AND “GOOD”
ARE SEPARATE DIMENSIONS
Takeaway #1
Slide 9
Getting Started
• For any product of any
scale, always ask:
– What does success look like?
Image: Hole in the Wall, FremantleMedia North America
Slide 10
Getting Started
• For any product of any
scale, always ask:
– What does success look like?
– How can I measure success?
Image: Hole in the Wall, FremantleMedia North America
Slide 11
Getting Started
• For any product of any
scale, always ask:
– What does success look like?
– How can I measure success?
• You’re an engineer!
– Intuition can’t replace data
– QA can’t save your butt
Image: Hole in the Wall, FremantleMedia North America
Slide 12
What should you measure?
• (Un)fortunately, you have customers
• Analyze their behavior
– What do they want?
– What influences your
quality of service?
• For a search engine…
[Search funnel diagram: Query, Skim, Decide, Follow, Refinement, Paging]
Slide 13
USERS ARE PART OF YOUR SYSTEM
Takeaway #2
Slide 14
What should you measure?
• (Un)fortunately, you have customers
• Analyze their behavior
– What do they want?
– What influences your
quality of service?
• For a search engine…
[Search funnel diagram: Query, Skim, Decide, Follow, Refinement, Paging; three steps marked as signals]
Slide 15
What should you measure?
Paging
– “Not relevant enough”
[Search funnel diagram: Query, Skim, Decide, Follow, Refinement, Paging]
Slide 16
What should you measure?
Paging
– “Not relevant enough”
Refinement
– “Not what I meant”
[Search funnel diagram: Query, Skim, Decide, Follow, Refinement, Paging]
Slide 17
What should you measure?
Paging
– “Not relevant enough”
Refinement
– “Not what I meant”
Clickthrough
– “Bingo!”
[Search funnel diagram: Query, Skim, Decide, Follow, Refinement, Paging]
Slide 18
What should you measure?
Paging
– “Not relevant enough”
Refinement
– “Not what I meant”
Clickthrough
– “Bingo!”
Bonus: Abandonment
– “You suck”
[Search funnel diagram: Query, Skim, Decide, Follow, Refinement, Paging]
Slide 19
Is this starting to look familiar?
It should.
Slide 20
Well now!
• We’ve been having this
conversation for years
• Mostly with…
– Product managers
– Business analysts
– Data engineers
• Guess what?
[Cycle diagram: Product Changes → R&D → Deployment → Measurement → Analysis]
Slide 21
Well now!
• We’ve been having this
conversation for years
• Mostly with…
– Product managers
– Business analysts
– Data engineers
• Guess what?
[Cycle diagram: Product Changes → R&D → Deployment → Measurement → Analysis, informed by BI]
Slide 22
What can we learn from BI?
• Analysis
• Experimentation
• Iteration
➢ Be mindful of your users
➢ Talk to your analysts!
Slide 23
What can we learn from BI?
• Analysis
• Experimentation
• Iteration
➢ Invest in A/B tests
➢ Prove your improvements!
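To make "prove your improvements" concrete, here is a minimal sketch of the statistics behind a simple A/B test: compare the clickthrough rates of control and treatment with a two-proportion z-test. The function name and the numbers are illustrative, not from the talk.

# Illustrative two-proportion z-test for an A/B experiment on clickthrough
# rate. Real experiments need pre-decided sample sizes and guardrail metrics;
# this only shows the shape of the check.
import math

def ab_significance(clicks_a, users_a, clicks_b, users_b):
    """Return (absolute lift, two-sided p-value) for variant B vs. control A."""
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    stderr = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / stderr
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, p_value

lift, p = ab_significance(clicks_a=410, users_a=5000, clicks_b=468, users_b=5000)
print(f"lift={lift:+.2%}, p={p:.3f}")  # ship only if the lift holds up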
Slide 24
What can we learn from BI?
• Analysis
• Experimentation
• Iteration
➢ Establish your baseline
➢ Invest in metric collection and dashboards
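And a minimal sketch of a baseline check once metrics are flowing: compare today's value of a quality metric against its recent history and flag a regression. The window size and threshold are arbitrary assumptions.

# Sketch of a baseline check: flag a regression when today's value of a
# quality metric drops well below its recent baseline. Window and threshold
# are arbitrary choices for the example.
from statistics import mean, stdev

def regressed(history, today, sigmas=3.0):
    """True if today's value falls more than `sigmas` below the recent baseline."""
    baseline, spread = mean(history), stdev(history)
    return today < baseline - sigmas * spread

ctr_history = [0.081, 0.084, 0.079, 0.083, 0.082, 0.080, 0.085]  # last 7 days
print(regressed(ctr_history, today=0.064))  # True -> raise an alert on the dashboard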
Slide 25
SYSTEMS ARE NOT SNAPSHOTS.
MEASURE CONTINUOUSLY
Takeaway #3
Slide 26
Hold on to your hats
… this isn’t about search engines
Slide 27
Case Study #2
• newBrandAnalytics,
circa 2011
• A social listening platform
– Finds user-generated
content (e.g. reviews)
– Provides operational
analytics
Slide 28
Social Listening Platform
• A three-stage pipeline
Acquisition
• 3rd party ingestion
• BizDev
• Web scraping
Analysis
• Manual tagging/training
• NLP/ML models
Analytics
• Dashboards
• Ad-hoc query/drilldown
• Reporting
Slide 29
Social Listening Platform
• A three-stage pipeline
• My team focused on data
acquisition
• Let’s discuss web scraping
– Structured data extraction
– At scale
– Reliability is paramount
Acquisition
• 3rd party ingestion
• BizDev
• Web scraping
Analysis
• Manual tagging/training
• NLP/ML models
Analytics
• Dashboards
• Ad-hoc query/drilldown
• Reporting
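As a rough sketch of the three-stage shape described above, each stage can be thought of as a function in a pipeline; the Review type and the stage bodies below are placeholders, not the actual newBrandAnalytics code.

# Placeholder sketch of the pipeline's three-stage shape; the Review type
# and stage bodies stand in for the real system.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    source: str                        # e.g. "yelp", "tripadvisor"
    text: str
    sentiment: Optional[float] = None  # filled in by the analysis stage

def acquire() -> list:
    """Acquisition: 3rd-party ingestion, BizDev feeds, web scraping."""
    return [Review("yelp", "Great service, slow kitchen.")]

def analyze(reviews: list) -> list:
    """Analysis: manual tagging for training plus NLP/ML scoring (stubbed)."""
    return [Review(r.source, r.text, sentiment=0.3) for r in reviews]

def report(reviews: list) -> None:
    """Analytics: dashboards, ad-hoc drilldown, reporting (stubbed as a print)."""
    for r in reviews:
        print(f"{r.source}: sentiment={r.sentiment:+.1f} | {r.text}")

report(analyze(acquire()))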
Slide 30
Large-Scale Scraping
• A two-pronged problem
• Target sites…
– Can change at the drop of a hat
– Actively resist scraping!
• Both are external constraints
• Neither can be unit-tested
Slide 31
Optimizing for User Happiness
• Users consume reviews
• What do they want?
– Completeness
(no missed reviews)
– Correctness
(no duplicates/garbage)
– Timeliness
(near real-time)
[Data flow diagram: TripAdvisor, Twitter, Yelp, … → Data Acquisition → Data Lake → Reports, Notifications]
Slide 32
Putting It Together
• How do we measure
completeness?
• Manually
– Costly, time-consuming
– Sampled (by definition)
Image: Keypunching at Texas A&M, Cushing Memorial Library and Archives, Texas A&M (CC-BY 2.0)
Slide 33
Putting It Together
• How do we measure
completeness?
• Manually
– Costly, time-consuming
– Sampled (by definition)
• Automatically
– Re-scrape a known subset
– Produce similarity score
Slide 34
Putting It Together
• How do we measure
completeness?
• Manually
– Costly, time-consuming
– Sampled (by definition)
• Automatically
– Re-scrape a known subset
– Produce similarity score
• Same with correctness
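A minimal sketch of that automatic check: re-scrape a subset whose contents are already known and score the overlap. Keying on review IDs and using simple set ratios (Jaccard-style) is my assumption for illustration; the talk does not specify the actual scoring.

# Illustrative completeness/correctness scoring against a re-scraped, known
# subset. Treating reviews as IDs and using set ratios is an assumption.
def similarity(expected_ids, scraped_ids):
    overlap = expected_ids & scraped_ids
    return {
        "completeness": len(overlap) / len(expected_ids),           # missed reviews?
        "correctness": len(overlap) / len(scraped_ids),             # duplicates/garbage?
        "jaccard": len(overlap) / len(expected_ids | scraped_ids),  # overall similarity
    }

expected = {"r1", "r2", "r3", "r4"}   # known-good reviews for a sampled venue
scraped = {"r1", "r2", "r3", "junk"}  # what the scraper returned this run
print(similarity(expected, scraped))
# {'completeness': 0.75, 'correctness': 0.75, 'jaccard': 0.6}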
Slide 35
Putting It Together
• Targets do not want
to be scraped
• Major sites employ:
– IP throttling
– Traffic fingerprinting
• 3rd party proxies are
expensive
Image from the movie “UHF", Metro-Goldwyn-Mayer
Slide 36
Putting It Together
• What of timeliness?
• It’s an optimization
problem
– Polling frequency
determines latency
– But polling has a cost
– “Good” is a tradeoff
Slide 37
Putting It Together
• So then, timeliness…?
• First, build a cost
model
– Review acquisition cost
– Break it down by source
• Next, put together SLAs
– Reflect cost in pricing!
– Adjust scheduler by SLA
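A minimal sketch of the scheduling idea: worst-case acquisition latency is roughly the polling interval, so choose the longest interval that still meets each customer's SLA and use the per-source cost model to see what it costs. The figures below are invented for illustration.

# Invented numbers: derive a polling interval from the SLA (latency is
# bounded by how often you poll) and cost it with a per-poll cost model.
def polling_interval_minutes(sla_latency_minutes):
    """Worst-case acquisition latency ~= polling interval, so poll at least that often."""
    return sla_latency_minutes

def daily_cost(cost_per_poll, interval_minutes):
    polls_per_day = 24 * 60 / interval_minutes
    return polls_per_day * cost_per_poll

for sla in (60, 240, 1440):  # 1-hour, 4-hour and daily SLAs
    interval = polling_interval_minutes(sla)
    print(f"SLA {sla:>4} min -> poll every {interval:>4} min, "
          f"~${daily_cost(0.02, interval):.2f}/day per source")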
Slide 38
Recap
1. “Correct” and “Good” are
separate dimensions
2. Users are part of your
system
3. Systems are not
snapshots.
Measure continuously
Image: Confused Monkey, Michael Keen (CC BY-NC-ND 2.0)
Slide 39
QUESTIONS?
Thank you for listening
tomer@tomergabel.com
@tomerg
http://www.tomergabel.com
This work is licensed under a Creative
Commons Attribution-ShareAlike 4.0
International License.