How do you collect data and run experiments on users in an ethical way?
Presented as a keynote at O'Reilly Velocity NYC 2018.
“Do you think other browser makers collect this type of data?”
Not an ethicist
How To Be Perfect
This is what we do. It’s not perfect.
This approach is open source so you can steal it and make it better.
Give us your feedback so we can make it better too.
Collect only what you need
Keep it for the minimum amount of time
Don’t violate user expectations
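The first two principles can be enforced mechanically in a pipeline. This is a minimal sketch, not the talk's implementation: the `Record` type, the field allowlist, and the 180-day window are hypothetical (the talk names no specific retention period).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative retention window; the talk does not name a number.
RETENTION = timedelta(days=180)

@dataclass
class Record:
    probe: str           # hypothetical measurement name, e.g. "tab_count"
    value: int
    received: datetime   # server receive time, timezone-aware UTC

def minimize(payload: dict, allowed: set) -> dict:
    """Collect only what you need: drop fields no approved request asked for."""
    return {k: v for k, v in payload.items() if k in allowed}

def purge_expired(records: list, now: datetime, retention: timedelta = RETENTION) -> list:
    """Keep it for the minimum time: return only records inside the window."""
    cutoff = now - retention
    return [r for r in records if r.received >= cutoff]

print(minimize({"os": "Linux", "tab_count": 7, "email": "me@example.com"}, {"os", "tab_count"}))
# {'os': 'Linux', 'tab_count': 7}
```

Filtering against an allowlist (rather than a blocklist) means a new field is dropped until someone explicitly approves it.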
Classes of Data
Category 1: Technical Data
Examples: OS, available memory, version number
Generally okay to collect, opt-out.
Category 2: Interaction Data
Examples: # of tabs, session length, config settings, feature use
Generally okay to collect, opt-out.
Category 3: Web Activity Data
Example: browsing history
Stickier. Usually no, but may be possible with mitigation.
Category 4: Highly Sensitive Data
Examples: email, username, identifiers
Assume no. Maybe opt-in with advance notice, user consent, and secondary opt-out.
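One way to make the four classes operational is to encode them, and their default stance, as data. This is a hypothetical sketch: the field names come from the examples above, and the rule that a request is as sensitive as its most sensitive field is an assumption, not something stated in the talk.

```python
from enum import Enum

class Category(Enum):
    TECHNICAL = 1         # OS, available memory, version number
    INTERACTION = 2       # number of tabs, session length, config settings, feature use
    WEB_ACTIVITY = 3      # browsing history
    HIGHLY_SENSITIVE = 4  # email, username, identifiers

# Default stance for each class, paraphrasing the slides.
DEFAULT_STANCE = {
    Category.TECHNICAL: "generally okay, opt-out",
    Category.INTERACTION: "generally okay, opt-out",
    Category.WEB_ACTIVITY: "usually no; maybe with mitigation",
    Category.HIGHLY_SENSITIVE: "assume no; at most opt-in with notice and consent",
}

# Hypothetical field names mapped to the example categories above.
FIELD_CATEGORY = {
    "os": Category.TECHNICAL,
    "available_memory": Category.TECHNICAL,
    "tab_count": Category.INTERACTION,
    "session_length": Category.INTERACTION,
    "browsing_history": Category.WEB_ACTIVITY,
    "email": Category.HIGHLY_SENSITIVE,
}

def classify(fields):
    """Assumption: a request is as sensitive as its most sensitive field.
    An unclassified field raises KeyError, forcing someone to classify it first."""
    return max((FIELD_CATEGORY[f] for f in fields), key=lambda c: c.value)

category = classify(["os", "tab_count"])
print(category, "->", DEFAULT_STANCE[category])
```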
Collecting data is simple
1. Request for collection
2. Review by data steward
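The review step can start as a checklist. This is a minimal sketch with a hypothetical `CollectionRequest` shape and an illustrative retention limit; a real data steward applies judgment no function captures, so treat these checks as prompts for the conversation rather than the decision itself.

```python
from dataclasses import dataclass, field

MAX_RETENTION_DAYS = 395  # hypothetical policy limit, not from the talk

@dataclass
class CollectionRequest:
    purpose: str          # the question this data should answer
    fields: list          # what will be collected
    category: int         # highest data category involved, 1-4
    retention_days: int   # how long the data will be kept
    opt_in: bool = False  # True if users must explicitly agree first

@dataclass
class Review:
    approved: bool
    concerns: list = field(default_factory=list)

def steward_review(req: CollectionRequest) -> Review:
    """Return concerns to resolve before collection; no concerns means approved."""
    concerns = []
    if not req.purpose.strip():
        concerns.append("no stated purpose: why is this needed?")
    if not req.fields:
        concerns.append("no fields listed")
    if req.category not in (1, 2, 3, 4):
        concerns.append("unknown data category")
    if req.retention_days > MAX_RETENTION_DAYS:
        concerns.append("retention longer than policy allows")
    if req.category == 3:
        concerns.append("web activity data: describe mitigations")
    if req.category == 4 and not req.opt_in:
        concerns.append("highly sensitive data requires opt-in")
    return Review(approved=not concerns, concerns=concerns)

print(steward_review(CollectionRequest("measure tab usage", ["tab_count"], 2, 180)))
```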
What is a Data Steward?
Allows reasoning about data collection
Privacy Preserving Data Collection
–Rebecca Weiss, Director of Data Science
‘By not performing A/B tests before we release new features and
products, we are guilty of administering massive uncontrolled
experiments upon our users.
The only outcome measure that we can observe as a result of
these experiments is “how many users have we driven away
since we released that feature?”’
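Controlled experiments need a way to assign users to branches. A common approach, shown here as a sketch rather than how any particular browser does it, hashes a random per-install ID together with the experiment name so each client lands in a stable branch, only a fraction of clients enroll, and anyone who opted out is never enrolled. `client_id`, the experiment name, and `opted_out` are hypothetical inputs.

```python
import hashlib

def assign_branch(client_id, experiment, branches, enroll_fraction=0.1, opted_out=False):
    """Return a branch name, or None if this client is not enrolled."""
    if opted_out:
        return None  # never enroll users who opted out
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode("utf-8")).digest()
    # Keep 53 bits so the float division is exact and always below 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    if bucket >= enroll_fraction:
        return None  # outside the enrolled slice of the population
    # Spread enrolled clients evenly over the branches.
    index = int(bucket / enroll_fraction * len(branches))
    return branches[min(index, len(branches) - 1)]

print(assign_branch("example-client-id", "new-tab-v2", ["control", "treatment"], 0.5))
```

Mixing the experiment name into the hash keeps assignments independent across experiments, so landing in a treatment group once does not predict landing in one again.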
How’d that happen?
Good intentions, road to hell, etc.
No data collected
No one felt empowered to say no
What did we learn?
More formal process
Definition of red flags
Deeper engineering review
Documented escalation paths
“Burn it all. Burn it to the ground.”
We can all do better.
Learn from your mistakes.
Steal these ideas.
Steward your users’ data wisely.
Come ask questions.