Tracking All the Things

For the last two years, Chris Powers has led the development of Groupon's "Bloodhound" tracking system, which collects and records user behavior metrics across the globe.

Throughout this process, nearly as many things went wrong as went right. Lessons were learned along this path towards "tracking all the things", and Chris will be pointing out both the pitfalls and the big wins to look for while building out a behavior tracking infrastructure.

Chris Powers

March 25, 2014

Transcript

  1. Where We Were (Tech): Tracking Chaos!
    - 20-30 tracking pixels per page
    - Tracking servers on the brink of falling over
    - Multiple tracking systems, schemas across platforms
    - Unable to collect rich data
  2. Where We Were (Org): Organizational Chaos!
    - No central ownership, no oversight
    - Little communication/collaboration between trackers
    - Multiple tracking systems across organization
    - Different groups with different data requirements
  3. Data Consumers
    - CEO: Key Business Metrics
    - Marketing: Attribution, SEM Spend
    - PM/Developers: Feature Metrics, Lift
    - Data Scientists: Data Spelunking
  4. (Diagram labels) API, Desktop Browsers, Mobile Browsers, iOS, Android, Web App, Touch App, Backend Service, Backend Service, Email
  5. Tracking Dichotomies
    - Centralized vs. Decentralized (Core vs. Ad Hoc)
    - Explicit vs. Implicit
    - Fat vs. Skinny
    - Client-side vs. Server-side Logic
  6. Message Overhead
    - Client/Server Event IDs
    - Client/Server Timestamps
    - Client Message Index
    - Schema ID
    - App ID - where did the message come from?
    - Message Encoding - To JSON, or not to JSON?
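The overhead fields on this slide can be pictured as an envelope wrapped around every payload. The following is a minimal Python sketch, not Bloodhound's actual format: the field names, the `Envelope` and `EnvelopeFactory` classes, and the choice of compact JSON are all assumptions made for illustration. The server would add its own event ID and timestamp on receipt.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict, field


@dataclass
class Envelope:
    """Overhead fields around one tracking message (names are hypothetical)."""
    app_id: str          # which app the message came from
    schema_id: str       # which schema the payload follows
    message_index: int   # per-client counter, +1 for every message
    payload: dict
    client_event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    client_timestamp_ms: int = field(
        default_factory=lambda: int(time.time() * 1000)
    )


class EnvelopeFactory:
    """Hands out envelopes with a monotonically increasing message index."""

    def __init__(self, app_id):
        self.app_id = app_id
        self._next_index = 0

    def wrap(self, schema_id, payload):
        envelope = Envelope(self.app_id, schema_id, self._next_index, payload)
        self._next_index += 1
        return envelope


def encode(envelope):
    """Serialize to compact JSON; a binary encoding could be swapped in here."""
    return json.dumps(asdict(envelope), separators=(",", ":"), sort_keys=True)
```

Keeping the index on the client is what later lets the pipeline notice gaps, independent of clock skew between client and server timestamps.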
  7. User/Device Data
    - User Identifier
    - User Agent (parsed or raw)
    - Metadata: Number visits, kind of user, etc.
    - Logged in / logged out
    - Leave tracking cookie after logout?
  8. Sessionization
    - Session ID (Timestamp + User ID)
    - Session Expiry Logic
    - Set cookie w/ wildcard domain, time limit (not session)
    - Session ID will be used as a key for client data
    - Referral Data
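One way to read the session ID and expiry bullets is sketched below in Python. The 30-minute inactivity window, the `-` separator, and the function names are assumptions, not details from the talk; the slide only says the ID combines a timestamp and a user ID and that expiry is time-based rather than tied to the browser session.

```python
from dataclasses import dataclass

# Assumed inactivity window; the talk does not state the real value.
SESSION_TIMEOUT_MS = 30 * 60 * 1000


@dataclass
class Session:
    session_id: str
    last_seen_ms: int


def new_session(user_id, now_ms):
    """Session ID = timestamp + user ID, as on the slide."""
    return Session(session_id=f"{now_ms}-{user_id}", last_seen_ms=now_ms)


def touch(session, user_id, now_ms):
    """Return the session for an event at now_ms, rotating it if expired."""
    if session is None or now_ms - session.last_seen_ms > SESSION_TIMEOUT_MS:
        return new_session(user_id, now_ms)
    # Still active: keep the ID and slide the expiry window forward.
    return Session(session.session_id, now_ms)
```

In a browser the `Session` value would live in the time-limited cookie the slide mentions, so every event refreshes the expiry.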
  9. Page View Data
    - What constitutes a “page view”? AJAX? Mobile?
    - Page ID (Session ID + Timestamp)
    - Type (what kind of a page)
    - URL (parsed or not?)
    - Country/Locale (can this change in a session?)
  10. Page View Data, Cont.
    - App Specific Metadata
    - Referring Page ID, Referring Click ID
    - Other app specific attribution logic
    - Tracking Library Version
    - Beware the Back Button, Reload & Browser Cache
  11. Importance of Visualizations
    - Make the (nearly) invisible visible!
    - Improve adoption from non-tech team members
    - Improve adoption from developers too!
    - Provide visibility outside of engineering/product org
  12. Delivering Client Messages
    - Batch messages to reduce overhead, server load
    - Persist message cache across page loads
    - Use retry logic if delivery fails
    - Options for encoding batches
    - Origin switches are a pain point (http/https, subdomain)
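The batching and retry bullets can be sketched roughly as follows. This is an illustrative Python model, not the talk's client library: `BatchSender`, its parameters, and the `transport` callable are hypothetical, and a real client would add backoff between attempts and persist `pending` across page loads.

```python
class BatchSender:
    """Buffers messages, delivers them in batches, keeps failures for retry."""

    def __init__(self, transport, batch_size=10, max_attempts=3):
        self.transport = transport    # callable(list_of_messages) -> bool
        self.batch_size = batch_size
        self.max_attempts = max_attempts
        self.pending = []             # would be persisted across page loads

    def track(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Try to deliver everything pending; True if the queue ends up empty."""
        while self.pending:
            batch = self.pending[: self.batch_size]
            if not self._deliver(batch):
                return False          # leave the batch queued for a later flush
            del self.pending[: len(batch)]
        return True

    def _deliver(self, batch):
        for _ in range(self.max_attempts):
            try:
                if self.transport(batch):
                    return True
            except OSError:
                pass                  # count a network error as a failed attempt
        return False
```

Messages are removed from the queue only after the transport confirms delivery, so a failed flush loses nothing; the trade-off is that a retry after a lost acknowledgement can produce duplicates, which the event IDs let the server discard.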
  13. Persistence Tips
    - Version all messages
    - Create migration functions to upgrade data to newest version
    - Isolate top-level localStorage keys to reduce churn
    - Abstract away storage engine (cookie, localStorage, db)
    - Track storage usage, fire warning messages
    - Purge old data (beware of ghosts)
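The versioning, migration, and storage-abstraction tips fit together as in the Python sketch below. Everything specific here is invented for illustration: the three versions, the field renames, the `tracking.queue` key, and the `MemoryStore` class, which stands in for a cookie or localStorage adapter exposing the same two methods.

```python
import json

CURRENT_VERSION = 3


def _v1_to_v2(record):
    # Hypothetical change: "ts" was renamed to "client_timestamp".
    record = dict(record)
    record["client_timestamp"] = record.pop("ts")
    record["version"] = 2
    return record


def _v2_to_v3(record):
    # Hypothetical change: a message index was added; old records get None.
    record = dict(record)
    record.setdefault("message_index", None)
    record["version"] = 3
    return record


MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(record):
    """Apply migrations one step at a time until the record is current."""
    while record["version"] < CURRENT_VERSION:
        record = MIGRATIONS[record["version"]](record)
    return record


class MemoryStore:
    """Stand-in storage engine with the interface an adapter would expose."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def load_messages(store, key="tracking.queue"):
    """Read persisted messages and upgrade any written by older versions."""
    raw = store.get(key)
    return [upgrade(r) for r in json.loads(raw)] if raw else []
```

Chaining single-step migrations means a record written by any old library version can be read by the newest one without a dedicated function for every version pair.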
  14. Data Verification
    - Surprisingly hard to verify correctness
    - Define success metrics and loss thresholds up front
    - You will have data loss - how will you measure it?
    - Identify points throughout pipe where validation can occur
    - Consider a tracking pixel as a point of comparison
    - Use message indexes to identify dropped messages
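Using message indexes to find dropped messages comes down to looking for gaps in each client's sequence. A minimal Python sketch, assuming each client numbers its messages consecutively and that `indexes` holds the values that reached the data store for one client (the function names are hypothetical):

```python
def find_gaps(indexes):
    """Return (missing, duplicates) for one client's received message indexes."""
    seen = set()
    duplicates = set()
    for i in indexes:
        if i in seen:
            duplicates.add(i)
        seen.add(i)
    if not seen:
        return [], []
    expected = range(min(seen), max(seen) + 1)
    missing = [i for i in expected if i not in seen]
    return missing, sorted(duplicates)


def loss_rate(indexes):
    """Fraction of expected messages that never arrived (0.0 if none expected)."""
    missing, _ = find_gaps(indexes)
    total = len(set(indexes)) + len(missing)
    return len(missing) / total if total else 0.0
```

For example, `find_gaps([0, 1, 3, 4, 4, 7])` reports indexes 2, 5, and 6 as missing and 4 as a duplicate. This approach cannot see messages lost before the first or after the last received index, which is one reason an independent point of comparison such as a tracking pixel is still useful.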
  15. Testing the Tracking
    - Unit Test the tracking components and all “core” tracking
    - Use a crawler to fire messages, look for them in data stores
    - Keep realtime alerts looking for missing/malformed keys
    - Develop process when quality is questioned, don’t panic!
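An alert for missing or malformed keys can start as a per-message check plus a threshold on the failure rate. The Python sketch below is only an assumption about how such a check might look: the required keys, their types, and the 1% threshold are invented for the example.

```python
# Hypothetical required keys and their expected types.
REQUIRED_KEYS = {
    "app_id": str,
    "schema_id": str,
    "message_index": int,
    "client_timestamp_ms": int,
}


def validate(message):
    """Return a list of problems such as 'missing:app_id'; empty means valid."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in message:
            problems.append(f"missing:{key}")
            continue
        value = message[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"malformed:{key}")
    return problems


def should_alert(messages, threshold=0.01):
    """True when the share of invalid messages in a window exceeds threshold."""
    if not messages:
        return False
    bad = sum(1 for m in messages if validate(m))
    return bad / len(messages) > threshold
```

The same `validate` function can back both the unit tests and the realtime alert, so the two never disagree about what a well-formed message is.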
  16. Tracking Security
    - Nothing from the client can be fully trusted.
    - Bots will run JS, and they will do weird things.
    - Be prepared and able to throw out data by session/user/ip.
    - Know your users’ patterns, and identify outliers.
    - Be prepared to block IPs from the tracking endpoint.
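Throwing out data by session, user, or IP and spotting outliers can be prototyped in a few lines. This Python sketch uses a deliberately crude heuristic (event volume per IP compared with the median) purely as an assumption; the field names, `factor`, and `min_events` are hypothetical, and real bot detection would look at far more than volume.

```python
from collections import Counter
from statistics import median


def find_outlier_ips(events, factor=20, min_events=100):
    """Flag IPs whose event count is far above the median count per IP."""
    counts = Counter(e["ip"] for e in events)
    if not counts:
        return set()
    typical = median(counts.values())
    return {
        ip for ip, n in counts.items()
        if n >= min_events and n > factor * typical
    }


def discard(events, sessions=(), users=(), ips=()):
    """Drop every event that belongs to a rejected session, user, or IP."""
    sessions, users, ips = set(sessions), set(users), set(ips)
    return [
        e for e in events
        if e["session_id"] not in sessions
        and e["user_id"] not in users
        and e["ip"] not in ips
    ]
```

Being able to run `discard` retroactively matters as much as blocking at the endpoint, because suspicious traffic is often identified only after it has been recorded.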