Slide 1

Slide 1 text

Tracking All the Things
Chris Powers
Engineering Manager at Groupon

Slide 2

Slide 2 text

Where We Were (Tech)
Tracking Chaos!
- 20-30 tracking pixels per page
- Tracking servers on the brink of falling over
- Multiple tracking systems, schemas across platforms
- Unable to collect rich data

Slide 3

Slide 3 text

Where We Were (Org)
Organizational Chaos!
- No central ownership, no oversight
- Little communication/collaboration between trackers
- Multiple tracking systems across organization
- Different groups with different data requirements

Slide 4

Slide 4 text

Data Consumers
- CEO: Key Business Metrics
- Marketing: Attribution, SEM Spend
- PM/Developers: Feature Metrics, Lift
- Data Scientists: Data Spelunking

Slide 5

Slide 5 text

Where do we track?

Slide 6

Slide 6 text

[Diagram labels: API, Desktop Browsers, Mobile Browsers, iOS, Android, Web App, Touch App, Backend Service, Backend Service, Email]

Slide 7

Slide 7 text

[Diagram labels: Browsers, Mobile, API, Services, Email, Transform, Data Warehouse, Real-time Analytics, Ad Hoc Queries, Raw Store, "UAT, Staging, Production"]

Slide 8

Slide 8 text

What do we track?

Slide 9

Slide 9 text

Tracking Dichotomies
- Centralized vs. Decentralized (Core vs. Ad Hoc)
- Explicit vs. Implicit
- Fat vs. Skinny
- Client-side vs. Server-side Logic

Slide 10

Slide 10 text

Message Overhead
- Client/Server Event IDs
- Client/Server Timestamps
- Client Message Index
- Schema ID
- App ID - where did the message come from?
- Message Encoding - To JSON, or not to JSON?
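The overhead fields listed above can be pictured as an envelope wrapped around every message on the client. The following is a minimal sketch under stated assumptions: the function name `createEnvelopeFactory`, the field names, and the event ID format are all hypothetical and are not taken from the talk. A server would presumably add its own event ID and timestamp on receipt.

```javascript
// Hypothetical envelope builder; names and ID format are illustrative only.
function createEnvelopeFactory({ appId, schemaId, now = Date.now }) {
  let messageIndex = 0; // per-client counter, so gaps can be spotted later

  return function wrap(name, data) {
    const clientTimestamp = now();
    messageIndex += 1;
    return {
      eventId: `${appId}-${clientTimestamp}-${messageIndex}`, // client event ID
      clientTimestamp,
      messageIndex,
      schemaId,
      appId, // where did the message come from?
      name,
      data,
    };
  };
}

// Usage with a fixed clock so the output is reproducible.
const wrap = createEnvelopeFactory({ appId: "web", schemaId: "v1", now: () => 1000 });
const first = wrap("pageView", { type: "home" });
const second = wrap("click", { target: "buy" });
console.log(JSON.stringify(first)); // JSON is one encoding option among several
console.log(second.messageIndex);
```

Injecting the clock through `now` keeps the sketch testable; a production client would use the default `Date.now`.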

Slide 11

Slide 11 text

[Diagram labels: TIME, User, Session (x2), Page (x7), TRACKING EVENTS]

Slide 12

Slide 12 text

User/Device Data
- User Identifier
- User Agent (parsed or raw)
- Metadata: number of visits, kind of user, etc.
- Logged in / logged out
- Leave tracking cookie after logout?

Slide 13

Slide 13 text

Sessionization
- Session ID (Timestamp + User ID)
- Session Expiry Logic
- Set cookie w/ wildcard domain, time limit (not session)
- Session ID will be used as a key for client data
- Referral Data
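One way to read the session ID and expiry points above is sketched below. The 30-minute inactivity window, the ID separator, and the function names are assumptions made for illustration; the slide only states that the ID combines a timestamp with the user ID and that some expiry logic exists.

```javascript
// Assumed inactivity window; the talk does not give a value.
const SESSION_TIMEOUT_MS = 30 * 60 * 1000;

// Session ID built from timestamp + user ID, as the slide describes.
function makeSessionId(userId, timestamp) {
  return `${timestamp}-${userId}`;
}

// Returns the session to use for an event at `now`:
// the existing one (renewed) if still fresh, otherwise a new one.
function resolveSession(existing, userId, now) {
  const stillFresh =
    existing &&
    existing.userId === userId &&
    now - existing.lastSeen < SESSION_TIMEOUT_MS;
  if (stillFresh) {
    return { ...existing, lastSeen: now };
  }
  return { id: makeSessionId(userId, now), userId, startedAt: now, lastSeen: now };
}

// Usage: activity after 10 minutes keeps the session, a 31-minute gap starts a new one.
const MINUTE = 60 * 1000;
const s1 = resolveSession(null, "u42", 0);
const s2 = resolveSession(s1, "u42", 10 * MINUTE);
const s3 = resolveSession(s2, "u42", 41 * MINUTE);
console.log(s1.id, s2.id, s3.id);
```

In a browser the session object would typically live in a cookie with an explicit time limit rather than in memory.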

Slide 14

Slide 14 text

Page View Data
- What constitutes a "page view"? AJAX? Mobile?
- Page ID (Session ID + Timestamp)
- Type (what kind of a page)
- URL (parsed or not?)
- Country/Locale (can this change in a session?)

Slide 15

Slide 15 text

Page View Data, Cont.
- App Specific Metadata
- Referring Page ID, Referring Click ID
- Other app specific attribution logic
- Tracking Library Version
- Beware the Back Button, Reload & Browser Cache
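The page view fields from the two slides above could be assembled roughly as follows. This is a sketch only: the record shape, the `createPageView` helper, and the ID separator are hypothetical, and the slide leaves open questions (parsed vs. raw URL, what counts as a page view) that the sketch does not try to answer.

```javascript
// Page ID built from session ID + timestamp, as the slide describes.
function makePageId(sessionId, timestamp) {
  return `${sessionId}-${timestamp}`;
}

// Hypothetical page view record; field names are illustrative.
function createPageView({ sessionId, timestamp, type, url, referringPageId = null }) {
  return {
    pageId: makePageId(sessionId, timestamp),
    sessionId,
    timestamp,
    type, // what kind of page
    url, // kept raw here; parsing is a separate decision
    referringPageId, // links this view to the page that led to it
  };
}

// Usage: the second page view points back at the first for attribution.
const home = createPageView({ sessionId: "0-u42", timestamp: 1000, type: "home", url: "/" });
const deal = createPageView({
  sessionId: "0-u42",
  timestamp: 5000,
  type: "deal",
  url: "/deals/1",
  referringPageId: home.pageId,
});
console.log(deal.pageId, "referred by", deal.referringPageId);
```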

Slide 16

Slide 16 text

How do we track?

Slide 17

Slide 17 text

Explicit Tracking
- TrackingHub - simple one liner tracking
TrackingHub.add("msgName", {some: "data"})

Slide 18

Slide 18 text

Implicit Tracking
- Bloodhound - Easy impression/click tracking
...
...

Slide 19

Slide 19 text

Importance of Visualizations
- Make the (nearly) invisible visible!
- Improve adoption from non-tech team members
- Improve adoption from developers too!
- Provide visibility outside of engineering/product org

Slide 20

Slide 20 text

Delivering Client Messages
- Batch messages to reduce overhead, server load
- Persist message cache across page loads
- Use retry logic if delivery fails
- Options for encoding batches
- Origin switches are a pain point (http/https, subdomain)
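The batching and retry points above might look something like the sketch below. Everything here is an assumption for illustration: the `BatchQueue` class, its option names, and the synchronous `send` callback (which returns `true` on success and stands in for a real network call). A real client would deliver asynchronously, likely with a delay between attempts, and would persist `pending` across page loads.

```javascript
// Hypothetical batching queue with simple retry; not the talk's implementation.
class BatchQueue {
  constructor({ send, maxBatchSize = 10, maxAttempts = 3 }) {
    this.send = send; // (batch) => boolean, true when delivery succeeded
    this.maxBatchSize = maxBatchSize;
    this.maxAttempts = maxAttempts;
    this.pending = [];
  }

  add(message) {
    this.pending.push(message);
  }

  // Tries to deliver one batch. Messages leave the queue only after success,
  // so a failed flush keeps them for a later attempt.
  flush() {
    if (this.pending.length === 0) return true;
    const batch = this.pending.slice(0, this.maxBatchSize);
    for (let attempt = 1; attempt <= this.maxAttempts; attempt += 1) {
      if (this.send(batch)) {
        this.pending.splice(0, batch.length);
        return true;
      }
    }
    return false;
  }
}

// Usage: a sender that fails once, then succeeds.
let calls = 0;
const delivered = [];
const queue = new BatchQueue({
  maxBatchSize: 2,
  send: (batch) => {
    calls += 1;
    if (calls === 1) return false; // simulated delivery failure
    delivered.push(...batch);
    return true;
  },
});
queue.add({ name: "a" });
queue.add({ name: "b" });
queue.add({ name: "c" });
const ok = queue.flush();
console.log(ok, delivered.length, queue.pending.length);
```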

Slide 21

Slide 21 text

Persistence Tips
- Version all messages
- Create migration functions to upgrade data to newest version
- Isolate top-level localStorage keys to reduce churn
- Abstract away storage engine (cookie, localStorage, db)
- Track storage usage, fire warning messages
- Purge old data (beware of ghosts)
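Two of the tips above, versioned records with migration functions and an abstracted storage engine, can be sketched together. The record shapes, version numbers, the key `tracking:queue:1`, and the `MemoryStore` class are all hypothetical. The in-memory store stands in for a cookie, localStorage, or database backend that would expose the same `get`/`set`/`remove` methods.

```javascript
const CURRENT_VERSION = 3;

// migrations[n] upgrades a record from version n to n + 1 (made-up shapes).
const migrations = {
  1: (r) => ({ version: 2, name: r.msg, data: r.data }),
  2: (r) => ({ ...r, version: 3, createdAt: r.createdAt === undefined ? 0 : r.createdAt }),
};

// Applies migrations step by step until the record is current.
function migrate(record) {
  let current = record;
  while (current.version < CURRENT_VERSION) {
    const step = migrations[current.version];
    if (!step) throw new Error(`No migration from version ${current.version}`);
    current = step(current);
  }
  return current;
}

// Minimal storage interface; any engine with these three methods would fit.
class MemoryStore {
  constructor() {
    this.items = new Map();
  }
  get(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
  set(key, value) {
    this.items.set(key, value);
  }
  remove(key) {
    this.items.delete(key);
  }
}

// Reads a record and upgrades it before handing it to the caller.
function loadRecord(store, key) {
  const raw = store.get(key);
  return raw === null ? null : migrate(JSON.parse(raw));
}

// Usage: a version 1 record written by an older library is read as version 3.
const store = new MemoryStore();
store.set("tracking:queue:1", JSON.stringify({ version: 1, msg: "click", data: { id: 7 } }));
const upgraded = loadRecord(store, "tracking:queue:1");
console.log(upgraded);
```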

Slide 22

Slide 22 text

Data Verification
- Surprisingly hard to verify correctness
- Define success metrics and loss thresholds up front
- You will have data loss - how will you measure it?
- Identify points throughout pipe where validation can occur
- Consider a tracking pixel as a point of comparison
- Use message indexes to identify dropped messages
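The last point above, using message indexes to find dropped messages, can be illustrated with a short sketch. It assumes each client numbers its messages consecutively starting at 1, which the talk does not specify, and the function names are made up. Note the limitation: messages dropped after the highest index received cannot be detected this way.

```javascript
// Returns the indexes missing between 1 and the highest index received.
function findMissingIndexes(receivedIndexes) {
  if (receivedIndexes.length === 0) return [];
  const seen = new Set(receivedIndexes);
  const highest = Math.max(...receivedIndexes);
  const missing = [];
  for (let i = 1; i <= highest; i += 1) {
    if (!seen.has(i)) missing.push(i);
  }
  return missing;
}

// Fraction of expected messages that never arrived, to compare with a loss threshold.
function lossRate(receivedIndexes) {
  if (receivedIndexes.length === 0) return 0;
  const highest = Math.max(...receivedIndexes);
  return findMissingIndexes(receivedIndexes).length / highest;
}

// Usage: indexes 3, 5 and 6 never arrived for this client.
const received = [1, 2, 4, 7];
const missing = findMissingIndexes(received);
const rate = lossRate(received);
console.log(missing, rate);
```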

Slide 23

Slide 23 text

Testing the Tracking
- Unit Test the tracking components and all "core" tracking
- Use a crawler to fire messages, look for them in data stores
- Keep realtime alerts looking for missing/malformed keys
- Develop process when quality is questioned, don't panic!
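A check for missing or malformed keys, of the kind a realtime alert could run on incoming messages, might look like this sketch. The required keys and their types are assumptions chosen for illustration, not a schema from the talk.

```javascript
// Hypothetical required keys and the JavaScript type each must have.
const REQUIRED_KEYS = {
  eventId: "string",
  clientTimestamp: "number",
  messageIndex: "number",
  appId: "string",
};

// Returns a list of problems; an empty list means the message passed.
function validateMessage(message) {
  const problems = [];
  for (const [key, expectedType] of Object.entries(REQUIRED_KEYS)) {
    if (!(key in message)) {
      problems.push(`missing ${key}`);
    } else if (typeof message[key] !== expectedType) {
      problems.push(`malformed ${key}`);
    }
  }
  return problems;
}

// Usage: one key has the wrong type and one is absent.
const problems = validateMessage({
  eventId: "e1",
  clientTimestamp: "yesterday",
  appId: "web",
});
console.log(problems);
```

An alerting job could count non-empty results per minute and fire when the count crosses a threshold agreed on in advance.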

Slide 24

Slide 24 text

Tracking Security
- Nothing from the client can be fully trusted.
- Bots will run JS, and they will do weird things.
- Be prepared and able to throw out data by session/user/ip.
- Know your users' patterns, and identify outliers.
- Be prepared to block IPs from the tracking endpoint.
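Throwing out data by session, user, or IP, and spotting outliers, could start from something as simple as the sketch below. The fixed event-count threshold is a deliberately crude stand-in for real outlier detection, and the event shape and function names are hypothetical.

```javascript
// Counts events per distinct value of `key` (for example "ip" or "sessionId").
function countBy(events, key) {
  const counts = new Map();
  for (const event of events) {
    counts.set(event[key], (counts.get(event[key]) || 0) + 1);
  }
  return counts;
}

// Flags values whose event count exceeds an assumed threshold.
function findNoisyValues(events, key, maxEvents) {
  const noisy = new Set();
  for (const [value, count] of countBy(events, key)) {
    if (count > maxEvents) noisy.add(value);
  }
  return noisy;
}

// Drops every event whose `key` value is in the blocked set.
function discardBy(events, key, blocked) {
  return events.filter((event) => !blocked.has(event[key]));
}

// Usage: one IP sends four events, above the threshold of three.
const events = [
  { ip: "10.0.0.1", sessionId: "s1" },
  { ip: "10.0.0.1", sessionId: "s1" },
  { ip: "10.0.0.1", sessionId: "s1" },
  { ip: "10.0.0.1", sessionId: "s1" },
  { ip: "10.0.0.2", sessionId: "s2" },
];
const noisyIps = findNoisyValues(events, "ip", 3);
const cleaned = discardBy(events, "ip", noisyIps);
console.log([...noisyIps], cleaned.length);
```

The same three functions work for `sessionId` or a user identifier by changing the `key` argument.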

Slide 25

Slide 25 text

Questions?
@chrisjpowers