Jan Stępień - Tracking those who Track

Talk by Jan Stępień at the first Munich DataGeeks Meetup
Date: 02.07.2013

Munich DataGeeks

July 02, 2013


  1. http_if_none_match http_referer http_accept_encoding http_accept http_cookie http_connection http_host http_user_agent http_version path_info http_accept_charset http_accept_language http_cache_control http_if_modified_since request_method request_path request_uri query_string remote_host remote_addr script_name server_name server_port server_protocol http_dnt timestamp
  2. [Chart: only axis tick labels survive in the transcript (03 to 12, then 01 to 06; 0 to 15k)]
  3. [Chart: only axis tick labels survive in the transcript (00 to 23; 0 to 500)]
  4. www.google-analytics.com 36197
     static.adzerk.net 13983
     edge.quantserve.com 11659
     www.facebook.com 9641
     ad.doubleclick.net 3822
     pagead2.googlesyndication.com 3764
     s.youtube.com 2173
     b.scorecardresearch.com 1974
     pubads.g.doubleclick.net 1465
     googleads.g.doubleclick.net 1231
  5. 1. Select requests from popular domains
     2. Group requests into 15-minute intervals
     3. Count domains per interval
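The three steps on this slide can be sketched in Python roughly as follows. This is only an illustration, not the code used in the talk: the function names, the `(timestamp, domain)` input shape, and the sample requests are all hypothetical, and "count domains per interval" is read here as counting requests per domain within each interval.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def floor_to_interval(ts, minutes=15):
    """Round a datetime down to the start of its interval (default 15 minutes)."""
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def domain_counts_per_interval(requests, popular_domains):
    """Map each interval start to a Counter of requests per popular domain.

    `requests` is an iterable of (timestamp, domain) pairs; this input
    shape is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for ts, domain in requests:
        if domain in popular_domains:                   # step 1: select
            counts[floor_to_interval(ts)][domain] += 1  # steps 2 and 3
    return counts


# Made-up sample requests, for illustration only.
sample_requests = [
    (datetime(2013, 7, 2, 10, 3), "www.google-analytics.com"),
    (datetime(2013, 7, 2, 10, 14, 59), "www.google-analytics.com"),
    (datetime(2013, 7, 2, 10, 15), "www.facebook.com"),
    (datetime(2013, 7, 2, 10, 20), "example.org"),  # not popular, dropped
]
popular = {"www.google-analytics.com", "www.facebook.com"}
result = domain_counts_per_interval(sample_requests, popular)
```

Each interval's `Counter` can then be turned into a fixed-length vector (one position per popular domain), which is one plausible input for the clustering shown on the following slides.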
  6. cluster 0 1268
     cluster 1 702
     cluster 2 651
     cluster 3 2387
     What is the meaning behind these clusters?
  7.    0     1     2     3   ← classified as
     1188    29    11    40   cluster 0
       47   654     1     0   cluster 1
       10     1   622    18   cluster 2
       50     0    18  2319   cluster 3

     cluster 0: rubyonrails.pl, developer.android.com, amazon.com, youtube.com
     cluster 1: linkedin.com, dictionary.reference.com, meetup.com
     cluster 2: reddit.com, redditmedia.com, bbc.co.uk
     cluster 3: stackoverflow.com
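As a worked check of the confusion matrix on the last slide, the short Python sketch below sums the diagonal and the rows. The numbers are copied from the slide; treating rows as the assigned cluster and columns as the classifier's output is an assumption, though it is supported by the row sums matching the cluster sizes on slide 6. The variable names are illustrative.

```python
# Values copied from the slide. Assumed layout: rows are clusters,
# columns are what the classifier predicted ("classified as").
confusion = [
    [1188, 29, 11, 40],   # cluster 0
    [47, 654, 1, 0],      # cluster 1
    [10, 1, 622, 18],     # cluster 2
    [50, 0, 18, 2319],    # cluster 3
]

row_sums = [sum(row) for row in confusion]           # instances per cluster
total = sum(row_sums)                                # all instances
correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal
accuracy = correct / total

# Share of each cluster's instances that land on the diagonal.
per_cluster = [row[i] / sum(row) for i, row in enumerate(confusion)]

print(f"{correct} of {total} instances on the diagonal ({accuracy:.1%})")
# prints "4783 of 5008 instances on the diagonal (95.5%)"
```

The row sums (1268, 702, 651, 2387) reproduce the cluster sizes from slide 6, so the matrix covers every clustered instance.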