Third-party web tracking is a serious privacy issue.
Advertisement sites and social networking sites stealthily collect users' web browsing history for purposes such as targeted advertising or predicting trends.
Unfortunately, very few Internet users realize this, and their privacy has been infringed upon since they have no means of recognizing the situation.
This paper presents the design and implementation of a system called MindYourPrivacy that visualizes third-party web tracking and clarifies the entities threatening users' privacy.
The implementation adopts deep packet inspection, DNS-SOA-record-based categorization, and HTTP-referred graph analysis to visualize collectors of web browsing histories without device dependency.
To demonstrate the effectiveness of our proof-of-concept implementation, we conducted an experiment at an IT camp, where 129 attendees discussed IT technologies for four days.
The experiment's results revealed that visualizing web tracking effectively influences users' perception of privacy.
Analysis of the user data collected at the camp also revealed that MCODE clustering and some graph-theoretic features are useful for detecting advertising sites that potentially collect user information through web tracking for their own purposes.
IEEE, 12th Annual Conference on Privacy, Security and Trust, PST 2014
MindYourPrivacy: Design and Implementation of a Visualization System for Third-Party Web Tracking
Yuuki Takano, Satoshi Ohta,
Takeshi Takahashi, Ruo Ando,
❖ Third-party web tracking is growing each year.
❖ Online privacy is now a significant issue.
❖ SNSs and targeted ads can associate the real names of individuals with tracking
❖ We propose MindYourPrivacy to visualize and show third-party web tracking.
❖ deep-packet-inspection-based architecture
❖ supports heterogeneous browsers and devices
❖ We tested MindYourPrivacy at a workshop (WIDE Camp 2013 Autumn in Japan) with 129 attendees.
❖ The results reveal that clustering the web graph built from user traffic helps to detect ad sites.
❖ Some graph-theoretic features also help to detect ad sites heuristically.
Web Tracking Mechanism
❖ A third-party web tracker typically tracks users with a tracking ID stored in a cookie, ETag, Flash storage, etc.
YES. Twitter knows our tendency.
Web Tracking Detection Techniques
❖ swap a link to known data-collection sites such as Facebook
❖ Roesner et al., “Detecting and defending against third-party tracking on the web”, USENIX NSDI 2012
❖ visualize the web graph between first- and third-party sites
❖ AdBlock Plus
❖ signature-based ad detection and blocking
❖ Several researchers have reported on third-party web trackers.
❖ One study reported third-party trackers within Alexa’s top 500 domains.
❖ Roesner et al., “Detecting and defending against third-party tracking on the web”, USENIX NSDI 2012
Figure 6: Prevalence of Trackers on Top 500 Domains.
Trackers are counted on domains, i.e., if a particular tracker
appears on two pages of a domain, it is counted once.
Top 20 Trackers on Alexa’s Top 500 Domains!
[Roesner et al. NSDI 2012]
❖ We designed and implemented a visualization system for third-party web tracking called MindYourPrivacy
❖ to clearly show third-party web trackers to users
❖ Design Principles of MindYourPrivacy
❖ Independence from browsers and devices
❖ the variety of OSes and devices, such as Linux, Windows, and Mac OS, and smartphone OSes such as Android and iOS, complicates the problem
❖ adopt a deep-packet-inspection-based approach to support heterogeneous browsers and devices
❖ Accessibility and comprehensibility of the analysis results
❖ easy to access: MindYourPrivacy provides analysis results as an HTML file via an HTTP server
❖ easy to understand: trackers are visualized as a tag cloud, and the web graph’s file is provided for further analysis
Design and Implementation
Web Tracker Identification Methodology (1)
❖ HTTP Referrer Web Graph Analysis
❖ generate a web graph from the HTTP Referer header
❖ if a site is referred to by many other sites, MindYourPrivacy assumes that it is a suspicious site tracking users
❖ Domain Aggregation
❖ to show users which organizations track them, MindYourPrivacy aggregates domains at either the second or third level
❖ platform.twitter.com and platform0.twitter.com are aggregated to
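The referrer-graph and domain-aggregation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the tiny `TWO_LABEL_SUFFIXES` set (a stand-in for a full public-suffix list), and the sample requests are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: a tiny hand-picked set of two-label public suffixes; a real
# deployment would consult a complete public-suffix list instead.
TWO_LABEL_SUFFIXES = {"co.jp", "ne.jp", "or.jp", "ac.jp", "co.uk"}


def aggregate_domain(host):
    """Collapse a host name to its second- or third-level domain."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def build_referrer_graph(requests):
    """Map each referred domain to the set of domains referring to it.

    requests is an iterable of (host, referer_url) pairs.
    """
    incoming = defaultdict(set)
    for host, referer_url in requests:
        if not referer_url:
            continue  # no Referer header, so no edge
        ref_host = urlsplit(referer_url).hostname
        if not ref_host:
            continue
        src, dst = aggregate_domain(ref_host), aggregate_domain(host)
        if src != dst:  # skip requests that stay within one aggregated domain
            incoming[dst].add(src)
    return incoming


def most_referred(incoming, top=5):
    """Rank domains by their number of distinct referring domains."""
    ranked = sorted(incoming.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(domain, len(srcs)) for domain, srcs in ranked[:top]]


# Hypothetical sample traffic.
sample = [
    ("platform.twitter.com", "http://news.example.com/a"),
    ("platform0.twitter.com", "http://blog.example.org/b"),
    ("www.example.com", "http://news.example.com/a"),
    ("ads.example.net", "http://news.example.com/a"),
]
print(most_referred(build_referrer_graph(sample)))
```

Counting distinct referring domains rather than raw requests keeps one busy page from inflating a site's score; whether MindYourPrivacy counts it this way is not stated on the slide.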
Design and Implementation
Web Tracker Identification Methodology (2)
❖ DNS-SOA-Record-Based Grouping
❖ aggregate domains by DNS SOA record
❖ facebook.com and facebook.net are aggregated into dns.facebook.com, which is their DNS SOA record
❖ Krishnamurthy and Wills, “Privacy diffusion on the web: a longitudinal perspective”, WWW 2009
❖ Weighted Site Ranking of User Data Leakage
❖ MindYourPrivacy shows not only web trackers but also leaking sites to
❖ leaking sites are scored, but the details are omitted here; see our paper
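A minimal sketch of SOA-based grouping follows. It assumes that the grouping key is the primary name server (MNAME) field of the SOA record, which the slide does not spell out. The DNS query itself is left to a caller-supplied function, because the Python standard library has no SOA lookup; `group_by_soa`, `soa_lookup`, and the `STUB_SOA` table are hypothetical names used only for illustration.

```python
from collections import defaultdict


def group_by_soa(domains, soa_lookup):
    """Group domains by the primary name server (MNAME) of their SOA record.

    soa_lookup is a caller-supplied function (hypothetical here) that maps a
    domain to its SOA MNAME, or returns None when no record is found.
    """
    groups = defaultdict(list)
    for domain in domains:
        mname = soa_lookup(domain)
        # Domains without an SOA answer stay in a group of their own.
        key = mname.lower().rstrip(".") if mname else domain
        groups[key].append(domain)
    return dict(groups)


# Stub table standing in for live DNS queries. The facebook entries mirror
# the example on the slide; the example.org entry is made up.
STUB_SOA = {
    "facebook.com": "dns.facebook.com.",
    "facebook.net": "dns.facebook.com.",
    "example.org": "ns1.example.org.",
}

groups = group_by_soa(
    ["facebook.com", "facebook.net", "example.org", "unknown.test"],
    STUB_SOA.get,
)
print(groups)
```

Passing the lookup in as a function keeps the grouping logic testable offline; a real resolver can be swapped in without touching it.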
Design and Implementation
❖ MindYourPrivacy captures the traffic of users’ web access
❖ analyzed results are shown via MindYourPrivacy’s web server
❖ users need not install or configure specific applications
Analyzed Result via HTTP
Design and Implementation
❖ Catenaccio DPI
❖ capture traffic from the network interface
❖ reconstruct TCP streams and store captured data into
❖ written in C++
❖ NoSQL DB
❖ use MongoDB as a database
❖ Tracking Analyzer
❖ analyze measurement data
❖ HTML/Graph File Generator
❖ generate visualized results
❖ written in Python
❖ HTML Server
❖ serve HTML/Graph files to users
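To make the data flow concrete, the sketch below shows the kind of fields an analyzer would need from one reconstructed HTTP request: host, referrer, cookie, and the DNT flag. It is a simplified Python illustration, not Catenaccio DPI itself (which the slide says is written in C++); the function name and the returned dictionary layout are assumptions.

```python
def parse_http_request(raw):
    """Extract the fields the analysis needs from one raw HTTP/1.x request.

    raw is the request as bytes, e.g. taken from a reconstructed TCP stream.
    Returns None when the data does not start with an HTTP request line.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return {
        "method": parts[0],
        "path": parts[1],
        "host": headers.get("host"),
        "referer": headers.get("referer"),
        "cookie": headers.get("cookie"),
        "dnt": headers.get("dnt") == "1",
    }


# Hypothetical request as it might appear on the wire.
example = (
    b"GET /widgets.js HTTP/1.1\r\n"
    b"Host: platform.twitter.com\r\n"
    b"Referer: http://news.example.com/a\r\n"
    b"DNT: 1\r\n"
    b"\r\n"
)
print(parse_http_request(example))
```

Records of this shape could then be stored in the database and consumed by the tracking analyzer.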
Design and Implementation
Web User Interface
❖ visualize suspicious web trackers as a tag cloud
❖ domains are grouped by DNS SOA records
❖ referring sites are shown in the right pane
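One plausible way to render such a tag cloud is to scale each label's font size linearly with its score, for example its number of referring sites. The sketch below is an assumption about how this could be done, not the actual HTML generator; the function name, pixel range, and markup are hypothetical.

```python
from html import escape


def tag_cloud_html(scores, min_px=12, max_px=48):
    """Render {label: score} as HTML spans whose font size grows with score."""
    if not scores:
        return '<div class="tag-cloud"></div>'
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1  # avoid division by zero when all scores are equal
    tags = []
    # Largest score first, ties broken alphabetically.
    for label, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        size = round(min_px + (score - lo) / span * (max_px - min_px))
        tags.append(
            f'<span style="font-size:{size}px" title="{score} referring sites">'
            f"{escape(label)}</span>"
        )
    return '<div class="tag-cloud">' + " ".join(tags) + "</div>"


# Hypothetical scores for two SOA groups.
cloud = tag_cloud_html({"dns.facebook.com": 30, "twitter.com": 10})
print(cloud)
```

Labels are escaped before being written into the page because domain names come from captured traffic and should not be trusted as markup.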
Experiment at WIDE Camp 2013 Autumn
❖ We tested MindYourPrivacy at WIDE Camp 2013 Autumn.
❖ WIDE Camp 2013 Autumn (Sep. 10 - Sep. 13)
❖ a workshop for Internet researchers, operators, and developers
❖ 129 attendees, most of whom are either IT specialists or students majoring in IT
❖ every attendee agreed to the experiment (for only research
❖ We captured and analyzed the attendees’ web browsing traffic.
User Traffic Analysis (1)
❖ Obtained 734,194 HTTP requests and 1,661 individual source IP addresses (IPv4 and IPv6).
❖ A directed web graph is generated by using HTTP referrers.
❖ There are 3,966 nodes and
❖ Analyze this web graph to find web trackers.
User Traffic Analysis (2)
❖ To find web trackers, we extract the most-referred sites from the web graph.
❖ Advertisement and social sites, which tend to track users, have many incoming links.
TABLE II: Top-five Most-referred Sites (columns: site, # of incoming links)
User Traffic Analysis (3)
❖ We then applied a clustering technique (MCODE) to the web graph.
❖ As a result, many ad sites were found in the rank 1 cluster.
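MCODE itself is a multi-stage algorithm and is not reproduced here. One ingredient it relies on is the notion of a k-core, which also appears among its parameters. The sketch below shows only that ingredient: it iteratively removes nodes with fewer than k neighbors. The function name and the toy graph are hypothetical, and the graph is treated as undirected.

```python
def k_core(adjacency, k):
    """Return the nodes of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbors and is assumed to be
    symmetric. Nodes with fewer than k remaining neighbors are removed
    repeatedly until none are left to remove.
    """
    alive = {node: set(nbrs) for node, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in alive.items() if len(nbrs) < k]:
            for nbr in alive.pop(node):
                if nbr in alive:
                    alive[nbr].discard(node)
            changed = True
    return set(alive)


# Toy graph: a triangle a-b-c with a chain c-d-e hanging off it.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(sorted(k_core(toy, 2)))
```

With k = 2 the dangling chain is peeled away and only the triangle remains, which is the intuition behind using dense cores to locate tightly connected groups of sites.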
[...] referred Graph Pane: This pane provides referred [...] .dot and .sif formats. Users can download these [...] and analyze or visualize the referred graph by Graphviz, Cytoscape, etc. Figures 5 and 6 show examples using Cytoscape. Through this sort of [...] users can easily find to which sites many other [...]
[...] demonstrate the usability and effectiveness of the proposed system, we conducted an experiment at WIDE camp [...] September 10–13, 2013.
The WIDE project is a research and development [...] Japan aimed at developing a widely integrated [...] environment. It organizes camps every spring and [...] many researchers, developers, and students [...] discussing Internet technologies. Table I lists the [...] of the camp attendees. There were 129 attendees, [...] whom are either IT specialists or students majoring in [...] conducted two types of experiments: user traffic [...] questionnaire-based use analysis.
[...] whose values are random text strings, the number of cookie values we observed, and examples. In total we observed 2,309 and 2,671 requests for platform.twitter.com and www.facebook.com, respectively. However, we found only about 100 unique values for each cookie, though fr [...] www.facebook.com is 397. fr thus does not seem to [...] tracking cookies, and the 100 likely indicates the number of attendees (which was also around 100) or devices. The results reveal that tracking cookies can also be used for per-user analysis and visualization.
We then applied MCODE clustering to the graph [...] Figure 5 to find further features. This allowed us to observe many ad sites clustered into the rank 1 cluster by MCODE. The following domains were ad sites found in the rank 1 cluster of Figure 6:
advg.jp, adingo.jp, iogous.com, admeld.com, [...]
Ad sites generally tend to collect user information for business purposes. We therefore should be concerned with the privacy issues they present. This discovery should help further analysis and visualization concerning such sites. Table IV lists the feature vector of ads and other sites that appeared in Figure [...]
ad-sites in cluster
User Traffic Analysis (4)
❖ We analyzed the cluster in terms of graph-theoretic features.
❖ We found that ad sites’ numbers of incoming and outgoing links and their neighborhood connectivity are quite different from those of other sites.
❖ ad sites have many incoming links but few outgoing links
❖ ad sites’ neighborhood connectivity is relatively low
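The three features named above can be computed directly from the referrer edges. The sketch below assumes that neighborhood connectivity means the average number of neighbors of a node's neighbors, with edge direction ignored for that one feature; the slide does not give the exact definition used. The function name and the sample edges are hypothetical.

```python
def node_features(edges):
    """Compute in-degree, out-degree, and neighborhood connectivity per node.

    edges is an iterable of (referring_site, referred_site) pairs.
    """
    incoming, outgoing, neighbors = {}, {}, {}
    for src, dst in set(edges):  # de-duplicate repeated referrals
        for node in (src, dst):
            incoming.setdefault(node, set())
            outgoing.setdefault(node, set())
            neighbors.setdefault(node, set())
        outgoing[src].add(dst)
        incoming[dst].add(src)
        if src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    features = {}
    for node, nbrs in neighbors.items():
        # Assumed definition: mean neighbor count over the node's neighbors.
        nc = sum(len(neighbors[n]) for n in nbrs) / len(nbrs) if nbrs else 0.0
        features[node] = {
            "in": len(incoming[node]),
            "out": len(outgoing[node]),
            "neighborhood_connectivity": nc,
        }
    return features


# Hypothetical edges: three sites refer to "ad", and "a" also refers to "b".
feats = node_features([("a", "ad"), ("b", "ad"), ("c", "ad"), ("a", "b")])
print(feats["ad"])
```

In this toy graph the "ad" node shows the pattern described on the slide: many incoming links and no outgoing ones.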
Fig. 6: Rank 1 Cluster by MCODE (include loops = false, degree cutoff = 2, haircut = true, fluff = false, node score cutoff = 0.2, k-core = 2, and max. depth = 100)
TABLE IV: Feature Vector of Rank 1 Cluster’s Edge (Average and Unbiased Variance)
            # of incoming links   # of outgoing links   neighborhood connectivity
            avg.     var.         avg.     var.         avg.     var.
ad sites    90.2     12405.4      15.2     3972.9       46.0     3972.9
others      30.2     3972.9       29.7     569.3        130.2    5212.0
[...] measures, and the most popular measure is to use multiple browsers. Although multiple browser usage does not strictly [...]
User Traffic Analysis (5)
❖ The Do Not Track (DNT) flag is used to announce a wish of users to
❖ However, only about 40,600 DNT-enabled requests were observed (roughly 5.5% of the 734,194 requests).
Conclusion and Future Work
❖ Proposed a visualization system for third-party web tracking called MindYourPrivacy
❖ browser- and device-independent architecture
❖ visualizes web trackers as a tag cloud
❖ Tested MindYourPrivacy at WIDE Camp 2013 Autumn and analyzed users’ web browsing traffic.
❖ generated a web graph from HTTP referrers and analyzed it
❖ revealed that graph clustering and some graph-theoretic features are useful for finding web trackers
❖ Future work: adopting the more sophisticated approaches revealed by the experiment, and a signature-based approach.