Topic
most similar #hashtags
Similarity based on co-occurrence [Feng2015], corrects for popularity

[...] as a means to express their opinion. Using a single hashtag may thus miss part of the relevant posts. To address this limitation, we extend the definition of topic to be more encompassing. Given a seed hashtag, we define a topic as a set of related hashtags, which co-occur with the seed hashtag. To find related hashtags, we employ (and improve upon) a recent clustering algorithm tailored for the purpose [Feng et al. 2015].

Feng et al. [2015] develop a simple measure to compute the similarity between two hashtags, which relies on co-occurring words and hashtags. The authors then use the similarity measure to find closely related hashtags and define clusters. However, this simple approach presents one drawback, in that very popular hashtags such as #ff or #follow co-occur with a large number of hashtags. Hence, directly applying the original approach results in extremely noisy clusters. Since the quality of the topic critically affects the entire pipeline, we want to avert this issue and ensure that minimal noise is introduced in the expanded set of hashtags.

Therefore, we improve the basic approach by taking into account and normalizing for the popularity of the hashtags. Specifically, we compute the document frequency of all hashtags on a random 1% sample of the Twitter stream, and normalize the original similarity score between two hashtags by the inverse document frequency. The similarity score is formally defined as

$$\mathrm{sim}(h_s, h_t) = \frac{1}{1 + \log(\mathrm{df}(h_t))} \left( \alpha \cos(W_s, W_t) + (1 - \alpha) \cos(H_s, H_t) \right), \qquad (1)$$

where $h_s$ is the seed tag, $h_t$ is the candidate tag, $W_x$ and $H_x$ are the sets of words and hashtags that co-occur with hashtag $h_x$, respectively, $\cos$ is the cosine similarity between two vectors, $\mathrm{df}$ is the document frequency of a tag, and $\alpha$ is a parameter that [...]
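Equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the use of `Counter` objects as sparse co-occurrence vectors, the natural logarithm, and the default `alpha=0.5` are all assumptions made here for the example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hashtag_similarity(w_s, w_t, h_s, h_t, df_t, alpha=0.5):
    """Similarity of candidate tag t to seed tag s, following Equation (1).

    w_s, w_t: co-occurring word counts for the seed and the candidate tag.
    h_s, h_t: co-occurring hashtag counts for the seed and the candidate tag.
    df_t:     document frequency of the candidate tag (assumed >= 1).
    alpha:    weight of the word-based term (0.5 is an illustrative default).
    """
    if df_t < 1:
        raise ValueError("document frequency must be at least 1")
    blend = alpha * cosine(w_s, w_t) + (1 - alpha) * cosine(h_s, h_t)
    # Popular tags (large df) are penalized by the 1 / (1 + log df) factor.
    # The natural logarithm is an assumption; the excerpt does not give a base.
    return blend / (1.0 + math.log(df_t))
```

With identical co-occurrence vectors, a tag seen in 1000 documents scores lower than one seen in a single document, which is the correction for popularity that keeps tags such as #ff out of the expanded topic.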
[...] for the topics (a) #baltimoreriots and (b) #netanyahuspeech.

4.2. Data aspects
For each topic, we retrieve all tweets that contain one of its hashtags and that are generated during the observation window. We also ensure that the selected hashtags [...]
Figure 3: Sample conversation graphs with retweet (top) and follow (bottom) aspects (visualized using the force-directed layout algorithm in Gephi). The left side is controversial: (a,e) #beefban, (b,f) #russia_march. The right side is non-controversial: (c,g) #sxsw, (d,h) #germanwings.
[...] Accept, Sample)
"The Nature and Origins of Mass Opinion"
Response Axiom: "Individuals form opinions by averaging across the considerations that are immediately salient or accessible to them"
Authoritative (influential) users with high degree set opinions
Measure the likelihood of a user being exposed to opinions from influential users on either side
[...] Partitions obtained for (a) #beefban, (b) #russia_march by using the hybrid graph [...]. The partitions are more noisy than those in Figures 3(a,b).

Subsequently, we select one partition at random (each with probability 0.5) and consider a random walk that starts from a random vertex in that partition. The walk terminates when it visits any high-degree vertex (from either side).

We define the Random Walk Controversy (RWC) measure as follows. "Consider two random walks, one ending in partition X and one ending in partition Y, RWC is the difference of the probabilities of two events: (i) both random walks started from the partition they ended in and (ii) both random walks started in a partition other than the one they ended in." The measure is quantified as

$$\mathrm{RWC} = P_{XX} P_{YY} - P_{YX} P_{XY},$$

where $P_{AB}$, $A, B \in \{X, Y\}$, is the conditional probability

$$P_{AB} = \Pr[\text{start in partition } A \mid \text{end in partition } B].$$

The aforementioned probabilities have the following desirable properties: (i) they are not skewed by the size of each partition, as the random walk starts with equal probability from each partition, and (ii) they are not skewed by the total degree of vertices [...]
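The definition above can be approximated by simulation. The following is a hedged Monte Carlo sketch, not the authors' code: the function name, the undirected adjacency-list input, the choice of the `k` highest-degree vertices per side as terminating vertices, and the rule that a walk takes at least one step before it can terminate are all assumptions made for illustration. Every vertex is assumed to have at least one neighbor.

```python
import random


def rwc_monte_carlo(adj, side_x, side_y, k=1, n_walks=2000,
                    max_steps=10_000, seed=0):
    """Estimate RWC = P_XX * P_YY - P_YX * P_XY by simulating random walks.

    adj:            dict mapping each vertex to a non-empty list of neighbors.
    side_x, side_y: the two partitions (sets of vertices).
    k:              number of high-degree vertices per side that end a walk.
    """
    rng = random.Random(seed)
    sides = {"X": sorted(side_x), "Y": sorted(side_y)}
    side_of = {v: s for s, vs in sides.items() for v in vs}

    # The k highest-degree vertices on each side terminate the walk.
    hubs = set()
    for vs in sides.values():
        hubs.update(sorted(vs, key=lambda v: len(adj[v]), reverse=True)[:k])

    # counts[(A, B)] = number of walks that started in A and ended in B.
    counts = {(a, b): 0 for a in "XY" for b in "XY"}
    for _ in range(n_walks):
        start_side = rng.choice("XY")          # each side with probability 0.5
        v = rng.choice(sides[start_side])      # random start vertex on that side
        for _step in range(max_steps):         # guard against very long walks
            v = rng.choice(adj[v])
            if v in hubs:
                counts[(start_side, side_of[v])] += 1
                break

    def p(a, b):
        """Pr[start in partition a | end in partition b]."""
        ended_in_b = counts[("X", b)] + counts[("Y", b)]
        return counts[(a, b)] / ended_in_b if ended_in_b else 0.0

    return p("X", "X") * p("Y", "Y") - p("Y", "X") * p("X", "Y")
```

On two disconnected cliques no walk can cross sides, so the estimate is exactly 1. On a single clique split arbitrarily in two, crossing is about as likely as staying and the estimate is close to 0.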
Random walks end on either side with equal probability
Not skewed by size of each partition
Not skewed by total degree of vertices in each partition
Close to 1 when probability of crossing sides low (high controversy)
Close to 0 when probability of crossing comparable to that of staying (low controversy)
Planted Synthetic Graphs

[...] Erdős-Rényi graphs planted with two communities. $p_1$ is the intra-community edge probability, while $p_2$ is the inter-community edge probability.

[...] generate random Erdős-Rényi graphs with varying community structure, and compute the RWC score on them. Specifically, to mimic community structure, we plant two separate communities with intra-community edge probability $p_1$. That is, $p_1$ defines how dense these communities are within themselves. We then add random edges between these two communities with probability $p_2$. Therefore, $p_2$ defines how connected the two communities are. A higher value of $p_1$ and a lower value of $p_2$ create a clearer two-community structure. Figure 10 shows the RWC score for random graphs of 2000 vertices for two different settings: plotting the score as a function of $p_1$ while fixing $p_2$ (Figure 10a), and vice-versa [...]
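The planted construction described above might be generated as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the two communities are assumed to be of equal size (the excerpt does not state the sizes), and the graph is returned as an undirected adjacency list.

```python
import itertools
import random


def planted_two_communities(n, p1, p2, seed=0):
    """Random undirected graph with two planted communities.

    Vertices 0 .. n//2 - 1 form one community and the rest form the other
    (equal sizes are an assumption of this sketch). A pair inside the same
    community is connected with probability p1, a pair across communities
    with probability p2.
    """
    rng = random.Random(seed)
    half = n // 2
    side_x, side_y = set(range(half)), set(range(half, n))
    adj = {v: [] for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        same_community = (u < half) == (v < half)
        if rng.random() < (p1 if same_community else p2):
            adj[u].append(v)
            adj[v].append(u)
    return adj, side_x, side_y
```

Sweeping `p1` with `p2` fixed, or the reverse, and scoring each graph with an RWC implementation reproduces the kind of experiment the excerpt describes. Vertices that end up isolated at low edge probabilities need separate handling before running random walks on the result.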
[...] computed as described before (expected hitting time)
Intuition: large difference between polarities $R_u$ and $R_v$ → low acceptance probability $p(u, v)$
Based on retweets and connections
Scores bucketed to smooth probabilities

$$p(u, v) = \frac{N_{\text{endorsed}}(R_u, R_v)}{N_{\text{exposed}}(R_u, R_v)}$$
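The bucketed ratio above can be illustrated with a small estimator. The sketch below is an assumption-laden example, not the authors' code: the names `bucket` and `fit_acceptance`, the polarity range of [-1, 1], the ten equal-width buckets, and the format of the observations are all chosen here for illustration.

```python
from collections import defaultdict


def bucket(score, n_buckets=10):
    """Map a polarity score to a bucket index in [0, n_buckets - 1].

    Scores are assumed to lie in [-1, 1]; values outside are clipped.
    """
    clipped = min(max(score, -1.0), 1.0)
    return min(int((clipped + 1.0) / 2.0 * n_buckets), n_buckets - 1)


def fit_acceptance(observations, n_buckets=10):
    """Estimate p(u, v) = N_endorsed(R_u, R_v) / N_exposed(R_u, R_v).

    observations: iterable of (r_u, r_v, endorsed) triples, one per case in
    which a user with polarity r_u was exposed to a user with polarity r_v;
    `endorsed` is True if the exposure led to an endorsement (e.g. a retweet).
    Returns a function p(r_u, r_v) that looks up the bucketed ratio.
    """
    exposed = defaultdict(int)
    endorsed = defaultdict(int)
    for r_u, r_v, was_endorsed in observations:
        key = (bucket(r_u, n_buckets), bucket(r_v, n_buckets))
        exposed[key] += 1
        if was_endorsed:
            endorsed[key] += 1

    def p(r_u, r_v):
        key = (bucket(r_u, n_buckets), bucket(r_v, n_buckets))
        n_exposed = exposed.get(key, 0)
        # Bucket pairs never observed get probability 0 in this sketch.
        return endorsed.get(key, 0) / n_exposed if n_exposed else 0.0

    return p
```

Bucketing trades resolution for stability: each ratio is computed over all exposures that fall in the same pair of polarity ranges, so a single observation does not fix the probability for a specific pair of users.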
[...] recompute the RWR
Use Sherman-Morrison formula
1 order of magnitude improvement in runtime

[...] by using Fagin's algorithm [10]. Specifically, [...] two ranked lists of edges $(u, v)$, one [...] $\mathrm{RWC}_{u \to v}$ (as currently produced in Algorithm 1) and another one ranked by de[...] of acceptance $p(u, v)$. Fagin's algorithm [...] parallel to find the edges that optimize $E(u, v)$. We refer the interested reader [...] for details [10].

INCREMENTAL COMPUTATION OF RWC
[...] defined in Section 3 can be computed [...]Rank, which is usually implemented by [...]. However, since we are only interested in [...] incremental change in RWC after adding an [...] new way to efficiently compute it.

[...] transition probability matrix $P$. After the [...] edge from vertex $a$ to vertex $b$, only [...] affected: the column that corresponds to [...] of the directed edge. Let $q$ be the out[...] before the addition of the edge, the [...] matrix has the following form. [...]

Lemma 1 (Sherman-Morrison Formula [14]). Let $M$ be a square $n \times n$ invertible matrix and $M^{-1}$ its inverse. Moreover, let $a$ and $b$ be any two column vectors of size $n$. Then the following equation holds:

$$(M + ab^T)^{-1} = M^{-1} - \frac{M^{-1} a b^T M^{-1}}{1 + b^T M^{-1} a}.$$

Now, from Equation (3), the updated RWC, $\mathrm{RWC}'$, [...]

$$\mathrm{RWC}' = (1 - \alpha)(c_x - c_y)^T \left( M_x'^{-1} e_x - M_y'^{-1} e_y \right),$$

and the update in RWC can be written as

$$
\begin{aligned}
\delta(\mathrm{RWC}) &= \mathrm{RWC}' - \mathrm{RWC} \\
&= (1 - \alpha)(c_x - c_y)^T \left( \left( M_x'^{-1} e_x - M_x^{-1} e_x \right) + \left( M_y^{-1} e_y - M_y'^{-1} e_y \right) \right) \\
&= (1 - \alpha)(c_x - c_y)^T \left( - \frac{\alpha M_x^{-1} z_x u^T M_x^{-1}}{1 + \alpha u^T M_x^{-1} z_x} e_x + \frac{\alpha M_y^{-1} z_y u^T M_y^{-1}}{1 + \alpha u^T M_y^{-1} z_y} e_y \right).
\end{aligned}
$$
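Lemma 1 is what makes the incremental update cheap: given $M^{-1}$, the inverse after a rank-one change costs $O(n^2)$ instead of a fresh inversion. The following pure-Python sketch illustrates the lemma only. The function names are hypothetical, matrices are plain lists of rows, and it does not reproduce the authors' construction of $z_x$, $z_y$, and $u$.

```python
def mat_vec(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def sherman_morrison(m_inv, a, b):
    """Return (M + a b^T)^{-1} given M^{-1}, using the Sherman-Morrison formula.

    m_inv: inverse of M as a list of rows (n x n).
    a, b:  column vectors of size n, given as flat lists.
    """
    n = len(m_inv)
    m_inv_a = mat_vec(m_inv, a)                      # M^{-1} a
    b_t_m_inv = [sum(b[i] * m_inv[i][j] for i in range(n))
                 for j in range(n)]                  # b^T M^{-1}
    denom = 1.0 + sum(b_i * x_i for b_i, x_i in zip(b, m_inv_a))
    if abs(denom) < 1e-12:
        raise ZeroDivisionError("M + a b^T is singular")
    # M^{-1} - (M^{-1} a)(b^T M^{-1}) / (1 + b^T M^{-1} a)
    return [[m_inv[i][j] - m_inv_a[i] * b_t_m_inv[j] / denom
             for j in range(n)]
            for i in range(n)]
```

For example, with $M = I$, $a = (1, 0)^T$ and $b = (0, 1)^T$, the updated matrix is $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and the function returns its inverse $\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$ without solving a linear system.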
Acceptance Model

[...] link prediction.

Algorithm         Summary                                                AUC
Vertex polarity   Link recommendation based on vertex polarity           0.79
Adamic-Adar [1]   Link prediction based on number of common neighbors    0.60
Reliability [41]  Block stochastic model                                 0.66
RAI [35]          Using community detection to improve link prediction   0.60
SLIM [29]         Collaborative filtering recommendation                 0.71
FISM [20]         Content-based recommendation                           0.66

[...] which indicates that endorsement graphs across different datasets have similar edge-formation criteria.

We compare our approach with existing link-recommendation methods. The implementations are obtained from Librec [18]. Table 2 reports the results. As we can see, our approach, which uses vertex polarity scores for predicting links, works [...]
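For readers unfamiliar with the AUC column, the score can be read as the probability that a true link is ranked above a non-link. The sketch below computes it by direct pairwise comparison. It is an illustrative definition, not the evaluation code used for the table, and the function name and input format are assumptions.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve by pairwise comparison.

    pos_scores: predicted scores for edges that were actually formed.
    neg_scores: predicted scores for edges that were not formed.
    Returns the fraction of (positive, negative) pairs in which the positive
    is scored higher, counting ties as half.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the table's scores can be compared on that scale.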
Connecting Influencers

[...] with related approaches (NetGel, MioBi, Shortcut) for 2% of the total edges added. The Greedy algorithm considers all possible edges.

Figure 3: Comparison of different edge-addition strategies after the addition of 50 edges.

[...] 50 edges, drawn at random from the sampled vertices, and corresponding to the 4 possible combinations (high/non-high to high/non-high edges). Figure 3 shows the results of these simulations. We see that, despite the fact that high-degree vertices are selected at random, connecting such vertices gives the highest decrease in polarity score (blue line).

6.5 Case study
In order to provide qualitative evidence on the functioning of our algorithms on real-world datasets, we conduct a case study on three datasets. The datasets are chosen for the ease of the interpretation of the results, since they represent topics of wider interest (compared to beefban, for example, which is specific to India).

[...] sides, they might be hard to materialize in the real world. This issue is mitigated by ROV-AP, which recommends edges between less popular users, yet connects opposing viewpoints. Examples include the edge (csgv, dloesch) for guncontrol, which connects a pro-gun-control organization to a conservative radio host, or the edge (farhankvirk, pamelageller), which connects an islamist blogger with a user who wants to "Stop the Islamization of America."²

Additionally, we provide a quantitative comparison of the output of the two algorithms, ROV and ROV-AP, by extracting several statistics regarding the recommended edges. In particular we consider: (i) Total number of followers. We compute the median number of followers from all edges suggested by ROV and ROV-AP. A high value indicates that [...]
Speed-up

[...] the algorithm. [...] each recommended [...] who retweeted users x [...] of the two sets. [...] by taking the median [...] indicates that there is a [...] users x, y on the topic. [...] 4. We observe that [...] For example, ROV-AP [...] of followers (not ex[...] more common retweets, [...]

Figure 5: Relative speed-up produced by using our proposed method in Section 5. (Axis: speed-up, 0 to 60. Datasets: netanyahu, nemtsov, ukraine, baltimore, indiasdaughter, indiana, russia_march, beefban, obamacare, guncontrol.)

Future work. Our approach relies on a random walk-based optimization function [11]. Although this measure has been proven to be effective, it has a few drawbacks. In particular, the measure is applicable to controversies having [...]
Case Study

[...] our algorithms for different datasets.

        obamacare                         guncontrol                        #netanyahuspeech
        vertex1          vertex2          vertex1          vertex2          vertex1          vertex2
ROV     mittromney       barackobama      ghostpanther     barackobama      maxblumenthal    netanyahu
        realdonaldtrump  truthteam2012    mmflint          robdelaney       bipartisanism    lindasuhler
        barackobama      drudge_report    miafarrow        chuckwoolery     harryslaststand  rednationrising
        barackobama      paulryanvp       realalexjones    barackobama      lindasuhler      marwanbishara
        michelebachmann  barackobama      goldiehawn       jedediahbila     thebaxterbean    worldnetdaily
ROV-AP  kksheld          ezraklein        chuckwoolery     csgv             farhankvirk      pamelageller
        lolgop           romneyresponse   liamkfisher      miafarrow        medeabenjamin    annebayefsky
        irritatedwoman   motherjones      csgv             dloesch          2afight          sttbs73
        hcan             romneyresponse   jonlovett        spreadbutter     rednationrising  palsjustice
        klsouth          dennisdmz        drmartyfox       huffpostpol      jvplive          chucknellis

Table 4: Quantitative comparison of recommendations from ROV and ROV-AP. * indicates that the result is statistically significant with p < 0.1, and ** with p < 0.001. Significance is tested using Welch's t-test for inequality of means.

                 ROV      ROV-AP
NumFollowers     50729    36160*
ContentOverlap   0.054    0.073**
CommonRetweets   0.029    0.063**

[7] M. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, F. Menczer, and A. Flammini. Political Polarization on Twitter. In ICWSM, 2011.
[...] Adaptation and Personalization, 2015.
[19] D. J. Isenberg. Group polarization: A critical review and meta-analysis. Journal of Personality and Social Psychology, 50(6):1141, 1986.
[20] S. Kabbur, X. Ning, and G. Karypis. FISM: factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 659–667. ACM, 2013.
[21] Q. V. Liao and W.-T. Fu. Beyond the filter bubble: interactive effects of perceived threat and topic involvement on selective exposure to information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2359–2368. ACM, 2013.
[22] Q. V. Liao and W.-T. Fu. Can you hear me now?: mitigating the echo chamber effect by source position indicators. In Proceedings of the 17th ACM conference on [...]
[...] discussed in social media
We know how to quantify the polarization of a user on a given topic
We have a model for acceptance probability of a link in an endorsement graph
We have a way to select which link to recommend to reduce controversy
[...] Michael Mathioudakis. "Quantifying Controversy in Social Media." WSDM 2016.
Garimella, Kiran, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. "Reducing Controversy by Connecting Opposing Views." WSDM 2017. [best student paper]
Garimella, Kiran, Michael Mathioudakis, Gianmarco De Francisci Morales, and Aristides Gionis. "Exploring Controversy in Twitter." CSCW 2016.
Garimella, Kiran, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. "Mary, Mary, Quite Contrary: Exposing Twitter Users to Contrarian News." WWW 2017.
Garimella, Kiran, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. "The Ebb and Flow of Controversial Debates on Social Media." ICWSM 2017.
Garimella, Kiran, Gianmarco De Francisci Morales, Aristides Gionis, and Michael Mathioudakis. "Quantifying Controversy in Social Media." ACM TSC 2017.
[...] J. Huang. "STREAMCUBE: Hierarchical Spatio-temporal Hashtag Clustering for Event Exploration over the Twitter Stream." ICDE 2015.
[Fagin2003] R. Fagin, A. Lotem, and M. Naor. "Optimal aggregation algorithms for middleware." Journal of Computer and System Sciences, 2003.