Controversy on Social Media: Collective Attention, Echo Chambers, and Price of Bipartisanship

Transcript

  1. Controversy on Social Media:
    Collective Attention, Echo Chambers,
    and Price of Bipartisanship
    Gianmarco De Francisci Morales
    ISI Foundation
    with

    Kiran Garimella

    Aristides Gionis 

    Michael Mathioudakis

  2. Controversy: from Latin contra (against) + vertere (turn):
    “turned against, disputed”

  7. The gap grows

  8. The gap grows

  9. Goal
    Understand how controversies unfold in social media

  10. Outline

  11. Outline
    Quantify

  12. Outline
    Quantify
    Graph

  13. Outline
    Quantify
    Graph
    Polarization
    Measure

  14. Outline
    Quantify Evolve
    Graph
    Polarization
    Measure

  15. Outline
    Quantify Evolve
    Time
    Graph
    Polarization
    Measure

  16. Outline
    Quantify Evolve
    Time
    Graph
    Polarization
    Measure
    Collective
    Attention

  17. Outline
    Quantify Evolve
    Cause
    Time
    Graph
    Polarization
    Measure
    Collective
    Attention

  18. Outline
    Quantify Evolve
    Cause
    Time
    Graph
    Opinion
    Polarization
    Measure
    Collective
    Attention

  19. Outline
    Quantify Evolve
    Cause
    Time
    Graph
    Opinion
    Echo
    Chambers
    Polarization
    Measure
    Collective
    Attention

  20. Quantifying Controversy in Social Media
    WSDM 2016, TSC 2018

  21. Black/Blue or White/Gold?

  22. Desiderata
    In the wild
    Not necessarily political
    No domain knowledge
    Language independent
    Allows comparison

  23. Problem Formulation
    Graph-based unsupervised formulation
    Conversation graph for a topic (endorsements)
    Find partition of graph (represents 2 sides)
    Measure distance between partitions (random walks)
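
The first step of this formulation, turning endorsements into a conversation graph, can be sketched as follows. This is a minimal illustration, assuming retweets are available as (retweeter, original author) pairs; the function name `build_retweet_graph` and the sample data are hypothetical and do not come from the talk.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build an undirected endorsement graph from (retweeter, author) pairs.

    Returns a dict mapping each user to the set of users they are linked to.
    Self-retweets are ignored (an assumption of this sketch).
    """
    graph = defaultdict(set)
    for retweeter, author in retweets:
        if retweeter == author:
            continue
        graph[retweeter].add(author)
        graph[author].add(retweeter)
    return dict(graph)


# Hypothetical sample: two small groups that never retweet each other.
sample = [("a", "b"), ("c", "b"), ("d", "e"), ("e", "e")]
graph = build_retweet_graph(sample)
```

The resulting adjacency dictionary is the kind of input a partitioning step and a random-walk measure could then operate on.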

  24. Endorsement Graph
    #марш #sxsw

  25. Pipeline
    • Retweets
    • Follow
    • Mentions
    • Content
    • METIS
    • Spectral
    • Label propagation
    • Random walk
    • Edge betweenness
    • 2d embedding
    • Sentiment variance

  26. Example

  27. Example
    #beefban #марш #sxsw #germanwings

  28. Example
    #beefban #марш #sxsw #germanwings
    Controversial Non-controversial

  29. RWC Rationale
    Random Walk Controversy score
    Zaller's RAS model (Receive, Accept, Sample) 

    "The Nature and Origins of Mass Opinion"
    Response Axiom: "Individuals form opinions by averaging across
    the considerations that are immediately salient or accessible to
    them"
    Authoritative (influential) users with high degree set opinions
    Measure likelihood of user to be exposed to opinions from influential
    users on either side

  30. Random Walk

  31. Random Walk
    X Y

  32. Random Walk
    X Y

  33. RWC Definition
    Consider two random walks, one ending in partition X and one ending in
    partition Y. RWC is the difference of the probabilities of two events: (i)
    both random walks started from the partition they ended in and (ii) both
    random walks started in a partition other than the one they ended in.
    Setup: we select one partition at random (each with probability 0.5) and
    consider a random walk that starts from a random vertex in that partition.
    The walk terminates when it visits any high-degree vertex (from either side).
    The measure is quantified as
    $$\mathrm{RWC} = P_{XX} P_{YY} - P_{YX} P_{XY},$$
    where $P_{AB}$, $A, B \in \{X, Y\}$, is the conditional probability
    $$P_{AB} = \Pr[\text{start in partition } A \mid \text{end in partition } B].$$
    The aforementioned probabilities have the following desirable properties: (i) they
    are not skewed by the size of each partition, as the random walk starts with equal
    probability from each partition, and (ii) they are not skewed by the total degree
    of vertices in each partition.
    Figure: partitions obtained for (a) #beefban and (b) #russia_march using the hybrid
    graph; the partitions are noisier than those in Figures 3(a,b).

  34. RWC Properties
    Probabilities are conditional on ending in either partition
    Random walks end on either side with equal probability
    Not skewed by size of each partition
    Not skewed by total degree of vertices in each partition
    Close to 1 when probability of crossing sides low (high
    controversy)
    Close to 0 when probability of crossing comparable to that of
    staying (low controversy)
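
The definition and properties above can be illustrated with a small Monte Carlo estimate. This is a sketch under stated assumptions, not the authors' implementation: the function name `rwc_score`, the choice of the `k` highest-degree vertices per side as walk endpoints, the number of walks, and the toy clique graphs are all hypothetical. A walk that starts on a high-degree vertex is treated as ending there immediately.

```python
import random


def rwc_score(graph, side_x, side_y, k=2, walks=2000, max_steps=1000, seed=0):
    """Monte Carlo estimate of RWC = P_XX * P_YY - P_YX * P_XY.

    graph:  dict mapping each vertex to the set of its neighbours (undirected).
    side_x, side_y: the two partitions of the vertices.
    k:      number of highest-degree vertices per side that end a walk
            (an assumption of this sketch).
    """
    rng = random.Random(seed)
    sides = {"X": sorted(side_x), "Y": sorted(side_y)}
    neighbours = {v: sorted(ns) for v, ns in graph.items()}

    # Map each high-degree "hub" vertex to the side it belongs to.
    hub_side = {}
    for label, members in sides.items():
        by_degree = sorted(members, key=lambda v: len(neighbours[v]), reverse=True)
        for v in by_degree[:k]:
            hub_side[v] = label

    # counts[(a, b)] = number of walks that started in side a and ended in side b.
    counts = {(a, b): 0 for a in "XY" for b in "XY"}
    for _ in range(walks):
        start = rng.choice("XY")              # each side with probability 0.5
        node = rng.choice(sides[start])       # uniform vertex of that side
        for _ in range(max_steps):
            if node in hub_side:              # the walk ends at any hub
                counts[(start, hub_side[node])] += 1
                break
            if not neighbours[node]:          # isolated vertex: discard the walk
                break
            node = rng.choice(neighbours[node])

    def p(a, b):
        """Pr[start in side a | end in side b], estimated from the counts."""
        ended_in_b = counts[("X", b)] + counts[("Y", b)]
        return counts[(a, b)] / ended_in_b if ended_in_b else 0.0

    return p("X", "X") * p("Y", "Y") - p("Y", "X") * p("X", "Y")


def clique(nodes):
    """Complete graph on the given vertices."""
    nodes = list(nodes)
    return {v: {u for u in nodes if u != v} for v in nodes}


# Two disconnected cliques: no walk can cross sides.
separated = {**clique(range(20)), **clique(range(20, 40))}
rwc_separated = rwc_score(separated, range(20), range(20, 40))

# One big clique split arbitrarily in two: crossing is about as likely as staying.
mixed = clique(range(40))
rwc_mixed = rwc_score(mixed, range(20), range(20, 40))
```

In the disconnected case every walk ends on the side where it started, so the estimate equals 1 by construction; in the fully mixed case it stays much closer to 0, matching the two regimes described on the slide.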

  35. Controversy Detection

  36. Controversy Score

  37. Planted Synthetic Graphs
    Fig. 10: RWC scores for synthetic Erdős–Rényi graphs planted with two communities.
    p1 is the intra-community edge probability, while p2 is the inter-community edge
    probability. Panels: (a), (b).
    We generate random Erdős–Rényi graphs with varying community structure, and compute
    the RWC score on them. Specifically, to mimic community structure, we plant two
    separate communities with intra-community edge probability p1. That is, p1 defines how
    dense these communities are within themselves. We then add random edges between
    these two communities with probability p2. Therefore, p2 defines how connected the
    two communities are. A higher value of p1 and a lower value of p2 create a clearer
    two-community structure.
    Figure 10 shows the RWC score for random graphs of 2000 vertices for two different
    settings: plotting the score as a function of p1 while fixing p2 (Figure 10a), and vice versa.
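
The planted construction described on this slide can be sketched as follows. This is an illustrative generator only: the function names `planted_two_communities` and `edge_counts`, the seed, and the community sizes are assumptions made for the example (the slide mentions graphs of 2000 vertices; smaller ones are used here to keep the sketch fast).

```python
import random
from itertools import combinations


def planted_two_communities(n_per_side, p1, p2, seed=0):
    """Random graph with two planted communities.

    p1: probability of an edge between two vertices of the same community.
    p2: probability of an edge between vertices of different communities.
    Returns (graph, side_x, side_y), where graph maps vertex -> neighbour set.
    """
    rng = random.Random(seed)
    side_x = list(range(n_per_side))
    side_y = list(range(n_per_side, 2 * n_per_side))
    graph = {v: set() for v in side_x + side_y}

    def maybe_link(u, v, prob):
        if rng.random() < prob:
            graph[u].add(v)
            graph[v].add(u)

    for side in (side_x, side_y):          # intra-community edges
        for u, v in combinations(side, 2):
            maybe_link(u, v, p1)
    for u in side_x:                       # inter-community edges
        for v in side_y:
            maybe_link(u, v, p2)
    return graph, side_x, side_y


def edge_counts(graph, side_x):
    """Return (intra-community edges, inter-community edges)."""
    in_x = set(side_x)
    intra = inter = 0
    for u, nbrs in graph.items():
        for v in nbrs:
            if u < v:                      # count each undirected edge once
                if (u in in_x) == (v in in_x):
                    intra += 1
                else:
                    inter += 1
    return intra, inter


# Extreme setting: fully dense communities, no edges between them.
clear_graph, clear_x, clear_y = planted_two_communities(5, p1=1.0, p2=0.0)
clear_intra, clear_inter = edge_counts(clear_graph, clear_x)

# Noisier setting: dense inside, sparse across.
noisy_graph, noisy_x, noisy_y = planted_two_communities(50, p1=0.5, p2=0.05)
noisy_intra, noisy_inter = edge_counts(noisy_graph, noisy_x)
```

Sweeping `p1` with `p2` fixed (or the reverse) and scoring each graph would reproduce the kind of experiment the figure caption describes.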

  38. Summary
    RWC: a measure for how controversial a discussion on
    a topic is on social media
    Graph-based measure: no domain knowledge, language
    agnostic
    Intuitive semantics founded on opinion formation models
    Captures controversy better than state-of-the-art
    User-level polarization measure easy to derive

  39. The Effect of Collective Attention on
    Controversial Debates on Social Media
    WebSci 2017 (Best Paper Award)

  40. The Effect of Collective Attention on
    Controversial Debates on Social Media

  41. The Effect of Collective Attention on
    Controversial Debates on Social Media

  42. The Effect of Collective Attention on
    Controversial Debates on Social Media

  43. The Effect of Collective Attention on
    Controversial Debates on Social Media

  44. "Trump taxes" on Google

  45. "Trump taxes" on Google
    Rachel Maddow
    show on 2005
    tax return

  46. Obamacare on Twitter

  47. Gun Control on Twitter

  48. Literature so far
    Controversial debates examined in isolation
    As static snapshots

  49. Contribution
    Controversial debates are dynamic
    They change with collective attention
    Analyze controversial debates over time
    Particularly when collective attention increases
    When external ‘event’ happens

  50. Data
    Twitter
    4 longitudinal polarized topics
    Obamacare, Abortion, Gun control, Fracking
    5 years (2011 -- 2016)
    Hundreds of thousands of users
    Millions of tweets

  51. Definitions
    Retweet Graph
    Reply Graph
    Core Users

  52. Retweet Graph

  53. Reply graph

  54. Core

  55. Core
    Core Users

  56. Experiments

  57. Experiments
    Compare these two points

  58. Retweet Graph

  59. Retweet Graph
    1) New users enter
    the discussion

  60. Retweet Graph
    2) Most retweets to
    existing core users
    1) New users enter
    the discussion

  61. Retweet Graph
    2) Most retweets to
    existing core users
    1) New users enter
    the discussion
    3) Cross-side
    retweets decrease

  62. Retweet Graph
    2) Most retweets to
    existing core users
    1) New users enter
    the discussion
    3) Cross-side
    retweets decrease
    4) Within-side
    retweets increase

  63. Controversy Measure
    Figure 2: RWC score as a function of the activity in the
    retweet network. An increase in interest in the controversial
    topic corresponds to an increase in the controversy score of
    the retweet network.

  64. Core-Periphery Openness
    Figure 12: Core–periphery openness as a function of activity
    in the retweet network. As the interest increases, the number
    of core–periphery edges, normalized by the expected
    number of edges in a random network, increases. This suggests
    a propensity of periphery nodes to connect with the
    core nodes when interest increases.

  65. Reply Graph
    Cross-side edges increase: more discussion
    Attention increases

  66. Content
    Pro Life
    Pro Choice
    Normal Condition
    Attention Increase

  67. Content
    Pro Life
    Pro Choice
    Normal Condition
    Attention Increase
    Content becomes uniform across the sides

  68. Long-Term Polarization

  69. Summary
    Controversial debates during external events
    Polarization increases
    Retweet graph becomes hierarchical (core-periphery)
    More replies across sides
    Content becomes more uniform
    Many more results in the paper!

  70. Political Discourse on Social Media
    Echo Chambers, Gatekeepers, and the Price of Bipartisanship
    WWW 2018

  71. Political Discourse on Social Media
    Characterized by heavy polarization
    Emergence of echo chambers ("Hear your own voice")
    Might hamper deliberative process in democracy
    Lack of shared world view
    Concern expressed by former US Presidents,
    Facebook, Twitter, and more

  72. Polarization Cause
    Selective exposure?
    People see only content that agrees with their pre-existing opinion
    Biased assimilation?
    People pay more attention to content that agrees
    with their pre-existing opinion

  73. Echo Chamber Definition
    Echo = opinion
    Chamber = network
    Joint content + network definition
    Echo chamber = political leaning of content that users
    receive from network agrees with that of content they
    share to the network

  74. Production/Consumption
    Consumption
    What you receive in your feed
    What your followees tweet
    Production
    What you tweet

  75. Political Leaning Scores
    Based on source of the content (500 domains)
    Score derived from the self-declared affiliation of sharers on Facebook (FB)
    FoxNews.com is aligned with conservatives (CP = 0.9),

    HuffingtonPost.com is aligned with liberals (CP = 0.17)

  76. Production/Consumption
    Scores
    Polarity scores based on “content” leaning (from source)
    Production score
    Average political leaning of the content the user tweets
    Consumption score
    Average political leaning of the content the user receives on
    their feed
    Results of selection by the user
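
The two scores defined above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the data layout (one list of shared domains per user, one list of followees per user), and the choice to average over every received item are assumptions. The two leaning values are the ones quoted on the Political Leaning Scores slide; the users are made up.

```python
from statistics import mean

# Domain leaning scores as quoted on the slide (higher = more conservative).
LEANING = {"FoxNews.com": 0.9, "HuffingtonPost.com": 0.17}


def production_scores(tweets, leaning):
    """Average leaning of the scored domains each user tweets."""
    scores = {}
    for user, domains in tweets.items():
        known = [leaning[d] for d in domains if d in leaning]
        if known:                      # users with no scored content are skipped
            scores[user] = mean(known)
    return scores


def consumption_scores(tweets, followees, leaning):
    """Average leaning of the scored domains tweeted by each user's followees."""
    scores = {}
    for user, followed in followees.items():
        received = [
            leaning[d]
            for f in followed
            for d in tweets.get(f, [])
            if d in leaning
        ]
        if received:
            scores[user] = mean(received)
    return scores


# Hypothetical users and follow relations.
tweets = {
    "alice": ["FoxNews.com", "FoxNews.com"],
    "bob": ["HuffingtonPost.com"],
    "carol": ["FoxNews.com", "HuffingtonPost.com"],
}
followees = {"carol": ["alice", "bob"], "alice": []}

production = production_scores(tweets, LEANING)
consumption = consumption_scores(tweets, followees, LEANING)
```

Because consumption is computed from the accounts a user chose to follow, it reflects the user's own selection, as the last line of the slide notes.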

  77. δ-partisanship
    Figure 1: Example showing the definition of δ-partisan users.
    The dotted red lines are drawn at δ and 1-δ. Users on the left
    of the leftmost dashed red line or right of the rightmost one
    are δ-partisan.

  78. δ-{partisan,consumer,gatekeeper}
    δ-partisan: produces content with polarity beyond δ
    δ-bipartisan: produces content with polarity within δ
    δ-consumer: consumes content with polarity beyond δ
    δ-gatekeeper: δ-partisan but not δ-consumer
    consumes from both sides but produces content aligned
    with only one side
    blocks information flow towards its community
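
The definitions above can be written down as a small classifier. This is a sketch under assumptions: scores are taken to lie in [0, 1], "beyond δ" is read as below δ or above 1-δ, the inequalities are strict, and the function names and example scores are hypothetical.

```python
def is_beyond(score, delta):
    """True if a polarity score in [0, 1] lies outside the band [delta, 1 - delta]."""
    return score < delta or score > 1 - delta


def classify(production, consumption, delta):
    """Return the set of delta-roles for one user.

    production:  production polarity score of the user.
    consumption: consumption polarity score of the user.
    """
    roles = set()
    roles.add("partisan" if is_beyond(production, delta) else "bipartisan")
    if is_beyond(consumption, delta):
        roles.add("consumer")
    # A gatekeeper produces one-sided content but consumes from both sides.
    if "partisan" in roles and "consumer" not in roles:
        roles.add("gatekeeper")
    return roles


# Hypothetical users, classified with threshold delta = 0.2.
gatekeeper_roles = classify(production=0.9, consumption=0.5, delta=0.2)
echo_chamber_roles = classify(production=0.9, consumption=0.95, delta=0.2)
moderate_roles = classify(production=0.5, consumption=0.5, delta=0.2)
```

Raising δ widens the band that counts as partisan, so the same user can change role as the threshold moves.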

  79. Network Measures
    Network-based latent-space user polarity
    Based on following politicians with aligned ideology
    Network centrality (PageRank)
    Local clustering coefficient
    Retweet/favorite rates and volumes
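
Of the measures listed, the local clustering coefficient is the simplest to show directly. The following is a minimal sketch using the standard definition (the fraction of a vertex's neighbour pairs that are themselves connected); the function name and the toy graph are hypothetical.

```python
from itertools import combinations


def local_clustering(graph, v):
    """Local clustering coefficient of vertex v in an undirected graph.

    graph: dict mapping each vertex to the set of its neighbours.
    Vertices with fewer than two neighbours get 0.0 by convention.
    """
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Count the pairs of neighbours that are linked to each other.
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


# Toy graph: a triangle (1, 2, 3) with a pendant vertex 4 attached to 1.
toy = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
cc_hub = local_clustering(toy, 1)
cc_triangle = local_clustering(toy, 2)
cc_pendant = local_clustering(toy, 4)
```

A low value indicates a user whose contacts are not connected to each other, that is, a user who is not fully embedded in a single community.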

  80. Correlation
    Figure 3: Distribution of production and consumption polarity, for P (first row) and NP (second row)
    datasets. The scatter plots display the production (x-axis) and consumption (y-axis) polarities of each user in a dataset. Colors
    indicate user polarity sign, following [6] (grey = democrat, yellow = republican). The one-dimensional plots along the axes
    show the distributions of the production and consumption polarities for democrats and republicans.
    Panels: (a) to (e) in the first row, (f) to (j) in the second row.

  81. Correlation: Gun Control

  82. Variance
    Figure 4: Top: Production polarity variance vs. production polarity (mean). Bottom: Consumption polarity variance vs.
    consumption polarity (mean). Panels: (a) to (e) in the top row, (f) to (j) in the bottom row.
    However, differently from the rest of the side they align with, they
    show a lower clustering coefficient, an indication that they are
    not completely embedded in a single community. Given that they
    receive content also from the opposing side, this result is to be [...]
    Finally, given that both partisans and gatekeepers sport higher
    centrality, we compare their PageRank values directly and find that
    there is a significant difference: partisans have a higher PageRank
    compared to gatekeepers (figure not shown). This effect is more [...]

  83. Variance
    (b) (c)

  84. Price of Bipartisanship
    Figure 5: Absolute value of the user polarity scores for δ-partisan and δ-bipartisan users.
    Panels: (a) Large, (b) Combined, (c) Guncontrol, (d) Obamacare, (e) Abortion. x-axis: threshold δ (0.2 to 0.4).
    Figure 6: PageRank for δ-partisan and δ-bipartisan users.
    Panels: (a) Large, (b) Combined, (c) Guncontrol, (d) Obamacare, (e) Abortion. x-axis: threshold δ (0.2 to 0.4).
    Table 3: Comparison between δ-gatekeeper users and a random sample of normal users. A ✓ indicates that the
    corresponding property is significantly higher for gatekeepers (p < 0.001) for at least 4 of the 6 thresholds used.
    A minus next to the checkmark (-) indicates that the property is significantly lower.
    Table 4: Accuracy for prediction of users who are partisans (p) or gatekeepers. (net) indicates network and
    profile features only, (n-gram) indicates just n-gram features. The last two columns show results for all features combined.

  85. Price of Bipartisanship: PR
    Figure 6 (detail): PageRank for δ-partisan and δ-bipartisan users.
    Visible panels: (b) Combined, (c) Guncontrol, (d) Obamacare. x-axis: threshold δ.

  86. Partisans vs Bipartisans

    Gatekeepers vs Non-gatekeepers
    Table 2: Comparison of various features for partisans & bipartisans
    and gatekeepers & non-gatekeepers. A ✓ indicates
    that the corresponding feature is significantly higher for the
    group of the column (p < 0.001) for at least 4 of the 6 thresholds
    used, for most datasets. A minus next to the checkmark
    (-) indicates that the feature is significantly lower.
    Features                 Partisans   Gatekeepers
    PageRank                 ✓           ✓
    clustering coefficient   ✓ (-)       ✓ (-)
    user polarity            ✓ (-)       ✓ (-)
    degree                   ✓           ✓
    retweet rate             ✓           ✗
    retweet volume           ✓           ✗
    favorite rate            ✓           ✗
    favorite volume          ✓           ✗
    # followers              ✗           ✗
    # friends                ✗           ✗
    # tweets                 ✗           ✗
    age on Twitter           ✗           ✗
    A “✓ (-)” means that the property is significantly lower.

  87. Summary
    Find echo chambers in political discussion on Twitter
    Definition of echo chambers with two elements:
    Content (echo) + Network (chamber)
    Data supports the selective exposure theory
    Bi-partisan users pay a price in terms of network
    centrality and content appreciation

  88. Conclusions
    How do controversies unfold on social media?
    Measuring is the first step (RWC)
    Controversies are dynamic (time is an important factor)
    Collective attention increases polarization
    Echo chambers associated with controversies
    Evidence of selective exposure and price of bi-partisanship

  89. What's next?
    Joint opinion formation + network generation model
    Adding data to opinion dynamics models
    Temporal dynamics of the process
    Application to other contexts (Reddit, Facebook)
    Interventions: can we do something about it?

  90. Thanks!
    Questions please!
    @gdfm7
    [email protected]
