Putting it all on the line

Some of the things that I do

Will Lowe

April 25, 2013

Transcript

  1. Putting it all on the line

    Will Lowe, MZES/SFB/Eurodata, University of Mannheim (and sometime CS, University of Bath)
  4. . . . and you are?

    I’m a political methodologist. The role of a methodologist is to
    - clarify the substantive assumptions and technical limitations of research methods
    - prevent social scientists worrying about differences and choices that will not matter
    - offer general solutions to general problems
  5. Today’s Applications

    - Scaling text to extract positions of political actors
    - Visualising conversation topics in protest movements
    - Measurement models of international conflict
  7. Estimating Positions

    Political scientists need to estimate positions or ‘ideal points’ for political actors, from:
    - Judgements from experts, or voters
    - Legislative votes
    - Raw and pre-coded texts, e.g. manifestos, media, speeches

    Let me start by telling you something about scaling text . . .
  9. Estimating Positions

    It looks like there are a lot of ways to do it, e.g. (Budge et al. 1987; Laver & Garry, 2000; Laver et al. 2003; Monroe & Maeda, 2004; Slapin & Proksch, 2008; Elff, 2013), with different models, assumptions, and estimators . . . but there aren’t
  12. Estimating Positions

    Two claims:
    - Positions are taken by changing the relative (proportional) emphasis of countable items: words, topics, categories.
    - Dimensions are low-dimensional latent spaces that best explain variation across relative emphases.

    These imply one model for scaling count data.
  14. Unifying Theory

    In current work I show that existing methods are either that model, approximations to that model, or special cases of that model

    Practical consequences: clearer substantive assumptions, easier extensions, better estimation procedures and uncertainty measures, new visualisation methods
  17. The Data

    A word frequency matrix is a contingency table

    Position         W1   W2   W3   W4   W5   W6   W7   W8   Total
    ?        D1       4    6    2    2   15    6    2    3      40
    ?        D2      12   17    8   10   49   48   27   29     200
    ?        D3       6    5    2    3   14   10   11    9      60
    ?        D4       5    6    6   13   18   14   19   19     100
    ?        D5       7    3   17   19   15   21   56   42     180
    ?        D6       2    3   13   17   13   10   31   51     140
             Total   36   40   48   64  124  109  146  153     720

    What positions generate these counts? Let’s consider some traditional models. . .
  18. . . . that won’t work

    (Word frequency table as on slide 17.)

    There are two log-linear models of this table:

    $\log \mu_{ij} = \alpha_i + \psi_j$ (independence: boring)
    $\log \mu_{ij} = \alpha_i + \psi_j + \lambda_{ij}$ (saturated: pointless)
  19. Finding where the action is

    Two traditional but useless models:

    $\log \mu_{ij} = \alpha_i + \psi_j$ (independence)
    $\log \mu_{ij} = \alpha_i + \psi_j + \lambda_{ij}$ (saturated)

    All the position-taking action is in $\lambda$ . . . so let’s analyse that
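As a rough illustration of the two log-linear models, the sketch below fits the independence model to the word frequency table from the slides and extracts an interaction term $\lambda$ from the saturated model. It assumes that $\lambda$ is identified by sum-to-zero constraints over rows and columns (so it is the double-centred matrix of log counts); the slides do not state which constraints are used, and the variable names are illustrative.

```python
import numpy as np

# Word frequency table from the slides: rows D1..D6, columns W1..W8.
counts = np.array([
    [4, 6, 2, 2, 15, 6, 2, 3],
    [12, 17, 8, 10, 49, 48, 27, 29],
    [6, 5, 2, 3, 14, 10, 11, 9],
    [5, 6, 6, 13, 18, 14, 19, 19],
    [7, 3, 17, 19, 15, 21, 56, 42],
    [2, 3, 13, 17, 13, 10, 31, 51],
], dtype=float)

row_totals = counts.sum(axis=1)
col_totals = counts.sum(axis=0)
total = counts.sum()

# Independence model: fitted counts are (row total * column total) / N.
mu_independence = np.outer(row_totals, col_totals) / total

# Saturated model: fitted counts equal the data, so the interaction is what
# remains of the log counts once row and column effects are removed.
# Assumption: sum-to-zero constraints, i.e. double-centring the log counts.
log_counts = np.log(counts)
lam = (log_counts
       - log_counts.mean(axis=1, keepdims=True)
       - log_counts.mean(axis=0, keepdims=True)
       + log_counts.mean())

print(np.round(mu_independence, 1))
print(np.round(lam, 2))
```

The independence fit reproduces the margins but says nothing about positions; everything document-specific about word use sits in `lam`.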
  21. Extracting a Latent Space

    Intuition: Every matrix has an orthogonal decomposition

    $\lambda = \Theta \Sigma B^T$ (SVD)
    $= \sum_{m}^{M} \theta_{(m)} \sigma_{(m)} \beta_{(m)}^T \approx \theta \sigma \beta^T$ (Rank 1 approx.)

    $\theta$ are document positions, $\beta$ are word positions, and $\sigma$ says how much positioning action is in this dimension
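The decomposition can be sketched numerically with an ordinary SVD. The example below is a least squares illustration on the double-centred log counts, not the maximum likelihood estimator discussed in the talk; the centring constraints are an assumption, the singular vectors come out with unit length and arbitrary sign, and the resulting numbers are not expected to match the scores shown on the slides.

```python
import numpy as np

# Word frequency table from the slides: rows D1..D6, columns W1..W8.
counts = np.array([
    [4, 6, 2, 2, 15, 6, 2, 3],
    [12, 17, 8, 10, 49, 48, 27, 29],
    [6, 5, 2, 3, 14, 10, 11, 9],
    [5, 6, 6, 13, 18, 14, 19, 19],
    [7, 3, 17, 19, 15, 21, 56, 42],
    [2, 3, 13, 17, 13, 10, 31, 51],
], dtype=float)

# Interaction term: double-centred log counts (assumed identification).
log_counts = np.log(counts)
lam = (log_counts
       - log_counts.mean(axis=1, keepdims=True)
       - log_counts.mean(axis=0, keepdims=True)
       + log_counts.mean())

# lambda = Theta Sigma B^T
U, s, Vt = np.linalg.svd(lam, full_matrices=False)

theta = U[:, 0]   # document positions on dimension 1
sigma = s[0]      # how much association this dimension carries
beta = Vt[0, :]   # word positions on dimension 1

# Rank 1 approximation and the share of squared interaction it captures.
rank1 = sigma * np.outer(theta, beta)
share = s[0] ** 2 / np.sum(s ** 2)

print("theta:", np.round(theta, 2))
print("beta: ", np.round(beta, 2))
print("share of squared interaction in dimension 1:", round(float(share), 2))
```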
  23. Extracting a Latent Space

    $\theta$                     W1     W2     W3     W4     W5     W6     W7     W8   Total
              $\beta$     -1.11   -1.7   0.89   0.81  -1.22  -0.85   0.81   0.91
    -1.37     D1              4      6      2      2     15      6      2      3      40
    -0.61     D2             12     17      8     10     49     48     27     29     200
    -0.45     D3              6      5      2      3     14     10     11      9      60
     0.21     D4              5      6      6     13     18     14     19     19     100
     1.00     D5              7      3     17     19     15     21     56     42     180
     1.22     D6              2      3     13     17     13     10     31     51     140
              Total          36     40     48     64    124    109    146    153     720

    $\sigma = 0.39$. Word scores $\beta$ are not optional. . .
  25. Models

    The model in 1 dimension (Goodman, 1981)

    $\log \mu_{ij} = \alpha_i + \psi_j + \theta_i \sigma \beta_j$ (RC Model)

    or (Monroe & Maeda 2004; Slapin & Proksch 2007)

    $\log \mu_{ij} = \alpha_i + \psi_j + \theta_i \beta_j$ (Wordfish)

    in a least squares approximation (Benzecri, 1973)

    $p_{ij} = r_i c_j (1 + \theta_i \sigma \beta_j)$ (CA)

    with Wordscores as a special case (Lowe, 2008)
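The CA form $p_{ij} = r_i c_j (1 + \theta_i \sigma \beta_j)$ can be sketched with the textbook correspondence analysis recipe: take the SVD of the standardised residuals from independence and rescale the first singular vectors by the row and column masses. This is a minimal sketch under that standard construction, applied to the word frequency table from the slides; it is not claimed to reproduce the scores reported in the talk, which may come from a different estimator or normalisation.

```python
import numpy as np

# Word frequency table from the slides: rows D1..D6, columns W1..W8.
counts = np.array([
    [4, 6, 2, 2, 15, 6, 2, 3],
    [12, 17, 8, 10, 49, 48, 27, 29],
    [6, 5, 2, 3, 14, 10, 11, 9],
    [5, 6, 6, 13, 18, 14, 19, 19],
    [7, 3, 17, 19, 15, 21, 56, 42],
    [2, 3, 13, 17, 13, 10, 31, 51],
], dtype=float)

P = counts / counts.sum()   # observed proportions p_ij
r = P.sum(axis=1)           # row masses r_i
c = P.sum(axis=0)           # column masses c_j
expected = np.outer(r, c)   # proportions under independence

# Standardised residuals and their SVD.
S = (P - expected) / np.sqrt(expected)
U, s, Vt = np.linalg.svd(S, full_matrices=False)

# Standard coordinates: weighted mean 0 and weighted variance 1.
theta = U[:, 0] / np.sqrt(r)
beta = Vt[0, :] / np.sqrt(c)
sigma = s[0]                # first canonical correlation

# One-dimensional CA reconstruction of the table of proportions.
P_rank1 = expected * (1 + sigma * np.outer(theta, beta))

print("sigma:", round(float(sigma), 3))
print("theta:", np.round(theta, 2))
print("beta: ", np.round(beta, 2))
```

Because this is a single SVD, it runs almost instantly even on large tables, which is the practical appeal of CA as an approximation.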
  27. Practical Consequences

    Substantive: Opening black boxes
    - $\sigma$ is a dimensional association or a correlation coefficient
    - Opens the way to multidimensional models
    - No more empirical papers comparing count scaling methods

    Statistical:
    - RC model estimates German party positions in $\approx$ 30 mins, CA in $\approx$ 1 second
    - Statistical theory now available for both (e.g. Becker, Gilula, Lowe and Benoit, 2012)
  28. DE Party Manifestos: Economy

    [Figure: sigma (canonical correlation) values, between about 0.15 and 0.27, plotted against 1 to 20.]
  29. [Figure: positions of FDP, PDS, GREENS, SPD and CDU (five points per party) plotted on Dimension 1 against Dimension 2.]
  31. Validity

    Much work comparing model positions and uncertainty to manually categorised sentences (Lowe et al. 2011, Benoit et al. 2012) and legislative speeches (Lowe & Benoit 2011, 2012, forthcoming)

    The match is good, but not perfect:
    - Multiple dimensions, drifting language and topics
    - Extra-dimensional structure, e.g. gov. and opposition

    These are the topics of current work
  32. [Figure: Human Direct Scaling Estimates plotted against Poisson Scaling Estimates, with speakers grouped as GOVERNMENT and OPPOSITION: Lenihan, Brian (FF); Cowen, Brian (FF); Gilmore, Eamon (LAB); Kenny, Enda (FG); Bruton, Richard (FG); Quinn, Ruairi (LAB); Higgins, Michael (LAB); Burton, Joan (LAB); ODonnell, Kieran (FG); Morgan, Arthur (SF); OCaolain, Caoimhghin (SF); Gormley, John (Green); Cuffe, Ciaran (Green); Ryan, Eamon (Green).]
  33. How to talk like you’re not in government

    [Figure: Human positions with uncertainty plotted against Model position with word bootstrap uncertainty, for Gilmore (LAB), Kenny (FG), Bruton (FG), Quinn (LAB), Burton (LAB), ODonnell (FG), Morgan (SF), OCaolain (SF) and Higgins (LAB).]
  34. Visualising Relative Emphasis

    Manually-coded Twitter conversations about #OccupyWallStreet (Theocharis et al., 2013)

                                     ESP   GRE   USA
    capitalism/crisis                 33    68    85
    government inefficiency            1    33    26
    media criticism                   42    22    73
    other political topic             40     3    14
    protest acts and movement        487   409   479
    resentment of political elite    101   118    19

    Fit a 2-dimensional model and visualise this table using a biplot
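A minimal sketch of such a fit, assuming the 2-dimensional model is fitted in its correspondence analysis form: a table with three columns has at most two non-trivial dimensions, so two dimensions reproduce it exactly. The choice of an asymmetric biplot (topics in principal coordinates, countries in standard coordinates) is an assumption made for illustration, and the coordinates are printed rather than plotted so the example needs nothing beyond NumPy.

```python
import numpy as np

topics = [
    "capitalism/crisis",
    "government inefficiency",
    "media criticism",
    "other political topic",
    "protest acts and movement",
    "resentment of political elite",
]
countries = ["ESP", "GRE", "USA"]

# Topic-by-country counts from the slide.
counts = np.array([
    [33, 68, 85],
    [1, 33, 26],
    [42, 22, 73],
    [40, 3, 14],
    [487, 409, 479],
    [101, 118, 19],
], dtype=float)

P = counts / counts.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
expected = np.outer(r, c)

# SVD of the standardised residuals from independence.
S = (P - expected) / np.sqrt(expected)
U, s, Vt = np.linalg.svd(S, full_matrices=False)

sigma = s[:2]
theta = U[:, :2] / np.sqrt(r)[:, None]      # topic standard coordinates
beta = Vt[:2, :].T / np.sqrt(c)[:, None]    # country standard coordinates

# Asymmetric biplot coordinates (an illustrative choice of scaling).
row_points = theta * sigma
col_points = beta

# Two dimensions reproduce the whole table.
P_hat = expected * (1 + row_points @ col_points.T)

for name, (x, y) in zip(topics, row_points):
    print(f"{name:32s}{x: .3f} {y: .3f}")
for name, (x, y) in zip(countries, col_points):
    print(f"{name:32s}{x: .3f} {y: .3f}")
```

The printed pairs can be passed to any plotting library to draw the biplot itself.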
  35. [Figure: biplot showing the six conversation topics (capitalism/crisis, government inefficiency, media criticism, other political topic, protest acts and movement, resentment of political elite) together with ESP, GRE and USA.]
  37. Estimating Conflict Levels

    Quantitative international relations and conflict researchers often need data on international events, for
    - understanding conflict dynamics and the consequences of intervention
    - evaluating risk and forecasting
    - measuring connectedness in the international system

    Extracted from newswire by machine in the form of [Date, SOURCE, EVENT, TARGET]
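To make the record format concrete, here is a small sketch of how [Date, SOURCE, EVENT, TARGET] records could be turned into weekly counts per directed dyad and event category. The records, actor codes and the Monday-based week boundary are all hypothetical choices for illustration; the talk does not specify how events are binned.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical records in the [Date, SOURCE, EVENT, TARGET] form.
# Actor codes and events are made up for this example.
events = [
    (date(1995, 7, 3), "SRB", "Material Conflict", "BIH"),
    (date(1995, 7, 5), "SRB", "Material Conflict", "BIH"),
    (date(1995, 7, 9), "SRB", "Verbal Conflict", "BIH"),
    (date(1995, 7, 10), "SRB", "Verbal Cooperation", "BIH"),
    (date(1995, 7, 6), "BIH", "Verbal Conflict", "SRB"),
]


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (assumed binning rule)."""
    return day - timedelta(days=day.weekday())


# Count events per (source, target, week, event category).
weekly_counts = Counter()
for day, source, event, target in events:
    weekly_counts[(source, target, week_start(day), event)] += 1

for key, n in sorted(weekly_counts.items()):
    print(key, n)
```

Each directed dyad then yields a week-by-event-category count matrix of the same shape as a document-by-word table.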
  38. Measuring Conflict from Events

    Experts assign conflict scores to >100 event types

    Code    Event                  Score [-10,10)
    . . .   . . .                  . . .
    211     Seize position         -9.2
    . . .   . . .                  . . .
    194     Halt negotiations      -3.8
    . . .   . . .                  . . .
    101     Offer proposal          1.5
    . . .   . . .                  . . .
    071     Extend economic aid     7.4
    . . .   . . .                  . . .

    The standard since Goldstein, 1992. (It’s expensive)
  41. . . . as Scaling

    Can we understand conflict level estimation as a count data scaling problem?
    - $\theta$ is the unobserved conflict level in a dyad/week
    - $\beta$ is the unobserved ‘conflictualness’ of each event type

    Marginal event counts depend on
    - intensity of media coverage
    - how easy it is to generate different events

    Unsuccessfully attempted with IRT models (Schrodt 2012)
  43. Measuring Conflict from Events

    Excerpt from 260 weeks of events in the Serbia-Bosnia dyad

    Date         Mat. Conf.   Mat. Coop.   Verbal Conf.   Verbal Coop.
    . . .        . . .        . . .        . . .          . . .
    9.7.1995     29           3            4              5
    16.7.1995    21           4            12             13
    23.7.1995    12           3            4              1
    30.7.1995    4            2            2              4
    . . .        . . .        . . .        . . .          . . .

    Aggregate event types, fit models, compare to expert scores
  44. Conflict: Serbia ⇒ Bosnia

    [Figure: Conflict / Cooperation scores from 1993 to 1996, in three panels: Manual, CA and Assoc.]
  45. . . . recovered as Dimension 1

    [Figure: two scatter plots, Manual against CA and Manual against Assoc; the correlations shown are r = 0.9 and r = 0.85.]
  46. Event/word scores

    [Figure: Material Conflict, Material Cooperation, Verbal Conflict and Verbal Cooperation plotted on Dimension 1 (Conflict / Cooperation) against Dimension 2 (Material / Verbal).]
  47. Event/word Scores

                            Goldstein   Model (no weights)   Model (weights)
    Material Conflict           -1.27                -1.13             -1.32
    Verbal Conflict             -0.28                -0.54             -0.20
    Verbal Cooperation           0.55                 0.95              0.57
    Material Cooperation         1.00                 0.73              0.95
    r                                                  .95               .99

    Taking manual coding uncertainty into account perfectly recovers the aggregated event scores
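The reported correlations can be checked directly from the four scores per column. The sketch below computes Pearson correlations between the Goldstein column and each model column; since the scores printed on the slide are rounded to two decimals, the recomputed values need not agree with the reported ones to the last digit.

```python
import numpy as np

# Aggregated event scores from the slide, in the order:
# Material Conflict, Verbal Conflict, Verbal Cooperation, Material Cooperation.
goldstein = np.array([-1.27, -0.28, 0.55, 1.00])
model_no_weights = np.array([-1.13, -0.54, 0.95, 0.73])
model_weights = np.array([-1.32, -0.20, 0.57, 0.95])

# Pearson correlation of each model column with the expert (Goldstein) scores.
r_no_weights = np.corrcoef(goldstein, model_no_weights)[0, 1]
r_weights = np.corrcoef(goldstein, model_weights)[0, 1]

print("no weights:", round(float(r_no_weights), 3))
print("weights:   ", round(float(r_weights), 3))
```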
  48. Quantitative IR Agenda

    Manual event extraction, automated c.1990
    - Compared machine event extraction performance to human coders (King & Lowe, 2003)
    - Distributed 100M events of machine-extracted data (Lowe & King, 2003) and R tools (Lowe, 2012)

    Manual conflict measures, assigned c.1992
    - Automated and generalised (Lowe, APSA 2012)
  50. Quantitative IR Agenda

    Future work:
    - Dynamic measurement models for intervention analysis (Lowe & Stewart, MS)
    - Conflict and dyad-specific event scaling

    Now that GDELT (200M events since 1979) is available . . . realtime event analysis with big data
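  51. Relative proportional emphasis

    This model

    $C_{ij} \sim \text{Poisson}(\mu_{ij})$
    $\log \mu_{ij} = \alpha_i + \psi_j + \theta_i \sigma \beta_j$

    turns into this model

    $[C_{i1} \ldots C_{iV}] \sim \text{Multinomial}(\pi_i, N_i)$
    $\log \left( \frac{\pi_{ij}}{\pi_{ik}} \right) = \psi_{j/k} + \theta_i \sigma \beta_{j/k}$

    when you condition on $N_i$ (Baker, 1994)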
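The step from the Poisson to the multinomial form can be checked numerically. The sketch below draws arbitrary parameter values (purely illustrative, not estimates from any data), builds the Poisson rates, normalises them within each row to get the multinomial probabilities, and confirms that the log odds against a reference word $k$ no longer involve $\alpha_i$. It reads $\psi_{j/k}$ and $\beta_{j/k}$ as the differences $\psi_j - \psi_k$ and $\beta_j - \beta_k$, which is an assumption about the slide's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words = 4, 5

# Arbitrary illustrative parameter values.
alpha = rng.normal(size=n_docs)    # document (row) effects
psi = rng.normal(size=n_words)     # word (column) effects
theta = rng.normal(size=n_docs)    # document positions
beta = rng.normal(size=n_words)    # word positions
sigma = 0.5


def poisson_rates(alpha_values):
    """mu_ij under log mu_ij = alpha_i + psi_j + theta_i * sigma * beta_j."""
    return np.exp(alpha_values[:, None] + psi[None, :]
                  + sigma * np.outer(theta, beta))


# Conditioning independent Poissons on the row total N_i gives a multinomial
# with probabilities proportional to the rates.
mu = poisson_rates(alpha)
pi = mu / mu.sum(axis=1, keepdims=True)

# Log odds of each word j against reference word k.
k = 0
log_odds = np.log(pi / pi[:, [k]])
implied = (psi - psi[k])[None, :] + sigma * np.outer(theta, beta - beta[k])

# Shifting alpha changes expected document length but not relative emphasis.
mu_shifted = poisson_rates(alpha + 3.0)
pi_shifted = mu_shifted / mu_shifted.sum(axis=1, keepdims=True)

print(np.allclose(log_odds, implied), np.allclose(pi, pi_shifted))
```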
  52. Projects (1)

    Text-related
    - selection models for legislative speech content (C4)
    - measurement and strategic use of linguistic vagueness (C4)
    - models for classical content analysis and coder error
    - experimental investigation of relative emphasis
  53. Projects (2)

    Count data-related
    - Forecasting and intervention modelling with event data
    - hierarchical ideal point modelling (covariates on $\theta$)
    - extensions to network structured data, e.g. social media
  54. Projects (3)

    Neither text nor counts
    - Rare event models without a sense of proportion
    - Causal inference under awkward measurement assumptions, e.g. SEM