Psephology 101

BREAK INTO DATA SCIENCE
John Sandall, 6th June 2019
@john_sandall @SixFiftyData

Hello

WHAT IS DATA SCIENCE? Call to arms

Comparison With Other Forecasts

Forecast                        Predicted Conservative majority
YouGov                          -24 (hung parliament)
Final SixFifty ML model         +22
SixFifty national UNS model     +24
New Statesman                   +24
SixFifty regional UNS model     +34
Lord Ashcroft                   +64
Elections Etc                   +66
Electoral Calculus              +66
Election Forecast               +82

Comparison With Other Forecasts https://www.thesun.co.uk/news/3686937/pollster-yougov-is-mocked-over-utter-tripe-poll-which-shows-theresa-may-losing-her-majority-in-election/

UK #GE2017 Final Seat Projections

TL;DR: In seven weeks we built the most accurate election forecast based purely on public data.

How To Forecast A General Election

What are we doing? National vote share forecasts get a lot of media attention...
https://www.theguardian.com/politics/2019/jun/01/brexit-party-nigel-farage-lead-opinion-poll-conservatives-opinium
https://www.electoralcalculus.co.uk/

What are we doing? ...but seat-level forecasts can feed into strategic decision-making.
https://www.electoralcalculus.co.uk/

Forecasting Techniques
1. Extrapolate from national polls.
2. Extrapolate from regional polling.
3. Maybe we can get better seat-level accuracy by taking into account what happened last time?

Uniform National Swing: A Case Study
Welcome to Sheffield Hallam! (MP: Nick Clegg)

2010 Results (Sheffield Hallam)
Conservative       24%
Labour             16%
Liberal Democrat   53%

Step 1. Compare national results with the latest polling.

Party              Hallam 2010   National 2010   Polling 2015
Conservative       24%           36%             33%
Labour             16%           29%             33%
Liberal Democrat   53%           23%             9%

Step 2. Calculate the "uniform national swing" (i.e. uplift): the relative change from each party's 2010 national result to its 2015 polling, e.g. Conservative 33% / 36% - 1 ≈ -8%.

Party              National 2010   Polling 2015   Swing
Conservative       36%             33%            -8%
Labour             29%             33%            +13%
Liberal Democrat   23%             9%             -62%

Step 3. Apply the UNS to each constituency: scale each party's 2010 Sheffield Hallam share by (1 + swing).

Party              Hallam 2010   Swing   Hallam 2015 forecast
Conservative       24%           -8%     22%
Labour             16%           +13%    18%
Liberal Democrat   53%           -62%    20%

Step 4. Forecast the winner (Conservative victory?)
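Steps 2 to 4 can be reproduced in a few lines of Python. This is a minimal sketch, not SixFifty's code. It assumes the party order on the slides is Conservative, Labour, Liberal Democrat. It uses a proportional (multiplicative) swing because the slide's numbers correspond to one; the more common additive UNS would instead add each party's percentage-point change. The function names are hypothetical.

```python
# Proportional "uniform national swing", as in the Sheffield Hallam case study.
# Inputs are the rounded percentages from the slides (assumed party order:
# Conservative, Labour, Liberal Democrat).


def national_swing(national_prev, polling_now):
    """Step 2: relative change ("uplift") from the last election to latest polling."""
    return {p: polling_now[p] / national_prev[p] - 1 for p in national_prev}


def apply_swing(seat_prev, swing):
    """Step 3: scale each party's previous seat share by its national uplift."""
    return {p: share * (1 + swing[p]) for p, share in seat_prev.items()}


def predicted_winner(forecast):
    """Step 4: the party with the largest forecast share wins the seat."""
    return max(forecast, key=forecast.get)


hallam_2010 = {"Conservative": 24, "Labour": 16, "Liberal Democrat": 53}
national_2010 = {"Conservative": 36, "Labour": 29, "Liberal Democrat": 23}
polling_2015 = {"Conservative": 33, "Labour": 33, "Liberal Democrat": 9}

swing = national_swing(national_2010, polling_2015)
forecast = apply_swing(hallam_2010, swing)
for party, share in forecast.items():
    print(f"{party:<17} swing {swing[party]:+.0%}  forecast {share:.0f}%")
print("Predicted winner:", predicted_winner(forecast))
```

Because the sketch starts from the rounded percentages shown on the slides, it gives +14% for Labour and 21% for the Liberal Democrats instead of the slide's +13% and 20%. The slide was presumably computed from unrounded figures. The call is the same as in Step 4: a Conservative win in a seat the Liberal Democrats actually held.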

Step 5. So who won? (Liberal Democrat victory)

Party              Hallam 2010   National 2010   Polling 2015   Swing   Hallam 2015 forecast   Hallam 2015 result
Conservative       24%           36%             33%            -8%     22%                    14%
Labour             16%           29%             33%            +13%    18%                    36%
Liberal Democrat   53%           23%             9%             -62%    20%                    40%

Is there a better way?
• Use regional polling where available.
• Model regional polls from the regional breakdowns of national polls.
• Adjust each pollster for historical reliability or bias.
• Adjust polls based on how they weight undecided voters.
• Adjust based on current sentiment around polling accuracy.
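Two of these adjustments can be sketched as a weighted poll average: correcting each pollster for a historical house effect, and weighting pollsters by reliability. Everything here is an illustrative assumption. The pollster names, bias figures and reliability weights are hypothetical. Weighting by the square root of sample size is one common choice, because sampling error shrinks roughly with 1/√n.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str
    shares: dict      # headline voting intention, % by party
    sample_size: int


# Hypothetical house effects: percentage points each pollster has historically
# over-stated each party (would be estimated from past polls vs. past results).
HOUSE_EFFECTS = {
    "Pollster A": {"Con": 1.0, "Lab": -1.0},
    "Pollster B": {"Con": -2.0, "Lab": 1.5},
}
# Hypothetical reliability weights from historical accuracy (1.0 = best).
RELIABILITY = {"Pollster A": 1.0, "Pollster B": 0.6}
DEFAULT_RELIABILITY = 0.5  # assumed weight for pollsters with no track record


def adjusted_average(polls):
    """Weighted average of house-effect-corrected polls."""
    totals, weight_sum = {}, 0.0
    for poll in polls:
        # Weight by reliability and by the square root of sample size.
        weight = RELIABILITY.get(poll.pollster, DEFAULT_RELIABILITY) * poll.sample_size ** 0.5
        bias = HOUSE_EFFECTS.get(poll.pollster, {})
        for party, share in poll.shares.items():
            corrected = share - bias.get(party, 0.0)
            totals[party] = totals.get(party, 0.0) + weight * corrected
        weight_sum += weight
    return {party: total / weight_sum for party, total in totals.items()}


polls = [
    Poll("Pollster A", {"Con": 44, "Lab": 36}, 1600),
    Poll("Pollster B", {"Con": 41, "Lab": 39}, 900),
]
print(adjusted_average(polls))
```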

http://britainelects.com/polling/westminster/

https://projects.fivethirtyeight.com/pollster-ratings/

Multilevel Regression and Post-stratification (MRP)
Read more: https://yougov.co.uk/topics/politics/articles-reports/2017/05/31/how-yougov-model-2017-general-election-works
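MRP has two stages. First, a multilevel regression estimates support within fine-grained demographic cells. Second, post-stratification reweights those estimates by how many people of each type live in each constituency. The sketch below shows only the post-stratification step, for a single party. The cell estimates and census counts are made up, and a real model would cross many more variables than the two used here.

```python
# Output of the (not shown) multilevel regression step: hypothetical estimated
# probability of voting Conservative for each (age band, education) cell.
cell_support = {
    ("18-34", "degree"): 0.20,
    ("18-34", "no degree"): 0.35,
    ("35+", "degree"): 0.40,
    ("35+", "no degree"): 0.55,
}

# Hypothetical census counts of adults in each cell, per constituency.
cell_counts = {
    "Seat A": {("18-34", "degree"): 12000, ("18-34", "no degree"): 8000,
               ("35+", "degree"): 15000, ("35+", "no degree"): 25000},
    "Seat B": {("18-34", "degree"): 30000, ("18-34", "no degree"): 10000,
               ("35+", "degree"): 15000, ("35+", "no degree"): 5000},
}


def poststratify(counts, support):
    """Population-weighted average of cell estimates for one constituency."""
    population = sum(counts.values())
    return sum(n * support[cell] for cell, n in counts.items()) / population


for seat, counts in cell_counts.items():
    print(f"{seat}: {poststratify(counts, cell_support):.1%} Conservative")
```

The two seats have the same adult population but different demographic mixes, so the same cell-level estimates produce different constituency-level shares.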

Is there a better way?
• Use rigorous, modern modelling techniques.
• Cross-validate, backtest and evaluate for predictive accuracy.
• Blend in multiple data sources, not just polling.
• Build a greater understanding of what drives election outcomes.
• Open-source our code, data and methodology.

How To Evaluate A Forecast Methodology


Backtesting
If we simulated the last three elections using ONLY data that was available before election night, how would each technique do?
[Chart: error per seat for each technique at the 2010, 2015 and 2017 elections]
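Backtesting amounts to rerunning each forecasting method as if it were election eve. Anything dated on or after polling day is filtered out before the model is called. The deck doesn't define "error per seat", so this sketch assumes it is the fraction of seats whose winner is called wrongly. The model, seats and poll figures are toy, made-up data; only the two election dates (7 May 2015 and 8 June 2017) are real.

```python
from datetime import date


def winner(shares):
    return max(shares, key=shares.get)


def backtest(elections, polls, results, forecast):
    """Fraction of seats whose winner is mis-called, per election.

    elections: {year: election day}; polls: [(fieldwork date, national shares)];
    results: {year: {seat: shares}}; forecast(visible_polls, seats) -> {seat: shares}.
    """
    errors = {}
    for year, election_day in elections.items():
        # Simulate election eve: only polls dated before election day are visible.
        visible = [(day, shares) for day, shares in polls if day < election_day]
        actual = results[year]
        predicted = forecast(visible, list(actual))
        wrong = sum(winner(predicted[seat]) != winner(actual[seat]) for seat in actual)
        errors[year] = wrong / len(actual)
    return errors


def latest_poll_everywhere(visible_polls, seats):
    """Naive hypothetical model: every seat follows the latest national poll."""
    _, shares = max(visible_polls, key=lambda poll: poll[0])
    return {seat: shares for seat in seats}


# Toy, made-up data. The hypothetical 8 May 2015 poll must be ignored for 2015.
elections = {2015: date(2015, 5, 7), 2017: date(2017, 6, 8)}
polls = [
    (date(2015, 5, 1), {"Con": 33, "Lab": 35}),
    (date(2015, 5, 6), {"Con": 35, "Lab": 33}),
    (date(2015, 5, 8), {"Con": 20, "Lab": 50}),
    (date(2017, 6, 7), {"Con": 42, "Lab": 38}),
]
results = {
    2015: {"Seat A": {"Con": 45, "Lab": 40}, "Seat B": {"Con": 48, "Lab": 42}},
    2017: {"Seat A": {"Con": 50, "Lab": 40}, "Seat B": {"Con": 35, "Lab": 50}},
}
print(backtest(elections, polls, results, latest_poll_everywhere))
```

Any forecasting technique with the same `forecast(visible_polls, seats)` shape can be plugged in and compared on identical, leakage-free inputs.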

Polling data

What does polling data look like?

Raw data looks like:

Raw data contains:
• Voting intention (“Which party will you be voting for on June 8th?”)
• Party leader satisfaction
• Policy preferences (“Do you think tuition fees should be abolished?”)
• Demographic background (location, gender, age, education, etc.)
• EU referendum vote: Remain or Leave?
• 2015 general election vote: which party?
• Questions designed to gauge likelihood of voting
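Raw responses like these have to be turned into a headline figure. Choices about undecided voters and how to use the likelihood-to-vote questions are part of why pollsters differ. The sketch below shows one simple, hypothetical approach. It drops undecided respondents and scales each respondent's demographic weight by their self-rated likelihood of voting. The likelihood scale is assumed to run from 0 to 10, and a cut-off can optionally keep only respondents above a threshold. The respondent records are invented.

```python
from collections import defaultdict

# Hypothetical respondents: stated party (None = undecided/refused), self-rated
# likelihood to vote on an assumed 0-10 scale, and a demographic weight.
respondents = [
    {"vote": "Con", "likelihood": 10, "weight": 1.0},
    {"vote": "Lab", "likelihood": 5, "weight": 1.2},
    {"vote": "Lab", "likelihood": 10, "weight": 0.8},
    {"vote": None, "likelihood": 8, "weight": 1.0},
]


def voting_intention(rows, min_likelihood=0):
    """Headline shares after dropping undecideds and weighting by turnout."""
    totals = defaultdict(float)
    for row in rows:
        if row["vote"] is None or row["likelihood"] < min_likelihood:
            continue  # undecided/refused, or below the turnout filter
        # Scale each respondent's demographic weight by their stated likelihood.
        totals[row["vote"]] += row["weight"] * row["likelihood"] / 10
    total = sum(totals.values())
    return {party: value / total for party, value in totals.items()}


print(voting_intention(respondents))                     # turnout-weighted
print(voting_intention(respondents, min_likelihood=10))  # "certain to vote" only
```

The two calls give noticeably different headline figures from the same raw data, which is the kind of methodological difference the slides ask about.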

Easily accessible? Methodology? Regional?

Polling data on SixFifty.org.uk: https://sixfifty.org.uk/polls

Regional polling data

Auto-extraction of polling data
• Pay people? Done this: expensive and inaccurate.
• Scraping PDFs (Tabula)? Done this: costly, brittle, and doesn't scale.
• Deep learning? Dropbox solved a much simpler problem with a much larger team; solve this and you have a unicorn!
• Collaboration with pollsters? The slow road, but the most realistic. Yet why would they invest in this?
• Regardless, detailed historical data from before 2012 (with demographic breakdowns) is hard to find.

Open Data + Politics

Data challenges
• No single data hub for political science.
• A lack of consistent identifiers makes it hard to join datasets.
• Joining overlapping geographic regions is difficult (e.g. census districts to electoral constituencies).
• There is no mechanism for political scientists and analysts to reliably share pre-processed data.
• A lack of clear data licences inhibits sharing and republishing data.
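One standard way to join overlapping geographies is population-weighted apportionment. Each census district's count is split between constituencies according to the share of the district's population living in each one. This sketch assumes such an overlap lookup already exists; the districts, seats and shares below are hypothetical. It also assumes each count is spread evenly across a district's population, which is only an approximation.

```python
from collections import defaultdict

# Hypothetical lookup: share of each census district's population that lives in
# each constituency (each district's shares sum to 1).
OVERLAP = {
    "District 1": {"Seat A": 1.0},
    "District 2": {"Seat A": 0.25, "Seat B": 0.75},
    "District 3": {"Seat B": 1.0},
}


def apportion(district_counts, overlap):
    """Re-aggregate district-level counts to constituencies.

    Assumes each count is spread across a district in proportion to population.
    """
    seat_counts = defaultdict(float)
    for district, count in district_counts.items():
        for seat, share in overlap[district].items():
            seat_counts[seat] += count * share
    return dict(seat_counts)


# Hypothetical census variable, e.g. residents with a degree.
degree_holders = {"District 1": 4000, "District 2": 8000, "District 3": 2000}
print(apportion(degree_holders, OVERLAP))
```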

Open Data http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/

Are Polls Open Data?

My solution
• A Python package for automatically downloading, processing and joining data into model-ready datasets.
• Automatically cleans and standardises lookup identifiers.
• Moves towards open data whilst respecting licences, by doing all of this on the user's own device.

What else can I do? What can communities like Campaign Lab do?
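The deck doesn't show the package's code. The following is only a hypothetical sketch of what "cleaning and standardising lookup identifiers" can involve. It reduces constituency names from different sources to a common key before joining, normalising case, punctuation, "&" versus "and", and accents. Where official ONS area codes are present in both datasets, they make a more reliable key than names.

```python
import re
import unicodedata


def normalise_name(name):
    """Hypothetical join key for a constituency name."""
    # Decompose accented characters and drop the accents: "Ynys Môn" -> "Ynys Mon".
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    # Replace punctuation (commas, hyphens, apostrophes) with spaces, then tidy.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def join_on_name(left, right):
    """Inner join two {constituency name: record} dicts on normalised names."""
    right_by_key = {normalise_name(name): record for name, record in right.items()}
    joined = {}
    for name, record in left.items():
        key = normalise_name(name)
        if key in right_by_key:
            joined[key] = {**record, **right_by_key[key]}
    return joined


results = {"Sheffield, Hallam": {"winner_2015": "Liberal Democrat"}}
profiles = {"Sheffield Hallam": {"mp_2015": "Nick Clegg"}}
print(join_on_name(results, profiles))
```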

Thank You
