Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Using Python to Get Out the Vote

Using Python to Get Out the Vote

Slides from my PyOhio 2015 talk about using Python tools to drive the voter outreach campaign for a recent school levy ballot issue, featuring a breezy introduction to CSVs, SQLAlchemy, and Alembic.

Mike Pirnat

August 01, 2015
Tweet

More Decks by Mike Pirnat

Other Decks in Programming

Transcript

  1. VOTE USING PYTHON TO GET OUT THE Mike Pirnat ž

    @mpirnat ž mike.pirnat.com ž  ž  ž  ž  ž
  2. Objectives • Manage committee outreach to voters • Mailing list

    • Door-to-door list • Call lists by school • Minimize wasted contacts • Avoid redundant contacts • Repeatable; re-create database at any time
  3. Sources of Data CSVs from Ohio Board of Elections: •

    List of registered voters • Absentee ballots
  4. Sources of Data Excel spreadsheets (XLSX) from levy committee: •

    School directory data • School district staff list • Early childhood PTA list • Do not contact list
  5. Technologies • Excel/Numbers • Python • Standard library: csv, datetime,

    re • Standard library: SQLite • SQLAlchemy • Alembic
  6. Voter CSV - name - BOE voter id - precinct

    - ward - address - party - status - birth year - ballot history Student CSV - name - address - parent names - phone - school Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address
  7. Voter CSV - name - BOE voter id - party

    - status - birth year - ballot history Student CSV - name - school Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - email - position - kid info Staff CSV - name - school - position - email Do Not Contact CSV - name - address - precinct - ward - address - address - address - address - phone - phone - phone - parent names
  8. Voter CSV - name - BOE voter id - party

    - status - birth year - ballot history Student CSV - name - school Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - email - position - kid info Staff CSV - name - school - position - email Do Not Contact CSV - name - address - precinct - ward - address - address - address - address - phone - phone - phone - parent names Household - address - precinct - ward - parent names - phone - staff - pta - do not contact
  9. Voter CSV - name - BOE voter id - precinct

    - ward - address - party - status - birth year - ballot history Student CSV - address - parent names - phone Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address - name - school
  10. Voter CSV - name - BOE voter id - precinct

    - ward - address - party - status - birth year - ballot history Student CSV - address - parent names - phone Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address - name - school Student - name - school
  11. Voter CSV - precinct - ward - address - birth

    year - ballot history Student CSV - name - address - parent names - phone - school Absentee Ballot CSV - name - address PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address - name - party - status - BOE voter id - date requested - BOE voter id - date returned - challenged
  12. Voter CSV - precinct - ward - address - birth

    year - ballot history Student CSV - name - address - parent names - phone - school Absentee Ballot CSV - name - address PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address - name - party - status - BOE voter id - date requested - BOE voter id - date returned - challenged Voter - name - party - status - BOE voter id - absentee requested - absentee returned - absentee challenged
  13. Voter CSV - name - BOE voter id - precinct

    - ward - address - party - status - birth year Student CSV - name - address - parent names - phone - school Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address - ballot history
  14. Voter CSV - name - BOE voter id - precinct

    - ward - address - party - status - birth year Student CSV - name - address - parent names - phone - school Absentee Ballot CSV - name - BOE voter id - address - date requested - date returned - challenged PTA CSV - name - address - phone - email - position - kid info Staff CSV - name - school - position - address - phone - email Do Not Contact CSV - name - address - ballot history Ballot - ballot type - election type - date
  15. Household - address - precinct - ward - parent names

    - phone - staff - pta - do not contact Student - name - school Voter - name - party - status - BOE voter id - absentee requested - absentee returned - absentee challenged Ballot - type - date
  16. # levydb/models.py... from sqlalchemy import Column, Integer, String, \ Boolean,

    Date, ForeignKey from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base Base = declarative_base()
  17. # levydb/models.py... from sqlalchemy import Column, Integer, String, \ Boolean,

    Date, ForeignKey from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base Base = declarative_base()
  18. # levydb/models.py... from sqlalchemy import Column, Integer, String, \ Boolean,

    Date, ForeignKey from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base Base = declarative_base()
  19. # levydb/models.py... from sqlalchemy import Column, Integer, String, \ Boolean,

    Date, ForeignKey from sqlalchemy.orm import relationship, backref from sqlalchemy.ext.declarative import declarative_base Base = declarative_base()
  20. class Household(Base): __tablename__ = 'households' id = Column(Integer, primary_key=True) precinct

    = Column(String) ward = Column(String) address = Column(String) mother_last_name = Column(String) mother_first_name = Column(String) father_last_name = Column(String) father_first_name = Column(String) phone = Column(String) do_not_contact = Column(Boolean) early_childhood_pta = Column(Boolean) staff = Column(Boolean) students = relationship('Student', order_by='Student.id', backref='household', cascade='all, delete, delete-orphan') voters = relationship('Voter', order_by='Voter.id', backref='household', cascade='all, delete, delete-orphan')
  21. class Household(Base): __tablename__ = 'households' id = Column(Integer, primary_key=True) precinct

    = Column(String) ward = Column(String) address = Column(String) mother_last_name = Column(String) mother_first_name = Column(String) father_last_name = Column(String) father_first_name = Column(String) phone = Column(String) do_not_contact = Column(Boolean) early_childhood_pta = Column(Boolean) staff = Column(Boolean) students = relationship('Student', order_by='Student.id', backref='household', cascade='all, delete, delete-orphan') voters = relationship('Voter', order_by='Voter.id', backref='household', cascade='all, delete, delete-orphan')
  22. class Household(Base): __tablename__ = 'households' id = Column(Integer, primary_key=True) precinct

    = Column(String) ward = Column(String) address = Column(String) mother_last_name = Column(String) mother_first_name = Column(String) father_last_name = Column(String) father_first_name = Column(String) phone = Column(String) do_not_contact = Column(Boolean) early_childhood_pta = Column(Boolean) staff = Column(Boolean) students = relationship('Student', order_by='Student.id', backref='household', cascade='all, delete, delete-orphan') voters = relationship('Voter', order_by='Voter.id', backref='household', cascade='all, delete, delete-orphan')
  23. class Household(Base): __tablename__ = 'households' id = Column(Integer, primary_key=True) precinct

    = Column(String) ward = Column(String) address = Column(String) mother_last_name = Column(String) mother_first_name = Column(String) father_last_name = Column(String) father_first_name = Column(String) phone = Column(String) do_not_contact = Column(Boolean) early_childhood_pta = Column(Boolean) staff = Column(Boolean) students = relationship('Student', order_by='Student.id', backref='household', cascade='all, delete, delete-orphan') voters = relationship('Voter', order_by='Voter.id', backref='household', cascade='all, delete, delete-orphan')
  24. class Voter(Base): __tablename__ = 'voters' id = Column(Integer, primary_key=True) boe_voter_id

    = Column(Integer) last_name = Column(String) first_name = Column(String) middle_name = Column(String) suffix = Column(String) birth_year = Column(Integer) party = Column(String) status = Column(String) absentee_ballot_requested = Column(Boolean) absentee_ballot_returned = Column(Date) absentee_ballot_challenged = Column(String) household_id = Column(Integer, ForeignKey('households.id')) ballots = relationship('Ballot', order_by='Ballot.id', backref='voter', cascade='all, delete, delete-orphan')
  25. class Voter(Base): __tablename__ = 'voters' id = Column(Integer, primary_key=True) boe_voter_id

    = Column(Integer) last_name = Column(String) first_name = Column(String) middle_name = Column(String) suffix = Column(String) birth_year = Column(Integer) party = Column(String) status = Column(String) absentee_ballot_requested = Column(Boolean) absentee_ballot_returned = Column(Date) absentee_ballot_challenged = Column(String) household_id = Column(Integer, ForeignKey('households.id')) ballots = relationship('Ballot', order_by='Ballot.id', backref='voter', cascade='all, delete, delete-orphan')
  26. class Voter(Base): __tablename__ = 'voters' id = Column(Integer, primary_key=True) boe_voter_id

    = Column(Integer) last_name = Column(String) first_name = Column(String) middle_name = Column(String) suffix = Column(String) birth_year = Column(Integer) party = Column(String) status = Column(String) absentee_ballot_requested = Column(Boolean) absentee_ballot_returned = Column(Date) absentee_ballot_challenged = Column(String) household_id = Column(Integer, ForeignKey('households.id')) ballots = relationship('Ballot', order_by='Ballot.id', backref='voter', cascade='all, delete, delete-orphan')
  27. class Voter(Base): __tablename__ = 'voters' id = Column(Integer, primary_key=True) boe_voter_id

    = Column(Integer) last_name = Column(String) first_name = Column(String) middle_name = Column(String) suffix = Column(String) birth_year = Column(Integer) party = Column(String) status = Column(String) absentee_ballot_requested = Column(Boolean) absentee_ballot_returned = Column(Date) absentee_ballot_challenged = Column(String) household_id = Column(Integer, ForeignKey('households.id')) ballots = relationship('Ballot', order_by='Ballot.id', backref='voter', cascade='all, delete, delete-orphan')
  28. class Voter(Base): __tablename__ = 'voters' id = Column(Integer, primary_key=True) boe_voter_id

    = Column(Integer) last_name = Column(String) first_name = Column(String) middle_name = Column(String) suffix = Column(String) birth_year = Column(Integer) party = Column(String) status = Column(String) absentee_ballot_requested = Column(Boolean) absentee_ballot_returned = Column(Date) absentee_ballot_challenged = Column(String) household_id = Column(Integer, ForeignKey('households.id')) ballots = relationship('Ballot', order_by='Ballot.id', backref='voter', cascade='all, delete, delete-orphan')
  29. class Ballot(Base): __tablename__ = 'ballots' id = Column(Integer, primary_key=True) ballot_type

    = Column(String) election_type = Column(String) election_date = Column(Date) voter_id = Column(Integer, ForeignKey('voters.id'))
  30. class Student(Base): __tablename__ = 'students' id = Column(Integer, primary_key=True) last_name

    = Column(String) first_name = Column(String) school_name = Column(String) household_id = Column(Integer, ForeignKey('households.id'))
  31. # In levydb/ingest_data.py... def ingest_voter_data(): ... def ingest_student_data(): ... def

    ingest_staff_data(): ... def ingest_pta_data(): ... def ingest_do_not_contact_data(): ...
  32. # In levydb/db.py... from sqlalchemy import create_engine from sqlalchemy.orm import

    sessionmaker from levydb import config Session = sessionmaker() engine = create_engine(config.DB_FILE, echo=config.DB_ECHO) Session.configure(bind=engine) session = Session() # In levydb/ingest_data.py... from levydb.db import engine, session
  33. # In levydb/db.py... from sqlalchemy import create_engine from sqlalchemy.orm import

    sessionmaker from levydb import config Session = sessionmaker() engine = create_engine(config.DB_FILE, echo=config.DB_ECHO) Session.configure(bind=engine) session = Session() # In levydb/ingest_data.py... from levydb.db import engine, session
  34. # In levydb/db.py... from sqlalchemy import create_engine from sqlalchemy.orm import

    sessionmaker from levydb import config Session = sessionmaker() engine = create_engine(config.DB_FILE, echo=config.DB_ECHO) Session.configure(bind=engine) session = Session() # In levydb/ingest_data.py... from levydb.db import engine, session
  35. # In levydb/db.py... from sqlalchemy import create_engine from sqlalchemy.orm import

    sessionmaker from levydb import config Session = sessionmaker() engine = create_engine(config.DB_FILE, echo=config.DB_ECHO) Session.configure(bind=engine) session = Session() # In levydb/ingest_data.py... from levydb.db import engine, session
  36. import csv with open("some_data.csv") as f: reader = csv.reader(f) header_row

    = reader.next() for row in reader: value0 = row[0] value1 = row[1] ...
  37. import csv with open("some_data.csv") as f: reader = csv.reader(f) header_row

    = reader.next() for row in reader: value0 = row[0] value1 = row[1] ...
  38. import csv with open("some_data.csv") as f: reader = csv.reader(f) header_row

    = reader.next() for row in reader: value0 = row[0] value1 = row[1] ...
  39. import csv with open("some_data.csv") as f: reader = csv.reader(f) header_row

    = reader.next() for row in reader: value0 = row[0] value1 = row[1] ...
  40. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  41. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  42. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  43. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  44. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  45. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  46. Election type and date are encoded as header row titles:

    ...,G_11_04_2014,S_08_05_2014,P_05-06-2014,... The ballot type voted in those elections is specified in each row and can indicate a general or party-specific ballot (eg, in primaries): ...,Y,,D,... Ballot Data
  47. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  48. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  49. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  50. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  51. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  52. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  53. # Get ballot data for election_title, ballot_type in zip(header_row[18:65], row[18:65]):

    if not ballot_type: continue election_type, election_date = parse_election_title(election_title) ballot = Ballot( ballot_type=ballot_type, election_type=election_type, election_date=election_date, voter_id=voter.id) session.add(ballot) session.commit() def parse_election_title(title): election_type, month, day, year = title.replace('-', '_').split('_') election_date = datetime.date(int(year), int(month), int(day)) return election_type, election_date
  54. Board of Elections: School Directory: house_no pre_dir street apartment zip

    221 N BAKER ST B1 12345 Data != Data Mailing Street Mailing Zip 221 North Baker Street APT #B-1 12345
  55. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  56. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  57. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  58. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  59. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  60. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  61. def normalize_address(address): # Convert to uppercase address = address.upper() #

    Remove non-alphanumerics/non-space address = re.sub(r'[^A-Z0-9 ]', '', address) # Condense whitespace address = re.sub(r'\s+', ' ', address) # Replace things for pattern, replacement in address_replacements: address = re.sub(pattern, replacement, address) # Condense whitespace again in case we messed it up along the way address = re.sub(r'\s+', ' ', address) # Strip again in case we added extra whitespace somehow address = address.strip() return address
  62. address_replacements = [ ('APT', ''), ('SUITE', ''), (' EAST ',

    ' E '), (' WEST ', ' W '), (' NORTH ', ' N '), (' SOUTH ', ' S '), (r'(\D) ROAD', r'\1 RD'), (r'(\D) AVENUE', r'\1 AVE'), (r'(\D) DRIVE', r'\1 DR'), (r'(\D) CIRCLE', r'\1 CIR'), (r'(\D) TRAIL', r'\1 TRL'), (r'(\D) BOULEVARD', r'\1 BLVD'), (r'(\D) LANE', r'\1 LN'), (r'(\D) COURT', r'\1 CT'), (r'(\D) PARKWAY', r'\1 PKWY'), (r'(\D) PKY', r'\1 PKWY'), (r'(\D) MEADOWS', r'\1 MDWS'), (r'(\D) PLACE', r'\1 PL'), (r'(\D) ROWE', r'\1 ROW'), (r'(\D) CROSSING$', r'\1 XING'), ..., (r'(\s\D) (\d+)$', r'\1\2'), ]
  63. # When ingesting voter data for row in reader: ...

    # Normalize address raw_address = ' '.join( [house_number, direction_prefix, street, apartment]) address = normalize_address(raw_address) # Try to match to an existing household household = session.query(Household).filter_by(address=address).first() if household: voter.household_id = household.id else: # Create new household ...
  64. # When ingesting voter data for row in reader: ...

    # Normalize address raw_address = ' '.join( [house_number, direction_prefix, street, apartment]) address = normalize_address(raw_address) # Try to match to an existing household household = session.query(Household).filter_by(address=address).first() if household: voter.household_id = household.id else: # Create new household ...
  65. # When ingesting voter data for row in reader: ...

    # Normalize address raw_address = ' '.join( [house_number, direction_prefix, street, apartment]) address = normalize_address(raw_address) # Try to match to an existing household household = session.query(Household).filter_by(address=address).first() if household: voter.household_id = household.id else: # Create new household ...
  66. # When ingesting voter data for row in reader: ...

    # Normalize address raw_address = ' '.join( [house_number, direction_prefix, street, apartment]) address = normalize_address(raw_address) # Try to match to an existing household household = session.query(Household).filter_by(address=address).first() if household: voter.household_id = household.id else: # Create new household ...
  67. # When ingesting student data... for row in reader: ...

    # Attach students to households address = normalize_address(raw_address) household = session.query(Household).filter_by(address=address).first() if household: student.household_id = household.id if mother_last_name and not household.mother_last_name: household.mother_last_name = mother_last_name if mother_first_name and not household.mother_first_name: household.mother_first_name = mother_first_name ... if phone and not household.phone: household.phone = phone else: # Create new household ...
  68. # When ingesting student data... for row in reader: ...

    # Attach students to households address = normalize_address(raw_address) household = session.query(Household).filter_by(address=address).first() if household: student.household_id = household.id if mother_last_name and not household.mother_last_name: household.mother_last_name = mother_last_name if mother_first_name and not household.mother_first_name: household.mother_first_name = mother_first_name ... if phone and not household.phone: household.phone = phone else: # Create new household ...
  69. # When ingesting student data... for row in reader: ...

    # Attach students to households address = normalize_address(raw_address) household = session.query(Household).filter_by(address=address).first() if household: student.household_id = household.id if mother_last_name and not household.mother_last_name: household.mother_last_name = mother_last_name if mother_first_name and not household.mother_first_name: household.mother_first_name = mother_first_name ... if phone and not household.phone: household.phone = phone else: # Create new household ...
  70. # When ingesting staff, early childhood PTA, do not contact

    data... for row in reader: ... address = normalize_address(raw_address) household = session.query(Household).filter_by(address=address).first() if not household: continue ...
  71. # When ingesting staff, early childhood PTA, do not contact

    data... for row in reader: ... address = normalize_address(raw_address) household = session.query(Household).filter_by(address=address).first() if not household: continue ...
  72. # When ingesting staff, early childhood PTA, do not contact

    data... for row in reader: ... address = normalize_address(raw_address) household = session.query(Household).filter_by(address=address).first() if not household: continue ...
  73. A single model object: voter.absentee_ballot_requested = True session.commit() Many rows,

    without a model object: session.query(Voter).update({ 'absentee_ballot_requested': None, 'absentee_ballot_returned': None, 'absentee_ballot_challenged': None, }) session.commit() Updating the Database
  74. #!/bin/env python def get_data(...): ... def write_csv(...): ... def main():

    data = get_data() write_csv(data) if __name__ == "__main__": main()
  75. model_object = session.query(ModelObject).first() model_objects = session.query(ModelObject).all() model_objects = session.query(ModelObject)[42:2112] model_object

    = session.query(ModelObject).\ filter(ModelObject.attribute == value).first() model_objects = session.query(ModelObject).\ filter(ModelObject.attribute == value).\ group_by(ModelObject.other_attribute).\ order_by(ModelObject.third_attribute).\ all()
  76. model_object = session.query(ModelObject).first() model_objects = session.query(ModelObject).all() model_objects = session.query(ModelObject)[42:2112] model_object

    = session.query(ModelObject).\ filter(ModelObject.attribute == value).first() model_objects = session.query(ModelObject).\ filter(ModelObject.attribute == value).\ group_by(ModelObject.other_attribute).\ order_by(ModelObject.third_attribute).\ all()
  77. model_object = session.query(ModelObject).first() model_objects = session.query(ModelObject).all() model_objects = session.query(ModelObject)[42:2112] model_object

    = session.query(ModelObject).\ filter(ModelObject.attribute == value).first() model_objects = session.query(ModelObject).\ filter(ModelObject.attribute == value).\ group_by(ModelObject.other_attribute).\ order_by(ModelObject.third_attribute).\ all() Offset Limit
  78. model_object = session.query(ModelObject).first() model_objects = session.query(ModelObject).all() model_objects = session.query(ModelObject)[42:2112] model_object

    = session.query(ModelObject).\ filter(ModelObject.attribute == value).first() model_objects = session.query(ModelObject).\ filter(ModelObject.attribute == value).\ group_by(ModelObject.other_attribute).\ order_by(ModelObject.third_attribute).\ all()
  79. model_object = session.query(ModelObject).first() model_objects = session.query(ModelObject).all() model_objects = session.query(ModelObject)[42:2112] model_object

    = session.query(ModelObject).\ filter(ModelObject.attribute == value).first() model_objects = session.query(ModelObject).\ filter(ModelObject.attribute == value).\ group_by(ModelObject.other_attribute).\ order_by(ModelObject.third_attribute).\ all()
  80. A simple example: household = session.query(Household).\ filter_by(address=address).first() A more complex

    example: voters = session.query(Voter).join(Household).\ filter(Household.do_not_contact == None).\ filter(Voter.status.in_(allowed_statuses)).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() More Specifically...
  81. def write_csv(model_objects): with open("report.csv", "w") as f: writer = csv.writer(f)

    header_row = ["attr0", "attr1", ...] writer.writerow(header_row) for model_object in model_objects: row = [model_object.attr0, model_object.attr1, ...] writer.writerow(row)
  82. def write_csv(model_objects): with open("report.csv", "w") as f: writer = csv.writer(f)

    header_row = ["attr0", "attr1", ...] writer.writerow(header_row) for model_object in model_objects: row = [model_object.attr0, model_object.attr1, ...] writer.writerow(row)
  83. def write_csv(model_objects): with open("report.csv", "w") as f: writer = csv.writer(f)

    header_row = ["attr0", "attr1", ...] writer.writerow(header_row) for model_object in model_objects: row = [model_object.attr0, model_object.attr1, ...] writer.writerow(row)
  84. def write_csv(model_objects): with open("report.csv", "w") as f: writer = csv.writer(f)

    header_row = ["attr0", "attr1", ...] writer.writerow(header_row) for model_object in model_objects: row = [model_object.attr0, model_object.attr1, ...] writer.writerow(row)
  85. Recent Voters • Have voted in a recent election... •

    That’s a non-Presidential election
  86. School-Affiliated Voters • Kids enrolled in a school • Parents

    in early childhood PTA • School district staff
  87. Counts Become Costs Total Active Recent School Affiliated Voters 23470

    20648 11486 3748 Households 12764 11423 7238 1847
  88. Counts Become Costs Total Active Recent School Affiliated Voters 23470

    20648 11486 3748 Households 12764 11423 7238 1847 4500?
  89. Mailing List • Households... • With a registered, active voter...

    • Who voted in May 2014... • Or who voted in November 2014 and has a student in school... • That aren’t flagged do-not-call
  90. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  91. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  92. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  93. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  94. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  95. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  96. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  97. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  98. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  99. allowed_statuses = [VoterStatus.ACTIVE, VoterStatus.PROVISIONAL] def get_voters(): date1 = datetime.date(2014, 5,

    6) date2 = datetime.date(2014, 11, 4) voters = session.query(Voter).join(Ballot).\ join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Household.do_not_contact == None).\ filter( or_( Ballot.election_date == date1, and_( Ballot.election_date == date2, Student.last_name != None ) )).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  100. • 4279 households to mail to • Right in our

    sweet spot AND within budget How Many?
  101. Call Lists • One list for each school • Households...

    • With a registered, active voter... • With a student in the school... • With a phone number... • That didn’t request an absentee ballot... • That aren’t flagged do-not-contact... • De-duplicated across lists
  102. def make_list(school_name, file_name, households_seen): households = get_households(school_name) write_csv(filename, households, households_seen)

    def get_households(school_name): households = session.query(Household).join(Voter).join(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Voter.absentee_ballot_requested == None).\ filter(Student.school_name == school_name).\ filter(Household.phone != None).\ filter(Household.do_not_contact == None).\ order_by(Voter.last_name, Voter.first_name).\ all() return households def write_csv(filename, households, households_seen): ... for household in households: if household.id in households_seen: continue row = [...] writer.writerow(row) households_seen[household.id] = household
  103. def make_list(school_name, file_name, households_seen): households = get_households(school_name) write_csv(filename, households, households_seen)

    def get_households(school_name): households = session.query(Household).join(Voter).join(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Voter.absentee_ballot_requested == None).\ filter(Student.school_name == school_name).\ filter(Household.phone != None).\ filter(Household.do_not_contact == None).\ order_by(Voter.last_name, Voter.first_name).\ all() return households def write_csv(filename, households, households_seen): ... for household in households: if household.id in households_seen: continue row = [...] writer.writerow(row) households_seen[household.id] = household
  104. def make_list(school_name, file_name, households_seen): households = get_households(school_name) write_csv(filename, households, households_seen)

    def get_households(school_name): households = session.query(Household).join(Voter).join(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Voter.absentee_ballot_requested == None).\ filter(Student.school_name == school_name).\ filter(Household.phone != None).\ filter(Household.do_not_contact == None).\ order_by(Voter.last_name, Voter.first_name).\ all() return households def write_csv(filename, households, households_seen): ... for household in households: if household.id in households_seen: continue row = [...] writer.writerow(row) households_seen[household.id] = household
  105. def make_list(school_name, file_name, households_seen): households = get_households(school_name) write_csv(filename, households, households_seen)

    def get_households(school_name): households = session.query(Household).join(Voter).join(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter(Voter.absentee_ballot_requested == None).\ filter(Student.school_name == school_name).\ filter(Household.phone != None).\ filter(Household.do_not_contact == None).\ order_by(Voter.last_name, Voter.first_name).\ all() return households def write_csv(filename, households, households_seen): ... for household in households: if household.id in households_seen: continue row = [...] writer.writerow(row) households_seen[household.id] = household
  106. def main(): households_seen = {} make_list(SchoolNames.POKEY_OAKS, '...', households_seen) make_list(SchoolNames.SPRINGFIELD_ELEMENTARY, '...',

    households_seen) make_list(SchoolNames.HOGWARTS, '...', households_seen) make_list(SchoolNames.MORNING_GLORY, '...', households_seen) make_list(SchoolNames.XAVIERS, '...', households_seen) make_list(SchoolNames.UNSEEN_UNIVERSITY, '...', households_seen)
  107. def main(): households_seen = {} make_list(SchoolNames.POKEY_OAKS, '...', households_seen) make_list(SchoolNames.SPRINGFIELD_ELEMENTARY, '...',

    households_seen) make_list(SchoolNames.HOGWARTS, '...', households_seen) make_list(SchoolNames.MORNING_GLORY, '...', households_seen) make_list(SchoolNames.XAVIERS, '...', households_seen) make_list(SchoolNames.UNSEEN_UNIVERSITY, '...', households_seen)
  108. def main(): households_seen = {} make_list(SchoolNames.POKEY_OAKS, '...', households_seen) make_list(SchoolNames.SPRINGFIELD_ELEMENTARY, '...',

    households_seen) make_list(SchoolNames.HOGWARTS, '...', households_seen) make_list(SchoolNames.MORNING_GLORY, '...', households_seen) make_list(SchoolNames.XAVIERS, '...', households_seen) make_list(SchoolNames.UNSEEN_UNIVERSITY, '...', households_seen)
  109. Unreturned Absentee Ballots • Households with... • One or more

    registered, active voters... • Who requested an absentee ballot... • But haven’t returned it... • And aren’t flagged do-not-contact
  110. def get_voters(): voters = session.query(Voter).join(Household).\ filter(Voter.absentee_ballot_requested != None).\ filter(Voter.absentee_ballot_returned ==

    None).\ filter(Household.do_not_contact == None).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  111. def get_voters(): voters = session.query(Voter).join(Household).\ filter(Voter.absentee_ballot_requested != None).\ filter(Voter.absentee_ballot_returned ==

    None).\ filter(Household.do_not_contact == None).\ group_by(Voter.household_id).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() return voters
  112. Shame List • Registered, active voters who… • Are school

    district staff, or... • Are early childhood PTA members, or... • Have a student in school • Who haven’t voted recently
  113. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  114. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  115. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  116. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  117. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  118. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  119. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  120. voters = session.query(Voter).join(Household).outerjoin(Student).\ filter(Voter.status.in_(allowed_statuses)).\ filter( or_( Student.first_name != None, or_(

    Household.staff == True, Household.early_childhood_pta == True ) ) ).\ order_by(Voter.last_name, Voter.first_name).\ options(contains_eager(Voter.household)).\ all() shame_voters = [ voter for voter in voters if not any( [ballot for ballot in voter.ballots if end_date >= ballot.election_date >= start_date] ) ]
  121. Next Steps • Post-election analysis • Absentee ballot returns •

    School-affiliated voter turnout • Party affiliation voter turnout • Refactor & repackage for reuse
  122. Thoughts / Advice • Model only what you need to

    model • Code for repeatability • Shared understanding of query specification is 90% of the work • Expect arbitrary changes on short notice • Use your powers for Good
  123. Useful Links • Cuyahoga County Board of Elections Data: http://boe.cuyahogacounty.us/en-US/downloads.aspx

    • USPS Street Suffix Abbreviations: http://pe.usps.gov/text/pub28/28apc_002.htm • SQLAlchemy ORM Tutorial: http://docs.sqlalchemy.org/en/latest/orm/tutorial.html • Alembic Tutorial: http://alembic.readthedocs.org/en/latest/tutorial.html