
Navigating the Campaign Contribution Network

Bobby Norton
November 17, 2012


A brief intro to complex systems, and a preview of work I'm doing exploring FEC campaign contribution data as a complex network.





  1. Detroit, MI, November 17, 2012. Image: http://skyeome.net/wordpress/?p=102
  2. Learning Systems Institute • Lockheed Martin Simulation & Training • ThoughtWorks • DRW Trading Group • Aurelius • Education Startup (Coming soon...)
  3. “If string theory fails to provide a testable prediction, then nobody should believe it.” S. James Gates, Jr., University of Maryland
  4. In string theory, the Planck length is the order of magnitude of the oscillating strings that form elementary particles, and shorter lengths do not make physical sense. http://en.wikipedia.org/wiki/Planck_length
  5. CLAIM: Graph databases provide us with an efficient means to create predictive models of complex systems.
  6. COMPLEX SYSTEMS • Cascading failures • Unclear boundaries • May be capable of adaptation • Nonlinear (exhibit a “Butterfly Effect”) • May be nested, a system of systems • Often exhibit small-world and scale-free topologies http://en.wikipedia.org/wiki/Complex_system#Features_of_complex_systems
  7. Skitter data depicting a macroscopic snapshot of Internet connectivity, with selected backbone ISPs. By K. C. Claffy. http://www.caida.org/publications/papers/bydate/index.xml
  8. Sample raw FEC contribution records (pipe-delimited):

```text
C00514893|N|Q1|P|12951391265|15E|IND|KATZ, DAVID|SAN FRANCISCO|CA|94110|GROUPON, INC.|VP/ GM|03202012|1000|C00401224|C3694043A|776253||* EARMARKED CONTRIBUTION: SEE BELOW| 4051020121155797607
C00496778|A|Q1|P|12951584507|15|IND|KALATHIL, VINOD|CHICAGO|IL|60607|GROUPON/DIRECTOR OF INTERNAL AUDIT|DIRECTOR OF INTERNAL AUDIT|03152012|1000||C6914012|781193||| 4051520121155972880
C00420760|N|Q2|P|12020554070|15|IND|KIMET, CAROLYN|EVANSTON|IL|80201|GROUPON|E-MARKETING DIRECTOR|06052012|1000||SA0802094012139|802475|||2080220121160111657
C00494740|N|M8|P|12952683408|15|IND|GERSTER, DAVID|BURLINGAME|CA|94010|GROUPON, INC.| DIRECTOR OF ANALYTICS|07112012|2000||C17532152|805999|||4082820121161649967
C00431445|N|M9|P|12972391292|15|IND|KATZ, DAVID|SAN FRANCISCO|CA|94110|GROUPON, INC.|VP/ GM MOBILE|08312012|1000||C20387409|811365|||4100320121165864857
C00494740|N|M10|P|12960022928|15|IND|KLATT, KYLE|CHICAGO|IL|60657|GROUPON|SENIOR CAMPAIGN ORGANIZER|09092012|250||C21392360|821033|||4102520121168451880
C00501692|A|Q3|P|12950510503|15|IND|BAVDA, MRUGESH|CHICAGO|IL|60622|GROUPON|MARKET PLANNER|09272011|500||C7251377|765709|||4030120121152599889
C00401224|N|M4||12971009695|24T|IND|KATZ, DAVID|SAN FRANCISCO|CA|94110|GROUPON, INC.|VP/ GM|03202012|1000|C00514893|SA11AI_5025357|778552||EARMARKED FOR PEOPLE FOR DEREK KILMER (C00514893)|4060920121156872726
C00494930|N|Q2|G|12020551850|15|IND|KLAUMINZER, JAY|ROCKY RIVER|OH|44116|GROUPON| REGIONAL VICE PRESIDENT|06292012|1000||SA0803123312246|802781|||2080620121160328596
C00494740|A|Q2|P|11971580062|15|IND|LEFKOFSKY, ERIC|GLENCOE|IL|60022|GROUPON|OWNER| 04152011|35800||C11008231|748092|||4101820111143685060
C00494740|N|M9|P|12972228143|15|IND|BAKER, CAROLINE|CHICAGO|IL|60616|GROUPON|OPERATIONS| 08312012|250||C20245573|810872|||4092720121165000105
C00431445|N|M9|P|12972342658|15|IND|HUTMACHER, AMY|GRAND JUNCTION|CO|81506|GROUPON| PRODUCT MANAGER|08082012|250||C18876006|811365|||4100320121165718953
C00494740|N|M9|P|12972226974|15|IND|MASON, ANDREW DIVVENS|CHICAGO|IL|60612|GROUPON| CEO|08102012|35800||C19080692|810872|||4092720121164996598
C00431445|N|M9|P|12972331365|15|IND|RASMUSSEN, ERIC|MENLO PARK|CA|94025|GROUPON.COM| MARKETING|08312012|250||C20380155|811365|||4100320121165685074
```
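Each record above is one pipe-delimited line with a fixed column order. The sketch below parses a record into named columns; it is a minimal illustration, not the loader used in this project. The column names are assumed from the FEC's published data dictionary for individual contributions and should be checked against the official file description. Using each enum constant's `ordinal()` as its column index mirrors the `Fields.X.ordinal()` pattern in the loader code, and `split("\\|", -1)` keeps trailing empty columns that a plain `split("\\|")` would drop.

```java
import java.util.EnumMap;
import java.util.Map;

public class FecRecordParser {
    // Column order ASSUMED from the FEC individual-contributions data dictionary;
    // each constant's ordinal() is its column index.
    enum Column {
        CMTE_ID, AMNDT_IND, RPT_TP, TRANSACTION_PGI, IMAGE_NUM, TRANSACTION_TP,
        ENTITY_TP, NAME, CITY, STATE, ZIP_CODE, EMPLOYER, OCCUPATION,
        TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID, TRAN_ID, FILE_NUM,
        MEMO_CD, MEMO_TEXT, SUB_ID
    }

    static Map<Column, String> parse(String line) {
        // limit -1 keeps trailing empty fields, so every record yields one value per column
        String[] fields = line.split("\\|", -1);
        if (fields.length != Column.values().length) {
            throw new IllegalArgumentException(
                "Expected " + Column.values().length + " fields, got " + fields.length);
        }
        Map<Column, String> record = new EnumMap<>(Column.class);
        for (Column c : Column.values()) {
            // trim() removes stray spaces such as the one before some SUB_ID values
            record.put(c, fields[c.ordinal()].trim());
        }
        return record;
    }

    public static void main(String[] args) {
        String line = "C00420760|N|Q2|P|12020554070|15|IND|KIMET, CAROLYN|EVANSTON|IL|80201|GROUPON|E-MARKETING DIRECTOR|06052012|1000||SA0802094012139|802475|||2080220121160111657";
        Map<Column, String> r = parse(line);
        // prints "KIMET, CAROLYN -> C00420760: $1000"
        System.out.println(r.get(Column.NAME) + " -> " + r.get(Column.CMTE_ID) + ": $" + r.get(Column.TRANSACTION_AMT));
    }
}
```

Rejecting records with the wrong column count is a design choice here; a loader that prefers to skip bad lines could catch the exception and count the record as dirty instead.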
  9. FEC NODES AND EDGES • candidates --contribute_to--> committees • committees --contribute_to--> committees • donors --contribute_to--> committees • donors --employed_by--> companies • companies --contribute_to--> committees • committees --contribute_to--> candidates
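The node and edge types above form a small schema: a set of allowed (source type, edge label, target type) triples. A minimal sketch, assuming a hypothetical in-memory representation (`NodeType`, `EdgeType`, `Rule` and `connect` are illustrative names, not part of any graph database API), that rejects any edge outside those six patterns. It uses records, so it needs Java 16 or later.

```java
import java.util.List;
import java.util.Set;

public class FecSchema {
    enum NodeType { CANDIDATE, COMMITTEE, DONOR, COMPANY }
    enum EdgeType { CONTRIBUTE_TO, EMPLOYED_BY }

    record Node(String id, NodeType type) {}
    record Edge(Node source, EdgeType type, Node target) {}
    // One allowed (source type, edge label, target type) triple
    record Rule(NodeType source, EdgeType edge, NodeType target) {}

    // The six edge patterns listed on the slide
    static final Set<Rule> ALLOWED = Set.of(
        new Rule(NodeType.CANDIDATE, EdgeType.CONTRIBUTE_TO, NodeType.COMMITTEE),
        new Rule(NodeType.COMMITTEE, EdgeType.CONTRIBUTE_TO, NodeType.COMMITTEE),
        new Rule(NodeType.DONOR, EdgeType.CONTRIBUTE_TO, NodeType.COMMITTEE),
        new Rule(NodeType.DONOR, EdgeType.EMPLOYED_BY, NodeType.COMPANY),
        new Rule(NodeType.COMPANY, EdgeType.CONTRIBUTE_TO, NodeType.COMMITTEE),
        new Rule(NodeType.COMMITTEE, EdgeType.CONTRIBUTE_TO, NodeType.CANDIDATE));

    /** Creates an edge only if its (source, label, target) types match the schema. */
    static Edge connect(Node source, EdgeType type, Node target) {
        if (!ALLOWED.contains(new Rule(source.type(), type, target.type()))) {
            throw new IllegalArgumentException(
                source.type() + " -" + type + "-> " + target.type() + " is not in the schema");
        }
        return new Edge(source, type, target);
    }

    public static void main(String[] args) {
        Node donor = new Node("KATZ, DAVID", NodeType.DONOR);
        Node company = new Node("GROUPON, INC.", NodeType.COMPANY);
        Node committee = new Node("C00514893", NodeType.COMMITTEE);
        List<Edge> edges = List.of(
            connect(donor, EdgeType.EMPLOYED_BY, company),
            connect(donor, EdgeType.CONTRIBUTE_TO, committee));
        edges.forEach(e -> System.out.println(e.source().id() + " -" + e.type() + "-> " + e.target().id()));
    }
}
```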
  10. Loading candidate records with the Neo4j batch inserter:

```java
public int saveEntries(BatchInserter inserter, BatchIndex index) {
    int count = 0;
    try {
        for (String line : lines) {
            String[] fields = line.split("\\|");
            // Candidate address data is the most inconsistent in this file, so we skip that entirely.
            // There are also incomplete records, e.g. CAND_ID H2NJ02177, that we can skip since they
            // aren't referenced anywhere else.
            if (dirty(fields, 10)) continue;
            Map<String, Object> candidate = transform(fieldEnum, fields);
            long candidateId = inserter.createNode(candidate);
            index.add(candidateId, candidate);
            // TODO: Convert the COMMITTEE_ID property to a relationship with a committee
            count++;
        }
    } catch (Exception e) {
        throw new RuntimeException("Failed to write candidates:", e);
    }
    return count;
}
```
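The slide calls `transform(fieldEnum, fields)` without showing it. One plausible shape, sketched here as an assumption rather than the original helper, turns a split line into a property map keyed by the lower-cased enum constant names; that convention would produce keys like `committee_id`, as used in the index lookups. `CandidateField` is a hypothetical, truncated column list, and the sample line is made up.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class RecordTransform {
    // HYPOTHETICAL, truncated column list; the real candidate file has more columns.
    enum CandidateField { CAND_ID, CAND_NAME, CAND_PTY_AFFILIATION }

    /** Maps each enum constant (lower-cased name) to the field at its ordinal position. */
    static <E extends Enum<E>> Map<String, Object> transform(Class<E> fieldEnum, String[] fields) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (E column : fieldEnum.getEnumConstants()) {
            int i = column.ordinal();
            // Columns past the enum are ignored; missing trailing columns are simply omitted
            if (i < fields.length) {
                props.put(column.name().toLowerCase(Locale.ROOT), fields[i].trim());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        String[] fields = "X0XX00000|DOE, JANE|IND|extra".split("\\|");
        // prints {cand_id=X0XX00000, cand_name=DOE, JANE, cand_pty_affiliation=IND}
        System.out.println(transform(CandidateField.class, fields));
    }
}
```

Passing the enum's `Class` keeps one `transform` usable for every file type, each with its own column enum.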
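  11. Skipping dirty records while counting them:

```java
for (String line : lines) {
    String[] fields = line.split("\\|");
    if (dirty(fields, 10)) { dirty++; continue; }
    // ... (remainder of the loop body is not shown on the slide)
}
```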
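`dirty(fields, 10)` is not shown either. The sketch below assumes one reading of it: a record is dirty when it has fewer than `n` columns or any of its first `n` columns is blank. That is a guess at the intent (required leading columns, with the unreliable address columns after them ignored), not the original helper. Java lets a local `int dirty` counter and a `dirty(...)` method share a name, as the slide does; the sketch uses `dirtyCount` for clarity.

```java
public class DirtyCheck {
    // HYPOTHETICAL reading of the slide's dirty(fields, n) helper:
    // dirty if there are fewer than n columns or any of the first n columns is blank.
    static boolean dirty(String[] fields, int n) {
        if (fields.length < n) return true;
        for (int i = 0; i < n; i++) {
            if (fields[i].trim().isEmpty()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] lines = {"A|B|C", "A||C", "A|B|"};
        int clean = 0;
        int dirtyCount = 0;
        for (String line : lines) {
            // split("\\|") drops trailing empty strings, so "A|B|" yields only two columns
            String[] fields = line.split("\\|");
            if (dirty(fields, 3)) { dirtyCount++; continue; }
            clean++;
        }
        System.out.println(clean + " clean, " + dirtyCount + " dirty"); // 1 clean, 2 dirty
    }
}
```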
  12. Linking committee-to-committee contributions and tallying those that reference unknown committees:

```java
Map<String, Object> props = transactionProperties(fields);
String sourceData = fields[Fields.SOURCE_COMMITTEE_ID.ordinal()];
String targetData = fields[Fields.TARGET_COMMITTEE_ID.ordinal()];
Long sourceId = committeeIndex.find("committee_id", sourceData);
Long targetId = committeeIndex.find("committee_id", targetData);
if (sourceId == null || targetId == null) {
    // At least one committee was never loaded: keep the raw IDs and count the money as dirty
    props.put("source", sourceData);
    props.put("target", targetData);
    int amount = Integer.parseInt(fields[Fields.AMOUNT.ordinal()].trim());
    dirtyMoney += amount;
    dirty++;
} else {
    inserter.createRelationship(sourceId, targetId, Relationships.CONTRIBUTED, props);
    count++;
}
```
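The accounting in this step can be exercised without a database. A minimal sketch, assuming a plain `Map` stands in for `committeeIndex` and using hypothetical committee IDs: transfers whose source or target committee is unknown are counted as dirty and their amounts summed. The slide does not show the declared type of `dirtyMoney`; the sketch uses `long` so the running total cannot overflow an `int`. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;

public class CommitteeLinker {
    record Transfer(String source, String target, int amount) {}
    record Tally(int linked, int dirty, long dirtyMoney) {}

    /** Counts linkable transfers and sums the amounts of those with an unknown committee. */
    static Tally link(Map<String, Long> committeeIndex, List<Transfer> transfers) {
        int linked = 0;
        int dirty = 0;
        long dirtyMoney = 0; // long so a large unresolved total cannot overflow
        for (Transfer t : transfers) {
            Long sourceId = committeeIndex.get(t.source());
            Long targetId = committeeIndex.get(t.target());
            if (sourceId == null || targetId == null) {
                dirty++;
                dirtyMoney += t.amount();
            } else {
                linked++; // the slide creates a CONTRIBUTED relationship at this point
            }
        }
        return new Tally(linked, dirty, dirtyMoney);
    }

    public static void main(String[] args) {
        Map<String, Long> index = Map.of("C001", 1L, "C002", 2L); // hypothetical IDs
        List<Transfer> transfers = List.of(
            new Transfer("C001", "C002", 500),
            new Transfer("C001", "C999", 250),  // unknown target committee
            new Transfer("C998", "C002", 100)); // unknown source committee
        System.out.println(link(index, transfers)); // Tally[linked=1, dirty=2, dirtyMoney=350]
    }
}
```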
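  13. Wrapping a Lucene batch-insert index:

```java
public BatchIndex(BatchInserter inserter, String indexName, String indexProperty) {
    this.indexProperty = indexProperty;
    indexProvider = new LuceneBatchInserterIndexProvider(inserter);
    // "exact" index type: whole-value matches, no full-text analysis
    index = indexProvider.nodeIndex(indexName, MapUtil.stringMap("type", "exact"));
    // Cache lookups on the indexed property (up to 100,000 entries) to speed up find() during the load
    index.setCacheCapacity(indexProperty, 100000);
}
```

    http://lucene.apache.org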
  14. Adding a node to the index:

```java
public void add(long nodeId, Map<String, Object> properties) {
    // Every indexed node must carry the indexed property; fail fast otherwise
    if (!properties.containsKey(indexProperty)) {
        throw new RuntimeException(
            String.format("Node %d is missing property %s", nodeId, indexProperty));
    }
    index.add(nodeId, properties);
}
```
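  15. Looking up a node ID:

```java
public Long find(String key, String value) {
    IndexHits<Long> ids = index.get(key, value);
    // Return the first hit, or null when nothing matches; close the hits either way
    if (!ids.hasNext()) {
        ids.close();
        return null;
    }
    Long nodeId = ids.next();
    ids.close();
    return nodeId;
}
```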
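Together, `add` and `find` define an exact-match index contract: `add` refuses nodes that lack the indexed property, and `find` returns the first hit or `null`. The in-memory stand-in below reproduces that contract so loader logic can be tested without Lucene. It is a hypothetical simplification that indexes a single property, so `find` takes only the value; the node IDs are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** HYPOTHETICAL in-memory stand-in for the BatchIndex wrapper (one exact-match property). */
public class ExactIndex {
    private final String indexProperty;
    private final Map<String, List<Long>> entries = new HashMap<>();

    ExactIndex(String indexProperty) {
        this.indexProperty = indexProperty;
    }

    void add(long nodeId, Map<String, Object> properties) {
        Object value = properties.get(indexProperty);
        if (value == null) {
            throw new IllegalArgumentException(
                String.format("Node %d is missing property %s", nodeId, indexProperty));
        }
        entries.computeIfAbsent(value.toString(), k -> new ArrayList<>()).add(nodeId);
    }

    /** Returns the first node ID indexed under value, or null when there is no hit. */
    Long find(String value) {
        List<Long> hits = entries.get(value);
        return (hits == null || hits.isEmpty()) ? null : hits.get(0);
    }

    public static void main(String[] args) {
        ExactIndex committees = new ExactIndex("committee_id");
        committees.add(7L, Map.<String, Object>of("committee_id", "C00401224"));
        System.out.println(committees.find("C00401224")); // 7
        System.out.println(committees.find("C99999999")); // null
    }
}
```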
  16. How much has the Obama campaign spent? (Given the data I’ve loaded so far.)

```groovy
obama = g.V.filter{it.name == "OBAMA, BARACK"}.next()
x = []
// Collect the amount property of every outgoing edge as an integer
obama.outE.amount.store(x){it.toInteger()}
x.inject(0){acc, val -> acc + val}
==>187447424
```
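The same total can be computed over a plain edge list. A minimal sketch, assuming edges carry their amount as a string (as the `it.toInteger()` conversion suggests) and using made-up node IDs and amounts. It sums into a `long` rather than an `int`, because a Java `int` overflows past 2,147,483,647. Records require Java 16 or later.

```java
import java.util.List;

public class OutgoingTotal {
    // Amount stored as a string: an ASSUMPTION based on the it.toInteger() conversion on the slide
    record Edge(String from, String to, String amount) {}

    /** Sums the amount property of every edge leaving the given node. */
    static long totalOut(List<Edge> edges, String node) {
        return edges.stream()
                .filter(e -> e.from().equals(node))
                .mapToLong(e -> Long.parseLong(e.amount().trim()))
                .sum();
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge("cand-1", "cmte-A", "1000"),
            new Edge("cand-1", "cmte-B", "2500"),
            new Edge("cand-2", "cmte-A", "999"));
        System.out.println(totalOut(edges, "cand-1")); // 3500
    }
}
```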
  17. To which committee did the Obama campaign make the most contributions?

```groovy
obama = g.V.filter{it.name == "OBAMA, BARACK"}.next()
x = [:]
// Count how many times each adjacent vertex is reached through an outgoing edge
obama.out.groupCount(x).iterate()
top = x.sort{a, b -> b.value <=> a.value}[0..9]
top.keySet().toArray().first().name
==>WORKING AMERICA
```
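`groupCount` followed by a descending sort amounts to a frequency count plus top-N selection. A minimal Java equivalent, assuming the traversal has already produced one neighbour name per outgoing edge (the sample names are made up); ties are broken by name so the output does not depend on hash order.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopTargets {
    /** Counts how often each neighbour appears and returns the n most frequent. */
    static List<Map.Entry<String, Long>> topN(List<String> neighbours, int n) {
        Map<String, Long> counts = neighbours.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = List.of("cmte-A", "cmte-B", "cmte-A", "cmte-C", "cmte-A", "cmte-B");
        System.out.println(topN(out, 2)); // [cmte-A=3, cmte-B=2]
    }
}
```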
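  18. Who is going to win the election? (According to eigenvector centrality)

```groovy
m = [:]; c = 0
// Step out from every vertex and count visits; loop(2) jumps back two steps (to out)
// while the closure returns true, and c counts how many times the closure has run
g.V.out.groupCount(m).loop(2){c++ < 1000}
top = m.sort{-it.value}[0..9]
top.keySet().toArray().first().name
==>OBAMA FOR AMERICA
```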
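The Gremlin loop repeatedly steps out from every vertex and accumulates visit counts, which on a well-connected graph ranks vertices much like eigenvector centrality. The sketch below swaps in plain power iteration on an adjacency list: each round every node passes its score along its outgoing edges, and scores are rescaled to sum to 1. It is an illustration under those assumptions, not the code behind the slide's result. On directed graphs, score that flows into nodes with no outgoing edges is lost on the next round, and a graph that is not strongly connected, or is periodic, may not settle on a single ranking; variants such as PageRank add damping for that reason.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class EigenvectorCentrality {
    /** Power iteration over out-edges: a node scores highly when high-scoring nodes point at it. */
    static Map<String, Double> centrality(Map<String, List<String>> out, int iterations) {
        Set<String> nodes = new TreeSet<>(out.keySet());
        out.values().forEach(nodes::addAll);
        Map<String, Double> score = new HashMap<>();
        for (String v : nodes) score.put(v, 1.0 / nodes.size());
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String v : nodes) next.put(v, 0.0);
            // Each node passes its current score along every outgoing edge
            for (Map.Entry<String, List<String>> e : out.entrySet()) {
                double s = score.get(e.getKey());
                for (String target : e.getValue()) next.merge(target, s, Double::sum);
            }
            double total = next.values().stream().mapToDouble(Double::doubleValue).sum();
            if (total == 0) break; // all score drained into sinks; keep the last non-zero vector
            for (String v : nodes) next.put(v, next.get(v) / total);
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        // Small strongly connected, aperiodic example graph (made up)
        Map<String, List<String>> out = Map.of(
            "A", List.of("B"),
            "B", List.of("C"),
            "C", List.of("A", "B"));
        // prints B, then C, then A, each with its normalized score
        centrality(out, 100).entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .forEach(e -> System.out.printf("%s %.3f%n", e.getKey(), e.getValue()));
    }
}
```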
  19. NEXT STEPS • Open-source release for hackers and data scientists (announcement coming soon via Twitter) • Finish loading FEC data and continue network analysis • Visualize the network • Compare results to other implementations to validate the original claim