
Public Python for the greater good by JD Bothma (with additional notes)

Pycon ZA
October 06, 2016


We at Code For South Africa use technology to promote informed decision making for positive social change. This can mean simply being aware of what's going on, as well as deep critical research and analysis. We run the civic tech movement {code}bridge, where people come to hack on civic tech projects, together or on their own. We'll give a quick summary of some outputs of this community in Cape Town and Ethekwini.

We'll summarise some work we've done for the good of South African society, mostly using common Python tools. In particular, I'll show how I scraped and mirrored a government website on a tight budget at {code}bridge to improve access to public information, and how usage picked up right after the local elections. We'll also show how a little bit of tech can empower citizens to hold government to account and to participate in governing and developing our infrastructure, and how supposedly boring government notices come to life when they are made accessible and personal.

This talk is aimed at anyone keen on making a big impact with a little bit of tech and interested in improving lives. There is plenty of low-hanging fruit where technology can do the groundwork needed to improve lives. I'd like to show you how easy it is to make an impact.

This is a heavily revised version of a talk given at DebConf 2016, updated with several projects that have matured or launched since.

Transcript

  1. Use technology to promote informed decision making for positive social change
    Code For South Africa (Code4SA)
    • Who is Code4SA?
      ◦ We use tech to promote informed decision making for positive social change.
      ◦ Technology partner for civil society organisations as well as government projects.
      ◦ Data Journalism Academy, where we train journalists in tools and techniques for analysing data, in treating a dataset like any other source, and in how to present it.
  2. • Community • Workspace • Incubator
    We're going to show you some of the things we do, and how simple tech helps us have a big impact.
  3. mpr.code4sa.org
    • Look up pricing for generics and alternate brands of medication
    • Published by the government
    • Django and SQLite
    • Broke it, and a doctor reached out
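
The lookup on slide 3 amounts to finding every product that shares a medicine's active ingredients and sorting those by price. A minimal sketch with Python's built-in sqlite3 module, assuming a hypothetical single `products` table (the site's real schema and Django models are not shown in the deck):

```python
import sqlite3


def alternatives_by_price(conn, product_name):
    """Return (name, price) for other products with the same active
    ingredients as product_name, cheapest first (hypothetical schema)."""
    rows = conn.execute(
        """
        SELECT alt.name, alt.price
        FROM products AS p
        JOIN products AS alt ON alt.ingredients = p.ingredients
        WHERE p.name = ? AND alt.name != p.name
        ORDER BY alt.price
        """,
        (product_name,),
    )
    return rows.fetchall()


# Hypothetical example data: one branded product and two generics
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, ingredients TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("BrandX 500mg", "paracetamol 500mg", 45.0),
        ("GenericA 500mg", "paracetamol 500mg", 12.5),
        ("GenericB 500mg", "paracetamol 500mg", 19.9),
    ],
)
print(alternatives_by_price(conn, "BrandX 500mg"))
```

Matching on an exact ingredients string keeps the sketch short; real data would need normalised ingredient and strength columns.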
  4. You are here! wazimap.co.za
    • Show localised data from the last national census
    • Things like the proportion of households in the area headed by children, or the different levels of education in the area
    • Django + Postgres + MapIt API + Google Search
    • Fork of censusreporter
    • Serving as the basis for similar projects (ECD, Wazimap Kenya)
    • Didn't expect API and exports (GeoJSON, KML, CSV)
    MapIt API
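
Slide 4 mentions indicators such as the proportion of child-headed households and exports such as GeoJSON. A minimal standard-library sketch of both, assuming hypothetical per-area household counts and a made-up polygon (Wazimap's actual data model and export code are not shown here):

```python
import json


def child_headed_share(households_total, child_headed):
    """Proportion of households headed by children (0.0 if no households)."""
    return child_headed / households_total if households_total else 0.0


def to_geojson(areas):
    """Build a GeoJSON FeatureCollection with the indicator as a property."""
    features = []
    for area in areas:
        share = child_headed_share(area["households"], area["child_headed"])
        features.append({
            "type": "Feature",
            "geometry": area["geometry"],
            "properties": {"name": area["name"], "child_headed_share": round(share, 4)},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical ward: invented counts, tiny square polygon in (lon, lat) order
ward = {
    "name": "Example Ward",
    "households": 2000,
    "child_headed": 30,
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[18.4, -34.0], [18.5, -34.0], [18.5, -33.9],
                         [18.4, -33.9], [18.4, -34.0]]],
    },
}
collection = to_geojson([ward])
print(json.dumps(collection, indent=2))
```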
  5. • First year: proprietary survey tool
    • Second year: Open Data Kit app + FormHub + spreadsheets + scripting for SVG posters
    • Platform for education and for automatically generating comparative stats online
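
The "scripting for SVG posters" step on slide 5 can be as simple as string templating over spreadsheet rows. A minimal sketch, assuming hypothetical (label, value) pairs; it is not the script used for the actual posters:

```python
from xml.sax.saxutils import escape


def bar_poster_svg(title, rows, width=400, bar_height=30):
    """Render (label, value) pairs as a horizontal bar chart in SVG.
    Bars are scaled relative to the largest value."""
    max_value = max(value for _, value in rows) or 1
    height = 50 + bar_height * len(rows)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<text x="10" y="25" font-size="18">{escape(title)}</text>',
    ]
    for i, (label, value) in enumerate(rows):
        y = 40 + i * bar_height
        bar_width = (width - 160) * value / max_value
        parts.append(f'<text x="10" y="{y + 18}">{escape(label)}</text>')
        parts.append(f'<rect x="150" y="{y}" width="{bar_width:.1f}" '
                     f'height="{bar_height - 6}" fill="#336699"/>')
        parts.append(f'<text x="{155 + bar_width:.1f}" y="{y + 18}">{value}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical survey results for one site versus a comparison group
svg = bar_poster_svg("Survey results (example)", [("Option A", 2), ("Option B", 4)])
print(svg)
```

Writing `svg` to a `.svg` file per site gives one printable poster each.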
  6. - Searchable, with search alerts by email, for free
    - Scrapy for gathering the gazettes that are online
    - Aleph for indexing and the search interface
    - pdfminer / Tika for text extraction
    - Future: extract notices and entities, produce datasets, corporate data
    - Get in touch
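
The free email alerts on slide 6 boil down to matching each subscriber's saved search term against newly extracted gazette text. A minimal standard-library sketch, assuming hypothetical subscriptions and plain-text gazettes (the real service searches through Aleph, which this does not model):

```python
import re
from email.message import EmailMessage


def matching_gazettes(term, gazettes):
    """Titles of gazettes whose text contains the term as a whole word,
    ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [g["title"] for g in gazettes if pattern.search(g["text"])]


def build_alerts(subscriptions, gazettes, sender="alerts@example.org"):
    """Yield one EmailMessage per subscription that has at least one hit."""
    for sub in subscriptions:
        hits = matching_gazettes(sub["term"], gazettes)
        if not hits:
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = sub["email"]
        msg["Subject"] = f'New gazettes mentioning "{sub["term"]}"'
        msg.set_content("\n".join(hits))
        yield msg


# Hypothetical data; actually sending would use smtplib.SMTP(...).send_message(msg)
gazettes = [
    {"title": "Gazette 1", "text": "Notice of rezoning in Woodstock ..."},
    {"title": "Gazette 2", "text": "Liquor licence applications ..."},
]
subs = [{"email": "resident@example.org", "term": "rezoning"}]
alerts = list(build_alerts(subs, gazettes))
```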
  7. pip install XlsxWriter
    This frees us for exciting stuff. Most pragmatic thing ever.
    - Most of the time, people don't know what analyses they want to perform. They want to, and should, explore their data.
    - "At least once every three months there's a startup that reinvents Pivot Tables" - Joel Spolsky
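
Slide 7's point is that handing people a spreadsheet lets them explore the data themselves, with pivot tables for instance. A minimal sketch using XlsxWriter's `Workbook`, `add_worksheet` and `add_table` calls, writing into an in-memory buffer; the rows are made up:

```python
import io

import xlsxwriter  # pip install XlsxWriter


def rows_to_xlsx(headers, rows):
    """Write rows as an Excel table (filterable, and a handy pivot table
    source) and return the .xlsx file contents as bytes."""
    output = io.BytesIO()
    workbook = xlsxwriter.Workbook(output, {"in_memory": True})
    worksheet = workbook.add_worksheet("data")
    # The table range covers the header row (row 0) plus one row per record
    worksheet.add_table(0, 0, len(rows), len(headers) - 1, {
        "data": rows,
        "columns": [{"header": h} for h in headers],
    })
    workbook.close()
    return output.getvalue()


# Hypothetical rows
xlsx_bytes = rows_to_xlsx(
    ["municipality", "year", "amount"],
    [["Example Metro", 2015, 1000.0], ["Example Metro", 2016, 1200.0]],
)
```

Saving `xlsx_bytes` to a `.xlsx` file (or returning it from a web view) gives users a ready-made table to pivot however they like.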
  8. Municipal Money data portal
    - Make municipal finances accessible
    - Already published in Excel and PDF for years
    - Initiated by National Treasury
    - Pravin promised it in the budget speech
  9. Municipal Money for end users
    - Standard financial performance assessments
    - Thought long and hard (and iterated a few times) about how to present them to the public
    - Educating the public on municipal finance is important for accountability: a core goal
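
As an illustration of the kind of standard assessment slide 9 refers to, here is the current ratio (current assets divided by current liabilities), a common liquidity measure. A minimal sketch with invented figures; which indicators Municipal Money actually shows, and the norms it applies, are not covered here:

```python
def current_ratio(current_assets, current_liabilities):
    """Current assets / current liabilities; None when liabilities are zero."""
    if current_liabilities == 0:
        return None
    return current_assets / current_liabilities


# Hypothetical figures for one municipality over three years (rand)
history = {
    2013: (850_000_000, 500_000_000),
    2014: (800_000_000, 620_000_000),
    2015: (700_000_000, 710_000_000),
}
for year, (assets, liabilities) in sorted(history.items()):
    print(f"{year}: {current_ratio(assets, liabilities):.2f}")
```

Showing the trend over several years, rather than one number, is what makes such an indicator meaningful to a non-specialist.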
  10. MFMA HTML...
        <A HREF="http://mfma.treasury.gov.za/Documents/Forms/AllItems.aspx?RootFolder=..." onclick="javascript:EnterFolder('http:\u002f\u002fmfma.treasury.gov.za\u002f...'); return false;">04. Service Delivery and Budget Implementation Plans</A>
        http://mfma.treasury.gov.za/Documents/Forms/AllItems.aspx?RootFolder=%2fDocuments%2f04%2e%20Service%20Delivery%20and%20Budget%20Implementation%20Plans&amp;FolderCTID=&amp;View=%7b84CA1A01%2dEF8A%2d4DE0%2d8DC4%2d47D223CB5867%7d
        RootFolder=/Documents/04. Service Delivery and Budget Implementation Plans
        FolderCTID=
        View={84CA1A01-EF8A-4DE0-8DC4-47D223CB5867}
    {code}bridge
    - But maybe it's just the link, resulting in a 401
    - Emailed, phoned, didn't get anywhere
    - I think Jacques is responsible
    - Maybe lazy, maybe busy, maybe the incentives are wrong
    - Decided we can, and want to, fix it regardless
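
The folder link on slide 10 hides its target behind HTML-escaped, percent-encoded SharePoint query parameters. Python's standard library can unpack it into the readable form shown on the slide; a minimal sketch using the URL as it appears there:

```python
import html
from urllib.parse import parse_qs, urlsplit

raw_href = (
    "http://mfma.treasury.gov.za/Documents/Forms/AllItems.aspx?"
    "RootFolder=%2fDocuments%2f04%2e%20Service%20Delivery%20and%20Budget"
    "%20Implementation%20Plans&amp;FolderCTID=&amp;"
    "View=%7b84CA1A01%2dEF8A%2d4DE0%2d8DC4%2d47D223CB5867%7d"
)


def decode_sharepoint_link(href):
    """Undo HTML escaping (&amp;), then percent-decode the query string.
    keep_blank_values preserves empty parameters such as FolderCTID."""
    query = urlsplit(html.unescape(href)).query
    params = parse_qs(query, keep_blank_values=True)
    # parse_qs returns a list per key; each parameter appears once here
    return {key: values[0] for key, values in params.items()}


for key, value in decode_sharepoint_link(raw_href).items():
    print(f"{key}={value}")
# RootFolder=/Documents/04. Service Delivery and Budget Implementation Plans
# FolderCTID=
# View={84CA1A01-EF8A-4DE0-8DC4-47D223CB5867}
```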
  11. https://mfmamirror.github.io {code}bridge
    - I think many recognise the theme... or lack of one
    - Jekyll site hosted on GitHub Pages
    - Sorry for Ruby; perhaps it's like bringing a revolver to a pistol fight?
  12. MfmaSpider

        import scrapy

        from mfma.items import PageItem  # assumed location of the project's Item class


        class MfmaSpider(scrapy.Spider):
            name = "mfma"  # Scrapy requires a spider name; this one is assumed
            start_urls = ["http://mfma.treasury.gov.za"]

            def parse(self, response):
                # Hand every item produced for this page on to Scrapy
                for item in self.page_item(response):
                    yield item

            def page_item(self, response):
                page_item = PageItem()
                title_css = '.breadcrumbCurrent'
                title = response.css(title_css)
                page_item['title'] = title.xpath('text()').get()
                # Scrape content etc...
                yield page_item

    {code}bridge
    - Start URLs
    - Find what you want on the page with CSS or XPath
    - Emit URLs to crawl/spider further
    - Emit custom items as the scraped data
  13. Scrapy Item - MFMA Page Item

        {
          "form_table_rows": [],
          "original_url": "http://mfma.treasury.gov.za/Return_Forms/...",
          "breadcrumbs": "<span><a href=\"http://mfma.treasury.gov.za...",
          "title": "Return Forms",
          "path": "/Return_Forms/index.html",
          "type": "page",
          "body": "<div>\nAll Return Forms contain new demarcation codes ..."
        }

    {code}bridge
    - Arbitrary fields
    - Scrapy checks that the correct fields are set
    - A very simple script iterates over the items and writes YAML files for Jekyll to build the site
  14. https://mfmamirror.github.io (part 1)
    Scrape periodically; rebuild the site locally (when I remember); push to GitHub; link to files on .gov.za
    {code}bridge
    - Scrape the original using Scrapy
    - Do it every 2 days or so on Scrapinghub
    - Link back to resources on the original website
    - Way better, but not perfect
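
On slide 14 the mirror serves HTML pages itself but links back to documents on the original site. A minimal sketch of that link-rewriting rule, assuming (hypothetically) that a path ending in `.html`/`.htm`, or with no extension, is a page and everything else is a file to link to on mfma.treasury.gov.za:

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

ORIGINAL_HOST = "mfma.treasury.gov.za"
PAGE_SUFFIXES = {"", ".html", ".htm"}


def rewrite_link(page_url, href):
    """Return the href to use on the mirror: a site-relative path for pages,
    an absolute URL on the original site for documents."""
    absolute = urljoin(page_url, href)
    parts = urlsplit(absolute)
    if parts.netloc != ORIGINAL_HOST:
        return absolute  # external link: leave untouched
    if PurePosixPath(parts.path).suffix.lower() in PAGE_SUFFIXES:
        return parts.path  # page: served from the mirror itself
    return absolute  # document: link back to the .gov.za original
```

For a page at `http://mfma.treasury.gov.za/Return_Forms/index.html`, a relative `form.xls` becomes the absolute .gov.za URL, while `../Circulars/index.html` becomes the mirror path `/Circulars/index.html`.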
  15. https://mfmamirror.github.io (part 2)
    Rebuild and push from Scrapinghub? Ugh... GitPython needs native git
    {code}bridge
    - Tried to make it publish from Scrapinghub
    - ...uh... GitPython actually needs cgit, the native git executable
  16. https://mfmamirror.github.io (part 3)
    Resources on S3

        ITEM_PIPELINES = {
            # Scrapy runs pipelines in ascending order of these values
            'mfma.pipelines.DepagingPipeline': 100,
            'mfma.pipelines.FileArchivePipeline': 100,
            # 'mfma.pipelines.MirrorBuilderPipeline': 300,
        }

    {code}bridge
    - Push non-HTML files to S3
    - Link to those
    - Nice archive
    - Not yet obeying cache rules like modified date, or noticing changes in non-HTML files
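
Slide 16's archive step is a Scrapy item pipeline: a class whose `process_item(self, item, spider)` method receives every scraped item. A minimal sketch of a pipeline that skips HTML and hands other files to an uploader, assuming hypothetical item fields (`original_url`, `content_type`, `body_bytes`) and an injectable `upload(key, data)` callable, which in production could wrap boto3's `put_object`. It is an illustration, not the deck's FileArchivePipeline:

```python
from urllib.parse import urlsplit


class ArchiveNonHtmlPipeline:
    """Hypothetical pipeline: upload non-HTML files and record their key."""

    def __init__(self, upload=None):
        # upload(key, data) stores bytes somewhere (e.g. an S3 bucket);
        # it defaults to an in-memory dict so the sketch runs standalone
        self.stored = {}
        self.upload = upload or self.stored.__setitem__

    def process_item(self, item, spider):
        if item.get("content_type", "").startswith("text/html"):
            return item  # pages are rebuilt into the static site instead
        # Reuse the original URL path as the object key
        key = urlsplit(item["original_url"]).path.lstrip("/")
        self.upload(key, item["body_bytes"])
        item["archive_key"] = key
        return item
```

Registered in `ITEM_PIPELINES` like the entries on the slide, it would run for every item; comparing a stored Last-Modified value before uploading would be one way to address the cache-rules gap the slide mentions.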
  17. Ward candidates {code}bridge
    - Now the next step is to search for candidate IDs in the gazettes...
    - ...and connect the results with the CIPC dataset
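
For slide 17, the search step could start by pulling 13-digit South African ID numbers out of gazette text, discarding any that fail the Luhn check digit at the end of an SA ID number, and then looking the rest up in a company-director dataset. A minimal sketch; the `directors` mapping is a hypothetical stand-in for the CIPC data, whose real format is not described in the deck:

```python
import re

# 13 digits not embedded in a longer run of digits
ID_PATTERN = re.compile(r"(?<!\d)\d{13}(?!\d)")


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_id_numbers(text):
    """13-digit numbers in the text that pass the Luhn check."""
    return [m for m in ID_PATTERN.findall(text) if luhn_valid(m)]


def link_to_companies(text, directors):
    """Map each valid ID in the text to companies in a hypothetical
    {id_number: [company, ...]} lookup built from registry data."""
    return {id_no: directors.get(id_no, []) for id_no in find_id_numbers(text)}
```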
  18. Community Centres {code}bridge
    • Sounds like weddings and parties are a big revenue stream
    • Security guards removed due to budget cuts?
    • Unsafe
    • Do big payers take precedence?
    • 1) Identify where your nearest options are
    • 2) Make booking/fee info accessible and transparent
    • 3) Make sure they're serving the community
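
Step 1 on slide 18, identifying your nearest options, is a nearest-neighbour lookup over centre locations. A minimal sketch using the haversine great-circle distance; the centre names and coordinates are invented for illustration:

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_centres(lat, lon, centres, limit=3):
    """Up to `limit` (distance_km, name) pairs, closest first."""
    ranked = sorted(
        (haversine_km(lat, lon, c["lat"], c["lon"]), c["name"]) for c in centres
    )
    return ranked[:limit]


# Hypothetical centres
centres = [
    {"name": "Centre A", "lat": -33.92, "lon": 18.42},
    {"name": "Centre B", "lat": -33.95, "lon": 18.47},
    {"name": "Centre C", "lat": -34.05, "lon": 18.60},
]
for km, name in nearest_centres(-33.93, 18.43, centres, limit=2):
    print(f"{name}: {km:.1f} km")
```

A linear scan is plenty for a city's worth of centres; a spatial index would only matter at much larger scale.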
  19. Get Involved
    • Play with your city's data
    • Show what's possible
    • Join/start a civic tech/open data group
    • Take a sabbatical ( ͡º ͜ʖ ͡º) code4sa.org/careers
    • Find your local data and play with it. The coolest thing about a dataset is how exciting it becomes once you start digging into it.
  20. Tonight: community evening (twice-monthly) {code}bridge
    Various regular and occasional events
    • Once a month: data visualisation
    • Once a month: movie night
    • Twice a month: community evening