Public Python for the greater good by JD Bothma

Pycon ZA
October 06, 2016

We at Code For South Africa use technology to promote informed decision making for positive social change. This can mean general awareness of what's going on, as well as deep critical research and analysis. We run the civic tech movement {code}bridge, where people come and hack on civic tech projects, together or on their own. I'll give a quick summary of some outputs of this community in Cape Town and Ethekwini.

We'll summarise some work we've done for the good of South African society using mostly common Python tools. In particular, I'll show how I scraped and mirrored a government website on a tight budget at {code}bridge for better access to public information, and saw usage pick up right after the local elections. We'll also show how a little bit of tech can empower citizens to hold government to account and to participate in the governing and development of our infrastructure, and how seemingly boring government notices really come to life when made accessible and personal.

This talk is aimed at anyone keen on making a big impact with a little bit of tech, and interested in improving lives. There is plenty of low-hanging fruit out there, where technology can do the groundwork needed to improve lives. I'd like to show you how easy it is to make an impact.

This is a heavily revised version of a talk given at DebConf 2016 with several new projects matured or launched since.

Transcript

  1. Public Python for the greater good JD Bothma Code for South Africa @jdbothma @code4sa
  2. Use technology Promote informed decision making For positive social change Code For South Africa (code4sa)
  3. None
  4. mpr.code4sa.org

  5. You are here! wazimap.co.za

  6. None
  7. None
  8. None
  9. None
  10. None
  11. pip install XlsxWriter This frees us for exciting stuff

  12. None
  13. Municipal Money data portal

  14. Municipal Money for end users

  15. http://mfma.treasury.gov.za/ {code}bridge

  16. ~31600 files ~10900 spreadsheets ~18481 PDFs {code}bridge

  17. MFMA HTML...

    <A HREF="http://mfma.treasury.gov.za/Documents/Forms/AllItems.aspx?RootFolder=..." onclick="javascript:EnterFolder('http:\u002f\u002fmfma.treasury.gov.za\u002f...'); return false;">04. Service Delivery and Budget Implementation Plans</A>

    http://mfma.treasury.gov.za/Documents/Forms/AllItems.aspx?RootFolder=%2fDocuments%2f04%2e%20Service%20Delivery%20and%20Budget%20Implementation%20Plans&amp;FolderCTID=&amp;View=%7b84CA1A01%2dEF8A%2d4DE0%2d8DC4%2d47D223CB5867%7d

    RootFolder=/Documents/04. Service Delivery and Budget Implementation Plans
    FolderCTID=
    View={84CA1A01-EF8A-4DE0-8DC4-47D223CB5867}

    {code}bridge
  18. https://mfmamirror.github.io {code}bridge

  19. First online: 2016-06-12 {code}bridge

  20. Google Analytics outbound clicks {code}bridge

  21.
        import scrapy

        # PageItem: the Scrapy Item from slide 22, defined elsewhere in the project (not shown).

        class MfmaSpider(scrapy.Spider):
            start_urls = ["http://mfma.treasury.gov.za"]

            def parse(self, response):
                # Hand every item found on the page to the item pipelines.
                for item in self.page_item(response):
                    yield item

            def page_item(self, response):
                page_item = PageItem()
                title_css = '.breadcrumbCurrent'
                title = response.selector.css(title_css)
                page_item['title'] = title.xpath('text()')[0].extract()
                # Scrape content etc...
                yield page_item

    {code}bridge
  22. Scrapy Item - MFMA Page Item

        {
          "form_table_rows": [],
          "original_url": "http://mfma.treasury.gov.za/Return_Forms/...",
          "breadcrumbs": "<span><a href=\"http://mfma.treasury.gov.za...",
          "title": "Return Forms",
          "path": "/Return_Forms/index.html",
          "type": "page",
          "body": "<div>\nAll Return Forms contain new demarcation codes ..."
        }

    {code}bridge
  23. https://mfmamirror.github.io 1 Scrape periodically Rebuild site locally (when I remember) Push to github Link to files on .gov.za {code}bridge
  24. https://mfmamirror.github.io 2 Rebuild and push from scrapinghub? Ugh… gitpython needs native git {code}bridge
  25. https://mfmamirror.github.io 3 Resources on S3

        ITEM_PIPELINES = {
            'mfma.pipelines.DepagingPipeline': 100,
            'mfma.pipelines.FileArchivePipeline': 100,
            # 'mfma.pipelines.MirrorBuilderPipeline': 300,
        }

    {code}bridge
  26. Ward candidates {code}bridge

  27. SA CITIES OPEN DATA ALMANAC BETA {code}bridge

  28. Community Centres {code}bridge

  29. Get Involved Play with your city’s data Show what’s possible Join/start a civic tech/open data group Take a sabbatical ( ͡º ͜ʖ ͡º) code4sa.org/careers
  30. Tonight - community evening twice-monthly {code}bridge
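
Slide 17 shows a folder link from the MFMA site whose RootFolder and View query parameters are percent-encoded, with HTML-escaped ampersands between them. A minimal sketch of decoding such a link with the Python standard library follows. The helper name `folder_params` is an assumption made for illustration and is not taken from the talk's code.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Link as it appears in the page source on slide 17 (ampersands are HTML-escaped).
href = (
    "http://mfma.treasury.gov.za/Documents/Forms/AllItems.aspx?"
    "RootFolder=%2fDocuments%2f04%2e%20Service%20Delivery%20and%20Budget"
    "%20Implementation%20Plans"
    "&amp;FolderCTID=&amp;View=%7b84CA1A01%2dEF8A%2d4DE0%2d8DC4%2d47D223CB5867%7d"
)


def folder_params(raw_href):
    """Return the decoded query parameters of a folder link (hypothetical helper)."""
    url = unescape(raw_href)  # turn &amp; back into &
    query = urlsplit(url).query
    # keep_blank_values keeps the empty FolderCTID parameter in the result.
    parsed = parse_qs(query, keep_blank_values=True)
    # Each parameter occurs once here, so take the first value of each list.
    return {key: values[0] for key, values in parsed.items()}


params = folder_params(href)
print(params["RootFolder"])
print(params["View"])
```

Decoding the parameters this way gives the readable folder path and view identifier listed at the bottom of slide 17.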
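
Slides 22 and 23 describe scraped page items being turned into a static site that is pushed to GitHub. The sketch below shows one way such a rebuild step could write an item to disk at its "path", using only the standard library. The template, the `write_page` helper and the example item values are assumptions for illustration; the slides do not show the mirror's actual build code or markup.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical page template; the real mirror's markup is not shown in the slides.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<nav>{breadcrumbs}</nav>
{body}
<p><a href="{original_url}">Original page</a></p>
</body>
</html>
"""


def write_page(item, out_dir):
    """Write one scraped page item below out_dir, at the item's 'path'."""
    # Strip the leading slash so the path is joined under out_dir.
    target = Path(out_dir) / item["path"].lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    page = PAGE_TEMPLATE.format(
        title=html.escape(item["title"]),
        breadcrumbs=item["breadcrumbs"],  # already HTML in the scraped item
        body=item["body"],  # already HTML in the scraped item
        original_url=html.escape(item["original_url"], quote=True),
    )
    target.write_text(page, encoding="utf-8")
    return target


# Placeholder item in the shape shown on slide 22 (values are made up).
item = {
    "original_url": "http://example.org/Return_Forms/",
    "breadcrumbs": '<span><a href="/">Home</a></span>',
    "title": "Return Forms",
    "path": "/Return_Forms/index.html",
    "type": "page",
    "body": "<div>Example body</div>",
}

out_dir = tempfile.mkdtemp()
written = write_page(item, out_dir)
print(written.relative_to(out_dir))
```

Because every page lands at its original path, the output directory can be committed and served as a static site as is.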
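
The ITEM_PIPELINES setting on slide 25 maps pipeline classes to integers, and Scrapy passes items through the enabled pipelines from the lowest value to the highest. The sketch below only inspects such a setting in plain Python: it lists the pipelines by priority and flags values that are shared. Both helper functions are hypothetical and are not part of Scrapy or of the talk's code.

```python
# Setting as shown on slide 25; the commented-out entry is left out here.
ITEM_PIPELINES = {
    "mfma.pipelines.DepagingPipeline": 100,
    "mfma.pipelines.FileArchivePipeline": 100,
}


def pipeline_order(pipelines):
    """List pipeline paths by ascending priority, lowest number first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


def tied_priorities(pipelines):
    """Return the priorities that more than one pipeline shares."""
    seen, tied = set(), set()
    for priority in pipelines.values():
        if priority in seen:
            tied.add(priority)
        else:
            seen.add(priority)
    return sorted(tied)


print(pipeline_order(ITEM_PIPELINES))
print(tied_priorities(ITEM_PIPELINES))
```

When one pipeline depends on the output of another, giving them distinct values makes the intended order explicit rather than leaving it to how ties happen to be resolved.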