Slide 1

Slide 1 text

Open Budgets India: Lessons from the front line
Gaurav Godhwani | @gggodhwani

Slide 2

Slide 2 text

Government budgets explain real priorities and values of the state and its people

Slide 3

Slide 3 text

Budgets are hard to consume & difficult to understand
But..

Slide 4

Slide 4 text

Major issues with India’s Budgets
● Scattered and unstructured PDF documents
● Limited availability of Budgets online
● Inconsistent Formats
● No Metadata
● Inconsistent and incomplete Budget Codes aka Unique IDs

Slide 5

Slide 5 text

But these problems are common across all public information systems and civic-tech projects

Slide 6

Slide 6 text

Primary Education
Public Health
Judiciary
Agriculture
Drinking Water & Sanitation
Energy

Slide 7

Slide 7 text

MAJOR LESSONS

Slide 8

Slide 8 text

LESSON #1 Invest in Problem Munging

Slide 9

Slide 9 text

150+ Budget Source Websites

Slide 10

Slide 10 text

150+ Budget Formats

Slide 11

Slide 11 text

Collaborate with Communities

Slide 12

Slide 12 text

LESSON #2 Explore existing Data Platforms

Slide 13

Slide 13 text

Understand Existing Data Platforms

Slide 14

Slide 14 text

LESSON #3 Go the Agile Way

Slide 15

Slide 15 text

Process Development Cycle

Slide 16

Slide 16 text

LESSON #4 Build a Robust Pipeline

Slide 17

Slide 17 text

Data Pipeline
Scrape → Parse → Transform → Publish → Analyse
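One way to read the five stages is as a chain of functions, each consuming the previous stage's output. The sketch below is illustrative only: the stage bodies, the sample text and the example URL are hypothetical stand-ins, not the project's actual code.

```python
import json
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List

Stage = Callable[[Any], Any]


def run_pipeline(source: Any, stages: Iterable[Stage]) -> Any:
    """Feed `source` through each stage in order and return the final result."""
    return reduce(lambda data, stage: stage(data), stages, source)


# Hypothetical stand-ins for the real stages.
def scrape(url: str) -> str:
    # A real scraper would download a budget document; this returns canned text.
    return "Head,Amount\nEducation,100\nHealth,250"


def parse(raw: str) -> List[List[str]]:
    # Split the raw text into rows of cells.
    return [line.split(",") for line in raw.splitlines()]


def transform(rows: List[List[str]]) -> List[Dict[str, Any]]:
    # Turn rows into records and convert amounts to integers.
    header, *body = rows
    return [{header[0]: name, header[1]: int(amount)} for name, amount in body]


def publish(records: List[Dict[str, Any]]) -> str:
    # Serialise to a machine-readable format (JSON here).
    return json.dumps(records)


def analyse(published: str) -> int:
    # Example analysis: total amount across all heads.
    return sum(record["Amount"] for record in json.loads(published))


total = run_pipeline(
    "https://example.org/budget.pdf",  # placeholder URL, never fetched
    [scrape, parse, transform, publish, analyse],
)
print(total)  # 350
```

Keeping each stage a plain function makes it possible to test and replace stages independently, for example swapping in a different parser per budget format.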

Slide 18

Slide 18 text

Parse - Line-based Segmentation
table_bounds = { "top": …, "left": …, "bottom": …, "right": … }
column_coordinates = [c1, c2, c3, ..., cN]
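A minimal sketch of how `table_bounds` and `column_coordinates` could drive line-based segmentation. It assumes, hypothetically, that text has already been extracted as `(x, y, text)` word tuples with `y` growing downwards, and that the column coordinates are the x positions of the separators between columns. The function name and `row_tolerance` parameter are illustrative, not taken from the project.

```python
from bisect import bisect_right


def segment_table(words, table_bounds, column_coordinates, row_tolerance=2.0):
    """Group (x, y, text) words into rows and columns.

    Words outside `table_bounds` are dropped. A new row starts whenever a
    word's y position is more than `row_tolerance` below the current row.
    """
    inside = [
        (x, y, text)
        for x, y, text in words
        if table_bounds["left"] <= x <= table_bounds["right"]
        and table_bounds["top"] <= y <= table_bounds["bottom"]
    ]
    inside.sort(key=lambda word: (word[1], word[0]))  # top-to-bottom, then left-to-right

    separators = sorted(column_coordinates)
    rows, row_y = [], None
    for x, y, text in inside:
        if row_y is None or y - row_y > row_tolerance:
            # Start a new row with one empty cell per column.
            rows.append([[] for _ in range(len(separators) + 1)])
            row_y = y
        # The column index is the number of separators to the left of the word.
        rows[-1][bisect_right(separators, x)].append((x, text))

    # Order the words inside each cell by x before joining them.
    return [[" ".join(t for _, t in sorted(cell)) for cell in row] for row in rows]


words = [
    (10, 100, "Major"), (40, 100.5, "Head"), (120, 100, "2016-17"),
    (10, 115, "Education"), (120, 115, "100"),
    (10, 300, "Page"), (40, 300, "1"),  # footer, outside the table bounds
]
table = segment_table(
    words,
    table_bounds={"top": 90, "left": 0, "bottom": 200, "right": 200},
    column_coordinates=[100],
)
print(table)  # [['Major Head', '2016-17'], ['Education', '100']]
```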

Slide 19

Slide 19 text

Parse - Line-based Segmentation
{ Table Attributes }
https://github.com/tabulapdf/tabula
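Tabula's extraction engine is also available as a command-line jar (tabula-java), which accepts a page area as `top,left,bottom,right` and a list of column x coordinates. The sketch below only builds such a command from the table attributes; the helper name, the jar path and the PDF file name are hypothetical, and nothing is executed.

```python
def tabula_command(pdf_path, table_bounds, column_coordinates, page=1,
                   jar_path="tabula.jar"):
    """Build a tabula-java command line from table attributes.

    `jar_path` is a placeholder; point it at wherever the jar actually lives.
    """
    area = ",".join(
        str(table_bounds[key]) for key in ("top", "left", "bottom", "right")
    )
    columns = ",".join(str(c) for c in column_coordinates)
    return [
        "java", "-jar", jar_path,
        "--pages", str(page),
        "--area", area,        # region of the page that holds the table
        "--columns", columns,  # x coordinates of column boundaries
        "--format", "CSV",
        pdf_path,
    ]


command = tabula_command(
    "budget.pdf",  # hypothetical file name
    {"top": 90, "left": 0, "bottom": 200, "right": 560},
    [100, 300],
)
print(" ".join(command))
```

With Java and the jar installed, the resulting list could be passed to `subprocess.run(command, capture_output=True, text=True)` to obtain the CSV output.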

Slide 20

Slide 20 text

But..

Slide 21

Slide 21 text

Parse - Block-based Segmentation

Slide 22

Slide 22 text

Clean Machine Readable Data
https://openbudgetsindia.org/api/action/datastore_search?resource_id=38e553a0-4dd9-46f5-8d62-4938e1f7df3d
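The URL above follows the CKAN DataStore `datastore_search` pattern, so the data can be queried programmatically. A minimal sketch using only the standard library follows; it assumes the usual CKAN response envelope (`success`, `result.records`) and has not been checked against the live endpoint. The helper names and the sample payload are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://openbudgetsindia.org/api/action/datastore_search"
RESOURCE_ID = "38e553a0-4dd9-46f5-8d62-4938e1f7df3d"


def build_query_url(resource_id, limit=5, offset=0):
    """Return a datastore_search URL for one page of records."""
    params = {"resource_id": resource_id, "limit": limit, "offset": offset}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_records(payload):
    """Pull the record list out of a CKAN-style response dictionary."""
    if not payload.get("success"):
        raise ValueError("datastore_search reported failure")
    return payload["result"]["records"]


def fetch_records(resource_id, limit=5):
    """Download and decode one page of records (requires network access)."""
    with urlopen(build_query_url(resource_id, limit), timeout=30) as response:
        return extract_records(json.load(response))


# Offline demonstration with a made-up payload in the assumed envelope shape.
sample_payload = {"success": True, "result": {"records": [{"_id": 1}]}}
print(build_query_url(RESOURCE_ID))
print(extract_records(sample_payload))
```

Calling `fetch_records(RESOURCE_ID)` would perform the actual request; paging through a large resource is a matter of increasing `offset` by `limit` until an empty list comes back.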

Slide 23

Slide 23 text

LESSON #5 Keep everything Open-by-default

Slide 24

Slide 24 text

Keeping Code, Data, Research, Design - All Open https://github.com/cbgaindia

Slide 25

Slide 25 text

LESSON #6 Enable Data Consumption

Slide 26

Slide 26 text

Educate https://openbudgetsindia.org/budget-basics/union-budget.html#money-flow

Slide 27

Slide 27 text

Simplify http://unionbudget2017.cbgaindia.org/

Slide 28

Slide 28 text

Compare https://cbgaindia.github.io/story-generator/

Slide 29

Slide 29 text

Compare https://cbgaindia.github.io/story-generator/

Slide 30

Slide 30 text

Enable Replication https://datakind-blr.github.io/antara/

Slide 31

Slide 31 text

Customize

Slide 32

Slide 32 text

Customize

Slide 33

Slide 33 text

LESSON #7 Document Everything!

Slide 34

Slide 34 text

https://github.com/cbgaindia/parsers

Slide 35

Slide 35 text

LESSON #8 Track Various Data Adoptions

Slide 36

Slide 36 text

State of Aadhaar Report - Social Protection, May 2017 http://stateofaadhaar.in/wp-content/uploads/State-of-Aadhaar-Ch5-Social-Protection.pdf

Slide 37

Slide 37 text

The Huffington Post http://www.huffingtonpost.in/vineet-john-samuel/the-gorakhpur-and-farrukhabad-tragedies-are-symptoms-of-a-larger-malaise_a_23194697/

Slide 38

Slide 38 text

Collaborate
Help us to:
● Use our tools to analyse your data & write your data stories
● Generate more Open Government Data in your Geography
● Improve these algorithms and evolve the codebase
We are open to new ideas, suggestions and feedback

Slide 39

Slide 39 text

Attributions
● https://pixabay.com/p-330580/ - AkshayaPatra Foundation at Pixabay [CC0]
● https://flic.kr/p/9bybv7 - United Nations Photo - Maternal Health in Developing Countries at Flickr [CC BY 2.0]
● https://commons.wikimedia.org/wiki/File%3ASupreme_Court_of_India_-_200705_(edited).jpg - Legaleagle86 at en.wikipedia [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
● https://commons.wikimedia.org/wiki/File%3AAgriculture_main.jpg - By Meera'rah (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons
● https://pixabay.com/en/faucet-fountain-water-dispenser-1684902/ [CC0]
● https://www.meetup.com/DataKind-Bangalore/photos/26368975/460442903/ - DataKind Bangalore, Project Accelerator
● https://heaven00.github.io/pycon_delhi_2017/#/ - Jayant Pahuja, PyCon Delhi - 2017
● https://www.zopyx.com/andreas-jung/contents/integrating-sphinx-documentation-into-a-pyramid-application/image - Sphinx Logo
● https://www.shareicon.net/react-js-logo-react-js-117367 - ReactJS Logo
● http://newprolab.com/en/dataengineer/img/logos/supersetcolor.png - Apache Superset Logo

Slide 40

Slide 40 text

Thanks
Code: https://github.com/cbgaindia
Email: [email protected]
@gggodhwani @OpenBudgetsIn @CBGAIndia @DataKindBLR