Ric Roberts: Data Platforms and the Data Value Chain

Swirrl
June 15, 2017

This deck supports Ric's talk on the wider context of data publishing: where the act of publishing sits within it, what it entails and the advantages it brings. For all talks from the day, head to: http://power-of-data-2017.swirrl.com/

Transcript

  1. Data Platforms
    and the
    Data Value Chain
    CTO, @RicRoberts

  2. data
    You are here.

  3. profit!
    You want to be here.

  4. profit!
    Examples
    • Deciding the best place to put a new school
    • Benchmarking hospitals
    • Working out the impact of poor air quality on health
    • Calculating the cost of increased flood risk

  5. data profit!
    ?

  6. data
    collect
    clean
    curate

  7. Examples
    • Surveys of various sorts
    • Administrative systems
    • Sensors e.g. detecting river flow
    • Social media
    data
    collect
    clean
    curate

  8. data
    ?
    profit!

  9. data
    use
    profit!

  10. use
    • exploring
    • filtering
    • aggregating
    • downloading
    • exporting
    • analysing (data science!)
    • generating reports (xls, pdf, doc, ppt)
    • using it in interactive apps or visualisations
    • sharing results

  11. data
    ?
    use
    profit!

  12. data
    connect
    use
    profit!

  13. connect
    • A common set of names for the things in the data.
    • A shared, documented and understood model of the data.
    • An agreed set of technologies for communicating and
    manipulating the data (standards!).
    • The data needs to be in a place where people can get to it, in a
    relevant format (with a licence).

  14. https://www.flickr.com/photos/kewl/7006904747

  15. connect
    In computing, linked data is a method of publishing
    structured data so that it can be interlinked and become
    more useful through semantic queries. It builds upon
    standard Web technologies such as HTTP, RDF and URIs, but
    rather than using them to serve web pages for human
    readers, it extends them to share information in a way that
    can be read automatically by computers. This enables data
    from different sources to be connected and queried.
    — Wikipedia
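
As a rough illustration of "data from different sources to be connected and queried", here is a minimal Python sketch. It models triples as plain tuples and merges two sources that happen to use the same URI for the same area. Every URI, property name and figure below is hypothetical and chosen only for the example; none of them come from the talk, and a real system would use an RDF library and SPARQL rather than hand-written tuple matching.

```python
# Hypothetical namespace; not a real vocabulary.
EX = "http://example.org/"

# Source A: a statistics publisher describing an area.
source_a = {
    (EX + "area/E1", EX + "def/population", 52000),
    (EX + "area/E1", EX + "def/name", "Exampleton"),
}

# Source B: a health publisher that reuses the same URI for that area.
source_b = {
    (EX + "hospital/H7", EX + "def/locatedIn", EX + "area/E1"),
    (EX + "hospital/H7", EX + "def/name", "Exampleton General"),
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def hospital_area_population(graph, hospital):
    """Follow hospital -> area -> population across the graph."""
    return [
        population
        for area in objects(graph, hospital, EX + "def/locatedIn")
        for population in objects(graph, area, EX + "def/population")
    ]


# Because both sources share identifiers, merging is just a set union.
merged = source_a | source_b
print(hospital_area_population(merged, EX + "hospital/H7"))
```

Neither source can answer the question alone: the hospital data knows the area but not its population, and the statistics data knows nothing about hospitals. The shared URI is what lets the two connect.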

  16. connect
    • A common set of names for the things in the data.
    • A shared, documented and understood model of the data.
    • An agreed set of technologies for communicating and
    manipulating the data (standards!).
    • The data needs to be in a place where people can get to it, in a
    relevant format (with a licence).

  17. connect
    • A common set of names for the things in the data.
    • A shared, documented and understood model of the data.
    • An agreed set of technologies for communicating and
    manipulating the data (standards!).
    • The data needs to be in a place where people can get to it, in a
    relevant format (with a licence).

  18. connect
    • A common set of names for the things in the data.
    • A shared, documented and understood model of the data.
    • An agreed set of technologies for communicating and
    manipulating the data (standards!).
    • The data needs to be in a place where people can get to it, in a
    relevant format (with a licence).

  19. connect
    • A common set of names for the things in the data.
    • A shared, documented and understood model of the data.
    • An agreed set of technologies for communicating and
    manipulating the data (standards!).
    • The data needs to be in a place where people can get to it, in a
    relevant format (with a licence).

  20. https://www.flickr.com/photos/iwannt/8596885627

  21. data
    ?
    connect
    use
    profit!

  22. data
    publish
    connect
    use
    profit!

  23. publish An (RDF) Graph Store
    Apache Jena
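
To give a feel for what a graph store does, the sketch below is a toy in-memory triple store in Python with wildcard pattern matching. It is not Apache Jena and does not mirror Jena's API; the class name `TinyGraphStore`, the `ex:` identifiers and the sample triples are all hypothetical. A real store adds indexing, persistence and a SPARQL endpoint on top of this basic idea.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyGraphStore:
    """Hypothetical toy store: a set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Store one statement; adding the same triple twice has no effect."""
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TinyGraphStore()
store.add("ex:sensor1", "ex:measures", "ex:riverFlow")
store.add("ex:sensor1", "ex:locatedOn", "ex:riverA")
store.add("ex:sensor2", "ex:measures", "ex:airQuality")

# Which things measure river flow?
print(store.match(predicate="ex:measures", obj="ex:riverFlow"))
```

Leaving any position as `None` turns it into a variable, which is the same basic shape as a single triple pattern in a SPARQL query.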

  24. publish Extract, Transform, Load (ETL)
    grafter.org github.com/swirrl/grafter
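
The extract, transform, load steps can be sketched in a few lines of Python. This is only an illustration of the general pattern, turning tabular input into triples; it does not use Grafter (which is a Clojure library) or reflect its API. The CSV content, column names and `http://example.org/` URIs are invented for the example.

```python
import csv
import io

# Hypothetical raw input: river flow readings, one of them unusable.
RAW = """station,river,flow_m3s
S1,Irwell, 12.5
S2,Mersey,n/a
S3,Irwell,8.0
"""

BASE = "http://example.org/"  # hypothetical namespace


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean each row and express it as triples."""
    triples = []
    for row in rows:
        try:
            flow = float(row["flow_m3s"])
        except ValueError:
            continue  # skip rows without a usable measurement
        station = BASE + "station/" + row["station"].strip()
        river = BASE + "river/" + row["river"].strip().lower()
        triples.append((station, BASE + "def/river", river))
        triples.append((station, BASE + "def/flow", flow))
    return triples


def load(triples, store):
    """Load: add the triples to a target store (here, a plain set)."""
    store.update(triples)
    return store


store = load(transform(extract(RAW)), set())
for triple in sorted(store, key=str):
    print(triple)
```

Keeping the three stages as separate functions means the cleaning rules in `transform` can be tested on their own, without touching the input files or the target store.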

  25. publish Drafting and publication workflow

  26. publish A User Interface

  27. publish APIs

  28. data
    publish
    connect
    use
    profit!
    collect
    clean
    curate
    What’s limiting the effectiveness of
    this value chain?
    • Cottage industry of skilled individuals
    • Data preparation does not always consider the bigger picture
    • Those bearing the costs != those reaping the benefits
    • Availability of skilled data analysts
    • Lack of guidance and standardisation

  29. data
    publish
    connect
    use
    profit!
    collect
    clean
    curate

  30. Data Platforms
    and the
    Data Value Chain
    CTO, @RicRoberts
