
CSVY – CSV reimagined


Martin Fenner

May 04, 2016



Transcript

  1. CSVY – CSV reimagined
    Martin Fenner
    DataCite Technical Director
    http://orcid.org/0000-0003-1419-2405


  2. What we like about CSV
    simple
    ubiquitous
    human-readable
    machine-readable


  3. What we don’t like about CSV
    ambiguous
    incomplete
    simple


  4. CSV is ambiguous
    RFC 4180 is the closest thing to an official standard,
    but in practice files differ in their use of
    charset
    header
    delimiter
    quoting
    comments
    skip lines
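
As a small illustration of the delimiter ambiguity listed above, the following sketch reads the same text under two different assumptions. The sample data and the helper name `read_rows` are made up for this example; only Python's standard `csv` module is used.

```python
import csv
import io

# The same text, read under two different assumptions about the delimiter.
raw = "day;high;low\nMonday;19;6\n"

def read_rows(text, delimiter):
    """Parse CSV text with an explicitly chosen delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

as_comma = read_rows(raw, ",")      # wrong guess: each row is one big field
as_semicolon = read_rows(raw, ";")  # right guess: three fields per row

print(as_comma)
print(as_semicolon)
```

Neither call raises an error, which is the problem: without metadata, a reader cannot tell which interpretation the author intended.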


  5. Given the widespread use of ambiguous
    CSV, any attempt at defining CSV might
    come too late
    It might be easier to give a clearly
    defined CSV a different name


  6. CSV is incomplete
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
    Thursday,18,7
    Friday,22,10
    Saturday,22,10
    Sunday,22,13


  7. Adding metadata to CSV
    comments
    skip lines
    YAML header


  8. Even minimal metadata would help
    ---
    title: High and low temperatures in Berlin the week starting May 2nd, 2016
    ---
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
    Thursday,18,7
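
A reader for this layout could be sketched as follows, assuming the header is delimited by lines containing only `---`. The function names `split_csvy` and `parse_flat_header` are hypothetical, and the header parser handles only flat `key: value` lines; a real implementation would hand the header text to a proper YAML library.

```python
import csv
import io

def split_csvy(text):
    """Split CSVY text into (header_text, csv_text).

    Assumes the header is delimited by lines containing only '---'.
    Returns an empty header if the text does not start with '---'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("unterminated YAML header")

def parse_flat_header(header_text):
    """Parse flat 'key: value' lines only; a stand-in for a YAML parser."""
    meta = {}
    for line in header_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

sample = """---
title: High and low temperatures in Berlin the week starting May 2nd, 2016
---
day,high,low
Monday,19,6
Tuesday,17,7
"""

header_text, csv_text = split_csvy(sample)
meta = parse_flat_header(header_text)
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(meta["title"])
print(rows[0])
```

Because the body after the second `---` is ordinary CSV, any existing CSV tooling can consume it once the header has been split off.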


  9. Also define columns
    ---
    title: High and low temperatures in Berlin the week starting May 2nd, 2016
    fields:
    - name: high
      title: High temperature in °C
      type: integer
    ---
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
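
One thing column definitions make possible is type conversion on read. The sketch below assumes the YAML header has already been parsed into a dictionary (written out by hand here as `meta`); the `CASTERS` table and the `typed_rows` helper are hypothetical, and the `number` type is an assumption that does not appear on the slide.

```python
import csv
import io

# Hypothetical result of parsing the YAML header with a YAML library.
meta = {
    "title": "High and low temperatures in Berlin the week starting May 2nd, 2016",
    "fields": [
        {"name": "high", "title": "High temperature in °C", "type": "integer"},
    ],
}

# Map declared type names to Python converters ("number" is an assumption).
CASTERS = {"integer": int, "number": float, "string": str}

def typed_rows(csv_text, fields):
    """Yield rows as dicts, converting columns that declare a type."""
    casts = {f["name"]: CASTERS[f["type"]] for f in fields if "type" in f}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Columns without a declared type stay as strings.
        yield {key: casts.get(key, str)(value) for key, value in row.items()}

csv_text = "day,high,low\nMonday,19,6\nTuesday,17,7\nWednesday,14,5\n"
rows = list(typed_rows(csv_text, meta["fields"]))
print(rows[0])
```

Only `high` is declared in the header, so `low` remains a string; this mirrors the slide, where a partial field list is still useful.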


  10. Another example
    2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
    2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad ,23.022505,72.5713621
    2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
    2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
    2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
    2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina ,37.983917,23.7293599
    2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
    2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran, 32.7014728,51.1559259
    2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
    2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245


  11. Another example
    ---
    id: http://doi.org/10.5061/dryad.q447c/1
    title: Sci-Hub download data
    date: 2016-04-28
    author:
    - Alexandra Elbakyan
    - John Bohannon
    publisher: Dryad Digital Repository
    fields:
    - name: city
      type: string
      required: false
    ---
    date,doi,country,city,lat,lon
    2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
    2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad ,23.022505,72.5713621
    2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
    2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
    2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
    2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina ,37.983917,23.7293599
    2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
    2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran,32.7014728,51.1559259
    2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
    2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245


  12. Convert to Data Package
    {
      "id": "http://doi.org/10.5061/dryad.q447c/1",
      "title": "Sci-Hub download data",
      "author": [
        { "name": "Alexandra Elbakyan" },
        { "name": "John Bohannon" }
      ],
      "date": "2016-04-28",
      "publisher": "Dryad Digital Repository",
      "resources": [
        {
          "schema": {
            "fields": [
              {
                "name": "city",
                "type": "string",
                "required": false
              }
            ]
          }
        }
      ]
    }
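
The conversion itself is mostly a reshaping of the header. The following sketch assumes the CSVY header has already been parsed into a dictionary and follows the Data Package convention that `resources` and `fields` are lists. The function name `csvy_to_datapackage`, the `path` argument, and the file name `scihub.csv` are hypothetical.

```python
import json

def csvy_to_datapackage(meta, path):
    """Build a Data Package-style descriptor from parsed CSVY header metadata."""
    # Copy top-level metadata, except the keys that need reshaping.
    descriptor = {k: v for k, v in meta.items() if k not in ("fields", "author")}
    if "author" in meta:
        # Plain author names become objects with a "name" key, as on the slide.
        descriptor["author"] = [{"name": name} for name in meta["author"]]
    # Column definitions move under the resource's schema.
    descriptor["resources"] = [
        {"path": path, "schema": {"fields": meta.get("fields", [])}}
    ]
    return descriptor

# Hypothetical parsed header; the date is kept as a string for JSON output.
meta = {
    "id": "http://doi.org/10.5061/dryad.q447c/1",
    "title": "Sci-Hub download data",
    "date": "2016-04-28",
    "author": ["Alexandra Elbakyan", "John Bohannon"],
    "publisher": "Dryad Digital Repository",
    "fields": [{"name": "city", "type": "string", "required": False}],
}

package = csvy_to_datapackage(meta, "scihub.csv")
print(json.dumps(package, indent=2))
```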


  13. Convert to csvw
    HTTP/1.1 200 OK
    Content-Type: text/tab-separated-values
    ...
    Link: ; rel="describedBy"; type="application/csvm+json"


  14. Does this pattern look familiar?
    CommonMark


  15. Tables in CommonMark could use CSVY
    csv,conf is a non-profit community conference run by some
    folks who really love data and sharing knowledge.
    ,,,
    id,name,title
    rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
    amoser,Aurelia Moser,"This is Not a Map: Building Interactive Maps with CSVs, Creative Themes, and Curious Geometries"
    tdoehman,Till Doehmen,There and back again - Automatic detection and conversion of logical table structures
    ,,,
    ### Big and small
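
A renderer could pick such tables out of a document with a simple line scan. This is a minimal sketch that assumes, as in the example above, that a table is delimited by lines containing only `,,,`. The function name `extract_tables` and the shortened sample document are made up for illustration.

```python
import csv
import io

def extract_tables(markdown_text, fence=",,,"):
    """Return each fenced CSV table as a list of rows (lists of strings)."""
    tables = []
    current = None  # None while outside a table, a list of lines inside one
    for line in markdown_text.splitlines():
        if line.strip() == fence:
            if current is None:
                current = []  # opening fence
            else:
                # Closing fence: parse the collected lines as CSV.
                tables.append(list(csv.reader(io.StringIO("\n".join(current)))))
                current = None
        elif current is not None:
            current.append(line)
    return tables

doc = """Some introductory paragraph.
,,,
id,name,title
rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
,,,
### Big and small
"""

tables = extract_tables(doc)
print(tables[0])
```

One caveat of this delimiter: a data row consisting of exactly three empty fields would be indistinguishable from the fence, so a real syntax would need an escape rule.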


  16. CSV is simple
    Sometimes CSV is not the best format
    CSVY will not work for all CSV files
