CSVY – CSV reimagined

Martin Fenner

May 04, 2016

Transcript

  1. CSV is ambiguous

    RFC 4180 is the closest thing to an official standard, but in practice usage differs in charset, header, delimiter, quoting, comments, and skipped lines.
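To make the ambiguity concrete, here is a minimal sketch using Python's standard `csv` module. The sample text and the `parse` helper are made up for illustration; the point is only that the same bytes yield different tables depending on which delimiter the reader assumes.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited text, as some tools export "CSV".
SAMPLE = "day;high;low\nMonday;19;6\nTuesday;17;7\n"

def parse(text, delimiter):
    """Parse CSV text with an explicit delimiter and return a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Read as comma-delimited: each line collapses into a single field.
as_comma = parse(SAMPLE, ",")

# Read as semicolon-delimited: three fields per row.
as_semicolon = parse(SAMPLE, ";")
```

Nothing in the file itself tells the reader which interpretation is intended, which is the gap that embedded metadata tries to close.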
  2. Given the widespread use of ambiguous CSV, any attempt at defining CSV might come too late.

    It might be easier to give that clearly defined CSV a different name.
  3. Even minimal metadata would help

    ---
    title: High and low temperatures in Berlin the week starting May 2nd, 2016
    ---
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
    Thursday,18,7
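A reader for this layout only has to separate the block between the two `---` lines from the CSV body. The sketch below uses the standard library only; `split_csvy` and `parse_flat_metadata` are hypothetical helper names, and the metadata parser handles flat `key: value` lines only, so nested YAML would need a real YAML parser.

```python
import csv
import io

CSVY_TEXT = """---
title: High and low temperatures in Berlin the week starting May 2nd, 2016
---
day,high,low
Monday,19,6
Tuesday,17,7
Wednesday,14,5
Thursday,18,7
"""

def split_csvy(text):
    """Split CSVY text into (frontmatter, csv_body).

    Returns an empty frontmatter string when the text does not start
    with a '---' line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("frontmatter opened with '---' but never closed")

def parse_flat_metadata(frontmatter):
    """Parse flat 'key: value' lines only (an assumption of this sketch)."""
    metadata = {}
    for line in frontmatter.splitlines():
        if not line.strip():
            continue
        # Split on the first colon so values may contain colons or commas.
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata

frontmatter, body = split_csvy(CSVY_TEXT)
metadata = parse_flat_metadata(frontmatter)
rows = list(csv.DictReader(io.StringIO(body)))
```

Because the body after the second `---` is ordinary CSV, a tool that strips the frontmatter first can hand the rest to any existing CSV reader.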
  4. Also define columns

    ---
    title: High and low temperatures in Berlin the week starting May 2nd, 2016
    fields:
      - name: high
        title: High temperature in °C
        type: integer
    ---
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
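Column definitions let a reader convert values instead of treating everything as text. The following sketch assumes the frontmatter has already been parsed into a Python list of field dictionaries; the `CASTERS` mapping and `typed_rows` function are illustrative names, and only the two type names that appear in this deck (`integer`, `string`) are handled.

```python
import csv
import io

# Field definitions as they might look after parsing the YAML frontmatter.
FIELDS = [
    {"name": "high", "title": "High temperature in °C", "type": "integer"},
]

CSV_BODY = "day,high,low\nMonday,19,6\nTuesday,17,7\nWednesday,14,5\n"

# Hypothetical mapping from field type names to Python converters.
CASTERS = {"integer": int, "string": str}

def typed_rows(csv_body, fields):
    """Return CSV rows as dicts, converting columns that have a known type."""
    casters = {
        f["name"]: CASTERS[f["type"]]
        for f in fields
        if f.get("type") in CASTERS
    }
    result = []
    for row in csv.DictReader(io.StringIO(csv_body)):
        # Columns without a field definition stay as plain strings.
        result.append(
            {key: casters[key](value) if key in casters else value
             for key, value in row.items()}
        )
    return result

rows = typed_rows(CSV_BODY, FIELDS)
```

Only `high` is declared in the example, so `low` remains a string; that mirrors the slide, where a single column is described.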
  5. Another example

    2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
    2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad,23.022505,72.5713621
    2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
    2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
    2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
    2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina,37.983917,23.7293599
    2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
    2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran,32.7014728,51.1559259
    2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
    2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245
  6. Another example

    ---
    id: http://doi.org/10.5061/dryad.q447c/1
    title: Sci-Hub download data
    date: 2016-04-28
    author:
      - Alexandra Elbakyan
      - John Bohannon
    publisher: Dryad Digital Repository
    fields:
      - name: city
        type: string
        required: false
    ---
    date,doi,country,city,lat,lon
    2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
    2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad,23.022505,72.5713621
    2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
    2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
    2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
    2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina,37.983917,23.7293599
    2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
    2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran,32.7014728,51.1559259
    2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
    2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245
  7. Convert to Data Package

    {
      "id": "http://doi.org/10.5061/dryad.q447c/1",
      "title": "Sci-Hub download data",
      "author": [
        { "name": "Alexandra Elbakyan" },
        { "name": "John Bohannon" }
      ],
      "date": "2016-04-28",
      "publisher": "Dryad Digital Repository",
      "resources": {
        "schema": {
          "fields": {
            "name": "city",
            "type": "string",
            "required": false
          }
        }
      }
    }
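The conversion is mostly a reshaping of the frontmatter. The sketch below follows the shape shown on the slide and has not been validated against the Data Package specification; `to_data_package` is a hypothetical function name, and unlike the slide it keeps `fields` as a list so that several columns can be described.

```python
import json

# Metadata as it might look after parsing the CSVY frontmatter.
metadata = {
    "id": "http://doi.org/10.5061/dryad.q447c/1",
    "title": "Sci-Hub download data",
    "date": "2016-04-28",
    "author": ["Alexandra Elbakyan", "John Bohannon"],
    "publisher": "Dryad Digital Repository",
    "fields": [{"name": "city", "type": "string", "required": False}],
}

def to_data_package(meta):
    """Reshape CSVY metadata into the structure shown on the slide."""
    # Copy simple keys unchanged.
    package = {k: v for k, v in meta.items() if k not in ("author", "fields")}
    # Authors become objects with a "name" key.
    if "author" in meta:
        package["author"] = [{"name": name} for name in meta["author"]]
    # Column definitions move under resources.schema.fields.
    if "fields" in meta:
        package["resources"] = {"schema": {"fields": meta["fields"]}}
    return package

package = to_data_package(metadata)
package_json = json.dumps(package, indent=2, ensure_ascii=False)
```

Writing `package_json` next to the bare CSV file would give the two-file layout, while CSVY keeps the same information in a single file.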
  8. Convert to csvw

    HTTP/1.1 200 OK
    Content-Type: text/tab-separated-values
    ...
    Link: <metadata.json>; rel="describedBy"; type="application/csvm+json"
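In this approach the metadata lives in a separate JSON file and the server points to it with a `Link` header. A minimal sketch of building those response headers, assuming the metadata has already been written to a file; `csvw_headers` is a hypothetical helper, not part of any library.

```python
def csvw_headers(metadata_url, content_type="text/tab-separated-values"):
    """Build response headers that point a client at a metadata document."""
    link = '<{}>; rel="describedBy"; type="application/csvm+json"'.format(metadata_url)
    return {"Content-Type": content_type, "Link": link}

headers = csvw_headers("metadata.json")
```

The header values match the ones on the slide; how the headers are attached to a response depends on the web framework in use.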
  9. Tables in CommonMark could use CSVY

    csv,conf is a non-profit community conference run by some folks who really love data and sharing knowledge.

    ,,,
    id,name,title
    rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
    amoser,Aurelia Moser,"This is Not a Map: Building Interactive Maps with CSVs, Creative Themes, and Curious Geometries"
    tdoehman,Till Doehmen,There and back again - Automatic detection and conversion of logical table structures
    ,,,

    ### Big and small
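The `,,,` fence proposed here is not part of CommonMark, so a renderer would need a small pre-processing step. The sketch below pulls the fenced blocks out of a document and parses each as CSV; `extract_csv_tables` is a hypothetical name, and the sample document is shortened from the slide.

```python
import csv
import io

FENCE = ",,,"  # fence marker proposed on the slide, not a CommonMark feature

DOCUMENT = """csv,conf is a non-profit community conference.

,,,
id,name,title
rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
,,,

### Big and small
"""

def extract_csv_tables(markdown_text):
    """Return one list of rows for each block enclosed in ',,,' lines."""
    tables = []
    current = None  # lines of the open block, or None when outside a block
    for line in markdown_text.splitlines():
        if line.strip() == FENCE:
            if current is None:
                current = []  # opening fence
            else:
                # Closing fence: parse the collected lines as CSV.
                tables.append(list(csv.reader(io.StringIO("\n".join(current)))))
                current = None
        elif current is not None:
            current.append(line)
    return tables

tables = extract_csv_tables(DOCUMENT)
```

One limitation of this sketch: a data row consisting of exactly three commas would be mistaken for the closing fence.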
  10. CSV is simple

    Sometimes CSV is not the best format.
    CSVY will not work for all CSV files.