
CSVY – CSV reimagined

Martin Fenner

May 04, 2016

Transcript

  1. CSVY – CSV reimagined. Martin Fenner, DataCite Technical Director, http://orcid.org/0000-0003-1419-2405

  2. What we like about CSV: simple, ubiquitous, human-readable, machine-readable

  3. What we don’t like about CSV: ambiguous, incomplete, simple

  4. CSV is ambiguous: RFC 4180 is the closest thing to an official standard, but in practice usage differs in charset, header, delimiter, quoting, comments, and skipped lines

  5. Given the widespread use of ambiguous CSV, any attempt at defining CSV might come too late. It might be easier to give that clearly defined CSV a different name.

  6. CSV is incomplete

    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
    Thursday,18,7
    Friday,22,10
    Saturday,22,10
    Sunday,22,13

  7. Adding metadata to CSV: comments, skip lines, YAML header

  8. Even minimal metadata would help

    ---
    title: High and low temperatures in Berlin the week starting May 2nd, 2016
    ---
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5
    Thursday,18,7

  9. Also define columns

    ---
    title: High and low temperatures in Berlin the week starting May 2nd, 2016
    fields:
    - name: high
      title: High temperature in °C
      type: integer
    ---
    day,high,low
    Monday,19,6
    Tuesday,17,7
    Wednesday,14,5

  10. Another example

    2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
    2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad,23.022505,72.5713621
    2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
    2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
    2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
    2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina,37.983917,23.7293599
    2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
    2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran,32.7014728,51.1559259
    2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
    2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245

  11. Another example

    ---
    id: http://doi.org/10.5061/dryad.q447c/1
    title: Sci-Hub download data
    date: 2016-04-28
    author:
    - Alexandra Elbakyan
    - John Bohannon
    publisher: Dryad Digital Repository
    fields:
    - name: city
      type: string
      required: false
    ---
    date,doi,country,city,lat,lon
    2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
    2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad,23.022505,72.5713621
    2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
    2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
    2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
    2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina,37.983917,23.7293599
    2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
    2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran,32.7014728,51.1559259
    2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
    2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245

  12. Convert to Data Package

    {
      "id": "http://doi.org/10.5061/dryad.q447c/1",
      "title": "Sci-Hub download data",
      "author": [
        { "name": "Alexandra Elbakyan" },
        { "name": "John Bohannon" }
      ],
      "date": "2016-04-28",
      "publisher": "Dryad Digital Repository",
      "resources": {
        "schema": {
          "fields": {
            "name": "city",
            "type": "string",
            "required": false
          }
        }
      }
    }

  13. Convert to csvw

    HTTP/1.1 200 OK
    Content-Type: text/tab-separated-values
    ...
    Link: <metadata.json>; rel="describedBy"; type="application/csvm+json"

  14. Does this pattern look familiar? CommonMark

  15. Tables in CommonMark could use CSVY

    csv,conf is a non-profit community conference run by some folks who really love data and sharing knowledge.

    ,,,
    id,name,title
    rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
    amoser,Aurelia Moser,"This is Not a Map: Building Interactive Maps with CSVs, Creative Themes, and Curious Geometries"
    tdoehman,Till Doehmen,There and back again - Automatic detection and conversion of logical table structures
    ,,,

    ### Big and small

  16. CSV is simple: sometimes CSV is not the best format, and CSVY will not work for all CSV files
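
The YAML-header pattern shown in the slides can be read with very little code. The following is a minimal Python sketch, not a reference implementation: it assumes the header is delimited by lines containing only `---`, as in the slide examples, and the function names `split_csvy` and `read_csvy` are hypothetical. The header is returned as raw text; a YAML library (for example PyYAML's `yaml.safe_load`) could turn it into a dictionary, which is left out here to keep the sketch dependency-free.

```python
import csv
import io


def split_csvy(text):
    """Split CSVY text into (yaml_header_text, csv_body_text).

    Assumption: an optional header is enclosed by lines that contain only '---'.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text  # no header: treat the whole input as plain CSV
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    raise ValueError("YAML header opened with '---' but never closed")


def read_csvy(text):
    """Return the raw YAML header and the CSV rows as dictionaries."""
    header, body = split_csvy(text)
    rows = list(csv.DictReader(io.StringIO(body)))
    return header, rows


# Sample taken from the temperature example in the slides (shortened).
SAMPLE = """\
---
title: High and low temperatures in Berlin the week starting May 2nd, 2016
fields:
- name: high
  title: High temperature in °C
  type: integer
---
day,high,low
Monday,19,6
Tuesday,17,7
"""

header, rows = read_csvy(SAMPLE)
print(header)
print(rows[0])
```

Because a file without a leading `---` line is passed through unchanged, the same reader also accepts plain CSV, which fits the idea that CSVY only adds an optional metadata block in front of the data.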