Slide 1

CSVY – CSV reimagined
Martin Fenner
DataCite Technical Director
http://orcid.org/0000-0003-1419-2405

Slide 2

What we like about CSV: simple, ubiquitous, human-readable, machine-readable

Slide 3

What we don’t like about CSV: ambiguous, incomplete, simple

Slide 4

CSV is ambiguous
RFC 4180 is the closest thing to an official standard, but in practice files differ in their use of: charset, header, delimiter, quoting, comments, skip lines
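
To make the delimiter ambiguity concrete, here is a minimal sketch using Python's standard `csv` module. The sample text and the `parse` helper are made up for illustration; the point is only that the same bytes yield different tables depending on which dialect the reader assumes.

```python
import csv
import io

# Illustrative sample (not from the slides): is the delimiter ';' or ','?
raw = "day;high,low\nMonday;19,6\n"

def parse(text, **dialect):
    """Parse CSV text with explicit dialect options and return all rows."""
    return list(csv.reader(io.StringIO(text), **dialect))

# Reading with a comma delimiter splits each line at the comma.
as_comma = parse(raw, delimiter=",")

# Reading with a semicolon delimiter splits each line at the semicolon,
# leaving "19,6" as one value (which could even be a decimal comma).
as_semicolon = parse(raw, delimiter=";")

print(as_comma)
print(as_semicolon)
```

Both readings are valid as far as the parser is concerned, which is why the dialect has to be stated somewhere outside the data rows.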

Slide 5

Given the widespread use of ambiguous CSV, any attempt at defining CSV might come too late.
It might be easier to give that clearly defined CSV a different name.

Slide 6

CSV is incomplete

day,high,low
Monday,19,6
Tuesday,17,7
Wednesday,14,5
Thursday,18,7
Friday,22,10
Saturday,22,10
Sunday,22,13

Slide 7

Adding metadata to CSV: comments, skip lines, YAML header

Slide 8

Even minimal metadata would help

---
title: High and low temperatures in Berlin the week starting May 2nd, 2016
---
day,high,low
Monday,19,6
Tuesday,17,7
Wednesday,14,5
Thursday,18,7
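
A reader for this layout could look roughly like the sketch below. The function names `split_csvy` and `parse_flat_header` are hypothetical, and the header parser only understands flat `key: value` lines; a real implementation would presumably hand the header to a full YAML parser instead.

```python
import csv
import io

def split_csvy(text):
    """Split CSVY text into (header_lines, csv_text).

    Assumes the header is delimited by lines containing only '---'.
    Text without a leading '---' is treated as plain CSV.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    raise ValueError("YAML header is not terminated by '---'")

def parse_flat_header(header_lines):
    """Parse flat 'key: value' lines only (an assumption, not full YAML)."""
    meta = {}
    for line in header_lines:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

sample = """---
title: High and low temperatures in Berlin the week starting May 2nd, 2016
---
day,high,low
Monday,19,6
Tuesday,17,7
"""

header, body = split_csvy(sample)
meta = parse_flat_header(header)
rows = list(csv.DictReader(io.StringIO(body)))
print(meta["title"])
print(rows[0])
```

Because the header is stripped before the CSV parser runs, the data part stays an ordinary CSV file.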

Slide 9

Also define columns

---
title: High and low temperatures in Berlin the week starting May 2nd, 2016
fields:
  - name: high
    title: High temperature in °C
    type: integer
---
day,high,low
Monday,19,6
Tuesday,17,7
Wednesday,14,5
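
One thing a column definition enables is type conversion on read. The sketch below assumes the `fields` list has already been parsed from the YAML header into Python dictionaries; the `CASTS` table and the `typed_rows` helper are illustrative names, and the set of type names beyond `integer` and `string` is an assumption.

```python
import csv
import io

# Field definitions as a YAML parser would return them for the header above.
fields = [{"name": "high", "title": "High temperature in °C", "type": "integer"}]

# Assumed mapping from declared type names to Python converters.
CASTS = {"integer": int, "number": float, "string": str}

def typed_rows(csv_text, fields):
    """Yield one dict per row, converting columns that declare a known type."""
    casts = {f["name"]: CASTS[f["type"]] for f in fields if f.get("type") in CASTS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {k: casts[k](v) if k in casts else v for k, v in row.items()}

data = "day,high,low\nMonday,19,6\nTuesday,17,7\nWednesday,14,5\n"
rows = list(typed_rows(data, fields))
print(rows[0])
```

Columns without a definition (here `day` and `low`) are left as strings, so a partial header is still useful.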

Slide 10

Another example

2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad ,23.022505,72.5713621
2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina ,37.983917,23.7293599
2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran, 32.7014728,51.1559259
2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245

Slide 11

Another example

---
id: http://doi.org/10.5061/dryad.q447c/1
title: Sci-Hub download data
date: 2016-04-28
author:
  - Alexandra Elbakyan
  - John Bohannon
publisher: Dryad Digital Repository
fields:
  - name: city
    type: string
    required: false
---
date,doi,country,city,lat,lon
2016-02-01,10.1016/0024-3205(78)90267-9,Brazil,Ivaí,-25.007219,-50.8575274
2016-02-01,10.1016/j.microc.2010.09.010,India,Ahmedabad ,23.022505,72.5713621
2016-02-01,10.1016/S0304-3932(96)01281-0,France,Paris,48.856614,2.3522219
2016-02-01,10.1039/C4PY01449A,United States,Dallas,32.7766642,-96.7969879
2016-02-01,10.1002/ejlt.200800064,Brazil,Aracaju,-10.9472468,-37.0730823
2016-02-01,10.1053/j.ackd.2011.09.003,Greece,Athina ,37.983917,23.7293599
2016-02-01,10.1016/j.jhazmat.2010.04.061,Algeria,N/A,24.0982893,3.743509
2016-02-01,10.1109/PowerEng.2015.7266364,Iran,Tiran,32.7014728,51.1559259
2016-02-01,10.1039/C3TA15373H,Turkey,N/A,38.4336547,27.19191
2016-02-01,10.1016/j.jclepro.2012.01.003,Tunisia,Tunis,36.7997069,10.1675245

Slide 12

Convert to Data Package

{
  "id": "http://doi.org/10.5061/dryad.q447c/1",
  "title": "Sci-Hub download data",
  "author": [
    { "name": "Alexandra Elbakyan" },
    { "name": "John Bohannon" }
  ],
  "date": "2016-04-28",
  "publisher": "Dryad Digital Repository",
  "resources": {
    "schema": {
      "fields": {
        "name": "city",
        "type": "string",
        "required": false
      }
    }
  }
}
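
The conversion from a CSVY header to this JSON shape might be sketched as follows. `to_data_package` is a hypothetical helper, and `meta` stands in for the header after YAML parsing. The sketch follows the nesting shown on the slide but keeps `fields` as a list, as in the YAML header; the Data Package specification itself defines `resources` as an array, so a real converter would likely need to adjust the nesting.

```python
import json

def to_data_package(meta):
    """Map a parsed CSVY header (a dict) to the JSON layout shown on the slide."""
    # Copy everything except the keys that need restructuring.
    package = {k: v for k, v in meta.items() if k not in ("author", "fields")}
    # Plain author names become objects with a "name" key.
    package["author"] = [{"name": name} for name in meta.get("author", [])]
    # Column definitions move under resources.schema.fields.
    package["resources"] = {"schema": {"fields": meta.get("fields", [])}}
    return package

# The header from the Sci-Hub example, as a YAML parser might return it
# (the date is kept as a string here so that it serializes to JSON directly).
meta = {
    "id": "http://doi.org/10.5061/dryad.q447c/1",
    "title": "Sci-Hub download data",
    "date": "2016-04-28",
    "author": ["Alexandra Elbakyan", "John Bohannon"],
    "publisher": "Dryad Digital Repository",
    "fields": [{"name": "city", "type": "string", "required": False}],
}

package = to_data_package(meta)
print(json.dumps(package, indent=2, ensure_ascii=False))
```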

Slide 13

Convert to csvw

HTTP/1.1 200 OK
Content-Type: text/tab-separated-values
...
Link: ; rel="describedBy"; type="application/csvm+json"

Slide 14

Does this pattern look familiar? CommonMark

Slide 15

Tables in CommonMark could use CSVY

csv,conf is a non-profit community conference run by some folks who really love data and sharing knowledge.

,,,
id,name,title
rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
amoser,Aurelia Moser,"This is Not a Map: Building Interactive Maps with CSVs, Creative Themes, and Curious Geometries"
tdoehman,Till Doehmen,There and back again - Automatic detection and conversion of logical table structures
,,,

### Big and small
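
A rough sketch of how a processor could pull such tables out of a document is shown below. The `,,,` fence is taken from the slide and is not part of the CommonMark specification; `csv_blocks` is a hypothetical helper, and the sample text is shortened from the slide.

```python
import csv
import io

def csv_blocks(markdown_text, fence=",,,"):
    """Return the CSV tables found between pairs of fence lines.

    Each table is a list of rows; each row is a list of strings.
    An unterminated fence is ignored.
    """
    blocks, current = [], None
    for line in markdown_text.splitlines():
        if line.strip() == fence:
            if current is None:
                current = []  # opening fence: start collecting lines
            else:
                text = "\n".join(current)
                blocks.append(list(csv.reader(io.StringIO(text))))
                current = None  # closing fence: table is complete
        elif current is not None:
            current.append(line)
    return blocks

doc = '''csv,conf is a community conference.

,,,
id,name,title
rsmithunna,Richard Smith-Unna,"Easy, massive-scale reuse of scientific outputs"
,,,

### Big and small
'''

tables = csv_blocks(doc)
print(tables[0])
```

One caveat of this fence choice: a data row consisting of four empty fields is itself written `,,,`, so it would be mistaken for a closing fence.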

Slide 16

CSV is simple
Sometimes CSV is not the best format.
CSVY will not work for all CSV files.