Slide 1 text
Collecting quantitative metadata by counting all specimens in a herbarium Peter Desmet
Slide 2 text
Quantitative metadata are cool! A very colourful presentation by @peterdesmet #tdwg
Slide 3 text
Index Herbariorum 350,000,000 herbarium specimens worldwide
Slide 4 text
25,000,000 digitized and published (= 7%) GBIF Data Portal (Andrea Hahn)
Slide 5 text
What do we know about the other 93% ?
Slide 6 text
Descriptive metadata
Slide 7 text
Metadata registries‐herbaria
Slide 8 text
Collection name + code Address Staff Subcollections
Slide 9 text
Estimated size Based on what? Actually counted?
Slide 10 text
Geographic scope Pretty well described How distributed?
Slide 11 text
Taxonomic scope Vascular plants + Bryophytes? Families? Genera?
Slide 12 text
Can we get some real numbers?
Slide 13 text
Vascular plant specimens are organized in folders
Slide 14 text
Slide 15 text
Slide 16 text
Slide 17 text
Slide 18 text
What if we counted the folders?
Slide 19 text
And the # of specimens per folder?
Slide 20 text
? $ How much would it cost?
Slide 21 text
? days How long would it take?
Slide 22 text
What we did at the Marie-Victorin Herbarium (MT)
Slide 23 text
Move an estimated 900,000 specimens
Slide 24 text
More space Reassign 350 -> 640 cases
Slide 25 text
New classification Flowering plants: APG III (2009) Ferns: Smith et al. (2006)
Slide 26 text
Counting Digitizing Data cleaning Publishing
Slide 27 text
Slide 28 text
Slide 29 text
Average age > 60
Slide 30 text
Slide 31 text
1 summer
Slide 32 text
826 work hours 110 work days, 22 work weeks
Slide 33 text
Slide 34 text
4 volunteers
Slide 35 text
Paper -> Excel
Slide 36 text
Data cleaning
Slide 37 text
2 volunteers 1 professor 1 informatician
Slide 38 text
Correcting errors Typos, missed genera, dubious counts
Slide 39 text
New classification Assigning families, correcting genera
Slide 40 text
Format data
Slide 41 text
Slide 42 text
1 informatician (me)
Slide 43 text
Google Fusion Tables‐inventory-‐gk
Slide 44 text
Darwin Core Archive via IPT‐inventory
Slide 45 text
Metadata = EML Descriptive metadata
Slide 46 text
Occurrence dataset basisOfRecord = PreservedSpecimen
Slide 47 text
1 record 1 folder 1 genus 1 location in 1 tray
Slide 48 text
# specimens individualCount
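The mapping on slides 46 to 48 (one record per folder, with the specimen count stored in individualCount) could be sketched as follows. This is a minimal illustration, not the published dataset's actual schema: the `Folder` class, the `tray` column and the example count of 42 are hypothetical, and only `basisOfRecord`, `family`, `genus`, `continent` and `individualCount` are Darwin Core terms.

```python
from dataclasses import dataclass


@dataclass
class Folder:
    """One herbarium folder as tallied on paper (hypothetical structure)."""
    genus: str
    family: str
    continent: str
    tray: str            # storage location, e.g. "A236-07"
    specimen_count: int  # number of specimens counted in the folder


def folder_to_record(folder: Folder) -> dict:
    """Turn one counted folder into one occurrence record."""
    return {
        "basisOfRecord": "PreservedSpecimen",
        "family": folder.family,
        "genus": folder.genus,
        "continent": folder.continent,
        "individualCount": folder.specimen_count,
        # Hypothetical column for the tray location; not a Darwin Core term.
        "tray": folder.tray,
    }


# Illustrative folder; the count is made up, not taken from the inventory.
example = Folder("Rubus", "Rosaceae", "North America", "A236-07", 42)
record = folder_to_record(example)
print(record)
```

Summing `individualCount` over all records then gives the total number of specimens without any specimen having its own record.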
Slide 49 text
What do we know now?
Slide 50 text
22,298 folders
Slide 51 text
628,664 specimens
Slide 52 text
2/3 of previous estimate
Slide 53 text
21.5% digitized
Slide 54 text
380 families
Slide 55 text
82% of known families
Slide 56 text
5,298 genera
Slide 57 text
6 continents
Slide 58 text
Combinations Rubus specimens from Canada? Yes: 2921, in trays A236-07 – A238-04
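A combination query like the Rubus example can be answered by filtering the folder records and summing their counts. The sketch below assumes each record carries `genus`, `country`, `tray` and `individualCount` keys and that tray codes sort alphabetically in shelf order; the rows are made-up sample data, not the MT inventory.

```python
def count_specimens(records, genus, country):
    """Return (total specimens, first tray, last tray) for a genus/country pair."""
    matches = [
        r for r in records
        if r["genus"] == genus and r["country"] == country
    ]
    if not matches:
        return 0, None, None
    total = sum(r["individualCount"] for r in matches)
    trays = [r["tray"] for r in matches]
    # Assumes fixed-width tray codes, so string order equals shelf order.
    return total, min(trays), max(trays)


# Made-up sample rows for illustration only.
sample = [
    {"genus": "Rubus", "country": "Canada", "tray": "A236-07", "individualCount": 30},
    {"genus": "Rubus", "country": "Canada", "tray": "A237-01", "individualCount": 25},
    {"genus": "Rubus", "country": "United States", "tray": "A238-05", "individualCount": 10},
    {"genus": "Rosa", "country": "Canada", "tray": "A230-02", "individualCount": 18},
]

result = count_specimens(sample, "Rubus", "Canada")
print(result)
```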
Slide 59 text
Useful for us In-house management & planning Digitization priorities
Slide 60 text
Useful for others? Loans Demand driven digitization?
Slide 61 text
Granularity Genus, continent -> Useful for climate change & invasive species studies?
Slide 62 text
Global picture Really 350 mil. specimens? How distributed over genus & continent?
Slide 63 text
Cost / Time ?
Slide 64 text
158 work days Publishing 1% Data cleaning 21% Digitizing 8% Counting 70%
Slide 65 text
5,740 $ total salary cost Publishing 7% Digitizing 0% Counting 37% Data cleaning 56%
Slide 66 text
110 specimens = 1$ 100 times cheaper than full digitization
Slide 67 text
3,200,000 $ All 350 mil. specimens
Slide 68 text
138 h 1049 h Staff 5,740 $ Volunteers 0 $ 88% by volunteers
Slide 69 text
16,230 $ 10$ wage for “volunteers” + staff salary
Slide 70 text
9,000,000 $ All 350 mil. specimens
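The cost figures on slides 65 to 70 can be reproduced with a simple linear extrapolation. The sketch below uses only numbers from the slides and assumes the cost per specimen at MT would hold for every herbarium, which is a strong assumption; the variable and function names are illustrative.

```python
SPECIMENS_COUNTED = 628_664      # specimens counted at MT
WORLD_SPECIMENS = 350_000_000    # Index Herbariorum estimate
STAFF_SALARY = 5_740             # $ actually paid (staff only)
VOLUNTEER_HOURS = 1_049          # unpaid hours
HYPOTHETICAL_WAGE = 10           # $ per hour if volunteers had been paid


def extrapolate(cost, counted=SPECIMENS_COUNTED, world=WORLD_SPECIMENS):
    """Scale a cost linearly from the counted collection to all specimens."""
    return cost * world / counted


specimens_per_dollar = SPECIMENS_COUNTED / STAFF_SALARY
staff_only_world = extrapolate(STAFF_SALARY)

paid_total = STAFF_SALARY + VOLUNTEER_HOURS * HYPOTHETICAL_WAGE
paid_world = extrapolate(paid_total)

print(f"{specimens_per_dollar:.0f} specimens per $")
print(f"staff only, worldwide: {staff_only_world:,.0f} $")
print(f"with paid volunteers: {paid_total:,} $, worldwide: {paid_world:,.0f} $")
```

The worldwide results come out slightly off the round numbers on the slides because the slides round to the nearest 100,000 $ or million.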
Slide 71 text
340 years 1 person at 7.5h/day, 5 days/week, no holidays
Slide 72 text
26 days One person per herbarium 3,400 herbaria - Index Herbariorum
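The time estimates on slides 64, 68, 71 and 72 follow from the same kind of scaling. This sketch assumes 7.5-hour days, 5-day weeks and 52 working weeks a year (the "no holidays" case), and that every herbarium works at the rate observed at MT; the names are illustrative.

```python
STAFF_HOURS = 138
VOLUNTEER_HOURS = 1_049
SPECIMENS_COUNTED = 628_664
WORLD_SPECIMENS = 350_000_000
HERBARIA = 3_400                 # herbaria listed in Index Herbariorum
HOURS_PER_DAY = 7.5
WORK_DAYS_PER_YEAR = 5 * 52      # 5 days/week, no holidays

total_hours = STAFF_HOURS + VOLUNTEER_HOURS
work_days = total_hours / HOURS_PER_DAY
volunteer_share = VOLUNTEER_HOURS / total_hours

# Scale the MT effort linearly to all specimens worldwide.
world_days = work_days * WORLD_SPECIMENS / SPECIMENS_COUNTED
years_one_person = world_days / WORK_DAYS_PER_YEAR
days_per_herbarium = world_days / HERBARIA

print(f"{work_days:.0f} work days, {volunteer_share:.0%} by volunteers")
print(f"one person: about {years_one_person:.0f} years")
print(f"one person per herbarium: about {days_per_herbarium:.0f} days")
```

The one-person figure lands just under the 340 years shown on the slide, which is consistent with rounding.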
Slide 73 text
?! Tricky to extrapolate! What about non-mounted specimens? How useful is this data? Is there a metadata repository?
Slide 74 text
First step Towards some real numbers
Slide 75 text
Thanks!‐inventory Peter Desmet