Discovering Standards: Adoptability and Adaptability (with notes)

DISCOVERING STANDARDS: Adoptability and Adaptability
Dorothea Salo, the iSchool at UW-Madison, NADDI 2015
Photo: NASA Goddard Space Flight Center, “Galaxy Cluster Abell 1689” https://www.flickr.com/photos/gsfc/4910568042/ CC-BY, enlarged/cropped
Standards… the final frontier. These… are the voyages of the starship NADDI… its continuing mission… … no, but seriously, thank you, Barry, and hi everybody, it’s great to meet you all. I’m Dorothea Salo, and I teach XML markup and research-data management — among other things — at the iSchool here at UW-Madison. So of course I’ve known about DDI for a long time, and I’ve been watching it progress and gain adoption with great interest and delight. Designing a markup language is hard! (I know this because I’ve done it.) Getting anybody to USE a markup language is even harder! So I hope you are all proud of what DDI’s designers and user community have accomplished. I am certainly proud to stand here before all of you.

“Enhancing Discoverability with Open Metadata Standards”
So the tagline for NADDI 2015 is “Enhancing discoverability with open metadata standards.” Which, I have to say, is probably not everybody’s cup of chai, you know?

IF YOU’RE HERE, YOU THINK THE STANDARD LIFE IS A GOOD LIFE!
Photo: Ron Cogswell, “Standard Life Building — Downtown Jackson (MS) May 2013” https://www.flickr.com/photos/22711505@N05/8987155759 CC-BY, cropped
But if you’re here in this room, you’ve drunk the chai already — you truly believe that the standard life is a good life.

SO DO I! Or I wouldn’t be a librarian. Or teach in an iSchool.
So do I! I believe! I believe in standards! If I didn’t believe in standards, I wouldn’t have gone to an information school. If I didn’t believe in standards, I wouldn’t have been interested in librarianship. If I didn’t believe in standards, I certainly wouldn’t TEACH in an information school!

A few of my favorite standards
SGML, ISO 12083, Text Encoding Initiative (TEI), XML, PREMIS, HTML (in all its variations), Dublin Core, MODS, METS, NLM/JATS, XSLT, EPUB, DAISY Digital Talking Book, CSS, Darwin Core, VRA Core, schema.org microdata
Just for fun, a few of my favorite standards… *CLICK* Hey, I have to ask, do we have any old-school unreconstructed SGML fans in the room? (If yes: Me too! Good old SGML! *click* ISO twelve-oh-eighty-three, best standard ever, am I right?) (If no: Awww, *click* no love for ISO twelve-oh-eighty-three? I am crushed. CRUSHED. You’re breaking my heart here.) Okay, okay, *click* I’m also an XML fan, of course; plenty of scope for playing around with standards there. Some of you may notice that I don’t have RDF or OWL or anything else semantic-web or linked-data-ish on here. That’s deliberate. I work with RDF, I teach and train on it, I even give talks about it now and then, but I’m not a gigantic fan of it; I work with it and teach it because I have to, because people need to know about it. *click* But I’ll add one more de facto standard that I AM a… cautious… fan of: schema-dot-org microdata. I’ll be mentioning it again later. And just so you know, it has NOTHING to do with what the DDI community typically calls microdata — so, yeah, even just the VOCABULARY in the standards landscape is a mess!

A few of my favorite standards
SGML, ISO 12083, Text Encoding Initiative (TEI), XML, PREMIS, HTML (in all its variations), NLM/JATS, XSLT, EPUB, DAISY Digital Talking Book, CSS, Darwin Core, schema.org microdata, MODS, METS, VRA Core, Dublin Core
And now some of you are giving me side-eye with a “what in the world is WRONG with this woman?! Standards are all fine and good, but moderation in all things!” Hey. I said librarians loved standards. I wasn’t kidding! Don’t try this level of dedication to standards at home; go to the library instead, okay? The point is, there are LOTS of standards out there. SO MANY STANDARDS. You kind of have to be a librarian to love this universe, right?

WHY AM I TALKING ABOUT A TON OF STANDARDS WHEN WE’RE HERE TO TALK ABOUT JUST ONE?
Photo: Epic Fireworks, “Standard Fireworks Poster - Epic Fireworks” https://www.flickr.com/photos/epicfireworks/4500259738 CC-BY, cropped/brightened
No, but seriously, as I thought about what I wanted to say to you all today, I decided it was important to point out the very crowded and confusing standards and markup-languages space, not to mention the even MORE crowded and confusing best-practices space opening up around research-data management.

“Enhancing Discoverability OF Open Metadata Standards”
And so I decided to hack the conference tagline, a little bit. (I do this. I hack things. If you want to see my latest Mad Information Science hacking efforts, come on up to the iSchool library on the fourth floor of Helen C. White Hall, and I’ll show you my media-archaeology machine.) Instead of enhancing discovery with open metadata standards — which is a fine and worthy and VERY librarianly goal, don’t get me wrong — I decided to talk about *CLICK* enhancing discoverability OF open metadata standards. Why did I do that? Because in my head, THAT is the final frontier, the discovery and exploration frontier for standards — standards just like DDI. In a crowded, confusing, competitive standards landscape, how DOES anybody get a standard noticed? How do you get it adopted? How do people whose problems your standard can solve discover your standard? How do they decide to adopt it, and how do you explain to them why they SHOULD adopt it in the first place? And how does your standard, your one tiny galaxy in the giant universe, fit into the rest of their universe? That turns out to be a crucially important question these days, as it happens, because…

Because there is no One Ring to rule them all.
Photo: Judy Schmidt, ‘NGC 6720 “Ring”’ https://www.flickr.com/photos/geckzilla/10055992403 CC-BY
There is no One Ring — I mean, One Standard — there is no One Standard to Rule Them All. I’ll say this again, because it’s important. THERE IS NO ONE DATA OR METADATA STANDARD. There never will be. There never SHOULD be. With all the million different things we create data from, and do with data, and need from data, there’s just no way to create a single comprehensive standard that makes sense for every imaginable kind of data and data use case.

Sorry, but… not even DDI.
Sorry, but not even DDI. That means that inevitably — seriously, there’s no getting around this — DDI has no choice but to do two things. First, DDI has to COMPETE for mindspace and adoption against other standards, not to mention non-standard technologies like Microsoft Excel, which is of course one of the horrors that DDI was designed to prevent. And this competition takes place in what I already showed you is a huge, complicated, and confusing space. Second, DDI also has to FIT ITSELF INTO a universe where people will be using other standards and non-standard technologies alongside it, and they’d ideally like that to be easy.

IN LIBRARIES, WE ARE STRUGGLING WITH THIS RIGHT NOW.
Photo: Erik Lorenzsonn, “Madison Bubbler” https://www.flickr.com/photos/96684011@N05/9902415166 CC-BY
DDI isn’t alone. This community is not alone in facing these challenges! In libraries, we are struggling with exactly the same thing right now.

We (thought we) had a One Ring to rule them all…
We honestly thought, back in the nineteen-sixties and nineteen-seventies, that we’d created a One Ring, a One Standard that would rule them all — or at least DESCRIBE them all, everything, everything a library might collect.

Photo: John H Gray, “MARC 11” https://www.flickr.com/photos/8391775@N05/4431183721 CC-BY, cropped
We called it MARC, Machine Readable Cataloging, and a brilliant programmer and systems analyst named Henriette Avram designed it in the nineteen-sixties — as it happens, about the same time as SGML and relational databases were coming into existence.

Photo: Deborah Fitchett, “Catalogue cards” https://www.flickr.com/photos/deborahfitchett/2970373235 CC-BY
And MARC was designed so that computers could hold, share, and print out the kind of metadata you find on a card in a library card catalog: author, title, subject, call number, copyright date, physical item description, and so forth. For books, but also for other things libraries collect: maps, music scores, journal titles, and so on. By the way, you can impress your friends at parties with how long libraries have been standardizing stuff: the card catalog was invented by Melvil Dui in the mid-1800s, and card size and catalog size were standardized by the American Library Association in eighteen seventy-six. As a standardista, I love libraries, I really do — you have to love it when practically the FIRST ACT of a brand-new professional organization is to set a standard!

And it goes to show how durable a useful standard can be. The physical card catalog survived as a standardized technology well into the nineteen-nineties, after all, over a century of use. And for half a century now, half a century, MARC has been librarianship’s freight train, our rail gauge, our standards heavy hitter. I can’t begin to explain to you the importance of this standard in librarianship globally.

So what’s the problem with MARC? As I just said, MARC was designed for printing catalog cards. That means that we librarians were designing our COMPUTERIZED record structure around a HUMAN-READABLE data format. DDI didn’t actually do this. DDI isn’t intended for humans to look at directly. And that’s good! That was the right design decision to make! I want you to understand WHY it’s good, though, because it’s something that you may well have to explain to potential DDI adopters who expect something more human-friendly than raw DDI is.

Photo: That’s a Big If, “Cart Before the Horse” http://bestandworstever.blogspot.com/2013/01/worst-thing-to-put-before-horse-ever.html CC-BY 2.5, cropped
So we couldn’t really have known this at the time in libraries, but it turned out that basing our data structure for computers on something meant to be human-readable was kind of putting the cart before the horse. This perfectly understandable and reasonable decision, the decision to build a standard around human-readability, actually hurt libraries and librarians in the long run.

One reason is that designing around the card catalog, which was then totally the ultimate in human-readable data display, meant serious problems when the ultimate in data display changed on us and went all digital! And there are eighteen long stories here that I’m passing over in silence, but the practical upshot is that catalog cards didn’t translate real well to web pages, never mind web search engines like Google. Worse, the human-readable catalog-card format turned out to be ridiculously hard to program against for computerized indexing and search. DDI didn’t do this. DDI’s design is based on the structures inherent in the DATA, without making assumptions about how humans would want to see or manipulate it. And that was exceptionally wise, because humans don’t always want to see or use data the same way.

Because there is no One Ring to rule them all.
Another problem I’ll mention with MARC has to do with what I said earlier about there not being a single standard that handles every single use with equal ease and effectiveness. MARC tried to be that standard, for libraries. It failed, and we’ve been dealing with that failure for decades. Talk to any music cataloger! MARC was designed for books, not sheet music, and there are some key differences that it just doesn’t respect. Or talk to anybody who deals with CDs or DVDs or other multimedia. Just forget it, MARC’s terrible for that stuff. Even at the time, MARC was a poor fit for some of the library environment — librarians were just so laser-focused on books that they didn’t take the rest of library collections seriously enough. I encourage the DDI community to look seriously at its edge cases, maybe even publicize them. Where is DDI being used in unexpected contexts? Is it a good fit? If it isn’t, could it be, or is the problem truly out of scope? Where is DDI being “misused,” and are any of the so-called misuses interesting enough to become real use cases? Sometimes the way standards achieve broad adoption is by paying closer attention to problems that weren’t in the original scope. Just a suggestion.

Photo: Spanish Coches, “Mismatched rear lights” https://www.flickr.com/photos/39302751@N06/4168859009/ CC-BY
Returning to MARC, the really, really big reason that modeling MARC on human-readable catalog cards was a huge mistake for libraries has to do with data consistency — or more properly, lack thereof. Humans can usually — not always, but usually — read past inconsistency, or ignore it when it’s not important. You or I might chuckle or frown if we wound up driving behind this car in traffic, but we probably wouldn’t crash our car into it, right? Because the inconsistent taillight coverings don’t really matter to us. We look right past them.

Photo: Charlotte L., “may27 211” https://www.flickr.com/photos/charlottel/154443920/ CC-BY
True story, one day when I was a new librarian, I accidentally wore one black shoe and one navy-blue shoe in the same style to work, and I was completely mortified once I noticed, but absolutely nobody else saw it. This is an amazing, brilliant human skill, this ability to cope with inconsistency. We’re also, as a species, absolutely top-notch at dealing with this in text — most of us handle abbreviations, misspellings, smartphone autocorrect errors, no problem! But this amazing human skill of tolerating inconsistency makes a MESS of data structures, especially when computers enter the picture. While MARC was being designed, nobody cared whether its standards and practices were completely consistent. Formal consistency wasn’t even considered worth shooting for, because who would really even notice if it wasn’t there? Just like nobody noticed my shoes that day.

Photo: Rick Harris, “meh” https://www.flickr.com/photos/rickharris/430890004/ CC-BY
So there are lots of places in library cataloging standards where the instructions just shrug and say “meh, put whatever you want, as long as people can understand it.” I kid you not, the standards say “grab a fortune cookie and write down what it says, meh, whatever, it doesn’t matter.” And I see some of you cringing, because you know the kinds of data-analysis and data-reuse problems that this leads to — it’s part of why DDI exists, right? — and so do I, it’s just that in the nineteen-sixties nobody knew that yet. Except maybe E. F. Codd, but relational databases were still being invented at the time, so never mind.

ARGH!
And the real-world consequence of that decision has been that libraries are completely dependent on expensive, lousy, backward computer systems to run our operations. We’re stuck! MARC locked us out of using off-the-shelf or open-source software for the most part, partly because none of it was designed to read MARC — seriously, who knows about MARC except librarians? — and partly because writing code to handle records that are inconsistent, because the rules didn’t TELL anybody to be consistent, is a computer programmer’s purgatory! It’s not easy, it’s not fun — that’s an understatement — so the open-source community waves it on past, and libraries aren’t a big enough market to attract much for-profit programming effort. And, I’m saying, DDI isn’t a really big market either. I do know about Colectica, and I’m glad it exists, but seriously, the more fun you can make working with DDI data, the more software the community will have. Consistent data is easy and fun to work with. Inconsistent data is not.

Photo: Jason Eppink, “split stairs” https://www.flickr.com/photos/jasoneppink/3523576857/ CC-BY
It gets worse. After MARC was standardized, library catalogers wanted to make catalog cards better for the people who used libraries, and when online catalogs came around, they wanted to make THOSE work better too, but the only tool they had to make changes with was how they made their MARC records. So this completely praiseworthy “users first!” ethos among library catalogers meant that they dinked around with the structure and content of MARC records in inconsistent and computer-unfriendly ways. This led, as you’d expect, to all kinds of inconsistency across records even just in a single catalog in a single library! As for records across all the MARC-using libraries in the world, just forget it — there are heinous amounts of inconsistency there, all in the name of making life easier for people. I sure hope this isn’t happening with DDI.

Photo: GotCredit (http://www.gotcredit.com/), “Break” https://www.flickr.com/photos/jakerust/16639995227/ CC-BY, cropped
What we didn’t know in nineteen-sixty but know REALLY WELL now is that computers just cannot read right past inconsistency the way humans do. Generally they break. When they don’t break, it takes absolutely HEROIC programming effort to get them past the inconsistency.

CONSISTENCY IS WHY THE STANDARD LIFE IS A GOOD LIFE!
This, of course, is a major reason humans invent and use standards like MARC and DDI to begin with! Standards help design and enforce a degree of consistency that an unaided human being is generally not capable of and certainly won’t produce spontaneously. Now, as we’ve seen with MARC, a standard is not an ironclad guarantee of consistency; HTML is another great example of this. People abuse standards, they don’t learn them well, sometimes they even insist on loosening standards up because they don’t want the validator yelling at them any more. But by and large, the last best hope for consistency — anybody see what I did there? Babylon 5, getting all the geek jokes in today — the last best hope for consistency is still some kind of standard.

“Enhancing Discoverability OF Open Metadata Standards”
But that strictness, that enforcement of consistency, comes at a cost. And in talking with researchers, and graduate students who are learning to become researchers, I’ve found it’s a cost that especially hurts at the standards-discovery stage. For a standard, the discovery stage is when people who don’t already use a standard on their data, but have that nagging uneasy sense that maybe they SHOULD be using one, go searching the huge, complicated, confusing standards universe to try to discover the standard that they should be using.

Photo: Maxwell GS, “Coffee House Clarice 2” https://www.flickr.com/photos/maxwellgs/4267310664/ CC-BY, cropped
And the FIRST QUESTION that someone in the middle of the standards-discovery process asks when they spot a likely standard is “Can I do this? Can I work with this?” And of course they ask other questions, sure they do, but the first question, always, every single time, is a total gut-check: CAN I DO THIS?

Photo: Horia Varlan, “The word no made from jigsaw puzzle pieces” https://www.flickr.com/photos/horiavarlan/4536149424/ CC-BY
And strict enforcement of consistency makes standards harder to use, harder to experiment with, easier to mess up. Strict enforcement of consistency makes it a lot more likely that a standards-discoverer’s answer to the gut-check “can I do this?” question will be “nope, this is way out of my league, moving on now!”

Photo: sixpounder, “Deputy Enforcement Officer Blanche Rogers, 1913, Dewey, Oklahoma” https://www.flickr.com/photos/sixpounder/14522827773/
And what I’ve found in my standards-building and standards-using life is that if the answer to that gut-check is “no,” honestly the only way a standard EVER grabs that potential user back is by MAKING them use it, meaning a journal requirement or a funder mandate or a repository mandate or whatever. The DDI community knows about this; ICPSR is DDI’s current enforcer. And ICPSR has done a great job in that role, but it’d be nice if DDI had carrots as well as sticks, right? And not every social scientist engages with ICPSR, either.

Photo: Lachlan Donald, “Sharpest tool in the shed” https://www.flickr.com/photos/lox/9408028555/ CC-BY, cropped
The other way to encourage standards use is to make the standard invisible by baking it into a tool, sort of like Colectica has tried to do, but the problem with that is that people are persnickety about their tools. Not everybody will use the same one. So we’re left with people looking at a standard that’s new to them and saying “I can’t use my favorite tool with this standard?! Well, FORGET THIS STANDARD then!”

Photo: Erich Ferdinand, “no” https://www.flickr.com/photos/erix/99778255 CC-BY, cropped
And from your point of view, you totally WANT people with social-science data from surveys and interviews and the like to choose DDI, right? And you want people who need to understand or reuse that data to see that it’s in DDI and cheer, because they know they can figure out how to do what they need to do with it, right? That’s two audiences of standards discoverers that DDI has to court: people who MAKE social-science data and people who USE social-science data. So this tension between a standard that makes consistent computer-friendly data, and a standard that human beings can figure out how to use, is really important for the DDI community, an important cost to mitigate if you can. You want standards discoverers to encounter DDI and say “yes, I can do this!”

And in libraries, we’re trying to figure this one out too. We pretty much know it’s time for MARC to go out to pasture.

And we know this partly because MARC, in addition to making it harder and more expensive to run library systems, has been a serious barrier to getting everyone ELSE in the world, from library vendors to programmer hobbyists, to work comfortably with what libraries know about what libraries have! I mean, look at this, I went over to Wikipedia’s article on MARC for a quick check on something and had to stop to laugh at the top cleanup note! If you can’t read it from where you are, it says “This article may be too technical for most readers to understand, blah blah fix it.” Now look. When Wikipedia says “most readers” it really means “most Wikipedians,” and Wikipedians tend heavily toward the computer-nerdy. If COMPUTER NERDS can’t figure MARC out, MARC has a pretty serious comprehensibility problem. So for this reason, and for the horrific inconsistency across the universe of MARC records that makes dealing with them via computer so difficult and frustrating, MARC’s gotta go.

And it’s looking pretty likely that the successor standards to MARC will be based on a technology called “linked data.” You may or may not have heard of linked data — I know DDI is currently working on three linked-data vocabularies, but it looks to me like it’s still early days for those — but look, honestly, it doesn’t matter if you haven’t. The point is, librarians are hunting a way forward through standards discovery. A lot of us are looking at linked data for the very first time, and… … let’s just say it’s not going as well as it might.

A lot of librarians have looked at linked data, done the “can I do this” gut-check, and had the answer be “OH MY GOSH GET ME OUT OF HERE WHAT EVEN IS THIS NONSENSE I CAN’T WITH THIS.” So far, linked data has TOTALLY FAILED the gut-check test among librarians. It ain’t pretty, let me tell you. Bone folders at ten paces, people.

So I’m going to ask you all this, and you don’t have to answer me except in your heart. How often has DDI failed the gut-check test among social scientists? How many of your colleagues have taken one look at DDI and said “oh HECK naw, are you kidding me?” And if the number is as high as I suspect it is, what can the DDI community do about that? Library linked data, speaking sociologically, is a total mess, I can’t even begin to tell you. I don’t want the same for DDI. YOU don’t want the same for DDI.

Photo: Doctor Popular, “Harinezumi glitch” https://www.flickr.com/photos/docpopular/8540519772/ CC-BY
Because, no lie, I am a DDI fan. I’m a digital preservationist — that’s another thing I teach — and I know what’ll happen to a lot of social-science datasets that should be in DDI but aren’t. They’ll glitch, like this image up here, and then they’ll die. That information will be unrecoverably lost. Ain’t NOBODY want that. I have another dog in this hunt too, and that’s this: the social science community, by and large, is light-years ahead of the rest of research when it comes to taking proper care of data. I really want other disciplines to LEARN FROM you people, because that’ll make my life as a digital research-data preservationist easier! But that brings up a consistency thing again — if even social scientists can’t converge on a standard as useful as DDI, how useful are social scientists as a model? So I NEED DDI to pass the gut-check test.

“Enhancing Discoverability OF Open Metadata Standards”
So let me close by making some suggestions, as an outsider to the DDI community who is nonetheless invested in DDI’s success, about how DDI might pass more gut checks, become more discoverable, more adoptable, and more adaptable.

After I stopped laughing at MARC’s Wikipedia page, just for the heck of it I looked up DDI’s. It does have one, and that’s great, that’s totally step one. But it’s got a blah-blah-fix-it note up too, this time about uncited information. Like it or not, Wikipedia is a place a lot of people go for that “can I do this?” gut check. Blah-blah-fix-it notes do not inspire confidence in these people. I really recommend a community Wikipedia hackathon day or whatever to fix this. One thing you may well find is that some or all of the uncited information in the Wikipedia page here doesn’t actually HAVE an available, citable online source. That’s a problem! That’s a documentation problem for DDI! If there’s information basic enough to be in the Wikipedia entry, you ABSOLUTELY want to ensure it’s in DDI’s website and documentation also.

Useless!
So here’s DDI’s home page, the other likely place for that gut-check question. And I love y’all, I really do, but tough love here: this page is noooooooooot good. This page seems absolutely DESIGNED to make standards discoverers run screaming in the opposite direction. *CLICK* Just as a minor example, the first information after the nav is the last-updated date. Come on. Nobody’s coming to this page looking for that! Put it in the page footer where it belongs.

argh, WTF?!
And then there’s the lifecycle diagram, and look, I know lifecycle models were trendy in like two-thousand-nine or so, but my experience is that they’re TERRIBLE communication tools. Nobody understands these things; they’re too vague and abstract for people who do research to see themselves and their workflows in. With this one specifically, it’s totally not clear why DDI is at the center of the picture, or why it’s in this weird gear thing, and the picture totally doesn’t make clear what DDI actually DOES or how it helps with all the things in the blue boxes. Ditch this thing. Seriously, just dump it. It’s not helping DDI’s adoptability among social scientists.

Aha! I get this! Surveys, Interviews, Codebooks, Microdata
What I might do instead, and this is only a suggestion, is to explain clearly what kinds of research and research data DDI works with. This is a quick list off the top of my head, you could probably do better, but the point is, a researcher who doesn’t use DDI will come to this page, see that list, and if they make the kind of data that DDI is good for, they’ll IMMEDIATELY recognize that! Which they can’t from this diagram.

NOPE!!!!!
And then there’s DDI’s tagline, “a metadata specification for the social and behavioral sciences.” Two things about this. One, gimme an estimate here, how many social and behavioral scientists have a sense of what “metadata” even means? I mean, the number is probably higher than in some other disciplines, but in my experience, lots and lots of researchers bounce RIGHT OFF the word “metadata,” and its negative connotations due to our friends at the enn-ess-ay probably don’t help much. Two, take it from a librarian, DDI is not just a metadata specification! It contains metadata, sure, codebooks are metadata and instrument descriptions are metadata, but DDI is ALSO a content and data specification! You don’t JUST describe your interview instruments or your survey methodology with DDI, you can ALSO put the actual interview transcripts or survey results in DDI. And this seems like a persnickety objection, and I won’t lie, it is! But look, putting my librarian hat on — librarians hold pretty strictly to the distinction between content and metadata. And as some of you learned yesterday from my fellow librarians Brianna Marshall and Trisha Adamus and Kristin Briney, librarians are helping guide standards discoverers to standards these days, and if your home page misleads librarians about what the DDI standard actually does, I kinda think it’s a problem.

ugh…
And this is nitpicky, but exactly how many DDI specifications are there? To somebody trying for that gut-check, hearing that DDI is one specification from the tagline, and then seeing a couple inches down that it’s MORE than one, is worrisome. It’s like a mini-bait-and-switch, like you’re trying to make DDI seem easier than it is.

CAN I DO THIS????
Last thing: This page doesn’t even try to answer the gut-checker’s first question. CAN I DO THIS? Heck if I can tell from this page. And no, nobody wants to start from the documentation, especially if it’s called that. Documentation is what you give your grad students so they can get on with it and you can ignore it, right? Y’all need a getting-started page here in the worst way.

It’s not about what YOU can do with DDI. It’s about what I / STUDENTS / LIBRARIANS / JOURNALISTS / WEB DEVS / SEARCH ENGINES can do with DDI.
Stepping back from specifics, there’s a thought-pattern that I want to encourage the DDI community to use. No joke, I really mean this, DDI will not succeed or fail based on what you here in this room can do with DDI. You wouldn’t be here if you weren’t already knowledgeable, okay?! *CLICK* So DDI adoption is not about you. It’s about me, it’s about what I can do with DDI as a community outsider. Look, DDI wants me, because I train people who you hope will be community insiders someday! And I’m not the only outsider you care about, either. *CLICK* It’s students. *CLICK* It’s librarians helping people preserve data and find datasets. *CLICK* It’s journalists looking for stories, stories that might be lurking in your data. *CLICK* It’s about web developers looking for interesting data to mash up. *CLICK* And bringing it back to the conference theme, it’s DEFINITELY about search engines. So much. SO MUCH about web search engines.

ASK PEOPLE! Photo: NASA Goddard Space Flight Center, “Galaxy Cluster
Abell 1689” https://www.flickr.com/photos/gsfc/4910568042/ CC-BY, enlarged/cropped ASK EDUCATORS And as anyone who works toward web usability knows, the way you figure out what people think when they do that gut-check with your standard is to ASK THEM. Hey, check out the DDI web page, do you get what DDI is now? No? Okay, what don’t you understand? And you revise your page from there. But that can be hard and time-consuming to do for every population of outsiders you’re interested in, and I actually think there’s a short-cut. *CLICK* Ask educators. Ask people who teach about DDI. Ask people who teach DDI — Jane Fry, are you here? Ask Jane Fry! We educators see people’s first encounters with new standards ALL THE TIME. We can totally tell you what trips people up! It’s what I’ve just been doing, right?

And maybe you don’t believe me about outsiders, so I’m
going to show you something. This is a Q-and-A site called Open Data Stack Exchange, which is where people who are interested in open data ask and answer each other’s questions. And a TON of questions on this site revolve around social-science data, mostly where to find it. And it’s kind of hilarious, the kinds of data people just assume somebody has, I really want to ask half the questioners on this site why on EARTH they think the dataset they want exists, but look, that’s not the point.

uh-oh… The point is there are a LOT of potential
DDI users here, from both sides of the pipeline — data creators and data users. Do they know about DDI? Not from Stack Exchange presently, as you can see. DDI is not part of this universe. And as I keep repeating, even if somebody points them to it, when they ask themselves that gut-check question, “Can I do this? Can I do something with DDI? Is there something in DDI for me?” DDI really needs the answer to be “yes.” And right now it’s not. So what can DDI do to fit better into new users’ environments?

I think part of the answer for DDI, just as
it is for libraries, is “fitting into the World Wide Web better.” And that’s why I bring up microdata, as I promised I would earlier. And again, sorry, terminology problem, this is YOUR definition of microdata, “data concerning individuals in a trial, survey, et cetera,” but that isn’t actually what I mean today.

I mean the second definition, “data stored in a microformat.”
Which is a completely useless definition, of course…

… so let’s look up microformat. “A simple data format
that can be embedded in a webpage.” AHA. Now we’re on to something, something that might help DDI fit better into the larger web.

So, where you go to find out about web page
microdata is a website called schema dot org. And they give an even better definition of microdata: “schemas webmasters can use to mark up HTML pages in ways RECOGNIZED BY MAJOR SEARCH PROVIDERS.” And they go on to say that all the major search engines use microdata to improve how their search results look. Now, who wouldn’t kill to have Google actually understand what a DDI dataset is — just understanding THAT IT’S A DATASET would be a lot all by itself! What if Google actually helped people FIND DDI datasets, and gave them an idea of what they’re looking at? Wouldn’t that be great? THAT is what microdata can do for DDI.
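To make “recognized by major search providers” a bit more concrete, here is a minimal sketch of what a microdata consumer pulls out of a page: it walks the HTML and collects each itemscope item’s itemtype and itemprop values into nested Python dicts. It uses only Python’s standard html.parser; the class name MicrodataReader and the sample page are hypothetical, and it deliberately skips parts of the microdata spec (itemref, itemid, URL-valued properties). It illustrates the idea only and is not how Google or any other search engine actually processes pages.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataReader(HTMLParser):
    """Collects top-level microdata items as nested dicts (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.items = []    # top-level items found on the page
        self._frames = []  # one frame per open non-void element

    def _current_item(self):
        # The nearest enclosing element that opened an itemscope, if any.
        for frame in reversed(self._frames):
            if frame["item"] is not None:
                return frame["item"]
        return None

    @staticmethod
    def _add(item, names, value):
        # Properties can repeat (e.g. several creators), so keep lists.
        for name in names:
            item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        names = (a.get("itemprop") or "").split()
        parent = self._current_item()
        frame = {"tag": tag, "item": None, "props": [], "owner": None, "text": None}

        if "itemscope" in a:
            frame["item"] = {"type": a.get("itemtype"), "properties": {}}
            if not names:
                self.items.append(frame["item"])         # a top-level item
            elif parent is not None:
                self._add(parent, names, frame["item"])  # an item nested as a property value
        elif names and parent is not None:
            if tag == "meta":
                self._add(parent, names, a.get("content", ""))
            else:
                # Text-valued property: gather text until this element closes.
                frame["props"], frame["owner"], frame["text"] = names, parent, []

        if tag not in VOID_TAGS:
            self._frames.append(frame)

    def handle_data(self, data):
        # Like textContent: text counts toward every open text-valued property.
        for frame in self._frames:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close back to the matching open element, finishing text properties.
        for i in range(len(self._frames) - 1, -1, -1):
            if self._frames[i]["tag"] == tag:
                for frame in reversed(self._frames[i:]):
                    if frame["text"] is not None:
                        value = " ".join("".join(frame["text"]).split())
                        self._add(frame["owner"], frame["props"], value)
                del self._frames[i:]
                return


PAGE = """
<div itemscope itemtype="https://schema.org/Dataset">
  <h1 itemprop="name">MIDUS</h1>
  <p itemprop="description">A dataset about <em>midlife</em>.</p>
  <meta itemprop="keywords" content="midlife">
  <div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
    <span itemprop="name">UW Institute on Aging</span>
  </div>
</div>
"""

reader = MicrodataReader()
reader.feed(PAGE)
reader.close()
dataset = reader.items[0]
print(dataset["type"])                       # https://schema.org/Dataset
print(dataset["properties"]["name"])         # ['MIDUS']
print(dataset["properties"]["description"])  # ['A dataset about midlife.']
print(dataset["properties"]["publisher"][0]["properties"]["name"])  # ['UW Institute on Aging']
```

The point of the exercise: once the page says “this is a Dataset, its name is MIDUS, its publisher is this Organization,” a program can read that without guessing from the prose.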

But does microdata understand what a dataset is, you ask?
Why yes, yes it does! In a kind of limited way, I grant you — you won’t be able to pack ALL your metadata into the web page for your project — but enough so that Google search results, before anybody even clicks on one, can say “this is a dataset about midlife, it’s called MIDUS, it’s by these researchers and it’s published by the UW Institute on Aging” and so on and so forth.
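Here is roughly what markup behind that kind of search-result blurb could look like: a minimal sketch, assuming a small hand-written helper (the name dataset_microdata is hypothetical) that renders a schema.org Dataset with just four properties: name, description, creator, and publisher. The creator names below are placeholders, not the actual MIDUS researchers.

```python
from html import escape


def dataset_microdata(name, description, creators, publisher):
    """Render a schema.org Dataset as an HTML fragment carrying microdata.

    Sketch only: four Dataset properties. A real landing page would
    probably add more (url, license, temporalCoverage, and so on).
    """
    lines = [
        '<div itemscope itemtype="https://schema.org/Dataset">',
        f'  <h1 itemprop="name">{escape(name)}</h1>',
        f'  <p itemprop="description">{escape(description)}</p>',
    ]
    for person in creators:
        # Each creator is its own nested Person item.
        lines += [
            '  <div itemprop="creator" itemscope itemtype="https://schema.org/Person">',
            f'    <span itemprop="name">{escape(person)}</span>',
            '  </div>',
        ]
    lines += [
        '  <div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">',
        f'    <span itemprop="name">{escape(publisher)}</span>',
        '  </div>',
        '</div>',
    ]
    return "\n".join(lines)


fragment = dataset_microdata(
    name="MIDUS",
    description="A dataset about midlife.",
    creators=["Researcher A", "Researcher B"],  # hypothetical placeholder names
    publisher="UW Institute on Aging",
)
print(fragment)
```

Escaping every value matters here, since dataset titles and abstracts can easily contain characters like & and <.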

Even better, microdata understands that datasets often come in catalogs,
so if you have a project portal, you can totally tell Google that it’s a project portal with a whole bunch of datasets in it! And coming full circle here, microdata is how I think DDI and DDI datasets should be leveraging their metadata to enhance their own discoverability. And even better, I think this will even help with the gut-check question from potential DDI users. If DDI makes it super-easy to create microdata for a project web page or portal, maybe through an XSLT stylesheet or an HTML-plus-microdata template or building it into existing DDI tools or whatever, I really think “it’ll be way easier for people to Google my dataset” is a pretty compelling statement of DDI’s worth. So as DDI experiments with linked data and other possible serializations and representations for social-science data — and I know you’re doing that this VERY AFTERNOON! — I encourage you to put microdata on the list. See what you can do with it. Let’s show standards discoverers what DDI is good for!
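And here is a sketch of the “build it into existing DDI tools” idea, written in Python rather than XSLT: read DDI Codebook XML, pick out a few study-level fields, and emit a schema.org DataCatalog for a project portal with one Dataset per codebook. The element-to-property mapping (titl to name, abstract to description, AuthEnty to creator, distrbtr to publisher) is an assumption for this sketch, not an official DDI crosswalk, and the function names, catalog name, and sample codebook are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def local(tag):
    # Drop any "{namespace}" prefix so matching works with or without DDI namespaces.
    return tag.rsplit("}", 1)[-1]


def texts(root, name):
    # Non-empty direct text of every descendant element with this local name.
    return [el.text.strip() for el in root.iter()
            if local(el.tag) == name and el.text and el.text.strip()]


def codebook_to_dataset(xml_text):
    """Pull a few study-level fields out of a DDI Codebook document (assumed mapping)."""
    root = ET.fromstring(xml_text)
    # Search only the study description, not docDscr (which describes the DDI file itself).
    study = next((el for el in root.iter() if local(el.tag) == "stdyDscr"), root)

    def first(name):
        found = texts(study, name)
        return found[0] if found else ""

    return {"name": first("titl"), "description": first("abstract"),
            "creators": texts(study, "AuthEnty"), "publisher": first("distrbtr")}


def catalog_microdata(catalog_name, codebooks):
    """Render a schema.org DataCatalog whose datasets come from DDI codebooks."""
    out = ['<div itemscope itemtype="https://schema.org/DataCatalog">',
           f'  <h1 itemprop="name">{escape(catalog_name)}</h1>']
    for xml_text in codebooks:
        d = codebook_to_dataset(xml_text)
        out.append('  <div itemprop="dataset" itemscope itemtype="https://schema.org/Dataset">')
        out.append(f'    <h2 itemprop="name">{escape(d["name"])}</h2>')
        if d["description"]:
            out.append(f'    <p itemprop="description">{escape(d["description"])}</p>')
        for person in d["creators"]:
            out.append('    <div itemprop="creator" itemscope itemtype="https://schema.org/Person">'
                       f'<span itemprop="name">{escape(person)}</span></div>')
        if d["publisher"]:
            out.append('    <div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">'
                       f'<span itemprop="name">{escape(d["publisher"])}</span></div>')
        out.append('  </div>')
    out.append('</div>')
    return "\n".join(out)


# Hypothetical minimal codebook; the namespace declaration is illustrative only.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt><titl>MIDUS</titl></titlStmt>
      <rspStmt><AuthEnty>Researcher A</AuthEnty></rspStmt>
      <distStmt><distrbtr>UW Institute on Aging</distrbtr></distStmt>
    </citation>
    <stdyInfo><abstract>A dataset about midlife.</abstract></stdyInfo>
  </stdyDscr>
</codeBook>"""

html_out = catalog_microdata("Example project portal", [SAMPLE])
print(html_out)
```

Namespaces are stripped before matching, so the same code should cope with codebooks with or without a namespace declaration, and searching only inside stdyDscr keeps the title of the DDI file itself from being mistaken for the study title.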

THANK YOU! This presentation is available under a Creative Commons
Attribution 4.0 International license. Dorothea Salo [email protected] http://dsalo.info/ Photo: NASA Goddard Space Flight Center, “Galaxy Cluster Abell 1689” https://www.flickr.com/photos/gsfc/4910568042/ CC-BY, enlarged/cropped So thanks for sticking with me through all that, I hope some of it’s helpful, and if you’d like to get in touch with me or you’re curious about what I do, my contact information’s on the slide there. Have a great day here in Madison, and long live DDI!
