
CNI2012

tingletech
December 11, 2012


#cni12f SNAC slides


Transcript

  1. 2012-11-04 - SLIDE 12/11/12 Building an Archival Identity Management Network: Transforming Archival Practice and Historical Research

    Daniel Pitti* and Brian Tingle**
    * Institute for Advanced Technology in the Humanities
    ** California Digital Library
    Thanks to Ray R. Larson of the University of California, Berkeley, School of Information for many of the slides here.
    Wednesday, December 12, 12
  2. Funding and People

    • Funding and Timeline
      – National Endowment for the Humanities: May 2010-April 2012
      – Andrew W. Mellon Foundation: May 2012-April 2014
    • People
      – Daniel Pitti (PI) and Worthy Martin (Institute for Advanced Technology in the Humanities, University of Virginia)
      – Adrian Turner and Brian Tingle (California Digital Library, University of California)
      – Ray Larson (School of Information, University of California, Berkeley)
  3. The Source Data

    • EAD-encoded finding aids (guides to archival records)
      – 150K
      – Primarily from U.S. sources, but also U.K. and France
    • Archival authority records (360K)
      – National Archives and Records Administration
      – State Archive of New York
      – Smithsonian Institution
      – British Library
      – National Archives (France) & BnF
    • WorldCat Archival Descriptions: 2M
  4. Library and Museum Authority Records

    • Getty Vocabulary Program: Union List of Artist Names (293K personal and corporate names)
    • Virtual International Authority File (16M+ cluster records)
      – Contributed from around the world by national libraries and others
  5. Methods and Processing

    • Extract EAC-CPF records from existing EAD-encoded archival descriptions
      – Extracting both creators and referenced CPF names
    • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
      – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
    • Create a prototype historical resource and access system
      – Historical data and social-professional networks
      – Links to archive, library, and museum resources (by and about)
  6. Example EAD Record (Hub)

    <EAD>
      <EADHEADER LANGENCODING = "ISO 639">
        <EADID> GB 0133 TAB </EADID>
        <FILEDESC>
          <TITLESTMT> <TITLEPROPER> Tabley Muniments </TITLEPROPER> </TITLESTMT>
          <PUBLICATIONSTMT>
            <PUBLISHER> John Rylands University Library of Manchester </PUBLISHER>
            <ADDRESS>
              <ADDRESSLINE> 150 Deansgate </ADDRESSLINE>
              <ADDRESSLINE> Manchester </ADDRESSLINE>
              <ADDRESSLINE> ... (Parts removed )…
      </FRONTMATTER>
      <ARCHDESC LEVEL = "FONDS" LANGMATERIAL = "English">
        <DID>
          <REPOSITORY> University of Manchester, John Rylands University Library of Manchester </REPOSITORY>
          <UNITID ENCODINGANALOG = "ISADG3.1.1." COUNTRYCODE = "GB" REPOSITORYCODE = "0133"> GB 0133 TAB </UNITID>
          <UNITTITLE LABEL = "Title" ENCODINGANALOG = "ISADG3.1.2."> Tabley Muniments </UNITTITLE>
          <UNITDATE LABEL = "Dates of Creation" ENCODINGANALOG = "ISADG3.1.3."> 19th century </UNITDATE>
          <PHYSDESC LABEL = "Extent" ENCODINGANALOG = "ISADG3.1.5."> <EXTENT> 1.24 cu.m </EXTENT> </PHYSDESC>
          <ORIGINATION LABEL = "Creator" ENCODINGANALOG = "ISADG3.2.1.">
            <FAMNAME SOURCE = "NCARULES"> Warren, family, of Tabley, Cheshire </FAMNAME>
            <PERSNAME SOURCE = "NCARULES"> Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet </PERSNAME>
          </ORIGINATION>
        </DID>
  7. Example EAD Record (Hub)

    <BIOGHIST ENCODINGANALOG = "ISADG3.2.2.">
      <HEAD> Administrative/Biographical History </HEAD>
      <P> The poet John Byrne Leicester Warren, later 3rd and last Baron de Tabley, of Tabley near Knutsford, Cheshire, was born in 1835, the son of the 2nd Baron de Tabley (1811-1887), and his wife, Catherina. His mother was Italian, the daughter of the count de Soglio, and Warren spent much of his early childhood with her in Italy and Greece. He was educated at Eton and Christ Church, Oxford. At Oxford he published a volume of poetry. Originally he published under the pseudonyms George F. Preston (1859-1862) and William Lancaster (1863-1868), but latterly under his own name. </P>
      <P> His early verse included <TITLE> Praeterita </TITLE> (1863), <TITLE> Eclogues and Monodramas </TITLE> (1864), <TITLE> Studies in Verse </TITLE> (1865), <TITLE> Philocletes </TITLE> (1866), and <TITLE> Orestes </TITLE> (1868). His early work was Tennysonian in style, but he was later to be influenced by both Browning and Swinburne. In 1873 he produced …. (some data removed)…
  8. Example EAD Record (Hub)

    <SCOPECONTENT ENCODINGANALOG = "ISADG3.3.1.">
      <HEAD> Scope and Content </HEAD>
      <P> The collection consists mainly of the personal papers of the 3rd Baron de Tabley. The papers reflect his interests in literature, politics, botany and numismatics and include correspondence with numerous prominent later Victorian figures. Attention should also be drawn to de Tabley’s extensive and important collection of armorial bookplates. </P>
      <P> Correspondents include Sir Mountstuart Grant Duff, Edmund Gosse, Lord Houghton, A.C.Benson, and Robert Bridges. There are volumes of Tabley's essays and verse, as well as a considerable number of notebooks and loose manuscripts of verse and other writings. There are various bundles and boxes relating to &quot;Coins&quot;, &quot;Botany&quot;, &quot;Poetry&quot;, &quot;Literary&quot;, &quot;Financial&quot; and bookplates. </P>
    </SCOPECONTENT>
    <ADD>
      <OTHERFINDAID ENCODINGANALOG = "ISADG3.4.6.">
        <P> Preliminary survey list. </P>
      </OTHERFINDAID>
      <RELATEDMATERIAL ENCODINGANALOG = "ISADG3.5.3.">
        <P> There is correspondence with the 3rd Baron de Tabley among the Edward Freeman Papers, held at JRULM. The Library also has custody of the important Tabley Book Collection. </P>
      </RELATEDMATERIAL>
      <SEPARATEDMATERIAL>
        <P> The family and estate papers of the Leicester-Warren Family of Tabley are held by Cheshire Record Office. Some of these papers were originally in the custody of the John Rylands University Library of Manchester. </P>
      </SEPARATEDMATERIAL>
    </ADD>
  9. Example EAD Record (Hub)

    <CONTROLACCESS>
      <HEAD> Index terms </HEAD>
      <GEOGNAME SOURCE = "NCARULES">
        <EMPH ALTRENDER = "a">Tabley Inferior</EMPH>
        <EMPH ALTRENDER = "a-">Cheshire SJ7378</EMPH>
      </GEOGNAME>
      <PERSNAME SOURCE = "NCARULES">
        <EMPH ALTRENDER = "surname">Benson</EMPH>
        <EMPH ALTRENDER = "forename">Arthur Christopher</EMPH>
        <EMPH ALTRENDER = "dates">1862-1923</EMPH>
      </PERSNAME>
      <PERSNAME SOURCE = "NCARULES">
        <EMPH ALTRENDER = "surname">Bridges</EMPH>
        <EMPH ALTRENDER = "forename">Robert Seymour</EMPH>
        <EMPH ALTRENDER = "dates">1844-1930</EMPH>
      </PERSNAME>
      <PERSNAME SOURCE = "NCARULES">
        <EMPH ALTRENDER = "surname">Duff</EMPH>
        <EMPH ALTRENDER = "title">Sir</EMPH>
        <EMPH ALTRENDER = "forename">Mountstuart Elphinstone Grant</EMPH>
        <EMPH ALTRENDER = "dates">1829-1906</EMPH>
        <EMPH ALTRENDER = "epithet">Knight</EMPH>
      </PERSNAME>
      <PERSNAME SOURCE = "NCARULES">
        <EMPH ALTRENDER = "surname">Gosse</EMPH>
        <EMPH ALTRENDER = "title">Sir</EMPH>
        <EMPH ALTRENDER = "forename">Edmund William</EMPH>
        <EMPH ALTRENDER = "dates">1849-1928</EMPH>
        <EMPH ALTRENDER = "epithet">Knight</EMPH>
      </PERSNAME>
      <PERSNAME SOURCE = "NCARULES">
        <EMPH ALTRENDER = "surname">Milnes</EMPH>
        <EMPH ALTRENDER = "forename">Richard Monckton</EMPH>
        <EMPH ALTRENDER = "dates">1809-1885</EMPH>
        <EMPH ALTRENDER = "epithet">1st Baron Houghton</EMPH>
      </PERSNAME>
      <SUBJECT SOURCE = "LCSH"> <EMPH ALTRENDER = "a">Bookplates</EMPH> </SUBJECT>
      <SUBJECT SOURCE = "LCSH"> <EMPH ALTRENDER = "a">Botany</EMPH> </SUBJECT>
      <SUBJECT SOURCE = "LCSH"> <EMPH ALTRENDER = "a">Numismatics</EMPH> </SUBJECT>
      <SUBJECT SOURCE = "LCSH">
        <EMPH ALTRENDER = "a-">Poetry</EMPH>
        <EMPH ALTRENDER = "a">Modern</EMPH>
        <EMPH ALTRENDER = "y">19th century</EMPH>
      </SUBJECT>
    </CONTROLACCESS>
    </ARCHDESC>
    </EAD>
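
The extraction step (creators plus referenced CPF names) can be sketched against a record like the one above. This is a minimal illustration, not the project's extraction code: `SAMPLE_EAD` is a trimmed, well-formed stand-in for the Tabley record, and `extract_names`, `flatten` and the role labels are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the Tabley Muniments record (assumption:
# the real record has many more elements and elided sections).
SAMPLE_EAD = """
<EAD>
  <ARCHDESC LEVEL="FONDS">
    <DID>
      <ORIGINATION LABEL="Creator">
        <FAMNAME SOURCE="NCARULES">Warren, family, of Tabley, Cheshire</FAMNAME>
        <PERSNAME SOURCE="NCARULES">Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</PERSNAME>
      </ORIGINATION>
    </DID>
    <CONTROLACCESS>
      <PERSNAME SOURCE="NCARULES">
        <EMPH ALTRENDER="surname">Benson</EMPH>
        <EMPH ALTRENDER="forename">Arthur Christopher</EMPH>
        <EMPH ALTRENDER="dates">1862-1923</EMPH>
      </PERSNAME>
      <SUBJECT SOURCE="LCSH"><EMPH ALTRENDER="a">Botany</EMPH></SUBJECT>
    </CONTROLACCESS>
  </ARCHDESC>
</EAD>
"""

# EAD name elements mapped to the three CPF entity types.
NAME_TAGS = {"PERSNAME": "person", "CORPNAME": "corporateBody", "FAMNAME": "family"}


def flatten(elem):
    """Join the text of an element and its children into one name string."""
    parts = [text.strip() for text in elem.itertext() if text.strip()]
    return ", ".join(parts)


def extract_names(ead_xml):
    """Return (role, entity_type, name) for creators and referenced names."""
    root = ET.fromstring(ead_xml)
    found = []
    for section, role in (("ORIGINATION", "creator"), ("CONTROLACCESS", "referenced")):
        for container in root.iter(section):
            for elem in container.iter():
                if elem.tag in NAME_TAGS:
                    found.append((role, NAME_TAGS[elem.tag], flatten(elem)))
    return found


names = extract_names(SAMPLE_EAD)
for role, entity_type, name in names:
    print(role, entity_type, name)
```

Subject and place terms are skipped on purpose, since only corporate bodies, persons and families become EAC-CPF records.
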
  10. 2010-2012 Extraction Results

    • Source data: 30,000 finding aids
    • EAC-CPF records extracted
      – LoC: 43,702 from 1,159 finding aids
      – OAC: 91,811 from ~15,400
      – NWDA: 22,609 from 5,160
      – VH: 15,175 from 8,390
      – Total 173,297
  11. Phase II preliminary results

    • unmerged SIA Henry Correspondence
    • 32,988 Names
    • unmerged WorldCat MARC
    • 4,548,270 Names
  13. The Problem

    • Proliferation of the forms of names
      – Different names for the same person
      – Different people with the same names
    • Examples
      – from Books in Print (semi-controlled but not consistent)
      – ERIC author index (not controlled)
  14. Library and Archive Authority

    • Library (or bibliographic) authority control is almost exclusively about the control of names
    • Archival identity control involves biographical-historical description of the CPF entity
      – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
      – But also biographical-historical description
        • Prose
        • Chronological list
    • Archival authority control provides context for understanding records, the context of their creation, the provenance
  15. Merging EAC-CPF Records

    [Pipeline diagram. Boxes: EAC Repository; VIAF Repository, LCNAF Repository and ULAN Repository (Cheshire Search); Connect exactly matching records; Connect records using name authority information; Repository of connected EAC Records (MongoDB); Merge; Repository of merged EAC Records]
  17. Connect Exact Matches

    • The EAC-CPF records provide the names without having to parse texts, etc.
    • Allows us to use some simple methods like exact matching
      – Assume identical name entries mean the same person/corporate body/family
      – Enter the full names and record IDs into a database and flag IDs with the same names for merging
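
A minimal sketch of the exact-match step, assuming an in-memory dictionary stands in for the database mentioned on the slide. The record IDs (`loc-1`, `oac-7`, `nwda-3`) and the function name `flag_exact_matches` are made up for illustration.

```python
from collections import defaultdict


def flag_exact_matches(records):
    """Group record IDs by identical name entry; keep names seen more than once.

    records: iterable of (record_id, name_entry) pairs.
    """
    ids_by_name = defaultdict(list)
    for record_id, name in records:
        ids_by_name[name].append(record_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical record IDs; the name strings come from the slides.
records = [
    ("loc-1", "Taylor, Zachary, 1784-1850."),
    ("oac-7", "Taylor, Zachary, 1784-1850."),
    ("nwda-3", "Zachary Taylor."),
]
flagged = flag_exact_matches(records)
print(flagged)
```

Only the two byte-identical entries are flagged; the uninverted form is left for the later authority-based stage.
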
  18. But…

    • Exact merging assumes that archives are following LC cataloging practice in their EAD records
      – There are some problems with this assumption
  19. Some failures for merging…

    • Different abbreviations:
      – A. & G. Carisch & C.
      – A. & G. Carisch & Co.
    • And spacing issues:
      – A. C. Peters & Bro.
      – A. C. Peters & Brother.
      – A. C. Peters. (??)
      – A. C.Peters & Bro.
    • Completeness and alternate rules
      – Tabb, John B. (John Banister), 1845-1909.
      – Tabb, John Banister, 1845-1909.
    • Also differing transliterations for non-Latin scripts
  20. More…

    • Variant romanizations (and spacing):
      – M. P. Belaieff.
      – M. P. Belaïeff.
      – M. P. Bieliaev.
      – M.P. Belaïeff.
      – M.P.Belaïeff.
    • Initials vs. names:
      – Zabolotskii, N.A.
      – Zabolotskii, Nikolai Alekseevich, 1903-1958.
      – Zabolotskii.
  21. More…

    • Inverted order vs. uninverted
      – Taylor, Zachary, 1784-1850.
      – Zachary Taylor.
    • Various combinations:
      – Tchaikovsky, Peter I.
      – Tchaikovsky, Pëtr Il.
      – Tchaikovsky, Piotr Ilyich.
      – Tchaikovsky, Pyotr Il.
      – Tchaikovsky, Pyotr Ilyich.
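
Some of the failures listed above (spacing, punctuation, diacritics) can be absorbed by comparing a normalized key instead of the raw string. The sketch below is an assumption about one possible normalization, not the project's method; `match_key` is a hypothetical helper. It does nothing for abbreviations, initials, inverted order or different romanizations.

```python
import unicodedata


def match_key(name):
    """Build a comparison key: strip diacritics, case, spacing and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in without_marks.casefold() if c.isalnum())


# Spacing and diacritic variants collapse to one key...
same = {match_key(n) for n in ["M. P. Belaieff.", "M. P. Belaïeff.", "M.P.Belaïeff."]}
print(same)

# ...but a different romanization or a longer abbreviation still does not match.
print(match_key("M. P. Bieliaev."))
print(match_key("A. C. Peters & Bro.") == match_key("A. C. Peters & Brother."))
```
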
  23. Search Authority Files

    • For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
      – Search both the “authoritative” and “non-authoritative” forms
      – Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
      – Flag EAC records that match the same authority record as potential matches
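
The flagging logic can be sketched with a toy lookup table in place of the Cheshire search. Everything here is an assumption for illustration: the authority records, their IDs and the choice of which form is "authoritative" are invented, and a plain dictionary lookup replaces probabilistic retrieval.

```python
from collections import defaultdict

# Hypothetical authority records; real VIAF clusters are far richer.
AUTHORITY_RECORDS = [
    {
        "id": "auth-1",
        "authoritative": "Tabb, John B. (John Banister), 1845-1909.",
        "variants": ["Tabb, John Banister, 1845-1909."],
    },
    {
        "id": "auth-2",
        "authoritative": "Taylor, Zachary, 1784-1850.",
        "variants": ["Zachary Taylor."],
    },
]


def build_form_index(authority_records):
    """Map every authoritative and non-authoritative form to its authority ID."""
    index = {}
    for rec in authority_records:
        for form in [rec["authoritative"], *rec["variants"]]:
            index[form] = rec["id"]
    return index


def flag_by_authority(eac_records, form_index):
    """Flag EAC record IDs that resolve to the same authority record."""
    eac_by_authority = defaultdict(list)
    for eac_id, name in eac_records:
        authority_id = form_index.get(name)
        if authority_id is not None:
            eac_by_authority[authority_id].append(eac_id)
    return {a: ids for a, ids in eac_by_authority.items() if len(ids) > 1}


eac_records = [
    ("loc-12", "Tabb, John B. (John Banister), 1845-1909."),
    ("oac-3", "Tabb, John Banister, 1845-1909."),
    ("vh-9", "Zachary Taylor."),
]
flagged = flag_by_authority(eac_records, build_form_index(AUTHORITY_RECORDS))
print(flagged)
```
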
  24. Shingle Language Model for names

    Name: Einstein Albert
    Shingle sequence: ein, ins, nst, ste, tei, ein … , ert
    Probability that the sequence (ins, nst, ste) follows ein is very high for the name einstein
    NGRAM or Shingle Matching
    Krishna Janakiraman and Sean Marimpietri - Biograph
  25. Shingle Language Model for names

    Name 1: Einstein Albert
    Name 2: Ainshtain Albert
    Name 3: Albert Einstein
    [Figure: chains of character shingles for the three names, for example ein, ins, nst, ste, tei, ein for "Einstein"; ain, ins, nsh, sht, hta, tai, ain for "Ainshtain"; and alb, lbe, ert for "Albert"]
    Krishna Janakiraman and Sean Marimpietri - Biograph
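
To make the shingle idea concrete, here is a simplified sketch: character trigrams per name token, compared by Jaccard overlap. This is a deliberately simpler stand-in, not the probabilistic language model credited on the slides; `shingles` and `jaccard` are hypothetical helper names.

```python
def shingles(name, n=3):
    """Return the set of character n-grams taken from each token of a name."""
    out = set()
    for token in name.casefold().split():
        if len(token) < n:
            out.add(token)  # keep very short tokens (e.g. initials) whole
        for i in range(len(token) - n + 1):
            out.add(token[i:i + n])
    return out


def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two names."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(sorted(shingles("Einstein")))
print(jaccard("Einstein Albert", "Albert Einstein"))
print(jaccard("Einstein Albert", "Ainshtain Albert"))
```

Because the sets ignore token order, the inverted and uninverted forms score 1.0, while the variant romanization still shares the "Albert" shingles and one shingle from the surname.
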
  27. Merge Flagged Records

    • For all of the exact matches and authority matches
      – Use the Authoritative form of the name
      – Combine data from each match into a single EAC-CPF record
      – Retain all source record IDs and information
    • Finally, output the merged EAC-CPF records
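
The merge rules above can be sketched with plain dictionaries. The field names (`alternative_names`, `source_ids`, `resources`), the record IDs and the finding-aid labels are hypothetical; they do not follow the EAC-CPF schema and only illustrate "use the authoritative form, combine data, retain source IDs".

```python
def merge_records(authoritative_name, records):
    """Combine flagged records into one merged record under the authoritative name."""
    merged = {
        "name": authoritative_name,
        "alternative_names": [],
        "source_ids": [],
        "resources": [],
    }
    for rec in records:
        merged["source_ids"].append(rec["id"])  # retain every source record ID
        if rec["name"] != authoritative_name and rec["name"] not in merged["alternative_names"]:
            merged["alternative_names"].append(rec["name"])
        for resource in rec.get("resources", []):
            if resource not in merged["resources"]:  # de-duplicate linked resources
                merged["resources"].append(resource)
    return merged


# Hypothetical flagged records for one entity.
flagged = [
    {"id": "loc-12", "name": "Tabb, John B. (John Banister), 1845-1909.",
     "resources": ["findingaid-A"]},
    {"id": "oac-3", "name": "Tabb, John Banister, 1845-1909.",
     "resources": ["findingaid-B", "findingaid-A"]},
]
merged = merge_records("Tabb, John B. (John Banister), 1845-1909.", flagged)
print(merged)
```
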
  28. Inputs to SNAC merging

    • LoC: 43,702 EAC-CPF records derived from 1,159 finding aids
    • OAC: 91,814 EAC-CPF records derived from ~15,400 finding aids
    • NWDA: 24,952 EAC-CPF records derived from 5,568 finding aids
    • VH: 15,175 EAC-CPF records
    • Total: 175,688 Input EAC records for merging
    • Result: 128,781 “unique” names
  29. Another view of the numbers…

    • 95,624 Person names merged from 125,555 Person records
    • 31,287 Institutions merged from 47,189 Institution records
    • 1,980 Families merged from 2,899 Family records
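
As a quick arithmetic check, the per-type figures can be turned into merge ratios. The numbers are taken from the slide; the variable names are illustrative. Note that the per-type sums computed here (175,643 records, 128,891 merged names) differ slightly from the totals quoted on the previous slide (175,688 and 128,781); the source does not explain the gap and this sketch does not try to reconcile it.

```python
# (merged names, input records) per entity type, as given on the slide.
PER_TYPE = {
    "person": (95624, 125555),
    "institution": (31287, 47189),
    "family": (1980, 2899),
}

total_merged = sum(merged for merged, _ in PER_TYPE.values())
total_records = sum(records for _, records in PER_TYPE.values())

for entity_type, (merged, records) in PER_TYPE.items():
    # Share of input records that remain as distinct names after merging.
    print(f"{entity_type}: {merged / records:.1%} of records remain as distinct names")

print(f"all types: {total_merged} names from {total_records} records")
```
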
  30. Merging Conclusions

    • There will not be a single merging method, but a staged set of approaches that will take us from the simplest exact matches to (we hope) reliable identification of variant forms of a name when corroborated by contextual information (dates, etc.)
  31. Next

    • Developing an updateable database of merged EAC data (dumping Mongo for PostgreSQL)
      – Will permit incremental addition of new data and support editing and “forced” merges
    • Process the 2M WorldCat archival descriptions
    • Process the 150,000 finding aids
    • Convert several hundred thousand archival authority records into EAC-CPF and run them through the match/merge process
  33. Outline

    • User Persona
    • Search and Display
    • Network graph visualization
    • Linked Data / RDF
    • Future Plans
  34. Meet the target users

    Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand or product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing)
  39. Meet the target users

    • Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families and networks. Sometimes he comes to the site looking for information on specific people; other times he is looking for information on a specific subject or event. He also TAs an undergraduate history class and sometimes has to help students find topics for papers.
    • Connie: Works at an institution that contributed records to the project. Will be asking how this site would be useful to their users. Wants to understand how their records were used and what the added value is.
    • Quincy: Library school student working to QA record matching.
    • Adele: Person doing authority work during collection processing.
    • Lenny: Lenny likes linked data, and wants to be able to mine the links that have been established programmatically.
    Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand or product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing)
  41. Outline

    • User Persona
    • Search and Display
    • Network graph visualization
    • Context widget (needs new name)
    • Linked Data / RDF
    • Future Plans
  45. Tinkerpop graph database stack

    • Simple "property graph" model
    • "JDBC for graph databases" [SNAC is using Neo4J for the graphDB]
    • XPath-like "gremlin" for graph query
    • REST interfaces with "Rexster"
    • For me, this was 10 to 100 times easier than using RDF
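
The "property graph" model (vertices and labeled edges, each carrying key/value properties) can be illustrated in a few lines of plain Python. This sketch does not use the Tinkerpop, Gremlin or Neo4J APIs; the `PropertyGraph` class, the vertex IDs and the `correspondedWith` edge label are all assumptions made for the example, with names borrowed from the Tabley record.

```python
class PropertyGraph:
    """Tiny in-memory property graph: vertices and edges both carry properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> dict of properties
        self.edges = []     # (out_vertex, label, in_vertex, properties)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, out_vertex, label, in_vertex, **properties):
        self.edges.append((out_vertex, label, in_vertex, properties))

    def out(self, vertex_id, label):
        """Follow outgoing edges with the given label, one traversal step."""
        return [i for o, l, i, _ in self.edges if o == vertex_id and l == label]


g = PropertyGraph()
g.add_vertex("tabley", name="Warren, John Byrne Leicester, 1835-1895", entity_type="person")
g.add_vertex("gosse", name="Gosse, Edmund William, 1849-1928", entity_type="person")
g.add_vertex("bridges", name="Bridges, Robert Seymour, 1844-1930", entity_type="person")
g.add_edge("tabley", "correspondedWith", "gosse", source="finding aid")
g.add_edge("tabley", "correspondedWith", "bridges", source="finding aid")

correspondents = [g.vertices[v]["name"] for v in g.out("tabley", "correspondedWith")]
print(correspondents)
```
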
  52. What is Linked Open Data?

    • W3C Semantic Web Technology Stack
    • Web of atomized Data, not a web of documents
    • RDF; OWL ontologies; SPARQL queries; triple/quad/quint stores
    • httpRange14; content negotiation; CURIE
    • No restrictions on data use; free and easy license
    • Lenny wants it, but does Randy?
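
"Atomized data" here means statements of the form subject, predicate, object. The sketch below models triples as Python tuples and answers a simple pattern query; it uses no RDF library and is not SPARQL. The `http://example.org/` URIs, the predicate names and the `match` helper are hypothetical.

```python
EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Each statement is one (subject, predicate, object) triple.
TRIPLES = [
    (EX + "tabley", EX + "name", "Warren, John Byrne Leicester, 1835-1895"),
    (EX + "tabley", EX + "correspondedWith", EX + "gosse"),
    (EX + "gosse", EX + "name", "Gosse, Edmund William, 1849-1928"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


hits = match(TRIPLES, s=EX + "tabley", p=EX + "correspondedWith")
print(hits)
```
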
  57. What is Linked Open Data?

    • Getting to the good stuff
    • Blue underlined text
    • Pulling in data from multiple sources, in an intelligent way, into a "document"
    • Understand and discover relationships
    • Open access for research, education, private study and other fair use
  58. My opinion on the use cases for W3C RDF tech

    • Good for publishing data
    • Good for controlled vocabularies
    • Data models?
    • Most people with open source RDF-store type systems do the real stuff with Solr
    • Consider a graph database
  59. Outline

    • User Persona
    • Search and Display
    • Linked Data / RDF
    • Network graph visualization
    • Future Plans
  65. Future Plans

    • Conduct assessment activities involving members of target audiences to establish mental model of users for design work
    • Scale interface to millions of names
    • Visualizations useful and integrated (network and geospatial)
    • Stable URLs between batches for linked data
    • Social and personalization features (gateway to crowdsourcing)
    • Integration with local systems (such as with the context widget)
  66. • Photo attribution: http://www.flickr.com/photos/dsevilla/139656712/in/photostream/

    • http://xtf.cdlib.org/
    • http://code.google.com/p/eac-graph-load/source/browse/README.txt
    • http://tinkerpop.com/
    • http://thejit.org/
    • https://github.com/tingletech/snac-related-widget