CNI2012

tingletech
December 11, 2012

#cni12f SNAC slides

Transcript

  1. Opening Slide
    Wednesday, December 12, 12

  2. 2012-11-04 - SLIDE
    12/11/12
    Building an Archival Identity
    Management Network: Transforming
    Archival Practice and Historical
    Research
    Daniel Pitti* and Brian Tingle**
    * Institute for Advanced Technology in the Humanities
    ** California Digital Library
    Thanks to Ray R. Larson of the University of California, Berkeley, School of
    Information for many of the slides here

  3. Funding and People
    • Funding and Timeline
    – National Endowment for the Humanities
    – May 2010-April 2012
    – Andrew W. Mellon Foundation
    – May 2012-April 2014
    • People
    – Daniel Pitti (PI) and Worthy Martin (Institute for Advanced
    Technology in the Humanities, University of Virginia)
    – Adrian Turner and Brian Tingle (California Digital Library,
    University of California)
    – Ray Larson (School of Information, University of California,
    Berkeley)

  4. The Source Data
    • EAD-encoded finding aids (guides to archival
    records)
    – 150K
    – Primarily from U.S. sources, but also U.K. and
    France
    • Archival authority records (360K)
    – National Archives and Records Administration
    – State Archive of New York
    – Smithsonian Institution
    – British Library
    – National Archives (France) & BnF
    • WorldCat Archival Descriptions: 2M

  5. Library and Museum Authority Records
    • Getty Vocabulary Program: Union List of
    Artist Names (293K personal and corporate
    names)
    • Virtual International Authority File (16M+
    cluster records)
    – Contributed from around the world by national
    libraries and others

  7. Methods and Processing
    • Extract EAC-CPF records from existing EAD-
    encoded archival descriptions
    – Extracting both creators and referenced CPF
    names
    • Match EAC-CPF records against one another and
    against existing authority records (ULAN, VIAF,
    LCNAF)
    – Enhance EAC-CPF by normalizing entries, adding
    alternative entries, titles (VIAF), and historical data
    (ULAN)
    • Create a prototype historical resource and access
    system
    – Historical data and social-professional networks
    – Links to archive, library, and museum resources (by
    and about)
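
The extraction step above can be sketched in Python. This is only an illustration under stated assumptions: the EAD fragment is a hypothetical, heavily simplified one (the element names origination, controlaccess, persname, famname and corpname come from the EAD tag library, but the nesting shown is invented), and extract_names and its output dictionaries are not the project's code or the EAC-CPF format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified EAD fragment; not the markup of any real finding aid.
EAD_SAMPLE = """
<ead>
  <archdesc>
    <did>
      <origination>
        <famname>Warren, family, of Tabley, Cheshire</famname>
        <persname>Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</persname>
      </origination>
    </did>
    <controlaccess>
      <persname>Bridges, Robert Seymour, 1844-1930</persname>
      <subject>Botany</subject>
    </controlaccess>
  </archdesc>
</ead>
"""

# EAD name elements mapped to corporate body / person / family (CPF) types.
NAME_TAGS = {"persname": "person", "famname": "family", "corpname": "corporateBody"}


def extract_names(ead_xml):
    """Collect creator names (origination) and referenced names (controlaccess)."""
    root = ET.fromstring(ead_xml)
    names = []
    for container, role in (("origination", "creator"), ("controlaccess", "referenced")):
        for parent in root.iter(container):
            for elem in parent.iter():
                kind = NAME_TAGS.get(elem.tag)
                if kind:
                    text = " ".join((elem.text or "").split())
                    names.append({"name": text, "type": kind, "role": role})
    return names


for entry in extract_names(EAD_SAMPLE):
    print(entry["role"], entry["type"], entry["name"])
```

Subject terms such as "Botany" are skipped because only name elements are of interest for CPF records.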

  8. Example EAD Record (Hub)
    [EAD XML excerpt; the element markup was lost in extraction and only the text values remain]
    GB 0133 TAB
    Tabley Muniments
    John Rylands University Library of Manchester
    150 Deansgate
    Manchester
    ... (parts removed) ...
    University of Manchester, John Rylands University Library of Manchester
    REPOSITORYCODE = "0133"
    GB 0133 TAB
    Tabley Muniments
    19th century
    1.24 cu.m
    Warren, family, of Tabley, Cheshire
    Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet

  9. Example EAD Record (Hub)
    Administrative/Biographical History
    The poet John Byrne Leicester Warren, later 3rd and last Baron de Tabley, of Tabley near Knutsford, Cheshire,
    was born in 1835, the son of the 2nd Baron de Tabley (1811-1887), and his wife, Catherina. His mother was Italian,
    the daughter of the count de Soglio, and Warren spent much of his early childhood with her in Italy and Greece. He
    was educated at Eton and Christ Church, Oxford. At Oxford he published a volume of poetry. Originally he
    published under the pseudonyms George F. Preston (1859-1862) and William Lancaster (1863-1868), but latterly
    under his own name.
    His early verse included Praeterita (1863), Eclogues and Monodramas (1864), Studies in Verse (1865),
    Philoctetes (1866), and Orestes (1868). His early work was Tennysonian in style, but he was later to be
    influenced by both Browning and Swinburne. In 1873 he produced …. (some data removed)…

  10. Example EAD Record (Hub)
    Scope and Content
    The collection consists mainly of the personal papers of the 3rd Baron de Tabley. The papers reflect his interests in
    literature, politics, botany and numismatics and include correspondence with numerous prominent later Victorian
    figures. Attention should also be drawn to de Tabley’s extensive and important collection of armorial bookplates.
    Correspondents include Sir Mountstuart Grant Duff, Edmund Gosse, Lord Houghton, A.C.Benson, and Robert
    Bridges. There are volumes of Tabley's essays and verse, as well as a considerable number of notebooks and
    loose manuscripts of verse and other writings. There are various bundles and boxes relating to
    "Coins", "Botany", "Poetry", "Literary", "Financial"
    and bookplates.
    Preliminary survey list.
    There is correspondence with the 3rd Baron de Tabley among the Edward Freeman Papers, held at JRULM.
    The Library also has custody of the important Tabley Book Collection.
    The family and estate papers of the Leicester-Warren Family of Tabley are held by Cheshire Record
    Office. Some of these papers were originally in the custody of the John Rylands University Library
    of Manchester.

  11. Example EAD Record (Hub)
    Index terms
    Tabley Inferior, Cheshire SJ7378
    Benson, Arthur Christopher, 1862-1923
    Bridges, Robert Seymour, 1844-1930
    Duff, Sir Mountstuart Elphinstone Grant, 1829-1906, Knight
    Gosse, Sir Edmund William, 1849-1928, Knight
    Milnes, Richard Monckton, 1809-1885, 1st Baron Houghton
    Bookplates
    Botany
    Numismatics
    Poetry, Modern, 19th century

  12. 2010-2012 Extraction Results
    • Source data: 30,000 finding aids
    • EAC-CPF records extracted
    – LoC: 43,702 from 1,159 finding aids
    – OAC: 91,811 from ~15,400
    – NWDA: 22,609 from 5,160
    – VH: 15,175 from 8,390
    – Total 173,297

  13. Phase II preliminary results
    • unmerged SIA Henry Correspondence
    • 32,988 Names
    • unmerged WorldCat MARC
    • 4,548,270 Names

  14. Methods and Processing
    • Extract EAC-CPF records from existing EAD-
    encoded archival descriptions
    – Extracting both creators and referenced CPF names
    • Match EAC-CPF records against one another
    and against existing authority records (ULAN,
    VIAF, LCNAF)
    – Enhance EAC-CPF by normalizing entries, adding
    alternative entries, titles (VIAF), and historical data
    (ULAN)
    • Create a prototype historical resource and access
    system
    – Historical data and social-professional networks
    – Links to archive, library, and museum resources (by
    and about)

  15. The Problem
    • Proliferation of the forms of names
    – Different names for the same person
    – Different people with the same names
    • Examples
    – from Books in Print (semi-controlled but not
    consistent)
    – ERIC author index (not controlled)

  16. Goethe
    …etc…

  17. John Muir

  18. Library and Archive Authority
    • Library (or bibliographic) authority control is almost
    exclusively about the control of names
    • Archival identity control involves biographical-
    historical description of the CPF entity
    – Descriptions based on controlled vocabularies, for
    example, occupations, place of birth and death
    – But also biographical-historical description
    • Prose
    • Chronological list
    • Archival authority control provides context for
    understanding records, the context of their
    creation, the provenance

  19. Merging EAC-CPF Records
    [Flow diagram. Labels: EAC Repository; VIAF Repository; LCNAF Repository; ULAN Repository; Cheshire Search;
    Connect exactly matching records; Connect records using name authority information;
    Repository of connected EAC Records (MongoDB); Merge; Repository of merged EAC Records]

  20. Merging EAC-CPF Records
    [Flow diagram repeated from slide 19, without the LCNAF and ULAN repositories]

  21. Connect Exact Matches
    • The EAC-CPF records provide the names
    without having to parse texts, etc.
    • Allows us to use some simple methods like
    exact matching
    – Assume identical name entries mean the
    same person/corporate body/family
    – Enter the full names and record IDs into a
    database and flag IDs with same names for
    merging
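
A minimal sketch of the exact-match step, assuming records arrive as (record ID, name entry) pairs. The record IDs and the function name flag_exact_matches are hypothetical, and an in-memory dictionary stands in for the database mentioned on the slide.

```python
from collections import defaultdict


def flag_exact_matches(records):
    """Group record IDs by identical name entry; keep only names seen more than once."""
    ids_by_name = defaultdict(list)
    for record_id, name in records:
        ids_by_name[name].append(record_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Hypothetical record IDs; the name strings are taken from the slides.
records = [
    ("loc-001", "Taylor, Zachary, 1784-1850."),
    ("oac-417", "Taylor, Zachary, 1784-1850."),
    ("vh-093", "Zachary Taylor."),
]
flagged = flag_exact_matches(records)
print(flagged)
```

Only the two identical strings are flagged; the uninverted form "Zachary Taylor." is missed, which is the weakness of exact matching.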

  22. But…
    • Exact merging assumes that archives are
    following LC cataloging practice in their
    EAD records
    – There are some problems with this assumption

  23. Some failures for merging…
    • Different abbreviations:
    – A. & G. Carisch & C.
    – A. & G. Carisch & Co.
    • And spacing issues:
    – A. C. Peters & Bro.
    – A. C. Peters & Brother.
    – A. C. Peters. (??)
    – A. C.Peters & Bro.
    • Completeness and alternate rules
    – Tabb, John B. (John Banister), 1845-1909.
    – Tabb, John Banister, 1845-1909.
    • Also differing transliterations for non-Latin scripts

  24. More…
    • Variant romanizations (and spacing):
    – M. P. Belaieff.
    – M. P. Belaïeff.
    – M. P. Bieliaev.
    – M.P. Belaïeff.
    – M.P.Belaïeff.
    • Initials vs. names:
    – Zabolotskii, N.A.
    – Zabolotskii, Nikolai Alekseevich, 1903-1958.
    – Zabolotskii.

  25. More…
    • Inverted order vs. uninverted
    – Taylor, Zachary, 1784-1850.
    – Zachary Taylor.
    • Various combinations:
    – Tchaikovsky, Peter I.
    – Tchaikovsky, Pëtr Il.
    – Tchaikovsky, Piotr Ilyich.
    – Tchaikovsky, Pyotr Il.
    – Tchaikovsky, Pyotr Ilyich.
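
The failure examples on these slides suggest why light normalization helps and where it stops helping. The sketch below is an assumption about what such normalization might look like, not the project's actual rules: it strips diacritics, repairs missing spaces after periods, and ignores case and trailing periods. The function name normalize_name is hypothetical.

```python
import re
import unicodedata


def normalize_name(name):
    """Build a comparison key that ignores diacritics, spacing, case and a final period."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks so that "ï" compares equal to "i".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Insert a space after any period that is directly followed by a non-space.
    spaced = re.sub(r"\.(?=\S)", ". ", stripped)
    collapsed = re.sub(r"\s+", " ", spaced).strip()
    return collapsed.rstrip(".").casefold()


variants = ["M. P. Belaieff.", "M. P. Belaïeff.", "M.P. Belaïeff.", "M.P.Belaïeff."]
keys = {normalize_name(v) for v in variants}
print(keys)
```

The four spacing and diacritic variants collapse to a single key, but "M. P. Bieliaev.", "Bro." versus "Brother", and inverted versus uninverted order still produce different keys.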

  26. Merging EAC-CPF Records
    [Flow diagram repeated from slide 19, without the LCNAF and ULAN repositories]

  27. Search Authority Files
    • For each name, formulate a search of the
    VIAF database using the Cheshire system
    (SGML/XML retrieval system with
    probabilistic and Boolean matching)
    – Search both the “authoritative” and “non-
    authoritative” forms
    – Consider any name matching a non-
    authoritative form to be a candidate match for
    the authoritative form
    – Flag EAC records that match the same
    authority record as potential matches
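
The flagging logic above can be sketched as follows. This is a simplification under stated assumptions: a plain dictionary lookup replaces the Cheshire probabilistic and Boolean search, the authority record and its identifier are made up, and which name form is treated as authoritative is chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical authority record. The identifier and the choice of which form
# counts as authoritative are invented for this sketch.
AUTHORITY = {
    "auth-1": {
        "authoritative": "Tchaikovsky, Peter I",
        "variants": ["Tchaikovsky, Pyotr Ilyich", "Tchaikovsky, Piotr Ilyich"],
    },
}


def build_form_index(authority):
    """Map every authoritative and non-authoritative form to its authority ID."""
    index = {}
    for auth_id, rec in authority.items():
        for form in [rec["authoritative"], *rec["variants"]]:
            index[form.casefold()] = auth_id
    return index


def flag_by_authority(eac_records, authority):
    """Flag EAC record IDs that resolve to the same authority record."""
    index = build_form_index(authority)
    hits = defaultdict(list)
    for record_id, name in eac_records:
        auth_id = index.get(name.rstrip(".").casefold())
        if auth_id is not None:
            hits[auth_id].append(record_id)
    return {auth_id: ids for auth_id, ids in hits.items() if len(ids) > 1}


eac_records = [
    ("r1", "Tchaikovsky, Pyotr Ilyich."),
    ("r2", "Tchaikovsky, Piotr Ilyich."),
    ("r3", "Zabolotskii."),
]
print(flag_by_authority(eac_records, AUTHORITY))
```

Records r1 and r2 carry different strings, so exact matching would miss them, but both match a non-authoritative form of the same authority record.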

  28. Shingle Language Model for names
    Name: Einstein Albert
    Shingle sequence: ein, ins, nst, ste, tei, ein … , ert
    Probability that the sequence (ins, nst, ste) follows ein is very high for the
    name einstein
    Krishna Janakiraman and Sean Marimpietri - Biograph
    NGRAM or Shingle Matching
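
A short sketch of character shingling, assuming three-character shingles as in the "Einstein" example. Note the swap: the slide describes a probabilistic language model over shingle sequences, whereas this sketch uses plain Jaccard overlap of shingle sets as a simpler stand-in. The function names are hypothetical.

```python
def shingles(name, n=3):
    """Return the overlapping character n-grams of a name, in order."""
    text = name.casefold()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def jaccard(a, b, n=3):
    """Share of distinct shingles that two names have in common."""
    sa, sb = set(shingles(a, n)), set(shingles(b, n))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(shingles("Einstein"))
print(jaccard("Einstein Albert", "Albert Einstein"))
print(jaccard("Einstein Albert", "Ainshtain Albert"))
```

Because shingles ignore word order, the inverted and uninverted forms still overlap heavily, and a variant romanization keeps the shingles of the unchanged parts of the name.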

  29. Shingle Language Model for names
    [Figure: chains of three-character shingles (ein, ins, nst, ste, tei, ...; ain, ins, nsh, sht, hta, tai, ...;
    alb, lbe, ert) for Name 1: Einstein Albert, Name 2: Ainshtain Albert, Name 3: Albert Einstein]
    Krishna Janakiraman and Sean Marimpietri - Biograph

  30. Merging EAC-CPF Records
    [Flow diagram repeated from slide 19, without the LCNAF and ULAN repositories]

  31. Merge Flagged Records
    • For all of the exact matches and authority
    matches
    – Use the Authoritative form of the name
    – Combine data from each match into a single
    EAC-CPF record
    – Retain all source record IDs and information
    • Finally, output the merged EAC-CPF
    records
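
The merge step can be sketched like this, assuming each flagged record is a small dictionary. The field names (id, name, relations), the record IDs and the relation strings are hypothetical; a real EAC-CPF record carries far more structure.

```python
def merge_records(records, authoritative_name):
    """Combine flagged records into one, keeping every source ID."""
    merged = {
        "name": authoritative_name,
        "alternative_names": [],
        "source_ids": [],
        "relations": [],
    }
    for rec in records:
        merged["source_ids"].append(rec["id"])
        # Keep non-authoritative forms as alternative entries, without duplicates.
        if rec["name"] != authoritative_name and rec["name"] not in merged["alternative_names"]:
            merged["alternative_names"].append(rec["name"])
        for relation in rec.get("relations", []):
            if relation not in merged["relations"]:
                merged["relations"].append(relation)
    return merged


# Hypothetical source records; the name forms are taken from the slides.
flagged = [
    {"id": "loc-12", "name": "Tabb, John B. (John Banister), 1845-1909.", "relations": ["finding-aid-A"]},
    {"id": "vh-88", "name": "Tabb, John Banister, 1845-1909.", "relations": ["finding-aid-A", "finding-aid-B"]},
]
merged = merge_records(flagged, "Tabb, John B. (John Banister), 1845-1909.")
print(merged)
```

Retaining the source IDs means a merge can be traced back to the contributing records later.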

  32. Inputs to SNAC merging
    • LoC: 43,702 EAC-CPF records derived from 1,159
    finding aids
    • OAC: 91,814 EAC-CPF records derived from
    ~15,400 finding aids
    • NWDA: 24,952 EAC-CPF records derived from
    5,568 finding aids
    • VH: 15,175 EAC-CPF records
    • Total: 175,688 Input EAC records for merging
    • Result: 128,781 “unique” names

  33. Another view of the numbers…
    • 95,624 Person names merged from 125,555
    Person records
    • 31,287 Institutions merged from 47,189
    Institution records
    • 1,980 Families merged from 2,899 Family
    records
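
The figures on this slide and the previous one can be cross-checked with a few lines of arithmetic. The numbers are copied from the slides; the variable names are only labels for this sketch. The per-source and per-type input counts both sum to 175,643, and the per-type merged counts sum to 128,891, while the slides state totals of 175,688 inputs and 128,781 unique names. The deck does not explain the difference.

```python
# Input EAC-CPF records per source, as listed on "Inputs to SNAC merging".
inputs_by_source = {"LoC": 43_702, "OAC": 91_814, "NWDA": 24_952, "VH": 15_175}

# Entity type -> (input records, merged names), as listed on this slide.
by_type = {
    "person": (125_555, 95_624),
    "institution": (47_189, 31_287),
    "family": (2_899, 1_980),
}

total_by_source = sum(inputs_by_source.values())
total_records = sum(records for records, _ in by_type.values())
total_merged = sum(merged for _, merged in by_type.values())

for kind, (records, merged) in by_type.items():
    print(f"{kind}: {merged / records:.1%} of input records remain as distinct names")
print(total_by_source, total_records, total_merged)
```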

  34. Merging Conclusions
    • There will not be a single merging method,
    but a staged set of approaches that will
    allow us to go from the simplest exact
    matches, to (we hope) reliably identifying
    various variant forms of a name, etc. when
    corroborated by contextual (date, etc.)
    information

  35. Next
    • Developing an updateable database of
    merged EAC data (dumping Mongo for
    PostgreSQL)
    – Will permit incremental addition of new data
    and support editing and “forced” merges
    • Process the 2M WorldCat archival
    descriptions
    • Process the 150,000 finding aids
    • Convert several hundred thousand archival
    authority records into EAC-CPF and run them
    through the match/merge process

  36. Methods and Processing
    • Extract EAC-CPF records from existing EAD-
    encoded archival descriptions
    – Extracting both creators and referenced CPF names
    • Match EAC-CPF records against one another and
    against existing authority records (ULAN, VIAF,
    LCNAF)
    – Enhance EAC-CPF by normalizing entries, adding
    alternative entries, titles (VIAF), and historical data
    (ULAN)
    • Create a prototype historical resource and
    access system
    – Historical data and social-professional networks
    – Links to archive, library, and museum resources
    (by and about)

  37. Outline
    • User Persona
    • Search and Display
    • Network graph visualization
    • Linked Data / RDF
    • Future Plans

  43. Meet the target users
    • Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families
    and networks. Sometimes he comes to the site looking for information on specific people; other times he is
    looking for information on a specific subject or event. He also TAs an undergraduate history class and
    sometimes has to help students find topics for papers.
    • Connie: Works at an institution that contributed records to the project. Is going to be asking
    themselves how this site would be useful to their users. Wants to understand how their records were used and
    what the added value is.
    • Quincy: Library School Student working to QA record matching.
    • Adele: Person doing authority work during collection processing.
    • Lenny: Lenny likes linked data, and wants to be able to mine the links that have been established
    programmatically.
    Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand or
    product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing)

  44. Outline
    • User Persona
    • Search and Display
    • Network graph visualization
    • Linked Data / RDF
    • Future Plans

  56. Advanced limits match EAC sections

  74. Outline
    • User Persona
    • Search and Display
    • Network graph visualization
    • Context widget (needs new name)
    • Linked Data / RDF
    • Future Plans

  80. Tinkerpop graph database stack
    • Simple "property graph" model
    • "JDBC for graph databases" [SNAC is using Neo4j for the
    graphDB]
    • XPath-like "Gremlin" for graph query
    • REST interfaces with "Rexster"
    • For me, this was 10 to 100 times easier than using RDF
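
The "property graph" model mentioned above can be illustrated with a tiny in-memory version. This is a sketch under stated assumptions and not the TinkerPop, Gremlin or Neo4j API: the PropertyGraph class, the vertex IDs and the edge label correspondedWith are all hypothetical. Vertices and edges both carry key/value properties, and a traversal follows labelled edges.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal property graph: vertices and directed, labelled edges with properties."""

    def __init__(self):
        self.vertices = {}
        self.out_edges = defaultdict(list)

    def add_vertex(self, vertex_id, **props):
        self.vertices[vertex_id] = props

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def out(self, vertex_id, label=None):
        """Follow outgoing edges, optionally restricted to one label."""
        return [dst for lbl, dst, _ in self.out_edges[vertex_id] if label is None or lbl == label]


graph = PropertyGraph()
graph.add_vertex("warren", name="Warren, John Byrne Leicester, 1835-1895", type="person")
graph.add_vertex("gosse", name="Gosse, Sir Edmund William, 1849-1928", type="person")
graph.add_vertex("bridges", name="Bridges, Robert Seymour, 1844-1930", type="person")
graph.add_edge("warren", "correspondedWith", "gosse", source="finding aid")
graph.add_edge("warren", "correspondedWith", "bridges", source="finding aid")

for vertex_id in graph.out("warren", "correspondedWith"):
    print(graph.vertices[vertex_id]["name"])
```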

  90. Outline
    • User Persona
    • Search and Display
    • Network graph visualization
    • Linked Data / RDF
    • Future Plans

  97. What is Linked Open Data?
    • W3C Semantic Web Technology Stack
    • Web of atomized data, not a web of documents
    • RDF; OWL ontologies; SPARQL queries; triple/quad/quint stores
    • httpRange-14; content negotiation; CURIE
    • No restrictions on data use; free and easy license
    • Lenny wants it, but does Randy?

  103. What is Linked Open Data?
    • Getting to the good stuff
    • Blue underlined text
    • Pulling in data from multiple sources, in an intelligent
    way, into a "document"
    • Understand and discover relationships
    • Open access for research, education, private study and
    other fair use

  104. RDFa owl:sameAs

  105. HTML 5 microdata in chron list

  106. Thanks Ed Summers!
    RDF of the social graph
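
As an illustration of what owl:sameAs links and a social graph look like as RDF, the sketch below writes N-Triples by hand. The owl:sameAs and foaf:knows property IRIs are the standard ones, but everything else is an assumption: the example.org URIs are invented, and the deck does not say which vocabulary was used for the social graph.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"


def triple(subject, predicate, obj):
    """Serialize one statement whose three terms are IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def describe(entity_uri, same_as=(), knows=()):
    """Emit identity links and social links for one entity."""
    lines = [triple(entity_uri, OWL_SAME_AS, uri) for uri in same_as]
    lines += [triple(entity_uri, FOAF_KNOWS, uri) for uri in knows]
    return "\n".join(lines)


# Hypothetical URIs; none of these identifiers are real.
warren = "http://snac.example.org/entity/warren-john-byrne-leicester"
print(describe(
    warren,
    same_as=["http://authority.example.org/viaf/0000"],
    knows=["http://snac.example.org/entity/gosse-edmund-william"],
))
```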

  110. http://templates.xdams.net/IBC/ontology/eac-cpf.rdf
    Silvia Mazzini
    regesta.exe srl

  112. &mode=xml2owl [experimental]

  113. My opinion on the use cases for W3C RDF tech
    • Good for publishing data
    • Good for controlled vocabularies
    • Data models?
    • Most people with open source RDF-store type systems do
    the real stuff with Solr
    • Consider a graph database

  115. Outline
    • User Persona
    • Search and Display
    • Linked Data / RDF
    • Network graph visualization
    • Future Plans

  122. Future Plans
    • Conduct assessment activities involving members of target
    audiences to establish mental model of users for design work
    • Scale interface to millions of names
    • Visualizations useful and integrated (network and geospatial)
    • Stable URLs between batches for linked data
    • Social and personalization features (gateway to crowdsourcing)
    • Integration with local systems (such as with the context widget)

  123. • Photo attribution http://www.flickr.com/photos/dsevilla/139656712/in/photostream/
    • http://xtf.cdlib.org/
    • http://code.google.com/p/eac-graph-load/source/browse/README.txt
    • http://tinkerpop.com/
    • http://thejit.org/
    • https://github.com/tingletech/snac-related-widget