
DLF 2012 SNAC/NAAC

tingletech
November 04, 2012

Denver, Colorado
Sunday, November 4, 2012
Adrian Turner, California Digital Library
Ray R. Larson, School of Information, UC Berkeley
Brian Tingle, California Digital Library

http://www.diglib.org/forums/2012forum/social-networks-and-archival-context-project/


Transcript

  1. [Figure: names linked to an Archival Name Authority System: Hamilton, Alexander, 1757-1804; Luce, Clare Boothe, 1903-1987; Oppenheimer, J. Robert, 1904-1967; Patton, George S. (George Smith), 1885-1945; Patton family; Sontag, Susan, 1933-2004; Washington, George, 1732-1799; Whitman, Walt, 1819-1892; Wright, Lloyd, 1890-1978]
  2. [Figure: names linked to an Archival Name Authority System, adding Anthony, Susan B.; Berkeley Free Church; Bernstein, Leonard, 1918-; Block, Herbert, 1909-2001; Bush, Vannevar, 1890-1974; Frankfurter, Felix, 1882-1965; Franklin, Benjamin, 1706-1790; Fuller, R. Buckminster (Richard Buckminster), 1895-1983]
  3. [Figure: a large cloud of names linked to the Archival Name Authority System, mostly personal names such as Engelland, Jurgen (George); Enwall, Ogie (Aage); Erickson, Selma Inez; Holmes, Anna Gudrun Hauge; Nygaard, Lars Thomas, shown alongside Hamilton, Alexander, 1757-1804 and other well-known names]
  4. [Figure: the name cloud repeated and expanded, mixing personal names such as Engelland, Jurgen (George); Hoset, Ole; Howard, Barnett Allen, b. 1827; Nelson, Amanda with names such as Anthony, Susan B.; Franklin, Benjamin, 1706-1790; Patton, George S. (George Smith), 1885-1945]
  5. Background
    • Research and demonstration project
    • Multi-year funding
      – National Endowment for the Humanities (2010-2012)
      – Andrew W. Mellon Foundation (2012-2014)
  6. Objectives
    1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records)
    2. Match, merge, and enhance; build a large test corpus of EAC-CPF records
    3. Create a prototype biographical resource and access system, using those records
  9. Project Team
    • University of Virginia, Institute for Advanced Technology in the Humanities
      – Daniel Pitti (PI) and Worthy Martin
    • UC Berkeley School of Information
      – Ray Larson and Yiming Liu
    • California Digital Library
      – Rachael Hu, Brian Tingle, and Adrian Turner
  10. Project Team
    • Terry Catapano (Columbia University)
    • Sara Sprenkle (Washington and Lee University)
    • Sarah Wells (University of Virginia)
    • Kathy Wisser (Simmons Graduate School of Library and Information Science)
    • Tom Lynch (University of Illinois School of Library and Information Science)
  11. EAC-CPF
    • XML-based data structure standard for encoding archival authority records
    • Authorized name headings for the entity
    • Biographical/historical context for the entity
    • Links to resources created by the entity
    • Links to resources about the entity
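
As a rough illustration of the components listed above, the following sketch builds a skeletal EAC-CPF record with Python's standard `xml.etree.ElementTree`. It assumes the EAC-CPF 2010 namespace (`urn:isbn:1-931666-33-4`) and the element names `nameEntry`, `biogHist`, and `resourceRelation`. The helper `build_eac_cpf`, the record ID, and the sample values are hypothetical. The output is not schema-valid, because required `control` elements are left out.

```python
import xml.etree.ElementTree as ET

# Assumed EAC-CPF 2010 namespace; registered as the default so output has no prefix.
EAC_NS = "urn:isbn:1-931666-33-4"
ET.register_namespace("", EAC_NS)


def q(tag: str) -> str:
    """Qualify a local tag name with the EAC-CPF namespace."""
    return f"{{{EAC_NS}}}{tag}"


def build_eac_cpf(record_id, entity_type, name, bio, created=(), about=()):
    """Build a skeletal (not schema-valid) EAC-CPF tree for one entity.

    Hypothetical helper: required <control> children such as
    maintenanceAgency and maintenanceHistory are omitted for brevity.
    """
    root = ET.Element(q("eac-cpf"))
    control = ET.SubElement(root, q("control"))
    ET.SubElement(control, q("recordId")).text = record_id

    cpf = ET.SubElement(root, q("cpfDescription"))
    identity = ET.SubElement(cpf, q("identity"))
    ET.SubElement(identity, q("entityType")).text = entity_type
    # Authorized name heading for the entity.
    name_entry = ET.SubElement(identity, q("nameEntry"))
    ET.SubElement(name_entry, q("part")).text = name

    # Biographical/historical context for the entity.
    description = ET.SubElement(cpf, q("description"))
    biog = ET.SubElement(description, q("biogHist"))
    ET.SubElement(biog, q("p")).text = bio

    # Links to resources created by the entity and to resources about it.
    relations = ET.SubElement(cpf, q("relations"))
    for rel_type, titles in (("creatorOf", created), ("subjectOf", about)):
        for title in titles:
            rel = ET.SubElement(relations, q("resourceRelation"),
                                {"resourceRelationType": rel_type})
            ET.SubElement(rel, q("relationEntry")).text = title
    return root


record = build_eac_cpf(
    "example-0001",  # hypothetical record ID
    "person",
    "Warren, John Byrne Leicester, 1835-1895",
    "Poet; 3rd and last Baron de Tabley.",
    created=["Tabley Muniments"],
)
print(ET.tostring(record, encoding="unicode"))
```

Resources created by the entity are tagged here with `resourceRelationType="creatorOf"` and resources about it with `"subjectOf"`. Check the EAC-CPF tag library before relying on these values.
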
  12. Data Sources
    • EAD finding aids [~150,000]
      – 13 regional and statewide consortia
      – 35 repositories in US, UK, and France; multiple US federal agencies
    • MARC21 records [~1.5 million]
      – OCLC WorldCat
    • Authority records
      – OCLC Research: Virtual International Authority File (VIAF) [~16 million]
      – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
      – Additional name records from Archives nationales, British Library, NARA, New York State Archives, and Smithsonian Institution Archives
  13. Consortia and Individual Institutions
    Consortia:
    • Archives Florida
    • Archives Hub (UK)
    • Arizona Archives Online
    • EAD FACTORY (OhioLink)
    • Five Colleges
    • Maine Archival Collections Online (MACON)
    • Northwest Digital Archives (NWDA)
    • Online Archive of California
    • Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
    • Rhode Island Archival & Manuscript Collections Online (RIAMCO)
    • Rocky Mountain Online Archive (RMOA)
    • Texas Archival Resources Online (TARO)
    • Virginia Heritage
    Individual institutions:
    • American Philosophical Society
    • Archives nationales (France)
    • Archives of American Art
    • Bibliothèque nationale de France
      – BnF Archives et manuscrits
      – French Union Catalog
    • Brigham Young University
    • Church of Latter Day Saints Archives
    • Columbia University
    • Cornell University
    • Duke University
    • Harvard University
    • Indiana University
    • Library of Congress (publicly available without restriction)
    • Massachusetts Institute of Technology
    • Minnesota Historical Society
    • National Library of Medicine
    • New York Public Library
    • New York University
    • North Carolina State
    • Northwestern University
    • Princeton University
    • Rutgers University
    • Smithsonian Institution Archives
    • Syracuse University
    • University of Alabama
    • University of Chicago
    • University of Connecticut
    • University of Delaware
    • University of Florida
    • University of Illinois
    • University of Kansas
    • University of Maryland
    • University of Michigan Bentley & Special Collections
    • University of Minnesota
    • University of Nebraska
    • University of North Carolina, Chapel Hill
    • University of Utah
    • Utah State Archives
    • Utah State University
    • Yale University
  18. Activities
    1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops
    2. Develop a blueprint for a sustainable, national archival authority cooperative
    Stay tuned for fall 2013!
  19. 2012-11-04 - SLIDE DLF 2012 - Denver
    The Social Networks and Archival Context Project: Status Report
    Adrian Turner*, Ray R. Larson**, Brian Tingle*
    *California Digital Library
    **University of California, Berkeley - School of Information
    Thanks to Daniel V. Pitti of the Institute for Advanced Technology in the Humanities, University of Virginia, and Brian Tingle of the California Digital Library for many of the slides here.
  20. Funding and People
    • Funding and Timeline
      – National Endowment for the Humanities: May 2010-April 2012
      – Andrew W. Mellon Foundation: May 2012-April 2014
    • People
      – Daniel Pitti (PI) and Worthy Martin (Institute for Advanced Technology in the Humanities, University of Virginia)
      – Adrian Turner and Brian Tingle (California Digital Library, University of California)
      – Ray Larson (School of Information, University of California, Berkeley)
  21. Two Interrelated Project
    • Further the transformation of archival description (separating the description of records from the description of the people documented in them) in order to …
    • Enhance access to archival resources, and in fact to all cultural heritage resources
    • Enhance understanding of resources by providing the social-professional context within which people lived and worked
  22. The Source Data
    • EAD-encoded finding aids (guides to archival records): 150K
      – Primarily from U.S. sources, but also U.K. and France
    • Archival authority records (360K)
      – National Archives and Records Administration
      – New York State Archives
      – Smithsonian Institution
      – British Library
      – National Archives (France) & BnF
    • WorldCat Archival Descriptions: 2M
  23. Library and Museum Authority Records
    • Getty Vocabulary Program: Union List of Artist Names (293K personal and corporate names)
    • Virtual International Authority File (16M+ cluster records)
      – Contributed from around the world by national libraries and others
  24. Methods and Processing
    • Extract EAC-CPF records from existing EAD-encoded archival descriptions
      – Extracting both creators and referenced CPF names
    • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
      – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
    • Create a prototype historical resource and access system
      – Historical data and social-professional networks
      – Links to archive, library, and museum resources (by and about)
  25. Example EAD Record (Hub)
    <EAD>
      <EADHEADER LANGENCODING="ISO 639">
        <EADID>GB 0133 TAB</EADID>
        <FILEDESC>
          <TITLESTMT>
            <TITLEPROPER>Tabley Muniments</TITLEPROPER>
          </TITLESTMT>
          <PUBLICATIONSTMT>
            <PUBLISHER>John Rylands University Library of Manchester</PUBLISHER>
            <ADDRESS>
              <ADDRESSLINE>150 Deansgate</ADDRESSLINE>
              <ADDRESSLINE>Manchester</ADDRESSLINE>
              <ADDRESSLINE> ... (Parts removed) …
      </FRONTMATTER>
      <ARCHDESC LEVEL="FONDS" LANGMATERIAL="English">
        <DID>
          <REPOSITORY>University of Manchester, John Rylands University Library of Manchester</REPOSITORY>
          <UNITID ENCODINGANALOG="ISADG3.1.1." COUNTRYCODE="GB" REPOSITORYCODE="0133">GB 0133 TAB</UNITID>
          <UNITTITLE LABEL="Title" ENCODINGANALOG="ISADG3.1.2.">Tabley Muniments</UNITTITLE>
          <UNITDATE LABEL="Dates of Creation" ENCODINGANALOG="ISADG3.1.3.">19th century</UNITDATE>
          <PHYSDESC LABEL="Extent" ENCODINGANALOG="ISADG3.1.5.">
            <EXTENT>1.24 cu.m</EXTENT>
          </PHYSDESC>
          <ORIGINATION LABEL="Creator" ENCODINGANALOG="ISADG3.2.1.">
            <FAMNAME SOURCE="NCARULES">Warren, family, of Tabley, Cheshire</FAMNAME>
            <PERSNAME SOURCE="NCARULES">Warren, John Byrne Leicester, 1835-1895, 3rd Baron de Tabley, poet</PERSNAME>
          </ORIGINATION>
        </DID>
  26. Example EAD Record (Hub)
        <BIOGHIST ENCODINGANALOG="ISADG3.2.2.">
          <HEAD>Administrative/Biographical History</HEAD>
          <P>The poet John Byrne Leicester Warren, later 3rd and last Baron de Tabley, of Tabley near Knutsford, Cheshire, was born in 1835, the son of the 2nd Baron de Tabley (1811-1887), and his wife, Catherina. His mother was Italian, the daughter of the count de Soglio, and Warren spent much of his early childhood with her in Italy and Greece. He was educated at Eton and Christ Church, Oxford. At Oxford he published a volume of poetry. Originally he published under the pseudonyms George F. Preston (1859-1862) and William Lancaster (1863-1868), but latterly under his own name.</P>
          <P>His early verse included <TITLE>Praeterita</TITLE> (1863), <TITLE>Eclogues and Monodramas</TITLE> (1864), <TITLE>Studies in Verse</TITLE> (1865), <TITLE>Philocletes</TITLE> (1866), and <TITLE>Orestes</TITLE> (1868). His early work was Tennysonian in style, but he was later to be influenced by both Browning and Swinburne. In 1873 he produced … (some data removed) …
  27. Example EAD Record (Hub)
        <SCOPECONTENT ENCODINGANALOG="ISADG3.3.1.">
          <HEAD>Scope and Content</HEAD>
          <P>The collection consists mainly of the personal papers of the 3rd Baron de Tabley. The papers reflect his interests in literature, politics, botany and numismatics and include correspondence with numerous prominent later Victorian figures. Attention should also be drawn to de Tabley’s extensive and important collection of armorial bookplates.</P>
          <P>Correspondents include Sir Mountstuart Grant Duff, Edmund Gosse, Lord Houghton, A.C.Benson, and Robert Bridges. There are volumes of Tabley's essays and verse, as well as a considerable number of notebooks and loose manuscripts of verse and other writings. There are various bundles and boxes relating to &quot;Coins&quot;, &quot;Botany&quot;, &quot;Poetry&quot;, &quot;Literary&quot;, &quot;Financial&quot; and bookplates.</P>
        </SCOPECONTENT>
        <ADD>
          <OTHERFINDAID ENCODINGANALOG="ISADG3.4.6.">
            <P>Preliminary survey list.</P>
          </OTHERFINDAID>
          <RELATEDMATERIAL ENCODINGANALOG="ISADG3.5.3.">
            <P>There is correspondence with the 3rd Baron de Tabley among the Edward Freeman Papers, held at JRULM. The Library also has custody of the important Tabley Book Collection.</P>
          </RELATEDMATERIAL>
          <SEPARATEDMATERIAL>
            <P>The family and estate papers of the Leicester-Warren Family of Tabley are held by Cheshire Record Office. Some of these papers were originally in the custody of the John Rylands University Library of Manchester.</P>
          </SEPARATEDMATERIAL>
        </ADD>
  28. Example EAD Record (Hub)
        <CONTROLACCESS>
          <HEAD>Index terms</HEAD>
          <GEOGNAME SOURCE="NCARULES">
            <EMPH ALTRENDER="a">Tabley Inferior</EMPH>
            <EMPH ALTRENDER="a-">Cheshire SJ7378</EMPH>
          </GEOGNAME>
          <PERSNAME SOURCE="NCARULES">
            <EMPH ALTRENDER="surname">Benson</EMPH>
            <EMPH ALTRENDER="forename">Arthur Christopher</EMPH>
            <EMPH ALTRENDER="dates">1862-1923</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE="NCARULES">
            <EMPH ALTRENDER="surname">Bridges</EMPH>
            <EMPH ALTRENDER="forename">Robert Seymour</EMPH>
            <EMPH ALTRENDER="dates">1844-1930</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE="NCARULES">
            <EMPH ALTRENDER="surname">Duff</EMPH>
            <EMPH ALTRENDER="title">Sir</EMPH>
            <EMPH ALTRENDER="forename">Mountstuart Elphinstone Grant</EMPH>
            <EMPH ALTRENDER="dates">1829-1906</EMPH>
            <EMPH ALTRENDER="epithet">Knight</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE="NCARULES">
            <EMPH ALTRENDER="surname">Gosse</EMPH>
            <EMPH ALTRENDER="title">Sir</EMPH>
            <EMPH ALTRENDER="forename">Edmund William</EMPH>
            <EMPH ALTRENDER="dates">1849-1928</EMPH>
            <EMPH ALTRENDER="epithet">Knight</EMPH>
          </PERSNAME>
          <PERSNAME SOURCE="NCARULES">
            <EMPH ALTRENDER="surname">Milnes</EMPH>
            <EMPH ALTRENDER="forename">Richard Monckton</EMPH>
            <EMPH ALTRENDER="dates">1809-1885</EMPH>
            <EMPH ALTRENDER="epithet">1st Baron Houghton</EMPH>
          </PERSNAME>
          <SUBJECT SOURCE="LCSH">
            <EMPH ALTRENDER="a">Bookplates</EMPH>
          </SUBJECT>
          <SUBJECT SOURCE="LCSH">
            <EMPH ALTRENDER="a">Botany</EMPH>
          </SUBJECT>
          <SUBJECT SOURCE="LCSH">
            <EMPH ALTRENDER="a">Numismatics</EMPH>
          </SUBJECT>
          <SUBJECT SOURCE="LCSH">
            <EMPH ALTRENDER="a-">Poetry</EMPH>
            <EMPH ALTRENDER="a">Modern</EMPH>
            <EMPH ALTRENDER="y">19th century</EMPH>
          </SUBJECT>
        </CONTROLACCESS>
      </ARCHDESC>
    </EAD>
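
Slide 24 describes extracting both creators and referenced CPF names from EAD. The following minimal sketch of that step assumes the uppercase, un-namespaced tags shown in the Hub example. It treats names under `ORIGINATION` as creators and names under `CONTROLACCESS` as referenced entities. The function `extract_cpf_names` is hypothetical, not the project's extraction code. Real EAD 2002 files often use lowercase, namespaced tags instead.

```python
import xml.etree.ElementTree as ET

# Assumption: uppercase, un-namespaced tags as in the Archives Hub example.
NAME_TAGS = {"PERSNAME": "person", "FAMNAME": "family", "CORPNAME": "corporateBody"}
SECTIONS = (("ORIGINATION", "creator"), ("CONTROLACCESS", "referenced"))


def name_text(elem):
    """Join a name's text, including nested <EMPH> parts, into one heading."""
    return ", ".join(t.strip() for t in elem.itertext() if t.strip())


def extract_cpf_names(ead_xml):
    """Return de-duplicated (role, entity_type, name) tuples from an EAD string."""
    root = ET.fromstring(ead_xml)
    found = []
    for section, role in SECTIONS:
        for container in root.iter(section):
            for elem in container.iter():
                if elem.tag in NAME_TAGS:
                    found.append((role, NAME_TAGS[elem.tag], name_text(elem)))
    # Nested sections can yield the same name twice; keep first occurrences in order.
    return list(dict.fromkeys(found))
```

With the Hub record, the Warren family and the 3rd Baron would come back as creators. Benson, Bridges, Duff, Gosse, and Milnes would come back as referenced persons, with their `EMPH` parts joined by commas.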
  29. 2010-2012 Extraction Results
    • Source data: 30,000 finding aids
    • EAC-CPF records extracted
      – LoC: 43,702 from 1,159 finding aids
      – OAC: 91,811 from ~15,400
      – NWDA: 22,609 from 5,160
      – VH: 15,175 from 8,390
      – Total: 173,297
  31. The Problem
    • Proliferation of the forms of names
      – Different names for the same person
      – Different people with the same names
    • Examples
      – From Books in Print (semi-controlled but not consistent)
      – ERIC author index (not controlled)
  32. Library and Archive Authority
    • Library (or bibliographic) authority control is almost exclusively about the control of names
    • Archival authority control involves biographical-historical description of the CPF entity
      – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
      – But also biographical-historical description
        · Prose
        · Chronological list
    • Archival authority control provides context for understanding records: the context of their creation and their provenance
  33. [Diagram: Merging EAC-CPF Records. Components: EAC Repository; VIAF, LCNAF, and ULAN Repositories; "Connect exactly matching records"; "Connect records using name authority information" (Cheshire Search); Repository of connected EAC Records (MongoDB); Merge; Repository of merged EAC Records]
  34. [Diagram: Merging EAC-CPF Records. Components: EAC Repository; VIAF Repository; "Connect exactly matching records"; "Connect records using name authority information" (Cheshire Search); Repository of connected EAC Records (MongoDB); Merge; Repository of merged EAC Records]
  35. Connect Exact Matches
    • The EAC-CPF records provide the names without having to parse texts, etc.
    • This allows us to use simple methods such as exact matching
      – Assume identical name entries mean the same person/corporate body/family
      – Enter the full names and record IDs into a database and flag IDs with the same names for merging
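
The exact-match step can be sketched in a few lines of Python, with an in-memory dictionary standing in for the database mentioned above. The function name and record IDs are hypothetical. The name forms are taken from slide 37.

```python
from collections import defaultdict


def flag_exact_matches(records):
    """Group record IDs whose name entries are character-for-character identical.

    records: iterable of (record_id, name_entry) pairs.
    Returns {name_entry: [record_id, ...]} only for names shared by 2+ records.
    """
    by_name = defaultdict(list)
    for record_id, name in records:
        by_name[name].append(record_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


records = [  # hypothetical record IDs; name forms from slide 37
    ("oac-1", "Tabb, John Banister, 1845-1909."),
    ("loc-7", "Tabb, John Banister, 1845-1909."),
    ("nwda-3", "Tabb, John B. (John Banister), 1845-1909."),
]
print(flag_exact_matches(records))
```

Only `oac-1` and `loc-7` are flagged. `nwda-3` names the same person but is missed, which is the weakness of relying on identical entries.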
  36. But…
    • Exact-match merging assumes that archives follow LC cataloging practice in their EAD records
      – There are some problems with this assumption
  37. Some failures for merging…
    • Different abbreviations:
      – A. & G. Carisch & C.
      – A. & G. Carisch & Co.
    • And spacing issues:
      – A. C. Peters & Bro.
      – A. C. Peters & Brother.
      – A. C. Peters. (??)
      – A. C.Peters & Bro.
    • Completeness and alternate rules
      – Tabb, John B. (John Banister), 1845-1909.
      – Tabb, John Banister, 1845-1909.
    • Also differing transliterations for non-Latin scripts
  38. More…
    • Variant romanizations (and spacing):
      – M. P. Belaieff.
      – M. P. Belaïeff.
      – M. P. Bieliaev.
      – M.P. Belaïeff.
      – M.P.Belaïeff.
    • Initials vs. names:
      – Zabolotskii, N.A.
      – Zabolotskii, Nikolai Alekseevich, 1903-1958.
      – Zabolotskii.
  39. More…
    • Inverted order vs. uninverted
      – Taylor, Zachary, 1784-1850.
      – Zachary Taylor.
    • Various combinations:
      – Tchaikovsky, Peter I.
      – Tchaikovsky, Pëtr Il.
      – Tchaikovsky, Piotr Ilyich.
      – Tchaikovsky, Pyotr Il.
      – Tchaikovsky, Pyotr Ilyich.
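
Slide 24 mentions normalizing entries. One hypothetical way to absorb the spacing and diacritic variants above is a loose comparison key. The `normalize_name` helper is an illustration, not the project's method. It also shows the limits of this approach: the key cannot reconcile different romanizations (Belaieff vs. Bieliaev), abbreviations (Bro. vs. Brother), or inverted order.

```python
import re
import unicodedata


def normalize_name(name):
    """Loose comparison key for a name heading (hypothetical helper).

    Decomposes accented letters and drops the combining marks, case-folds,
    then removes punctuation and whitespace, so spacing and diacritic
    variants collide.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[\W_]+", "", no_marks.casefold())


for variant in ["M. P. Belaieff.", "M. P. Belaïeff.", "M. P. Bieliaev.",
                "M.P. Belaïeff.", "M.P.Belaïeff."]:
    print(f"{variant!r:20} -> {normalize_name(variant)}")
```

Removing all whitespace and punctuation also risks conflating genuinely different names. A key like this is therefore safer for proposing candidates than for merging automatically.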
  41. Search Authority Files
    • For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
      – Search both the “authoritative” and “non-authoritative” forms
      – Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
      – Flag EAC records that match the same authority record as potential matches
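
The flagging logic can be approximated with exact lookups against every authoritative and non-authoritative form. This deliberately swaps Cheshire's probabilistic and Boolean search for a plain dictionary lookup, so it only illustrates how candidate matches are flagged. `flag_authority_matches`, `auth-1`, and the record IDs are hypothetical.

```python
from collections import defaultdict


def flag_authority_matches(eac_names, authorities):
    """Flag EAC records that match the same authority record.

    eac_names: {eac_id: name}
    authorities: {authority_id: (authoritative_form, [non_authoritative_forms])}
    Returns {authority_id: sorted EAC IDs} for authority records hit 2+ times.
    """
    # Index every form, authoritative or not, back to its authority record(s).
    form_index = defaultdict(set)
    for auth_id, (preferred, variants) in authorities.items():
        for form in (preferred, *variants):
            form_index[form].add(auth_id)

    hits = defaultdict(set)
    for eac_id, name in eac_names.items():
        # A hit on a non-authoritative form counts as a candidate for the
        # authoritative form of the same authority record.
        for auth_id in form_index.get(name, ()):
            hits[auth_id].add(eac_id)
    return {a: sorted(ids) for a, ids in hits.items() if len(ids) > 1}
```

Unlike the exact-match step, this groups the two Tabb headings from slide 37, provided an authority record lists one of them as a variant of the other.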
  42. Shingle Language Model for names

    • Name: Einstein Albert • Shingle sequence: ein, ins, nst, ste, tei, ein, …, ert • The probability that the sequence (ins, nst, ste) follows ein is very high for the name Einstein • NGRAM or Shingle Matching (Krishna Janakiraman and Sean Marimpietri - Biograph)
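The shingle idea can be sketched in a few lines. This is a simplified first-order model (each shingle predicts only the next one), trained here on a single name; it is not Biograph's implementation. Shingles in this sketch also span the space between name parts (e.g. "n a").

```python
from collections import Counter, defaultdict


def shingles(name, k=3):
    """Overlapping k-character shingles of a lowercased, space-normalized name."""
    text = " ".join(name.lower().split())
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def transition_model(names, k=3):
    """Estimate P(next shingle | current shingle) from a list of names."""
    counts = defaultdict(Counter)
    for name in names:
        sequence = shingles(name, k)
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


print(shingles("Einstein Albert"))
# ['ein', 'ins', 'nst', 'ste', 'tei', 'ein', 'in ', 'n a', ' al', 'alb', 'lbe', 'ber', 'ert']
model = transition_model(["Einstein Albert"])
print(model["ein"])  # {'ins': 0.5, 'in ': 0.5}
print(model["ins"])  # {'nst': 1.0}
```

A real model would be trained on many names, so that a transition such as ein to ins is weighed against everything else observed after ein.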
  43. Shingle Language Model for names

    • Name 1: Einstein Albert • Name 2: Ainshtain Albert • Name 3: Albert Einstein • [Diagram: the trigram shingles of each name, e.g. ein, ins, nst, ste, tei (Einstein); ain, ins, nsh, sht, hta, tai (Ainshtain); alb, lbe, …, ert (Albert), aligned to show the shingles the names share] (Krishna Janakiraman and Sean Marimpietri - Biograph)
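To compare the three names, one simple option is the Jaccard similarity of their shingle sets. It is swapped in here for illustration; the slide does not say how Biograph scores a pair of names.

```python
def shingle_set(name, k=3):
    """Set of k-character shingles of a lowercased, space-normalized name."""
    text = " ".join(name.lower().split())
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


names = ["Einstein Albert", "Ainshtain Albert", "Albert Einstein"]
sets = [shingle_set(n) for n in names]
print(round(jaccard(sets[0], sets[1]), 3))  # 0.471  (8 shared of 17)
print(round(jaccard(sets[0], sets[2]), 3))  # 0.6    (9 shared of 15)
print(round(jaccard(sets[1], sets[2]), 3))  # 0.25   (5 shared of 20)
```

Because set overlap ignores order, the inverted Name 3 scores highest against Name 1, while the respelled Name 2 still shares ins and the Albert shingles.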
  44. Repository of merged EAC Records

    [Diagram: Merging EAC-CPF Records, repeated from slide 40]
  45. Merge Flagged Records

    • For all of the exact matches and authority matches – Use the authoritative form of the name – Combine data from each match into a single EAC-CPF record – Retain all source record IDs and information • Finally, output the merged EAC-CPF records
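The merge step can be sketched with plain dictionaries standing in for EAC-CPF records. The keys ("id", "name", "relations") and the relation values are hypothetical; real records would carry far more structure, and output would be serialized as EAC-CPF XML.

```python
def merge_records(authoritative_name, records):
    """Combine flagged records into one record under the authoritative name.

    Each input record is a dict with hypothetical keys "id", "name" and
    optionally "relations"; every source ID and variant name is retained.
    """
    merged = {"name": authoritative_name, "alternative_names": [],
              "source_ids": [], "relations": []}
    for record in records:
        merged["source_ids"].append(record["id"])
        name = record["name"]
        if name != authoritative_name and name not in merged["alternative_names"]:
            merged["alternative_names"].append(name)
        for relation in record.get("relations", []):
            if relation not in merged["relations"]:
                merged["relations"].append(relation)
    return merged


flagged = [
    {"id": "nwda-007", "name": "Tabb, John B. (John Banister), 1845-1909.",
     "relations": ["finding-aid-A"]},
    {"id": "vh-013", "name": "Tabb, John Banister, 1845-1909.",
     "relations": ["finding-aid-B"]},
]
merged = merge_records("Tabb, John B. (John Banister), 1845-1909.", flagged)
print(merged["source_ids"])         # ['nwda-007', 'vh-013']
print(merged["alternative_names"])  # ['Tabb, John Banister, 1845-1909.']
```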
  46. Inputs to SNAC merging

    • LoC: 43,702 EAC-CPF records derived from 1,159 finding aids • OAC: 91,814 EAC-CPF records derived from ~15,400 finding aids • NWDA: 24,952 EAC-CPF records derived from 5,568 finding aids • VH: 15,175 EAC-CPF records • Total: 175,688 input EAC records for merging • Result: 128,781 “unique” names
  47. Another view of the numbers…

    • 95,624 Person names merged from 125,555 Person records • 31,287 Institutions merged from 47,189 Institution records • 1,980 Families merged from 2,899 Family records
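The figures on slides 46 and 47 can be cross-checked with a little arithmetic. The per-source and per-type input counts both sum to 175,643, slightly below the 175,688 total printed on slide 46, and the per-type merged counts sum to 128,891, against the 128,781 "unique" names reported there. The sketch only restates the printed numbers.

```python
# Counts as printed on the two slides above.
inputs_by_source = {"LoC": 43_702, "OAC": 91_814, "NWDA": 24_952, "VH": 15_175}
records_and_merged_by_type = {
    "Person": (125_555, 95_624),
    "Institution": (47_189, 31_287),
    "Family": (2_899, 1_980),
}

total_inputs = sum(inputs_by_source.values())
total_records = sum(records for records, _ in records_and_merged_by_type.values())
total_merged = sum(merged for _, merged in records_and_merged_by_type.values())
print(total_inputs, total_records, total_merged)  # 175643 175643 128891

for entity, (records, merged) in records_and_merged_by_type.items():
    # Share of input records that survive as distinct entities after merging.
    print(f"{entity}: {merged / records:.1%} of records remain")
```

By these counts, about 76% of person records, 66% of institution records and 68% of family records remain as distinct entities after merging.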
  48. Merging Conclusions

    • There will not be a single merging method, but a staged set of approaches that will take us from the simplest exact matches to (we hope) reliably identifying variant forms of a name when they are corroborated by contextual information (dates, etc.)
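A staged matcher of the kind described might look like the sketch below. The three stages, the 0.5 threshold, and the rule that fuzzy matches need identical four-digit years are all assumptions made for illustration; Jaccard similarity of trigram shingles is swapped in as the fuzzy measure.

```python
import re
import unicodedata


def fold(name):
    """Letters only, lowercased, with diacritics removed (dates are dropped)."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.findall(r"[^\W\d_]+", plain.lower()))


def years(name):
    """Four-digit years in a heading, used as contextual evidence."""
    return set(re.findall(r"\d{4}", name))


def shingle_set(text, k=3):
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def match_stage(a, b, threshold=0.5):
    """Return the first stage at which two headings match, or None."""
    if a == b:
        return "exact"
    fa, fb = fold(a), fold(b)
    ya, yb = years(a), years(b)
    # Stage 2: same folded name, and no conflicting dates.
    if fa == fb and (ya == yb or not ya or not yb):
        return "normalized"
    # Stage 3: similar spelling, accepted only when the dates corroborate it.
    sa, sb = shingle_set(fa), shingle_set(fb)
    union = sa | sb
    similarity = len(sa & sb) / len(union) if union else 0.0
    if similarity >= threshold and ya and ya == yb:
        return "fuzzy+dates"
    return None


print(match_stage("M. P. Belaieff.", "M.P.Belaïeff."))  # normalized
print(match_stage("Tabb, John B. (John Banister), 1845-1909.",
                  "Tabb, John Banister, 1845-1909."))  # fuzzy+dates
print(match_stage("Zabolotskii, N.A.",
                  "Zabolotskii, Nikolai Alekseevich, 1903-1958."))  # None
```

The Zabolotskii pair is left unmatched because one heading has no dates to corroborate the similarity, which keeps the fuzzy stage conservative.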
  49. Next

    • Developing an updatable database of merged EAC data (replacing MongoDB with PostgreSQL) – Will permit incremental addition of new data and support editing and “forced” merges • Process the 2M WorldCat archival descriptions • Process the 150,000 finding aids • Convert several hundred thousand archival authority records into EAC-CPF and run them through the match/merge process
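One way to keep merge decisions incrementally updatable, including editor-forced merges, is a disjoint-set (union-find) structure over record IDs. This is a sketch of that general technique, not the PostgreSQL design the slide refers to; the record IDs are hypothetical, and union-find cannot undo a merge, so splitting a bad cluster would need extra bookkeeping.

```python
from collections import defaultdict


class MergeClusters:
    """Disjoint-set (union-find) over record IDs; each set is one merged entity."""

    def __init__(self):
        self.parent = {}

    def add(self, record_id):
        """Incrementally add a new record as its own cluster."""
        self.parent.setdefault(record_id, record_id)

    def find(self, record_id):
        """Return the cluster representative, compressing the path on the way."""
        root = record_id
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[record_id] != root:
            next_id = self.parent[record_id]
            self.parent[record_id] = root
            record_id = next_id
        return root

    def merge(self, a, b):
        """Join two clusters, e.g. after an automatic match or a forced merge."""
        self.add(a)
        self.add(b)
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        groups = defaultdict(set)
        for record_id in self.parent:
            groups[self.find(record_id)].add(record_id)
        return list(groups.values())


clusters = MergeClusters()
clusters.merge("oac-001", "loc-042")    # automatic exact match
clusters.merge("nwda-007", "vh-013")    # "forced" merge by an editor
clusters.add("new-100")                 # a record from a later batch
print(sorted(sorted(group) for group in clusters.clusters()))
# [['loc-042', 'oac-001'], ['new-100'], ['nwda-007', 'vh-013']]
```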
  50. Methods and Processing

    • Extract EAC-CPF records from existing EAD-encoded archival descriptions – Extracting both creators and referenced CPF names • Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF) – Enhance EAC-CPF by normalizing entries and adding alternative entries, titles (VIAF), and historical data (ULAN) • Create a prototype historical resource and access system – Historical data and social-professional networks – Links to archive, library, and museum resources (by and about)
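The extraction step might be sketched with Python's standard ElementTree parser, under these assumptions: the EAD is un-namespaced EAD 2002 (namespaced files would need tags such as `{urn:isbn:1-931666-22-9}persname`), creators come from `<origination>`, referenced names come from `<controlaccess>`, and the mapping of each name into an EAC-CPF record is left out. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

NAME_TAGS = {"persname", "corpname", "famname"}

# Hypothetical, un-namespaced EAD 2002 fragment.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <origination label="Creator">
        <persname>Tabb, John B. (John Banister), 1845-1909.</persname>
      </origination>
    </did>
    <controlaccess>
      <persname>Hamilton, Alexander, 1757-1804.</persname>
      <corpname>A. C. Peters &amp; Bro.</corpname>
    </controlaccess>
  </archdesc>
</ead>"""


def clean(element):
    """Element text with internal whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def extract_names(ead_xml):
    """Return (creators, referenced) lists of (tag, name) pairs from an EAD document."""
    root = ET.fromstring(ead_xml)
    creators = [(child.tag, clean(child))
                for origination in root.iter("origination")
                for child in origination if child.tag in NAME_TAGS]
    referenced = [(child.tag, clean(child))
                  for access in root.iter("controlaccess")
                  for child in access if child.tag in NAME_TAGS]
    return creators, referenced


creators, referenced = extract_names(SAMPLE_EAD)
print(creators)    # [('persname', 'Tabb, John B. (John Banister), 1845-1909.')]
print(referenced)  # [('persname', 'Hamilton, Alexander, 1757-1804.'), ('corpname', 'A. C. Peters & Bro.')]
```

Only direct children of each `<controlaccess>` are read, so nested `<controlaccess>` groups are each visited once by `root.iter` rather than counted twice.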
  51. For More Information

    • http://socialarchive.iath.virginia.edu/ (Project website) • http://socialarchive.iath.virginia.edu/xtf/search (public prototype)
  52. Outline

    • User Persona • Search and Display • Network graph visualization • Linked Data / RDF • Future Plans
  53. Meet the target users

    • Randy: Graduate student working on a PhD that involves biographies and the study of diplomatic families and networks. Sometimes he comes to the site looking for information on specific people; other times he is looking for information on a specific subject or event. He also TAs an undergraduate history class and sometimes has to help students find topics for papers. • Connie: Works at an institution that contributed records to the project. Will be asking how this site would be useful to their users. Wants to understand how their records were used and what the added value is. • Quincy: Library school student working to QA record matching. • Adele: Person doing authority work during collection processing. • Lenny: Likes linked data, and wants to be able to programmatically mine the links that have been established. Personas are fictional characters created to represent the different user types within a targeted demographic, attitude and/or behavior set that might use a site, brand or product in a similar way. http://en.wikipedia.org/wiki/Persona_(marketing)
  54. Outline

    • User Persona • Search and Display • Network graph visualization • Linked Data / RDF • Future Plans
  55. Outline

    • User Persona • Search and Display • Network graph visualization • Context widget (needs new name) • Linked Data / RDF • Future Plans
  56. Tinkerpop graph database stack

    • Simple "property graph" model • "JDBC for graph databases" [SNAC is using Neo4j for the graph database] • XPath-like "Gremlin" for graph queries • REST interfaces with "Rexster" • For me, this was 10 to 100 times easier than using RDF
  57. Outline

    • User Persona • Search and Display • Network graph visualization • Linked Data / RDF • Future Plans
  58. What is Linked Open Data?

    • W3C Semantic Web technology stack • Web of atomized data, not a web of documents • RDF; OWL ontologies; SPARQL queries; triple/quad/quint stores • httpRange-14; content negotiation; CURIE • No restrictions on data use; free and easy license • Lenny wants it, but does Randy?
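To make "atomized data" concrete: in RDF each fact is a single subject-predicate-object triple. The sketch below writes two such facts in N-Triples syntax. The FOAF name property is a real vocabulary term, but the identifiers and the `associatedWith` predicate under example.org are hypothetical, and the literal escaping covers only backslashes, quotes and newlines.

```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Hypothetical vocabulary and identifiers, for illustration only.
ASSOCIATED_WITH = "http://example.org/vocab/associatedWith"
HAMILTON = "http://example.org/id/hamilton"
WASHINGTON = "http://example.org/id/washington"


def ntriple(subject, predicate, obj, literal=False):
    """Serialize one RDF statement as an N-Triples line."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


print(ntriple(HAMILTON, FOAF_NAME, "Hamilton, Alexander, 1757-1804.", literal=True))
print(ntriple(HAMILTON, ASSOCIATED_WITH, WASHINGTON))
```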
  59. What is Linked Open Data?

    • Getting to the good stuff • Blue underlined text • Pulling in data from multiple sources, in an intelligent way, into a "document" • Understand and discover relationships • Open access for research, education, private study and other fair use
  60. My opinion on the use cases for W3C RDF tech

    • Good for publishing data • Good for controlled vocabularies • Data models? • Most people with open source RDF-store type systems do the real work with Solr • Consider a graph database
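For comparison with RDF, the "property graph" model from the Tinkerpop slide can be sketched in plain Python: vertices and labelled edges both carry key/value properties, and queries are short traversals. The `out`/`in_` steps are loosely modelled on Gremlin's out/in steps, but this is not Gremlin, Neo4j or Rexster; the vertex IDs, the `referencedIn` label and the collection are hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal in-memory property graph: vertices and labelled edges carry properties."""

    def __init__(self):
        self.vertices = {}
        self.out_edges = defaultdict(list)  # vertex ID -> [(label, target, props)]
        self.in_edges = defaultdict(list)   # vertex ID -> [(label, source, props)]

    def add_vertex(self, vertex_id, **props):
        self.vertices[vertex_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))
        self.in_edges[target].append((label, source, props))

    def out(self, vertex_id, label):
        """Vertices reached by following outgoing edges with this label."""
        return [t for lbl, t, _ in self.out_edges[vertex_id] if lbl == label]

    def in_(self, vertex_id, label):
        """Vertices with an edge of this label pointing at vertex_id."""
        return [s for lbl, s, _ in self.in_edges[vertex_id] if lbl == label]


def co_referenced(graph, person):
    """Two-hop traversal: other names referenced in the same collections."""
    return {
        other
        for collection in graph.out(person, "referencedIn")
        for other in graph.in_(collection, "referencedIn")
        if other != person
    }


g = PropertyGraph()
g.add_vertex("hamilton", name="Hamilton, Alexander, 1757-1804.", type="person")
g.add_vertex("washington", name="Washington, George, 1732-1799.", type="person")
g.add_vertex("coll-1", title="Hypothetical finding aid", type="collection")
g.add_edge("hamilton", "referencedIn", "coll-1")
g.add_edge("washington", "referencedIn", "coll-1")
print(co_referenced(g, "hamilton"))  # {'washington'}
```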
  61. Outline

    • User Persona • Search and Display • Linked Data / RDF • Network graph visualization • Future Plans
  62. Future Plans

    • Conduct assessment activities involving members of target audiences to establish a mental model of users for design work • Scale interface to millions of names • Make visualizations useful and integrated (network and geospatial) • Stable URLs between batches for linked data • Social and personalization features (gateway to crowdsourcing) • Integration with local systems (such as with the context widget)
  63. • Photo attribution: http://www.flickr.com/photos/dsevilla/139656712/in/photostream/ • http://xtf.cdlib.org/ • http://code.google.com/p/eac-graph-load/source/browse/README.txt • http://tinkerpop.com/ • http://thejit.org/ • https://github.com/tingletech/snac-related-widget