Eero Hyvönen: ‘Harmonising the Heterogeneous: Shared Ontologies and Linked Data in Archives, Museums and Libraries’

More Decks by Cultures of Knowledge: Networking the Republic of Letters, 1550-1750

Transcript

  1. Harmonising the Heterogeneous: Shared Ontologies and Linked Data in Archives, Museums and Libraries

    Eero Hyvönen, Semantic Computing Research Group (SeCo), Aalto University, http://www.seco.tkk.fi/
  2. Outline

    - Background: Semantic Web and Linked Data is Here
    - Vision: Content Infrastructure is Needed!
      – Ontology and Data Services
    - Practice: Applications
  3. GGG (Giant Global Graph): Big Boys Have Entered the Game

    - Google Knowledge Graph
    - Microsoft Satori Knowledge Base
  4. Limitations of Non-semantic Web

    <artifact>
      <id> NBA:H26069:467 </id>
      <target> cup and plate </target>
      <material> porcelain </material>
      <creationLocation> Germany </creationLocation>
      <creator> Meissen </creator>
    </artifact>

    - This metadata cannot answer the following questions:
      – Find all vessels?
      – Find all ceramic products?
      – Find artifacts manufactured in Europe?
      – Does the city of Meissen manufacture ceramics?
  5. Semantic Web Solution: Ontologies

    NBA-H26069-467
      :object "cup and plate" ;
      :object_concept object:cup ;
      :object_concept object:plate ;
      :material "porcelain" ;
      :material_concept object:porcelain ;
      :creationPlace "Germany" ;
      :creationPlace_concept place:Germany ;
      :creator "Meissen" ;
      :creator_concept actor:Meissen .

    [Diagram: NBA-H26069-467 is linked by object_concept to object:cup and object:plate, which are connected by rdfs:subClassOf to object:vessel, and by its creation place concept to place:Germany, which is loc:partOf place:Europe. The other nodes shown are place:Meissen, actor:Meissen and material:porcelain (material_concept). The concepts come from four ontologies: object, place, actor and material.]

    The same questions are asked again against this linked description:
    – Find all vessels?
    – Find all ceramic products?
    – Find artifacts manufactured in Europe?
    – Does the city of Meissen manufacture ceramics?
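The reasoning on this slide can be sketched in a few lines of Python. This is only an illustration, not the CultureSampo implementation. The dictionaries below are hand-written stand-ins for the object and place ontologies, and the links from cup and plate to vessel are read off the diagram labels as an assumption.

```python
# Hypothetical mini-ontologies; every link here is an illustrative assumption.
SUBCLASS_OF = {"object:cup": "object:vessel", "object:plate": "object:vessel"}
PART_OF = {"place:Germany": "place:Europe"}

# The artifact description from the slide, reduced to its concept links.
ARTIFACTS = {
    "NBA-H26069-467": {
        "object_concept": ["object:cup", "object:plate"],
        "creationPlace_concept": ["place:Germany"],
    }
}


def ancestors(node, parent_of):
    """Return the node together with everything reachable via parent links."""
    chain = [node]
    while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
        chain.append(parent_of[chain[-1]])
    return set(chain)


def find(prop, target, hierarchy):
    """Find artifacts whose value for prop is target or lies below target."""
    return sorted(
        artifact_id
        for artifact_id, props in ARTIFACTS.items()
        if any(target in ancestors(value, hierarchy) for value in props.get(prop, []))
    )


vessels = find("object_concept", "object:vessel", SUBCLASS_OF)
made_in_europe = find("creationPlace_concept", "place:Europe", PART_OF)
print(vessels, made_in_europe)
```

Each concept has a single parent here to keep the sketch short. A real ontology allows several parents per concept and would need a graph traversal instead of a chain walk.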
  6. Cultural Content Complexity - Heterogeneous and Interlinked

    Encyclopedia, Artifacts, Maps, Videos, Buildings, Fine arts, Biographies, Narratives, Literature, Cultural sites, Music
  7. Problem 2: Cultural Content Production System - Distributed and Independent

    Museums, Libraries, Archives, Land survey, Linked Data, Web 2.0 sites, Media, Citizens
  8. CultureSampo Model in a Nutshell

    [Diagram labels: FinnONTO Ontology Infrastructure; Semantic Metadata; Content Providers: Land survey, Museums, Archives, Linked Data, Citizens, Libraries, Web 2.0 sites, Media]
  9. The Biography Center and libraries collect personal histories

    person | name | occupation | place of birth | ...
    H1 | Akseli Gallen-Kallela | artist | Lemu
    H2 | Gustaf Mannerheim | marshal | Askainen
    ...

    [Graph: H1 has type human, name ”Akseli Gallen-Kallela”, occupation artist and place of birth Lemu. H2 has type human, name ”Gustaf Mannerheim”, occupation marshal and place of birth Askainen.]

    Source: Biography Center
  10. A museum catalogues paintings

    work | name | creator | time | subject | ...
    T1 | Mannerheimin muotokuva (Portrait of Mannerheim) | Akseli Gallen-Kallela | 1929 | Gustaf Mannerheim
    T2 | Aino-triptyykki (Aino Triptych) | Akseli Gallen-Kallela | 1891 | Aino, Kalevala
    ...

    [Graph: T1 has type painting, time 1929, a creator with the name ”Akseli Gallen-Kallela” and a subject with the name ”Gustaf Mannerheim”.]

    Source: Art Museum Collection
  11. The National Land Survey maintains place registers

    municipality | province
    Askainen | Varsinais-Suomen lääni
    Helsinki | Uudenmaan lääni
    Lemu | Varsinais-Suomen lääni
    Turku | Varsinais-Suomen lääni
    ...

    [Graph: Askainen, Lemu and Turku have type municipality and are part-of Varsinais-Suomen lääni, which has type province and is part-of Suomi (Finland).]

    Source: Land Survey
  12. FinnONTO develops an ontology infrastructure

    [Graph of the KOKO ontology: the concepts artist, human, marshal, painting, place, occupation, municipality, province, time period and physical object are connected by superclass links up to the top-level classes concept, enduring, changing and abstract.]

    Source: FinnONTO
  13. Semantic RDF Network Connects it All: Web of Data (GGG)

    [Graph: the persons H1 (”Akseli Gallen-Kallela”, artist, born in Lemu) and H2 (”Gustaf Mannerheim”, marshal, born in Askainen), the painting T1 (1929, with creator and subject links), the municipalities and Varsinais-Suomen lääni (part-of Suomi), and the superclass hierarchy of concepts now form a single connected network.]

    Source: CultureSampo
  14. Skill Documentations (Intangible CH)

    - A Semantic Video Viewer
    - Semantic recommendations
    - Semantic process description
    - Dynamic information about the video scene
  15. Semantic Kalevala: The computer ”understands” the national epic Kalevala

    - Semantically annotated 50 poems of the national epic
      – Events and narratives
    - Translation into modern Finnish
    - Links to related art etc.
  16. Lessons Learned from MuseumFinland – Finnish Museums on the Semantic Web 2002-2004

    - First system integrating museum collections from different museums based on semantic web ontologies and RDF
      – Online since 2004: http://www.museosuomi.fi/
      – [Hyvönen, et al., Journal of Web Semantics, 2005]
    - The W3C domain-independent ontology standard RDF(S) is useful, but
      – Domain-specific ”standards”, i.e. actual ontologies, are missing
      – No ontologies of Finnish cultural heritage were available
    - There were useful thesauri around, such as
      – General Finnish Thesaurus (YSA) (24,000 concepts)
      – Thesaurus for Museum Domain (MASA) (6,000 concepts)
    - Let’s create and share ontologies together on a national level!
      – With interoperability to international ontologies
    - => National Finnish Ontology Project FinnONTO (2003-2012)
  17. Problem 1: Data Value Alignment: Same Nodes (URIs) Used Between Datasets

    [Graph: the same network as on slide 13. The persons H1 and H2, the painting T1, the places (Lemu, Askainen, Turku, Varsinais-Suomen lääni, Suomi) and the concept hierarchy are shared nodes, so the biography, museum and land survey datasets connect through them.]

    Source: CultureSampo
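A minimal Python sketch of why shared URIs matter follows. It is an illustration under stated assumptions, not the CultureSampo data model. The identifiers `person:H2`, `work:T1` and `place:...` and the property names are invented here to mirror the slide's example. Once the museum record points at the same person node as the biography record, a question that spans three datasets becomes a simple join.

```python
# Three independent datasets as (subject, predicate, object) triples.
# All identifiers are hypothetical stand-ins for shared URIs.
BIOGRAPHY = {
    ("person:H1", "name", "Akseli Gallen-Kallela"),
    ("person:H1", "birthplace", "place:Lemu"),
    ("person:H2", "name", "Gustaf Mannerheim"),
    ("person:H2", "birthplace", "place:Askainen"),
}
MUSEUM = {
    ("work:T1", "type", "painting"),
    ("work:T1", "creator", "person:H1"),
    ("work:T1", "subject", "person:H2"),
    ("work:T1", "time", "1929"),
}
LAND_SURVEY = {
    ("place:Askainen", "part-of", "place:Varsinais-Suomen_laani"),
    ("place:Lemu", "part-of", "place:Varsinais-Suomen_laani"),
    ("place:Varsinais-Suomen_laani", "part-of", "place:Suomi"),
}


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def works_about_people_born_in(graph, region):
    """Works whose subject is a person born in a place that is part of region."""
    hits = set()
    for work, predicate, person in graph:
        if predicate != "subject":
            continue
        for place in objects(graph, person, "birthplace"):
            if region in objects(graph, place, "part-of"):
                hits.add(work)
    return sorted(hits)


merged = BIOGRAPHY | MUSEUM | LAND_SURVEY
alone = works_about_people_born_in(MUSEUM, "place:Varsinais-Suomen_laani")
linked = works_about_people_born_in(merged, "place:Varsinais-Suomen_laani")
print(alone, linked)
```

The museum data alone returns nothing, because it knows neither birthplaces nor regions. The merged graph returns the portrait. If the museum had stored only the literal name ”Gustaf Mannerheim” instead of a shared node, the join would fail in the same way.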
  18. Problem 2: Metadata Model Alignment

    1) ”Dublin Core Approach”: Aligning properties into subPropertyOf hierarchies in RDFS
    2) ”CIDOC CRM Approach”: Transforming all metadata into a foundational event-based model
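The first approach can be sketched as follows. This is a simplified illustration, not an RDFS reasoner. The property names `museum:painter` and `library:author`, and the book record, are invented for the example. Only `dc:creator` is a real Dublin Core element.

```python
# Hypothetical local properties aligned under a shared Dublin Core property.
SUBPROPERTY_OF = {
    "museum:painter": "dc:creator",
    "library:author": "dc:creator",
}

# Records from two collections that use different schemas (illustrative data).
RECORDS = [
    ("work:T1", "museum:painter", "Akseli Gallen-Kallela"),
    ("book:B1", "library:author", "Example Author"),
    ("work:T1", "museum:subject", "Gustaf Mannerheim"),
]


def generalises(prop, target):
    """True if prop equals target or reaches it through subPropertyOf links."""
    while prop is not None:
        if prop == target:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def query(target_property):
    """Return (subject, value) pairs for the property and all its subproperties."""
    return [(s, o) for s, p, o in RECORDS if generalises(p, target_property)]


creators = query("dc:creator")
print(creators)
```

A search on the general property finds values recorded under either local schema, while each collection keeps its own more specific property. The event-based CIDOC CRM approach is not covered by this sketch.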
  19. FinnONTO Thesis

    - Semantic Web needs a content infrastructure for interoperability
    - Especially useful in
      – Cross-domain applications
      – Collaborative Web 2.0 applications
  20. FinnONTO Industrial & Public Organization Consortium

    - FinnONTO
      – 2003-2004: 14 funding organizations
      – 2004-2005: 16 funding organizations
      – 2005-2006: 30 funding organizations
      – 2006-2007: 37 funding organizations
    - FinnONTO 2.0
      – 2008-2010: 38 funding organizations
    - Semantic UBICOM-services (SUBI): 17 funding organizations, 2010-2012, 0.55 MEUR
    - FinnONTO 2.0: 35 funding organizations, 2010-2012, 1.52 MEUR
    - Linked Data Finland (LDF): 19 funding organizations, 2012-2014, 0.49 MEUR
  21. Our Approach: Preventing Interoperability Problems in Advance by Collaboration

    - Sharing Ontologies
    - Sharing Metadata Schemas
    - Sharing Linked Open Data

    [Hyvönen et al., ICSC 2007, ESWC 2009, SW Journal 2010]
  22. Major Domain Ontology Types Needed

    - General Concept Ontologies
    - Actor Ontologies
    - Place Ontologies
    - Time Ontologies
    - Event Ontologies
    - Domain Nomenclatures and Terminologies
    - ”Domain ontology” refers to thesaurus- or gazetteer-like knowledge organization systems (KOSs) whose resources are used as element values of metadata descriptions
  23. A Method for Transforming Keyword Thesauri into Light Weight Ontologies

    - Input: a thesaurus using standard ISO 2788 / SFS 5471
      – Semantic relations: NT, BT, RT, USE, USED FOR, ...
      – Used widely in Finland
      – Fairly large vocabularies, e.g., YSA 24,000 terms
    - Output: a light-weight subclass-of ontology
      – Complete subclass-of hierarchy
      – Hierarchical relationships
        » Based on NT / BT relations of the thesaurus
        » Transitive subclass hierarchy skeletons
        » Part-of and associative relations distinguished from subclass-of
      – Disambiguation of meanings
        » Semantic disambiguation of term meanings
        » Preferred vs. non-preferred terms (USE / USED FOR)
        » Multilinguality
  24. Example of Transitivity Issues

    - make-up mirrors BT mirrors
    - mirrors BT furniture
    - => make-up mirrors subClassOf furniture
    - When searching for furniture, make-up mirrors would be in the result!
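The problem can be reproduced with a small Python sketch. It assumes a thesaurus stored as a plain mapping from each term to its broader term (BT), which is a simplification of ISO 2788 data. The `REVIEWED` table and the choice of which link to reclassify are illustrative assumptions, not the decision made in FinnONTO.

```python
# BT (broader term) links exactly as listed on the slide.
BT = {"make-up mirrors": "mirrors", "mirrors": "furniture"}


def broader_closure(term, bt):
    """Follow BT links upward and collect every broader term that is reached."""
    reached = []
    while term in bt:
        term = bt[term]
        if term in reached:  # guard against accidental cycles
            break
        reached.append(term)
    return reached


def search(concept, bt):
    """Terms that fall under concept when BT is read as a transitive subclass-of."""
    return sorted(t for t in bt if concept in broader_closure(t, bt))


naive_hits = search("furniture", BT)

# After manual review each link gets a relation type (assumed labels).
REVIEWED = {
    "make-up mirrors": ("mirrors", "subClassOf"),
    "mirrors": ("furniture", "related"),
}
subclass_only = {t: b for t, (b, kind) in REVIEWED.items() if kind == "subClassOf"}
reviewed_hits = search("furniture", subclass_only)
print(naive_hits, reviewed_hits)
```

Reading every BT link as subclass-of puts make-up mirrors under furniture. Keeping only the links that survive review as genuine subclass relations removes the unwanted hit, at the cost of manual checking.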
  25. KOKO: Linked Ontology Cloud / Network

    - Each ontology was aligned with the top ontology YSO by machine and then corrected by hand [Hyvönen et al., ESWC, 2008]
    - Protégé editor (Stanford University) was used for editing
    - Skosify tool was developed for validating and transforming ontologies into SKOS [Suominen, Hyvönen, EKAW, 2012]
    - Some additional tools were developed, such as MUTU for managing changes in the top ontology [Pessala et al., ISWC WS, 2011] and KOAN for inspecting overlapping areas
  26. KOKO – Linked Open Ontology Cloud

    [Diagram labels: YSO, AFO, MAO, TAO, VALO, KOKO, ..., Your ontology?]

    Aligning KOKO ontologies [Hyvönen et al., ESWC 2009]
  27. KOKO: Linked Open Ontology Cloud

    No. | Name | Ontology domain | Underlying thesaurus | Size | Maintaining organization
    1 | YSO | General domain | General Finnish Thesaurus, YSA, Allärs | 23700 | National Library, Åbo Academy
    2 | MUSO | Music | Thesaurus of Music, MUSA/CILLA | 1000 | National Library
    3 | MAO | Museum domain | Thesaurus of Museum Domain, MASA | 6800 | National Board of Antiquities
    4 | AFO | Agriculture, forestry | Agriforest Thesaurus | 5500 | Viikki Science Library
    5 | TAO | Applied arts | Thesaurus of Applied Arts | 2600 | University of Eastern Finland and Library of Aalto University
    6 | VALO | Photography | Thesaurus of Photography Literature, Thesaurus of Photography Technology | 1900 | Finnish Museum of Photography
    7 | MERO | Seafaring, shipping | Thesaurus of Seafaring | 1400 | Finnish Transport Agency
    8 | KAUNO | Literature subjects | Thesaurus of Literature, Bella | 4900 | Finnish Public Libraries, Kirjastot.fi
    9 | JUHO | Public government | Thesaurus of Finnish Government, VNAS | 6400 | Ministry of Finance
    10 | TERO | Health promotion | YSA, TESA, MeSH, Stameta | 22000 | Various organizations
    11 | KITO | Literature research | Thesaurus of Literature Research | 900 | Finnish Literature Society
    12 | KULO | Culture research | Thesaurus for Folk Culture Studies | 1600 | Finnish Literature Society
    13 | KTO | Linguistics | Thesaurus of Linguistics | 1000 | Research Institute for the Languages in
    14 | PUHO | Defense | Thesaurus of Defence Administration | 2000 | Finnish Defence Forces
    15 | POIO | Points of interest | TGN, Geonames, LDG, SUO | 4600 | Various organizations
    TOTAL | | | | 86300 |
  28. Ontology Developers in Action

    - Work now continues at the National Library based on the FinnONTO prototype
      – Validating and correcting ontologies (e.g. translations)
      – Finalizing the parts and structure of the ontology cloud
      – Minimizing overlapping areas
  29. Resolving and Aligning Identities

    - Linguistic variations
    - Pseudonyms (Mark Twain = Samuel Clemens), nicknames
    - Honorary names (e.g., popes, kings, ...)
    - Name changes over time (e.g., due to marriage)
    - Different names used in different communities
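One common way to handle such variation is to map every known name form to a single identifier. The Python sketch below shows that idea only. It is not the ONKI People implementation. The identifier `actor:twain` and the alias list are hypothetical, and the normalisation step (case folding and stripping diacritics) is just one possible choice.

```python
import unicodedata


def normalise(name):
    """Fold case, strip diacritics and collapse whitespace for matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


# Hypothetical authority records: one identifier, many name forms.
ALIASES = {
    "actor:twain": ["Mark Twain", "Samuel Clemens", "Samuel Langhorne Clemens"],
}

# Lookup table from normalised name form to identifier.
INDEX = {normalise(name): uri for uri, names in ALIASES.items() for name in names}


def resolve(name):
    """Return the identifier for a name form, or None if it is unknown."""
    return INDEX.get(normalise(name))


print(resolve("mark  TWAIN"), resolve("Samuel Clemens"), resolve("Unknown Person"))
```

Stripping diacritics helps with linguistic variation but can also merge distinct names, so a production service would normally keep the original forms and ask a human to confirm uncertain matches.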
  30. ONKI People Ontology Service

    - A demo with Getty ULAN, 120,000 instances [Kurki, Hyvönen, ICSD, 2010]
    - Faceted Search
  31. Use Case: Semantic National Biography

    http://www.ldf.fi/dataset/history/map.html#4.00/60.00/25.00

    - 6300 biographies of the National Biography (Finnish Literature Society)
    - CIDOC CRM data (116 000 events) linked with 4 additional datasets to enrich data
    - Based on Finnish Literature Society (SKS) material

    [Hyvönen et al., ISWC P&D, 2014]
  32. Historical Place Ontology of Finland (counties, municipalities)

    [Kauppinen et al., 2007, Hyvönen et al., 2011]

    - Modeling Changes of Regions
    - http://www.seco.tkk.fi/ontologies/sapo/
    - http://demo.seco.tkk.fi/saha/sapo/resource.shtml?uri=http%3A%2F%2Fwww.yso.fi%2Fonto%2Fsapo%2FHelsinki_1946-1965_
  33.

    - Modeling linear and cyclic time
    - Time periods are different in different countries
      – E.g. Bronze Age in Egypt and Nordic Countries
    - Modeling uncertainty in time [Kauppinen et al., IJHCS, 2010]
  34. Using Events as References (URI)

    - Example: Bettmann Archive photo [Doerr, 2004]
    - Dublin Core record related to the Yalta Conference
    - Metadata does not refer explicitly to the Yalta Conference or WW II
    - Based on shared events, content could be aggregated
      – Searching for content dealing with the events
      – Finding semantic associations based on events and their clusters
  35. Skill Documentation in CultureSampo: A Semantic Video Viewer

    - Semantic recommendations
    - Semantic process description
    - Dynamic information about the video scene
  36. Lots of Different Kinds in Different Domains

    - Diseases, drugs
    - Trademarks of different kinds
    - Chemical compounds
    - Biological species
    - Laws
    - …
  37. Biological Taxonomic Ontologies and Namelists

    Collab.: Finnish Museum of Natural History, BirdLife, Vanamo
    http://www.seco.tkk.fi/ontologies/biology/

    name list | year | region | group | taxa | of which species
    Catalogus Lepidopterorum. Fenniae et regionum adiacentium. 1. Macrolepidoptera. | 1962 | Finland | Butterflies and moths | 313 | 161
    Suomen perhosten luettelo | 1977 | Finland | Butterflies and moths | 256 | 120
    The Lepidoptera of Europe. A Distributional Checklist | 1996 | Scandinavia | Butterflies and moths | 12256 | 9804
    Checklist of Finnish Lepidoptera | 2002 | Finland | Butterflies and moths | 265 | 126
    Suomen perhosten luettelo - päivitetty versio (updated version) | 2008 | Finland | Butterflies and moths | 4573 | 2987
    Norwegian Lepidoptera | 2008 | Norway | Butterflies and moths | 3244 | 2210
    Catalogue of the Lepidoptera of Russia (only NW parts) | 2008 | Northwest Russia | Butterflies and moths | 3251 | 2171
    Estonian Lepidoptera. Catalogue | 2008 | Estonia | Butterflies and moths | 3477 | 2389
    The Fly Fauna of Finland (Draft) | 2008 | Finland | Flies | 6351 | 4800
    Suomen loispistiäisluettelo (Hymenoptera, Parasitica). Osa 1. heimo Ichneumonidae, alaheimot Pimplinae, Poemeniinae, Rhyssinae ja Diacritinae - A check list of Finnish Hymenoptera, Parasitica. Part 1 | 1995 | Finland | Parasitic wasps | 282 | 210
    Suomen loispistiäisluettelo (Hymenoptera, Parasitica). Osa 2. alaheimot Tryphoninae, Eucerotinae, Adelognathinae, Xoridinae ja Agriotypinae - A check list of Finnish Hymenoptera, Parasitica. Part 2 | 1999 | Finland | Parasitic wasps | 398 | 311
    Suomen loispistiäisluettelo (Hymenoptera, Parasitica). Osa 3. alaheimo Cryptinae - A check list of Finnish Hymenoptera, Parasitica. Part 3 | 1999 | Finland | Parasitic wasps | 919 | 727
    Suomen loispistiäisluettelo (Hymenoptera, Parasitica). Osa 4. heimo Ichneumonidae, alaheimot Lycorinae, Neorhacodinae, Stilbopinae, Banchinae ja Ctenopelmatinae - A Check list of Finnish Hymenoptera, Parasitica. Part 4 | 2000 | Finland | Parasitic wasps | 786 | 646
    Suomen loispistiäisluettelo (Hymenoptera, Parasitica). Osa 5. heimo Ichneumonidae, alaheimot Tersilochinae, Ophioninae, Anomalinae, Paxylommatinae, Cremastinae ja Campopleginae - A check list of Finnish Hymenoptera, Parasitica. Part 5 | 2003 | Finland | Parasitic wasps | 733 | 587
    Suomen ripsiäisten luettelo - Checklist of Finnish Thysanoptera | 2008 | Finland | Thrips | 219 | 140
    Suomen nivelkärsäisten luettelo - Check-list of Finnish Hemiptera | 2008 | Finland | Hemiptera | 2690 | 1697
    Suomen verkkosiipiset ja kärsäkorennot - The Neuroptera s.l. and Mecoptera of Finland | 2008 | Finland | Neuroptera and Mecoptera | 113 | 72
    Maailman lintujen suomenkieliset nimet (Finnish Names of the Birds of the World) | 2010 | World | Birds | 12125 | 9740
    Nisäkkäiden nimilista (beta) | 2008 | World | Mammals | 6062 | 4629
    Suomen myrkkypistiäisten luettelo | 2010 | Finland | Aculeate wasps | 1048 | 664

    [Tuominen et al., ESWC, 2011, ESWC WS 2013]
  38. Centralized ONKI Ontology Services

    1. Ontology Developers
      - Collaborative development of interdependent ontologies
      - Versioning and support for updates
    2. Information Searchers
      - Support concept-based search
      - Keyword disambiguation
      - Finding the right search concepts
    3. Information Indexers
      - Support finding indexing concepts
      - Keyword disambiguation
      - Support indexing patterns

    Example: Nokia: company or city?
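Keyword disambiguation of the ”Nokia: company or city?” kind can be sketched like this in Python. The concept records and the `ex:` identifiers are invented for illustration and do not come from any ONKI vocabulary. The function names are likewise hypothetical.

```python
# Hypothetical concept records: the same label can name different concepts.
CONCEPTS = [
    {"uri": "ex:nokia-company", "label": "Nokia", "type": "company"},
    {"uri": "ex:nokia-city", "label": "Nokia", "type": "city"},
    {"uri": "ex:turku", "label": "Turku", "type": "city"},
]


def candidates(keyword):
    """All concepts whose label matches the keyword, ignoring case."""
    key = keyword.casefold()
    return [c for c in CONCEPTS if c["label"].casefold() == key]


def disambiguate(keyword, wanted_type=None):
    """Return matching concept identifiers, optionally narrowed by type."""
    hits = candidates(keyword)
    if wanted_type is not None:
        hits = [c for c in hits if c["type"] == wanted_type]
    return [c["uri"] for c in hits]


ambiguous = disambiguate("nokia")
chosen = disambiguate("Nokia", wanted_type="city")
print(ambiguous, chosen)
```

An indexer or searcher who types the bare keyword is shown both candidates and picks one, so the stored annotation is a concept identifier rather than an ambiguous string.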
  39.

    - FinnONTO ontologies are published as a service by the National Ontology Library ONKI since 2008
    - Living laboratory for human and machine users (APIs)
      » 14 000 human users / month
      » 400 registered machine user domains
    - Two implementations
      » ONKI 3: http://www.onki.fi
      » ONKI Light, on SPARQL for SKOS vocabularies: http://light.onki.fi/
  40. What is ONKI? Architecture

    [Architecture diagram labels: ONKI SKOS, ONKI Geo, ONKI People, ...; ONKI API; ONKI library www.onki.fi; ONKI Widgets; ONKI Browser(s); ONKI Web Service; Ontology download; External sources; Ontology directory; ONKI Publish; Your ONKI; pURIfier; ONKI ctrl; ONKI Fetch; Ontologies; ONKI Feedback; Domain and ontology specific ONKI implementations; Services for end-users; Services for ontology owners; Ontology data; Annotators, software developers, content service end-users; Ontology developers & publishers]

    [Viljanen et al., ESWC 2010]
  41. ONKI Widget for Mashups

    - Ontology services are automatically available after publishing a vocabulary or ontology with ONKI
    - Simple AJAX-based widget for creating mash-ups
  42. KOKO Ontologies and ONKI Light Deployed in 2014 by the National Library as Finto: http://finto.fi

    - Permanent free national service funded by Ministry of Education and Culture and Ministry of Finance from state budget
  43. Our ”7-star” Model and LDF.fi Data Hotel

    - Goals: enhance re-usability and data quality

    [Image: Burj Al Arab]
  44. What is LDF.fi?

    - Living Laboratory for publishing Linked Open Data
      – Same idea as in ontology services (e.g. ONKI http://onki.fi )
      – But for data and schemas
    - Data Services for
      – Linked Datasets
      – Schemas
    - Links to
      – Related services
      – Related applications
    - Learning Center
      – For Publishing and Using Linked Data
  45. Services

    – 5-star Linked Data Services
      » Getting RDF based on URIs, LD browsing, downloading
      » SPARQL Endpoint Services (using Fuseki)
    – Documentation services (automatic)
      » For schemas, vocabulary usage, statistics
    – Validation
      » Checking possible problems
    – Visualization
    – Data Curation
      » Automatic annotation, RDF editing, data linking
    – Data Policies (URI Minting etc.)
    – Online Learning Materials
      » Starting in 2015
    – Your data?
      » Open service for publishing useful Linked Data
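As a rough illustration of how a client might address a SPARQL endpoint service, the Python sketch below builds a GET request URL in the style of the SPARQL 1.1 Protocol, where the query text travels in a `query` parameter. The endpoint address is a placeholder and not an actual LDF.fi URL. The query itself is an assumed example over SKOS labels, and no network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "https://example.org/sparql"

# Assumed example query: list ten concepts with their preferred labels.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
} LIMIT 10
""".strip()


def build_request_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Decoding the URL again shows that the query survives the round trip.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

Sending the request and choosing a result format (for example through an `Accept` header) are left out here, since they depend on the HTTP client and on what the particular endpoint supports.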
  46. Key Idea and Goal is Very Simple: Supporting Linked Data Applications

    [Diagram labels: BookSampo, CultureSampo, YOUR NEXT APPLICATION; LD; API; LDF.fi Service]
  47. Automatic Documentation

    - Schema-based Documentation
      – Classes, properties, domains, ranges, cardinalities
      – Automatic documentation generator can be used
        » Live OWL Documentation Environment LODE http://www.essepuntato.it/lode used at the moment
    - Vocabulary Usage
      » We developed a new service: http://vocab.at
      » Provides a detailed analysis of vocabularies and their usage in a dataset
      » Points out some quality issues
  48. Open your data so that it links with the others’ data!

    1. Everybody’s data is enriched for ”free”
    2. Redundant work is minimized by collaboration
    3. Work can be shared in better ways
    4. Data can be reused in different applications
  49. 98 ?!

    http://www.seco.tkk.fi/
    http://www.seco.tkk.fi/publications

    Eero Hyvönen: Publishing and Using Cultural Heritage Linked Data on the Semantic Web. Morgan & Claypool, Palo Alto, CA, USA, October, 2012.