Semantic Brno

Thomas Steiner

October 10, 2011

Transcript

  1. How to help Search Engines make Sense of your Content, Part 1

    Thomas Steiner | Research Scientist, Hamburg, @tomayac
    Arnaud Brousseau | Intern, Hamburg, @arnaudbrousseau
    Part of the slides from: Kavi Goel | Product Manager, Search Experience, Mountain View
  2. Some things we've done for a while

    Page with a list of useful features: http://www.google.com/help/features.html
    And so on...
  3. How are we able to understand all this information on web pages?

    Truthfully, we often can't. General information understanding is a very hard problem.
  4. Lots of exciting problems left to tackle

    Some searches that are still hard to do:
    "pasta recipe with white sauce, no garlic, less than 30min to make"
    "fridge under 19 inches tall so it can fit under my shelf"
    "what new bands should I be listening to?"
    "suggest a trip itinerary for my visit to Chicago this weekend"
  5. Lots of exciting problems left to tackle

    Some searches that are still hard to do:
    "pasta recipe with white sauce, no capers, less than 30min to make"
    "fridge under 19 inches tall so it can fit under my shelf"
    "what new bands should I be listening to?"
    "suggest a trip itinerary for my visit to Chicago this weekend"

    Recipe search
  6. A three-pronged approach

    General data extraction
    •  High recall, low precision
    •  When it works, it's amazing. But it's hard to get it to work.
    •  Powers Google Squared, fact extraction, search tools

    Structured data markup
    •  Encourage webmasters to encode semantic labels in their web pages
    •  Use open standards like Microdata/HTML5, Microformats, RDFa
    •  Data is available to anyone, helps spur innovation across the web
    •  Medium recall, medium to high precision
    •  Powers rich snippets

    Feeds
    •  Ideal for rapidly changing data
    •  Risk that data can go out of sync with the corresponding web page
    •  Low recall, medium to high precision, fresh data updates
  7. A three-pronged approach

    General data extraction
    •  High recall, low precision
    •  When it works, it's amazing. But it's hard to get it to work.
    •  Powers Google Squared, fact extraction, search tools

    Structured data markup
    •  Encourage webmasters to encode semantic labels in their web pages
    •  Use open standards like Microdata/HTML5, Microformats, RDFa
    •  Data is available to anyone, helps spur innovation across the web
    •  Medium recall, medium to high precision
    •  Powers rich snippets

    Feeds
    •  Ideal for rapidly changing data
    •  Risk that data can go out of sync with the corresponding web page
    •  Low recall, medium to high precision, fresh data updates
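The recall and precision labels attached to each approach can be made concrete with a small sketch. The fact sets below are invented purely for illustration and do not come from the talk; only the two definitions (precision = correct extractions / all extractions, recall = correct extractions / all relevant facts) are standard.

```python
def precision_recall(extracted: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for a set of extracted facts."""
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: the facts actually on a page vs. what each approach finds.
relevant = {"name", "price", "image", "merchant"}
general_extraction = {"name", "price", "image", "merchant",
                      "footer", "ad", "menu", "sidebar"}
structured_markup = {"name", "price", "image"}

print(precision_recall(general_extraction, relevant))  # (0.5, 1.0)
print(precision_recall(structured_markup, relevant))   # (1.0, 0.75)
```

In this toy setup, general extraction finds everything but with a lot of noise (high recall, low precision), while markup only yields what the webmaster chose to label, but what it yields is correct.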
  8. How can the Semantic Web help?

    Kitten – taken from: http://www.flickr.com/photos/fweez/278017185/
    http://creativecommons.org/licenses/by-sa/2.0/deed.en
  9. How can the Semantic Web help?

    •  WTF is the Semantic Web?

    Based on http://www.focus.com/images/view/29135/
  10. How can the Semantic Web help?

    •  The Semantic Web is about giving meaning to things.

    Based on http://www.focus.com/images/view/29135/
  11. How can the Semantic Web help?

    •  HTML defines a computer-readable syntax, but not one that provides meaning.

    Based on http://www.focus.com/images/view/29135/
  12. How can the Semantic Web help?

    •  Using Semantic Web technologies, the meaning of things can be interwoven, and the documents become machine-understandable.
    •  If you love the bunny, use the search engine of your choice and search for "Oolong".

    Based on http://www.focus.com/images/view/29135/
  13. Introducing RDFa

    •  The current HTML:

    <img src="http://imgs.xkcd.com/comics/debian_main.png"
         title="dpkg: error processing package (--purge): subprocess pre-removal script returned error exit 163: OH_GOD_THEYRE_INSIDE_MY_CLOTHES"
         alt="debian-main" />
    This work is licensed under a
    <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Attribution-NonCommercial 2.5 License
    </a>.
  14. Introducing RDFa

    •  Adding license information with RDFa:

    <img src="http://imgs.xkcd.com/comics/debian_main.png"
         title="dpkg: error processing package (--purge): subprocess pre-removal script returned error exit 163: OH_GOD_THEYRE_INSIDE_MY_CLOTHES"
         alt="debian-main" />
    This work is licensed under a
    <a about="http://imgs.xkcd.com/comics/debian_main.png"
       rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Attribution-NonCommercial 2.5 License
    </a>.
  15. Introducing RDFa

    •  Adding license information with RDFa:

    <img src="http://imgs.xkcd.com/comics/debian_main.png"
         title="dpkg: error processing package (--purge): subprocess pre-removal script returned error exit 163: OH_GOD_THEYRE_INSIDE_MY_CLOTHES"
         alt="debian-main" />
    This work is licensed under a
    <a about="http://imgs.xkcd.com/comics/debian_main.png"
       rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Attribution-NonCommercial 2.5 License
    </a>.
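To illustrate what the added `about` attribute buys a machine reader, here is a minimal sketch that pulls a (subject, relation, object) statement out of the markup above. It uses only Python's standard `html.parser`; the `LicenseFinder` class is a hypothetical helper written for this example and is not a conforming RDFa processor (it ignores prefixes, nesting, and every attribute other than `about`, `rel`, and `href`).

```python
from html.parser import HTMLParser


class LicenseFinder(HTMLParser):
    """Collect (subject, "license", object) from <a about=... rel="license" href=...>."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Only links that name both what is licensed (about) and the license (href).
        if tag == "a" and a.get("rel") == "license" and "about" in a and "href" in a:
            self.triples.append((a["about"], "license", a["href"]))


# Shortened copy of the slide's markup (the long title attribute is omitted).
markup = '''
<img src="http://imgs.xkcd.com/comics/debian_main.png" alt="debian-main" />
This work is licensed under a
<a about="http://imgs.xkcd.com/comics/debian_main.png" rel="license"
   href="http://creativecommons.org/licenses/by-nc/2.5/">
  Creative Commons Attribution-NonCommercial 2.5 License</a>.
'''

finder = LicenseFinder()
finder.feed(markup)
print(finder.triples)
```

Without `about`, the same link only says that some license applies somewhere on the page; with it, the statement is explicitly about the image.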
  16. How to help Search Engines make Sense of your Content, Part 2

    Thomas Steiner | Research Scientist, Hamburg, @tomayac
    Arnaud Brousseau | Intern, Hamburg, @arnaudbrousseau
  17. Introduction to the Semantic Web

    •  Reminder: the Semantic Web is all about triples:

    <http://imgs.xkcd.com/comics/debian_main.png> <cc:license> <http://creativecommons.org/licenses/by-nc/2.5/>
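As a rough sketch of what such a triple looks like as data, the snippet below models it as a subject/predicate/object record and serializes it as one N-Triples line. It assumes that the `cc` prefix stands for the Creative Commons namespace `http://creativecommons.org/ns#`; the `Triple`, `expand`, and `to_ntriples` names are made up for this example.

```python
from typing import NamedTuple

# Assumption: "cc" abbreviates the Creative Commons rights vocabulary.
PREFIXES = {"cc": "http://creativecommons.org/ns#"}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def expand(term: str) -> str:
    """Expand a prefixed name such as cc:license; leave full IRIs untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def to_ntriples(t: Triple) -> str:
    """Serialize a triple of IRIs as a single N-Triples statement."""
    return f"<{expand(t.subject)}> <{expand(t.predicate)}> <{expand(t.obj)}> ."


triple = Triple(
    "http://imgs.xkcd.com/comics/debian_main.png",
    "cc:license",
    "http://creativecommons.org/licenses/by-nc/2.5/",
)
print(to_ntriples(triple))
```

Every statement has the same three-part shape, which is what lets statements from different pages be merged into one graph.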
  18. From the Web to the Web of Data

    Fundamental shift: from sending bits from one host to the other towards making sense of those bits.
  19. From the Web to the Web of Data

    BelgianChocolates.com | Pralinés Deluxe Mix | 2,99€/100g | Shopping Cart
    Labels: Merchant Name, Product Name, Price, Product Image
  20. From the Web to the Web of Data

    How can website owners help Google make sense of their bits? Mark up their content using any of the following syntaxes:
    •  Microformats
    •  Microdata
    •  RDFa

    "[...] We realized that structured data on the Web can and should accommodate multiple encodings."
  21. From the Web to the Web of Data

    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
      <a href="http://www.example.com/events/poisel_offenbach.html" rel="v:url" property="v:summary">Philipp Poisel in Offenbach</a>
      <span property="v:description">See Philipp Poisel in Offenbach</span>
      When:
      <span property="v:startDate" content="2011-01-16T19:00+01:00">Jan 16, 7:00PM</span>
      <span property="v:endDate" content="2011-01-16T21:00+01:00">9:00PM</span>
      Where:
      <span rel="v:location">
        <span typeof="v:Organization">
          <span property="v:name">Capitol</span>,
          <span rel="v:address">
            <span typeof="v:Address">
              <span property="v:street-address">Kaiserstraße 106</span>,
              <span property="v:locality">Offenbach am Main</span>,
            </span>
          </span>
          <span rel="v:geo">
            <span typeof="v:Geo">
              <span property="v:latitude" content="50.10945"></span>
              <span property="v:longitude" content="8.76579"></span>
            </span>
          </span>
        </span>
      </span>
      Category: <span property="v:eventType">Concert</span>
    </div>

    From structured mark-up on a Website...
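To give a feel for how a consumer might read such event markup, here is a deliberately simplified sketch. It takes an abbreviated version of the markup above and collects each `property` value, preferring the machine-readable `content` attribute over the visible text. The `PropertyCollector` class is hypothetical and is not a conforming RDFa processor: it ignores `typeof`, `rel`, nesting, and prefix resolution.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect property -> value pairs from RDFa-style attributes (simplified)."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._pending = None  # property still waiting for its text content

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        if "content" in a:
            # A content attribute overrides whatever text the user sees.
            self.values[prop] = a["content"]
        else:
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.values[self._pending] = data.strip()
            self._pending = None


# Abbreviated version of the slide's markup, for illustration only.
markup = '''
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <span property="v:summary">Philipp Poisel in Offenbach</span>
  <span rel="v:geo"><span typeof="v:Geo">
    <span property="v:latitude" content="50.10945"></span>
    <span property="v:longitude" content="8.76579"></span>
  </span></span>
  Category: <span property="v:eventType">Concert</span>
</div>
'''

collector = PropertyCollector()
collector.feed(markup)
print(collector.values)
```

The `content` attribute is what lets the page show a human-friendly string while still exposing a precise value, as with the coordinates here.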
  22. How do we get this data?

    With RDFa markup:

    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review-aggregate">
      <span rel="v:itemreviewed">
        <h1 property="v:name">Drooling Dog Bar B Q</h1>
        <img rel="v:rating" content="4" src="stars_map.png" alt="4 star rating" />
        <em>based on <span property="v:count">15</span> reviews</em>
      </span>
    </div>

    With Microformats markup:

    <div class="hreview-aggregate">
      <span class="item vcard">
        <h1 class="fn org">Drooling Dog Bar B Q</h1>
        <img class="rating average" src="stars_map.png" alt="4 star rating" />
        <em>based on <span class="count">15</span> reviews</em>
      </span>
    </div>
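The Microformats variant carries the same information in `class` names instead of RDFa attributes. The sketch below reads the slide's `hreview-aggregate` snippet with Python's standard `html.parser`; the `HReviewAggregate` class and its output keys (`name`, `rating_text`, `count`) are assumptions made for this example, not part of any Microformats library, and real parsers handle many more cases.

```python
from html.parser import HTMLParser


class HReviewAggregate(HTMLParser):
    """Read name, rating text and review count from hreview-aggregate class names."""

    def __init__(self):
        super().__init__()
        self.data = {}
        self._pending = None  # key still waiting for its text content

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if "fn" in classes:
            self._pending = "name"
        elif "count" in classes:
            self._pending = "count"
        elif "rating" in classes and "alt" in a:
            # The slide's markup exposes the rating only through the image's alt text.
            self.data["rating_text"] = a["alt"]

    def handle_data(self, data):
        if self._pending and data.strip():
            self.data[self._pending] = data.strip()
            self._pending = None


markup = '''
<div class="hreview-aggregate">
  <span class="item vcard">
    <h1 class="fn org">Drooling Dog Bar B Q</h1>
    <img class="rating average" src="stars_map.png" alt="4 star rating" />
    <em>based on <span class="count">15</span> reviews</em>
  </span>
</div>
'''

parser = HReviewAggregate()
parser.feed(markup)
print(parser.data)
```

Both syntaxes describe the same review aggregate; they differ only in where the labels live.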
  23. Is it all double rainbows?

    Springtime rainbow: taken from http://www.geograph.org.uk/photo/1833025
    http://creativecommons.org/licenses/by/2.0/deed.en
  24. Have fun on your journey to the land of glory

    Penallta pit pony: taken from http://www.geograph.org.uk/photo/90215
    http://creativecommons.org/licenses/by-sa/2.0/