Semantic Brno

Thomas Steiner

October 10, 2011

Transcript

  1. How to help Search Engines make Sense of your Content

    Part 1 Thomas Steiner | Research Scientist, Hamburg, @tomayac Arnaud Brousseau | Intern, Hamburg, @arnaudbrousseau Part of the slides from: Kavi Goel | Product Manager, Search Experience, Mountain View
  2. Some things we've done for a while Stock prices

  3. Some things we've done for a while Calculator

  4. Some things we've done for a while Unit converter

  5. Some things we've done for a while Weather

  6. Some things we've done for a while Definition

  7. Some things we've done for a while Suicide prevention

  8. Some things we've done for awhile Page with a list

    of useful features: http://www.google.com/help/features.html And so on...
  9. Adding structure...

  10. Getting smarter... Revamped fact extraction

  11. …but not smart enough Revamped fact extraction

  12. Our concept car: Google Squared Fully structured search results. Great

    when it works.
  13. How are we able to understand all this information on

    web pages?
  14. Truthfully, we often can't. General information understanding is a very

    hard problem. How are we able to understand all this information on web pages?
  15. Lots of exciting problems left to tackle Some searches that

    are still hard to do "pasta recipe with white sauce, no garlic, less than 30min to make" "fridge under 19 inches tall so it can fit under my shelf" "what new bands should I be listening to?" "suggest a trip itinerary for my visit to Chicago this weekend"
  16. Lots of exciting problems left to tackle Some searches that

    are still hard to do "pasta recipe with white sauce, no capers, less than 30min to make" "fridge under 19 inches tall so it can fit under my shelf" "what new bands should I be listening to?" "suggest a trip itinerary for my visit to Chicago this weekend" Recipe search
  17. A three-pronged approach

    General data extraction
    •  High recall, low precision
    •  When it works, it's amazing. But it's hard to get it to work.
    •  Powers Google Squared, fact extraction, search tools
    Structured data markup
    •  Encourage webmasters to encode semantic labels in their web pages
    •  Use open standards like Microdata/HTML5, Microformats, RDFa
    •  Data is available to anyone, helps spur innovation across the web
    •  Medium recall, medium to high precision
    •  Powers rich snippets
    Feeds
    •  Ideal for rapidly changing data
    •  Risk that data can go out of sync with the corresponding web page
    •  Low recall, medium to high precision, fresh data updates
  19. How can the Semantic Web help? Kitten – taken from:

    http://www.flickr.com/photos/fweez/278017185/ http://creativecommons.org/licenses/by-sa/2.0/deed.en
  20. How can the Semantic Web help? •  WTF is the

    Semantic Web? Based on http://www.focus.com/images/view/29135/
  21. How can the Semantic Web help? •  The Semantic Web

    is about giving meaning to things. Based on http://www.focus.com/images/view/29135/
  22. How can the Semantic Web help? •  HTML defines a

    computer-readable syntax, but not one that provides meaning. Based on http://www.focus.com/images/view/29135/
  23. How can the Semantic Web help? •  Using Semantic Web

    technologies, the meaning of things can be interwoven, and the documents become machine-understandable. •  If you love the bunny, use the search engine of your choice and search for Oolong. Based on http://www.focus.com/images/view/29135/
  24. Introducing triples •  The Semantic Web is all about triples:

    <#Hamburg> <#is_in_the> <#Germany> .
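
    A triple can be modelled as a plain (subject, predicate, object) tuple. The following is a minimal Python sketch of that idea; the `Triple` type and the `serialize` helper are illustrative names chosen here, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (illustrative type)."""
    subject: str
    predicate: str
    object: str


def serialize(triple: Triple) -> str:
    """Render a triple in the angle-bracket notation used on the slide."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


hamburg = Triple("#Hamburg", "#is_in_the", "#Germany")
print(serialize(hamburg))  # prints: <#Hamburg> <#is_in_the> <#Germany> .
```

    Keeping the three parts as separate fields, rather than one string, is what lets software query by subject, predicate, or object.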
  25. Introducing triples •  Real-world usage: <#license> Share Remix Attribution Noncommercial

    . http://xkcd.com/797/
  26. Introducing RDFa •  The current HTML:

    <img src="http://imgs.xkcd.com/comics/debian_main.png"
         title="dpkg: error processing package (--purge): subprocess pre-removal script returned error exit 163: OH_GOD_THEYRE_INSIDE_MY_CLOTHES"
         alt="debian-main" />
    This work is licensed under a
    <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Attribution-NonCommercial 2.5 License
    </a>.
  27. Introducing RDFa •  Adding license information with RDFa:

    <img src="http://imgs.xkcd.com/comics/debian_main.png"
         title="dpkg: error processing package (--purge): subprocess pre-removal script returned error exit 163: OH_GOD_THEYRE_INSIDE_MY_CLOTHES"
         alt="debian-main" />
    This work is licensed under a
    <a about="http://imgs.xkcd.com/comics/debian_main.png"
       rel="license"
       href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Attribution-NonCommercial 2.5 License
    </a>.
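
    To see how the `about`, `rel`, and `href` attributes map onto a triple, here is a rough Python sketch using only the standard library's `html.parser`. It is a deliberate simplification and not a conforming RDFa processor: it only looks at elements that carry all three attributes, and the class name `LicenseTripleParser` is made up for this illustration.

```python
from html.parser import HTMLParser


class LicenseTripleParser(HTMLParser):
    """Collects (subject, predicate, object) from elements with about/rel/href.

    Simplified on purpose: a real RDFa processor also resolves prefixes,
    inherited subjects, and many more attributes.
    """

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if all(key in attributes for key in ("about", "rel", "href")):
            self.triples.append(
                (attributes["about"], attributes["rel"], attributes["href"])
            )


# Shortened version of the markup from the slide (title attribute omitted).
SNIPPET = """
<img src="http://imgs.xkcd.com/comics/debian_main.png" alt="debian-main" />
This work is licensed under a
<a about="http://imgs.xkcd.com/comics/debian_main.png"
   rel="license"
   href="http://creativecommons.org/licenses/by-nc/2.5/">
  Creative Commons Attribution-NonCommercial 2.5 License
</a>.
"""

parser = LicenseTripleParser()
parser.feed(SNIPPET)
for subject, predicate, obj in parser.triples:
    print(f"<{subject}> <{predicate}> <{obj}> .")
```

    Without the `about` attribute, the license would describe the page as a whole; with it, the statement is about the image itself.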
  29. How to help Search Engines make Sense of your Content

    Part 2 Thomas Steiner | Research Scientist, Hamburg, @tomayac Arnaud Brousseau | Intern, Hamburg, @arnaudbrousseau
  30. Introduction to the Semantic Web •  Reminder: the Semantic Web

    is all about triples: <http://imgs.xkcd.com/comics/debian_main.png> <cc:license> <http://creativecommons.org/licenses/by-nc/2.5/>
  31. From the Web to the Web of Data Fundamental shift:

    From sending bits from one host to the other towards making sense of those bits.
  32. From the Web to the Web of Data BelgianChocolates.com Pralinés

    Deluxe Mix 2,99€/100g Shopping Cart
  33. From the Web to the Web of Data BelgianChocolates.com Pralinés

    Deluxe Mix 2,99€/100g Shopping Cart Merchant Name Product Name Price Product Image
  34. From the Web to the Web of Data How can

    website owners help Google make sense of their bits? Mark up their content using any of the following syntaxes: •  Microformats •  Microdata •  RDFa "[...] We realized that structured data on the Web can and should accommodate multiple encodings."
  35. From the Web to the Web of Data

    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
      <a href="http://www.example.com/events/poisel_offenback.hmtl" rel="v:url" property="v:summary">Philipp Poisel in Offenbach</a>
      <span property="v:description">See Philipp Poisel in Offenbach</span>
      When:
      <span property="v:startDate" content="2011-01-16T19:00+01:00">Jan 16, 7:00PM</span>
      <span property="v:endDate" content="2011-01-16T21:00+01:00">9:00PM</span>
      Where:
      <span rel="v:location">
        <span typeof="v:Organization">
          <span property="v:name">Capitol</span>,
          <span rel="v:address">
            <span typeof="v:Address">
              <span property="v:street-address">Kaiserstraße 106</span>,
              <span property="v:locality">Offenbach am Main</span>,
            </span>
          </span>
          <span rel="v:geo">
            <span typeof="v:Geo">
              <span property="v:latitude" content="50.10945"></span>
              <span property="v:longitude" content="8.76579"></span>
            </span>
          </span>
        </span>
      </span>
      Category: <span property="v:eventType">Concert</span>
    </div>
    From structured mark-up on a Website...
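
    As a rough illustration of how a consumer might read such markup, the Python sketch below collects `property` values from a shortened version of the event snippet. It assumes a simplified rule (use the `content` attribute when present, otherwise the element's text) and that property elements contain only text; the class name `PropertyCollector` is hypothetical, and this is not how any particular search engine is known to parse pages.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects RDFa-style property values (simplified sketch)."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._pending = None  # property name whose text is being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("property")
        if name is None:
            return
        if "content" in attributes:
            # Machine-readable value given explicitly.
            self.properties[name] = attributes["content"]
        else:
            # Fall back to the human-readable text of the element.
            self._pending = name
            self._buffer = []

    def handle_data(self, data):
        if self._pending is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.properties[self._pending] = "".join(self._buffer).strip()
            self._pending = None


# Shortened excerpt of the event markup.
EVENT_SNIPPET = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <span property="v:summary">Philipp Poisel in Offenbach</span>
  <span property="v:latitude" content="50.10945"></span>
  <span property="v:longitude" content="8.76579"></span>
  Category: <span property="v:eventType">Concert</span>
</div>
"""

collector = PropertyCollector()
collector.feed(EVENT_SNIPPET)
for name, value in collector.properties.items():
    print(f"{name} = {value}")
```

    The `content` attribute is what allows the visible text to stay human-friendly while the machine-readable value stays precise.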
  36. From the Web to the Web of Data ...to a

    Rich Snippet on Google
  37. Rich Snippets Formats: Reviews

  38. Rich Snippets Formats: People

  39. Rich Snippets Formats: Events

  40. Rich Snippets Formats: Recipes

  41. How do we get this data?

  42. How do we get this data?

    With RDFa markup:
    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review-aggregate">
      <span rel="v:itemreviewed">
        <h1 property="v:name">Drooling Dog Bar B Q</h1>
        <img rel="v:rating" content="4" src="stars_map.png" alt="4 star rating" />
        <em>based on <span property="v:count">15</span> reviews</em>
      </span>
    </div>
    With Microformats markup:
    <div class="hreview-aggregate">
      <span class="item vcard">
        <h1 class="fn org">Drooling Dog Bar B Q</h1>
        <img class="rating average" src="stars_map.png" alt="4 star rating" />
        <em>based on <span class="count">15</span> reviews</em>
      </span>
    </div>
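
    Microformats carry their semantics in `class` names rather than dedicated attributes. The Python sketch below reads the Microformats variant of the review markup under simplifying assumptions: only the `fn` and `count` classes and an `img` with class `rating` are handled, and the rating is taken from the `alt` text. The class name `HReviewAggregateParser` is hypothetical and this is not a full hReview-aggregate parser.

```python
from html.parser import HTMLParser


class HReviewAggregateParser(HTMLParser):
    """Extracts a few hreview-aggregate fields by class name (sketch)."""

    TEXT_CLASSES = {"fn", "count"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._pending = None  # class name whose text is being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        if tag == "img" and "rating" in classes:
            # Images have no text content, so use the alt text.
            self.fields["rating"] = attributes.get("alt", "")
            return
        matched = self.TEXT_CLASSES & classes
        if matched:
            self._pending = sorted(matched)[0]
            self._buffer = []

    def handle_data(self, data):
        if self._pending is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.fields[self._pending] = "".join(self._buffer).strip()
            self._pending = None


REVIEW_SNIPPET = """
<div class="hreview-aggregate">
  <span class="item vcard">
    <h1 class="fn org">Drooling Dog Bar B Q</h1>
    <img class="rating average" src="stars_map.png" alt="4 star rating" />
    <em>based on <span class="count">15</span> reviews</em>
  </span>
</div>
"""

review_parser = HReviewAggregateParser()
review_parser.feed(REVIEW_SNIPPET)
print(review_parser.fields)
```

    Both syntaxes describe the same item name, rating, and review count; only the way the labels are attached to the HTML differs.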
  43. Get your mark-up right with the testing tool

  44. Best Buy example 1 http://stores.bestbuy.com/1125/

  45. Best Buy example 2 RDFa-enriched search results on BestBuy

  46. Best Buy example 3 Track listings on BestBuy

  47. Is it all double rainbows? Springtime rainbow : taken from

    - http://www.geograph.org.uk/photo/1833025 http://creativecommons.org/licenses/by/2.0/deed.en
  48. No! Cry me a river Sad Panda: taken from http://www.flickr.com/photos/damski/3429712490/

    http://creativecommons.org/licenses/by/2.0/deed.en
  49. rNews vocabulary in the making http://dev.iptc.org/rNews-Introduction-to-rNews

  50. Have fun on your journey to the land of glory

    Penallta pit pony: taken from http://www.geograph.org.uk/photo/90215 http://creativecommons.org/licenses/by-sa/2.0/
  51. But watch out: unicorns crossing Unicorn crossing: taken from http://www.flickr.com/photos/rumpleteaser/2812559753/

    http://creativecommons.org/licenses/by-sa/2.0/
  52. Further reading, erm, watching http://www.youtube.com/watch?v=5lCSDOuqv1A The Structured Search Engine Google

    Tech Talk January 19, 2011 by Andrew Hogue.
  53. Děkuji vám za pozornost (Thank you for your attention) Thomas Steiner tomac@google.com @tomayac Arnaud Brousseau

    arnaudb@google.com @arnaudbrousseau