Semantic Analytics at Scale

Andrew Montalenti (co-founder & CTO of Parse.ly) presents "Semantic Analytics at Scale" at NYTimes Headquarters for the TimesOpen event "Bigger Data and Smarter Scaling".

Related links:

Parse.ly: http://parse.ly
Schema.to: http://schema.to
Live blog of event: http://open.blogs.nytimes.com/2012/10/17/live-blogging-timesopen-bigger-data-and-smarter-scaling/

Andrew Montalenti

October 19, 2012

Transcript

  1. What makes Dash different?

    Dash is purpose-built for publishers and media companies. We believe insight > data. Our simple, elegant, and intuitive interface requires no training. And your tech team will love our easy integration. Simply put, you get the best insights to increase audience and engagement on your site.
  2. Join the web’s best publishers

    “the best part of working with Parse.ly is just how good you have been about implementation ... It looks like you guys have a more agile development environment...”
    Jason Marlin, Director of Technology, Ars Technica

    “[It] gave us insight into the content and our performance explained in publisher language rather than your standards analytics jargon.”
    Zee Kane, Editor-in-Chief, The Next Web
  3. Parse.ly Mission

    • Empower editors & writers
    • Liberate product teams
    • Catalyze adoption of semantic web
    • Help with sustainability
  4. Empower editors, writers, & analysts with timely, relevant, & actionable engagement data about web content.
  5. Liberate product teams from the CMS, with powerful data-driven APIs to assist virtuous user behaviors.
  6. What kind of scale?

    • >3 billion pageviews per month
    • >10 million crawled articles
    • >2,500 requests per second at peak
    • ~70 server nodes across three data centers
    • >1 terabyte of RAM with production data
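As a rough sanity check on these figures, the monthly pageview count can be converted to an average rate and compared with the stated peak. This is only a sketch: it assumes a 30-day month and treats one pageview as one request, neither of which the slide states, and both slide figures are lower bounds.

```python
# Back-of-the-envelope check of the scale figures on the slide.
# Assumptions (not from the slide): a 30-day month, one request per pageview.
PAGEVIEWS_PER_MONTH = 3_000_000_000   # ">3 billion pageviews per month"
PEAK_REQUESTS_PER_SECOND = 2_500      # ">2,500 requests per second at peak"
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

average_per_second = PAGEVIEWS_PER_MONTH / SECONDS_PER_MONTH
peak_to_average = PEAK_REQUESTS_PER_SECOND / average_per_second

print(f"average: {average_per_second:.0f} pageviews/s")
print(f"peak is {peak_to_average:.2f}x the average")
```

Under these assumptions the peak is roughly twice the average rate, which is a plausible daily traffic swing for news sites.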
  7. Metadata standards compared

    Standard            | Implementation     | Primary Purpose        | Coverage
    --------------------|--------------------|------------------------|---------
    OpenGraph           | Multiple META tags | Facebook Rich Embeds   | ~60%
    Schema.org Article  | Microdata          | SEO                    | ~80%
    hNews               | Microdata          | News Industry Standard | ~90%
    rNews               | RDFa & Microdata   | News Industry Standard | ~100%
    HTML5               | Tags               | W3C Standard           | ~20%
    parsely-page        | Single META tag    | Semantic Analytics     | ~70%
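To make the comparison concrete, here is a minimal sketch of how a crawler might detect which of these standards a page uses. It relies only on Python's standard `html.parser`; the `StandardDetector` class, the `detect_standards` function, and the result keys other than `ogp` are hypothetical names for illustration, and the checks are simple heuristics rather than real validation.

```python
from html.parser import HTMLParser


class StandardDetector(HTMLParser):
    """Flags which metadata standards a page appears to use (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.implements = {"ogp": False, "schema.org": False, "parsely-page": False}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # OpenGraph: <meta property="og:..." content="...">
            if (attrs.get("property") or "").startswith("og:"):
                self.implements["ogp"] = True
            # parsely-page: a single <meta name="parsely-page" ...> tag
            if attrs.get("name") == "parsely-page":
                self.implements["parsely-page"] = True
        # Schema.org microdata: any element whose itemtype points at schema.org
        if "schema.org" in (attrs.get("itemtype") or ""):
            self.implements["schema.org"] = True


def detect_standards(html):
    detector = StandardDetector()
    detector.feed(html)
    return detector.implements


sample = (
    '<html><head><meta property="og:title" content="Example">'
    '<meta name="parsely-page" content="{}"></head><body></body></html>'
)
print(detect_standards(sample))
```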
  8. Field names by standard

    Field          | OpenGraph        | rNews           | HTML5           | parsely-page
    ---------------|------------------|-----------------|-----------------|-------------
    Title          | og:title         | headline        | <title>         | title
    Pub Date       | a:published_time | datePublished   | <time>          | pub_date
    Body           | N/A              | articleBody     | <article>       | N/A*
    Author         | a:author         | creator         | rel="author"    | author
    Section        | a:section        | articleSection  | N/A             | section
    Tags           | a:tag            | about           | N/A             | tags
    Page Type      | og:type          | N/A             | N/A             | type
    Canonical Link | og:url           | N/A             | rel="canonical" | link
    Post ID        | N/A              | identifier      | N/A             | post_id
    Main Image     | og:image         | associatedMedia | N/A             | image_url
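The table above is effectively a renaming scheme, so a normalizer can be sketched as a dictionary lookup. The example below covers only the rNews column; `RNEWS_TO_PARSELY` and `normalize` are hypothetical names, and dropping unmapped fields (such as `articleBody`, which has no parsely-page equivalent in the table) is an assumption about the desired behavior.

```python
# rNews field name -> parsely-page field name, copied from the table above.
RNEWS_TO_PARSELY = {
    "headline": "title",
    "datePublished": "pub_date",
    "creator": "author",
    "articleSection": "section",
    "about": "tags",
    "identifier": "post_id",
    "associatedMedia": "image_url",
}


def normalize(record, mapping):
    """Rename known fields; silently drop fields with no target name."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


rnews_record = {
    "headline": "The Bookstore's Last Stand",
    "creator": "Julie Bosman",
    "articleSection": "Business Day",
    "articleBody": "(full text omitted)",
}
print(normalize(rnews_record, RNEWS_TO_PARSELY))
```

Keeping the mapping as plain data means a second standard can be supported by adding another dictionary rather than more code.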
  9. schema.to

    • Meet Mr. Schemato! A friendly semantic web robot that makes metadata cool again.
    • Open source
    • Public service
    • Eases implementation with validators
    • Eases consumption with normalizers
    • Extensible
  10. HTML5, hNews, rNews, Schema.org, OpenGraph, parsely-page

    {
      "implements": {
        "ogp": true,
        "rnews": true
      },
      "distilled": {
        "title": "The Bookstore’s Last Stand",
        "link": "http://nytimes.com/123/…",
        "pub_date": "2012-01-28",
        "image_url": "http://img.nyt.com/…",
        "author": "Julie Bosman",
        "section": "Business Day",
        "tags": ["Barnes & Noble", "Amazon"],
        "type": "post",
        "post_id": "100000001318096"
      },
      "extracted": {...}
    }
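One way to picture how a record like this could be produced from the parsely-page tag is sketched below. It assumes the tag's `content` attribute holds a JSON object, which the slides do not show; `ParselyPageParser` and `distill` are hypothetical names, and this is not Schemato's actual implementation.

```python
import json
from html.parser import HTMLParser


class ParselyPageParser(HTMLParser):
    """Collects the JSON payload of a parsely-page META tag, if present."""

    def __init__(self):
        super().__init__()
        self.metadata = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "parsely-page":
            # Assumption: the content attribute is a JSON object.
            self.metadata = json.loads(attrs.get("content") or "{}")


def distill(html):
    parser = ParselyPageParser()
    parser.feed(html)
    found = parser.metadata is not None
    return {"implements": {"parsely-page": found}, "distilled": parser.metadata or {}}


page = """<head><meta name='parsely-page'
  content='{"title": "Example post", "author": "Julie Bosman", "type": "post"}'></head>"""
print(distill(page))
```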
  11. Here Today / Coming Soon

    Here Today
    • Validation Framework
    • Distilling Framework
    • Web Service
    • Validators:
      – rNews
      – Schema.org
      – OpenGraph
      – parsely-page

    Coming Soon
    • Distillers
    • Site Registry
    • Proxy
    • Command-Line Tools
    • Validators:
      – hNews
      – Dublin Core
      – HTML5
  12. Parse.ly Mission

    • Empower editors & writers
    • Liberate product teams
    • Catalyze adoption of semantic web
    • Help with sustainability
  13. Streamlined Integration

    1 line of JavaScript that will not break or slow down your site.
    Publishers sign up for a free 30-day trial at: http://dash.to/try
  14. Get in Touch

    • Tweet us (now!)
      – @amontalenti
      – @parsely
    • Email us (whenever!)
      – [email protected]
      – [email protected]

    Open source contributions to Schema.to, Parse.ly demos, or anything else!