Semantic Analytics at Scale

Andrew Montalenti (co-founder & CTO of Parse.ly) presents "Semantic Analytics at Scale" at The New York Times headquarters for the TimesOpen event "Bigger Data and Smarter Scaling".

Andrew Montalenti

October 19, 2012

  1. Semantic Analytics at Scale (Wednesday, October 17, 2012)

  2. Insights for the web’s best publishers

  3. What makes Dash different?

    Dash is purpose-built for publishers and media companies. We believe insight > data. Our simple, elegant, and intuitive interface requires no training, and your tech team will love our easy integration. Simply put, you get the best insights to increase audience and engagement on your site.
  4. Join the web’s best publishers

    “The best part of working with Parse.ly is just how good you have been about implementation ... It looks like you guys have a more agile development environment...” (Jason Marlin, Director of Technology, Ars Technica)

    “[It] gave us insight into the content and our performance explained in publisher language rather than your standard analytics jargon.” (Zee Kane, Editor-in-Chief, The Next Web)
  6. Parse.ly Mission

    •  Empower editors & writers
    •  Liberate product teams
    •  Catalyze adoption of semantic web
    •  Help with sustainability
  7. Empower editors, writers, & analysts with timely, relevant, & actionable engagement data about web content.
  8. Liberate product teams from the CMS, with powerful data-driven APIs to assist virtuous user behaviors.
  9. Catalyze adoption of semantic web standards across the media industry.
  10. Help media companies achieve profitability & sustainability in digital.
  11. Scale.

  13. What kind of scale?

    •  >3 billion pageviews per month
    •  >10 million crawled articles
    •  >2,500 requests per second at peak
    •  ~70 server nodes across three data centers
    •  >1 terabyte of RAM with production data
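A quick way to read the slide 13 numbers together is to convert the monthly pageview count into an average request rate and compare it with the stated peak. This is a back-of-the-envelope sketch, not Parse.ly's capacity model: it assumes a 30-day month and exactly one tracked request per pageview, neither of which the slide states.

```python
# Back-of-the-envelope check of the slide 13 figures.
# Assumptions (not stated on the slide): a 30-day month, and one
# tracked request per pageview.

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def average_rate(events_per_month: float) -> float:
    """Average events per second if traffic were spread evenly."""
    return events_per_month / SECONDS_PER_MONTH


monthly_pageviews = 3_000_000_000  # ">3 billion pageviews per month"
peak_rps = 2_500                   # ">2,500 requests per second at peak"

avg_rps = average_rate(monthly_pageviews)
burst = peak_rps / avg_rps

print(f"average: {avg_rps:,.0f} req/s")  # average: 1,157 req/s
print(f"peak/average: {burst:.2f}x")     # peak/average: 2.16x
```

Under these assumptions the peak runs at roughly twice the average rate, which suggests capacity has to be planned for bursts rather than for the monthly mean.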
  14. Data.

  16. d3.js – Data-Driven Documents

  19. Metadata.

  20. rNews

  21. Metadata standards compared:

    Standard | Implementation | Primary purpose | Coverage
    OpenGraph | Multiple META tags | Facebook rich embeds | ~60%
    Schema.org Article | Microdata | SEO | ~80%
    hNews | Microformat | News industry standard | ~90%
    rNews | RDFa & Microdata | News industry standard | ~100%
    HTML5 | Tags | W3C standard | ~20%
    parsely-page | Single META tag | Semantic analytics | ~70%
  22. Field mappings across standards:

    Field | OpenGraph | rNews | HTML5 | parsely-page
    Title | og:title | headline | <title> | title
    Pub Date | article:published_time | datePublished | <time> | pub_date
    Body | N/A | articleBody | <article> | N/A*
    Author | article:author | creator | rel="author" | author
    Section | article:section | articleSection | N/A | section
    Tags | article:tag | about | N/A | tags
    Page Type | og:type | N/A | N/A | type
    Canonical Link | og:url | N/A | rel="canonical" | link
    Post ID | N/A | identifier | N/A | post_id
    Main Image | og:image | associatedMedia | N/A | image_url
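The field mapping on slide 22 is essentially a lookup table, and a "normalizer" can apply it mechanically. The sketch below is a hypothetical illustration, not schema.to's actual code: the function name `normalize` and the rule that rNews overrides OpenGraph on conflicts are assumptions. OpenGraph's `article:` properties appear with an `a:` prefix in the original slide.

```python
# Hypothetical normalizer for the slide 22 mapping: OpenGraph and rNews
# properties in, parsely-page field names out. The function name and
# the precedence rule are illustrative assumptions.

OG_TO_PARSELY = {
    "og:title": "title",
    "article:published_time": "pub_date",
    "article:author": "author",
    "article:section": "section",
    "article:tag": "tags",
    "og:type": "type",
    "og:url": "link",
    "og:image": "image_url",
}

RNEWS_TO_PARSELY = {
    "headline": "title",
    "datePublished": "pub_date",
    "creator": "author",
    "articleSection": "section",
    "about": "tags",
    "identifier": "post_id",
    "associatedMedia": "image_url",
}


def normalize(og: dict, rnews: dict) -> dict:
    """Map OpenGraph and rNews properties onto parsely-page field names.

    Properties with no parsely-page equivalent are dropped. When both
    standards supply the same field, the rNews value wins (an assumed
    rule, loosely motivated by rNews's higher coverage on slide 21).
    """
    out = {}
    for source, mapping in ((og, OG_TO_PARSELY), (rnews, RNEWS_TO_PARSELY)):
        for key, value in source.items():
            field = mapping.get(key)
            if field is not None:
                out[field] = value  # rNews is processed second, so it overwrites
    return out


print(normalize(
    {"og:title": "OG title", "og:url": "http://example.com/a"},
    {"headline": "rNews headline", "identifier": "42"},
))
# {'title': 'rNews headline', 'link': 'http://example.com/a', 'post_id': '42'}
```

This keeps one value per property for simplicity; real pages often repeat `article:tag`, so a fuller version would accumulate tags into a list.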
  24. schema.to

    •  Meet Mr. Schemato! A friendly semantic web robot that makes metadata cool again.
    •  Open source
    •  Public service
    •  Eases implementation with validators
    •  Eases consumption with normalizers
    •  Extensible
  25. HTML5, hNews, rNews, Schema.org, OpenGraph, parsely-page

      {
        "implements": {
          "ogp": true,
          "rnews": true
        },
        "distilled": {
          "title": "The Bookstore’s Last Stand",
          "link": "http://nytimes.com/123/…",
          "pub_date": "2012-01-28",
          "image_url": "http://img.nyt.com/…",
          "author": "Julie Bosman",
          "section": "Business Day",
          "tags": ["Barnes & Noble", "Amazon"],
          "type": "post",
          "post_id": "100000001318096"
        },
        "extracted": {...}
      }
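Of the formats on slide 21, parsely-page is the only one implemented as a single META tag, which makes it the simplest input for producing a "distilled" record like the one above. The sketch below assumes the tag looks like `<meta name="parsely-page" content='{...JSON...}'>`, with the JSON keys matching the parsely-page column of slide 22; that markup shape is an assumption, and the class and function names are hypothetical. It uses only Python's standard-library `html.parser`.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class ParselyPageParser(HTMLParser):
    """Capture the JSON object stored in a parsely-page META tag.

    Assumed markup: <meta name="parsely-page" content='{...JSON...}'>.
    Only the first matching tag is kept.
    """

    def __init__(self) -> None:
        super().__init__()
        self.payload: Optional[dict] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> via handle_startendtag.
        if tag != "meta" or self.payload is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "parsely-page" and attributes.get("content"):
            self.payload = json.loads(attributes["content"])


def extract_parsely_page(html: str) -> Optional[dict]:
    """Return the parsely-page metadata as a dict, or None if absent."""
    parser = ParselyPageParser()
    parser.feed(html)
    parser.close()
    return parser.payload


page = (
    '<html><head><meta name="parsely-page" '
    'content=\'{"title": "Example headline", "pub_date": "2012-01-28", '
    '"section": "Business Day", "type": "post"}\' />'
    '</head><body></body></html>'
)

print(extract_parsely_page(page)["section"])  # Business Day
```

A production extractor would also need to handle malformed JSON in the attribute; here `json.loads` simply raises `json.JSONDecodeError`.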
  26. Here Today / Coming Soon

    Here today:
    •  Validation framework
    •  Distilling framework
    •  Web service
    •  Validators: rNews, Schema.org, OpenGraph, parsely-page
    •  Distillers

    Coming soon:
    •  Site registry
    •  Proxy
    •  Command-line tools
    •  Validators: hNews, Dublin Core, HTML5
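To make "validator" concrete, here is a minimal sketch of what checking a distilled record (shaped like the slide 25 example) might involve. The rule set is entirely hypothetical: the slides do not say which fields schema.to requires or how it reports errors, so `REQUIRED_FIELDS` and `validate_distilled` are illustrative names only.

```python
from datetime import datetime
from typing import List

# Hypothetical rule set: the slides do not say which fields a
# schema.to validator treats as required.
REQUIRED_FIELDS = ("title", "link", "pub_date")


def validate_distilled(record: dict) -> List[str]:
    """Return a list of problems found in a distilled metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")

    pub_date = record.get("pub_date")
    if pub_date:
        try:
            datetime.fromisoformat(pub_date)  # accepts e.g. "2012-01-28"
        except ValueError:
            problems.append(f"pub_date is not an ISO 8601 date: {pub_date!r}")

    if not isinstance(record.get("tags", []), list):
        problems.append("tags should be a list of strings")
    return problems


good = {"title": "The Bookstore's Last Stand", "link": "http://example.com/a",
        "pub_date": "2012-01-28", "tags": ["Barnes & Noble", "Amazon"]}
bad = {"title": "Untitled", "pub_date": "Jan 28, 2012", "tags": "Amazon"}

print(validate_distilled(good))  # []
print(validate_distilled(bad))
```

Returning a list of messages, rather than raising on the first error, lets a publisher see every problem on a page in one pass.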
  28. Streamlined Integration

    One line of JavaScript that will not break or slow down your site. Publishers can sign up for a free 30-day trial at http://dash.to/try
  29. Get in Touch

    •  Tweet us (now!): @amontalenti, @parsely
    •  Email us (whenever!): andrew@parsely.com, hello@parsely.com

    Open source contributions to Schema.to, Parse.ly demos, or anything else!