Semantic Analytics at Scale

Andrew Montalenti (co-founder & CTO of Parse.ly) presents "Semantic Analytics at Scale" at NYTimes Headquarters for the TimesOpen event "Bigger Data and Smarter Scaling".

Related links:

Parse.ly: http://parse.ly
Schema.to: http://schema.to
Live blog of event: http://open.blogs.nytimes.com/2012/10/17/live-blogging-timesopen-bigger-data-and-smarter-scaling/

Andrew Montalenti

October 19, 2012

Transcript

  1. Semantic Analytics at Scale
    Wednesday, October 17, 2012

  2. Insights for the web’s best publishers

  3. What makes Dash different?
    Dash is purpose-built for publishers and media
    companies. We believe insight > data. Our simple,
    elegant, and intuitive interface requires no training.
    And your tech team will love our easy integration.
    Simply put, you get the best insights to increase
    audience and engagement on your site.

  4. Join the web’s best publishers
    “the best part of working with Parse.ly is just how good you have been about implementation ... It looks like you guys have a more agile development environment...”
    “[It] gave us insight into the content and our performance explained in publisher language rather than your standards analytics jargon.”
    Jason Marlin, Director of Technology, ArsTechnica
    Zee Kane, Editor-in-Chief, The Next Web

  6. Parse.ly Mission
    • Empower editors & writers
    • Liberate product teams
    • Catalyze adoption of semantic web
    • Help with sustainability

  7. Empower editors, writers, & analysts
    with timely, relevant, & actionable
    engagement data about web content.

  8. Liberate product teams from the CMS,
    with powerful data-driven APIs to assist
    virtuous user behaviors.

  9. Catalyze adoption of semantic web
    standards across the media industry.

  10. Help media companies achieve
    profitability & sustainability in digital.

  11. Scale.

  13. What kind of scale?
    • >3 billion pageviews per month
    • >10 million crawled articles
    • >2,500 requests per second at peak
    • ~70 server nodes across three data centers
    • >1 terabyte of RAM with production data
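
To put the traffic figures above in proportion, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes a 30-day month and one tracked request per pageview, neither of which the slide states, and the slide's numbers are lower bounds (">3 billion", ">2,500"), so the result is approximate.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_requests_per_second(pageviews_per_month: float,
                                days_in_month: int = 30) -> float:
    """Average request rate, assuming one tracked request per pageview."""
    return pageviews_per_month / (days_in_month * SECONDS_PER_DAY)


# Figures from the slide, treated as exact values for illustration.
avg_rps = average_requests_per_second(3_000_000_000)
peak_rps = 2_500
peak_to_average = peak_rps / avg_rps

print(f"average: {avg_rps:,.0f} requests/s")
print(f"peak is {peak_to_average:.1f}x the average")
```

Under these assumptions the average works out to roughly 1,157 requests per second, so the quoted peak is a little more than twice the average load.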

  14. Data.

  16. d3.js – Data-Driven Documents

  19. Metadata.

  20. rNews

  21. Standard             Implementation        Primary Purpose           Coverage
      OpenGraph            Multiple META tags    Facebook Rich Embeds      ~60%
      Schema.org Article   Microdata             SEO                       ~80%
      hNews                Microdata             News Industry Standard    ~90%
      rNews                RDFa & Microdata      News Industry Standard    ~100%
      HTML5                Tags                  W3C Standard              ~20%
      parsely-page         Single META tag       Semantic Analytics        ~70%

  22. Field           OpenGraph           rNews             HTML5             parsely-page
      Title           og:title            headline                            title
      Pub Date        a:published_time    datePublished                       pub_date
      Body            N/A                 articleBody                         N/A*
      Author          a:author            creator           rel="author"      author
      Section         a:section           articleSection    N/A               section
      Tags            a:tag               about             N/A               tags
      Page Type       og:type             N/A               N/A               type
      Canonical Link  og:url              N/A               rel="canonical"   link
      Post ID         N/A                 identifier        N/A               post_id
      Main Image      og:image            associatedMedia   N/A               image_url
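
The field mapping above can be read as a small normalization routine. The sketch below is illustrative only and is not Parse.ly's or Schemato's code: it maps OpenGraph `<meta>` tags to the parsely-page field names from the table. It assumes the slide's `a:` prefix abbreviates OpenGraph's `article:` namespace; the names `OGP_TO_PARSELY`, `OpenGraphCollector`, and `distill_opengraph` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# OpenGraph property -> parsely-page field, following the table.
# Assumption: "a:" on the slide stands for the "article:" namespace.
OGP_TO_PARSELY = {
    "og:title": "title",
    "article:published_time": "pub_date",
    "article:author": "author",
    "article:section": "section",
    "article:tag": "tags",
    "og:type": "type",
    "og:url": "link",
    "og:image": "image_url",
}


class OpenGraphCollector(HTMLParser):
    """Collects (property, content) pairs from <meta> tags in document order."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("property") and attr.get("content") is not None:
            self.properties.append((attr["property"], attr["content"]))


def distill_opengraph(html: str) -> dict:
    """Return a dict keyed by parsely-page field names."""
    collector = OpenGraphCollector()
    collector.feed(html)
    distilled = {}
    for prop, content in collector.properties:
        field = OGP_TO_PARSELY.get(prop)
        if field is None:
            continue  # not a property covered by the table
        if field == "tags":
            distilled.setdefault("tags", []).append(content)  # tags may repeat
        else:
            distilled.setdefault(field, content)  # first occurrence wins
    return distilled


# Hypothetical sample page for illustration.
sample = """
<html><head>
<meta property="og:title" content="The Bookstore's Last Stand">
<meta property="og:type" content="article">
<meta property="article:tag" content="Barnes &amp; Noble">
<meta property="article:tag" content="Amazon">
</head></html>
"""
result = distill_opengraph(sample)
print(result)
```

Collecting repeated `article:tag` entries into a list while keeping the first value for single-valued fields is one possible design choice; a real normalizer would also need rules for conflicts between standards.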

  24. schema.to
    • Meet Mr. Schemato! A friendly semantic web
    robot that makes metadata cool again.
    • Open source
    • Public service
    • Eases implementation with validators
    • Eases consumption with normalizers
    • Extensible

  25. HTML5, hNews, rNews,
    Schema.org, OpenGraph,
    parsely-page
    {
      "implements": {
        "ogp": true,
        "rnews": true
      },
      "distilled": {
        "title": "The Bookstore’s Last Stand",
        "link": "http://nytimes.com/123/…",
        "pub_date": "2012-01-28",
        "image_url": "http://img.nyt.com/…",
        "author": "Julie Bosman",
        "section": "Business Day",
        "tags": ["Barnes & Noble", "Amazon"],
        "type": "post",
        "post_id": "100000001318096"
      },
      "extracted": {...}
    }
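
As a rough illustration of how an "implements" report like the one on this slide might be produced, the sketch below checks a page for two of the listed standards. It is not the Schemato implementation, whose validators may work quite differently. The assumptions are: OpenGraph is detected by any `<meta property="og:...">` tag, and parsely-page is a `<meta name="parsely-page">` tag whose `content` attribute holds a JSON object. The names `MetaCollector` and `detect_implements`, the `"parsely-page"` report key, and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> tag as dicts."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def detect_implements(html: str) -> dict:
    """Report which of the two checked standards appear on the page."""
    collector = MetaCollector()
    collector.feed(html)

    # OpenGraph: at least one property in the "og:" namespace.
    has_ogp = any((m.get("property") or "").startswith("og:")
                  for m in collector.metas)

    # parsely-page: a single meta tag whose content parses as a JSON object.
    has_parsely = False
    for m in collector.metas:
        if m.get("name") == "parsely-page":
            try:
                payload = json.loads(m.get("content") or "")
                has_parsely = isinstance(payload, dict)
            except json.JSONDecodeError:
                has_parsely = False

    return {"ogp": has_ogp, "parsely-page": has_parsely}


# Hypothetical sample page for illustration.
page = """
<head>
<meta property="og:title" content="The Bookstore's Last Stand">
<meta name="parsely-page" content='{"type": "post", "post_id": "100000001318096"}'>
</head>
"""
report = detect_implements(page)
print(report)
```

Detecting rNews, hNews, or Schema.org markup would require inspecting RDFa, microformat, or Microdata attributes across the whole document, which is beyond this sketch.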

  26. Here Today
    • Validation Framework
    • Distilling Framework
    • Web Service
    • Validators:
      – rNews
      – Schema.org
      – OpenGraph
      – parsely-page
    Coming Soon
    • Distillers
    • Site Registry
    • Proxy
    • Command-Line Tools
    • Validators:
      – hNews
      – Dublin Core
      – HTML5

  27. Parse.ly Mission
    • Empower editors & writers
    • Liberate product teams
    • Catalyze adoption of semantic web
    • Help with sustainability

  28. 1 line of JavaScript that will not
    break or slow down your site
    Streamlined Integration
    Publishers sign up for a free 30-day trial at:
    http://dash.to/try

  29. Get in Touch
    • Tweet us (now!)
      – @amontalenti
      – @parsely
    • Email us (whenever!)
      – [email protected]
      – [email protected]
    Open source contributions to Schema.to,
    Parse.ly demos, or anything else!