MarkLogic World 2012 - BBC DSP [Jem Rayfield]

Dynamic Semantic Publishing at the BBC
Jem Rayfield
Lead Technical Architect
BBC Future Media

April 27, 2012

Transcript

  1. BBC Dynamic Semantic Publishing [DSP]

    Future Media © BBC MMXII
    MarkLogic World 2012 •  Jem Rayfield: Lead Technical Architect •  BBC Future Media
  2. Outline

    BBC News Online
    BBC World Cup 2010
    BBC Sport 2012
    BBC News Mobile
  3. Static News: The Good

    1) Simple
    2) Scales cheaply
    3) Difficult to break [bad rendering logic etc.]
    4) Handles high load
  4. Static News: The BAD

    1) Relational taxonomic meta model
    2) Static! Inflexible! SSI!
    3) Document publishing
    4) Content non re-usable
    5) Content non repurpose-able
    6) Difficult to personalize
    7) Publication per output
  5. World Cup 2010

    1.  32 teams + 8 groups + 736 players = 776 pages
    2.  Fixtures & Results, Groups & Teams pages
    3.  Too many web pages for too few journalists
    4.  Improve the publishing system to help achieve all of this
  6. Rationale

    •  Automated content publishing
    •  Huge increase in content breadth (number of manageable pages)
    •  Content re-use and re-purposing, increasing reach
    •  Simplified content management
    •  Journalist headcount reduction
    •  Multi-dimensional entry points and semantic navigation
    •  Improved user experience with high levels of user engagement
    •  Dynamic, state (time|event) and semantic driven page layout
    •  Personalized content aggregations
    •  Open data and APIs
  7. World Cup statistics: the GOOD

    •  750+ dynamic aggregations/pages (Player, Squad, Group, etc.)
    •  Average unique page requests a day: 2 million+
    •  Average OWLIM SPARQL queries a day: 1 million
    •  100s of RDF statement updates/inserts per minute with full OWL reasoning and associated inference
    •  Multi data center, fully resilient, clustered 6 node triple store
    •  RDF graph model ideally suited to model domain representations such as sport
  8. World Cup statistics: the BAD

    •  Sports stories and indices static
    •  Sport content not responsive or personalized
    •  RDF store unable to handle thousands of statistic updates a second
    •  RDF store forward-chained closures are expensive and increase write latency
    •  RDF graph model and SPARQL not ideally suited to the BBC’s News and Sport document publication model
  9. Sport Refresh 2012

    •  Page per athlete [10,000+], page per country [200+], page per discipline [400-500], page per venue, page per team. A lot of output…
    •  Almost real time statistics and live event pages
    •  Time coded, metadata annotated, on demand video, 58,000 hours of content
    •  Far too many web pages for far too few journalists
    •  DSP annotation architecture to automate content aggregation
  10. Augment architecture with a Content Store

    1.  Atomic content assets stored in MarkLogic XML store
    2.  XML content queryable via XQuery
    3.  Content assets searchable
    4.  Sports statistics searchable/queryable via XQuery
    5.  Ontological SPARQL via BigOWLIM, assets XQuery via MarkLogic
  11. Sport Stats REST API

    SSL Accessible API

    GET https://api.live.bbc.co.uk/sportsdata/statsapi/football/table/ais/competition/118996114
    GET https://api.live.bbc.co.uk/sportsdata/statsapi/football/table/ais/competition/118996114
    Accept: application/json
    GET https://api.live.bbc.co.uk/sportsdata/statsapi/football/videprinter
    GET https://api.int.bbc.co.uk/sportsdata/statsapi/formula1/year/2012/calendar
    Accept: application/json
    etc…
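A minimal sketch of how a client might compose one of the requests listed on this slide. The base URL and resource path are copied from the slide; the helper name `build_stats_request` and its parameters are illustrative assumptions, and the request is only constructed, never sent.

```python
from urllib.request import Request

# Base URL copied from the slide; everything else here is illustrative.
STATS_API_BASE = "https://api.live.bbc.co.uk/sportsdata/statsapi"


def build_stats_request(path: str, accept: str = "application/json") -> Request:
    """Build (but do not send) a GET request for a stats API resource."""
    url = f"{STATS_API_BASE}/{path.strip('/')}"
    return Request(url, headers={"Accept": accept}, method="GET")


# Football table resource, using the path shown on the slide.
table_request = build_stats_request("football/table/ais/competition/118996114")
print(table_request.get_method(), table_request.full_url)
print("Accept:", table_request.get_header("Accept"))
```

Keeping the `Accept` header as a parameter mirrors the slide, where the same resource is requested with and without an explicit response type.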
  12. Dynamic News mobile

    •  Multi device capability
    •  Responsive Web design
    •  Built on a dynamic service API
    •  New re-usable content model
    •  Dynamic assets
  13. MarkLogic's handy XInclude resolution

    Including story data on news index XML

    <item>
      <xi:include href="http://www.bbc.co.uk/asset/13447877"
                  xpointer="xmlns(bbc=http://www.bbc.co.uk/content/asset) xpointer(/bbc:story/bbc:itemMeta)">
        <xi:fallback>
          <!-- Unable to find href="http://www.bbc.co.uk/asset/13447877" xpointer="xmlns(bbc=http://www.bbc.co.uk/content/asset) xpointer(/bbc:story/bbc:itemMeta)" -->
        </xi:fallback>
      </xi:include>
      ...
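The include-with-fallback behaviour on this slide can be illustrated with a small Python sketch. This is not MarkLogic's implementation: the in-memory `STORE`, the `resolve_includes` helper, the `<missing/>` fallback element and the sample story are all hypothetical, and the `xpointer` attribute is not parsed (a relative ElementTree path, `bbc:itemMeta`, stands in for it).

```python
import copy
import xml.etree.ElementTree as ET

XI = "http://www.w3.org/2001/XInclude"
BBC = "http://www.bbc.co.uk/content/asset"

# Hypothetical in-memory stand-in for the content store, keyed by asset URI.
STORE = {
    "http://www.bbc.co.uk/asset/13447877": ET.fromstring(
        f'<bbc:story xmlns:bbc="{BBC}">'
        "<bbc:itemMeta><bbc:headline>Example headline</bbc:headline></bbc:itemMeta>"
        "<bbc:body>Full story text stays in the asset.</bbc:body>"
        "</bbc:story>"
    )
}


def resolve_includes(parent: ET.Element, select: str) -> None:
    """Replace each xi:include with the selected fragment, or with its fallback."""
    resolved = []
    for child in list(parent):
        if child.tag != f"{{{XI}}}include":
            resolve_includes(child, select)  # recurse into ordinary elements
            resolved.append(child)
            continue
        asset = STORE.get(child.get("href"))
        target = asset.find(select, {"bbc": BBC}) if asset is not None else None
        if target is not None:
            resolved.append(copy.deepcopy(target))  # leave the stored asset untouched
        else:
            fallback = child.find(f"{{{XI}}}fallback")
            if fallback is not None:
                resolved.extend(list(fallback))
    parent[:] = resolved


include = (
    '<item><xi:include href="{href}">'
    "<xi:fallback><missing/></xi:fallback>"
    "</xi:include></item>"
)
index = ET.fromstring(
    f'<index xmlns:xi="{XI}">'
    + include.format(href="http://www.bbc.co.uk/asset/13447877")
    + include.format(href="http://www.bbc.co.uk/asset/unknown")
    + "</index>"
)
resolve_includes(index, "bbc:itemMeta")
first_item, second_item = index.findall("item")
print(ET.tostring(index, encoding="unicode"))
```

After resolution the first item carries only the story's `itemMeta` fragment, while the second, whose asset is absent from the store, carries the fallback content instead.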
  14. News Index API

    Including story data on news index XML

    HTTP GET https://api.live.bbc.co.uk/content/asset/news/technology/

    HTTP Headers
    X-Candy-Audience: Domestic
    X-Candy-Platform: EnhancedMobile
    Accept: application/json

    Or

    HTTP Headers
    X-Candy-Audience: Domestic
    X-Candy-Platform: EnhancedMobile
    Accept: application/xml

    Contextualised output
    •  Audience
    •  Platform
    •  Response type
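As a rough illustration of the three context dimensions above (audience, platform, response type), the sketch below maps them onto the request headers shown on the slide. The header names, values and URL come from the slide; the `context_headers` helper and the `RESPONSE_TYPES` table are assumptions made for the example, and no request is actually sent.

```python
from urllib.request import Request

# Only the two response types shown on the slide are listed here.
RESPONSE_TYPES = {"json": "application/json", "xml": "application/xml"}


def context_headers(audience: str, platform: str, response_type: str) -> dict:
    """Map audience, platform and response type onto the slide's HTTP headers."""
    return {
        "X-Candy-Audience": audience,
        "X-Candy-Platform": platform,
        "Accept": RESPONSE_TYPES[response_type],
    }


url = "https://api.live.bbc.co.uk/content/asset/news/technology/"
json_headers = context_headers("Domestic", "EnhancedMobile", "json")
xml_headers = context_headers("Domestic", "EnhancedMobile", "xml")

# The same index URL is requested in both cases; only the headers differ.
json_request = Request(url, headers=json_headers, method="GET")
xml_request = Request(url, headers=xml_headers, method="GET")
print(json_headers)
print(xml_headers)
```

Putting the context in headers rather than in the path keeps a single URL per index, which matches the slide's use of one `GET` with two alternative header sets.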
  15. News Story API

    Including story data on news index XML

    HTTP GET https://api.live.bbc.co.uk/content/asset/news/uk-17829360

    HTTP Headers
    X-Candy-Audience: Domestic
    X-Candy-Platform: EnhancedMobile
    Accept: application/json

    Or

    HTTP Headers
    X-Candy-Audience: Domestic
    X-Candy-Platform: EnhancedMobile
    Accept: application/xml
  16. Platform future…

    BBC Sport site re-engineered to use fully dynamic approach (News Mobile style)
    BBC News high web site re-engineered to use fully dynamic approach (News Mobile style)
    Real-time Olympics 2012 stats and video overlay
    Upgrade to MarkLogic 5
    MarkLogic XA Transactions (removing handcrafted XQuery for master/master replication)
    MarkLogic Binary storage R&D
    Etc…