EdSense: Building a self-adapting, interactive learning portal with Couchbase and ElasticSearch

Chris Tse
September 21, 2012

Full video of the talk is now on YouTube: http://www.youtube.com/watch?v=0mQt5gEOIhI

Talk from Christopher Tse (@christse), Director of McGraw-Hill Education Labs (MHE Labs), on how to architect a scalable adaptive learning system using a combination of Couchbase 2.0 and ElasticSearch as back-ends. These slides were presented at CouchConf San Francisco on September 21, 2012.

Code for the proof-of-concept project, called "Learning Portal", has been open sourced and is available on GitHub at http://github.com/couchbaselabs/learningportal

Transcript

  1. EdSense: Building a self-adapting, interactive learning portal with Couchbase

    Christopher Tse (@christse) · CouchConf San Francisco · September 21, 2012
  2. Research Development New Ventures & Open Source &

  3. Research Development New Ventures & I am the Director of

    Heading our Looking for opportunities in education efforts A strong believer in the power of Chris Tse @christse Open Source
  4. Open Source is the reason why I’m here

  5. The Problem

    As learning moves online in great numbers, we need to build interactive learning environments that scale!
    • Scale to millions of learners
    • Serve MHE as well as third-party content, including open content
    • Support learning apps
    • Self-adapt via usage data
  6. The Approach

    • Ride the Google-Apple-Mozilla-Microsoft HTML5 innovation train
    • Easily support interactive content on desktop and mobile devices
    • Develop in the language of the Web: JavaScript
    • Deliver content and learning tools via customer-facing APIs
    • Build bridges to existing enterprise systems under the APIs
    • Provide all user data via a uniform security policy. No God mode!
  7. The Challenge

    Backend is an Interactive Content Delivery Cloud that must:
    • Allow for elastic scaling under spike periods
    • Be able to catalog & deliver content from many sources
    • Provide consistent low latency for metadata and stats access
    • Support full-text search for content discovery
    • Offer tunable content ranking & recommendation functions
    Hmmm...this looks kinda like: Content Caching (Scale) + Social Gaming (Stats) + Ad Targeting (Smarts)
    So we experimented with a combination of: XML Databases, SQL/MR Engines, In-memory Data Grids, Enterprise Search Servers
  8. The Technologies

    • Back-end: NoSQL OLTP-OLAP Database; JSON-REST Search Engine
    • Middleware: Java-based Integration & Security Framework
    • Front-end: JavaScript MVC & ORM Framework
  9. The Overall Architecture

    • Front-end: Interactive Content Modules or “Cards”. Ember = JavaScript + HTML + CSS
    • Middleware: Content Access & Authoring APIs. Ziniki = REST + WebSockets on JVM
    • Back-end: Persistence & Analytics. Couchbase = JSON Docs + JavaScript MapReduce
    • Back-end: Search Index & Ranking. ElasticSearch = JSON Docs + Query DSL
  10. Card UI (Front-end)

    • Looks like web content
    • Works like a mobile app
    • Feels like a saved file
    • Pure HTML5, no Flash or PDF
    • Edit button is built-in
    • Drag & drop across all content types
    • Card state is stored as JSON
  11. Card State (Back-end)

    Card state is stored as JSON.
    • Is stored as JSON documents using open source NoSQL technologies proven to scale:
      • Zynga ‣ Couchbase
      • Infochimps ‣ ElasticSearch
    • Updates in real-time:
      • Indexing using incremental MapReduce
      • Client synchronization via Memcached protocol
    • Couchbase Transaction Cluster: manages all the content, user profiles, user preferences and user behavior
    • Couchbase Analytical Cluster: performs real-time analytics on user profile and user behavior data
    • ElasticSearch Cluster: full-text indexes all incoming content and applies ranking functions upon request
  12. (1) Store full-text articles as well as document metadata for image, video and text content in Couchbase

    (2) Log user behavior to calculate user preference statistics (e.g. video > text)
    (3) Continuously accept updates from Couchbase with new content & stats
    (4) Combine user preference statistics with custom relevancy scoring to provide personalized search results
  13. Introducing Learning Portal

    • Designed and built as a collaboration between MHE Labs and Couchbase
    • Serves as proof-of-concept and testing harness for Couchbase + ElasticSearch integration
    • Available for download and further development as open source code
  14. None
  15. Calculating statistics via Couchbase 2.0 Views

    Top Contributors & Tags driven by Incremental MapReduce Views
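As a rough illustration of how "Top Contributors & Tags" could fall out of a map step plus a counting reduce, here is a minimal Python simulation. It does not use Couchbase (real Couchbase 2.0 views are written in JavaScript); the sample documents, the field names `tags` and `contributor`, and the helper names are assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical content metadata documents; the field names are assumptions.
DOCS = [
    {"id": "a1", "type": "video", "contributor": "alice", "tags": ["biology", "cells"]},
    {"id": "a2", "type": "text", "contributor": "bob", "tags": ["biology"]},
    {"id": "a3", "type": "image", "contributor": "alice", "tags": ["plankton", "biology"]},
]


def map_tags(doc):
    """Map step: emit one (tag, 1) pair for every tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def map_contributors(doc):
    """Map step: emit (contributor, 1) for documents that name one."""
    if "contributor" in doc:
        yield doc["contributor"], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def build_view(docs, map_fn):
    """Full build: map every document, then reduce."""
    return reduce_counts(pair for doc in docs for pair in map_fn(doc))


def apply_new_doc(totals, doc, map_fn):
    """Incremental update: fold in only the new document's emissions."""
    totals.update(reduce_counts(map_fn(doc)))  # Counter.update adds counts
    return totals


top_tags = build_view(DOCS, map_tags)
top_contributors = build_view(DOCS, map_contributors)
print(top_tags.most_common(1))          # most frequent tag
print(top_contributors.most_common(1))  # most active contributor
```

The point of `apply_new_doc` is that a new document only contributes its own emissions to the existing totals, which is the idea behind keeping such statistics up to date incrementally rather than recomputing them from scratch.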
  16. Tuning content ranking via ElasticSearch

    ElasticSearch-driven based on settings below:
    • Content popularity boost
    • User preference boost
  17. Proof-of-Concept Architecture

    [Diagram] Components and labels:
    • App Server, hosted on Heroku
    • Couchbase Server Cluster with MR Views, reached via the Couchbase Ruby SDK (Data, MR Query)
    • ElasticSearch Server Cluster, reached via ES Queries over HTTP (TS Query, Doc Refs)
    • XDCR-based CB-ES Transport (Cross Data Center Replication)
    • External Media Store
  18. Data Modeling

    Content Metadata Bucket
    • Stores content metadata for media objects and content for articles
    • Includes tags, contributors, type information
    • Includes pointer to the media
    User Profiles Bucket
    • Stores user view details per type
    • Updated with a running count every time a user views a doc
    • To be used for customizing ES search results per user preference
    Content Stats Bucket
    • Stores content view details
    • Updated every time a document is viewed
    • To be used for boosting ES search results based on popularity
  19. Content Metadata Bucket: Sample Document

  20. User Profile Bucket: Sample Document (145 → 146 on “click!”)

  21. EdSense: The Adaptive Bits (Version 0.1)

  22. [Flow diagram] Labels:

    Documents with Aggregated Scores · Custom Scoring Algorithm · User’s Content Preference · Content Popularity Statistics · User Query Tuned with Preferences · Personalized Content Results · Increment MapReduce · Upon User Request · Continuous Push · Do Search Against · Generate · Metadata + External Assets · Performs Action · User Event Log
  23. Analytics & Event Logging

    {
      "_id": "4ae5be2df3122f06ba45b70753001841",
      "_rev": "1-0013b349ffc3afc700000000068000000",
      "$flags": 0,
      "#expiration": 0,
      "type": "access",
      "user": "chris.tse@gmail.com",
      "resource": "379823",
      "timestamp": "2012-09-02T22:46:07Z"
    }
    {
      "_id": "4ae5be2df3122f06ba45b70753001842",
      "_rev": "1-0013b349ffc3afc700000000068000000",
      "$flags": 0,
      "#expiration": 0,
      "type": "create",
      "user": "chris.tse@gmail.com",
      "resource": "948177",
      "resource": "719301",
      "timestamp": "2012-09-02T22:45:59Z"
    }
    What? (type) Who? (user) Which? (resource) When? (timestamp)
    • Store full event log for offline analysis
    • Stored on a separate analytics cluster
    • Limit impact on OLTP
    • Tuned differently
    • Keep an upper-bound on data size via TTL (24 hrs)
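A minimal sketch of producing such an event document together with a 24-hour expiry, assuming the caller hands the returned key, document and TTL to whatever store client it uses. The `make_event` helper, the random key format and the example address are hypothetical; no Couchbase SDK call is shown.

```python
import json
import uuid
from datetime import datetime, timezone

EVENT_TTL_SECONDS = 24 * 60 * 60  # the 24 hr upper bound from the slide


def make_event(event_type, user, resource, now=None):
    """Build (key, document, ttl) for one log entry.

    `now` must be a timezone-aware datetime; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    key = uuid.uuid4().hex  # assumption: a random 32-character hex key
    doc = {
        "type": event_type,   # What?
        "user": user,         # Who?
        "resource": resource, # Which?
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),  # When?
    }
    return key, doc, EVENT_TTL_SECONDS


key, doc, ttl = make_event("access", "learner@example.com", "379823")
print(key, json.dumps(doc), ttl)
```

Passing the TTL alongside each write, rather than purging old events in a batch job, is one simple way to keep the analytics data set bounded.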
  24. User Preference Boost

    Use ElasticSearch filter boosting:
    { "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER }
  25. Document Popularity Boost

    Use ElasticSearch custom script to score documents:
    "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)"
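The arithmetic of the script can be checked outside ElasticSearch with a direct Python transcription. The function and parameter names are chosen for this sketch, and the sample numbers are made up.

```python
def popularity_score(score, popularity, avg_popularity, popularity_slider):
    """Transcription of: _score * (((popularity + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)."""
    if avg_popularity <= 0:
        raise ValueError("avg_popularity must be positive")
    return score * (((popularity + 1) / avg_popularity) * popularity_slider)


# A document viewed 145 times, against an average of 73 views, doubles its score.
print(popularity_score(2.0, 145, 73, 1.0))
# A document with no views keeps a small non-zero score because of the "+ 1".
print(popularity_score(2.0, 0, 73, 1.0))
```

Setting the slider to 0 would zero out every score under this formula, so in practice the slider presumably has a positive lower bound; the slides do not say.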
  26. Combined Algorithm in a Query

    "filters": [
      { "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER },
      … image and text filters omitted …
    ],
    "score_mode": "total" } },
    "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)" }

    + More Like This + Fuzzy Matching + Facets + Tunable Relevance + Lucene Expressions + more
  27. The Future

    EdSense: Real-time Reactions
    [Diagram] Labels: Action · Collections · Learning Style · Engagement · User Intents · Recommendations · Reaction · Activity Log · Achievements · Efficacy
  28. None
  29. Preview of Project Medici

    [Screenshot of a course page mockup] Navigation: Dashboard, My Courses, Reports, Discussions. Course: Fundamentals of Biology (Biology 101), Professor Lo, with Syllabus and Overview. Topic: The Cell, with Lesson Overview, Lesson Objective and Add Card. Lesson 04: Molecules of Life, 03.24 – 03.30. Cards are tagged INSTRUCT, EXPLORE, PRACTICE and TEST and include "How Cells Create Energy", "The Secret Life of Plankton", "Quiz 03" and an essay prompt on the Krebs Cycle; the remaining card text is placeholder copy.
  30. [Screenshot of a course page mockup]

    Navigation: Dashboard, My Courses, Reports, Discussions. Topic: The Cell, with Lesson Overview, Lesson Objective and Add Card. Lesson 04: Molecules of Life, 03.24 – 03.30. Cards tagged INSTRUCT, EXPLORE, PRACTICE and TEST, including "How Cells Create Energy", "The Secret Life of Plankton" and "Quiz 03"; the remaining card text is placeholder copy.
  31. Questions?

    Follow me on Twitter: @christse
    Sign up for our beta list at mhelabs.com
  32. Open Source: The reason I’m here :) http://github.com/couchbaselabs/learningportal