Architecting Content-Centric Applications

Richard Esplin
February 09, 2012

Using the CMIS standard and open source tools to build applications that interact with unstructured binary content (audio, video, office documents, and other opaque formats).

Transcript

  1. Architecting Content-Centric Applications

    Patterns for Handling Content in Applications
    A Content Management Primer
    Why Relational Won't Cut It
    What I Wish I Understood About Content Before Writing My First Real Application
    (Solving SharePoint-Type Problems With an Open Source Stack)

    Richard Esplin
    Community Technology
    Alfresco
  2. Agenda

    • Making the case for content management
    • Best practices: the platform approach
    • Introducing CMIS
    • Live examples
  3. What is “content”?

    • Data
    • Don't mistake Code for Content
    • Unstructured Data
    • Structured data works well in a relational data store, XML store, or key-value store
    • Unstructured Binary Data
    • Unstructured non-binary data works well in source control
    • Examples: Audio, Video, Images, Office Documents, Engineering Files, Reports
  4. What is a “content-centric application”?

    • Applications that access binary files
    • Files are often generated collaboratively
    • Often must deal with large numbers of files
    • May include a mix of structured and unstructured content
    • May also include business processes
  5. A few examples

    • Web site with catalogs, white papers, and videos
    • Expense report review and approval
    • Contract negotiation, creation, and review
    • Research study authoring
    • Sales / Marketing collateral creation and communication
    • Course guide authoring and publishing
    • Images and media in games
    • Media curation, transformation, and delivery
    • Legal compliance and corporate records management
  6. Or the business is saying . . .

    • I've got a ton of files,
    • I've got people that produce and consume them,
    • I've got systems that use them,
    • I want to make it easier!

    Image credit: Doug Waldron (cc attribution share-alike) http://www.flickr.com/photos/dougww/922328173/
  7. DIY approach seems simple . . .

    • “This is standard stuff.”
    • Grab a web-application toolkit
    • Favorite front-end / presentation framework
    • Store a bunch of files
    • Relational Database
    • Data Model / Metadata
    • Comments / Ratings
    • Tagging / Categorization
  8. File storage options

    • On disk
    • Amazon S3 or an internal CAS filer
    • Source code control repository
    • XML database
    • NoSQL document store
  9. Relational may not cut it

    • Good at text and numbers. Not so good at binary.
    • Good at static table definitions. Not so good at dynamic aspects.
    • Size limits.
    • Random seek (streaming).
    • Search: Some relational databases can index into blobs, but not all.
  10. Once files are figured out . . .

    • Ensure security
    • Execute a workflow
    • Transform the content between types
    • Schedule a job
    • Provide shared drive access
    • Versioning
    • Replication
    • API Access
    • Integrate with authoring tools

    Lots of custom code!
  11. Evaluating DIY reasonableness

    • Number and size of documents
    • Number and concurrency of users
    • Number and nature of integration points
    • Business process volatility and complexity
    • Time and cost of
      • Integrating all of these services / sub-systems
      • Maintaining all of that code . . . forever
    • Access to off-the-shelf alternatives
  12. Introducing the content repository

    • Content = a file + metadata
    • File system
      • Content binaries
      • Search indexes
    • Database
      • Relations (associations)
      • Metadata
    • Repository
      • Abstraction layer
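The "content = a file + metadata" idea can be sketched as a toy in-memory repository. This is only an illustration under stated assumptions: `ContentItem`, `TinyRepository`, and their methods are hypothetical names invented here, and two dictionaries stand in for the file system and the database that a real repository would hide behind its abstraction layer.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ContentItem:
    """One piece of content: a reference to a binary plus its metadata."""
    item_id: int
    metadata: dict = field(default_factory=dict)


class TinyRepository:
    """Hypothetical abstraction layer over a binary store and a metadata store."""

    def __init__(self):
        self._ids = count(1)
        self._binaries = {}  # stands in for the file system
        self._items = {}     # stands in for the database

    def store(self, data, **metadata):
        """Save the binary and its metadata separately under one id."""
        item = ContentItem(next(self._ids), dict(metadata))
        self._binaries[item.item_id] = data
        self._items[item.item_id] = item
        return item

    def read(self, item_id):
        """Return the stored binary for an id."""
        return self._binaries[item_id]

    def find(self, **criteria):
        """Return items whose metadata matches every key/value given."""
        return [
            item for item in self._items.values()
            if all(item.metadata.get(k) == v for k, v in criteria.items())
        ]
```

Callers only see `store`, `read`, and `find`; where the bytes and the metadata actually live is the repository's concern, which is the point of the abstraction layer.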
  13. Components of content-centric systems

    • User Interface
    • Persistence / Data Model / Metadata
    • Business Process / Workflow
    • Library Services (Upload / Download, Versioning, Check-in / Check-out)
    • Security
    • Search
    • Scheduler
    • Transformation / Rendition / Thumbnails
    • Tagging / Categorization
    • Authoring tool integration
    • Remote API
    • Transfer / Publication
    • Comments
    • Ratings
    • Activity Streams / Notification
    • Quotas
  14. Open source content management

    • Alfresco
    • Nuxeo
    • Knowledge Tree
    • Magnolia
    • Apache Jackrabbit
    • Plone
      • sort-of: check out cmis4plone
  15. Platform approach

    • The common problems have been solved
    • Content Platform = Repository + Services
    • Find a platform that meets your needs
    • Extend the platform with your own business logic
    • Customize the UI that the platform provides
    • Or write your own front-end using whatever language or framework makes sense
    • Meets your current needs while providing a roadmap for the future
  16. Evaluating content platforms

    • Agility
    • Applicable to a broad set of solutions vs a vertical specific solution
    • Scale up, scale down
    • Developer ergonomics
    • Fast and friendly developer model
    • Open Source
    • Troubleshooting
    • Bug tracking
    • Community
    • Standards compliance
    • Easier integration
    • Lower migration costs
    • Developer familiarity
  17. General architecture

    [Diagram labels: Web Applications, Knowledge Portals, Web Services, Virtual File System, High Availability, Business Process Engine, CRM, Portal Server, App Server]
  18. and

  19. What is CMIS?

    • Content Management Interoperability Services
    • Language-independent, vendor-neutral API for content management
    • Least-common-denominator (some vendors have extensions)
    • CRUD functions for nodes
    • Check-in / check-out
    • Associations
    • Permissions (Access Control Lists)
    • Policies
    • Queries
    • Repository Traversal
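The check-in / check-out idea listed above can be pictured with a toy model. This is a sketch under stated assumptions, not CMIS itself: `VersionedDocument` and its methods are hypothetical names, and a real repository adds permissions, version labels, and persistence on top of the same basic flow (check out, edit a private working copy, check in as a new version).

```python
class VersionedDocument:
    """Toy model of a versioned document with a private working copy."""

    def __init__(self, name, content):
        self.name = name
        self.versions = [content]   # version history, oldest first
        self.working_copy = None    # only set while checked out
        self.checked_out_by = None

    def is_checked_out(self):
        return self.checked_out_by is not None

    def check_out(self, user):
        """Lock the document and start a working copy from the latest version."""
        if self.is_checked_out():
            raise RuntimeError("already checked out by %s" % self.checked_out_by)
        self.checked_out_by = user
        self.working_copy = self.versions[-1]

    def update_working_copy(self, user, content):
        """Edit the working copy; the version history is not touched yet."""
        if self.checked_out_by != user:
            raise PermissionError("only the user holding the check-out may edit")
        self.working_copy = content

    def check_in(self, user):
        """Promote the working copy to a new version and release the lock."""
        if self.checked_out_by != user:
            raise PermissionError("only the user holding the check-out may check in")
        self.versions.append(self.working_copy)
        self.working_copy = None
        self.checked_out_by = None
        return len(self.versions)
```

Other users keep seeing the last checked-in version until `check_in` runs, which is what makes collaborative editing of binary files manageable.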
  20. What is CMIS?

    • OASIS standard
    • 30+ ECM vendors agreed to implement
    • Two parts
      • Interoperability through standard SOAP and AtomPub bindings (JSON bindings coming soon)
      • SQL-based query language for rich content repositories
    • Vendor specific extensions may be useful
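To make the SQL-based query language a little more concrete, here is a small sketch that builds a query string of the same shape as the one used later in the Python demo. The helper names `cmis_quote` and `name_contains_query` are hypothetical, the escaping follows the backslash rules as I understand them from the CMIS 1.0 specification, and `type_id` is not validated, so treat it as an illustration rather than a hardened query builder.

```python
def cmis_quote(value, like=False):
    """Quote a string literal for a CMIS query.

    Backslashes and single quotes are backslash-escaped. With like=True,
    the LIKE wildcards % and _ are escaped too so they match literally.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    if like:
        escaped = escaped.replace("%", "\\%").replace("_", "\\_")
    return "'" + escaped + "'"


def name_contains_query(type_id, fragment):
    """Build a query for objects of type_id whose cmis:name contains fragment."""
    quoted = cmis_quote(fragment, like=True)
    # Put the % wildcards inside the quotes, around the escaped fragment.
    pattern = "'%" + quoted[1:-1] + "%'"
    return "SELECT * FROM %s WHERE cmis:name LIKE %s" % (type_id, pattern)


print(name_contains_query("cmis:folder", "alf"))
```

The resulting string is what a client library would send to the repository's query service over either binding.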
  21. Use cases

    • Collaborative content creation
    • Portals
    • Client application integration
    • Mashups
    • Embedded content store
    • Workflow & BPM
    • Archival
    • Document generation
    • Digital Asset Management (DAM)
    • Web Content Management (WCM)

    [Diagram labels: Client, Content Repository]
  22. Meet CMIS

    [Diagram labels: Client, Content Repository, Services, Domain Model, read, write, Consumer, Provider, Vendor Mapping, Content Management Interoperability Services]

    CMIS lets you read, search, write, update, delete, version, control, … content and metadata!
  23. Types

    Document
      • Content
      • Renditions
      • Version History
    Folder
      • Container
      • Hierarchy
      • Filing
    Relationship
      • Source Object
      • Target Object
    ACL
      • Target Object
    Policy
      • Target Object

    Described by Type Definitions
  24. Type Definitions

    * Custom Type

    Object
      • Type Id
      • Parent
      • Display Name
      • Queryable
      • Controllable
    Document
      • Versionable
      • Allow Content
    Folder
    Relationship
      • Source Types
      • Target Types
    Policy
    Property
      • Property Id
      • Display Name
      • Type
      • Required
      • Default Value
      • …
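One way to picture type definitions is as plain data: each type has an id, a parent, and a list of property definitions, and a custom type inherits its parent's properties. The sketch below is an assumption-laden illustration, not the CMIS object model: the class names, `all_properties`, `missing_required`, and the `acme:contract` type are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PropertyDefinition:
    property_id: str
    display_name: str
    property_type: str          # e.g. "string", "integer", "datetime"
    required: bool = False
    default_value: object = None


@dataclass
class TypeDefinition:
    type_id: str
    display_name: str
    parent: Optional["TypeDefinition"] = None
    properties: list = field(default_factory=list)

    def all_properties(self):
        """Own property definitions plus everything inherited from parents."""
        merged = self.parent.all_properties() if self.parent else {}
        merged.update({p.property_id: p for p in self.properties})
        return merged

    def missing_required(self, values):
        """Ids of required properties with neither a value nor a default."""
        return sorted(
            pid for pid, p in self.all_properties().items()
            if p.required and pid not in values and p.default_value is None
        )


document = TypeDefinition("cmis:document", "Document", properties=[
    PropertyDefinition("cmis:name", "Name", "string", required=True),
])
# "acme:contract" and "acme:expires" are hypothetical custom definitions.
contract = TypeDefinition("acme:contract", "Contract", parent=document, properties=[
    PropertyDefinition("acme:expires", "Expiry date", "datetime", required=True),
])
```

A call such as `contract.missing_required({"cmis:name": "nda.pdf"})` then reports which required properties a new object would still need, including inherited ones.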
  25. Apache Chemistry

    • Open Source implementations of CMIS
    • Umbrella project for all CMIS related projects within the ASF
    • OpenCMIS (Java, client and server)
    • cmislib (Python, client)
    • phpclient (PHP, client)
    • DotCMIS (.NET, client)
    • De-facto reference for CMIS and used by CMIS technical committee to test 1.1 features
  26. My setup

    • Ubuntu 11.04
    • OpenJDK 1.6.0_22
    • PHP 5.3.5
    • Python 2.7.1
    • Alfresco Community Edition 4.0.d
  27. CMIS Workbench

    • Download
      • http://chemistry.apache.org/java/developing/tools/dev-tools-workbench.html
    • Connect to Alfresco
      • http://localhost:8080/alfresco/cmisatom
    • Good tool for figuring out what CMIS can do
    • Check out the Groovy Console!
  28. PHP and Drupal

    • Drupal CMIS Views
      • http://drupal.org/project/cmis_views
    • Built on Drupal CMIS
      • http://drupal.org/project/cmis
    • Configure a repository in settings.php
    • Enable cmis_sync
    • Bundles an early release of phplib
    • Currently read-only
    • Good for exposing unstructured data alongside a structured web page
  29. Python

    • In the shell:

        virtualenv .
        ./bin/easy_install cmislib
        ./bin/python

    • At the Python prompt:

        from cmislib.model import CmisClient

        # Connect to Alfresco through the AtomPub binding
        client = CmisClient("http://192.168.56.1:8080/alfresco/cmisatom", "admin", "admin")
        repo = client.defaultRepository
        repo.id
        repo.name

        # List what the repository supports
        for (k, v) in repo.getCapabilities().items():
            print("%s: %s" % (k, v))
        for (k, v) in repo.getRepositoryInfo().items():
            print("%s: %s" % (k, v))

        # Create a folder under the root and inspect its properties
        root = repo.getRootFolder()
        root.name
        folder = root.createFolder('cmis-demo')
        folder.id
        folder.name
        for (k, v) in folder.properties.items():
            print("%s: %s" % (k, v))

    • Continued:

        # Create a text document in the new folder
        props = {}
        props["cmis:objectTypeId"] = "cmis:document"
        doc = folder.createDocumentFromString('testdoc.txt', props,
            contentString="This is a test showing how to create a text document",
            contentType='text/plain')
        doc.isCheckedOut()

        # Rename the document, then delete it
        props = {}
        props['cmis:name'] = "test-updated.txt"
        doc = doc.updateProperties(props)
        doc.name
        doc.delete()
        len(folder.getChildren())

        # Query by name pattern
        result = repo.query("select * from cmis:folder where cmis:name like '%alf%'")
        len(result)
        for i in result:
            print(i.name)

        # Full-text query
        result = repo.query("select * from cmis:document where contains('name')")
        for i in result:
            print(i.name)
  30. Where to learn more

    • cmis.alfresco.com includes a public CMIS server and links to CMIS resources (check out the cheat sheet)
    • Read the CMIS specification
    • Apache Chemistry site has clients, lightweight server, documentation
    • “Getting Started with CMIS” tutorial shows how to use cURL to hit AtomPub bindings directly
    • SlideShare has some CMIS-related presentations from Alfresco DevCon
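For a feel of what comes back when the AtomPub binding is hit directly, the sketch below pulls CMIS properties out of an Atom entry. The sample entry is hand-written and heavily trimmed for illustration, not captured from a real server, and `read_properties` is a hypothetical helper; the namespace URIs are the CMIS 1.0 ones as I understand them.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
}

# Hand-written, trimmed example of a folder entry (an assumption, not server output).
SAMPLE_ENTRY = """
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
            xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/"
            xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/">
  <atom:title>cmis-demo</atom:title>
  <cmisra:object>
    <cmis:properties>
      <cmis:propertyId propertyDefinitionId="cmis:objectTypeId">
        <cmis:value>cmis:folder</cmis:value>
      </cmis:propertyId>
      <cmis:propertyString propertyDefinitionId="cmis:name">
        <cmis:value>cmis-demo</cmis:value>
      </cmis:propertyString>
    </cmis:properties>
  </cmisra:object>
</atom:entry>
""".strip()


def read_properties(entry_xml):
    """Map each property id in an Atom entry to its value (or list of values)."""
    entry = ET.fromstring(entry_xml)
    props = {}
    for prop in entry.findall("cmisra:object/cmis:properties/*", NS):
        values = [v.text for v in prop.findall("cmis:value", NS)]
        props[prop.get("propertyDefinitionId")] = values[0] if len(values) == 1 else values
    return props


print(read_properties(SAMPLE_ENTRY))
```

Client libraries such as cmislib do this kind of parsing for you; working through it once by hand mainly helps when debugging raw responses.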
  31. Attribution and Licensing

    • Copyright 2012, Alfresco Software
    • Some images used in this presentation are licensed under the Creative Commons by-attribution non-commercial share-alike license.
    • Original work in this presentation is licensed under the Creative Commons by-attribution license.
    • Thanks to Jeff Potts for allowing me to base my presentation on his.