Using the CMIS standard and open source tools to build applications that interact with unstructured binary content (audio, video, office documents, and other opaque formats).
Content Management Primer Why Relational Won't Cut It What I Wish I Understood About Content Before Writing My First Real Application (Solving SharePoint-Type Problems With an Open Source Stack) Richard Esplin Community Technology Alfresco
Content • Unstructured Data • Structured data works well in a relational data store, XML store, or key-value store • Unstructured Binary Data • Unstructured non-binary data works well in source control • Examples: • Audio, Video, Images, Office Documents, Engineering Files, Reports
files • Files are often generated collaboratively • Often must deal with large numbers of files • May include a mix of structured and unstructured content • May also include business processes
and videos • Expense report review and approval • Contract negotiation, creation, and review • Research study authoring • Sales / Marketing collateral creation and communication • Course guide authoring and publishing • Images and media in games • Media curation, transformation, and delivery • Legal compliance and corporate records management
got a ton of files, • I've got people that produce and consume them, • I've got systems that use them, • I want to make it easier! Doug Waldron (cc attribution share-alike) http://www.flickr.com/photos/dougww/922328173/
standard stuff.” • Grab a web-application toolkit • Favorite front-end / presentation framework • Store a bunch of files • Relational Database • Data Model / Metadata • Comments / Ratings • Tagging / Categorization
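The "store files in a relational database" starting point listed above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and helper functions are hypothetical and only stand in for whatever data model a real application would pick.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE document (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    mime_type TEXT NOT NULL,
    content   BLOB NOT NULL
);
CREATE TABLE tag (
    document_id INTEGER NOT NULL REFERENCES document(id),
    label       TEXT NOT NULL,
    PRIMARY KEY (document_id, label)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_document(conn, name, mime_type, content, tags=()):
    """Insert the file bytes as a BLOB plus one row per tag; return the new id."""
    cur = conn.execute(
        "INSERT INTO document (name, mime_type, content) VALUES (?, ?, ?)",
        (name, mime_type, sqlite3.Binary(content)),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tag (document_id, label) VALUES (?, ?)",
        [(doc_id, label) for label in tags],
    )
    conn.commit()
    return doc_id


def find_by_tag(conn, label):
    """Return the names of all documents carrying the given tag, sorted by name."""
    rows = conn.execute(
        "SELECT d.name FROM document d JOIN tag t ON t.document_id = d.id "
        "WHERE t.label = ? ORDER BY d.name",
        (label,),
    )
    return [name for (name,) in rows]
```

Metadata and tagging fit comfortably here; the file bytes themselves are the part that sits awkwardly in a table.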
numbers. Not so good at binary. • Good at static table definitions. Not so good at dynamic aspects. • Size limits. • Random seek (streaming). • Search: Some relational databases can index into blobs, but not all.
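To make the random seek (streaming) point concrete, here is a small sketch of reading a byte range from content kept as a plain file. The function names are hypothetical. The contrast it is meant to suggest, as an assumption rather than a statement about any particular database, is that a BLOB column fetched through a typical driver often arrives as one whole value, whereas a file can be opened at an arbitrary offset.

```python
import os
import tempfile


def read_range(path, offset, length):
    """Return up to `length` bytes starting at byte `offset` of the file."""
    with open(path, "rb") as f:
        f.seek(offset)          # jump straight to the requested position
        return f.read(length)   # read only the slice that was asked for


def demo():
    """Write a 1 KiB scratch file, then read four bytes from the middle of it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(bytes(range(256)) * 4)
        return read_range(path, 300, 4)
    finally:
        os.remove(path)
```

A media player or HTTP range request needs exactly this kind of access, which is one reason content repositories tend to keep binaries outside the relational tables.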
security • Execute a workflow • Transform the content between types • Schedule a job • Provide shared drive access • Versioning • Replication • API Access • Integrate with authoring tools Lots of custom code!
Number and concurrency of users • Number and nature of integration points • Business process volatility and complexity • Time and cost of • Integrating all of these services / sub-systems • Maintaining all of that code . . . forever • Access to off-the-shelf alternatives
Content Platform = Repository + Services • Find a platform that meets your needs • Extend the platform with your own business logic • Customize the UI that the platform provides • Or write your own front-end using whatever language or framework makes sense • Meets your current needs while providing a roadmap for the future
set of solutions vs a vertical specific solution • Scale up, scale down • Developer ergonomics • Fast and friendly developer model • Open Source • Troubleshooting • Bug tracking • Community • Standards compliance • Easier integration • Lower migration costs • Developer familiarity
agreed to implement • Two parts • Interoperability through standard SOAP and AtomPub bindings – JSON bindings coming soon • SQL-based query language for rich content repositories • Vendor-specific extensions may be useful
project for all CMIS-related projects within the ASF • OpenCMIS (Java, client and server) • cmislib (Python, client) • phpclient (PHP, client) • DotCMIS (.NET, client) • De facto reference for CMIS and used by the CMIS technical committee to test 1.1 features
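As a rough illustration of the SQL-based query language, the sketch below builds two CMIS query strings of the kind a client library would send to the repository. The helper names are hypothetical. The escaping rule (a backslash before single quotes and backslashes in string literals) is how I read the CMIS specification and should be checked against it; full-text expressions inside CONTAINS have additional escaping rules that this sketch does not attempt to cover.

```python
def cmis_quote(value):
    """Wrap a string in single quotes, backslash-escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def like_query(type_id, prop, pattern):
    """Build a query matching objects of `type_id` whose `prop` is LIKE `pattern`."""
    return "SELECT * FROM %s WHERE %s LIKE %s" % (type_id, prop, cmis_quote(pattern))


def fulltext_query(type_id, text):
    """Build a full-text query; assumes `text` is a simple search term."""
    return "SELECT * FROM %s WHERE CONTAINS(%s)" % (type_id, cmis_quote(text))
```

For example, `like_query("cmis:folder", "cmis:name", "%alf%")` produces a statement that selects folders whose name contains "alf", and the resulting string can be handed to a client's query call.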
Built on Drupal CMIS • http://drupal.org/project/cmis • Configure a repository in settings.php • Enable cmis_sync • Bundles an early release of phplib • Currently read-only • Good for exposing unstructured data alongside a structured web page
from cmislib.model import CmisClient

# Connect to the repository's AtomPub endpoint (demo host and credentials).
client = CmisClient("http://192.168.56.1:8080/alfresco/cmisatom", "admin", "admin")
repo = client.defaultRepository
repo.id
repo.name

# Inspect what the repository supports.
for (k, v) in repo.getCapabilities().items():
    print("%s: %s" % (k, v))
for (k, v) in repo.getRepositoryInfo().items():
    print("%s: %s" % (k, v))

# Create a folder under the root and look at its properties.
root = repo.getRootFolder()
root.name
folder = root.createFolder('cmis-demo')
folder.id
folder.name
for (k, v) in folder.properties.items():
    print("%s: %s" % (k, v))

• Continued:

# Create a text document in the new folder.
props = {}
props["cmis:objectTypeId"] = "cmis:document"
doc = folder.createDocumentFromString('testdoc.txt', props,
    contentString="This is a test showing how to create a text document",
    contentType='text/plain')
doc.isCheckedOut()

# Rename the document, then delete it.
props = {}
props['cmis:name'] = "test-updated.txt"
doc = doc.updateProperties(props)
doc.name
doc.delete()
len(folder.getChildren())

# Query by name pattern, then by full-text search.
result = repo.query("select * from cmis:folder where cmis:name like '%alf%'")
len(result)
for i in result:
    print(i.name)
result = repo.query("select * from cmis:document where contains('name')")
for i in result:
    print(i.name)
server and links to CMIS resources (check out the cheat sheet) • Read the CMIS specification • Apache Chemistry site has clients, lightweight server, documentation • “Getting Started with CMIS” tutorial shows how to use cURL to hit AtomPub bindings directly • Slideshare has some CMIS-related presentations from Alfresco DevCon
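Working against the AtomPub binding directly generally starts with the repository's service document, an XML file listing each repository and its collections. The sketch below parses such a document with Python's standard library. The sample XML is hand-written and heavily trimmed for illustration, not captured from a real server, and the URL, repository id, and function name are hypothetical; a real service document carries many more elements.

```python
import xml.etree.ElementTree as ET

# Namespaces used by AtomPub and by the CMIS 1.0 AtomPub binding.
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
}

# Hand-written, trimmed sample; all values are made up for illustration.
SAMPLE = """
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/"
             xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <atom:title>Main Repository</atom:title>
    <app:collection href="http://example.invalid/cmis/children">
      <atom:title>Root Collection</atom:title>
      <cmisra:collectionType>root</cmisra:collectionType>
    </app:collection>
    <cmisra:repositoryInfo>
      <cmis:repositoryId>repo-1</cmis:repositoryId>
      <cmis:repositoryName>Main Repository</cmis:repositoryName>
    </cmisra:repositoryInfo>
  </app:workspace>
</app:service>
"""


def list_repositories(service_xml):
    """Return id, name, and root collection URL for each workspace in the document."""
    root = ET.fromstring(service_xml)
    repos = []
    for ws in root.findall("app:workspace", NS):
        info = ws.find("cmisra:repositoryInfo", NS)
        if info is None:
            continue  # skip workspaces that carry no repository information
        root_href = None
        for coll in ws.findall("app:collection", NS):
            kind = coll.findtext("cmisra:collectionType", default="", namespaces=NS)
            if kind == "root":
                root_href = coll.get("href")
        repos.append({
            "id": info.findtext("cmis:repositoryId", namespaces=NS),
            "name": info.findtext("cmis:repositoryName", namespaces=NS),
            "root_collection": root_href,
        })
    return repos
```

Against a live server, the same function could be fed the body of an authenticated HTTP GET on the AtomPub service URL; client libraries such as cmislib hide this step.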
images used in this presentation are licensed under the Creative Commons by-attribution non-commercial share-alike license. • Original work in this presentation is licensed under the Creative Commons by-attribution license. • Thanks to Jeff Potts for allowing me to base my presentation on his.