Slide 1

Slide 1 text

Architecting Content-Centric Applications
Patterns for Handling Content in Applications
A Content Management Primer
Why Relational Won't Cut It
What I Wish I Understood About Content Before Writing My First Real Application
(Solving SharePoint-Type Problems With an Open Source Stack)
Richard Esplin
Community Technology
Alfresco

Slide 2

Slide 2 text

Agenda ● Making the case for content management ● Best practices: the platform approach ● Introducing CMIS ● Live examples

Slide 3

Slide 3 text

What is “content”? ● Data ● Don't mistake Code for Content ● Unstructured Data ● Structured data works well in a relational data store, XML store, or key-value store ● Unstructured Binary Data ● Unstructured non-binary data works well in source control ● Examples: ● Audio, Video, Images, Office Documents, Engineering Files, Reports

Slide 4

Slide 4 text

What is a “content-centric application”? ● Applications that access binary files ● Files are often generated collaboratively ● Often must deal with large numbers of files ● May include a mix of structured and unstructured content ● May also include business processes

Slide 5

Slide 5 text

A few examples ● Web site with catalogs, white papers, and videos ● Expense report review and approval ● Contract negotiation, creation, and review ● Research study authoring ● Sales / Marketing collateral creation and communication ● Course guide authoring and publishing ● Images and media in games ● Media curation, transformation, and delivery ● Legal compliance and corporate records management

Slide 6

Slide 6 text

Or the business is saying . . . ● I've got a ton of files, ● I've got people that produce and consume them, ● I've got systems that use them, ● I want to make it easier! Doug Waldron (cc attribution share-alike) http://www.flickr.com/photos/dougww/922328173/

Slide 7

Slide 7 text

Let's build it ourselves! Pasukaru76 (cc attribution) http://www.flickr.com/photos/pasukaru76/4277763808/

Slide 8

Slide 8 text

DIY approach seems simple . . . ● “This is standard stuff.” ● Grab a web-application toolkit ● Favorite front-end / presentation framework ● Store a bunch of files ● Relational Database ● Data Model / Metadata ● Comments / Ratings ● Tagging / Categorization

Slide 9

Slide 9 text

File storage options ● On disk ● Amazon S3 or an internal CAS filer ● Source code control repository ● XML database ● NoSQL document store

Slide 10

Slide 10 text

Relational may not cut it ● Good at text and numbers. Not so good at binary. ● Good at static table definitions. Not so good at dynamic aspects. ● Size limits. ● Random seek (streaming). ● Search: Some relational databases can index into blobs, but not all.

Slide 11

Slide 11 text

Once files are figured out . . . ● Ensure security ● Execute a workflow ● Transform the content between types ● Schedule a job ● Provide shared drive access ● Versioning ● Replication ● API Access ● Integrate with authoring tools Lots of custom code!

Slide 12

Slide 12 text

“What have we done?” http://commons.wikimedia.org/wiki/File:Professor_Lucifer_Butts.gif

Slide 13

Slide 13 text

“What have we done?” gobucks2 (cc attribution non-commercial share-alike) http://www.flickr.com/photos/69331170@N00/2854583096

Slide 14

Slide 14 text

Evaluating DIY reasonableness ● Number and size of documents ● Number and concurrency of users ● Number and nature of integration points ● Business process volatility and complexity ● Time and cost of ● Integrating all of these services / sub-systems ● Maintaining all of that code . . . forever ● Access to off-the-shelf alternatives

Slide 15

Slide 15 text

Introducing the content repository ● Content = a file + metadata ● File system ● Content binaries ● Search indexes ● Database ● Relations (associations) ● Metadata ● Repository ● Abstraction layer
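The split described on this slide (binaries on the file system, metadata in a database, a repository layer hiding both) can be sketched in a few lines. This is a toy illustration only, not how Alfresco or any real repository is implemented; the class name `TinyRepository`, the table layout, and the use of SHA-256 digests as file names are all assumptions made for the example.

```python
import hashlib
import json
import sqlite3
import tempfile
from pathlib import Path


class TinyRepository:
    """Hypothetical toy repository: binaries on disk, metadata in SQLite."""

    def __init__(self, root: Path):
        self.blob_dir = root / "blobs"
        self.blob_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(root / "metadata.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS node ("
            "id INTEGER PRIMARY KEY, name TEXT, blob_key TEXT, props TEXT)"
        )

    def create_document(self, name: str, data: bytes, props: dict) -> int:
        # The binary goes to the file system, named by its SHA-256 digest.
        key = hashlib.sha256(data).hexdigest()
        (self.blob_dir / key).write_bytes(data)
        # The metadata, plus a pointer to the binary, goes to the database.
        cur = self.db.execute(
            "INSERT INTO node (name, blob_key, props) VALUES (?, ?, ?)",
            (name, key, json.dumps(props)),
        )
        self.db.commit()
        return cur.lastrowid

    def read_document(self, node_id: int):
        # Callers see one object; the two stores stay hidden behind this call.
        row = self.db.execute(
            "SELECT name, blob_key, props FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            raise KeyError(node_id)
        name, key, props = row
        return name, (self.blob_dir / key).read_bytes(), json.loads(props)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        repo = TinyRepository(Path(tmp))
        node_id = repo.create_document(
            "report.pdf", b"%PDF-1.4 fake bytes", {"author": "res"}
        )
        result = repo.read_document(node_id)
        repo.db.close()
        return result
```

Calling `demo()` round-trips one document through both stores. Everything else on the "lots of custom code" list (security, versioning, search, transformations) is still missing, which is the point of the slides that follow.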

Slide 16

Slide 16 text

Components of content-centric systems ● User Interface ● Persistence / Data Model / Metadata ● Business Process / Workflow ● Library Services (Upload / Download, Versioning, Check-in / Check-out) ● Security ● Search ● Scheduler ● Transformation / Rendition / Thumbnails ● Tagging / Categorization ● Authoring tool integration ● Remote API ● Transfer / Publication ● Comments ● Ratings ● Activity Streams / Notification ● Quotas

Slide 17

Slide 17 text

Packaged systems

Slide 18

Slide 18 text

Open source content management ● Alfresco ● Nuxeo ● Knowledge Tree ● Magnolia ● Apache Jackrabbit ● Plone ● sort-of: check out cmis4plone

Slide 19

Slide 19 text

Best Practices: The Platform Approach

Slide 20

Slide 20 text

Platform approach ● The common problems have been solved ● Content Platform = Repository + Services ● Find a platform that meets your needs ● Extend the platform with your own business logic ● Customize the UI that the platform provides ● Or write your own front-end using whatever language or framework makes sense ● Meets your current needs while providing a roadmap for the future

Slide 21

Slide 21 text

Evaluating content platforms ● Agility ● Applicable to a broad set of solutions vs a vertical specific solution ● Scale up, scale down ● Developer ergonomics ● Fast and friendly developer model ● Open Source ● Troubleshooting ● Bug tracking ● Community ● Standards compliance ● Easier integration ● Lower migration costs ● Developer familiarity

Slide 22

Slide 22 text

General architecture (diagram labels: Web Applications, Knowledge Portals, Web Services, Virtual File System, High Availability, Business Process Engine, CRM, Portal Server, App Server)

Slide 23

Slide 23 text

and

Slide 24

Slide 24 text

What is CMIS? ● Content Management Interoperability Services ● Language-independent, vendor-neutral API for content management ● Least-common-denominator (some vendors have extensions) ● CRUD functions for nodes ● Check-in / check-out ● Associations ● Permissions (Access Control Lists) ● Policies ● Queries ● Repository Traversal

Slide 25

Slide 25 text

What is CMIS? ● OASIS standard ● 30+ ECM vendors agreed to implement ● Two parts ● Interoperability through standard SOAP and AtomPub bindings – JSON bindings coming soon ● SQL-based query language for rich content repositories ● Vendor specific extensions may be useful
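Because the query language is SQL-based, user input placed into a query string needs escaping. The sketch below assumes the backslash escaping rules as described in the query section of the CMIS 1.0 specification (backslash before a quote in string literals, and before `%` and `_` inside LIKE patterns); individual repositories may differ, so verify against your server. The helper names are invented for this example.

```python
def quote_literal(value: str) -> str:
    """Quote a string for use with = or IN in a CMIS query."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def quote_like_fragment(value: str) -> str:
    """Escape a fragment so % and _ match literally inside a LIKE pattern."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return escaped.replace("%", "\\%").replace("_", "\\_")


def folders_named_like(fragment: str) -> str:
    """Build a 'name contains fragment' query against cmis:folder."""
    return (
        "SELECT * FROM cmis:folder WHERE cmis:name LIKE '%"
        + quote_like_fragment(fragment)
        + "%'"
    )
```

For example, `folders_named_like("alf")` produces the same kind of statement used in the Python demo near the end of the deck, and the resulting string can be passed to whatever query call the client library offers.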

Slide 26

Slide 26 text

Use cases ● Collaborative content creation ● Portals ● Client application integration ● Mashups ● Embedded content store ● Workflow & BPM ● Archival ● Document generation ● Digital Asset Management (DAM) ● Web Content Management (WCM) (Diagram: clients, each connected to several content repositories)

Slide 27

Slide 27 text

The beauty of CMIS (diagram labels: Presentation Tier, Content Services Tier, Enterprise Apps Tier; connections marked "?", REST, and SOAP)

Slide 28

Slide 28 text

Meet CMIS ● CMIS lets you read, search, write, update, delete, version, control, … content and metadata! (Diagram labels: Client, Content Repository, Services, Domain Model, read, write, Consumer, Provider, Vendor Mapping, Content Management Interoperability Services)

Slide 29

Slide 29 text

Types
Document ● Content ● Renditions ● Version History
Folder ● Container ● Hierarchy ● Filing
Relationship ● Source Object ● Target Object
ACL ● Target Object
Policy ● Target Object
Described by Type Definitions

Slide 30

Slide 30 text

Type Definitions
Object ● Type Id ● Parent ● Display Name ● Queryable ● Controllable
Document ● Versionable ● Allow Content
Folder
Relationship ● Source Types ● Target Types
Policy
Property ● Property Id ● Display Name ● Type ● Required ● Default Value ● …
* Custom Type
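One way to picture the type definition model is as a small tree of records in which a custom type inherits the property definitions of its parent. The sketch below mirrors the attribute names on this slide using plain Python dataclasses; it is not the API of any CMIS client library, and the identifiers `demo:contract` and `demo:expiryDate` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PropertyDefinition:
    property_id: str
    display_name: str
    datatype: str  # e.g. "string", "datetime", "integer"
    required: bool = False
    default_value: Optional[object] = None


@dataclass
class TypeDefinition:
    type_id: str
    display_name: str
    parent: Optional["TypeDefinition"] = None
    queryable: bool = True
    controllable: bool = False
    properties: Dict[str, PropertyDefinition] = field(default_factory=dict)

    def all_properties(self) -> Dict[str, PropertyDefinition]:
        """Own property definitions plus everything inherited from parents."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


# A base type with one property (heavily simplified).
document = TypeDefinition(
    type_id="cmis:document",
    display_name="Document",
    properties={
        "cmis:name": PropertyDefinition("cmis:name", "Name", "string", required=True),
    },
)

# A hypothetical custom type that extends the base document type.
contract = TypeDefinition(
    type_id="demo:contract",
    display_name="Contract",
    parent=document,
    properties={
        "demo:expiryDate": PropertyDefinition(
            "demo:expiryDate", "Expiry Date", "datetime"
        ),
    },
)
```

Here `contract.all_properties()` contains both `cmis:name` and `demo:expiryDate`, which is the behavior a custom type relies on when it extends a base type.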

Slide 31

Slide 31 text

Apache Chemistry ● Open Source implementations of CMIS ● Umbrella project for all CMIS related projects within the ASF ● OpenCMIS (Java, client and server) ● cmislib (Python, client) ● phpclient (PHP, client) ● DotCMIS (.NET, client) ● De-facto reference for CMIS and used by CMIS technical committee to test 1.1 features

Slide 32

Slide 32 text

Examples

Slide 33

Slide 33 text

My setup ● Ubuntu 11.04 ● OpenJDK 1.6.0_22 ● PHP 5.3.5 ● Python 2.7.1 ● Alfresco Community Edition 4.0.d

Slide 34

Slide 34 text

CMIS Workbench ● Download ● http://chemistry.apache.org/java/developing/tools/dev-tools-workbench.html ● Connect to Alfresco ● http://localhost:8080/alfresco/cmisatom ● Good tool for figuring out what CMIS can do ● Check out the Groovy Console!
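With the AtomPub binding, the URL a client connects to returns a service document listing the available repositories. The sketch below parses a hand-written, heavily simplified stand-in for such a document using only the standard library. The namespace URIs are the CMIS 1.0 ones; the repository id, the name, and the collection href in `SAMPLE` are made up, and a real response from a server contains far more detail.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
}

# Simplified, invented sample; not captured from a real server.
SAMPLE = """
<app:service xmlns:app="http://www.w3.org/2007/app"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/"
             xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <atom:title>Main Repository</atom:title>
    <cmisra:repositoryInfo>
      <cmis:repositoryId>demo-repo-id</cmis:repositoryId>
      <cmis:repositoryName>Main Repository</cmis:repositoryName>
      <cmis:cmisVersionSupported>1.0</cmis:cmisVersionSupported>
    </cmisra:repositoryInfo>
    <app:collection href="http://localhost:8080/example/children">
      <cmisra:collectionType>root</cmisra:collectionType>
    </app:collection>
  </app:workspace>
</app:service>
"""


def list_repositories(service_xml: str):
    """Return the id and name of every repository in a service document."""
    root = ET.fromstring(service_xml.strip())
    repos = []
    for workspace in root.findall("app:workspace", NS):
        info = workspace.find("cmisra:repositoryInfo", NS)
        if info is None:
            continue
        repos.append({
            "id": info.findtext("cmis:repositoryId", namespaces=NS),
            "name": info.findtext("cmis:repositoryName", namespaces=NS),
        })
    return repos
```

Client libraries such as those from Apache Chemistry do this parsing for you; the sketch only shows that the binding is ordinary XML over HTTP.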

Slide 35

Slide 35 text

PHP and Drupal ● Drupal CMIS Views ● http://drupal.org/project/cmis_views ● Built on Drupal CMIS ● http://drupal.org/project/cmis ● Configure a repository in settings.php ● Enable cmis_sync ● Bundles an early release of phplib ● Currently read-only ● Good for exposing unstructured data alongside a structured web page

Slide 36

Slide 36 text

Python ● Interactive demo ● PyQt + cmislib demo

Slide 37

Slide 37 text

Python ● In the shell:

    virtualenv .
    ./bin/easy_install cmislib
    ./bin/python

● In the Python interpreter:

    from cmislib.model import CmisClient

    # Connect to the AtomPub binding and pick the default repository.
    client = CmisClient("http://192.168.56.1:8080/alfresco/cmisatom", "admin", "admin")
    repo = client.defaultRepository
    repo.id
    repo.name

    for (k, v) in repo.getCapabilities().items():
        print("%s: %s" % (k, v))

    for (k, v) in repo.getRepositoryInfo().items():
        print("%s: %s" % (k, v))

    # Create a folder under the root and inspect its properties.
    root = repo.getRootFolder()
    root.name
    folder = root.createFolder('cmis-demo')
    folder.id
    folder.name

    for (k, v) in folder.properties.items():
        print("%s: %s" % (k, v))

● Continued:

    # Create, rename, and delete a text document.
    props = {}
    props["cmis:objectTypeId"] = "cmis:document"
    doc = folder.createDocumentFromString('testdoc.txt', props,
        contentString="This is a test showing how to create a text document",
        contentType='text/plain')
    doc.isCheckedOut()
    props = {}
    props['cmis:name'] = "test-updated.txt"
    doc = doc.updateProperties(props)
    doc.name
    doc.delete()
    len(folder.getChildren())

    # Metadata query, then a full-text query.
    result = repo.query("select * from cmis:folder where cmis:name like '%alf%'")
    len(result)
    for i in result:
        print(i.name)

    result = repo.query("select * from cmis:document where contains('name')")
    for i in result:
        print(i.name)

Slide 38

Slide 38 text

Where to learn more ● cmis.alfresco.com includes a public CMIS server and links to CMIS resources (check out the cheat sheet) ● Read the CMIS specification ● Apache Chemistry site has clients, lightweight server, documentation ● “Getting Started with CMIS” tutorial shows how to use cURL to hit AtomPub bindings directly ● Slideshare has some CMIS-related presentations from Alfresco DevCon

Slide 39

Slide 39 text

Attribution and Licensing ● Copyright 2012, Alfresco Software ● Some images used in this presentation are licensed under the Creative Commons by-attribution non-commercial share-alike license. ● Original work in this presentation is licensed under the Creative Commons by-attribution license. ● Thanks to Jeff Potts for allowing me to base my presentation on his.