Slide 1

Slide 1 text

Big Data Portal using Liferay and MongoDB
June 14, 2012

Slide 2

Slide 2 text

About CIGNEX Datamatics

Slide 3

Slide 3 text

What Does CIGNEX Datamatics Do?
Since 2000, CIGNEX Datamatics has helped its clients save in excess of US$500 million by leveraging commercial open source software across 200+ implementations.
We are experts at Open Source!
• 2011 Liferay Partner of the Year & Global Platinum Partner
• Thought leader in the open source community and author of technical resource guides

Slide 4

Slide 4 text

About the presenter
• Yash Badiani is the Big Data Practice Lead at CIGNEX Datamatics and is focused on Big Data technologies like MongoDB & Hadoop.
• He has worked extensively on large data warehousing & business intelligence projects with tools like Business Objects, Microsoft SQL Server, MicroStrategy & IBM Cognos.
• Yash can be reached at [email protected]

Slide 5

Slide 5 text

Agenda
• What is Big Data Portal
• Introduction to Portals & Liferay
• Key challenges with content and RDBMS
• Introduction to MongoDB
• Power of Liferay + Power of MongoDB = Big Data portal
• Benefits
• Solution Details
  – User View
  – Administrator View
  – Developer View
• Summary

Slide 6

Slide 6 text

What is Big Data Portal

Slide 7

Slide 7 text

What is Big Data Portal?
A Big Data Portal is a web-based solution that combines the powerful presentation capabilities of a portal, such as a rich user interface, collaboration and secure access, with centralized and massively scalable back-end data storage holding a variety of content (audio, video, images, documents, metadata) in large volumes.

Slide 8

Slide 8 text

Introduction to Portals & Liferay

Slide 9

Slide 9 text

What are Portals?
• A software platform for building websites and web applications
• For end users: a single, personalized point of access to relevant and authoritative information
• For business organizations: a unified place to engage, support, learn and respond to customers
• For IT organizations: agile, scalable web apps; enable collaboration; delegate responsibilities

Slide 10

Slide 10 text

What is Liferay?
• Enterprise web platform for building business solutions
• Leading open source portal
  – Strong community
  – 4 million downloads
  – 350,000 – 500,000 deployments worldwide
  – Leader in Gartner's Magic Quadrant for Horizontal Portals
Capabilities
• Built on Java – cross-platform & lightweight
• Content & document management with MS Office integration
• Web publishing & shared workspaces
• Enterprise collaboration & application integration
• Enterprise portals & identity management
• Social networking & mashups

Slide 11

Slide 11 text

Key challenges with content

Slide 12

Slide 12 text

Key challenges with content
• Variety of content
• Centralization of content
• Volume
*Reference image

Slide 13

Slide 13 text

Limitations of RDBMS

Slide 14

Slide 14 text

Evolution in computing is impacting traditional RDBMS
• Volume of Data
  – Trillions of records
  – Hundreds of millions of queries per second
• Agile Development
  – Iterative
  – Continuous
• New Hardware Architectures
  – Commodity servers
  – Cloud computing
Source: 10gen Corp Overview

Slide 15

Slide 15 text

Business Limitations of RDBMS
• Cost of database increases
  – Vertical, not horizontal, scaling
  – High cost of SAN
• Productivity decreases
  – Need to add new software layers for ORM, caching, sharding and message queues
  – Polymorphic, semi-structured and unstructured data not well supported
Source: 10gen Corp Overview

Slide 16

Slide 16 text

Introduction to MongoDB

Slide 17

Slide 17 text

What is MongoDB?
MongoDB is a scalable, high-performance NoSQL database.
• Open source, written in C++
• Document-oriented storage
  – Based on JSON documents
  – Schema-less
• Cool Vendor – Information Infrastructure and Big Data – 2012
• Full-featured indexes and query language
• Replication & high availability
• Auto-sharding
Source: 10gen Corp Overview

Slide 18

Slide 18 text

Why Should Organizations Use MongoDB?
• Easy to code, for increased agility
• Easy to scale, for performance & high availability
• Easy to operate, even in the cloud
Source: 10gen Corp Overview

Slide 19

Slide 19 text

Power of Liferay + Power of MongoDB = Big Data portal

Slide 20

Slide 20 text

Solution: Combining Liferay & MongoDB
Portal with rich UI front end
• Secure access – role based, site based
• Versioning
• Search
• Locking
• Mobile access
Powerful back end
• Structured and unstructured data
• Massively scalable
• Highly reliable data storage
• High performance
• Highly flexible schema and development
Big Data Portal (diagram layers): Rich UI features, Connector, Data storage, Data assets

Slide 21

Slide 21 text

Benefits
How does MongoDB enhance Liferay?
How does Liferay enhance MongoDB?

Slide 22

Slide 22 text

Benefits: How MongoDB enhances Liferay
• Scalability
  – Leverage auto-sharding & replica set features
  – Elasticity in scaling storage – go up or down
• Cost Effectiveness
  – Commodity hardware – eliminates network storage like SAN
  – Eliminates the need for high-end storage systems such as EMC Documentum
• Agility & Performance
  – Faster development
  – Easier deployment
  – Flexible & schema-less
• Large Object Storage & Centralized Data Management
  – GridFS enables storage of large binary objects such as images, video or audio
  – Simplifies management of data
  – Single system to manage structured & unstructured data

Slide 23

Slide 23 text

Benefits: How Liferay enhances MongoDB
• Rich Front End: powerful websites consisting of
  – Gadgets & portlets – portions of a web page that may be a complete application
  – Pages & themes – common, consistent look & feel across multiple pages
  – Navigation – menu bar, tabs, links
• Secure Views to Data
  – Role based
  – Site based
  – Login status based
• Mobile Integration
  – Data access on the go
  – Different themes for mobile – HTML5, CSS3
• Flexible Architecture and Lean Platform
  – Use of open standards, web services and integration tools
  – SOA

Slide 24

Slide 24 text

Solution Details
User, Administrator and Developer Views

Slide 25

Slide 25 text

Solution & Features
• News site
  – CIGNEX News portal providing a secure user interface to:
    • Latest news articles & archives
    • Latest videos & archives
    • Images & archives
  – Features of the portal:
    • Lets content authors add, delete, update and retrieve documents, lock them for updates and version them
    • Lets administrators configure fine-grained access control to the site – role based, site based, folder / file based
    • Provides workflow support for content review & finalization at various levels
  – Scalability & flexibility in content storage: provides scalable & flexible data storage that can grow with an ever-increasing variety of content

Slide 26

Slide 26 text

Solution & Features
• Statistics
  – 1M content files uploaded for demo
  – Scalable up to 100s of millions
  – Video files larger than 100 MB stored in GridFS

Slide 27

Slide 27 text

User View
[Screenshot; diagram labels: MongoDB]

Slide 28

Slide 28 text

User View
Folders organize the data, with View / Edit privileges at each folder.

Slide 29

Slide 29 text

User View
Videos folder containing video files of different types stored in MongoDB
• File-level Edit access
• File-level View-only access

Slide 30

Slide 30 text

Administrator View
Creating different roles – Regular, Site, Org

Slide 31

Slide 31 text

Administrator View
Creating users (user1)

Slide 32

Slide 32 text

Administrator View
• Assigning a role (newsrole) to a user (user1)

Slide 33

Slide 33 text

Administrator View
• Assigning a role (newsrole) to a user (user1)
• Assigning rights to the role

Slide 34

Slide 34 text

Administrator View
• Assigning a role (newsrole) to a user (user1)
• Assigning rights to the role
• Assigning file-level permissions to the role

Slide 35

Slide 35 text

Developer View – Technical Architecture
Components shown in the diagram:
• Document Library Portlet
• Store: CD MongoDBFileSystemStore [storing data in MongoDB]
• MongoDB Connector
• MongoDB (GridFS)
• MySQL (metadata)
• Lucene (indexing & search)

Slide 36

Slide 36 text

Developer View – Technologies & Components
• Technologies used:
  – liferay-portal-6.1.10-ee-ga1
  – mongodb-linux-x86_64-2.0.4
• Liferay Extension plugin
  – Method of extending Liferay
  – Allows usage of internal APIs / overwriting files in Liferay core
  – Requires the server to be restarted after changes are deployed
• Liferay portal configuration files
  – portal.properties – main configuration file for Liferay portal; contains a detailed explanation of the properties
  – portal-ext.properties – used to change the value of any of the properties defined in portal.properties
  – portal-ext.properties also contains the reference to the custom implementation class and the Mongo host & access information
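The override relationship between the two configuration files can be sketched with the standard java.util.Properties defaults mechanism. This is only an illustration under stated assumptions, not how Liferay itself loads its configuration: the key dl.store.impl is assumed to be the document library store setting, and the class name com.example.store.MongoDBFileSystemStore and the mongo.host / mongo.port keys are hypothetical names invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class PortalPropertiesSketch {

    // Parses key=value text into a Properties object layered over the given defaults.
    static Properties load(String text, Properties defaults) {
        Properties props = new Properties(defaults);
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    // Effective configuration: values from the ext file win, everything else
    // falls back to the base file.
    static Properties effective(String portalProperties, String portalExtProperties) {
        Properties defaults = load(portalProperties, null);
        return load(portalExtProperties, defaults);
    }

    public static void main(String[] args) {
        // Stand-in for portal.properties (key assumed, see lead-in).
        String base =
            "dl.store.impl=com.liferay.portlet.documentlibrary.store.FileSystemStore\n";
        // Stand-in for portal-ext.properties; class name and mongo.* keys are hypothetical.
        String ext =
            "dl.store.impl=com.example.store.MongoDBFileSystemStore\n"
            + "mongo.host=localhost\n"
            + "mongo.port=27017\n";

        Properties config = effective(base, ext);
        System.out.println(config.getProperty("dl.store.impl"));
        System.out.println(config.getProperty("mongo.host") + ":" + config.getProperty("mongo.port"));
    }
}
```

Keys that appear only in the base text remain visible through getProperty, so the ext file needs to list only the values it changes or adds.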

Slide 37

Slide 37 text

Developer View – Components
• Document Library portlet
  – Central place to aggregate and manage all content
  – Provides document management backed by different persistence systems
  – Features such as check in / check out, metadata, versioning
• CD MongoDBFileSystemStore
  – Implementation of the Liferay Document Library store API
  – Provides the signatures of all store methods (add, update, view, delete)
• MongoDB Connector
  – Gets the host information from portal-ext.properties
  – Uses the Java driver for data manipulation commands
  – Leverages the GridFS API to store large binary objects
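The shape of a store with add, update, view and delete operations can be sketched as follows. The DocumentStore interface here is hypothetical and is not Liferay's actual store API; the in-memory implementation merely stands in for a MongoDB-backed one so that the example runs without a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DocumentStoreSketch {

    // Hypothetical store contract (NOT Liferay's real API): one method per operation
    // named on the slide.
    interface DocumentStore {
        void addFile(String path, byte[] content);
        void updateFile(String path, byte[] content);
        byte[] getFile(String path); // "view": returns the latest version
        void deleteFile(String path);
    }

    // In-memory stand-in for a MongoDB-backed store; every update is kept as a new version.
    static class InMemoryStore implements DocumentStore {
        private final Map<String, List<byte[]>> versions = new HashMap<>();

        public void addFile(String path, byte[] content) {
            if (versions.containsKey(path)) {
                throw new IllegalStateException("file already exists: " + path);
            }
            List<byte[]> list = new ArrayList<>();
            list.add(content.clone());
            versions.put(path, list);
        }

        public void updateFile(String path, byte[] content) {
            require(path).add(content.clone());
        }

        public byte[] getFile(String path) {
            List<byte[]> list = require(path);
            return list.get(list.size() - 1).clone();
        }

        public void deleteFile(String path) {
            require(path);
            versions.remove(path);
        }

        int versionCount(String path) {
            return require(path).size();
        }

        // Looks up the version list or fails if the file is unknown.
        private List<byte[]> require(String path) {
            List<byte[]> list = versions.get(path);
            if (list == null) {
                throw new IllegalArgumentException("no such file: " + path);
            }
            return list;
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.addFile("news/article.txt", "first draft".getBytes());
        store.updateFile("news/article.txt", "reviewed text".getBytes());
        System.out.println(new String(store.getFile("news/article.txt")));
        System.out.println(store.versionCount("news/article.txt"));
    }
}
```

Because callers depend only on the interface, the portlet side would not need to change when the in-memory class is swapped for one that talks to a database.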

Slide 38

Slide 38 text

Developer View – Components
• GridFS
  – Specification for storing large files in MongoDB
  – Native storage of binary data within BSON objects is limited to 16 MB
  – Efficiently stores large files
  – Transparently divides large files among multiple documents (chunks)
  – Each chunk is 256 KB in size
  – 2 collections: files (stores the metadata), chunks (actual data)
  – All drivers support the GridFS implementation through an API
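The chunking idea can be illustrated with a small in-memory model. This is a sketch of the splitting arithmetic only, using the 256 KB chunk size stated above; it does not call the MongoDB Java driver, and the Chunk class and its field names are simplified stand-ins for entries of the chunks collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GridFsChunkSketch {

    // Chunk size as stated on the slide (256 KB).
    static final int CHUNK_SIZE = 256 * 1024;

    // Simplified stand-in for one document in the "chunks" collection.
    static final class Chunk {
        final String filesId; // id of the owning entry in the "files" collection
        final int n;          // sequence number of this chunk within the file
        final byte[] data;    // at most chunkSize bytes

        Chunk(String filesId, int n, byte[] data) {
            this.filesId = filesId;
            this.n = n;
            this.data = data;
        }
    }

    // Number of chunks needed for a file of the given length (ceiling division).
    static long chunkCount(long length, int chunkSize) {
        return (length + chunkSize - 1) / chunkSize;
    }

    // Splits the content into consecutive chunks; only the last one may be shorter.
    static List<Chunk> split(String filesId, byte[] content, int chunkSize) {
        List<Chunk> chunks = new ArrayList<>();
        int n = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, content.length);
            chunks.add(new Chunk(filesId, n++, Arrays.copyOfRange(content, offset, end)));
        }
        return chunks;
    }

    // Reassembles the file; assumes the chunks are already ordered by n.
    static byte[] join(List<Chunk> chunks) {
        int total = 0;
        for (Chunk c : chunks) {
            total += c.data.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Chunk c : chunks) {
            System.arraycopy(c.data, 0, out, pos, c.data.length);
            pos += c.data.length;
        }
        return out;
    }

    public static void main(String[] args) {
        long hundredMb = 100L * 1024 * 1024;
        System.out.println(chunkCount(hundredMb, CHUNK_SIZE));
    }
}
```

With these numbers, a 100 MB video file (100 × 1024 × 1024 bytes) maps to 400 chunks of 256 KB each, plus one metadata entry in the files collection.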

Slide 39

Slide 39 text

Developer View – Design
• Design
  – Uses the Liferay Extension plugin to develop a new Document Library store for MongoDB
  – Uses the Liferay portal configuration file to configure the Document Library portlet to use MongoDB to store content
  – Once configured, all document upload / download requests from the Document Library portlet are delegated to CD MongoDBFileSystemStore
  – CD MongoDBFileSystemStore uses the MongoDB Java driver & GridFS Java API to store or retrieve documents from MongoDB
  – The Java driver uses the Mongo Wire Protocol
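The delegation step, where a configured class name decides which store handles requests, can be sketched with plain Java reflection. This illustrates the general technique and is not Liferay's actual loading code: the Store interface, both stub classes and the use of dl.store.impl as the lookup key are assumptions made for the example.

```java
import java.util.Properties;

class StoreLoaderSketch {

    // Hypothetical minimal store contract for the example.
    public interface Store {
        String describe();
    }

    // Stub standing in for a default file-system-backed store.
    public static class FileSystemStoreStub implements Store {
        public String describe() {
            return "file system";
        }
    }

    // Stub standing in for a MongoDB-backed store.
    public static class MongoStoreStub implements Store {
        public String describe() {
            return "MongoDB (GridFS)";
        }
    }

    // Instantiates the store named in the configuration, falling back to the
    // file system stub when the key is absent. The key name is an assumption.
    static Store fromConfig(Properties config) {
        String className =
            config.getProperty("dl.store.impl", FileSystemStoreStub.class.getName());
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return (Store) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create store " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        System.out.println(fromConfig(config).describe());

        // Switching the backend is a configuration change only.
        config.setProperty("dl.store.impl", MongoStoreStub.class.getName());
        System.out.println(fromConfig(config).describe());
    }
}
```

The calling code never names a concrete store class, which is what allows a configuration entry alone to route uploads and downloads to a different backend.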

Slide 40

Slide 40 text

Summary

Slide 41

Slide 41 text

Summary
• MongoDB gives portals scalability (for huge volumes of content) and flexibility (schema-less content)
• Liferay's rich user interface, content management, security, social and mobile features complement MongoDB's powerful storage features
• A Big Data Portal built with MongoDB and Liferay provides lower TCO and higher ROI to enterprises

Slide 42

Slide 42 text

Thank you. Questions?
CIGNEX Datamatics makes Open Source work for you!
• Yash Badiani, Big Data Practice Lead, [email protected]
• Brendan Coleman, Director of Channels, [email protected]
• Kristin Smith, Sales & Marketing Manager, [email protected]