The Elephant in the Cloud - Bring the Power of ...

Uri Cohen
December 05, 2012

The Elephant in the Cloud - Bring the Power of Hadoop and the Cloud Together

The massive computing and storage resources that are needed to support big data deployments in general, and Hadoop in particular, make cloud environments, public and private, an ideal fit.

Managing your big data app, however, is no walk in the park, especially since these systems and application stacks often include services beyond Hadoop, such as relational databases and other NoSQL databases, each with its own management, installation, configuration, and scaling solutions. This is where Cloudify for BigInsights fits in. Together, Cloudify and BigInsights provide consistent management that makes provisioning, deploying, and scaling your Hadoop system simple and consistent with the rest of your stack. What's more, the combination enables portability between bare-metal and public cloud offerings for Hadoop, making it possible to run your system on the right cloud for the job: a public or virtualized environment for on-demand workloads, and bare metal for I/O-intensive operations.

Come see how large-scale big data distributions can be easily ported and managed consistently across any cloud.

Transcript

  1. Realization: What You Really Care About Is App Portability
  2. Available under a number of distros:
     • Apache
     • Cloudera
     • HortonWorks
     • IBM BigInsights
     • MapR
  3. InfoSphere BigInsights
     • Analysis of Data at Rest
     • Based on Hadoop with IBM Value Adds:
       • Enterprise Grade
       • Security
       • Workload management
       • IBM Analytics
       • Visualization
       • Simplified App Development
  4. Big Data Strategy: Move the Analytics Closer to the Data
     [Diagram: IBM Big Data Platform. Labels: Analytic Applications; BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics; Systems Management; Application Development; Visualization & Discovery; Accelerators; Information Integration & Governance; Hadoop System; Stream Computing; Data Warehouse]
  5. Comparing Open Source Hadoop with Enterprise Grade BigInsights
     [Table columns: Capability | Open Source Hadoop Distributions | InfoSphere BigInsights; cell values not captured in the transcript]
     Capabilities compared:
     • Parallel Processing Engine (MapReduce)
     • Mixed Data Type File System Support
     • Columnar Database
     • Text analytics
     • Performance and Workload Optimizations
     • Data Visualization
     • Developer Workbench & Admin Console
     • Accelerators
     • Enterprise Connectors
     • Security
  6. Big Data = Big Resources
     • eBay: two 1,008-node Hadoop clusters, 50 petabytes of disk
     • Yahoo!: 40,000 Hadoop nodes, 180-200 petabytes of data
     • Facebook: 2,000 compute nodes, 20 petabytes of data
     • Hadoop: average cluster size is 200 servers; most clusters are around 30 servers
  7. Cloud removes the impediment
     • What insight could you gain if you had full use of a 100-node cluster?
     • We don't have resources to do anything like that
     • What if one hour of this 100-node cluster would cost $34?
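The price point on this slide can be broken down with simple arithmetic. The sketch below takes only the $34 and 100-node figures from the slide; the function names and the 8-hour job length are hypothetical illustrations, not quoted prices.

```python
from decimal import Decimal

# Figures taken from the slide.
CLUSTER_HOURLY_COST = Decimal("34")  # dollars per hour for the whole cluster
CLUSTER_NODES = 100                  # nodes in the cluster


def cost_per_node_hour(cluster_cost, nodes):
    """Spread the hourly cluster price evenly across its nodes."""
    return cluster_cost / nodes


def job_cost(cluster_cost, hours):
    """Total price of keeping the whole cluster up for a number of hours."""
    return cluster_cost * hours


per_node = cost_per_node_hour(CLUSTER_HOURLY_COST, CLUSTER_NODES)
print(f"per node-hour: ${per_node}")

# Hypothetical 8-hour analysis job on the full cluster.
print(f"8-hour job: ${job_cost(CLUSTER_HOURLY_COST, 8)}")
```

Using `Decimal` rather than floats keeps the currency arithmetic exact.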
  8. • Auto start VMs
     • Install and configure app components
     • Monitor
     • Repair
     • (Auto) Scale
     • Burst…
  9. • Auto start VMs
     • Orchestrate
     • Install and configure
     • Monitor
     • Repair (partially)
     • (Auto) Scale
     • Burst…
  10. Consistent Management: making deployment, installation, scaling, and fail-over look the same through the entire stack
  11. How It Works

          application {
              name = "petclinic"
              service {
                  name = "mongod"
              }
              service {
                  name = "mongoConfig"
              }
              service {
                  name = "apacheLB"
              }
              service {
                  name = "mongos"
                  dependsOn = ["mongoConfig", "mongod"]
              }
              service {
                  name = "tomcat"
                  dependsOn = ["mongos", "apacheLB"]
              }
          }
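The `dependsOn` entries in the petclinic recipe describe a dependency graph, and one natural reading is that a service starts only after the services it depends on. The sketch below derives a start order from that graph with Python's standard `graphlib` module (Python 3.9+). It illustrates the idea only: the `start_waves` helper is hypothetical, and whether Cloudify actually starts independent services in parallel is an assumption, not something the slide states.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, copied from the petclinic recipe.
DEPENDS_ON = {
    "mongod": [],
    "mongoConfig": [],
    "apacheLB": [],
    "mongos": ["mongoConfig", "mongod"],
    "tomcat": ["mongos", "apacheLB"],
}


def start_waves(depends_on):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable display order
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(start_waves(DEPENDS_ON), start=1):
    print(number, wave)
```

Under these assumptions the three services without dependencies form the first wave, `mongos` follows once both MongoDB services are up, and `tomcat` comes last.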
  12. How It Works

          service {
              name "mysql"
              icon "mysql.png"
              type "DATABASE"
              ...
          }
  13. How It Works

          monitors {
              // currHttpPort and currJmxPort are defined elsewhere in the recipe (not shown on the slide)
              def ctxPath = ("default" == context.applicationName) ? "" : "${context.applicationName}"
              // Display name -> [JMX MBean object name, attribute to read]
              def metricNamesToMBeansNames = [
                  "Current Http Threads Busy": ["Catalina:type=ThreadPool,name=\"http-bio-${currHttpPort}\"",
                                                "currentThreadsBusy"],
                  "Current Http Thread Count": ["Catalina:type=ThreadPool,name=\"http-bio-${currHttpPort}\"",
                                                "currentThreadCount"],
              ]
              return getJmxMetrics("127.0.0.1", currJmxPort, metricNamesToMBeansNames)
          }

      ® Copyright 2012 GigaSpaces Ltd. All Rights Reserved
  14. How It Works

          scalingRules ([
              scalingRule {
                  serviceStatistics {
                      metric "Total Requests Count"
                      statistics Statistics.maximumThroughput
                      movingTimeRangeInSeconds 20
                  }
                  highThreshold {
                      value 1
                      instancesIncrease 1
                  }
                  lowThreshold {
                      value 0.2
                      instancesDecrease 1
                  }
              }
          ])
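The scaling rule on this slide combines a metric, a moving 20-second window, and two thresholds. The sketch below shows how such a rule could be evaluated. It is a simplified model, not Cloudify's implementation: the `ScalingRule` class, the `max_in_window` and `decide` helpers, the strict comparisons, the use of a plain maximum as the window statistic, and the instance bounds of 1 and 10 are all assumptions made for illustration. Only the values 1, 0.2, 20, and the step size of one instance come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    high_threshold: float = 1.0   # slide: highThreshold value 1
    low_threshold: float = 0.2    # slide: lowThreshold value 0.2
    instances_increase: int = 1   # slide: instancesIncrease 1
    instances_decrease: int = 1   # slide: instancesDecrease 1


def max_in_window(samples, now, window_seconds=20):
    """Largest value among (timestamp, value) samples in the trailing window."""
    recent = [value for ts, value in samples if now - window_seconds <= ts <= now]
    return max(recent) if recent else None


def decide(rule, statistic, instances, min_instances=1, max_instances=10):
    """Return the new instance count; the bounds are hypothetical."""
    if statistic is None:  # no data in the window: leave the service alone
        return instances
    if statistic > rule.high_threshold:
        return min(instances + rule.instances_increase, max_instances)
    if statistic < rule.low_threshold:
        return max(instances - rule.instances_decrease, min_instances)
    return instances


# Hypothetical samples: (seconds since start, requests per second).
samples = [(0, 0.5), (10, 1.4), (25, 0.1)]
statistic = max_in_window(samples, now=25)
print(decide(ScalingRule(), statistic, instances=2))
```

Keeping the window statistic separate from the threshold decision mirrors the split between `serviceStatistics` and the threshold blocks in the recipe.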
  15. Putting Cloudify and Hadoop Together
      • Run on Any Cloud
      • Consistent management
      • Dynamic Scaling
      • Auto Recovery
      • Auto Scaling
      • Role Assignments
      • Monitoring
      • Simple maintenance
  16. BigDataUniversity.com: Making Learning Big Data Easy and Fun
      • Flexible on-line delivery allows learning @your place and @your pace
      • Free courses, free study materials
      • Cloud-based sandbox for exercises: zero setup
      • ~54K registered students