Apache Hadoop in the Enterprise

dfwbigdata
October 11, 2011

by Cloudera

Transcript

  1. Agenda ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.

    1 Introductions & housekeeping
    2 Why Apache Hadoop?
    3 Where is the technology and market headed?
    4 Brief Cloudera commercial
    5 Wrap up & Q&A
  2. Who We Are

    How We Do It: We deliver relevant products and services.
    • A distribution of Apache Hadoop that is tested, certified and supported
    • Comprehensive support and professional service offerings
    • A suite of management software for Hadoop operations
    • Training and certification programs for developers, administrators, managers and data scientists

    Technical Team: Unmatched knowledge and experience.
    • Founders, committers and contributors to Hadoop
    • A wealth of experience in the design and delivery of production software

    Credentials: The Apache Hadoop experts.
    • Number 1 distribution of Apache Hadoop in the world
    • Largest contributor to the open source Hadoop ecosystem
    • More committers on staff than any other company
    • More than 100 customers across a wide variety of industries
    • Strong growth in revenue and new accounts

    Mission: To help organizations profit from all of their data

    Leadership: Strong executive team with proven abilities.
    • Mike Olson, CEO
    • Kirk Dunn, COO
    • Charles Zedlewski, VP, Product
    • Mary Rorabaugh, CFO
    • Jeff Hammerbacher, Chief Scientist
    • Amr Awadallah, VP Engineering
    • Doug Cutting, Chief Architect
    • Omer Trajman, VP, Customer Solutions
  3. Agenda

    1 Introductions & housekeeping
    2 Why Apache Hadoop?
    3 Where is the technology and market headed?
    4 Brief Cloudera commercial
    5 Wrap up & Q&A
  4. What is Apache Hadoop?

    Hadoop is a platform for data storage and processing that is…
    • Scalable
    • Fault tolerant
    • Open source

    CORE HADOOP COMPONENTS
    • Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers
    • MapReduce: Distributed Computing Across Physical Servers

    Flexibility
    • A single repository for storing, processing & analyzing any type of data
    • Not bound by a single schema

    Scalability
    • Scale-out architecture divides workloads across multiple nodes
    • Flexible file system eliminates ETL bottlenecks

    Low Cost
    • Can be deployed on commodity hardware
    • Open source platform guards against vendor lock-in
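
As a rough illustration of the MapReduce model named on this slide, the sketch below counts words on a single machine in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are assumptions made for this example. A real job would run the same three steps across many nodes over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all counts for one word into a total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for one line of a file.
    print(run_job(["store data", "process data", "serve data"]))
```

Because each map call looks at one record and each reduce call looks at one key, both steps can be spread across servers without coordination beyond the shuffle, which is what the scale-out claim on the slide relies on.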
  5. If you are a data scientist…

    • Fraud
    • Credit risk
    • Trading strategies
    • Customer preferences
    • Device failures
    • Systems
    • Networks
    • Security
    • Media
    • Investment portfolios
  6. What this means for you

    More algos
    More data
  7. What this means for you

    Up front design
    Just in time
  8. What this means for you

    Data committee
    Data entrepreneur
  9. If you are a data processing architect…

    • Sessionize clicks
    • Mediate content
    • Reconcile trades
    • Calculate value at risk
    • Map genomes
  10. What this means for you

    Structured data
    All Data
  11. What this means for you

    Scaling with a PhD
    Scaling for Free
  12. What this means for you

    Silos
    Sharing
  13. The new alchemy
  14. Agenda

    1 Introductions & housekeeping
    2 Why Apache Hadoop?
    3 Where is the technology and market headed?
    4 Brief Cloudera commercial
    5 Wrap up & Q&A
  15. The benchmark
  16. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
  17. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Store data
  18. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Process data
  19. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Ingest data
  20. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Serve data
  21. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    High level domain specific language
  22. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Chain together complex workloads
  23. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Schedule them
  24. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Columnar storage + metadata
  25. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    End users query data
  26. What did Google do?

    Dremel, Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby, MapReduce / GFS, Evenflow
    Coordinate within system
  27. The pattern repeated

    HiPal, Hive, Databee, Databee, Scribe, Hive, HBase, Zookeeper
  28. The pattern repeated

    Hive, Oozie, Oozie, Data Highway, Pig & Hive, HBase, Zookeeper
  29. The pattern repeated

    Azkaban, Azkaban, Sqoop, Kafka, Pig, Voldemort, Zookeeper
  30. Open source Cambrian explosion
  31. Assembled for enterprise use in CDH

    • Customer and partner requirements
    • Prioritization based on customer value, cost and community readiness
    • Enhancements written and contributed to Apache projects (HDFS, HBase, Flume, etc.)
    • Releases selected or cut
    • Integration, testing, & backporting
    • GA release!
    • Beta cycle, more backporting

    Hadoop 0.20.2 +923
    HBase 0.90.1 +15
    Hive 0.7 +22
    Pig 0.8 +20
    Flume 0.9.3 +17
    Oozie 20.2 +31
    Hue 1.2.0 +0
    Sqoop 1.2 +24
    Zookeeper 3.3.3 +12
  32. Assembled for enterprise use in CDH

    Hue, Hue, Hive, Oozie, Oozie, Sqoop, Flume, Hive / Pig, HBase, Zookeeper
    Cloudera’s Distribution Including Apache Hadoop
  33. Similar to Linux
  34. Joined by a commercial ecosystem

    • Packaging, testing
    • Sqoop framework, adapters
    • Drivers, language enhancements, testing
  35. Agenda

    1 Introductions & housekeeping
    2 Why Apache Hadoop?
    3 Where is the technology and market headed?
    4 Brief Cloudera commercial
    5 Wrap up & Q&A
  36. Cloudera University How can Cloudera help?

    Logs, Files, Web Data, Relational Databases
    IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Low-Latency Serving Systems, Web Application, Management Tools
    OPERATORS, ENGINEERS, ANALYSTS, BUSINESS USERS, CUSTOMERS
    Cloudera’s Distribution Including Apache Hadoop (CDH)
    Cloudera Enterprise
    • Cloudera Management Suite
    • Cloudera Support
  37. What is Cloudera Enterprise?

    Cloudera Enterprise makes open source Hadoop enterprise-easy
    • Simplify and Accelerate Hadoop Deployment
    • Reduce Adoption Costs and Risks
    • Lower the Cost of Administration
    • Increase the Transparency and Control of Hadoop
    • Leverage the Experience of Our Experts

    EFFECTIVENESS: Ensuring You Get Value From Your Hadoop Deployment
    EFFICIENCY: Enabling You to Affordably Run Hadoop in Production

    CLOUDERA ENTERPRISE COMPONENTS
    • Cloudera Management Suite: Comprehensive Software Toolset for Hadoop Administration
    • Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs
  38. Agenda

    1 Introductions & housekeeping
    2 Why Apache Hadoop?
    3 Where is the technology and market headed?
    4 Brief Cloudera commercial
    5 Wrap up & Q&A
  39. In conclusion

    • Apache Hadoop represents a big step forward for enterprises in all industries
    • This new, open big data stack is your point of departure, benefitting from years of R&D
    • The user, developer & vendor communities have all rallied around
    • Cloudera can help you run Apache Hadoop in trial or production
    • Learn more here, or at Hadoop World in New York City, November 8-9
  40. We appreciate your time and interest in

    For Additional Information:
    cloudera.com
    +1 (888) 789-1488
    [email protected]
    twitter.com/cloudera
    facebook.com/cloudera