Reproduction or redistribution without written permission is prohibited.

1 Introductions & housekeeping
2 Why Apache Hadoop?
3 Where is the technology and market headed?
4 Brief Cloudera commercial
5 Wrap up & Q&A
How We Do It
We deliver relevant products and services.
• A distribution of Apache Hadoop that is tested, certified and supported
• Comprehensive support and professional service offerings
• A suite of management software for Hadoop operations
• Training and certification programs for developers, administrators, managers and data scientists

Technical Team
Unmatched knowledge and experience.
• Founders, committers and contributors to Hadoop
• A wealth of experience in the design and delivery of production software

Credentials
The Apache Hadoop experts.
• Number 1 distribution of Apache Hadoop in the world
• Largest contributor to the open source Hadoop ecosystem
• More committers on staff than any other company
• More than 100 customers across a wide variety of industries
• Strong growth in revenue and new accounts

Mission: To help organizations profit from all of their data.

Leadership
Strong executive team with proven abilities.
• Mike Olson, CEO
• Kirk Dunn, COO
• Charles Zedlewski, VP, Product
• Mary Rorabaugh, CFO
• Jeff Hammerbacher, Chief Scientist
• Amr Awadallah, VP Engineering
• Doug Cutting, Chief Architect
• Omer Trajman, VP, Customer Solutions
2 Why Apache Hadoop?
Hadoop is a platform for data storage and processing that is…
• Scalable
• Fault tolerant
• Open source

CORE HADOOP COMPONENTS
• Hadoop Distributed File System (HDFS): file sharing & data protection across physical servers
• MapReduce: distributed computing across physical servers

Flexibility
• A single repository for storing, processing & analyzing any type of data
• Not bound by a single schema

Scalability
• Scale-out architecture divides workloads across multiple nodes
• Flexible file system eliminates ETL bottlenecks

Low Cost
• Can be deployed on commodity hardware
• Open source platform guards against vendor lock-in
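The MapReduce model named above can be illustrated with a minimal, single-process word-count sketch in Python. It is not the Hadoop Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and everything runs sequentially, whereas Hadoop would run map tasks in parallel on the servers holding each HDFS block and move the shuffled data over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine every count emitted for one word into a single total.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Hypothetical driver: in Hadoop, map tasks would run in parallel across nodes;
    # here all three phases run one after another in a single process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The split into three pure functions mirrors the contract the framework imposes: mappers and reducers see only their own records or key groups, which is what lets the work be divided across many machines.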
• Sessionize clicks
• Mediate content
• Reconcile trades
• Calculate value at risk
• Map genomes
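As one example of these workloads, clickstream sessionization can be sketched as grouping each user's clicks in time order and starting a new session after a period of inactivity. This is a hypothetical, in-memory Python illustration: the `Click` record, the `sessionize` function and the 30-minute timeout are assumptions, not details from the slides. On Hadoop, comparable logic would typically sit in a reducer keyed by user ID.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter

# Assumed inactivity timeout; the slides do not specify one.
SESSION_GAP_SECONDS = 30 * 60


@dataclass(frozen=True)
class Click:
    user_id: str
    timestamp: int  # seconds since epoch
    url: str


def sessionize(clicks: list[Click], gap: int = SESSION_GAP_SECONDS) -> dict[str, list[list[Click]]]:
    sessions: dict[str, list[list[Click]]] = {}
    # Sort by user, then time; this plays the role of grouping by key in a MapReduce shuffle.
    ordered = sorted(clicks, key=lambda c: (c.user_id, c.timestamp))
    for user, user_clicks in groupby(ordered, key=attrgetter("user_id")):
        user_sessions: list[list[Click]] = []
        for click in user_clicks:
            # Start a new session on the first click or after an idle period longer than `gap`.
            if not user_sessions or click.timestamp - user_sessions[-1][-1].timestamp > gap:
                user_sessions.append([click])
            else:
                user_sessions[-1].append(click)
        sessions[user] = user_sessions
    return sessions


if __name__ == "__main__":
    clicks = [
        Click("u1", 0, "/"),
        Click("u1", 600, "/a"),
        Click("u1", 5000, "/b"),
        Click("u2", 100, "/"),
    ]
    for user, user_sessions in sessionize(clicks).items():
        print(user, [[c.url for c in s] for s in user_sessions])
```

With the sample clicks, user `u1` gets two sessions because the gap between 600 and 5000 seconds exceeds the assumed 30-minute timeout.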
3 Where is the technology and market headed?
Diagram: a data stack built from Dremel, Evenflow, MySQL Gateway, Sawzall, Bigtable, Chubby and MapReduce / GFS, built up slide by slide with the roles:
• Store data
• Process data
• Ingest data
• Serve data
• High level domain specific language
• Chain together complex workloads
• Schedule them
• Columnar storage + metadata
• End users query data
• Coordinate within system
• Customer and partner requirements
• Prioritization based on customer value, cost and community readiness
• Enhancements written and contributed to Apache projects (HDFS, HBase, Flume, etc.)
• Releases selected or cut
• Integration, testing, & backporting
• Beta cycle, more backporting
• GA release!

• Hadoop 0.20.2 +923
• HBase 0.90.1 +15
• Hive 0.7 +22
• Pig 0.8 +20
• Flume 0.9.3 +17
• Oozie 20.2 +31
• Hue 1.2.0 +0
• Sqoop 1.2 +24
• Zookeeper 3.3.3 +12
Diagram: Cloudera's Distribution Including Apache Hadoop, with components Hue, Hive, Oozie, Sqoop, Flume, Hive / Pig, HBase and Zookeeper.
• Packaging, testing
• Sqoop framework, adapters
• Drivers, language enhancements, testing
4 Brief Cloudera commercial
Diagram: Cloudera's Distribution Including Apache Hadoop (CDH) in the enterprise
• Data: Logs, Files, Web Data, Relational Databases
• Systems: IDEs, BI / Analytics, Enterprise Reporting, Enterprise Data Warehouse, Low-Latency Serving Systems, Web Application, Management Tools
• Users: Operators, Engineers, Analysts, Business Users, Customers
• Cloudera Enterprise: Cloudera Management Suite, Cloudera Support
CLOUDERA ENTERPRISE COMPONENTS
Cloudera Enterprise makes open source Hadoop enterprise-easy.

EFFECTIVENESS: Ensuring You Get Value From Your Hadoop Deployment
EFFICIENCY: Enabling You to Affordably Run Hadoop in Production

• Cloudera Management Suite: Comprehensive Software Toolset for Hadoop Administration
• Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs

• Simplify and Accelerate Hadoop Deployment
• Reduce Adoption Costs and Risks
• Lower the Cost of Administration
• Increase the Transparency and Control of Hadoop
• Leverage the Experience of Our Experts
5 Wrap up & Q&A
• Apache Hadoop represents a big step forward for enterprises in all industries
• This new, open big data stack is your point of departure, benefiting from years of R&D
• The user, developer & vendor communities have all rallied around it
• Cloudera can help you run Apache Hadoop in trial or production
• Learn more here, or at Hadoop World in New York City, November 8-9
We appreciate your time and interest in Cloudera.

For Additional Information:
cloudera.com
+1 (888) 789-1488
[email protected]
twitter.com/cloudera
facebook.com/cloudera