
Global Big Data Conference 2016 - Apache Hadoop Is Retro: Unlocking Business Value

John Mertic
September 01, 2016

In 2006, Apache Hadoop was a small project deployed on 20 machines at Yahoo; by 2010 it was running on 45K machines and had truly become the backbone of Yahoo’s data infrastructure. One would think that by 2016 Apache Hadoop would be the backbone of data infrastructure for all enterprises, but widespread adoption has been shockingly low. Apache Hadoop and Big Data proponents recognize that this technology has not achieved its game-changing business potential.

Gartner puts it well:

"Despite considerable hype and reported successes for early adopters, 54 percent of survey respondents report no plans to invest [in Hadoop] at this time, while only 18 percent have plans to invest in Hadoop over the next two years," said Nick Heudecker, research director at Gartner. "Furthermore, the early adopters don't appear to be championing for substantial Hadoop adoption over the next 24 months; in fact, there are fewer who plan to begin in the next two years than already have."
- Gartner Survey Highlights Challenges to Hadoop Adoption

While Hadoop has proven popular among developers who need a technology that can power large, complex applications, the rapid, and often healthy, innovation happening in Hadoop components and Hadoop distros can also slow big data ecosystem development and limit adoption.

In this presentation, John Mertic, director of program management for ODPi at The Linux Foundation, covers new developments that help unlock more business value for Apache Hadoop initiatives. In ODPi’s view, the industry needs more open source-based big data technologies and standards so that application developers and enterprises can more easily build data-driven applications. This includes standardizing the commodity work of the components of a Hadoop distribution to spur the creation of more applications, which is a boost for the entire ecosystem.

What's the takeaway for the audience? Attendees will learn:

• Why widespread adoption of Apache Hadoop in the enterprise has been low
• New developments enabling increased business value for Apache Hadoop initiatives
• The need for standardizing the commodity work of the components of a Hadoop distribution
• The need for a common platform against which to certify apps to reduce the complexities of interoperability
• The benefits of compatibility and standardization across distribution and application offerings for management and integration

See more at: http://globalbigdataconference.com/70/santa-clara/4th-annual-global-big-data-conference/speaker-details/41151/john-mertic.html

Transcript

  1. APACHE HADOOP IS RETRO: UNLOCKING BUSINESS VALUE

    The open ecosystem of big data.
    Apache Hadoop Is Retro: Unlocking Business Value
    John Mertic, Director of Program Management, ODPi
    @jmertic - @ODPIorg
  2. Hadoop - 10 years ago to now

    Hadoop 10 years ago…
    • Primarily a data management and processing tool
    • Focus was processing large data sets in a distributed manner on commodity hardware with high redundancy and easy failover

    Hadoop is now…
    • Amorphous technology for data management, access, governance, and security
    • Cornerstone for an organization’s data insight and actions strategy
  3. How the Hadoop stack has grown

    Diagram layers: Data Processing, Data Storage, Data Access, Data Management

    Hadoop 10 years ago: HDFS, MapReduce

    Hadoop today: YARN, Interactive SQL, Machine Learning, Streaming Data, Other Data Flows, Monitoring, Security, Governance, Workflow, MapReduce, HDFS/Hadoop Compatible Filesystems, Column Data Stores (HBase)
  4. Hadoop Apache Project Commercial Support Tracker April 2016

    | Project | Amazon | Cloudera | HortonWorks | IBM | MapR | Number of Supporters |
    |---|---|---|---|---|---|---|
    | Apache HDFS | 2.7.2 | 2.6 | 2.7.1 | 2.7.1 | API | 5 |
    | Apache Mapreduce | 2.6.0 | 2.6 | 2.7.1 | 2.7.1 | 2.7.1 | 5 |
    | Apache YARN | 2.6.0 | 2.6 | 2.7.1 | 2.7.1 | 2.7.1 | 5 |
    | Apache Avro | 1.7.5 | 1.7.6 | 1.7.5 | 1.7.7 | 1.7.4 | 5 |
    | Apache Flume | 1.5.0* | 1.6 | 1.5.3 | 1.5.2 | 1.6.0 | 5 |
    | Apache HBase | 1.2 | 1.2 | 1.1.2 | 1.1.1 | 1.1 | 5 |
    | Apache Hive | 1.0 | 1.1 | 1.2.1 | 1.2.1 | 1.2.1 | 5 |
    | Apache Oozie | 4.2.0 | 4.1 | 4.2.0 | 4.2.0 | 4.2.0 | 5 |
    | Apache Parquet | 1.5.0 | 1.5 | 1.6.0 | 2.2.0 | 1.8.1 | 5 |
    | Apache Pig | 0.14 | 0.12 | 0.15.0 | 0.15 | 0.15 | 5 |
    | Apache Solr | 4.2.0 | 4.10.3 | 5.2.1 | 5.1.0 | 4.10.3 | 5 |
    | Apache Spark | 1.6.1 | 1.6 | 1.6.0 | 1.5.1 | 1.5.2 | 5 |
    | Apache Sqoop | 1.4.6 | 1.4.6 | 1.4.6 | 1.4.6 | 1.4.6 (1.99.6) | 5 |
    | Apache Zookeeper | 3.4.6* | 3.4.5 | 3.4.6 | 3.4.6 | 3.4.5 | 5 |

    Supported by fewer than five of the vendors (versions as listed on the slide):
    • Apache Kafka: 0.8.1.1, 0.9, 0.9.0, 0.8.2.1 (4 supporters)
    • Apache Mahout: 0.11.1*, 0.9, 0.9.0, .11 (4 supporters)
    • Hue: 3.7.1, 3.1, 2.6.1, 3.9.0 (4 supporters)
    • Apache DataFu: 1.0.0, 1.1, 1.3.0 (3 supporters)
    • Cascading: 2.5, 3.0.1, 2.5 (3 supporters)

    Adrian, M. (2016, April 27). Hadoop Apache Project Commercial Support Tracker April 2016 - Merv Adrian. Retrieved April 29, 2016, from http://blogs.gartner.com/merv-adrian/2016/04/27/hadoop-apache-project-commercial-support-tracker-april-2016/
  5. Impacts of Hadoop stack growth

    • Traditional “core Hadoop” is being redefined
    • Many functional areas without de-facto choice - ecosystem of projects has many overlaps
    • Data processing strategies varied and nuanced
    • Downstream consumers have confusion of direction because of “hype cycle”, and feel abandoned when they focus on stable components.
  6. Fast, unbridled growth has hurt adoption

    Source: http://www.gartner.com/newsroom/id/3051717
  7. Why is it so low?

    • Product maturity issues
    • End users want a feedback loop to correct
    • Engaging upstream in ASF is challenging for end users
  8. Ecosystem Engagement

    Axis: More ASF engagement to Less ASF engagement
    Groups shown: Hadoop Platforms, Hadoop Components, App Vendors, Solution Providers, End Users
  9. Upstream Projects Produce / End-Users Want

    Predictability, Stability, Consistency
    Downstream Projects
  10. Downstream challenges solved through standardization

    Challenges
    • Limited flexibility in tool options
    • Inconsistent/lack of support of stable tools
    • Ecosystem incompatibility across product lines

    Standardization

    Outcomes
    • Ability to match right tools to the task
    • Lowered costs of support - more time to innovate
    • Broader offering of tools to customer base
  11. The shipping container standard increased trade 790% over 20 years

    “Estimating the Effects of the Container Revolution on World Trade”, by Daniel Bernhofen, Zouheir El-Sahli and Richard Kneller, Lund University, Working Paper 2013:4, February 2013
  12. Before and after the container standard

    | Cost or Activity | Before Container Standard | After Container Standard | Improvement |
    |---|---|---|---|
    | Loading Cost per Ton | $5.83 | $0.16 | 3,544% |
    | Ton per Hour Loaded onto Ships | 1.7 | 30 | 1,665% |
    | Theft of Cargo | Rampant | Minimal | |
    | Insurance Cost | High | Low | |

    The Economist, Why have containers boosted trade so much? http://www.economist.com/blogs/economist-explains/2013/05/economist-explains-14
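The slide does not say how its Improvement column was calculated. One reading that appears to reproduce both percentages is a relative change against the worse of the two values: for cost, how many times cheaper loading became, minus one; for throughput, how many times faster, minus one. The sketch below works under that assumption, and the function names are hypothetical, chosen only for illustration.

```python
def cost_improvement_pct(before: float, after: float) -> float:
    """Percent improvement for a lower-is-better quantity such as cost.

    Assumed formula (not stated on the slide): (before / after - 1) * 100.
    """
    return (before / after - 1) * 100


def throughput_improvement_pct(before: float, after: float) -> float:
    """Percent improvement for a higher-is-better quantity such as tons per hour.

    Assumed formula (not stated on the slide): (after / before - 1) * 100.
    """
    return (after / before - 1) * 100


if __name__ == "__main__":
    # Loading cost per ton: $5.83 before, $0.16 after
    print(round(cost_improvement_pct(5.83, 0.16)))       # 3544
    # Tons per hour loaded onto ships: 1.7 before, 30 after
    print(round(throughput_improvement_pct(1.7, 30)))    # 1665
```

Under this reading, a figure of 3,544% means loading became roughly 36 times cheaper, and 1,665% means ships were loaded roughly 18 times faster.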
  13. “A sustainable environment requires increased productivity; productivity comes about by innovation; innovation is the result of investment; and investment is only possible when a reasonable return is expected. The efficient use of money is more assured when there are known standards in which to operate.”

    Robert W. Lane, Chairman & CEO, Deere & Company
    World Standards Day 2001 Speech
  14. Why do this for Hadoop?

    • Hadoop continues to be a game-changing technology
    • Incredible flexibility
    • Heavy investment and focus
    • Trend in business is to become more data and insight driven