data.
• Economical: Hadoop distributes the data and processing across clusters of commonly available computers; these clusters can number in the thousands of nodes.
• Efficient: Hadoop can process the distributed data in parallel on the nodes where the data is located.
• Reliable: Hadoop automatically maintains multiple copies of the data and automatically redeploys computing tasks when failures occur.
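The "multiple copies of the data" above refers to HDFS block replication, which is configurable per deployment. A minimal sketch of the relevant `hdfs-site.xml` fragment (the property name `dfs.replication` and its default of 3 are standard HDFS configuration; the surrounding deployment layout is assumed):

```xml
<!-- hdfs-site.xml: how many copies HDFS keeps of each data block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With three replicas, the loss of any single node leaves at least two live copies of every block, which is what lets Hadoop transparently redeploy a failed task on another node holding the same data.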
- 30 nodes
Facebook - used for reporting and analytics - 320 nodes
FOX - used for log analysis and data mining - 140 nodes
Last.fm - used for chart calculation and log analysis - 27 nodes
New York Times - used for large-scale image conversion - 100 nodes
Yahoo! - used for ad systems and Web search - 10,000 nodes
Who is using it?
than 150 parameters.
• No security against accidents: user identification was added only after Last.fm deleted a filesystem by accident.
• HDFS is primarily designed for streaming access to large files. Reading through many small files causes lots of seeks and lots of hopping from datanode to datanode to retrieve each small file.
• Steep learning curve: according to Facebook, using Hadoop was not easy for end users, especially those who were not familiar with MapReduce.
Challenges
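The MapReduce model behind that learning curve can be illustrated without a cluster. A minimal in-memory sketch of the classic word-count job (plain Python, not Hadoop's actual Java API; the function names are illustrative only):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reducer: sum the counts for each distinct word."""
    # Hadoop sorts and groups mapper output by key before reducing;
    # sorted() + groupby() emulates that shuffle phase here.
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["hadoop stores data", "hadoop processes data in parallel"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = dict(reduce_phase(pairs))
print(counts["hadoop"], counts["data"])  # → 2 2
```

The point of the model is that `map_phase` runs independently on each node's local data, and only the small intermediate (key, count) pairs cross the network to the reducers; the unfamiliar part for end users is having to recast ordinary queries into this map/shuffle/reduce shape.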