processing on large clusters," in Proc. the 6th conference on Symposium on Operating Systems Design & Implementation, 2004, pp. 137-150. [2] T. White, Hadoop: The Definitive guide, 1st ed.: O'Reilly Media, 2010. [3] S. Ghemawat, H. Gobioff, and S. T. Leung, "The Google file system," in Proc. the nineteenth ACM symposium on Operating systems principles, 2003, pp. 29-43. [4] A. Rajaraman and J. Ullman, Mining of Massive Datasets, Cambridge University Press, 2011, ch. 2. [5] A. M. Middleton, Data-Intensive Technologies for Cloud Computing. B. Furht and A. Escalante, Handbook of Cloud Computing, New York: Springer, 2010, ch. 5. [6] S. Chen and S. W. Schlosser, "Map-Reduce Meets Wider Varieties of Applications," Intel Research Pittsburgh, IRP-TR-08-05, 2008. [7] K. S. Cho et al., "Opinion Mining in MapReduce Framework," Secure and Trust Computing, Data Management, and Applications, 2011, pp. 50-55. [8] R. Cordeiro et al., "Clustering Very Large Multi-dimensional Datasets with MapReduce," in Proc. the International Conference on Knowledge Discovery and Data Mining, 2011, pp. 690-698. [9] M. Niemenmaa, A. Kallio, A. Schumacher, E. Klemela, and K. Heljanko, "Hadoop-BAM: directly manipulating next generation sequencing data in the cloud," Oxford Journals, Bioinformatics, vol. 28, no. 6, pp. 876-877, 2012. [10] J. Chandar, "Join Algorithms using Map/Reduce," M.S. thesis, School of Informatics, University of Edinburgh, United Kingdom 2010. [11] N. S. Srirama, P. Jakovits, and E. Vainikko, "Adapting scientific computing problems to clouds using MapReduce," Future Generation Computer Systems, vol. 28, no. 1, pp. 184-192, 2012. [12] L. Kaufman and P. Rousseeuw, Finding Groups in Data An Introduction to Cluster Analysis. New York: Wiley Interscience, 1990. [13] J. R. Shewchuk, "An introduction to the conjugate gradient method without the agonizing pain," Technical Report, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 1994. [14] D. Jiang, B. Ooi, L. Shi, and S. Wu, "The Performance of MapReduce: An In-depth Study," PVLDB, vol. 3, no. 1, pp. 472-483, 2010. [15] Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, "HaLoop: Efficient Iterative Data Processing on Large Clusters," PVLDB, vol. 3, no. 1, pp. 285--296, 2010. [16] E. Elnikety, T. Elsayed, and H. E. Ramadan," iHadoop: Asynchronous Iterations for MapReduce," in Proc. the 2011 IEEE Third International Conference on Cloud Computing Technology and Science, 2011, pp. 81- 90. [17] Y. Zhang, Q. Gao, L. Gao, and C. Wang, "iMapReduce: A Distributed Computing Framework for Iterative Computation," in Proc. the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, 2011, pp. 1112-1121. [18] A. Dave, W. Lu, J. Jackson, and R. Barga, "CloudClustering: Toward an iterative data processing pattern on the cloud," in Proc. the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, 2011, pp. 1132-1137. [19] J. Ekanayake et al., "Twister: A Runtime for Iterative MapReduce," in Proc. the 19th ACM International Symposium on High Performance Distributed Computing, 2010, pp. 810-818. [20] A. Thusoo et al., "Hive – A Petabyte Scale Data Warehouse Using Hadoop," in Proc. 26th IEEE International Conference on Data Engineering, Long Beach, California, 2010, pp. 996-1005. [21] M. Eltabakh, Y. Tian, F. Gemulla, A. Krettek, and J. McPherson, "CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop," PVLDB, vol. 4, no. 9, pp. 575-585, 2011. [22] T. Condie et al., "MapReduce Online," in Proc. the 7th USENIX Conference on Networked Systems Design and Implementation, SWan Jose, California, 2010, pp. 313-327. [23] M. Stonebraker et al., "MapReduce and Parallel DBMSs: Friends or Foes ?" Communications of the ACM, vol. 53, no. 1, pp. 64-71, 2010. [24] A. Abouzeid, K. Bajda Pawlikowski, D. Abadi, A. Silberschatz, and A. Rasin, "HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads," PVLDB, vol. 2, no. 1, pp. 922-933, 2009. [25] J. Dittrich et al., "Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)," PVLDB, vol. 3, no. 1, pp. 518-529, 2010. [26] M. Isard, M. Budiu, and Y. Yu, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks," in Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007, pp. 59-72. [27] Y. Yu et al., "DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language," in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, San Diego, 2008, pp. 1-14. [28] A. Floratou, J. Patel, E. Shekita, and S. Tata, "Column Oriented Storage Techniques for MapReduce," PVLDB, vol. 4, no. 7, pp. 419-429, 2011. [29] E. Jahani, M. Cafarella, and C. R´ e, "Automatic Optimization for MapReduce Programs," PVLDB, vol. 4, no. 6, pp. 385-396, 2010. [30] S. Chen, "Cheetah: a high performance, custom data warehouse on top of MapReduce," PVLDB, vol. 3, no. 2, pp. 1459-1468, 2010. [31] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "A study of skew in mapreduce applications," presented in the 5th Open Cirrus Summit, 2011. [32] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "Skew-resistant parallel processing of feature-extracting scientific user-defined functions," in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010, pp. 75-86. [33] B. Gufler, N. Augsten, A. Reiser, and A. Kemper, "Handing Data Skew in MapReduce," in Proceedings of The First International Conference on Cloud Computing and Services Science, 2011, pp. 574-583. [34] Y. Kwon, M. Balazinska, B. Howe, and J. Rolia, "SkewTune in action: mitigating skew in MapReduce applications," PVLDB, vol. 5, no. 12, pp. 1934-1937, 2012. [35] J. Leverich and C. Kozyrakis, "On the Energy (In) efficiency of Hadoop Clusters," ACM SIGOPS Operating Systems Review, vol. 44, no. 1, pp. 61-65, 2010. [36] W. Lang and J. M. Patel, "Energy Management for MapReduce Clusters," PVLDB, vol. 3, no. 1, pp. 129-139, 2010. [37] Y. Chen, S. Alspaugh, D. Borthakur, and R. Katz, "Energy efficiency for large-scale mapreduce workloads with significant interactive analysis," in Proc. the 7th ACM european Conference on Computer Systems, Bern, Switzerland, 2012, pp. 43-56. Abdelrahman Elsayed was born in Egypt 1977; he received his master and BSc in 2008 and 2000 resp. from Information System Dept., Faculty of Computers and Information, Cairo University, Egypt. His research interests are data mining and data intensive algorithms. Now he is working as Research assistant at Central Laboratory for agriculture expert systems. Mr. Elsayed is member of the IACSIT. Osama Ismail was born in Egypt 1969; He received a MSc. degree in Computer Science and Information from Cairo University in 1997 and a PhD degree in Computer Science from Cairo University in 2008. He is a Lecturer in Faculty of Computers and Information, Cairo University. His fields of interest include, Cloud Computing, Multi-agent Systems, Service Oriented Architectures, and Data Mining. Mohamed E. El-Sharkawi was born in Egypt; he is currently a professor in Faculty of Computers and Information, Cairo University. He received his Doctor of Engineering and Master of Engineering in 1991 and 1988 resp. in Computer Science and Communication Engineering from Kyushu University, Fukuoka, Japan. He received his BSc. in Systems and Computer Engineering from the Faculty of Engineering, Al-Azhar University, Cairo, Egypt in 1981. His research interests are database systems, query processing and optimization, social networks, and data mining. 39 International Journal of Computer and Electrical Engineering, Vol. 6, No. 1, February 2014