Introduction To Hadoop
Marc Cluet
June 18, 2013
Transcript
Marc Cluet – Lynx Consultants What’s behind Big Data
What we’ll cover?
- Understand Hadoop components
- Understand different technologies involved
- Embrace Big Data!
Lynx Consultants © 2013
What is Big Data?
- SQL has a limited ability to process changing data
  - SQL schemas are the source of truth; the data has to fit them
- Big Data is the solution!
  - Data can be truly dynamic
  - Designed to handle terabytes of data
  - Designed for fault tolerance and securing data
  - Designed around exploiting hardware to the fullest
  - Designed around Map/Reduce
Who runs Big Data?
- A few small companies
What is Hadoop?
- Hadoop is one of the big players for Big Data
  - Developed as an open source implementation of the designs in Google's GFS and MapReduce papers (HBase is the component modeled on Google BigTable)
  - Mainly developed at Yahoo!
  - Current companies behind it: Hortonworks and Cloudera
What are the features of Hadoop?
- HDFS – Hadoop Distributed File System
  - HDFS is a filesystem distributed across many nodes
  - Keeps multiple copies of your data (default: 3)
  - If a node goes down, HDFS makes sure the lost copies are re-created on other nodes
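The replication behaviour described on this slide can be sketched in a few lines of Python. This is a toy model only: the node names, block names, and the functions `place_blocks` and `handle_node_failure` are made up for illustration, and the random placement here ignores the rack-aware policy that real HDFS uses.

```python
import random

REPLICATION = 3  # default number of copies, as stated on the slide


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(sorted(nodes), replication)) for block in blocks}


def handle_node_failure(placement, failed, nodes, replication=REPLICATION, seed=0):
    """Drop the failed node and copy under-replicated blocks to surviving nodes."""
    rng = random.Random(seed)
    survivors = sorted(n for n in nodes if n != failed)
    repaired = {}
    for block, holders in placement.items():
        holders = holders - {failed}          # copies that are still alive
        candidates = [n for n in survivors if n not in holders]
        missing = replication - len(holders)  # 0 if this block was not affected
        holders |= set(rng.sample(candidates, missing))
        repaired[block] = holders
    return repaired


nodes = {"dn1", "dn2", "dn3", "dn4", "dn5"}
placement = place_blocks(["blk_1", "blk_2", "blk_3", "blk_4"], nodes)
repaired = handle_node_failure(placement, "dn2", nodes)
for block, holders in sorted(repaired.items()):
    print(block, sorted(holders))
```

After the simulated failure of `dn2`, every block is back to three copies and none of them lives on the failed node.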
What are the features of Hadoop?
- HBase – Hadoop NoSQL Database
  - Schemaless Key-Value storage
  - All data exportable in JSON
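To make "schemaless key-value storage" concrete, here is a minimal in-memory sketch. It is not the HBase client API: the `table` dictionary, the `put` and `get` helpers, and the `family:qualifier` column names are assumptions chosen for illustration.

```python
import json

# Toy in-memory table: row key -> {column name: value}. Not the HBase client API.
table = {}


def put(row_key, column, value):
    """Store one cell; rows do not have to share the same columns (schemaless)."""
    table.setdefault(row_key, {})[column] = value


def get(row_key):
    """Return all cells of a row, or an empty dict if the row does not exist."""
    return table.get(row_key, {})


put("user:1", "info:name", "Ada")
put("user:1", "info:email", "ada@example.com")
put("user:2", "info:name", "Linus")
put("user:2", "stats:logins", 42)  # a column that user:1 does not have

# Because rows are just nested key-value maps, exporting to JSON is direct.
exported = json.dumps(table, sort_keys=True, indent=2)
print(exported)
```

No schema is declared anywhere: `user:2` simply carries a column that `user:1` lacks, and the whole table still serialises to JSON.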
What are the features of Hadoop?
- Map/Reduce – The key to it all
  - This was invented by Google
  - Given a dataset, we Map all the records that match a criterion
  - Then we Reduce these to a result
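The two steps on this slide can be sketched in plain Python, running in a single process rather than on a cluster. The log records and the "HTTP status 500" criterion are invented for the example, and the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative, not Hadoop API names.

```python
from collections import defaultdict

records = [
    "GET /index.html 200",
    "GET /login 500",
    "POST /login 500",
    "GET /about 200",
    "GET /index.html 500",
]


def map_phase(record):
    """Emit (path, 1) for every record that matches the criterion (status 500)."""
    method, path, status = record.split()
    if status == "500":
        yield path, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


mapped = [pair for record in records for pair in map_phase(record)]
result = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(result)
```

Each call to `map_phase` looks at one record in isolation and each call to `reduce_phase` looks at one key in isolation, which is what lets a framework spread the work across many nodes.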
What are the features of Hadoop?
- Hive – SQL for NoSQL
  - Hive provides a SQL-like language called HiveQL
  - Provides a good entrance for SQL users :)
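As a rough illustration of why a SQL layer is a comfortable entry point, the sketch below runs a filter-and-count query using Python's built-in `sqlite3` module as a stand-in for Hive. The `requests` table and its rows are invented, and SQLite is not Hive; the point is only that the `WHERE` clause expresses the "match a criterion" step and `GROUP BY` with `COUNT(*)` expresses the "reduce to a result" step.

```python
import sqlite3

# SQLite stands in for Hive here; the table and its data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        ("/index.html", 200),
        ("/login", 500),
        ("/login", 500),
        ("/about", 200),
        ("/index.html", 500),
    ],
)

query = """
    SELECT path, COUNT(*) AS errors
    FROM requests
    WHERE status = 500
    GROUP BY path
    ORDER BY errors DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

A SQL user can write this without thinking about mappers or reducers; on Hadoop, Hive takes on the job of turning such a query into jobs on the cluster.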
What are the features of Hadoop?
- Pig – Map/Reduce made easy
  - Produces results from scripts written in a small, high-level language (Pig Latin)
  - Reinvents SQL, in a way
What are the features of Hadoop?
- Flume – Fault Tolerant transport
  - Divided into Sources, Channels and Sinks
  - Can have multiples of everything, which makes it fault tolerant
  - Many sources!
    - Avro, Exec, JMS, Syslog, HTTP, NetCat, Your Own (Java)
  - Many channels!
    - Memory, File, Your Own (Java)
  - Many sinks!
    - Avro, HDFS, Logger, IRC, File, HBase, ElasticSearch, S3, Community sinks, Your Own (Java)
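The source, channel, and sink split can be sketched as a small Python pipeline. This is a conceptual model, not Flume configuration or Flume's Java API: `MemoryChannel`, `syslog_source`, and `drain` are hypothetical names, and plain lists stand in for the HDFS and ElasticSearch destinations.

```python
from collections import deque


class MemoryChannel:
    """Buffers events between a source and a sink (toy stand-in for a memory channel)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest buffered event, or None if the channel is empty."""
        return self._events.popleft() if self._events else None


def syslog_source(lines, channels):
    """Source: wrap each incoming line as an event and write it to every channel."""
    for line in lines:
        for channel in channels:
            channel.put({"body": line})


def drain(channel, sink):
    """Sink runner: move events from one channel to one destination."""
    while (event := channel.take()) is not None:
        sink.append(event["body"])


to_hdfs, to_search = MemoryChannel(), MemoryChannel()
syslog_source(["disk full", "login ok"], [to_hdfs, to_search])

hdfs_sink, search_sink = [], []  # lists standing in for HDFS and ElasticSearch
drain(to_hdfs, hdfs_sink)
drain(to_search, search_sink)
print(hdfs_sink, search_sink)
```

Because the source only talks to channels and each sink only talks to its own channel, a slow or failed destination does not block the other one, which is the idea behind having "multiples of everything".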
What Hadoop looks like in a DC
- Components
  - Primary Namenode
    - Controls the whole cluster and knows where the data resides
    - Runs the job tracker to keep track of Map/Reduce jobs
    - Biggest point of failure; shadowing it is a potential option
  - Secondary Namenode
    - Performs secondary cleanup operations (periodic checkpoints of the Namenode metadata); it is not a hot standby
  - Data Node
    - Stores all the information
    - Runs Map/Reduce
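The division of labour between the Namenode and the Data Nodes can be sketched as follows. The `NameNode` and `DataNode` classes, the `write` and `read` helpers, and the block naming scheme are assumptions made for this illustration; they do not mirror Hadoop's real interfaces.

```python
class NameNode:
    """Holds only metadata: which blocks make up a file and where each block lives."""

    def __init__(self):
        self.file_blocks = {}      # file path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of datanode names

    def register(self, path, block_id, datanode_names):
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = list(datanode_names)

    def locate(self, path):
        """Return [(block_id, [datanode names])] so a client can fetch the data itself."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
namenode = NameNode()


def write(path, chunks, replicas=("dn1", "dn2", "dn3")):
    """Store every chunk on each replica node and record its location in the namenode."""
    for i, chunk in enumerate(chunks):
        block_id = f"{path}#blk{i}"
        for name in replicas:
            datanodes[name].blocks[block_id] = chunk
        namenode.register(path, block_id, replicas)


def read(path):
    """Ask the namenode where the blocks are, then read each from its first replica."""
    return "".join(datanodes[locs[0]].blocks[b] for b, locs in namenode.locate(path))


write("/logs/app.log", ["hello ", "hadoop"])
print(read("/logs/app.log"))
```

The namenode object never holds file contents, only the map from files to blocks to nodes. That is why losing it is so costly: the data would still be on the Data Nodes, but nothing would know how to reassemble it.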
Questions?