HDFS
jiangbo
October 23, 2012
Transcript
HDFS Analysis (镜圆)
Monday, October 22, 12
Outline
• Overview
• NameNode
• DataNode
• Secondary NameNode
• Client
Overview
NN Data Structure
• Valid fsname -> block list (kept on disk)
• Set of all valid blocks (inverted #1)
• block -> machine list (kept in memory, rebuilt from datanode block reports)
• machine -> block list (inverted #2)
• LRU cache of updated-heartbeat machines
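
The five structures listed above can be pictured as plain maps and sets. The following is a minimal Python sketch, not the Hadoop implementation: the class name `NameNodeTables`, its method names, and the use of an `OrderedDict` to stand in for the LRU cache are all assumptions made for illustration.

```python
from collections import OrderedDict, defaultdict


class NameNodeTables:
    """Toy model of the five NameNode structures (hypothetical names)."""

    def __init__(self):
        self.file_blocks = {}                   # fsname -> block list (persisted)
        self.valid_blocks = set()               # all valid blocks
        self.block_machines = defaultdict(set)  # block -> machines (memory only)
        self.machine_blocks = defaultdict(set)  # machine -> blocks (inverse map)
        self.heartbeats = OrderedDict()         # machines ordered by last heartbeat

    def add_file(self, fsname, blocks):
        self.file_blocks[fsname] = list(blocks)
        self.valid_blocks.update(blocks)

    def block_report(self, machine, blocks):
        """Rebuild both block/machine maps for one machine from its report."""
        for old in self.machine_blocks.pop(machine, set()):
            self.block_machines[old].discard(machine)
        for blk in blocks:
            if blk in self.valid_blocks:        # ignore blocks no file refers to
                self.block_machines[blk].add(machine)
                self.machine_blocks[machine].add(blk)

    def heartbeat(self, machine, now):
        self.heartbeats.pop(machine, None)
        self.heartbeats[machine] = now          # most recently updated goes last
```

Because only the fsname -> block list map is persisted, a restarted NameNode in this model would start with empty block/machine maps and refill them as block reports arrive.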
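NN-FSDirectory
• FSDirectory maintains the file and directory tree of the current file system
• INodeDirectory represents a directory in the tree
• INodeDirectoryWithQuota is an extension class of INodeDirectory, i.e. a directory with a quota
• INodeFile represents a file in the INode tree; it holds a BlockInfo array that maintains the information on all Blocks of the file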
NN-BlocksMap
• BlocksMap maintains the mapping Block -> { INode, datanodes, self ref }
• Block represents the basic information of a block
• BlockInfo extends Block; besides the basic information it also holds a reference to the block's inode and the information on the datanodes the block belongs to
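NN-BlockInfo
1. DN1, DN2 and DN3 are references (DatanodeDescriptor) to the three datanodes that hold this block
2. DN1-prev-blk is a reference to the block that precedes the current block in DN1's block list
3. DN1-next-blk is a reference to the block that follows the current block in DN1's block list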
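
The prev/next references mean that every datanode's block list is a doubly linked list threaded through the BlockInfo objects themselves. Below is a small Python sketch of that idea. It is an illustration only: Hadoop stores these references in a compact array inside BlockInfo, whereas this model uses a dictionary per block, and `DatanodeBlockList` is a hypothetical name.

```python
class BlockInfo:
    def __init__(self, block_id):
        self.block_id = block_id
        self.inode = None
        self.links = {}   # datanode name -> [prev BlockInfo, next BlockInfo]


class DatanodeBlockList:
    """Block list of one datanode, linked through the blocks it holds."""

    def __init__(self, name):
        self.name = name
        self.head = None

    def add(self, blk):
        # Insert at the head: the old head becomes this block's successor.
        blk.links[self.name] = [None, self.head]
        if self.head is not None:
            self.head.links[self.name][0] = blk
        self.head = blk

    def remove(self, blk):
        # Unlink in O(1) using the block's own prev/next references.
        prev, nxt = blk.links.pop(self.name)
        if prev is not None:
            prev.links[self.name][1] = nxt
        else:
            self.head = nxt
        if nxt is not None:
            nxt.links[self.name][0] = prev

    def __iter__(self):
        cur = self.head
        while cur is not None:
            yield cur.block_id
            cur = cur.links[self.name][1]
```

The design choice this illustrates is that removing a block from one datanode's list needs no search, and no separate list node objects are allocated per replica.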
NN-FSImage
• FSImage represents an image of the file system; it is used to persist file system information and to rebuild the file system structure at startup
NN-FSEditLog
• FSEditLog records the modifications made to the file system after the system starts. The operation records in FSEditLog are periodically merged into FSImage
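
The relationship between the image and the edit log can be sketched as replaying logged operations onto a snapshot. This Python sketch assumes a toy image (a dict of path -> block list) and three hypothetical operation names (`create`, `add_block`, `delete`); the real on-disk formats and operation set are not described in the slides.

```python
def apply_edit(image, op):
    """Apply one logged operation to the in-memory image."""
    kind, path, *args = op
    if kind == "create":
        image[path] = []
    elif kind == "add_block":
        image[path].append(args[0])
    elif kind == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit: {kind}")


def checkpoint(image, edit_log):
    """Merge the logged edits into a new image; return it with an empty log."""
    merged = {path: list(blocks) for path, blocks in image.items()}
    for op in edit_log:
        apply_edit(merged, op)
    return merged, []
```

Startup in this model is the same replay: load the last image, apply whatever edits were logged after it.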
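NN-Other structures
1. CorruptReplicasMap: uses a TreeMap to maintain the block -> datanodedescriptor(s) mapping for blocks in the corrupt state.
2. recentInvalidateSets: maintains the sets of recently invalidated blocks; the map is storageId -> ArrayList
3. datanodeMap: maintains the datanode -> block mapping
4. neededReplications: uses a priority queue to maintain the set of blocks that currently need replication
5. PendingReplicationBlocks: maintains the set of blocks currently being replicated
6. overReplicatedBlocks: the set of blocks that currently need to be checked for over-replication
7. excessReplicateMap: maintains the datanodes in the system together with the sets of excess replica blocks on them; these excess replicas will be deleted.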
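NN operating modes
The NameNode has three operating modes:
1. Normal: the state in which the NameNode serves requests normally
2. Safe mode: the NameNode enters safe mode when it restarts; in this mode the system is read-only so that the NameNode can collect DataNode information
3. Backup mode: the backup NameNode runs in backup mode and passively receives checkpoint information from the primary NameNode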
NN startup
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-startup/
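NN-Safe Mode
After entering SafeMode, the NameNode creates a SafeModeMonitor thread that checks whether SafeMode can be left.
There are two criteria for leaving safe mode:
1. The proportion of blocks that meet the minimum replica requirement among all blocks
2. Whether the time spent in safe mode has reached the minimum requirement
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-safe-mode/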
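
The two exit criteria combine into a single predicate that a monitor thread could poll. The sketch below is hedged: the function name and the default values for `threshold` and `min_seconds` are placeholders chosen for illustration, not values taken from the slides.

```python
def can_leave_safe_mode(safe_blocks, total_blocks, entered_at, now,
                        threshold=0.999, min_seconds=30.0):
    """Return True when both safe-mode exit criteria hold.

    safe_blocks: blocks that already meet the minimum replica count.
    threshold and min_seconds are illustrative placeholders.
    """
    # Criterion 1: enough blocks are sufficiently replicated.
    ratio_ok = total_blocks == 0 or safe_blocks / total_blocks >= threshold
    # Criterion 2: the NameNode has stayed in safe mode long enough.
    time_ok = now - entered_at >= min_seconds
    return ratio_ok and time_ok
```

An empty namespace is treated as trivially satisfying the ratio, which avoids a division by zero in this model.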
NN-Threads
The NameNode runs the following kinds of threads:
1. DataNode health check management thread
2. Replica management thread
3. Lease management
4. IPC Handler threads
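NN-Heartbeat
• The NameNode maintains an array for datanode heartbeat detection, heartbeats, which holds the latest timestamp of every datanode. Each datanode must periodically send a heartbeat request to the NameNode to update its latest timestamp.
• The HeartbeatMonitor thread in the NameNode periodically checks the heartbeats list for datanodes whose timestamp has not been updated within the timeout. Such a node is considered dead; the node is removed, together with all block information recorded for it.
• Blocks that still have valid replicas are added to the list of blocks needing replication and are replicated again.
• If removing blocks makes the proportion of valid blocks that reach the replication factor drop to the threshold for entering safemode, the NameNode enters SafeMode until replication meets the minimum requirement
See: http://jiangbo.me/blog/2012/10/18/hdfs-heartbeat/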
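
One pass of the heartbeat check can be modelled in a few lines. This is a simplified single-threaded Python sketch; the class `HeartbeatTable`, its fields, and the rule "requeue a block only if another live node still holds it" are assumptions drawn from the description above, not from Hadoop's source.

```python
class HeartbeatTable:
    def __init__(self, expire_seconds):
        self.expire_seconds = expire_seconds
        self.last_seen = {}               # datanode -> latest heartbeat time
        self.blocks_on = {}               # datanode -> set of block ids held
        self.needed_replications = set()  # blocks waiting to be re-replicated

    def heartbeat(self, datanode, now):
        self.last_seen[datanode] = now
        self.blocks_on.setdefault(datanode, set())

    def check(self, now):
        """One monitor pass: drop dead nodes and queue their blocks."""
        dead = [dn for dn, ts in self.last_seen.items()
                if now - ts > self.expire_seconds]
        for dn in dead:
            del self.last_seen[dn]
            lost = self.blocks_on.pop(dn)
            for blk in lost:
                # Only blocks with a surviving valid replica can be copied.
                if any(blk in held for held in self.blocks_on.values()):
                    self.needed_replications.add(blk)
        return dead
```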
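NN-Replica management
• Replica management in HDFS is carried out by the ReplicationMonitor thread in FSNamesystem.java.
• The thread periodically traverses the neededReplications list and, following priority order, finds the highest-priority block that has not yet been replicated
• It uses the replicator to choose a target datanode to store the replica of that block (the actual copy is performed when the NN notifies that datanode at its next heartbeat)
• It adds the block to the list of blocks being replicated (pendingReplications) and removes it from neededReplications
• The thread also detects blocks in pendingReplications whose replication has timed out without completing, removes them from pendingReplications and adds them back to neededReplications
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-replica-management/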
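
The movement of blocks between the two lists can be sketched with a heap and a dictionary. This Python model is an assumption-laden illustration: `ReplicationQueues`, the numeric priority convention (lower is more urgent), and the `choose_target` callback are hypothetical stand-ins for the structures named above.

```python
import heapq


class ReplicationQueues:
    def __init__(self, pending_timeout):
        self.pending_timeout = pending_timeout
        self._needed = []   # heap of (priority, block); lower = more urgent
        self.pending = {}   # block -> (priority, time scheduled)

    def need(self, block, priority):
        heapq.heappush(self._needed, (priority, block))

    def schedule_next(self, choose_target, now):
        """Move the most urgent block to pending; return (block, target)."""
        if not self._needed:
            return None
        priority, block = heapq.heappop(self._needed)
        target = choose_target(block)
        self.pending[block] = (priority, now)
        return block, target

    def confirm(self, block):
        """Replication finished: forget the pending entry."""
        self.pending.pop(block, None)

    def requeue_timed_out(self, now):
        """Put blocks whose replication took too long back on the heap."""
        timed_out = [b for b, (_, t) in self.pending.items()
                     if now - t > self.pending_timeout]
        for block in timed_out:
            priority, _ = self.pending.pop(block)
            self.need(block, priority)
        return timed_out

    def needed_blocks(self):
        return [b for _, b in sorted(self._needed)]
```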
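NN-Replica placement policy
1. The first replica is stored locally on the current datanode
2. The second replica is stored on a datanode that is not in the same rack as the datanode holding the first replica
3. The third replica is stored in the same rack as the second replica, but on a different datanode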
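
The three placement rules translate directly into a selection function. The sketch below picks candidates in sorted name order so the result is deterministic; that ordering, the function name `choose_targets`, and the `rack_of` mapping are assumptions for illustration (a real cluster would also weigh load and free space).

```python
def choose_targets(writer, rack_of):
    """Return three datanodes for a new block written from `writer`.

    rack_of maps datanode name -> rack name (hypothetical input format).
    """
    # Rule 1: the first replica stays on the writer's own datanode.
    first = writer
    # Rule 2: the second replica goes to a different rack.
    second = next((dn for dn in sorted(rack_of)
                   if rack_of[dn] != rack_of[first]), None)
    if second is None:
        raise ValueError("no datanode on a second rack")
    # Rule 3: the third replica shares the second's rack, on another node.
    third = next((dn for dn in sorted(rack_of)
                  if rack_of[dn] == rack_of[second] and dn != second), None)
    if third is None:
        raise ValueError("no second datanode on the remote rack")
    return [first, second, third]
```

With this layout one rack failure can remove at most two of the three replicas, while only one copy has to cross racks during the write.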
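NN-Lease management
• LeaseManagement is a synchronization mechanism in HDFS that guarantees that at any given moment only one client performs a write or create operation on a file.
• Lease management in HDFS is carried out by LeaseManager, whose main responsibilities include:
• 1. Issuing leases when a client starts a create or write operation, and renewing and reclaiming them.
• 2. Reclaiming expired leases through a dedicated thread
• The client uses DFSClient.LeaseChecker to renew leases that are still in use but whose term is running out
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-lease-management/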
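
A lease is essentially a per-client expiry time plus a record of which paths that client may write. The following Python sketch models issue, renewal and reclamation; the class shape, the single lease duration, and the method names are assumptions, and it deliberately ignores the block recovery that real lease reclamation triggers.

```python
class LeaseManager:
    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder_of = {}    # path -> client holding the write lease
        self.expires_at = {}   # client -> expiry time of its lease

    def acquire(self, client, path, now):
        """Grant the lease unless another client already holds the path."""
        holder = self.holder_of.get(path)
        if holder is not None and holder != client:
            return False
        self.holder_of[path] = client
        self.expires_at[client] = now + self.lease_seconds
        return True

    def renew(self, client, now):
        if client in self.expires_at:
            self.expires_at[client] = now + self.lease_seconds

    def reclaim_expired(self, now):
        """What the reclaiming thread does on each pass."""
        expired = [c for c, t in self.expires_at.items() if t <= now]
        for client in expired:
            del self.expires_at[client]
            for path in [p for p, c in self.holder_of.items() if c == client]:
                del self.holder_of[path]
        return expired
```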
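NN-Lease recovery algorithm
1) The NameNode looks up the lease information
2) For each file f in the lease, let b be the last block of f and do the following:
2.1) Get the list of datanodes that hold b
2.2) Make one of these datanodes the primary datanode p
2.3) p obtains the latest generation stamp from the NameNode
2.4) p gets the block information from each DataNode
2.5) p computes the minimum block length
2.6) p uses the minimum block length and the latest generation stamp to update the datanodes that have a valid generation stamp
2.7) p notifies the NameNode of the update results
2.8) The NameNode updates the BlockInfo
2.9) The NameNode removes f from the lease; if all files in the lease have now been removed, the lease itself is removed
2.10) The NameNode commits the changes to the EditLog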
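
Steps 2.4 to 2.6 amount to truncating every usable replica to the shortest length and giving them all a fresh stamp. The Python sketch below works under one explicit assumption that the slides do not spell out: a replica counts as "valid" when its stamp equals the highest stamp reported by any replica. The function name and data layout are also hypothetical.

```python
def recover_last_block(replicas, new_stamp):
    """Synchronise the replicas of a file's last block.

    replicas: datanode -> (stamp, length) as reported to the primary.
    Returns datanode -> (new_stamp, agreed_length) for the valid replicas.
    """
    # Assumption: replicas carrying an older stamp are stale and excluded.
    newest = max(stamp for stamp, _ in replicas.values())
    valid = {dn: length for dn, (stamp, length) in replicas.items()
             if stamp == newest}
    # Step 2.5: the shortest valid replica defines the recovered length.
    min_len = min(valid.values())
    # Step 2.6: every valid replica gets the new stamp and that length.
    return {dn: (new_stamp, min_len) for dn in valid}
```

Using the minimum length guarantees that every surviving replica contains the agreed bytes, at the cost of discarding data that only some replicas received.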
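Secondary NameNode
• The main role of the SecondaryNameNode in HDFS is to help the master NameNode perform checkpoint operations periodically (default: 5 minutes).
• It has two main configurable properties:
1. checkpointPeriod: the interval between two checkpoints, configurable through fs.checkpoint.period
2. checkpointSize: the maximum size of the EditLog file; when the EditLog exceeds this maximum a checkpoint is forced. Configurable through fs.checkpoint.size; the default is 64M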
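
The two properties define two independent triggers. A minimal Python sketch of the decision, assuming the 5-minute period and 64M size quoted on the slide as defaults; the function and parameter names are hypothetical and times are in seconds.

```python
def should_checkpoint(now, last_checkpoint, edit_log_bytes,
                      checkpoint_period=5 * 60,
                      checkpoint_size=64 * 1024 * 1024):
    """Decide whether a checkpoint is due.

    Defaults mirror the values quoted on the slide; treat them as
    illustrative rather than as the shipped configuration.
    """
    period_elapsed = now - last_checkpoint >= checkpoint_period
    log_too_big = edit_log_bytes > checkpoint_size
    return period_elapsed or log_too_big
```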
SDN-checkpoint
DN-Data structure
• In HDFS the DataNode is mainly responsible for maintaining the block -> stream of bytes mapping, i.e. storing the actual block data.
data/
├── blocksBeingWritten
├── current
│   ├── VERSION
│   ├── blk_-1148021215131449924
│   ├── blk_-1148021215131449924_1001.meta
│   ├── blk_-8598609183581346893
│   ├── blk_-8598609183581346893_1002.meta
│   ├── blk_6693595845022390257
│   ├── blk_6693595845022390257_1003.meta
│   └── dncp_block_verification.log.curr
├── detach
├── storage
└── tmp
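DN-FSDataSet
• FSVolume manages the directory that block files belong to and tracks the usage of the storage directory
• FSVolumeSet is a collection of FSVolume objects and provides methods such as total capacity and remaining space.
• FSDataSet wraps FSVolumeSet and implements the FSDatasetInterface interface, exposing block lookup and other operations.
• FSDir builds the hierarchy of block files on the datanode's disk. By default each directory holds at most 64 subdirectories and can store at most 64 blocks. When a directory is initialized it recursively scans all subdirectories and files beneath it and builds a tree structure. On addBlock it first tries to add the block in the current directory; if the current directory has no free space it tries to add it in a subdirectory; if there is no subdirectory it creates a new one.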
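
The addBlock rule described above is naturally recursive. The following Python sketch is a simplified, deterministic model of it: the `create_ok` flag, the `subdirN` naming, and creating subdirectories one at a time are assumptions made to keep the example small and testable, and they differ from details of the real DataNode code.

```python
class FSDir:
    """One directory of block files; the limit of 64 follows the slide."""

    def __init__(self, path, max_per_dir=64):
        self.path = path
        self.max_per_dir = max_per_dir   # cap on blocks and on subdirectories
        self.blocks = []
        self.children = []

    def add_block(self, block_id, create_ok=True):
        """Store block_id and return the directory used, or None if full."""
        if len(self.blocks) < self.max_per_dir:
            self.blocks.append(block_id)
            return self.path
        # Current directory is full: look for room in existing subtrees.
        for child in self.children:
            placed = child.add_block(block_id, create_ok=False)
            if placed is not None:
                return placed
        if not create_ok:
            return None
        # No room anywhere below: create a new subdirectory if allowed.
        if len(self.children) < self.max_per_dir:
            child = FSDir(f"{self.path}/subdir{len(self.children)}",
                          self.max_per_dir)
            self.children.append(child)
            return child.add_block(block_id)
        # Subdirectory slots exhausted: grow one level deeper.
        return self.children[0].add_block(block_id, create_ok=True)
```

For example, with `max_per_dir=2` the first two blocks land in the root, the next four fill `subdir0` and `subdir1`, and the seventh opens `subdir0/subdir0`.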
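NN&DN-DN registration
• On startup the DataNode registers with the NameNode and submits the following information:
• name: machine name (host name + service port number)
• infoPort: port of the status information service
• ipcPort: port number of the IPC service
• The NameNode assigns the DataNode a storageId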
NN&DN-Heartbeat detection
NN&DN-blockReport
DN-offerservice
Client-Code structure
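Client-DFSClient
1. LeaseChecker is mainly used for lease checking and renewal.
2. DFSOutputStream provides buffered byte-stream writing. When writing, the client first caches the data locally and splits it into multiple packets (64K per packet by default). Each packet is in turn split into multiple chunks (512 bytes by default), and each chunk has a checksum. Once the client has filled a packet it adds the packet to a dataqueue. The DataStreamer thread is responsible for sending each packet to the datanode pipeline. After sending a packet, the streamer moves it from the dataqueue to the ackqueue. ResponseProcessor receives the ack messages sent back by the datanodes; each time the ack for a packet is received successfully, ResponseProcessor removes that packet from the ackqueue.
3. DFSInputStream provides byte-stream reading; internally it encapsulates the interaction with the NN and the DNs
4. DataStreamer: responsible for sending packets to the datanode pipeline. It is a daemon thread that obtains the blockId and the block locations from the namenode and sends packets to the datanodes in the pipeline. Every packet has a seqId, and an ack message from the datanode is received for every packet sent. Once the acks for all packets have been received (meaning the whole block has been sent), the streamer closes the block.
5. ResponseProcessor: receives the ack messages returned by the datanodes and removes the corresponding packets from the ackqueue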
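
The packet/chunk split and the two queues can be modelled without threads. In the Python sketch below, the 64K and 512-byte sizes come from the slide; everything else is an assumption: the dictionary packet layout, the class `OutputStreamModel`, and the use of CRC32 via `zlib.crc32` as the per-chunk checksum (the slide only says each chunk has a checksum).

```python
import zlib
from collections import deque

PACKET_SIZE = 64 * 1024   # per the slide: 64K per packet
CHUNK_SIZE = 512          # per the slide: 512 bytes per chunk


def make_packets(data):
    """Split bytes into packets, each made of checksummed chunks."""
    packets = []
    for seq, start in enumerate(range(0, len(data), PACKET_SIZE)):
        body = data[start:start + PACKET_SIZE]
        chunks = [body[i:i + CHUNK_SIZE]
                  for i in range(0, len(body), CHUNK_SIZE)]
        checksums = [zlib.crc32(c) for c in chunks]   # assumed checksum type
        packets.append({"seq": seq, "chunks": chunks, "checksums": checksums})
    return packets


class OutputStreamModel:
    def __init__(self):
        self.data_queue = deque()   # packets waiting to be sent
        self.ack_queue = deque()    # packets sent, waiting for an ack

    def write(self, data):
        self.data_queue.extend(make_packets(data))

    def stream_one(self):
        """DataStreamer step: send the head packet, then await its ack."""
        packet = self.data_queue.popleft()
        self.ack_queue.append(packet)
        return packet["seq"]

    def on_ack(self, seq):
        """ResponseProcessor step: drop the acknowledged packet."""
        if not self.ack_queue or self.ack_queue[0]["seq"] != seq:
            raise ValueError(f"unexpected ack for packet {seq}")
        self.ack_queue.popleft()
```

Keeping sent packets in the ack queue until they are acknowledged is what would let a writer resend them if the pipeline failed mid-block.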
Write flow
Read flow
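DataXceiverServer
• At startup the DataNode opens a socket port through DataXceiverServer, which is responsible for reading and writing block data. DataXceiverServer itself runs as a daemon thread and listens on the data read/write service port configured by dfs.datanode.address. When a request arrives it creates a new DataXceiver thread to handle the request.
• A DataXceiver thread handles one read/write data stream request. Its run method mainly calls the corresponding handler method according to the request type carried in the data request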
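readBlock()
readBlock() mainly reads block data from disk, builds a DataOutputStream, and creates a BlockSender to send the data over that stream (to a datanode or a client).
The flow by which BlockSender.sendBlock() sends a Block is roughly as follows:
1. Read the block's meta information, obtain the checksum and send it
2. Send the offset at which the data read starts
3. Split the block data into packets and send them to the client
4. After all packets have been sent, close the checksum file and the block file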
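writeBlock()
1. BlockReceiver reads data from upstream packet by packet and writes it to the local disk
2. If there is a next replica node, the packet is forwarded to the next node
3. The packet is added to the ackqueue to wait for the ack message
4. After the next node has written the packet, it returns the ack message for that packet
5. When PacketResponder receives the ack message, it removes the packet from the ackqueue and sends an ack message to the preceding node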
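
The five steps describe a chain in which data flows downstream and acks flow back upstream. The Python sketch below models that chain synchronously, with a function call standing in for the network and its return value standing in for the ack. The class `PipelineNode` and its fields are hypothetical; the real receiver and responder run as separate threads.

```python
class PipelineNode:
    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.disk = []        # packets written locally
        self.ack_queue = []   # packets forwarded but not yet acked

    def receive(self, seq, payload):
        """Handle one packet and return the ack that goes back upstream."""
        self.disk.append((seq, payload))               # step 1: write locally
        if self.downstream is None:
            return seq                                  # last node acks at once
        self.ack_queue.append(seq)                      # step 3: await the ack
        acked = self.downstream.receive(seq, payload)   # steps 2 and 4
        self.ack_queue.remove(acked)                    # step 5: clear it
        return acked                                    # and ack upstream
```

A three-node pipeline is built back to front, for example `PipelineNode("dn1", PipelineNode("dn2", PipelineNode("dn3")))`, and a packet is considered written once the call on the first node returns.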
• For detailed notes see: http://jiangbo.me/blog/categories/hdfs/
THX