HDFS
jiangbo
October 23, 2012
Transcript
HDFS (镜圆) Monday, October 22, 12
Outline
• Overview
• NameNode
• DataNode
• Secondary NameNode
• Client
Overview
NN Data Structure
• Valid fsname -> block list (kept on disk)
• Set of all valid blocks (inverted #1)
• block -> machine list (kept in memory, rebuilt from DataNode block reports)
• machine -> block list (inverted #2)
• LRU cache of updated-heartbeat machines
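NN-FSDirectory
• FSDirectory maintains the file tree of the current file system.
• INodeDirectory represents a directory in the tree.
• INodeDirectoryWithQuota is an extension class of INodeDirectory, that is, a directory with a quota.
• INodeFile represents a file in the INode tree. It holds a BlockInfo array that tracks the information for all blocks of the file.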
NN-BlocksMap
• BlocksMap maintains the mapping Block -> { INode, datanodes, self ref }.
• Block holds the basic information about a block.
• BlockInfo extends Block. Besides the basic information, it also holds a reference to the block's inode and information about the datanodes the block belongs to.
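NN-BlockInfo
1. DN1, DN2 and DN3 are references (DataNodeDescriptor) to the three datanodes that store this block.
2. DN1-prev-blk is a reference to the block that precedes the current block in DN1's block list.
3. DN1-next-blk is a reference to the block that follows the current block in DN1's block list.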
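The layout on this slide can be pictured as a flat array with three slots per replica. The following is a minimal Python sketch under that assumption, with the slot order (datanode, prev, next) taken from the list above. The class and method names are illustrative and are not Hadoop's Java code.

```python
class BlockInfo:
    """Toy model of a block with a flat 'triplets' array (3 slots per replica)."""

    def __init__(self, block_id, replication):
        self.block_id = block_id
        # For replica i: [3*i] = datanode, [3*i+1] = prev block, [3*i+2] = next block
        self.triplets = [None] * (3 * replication)

    def set_replica(self, i, datanode, prev_blk=None, next_blk=None):
        """Fill the three slots that belong to replica i."""
        self.triplets[3 * i:3 * i + 3] = [datanode, prev_blk, next_blk]

    def datanodes(self):
        """Return the datanodes that currently hold a replica."""
        return [d for d in self.triplets[0::3] if d is not None]

    def next_on(self, datanode):
        """Return the block that follows this one in the datanode's block list."""
        i = self.triplets[0::3].index(datanode)
        return self.triplets[3 * i + 2]
```

Because each block carries its own prev/next references per datanode, walking all blocks of one datanode needs no separate list object.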
NN-FSImage
• FSImage represents the image of the file system. It persists the file system information and is used to restore the file system structure at startup.
NN-FSEditLog
• FSEditLog stores the modifications made to the file system after startup. The operations recorded in FSEditLog are periodically merged into FSImage.
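To make the relationship between the image and the edit log concrete, here is a small Python sketch of the merge step. It is a toy model: the namespace is a set of paths and only two hypothetical operation kinds ("create", "delete") exist, which is far simpler than the real on-disk formats.

```python
def apply_edit(namespace, op):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    kind, path = op
    if kind == "create":
        namespace.add(path)
    elif kind == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown op: {kind}")


def checkpoint(fsimage, edit_log):
    """Replay the edit log on a copy of the image; return (new_image, empty_log)."""
    merged = set(fsimage)
    for op in edit_log:
        apply_edit(merged, op)
    return merged, []
```

Startup follows the same idea: load the last image, then replay whatever the edit log recorded since.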
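NN-Other Structures
1. CorruptReplicasMap: uses a TreeMap to maintain the mapping blocks -> datanodedescriptor(s) for blocks in the corrupt state.
2. recentInvalidateSets: maintains the set of recently invalidated blocks; the map entries are storageId -> ArrayList.
3. datanodeMap: maintains the mapping datanode -> block.
4. neededReplications: uses a priority queue to maintain the set of blocks that currently need replication.
5. PendingReplicationBlocks: maintains the set of blocks that are currently being replicated.
6. overReplicatedBlocks: the set of blocks that currently need to be checked for over-replication.
7. excessReplicateMap: maintains, for each datanode in the system, the set of excess replica blocks on it. These excess replicas will be deleted.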
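NN Run Modes
The NameNode has three run modes:
1. Normal: the NameNode is serving requests normally.
2. Safe mode: the NameNode enters safe mode when it restarts. The file system is read-only in this mode so that the NameNode can collect information from the DataNodes.
3. Backup mode: a backup NameNode runs in backup mode and passively receives checkpoint information from the primary NameNode.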
NN Startup
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-startup/
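NN-Safe Mode
After entering SafeMode, the NameNode creates a SafeModeMonitor thread that checks whether it can leave SafeMode.
There are two criteria for leaving safe mode:
1. The ratio of blocks that meet the minimum replica requirement to all blocks.
2. Whether the time spent in safe mode has reached the minimum requirement.
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-safe-mode/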
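The two exit criteria can be written as a single predicate. This is a sketch only: the function name is hypothetical, and the default `threshold` and `min_seconds` values are illustrative placeholders rather than values taken from the slides.

```python
def can_leave_safe_mode(safe_blocks, total_blocks, seconds_in_safe_mode,
                        threshold=0.999, min_seconds=30):
    """Return True when both exit criteria are met.

    safe_blocks: blocks that already meet the minimum replica requirement.
    threshold / min_seconds: illustrative defaults (assumptions).
    """
    if total_blocks == 0:
        ratio_ok = True  # an empty namespace has nothing to wait for
    else:
        ratio_ok = safe_blocks / total_blocks >= threshold
    return ratio_ok and seconds_in_safe_mode >= min_seconds
```

A monitor thread in this model would simply call the predicate in a loop and leave safe mode the first time it returns True.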
NN-Threads
The NameNode runs the following kinds of threads:
1. DataNode health check management thread
2. Replica management thread
3. Lease management thread
4. IPC Handler threads
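NN-Heartbeat
• The NameNode maintains an array, heartbeats, for datanode heartbeat checks. It holds the latest timestamp of every datanode. Each datanode must periodically send a heartbeat request to the NameNode to update its latest timestamp.
• The HeartbeatMonitor thread in the NameNode periodically checks the heartbeats list for datanodes whose timestamp has not been updated within the timeout. Such a node is considered dead; the node is removed, together with all block information on that node.
• The blocks that are still valid are filtered out and added to the list of blocks that need replication, and are then replicated.
• If removing blocks causes the ratio of valid blocks that meet the replication factor to drop to the safe mode threshold, the NameNode enters SafeMode until replication meets the minimum requirement.
See: http://jiangbo.me/blog/2012/10/18/hdfs-heartbeat/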
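The heartbeat check and the cleanup of a dead node can be sketched as two small functions. This is a simplified model, assuming timestamps are plain numbers and block locations are sets of datanode names; all names are hypothetical. A block with no remaining holder is not queued here, since there is no source left to copy from.

```python
def find_dead_datanodes(heartbeats, now, timeout):
    """Return datanodes whose last heartbeat is older than the timeout."""
    return sorted(dn for dn, last in heartbeats.items() if now - last > timeout)


def remove_dead_datanode(dn, heartbeats, block_locations,
                         needed_replications, replication=3):
    """Forget a dead datanode and queue its under-replicated blocks."""
    heartbeats.pop(dn, None)
    for block, holders in block_locations.items():
        if dn in holders:
            holders.discard(dn)
            # Only blocks that still have a live replica can be re-replicated.
            if holders and len(holders) < replication:
                needed_replications.add(block)
```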
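NN-Replica Management
• Replica management in HDFS is done by the ReplicationMonitor thread in FSNamesystem.java.
• The thread periodically traverses the neededReplications list and, following the priority order, looks for the highest-priority block that has not yet been replicated.
• It uses the replicator to choose a target datanode for the block's new replica (the actual copy is carried out when the NN notifies that datanode at its next heartbeat).
• It adds the block to the list of blocks being replicated (pendingReplications) and removes it from neededReplications.
• The thread also checks the blocks being replicated for ones that have timed out without finishing, removes them from pendingReplications, and adds them back to neededReplications.
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-replica-management/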
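The movement of a block between the two lists can be sketched with a heap for neededReplications and a dictionary for pendingReplications. This is an illustrative model: the function names, the `choose_target` callback, and the convention that a lower number means higher priority are all assumptions.

```python
import heapq


def schedule_replication(needed, pending, choose_target, now):
    """Move the highest-priority block from needed to pending."""
    if not needed:
        return None
    priority, block = heapq.heappop(needed)
    target = choose_target(block)
    pending[block] = (priority, target, now)
    return block, target


def requeue_timed_out(needed, pending, now, timeout):
    """Put blocks whose replication did not finish in time back into needed."""
    for block, (priority, _target, started) in list(pending.items()):
        if now - started > timeout:
            del pending[block]
            heapq.heappush(needed, (priority, block))
```

A monitor loop in this model would call `schedule_replication` and `requeue_timed_out` once per cycle.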
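NN-Replica Placement Policy
1. The first replica is stored locally on the current datanode.
2. The second replica is stored on a datanode that is on a different rack from the datanode holding the first replica.
3. The third replica is stored on the same rack as the second replica but on a different datanode.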
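The three rules translate directly into a small selection function. The sketch below is deterministic (it takes the first candidate in sorted rack order) purely so that the result is easy to check; a real placement policy would choose among candidates differently. The function name and the `racks` dictionary shape are assumptions.

```python
def choose_replica_targets(writer, racks):
    """Pick up to three targets. racks maps rack name -> list of datanodes."""
    rack_of = {dn: rack for rack, dns in racks.items() for dn in dns}
    first = writer  # rule 1: the local datanode
    local_rack = rack_of[first]
    # Rule 2: a datanode on a different rack from the first replica.
    remote = [dn for rack in sorted(racks) if rack != local_rack
              for dn in racks[rack]]
    if not remote:
        return [first]
    second = remote[0]
    # Rule 3: same rack as the second replica, but a different datanode.
    same_rack = [dn for dn in racks[rack_of[second]] if dn != second]
    return [first, second] + same_rack[:1]
```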
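NN-Lease Management
• Lease management is a synchronization mechanism in HDFS. It guarantees that at any moment only one client is writing to or creating a given file.
• Lease management in HDFS is done by LeaseManager. Its main responsibilities are:
• 1. Issuing a lease when a client starts a create or write operation, renewing it, and reclaiming it.
• 2. Reclaiming expired leases through a dedicated thread.
• The client uses DFSClient.LeaseChecker to renew leases that are still in use but about to expire.
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-lease-management/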
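A lease table with issue, renew and expire operations can be sketched in a few lines. This toy class only shares its name with the HDFS component; its methods, the single `lease_seconds` duration, and the use of plain numbers for time are assumptions made for illustration.

```python
class LeaseManager:
    """Toy lease table: one writer per path, with expiry."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.leases = {}  # path -> (holder, expiry time)

    def acquire(self, path, client, now):
        """Grant the lease unless another client holds a live one."""
        current = self.leases.get(path)
        if current and current[0] != client and current[1] > now:
            return False
        self.leases[path] = (client, now + self.lease_seconds)
        return True

    def renew(self, client, now):
        """Extend every lease held by the client (the client-side checker's job)."""
        for path, (holder, _) in list(self.leases.items()):
            if holder == client:
                self.leases[path] = (client, now + self.lease_seconds)

    def expire(self, now):
        """Reclaim expired leases (the monitor thread's job)."""
        expired = sorted(p for p, (_, exp) in self.leases.items() if exp <= now)
        for p in expired:
            del self.leases[p]
        return expired
```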
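NN-Lease Recovery Algorithm
1) The NameNode looks up the lease information.
2) For each file f in the lease, let b be the last block of f, and do the following:
2.1) Get the list of datanodes that hold b.
2.2) Pick one of those datanodes as the primary datanode p.
2.3) p gets the latest timestamp from the NameNode.
2.4) p gets the block information from each DataNode.
2.5) p computes the minimum block length.
2.6) p uses the minimum block length and the latest timestamp to update the datanodes that have a valid timestamp.
2.7) p notifies the NameNode of the update result.
2.8) The NameNode updates the BlockInfo.
2.9) The NameNode removes f from the lease. If all files in the lease have been removed at that point, the lease itself is removed.
2.10) The NameNode commits the modified EditLog.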
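Steps 2.4 to 2.6 are the core of the algorithm and can be sketched as one function. The sketch assumes that a "valid timestamp" means the most recent stamp seen among the replicas; that reading, the function name, and the data shapes are assumptions and are not confirmed by the slides.

```python
def recover_last_block(replicas, new_stamp):
    """Truncate valid replicas to a common length and give them a new stamp.

    replicas: datanode -> (length, stamp) for the last block of the file.
    Returns datanode -> (length, stamp) for the replicas that were updated.
    """
    if not replicas:
        return {}
    # Assumption: only replicas carrying the most recent stamp are valid.
    latest = max(stamp for _, stamp in replicas.values())
    valid = {dn: length for dn, (length, stamp) in replicas.items()
             if stamp == latest}
    min_length = min(valid.values())
    return {dn: (min_length, new_stamp) for dn in valid}
```

Using the minimum length means every surviving replica ends up with the same bytes, at the cost of dropping data that only some replicas received.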
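Secondary NameNode
• The main role of the SecondaryNameNode in HDFS is to help the master NameNode run the checkpoint operation periodically (every 5 minutes by default).
• It has two main configurable properties:
1. checkpointPeriod: the interval between two checkpoints, set through fs.checkpoint.period.
2. checkpointSize: the maximum size of the EditLog file. When the EditLog exceeds this size, a checkpoint is forced. It is set through fs.checkpoint.size and defaults to 64 MB.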
SNN-checkpoint
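The two properties described on the Secondary NameNode slide combine into one trigger condition: checkpoint when enough time has passed or when the edit log has grown too large. A minimal sketch, with a hypothetical function name; the 64 MB default mirrors the slide, while the period is left as a required argument.

```python
def should_checkpoint(seconds_since_last, edit_log_bytes,
                      checkpoint_period, checkpoint_size=64 * 1024 * 1024):
    """Return True if a checkpoint is due by time or forced by edit log size."""
    return (seconds_since_last >= checkpoint_period
            or edit_log_bytes > checkpoint_size)
```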
DN-Data Structure
• In HDFS the DataNode is mainly responsible for maintaining the mapping block -> stream of bytes, that is, for storing the actual block data.
data/
├── blocksBeingWritten
├── current
│   ├── VERSION
│   ├── blk_-1148021215131449924
│   ├── blk_-1148021215131449924_1001.meta
│   ├── blk_-8598609183581346893
│   ├── blk_-8598609183581346893_1002.meta
│   ├── blk_6693595845022390257
│   ├── blk_6693595845022390257_1003.meta
│   └── dncp_block_verification.log.curr
├── detach
├── storage
└── tmp
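DN-FSDataSet
• FSVolume manages the volume that block files belong to and tracks the quota usage of the storage directory.
• FSVolumeSet is a collection of FSVolume objects. It provides methods such as total capacity and remaining space.
• FSDataSet wraps FSVolumeSet and implements the FSDatasetInterface interface. It exposes operations such as block lookup.
• FSDir builds the directory hierarchy of blocks on the datanode's disk. By default each directory holds at most 64 subdirectories and at most 64 blocks. When a directory is initialized, it recursively scans all subdirectories and files below it and builds a tree. On addBlock, it first tries to add the block to the current directory. If the current directory has no free space, it tries the subdirectories. If there are no subdirectories, it creates a new one.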
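The addBlock behaviour described for FSDir can be sketched as a small recursive class. This is a simplified model, not the Hadoop implementation: the `subdirN` naming, the `allow_new_subdir` flag, and the choice to return None once one level of subdirectories is full are all assumptions made to keep the example short.

```python
class FSDir:
    """Toy directory node with limits on blocks and subdirectories."""

    def __init__(self, name, max_blocks=64, max_subdirs=64):
        self.name = name
        self.max_blocks = max_blocks
        self.max_subdirs = max_subdirs
        self.blocks = []
        self.children = []

    def add_block(self, block, allow_new_subdir=True):
        """Store the block; return the directory path used, or None if full."""
        if len(self.blocks) < self.max_blocks:
            self.blocks.append(block)
            return self.name
        # Current directory is full: try existing subdirectories first.
        for child in self.children:
            path = child.add_block(block, allow_new_subdir=False)
            if path is not None:
                return path
        # No subdirectory had room: create a new one if allowed.
        if allow_new_subdir and len(self.children) < self.max_subdirs:
            child = FSDir(f"{self.name}/subdir{len(self.children)}",
                          self.max_blocks, self.max_subdirs)
            self.children.append(child)
            return child.add_block(block)
        return None
```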
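NN&DN-DN Registration
• At startup the DataNode registers with the NameNode and submits the following information:
• name: machine name (host name + service port)
• infoPort: port of the status information service
• ipcPort: port of the IPC service
• The NameNode assigns a storageId to the DataNode.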
NN&DN-Heartbeat Check
NN&DN-blockReport
DN-offerservice
Client-Code Structure
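Client-DFSClient
1. LeaseChecker is mainly used to check and renew leases.
2. DFSOutputStream provides buffered byte-stream writes. When writing, the client first caches the data locally and splits it into packets (64 KB per packet by default). Each packet is further divided into chunks (512 bytes by default), and each chunk has a checksum. When the client fills a packet, it adds the packet to a dataqueue. The DataStreamer thread sends each packet to the datanode pipeline. After sending a packet, the streamer moves it from the dataqueue to the ackqueue. ResponseProcessor receives the ack messages sent back by the datanodes. Each time it successfully receives the ack for a packet, it removes that packet from the ackqueue.
3. DFSInputStream provides byte-stream reads. It encapsulates the interaction with the NN and the DNs.
4. DataStreamer: sends packets to the datanode pipeline. It is a daemon thread. It obtains the blockId and the block locations from the namenode and sends packets to the datanodes in the pipeline. Each packet has a seqId, and an ack from the datanodes arrives after each packet is sent. Once the acks for all packets have been received (meaning the whole block has been sent), the streamer closes the block.
5. ResponseProcessor: receives the ack messages returned by the datanodes and removes the corresponding packets from the ackqueue.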
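The packet and chunk split and the two queues can be sketched as follows. This is a single-threaded toy model: the class and function names are hypothetical, and CRC32 from Python's `zlib` stands in for the per-chunk checksum, which is an assumption about the checksum type.

```python
import zlib
from collections import deque


def split_into_packets(data, packet_size=64 * 1024, chunk_size=512):
    """Split bytes into (seq, [(chunk, checksum), ...]) packets."""
    packets = []
    for seq, start in enumerate(range(0, len(data), packet_size)):
        body = data[start:start + packet_size]
        chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
        packets.append((seq, [(c, zlib.crc32(c)) for c in chunks]))
    return packets


class OutputQueues:
    """Models the dataqueue -> ackqueue hand-off for one block."""

    def __init__(self, packets):
        self.data_queue = deque(packets)
        self.ack_queue = deque()

    def send_next(self):
        """Streamer step: move one packet from data_queue to ack_queue."""
        packet = self.data_queue.popleft()
        self.ack_queue.append(packet)
        return packet[0]

    def on_ack(self, seq):
        """Response step: drop the acknowledged packet from ack_queue."""
        if not self.ack_queue or self.ack_queue[0][0] != seq:
            raise ValueError(f"unexpected ack for seq {seq}")
        self.ack_queue.popleft()
```

Keeping sent packets in the ack queue until they are acknowledged is what lets a writer resend them if the pipeline fails.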
Write Flow
Read Flow
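DataXceiverServer
• At startup the DataNode opens a socket port through DataXceiverServer, which handles reads and writes of block data. DataXceiverServer itself runs as a daemon thread and listens on the data read/write service port configured by dfs.datanode.address. When a request arrives, it creates a new DataXceiver thread to handle it.
• A DataXceiver thread handles one read or write data stream request. Its run method mainly dispatches to the matching handler method according to the request type carried in the data request.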
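readBlock()
readBlock() reads the block data from disk, builds a DataOutputStream, and creates a BlockSender that sends the stream out (to a datanode or a client).
The flow of BlockSender.sendBlock() is roughly as follows:
1. Read the block's meta information, obtain the checksum, and send it.
2. Send the offset at which the data read starts.
3. Split the block data into packets and send them to the client.
4. After all packets have been sent, close the checksum file and the block file.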
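writeBlock()
1. BlockReceiver reads data from the upstream node packet by packet and writes it to the local disk.
2. If there is a next replica node, the packet is forwarded to that node.
3. The packet is added to the ackqueue to wait for its ack message.
4. When the next node has finished writing the packet, it returns the ack for that packet.
5. When PacketResponder receives the ack, it removes the packet from the ackqueue and sends an ack to the preceding node.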
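The five steps can be modelled as a chain of nodes that each write locally, forward downstream, and pass the ack back upstream. The sketch is synchronous and single-threaded for clarity, whereas the slide describes separate receiver and responder roles; the class name and fields are hypothetical.

```python
class PipelineNode:
    """Toy datanode in a write pipeline."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.disk = []       # packets written locally
        self.ack_queue = []  # packets waiting for a downstream ack

    def receive(self, seq, payload):
        """Write locally, forward downstream, then return the ack upstream."""
        self.disk.append((seq, payload))
        if self.downstream is not None:
            self.ack_queue.append(seq)
            acked = self.downstream.receive(seq, payload)
            self.ack_queue.remove(acked)
        return seq
```

In this model the ack for a packet reaches the writer only after the last node in the chain has stored it.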
• Detailed notes: http://jiangbo.me/blog/categories/hdfs/
THX