HDFS
jiangbo
October 23, 2012
Transcript
HDFS [subtitle illegible] (镜圆)
Monday, October 22, 12
Outline
• Overview
• NameNode
• DataNode
• SecondaryNode
• Client
Overview
NN Data Structure
• Valid fsname -> block list (kept on disk)
• Set of all valid blocks (inverted #1)
• block -> machine list (kept in memory, rebuilt from DataNode block reports)
• machine -> block list (inverted #2)
• LRU cache of updated-heartbeat machines
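The five tables above can be modelled in a few lines. This is a minimal sketch, not NameNode code: the class `NameNodeState` and all of its method and field names are hypothetical, and string IDs stand in for blocks and machines.

```python
from collections import OrderedDict, defaultdict


class NameNodeState:
    """Toy model of the five NameNode tables (all names are illustrative)."""

    def __init__(self):
        self.file_to_blocks = {}                   # fsname -> block list (persisted)
        self.valid_blocks = set()                  # inverted #1
        self.block_to_machines = defaultdict(set)  # kept in memory only
        self.machine_to_blocks = defaultdict(set)  # inverted #2
        self.heartbeats = OrderedDict()            # machine -> last heartbeat time

    def add_file(self, fsname, blocks):
        self.file_to_blocks[fsname] = list(blocks)
        self.valid_blocks.update(blocks)

    def block_report(self, machine, blocks):
        """Rebuild the in-memory location maps from one datanode's report."""
        for stale in self.machine_to_blocks.pop(machine, set()):
            self.block_to_machines[stale].discard(machine)
        for blk in blocks:
            if blk in self.valid_blocks:           # ignore blocks no file owns
                self.block_to_machines[blk].add(machine)
                self.machine_to_blocks[machine].add(blk)

    def heartbeat(self, machine, now):
        self.heartbeats[machine] = now
        self.heartbeats.move_to_end(machine)       # most recently updated goes last
```

Only `file_to_blocks` would need to survive a restart; the location maps are rebuilt as datanodes report in.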
NN-FSDirectory
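NN-FSDirectory
• FSDirectory maintains the file tree of the current file system.
• INodeDirectory represents a directory in the tree.
• INodeDirectoryWithQuota extends INodeDirectory; it is a directory that carries a quota.
• INodeFile represents a file in the INode tree. It holds a BlockInfo array that tracks all of the file's blocks.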
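The class relationships above can be sketched as a small tree. The class names follow the slide, but the fields, the `count` and `add_child` methods, and the `ns_quota` parameter are assumptions made for illustration. The quota is only checked when a child is added directly to the quota directory, which is a simplification.

```python
class INodeFile:
    def __init__(self, name, blocks=()):
        self.name = name
        self.blocks = list(blocks)      # stands in for the BlockInfo array


class INodeDirectory:
    def __init__(self, name):
        self.name = name
        self.children = {}

    def count(self):
        """Number of inodes in this subtree, including this directory."""
        return 1 + sum(c.count() if isinstance(c, INodeDirectory) else 1
                       for c in self.children.values())

    def add_child(self, node):
        self.children[node.name] = node
        return node


class INodeDirectoryWithQuota(INodeDirectory):
    def __init__(self, name, ns_quota):
        super().__init__(name)
        self.ns_quota = ns_quota        # hypothetical: max inodes in the subtree

    def add_child(self, node):
        added = node.count() if isinstance(node, INodeDirectory) else 1
        if self.count() + added > self.ns_quota:
            raise ValueError("namespace quota exceeded for " + self.name)
        return super().add_child(node)
```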
NN-BlocksMap
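NN-BlocksMap
• BlocksMap maintains the mapping Block -> { INode, datanodes, self ref }.
• Block holds the basic information about one block.
• BlockInfo extends Block. Besides the basic information, it also holds a reference to the block's inode and the datanodes the block belongs to.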
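NN-BlockInfo
1. DN1, DN2, and DN3 are references (DataNodeDescriptor) to the three datanodes that store this block.
2. DN1-prev-blk is a reference to the block that precedes the current block in DN1's block list.
3. DN1-next-blk is a reference to the block that follows the current block in DN1's block list.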
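The layout described above, one (datanode, prev, next) triplet per replica, lets each datanode's blocks form a doubly linked list threaded through the BlockInfo objects themselves. The sketch below illustrates that idea under assumptions: the method names are hypothetical, and plain strings stand in for DataNodeDescriptor.

```python
class BlockInfo:
    """A block plus one (datanode, prev, next) triplet per replica."""

    def __init__(self, block_id, replication):
        self.block_id = block_id
        self.inode = None
        self.triplets = [None] * (3 * replication)

    def add_node(self, dn):
        """Record that datanode `dn` holds a replica; returns the slot index."""
        for i in range(0, len(self.triplets), 3):
            if self.triplets[i] is None:
                self.triplets[i] = dn
                return i // 3
        raise RuntimeError("no free replica slot")

    def find(self, dn):
        for i in range(0, len(self.triplets), 3):
            if self.triplets[i] == dn:
                return i // 3
        return -1

    def list_insert(self, head, dn):
        """Push this block on the front of dn's block list; returns the new head."""
        i = self.find(dn)
        self.triplets[3 * i + 1] = None          # DNi-prev-blk
        self.triplets[3 * i + 2] = head          # DNi-next-blk
        if head is not None:
            head.triplets[3 * head.find(dn) + 1] = self
        return self


def blocks_on(head, dn):
    """Walk one datanode's block list by following its next pointers."""
    out = []
    while head is not None:
        out.append(head.block_id)
        head = head.triplets[3 * head.find(dn) + 2]
    return out
```

Storing the links inside each BlockInfo avoids a separate list node per (block, datanode) pair.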
NN-FSImage
• FSImage represents the file system image. It persists the file system metadata and is used to restore the file system structure at startup.
NN-FSEditLog
• FSEditLog records the modifications made to the file system after startup. Its operation records are periodically merged into the FSImage.
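The relationship between the image and the edit log can be shown with a toy namespace. This is a sketch under assumptions: the dict-based image, the three operation names, and the functions `apply_edit`, `load_namespace`, and `checkpoint` are invented for illustration and do not reflect the real on-disk formats.

```python
import copy


def apply_edit(namespace, op):
    """Apply one logged operation to a {path: [block, ...]} namespace."""
    kind, path = op[0], op[1]
    if kind == "create":
        namespace[path] = []
    elif kind == "add_block":
        namespace[path].append(op[2])
    elif kind == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError("unknown op: " + kind)


def load_namespace(fsimage, edit_log):
    """Startup: start from the image, then replay the edit log in order."""
    ns = copy.deepcopy(fsimage)
    for op in edit_log:
        apply_edit(ns, op)
    return ns


def checkpoint(fsimage, edit_log):
    """Merge the log into a new image and start an empty log."""
    return load_namespace(fsimage, edit_log), []
```

Replaying the old image with the old log and loading the new image with an empty log give the same namespace, which is what makes the merge safe.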
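NN-Other Structures
1. CorruptReplicasMap: uses a TreeMap to maintain the blocks -> datanodedescriptor(s) mapping for blocks in the corrupt state.
2. recentInvalidateSets: maintains the set of recently invalidated blocks; the map is storageId -> ArrayList.
3. datanodeMap: maintains the datanode -> block mapping.
4. neededReplications: uses a priority queue to maintain the set of blocks that currently need replication.
5. PendingReplicationBlocks: maintains the set of blocks currently being replicated.
6. overReplicatedBlocks: the set of blocks that currently need to be checked for over-replication.
7. excessReplicateMap: maintains, for each datanode in the system, the set of excess block replicas it holds; these excess replicas will be deleted.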
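NN Run Modes
The NameNode has three run modes:
1. Normal: the state in which the NameNode serves requests normally.
2. Safe mode: the NameNode enters safe mode when it restarts. In this mode the system is read-only, so that the NameNode can collect DataNode information.
3. Backup mode: a backup NameNode runs in backup mode and passively receives checkpoint information from the primary NameNode.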
NN Startup
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-startup/
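NN-Safe Mode
After entering SafeMode, the NameNode creates a SafeModeMonitor thread that checks whether it can leave SafeMode.
There are two criteria for leaving safe mode:
1. The proportion of all blocks that meet the minimum replication requirement.
2. Whether the time spent in safe mode has reached the minimum requirement.
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-safe-mode/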
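The two exit criteria combine into a single predicate. In this sketch the function name, the parameter names, and the default values for `threshold` and `min_seconds` are assumptions chosen for illustration, not values taken from the slides.

```python
def can_leave_safe_mode(safe_blocks, total_blocks, seconds_in_safe_mode,
                        threshold=0.999, min_seconds=30):
    """True when both criteria hold.

    safe_blocks: blocks that already meet the minimum replication requirement.
    threshold, min_seconds: hypothetical defaults for this sketch.
    """
    if total_blocks == 0:
        ratio_ok = True                     # an empty namespace has nothing to wait for
    else:
        ratio_ok = safe_blocks / total_blocks >= threshold
    return ratio_ok and seconds_in_safe_mode >= min_seconds
```

A monitor thread would evaluate a check like this periodically and leave safe mode the first time it returns true.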
NN-Threads
The NameNode runs the following kinds of threads:
1. DataNode health-check management thread
2. Replica management thread
3. Lease management thread
4. IPC Handler threads
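NN-Heartbeat
• The NameNode maintains an array, heartbeats, for datanode heartbeat detection. It records the latest timestamp of every datanode. Each datanode must periodically send a heartbeat request to the NameNode to update its latest timestamp.
• The HeartbeatMonitor thread in the NameNode periodically checks the heartbeats list for datanodes whose timestamp has not been updated within the timeout. Such a node is considered dead: the node is removed, along with all block information recorded for it.
• Blocks that still have valid replicas are added to the list of blocks needing replication and are re-replicated.
• If removing blocks causes the proportion of valid blocks that meet the replication factor to fall to the safe mode threshold, the NameNode enters SafeMode until replication satisfies the minimum requirement.
See: http://jiangbo.me/blog/2012/10/18/hdfs-heartbeat/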
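One pass of the monitor described above might look like the following. It is a sketch, not the HeartbeatMonitor implementation: the function name, the plain dict and set arguments, and the `replication` default are all assumptions.

```python
def expire_dead_nodes(heartbeats, machine_to_blocks, block_to_machines,
                      needed_replications, now, timeout, replication=3):
    """Remove datanodes whose last heartbeat is older than `timeout`.

    Blocks that lose a replica but still have at least one valid copy are
    queued for re-replication. Returns the list of nodes declared dead.
    """
    dead = [dn for dn, last in heartbeats.items() if now - last > timeout]
    for dn in dead:
        del heartbeats[dn]
        for blk in machine_to_blocks.pop(dn, set()):
            holders = block_to_machines[blk]
            holders.discard(dn)
            if 0 < len(holders) < replication:
                needed_replications.add(blk)
    return dead
```

A block whose last replica lived on the dead node is not queued, because there is no surviving copy to replicate from.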
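NN-Replica Management
• Replica management in HDFS is carried out by the ReplicationMonitor thread in FSNamesystem.java.
• The thread periodically traverses the neededReplications list and, in priority order, picks the highest-priority block that has not yet been replicated.
• It uses the replicator to choose a target datanode to hold the new replica of that block. (The actual copy is carried out when the NN notifies that datanode at its next heartbeat.)
• It adds the block to the list of blocks being replicated (pendingReplications) and removes it from neededReplications.
• The thread also checks the blocks being replicated for ones that have timed out without finishing. It removes them from pendingReplications and adds them back to neededReplications.
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-replica-management/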
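The movement of blocks between the two lists can be sketched with a heap and a dict. The class `ReplicationQueues`, its methods, and the `choose_target` callback are hypothetical stand-ins for the structures named on the slide.

```python
import heapq


class ReplicationQueues:
    """neededReplications as a priority heap, pendingReplications as a dict."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.needed = []       # heap of (priority, block); lower value = more urgent
        self.pending = {}      # block -> (priority, time scheduled)

    def need(self, priority, block):
        heapq.heappush(self.needed, (priority, block))

    def schedule_next(self, choose_target, now):
        """Pick the most urgent block, choose a target, and mark it pending."""
        if not self.needed:
            return None
        priority, block = heapq.heappop(self.needed)
        target = choose_target(block)
        self.pending[block] = (priority, now)
        return block, target

    def replica_confirmed(self, block):
        self.pending.pop(block, None)

    def requeue_timed_out(self, now):
        """Move blocks whose replication has timed out back to needed."""
        late = [b for b, (_, t) in self.pending.items() if now - t > self.timeout]
        for b in late:
            priority, _ = self.pending.pop(b)
            self.need(priority, b)
        return late
```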
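NN-Replica Placement Policy
1. The first replica is stored locally on the current datanode.
2. The second replica is stored on a datanode in a different rack from the datanode holding the first replica.
3. The third replica is stored in the same rack as the second replica, but on a different datanode.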
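The three rules translate directly into a selection function. This sketch assumes the writer runs on a datanode, that a second rack exists, and that the second replica's rack has another node. The function name and the `racks` mapping are hypothetical, and the fallbacks a real policy needs (full disks, single-rack clusters) are omitted.

```python
import random


def choose_targets(writer, racks, rng=random):
    """Pick three datanodes following the three placement rules.

    racks maps rack name -> list of datanodes; writer must appear in racks.
    """
    rack_of = {dn: rack for rack, dns in racks.items() for dn in dns}
    first = writer                                        # rule 1: local node
    remote = [dn for dn in rack_of if rack_of[dn] != rack_of[first]]
    second = rng.choice(remote)                           # rule 2: different rack
    same_rack = [dn for dn in racks[rack_of[second]] if dn != second]
    third = rng.choice(same_rack)                         # rule 3: second's rack
    return [first, second, third]
```

With this layout, losing a whole rack still leaves at least one replica, while only one of the two remote copies has to cross racks.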
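NN-Lease Management
• Lease management is a synchronization mechanism in HDFS. It guarantees that at any moment only one client is writing to or creating a given file.
• Lease management in HDFS is carried out by LeaseManager. Its main responsibilities are:
• 1. Granting a lease when a client issues a create or write operation, and renewing and reclaiming leases.
• 2. Reclaiming expired leases through a dedicated thread.
• The client uses DFSClient.LeaseChecker to renew leases that are still in use but about to expire.
See: http://jiangbo.me/blog/2012/10/18/hdfs-namenode-lease-management/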
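The single-writer guarantee and the renew-or-expire cycle can be sketched as follows. The class reuses the name LeaseManager from the slide, but every method, field, and parameter here is an assumption made for illustration.

```python
class LeaseManager:
    """Toy lease table: one writer per path, leases expire unless renewed."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder_of = {}      # path -> client holding the lease
        self.expires = {}        # client -> expiry time

    def acquire(self, client, path, now):
        holder = self.holder_of.get(path)
        if holder is not None and holder != client:
            raise PermissionError(path + " is already being written by " + holder)
        self.holder_of[path] = client
        self.renew(client, now)

    def renew(self, client, now):
        self.expires[client] = now + self.lease_seconds

    def release(self, client, path):
        if self.holder_of.get(path) == client:
            del self.holder_of[path]

    def reap_expired(self, now):
        """What the reclaiming thread would do on each pass."""
        dead = [c for c, t in self.expires.items() if t <= now]
        for c in dead:
            del self.expires[c]
            for p in [p for p, h in self.holder_of.items() if h == c]:
                del self.holder_of[p]
        return dead
```

The client side corresponds to calling `renew` on a timer, which is the role the slide assigns to DFSClient.LeaseChecker.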
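NN-Lease Recovery Algorithm
1) The NameNode looks up the lease information.
2) For each file f in the lease, let b be the last block of f and do the following:
2.1) Get the list of datanodes that hold b.
2.2) Choose one of those datanodes as the primary datanode p.
2.3) p obtains a new generation stamp from the NameNode.
2.4) p gets the block information from each DataNode.
2.5) p computes the minimum block length.
2.6) p updates the datanodes that have a valid generation stamp with the minimum block length and the new generation stamp.
2.7) p notifies the NameNode of the update results.
2.8) The NameNode updates the BlockInfo.
2.9) The NameNode removes f from the lease. If all files in the lease have been removed by then, the lease itself is removed.
2.10) The NameNode commits the changes to the EditLog.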
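Steps 2.4 to 2.6 are the core of the recovery: the primary agrees on the shortest length among the valid replicas and stamps them anew. The sketch below assumes that "valid" means the replica's stamp equals the current one, which is a simplification. The function name and the dict layout are hypothetical.

```python
def recover_last_block(replicas, current_stamp, new_stamp):
    """Compute the update the primary datanode would send out.

    replicas: datanode -> (stamp, length) as reported by each datanode.
    Returns datanode -> (new_stamp, agreed_length) for the valid replicas only.
    """
    valid = {dn: length
             for dn, (stamp, length) in replicas.items()
             if stamp == current_stamp}          # assumption: equality means valid
    if not valid:
        raise ValueError("no replica with a valid stamp")
    min_len = min(valid.values())                # step 2.5
    return {dn: (new_stamp, min_len) for dn in valid}   # step 2.6
```

Truncating to the minimum length keeps only bytes that every valid replica has, so the replicas end up identical.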
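Secondary NameNode
• The main role of the SecondaryNameNode in HDFS is to help the master NameNode perform the checkpoint operation periodically (every 5 minutes by default).
• It has two main configurable properties:
1. checkpointPeriod: the interval between two checkpoints; configurable through fs.checkpoint.period.
2. checkpointSize: the maximum size of the EditLog file. When the EditLog exceeds this size, a checkpoint is forced; configurable through fs.checkpoint.size, 64M by default.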
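The two properties act as alternative triggers: a checkpoint runs when either the period has elapsed or the edit log has grown past the size limit. A minimal sketch, assuming times in seconds and sizes in bytes; the function name and argument names are hypothetical.

```python
def should_checkpoint(now, last_checkpoint, edit_log_bytes,
                      checkpoint_period, checkpoint_size=64 * 1024 * 1024):
    """True when either trigger fires.

    checkpoint_period: seconds between checkpoints (fs.checkpoint.period).
    checkpoint_size: edit log size limit in bytes (fs.checkpoint.size, 64M).
    """
    period_elapsed = now - last_checkpoint >= checkpoint_period
    log_too_big = edit_log_bytes > checkpoint_size
    return period_elapsed or log_too_big
```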
SDN-checkpoint
DN-Data Structures
• In HDFS the DataNode is mainly responsible for maintaining the block -> stream of bytes mapping, that is, for storing the actual block data.
data/
├── blocksBeingWritten
├── current
│   ├── VERSION
│   ├── blk_-1148021215131449924
│   ├── blk_-1148021215131449924_1001.meta
│   ├── blk_-8598609183581346893
│   ├── blk_-8598609183581346893_1002.meta
│   ├── blk_6693595845022390257
│   ├── blk_6693595845022390257_1003.meta
│   └── dncp_block_verification.log.curr
├── detach
├── storage
└── tmp
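The file names in the listing follow a visible pattern: a data file `blk_<id>` and a companion `blk_<id>_<n>.meta` file. The sketch below parses that pattern. Reading the numeric suffix as the block's generation stamp is an interpretation rather than something the slide states, and the function name is hypothetical.

```python
import re

# blk_<signed id>, optionally followed by _<number>.meta
BLOCK_RE = re.compile(r"^blk_(-?\d+)(?:_(\d+)\.meta)?$")


def parse_block_file(name):
    """Return (block_id, suffix) or None if `name` is not a block file.

    suffix is None for the data file and an int for the .meta file.
    """
    m = BLOCK_RE.match(name)
    if m is None:
        return None
    block_id = int(m.group(1))
    suffix = int(m.group(2)) if m.group(2) else None
    return block_id, suffix
```

A scan of `current/` could use this to pair each data file with its metadata file by block ID.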
DN-FSDataSet
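DN-FSDataSet
• FSVolume manages the volume that block files belong to and tracks the quota usage of the storage directory.
• FSVolumeSet is a collection of FSVolume objects. It provides methods such as total capacity and remaining space.
• FSDataSet wraps FSVolumeSet and implements the FSDatasetInterface interface, exposing block lookup and related operations.
• FSDir builds the hierarchy of blocks on the datanode's disk. By default each directory holds at most 64 subdirectories and at most 64 blocks. When a directory is initialized, it recursively scans all subdirectories and files beneath it and builds a tree. On addBlock, it first tries to add the block to the current directory. If the current directory has no free space, it tries the subdirectories. If there are no subdirectories, it creates one.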
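The addBlock rule for FSDir can be sketched as a recursive placement. This follows the slide's description rather than the Hadoop source: the class body, the `create_ok` flag, and the `subdir<N>` naming are assumptions, and the sketch stops growing after one level of subdirectories is full, where a real implementation would descend further.

```python
class FSDir:
    """Directory node that holds a bounded number of blocks and subdirectories."""

    def __init__(self, name, max_blocks=64, max_subdirs=64):
        self.name = name
        self.max_blocks = max_blocks
        self.max_subdirs = max_subdirs
        self.blocks = []
        self.children = []

    def add_block(self, block, create_ok=True):
        """Return the FSDir that took the block, or None if nothing had room."""
        if len(self.blocks) < self.max_blocks:        # 1. current directory
            self.blocks.append(block)
            return self
        for child in self.children:                   # 2. existing subdirectories
            placed = child.add_block(block, create_ok=False)
            if placed is not None:
                return placed
        if create_ok and len(self.children) < self.max_subdirs:
            child = FSDir("%s/subdir%d" % (self.name, len(self.children)),
                          self.max_blocks, self.max_subdirs)
            self.children.append(child)               # 3. new subdirectory
            return child.add_block(block, create_ok=True)
        return None
```

Bounding the entries per directory keeps each directory small enough for the local file system to list and search quickly.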
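NN&DN-DN Registration
• At startup the DataNode registers with the NameNode and submits the following information:
• name: the machine name (host name + service port)
• infoPort: the port of the status information service
• ipcPort: the port of the IPC service
• The NameNode assigns a storageId to the DataNode.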
NN&DN-Heartbeat Detection
NN&DN-blockReport
DN-offerservice
Client-Code Structure
Client-DFSClient
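Client-DFSClient
1. LeaseChecker is mainly used for lease checking and renewal.
2. DFSOutputStream provides buffered byte-stream writes. When writing, the client first buffers the data locally and splits it into packets (64K per packet by default). Each packet is further split into chunks (512 bytes by default), and each chunk has a checksum. Once the client has filled a packet, it adds the packet to a dataqueue. The DataStreamer thread sends each packet to the datanode pipeline. After sending a packet, the streamer moves it from the dataqueue to the ackqueue. ResponseProcessor receives the acks sent back by the datanodes; each time it successfully receives the ack for a packet, it removes that packet from the ackqueue.
3. DFSInputStream provides byte-stream reads. Internally it wraps the interaction with the NN and the DNs.
4. DataStreamer: sends packets to the datanode pipeline. It is a daemon thread. It obtains the blockId and the block locations from the namenode and sends packets to the datanodes in the pipeline. Each packet has a seqId, and for each packet sent it receives an ack from the datanodes. Once the acks for all packets have been received (meaning the whole block was sent successfully), the streamer closes the block.
5. ResponseProcessor: receives the acks returned by the datanodes and removes the corresponding packets from the ackqueue.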
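The packet and chunk split, and the hand-off between the two queues, can be sketched without any networking. The 64K and 512-byte defaults come from the slide. Everything else is an assumption: the dict-based packet, CRC32 as the checksum, and the class `OutputStreamQueues` with its method names.

```python
import zlib
from collections import deque


def make_packets(data, packet_size=64 * 1024, chunk_size=512):
    """Split bytes into packets, each made of checksummed chunks."""
    packets = []
    for seq, start in enumerate(range(0, len(data), packet_size)):
        body = data[start:start + packet_size]
        chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
        packets.append({"seq": seq, "chunks": chunks,
                        "checksums": [zlib.crc32(c) for c in chunks]})
    return packets


class OutputStreamQueues:
    """dataqueue holds unsent packets, ackqueue holds sent but unacked ones."""

    def __init__(self):
        self.data_queue = deque()
        self.ack_queue = deque()

    def enqueue(self, packet):          # the client has filled a packet
        self.data_queue.append(packet)

    def stream_one(self, send):         # the DataStreamer's role
        packet = self.data_queue.popleft()
        self.ack_queue.append(packet)
        send(packet)

    def on_ack(self, seq):              # the ResponseProcessor's role
        if self.ack_queue[0]["seq"] != seq:
            raise RuntimeError("ack out of order")
        self.ack_queue.popleft()
```

Keeping sent packets in the ackqueue until they are acknowledged means they are still available for resending if the pipeline fails.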
Write Flow
Read Flow
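DataXceiverServer
• At startup the DataNode uses DataXceiverServer to open a socket port that handles reads and writes of block data. DataXceiverServer runs as a daemon thread and listens on the data read/write service port configured by dfs.datanode.address. When a request arrives, it creates a new DataXceiver thread to handle it.
• A DataXceiver thread handles one read or write data-stream request. Its run method mainly dispatches to the corresponding handler method according to the request type carried in the request.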
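readBlock()
readBlock() mainly reads block data from disk, builds a DataOutputStream, and creates a BlockSender that sends the data over that stream (to a datanode or a client).
BlockSender.sendBlock() sends a block roughly as follows:
1. Read the block's meta information, obtain the checksum, and send it.
2. Send the offset at which the data read starts.
3. Split the block data into packets and send them to the client.
4. After all packets have been sent, close the checksum file and the block file.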
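writeBlock()
1. BlockReceiver reads data from upstream one packet at a time and writes it to the local disk.
2. If there is a next replica node, the packet is forwarded to it.
3. The packet is added to the ackqueue to wait for its ack.
4. After the next node has written the packet, it returns the ack for that packet.
5. When PacketResponder receives the ack, it removes the packet from the ackqueue and sends an ack to the preceding node.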
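The five steps form a store-and-forward chain in which acks travel back in the opposite direction. The sketch below models that synchronously, with a function return standing in for the ack message. The class `PipelineNode` and its fields are hypothetical, and the real code handles acks in a separate responder thread.

```python
class PipelineNode:
    """One datanode in a write pipeline (synchronous toy model)."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.disk = []          # packets written locally
        self.ack_queue = []     # seq numbers waiting for a downstream ack

    def receive(self, packet):
        """Store, forward, wait for the downstream ack, then ack upstream.

        Returns the names of the nodes that stored the packet, nearest first.
        """
        self.disk.append(packet)                      # step 1: write locally
        self.ack_queue.append(packet["seq"])          # step 3: wait for ack
        acks = []
        if self.downstream is not None:
            acks = self.downstream.receive(packet)    # steps 2 and 4
        self.ack_queue.remove(packet["seq"])          # step 5: ack received
        return [self.name] + acks                     # step 5: ack upstream
```

The client only sees an ack for a packet after every node in the chain has stored it.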
• Detailed notes: http://jiangbo.me/blog/categories/hdfs/
THX