Big Data - 04 - Distributed File Systems

Lecture given at ETH on October 3 and 4, 2017

Ghislain Fourny

October 04, 2017

Transcript

  1. Ghislain Fourny Big Data 4. Distributed file systems Kheng Ho

    Toh / 123RF Stock Photo
  2. 2 So far... We've rehearsed relational databases

  3. 3 So far... We've looked into scaling out

  4. 4 So far... We've seen a simple model for object

    storage
  5. 5 So far... We've looked into Key-Value Stores

  6. 6 Duplication: 2 ranges 000... 111... mod 2n

  7. 7 Review: vector clocks incoming request put(key1, A)

  8. 8 Review: vector clocks incoming request put(key1, A) Key Node

    key1 n1, n2, n3 n1 n2 n3
  9. 9 Review: vector clocks incoming request put(key1, A) Key Node

    key1 n1, n2, n3 coordinator n2 n3
  10. 10 Review: vector clocks incoming request put(key1, A) coordinator n2

    n3 New context [ (n1, 1) ]
  11. 11 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ]
  12. 12 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] replication replication
  13. 13 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] get(A)
  14. 14 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] get(A) gathering all versions gathering all versions
  15. 15 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] get(A) A, [ (n1, 1) ] A, [ (n1, 1) ] A, [ (n1, 1) ]
  16. 16 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] get(A) A, [ (n1, 1) ]
  17. 17 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] get(A) Returning A, [ (n1, 1) ] context
  18. 18 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] incoming request put(key1, [ (n1, 1) ], B) B, [ (n1, 2) ] new context (+1)
  19. 19 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ]
  20. 20 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] replication replication
  21. 21 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] get(A)
  22. 22 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] get(A) gathering all versions gathering all versions A, [ (n1, 1) ] B, [ (n1, 2) ]
  23. 23 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] get(A) gathering all versions gathering all versions A, [ (n1, 1) ] B, [ (n1, 2) ] Returning B, [ (n1, 2) ] maximum element
  24. 24 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] incoming request put(key1, [ (n1, 2) ], C) C, [ (n1, 3) ] new context (+1)
  25. 25 Review: vector clocks n2 n3 A, [ (n1, 1)

    ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] network partition
  26. 26 Review: vector clocks n2 n3 A, [ (n1, 1)

    ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] network partition get(A) gathering all versions interim coordinator
  27. 27 Review: vector clocks n2 n3 A, [ (n1, 1)

    ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] network partition get(A) Returning B, [ (n1, 2) ] gathering all versions interim coordinator
  28. 28 Review: vector clocks n2 n3 A, [ (n1, 1)

    ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] network partition incoming request put(key1, [ (n1, 2) ], D) D, [ (n1, 2), (n2, 1) ] new context interim coordinator
  29. 29 Review: vector clocks interim coordinator n2 n3 A, [

    (n1, 1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] network partition D, [ (n1, 2), (n2, 1) ]
  30. 30 Review: vector clocks n2 n3 A, [ (n1, 1)

    ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  31. 31 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  32. 32 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] get(A)
  33. 33 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] get(A) gathering all versions gathering all versions A, [ (n1, 1) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  34. 34 Review: vector clocks A, [ (n1, 1) ] B,

    [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  35. 35 Directed Acyclic Graph A, [ (n1, 1) ] B,

    [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  36. 36 Directed Acyclic Graph A, [ (n1, 1) ] B,

    [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] not comparable maximal elements
  37. 37 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] get(A) gathering all versions gathering all versions A, [ (n1, 1) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  38. 38 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] get(A) gathering all versions gathering all versions A, [ (n1, 1) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] return C, D, [ (n1, 3), (n2, 1) ]
  39. 39 Review: vector clocks coordinator n2 n3 A, [ (n1,

    1) ] A, [ (n1, 1) ] A, [ (n1, 1) ] B, [ (n1, 2) ] B, [ (n1, 2) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  40. 40 Review: vector clocks A, [ (n1, 1) ] B,

    [ (n1, 2) ] D, [ (n1, 2), (n2, 1) ] C, [ (n1, 3) ] A, [ (n1, 1) ] B, [ (n1, 2) ] D, [ (n1, 2), (n2, 1) ] C, [ (n1, 3) ] A, [ (n1, 1) ] B, [ (n1, 2) ] D, [ (n1, 2), (n2, 1) ] C, [ (n1, 3) ] synchronization
  41. 41 Review: vector clocks D, [ (n1, 3), (n2, 1)

    ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] Cleanup A, [ (n1, 1) ] B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ]
  42. 42 Review: vector clocks D, [ (n1, 3), (n2, 1)

    ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] incoming request put(key1, [ (n1, 3), (n2, 1) ], E) (Client semantically solved the conflict between C and D)
  43. 43 Review: vector clocks D, [ (n1, 3), (n2, 1)

    ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] incoming request put(key1, [ (n1, 3), (n2, 1) ], E) E, [ (n1, 4), (n2, 1) ]
  44. 44 Review: vector clocks D, [ (n1, 3), (n2, 1)

    ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] E, [ (n1, 4), (n2, 1) ] E, [ (n1, 4), (n2, 1) ] E, [ (n1, 4), (n2, 1) ]
  45. 45 Review: vector clocks D, [ (n1, 3), (n2, 1)

    ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] D, [ (n1, 3), (n2, 1) ] C, [ (n1, 3) ] E, [ (n1, 4), (n2, 1) ] E, [ (n1, 4), (n2, 1) ] E, [ (n1, 4), (n2, 1) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] E, [ (n1, 4), (n2, 1) ]
  46. 46 Version history (all times) A, [ (n1, 1) ]

    B, [ (n1, 2) ] C, [ (n1, 3) ] D, [ (n1, 2), (n2, 1) ] E, [ (n1, 4), (n2, 1) ] absolute maximum
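
     The conflict handling in this review follows from the partial order on vector clocks: one version supersedes another iff its clock dominates the other on every counter; otherwise the two are incomparable and both stay maximal. A minimal sketch of that comparison (illustrative only; the Map representation and method name are not from the slides):

     import java.util.Map;

     // Illustrative only: vector clocks as maps from node id to counter.
     public class VectorClockOrder {

         // v1 <= v2 iff every counter of v1 is <= the matching counter of v2
         // (missing entries count as 0).
         static boolean dominatedBy(Map<String, Integer> v1, Map<String, Integer> v2) {
             for (Map.Entry<String, Integer> e : v1.entrySet()) {
                 if (e.getValue() > v2.getOrDefault(e.getKey(), 0)) {
                     return false;
                 }
             }
             return true;
         }

         public static void main(String[] args) {
             Map<String, Integer> c = Map.of("n1", 3);           // C, [ (n1, 3) ]
             Map<String, Integer> d = Map.of("n1", 2, "n2", 1);  // D, [ (n1, 2), (n2, 1) ]
             Map<String, Integer> e = Map.of("n1", 4, "n2", 1);  // E, [ (n1, 4), (n2, 1) ]

             // C and D are incomparable: both are maximal, so the store returns both.
             System.out.println(dominatedBy(c, d) + " " + dominatedBy(d, c)); // false false

             // E dominates both C and D: after the client resolves the conflict,
             // E is the absolute maximum of the version history.
             System.out.println(dominatedBy(c, e) + " " + dominatedBy(d, e)); // true true
         }
     }
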
  47. 47 Now, back to today's topic. There is Big Data

    and Big Data Anna Liebiedieva / 123RF Stock Photo Vadym Kurgak / 123RF Stock Photo
  48. 48 Use cases A huge amount of large files?

  49. 49 Use cases A huge amount of large files? vs. A large amount of huge files?
  50. 50 Use cases Billions of TB files (Object Storage) vs. Millions of PB files (File Storage)
  51. 51 Where does the data come from? Raw Data Sensors

    Measurements Events Logs Oleg Dudko / 123RF Stock Photo
  52. 52 Where does the data come from? Raw Data Derived

    Data Sensors Measurements Events Logs Aggregated data Intermediate data Oleg Dudko / 123RF Stock Photo Anton Starikov / 123RF Stock Photo
  53. 53 Technologies and models Key-Value Store File System Object Storage

    Block Storage Billions of <TB files Millions of <PB files vs.
  54. 54 Technologies and models Key-Value Store File System Object Storage

    Block Storage Billions of <TB files Millions of <PB files vs.
  55. 55 Technologies and models Key-Value Store File System Object Storage

    Block Storage Billions of <TB files Millions of <PB files vs.
  56. 56 Technologies and models Key-Value Store File System Object Storage

    Block Storage Billions of <TB files Millions of <PB files vs.
  57. 57 Distributed file systems: inception FS

  58. 58 GFS genesis Characteristics

  59. 59 GFS genesis Characteristics Requirements

  60. 60 GFS genesis Characteristics File System Design Requirements

  61. 61 Fault tolerance and robustness Vitaly Korovin / 123RF Stock Photo It might fail Local disk
  62. 62 Fault tolerance and robustness Vitaly Korovin / 123RF Stock Photo It might fail nodes will fail Kheng Ho Toh / 123RF Stock Photo Local disk Cluster with 100s to 10,000s of machines
  63. 63 Fault tolerance and robustness Monitoring Kheng Ho Toh /

    123RF Stock Photo
  64. 64 Fault tolerance and robustness Monitoring Error detection Kheng Ho

    Toh / 123RF Stock Photo
  65. 65 Fault tolerance and robustness Monitoring Error detection Automatic Recovery

    Kheng Ho Toh / 123RF Stock Photo
  66. 66 Fault tolerance and robustness Fault tolerance Monitoring Error detection

    Automatic Recovery Kheng Ho Toh / 123RF Stock Photo
  67. 67 File update model Random access

  68. 68 File update model Random access Upsert/append only vs.

  69. 69 File update model immutable Append suitable for Sensors Logs Intermediate data
  70. 70 Appends Append only 100s of clients in parallel atomic (GFS only!)
  71. 71 Performance requirements Top priority: Throughput

  72. 72 Performance requirements ? ! Top priority: Throughput Secondary: Latency

  73. 73 The progress made (1956-2010), logarithmic scale: Capacity 150,000,000x, Throughput 10,000x, Latency 8x (Source: Michael E. Friske, Claus Mikkelsen, The History of Storage, SHARE 2014; Picture: Ash Waechter/123RF)
  74. 74 The progress made (1956-2010), logarithmic scale: Capacity 150,000,000x, Throughput 10,000x, Latency 8x. Parallelize!
  75. 75 The progress made (1956-2010), logarithmic scale: Capacity 150,000,000x, Throughput 10,000x, Latency 8x. Batch processing!
  76. 76 Hadoop

  77. 77 Hadoop Initiated in 2006

  78. 78 Hadoop Primarily: • Distributed File System (HDFS) • MapReduce

    • Wide column store (HBase) Covered in this lecture
  79. 79 Hadoop Inspired by Google's • GFS (2003) • MapReduce

    (2004) • BigTable (2006) Covered in this lecture
  80. 80 Size timeline (size reported by Yahoo): April 2006: 188; May 2006: 300; October 2006: 600; April 2007: 1,000; February 2008: 10,000 (index generation); March 2009: 24,000 (17 clusters); June 2011: 42,000 (100+ PB)
  81. 81 Distributed file systems: the model

  82. 82 Lorem Ipsum Dolor sit amet Consectetur Adipiscing Elit. In

    Imperdiet Ipsum ante File Systems (Logical Model) Key-Value Storage
  83. 83 Lorem Ipsum Dolor sit amet Consectetur Adipiscing Elit. In

    Imperdiet Ipsum ante File Systems (Logical Model) Lorem Ipsum Dolor sit amet Consectetur Adipiscing Elit. In Imperdiet Ipsum ante Key-Value Storage File Storage vs.
  84. 84 Block Storage (Physical Storage) 111010010110101… 1 2 3 4

    5 6 7 8 Object Storage Block Storage
  85. 85 Terminology HDFS: Block GFS: Chunk

  86. 86 Files and blocks Lorem Ipsum Dolor sit amet Consectetur

    Adipiscing Elit. In Imperdiet Ipsum ante
  87. 87 Files and blocks Lorem Ipsum Dolor sit amet Consectetur

    Adipiscing Elit. In Imperdiet Ipsum ante 1 2 3 4 5 6 7 8
  88. 88 Files and blocks Lorem Ipsum Dolor sit amet Consectetur

    Adipiscing Elit. In Imperdiet Ipsum ante 1 2 3 4 5 6 7 8 1 2 3
  89. 89 Files and blocks Lorem Ipsum Dolor sit amet Consectetur

    Adipiscing Elit. In Imperdiet Ipsum ante 1 2 3 4 5 6 7 8 1 2 3 1 2 3 4
  90. 90 Why blocks?

  91. 91 Why blocks? 1. Files bigger than a disk (PBs!)
  92. 92 Why blocks? 1. Files bigger than a disk (PBs!) 2. Simpler level of abstraction
  93. 93 Single machine vs. distributed

  94. 94 The right block size Simple file system 4 kB

  95. 95 The right block size Simple file system Distributed file

    system 4 kB 128 MB
  96. 96 The right block size Relational Database Distributed file system

    4 kB – 32 kB 128 MB
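
     A back-of-the-envelope calculation (my own numbers, not from the slides) of why the distributed block size is so much larger: each block must be tracked and transferred individually, so a petabyte-scale file must not shatter into billions of blocks:

     // Back-of-the-envelope sketch: how many blocks a single 1 PB file
     // would occupy at each block size.
     public class BlockCount {
         public static void main(String[] args) {
             long onePB = 1L << 50;                   // 2^50 bytes
             long smallBlock = 4L * 1024;             // 4 kB (classic local file system)
             long largeBlock = 128L * 1024 * 1024;    // 128 MB (distributed file system)
             System.out.println(onePB / smallBlock);  // 274877906944 blocks (~275 billion)
             System.out.println(onePB / largeBlock);  // 8388608 blocks (~8 million)
         }
     }
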
  97. 97 HDFS Architecture

  98. 98 How do we connect the many machines?

  99. 99 Peer-to-peer architecture

  100. 100 Master-slave architecture Slave Master Slave Slave Slave Slave Slave

  101. 101 HDFS server architecture

  102. 102 HDFS server architecture Datanode Datanode Datanode Datanode Datanode Datanode

    Namenode
  103. 103 From the file perspective Namenode File...

  104. 104 From the file perspective File... ...divided into 128MB chunks...

    Namenode
  105. 105 From the file perspective File... ...divided into 128MB chunks...

    ... replicated for fault tolerance Namenode
  106. 106 Concurrently accessed Datanode Datanode Datanode Datanode Datanode Datanode Namenode

  107. 107 Hadoop implementation (Packaged code)

  108. 108 HDFS Architecture Datanode Datanode Datanode Datanode Datanode Datanode Namenode

  109. 109 HDFS Architecture: NameNode Datanode Datanode Datanode Datanode Datanode Datanode

    Namenode
  110. 110 NameNode: all system-wide activity

  111. 111 NameNode: all system-wide activity Memory 1 File namespace (+Access

    Control)
  112. 112 NameNode: all system-wide activity Memory /dir/file1 /dir/file2 /file3 File

    to block mapping 1 File namespace (+Access Control) 2
  113. 113 NameNode: all system-wide activity Memory Block locations /dir/file1 /dir/file2

    /file3 File to block mapping 1 File namespace (+Access Control) 2 3
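
     As a rough illustration of these three in-memory structures (field names and types invented for this sketch; these are not Hadoop classes):

     import java.util.List;
     import java.util.Map;
     import java.util.Set;

     // Toy model of what the NameNode holds in memory (illustrative names only).
     public class NameNodeState {
         // 1. File namespace (+ access control): /dir/file1, /dir/file2, /file3, ...
         Map<String, String> namespace;           // path -> permissions/owner
         // 2. File-to-block mapping: ordered list of 64-bit block IDs per file.
         Map<String, List<Long>> fileToBlocks;
         // 3. Block locations: which DataNodes hold a replica of each block
         //    (rebuilt from DataNode block reports, not persisted).
         Map<Long, Set<String>> blockLocations;
     }
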
  114. 114 HDFS Architecture Datanode Datanode Datanode Datanode Datanode Datanode Namenode

  115. 115 HDFS Architecture: DataNode Datanode Datanode Datanode Datanode Datanode Datanode

    Namenode
  116. 116 DataNode

  117. 117 DataNode

  118. 118 DataNode Blocks are stored on the local disk

  119. 119 DataNode Proximity to hardware facilitates disk failure detection

  120. 120 Block IDs 64 bits e.g., 7586700455251598184

  121. 121 Subblock granularity: Byte Range

  122. 122 Communication Datanode Namenode Datanode Client

  123. 123 Client Protocol (RPC) Client Metadata operations Namenode

  124. 124 Client Protocol (RPC) Client Metadata operations DataNode location Namenode

  125. 125 Client Protocol (RPC) Client Metadata operations DataNode location Block

    IDs Namenode
  126. 126 Client Protocol (RPC) Namenode Client Metadata operations DataNode location Block IDs Java API available
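
     A minimal sketch of metadata operations through the standard org.apache.hadoop.fs.FileSystem Java client API (paths reuse the /dir example from the slides; the client is assumed to be configured via core-site.xml):

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileStatus;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     public class HdfsMetadataOps {
         public static void main(String[] args) throws Exception {
             Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
             try (FileSystem fs = FileSystem.get(conf)) {
                 fs.mkdirs(new Path("/dir"));               // metadata only: handled by the NameNode
                 for (FileStatus status : fs.listStatus(new Path("/"))) {
                     System.out.println(status.getPath() + " " + status.getLen());
                 }
                 fs.delete(new Path("/dir"), true);         // recursive delete
             }
         }
     }
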
  127. 127 Communication Datanode Namenode Datanode Client Control

  128. 128 DataNode Protocol (RPC) Datanode Datanode always initiates connection! Namenode

  129. 129 DataNode Protocol (RPC) Datanode Datanode always initiates connection! Registration

    Namenode
  130. 130 DataNode Protocol (RPC) Datanode Heartbeat Datanode always initiates connection!

    every 3s customizable Registration Namenode
  131. 131 DataNode Protocol (RPC) Datanode Heartbeat Block operations Datanode always

    initiates connection! every 3s customizable Registration Namenode
  132. 132 DataNode Protocol (RPC) Datanode Heartbeat BlockReport Block operations Datanode

    always initiates connection! every 3s every 6h customizable Registration Namenode
  133. 133 DataNode Protocol (RPC) Datanode Heartbeat BlockReport Block operations Datanode

    always initiates connection! every 3s every 6h customizable Registration BlockReceived Namenode
  134. 134 DataNode Protocol (RPC) Datanode Namenode Heartbeat BlockReport Block operations

    Java API available every 3s every 6h customizable Registration BlockReceived
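
     A toy sketch of the schedule only, not of Hadoop's actual DataNode Protocol classes: the DataNode initiates every connection, heartbeating every 3 seconds and sending a full block report every 6 hours (both customizable):

     import java.util.concurrent.Executors;
     import java.util.concurrent.ScheduledExecutorService;
     import java.util.concurrent.TimeUnit;

     public class DataNodeSchedule {
         public static void main(String[] args) {
             ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(2);
             // Heartbeat: every 3 seconds, DataNode -> NameNode.
             scheduler.scheduleAtFixedRate(
                     () -> System.out.println("heartbeat -> NameNode"),
                     0, 3, TimeUnit.SECONDS);
             // BlockReport: every 6 hours, DataNode -> NameNode.
             scheduler.scheduleAtFixedRate(
                     () -> System.out.println("block report -> NameNode"),
                     0, 6, TimeUnit.HOURS);
         }
     }
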
  135. 135 Communication Datanode Namenode Datanode Client Control Control

  136. 136 DataTransfer Protocol (Streaming) DataNode Client Data blocks DataNode DataNode
  137. 137 DataTransfer Protocol (Streaming) DataNode Client Data blocks DataNode DataNode Replication pipelining (write only)
  138. 138 DataTransfer Protocol (Streaming) DataNode Client Data blocks DataNode DataNode Replication pipelining (write only)
  139. 139 Communication Datanode Namenode Datanode Client Control Control Control Data

    Control
  140. 140 Summary Datanode Namenode Datanode Client Client Protocol DataTransfer Protocol

    DataNode Protocol
  141. 141 Summary Datanode Namenode Datanode Client metadata data

  142. 142 Metadata functionality Create directory Delete directory Write file Read

    file Delete file
  143. 143 Client reads a file

  144. 144 Client reads a file Asks for file 1

  145. 145 Client reads a file Get block locations Multiple DataNodes

    for each block, sorted by distance 2
  146. 146 Client reads a file Read 3 Input Stream
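
     With the Java client API, the three steps above collapse into opening an input stream; the stream asks the NameNode for block locations and then streams each block from a nearby DataNode. A minimal sketch (the path reuses the /dir/file1 example; error handling omitted):

     import java.io.BufferedReader;
     import java.io.InputStreamReader;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FSDataInputStream;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     public class HdfsRead {
         public static void main(String[] args) throws Exception {
             Configuration conf = new Configuration();
             try (FileSystem fs = FileSystem.get(conf);
                  FSDataInputStream in = fs.open(new Path("/dir/file1"));          // steps 1-2
                  BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
                 String line;
                 while ((line = reader.readLine()) != null) {                      // step 3: streaming read
                     System.out.println(line);
                 }
             }
         }
     }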

  147. 147 Client writes a file

  148. 148 Client writes a file Create 1

  149. 149 Client writes a file DataNodes for first block 3

  150. 150 Client writes a file Organizes pipeline 4

  151. 151 Client writes a file Sends data over 5

  152. 152 Client writes a file Ack 6

  153. 153 Client writes a file Block IDs received 8

  154. 154 Client writes a file DataNodes for second block 3

  155. 155 Client writes a file Organizes pipeline 4

  156. 156 Client writes a file Sends data over 5

  157. 157 Client writes a file Ack 6

  158. 158 Client writes a file Block IDs received 8
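
     The same write walkthrough from the client's point of view in the Java API: create() contacts the NameNode (step 1), and the per-block pipelines (steps 3 to 6) are driven by the output stream as data is written and the file is closed. A minimal sketch (path and payload invented):

     import java.nio.charset.StandardCharsets;
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FSDataOutputStream;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     public class HdfsWrite {
         public static void main(String[] args) throws Exception {
             Configuration conf = new Configuration();
             try (FileSystem fs = FileSystem.get(conf);
                  FSDataOutputStream out = fs.create(new Path("/dir/file2"))) {    // step 1: create
                 out.write("hello HDFS".getBytes(StandardCharsets.UTF_8));         // pipelined to the DataNodes
             }                                                                     // close() completes the file
         }
     }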

  159. 159 Replicas Number of replicas specified per file (default: 3)

  160. 160 Replica placement: what to consider? Reliability Read/Write Bandwidth Block

    distribution
  161. 161 Replica placement: Reminder on topology Cluster Rack Node

  162. 162 Replica placement: Distance B A D(A,B)=1

  163. 163 Replica placement: Distance B A D(A,B)=2

  164. 164 Replica placement

  165. 165 Replica placement Replica 1: same node as client (or

    random), rack A
  166. 166 Replica placement Replica 1: same node as client (or

    random), rack A Replica 2: a node in a different rack B
  167. 167 Replica placement Replica 1: same node as client (or

    random), rack A Replica 2: a node in a different rack B Replica 3: a node in same rack B
  168. 168 Replica placement Replica 1: same node as client (or

    random), rack A Replica 2: a node in a different rack B Replica 3: a node in same rack B Replica 4 and beyond: random, but if possible: • at most one replica per node • at most two replicas per rack
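
     A hard-coded illustration of these placement rules for the default of three replicas (node and rack names invented; this is not HDFS's actual placement policy implementation):

     import java.util.List;

     public class PlacementExample {
         public static void main(String[] args) {
             List<String> rackA = List.of("a1", "a2", "a3");   // rack of the writing client (running on a1)
             List<String> rackB = List.of("b1", "b2", "b3");   // some other rack

             String replica1 = rackA.get(0);   // replica 1: same node as the client (rack A)
             String replica2 = rackB.get(0);   // replica 2: a node on a different rack (rack B)
             String replica3 = rackB.get(1);   // replica 3: another node on the same rack B
             // Replica 4 and beyond: random nodes, but at most one replica per node
             // and at most two replicas per rack.
             System.out.println(List.of(replica1, replica2, replica3));  // [a1, b1, b2]
         }
     }
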
  169. 169 Replica placement Client

  170. 170 Why replicas 2+3 on other rack? Client

  171. 171 If replicas 1+2 were on same rack... Block concentration

    on same rack (2/3)
  172. 172 Performance and availability

  173. 173 The NameNode is a single point of failure Datanode

    Datanode Datanode Datanode Datanode Datanode Namenode /dir/file1 /dir/file2 /file3
  174. 174 The namenode is a single point of failure... Datanode

    Datanode Datanode Datanode Datanode Datanode Namenode /dir/file1 /dir/file2 /file3 What if it fails?
  175. 175 NameNode: all system-wide activity Memory Block locations /dir/file1 /dir/file2

    /file3 File to block mapping 1 File namespace (+Access Control) 2 3
  176. 176 1. You want to persist Memory 1 /dir/file1 2

    3 not persisted
  177. 177 1. You want to persist Namespace file Persistent Storage

    Memory 1 /dir/file1 2 3 not persisted
  178. 178 1. You want to persist Namespace file Persistent Storage

    Memory 1 /dir/file1 /dir/file2 2 3 not persisted Edit log
  179. 179 1. You want to persist Namespace file Persistent Storage

    Memory 1 /dir/file1 /dir/file2 /file3 2 3 not persisted Edit log
  180. 180 2. You want to backup Namespace file Edit log

    Persistent Storage Shared drive Backup drives Glacier
  181. 181 The namenode is a single point of failure... Datanode

    Datanode Datanode Datanode Datanode Datanode Namenode /dir/file1 /dir/file2 /file3 What if it fails?
  182. 182 The namenode is a single point of failure... Datanode

    Datanode Datanode Datanode Datanode Datanode Namenode /dir/file1 /dir/file2 /file3 We need to start it up again!
  183. 183 Namenodes: Startup Namespace file Persistent Storage Memory Edit log

  184. 184 Namenodes: Startup Namespace file Persistent Storage Memory Filesystem Edit

    log
  185. 185 Edit log Namenodes: Startup Namespace file Persistent Storage Memory

    Filesystem
  186. 186 Edit log Namenodes: Startup Namespace file Persistent Storage Memory

    Filesystem /dir/file1 /dir/file2 /file3
  187. 187 Namenodes: Startup Namespace file Persistent Storage Memory Filesystem /dir/file1

    /dir/file2 /file3 Block locations Edit log
  188. 188 Namenodes: Startup Namespace file Persistent Storage Memory Filesystem /dir/file1

    /dir/file2 /file3 Block locations Edit log
  189. 189 Namenodes: Startup Namespace file Persistent Storage Memory Filesystem /dir/file1

    /dir/file2 /file3 Block locations Edit log
  190. 190 Starting a namenode... ... takes 30 minutes!

  191. 191 Starting a namenode... Can we do better?

  192. 192 3. Checkpoints with secondary namenodes Old namespace file Edit

    log New namespace file
  193. 193 4. High Availability (HA): Backup Namenodes Datanode Datanode Datanode

    Datanode Datanode Datanode Namenode /dir/file1 /dir/file2 /file3 Namenode /dir/file1 /dir/file2 /file3 Maintains mappings and locations in memory like the namenode. Ready to take over at all times
  194. 194 5. Federated DFS Datanode Datanode Datanode Datanode Datanode Datanode

    Namenode /foo /foo/file1 /foo/file2 Namenode /bar /bar/file1 /bar/file2
  195. 195 Using HDFS

  196. 196 HDFS Shell $ hadoop fs <args>

  197. 197 HDFS Shell $ hadoop fs <args> $ hdfs dfs

    <args>
  198. 198 HDFS Shell $ hadoop fs <args> $ hdfs dfs

    <args> local filesystem
  199. 199 HDFS Shell: POSIX-like $ hadoop fs -ls $ hadoop fs -cat /dir/file $ hadoop fs -rm /dir/file $ hadoop fs -mkdir /dir
  200. 200 HDFS Shell: download from HDFS $ hadoop fs -get /user/hadoop/file localfile $ hadoop fs -copyToLocal /user/hadoop/file localfile
  201. 201 HDFS Shell: upload to HDFS $ hadoop fs -put localfile1 localfile2 /user/hadoop/hadoopdir $ hadoop fs -copyFromLocal localfile1 localfile2 /user/hadoop/hadoopdir
  202. 202 HDFS Shell: Configuration core-site.xml
       <configuration>
         <property>
           <name>fs.defaultFS</name>
           <value>hdfs://host:8020</value>
           <description>NameNode hostname</description>
         </property>
       </configuration>
  203. 203 HDFS Shell: Configuration hdfs-site.xml
       <configuration>
         <property>
           <name>dfs.replication</name>
           <value>1</value>
           <description>Replication factor</description>
         </property>
         <property>
           <name>dfs.namenode.name.dir</name>
           <value>/grid/hadoop/hdfs/nn</value>
           <description>NameNode directory</description>
         </property>
         <property>
           <name>dfs.datanode.data.dir</name>
           <value>/grid/hadoop/hdfs/dn</value>
           <description>DataNode directory</description>
         </property>
       </configuration>
  204. 204 Populating HDFS: Apache Flume Collects, aggregates, moves log data (into HDFS)
  205. 205 Populating HDFS: Apache Sqoop Imports from a relational database

  206. 206 GFS

  207. 207 GFS vs. HDFS: Terminology (HDFS term = GFS term): NameNode = Master, DataNode = Chunkserver, Block = Chunk, FS Image = Checkpoint image, Edit log = Operation log
  208. 208 HDFS vs. GFS: Block size: GFS/Apache HDFS 64 MB, Cloudera HDFS 128 MB
  209. 209 Pointers Official documentation: http://hadoop.apache.org/docs/r2.7.3/ GFS Paper: on course website Java API: http://blog.woopi.org/wordpress/files/hadoop-2.6.0-javadoc/