Operate HBase clusters at Scale

Presented by Florentin Dubois & Kevin Georges at SysadminDays #8 (https://sysadmindays.fr)


Renaud Chaput

October 20, 2018

Transcript

  1. 15.

    @sysadmindays @ovh 15 Our cluster sizes
    BHS: 30 nodes, 400 TB, 120 Mbps
    GRA: 150 nodes, 2 PB, 1.1 Gbps
  2. 18.

    @sysadmindays @ovh 18 Our cluster architecture
    Warp10 components: Ingress, Store, Directory, Egress, plus Kafka
    HBase layer: Region server + Datanode (four such nodes shown)
  3. 21.

    @sysadmindays @ovh 21 Hardware pitfalls
    Be sure the number of controllers matches the number of disks & SATA ports
    Be sure your network link can handle your disk IO capacity
    Be sure of thread distribution (IRQ, NUMA surprises, ingest + processing + GC + ...)
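The second pitfall above (network link vs. disk IO capacity) is a back-of-the-envelope calculation worth writing down. A minimal sketch, assuming sequential throughput per disk in MB/s and a hypothetical helper name:

```java
public class BandwidthCheck {
    // Hypothetical sanity check: can the NIC drain the aggregate
    // sequential bandwidth of all local disks?
    static boolean nicCanDrainDisks(int disks, double mbPerSecPerDisk, double nicGbps) {
        // Convert MB/s per disk to aggregate Gbps: x8 bits, /1000 for Giga.
        double diskGbps = disks * mbPerSecPerDisk * 8 / 1000.0;
        return nicGbps >= diskGbps;
    }

    public static void main(String[] args) {
        // 12 disks at 150 MB/s sequential = 14.4 Gbps, more than a 10 Gbps link.
        System.out.println(nicCanDrainDisks(12, 150, 10)); // false
        System.out.println(nicCanDrainDisks(12, 150, 25)); // true
    }
}
```

On a typical dense datanode the disks win: even a 10 Gbps link saturates before a dozen spinning disks do, which is exactly the surprise the slide warns about.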
  4. 32.

    @sysadmindays @ovh 32 Use case families
    • Billing (e.g. bill on maximum consumption in a month)
    • Monitoring (APM, infrastructure, appliances, ...)
    • IoT (manage devices, operator integration, ...)
    • Geolocation (manage localized fleets)
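The billing family above, e.g. billing on maximum consumption in a month, boils down to a max aggregation over a month of metric samples. A minimal sketch with hypothetical names (`PeakBilling`, `monthlyBill`), not the actual OVH pipeline:

```java
import java.util.List;

public class PeakBilling {
    // Hypothetical: bill the month at its peak sampled consumption
    // multiplied by a unit price.
    static double monthlyBill(List<Double> samples, double pricePerUnit) {
        double peak = samples.stream()
                .mapToDouble(Double::doubleValue)
                .max()
                .orElse(0.0); // no samples -> nothing to bill
        return peak * pricePerUnit;
    }

    public static void main(String[] args) {
        // Peak is 7.5 units; at 2.0 per unit the bill is 15.0.
        System.out.println(monthlyBill(List.of(3.0, 7.5, 5.2), 2.0));
    }
}
```

In a real time-series store this max would be computed server-side over the month's datapoints rather than by pulling raw samples to the client.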
  5. 33.

    @sysadmindays @ovh 33 Use cases
    • DC temperature/electricity/cooling map
    • Pay-as-you-go billing (PCI/IPLB)
    • GSCAN
    • Monitoring
    • ML model scoring (anti-fraud)
    • Pattern detection for medical applications
  6. 63.

    @sysadmindays @ovh 63 Xceiver

    HDFS:
    ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(...): DataXceiver: java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
    INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Could not read from stream
    INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-546...
    WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
    WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_-546.. bad dn[0]

    HBase:
    FATAL org.apache.hadoop.hbase.regionserver.Flusher: Replay of hlog required. Forcing server shutdown
  7. 64.

    @sysadmindays @ovh 64 Xceiver

    if (curXceiverCount > dataXceiverServer.maxXceiverCount) {
      throw new IOException("xceiverCount " + curXceiverCount
          + " exceeds the limit of concurrent xcievers "
          + dataXceiverServer.maxXceiverCount);
    }
  8. 65.

    @sysadmindays @ovh 65 Xceiver

    if (curXceiverCount > dataXceiverServer.maxXceiverCount) {
      throw new IOException("xceiverCount " + curXceiverCount
          + " exceeds the limit of concurrent xcievers "
          + dataXceiverServer.maxXceiverCount);
    }
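The check above trips once a DataNode has more concurrent block readers/writers than its configured ceiling, and the usual remedy is to raise that ceiling in `hdfs-site.xml`. A minimal sketch using `dfs.datanode.max.transfer.threads` (the modern name of the historically misspelled `dfs.datanode.max.xcievers`), with 4096 as the floor commonly recommended for HBase workloads:

```xml
<!-- hdfs-site.xml: raise the per-DataNode concurrent transfer-thread ceiling. -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
```

DataNodes must be restarted for the new limit to take effect; size it against the number of regions, store files, and WALs each region server keeps open.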
  9. 67.

    @sysadmindays @ovh 67 Hardware pitfalls
    Be sure the number of controllers matches the number of disks & SATA ports
    Be sure your network link can handle your disk IO capacity
    Be sure of thread distribution (IRQ, NUMA surprises, ingest + processing + GC + ...)
  10. 68.

    @sysadmindays @ovh 68 Hardware pitfalls
    Be sure the number of controllers matches the number of disks & SATA ports
    Be sure your network link can handle your disk IO capacity
    Be sure of thread distribution (IRQ, NUMA surprises, ingest + processing + GC + ...)