Slide 1

Slide 1 text

Containerizing Apache HBase Clusters
David Pope & Javier Maestro
Production Engineers - HBase

Slide 2

Slide 2 text

Container Overview

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

• 1964: IBM 360
• 1979: UNIX chroot
• 1982: BSD chroot
• 1999: FreeBSD jail
• 2007-8: cgroups, LXC
• 2013: Docker

Slide 5

Slide 5 text

Container Platforms

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

HBase Containers @ Facebook

Slide 8

Slide 8 text

LEASE BUY

Slide 9

Slide 9 text

Buy vs. Lease
• Our scale
• Timing
• Control / Full Ownership
• Financial

Diagram layers:
• Application Services
• Container Platform (Tupperware)
• Infrastructure (OS, network)
• Physical (data center, hardware)

Slide 10

Slide 10 text

Tupperware Overview
Diagram labels: config.tw, scheduler, server, db, host1, host2, host3, host4

Slide 11

Slide 11 text

Tupperware Spec

Slide 12

Slide 12 text

Tupperware Benefits
• Configuration Spec
• Deployments
• Scheduler
• Health Monitor
• Logging
• Canary
• Web UI / CLI / API
• Elasticity (auto-scaling)

Slide 13

Slide 13 text

HBase Cell
Diagram: four racks

Slide 14

Slide 14 text

Types of servers
• controllers
• nodes

Slide 15

Slide 15 text

High Availability

Slide 16

Slide 16 text

Server Pools
• controller pool (jobs): master, zk
• node pool (jobs): regionserver, datanode

Slide 17

Slide 17 text

Stateful Elastic “cloud”

Slide 18

Slide 18 text

Behind the Container

Slide 19

Slide 19 text

The “Noisy Neighbor”

Slide 20

Slide 20 text

The “Noisy Neighbor”: From the Container
• High iops on /dev/sda
• Synchronous logging
Diagram labels: HBase Container, /dev/sda

Slide 21

Slide 21 text

The “Noisy Neighbor”: From the Host System
• Configuration Management putting load on /dev/sda
• Memory pressure forcing paging on /dev/sda
• Large configuration subscriptions
• Bloated packages
Diagram labels: HBase Container, Host System, /dev/sda

Slide 22

Slide 22 text

Performance & The Bug

Slide 23

Slide 23 text

Performance & The Bug: From the Container
• Increased latency and timeouts
• Cyclical spikes in io-wait across all of the Region Servers / Datanodes

Slide 24

Slide 24 text

Performance & The Bug: From the Host System
• Cyclical spikes across all disks hitting 100% utilization
• No sign of any applications accessing the disks (iops == 0)
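
The symptom on this slide (a disk that reports full utilization while completing no I/O) can be checked from two samples of Linux's /proc/diskstats. The following is a minimal sketch, not the tooling used in the talk: the function names, the 0.99 threshold, and the sample values are assumptions made for illustration. It relies on the documented diskstats layout, where the fields after the device name include reads completed, writes completed, and milliseconds spent doing I/O.

```python
def parse_diskstats(text):
    """Map device name -> (completed I/Os, milliseconds spent doing I/O)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        name = fields[2]
        reads_completed = int(fields[3])
        writes_completed = int(fields[7])
        ms_doing_io = int(fields[12])
        stats[name] = (reads_completed + writes_completed, ms_doing_io)
    return stats


def find_stalled_disks(before, after, interval_s, util_threshold=0.99):
    """Return devices that looked busy for the whole interval but completed no I/O."""
    stalled = []
    for name, (ios_after, busy_after) in after.items():
        if name not in before:
            continue
        ios_before, busy_before = before[name]
        iops = (ios_after - ios_before) / interval_s
        utilization = (busy_after - busy_before) / (interval_s * 1000.0)
        if utilization >= util_threshold and iops == 0:
            stalled.append(name)
    return sorted(stalled)


# Hypothetical samples taken 10 seconds apart (values invented for illustration).
SAMPLE_BEFORE = (
    "8 0 sda 100 0 800 50 200 0 1600 90 1 5000 5200\n"
    "8 16 sdb 10 0 80 5 20 0 160 9 0 1000 1000\n"
)
SAMPLE_AFTER = (
    "8 0 sda 100 0 800 50 200 0 1600 90 1 15000 15200\n"
    "8 16 sdb 110 0 880 55 120 0 960 59 0 1500 1500\n"
)

stalled = find_stalled_disks(
    parse_diskstats(SAMPLE_BEFORE), parse_diskstats(SAMPLE_AFTER), interval_s=10
)
```

On a live host the two samples would come from reading /proc/diskstats twice with a sleep in between; in the invented samples only sda is busy for the full interval without completing any request.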

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

Performance & The Bug: From the Hardware RAID Controller
• Log entries of a “learning cycle” every few minutes
• Correlation to the drives locking up
• Configuration mode to enable this “learning cycle”
Diagram labels: learning cycle, HBase Container, Host System
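
The correlation step on this slide can be illustrated by matching the times of disk lock-ups against the times of "learning cycle" log entries. This is a rough sketch under stated assumptions: the function name, the 30-second tolerance, and all timestamps are hypothetical, and the RAID controller's actual log format is not shown in the slides, so the timestamps are assumed to be already parsed.

```python
from datetime import datetime, timedelta


def fraction_explained(spike_times, event_times, tolerance=timedelta(seconds=30)):
    """Fraction of spikes that occur within `tolerance` of any logged event."""
    if not spike_times:
        return 0.0
    matched = sum(
        1
        for spike in spike_times
        if any(abs(spike - event) <= tolerance for event in event_times)
    )
    return matched / len(spike_times)


# Hypothetical, already-parsed timestamps (invented for illustration).
learning_cycles = [
    datetime(2016, 5, 24, 12, 0, 0),
    datetime(2016, 5, 24, 12, 5, 0),
]
disk_stalls = [
    datetime(2016, 5, 24, 12, 0, 5),
    datetime(2016, 5, 24, 12, 5, 10),
    datetime(2016, 5, 24, 12, 7, 0),
]

ratio = fraction_explained(disk_stalls, learning_cycles)
```

A ratio close to 1.0 would suggest that most stalls line up with a logged cycle; it does not by itself establish that the cycle causes them.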

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

The Scheduler Apocalypse

Slide 29

Slide 29 text

The Scheduler Apocalypse
Diagram labels: scheduler, host1, host2, host3, host4

Slide 30

Slide 30 text

Conclusions

Slide 31

Slide 31 text

Conclusions
• Containers provide a rich suite of tools and technologies to create standard, consistent and repeatable services
• However, there are critical decisions to be made:
  • Buy vs. Lease
  • What parts of the Container Platform to use
• You still need to be aware of what is happening behind the container
• The leverage of the container goes both ways

Slide 32

Slide 32 text

Q&A

Slide 33

Slide 33 text

Thanks!