Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017

Disaster Recovery for Big Data

About us
- We are nerds!
- Started working in Big Data for international companies
- Founded a start-up a few years ago:
  - With colleagues working in related technical areas
  - And who also knew business stuff!
- We’ve been participating in different Big Data projects

Introduction
“I already have HDFS replication and High Availability in my services, why would I need Disaster Recovery (or backup)?”

Concepts
- High Availability (HA)
  - Protects from failing components: disks, servers, network
  - Is generally a “systems” issue
  - Redundant, doubles components
  - Generally has strict network requirements
  - Fully automated, immediate

Concepts
- Backup
  - Allows you to go back to a previous state in time: daily, monthly, etc.
  - It is a “data” issue
  - Protects from accidental deletion or modification
  - Also used to check for unwanted modifications
  - Takes some time to restore

Concepts
- Disaster Recovery
  - Allows you to work elsewhere
  - It is a “business” issue
  - Covers you from: main site failures such as electric power or network outages, fires, floods or building damage
  - Similar to having insurance
  - Medium time to be back online

The ideal Disaster Recovery
- High Availability for datacenters
  - Exact duplicate of the main site
  - Seamless operation (no changes required)
  - Same performance
  - Same data
- This is often very expensive and sometimes downright impossible

DR considerations
- So, can we build a cheap(ish) DR?
- We must evaluate some tradeoffs:
  - What’s the cost of the service not being available? (Murphy’s Law: accidents will happen when you are busiest)
  - Is all information equally important? Can we lose a small amount of data?
  - Can we wait until we recover certain data from backup?
  - Can I find other uses for the DR site?

DR considerations
- Near or far?
  - Availability
  - Latency
  - Legal considerations

DR considerations
- Synchronous vs Asynchronous
  - Synchronous replication requires a FAST connection
  - Synchronous works at transaction level and is necessary for operational systems
  - Asynchronous replication converges over time
  - Asynchronous is not affected by delays nor does it create them
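
To make the point about the FAST connection concrete, here is a rough sketch of how the inter-site round trip shows up in write latency. The model is a deliberate simplification (it ignores the remote write time itself), and the 2 ms and 20 ms figures are illustrative assumptions, not measurements from the talk.

```python
def sync_write_latency_ms(local_write_ms: float, round_trip_ms: float) -> float:
    """Synchronous: the write is acknowledged only after the remote site confirms it."""
    return local_write_ms + round_trip_ms


def async_write_latency_ms(local_write_ms: float, round_trip_ms: float) -> float:
    """Asynchronous: the write is acknowledged locally; the copy catches up later."""
    return local_write_ms


def max_serial_writes_per_second(latency_ms: float) -> float:
    """Upper bound for a single writer that waits for each acknowledgement."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    local_ms, rtt_ms = 2.0, 20.0  # assumed values, for illustration only
    for name, fn in (("sync", sync_write_latency_ms), ("async", async_write_latency_ms)):
        latency = fn(local_ms, rtt_ms)
        print(name, latency, "ms,", round(max_serial_writes_per_second(latency), 1), "writes/s")
```

Under these assumptions the synchronous writer pays the full round trip on every transaction, which is why distance between sites matters far more for synchronous replication than for asynchronous replication.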

Big Data DR
- Can’t generally be copied synchronously
- No VM replication
- Other DR rules apply:
  - Since it impacts users, someone is in charge of the “starting gun”
  - DNS and network changes to point clients to the DR site
- Main types:
  - Storage replication
  - Dual ingestion

Storage replication
- Similar to non-Big Data solutions, where central storage is replicated
- Generally implemented using distcp and HDFS snapshots
- Data is ingested in source cluster and then copied
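
A minimal sketch of the snapshot-plus-distcp cycle described above. It only builds the command lines (it does not run them), so it can be reviewed before being wired into a scheduler. The cluster URIs, paths and snapshot names are hypothetical; the commands themselves (`hdfs dfs -createSnapshot` and `hadoop distcp -update -diff`) are standard Hadoop tools.

```python
from typing import List


def snapshot_cmd(path: str, name: str) -> List[str]:
    """Create a named snapshot (the directory must already be snapshottable)."""
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def initial_copy_cmd(src: str, dst: str, snap: str) -> List[str]:
    """First full copy, read from the immutable snapshot directory."""
    return ["hadoop", "distcp", "-update", f"{src}/.snapshot/{snap}", dst]


def incremental_copy_cmd(src: str, dst: str, prev: str, curr: str) -> List[str]:
    """Copy only the changes between two source snapshots."""
    return ["hadoop", "distcp", "-update", "-diff", prev, curr, src, dst]


if __name__ == "__main__":
    # Hypothetical NameNode addresses and path, for illustration only.
    src = "hdfs://main-nn:8020/data/events"
    dst = "hdfs://dr-nn:8020/data/events"
    for cmd in (snapshot_cmd(src, "s2"), incremental_copy_cmd(src, dst, "s1", "s2")):
        print(" ".join(cmd))
```

Note that the incremental form expects the destination to hold a snapshot with the same name as the previous one and to be unmodified since then, so the destination is normally snapshotted after each successful copy.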

Storage replication
- Administrative overhead:
  - Copy jobs must be scheduled
  - Metadata changes must be tracked
- Good enough for data that comes from traditional ETLs such as daily batches

Dual Ingestion
- No files, just streams
- Generally ingested from multiple outside sources through Kafka
- Streams must be directed to both sites

Dual Ingestion
- Adds complexity to apps
  - NiFi can be set up as a front-end to both endpoints
- Data consistency must be checked
  - Can be automatically set up via monitoring
- Consolidation processes (such as a monthly re-sync) might be needed
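
The two ideas on this slide, sending every record to both sites and then checking consistency, can be sketched without any particular client library. The senders below are plain callables standing in for whatever producer is actually used, and the per-day record counts are an assumed consistency metric; both are illustrative choices, not something the talk prescribes.

```python
from typing import Callable, Dict, List, Tuple

Sender = Callable[[bytes], None]


def dual_send(record: bytes, senders: Dict[str, Sender]) -> List[str]:
    """Send one record to every site; return the names of the sites that failed."""
    failed = []
    for site, send in senders.items():
        try:
            send(record)
        except Exception:
            # A failure at one site must not block ingestion at the other.
            failed.append(site)
    return failed


def count_mismatches(main: Dict[str, int], dr: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Compare per-key record counts (e.g. per day) and report the differences."""
    keys = sorted(set(main) | set(dr))
    return {k: (main.get(k, 0), dr.get(k, 0)) for k in keys if main.get(k, 0) != dr.get(k, 0)}


if __name__ == "__main__":
    main_site, dr_site = [], []
    print(dual_send(b"event-1", {"main": main_site.append, "dr": dr_site.append}))
    print(count_mismatches({"2017-11-16": 2}, {"2017-11-16": 1}))
```

Keys reported by `count_mismatches` are the candidates for a consolidation run such as the monthly re-sync.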

Others
- Ingestion replication
  - Variant of the dual ingestion
  - A consumer is set up in the source Kafka that in turn writes to a destination Kafka
  - Bottleneck if the initial streams were generated by many producers
- Mixed:
  - Previous solutions are not mutually exclusive
  - Storage replication for batch processes’ results
  - Dual ingestion for streams
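
The ingestion replication variant boils down to a single loop that reads from the source and writes to the destination. The sketch below models that loop with plain Python iterables instead of a real Kafka client, so the `(offset, value)` pairs and the `produce` callable are stand-ins chosen for illustration.

```python
from typing import Callable, Iterable, Tuple


def mirror(source: Iterable[Tuple[int, bytes]],
           produce: Callable[[bytes], None],
           start_offset: int = 0) -> int:
    """Copy records at or after start_offset; return the offset to resume from."""
    next_offset = start_offset
    for offset, value in source:
        if offset < start_offset:
            continue  # already replicated in a previous run
        produce(value)
        next_offset = offset + 1
    return next_offset


if __name__ == "__main__":
    source_log = [(0, b"a"), (1, b"b"), (2, b"c")]
    destination = []
    resume_at = mirror(source_log, destination.append, start_offset=1)
    print(resume_at, destination)
```

Because one loop carries everything that many producers wrote, it is easy to see where the bottleneck mentioned above comes from.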

Commercial offerings
- Solutions that ease DR setup
- Cloudera BDR
  - Coordinates HDFS snapshots and copy
- WANdisco Fusion
  - Continuous storage replication
- Confluent Multi-site
  - Allows multi-site Kafka data replication

Tips
- Big Data clusters have many nodes
  - Costly to replicate
- Performance / Capacity tradeoff
  - We can use cheaper servers in DR, since we don’t expect to use them often

Tips
- Document and test procedures
  - DR is rarely fully automated, so responsibilities and actions should be clearly defined
  - Plan for (at least) a yearly DR run
  - Track changes in software and configuration
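
One way to track changes in software and configuration is to diff the two sites periodically. This is a minimal sketch assuming each site's settings have already been collected into a flat dictionary; the keys and version strings in the example are made up.

```python
from typing import Dict, Optional, Tuple


def config_drift(main: Dict[str, str],
                 dr: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every key whose value differs between sites as (main, dr) pairs.

    A key missing on one side is reported with None for that side.
    """
    keys = sorted(set(main) | set(dr))
    return {k: (main.get(k), dr.get(k)) for k in keys if main.get(k) != dr.get(k)}


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    main_site = {"hadoop.version": "2.7.3", "dfs.replication": "3"}
    dr_site = {"hadoop.version": "2.7.1", "dfs.replication": "3"}
    print(config_drift(main_site, dr_site))
```

An empty result before the yearly DR run is a cheap sanity check that the DR site has not fallen behind the main site.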

Tips
- Once you have a DR solution, other uses will surface
- DR site can be used for backup
  - Maintain HDFS snapshots
- DR data can be used for testing / reporting
  - Warning: it may alter stored data

Conclusions
- Balance HA / Backup / DR as needed, they are not exclusive:
  - Different costs
  - Different impact
- Big Data DR is different:
  - Dedicated hardware
  - No VMs, no storage array
- Plan for DATA CENTRIC solutions

Questions
