MapReduce Processing Model
• Define mappers
• Shuffling is automatic
• Define reducers
• For complex work, chain jobs together
  – Use a higher level language or DSL that does this for you
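
The model above can be illustrated with a minimal in-memory sketch. This is not how Hadoop or any real framework is implemented; it only mimics the three phases on a single machine. The names `mapper`, `shuffle`, `reducer`, `run_job`, and `freq_mapper`, as well as the sample input, are hypothetical and chosen for illustration. The second call to `run_job` shows what chaining jobs looks like: the output of a word count feeds a job that counts how many words share each frequency.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this automatically between map and reduce;
    it is written out here only to make the phase visible.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records, map_fn, reduce_fn):
    """Run one map -> shuffle -> reduce pass over an iterable of records."""
    mapped = chain.from_iterable(map_fn(record) for record in records)
    return [reduce_fn(key, values) for key, values in shuffle(mapped).items()]


def freq_mapper(record):
    """Mapper for the chained job: key each (word, count) record by its count."""
    _word, count = record
    yield count, 1


# Hypothetical sample input.
lines = ["the quick brown fox", "the lazy dog", "the fox"]

# Job 1: word count.
word_counts = run_job(lines, mapper, reducer)

# Job 2, chained onto job 1: how many distinct words occur 1, 2, 3... times.
freq_histogram = run_job(word_counts, freq_mapper, reducer)

print(dict(word_counts))
print(dict(freq_histogram))
```

Only the mapper and reducer carry job-specific logic, which is why the same `reducer` can be reused for both jobs. Wiring job outputs into job inputs by hand, as done here, is the part a higher level language or DSL would take over.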
Conclusion
• If possible use Streams: Kafka, Logstash
• Advanced Data Processing and Machine Learning: Spark
• Expose your data using SQL for your “BI folks”: Drill
• Aggregation and Full Text Search: Elasticsearch
• Data Visualisation: Kibana