Google's FlumeJava
• Provides a Java-based API for M/R pipelines
• Uses an MST (multiple serializable type) data model
• Good for processing complex data types
• Better for "non-tuple" data types, e.g.
  – Images
  – Audio
  – Seismic data
www.semtech-solutions.co.nz [email protected]
Reduce pipeline?
  – Map
  – Shuffle
  – Reduce
  – Combine
• Arranged in sequence and/or in parallel
• Potentially very long chains
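To make the stages above concrete, here is a minimal sketch of map, shuffle and reduce chained in sequence as a word count. It is plain in-memory Java, not the Crunch or Hadoop API; the class name `PipelineSketch` and its method names are hypothetical, and the combine stage (a local pre-aggregation before the shuffle) is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: an in-memory imitation of the stages,
// not how Crunch or Hadoop actually execute them.
public class PipelineSketch {

    // Map: turn each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle: group all values that share a key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: collapse each group of values into a single total.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            totals.put(group.getKey(), sum);
        }
        return totals;
    }

    // The stages arranged in sequence.
    static Map<String, Integer> wordCount(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("a b a", "b c"))); // prints {a=2, b=2, c=1}
    }
}
```

A longer chain would feed the output of one such sequence into the map stage of the next.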
Crunch
  – Inner / Outer joins, like SQL joins
  – Likewise Left / Right / Full joins
  – Map-side join is an in-memory join
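The idea behind an in-memory (map-side) join can be sketched as follows: the smaller data set is held in a hash map, and each record of the larger set is matched against it as it streams past, so no shuffle is needed. This is plain Java, not the Crunch join API; the class `MapSideJoinSketch`, the `keepUnmatched` flag and the sample data are all hypothetical, and the sketch assumes each key appears at most once in the small table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: an in-memory hash join, not Crunch code.
public class MapSideJoinSketch {

    // Joins (key, value) records against a small lookup table held in memory.
    // keepUnmatched = false gives an inner join;
    // keepUnmatched = true gives a left outer join (missing right side shown as null).
    static List<String> join(List<String[]> left, Map<String, String> small, boolean keepUnmatched) {
        List<String> out = new ArrayList<>();
        for (String[] record : left) {
            String match = small.get(record[0]);
            if (match != null || keepUnmatched) {
                out.add(record[0] + "," + record[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> cities = new HashMap<>();
        cities.put("1", "Wellington");
        cities.put("2", "Auckland");

        List<String[]> orders = List.of(
                new String[] {"1", "order-a"},
                new String[] {"3", "order-b"});

        System.out.println(join(orders, cities, false)); // prints [1,order-a,Wellington]
        System.out.println(join(orders, cities, true));  // prints [1,order-a,Wellington, 3,order-b,null]
    }
}
```

The trade-off is that the small side must fit in memory on every worker that performs the join.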
runs efficiently
• Crunch is a thin veneer on top of MapReduce
• Two implementations available
  – Hadoop Writables
  – Avro
• The Avro implementation is much faster
www.semtech-solutions.co.nz – [email protected]
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems