Apache Tez

What Is Apache Tez?
• An application framework
• Built on top of Apache Hadoop YARN
• Uses directed acyclic graphs (DAGs)
• Open source / Apache 2.0 license
• Scalable
• Performant

Hadoop Ecosystem

Tez DAG
• Tez directed acyclic graphs (DAGs)
• Distributed data processing
• Vertices represent data transformations
• Edges represent data movement
• For data processing applications
• Tez is an execution engine
• Built on top of YARN
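To make the vertex/edge split concrete, here is a minimal Python sketch, assuming a toy in-memory model rather than Tez's real Java API: each vertex is a plain function (the data transformation), each edge is a (producer, consumer) pair (the data movement), and run_dag is a hypothetical helper that runs the vertices in topological order. Because the graph must be acyclic, a cycle is reported as an error.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, List, Tuple

# Toy model of a Tez-style DAG (NOT the Tez API): a vertex is a function that
# transforms its upstream inputs; an edge carries one vertex's output onward.
Transform = Callable[[List[list]], list]


def run_dag(vertices: Dict[str, Transform],
            edges: List[Tuple[str, str]]) -> Dict[str, list]:
    """Run every vertex once, in topological order (Kahn's algorithm)."""
    inputs = defaultdict(list)      # vertex -> outputs received from upstream
    successors = defaultdict(list)  # producer -> list of consumers
    indegree = {v: 0 for v in vertices}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(v for v, d in indegree.items() if d == 0)
    outputs: Dict[str, list] = {}
    while ready:
        v = ready.popleft()
        outputs[v] = vertices[v](inputs[v])   # data transformation
        for dst in successors[v]:             # data movement along each edge
            inputs[dst].append(outputs[v])
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if len(outputs) != len(vertices):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return outputs


dag = {
    "read": lambda _: ["a b", "b c"],
    "tokenize": lambda ins: [w for line in ins[0] for w in line.split()],
    "distinct": lambda ins: sorted(set(ins[0])),
}
result = run_dag(dag, [("read", "tokenize"), ("tokenize", "distinct")])
print(result["distinct"])  # ['a', 'b', 'c']
```

Nothing is written to disk between vertices here; outputs are handed straight to the next vertex, which is the property the following slides build on.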

Tez Performance
• Performance improvement compared to MapReduce
  – No need to store intermediate results in HDFS between MR jobs
  – Better execution performance
• Expressive dataflow API for DAGs
  – Visualise what you wish to construct
  – Add processor vertices to the graph
  – Add data movement edges to the graph
  – To build the computational DAG that you require
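The build steps on this slide (add vertices, add edges, end up with a DAG) can be sketched as a small builder. DagBuilder, add_vertex and add_edge are hypothetical names used only for illustration; Tez's own Java API has a similar shape (DAG.create, addVertex, addEdge) but is not reproduced here. The one check this builder performs is the "acyclic" part: an edge is rejected if the consumer can already reach the producer.

```python
from typing import Dict, List


class DagBuilder:
    """Hypothetical builder: add vertices, then edges, rejecting cycles."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.vertices: Dict[str, int] = {}     # vertex name -> parallelism
        self.edges: Dict[str, List[str]] = {}  # producer -> consumers

    def add_vertex(self, name: str, parallelism: int = 1) -> "DagBuilder":
        if name in self.vertices:
            raise ValueError(f"duplicate vertex {name!r}")
        self.vertices[name] = parallelism
        self.edges[name] = []
        return self

    def _reachable(self, start: str, target: str) -> bool:
        # Depth-first search: is there already a path from start to target?
        stack, seen = [start], set()
        while stack:
            v = stack.pop()
            if v == target:
                return True
            if v not in seen:
                seen.add(v)
                stack.extend(self.edges[v])
        return False

    def add_edge(self, producer: str, consumer: str) -> "DagBuilder":
        for v in (producer, consumer):
            if v not in self.vertices:
                raise KeyError(f"unknown vertex {v!r}")
        # producer -> consumer closes a cycle exactly when the producer is
        # already reachable from the consumer (this also catches self-loops).
        if self._reachable(consumer, producer):
            raise ValueError(f"edge {producer}->{consumer} would create a cycle")
        self.edges[producer].append(consumer)
        return self


dag = (DagBuilder("wordcount")
       .add_vertex("tokenizer", parallelism=4)
       .add_vertex("summation", parallelism=2)
       .add_edge("tokenizer", "summation"))

try:
    dag.add_edge("summation", "tokenizer")  # would close a loop
except ValueError as err:
    print(err)  # edge summation->tokenizer would create a cycle
```

Checking at edge-insertion time keeps the error next to the call that caused it; a builder could equally defer the check to a final build step.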

Tez Deployment
• Tez is client side
  – It is installed on the client, not on every cluster node
• Install the Tez client locally
• Build the task DAG
• Load the DAG/Tez libraries to HDFS
• Execute the YARN-based job
  – From the Tez client
  – Using the DAG/Tez libraries stored in HDFS
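On the client, the location of the Tez libraries in HDFS is typically given by the tez.lib.uris property in tez-site.xml. The sketch below only renders such a file with Python's standard xml.etree module; the HDFS path in TEZ_TARBALL is a hypothetical placeholder, not a required layout, and a real tez-site.xml will usually carry more properties.

```python
import xml.etree.ElementTree as ET

# Hypothetical HDFS location of the Tez tarball uploaded in the
# "Load the DAG/Tez libraries to HDFS" step; adjust for your cluster.
TEZ_TARBALL = "${fs.defaultFS}/apps/tez/tez.tar.gz"


def tez_site_xml(lib_uris: str) -> str:
    """Render a minimal Hadoop-style configuration with tez.lib.uris set."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "tez.lib.uris"
    ET.SubElement(prop, "value").text = lib_uris
    return ET.tostring(conf, encoding="unicode")


xml_text = tez_site_xml(TEZ_TARBALL)
print(xml_text)
```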

Tez Existing MR Tasks
• Tez can run existing MapReduce (MR) jobs
• No modification needed
• Allows for phased migration
  – Of existing MR jobs to DAGs
• Allows for near-real-time task types
• Rather than just MR tasks, which are
  – Batch oriented
  – Iterative
  – Resource intensive
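Conceptually, a classic MapReduce job maps onto the smallest useful DAG: a map vertex feeding a reduce vertex over a shuffle (scatter-gather) edge, which is why existing MR jobs fit Tez without rewriting. The word count below is a toy, in-process illustration of that shape, not Tez's MR runtime; map_vertex, scatter_gather and reduce_vertex are hypothetical names. (On a real cluster, MR jobs are pointed at Tez through configuration, for example by setting mapreduce.framework.name to yarn-tez.)

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy illustration (not Tez code): an MR job as a two-vertex DAG,
# map -> reduce, joined by a scatter-gather (shuffle) edge.


def map_vertex(split: List[str]) -> List[Tuple[str, int]]:
    """Map task: emit (word, 1) for every word in its input split."""
    return [(word, 1) for line in split for word in line.split()]


def scatter_gather(map_outputs: List[List[Tuple[str, int]]],
                   num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle edge: send each key to one reducer and group its values."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_vertex(partition: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce task: sum the grouped counts for each key."""
    return {key: sum(values) for key, values in partition.items()}


splits = [["to be or"], ["not to be"]]  # two input splits -> two map tasks
shuffled = scatter_gather([map_vertex(s) for s in splits], num_reducers=2)
counts: Dict[str, int] = {}
for part in shuffled:
    counts.update(reduce_vertex(part))
print(sorted(counts.items()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Which reducer a word lands on depends on Python's string hashing, but each key always goes to exactly one partition within a run, so the final counts are the same.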

Tez API
• A Tez DAG defines the job
• A vertex defines one step of the job
  – Requires user logic and resources for the step
• An edge defines one data movement step
  – From producer to consumer
  – Edge properties define the movement
    • How data moves
    • When consumer tasks are scheduled relative to producer tasks
    • How durable the data is
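The "how data moves" property can be illustrated by simulating how producer task outputs reach consumer tasks. The enum names below mirror three of Tez's data movement types (ONE_TO_ONE, BROADCAST, SCATTER_GATHER); the route function itself is a simplified toy model, not Tez's implementation, and it ignores the scheduling and durability properties.

```python
from enum import Enum
from typing import List


class DataMovement(Enum):
    # Names mirror Tez's data movement types; the routing below is a
    # simplified toy model, not Tez's implementation.
    ONE_TO_ONE = "one_to_one"
    BROADCAST = "broadcast"
    SCATTER_GATHER = "scatter_gather"


def route(movement: DataMovement,
          producer_outputs: List[List[list]],
          num_consumers: int) -> List[List[list]]:
    """Return, for each consumer task, the chunks it receives.

    producer_outputs[i][j] is producer task i's chunk for partition j;
    only SCATTER_GATHER uses more than one partition per producer.
    """
    n_producers = len(producer_outputs)
    if movement is DataMovement.ONE_TO_ONE:
        # producer i feeds consumer i only
        if n_producers != num_consumers:
            raise ValueError("one-to-one needs equal task counts")
        return [[producer_outputs[i][0]] for i in range(n_producers)]
    if movement is DataMovement.BROADCAST:
        # every consumer receives every producer's single output
        return [[out[0] for out in producer_outputs]
                for _ in range(num_consumers)]
    if movement is DataMovement.SCATTER_GATHER:
        # consumer j gathers partition j from every producer
        return [[out[j] for out in producer_outputs]
                for j in range(num_consumers)]
    raise ValueError(f"unsupported movement {movement}")


outs = [[["a"], ["b"]], [["c"], ["d"]]]  # 2 producers x 2 partitions each
print(route(DataMovement.SCATTER_GATHER, outs, 2))
# [[['a'], ['c']], [['b'], ['d']]]
```

Scatter-gather is the shuffle from the MapReduce model; broadcast suits small side inputs such as a lookup table that every consumer needs.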

Tez Hive
• Increased performance
  – Compared to MapReduce usage
• No need to use HDFS for intermediate steps
• Greater parallelism via DAGs
• Simpler steps in the DAG compared to MR
• Reduced latency
• Higher throughput
• Better speed
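A rough way to see the "no HDFS for intermediate steps" point is to count materialisations. The sketch assumes a deliberately simplified plan model: on MapReduce each shuffle stage becomes its own job and every job but the last writes its output to HDFS, while on Tez the same stages form one DAG whose intermediate data travels along edges. plan_costs is a hypothetical helper, and the numbers are counts from this model, not measurements.

```python
from typing import Dict, List


def plan_costs(shuffle_stages: List[str]) -> Dict[str, int]:
    """Toy comparison for a query plan with one shuffle per stage.

    Simplifying assumption: MapReduce runs one job per shuffle stage and
    materialises every job's output except the last in HDFS; Tez runs the
    same stages as a single DAG with no intermediate HDFS writes.
    """
    n = len(shuffle_stages)
    return {
        "mr_jobs": n,
        "mr_intermediate_hdfs_writes": max(n - 1, 0),
        "tez_dags": 1 if n else 0,
        "tez_intermediate_hdfs_writes": 0,
    }


# e.g. a query that joins, then aggregates, then sorts
print(plan_costs(["join", "group by", "order by"]))
# {'mr_jobs': 3, 'mr_intermediate_hdfs_writes': 2, 'tez_dags': 1, 'tez_intermediate_hdfs_writes': 0}
```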

Available Books
• See “Big Data Made Easy” – Apress Jan 2015
• See “Mastering Apache Spark” – Packt Oct 2015
• See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
• Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
• Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Connect
• Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
• See my open source blog at
  – open-source-systems.blogspot.com/
• I am always interested in
  – New technology
  – Opportunities
  – Technology-based issues
  – Big data integration

Mike Frampton
