An Introduction to Apache Oozie

An introduction to Apache Oozie: what is it, what is it used for,
and how is it used with Hadoop?

Mike Frampton

July 17, 2013

Transcript

  1. Apache Oozie
    • What is it?
    • Why use it?
    • Architecture
    • Examples
    www.semtech-solutions.co.nz [email protected]
  2. Oozie – What is it?
    • Workflow scheduler for Hadoop
    • Manages Hadoop jobs
    • Integrated with many Hadoop applications, e.g. Pig
    • Scalable
    • Schedules jobs
    • A workflow is a collection of actions, e.g.
      – map/reduce, Pig, HDFS (fs)
    • A workflow is
      – Arranged as a DAG (directed acyclic graph)
      – Stored as hPDL (Hadoop Process Definition Language, an XML process definition)
  3. Oozie – Why use it?
    • It is designed for Hadoop
    • It is open source
    • It is designed for big data
    • It allows you to design task workflows
    • It allows you to interact with jobs
      – Stop, start, suspend, resume, rerun
  4. Oozie – Architecture
    • Install Oozie on an edge node, not on the cluster itself
    • Oozie has a client
      – Launches jobs and talks to the server
    • Oozie has a server
      – Controls jobs
      – Launches jobs
    • Pipelines
      – Chained workflows
      – The output of one workflow is the input to the next
  5. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You pay only for the hours you need to solve your problems
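
Slide 2 describes a workflow as a DAG of actions stored as hPDL. As a rough illustration only, the Python sketch below models a small workflow as a dependency graph, orders it topologically (which fails if the graph has a cycle), and emits a simplified hPDL-style document. The action names, the `ACTIONS` table, and both helper functions are hypothetical; the action bodies are left empty, so the output shows the shape of the XML but is not a deployable workflow. The `uri:oozie:workflow:0.4` schema version is an assumption.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: action name -> (action type, actions it depends on).
ACTIONS = {
    "load": ("fs", []),
    "clean": ("pig", ["load"]),
    "aggregate": ("map-reduce", ["clean"]),
}


def execution_order(actions):
    """Return action names in dependency order.

    graphlib raises CycleError here if the dependencies do not form a DAG.
    """
    graph = {name: deps for name, (_, deps) in actions.items()}
    return list(TopologicalSorter(graph).static_order())


def to_hpdl(app_name, actions):
    """Build a simplified, linear hPDL-style document (action bodies omitted)."""
    order = execution_order(actions)
    root = ET.Element(
        "workflow-app",
        {"name": app_name, "xmlns": "uri:oozie:workflow:0.4"},  # assumed version
    )
    ET.SubElement(root, "start", {"to": order[0]})
    for i, name in enumerate(order):
        # Each action hands over to the next one, or to "end" if it is the last.
        next_node = order[i + 1] if i + 1 < len(order) else "end"
        action = ET.SubElement(root, "action", {"name": name})
        ET.SubElement(action, actions[name][0])  # e.g. <pig/>, configuration omitted
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(root, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Workflow failed"
    ET.SubElement(root, "end", {"name": "end"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hpdl("demo-wf", ACTIONS))
```

The sketch flattens the DAG into a single chain for simplicity; a workflow with parallel branches would need fork and join nodes, which are not shown here.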
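
Slide 3 lists the ways a user can interact with a job: stop, start, suspend, resume and rerun. One way to picture this is as a small state machine. The sketch below is a simplified model under stated assumptions, not Oozie's actual implementation: the state names follow the workflow job states Oozie reports (PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED, FAILED), "stop" is modelled as `kill`, and the transition table and function names are hypothetical.

```python
# Simplified transition table: (current state, command) -> next state.
# This is an illustrative model only, not Oozie's internal logic.
TRANSITIONS = {
    ("PREP", "start"): "RUNNING",
    ("RUNNING", "suspend"): "SUSPENDED",
    ("SUSPENDED", "resume"): "RUNNING",
    ("PREP", "kill"): "KILLED",
    ("RUNNING", "kill"): "KILLED",
    ("SUSPENDED", "kill"): "KILLED",
    ("KILLED", "rerun"): "RUNNING",
    ("FAILED", "rerun"): "RUNNING",
    ("SUCCEEDED", "rerun"): "RUNNING",
}


def apply_command(state, command):
    """Return the next state, or raise ValueError if the command is not allowed."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command!r} a job in state {state}") from None


def run_commands(commands, state="PREP"):
    """Apply a sequence of commands and return every state visited."""
    history = [state]
    for command in commands:
        state = apply_command(state, command)
        history.append(state)
    return history


if __name__ == "__main__":
    print(run_commands(["start", "suspend", "resume", "kill", "rerun"]))
```

Keeping the rules in a single table makes it easy to see which commands are valid in which state, and an invalid request (for example, resuming a job that is not suspended) is rejected explicitly.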
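
Slide 4 describes pipelines as chained workflows in which the output of one workflow is the input to the next. The sketch below shows only that wiring idea in plain Python. The `Workflow` class, the `chain` function, and all paths are hypothetical and do not correspond to any Oozie API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    output_dir: str  # hypothetical HDFS path written by this workflow


def chain(workflows, first_input):
    """Return (name, input_dir, output_dir) for each stage of the pipeline.

    Every stage after the first reads the directory the previous stage wrote.
    """
    plan = []
    current_input = first_input
    for wf in workflows:
        plan.append((wf.name, current_input, wf.output_dir))
        current_input = wf.output_dir  # this output feeds the next workflow
    return plan


if __name__ == "__main__":
    pipeline = [
        Workflow("ingest", "/data/raw"),
        Workflow("clean", "/data/clean"),
        Workflow("report", "/data/report"),
    ]
    for name, src, dst in chain(pipeline, "/landing/incoming"):
        print(f"{name}: {src} -> {dst}")
```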