An Introduction to Apache Flume

An introduction to Apache Flume: what is it used for, and
how does it work? How does it fit into the Hadoop tool
set?

Mike Frampton

July 21, 2013

Transcript

  1. Apache Flume
     • What is it?
     • How does it work?
     • Architecture
     • Reliability
     www.semtech-solutions.co.nz [email protected]
  2. Flume – What is it?
     • A data collection service for Hadoop
     • For distributed systems
     • Open source
     • Scalable
     • Reliable
     • Manageable
     • Fault tolerant
  3. Flume – How does it work?
     • Flume uses agents, which have
       – A source
         • Listens for events
         • Writes events to the channel
       – A channel
         • Queues event data as transactions
       – A sink
         • Writes event data to a target, e.g. HDFS
         • Removes the event from the queue
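The source, channel and sink roles listed on this slide can be sketched in a few lines of Python. This is only an illustrative model, not Flume's API: the class names `Channel`, `Source` and `Sink` and their methods are hypothetical, and a plain list stands in for the HDFS target.

```python
from collections import deque


class Channel:
    """Hypothetical in-memory channel: a FIFO queue of events."""

    def __init__(self):
        self._queue = deque()

    def __len__(self):
        return len(self._queue)

    def put(self, event):
        self._queue.append(event)

    def peek(self):
        # Look at the oldest event without removing it.
        return self._queue[0]

    def remove(self):
        self._queue.popleft()


class Source:
    """Hypothetical source: receives events and writes them to the channel."""

    def __init__(self, channel):
        self.channel = channel

    def receive(self, event):
        self.channel.put(event)


class Sink:
    """Hypothetical sink: writes each event to the target, then dequeues it."""

    def __init__(self, channel, target):
        self.channel = channel
        self.target = target  # a list standing in for HDFS (assumption)

    def drain(self):
        delivered = 0
        while len(self.channel) > 0:
            event = self.channel.peek()
            self.target.append(event)  # write to the target first
            self.channel.remove()      # only then remove from the queue
            delivered += 1
        return delivered


target = []
channel = Channel()
source = Source(channel)
sink = Sink(channel, target)

source.receive("log line 1")
source.receive("log line 2")
sink.drain()
```

The sink writes before it removes, so an event stays queued until the target has accepted it. That ordering matches the slide's "write event data to target, remove event from queue".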
  4. Flume – Architecture
     • A single agent showing its parts
     • Generally one agent for a given data type
  5. Flume – Architecture
     • Agents can be chained into flows
     • Avro can be used for data serialization
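Chaining can be pictured as the sink of one agent handing serialized events to the source of the next. The sketch below is a toy model under stated assumptions: the `Agent` class, the `hops` field and the agent names are hypothetical, and JSON from the Python standard library stands in for Avro purely to keep the example self-contained.

```python
import json
from collections import deque


class Agent:
    """Hypothetical agent: source -> channel -> sink.

    `deliver` is any callable that accepts serialized bytes, for example
    the `source` method of the next agent in the flow.
    """

    def __init__(self, name, deliver):
        self.name = name
        self.channel = deque()
        self.deliver = deliver

    def source(self, payload):
        # Deserialize the incoming event and record that it passed through here.
        event = json.loads(payload.decode("utf-8"))
        event["hops"].append(self.name)
        self.channel.append(event)

    def sink(self):
        # Forward each queued event, removing it only after the hand-off returns.
        while self.channel:
            event = self.channel[0]
            self.deliver(json.dumps(event).encode("utf-8"))
            self.channel.popleft()


store = []  # final destination, a stand-in for HDFS (assumption)
collector = Agent("collector", store.append)
web = Agent("web", collector.source)

web.source(json.dumps({"body": "GET /index.html", "hops": []}).encode("utf-8"))
web.sink()        # web agent forwards to the collector agent
collector.sink()  # collector agent writes to the final store
```

After both sinks have run, `store` holds one serialized event whose `hops` list names both agents in order.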
  6. Flume – Architecture
     In complicated flows it may be necessary to think about
     • Event data reliability
     • Should we have
       – Complete end-to-end reliability
       – Send and forget
       – Or something in between?
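The trade-off between "send and forget" and stronger delivery guarantees can be illustrated with a small Python sketch. It does not reproduce how Flume implements reliability: `FlakyTarget`, `send_and_forget`, `deliver_reliably` and the `max_attempts` parameter are hypothetical names chosen for the illustration.

```python
from collections import deque


class FlakyTarget:
    """Hypothetical target that rejects its first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.stored = []

    def write(self, event):
        if self.failures > 0:
            self.failures -= 1
            raise IOError("target unavailable")
        self.stored.append(event)


def send_and_forget(queue, target):
    """Dequeue each event before writing it; a failed write loses the event."""
    while queue:
        event = queue.popleft()
        try:
            target.write(event)
        except IOError:
            pass  # the event has already left the queue, so it is lost


def deliver_reliably(queue, target, max_attempts=3):
    """Dequeue an event only after a successful write, retrying on failure."""
    while queue:
        event = queue[0]
        for _ in range(max_attempts):
            try:
                target.write(event)
                break
            except IOError:
                continue
        else:
            return False  # give up for now; the event stays queued
        queue.popleft()
    return True


lossy = FlakyTarget(failures=1)
send_and_forget(deque(["a", "b", "c"]), lossy)
# lossy.stored is ["b", "c"]: event "a" was dropped

safe = FlakyTarget(failures=1)
deliver_reliably(deque(["a", "b", "c"]), safe)
# safe.stored is ["a", "b", "c"]: event "a" was retried
```

A bounded retry count such as `max_attempts` is one way to picture "something in between": events are never silently dropped, but delivery can stall while the target is down.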
  7. Contact Us
     • Feel free to contact us at
       – www.semtech-solutions.co.nz
       – [email protected]
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You can just pay for those hours that you need to solve your problems