An Introduction to Apache Flume

An introduction to Apache Flume: what is it used for, and how does it work? How does it fit into the Hadoop tool set?

Mike Frampton

July 21, 2013

Transcript

  1. Apache Flume • What is it? • How does it work? • Architecture • Reliability www.semtech-solutions.co.nz [email protected]
  2. Flume – What is it? • A data collection service for Hadoop • For distributed systems • Open source • Scalable • Reliable • Manageable • Fault tolerant
  3. Flume – How does it work? • Flume uses agents, each of which has – A source • Listens for events • Writes events to the channel – A channel • Queues event data, using transactions – A sink • Writes event data to a target, e.g. HDFS • Removes the event from the queue once the write succeeds
  4. Flume – Architecture • A single agent showing its parts (source, channel and sink) • Generally one agent for a given data type
  5. Flume – Architecture • Agents can be chained into flows • Avro can be used for data serialization between agents
  6. Flume – Architecture • In complicated flows it may be necessary to think about event data reliability • Should we have – Complete end-to-end reliability – Send and forget – Or something in between?
  7. Contact Us • Feel free to contact us at – www.semtech-solutions.co.nz – [email protected] • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve your problems
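Slide 3 describes an agent as three parts: a source that writes events to a channel, a channel that queues them using transactions, and a sink that writes them to a target such as HDFS and then removes them from the queue. The Python sketch below is a toy, in-process model of that flow, not Flume's Java API. The names `ToyChannel`, `source` and `sink_process` are hypothetical, a plain list stands in for HDFS, and an event is simplified to a dict with `headers` and a byte `body`.

```python
from collections import deque


class ToyChannel:
    """Bounded FIFO of events with transactional take/commit/rollback.

    A hypothetical teaching model, not Flume's Java Channel interface.
    """

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.queue = deque()
        self.in_flight = []  # events taken by the sink but not yet committed

    def put(self, event):
        if len(self.queue) >= self.capacity:
            raise OverflowError("channel is full")
        self.queue.append(event)

    def take(self):
        # Move the oldest event into the open transaction (None if empty).
        if not self.queue:
            return None
        event = self.queue.popleft()
        self.in_flight.append(event)
        return event

    def commit(self):
        # Only now are the taken events gone from the channel for good.
        self.in_flight.clear()

    def rollback(self):
        # Return uncommitted events to the head of the queue, order preserved.
        self.queue.extendleft(reversed(self.in_flight))
        self.in_flight.clear()


def source(lines, channel):
    # Source: "listen" for events (here, a list of strings) and write them to the channel.
    for line in lines:
        channel.put({"headers": {}, "body": line.encode("utf-8")})


def sink_process(channel, target, fail=False):
    # Sink: take one event, write it to the target, commit only on success.
    event = channel.take()
    if event is None:
        return False
    try:
        if fail:
            raise OSError("target unavailable")  # simulated write failure
        target.append(event["body"])
        channel.commit()
        return True
    except OSError:
        channel.rollback()
        return False


channel = ToyChannel()
hdfs = []  # a plain list standing in for HDFS
source(["event-1", "event-2"], channel)
sink_process(channel, hdfs, fail=True)  # write fails, event-1 is rolled back
while sink_process(channel, hdfs):
    pass
print(hdfs)  # [b'event-1', b'event-2']
```

The point of the design is that the sink commits only after the write succeeds. A failed write rolls the event back into the channel, so it is retried rather than lost.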
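Slide 5 says agents can be chained into flows, with Avro used between them. In Flume this chaining is set up in each agent's Java properties configuration. The upstream agent uses an `avro` sink pointed at the host and port where the downstream agent's `avro` source is listening. The Python helper below only renders two such properties files so the wiring is visible. The helper `agent_config`, the agent names (`web`, `collector`), the host, port, log file and HDFS path are all placeholders. A production setup would usually set more options, such as a durable `file` channel or additional HDFS sink settings.

```python
def agent_config(agent, source, sink, channel="c1"):
    """Render a Flume properties file for one agent with a single source (r1),
    a memory channel and a single sink (k1). `source` and `sink` map Flume
    property names to values. Hypothetical helper for illustration only."""
    lines = [
        f"{agent}.sources = r1",
        f"{agent}.channels = {channel}",
        f"{agent}.sinks = k1",
    ]
    lines += [f"{agent}.sources.r1.{key} = {value}" for key, value in source.items()]
    lines += [f"{agent}.sinks.k1.{key} = {value}" for key, value in sink.items()]
    lines += [
        f"{agent}.channels.{channel}.type = memory",
        # Bind source and sink to the channel (sources use "channels", sinks "channel").
        f"{agent}.sources.r1.channels = {channel}",
        f"{agent}.sinks.k1.channel = {channel}",
    ]
    return "\n".join(lines) + "\n"


COLLECTOR_HOST = "collector.example.com"  # placeholder host
COLLECTOR_PORT = 4545                     # placeholder port

# First hop: tail an application log and forward events over Avro.
web = agent_config(
    "web",
    source={"type": "exec", "command": "tail -F /var/log/app.log"},
    sink={"type": "avro", "hostname": COLLECTOR_HOST, "port": COLLECTOR_PORT},
)

# Second hop: accept Avro events from upstream agents and write them to HDFS.
collector = agent_config(
    "collector",
    source={"type": "avro", "bind": "0.0.0.0", "port": COLLECTOR_PORT},
    sink={"type": "hdfs", "hdfs.path": "/flume/events"},
)

print(web)
print(collector)
```

Each rendered file would then be run by its own `flume-ng agent` process, with `--name` set to the matching agent name.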
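Slide 6 frames reliability as a choice between complete end-to-end reliability, send and forget, or something in between. The toy Python model below contrasts the two extremes over a hypothetical lossy link. The link's losses are fixed in advance so the run is deterministic. The fire-and-forget sender ignores acknowledgements, while the acknowledged sender retries each event until the link confirms delivery. All names here (`make_link`, `send_and_forget`, `send_with_ack`) are hypothetical, and the model does not reproduce Flume's own RPC.

```python
def make_link(losses):
    """Hypothetical lossy link. losses[i] is True if send attempt i is lost;
    attempts beyond the list always succeed. Returns (send, delivered)."""
    delivered = []
    attempts = iter(losses)

    def send(event):
        lost = next(attempts, False)
        if not lost:
            delivered.append(event)
        return not lost  # the acknowledgement: True only if the event arrived

    return send, delivered


def send_and_forget(events, send):
    for event in events:
        send(event)  # acknowledgement ignored; a lost event is simply gone


def send_with_ack(events, send, max_retries=3):
    undelivered = []
    for event in events:
        for _ in range(1 + max_retries):
            if send(event):
                break
        else:
            undelivered.append(event)  # retries exhausted; keep it for later
    return undelivered


events = ["e1", "e2", "e3"]
losses = [False, True, False]  # the second send attempt is dropped

send, forget_delivered = make_link(losses)
send_and_forget(events, send)
print(forget_delivered)  # ['e1', 'e3']

send, ack_delivered = make_link(losses)
undelivered = send_with_ack(events, send)
print(ack_delivered, undelivered)  # ['e1', 'e2', 'e3'] []
```

If an acknowledgement were lost rather than the event itself, the retrying sender would deliver the same event twice. That makes this style of delivery at-least-once rather than exactly-once. In Flume itself, the channel type is one of the main levers on this spectrum. The memory channel is fast but loses buffered events if the agent dies, while the file channel persists them to disk.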