An introduction to Cloudera Impala

An introduction to Cloudera Impala: what is it, and
how does it work? How can it bring real-time
performance gains to Apache Hadoop?

Mike Frampton

August 14, 2013

Transcript

  1. Impala
    • What is it?
    • How does it work?
    • Performance
    • Formats
    • Architecture
    www.semtech-solutions.co.nz [email protected]
  2. Impala – What is it?
    • Ad hoc, real-time queries for Hadoop
    • Open source
    • Developed by Cloudera
    • Based on Google's 2010 Dremel paper
    • Direct data access via the Impala engine
    • The future Parquet file format for Hadoop will
      – Add columnar binary storage to Hadoop
      – Improve Impala performance
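Why columnar binary storage helps can be shown with a toy comparison. The sketch below is illustrative only: the table, its column names and the "values read" counter are hypothetical, and real Parquet files add encoding, compression and metadata on top of this basic idea. A `SUM(sales)` over a row layout touches every field of every row, while a columnar layout reads just the one column the query needs.

```python
from typing import Dict, List

# Hypothetical table stored row by row: each record keeps all its fields together.
rows: List[Dict[str, object]] = [
    {"id": 1, "city": "Auckland", "sales": 120},
    {"id": 2, "city": "Wellington", "sales": 80},
    {"id": 3, "city": "Auckland", "sales": 200},
]

def to_columns(records: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row records into one list per column (a columnar layout)."""
    columns: Dict[str, List[object]] = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns

def sum_sales_row_store(records):
    """SUM(sales) on a row layout: every field of every row is read."""
    values_read = 0
    total = 0
    for record in records:
        values_read += len(record)  # the whole row comes off storage
        total += record["sales"]
    return total, values_read

def sum_sales_column_store(columns):
    """SUM(sales) on a columnar layout: only the 'sales' column is read."""
    sales = columns["sales"]
    return sum(sales), len(sales)

columns = to_columns(rows)
print(sum_sales_row_store(rows))        # (400, 9)
print(sum_sales_column_store(columns))  # (400, 3)
```

Both layouts give the same answer; the difference is the amount of data scanned, which is what matters for IO-bound queries on large tables.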
  3. Impala – How does it work?
    • Direct data access
    • Query planning and coordination on the data nodes
    • Node-based query engine
    • Low latency
    • Performance improvement
    • Queries data stored in HDFS or HBase
    • Uses the same HiveQL syntax (SQL-like)
    • Can be used from the Hue GUI
    • Supports table joins and aggregation
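The idea of running the query engine on the data nodes, with one node coordinating, can be modelled as scatter/gather aggregation. This is a simplified Python toy, not Impala's implementation: threads stand in for data nodes, and `node_blocks`, `local_aggregate` and `coordinator` are hypothetical names. Each "node" computes a partial `SUM(sales) GROUP BY city` over only its local rows, and the coordinator merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical data: each inner list is the block of (city, sales) rows
# stored locally on one data node.
node_blocks: List[List[Tuple[str, int]]] = [
    [("Auckland", 120), ("Wellington", 80)],
    [("Auckland", 200), ("Christchurch", 50)],
    [("Wellington", 30)],
]

def local_aggregate(block: List[Tuple[str, int]]) -> Counter:
    """Runs 'on the data node': partial SUM(sales) GROUP BY city over local rows only."""
    partial: Counter = Counter()
    for city, sales in block:
        partial[city] += sales
    return partial

def coordinator(blocks: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Sends the fragment to every node in parallel, then merges the partial sums."""
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        partials = list(pool.map(local_aggregate, blocks))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(merged)

print(coordinator(node_blocks))
# → {'Auckland': 320, 'Wellington': 110, 'Christchurch': 50}
```

Only the small partial results travel to the coordinator, not the raw rows, which is one reason node-local execution keeps latency low.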
  4. Impala – Performance
    Impala delivers performance gains of at least:
    • 3 times for IO-bound queries (limited by hardware)
    • 7 times for complex queries (multiple MapReduce stages)
    • 20 times for cached queries
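The "at least N times" figures can be read as upper bounds on runtime: if speedup = baseline time / Impala time ≥ N, then Impala time ≤ baseline time / N. A minimal sketch of that arithmetic, assuming a hypothetical 600-second baseline query; the dictionary keys are hypothetical labels for the slide's three query classes.

```python
# Minimum speedup factors quoted on the slide, keyed by hypothetical labels.
MIN_SPEEDUP = {
    "io_bound": 3,
    "complex_multistage": 7,
    "cached": 20,
}

def max_impala_runtime(baseline_seconds: float, query_class: str) -> float:
    """Upper bound on Impala runtime if the minimum speedup holds:
    baseline / impala >= N  implies  impala <= baseline / N."""
    return baseline_seconds / MIN_SPEEDUP[query_class]

# Hypothetical baseline: a query that takes 10 minutes (600 s) without Impala.
for query_class in MIN_SPEEDUP:
    print(query_class, round(max_impala_runtime(600.0, query_class), 1))
# → io_bound 200.0
# → complex_multistage 85.7
# → cached 30.0
```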
  5. Impala – Formats
    Supported formats:
    – Text and SequenceFiles, which can be compressed with
      • Snappy
      • GZIP
      • BZIP
    – Future support for
      • Avro
      • RCFile
      • LZO text files
      • Parquet
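To see what compressing a delimited text file involves, the sketch below compresses the same hypothetical CSV-style content with Python's standard-library `gzip` and `bz2` modules and checks that both round-trip losslessly. It assumes the slide's "BZIP" refers to bzip2; Snappy is left out because Python has no standard-library module for it (third-party bindings exist). Actual sizes depend on the data, so none are claimed here.

```python
import bz2
import gzip

# Hypothetical text-file content: repetitive delimited rows, typical of Hadoop text tables.
text = "\n".join(f"{i},Auckland,{i * 10}" for i in range(1000)).encode("utf-8")

# Pairs of (compress, decompress) functions from the standard library.
codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}

for name, (compress, decompress) in codecs.items():
    packed = compress(text)
    assert decompress(packed) == text  # both codecs are lossless
    print(f"{name}: {len(text)} -> {len(packed)} bytes")
```

Which codec suits a table depends on the trade-off between compression ratio and CPU cost during scans, so it is worth measuring on representative data.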
  6. Impala – Requirements
    What does Impala need to run?
    – CentOS 6.2 or RHEL (Red Hat Enterprise Linux)
    – CDH 4.1 (Cloudera's Hadoop distribution)
    – Cloudera Manager (advised)
  7. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You pay only for the hours you need to solve your problems