query for Hadoop
• Open source
• Developed by Cloudera
• Based on Google's 2010 Dremel paper
• Direct data access via the Impala engine
• Future Hadoop Parquet update will
  – Add columnar binary storage to Hadoop
  – Improve Impala performance
www.semtech-solutions.co.nz [email protected]
access
• Query planning / coordination on data nodes
• Node-based query engine
• Low latency
• Performance improvement
• Query data on HDFS or HBase
• Uses the same HiveQL syntax (SQL-like)
• Has the Hue GUI
• Allows table joins and aggregation
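The joins and aggregation bullet can be illustrated with a small query. This is a minimal sketch, not an Impala session: the tables `customers` and `orders` and their columns are hypothetical, and the query runs against Python's built-in `sqlite3` module only so that the example is self-contained. The SELECT statement itself uses only plain SQL constructs (JOIN, GROUP BY, COUNT, SUM).

```python
import sqlite3

# Hypothetical tables and columns, chosen only for illustration.
QUERY = """
SELECT c.region AS region,
       COUNT(*) AS order_count,
       SUM(o.amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region
"""


def run_demo():
    """Load a few sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "north"), (2, "south"), (3, "north")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Sort in Python so the result order does not depend on the engine.
    return sorted(rows)


if __name__ == "__main__":
    for region, order_count, total_amount in run_demo():
        print(region, order_count, total_amount)
```

The join matches each order to its customer, and the GROUP BY collapses the matched rows into one row per region.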
queries
  – hardware limitations
  – Min 3 times
• Complex
  – multiple MapReduce stages
  – Min 7 times
• Cached queries
  – Min 20 times
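To put the minimum factors in concrete terms, here is a small arithmetic sketch. It assumes the "times" figures are speed-up factors relative to some baseline runtime for the same query; the slide text does not name the baseline, and the dictionary keys and the 600-second figure in the usage lines are made up for illustration.

```python
# Minimum speed-up factors as listed on the slide.
# The keys are labels invented here for illustration.
MIN_SPEEDUP = {
    "hardware_limited": 3,
    "complex_multi_stage": 7,
    "cached": 20,
}


def max_expected_runtime(baseline_seconds, query_type):
    """Upper bound on runtime if the minimum speed-up factor holds.

    Assumes the slide's factor is a speed-up over baseline_seconds.
    """
    return baseline_seconds / MIN_SPEEDUP[query_type]


if __name__ == "__main__":
    # Hypothetical baseline of 600 seconds for every query type.
    for query_type in MIN_SPEEDUP:
        print(query_type, round(max_expected_runtime(600, query_type), 1))
```

Because the factors are minimums, the computed values are upper bounds on the runtime, not predictions.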
which can be compressed as
  • Snappy
  • GZIP
  • BZIP
– Future support for
  • Avro
  • RCFile
  • LZO text file
  • Parquet
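As a rough illustration of two of the compression codecs listed above, the sketch below compresses newline-delimited text with Python's standard `gzip` and `bz2` modules. It assumes that "BZIP" on the slide refers to bzip2 and that the data being compressed is plain text; Snappy is left out because it needs a third-party package. The function names and the sample rows are hypothetical.

```python
import bz2
import gzip


def compress_rows(rows):
    """Join rows into newline-delimited text and compress it two ways."""
    raw = "\n".join(rows).encode("utf-8")
    return {
        "raw": raw,
        "gzip": gzip.compress(raw),
        "bzip2": bz2.compress(raw),
    }


def decompress(codec, blob):
    """Reverse compress_rows for a single codec name."""
    if codec == "gzip":
        return gzip.decompress(blob)
    if codec == "bzip2":
        return bz2.decompress(blob)
    raise ValueError("unknown codec: " + codec)


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample data.
    blobs = compress_rows(["1,alice,10.0"] * 1000)
    for codec, blob in blobs.items():
        print(codec, len(blob), "bytes")
```

Repetitive text like the sample rows compresses very well with both codecs; real data will show smaller gains.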
www.semtech-solutions.co.nz – [email protected]
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours that you need to solve your problems