An Introduction to Apache Hive

What is Apache Hive in terms of big data and Hadoop?
How does it relate to business intelligence and
management reporting? Can it be used with Business
Objects?

Mike Frampton

July 10, 2013

Transcript

  1. Apache Hadoop Hive
    • What is it?
    • Architecture
    • Related Projects
    • Hive DDL
    • Hive DML
    • HiveQL Examples
    • Business Intelligence
  2. Hive – What is it?
    • A data warehouse for Hadoop
    • Open source, written in Java
    • Holds metadata in a relational database
    • Allows SQL-like queries
    • Supports “big data” data sets
    • Offers built-in and user-defined functions
    • Has indexing
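
The built-in functions and indexing mentioned on this slide can be explored from the Hive prompt. The following is a minimal sketch, assuming the customer table defined on the DDL slide. The index name customer_age_idx is hypothetical, and the CREATE INDEX syntax applies only to Hive releases that still ship index support (it was removed in Hive 3.0).

```sql
-- List the built-in functions and show help for one of them.
SHOW FUNCTIONS;
DESCRIBE FUNCTION upper;

-- Build a compact index on the age column (hypothetical index name).
CREATE INDEX customer_age_idx
  ON TABLE customer (age)
  AS 'COMPACT'
  WITH DEFERRED REBUILD;

-- A deferred index stays empty until it is rebuilt.
ALTER INDEX customer_age_idx ON customer REBUILD;
```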
  3. Hive – Architecture
    • Given an existing HDFS and Hadoop cluster
    • Then add Hive and the metadata structure
    • Use Flume and Sqoop to move data
    • Use the Hive LOAD DATA command to load from flat files
    • Use ODBC for connectivity to your BI layer
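
Data landed in HDFS by Flume or Sqoop can be exposed to Hive without copying it, by declaring an external table over the directory. A sketch, assuming comma-delimited text files; the table name customer_landing and the path /data/landing/customer are hypothetical.

```sql
-- External table: Hive records only the metadata, so dropping the
-- table leaves the files in HDFS untouched.
CREATE EXTERNAL TABLE customer_landing (age INT, address STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/data/landing/customer';
```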
  4. Hive – Related Projects
    • Apache Flume – move large data sets to Hadoop
    • Apache Sqoop – command line, move RDBMS data to Hadoop
    • Apache HBase – non-relational database
    • Apache Pig – analyse large data sets
    • Apache Oozie – workflow scheduler
    • Apache Mahout – machine learning and data mining
    • Apache Hue – Hadoop user interface
    • Apache ZooKeeper – configuration / build
  5. Hive - DDL
    • Create table
      hive> CREATE TABLE customer (age INT, address STRING);
    • Partitions
      hive> CREATE TABLE customer (age INT, address STRING) PARTITIONED BY (sdate STRING);
    • Show table
      hive> SHOW TABLES;
    • Describe table
      hive> DESCRIBE customer;
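
In a partitioned table the partition column (sdate here) is not stored in the data files; each partition value maps to its own subdirectory under the table's directory. A short sketch of working with partitions of the partitioned customer table; the date value is the one used in the later query examples.

```sql
-- Register an empty partition for one date.
ALTER TABLE customer ADD PARTITION (sdate = '2008-08-15');

-- List the partitions Hive knows about for this table.
SHOW PARTITIONS customer;

-- sdate is reported as an extra column, under the partition information.
DESCRIBE customer;
```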
  6. Hive - DDL
    • Alter table
      hive> ALTER TABLE customer ADD COLUMNS (age INT);
    • Drop table
      hive> DROP TABLE customer;
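
ADD COLUMNS appends new columns to the end of the table's column list, before any partition columns. Since the customer table created earlier already has an age column, the sketch below adds a different, hypothetical column (email) and shows a guarded drop.

```sql
-- Append a new column. Existing data files are not rewritten,
-- so the new column reads as NULL for rows loaded earlier.
ALTER TABLE customer ADD COLUMNS (email STRING COMMENT 'hypothetical column');

-- IF EXISTS avoids an error when the table is already gone.
DROP TABLE IF EXISTS customer;
```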
  7. Hive - DML
    • Loading flat files into Hive
      hive> LOAD DATA LOCAL INPATH './data/home/x1a.txt' OVERWRITE INTO TABLE customer;
    • No verification of incoming data
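
Two points follow from this slide. If customer is the partitioned version of the table, LOAD DATA needs a PARTITION clause naming the target partition. And because Hive applies the schema only when data is read, a field that cannot be parsed as the declared type shows up as NULL at query time rather than failing the load. A sketch, reusing the file path from the slide with an illustrative partition value:

```sql
-- Load the local file into one partition of the partitioned table.
LOAD DATA LOCAL INPATH './data/home/x1a.txt'
  OVERWRITE INTO TABLE customer
  PARTITION (sdate = '2008-08-15');

-- Rough sanity check after loading: count rows whose age is missing
-- or did not parse as an INT.
SELECT COUNT(*)
FROM customer a
WHERE a.sdate = '2008-08-15' AND a.age IS NULL;
```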
  8. HiveQL Examples
    • HiveQL, an SQL-like language
      hive> SELECT a.age FROM customer a WHERE a.sdate = '2008-08-15';
      selects the age column for one partition of the table but doesn't store the result
      hive> INSERT OVERWRITE DIRECTORY '/data/hdfs_file' SELECT a.* FROM customer a WHERE a.sdate = '2008-08-15';
      writes all columns of that partition of the customer table to an HDFS directory
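
HiveQL also supports aggregation in the usual SQL style. As a sketch building on the queries above, the following counts customers per age within one partition and then writes the same result to a local directory; the output path /tmp/customer_age_counts is hypothetical.

```sql
-- Count customers per age for a single partition.
SELECT a.age, COUNT(*) AS customers
FROM customer a
WHERE a.sdate = '2008-08-15'
GROUP BY a.age;

-- Same aggregation, written to a directory on the local file system.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/customer_age_counts'
SELECT a.age, COUNT(*)
FROM customer a
WHERE a.sdate = '2008-08-15'
GROUP BY a.age;
```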
  9. Hive – Business Intelligence
    • Use ODBC to connect Hive to your BI layer
    • Now you can use BI tools like Business Objects
      – Create a universe over the Hive instance
      – Create reports against the universe
      – Create ad hoc queries against the universe
  10. Contact Us
    • Feel free to contact us at
      – www.semtech-solutions.co.nz
      – [email protected]
    • We offer IT project consultancy
    • We are happy to hear about your problems
    • You can just pay for those hours that you need to solve your problems