How to: install and test Hive on a Hadoop cluster

After installing and configuring Hadoop to run in fully distributed mode on CentOS, we can move forward and test Hive.

Download a stable release from the Apache site and decompress it.

tar -xzvf hive-0.6.0.tar.gz
cp -rf hive-0.6.0 /usr/lib/hive
export PATH=$PATH:/usr/lib/hive/bin
# Hive needs HADOOP_HOME to point at the Hadoop installation
export HADOOP_HOME=/usr/lib/hadoop
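
Before starting Hive, it can help to confirm that the paths above really exist. The following is only a minimal sketch: the check_dir helper and the HIVE_HOME variable are names I made up for illustration, and the defaults are simply the paths used in this post.

```shell
#!/bin/sh
# Sketch: sanity-check the environment before starting Hive.
# Defaults are the paths used in this post; override them if yours differ.
HADOOP_HOME="${HADOOP_HOME:-/usr/lib/hadoop}"
HIVE_HOME="${HIVE_HOME:-/usr/lib/hive}"   # assumed helper variable for this check

check_dir() {
    # Print OK/MISSING for a directory; return non-zero when it is absent.
    if [ -d "$1" ]; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
        return 1
    fi
}

status=0
check_dir "$HADOOP_HOME" || status=1
check_dir "$HIVE_HOME/bin" || status=1

# The hive launcher should be reachable through PATH.
if command -v hive >/dev/null 2>&1; then
    echo "OK       hive found at $(command -v hive)"
else
    echo "MISSING  hive is not on PATH"
    status=1
fi

echo "environment check status: $status"
```

Note that the export lines only affect the current shell session, so they need to be repeated (or added to a shell profile) for new sessions.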

Then you can start the Hive command shell by running hive.

The hive launcher simply starts a standard Java application. Here are all the arguments used to start the Hive shell:
/usr/lib/jvm/java/bin/java -Xmx4096m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath "all the jars" org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-cli-0.6.0.jar org.apache.hadoop.hive.cli.CliDriver
CliDriver is a standard Java application that processes the user's commands.

By default, Hive uses an embedded Apache Derby (Java) database as its metastore to keep metadata.
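
One practical consequence: with the default embedded Derby settings, the metastore lives in a metastore_db directory created in whatever directory hive was started from, so tables defined from one directory are not visible when Hive is started from another. A small sketch to check for it (the has_local_metastore helper is a name invented here):

```shell
#!/bin/sh
# Sketch: check whether a directory holds an embedded Derby metastore.
# With the default configuration, Hive creates metastore_db (and derby.log)
# in the directory from which the hive command was started.
has_local_metastore() {
    [ -d "$1/metastore_db" ]
}

if has_local_metastore "$PWD"; then
    echo "Derby metastore found in $PWD"
else
    echo "No metastore_db in $PWD; starting hive here would create a new, empty one"
fi
```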

Given some raw input in HDFS, we will first create a table called rawtable with just one column to hold all the data, using the CREATE TABLE syntax:

CREATE TABLE rawtable (raw STRING);
LOAD DATA INPATH '/user/hadoop/input' INTO TABLE rawtable;
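
The same two statements can also be run non-interactively. Here is a rough sketch that writes them to a script file and passes it to hive -f. The INPUT_DIR and HQL_FILE variables are my own names, and the default input path is an assumption, so point it at your own HDFS directory.

```shell
#!/bin/sh
# Sketch: run the table setup non-interactively with "hive -f".
# INPUT_DIR is an assumption; set it to your own HDFS input directory.
INPUT_DIR="${INPUT_DIR:-/user/hadoop/input}"
HQL_FILE="${TMPDIR:-/tmp}/create_rawtable.hql"

# Write the two statements to a script file.
cat > "$HQL_FILE" <<EOF
CREATE TABLE rawtable (raw STRING);
LOAD DATA INPATH '$INPUT_DIR' INTO TABLE rawtable;
EOF

# Only call hive when it is actually installed on this machine.
if command -v hive >/dev/null 2>&1; then
    hive -f "$HQL_FILE"
else
    echo "hive is not on PATH; the script was written to $HQL_FILE"
fi
```

Keep in mind that LOAD DATA INPATH moves the files from the source HDFS directory into the table's directory rather than copying them.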

When you run a simple query such as select count(*) from rawtable; you can see that a MapReduce job is launched to compute the result.

If you check the HDFS file system, every table you just created is stored under /user/hive/warehouse/[tablename]
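
To see this for the table in this post, you can list its directory with the hadoop fs command. A small sketch, where the variable names are just for illustration:

```shell
#!/bin/sh
# Sketch: list the HDFS directory that backs a Hive table.
WAREHOUSE_DIR=/user/hive/warehouse   # warehouse location mentioned above
TABLE=rawtable
TABLE_DIR="$WAREHOUSE_DIR/$TABLE"

# Only call hadoop when it is actually installed on this machine.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls "$TABLE_DIR"
else
    echo "hadoop is not on PATH; would run: hadoop fs -ls $TABLE_DIR"
fi
```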

After that, we can write fairly complex queries, such as joins and subqueries.
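
As a rough illustration, the sketch below combines a subquery in the FROM clause with a join and runs it through hive -e. The lookup table is hypothetical (assumed to have a single string column named raw) and is not created anywhere in this post.

```shell
#!/bin/sh
# Sketch: a subquery in the FROM clause joined against a second table.
# "lookup" is a hypothetical table with one string column called raw.
QUERY=$(cat <<'EOF'
SELECT t.raw, t.cnt
FROM (
  SELECT raw, count(1) AS cnt
  FROM rawtable
  GROUP BY raw
) t
JOIN lookup l ON (t.raw = l.raw)
WHERE t.cnt > 1;
EOF
)

# Only call hive when it is actually installed on this machine.
if command -v hive >/dev/null 2>&1; then
    hive -e "$QUERY"
else
    echo "hive is not on PATH; the query would be:"
    echo "$QUERY"
fi
```

The inner query counts how often each raw line occurs, and the join keeps only the repeated lines that also appear in the lookup table.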

Besides the CLI, you can run 'hive --service hwi' to expose a simple web interface for data access at http://home:9999/hwi


We can also start a Hive server, so that remote users can use the Thrift API or JDBC to connect to this database powered by Hive.
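
For the Hive release used in this post, starting the server can be sketched roughly as below. HIVE_HOST is a variable I introduced for illustration, port 10000 is the usual default, and the JDBC URL format shown is the one used by the original Hive server (newer HiveServer2 releases use a different service name and URL scheme).

```shell
#!/bin/sh
# Sketch: start the Thrift-based Hive server and print a JDBC URL for clients.
HIVE_HOST="${HIVE_HOST:-localhost}"   # hypothetical host name for the URL
HIVE_PORT="${HIVE_PORT:-10000}"       # port the server should listen on
JDBC_URL="jdbc:hive://$HIVE_HOST:$HIVE_PORT/default"

# Only start the server when hive is actually installed on this machine.
if command -v hive >/dev/null 2>&1; then
    HIVE_PORT="$HIVE_PORT" hive --service hiveserver &
    echo "Hive server starting in the background (pid $!)"
else
    echo "hive is not on PATH; would run: HIVE_PORT=$HIVE_PORT hive --service hiveserver"
fi

echo "JDBC clients can connect with: $JDBC_URL"
```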

