Monday, March 21, 2011

How to: install and test Hive on a Hadoop cluster

After we install and configure Hadoop to run in fully distributed mode on CentOS, we can move forward and test Hive.

Download a stable release from the Apache site and decompress it.

wget http://mirror.metrocast.net/apache//hive/stable/hive-0.6.0.tar.gz
tar -xvf hive-0.6.0.tar.gz
cp -rf hive-0.6.0 /usr/lib/hive
export PATH=$PATH:/usr/lib/hive/bin
#need to setup the hadoop home variable
export HADOOP_HOME=/usr/lib/hadoop
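
To make these settings survive new shell sessions, you can append them to your profile (a minimal sketch, assuming the same paths as above):

# persist the Hive and Hadoop environment for future logins
echo 'export HADOOP_HOME=/usr/lib/hadoop' >> ~/.bashrc
echo 'export PATH=$PATH:/usr/lib/hive/bin' >> ~/.bashrc
source ~/.bashrc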

Then you can run the Hive command shell.

Hive here is just a standard Java application; here are all the arguments used to start the Hive shell:
/usr/lib/jvm/java/bin/java -Xmx4096m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath "all the jars" org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-cli-0.6.0.jar org.apache.hadoop.hive.cli.CliDriver
The CliDriver class is a standard Java entry point that processes the user's commands.
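
Because the CLI is an ordinary Java program, it can also be driven non-interactively, for example (a small sketch, not part of the original steps):

# run a single statement without entering the interactive shell
hive -e "show tables;"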

By default, Hive uses embedded Apache Derby as the store for its metadata (the metastore).
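
With this embedded setup, Derby keeps its files in whatever directory you launched hive from, so after running a few commands you should find something like this (a rough sketch of the default behaviour):

# the embedded Derby metastore and its log live in the launch directory
ls -d metastore_db derby.log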

Given raw input like this:

"01","35004","AL","ACMAR",86.51557,33.584132,6055,0.001499
we will first create a table called rawtable with just one column to hold all the data, using the CREATE TABLE syntax described at http://wiki.apache.org/hadoop/Hive/HiveQL

Create table rawtable (raw string);
load data inpath '/user/hadoop/input' into table rawtable;
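
To sanity-check the load, you can pull back a few rows (a hypothetical check, not part of the original steps):

hive -e "select * from rawtable limit 5;"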

When you run a simple query like select count(*) from rawtable, you can see that a MapReduce job is invoked to compute the result.

If you check the HDFS file system, every table you create is stored under /user/hive/warehouse/[tablename].
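
For example, you can list the table's directory directly (assuming the default warehouse location):

hadoop fs -ls /user/hive/warehouse/rawtable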

After that, we can write quite complex queries such as joins and subqueries; check out the syntax here: http://wiki.apache.org/hadoop/Hive/HiveQL
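
As a small illustration, a subquery in the FROM clause over the table above could look like this (a sketch, not taken from the original post):

hive -e "select t.cnt from (select count(1) as cnt from rawtable) t;"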

Besides the CLI, you can run 'hive --service hwi' to expose a simple web interface for data access at http://home:9999/hwi.
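
If you want to confirm the interface is up, a quick check looks like this (a sketch; it assumes the default bind settings, which come from the hive.hwi.listen.host and hive.hwi.listen.port properties):

# start the web interface in the background and poke it
hive --service hwi &
curl http://localhost:9999/hwi/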


Also, we can start a Hive server.
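
A minimal sketch of the command (HiveServer1 in this release line; it listens on port 10000 by default):

# start the standalone Thrift service
hive --service hiveserver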

Remote users can then use the Thrift API or JDBC to connect to this database powered by Hive.
