My Note on Solutions.: How to: install and test Hive on hadoop cluster

Monday, March 21, 2011

How to: install and test Hive on hadoop cluster

After installing and configuring Hadoop to run in fully distributed mode on CentOS, we can move on to installing and testing Hive.

Download a stable release from an Apache mirror and decompress it.

wget http://mirror.metrocast.net/apache//hive/stable/hive-0.6.0.tar.gz

tar -xvf hive-0.6.0.tar.gz
cp -rf hive-0.6.0 /usr/lib/hive
export PATH=$PATH:/usr/lib/hive/bin
# Hive needs HADOOP_HOME to point to the Hadoop installation
export HADOOP_HOME=/usr/lib/hadoop
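Before starting the shell, it can help to confirm that the settings above are visible in the current session. The following is only a minimal sketch: check_hive_env is a hypothetical helper name chosen for this note and is not part of Hive or Hadoop.

```shell
# check_hive_env: hypothetical helper (not part of Hive) that reports
# whether the settings from the install steps are visible in this shell.
check_hive_env() {
    status=0
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        status=1
    elif [ ! -d "$HADOOP_HOME" ]; then
        echo "HADOOP_HOME points to a missing directory: $HADOOP_HOME"
        status=1
    else
        echo "HADOOP_HOME is $HADOOP_HOME"
    fi
    if command -v hive >/dev/null 2>&1; then
        echo "hive found at $(command -v hive)"
    else
        echo "hive is not on PATH"
        status=1
    fi
    return "$status"
}

check_hive_env || echo "fix the items above before starting hive"
```

If either check fails, re-run the export lines in the same terminal, since they only apply to the session in which they were typed.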

Then you can start the Hive command shell by running hive.

Hive here is just a standard Java application. These are all the arguments used to start the Hive shell:

/usr/lib/jvm/java/bin/java -Xmx4096m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml -classpath "all the jars" org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-cli-0.6.0.jar org.apache.hadoop.hive.cli.CliDriver

CliDriver is a standard Java class that processes the user's commands.

By default, Hive uses an embedded Derby database as the store for its metadata.

Given raw input like this:

"01","35004","AL","ACMAR",86.51557,33.584132,6055,0.001499

we will first create a table called rawtable with a single column that holds each line of data, using the create table syntax described at http://wiki.apache.org/hadoop/Hive/HiveQL

create table rawtable(raw string);
load data inpath '/user/hadoop/input' into table rawtable;
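The load statement reads from an HDFS path, so the data has to be copied there first. Here is a rough sketch of that step using the sample row from this post. The local file name and the HDFS_INPUT value are assumptions and should be adjusted to match the path used in the load statement.

```shell
# Assumed locations; adjust to match the path used in the load statement.
LOCAL_SAMPLE="${TMPDIR:-/tmp}/zip_sample.csv"
HDFS_INPUT="/user/hadoop/input"

# Write the sample row from this post to a local file.
cat > "$LOCAL_SAMPLE" <<'EOF'
"01","35004","AL","ACMAR",86.51557,33.584132,6055,0.001499
EOF

# Sanity check: count the comma-separated fields in each row.
awk -F',' '{ print NF " fields" }' "$LOCAL_SAMPLE"

# Copy to HDFS only when the hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -put "$LOCAL_SAMPLE" "$HDFS_INPUT"
else
    echo "hadoop not on PATH; skipped upload of $LOCAL_SAMPLE"
fi
```

Note that load data inpath moves the file from its HDFS location into the table's directory rather than copying it.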

When you run a simple query such as select count(*) from rawtable, you can see that a MapReduce job is launched to compute the result.

If you check the HDFS file system, every table you create is stored under /user/hive/warehouse/[tablename]
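Both observations can also be checked from a regular shell instead of the interactive Hive prompt. This is a sketch only: run_or_show is a hypothetical helper written for this note, and it simply prints the command when the program is not installed.

```shell
# run_or_show: hypothetical helper. Runs the command if its program is
# installed; otherwise prints the command so the sketch works anywhere.
run_or_show() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Count the rows; Hive compiles this query into a MapReduce job.
run_or_show hive -e 'select count(*) from rawtable'

# List the files backing the table in the Hive warehouse directory.
run_or_show hadoop fs -ls /user/hive/warehouse/rawtable
```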

After that, we can write more complex queries with joins and subqueries. The syntax is described at http://wiki.apache.org/hadoop/Hive/HiveQL
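As one illustration, a subquery in the FROM clause could pull the state field out of the raw column and count rows per state. This is a sketch under assumptions: it assumes the split function is available in your Hive version, and the file name and the aliases t, state and row_count are made up for this example. Because the raw line is split on commas without further parsing, the state values keep their surrounding double quotes.

```shell
# Hypothetical query file; the file name and aliases are illustrative.
QUERY_FILE="${TMPDIR:-/tmp}/state_counts.hql"

# The inner select takes the third comma-separated field (index 2, the
# state); the outer select counts rows per state.
cat > "$QUERY_FILE" <<'EOF'
select t.state, count(1) as row_count
from (select split(raw, ',')[2] as state from rawtable) t
group by t.state;
EOF

# Run the file with the Hive CLI when it is installed.
if command -v hive >/dev/null 2>&1; then
    hive -f "$QUERY_FILE"
else
    echo "hive not on PATH; query saved to $QUERY_FILE"
fi
```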

Besides the CLI, you can run 'hive --service hwi' to expose a simple web interface for accessing data at http://home:9999/hwi

We can also start a Hive server, so that remote users can connect to the Hive-backed database through the Thrift API or JDBC.
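The original note does not show the server command, so the following is only a sketch under assumptions: that the server is started through the hiveserver service of the hive launcher, that it listens on port 10000 unless HIVE_PORT is set, and that JDBC clients use a jdbc:hive:// URL. The host name is a placeholder. The script only prints the command and does not start the server, because the server runs in the foreground.

```shell
# Assumptions: the host is a placeholder, and 10000 is taken as the
# default port for the Hive server of this release.
HIVE_HOST="localhost"
HIVE_PORT="${HIVE_PORT:-10000}"

# URL a remote JDBC client would use
# (assumed driver class: org.apache.hadoop.hive.jdbc.HiveDriver).
JDBC_URL="jdbc:hive://${HIVE_HOST}:${HIVE_PORT}/default"
echo "JDBC URL: $JDBC_URL"

# Print the start command instead of running it.
if command -v hive >/dev/null 2>&1; then
    echo "start the server with: HIVE_PORT=$HIVE_PORT hive --service hiveserver"
else
    echo "hive not on PATH; install it before starting the server"
fi
```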
