After installing and configuring Hadoop to run in fully distributed mode on CentOS, we can move on and test Hive.
Download a stable release from the Apache site and decompress it.
tar -xvf hive-0.6.0.tar.gz
cp -rf hive-0.6.0 /usr/lib/hive
# need to set the HADOOP_HOME variable
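A minimal sketch of the environment setup mentioned above. The Hadoop location /usr/lib/hadoop is an assumption (the post does not say where Hadoop is installed), and HIVE_HOME plus the PATH entry are optional conveniences, so adjust them to your machine.

```shell
# Assumed install locations; adjust to match your cluster.
export HADOOP_HOME=/usr/lib/hadoop   # hypothetical path, not given in the post
export HIVE_HOME=/usr/lib/hive       # where the release was copied above
export PATH="$PATH:$HIVE_HOME/bin"   # lets you type "hive" from any directory
```

Putting these lines in the shell profile of the user that runs Hive keeps them across logins.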
Then you can run the hive command shell.
Hive here is just a standard Java application; here are all the arguments used to start the Hive shell:
Given raw input like this,
we will first create a table called rawtable with a single column to hold all the data, using the CREATE TABLE syntax described at http://wiki.apache.org/hadoop/Hive/HiveQL
Create table rawtable(raw string);
load data inpath '/user/hadpoop/input' into table rawtable;
When you run a simple query such as select count(*) from rawtable, you can see that a MapReduce job is invoked to compute the result.
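The same three statements can also be run non-interactively. The sketch below writes them to a script file and passes it to the Hive CLI with -f; the file name rawtable_demo.hql is made up for illustration, and the input path is copied as written in this post.

```shell
# Write the statements from this post into a script file.
HQL_FILE="${TMPDIR:-/tmp}/rawtable_demo.hql"   # hypothetical file name
cat > "$HQL_FILE" <<'EOF'
CREATE TABLE rawtable (raw STRING);
LOAD DATA INPATH '/user/hadpoop/input' INTO TABLE rawtable;
SELECT COUNT(*) FROM rawtable;
EOF

# Run the script only if the hive launcher is on PATH.
if command -v hive >/dev/null 2>&1; then
    hive -f "$HQL_FILE"
else
    echo "hive not found on PATH; script left at $HQL_FILE"
fi
```

Note that LOAD DATA INPATH moves the files from their HDFS location into the table's directory, so rerunning the script needs the input to be put back first.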
After that, we can write quite complex queries with joins and subqueries; see http://wiki.apache.org/hadoop/Hive/HiveQL
Besides the CLI, you can run 'hive --service hwi' to expose a simple web interface for data access at http://home:9999/hwi
Remote users can use the Thrift API or JDBC to connect to this database powered by Hive.
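For remote access, a Thrift server has to be running on the Hive machine. A rough sketch, assuming the hiveserver service that ships with Hive releases of this era and its HIVE_PORT environment variable; the function name start_hive_server is only an illustration, and port 10000 is assumed as the default.

```shell
# Start the Hive Thrift server in the background on a given port.
# Assumes "hive" is on PATH and that the launcher reads HIVE_PORT.
start_hive_server() {
    port="${1:-10000}"
    HIVE_PORT="$port" hive --service hiveserver &
    echo "started hiveserver on port $port (pid $!)"
}
```

Calling start_hive_server with no argument uses the assumed default port; Thrift and JDBC clients would then point at that host and port.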