Thursday, March 29, 2012

DataStax DSE 2.0, Hadoop/Hive quick tutorial - 2

Hive is a batch tool. Once we have the analysis result, we need to put it into a store that supports real-time access; with Cassandra, that store is a Cassandra keyspace.
Given the previous result, let's create one keyspace called OLAPks and one column family called result. For the result column family,
we use the cityname as the row ID, and the number is stored in result:count.
image
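
To make the layout concrete, here is a minimal Python sketch of how (cityname, number) pairs from Hive would line up with rows in the result column family. The sample cities and numbers are made up for illustration, and the nested dict is only a mental model of a column family, not a Cassandra client call.

```python
# Hypothetical sample of the Hive analysis output: (cityname, number) pairs.
hive_rows = [("Beijing", 12), ("Shanghai", 7)]


def to_column_family(rows, column="count"):
    """Model a column family as {row_key: {column_name: value}}.

    The row key is the cityname and the number goes into a single
    column, mirroring the result:count layout described above.
    """
    return {city: {column: str(number)} for city, number in rows}


result_cf = to_column_family(hive_rows)
for row_key, columns in sorted(result_cf.items()):
    print(row_key, columns)
```

Values are kept as strings here only to keep the model simple; the actual storage type depends on how the column family is defined.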

As with the HBase-Hive handler, we can either create the keyspace and column family first and then add an external table in Hive,
or ask Hive to create the underlying keyspace and column family.

Now let's take the second approach.
In the Hive CLI, create an external table served by CassandraFS:
image
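
As a rough idea of what such a statement can look like, the Python helper below assembles a HiveQL string. Everything in it is an assumption to check against your DSE release: the storage handler class name, the `cassandra.ks.name`, `cassandra.cf.name` and `cassandra.columns.mapping` property names, the Hive column names `cityname` and `cnt`, and the mapping string, which simply follows the `result:count` notation used in this post.

```python
def create_external_table_hql(keyspace, column_family, mapping):
    """Return HiveQL for an external table backed by a Cassandra column family.

    Assumption: the handler class and SERDEPROPERTIES keys below match
    the Hive/Cassandra integration shipped with your DSE version.
    """
    return (
        f"CREATE EXTERNAL TABLE {column_family} (cityname string, cnt string)\n"
        "STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'\n"
        "WITH SERDEPROPERTIES (\n"
        f'  "cassandra.ks.name" = "{keyspace}",\n'
        f'  "cassandra.cf.name" = "{column_family}",\n'
        f'  "cassandra.columns.mapping" = "{mapping}"\n'
        ");"
    )


# Hypothetical mapping: Hive's first column is the row key, the second is the count.
ddl = create_external_table_hql("OLAPks", "result", ":key,result:count")
print(ddl)
```

The printed statement is meant to be pasted into the Hive CLI and adjusted, not run blindly.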

Then, in OpsCenter, you can see that the keyspace and column family are there:
image
or via the Cassandra CLI with show schema:
image

Now we can insert some data into the result CF via cassandra-cli:
image

Then list the result:
image
You can also view the data through the OpsCenter data explorer:

image


Now let's query the data through Hive:
image

Now we can load the Hive analysis result from last time into this external table:
image
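
A load like this is typically a single INSERT ... SELECT in Hive. The Python helper below builds one as a sketch; the source table name `city_counts` and the column names `cityname` and `cnt` are hypothetical stand-ins for whatever the previous analysis produced.

```python
def load_result_hql(target_table, source_table):
    """Return HiveQL that copies (cityname, cnt) rows into the external table."""
    return (
        f"INSERT OVERWRITE TABLE {target_table}\n"
        f"SELECT cityname, cnt FROM {source_table};"
    )


# "city_counts" is a placeholder for the Hive table holding last time's result.
print(load_result_hql("result", "city_counts"))
```

Running the generated statement in the Hive CLI starts a Hadoop job, which is why the next step looks at the job list.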
From the OpsCenter Hadoop jobs view, click full details to go to the familiar JobTracker admin UI.
image

Once the job is done, the result is in the Hive external table called result, which is mapped to a column family named result under OLAPks:
image
You can query it with Hive:
image
Or through the Cassandra API:
image

So we have built an HBase-Hive handler style approach in Cassandra: run the data analysis with Hive and export the result to real-time CassandraFS. The data is then ready for real-time access by ad-hoc clients.

The next stop will be Solr integration. Stay tuned.
