Hive is a batch tool. Once we have the analysis result, we need to put it into a store that is available for real-time access. With Cassandra, that store is a Cassandra keyspace.
Given the previous result, let’s create a keyspace called OLAPks and a column family called result.
For the result column family, we use the city name as the row ID, and the number is stored in result:count.
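To picture this layout, the column family can be thought of as a nested map from row ID to column name to value. The sketch below is only an in-memory Python model, not Cassandra client code; the helper names and the city names and counts are made-up placeholders, not values from the earlier analysis.

```python
# In-memory model of the "result" column family (illustration only,
# not a Cassandra client). Outer key: row ID (city name).
# Inner dict: column name -> value.
result_cf = {}


def put_count(cf, city, count):
    """Store the count for a city under the 'count' column of its row."""
    cf.setdefault(city, {})["count"] = count


def get_count(cf, city):
    """Look up a single row by its row ID, as a real-time client would."""
    row = cf.get(city)
    return None if row is None else row["count"]


# Placeholder data, not taken from the original analysis.
put_count(result_cf, "CityA", 3)
put_count(result_cf, "CityB", 5)
```

A lookup such as `get_count(result_cf, "CityA")` touches exactly one row, which is why keying by city name suits real-time access.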
As with the hbase-hive handler, we can either create the keyspace and column family first and then add an external table in Hive,
or ask Hive to create the underlying keyspace and column family.
Let’s take the second approach.
In the Hive CLI, create an external table served by CassandraFS.
Then, in OpsCenter, you can see that the keyspace and column family are there.
You can also check via the Cassandra CLI with show schema.
Now we can insert some data into the result column family via cassandra-cli.
Then list the result.
You can also view the data through the OpsCenter data explorer.
Now let’s query the data through Hive.
Now we can load the Hive analysis result from last time into this external table.
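Conceptually, the load step copies each (city, count) row of the analysis output into a row keyed by city. The sketch below models that in plain Python under stated assumptions: `load_rows` and the sample rows are hypothetical, and the "last write wins" behaviour for a repeated key mirrors how Cassandra writes to an existing row key act as upserts. It is not the Hive job itself.

```python
def load_rows(cf, rows):
    """Copy (city, count) pairs into the column family model, keyed by city.

    A later value for the same city replaces the earlier one (upsert).
    """
    for city, count in rows:
        cf.setdefault(city, {})["count"] = int(count)
    return cf


# Hypothetical analysis output; counts arrive as strings, as they
# would from a text-based export.
analysis_output = [("CityA", "3"), ("CityB", "5"), ("CityA", "4")]
result_cf = load_rows({}, analysis_output)
```

Because the row ID is the city name, reloading a newer analysis result simply replaces the stored count for each city.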
From the OpsCenter Hadoop jobs, click full details to go to the familiar JobTracker admin UI.
Once the job is done, the result is in the Hive external table called result, which is mapped to a column family named result under OLAPks.
You can query it through Hive.
Or through the Cassandra API.
So, we have built an hbase-hive handler style approach in Cassandra: run the data analysis in Hive, then export the result to real-time CassandraFS. The data is then ready for real-time access by ad-hoc clients.
The next stop will be Solr integration. Stay tuned.