Thursday, September 27, 2012

Cassandra DSE, Testing Solr integration

Solr support is one of DataStax's commercial offerings for Cassandra. It basically enables us to run real-time Solr queries against the data in Cassandra. Here is a basic trial of the feature.

When you create the DSE cluster, you can give some nodes an extra role by editing /etc/default/dse to enable Hadoop or Solr.
Setting SOLR_ENABLED or HADOOP_ENABLED adds one more role to that Cassandra node.
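
As a rough sketch of that edit, the helper below flips one of those flags in the text of a config file. It assumes the flags are plain NAME=0 / NAME=1 lines, and the sample text is made up for illustration; check your own /etc/default/dse before relying on it.

```python
import re


def set_role_flag(config_text: str, flag: str, enabled: bool) -> str:
    """Return config_text with `flag` set to 1 or 0 (assumed value format)."""
    new_line = f"{flag}={'1' if enabled else '0'}"
    pattern = re.compile(rf"^{re.escape(flag)}=.*$", re.MULTILINE)
    if pattern.search(config_text):
        # Flag already present: rewrite that line in place.
        return pattern.sub(new_line, config_text)
    # Flag missing: append it, making sure it starts on its own line.
    sep = "" if config_text.endswith("\n") or not config_text else "\n"
    return config_text + sep + new_line + "\n"


# Hypothetical file contents, not copied from a real node.
sample = "HADOOP_ENABLED=0\nSOLR_ENABLED=0\n"
updated = set_role_flag(sample, "SOLR_ENABLED", True)
print(updated)
```

The function only transforms text; writing it back to the file (and restarting the node) is left out on purpose.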

I have 8 nodes in the whole cluster: 3 regular Cassandra nodes, 3 Hadoop nodes, and 2 Solr nodes. You can see this from the OpsCenter view, or just through nodetool.



Now let's create a keyspace called solrtest through the OpsCenter admin UI.

Then create a column family called info using cqlsh, and load some data.


DSE has a default mapping between the data stored in Cassandra and the data indexed in Solr.

By default, each node is mapped to a shard; in my case, I have 2 Solr nodes, which means I have 2 Solr shards. Each column family is mapped to a core, so here I need to tell the system that I want to index the info column family. Each column in that column family is mapped to a Solr field. We can pick which columns are indexed through a configuration file called schema.xml, the same file used in plain Solr.

In /usr/share/dse-demo/wikipedia there are some sample schemas and scripts.
We change schema.xml first. We only need two fields to be indexed, and the default search field is comments.

Then we need to post this XML to Solr; the demo directory has a script for this.

Change the mapping URL in the script; it should be http://ANYSOLRNODE:8983/solr/resource/KEYSPACE.CF/solrconfig.xml

Then we run the script to post the schema and the config file. (You only need to run this against one Solr server; the Solr servers share this configuration.)
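
For reference, the upload step can be sketched in Python instead of the demo script. This only builds the POST request against the resource URL pattern shown above; the host, keyspace, column family and the text/xml content type are my assumptions, and nothing is sent until you call urlopen yourself.

```python
import urllib.request


def resource_url(host: str, keyspace: str, column_family: str,
                 filename: str, port: int = 8983) -> str:
    """Build the /solr/resource/KEYSPACE.CF/FILE URL used for uploads."""
    return f"http://{host}:{port}/solr/resource/{keyspace}.{column_family}/{filename}"


def build_upload(host: str, keyspace: str, column_family: str,
                 filename: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying an XML config file."""
    return urllib.request.Request(
        resource_url(host, keyspace, column_family, filename),
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed header
        method="POST",
    )


# Placeholder values; in practice `body` would be the bytes of schema.xml.
req = build_upload("localhost", "solrtest", "info", "schema.xml", b"<schema/>")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

The same helper covers solrconfig.xml by changing the filename argument.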

Once that is done, you can open
http://ANYSOlrServer:8983/solr/KEYSPACE.CF/admin/ to see the Solr admin UI.

When you run a search, you can see the docs returned as expected.

If we change the query to comments:hello, then only the first doc is returned.
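
The same query can also be issued outside the admin UI. The sketch below builds a URL for Solr's standard select handler on the KEYSPACE.CF core; the host name, the /select path on this core and the wt=json parameter are assumptions on my side, not something taken from the screenshots.

```python
from urllib.parse import urlencode


def select_url(host: str, keyspace: str, column_family: str,
               query: str, port: int = 8983, **params: str) -> str:
    """Build a Solr select URL for the core named KEYSPACE.CF."""
    # urlencode percent-escapes the query, so "comments:hello" is sent safely.
    qs = urlencode({"q": query, "wt": "json", **params})
    return f"http://{host}:{port}/solr/{keyspace}.{column_family}/select?{qs}"


url = select_url("localhost", "solrtest", "info", "comments:hello")
print(url)
```

Fetching that URL with urllib.request.urlopen (or pasting it into a browser) should return the matching docs, assuming the core is reachable.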

If we insert more data, it is indexed on the fly.
Searching Solr again, 2 docs are returned.


At the same time, you can use CQL to query the data using Solr syntax.

So that's it. You may be wondering what happens underneath when we run this query.

Basically, it queries all the Solr nodes to run a distributed, sharded query, gets the keys of the matching items, and pulls the data from Cassandra.
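
To make that flow concrete, here is a toy model of it. Everything in it is a hypothetical in-memory stand-in (two dicts as shards, one dict as the column family); it is not how DSE is implemented, and it ignores ranking, paging and failures.

```python
from typing import Dict, List

# Hypothetical shards: each maps a row key to the indexed "comments" text.
shards: List[Dict[str, str]] = [
    {"1": "hello world"},
    {"2": "goodbye", "3": "hello again"},
]

# Hypothetical stand-in for the full rows stored in Cassandra.
cassandra_rows: Dict[str, Dict[str, str]] = {
    "1": {"name": "a", "comments": "hello world"},
    "2": {"name": "b", "comments": "goodbye"},
    "3": {"name": "c", "comments": "hello again"},
}


def search_shard(shard: Dict[str, str], term: str) -> List[str]:
    """Return the keys in one shard whose text contains the term as a word."""
    return [key for key, text in shard.items() if term in text.split()]


def distributed_search(term: str) -> List[Dict[str, str]]:
    """Ask every shard for matching keys, then fetch those rows by key."""
    keys: List[str] = []
    for shard in shards:  # scatter the query to each shard
        keys.extend(search_shard(shard, term))
    return [cassandra_rows[key] for key in sorted(keys)]  # gather full rows


results = distributed_search("hello")
print([row["name"] for row in results])
```

The point of the sketch is the two-phase shape: the index answers with keys, and the row data comes from Cassandra.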


Matt said...

Hi Ryan,

Thanks for the post -
Were you able to figure out how to index data directly in Solr that is not first entered/created in Cassandra ?
It seems like it is possible : " If you HTTP post the files to a non-existing column keyspace or column family, DSE Search creates the keyspace and column family, and then starts indexing the data. F"


Matt said...

Also, everything worked for me also, except i got
an error on the Solr search UI when querying; i had to do a '
update keyspace WITH placement_strategy = 'NetworkTopologyStrategy' and strategy_options=[{Solr:1}]; '
to make things work ..

Rahul said...

Here i am using DE-3.0.
My cassandra database output is :--
cqlsh:mykeyspace> SELECT * FROM mysolr WHERE KEY =124;
KEY | Date_Time | body | date | name | title
124 | 2013-02-11 10:10:10+0530 | A chicken in every pot ... | dec 15, 1933 | Roosevelt | fireside chat

My schema.xml configuration is:--


But the Solr query output does not contain a value for Date_Time.
The output looks like this:--
124,A chicken in every pot ...,fireside chat,Roosevelt,"dec 15, 1933"

What am I missing? Please help me. I badly need it.
