Solr support is one of DataStax's commercial offerings for Cassandra. It lets us run real-time Solr queries against the data stored in Cassandra. Here is a basic walkthrough of the feature.
When you create the DSE cluster, you can give individual nodes an extra role by editing /etc/default/dse to enable Hadoop or Solr.
Setting SOLR_ENABLED or HADOOP_ENABLED there adds that role to the Cassandra node.
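As a rough illustration of that edit, the sketch below flips a flag in the text of a defaults file. It assumes the file holds simple NAME=VALUE lines and that 1 means enabled; the helper name set_flag and the sample contents are made up for the example, so check your own /etc/default/dse before relying on it.

```python
import re

def set_flag(text: str, name: str, value: int) -> str:
    """Return `text` with the NAME=VALUE line for `name` set to `value`.

    If the flag is not present yet, the line is appended.
    """
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    line = f"{name}={value}"
    if pattern.search(text):
        return pattern.sub(line, text)
    # Make sure the existing text ends with a newline before appending.
    base = text if text.endswith("\n") or not text else text + "\n"
    return base + line + "\n"

# Hypothetical file contents, not copied from a real node.
sample = "HADOOP_ENABLED=0\nSOLR_ENABLED=0\n"
updated = set_flag(sample, "SOLR_ENABLED", 1)
print(updated)
```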
My cluster has 8 nodes in total: 3 regular Cassandra nodes, 3 Hadoop nodes, and 2 Solr nodes. You can see this in the OpsCenter view, or through nodetool.
Now let's create a keyspace called solrtest through the OpsCenter admin UI.
Then create a column family called info using cqlsh, and load some data.
DSE has a default mapping between the data stored in Cassandra and the data to be indexed in Solr.
By default, each node maps to a shard; I have 2 Solr nodes, so I have 2 Solr shards. Each column family maps to a core, so here I need to tell the system that I want to index the info column family. Each column in that column family maps to a Solr field. We pick which columns get indexed through a configuration file called schema.xml, which is the same file used in plain Solr.
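To make the mapping concrete, here is a small sketch that writes it down as a data structure. The node names and the id column are placeholders I made up; the core name follows the KEYSPACE.CF pattern that shows up in the Solr URLs.

```python
def solr_mapping(solr_nodes, keyspace, column_family, indexed_columns):
    """Describe how Cassandra objects line up with Solr ones."""
    return {
        "shards": list(solr_nodes),             # one shard per Solr node
        "core": f"{keyspace}.{column_family}",  # one core per column family
        "fields": list(indexed_columns),        # one field per indexed column
    }

# Hypothetical node names and columns, for illustration only.
mapping = solr_mapping(["solr1", "solr2"], "solrtest", "info", ["id", "comments"])
print(mapping)
```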
In /usr/share/dse-demo/wikipedia there are some sample schemas and scripts.
We change schema.xml first. We only need two fields to be indexed, and the default search field is comments.
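For a feel of what such a trimmed-down schema.xml looks like, the sketch below generates one. It is not the file from the demo directory: the id field, the two field types, and the schema name are my assumptions, and only the comments default search field comes from the setup described here. The element names follow the classic Solr schema.xml layout.

```python
import xml.etree.ElementTree as ET

def build_schema(name, fields, default_field, unique_key):
    """Build a minimal Solr schema.xml as a string.

    `fields` maps a field name to the name of its fieldType.
    """
    schema = ET.Element("schema", {"name": name, "version": "1.1"})

    # Two field types: an untokenized string and a tokenized text type.
    types = ET.SubElement(schema, "types")
    ET.SubElement(types, "fieldType", {"name": "string", "class": "solr.StrField"})
    text = ET.SubElement(types, "fieldType", {"name": "text", "class": "solr.TextField"})
    analyzer = ET.SubElement(text, "analyzer")
    ET.SubElement(analyzer, "tokenizer", {"class": "solr.StandardTokenizerFactory"})

    # One <field> per column we want indexed.
    fields_el = ET.SubElement(schema, "fields")
    for field_name, field_type in fields.items():
        ET.SubElement(fields_el, "field", {
            "name": field_name, "type": field_type,
            "indexed": "true", "stored": "true",
        })

    ET.SubElement(schema, "defaultSearchField").text = default_field
    ET.SubElement(schema, "uniqueKey").text = unique_key
    return ET.tostring(schema, encoding="unicode")

# "id" is an assumed key column; "comments" is the default search field.
schema_xml = build_schema("solrtest_info", {"id": "string", "comments": "text"},
                          default_field="comments", unique_key="id")
print(schema_xml)
```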
Then we need to post this XML to Solr. There is a script for that called 1-add-schema.sh.
Change the mapping URL in the script; it should be http://ANYSOLRNODE:8983/solr/resource/KEYSPACE.CF/solrconfig.xml
Then we run the script to post the schema and the config file. (You only need to run this against one Solr server; the Solr servers share the configuration.)
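If you would rather see the upload spelled out than read the shell script, here is a sketch of the same idea with Python's standard library. It only builds the HTTP requests and does not send them. The host solr1, the XML bodies, and the helper names are placeholders, and I am assuming schema.xml goes to the same resource path as solrconfig.xml, with an XML content type.

```python
import urllib.request

def resource_url(host, keyspace, column_family, resource, port=8983):
    """URL of a Solr resource for the core named KEYSPACE.CF."""
    return f"http://{host}:{port}/solr/resource/{keyspace}.{column_family}/{resource}"

def build_upload(host, keyspace, column_family, resource, body: bytes):
    """Prepare (but do not send) a POST that uploads one config file."""
    return urllib.request.Request(
        resource_url(host, keyspace, column_family, resource),
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

# Placeholder bodies; in practice these would be the real file contents.
uploads = [
    build_upload("solr1", "solrtest", "info", "solrconfig.xml", b"<config/>"),
    build_upload("solr1", "solrtest", "info", "schema.xml", b"<schema/>"),
]
for req in uploads:
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send the request.
```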
Once that is done, you can open
http://ANYSOlrServer:8983/solr/KEYSPACE.CF/admin/ to see the Solr admin UI.
When you run a search, you can see the docs returned as expected.
If we change the query to comments:hello, only the first doc is returned.
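The same search can be run outside the admin UI by hitting the core's select handler. Here is a sketch that builds the URL; the host name solr1 and the wt/rows parameters are my additions, while the core path follows the KEYSPACE.CF pattern used for the admin page.

```python
from urllib.parse import urlencode

def select_url(host, keyspace, column_family, query, port=8983, rows=10):
    """Build a Solr select URL for the core named KEYSPACE.CF."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"http://{host}:{port}/solr/{keyspace}.{column_family}/select?{params}"

# Hypothetical host; the query is the one used in the text.
url = select_url("solr1", "solrtest", "info", "comments:hello")
print(url)
```

Fetching that URL with any HTTP client should give back the matching docs as JSON.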
If we insert more data, it is indexed on the fly.
Searching Solr again, 2 docs are returned.
At the same time, you can use CQL to query the data using Solr syntax.
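As far as I understand it, DSE exposes this through a solr_query column in the WHERE clause. The sketch below just builds such a statement as a string, so you can paste it into cqlsh. The helper name is made up, and the exact syntax may differ between DSE versions.

```python
def solr_cql(keyspace, table, solr_query, limit=None):
    """Build a CQL SELECT that passes a Solr query through solr_query."""
    # Single quotes inside a CQL string literal are escaped by doubling them.
    escaped = solr_query.replace("'", "''")
    cql = f"SELECT * FROM {keyspace}.{table} WHERE solr_query='{escaped}'"
    if limit is not None:
        cql += f" LIMIT {int(limit)}"
    return cql + ";"

statement = solr_cql("solrtest", "info", "comments:hello")
print(statement)
```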
So that's it. You may be wondering what happens underneath when we run this query.
Basically, it runs a distributed query across the shards on all the Solr nodes, gets the keys of the matching items, and then pulls the data from Cassandra.
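Here is a toy model of that flow, just to make the two phases visible. It is not how DSE is implemented: the shards are plain dicts, matching is a simple word lookup, and real Solr would also rank and merge results by score. All the data is made up.

```python
def distributed_search(shards, store, term):
    """Ask every shard for matching keys, then fetch the rows by key.

    shards: list of dicts mapping row key -> indexed text on that shard
    store:  dict mapping row key -> full row (stands in for Cassandra)
    """
    keys = []
    for shard in shards:
        # Phase 1: each shard returns only the keys of its matching docs.
        keys.extend(k for k, text in shard.items() if term in text.lower().split())
    # Phase 2: pull the full rows from the data store by key.
    return [store[k] for k in sorted(keys)]

# Made-up rows spread over two shards.
store = {
    1: {"id": 1, "comments": "hello world"},
    2: {"id": 2, "comments": "goodbye"},
    3: {"id": 3, "comments": "hello again"},
}
shards = [
    {1: "hello world", 2: "goodbye"},
    {3: "hello again"},
]
rows = distributed_search(shards, store, "hello")
print(rows)
```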