Solr support is one of the DataStax commercial offerings for Cassandra. It basically lets us run real-time Solr queries against the data stored in Cassandra. Here is a basic try of the feature.
When you create the DSE cluster, you can give some nodes an extra role by editing /etc/default/dse to enable Hadoop or Solr.
Setting SOLR_ENABLED or HADOOP_ENABLED adds that role to the Cassandra node.
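If you have many nodes to touch, the edit can be scripted. Here is a minimal sketch in Python, assuming the file holds plain shell-style NAME=VALUE lines and that 1 means enabled (check your own /etc/default/dse before trusting this). The helper name set_dse_flag is made up for this post, and it only works on the file content as a string, so nothing is written to disk here.

```python
import re


def set_dse_flag(config_text: str, name: str, enabled: bool) -> str:
    """Return config_text with NAME set to 1 or 0.

    Assumption: flags are written as NAME=VALUE, one per line,
    and 1 means "enabled". The line is appended if it is missing.
    """
    new_line = f"{name}={'1' if enabled else '0'}"
    pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
    if pattern.search(config_text):
        # Replace the existing assignment in place.
        return pattern.sub(new_line, config_text, count=1)
    # Flag not present yet: append it on its own line.
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + new_line + "\n"
```

You would read /etc/default/dse, pass its content through set_dse_flag(content, "SOLR_ENABLED", True), write it back, and restart DSE on that node.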
In my case, the whole cluster has 8 nodes: 3 regular Cassandra nodes, 3 Hadoop nodes, and 2 Solr nodes. You can see this in the OpsCenter view, or just through nodetool.
Then create a column family called info using cqlsh, and load some data.
By default, each node is mapped to a shard. In my case I have 2 Solr nodes, which means I have 2 Solr shards. A column family (CF) is mapped to a core, so here I need to tell the system that I want to index the info column family. The columns in that CF are mapped to Solr fields. We can pick which columns get indexed through a configuration file called schema.xml, which is the same file used in plain Solr.
In /usr/share/dse-demo/wikipedia there are some sample schemas and scripts.
We change schema.xml first. Basically we just need two fields to be indexed, and the default search field is comments.
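To give an idea of what such a two-field schema looks like, here is a rough sketch that builds one in Python. The comments field comes from my setup above; the key field name id, the schema name, and the two field types are assumptions for illustration, so adapt them to your own column family rather than copying them as-is.

```python
import xml.etree.ElementTree as ET


def build_schema(key_field: str, text_field: str) -> str:
    """Build a minimal Solr schema.xml with one key field and one text field."""
    schema = ET.Element("schema", name="info", version="1.1")

    # Field types: an untokenized string type and a tokenized text type.
    types = ET.SubElement(schema, "types")
    ET.SubElement(types, "fieldType", {"name": "string", "class": "solr.StrField"})
    text_type = ET.SubElement(types, "fieldType", {"name": "text", "class": "solr.TextField"})
    analyzer = ET.SubElement(text_type, "analyzer")
    ET.SubElement(analyzer, "tokenizer", {"class": "solr.StandardTokenizerFactory"})

    # The two indexed fields; each one corresponds to a column in the CF.
    fields = ET.SubElement(schema, "fields")
    ET.SubElement(fields, "field", name=key_field, type="string", indexed="true", stored="true")
    ET.SubElement(fields, "field", name=text_field, type="text", indexed="true", stored="true")

    # Queries without an explicit field name search this field.
    ET.SubElement(schema, "defaultSearchField").text = text_field
    ET.SubElement(schema, "uniqueKey").text = key_field
    return ET.tostring(schema, encoding="unicode")
```

Calling build_schema("id", "comments") returns the XML as a string, which you can save as schema.xml.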
Then we need to post this XML to Solr. There is a script for that called 1-add-schema.sh.
Change the mapping URL in the script. It should be http://ANYSOLRNODE:8983/solr/resource/KEYSPACE.CF/solrconfig.xml
Then we run the script to post the schema and the config file. (You only need to run this against one Solr server; the Solr servers share the configuration.)
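If you would rather not edit the shell script, the same upload can be sketched in Python with only the standard library. This is not what 1-add-schema.sh contains, just my approximation of it. I am assuming that schema.xml goes to the same resource path as solrconfig.xml with only the file name changed, and that a text/xml content type is accepted; the function names are my own.

```python
import urllib.request


def resource_url(host: str, keyspace: str, column_family: str,
                 filename: str, port: int = 8983) -> str:
    """Build the resource URL for one config file of a KEYSPACE.CF core."""
    return f"http://{host}:{port}/solr/resource/{keyspace}.{column_family}/{filename}"


def post_resource(url: str, xml_path: str) -> int:
    """POST a local XML file to the given URL and return the HTTP status code."""
    with open(xml_path, "rb") as xml_file:
        body = xml_file.read()
    request = urllib.request.Request(
        url,
        data=body,
        # Assumption: the server accepts the raw XML with this content type.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, post_resource(resource_url("ANYSOLRNODE", "KEYSPACE", "CF", "solrconfig.xml"), "solrconfig.xml") uploads the config, and the same call with "schema.xml" uploads the schema. One Solr node is enough.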
Once that is done, you can open
http://ANYSOlrServer:8983/solr/KEYSPACE.CF/admin/ to see the Solr admin UI and run a query.
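The admin UI is not the only way in. A query can also be sent over plain HTTP. The sketch below assumes the core exposes the standard Solr select handler under the same KEYSPACE.CF path and that JSON output (wt=json) is available in your Solr version; the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode


def select_url(host: str, keyspace: str, column_family: str,
               query: str, rows: int = 10, port: int = 8983) -> str:
    """Build a Solr select URL for the KEYSPACE.CF core."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"http://{host}:{port}/solr/{keyspace}.{column_family}/select?{params}"


def run_query(url: str) -> dict:
    """Fetch the URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Since comments is the default search field, a query like "cassandra" and a query like "comments:cassandra" should hit the same field.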
So that’s it. You may be wondering what happens underneath when we run this query.
Basically, it queries all Solr nodes to run a distributed (sharded) query, gets the keys of the matching items, and then pulls the data from Cassandra.
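To make that flow concrete, here is a toy model of it in Python. This is only my mental picture, not how DSE is actually implemented: each Solr shard is faked as a dict from row key to indexed text, Cassandra is faked as a dict from row key to full row, and matching is a naive whole-word check.

```python
def distributed_search(shards: list, cassandra_rows: dict, term: str) -> list:
    """Toy scatter-gather: ask every shard for matching keys, then fetch the rows.

    shards: one dict per Solr node, mapping row key -> indexed text.
    cassandra_rows: dict mapping row key -> full row (stands in for Cassandra).
    """
    matching_keys = []
    # Step 1: send the query to every shard and collect the matching keys.
    for shard in shards:
        for key, text in shard.items():
            if term.lower() in text.lower().split():
                matching_keys.append(key)
    # Step 2: pull the actual data for those keys from the row store.
    return [cassandra_rows[key] for key in sorted(matching_keys)]
```

The point is just that the index only has to hand back keys, while the rows themselves stay in Cassandra.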