Monday, July 26, 2010

How to create a Cassandra cluster on a single PC (Windows tutorial)

I have been playing with Cassandra for a couple of days, so I will summarize what you need to run a cluster on a single PC. Here is a tutorial for creating and running a Cassandra cluster on Windows.

Before you try to create and run a cluster, please check the following requirements.

  • Install a JDK/JRE
    • a 32-bit or 64-bit JVM
  • Pick three or four TCP ports
    • Cassandra storage
      • default is 7000
    • Thrift Listener
      • for the remote client connection
      • default is 9160
    • JMX monitoring port, one per node: 8080 for the first node, 9080 for the second
    • JDWP debugging port (optional)
  • At least two IP addresses you can use. Each member needs one dedicated IP address.
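
Before going further, it can help to confirm that the ports above are actually free. Here is a minimal Python sketch (not part of Cassandra) that tries to bind each address/port pair. The PLAN table and the function names are my own. Checking the JMX ports on 0.0.0.0 is an assumption, based on the fact that the two nodes need different JMX ports.

```python
import socket

def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True

# Ports taken from the checklist above. The grouping is hypothetical.
PLAN = {
    "127.0.0.1": [7000, 9160],   # node 1: storage, Thrift
    "127.0.0.2": [7000, 9160],   # node 2: storage, Thrift
    "0.0.0.0": [8080, 9080],     # JMX ports (assumed not tied to one address)
}

def check_ports(plan):
    """Map every (host, port) pair in the plan to True (free) or False (busy)."""
    return {
        (host, port): port_is_free(host, port)
        for host, ports in plan.items()
        for port in ports
    }

if __name__ == "__main__":
    for (host, port), free in check_ports(PLAN).items():
        print(f"{host}:{port} {'free' if free else 'IN USE or unavailable'}")
```

A result of "IN USE or unavailable" for 127.0.0.2 can also mean that the operating system does not let you bind that address at all, so treat it as a hint rather than a diagnosis.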

In this example, I will set up one cluster named HelloCassandra with two storage nodes.

  • Download the Cassandra bits from http://cassandra.apache.org/. I downloaded 0.6.3.
    • Unzip it to a folder like C:\apache-cassandra-0.6.3
  • Make two copies of the conf folder and name them conf1 and conf2. We are going to use one codebase but create two separate configurations.
    • C:\apache-cassandra-0.6.3\conf1
    • C:\apache-cassandra-0.6.3\conf2
  • Go to the conf1 folder and change two config files. conf1 will listen on 127.0.0.1 and act as the seed node. (Cassandra uses a gossip-based clustering protocol; we will configure this node as the seed node.)
    • C:\apache-cassandra-0.6.3\conf1\log4j.properties
      • # Edit the next line to point to your logs directory
        log4j.appender.R.File=/var/log/cassandra/system.log
      • change the path to log4j.appender.R.File=/var/log/cassandra/c1/system.log
      • this node will then write its log to c:/var/log/cassandra/c1/system.log
    • C:\apache-cassandra-0.6.3\conf1\storage-conf.xml
      • Change the ClusterName from Test Cluster to HelloCassandra
        • <ClusterName>Test Cluster</ClusterName>
        • <ClusterName>HelloCassandra</ClusterName>
      • Change the CommitLogDirectory and DataFileDirectory
        • <CommitLogDirectory>/var/lib/cassandra/C1/commitlog</CommitLogDirectory>
            <DataFileDirectories>
                <DataFileDirectory>/var/lib/cassandra/C1/data</DataFileDirectory>
            </DataFileDirectories>
      • Replace localhost with 127.0.0.1 explicitly.
        • <ListenAddress>127.0.0.1</ListenAddress>
        • <ThriftAddress>127.0.0.1</ThriftAddress>
  • Go to the conf2 folder and change two config files. conf2 will listen on 127.0.0.2 and act as a regular node. (It will talk to the seed node 127.0.0.1 for membership information.)
    • C:\apache-cassandra-0.6.3\conf2\log4j.properties
      • # Edit the next line to point to your logs directory
        log4j.appender.R.File=/var/log/cassandra/system.log
      • change the path to log4j.appender.R.File=/var/log/cassandra/c2/system.log
      • this node will then write its log to c:/var/log/cassandra/c2/system.log
    • C:\apache-cassandra-0.6.3\conf2\storage-conf.xml
      • Change the ClusterName from Test Cluster to HelloCassandra
        • <ClusterName>Test Cluster</ClusterName>
        • <ClusterName>HelloCassandra</ClusterName>
      • Change the CommitLogDirectory and DataFileDirectory
        • <CommitLogDirectory>/var/lib/cassandra/C2/commitlog</CommitLogDirectory>
            <DataFileDirectories>
                <DataFileDirectory>/var/lib/cassandra/C2/data</DataFileDirectory>
            </DataFileDirectories>
      • Replace localhost with 127.0.0.2 explicitly.
        • <ListenAddress>127.0.0.2</ListenAddress>
        • <ThriftAddress>127.0.0.2</ThriftAddress>
      • Enable auto bootstrap mode, so that this node bootstraps itself into the cluster when it joins
        • <AutoBootstrap>true</AutoBootstrap>
  • Go to C:\apache-cassandra-0.6.3\bin and copy cassandra.bat to c1.bat and c2.bat. Each .bat file will launch a different instance.
    • C:\apache-cassandra-0.6.3\bin\c1.bat
    • C:\apache-cassandra-0.6.3\bin\c2.bat
  • Edit c1.bat: point it to the conf1 folder, and adjust the JMX port and debug port settings
    • if NOT DEFINED CASSANDRA_CONF set CASSANDRA_CONF=%CASSANDRA_HOME%\conf
      • if NOT DEFINED CASSANDRA_CONF set CASSANDRA_CONF=%CASSANDRA_HOME%\conf1
    • disable debugging by removing the following line
      • -Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=n^
    • for the C1 instance, keep the default jmx port 8080
      • -Dcom.sun.management.jmxremote.port=8080^
  • Edit c2.bat: point it to the conf2 folder, and adjust the JMX port and debug port settings
    • if NOT DEFINED CASSANDRA_CONF set CASSANDRA_CONF=%CASSANDRA_HOME%\conf
      • if NOT DEFINED CASSANDRA_CONF set CASSANDRA_CONF=%CASSANDRA_HOME%\conf2
    • disable debugging by removing the following line
      • -Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=n^
    • for the C2 instance, change the default jmx port 8080 to 9080
      • -Dcom.sun.management.jmxremote.port=9080^
  • Double-click c1.bat and then c2.bat to start the two nodes
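
If you would rather not edit the two storage-conf.xml files by hand, the per-node changes can be scripted. Here is a minimal Python sketch, not an official tool. The function name and the trimmed-down sample document are my own, and the sample assumes the settings sit directly under a <Storage> root element. Note that ElementTree drops XML comments, so the output will not keep the explanatory comments of the original file.

```python
import xml.etree.ElementTree as ET

def make_node_config(base_xml, node_dir, address, auto_bootstrap,
                     cluster="HelloCassandra"):
    """Return a copy of base_xml with the per-node settings from this tutorial."""
    root = ET.fromstring(base_xml)

    def set_text(path, value):
        elem = root.find(path)
        if elem is None:
            raise KeyError(f"element not found: {path}")
        elem.text = value

    set_text("ClusterName", cluster)
    set_text("AutoBootstrap", "true" if auto_bootstrap else "false")
    set_text("CommitLogDirectory", f"/var/lib/cassandra/{node_dir}/commitlog")
    set_text("DataFileDirectories/DataFileDirectory",
             f"/var/lib/cassandra/{node_dir}/data")
    set_text("ListenAddress", address)
    set_text("ThriftAddress", address)
    return ET.tostring(root, encoding="unicode")

# Hypothetical, heavily shortened stand-in for the shipped storage-conf.xml.
SAMPLE_STORAGE_CONF = """<Storage>
  <ClusterName>Test Cluster</ClusterName>
  <AutoBootstrap>false</AutoBootstrap>
  <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
  <ListenAddress>localhost</ListenAddress>
  <ThriftAddress>localhost</ThriftAddress>
</Storage>"""

if __name__ == "__main__":
    # Node 1 is the seed, node 2 is the regular node that bootstraps.
    print(make_node_config(SAMPLE_STORAGE_CONF, "C1", "127.0.0.1", False))
    print(make_node_config(SAMPLE_STORAGE_CONF, "C2", "127.0.0.2", True))
```

To use it on the real files, read conf\storage-conf.xml into a string, call the function once per node, and write the results to conf1 and conf2.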

When you start c1.bat, you should see output like the following. It tells you that this node is configured to be a seed node, and it shows the Thrift port.

Starting Cassandra Server
INFO 16:04:31,597 Auto DiskAccessMode determined to be mmap
INFO 16:04:31,909 Saved Token not found. Using 22656600690150525193669162742751150004
INFO 16:04:31,909 Saved ClusterName not found. Using HelloCassandra
INFO 16:04:31,909 Creating new commitlog segment /var/lib/cassandra/c1/commitlog\CommitLog-1280185471909.log
INFO 16:04:31,987 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/c1/commitlog\C
INFO 16:04:31,987 Enqueuing flush of Memtable-LocationInfo@1351579886(171 bytes, 4 operations)
INFO 16:04:31,987 Writing Memtable-LocationInfo@1351579886(171 bytes, 4 operations)
INFO 16:04:32,236 Completed flushing C:\var\lib\cassandra\c1\data\system\LocationInfo-1-Data.db
INFO 16:04:32,283 Starting up server gossip
INFO 16:04:32,299 This node will not auto bootstrap because it is configured to be a seed node.
INFO 16:04:32,346 Binding thrift service to /127.0.0.1:9160
INFO 16:04:32,346 Cassandra starting up...


Then you can run TCPView to check which ports instance 1 is listening on.
[image: TCPView screenshot of the ports instance 1 is listening on]

Here, 7000 is the storage port, 8080 is the JMX port (so you can use jconsole to connect to and monitor this JVM), and 9160 is the Thrift port. How about 60625 and 60626?
   
At this point, there is only one node in the cluster.

C:\apache-cassandra-0.6.3>bin\nodetool --host 127.0.0.1 --port 8080 ring
Starting NodeTool
Address       Status     Load          Range                                      Ring
127.0.0.1     Up         497 bytes     22656600690150525193669162742751150004     |<--|

Time to kick off c2.bat. After that, you will notice that C2 joins the cluster. Give it some time to do its housekeeping: the log below shows a 90-second wait and a 30-second wait, so expect at least 120 seconds.
 

Starting Cassandra Server
INFO 16:11:58,940 Auto DiskAccessMode determined to be mmap
INFO 16:11:59,237 Saved Token not found. Using 168810650452358861593947197964955051846
INFO 16:11:59,252 Saved ClusterName not found. Using HelloCassandra
INFO 16:11:59,252 Creating new commitlog segment /var/lib/cassandra/c2/commitlog\CommitLog-1280185919252.log
INFO 16:11:59,315 LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/c2/commitlog\Com
INFO 16:11:59,315 Enqueuing flush of Memtable-LocationInfo@625647261(171 bytes, 4 operations)
INFO 16:11:59,315 Writing Memtable-LocationInfo@625647261(171 bytes, 4 operations)
INFO 16:11:59,564 Completed flushing C:\var\lib\cassandra\c2\data\system\LocationInfo-1-Data.db
INFO 16:11:59,596 Starting up server gossip
INFO 16:11:59,627 Joining: getting load information
INFO 16:11:59,627 Sleeping 90000 ms to wait for load information...
INFO 16:12:01,577 Node /127.0.0.1 is now part of the cluster
INFO 16:12:02,592 InetAddress /127.0.0.1 is now UP
INFO 16:12:02,592 Started hinted handoff for endPoint /127.0.0.1
INFO 16:12:02,607 Finished hinted handoff of 0 rows to endpoint /127.0.0.1
INFO 16:13:29,657 Joining: getting bootstrap token
INFO 16:16:44,916 New token will be 107727192420385141059512814600693202868 to assume load from /127.0.0.1
INFO 16:16:44,931 Joining: sleeping 30000 ms for pending range setup
INFO 16:17:14,952 Bootstrapping
INFO 16:17:15,030 Bootstrap/move completed! Now serving reads.
INFO 16:17:15,108 Binding thrift service to /127.0.0.2:9160
INFO 16:17:15,108 Cassandra starting up...
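
The log does not say how the new token was chosen, but the number can be reproduced with simple arithmetic. Assuming the token space is a ring of size 2^127 (my assumption, not something stated in the log), the new token is exactly half a ring away from node 1's token, that is, the midpoint of the only range that existed at that time. A small Python check (the function name is my own):

```python
RING_SIZE = 2 ** 127  # assumption: size of the token ring

def midpoint_token(start, end, ring_size=RING_SIZE):
    """Token halfway along the arc that runs from start to end on the ring.

    When start == end there is a single node, so the arc is the whole ring.
    """
    arc = (end - start) % ring_size or ring_size
    return (start + arc // 2) % ring_size

# Token of node 1 (127.0.0.1), copied from its startup log.
seed_token = 22656600690150525193669162742751150004

print(midpoint_token(seed_token, seed_token))
# prints 107727192420385141059512814600693202868
```

That matches the "New token will be ..." line above, which fits the idea that node 2 takes over half of node 1's range.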

Run TCPView again, and you will see that the two nodes have established connections via the storage port.
[image: TCPView screenshot of the two nodes connected via the storage port]
Run the ring query again.

C:\apache-cassandra-0.6.3>bin\nodetool --host 127.0.0.1 --port 8080 ring
Starting NodeTool
Address       Status     Load          Range                                      Ring
                                       107727192420385141059512814600693202868
127.0.0.1     Up         497 bytes     22656600690150525193669162742751150004     |<--|
127.0.0.2     Up         497 bytes     107727192420385141059512814600693202868    |-->|
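
If you want to check the ring from a script instead of reading it by eye, the output above is easy to parse. Here is a minimal Python sketch that works on the text exactly as shown here. The regular expression and the field names are my own, and other Cassandra versions may print a different layout.

```python
import re

# One node per row: address, status, load (e.g. "497 bytes"), token, ring art.
ROW = re.compile(r"^(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(.+?)\s+(\d+)\s+\S+\s*$")

def parse_ring(text):
    """Return one dict per node row found in 'nodetool ring' output."""
    nodes = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            address, status, load, token = match.groups()
            nodes.append({
                "address": address,
                "status": status,
                "load": load,
                "token": int(token),
            })
    return nodes

# The two-node output from this post, pasted in as sample input.
SAMPLE_RING = """Starting NodeTool
Address       Status     Load          Range                                      Ring
                                       107727192420385141059512814600693202868
127.0.0.1     Up         497 bytes     22656600690150525193669162742751150004     |<--|
127.0.0.2     Up         497 bytes     107727192420385141059512814600693202868    |-->|
"""

if __name__ == "__main__":
    for node in parse_ring(SAMPLE_RING):
        print(node["address"], node["status"], node["token"])
```

The header line and the token-only line do not start with an IP address, so the parser skips them.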

 

You can also use jconsole to monitor the nodes.
  Run jconsole, which is located in the JDK bin directory. Connect to node 1 at localhost:8080 and to node 2 at localhost:9080.
  Expand the MBeans, and you will see a lot of counters.

[image: jconsole screenshot of the MBean counters]

 

Now everything is set. Enjoy exploring.


4 comments:

Pablo Cúbico said...

Hi there, I created a post about setting up a Cassandra cluster on a single PC via virtual machines, it's a different approach (more resource-consuming...).

Check it out:
http://silicoholic.com/2010/09/14/cluster-cassandra-ubuntu-virtualbox/

Tamil said...

Hi dude I'm using cassandra latest 0.8x version. I dunno what is seed provider in it. I'm not getting the result u are getting on creating clusters in a single machine

Ace said...

Hi,

I am looking for a Windows Server 2008 R2 multiple node cluster to install and run a commercial Windows program. Can I do this using Cassandra? I appreciate your help.

With kind regards,

Andy

Androidyou said...

To ACE,

No, Cassandra is basically a java application which provides the distributed storage service.

 