Setup
Added by Avinash Lakshman, last edited by Avinash Lakshman on Mar 21, 2009

A standard configuration file comes with the Cassandra download. These are the basic properties the configuration file needs to contain:

Property | Description | Default
ClusterName | Name of this cluster. Mainly used to prevent machines that belong to a different logical cluster from joining this one. | Test
HashingStrategy | Partitioning strategy (required). Choices are RANDOM and OPHF. | RANDOM
RackAware | Whether replica placement is rack-aware. For now, set this to false. | false
MulticastChannel | No longer used. | -
ReplicationFactor | Number of copies of each piece of data to keep. | 3
ZookeeperAddress | Not currently used. | -
RpcTimeoutInMillis | RPC timeout, in milliseconds. | 2000
StoragePort | TCP port used for all communication between nodes in the cluster. | 7000
ControlPort | UDP port used for all cluster membership communication. | 7001
ColumnIndexSizeInKB | Size, in KB, beyond which the system indexes columns. | 256
HttpPort | HTTP port for the web dashboard. | 80
MetadataDirectory | Location of Cassandra's metadata. | -
CommitLogDirectory | Location of the commit log. | -
CommitLogRotationThresholdInMB | Size, in MB, at which the commit log is rotated. | 128
Seeds | Seed nodes used for dynamic node discovery. Nodes find the rest of the cluster by first contacting the seeds. | -
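To illustrate how these properties fit together, here is a minimal Python sketch that reads a storage-conf.xml string and fills in the defaults listed above. It assumes that each property is a direct child element of the root element (shown here as <Storage>) and that seeds are listed as <Seeds><Seed>host</Seed></Seeds>; check both assumptions against the file that ships with your download. The function name load_storage_conf and the example paths and addresses are hypothetical.

```python
import xml.etree.ElementTree as ET

# Defaults from the property table. Properties without a listed default
# (the two directories and Seeds) are handled separately below.
DEFAULTS = {
    "ClusterName": "Test",
    "HashingStrategy": "RANDOM",
    "RackAware": "false",
    "ReplicationFactor": "3",
    "RpcTimeoutInMillis": "2000",
    "StoragePort": "7000",
    "ControlPort": "7001",
    "ColumnIndexSizeInKB": "256",
    "HttpPort": "80",
    "CommitLogRotationThresholdInMB": "128",
}

# Properties whose values are numbers (ports, sizes, counts, timeouts).
INT_PROPERTIES = {
    "ReplicationFactor", "RpcTimeoutInMillis", "StoragePort", "ControlPort",
    "ColumnIndexSizeInKB", "HttpPort", "CommitLogRotationThresholdInMB",
}

# The directories have no default in the table, so this sketch requires them.
REQUIRED = ("MetadataDirectory", "CommitLogDirectory")


def load_storage_conf(xml_text):
    """Hypothetical loader: parse storage-conf.xml text into a dict.

    ASSUMPTION: properties are direct children of the root element and
    seeds appear as <Seeds><Seed>host</Seed>...</Seeds>.
    MulticastChannel and ZookeeperAddress are ignored (not used).
    """
    root = ET.fromstring(xml_text)
    conf = {}

    # Use the value from the file if present, otherwise the table default.
    for name, default in DEFAULTS.items():
        value = root.findtext(name, default=default).strip()
        conf[name] = int(value) if name in INT_PROPERTIES else value

    conf["RackAware"] = conf["RackAware"].lower() == "true"

    if conf["HashingStrategy"] not in ("RANDOM", "OPHF"):
        raise ValueError("HashingStrategy must be RANDOM or OPHF")

    for name in REQUIRED:
        value = (root.findtext(name) or "").strip()
        if not value:
            raise ValueError(f"{name} must be set")
        conf[name] = value

    # Collect every non-empty <Seed> listed under <Seeds>.
    conf["Seeds"] = [
        seed.text.strip()
        for seed in root.findall("Seeds/Seed")
        if seed.text and seed.text.strip()
    ]
    return conf


if __name__ == "__main__":
    sample = """<Storage>
      <ClusterName>Prod</ClusterName>
      <MetadataDirectory>/var/cassandra/system</MetadataDirectory>
      <CommitLogDirectory>/var/cassandra/commitlog</CommitLogDirectory>
      <Seeds><Seed>10.0.0.1</Seed><Seed>10.0.0.2</Seed></Seeds>
    </Storage>"""
    conf = load_storage_conf(sample)
    print(conf["ClusterName"], conf["ReplicationFactor"], conf["Seeds"])
    # prints: Prod 3 ['10.0.0.1', '10.0.0.2']
```

Failing fast when a required directory is missing is a choice made for this sketch, not necessarily how Cassandra itself reacts to an incomplete file.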

After installation, navigate to the directory that contains the Ant build file (build.xml) and run "ant jar" at the prompt. To avoid common problems during deployment:

  • Edit the start-server script so that the -Dstorage-config property points to the directory that contains storage-conf.xml.
  • Make sure log4j.properties is in that same directory.
  • Make sure the directories for the log4j log files exist beforehand, or have the start-server script create them.
  • Adjust the Java heap settings in the start-server script to suit your requirements.
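The checks above can be scripted. The Python sketch below mirrors them: it verifies that storage-conf.xml and log4j.properties share one directory, pre-creates a log directory, and assembles a java command line carrying the -Dstorage-config property and heap flags (-Xms and -Xmx are standard JVM options). The helper names, the main class, the classpath, and the heap sizes are hypothetical placeholders; take the real values from the start-server script in your build.

```python
import os
import shlex


def ensure_log_dir(log_dir):
    """Pre-create the log4j log directory (no error if it already exists)."""
    os.makedirs(log_dir, exist_ok=True)
    return log_dir


def build_start_command(config_dir, classpath, main_class,
                        heap_min="1G", heap_max="1G"):
    """Return a java argv for starting a node (hypothetical helper).

    config_dir must contain both storage-conf.xml and log4j.properties.
    classpath and main_class are placeholders: copy them from your
    start-server script.
    """
    for required in ("storage-conf.xml", "log4j.properties"):
        path = os.path.join(config_dir, required)
        if not os.path.isfile(path):
            raise FileNotFoundError(f"missing {path}")
    return [
        "java",
        f"-Xms{heap_min}",                 # initial heap size
        f"-Xmx{heap_max}",                 # maximum heap size
        f"-Dstorage-config={config_dir}",  # directory holding storage-conf.xml
        "-cp", classpath,
        main_class,
    ]


if __name__ == "__main__":
    import tempfile

    # Demo only: build a throwaway config directory with empty files.
    demo_dir = tempfile.mkdtemp()
    for name in ("storage-conf.xml", "log4j.properties"):
        open(os.path.join(demo_dir, name), "w").close()
    ensure_log_dir(os.path.join(demo_dir, "logs"))
    cmd = build_start_command(
        demo_dir,
        "path/to/cassandra.jar",   # hypothetical jar name
        "your.main.DaemonClass",   # hypothetical main class
        heap_min="4G", heap_max="4G",  # example heap size only
    )
    print(shlex.join(cmd))
```

Returning an argument list rather than a single shell string avoids quoting problems; pass it to subprocess.run if you want the sketch to actually launch the node.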

Typical Cassandra deployments have run on dual-CPU, quad-core machines with 16 GB of RAM, and the more disks, the better; we have typically used 3 disks per machine.