Memory

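Recently written data always goes first into an in-memory table (aka memtable), while older data is flushed to disk and kept in the operating system's file system cache. In other words, the more memory the better, and at least 1GB of virtual memory is recommended. How much improvement you see depends on the size of your frequently accessed data (hot data set), but hardware typically needs 4GB, and in high-end deployments, such as the clusters seen in practice, each node typically has 16GB to 32GB or more.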

Memory is also used for the key cache (introduced in 0.5) and the row cache (0.6).

CPU

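Many workloads reach their CPU limit before they reach their memory limit. Cassandra will make use of as many resources as you give it. For commodity hardware, 8 cores is currently a good price/performance choice. If you are running on virtual machines, consider a provider that allows CPU bursting, such as Rackspace Cloud Servers.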

Disk

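The short answer is at least 2 disks: one for your commit log directory and another for your data file directories. The precise answer depends on your usage, so it is important to understand what is going on here.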

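Cassandra persists data to disk for two different purposes. The first is the commit log: every new write is recorded there so that it can be replayed after a crash or system shutdown. The second is the data directory: once a memtable reaches its size threshold, it is flushed to disk as an SSTable (sorted table).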

Commit logs receive every write made to a Cassandra node and have the potential to block client operations, but they are only ever read on node start-up. SSTable (data file) writes on the other hand occur asynchronously, but are read to satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called compaction. Another important difference between commitlog and sstables is that commit logs are purged after the corresponding data has been flushed to disk as an SSTable, so CommitLogDirectory only holds uncommitted data while the directories in DataFileDirectories store all of the data written to a node.
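The interplay between commit log, memtable, and SSTables can be sketched with a toy model. This is only an illustration of the behavior described above, not Cassandra's actual implementation; the class name ToyNode, the memtable_limit parameter, and the in-memory lists standing in for files on disk are all assumptions made for the example.

```python
import bisect


class ToyNode:
    """Toy model of the write path; NOT Cassandra's real implementation."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.commit_log = []  # append-only; in the real system only read at start-up
        self.memtable = {}    # most recent writes, held in memory
        self.sstables = []    # immutable sorted tables, oldest first

    def write(self, key, value):
        # Every write is appended to the commit log before it is acknowledged.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted table, then purge the commit log,
        # since the flushed data no longer needs to be replayed.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        # Merge all sorted tables into one; newer values overwrite older ones.
        merged = {}
        for table in self.sstables:  # oldest first
            merged.update(table)
        self.sstables = [sorted(merged.items())]
```

In this model the commit log only ever holds writes that have not yet been flushed, while the list of SSTables holds everything else, which mirrors why the commit log device can be small but the data devices cannot.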

So to summarize, if you use a different device for your CommitLogDirectory it needn't be large, but it should be fast enough to receive all of your writes (as appends, i.e., sequential i/o). Then, use one or more devices for DataFileDirectories and make sure they are both large enough to house all of your data, and fast enough to both satisfy reads that are not cached in memory and to keep up with flushing and compaction.
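A quick way to sanity-check such a layout is to compare the device IDs of the directories. The sketch below assumes a POSIX-like system and uses a hypothetical helper name, on_separate_devices; the paths you pass in are whatever you configured for CommitLogDirectory and DataFileDirectories. Note that st_dev identifies a file system, so two partitions on the same physical disk would still pass this check.

```python
import os


def on_separate_devices(commitlog_dir, data_dirs):
    """Return True if the commit log directory shares a device ID with
    none of the data directories (hypothetical helper, not a Cassandra API)."""
    commit_dev = os.stat(commitlog_dir).st_dev
    return all(os.stat(d).st_dev != commit_dev for d in data_dirs)
```

A result of False means commit log appends and SSTable reads, flushes, and compactions are competing for the same device.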

As covered in MemtableSSTable, in the worst case a compaction can temporarily require free space equal to 100% of your in-use space, and that space must be free on a single volume (that is, in a single data file directory). So if you are going to be approaching 50% or more of your disks' capacity, you should raid0 your data directory volumes. B. Todd Burruss adds on the mailing list, "With the file sizes we're talking about with cassandra and other database products, the [raid] stripe size doesn't seem to matter. Mine is set to 128k, which produced the same results as 16k and 256k." In addition to giving you capacity for compactions, raid0 will help smooth out I/O hotspots within a single sstable.
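The 50% rule of thumb follows from simple arithmetic, sketched below. The function name compaction_headroom and the example sizes are made up for illustration; the only assumption taken from the text is that a worst-case compaction needs as much free space as is currently in use on the volume.

```python
def compaction_headroom(capacity_gb, in_use_gb):
    """Free space left after reserving room for a worst-case compaction.

    A negative result means the volume could run out of space mid-compaction.
    """
    free_gb = capacity_gb - in_use_gb
    return free_gb - in_use_gb  # worst case needs as much free space as is in use


# Two separate 1000 GB data volumes, unevenly filled (made-up numbers):
separate = [compaction_headroom(1000, 600), compaction_headroom(1000, 200)]

# The same disks striped into one 2000 GB volume holding the same data:
striped = compaction_headroom(2000, 800)
```

With these numbers the first separate volume is 200 GB short while the striped volume has 400 GB to spare, which is the capacity argument for pooling the disks.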

On ext2/ext3 the maximum file size is 2TB, even on a 64-bit kernel. On ext4 that goes up to 16TB. Since Cassandra can use almost half your disk space on a single file, if you are raiding large disks together you may want to use XFS instead, particularly if you are using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and basically unlimited on 64-bit.
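The limits above can be turned into a rough check for a planned volume. This is a sketch only: the table keys and the function name single_file_fits are invented for the example, the limits are the ones quoted in this section, and "almost half your disk space" is rounded up to exactly half.

```python
# Maximum file size in TB per file system, as quoted in this section.
MAX_FILE_TB = {
    "ext2": 2,
    "ext3": 2,
    "ext4": 16,
    "xfs-32bit": 16,
    "xfs-64bit": float("inf"),  # "basically unlimited"
}


def single_file_fits(volume_tb, filesystem):
    """Check whether a file of half the volume size stays under the limit."""
    worst_case_file_tb = volume_tb / 2
    return worst_case_file_tb <= MAX_FILE_TB[filesystem]
```

For example, single_file_fits(6, "ext3") is False because a 3TB file exceeds the 2TB limit, while the same volume on ext4 or XFS passes.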

Cloud

Several heavy users of Cassandra deploy in the cloud, e.g. CloudKick on Rackspace Cloud Servers and SimpleGeo on Amazon EC2. The general consensus in the community seems to be that Rackspace's VMs offer better performance for Cassandra because of CPU bursting, raided local disks, and separate public/private network interfaces.
