SSTable Overview

DRAFT. Notes on documenting how SSTables work in Cassandra (data format, indexing, serialization, searching)

Each SSTable consists of three separate files and belongs to a single column family:

  1. Bloom Filter
  2. Index
  3. Data

Adding a new key to an SSTable goes through the steps below. All keys are sorted before writing.

  1. Serialize Index (ColumnIndexer.serialize(IIterableColumns columns, DataOutput dos))
    1. Sort columns for key
    2. Serialize columns bloom filter
      1. Loop through the columns and subcolumns that make up the column family
        1. Sum each column's getObjectCount into columnCount (this includes subcolumn counts for super columns)
        2. Create a bloom filter sized for columnCount
        3. Loop through the columns (again) and add each column name to the bloom filter
          1. If a super column is detected, loop through its subcolumns and add each subcolumn name
      2. Write bloom filter hash count (int)
      3. Write serialized bloom filter length (int)
      4. Write serialized bytes of bloom filter
    3. Start indexing based on column family comparator
      1. If columns empty write integer zero, return
      2. Iterate over all columns, building a collection of IndexHelper.IndexInfo objects. Each IndexInfo represents at most getColumnIndexSize() worth of data (default is 64KB, taken from the yaml setting column_index_size_in_kb)
        1. Construct each new IndexInfo from the first and last columns visited that fit within the index size limit
      3. Write indexSizeInBytes (int)
      4. Serialize each IndexInfo object (firstname is the first column name visited in the block; lastname is the last column name visited)
        1. Write high byte of firstname length: (length >> 8) & 0xFF
        2. Write low byte of firstname length: length & 0xFF
        3. Write bytes of firstname
        4. Write high byte of lastname length: (length >> 8) & 0xFF
        5. Write low byte of lastname length: length & 0xFF
        6. Write bytes of lastname
        7. Write long startPosition
        8. Write long endPosition - startPosition
  2. Serialize Data (ColumnFamilySerializer.serializeForSSTable(ColumnFamily columnFamily, DataOutput dos))
    1. Write columnFamily localDeletionTime (int)
    2. Write columnFamily markedForDeleteAt (long)
    3. Sort columns
    4. Write the number of columns (int)
    5. Determine the column serializer and serialize each column
      1. Take the length of the column name as length
      2. Write high byte of length: (length >> 8) & 0xFF
      3. Write low byte of length: length & 0xFF
      4. Write bytes of column name
      5. Write boolean isMarkedForDelete
      6. Write long timestamp
      7. Write column value length (int)
      8. Write bytes of column value
  3. Write to SSTable Data File
    1. Write out the row key as UTF; its form depends on the partitioner
      1. Random Partitioner
        1. key token + DELIMITER + key name
        2. Delimiter is colon
    2. Write size of row value (int)
    3. Write bytes of row value
  4. Write SSTable Bloom Filter and SSTable Index
    1. Add the disk key to the bloom filter; its form depends on the partitioner
      1. Random Partitioner
        1. key token + DELIMITER + key name
        2. Delimiter is colon
    2. Write disk key to SSTable Index file (UTF)
    3. Write the data file position recorded before step 3 (Write to SSTable Data File) (int)
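
The byte layouts listed in steps 1.3.4 (IndexInfo) and 2.5 (column) can be sketched roughly as below. This is only an illustration of the field order given in these notes, not the Cassandra source: the class ColumnLayoutSketch and all of its method names are hypothetical, and names are assumed to fit in a 2-byte length.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; it only mirrors the field order in the notes above.
class ColumnLayoutSketch {

    // Length-prefixed name: high length byte, low length byte, then the raw bytes.
    static void writeName(byte[] name, DataOutput out) throws IOException {
        int length = name.length;
        if (length > 0xFFFF) {
            throw new IllegalArgumentException("name does not fit a 2-byte length");
        }
        out.writeByte((length >> 8) & 0xFF);
        out.writeByte(length & 0xFF);
        out.write(name);
    }

    // One index block: first name, last name, start position, block width.
    static void writeIndexInfo(byte[] firstName, byte[] lastName,
                               long startPosition, long endPosition,
                               DataOutput out) throws IOException {
        writeName(firstName, out);
        writeName(lastName, out);
        out.writeLong(startPosition);
        out.writeLong(endPosition - startPosition);
    }

    // One standard column: name, delete flag, timestamp, value length, value.
    static void writeColumn(byte[] name, boolean markedForDelete, long timestamp,
                            byte[] value, DataOutput out) throws IOException {
        writeName(name, out);
        out.writeBoolean(markedForDelete);
        out.writeLong(timestamp);
        out.writeInt(value.length);
        out.write(value);
    }

    // Convenience wrappers that serialize into an in-memory buffer.
    static byte[] columnBytes(String name, boolean markedForDelete,
                              long timestamp, byte[] value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeColumn(name.getBytes(StandardCharsets.UTF_8), markedForDelete,
                    timestamp, value, new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    static byte[] indexInfoBytes(String firstName, String lastName,
                                 long startPosition, long endPosition) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeIndexInfo(firstName.getBytes(StandardCharsets.UTF_8),
                       lastName.getBytes(StandardCharsets.UTF_8),
                       startPosition, endPosition, new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] column = columnBytes("name", false, 1L, new byte[] {1, 2, 3});
        byte[] block = indexInfoBytes("a", "z", 100L, 164L);
        System.out.println("column bytes: " + column.length);
        System.out.println("index block bytes: " + block.length);
    }
}
```

The column name and the IndexInfo names share one helper because the notes describe the same encoding for both: two length bytes followed by the name itself.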
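
Steps 3 and 4 (appending the row to the data file, then recording it in the index file) might look roughly like the following. This is a sketch under assumptions rather than the actual writer code: the class RowAppendSketch and its methods are hypothetical, the token is passed in instead of being computed by a partitioner, the bloom filter update of step 4.1 is left out, and the position is written as an int only because the notes above list it that way.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.math.BigInteger;

// Hypothetical sketch class; it only follows the write order in the notes above.
class RowAppendSketch {
    static final String DELIMITER = ":";

    // Disk key as described for the Random Partitioner: token + DELIMITER + key name.
    static String diskKey(BigInteger token, String keyName) {
        return token + DELIMITER + keyName;
    }

    // Appends one row and its index entry; returns the data file position
    // at which the row starts.
    static long append(String diskKey, byte[] rowValue,
                       DataOutputStream dataFile,
                       DataOutputStream indexFile) throws IOException {
        // Position before the row is written (bytes written to this stream so far).
        int position = dataFile.size();

        // Step 3: row key (UTF), size of row value (int), bytes of row value.
        dataFile.writeUTF(diskKey);
        dataFile.writeInt(rowValue.length);
        dataFile.write(rowValue);

        // Step 4: disk key (UTF) and the recorded position go to the index file.
        // The bloom filter update is omitted from this sketch.
        indexFile.writeUTF(diskKey);
        indexFile.writeInt(position);
        return position;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream dataBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream indexBytes = new ByteArrayOutputStream();
        DataOutputStream dataFile = new DataOutputStream(dataBytes);
        DataOutputStream indexFile = new DataOutputStream(indexBytes);

        // The token value here is a placeholder, not a real partitioner token.
        String key = diskKey(new BigInteger("42"), "row1");
        long first = append(key, new byte[] {1, 2, 3}, dataFile, indexFile);
        long second = append(key, new byte[] {4}, dataFile, indexFile);
        System.out.println("first row at " + first + ", second row at " + second);
    }
}
```

Recording the position before the row is written is what lets a reader look up a key in the index file and seek directly to the start of that row in the data file.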
