Status
Current state: Accepted
Discussion thread: here
JIRA: CASSANDRA-19947
Released: <Cassandra Version>
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Scope
The proposed changes are needed in CQL grammar and in Cassandra itself.
Motivation
Currently, maximum sizes for individual columns and for collections can be configured, but only at a database-wide level, and the approach lacks extensibility. We want to give Cassandra users and operators more flexibility by providing a set of usable constraints at a more granular level, which will ease validation at the application level and protect the database from misconfigured clients.
This brings several benefits. Some examples:
- Cassandra operators have more control to reason about their data and tune appropriately for performance.
- Potential reduction in maintenance overhead, by being able to better predict partition sizes.
- Extensibility to more complex validations in the future.
- Potential value in the storage engine making decisions based on data size.
Audience
Cassandra Operators, Cassandra Users.
Goals
To provide a way of specifying such constraints at a column level in a table schema definition and enforcing them at write time.
Approach
Let’s focus on a simple use case for the constraint framework, which will probably be the first one we implement: the length limit per row.
We could accept an optional CQL notation such as:
CREATE TABLE keyspace.table ( name text CHECK length(name) < 256, number_of_items int CHECK number_of_items < 1000 ..., CONSTRAINT lastName CHECK length(lastname) < 120 );
With this approach, we have two ways to configure constraints:
- `column_name cql_type CHECK EXPR`. Inline style, allows only one constraint.
- `CONSTRAINT [optional_name] CHECK column_name EXPR`. Separate definition, allows defining multiple constraints with different expressions.
Here EXPR is defined as EXPR: (CQL_FUNCTION | column_name) operator term.
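To make the EXPR shape concrete, here is a minimal Python sketch of a parser for `(CQL_FUNCTION | column_name) operator term`. It is only an illustration of the grammar, not the CQL parser change this CEP proposes. The names `CheckExpr` and `parse_check` and the exact operator set are assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical operator set (an assumption, not defined by the proposal).
# Two-character operators come first so "<=" is not read as "<".
OPERATORS = ("<=", ">=", "!=", "==", "<", ">")


@dataclass(frozen=True)
class CheckExpr:
    function: Optional[str]  # e.g. "length"; None when the column is compared directly
    column: str
    operator: str
    term: str


_OPS = "|".join(re.escape(op) for op in OPERATORS)
_EXPR = re.compile(
    r"^\s*(?:(?P<func>\w+)\s*\(\s*(?P<fcol>\w+)\s*\)|(?P<col>\w+))"
    rf"\s*(?P<op>{_OPS})\s*(?P<term>\S+)\s*$"
)


def parse_check(expr: str) -> CheckExpr:
    """Split a CHECK expression into function, column, operator and term."""
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError(f"not a valid CHECK expression: {expr!r}")
    column = match.group("fcol") or match.group("col")
    return CheckExpr(match.group("func"), column, match.group("op"), match.group("term"))


print(parse_check("length(name) < 256"))
print(parse_check("number_of_items < 1000"))
```

Both the function form and the plain column form land in the same structure, so later validation can inspect `function` to decide which column types are acceptable.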
Every constraint_type will be applicable to a subset of column types. For instance, it does not make sense to define a size constraint on primitive types that have a fixed size, e.g. int. The size constraint is also not compatible with data types that can grow incrementally, e.g. collections and UDTs. For such cases, the CQL command should fail with a relevant error message. For example:
CREATE TABLE keyspace.table ( numberOfItems int, ..., CONSTRAINT number_of_items CHECK length(numberOfItems) < 256 );
Error: keyspace.table.number_of_items constraint error: type int does not support length constraint
Note: These constraints are only enforced at write time. So, an ALTER CONSTRAINT with more restrictive constraints should not affect preexisting data.
Syntax
Creating a table with Constraints:
CREATE TABLE keyspace.table ( p1 int, p2 int, ..., CONSTRAINT [name] CHECK (p1 != p2) );
- Name of the constraint is optional. If it is not provided, a name is generated for the constraint.
- Predicate parentheses are optional
Alter a Constraint:
ALTER TABLE keyspace.table ALTER CONSTRAINT [name] CHECK (condition)
- Predicate parentheses are optional
Drop a constraint:
ALTER TABLE keyspace.table DROP CONSTRAINT [name]
Errors
Failed constraint error messages need to be descriptive and always contain the list of errors of the failed constraints, including the keyspace, table name and constraint name, value and column failing the constraint, and the definition of the constraint.
// For not supported constraints per column type
Error: keyspace.table constraint 'number_of_items' error: type int does not support length constraint
// For failed constraint when trying to write
Error: keyspace.table constraint 'number_of_items' error: numberOfItems = 300 does not comply with constraint numberOfItems < 256
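As an illustration of the two message shapes above, the following Python sketch builds them from their parts. The function names and parameters are hypothetical and exist only for this example; the CEP does not prescribe how messages are assembled.

```python
def unsupported_type_error(keyspace, table, constraint, cql_type, kind):
    """Message for a constraint that the column type does not support."""
    return (
        f"Error: {keyspace}.{table} constraint '{constraint}' error: "
        f"type {cql_type} does not support {kind} constraint"
    )


def failed_write_error(keyspace, table, constraint, column, value, definition):
    """Message for a write whose value violates a constraint."""
    return (
        f"Error: {keyspace}.{table} constraint '{constraint}' error: "
        f"{column} = {value} does not comply with constraint {definition}"
    )


print(unsupported_type_error("keyspace", "table", "number_of_items", "int", "length"))
print(failed_write_error("keyspace", "table", "number_of_items",
                         "numberOfItems", 300, "numberOfItems < 256"))
```

Keeping every field as a separate argument makes it straightforward to return one message per failed constraint when a write violates several at once.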
Describe table
A DESCRIBE TABLE CQL statement includes all constraints for a given table.
Proposed Changes
CQL extension
Adding support for CONSTRAINT keyword and the rest of the parameters
Constraint Handling at schema level
This determines which constraint checks Cassandra needs to run before accepting a write, which in turn depends on the table schema. It should also verify that the constraints themselves are valid.
Constraint execution at write time
Validate that a write complies with the constraints before executing it.
Constraints
The initial core constraints this CEP proposes are:
- Length Constraint for String types, as in
name text CHECK length(name) < 256
- Value Constraint for numeric type
number_of_items int CHECK number_of_items < 1000
Those two alone and combined provide a lot of flexibility, and allow complex validations that enable "new types" such as:
CREATE TYPE keyspace.cidr_address_ipv4 ( ip_address inet, subnet_mask int, CONSTRAINT CHECK subnet_mask > 0, CONSTRAINT CHECK subnet_mask < 32 );
CREATE TYPE keyspace.color ( r int, g int, b int, CONSTRAINT CHECK r >= 0, CONSTRAINT CHECK r < 255, CONSTRAINT CHECK g >= 0, CONSTRAINT CHECK g < 255, CONSTRAINT CHECK b >= 0, CONSTRAINT CHECK b < 255 );
Those two initial Constraints are the fundamental constraints that would give value to the feature. The framework can (and will) be extended with other Constraints, leaving us with the following:
For numeric types:
- Max (<)
- Min (>)
- Equality (==)
- Difference (!=)
For date types:
- Before (<)
- After (>)
For text based types:
- Length
- is_json (is the text a json?)
- complies with a given pattern
For all columns:
- Not Null (not_null)
General table constraints (including more than one column):
- Compare between numeric types (a < b, a > b, a != b, …)
- Compare between date types (date1 < date2, date1>date2, date1!=date2, …)
New or Changed Public Interfaces
Constraint
This interface will be implemented by all the different constraints. It should enforce:
- validate: At schema definition time, this method is called with the column name and type. It will return false if the constraint is not valid for that column.
- execute: At write time, this method is called if the constraint is in the schema, with the column name, column value, and constraint parameters. It will return false if the constraint is not met.
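The contract described above can be sketched as follows. This is a Python illustration under stated assumptions, not the Java interface Cassandra would ship: the class names, the set of supported text types, and the choice to hold the constraint parameters (operator and limit) on the instance rather than pass them to `execute` are all made up for the example.

```python
import operator
from abc import ABC, abstractmethod

# Comparison operators the sketch understands (assumed subset).
_OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq, "!=": operator.ne}


class Constraint(ABC):
    @abstractmethod
    def validate(self, column_name: str, column_type: str) -> bool:
        """Schema definition time: is this constraint valid for the column?"""

    @abstractmethod
    def execute(self, column_name: str, value) -> bool:
        """Write time: does the value satisfy the constraint?"""


class LengthConstraint(Constraint):
    # Assumption: only text-like types support a length constraint.
    SUPPORTED_TYPES = frozenset({"text", "varchar", "ascii"})

    def __init__(self, op: str, limit: int):
        self._compare = _OPS[op]
        self._limit = limit

    def validate(self, column_name: str, column_type: str) -> bool:
        return column_type in self.SUPPORTED_TYPES

    def execute(self, column_name: str, value) -> bool:
        return self._compare(len(value), self._limit)


name_length = LengthConstraint("<", 256)
print(name_length.validate("name", "text"))          # schema-time check
print(name_length.validate("numberOfItems", "int"))  # rejected: fixed-size type
print(name_length.execute("name", "Alice"))          # write-time check
```

Separating `validate` from `execute` lets an invalid definition fail once, when the schema is created or altered, instead of on every write.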
Compatibility, Deprecation, and Migration Plan
This is an additive optional feature, so no deprecation plan is needed.
Cassandra 5.0 and newer versions are supported.
If offending rows already exist in the database when a constraint is configured, no issues are expected. We plan to enforce these constraints at write time only, so reads will keep working. When an offending row is later updated, the update will need to comply with the constraints.
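The behaviour for preexisting rows can be illustrated with a toy in-memory table in Python. Everything here (`ToyTable`, `max_name_length`, the dictionary storage) is a hypothetical stand-in for the example and says nothing about how Cassandra stores or validates data internally.

```python
class ToyTable:
    """In-memory stand-in for a table with an optional length constraint."""

    def __init__(self):
        self.rows = {}
        self.max_name_length = None  # no constraint configured yet

    def write(self, key, name):
        # The constraint is checked only here, at write time.
        if self.max_name_length is not None and not len(name) < self.max_name_length:
            raise ValueError(
                f"length(name) = {len(name)} does not comply with "
                f"constraint length(name) < {self.max_name_length}"
            )
        self.rows[key] = name

    def read(self, key):
        # Reads never evaluate constraints.
        return self.rows[key]


table = ToyTable()
table.write(1, "x" * 300)      # accepted: no constraint yet
table.max_name_length = 256    # constraint added afterwards
print(len(table.read(1)))      # the old row is still readable
table.write(1, "short name")   # a compliant update succeeds
```

After the constraint is added, a new write of an over-long value would raise, while the stored row remains untouched until it is updated.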
Interaction with Guardrails Framework
As we mentioned in the motivation section, we currently have some guardrails for column size in place, which can be extended to other data types.
Those guardrails will take precedence over the constraints defined in the schema, and a schema ALTER adding constraints that break the limits defined by the guardrails framework will fail.
If the guardrails themselves are modified, the operator should get a warning mentioning that there are schemas with offending constraints.
Test Plan
- Tests for Create table with constraints at column level
- Tests for Create table with constraints at table level
- Tests for altering constraints
- Tests for dropping constraints
- Tests for text based constraints
- Tests for numeric type constraints
- Tests for not null
- Tests for multi column constraints
Rejected Alternatives
Specific per row constraints
Let’s focus on a simple use case for the constraint framework, which will probably be the first one we implement: the size limit per row.
We could accept an optional CQL notation such as:
CREATE TABLE keyspace.table ( name text, ..., CONSTRAINT name MAX_SIZE_IN_BYTES 256 );
With this new CQL keyword, we allow expressions such as: CONSTRAINT [column_name] [constraint_type] [constraint_parameters]
Every constraint_type will be applicable to a subset of column types. For instance, it wouldn’t make sense to define a MAX_SIZE_IN_BYTES constraint on an int type column. For such cases, the CQL command should fail with a relevant error message. For example:
CREATE TABLE keyspace.table ( number_of_items int, ..., CONSTRAINT number_of_items MAX_SIZE_IN_BYTES 256 );
Error: number_of_items of type int does not support MAX_SIZE_IN_BYTES constraint
We have rejected this alternative as its syntax is less flexible than the chosen approach. For instance, it is hard to pass several parameters to a constraint.
Application level validation
Basically, not implementing this CEP. The problem here is that the database must then trust the clients to do the correct thing, with limited control over what goes into the database. Current protection may be insufficient in some cases.
Future enhancements
The constraint framework could be applied at two different levels: a Cassandra-wide level (a set of defaults provided by the Cassandra operator), and a table level, defined by the table schema, with some leeway to override the Cassandra-level defaults.
References
Other database technologies have similar features built in: