FS2
This document currently describes crazy thoughts and heretical ideas. It should not be taken as gospel truth, and none of this may ever see the light of day. However, there's been enough discussion about FS2 that we need some place to collect our thoughts.
Drawbacks of the Current Systems
The current FS backends (FSFS and BDB) have done well within their original design parameters, scaling beyond what most of the original implementors thought possible. This is a testament to both their flexibility and simplicity, but as additional feature requests have come along, it has become apparent that both the FSFS and BDB backends are lacking.
Some of the problems with the current systems that FS2 hopes to resolve are:
- The burden (and limitations) of maintaining two separate backend systems
- The intermingling of metadata with data
- The immutability of historic revisions
- An unhealthy obsession with the lower-level details of how bits are stored on disk
Since the advent of FSFS and BDB almost a decade ago, a number of new technologies have been introduced which allow us to design a better, more flexible system, while still maintaining the performance and other qualities that Subversion is known for.
Goals
The overarching goal of FS2 is to create a complete replacement for the current filesystem layer with an improved set of abstractions to allow pluggable storage interfaces for a variety of user needs.
FS2 backends should be able to perform such operations as:
- Answer questions like:
  - To where did a certain change propagate? (Forward history tracing.)
- "Obliterate" specified content, either by removing links to it or by removing it from the system entirely
- [Insert others here]
The abstractions of FS2 should be such that it is possible to plug in various backends for the physical storage of data with relatively little effort. (One obvious solution is using a key-value abstraction which can easily be implemented on a number of different platforms and data stores.)
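To make the key-value idea concrete, here is a minimal sketch of what such an abstraction could look like. The names `KeyValueStore` and `MemoryStore` and the three-method interface are assumptions made for illustration; nothing here is an agreed FS2 API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical minimal interface a pluggable FS2 backend might implement."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value stored under key, or None if the key is absent."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None:
        """Store value under key, replacing any previous value."""

    @abstractmethod
    def delete(self, key: bytes) -> None:
        """Remove key if present; do nothing otherwise."""


class MemoryStore(KeyValueStore):
    """Illustrative in-memory backend; a real one might wrap a database."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)
```

Under this sketch, a new backend would only need to supply these three operations, which is one way the "relatively little effort" goal might be met.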
For further discussion, see Requirements.
Design
All repository data will be stored in a canonical form, including property and directory contents. There will be only one correct way of representing data (which probably implies some sort of ordering constraint in the case of directory entries).
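As an illustration of the ordering constraint, the sketch below serializes a directory so that the same set of entries always yields the same bytes. The function name `canonical_directory` and the length-prefixed line format are invented for this example and are not a proposed FS2 format.

```python
from typing import Dict


def canonical_directory(entries: Dict[str, str]) -> bytes:
    """Serialize name -> content-id pairs in a single deterministic form.

    Hypothetical format: one line per entry, sorted by the UTF-8 bytes of
    the name, with length prefixes so unusual names stay unambiguous.
    """
    lines = []
    # Sorting on encoded bytes avoids any dependence on locale or collation.
    for name in sorted(entries, key=lambda n: n.encode("utf-8")):
        raw_name = name.encode("utf-8")
        raw_id = entries[name].encode("ascii")
        lines.append(
            b"%d %s %d %s\n" % (len(raw_name), raw_name, len(raw_id), raw_id)
        )
    return b"".join(lines)
```

Because insertion order no longer matters, two directories with identical entries serialize identically and therefore hash identically.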
Contents will be stored by hash in some sort of key-value store (which may be user-configurable), and information about those contents, such as relationships between them, will be stored separately. While the metadata may share the same key-value store as the contents, it will refer to them by hash.
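A rough sketch of this separation follows. The choice of SHA-256, the `ContentStore` class, and the shape of the metadata record are all assumptions for illustration; the document does not specify a hash function or a metadata schema.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Illustrative content-addressed store: the key is the hash of the value."""

    def __init__(self) -> None:
        # A plain dict stands in for whatever key-value store is configured.
        self._kv: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumed choice, not something FS2 has settled on.
        key = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same key, so they are stored once.
        self._kv.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._kv[key]


contents = ContentStore()
# Metadata lives apart from the contents and points at them by hash only.
metadata: Dict[str, Dict[str, str]] = {}

key = contents.put(b"hello\n")
metadata["example-node"] = {"content": key}  # hypothetical record layout
```

Storing the same bytes twice returns the same key, so deduplication falls out of the addressing scheme rather than needing separate logic.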
Metadata will store both forward and backward links for easier history traversal.
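The following sketch shows how bidirectional links could support forward history tracing, that is, answering where a change propagated. `HistoryGraph` and its methods are hypothetical names, and node identifiers are plain strings purely for the sake of the example.

```python
from collections import defaultdict, deque
from typing import DefaultDict, List, Set


class HistoryGraph:
    """Illustrative metadata index that records each link in both directions."""

    def __init__(self) -> None:
        self.forward: DefaultDict[str, Set[str]] = defaultdict(set)
        self.backward: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_link(self, predecessor: str, successor: str) -> None:
        # Writing both directions up front makes either traversal cheap later.
        self.forward[predecessor].add(successor)
        self.backward[successor].add(predecessor)

    def descendants(self, node: str) -> List[str]:
        """Return every node reachable by following forward links from node."""
        seen: Set[str] = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            # .get avoids creating empty entries for leaf nodes.
            for nxt in self.forward.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)
```

With only backward links, the same query would require scanning all history; keeping the forward index trades some extra writes for a direct traversal.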
We envision that the FS2 data mutation interface will look much like Ev2.
See the versioned filesystem theory and design notes for details.
Implementation
There is currently no implementation plan.
Further Reading
Despite concerns about an unhealthy obsession with storage layout, here is some relevant reading material:
- Stratified B-trees and versioning dictionaries
- (video of presentation about same)
- LevelDB, an example of an SDA-like key-value database