Configuration Spaces and Histories

Before we dive into versioned filesystem design and implementation details, let's first take a look at the concepts we're talking about. We'll begin by discussing what a configuration management system actually operates on.

N.B.: Although the following text uses mathematical concepts and notation, it's not intended to be a rigorous dissertation on configuration theory. We'll often make statements and draw conclusions without providing proof. Therefore, dear reader, you can either trust we're right, or do your own homework to prove (or disprove) our assertions. We will, however, try to conscientiously flag⚑ all leaps of faith in the text.

Definition: A data configuration is a unique pattern of data that can be represented as exactly one distinct sequence of binary digits.

  • The representation of the empty configuration E is the zero-length sequence.
  • The size, Sc, of a configuration c is the length of its representative sequence of binary digits.

Definition: A data configuration space is a metric space (B, D) where:

  • B is the set of all data configurations;
  • D: B × B ⟶ ℝ is a metric on B.
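
For reference, "metric" carries its usual meaning here. Written out in the standard textbook form (not a formulation specific to this text), D must satisfy the following for all a, b, c ∈ B:

```latex
\begin{align*}
D(a, b) &\ge 0                   && \text{(non-negativity)} \\
D(a, b) &= 0 \iff a = b          && \text{(identity of indiscernibles)} \\
D(a, b) &= D(b, a)               && \text{(symmetry)} \\
D(a, c) &\le D(a, b) + D(b, c)   && \text{(triangle inequality)}
\end{align*}
```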

Definition: A configuration history is a sequence of elements of B; in other words, it is a curve in (B, D).

Defining the Metric

We'll now show that we can indeed define a metric for B. One example is the Levenshtein distance,

L: B × B ⟶ ℕ₀

applied to the binary-digit sequence representation of data configurations. It is easy to show⚑ that L has all the required properties of a metric. In addition, it has the following properties:

  • The minimum distance between any two distinct configurations is 1.
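  • The distance between any two configurations is at most the greater of their sizes: ∀a,b ∈ B: L(a, b) ≤ max(Sa, Sb);
    • from which, together with the lower bound L(a, b) ≥ |Sa − Sb|, it follows that the distance from the empty configuration E to any configuration is that configuration's size: ∀c ∈ B: L(E, c) = Sc.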
  • All distances are non-negative integers.
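
The properties above can be spot-checked on a small, finite slice of B. The following is a minimal sketch, not a proof: it assumes configurations are modelled as Python strings of "0" and "1", and the names levenshtein, all_configurations, EMPTY and sample are made up for this illustration. The distance is computed with the usual dynamic-programming recurrence over insertions, deletions and substitutions.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two bit strings, e.g. "0110" and "010"."""
    # prev[j] is the distance between the already-processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, bit_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, bit_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                     # delete bit_a
                curr[j - 1] + 1,                 # insert bit_b
                prev[j - 1] + (bit_a != bit_b),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def all_configurations(max_size: int) -> list:
    """Every bit string of length 0..max_size (a finite slice of B)."""
    return ["".join(bits)
            for n in range(max_size + 1)
            for bits in product("01", repeat=n)]


EMPTY = ""  # stands in for the empty configuration E
sample = all_configurations(3)

for a, b in product(sample, repeat=2):
    d = levenshtein(a, b)
    assert d == levenshtein(b, a)      # symmetry
    assert (d == 0) == (a == b)        # distinct configurations are at least 1 apart
    assert d <= max(len(a), len(b))    # bounded by the greater of the two sizes

for a, b, c in product(sample, repeat=3):
    # triangle inequality
    assert levenshtein(a, c) <= levenshtein(a, b) + levenshtein(b, c)

for c in sample:
    assert levenshtein(EMPTY, c) == len(c)  # distance from E equals the size of c
```

Passing these checks on strings of up to three bits only illustrates the claims; it does not replace the proofs that the text leaves to the reader.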

In the rest of this text, we'll specifically talk about (B, L), that is, the data configuration space with the Levenshtein distance as its metric. The Levenshtein distance is also referred to as the edit distance. Even though the term "edit distance" is less precise in general, it fits the concept of edits (or edit transformations), which we'll discuss later on.

The History Curve