Table of Contents |
---|
Introduction
What's the difference between 'sync' and 'reintegrate' merges in Subversion? Could a single algorithm serve both purposes? How do 'cherry-pick' merges fit in?
Symmetric Merging and Complete Merges
This document is about how Subversion's merging capabilities compare with a symmetric model of merging. A symmetric model of merging is based on the idea that most merges are of the kind that takes "all the changes" from branch X onto branch Y. This is a model that people familiar with ClearCase tend to have.
In this model, merging doesn't come in two different forms. You make a branch. When you want to 'sync' the new branch, you merge (all the changes) from the original branch onto the new branch. When you want to 'reintegrate' the branch, you merge (all the changes) from the new branch back to the parent. The merge is the same operation, invoked by the same command and using the same logic, in either direction.
In this model, the branching and merging can be drawn in the form of a directed acyclic graph (DAG).
Certainly we will have to account for cherry-pick merges as well, but the main focus here is on complete merges. We investigate how Subversion's sync and reintegrate merges relate to the symmetric model of merging, and how we could move Subversion towards that model.
Info | ||
---|---|---|
| ||
[danielsh] State here briefly the reasons we may want to move towards that model? |
Abstract
The essence of this document is that it looks like we can relatively easily enhance Subversion so that a plain 'merge' command will act as either 'sync' or 'reintegrate' automatically, depending on the history of the branch, and furthermore will work better than either of them in the scenario of continuing work on a branch after a reintegrate.
We explain how Subversion performs sync and reintegrate merges, expressing the results in terms of copying logical changes from one branch to another. We compare this with the way a symmetric merge would handle the same cases, and we see how the symmetric merge can handle a much bigger set of useful cases. We also show how previous cherry-pick merges are handled in V1.7 and how they would be handled by a symmetric merge.
In particular, we show the limitations of using the so-called keep-alive dance after reintegrating a branch, and we show how much better the symmetric merge is in this scenario.
We start to develop the symmetric merge as an algorithm which can be implemented in Subversion using the existing V1.7 mergeinfo. It should be suitable for replacing both the sync and reintegrate algorithms, because it performs as a backward-compatible upgrade for each of them.
Background on 3-way merging
The following sections assume that the reader understands how the diff3 algorithm works. The diff3 algorithm is used to perform a merge between an original version of a file and two different versions of the file, each derived from the original version (in Subversion, the derived versions are often referred to as "mine" and "theirs"). This paper provides an explanation and an analysis of the diff3 algorithm: http://www.cis.upenn.edu/~bcpierce/papers/diff3-short.pdf
Playing catch-up with Sync and Reintegrate
Brane wrote:
No Format |
---|
When you create a branch, the origin of the branch is an interesting bit of information. However, for merging, it is entirely irrelevant if branch A was created from B or the other way around. To illustrate: (1) +- b@r2 ---- b@r3 ---- (branch) / | (merge) / v --- a@r1 -------------+- a@r4 ---- (2) --- a@r1 ----------- a@r3 ---- \ | (merge) (branch) \ v +- b@r2 ------+- b@r4 ---- Cases (1) and (2) are exactly equivalent as far as the merge algorithm is concerned, but Subversion calls the first a reintegrate merge and the second a sync merge, and treats them differently, as if branch (a) were somehow special. It's not. |
Let's examine how Subversion performs a sync merge and how it performs a reintegrate merge.
In the following example, two sync merges are shown. Let's jump straight to examining the second one. (The first one is just a simpler case of the same thing.)
To perform the second sync merge (before it was committed as revision B5), Subversion will:
- Find the common ancestor of A and B (just following the plain lines of descent, ignoring merges).
- That's state O on the graph.
- Consider all the changes along the history of branch A, after the common ancestor O up to and including A4.
- That's changes A1, A2, A3, A4.
- Look at B's mergeinfo, to see what has already been merged from A.
- That's changes A1 and A2.
- Determine the changes from A that have not yet been merged into B, according to the evidence of B's mergeinfo.
- That's changes A3 and A4.
- Perform a 3-way merge of A3 and A4 into B (actually into the WC based on B4). The base of the 3-way merge is the predecessor of change A3, which is A2.
- Finally, the user commits that WC as revision B5.
(The arcs that start with a circle and end with a T-shaped bar represent the span of the source side of the merge. See Graph_Notation.)
What if, instead of a second sync merge, at that point we try a reintegrate instead?
The main significant differences are:
- For the 3-way merge, Subversion selects a base node that is on the target branch rather than on the source branch.
- For the mergeinfo, Subversion records all the changes along the source branch (B), starting from 'O'.
It is interesting to consider the meaning of the mergeinfo that is recorded. The mergeinfo added to branch A in change A5 says that all the changes along branch B have been merged which, loosely speaking, is correct. But if we look carefully, that does not accurately descibe the set of logical changes that have been merged, because B3 was not a new logical change introduced on branch B. B3 (if viewed as a change on B) was the merging into B of changes A1 and A2, and we haven't tried to merge those back onto A; instead we've carefully chosen a 3-way merge that avoided trying to re-apply them to A and just merged the other changes from B.
More significantly, neither does the mergeinfo "B:1-4" accurately describe the set of physical changes that were merged. The physical (3-way) merge took the difference between A2 and B4, which we can describe as A2:B3 followed by B3:B4. The second part (B3:B4) we can descibe in mergeinfo syntax as "B:4". The first part, A2:B3, we can't describe using mergeinfo syntax. Logically it's equivalent to changes B1 and B2, but it's not physically the same, because A2:B3 is the rewriting of B1 and B2 into the context of branch A. (See The Two Sides of a Merge, below, for clarification.)
Subtree Merges
This section sets out how Subversion should perform a merge when a subtree of the merge source and/or target has its own mergeinfo that differs from the mergeinfo that it would otherwise inherit from its parent directory. In other words, the set of revisions of this subtree that have been merged differs from the set of revisions of its parent directory that have been merged.
The Principle
Each subtree should be considered separately in terms of calculating what needs to be merged.
We break down the whole merge into phases (revision ranges) and we complete each phase across all subtrees before moving on to the next phase.
The Rationale
In a complete merge, the user is always asking Subversion to merge all of the changes (up to a given limiting revision) that have not already been merged. We should not omit unmerged changes in a subtree just because they lie outside the revision ranges that need to be merged for the root of the merge.
Why do we complete a revision-range phase across all subtrees before moving to the next phase? The main alternative would be to merge each subtree in the minimum number of possible steps. Advantages: A logical single-revision tree state after each phase – may help if conflict resolution needed, as user will have a consistent reference base across the tree. We stop after a phase if there are any unresolved conflicts. Disadvantages: Any merges can conflict more if broken into phases than if applied in one go.
The Details
For a given merge request (from a source URL at a given revision to a target WC), apply that same request separately to each subtree. In principle, that means every subtree in the tree. In implementation, we can first work out which subtrees will need a different set of merges, and then treat just those subtrees specially.
The merge required for a subtree might have its base after or before the base of the parent's merge. It might even have its base on the opposite branch from the parent's merge.
Significant and Insignificant Subtree Merging
Often, Subversion creates subtree mergeinfo when the user performs a merge of a subtree (even a single file) knowing that that subtree contains all the changes that there were in the selected revisions. In these cases, mergeinfo on the root of the branch would be equivalent.
OTOH, when the subtree contains some changes, but other changes exist outside it, that's a significant subtree merge.
Example
Consider the following mergeinfo catalog:
^/branch: /trunk:100-110
^/branch/B: /trunk/B:101-104
^/branch/C: /trunk/C:104-107
The merge 'svn merge ^/trunk@110
' (into ^/branch
) would break the merge into the following ranges: [100], [101-103], [104], [105-107], [108-110]. Each range would be treated independently, and an actual merge will only be attempted on the subtrees that do not indicate they have already been merged that range. (For example, the [101-103] range would not be merged into the ^/branch/B
subtree. It would be merged into its parents and siblings, though, just as with 'svn up --set-depth=exclude B && svn merge -c 101:103 --disregard-mergeinfo
'.)
Mixed-Rev, Switched, or Sparse WC
This section considers conditions of the target WC that have little to do with the functioning of the symmetric merge algorithm itself but affect the way the implementation works when applying changes into the target WC.
The target WC may have any of the following conditions.
- mixed-revision (not all nodes have the same base revision)
- contains any switched subtrees
- is sparse (incomplete)
- ...
Often, for a merge target, some of these conditions are considered to be undesirable because they tend to add complexity without providing a needed benefit for the user. In some cases there may be benefits for the user.
The merge API may check for these conditions and raise an error. For conditions that the implementation can handle, the check should be overridable.
Requirements
Mixed-Rev WC
No use cases known; just a minor convenience for the user.
Switched Paths in the WC
For a merge that affects paths both inside and outside a switched subtree: No use cases known.
For a merge that affects paths inside or outside a switched subtree but not both: This seems like it could sometimes be convenient, when the user knows that a given merge will affect either paths inside or outside a switched subtree but not both.
See my reply and any subsequent replies for an attempt to capture the rationale for the existing behaviour in the form of a requirement spec.
Sparse WC
[TODO]
Functional Spec
Mixed-Rev WC
[TODO]
Switched Paths in the WC
See the section "Switched Paths" in the merge tracking functional spec. Fuller explanation of the first two bullet points, see Paul's email. See my reply and any subsequent replies for more, including deleting a switched path.
Sparse WC
See the section "Sparse Checkouts" in the merge tracking functional spec.
[TODO] How does 'depth' work if you have added children to a directory that was initially sparse (and so presumably still recorded in the WC as being 'sparse') or, conversely, have excluded children from a directory that was initially depth non-empty?
Cherry-pick Merges
[Note: This section is rather long, and should be considered more as reference material than an introduction, as it aims to set out all the possible cases.]
A cherry-pick merge, for our purposes here, is one that takes from the source branch a change that is not the first unmerged change, thus leaving an "gap" of unmerged changes between sequences of merged changes. By contrast, a complete merge (a sync or a reintegrate) ensures that all source changes up to and including the specified one are merged to the target.
We are not going to discuss here how a cherry-pick merge is performed. (See FourWayCherryPick for an interesting side note about that.) Instead we are concerned with how a complete merge will account for any changes that have already been merged by cherry-picking.
If a cherry-pick merge has occurred since the last complete (sync or reintegrate) merge, the next complete merge should, loosely speaking, skip it. We can classify such a cherry-pick by its position and direction with respect to the last complete merge:
Any cherry-pick merge that could be requested from one branch to the other falls into one of those three cases or is a degenerate case. The degenerate cherry-pick cases are:
- The source change is already on the target branch (such as A1 or B2 in the 'cherry 1' graph), which is a no-op.
- The source change is the first source change that hasn't already been merged to the target (such as A2), which turns into a sync merge.
- The cherry pick occurs before any sync or reintegrate merge. In this case the last 'complete merge' point is the branch point (common ancestor).
We need to consider each case in two contexts: when the next complete merge is in the same direction as, or in the opposite direction from the last complete merge. We'll call the first context "Sync Again" for short, and the second context "Reintegrate". The base for the (initial) 3-way merge will be on the source branch in the first context, and on the target branch in the second context.
Cherry-Pick 1 and Sync Again (Merge in Same Direction)
Subversion's sync merge handles cherry-picks in the same direction by skipping over them, splitting the merge into two or more phases, with each phase performing a 3-way merge of one sub-range of the requested source range.
Requested:
Performed as two phases:
Cherry-Pick 2 and Sync Again (Merge in Same Direction)
Subversion's sync merge doesn't handle cherry-picks in the opposite direction. Given this requested merge:
... Subversion simply does not notice the cherry-pick and so performs a merge that attempts to re-apply some changes that originated on the target branch. This relies on (automatic and/or manual) conflict resolution to weed out the duplicate changes.
What should happen is A5 should be recognized as a (merge of a) change that we already have on B, and so the merge should be broken down into two phases, skipping A5 (similar to how the cherry-pick 1 case above skipped A4), like this:
Cherry-Pick 3 and Sync Again (Merge in Same Direction)
Just like "Cherry-Pick 2 and Sync Again", Subversion V1.7 does not notice the cherry-pick in this case, and what should happen is it should break the merge into two phases.
Cherry-Pick 1 and Reintegrate (Merge in Opposite Direction)
[??? Need to check this] Subversion's reintegrate doesn't recognize cherry-picked revisions in either direction. It looks specifically for the last complete catch-up.
Cherry-Pick 2 and Reintegrate (Merge in Opposite Direction)
Just like "Cherry-Pick 2 and Sync Again", Subversion V1.7 does not notice the cherry-pick in this case, and what should happen is it should break the merge into two phases.
Cherry-Pick 3 and Reintegrate (Merge in Opposite Direction)
Like "Cherry-Pick 2 and Sync Again", Subversion V1.7 does not notice the cherry-pick in this case.
What should happen, however, is more complex. The cherry-pick logical change (B3) is not one of the changes we were planning to merge. The first change we're planning to merge is A1:B4, which represents three logical changes (B2, B3, B4). We're not going to attempt to take part of that first physical change (A1:B4), so our options are to complain to the user, or to go ahead and accept that part of what we're merging is already on the target.
The Keep-Alive Dance
If you want to continue working on a branch after it has been reintegrated, we have been saying that there is a choice of two solutions:
- delete the branch and re-create it
- do a record-only merge of the reintegrated revision
The latter is informally known as the "keep-alive dance". What exactly does it do?
The arrow labeled "cherry" is the record-only merge. After this, a subsequent sync merge is broken down into two phases. For the first phase, Subversion locates A1 as the base and merges changes A2 and A3 into the WC. For the second phase, Subversion locates A4 as the base and merges A5 into the WC.
This is all very well, and merges the right set of logical changes, but the first phase is not as good as it could be. Changes A2 and A3 have already been merged with the early part of branch B (up to B3) and the result committed as A4; that means the user has already done any necessary conflict resolution for that part of the merge. By merging the same physical changes A2 and A3 again into B, we require the user to resolve the same conflicts again for this part of the merge, as well as solving any further conflicts related to merging change A5. (Of course in any given particular merge it is possible that there might be no conflicts, or no conflicts that aren't resolved automatically at the text level by the 3-way merge tool, but I'm talking about the general case.)
What should happen instead? It would be better to choose the younger common ancestor B3 rather than A1 as the base of the merge, and therefore merge the change B3:A4 instead of A1:A3, because then any conflict resolution associated with merging A:2-3 with B:2-3 has already been done (in A4) and we'd only be concerned with merging the newer changes on A (B3:A4, A4:A5) with the newer changes on B (B3:B5). See Unifying Sync and Reintegrate, below.
Up until this point I was working on trying to make Subversion automatically skip an old reintegrate in the same way that the keep-alive dance (record-only merge) would skip it, but now we can see that the keep-alive dance is simply wrong: it is worse than deleting the branch and creating it again. So I'll turn my attention to calculating the proper merge.
Unifying Sync and Reintegrate
What should happen when we try another merge after a reintegrate?
[Note: the bend in the 'reint' arrow is just an oddity of the graph drawing program; it doesn't mean anything special.]
We would call the merging direction in the first case a reintegrate, and in the second case a sync, but notice that the functionality needed for the reintegrate appears to be exactly what Subversion's sync merge does, and vice-versa. Looking more carefully at the second case ...
... we would want the new mergeinfo to reflect the idea that "all" the changes from the source branch have been merged to the target, with just the same form and meaning as described above for the basic reintegrate merge.
The conclusion is: to merge correctly, the 3-way merge base should be chosen on 'this' or 'that' branch according to the direction of the last merge, not according to whether the current merge is a reintegrate or a sync.
We call the result a symmetric merge algorithm.
In V1.7 we have sync which looks for a base on the source branch, and reintegrate which looks for a base on the target branch. What we need is a single algorithm that finds the most recent base, on either branch. Like the V1.7 reintegrate, it needs to choose a base for the 3-way merge, and potentially a different base (this one always on the source branch) for the mergeinfo to be recorded.
Considering the final merge in the "sync after reintegrate" graph (repeated from above), we have:
- mergeinfo on A5, saying "B:1-3"
- mergeinfo on B5, saying "A:1"
The longest continuous prefix of branch A that's fully merged into B is A:1. The longest continuous prefix of B merged to A is B:1-3. We look for a continuous prefix because we need to ignore any later revisions that may have been cherry-picked, after a gap, because in that case it's this first gap that is what we need to be filling first.
Youngest common ancestor (w.r.t. complete merging of the branches) is B3, so that's the 3-way base to use.
End of longest continuous prefix of source branch is A1, so that's the mergeinfo base to use.
Symmetric Merge Algorithm
This algorithm provides a symmetric merge with skipping of source revisions that have already been cherry-picked to the target branch.
- Find the latest rev of A synced to B and of B synced to A.
- Choose a base.
- Identify cherry-picks.
- Break into 3-way merges, skipping the cherry-picks.
- Perform the 3-way merges and mergeinfo addition.
In more detail.
- Find the latest revision of A synced to B and the latest revision of B synced to A.
- This means, for the A to B direction, the youngest revision 'rN' on A at which all revisions of A, up to and including rN, have effectively been merged into B. "Effectively...merged" means that one of the following is true for every revision X on A, where X <= rN:
- The merge of rX from A -> B is recorded in the mergeinfo on the version of B that is being used as the merge target. Usually the result corresponds to the most recent merge into B from A, but not necessarily, because later revisions from A might previously have been cherry-picked into B.
- The merge of rX from Subtree-of-A -> Subtree-of-B is recorded in the subtree mergeinfo on the version of B that is being used as the merge target *and* rX is operative only within Subtree-of-A (i.e. the subtree merge of rX, if repeated at the root of A <-> B would be inoperative except for mergeinfo changes).
- rX is inoperative on A.
- We consider only direct A <-> B mergeinfo (including subtree mergeinfo under A or B). (Consideration of a merge graph extending to other branches is future work. Since we record mergeinfo transitively, considering only direct A<->B mergeinfo already suffices for some 3-branch scenarios too, but not all such scenarios.)
TODO: If, in a later phase of development, we're going to examine indirect mergeinfo for detection of cherry picks, then probably we also need to do so here.
No Format In: A_tip: pathrev B_wc: wc-abspath (Note: B is a WC "actual" tree including any local mods.) Out: A_base: pathrev # last location on A that's fully sync'd to B A_to_B: pathrev # location on B where became fully sync'd from A B_base: pathrev # last location on B that's fully sync'd to A B_to_A: pathrev # location on A where became fully sync'd from B (Note: B_base can't possibly be B_wc if B_wc has local mods, because that modified version can't have been merged to A because we can only merge committed changes. A_to_B, on the other hand, probably can be B_wc so perhaps its data type can't be strictly a pathrev.) Implementation: Currently: find_base_on_source() find_base_on_target() TODO: fill in the details here and re-visit the implementation.
- This means, for the A to B direction, the youngest revision 'rN' on A at which all revisions of A, up to and including rN, have effectively been merged into B. "Effectively...merged" means that one of the following is true for every revision X on A, where X <= rN:
- Choose a base.
- Choose the latest revision of A synced to B or the latest revision of B synced to A.
- Each candidate base is a common ancestor of the source and target, if ancestry means following either the direct line of descent or the arrows that indicate complete merges.
- Choose the 'more recent' one in some sense – be it time of commit, or number of changes since then, or some other measure.
- [TODO] In what cases do the selection methods give different results? Only after a criss-cross merge?
TODO: Compare A_base.rev with B_base.rev only, or compare also A_to_B with B_to_A?
No Format In: A_base A_to_B B_base B_to_A Out: base: pathrev # A_base or B_base mid: pathrev # A_to_B or B_to_A Implementation: Currently, in find_symmetric_merge(): (A_base->rev > B_base->rev) ? A_base : B_base
- Identify cherry-picks.
- Find changes along the source side of the merge that are already 'in' the target.
- Look for merges in both directions (from current source to current target ("forward") and from current target to current source ("backward")). For each merge:
- Break down the merge into its component logical changes.
- If the merge consists entirely of logical changes that are not in the target, or consists entirely of logical changes that are in the target, treat it as a unit – a cherry pick.
Otherwise, fall back to the user: report to the user which logical changes that merge includes, and suggest that the user run more specific merges that pull only the subset of logical changes they want.
3a. (Only needed for implementation phase 2.) Create an (implicit or explicit) list of indivisible changes along each side of the merge, since the BASE.No Format For example: Source side Target side B@10:A@15 B@10:B@11 (B@10 is BASE) A@15:A@20 B@11:B@12 ... ... A@26:A@27 B@29:B@30 B@30:B_wc (B@30 is B_wc's origin) In: base mid A_tip B_wc Out: A_changes: a list of (URL1@REV1 : URL2@REV2) pairs? Or, more compactly, BASE plus a list of REVs that index into the branch history? B_changes: the same Note: The first change in one of the lists might be a cross-branch change (from 'BASE' on the target branch to 'MID' on the source branch), whereas all subsequent changes are changes along a branch. Note: The B_changes output needs to be able to represent B_wc as its end point, implicitly or explicitly. Note: These lists need to reference changes in such a way that we can fetch and examine the changes, at least in terms of their mergeinfo and whether they're operative. TODO: Should the outputs identify each individual change (revision) as operative or inoperative, or is it acceptable to output revisions (or ranges of revs) that are not all known to be operative?
3b. Discover the changes from A that have been merged to B since BASE.
No Format In: A_changes B_changes Out: A_changes_to_skip: a subset of A_changes A_changes_to_warn: a subset of A_changes Implementation phase 1: This detects direct cherry-picks in the same direction as the present merge, which is all that Subversion 1.7 supports. Fetch mergeinfo on B_wc; note any changes from A that are recorded as merged to B (since BASE, and directly) as "already on B". Implementation phase 2: This adds detection of cherry-picks in the opposite direction, which is new functionality. For each change in A_changes: Fetch the mergeinfo change. If the mergeinfo did change (it represents a merge into A): If that merge was a direct merge from B, and from nowhere else, add this A change onto A_changes_to_skip. If that is, instead, a merge from both B and other sources, then add this A change onto A_changes_to_warn. Add this to the phase 1 results for A->B. Note: If the first change is a BASE:MID cross-branch change, it must first be turned into the equivalent series of source-side changes, which is a non-trivial exercise. Implementation phase 3: This detects both direct and indirect (that is, via a third branch) cherry-pick merges, in both directions. This is more complex, and a simple implementation of it may run slowly in some scenarios. For each change in A_changes: Fetch the mergeinfo change. If the mergeinfo did change (it represents a merge into A), break down that merge into its component logical changes, else take this change on A as the logical change to test. See if those logical changes are already on B. If all of those logical changes are on B, add this A change onto A_changes_to_skip; if some of them are on B, add this A change onto A_changes_to_warn.
Break into 3-way merges, skipping the cherry-picks.
No Format Implementation: # Define a merge as (base, tip, target). merges = [(BASE, A_tip, B_wc)] for change in A_changes_to_skip: m = find change.rev in merges replace [m] with [ (m.base, change-1, m.target), (change, m.tip, m.target) ] # elide either or both of those if they are no-ops
Perform the 3-way merges and mergeinfo addition.
No Format Implementation: for m in merges: three_way_merge(m.base, m.tip, m.target) if m.base is on A: do a normal mergeinfo addition else: mergeinfo += (all source revs from YCA(?) up to m.tip)
Symmetric Merge with Criss-Cross Merge
The following kind of merge history is known as a 'criss-cross' merge.
The scenario is notorious for being an awkward case for typical DAG-oriented merge algorithms to handle. At first glance, there are two likely-looking nodes that could be considered as a base, but neither of them leads to a correct 3-way merge in the general case where changes have been made on both branches. In fact, in general there is no way to reconcile the two branches just by using simple 3-way merges without reverse-merging and user interaction.
I don't expect to be able to teach Subversion to resolve criss-cross merges automatically, and I don't think this is important. I think it's on the same level as resolving cases where the user cherry-picked some changes from branches A and B into a third branch C all at once (in one commit) and then wants to automatically sync-merge from C into A. In this case as in that case, Subversion should simply detect that there is no sequence of plain 3-way merges possible, and tell the user.
We now analyze what Subversion's symmetric merge algorithm will do in such cases.
Criss-Cross #1: Minimal
This is how Subversion will handle the simple criss-cross merge illustrated in the graph above.
First we pick a base according to the rules we decided. (At the time of writing this section, I don't know exactly what those rules are.) If the rules take into account the relative ages of A1, B1, A2 and B2, there might be a definite answer. Otherwise there may be no reason to choose one over the other. So, given only the information in the diagram, it could be A1 or B1 with equal probability. Let's see if the choice makes a difference.
If we pick base A1:
- Candidate changes are { A2 }.
- Target natural history is { B1, B2 }; target mergeinfo is { A1 }.
- Notice candidate A2 can be skipped because it represents a logical change (B1) that is already in the target branch's natural history.
- Skip A2 from the source ranges.
- Update the mergeinfo on B. (See "Recording Mergeinfo for Skipped Changes".)
- Perfect result. (In this case, no-op.)
If we pick base B1:
- Candidate changes are { B1:A2 }.
- Target natural history is { B1, B2 }; target mergeinfo is { A1 }.
- Notice candidate B1:A2 can be skipped because it represents a logical change (A1) that is already recorded as merged into the target branch.
- Skip B1:A2 from the source ranges.
- Update the mergeinfo on B. (See "Recording Mergeinfo for Skipped Changes".)
- Perfect result. (In this case, no-op.)
One non-trivial requirement here is that the cherry-pick identification step recognizes that B1:A2 is a candidate change (although it is not in itself a change on the natural history of any branch), and that it represents logical change A1.
Criss-Cross #2: With unmerged logical changes on the source
In the general case, a criss-cross merge need not be a no-op. Consider the following case:
If A1 were picked as base:
- Candidate changes are { A2, A3 }.
- Target natural history is { B1, B3 }; target mergeinfo is { A1 }.
- Notice candidate A3 can be skipped because it represents B1, as before.
- Merge just A2 to B.
- Update the mergeinfo on B. (See "Recording Mergeinfo for Skipped Changes".)
- Perfect result.
If B1 were picked as base:
- Candidate changes are { B1:A3 }.
- Target natural history is { B1, B2 }; target mergeinfo is { A1 }.
- Notice candidate B1:A3 represents logical changes { A1, A2 }. A1 is already on B; A2 isn't. Therefore we can neither merge nor skip this change.
- Bail out, telling the user there's a problem.
It appears that in this case there is a "good" base and a "bad" base. We could automatically detect this (by trying both ways, if not more easily).
If we don't automatically detect and select the good base, then criss-cross merges would sometimes work and sometimes not, with little clue as to why.
It seems like there must be more complex cases in which neither base will work.
Criss-Cross #3: with unmerged logical changes on both sides
Let's try the same but with a change in B as well.
If A1 were picked as base:
- Candidate changes are { A2, A3 }.
- Target natural history is { B1, B2, B3 }; target mergeinfo is { A1 }.
- Notice candidate A3 can be skipped because it represents B1, as before.
- Merge "just" A2 to B.
- Conflict between A2 and { B1, B2 } has to be resolved.
- Conflict between A2 and B1 had already been resolved in the B1 -> A3 merge, so this part of the user's work is repetitive.
The two sides of the merge are A1:A2 which we can write as { A2 }, and A1:B3 which represents original changes { B1, B2 }. The conflict between A2 and B2 has not been resolved before, but the conflict between A2 and B1 had already been resolved by the user during the merge B1->A3.
The conflict between A2 and B1 will be broadly the same as it was before. It will not be exactly the same, because the conflicting hunks that were previously seen in B1 are now being seen in state B3 where they may have been modified or moved around by the subsequent changes B2 and B3.
The quality of the result here can be judged by how much conflict the merge encounters that the user feels should not be necessary. If there is a lot of conflict between A2 and B1 (perhaps B1 is a big change), the user may feel that it should not be necessary to resolve that part of the conflict again. If there is little conflict between A2 and B1, resolving that again is no big deal.
If B1 were picked as base:
- Candidate changes are { B1:A3 }.
- Target natural history is { B1, B2, B3 }; target mergeinfo is { A1 }.
- Notice candidate B1:A3 represents logical changes { A1, A2 }. A1 is already on B; A2 isn't. Therefore we can neither merge nor skip this change.
- Bail out, telling the user there's a problem.
If O were picked as base:
- Candidate changes are { A1, A2, A3 }.
- Target natural history is { B1, B2, B3 }; target mergeinfo is { A1 }.
- Notice candidate A1 is already on B, and A3 represents B1, so those can be skipped.
- The merge would then be: base A1, source-right A2, target B4.
That is then the same as if we'd picked A1 as base.
Criss-Cross #4: with unmerged logical changes on both sides
Let's try the same but with changes after the merges as well.
If A1 were picked as base:
- Candidate changes are { A2, A3, A4 }.
- Target natural history is { B1, B2, B3, B4 }; target mergeinfo is { A1 }.
- Notice candidate A3 can be skipped because it represents B1, as before.
- Merge the changes { A2, A4 } to B: that is, two 3-way merges, first { base A1, source-right A2, target B } then { base A3, source-right A4, target B }.
Conclusion about Criss-Cross merges
As noted above, I don't expect Subversion to solve criss-cross cases, at least I put this at a lower priority than 3-branch merge patterns.
It seems that sometimes there is a "good" base: that is, in some cases it is possible to choose a base and break down the required merge into a series of simple 3-way merges. If we can do so without too much trouble, it would be nice to automatically do this when possible.
TODO: find an example where there is no good base. Maybe if we extend #3 by adding further changes after A3 and B3 before the merge attempt?
Can we define the base-choosing algorithm so that it always picks the "good" base when there is one? The last rev on A synced to B is A1, and the last rev on B synced to A is B1. Normally, we'd expect either B1 to be >= the target of the A1->B merge, or vice versa. But neither is the case here; that's more or less what defines this as a criss-cross. There's something special going on, so the presence of change A2 makes a base on the source branch succeed whereas a base on the target branch fails. We perhaps just need to codify that "something special" – some rule that says which base to consider the "better one" depending on the relative ages of ... A1, B1, A3, B3, and A2.
Criss-Cross Merge References
- "LCA Tree Merging" in Bazaar, <http://doc.bazaar.canonical.com/bzr.dev/developers/lca_tree_merging.html>. This page "describes how we handle merging tree-shape when there is not a single unique ancestor (criss-cross merge)".
The User's Perspective
Let's remind ourselves broadly what we are aiming to achieve with this 'symmetric merge' development. An improved user experience for merging, of course. How are users going to appreciate this change as an improvement, exactly? To set the context, let's think back to the time when we introduced merge tracking. Forgive me for repeating what is old news to many of us. We could paraphrase that news as:
- Subversion now remembers what revisions you have merged, so that if you want to merge "everything that's new since last time", you don't need to tell it. At least from a given source to a given target.
- When you want to merge back in the other direction, don't use a plain "merge" command because that will go through the motions of merging but produce the wrong result (in general).
- You can tell Subversion you are merging in the "other direction" by specifying this is a "reintegrate", and then it will do it right.
- If you specify "reintegrate", many more restrictions apply.
- Don't specify "reintegrate" when you are merging in the same direction as last time; if you do, Subversion will go through the motions but produce the wrong result.
- Don't expect to be able to merge between the same two branches again, after a reintegrate, without additional effort.
As a user who understands Subversion's merging, we know which variant of the merge algorithm is needed for a given scenario and can tell Subversion whether it should do it the "reintegrate" way or the "non-reintegrate" way. But just as merge tracking is about Subversion knowing what revisions need to be merged so we don't have to tell it, it also has enough information to work out which variant is needed, so we shouldn't have to tell it.
The essential points about 'symmetric' merge are:
- We can now expect Subversion to perform a requested merge from A to B without having to tell it which variant of the merge algorithm is the right one for the job, since only one variant is right and it will figure out which one.
- We can now expect Subversion to perform a requested merge from A to B or B to A, no matter how many times we have merged from A to B and/or B to A before.
- In the current, crude implementation, all the "reintegrate" restrictions still apply when that style of merge is needed. (In theory, we could write a symmetric implementation.)
I think one of the enablers for this advance is that we now have a better model of what merge tracking means [5]. This model allows us to describe what a merge will do in terms of bringing original changes (that is, non-merge commits) onto a branch, which is a bit higher level than describing merges in terms of mergeinfo and three-way diffs. In this model, "get me all the latest changes" has a consistent meaning no matter which way you last merged – it is a symmetric model.
Appendices
References
http://www.perforce.com/sites/default/files/convergence-divergence-merging-wingerd.pdf
Particularly slide 20, "The effect of 'edit' arrows", which is showing how merging "up" does not necessarily guarantee convergence (that is, make the content of the target branch look just like the source branch), even if the source branch has just been synced from the target.
Origin of a Branch
As indicated in the quote by Brane at the top of this page, it shouldn't make any difference in theory whether you branched B from A:
or A from B:
And indeed Subversion doesn't care. When tracing the youngest common ancestor of two branches (in terms of branching/copying, that it, not in terms of merges) Subversion follows the "copy history" of each branch which means it follows through both renames and branching. The only thing that matters is that the two branches have some common ancestor. Therefore, in most of the graphs here, the common ancestor is shown indicatively as "O":
The Two Sides of a Merge
The result of a 3-way merge can be seen from two sides. Consider this simple scenario:
Usually we see this merge as a change that took place along the history of branch A. We see a change A3 consisting of the addition of some changes from B, specifically the change B2 (strictly speaking, O:B2) into the content of A2.
Alternatively, and especially if we decide to use B2 as the base for a subsequent merge, we can see the same revision A3 as a change that occurred along a line of merge history that started at B2; that is, we are looking the change B2:A3:
Looking from this "side" of the merge, the change B2:A3, which is the difference between B2 and A3, consists of the addition of change A2 (strictly, O:A2) into the content of B2.
Symmetry
The merge is combining changes from both branches into one result, and, in principle, the merge process need not care which branch the result is destined for: the very same resulting tree of files and directories could equally well be produced by a merge in the other direction, from branch A to branch B:
Of course, while in theory the result is independent of which branch is the source and which is the target, in practice the merge may give different results. For example, if changes A2 and B2 both add some lines at the same place in a certain file, then perhaps the 3-way merge tool (or the user, for whom the tool is just a tool) will choose to put the source-branch lines before the target-branch lines in the result. In such ways, the result may differ. But notice that, using a suitable 3-way merge tool and/or user input, the result could be guaranteed identical.
Terminology
sync merge
indent |
---|
In this Wiki page, I use "sync merge" to mean a _complete merge_ in which the _base node_ is on the source branch. Outside this Wiki page, "sync merge" may be used more generally to mean any _complete merge_. In svn 1.7, this is obtained by specifying "svn merge" without the "--reintegrate" option and without any minimum revision number and without any of the conditions or options that would prevent merge tracking. |
indent |
---|
A _complete merge_ in which the _base node_ is on the target branch: that is, one that is in the opposite direction from the previous _complete merge_. In svn 1.7, this is obtained by specifying "svn merge" with the "--reintegrate" option. |
indent |
---|
A merge that merges all the changes from the source branch (up to the current head, or up to some specified maximum revision) that have not yet been merged into the target branch. This is in contrast to a _cherry-pick_ merge. A _complete merge_ is always a _forward merge_. A _sync_ merge and a _reintegrate_ merge and a _symmetric_ merge are all _complete_ merges. |
indent |
---|
A _complete merge_ that works in either direction between two branches, no matter which branch is regarded as the parent and which as the child, and no matter which direction any previous merges between those branches were performed. A _complete merge_ is the normal kind of merge in systems such as ClearCase; this paper is about developing such a merge in Subversion. |
indent |
---|
A merge that leaves a gap in the series of changes that have been merged from the source branch. This is obtained by specifying a minimum revision number. This is in contrast to a _complete_ merge. It can be a _forward_ or a _reverse merge_. A _cherry-pick_ merge does not necessarily ignore mergeinfo; in svn 1.7, the default behaviour of a _forward_ cherry-pick merge is to use mergeinfo to omit any of the specified revisions that have already been merged. |
indent |
---|
A merge that applies a pre-existing change again in the same sense. Symmetric merging is only concerned with _forward merges_. Contrast with a _reverse merge_. |
indent |
---|
A merge that applies the inverse of a pre-existing change. Symmetric merging is not concerned with reverse merges. Merge tracking does not fully track reverse merges comprehensively; it only tracks whether a branch is currently considered to "have" a given change. The opposite of a _forward merge_. Note that a single "svn merge" command can apply a mixture of _forward_ and _reverse merges_. |
indent |
---|
The sequence of path-revision coordinates where a node existed, if you trace it back through history along its normal changes and any copy-from links. Nothing to do with merge arrows. |
Drawing The Graphs
The graphs in this page are constructed with the program merge-graph.py from configuration files such as merge-reint-1.txt which may be found alongside the corresponding .png files as attachments to the Wiki page.
Graph Notation
A name and number such as "A2", shown in an ellipse on the graphs, means version "2" of branch "A". That means the state of branch "A" in some revision of the Subversion repository; the actual revision number is not stated.
Time advances from left to right along lines on the graph, both the dotted lines showing natural history and the solid or dashed lines showing merges. Where there is no line, no time relationship implied: the revision number of B1 could be lower, the same or higher than the revision number of A1, and likewise B2 could have been committed before A1.
In some contexts, "A1" refers to the change that describes revision A1 relative to its predecessor along its line of natural history. Mergeinfo syntax, for example, refers to changes in this way.
The arcs that start with a circle and end with a T-shaped bar represent the span of the source side of the merge. The circle means "exclusive" and the bar "inclusive", in the sense that an arc from A1 to A2 doesn't include the change associated with A1 (that is, the difference between state A0 and state A1) and does include the change associated with A2 (the difference between state A1 and state A2).
Info | ||
---|---|---|
| ||
The assumption is that the sync merge is one special case of merging and so can take one set of shortcuts, while the reintegrate is another special case and can take a different approach with its own set of restrictions and short-cuts. [2] ClearCase can perform a cherry-pick merge, and may even be able to record some metadata about it, but (as far as I can discover) the merge algorithm doesn't take account of cherry picks when working out what needs to be merged. Git is similar; from what I read recently it has an extension that can perform a cherry-pick merges and store cherry-pick metadata somewhere and use it to track cherry picks when using that extension – but the metadata is completely separate from the fundamental merge DAG, and AFAIK the standard merges don't recognize and account for cherry picks. |