Move Tracking in the Update Editor
Summary
A WC can contain multiple instances of the same repository node, as a result of mixed-revision and/or switched paths. During an update, multiple instances can move to the same target path, and one instance can move to multiple target paths. We can't reasonably avoid WCs getting into such a state, nor forbid updating such a WC.
An editor that can perform one move per node (that is, per node-copy-id) is suitable for editing a repository state, and can thus be used as the commit editor. The existing update code, however, drives an edit over WC paths, not over repository nodes or URLs, and the same node can appear at several WC paths. An editor that can perform only one move per node therefore cannot be used as the update editor.
Therefore, the update editor must somehow handle multiple source and destination paths for the same move.
Details
WC has Multiple Instances per Node
A WC can contain multiple instances of the same repository node.
- A switched path points to the same or a different revision of any node
WC: (A@10, X=^/A@10)
WC: (A@10, X=^/A@11); repo: (^/A@10 and ^/A@11 are the same node-rev)
- A non-switched path points to a different revision of a moved node
WC: (A@10, B@20); repo: (mv ^/A@10 ^/B@20)
The WC does not (currently) know about node-copy-ids, and so does not know that it has multiple instances of the same node, except in the simple case where the URL@REV is the same.
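The simple case mentioned above, where two WC paths share the same URL@REV, can be found by grouping paths on that key. The following is a minimal sketch in Python, not Subversion code: the function name `find_duplicate_instances` and the input mapping of WC path to (URL, revision) are assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_instances(wc_base):
    """Group WC paths by their base URL@REV.

    wc_base: dict mapping a WC path to a (repository URL, revision) tuple.
             This shape is hypothetical, chosen only for the sketch.
    Returns a dict of (url, rev) -> sorted list of WC paths, keeping only
    the URL@REV keys that occur at more than one WC path.
    """
    by_url_rev = defaultdict(list)
    for path, url_rev in wc_base.items():
        by_url_rev[url_rev].append(path)
    return {key: sorted(paths)
            for key, paths in by_url_rev.items()
            if len(paths) > 1}


# WC: (A@10, X=^/A@10), plus an unrelated path B.
wc = {"A": ("^/A", 10), "X": ("^/A", 10), "B": ("^/B", 10)}
duplicates = find_duplicate_instances(wc)
```

This only covers the identical-URL@REV case. Instances such as (A@10, B@20) after a move cannot be found this way, because recognizing them needs node-copy-ids that the WC does not have.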
Multi-Source and Multi-Target Moves
During an update, multiple instances of one repository node can move to the same target path, and one instance can move to multiple target paths.
Multiple Sources
WC: (A@10, B@20); repo: (mv A@10 → B@20 → C@30)
Update to r30; A moves to C, and also B moves to C.
    |                       |
    +-- A  mv--\            |
    |           \           |
    +-- B  mv--\ \          |
                \-\-->      +-- C
If multiple sources have the same tree shape (switches, depth, etc.) and no local modifications, then it makes sense for the WC to simply accept the single destination. If there are local modifications to multiple source instances, then the client might want to merge them or raise a conflict.
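The decision described in this paragraph can be sketched as a small function. This is illustrative only: the `SourceInstance` record, its `shape` field (standing in for switches, depth, etc.) and the result strings are assumptions. Cases the text does not address are reported as unspecified rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceInstance:
    path: str
    shape: str            # hypothetical summary of switches, depth, etc.
    has_local_mods: bool


def decide_multi_source(sources):
    """Suggest how a client might treat several sources of one move."""
    same_shape = len({s.shape for s in sources}) == 1
    modified = [s for s in sources if s.has_local_mods]
    if same_shape and not modified:
        # Same shape, nothing local to lose: accept the single destination.
        return "accept-single-destination"
    if len(modified) > 1:
        # Local mods in more than one instance: merge or raise a conflict.
        return "merge-or-conflict"
    # Differing shapes, or mods in only one instance: not covered above.
    return "unspecified"


clean = [SourceInstance("A", "depth=infinity", False),
         SourceInstance("B", "depth=infinity", False)]
dirty = [SourceInstance("A", "depth=infinity", True),
         SourceInstance("B", "depth=infinity", True)]
```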
One Move, One Non-Move
This case is similar to Multiple Sources, with one important difference. If we assume an editor in which each move is labelled by a move-id, the consumer cannot recognize such a conflict just by examining the move-ids.
WC: (A@10, B@20); repo: (r10: mkdir A; r20: mv A B; edit B; r30: edit B)
Update to r30; A moves to B (with edits), and also the existing B is updated.
These changes conflict. A typical resolution would be to merge them: put both sets of local mods into the same B.
    |                       |
    +-- A  mv---\           |
    |            \          |
    +-- B  mod--> \-->      +-- B
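Because move-ids alone do not reveal this case, a consumer would have to compare paths. The sketch below assumes it can see both the list of moves and the set of WC paths that receive an ordinary in-place update during the same edit. The function name and both input shapes are hypothetical.

```python
def find_moves_onto_updated_paths(moves, in_place_updates):
    """Return move targets that are also updated in place.

    moves: list of (source WC path, target WC path) pairs.
    in_place_updates: set of WC paths that exist before the update and
                      receive an ordinary (non-move) update.
    """
    return sorted(target for _source, target in moves
                  if target in in_place_updates)


# WC: (A@10, B@20); updating to r30 moves A to B and also updates B.
collisions = find_moves_onto_updated_paths([("A", "B")], {"B"})
```

Each path returned is a place where a moved-in node meets an existing node that is being updated, which is the situation the text describes as a conflict.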
Multiple Targets
WC: (A@10, X@10, Y=^/X@10); repo: (mv A@10 → X/A@20)
Update to r20; A moves to X/A and also to Y/A.
    |                       |
    +-- A  mv--\            |
    |         \ \           |
    +-- X      \ \          +-- X
    |           \ \-->      |   +-- A
    |            \          |
    +-- Y         \         +-- Y
                   \-->         +-- A
With multiple targets, there is no need to prevent multiple instances of the destination node from being created. However, if there are local modifications, it could be undesirable to end up with the same modifications in multiple places, so the client might want to warn the user or allow the user to choose what happens to the modifications.
Many-to-Many Move
Many-to-many move, combining multiple sources with multiple target paths:
WC: r10 (A, B=^/A, X, Y=^/X); repo: (mv A@10 → X/C@20)
Update to r20; A and B both move to X/C and to Y/C.
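The four shapes discussed so far (simple, multiple sources, multiple targets, many-to-many) can be told apart by counting the distinct source and target paths of one logical move. This is a sketch under the assumption that the consumer has collected all (source, target) WC path pairs belonging to that move. The function name and labels are illustrative.

```python
def classify_move(pairs):
    """Classify one logical move by its distinct WC sources and targets.

    pairs: iterable of (source WC path, target WC path).
    """
    pairs = list(pairs)
    sources = {source for source, _target in pairs}
    targets = {target for _source, target in pairs}
    if len(sources) > 1 and len(targets) > 1:
        return "many-to-many"
    if len(sources) > 1:
        return "multiple-sources"
    if len(targets) > 1:
        return "multiple-targets"
    return "one-to-one"


# A and B both move to X/C and to Y/C.
kind = classify_move([("A", "X/C"), ("A", "Y/C"),
                      ("B", "X/C"), ("B", "Y/C")])
```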
With a many-to-many move, there is the possibility that the sources and destinations can be logically paired according to their path-wise nearness. For example, starting from WC (trunk1→^/trunk@10, trunk2→^/trunk@20) and repo (mv trunk/A@10 → trunk/B@20):
    |                       |
    +-- trunk1              +-- trunk1
    |   |                   |   |
    |   +-- A  mv--\        |   |
    |               \->     |   +-- B
    |                       |
    +-- trunk2              +-- trunk2
        |                       |
        +-- A  mv--\            |
                    \->         +-- B
This pairing could be implemented by the edit driver, in which case it should describe each such move with its own id, or by the consumer on receiving a set of many-to-many moves.
### What are the rules for this nearness pairing?
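The rules are an open question. Purely to make the idea concrete, here is one candidate rule, which is an assumption and not a proposal from this document: pair each source with the unused target that shares the longest run of leading path components. The greedy strategy and both function names are illustrative.

```python
def common_prefix_len(a, b):
    """Count the leading path components shared by two WC paths."""
    count = 0
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        count += 1
    return count


def pair_by_nearness(sources, targets):
    """Greedily pair each source with the nearest still-unpaired target."""
    remaining = list(targets)
    pairs = []
    for source in sources:
        if not remaining:
            break  # more sources than targets: leave the rest unpaired
        best = max(remaining, key=lambda t: common_prefix_len(source, t))
        pairs.append((source, best))
        remaining.remove(best)
    return pairs


pairs = pair_by_nearness(["trunk1/A", "trunk2/A"],
                         ["trunk2/B", "trunk1/B"])
```

A greedy pass like this depends on the order of the sources and does not define what should happen when several targets are equally near. A real rule would have to settle both points.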
Avoidance
We can't easily avoid WCs getting into such a state. To avoid it, the WC would probably need to know node-ids and have substantial changes in the allowed patterns of usage.
When a WC has multiple instances of the same repository node, we can't reasonably forbid updating it.
Editor
Either the update editor must cope with multiple source and destination paths for the same move, or the client must request several simple edits, each with no multiple instances. Options include:
- Traversal over WC paths using non-unique mv-away and mv-here
- Traversal over URLs or repository nodes
  - Multiple edit operations per node, one for each WC path
- Client requests multiple edits, with no multiple instances in a single edit
Non-Unique Moves
Traversal over the WC paths, using non-unique mv-away and mv-here.
- One operation per path (excluding replacements)
- mv-away and mv-here not uniquely paired by their id
Problems:
- The consumer (client/WC) may want to know whether a given move is unique before executing it, so that it can choose to warn or raise a conflict (for example).
- Each move-away is (logically) accompanied by its own edit. For example, with WC (A@10, B@20), repo (A@10 → B@20 → C@30), one edit applies to A (r10:30) and another edit applies to B (r20:30).
    |                           |
    +-- A  mv-----\             |
    |      +r10:30 \            |
    |               \           |
    +-- B  mv-----\  \          |
           +r20:30 \  \         |
                    \--\-->     +-- C
But the WC doesn't necessarily need to receive instructions to edit every move-away instance. If two instances have the same tree shape (switches, depth, etc.) then it only needs to move and edit one of them and can discard the others (after preserving any local mods). Thus:
    |                        |
    +-- A  mv[1*]            |      # Redundant duplicate of move-away id 1;
    |                        |      # can be deleted; indicated by id "1*".
    |                        |
    +-- B  mv[1]--\          |      # Any edits described for C are relative
           +r20:30 \         |      # to this instance.
                    \----->  +-- C
The edit operation for path A cannot be simply “delete”. It needs to indicate that A is part of the same move that also affects B, because the client may want to notify the user appropriately and preserve any local mods in A (perhaps merging them into C).
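A consumer receiving move-aways marked in this way might plan its work as sketched below. The sketch assumes the driver sends (WC path, move id) pairs and marks redundant duplicates with a trailing "*", as in the diagram. The action names and the function are hypothetical and do not correspond to any existing editor API.

```python
def plan_move_aways(move_aways, paths_with_local_mods):
    """Turn move-away events into a list of (action, path, move id).

    move_aways: list of (WC path, move id); an id ending in "*" marks a
                redundant duplicate of the move with the same base id.
    paths_with_local_mods: set of WC paths that carry local modifications.
    """
    plan = []
    for path, move_id in move_aways:
        base_id = move_id.rstrip("*")
        if move_id.endswith("*"):
            # A duplicate is not a plain delete: keep its local mods first.
            if path in paths_with_local_mods:
                plan.append(("preserve-local-mods", path, base_id))
            plan.append(("discard-duplicate", path, base_id))
        else:
            # The instance that the edits for the destination apply to.
            plan.append(("move-and-edit", path, base_id))
    return plan


plan = plan_move_aways([("A", "1*"), ("B", "1")], {"A"})
```

Keeping the base move id on every action lets the client tell the user that A and B took part in the same move, and leaves room to merge A's preserved modifications into C.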
So, is something like this the way forward?
Traversal over URLs or repository nodes
Would it help if the edit traversed URLs or nodes instead of WC paths? If such an editor would visit each URL once, it would need to be able to send multiple edit operations for the same URL, one operation per instance of that node. Or something similar for nodes instead of URLs.
This seems to have no advantages over the approach of describing non-unique moves.
Multiple, Simple Edits
Another option was briefly considered. Can the client request multiple edits, with no multiple instances in a single edit?
It would need to know node ids. Either the client would know the node ids in the WC and make multiple reports, requesting one edit each, or the reporter (report handler) would know them and have the ability to issue multiple edits to the client. Either way, how would it decide how to partition the changes? And still the client would want to be able to detect conflicts, and this would seem to be more difficult to achieve if the conflicting changes are in separate edits.
This seems too complex, and too much of a departure from the existing system.
Conclusion
(No conclusion yet.)
Appendix: Notes on Semantics
Desirable semantics include:
- Move is a local operation. For example, we can make pairings in this many-to-many move scenario:
WC: (trunk1=^/trunk, trunk2=^/trunk)
Repo: (mv trunk/foo to trunk/bar)
Update: mv trunk1/foo → trunk1/bar; mv trunk2/foo → trunk2/bar.
Current WC semantics include:
- A node's name and existence are regarded as properties of that node, rather than of its parent directory. An update of the path 'A' can add or delete the node at 'A' in the base tree without having to update its parent.