WeaveDiff: an append-only weave Although bzr does not intend to use append-only weaves, it was the kind of interesting problem I couldn't let go until I'd solved. Terminology: weavediff: The entire WeaveDiff file revision diff: A section of a WeaveDiff that describes how to construct a revision, given its ancestor revision build-parent: the leftmost parent in the list of parents build-ancestry: the list of all revisions needed to construct a revision Features: - Each revision diff can be applied to a foreign weavediff, so long as the foreign weavediff contains all its parents - Revisions can be constructeded in O(n) time, where n is the amount of change since the oldest line in the file. This does require an index, but the index can be append-only, too. - Can be read efficiently over any dumb transport that allows you to start reading from a given position All lines are identified by the revision that added them, and their position in that revision. Unique identifiers appear on the first line Each revision is identified by the number of steps away from the current revision it is. There may be more than one parent, but this number always refers to the the build ancestry. It is a syntax error to delete a line that was added in the current revision. Example texts ============= Revision asdf (no parent) A B C D E Revision lkj parent asdf B C E Revision fsdf parent asdf A B C E Revision fhkh parent fsdf B F C E Weavediff Example ================= Revision asdf lines 5 hash asdfqewra add 1.-1 A add 0.0 B add 0.1 C add 0.2 D add 0.3 E Revision lkj parent asdf lines 3 hash mofala del 1.0 del 1.3 Revision fsdf parent asdf lines 4 hash dasfa del 1.3 Revision fhkh parent fsdf lines 4 hash flqr del 1.0 add 2.1 F del 1.3 The # of lines permits us to stop once we have read all the lines we need. The hash allows us to check for accidental corruption Procedure for extraction without reading the whole file, starting from the end: 1. Read revision diff 2. Add lines to global list, before the lines that were inserted after them. 3. Mark added lines as present, except those that were deleted. 4. Store a reversed list of insertions, so that when we encounter the line that a line was inserted after, we can insert the old line before it. 5. If the number of lines present is less than the number of lines required by the revision, repeat It is easy to imagine using periodic snapshots to reduce the amount of data still further. Procedure for weave merge: For each line in A that is not present in B, and was added in C, iterate backwards until 1. you find a deletion of the line (drop the line) 2. you hit an ancestor of C in every line of developement (keep the line)