truMerge, a merge helper script for Subversion
===============================================

Contents:

  I.   Purpose of truMerge
  II.  How does it work
  III. Preparing to use truMerge
  IV.  Using truMerge
  V.   Limitations and known problems
  VI.  Reference guide
  VII. Regression testing


I. PURPOSE OF TRUMERGE

Subversion 1.5 has some limitations handling renames or moves during
merges, also known as tree conflicts. The two most important problems are:

1) Merge of a modified element to a moved element on the target branch
   (happens often while merging bug fixes from older releases).

                  foo.c                 foo.c'
     branchA  ------+---------------------+----------
                     \                     \
                      \     bar.c           \ svn merge
     branchB           +------+--------------v--------
                                           bar.c

   svn merge message: "Skipped missing target C:\branchB\foo.c"

   Result: the changes are _not_ merged; bar.c is left unchanged.

   Risk: a developer could miss the "skipped missing target" message and
   commit the changes thinking the merge went ok.

2) Merge of a moved element to a modified element on the target branch
   (happens often while merging between two lines of active development).

                  foo.c                 bar.c
     branchA  ------+---------------------+----------
                     \                     \
                      \     foo.c'          \ svn merge
     branchB           +------+--------------v--------
                                          +bar.c

   svn merge message: "Deleted C:\branchB\foo.c"
                      "Added C:\branchB\bar.c"

   Result: svn merge will copy the moved element to the new path on the
   target, thereby ignoring all modifications made to the element on the
   target before the move.

   Risk: a developer is likely to ignore the Delete and Add, because
   (s)he has no way of knowing that foo.c and bar.c have differences that
   need to be merged.

truMerge solves these (and other) problems. truMerge will detect and
automatically solve most tree conflicts. truMerge tries to employ a set of
sensible heuristics, balancing convenience and usability with safety.
(This is naturally a delicate balance, and we think that truMerge strikes
a good one, but we are open to suggestions for improvement.)
truMerge does not perform the merge itself. Instead it generates a merge
script that the user must review and can subsequently execute using
doMerge.


II. HOW DOES IT WORK?

truMerge solves the problems described above (and many other, more
complex cases) by analyzing the changes on both the source and the target
branch, and translating the changes in the revision range to be merged
into a set of equivalent changes on the target branch.

A key concept is that of element (file or directory) correspondence.
truMerge will mine the Subversion repository for the data it needs to
deduce which element on the target branch shares ancestry with the element
on the source branch. When it cannot find such an element, or when there
are multiple elements that share ancestry, truMerge will use heuristics to
find the best possible candidate, and issue a warning that it has done so.

In the example illustrating the maintenance use case above, the file foo.c
on the source branch corresponds to the file bar.c on the target branch.
Because truMerge understands that foo.c corresponds to bar.c, it can
determine that the changes in foo.c need to be merged to bar.c, and it
will do just that.

The second example shown above is slightly more complex: here truMerge
detects that foo.c on the source corresponds to foo.c on the target, and
it will therefore merge the rename of foo.c to bar.c on the source branch
as a rename of foo.c to bar.c on the target branch.

Of course there are limits to what truMerge can do: it will handle most
cases of what are known in Subversion as tree conflicts (a concept
introduced in Subversion 1.6, through active involvement and sponsorship
of Philips in the Subversion community), but some cases simply cannot be
reconciled automatically. For example: imagine a file foo.c that is
renamed to bar.c on the source branch, being merged to a branch where that
same file has been renamed to gnat.c. It is not obvious what should happen
here, and truMerge will issue a warning.
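The correspondence lookup described above can be sketched as follows. This is a hypothetical illustration, not truMerge's implementation (which is written in Perl). It makes three assumptions:

* each element is tagged with its "origin", the path and revision at which it was first added without history;
* paths are relative to the branch root;
* Python's difflib stands in for truMerge's "most similar path name" heuristic.

The warning strings are only loosely modelled on those listed in section VI.2.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

# Hypothetical model: an element's origin is the (path, revision) where it
# was first added without history. Copies, branches and moves inherit the
# origin, so two elements share ancestry exactly when their origins match.
Origin = Tuple[str, int]


def similarity(a: str, b: str) -> float:
    """Stand-in path-similarity heuristic (an assumption, not truMerge's)."""
    return SequenceMatcher(None, a, b).ratio()


def find_corresponding(source_origin: Origin, source_path: str,
                       target_origins: Dict[str, Origin]
                       ) -> Tuple[Optional[str], Optional[str]]:
    """Return (target_path, warning) for one source element.

    target_origins maps each branch-relative path on the target branch
    to that element's origin.
    """
    matches = sorted(p for p, o in target_origins.items()
                     if o == source_origin)
    if len(matches) == 1:
        return matches[0], None  # unique shared ancestry: no warning

    # Zero or several ancestry matches: pick the most similar path name
    # (among the matches if any, else among all target elements) and warn.
    candidates = matches or sorted(target_origins)
    if not candidates:
        return None, f"no target found for {source_path}"
    best = max(candidates, key=lambda p: similarity(p, source_path))
    reason = "multiple targets" if matches else "no common history"
    return best, f"{reason}, selecting best match"


if __name__ == "__main__":
    # Maintenance example from section I: foo.c was renamed to bar.c on the
    # target branch, so both share the origin of the original foo.c.
    target = {"bar.c": ("/trunk/foo.c", 10), "baz.c": ("/trunk/baz.c", 12)}
    print(find_corresponding(("/trunk/foo.c", 10), "foo.c", target))
    # prints ('bar.c', None)
```

Because the candidates are sorted before max() is applied, ties between equally similar paths are broken deterministically.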
It is important to note that, because Subversion 1.5 does not have native
tree conflicts, truMerge works in a somewhat peculiar way. Instead of
actually executing all the actions to perform the merge, it generates a
merge script (executable using the doMerge script). The user should review
that script, check it for warnings and fix them appropriately. Once the
user is happy with the script, it can be run, and the merge will be
performed automatically.

For Subversion 1.6, it is planned that truMerge will skip this step of
creating a script. Instead it will use the tree conflict concept in 1.6 to
make the user aware of problems, just as Subversion uses ordinary
conflicts to make the user aware of problems merging textual changes.

Note that this definition of "identity" is completely deterministic, but
it has some weaknesses:

  * When elements are cloned (other than by branching), truMerge cannot
    distinguish between the clones.
  * When a user forgets to use svn move (but instead adds a new element,
    copies the content from an old element and deletes the old element),
    truMerge cannot identify the new element as being a copy of the old
    element.

truMerge makes a best-effort guess in both cases: users do not always use
Subversion "correctly", so it happens quite frequently that an
identification cannot be made. In that case, truMerge looks for the
element with "the most similar" path name, and presents that as a
candidate.

Unfortunately, Subversion does not make the required information available
through an easy-to-use, high-performance API, so truMerge requires some
start-up time before it can start generating the merge script.

truMerge uses the following algorithm for merging:

  1. Determine which elements are added, deleted, moved and/or modified in
     the revision range to be merged.

  2. For every element deleted: find the corresponding element in the
     target cache (using the MoveInfoCaches of the source and target
     branches). If a unique element is found, schedule the corresponding
     svn delete command.
     If no element is found, ignore the delete (this tacitly assumes that
     the element has already been deleted on the target). If multiple
     elements are found: issue a warning.

  3. For every element added: find the element in the target branch
     corresponding to the directory where the element was added in the
     source branch. Schedule an svn copy command if that parent is found
     and unique. Issue a warning otherwise.

  4. For every element moved/modified: find the corresponding element in
     the target branch, and if it is unique, schedule an svn merge to merge
     the changes from source to target. If the element is moved as well,
     calculate a "corresponding" svn move command on the target branch and
     (if a unique solution exists) schedule that svn move command. Issue a
     warning if either of the previous steps fails.

  5. Adds and moves are performed in two phases to avoid as many
     obstructions as possible. In phase 1, the element is moved to its new
     location under a temporary (hopefully non-colliding) name. In phase 2,
     all elements are renamed to their final names.


III. PREPARING TO USE TRUMERGE

You need a development environment with the command-line client of
Subversion 1.5 and (Active)Perl installed.

1) Get a copy of truMerge. truMerge contains the following scripts:

   * truMerge.cmd (truMerge.sh for use on Linux)
     Command to merge a range of revisions from one branch to another.

   * doMerge.cmd (doMerge.sh for use on Linux)
     Command to run a script generated by truMerge. This command also
     patches the merge properties after a non-recursive merge (removes '*'
     for the range indicated). See also Section V.

2) Check out a working copy of the target branch to merge to.
   The working copy has to include your entire branch.


IV. USING TRUMERGE

1. Ensure your target working copy is clean (i.e., it has no local
   modifications).

2. (Optional one-time action; strongly recommended) Create a cache file
   for your repository using:

     truMerge --cache cacheFile --init-cache repositoryURL

   Where:
     cacheFile     = File to store truMerge's internal data
     repositoryURL = URL of the repository

   This step is only needed once per repository. truMerge will re-use the
   cache for every merge in this repository. Building this cache file can
   take quite some time, as truMerge will reconstruct the complete history
   of the repository in an internal representation.

   It is recommended to first create a full XML log of your repository:

     svn log -v --xml -r 1:HEAD > logfile.xml

   and specify it on the command line:

     -log:logfile.xml

   This will make building the cache faster and less network dependent.

   NOTE: truMerge needs the COMPLETE log of the full repository, even if
   you intend to merge only between two specific branches.

3. Run truMerge:

     truMerge -r M:N [--cache cacheFile] [-m messageFile] srcURL WCPATH > merge.tm

   Where:
     M           = start revision number to be merged
     N           = end revision number to be merged
     cacheFile   = File to load/store truMerge's internal data structure
                   from/to
     messageFile = File to store a default commit message in
     srcURL      = URL of the source branch
     WCPATH      = Path to the target working copy

   truMerge generates a script (merge.tm) with all commands needed to
   perform the requested merge correctly. Redirect the standard output to
   a file to create a script that doMerge can execute. This script can be
   run to perform the actual merge. You must review and possibly fix the
   script before running it: truMerge will embed warnings (marked as
   "Warning:") in the script at places where it is unsure what to do.

   On Linux: use truMerge.sh instead of truMerge.

4. Review and correct the merge script.
   truMerge will log warnings as comments in the merge script for tree
   conflicts it could not resolve. Review these warnings and correct the
   merge script so that it reflects the correct merge. Detailed
   explanations of the possible warnings and suggestions for their
   resolution can be found in the reference guide.

5. Run the corrected merge script:

     doMerge merge.tm

6. Resolve conflicts.

7. Commit the changes.

Things to NOTE when using truMerge:

1. truMerge will update your working copy to HEAD if --reintegrate is
   specified (unless --no-update is specified as well).
2. The source URL and the target working copy MUST be specified at the top
   level of the branch.


V. LIMITATIONS AND KNOWN PROBLEMS

* truMerge was developed in Perl on a Microsoft Windows XP platform. It
  works on Linux as well (tested on Ubuntu 10.04), but it has not been
  tested extensively on Linux.

* truMerge requires a Subversion 1.5 client (older clients have too many
  working copy restrictions to be useful).

* truMerge deals with merge tracking in a limited way, because svn's
  built-in merge tracking appears to be broken under tree conflicts.
  truMerge deals with this in a very specific way:
  1. truMerge uses --depth empty for all merges to deal with tree
     conflicts.
  2. truMerge only touches files and directories that have actual changes
     on them.
  3. The only reliable merge tracking information is recorded at the root
     level (much the same as svnmerge.py). (This is due to yet another
     defect in Subversion prior to 1.6.3.)
  4. truMerge "promotes" merges to recursive by patching the merge
     properties.

* Subversion cannot replace directories in one go. truMerge does not yet
  work around this, as there is no reliable work-around in Subversion.
  For the https transports, the following work-around does work:
  1. Run svn del on the directory (truMerge already does this).
  2. Run rmtree /s/q on the directory (truMerge does not do this).
  3. Run svn move to replace the directory with another (truMerge
     generates this command, but it will fail because the second step is
     not generated).
  4. Commit your changes.
  Note: running rmtree /s/q will corrupt your working copy. The state is
  ok for committing, but there is no easy way to revert your changes, for
  example.


VI. REFERENCE GUIDE

1. USAGE TRUMERGE

   truMerge [-v|--verbose] [-d debugLevel] [--cache cacheFile] [--init-cache] \
            [-log:svnLogFile] [--record-only] [--no-force-root] [--unmatched-roots] \
            [--force-merge-tracking] [--conflict-delete-propdiffs] [--force-delete] \
            [--no-require-closure] [--no-update] [--no-localmods] [-m messageFile] \
            [--reintegrate|-c M[,N,...]|[-r|--revision] M:N] srcURL [WCPATH]

   -v or --verbose     : print extra information
   -d debugLevel       : Debug: the higher debugLevel, the more debug output
                         (0=none)
   --cache cacheFile   : file name of the cache file (the cache is created by
                         truMerge and read back to help speed up merging)
   --init-cache        : only initialize the cache file, do not generate a
                         merge script
   -log:svnLogFile     : (OBSOLETE) file name of a file with the verbose log
                         of the complete repository (output of
                         svn log -v --xml -r 1:HEAD)
   --record-only       : do a record-only merge
   --no-force-root     : do not force a merge of the root of the branch
   --unmatched-roots   : the source and target branch roots may be unrelated
                         (i.e. do not share ancestry)
   --force-merge-tracking
                       : force merge tracking to be set at individual
                         elements (this option is offered only for backward
                         compatibility)
   --conflict-delete-propdiffs
                       : raise a tree conflict when deleting an element that
                         has property differences between source and target
   --force-delete      : force a delete when source and target have
                         differences
   --no-require-closure
                       : do not require the repository to have internal
                         closure (i.e., do not allow copy-from entries from
                         undefined sources)
   --no-update         : do not update the working copies used to HEAD
   --no-localmods      : do not allow local modifications in the working copy
   -m messageFile      : generate a commit message and write it to
                         messageFile
   --reintegrate       : reintegrate the source branch
   -c M[,N,...]        : change set(s) to merge; use -M to indicate a reverse
                         merge
   [-r|--revision] M:N : revision range to merge (-c and -r may be mixed and
                         specified multiple times)
   srcURL              : URL of the source branch (in case of reintegrate:
                         path of the source working copy)
   WCPATH              : filesystem path of the target working copy (the
                         working copy must be in a pristine state; local
                         modifications are not allowed)

1a. STANDARD OUTPUT

The standard output is a script that can be run to perform the actual
merge. Make sure to review and possibly fix the script before executing
the merge: truMerge will embed warnings (marked as "Warning:") in the
script at places where it is unsure what to do.

truMerge will generate the following types of commands:

  Copy newly added elements from the source branch:
    * svn copy src@N tgt
  Merge changes:
    * svn[-merge-tracking] merge [--record-only] -r N:M src tgt
    * svn[-merge-tracking] merge [--record-only] src1@N src2@M tgt
  Delete elements:
    * svn del --force tgt
  Move elements:
    * svn move --force tgt1 tgt2
  Housekeeping:
    * rmtree tgt
    * move tgt1 tgt2
  Multiple revision ranges:
    * checkConflicts tgt

Redirect standard output to a file to create a script executable by
doMerge.

1b. STANDARD ERROR

Debugging information (if applicable) will be routed to standard error.

1c. USE OF CACHE FILES

truMerge needs either a full log of the repository in XML format
(obsolete, only provided for backward compatibility) or a cache file to
reach acceptable performance. The best option is to use the cache file: it
is a dump of truMerge's internal format and can be loaded from and stored
to file very quickly.

The cache file will be updated automatically by truMerge after every use.
When a cache file does not exist yet, truMerge will create one. Supply the
--cache option to specify the name of the cache file. The best way to
create a cache file from scratch is to specify the --init-cache option:
truMerge will then only generate a cache file, and will not generate a
merge script.
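The load-or-create behaviour of the cache file can be sketched as below. This is a hypothetical Python illustration, not truMerge's Perl code or its cache format. It assumes that the cache records the last revision it covers (last_rev), so that an update only needs the newer log entries. It uses pickle as the dump format. The fetch_log callback is a hypothetical stand-in for reading the repository log.

```python
import os
import pickle
from typing import Callable, Dict, List

# Hypothetical callback: returns log entries for revisions lo..hi inclusive.
FetchLog = Callable[[int, int], List[dict]]


def load_or_update_cache(path: str, head: int, fetch_log: FetchLog) -> Dict:
    """Load the cache from `path` (or start empty), bring it up to `head`,
    and write it back if anything changed."""
    cache: Dict = {"last_rev": 0, "log": []}
    if os.path.exists(path):
        with open(path, "rb") as f:
            cache = pickle.load(f)

    if cache["last_rev"] < head:
        # In this sketch only revisions newer than the cache are fetched,
        # so runs after the first (--init-cache style) one stay cheap.
        cache["log"].extend(fetch_log(cache["last_rev"] + 1, head))
        cache["last_rev"] = head
        with open(path, "wb") as f:
            pickle.dump(cache, f)
    return cache
```

The first call builds the cache from revision 1. Later calls with the same path only fetch the revisions committed since, which mirrors the "create once, re-use for every merge" workflow described above.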
In summary:
  * First-time use: specify --cache and --init-cache to create the cache
    file.
  * Subsequent use: specify --cache only.

Note: a cache file is valid for an entire repository; it is not limited to
any specific branch. This means it can be re-used for any merge in the
repository for which the cache was created.

2. WARNINGS

Be sure to check the merge script for all warnings and take appropriate
action before running the script!

truMerge can identify a number of different issues for which it will
issue warnings (all warnings are marked by "> Warning: "). The warnings
can be grouped into:

  * Usage Issues: truMerge has detected a potential problem in the way it
    is being used.
  * Tree Conflicts: source and target branches have tree changes that are
    in conflict.
  * Target Matching Issues: determining the merge target from ancestry
    relations alone is not possible.
  * Scheduling Conflicts: refactoring may lead to (temporary) obstructions
    of elements. truMerge tries to resolve these, and warns when it does
    so.

2a. USAGE PROBLEMS

Warning: Source branch root (SOURCE) and target branch root (TARGET) are not related!
  * This is an indication that you are trying to merge two directories
    that do not share ancestry.
  * Resolution: you probably do not want to do this! Rerun truMerge with
    the correct source and target branch paths.

Warning: Working copy PATH appears empty, is that ok?
  * This is an indication that you are trying to merge to an incorrect
    working copy path.
  * Resolution: you probably do not want to do this! Rerun truMerge with
    the correct working copy path.

2b. TREE CONFLICTS

Warning: treeconflict in ACTION: no target found for SOURCE
  * This indicates that SOURCE appears not to have a counterpart on the
    target branch.
    1. This usually happens when a file has been "moved" inappropriately
       (i.e., by performing an svn add and an svn delete, instead of an
       svn move).
    2. A less frequent case is that the equivalent of SOURCE has been
       deleted on the target branch, because it is no longer needed, or
       that its content has been partially copied into other files
       (refactoring).
    3. A third (highly unlikely) case is that the element does have a
       corresponding element on the target branch, which was added to the
       target branch in an inappropriate manner (simply by adding the
       element without history, instead of merging the solution), and
       that the element has a different name.
  * Resolution: determine if the changes need to be merged. If so, merge
    the changes manually. If not: no action is needed.

Warning: treeconflict in handleDelete: source differs from target
  * Indicates that the revision range being merged contains a file delete,
    and that the deleted file differs between the source and target
    branch. truMerge issues a warning to prevent changes on the target
    branch from being lost.
  * Resolution: determine if the delete is allowed or not. If not, remove
    the delete command from the script. If it is, check if the changes on
    the target element need to be merged somewhere else, and if so, do so.

Warning: treeconflict in handleAdd, element already present in target
  * Indicates that the merge range contains a file addition, and that the
    file already exists in the target branch. This is an indication of a
    "double merge": the file has been copied explicitly from the source
    branch without recording merge tracking history.

Warning: treeconflict in handleMove, rename conflict
  * Indicates that the element has been renamed in conflicting ways on the
    source and target branch.
  * Resolution: determine the correct name for the target element. If a
    rename is needed, add it to the script. (To do: check if the merge
    command needs to be added as well.)

Warning: treeconflict in handleMove, move conflict
  * Indicates that the element has been moved in conflicting ways on the
    source and target branch.
  * Resolution: determine the correct path for the target element.
    If a move is needed, add it to the script. (To do: check if the merge
    command needs to be added as well.)

Warning: treeconflict in handleAdd, copy source target cannot be found
  * Indicates that the element is to be copied, but the copy source cannot
    be found on the target branch. This can happen if the copy source has
    been deleted on the target branch.
  * Resolution: determine if the copy is needed at all. If the copy is not
    needed, no action is needed. If a copy is needed, copy the element
    from an older revision of the copy source on the target branch, or
    from the copy source on the source branch.

2c. TARGET MATCHING ISSUES

Warning: no common history in ACTION, selecting best match
  * This indicates that SOURCE appears not to have a counterpart on the
    target branch that shares ancestry with SOURCE, but that truMerge has
    found a candidate that may be the corresponding element. This can
    happen when the element was added to the target branch in an
    inappropriate manner (simply by adding the element without history,
    instead of merging the solution).
  * Resolution: determine if the candidate is correct. If so: no action is
    needed. If not: change the merge command to reflect the correct merge
    target. (If no merge is needed because the element was deleted: remove
    the merge command from the script.) The next two lines of the script
    indicate the URL of the source element and the path of the selected
    target.

Warning: only implicit copies in ACTION, selecting best match
  * This indicates that SOURCE appears to have only counterparts that were
    created by the merge itself. (To do: describe resolution.)

Warning: multiple targets in ACTION, selecting best match
  * This indicates that SOURCE appears to have multiple counterparts that
    share ancestry with SOURCE on the target branch, and that truMerge has
    selected a candidate that may be the corresponding element. This can
    happen when the element was cloned (inside a branch, using svn copy.
    Please avoid using svn copy!).
  * Resolution: determine if the candidate is correct. If so: no action is
    needed. If not: change the merge command to reflect the correct merge
    target. (If no merge is needed because the element was deleted: remove
    the merge command from the script.) The next two lines of the script
    indicate the URL of the source element and the path of the selected
    target.

Warning: Element FROM maps to multiple elements, selecting best match: TO
  * This indicates that (in a move or edit action) the FROM element at the
    start of the merge range appears to have multiple counterparts that
    share ancestry with TO at the end of the merge range, and that
    truMerge cannot guarantee the matches it has made for this element.
    This can happen when the element was cloned (inside a branch, using
    svn copy. Please avoid using svn copy!).
  * Resolution: determine if the matches made for this element are
    correct. If so: no action is needed. If not: change the merge
    command(s) to reflect the correct TO element.
  * Note: this situation can potentially also lead to incorrect matches on
    the target branch, so be sure to review the commands carefully.

2d. SCHEDULING CONFLICTS

Warning: obstruction, moving to temporary path
  * Indicates that the target path for an add, rename or move is
    obstructed by another element. truMerge will use a temporary path name
    to still perform the required action, and will attempt to reconcile
    this at the end of the script.
  * Resolution: usually, no action is needed.

Warning: treeconflict, final path obstructed
  * Indicates that the final target path for an add, rename or move is
    obstructed by another element. truMerge is trying to move an element
    from its temporary path name to its final path to reconcile
    obstructions.
  * Resolution: determine the correct final path for the element and adapt
    the script accordingly.
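The two-phase scheme behind these warnings (step 5 of the algorithm in section II) can be sketched as follows. This hypothetical Python illustration is not truMerge's code. The temporary names (__tm_tmpN) are our own assumption, and the sketch does not detect the "final path obstructed" case. It shows why two phases help: swapping a.c and b.c with two direct svn move commands would obstruct, whereas going through temporary names does not.

```python
import posixpath
from typing import List, Tuple


def schedule_moves(moves: List[Tuple[str, str]]) -> List[str]:
    """Translate (source, destination) pairs into svn move commands.

    Phase 1 moves every element to a temporary name in its destination
    directory; phase 2 renames all temporaries to their final names, so
    every source path has been vacated before any final name is claimed.
    """
    phase1, phase2 = [], []
    for i, (src, dst) in enumerate(moves):
        # Hypothetical temporary name, assumed not to collide with anything.
        tmp = posixpath.join(posixpath.dirname(dst), f"__tm_tmp{i}")
        phase1.append(f"svn move --force {src} {tmp}")
        phase2.append(f"svn move --force {tmp} {dst}")
    return phase1 + phase2


if __name__ == "__main__":
    # Swapping two files would obstruct with direct moves.
    for cmd in schedule_moves([("a.c", "b.c"), ("b.c", "a.c")]):
        print(cmd)
```

For the swap, this prints the four commands a.c -> __tm_tmp0, b.c -> __tm_tmp1, __tm_tmp0 -> b.c and __tm_tmp1 -> a.c. None of them targets a path that is still occupied.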
Warning: overwriting implicitly copied target
  * Indicates that a rename or move overwrites an element that an earlier
    action in the merge script copied from the source branch. This has to
    do with the fact that Subversion always performs a deep directory
    copy. The typical use case that causes this is the following:
    1. A new directory was created on the source branch.
    2. An existing file was moved into this new directory.
    truMerge translates this into a copy of the new directory from the
    source branch to the target branch, followed by a move of the file
    from its old location in the target branch to the new location. This
    move overwrites the file that was copied implicitly by the earlier
    directory copy.
  * Resolution: usually, this situation needs no action.

3. USAGE DOMERGE

   doMerge [-v] [--threads n] [--rerun n] mergescript.tm

   -v             : verbose mode
   --threads n    : maximum number of threads to use (default 10)
   --rerun n      : maximum number of times to rerun a failed command
                    (default 2)
   mergescript.tm : the merge script as generated by truMerge

   Note: --threads 0 --rerun 0 is equivalent to sequential execution
   without reruns (backward compatibility).

3a. RERUNNING MERGE SCRIPTS

doMerge uses the following model to deal with network errors and user
interruptions. At the completion of doMerge, the script is annotated as
follows:

  * Completed commands are commented out with the prefix # COMPLETED
  * Failed commands are followed by the output of the command run (as
    comments), so that the user can determine what went wrong
  * Commands that were not run are unaltered

This model allows very easy recovery from, e.g., a network failure:
simply rerun doMerge with the exact same parameters, and doMerge will try
to complete the remaining commands of the script.

3b. TUNING DOMERGE

doMerge can run commands in parallel, to hide latency for remote servers.
The default number of threads to use is 10, which appears to be a safe
limit (if too many commands are run in parallel, commands can fail because
of throttling at the server). The maximum number of threads to use is
specified on the command line with the --threads option.

doMerge will try to rerun a command that has failed, in an attempt to
recover from intermittent network failures. The number of retries by
default is 2 (i.e., a command will be attempted no more than 3 times).
The number of retries can be tuned using the --rerun option.

doMerge will attempt to schedule "big" operations aggressively. doMerge
considers adds and merges to files of more than 3MB to be "big". This
limit is determined by a variable in the doMerge script itself. To avoid
"big" operations pushing out many small operations, no more than 50% of
the threads will be used to run "big" operations. This fraction is
currently hardcoded.


VII. REGRESSION TESTING

truMerge comes with a (limited) set of regression tests.

Usage:

1. rullalltests.cmd (runs all tests)
   It will report

     ---- OK! ----

   in case everything checks out ok.

2. runtest.cmd (runs the specified test script)
   (On Linux: runtest.sh)
   It will report

     ---------------------- OK! in running test ----------------------

   in case everything checks out ok.

The available test scripts are:
  * TestTruMerge.pl
  * TestTruMerge.pl localmodifications
  * TestDoMerge.pl
  * TestMTAwareness.pl
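The doMerge tuning rules from section VI.3b can be sketched as follows. Those rules are:

* a thread limit;
* a retry count;
* at most half of the threads running "big" operations, which are preferred whenever a slot is free.

This is a hypothetical Python illustration, not doMerge's code. Commands are modelled as callables that return True on success and do not raise. The caller is assumed to have already flagged "big" commands (doMerge itself uses a 3MB threshold).

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

# (name, action, is_big); action() returns True on success (an assumption).
Command = Tuple[str, Callable[[], bool], bool]


def run_script(commands: List[Command], threads: int = 10,
               rerun: int = 2) -> Dict[str, bool]:
    workers = max(1, threads)        # --threads 0 behaves as sequential
    big_cap = max(1, workers // 2)   # at most half the threads on big work
    big = [c for c in commands if c[2]]
    small = [c for c in commands if not c[2]]
    results: Dict[str, bool] = {}
    running_big = 0
    cond = threading.Condition()

    def next_command() -> Optional[Command]:
        nonlocal running_big
        with cond:
            while True:
                if big and running_big < big_cap:
                    running_big += 1     # prefer big work while a slot is free
                    return big.pop(0)
                if small:
                    return small.pop(0)
                if not big:
                    return None          # nothing left to do
                cond.wait()              # only big work left, cap reached

    def worker() -> None:
        nonlocal running_big
        while (cmd := next_command()) is not None:
            name, action, is_big = cmd
            # One attempt plus up to `rerun` retries; any() stops at success.
            ok = any(action() for _ in range(rerun + 1))
            with cond:
                results[name] = ok
                if is_big:
                    running_big -= 1
                    cond.notify_all()

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return results
```

In this sketch a worker never blocks on a big command while small work is still queued. The cap therefore really leaves at least half of the threads free for small operations. Calling run_script(commands, threads=0, rerun=0) runs everything on one thread without retries, mirroring the backward-compatible mode noted in section VI.3.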