Major Multi-VCS surgery
Friday, June 25th, 2010
This summer I am working on the Google Summer of Code project “Revive GNU Prolog for Java”. It now has a project page and a Git repository, which resulted in a rather entertaining screenshot.
Yesterday I found out that last year someone did a lot of the work I was intending to do this summer, and they gave me an SVN dump of the changes that they made. I had previously found two other people who had made some changes over the last 10 years while the project was dormant.
So I was faced with the task of taking the SVN dump of the changes made by Michiel Hendriks (elmuerte), splicing them onto the old CVS history of the code he took the source .zip from, and then converting the result into Git (which is what we are now using as our VCS). I was kind of hoping that, with luck, the two development histories would then share a common root which could be used to help merge them back together. That hope was in vain (though I have another idea I might try later).
Anyway this whole splicing thing was non-trivial so I thought I would document how I did it (partly so that I can find the instructions later).
So there were problems with the SVN dumpfile I was given: it couldn’t be applied to a bare repository, as all the Node-paths had an additional prefix ‘trese/ample/trunk’ on the beginning. Now, one fix I tried was to add
Node-path: trese
Node-kind: dir
Node-action: add

Node-path: trese/ample
Node-kind: dir
Node-action: add

Node-path: trese/ample/trunk
Node-kind: dir
Node-action: add
into the first commit using vim. This worked, but it meant that I still had the extra ‘trese/ample/trunk/’ which I didn’t want.
So I removed that using vim and “:%s/trese\/ample\/trunk\///g”. Unfortunately there were a couple of instances where trese/ample/trunk was referred to directly, when files were being added to svn:ignore. I didn’t find out how to refer to the top-level directory in an SVN dump file, so I just edited those bits out (only 2 commits were affected). So now I had a working SVN dumpfile. To do the splicing I used svndumptool.py to remove the first commit, resulting in a dumpfile I called gnuprolog-mod.mod2-86.svn.dump (see below for instructions on how to do this).
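A global substitution like the vim one also rewrites any file contents that happen to contain the prefix, which can leave the content lengths and checksums recorded in the dump out of step with the data. A slightly more cautious variant, sketched here with sed, only touches lines that begin with a Node-path or Node-copyfrom-path header. The function name strip_node_prefix is made up for illustration, and this is a rough sketch rather than a real dump parser: it assumes the prefix contains no regex metacharacters and that no file body has a line starting with one of those headers.

```shell
# Hypothetical helper: drop a leading directory prefix from the path
# headers of an SVN dump read on stdin, writing the result to stdout.
#   $1 = prefix to remove, including the trailing slash,
#        e.g. 'trese/ample/trunk/'
strip_node_prefix() {
  # LC_ALL=C makes sed treat the input as plain bytes, since dumps can
  # contain binary file contents.
  # Only lines that START with a path header are rewritten; everything
  # else passes through unchanged.
  LC_ALL=C sed -E "s|^(Node-path\|Node-copyfrom-path): $1|\1: |"
}
```

Usage would look something like `strip_node_prefix 'trese/ample/trunk/' < original.svn.dump > stripped.svn.dump`, where both file names are placeholders. It does not deal with nodes whose path is exactly the prefix directory (the svn:ignore cases), so those still need editing by hand.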
I got a copy of the CVS repository, so that the current working directory contained CVSROOT and also a folder called gnuprolog which contained the version-controlled code.
# This makes a dumpfile 'svndump' of the code in the gnuprolog module; I only care about the trunk.
cvs2svn --trunk-only --dumpfile=svndump gnuprolog
# Then we use svndumptool to remove the first commit, as cvs2svn adds one to the beginning where it makes various directories.
svndumptool.py split svndump 2 8 cvs-1.svn.dump
# Then we use vim to edit the dump file and do :%s/trunk\///g to strip off the leading 'trunk/' from Node-paths
vim cvs-1.svn.dump
# Create a SVN repository to import into
svnadmin create gnuprolog-mod.plaster.svn
# Import the CVS history
svnadmin load gnuprolog-mod.plaster.svn < cvs-1.svn.dump
# Import the SVN history
svnadmin load gnuprolog-mod.plaster.svn < gnuprolog-mod.mod2-86.svn.dump
# Make a git repository from the SVN repository
git svn clone file:///home/daniel/dev/gnuprolog/gnuprolog-mod.plaster.svn gnuprolog-mod.plaster.git
Things that I found which are useful
The SVN dump file format
svn-to-git got me the closest to being able to import the svn dumpfile into git.
These instructions on how to fix svn dumpfiles.
Sorry this post is rather rambly, stuck halfway between a howto and an anecdote, but hopefully someone will find it useful.