
Major Multi-VCS surgery

Friday, June 25th, 2010

This summer I am working on the Google Summer of Code project “Revive GNU Prolog for Java”. It now has a project page and a Git repository, which resulted in a rather entertaining screenshot.

Yesterday I found out that last year someone did a lot of the work I was intending to do this summer, and they gave me an SVN dump of the changes that they made. I had previously found two other people who had made some changes over the last 10 years while the project was dormant.

So I was faced with the task of taking the SVN dump of the changes made by Michiel Hendriks (elmuerte), splicing it onto the old CVS history of the code from which he took the source .zip, and then converting the result into Git (which is what we are now using as our VCS). I was kind of hoping that, with luck, the two development histories would then share a common root which could be used to help with merging them back together. That hope was in vain (though I have another idea I might try later).

Anyway this whole splicing thing was non-trivial so I thought I would document how I did it (partly so that I can find the instructions later).

So there were problems with the SVN dumpfile I was given: it couldn’t be loaded into a bare repository, as all the Node-paths had an additional prefix ‘trese/ample/trunk’ on the beginning. Now, one fix I tried was to add

Node-path: trese
Node-kind: dir
Node-action: add

Node-path: trese/ample
Node-kind: dir
Node-action: add

Node-path: trese/ample/trunk
Node-kind: dir
Node-action: add

into the first commit using vim. This worked, but it meant that I still had the extra ‘trese/ample/trunk/’ prefix, which I didn’t want.
So I removed that using vim and “:%s/trese\/ample\/trunk\///g”. Unfortunately, there were a couple of places where trese/ample/trunk was referred to directly, when files were being added to svn:ignore. I didn’t find out how to refer to the top-level directory in an SVN dump file, so I just edited those bits out (only 2 commits were affected). So now I had a working SVN dumpfile. To do the splicing I used svndumptool.py to remove the first commit, resulting in a dumpfile I called gnuprolog-mod.mod2-86.svn.dump (see below for instructions on how to do this).
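A more targeted alternative to a global substitution is to rewrite only the path headers, so that file contents and property blocks (whose lengths are recorded in the dump) are left alone. What follows is a minimal sketch, not what was actually run here. It assumes a sed that supports -E (GNU or BSD) and a prefix with no regex-special characters or ‘#’ in it; the function name strip_dump_prefix and the file names in the usage comment are made up for illustration.

```shell
# Hypothetical helper: strip a leading path prefix from Node-path and
# Node-copyfrom-path headers only, leaving every other line untouched.
strip_dump_prefix() {
    prefix="$1"   # e.g. trese/ample/trunk/
    # LC_ALL=C makes sed work on raw bytes, since a dump can hold binary data.
    LC_ALL=C sed -E "s#^(Node-path|Node-copyfrom-path): ${prefix}#\1: #"
}

# Usage (placeholder file names):
# strip_dump_prefix 'trese/ample/trunk/' < original.svn.dump > stripped.svn.dump
```

This would still be fooled by a file body containing a line that itself starts with ‘Node-path: ’, and it does nothing about property values such as svn:ignore, so those would still need attention by hand.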

I got a copy of the CVS repository, so that the current working directory contained CVSROOT and also a folder called gnuprolog which contained the version-controlled code.

# This makes a dumpfile 'svndump' of the code in the gnuprolog module; I only care about the trunk.
cvs2svn --trunk-only --dumpfile=svndump gnuprolog
# Then we use svndumptool to remove the first commit, as cvs2svn adds one at the beginning in which it creates various directories.
svndumptool.py split svndump 2 8 cvs-1.svn.dump
# Then we use vim to edit the dump file and do :%s/trunk\///g to strip off the leading 'trunk/' from Node-paths
vim cvs-1.svn.dump
# Create a SVN repository to import into
svnadmin create gnuprolog-mod.plaster.svn
# Import the CVS history
svnadmin load gnuprolog-mod.plaster.svn < cvs-1.svn.dump
# Import the SVN history
svnadmin load gnuprolog-mod.plaster.svn < gnuprolog-mod.mod2-86.svn.dump
# Make a git repository from the SVN repository
git svn clone file:///home/daniel/dev/gnuprolog/gnuprolog-mod.plaster.svn gnuprolog-mod.plaster.git
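
The revision numbers passed to svndumptool.py split depend on what is actually in the dump, so it can help to check the range first by reading the Revision-number headers directly. This is a rough sketch rather than part of the original procedure; dump_rev_range is a hypothetical helper name, and it assumes that no file body in the dump contains a line starting with ‘Revision-number: ’.

```shell
# Hypothetical helper: print the first and last revision numbers found in a dump file.
dump_rev_range() {
    # -a makes grep treat the dump as text even if it contains binary file bodies.
    LC_ALL=C grep -a '^Revision-number: ' "$1" |
        awk 'NR == 1 { first = $2 } { last = $2 } END { print first, last }'
}

# Usage (file name taken from the steps above):
# dump_rev_range svndump
```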

Things that I found useful:

The SVN dump file format
svn-to-git got me the closest to being able to import the svn dumpfile into git.
These instructions on how to fix svn dumpfiles.

Sorry this post is rather rambly, stuck halfway between a howto and an anecdote, but hopefully someone will find it useful.