Posts Tagged ‘git’

Updating file copyright information using git to find the files

Wednesday, September 1st, 2010

Update: I missed off --author="Your Name".
All source files should technically have a copyright section at the top. However updating this every time the file is changed is tiresome and tends to be missed out. So after a long period of development you find yourself asking the question “Which files have I modified sufficiently that I need to add myself to the list of copyright holders on the file?”.
Of course you are using a version control system so the information you need about which files you modified and how much is available it just needs to be extracted. The shell is powerful and a solution is just one (long) line (using long options and splitting over multiple lines for readability).

$ find -type f -exec bash -c 'git log --ignore-all-space --unified=0 --oneline  \
    --author="Your Name" {} | grep "^[+-]" | \
    grep --invert-match "^\(+++\)\|\(---\)" | wc --lines | \
    xargs test 20 -le ' \; -print > update_copyright.txt \
    && wc --lines update_copyright.txt

In words: find all files and run a command on them and if that command returns true (0) then print out that file name to the ‘update_copyright.txt’ file and then count how many lines are in the file. Where the command is: use git log to find all changes which changed things other than space and minimise things other than the changes themselves (--oneline to reduce commit message etc and --unified=0 to remove context) then strip out all lines which don’t start with + or – and then strip out all the lines which start with +++ or — then count how many lines we get from that and test whether this is larger than 20. If so return true (zero) else return false (non zero).

This should result in an output like:

 284 update_copyright.txt

I picked 20 for the ‘number of lines changed’ value because 10 lines of changes is generally the size at which copyright information should be updated (I think GNU states this) and we are including both additions and removals so we want to double that.

Now I could go from there to a script which then automatically updated the copyright rather than going through manually and updating it myself… however the output I have contains lots of files which I should not update. Then there are files in different languages which use different types of comments etc. so such a script would be much more difficult to write.

Apologies for the poor quality of English in this post and to those of you who have no idea what I am on about.

Compiling git from git fails if core.autocrlf is set to true

Monday, August 16th, 2010

If you get the following error when compiling git from a git clone of git:

: command not foundline 2: 
: command not foundline 5: 
: command not foundline 8: 
./GIT-VERSION-GEN: line 14: syntax error near unexpected token `elif'
'/GIT-VERSION-GEN: line 14: `elif test -d .git -o -f .git &&
make: *** No rule to make target `GIT-VERSION-FILE', needed by `git-am'. Stop.
make: *** Waiting for unfinished jobs....

and you have core.autocrlf set to true in your git config then consider the following “it’s probably not a good idea to convert newlines in the git repo” curtsey of wereHamster on #git.

So having core.autocrlf set to true may result in bad things happening in odd ways because line endings are far more complicated than they have any reason to be (thanks to decisions made before I was born). Bugs due to white space errors are irritating and happen to me far too often :-).

Today I used CVS – it was horrible, I tried to use git’s cvsimport and cvsexportcommit to deal with the fact that CVS is horrible unfortunately this did not work ;-(.

This post exists so that the next time someone is silly like me they might get a helpful response from Google.

Major Multi-VCS surgery

Friday, June 25th, 2010

This summer I am working on the Google Summer of Code project “Revive GNU Prolog for Java”. It now has a project page and a Git repository which resulted in a rather entertaining screen shot.

Yesterday I found out that last year someone did a lot of the work I was intending to do this summer and they gave me a svn dump of the changes that they made. I had previously found two other people that had made some changes over the last 10 years while the project was dormant.

So I was faced with the task of taking the SVN dump of the changes made by Michiel Hendriks (elmuerte) and splicing them onto the old CVS history of the code he took the source .zip and then converting it into Git (which is what we are now using as our VCS). I was kind of hoping that with luck the two development histories would then share a common root which could be used to help with merging the two development histories back together that hope was in vain (though I have another idea I might try later).

Anyway this whole splicing thing was non-trivial so I thought I would document how I did it (partly so that I can find the instructions later).

So there were problems with the SVN dumpfile I was given: it couldn’t be applied to a bare repository as all the Node-Paths had an additional extension ‘trese/ample/trunk’ on the beginning. No one fix I tried was to add

Node-path: trese
Node-kind: dir
Node-action: add

Node-path: trese/ample
Node-kind: dir
Node-action: add

Node-path: trese/ample/trunk
Node-kind: dir
Node-action: add

Into the the first commit using vim: now this worked but it meant that I still had the extra ‘trese/ample/trunk/’ which I didn’t want.
So I removed that using vim and “:%s/trese\/ample\/trunk\///g”, unfortunately there were a couple of instances where trese/ample/trunk was refereed to directly when files were being added to svn:ignore unfortunately I didn’t find out how to refer to the top level directory in an svn dump file so I just edited those bits out (there were only 2 commits which were effected). So now I had a working svn dumpfile. To do the splicing I used svndumptool.py to remove the first commit resulting in a dumpfile I called gnuprolog-mod.mod2-86.svn.dump (see below for instructions on how to do this).

I got a copy of the CVS repository, so that the current working directory contained CVSROOT and also a folder called gnuprolog which contained the VCed code.

# This makes a dumpfile 'svndump' of the code in the gnuprolog module I only care about the trunk.
cvs2svn --trunk-only --dumpfile=svndump gnuprolog
# Then we use svndumptool to remove the first commit as cvs2svn adds one to the beginning where it makes various directories.
svndumptool.py split svndump 2 8 cvs-1.svn.dump
# Then we use vim to edit the dump file and do :%s/trunk\///g to strip of the leading 'trunk/' from Node-Paths
vim cvs-1.svn.dump
# Create a SVN repository to import into
svnadmin create gnuprolog-mod.plaster.svn
# Import the CVS history
svnadmin load gnuprolog-mod.plaster.svn < cvs-1.svn.dump
# Import the SVN history
svnadmin load gnuprolog-mod.plaster.svn < gnuprolog-mod.mod2-86.svn.dump
# Make a git repository from the SVN repository
git svn clone file:///home/daniel/dev/gnuprolog/gnuprolog-mod.plaster.svn gnuprolog-mod.plaster.git

Things that I found which are useful

The SVN dump file format
svn-to-git got me the closest to being able to import the svn dumpfile into git.
These instructions on how to fix svn dumpfiles.

Sorry this post is rather ramblely stuck half way between a howto and an anecdote but hopefully someone will find it useful.