Posts Tagged ‘GSoC’

Updating file copyright information using git to find the files

Wednesday, September 1st, 2010

Update: I missed off --author="Your Name".
All source files should technically have a copyright section at the top. However updating this every time the file is changed is tiresome and tends to be missed out. So after a long period of development you find yourself asking the question “Which files have I modified sufficiently that I need to add myself to the list of copyright holders on the file?”.
Of course you are using a version control system so the information you need about which files you modified and how much is available it just needs to be extracted. The shell is powerful and a solution is just one (long) line (using long options and splitting over multiple lines for readability).

$ find -type f -exec bash -c 'git log --ignore-all-space --unified=0 --oneline  \
    --author="Your Name" {} | grep "^[+-]" | \
    grep --invert-match "^\(+++\)\|\(---\)" | wc --lines | \
    xargs test 20 -le ' \; -print > update_copyright.txt \
    && wc --lines update_copyright.txt

In words: find all files and run a command on them and if that command returns true (0) then print out that file name to the ‘update_copyright.txt’ file and then count how many lines are in the file. Where the command is: use git log to find all changes which changed things other than space and minimise things other than the changes themselves (--oneline to reduce commit message etc and --unified=0 to remove context) then strip out all lines which don’t start with + or – and then strip out all the lines which start with +++ or — then count how many lines we get from that and test whether this is larger than 20. If so return true (zero) else return false (non zero).

This should result in an output like:

 284 update_copyright.txt

I picked 20 for the ‘number of lines changed’ value because 10 lines of changes is generally the size at which copyright information should be updated (I think GNU states this) and we are including both additions and removals so we want to double that.

Now I could go from there to a script which then automatically updated the copyright rather than going through manually and updating it myself… however the output I have contains lots of files which I should not update. Then there are files in different languages which use different types of comments etc. so such a script would be much more difficult to write.

Apologies for the poor quality of English in this post and to those of you who have no idea what I am on about.

Packaging a java library for Debian as an upstream

Tuesday, August 17th, 2010

I have just finished my GSoC project working on GNU Prolog for Java which is a Java library for doing Prolog. I got as far as the beta release of version 0.2.5. One of the things I want to do for the final release is to integrate the making of .deb files to distribute the binary, source and documentation into the ant build system. This post exists to chronicle how I go about doing that.

First some background reading: There is a page on the debian wiki on Java Packaging. You will want to install javahelper as well as debain-policy and debhelper. You will then have the javahelper documentation in /usr/share/doc/javahelper/. The debian policy on java will probably also be useful.

Now most debian packages will be created by people working for debian using the releases made by upstreams but I want to be a good upstream and do it myself.

So I ran:

jh_makepkg --package=gnuprologjava --maintainer="Daniel Thomas" \
    --email="drt24-debian@srcf.ucam.org" --upstream="0.2.5" --library --ant --default

I then reported a bug in jh_makepkg where it fails to support the --default and --email options properly because I got “Invalid option: default”. Which might be fixed by the time you read this.
So I ran:

jh_makepkg --package=gnuprologjava --maintainer="Daniel Thomas" \
    --email="drt24-debian@srcf.ucam.org" --upstream="0.2.5" --library --ant

And selected ‘default’ when prompted by pressing F.
I got:

dch warning: Recognised distributions are:
{dapper,hardy,intrepid,jaunty,karmic,lucid,maverick}{,-updates,-security,-proposed,-backports} and UNRELEASED.
Using your request anyway.
dch: Did you see that warning?  Press RETURN to continue...

Which I think stems from the fact that javahelper is assuming debian but javahelper uses debhelper or similar which Ubuntu has configured for Ubuntu. However this doesn’t matter to me as I am trying to make a .deb for Debian (as it will then end up in Ubuntu and many other places – push everything as far upstream as possible).
Now because of the previously mentioned bug the --email option is not correctly handled so we need to fix changelog, control and copyright in the created debian folder to use the correct email address rather than user@host in the case of changelog and NAME in the case of control and copyright (two places in copyright).

copyright needs updating to have Upstream Author, Copyright list and licence statement and homepage.
For licence statement I used:

    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Library General Public
    License as published by the Free Software Foundation; either
    version 3 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    Library General Public License for more details.

    You should have received a copy of the GNU Library General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA

On Debian systems the full text of the GNU Library Public License, can be
found in the file /usr/share/common-licenses/LGPL-3.

control needs updating to add homepage, Short Description and Long Description (which needs to be indented with one space). I also added texinfo and libgetopt-java to Build-Depends and Suggests: libgetopt-java.
GNU Prolog for Java has a build dependency on gnu.getopt as gnu.prolog.test.GoalRunner uses it so I added
export CLASSPATH=/usr/share/java/gnu-getopt.jar to debian/rules. libgnuprologjava-java.javadoc needed build/api put in it so that it could find the javadoc.

So at this point I have a debian folder with the right files in it but I don’t have an ant task which will do everything neatly. Since debian package building really wants to happen on an unpacked release tarball of the source I modified my existing dist-src target to copy the debian folder across as well and added the following to my init target:

<!-- Key id to use for signing the .deb files -->
<property name="dist.keyid" value="" />
<property name="dist.name-version" value="${ant.project.name}-${version}"/>
<property name="dist.debdir" value="dist/${dist.name-version}/" />

I could then add the following dist-deb target to do all the work.

<target name="dist-deb" depends="dist-src" description="Produce a .deb to install with">
	<mkdir dir="${dist.debdir}"/>
	<unzip src="dist/${dist.name-version}-src.zip" dest="${dist.debdir}"/>
	<!-- Delete the getopt code as we want to use the libgetopt-java package for that -->
	<delete dir="${dist.debdir}src/gnu/getopt"/>
	<exec executable="dpkg-buildpackage" dir="${dist.debdir}">
		<arg value="-k${dist.keyid}"/>
	</exec>
</target>

So this gives me binary and source files for distribution and includes the javadoc documentation but lacks the manual.
To fix that I added libgnuprologjava-java.docs containing

build/manual/
docs/readme.txt

, libgnuprologjava-java.info containing build/gnuprologjava.info and libgnuprologjava-java.doc-base.manual containing:

Document: libgnuprologjava-java-manual
Title: Manual for libgnuprologjava-java
Author: Daniel Thomas
Abstract: This the Manual for libgnuprologjava-java
Section: Programming

Format: HTML
Index: /usr/share/doc/libgnuprologjava-java/manual
Files: /usr/share/doc/libgnuprologjava-java/manual/*.html

Now all of this might result in the creation of the relevant files needed to get GNU Prolog for Java into Debian but then again it might not :-). The .deb file does install (and uninstall) correctly on my machine but I might have violated some debian-policy.

Major Multi-VCS surgery

Friday, June 25th, 2010

This summer I am working on the Google Summer of Code project “Revive GNU Prolog for Java”. It now has a project page and a Git repository which resulted in a rather entertaining screen shot.

Yesterday I found out that last year someone did a lot of the work I was intending to do this summer and they gave me a svn dump of the changes that they made. I had previously found two other people that had made some changes over the last 10 years while the project was dormant.

So I was faced with the task of taking the SVN dump of the changes made by Michiel Hendriks (elmuerte) and splicing them onto the old CVS history of the code he took the source .zip and then converting it into Git (which is what we are now using as our VCS). I was kind of hoping that with luck the two development histories would then share a common root which could be used to help with merging the two development histories back together that hope was in vain (though I have another idea I might try later).

Anyway this whole splicing thing was non-trivial so I thought I would document how I did it (partly so that I can find the instructions later).

So there were problems with the SVN dumpfile I was given: it couldn’t be applied to a bare repository as all the Node-Paths had an additional extension ‘trese/ample/trunk’ on the beginning. No one fix I tried was to add

Node-path: trese
Node-kind: dir
Node-action: add

Node-path: trese/ample
Node-kind: dir
Node-action: add

Node-path: trese/ample/trunk
Node-kind: dir
Node-action: add

Into the the first commit using vim: now this worked but it meant that I still had the extra ‘trese/ample/trunk/’ which I didn’t want.
So I removed that using vim and “:%s/trese\/ample\/trunk\///g”, unfortunately there were a couple of instances where trese/ample/trunk was refereed to directly when files were being added to svn:ignore unfortunately I didn’t find out how to refer to the top level directory in an svn dump file so I just edited those bits out (there were only 2 commits which were effected). So now I had a working svn dumpfile. To do the splicing I used svndumptool.py to remove the first commit resulting in a dumpfile I called gnuprolog-mod.mod2-86.svn.dump (see below for instructions on how to do this).

I got a copy of the CVS repository, so that the current working directory contained CVSROOT and also a folder called gnuprolog which contained the VCed code.

# This makes a dumpfile 'svndump' of the code in the gnuprolog module I only care about the trunk.
cvs2svn --trunk-only --dumpfile=svndump gnuprolog
# Then we use svndumptool to remove the first commit as cvs2svn adds one to the beginning where it makes various directories.
svndumptool.py split svndump 2 8 cvs-1.svn.dump
# Then we use vim to edit the dump file and do :%s/trunk\///g to strip of the leading 'trunk/' from Node-Paths
vim cvs-1.svn.dump
# Create a SVN repository to import into
svnadmin create gnuprolog-mod.plaster.svn
# Import the CVS history
svnadmin load gnuprolog-mod.plaster.svn < cvs-1.svn.dump
# Import the SVN history
svnadmin load gnuprolog-mod.plaster.svn < gnuprolog-mod.mod2-86.svn.dump
# Make a git repository from the SVN repository
git svn clone file:///home/daniel/dev/gnuprolog/gnuprolog-mod.plaster.svn gnuprolog-mod.plaster.git

Things that I found which are useful

The SVN dump file format
svn-to-git got me the closest to being able to import the svn dumpfile into git.
These instructions on how to fix svn dumpfiles.

Sorry this post is rather ramblely stuck half way between a howto and an anecdote but hopefully someone will find it useful.