Archive for the ‘CompSci’ Category

IB Group Projects

Thursday, March 10th, 2011

On Wednesday the Computer Science IB students demonstrated the projects that they have been working on for the last term. These are my thoughts on them.

Some of the projects were really quite interesting, and some even looked genuinely useful in real life; others didn’t work, were boring, or were simply gimmicks.

Alpha: “African SMS Radio” was a project to create a pretty GUI for a “byzantine and buggy” backend. It could allow a radio operator to run polls and examine statistics on texts sent to a particular number. However, it didn’t look particularly interesting, and though there might be use cases for such a system, I think only as a component of a larger, more enterprise-like system, and only after the “buggy” backend they had to use had been fixed up or rewritten.

Bravo: “Crowd control” was a project to simulate evacuations of buildings. It is a nice use of the Open Room Map project to provide the building data. It looked like it was still a little buggy – in particular, it allowed some really quite nasty crushes to occur, and the resulting edge effects, as people were thrown violently across the room while the system tried to deal with multiple people being in the same place at the same time, were a little amusing. With a little more work it could become a quite useful extension in the Open Room Map ecosystem, which could help that project gain momentum and take off. I think the Open Room Map project is really quite cool and useful – it is a way that data on the current structure and contents of buildings can be crowd-sourced and kept up to date – but then it is a project of my supervisor. ;-)

Charlie: “Digit[Ov]al automated cricket commentary” was a project to use little location transmitters on necklaces and USB receivers plugged into laptops to determine the location of cricketers while they were playing, and then automatically construct commentary from that. It won the prize for best technical project, but it didn’t actually work. They hadn’t solved the problem that a person standing between the transmitter and the receiver reduced the transmission strength by 1/3, that placing a hand over the transmitter also reduced it by 1/3, or that the transmitters were not omnidirectional, so orientation was a major issue. They were also limited to only four receivers, due to only having four suitable laptops, and they used a square arrangement to try to detect location. It is possible that a double triangle arrangement, with three corners at ground level and the other triangle higher up (using the ‘stadium’ to gain height) and offset so that the upper vertices lined up with the midpoints of the lower edges, would have given them a better signal. Calibrating the system and constructing algorithms to deal with the noise and poor data would probably have been quite difficult and required some significant work – more than IB students have really been taught enough for yet.

Delta: “Hand Wave, Hand Wave” was a project to use two sensors with gyroscopes and accelerometers to do gesture recognition and control. It didn’t really work in the demo, and since they had reimplemented everything themselves it didn’t manage to do anything particularly interesting. I think using such sensors for gesture control is probably a dead end, as the Kinect and the like make just using a camera so much easier and more interesting.

Echo: “iZoopraxiscope – Interactive Handheld Projector” was a project about using a phone with a built-in pico projector as an interface. This was obviously very prototype technology: using the projector drained the phone’s battery very quickly, in some cases even when the phone was plugged in, and fitting it into the (slightly clunky) phone was clearly at the expense of the processing power normally expected in an Android phone, so it was somewhat sluggish. Since the sensors were rather noisy, and their techniques for coping with that were not as advanced as they might have been (they just used an exponential moving average and manually tweaked the parameter), they had some difficulties with sluggishness in the controls of some of the games. However, I think they produced several nice arcade-style games (I didn’t play any of them) and so did demonstrate a wide range of uses. With better knowledge of how to deal with sensors (not really covered in any of the courses offered at the CL) and better technology this could be really neat. However, getting a battery-powered projector to compete with normal lighting is going to be quite a challenge.
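
The smoothing they described (an exponential moving average with a hand-tuned smoothing factor) only takes a few lines; here is a minimal sketch in Java, with names of my own choosing:

// Exponential moving average: smoothed = alpha * sample + (1 - alpha) * smoothed.
// Smaller alpha gives smoother output but more lag, which is exactly the
// sluggishness trade-off they were fighting.
public class ExponentialSmoother {
    private final double alpha; // smoothing factor in (0, 1]
    private double smoothed;
    private boolean initialised = false;

    public ExponentialSmoother(double alpha) {
        this.alpha = alpha;
    }

    public double update(double sample) {
        if (!initialised) {
            smoothed = sample; // seed with the first reading
            initialised = true;
        } else {
            smoothed = alpha * sample + (1 - alpha) * smoothed;
        }
        return smoothed;
    }
}
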
The thing I really like about small projectors is that they could help make it easier to interact in lectures. Sometimes, when asking a question or making a comment in a lecture, it would be useful to draw a diagram which the lecturer (and the rest of the audience) can see, and currently doing so is really quite hard. (I should take to carrying around a laser pointer for use in these circumstances.)

Foxtrot: “Lounge Star” was an Android app for making air passengers’ lives a little easier by telling them information such as which gate to use etc. without them having to go anywhere, and by integrating with various airlines’ systems. As someone who has ‘given up flying’ (not in an absolute sense, but in a ‘while any other option (including not going) still remains’ sense) this was not vastly interesting, but it could really work as a product if the airlines like it. So: “Oh, it is another nice little Android app” (but then the associated short attention span kicks in and “bored now”).

Golf: “The Energy Forecast” was a project I really liked (it pushed the right buttons): predicting the energy production of all the wind farms in the country based on the predicted wind speed. It integrated various sources of wind speeds, power production profiles for different types of wind farm, and the locations and types of many different wind farms (they thought all of them, but I found some they were missing). They had a very pretty GUI using Google Maps etc. to show things geographically, and were using a very pretty graph-drawing JavaScript library. So I did the “oh, you should use the SRCF to host that” thing (they were using a public IP on one of their own computers), and I am sort of thinking “I would really like to have your code” (oh wait, I know where that is kept, snarfle, snarfle ;-) ). It is something I would really like to make into a part of the ReadYourMeter ecosystem (I may try to persuade Andy he wants to get something done with it).
I love wind turbines: all my (small) investments are in them, we have one in our back garden, etc. This could be really useful. [end fanboyism]

Hotel: “Top Tips” was a project to see whether the comments traders put on their trading tips actually tell you anything about how good the trade is. The answer was no, not really; nothing to see here. Which is a little disappointing, and it is not a particularly interesting project – “let’s do some data analysis!” etc.

India: “True Mobile Coverage” was a project to crowd-source the collection of real mobile signal strength data. It actually serves a useful purpose and could be really helpful. They needed to work on their display a little, as it wasn’t very good at distinguishing between areas they didn’t know much about and areas with weak signal, and unfortunately, as with all the projects, it only started working at the very last minute, so they didn’t have that much data to show. It is a nice crowd-sourced data collection Android app of the kind that loads of people in the CL love. Of course there is a large amount they could do to improve it using the kind of research which has been done in the CL, but it is a good start.

Juliet: “Twitter Dashboard” was so obviously going to win from the beginning – a Twitter project (yay, bandwagon) which looks pretty. They did do a very good job: it looked pretty, and it ate 200% of the SRCF’s CPU continuously during the demo (but was niced to 19 so didn’t affect other services). There are probably efficiency savings to be made there, but that isn’t a priority for a Group Project, which is mainly about producing something that looks pretty and looks as if it works; all other considerations are secondary. My thoughts were mainly “Oh, another project to make it easier for Redgate to do more of their perpetual advertising. Meh.” (They have lovely people working for them, but I couldn’t write good enough Java for them.)

Kilo: “Walk out of the Underground” was a project to guide you from the moment you step out of the Underground to your destination, using an arrow on the screen of your phone. It was rather hard to demo inside the Intel Lab, where there is both poor signal and insufficient scale to see whether it actually works. It might be useful, it might work; it is yet another app for the app store and could probably drum up a few thousand users as a free app.

Lima: “Who is my Customer?” was a very enterprise project to do some rather basic information retrieval to find the same customer in multiple data sets. The use case is that $company has a failsome information system and their data is poor quality and not well linked together. Unfortunately the project gave the impression of being something which one person could hack together in a weekend. I may be being overly harsh, but I found it a little boring.

So in summary: I liked “The Energy Forecast” most because it pushed the right buttons, and “True Mobile Coverage” is interesting and useful. Charlie could be interesting if it could be made to work, but I think that the ‘cricket’ aspect is a little silly – if you want commentary, use a human. iZoopraxiscope (what a silly name) points to some cool tech that will perhaps be useful in the future but really is not ready yet (they might need/be using some of the cool hologram tech that Tim Wilkinson is working on; he gave a CUCaTS talk “Do We Really Need Pixels?” recently).

Idea for next year: hold a competition after the end of the presentations to write up the projects in a scientific-paper style, and then publish the ones that actually reach a sufficiently good standard in an IB Group Project ‘journal’; this would provide some scientific skills to go with all the software engineering skills that the Group Project is currently supposed to teach. (No, this is so not going to happen in reality.)

tidy_vig: Automatically reformatting generated HTML into something cleaner

Friday, February 4th, 2011

As webmaster and secretary of various things I regularly need to upload minutes to websites, and hence want to upload HTML files. While Open/LibreOffice’s export-to-HTML functionality works, it doesn’t produce nice HTML. tidy is a useful tool for finding flaws in HTML and making it correct and nicer, but it is not sufficient to accomplish this task on its own. Hence I have finally scriptified the various automatable parts of turning generated HTML into something publishable (this loses all style definitions so the result won’t look the same – use just the tidy_up step from the script if you want to avoid that).


#!/bin/bash

set -e #bail if something goes wrong

tidy_up='tidy -indent -modify -clean -bare -asxml -utf8 -wrap 80 -access 3 --logical-emphasis yes'

$tidy_up $1 #Normalise to lowercase and remove most rubbish
$tidy_up $1
$tidy_up $1 #Repeat until the output stabilises - this happens the third time
# Remove <style>...</style> blocks: select the range of lines containing the
# element, join them into one pattern space, then delete the whole element.
sed -i '/<style[^>]*>/,/<\/style>/ { :ack; N; /<\/style>/! b ack; s/<style[^>]*>.*<\/style>//g }' $1
sed -i 's/ class="[^"]*"//g' $1
sed -i 's/<\/*span>//g' $1
$tidy_up $1 #Reformat now that remaining cruft removed
sed -i 's/ class="[^"]*"//g' $1 #Remove any class attributes that were rejoined onto one line by the reformat
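
Assuming the above is saved as, say, tidy_vig and made executable, usage is then just:

chmod +x tidy_vig
./tidy_vig minutes.html   # the exported HTML file is modified in place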

Unfortunately some manual work may still be needed: for example, if headers weren’t marked up as headers by the person who wrote the original file, then some sections may still need converting by hand.

It is probably possible to do this in a cleaner, more logical way, I have probably missed edge cases, and this probably counts as being a little hacky; however, hopefully someone will find it useful.

dh_installdocs --link-doc when using jh_installjavadoc

Monday, January 3rd, 2011

This post exists to stop people (possibly including myself) spending hours being thoroughly confused as to why

override_dh_installdocs:
	dh_installdocs --link-doc=package
is not working. In fact it probably is working but your package-doc.javadoc probably looks something like:

docs/api

rather than like:

docs/api /usr/share/doc/package/api

and hence jh_installjavadoc is correctly creating the /usr/share/doc/package-doc/ directory when you really didn’t want that to happen, and so dh_installdocs --link-doc silently ignores your instruction, as it doesn’t make sense in that context.

Relatedly I have ‘finished’ packaging JEval for Debian partly using the instructions on packaging java packages using git. It just needs some further testing and to be uploaded and sponsored.
(I am packaging the dependencies of my Part II Project)

LaTeX user search path

Monday, November 1st, 2010

Because this took far too long to find out:
If you have .sty files you have written and want to be able to reuse them easily in multiple documents, then put them in ~/texmf/tex
Then you should be able to use them with \usepackage{foo} as normal.
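
For example, with a (hypothetical) mystyle.sty:

mkdir -p ~/texmf/tex
cp mystyle.sty ~/texmf/tex/

and then \usepackage{mystyle} in the preamble of any of your documents.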

Firesheep as applied to Cambridge

Tuesday, October 26th, 2010

Many of you will have already heard about Firesheep, which is essentially a Firefox extension that allows you to log in to other people’s Facebook, Amazon, etc. accounts if they are on the same (unsecured) network as you. This post contains my initial thoughts on what this means for people on Cambridge University networks.

Essentially this whole thing is nothing new – in one sense, people who know anything about security already knew that this was possible and that programs for doing it existed. The only innovation is an easy-to-use user interface, and because Human Computer Interaction (HCI) is hard, this means that Eric Butler has won.

In Cambridge we have unsecured wireless networks such as Lapwing and the CL’s shared-key networks, and I think that Firesheep should work fine on these. So, for example, in lectures where lots of students are checking Facebook et al. (especially in the CL) there is great potential for “pwned with Firesheep” becoming the status of many people. However, this would be morally wrong and would violate the Terms of Service of the CUDN/JANET etc. If that isn’t sufficient: the UCS has magic scripts that watch network traffic, they know where you live, and if you do something really bad they can probably stop you graduating. So while amusing, I don’t think a sudden epidemic of breaking into people’s accounts would be sensible.

So what does that mean for users of Cambridge networks? Use Eduroam. Eduroam is wonderful and actually provides security in this case (at least as long as you trust the UCS, but we have to do that anyway). If you are using Lapwing and you use a site listed on the handlers page for Firesheep (though don’t visit that link on an unsecured network, as GitHub is on that list) then you have to accept the risk that someone may steal your cookies and pretend to be you.

What does this mean for people running websites for Cambridge people? Use SSL. If you are using the SRCF then you win, as we provide free SSL and it is simply a matter of using a .htaccess file to turn it on. It should also be pointed out that if you are using Raven for authentication (which you should be) then you still need to use SSL on all the pages on which you are authenticated, or you lose[0]. If you are not using the SRCF – then why not? The SRCF is wonderful![1] If you are within *.cam.ac.uk and not using the SRCF then you can also obtain free SSL certificates from the UCS (though I doubt anyone likely to read this is).
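
For reference, the sort of .htaccess I mean just redirects every plain-HTTP request to HTTPS – a minimal sketch, assuming mod_rewrite is available on your host:

# Redirect all HTTP requests to HTTPS
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]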

So do I fail on this count? Yes: I think I have multiple websites on the SRCF which don’t use SSL everywhere they should, and I don’t think any of them use secure cookies. I also feel slightly responsible for another website which uses both poorly designed cookies and no SSL.

Users – know the risks. Developers – someone is telling us to wake up again, and even though I knew it, I was sleeping.

[0]: Unfortunately, I think that until the SRCF rolls out per-user and per-society subdomains (which will be happening RSN), if you use Raven to log in to one site on the SRCF and then visit any non-SSL page on the SRCF, your Raven cookie for the SRCF has just leaked to anyone listening. Oops. Using secure cookies would fix this, though I haven’t worked out how to do that yet – I will post a HOWTO later. Update: if the original authentication is done to an SSL-protected site then the Raven cookie will be set to be secure.
[1]: I may be wearing my SRCF Chairman hat while writing that – though that doesn’t mean it isn’t true.

Models of human sampling and interpolating regular data

Saturday, October 23rd, 2010

On Thursday I submitted my project proposal for my Part II project. An HTML version of it (generated from the LaTeX using hevea and tidy, with all styling stripped out) follows. (With regard to the work schedule – I appear to be one week behind already. Oops.)

Part II Computer Science Project Proposal

Models of human sampling and interpolating regular data

D. Thomas, Peterhouse

Originator: Dr A. Rice

Special Resources Required

The use of my own laptop (for development)
The use of the PWF (backup, backup development)
The use of the SRCF (backup, backup development)
The use of zeus (backup)

Project Supervisor: Dr A. Rice

Director of Studies: Dr A. Norman

Project Overseers: Alan Blackwell + Cecilia Mascolo
(AFB/CM)

Introduction

When humans record information they do not usually do so in the same
regular manner that a machine does as the rate at which they sample depends
on factors such as how interested in the data they are and whether they have
developed a habit of collecting the data on a particular schedule. They are
also likely to have other commitments which prevent them recording at precise
half hour intervals for years. In order to be able to test methods of
interpolating from human recorded data to a more regular data stream such as
that which would be created by a machine we need models of how humans collect
data. ReadYourMeter.org contains data collected by humans which can be used
to evaluate these models. Using these models we can then create test data
sets from high resolution machine recorded data sets[1] and then try to interpolate back to the
original data set and evaluate how good different machine learning techniques
are at doing this. This could then be extended with pluggable models for
different data sets which could then use the human recorded data set to do
parameter estimation. Interpolating to a higher resolution regular data set
allows for comparison between different data sets for example those collected
by different people or relating to different readings such as gas and
electricity.

Work that has to be done

The project breaks down into the following main sections:-

  1. Investigating the distribution of recordings in
    the ReadYourMeter.org data set.
  2. Constructing hypotheses of how the human recording
    of data can be modelled and evaluating these models against the
    ReadYourMeter.org data set.
  3. Using these models to construct test data sets by
    sampling the regular machine recorded data sets[2] to produce pseudo-human read test data sets
    which can be used to be learnt from as the results can be compared with the
    reality of the machine read data sets.
  4. Using machine learning interpolation techniques to
    try and interpolate back to the original data sets from the test data sets
    and evaluating success of different methods in achieving this.

    • Polynomial fit
    • Locally weighted linear regression
    • Gaussian process regression (see Chapter 2 of
      Gaussian Processes for Machine Learning by Rasmussen &
      Williams)
    • Neural Networks (possibly using java-fann)
    • Hidden Markov Models (possibly using jahmm)
  5. If time allows then using parameter estimation on
    a known model of a system to interpolate from a test data set back to the
    original data set and evaluating how well this compares with the machine
    learning techniques which have no prior knowledge of the system.
  6. Writing the Dissertation.

Difficulties to Overcome

The following main learning tasks will have to be undertaken before the
project can be started:

  • To find a suitable method for comparing different
    sampling patterns to enable hypotheses of human behaviour to be
    evaluated.
  • Research into existing models for related human
    behaviour.

Starting Point

I have a good working knowledge of Java and of queries in SQL.
I have read “Machine Learning” by Tom Mitchell.
Andrew Rice has written some Java code which does some basic linear
interpolation; it was written for use in producing a particular paper, but it
should form a good starting point, at least providing ideas on how to go
forwards. It can also be used for requirement sampling.

ReadYourMeter.org database

I have worked with the ReadYourMeter.org database before (summer 2009) and
with large data sets of sensor readings (spring 2008).
For the purpose of this project the relevant data can be viewed as a table
with three columns: “meter_id, timestamp, value”.
There are 99 meters with over 30 readings, 39 with over 50, 12 with over 100
and 5 with over 200. This data is to be used for the purpose of constructing
and evaluating models of how humans record data.

Evaluation data sets

There are several data sets to be used for the purpose of training and
evaluating the machine learning interpolation techniques. These are to be
sampled using the model constructed in the first part of the project for how
humans record data. This then allows the data interpolated from this sampled
data to be compared with the actual data which was sampled from.
The data sets are:

  • half hourly electricity readings for the WGB from
    2001-2010 (131416 records in “timestamp, usage rate”
    format).
  • monthly gas readings for the WGB from 2002-2010 (71
    records in “date, total usage” format)
  • half hourly weather data from the DTG weather
    station from 1995-2010 (263026 records)

Resources

This project should mainly be developed on my laptop, which has sufficient
resources to deal with the anticipated workload.
The project will be kept in version control using GIT. The SRCF, PWF and zeus
will be set to clone this and fetch regularly. Simple backups will be taken
at weekly intervals to SRCF/PWF and to an external disk.

Success criteria

  1. Models of human behaviour in recording data must
    be constructed which emulate real behaviour in the ReadYourMeter.org
    dataset.
  2. The machine learning methods must produce better
    approximations of the underlying data than linear interpolation and these
    different methods should be compared to determine their relative merits on
    different data sets.
  3. The machine, once trained, should be able to apply this
    to unseen data of a similar class and produce better results than linear
    interpolation.
  4. A library should be produced which is well
    designed and documented to allow users – particularly researchers – to be
    able to easily combine various functions on the input data.
  5. The dissertation should be written.

Work Plan

Planned starting date is 2010-10-15.

Dates in general indicate start dates or deadlines and this is clearly
indicated. Work items should usually be finished before the next one starts
except where indicated (extensions run concurrently with dissertation
writing).

Monday, October 18 – Start: Investigating the distribution of recordings in the ReadYourMeter.org data set.
Monday, October 25 – Start: Constructing hypotheses of how the human recording of data can be modelled and evaluating these models against the ReadYourMeter.org data set.
  This involves examining the distributions and modes of recording found in the previous section and constructing parametrised models which can encapsulate this. For example, a hypothesis might be that some humans record data in three phases: first frequently (e.g. several times a day), then trailing off irregularly until some more regular but less frequent mode is entered where data is recorded once a week/month. This would then be parametrised by the length and frequency in each stage, and within each stage details such as the time of day would probably need to be characterised by probability distributions which can be calculated from the ReadYourMeter.org dataset.
Monday, November 8 – Start: Using these models to construct test data sets by sampling the regular machine recorded data sets.
Monday, November 15 – Start: Using machine learning interpolation techniques to try and interpolate back to the original data sets from the test data sets and evaluating the success of different methods in achieving this.

Monday, November 15 – Start: Polynomial fit
Monday, November 22 – Start: Locally weighted linear regression
Monday, November 29 – Start: Gaussian process regression
Monday, December 13 – Start: Neural Networks
Monday, December 27 – Start: Hidden Markov Models
Monday, January 3, 2011 – Start: Introduction chapter
Monday, January 10, 2011 – Start: Preparation chapter
Monday, January 17, 2011 – Start: Progress report
Monday, January 24, 2011 – Start: If time allows, using parameter estimation on a known model of a system to interpolate from a test data set back to the original data set. This continues until 17th March and can be expanded or shrunk depending on the available time.
Friday, January 28, 2011 – Deadline: Draft progress report
Wednesday, February 2, 2011 – Deadline: Final progress report printed and handed in. By this point the core of the project should be completed, with only extension components and polishing remaining.
Friday, February 4, 2011, 12:00 – Progress report deadline
Monday, February 7, 2011 – Start: Implementation chapter
Monday, February 21, 2011 – Start: Evaluation chapter
Monday, March 7, 2011 – Start: Conclusions chapter
Thursday, March 17, 2011 – Deadline: First draft of dissertation (by this point revision for the exams will be in full swing, limiting the time available for the project, and time is required between drafts to allow people to read and comment on it)
Friday, April 1, 2011 – Deadline: Second draft of dissertation
Friday, April 22, 2011 – Deadline: Third draft of dissertation
Friday, May 6, 2011 – Deadline: Final version of dissertation produced
Monday, May 16, 2011 – Deadline: Print, bind and submit dissertation
Friday, May 20, 2011, 11:00 – Deadline: Dissertation submission deadline

[1]: Such as the WGB’s energy usage; see §Starting Point for more details.
[2]: These are detailed in §Starting Point.
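
As an aside (this is my illustration, not part of the proposal as submitted): the baseline that the machine learning methods have to beat is plain linear interpolation between the two nearest human readings, i.e. something like:

import java.util.Map;
import java.util.TreeMap;

// Linear-interpolation baseline over (timestamp, value) readings for one meter.
public class LinearInterpolator {
    private final TreeMap<Long, Double> readings = new TreeMap<Long, Double>();

    public void addReading(long timestamp, double value) {
        readings.put(timestamp, value);
    }

    // Estimate the value at time t from the surrounding readings.
    public double valueAt(long t) {
        Map.Entry<Long, Double> before = readings.floorEntry(t);
        Map.Entry<Long, Double> after = readings.ceilingEntry(t);
        if (before == null && after == null)
            throw new IllegalStateException("no readings");
        if (before == null) return after.getValue();   // before the first reading
        if (after == null) return before.getValue();   // after the last reading
        if (before.getKey().equals(after.getKey())) return before.getValue(); // exact hit
        double fraction = (double) (t - before.getKey())
                / (after.getKey() - before.getKey());
        return before.getValue() + fraction * (after.getValue() - before.getValue());
    }
}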

Proving Java Terminates (or runs forever) – Avoiding deadlock automatically with compile time checking

Tuesday, October 12th, 2010

This is one of the two projects I was considering doing for my Part II project and the one which I decided not to do.
This project would involve using annotations in Java to check that the thread-safety properties which the programmer believes the code has actually hold.
For example @DeadlockFree or @ThreadSafe.
Now you might think that this would be both incredibly useful and far too difficult/impossible; however, implementations of this exist, at least in part: the MIT-licensed CheckThread project, and another implementation which forms part of the much larger JSR-305 project to use annotations in Java to detect software defects. JSR-305 was going to be part of Java 7, but development appears to be slow (over a year between the second most recent commit and the most recent commit (4 days ago)).
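
To make that concrete, here is a sketch of the sort of annotated code I mean; I am declaring the annotation types myself rather than assuming the exact package names used by CheckThread or JSR-305:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// The claims the programmer makes; a checker verifies them at compile time.
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface ThreadSafe {}

@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface DeadlockFree {}

@ThreadSafe
class Counter {
    private int count = 0;

    @DeadlockFree // a checker would verify that no lock cycle can involve this method
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }
}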

Essentially, if you want to use annotations in Java to find your thread-safety/deadlock bugs and prevent them from recurring, the technology is out there (but needs some polishing). In the end I decided that, since I could probably hack something working together and get it packaged in Debian in a couple of days/weeks by fixing problems in the existing implementations, it would not be particularly useful to spend several months reimplementing it as a Part II project.

If a library which allows you to use annotations in Java to prevent thread-safety/deadlock problems is not in Debian (including the NEW queue) by the end of next summer, throw things at me until it is.

Updating file copyright information using git to find the files

Wednesday, September 1st, 2010

Update: I missed off --author="Your Name".
All source files should technically have a copyright section at the top. However updating this every time the file is changed is tiresome and tends to be missed out. So after a long period of development you find yourself asking the question “Which files have I modified sufficiently that I need to add myself to the list of copyright holders on the file?”.
Of course you are using a version control system, so the information you need about which files you modified, and by how much, is available; it just needs to be extracted. The shell is powerful, and a solution is just one (long) line (using long options and splitting over multiple lines for readability).

$ find -type f -exec bash -c 'git log --ignore-all-space --unified=0 --oneline  \
    --author="Your Name" {} | grep "^[+-]" | \
    grep --invert-match "^\(+++\)\|\(---\)" | wc --lines | \
    xargs test 20 -le ' \; -print > update_copyright.txt \
    && wc --lines update_copyright.txt

In words: find all files and run a command on them; if that command returns true (0) then print that file name to the ‘update_copyright.txt’ file, and then count how many lines that file ends up with. The command is: use git log to find all my changes which changed things other than whitespace, minimising everything other than the changes themselves (--oneline to strip the commit message down and --unified=0 to remove context); then keep only lines which start with + or -; then strip out the lines which start with +++ or ---; then count how many lines we get from that and test whether the count is at least 20. If so return true (zero), else return false (non-zero).

This should result in an output like:

 284 update_copyright.txt

I picked 20 for the ‘number of lines changed’ value because 10 lines of changes is generally the size at which copyright information should be updated (I think GNU states this) and we are including both additions and removals so we want to double that.

Now I could go from there to a script which then automatically updated the copyright rather than going through manually and updating it myself… however the output I have contains lots of files which I should not update. Then there are files in different languages which use different types of comments etc. so such a script would be much more difficult to write.

Apologies for the poor quality of English in this post and to those of you who have no idea what I am on about.

Packaging a java library for Debian as an upstream

Tuesday, August 17th, 2010

I have just finished my GSoC project working on GNU Prolog for Java which is a Java library for doing Prolog. I got as far as the beta release of version 0.2.5. One of the things I want to do for the final release is to integrate the making of .deb files to distribute the binary, source and documentation into the ant build system. This post exists to chronicle how I go about doing that.

First, some background reading: there is a page on the Debian wiki on Java Packaging. You will want to install javahelper as well as debian-policy and debhelper. You will then have the javahelper documentation in /usr/share/doc/javahelper/. The Debian policy on Java will probably also be useful.

Now, most Debian packages will be created by people working for Debian using the releases made by upstream, but I want to be a good upstream and do it myself.

So I ran:

jh_makepkg --package=gnuprologjava --maintainer="Daniel Thomas" \
    --email="drt24-debian@srcf.ucam.org" --upstream="0.2.5" --library --ant --default

I then reported a bug in jh_makepkg where it fails to support the --default and --email options properly, because I got “Invalid option: default”. That might be fixed by the time you read this.
So I ran:

jh_makepkg --package=gnuprologjava --maintainer="Daniel Thomas" \
    --email="drt24-debian@srcf.ucam.org" --upstream="0.2.5" --library --ant

And selected ‘default’ when prompted by pressing F.
I got:

dch warning: Recognised distributions are:
{dapper,hardy,intrepid,jaunty,karmic,lucid,maverick}{,-updates,-security,-proposed,-backports} and UNRELEASED.
Using your request anyway.
dch: Did you see that warning?  Press RETURN to continue...

This, I think, stems from the fact that javahelper assumes Debian, but javahelper uses debhelper (or similar), which Ubuntu has configured for Ubuntu. However, this doesn’t matter to me, as I am trying to make a .deb for Debian (it will then end up in Ubuntu and many other places – push everything as far upstream as possible).
Now, because of the previously mentioned bug, the --email option is not correctly handled, so we need to fix changelog, control and copyright in the created debian folder to use the correct email address: rather than user@host in the case of changelog, and rather than NAME in the case of control and copyright (two places in copyright).

copyright needs updating to have the upstream author, the copyright list, the licence statement and the homepage.
For the licence statement I used:

    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Library General Public
    License as published by the Free Software Foundation; either
    version 3 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    Library General Public License for more details.

    You should have received a copy of the GNU Library General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA

On Debian systems the full text of the GNU Library General Public License can be
found in the file /usr/share/common-licenses/LGPL-3.

control needs updating to add the Homepage, the short description and the long description (which needs to be indented with one space). I also added texinfo and libgetopt-java to Build-Depends, and added Suggests: libgetopt-java.
GNU Prolog for Java has a build dependency on gnu.getopt, as gnu.prolog.test.GoalRunner uses it, so I added
export CLASSPATH=/usr/share/java/gnu-getopt.jar to debian/rules. libgnuprologjava-java.javadoc needed build/api put in it so that the javadoc could be found.
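
For illustration, the resulting control file has roughly this shape (placeholder text in angle brackets, and the rest of the Build-Depends line elided):

Source: gnuprologjava
Build-Depends: <...>, texinfo, libgetopt-java
Homepage: <upstream homepage URL>

Package: libgnuprologjava-java
Suggests: libgetopt-java
Description: <short description on this line>
 <long description continues here, every line indented by one space>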

So at this point I have a debian folder with the right files in it, but I don’t have an ant task which will do everything neatly. Since Debian package building really wants to happen on an unpacked release tarball of the source, I modified my existing dist-src target to copy the debian folder across as well, and added the following to my init target:

<!-- Key id to use for signing the .deb files -->
<property name="dist.keyid" value="" />
<property name="dist.name-version" value="${ant.project.name}-${version}"/>
<property name="dist.debdir" value="dist/${dist.name-version}/" />

I could then add the following dist-deb target to do all the work.

<target name="dist-deb" depends="dist-src" description="Produce a .deb to install with">
	<mkdir dir="${dist.debdir}"/>
	<unzip src="dist/${dist.name-version}-src.zip" dest="${dist.debdir}"/>
	<!-- Delete the getopt code as we want to use the libgetopt-java package for that -->
	<delete dir="${dist.debdir}src/gnu/getopt"/>
	<exec executable="dpkg-buildpackage" dir="${dist.debdir}">
		<arg value="-k${dist.keyid}"/>
	</exec>
</target>

So this gives me binary and source files for distribution and includes the javadoc documentation but lacks the manual.
To fix that I added libgnuprologjava-java.docs containing

build/manual/
docs/readme.txt

as well as libgnuprologjava-java.info containing build/gnuprologjava.info, and libgnuprologjava-java.doc-base.manual containing:

Document: libgnuprologjava-java-manual
Title: Manual for libgnuprologjava-java
Author: Daniel Thomas
Abstract: This is the Manual for libgnuprologjava-java
Section: Programming

Format: HTML
Index: /usr/share/doc/libgnuprologjava-java/manual
Files: /usr/share/doc/libgnuprologjava-java/manual/*.html

Now all of this might result in the creation of the relevant files needed to get GNU Prolog for Java into Debian but then again it might not :-). The .deb file does install (and uninstall) correctly on my machine but I might have violated some debian-policy.

Compiling git from git fails if core.autocrlf is set to true

Monday, August 16th, 2010

If you get the following error when compiling git from a git clone of git:

: command not foundline 2: 
: command not foundline 5: 
: command not foundline 8: 
./GIT-VERSION-GEN: line 14: syntax error near unexpected token `elif'
'/GIT-VERSION-GEN: line 14: `elif test -d .git -o -f .git &&
make: *** No rule to make target `GIT-VERSION-FILE', needed by `git-am'. Stop.
make: *** Waiting for unfinished jobs....

and you have core.autocrlf set to true in your git config, then consider the following, courtesy of wereHamster: “it’s probably not a good idea to convert newlines in the git repo”.

So having core.autocrlf set to true may result in bad things happening in odd ways because line endings are far more complicated than they have any reason to be (thanks to decisions made before I was born). Bugs due to white space errors are irritating and happen to me far too often :-).
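
The simple way out is presumably just to turn the conversion off for that clone; something like:

git config core.autocrlf false   # for this clone only; add --global to change the default
# files that were already checked out with converted line endings may need a fresh checkout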

Today I used CVS – it was horrible. I tried to use git’s cvsimport and cvsexportcommit to deal with the fact that CVS is horrible; unfortunately this did not work ;-(.

This post exists so that the next time someone is silly like me they might get a helpful response from Google.