Archive for the ‘CompSci’ Category

MyCloud part 0: Why? It is my data.

Wednesday, September 25th, 2013

I am leaving the cloud services of Google and similar companies and moving my personal data to my own infrastructure. This process is going to take a while and I am going to document it here to make it easier for others. The obvious question is: why move from free cloud services which already exist and are easy to use to paying for my own infrastructure and configuring it myself? Partly it is because I no longer want to be the product which is being sold; I want to be the customer, not merely a user who is being sold to advertisers. Since there is no way to pay Google to stop selling me I have to go elsewhere. I could go to someone like MyKolab which claims to care about privacy and do things properly – and people who cannot roll their own probably should think about it – but I get better guarantees from rolling my own and it should be a good learning experience.

Also Snowden. My aim is to make it such that if anyone (including state actors) wants my data, then the easiest way of gaining access to it is to come and ask me nicely. We can discuss it like civilised people over tea and cake, and if you make a sensible argument then you can have it. If not, come back with a warrant. I am not a criminal or a terrorist and I do not expect to be treated like one with all my communications being intercepted. My data includes other people’s personally identifying information (PII) and so can only be disclosed to people who they would expect it to be given to, for the purpose for which it was provided. That does not include GCHQ etc. and so I am not following the spirit of the Data Protection Act (DPA) if I make it possible for other people to obtain it without asking.

Similarly some of my friends work for Christian, environmental, aid or democracy organisations, sometimes in countries where doing so is dangerous. Information which might compromise their security is carefully never committed to computer systems (such operational security has been common in Christian circles for 2000 years) but sometimes people make mistakes, particularly when communicating internally in ‘safe’ countries like the UK. However no countries have clean records on human rights etc. and data collected by the ‘five eyes’ is shared with others (e.g. unfiltered access is given to Israel) and there are countries who are our allies in the ‘war on terror’ but which also persecute (or have elements of their security forces who persecute) minorities or groups within their country. I might in some sense be willing to trust the NSA and GCHQ etc. (because they have no reason to be interested in me) but I cannot because that means trusting 800,000 people in the US alone, some of whom will be working for bad governments.

Similarly, while our present government is mostly trying to be good, if frequently foolish, it is very easy for that to change. So we need to ensure that the work required to go from where we are to a police state is huge, so that we have enough time to realise and do something about it. Presently the distance to cover in terms of infrastructure is far too small, being almost negligible. It is our duty as citizens to grow that gap and to keep it wide.

So I am going to try and find solutions which follow the best practices of current computer security, following the principle of least privilege and using compartmentalisation to limit the damage that the compromise of any one component can cause. I am going to document this so that you can point out the holes in it and we can learn together how to do this properly.

Maybe some of this might even help towards my PhD…

Filters that work

Thursday, August 8th, 2013

Summary: The architecture for David Cameron’s filtering plans is wrong and has negative consequences, however there are alternative architectures which might work.

There has been much news coverage about David Cameron’s plans for opt-out filters for all internet users in the UK. With opt-in systems barely anyone will opt-in and with opt-out systems barely anyone will opt-out and so this is a proposal for almost everyone to have a filter on their internet traffic. Enabling households to easily filter out bad content from their internet traffic is useful in that there are many people who do want to do this (such as myself[1]). However the proposed architecture has a number of significant flaws and (hopefully unintended) harmful side effects.

Here I will briefly recap what those flaws and side-effects are and propose an architecture which I claim lacks these flaws and side-effects while providing the desired benefits.

  1. All traffic goes through central servers which have to process it intensively. This makes bad things like analysing this traffic much easier. It also means that traffic cannot be so efficiently routed. It means that there can be no transparency about what is actually going on as no one outside the ISP can see.
  2. There is no transparency or accountability. The lists of things being blocked are not available and even if they were it is hard to verify that those are the ones actually being used. If an address gets added which should not be (say that of a political party or an organisation which someone does not like) then there is no way of knowing that it has been or of removing it from the list. Making such lists available even for illegal content (such as the IWF’s lists) does not make that content any more available but it does make it easier to detect and block it (for example TOR exit nodes could block it). In particular it means having found some bad content it is easier to work out if that content needs to be added to the list or if it is already on it.
  3. Central records must be kept on who is and who is not using such filters. Really such information is none of anyone else’s business. They should not know or be able to tell, and they do not need to.

I am not going to discuss whether porn is bad for you though I have heard convincing arguments that it is. Nor will I expect any system to prevent people who really want to access such content from doing so. I also will not use a magic ‘detect if adult’ device to prevent teenagers from changing the settings to turn filters off.

Most home internet systems consist of a number of devices connected to some sort of ISP provided hub which then connects to the ISP’s systems and then to the internet. This hub is my focus as it is provided by the ISP and so can be provisioned with the software they desire and configured by them but is also under the control of the household and provides an opportunity for some transparency. The same architecture can be used with the device itself performing the filtering, for example when using mobile phones on 3G or inside web browsers when using TLS.

So how would such a system work? These hubs are basically just very small Linux machines, like a Raspberry Pi. The hub is already handling the networking for the devices in the house, probably running a NAT[0] and doing DHCP; it should probably also be running a DNS server and using DNSSEC. It already has a little web server to display its management pages and so could trivially display web pages saying “this content blocked for you because of $reason, if this is wrong do $thing”. When the hub makes DNS requests for domains to the ISP’s servers, they can reply with additional information about whether the domain is known to have bad content and where to find further information on that, which the hub can then look up and use as input when applying local policy.
The household can then configure the hub to apply the policy they want. It can be shipped with a sensible default, and no one knows what policy they chose unless they snoop on their traffic (which should require a warrant).
There might need to be a couple of extra tweaks here. For example, there is some content which people really do not want to see but find very difficult not to seek out: I have friends who have struggled for a long time to recover from a pornography addiction. Hence it can be useful to provide functionality whereby filter settings can be made read only, such that a user can choose to make them ‘impossible’ to turn off: in a stronger moment they can make a decision that prevents them from doing something they do not want to do in a weaker moment. Obviously any censorship system can be circumvented by a sufficiently determined person, but self blocking things is an effective strategy to help people break addictions, whether to Facebook in the run up to exams or to more addictive websites.
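To make the hub-side decision a little more concrete, here is a minimal sketch in Python. It assumes the ISP's reply carries a list of category tags for a domain. The tag names, the `Policy` class and the `decide` function are all invented for illustration and do not correspond to any real ISP, hub or DNS API.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Household policy held only on the hub; the ISP never sees it."""
    blocked_categories: set = field(default_factory=set)
    locked: bool = False  # once True, settings are read only (self-imposed block)

    def block(self, category):
        if self.locked:
            raise PermissionError("policy is locked")
        self.blocked_categories.add(category)

    def unblock(self, category):
        if self.locked:
            raise PermissionError("policy is locked")
        self.blocked_categories.discard(category)


def decide(policy, domain, tags):
    """Return (allowed, reason) for a domain, given the ISP-supplied tags."""
    hits = sorted(policy.blocked_categories & set(tags))
    if hits:
        return False, f"{domain} blocked for you because of: {', '.join(hits)}"
    return True, ""
```

The point of the design is that the tags flow from the ISP to the hub but the policy never flows back, so nobody upstream learns which categories a household has chosen to block.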

So would such a system actually work? I think that it is technically feasible, would achieve the purposes it is intended to and would not have the same problems as the currently proposed architecture. It might not work with currently deployed hardware, as that might not have quite enough processing power (though not by much). However an open, well specified system would allow incremental roll out and independent implementation and verification. Additionally it does not provide the service for which David Cameron’s system is actually being built, which is to make it easier to snoop on all internet users’ web traffic. This is just the Digital Economy bill all over again but with ‘think of the children’ rather than ‘think of the terrorists’ as its sales pitch. There is little point blocking access to illegal content as that can always be circumvented. It is much better to take the content down[2] and lock up the people who produced it or, failing that, to detect it as the traffic leaves the ISP’s network towards bad places and send round a police van to lock up the people accessing it. Then everything has to go through the proper legal process in plain sight.

[0]: in the case of Virgin Media’s ‘Super Hub’ doing so incredibly badly such that everything needs tunnelling out to a sane network.
[1]: Though currently I do not, beyond using Google’s strict safe search, because there is no easy mechanism for doing so. The only source of objectionable content that actually ends up on web pages I see is adverts, on which more later.
[2]: If this is difficult then make it easier. It is far too hard to take down criminal websites such as phishing scams at the moment and improvements in international cooperation on this would be of great benefit.

Surveillance consequences

Wednesday, August 7th, 2013

Mass surveillance of the citizens of a country allows intelligence services to use ‘big data’ techniques to find suspicious things which they would not otherwise have found. They can analyse the graph structure of communications to look for suspicious patterns or suspicious keywords. However as a long term strategy it is fundamentally flawed. The problem is the effect of surveillance on those being watched. Being watched means not being trusted, being outside and other, separate from those who know best and under suspicion. It makes you foreign, alien and apart, it causes fear and apprehension, it reduces integration. It makes communities feel that they are being picked on, distresses them and splits them apart from those around them. This causes a feeling of oppression and unfairness, of injustice. This results in anger, which grows in the darkness and leads to death.

That is not the way to deal with ‘terrorism’. Come, let us build our lives together as one community, not set apart and divided. Let us come together and talk of how we can build a better world for us and for our children. Inside we are all the same, it does not matter where we came from, only where we are going to and how we get there.
Come, let us put on love rather than fear, let us welcome rather than reject, let us build a country where freedom reigns and peace flows like a river through happy tree lined streets where children play.

I may be an idealist but that does not make this impossible, only really hard, and massively worth it. The place to begin is as always in my own heart for I am not yet ready to live in the country I want us to be. There is a long way to go, and so my friends: let us begin.

Communicating with a Firefox extension from Selenium

Monday, May 20th, 2013

Edit: I think this no longer works with more recent versions of Firefox, or at least I have given up on this strategy and gone for extending Webdriver to do what I want instead.

For something I am currently working on I wanted to use Selenium to automatically access some parts of Firefox which are not accessible from a page. The chosen method was to use a Firefox extension and send events between the page and the extension to carry data. Getting this working was more tedious than I was expecting, perhaps mainly because I have tried to avoid JavaScript whenever possible in the past.

The following code extracts set up listeners with Selenium and the Firefox extension and send one event in each direction. Using this to do proper communication and to run automated tests is left as an exercise for the author but hopefully someone else will find this useful as a starting point. The full code base this forms part of will be open sourced and made public at some future point when it does something more useful.



import java.io.File;
import java.io.IOException;

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

public class App {
    // The event names keep their quotes so they can be spliced straight into javascript.
    private static final String SEND = "\"syncCommandToExtension\"";
    private static final String RECV = "\"syncCommandToPage\"";

    public static void main(String[] args) throws IOException {
        // This is where maven is configured to put the compiled .xpi
        File extensionFile = new File("target/extension.xpi");
        // So that the relevant Firefox extension developer settings get turned on.
        File developerFile = new File("developer_profile-0.1-fn+fx.xpi");
        FirefoxProfile firefoxProfile = new FirefoxProfile();
        // Install both extensions into the profile (assumed step: the files are otherwise unused).
        firefoxProfile.addExtension(extensionFile);
        firefoxProfile.addExtension(developerFile);
        WebDriver driver = new FirefoxDriver(firefoxProfile);
        if (driver instanceof JavascriptExecutor) {
            AsyncExecute executor = new AsyncExecute(((JavascriptExecutor) driver));
            // Listen in the page for the event from the extension and show it in the title.
            executor.execute("document.addEventListener( " + RECV + ", function(aEvent) { document.title = (" + RECV
                    + " + aEvent) }, true);");
            // Send one event from the page to the extension.
            executor.execute("document.dispatchEvent(new CustomEvent(" + SEND + "));");
        } else {
            System.err.println("Driver does not support javascript execution");
        }
    }

    /**
     * Encapsulate the boilerplate code required to execute javascript with Selenium
     */
    private static class AsyncExecute {
        private final JavascriptExecutor executor;

        public AsyncExecute(JavascriptExecutor executor) {
            this.executor = executor;
        }

        public void execute(String javascript) {
            // executeAsyncScript passes a completion callback as the last argument.
            executor.executeAsyncScript("var callback = arguments[arguments.length - 1];" + javascript
                    + "callback(null);", new Object[0]);
        }
    }
}

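browserOverlay.js, originally cribbed from the XUL School hello world tutorial:

// The final true (wantsUntrusted) lets the extension receive events dispatched by page content.
document.addEventListener("syncCommandToExtension", function(aEvent) { window.alert("document syncCommandToExtension" + aEvent);/* do stuff*/ }, true, true);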

// do not try to add a callback until the browser window has
// been initialised. We add a callback to the tabbed browser
// when the browser's window gets loaded.
window.addEventListener("load", function () {
  // Add a callback to be run every time a document loads.
  // note that this includes frames/iframes within the document
  gBrowser.addEventListener("load", pageLoadSetup, true);
}, false);

function syncLog(message){
  Application.console.log("SYNC-TEST: " + message);
}

function sendToPage(doc) {
  doc.dispatchEvent(new CustomEvent("syncCommandToPage"));
}

function pageLoadSetup(event) {
  // this is the content document of the loaded page.
  let doc = event.originalTarget;

  if (doc instanceof HTMLDocument) {
    // is this an inner frame?
    if (doc.defaultView.frameElement) {
      // Frame within a tab was loaded.
      // Find the root document:
      while (doc.defaultView.frameElement) {
        doc = doc.defaultView.frameElement.ownerDocument;
      }
    }
    // The event listener is added after the page has loaded and we don't want to trigger
    // the event until the listener is registered.
    setTimeout(function () {sendToPage(doc);},1000);
  }
}

HTC Android phones – the perfect spying platform?

Wednesday, October 24th, 2012

I am reading “Systematic detection of capability leaks in stock Android smartphones” for the CL’s Mobile Security reading group.

I read “all the tested HTC phones export the RECORD_AUDIO permission, which allows any untrusted app to specify which file to write recorded audio to without asking for the RECORD_AUDIO permission.” and then went back and looked again at Table 3 and saw the other permissions that stock HTC Android images export to all other applications: these include the location access and camera permissions. The authors report that HTC was rather slow at responding when told that they had a problem. Hence stock HTC Legend, EVO 4G and Wildfire S phones are a really nice target for installing spy software, because it doesn’t have to ask for any permissions at all (it can pretend to be a harmless game) and yet can record where you go[0], what you say and, at least if your phone is not in your pocket, also what you see.

This is probably more likely to be incompetence than a deliberate conspiracy but if they were trying to be evil it would look just like this.

On the plus side Google’s Nexus range with stock images is much safer and Google is rather better at responding promptly to security issues. Since my Android phone is one that Google has given our research group for doing research I am fortunately safe.

I also particularly liked HTC’s addition of the FREEZE capability which locks you out of the phone until you remove the battery, just perfect for when the attacker realises you are on to them to allow them to do the last bit of being malicious without your being able to stop them.

End of being provocative. ;-)

[0] Ok so on Wildfire S location information is implicitly rather than explicitly exported so probably harder to get hold of.

Raspberry Pie

Saturday, August 25th, 2012

In honour of the Raspberry Pi I wanted to make a Raspberry Pie, I tried to do this by looking up a recipe on the rPi plugged into the TV but page loads were too slow (still running debian squeeze rather than raspbian so not taking advantage of the speed increases associated with that).
So I decided to just experiment and throw things together until they looked about right (the temporary absence of scales meant that being accurate was difficult). When you are making something yummy out of components which are all yummy there is only so far you can go wrong.
This produced the following:
A raspberry pie in a pyrex dish lid

There was a little less pastry than would have been optimal. It was made using flour, unsalted butter and a little bit of water (cribbing from Delia’s instructions but without any accuracy). I left it in the fridge for well over the half an hour I had originally intended before rolling it out. This was cooked for ~10 minutes at 180℃ (it might have been better to leave it longer). I used two punnets of raspberries, most of which went in raw on top of the cooked pastry, but ~1/3 of a punnet went in with some sugar (mainly caster sugar but a little bit of soft brown, which deepened the colour), two heaped tablespoons of corn flour and a little bit of water. This was stirred vigorously on a hob such that it did a lot of bubbling until it turned into a rather nice thick goo with all the bits of raspberry broken up (it looked very jam like). That then got poured on top. I left it in the fridge overnight as it was quite late by this point and we ate most of it for lunch.

The only good pie chart, fraction of pie dish which looks like pacman, fraction which is pie.

Raspberry Pi Entropy server

Thursday, August 23rd, 2012

The Raspberry Pi project is one of the more popular projects the Computer Lab is involved with at the moment and all the incoming freshers are getting one.

One of the things I have been working on as a Research Assistant in the Digital Technology Group is improving the infrastructure we use for research, and my current efforts include using puppet to automate the configuration of our servers.

We have a number of servers which are VMs and hence can be a little short of entropy. One solution to having a shortage of entropy is an ‘entropy key’, which is a little USB device which uses reverse biased diodes to generate randomness and has a little ARM chip (ARM is something the CL is rather proud of) which does a pile of crypto and analysis to ensure that it is good randomness. As has been done before (with pretty graphs) this can then be fed to VMs, providing them with the randomness they want.
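As a rough illustration of what “short of entropy” means in practice, the sketch below reads the Linux kernel's estimate of its entropy pool from /proc/sys/kernel/random/entropy_avail. The function names and the 200 bit threshold are illustrative assumptions, not anything taken from the setup described here.

```python
from pathlib import Path

# Linux exposes the kernel's entropy pool estimate (in bits) here.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def entropy_avail(path=ENTROPY_AVAIL):
    """Return the entropy estimate, in bits, read from the given file."""
    return int(Path(path).read_text().strip())


def is_starved(bits, threshold=200):
    """True if the pool estimate is below a (hypothetical) comfort threshold."""
    return bits < threshold
```

Running `is_starved(entropy_avail())` on a VM before and after it starts receiving entropy from elsewhere is one way to see whether the extra source is making a difference.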

My solution to the need for some physical hardware to host the entropy key was a Raspberry Pi because I don’t need very much compute power and dedicated hardware means that it is less likely to get randomly reinstalled. A rPi can be thought of as the hardware equivalent of a small VM.

Unboxed Raspberry Pi with entropy key

I got the rPi from Rob Mullins by taking a short walk down the corridor on the condition that there be photos. One of the interesting things about using rPis for servers is that the cost of the hardware is negligible in comparison with the cost of connecting that hardware to the network and configuring it.

The Raspberry Pi with entropy key temporarily installed in a wiring closet

The rPi is now happily serving entropy to various VMs from the back of a shelf in one of the racks in a server room (not the one shown, we had to move it elsewhere).

Initially it was serving entropy in the clear via the EGD protocol over TCP. Clearly this is rather bad as observable entropy doesn’t really gain you anything (and might lose you everything). Hence it was necessary to use crypto to protect the transport from the rPi to the VMs.
This is managed by the dtg::entropy, dtg::entropy::host and dtg::entropy::client classes which generate the relevant config for egd-linux and stunnel.

This generates an egd-client.conf which looks like this:

; This stunnel config is managed by Puppet.

sslVersion = TLSv1
client = yes

setuid = egd-client
setgid = egd-client
pid = /
chroot = /var/lib/stunnel4/egd-client

socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTclose = 0

debug = 0
output = /egd-client.log

verify = 3

CAfile = /usr/local/share/ssl/cafile

accept = 7777
connect =

And a host config like:

; This stunnel config is managed by Puppet.

sslVersion = TLSv1

setuid = egd-host
setgid = egd-host
pid = /
chroot = /var/lib/stunnel4/egd-host

socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTclose = 0

debug = 0
output = /egd-host.log

cert = /root/puppet/ssl/stunnel.pem
key = /root/puppet/ssl/stunnel.pem
CAfile = /usr/local/share/ssl/cafile

accept = 7776
connect = 777
cert = /root/puppet/ssl/stunnel.pem
key = /root/puppet/ssl/stunnel.pem
CAfile = /usr/local/share/ssl/cafile

Getting that right was somewhat tedious due to defaults not working well together.
openssl s_client -connect
and a python egd client were useful for debugging. In the version of Debian in Raspbian the stunnel binary points to an stunnel3 compatibility script around the actual stunnel4 binary, which resulted in much confusion when trying to run stunnel manually.
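For anyone wanting a similar debugging aid, here is a minimal sketch of an EGD client in Python. It is not the client I used. It covers only two commands, following my reading of the EGD protocol: 0x00 returns the entropy level as a 4 byte big-endian integer, and 0x01 followed by a byte count performs a non-blocking read whose reply is a length byte followed by that many bytes. The function names are my own.

```python
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("EGD server closed the connection")
        buf += chunk
    return buf


def entropy_level(sock):
    """Command 0x00: ask how many bits of entropy the server holds."""
    sock.sendall(b"\x00")
    (bits,) = struct.unpack(">I", recv_exact(sock, 4))
    return bits


def read_nonblocking(sock, nbytes):
    """Command 0x01: request up to nbytes (1..255); may return fewer."""
    if not 0 < nbytes <= 255:
        raise ValueError("nbytes must be in 1..255")
    sock.sendall(bytes([0x01, nbytes]))
    granted = recv_exact(sock, 1)[0]
    return recv_exact(sock, granted)
```

Opening the socket with `socket.create_connection` against the local stunnel `accept` port (7777 in the client config above) exercises the whole tunnel, while connecting directly to the EGD port on the host helps to separate EGD problems from stunnel problems.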

Vertical labels on gnuplot LaTeX graphs

Thursday, April 28th, 2011

This post exists because doing this was far harder than it should have been and hopefully this will help someone else in the future.

When creating a bar chart/histogram in gnuplot using the latex driver if there are a lot of bars and the labels for the bars are over a certain length then the labels overlap horribly. The solution to this would be to rotate them and the following LaTeX and gnuplot code allows that to happen and deals with various fallout that results.

The following defines a length for storing offsets of the labels in \verticallabeloffset as this offset will depend on the length of the text. It also stores a length which holds the maximum of those values, \maxverticallabeloffset. It provides a command \verticallabel which does the following: calculates the length of the label, moves the start position across by 1.4em, moves the label down by 2em (to provide space for the x axis label) and then down by the length of the text. It then uses a sideways environment to make the text vertical. Then it uses the ifthen package to work out if the \verticallabeloffset is bigger than any previously seen and if so it sets \globaldefs=1 so that the effect of the \setlength command will be global rather than being restricted to the local scope and then sets it back to 0 (this is a nasty hack).
It also provides the \xaxislabel command which shifts the x axis title up into the space between the x axis and the labels.

\providecommand{\verticallabel}[1]{\settowidth{\verticallabeloffset}{#1}\hspace{1.4em}\vspace{-2em}\vspace{-\verticallabeloffset}\begin{sideways} #1 \end{sideways}\ifthenelse{\lengthtest{\verticallabeloffset>\maxverticallabeloffset}}{\globaldefs=1\setlength\maxverticallabeloffset{\verticallabeloffset}\globaldefs=0}{}}

Having defined that the following allows gnuplot histograms generated with the LaTeX driver to be included in a LaTeX document. It first resets the maximum offset and then ensures there is sufficient space for the labels

\caption{foo bar chart}

The following gnuplot code generates a histogram which uses this to get the labels to display correctly.

set terminal latex size 16cm, 7.5cm
set style histogram errorbars

set ylabel "\\begin\{sideways\}Mean absolute error\\end\{sideways\}"
set xlabel "\\xaxislabel\{Data set\}"

set output "comparisonSummary_data.tex"
plot "comparisonSummary.dat" index 0 using 2:($3*1.96):xtic("\\verticallabel\{" . stringcolumn(1) . "\}") with histogram title "Mean absolute error for different data sets"

I hope this helps someone. I should get around to actually patching the gnuplot latex driver so that it works properly – but that will have to wait until post exams.

IB Group Projects

Thursday, March 10th, 2011

On Wednesday the Computer Science IB students demonstrated the projects that they have been working on for the last term. These are my thoughts on them.

Some of the projects were really quite interesting, some of them even actually useful in real life, some of them didn’t work, were boring and simply gimmicks.

Alpha: “African SMS Radio” was a project to create a pretty GUI to a “byzantine and buggy” backend. It could allow a radio operator to run polls and examine stats of texts sent to a particular number. However it didn’t look particularly interesting and though there might be use cases for such a system I think only as a component of a larger more enterprise system and only after the “buggy” backend they had to use had been fixed up/rewritten.

Bravo: “Crowd control” was a project to simulate evacuations of buildings. It is a nice use of the Open Room Map project to provide the building data. It looked like it was still a little buggy – in particular it was allowing really quite nasty crushes to occur and the resulting edge effects as people were thrown violently across the room as the system tried to deal with multiple people being in the same place at the same time was a little amusing. With a little more work it could become quite useful as an extension in the Open Room Map ecosystem which could help it gain momentum and take off. I think that the Open Room Map project is really quite cool and useful – it is the way that data on the current structure and contents of buildings can be crowd sourced and kept up to date but then it is a project of my supervisor. ;-)

Charlie: “Digit[Ov]al automated cricket commentary” this was a project to use little location transmitters on necklaces and usb receivers plugged into laptops to determine the location of cricketers while they were playing and then automatically construct commentary on that. It won the prize for best technical project but it didn’t actually work. They hadn’t solved the problem of people being between the transmitter and the receiver reducing transmission strength by 1/3 or the fact that placing a hand over it reduced it by 1/3 or the fact that the transmitters were not omnidirectional and so orientation was a major issue. They were also limited to only four receivers due to only having four suitable laptops. They used a square arrangement to try and detect location. It is possible that a double triangle arrangement with three corners at ground level and then the other triangle higher up (using the ‘stadium’ to gain height) and offset so that the upper vertices lined up with the mid point of the lower edges would have given them a better signal. Calibrating and constructing algorithms to deal with the noise and poor data would probably have been quite difficult and required some significant work – which IB students haven’t really been taught enough for yet.

Delta: “Hand Wave, Hand Wave” was a project to use two sensors with gyroscopes and accelerometers to do gesture recognition and control. It didn’t really work in the demo and since it had reimplemented everything it didn’t manage to do anything particularly interesting. I think using such sensors for gesture control is probably a dead end as kinect and the like makes just using a camera so much easier and more interesting.

Echo: “iZoopraxiscope – Interactive Handheld Projector” this project was about using a phone with a built-in pico projector as an interface. This was obviously using very prototype technology – using the projector would drain the phone’s battery very quickly, in some cases even when the phone was plugged in, and fitting it in the (slightly clunky) phone clearly was at the expense of providing the normal processing power that is expected in an Android phone, resulting in it being somewhat sluggish. Since the sensors were rather noisy and techniques for coping with that were not as advanced as they might have been (they just used an exponential moving average and manually tweaked the parameter) they had some difficulties with sluggishness in the controls of some of the games. However I think they produced several nice arcade style games (I didn’t play any of them) and so did demonstrate a wide range of uses. With better knowledge of how to deal with sensors (not really covered in any of the courses offered at the CL) and better technology this could be really neat. However getting a battery powered projector to compete with normal lighting is going to be quite a challenge.
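For reference, an exponential moving average of the kind mentioned above can be sketched in a few lines of Python. This is a generic illustration, not the students' code, and the function name and the choice to seed the average with the first sample are my own assumptions.

```python
def ema(samples, alpha):
    """Exponentially weighted moving average of a sequence of readings.

    alpha is in (0, 1]: higher values follow the input faster but smooth less,
    lower values smooth more but make the output lag behind the input.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for x in samples:
        # Seed with the first sample, then blend each new reading in.
        current = x if current is None else alpha * x + (1 - alpha) * current
        smoothed.append(current)
    return smoothed
```

The single parameter is the trade-off the team was tuning by hand: a small `alpha` suppresses sensor noise but makes the controls feel sluggish, which matches what was seen in the demo.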
The thing I really like about small projectors is that it could help make it easier to interact in lectures. Sometimes when asking a question or making a comment in lectures it might be useful to draw a diagram which the lecturer (and the rest of the audience) can see and currently doing so is really quite hard. (I should take to carrying around a laser pointer for use in these circumstances).

Foxtrot: “Lounge Star” this was an Android app for making air passengers’ lives a little easier by telling them information such as which gate to use etc. without them having to go anywhere and integrating with various airlines’ systems. As someone who has ‘given up flying’ (not in an absolute sense but in a ‘while any other option (including not going) still remains’ sense) this was not vastly interesting but it could really work as a product if the airlines like it. So: “Oh it is another nice little Android app” (but then associated short attention span kicks in and “bored now”).

Golf: “The Energy Forecast” this was a project I really liked (it pushed the right buttons). It is a project to predict the energy production of all the wind farms in the country based on the predicted wind speed. It integrated various sources of wind speeds, power production profiles for different types of wind farm and the locations and types of many different wind farms (they thought all but I found some they were missing). They had a very pretty GUI using Google Maps etc. to show things geographically and were using a very pretty graph drawing JavaScript library. So I did the “oh you should use the SRCF to host that” thing (they were using a public IP on one of their own computers) and I am sort of thinking “I would really like to have your code” (Oh wait I know where that is kept, snarfle, snarfle ;-). It is something I would really like to make into a part of the ReadYourMeter ecosystem (I may try and persuade Andy he wants to get something done with it).
I love wind turbines: all my (small) investments are in them, we have one in our back garden etc. So this could be really useful. [end fanboyism]

Hotel: “Top Tips” this was a project to see whether the comments traders put on their trading tips actually told you anything about how good the trade was. The answer was no, not really, nothing to see here. Which is a little disappointing and not a particularly interesting project: “let’s do some data analysis!” etc.

India: “True Mobile Coverage” this was a project to crowd source the collection of real mobile signal strength data. It actually serves a useful purpose and could be really helpful. They needed to work on their display a little as it wasn’t very good at distinguishing between areas they didn’t know much about and areas with weak signal, and unfortunately, as with all projects, it started working in a very last minute manner so they didn’t have that much data to show. Nice crowd sourcing data collection Android app of the kind that loads of people in the CL love. Of course there is a great deal they could do to improve it using the kind of research which has been done in the CL but it is a good start.

Juliet: “Twitter Dashboard” this was so obviously going to win from the beginning – a Twitter project (yey bandwagon) which looks pretty. They did do a very good job, it looked pretty, it ate 200% of the SRCF’s CPU continuously during the demo (but was niced to 19 so didn’t affect other services) – there are probably efficiency savings to be made here but that isn’t a priority for a Group Project, which is mainly about producing something that looks pretty and as if it works; all other considerations are secondary. My thoughts were mainly “Oh another project to make it easier for Redgate to do more of their perpetual advertising. meh.” (they have lovely people working for them but I couldn’t write good enough Java for them)

Kilo: “Walk out of the Underground” this was a project to guide you from the moment you stepped out of the underground to your destination using an arrow on the screen of your phone. It was rather hard to demo inside the Intel Lab where there is both poor signal and insufficient scale to see whether it actually works. It might be useful, it might work, it is yet another app for the app store and could probably drum up a few thousand users as a free app.

Lima: “Who is my Customer?” this was a very enterprise project to do some rather basic Information Retrieval to find the same customer in multiple data sets. The use case being $company has a failsome information system and their data is poor quality and not well linked together. Unfortunately the project gave the impression of being something which one person could hack together in a weekend. I may be being overly harsh but I found it a little boring.

So in summary: I liked “The Energy Forecast” most because it pushed the right buttons, “True Mobile Coverage” is interesting and useful. Charlie could be interesting if it could be made to work but I think that the ‘cricket’ aspect is a little silly – if you want commentary use a human. iZoopraxiscope (what a silly name) points out some cool tech that will perhaps be useful in the future but really is not ready yet (they might need/be using some of the cool holograms tech that Tim Wilkinson is working on; he gave a CUCaTS talk “Do We Really Need Pixels?” recently).

Idea for next year: have a competition after the end of the presentations to write up the project in a scientific paper style and then publish the ones that actually reach a sufficiently good standard in an IB Group Project ‘journal’, as this would provide some scientific skills to go with all the Software Engineering skills that the Group Project is currently supposed to teach. (No this is so not going to happen in reality)

tidy_vig: Automatically reformatting generated HTML into something cleaner

Friday, February 4th, 2011

As webmaster and secretary of various things I regularly need to upload minutes to websites and hence want to upload html files. While Open/LibreOffice’s export to html functionality works it doesn’t produce nice html. tidy is a useful tool for finding flaws in html and making it correct and nicer but it is not sufficient to accomplish this task on its own. Hence I have finally scriptified the various automatable parts of turning generated html into something publishable (this loses all style definitions so won’t look the same – use tidy_up if you want to avoid that).


set -e #bail if something goes wrong

tidy_up='tidy -indent -modify -clean -bare -asxml -utf8 -wrap 80 -access 3 --logical-emphasis yes'

$tidy_up "$1" #Normalise to lowercase and remove most rubbish
$tidy_up "$1"
$tidy_up "$1" #Repeat until stabilises - this happens third time
# Get sed to select the range of lines to apply the replacement on first.
# No I don't know what is going on here.
sed -i '/<style[^>]*>/,/<\/style>/ {:ack N; /<\/style>/! b ack; s/<style[^>]*>.*<\/style>//g }' "$1"
sed -i 's/ class="[^"]*"//g' "$1"
sed -i 's/<\/*span>//g' "$1"
$tidy_up "$1" #Reformat now that remaining cruft removed
sed -i 's/ class="[^"]*"//g' "$1" #Remove any classes that got un-line breaked
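For anyone equally puzzled by the style-stripping sed line, here is a small demonstration of what it appears to do. It assumes GNU sed and assumes the opening pattern is meant to match a <style ...> tag. The function name strip_style and the sample input are made up for illustration, and the sed script is the same one spread over several lines so each step can carry a comment.

```shell
#!/bin/sh
# Hypothetical demonstration, assuming GNU sed.
strip_style() {
    sed '/<style[^>]*>/,/<\/style>/ {
        # label to loop back to
        :ack
        # append the next input line to the pattern space
        N
        # no closing tag collected yet: go round again
        /<\/style>/! b ack
        # delete everything from the opening tag to the closing tag
        s/<style[^>]*>.*<\/style>//g
    }' "$@"
}

# Made-up sample standing in for tidy output.
printf '%s\n' '<head>' '<style type="text/css">' ' p { margin: 0 }' '</style>' '</head>' \
    | strip_style
```

On this sample the style element collapses to a single empty line between the head tags, which the later tidy pass presumably cleans up. The N loop is what lets one substitution span several lines, since sed normally only sees one line at a time.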

Unfortunately some manual work may still be needed: for example, if the person who wrote the original file didn’t mark headers up as headers, those sections will need converting by hand.

It is probably possible to do this in a cleaner, more logical way, I have probably missed edge cases, and this probably counts as being a little hacky; however hopefully someone will find it useful.
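One possibly cleaner way to handle the “repeat until it stabilises” step is to loop until the file stops changing, instead of hard-coding three passes. The following is a minimal sketch, not part of the original script: the function name run_until_stable and the cap of 10 passes are assumptions, and it relies on tidy's documented exit statuses (1 for warnings, 2 for errors).

```shell
#!/bin/sh
# Hypothetical helper: run a command on a file repeatedly until the
# file's checksum stops changing.
# Usage: run_until_stable FILE COMMAND [ARGS...]
run_until_stable() {
    file=$1
    shift
    max_passes=10                 # assumed safety cap
    pass=0
    before=$(cksum < "$file")
    while [ "$pass" -lt "$max_passes" ]; do
        status=0
        "$@" "$file" || status=$?
        # Status 1 (warnings, in tidy's convention) is tolerated;
        # 2 and above is treated as a real failure.
        if [ "$status" -ge 2 ]; then
            return "$status"
        fi
        pass=$((pass + 1))
        after=$(cksum < "$file")
        if [ "$after" = "$before" ]; then
            return 0              # nothing changed on this pass: stable
        fi
        before=$after
    done
    return 1                      # still changing after max_passes
}
```

With this defined, the three consecutive tidy calls could be replaced by something like: run_until_stable "$1" $tidy_up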