Archive for the ‘CompSci’ Category

Ross Anderson 1956-2024

Friday, March 29th, 2024

My first interaction with Ross was through his work. In my second year at university a PhD student (David Simner) suggested that reading Ross’ textbook “Security Engineering” was good preparation, and so over the course of a few weeks that autumn I read it cover to cover. Now I am a Senior Lecturer in cybersecurity, and one of the places that began was there. I still recommend that anyone involved in security read the latest edition of that book, because of the way it so clearly and accessibly explains and systematises such a huge breadth of important topics in cybersecurity.

Later Ross was my lecturer, explaining so much, so clearly and engagingly. In those days I still found Ross a little intimidating, but as I got to know him better I discovered how warm, friendly, and caring he was. He became my great grand PhD supervisor (Alastair Beresford <- Frank Stajano <- Ross Anderson), my co-lecturer, my PI, my co-author, my co-conspirator, and my friend.

So many people owe so much to Ross. His broad understanding of cybersecurity that proactively drew in other disciplines and created fields like security economics. His commitment to civil society through the cryptowars, patient privacy, government IT, and civil liberties. His commitment to his family and friends, and his support for disadvantaged people. While much of what he did was very public, some of the most important things were only visible if you got close to him.

He made a huge difference to the careers of many people (including my own), with many of his PhD students and postdocs going on to obtain faculty positions internationally, or other senior roles where they have in turn had a huge impact. He supported a diversity of thought and brought people into the department from a range of disciplines, helping to redefine what computer science is.

He had a huge impact on the University of Cambridge (once named the “most powerful person”) through a range of campaigns and several terms on the University Council. He was ever a critical friend of the Vice Chancellor and played an important role in uncovering various kinds of corruption, mismanagement and discrimination. I learnt a lot from him through that. For a while he chaired the Cycling and Walking Sub Group of the Transport Working Group of which I was secretary. I think our first formal joint work was our proposed policy on cycling and walking, which was completely ignored by the University. He fought with me for the rights of postdocs to continue to be allowed to vote in University democracy (we lost). He was always someone you wanted by your side in a fight.

Ever one to have a memorable turn of phrase, one of the things he achieved in his battles with the central administrators was to plant a little ghost of himself in their heads so that every time they thought of doing something silly (e.g. on IP) the ghost would remind them of what response that would get and so put them off. This saved him a lot of time.

Another memorable description was of the zombie government policies on cryptography, or ID cards, or NHS IT. Ross and others would keep killing these policies off, carrying them away with great fanfare and burying them deep under the ground. Only for the policies to claw their way back out again after the next election.

One of our many great losses is that we will no longer have Ross in our ranks as we fight these good fights. However, many of us carry the memory of Ross, and a model of what he might do. Not that we should necessarily do the same thing, but it is often a helpful starting point.

One of his last battles with the central administrators was over Cambridge’s mandatory retirement age. He didn’t want to retire as he had so much left to do. While he was forced to partially retire in 2023 he had not given up on returning to full time pay (he was still doing full time work). That injustice remains for others to right.

I had hoped that the next government would live to regret appointing Ross to the House of Lords; he would have been good at that and would have caused some good trouble.

He was not perfect, like all of us he made mistakes, and sometimes enemies, but he was our friend and we loved him. We will miss him. He leaves both a void and a great many people who he trained to fill it.

There is much more I could say and much worth saying that I do not know. He did a lot. He was a giant and he helped us stand on his shoulders. He showed us that humans could be heroes.

Transferring files between servers (without root)

Wednesday, January 3rd, 2024

I needed to transfer files between an old server and its replacement. rsync over ssh is the obvious tool to use, but working out how to use it properly took some effort: all file permissions must be preserved, and it is not possible to ssh as root, so sudo is required on both ends. Additionally, rsync cannot be used to transfer files between two remote hosts; one end needs to be local to rsync. There is a further complication caused by the fact that we cannot type in the sudo password required by the sudo on the remote host end on the command line, as rsync is already using that pipe for its own stuff, so we need some X-forwarding to give us an independent channel for that and ssh-askpass to make use of it. The most useful advice came from a 2015 blog post by dg12158 at Bristol, but I had to add a few things for my use case.

## Preparation
# On source host
sudo apt install rsync # If not already installed

# On destination host
sudo apt install ssh-askpass rsync # If not already installed

## Transfer
# On local host
ssh -AX admin-user@source.host.example.com # To get to source host with ssh-agent and X forwarding
sudo --preserve-env=SSH_AUTH_SOCK rsync --delete --relative --acls --xattrs --atimes --crtimes --hard-links -ave 'ssh -X' --rsync-path='SUDO_ASKPASS=/usr/bin/ssh-askpass sudo -A rsync' /home/./user1/Maildir /home/./user2/Maildir /home/./user3/Maildir /home/./user4/ admin-user@destination.host.example.com:/home/
# --preserve-env so that the ssh-agent forwarding works inside sudo (using ssh agent forwarding is a security risk if source.host is compromised during the transfer)
# --delete because I expect to run this multiple times while making sure destination.host is ready before flipping over to the new host, and so also need to carry over file deletions
# --relative because I am copying directories from several different user accounts. The "/./" in the path truncates the relative path at that dot so that it all ends up in the right place in /home/ later.
# --acls --xattrs --atimes --crtimes --hard-links to make rsync be even more archivey than -a makes it
# -v for verbosity during testing
# -e to pass -X to the inner ssh used by rsync to continue the X-forwarding on to destination.host
# --rsync-path sets the SUDO_ASKPASS so that all that X-forwarding can be put to use, specifies sudo be used, and with -A so that ssh-askpass is used to ask for the password.
# Then the source folders to send over (using /./ as mentioned earlier, to avoid an extra cd command)
# Finally the destination host details and directory
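The `/./` marker used with --relative can be checked locally before risking it on real data. Here is a minimal local-only sketch (the paths under /tmp are examples, not the real servers):

```shell
# Local-only demonstration of how "/./" interacts with --relative:
# everything to the left of "/./" is stripped from the path that rsync
# recreates at the destination. (Example paths only.)
mkdir -p /tmp/relsrc/home/user1/Maildir /tmp/reldst
touch /tmp/relsrc/home/user1/Maildir/example-mail
rsync -a --relative /tmp/relsrc/home/./user1/Maildir /tmp/reldst/
# The destination now contains user1/Maildir, not tmp/relsrc/home/user1/Maildir.
ls /tmp/reldst/user1/Maildir
```

Without the `/./` the whole source path would be recreated under the destination, which is why the command above puts the dot just after /home.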

What I decided against was echoing passwords around or putting them in environment variables (risk of them being logged or ending up in bash histories), and using sudo -v in advance (because that requires editing the sudo config into a less secure state that does not use tty_tickets).

Hopefully that will come in useful to someone else, if not, then probably future me.

MyCloud part 0: Why? It is my data.

Wednesday, September 25th, 2013

I am leaving Google's and similar companies' cloud services and moving to my own infrastructure for my personal data. This process is going to take a while and I am going to document it here to make it easier for others. However the obvious question is why move from free cloud services which already exist and are easy to use to paying for my own infrastructure and configuring it myself? Partly I do not want to be the product being sold any more; I want to be the customer, not merely a user who is being sold to advertisers. Since there is no way to pay Google to stop selling me I have to go elsewhere. I could go to someone like MyKolab which claims to care about privacy and do things properly – and people who cannot roll their own probably should think about it – but I get better guarantees from rolling my own and it should be a good learning experience.

Also Snowden. My aim is to make it such that if anyone (including state actors) want my data, then the easiest way of gaining access to it is to come and ask me nicely, we can discuss it like civilised people over tea and cake and if you make a sensible argument then you can have it. If not come back with a warrant. I am not a criminal or a terrorist and I do not expect to be treated like one with all my communications being intercepted. My data includes other people’s personally identifying information (PII) and so can only be disclosed to people who they would expect it to be given to for the purpose for which it was provided. That does not include GCHQ etc. and so I am not following the spirit of the Data Protection Act (DPA) if I make it possible for other people to obtain it without asking.

Similarly some of my friends work for Christian, environmental, aid or democracy organisations, sometimes in countries where doing so is dangerous. Information which might compromise their security is carefully never committed to computer systems (such operational security has been common in Christian circles for 2000 years) but sometimes people make mistakes, particularly when communicating internally in ‘safe’ countries like the UK. However no countries have clean records on human rights etc. and data collected by the ‘five eyes’ is shared with others (e.g. unfiltered access is given to Israel) and there are countries who are our allies in the ‘war on terror’ but which also persecute (or have elements of their security forces who persecute) minorities or groups within their country. I might in some sense be willing to trust the NSA and GCHQ etc. (because they have no reason to be interested in me) but I cannot because that means trusting 800,000 people in the US alone, some of whom will be working for bad governments.

Similarly, while our present government is mostly trying to be good, if frequently foolish, it is very easy for that to change. So we need to ensure that the work required to go from where we are to a police state is huge, so that we have enough time to realise and do something about it. Presently the distance to cover in terms of infrastructure is far too small, being almost negligible. It is our duty as citizens to grow that gap and to keep it wide.

So I am going to try and find solutions which follow best practises of current computer security, following the principle of least privilege and using compartmentalisation to limit the damage that the compromise of any one component can cause. I am going to document this so that you can point out the holes in it so that we can learn together how to do this properly.

Maybe some of this might even help towards my PhD…

Filters that work

Thursday, August 8th, 2013

Summary: The architecture for David Cameron’s filtering plans is wrong and has negative consequences; however, there are alternative architectures which might work.

There has been much news coverage about David Cameron’s plans for opt-out filters for all internet users in the UK. With opt-in systems barely anyone will opt-in and with opt-out systems barely anyone will opt-out and so this is a proposal for almost everyone to have a filter on their internet traffic. Enabling households to easily filter out bad content from their internet traffic is useful in that there are many people who do want to do this (such as myself[1]). However the proposed architecture has a number of significant flaws and (hopefully unintended) harmful side effects.

Here I will briefly recap what those flaws and side-effects are and propose an architecture which I claim lacks these flaws and side-effects while providing the desired benefits.

  1. All traffic goes through central servers which have to process it intensively. This makes bad things like analysing this traffic much easier. It also means that traffic cannot be so efficiently routed. It means that there can be no transparency about what is actually going on as no one outside the ISP can see.
  2. There is no transparency or accountability. The lists of things being blocked are not available and even if they were it is hard to verify that those are the ones actually being used. If an address gets added which should not be (say that of a political party or an organisation which someone does not like) then there is no way of knowing that it has been or of removing it from the list. Making such lists available even for illegal content (such as the IWF’s lists) does not make that content any more available but it does make it easier to detect and block it (for example TOR exit nodes could block it). In particular it means having found some bad content it is easier to work out if that content needs to be added to the list or if it is already on it.
  3. Central records must be kept on who is and who is not using such filters, really such information is none of anyone else’s business. They should not know or be able to tell, and they do not need to.

I am not going to discuss whether porn is bad for you though I have heard convincing arguments that it is. Nor will I expect any system to prevent people who really want to access such content from doing so. I also will not use a magic ‘detect if adult’ device to prevent teenagers from changing the settings to turn filters off.

Most home internet systems consist of a number of devices connected to some sort of ISP provided hub which then connects to the ISP’s systems and then to the internet. This hub is my focus as it is provided by the ISP and so can be provisioned with the software they desire and configured by them but is also under the control of the household and provides an opportunity for some transparency. The same architecture can be used with the device itself performing the filtering, for example when using mobile phones on 3G or inside web browsers when using TLS.

So how would such a system work? Well, these hubs are basically just a very small Linux machine, like a Raspberry Pi, and the hub is already handling the networking for the devices in the house, probably running NAT[0] and doing DHCP; it should probably also be running a DNS server and using DNSSEC. It already has a little web server to display its management pages and so could trivially display web pages saying “this content blocked for you because of $reason, if this is wrong do $thing”. Then when it makes DNS requests for domains to the ISP’s servers, they can reply with additional information about whether this domain is known to have bad content and where to find further details, which the hub can then look up and use as input when applying local policy.
The household can then configure the hub to apply the policy they want; it can be shipped with a sensible default, and no one knows what policy they chose unless they snoop their traffic (which should require a warrant).
A couple of extra tweaks might be wanted here. There is some content which people really do not want to see but find very difficult not to seek out; for example, I have friends who have struggled for a long time to recover from a pornography addiction. Hence it can be useful to provide functionality whereby filter settings can be made read-only, so that a user can choose to make them ‘impossible’ to turn off: in a stronger moment they can make a decision that prevents them from doing something they do not want to do in a weaker moment. Obviously any censorship system can be circumvented by a sufficiently determined person, but self-blocking things is an effective strategy to help people break addictions, whether to facebook in the run up to exams or to more addictive websites.
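To make the hub-side flow concrete, here is a minimal Python sketch of the policy step described above. The category label and the lookup function are hypothetical stand-ins for whatever metadata the ISP's DNS servers would actually return; only the shape of the decision is the point.

```python
# Hypothetical sketch of hub-side policy. The household configures which
# categories to block; the DNS reply is assumed (invented API) to carry a
# category label alongside the address.

BLOCKED_CATEGORIES = {"adult", "malware"}  # household-configured policy

def apply_policy(domain, dns_lookup):
    """dns_lookup(domain) -> (address, category-or-None).

    Returns (address, None) if allowed, or (None, explanation) if blocked,
    mirroring the hub's "this content blocked for you because of $reason" page.
    """
    address, category = dns_lookup(domain)
    if category in BLOCKED_CATEGORIES:
        return None, ("this content blocked for you because of %s, "
                      "if this is wrong do $thing" % category)
    return address, None
```

The important property is that both the category data and the household's chosen policy live on the hub, where they can be inspected, rather than invisibly inside the ISP.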

So would such a system actually work? I think that it is technically feasible and would achieve the purposes it is intended to, without the problems of the currently proposed architecture. It might not work with currently deployed hardware, which may not have quite enough processing power (though it is not far off). However an open, well specified system would allow incremental roll out and independent implementation and verification. Additionally it does not provide the service for which David Cameron’s system is actually being built, which is to make it easier to snoop on all internet users’ web traffic. This is just the Digital Economy bill all over again but with ‘think of the children’ rather than ‘think of the terrorists’ as its sales pitch. There is little point blocking access to illegal content as that can always be circumvented; much better to take the content down[2] and lock up the people who produced it. Failing that, detect it as the traffic leaves the ISP’s network towards bad places and send round a police van to lock up the people accessing it. Then everything has to go through the proper legal process in plain sight.

[0]: in the case of Virgin Media’s ‘Super Hub’ doing so incredibly badly such that everything needs tunnelling out to a sane network.
[1]: Though currently I do not beyond using Google’s strict safe search because there is no easy mechanism for doing so, the only source of objectionable content that actually ends up on web pages I see is adverts, on which more later.
[2]: If this is difficult then make it easier, it is far too hard to take down criminal website such as phishing scams at the moment and improvements in international cooperation on this would be of great benefit.

Surveillance consequences

Wednesday, August 7th, 2013

Mass surveillance of the citizens of a country allows intelligence services to use ‘big data’ techniques to find suspicious things which they would not otherwise have found. They can analyse the graph structure of communications to look for suspicious patterns or suspicious keywords. However as a long term strategy it is fundamentally flawed. The problem is the effect of surveillance on those being watched. Being watched means not being trusted, being outside and other, separate from those who know best and under suspicion. It makes you foreign, alien and apart, it causes fear and apprehension, it reduces integration. It makes communities which feel that they are being picked on, distressed and splits them apart from those around them. This causes a feeling of oppression and unfairness, of injustice. This results in anger, which grows in the darkness and leads to death.

That is not the way to deal with ‘terrorism’. Come, let us build our lives together as one community, not set apart and divided. Let us come together and talk of how we can build a better world for us and for our children. Inside we are all the same, it does not matter where we came from, only where we are going to and how we get there.
Come, let us put on love rather than fear, let us welcome rather than reject, let us build a country where freedom reigns and peace flows like a river through happy tree lined streets where children play.

I may be an idealist but that does not make this impossible, only really hard, and massively worth it. The place to begin is as always in my own heart for I am not yet ready to live in the country I want us to be. There is a long way to go, and so my friends: let us begin.

Communicating with a Firefox extension from Selenium

Monday, May 20th, 2013

Edit: I think this no longer works with more recent versions of Firefox, or at least I have given up on this strategy and gone for extending WebDriver to do what I want instead.

For something I am currently working on I wanted to use Selenium to automatically access some parts of Firefox which are not accessible from a page. The chosen method was to use a Firefox extension and send events between the page and the extension to carry data. Getting this working was more tedious than I was expecting, perhaps mainly because I have tried to avoid javascript whenever possible in the past.

The following code extracts set up listeners with Selenium and the Firefox extension and send one event in each direction. Using this to do proper communication and to run automated tests is left as an exercise for the author but hopefully someone else will find this useful as a starting point. The full code base this forms part of will be open sourced and made public at some future point when it does something more useful.

App.java


package uk.ac.cam.cl.dtg.sync;

import java.io.File;
import java.io.IOException;

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

public class App {
    private static final String SEND = "\"syncCommandToExtension\"";
    private static final String RECV = "\"syncCommandToPage\"";

    public static void main(String[] args) throws IOException {
        // This is where maven is configured to put the compiled .xpi
        File extensionFile = new File("target/extension.xpi");
        // So that the relevant Firefox extension developer settings get turned on.
        File developerFile = new File("developer_profile-0.1-fn+fx.xpi");
        FirefoxProfile firefoxProfile = new FirefoxProfile();
        firefoxProfile.addExtension(extensionFile);
        firefoxProfile.addExtension(developerFile);
        WebDriver driver = new FirefoxDriver(firefoxProfile);
        driver.get("about:blank");
        if (driver instanceof JavascriptExecutor) {
            AsyncExecute executor = new AsyncExecute((JavascriptExecutor) driver);
            executor.execute("document.addEventListener(" + RECV
                    + ", function(aEvent) { document.title = (" + RECV + " + aEvent); }, true);");
            executor.execute("document.dispatchEvent(new CustomEvent(" + SEND + "));");
        } else {
            System.err.println("Driver does not support javascript execution");
        }
    }

    /**
     * Encapsulate the boilerplate code required to execute javascript with Selenium.
     */
    private static class AsyncExecute {
        private final JavascriptExecutor executor;

        public AsyncExecute(JavascriptExecutor executor) {
            this.executor = executor;
        }

        public void execute(String javascript) {
            executor.executeAsyncScript("var callback = arguments[arguments.length - 1];"
                    + javascript + "callback(null);", new Object[0]);
        }
    }
}

browserOverlay.js Originally cribbed from the XUL School hello world tutorial.


document.addEventListener(
    "syncCommandToExtension",
    function(aEvent) { window.alert("document syncCommandToExtension" + aEvent); /* do stuff */ },
    true, true);

// Do not try to add a callback until the browser window has
// been initialised. We add a callback to the tabbed browser
// when the browser's window gets loaded.
window.addEventListener("load", function () {
    // Add a callback to be run every time a document loads.
    // Note that this includes frames/iframes within the document.
    gBrowser.addEventListener("load", pageLoadSetup, true);
}, false);

function syncLog(message) {
    Application.console.log("SYNC-TEST: " + message);
}

function sendToPage(doc) {
    doc.dispatchEvent(new CustomEvent("syncCommandToPage"));
}

function pageLoadSetup(event) {
    // This is the content document of the loaded page.
    let doc = event.originalTarget;

    if (doc instanceof HTMLDocument) {
        // Is this an inner frame?
        if (doc.defaultView.frameElement) {
            // Frame within a tab was loaded.
            // Find the root document:
            while (doc.defaultView.frameElement) {
                doc = doc.defaultView.frameElement.ownerDocument;
            }
        }
        // The event listener is added after the page has loaded and we don't want
        // to trigger the event until the listener is registered.
        setTimeout(function () { sendToPage(doc); }, 1000);
    }
}

HTC android phones – the perfect spying platform?

Wednesday, October 24th, 2012

I am reading “Systematic detection of capability leaks in stock Android smartphones” for the CL’s Mobile Security reading group.

I read “all the tested HTC phones export the RECORD AUDIO permission, which allows any untrusted app to specify which file to write recorded audio to without asking for the RECORD AUDIO permission”, and then went back and looked again at Table 3 and saw the other permissions that stock HTC android images export to all other applications: these include access location and camera permissions. The authors report that HTC was rather slow at responding when told that they had a problem. Hence stock HTC Legend, EVO 4G and Wildfire S phones are a really nice target for installing spy software, because it doesn’t have to ask for any permissions at all (it can pretend to be a harmless game) and yet can record where you go[0], what you say, and, at least if your phone is not in your pocket, what you see.

This is probably more likely to be incompetence than a deliberate conspiracy but if they were trying to be evil it would look just like this.

On the plus side Google’s Nexus range with stock images are much safer and Google is rather better at responding promptly to security issues. Since my android phone is one that Google has given our research group for doing research I am fortunately safe.

I also particularly liked HTC’s addition of the FREEZE capability which locks you out of the phone until you remove the battery, just perfect for when the attacker realises you are on to them to allow them to do the last bit of being malicious without your being able to stop them.

End of being provocative. ;-)

[0] Ok so on Wildfire S location information is implicitly rather than explicitly exported so probably harder to get hold of.

Raspberry Pie

Saturday, August 25th, 2012

In honour of the Raspberry Pi I wanted to make a Raspberry Pie, I tried to do this by looking up a recipe on the rPi plugged into the TV but page loads were too slow (still running debian squeeze rather than raspbian so not taking advantage of the speed increases associated with that).
So I decided to just experiment and throw things together until they looked about right (the temporary absence of scales meant that being accurate was difficult). When you are making something yummy out of components which are all yummy there is only so far you can go wrong.
This produced the following:
A raspberry pie in a pyrex dish

There was a little less pastry than would have been optimal, made using flour, unsalted butter and a little bit of water (cribbing from Delia’s instructions but without any accuracy). I left it in the fridge for well over the half an hour I had originally intended before rolling it out. It was cooked for ~10 minutes at 180℃ (it might have been better to leave it longer). I used two punnets of raspberries, most of which went in raw on top of the cooked pastry, but ~1/3 of a punnet went in with some sugar (mainly caster sugar but a little bit of soft brown, which deepened the colour), two heaped tablespoons of corn flour and a little bit of water. This was stirred vigorously on a hob, bubbling away, until it turned into a rather nice thick goo with all the bits of raspberry broken up (it looked very jam like). That then got poured on top. I left it in the fridge overnight as it was quite late by this point and we ate most of it for lunch.

The only good pie chart, fraction of pie dish which looks like pacman, fraction which is pie.

Raspberry Pi Entropy server

Thursday, August 23rd, 2012

The Raspberry Pi project is one of the more popular projects the Computer Lab is involved with at the moment and all the incoming freshers are getting one.

One of the things I have been working on as a Research Assistant in the Digital Technology Group is on improving the infrastructure we use for research and my current efforts include using puppet to automate the configuration of our servers.

We have a number of servers which are VMs and hence can be a little short of entropy. One solution to having a shortage of entropy is an ‘entropy key’, which is a little USB device which uses reverse biased diodes to generate randomness and has a little ARM chip (ARM is something the CL is rather proud of) which does a pile of crypto and analysis to ensure that it is good randomness. As has been done before (with pretty graphs) this can then be fed to VMs providing them with the randomness they want.

My solution to the need for some physical hardware to host the entropy key was a Raspberry Pi because I don’t need very much compute power and dedicated hardware means that it is less likely to get randomly reinstalled. A rPi can be thought of as the hardware equivalent of a small VM.

Unboxed Raspberry Pi with entropy key

I got the rPi from Rob Mullins by taking a short walk down the corridor on the condition that there be photos. One of the interesting things about using rPis for servers is that the cost of the hardware is negligible in comparison with the cost of connecting that hardware to the network and configuring it.

The Raspberry Pi with entropy key temporarily installed in a wiring closet

The rPi is now happily serving entropy to various VMs from the back of a shelf in one of the racks in a server room (not the one shown, we had to move it elsewhere).

Initially it was serving entropy in the clear via the EGD protocol over TCP. Clearly this is rather bad as observable entropy doesn’t really gain you anything (and might lose you everything). Hence it was necessary to use crypto to protect the transport from the rPi to the VMs.
This is managed by the dtg::entropy, dtg::entropy::host and dtg::entropy::client classes which generate the relevant config for egd-linux and stunnel.

This generates an egd-client.conf which looks like this:

; This stunnel config is managed by Puppet.

sslVersion = TLSv1
client = yes

setuid = egd-client
setgid = egd-client
pid = /egd-client.pid
chroot = /var/lib/stunnel4/egd-client

socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTclose = 0

debug = 0
output = /egd-client.log

verify = 3

CAfile = /usr/local/share/ssl/cafile

[egd-client]
accept = 7777
connect = entropy.dtg.cl.cam.ac.uk:7776

And a host config like:

; This stunnel config is managed by Puppet.

sslVersion = TLSv1

setuid = egd-host
setgid = egd-host
pid = /egd-host.pid
chroot = /var/lib/stunnel4/egd-host

socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTclose = 0

debug = 0
output = /egd-host.log

cert = /root/puppet/ssl/stunnel.pem
key = /root/puppet/ssl/stunnel.pem
CAfile = /usr/local/share/ssl/cafile

[egd-host]
accept = 7776
connect = 777
cert = /root/puppet/ssl/stunnel.pem
key = /root/puppet/ssl/stunnel.pem
CAfile = /usr/local/share/ssl/cafile

Getting that right was somewhat tedious due to defaults not working well together.
openssl s_client -connect entropy.dtg.cl.cam.ac.uk:7776
and a python EGD client were useful for debugging. In the version of Debian in Raspbian, the stunnel binary points to an stunnel3 compatibility script around the actual stunnel4 binary, which resulted in much confusion when trying to run stunnel manually.
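The Python EGD client I used for debugging is not reproduced here, but a rough equivalent is easy to sketch. This is my sketch, assuming the standard EGD framing (command byte 0x02 requests a blocking read, followed by a one-byte count); the host and port in the usage comment are examples matching the stunnel config above.

```python
import socket

def egd_read(host, port, nbytes):
    """Request nbytes of entropy from an EGD server.

    Uses the EGD 'blocking read' command: one byte 0x02, then one byte
    giving the number of bytes wanted (so nbytes must be <= 255); the
    server replies with exactly that many bytes of entropy.
    """
    assert 0 < nbytes <= 255
    with socket.create_connection((host, port)) as sock:
        sock.sendall(bytes([0x02, nbytes]))
        data = b""
        while len(data) < nbytes:
            chunk = sock.recv(nbytes - len(data))
            if not chunk:
                raise IOError("EGD server closed the connection early")
            data += chunk
    return data

# e.g. egd_read("localhost", 7777, 32) against the stunnel-forwarded port
```

Pointing this at the local stunnel accept port is a quick way to check the whole chain end to end without involving egd-linux.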

Vertical labels on gnuplot LaTeX graphs

Thursday, April 28th, 2011

This post exists because doing this was far harder than it should have been and hopefully this will help someone else in the future.

When creating a bar chart/histogram in gnuplot using the latex driver if there are a lot of bars and the labels for the bars are over a certain length then the labels overlap horribly. The solution to this would be to rotate them and the following LaTeX and gnuplot code allows that to happen and deals with various fallout that results.

The following defines a length for storing offsets of the labels in \verticallabeloffset, as this offset will depend on the length of the text. It also defines a length which holds the maximum of those values, \maxverticallabeloffset. It provides a command \verticallabel which does the following: calculates the length of the label, moves the start position across by 1.4em, moves the label down by 2em (to provide space for the x axis label) and then down by the length of the text. It then uses a sideways environment to make the text vertical. Finally, it uses the ifthen package to work out if \verticallabeloffset is bigger than any previously seen, and if so it sets \globaldefs=1 so that the effect of the \setlength command will be global rather than being restricted to the local scope, and then sets it back to 0 (this is a nasty hack).
It also provides the \xaxislabel command which shifts the x axis title up into the space between the x axis and the labels.

\newlength{\verticallabeloffset}
\newlength{\maxverticallabeloffset}
\setlength{\maxverticallabeloffset}{0pt}
\providecommand{\verticallabel}[1]{\settowidth{\verticallabeloffset}{#1}\hspace{1.4em}\vspace{-2em}\vspace{-\verticallabeloffset}\begin{sideways} #1 \end{sideways}\ifthenelse{\lengthtest{\verticallabeloffset>\maxverticallabeloffset}}{\globaldefs=1\setlength\maxverticallabeloffset{\verticallabeloffset}\globaldefs=0}{}}
\providecommand{\xaxislabel}[1]{\vspace{2em}#1}

Having defined that, the following allows gnuplot histograms generated with the LaTeX driver to be included in a LaTeX document. It first resets the maximum offset and then ensures there is sufficient space for the labels:

\begin{figure}
\centering
\setlength{\maxverticallabeloffset}{0pt}
\include{figs/graphs/comparisonSummary_data}
\vspace{-3em}\vspace{\maxverticallabeloffset}
\caption{foo bar chart}
\label{figs:graphs:comparisonSummary_data}
\end{figure}

The following gnuplot code generates a histogram which uses this to get the labels to display correctly.

set terminal latex size 16cm, 7.5cm
set style histogram errorbars

set ylabel "\\begin\{sideways\}Mean absolute error\\end\{sideways\}"
set xlabel "\\xaxislabel\{Data set\}"

set output "comparisonSummary_data.tex"
plot "comparisonSummary.dat" index 0 using 2:($3*1.96):xtic("\\verticallabel\{" . stringcolumn(1) . "\}") with histogram title "Mean absolute error for different data sets"

I hope this helps someone. I should get around to actually patching the gnuplot latex driver so that it works properly – but that will have to wait until post exams.