Posts Tagged ‘CompSci’

You are not just yourself

Thursday, May 11th, 2017

Sometimes people feel powerless, as if their individual actions do not matter. That is not true: individual action matters tremendously and is enormously powerful. I am going to explain one of the reasons why.

When you make a decision it is not just you making that decision; it is also people like you making the same decision for similar reasons. No one exists in isolation or acts alone. Every individual is part of many overlapping, interconnected, and interdependent groups, most of which they are not even aware of. When you make a decision you make it based on how you think and what you know (consciously or not). Other people like you will be in similar situations and make the same decision, so you make it together.

This means that every action you take does matter, because it is not just you; it is people like you doing the same thing. Your individual action might be tiny, but your collective action might be huge. If the only thing stopping you is that you do not think it will make a difference because it is just you, then do it. If you do it then other people will too; if you do not then they will not. You have the responsibility to make the decision and to do the thing but, in doing it, you will not be alone.

There are lots of reasons to vote and this is only one of them, but you should.

There is a dark side to the fact that you are not just yourself but a community: if others control the inputs to your community and target them carefully for every group, then you are not yourself, you are theirs.

Think carefully, think twice, install an ad-blocker and make your decision.

 

Now that sounds horribly patronising, which it is, and so this academic is going to get off his ivory tower with his simplistic notions and go and do some work.

Communicating with a Firefox extension from Selenium

Monday, May 20th, 2013

Edit: I think this no longer works with more recent versions of Firefox, or at least I have given up on this strategy and gone for extending WebDriver to do what I want instead.

For something I am currently working on I wanted to use Selenium to automatically access some parts of Firefox which are not accessible from a page. The chosen method was to use a Firefox extension and send events between the page and the extension to carry data. Getting this working was more tedious than I was expecting, perhaps mainly because I have tried to avoid JavaScript whenever possible in the past.

The following code extracts set up listeners with Selenium and the Firefox extension and send one event in each direction. Using this to do proper communication and to run automated tests is left as an exercise for the author but hopefully someone else will find this useful as a starting point. The full code base this forms part of will be open sourced and made public at some future point when it does something more useful.

App.java


package uk.ac.cam.cl.dtg.sync;

import java.io.File;
import java.io.IOException;

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.firefox.FirefoxProfile;

public class App {
  private static final String SEND = "\"syncCommandToExtension\"";
  private static final String RECV = "\"syncCommandToPage\"";

  public static void main(String[] args) throws IOException {
    // This is where maven is configured to put the compiled .xpi
    File extensionFile = new File("target/extension.xpi");
    // So that the relevant Firefox extension developer settings get turned on.
    File developerFile = new File("developer_profile-0.1-fn+fx.xpi");
    FirefoxProfile firefoxProfile = new FirefoxProfile();
    firefoxProfile.addExtension(extensionFile);
    firefoxProfile.addExtension(developerFile);
    WebDriver driver = new FirefoxDriver(firefoxProfile);
    driver.get("about:blank");
    if (driver instanceof JavascriptExecutor) {
      AsyncExecute executor = new AsyncExecute(((JavascriptExecutor) driver));
      executor.execute("document.addEventListener( " + RECV + ", function(aEvent) { document.title = (" + RECV
          + " + aEvent) }, true);");
      executor.execute(
          "document.dispatchEvent(new CustomEvent(" + SEND + "));");
    } else {
      System.err.println("Driver does not support javascript execution");
    }
  }

  /**
   * Encapsulate the boilerplate code required to execute javascript with Selenium
   */
  private static class AsyncExecute {
    private final JavascriptExecutor executor;

    public AsyncExecute(JavascriptExecutor executor) {
      this.executor = executor;
    }

    public void execute(String javascript) {
      executor.executeAsyncScript("var callback = arguments[arguments.length - 1];" + javascript
          + "callback(null);", new Object[0]);
    }
  }
}

browserOverlay.js Originally cribbed from the XUL School hello world tutorial.


// The fourth argument (wantsUntrusted, Mozilla-specific) lets this listener
// receive events dispatched by untrusted page content.
document.addEventListener(
  "syncCommandToExtension",
  function (aEvent) { window.alert("document syncCommandToExtension" + aEvent); /* do stuff */ },
  true, true);

// do not try to add a callback until the browser window has
// been initialised. We add a callback to the tabbed browser
// when the browser's window gets loaded.
window.addEventListener("load", function () {
  // Add a callback to be run every time a document loads.
  // note that this includes frames/iframes within the document
  gBrowser.addEventListener("load", pageLoadSetup, true);
}, false);

function syncLog(message) {
  Application.console.log("SYNC-TEST: " + message);
}

function sendToPage(doc) {
  doc.dispatchEvent(new CustomEvent("syncCommandToPage"));
}

function pageLoadSetup(event) {
  // this is the content document of the loaded page.
  let doc = event.originalTarget;

  if (doc instanceof HTMLDocument) {
    // is this an inner frame?
    if (doc.defaultView.frameElement) {
      // Frame within a tab was loaded.
      // Find the root document:
      while (doc.defaultView.frameElement) {
        doc = doc.defaultView.frameElement.ownerDocument;
      }
    }
    // The event listener is added after the page has loaded and we don't want to trigger
    // the event until the listener is registered.
    setTimeout(function () { sendToPage(doc); }, 1000);
  }
}

Raspberry Pi Entropy server

Thursday, August 23rd, 2012

The Raspberry Pi project is one of the more popular projects the Computer Lab is involved with at the moment and all the incoming freshers are getting one.

One of the things I have been working on as a Research Assistant in the Digital Technology Group is improving the infrastructure we use for research, and my current efforts include using Puppet to automate the configuration of our servers.

We have a number of servers which are VMs and hence can be a little short of entropy. One solution to a shortage of entropy is an ‘entropy key’, a little USB device which uses reverse-biased diodes to generate randomness and has a little ARM chip (ARM is something the CL is rather proud of) which does a pile of crypto and analysis to ensure that it is good randomness. As has been done before (with pretty graphs) this can then be fed to VMs, providing them with the randomness they want.

My solution to the need for some physical hardware to host the entropy key was a Raspberry Pi because I don’t need very much compute power and dedicated hardware means that it is less likely to get randomly reinstalled. A rPi can be thought of as the hardware equivalent of a small VM.

Unboxed Raspberry Pi with entropy key

I got the rPi from Rob Mullins by taking a short walk down the corridor on the condition that there be photos. One of the interesting things about using rPis for servers is that the cost of the hardware is negligible in comparison with the cost of connecting that hardware to the network and configuring it.

The Raspberry Pi with entropy key temporarily installed in a wiring closet

The rPi is now happily serving entropy to various VMs from the back of a shelf in one of the racks in a server room (not the one shown, we had to move it elsewhere).

Initially it was serving entropy in the clear via the EGD protocol over TCP. Clearly this is rather bad as observable entropy doesn’t really gain you anything (and might lose you everything). Hence it was necessary to use crypto to protect the transport from the rPi to the VMs.
This is managed by the dtg::entropy, dtg::entropy::host and dtg::entropy::client classes which generate the relevant config for egd-linux and stunnel.

This generates an egd-client.conf which looks like this:

; This stunnel config is managed by Puppet.

sslVersion = TLSv1
client = yes

setuid = egd-client
setgid = egd-client
pid = /egd-client.pid
chroot = /var/lib/stunnel4/egd-client

socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTclose = 0

debug = 0
output = /egd-client.log

verify = 3

CAfile = /usr/local/share/ssl/cafile

[egd-client]
accept = 7777
connect = entropy.dtg.cl.cam.ac.uk:7776

And a host config like:

; This stunnel config is managed by Puppet.

sslVersion = TLSv1

setuid = egd-host
setgid = egd-host
pid = /egd-host.pid
chroot = /var/lib/stunnel4/egd-host

socket = l:TCP_NODELAY=1
socket = r:TCP_NODELAY=1
TIMEOUTclose = 0

debug = 0
output = /egd-host.log

cert = /root/puppet/ssl/stunnel.pem
key = /root/puppet/ssl/stunnel.pem
CAfile = /usr/local/share/ssl/cafile

[egd-host]
accept = 7776
connect = 777
cert = /root/puppet/ssl/stunnel.pem
key = /root/puppet/ssl/stunnel.pem
CAfile = /usr/local/share/ssl/cafile

Getting that right was somewhat tedious due to defaults not working well together.
openssl s_client -connect entropy.dtg.cl.cam.ac.uk:7776
and a Python EGD client were useful for debugging. In the version of Debian in Raspbian, the stunnel binary points to an stunnel3 compatibility script around the actual stunnel4 binary, which resulted in much confusion when trying to run stunnel manually.
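For anyone wanting a similar debugging client, here is a minimal Java sketch of the two EGD requests I would reach for first. It assumes the command bytes as I understand them from the original egd.pl documentation (0x00 returns the available entropy as a 4-byte big-endian integer, 0x02 followed by a length byte is a blocking read); the class and method names are made up for illustration and this is not the Python client mentioned above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class EgdClientSketch {
    // Command bytes as assumed from the egd.pl protocol description.
    static final int CMD_LEVEL = 0x00;
    static final int CMD_READ_BLOCKING = 0x02;

    /** Ask the daemon how many bits of entropy it currently holds. */
    public static int entropyLevel(InputStream in, OutputStream out) throws IOException {
        out.write(CMD_LEVEL);
        out.flush();
        // The reply is read as a 4-byte big-endian integer.
        return new DataInputStream(in).readInt();
    }

    /** Blocking read of n (1..255) bytes of entropy. */
    public static byte[] readEntropy(InputStream in, OutputStream out, int n) throws IOException {
        if (n < 1 || n > 255) {
            throw new IllegalArgumentException("n must be between 1 and 255");
        }
        out.write(new byte[] {(byte) CMD_READ_BLOCKING, (byte) n});
        out.flush();
        byte[] data = new byte[n];
        new DataInputStream(in).readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: EgdClientSketch <host> <port>");
            return;
        }
        try (Socket socket = new Socket(args[0], Integer.parseInt(args[1]))) {
            InputStream in = socket.getInputStream();
            OutputStream out = socket.getOutputStream();
            System.out.println("entropy bits available: " + entropyLevel(in, out));
            System.out.println("read " + readEntropy(in, out, 16).length + " bytes");
        }
    }
}
```

On a client VM this would be pointed at the local plaintext end of the tunnel (localhost and port 7777 in the client config above), so that stunnel handles the TLS.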

Models of human sampling and interpolating regular data

Saturday, October 23rd, 2010

On Thursday I submitted my project proposal for my Part II project. An HTML version of it (generated using hevea and tidy from LaTeX with all styling stripped out) follows. (With regard to the work schedule – I appear to be one week behind already. Oops.)

Part II Computer Science Project Proposal

Models of human sampling and interpolating regular data

D. Thomas, Peterhouse

Originator: Dr A. Rice

Special Resources Required

The use of my own laptop (for development)
The use of the PWF (backup, backup development)
The use of the SRCF (backup, backup development)
The use of zeus (backup)

Project Supervisor: Dr A. Rice

Director of Studies: Dr A. Norman

Project Overseers: Alan Blackwell + Cecilia Mascolo
(AFB/CM)

Introduction

When humans record information they do not usually do so in the same
regular manner that a machine does as the rate at which they sample depends
on factors such as how interested in the data they are and whether they have
developed a habit of collecting the data on a particular schedule. They are
also likely to have other commitments which prevent them recording at precise
half hour intervals for years. In order to be able to test methods of
interpolating from human recorded data to a more regular data stream such as
that which would be created by a machine we need models of how humans collect
data. ReadYourMeter.org contains data collected by humans which can be used
to evaluate these models. Using these models we can then create test data
sets from high resolution machine recorded data sets[1] and then try to interpolate back to the
original data set and evaluate how good different machine learning techniques
are at doing this. This could then be extended with pluggable models for
different data sets which could then use the human recorded data set to do
parameter estimation. Interpolating to a higher resolution regular data set
allows for comparison between different data sets for example those collected
by different people or relating to different readings such as gas and
electricity.

Work that has to be done

The project breaks down into the following main sections:-

  1. Investigating the distribution of recordings in
    the ReadYourMeter.org data set.
  2. Constructing hypotheses of how the human recording
    of data can be modelled and evaluating these models against the
    ReadYourMeter.org data set.
  3. Using these models to construct test data sets by
    sampling the regular machine recorded data sets[2] to produce pseudo-human read test data sets.
    These can be learnt from, as the results can be compared with the
    reality of the machine read data sets.
  4. Using machine learning interpolation techniques to
    try and interpolate back to the original data sets from the test data sets
    and evaluating success of different methods in achieving this.

    • Polynomial fit
    • Locally weighted linear regression
    • Gaussian process regression (see Chapter 2 of
      Gaussian Processes for Machine Learning by Rasmussen &
      Williams)
    • Neural Networks (possibly using java-fann)
    • Hidden Markov Models (possibly using jahmm)
  5. If time allows then using parameter estimation on
    a known model of a system to interpolate from a test data set back to the
    original data set and evaluating how well this compares with the machine
    learning techniques which have no prior knowledge of the system.
  6. Writing the Dissertation.

Difficulties to Overcome

The following main learning tasks will have to be undertaken before the
project can be started:

  • To find a suitable method for comparing different
    sampling patterns to enable hypotheses of human behaviour to be
    evaluated.
  • Research into existing models for related human
    behaviour.

Starting Point

I have a good working knowledge of Java and of queries in SQL.
I have read “Machine Learning” by Tom Mitchell.
Andrew Rice has written some Java code which does some basic linear
interpolation. It was written for use in producing a particular paper but
should form a good starting point, at least providing ideas on how to go
forwards. It can also be used for requirement sampling.
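To make the linear interpolation baseline concrete, here is a minimal sketch of resampling irregular readings onto a regular grid. It is my own illustration, not Andrew Rice's code; the class name, the method signature and the choice to hold the nearest reading outside the observed range are all assumptions.

```java
import java.util.Arrays;

public class LinearInterpolationSketch {
    /**
     * Resample irregular (time, value) readings onto a regular grid by joining
     * neighbouring readings with straight lines. The times must be strictly
     * increasing. Grid points outside the observed range take the value of the
     * nearest reading (an assumption of this sketch).
     */
    public static double[] resample(long[] times, double[] values, long start, long step, int count) {
        if (times.length == 0 || times.length != values.length) {
            throw new IllegalArgumentException("need matching, non-empty times and values");
        }
        double[] out = new double[count];
        int j = 0; // index of the last reading at or before the current grid point
        for (int i = 0; i < count; i++) {
            long t = start + i * step;
            while (j + 1 < times.length && times[j + 1] <= t) {
                j++;
            }
            if (t <= times[0]) {
                out[i] = values[0];
            } else if (j + 1 >= times.length) {
                out[i] = values[values.length - 1];
            } else {
                double fraction = (double) (t - times[j]) / (times[j + 1] - times[j]);
                out[i] = values[j] + fraction * (values[j + 1] - values[j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three hand-made meter readings at minutes 0, 60 and 180,
        // resampled every 30 minutes.
        long[] times = {0, 60, 180};
        double[] values = {100, 106, 130};
        System.out.println(Arrays.toString(resample(times, values, 0, 30, 7)));
    }
}
```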

ReadYourMeter.org database

I have worked with the ReadYourMeter.org database before (summer 2009) and
with large data sets of sensor readings (spring 2008).
For the purpose of this project the relevant data can be viewed as a table
with three columns: “meter_id, timestamp, value”.
There are 99 meters with over 30 readings, 39 with over 50, 12 with over 100
and 5 with over 200. This data is to be used for the purpose of constructing
and evaluating models of how humans record data.

Evaluation data sets

There are several data sets to be used for the purpose of training and
evaluating the machine learning interpolation techniques. These are to be
sampled using the model constructed in the first part of the project for how
humans record data. This then allows the data interpolated from this sampled
data to be compared with the actual data which was sampled from.
The data sets are:

  • half hourly electricity readings for the WGB from
    2001-2010 (131416 records in “timestamp, usage rate”
    format)
  • monthly gas readings for the WGB from 2002-2010 (71
    records in “date, total usage” format)
  • half hourly weather data from the DTG weather
    station from 1995-2010 (263026 records)
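The evaluation loop described above (sample the machine data, interpolate, compare with what was sampled from) needs a way to take a subset of a regular series and a way to score a reconstruction. A minimal sketch follows; the use of root mean squared error as the score is my assumption, since the proposal does not fix a metric, and the names are illustrative.

```java
public class EvaluationSketch {
    /** Keep only the readings at the given indices of a regular series. */
    public static double[] sampleAt(double[] series, int[] indices) {
        double[] sampled = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            sampled[i] = series[indices[i]];
        }
        return sampled;
    }

    /** Root mean squared error between the original series and a reconstruction. */
    public static double rmse(double[] original, double[] reconstructed) {
        if (original.length == 0 || original.length != reconstructed.length) {
            throw new IllegalArgumentException("series must be non-empty and of equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < original.length; i++) {
            double error = original[i] - reconstructed[i];
            sumOfSquares += error * error;
        }
        return Math.sqrt(sumOfSquares / original.length);
    }

    public static void main(String[] args) {
        double[] machine = {10, 11, 13, 16, 20};
        // Pretend a human only recorded the first and last readings.
        double[] human = sampleAt(machine, new int[] {0, 4});
        // A straight line between those two readings, written out by hand.
        double[] reconstructed = {10, 12.5, 15, 17.5, 20};
        System.out.println("kept " + human.length + " readings, RMSE = " + rmse(machine, reconstructed));
    }
}
```

A lower score for one interpolation method than another on the same sampled data would then count as evidence that it recovers the underlying series better.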

Resources

This project should mainly be developed on my laptop which has sufficient
resources to deal with the anticipated workload.
The project will be kept in version control using Git. The SRCF, PWF and zeus
will be set to clone this and fetch regularly. Simple backups will be taken
at weekly intervals to SRCF/PWF and to an external disk.

Success criteria

  1. Models of human behaviour in recording data must
    be constructed which emulate real behaviour in the ReadYourMeter.org
    dataset.
  2. The machine learning methods must produce better
    approximations of the underlying data than linear interpolation and these
    different methods should be compared to determine their relative merits on
    different data sets.
  3. The machine once trained should be able to apply this
    to unseen data of a similar class and produce better results than linear
    interpolation.
  4. A library should be produced which is well
    designed and documented to allow users – particularly researchers – to be
    able to easily combine various functions on the input data.
  5. The dissertation should be written.

Work Plan

Planned starting date is 2010-10-15.

Dates in general indicate start dates or deadlines and this is clearly
indicated. Work items should usually be finished before the next one starts
except where indicated (extensions run concurrently with dissertation
writing).

Monday, October 18
Start: Investigating the distribution of
recordings in the ReadYourMeter.org data set
Monday, October 25
Start: Constructing hypotheses of how the human
recording of data can be modelled and evaluating these models against the
ReadYourMeter.org data set.
This involves examining the distributions and modes of recording found in
the previous section and constructing parametrised models which can
encapsulate this. For example a hypothesis might be that some humans record
data in three phases, first frequently (e.g. several times a day) and then
trailing off irregularly until some more regular but less frequent mode is
entered where data is recorded once a week/month. This would then be
parametrised by the length and frequency in each stage and within that
stage details such as the time of day would probably need to be
characterised by probability distributions which can be calculated from the
ReadYourMeter.org dataset.
Monday, November 8
Start: Using these models to construct test data
sets by sampling regular machine recorded data sets.
Monday, November 15
Start: Using machine learning interpolation techniques to try and
interpolate back to the original data sets from the test data sets and
evaluating success of different methods in achieving this.

Monday, November 15
Start: Polynomial fit
Monday, November 22
Start: Locally weighted linear
regression
Monday, November 29
Start: Gaussian process regression
Monday, December 13
Start: Neural Networks
Monday, December 27
Start: Hidden Markov Models
Monday, January 3, 2011
Start: Introduction chapter
Monday, January 10, 2011
Start: Preparation chapter
Monday, January 17, 2011
Start: Progress report
Monday, January 24, 2011
Start: If time allows then using parameter
estimation on a known model of a system to interpolate from a test data set
back to the original data set. This continues on until 17th
March and can be expanded or shrunk depending on available time.
Friday, January 28, 2011
Deadline: Draft progress
report
Wednesday, February 2,
2011
Deadline: Final progress report
printed and handed in. By this point the core of the project should be
completed with only extension components and polishing remaining.
Friday, February 4, 2011,
12:00
Deadline: Progress Report
Deadline
Monday, February 7, 2011
Start: Implementation Chapter
Monday, February 21, 2011
Start: Evaluation Chapter
Monday, March 7, 2011
Start: Conclusions chapter
Thursday, March 17, 2011
Deadline: First Draft of
Dissertation (by this point revision for the exams will be in full swing
limiting time available for the project and time is required between drafts
to allow people to read and comment on it)
Friday, April 1, 2011
Deadline: Second draft
dissertation
Friday, April 22, 2011
Deadline: Third draft
dissertation
Friday, May 6, 2011
Deadline: Final version of
dissertation produced
Monday, May 16, 2011
Deadline: Print, bind and
submit dissertation
Friday, May 20, 2011,
11:00
Deadline: Dissertation
submission deadline
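The three-phase recording hypothesis given as an example in the work plan could be sketched as follows. This is only an illustration of the idea: the exponential distribution of gaps between readings, the phase lengths and the mean gaps are all assumptions of mine rather than anything estimated from the ReadYourMeter.org data, and the time-of-day effects mentioned in the work plan are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ThreePhaseSamplerSketch {
    /**
     * Generate reading times (in hours from the first possible reading).
     * Each phase is {durationHours, meanGapHours}; gaps between readings are
     * drawn from an exponential distribution with that mean (an assumption).
     */
    public static List<Double> sampleTimes(double[][] phases, Random rng) {
        List<Double> times = new ArrayList<>();
        double phaseStart = 0.0;
        for (double[] phase : phases) {
            double phaseEnd = phaseStart + phase[0];
            double t = phaseStart;
            while (true) {
                // Inverse transform sampling of an exponential gap.
                t += -phase[1] * Math.log(1.0 - rng.nextDouble());
                if (t >= phaseEnd) {
                    break;
                }
                times.add(t);
            }
            phaseStart = phaseEnd;
        }
        return times;
    }

    public static void main(String[] args) {
        double[][] phases = {
            {48, 6},     // first two days: a reading every six hours or so
            {336, 48},   // next two weeks: trailing off
            {4320, 168}, // following six months: roughly weekly
        };
        List<Double> times = sampleTimes(phases, new Random(42));
        System.out.println(times.size() + " simulated readings");
    }
}
```

The generated times could then be used to pick which machine readings a simulated human would have recorded.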

[1] Such as the WGB’s energy usage; see §Starting Point for more details.
[2] These are detailed in §Starting Point.

Phone scammers

Thursday, June 24th, 2010

Today I received a call at about 10:05 to my home landline. I rapidly realised it was some kind of computer based scam and decided to have some fun seeing what they would try and do.
I had great fun doing this but I think that someone who does not understand computers could easily have been taken in.

As in many such scams they claimed to be a company working for Microsoft and offering this free service of finding out what is wrong with my computer as they detected that it was downloading lots of junk files from the internet which were slowing it down. Now our old Windows XP desktop is indeed old and slow and this is quite possibly due to junk. However it was obvious that they were making all this up. So they wanted me to turn my computer on – now obviously I wasn’t going to risk following any instructions on the real computer so I booted my XP VM on my laptop instead (which I will subsequently need to wipe).

Having booted the XP VM, and possibly having been passed on to a different call centre person, I was given a series of instructions the purpose of which was to prove that the computer had a problem. This involved going to the event viewer in computer administration (Start -> right click on “My Computer” -> Manage -> Event viewer) and then to both Application and System. With a little sorting for effect we get a screen something like the following:

The Event viewer screen of Computer Administration showing a screen full of errors on Application

I suppose many people might find that quite scary but I have previously looked at such screens and it was what I expected to see.

Having ‘proved’ that there was something wrong with my computer they then proceeded to try and get me to provide greater access to them. This was done by getting me to visit www.logmein123.com and use the code 807932 (which they really didn’t want me to reveal to anyone).

They then got remote access to my computer and went and installed a fake scanner from http://majorgeeks.com/Advanced_WindowsCare_v2_Personal_d4991.html

Downloading the fake scanner

This proceeded to produce some fake results:

Results of the fake scan

They then wanted to see if my “software warranty” had expired as this would be why my computer was “downloading junk files which can’t be removed by anti-virus”.

Software warranty has 'expired'

This was done by opening cmd and doing

cd \
tree

and while tree was running typing “expired.” so that it would appear at the bottom.

At this point they went in for the kill and opened up a form and claimed that “it is a timed http form so we can’t look at it” and that it would “automatically go in 8 minutes so you need to fill it in quickly”.

Enter your card details here...

Obviously I wasn’t going to fill this form in so at that point I revealed that I knew that they were scammers. They denied this and got progressively more angry and incoherent and when I asked to be put through to their supervisor they hung up.

Follow up

Now obviously it is my duty to try and prevent this kind of thing from happening again.
So my first step was to try to find out the number which was used to call me using 1471, but unfortunately this did not work. I then tried the local police, but they could not be of any help and advised me to contact BT. Unfortunately BT could not help either, as it was an international call with no number given.
I then reported the relevant URLs to Google and the exe to Stop Badware.

I contacted the company behind logmein123.com which seems to be a legitimate company telling them that their services are being abused and requesting comment from them about this. I received a very positive response: “Thanks for the heads up on this. We take this stuff very seriously and will investigate immediately. Any misuse of the product or trials for the purpose you describe is a violation of our terms and immediate grounds for termination of the service. Thank you for sending the PIN as it helps us not only track this down to end their service, it also gives us information we need should we decide to press legal action. …”

Now looking to see whether anyone else has discovered this gogreenpc.net scam I found that they have. So gogreenpc.net is a big scam site. Now I need to work out how to take them out. :-D

The people on #cl on irc.srcf.ucam.org were helpful in providing advice on follow up.

Online Banking: liabilities

Wednesday, December 30th, 2009

I was surprised to find that the Co-operative Bank’s policy is not as evil as “Security Engineering” would suggest banks’ policies are. Specifically:

“We will repay you any money that is taken from your account due to: any error by our staff or our systems, a computer crime which is not found and stopped by our security system.”

Whereas “Security Engineering” suggests that in general UK banks say ‘you are an evil criminal’ if a computer crime against your account succeeds.
Halifax says:

“If a customer of our online service is a victim of online fraud, we guarantee that they won’t lose any money from their account and will always be reimbursed in full.”

but I suppose the “our system is secure and so online fraud is not possible so you are a criminal” trick might work there…
Possibly this means that banks’ policies are improving as they realise that tackling fraud is their responsibility. (Perhaps they read the book, which is very good.)

From the point of view of login security the Co-operative would give me a chip and PIN card reader to verify online transactions, which gives better security than Halifax’s username + password + some random fact that would be very easy to find out using something like Facebook (though there are flaws in such a chip and PIN system, detailed in the book).

Only 5 chapters left to read… :-)

1.0 release of fractals

Sunday, December 14th, 2008

I realise that this is definitely a namespace clash but hey.
After a few days work and significant help from #cl on the srcf irc my fractals program is ready for 1.0 release – at least in my eyes there are no known bugs – save that there may be more efficient algorithms to use especially from the point of view of quality of display – see here for an example of tc’s better version of carpet.

The ML code is released under GPL version 2 and can be found here

To compile it you will need a version of Moscow ML with its libraries – e.g. NOT the one shipped by Ubuntu which does not have the libraries packaged – hopefully this will be fixed in Jaunty – I used the version on the linux pwf machines at Cambridge University. All other instructions on compilation are included as a comment in the file.

Examples of output can be found here

This is an extension of ML Tick6* Foundations Of Computer Science at Cambridge University

And now for the pretty pictures:

part of the mandelbrot set

koch curve

Koch Snowflake

Sierpiński Triangle

Sierpiński Carpet

Brownian Curve

Random Walk