Blog Data Policy

So an AI walks into a pub…

There is a joke, which doubles as a useful analogy, that explains in very simple terms how modern AI systems work at a fundamental level, despite all the complexities and technicalities.

An AI walks into a pub and goes up to the bar. The bartender greets the newcomer and asks what they would like to drink.

“What’s everyone else drinking…”

Good, is it not?

An AI, or specifically an LLM, is a reflection of its training data, and it looks for the most statistically likely or, in simple terms, “most common” response to any question you give it: what most people are drinking, in the bar analogy.

“What’s everyone else drinking…”

The reason I bring this up is a trip I made to Dublin last week, and a visit to one or two bars in that fine city.

What is everyone else drinking? Well, in the Temple Bar area of Dublin it is going to be a pint of Guinness, and perhaps that is going to be the case in most of the city.

But how representative is this? The bartender in the joke/analogy is of course the training data used to train our model, so while Guinness has the statistical significance in a Temple Bar pub, is that the case for Dublin, or indeed the rest of Ireland?

If we expanded our sample of bartenders to include all of Ireland, Guinness may have less significance. On the other hand, if we focused on some of Dublin’s more upmarket bars, we might find a lot of espresso martinis consumed.

Bars in Dallas, Sydney, or Bangkok are all likely to produce different responses from our imaginary bartender.
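For the more code-minded, here is a toy sketch of the bartender in Python. The drink orders are entirely made up for illustration (they are not real sales figures), and the bar names and the `whats_everyone_drinking` function are my own hypothetical inventions. A real LLM is vastly more sophisticated than a tally of orders, but the sensitivity to the sample is the same idea.

```python
from collections import Counter

# Hypothetical drink orders, invented purely for illustration.
ORDERS = {
    "temple_bar": ["guinness"] * 8 + ["lager"] * 2,
    "upmarket_dublin": ["espresso martini"] * 6 + ["guinness"] * 3 + ["wine"],
    "rest_of_ireland": ["lager"] * 5 + ["guinness"] * 4 + ["cider"] * 3,
}


def whats_everyone_drinking(*bars):
    """Return the most common order across the chosen bars (our 'training data')."""
    counts = Counter()
    for bar in bars:
        counts.update(ORDERS[bar])
    drink, _ = counts.most_common(1)[0]
    return drink


# The answer depends entirely on which bars we sampled.
print(whats_everyone_drinking("temple_bar"))
print(whats_everyone_drinking("upmarket_dublin"))
print(whats_everyone_drinking("temple_bar", "upmarket_dublin", "rest_of_ireland"))
```

Ask only the Temple Bar bartender and you get Guinness; ask only the upmarket bar and you get an espresso martini. Same question, different sample, different answer.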

The moral of this is clearly that models are very sensitive to their training data and to how representative that data is of the subject of interest. In almost all cases it may not be as representative as we might like, and an important question for the industry is what to do in those circumstances.

How we alter the response (weight) of a system based on a foundation model to take into account the limitations of its data is the real “Question for our Times”. It’s also important to remember that sometimes the data is actually an accurate reflection of reality, even if we might not like it.

In AI, data is the code

In AI, data is the code, so we need to really understand all aspects of it: not just how representative it is, but its provenance, who created it and for what original purpose.

More thinking along these lines to follow…

Data Policy

Data-driven development in Kenya

If you are interested in finding out how data can make a real change to people’s lives in Kenya, come along and find out about Gather, a startup I have been helping over the last few months…

Gather is holding its first public launch on Tuesday 20th June at 7.00pm at the Urban Innovation Centre in London.

The evening will be a great opportunity to learn more about Gather and meet our wider team as we launch our demo platform.

Doors will open from 6.30pm and refreshments will be provided.

To RSVP, please email We look forward to welcoming you on 20th June.

About Gather:
Gather uses data to transform city sanitation. Gather’s platform will visualise the areas of greatest need, provide insight and track progress towards providing sanitation for everyone in cities, starting in Nairobi, Kenya. For more information, please visit

Data Policy

NRE App – just wrong!

Today National Rail Enquiries have released a free iPhone app for real-time train information. Hang on, you may say, I thought that app already existed… well, it does!

For the last few years National Rail Enquiries (NRE) have been licensing their information, at some considerable cost, to independent software developers for them to develop their own apps. Indeed, one of my all-time favourite apps is UK Train Times, developed by Dave Addey and his team at Agant.

Today’s release is clearly a case of channel conflict by a quasi-government organisation and, I would suggest, anti-competitive.

NRE should not be developing an app and competing with its “partners”, who have developed a range of apps over the last few years. NRE should just release the data under an Open Government Licence and let the ecosystem develop!

So much for the release of government data empowering the software industry. My old friends at Ordnance Survey always recognised this was an issue and kept out of their partners’ space, not developing a mobile OS maps application despite what I might have argued at the time 🙂

Written and submitted from home (51.425N, 0.331W)