Categories
Blog Data Policy

So an AI walks into a Pub..

There is a joke, and useful analogy, that explains in very simple terms how modern AI systems work at a fundamental level, despite all their complexities and technicalities.

An AI walks into a pub and goes up to the bar. The bartender greets the newcomer and wants to know what they would like to drink..

“What’s everyone else drinking…”

Good, is it not…

An AI, or specifically an LLM, is a reflection of its training data and is looking for the most statistically relevant, or in simple terms “most common”, response to any question you give it: what most people are drinking, in the bar analogy.

“What’s everyone else drinking…”

The reason I bring this up is a trip I made to Dublin last week, and a visit to one or two bars in that fine city.

What is everyone else drinking… well, in the Temple Bar area of Dublin it is going to be a pint of Guinness, and perhaps in most of the city that is going to be the case.

But how representative is this? The bartender in the joke / analogy is of course the training data used to train our model, so while Guinness has statistical significance in a Temple Bar pub, is that the case for Dublin, or indeed the rest of Ireland?

If we expanded our sample of bartenders to include all of Ireland, Guinness may have less significance. On the other hand, if we focused on some of Dublin’s more upmarket bars, we might find a lot of espresso martinis consumed..

Bars in Dallas, Sydney, or Bangkok are all likely to produce different responses from our imaginary bartender..

The moral of this is clearly that models are very sensitive to their training data and to how representative that data is of the subject of interest. In almost all cases it may not be as representative as we might like, and an important question for the industry is what to do in those circumstances.
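To make the analogy concrete, here is a minimal sketch in Python. It is not how an LLM actually works internally; it only illustrates the "most common answer in the sample" idea. The bar names and drink counts are invented for illustration and are not real data.

```python
from collections import Counter

# Invented drink orders per bar, purely illustrative (not survey data).
SAMPLES = {
    "temple_bar": ["guinness"] * 8 + ["lager"] * 2,
    "upmarket_dublin": ["espresso martini"] * 5 + ["guinness"] * 3 + ["wine"] * 2,
    "dallas": ["margarita"] * 6 + ["lager"] * 4,
}


def most_common_drink(orders):
    """Return the most frequent order: what 'everyone else' is drinking."""
    return Counter(orders).most_common(1)[0][0]


def answer_for(bars):
    """Pool the orders from the chosen bars and return the most common one."""
    pooled = [drink for bar in bars for drink in SAMPLES[bar]]
    return most_common_drink(pooled)


# The "bartender" gives a different answer depending on which bars were sampled.
print(answer_for(["temple_bar"]))
print(answer_for(["upmarket_dublin"]))
print(answer_for(["temple_bar", "upmarket_dublin", "dallas"]))
```

The question never changes; only the sample does, and the answer follows the sample.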

How we alter the response (weight) of a system based on a foundation model to take into account limitations of data is the real “Question for our Times”, and indeed it’s also important to remember that sometimes the data is actually an accurate reflection of reality even if we might not like it..

In AI data is the code

In AI data is the code, so we need to really understand all aspects of it: not just how representative it is but its provenance, who created it and for what original purpose.

More thinking along these lines to follow…

Categories
Data Policy

Data driven development in Kenya

If you are interested in finding out how data can make a real change to people’s lives in Kenya, come along and find out about Gather, a startup I have been helping over the last few months…

Gather is holding its first public launch on Tuesday 20th June at 7.00pm at the Urban Innovation Centre in London.

The evening will be a great opportunity to learn more about Gather and meet our wider team as we launch our demo platform.

Doors will open from 6.30pm and refreshments will be provided.

To RSVP, please email john@gatherhub.org. We look forward to welcoming you on 20th June.

About Gather:
Gather uses data to transform city sanitation. Gather’s platform will visualise the areas of greatest need, provide insight and track progress towards providing sanitation for everyone in cities, starting in Nairobi, Kenya. For more information, please visit www.gatherhub.org/about.

Categories
Data Policy

NRE App – just wrong !

Today National Rail Enquiries have released a free iPhone app for real time train information. Hang on, you may say, I thought that app already existed.. well it does !

For the last few years National Rail Enquiries (NRE) have been licensing, at some considerable cost, its information to independent software developers for them to develop their own apps. Indeed, one of my all time favourite apps is UK Train Times, developed by Dave Addey and his team at Agant.

Today’s release is clearly a case of channel conflict by a quasi-Government organisation, and I would suggest anti-competitive.

NRE should not be developing an app and competing with its “partners”, who have developed a range of apps over the last few years. NRE should just release the data under an Open Gov Licence and let the ecosystem develop !

So much for the release of government data empowering the software industry. My old friends at Ordnance Survey always recognised this was an issue and kept out of their partners’ space, not developing a mobile OS maps application despite what I might have argued at the time 🙂

Written and submitted from home (51.425N, 0.331W)

Categories
Data Policy INSPIRE

EU Hackathon, Google can help you get there..

Google is supporting the EU Hackathon in November and is offering travel expenses to selected individuals !

The deadline to apply is October 17, and the Hackathon takes place November 8th and 9th, at the European Parliament in Brussels.
Applicants must be citizens or residents of the EU.
All expenses will be covered for selected hackers, and winners along each of the two tracks will receive €3,000.

Application, and more info, here.
Written and submitted from the Google Offices, Dubai (25.095N, 55.162E)

Categories
Data Policy Google Maps Thoughts

Evening all, what’s going on with these crime maps then…

So initially the moral of this story seems to be: if you are launching a Government website across the mass media, make sure you do the load testing with 100x what you expect.

The real issue is that despite having best intentions and a commitment to transparency, it’s very easy to confuse, mislead and lose credibility with poor crime mapping.

One of the key positives of the UK police website is the availability of the data behind the site, which can be downloaded or accessed via a REST based API. A second, which few commentators have mentioned, is a link to the local police teams who are ultimately responsible for reducing crime at the local level. Of course one year’s aggregated data is of little value here, allowing only relative comparisons between locations to be made; the real value will come in future years, when trends are identifiable and hopefully may be linked to local policing initiatives.

Many have commented however on issues with the mapping where the site designers have tried to offer more detail than the previous ward level statistics by moving to reporting the actual location of crimes, as commonly found in American crime maps.

While this is something I personally think should be made available, the map is not actually showing the real locations.

Many crimes are not accurately located in the first place, and because of privacy concerns expressed by the Information Commissioner’s Office some locations have been modified, moved or aggregated, so that the points displayed on the map do not represent the actual location of the crimes but are indicative of the location. I think it’s clear that perhaps an American style crime map was intended, but what we have ended up with is an uncomfortable and misleading compromise.

The fact that the points don’t actually represent the locations of crimes is at one level understandable, but to most people a point on the map represents the location of something, so much of the uproar in the press calling into question the accuracy of the maps can be understood.

However, because the underlying data is available, budding data visualisation experts and cartographers can get to work and attempt to produce maps and other visualisations that perhaps better represent the data. Already Jonathan Raper’s team at placr have come up with this different visualisation, using a multiresolution grid rather than the less obvious neighbour/street locations.

I hope the Home Office is not put off by the criticism of this first attempt. If Government is really to be more open and make use of the web in tackling complex issues such as crime and the local perception of crime, they must follow the web philosophy of constant iteration and development.
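As a rough illustration of the grid idea, here is a minimal Python sketch that counts points per grid cell at two cell sizes. It is not placr's method, which I have not seen in detail; the coordinates are invented, and using cells measured in degrees is a simplifying assumption (a real map would want a projected coordinate system).

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg):
    """Return the (row, col) index of the grid cell that contains a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def count_by_cell(points, cell_deg):
    """Count how many points fall into each grid cell of the given size."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


# Invented (lat, lon) pairs, for illustration only; not real crime data.
points = [
    (51.4951, -0.1462),
    (51.4957, -0.1468),
    (51.5012, -0.1419),
    (51.4953, -0.1301),
]

coarse = count_by_cell(points, 0.01)   # larger cells, fewer of them
fine = count_by_cell(points, 0.005)    # smaller cells, more detail

for cell, n in sorted(coarse.items()):
    print(cell, n)
```

A shaded cell with a count makes no claim about an exact address, which is arguably more honest than a point that has been moved.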

So they must dust themselves down, listen to the criticism, and make the next version better; and the following version even better… but quickly !

Written and submitted from the Google Offices, London (51.495N, 0.146W)

Categories
Data Policy Thoughts

The Open Government Licence

Last week, in addition to the new, more open OS licensing, another and in many ways more fundamental new licence was introduced with little fanfare, but I would argue its impact, if widely adopted, could be far more important.

The new UK Open Government Licence (OGL), developed by the National Archives, is a robust licence written in Creative Commons like language for the specific purpose of distributing Government data. The OGL will become the default licence for UK Crown copyright, replacing the current Click-Use system and the data.gov.uk terms and conditions, and will therefore create a simple and consistent framework for the reuse of Public Sector Information.

The key elements of the licence are that a user may:

  • copy, publish, distribute and transmit the Information;
  • adapt the Information;
  • exploit the Information commercially for example, by combining it with other Information, or by including it in your own product or application.

There is an attribution clause which requires reference back to the OGL website where it’s possible.

This is a great step forward, we just now need to continue to push public sector bodies to release their information, as one more of the perceived barriers has been removed.

Written and submitted from the T3 BA Lounge (51.469N, 0.460W)

Categories
Data Policy Ordnance Survey

So is the OS derived data issue now solved ?

Well from reading a couple of press releases the signs look hopeful…

Both the OS and the Dept. of Communities and Local Government have announced the signing of the new Public Sector Mapping Agreement (PSMA), a sole source long term contract for providing mapping data to all of the public sector. I’m sure this has not gone down very well with other data providers, but that’s another topic I’m sure we will hear all about in due course !

My question today is: will this new agreement, and the supposedly more liberal licensing framework, allow public sector organisations to publish their data online without restrictions imposed by the OS?

Specifically will the OS now allow local councils to publish their data using Google Maps or potentially add data to OpenStreetMap ?

Well the language is very positive..

“We’re opening the door to a world of government information that will allow the good ideas of ordinary people to become innovative digital solutions that improve public services.” says Local Government Minister Grant Shapps.

Chris Holcroft of the AGI talks about “Breaking down barriers and better enabling data sharing, the PSMA should help the public sector make better and more transparent decisions and allocate its resources more efficiently, saving time and money.”

The key passage from the OS press release is this..

“The new agreement also introduces a new licensing framework that will enable more collaborative working with delivery partners and will allow public sector organisations to re-use the data for core non-commercial public sector activities. It will also enable sharing of the data, and derived data, with other third parties for specific purposes to support delivery of the member’s public sector activity, for example, contractors, schools, ‘third sector’ charities, the public, for all your core, non-commercial, public sector activity.”

So maybe now the debate will move on from what is derived data to what is a “core activity”?

Still, this all seems rather positive, does it not? The proof of course will come in April next year when the PSMA comes into effect, and yes, I’m sorry, I know this is all rather confusing for my Australian readers as your PSMA is a whole different thing!

UPDATE : Paul from the OS Press Office has kindly responded on the OS blog, which I have now commented on. I have reproduced my comments below, but I suggest you follow the debate at the OS blog.

“Paul,

Thank you so much for responding publicly on this issue. So much of the discussion has relied on rumour and misinformed speculation that it is really very useful to have an official OS line on the matter.
I believe contrary to what you say derived data rights do remain core to the issue, however firstly I would like to clarify a few points you make.
Google does not claim any IP rights in data published either using the free or premier (paid for) maps API.
“Google claims no ownership over Your Content, and you retain copyright and any other rights you already hold in Your Content.” makes this point quite clear.
The terms of service also clearly state;
“This license is solely for the purpose of enabling Google to operate the Service, to promote the Service (including through public presentations), and to index and serve such content as search results through Google Services”
To state that “Google claiming the right to use any data you display in Google Maps in any way it sees fit, even if it doesn’t belong to them.” is rather misrepresenting the facts.
If as a data publisher you are unable to agree to this requirement, you are able to prevent your map from being indexed or appearing in search results by opting out using the well known robots.txt protocol. This is clearly stated in the terms of service.
Such terms of service are not unique to Google, most services which host user generated content have similar terms, indeed again contrary to your blog post OS Openspace contains the following in section 5.5 of its terms of service..
“However, for the period during which You incorporate and/or display Your Data on a Web Application, You shall grant to Us a revocable, world-wide, royalty-free, and non-exclusive licence to use, display and distribute Your Data on Your behalf, solely for the purpose of allowing Us to deliver the OS OpenSpace service to You and End Users.”
There does not appear to be any alternative to offering OS these rights, unlike the Google Maps API.
So now onto derived data…
At last year’s AGI conference, nearly 12 months ago, following my presentation highlighting the problem of derived data, the OS promised to clarify what it views derived data to be and what is not derived data. This is key because many public sector bodies would like to publish their content using Google Maps but have been told by OS sales staff that they cannot as it is derived data.
No such clarification has been made as far as I am aware.
So Paul, Can you answer the following questions..
Can a Local Authority use Google Maps to publish the location of their local libraries, schools or recycling centres ?
Can Defra use Google Maps to publish the location of restricted areas to manage any potential future agricultural disease event such as foot and mouth ?
Can the Royal Household use Google Maps to publish the destination of future visits of the Royal Family, perhaps opening a shiny new office building in Southampton ?
I look forward to reading your comments.”

Written and submitted from the Boulder Marriott (40.016N, 105.260W)

Categories
Data Policy Google Maps

Live Tube Map

A great early example of the value of the new TfL train prediction API: a map of the “real time” locations of Tube trains.

Produced by MySociety ace Matthew Somerville, it is a really neat demo and another example of the value of releasing government datasets, and in this case an example of an occasion where an API is more useful than the raw data.

Written and submitted from the Google Offices, London (51.495N, 0.146W)

Categories
Data Policy

Image of the day : Democracy at work ?

Courtesy of last week’s drop of Ordnance Survey Open Data, the Westminster electoral boundaries in Google Earth.

It’s amazing to think that such a key dataset was not easily accessible until so recently. Without doubt this is just the type of data set that the “free our data” movement was calling to be made freely available.

The type of innovation that comes from the increased accessibility of information is well demonstrated by the code point web service developed by Stuart Harrison over the weekend at http://www.uk-postcodes.com/ and documented on the uk-government-data-developers mailing list.

What a difference a week makes !!

Written and submitted from the Google Offices, London (51.495N, 0.146W)

Categories
Data Policy Ordnance Survey

The OS free data licence

I have had a couple of questions about how the free OS data is licensed. Here is the licence, which as you can see is basically a Creative Commons attribution licence.

This confirms there are no derived data issues.

In fact this license makes OS Opendata more “open” than Openstreetmap.

Written and submitted from the Where 2.0 Conference (37.331N, 121.888W)
