
When Sunlight Disinfects


One of the most highly publicized Open Government initiatives of the last few years got a thundering wake-up call last week, when Ellen Miller – Executive Director of the Sunlight Foundation – addressed, succinctly and with hard facts, the Open Data ‘elephant in the room’: the problem of missing or wrong data in government-issued datasets.

In her Gov 2.0 Summit presentation, Miller presented an Open Government Scorecard, with some frank views on the status of the movement and the administration’s efforts to date.

Her central thesis is that “the drive for transparency appears stalled”. There are a few reasons for this, but most relate to the Consistency, Completeness and Timeliness of USASpending.gov and other Obama Administration Open Government initiatives:

  • On the Open Government Directive – Miller believes its objectives are teetering on the edge: ‘The plans that resulted were little more than aspirational. In the first of those plans, 12 out of 30 agencies didn’t identify any data for future publication and altogether only 75 new data sets were promised… That was hugely disappointing. Enforcement of these plans has always been ‘soft.’’
  • On Data.gov – ‘It started with enormous promise…But it’s still a pretty mediocre data repository and the types of data available remains an enormous concern.’
  • On Recovery.gov – ‘It’s hard to consider it more than a qualified success.’

Her primary focus, however, was on the federal spending website USASpending.gov – initiated by legislation from Barack Obama and Tom Coburn. Launched nearly three years ago, it was intended to provide the public with information about how the federal government spends tax dollars. Miller explains that it’s a visually impressive website, but believes the effort expended on three extensive redesigns should have been used elsewhere. While applauding the usability of the site, her criticism centers on the substance:

Unfortunately, its data is almost completely useless…

To back up this claim, she announced the launch of a new project called ClearSpending – which ‘tracks and illustrates just how broken the data [in USASpending] is’. The intention is that making the problems with data quality transparent and easily identifiable will help improve accuracy within USASpending.gov. The problems, however, are huge:

What Sunlight has found, and Clearspending shows in great detail, is that more than $1.3 trillion in federal reporting data from 2009 is unreliable. The data inaccuracies we uncovered account for 70 percent of the total $1.9 trillion in government spending data reported in that year. Some of the numbers are too big, some are too small and some are missing completely, while other spending data entries don’t have the detail that’s required or were reported months later than the law demands.

In her concluding remarks, she says:

The data powering USASpending is broken. You can’t trust any aggregate numbers you get from the site — answers to questions about federal spending that rise above the micro level. When we say things just don’t add up, we mean it…

We are beginning to worry that the Administration is more interested in style than substance.

‘More interested in style than substance’

This insinuation, however, has caused some in the Open Government movement to hit back and challenge the tone of the speech. The first to address this was Gunnar Hellekson (Chief Tech Strategist for Red Hat), who called the speech ‘poisonous’ and said it neglected the fact that citizens now have more information available to them than ever before:

The keynote was a remarkable turn: the administration was completely eviscerated by one of its closest allies… The fact that the US government is even attempting this is amazing.

He goes on to explain that imperfection and risk should be tolerated, and that while some of the data is ‘ridiculous’, exposing it to public scrutiny is one of the benefits of data transparency:

Sunlight’s $1.3 trillion discovery is an example of the process working, not a failure…You’re just seeing how hard it is for one of the largest, most complicated organizations on the planet to keep its records straight.

His thesis is that transparency and Open Data do not, by themselves, amount to reform. Rather, they provide the impetus and evidence-based reasoning for changes to occur:

Sunlight has, I think, dangerously conflated transparency for reform. You get transparency first, and that compels reform. That’s the whole point. You don’t ask for perfection right out of the gate, it’s unreasonable….The solution is a long, difficult, complicated, and unpleasant series of reforms that produce better quality data. That requires patience, diligence, perseverance.

In Sunlight’s response, they agree that perfection is the enemy of the good, but point out that many of the totals are not even close. They highlight that ClearSpending reveals 70% of the totals analysed were flawed. As such, Sunlight’s Tom Kitt worries it has the potential to ‘mislead a lot of people’ and affect trust in the entire initiative. In the end, his primary concern relates to timescales and the lack of urgency from OMB and GSA in fixing the data systems powering USASpending.

Tough love in the Open Government movement

While Gunnar recognises that “Sunlight has done the right thing here by doing real and substantial work”, others believe they’ve gone too far in calling out the emperor’s new clothes. Derek Willis believes that:

Sunlight hasn’t earned the right to say that the government is “more interested in style than substance”.

This is because “It’s about the process, the culture, an entirely new way of doing things”. So rather than believing in the data, what’s more important is the site, the initiative, and the changes, legislation and directives that have facilitated this analysis. To borrow an equestrian analogy (as Nancy Scola does when she says Sunlight are “prodding the Obama administration in the direction that it want it to go, like you do with a horse”): it’s better to bet on the horse than the race.

In the end, he and others recognise the objective of the initiative, but perhaps feel more recognition is needed as to the success in creating platforms where all this data can be critiqued:

I’m grateful that organizations like Sunlight are pushing for greater access to accurate public data…But just as government processes can seem alien and counterproductive at times, so can those of transparency advocates.

Data accuracy – a shared issue

One of the most interesting aspects of ClearSpending is not that it calls out agencies on data quality (this has been highlighted many times before), but rather that such an in-depth analysis could be undertaken in the first place.

Sunlight evaluated the data quality based on a methodology that has been used by the Government Accountability Office, and checked data against the Federal Assistance Award Data System. This kind of reconciliation helped to create ClearSpending, and is a useful model for understanding how other datasets could be checked.
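To make the idea of reconciliation concrete, here is a minimal sketch in Python of how reported spending records might be checked against an independent reference source. It is not Sunlight's or the GAO's actual method. The `Award` record, its field names, the `reconcile` function, the choice of required fields and the 30-day reporting window are all assumptions made up for illustration, loosely following the three dimensions named earlier (consistency, completeness and timeliness).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Award:
    """One reported spending record (hypothetical schema, for illustration)."""
    program: str              # programme identifier
    amount: float             # reported amount
    recipient: Optional[str]  # None or "" when the field was left blank
    obligated_on: date        # when the money was committed
    reported_on: date         # when the record was published


REQUIRED_FIELDS = ("recipient",)  # assumption: fields every record must carry
MAX_REPORTING_LAG_DAYS = 30       # assumption: illustrative deadline only


def reconcile(awards, reference_totals):
    """Compare reported awards with reference totals, programme by programme."""
    report = {}
    for program, expected in reference_totals.items():
        rows = [a for a in awards if a.program == program]
        reported = sum(a.amount for a in rows)
        # Completeness: rows with a required field missing or blank.
        incomplete = sum(
            1 for a in rows
            if any(getattr(a, f) in (None, "") for f in REQUIRED_FIELDS)
        )
        # Timeliness: rows published after the assumed reporting window.
        late = sum(
            1 for a in rows
            if (a.reported_on - a.obligated_on).days > MAX_REPORTING_LAG_DAYS
        )
        report[program] = {
            "expected": expected,
            "reported": reported,
            # Consistency: negative means under-reported, positive over-reported.
            "discrepancy": reported - expected,
            "incomplete_rows": incomplete,
            "late_rows": late,
        }
    return report


# Made-up sample data: one programme under-reports, one matches, one is missing.
awards = [
    Award("10.001", 100.0, "Acme", date(2009, 1, 1), date(2009, 1, 15)),
    Award("10.001", 50.0, None, date(2009, 2, 1), date(2009, 4, 1)),
    Award("20.002", 300.0, "Widget Co", date(2009, 3, 1), date(2009, 3, 10)),
]
reference_totals = {"10.001": 200.0, "20.002": 300.0, "30.003": 75.0}

report = reconcile(awards, reference_totals)
for program, figures in report.items():
    print(program, figures)
```

Iterating over the reference totals rather than the reported rows is a deliberate choice: a programme that never appears in the published data still shows up in the report, with its full expected amount as the discrepancy. That covers the "missing completely" case as well as the "too big" and "too small" ones.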

The quality of procurement data released by governments is not just a US issue. When the UK government released extracts of the Combined On-line Information System (COINS), containing expenditure by UK Government Departments over £25K for the years 2008-2010, the guidance document explained (my emphasis):

The data on COINS are quality-assured and complete at the level at which they are required for the following purposes: fiscal management; operational publications (e.g. Main and Supplementary Estimates); and statistical publications (e.g. Public Expenditure Statistical Analyses, the joint ONS/Treasury Public Sector Finances statistical bulletin and the National Accounts).

Lower levels of data are not quality assured by the Treasury. Individual departments can to some extent choose the level of granularity that they use within pre-defined aggregates set by the Treasury. Lower level detailed data may therefore appear incomplete and be inconsistent across departments.

While this did at least explain that some lower-level data may not be accurate, the guidance for local authority spending has no such caveats. This guidance, published on Friday, provides details for local government on how to comply with the Prime Minister’s call to publish each financial transaction over £500 from January 2011.

The guidance makes no remarks as to the quality of the data released. The principle stated is to ‘Publish raw data quickly’, rather than to try to make sure it’s accurate first. Indeed, Tim Berners-Lee’s “Putting Government Data online” – to which the guidance refers – makes no mention of data quality either. It appears that the emphasis is on publishing data – in any format (39% published their spending in PDF format only) – rather than checking its consistency or accuracy.

The remit of the recently initiated UK Public Data Transparency Board is to ensure tight deadlines are met for releasing key datasets, and that open data standards are adhered to. Their draft Public Data Principles make no mention of data accuracy or integrity. As such, it looks like this task – to ensure data accuracy – falls to the Gov 2.0 community. As Ellen Miller says:

For starters, we have to take on some of the responsibility for making this happen ourselves – I mean ‘us’ as in the community of Americans [read British] who are concerned about accountability…

Our job is to hold the Administration’s [read Coalition/Local Government’s] feet to the fire – bureaucrats aren’t going to act just because someone asks nicely. Government isn’t going to change how and when it makes data available – even when a few good people on the inside want it to – because of a directive…

And finally, we need to admit that Gov2.0 isn’t happening until citizens are truly actively engaged in helping to demand and co-create it.

For the promise of Gov 2.0 to be realised, someone is going to have to undertake the less glamorous tasks of checking data accuracy and verifying it against available data sources. Achieving this should not be a case of carrots or sticks, but should be up to governments themselves. They should want to achieve high data quality standards, because it helps them and furthers the purpose of the Gov 2.0 mission. When this happens, we’ll know that the Gov 2.0 movement has achieved an important milestone. Getting there won’t happen, however, until, as Miller says, “citizens are truly actively engaged in helping to demand and co-create it”. Now where’s that Open Data Quality bandwagon?


(Photo credit: smcgee on Flickr)


    This work by http://www.rfahey.org is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported.