Problems and Opportunities in Government Data

by Cameron Laird (CameronLaird) on 16-12-2011 06:56 AM - last edited on 16-12-2011 08:31 AM by Administrator

Tax receipts in France!  Radio licenses in Nebraska!  Lobbyists in Washington, DC!  Listen to any media outlet, and you’ll soon catch at least a hint of the large and growing enthusiasm for “open government data.” The movement is worldwide, and it boasts plenty of success already.

It’s not a one-way street, though. There are several areas in which public access to government data is closing. Understanding why is a bit complicated.

It's easy enough to list the trends to watch:

  • data loss
  • privatization
  • mis-publication
  • greater respect for privacy and other citizen concerns, and
  • excessive secrecy

It's considerably harder to describe their many subtle interactions and sources. In investigating this area, belief in simple "conspiracies" would be a comfort in comparison with the complexity of motivations actually involved. What is possible in a brief survey, though, is to highlight a few incidents that turn out to be characteristic of significant larger movements.

Data Loss

"Data loss" is a good example. Evidence from old court proceedings goes missing all too frequently. Sometimes it happens through what appears to be malfeasance, often just through clerical error, occasionally rather mysteriously. As near as I can determine, no one really knows enough about the facts to generalize about causes with much confidence. What's certain, though, is that there are losses. This is a stark and even poignant contrast with our technical gains in exploiting evidence. Single hairs or smudges, when viewed with the aid of modern molecular genetics or image analysis, contribute mightily to conclusions of guilt or innocence. They can't do so, though, when entirely absent.

Other government agencies also lose what they had. Epidemiologist Dr. Wayne Shandera, a co-author of the first description of AIDS and now with the Baylor College of Medicine, recounted to me his own research into malaria. He happened to call the Texas Department of Health on the same day the Houston office was discarding its written records on malaria. The Department was rational; malaria was largely eliminated by 1952 in the United States, so the Department saw retention of the data as pointless.

In other cases, though, datasets once laughed off for their obscurity eventually proved seminal in new discoveries having to do with climate change, genetic disorders, crop improvement, and a host of other scientific advances. Who knows what breakthroughs have already gone out in the trash?

The point for now, however, is not to judge loss as good or bad. The essential first step is to recognize how frequent it is, and how much different actors' views of it diverge. Understaffed workers in an office struggling to cope with today's demands rightly feel impatient when diverted to sort through or maintain dusty backroom boxes. Those same musty files, in the right hands, might be as exciting as any finds from a pharaoh's tomb.

If you or your organization rely on specific data archived by a government agency, you owe it to yourself to verify the agency's commitment to the archive, and to find out what your options are for replicating the data. Perhaps the right action is to scan it and post it online in a digital, indexable form.
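For a copy you keep yourself, one way to make "verify" concrete is a checksum manifest. This is a minimal sketch, assuming the replicated files sit in a local directory; the function names (`sha256_of`, `build_manifest`, `changed_files`) are illustrative and not part of any agency's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict:
    """Map each file's relative path to its size and checksum."""
    manifest = {}
    for path in sorted(archive_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(archive_dir).as_posix()
            manifest[rel] = {
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
    return manifest


def changed_files(archive_dir: Path, manifest: dict) -> list:
    """List manifest entries that are now missing or no longer match."""
    problems = []
    for rel, entry in manifest.items():
        path = archive_dir / rel
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            problems.append(rel)
    return problems
```

Build the manifest once when the copy is made, store it somewhere outside the archive directory (for example as JSON), and rerun `changed_files` periodically; a non-empty result means part of the replica has been lost or altered.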

Privatization

A variant on data loss is privatization. Think for a moment of 2011's Hurricane Irene: it was simultaneously a natural disaster and a governmental success. Dozens of agencies cooperated to disseminate accurate status reports and predictions, minimizing the risk and disruption to the communities along the Atlantic seaboard.

The National Oceanic and Atmospheric Administration (NOAA) was a leader in this episode. By the time of the next incident, though, NOAA will likely have lost many of its technical capabilities. Congress has determined that weather satellites, hurricane-spotting airplanes, tsunami-trackers, and other such tools belong in the hands of private rather than public organizations.

In extreme but surprisingly common cases, even datasets collected through government grants are available only by payment to private organizations. It's easy to caricature these arrangements as unjust and prejudicial; whatever the arguments in a particular case, there's no doubt that privatization is a powerful trend.

What are the consequences? In many cases — financial companies, food-packing, mineral extraction, pharmaceutical research, delivery of supplies to military units, and more — there is effectively no independent public source for crucial data. Information about drug trials, oil-spill clean-ups, school lunches, and weapon effectiveness is held by the same people who have an incentive, if not a fiduciary responsibility, to keep the data as private as possible. Many data sets once maintained with pride as civic trusts are simply slipping out of public hands.

Notice that both can happen at once: an open government initiative can succeed in making a dataset more available even as the data themselves become "brittle" because their collection was subcontracted or mandated to proprietary companies.

Mis-publication

Of course, public agencies have plenty of blemishes in their own publication records. In one recent example, the U.S. Senate finally caught up with the mandate of the Legislative Branch Appropriations Act of 2009 by putting records of its own spending online. This first publication shows typical quality errors, including inconsistent date labels and documented omissions. What is present is a single 12-megabyte PDF. Whether it was a deliberate choice to complicate analysis or the result of engineering ignorance, this reality falls far short of the possibilities of current technology. The Sunlight Foundation might, with some delay, re-publish the data as text, but there are apparently no plans yet to tabulate them as SDMX (Statistical Data and Metadata Exchange) or XBRL (eXtensible Business Reporting Language) repositories, or to expose them through application programming interfaces (APIs).

More generally, "publication" or "transparency" is a spectrum rather than a status. An agency can be more or less complete, accurate, and precise with its data. Specialists applaud the White House for its disclosures about certain Federal construction initiatives, at the same time as they question how representative those "14 handpicked projects" are. The format of publication also ranges widely in ease-of-use, from the moral equivalent of microfiche look-up all the way to fully-indexed, searchable datasets with automation interfaces.
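To illustrate the gap between a PDF and a tabulated dataset, here is a minimal sketch of the clean-up a re-publisher might do once the text has been extracted. Everything specific in it is an assumption for illustration: the column names, the sample payee, and the list of date spellings are hypothetical and are not taken from the Senate document.

```python
import csv
import io
from datetime import datetime

# Hypothetical date spellings; a real document would need its own list.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def normalize_date(text: str) -> str:
    """Convert a date in any known spelling to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    # Fail loudly rather than guess at an unknown spelling.
    raise ValueError(f"unrecognized date: {text!r}")


def to_csv(records) -> str:
    """Render (date, payee, amount) tuples as CSV with uniform dates."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "payee", "amount"])
    for date_text, payee, amount in records:
        writer.writerow([normalize_date(date_text), payee, f"{amount:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([("03/07/2011", "Example Vendor", 12.5)]))
```

Raising an error on an unrecognized date, instead of passing it through, is a deliberate choice in this sketch: silent guesses would reproduce the very inconsistency the tabulation is meant to remove.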

Privacy and Other Public Concerns

Public expectations for privacy have changed through the years, and legislation tracks this in part. Most famously, the Health Insurance Portability and Accountability Act (HIPAA) of 1996 enormously changed the treatment of medical data. More accurately, HIPAA extended to the country at large the expectations for medical privacy that privileged patients long enjoyed.

Treatment of other kinds of records, including educational and financial ones, has also changed through recent decades. Counteracting this trend is the "amateurization" of media, which has the effect of pressuring government to reveal details that earlier generations wouldn't have expected. Amateurization also often results in a situation where outsiders have more expertise on a subject than the government servants nominally in charge of a dataset.

The national government also faces grave challenges of the sort David Gewirtz describes in his recent lecture to the National Defense University; each of these challenges will inevitably result in changes to the way the government handles data. Consistent handling of intellectual property, for example, is so thorny that the US government has only begun to address what is involved.

Excessive Secrecy

Finally, public servants hide information for reasons other than the privacy or protection of the public. These other reasons are invariably problematic. As Gewirtz documented in a different presentation, "All presidents [and many governors] hide e-mail," even when statutes seem to prohibit this. There are also abundant instances of data withheld from the public that seem to have only tenuous connections to the national security or criminal investigation their secrecy is supposed to promote. In fact, the evidence of the best-documented cases, including the landmark 1953 Supreme Court Reynolds case on the State Secrets Privilege, is that "national security" has more to do with keeping information from citizens or governmental rivals than from military or criminal enemies. Even though the Third Circuit Court of Appeals explicitly declared in 2005, regarding Reynolds, that "there was no fraud," a daughter of one of the original claimants summarized that, for her, "It's such an important case, and it's based on a lie."

One sad innovation of recent years is that the US has borrowed from English practice and now frequently resorts to a variety of secrecy and gag orders which themselves can't be revealed. A system administrator, for example, can be ordered to monitor traffic and be forbidden not only to tell anyone about the monitoring, but even to disclose that his own speech is restricted.

Crudely, privacy is a value, and secrecy suspicious. In that regard, the two belong in separate categories. From the outside, though, it can be hard to distinguish them, and their impacts on government transparency are similar.

A Mixed Bag

"Open data" has taken off. The opportunities for "evidence-based journalism" are better than ever before, and it's at least heartening to survey the International Aid Transparency Initiative, the record modernization mandate, and a wealth of smaller projects that put data back in the hands of the citizenry.

But keep an eye out for the five "counter-trends" described above; the limits they put on government transparency are crucial to an accurate picture.

Comments
by Dan Nguyen (anon) on 16-12-2011 07:41 AM

Excellent overview of the issues. However, I have a couple of suggestions:

For the sub-topic of Privatization, the handing over of crime data to sites like CrimeReports stifles the ability of the public to examine records in any meaningful way beyond clicking on markers to find out about crimes that are several months old.

For the sub-topic of Privacy/Secrecy, the recent kerfuffle over the National Practitioner Data Bank is a disheartening development. A database that had been public for a long while was pulled because a newspaper used it to help corroborate its story on undisciplined doctors. It was re-uploaded by the government but came with a number of strings that are likely to result in a constitutional law battle.

by Cameron Laird (CameronLaird) on 16-12-2011 08:11 AM

I can't agree more, Dan Nguyen:  these are prime examples of the paradoxes of "open data" that are happening on a daily basis.  Thanks for the references to the particular cases.  I'm still trying to figure out what more citizens can do than to continue to shine a light on the specific details and facts of each situation as it arises. 

The HP Input Output site is sponsored by HP and features articles and content from HP and third-party contributors. Third-party articles and content, while paid for by HP, do not necessarily represent the views and opinions of HP. HP does not endorse this content and is not responsible for its accuracy, availability and quality.