Problems and Opportunities in Government Data
Tax receipts in France! Radio licenses in Nebraska! Lobbyists in Washington, DC! Listen to any media outlet, and you’ll soon catch at least a hint of the large and growing enthusiasm for “open government data.” The movement is worldwide, and it boasts plenty of success already.
It’s not a one-way street, though. There are several areas in which public access to government data is closing. Understanding why is a bit complicated.
It's easy enough to list the trends to watch:
- data loss
- privatization
- mis-publication
- greater respect for privacy and other citizen concerns, and
- excessive secrecy
It's considerably harder to describe their many subtle interactions and sources. In investigating this area, belief in simple "conspiracies" would be a comfort in comparison with the complexity of motivations actually involved. What is possible in a brief survey, though, is to highlight a few incidents that turn out to be characteristic of significant larger movements.
Data Loss
"Data loss" is a good example. Evidence from old court proceedings goes missing all too frequently. Sometimes it happens from what appears to be malfeasance, often just as clerical error, occasionally rather mysteriously. As near as I can determine, no one really knows enough about the facts to generalize causes with much confidence. What's certain, though, is that there are losses. This is a stark and even poignant contrast with our technical gains in leveraging evidence. Single hairs or smudges, when viewed with the aid of modern molecular genetics or image analysis, contribute mightily to conclusions of guilt or innocence. They can't do so, though, when entirely absent.
Other government agencies also lose what they had. Epidemiologist Dr. Wayne Shandera, a co-author of the first description of AIDS and now with the Baylor College of Medicine, recounted to me his own research into malaria. He happened to call the Texas Department of Health on the same day the Houston office was discarding its written records on malaria. The Department was rational; malaria was largely eliminated by 1952 in the United States, so the Department saw retention of the data as pointless.
In other cases, though, datasets once laughed off for their obscurity eventually proved seminal in new discoveries having to do with climate change, genetic disorders, crop improvement, and a host of other scientific advances. Who knows what breakthroughs have already gone out in the trash?
The point for now, however, is not to judge loss as good or bad. The essential first step is to recognize how frequent it is, and how much different actors' views of it diverge. Understaffed workers in an office struggling to cope with today's demands rightly feel impatient when diverted to sort through or maintain dusty backroom boxes. Those same musty files, in the right hands, might be as exciting as any finds from a pharaoh's tomb.
If you or your organization rely on specific data archived by a government agency, you owe it to yourself to verify the agency's commitment to the archive, and what your options are to replicate the data. Perhaps the right action is to scan it and post it online in a digital and index-able form.
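One way to act on that advice is to keep a checksum manifest of whatever you replicate, so a later copy can be compared against it. The following is only a minimal sketch, assuming you have already saved the agency's files into a local directory; the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical and are not part of any agency's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that are missing from, or differ in, the newer manifest."""
    return sorted(name for name, digest in old.items() if new.get(name) != digest)
```

Building a manifest when you first copy the archive, and again whenever you refresh it, lets `changed_files` flag records that have quietly disappeared or been altered in between. It does not tell you why they changed, only that they did.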
Privatization
A variant on data loss is privatization. Think for a moment of 2011's Hurricane Irene: it was simultaneously a natural disaster and a governmental success. Dozens of agencies cooperated usefully to disseminate accurate statuses and predictions to minimize the risk and disruption to the communities along the Atlantic seaboard.
The National Oceanic and Atmospheric Administration (NOAA) was a leader in this episode. By the time of the next incident, though, the NOAA will likely lose many of its technical capabilities. Congress has determined that weather satellites, hurricane-spotting airplanes, tsunami-trackers, and other such tools belong in the hands of private rather than public organizations.
In extreme but surprisingly common cases, even datasets collected through government grants are available only by payment to private organizations. It's easy to caricature these arrangements as unjust and prejudicial; whatever the arguments in a particular case, there's no doubt that privatization is a powerful trend.
What are the consequences? In many cases — financial companies, food-packing, mineral extraction, pharmaceutical research, delivery of supplies to military units, and more — there is effectively no independent public source for crucial data. Information about drug trials, oil-spill clean-ups, school lunches, and weapon effectiveness is held by the same people who have an incentive, if not a fiduciary responsibility, to keep the data as private as possible. Many data sets once maintained with pride as civic trusts are simply slipping out of public hands.
Notice that open government initiatives can succeed in making a dataset more available at the same time as the data themselves become "brittle" because their collection was subcontracted or mandated to proprietary companies.
Mis-publication
Of course, public agencies have plenty of blemishes in their own publication records. In one recent example, the U.S. Senate finally caught up with the mandate of the Legislative Branch Appropriations Act of 2009 by putting records of its own spending online. This first publication shows typical quality errors, including inconsistent date labels and documented omissions. What is present is a single 12-megabyte PDF. Whether it was a deliberate choice to complicate analysis or the result of engineering ignorance, this reality falls far short of the possibilities of current technology. The Sunlight Foundation might, with some delay, re-publish the data as text, but there are apparently no plans yet to tabulate them as SDMX (Statistical Data and Metadata Exchange) or XBRL (eXtensible Business Reporting Language) repositories, or to expose them through application programming interfaces (APIs).
More generally, "publication" or "transparency" is a spectrum rather than a status. An agency can be more or less complete, accurate, and precise with its data. Specialists applaud the White House for its disclosures about certain Federal construction initiatives, at the same time as they question how representative those "14 handpicked projects" are. The format of publication also ranges widely in ease-of-use, from the moral equivalent of microfiche look-up all the way to fully-indexed, searchable datasets with automation interfaces.
Privacy and Other Public Concerns
Public expectations for privacy have changed through the years, and legislation tracks this in part. Most famously, the Health Insurance Portability and Accountability Act (HIPAA) of 1996 enormously changed the treatment of medical data. More accurately, HIPAA extended to the country at large the expectations for medical privacy that privileged patients long enjoyed.
Treatment of other kinds of records, including educational and financial ones, has also changed through recent decades. Counteracting this trend is the "amateurization" of media, which has the effect of pressuring government to reveal details that earlier generations wouldn't have expected. Amateurization also often results in a situation where outsiders have more expertise on a subject than the government servants nominally in charge of a dataset.
Finally, the national government faces grave challenges of the sort David Gewirtz describes in his recent lecture to the National Defense University; each of these challenges will inevitably result in changes to the way the government handles data. Consistent handling of intellectual property, for example, is so thorny that the US government has only begun to address what is involved.
Excessive Secrecy
Finally, public servants hide information for reasons other than the privacy or protection of the public. These other reasons invariably are problematic. As Gewirtz documented in a different presentation, "All presidents [and many governors] hide e-mail," even when statutes seem to prohibit this. There are also abundant instances of data withheld from the public that seem to have only tenuous connections to the national security or criminal investigation their secrecy is supposed to promote. In fact, the evidence of the best-documented cases, including the landmark 1953 Supreme Court Reynolds case on the State Secrets Privilege, is that "national security" has more to do with keeping information from citizens or governmental rivals than from military or criminal enemies. Even though the Third Circuit Appeals Court explicitly declared in 2005, regarding Reynolds, that "there was no fraud," a daughter of one of the original claimants summarized that, for her, "It's such an important case, and it's based on a lie."
One sad innovation of recent years is that the US has borrowed from English practice and now frequently resorts to a variety of secrecy and gag orders which themselves can't be revealed. A system administrator, for example, can be ordered to monitor traffic, and not only tell no one about the monitoring, but not even disclose that his own speech is restricted.
Crudely, privacy is a value, and secrecy suspicious. In that regard, the two belong in separate categories. From the outside, though, it can be hard to distinguish them, and their impacts on government transparency are similar.
A Mixed Bag
"Open data" has taken off. The opportunities for "evidence-based journalism" are better than ever before, and it's at least heartening to survey the International Aid Transparency Initiative, the record modernization mandate, and a wealth of smaller projects that put data back in the hands of the citizenry.
But keep an eye out for the five "counter-trends" described above; the limits they put on government transparency are crucial to an accurate picture.








