Want to Know What Kind of Data the Government Keeps on You?

The release, expected as early as March, will provide vast information about how government agencies keep data, what is published, and what is kept secret.

A massive index of government data kept on U.S. citizens could be publicly released as soon as the first week of March. It is believed to be the largest index of government data in the world, according to the open information group that requested it, the Sunlight Foundation.

Based on a Freedom of Information Act (FOIA) request filed more than a year ago, the release will “create a more complete picture of the government’s data holdings,” according to the Sunlight Foundation.

The data index, kept by the Office of Management and Budget (OMB), is based on Enterprise Data Inventories (EDIs), which all federal agencies maintain. The inventories detail the information that a given agency holds.

Once released, it will be an unprecedented look at what data federal agencies government-wide collect on Americans. The EDIs stem from a 2013 order by President Barack Obama that mandated the construction and maintenance of agency data indexes. Part of that order required that the data be “open and machine readable.”

Until now, the public hasn’t known what it doesn’t know. In other words, there’s been no way to know what information the government does or doesn’t collect.

“The thing is the most interesting is actually figuring out why certain private data sets are listed as private,” said Matt Rumsey, director of the Advisory Committee on Transparency for the Sunlight Foundation, who worked on the FOIA request and will be involved in analyzing the data.

He said the Foundation expects to see a spreadsheet of the names of the data sets.

“Each data set has some other [related] information about it,” he said. Some data sets are already available online, while entries for others will list whom to contact for more details. “It’s supposed to explain why that data isn’t public.”

Reasons that information isn’t already online and available could range from it existing only in paper format to it being subject to a FOIA exemption. At this point, Rumsey said, the Sunlight Foundation doesn’t know exactly what it will get.

“We’re not sure exactly what it will look like.”

They do anticipate that campaign finance data will be particularly interesting. In general, the Foundation wants to help people like journalists get raw information to create informative visualizations, similar to the work that’s been done recently with Dept. of Transportation data.

The Foundation will not know how it can pass the information on to the public until it sees the data, and it is hopeful that OMB will change its policy and post the data widely.

“The goal will be to get it out to the widest audience possible,” said Rumsey. “We’ve been working on this for a long time. It’s hard to make intelligent choices about data release without understanding what kind of data exists.”

Knowing why data is private will help people have more informed conversations.

The Foundation is also particularly interested in the large amount of data held by the Department of Justice.

The Sunlight Foundation said in a statement about its FOIA request for the EDIs that it is a “significant victory for the open government and open data movement.” The organization said it believes that the new information will create a more complete picture of what data the government is holding and will help the public decide how to use that data.

“There’s no guarantee that agencies’ indexes will contain all the data that it should (because they are iterative, living documents, many of which are still in various stages of construction),” reads the statement.  

Despite that, whatever the indexes contain will reveal a great deal about how government agencies are keeping data, what is being published, and what is being kept secret.