How the US Census Bureau works to improve data privacy is a lesson for businesses



Americans’ data privacy concerns are on the rise. With no far-reaching national legislation comparable to Europe’s GDPR, Americans feel wary of and vulnerable to data collection by both corporations and the government.

According to Pew Research, 81% say the risks of data collection by companies outweigh the benefits, and 61% feel the same way about government data collection. And it’s not just talk: 52% say they have decided not to use a good or service because of data collection and privacy concerns.

Federal lawmakers are working to address this. In 2021, states adopted 27 privacy bills aimed at controlling the casual handling and sale of personal data by the tech industry. So far in 2022, Utah and Connecticut have joined California, Colorado, and Virginia, among others, in passing their own state data privacy laws, which take effect in 2023.

“One of the most important things about data privacy is that privacy is contextual,” said Os Keyes, a Ph.D. candidate in the Department of Human Centered Design and Engineering at the University of Washington, who conducts research in data ethics, medical AI, facial recognition, gender, and sexuality.


Data, Keyes explained, can become deanonymized quickly when put in context with other data about you. One data set combined with another from another source can reveal a lot quite quickly, and that can sometimes get dangerous.

“All you have to do is link existing datasets together,” says Keyes.
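The linking Keyes describes can be sketched in a few lines. The example below is a toy illustration only: every record, field name and function in it is invented for this sketch and is not drawn from any real dataset or from Keyes and Flaxman's study. It assumes a "de-identified" release that still carries ZIP code, birth year and sex, and a second, public list that carries the same fields alongside names.

```python
# Toy linkage attack. All records and names below are invented for illustration.

# "Anonymized" release: names removed, but quasi-identifiers remain.
survey_release = [
    {"zip": "98105", "birth_year": 2006, "sex": "F", "response": "A"},
    {"zip": "98105", "birth_year": 2007, "sex": "M", "response": "B"},
    {"zip": "98112", "birth_year": 2006, "sex": "F", "response": "C"},
]

# A separate public list that has names plus the same quasi-identifiers.
public_roster = [
    {"name": "Alex Doe", "zip": "98105", "birth_year": 2007, "sex": "M"},
    {"name": "Sam Roe", "zip": "98112", "birth_year": 2006, "sex": "F"},
    {"name": "Kit Poe", "zip": "98199", "birth_year": 2005, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(release, roster, keys=QUASI_IDENTIFIERS):
    """Pair each released record with a roster name when exactly one person matches."""
    # Index the roster by its quasi-identifier combination.
    index = {}
    for person in roster:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for record in release:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], record["response"]))
    return matches


reidentified = link(survey_release, public_roster)
print(reidentified)
```

Neither list is sensitive on its own in this sketch; the exposure comes from the join. The more unusual a person's combination of attributes, the more likely the match is unique.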

Government agencies, such as the US Census Bureau, are scrutinizing their data privacy practices and responsibilities. Looking ahead to the 2030 Census, the Bureau opened a comment period for experts like Keyes to weigh in on its data anonymization efforts and how they can be improved before it collects the next decade’s worth of data.

Testing datasets to find what works and what doesn’t

Keyes and colleague Abraham (Abie) Flaxman, an associate professor of health metrics and global health at the University of Washington, set out to test a key hypothesis for the Census Bureau: Can transgender teens be revealed and identified using simulated datasets?

The unfortunate answer, the two found, was yes. Using the Census Bureau’s data anonymization approach from the 2010 Census, Keyes and Flaxman were able to identify 605 transgender teens. Although it was a simulation built specifically to test this, it shows how easily personally identifiable information (PII) can be de-anonymized, which in the case of transgender teens could put them at risk of hate crimes or put their parents at risk of child abuse charges for seeking gender-affirming medical care for their child, depending on where they live.

“We used simulated data designed to mimic the datasets the Census Bureau releases publicly and tried to re-identify trans teens, or at least narrow down where they could live, and unfortunately we succeeded,” the two wrote in a piece for Scientific American.

While alarming, the results of the simulation are why the Census Bureau has opened a comment period – to see what might not be working and where they can improve so that it doesn’t actually happen in the future.

“We were encouraged that Os and Abie’s work helps validate our concerns and decisions for 2020 and beyond,” said Daniel Kifer, senior advisor for formal privacy to the Census Bureau’s Decennial Disclosure Avoidance Development Team. “Privacy is mainly about protecting how you differ from everyone else; perceptions of what information is private can change over time; data can be misused and attacked in many different ways that are hard to anticipate.”

The limits of privacy protection

Kifer pointed out that while the re-identification succeeded against the Census Bureau’s 2010 approach on simulated data, Keyes and Flaxman’s simulated attack still “can’t beat random guessing when the attacker uses the Census Bureau’s demonstration data products based on the 2020 Census disclosure avoidance system, but is much more successful against legacy techniques the agency used prior to the 2020 decennial product releases.”

The 2020 product releases used a new differential privacy approach specifically aimed at improving privacy protections for census data.
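For readers unfamiliar with the term, the core idea of differential privacy can be sketched with the textbook Laplace mechanism: add random noise to a published count, with the amount of noise governed by a privacy-loss parameter usually called epsilon. The sketch below is a generic illustration under that assumption, not the Census Bureau's actual disclosure avoidance system, which is considerably more elaborate. The function names and the example count are made up for this sketch.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise of scale sensitivity / epsilon.

    A count changes by at most 1 when one person is added or removed,
    so sensitivity defaults to 1. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_count = 3          # hypothetical count of people in a small area

for epsilon in (0.1, 1.0, 10.0):
    draws = [noisy_count(true_count, epsilon, rng) for _ in range(5)]
    print(f"epsilon={epsilon}: {[round(d, 1) for d in draws]}")
```

With a small epsilon the published values scatter widely around the true count, which hides small groups but makes the statistic less accurate; with a large epsilon the values stay close to the truth and protect less. Choosing epsilon is the privacy versus accuracy trade-off in this simplified picture.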

Keyes and Flaxman confirmed Kifer’s claim, saying that when they used the Census Bureau’s new approach to data privacy, it reduced the identification rate of transgender teens by 70%. All three stressed that the agency should continue this work and improve further before it embarks on the 2030 Census.

“The Census Bureau has come back to say it is not possible to cut it by 100%. They believe there’s always some kind of accidental disclosure, and I think they’re right about that,” Flaxman said. “So we’ve had this back and forth with the Bureau, where we’ve been trying to figure out where the line is for protecting privacy and whether they have reached it. I think it’s pretty clear to me at this point that their machine is capable of achieving that kind of optimal privacy. They are now in the stage of making their final decision on where to set the knobs on their machine to improve it for 2030.”

Designing better data privacy

Founded in 1902, the Census Bureau is probably not what most people picture at the forefront of data innovation, with machinery capable of maximizing privacy, but the bureau actually has a long history of doing just that.

“Part of this innovation is driven every 10 years by the decennial census and the significant scrutiny it receives,” Kifer told VentureBeat. “As the largest federal statistical agency, the Census Bureau conducts other surveys and also collects statistical data on behalf of other agencies. Necessity and access to data have given the Census Bureau a huge advantage in innovating collection, analysis, and dissemination, as well as finding new uses for the data.”

Many of the Bureau’s innovations in data privacy and collection, Kifer explains, come from research communities that have worked to turn privacy into “a mathematical science compatible with policy and regulation.”

Continuing to find ways to innovate data collection and privacy practices is important not just for the Census Bureau, he explained, but for the entire U.S. federal statistical system.

“High-quality data is needed to support policy decisions,” says Kifer. “The population is changing, the important policy questions are changing and the need for data is changing.”

When data needs change, one of the Census Bureau’s goals is to adapt, as the bureau’s access to data and the latest research drives its innovation even further.

The way a 120-year-old government agency can be fast, proactive and agile in adapting to changing data and the needs of the population says a lot about players in other industries who argue that privacy is too challenging to get right, Keyes and Flaxman pointed out.

“It tells us there’s a tension in privacy, which we know more or less abstractly,” Keyes said. “This tension is really worth paying attention to. This idea that, as some big data hype people say, ‘privacy is dead’: it really isn’t. What we see here is not only proof that we shouldn’t just throw away privacy, but also that there are techniques to protect people thoughtfully and wisely… There are all kinds of stereotypes that government is the problem rather than the solution. I think it’s nice to see an example where the US Census is actually leading the way.”

No excuse for not prioritizing data privacy

What this really highlights, Keyes and Flaxman agreed, is that private companies have no excuse for not prioritizing data privacy, or for claiming they can’t get it right under regulations that force them to do so.

Because the Census Bureau must treat privacy as part of its job, it has found a way to optimize privacy while still extracting policy-shaping insights from data, without sacrificing innovation, Keyes explains.

“I think it’s a really interesting example, because you hear people say, ‘Oh, you can’t regulate the private industry around privacy, because it will kill innovation, and it won’t work.’ Well, here we have an example of both things being false,” Keyes said.

“Not only will it work,” Keyes said, “but the Census Bureau is actually responsible for a lot of really interesting and intricate privacy protection mechanisms, and also for answering questions like, ‘OK, how do we link records between datasets in a way that’s robust if we have this privacy protection?’ They’re under heavy regulation and they’re still innovating. A big part of the lesson is that there’s no contradiction between regulation and doing things better. If anything, it’s the other way around.”
