Can We De-Identify Data to Preserve Privacy?

By Michael Hendrix
June 20, 2014
General Foundation

It has been an open question for some time whether personal data can be effectively de-identified. That is, can we handle and analyze large datasets without exposing individuals’ personal information?

Some have contended that the answer is “no.” But a recent paper by Ann Cavoukian and Daniel Castro argues otherwise.

“It is possible to strongly de-identify the data (and thus achieve a high degree of privacy), while at the same time preserve the required level of data quality necessary for data analysis. Maximizing both privacy and data quality enables a shift from a zero-sum paradigm to a positive-sum paradigm, a key principle of Privacy by Design. This doubly-enabling ‘win-win’ strategy avoids unnecessary trade-offs and illustrates that it is often possible to de-identify personal information in a manner that maintains both privacy and data quality.”

Cavoukian and Castro also argue that it is important to get de-identification right and to push back against misconceptions about it.

“Advancements in data analytics are unlocking opportunities to use de-identified datasets in ways never before possible. Where appropriate safeguards exist, the evidence-based insights and innovations made possible through such analysis create substantial social and economic benefits. However, the continued lack of trust in de-identification and focus on re-identification risks may make data custodians less inclined to provide researchers with access to much needed information, even if it has been strongly de-identified; or worse, to believe that they should not waste their time even attempting to de-identify personal information before making it available for secondary research purposes. This could have a highly negative impact on the availability of de-identified information for potentially beneficial secondary uses.”