Translucent Databases

Author: Peter Wayner


by specialist   2022-07-22
The great irony is that actual privacy requires unique identifiers, like RealID or equiv.

GUIDs unlock the Translucent Databases achievement: actual per-field encryption of PII data at rest. TLDR: clever applications of salting and hashing, just like with proper password storage.
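
A minimal sketch of the salt+hash idea, assuming one GUID per person. This is an illustration, not code from Wayner's book: `opaque_key`, `store`, `lookup`, the in-memory `table`, and the single deployment-wide `SALT` are all hypothetical, and the stored value is left in the clear to keep the example short.

```python
import hashlib
import os

# Assumption: one secret salt per deployment, kept outside the database.
# This sketch just generates one at startup.
SALT = os.urandom(16)

def opaque_key(guid, field, salt=SALT):
    """Derive a one-way lookup key from a unique identifier and a field name."""
    material = f"{field}:{guid}".encode()
    return hashlib.pbkdf2_hmac("sha256", material, salt, 100_000).hex()

# The "database" never stores the GUID itself, only the derived key.
table = {}

def store(guid, field, value):
    table[opaque_key(guid, field)] = value

def lookup(guid, field):
    return table.get(opaque_key(guid, field))

store("guid-123", "diagnosis", "asthma")
```

Someone who presents the GUID can find the row again with `lookup("guid-123", "diagnosis")`, but the table alone does not say whose row it is. That only works if the identifier is unique and stable, which is where the GUID requirement comes from.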

I was utterly against RealID, until I figured this out. Much chagrin. Super embarrassing.

Source: Worked on both electronic medical records and protecting voter privacy. Did a translucent database POC for medical records, back in the day.

If there's another technical solution, I haven't found it.

But I think to your point, people generally don't want the sensitive data being collected in the first place. I don't have an answer for that.

by specialist   2022-01-02
Two tangential "yes and" points:


1) I'm not smart enough to understand differential privacy.

So my noob mental model is: Fuzz the data to create hash collisions. Differential privacy's heuristics guide the effort. Like how much source data and how much fuzz you need to get X% certainty of "privacy". Meaning the likelihood someone could reverse the hash to recover the source identity.

BUT: This is entirely moot if the original (now fuzzed) data set can be correlated with another data set.
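
For comparison with that mental model, here is a minimal sketch of the textbook Laplace mechanism, which adds calibrated noise to a query answer rather than fuzzing hashes. It assumes a simple counting query, where one person changes the answer by at most 1; `noisy_count` and the numbers are hypothetical.

```python
import random

def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    scale = 1.0 / epsilon  # smaller epsilon -> more noise -> more privacy
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

rng = random.Random(42)  # seeded only so the sketch is repeatable
samples = [noisy_count(100, epsilon=0.5, rng=rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)
```

Any single answer is off by a random amount, but the average over many answers drifts back toward the true count. That is why real deployments also have to budget how many queries get answered.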


2) All PII should be encrypted at rest, at the field level.

I really wish Wayner's Translucent Databases was better known. TLDR: Wayner shows clever ways of using salt+hash to protect identity. Just like how properly protected password files should be salt+hash protected.

Again, entirely moot if protected data is correlated with another data set.
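
A minimal sketch of why correlation undoes the protection, using made-up rows. The assumption is that the protected table replaces identity with an opaque hash but leaves quasi-identifiers (zip, birth year, sex) in the clear; all names and values are hypothetical.

```python
# "Protected" records: identity is an opaque hash, quasi-identifiers are not.
protected = [
    {"id": "9f2c", "zip": "98101", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"id": "41ab", "zip": "98052", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
]

# A second data set with names attached, e.g. some public roll.
public = [
    {"name": "Alice Example", "zip": "98101", "birth_year": 1970, "sex": "F"},
    {"name": "Bob Example", "zip": "98052", "birth_year": 1985, "sex": "M"},
    {"name": "Carol Example", "zip": "98052", "birth_year": 1991, "sex": "F"},
]

QUASI = ("zip", "birth_year", "sex")

def link(protected_rows, public_rows):
    """Join the two data sets on quasi-identifiers; unique matches are re-identified."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in QUASI), []).append(row["name"])
    matches = {}
    for row in protected_rows:
        names = index.get(tuple(row[k] for k in QUASI), [])
        if len(names) == 1:
            matches[names[0]] = row["diagnosis"]
    return matches

reidentified = link(protected, public)
```

The hash in `id` is never reversed here. The join on the leftover columns does all the work.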

Bonus point 3)

The privacy "fix" is to extend property rights to all personal data.

My data is me. I own it. If someone's using my data, for any reason, I want my cut.

Pay me.

by specialist   2019-08-27
re: IRMA

I've been thinking about negotiated disclosure since the mid 90s. Back then we called it faceted personas, an effort to protect oneself from aggregators of demographic data.

I've gotten nowhere.

TLDR: 99% certain deanonymization will always prevail.

Not saying I'm right. I'm not particularly smart or insightful. I just try to apply ideas foraged from academia to real world problems. Alas, the times I've slogged thru the maths and algos, I'm always left befuddled. I'm just not clever enough to figure out all the attack vectors. (I'd make a terrible criminal.)


re: Privacy by Design

That means Translucent Databases. Where all data at rest is encrypted. Just like you salt and hash password files.

This book details clever applications of that strategy to real-world problems: Translucent Databases, by Peter Wayner.

Mea culpa: I'm still unclear how GDPR's tokenization of PII in transit works in practice. Anyone have some sample code? And I still don't see how it protects data at rest.
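
Not GDPR-specific, but a minimal sketch of what vault-style tokenization generally looks like: the PII is swapped for a random token before the message leaves, and only the vault can map it back. `TokenVault` and the message layout are hypothetical, and a real vault would be a separate, access-controlled service rather than an in-memory dict.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to PII values."""

    def __init__(self):
        self._by_token = {}
        self._by_value = {}

    def tokenize(self, value):
        # Reuse the existing token so the same value always maps to one token.
        if value in self._by_value:
            return self._by_value[value]
        token = "tok_" + secrets.token_hex(16)
        self._by_token[token] = value
        self._by_value[value] = token
        return token

    def detokenize(self, token):
        return self._by_token[token]

vault = TokenVault()
# What travels over the wire carries the token, not the name.
message = {"patient": vault.tokenize("Jane Q. Public"), "result": "negative"}
```

Note that the vault itself still holds the PII at rest, so tokenization on its own moves the data-at-rest question into the vault rather than answering it.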


Source: Designed, implemented, and supported some of the first electronic medical records exchanges (BHIX, NYCLIX, others). Worked on election integrity for a decade, including protecting voter privacy (secret ballot).


Prediction: Once we accept that de-anon will always win in the long run, we'll eventually also accept that privacy has a half-life. To adjust, we'll adapt differential privacy algos to become temporal privacy.