About

De-identification, in plain language

A sticker-checking game for robots: how good are different tools at finding every private detail in a therapy transcript without erasing the parts a therapist still needs?

De-identification is not anonymisation

The two words get used interchangeably, but they make very different promises, and the difference is the reason this benchmark exists.

  • De-identification is a process: finding and masking the identifying details in a text (names, phones, and the subtler clues). It lowers the risk of recognising someone; it does not promise zero. Whatever the tools miss still leaks.
  • Anonymisation is a result, and in European data-protection law (GDPR) a strong legal claim: the data can no longer be traced back to a person by any means reasonably likely to be used, and the change is irreversible. Truly anonymous data falls outside the GDPR altogether.

CONFIDE deliberately says de-identification. A redacted therapy transcript is rarely anonymous in the legal sense: a handful of surviving everyday details (an age, a profession, a city) can still single someone out. So instead of claiming anonymity, the benchmark measures the gap: what the tools caught, what slipped through, and how close to anonymous the result actually is (the residual-risk section of the report).

A single figure pinpointed by four ordinary attributes (age, workplace, city, medication), each harmless alone but together identifying one person.
Quasi-identifiers: ordinary alone, identifying together.

The two kinds of clues

De-identification means removing the clues that reveal who someone is. There are two kinds:

  • Direct identifiers: a name, phone, email, ID number. One of these alone points to you.
  • Quasi-identifiers: your age, your job, your city, your employer, a medication. Each is shared by many people, but stack three or four together and only one person matches.
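
The stacking effect of quasi-identifiers can be shown with a small sketch. Everything in it is invented for illustration: the five records, the field names, and the `matches` helper are hypothetical and are not taken from the CONFIDE data or code.

```python
# Hypothetical toy population; every record and field name is made up.
people = [
    {"age": 34, "job": "teacher", "city": "Kazan", "medication": "sertraline"},
    {"age": 34, "job": "teacher", "city": "Kazan", "medication": "none"},
    {"age": 34, "job": "nurse", "city": "Kazan", "medication": "sertraline"},
    {"age": 34, "job": "teacher", "city": "Omsk", "medication": "sertraline"},
    {"age": 51, "job": "teacher", "city": "Kazan", "medication": "sertraline"},
]


def matches(population, known):
    """Return the records consistent with every attribute in `known`."""
    return [p for p in population if all(p[k] == v for k, v in known.items())]


target = people[0]
fields = ["age", "job", "city", "medication"]

# Reveal one more attribute at each step and count who still fits.
counts = []
for n in range(1, len(fields) + 1):
    known = {f: target[f] for f in fields[:n]}
    counts.append(len(matches(people, known)))

print(counts)  # [4, 3, 2, 1]
```

Each attribute alone leaves several candidates, but by the fourth one only a single record remains. That is the sense in which a transcript with no names in it can still point to one person.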

CONFIDE-Bench tests a team of tools working together: a rule-checker for patterns (phones, emails), a name-finder for Russian (Natasha), the OpenAI Privacy Filter for English, and a small local language model (Qwen2.5-3B via Ollama) for the trickier, world-knowledge clues. The scorecard answers three questions:

Did we catch it?
We count misses: a miss is a leak.
Did we erase too much?
Covering up real therapy content is its own cost (utility).
Could someone still guess?
We let a local AI try to re-identify the client from what's left.
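
As a rough illustration of what a rule-checker for patterns does, here is a minimal sketch that masks emails and phone numbers with regular expressions. The two patterns, the `[EMAIL]` / `[PHONE]` placeholders, and the `mask_patterns` function are assumptions made for this example; they are not CONFIDE's actual rules, and real rule sets are considerably more careful.

```python
import re

# Deliberately simple, illustrative patterns (assumed, not CONFIDE's own).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least eight digits/spaces/brackets/hyphens, then a digit.
PHONE = re.compile(r"\+?\d[\d\s()-]{8,}\d")


def mask_patterns(text):
    """Replace every email and phone-like span with a placeholder tag."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


sample = "Call me at +7 (912) 345-67-89 or write to anna@example.com."
masked = mask_patterns(sample)
print(masked)  # Call me at [PHONE] or write to [EMAIL].
```

Rules like these are cheap and predictable, which is why they suit fixed formats. They cannot catch a name, an employer, or a medication, which is where the name-finder and the language model come in.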

Why this benchmark is different

  • Bilingual. Almost every de-id benchmark is English-only; CONFIDE covers Russian and English, with Russian as the priority and the harder side.
  • Therapy, not hospital notes. Existing clinical de-id work targets structured medical records; therapy conversation is messier and more revealing.
  • Recall-first, entity-level. Scored the way privacy researchers do: a recall-weighted F2, and the strict TAB bar where an entity is "protected" only if all its mentions are masked.
  • Local-first. The default stack runs on the therapist's own machine, so nothing sensitive goes to the cloud to be de-identified.
  • Honest about what survives. It reports the residual re-identification surface and maps it onto named regulatory risks (WP29 singling-out / linkability / inference; a HIPAA-inspired checklist).
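
The two scoring ideas in the list above, a recall-weighted F2 and the strict entity-level bar, can be sketched as follows. This is a simplified illustration, not the benchmark's scorer: the function names and the toy spans are hypothetical, and a mention counts as masked only on an exact span match, which is an assumption made here for brevity.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mention_recall(entities, masked):
    """Share of individual mentions that were masked."""
    mentions = [m for spans in entities.values() for m in spans]
    return sum(m in masked for m in mentions) / len(mentions)


def entity_recall(entities, masked):
    """Share of entities whose mentions were ALL masked (the strict bar)."""
    protected = sum(all(m in masked for m in spans) for spans in entities.values())
    return protected / len(entities)


# Hypothetical gold annotation: entity id -> character spans of its mentions.
gold = {"client_name": [(0, 4), (40, 44)], "phone": [(60, 78)]}
# The tool masked the first name mention and the phone, but missed the second name mention.
masked = {(0, 4), (60, 78)}

print(mention_recall(gold, masked))  # 2 of 3 mentions masked
print(entity_recall(gold, masked))   # 0.5: the name still leaks once
print(f_beta(0.5, 1.0))              # about 0.83: low precision costs little
print(f_beta(1.0, 0.5))              # about 0.56: low recall costs a lot
```

The strict bar is harsher than mention counting because one surviving mention is enough to expose the entity, and F2 encodes the same priority: a missed identifier is penalised more than an over-eager mask.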

Ethics, consent & our stance

Consent first, and honesty about what survives

Explicit consent for every recording

The benchmark is built on synthetic, fictional transcripts, with no real patients. But the tools it measures are meant, eventually, for real therapy sessions, and there our position is unambiguous: any real recording or transcript must have the client's explicit, informed consent before it is recorded, stored, or processed, including being run through these de-identification tools. Consent is specific (this purpose), revocable, and documented; processing is local-only by default, so nothing identifiable leaves the therapist's machine. No real session data should ever be re-identified, and none is exposed to annotators.

Our stance

  • Privacy-first, local-first. The default stack runs on the therapist's own machine; the cloud is opt-in, after redaction, and only with consent.
  • Recall-first & honest. We weight catching a leak over tidy precision, and we publish our misses, our confidence intervals, and our limitations rather than a single flattering number.
  • Harm-reduction, not a guarantee. De-identification lowers risk; it does not make a transcript safe to share freely. We say so plainly.

The red part: what we don't hide

residual risk: RED

Under named regulatory risks (the WP29 singling-out / linkability / inference triad, plus a HIPAA-inspired checklist), the default Russian stack lands at the RED tier: a direct identifier can still leak at the strict entity bar, and ~31% of quasi-identifiers survive the redaction. A local inference attack recovers attributes from the masked text; this is a lower bound, since a frontier model would recover more.

We report this prominently by design. A tool that claimed "anonymized, safe to upload" would be the dangerous one. CONFIDE's value is the opposite: it shows exactly how much re-identification surface remains, so a clinician can decide, with the client, whether a given session may leave the room at all.

What this is, and is NOT

Not a HIPAA/GDPR compliance certificate. Types map loosely to Safe-Harbor / GDPR identifier concepts, but the mapping is illustrative, not legal certification. The one real-text exception in the data is the external RU-real (JayGuard) slice: anonymized, non-clinical, public Russian text under an Apache-2.0 licence. Samples are small, so per-type numbers are directional.

Gold labels are currently located from curated answer-keys and hand-verified. This makes the benchmark a planted-signal recovery eval, not yet an independently annotated one. That is exactly why multi-annotator labelling is the project's most-wanted contribution.

Who & where

CONFIDE is built by Gleb Kalinin and contributors, for the Psychodemia 2026 masterclass. The tooling, gold, and the full datasheet are released openly so others can build safer mental-health AI.

Built on the de-identification and re-identification literature: TAB (Pilán et al. 2022), i2b2/n2c2, Microsoft Presidio-research, Staab et al. on LLM inference attacks, and the Datasheets for Datasets / Data Statements documentation standards. Full credits are in the report references.