About

De-identification, in plain language

A sticker-checking game for robots: how good are different tools at finding every private detail in a therapy transcript — without erasing the parts a therapist still needs?

De-identification is not anonymisation

The two words get used interchangeably, but they make very different promises — and the difference is the reason this benchmark exists.

De-identification is a process: finding and masking the identifying details in a text, from names and phone numbers to subtler clues. It lowers the risk of someone being recognised; it does not reduce that risk to zero. Whatever the tools miss leaks.
Anonymisation is a result — and in European data-protection law (GDPR) a strong legal claim: the data can no longer be traced back to a person by any means reasonably likely to be used, irreversibly. Truly anonymous data falls outside the GDPR altogether.

CONFIDE deliberately says de-identification. A redacted therapy transcript is rarely anonymous in the legal sense — a handful of surviving everyday details (an age, a profession, a city) can still single someone out. So instead of claiming anonymity, the benchmark measures the gap: what the tools caught, what slipped through, and how close to anonymous the result actually is (the residual-risk section of the report).
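The "finding and masking" process described above can be sketched for the easiest case: pattern-shaped direct identifiers such as phone numbers and email addresses. This is a minimal sketch that assumes plain regular expressions. `PATTERNS`, `find_spans` and `mask` are hypothetical names, not CONFIDE's actual rule-checker, and real phone formats need far more care. Anything the patterns fail to match stays in the text, and that is exactly the kind of leak the benchmark counts.

```python
import re

# Hypothetical, deliberately simple patterns; not CONFIDE's actual rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}


def find_spans(text):
    """Return (start, end, label) for every pattern match, sorted by start offset."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def mask(text, spans):
    """Replace each span with a [LABEL] placeholder.

    Works right to left so earlier offsets stay valid; assumes spans do not overlap.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


if __name__ == "__main__":
    note = "Call me at +7 912 345-67-89 or write to anna@example.com."
    print(mask(note, find_spans(note)))
    # prints: Call me at [PHONE] or write to [EMAIL].
```

Rules like these are cheap and predictable for fixed formats, but they cannot see names or quasi-identifiers, which is why the stack adds other tools.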

Figure: Quasi-identifiers: ordinary alone, identifying together. Four ordinary attributes (age, workplace, city, medication), each harmless alone, together pinpoint one person.

The two kinds of clues

De-identification means removing the clues that reveal who someone is. There are two kinds:

Direct identifiers — a name, phone, email, ID number. One of these alone points to you.
Quasi-identifiers — your age, your job, your city, your employer, a medication. Each is shared by many people, but stack three or four together and only one person matches.
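The "stack three or four together" effect can be checked by counting how many people share a combination of values. That count is the group size k used in k-anonymity, a standard way of reasoning about quasi-identifiers. A minimal sketch over a made-up five-person population (the data and `matching_people` are hypothetical):

```python
# A made-up five-person population; every value is fictional.
population = [
    {"age": 34, "job": "nurse", "city": "Kazan", "medication": "sertraline"},
    {"age": 34, "job": "nurse", "city": "Kazan", "medication": "none"},
    {"age": 34, "job": "teacher", "city": "Kazan", "medication": "sertraline"},
    {"age": 41, "job": "nurse", "city": "Kazan", "medication": "sertraline"},
    {"age": 34, "job": "nurse", "city": "Omsk", "medication": "sertraline"},
]


def matching_people(rows, person, attributes):
    """Count rows that agree with `person` on every listed attribute (the k of k-anonymity)."""
    return sum(all(row[a] == person[a] for a in attributes) for row in rows)


if __name__ == "__main__":
    client = population[0]
    for attrs in (["city"], ["age", "city"], ["age", "job", "city"],
                  ["age", "job", "city", "medication"]):
        print(attrs, matching_people(population, client, attrs))
    # counts shrink 4, 3, 2, 1: with all four attributes only the client matches
```

Real populations are far larger, but the pattern holds: adding an attribute never grows the group, and a group of one means the person is singled out.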

CONFIDE-Bench tests a team of tools working together: a rule-checker for patterns (phone numbers, emails), a name-finder for Russian (Natasha), the OpenAI Privacy Filter for English, and a small local language model (Qwen2.5-3B via Ollama) for the trickier clues that take world knowledge to spot. The scorecard answers three questions:

Did we catch it? We count misses; a miss is a leak.
Did we erase too much? Covering up real therapy content has its own cost (utility).
Could someone still guess? We let a local AI try to re-identify the client from what's left.
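The "team of tools" and the "a miss is a leak" rule can be sketched with character offsets. This sketch assumes, hypothetically, that the detectors' outputs are combined by union and that a gold span only counts as caught when it is masked in full; CONFIDE's actual merging and matching rules may differ, and `union_spans` and `count_misses` are illustrative names.

```python
def union_spans(*detector_outputs):
    """Merge (start, end) spans from several detectors.

    Recall-first: a character is masked if ANY detector flagged it.
    Overlapping or touching spans are fused into one.
    """
    spans = sorted(span for output in detector_outputs for span in output)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_misses(gold, masked):
    """Count gold spans not fully inside one masked span; each one is a leak.

    Assumes `masked` comes from union_spans, so adjacent pieces are already fused.
    """
    return sum(
        not any(m_start <= g_start and g_end <= m_end for m_start, m_end in masked)
        for g_start, g_end in gold
    )


if __name__ == "__main__":
    rules = [(10, 22)]              # e.g. a phone number
    ner = [(30, 35), (33, 40)]      # two overlapping name hits
    llm = [(50, 58)]                # e.g. an employer
    masked = union_spans(rules, ner, llm)
    print(masked)                   # [(10, 22), (30, 40), (50, 58)]
    print(count_misses([(10, 22), (30, 40), (60, 66)], masked))  # 1
```

Taking the union is the recall-first choice: each extra detector can only add masking, which lowers misses but may erase more therapy content, the cost the utility question tracks.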

Why this benchmark is different

Bilingual. Almost every de-id benchmark is English-only; CONFIDE covers Russian and English, with Russian as the priority and the harder side.
Therapy, not hospital notes. Existing clinical de-id work targets structured medical records; therapy conversation is messier and more revealing.
Recall-first, entity-level. Scored the way privacy researchers score de-identification: a recall-weighted F2, plus the strict TAB (Text Anonymization Benchmark) bar, under which an entity counts as "protected" only if all of its mentions are masked.
Local-first. The default stack runs on the therapist's own machine — nothing sensitive goes to the cloud to be de-identified.
Honest about what survives. It reports the residual re-identification surface and maps it onto named regulatory risks (WP29 singling-out / linkability / inference; a HIPAA-inspired checklist).
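The two scores named in the list above can be written down in a few lines. A minimal sketch, assuming the standard F-beta definition with beta = 2 and a simplified reading of the TAB rule in which mentions are grouped by entity and an entity counts as protected only when every mention is masked. The function names and example entities are hypothetical, and TAB's full evaluation protocol has details not modelled here.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R).

    With beta = 2, recall counts as twice as important as precision.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def entity_recall(entities, masked_mentions):
    """Share of entities whose mentions are ALL masked (simplified TAB-style rule).

    `entities` maps an entity id to the set of its mention ids;
    `masked_mentions` is the set of mention ids the system masked.
    """
    if not entities:
        return 1.0
    protected = sum(mentions <= masked_mentions for mentions in entities.values())
    return protected / len(entities)


if __name__ == "__main__":
    # Same F1 (0.667) either way, but F2 punishes the low-recall system harder.
    print(round(f_beta(0.5, 1.0), 3))  # 0.833
    print(round(f_beta(1.0, 0.5), 3))  # 0.556

    entities = {
        "client_name": {"m1", "m2", "m3"},
        "employer": {"m4"},
        "city": {"m5", "m6"},  # m6 slips through, so the city is NOT protected
    }
    masked = {"m1", "m2", "m3", "m4", "m5"}
    print(round(entity_recall(entities, masked), 3))  # 0.667, versus 5/6 mention-level
```

The example shows why the entity-level bar is stricter: five of six mentions were masked, yet one surviving mention is enough to leave the city exposed.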

Ethics, consent & our stance

Consent first — and honesty about what survives

Explicit consent for every recording

The benchmark is built on synthetic, fictional transcripts — no real patients. But the tools it measures are meant, eventually, for real therapy sessions, and there our position is unambiguous: any real recording or transcript must have the client's explicit, informed consent before it is recorded, stored, or processed — including being run through these de-identification tools. Consent is specific (this purpose), revocable, and documented; processing is local-only by default, so nothing identifiable leaves the therapist's machine. No real session data should ever be re-identified, and none is exposed to annotators.

Our stance

Privacy-first, local-first. The default stack runs on the therapist's own machine; the cloud is opt-in, after redaction, and only with consent.
Recall-first & honest. We weight catching a leak over tidy precision, and we publish our misses, our confidence intervals, and our limitations rather than a single flattering number.
Harm-reduction, not a guarantee. De-identification lowers risk; it does not make a transcript safe to share freely. We say so plainly.

The red part — what we don't hide

What this is — and is NOT

Not a HIPAA/GDPR compliance certificate. Entity types map loosely to HIPAA Safe Harbor and GDPR identifier concepts, but the mapping is illustrative, not legal certification. Apart from the synthetic transcripts, the one real-text slice in the data is the external RU-real (JayGuard) set: anonymised, non-clinical, public Russian text under an Apache-2.0 licence. Samples are small, so per-type numbers are directional only.

Gold labels are currently derived from curated answer keys and then hand-verified. The evaluation therefore measures how well the tools recover planted signals; the labels have not yet been independently annotated. That is exactly why multi-annotator labelling is the project's most-wanted contribution.

Who & where

CONFIDE is built by Gleb Kalinin and contributors, for the Psychodemia 2026 masterclass. The tooling, gold, and the full datasheet are released openly so others can build safer mental-health AI.

Built on the de-identification and re-identification literature — TAB (Pilán et al. 2022), i2b2/n2c2, Microsoft Presidio-research, Staab et al. on LLM inference attacks, and the Datasheets for Datasets / Data Statements documentation standards. Full credits are in the report references.

Code & docs on GitHub
How the benchmark works