About
De-identification, in plain language
A sticker-checking game for robots: how good are different tools at finding every private detail in a therapy transcript without erasing the parts a therapist still needs?
De-identification is not anonymisation
The two words get used interchangeably, but they make very different promises, and the difference is the reason this benchmark exists.
- De-identification is a process: finding and masking the identifying details in a text (names, phone numbers, and the subtler clues). It lowers the risk of recognising someone; it does not promise zero. Whatever the tools miss leaks.
- Anonymisation is a result and, in European data-protection law (GDPR), a strong legal claim: the data can no longer be traced back to a person, irreversibly, by any means reasonably likely to be used. Truly anonymous data falls outside the GDPR altogether.
CONFIDE deliberately says de-identification. A redacted therapy transcript is rarely anonymous in the legal sense: a handful of surviving everyday details (an age, a profession, a city) can still single someone out. So instead of claiming anonymity, the benchmark measures the gap: what the tools caught, what slipped through, and how close to anonymous the result actually is (the residual-risk section of the report).
The two kinds of clues
De-identification means removing the clues that reveal who someone is. There are two kinds:
- Direct identifiers: a name, phone number, email, ID number. One of these alone points to you.
- Quasi-identifiers: your age, your job, your city, your employer, a medication. Each is shared by many people, but stack three or four together and only one person matches.
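A minimal sketch of why stacking quasi-identifiers matters, using a made-up five-person population. The records and the `matching` helper are hypothetical illustrations, not CONFIDE data or code.

```python
# Toy, invented population records: purely illustrative.
PEOPLE = [
    {"age": 34, "job": "teacher", "city": "Kazan"},
    {"age": 34, "job": "teacher", "city": "Moscow"},
    {"age": 34, "job": "nurse", "city": "Kazan"},
    {"age": 51, "job": "teacher", "city": "Kazan"},
    {"age": 34, "job": "nurse", "city": "Moscow"},
]


def matching(people, **clues):
    """Return the records consistent with every known clue."""
    return [p for p in people if all(p[k] == v for k, v in clues.items())]


# Each extra clue shrinks the set of people who could be the client.
one_clue = matching(PEOPLE, age=34)
two_clues = matching(PEOPLE, age=34, job="teacher")
three_clues = matching(PEOPLE, age=34, job="teacher", city="Kazan")

print(len(one_clue), len(two_clues), len(three_clues))  # prints: 4 2 1
```

No single clue identifies anyone here, yet the three together leave exactly one candidate.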
CONFIDE-Bench tests a team of tools working together: a rule-checker for patterns (phones, emails), a name-finder for Russian (Natasha), the OpenAI Privacy Filter for English, and a small local language model (Qwen2.5-3B via Ollama) for the trickier, world-knowledge clues. The scorecard answers three questions:
- Did we catch it? We count misses: a miss is a leak.
- Did we erase too much? Covering up real therapy content is its own cost (utility).
- Could someone still guess? We let a local AI try to re-identify the client from what's left.
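The pattern-based rule-checker in the tool team can be pictured roughly as below. This is a simplified sketch: the two regular expressions, the `[EMAIL]` / `[PHONE]` placeholders, and the `mask` function are assumptions made for illustration, not CONFIDE's actual rules, and real phone and email formats are far more varied.

```python
import re

# Deliberately simple patterns (assumed for illustration only).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# An optional "+", then digits separated by spaces, dots, dashes or brackets.
PHONE = re.compile(r"\+?\d[\d\s().-]{8,}\d")


def mask(text):
    """Replace every email and phone-like span with a placeholder."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text


sample = "Call me at +7 912 345-67-89 or write to anna@example.com tomorrow."
masked = mask(sample)
print(masked)  # prints: Call me at [PHONE] or write to [EMAIL] tomorrow.
```

Rules like these are cheap and predictable for direct identifiers with a fixed shape, but they cannot see a profession or a city, which is why the team also includes model-based tools.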
Why this benchmark is different
- Bilingual. Almost every de-id benchmark is English-only; CONFIDE covers Russian and English, with Russian as the priority and the harder side.
- Therapy, not hospital notes. Existing clinical de-id work targets structured medical records; therapy conversation is messier and more revealing.
- Recall-first, entity-level. Scored the way privacy researchers do: with a recall-weighted F2 and the strict TAB bar, where an entity is "protected" only if all its mentions are masked.
- Local-first. The default stack runs on the therapist's own machine; nothing sensitive goes to the cloud to be de-identified.
- Honest about what survives. It reports the residual re-identification surface and maps it onto named regulatory risks (WP29 singling-out / linkability / inference; a HIPAA-inspired checklist).
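The two recall-first scores named above can be sketched as follows. This assumes the standard F-beta definition with beta = 2 and a plain reading of the "all mentions masked" rule; the function names, entity ids, and numbers are hypothetical, and this is not TAB's or CONFIDE's reference implementation.

```python
def f_beta(precision, recall, beta=2.0):
    """Standard F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def entity_recall(entities, masked):
    """Share of entities whose mentions are ALL masked (the strict bar).

    entities: dict mapping an entity id to the set of its mention ids.
    masked:   set of mention ids that the tools covered.
    """
    protected = [e for e, mentions in entities.items() if mentions <= masked]
    return len(protected) / len(entities)


# Invented example: a name mentioned three times and a phone mentioned once.
entities = {"client_name": {"m1", "m2", "m3"}, "phone": {"m4"}}
masked = {"m1", "m2", "m4"}  # one mention of the name slipped through

mention_recall = len(masked) / sum(len(m) for m in entities.values())
strict_recall = entity_recall(entities, masked)
print(mention_recall, strict_recall)  # prints: 0.75 0.5

# With high precision but weak recall, F2 drops below F1.
print(round(f_beta(0.9, 0.6, beta=1.0), 3), round(f_beta(0.9, 0.6), 3))  # prints: 0.72 0.643
```

A single missed mention is enough to count the whole entity as leaked, so the strict entity score is never higher than the mention-level one would suggest.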
Ethics, consent & our stance
Consent first, and honesty about what survives
Explicit consent for every recording
The benchmark is built on synthetic, fictional transcripts, with no real patients. But the tools it measures are meant, eventually, for real therapy sessions, and there our position is unambiguous: any real recording or transcript must have the client's explicit, informed consent before it is recorded, stored, or processed, including being run through these de-identification tools. Consent is specific (this purpose), revocable, and documented; processing is local-only by default, so nothing identifiable leaves the therapist's machine. No real session data should ever be re-identified, and none is exposed to annotators.
Our stance
- Privacy-first, local-first. The default stack runs on the therapist's own machine; the cloud is opt-in, after redaction, and only with consent.
- Recall-first & honest. We weight catching a leak over tidy precision, and we publish our misses, our confidence intervals, and our limitations rather than a single flattering number.
- Harm-reduction, not a guarantee. De-identification lowers risk; it does not make a transcript safe to share freely. We say so plainly.
The red part: what we don't hide
Under named regulatory risks (the WP29 singling-out / linkability / inference triad, plus a HIPAA-inspired checklist), the default Russian stack lands at the RED tier: a direct identifier can still leak at the strict entity bar, and ~31% of quasi-identifiers survive the redaction. A local inference attack recovers attributes from the masked text; this is a lower bound, since a frontier model would recover more.
We report this prominently by design. A tool that claimed "anonymized, safe to upload" would be the dangerous one. CONFIDE's value is the opposite: it shows exactly how much re-identification surface remains, so a clinician can decide, with the client, whether a given session may leave the room at all.
What this is, and is NOT
Not a HIPAA/GDPR compliance certificate. Types map loosely to Safe-Harbor / GDPR identifier concepts, but the mapping is illustrative, not legal certification. The one real-text exception in the data is the external RU-real (JayGuard) slice β anonymized, non-clinical, public Russian text under an Apache-2.0 licence. Samples are small, so per-type numbers are directional.
Gold labels are currently located from curated answer-keys and hand-verified: a planted-signal recovery eval, not yet independently annotated. That is exactly why multi-annotator labelling is the project's most-wanted contribution.
Who & where
CONFIDE is built by Gleb Kalinin and contributors, for the Psychodemia 2026 masterclass. The tooling, gold, and the full datasheet are released openly so others can build safer mental-health AI.
Built on the de-identification and re-identification literature: TAB (Pilán et al. 2022), i2b2/n2c2, Microsoft Presidio-research, Staab et al. on LLM inference attacks, and the Datasheets for Datasets / Data Statements documentation standards. Full credits are in the report references.