The toolkit

Ten skills, a red team, and a benchmark to keep them honest

CONFIDE isn't only a scorecard. It ships the tool that does the redaction (a local anonymizer skill), a package of nine companion skills for what comes after, and the tool that tries to break it all: CONFIDE-Red, a re-identification red-team.

The skill

session-anonymizer — local PII redaction for your agent

A three-layer, fully-local de-identifier for therapy / coaching session transcripts (Russian & English). It installs straight into your coding agent as a skill, so you can say "anonymize this transcript" and it runs on your own machine — no raw client data leaves it.

layer	tool	catches
1	Natasha	Russian names, locations, organizations
2	regex	emails, URLs, phones, structured IDs (policy / account / card)
3	Ollama LLM	medications, dates, contextual IDs

Plus a Russian entity-propagation pass that masks every variant of a detected name (inflected, possessive, vocative, transliterated) — closing the strict entity-level leaks the benchmark surfaced.
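The structured-ID layer and the propagation pass are the simplest parts of the stack to illustrate. The Python sketch below is hypothetical: the regex patterns, the `[LABEL]` placeholders, and the rule "strip a final vowel or soft sign, then allow up to three trailing Cyrillic letters" are assumptions, not the skill's actual code. In the real stack, the names would come from Natasha (layer 1), and this sketch covers inflected case endings only, not vocatives or transliterations.

```python
import re

# Layer 2 (hypothetical patterns): structured identifiers -> typed placeholders.
STRUCTURED = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://\S+"),
    # Deliberately loose: optional "+", then 10+ chars of digits, spaces,
    # brackets and hyphens that start and end with a digit.
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}


def mask_structured(text: str) -> str:
    """Replace each structured identifier with its placeholder, e.g. [EMAIL]."""
    for label, pattern in STRUCTURED.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Propagation (hypothetical rule): derive a stem by stripping a final vowel,
# soft sign or "й", then mask the stem plus up to three Cyrillic letters.
def variants_pattern(name: str) -> re.Pattern:
    stem = name[:-1] if name[-1].lower() in "аяеоьй" else name
    return re.compile(rf"\b{re.escape(stem)}[а-яё]{{0,3}}\b", re.IGNORECASE)


def propagate(text: str, names: list[str], label: str = "[NAME]") -> str:
    """Mask every inflected variant of each already-detected name."""
    for name in names:
        text = variants_pattern(name).sub(label, text)
    return text


text = "Мария пишет с anna@example.com. Я видел Марию и говорил с Марией."
print(propagate(mask_structured(text), ["Мария"]))
# → [NAME] пишет с [EMAIL]. Я видел [NAME] и говорил с [NAME].
```

Stem matching errs toward over-masking: the stem "Мари" also catches "Марина", and short names produce short stems that can hit unrelated words. For redaction, that is the safer failure mode than a leaked variant.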

Install in one command

The skill is distributed with the skills CLI. From your project (or with --global for all projects):

npx skills add glebis/confide

This clones the repo, finds the session-anonymizer skill, auto-detects your agent, and installs the skill there. Then run the skill's one-time setup, ./setup.sh, from inside the skill folder: it installs the three local layers and is idempotent, so running it again is safe.

Target a specific agent

npx skills add glebis/confide --agent claude-code   # or: cursor, codex, windsurf, …
npx skills add glebis/confide --agent '*'           # install to every detected agent

Install globally (all your projects)

npx skills add glebis/confide --global

Just try it once, without installing

npx skills use glebis/confide@session-anonymizer

Manual install (no npx)

git clone https://github.com/glebis/confide
cp -r confide/skills/session-anonymizer ~/.claude/skills/   # or your agent's skills dir
cd ~/.claude/skills/session-anonymizer && ./setup.sh

:::

The companion skills

Nine skills for what comes after anonymization

Redaction is step one. The same open package ships nine companion skills — analysis lenses, a supervision self-check, client-facing interventions, and the synthetic-data generator that feeds this benchmark. All are plain-prose skills that run on Claude Code and Codex; analysis output is in Russian (the audience language).

The rule that protects the whole package: local tools ≠ local inference. These agents reason in the cloud by default, so every analysis skill operates only on already-anonymized, human-reviewed text — and a safety-boundary scan (flagging material that needs a human protocol) runs before any analysis. It flags; it never scores risk.
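This ordering rule can be enforced in code rather than left to convention. Below is a minimal Python sketch, assuming a hypothetical `Transcript` record with `anonymized` and `human_reviewed` flags and a placeholder keyword list; the real safety-boundary scan is a prose skill, and its criteria are not reproduced here. The gate refuses text that has not been anonymized and reviewed, always runs the scan first, and attaches flags without computing any risk score.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transcript:
    text: str
    anonymized: bool = False      # set once session-anonymizer has run
    human_reviewed: bool = False  # set after the manual quasi-identifier check


# Placeholder terms only (hypothetical); a real scan needs a clinically reviewed list.
SAFETY_TERMS = ("suicide", "self-harm", "суицид")


def safety_scan(t: Transcript) -> list[str]:
    """Return the matched terms. Flags only: no score, no risk level."""
    lowered = t.text.lower()
    return [term for term in SAFETY_TERMS if term in lowered]


def run_analysis(t: Transcript, analyze: Callable[[str], object]) -> dict:
    """Refuse to send text to a cloud model until it is anonymized and reviewed."""
    if not (t.anonymized and t.human_reviewed):
        raise PermissionError("transcript must be anonymized and human-reviewed first")
    flags = safety_scan(t)  # always runs before the analysis itself
    return {"flags": flags, "result": analyze(t.text)}
```

This sketch only reports flags alongside the result; what happens to flagged material is left to the human protocol.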

infrastructure

`synthetic-session-generator`

Realistic, fully fictional session transcripts for evals, demos, and training data — the kind of corpus this benchmark runs on. No real client ever needed.

analysis

`cbt-session-analysis`

One anonymized session in; the client’s cognitive distortions out — named, quoted as evidence, and confidence-rated (Diagnosis-of-Thought structure).

analysis

`multi-session-patterns`

A client’s whole corpus at once: distortion trends, theme evolution, avoidance candidates, and homework assigned-vs-checked.

analysis

`cbt-supervision`

A structural self-check of the therapist’s own session — a 10-element CBT protocol score plus five questions to bring to supervision.

lens

`act-lens`

The same corpus read through ACT psychological flexibility — hexaflex processes, towards/away moves. Exploratory, hypotheses only.

lens

`psychodynamic-lens`

A cautious psychodynamic reading: the CCRT relationship pattern, coarse defense groups, transference hints. Lowest confidence — deliberately last in line.

intervention

`cognitive-toolkit`

Client-facing guided CBT/DBT exercises — thought records, opposite action. An intervention you do, not an analysis you read.

intervention

`mental-toolkit`

Structured self-management: timed worry processing, scenario planning, anxiety check-ins.

infrastructure

`meeting-processor`

Auto-detects what kind of transcript it’s looking at and applies type-specific extraction — including coaching and therapy sessions.

The pipeline order (it matters)

1. Anonymize locally (session-anonymizer), followed by a manual quasi-identifier check.
2. Safety-boundary scan.
3. Single-session analysis.
4. Multi-session patterns.
5. Supervision self-check.
6. The interpretive lenses, last and at lowest confidence.

Every observation follows a fixed schema: evidence quote → interpretation → alternative → confidence in words → clinical action → boundary. No diagnoses, no risk scoring, no treatment decisions.
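The fixed observation schema maps naturally onto a record type that rejects incomplete entries. A minimal Python sketch follows; the field names mirror the schema's six parts, while the set of allowed confidence words is a hypothetical choice, since the source says only that confidence is stated in words.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: the schema requires confidence in words, not numbers.
CONFIDENCE_WORDS = {"low", "moderate", "high"}


@dataclass(frozen=True)
class Observation:
    evidence_quote: str   # verbatim line from the anonymized transcript
    interpretation: str
    alternative: str      # a competing reading of the same evidence
    confidence: str       # a word from CONFIDENCE_WORDS, never a number
    clinical_action: str
    boundary: str         # what this observation does not license

    def __post_init__(self) -> None:
        if not self.evidence_quote.strip():
            raise ValueError("an observation must start from an evidence quote")
        if not self.alternative.strip():
            raise ValueError("an observation must name an alternative reading")
        if self.confidence not in CONFIDENCE_WORDS:
            raise ValueError(f"confidence must be one of {sorted(CONFIDENCE_WORDS)}")


obs = Observation(
    evidence_quote="«Если я ошибусь, все поймут, что я некомпетентна»",
    interpretation="possible catastrophizing about a single mistake",
    alternative="a realistic concern given a recent performance review",
    confidence="moderate",
    clinical_action="explore evidence for and against in the next session",
    boundary="a hypothesis for supervision, not a diagnosis",
)
```

Validating at construction time means a numeric confidence such as "0.8" or a missing alternative fails loudly instead of slipping into a report.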

Install the package

npx skills add glebis/psychodemia-2026        # all skills, auto-detects your agent
git clone https://github.com/glebis/psychodemia-2026   # or copy/symlink skill folders manually

Each skill is a folder with a single plain-prose SKILL.md; see the package's INDEX for per-skill docs and the shared discipline.

:::

The red-team

CONFIDE-Red — try to re-identify what survived

Removing names is routinely mistaken for "safe." CONFIDE-Red is the adversary that checks: it runs the three re-identification attacks from the Article 29 Working Party's anonymisation guidance (inference, singling out, linkability) against the redacted output and reports what a motivated reader could still recover.

Inference

A local LLM attacker, given only the masked transcript, guesses the client's attributes (top-k). Because the attacker is a weak local model, its results are a lower bound: a stronger frontier model recovers more.
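A top-k inference result can be scored as a hit rate: an attribute counts as recovered when its true value appears among the attacker's first k guesses. The sketch below assumes the guesses have already been parsed out of the attacker LLM's reply into ranked lists; the function name, the data layout, and the fictional persona are illustrative, not CONFIDE-Red's API.

```python
def top_k_hit_rate(guesses: dict[str, list[str]], truth: dict[str, str], k: int = 3) -> float:
    """Fraction of true attributes found in the attacker's top-k ranked guesses.

    An attribute the attacker did not guess at all counts as a miss.
    """
    if not truth:
        return 0.0
    hits = sum(truth[attr] in guesses.get(attr, [])[:k] for attr in truth)
    return hits / len(truth)


# Fictional persona and guesses, for illustration only.
truth = {"age_band": "30-39", "city": "Kazan", "occupation": "teacher"}
guesses = {
    "age_band": ["20-29", "30-39", "40-49"],
    "city": ["Moscow", "Saint Petersburg", "Novosibirsk"],
    "occupation": ["teacher"],
}
print(top_k_hit_rate(guesses, truth, k=3))  # 2 of 3 attributes recovered
print(top_k_hit_rate(guesses, truth, k=1))  # 1 of 3 attributes recovered
```

Raising k models a reader who can shortlist a few candidates and check them elsewhere, which is why the score rises as k grows.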

Singling-out

Do the surviving quasi-identifiers, combined, narrow the population to one person? Estimated against sourced priors (illustrative on synthetic personas, not a probability).
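One common way to turn sourced priors into a singling-out estimate is to multiply the population size by each surviving quasi-identifier's frequency, giving the expected number of people who match all of them. This sketch assumes the quasi-identifiers are independent, which real attributes rarely are; the function names and the frequencies are made up for illustration and are not necessarily how CONFIDE-Red computes its estimate. The output is an expected count, not a probability.

```python
from math import prod


def expected_matches(population: int, qi_frequencies: dict[str, float]) -> float:
    """Expected number of people sharing every quasi-identifier.

    Assumes independence: population * f1 * f2 * ... * fn.
    """
    return population * prod(qi_frequencies.values())


def singled_out(population: int, qi_frequencies: dict[str, float]) -> bool:
    """True when fewer than one matching person is expected."""
    return expected_matches(population, qi_frequencies) < 1.0


# Illustrative priors only (hypothetical frequencies, not sourced data).
freqs = {"gender": 0.5, "age_band": 0.1, "occupation": 0.01, "district": 0.02}
print(expected_matches(1_000_000, freqs))  # roughly 10 people, so not singled out
```

Adding one more rare attribute (say, a 5% clinic-attendance prior) drops the expected count to about 0.5, which is the kind of combination this attack looks for.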

Linkability

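Can two redacted sessions be matched to the same person? Reported as a pairwise ROC AUC: a 0–1 score for how reliably the attacker's similarity scores separate same-person session pairs from different-person pairs, where 1.0 = perfectly linkable and 0.50 = no better than a coin flip. Close to 0.50 is the safe target; a score well below 0.50 still leaks, because the attacker can simply invert the ranking.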
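Pairwise ROC AUC can be computed without drawing a curve: it equals the probability that a randomly chosen same-person pair gets a higher attacker similarity score than a randomly chosen different-person pair, with ties counted as half (the Mann-Whitney formulation). A minimal stdlib sketch follows; how CONFIDE-Red produces the similarity scores is not shown, and the function name is hypothetical.

```python
def pairwise_auc(same_person: list[float], different_person: list[float]) -> float:
    """ROC AUC as P(same-person pair score > different-person pair score).

    Ties count as half a win. O(n * m) comparisons, fine for small session sets.
    """
    if not same_person or not different_person:
        raise ValueError("need at least one pair of each kind")
    wins = 0.0
    for s in same_person:
        for d in different_person:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_person) * len(different_person))


print(pairwise_auc([0.9, 0.3], [0.5, 0.1]))  # 0.75: three of four comparisons won
```

For large corpora a rank-based computation is faster, but the quadratic loop keeps the definition visible.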

This is the same attack suite behind the report's residual-risk (RED) tier. It's why we say the honest part out loud: even the best stack leaves a re-identification surface, and CONFIDE-Red is how you measure it before a session ever leaves the room.

Run it (from the repo)

git clone https://github.com/glebis/confide && cd confide
python -m confide_eval.redteam.confide_red --dataset ru