The toolkit

Ten skills, a red team, and a benchmark to keep them honest

CONFIDE isn't only a scorecard. It ships the thing that does the redaction (a local anonymizer skill), a package of nine companion skills for what comes after, and the thing that tries to break it all: CONFIDE-Red, a re-identification red-team.

The skill

session-anonymizer: local PII redaction for your agent

A three-layer, fully local de-identifier for therapy / coaching session transcripts (Russian & English). It installs straight into your coding agent as a skill, so you can say "anonymize this transcript" and it runs on your own machine: no raw client data leaves it.

layer  tool        catches
1      Natasha     Russian names, locations, organizations
2      regex       emails, URLs, phones, structured IDs (policy / account / card)
3      Ollama LLM  medications, dates, contextual IDs
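
To make layer 2 concrete, here is a minimal sketch of a regex pass over structured identifiers. The patterns, the labels, and the function name `mask_structured` are assumptions made for illustration; the skill's actual rules are not shown on this page and are likely broader.

```python
import re

# Hypothetical patterns, not the skill's real rule set.
# Order matters: cards are masked before phones, because the loose
# phone pattern would otherwise swallow a 16-digit card number.
PATTERNS = [
    ("URL", re.compile(r"https?://\S+")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("CARD", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{8,}\d")),
]


def mask_structured(text: str) -> str:
    """Replace each structured identifier with a bracketed label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = (
        "Write to anna@example.com or call +7 912 345-67-89, "
        "card 4111 1111 1111 1111, see https://example.org/x"
    )
    print(mask_structured(sample))
```

Regexes like these are cheap and deterministic, which is presumably why they sit between the NER layer and the slower LLM layer; they only catch identifiers with a fixed shape.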

On top of these, a Russian entity-propagation pass masks every variant of a detected name (inflected, possessive, vocative, transliterated), closing the strict entity-level leaks the benchmark surfaced.
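
The idea behind propagation can be sketched with a deliberately crude stand-in: drop one trailing ending letter from a detected name and mask any word that starts with the resulting stem. This is an assumption-laden illustration, not the skill's implementation. The helper names are hypothetical, real morphology needs a proper analyzer, and transliterated or irregular forms are not handled here.

```python
import re

# Letters that commonly end a Russian name's base form (assumption).
ENDING_LETTERS = "аяйьеиоуыю"


def _stem(name: str) -> str:
    """Drop a single trailing ending letter, keeping at least three letters."""
    lowered = name.lower()
    if len(lowered) > 3 and lowered[-1] in ENDING_LETTERS:
        return lowered[:-1]
    return lowered


def propagate_name_mask(text: str, detected_names: list, label: str = "[NAME]") -> str:
    """Mask every word that begins with the stem of a detected name.

    Up to three extra Cyrillic letters are allowed after the stem, which
    covers common case endings and errs on the side of over-masking.
    """
    for name in detected_names:
        pattern = re.compile(
            rf"\b{re.escape(_stem(name))}[а-яё]{{0,3}}\b", re.IGNORECASE
        )
        text = pattern.sub(label, text)
    return text


if __name__ == "__main__":
    sentence = "Анна пришла. Я говорил с Анной и звонил Анне, но Анны не было."
    print(propagate_name_mask(sentence, ["Анна"]))
```

Over-masking is the intended bias in this sketch: a wrongly masked word costs some readability, while a missed inflected name is a leak.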

Install in one command

The skill is distributed with the skills CLI. From your project (or with --global for all projects):

npx skills add glebis/confide

This clones the repo, finds the session-anonymizer skill, auto-detects your agent, and installs it. Then run the skill's one-time setup, ./setup.sh, from inside the skill folder; it installs the three local layers and is idempotent, so re-running it is safe.

Target a specific agent

npx skills add glebis/confide --agent claude-code   # or: cursor, codex, windsurf, …
npx skills add glebis/confide --agent '*'           # install to every detected agent

Install globally (all your projects)

npx skills add glebis/confide --global

Just try it once, without installing

npx skills use glebis/confide@session-anonymizer

Manual install (no npx)

git clone https://github.com/glebis/confide
cp -r confide/skills/session-anonymizer ~/.claude/skills/   # or your agent's skills dir
cd ~/.claude/skills/session-anonymizer && ./setup.sh

Consent first

The skill is local-first by design, but it never replaces explicit, informed client consent for recording and processing a real session. De-identification is harm reduction, not anonymization; see our ethics & consent stance.

The companion skills

Nine skills for what comes after anonymization

Redaction is step one. The same open package ships nine companion skills: analysis lenses, a supervision self-check, client-facing interventions, and the synthetic-data generator that feeds this benchmark. All are plain-prose skills that run on Claude Code and Codex; analysis output is in Russian (the audience language).

The rule that protects the whole package: local tools ≠ local inference. These agents reason in the cloud by default, so every analysis skill operates only on already-anonymized, human-reviewed text, and a safety-boundary scan (flagging material that needs a human protocol) runs before any analysis. It flags; it never scores risk.

infrastructure

synthetic-session-generator

Realistic, fully fictional session transcripts for evals, demos, and training data: the kind of corpus this benchmark runs on. No real client ever needed.

analysis

cbt-session-analysis

One anonymized session in; the client's cognitive distortions out: named, quoted as evidence, and confidence-rated (Diagnosis-of-Thought structure).

analysis

multi-session-patterns

A client's whole corpus at once: distortion trends, theme evolution, avoidance candidates, and homework assigned-vs-checked.

analysis

cbt-supervision

A structural self-check of the therapist's own session: a 10-element CBT protocol score plus five questions to bring to supervision.

lens

act-lens

The same corpus read through ACT psychological flexibility: hexaflex processes, towards/away moves. Exploratory, hypotheses only.

lens

psychodynamic-lens

A cautious psychodynamic reading: the CCRT relationship pattern, coarse defense groups, transference hints. Lowest confidence, and deliberately last in line.

intervention

cognitive-toolkit

Client-facing guided CBT/DBT exercises such as thought records and opposite action. An intervention you do, not an analysis you read.

intervention

mental-toolkit

Structured self-management: timed worry processing, scenario planning, anxiety check-ins.

infrastructure

meeting-processor

Auto-detects what kind of transcript it's looking at and applies type-specific extraction, including for coaching and therapy sessions.

The pipeline order (it matters)

1. Anonymize locally (session-anonymizer) → manual quasi-identifier check → 2. safety-boundary scan → 3. single-session analysis → 4. multi-session patterns → 5. supervision self-check → 6. the interpretive lenses last, at lowest confidence. Every observation follows a fixed schema: evidence quote → interpretation → alternative → confidence in words → clinical action → boundary. No diagnoses, no risk scoring, no treatment decisions.
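
The ordering rule and the observation schema above can be sketched as a small data model. The skills themselves are plain-prose instructions, so nothing below is their code: the step identifiers, the `Observation` field names, and the `can_run` helper are all hypothetical, chosen only to mirror the wording of the pipeline.

```python
from dataclasses import dataclass, fields

# Step names are illustrative labels for the stages listed above.
PIPELINE = (
    "anonymize_locally",
    "manual_quasi_identifier_check",
    "safety_boundary_scan",
    "single_session_analysis",
    "multi_session_patterns",
    "supervision_self_check",
    "interpretive_lenses",
)


def can_run(step: str, completed: set) -> bool:
    """A step may run only after every earlier step has completed."""
    position = PIPELINE.index(step)  # raises ValueError for an unknown step
    return all(earlier in completed for earlier in PIPELINE[:position])


@dataclass(frozen=True)
class Observation:
    """One observation, with fields in the fixed schema order."""

    evidence_quote: str
    interpretation: str
    alternative: str
    confidence: str  # stated in words, e.g. "low" or "moderate"
    clinical_action: str
    boundary: str

    def __post_init__(self) -> None:
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{field.name} must be a non-empty string")
        if any(ch.isdigit() for ch in self.confidence):
            raise ValueError("confidence is stated in words, not as a number")
```

Rejecting digits in `confidence` is one simple way to keep the field verbal, in line with the rule that nothing in the package scores risk.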

Install the package

npx skills add glebis/psychodemia-2026        # all skills, auto-detects your agent
git clone https://github.com/glebis/psychodemia-2026   # or copy/symlink skill folders manually

Each skill is a single folder built around a plain-prose SKILL.md; see the package's INDEX for per-skill docs and the shared discipline.

The red-team

CONFIDE-Red: try to re-identify what survived

Removing names is routinely mistaken for "safe." CONFIDE-Red is the adversary that checks: it runs the three re-identification attacks described by the EU's Article 29 Working Party (inference, singling out, linkability) against the redacted output and reports what a motivated reader could still recover.

Inference

A local LLM attacker, given only the masked transcript, guesses the client's attributes (top-k). A weak local model gives only a lower bound on the risk: a frontier model recovers more.
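
One plausible way to score such an attack is a top-k hit rate: an attribute counts as recovered if the true value appears among the attacker's first k guesses. The sketch below covers only the scoring step and does not call any model. The function name, the attribute keys, and the sample values are made up, and the exact metric CONFIDE-Red reports is not specified here.

```python
def top_k_hit_rate(guesses: dict, truth: dict, k: int = 3) -> float:
    """Fraction of true attributes found in the attacker's first k guesses.

    guesses maps attribute name -> ranked list of guessed values.
    truth maps attribute name -> the persona's actual value.
    """
    if not truth:
        raise ValueError("truth must contain at least one attribute")
    hits = 0
    for attribute, true_value in truth.items():
        ranked = [g.strip().lower() for g in guesses.get(attribute, [])[:k]]
        if true_value.strip().lower() in ranked:
            hits += 1
    return hits / len(truth)


if __name__ == "__main__":
    # Synthetic persona and invented guesses, for illustration only.
    truth = {"city": "Kazan", "profession": "nurse"}
    guesses = {
        "city": ["Moscow", "Kazan", "Omsk"],
        "profession": ["teacher", "doctor", "lawyer"],
    }
    print(top_k_hit_rate(guesses, truth, k=3))
```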

Singling-out

Do the surviving quasi-identifiers, combined, narrow the population to one person? Estimated against sourced priors (illustrative on synthetic personas, not a probability).
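
A rough sketch of the arithmetic, under a strong assumption that the surviving quasi-identifiers are statistically independent: multiply the population by each attribute's prior share to get an expected number of matching people. The function names and the numbers are placeholders, not sourced priors, and the result is an expected count rather than a probability.

```python
import math


def expected_matches(population: int, priors: dict) -> float:
    """Expected number of people matching every surviving quasi-identifier.

    priors maps each quasi-identifier to the share of the population
    that has it (a value in (0, 1]). Independence is assumed.
    """
    for name, share in priors.items():
        if not 0 < share <= 1:
            raise ValueError(f"prior for {name!r} must be in (0, 1]")
    return population * math.prod(priors.values())


def is_singled_out(population: int, priors: dict) -> bool:
    """True when the combination is expected to match at most about one person."""
    matches = expected_matches(population, priors)
    return matches <= 1 or math.isclose(matches, 1.0)


if __name__ == "__main__":
    # Placeholder shares, invented for this example.
    surviving = {"profession": 0.001, "district": 0.01, "age_band": 0.1}
    print(expected_matches(1_000_000, surviving))
    print(is_singled_out(1_000_000, surviving))
```

Real attributes are often correlated, so an independence-based estimate can be off in either direction; it is best read as an order-of-magnitude signal.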

Linkability

Can two redacted sessions be matched as the same person? Reported as a pairwise ROC AUC: a 0–1 score for how reliably an attacker can separate same-person session pairs from different-person pairs, where 1.0 = perfectly linkable and 0.50 = no better than a coin-flip. At or below 0.50 is the safe direction.
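
For readers who want the metric spelled out, here is a minimal sketch of a pairwise ROC AUC computed directly from its rank definition: the probability that a randomly chosen same-person pair gets a higher similarity score than a randomly chosen different-person pair, with ties counted as half. How CONFIDE-Red produces the similarity scores is not covered; the function name and inputs are assumptions.

```python
def pairwise_auc(same_person_scores: list, different_person_scores: list) -> float:
    """ROC AUC from two lists of attacker similarity scores.

    same_person_scores: scores for session pairs from the same person.
    different_person_scores: scores for pairs from different people.
    """
    if not same_person_scores or not different_person_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for same in same_person_scores:
        for different in different_person_scores:
            if same > different:
                wins += 1.0
            elif same == different:
                wins += 0.5  # a tie counts as a coin-flip
    return wins / (len(same_person_scores) * len(different_person_scores))


if __name__ == "__main__":
    # Invented scores: three of the four comparisons favor the same-person pair.
    print(pairwise_auc([0.9, 0.4], [0.6, 0.3]))
```

This quadratic loop is fine for small evaluation sets; for larger ones a rank-based implementation from a statistics library would be the usual choice.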

This is the same attack suite behind the report's residual-risk (RED) tier. It's why we say the honest part out loud: even the best stack leaves a re-identification surface, and CONFIDE-Red is how you measure it before a session ever leaves the room.

Run it (from the repo)

git clone https://github.com/glebis/confide && cd confide
python -m confide_eval.redteam.confide_red --dataset ru