The toolkit
Ten skills, a red team, and a benchmark to keep them honest
CONFIDE isn't only a scorecard. It ships three things: the thing that does the redaction (a local anonymizer skill), a package of nine companion skills for what comes after, and the thing that tries to break it all (CONFIDE-Red, a re-identification red team).
The skill
session-anonymizer: local PII redaction for your agent
A three-layer, fully local de-identifier for therapy / coaching session transcripts (Russian & English). It installs straight into your coding agent as a skill, so you can say "anonymize this transcript" and it runs on your own machine; no raw client data leaves it.
| layer | tool | catches |
|---|---|---|
| 1 | Natasha | Russian names, locations, organizations |
| 2 | regex | emails, URLs, phones, structured IDs (policy / account / card) |
| 3 | Ollama LLM | medications, dates, contextual IDs |
Plus a Russian entity-propagation pass that masks every variant of a detected name (inflected, possessive, vocative, transliterated), closing the strict entity-level leaks the benchmark surfaced.
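To make layer 2 and the propagation pass concrete, here is a minimal Python sketch of both ideas. It is not the skill's code. The regex patterns, the `[EMAIL]`/`[NAME]` tags, the suffix-stripping stemmer, the transliteration table and the function names `redact_structured` and `propagate_mask` are all hypothetical. Real Russian morphology needs a proper analyzer, not a heuristic like this one.

```python
import re

# --- Layer 2 sketch: hypothetical patterns, not the skill's actual rules. ---
STRUCTURED = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("URL", re.compile(r"https?://\S+|www\.\S+")),
    # Loose phone pattern: optional "+", then 10 to 15 digits with separators.
    ("PHONE", re.compile(r"\+?\d(?:[\s()-]*\d){9,14}")),
]

def redact_structured(text: str) -> str:
    """Replace every match with a bracketed tag, one pattern at a time."""
    for tag, pattern in STRUCTURED:
        text = pattern.sub(f"[{tag}]", text)
    return text

# --- Entity-propagation sketch: crude stemming, a hypothetical heuristic. ---
ENDINGS = "аяьйоеыиу"  # one trailing vowel or soft sign is dropped from the name
TRANSLIT = dict(zip(  # one simple Cyrillic-to-Latin scheme among many
    "абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    ["a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y", "k", "l", "m",
     "n", "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch",
     "", "y", "", "e", "yu", "ya"],
))

def stem(name: str) -> str:
    """Lowercase the name and drop one trailing vowel, e.g. 'Маша' -> 'маш'."""
    s = name.lower()
    return s[:-1] if len(s) > 3 and s[-1] in ENDINGS else s

def propagate_mask(text: str, name: str, tag: str = "[NAME]") -> str:
    """Mask every whole word that is the name's stem plus up to 3 letters,
    in Cyrillic and in the Latin transliteration of the stem."""
    cyr = stem(name)
    lat = "".join(TRANSLIT.get(ch, ch) for ch in cyr)
    pattern = re.compile(
        rf"\b(?:{re.escape(cyr)}[а-яё]{{0,3}}|{re.escape(lat)}[a-z]{{0,3}})\b",
        re.IGNORECASE,
    )
    return pattern.sub(tag, text)

print(redact_structured(
    "Write to anna.k@example.org or call +7 (912) 345-67-89, see https://example.org/x"))
# Write to [EMAIL] or call [PHONE], see [URL]
print(propagate_mask("Маша пришла. Машин муж звонил Маше. Маш, привет. Masha wrote.", "Маша"))
# [NAME] пришла. [NAME] муж звонил [NAME]. [NAME], привет. [NAME] wrote.
```

The heuristic fails in both directions. It misses longer diminutives such as "Машеньке", and it over-masks unrelated words such as "машина" (car). That is one reason a manual check after automated anonymization still matters.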
Install in one command
The skill is distributed with the skills CLI. From your project (or with --global for all projects):
npx skills add glebis/confide
This clones the repo, finds the session-anonymizer skill, auto-detects your agent, and installs it. Then, from inside the skill folder, run its one-time setup (it installs the three local layers and is idempotent):
./setup.sh
Target a specific agent
npx skills add glebis/confide --agent claude-code # or: cursor, codex, windsurf, …
npx skills add glebis/confide --agent '*' # install to every detected agent
Install globally (all your projects)
npx skills add glebis/confide --global
Just try it once, without installing
npx skills use glebis/confide@session-anonymizer
Manual install (no npx)
git clone https://github.com/glebis/confide
cp -r confide/skills/session-anonymizer ~/.claude/skills/ # or your agent's skills dir
cd ~/.claude/skills/session-anonymizer && ./setup.sh
The skill is local-first by design, but it never replaces explicit, informed client consent for recording and processing a real session. De-identification is harm reduction, not anonymization; see our ethics & consent stance.
The companion skills
Nine skills for what comes after anonymization
Redaction is step one. The same open package ships nine companion skills: analysis lenses, a supervision self-check, client-facing interventions, and the synthetic-data generator that feeds this benchmark. All are plain-prose skills that run on Claude Code and Codex; analysis output is in Russian (the audience language).
The rule that protects the whole package: local tools ≠ local inference. These agents reason in the cloud by default, so every analysis skill operates only on already-anonymized, human-reviewed text, and a safety-boundary scan (flagging material that needs a human protocol) runs before any analysis. It flags; it never scores risk.
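The "flags, never scores" rule can be built into the return type itself. Below is a minimal sketch, assuming a simple keyword approach. The cue phrases, category names, `Flag` type and `safety_boundary_scan` function are hypothetical, and a real scan needs clinically reviewed criteria rather than a word list.

```python
from dataclasses import dataclass

# Hypothetical cue phrases per category. A keyword list like this misses
# paraphrase and context; it only illustrates the shape of the output.
CUES = {
    "self_harm": ["hurt myself", "не хочу жить"],
    "harm_to_others": ["going to hurt him"],
}

@dataclass(frozen=True)
class Flag:
    category: str
    line: str  # the triggering line, for a human to read in context

def safety_boundary_scan(transcript: str) -> list[Flag]:
    """Flag lines that need a human protocol. Returns flags only, never a score."""
    flags = []
    for line in transcript.splitlines():
        lowered = line.lower()
        for category, cues in CUES.items():
            if any(cue in lowered for cue in cues):
                flags.append(Flag(category, line.strip()))
    return flags

flags = safety_boundary_scan("C: Иногда я не хочу жить.\nT: Tell me more about that.")
print(flags)
```

Returning a list of flags instead of a number keeps the scan from ranking or thresholding risk. What happens next is left to a human.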
synthetic-session-generator
Realistic, fully fictional session transcripts for evals, demos, and training data: the kind of corpus this benchmark runs on. No real client ever needed.
cbt-session-analysis
One anonymized session in; the client's cognitive distortions out, each named, quoted as evidence, and confidence-rated (Diagnosis-of-Thought structure).
multi-session-patterns
A client's whole corpus at once: distortion trends, theme evolution, avoidance candidates, and homework assigned-vs-checked.
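The skill itself is a prose prompt, so the following only illustrates the bookkeeping behind "homework assigned-vs-checked". It is a small Python sketch over a hypothetical structured summary of each session; the record fields and the `unchecked_homework` name are assumptions.

```python
# Hypothetical per-session summaries; the skill itself reads anonymized prose.
sessions = [
    {"id": 1, "assigned": ["thought record"], "checked": []},
    {"id": 2, "assigned": ["behavioral experiment"], "checked": ["thought record"]},
    {"id": 3, "assigned": [], "checked": []},
]

def unchecked_homework(sessions: list[dict]) -> list[tuple[int, str]]:
    """(session id, item) for homework never checked in any later session."""
    result = []
    for i, session in enumerate(sessions):
        checked_later = {item for later in sessions[i + 1:] for item in later["checked"]}
        result += [(session["id"], item) for item in session["assigned"]
                   if item not in checked_later]
    return result

print(unchecked_homework(sessions))  # [(2, 'behavioral experiment')]
```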
cbt-supervision
A structural self-check of the therapist's own session: a 10-element CBT protocol score plus five questions to bring to supervision.
act-lens
The same corpus read through ACT psychological flexibility: hexaflex processes, towards/away moves. Exploratory, hypotheses only.
psychodynamic-lens
A cautious psychodynamic reading: the CCRT relationship pattern, coarse defense groups, transference hints. Lowest confidence, so deliberately last in line.
cognitive-toolkit
Client-facing guided CBT/DBT exercises such as thought records and opposite action. An intervention you do, not an analysis you read.
mental-toolkit
Structured self-management: timed worry processing, scenario planning, anxiety check-ins.
meeting-processor
Auto-detects what kind of transcript it's looking at and applies type-specific extraction, including for coaching and therapy sessions.
The pipeline order (it matters)
1. Anonymize locally (session-anonymizer), followed by a manual quasi-identifier check.
2. Safety-boundary scan.
3. Single-session analysis.
4. Multi-session patterns.
5. Supervision self-check.
6. The interpretive lenses, last and at lowest confidence.
Every observation follows a fixed schema: evidence quote → interpretation → alternative → confidence in words → clinical action → boundary. No diagnoses, no risk scoring, no treatment decisions.
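If you wanted to enforce the observation schema and the stage order in tooling around these skills, a minimal Python sketch could look like this. The stage identifiers, the word-valued `Confidence` labels, the `Observation` fields and `check_order` are hypothetical names mirroring the prose; the skills themselves are plain-prose instructions, not this code.

```python
from dataclasses import dataclass
from enum import Enum

class Confidence(Enum):
    # "Confidence in words": hypothetical labels, deliberately not numbers.
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"

@dataclass(frozen=True)
class Observation:
    evidence_quote: str   # verbatim line from the anonymized transcript
    interpretation: str
    alternative: str      # a competing reading of the same evidence
    confidence: Confidence
    clinical_action: str  # something to explore, never a treatment decision
    boundary: str         # what this observation does not claim

# Stages in the order listed above; identifiers are hypothetical.
PIPELINE = [
    "anonymize_locally",
    "manual_quasi_identifier_check",
    "safety_boundary_scan",
    "single_session_analysis",
    "multi_session_patterns",
    "supervision_self_check",
    "interpretive_lenses",
]

def check_order(completed: list[str], next_stage: str) -> bool:
    """True only if every stage before next_stage has already been completed."""
    required = PIPELINE[: PIPELINE.index(next_stage)]
    return all(stage in completed for stage in required)

obs = Observation(
    evidence_quote="C: If I fail this, everyone will see I'm useless.",
    interpretation="Possible catastrophizing and labeling.",
    alternative="Realistic worry about a high-stakes exam.",
    confidence=Confidence.MODERATE,
    clinical_action="Explore the evidence for 'everyone will see'.",
    boundary="Not a diagnosis; one session only.",
)
print(check_order(["anonymize_locally"], "single_session_analysis"))  # False
```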
Install the package
npx skills add glebis/psychodemia-2026 # all skills, auto-detects your agent
git clone https://github.com/glebis/psychodemia-2026 # or copy/symlink skill folders manually
Each skill is a folder containing a single plain-prose SKILL.md; see the package's INDEX for per-skill docs and the shared discipline.
The red-team
CONFIDE-Red: try to re-identify what survived
Removing names is routinely mistaken for "safe." CONFIDE-Red is the adversary that checks: it runs the three re-identification tests set out by the EU Article 29 Working Party (singling out, linkability, and inference) against the redacted output and reports what a motivated reader could still recover.
Inference
A local LLM attacker, given only the masked transcript, guesses the client's attributes and is scored on its top-k guesses. A weak local model gives only a lower bound; a frontier model recovers more.
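Top-k scoring is usually defined as a hit whenever the true value appears anywhere in the attacker's first k guesses. This is a minimal sketch under that assumption. The `top_k_accuracy` name, the city attribute and the data are hypothetical, and CONFIDE-Red's own scoring may differ.

```python
def top_k_accuracy(ranked_guesses: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of records whose true value is among the attacker's top-k guesses."""
    if not truths or len(ranked_guesses) != len(truths):
        raise ValueError("need one ranked guess list per record, and at least one record")
    hits = sum(truth in guesses[:k] for guesses, truth in zip(ranked_guesses, truths))
    return hits / len(truths)

# Hypothetical attacker output for one attribute (city) on three masked transcripts.
guesses = [["Moscow", "Kazan", "Perm"], ["Tver", "Omsk", "Kazan"], ["Sochi", "Perm", "Ufa"]]
truths = ["Kazan", "Kazan", "Tomsk"]
print(top_k_accuracy(guesses, truths, k=1))  # 0.0
print(top_k_accuracy(guesses, truths, k=3))  # 0.6666666666666666
```

The k=1 and k=3 results differ sharply here, which is why leakage is worth reporting at more than one k. A hit rate also means more when it is compared against a naive baseline, such as always guessing the most common value.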
Singling-out
Do the surviving quasi-identifiers, combined, narrow the population to one person? Estimated against sourced priors (illustrative on synthetic personas, not a probability).
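One common back-of-envelope version of this estimate multiplies the population frequencies of the surviving quasi-identifiers and scales by the population size. It assumes the attributes are independent, which they often are not. The sketch below uses hypothetical function names and made-up frequencies, not the benchmark's sourced priors. Like the report's own estimate, the result is illustrative, not a probability.

```python
import math

def expected_matches(population: int, priors: dict[str, float]) -> float:
    """Expected number of people sharing every surviving quasi-identifier,
    assuming the attributes are independent (often false in practice)."""
    return population * math.prod(priors.values())

def may_single_out(population: int, priors: dict[str, float]) -> bool:
    """Heuristic: fewer than one expected look-alike suggests a unique match."""
    return expected_matches(population, priors) < 1.0

# Illustrative frequencies only; not the benchmark's sourced priors.
priors = {"small_town": 0.01, "rare_profession": 0.002, "has_twins": 0.03}
print(expected_matches(10_000_000, priors))  # roughly 6 people
print(may_single_out(10_000_000, priors))    # False
print(may_single_out(1_000_000, priors))     # True (about 0.6 expected)
```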
Linkability
Can two redacted sessions be matched to the same person? Reported as a pairwise ROC AUC: a 0 to 1 score for how reliably an attacker can separate same-person pairs from different-person pairs, where 1.0 = perfectly linkable and 0.50 = no better than a coin flip. At or below 0.50 is the safe direction.
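A pairwise ROC AUC can be computed directly from its rank (Mann-Whitney) definition. It is the probability that a randomly chosen same-person pair gets a higher similarity score from the attacker than a randomly chosen different-person pair, with ties counted as half. This is a minimal sketch. The `pairwise_auc` name and the scores are hypothetical, and CONFIDE-Red may compute the AUC differently.

```python
from itertools import product

def pairwise_auc(same: list[float], different: list[float]) -> float:
    """Probability that a random same-person pair outscores a random
    different-person pair; ties count as half (Mann-Whitney form of ROC AUC)."""
    if not same or not different:
        raise ValueError("need at least one same-person and one different-person pair")
    wins = 0.0
    for s, d in product(same, different):
        if s > d:
            wins += 1.0
        elif s == d:
            wins += 0.5
    return wins / (len(same) * len(different))

# Hypothetical attacker similarity scores for pairs of redacted sessions.
print(pairwise_auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.2]))  # 8/9, about 0.89: mostly linkable
print(pairwise_auc([0.5, 0.5], [0.5, 0.5]))            # 0.5: coin flip
```

The double loop is O(n·m), which is fine for a benchmark-sized set of pairs. For large sets, scikit-learn's `roc_auc_score` computes the same quantity from labels and scores.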
This is the same attack suite behind the report's residual-risk (RED) tier. It's why we say the honest part out loud: even the best stack leaves a re-identification surface, and CONFIDE-Red is how you measure it before a session ever leaves the room.
Run it (from the repo)
git clone https://github.com/glebis/confide && cd confide
python -m confide_eval.redteam.confide_red --dataset ru