Contribute

Help build a safer mental-health AI

CONFIDE is an open benchmark. The single most valuable thing you can do is help label its gold standard, and Russian speakers are especially needed.


[Image: several hands, each with a different coloured highlighter, independently marking the same transcript. Caption: Independent annotators, one shared gold standard.]

Annotate transcripts (most wanted)

Therapists increasingly want AI to review their sessions — but a raw transcript can leak a client's name, diagnosis, employer, or worse. CONFIDE measures how well automatic tools strip that out before anything leaves the therapist's machine. To make the benchmark trustworthy, its gold standard must be labelled by several independent people, not just one. That's where you come in.

What you'd do

Read short therapy/counselling transcripts and highlight every piece of identifying information — names, places, employers, ages, dates, phone numbers, medications.
Follow a clear codebook (provided, plus a 30-minute walkthrough); label in a simple tool — no coding.
Work independently — that's the point: we measure how much annotators agree.
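
For the technically curious: agreement between two annotators is often summarized with Cohen's kappa, which corrects raw agreement for chance. The sketch below is only an illustration, not CONFIDE's actual metric. The token-level framing, the label names, and the `cohen_kappa` helper are all assumptions made for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty token sequence")
    n = len(labels_a)

    # Observed agreement: share of tokens where both chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability of matching if each annotator labelled
    # at random according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in all_labels) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical token-level labels; "O" marks a non-identifying token.
annotator_1 = ["NAME", "O", "O", "EMPLOYER", "O", "O", "AGE", "O"]
annotator_2 = ["NAME", "O", "O", "O", "O", "O", "AGE", "O"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the two annotators disagree on one token out of eight, and kappa comes out at 0.75. Independent work matters here because agreement only means something if neither annotator has seen the other's labels.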

Time

A pilot of ~6–10 transcripts, roughly 3–5 hours total, on your own schedule. An optional follow-up adjudication session lets annotators resolve disagreements together.

Who fits

Psychologists, therapists, counsellors, linguists, privacy/NLP people — or simply careful, detail-oriented readers. Russian especially; English too; German/French/Spanish welcome next.

What you get

Co-authorship or named acknowledgement (your choice), a concrete contribution to open mental-health privacy research, and a certificate of contribution if useful.

Interested? Reach out at t.me/glebkalinin with one line about your background (clinical / linguistic / technical / careful-reader) and which language(s) you can annotate. No experience required — we train you.

Contribute code & data

Prefer to help technically? The benchmark is fully reproducible and modular:

Add a detector layer — wire a new de-identifier into the cache/manifest pipeline; it scores against the same gold as everything else.
Add a language — German, French, and Spanish are the next targets; the report is already internationalized from one source.
Add a real-text anchor — like the RU-real (JayGuard) slice: an external, licensed, anonymized corpus mapped into the CONFIDE gold schema.
Run it & report — reproduce the numbers on your hardware, file discrepancies, or stress the adversarial probe with new hard forms.
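
To make the detector-layer idea concrete, here is a hedged sketch of scoring a detector against gold spans. It does not use CONFIDE's real cache/manifest interface, so check the repository for that. The `Span` tuple format, `phone_detector`, `span_of`, and `score` are hypothetical names, and exact-match span scoring is an assumption about how matching could work.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical span format: (start offset, end offset, label).
Span = Tuple[int, int, str]


def phone_detector(text: str) -> List[Span]:
    """Toy detector that flags phone-like digit runs (stand-in for a real de-identifier)."""
    pattern = re.compile(r"\+?\d[\d\- ]{7,}\d")
    return [(m.start(), m.end(), "PHONE") for m in pattern.finditer(text)]


def span_of(text: str, substring: str, label: str) -> Span:
    """Build a gold span by locating the first occurrence of a substring."""
    start = text.index(substring)
    return (start, start + len(substring), label)


def score(detector: Callable[[str], List[Span]], text: str, gold: List[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 of a detector against gold spans."""
    predicted = set(detector(text))
    gold_set = set(gold)
    true_positives = len(predicted & gold_set)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Fictional sentence with two identifying spans.
text = "Call me at +1 555-010-9999, I work at Acme."
gold = [
    span_of(text, "+1 555-010-9999", "PHONE"),
    span_of(text, "Acme", "EMPLOYER"),
]

result = score(phone_detector, text, gold)
print(result)
```

The toy detector finds the phone number but misses the employer, so precision is 1.0 and recall is 0.5. Any new detector scored by the same function against the same gold is directly comparable, which is the point of a shared gold standard.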

Start from the repository — the datasheet, annotation codebook, preregistration, and reproducibility guide are all in docs/.


· · ·

For Russian speakers

Help build CONFIDE, an open benchmark for safely removing personal data from psychotherapy transcripts.

Psychologists increasingly want to use AI to review their sessions, but sending a raw transcript to the cloud can expose a client's name, diagnosis, workplace, and more. CONFIDE measures how well automatic tools scrub identifying information before anything leaves the therapist's computer. For the benchmark to be trustworthy, its gold standard must be labelled by several independent people. This is where you come in, especially if you are a native Russian speaker.

What you'd do

Read short session transcripts and highlight every piece of identifying information: names, places, employers, ages, dates, phone numbers, medications.
Follow a clear codebook (we provide it, plus a 30-minute walkthrough); labelling happens in a simple tool, with no coding required.
Work independently: we measure agreement between annotators.

Time: a pilot of ~6–10 transcripts, roughly 3–5 hours, at your own pace. Confidentiality: you see only synthetic (fictional) transcripts, or approved real ones under a strict local protocol; real client data is never disclosed.

Want to help? Write to t.me/glebkalinin with one line about yourself (clinical / linguistic / technical / careful reader) and which language you can annotate in. No experience needed; we will train you.