Are the transcripts accurate?

A plainspoken guide to how automated transcription behaves in real sessions, what the numbers mean, and how to keep clinical judgment central.

The honest answer

If you are using transcripts in therapy, this is the right question to ask. The answer is simple. They are not perfectly accurate.

Unless you are reviewing everything, it is wise to expect there will be some inaccuracies in transcripts. And because some Feelpath insights are based on transcripts, it is also wise to expect there can be occasional inaccuracies in insights.

In practice, we recommend occasional spot checking. If you are relying on a specific detail, do not rely on the transcript alone. Cloud transcription improves accuracy, but it does not make review unnecessary.

This is not a moral failure of the clinician or the client. It is a limitation of automated speech to text. The goal is to use it in a way that protects the work.

How transcription works

Speech to text is a prediction system. It listens to audio and outputs the most likely words. When audio is clean, it can be impressively good. When audio gets messy, accuracy can drop quickly.

That messiness is normal in real sessions. People speak softly. They talk over each other. Someone cries. Someone speaks quickly. Someone has an accent. A client uses a name, a medication, a city, a niche term, or a piece of slang. Those are exactly the moments where a transcript can drift.

Why accuracy varies

Transcript accuracy depends on the conditions of the call. For example:

  • Microphone quality and distance
  • Internet connection and audio dropouts
  • Background noise and room echo
  • Speaking pace and overlapping speech
  • Accents and dialects
  • Uncommon words, proper nouns, and clinical terms

This is why one session can look clean and the next can look rough, even with the same people.

Accuracy can also be uneven across accents and dialects. In some sessions, that can increase error rates even when the connection is stable.

If you want fewer errors, cleaner audio helps.

  • Use a quiet room when possible
  • Use a headset or a dedicated microphone when possible
  • Avoid talking over each other when possible

In practice, we notice errors most often with uncommon proper names and their spellings.

What “accuracy” actually means

A common way researchers measure transcript quality is Word Error Rate. Word Error Rate counts the word substitutions, deletions, and insertions needed to turn the generated text into a reference transcript, divided by the number of words in the reference. The comparison usually happens after a normalization step, which often removes filler words, punctuation, and speaker names, and standardizes numbers.

Word Error Rate is useful. It is also incomplete.

  • It tells you how many words differ.
  • It does not tell you whether the important meaning survived.

In therapy, the difference matters. A single missed “no” can flip the meaning of a sentence. A misheard medication can create confusion. A misheard emotion word can change the clinical story.

So we try to hold two truths at once. Many transcript errors are minor. Some transcript errors matter a lot.

In practice, the errors that matter most tend to involve small words and specific details. For example, missed negations, numbers, proper names, and who said what during overlap.
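The two points above can be sketched in a few lines of code. This is an illustrative minimal Word Error Rate computation, not a real evaluation pipeline (the `wer` helper and the example sentences are ours; production evaluations also normalize text first):

```python
# Minimal Word Error Rate sketch: (substitutions + deletions + insertions)
# divided by the number of words in the reference transcript.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Word-level edit distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

reference = "i am not going to stop the medication"
hypothesis = "i am going to stop the medication"  # one dropped "not"
print(f"WER: {wer(reference, hypothesis):.2%}")  # 1 error / 8 words = 12.50%
```

Note what the number hides: a single dropped "not" gives a modest 12.50% error rate, yet it reverses the clinical meaning of the sentence. That is why a low Word Error Rate and a meaningful error can coexist.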

What accuracy can look like, with numbers

Zoom commissioned TestDevLab to compare meeting transcription across major platforms. In their September 2024 evaluation, Zoom had a Word Error Rate of 7.40% across three English meeting scenarios (2-person, 4-person, and 16-person meetings, repeated multiple times), which was lower than Webex and Teams in that comparison.

A simple way to read that number is: under those test conditions, about 7 out of 100 words differed from the reference transcript after cleaning. That can still be clinically meaningful, because the worst errors are not always the most frequent ones.

The same report also includes a separate meaning-level evaluation that scored Zoom’s transcript at 99.05% for context and overall meaning. That can be reassuring, and it still leaves room for the errors clinicians care about most: names, acronyms, negations, and emotionally loaded moments.

Live vs cloud transcription in Feelpath

Feelpath uses Zoom Video SDK for sessions.

  • Live transcription happens during the session. It is fast and can be helpful in the moment. It is also the most likely to miss words when audio conditions are imperfect.
  • Cloud transcription is generated after the session. It is slower and usually more accurate.

In Feelpath, our insights use the cloud transcript because it is the more reliable input. That reduces error, but it does not remove it. Unless you review, it is still wise to expect occasional mistakes.

A safe clinical stance

If you are relying on details, review them. If something looks wrong or too sensitive, correct it or remove it, especially before it is reused. Treat transcript text as support material, not a verbatim quote, unless you have verified it. For high-stakes details, a quick verification step is worth it. Feelpath is designed so each person controls their own words, and so "no" can remain a stable option when transcripts are not clinically appropriate.

If you want the practical details around consent, editing, and data handling, see our dedicated guides on those topics.

Our stance

We take accuracy seriously. We keep looking for ways to improve it. We also keep the narrative honest. Automated transcripts can be useful support material. They are not a perfect record. Review is what makes it safe to lean on.


Selected references