5. Speech audiometry: SRT, word recognition & rollover
Pure tones tell you what a person can detect; speech tests tell you what they can understand — and understanding is, in the end, what hearing is for. Speech audiometry begins with a simple cross-check, the speech reception threshold, which should agree with the pure-tone average and so validates the audiogram. It then moves above threshold to ask a harder question: of the words that are clearly audible, how many are correctly identified? The answer, the maximum word score, characterises the clarity of the ear, and the way that score behaves at still-higher levels — whether it holds or rolls over — is one of the battery's classic pointers beyond the cochlea. This module covers how speech is tested and why the seemingly mundane choices of material, talker and level decide whether the number means anything.
The speech reception threshold
The speech reception threshold (SRT) is the lowest level at which a listener correctly repeats 50% of two-syllable spondee words. Its main job is a cross-check: it should agree with the pure-tone average within ~10 dB, validating the audiogram (and flagging a non-organic loss when it does not).
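The cross-check can be sketched in a few lines of Python. This is a minimal illustration, not a clinical tool. It assumes the common three-frequency pure-tone average (500, 1000 and 2000 Hz) and treats the ~10 dB agreement limit from the text as a hard cutoff. The function names and the example audiogram are hypothetical.

```python
from statistics import mean

def pure_tone_average(thresholds_db_hl, freqs=(500, 1000, 2000)):
    """Pure-tone average in dB HL.

    Assumption: the three-frequency convention (500, 1000, 2000 Hz).
    thresholds_db_hl maps frequency in Hz to threshold in dB HL.
    """
    return mean(thresholds_db_hl[f] for f in freqs)

def srt_agrees_with_pta(srt_db_hl, thresholds_db_hl, tolerance_db=10):
    """True when SRT and PTA differ by no more than tolerance_db.

    Assumption: the ~10 dB limit is applied as a strict cutoff.
    """
    pta = pure_tone_average(thresholds_db_hl)
    return abs(srt_db_hl - pta) <= tolerance_db

# Hypothetical audiogram for one ear (Hz -> dB HL)
audiogram = {250: 20, 500: 30, 1000: 35, 2000: 40, 4000: 55}

print(pure_tone_average(audiogram))        # PTA of 30, 35 and 40 dB HL
print(srt_agrees_with_pta(35, audiogram))  # SRT matches the PTA
print(srt_agrees_with_pta(15, audiogram))  # SRT 20 dB better than the PTA
```

In the last call the SRT is far better than the pure tones predict, which is the kind of disagreement the text describes as a flag for a non-organic loss.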
Word recognition, PB-max & rollover
Suprathreshold word recognition uses phonetically-balanced monosyllables (CNC) presented well above threshold; the peak score is PB-max. A normal ear reaches ~100%, a cochlear loss plateaus lower, and a fall in score at still-higher levels — rollover — is a retrocochlear / neural sign.[2009]
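To make PB-max and rollover concrete, the sketch below reads both off a performance-intensity function, that is, the word score measured at each presentation level. It quantifies rollover with the rollover index, (PB-max − PB-min) / PB-max, where PB-min is the lowest score at any level above the PB-max level. This index is a widely used convention rather than something this module specifies, and the cutoff for calling rollover significant depends on the word list, so none is hard-coded here. The function name and both data sets are hypothetical.

```python
def pb_max_and_rollover(pi_function):
    """Summarise a performance-intensity function.

    pi_function maps presentation level (dB HL) to percent correct.
    Returns (pb_max, pb_max_level, rollover_index).
    Assumption: rollover index = (PB-max - PB-min) / PB-max, with PB-min
    taken only from levels above the one where PB-max was first reached.
    """
    levels = sorted(pi_function)
    pb_max = max(pi_function.values())
    # Lowest level at which the peak score is reached
    pb_max_level = min(lv for lv in levels if pi_function[lv] == pb_max)
    scores_above = [pi_function[lv] for lv in levels if lv > pb_max_level]
    if pb_max == 0 or not scores_above:
        return pb_max, pb_max_level, 0.0
    pb_min = min(scores_above)
    return pb_max, pb_max_level, (pb_max - pb_min) / pb_max

# Hypothetical ears: one plateaus, one rolls over at high levels
plateau_ear = {50: 40, 60: 68, 70: 80, 80: 80, 90: 76}
rollover_ear = {50: 40, 60: 72, 70: 60, 80: 36, 90: 24}

for name, ear in (("plateau", plateau_ear), ("rollover", rollover_ear)):
    pb_max, level, index = pb_max_and_rollover(ear)
    print(f"{name}: PB-max {pb_max}% at {level} dB HL, rollover index {index:.2f}")
```

The first ear holds its score as level rises, the pattern the text attributes to a cochlear loss. The second loses most of its peak score, the pattern the text describes as a retrocochlear sign.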
Recorded, not live voice
Recorded materials are mandatory. Monitored-live-voice and recorded word scores differed in 72% of ears (by up to 80 percentage points), because the VU meter cannot track low-level consonants and a live talker subtly adapts. Only recorded, calibrated speech gives a reproducible, comparable result.[2020]
Level, words vs sentences, open vs closed
Three more choices shape the number. Presentation level: scores fall at soft levels, and since conversational speech sits near 60 dB SPL, candidacy-relevant testing favours 60 dBA over a flattering 70 dB. Words vs sentences: monosyllables strip away context while sentences add it, so a listener can pass sentences yet fail words — a dissociation that flags a temporal-processing or memory deficit. And open- vs closed-set formats trade realism against the need to correct for chance. How these materials become candidacy cutoffs is the subject of the next chapter.
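The chance correction for closed-set formats can be illustrated with the standard correction-for-guessing formula, (p − c) / (1 − c), where p is the raw proportion correct and c = 1/n is the chance level for n response alternatives. This is a sketch under that assumption; the text does not say which correction a given test uses, and the function name is hypothetical.

```python
def correct_for_chance(raw_proportion, n_alternatives):
    """Correct a closed-set score for guessing.

    Assumption: standard formula (p - c) / (1 - c) with c = 1 / n_alternatives.
    Scores at or below chance are floored at 0.
    """
    if n_alternatives < 2:
        raise ValueError("a closed set needs at least two alternatives")
    chance = 1.0 / n_alternatives
    corrected = (raw_proportion - chance) / (1.0 - chance)
    return max(0.0, corrected)

# A raw score of 70% on a hypothetical four-alternative test
print(round(correct_for_chance(0.70, 4), 2))
# A raw score at chance carries no evidence of recognition
print(correct_for_chance(0.25, 4))
```

An open-set score needs no such correction because the chance of guessing an unprompted word is effectively zero, which is part of the trade-off the text describes.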
What is the main role of the speech reception threshold (SRT)?
Why are recorded (not monitored-live-voice) speech materials mandatory?