5 Tracking the Voice: F0/F2 Formant Extraction
Before spectral-maxima strategies, the Melbourne/Nucleus team tried to extract the perceptually important features of speech explicitly. The first feature-extraction processor tracked the fundamental frequency F0 and the second formant F2 using zero-crossing detectors, a fundamentally different philosophy from waveform strategies.
The feature-extraction philosophy
Feature-extraction strategies attempt to estimate and transmit specific perceptually important speech parameters (formants, voicing, pitch) rather than the raw band waveform or envelope. Formants are the resonant peaks of the vocal tract; F0 is the voice pitch and F2 is the second formant, both central to vowel and voicing perception. This approach assumes a model of speech production and perception, unlike continuous interleaved sampling (CIS), which makes no such assumption. The F0/F2 strategy was implemented on the early-1980s Nucleus implant from Cochlear Corporation / University of Melbourne, a 22-24 electrode device.[2000][1999]
The F0/F2 signal chain
The chain runs from the microphone to automatic gain control (AGC) and then splits into two paths. A 270 Hz low-pass filter feeds a zero-crossing detector that estimates F0, which sets the stimulation (pulse) rate. A 1000-4000 Hz band-pass filter feeds a second zero-crossing detector that estimates F2, which selects the electrode to be stimulated. Voicing is conveyed via the zero-crossing rate. Voiced segments are stimulated at F0 pulses per second, and unvoiced segments at quasi-random intervals averaging about 100 pulses per second.[1987][2000]
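The last stage of the chain, turning the two estimates into a stimulation decision, can be sketched in a few lines of Python. This is a minimal illustration, not the Nucleus processor's actual mapping: the 22-electrode count, the log-spaced F2 bins, the index convention (0 = lowest F2), the uniform jitter used for unvoiced intervals, and the names `select_electrode` and `next_pulse_interval` are all assumptions made for the example. Only the 1000-4000 Hz band and the roughly 100 pulses per second unvoiced average come from the description above.

```python
import math
import random

# Illustrative constants. Only the 1000-4000 Hz band and the ~100 pulses/s
# unvoiced average come from the text; everything else is an assumption.
NUM_ELECTRODES = 22
F2_MIN_HZ = 1000.0
F2_MAX_HZ = 4000.0
UNVOICED_MEAN_INTERVAL_S = 1.0 / 100.0


def select_electrode(f2_hz):
    """Map an F2 estimate to an index 0..NUM_ELECTRODES-1 (0 = lowest F2).

    Log-spaced bins are an assumption, not the device's documented map.
    """
    f2 = min(max(f2_hz, F2_MIN_HZ), F2_MAX_HZ)  # clamp to the analysis band
    position = math.log(f2 / F2_MIN_HZ) / math.log(F2_MAX_HZ / F2_MIN_HZ)
    return min(int(position * NUM_ELECTRODES), NUM_ELECTRODES - 1)


def next_pulse_interval(voiced, f0_hz, rng):
    """Return the time in seconds until the next stimulation pulse."""
    if voiced:
        return 1.0 / f0_hz  # voiced: one pulse per F0 period
    # Unvoiced: jittered interval with a 10 ms mean (uniform jitter is assumed).
    return UNVOICED_MEAN_INTERVAL_S * rng.uniform(0.5, 1.5)


rng = random.Random(0)
print(select_electrode(1500.0))               # -> 6
print(next_pulse_interval(True, 125.0, rng))  # -> 0.008
```

Keeping the two decisions in separate functions mirrors the two independent paths of the chain: F2 alone decides where to stimulate, and F0 (or the unvoiced jitter) alone decides when.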
Voicing by zero-crossing
Zero-crossing detection identifies the instants where the waveform crosses the zero-amplitude axis. It is used here to estimate frequency and to classify voiced versus unvoiced segments. F0 was estimated by a zero-crossing detector after a 270 Hz low-pass filter, and F2 by a zero-crossing detector after a 1000-4000 Hz band-pass filter. Fundamental frequency carries important cues for intelligibility, especially in background noise.[1987][2010]
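As a rough software analogue of such a detector, the sketch below counts upward zero crossings per second and reads the count as a frequency. It assumes the filtering has already been done, so pure tones stand in for the low-pass and band-pass outputs; the 16 kHz sample rate, the test tones, and the function names are illustrative choices, not details of the original hardware.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz by counting upward zero crossings."""
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev < 0.0 <= cur:  # negative-to-non-negative transition
            crossings += 1
    duration_s = len(samples) / sample_rate
    return crossings / duration_s


def tone(freq_hz, sample_rate, duration_s, phase=0.3):
    """Generate a sine tone standing in for an already-filtered band."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + phase)
            for i in range(n)]


SAMPLE_RATE = 16000  # assumed sample rate for the illustration

f0_band = tone(125.0, SAMPLE_RATE, 1.0)   # stand-in for the 270 Hz low-pass output
f2_band = tone(2000.0, SAMPLE_RATE, 1.0)  # stand-in for the 1000-4000 Hz band-pass output

print(zero_crossing_frequency(f0_band, SAMPLE_RATE))  # -> 125.0
print(zero_crossing_frequency(f2_band, SAMPLE_RATE))  # -> 1999.0
```

The second estimate is one crossing short of 2000 because the one-second window starts just after an upward crossing; this kind of edge effect shrinks as the analysis window grows relative to the period being measured.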
Why F0/F2 was superseded
Encoding only F0 and F2 leaves out the low-frequency vowel information carried by the first formant, F1. Vowel and consonant discrimination needs more spectral detail than two tracked parameters provide. Adding F1 improved performance, directly motivating the F0/F1/F2 strategy. Feature extraction by zero-crossing is also vulnerable to estimation errors when the tracked feature is corrupted.[1987][2006]
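The vulnerability to corruption is easy to reproduce in a toy setting. In the sketch below, a 125 Hz tone stands in for the voice fundamental, and a stronger 250 Hz component, which would also pass a 270 Hz low-pass filter, is added to it. The tone frequencies, amplitudes, and 16 kHz sample rate are hypothetical values chosen for illustration; real speech and real interference are far less tidy.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz from upward zero crossings per second."""
    crossings = sum(1 for prev, cur in zip(samples, samples[1:])
                    if prev < 0.0 <= cur)
    return crossings * sample_rate / len(samples)


SAMPLE_RATE = 16000  # assumed sample rate for the illustration
times = [i / SAMPLE_RATE for i in range(SAMPLE_RATE)]  # one second of samples

# Clean case: a 125 Hz tone standing in for F0 after the low-pass filter.
clean = [math.sin(2 * math.pi * 125.0 * t + 0.3) for t in times]

# Corrupted case: add a stronger in-band 250 Hz component (hypothetical).
corrupted = [c + 2.0 * math.sin(2 * math.pi * 250.0 * t)
             for c, t in zip(clean, times)]

clean_f0 = zero_crossing_frequency(clean, SAMPLE_RATE)
corrupted_f0 = zero_crossing_frequency(corrupted, SAMPLE_RATE)
print(clean_f0, corrupted_f0)
```

The clean estimate comes out at 125 Hz, while the corrupted one lands near 250 Hz: the detector locks onto whichever component dominates the zero crossings. In an F0/F2 scheme an error like this would double the stimulation rate even though the talker's pitch has not changed.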
Which design limitation of F0/F2 best accounts for the poor consonant perception?
In the F0/F2 strategy, what does the estimated F2 frequency control?
How is F0 estimated in the F0/F2 strategy?