CI Atlas · Speech-Coding Strategies: The Complete Lineage · Module 05

5. Tracking the Voice: F0/F2 Formant Extraction

Before spectral-maxima strategies, the Melbourne/Nucleus team tried to explicitly extract the perceptually important features of speech. The first feature-extraction processor tracked the fundamental frequency F0 and the second formant F2 using zero-crossing detectors — a fundamentally different philosophy from waveform strategies.

The feature-extraction philosophy

Feature-extraction strategies attempt to estimate and transmit specific perceptually important speech parameters (formants, voicing, pitch) rather than the raw band waveform or envelope. Formants are the resonant peaks of the vocal tract; F0 is the voice pitch and F2 is the second formant, both central to vowel and voicing perception. This approach assumes a model of speech production and perception, unlike the assumption-free CIS. The F0/F2 strategy was implemented on the early-1980s Nucleus implant from Cochlear Corporation / University of Melbourne, a 22-24 electrode device.[2000][1999]

8%: F0/F2 open-set NU-6 monosyllabic word recognition (n=5) [1990]

31%: F0/F2 sentences without context (vs 64% with F1 added) [1990]

The F0/F2 signal chain

The chain runs from the microphone to automatic gain control (AGC) and then splits into two paths: a 270 Hz low-pass filter feeding a zero-crossing detector that estimates F0, and a 1000-4000 Hz band-pass filter feeding a zero-crossing detector that estimates F2. The F2 estimate selects which electrode is stimulated, the F0 estimate sets the stimulation (pulse) rate, and voicing is conveyed via the zero-crossing rate. Voiced segments are stimulated at F0 pulses per second; unvoiced segments are stimulated at quasi-random intervals averaging about 100 pulses per second.[1987][2000]
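The routing stage of this chain can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processor's actual implementation: the electrode count of 22, the logarithmic F2-to-electrode map, and the uniform jitter used for unvoiced intervals are hypothetical choices made for the example. Only the 1000-4000 Hz F2 range and the roughly 100 pulses-per-second unvoiced average come from the text.

```python
import math
import random

NUM_ELECTRODES = 22             # assumption: illustrative electrode count
F2_LOW_HZ = 1000.0              # lower edge of the F2 band-pass path (from the text)
F2_HIGH_HZ = 4000.0             # upper edge of the F2 band-pass path (from the text)
UNVOICED_MEAN_RATE_PPS = 100.0  # average unvoiced pulse rate (from the text)


def select_electrode(f2_hz):
    """Map an F2 estimate to an electrode index in 0..NUM_ELECTRODES-1.

    Assumption: a logarithmic frequency-to-place map with index 0 at the
    low-frequency end. The device's real map is not given in the text.
    """
    clamped = min(max(f2_hz, F2_LOW_HZ), F2_HIGH_HZ)
    position = math.log(clamped / F2_LOW_HZ) / math.log(F2_HIGH_HZ / F2_LOW_HZ)
    return min(int(position * NUM_ELECTRODES), NUM_ELECTRODES - 1)


def pulse_times(duration_s, voiced, f0_hz=None, rng=None):
    """Return pulse onset times in seconds for one speech segment."""
    if voiced and (f0_hz is None or f0_hz <= 0):
        raise ValueError("voiced segments need a positive F0 estimate")
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    times = []
    t = 0.0
    while t < duration_s:
        times.append(t)
        if voiced:
            # Voiced: one pulse per F0 period
            t += 1.0 / f0_hz
        else:
            # Hypothetical jitter: uniform in 0.5x..1.5x of the mean interval
            t += rng.uniform(0.5, 1.5) / UNVOICED_MEAN_RATE_PPS
    return times


voiced_pulses = pulse_times(1.0, voiced=True, f0_hz=125.0)
unvoiced_pulses = pulse_times(1.0, voiced=False)
print(select_electrode(1200.0), select_electrode(2500.0))
print(len(voiced_pulses), len(unvoiced_pulses))
```

A higher F2 estimate maps to a higher electrode index in this sketch, and a voiced segment at 125 Hz yields about 125 pulses per second, while the unvoiced segment averages close to 100.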

Voicing by zero-crossing

Zero-crossing detection identifies the instants where the waveform crosses the zero-amplitude axis; here it is used to estimate frequency and to classify voiced versus unvoiced segments. F0 was estimated via a zero-crossing detector after a 270 Hz low-pass filter, and F2 via a zero-crossing detector after a 1000-4000 Hz band-pass filter. Fundamental frequency carries important cues for intelligibility, especially in background noise.[1987][2010]
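As a rough illustration of how a zero-crossing count becomes a frequency estimate, the sketch below counts upward crossings in a synthetic tone and divides by the signal duration. The 16 kHz sampling rate, the function names, and the test tones are assumptions made for the example. The filtering stage is omitted, so the input is assumed to be band-limited already.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz by counting upward zero crossings."""
    upward = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev < 0.0 <= cur:  # negative-to-non-negative transition
            upward += 1
    duration_s = len(samples) / sample_rate
    return upward / duration_s


def tone(freq_hz, sample_rate, duration_s, phase=0.3):
    """Synthesize a test sinusoid; the phase offset keeps samples off exact zero."""
    count = int(sample_rate * duration_s)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate + phase)
            for i in range(count)]


SAMPLE_RATE = 16000  # assumption: illustrative sampling rate

# A 125 Hz tone stands in for the low-pass F0 path,
# a 1500 Hz tone for the band-pass F2 path.
f0_estimate = zero_crossing_frequency(tone(125.0, SAMPLE_RATE, 1.0), SAMPLE_RATE)
f2_estimate = zero_crossing_frequency(tone(1500.0, SAMPLE_RATE, 1.0), SAMPLE_RATE)
print(f0_estimate, f2_estimate)
```

For a clean single tone, the estimate lands within about one cycle per second of the true frequency, because each period contributes exactly one upward crossing.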

Why F0/F2 was superseded

Encoding only F0 and F2 leaves out the low-frequency vowel information carried by the first formant F1. Vowel and consonant discrimination needs more spectral detail than two tracked parameters provide. Adding F1 improved performance, directly motivating the F0/F1/F2 strategy. Feature extraction by zero-crossing is also vulnerable to estimation errors when the tracked feature is corrupted.[1987][2006]
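The vulnerability to corruption can be illustrated with a synthetic signal. In the sketch below, a 1500 Hz tone stands in for F2 and a stronger 3200 Hz component is added inside the same 1000-4000 Hz band. Both components, their amplitudes, and the 16 kHz sampling rate are hypothetical values chosen for the example, not measurements from the device.

```python
import math

SAMPLE_RATE = 16000        # assumption: illustrative sampling rate
NUM_SAMPLES = SAMPLE_RATE  # one second of signal


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency in Hz by counting upward zero crossings."""
    upward = sum(1 for prev, cur in zip(samples, samples[1:])
                 if prev < 0.0 <= cur)
    return upward * sample_rate / len(samples)


def component(freq_hz, amplitude, phase):
    """Synthesize one sinusoidal component of the test signal."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE + phase)
            for i in range(NUM_SAMPLES)]


target = component(1500.0, 1.0, 0.3)      # stand-in for the F2 being tracked
interferer = component(3200.0, 1.5, 1.1)  # hypothetical stronger in-band component
mixture = [a + b for a, b in zip(target, interferer)]

clean_estimate = zero_crossing_frequency(target, SAMPLE_RATE)
corrupted_estimate = zero_crossing_frequency(mixture, SAMPLE_RATE)
print(clean_estimate, corrupted_estimate)
```

With the interferer present, the zero-crossing rate follows the stronger component, so the estimate should land near 3200 Hz rather than the 1500 Hz target. In an F0/F2 scheme an error of that size would select the wrong electrode.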

By the numbers

Case 15.5 · Tracking the Voice

A 1980s Nucleus recipient on an F0/F2 processor identifies many vowels by their pitch and openness but struggles badly with consonants such as /s/, /f/ and /th/.

Which design limitation of F0/F2 best accounts for the poor consonant perception?

Self-assessment · Module 5 · 2 questions

Question 1

In the F0/F2 strategy, what does the estimated F2 frequency control?

Question 2

How is F0 estimated in the F0/F2 strategy?
