CI Atlas · Speech-Coding Strategies: The Complete Lineage · Module 09

9. The n-of-m Family: SPEAK and the Rise of Peak Picking

The n-of-m idea — filter into m bands, stimulate only the n with the largest envelopes — became the backbone of Cochlear's strategies. SPEAK refined it with an adaptive number of selected channels, but its low stimulation rate, forced by the slow Nucleus 22 link, set up the move to ACE.

The n-of-m principle

n-of-m resembles CIS but adds a channel-selection step: in each frame the m channel envelopes are scanned and only the n with the highest amplitudes are stimulated. The n-of-m (peak-picking) approach traces to Wilson et al. 1988 and is related to peak-picking channel-vocoder methods. Dropping low-amplitude channels in each frame can reduce overall masking across the implanted cochlea, an 'unmasking' effect. A common textbook example is a 4-of-8 strategy.[1988][1998]
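The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not a clinical implementation: the function name `select_n_of_m`, the envelope values, and the choice to represent unselected channels as 0.0 are all assumptions made for the example.

```python
from typing import List


def select_n_of_m(envelopes: List[float], n: int) -> List[float]:
    """Keep the n largest of the m channel envelopes and zero the rest."""
    m = len(envelopes)
    if not 0 <= n <= m:
        raise ValueError("n must be between 0 and m")
    # Rank channel indices by envelope amplitude, largest first.
    ranked = sorted(range(m), key=lambda ch: envelopes[ch], reverse=True)
    keep = set(ranked[:n])
    # Unselected channels are not stimulated in this frame (shown here as 0.0).
    return [env if ch in keep else 0.0 for ch, env in enumerate(envelopes)]


# Hypothetical envelope amplitudes for one frame of a 4-of-8 strategy.
frame = [0.10, 0.80, 0.30, 0.05, 0.60, 0.20, 0.90, 0.40]
stimulated = select_n_of_m(frame, n=4)
print(stimulated)
```

With n equal to m, the function returns the frame unchanged, since no channel is dropped.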

8-10: spectral maxima typically selected per cycle by SPEAK (peak-picking) [1994]

+9.2: percentage-point SPEAK advantage on monosyllabic words in quiet [1994]

SPEAK: adaptive n

SPEAK is an adaptive n-of-m approach in which the number of selected channels varies from frame to frame, typically about 6 with a range of 1-10, depending on which envelopes exceed a noise threshold. SPEAK uses up to 20 analysis bands with envelope detectors at a 200 Hz cutoff. It was implemented on the Nucleus 22 (Spectra 22) system and validated by Skinner and colleagues in 1994. SPEAK activates 6 to 10 electrodes sequentially at an average rate of about 250 pps per activated electrode.[1994][1995]
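One way to picture the adaptive selection is the sketch below. It is an assumption-laden illustration rather than the actual SPEAK algorithm: the function name `select_adaptive`, the threshold value, and the rule for clamping the count to the 1-10 range are hypothetical choices made to mirror the description above.

```python
from typing import List


def select_adaptive(envelopes: List[float], noise_threshold: float,
                    max_n: int = 10, min_n: int = 1) -> List[int]:
    """Return indices of the selected channels for one frame.

    The number selected follows how many envelopes exceed the noise
    threshold, clamped to [min_n, max_n] (an assumed reading of "1-10").
    """
    m = len(envelopes)
    # Rank channel indices by envelope amplitude, largest first.
    ranked = sorted(range(m), key=lambda ch: envelopes[ch], reverse=True)
    above = [ch for ch in ranked if envelopes[ch] > noise_threshold]
    n = max(min_n, min(max_n, len(above)))
    # Report the chosen channels in ascending channel order.
    return sorted(ranked[:n])


# Hypothetical 20-band frames.
speech = [0.02] * 20
speech[3], speech[7], speech[12] = 0.5, 0.3, 0.2
quiet = [0.01] * 20
quiet[5] = 0.05
loud = [0.2 + 0.01 * i for i in range(20)]

print(select_adaptive(speech, noise_threshold=0.1))  # three channels exceed the threshold
print(select_adaptive(quiet, noise_threshold=0.1))   # falls back to the single largest band
print(select_adaptive(loud, noise_threshold=0.1))    # capped at ten channels
```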

The rate ceiling of SPEAK

SPEAK's cycle rate is only about 180-300 cycles/s (average ~250), constrained by the slow Nucleus 22 transcutaneous link. Including fewer electrodes per cycle allows higher rates, while including more electrodes lowers the rate. With a 200 Hz envelope cutoff, this rate is below the minimum needed to prevent aliasing (at least twice the cutoff, i.e. 400/s): at ~250/s, modulation frequencies above about 125 Hz are aliased. Pulsatile, non-simultaneous stimulation is retained, preserving the interleaving advantage.[1994][1995]
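The aliasing arithmetic can be checked with a short calculation. The sketch below assumes the pulse train behaves like ideal uniform sampling of the envelope at a fixed 250 pulses/s, which simplifies SPEAK's variable cycle rate; the function name `aliased_frequency` is hypothetical.

```python
def aliased_frequency(f_mod_hz: float, rate_pps: float) -> float:
    """Apparent frequency of a modulation component sampled at rate_pps.

    Components above rate_pps / 2 fold back below it.
    """
    f = f_mod_hz % rate_pps
    return min(f, rate_pps - f)


rate = 250.0          # average per-electrode rate from the text
nyquist = rate / 2    # highest modulation frequency represented without aliasing

for f_mod in (100.0, 150.0, 200.0):
    print(f_mod, "Hz appears at", aliased_frequency(f_mod, rate), "Hz")
```

Under these assumptions, a 200 Hz envelope component that passes the envelope detector would be represented at 50 Hz, which illustrates how components between 125 Hz and the 200 Hz cutoff are distorted.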

Why SPEAK gave way to ACE

SPEAK's low stimulation rate and the resulting aliasing distortions limited its temporal fidelity. Combining SPEAK-style spectral-maxima selection with a higher stimulation rate produced ACE. If n equals m, SPEAK and ACE reduce essentially to CIS. SPEAK remained useful for tonal-language and tone-perception studies but was generally surpassed by the higher-rate ACE.[1995][2001]

The n-of-m / peak-picking formalism and its descendants

An 'n-of-m' strategy divides the spectrum into m analysis bands and, each frame, stimulates only the n bands with the most energy, discarding the rest as a built-in noise-suppression mechanism (Skinner 1994). SPEAK selected roughly 6-10 of its ~20 bands per cycle; its noise advantage over fixed-feature MPEAK grew from +10 pp in quiet to ~+29 pp at +5 dB SNR for sentences (Skinner 1994). SPEAK assigns 4 electrodes to F1 frequencies versus 6-8 for MPEAK, which traded slightly better F1 coding in MPEAK for better r-color/duration/F2 coding in SPEAK (Skinner 1996; Skinner 1999). On phoneme identification SPEAK matched MPEAK on vowels (73.4% vs 72.3%) but transmitted more place information for consonants (76.2% vs 67.5%, p<0.001) (Skinner 1996; Skinner 1999). ACE is the high-rate n-of-m successor to SPEAK and outperformed it markedly in children, e.g. words in noise 84.4% (ACE) vs 43.3% (SPEAK) at +10 dB SNR (Pasanisi 2002).[1994][1996]

By the numbers

Case 15.9 · The n-of-m Family

A clinician notes that a Nucleus 22 SPEAK user has good speech-in-quiet but reports that music and voice pitch sound rough, and the research team attributes this to the strategy's timing.

Which SPEAK characteristic best explains the rough pitch percept?

Self-assessment · Module 9 · 2 questions

Question 1

What distinguishes SPEAK from a fixed n-of-m strategy?

Question 2

Why is SPEAK's per-channel stimulation rate so low (~250/s)?
