Cochlear Implant Atlas
CI Atlas · Speech-Coding Strategies: The Complete Lineage · Module 09

9 · The n-of-m Family: SPEAK and the Rise of Peak Picking

The n-of-m idea — filter into m bands, stimulate only the n with the largest envelopes — became the backbone of Cochlear's strategies. SPEAK refined it with an adaptive number of selected channels, but its low stimulation rate, forced by the slow Nucleus 22 link, set up the move to ACE.

The n-of-m principle

n-of-m is like CIS but adds a channel-selection step. In each frame the m channel envelopes are scanned, and only the n with the highest amplitudes are stimulated. The n-of-m / peak-picking approach traces to Wilson et al. (1988) and is related to the peak-picking methods used in channel vocoders. Deleting low-amplitude channels in each frame can reduce overall masking across the implanted cochlea, an 'unmasking' effect. A common textbook example is a 4-of-8 strategy. [1988][1998]
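The selection step can be sketched in a few lines of Python. This is a minimal illustration and not Cochlear's implementation. The function name `select_n_of_m`, the example envelope values, and the ascending-channel output order are all assumptions made for the sketch.

```python
from typing import List, Sequence


def select_n_of_m(envelopes: Sequence[float], n: int) -> List[int]:
    """Return the indices of the n largest envelopes in one analysis frame.

    Sketch of the fixed n-of-m selection step: the m envelope values for a
    frame are ranked and only the top n channels are kept. Indices come back
    in ascending channel order (an assumption; real processors also impose
    their own stimulation order).
    """
    m = len(envelopes)
    if not 0 <= n <= m:
        raise ValueError(f"n must be between 0 and m={m}, got {n}")
    # Rank channels by envelope amplitude, largest first (stable for ties)
    ranked = sorted(range(m), key=lambda k: envelopes[k], reverse=True)
    return sorted(ranked[:n])


# Hypothetical 8-channel frame (arbitrary units) for a 4-of-8 strategy
frame = [0.10, 0.80, 0.35, 0.05, 0.60, 0.20, 0.90, 0.15]
print(select_n_of_m(frame, 4))  # → [1, 2, 4, 6]
```

Setting n equal to m returns every channel. This is the sense in which an n-of-m strategy reduces to CIS.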

8-10: spectral maxima typically selected per cycle by SPEAK (peak-picking) [1994]
+9.2: percentage-point SPEAK advantage on monosyllabic words in quiet [1994]

SPEAK: adaptive n

SPEAK is an adaptive n-of-m approach. The number of selected channels varies from frame to frame (typically about 6, with a range of 1-10), depending on which envelopes exceed a noise threshold. SPEAK uses up to 20 analysis bands, with envelope detectors at a 200 Hz cutoff. It was implemented on the Nucleus 22 (Spectra 22) system and validated by Skinner and colleagues in 1994. SPEAK activates 6 to 10 electrodes sequentially at an average rate of about 250 pps per activated electrode. [1994][1995]
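A band envelope of the kind described above can be sketched as full-wave rectification followed by low-pass smoothing. The sketch assumes a first-order (one-pole) low-pass filter at the 200 Hz cutoff and a hypothetical 16 kHz sampling rate. The actual SPEAK detector design is not given here, so the filter choice and the `envelope` helper are illustrative only.

```python
import math
from typing import List, Sequence


def envelope(band_signal: Sequence[float], fs: float, cutoff_hz: float = 200.0) -> List[float]:
    """Envelope sketch: full-wave rectify, then one-pole low-pass at cutoff_hz.

    Assumption: a first-order smoother stands in for whatever detector the
    real processor uses. It attenuates content above the cutoff but does
    not remove it.
    """
    # Smoothing coefficient of a one-pole low-pass with the given cutoff
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out: List[float] = []
    y = 0.0
    for x in band_signal:
        y += alpha * (abs(x) - y)  # abs(x) is the full-wave rectified input
        out.append(y)
    return out


# Hypothetical band signal: a 1 kHz tone of amplitude 1, 0.1 s at 16 kHz
FS = 16_000.0
tone = [math.sin(2 * math.pi * 1000 * k / FS) for k in range(1600)]
env = envelope(tone, FS)
tail = env[-400:]  # last 25 ms, after the filter has settled
print(round(sum(tail) / len(tail), 2))  # near the mean of a rectified unit sine (2/pi ≈ 0.64)
```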

Figure: Only bands above threshold fire, so n varies (schematic with an adjustable threshold).

SPEAK is the spectral-maxima idea made adaptive. Rather than always taking a fixed count, it selects the bands whose energy stands above the spectral floor, so the number of stimulated electrodes n rises and falls with the sound (typically 6–10 of 20). A broadband sound lights many electrodes; a pure tone lights only a few. In the figure, raising the threshold stands in for that moment-to-moment adaptation. Schematic.
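The adaptive rule can be sketched by replacing the fixed count with a threshold test. In this sketch the threshold value, the cap `n_max = 10` (taken from the 1-10 range quoted for SPEAK), and the 20-band example frames are assumptions. How SPEAK actually sets its noise floor is not modeled.

```python
from typing import List, Sequence


def select_adaptive(envelopes: Sequence[float], threshold: float, n_max: int = 10) -> List[int]:
    """SPEAK-style adaptive selection (sketch, hypothetical helper).

    Keeps every band whose envelope exceeds `threshold`. If more than
    `n_max` bands qualify, only the n_max largest are kept. Indices are
    returned in ascending order, so n = len(result) varies from frame to frame.
    """
    above = [k for k, e in enumerate(envelopes) if e > threshold]
    # Strongest qualifying bands first, then cap the count at n_max
    above.sort(key=lambda k: envelopes[k], reverse=True)
    return sorted(above[:n_max])


# Hypothetical 20-band frames (arbitrary units)
broadband = [0.5 + 0.01 * k for k in range(20)]         # energy in every band
pure_tone = [0.02] * 20                                 # near-silent floor...
pure_tone[6], pure_tone[7], pure_tone[8] = 0.4, 1.0, 0.4  # ...plus one peak and its skirts

print(len(select_adaptive(broadband, threshold=0.1)))  # → 10 (capped at n_max)
print(select_adaptive(pure_tone, threshold=0.1))       # → [6, 7, 8]
```

The broadband frame fills the cap, while the tone-like frame selects only three bands, mirroring the "many versus few electrodes" behavior described above.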

The rate ceiling of SPEAK

SPEAK's cycle rate is only about 180-300 cycles/s (average ~250), constrained by the slow Nucleus 22 transcutaneous link. Including fewer electrodes per cycle allows higher rates, while including more electrodes lowers the rate. This low rate falls below the minimum needed to sample a 200 Hz envelope without aliasing (twice the cutoff, i.e. 400/s), so at ~250/s, modulation frequencies above about 125 Hz are aliased. Pulsatile, non-simultaneous stimulation is retained, preserving the interleaving advantage. [1994][1995]
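The aliasing claim is sampling arithmetic. If each channel's pulse train is treated as sampling its envelope at about 250/s, the Nyquist limit is 125 Hz, and a 200 Hz envelope component folds down to 50 Hz. The helper name `alias_frequency` is hypothetical, and the sketch ignores rate variation and the exact pulse timing of a real processor.

```python
import math


def alias_frequency(f_hz: float, rate_hz: float) -> float:
    """Apparent frequency of a component at f_hz sampled at rate_hz
    (folding about the nearest multiple of the sampling rate)."""
    return abs(f_hz - rate_hz * round(f_hz / rate_hz))


rate = 250.0             # SPEAK's average per-channel rate, pulses/s
envelope_cutoff = 200.0  # SPEAK's envelope cutoff, Hz

print(2 * envelope_cutoff)           # minimum alias-free rate → 400.0
print(rate / 2)                      # Nyquist limit at 250/s → 125.0
print(alias_frequency(100.0, rate))  # below Nyquist, unchanged → 100.0
print(alias_frequency(200.0, rate))  # 200 Hz modulation folds to → 50.0

# At 250 samples/s a 200 Hz cosine yields the same samples as a 50 Hz one
same = all(
    math.isclose(math.cos(2 * math.pi * 200 * k / rate),
                 math.cos(2 * math.pi * 50 * k / rate), abs_tol=1e-9)
    for k in range(50)
)
print(same)  # → True
```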

Why SPEAK gave way to ACE

SPEAK's low stimulation rate, and the aliasing distortions it caused, limited its temporal fidelity. Combining SPEAK-style spectral-maxima selection with a higher stimulation rate produced ACE. If n equals m, SPEAK/ACE reduce essentially to CIS. SPEAK remained useful in tonal-language and tone-perception studies but was generally surpassed by higher-rate ACE. [1995][2001]

Figure: More electrodes per cycle → lower rate each (schematic with a slider from 4 to 16 electrodes per cycle; example readout: 1800 pps per channel, 4.5× the 400 Hz cutoff, so the rate stays comfortably above the envelope requirement).

A cochlear implant's radio link can only carry so many pulses per second in total. Spread that fixed budget over more electrodes per cycle and each electrode's rate falls. SPEAK on the Nucleus 22 chose broad spectral coverage at a modest ~250 pps per channel; the cost was a lower rate that samples the envelope less finely — the very gap that ACE was built to close by pushing the rate back up. Schematic.
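The fixed-budget trade-off is a single division. The 7200 pps total below is hypothetical, chosen only so that 4 electrodes per cycle reproduce the figure's 1800 pps readout. It is not a Nucleus 22 or ACE specification, and the sketch assumes one pulse per selected electrode per cycle.

```python
def per_channel_rate(total_pps: float, electrodes_per_cycle: int) -> float:
    """Per-electrode rate when a fixed total pulse budget is shared by the
    electrodes stimulated each cycle (assumes one pulse per electrode per cycle)."""
    if electrodes_per_cycle < 1:
        raise ValueError("need at least one electrode per cycle")
    return total_pps / electrodes_per_cycle


TOTAL_PPS = 7200.0  # hypothetical link budget, not a device specification
for n in (4, 8, 12, 16):
    print(f"{n:2d} electrodes/cycle -> {per_channel_rate(TOTAL_PPS, n):6.0f} pps per channel")
```

Quadrupling the electrodes per cycle from 4 to 16 cuts each channel's rate by the same factor, from 1800 to 450 pps.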

The n-of-m / peak-picking formalism and its descendants

An 'n-of-m' strategy divides the spectrum into m analysis bands and, in each frame, stimulates only the n bands with the most energy, discarding the rest as a built-in noise-suppression mechanism (Skinner 1994). SPEAK selected roughly 6-10 of its ~20 bands per cycle; its advantage over fixed-feature MPEAK for sentences grew from +10 pp in quiet to ~+29 pp at +5 dB SNR (Skinner 1994). SPEAK assigns 4 electrodes to F1 frequencies versus 6-8 for MPEAK; this gave MPEAK slightly better F1 coding, while SPEAK coded r-coloring, duration, and F2 better (Skinner 1996; Skinner 1999). On phoneme identification, SPEAK matched MPEAK on vowels (73.4% vs 72.3%) but transmitted more place information for consonants (76.2% vs 67.5%, p<0.001) (Skinner 1996; Skinner 1999). ACE, the high-rate n-of-m successor to SPEAK, markedly outperformed it in children, e.g. 84.4% (ACE) vs 43.3% (SPEAK) for words in noise at +10 dB SNR (Pasanisi 2002). [1994][1996]
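The selection rules and the rate constraint discussed in this module can be written compactly. The notation is ours, not Skinner's: e_k(t) is the envelope of band k in frame t, θ a noise floor, n_max a cap on the selected count, R_tot a fixed total pulse budget, and f_c the envelope cutoff.

```latex
% Fixed n-of-m: choose the n of m bands with the largest envelopes in frame t
S_t = \operatorname*{arg\,max}_{S \subseteq \{1,\dots,m\},\ |S| = n} \ \sum_{k \in S} e_k(t)

% Adaptive n (SPEAK-style sketch): bands above the floor, capped at n_max
A_t = \{\, k : e_k(t) > \theta \,\}, \qquad n_t = \min\bigl(|A_t|,\ n_{\max}\bigr)
S_t = \operatorname*{arg\,max}_{S \subseteq A_t,\ |S| = n_t} \ \sum_{k \in S} e_k(t)

% Per-channel rate under a fixed budget, and the alias-free condition
r_t = \frac{R_{\mathrm{tot}}}{n_t}, \qquad r_t \ge 2 f_c
```

Setting n = m makes S_t the full set {1, ..., m}, which is CIS. With SPEAK's f_c = 200 Hz, the last condition requires r_t ≥ 400/s, which its ~250/s average rate does not meet.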

By the numbers

Peak-Picking Pays Off in Noise: SPEAK vs MPEAK Across SNRs

Chart: percent correct (sentences, 0-80% axis) by listening condition (Quiet, +15 dB SNR, +10 dB SNR, +5 dB SNR) for SPEAK (n-of-m peak-picking) and MPEAK (fixed feature). At +5 dB SNR: SPEAK 60.4%, MPEAK 31.7%.

The 'n-of-m' family stimulates only the n highest-amplitude of m analysis channels each frame. This peak-picking shows its value most as noise rises: SPEAK degrades gently while fixed-feature MPEAK collapses, the two curves diverging from a 10 pp gap in quiet to ~29 pp at +5 dB SNR. Verified sentence means from Skinner 1994 (n=63).

Case 14.9 · The n-of-m Family
A clinician notes that a Nucleus 22 SPEAK user has good speech-in-quiet but reports that music and voice pitch sound rough, and the research team attributes this to the strategy's timing.

Which SPEAK characteristic best explains the rough pitch percept?

Self-assessment: Module 9 · 2 questions
Question 1

What distinguishes SPEAK from a fixed n-of-m strategy?

Question 2

Why is SPEAK's per-channel stimulation rate so low (~250/s)?
