Cochlear Implant Atlas
CI Atlas · Speech-Coding Strategies: The Complete Lineage · Module 08

SMSP: Letting the Spectrum Choose

The Spectral Maxima Sound Processor abandoned explicit feature extraction entirely. It analysed the spectrum with a 16-filter bank and transmitted only the six largest spectral maxima — no F0, no F1, no formant tracking. This data-driven 6-of-16 design is the bridge between the feature-extraction era and the modern n-of-m family.

The SMSP principle

SMSP is an n-of-m spectral-maxima strategy (6-of-16) that extracts no explicit features, such as F0 or F1, from the waveform. It analyses the spectrum with a 16-filter bank and transmits the six largest spectral maxima. It was developed in the early 1990s for the Nucleus multi-electrode cochlear implant. Unlike MPEAK, it makes no assumptions about which spectral peaks are formants.[1993][2006]
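The selection rule itself is small. Below is a minimal Python sketch, assuming each band's envelope level is already available as a number. The function name `select_maxima` and the lower-index tie-break are illustrative choices, not part of any Nucleus specification.

```python
def select_maxima(band_levels, n=6):
    """Return the channel indices of the n largest band levels, listed from low to high channel.

    band_levels holds one envelope amplitude per band (m = 16 for SMSP).
    Equal levels are resolved in favour of the lower channel index (an illustrative choice).
    """
    if not 0 < n <= len(band_levels):
        raise ValueError("n must be between 1 and the number of bands")
    # sorted() is stable even with reverse=True, so tied channels keep their index order.
    ranked = sorted(range(len(band_levels)), key=lambda ch: band_levels[ch], reverse=True)
    # Keep the n strongest, then report them in channel order.
    return sorted(ranked[:n])


levels = [0.10, 0.90, 0.30, 0.80, 0.05, 0.70, 0.20, 0.60,
          0.40, 0.50, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65]
print(select_maxima(levels))  # [1, 3, 5, 7, 14, 15]
```

Because `n` is a parameter, the same rule covers any n-of-m configuration; SMSP is the case n = 6, m = 16.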

60.4% · SPEAK sentences correct at +5 dB SNR, vs 31.7% for MPEAK [1994]
63 · Adults in the 17-week ABAB SPEAK-vs-MPEAK field study [1994]

The 6-of-16 chain

The chain is: speech → a bank of 16 band-pass filters with centre frequencies spanning 250-5400 Hz → per-band envelope extraction by rectification and a 200 Hz low-pass filter → a spectral-maxima detector that selects the six largest of the 16 outputs every 4 ms → logarithmic compression of the six selected amplitudes → radio transmission to the six corresponding electrodes.[1993][2006]
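The chain can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the Nucleus implementation. Only the 16 bands, the 250-5400 Hz span, the 200 Hz cutoff, the 6-of-16 selection and the 4 ms interval come from the text. The 16 kHz sampling rate, the log spacing of centre frequencies, the RBJ-cookbook biquad band-pass with Q = 4, the first-order envelope low-pass, and the `log_compress` curve with its steepness `c` are all hypothetical. Radio transmission is not modelled.

```python
import math

# From the module: 16 bands over 250-5400 Hz, 200 Hz envelope cutoff, 6 maxima every 4 ms.
# Everything else here is an illustrative assumption.
FS = 16_000        # sampling rate in Hz (assumption)
N_BANDS = 16       # m: analysis bands
N_MAXIMA = 6       # n: maxima stimulated per frame
FRAME_S = 0.004    # selection interval: 4 ms
LPF_HZ = 200.0     # envelope low-pass cutoff
Q = 4.0            # band-pass quality factor (hypothetical)


def centre_frequencies(lo=250.0, hi=5400.0, m=N_BANDS):
    """Log-spaced centre frequencies from lo to hi (the spacing rule is an assumption)."""
    ratio = (hi / lo) ** (1.0 / (m - 1))
    return [lo * ratio ** k for k in range(m)]


def bandpass(x, f0, fs=FS, q=Q):
    """RBJ-cookbook biquad band-pass with 0 dB peak gain, run in direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this band-pass
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(x, fc=LPF_HZ, fs=FS):
    """Full-wave rectification, then a first-order low-pass at fc (filter order is assumed)."""
    a = 1 - math.exp(-2 * math.pi * fc / fs)
    y, out = 0.0, []
    for s in x:
        y += a * (abs(s) - y)
        out.append(y)
    return out


def log_compress(level, c=100.0):
    """Map a level clipped to [0, 1] onto [0, 1] with a logarithmic curve (c is hypothetical)."""
    level = min(max(level, 0.0), 1.0)
    return math.log1p(c * level) / math.log1p(c)


def smsp_frames(x, fs=FS):
    """Yield one frame per 4 ms: a list of (channel, compressed level) for the 6 largest envelopes."""
    envs = [envelope(bandpass(x, f0, fs), fs=fs) for f0 in centre_frequencies()]
    hop = int(round(FRAME_S * fs))                # 64 samples at 16 kHz
    for end in range(hop, len(x) + 1, hop):
        levels = [env[end - 1] for env in envs]   # sample each envelope at the frame's last sample
        ranked = sorted(range(N_BANDS), key=lambda ch: levels[ch], reverse=True)
        yield [(ch, log_compress(levels[ch])) for ch in sorted(ranked[:N_MAXIMA])]


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 1000 * k / FS) for k in range(1600)]  # 100 ms, 1 kHz
    print(list(smsp_frames(tone))[-1])
```

Fed a steady 1000 Hz tone, the strongest selected channel should be the one centred nearest 1000 Hz (channel 7, about 1049 Hz under the log-spacing assumption). Because the compression curve is monotonic, it changes the delivered levels but never which six channels are chosen.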

[Figure: Every 4 ms, pick the 6 tallest of 16 bands; frequency axis runs from low to high; 6 maxima → 6 electrodes]

The Spectral Maxima Sound Processor threw out the formant model entirely. It analyses the sound in 16 bands, and every 4 ms it simply selects the six with the most energy and stimulates their electrodes — no voiced/unvoiced decision, no formant estimate to get wrong. Because peaks of energy are robust even in noise, this data-driven selection proved far steadier than feature tracking, and it became the template for SPEAK and ACE. Schematic.

Robustness over feature extraction

By selecting the largest spectral peaks rather than estimating formants, SMSP is robust to the formant-extraction errors that degraded MPEAK in noise. Typical clinical stimulation rates for SMSP ranged from 250 pps to 1800 pps. The pure-spectral approach foreshadowed the n-of-m family (SPEAK, ACE) that became commercially dominant, and strategies based on spectral signal analysis outperformed explicit speech-feature extraction overall.[2006][1993]
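A quick check on the rate figures: if each selected electrode receives exactly one pulse per 4 ms frame (an assumption about pulse scheduling that the module does not spell out), the per-electrode rate follows directly from the frame length. The helper names below are hypothetical.

```python
def smsp_rates(frame_ms=4.0, n_maxima=6):
    """Return (frames/s, pulses/s per selected electrode, total pulses/s across the array).

    Assumes exactly one pulse per selected electrode in every frame.
    """
    frames_per_s = 1000.0 / frame_ms
    per_electrode_pps = frames_per_s          # a channel picked every frame is pulsed once per frame
    aggregate_pps = frames_per_s * n_maxima   # n_maxima pulses are delivered in each frame
    return frames_per_s, per_electrode_pps, aggregate_pps


def frame_ms_for_rate(per_electrode_pps):
    """Frame length in ms that would give the requested per-electrode rate, same assumption."""
    return 1000.0 / per_electrode_pps


print(smsp_rates())             # (250.0, 250.0, 1500.0)
print(frame_ms_for_rate(1800))  # roughly 0.556 ms
```

Under this assumption, 250 pps per electrode is consistent with the 4 ms selection interval, while the 1800 pps end of the quoted range would correspond to frames of roughly 0.56 ms. The module does not say how those higher rates were scheduled, so treat the inversion as arithmetic only.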

Where SMSP sits in the lineage

SMSP is conceptually the direct ancestor of the n-of-m (spectral-peak) family: its idea of picking n maxima of m bands generalises directly into SPEAK and ACE. In commercial use it was superseded by those higher-rate, more flexible spectral-maxima strategies and by CIS implementations within the Nucleus and other systems. SMSP demonstrated that throwing away the feature-extraction model improved noise robustness.[2006][1993]

[Figure: Same noisy input, peak-picking vs formant tracking; legend: SMSP (spectral maxima), MPEAK (formant estimate)]

Feed both strategies the same signal and raise the noise. MPEAK must estimate formants, so its output (red) becomes erratic as the tracker locks onto noise. SMSP only asks which bands carry the most energy — a question that stays answerable in noise — so its selection (green) holds steady. This robustness, not raw clarity in quiet, is why data-driven peak-picking displaced feature extraction. Schematic.
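The claim that "which bands carry the most energy" stays answerable in noise can be made concrete for the SMSP side only. A minimal sketch, assuming a toy 16-band spectrum with six strong bands at level 1.0 and ten weak bands at 0.1, plus independent uniform noise on each band (all hypothetical); it does not model MPEAK's formant tracker. The reasoning step: if every band's noise stays below half the gap between the weakest picked band and the strongest rejected one, here (1.0 - 0.1) / 2 = 0.45, no rejected band can overtake a picked one, so the six picks cannot change.

```python
import random


def select_top(levels, n=6):
    """Set of channel indices holding the n largest levels (the 6-of-16 peak-picking rule)."""
    ranked = sorted(range(len(levels)), key=lambda ch: levels[ch], reverse=True)
    return set(ranked[:n])


def selection_overlap(clean, noise_amp, n=6, seed=0):
    """Count how many clean-spectrum picks survive uniform noise in [-noise_amp, noise_amp] per band."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    noisy = [lvl + rng.uniform(-noise_amp, noise_amp) for lvl in clean]
    return len(select_top(clean, n) & select_top(noisy, n))


# Hypothetical toy spectrum: six strong bands spread across 16 channels, ten weak ones.
CLEAN = [0.1] * 16
for ch in (1, 3, 6, 8, 11, 14):
    CLEAN[ch] = 1.0

for amp in (0.2, 0.4, 0.8, 1.5):
    # Below 0.45 the overlap is guaranteed to be 6; above it, picks may start to swap.
    print(amp, selection_overlap(CLEAN, amp))
```

The selection only degrades once noise is comparable to the spectral contrast itself, which is the sense in which peak-picking "holds steady".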

By the numbers

[Chart: SPEAK (Spectral Maxima) vs MPEAK: Sentences in Quiet and Noise. Percent correct (sentences) by listening condition: Quiet, +15, +10 and +5 dB SNR. At +5 dB SNR: SPEAK 60.4%, MPEAK 31.7%.]

SPEAK (the spectral-maxima descendant of SMSP) picks the 6-10 most energetic spectral bands each cycle and stimulates only those, discarding the rest. Its advantage over fixed-feature MPEAK widens dramatically in noise: from +10 pp in quiet to roughly +23-29 pp across +15 to +5 dB SNR, where peak-picking suppresses noise-dominated channels. Verified means from Skinner 1994 (n=63).
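Only the +5 dB SNR means are quoted numerically here, and they place the gap at the top of that range:

$$
\Delta_{+5\,\mathrm{dB}} = 60.4\% - 31.7\% = 28.7\ \text{percentage points}, \qquad \frac{60.4}{31.7} \approx 1.91
$$

So at +5 dB SNR the mean SPEAK score was roughly 1.9 times the mean MPEAK score.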

Case 14.8 · SMSP
A research processor in the early 1990s analyses speech with sixteen band-pass filters but stimulates only six electrodes at a time, choosing them anew every few milliseconds, and never tries to label any peak as a formant.

Which strategy is being described, and what is its defining advantage over MPEAK?

Self-assessment · Module 8 · 2 questions
Question 1

What is the n-of-m configuration of the SMSP strategy?

Question 2

What does SMSP deliberately NOT extract that MPEAK did?
