Cochlear Implant Atlas
CI Atlas · From Sound to Stimulation · Module 05

5 · Envelope, fine structure & the vocoder

Once the filter bank has split sound into bands, the processor makes its boldest move. From each band it keeps only the slow rise and fall of amplitude (the envelope) and throws away the fast oscillation underneath it (the temporal fine structure). This is the single most consequential decision in cochlear-implant coding, and it has a name and a pedigree: the channel vocoder, a speech-transmission idea from the 1930s. It works astonishingly well for speech, as a landmark experiment showed: listeners understood sentences from the envelopes of just a few bands. But the discarded fine structure is precisely where pitch and music live, so the same trade-off that makes speech easy makes music hard. This module is about that bargain and why the implant strikes it.

T · Envelope and fine structure

Any band of sound can be split into two parts: a slowly varying envelope (how loud the band is, moment to moment) and a fast fine structure (the rapid carrier oscillation inside it). The envelope carries the rhythm and timing of speech; the fine structure carries much of the pitch and the cues for separating sounds. The implant keeps the first and discards the second.
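A minimal sketch of this split, assuming NumPy is available. A band signal is written as x(t) = A(t) cos φ(t): the envelope A(t) is the magnitude of the analytic signal x + jH{x} (computed here with an FFT-based Hilbert transform), and the fine structure is cos φ(t). The function names and the test tone are illustrative assumptions, not code from any implant processor; some processors estimate the envelope by rectification and low-pass filtering instead.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal x + j*H{x} via the FFT: drop negative frequencies, double positive ones."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def envelope_and_fine_structure(x):
    """Split a band signal into its envelope A(t) and fine structure cos(phi(t))."""
    z = analytic_signal(x)
    envelope = np.abs(z)                  # slow amplitude contour: what the implant keeps
    fine_structure = np.cos(np.angle(z))  # unit-amplitude fast carrier: what it discards
    return envelope, fine_structure


# Illustrative band: a 1 kHz carrier whose amplitude rises and falls 4 times a second.
fs = 16000
t = np.arange(fs) / fs
true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = true_env * np.cos(2 * np.pi * 1000 * t)

env, tfs = envelope_and_fine_structure(x)
```

Because the 4 Hz modulation sits far below the 1 kHz carrier, `env` recovers `true_env` and `env * tfs` rebuilds `x`. An implant transmits only the `env` half of that product.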

Figure: Keep the slow envelope, drop the fast fine structure: the implant's great trade-off (schematic; grey = fine structure, discarded; blue = envelope, kept).

Shannon's vocoder experiments showed that the envelopes of just a few bands carry enough information for good speech recognition. But the discarded fine structure is exactly where the cues for pitch, melody and talker identity live, which is why music and speech in noise remain hard. The whole device rests on the bet that envelope is enough.

C · The vocoder model

The conceptual model is the channel vocoder: split sound into bands, extract each band's envelope, and use those envelopes to modulate a set of carriers. In the implant, the carriers are trains of electrical pulses, one per electrode. A cochlear implant is, in effect, a vocoder whose carriers are delivered through electrodes in the cochlea rather than played as sound. This is also why acoustic vocoder simulations (noise- or tone-vocoded speech) are used to model implant hearing in normal-hearing listeners: they keep the same band envelopes and change only the carrier.
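The split, extract and re-modulate chain can be sketched end to end, assuming NumPy. Every parameter here is an illustrative assumption rather than the setting of any published simulation or device: four log-spaced bands from 100 Hz to 7 kHz, brick-wall FFT filters, a 160 Hz envelope low-pass, and the function names themselves. `carrier="noise"` gives noise-vocoded speech; `carrier="tone"` puts a sine at each band's geometric centre.

```python
import numpy as np


def bandpass(x, fs, lo, hi):
    """Zero-phase 'brick-wall' band-pass: zero every FFT bin outside [lo, hi) Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def band_envelope(band, fs, cutoff=160.0):
    """Hilbert-magnitude envelope, smoothed so only modulations below `cutoff` Hz remain."""
    n = len(band)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    env = np.abs(np.fft.ifft(np.fft.fft(band) * weights))
    env = bandpass(env, fs, 0.0, cutoff)  # low-pass: keep only the slow rise and fall
    return np.maximum(env, 0.0)           # smoothing can dip slightly below zero


def vocode(x, fs, n_bands=4, f_lo=100.0, f_hi=7000.0, carrier="noise", seed=0):
    """Channel vocoder: split into bands, keep each envelope, re-impose it on a new carrier."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced band edges
    t = np.arange(len(x)) / fs
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = band_envelope(bandpass(x, fs, lo, hi), fs)
        if carrier == "tone":
            c = np.sqrt(2) * np.sin(2 * np.pi * np.sqrt(lo * hi) * t)  # unit-RMS sine at band centre
        elif carrier == "noise":
            c = bandpass(rng.standard_normal(len(x)), fs, lo, hi)      # band-limited noise
            c /= np.sqrt(np.mean(c ** 2))                              # scale to unit RMS
        else:
            raise ValueError("carrier must be 'noise' or 'tone'")
        out += bandpass(env * c, fs, lo, hi)  # keep the modulated carrier inside its band
    return out, edges
```

For example, feeding in a 1 kHz tone with a 4 Hz amplitude modulation and `carrier="tone"` gives an output whose spectral peak sits at the centre of the band containing 1 kHz (about 1.42 kHz with these edges), not at 1 kHz. The envelope survives; the original fine structure does not.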

Figure: What the channels do to speech: fine spectrum coarsened into a few bands (schematic; time runs left to right).

Full-resolution speech has fine, sweeping spectral detail. Vocoding it into a handful of bands, as the implant effectively does, flattens that detail into a few broad stripes of envelope over time. Remarkably, even 4 bands keep speech intelligible in quiet (Shannon et al., 1995), because the coarse formant pattern survives; but the lost detail is exactly what the ear would use for pitch, music and separating voices in noise.

C · Why envelope is (mostly) enough

It is not obvious that throwing away the fine structure should leave speech intelligible, but it does. In a landmark study (Shannon et al., 1995), listeners recognised sentences from the envelopes of as few as four bands, showing that the temporal envelope across a handful of channels carries most of the information speech needs in quiet. That result is the empirical licence for the whole approach: it is why a device that discards so much can still deliver open-set speech recognition.

C · What it costs

The bill comes due elsewhere. Pitch, melody, talker identity and the subtle cues that let us follow one voice among many live largely in the fine structure — so discarding it is why implant users find music thin and speech in noise hard. Much of the rest of the chapter — fine-structure coding, current focusing — is an attempt to give back a little of what this step takes away, without losing the robustness that made envelope coding work.

Figure: Same envelope, different carrier: noise, tone, or the implant's pulses (gold = the shared envelope).

In the implant the carrier is a train of electrical pulses on each electrode, amplitude-modulated by the band envelope. This is the real output of the device.
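A minimal sketch of the figure's point, assuming NumPy: one envelope multiplied by three interchangeable carriers. The pulse rate (900 pulses per second by default, 1000 in the example), the 40 µs phase width and the function names are illustrative assumptions, not the settings of any device. Real processors add further stages, such as mapping each envelope into the electrode's current range, that are omitted here.

```python
import numpy as np


def pulse_train(n, fs, rate_pps=900.0, phase_samples=4):
    """Unit biphasic pulse train: a negative (cathodic) phase immediately followed by an
    equal positive (anodic) phase, so each pulse carries zero net charge.
    Assumes rate_pps is low enough that consecutive pulses do not overlap."""
    carrier = np.zeros(n)
    onsets = np.round(np.arange(0, n, fs / rate_pps)).astype(int)
    for k in onsets:
        carrier[k:k + phase_samples] = -1.0
        carrier[k + phase_samples:k + 2 * phase_samples] = 1.0
    return carrier, onsets


def carriers(n, fs, centre_hz, rate_pps=900.0, seed=0):
    """Three carriers for the same band envelope (names are hypothetical)."""
    t = np.arange(n) / fs
    pulses, _ = pulse_train(n, fs, rate_pps)
    return {
        "noise": np.random.default_rng(seed).standard_normal(n),  # acoustic vocoder simulation
        "tone": np.sin(2 * np.pi * centre_hz * t),                 # acoustic vocoder simulation
        "pulses": pulses,                                          # what the implant delivers
    }


fs = 100_000                      # fine time grid: a 40 us phase spans 4 samples
n = fs // 10                      # 100 ms of signal
t = np.arange(n) / fs
env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)   # the shared band envelope
outputs = {name: env * c
           for name, c in carriers(n, fs, centre_hz=1000.0, rate_pps=1000.0).items()}
```

In the pulse case each pulse's amplitude is simply the envelope value at that moment, so the pulse train samples the envelope and carries nothing of the band's original fine structure.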

Case 8.5 · Speech from four bands
A student is surprised to learn that normal-hearing listeners can understand sentences from noise-vocoded speech carrying only the envelopes of four frequency bands, and asks what this implies for cochlear implants.

What is the correct implication?

Self-assessment · Module 5 · 2 questions
Question 1 · Trainee

What does envelope extraction keep and discard?

Question 2 · Clinician

What is the cochlear implant best modelled as, in signal-processing terms?
