2 Sound & acoustics
Hearing begins not in the ear but in the air. Before any physiology happens, there is a physical signal — a travelling disturbance of air pressure — with properties the ear has evolved to measure: how fast it oscillates, how far, and how those oscillations are mixed across frequency and time. Get these few acoustic ideas straight and the rest of the chapter falls into place, because everything the cochlea, the nerve, and ultimately a cochlear implant does is an attempt to capture and re-represent this signal.
What sound is
Sound is a pressure wave. When something vibrates — a vocal fold, a guitar string, a loudspeaker cone — it pushes and pulls on the air next to it, alternately compressing and rarefying it. Each air molecule only jostles a tiny distance back and forth, but it collides with its neighbours, and the disturbance propagates outward as a wave. What travels is the pattern of pressure change, not the air itself; the molecules stay roughly where they were.[2012]
Two consequences follow immediately. First, sound needs a medium — there is no sound in a vacuum. Second, what the ear ultimately has to measure is a fluctuating pressure at one point in space: the eardrum sits in the path of the wave and moves in and out as the pressure rises and falls. Everything downstream is the nervous system's reading of that one wiggling membrane.
Frequency & amplitude: the roots of pitch and loudness
The simplest sound is a pure tone: a single sinusoidal oscillation, like an idealised tuning fork. It has just two properties, and they map onto the two most basic perceptual qualities:
- Frequency — how many pressure cycles occur each second, measured in hertz (Hz). Frequency is the main physical correlate of pitch: more cycles per second, higher pitch. The healthy young human ear responds from roughly 20 Hz to 20,000 Hz.
- Amplitude — how large the pressure swing is. Amplitude is the main physical correlate of loudness: bigger swings, louder sound.
Raising frequency packs more cycles into the same window of time; raising amplitude makes the wave taller. The period (the time for one cycle) is simply the inverse of frequency.
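A pure tone is easy to generate numerically, which makes the two properties concrete. The following is a minimal sketch: the function names, the 8000 Hz sample rate, and the example values are illustrative assumptions, not taken from this module.

```python
import math

def pure_tone(frequency_hz, amplitude, duration_s, sample_rate_hz=8000):
    """Return samples of amplitude * sin(2*pi*f*t), one per sampling instant."""
    n_samples = round(duration_s * sample_rate_hz)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

def period_s(frequency_hz):
    """Period (time for one cycle) is the inverse of frequency."""
    return 1.0 / frequency_hz

# Illustrative values: a 100 Hz tone, amplitude 0.5 (arbitrary units), 20 ms long.
tone = pure_tone(frequency_hz=100, amplitude=0.5, duration_s=0.02)
print(len(tone))        # number of samples in the 20 ms window
print(max(tone))        # peak value equals the amplitude
print(period_s(100))    # one cycle of a 100 Hz tone lasts 0.01 s
```

Doubling `frequency_hz` halves the period and fits twice as many cycles into the same 20 ms; doubling `amplitude` scales every sample without changing the timing.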
Measuring level: the decibel
The ear copes with an astonishing range of sound pressures: the loudest tolerable sound has around a million times the pressure amplitude of the faintest audible one. A linear scale would be unwieldy, so sound level is expressed logarithmically in decibels of sound pressure level (dB SPL), referenced to 20 micropascals. That reference is about the quietest sound a healthy young ear can detect, which is therefore defined as 0 dB SPL.[2012]
Because the scale is logarithmic, equal steps in decibels are equal ratios in pressure: every 20 dB is a tenfold change in sound pressure. This compresses the enormous physical range into the familiar 0–120 dB span.
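The definition can be written as L = 20 · log10(p / p0), with p0 = 20 micropascals. A short sketch of that conversion follows; the function name and the sample pressures are illustrative choices.

```python
import math

P_REF_PA = 20e-6  # reference pressure: 20 micropascals, defined as 0 dB SPL

def pressure_to_db_spl(pressure_pa):
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

# Each tenfold step in pressure adds 20 dB; a million-fold step adds 120 dB.
for pressure_pa in (20e-6, 200e-6, 2e-3, 20.0):
    print(f"{pressure_pa:g} Pa -> {pressure_to_db_spl(pressure_pa):.0f} dB SPL")
```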
The whole audiogram — the map of a person's hearing thresholds — is plotted in decibels (dB HL, a clinically referenced cousin of dB SPL). A “profound” hearing loss of 90 dB does not mean sound is 90 units quieter; it means thresholds are elevated by a factor of tens of thousands in pressure. The logarithmic scale is why small-looking numbers on an audiogram describe very large losses.
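Going the other way, from a decibel difference back to a pressure ratio, shows how quickly the numbers grow. This is a minimal sketch; the function name and the list of example threshold shifts (other than the 90 dB case) are illustrative.

```python
def db_to_pressure_ratio(level_db):
    """Pressure ratio corresponding to a level difference in decibels."""
    return 10 ** (level_db / 20)

# A threshold shift in dB corresponds to a multiplicative change in pressure.
for shift_db in (20, 40, 60, 90):
    ratio = db_to_pressure_ratio(shift_db)
    print(f"{shift_db} dB -> pressure ratio of about {ratio:,.0f}")
```

For the 90 dB case the ratio comes out a little above 31,000, which is the "tens of thousands" figure quoted above.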
Spectrum & timbre
Almost no real sound is a pure tone. The decisive idea of acoustics, due to Fourier, is that any sound can be described as a sum of pure tones, each with its own frequency, amplitude, and phase. That recipe is the sound's spectrum, usually shown as a plot of how much energy sits at each frequency.[2012]
A periodic sound, such as a sustained musical note or a vowel, has a special spectrum: energy only at integer multiples of a single fundamental frequency. These components are called harmonics. The fundamental sets the perceived pitch; the relative strengths of the harmonics give the sound its timbre, the quality that lets you tell a violin from a flute playing the same note.
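The link between a harmonic sound and its spectrum can be checked numerically. The sketch below builds a tone from three harmonics and recovers them with a naive discrete Fourier transform; the 100 Hz fundamental, the amplitudes, the sample rate, and the function names are all illustrative assumptions, and a real analysis would normally use an FFT library instead.

```python
import cmath
import math

def harmonic_tone(f0_hz, harmonic_amplitudes, sample_rate_hz, n_samples):
    """Sum of sinusoids at 1*f0, 2*f0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0_hz * n / sample_rate_hz)
            for k, a in enumerate(harmonic_amplitudes))
        for n in range(n_samples)
    ]

def amplitude_spectrum(samples, sample_rate_hz):
    """Naive DFT: map each bin frequency below Nyquist to a sinusoid amplitude.

    The 2/N scaling is correct for non-DC bins; the signal here has no DC part.
    """
    n = len(samples)
    spectrum = {}
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        spectrum[k * sample_rate_hz / n] = 2 * abs(coeff) / n
    return spectrum

SAMPLE_RATE_HZ = 1600
N_SAMPLES = 160  # 0.1 s of signal, so DFT bins are spaced 10 Hz apart

signal = harmonic_tone(100, [1.0, 0.5, 0.25], SAMPLE_RATE_HZ, N_SAMPLES)
spectrum = amplitude_spectrum(signal, SAMPLE_RATE_HZ)

# Keep only bins that carry a non-negligible amplitude.
peaks = {f: round(a, 3) for f, a in spectrum.items() if a > 0.01}
print(peaks)
```

Energy appears only at 100, 200, and 300 Hz. Changing the amplitude list while keeping the fundamental fixed leaves the pitch-setting frequency unchanged and alters only the balance of harmonics, which is the numerical counterpart of changing timbre.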
The speech signal
Speech is the sound that matters most for a hearing device, and it has a characteristic structure. Voiced sounds (vowels and sounds like /m/ or /z/) are produced by the vocal folds opening and closing rhythmically, generating a harmonic series whose fundamental frequency (F0) is heard as the speaker's pitch. Heavier folds give a lower F0 with closely spaced harmonics (typical adult male voice); lighter folds give a higher F0 with wider spacing (typical female or child voice).[2009]
The vocal tract above the folds then acts as a resonator, emphasising certain frequency bands — the formants. The pattern of formant peaks is what distinguishes one vowel from another: the vowel in “bet” (/ɛ/), for example, has formant peaks near 512, 1792, and 2432 Hz. Consonants, by contrast, are brief and spectrally dynamic — bursts, hisses, and rapid transitions — and carry much of the information that makes speech intelligible.[1952, 2009]
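One way to see how the two ingredients interact is to ask which harmonics of the voice fall closest to each formant peak. The sketch below uses the /ɛ/ formant values quoted above; the two F0 values (100 Hz and 220 Hz) are illustrative assumptions standing in for a lower and a higher voice, and the function name is hypothetical.

```python
FORMANTS_HZ = (512, 1792, 2432)  # formant peaks quoted in the text for /ɛ/

def nearest_harmonic(f0_hz, target_hz):
    """Return (harmonic number, harmonic frequency) closest to target_hz."""
    number = max(1, round(target_hz / f0_hz))
    return number, number * f0_hz

# Illustrative F0 values, not taken from the text.
for f0_hz in (100, 220):
    print(f"F0 = {f0_hz} Hz (harmonics spaced {f0_hz} Hz apart)")
    for formant_hz in FORMANTS_HZ:
        number, frequency_hz = nearest_harmonic(f0_hz, formant_hz)
        print(f"  formant {formant_hz} Hz: harmonic {number} at {frequency_hz} Hz")
```

The formant frequencies stay the same for both voices; what changes is which harmonics happen to sit near them, because the harmonic spacing equals F0.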
Vowels are loud, low-frequency, and steady; consonants are soft, higher-frequency, and fleeting — yet consonants do much of the work of intelligibility. This is why a hearing loss that spares the low frequencies but takes the highs can leave speech audible but unclear: the vowels come through, the consonants do not. The same logic shapes how a cochlear implant allocates frequency across its electrodes.
Why this matters for hearing devices
Every property in this module becomes a design constraint for a hearing device. A cochlear implant has to capture the incoming pressure wave with a microphone, measure its frequency content (to decide which electrodes to stimulate), track its level (to set how much current to deliver), and follow its changes over time (to convey the dynamics of speech), all while squeezing the ear's ~120 dB acoustic range into the much narrower range of comfortable electrical stimulation.[2009]
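The idea of squeezing a wide acoustic range into a narrow electrical one can be sketched as a simple mapping. Everything specific here is a hypothetical assumption for illustration: the function name, the straight-line-in-decibels mapping, and the electrical threshold and comfort values in arbitrary units. Real devices use their own compression functions and per-patient fitting, which this module does not describe.

```python
def acoustic_to_electrical(level_db_spl,
                           input_floor_db=0.0,      # bottom of the ~120 dB acoustic range
                           input_ceiling_db=120.0,  # top of the ~120 dB acoustic range
                           threshold_level=100.0,   # hypothetical: just-audible stimulation
                           comfort_level=200.0):    # hypothetical: loudest comfortable stimulation
    """Map an acoustic level onto a narrow electrical range (arbitrary units).

    Levels outside the input range are clamped, then mapped linearly in dB.
    """
    clamped = min(max(level_db_spl, input_floor_db), input_ceiling_db)
    fraction = (clamped - input_floor_db) / (input_ceiling_db - input_floor_db)
    return threshold_level + fraction * (comfort_level - threshold_level)

for level_db_spl in (0, 30, 60, 90, 120, 150):
    print(f"{level_db_spl} dB SPL -> {acoustic_to_electrical(level_db_spl):.0f} units")
```

Because the input is already logarithmic, a million-fold range of pressure ends up as a mere doubling of the output in this toy mapping, which is the kind of compression the paragraph above describes.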
Hold on to three numbers as you go: hearing spans roughly 20 Hz–20 kHz in frequency and about 120 dB in level, and speech lives mostly in the middle of both ranges. The next module follows this airborne wave to the threshold of the inner ear, where the outer and middle ear hand it across to fluid.
Which acoustic feature of speech best explains why she hears that speech is present but cannot understand it?
On a pure tone, which physical property is the main correlate of pitch?
The decibel scale for sound level is logarithmic. Roughly what does every 20 dB represent?
What are the formants of a vowel?