David's Web Page


 

Auditory perception and SBC coding.

 

Human hearing and voice

 

Human hearing range is about 20 Hz to 20 kHz, and is most sensitive between 2 and 4 kHz.

Normal voice range is about 500 Hz to 2 kHz.

 

•Low frequencies are vowels and bass

•High frequencies are consonants

 

This plot is quite easy to understand: the higher the "dB", the louder the sound is. At each frequency, you draw a point at the level where the sound is just loud enough to be perceptible.

It can be reproduced: put a person in a quiet room. Raise the level of a 1 kHz tone until it is just barely audible, then repeat at other frequencies and plot the results.

(To hear a tone at 14 kHz or above, the output level must be very high.)
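The shape of this curve can be approximated numerically. The sketch below uses a formula that is not from this page: Terhardt's widely quoted approximation of the threshold of hearing in quiet, which is only a smooth fit for an average young listener. The function name is my own choice.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate threshold of hearing in quiet, in dB SPL.

    Assumption: Terhardt's curve fit, not data measured for this page.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8                          # rise toward low frequencies
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)   # dip near the most sensitive region
            + 1e-3 * khz ** 4)                          # steep rise toward high frequencies

# Print a few points of the curve, one per frequency.
for f in (100, 500, 1000, 3000, 4000, 10000, 14000, 16000):
    print(f"{f:>6} Hz  {threshold_in_quiet_db(f):6.1f} dB SPL")
```

The printed values dip in the region where the ear is most sensitive and climb steeply toward 14 kHz and beyond, which matches the remark above about high tones needing a very high level.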

------------------------------------------------------------------------

 

Frequency Masking

Do receptors interfere with each other? Normal human ears are sensitive to a wide range of frequencies. However, when a lot of signal energy is present at one frequency, the ear cannot hear weaker energy at nearby frequencies. We say that the louder frequency masks the softer frequencies. The louder frequency is called the masker.

Strictly speaking, what I'm describing here is called simultaneous masking (masking across frequency). There are also nonsimultaneous masking phenomena (masking across time).

•Experiment: Play 1 kHz tone (masking tone) at fixed level (60 dB).

•Play test tone at a different frequency (e.g., 1.1 kHz), and raise its level until just distinguishable.

•Vary the frequency of the test tone and plot the threshold at which it becomes audible:

 

This plot shows that tones at frequencies near 1 kHz must be played at a much higher level to become distinguishable. The 1 kHz tone masks them.
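A rough model of this experiment is a triangular "spreading function" around the masker. The sketch below is only an illustration under stated assumptions: the Hz-to-Bark conversion is the Zwicker and Terhardt approximation, while the 10 dB offset and the two slopes (27 and 10 dB per Bark) are illustrative values I picked, not measurements from the plot. All names are hypothetical.

```python
import math

def hz_to_bark(freq_hz):
    """Convert frequency to the Bark scale (Zwicker and Terhardt approximation)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def masked_threshold_db(test_hz, masker_hz=1000.0, masker_db=60.0,
                        offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Level a test tone must reach to be heard next to the masker.

    Assumption: offset and slopes (dB per Bark) are illustrative only.
    """
    dz = hz_to_bark(test_hz) - hz_to_bark(masker_hz)
    if dz < 0:
        drop = lower_slope * -dz   # masking falls off quickly below the masker
    else:
        drop = upper_slope * dz    # and more slowly above it
    return masker_db - offset_db - drop

# Threshold for a few test tones around the 1 kHz, 60 dB masker.
for f in (500, 900, 1000, 1100, 2000, 4000):
    print(f"{f:>5} Hz  {masked_threshold_db(f):6.1f} dB")
```

In a fuller model the audible threshold at each frequency would be the larger of this value and the threshold in quiet; that step is left out here to keep the sketch short.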

 

------------------------------------------------------------------------

Sub band coding

Sub Band Coding (SBC) relies on this property of the human hearing system: masking.

The basic idea of SBC is to save signal bandwidth by throwing away information about frequencies which are masked. The result won't be the same as the original signal, but if the computation is done right, human ears can't hear the difference.

Most SBC encoders use a structure like this:

•First, a time-frequency mapping (a filter bank, or FFT, or something else) decomposes the input signal into subbands.

•The psychoacoustic model looks at these subbands as well as the original signal, and determines masking thresholds using psychoacoustic information.

•Using these masking thresholds, each of the subband samples is quantized and encoded so as to keep the quantization noise below the masking threshold.

•The final step is to assemble all these quantized samples into frames, so that the decoder can figure it out without getting lost.
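The quantization step above can be sketched as a bit allocation driven by the signal-to-mask ratio of each subband. This is a minimal sketch, not the method of any particular codec: the subband levels and masking thresholds are made-up numbers, the function name is hypothetical, and it leans on the rule of thumb that each extra bit of a uniform quantizer lowers the quantization noise by about 6 dB.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: about 6 dB less quantization noise per bit

def allocate_bits(signal_db, mask_db, max_bits=16):
    """Pick a bit count per subband so the noise lands roughly below the mask."""
    bits = []
    for level, mask in zip(signal_db, mask_db):
        smr = level - mask                  # signal-to-mask ratio in dB
        if smr <= 0:
            bits.append(0)                  # fully masked: nothing to send
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits

# Hypothetical levels for four subbands (dB), for illustration only.
signal_db = [62.0, 48.0, 30.0, 12.0]
mask_db = [40.0, 44.0, 35.0, 20.0]
print(allocate_bits(signal_db, mask_db))
```

Subbands whose level sits under the masking threshold get zero bits, which is where the bandwidth saving comes from.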

 

Decoding is easier, since there is no need for a psychoacoustic model. The frames are unpacked, subband samples are decoded, and a frequency-time mapping turns them back into a single output audio signal.
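The decoding path can be shown end to end with a toy two-band coder. This is a sketch under heavy assumptions: the filter bank is the simplest possible one (a Haar sum/difference pair), the "frame" is just a Python dict, and the step sizes are fixed by hand instead of coming from a psychoacoustic model. All names and sample values are made up for illustration.

```python
def analyze(samples):
    """Encoder side: split an even-length signal into low and high subbands."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # averages: the slow part
    high = [(a - b) / 2 for a, b in pairs]   # differences: the fast part
    return low, high

def quantize(band, step):
    """Encoder side: turn subband samples into integer codes."""
    return [round(v / step) for v in band]

def dequantize(codes, step):
    """Decoder side: turn integer codes back into subband samples."""
    return [c * step for c in codes]

def synthesize(low, high):
    """Decoder side: frequency-time mapping back to one output signal."""
    out = []
    for l, h in zip(low, high):
        out.extend((l + h, l - h))
    return out

# Encoder: the high band gets a coarser step, as if it were mostly masked.
signal = [0.10, 0.12, 0.50, 0.46, -0.30, -0.34, 0.00, 0.02]
low, high = analyze(signal)
frame = {"steps": (0.01, 0.05),
         "low": quantize(low, 0.01),
         "high": quantize(high, 0.05)}

# Decoder: unpack the frame, decode the subbands, rebuild the signal.
decoded = synthesize(dequantize(frame["low"], frame["steps"][0]),
                     dequantize(frame["high"], frame["steps"][1]))
print(decoded)
```

Note that the decoder only needs the step sizes and the codes from the frame; it never looks at masking thresholds, which is why it is the cheaper half.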

This is a basic, generic sketch of how SBC works.

 

Thanks to Canadian CS for the nice plots and to http://www.mp3bench.com/mpeg/index

 



All this web page is designed by Dave Guile © 1998