Introduction

What does Dither123 stand for?
My PC-based jukebox was the origin of this project. It used mpg123 and ogg123 to play MP3 and OGG files. To improve audio quality, dithering was added to the output routines of both codecs. 
Recently this audio-only jukebox was upgraded to play both video and audio using mplayer.
For MP3 and Ogg, mplayer uses the same routines, so adding dither was straightforward.

What's dither and do I need it?
It is not up to me to explain all the ins and outs of dither when better explanations are floating around the web:
http://www.hifi-writer.com/he/dvdaudio/dither.htm
http://www.users.qwest.net/%7Evolt42/cadenzarecording/DitherExplained.pdf
http://www.mtsu.edu/~dsmitche/rim420/reading/rim420_Dither.html
It boils down to:
Any operation on digital audio that involves truncation should use dither. By adding a minimal amount of noise, the ill effects of rounding are masked, giving linearity even below the LSB.
Most audio codecs (like the mpg123 decoding engine) use floating point numbers internally to represent the digital audio. As a final step these floating point numbers are converted to 16 bit integers and sent to the sound card. Unfortunately most programmers forgot about dither in this last step....
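To make the problem concrete, here is a minimal sketch of such a final conversion step. It is not taken from mpg123 or any other decoder: the function name float_to_s16_trunc is made up for this illustration, and the sample is assumed to be already scaled to the 16 bit range.

```c
#include <stdint.h>

/* Hypothetical helper (not from any real decoder): convert one float
 * sample, already scaled to the 16 bit range, to a 16 bit integer
 * without dither: clip, then let the C cast drop the fraction. */
static int16_t float_to_s16_trunc(float sample)
{
    if (sample > 32767.0f)  return 32767;
    if (sample < -32768.0f) return -32768;
    return (int16_t)sample;   /* the cast truncates toward zero */
}
```

With this routine, float_to_s16_trunc(0.9f) and float_to_s16_trunc(-0.9f) both return 0: anything that stays below 1 LSB simply disappears, and larger signals pick up an error that depends on the signal itself.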



Implementation

For mpg123, the audio decoder block decodes the MP3 audio data to floating point numbers. Since the sound card can only handle 16 bit integers, these floating point numbers must be converted to integers, which introduces rounding errors. The ill effects of this conversion can be avoided by adding dither noise prior to the float->int format conversion. The green parts in the block diagram are added to the existing software to implement the dither.


The dither noise source can easily be created in software: wide band dither of 2 LSB amplitude with a triangular probability density can be generated by adding two independent, uniformly distributed random values [1].
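As a rough sketch of what "adding two random values" can look like in C: this assumes two independent, uniformly distributed values of 1 LSB peak-to-peak each, and uses the C library rand() merely as a stand-in for a better random generator. The function names are invented for this example.

```c
#include <stdlib.h>

/* Assumption: rand() is good enough for a demonstration.
 * Returns a uniform random value in the range -0.5 .. +0.5 LSB. */
static float uniform_half_lsb(void)
{
    return (float)rand() / (float)RAND_MAX - 0.5f;
}

/* Sum of two uniform values: triangular probability density
 * spanning -1.0 .. +1.0 LSB, i.e. 2 LSB peak-to-peak. */
static float tpdf_dither(void)
{
    return uniform_half_lsb() + uniform_half_lsb();
}
```

Each call to tpdf_dither() yields one noise sample that would be added to an output sample just before it is converted to integer.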
However, this dither isn't optimal. Better implementations use noise shaping: the added dither noise consists of high frequencies only, so the noise floor isn't raised at frequencies where the human ear is most sensitive [2].
I wasn't tempted by the idea of programming DSP routines for the noise shaping filter, so I came up with a very simple alternative: just store about a second of pre-calculated dither noise in a look-up table!

Pros and cons of this dither table:
+Fast: minimal extra CPU load, since no calculations are required to generate the noise.
+Simple: the table can be constructed using an audio editor, which is tailored for this job, instead of software routines.
-The noise isn't of infinite length but "circular": when the end of the look-up table is reached, the noise samples are re-read from the start of the table. This makes the noise spectrum consist of discrete components instead of being continuous. I chose a large look-up table size of 64k values (over 1.4 sec repeat time at 44.1kHz) to combat possible ill effects of this finite size.
-Memory cost: for the chosen table length, the table takes 256kB (= 64k x 4 bytes/sample).
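The numbers in this list can be checked with a few lines of C. This is only a back-of-the-envelope sketch: the macro and function names are made up here, and the table entries are assumed to be 4-byte floats.

```c
#define DITHER_TABLE_LEN 65536u   /* 64k entries, as chosen in the text */

/* Seconds until the noise repeats at the given sample rate. */
static double dither_repeat_seconds(unsigned sample_rate_hz)
{
    return (double)DITHER_TABLE_LEN / (double)sample_rate_hz;
}

/* A signal that repeats every N samples only contains multiples of
 * sample_rate / N; this is the spacing of the discrete components. */
static double dither_line_spacing_hz(unsigned sample_rate_hz)
{
    return (double)sample_rate_hz / (double)DITHER_TABLE_LEN;
}

/* Memory taken by the table, assuming entries of type float. */
static unsigned long dither_table_bytes(void)
{
    return (unsigned long)DITHER_TABLE_LEN * sizeof(float);
}
```

At 44100 Hz this gives a repeat time of about 1.49 seconds and spectral lines roughly 0.67 Hz apart, which is dense enough to pass for continuous noise in practice. With 4-byte floats the table occupies 262144 bytes, i.e. 256kB.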


Now how does all of this look in software?

1) Declarations:
Each audio channel requires its own index into the dither look-up table. For stereo, 2 variables suffice; using 8 should even handle multichannel audio:
static unsigned short ditheridx[8]={0,0,0,0,0,0,0,0};
To preserve these index variables between calls to the decoding routine, static variables are used.

Declare the dither table itself. Only 4 of the 64k array values are shown here....
float dithertable[65536]={
-0.126371,
0.497872,
-0.779434,
0.843073,.....};


2) Now the hardest part: where in the source code is the output of the decoding engine truncated to a 16 bit integer?
This conversion is frequently accompanied by boundary checking at the 16 bit limits 32767 and -32768.
In mplayer's ogg decoder (ad_libvorbis.c) the following line is responsible:
int val=mono[j]*scale;

3) To add the dither, replace this line with:
int val=(mono[j]*scale+dithertable[ditheridx[i]++]);
Note: Since the dither table has 64k values and the index is an unsigned short (16 bits on the usual PC compilers), just incrementing the index automatically wraps around at the end of the table.
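Put together, the three steps might look like the sketch below. It reuses the names dithertable and ditheridx from above, but the function dither_to_s16 and its parameters are hypothetical, and the table is left zero-filled here as a placeholder for the real pre-calculated noise. It uses uint16_t instead of unsigned short, so the wrap-around at 65536 doesn't depend on how wide the compiler makes a short.

```c
#include <stdint.h>

#define DITHER_LEN 65536

/* Placeholder: the real table holds 64k pre-calculated noise values. */
static float dithertable[DITHER_LEN];

/* One read index per channel; static so it survives between calls. */
static uint16_t ditheridx[8];

/* Hypothetical helper: add dither to one scaled sample of the given
 * channel, clip to the 16 bit limits, then truncate to integer. */
static int16_t dither_to_s16(float scaled, int channel)
{
    /* The uint16_t index wraps from 65535 back to 0 by itself. */
    float v = scaled + dithertable[ditheridx[channel]++];

    if (v > 32767.0f)  return 32767;
    if (v < -32768.0f) return -32768;
    return (int16_t)v;
}
```

Clipping is done on the float value before the cast, so a loud sample plus dither can never overflow the 16 bit result.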


How can this be so simple, but still be neglected by most programmers??

For mpg123 (same codec as mplayer's mp3lib) the implementation is more difficult, since the decoding routines are optimized in assembly language.



Results
This blah-blah bores me, now just show me that it works!

spectrum plots
Source for the spectrum plots was the 2tone.mp3 file. It contains two very low amplitude sine waves: 
440Hz at -85dB
494Hz at -91dB
MP3 Decoder: mpg123-pre0.59s Pentium-build
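
To get a feeling for how small these tones are, their peak amplitude can be expressed in 16 bit LSBs. The calculation below assumes that the levels are given relative to digital full scale (peak value 32768), which the file description does not state explicitly:

```latex
A_{\mathrm{LSB}} = 32768 \cdot 10^{L/20}

L = -85\,\mathrm{dB}: \quad A_{\mathrm{LSB}} \approx 32768 \cdot 5.62 \times 10^{-5} \approx 1.84

L = -91\,\mathrm{dB}: \quad A_{\mathrm{LSB}} \approx 32768 \cdot 2.82 \times 10^{-5} \approx 0.92
```

Under that assumption both tones swing over only one or two quantization steps, which is exactly where truncation without dither does the most damage.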

The 1st picture shows the output spectrum when decoded using mpg123's default decode_i586.s routine. Note that, due to quantization distortion, the spectrum is crowded with frequency components that aren't present in the source.

 

The 2nd picture uses wide band 2LSB triangular distribution dither noise. Notice the clean spectrum, at the penalty of a raised noise floor (according to [1] an extra 4.7dB above the 16 bit noise floor). 

 

The 3rd picture uses 5 LSB noise-shaped dither. It combines a low noise floor with a clean spectrum. All the noise energy is pushed into the higher frequencies.

For those who want to test their own MP3 decoder setup, the 2tone.mp3 file can be found on the download page.

music
Nice pictures, but it's the sound that matters. To hear any difference, do I need audiophile stuff like oxygen-free cables and golden ears?

On the download page I prepared some special samples, that will expose the difference on even the cheapest PC speakers!
A special MP3 source file was created by lowering the volume by 74 dB. This MP3 was decoded to WAV, and the resulting WAV files were amplified by the same amount of 74 dB.
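
In terms of sample values, the amplification step amounts to the sketch below. It is only an illustration of the arithmetic, not the tool that was used to make the samples; the function name amplify_s16 and the constant GAIN_74_DB are invented here. A gain of 74 dB corresponds to a factor of 10^(74/20), roughly 5012.

```c
#include <stddef.h>
#include <stdint.h>

/* 74 dB as a linear factor: 10^(74/20), about 5011.87. */
#define GAIN_74_DB 5011.87f

/* Hypothetical helper: amplify 16 bit samples in place, clipping at
 * the 16 bit limits. */
static void amplify_s16(int16_t *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++) {
        float v = (float)buf[i] * gain;

        if (v > 32767.0f)  v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        buf[i] = (int16_t)v;
    }
}
```

A full-scale source attenuated by 74 dB peaks at about 32767 / 5012, or 6.5 LSB, so the decoded file only uses a handful of sample values around zero. After amplification each of those steps is about 5000 units high, so whatever the decoder does at LSB level becomes plainly audible.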
 
This test not only shows the benefits of dither in practice, it also shows that some codecs lack LSB precision. Check out the mp3lib MMX-optimized sample on the download page: at this low amplitude level, the music isn't even recognizable! Unfortunately, this is the decoding mode used on most Intel processors...


References:
1: Quantization and Dither: A Theoretical Survey. S.P. Lipshitz, R.A. Wannamaker and J. Vanderkooy
J. Audio Eng. Soc. Vol 40 #5 1992 May

2: The Application of Narrow-Band Dither Operating at the Nyquist Frequency in Digital Systems to Provide Improved Signal-to-Noise Ratio over Conventional Dithering. Barry A. Blesser and Bart N. Locanthi
J. Audio Eng. Soc. Vol 35 #6 1987 June