
[kidsgames] speech compression




I spent too many hours last night searching for speech compression code
on the internet (how Britain expects to be a "center for e-commerce" when
they have such horrendous local call charges I don't know).  There are a
number of adequate algorithms for our purposes (telephone quality speech),
with bit rates as low as 2400 bits/sec.  You can check out the audio
samples yourself at: http://www.lincom-asg.com/ssadto/speech_codec.html
and see if you agree.  Considering that 16-bit 8 kHz audio requires
128000 bits/sec, this is very good compression (at 2400 bits/sec, 90 kB
vs. 4.8 MB for a 5 minute story).  Unfortunately, I could only find code for CELP
(4800 bits/sec) and LPC10E (2400 bits/sec, but not so good quality).
I found the WI codec (2400 bits/sec) acceptable, but I could not find
any code which implements it (at least not at 2 in the morning; anyone
else care to try?).  I didn't download the CELP sample from this site,
so I cannot do a direct comparison, though I did find the CELP samples
at another site acceptable.
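
For anyone who wants to check the arithmetic, here is a small sketch
that works out the storage needed for a 5 minute story at each bit rate
mentioned above.  The function name is made up for illustration, and it
assumes a constant bit rate with no container or header overhead.

```python
def story_size_bytes(bits_per_second, minutes):
    """Bytes needed to store `minutes` of audio at a constant bit rate."""
    return bits_per_second * minutes * 60 // 8

# 16-bit samples at 8 kHz, as quoted above: 128000 bits/sec
PCM_RATE = 16 * 8000

for name, rate in [("raw PCM", PCM_RATE), ("CELP", 4800), ("LPC10E / WI", 2400)]:
    size = story_size_bytes(rate, 5)
    ratio = PCM_RATE / rate
    print(f"{name:12s} {rate:7d} bits/sec  {size:9d} bytes  ({ratio:.0f}x smaller than PCM)")
```

This gives 4800000 bytes for raw PCM, 180000 bytes for CELP and 90000
bytes for the 2400 bits/sec codecs.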

Again, speech codecs will make mincemeat of any other sounds you might
feed them.  For music, I would strongly recommend MIDI (using
software-based synthesizers such as timidity if your hardware doesn't support it)
since it is so compact.  Software synthesis does "Eat more CPU time than
a small CPU-time-eating animal" according to the timidity 0.2 man page,
though it may have improved by now.

Environmental sounds will only be needed once, no matter how many
languages are included in the distribution.  I wouldn't expect a lot of
them, so I would use raw PCM to encode them, or maybe ADPCM (which
just encodes the differences between samples rather than the samples
themselves, letting you get away with fewer bits per sample).
Code for G723 (16 bits/sample -> 3 bits/sample) is available from Sun:
ftp://svr-ftp.eng.cam.ac.uk/pub/comp.speech/coding/G711_G721_G723.tar.gz
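
To illustrate the "encode the differences" idea, here is a deliberately
simplified sketch.  It is plain DPCM with a fixed step size, not the
G721/G723 algorithm: real ADPCM adapts the step size and predictor as
the signal changes.  The function names, the step of 256 and the 3-bit
code width are all assumptions chosen for the example.

```python
def dpcm_encode(samples, step=256, bits=3):
    """Encode samples as small quantized differences (fixed step size)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # -4..3 for 3 bits
    predicted = 0
    codes = []
    for s in samples:
        # Quantize the difference from the running prediction, then clip
        # it so that it fits in `bits` bits.
        code = max(lo, min(hi, round((s - predicted) / step)))
        codes.append(code)
        # Track what the decoder will reconstruct, so errors don't pile up.
        predicted += code * step
    return codes

def dpcm_decode(codes, step=256):
    """Rebuild samples by summing the scaled differences."""
    predicted = 0
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
    return out

codes = dpcm_encode([0, 256, 512, 768, 512, 256])
decoded = dpcm_decode(codes)
```

A slowly changing signal like the one above survives the round trip,
while a sudden jump gets clipped and the decoder takes several samples
to catch up.  That lag is the problem adaptive step sizes are meant to
reduce.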

If you want lots of high quality audio, MPEG layer 3 seems like
the way to go (http://www.mpeg.org/MPEG/mp3.html#software),
at least until the community comes up with an unencumbered format
(http://www.xiph.org/ogg/vorbis.html).  The development list for vorbis
is still active, so there is hope they will produce something usable.
The encumbrances are patents on the encoding algorithms (see the bottom
of http://www.sulaco.org/mp3/ for details), so they only inconvenience
the developers and not the users.  There are said to be significant
speed and quality differences between the various MP3 encoders, with the
best of the free being LAME (http://www.sulaco.org/mp3/), which is also
available as a Linux binary (http://hive.me.gu.edu.au/not_lame/).

Paul Kienzle
pkienzle@kienzle.powernet.co.uk