
[school-discuss] falling costs of producing audio books



hello,

many months ago, i estimated the cost of producing an audio book at 
$5,000-7,000 U.S.; that's 100 to 140 man-hours.

today the cost for a good rendering, like what we are doing with lion, witch 
and wardrobe and odyssey, is about $2,000, or a single man-week.

completing a "six-pack" of works by the same author is approaching $1,000 now.
(good quality rendering with a single voice and no music or sound effects.)

that's six books for twenty man-hours.
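
as a rough check on these figures, here is a small python sketch.  the $50 
per man-hour rate is not stated above; it is an assumption implied by $5,000 
for 100 man-hours, and the function names are made up for illustration.

```python
# assumed rate: $5,000 / 100 man-hours = $50 per man-hour (implied, not stated)
RATE_PER_HOUR = 5000 / 100


def man_hours(cost_dollars, rate=RATE_PER_HOUR):
    """convert a dollar cost into man-hours at the assumed rate."""
    return cost_dollars / rate


def hours_per_book(cost_dollars, books, rate=RATE_PER_HOUR):
    """man-hours spent on each book when one cost covers several books."""
    return man_hours(cost_dollars, rate) / books


old_estimate = (man_hours(5000), man_hours(7000))  # the original estimate
single_book = man_hours(2000)                      # a good rendering today
six_pack = man_hours(1000)                         # six books, same author
six_pack_per_book = hours_per_book(1000, 6)

print("old estimate:", old_estimate, "man-hours per book")
print("single book today:", single_book, "man-hours")
print("six-pack:", six_pack, "man-hours,", round(six_pack_per_book, 1), "per book")
```

at that assumed rate, the six-pack works out to a little over three 
man-hours per book.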

at twenty man-hours with a payoff of half-a-dozen books, we hope the idea of 
starting your own audio book effort is attractive.

how many things can you do for the greater good in a little over two work 
days that result in 6 new books for young children, English as a second 
language students, and plain old book lovers, books that can be played on any 
old mp3 player and duplicated for less than 50 cents?

meanwhile, we are continuing to move the technology forward in three 
directions (marked A, B and C):

A- the gutenberg group wants to rip audio books on the fly.  last year, our 
technology to do that produced a monotone voice that was hypnotic - it put 
you to sleep.  today, we can rip a book on the fly that has a reasonably 
engaging inflection.  this resulted directly from focusing on the effect 
consonant patterns have on a "steady state" audio generator, that is, a sound 
generator that produces a certain number of phonemes per time unit, as 
viavoice appears to be (this is speculation, because the source code is not 
available, and the speculation is based on more than a year of observations).

unfortunately, running redhat linux 7.2 on an athlon 900 with .5 gig of 
memory and a typical ide drive, it takes about 2.5 hours to complete a book 
like war of the worlds.  fortunately, no single chapter takes over ten 
minutes.  (this aberration in arithmetic is accounted for by our not having a 
dedicated box to rip audio on, so loading affects run time.)

we hope gutenberg users will use the chapter-by-chapter ripping software, and 
will help out with a few cycles and maybe some bandwidth :-)  
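
for anyone curious what chapter-by-chapter means in practice, here is a 
minimal python sketch of the splitting step.  it is not our actual software.  
it assumes every chapter starts with a line beginning with "CHAPTER", which 
is only one of several heading styles found in gutenberg texts, and the name 
split_chapters is made up for illustration.

```python
import re

# assumption: chapter headings are lines that begin with the word CHAPTER
HEADING = re.compile(r"^CHAPTER\b.*$", re.MULTILINE)


def split_chapters(text):
    """return one string per chapter; text before the first heading is dropped."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts:
        return [text]  # no headings found, treat the whole book as one unit
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


book = "title page\n\nCHAPTER I\nfirst text.\n\nCHAPTER II\nsecond text.\n"
for number, chapter in enumerate(split_chapters(book), start=1):
    print(number, chapter.splitlines()[0])
```

each chapter can then be handed to a different machine, which is where the 
spare cycles come in.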

the new software, which i am using to rip the etc ... new year's cd, does the 
following:

1 - downloads the gutenberg book.
2 - applies an author-specific set of sed parsers to correct 
    mis-pronunciations.
3 - introduces appropriate pauses in the reading.
4 - dithers speech rate and voice baseline frequency to improve inflection.
5 - creates an .au file in a non-standard format.  sorry.
    this is an emacspeak / viavoice artifact.  without the source code we 
    can't fix it.
6 - creates .wav files.
7 - upsamples the .wav file and introduces masking to reduce digitized 
    artifacts (i.e. "buzzing").
8 - expands the voice output in the critical mid-range.
9 - adds reflections to improve clarity, according to the guidelines 
    established for this by the German classical recording company, DGG,
    a leader in this field.
10 - converts the wav file to mp3 and adds title information.
    we would love to make this conform to koa(?) indexing requirements,
    but our hands are full.
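
to give a feel for steps 2 to 4 above, here is a rough python sketch of the 
text preparation.  it is not the real software, which uses sed scripts.  the 
correction table, the [[pause]], [[rate]] and [[pitch]] markers, and the 
baseline values of 170 words per minute and 100 hz are all placeholders 
invented for this example; they are not real emacspeak or viavoice commands.

```python
import random
import re

# step 2: hypothetical author-specific corrections (the real ones are sed scripts)
CORRECTIONS = [
    (r"\bMr\.", "Mister"),
    (r"\bMrs\.", "Missus"),
]

# placeholder control markers, NOT real viavoice or emacspeak commands
PAUSE = "[[pause {ms}]]"
RATE = "[[rate {wpm}]]"
PITCH = "[[pitch {hz}]]"


def correct_pronunciation(text, corrections=CORRECTIONS):
    """step 2: rewrite spellings the synthesizer is assumed to mispronounce."""
    for pattern, replacement in corrections:
        text = re.sub(pattern, replacement, text)
    return text


def dither_sentences(paragraph, rng, base_wpm=170, base_hz=100, spread=0.05):
    """step 4: give each sentence a slightly different rate and baseline pitch."""
    out = []
    for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
        wpm = round(base_wpm * (1 + rng.uniform(-spread, spread)))
        hz = round(base_hz * (1 + rng.uniform(-spread, spread)))
        out.append(RATE.format(wpm=wpm) + PITCH.format(hz=hz) + " " + sentence)
    return " ".join(out)


def prepare(text, seed=0, paragraph_ms=600):
    """run steps 2 to 4 on plain text and return the marked-up text."""
    rng = random.Random(seed)  # fixed seed keeps a rip repeatable
    text = correct_pronunciation(text)
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]
    marked = [dither_sentences(p, rng) for p in paragraphs]
    # step 3: a pause marker between paragraphs
    return ("\n" + PAUSE.format(ms=paragraph_ms) + "\n").join(marked)


print(prepare("Mr. Smith arrived. He sat down.\n\nMrs. Jones left."))
```

the corrections run first on purpose, so that the period in "Mr." is gone 
before the text is split into sentences.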

B- rewriting classic works as radio plays:

this involves translation of the work to first person, introduction of 
multiple computer voices, sound effects and music, and more extensive 
listening to output, to perfect inflection.

perfection of inflection - this is changing the speech rate to prevent 
distortion of the time sense of a phrase.  some vowel patterns cause 
emacspeak/viavoice to draw a word out, as if the reader were dealing with the 
after-effects of a stroke!  the rate must revert to the baseline value 
immediately after the problematic word.

you can spend as much on one of these as is prudent; it's easy to go 
overboard.

Alice in wonderland could be considered a representative example, along with 
the unwilling vestal (see our audio book downloads).

C- the crown jewel of these efforts is a live reading of a script done as a 
type B.

there are a number of interesting technical problems to solve next year.  we 
intend to produce an inflection that is as suitable as that of an average 
human reader of good quality.  the primary barrier is the removal of field 
effect aberrations introduced by the audio generator.  this is a buzzing, much 
like a musical instrument produces when played so that none of the partials 
line up correctly in pitch.

the FFT routines we have are too "ham-handed" to remove this.

unfortunately, more extensive use of the speech synthesis commands that 
produce a more pleasing characterization also makes this buzzing increase to 
intolerable levels.

this one takes specialized skills and time to fix.

but we are poking this elephant and are confident that continued incremental 
improvement will get us where we want to go.

one final point: having students build plays from novels will produce better 
results than are achieved by schools today.  the students will exhibit more, 
and more rapid, improvement than is being realized today.  it is what writers 
do today.  it is what Walt Disney did.  actually, it is what j. s. bach and g. 
Mahler and many other composers, and painters, and architects, and writers 
and every sort of creator have always done.

the future is a reinterpretation of the past.

that is the essence of the fount from which self-determination springs.

and this plugs directly into it, something that is needed today.

mike eschman, etc ... 
(http://www.etc-edu.com ) Not just an afterthought ...