[school-discuss] falling costs of producing audio books
hello,
many months ago, i estimated the cost of producing an audio book at $5000-7000
U.S.,
that's 100 to 140 man-hours.
today the cost for a good rendering, like what we are doing for lion, witch
and wardrobe and odyssey, is about $2,000, or a single man-week.
completing a "six-pack" of works by the same author is approaching $1,000 now.
(good quality rendering with a single voice and no music or sound effects.)
that's six books for twenty man-hours.
at twenty man-hours with a payoff of half-a-dozen books, we hope the idea of
starting your own audio book effort is attractive.
how many things can you do for the greater good in a little over two work
days that result in 6 new books for young children, English as a second
language students, and plain old book lovers, books that can be played on any
old mp3 player and duplicated for less than 50 cents?
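the figures above can be checked with a few lines of arithmetic. this is only a sketch: the 40-hour man-week and the 8-hour work day are assumptions, not numbers from this post.

```python
# figures quoted in this post: (label, cost in US dollars, man-hours)
estimates = [
    ("old estimate, low end", 5000, 100),
    ("old estimate, high end", 7000, 140),
    ("single good rendering", 2000, 40),  # "a single man-week", assumed to be 40 hours
    ("six-pack, same author", 1000, 20),
]

HOURS_PER_WORK_DAY = 8  # assumption, not stated in the post


def hourly_rate(cost, hours):
    """Implied cost of one man-hour."""
    return cost / hours


for label, cost, hours in estimates:
    days = hours / HOURS_PER_WORK_DAY
    print(f"{label}: ${hourly_rate(cost, hours):.0f}/hour, {days:.1f} work days")

# per-book share of the six-pack effort
six_pack_cost, six_pack_hours = 1000, 20
print(f"per book: ${six_pack_cost / 6:.0f}, {six_pack_hours / 6:.1f} man-hours")
```

under those assumptions every estimate works out to the same hourly rate, so the falling cost comes entirely from fewer hours per book.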
meanwhile, we are continuing to move the technology forward in three
directions (marked A, B and C):
A - the gutenberg group wants to rip audio books on the fly. last year, our
technology to do that produced a monotone voice that was hypnotic - it put
you to sleep. today, we can rip a book on the fly that has a reasonably
engaging inflection. this resulted directly from focusing on the effect
consonant patterns have on a "steady state" audio generator, that is, a sound
generator that produces a certain number of phonemes per time unit, as viavoice
appears to be (this is speculation, because the source code is not available,
and the speculation is based on more than a year of observations).
unfortunately, running redhat linux 7.2 on an athlon 900 with .5 gig of
memory and a typical ide drive, it takes about 2.5 hours to complete a book
like war of the worlds. fortunately, no single chapter takes over ten
minutes. (this aberration in arithmetic is accounted for by our not having a
dedicated box to rip audio on, so loading affects run time.)
we hope gutenberg users will use the chapter-by-chapter ripping software, and
will help out with a few cycles and maybe some bandwidth :-)
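for anyone curious what chapter-by-chapter processing might look like, here is a minimal sketch. it is not our ripping software; the heading pattern and the function name `split_chapters` are assumptions made up for illustration, and real gutenberg texts mark chapters in many different ways.

```python
import re

# assumption: a chapter starts on a line beginning with "CHAPTER"
CHAPTER_HEADING = re.compile(r"^CHAPTER\b.*$", re.MULTILINE)


def split_chapters(text):
    """Split a book into chapters; anything before the first heading is dropped."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    if not starts:
        return [text]  # no headings found, treat the whole text as one unit
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


sample = "front matter\nCHAPTER I\nfirst text.\nCHAPTER II\nsecond text.\n"
for number, chapter in enumerate(split_chapters(sample), start=1):
    print(number, len(chapter), "characters")
```

each chapter can then be handed to the synthesizer as its own short job, which is what keeps any single run in the range of minutes.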
the new software, which i am using to rip the etc ... new year's cd, does the
following:
1 - downloads the gutenberg book
2 - applies an author specific set of sed parsers to correct
mis-pronunciations.
3 - introduces appropriate pauses in the reading.
4 - dithers speech rate and voice baseline frequency to improve inflection.
5 - creates an .au file in a non-standard format. sorry.
this is an emacspeak / viavoice artifact. without the source code we can't
fix it.
6 - creates .wav files
7 - upsamples the .wav file and introduces masking to reduce digitized
artifacts (i.e. "buzzing")
8 - expands the voice output in the critical mid-range.
9 - adds reflections to improve clarity, according to the guidelines
established for this by the German classical recording company, DGG,
a leader in this field.
10 - converts the wav file to mp3 and adds title information.
we would love to make this conform to koa(?) indexing requirements,
but our hands are full.
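steps 2 through 4 above can be sketched roughly as follows. this is an illustration only, not the actual scripts: python's `re.sub` stands in for the sed parsers, and the correction rules, the `<pause>` marker, the baseline rate and pitch values and the 5% spread are all made up for the example. the real synthesizer commands are not shown here.

```python
import random
import re

# hypothetical author-specific corrections, in the spirit of the sed parsers (step 2)
CORRECTIONS = [
    (r"\bMr\.", "mister"),
    (r"\bMrs\.", "missus"),
]

PAUSE = "<pause>"  # made-up marker, not a real synthesizer command


def correct_pronunciation(text, rules=CORRECTIONS):
    """Step 2: rewrite spellings the synthesizer is assumed to mispronounce."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def add_pauses(text):
    """Step 3: put a pause marker after sentence-ending punctuation."""
    return re.sub(r"([.!?])\s+", r"\1 " + PAUSE + " ", text)


def dither(sentences, base_rate=170, base_pitch=100, spread=0.05, seed=0):
    """Step 4: vary rate and pitch slightly around a baseline for each sentence."""
    rng = random.Random(seed)  # seeded so repeated runs give the same plan
    plan = []
    for sentence in sentences:
        rate = base_rate * (1 + rng.uniform(-spread, spread))
        pitch = base_pitch * (1 + rng.uniform(-spread, spread))
        plan.append((sentence, round(rate), round(pitch)))
    return plan


text = "Mr. Smith looked up. The door was open! What now?"
marked = add_pauses(correct_pronunciation(text))
sentences = [s.strip() for s in marked.split(PAUSE) if s.strip()]
for sentence, rate, pitch in dither(sentences):
    print(rate, pitch, sentence)
```

the corrections run before the pauses are added, so the period in an abbreviation like "Mr." is gone before it can be mistaken for the end of a sentence.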
B : rewriting classic works as radio plays :
this involves translation of the work to first person, introduction of
multiple computer voices, sound effects and music and more extensive
listening to output, to perfect inflection.
perfection of inflection - this is changing the speech rate to prevent
distortion of the time sense of a phrase. some vowel patterns cause
emacspeak/viavoice to draw a word out, as if the reader was dealing with the
after-effects of a stroke! the rate must revert to the baseline value
immediately after the problematic word.
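the revert-to-baseline rule can be sketched like this. the word list, the rate values and the function names are hypothetical, and speeding up the drawn-out word is an assumption about how the correction is applied; in practice the problem words are found by listening.

```python
BASELINE_RATE = 170  # hypothetical baseline speech rate

# hypothetical words the synthesizer draws out, with the rate to use for each
PROBLEM_WORDS = {"aeroplane": 200, "ooze": 195}


def rate_plan(sentence, baseline=BASELINE_RATE, overrides=PROBLEM_WORDS):
    """Give every word a rate: an override for problem words, the baseline otherwise."""
    plan = []
    for word in sentence.split():
        key = word.strip(".,;:!?\"'").lower()
        plan.append((word, overrides.get(key, baseline)))
    return plan


def merge_runs(plan):
    """Join neighbouring words that share a rate, so a rate is set only where it changes."""
    runs = []
    for word, rate in plan:
        if runs and runs[-1][1] == rate:
            runs[-1] = (runs[-1][0] + " " + word, rate)
        else:
            runs.append((word, rate))
    return runs


for phrase, rate in merge_runs(rate_plan("the ooze crept slowly over the stones.")):
    print(rate, phrase)
```

because the rate is looked up word by word, the word right after a problem word falls back to the baseline without any extra bookkeeping.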
you can spend as much on one of these as is prudent; it's easy to go
overboard.
Alice in wonderland could be considered a representative example, along with
the unwilling vestal (see our audio book downloads).
C : the crown jewel of these efforts is a live reading of a script done as a
type B.
there are a number of interesting technical problems to solve next year. we
intend to produce an inflection that is as suitable as that of an average
human reader of good quality. the primary barrier is the removal of field
effect aberrations introduced by the audio generator. this is a buzzing, much
like a musical instrument produces when played so that none of the partials
line up correctly in pitch.
the FFT routines we have are too "ham-handed" to remove this.
unfortunately, more extensive use of the speech synthesis commands that
produce a more pleasing characterization also makes this buzzing increase to
intolerable levels.
this one takes specialized skills and time to fix.
but we are poking this elephant and are confident that continued incremental
improvement will get us where we want to go.
one final point: having students build plays from novels will produce better
results than are achieved by schools today. the students will exhibit more,
and more rapid, improvement than is being realized today. it is what writers
do today. it is what Walt Disney did. actually, it is what j. s. bach and g.
Mahler and many other composers, painters, and architects, and writers and
every sort of creator have always done.
the future is a reinterpretation of the past.
that is the essence of the fount from which self-determination springs.
and this plugs directly into it, something that is needed today.
mike eschman, etc ...
(http://www.etc-edu.com ) Not just an afterthought ...