
Re: [kidsgames] word familiarity



Hi Steve,

On Mon, 24 Jan 2000, Steve Baker wrote:

-->Date: Mon, 24 Jan 2000 22:51:56 -0600
-->From: Steve Baker <sjbaker1@airmail.net>
-->Reply-To: kidsgames@smluc.org
-->To: kidsgames@smluc.org
-->Subject: Re: [kidsgames] word familiarity
-->
-->jwaddell@ix.netcom.com wrote:
-->
-->> word: 128 characters (are there any words longer than this? should it be
-->> shorter or longer?)
-->
-->The longest word (not place-name or proper noun) in English is
-->Antidisestablishmentarianism - a mere 28 letters.  If you allow
-->proper nouns - but exclude place names, you need to allow 45:
-->Pneumonoultramicroscopicsilicovolcanoconiosis. There is a town in
-->Wales called
-->Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch, but
-->even that is only 58 characters...so I think you're OK with 128.
-->

I just gotta know.  HOW the H*** do you KNOW that?  My goodness....  I
mean I'm glad you know, but....
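
For anyone who wants to test other candidate words against the proposed
field size, here is a quick sketch.  The 128-character limit is the one
from the proposal quoted above; the function name fits_word_field is made
up for illustration and is not part of any existing code.

```python
MAX_WORD_LENGTH = 128  # proposed size of the "word" field (from the proposal)


def fits_word_field(word, limit=MAX_WORD_LENGTH):
    """Return True if the word fits in a field of `limit` characters."""
    return len(word) <= limit


# Two of the long words mentioned in the thread.
candidates = [
    "Antidisestablishmentarianism",
    "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch",
]

for word in candidates:
    # Print the length, whether it fits, and the word itself.
    print(len(word), fits_word_field(word), word)
```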

-->I guess you could add synonyms and antonyms - and words that rhyme.
-->

So basically it's a META Dictionary... containing words and all the
information associated with each word.  Simply the sum total of all
mankind's language(s) in one easy to use repository.  IEEEEEEEEE!!!!  Did
I say that with a straight face....
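
As a rough sketch only, one entry in such a dictionary might look like the
record below.  The class name WordEntry and the field names are assumptions
for illustration, not an agreed schema; the only things taken from this
thread are the 128-character word limit and the idea of storing synonyms,
antonyms and rhymes.

```python
from dataclasses import dataclass, field
from typing import List

MAX_WORD_LENGTH = 128  # proposed limit for the "word" field


@dataclass
class WordEntry:
    """One dictionary entry (hypothetical layout, for discussion only)."""
    word: str
    definition: str = ""  # assumed field: a short line or two of text
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    rhymes: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject empty words and words that overflow the proposed field size.
        if not self.word or len(self.word) > MAX_WORD_LENGTH:
            raise ValueError("word must be 1..%d characters" % MAX_WORD_LENGTH)


entry = WordEntry(
    word="oak",
    definition="A large tree that grows acorns.",
    rhymes=["soak", "cloak"],
)
print(entry)
```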

-->Creating such a database would be a monumental undertaking.
-->

uh, well, duh.... you would be correct ;)

-->Have you estimated the size of the task?
-->

Bigger than your average breadbasket.  I'm trying NOT to estimate the size
as it might be intimidating if I were to do that.

-->Back in the early days of the Compact Disk Audio, I worked for
-->Philips Research Labs where we built the first ever CD-ROM system
-->by hacking apart one of the prototypes of the first domestic
-->CD Audio player - and hooked it up to our 'C.H.R.I.S' home computer
-->prototype (68000-based IIRC).
-->

COOL!

-->As a 'proof of concept' of what a CD-ROM could do (back before
-->3.5" floppies existed, before the existence of the IBM PC - when
-->a 10Mb "Winchester" hard disk was considered pretty amazing), we
-->decided to build a CD-ROM dictionary - when we scoped the amount
-->of work it would entail, it became apparent that our team of
-->five or so engineers would never finish the job in under a couple
-->of years - so we down-sized the demonstration to just a single
-->letter.  For some reason (I forget why) we chose the letter 'O'
-->- and started in on that - expecting it to take maybe a month.

Ouch, only 5 engineers.  Major burn-out....

-->After about 4 weeks, we were all bored to tears with entering the
-->data and painting the pictures...and we were only about a third
-->of the way through the letter 'O'.
-->

Yep, that's why I HOPE to get this thing into a DATABASE that is world
accessible, so that many more than just 5 (and not just developers either)
can add to it.

-->We were attempting a similar thing to what I think you propose
-->- for each word, a couple of lines of text, pointers to synonyms
-->and antonyms, textual and audio pronunciation guide, pictures
-->for words like OAF, OAK, OAKAPPLE, ...etc.
-->

Do you still have ANY of that data, and would it be POSSIBLE to get
permission to USE it in ours?  It would be incredible if we could slurp a
large chunk directly into our store.

-->Maintaining consistency over the duration of the project was
-->very difficult - at the beginning, we were enthusiastic but
-->inexperienced - at the end we were bored to tears and had
-->learned a lot...it was noticeable how much nastier the pictures
-->were towards the end of the letter 'o'!
-->

This is very good information and I for one appreciate you sharing this.
Do you have ANY suggestions for dividing the task in such a way as to avoid
this type of burn-out?  I want people to remain enthusiastic and not to
allow the "pictures to get nastier".  The only way I can see to do that is
to encourage people to enter data during the course of creating something
specific.  Maybe especially when working with their children on specific
vocabulary, the parent could enter the data for the day's lessons and that
would be placed in the global repository (assuming they accept that
choice) and then the next parent that needs to do a lesson with that word
will not have to do anything.  Am I making any sense?  Basically I want to
make that part of the project massively parallel to avoid the problems you
speak of.
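
To make the idea concrete, here is a minimal in-memory sketch of the
"enter it once, share it with everyone" flow.  The class SharedRepository
and its methods are hypothetical names for illustration; a real version
would sit on top of a network-accessible database rather than a Python
dict.

```python
class SharedRepository:
    """Toy stand-in for the global word repository (illustration only)."""

    def __init__(self):
        self._entries = {}  # maps lower-cased word -> entry data

    def lookup(self, word):
        """Return the shared entry for a word, or None if nobody added it."""
        return self._entries.get(word.lower())

    def contribute(self, word, data, share=True):
        """Add an entry if the contributor agrees to share it.

        An existing shared entry is never overwritten; it is returned
        instead so the caller can reuse it.
        """
        key = word.lower()
        if share and key not in self._entries:
            self._entries[key] = dict(data)
        return self._entries.get(key, data)


repo = SharedRepository()

# The first parent enters today's lesson word and agrees to share it.
repo.contribute("Oak", {"definition": "A large tree that grows acorns."})

# The next parent needing the same word finds it already there.
print(repo.lookup("oak"))
```

The share flag reflects the "assuming they accept that choice" part: data
entered with share=False never reaches the common store.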

-->So, even downscaling the words to a child's vocabulary, this is a
-->HUGE undertaking.

And the benefits, I hope, more than make up for it.  They say open source
("freed" software) allows the developers to "do it right".  So the
question is -- "Is building this database the RIGHT way to do it?"  If so
then I think we should attempt it...  Future generations can fix it on the
fly...
It is theirs TOO.

--> I think you should manually code a couple of 
-->dozen words - timing how long it takes you - then scale that up
-->to the couple of thousand you're probably going to need.

I've already put several small books into audio format trying to figure
out some phoneme stuff.  And if I attempted to fill this database on my
own, it would have very few entries by the time my life cycle ended.
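
The "time a couple of dozen words, then scale up" suggestion is simple
arithmetic, sketched below.  The function name and the sample numbers are
made up for illustration; nobody in this thread has measured anything yet.

```python
def estimate_total_hours(sample_minutes, sample_words, target_words):
    """Scale the time measured on a small sample up to the full word list."""
    if sample_words <= 0:
        raise ValueError("sample_words must be positive")
    minutes_per_word = sample_minutes / sample_words
    return minutes_per_word * target_words / 60.0


# Illustrative numbers only: 24 words took 6 hours (360 minutes),
# and the target is a vocabulary of 2000 words.
hours = estimate_total_hours(sample_minutes=360, sample_words=24,
                             target_words=2000)
print("%.0f hours" % hours)
```

With these invented figures (15 minutes per word) the total comes to 500
hours for one person, which is the kind of number that argues for spreading
the work over many contributors.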
 
--> 
-->However, I have to say that this would be a magnificent
-->resource for potential authors of kids games.  Good luck!
-->

Yes! GOOD LUCK!  to us all, for this one we will NEED it, and help from a
few thousand of our friends....


-- 
Jeff Waddell
jeff@smluc.org

Kids Games Project Coordinator
main website at http://smluc.org/SIA/kidsgames/

