
Re: [kidsgames] distributed data gathering



Hello Paul,

On Fri, 18 Feb 2000, Paul Kienzle wrote:

-->Date: Fri, 18 Feb 2000 19:47:04 +0000 (GMT)
-->From: Paul Kienzle <pkienzle@kienzle.powernet.co.uk>
-->Reply-To: kidsgames@smluc.org
-->To: kidsgames@smluc.org
-->Subject: [kidsgames] distributed data gathering
-->
-->
-->Anybody familiar with the oracle program that was running on Harvard
-->computers in the '80s?  The program would answer any question you asked
-->it, but in return you had to answer a question that it asked you.  I don't
-->remember if people put it into their login scripts, or if they hacked
-->the login program so that you could not log on until you talked to the
-->oracle.  It came up with some very unexpected answers to your questions,
-->and everyone was amazed.  I think the program was billed as a product of
-->the artificial intelligence program, but instead it was a wonderful hack:
-->it simply redirected your question to the next person and your answer
-->to the previous person.  (Sorry I don't remember where I read this ---
-->perhaps the book of the three hackers, Kevin Mitnick, Robert Morris, and
-->I don't remember who, whose title I don't remember --- it's been a while.)
-->

I remember it vaguely.

-->The reason I mention this (other than being a good story) is that this is
-->a wonderful way to fill a database.  The most important feature is that
-->the data is gathered regularly in small bits, so it is not particularly
-->onerous for any one individual.
-->

Great Idea.

-->For example, to generate a dictionary of English, do a concordance on all
-->the texts of Project Gutenberg to generate your word list and find
-->a half dozen sentences which use each word.

We also have the half a dozen or so dictionaries (some perhaps not too
useful to OUR main goal) that come with dictd, including the WordNet data.
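
A minimal sketch of the concordance step, assuming the texts are already
loaded as plain strings. The function name build_concordance, the naive
sentence splitter, and the sample texts are all made up for illustration;
real Project Gutenberg files would need their headers stripped first.

```python
import re
from collections import Counter, defaultdict

def build_concordance(texts, max_examples=6):
    """Return (word counts, example sentences per word) for plain-text strings."""
    counts = Counter()
    examples = defaultdict(list)
    for text in texts:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            words = re.findall(r"[a-z']+", sentence.lower())
            counts.update(words)
            # Keep at most max_examples sentences per word ("a half dozen").
            for word in set(words):
                if len(examples[word]) < max_examples:
                    examples[word].append(sentence)
    return counts, examples

# Hypothetical sample input standing in for the real texts.
texts = ["The cat sat. The dog ran!", "A cat ran home."]
counts, examples = build_concordance(texts)
for word, n in counts.most_common(3):
    print(word, n, examples[word])
```

counts.most_common() also gives the frequency order in which the words
would be sent out.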

-->  Then set up a mail server
-->which sends each member a word, the sentences that use that word, and a
-->definition template for things like word class, plural form, past tense,
-->and different word senses.

Could we do a webform as an alternative?

-->  When they reply with the definition, they
-->are sent a new word to define.  If they do not reply in a given period,
-->the word is sent to someone else, and they are sent a new word to define.
-->If they reply with DONTKNOW, then the word is put on the hard word list
-->which self-styled experts subscribe to.  Words that even they don't know
-->can then be periodically collected and sent to the entire list in hopes
-->that someone will be able to define them (and maybe include an AltaVista
-->search to generate a few more references).  Send the words out in order
-->of frequency, and within a month 1000 subscribers will create a useful
-->dictionary (assuming they each do a word a day).
-->

Sounds wonderful.
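
For scale: 1000 subscribers at one word a day is about 30,000 definitions
in a 30-day month. The reply/timeout/DONTKNOW rules could be sketched
roughly as below. This leaves out the mail handling entirely; the class
name WordDispatcher, the 7-day timeout, and counting time in whole days
are assumptions made for the example, not anything decided on the list.

```python
from collections import deque

class WordDispatcher:
    """Hands out words in frequency order and tracks who is defining what."""

    def __init__(self, words_by_frequency, timeout_days=7):
        self.pending = deque(words_by_frequency)  # most frequent first
        self.assigned = {}      # subscriber -> (word, day it was sent)
        self.definitions = {}   # word -> definition text
        self.hard_words = []    # words answered with DONTKNOW
        self.timeout_days = timeout_days

    def send_next(self, subscriber, today):
        """Assign the next pending word, or None if nothing is left."""
        if not self.pending:
            self.assigned.pop(subscriber, None)
            return None
        word = self.pending.popleft()
        self.assigned[subscriber] = (word, today)
        return word

    def receive_reply(self, subscriber, reply, today):
        """Record a reply (the subscriber must hold a word) and send a new one."""
        word, _ = self.assigned.pop(subscriber)
        if reply.strip().upper() == "DONTKNOW":
            self.hard_words.append(word)   # goes to the "experts" list
        else:
            self.definitions[word] = reply.strip()
        return self.send_next(subscriber, today)

    def expire(self, today):
        """Give late subscribers a fresh word and requeue their old one."""
        late = [s for s, (_, sent) in self.assigned.items()
                if today - sent > self.timeout_days]
        for subscriber in late:
            old_word, _ = self.assigned[subscriber]
            self.send_next(subscriber, today)   # late subscriber gets a new word
            self.pending.appendleft(old_word)   # old word goes to the next asker

dispatcher = WordDispatcher(["the", "of", "and", "to", "in"])
print(dispatcher.send_next("alice", today=0))
print(dispatcher.receive_reply("alice", "DONTKNOW", today=1))
print(dispatcher.hard_words)
```

The same object could sit behind either a mail robot or a webform, since
it only deals in subscriber names, words, and day numbers.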

-->While gathering the definitions, each word has to go through an editing
-->process.  Again this can be distributed.  Amongst the definition requests,
-->include a list of five entries to check.  The subscriber should clean
-->up the spelling and grammar, and correct any definitions that seem off.
-->Each word should go through several different subscribers just to be sure.
-->Subscribers whose definitions are frequently changed should have some of
-->their original entries examined.  If they are sabotaging the project,
-->then boot them off the list.  Similarly, subscribers whose entries are
-->never changed should be promoted to "Editor".  Note that the dictionary
-->itself should be "owned" by a single individual or a small group, much
-->like the kernel is "owned" by Linus.
-->

OK, who starts as "Editor"?

-->Closer to home, we can use this technique to generate a kid-oriented
-->word list.  Send out words in batches of about 25, alphabetical but
-->randomly selected, and have you rate each word according to what age
-->you think a kid would be familiar with it.

Dictionary at home....

-->  You will need to transform
-->this into a normal score for each subject (i.e., assuming a normal or a
-->log-normal distribution of acquisition ages, transform the actual age
-->scores to one with a mean of 0 and a standard deviation of 1) before
-->collating the responses.

This is beyond me, but it sure sounds good.
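
For anyone else who finds this step opaque, here is a small sketch of what
the transformation amounts to: for each rater, subtract that rater's mean
and divide by that rater's standard deviation (a z-score), optionally on
log ages for the log-normal case. The function names and the sample
ratings are invented for the example.

```python
import math
from statistics import mean, pstdev

def normalize_rater(ratings, log_scale=False):
    """Map one rater's {word: age} to {word: z-score} (mean 0, SD 1)."""
    values = {w: (math.log(a) if log_scale else a) for w, a in ratings.items()}
    vals = list(values.values())
    mu = mean(vals)
    sigma = pstdev(vals)
    if sigma == 0:
        # Rater gave every word the same age: no information, score all as 0.
        return {w: 0.0 for w in values}
    return {w: (v - mu) / sigma for w, v in values.items()}

def average_scores(all_ratings, log_scale=False):
    """Average the per-rater z-scores for each word across raters."""
    collected = {}
    for ratings in all_ratings.values():
        for word, z in normalize_rater(ratings, log_scale).items():
            collected.setdefault(word, []).append(z)
    return {w: mean(zs) for w, zs in collected.items()}

# Hypothetical data: two raters who disagree on absolute ages.
ratings = {
    "strict_rater":   {"cat": 4, "bicycle": 7, "democracy": 13},
    "generous_rater": {"cat": 2, "bicycle": 4, "democracy": 9},
}
scores = average_scores(ratings)
for word in sorted(scores, key=scores.get):
    print(word, round(scores[word], 2))
```

The point of normalizing per rater is that one person's "age 4" and
another's "age 2" can mean the same thing; only each rater's relative
ordering and spread survive the transformation.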

-->  You could then generate average scores for
-->each word if you have enough subjects to have each word rated by
-->several different people.  Next, you have to map the normalized score
-->to an actual age group.  Again send samples of words to each subject
-->balanced according to normal score.  Have your subject ask their kid if
-->they know the word (indirectly of course), and for each word rate it
-->according to how well the child seems to understand.  [If this rating
-->isn't strictly correlated with the normal score, then obviously adults
-->are not very good at rating child-appropriateness of words, and you will
-->have to do the entire list with children which is a whole lot more work.]
-->From these responses, assuming multiple children at each age group, you
-->can generate a probability for each word being known at a particular age.
-->Note that some children will be "ahead" or "behind", and it will be up
-->to them to determine what age group they are reading at.
-->

This sounds like a good direction for us to go in.  Are you interested in
making a webform, and/or modifying some mailing list software to automate
this?  I'm sure we can house this system at sourceforge.net, seul.org,
smluc.org, or wherever we need to in order to get the number of
participants we need.  I think you are basically talking about a knowledge
base for words, with humans as the driving force behind the knowledge.
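
Whatever front end collects the answers, the last step in the quoted
message (a probability for each word being known at a particular age)
could be tallied along these lines. This sketch simplifies the "how well
the child seems to understand" rating to a yes/no, which is an assumption,
and the function name and sample observations are made up.

```python
from collections import defaultdict

def known_probability(observations):
    """observations: iterable of (word, age, knew_it).

    Returns {word: {age: fraction of children at that age who knew it}}.
    """
    # word -> age -> [number who knew it, number asked]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for word, age, knew_it in observations:
        cell = tallies[word][age]
        cell[0] += int(knew_it)
        cell[1] += 1
    return {
        word: {age: known / asked for age, (known, asked) in sorted(by_age.items())}
        for word, by_age in tallies.items()
    }

# Hypothetical responses reported back by parents.
observations = [
    ("bicycle", 5, True), ("bicycle", 5, False),
    ("bicycle", 8, True), ("bicycle", 8, True),
    ("democracy", 8, False),
]
print(known_probability(observations))
```

With only a few children per age group these fractions will be noisy, so
the number asked should be kept alongside each estimate in practice.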

-->The biggest problem in all this is getting the thousands of interested
-->individuals.
-->

It will take some time, but I really believe that this will not be a
problem in the long run.

-- 
Jeff Waddell
jeff@smluc.org

Kids Games Project Coordinator
main website at http://smluc.org/SIA/kidsgames/

