
Re: [school-discuss] most frequently used words



I don't know of any programs like that, but there is at least
one "open" service that looks up words in a variety of sources:

http://www.dict.org/

You could probably turn that into a script (I haven't checked the
license). But I don't think it will do what you want, because the
problem is much harder than one might imagine at first blush.
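
Here is a rough sketch of what such a script might look like in Python.
It assumes dict.org still answers DICT protocol (RFC 2229) queries on
port 2628, and that you only look up single words; the function names
are made up for illustration, and I haven't checked what the license
allows.

```python
import shlex
import socket


def parse_definitions(raw):
    """Split a raw DICT (RFC 2229) DEFINE reply into (database, text) pairs."""
    results = []
    database, body = None, []
    for line in raw.splitlines():
        if database is None:
            if line.startswith("151 "):
                # Status line looks like: 151 "word" database "description"
                fields = shlex.split(line)
                database = fields[2] if len(fields) > 2 else "?"
                body = []
        elif line == ".":
            # A lone dot ends the text of one definition.
            results.append((database, "\n".join(body)))
            database = None
        else:
            # Text lines that begin with a dot arrive with the dot doubled.
            body.append(line[1:] if line.startswith("..") else line)
    return results


def define(word, host="dict.org", port=2628, timeout=10.0):
    """Ask the server for every definition of a single word (needs network)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", errors="replace") as reader:
            reader.readline()  # skip the 220 greeting banner
            sock.sendall(f"DEFINE * {word}\r\n".encode("utf-8"))
            lines, in_body = [], False
            for line in reader:
                line = line.rstrip("\r\n")
                lines.append(line)
                if in_body:
                    in_body = line != "."
                elif line.startswith("151 "):
                    in_body = True
                elif not line.startswith("150 "):
                    break  # 250 ok, 552 no match, or some other status
            sock.sendall(b"QUIT\r\n")
    return parse_definitions("\n".join(lines))
```

Calling define("frog") should give back one (database, text) pair per
source. Even so, each text is free-form prose, so the part of speech
still has to be dug out of every definition by hand or by guesswork.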

For example, you gave "frog", which would seem to be unarguably a
"noun", no? Well, looking it up on www.dict.org gives:

   noun - An amphibious animal
   verb - To ornament or fasten
   interjection - Term of disgust
   adjective - Similar to bagbiting, but milder

And it only gets worse with less concrete words. So you can probably
label them yourself with what *you* mean more easily than you could
automate the process.

Of course, you can't tell how a word was used once you have just the
word. If you wanted to parse the sentence to find the context of the
word, you would run into other problems. If we had that problem
solved, the web would be much easier to search, and many other problems
would go away.

Maybe for a limited context (say third grade reading in a given country) 
it might be possible to automate the task with one source dictionary. 
But in general, it is an impossible task.
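
For the limited case, a lookup along the lines of your the-dictionary
example could be sketched like this. The lexicon here is a tiny
hand-made stand-in for a real source dictionary (only the "frog" senses
come from the lookup above; the rest are my own labels), and the
function names are hypothetical.

```python
# Hypothetical hand-labelled lexicon standing in for one source dictionary.
LEXICON = {
    "frog": ["noun", "verb", "interjection", "adjective"],
    "the": ["article"],
    "and": ["conjunction"],
    "you": ["pronoun"],
}


def word_types(word, lexicon=LEXICON):
    """Return every part of speech recorded for the word, or [] if unknown."""
    return list(lexicon.get(word.lower(), []))


def report(word, lexicon=LEXICON):
    """Format the answer the way the imagined command-line tool would print it."""
    types = word_types(word, lexicon)
    return "\n".join(types) if types else "[not in dictionary]"


if __name__ == "__main__":
    for query in ("frog", "ahdsjkhgfe"):
        print(f"$ the-dictionary -t {query}")
        print(report(query))
```

Note that it returns a list rather than a single label, since even one
dictionary will often record several parts of speech for the same word.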

Oh, frog! I have to go frog my frogging froggy...

-Doug

Jeremy C. Reed wrote:

> I am looking for some easy ways to figure out the most commonly used
> words (in English).
> 
> But, I would like to categorize them by nouns, verbs, article, pronouns,
> conjunctions, etc.
> 
> Does anyone know of any dictionary software that can be used on a Unix
> command-line that can help?
> 
> Such as some tool like:
> 
>   $ the-dictionary -t frog
>   noun
>   $ the-dictionary -t ahdsjkhgfe
>   [not in dictionary]
>   $
> 
> (I can already build a list of frequently used words from miscellaneous
> emails, and HTML and txt docs on my system.)
> 
> My plan is to build categorized lists of top words for reading practice.
> 
>    Jeremy C. Reed
>    http://www.reedmedia.net/
> 
> p.s. for example, frequently used words (not categorized):
> 
> 7.6% the
> 3.0% to
> 2.6% a
> 2.5% of
> 2.3% and
> 2.0% is
> 1.7% in
> 1.5% for
> 1.0% this
> 1.0% that
> 1.0% be
> 0.8% with
> 0.8% if
> 0.7% or
> 0.7% it
> 0.7% are
> 0.6% you
> 0.6% on
> 0.6% not
> 0.6% by
> 0.6% as
> 0.5% from
> 0.5% an
> 0.4% will
> 0.4% which
> 
> 
> 


-- 
Douglas S. Blank,         Assistant Professor
dblank@brynmawr.edu,            (610)526-6501
Bryn Mawr College,   Computer Science Program
101 North Merion Ave,       Park Science Bld.
Bryn Mawr, PA 19010  dangermouse.brynmawr.edu