Re: [school-discuss] most frequently used words
I don't know of any programs like that, but there is at least
one "open" service that looks up words in a variety of sources:
http://www.dict.org/
You could maybe turn that into a script (I didn't check the
license). But I don't think it is going to do what you want, because the
problem is much harder than one might imagine at first blush.
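Turning the dict.org lookup into a script might look roughly like the sketch below. It assumes the public server at dict.org still speaks the DICT protocol (RFC 2229) on TCP port 2628. `define` and `read_define_reply` are hypothetical helper names, and only the reply parser can be exercised without network access.

```python
import shlex
import socket


def read_define_reply(lines):
    """Collect (database, text) pairs from the lines of a DICT DEFINE reply.

    Outside a definition every line is a status line: 150 announces the
    count, 151 starts one definition, and anything else (250 ok, 552 no
    match, an error code) ends the reply. A definition body ends at a lone
    "." and a leading ".." is un-stuffed to ".".
    """
    definitions = []
    body = None  # lines of the definition being read, or None between them
    database = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if body is not None:
            if line == ".":
                definitions.append((database, "\n".join(body)))
                body = None
            else:
                body.append(line[1:] if line.startswith("..") else line)
        elif line.startswith("151"):
            # Status line format: 151 "word" database "description"
            try:
                fields = shlex.split(line)
            except ValueError:
                fields = line.split()
            database = fields[2] if len(fields) > 2 else ""
            body = []
        elif not line.startswith("150"):
            break  # final status line, e.g. 250 or 552
    return definitions


def define(word, host="dict.org", port=2628, database="*", timeout=10.0):
    """Send one DEFINE command to a DICT server (requires network access)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", errors="replace",
                           newline="") as reader:
            reader.readline()  # 220 greeting banner
            sock.sendall(f'DEFINE {database} "{word}"\r\n'.encode("utf-8"))
            definitions = read_define_reply(iter(reader.readline, ""))
            sock.sendall(b"QUIT\r\n")
            return definitions
```

Calling `define("frog")` would return one entry per source dictionary, which is exactly where the multiple parts of speech show up.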
Take the example you gave, "frog", which would seem to be
unarguably a noun, no? Well, looking it up at www.dict.org gives:
noun - An amphibious animal
verb - To ornament or fasten
interjection - Term of disgust
adjective - Similar to bagbiting, but milder
And it only gets worse with less concrete words. So you can probably
label them yourself, with what *you* mean, more easily than you could
automate the process.
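Labeling by hand could be as simple as a lookup table you maintain yourself. The minimal sketch below mimics the output of the hypothetical `the-dictionary -t` tool from the original question. The lexicon entries and the names `LEXICON` and `tags_for` are illustrative assumptions, and a word may carry several tags, as "frog" does.

```python
import sys

# Hypothetical hand-made lexicon: each word maps to every part of speech
# *you* decide to allow for it.
LEXICON = {
    "the": {"article"},
    "a": {"article"},
    "and": {"conjunction"},
    "it": {"pronoun"},
    "frog": {"noun", "verb"},
}


def tags_for(word, lexicon=LEXICON):
    """Return the sorted tags for `word`, or None if it is not listed."""
    tags = lexicon.get(word.lower())
    return sorted(tags) if tags else None


def main(argv):
    # Print one line per word given on the command line.
    for word in argv:
        tags = tags_for(word)
        print(", ".join(tags) if tags else "[not in dictionary]")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, `the_dictionary.py`, running it with the arguments `frog ahdsjkhgfe` prints `noun, verb` and then `[not in dictionary]`.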
Of course, once you have just the word, you can't tell how it was used.
If you wanted to parse the sentence to find the context of the word,
you would run into other problems. If that problem were solved, the web
would be much easier to search, and many other problems would go away.
For a limited context (say, third-grade reading in a given country),
it might be possible to automate the task with one source dictionary.
But in general, it is an impossible task.
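For that limited case, a minimal sketch might count word frequencies and then group the top words by the tags in a single source lexicon. The tokenizing regex and the function names `word_frequencies` and `categorize` are simplifying assumptions. Words with several tags are listed under each one, and words missing from the lexicon go under "unknown" for hand labeling.

```python
import re
from collections import Counter, defaultdict


def word_frequencies(text):
    """Return (word, percent) pairs, most frequent first."""
    # Assumption: a "word" is a run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, 100.0 * n / total) for w, n in counts.most_common()]


def categorize(freqs, lexicon, top=25):
    """Group the `top` most frequent words by the lexicon's tags."""
    groups = defaultdict(list)
    for word, pct in freqs[:top]:
        for tag in sorted(lexicon.get(word, {"unknown"})):
            groups[tag].append((word, pct))
    return dict(groups)
```

To print a list in the same style as the one in the original question, `for w, p in freqs: print(f"{p:.1f}% {w}")` is enough.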
Oh, frog! I have to go frog my frogging froggy...
-Doug
Jeremy C. Reed wrote:
> I am looking for some easy ways to figure out the most commonly used
> words (in English).
>
> But I would like to categorize them by nouns, verbs, articles, pronouns,
> conjunctions, etc.
>
> Does anyone know of any dictionary software that can be used on a Unix
> command line that can help?
>
> Such as some tool like:
>
> $ the-dictionary -t frog
> noun
> $ the-dictionary -t ahdsjkhgfe
> [not in dictionary]
> $
>
> (I can already build a list of frequently used words from miscellaneous
> emails, and HTML and txt docs on my system.)
>
> My plan is to build categorized lists of top words for reading practice.
>
> Jeremy C. Reed
> http://www.reedmedia.net/
>
> P.S. For example, frequently used words (not categorized):
>
> 7.6% the
> 3.0% to
> 2.6% a
> 2.5% of
> 2.3% and
> 2.0% is
> 1.7% in
> 1.5% for
> 1.0% this
> 1.0% that
> 1.0% be
> 0.8% with
> 0.8% if
> 0.7% or
> 0.7% it
> 0.7% are
> 0.6% you
> 0.6% on
> 0.6% not
> 0.6% by
> 0.6% as
> 0.5% from
> 0.5% an
> 0.4% will
> 0.4% which
--
Douglas S. Blank, Assistant Professor
dblank@brynmawr.edu, (610)526-6501
Bryn Mawr College, Computer Science Program
101 North Merion Ave, Park Science Bld.
Bryn Mawr, PA 19010 dangermouse.brynmawr.edu