ReadmeR5.html
present, contains an alphabetized list of related words. A simple example:
word is spelled in accordance with normal English spelling
observed include "john", "china", "bush", "yahoo" and "august".
denote "Scrabble inflections".) Depending on your application for
interactive games to literacy programs. And I have been
"wind", or "crooked", the past tense of "crook").
that this list is formatted as a collection of word sets, each set
composed of a headword and some number (possibly zero) of closely related
is somewhat arbitrary. I have consistently chosen an Amer
separately. "wind" the noun and "wind" the verb are considered as a
2+2lemma list, but with the headwords arranged approximately by the
further changes, except perhaps for minor error corrections. However,
and in this form is likely quite uncommon on the Web.
terror") and the growing importance of the Internet in our daily lives.
smaller ones as well.) Many of these words relate to two of the
"bashful" from "bash". There are some rather difficult questions
but it is credited with half the total count for the word
plural inflection, as with "meaning" and "kindness". Such words
based, baseless, basely, baseness, baser, bases -> [basis], basest, basing
2of12inf.txt and 2+2lemma.txt), and a section of additional hyphenated
words which you might choose to add as appropriate to the other lists
god of war rather than to the unit.) "art" illustrates the other
alternate headword. There are two specific situations which might not be obvious
extracted at random from my own lists. This is a use of 12dicts of which I
supplied by Google on the frequency of English words on the World Wide Web.
Sometimes, the choice of which variant to treat as
have the same inflection ("putting" derives both from "putt" and "put";
Google distinguished words on the basis of capitalization, so that
advertising bias is illustrated by the surprisingly high frequency of
certain ambiguities - should the word "putting" count for the "put" or
me know what you're doing. (Oh, and please put "12di
Google frequency data, and my procedures for processing it, too
are always made headwords, even when the relationship to the or
computer bias is illustrated by words such as "click", "online", "icon"
buzzwords of the 21st century.
Words ending with the suffix -ability/ibility are t
A note on "licensing": 2+2lemma.txt and 2+2gfreq.txt were
lists and their features updated to release 5.
the count evenly between all the possible headwords. This assumes
Perhaps my favorite example is that "nostdinc" (a compiler option
are towards advertising and marketing, computers and pornography. The
The 2+2gfreq list
Here are some other notes on the determination of what words are related.
am publishing the file neol2007.txt, which contains newly popular
the semblance of a frequently referenced site. At any rate, one
order of their frequency of use. The "g" in the name stands for
The list 2+2lemma.txt contains the words in the 2of12inf.txt
the Google data might be somewhat higher than the frequency in
/usr/dicts/words for their input. Keep up the good work, and let
further inaccuracies have been introduced by my own procedures.
language does not remain static, and the 2007 editions of the 12dicts
resulting data was sorted by frequency, and then grouped into bands
The 2+2lemma list is not formatted as a simple list of words.
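As a rough illustration of what reading such a list might involve, here is a minimal Python sketch that parses one comma-separated line of related words, such as the "based, baseless, ..." line quoted in this document. It is not part of the 12dicts distribution. The function name parse_related and the pattern XREF are hypothetical, and the sketch assumes that "word -> [headword]" marks a cross-reference to a single other headword, which is only inferred from that one sample line.

```python
import re

# Assumption: a cross-reference looks like "bases -> [basis]", with exactly
# one target headword inside the square brackets.
XREF = re.compile(r"^(?P<word>\S+)\s*->\s*\[(?P<target>[^\]]+)\]$")


def parse_related(line):
    """Split a line of related words into (word, cross_reference) pairs.

    cross_reference is None when the word carries no "-> [...]" marker.
    """
    entries = []
    for item in line.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty pieces, e.g. from a trailing comma
        match = XREF.match(item)
        if match:
            entries.append((match.group("word"), match.group("target")))
        else:
            entries.append((item, None))
    return entries


sample = "based, baseless, basely, baseness, baser, bases -> [basis], basest, basing"
for word, target in parse_related(sample):
    print(f"{word} (see also {target})" if target else word)
```

If the real file uses several targets inside one pair of brackets, splitting on commas as done here would need to be replaced by a tokenizer that respects the brackets.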
activity is the development of CAAPR and ABCD, both of which may be
In the previous editions of 12dicts, I suggested that you write