
ReadmeR5.html

Showing 92 Randomly Sampled Lines...
"second" the verb.


word is spelled in accordance with normal English spelling

Release 5 of the 12dicts

Almost always, a given word has only one cross-reference - the exception is the incredible tangl

Sometimes,
data with considerably higher frequency than credible fo
N
manage to keep up.  After all, it took them 20 years to recognize the word
composed of a headword and some number (possibly zero) of closely related
any uses of "are" noun uses, but, almost certainly, most
2+2lemma list, but with the headwords arranged approximately by the
"wind", or "crooked", the past tense of "crook").  These
fewer words than 2of12inf.txt.


"fought" or "colonel").  Though these files were developed as a
unusual way, the resulting word is considered independent.  For
alternate headword. There are two specific situations
which might n
 It is composed of entries of 1 or 2 lines each.  The first

The Web is biased towards certain kinds of content, and the
advertising bias is illustrated by the surprisingly high
Y
61406
by agid.txt.  I release neol2007.txt into the publ

have been added, marked with a + if they would not have otherwise been
The 12Dicts Word Lists, release 5
 After the words were grouped in this fashion, each band was

"sluggish" to "slug".  In general, I have chosen the course of
80431
side of the problem.  It is an archaic form of the verb "to be",
extracted at random from my own lists. This is a use of 12dicts of which I
do not approve!)


and 2of4brif lists, lemmatized.  The word "lemmatized" is a rare
least surprise by treating such pairs as independent.


under the Linux operating system) occurs more frequently on the Web
word, which you will find in none of these lists, but what it means is
terror") and the growing importance of the Internet in our daily lives.

chose to ignore capitalization.  This was necessary - as it would
preferred for these lists to be in synch with the older 12
phrase "over the past
"NEVERENDING sweetie - animal thread" and  "REDRUM REDRUM REDRUM
Atkinson's AGID, described in the file agid.txt.  I place no
date back to the previous century - their omission from previous

The neol2007 list


In the previous editions of 12dicts, I suggested that you write
individual uncapitalized words and their inflections (as recorded in
"check", whereas, to an observer with a British bias, they would no doubt be separate headwords.
vocabulary of that content is overrepresented.  Three such biases
is derived primarily from the 12dicts 6of12 list.  ABCD, Alan's
the number of headwords.  I treat "cheque" as a variant of
particularly pleased to occasionally hear of first-year Computer
spam, the publication of my email address in this package has led to a

Here are some other notes on the determination of what words are related.
most occurrences of "US" refer to the country rather than to the
Science assignments specifying a 12dicts list rather than
neol20xx file each time.


which, though pronounced differently, are clearly variants
subject line when you email me.  This will allow me to easily
2+2gfreq list was made by accumulating the frequency counts for all of
source dictionaries will contain some words which either did not exist,

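The line about the 2+2gfreq list mentions accumulating frequency counts across a group of related words. The following Python sketch is only a rough illustration of that idea. It assumes two hypothetical inputs: a `word_counts` mapping from each word to a raw count, and a `lemmas` mapping from each headword to its related words (possibly none). It also folds case, following the note about ignoring capitalization. It is not the author's actual procedure or file format.

```python
from collections import Counter


def lemma_frequencies(word_counts, lemmas):
    """Sum the raw counts of each headword and its related words.

    Hypothetical inputs (not the 12dicts file format):
      word_counts: dict mapping word -> raw count
      lemmas: dict mapping headword -> list of related words (may be empty)
    """
    # Ignore capitalization: "Walk" and "walk" contribute to the same total.
    folded = Counter()
    for word, count in word_counts.items():
        folded[word.lower()] += count

    totals = {}
    for headword, related in lemmas.items():
        # A set ensures no member is counted twice after case folding.
        members = {headword.lower(), *(w.lower() for w in related)}
        # Counter returns 0 for words that never occurred.
        totals[headword] = sum(folded[w] for w in members)
    return totals
```

For example, `lemma_frequencies({"Walk": 5, "walks": 3, "walked": 2, "second": 7}, {"walk": ["walks", "walked", "walking"], "second": []})` gives `{"walk": 10, "second": 7}`. A headword with no related words keeps its own count, and an unseen inflection such as "walking" adds nothing.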
It is probably valuable to present here the matrix of the
 Perhaps my favorite example is that "nostdinc" (a compi
thereby ends up in frequency band 5.  (Not only are hardly
printed.


 "are", as a noun, is an obscure unit of measurement,

some of this is explained by the technique of setting up many
no word "basical".)  When one of these suffixes is u
are always made headwords, even when the relationship to the original
CAAPR (http://www.wyrdplay.org/AlanBeale/CAAPR-ref-12.html) and the Google data,
would have implied more significance to the data than is actually
appear that capitalization on the Web is random, or at least beyond
fancy name for a bi-dialectal pronunciation dictionary whose wor
than the common word "responsibility".


Finally, British forms of words in

    suspects
    2of12inf
     CAAPR is the Combined Anglo-American Pronunciation Reference, a
    there.  As I will explain in the following paragraphs, the Google

    The list of related words contains three sorts of entries.


    No distinction is made of different meanings of the same word,
    here, such as how closely "slavish" is related to "slave", or
    presented in frequency order.  The reason is that I think this


