Historical Thesaurus :: About the Thesaurus

Source Data

Outline of Sources

The primary source of data for the first edition of the Historical Thesaurus was the second edition of the Oxford English Dictionary (OED) and its Additions series, incorporating the first edition and its supplements. The in-progress second edition of the Thesaurus takes into account the work produced for the third edition of the OED.

For the Old English (OE) period, these OED data were augmented by additional slips covering the recorded OE vocabulary, whether obsolete by 1150 or not. These Old English slips were compiled by Jane Roberts, using Clark Hall’s A Concise Anglo-Saxon Dictionary as an initial word list, and Bosworth and Toller’s An Anglo-Saxon Dictionary for its supplementation and for fuller exemplification of senses. Account was also taken of materials from the new Toronto Dictionary of Old English as they became available. As Roberts and Christian Kay worked on the classification of these materials, it became clear that they would make a useful resource for scholars in their own right, hence the publication in 1995 of A Thesaurus of Old English (TOE) – a rare case of the child preceding the parent. Data from TOE are included in the Historical Thesaurus, but without the length marks normally added in Old English texts or the flags used in TOE to indicate distribution and frequency. Words from TOE either appear as freestanding Old English forms, labelled OE, or are linked to their modern descendants as listed in the OED. Thus mother < modor OE– represents the modern English word mother, derived from OE modor, and in continuous use thereafter. Since one of our main interests was to see how many Old English words mapped onto the OED entries, we may have been over-generous in our distribution of Old English forms. On the other hand, an interesting by-product of this operation has been the discovery of possible links spanning centuries of unrecorded use, as when an apparently obsolete Old English word resurfaces in a nineteenth-century dialect dictionary.

Some Considerations

Although we have relied heavily on the OED, we have not followed it slavishly. In a 2002 paper outlining the relationship between the OED and the Historical Thesaurus, Kay and Irené Wotherspoon wrote:

Given the different purposes and design of dictionaries and thesauri, it is not surprising that difficulties should arise in transforming one into the other. The basic aim of the OED is to give maximum information about the development and use of individual word forms. HTE [= Historical Thesaurus of English], on the other hand, aims to group together words which share one or more features of their meaning, thus using a broader brush to maximize semantic information. The thesaurus slip-maker is not mechanically transferring data from one format to another, but is continually making judgements about the appropriateness of that information for his or her purpose. Thus, the original OED editor may have felt that information about a certain word suggested a division into a certain number of senses and proceeded accordingly. The thesaurus editor, on the other hand, may feel that these divisions are either too broad, and thus miss nuances of meaning that s/he might wish to see represented, or, more usually in our case, too narrow, resulting in several appearances of the same form within a single semantic category. The fact that such problems occur does not mean that either editor is ‘wrong’ in his or her decisions, but is an inevitable outcome of data intended for one purpose being adapted for another.[1]

Thus, although the OED sense divisions are generally followed, there is not total isomorphism between the data in the Historical Thesaurus and the OED. Nor do we claim to include the entire contents of the OED. Where a word generates a large number of phrases and compounds, we have usually omitted the most obviously transparent. We have also been selective with derived forms, such as the almost limitless formations with prefixes such as dis- or un-.[2] There are sometimes apparent absences of words from the Historical Thesaurus lists, not because the OED does not include them, but because (especially in the case of verbal nouns and participial adjectives) it does not include a quotation in the sense needed for that particular list. This applies especially where the OED has a blanket entry to cover “action of the verb in its various senses” without specifying or distinguishing the senses, for example the verbal noun sagging, where the single entry contains citations which could potentially be linked to any of the multiple meanings of the verb. The result is that it is quite often difficult to be sure which sense each form is to be assigned to, and it seems safer to omit the form altogether; conceivably the OED editors themselves were not sure. Revisions for OED3 will work towards addressing that, and we are accounting for this in our second edition. Comparisons undertaken thus far have indicated that the editorial choices made by Thesaurus editors have been mirrored by OED editors when it came to revising the relevant entries.

A source of unevenness for the Old English data, again relating to participles and participial adjectives, is that these are represented only very sparsely among the headwords in Clark Hall and in Bosworth and Toller. Compounds like landagende, for example, are included, while simplices like reocende appear only sporadically. The result is that the proportion of these forms in the adjective lists is much lower for Old English than for Middle and Modern English.

Similar situations can arise with adjectives and adverbs. The senses of an adjective may be less finely differentiated than those of the noun from which it derives, or the adverb senses from those of the parent adjective. The slip-maker then has to decide whether to put the adjective in a more general category than the noun, or to select the citations appropriate to each noun as far as this is possible, bearing in mind that the OED editors had access to discarded examples as well as those which ended up in print. On the whole we have gone for the latter solution rather than simply repeating the adjective entry in all the possible categories.

Derived verbal forms also raise further issues of transitivity. In the OED, transitivity is rarely (if ever) specified for verbal nouns and participial adjectives, but for the Thesaurus’ purposes we needed to know which form of the verb they were attached to if we were to take their dates into consideration. There are also issues, not just for dictionary or thesaurus makers but for grammarians generally, in the wide grey area between an intransitive verb standing alone and a transitive verb with a direct object – a cline of transitivity including such following constructions as indirect objects, infinitive phrases, clauses, and prepositions. On the whole we have tended to follow more recent OED practice in labelling a verb with any kind of object as transitive. The problem, however, is compounded where there are Old English verbs, as objects may occur in different grammatical cases or there may simply not be enough evidence to support a decision. In that case, verbs are entered without specifying transitivity, giving us three basic categories: vi. (intransitive), vt. (transitive) and v. (verb). There are also three minor categories: v. impers. (impersonal), v. pass. (passive), and v. refl. (reflexive).
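The six verb labels listed above can be pictured as a simple lookup table. The following is a minimal sketch in Python, not a description of how the Thesaurus data are actually stored; the names VERB_LABELS, BASIC_LABELS and expand_label are hypothetical and introduced only for illustration.

```python
# Hypothetical lookup of the verb labels described above.
# The abbreviations and glosses come from the text; the structure is an assumption.
VERB_LABELS = {
    "vi.": "intransitive",
    "vt.": "transitive",
    "v.": "verb (transitivity not specified)",
    "v. impers.": "impersonal",
    "v. pass.": "passive",
    "v. refl.": "reflexive",
}

# The three basic categories; the remaining three are the minor ones.
BASIC_LABELS = {"vi.", "vt.", "v."}


def expand_label(label: str) -> str:
    """Return the gloss for a verb label, tolerating irregular spacing."""
    key = " ".join(label.split())  # collapse runs of whitespace
    if key not in VERB_LABELS:
        raise ValueError(f"unknown verb label: {label!r}")
    return VERB_LABELS[key]
```

Under this sketch, an Old English verb whose transitivity cannot be determined would simply carry the label v., which expands to the unspecified category.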

As with any large project conducted by many hands over a long period of time, there have been some minor variations in practice. Originally, for example, we intended to exclude two classes of words where the coverage of OED1 was known to be patchy: later dialectal words and words from recent technological and scientific fields. Both of these initial decisions were overtaken by events, principally the much greater coverage of scientific registers in OED2, reflecting a more widespread interest in things scientific.[3] Many dialect words crept in simply because the compilers could not resist them, but also because of increased interest in linguistic variation. The basic meanings of grammatical words, such as prepositions and conjunctions, are usually represented, but not at the level of detail given in the OED.

In matters of spelling, we have followed OED3 (where available) and OED2 headwords, including any variant spellings that occur there, and occasionally including further variants if they are significantly represented in particular senses. Capitalization is less straightforward. The introduction to OED2 tells us that:

In the first edition of the OED, every main headword was given a capital initial, regardless of whether the word was normally so written. Most derivatives, and many combinations, were also capitalized. The Supplement, in accord with modern lexicographical practice, abandoned this convention, giving a capital only where that is the normal spelling. This edition follows the Supplement's practice. For many words capitalization varies, either at different dates or in different senses. Because its convention disguised the problem, the OED often did not indicate the prevailing or preferred style. Where the intentions of the first edition were not deducible, as often with rare and obsolete words, decisions about capitalization were made on the basis of the printed quotations or analogy with similar and related words, or both.[4]

In general, we have followed this practice, treating capitalization of each sense of a word according to the information available in the OED headwords and citations. Sometimes this conflicts with the Historical Thesaurus’ style of using upper case initials for main category headings and lower case for subordinate ones. For proper names, we have allowed capitalization to override this style, but in one particular case, the use of Latin taxonomic terms in subordinate scientific categories, we have retained lower case initials.

Where appropriate, we have included OED labels indicating features such as style (e.g. ‘slang’ or ‘ironic’) and provenance (e.g. ‘dialectal’ or ‘South African’). However, we have omitted such labels where the fact of classification makes them redundant; we do not, for example, label a word ‘Physics’ in the category of that name. Later Scots words are usually labelled as such, but for older words, where there is an initial mixture of Middle English and Older Scots citations, labelling was often deemed unnecessary. One of the most problematic labels has proved to be ‘figurative’ (fig.). The huge amount of research into metaphor in recent years has focused attention on the extent to which the abstract lexis is derived from the concrete, i.e. is inherently metaphorical.[5] In a category such as Pride, when one’s perception of metaphor is sharpened, one begins to wonder why upstage is marked as figurative, while condescend is not; the editorial criterion roughly has to be whether the metaphor is sufficiently fresh to be perceived as such by modern users of the language.

Finally, the Historical Thesaurus has two conventions which indicate currency in our source data, neither wholly unproblematic. Words with an OED final date after 1870 are usually considered to be actually or potentially still in use, and the closing date is often replaced with a dash, as in aunt 1297–. In cases of uncertainty, we consulted available sections of OED3, online corpora, and two desk dictionaries, The Concise Oxford Dictionary[6] and The Chambers Dictionary.[7] For cases where there were doubts about continuous currency, usually defined as a gap in citations of around 150 years, we have replaced the customary dash between dates with a plus sign. Thus, great-aunt 1656 + 1870– indicates that the word was first recorded in 1656, but not found again until 1870, whereas nephew 1494–1585 (in the sense of niece) indicates that the word appears to have been current between those dates.[8] Where a word (or sometimes a date) is recorded only in a dictionary or similar work, we have used the label ‘Dictionary’ (Dict.) to indicate doubts about its general currency. In all cases, a certain amount of expert lexicographical and editorial discretion was allowed.
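For readers who want to process these date strings mechanically, the dash and plus-sign conventions can be read roughly as follows. This is a minimal sketch in Python under stated assumptions, not the Thesaurus's own software: the names Span, parse_dates, has_gap and possibly_current are hypothetical, a lone date is assumed to stand for a single-year attestation, and labels such as OE or Dict. are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    start: int
    end: Optional[int]  # None = trailing dash, i.e. no closing date given


# One date part: a year, optionally followed by a hyphen or en dash
# and, optionally, a closing year.
_PART = re.compile(r"\s*(\d{3,4})\s*(?:([-–])\s*(\d{3,4})?)?\s*")


def parse_dates(text: str) -> List[Span]:
    """Split a date string on '+' and read each part as one span."""
    spans = []
    for part in text.split("+"):
        m = _PART.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognised date part: {part!r}")
        start, dash, end = int(m.group(1)), m.group(2), m.group(3)
        if dash is None:
            spans.append(Span(start, start))     # assumption: single-year attestation
        elif end is None:
            spans.append(Span(start, None))      # open-ended: closing date replaced by dash
        else:
            spans.append(Span(start, int(end)))  # closed range
    return spans


def has_gap(spans: List[Span]) -> bool:
    """True when a plus sign marked doubtful continuous currency."""
    return len(spans) > 1


def possibly_current(spans: List[Span]) -> bool:
    """True when the last span is open-ended (actually or potentially in use)."""
    return spans[-1].end is None
```

With this sketch, parse_dates("1656 + 1870–") yields two spans, the second of them open-ended, so has_gap and possibly_current both return True for great-aunt, while the closed range given for nephew in the sense of niece yields a single span and both return False.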

Source citations

Our sources therefore include:

The Oxford English Dictionary. 1884-1933, ed. by Sir James A. H. Murray, Henry Bradley, Sir William A. Craigie and Charles T. Onions; Supplement, 1972-1986, ed. by Robert W. Burchfield; 2nd edn, 1989, ed. by John A. Simpson and Edmund S. C. Weiner; Additions Series, 1993-1997, ed. by John A. Simpson, Edmund S. C. Weiner and Michael Proffitt; 3rd edn (in progress) OED Online, March 2000-, ed. by John A. Simpson, Edmund S. C. Weiner and Michael Proffitt. Oxford: Oxford University Press.

We also include the material from:

Roberts, Jane and Christian Kay with Lynne Grundy. 1995. A Thesaurus of Old English. (= King’s College London Medieval Studies XI.) Second edition 2000. Amsterdam: Rodopi.

This material derives from:

Bosworth, Joseph and T. Northcote Toller. 1882-98. An Anglo-Saxon Dictionary. With supplements by T. Northcote Toller 1908-21 and Alistair Campbell 1972. London and Oxford: Oxford University Press.

Cameron, A. F., A. C. Amos, A. diP. Healey, Sharon Butler, Joan Holland, David McDougall and Ian McDougall, eds. 1986–. Dictionary of Old English (DOE). Toronto: PIMS. (Published for the Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.)

Clark Hall, J. R., with supplement by H. D. Merritt. 1960. A Concise Anglo-Saxon Dictionary. Fourth edition. Cambridge: Cambridge University Press.

[1] “Turning the Dictionary Inside Out: some issues in the compilation of a historical thesaurus”. A Changing World of Words: Studies in English Historical Semantics and Lexis, Javier E. Diaz Vera (ed.). Amsterdam: Rodopi, 2002, 109-135.

[2] For an interesting account of the OED’s issues with the latter, see Peter Gilliver, “The Great Un-crisis: an unknown episode in the history of the OED”. Words and Dictionaries from the British Isles in Historical Perspective, John Considine & Giovanni Iamartino (eds). Newcastle: Cambridge Scholars Publishing, 2007, 166-177.

[3] See Michael Rand Hoare & Vivian Salmon, “The Vocabulary of Science in the OED”. Lexicography and the OED: Pioneers in the Untrodden Forest, Lynda Mugglestone (ed.). Oxford: Oxford University Press, 2000, 156-171.

[4] From the OED2 preface online, point D5: http://public.oed.com/history-of-the-oed/oed-editions/introduction-to-the-second-edition/

[5] The classic text, which inspired much subsequent work, is George Lakoff & Mark Johnson, Metaphors We Live By. Chicago: University of Chicago Press, 1980. Our interest in this work sparked the Glasgow Mapping Metaphor with the Historical Thesaurus project, whose results are available online.

[6] Robert Allen (ed.), The Concise Oxford Dictionary. 8th edn. Oxford: Clarendon, 1990.

[7] Robert Allen & Catherine Schwarz (eds), The Chambers Dictionary. Edinburgh: Chambers, 1998.

[8] Philip Durkin, The Oxford Guide to Etymology. Oxford: Oxford University Press, 2009, in chapters 3 and 8 has an interesting discussion of how, if at all, we can determine whether words and meanings with widely separated citations are cases of re-invention at different periods (‘polygenesis’) or simply victims of a defective record. Some light may be shed on such matters when words are examined in semantic categories in the Thesaurus.