Purpose and rationale for Thematic Categories
The set of thematic categories has been designed to overlay a less fine-grained hierarchy on the full structure of the Historical Thesaurus. Although the c.240,000 categories and subcategories of the standard hierarchy are essential to the structuring of the Thesaurus data, they are more extensive than is needed for many data processing purposes and may, in fact, hinder users of the data by separating the lexis of semantic fields into too many highly precise divisions.
The thematic category set therefore seeks a ‘human scale’ level of categorization, retaining conceptual divisions which are likely to be recognized by the majority of users and eliding those which represent a level of detail beyond that necessary for most purposes. An intentional by-product of the thematic category set is that it draws a line through the Thesaurus hierarchy indicating the point at which concepts are likely to become too narrow and specific for most humans to have much investment in learning the vocabulary associated with them. The resulting list of thematic headings contains 4,033 items, collecting the full Thesaurus category set into a set of headings approximately 98% smaller.
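The size reduction quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the figure of 240,000 is the approximate count given in the text ("c.240,000"), so the result is itself approximate, and the function name percent_reduction is purely illustrative.

```python
# Approximate size of the full hierarchy ("c.240,000" in the text).
full_categories = 240_000
# Number of thematic headings stated in the text.
thematic_headings = 4_033


def percent_reduction(original, reduced):
    """Return how much smaller `reduced` is than `original`, as a percentage."""
    return (1 - reduced / original) * 100


reduction = percent_reduction(full_categories, thematic_headings)
print(f"{reduction:.1f}% smaller")  # prints "98.3% smaller"
```

Put the other way round, the thematic headings number a little under 2% of the full category set.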
Thematic category headings can be found within the current Thesaurus structure. Clicking on the icon in the top right corner of a category heading box opens a category options pane below the heading, in which the thematic heading for the category in question is provided.
At present, the thematic categories are primarily used on the Historical Thesaurus website as a means of aggregating data for visualizations, rather than as a means of navigating the Thesaurus hierarchy. In the sparkline and heatmap visualization tools, users can select categories from a more manageable list of thematic headings.
Creation of the Thematic Category Set
The thematic headings list was created by looking through the category and sub-category headings in the Historical Thesaurus. Headings which were deemed relevant at ‘human scale’ have been kept, whilst those which seemed either too specific or too general were removed. For example, Thesaurus heading 01.03.01.05.12 (n.) Disorders of birds (part of the section on animal health) was thought too specific and specialist a topic for users to be likely to search for it. Conversely, 01.05.11.02 (n.) General parts (i.e. of animals) appeared too general to be useful, although headings nested under it in the main hierarchy (e.g. 01.05.11.02.04 (n.) Covering/skin) were considered significant enough as concepts to be given a thematic heading.
Historical Thesaurus categories which were too miscellaneous to act as useful search terms have also been omitted (e.g. 03.11.11.42.05.08 (n.) Other parts (i.e. of machines)). No sub-categories from the main Thesaurus hierarchy have been carried over into the thematic category set.
Structure of Thematic Category Headings
The thematic category headings have a five-level hierarchy in the following format:
- two upper case letters
- two-digit number
- lower case letter
- two-digit number
- lower case letter.
For example, BK01d04a Board-game:
- BK = Leisure
- BK01 = Amusement/entertainment
- BK01d = A specific form of amusement/a pastime
- BK01d04 = Game
- BK01d04a = Board-game
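The breakdown above can be sketched in code by splitting a thematic code into its successive levels. This is an illustration only: the pattern is inferred from the single example BK01d04a, so the assumption that the numeric levels are always exactly two digits wide may not hold for every heading, and the names THEMATIC_CODE and heading_ancestry are hypothetical rather than part of any Historical Thesaurus tool.

```python
import re

# Pattern assumed from the example BK01d04a: two upper case letters, then
# optionally a two-digit number, a lower case letter, a two-digit number
# and a final lower case letter. Each deeper level requires the one above it.
THEMATIC_CODE = re.compile(
    r"([A-Z]{2})(?:(\d{2})(?:([a-z])(?:(\d{2})([a-z])?)?)?)?"
)


def heading_ancestry(code):
    """Return the code of every level from the top heading down to `code`."""
    match = THEMATIC_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a recognised thematic code: {code!r}")
    prefixes = []
    prefix = ""
    for part in match.groups():
        if part is None:  # levels below this point are not used
            break
        prefix += part
        prefixes.append(prefix)
    return prefixes


print(heading_ancestry("BK01d04a"))
# ['BK', 'BK01', 'BK01d', 'BK01d04', 'BK01d04a']
```

A shorter code such as BK01 yields only the levels it uses (BK and BK01), which suits sections of the hierarchy that do not go all five levels deep.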
For many thematic categories, this structure closely parallels the structure of the main Historical Thesaurus hierarchy, with the difference that a ‘cut-off’ point is reached below which further meaning distinctions are not recognized.
For most sections of the thematic hierarchy, three or fewer of these levels are actually used, in the hope of keeping the structure as easy for a user to navigate as possible.