Module:Category tree/poscatboiler

From The Languages of David J. Peterson
Jump to navigation Jump to search

Introduction

This is the documentation page for the main data module for Module:category tree/poscatboiler, as well as for its submodules. Collectively, these modules handle generating the descriptions and categorization for almost all category pages. The only current exception is topic pages such as Category:en:Birds and Category:zh:State capitals of Germany, and the corresponding non-language-specific pages such as Category:Birds and Category:State capitals of Germany; these are handled by Module:category tree/topic cat.

Originally, there were a large number of Module:category tree implementations, of which Module:category tree/poscatboiler was only one. It originally handled part-of-speech categories like Category:French nouns and Category:German lemmas (and corresponding "umbrella" categories such as Category:Nouns by language and Category:Lemmas by language); hence the name. However, it has long since been generalized, and the name no longer describes its current use.

The main data module at Module:category tree/poscatboiler/data does not contain data itself, but rather imports the data from its submodules, and applies some post-processing.

  • To find which submodule implements a specific category, use the search box on the right.
  • To add a new data submodule, copy an existing submodule and modify its contents. Then, add its name to the subpages list at the top of Module:category tree/poscatboiler/data.
File:Text-x-generic with pencil.svg The text of any category page using this module should simply read {{auto cat}}.
The correct way to invoke Module:category tree/poscatboiler on a given category page that it handles is through {{auto cat}}. You should not normally invoke {{poscatboiler}} directly. If you find a category page that directly invokes {{poscatboiler}}, it is probably old, from before when {{auto cat}} was created, and should be changed.

Concepts

The poscatboiler system internally distinguishes the following types of categories:

  1. Language categories. These are of the form LANG LABEL (e.g. Category:French lemmas and Category:English learned borrowings from Late Latin). Here, LANG is the name of a language, and LABEL can be anything, but should generally describe a topic that can apply to multiple languages. Note that the language mentioned by LANG must currently be a regular language, not an etymology-only language. (Etymology-only languages include lects such as Provençal, considered a variety of Occitan, and Biblical Hebrew, considered a variety of Hebrew. See here for the list of such lects.) Most language categories have an associated umbrella category; see below.
  2. Umbrella categories. These are normally of the form LABEL by language, and group all categories with the same label. Examples are Category:Lemmas by language and Category:Learned borrowings from Late Latin by language. Note that the label appears with an initial lowercase letter in a language category, but with an initial uppercase letter in an umbrella category, consistent with the general principle that category names are capitalized. Some umbrella categories are missing the by language suffix; an example is Category:Terms borrowed from Latin, which groups categories of the form LANG terms borrowed from Latin. Umbrella categories themselves are grouped into umbrella metacategories, which group related umbrella categories under a given high-level topic. Examples are Category:Lemmas subcategories by language (which groups umbrella categories describing different types of lemmas, such as Category:Nouns by language and Category:Interrogative adverbs by language) and Category:Terms derived from Proto-Indo-European roots (which groups umbrella categories describing terms derived from particular Proto-Indo-European roots, such as Category:Terms derived from the Proto-Indo-European root *preḱ- and Category:Terms derived from the Proto-Indo-European root *bʰeh₂- (speak)). The names of umbrella metacategories are not standardized (although many end in subcategories by language), and internally they are handled as raw categories; see below.
  3. Language-specific categories. These are of the same form LANG LABEL as regular language categories, but with the difference that the label in question applies only to a single language, rather than to all or a large group of languages. Examples are Category:Belarusian class 4c verbs, Category:Dutch separable verbs with bloot, and Category:Japanese kanji by kan'yōon reading. For these categories, it does not make sense to have a corresponding umbrella category.
  4. Raw categories. These can have any form whatsoever, and may or may not have a language name in them. Examples are Category:Requests for images in Korean entries and Category:Terms with redundant transliterations/ru (which logically are language categories but do not follow the standard format of a language category); Category:Phrasebooks by language/Health (which is logically an umbrella category, but again with a nonstandard name); Category:Terms by etymology subcategories by language (an umbrella metacategory); and Category:Templates (a miscellaneous high-level category).

Under the hood, the poscatboiler system distinguishes two types of implementations for categories: individual labels (or individual raw categories), and handlers. Individual labels describe a single label, such as nouns or refractory rhymes. Similarly, an individual raw category describes a single raw category. Handlers, on the other hand, describe a whole class of similar labels or raw categories, e.g. labels of the form learned borrowings from SOURCE where SOURCE is any language or etymology language. Handlers are more powerful than individual labels, but require knowledge of Lua to implement.

Adding, removing or modifying categories

A sample entry is as follows (in this case, found in Module:category tree/poscatboiler/data/lemmas):

labels["adjectives"] = {
	description = "{{{langname}}} terms that give attributes to nouns, extending their definitions.",
	parents = {"lemmas"},
	umbrella_parents = "Lemmas subcategories by language",
}

This generates the description and categorization for all categories of the form "LANG adjectives" (e.g. Category:English adjectives or Category:Norwegian Bokmål adjectives), as well as for the umbrella category Category:Adjectives by language.

The meanings of these fields are as follows:

Category label fields

The following fields are recognized for the object describing a label:

parents
A table listing one or more parent labels of this label. This controls the parent categories that the category is contained within, as well as the chain of breadcrumbs appearing across the top of the page (see below).
  • An item in the table can be either a single string (the parent label), or a table containing (at least) the two elements name and sort. In the latter case, name specifies the parent label name, while the sort value specifies the sort key to use to sort it in that category. The default sort key is the category's label.
  • If a parent label begins with Category: it is interpreted as a raw category name, rather than as a label name. It can still have its own sort key as usual.
  • The first listed parent controls the category's parent breadcrumb in the chain of breadcrumbs at the top of the page. (The breadcrumb of the category itself is determined by the breadcrumb setting, as described below.)
description
A plain English description for the label. This should generally be no longer than one sentence. Place additional, longer explanatory text in the additional= field described below, and put {{wikipedia}} boxes in the intro= field described below so that they are correctly right-aligned with the description. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} will be expanded appropriately; see #Template substitutions in field values below.
breadcrumb
The text of the last breadcrumb that appears at the top of the category page.
  • By default, it is the same as the category label, with the first letter capitalized.
  • The value can be either a string, or a table containing two elements called name and nocap. In the latter case, name specifies the breadcrumb text, while nocap can be used to disable the automatic capitalization of the breadcrumb text that normally happens.
  • Note that the breadcrumbs collectively are the chain of links that serve as a navigation aid for the hierarchical organization of categories. For example, a category like Category:French adjectives will have a breadcrumb chain similar to "Fundamental » All languages » French » Lemmas » Adjectives", where each breadcrumb is a link to a category at the appropriate level. The last breadcrumb here is "Adjectives", and its text is controlled by this field.
displaytitle
Apply special formatting such as italics to the category page title, as with the {{DISPLAYTITLE:...}} magic word (see mw:Help:Magic words). The value of this is either a string (which should be the formatted category title, without the preceding Category:) or a Lua function to generate the formatted category title. A Lua function is most useful inside of a handler (see #Handlers below). The Lua function is passed two parameters, the raw category title (without the preceding Category:) and the language object of the category's language (or nil for umbrella categories), and should return the formatted category title (again without the preceding Category:). If the value of this field is a string, template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} will be expanded appropriately; see below. See Module:category tree/poscatboiler/data/terms by etymology and Module:category tree/poscatboiler/data/lang-specific/nl for examples of using displaytitle=.
intro
Introductory text to display at the very top of the page. This is similar to the description= field, but it logically displays before the edit and recent-entries boxes on the right side, while the description displays after. The intro= field is mostly useful for right-aligned text such as {{wikipedia}} boxes, which if placed in the intro= field will correctly align to the right of description= text. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description=; see #Template substitutions in field values below.
additional
Additional text to display directly after the text in the the description= field. The difference between the two is that description= text will also be shown in the list of children categories shown on the parent category's page, while the additional= text will not. For this reason, use additional= instead of description= for long explanatory notes, See also references and the like, and keep description= relatively short. Template invocations and special template-like references such as {{{langname}}} and {{{langcode}}} are expanded appropriately, just as with description=; see #Template substitutions in field values below.
umbrella
A table describing the umbrella category that collects all language-specific categories associated with this label, or the special value false to indicate that there is no umbrella category. The umbrella category is normally called "LABEL by language". For example, for adjectives, the umbrella category is named Category:Adjectives by language, and is a parent category (in addition to any categories specified using parents) of Category:English adjectives, Category:French adjectives, Category:Norwegian Bokmål adjectives, and all other language-specific categories holding adjectives. This table contains the following fields:
name
The name of the umbrella category. It defaults to "LABEL by language". If you change this, you will probably have to modify Module:auto cat to recognize the new name of the umbrella category.
description
A plain English description for the umbrella category. By default, it is derived from the description field of the category itself by removing any {{{langname}}}, {{{langcode}}} or {{{langcat}}} template parameter reference and capitalizing the remainder. Text is automatically added to the end indicating that this category is an umbrella category that only contains other categories, and does not contain pages describing terms.
parents
The parent category or categories of the umbrella category. This can either be a single string specifying a category (with or without the Category: prefix), a table with fields name (the category name) and sort (the sort key, as in the outer parents field described above), or a list of either type of entity.
breadcrumb
The last breadcrumb in the chain of breadcrumbs at the top of the category page; see above. By default, this is the category label (i.e. the same as the umbrella category name, minus the final "by language" text).
displaytitle
Apply special formatting such as italics to the umbrella category page title; see above.
intro
Like the intro= field on regular category pages; see above.
additional
Like the additional= field on regular category pages; see above.
toc_template, toc_template_full
Override the table of contents bar used on umbrella pages. See below. It's unlikely you will ever need to set this.
umbrella_parents
The same as the parents subfield of the umbrella field. This typically specifies a single umbrella metacategory to which the page's corresponding umbrella page belongs; see #Concepts above). A separate field is provided for this because the umbrella's parent or parents always need to be given, whereas other umbrella properties can usually be defaulted. (In practice, you will find that most entries in a subpage of Module:category tree/poscatboiler/data do not explicitly specify the umbrella's parent. This is because a default value is supplied near the end of the "LABELS" section in which the entry is found.)
toc_template
The template or templates to use to display the "table of contents" bar for easier navigation on categories with multiple pages of entries. By default, categories with more than 200 entries or 200 subcategories display a language-appropriate table of contents bar whose contents are held in a template named CODE-categoryTOC, where CODE is the language code of the category's language. (If no such template exists, no table of contents bar is displayed. If the category has no associated language, as with umbrella pages, the English-language table of contents bar is used.) For example, the category Category:Spanish interjections (and other Spanish-language categories) use Template:es-categoryTOC to display a Spanish-appropriate table of contents bar. (In the case of Spanish, this includes entries for Ñ and for acute-accented vowels such as Á and Ó.) To override this behavior, specify a template or a list of templates in toc_template. The first template that exists will be used; if none of the specified templates exist, the regular behavior applies, i.e. the language-appropriate table of contents bar is selected.
  • Special strings such as {{{langcode}}} (to specify the language code of the category's language) can be used in the template names; see below.
  • Use the special value false to disable the table of contents bar.
  • An example of a category that uses this property is "LANG romanizations". For example, the category Category:Gothic romanizations would by default use the Gothic-specific template Template:got-categoryTOC to display a Gothic-script table of contents bar. This is inappropriate for this particular category, which contains Latin-script romanizations of Gothic terms rather than terms written in the Gothic script. To fix this, the "romanizations" label specifies a toc_template value of {"{{{langcode}}}-rom-categoryTOC", "en-categoryTOC"}, which first checks for a special Gothic-romanization-specific template Template:got-rom-categoryTOC (which in this case does exist), and falls back to the English-language table of contents template.
toc_template_full
Similar to toc_template but used for categories with large numbers of entries (specifically, more than 2,500 entries or 2,500 subcategories). If none of the specified templates exist, the templates listed in toc_template are tried, and if none of them exist either, the default behavior applies. In this case, the default behavior is to use a language-appropriate "full" table of contents template named CODE-categoryTOC/full, and if that doesn't exist, fall back to the regular table of contents template named CODE-categoryTOC. An example of a "full" table of contents template is Template:es-categoryTOC/full, which shows links for all two-letter combinations and appears on pages such as Category:Spanish nouns, with over 50,000 entries.
catfix
Specifies the language code of the language to use when calling the catfix() function in Module:utilities on this page. The catfix() function is used to ensure that page names in foreign scripts show up in the correct fonts and are linked to the correct language.
  • The default value is the category's language, if any (for example, the language LANG in pages of the form LANG LABEL). If the category has no associated language, or if the setting catfix = false is used, the catfix mechanism is not applied.
  • The setting catfix = false is used, for example, on the romanizations label (which holds Latin-script romanizations of foreign-script terms, rather than terms in the language's native script) and the redlinks labels (which holds pages linking to nonexistent terms in the language in question). If this is omitted, for example, then pages in Category:Manchu romanizations will show up oriented vertically despite being in Latin script, and pages in Category:Cantonese redlinks will show up using a double-width font despite mostly not being Cantonese-language pages.
  • The setting catfix = "en" is used for example on categories of the form Requests for translations into LANG (see Module:category tree/poscatboiler/data/entry maintenance) because these categories contain English pages need translations into a given language, rather than containing pages of that language.
  • Note that setting a particular language for catfix= will normally cause that language's table of contents page to display in place of the category's normal language, and setting a value of false will normally cause the English table of contents page to display. In both cases, this behavior can be overridden by specifying the toc_template= or toc_template_full= fields.
|hidden = true
Specifies that the category is hidden. This should be used for maintenance categories. (Hidden categories do not show up in the list of categories at the bottom of a page, but do show up when searched for in the search box.)
|can_be_empty = true
Specifies that the category should not be deleted when empty. This should be used for maintenance categories.

Template substitutions in field values

Template invocations can be inserted in the text of description, parents (both name and sort key), breadcrumb, toc_template and toc_template_full values, and will be expanded appropriately. In addition, the following special template-like invocations are recognized and replaced by the equivalent text:

{{PAGENAME}}
The name of the current page. (Note that two braces are used here instead of three, as with the other parameters described below.)
{{{langname}}}
The name of the language that the category belongs to. Not recognized in umbrella fields.
{{{langcode}}}
The code of the language that the category belongs to (e.g. en for English, de for German). Not recognized in umbrella fields.
{{{langcat}}}
The name of the language's main category, which adds "language" to the regular name. Not recognized in umbrella fields.

Raw categories

Raw categories are treated similarly to regular labels. The main differences are:


Handlers

It is also possible to have handlers that can handle arbitrarily-formed labels, e.g. "###-syllable words" for any ###; "terms in XXX script" for any XXX; or "learned borrowings from LANG" for any LANG. As an example, the following is the handler for "terms coined by COINER" (such as Category:English terms coined by Lewis Carroll):

table.insert(handlers, function(data)
	local coiner = data.label:match("^terms coined by (.+)$")
	if coiner then
		return {
			description = "{{{langname}}} terms coined by " .. coiner .. ".",
			breadcrumb = coiner,
			umbrella = false,
			parents = {{
				name = "coinages",
				sort = coiner,
			}},
		}
	end
end)

The handler checks if the passed-in label has a recognized form, and if so, returns an object that follows the same format as described above for directly-specified labels. In this case, the handler disables the umbrella category "Terms coined by COINER by language" because most people coin words in only one language.

The handler is passed a single argument data, which is an object containing the following fields:

  1. label: the label;
  2. lang: the language object of the language at the beginning of the category, or nil for no language (this happens with umbrella categories);
  3. sc: the script code of the script mentioned in the category, if the category is of the form "LANG LABEL in SCRIPT", or nil otherwise;
  4. args: a table of extra parameters passed to {{auto cat}}.

If the handler interprets the extra parameters passed as data.args, it should return two values: a label object (as described above), and the value true. Otherwise, an error will be thrown if any extra parameters are passed to {{auto cat}}. An example of a handler that interprets the extra parameters is the affix-cat handler in Module:category tree/poscatboiler/data/terms by etymology, which supports {{auto cat}} parameters |alt=, |sort=, |tr= and |sc=. The |alt= parameter in particular is used to specify extra diacritics to display on the affix that forms part of the category name, as in categories such as Category:Latin words suffixed with -inus (properly -īnus).

For further examples, see Module:category tree/poscatboiler/data/words by number of syllables, Module:category tree/poscatboiler/data/terms by script or Module:category tree/poscatboiler/data/terms by etymology.

Note that if a handler is specified, the module should return a table holding both the label and handler data; see the above modules.

Language-specific labels

Support exists for labels that are specialized to particular languages. A typical label such as "verbs" applies to many languages, but some categories have labels that are specialized to a particular language, e.g. Category:Belarusian class 4c verbs or Category:Dutch prefixed verbs with ver-. Here, the label "class 4c verbs" is specific to Belarusian with a description and other properties only for this particular language, and similarly for the Dutch-specific label "prefixed verbs with ver-". Yet, it is desirable to integrate these categories into the poscatboiler hierarchy, so that e.g. breadcrumbs and other features are available. This can be done by creating a module such as Module:category tree/poscatboiler/data/lang-specific/be (for Belarusian) or Module:category tree/poscatboiler/data/lang-specific/nl (for Dutch), and specifying labels and/or handlers in the same fashion as is done for language-agnostic categories. See Module:category tree/poscatboiler/data/lang-specific/documentation for more information.

Subpages


local export = {}

local lang_independent_data = require("Module:category tree/poscatboiler/data")
local lang_specific_module = "Module:category tree/poscatboiler/data/lang-specific"
local lang_specific_module_prefix = lang_specific_module .. "/"

-- Category object

local Category = {}
Category.__index = Category

function Category.new_main(frame)
	local self = setmetatable({}, Category)

	local params = {
		[1] = {},
		[2] = {required = true},
		[3] = {},
		["raw"] = {type = "boolean"},
	}

	local args, remaining_args = require("Module:parameters").process(frame:getParent().args, params, true, "category tree/poscatboiler")
	self._info = {code = args[1], label = args[2], sc = args[3], raw = args.raw, args = remaining_args}

	self:initCommon()

	if not self._data then
		return nil
	end

	return self
end


function Category:get_originating_info()
	local originating_info = ""
	if self._info.originating_label then
		originating_info = " (originating from label \"" .. self._info.originating_label .. "\" in module [[" .. self._info.originating_module .. "]])"
	end
	return originating_info
end
	
	
function Category.new(info)
	for key, val in pairs(info) do
		if not (key == "code" or key == "label" or key == "sc" or key == "raw" or key == "args"
			or key == "called_from_inside" or key == "originating_label" or key == "originating_module") then
			error("The parameter \"" .. key .. "\" was not recognized.")
		end
	end

	local self = setmetatable({}, Category)
	self._info = info

	if not self._info.label then
		error("No label was specified.")
	end

	self:initCommon()

	if not self._data then
		error("The " .. (self._info.raw and "raw " or "") .. "label \"" .. self._info.label .. "\" does not exist" .. self:get_originating_info() .. ".")
	end

	return self
end

export.new = Category.new
export.new_main = Category.new_main


function Category:initCommon()
	local args_handled = false
	if self._info.raw then
		-- Check if the category exists
		local raw_categories = lang_independent_data["RAW_CATEGORIES"]
		self._data = raw_categories[self._info.label]

		if self._data then
			if self._data.lang then
				self._lang = require("Module:languages").getByCode(self._data.lang, true, nil, nil, true)
				self._info.code = self._lang:getCode()
			end
			if self._data.sc then
				self._sc = require("Module:scripts").getByCode(self._data.sc, true, nil, true)
				self._info.sc = self._sc:getCode()
			end
		else
			-- Go through raw handlers
			local data = {
				category = self._info.label,
				args = self._info.args or {},
				called_from_inside = self._info.called_from_inside,
			}
			for _, handler in ipairs(lang_independent_data["RAW_HANDLERS"]) do
				self._data, args_handled = handler.handler(data)
				if self._data then
					self._data.module = self._data.module or handler.module
					break
				end
			end
			if self._data then	
				if self._data.lang then
					if type(self._data.lang) ~= "string" then
						error("Received non-string value " .. mw.dumpObject(self._data.lang) .. " for self._data.lang, label \"" .. self._info.label .. "\"" .. self:get_originating_info() .. ".")
					end
					self._lang = require("Module:languages").getByCode(self._data.lang, true, nil, nil, true)
					self._info.code = self._lang:getCode()
				end
				if self._data.sc then
					if type(self._data.sc) ~= "string" then
						error("Received non-string value " .. mw.dumpObject(self._data.sc) .. " for self._data.sc, label \"" .. self._info.label .. "\"" .. self:get_originating_info() .. ".")
					end
					self._sc = require("Module:scripts").getByCode(self._data.sc, true, nil, true)
					self._info.sc = self._sc:getCode()
				end
			end
		end
	else
		-- Already parsed into language + label
		if self._info.code then
			self._lang = require("Module:languages").getByCode(self._info.code, 1, nil, nil, true)
		else
			self._lang = nil
		end

		if self._info.sc then
			self._sc = require("Module:scripts").getByCode(self._info.sc, true, nil, true) or error("The script code \"" .. self._info.sc .. "\" is not valid.")
		else
			self._sc = nil
		end

		-- First, check lang-specific labels and handlers if this is not an umbrella category.
		if self._lang then
			local langcode = self._lang:getCode()
			local langs_with_modules = mw.loadData(lang_specific_module)
			if langs_with_modules[langcode] then
				local module = lang_specific_module_prefix .. self._lang:getCode()
				local labels_and_handlers = require(module)
				if labels_and_handlers.LABELS then
					self._data = labels_and_handlers.LABELS[self._info.label]
					if self._data then
						if self._data.umbrella == nil and self._data.umbrella_parents == nil then
							self._data.umbrella = false
						end
						self._data.module = self._data.module or module
					end
				end
				if not self._data and labels_and_handlers.HANDLERS then
					for _, handler in ipairs(labels_and_handlers.HANDLERS) do
						local data = {
							label = self._info.label,
							lang = self._lang,
							sc = self._sc,
							args = self._info.args or {},
							called_from_inside = self._info.called_from_inside,
						}
						self._data, args_handled = handler(data)
						if self._data then
							if self._data.umbrella == nil and self._data.umbrella_parents == nil then
								self._data.umbrella = false
							end
							self._data.module = self._data.module or module
							break
						end
					end
				end
			end
		end

		-- Then check lang-independent labels.
		if not self._data then
			local labels = lang_independent_data["LABELS"]
			self._data = labels[self._info.label]
		end

		-- Then check lang-independent handlers.
		if not self._data then
			local data = {
				label = self._info.label,
				lang = self._lang,
				sc = self._sc,
				args = self._info.args or {},
				called_from_inside = self._info.called_from_inside,
			}
			for _, handler in ipairs(lang_independent_data["HANDLERS"]) do
				self._data, args_handled = handler.handler(data)
				if self._data then
					self._data.module = self._data.module or handler.module
					break
				end
			end
		end
	end

	if not args_handled and self._data and self._info.args and next(self._info.args) then
		local module_text = " (handled in [[" .. (self._data.module or "UNKNOWN").. "]])"
		local args_text = {}
		for k, v in pairs(self._info.args) do
			table.insert(args_text, k .. "=" .. ((type(v) == "string" or type(v) == "number") and v or mw.dumpObject(v)))
		end
		error("poscatboiler label '" .. self._info.label .. "' " .. module_text .. " doesn't accept extra args " ..
			table.concat(args_text, ", "))
	end

	if self._sc and not self._lang then
		error("Umbrella categories cannot have a script specified.")
	end
end


function Category:convert_spec_to_string(desc)
	if not desc then
		return desc
	end
	if type(desc) == "number" then
		desc = tostring(desc)
	end
	if type(desc) == "function" then
		local data = {
			lang = self._lang,
			sc = self._sc,
			label = self._info.label,
			raw = self._info.raw,
		}
		desc = desc(data)
	end
	return desc
end


function Category:substitute_template_specs(desc)
	if not desc then
		return desc
	end
	-- This may end up happening twice but that's OK as the function is idempotent.
	desc = self:convert_spec_to_string(desc)

	desc = desc:gsub("{{PAGENAME}}", mw.title.getCurrentTitle().text)
	desc = desc:gsub("{{{umbrella_msg}}}", "This is an umbrella category. It contains no dictionary entries, but only other, language-specific categories, which in turn contain relevant terms in a given language.")
	desc = desc:gsub("{{{umbrella_meta_msg}}}", 'This is an umbrella metacategory, covering a general area such as "lemmas", "names" or "terms by etymology". It contains no dictionary entries, but holds only umbrella ("by language") categories covering specific subtopics, which in turn contain language-specific categories holding terms in a given language for that same topic.')
	if self._lang then
		desc = desc:gsub("{{{langname}}}", self._lang:getCanonicalName())
		desc = desc:gsub("{{{langcode}}}", self._lang:getCode())
		desc = desc:gsub("{{{langcat}}}", self._lang:getCategoryName())
		desc = desc:gsub("{{{langlink}}}", self._lang:makeCategoryLink())
	end
	if self._sc then
		desc = desc:gsub("{{{scname}}}", self._sc:getCanonicalName())
		desc = desc:gsub("{{{sccode}}}", self._sc:getCode())
		desc = desc:gsub("{{{sccat}}}", self._sc:getCategoryName())
		desc = desc:gsub("{{{scdisp}}}", self._sc:getDisplayForm())
		desc = desc:gsub("{{{sclink}}}", self._sc:makeCategoryLink())
	end
	if desc:find("{") then
		desc = mw.getCurrentFrame():preprocess(desc)
	end
	return desc
end


function Category:substitute_template_specs_in_args(args)
	if not args then
		return args
	end
	local pinfo = {}
	for k, v in pairs(args) do
		k = self:substitute_template_specs(k)
		v = self:substitute_template_specs(v)
		pinfo[k] = v
	end
	return pinfo
end


function Category:make_new(info)
	info.originating_label = self._info.label
	info.originating_module = self._data.module
	info.called_from_inside = true
	return Category.new(info)
end


function Category:getBreadcrumbName()
	local ret

	if self._lang or self._info.raw then
		ret = self._data.breadcrumb
	else
		ret = self._data.umbrella and self._data.umbrella.breadcrumb
	end
	if not ret then
		ret = self._info.label
	end

	if type(ret) ~= "table" then
		ret = {name = ret}
	end

	local name = self:substitute_template_specs(ret.name)
	local nocap = ret.nocap

	if self._sc then
		name = name .. " in " .. self._sc:getDisplayForm()
	end

	return name, nocap
end


function Category:getTOC(toc_type)
	local ret

	-- type "none" means everything fits on a single page; fall back to normal behavior (display nothing)
	if toc_type == "none" then
		return true
	end

	-- Return the textual expansion of the first existing template among the given templates, first performing
	-- substitutions on the template name such as replacing {{{langcode}}} with the current language's code (if any).
	-- If no templates exist after expansion, or if nil is passed in, return nil. If a single string is passed in,
	-- treat it like a one-element list consisting of that string.
	local function get_template_text(templates)
		if templates == nil then
			return nil
		end
		if type(templates) ~= "table" then
			templates = {templates}
		end
		for _, template in ipairs(templates) do
			if template == false then
				return false
			end
			template = self:substitute_template_specs(template)
			local template_obj = mw.title.new("Template:" .. template)
			if template_obj.exists then
				return mw.getCurrentFrame():expandTemplate{title = template_obj.text, args = {}}
			end
		end
		return nil
	end

	local templates, fallback_templates

	-- If TOC type is "full" (more than 2500 entries), do the following, in order:
	-- 1. look up and expand the `toc_template_full` templates (normal or umbrella, depending on whether there is
	--    a current language);
	-- 2. look up and expand the `toc_template` templates (normal or umbrella, as above);
	-- 3. do the default behavior, which is as follows:
	-- 3a. look up a language-specific "full" template according to the current language (using English if there
	--     is no current language);
	-- 3b. look up a language-specific "normal" template according to the current language (using English if there
	--     is no current language);
	-- 3c. display nothing.
	--
	-- If TOC type is "normal" (between 200 and 2500 entries), do the following, in order:
	-- 1. look up and expand the `toc_template` templates (normal or umbrella, depending on whether there is
	--    a current language);
	-- 2. do the default behavior, which is as follows:
	-- 2a. look up a language-specific "normal" template according to the current language (using English if there
	--     is no current language);
	-- 2b. display nothing.

	local data_source
	if self._lang or self._info.raw then
		data_source = self._data
	else
		data_source = self._data.umbrella
	end

	if data_source then
		if toc_type == "full" then
			templates = data_source.toc_template_full
			fallback_templates = data_source.toc_template
		else
			templates = data_source.toc_template
		end
	end

	local text = get_template_text(templates)
	if text then
		return text
	end
	if text == false then
		return nil
	end
	text = get_template_text(fallback_templates)
	if text then
		return text
	end
	if text == false then
		return nil
	end
	return true
end


function Category:getInfo()
	return self._info
end


function Category:getDataModule()
	return self._data.module
end


function Category:canBeEmpty()
	if self._lang or self._info.raw then
		return self._data.can_be_empty
	else
		return self._data.umbrella and self._data.umbrella.can_be_empty
	end
end


function Category:isHidden()
	if self._lang or self._info.raw then
		return self._data.hidden
	else
		return self._data.umbrella and self._data.umbrella.hidden
	end
end


function Category:getCategoryName()
	if self._info.raw then
		return self._info.label
	elseif self._lang then
		local ret = self._lang:getCanonicalName() .. " " .. self._info.label

		if self._sc then
			ret = ret .. " in " .. self._sc:getDisplayForm()
		end

		return mw.getContentLanguage():ucfirst(ret)
	else
		local ret = mw.getContentLanguage():ucfirst(self._info.label)
		if not (self._data.umbrella and self._data.umbrella.no_by_language) then
			ret = ret .. " by language"
		end
		return ret
	end
end


function Category:getIntro()
	if self._lang or self._info.raw then
		return self:substitute_template_specs(self._data.intro)
	else
		return self._data.umbrella and self:substitute_template_specs(self._data.umbrella.intro)
	end
end


local function remove_lang_params(desc)
	desc = desc:gsub("{{{langname}}} ", "")
	desc = desc:gsub("{{{langcode}}} ", "")
	desc = desc:gsub("{{{langcat}}} ", "")
	return desc
end

function Category:getDescription(isChild)
	-- Allows different text in the list of a category's children
	local isChild = isChild == "child"

	local function display_title(displaytitle, lang)
		if type(displaytitle) == "string" then
			displaytitle = self:substitute_template_specs(displaytitle)
		else
			displaytitle = displaytitle(self:getCategoryName(), lang)
		end
		mw.getCurrentFrame():callParserFunction("DISPLAYTITLE", "Category:" .. displaytitle)
	end

	if self._lang or self._info.raw then
		if not isChild and self._data.displaytitle then
			display_title(self._data.displaytitle, self._lang)
		end

		if self._sc then
			return self:getCategoryName() .. "."
		else
			local desc = self:convert_spec_to_string(self._data.description)

			if not isChild and desc and self._data.additional then
				desc = desc .. "\n\n" .. self._data.additional
			end

			return self:substitute_template_specs(desc)
		end
	else
		if not isChild and self._data.umbrella and self._data.umbrella.displaytitle then
			display_title(self._data.umbrella.displaytitle, nil)
		end

		local desc = self:convert_spec_to_string(self._data.umbrella and self._data.umbrella.description)
		local has_umbrella_desc = not not desc
		if not desc then
			desc = self:convert_spec_to_string(self._data.description)
			if desc then
				desc = remove_lang_params(desc)
				desc = mw.getContentLanguage():lcfirst(desc)
				desc = desc:gsub("%.$", "")
				desc = "Categories with " .. desc .. "."
			end
		end
		if not desc then
			desc = "Categories with " .. self._info.label .. " in various specific languages."
		end
		if not isChild then
			local additional = self:convert_spec_to_string(
				self._data.umbrella and self._data.umbrella.additional or not has_umbrella_desc and self._data.additional
			)
			if additional then
				desc = desc .. "\n\n" .. remove_lang_params(additional)
			end
			desc = desc .. "\n\n{{{umbrella_msg}}}"
		end
		desc = self:substitute_template_specs(desc)
		return desc
	end
end


function Category:canonicalize_parents_children(cats, is_children)
	if not cats then
		return nil
	end
	if type(cats) ~= "table" then
		cats = {cats}
	end
	if cats.name or cats.module then
		cats = {cats}
	end
	if #cats == 0 then
		return nil
	end

	local ret = {}

	for _, cat in ipairs(cats) do
		if type(cat) ~= "table" or not cat.name and not cat.module then
			cat = {name = cat}
		end
		table.insert(ret, cat)
	end

	local is_umbrella = not self._lang and not self._info.raw
	local table_type = is_children and "extra_children" or "parents"

	for i, cat in ipairs(ret) do
		local sort_key = self:substitute_template_specs(cat.sort)

		local name = cat.name

		if cat.module then
			-- A reference to a category using another category tree module.
			if not cat.args then
				error("Missing .args in '" .. table_type .. "' table with module=\"" .. cat.module .. "\" for '" ..
					self._info.label .. "' category entry in module '" .. (self._data.module or "unknown") .. "'")
			end
			name = require("Module:category tree/" .. cat.module).new(self:substitute_template_specs_in_args(cat.args))
		else
			if not name then
				error("Missing .name in " .. (is_umbrella and "umbrella " or "") .. "'" .. table_type .. "' table for '" ..
					self._info.label .. "' category entry in module '" .. (self._data.module or "unknown") .. "'")
			end
			if type(name) ~= "string" then
				-- assume it's a category object and use it directly
			else
				name = self:substitute_template_specs(name)
				if name:find("^Category:") then
					-- It's a non-poscatboiler category name.
					sort_key = sort_key or is_children and name:gsub("^Category:", "") or self:getCategoryName()
				else
					-- It's a label.
					local raw
					if self._info.raw or is_umbrella then
						raw = not cat.is_label
					else
						raw = cat.raw
					end
					local cat_code
					if cat.lang == false then
						cat_code = nil
					elseif cat.lang then
						cat_code = self:substitute_template_specs(cat.lang)
					elseif not raw then
						cat_code = self._info.code
					end
					sort_key = sort_key or is_children and name or self._info.label
					name = self:make_new({
						label = name, code = cat_code, sc = self:substitute_template_specs(cat.sc),
						raw = raw, args = self:substitute_template_specs_in_args(cat.args)
					})
				end
			end
		end

		sort_key = mw.ustring.upper(sort_key or is_children and " " or self._info.label)
		local description = is_children and self:substitute_template_specs(cat.description) or nil
		ret[i] = {name = name, description = description, sort = sort_key}
	end

	return ret
end


function Category:getParents()
	local is_umbrella = not self._lang and not self._info.raw
	local retval
	if self._sc then
		local parent1 = self:make_new({code = self._info.code, label = "terms in " .. self._sc:getCanonicalName() .. " script"})
		local parent2 = self:make_new({code = self._info.code, label = self._info.label, raw = self._info.raw, args = self._info.args})

		retval = {
			{name = parent1, sort = self._sc:getCanonicalName()},
			{name = parent2, sort = self._sc:getCanonicalName()},
		}
	else
		local parents
		if is_umbrella then
			parents = self._data.umbrella and self._data.umbrella.parents or self._data.umbrella_parents
		else
			parents = self._data.parents
		end

		retval = self:canonicalize_parents_children(parents)
	end

	if not retval then
		return nil
	end

	local self_cat = self:getCategoryName()
	for _, parent in ipairs(retval) do
		local parent_cat = parent.name.getCategoryName and parent.name:getCategoryName()
		if self_cat == parent_cat then
			error(("Internal error: Infinite loop would occur, as parent category '%s' is the same as the child category"):
				format(self_cat))
		end
	end
		
	return retval
end


function Category:getTopicParents()
	if self._data["topic_parents"] then
		local topic_parents = {}
		for _, topic_parent in ipairs(self._data["topic_parents"]) do
			if self._lang then
				table.insert(topic_parents, self._lang:getCode() .. ":" .. topic_parent)
			else
				table.insert(topic_parents, topic_parent)
			end
		end
		return topic_parents
	end
	return nil
end


function Category:getChildren()
	local is_umbrella = not self._lang and not self._info.raw
	local children = self._data.children

	local ret = {}

	if not is_umbrella and children then
		for _, child in ipairs(children) do
			child = mw.clone(child)

			if type(child) ~= "table" then
				child = {name = child}
			end

			if not child.sort then
				child.sort = child.name
			end

			-- FIXME, is preserving the script correct?
			child.name = self:make_new({code = self._info.code, label = child.name, raw = child.raw, sc = self._info.sc})

			table.insert(ret, child)
		end
	end

	local extra_children
	if is_umbrella then
		extra_children = self._data.umbrella and self._data.umbrella.extra_children
	else
		extra_children = self._data.extra_children
	end

	extra_children = self:canonicalize_parents_children(extra_children, "children")
	if extra_children then
		for _, child in ipairs(extra_children) do
			table.insert(ret, child)
		end
	end

	if #ret == 0 then
		return nil
	end
	return ret
end


function Category:getUmbrella()
	if self._info.raw or not self._lang or self._sc or self._data.umbrella == false then
		return nil
	end

	return self:make_new({label = self._info.label})
end


function Category:getAppendix()
	-- FIXME, this should be customizable.
	if not self._info.raw and self._info.label and self._lang then
		local appendixName = "Appendix:" .. self._lang:getCanonicalName() .. " " .. self._info.label
		local appendix = mw.title.new(appendixName).exists
		if appendix then
			return appendixName
		else
			return nil
		end
	else
		return nil
	end
end


function Category:getCatfixInfo()
	if self._lang or self._info.raw then
		if self._data.catfix == false then
			return nil
		end
		local lang, sc
		if self._data.catfix then
			lang = require("Module:languages").getByCode(self:substitute_template_specs(self._data.catfix), true, nil, nil, true)
		else
			lang = self._lang
		end
		if self._data.catfix_sc then
			sc = require("Module:scripts").getByCode(self:substitute_template_specs(self._data.catfix_sc), true, nil, true)
		else
			sc = self._sc
		end
		return lang, sc
	else -- umbrella
		if not self._data.umbrella or not self._data.umbrella.catfix then
			return nil
		end
		local lang = require("Module:languages").getByCode(self:substitute_template_specs(self._data.umbrella.catfix), true, nil, nil, true)
		local sc = self:substitute_template_specs(self._data.umbrella.catfix_sc)
		if sc then
			sc = require("Module:scripts").getByCode(sc, true, nil, true)
		end
		return lang, sc
	end
end


function Category:getTOCTemplateName()
	local lang, sc = self:getCatfixInfo()
	local code = lang and lang:getCode() or "en"
	return "Template:" .. code .. "-" .. (self._data.toctemplateprefix or "") .. "categoryTOC"
end


function Category:getDisplay()
	if self._data["display"] then
		if self._lang then
			return self._lang:getCanonicalName() .. " " .. self._data["display"]
		else
			return mw.getContentLanguage():ucfirst(self._data["display"]) .. " by language"
		end
	end
	return nil
end

function Category:getDisplay2()
	if self._data["display"] then
		if self._lang then
			return mw.getContentLanguage():ucfirst(self._data["display"])
		else
			return mw.getContentLanguage():ucfirst(self._data["display"]) .. " by language"
		end
	end
	return nil
end

function Category:getSort()
	return self._data["sort"]
end


return export