Module:Data consistency check/documentation
Page moved by Djpwikiadmin from Module:Data consistency check/doc to Module:Data consistency check/documentation.
Revision as of 19:14, 18 September 2023
This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.
Output
Lua error in package.lua at line 80: module 'Module:languages/data/3/i/extra' not found.
Checks performed
For multiple data modules:
- Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
- Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
- Each name in the list of other names must appear only once.
- otherNames, if present, must be an array.
- Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
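The otherNames and Wikidata ID rules above can be sketched in plain Lua. This is an illustrative sketch, not the code of this module. The function names isValidWikidataItem, isArray and checkOtherNames are hypothetical. Reading "starting with Q and ending with decimal digits" as "Q followed only by digits" is an assumption. The check covers a single item; the cross-module uniqueness rules would need all data loaded together.

```lua
-- Illustrative sketch only; all function names here are hypothetical.

-- A Wikidata item ID is accepted as a positive integer (e.g. 42) or a
-- string such as "Q42". Assumption: the string form is "Q" followed only by digits.
local function isValidWikidataItem(id)
  if type(id) == "number" then
    return id > 0 and id % 1 == 0
  elseif type(id) == "string" then
    return id:match("^Q%d+$") ~= nil
  end
  return false
end

-- True if t is a sequence: keys 1..n with no gaps and no other keys.
local function isArray(t)
  if type(t) ~= "table" then
    return false
  end
  local n = 0
  for _ in pairs(t) do
    n = n + 1
  end
  for i = 1, n do
    if t[i] == nil then
      return false
    end
  end
  return true
end

-- Checks one item's otherNames against its canonical name and returns a
-- list of problem messages (empty when everything is fine).
local function checkOtherNames(canonicalName, otherNames)
  local problems = {}
  if otherNames == nil then
    return problems -- otherNames is optional
  end
  if not isArray(otherNames) then
    table.insert(problems, "otherNames is not an array")
    return problems
  end
  local seen = {}
  for _, name in ipairs(otherNames) do
    if name == canonicalName then
      table.insert(problems, "canonical name '" .. tostring(name) .. "' is also listed in otherNames")
    elseif seen[name] then
      table.insert(problems, "other name '" .. tostring(name) .. "' appears more than once")
    end
    seen[name] = true
  end
  return problems
end
```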
The following must be true of the data used by Module:languages:
- Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
- The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
- If field 2 is not nil, it must be a valid Wikidata item ID.
- If field 3 or family is given and not nil, it must be a valid family code.
- If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
- If family is given, it must be a valid family code.
- If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
- If entry_name is given, it must be a table that contains two arrays (from and to), a string (remove_diacritics), or both.
- If sort_key is given, it may be either a string or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
- If entry_name or sort_key is given, the from array must be at least as long as the to array.
- If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
- If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
- If link_tr is present, it must be true.
- The data must have no keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
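Several of the per-language rules above can be sketched for a single data table. This is a hedged illustration, not this module's implementation: checkLanguage and its helpers are hypothetical names. The allowed-key list is copied verbatim from the rule above. Pattern validity is tested by running string.find inside pcall, since string.find raises an error on a malformed pattern.

```lua
-- Illustrative sketch; checkLanguage and its helpers are hypothetical names.

-- Allowed keys, copied from the list above.
local allowedKeys = {}
for _, key in ipairs({ 1, 2, 3, "entry_name", "sort_key", "display", "otherNames",
    "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr" }) do
  allowedKeys[key] = true
end

local validTypes = { regular = true, reconstructed = true, ["appendix-constructed"] = true }

-- string.find raises an error on a malformed pattern, so pcall reports
-- whether "[^" .. chars .. "]" is a valid Lua pattern.
local function isValidNegatedSet(chars)
  return (pcall(string.find, "", "[^" .. chars .. "]"))
end

-- entry_name/sort_key tables: from must be at least as long as to.
-- A string sort_key (a module name) has no arrays to compare.
local function fromToLengthsOk(spec)
  if type(spec) ~= "table" then
    return true
  end
  return #(spec.from or {}) >= #(spec.to or {})
end

local function checkLanguage(code, data)
  local problems = {}
  local function report(msg)
    table.insert(problems, code .. ": " .. msg)
  end

  for key in pairs(data) do
    if not allowedKeys[key] then
      report("unrecognised key " .. tostring(key))
    end
  end
  if data[1] == nil then
    report("missing canonical name (field 1)")
  end
  if data.type ~= nil and not validTypes[data.type] then
    report("unrecognised type " .. tostring(data.type))
  end
  for _, field in ipairs({ "entry_name", "sort_key" }) do
    if not fromToLengthsOk(data[field]) then
      report(field .. ": 'from' is shorter than 'to'")
    end
  end
  if data.standardChars ~= nil and not isValidNegatedSet(data.standardChars) then
    report("standardChars does not form a valid [^...] pattern")
  end
  if data.override_translit and not data.translit then
    report("override_translit is set but translit is not")
  end
  if data.link_tr ~= nil and data.link_tr ~= true then
    report("link_tr must be true if present")
  end
  return problems
end
```

For example, a standardChars value of "%" yields the pattern "[^%]", which Lua rejects as malformed, so it is reported.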
Checks not performed:
- If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
- If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
- If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when placed inside negated set notation ([^...]).
These are not checked here because, if these conditions are not met, module errors will quickly show up in entries as soon as Module:utilities tries to generate a sort key for a category pertaining to the language in question, or full_link tries to use the transliteration module.
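This module deliberately skips the first two checks above. If one wanted to test them anyway, a minimal sketch could look like the code below. Everything here is hypothetical. exportsFunction is not part of any real module. The loader is passed in so that the sketch runs outside a wiki; on a Scribunto wiki it would be require. The module names and the bodies of tr and makeSortKey are invented stand-ins that only show the expected shape.

```lua
-- Illustrative sketch; exportsFunction, fakeRequire and the module names
-- below are hypothetical stand-ins.

-- True if loader(moduleName) succeeds and returns a table whose funcName
-- field is a function.
local function exportsFunction(loader, moduleName, funcName)
  local ok, mod = pcall(loader, moduleName)
  if not ok or type(mod) ~= "table" then
    return false
  end
  return type(mod[funcName]) == "function"
end

-- Stand-in registry so the sketch runs outside a wiki.
local fakeModules = {
  ["Module:xx-translit"] = {
    -- tr(text, lang, sc): lang and sc are optional.
    tr = function(text, lang, sc) return text end,
  },
  ["Module:xx-sortkey"] = {
    makeSortKey = function(text, lang, sc) return string.lower(text) end,
  },
}

local function fakeRequire(name)
  local mod = fakeModules[name]
  if mod == nil then
    error("module '" .. name .. "' not found")
  end
  return mod
end
```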
Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
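The "all the codes and names, and no more" rule is a two-way comparison. The sketch below assumes that Module:languages/code to canonical name maps code to canonical name, and that Module:languages/canonical names maps canonical name to code. These table shapes and the name compareIndexes are assumptions for illustration.

```lua
-- Illustrative sketch; compareIndexes and the table shapes are assumptions.
-- languageData: code -> data with the canonical name in field 1.
-- codeToName: code -> canonical name; nameToCode: canonical name -> code.
local function compareIndexes(languageData, codeToName, nameToCode)
  local problems = {}
  -- Every code and name in the data must appear in both tables...
  for code, data in pairs(languageData) do
    local name = data[1]
    if codeToName[code] ~= name then
      table.insert(problems, "code " .. code .. " is missing or wrong in the code-to-name table")
    end
    if name == nil or nameToCode[name] ~= code then
      table.insert(problems, "name of " .. code .. " is missing or wrong in the canonical-names table")
    end
  end
  -- ...and the tables must contain nothing more.
  for code in pairs(codeToName) do
    if languageData[code] == nil then
      table.insert(problems, "extra code " .. tostring(code) .. " in the code-to-name table")
    end
  end
  for name, code in pairs(nameToCode) do
    local data = languageData[code]
    if data == nil or data[1] ~= name then
      table.insert(problems, "extra name " .. tostring(name) .. " in the canonical-names table")
    end
  end
  return problems
end
```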
The following must be true of the data used by Module:etymology languages:
- canonicalName must be given.
- parent must be given and must be a valid language, family or etymology-only language code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
- There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
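A sketch of the etymology-language rules follows. It assumes a hypothetical codes table that maps every known code to its kind ("language", "family" or "etymology"). Building that table from the real data modules is left out. Note the asymmetry in the rules: parent may be any of the three kinds, while ancestors may not be families. The final "should also be listed as an ancestor" rule is not sketched.

```lua
-- Illustrative sketch; checkEtymologyLanguage and the codes table are assumptions.
local allowedEtymKeys = {
  canonicalName = true, otherNames = true, parent = true,
  ancestors = true, wikipedia_article = true, wikidata_item = true,
}

-- codes maps every known code to its kind: "language", "family" or "etymology".
local function checkEtymologyLanguage(code, data, codes)
  local problems = {}
  local function report(msg)
    table.insert(problems, code .. ": " .. msg)
  end
  if data.canonicalName == nil then
    report("missing canonicalName")
  end
  -- parent may be a language, family or etymology-only language code.
  if data.parent == nil then
    report("missing parent")
  elseif codes[data.parent] == nil then
    report("invalid parent " .. tostring(data.parent))
  end
  -- ancestors may only contain language or etymology language codes.
  if data.ancestors ~= nil then
    if type(data.ancestors) ~= "table" then
      report("ancestors is not an array")
    else
      for _, anc in ipairs(data.ancestors) do
        local kind = codes[anc]
        if kind ~= "language" and kind ~= "etymology" then
          report("invalid ancestor " .. tostring(anc))
        end
      end
    end
  end
  for key in pairs(data) do
    if not allowedEtymKeys[key] then
      report("unrecognised key " .. tostring(key))
    end
  end
  return problems
end
```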
Codes in Module:families data must:
- Have canonicalName, which must not be the same as the canonical name of another family.
- If family is given, it must be a valid family code.
- Have at least one language or subfamily belonging to it.
- Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
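The family rules need both the family data and the language data, because a family counts as used when a language or a subfamily points to it. The sketch below is illustrative only. checkFamilies is a hypothetical name. It assumes, following the Module:languages rules above, that a language's family sits in field 3 or in family.

```lua
-- Illustrative sketch; checkFamilies and the table shapes are assumptions.
local allowedFamilyKeys = {
  canonicalName = true, otherNames = true, family = true,
  protoLanguage = true, wikidata_item = true,
}

-- families: code -> family data; languages: code -> language data, whose
-- family is in field 3 or family.
local function checkFamilies(families, languages)
  local problems = {}
  local memberCount = {}
  local nameOwner = {}
  for code in pairs(families) do
    memberCount[code] = 0
  end
  for code, fam in pairs(families) do
    local function report(msg)
      table.insert(problems, code .. ": " .. msg)
    end
    local name = fam.canonicalName
    if name == nil then
      report("missing canonicalName")
    elseif nameOwner[name] then
      report("canonical name '" .. name .. "' is also used by " .. nameOwner[name])
    else
      nameOwner[name] = code
    end
    if fam.family ~= nil then
      if families[fam.family] == nil then
        report("invalid parent family " .. tostring(fam.family))
      else
        -- Count this family as a subfamily of its parent.
        memberCount[fam.family] = memberCount[fam.family] + 1
      end
    end
    for key in pairs(fam) do
      if not allowedFamilyKeys[key] then
        report("unrecognised key " .. tostring(key))
      end
    end
  end
  -- Count languages belonging directly to each family.
  for _, lang in pairs(languages) do
    local fam = lang[3] or lang.family
    if fam ~= nil and memberCount[fam] ~= nil then
      memberCount[fam] = memberCount[fam] + 1
    end
  end
  for code, count in pairs(memberCount) do
    if count == 0 then
      table.insert(problems, code .. ": has no languages or subfamilies")
    end
  end
  return problems
end
```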
Codes in Module:scripts data must:
- Have canonicalName.
- Have at least one language that lists it as one of its scripts.
- Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
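The script rules can be sketched the same way. As before, pattern validity is tested by running string.find inside pcall, this time with the non-negated "[...]" form. checkScripts is a hypothetical name. It assumes that a language's script list sits in field 4 or in scripts, as described for Module:languages above.

```lua
-- Illustrative sketch; checkScripts and the table shapes are assumptions.
local allowedScriptKeys = {
  canonicalName = true, otherNames = true, parent = true, systems = true,
  wikipedia_article = true, characters = true, direction = true,
}

-- True if "[" .. chars .. "]" is a valid Lua pattern (string.find raises
-- an error on malformed patterns).
local function isValidSet(chars)
  return (pcall(string.find, "", "[" .. chars .. "]"))
end

-- scripts: code -> script data; languages: code -> language data, whose
-- script list is in field 4 or scripts.
local function checkScripts(scripts, languages)
  local problems = {}
  local used = {}
  for _, lang in pairs(languages) do
    local list = lang[4] or lang.scripts
    if type(list) == "table" then
      for _, sc in ipairs(list) do
        used[sc] = true
      end
    end
  end
  for code, data in pairs(scripts) do
    local function report(msg)
      table.insert(problems, code .. ": " .. msg)
    end
    if data.canonicalName == nil then
      report("missing canonicalName")
    end
    if not used[code] then
      report("no language lists this script")
    end
    if type(data.characters) ~= "string" then
      report("missing characters pattern")
    elseif not isValidSet(data.characters) then
      report("characters does not form a valid [...] pattern")
    end
    for key in pairs(data) do
      if not allowedScriptKeys[key] then
        report("unrecognised key " .. tostring(key))
      end
    end
  end
  return problems
end
```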