Module:Data consistency check/documentation: Difference between revisions

From The Languages of David J. Peterson
Line 11: Line 11:
  * Each name in the list of other names must appear only once.
  * <code>otherNames</code>, if present, must be an array.
+ * Wikidata item IDs must be a positive integer or a string starting with <code>Q</code> and ending with decimal digits.

  The following must be true of the data used by [[Module:languages]]:
  * Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  * The canonical name (field <code>1</code>) must be present and must not be the same as the canonical name of another language.
- * If <code>scripts</code> is given, it must be an array, and each string in the array must be a valid script code.
+ * If field <code>2</code> is not <code>nil</code>, it must a valid Wikidata item ID.
+ * If field <code>3</code> or <code>family</code> is given and not <code>nil</code>, it must be a valid family code.
+ * If field <code>4</code> or <code>scripts</code> is given and not <code>nil</code>, it must be an array, and each string in the array must be a valid script code.
  * If <code>ancestors</code> is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  * If <code>family</code> is given, it must be a valid family code.
Line 23: Line 26:
  * If <code>entry_name</code> or <code>sort_key</code> is given, the <code>from</code> array must be longer or equal in length to the <code>to</code> array.
  * If <code>standardChars</code> is given, it must form a valid Lua string pattern when placed between square brackets with <code>^</code> before it ({{code|lua|"[^...]}}). (It should match all characters regularly used in the language, but that cannot be tested.)
- * Have no data keys besides these: {{code|lua|"canonicalName", "entry_name", "sort_key", "otherNames", "type", "scripts", "family", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit_module", "override_translit", "link_tr", "wikidata_item"}}.
+ * If <code>override_translit</code> is set, <code>translit</code> must also be set, because there must be a transliteration module that can override manual transliteration.
+ * If <code>link_tr</code> is present, it must be <code>true</code>.
+ * Have no data keys besides these: {{code|lua|1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr"}}.

  Checks not performed:
- * If <code>translit_module</code> is present, it should be the name of a module, and this module should contain a <code>tr</code> function that takes a pagename (and optionally a language code and script code) as arguments.
+ * If <code>translit</code> is present, it should be the name of a module, and this module should contain a <code>tr</code> function that takes a pagename (and optionally a language code and script code) as arguments.
  * If <code>sort_key</code> is a string, it should be the name of a module, and this module should contain a <code>makeSortKey</code> function that takes a pagename (and optionally a language code and script code) as arguments.
  * If <code>entry_name</code> or <code>sort_key</code> is a table and contains a field <code>remove_diacritics</code>, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation (<code>[^...]</code>).
Line 55: Line 60:
  [[Category:Data modules|*]]
  [[Category:Module unit tests|*]]
- [[Category:Modules dealing with languages and scripts]]
+ [[Category:Language and script modules]]
  [[Category:Maintenance modules]]
+ [[Category:Wiktionary modules]]
  </includeonly>

Revision as of 19:14, 18 September 2023

This module checks the validity and internal consistency of the language, language family, and script data used on Wiktionary: the modules in Category:Language data modules as well as Module:scripts/data.

Output

Lua error in package.lua at line 80: module 'Module:languages/data/3/i/extra' not found.

Checks performed

For multiple data modules:

  • Codes for languages, families and etymology-only languages must be unique and cannot clash with one another.
  • Canonical names for languages, families, and etymology-only languages must not be found in the list of other names.
  • Each name in the list of other names must appear only once.
  • otherNames, if present, must be an array.
  • Wikidata item IDs must be a positive integer or a string starting with Q and ending with decimal digits.
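
As an illustration of the last three checks, here is a minimal Lua sketch. The helper names `is_valid_wikidata_id` and `check_other_names` are hypothetical and are not taken from the module itself. The sketch also assumes that "starting with Q and ending with decimal digits" means `Q` followed by digits only, and it treats any table as an array.

```lua
-- Hypothetical helper: true if `id` is a positive integer
-- or a string such as "Q1860" (assumed form: Q followed by digits only).
local function is_valid_wikidata_id(id)
    if type(id) == "number" then
        return id > 0 and id % 1 == 0
    elseif type(id) == "string" then
        return id:find("^Q%d+$") ~= nil
    end
    return false
end

-- Hypothetical helper: returns a list of problems found in one entry's otherNames.
local function check_other_names(canonical_name, other_names)
    local problems = {}
    if other_names == nil then
        return problems -- otherNames is optional
    end
    if type(other_names) ~= "table" then
        problems[#problems + 1] = "otherNames is not an array"
        return problems
    end
    local seen = {}
    for _, name in ipairs(other_names) do
        if name == canonical_name then
            problems[#problems + 1] = "canonical name repeated in otherNames: " .. name
        end
        if seen[name] then
            problems[#problems + 1] = "duplicate other name: " .. name
        end
        seen[name] = true
    end
    return problems
end
```

Returning a list of problems rather than raising an error lets a caller report every violation in one pass.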

The following must be true of the data used by Module:languages:

  • Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
  • The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
  • If field 2 is not nil, it must be a valid Wikidata item ID.
  • If field 3 or family is given and not nil, it must be a valid family code.
  • If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
  • If family is given, it must be a valid family code.
  • If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
  • If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
  • If sort_key is given, it may either be a string, or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
  • If entry_name or sort_key is given, the from array must be at least as long as the to array.
  • If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
  • If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
  • If link_tr is present, it must be true.
  • There must be no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
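
Several of the per-language checks above can be sketched in a few lines of Lua. This is only a sketch under stated assumptions, not the module's actual code: the function names `is_valid_charset` and `check_language` are hypothetical, and the permitted-key list is copied verbatim from the last bullet. Pattern validity is tested by running `string.find` inside `pcall`, since Lua raises an error when it meets a malformed pattern.

```lua
-- Keys permitted in a language entry, copied from the list above.
local allowed_keys = {}
for _, key in ipairs({
    1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
    "varieties", "type", "scripts", "ancestors", "wikimedia_codes",
    "wikipedia_article", "standardChars", "translit", "override_translit",
    "link_tr",
}) do
    allowed_keys[key] = true
end

-- Hypothetical helper: true if `chars` forms a valid Lua pattern
-- when placed inside a negated set, "[^...]".
local function is_valid_charset(chars)
    return (pcall(string.find, "", "[^" .. chars .. "]"))
end

-- Hypothetical checker: returns a list of problems for one language entry.
local function check_language(code, data)
    local problems = {}
    local function fail(message)
        problems[#problems + 1] = code .. ": " .. message
    end

    for key in pairs(data) do
        if not allowed_keys[key] then
            fail("unrecognised key " .. tostring(key))
        end
    end

    -- The from array must be at least as long as the to array.
    for _, field in ipairs({"entry_name", "sort_key"}) do
        local value = data[field]
        if type(value) == "table" and value.from and value.to
            and #value.from < #value.to then
            fail(field .. ".from is shorter than " .. field .. ".to")
        end
    end

    if data.standardChars and not is_valid_charset(data.standardChars) then
        fail("standardChars does not form a valid pattern")
    end
    if data.override_translit and not data.translit then
        fail("override_translit is set without translit")
    end
    if data.link_tr ~= nil and data.link_tr ~= true then
        fail("link_tr must be true")
    end
    return problems
end
```

For example, an entry with `standardChars = "%"` would be reported, because `"[^%]"` escapes the closing bracket and leaves the set unterminated.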

Checks not performed:

  • If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
  • If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
  • If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).

These are not checked here because violations quickly surface as module errors in entries, provided that Module:utilities attempts to generate a sortkey for a category pertaining to the language in question, or that full_link attempts to use the transliteration module.

Module:languages/code to canonical name and Module:languages/canonical names must contain all the codes and canonical names found in the data submodules of Module:languages, and no more.
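
The "all and no more" requirement amounts to a two-way comparison. A minimal sketch follows, assuming (hypothetically) that the language data is available as a table mapping each code to its data table and that the lookup module is a table mapping each code to a canonical name. The actual layout of these modules is not described here, and `compare_lookup` is an illustrative name.

```lua
-- Hypothetical inputs:
--   languages    : code -> data table (canonical name in field 1)
--   code_to_name : code -> canonical name
local function compare_lookup(languages, code_to_name)
    local problems = {}
    -- Every language must appear in the lookup with the same canonical name.
    for code, data in pairs(languages) do
        if code_to_name[code] ~= data[1] then
            problems[#problems + 1] = "missing or wrong entry for " .. code
        end
    end
    -- The lookup must not contain codes that have no language data.
    for code in pairs(code_to_name) do
        if languages[code] == nil then
            problems[#problems + 1] = "extra entry for " .. code
        end
    end
    table.sort(problems) -- pairs() order is unspecified, so sort for stable output
    return problems
end
```

The same comparison, with keys and values swapped, would cover a lookup from canonical names to codes.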

The following must be true of the data used by Module:etymology languages:

  • canonicalName must be given.
  • parent must be given and must be a valid language, family or etymology-only language code.
  • If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
  • There must be no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
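
The parent check differs from the others in that the code may resolve against any of three data sets. A minimal sketch, assuming each data set is a table keyed by code; the function name and the sample codes in the usage note are hypothetical.

```lua
-- Hypothetical checker for one etymology-only language entry.
-- `languages`, `families` and `etym_languages` are assumed to map code -> data.
local function check_etymology_language(code, data, languages, families, etym_languages)
    local problems = {}
    if not data.canonicalName then
        problems[#problems + 1] = code .. ": canonicalName is missing"
    end
    local parent = data.parent
    if not parent then
        problems[#problems + 1] = code .. ": parent is missing"
    elseif not (languages[parent] or families[parent] or etym_languages[parent]) then
        problems[#problems + 1] = code .. ": parent " .. parent .. " is not a known code"
    end
    return problems
end
```

With `languages = {la = {"Latin"}}`, an entry whose `parent` is `"la"` passes, while one whose `parent` is an unknown code such as `"zzz"` is reported.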

Codes in Module:families data must:

  • Have canonicalName, which must not be the same as the canonical name of another family.
  • Have a valid family code in family, if family is given.
  • Have at least one language or subfamily belonging to it.
  • Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
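
Checking that a family has at least one member requires a pass over the language and family data first. The following is a sketch under the assumption that a language's family code is in field 3 or `family`, as in the checks on Module:languages above, and that a subfamily names its parent in `family`. The function name `find_empty_families` and the sample codes in the usage note are hypothetical.

```lua
-- Hypothetical helper: returns a sorted list of family codes with no members.
-- `families` and `languages` are assumed to map code -> data table.
local function find_empty_families(families, languages)
    local has_member = {}
    for _, data in pairs(languages) do
        local family = data[3] or data.family
        if family then
            has_member[family] = true
        end
    end
    -- A subfamily counts as a member of the family it names as its parent.
    for _, data in pairs(families) do
        if data.family then
            has_member[data.family] = true
        end
    end
    local empty = {}
    for code in pairs(families) do
        if not has_member[code] then
            empty[#empty + 1] = code
        end
    end
    table.sort(empty)
    return empty
end
```

Given a language in family `gem`, a family `gem` whose parent is `ine`, and an otherwise unreferenced family `zzz`, only `zzz` is returned.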

Codes in Module:scripts data must:

  • Have canonicalName.
  • Have at least one language that lists it as one of its scripts.
  • Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
  • Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".