Project:Adding Languages
Adding Languages to the DJP Wiki
Adding the language code
Each language added to this wiki needs a unique four-character code. This is how the language is referenced in many templates, as well as in some special pages. These codes are analogous to the two- and three-letter codes used in Wiktionary and elsewhere on the internet. Usually the code is the first four letters of the language's name, but that is not a strict rule. For example, the code for High Valyrian is hval.
Once the code is determined, it needs to be added to the language list at Module:Languages/data/djp. This page is solely for languages documented by The Languages of David J. Peterson. Each entry has to follow a specific format, with the code at the top followed by a number of values. The detailed documentation on what can go in a language entry can be found on the page itself; what follows is a summary.
Here's an example:
m["gvun"] = {
    "G'Vunna",
    "Q999999018",
    "atha",
    "Gvoz,Latn",
    ancestors = {"veda"},
    case_insensitive = true,
    sort_key = {
        from = {"[äàáâå]", "Ǝ", "[ëèéêǝ]", "[ïìíî]", "[öòóô]", "[üùúû]"},
        to   = {"a",       "E", "e",       "i",      "o",      "u"}},
    standardChars = s["default-chars"] .. "ÖöÜüƎǝ" .. c.punc,
}
Don't forget the commas after each line in the entry!
Each entry has four unlabeled values, which must appear in this order:
- Canonical name (required). This is the 'official' name of the language. It should be in quotes.
- Wikidata item ID (required). The item ID is a leftover from the Wiktionary code we borrowed, so you don't need to find a real one, but something must be there. You can just put nil.
- Language family (optional). This is a value from one of the pages in Module:Families/data or Module:Families/data/djp. If this doesn't apply, you can put nil or "x".
- Scripts (optional). A list of scripts that apply to the language, using the scripts in Module:Scripts/data or Module:Scripts/data/djp. The whole list goes in a single pair of quotes. If there is more than one script, separate them with commas. If you do include this, you should list at least Latn for the Latin/Roman alphabet, and make sure you put something in the language family position before it, even if it is nil.
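As a minimal sketch of the four unlabeled values, here is an entry for a made-up language. The code "exam" and the name "Examplish" are hypothetical and exist only for illustration; the local m table stands in for the one the data page already defines.

```lua
-- Stand-in for the table the data page already provides.
local m = {}

-- "exam" / "Examplish" is a hypothetical language, not a real entry.
m["exam"] = {
    "Examplish",  -- 1. canonical name, in quotes
    nil,          -- 2. Wikidata item ID: not needed here, but the slot must be filled
    nil,          -- 3. language family: none, so nil keeps the position
    "Latn",       -- 4. scripts: one quoted string, comma-separated if several
}

print(m["exam"][1], m["exam"][4])
```

Because the values are positional, the two nil placeholders are what keep "Latn" in the fourth slot.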
After these come a number of named values, all of them optional. Each consists of the name of the value, an equals sign (=), and the desired value. They can be added in any order after the four unlabeled ones.
Common labeled values are:
- ancestors. Identifies the ancestor languages, by language code. The codes are listed as separate values in a table.
- For example:
ancestors = {"veda","en","etc"},
- standardChars. These are the characters that are considered standard for the romanization of the language. Any character in a word that isn't among the language's standard characters gets called out into a category. In many entries, you'll see several things strung together with .., which joins multiple strings into one. You can use s["default-chars"] to mean the usual 26 English letters (a-z and A-Z); it also includes the numerals (0-9). Put other characters in a string inside quotes. You'll usually see c.punc at the end, to signify that standard punctuation isn't supposed to be called out.
- case_insensitive. If this is true, the case doesn't matter. This is also the default value if it isn't included.
- sort_key. This is used to equate certain characters with others when sorting in lists and categories, for example if you want versions of 'e' with diacritics to be sorted with the plain e. Look at the existing entries in the data page and the detailed instructions for more on that.
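The labeled values can be sketched on the same kind of made-up entry. Everything here is an assumption for illustration: the language "exam"/"Examplish", the contents of the stand-in s and c tables, and the helper apply_sort_key, which is not the wiki's actual sorting code but a plain-Lua approximation of how the from and to lists pair up.

```lua
-- Stand-ins for helper tables defined on the real data page; their actual
-- contents on the wiki may differ.
local s = { ["default-chars"] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" }
local c = { punc = " '-.,?!" }
local m = {}

-- Hypothetical entry; the ancestor is chosen only to show the syntax.
m["exam"] = {
    "Examplish",
    nil,
    nil,
    "Latn",
    ancestors = {"hval"},
    sort_key = {
        from = {"[äàáâ]", "[ëèéêǝ]"},
        to   = {"a",      "e"},
    },
    -- .. joins the default characters, the extra letters, and punctuation.
    standardChars = s["default-chars"] .. "äëǝ" .. c.punc,
}

-- Hypothetical helper: replace every character listed in from[i] with to[i].
local function apply_sort_key(text, sort_key)
    for i, pattern in ipairs(sort_key.from) do
        -- Strip the surrounding [ ] of a character class, if present.
        local class = pattern:match("^%[(.*)%]$") or pattern
        -- Walk the class one UTF-8 character at a time.
        for ch in class:gmatch("[\1-\127\194-\244][\128-\191]*") do
            local escaped = ch:gsub("%p", "%%%0")
            text = text:gsub(escaped, sort_key.to[i])
        end
    end
    return text
end

print(apply_sort_key("ëxǝm", m["exam"].sort_key))  -- prints exem
```

Each position in from lines up with the same position in to, so the two lists should always have the same length.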
Adding extended information to the language
Additional information on the language that is accessed less often is kept on a separate page. For the languages documented on this wiki, that page is Module:Languages/data/djp/extra. This information is optional: if the language doesn't need it, you don't have to add anything to that page.
Here's an example from the same language:
m["gvun"] = {
    aliases = {"G'Vunnǝ", "Lokheim", "Gvunna", "Gvunnǝ"},
}
Right now, we only use the labeled value aliases, which holds other known names for the language. As seen above, it is a table of quoted strings.
Adding Scripts
If there are scripts associated with the language that have not been added already, those should be added now. See Project:Adding Scripts for details.