Contribute/High Valyrian

From The Languages of David J. Peterson

Things to do

Tasks on the High Valyrian section of the wiki that could use a hand:

Adding pages for inflected word forms

I don't know any coding so this is the solution I came up with. It's very much cobbled together and I'm aware that it's rather convoluted.
-- Juelos (talk) 01:59, 26 September 2022 (PDT)

In short, different spreadsheets are used to generate the pages, and pywikibot (a Python tool for interacting with MediaWiki) is used to add them to the wiki. Here is the spreadsheet used to generate pages for inflected noun forms, for example. Spreadsheets for other paradigms can be provided upon request to User:Juelos. Each tab corresponds to a paradigm or a subtype of a paradigm. If you haven't made or got a file for a given (sub)paradigm, you must modify the number of cells and/or their content to suit it. Each cell corresponds to a wiki page for an inflected word form. Each cell, i.e. each wiki page, must begin with {{-start-}} and end with {{-stop-}} in order for it to work with pywikibot later. For each citation form of a word, at least one principal part (the stem) must be provided, and sometimes more if the word is irregular. How you get these is up to you, but it's easiest if you simply have a list that you can copy.
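For anyone who would rather script this step than use a spreadsheet, here is a minimal Python sketch of what each cell has to contain. It is not the spreadsheet's actual formula. The function names, the stem and the endings are hypothetical placeholders, and the bold title on the first line assumes pywikibot's default title markers (''' ... ''') for pagefromfile.

```python
START = "{{-start-}}"
STOP = "{{-stop-}}"


def make_page(title, body):
    """Wrap one page in the markers that pagefromfile looks for."""
    # The page name goes first, in bold, so that pagefromfile can read
    # the title from the text (assumes the default title markers).
    return f"{START}\n'''{title}'''\n{body}\n{STOP}\n"


def make_pages(stem, endings):
    """Build one page per ending; `endings` maps an ending to its page body."""
    return [make_page(stem + ending, body) for ending, body in endings.items()]


# Placeholder data only, not real High Valyrian forms.
pages = make_pages("stem", {"a": "body A", "o": "body O"})
```

Joining the resulting strings and writing them to a text file gives the same kind of document that the copy-and-paste from the spreadsheet produces.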

You then copy the whole range of cells into a text document, or just one column if you have concatenated each row into one cell. I use Notepad++ to edit the text document, mainly for its regex (regular expression) search and replace features and its ability to highlight and copy specific parts of the document, which I use a lot in the following steps. Once the text document contains the pages for the inflected forms, you must remove the tab characters and quotation marks, which are artifacts from Excel. This is done with a regex search for [\t"], replaced with nothing.
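The same clean-up can be done outside Notepad++. The sketch below applies the regex from the step above with Python's re module; the function name is my own. Note that, like the Notepad++ replace, it removes every double quotation mark, including any that were meant to be part of a page.

```python
import re


def strip_excel_artifacts(text):
    """Remove the tab characters and double quotes that Excel adds on copy."""
    return re.sub(r'[\t"]', "", text)


cleaned = strip_excel_artifacts('"{{-start-}}\t\'\'\'abc\'\'\'"')
```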

The next step is to check which of your generated pages already exist on the wiki. However, due to the limits of pywikibot, the check can only be made against one category. I have yet to come up with a solution for this, so some manual checking may be needed, especially for short words. Note that in your file with all the generated pages, there should not be two or more pages with identical names, since then only one (probably the last) will be added. This should not be a problem unless you have very similar words in the same paradigm.

If you do have such words, add their pages of inflected forms in separate sessions/pywikibot commands in order to avoid this. The category you will most likely want to check against is "High Valyrian terms with IPA pronunciation", "High Valyrian lemmas" or "High Valyrian non-lemma forms".
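The identical-name problem mentioned above can be caught before anything is uploaded. This is a small sketch, with a function name of my own choosing, that takes a list of generated page names and reports any that occur more than once.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the page names that appear more than once, sorted."""
    # Ignore surrounding whitespace and skip empty lines.
    counts = Counter(name.strip() for name in names if name.strip())
    return sorted(name for name, n in counts.items() if n > 1)


duplicates = find_duplicate_names(["a", "b", "a ", " "])
```

If the result is not empty, those are the words whose pages should go in separate sessions.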

To check against existing pages, highlight the page names you generated, copy them and only them into a separate file, and save that file. You will also need to have pywikibot installed. You will have to save your password, or a bot password (safer), for your account in a login file (you can search for pywikibot tutorials; there are many and they are very detailed). Then, in the command prompt, cd to the folder where you save your files and run pywikibot, for example with cd pywikibot, followed by pwb.py login.
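Instead of highlighting and copying the page names by hand, they could be pulled out with a short script. This sketch assumes each page starts with {{-start-}} followed by the page name in bold (''' ... ''') on one line; the function names are hypothetical, and the file names are the ones used in the commands on this page.

```python
import re
from pathlib import Path

# A page name is the bold text right after the start marker.
TITLE_PATTERN = re.compile(r"\{\{-start-\}\}\s*'''(.+?)'''")


def extract_page_names(text):
    """Return the page names in the order they appear in the text."""
    return TITLE_PATTERN.findall(text)


def write_page_names(pages_file, names_file):
    """Read the generated pages and write one page name per line."""
    text = Path(pages_file).read_text(encoding="utf-8")
    names = extract_page_names(text)
    Path(names_file).write_text("\n".join(names) + "\n", encoding="utf-8")
    return len(names)
```

Calling write_page_names("file_with_all_generated_pages.txt", "file_with_generated_page_names.txt") would then produce the file of names, one per line.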

Then you run the command that does the checking and generates the intersection of the pages you've generated and the pages already on the wiki. The command I use is pwb.py listpages -format:3 -intersect -cat:"High Valyrian terms with IPA pronunciation" -file:file_with_generated_page_names.txt. This will give you a list of the pages that already exist. I then paste these into another spreadsheet in order to get a search term to use on the first file with all the pages, to highlight the matching pages and copy them into a different text document. Then you run a second replace to replace the new lines (in the attached file). This will give you commands to use with pywikibot in the command prompt, which add the part of the generated pages after the pronunciation section to the existing pages.
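As a possible alternative to building a search term in a second spreadsheet, the generated pages could be sorted by script into those that already exist and those that are new. This is only a sketch, not the method described above: the function name is hypothetical, it assumes the names from listpages come one per line, and it compares names as exact strings.

```python
import re

# One page runs from a start marker to the next stop marker.
PAGE_PATTERN = re.compile(r"\{\{-start-\}\}.*?\{\{-stop-\}\}", re.DOTALL)
TITLE_PATTERN = re.compile(r"'''(.+?)'''")


def split_pages(all_pages_text, existing_names):
    """Split generated pages into (already on the wiki, new)."""
    existing = {name.strip() for name in existing_names if name.strip()}
    already_on_wiki, new = [], []
    for block in PAGE_PATTERN.findall(all_pages_text):
        match = TITLE_PATTERN.search(block)
        if match is None:
            continue  # no bold page name found, so skip this block
        if match.group(1) in existing:
            already_on_wiki.append(block)
        else:
            new.append(block)
    return already_on_wiki, new
```

The first list holds the pages whose text has to be appended to existing pages; the second holds the pages that can be created as they are.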

You'll have to remove the final \n, which is an artifact of the process, and replace it with a new line. When you paste these commands into the command prompt, they will execute directly and add the text to the existing pages, so make sure they are correct. When these partial pages have been appended to existing pages (they will only look right if the existing page is a High Valyrian term), you can do the last step, which is actually the easiest: adding the rest of the newly generated pages to the wiki. In the command prompt, enter pwb.py pagefromfile -showdiff -notitle -summary:"Created page" -file:file_with_all_generated_pages.txt. The pages that already exist, which you dealt with in the previous step, will not be a problem; they will simply not be added. Let the command run and your pages will be added. Then repeat the process for the next batch of inflected form pages. If you are sure there are no identical page names among them, you can speed up the process by adding inflected form pages for words in several (sub)paradigms at once.