Contribute/High Valyrian: Difference between revisions

From The Languages of David J. Peterson
 
(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Things to do==
Tasks on the High Valyrian section of the wiki that could use a hand are
Tasks on the High Valyrian section of the wiki that could use a hand are:
 
* adding recently publicized (i.e. "new") words to [[High Valyrian Dictionary|the dictionary page]] and [[English-High Valyrian Dictionary|the English-High Valyrian dictionary page]],
* adding pages for recently publicized (i.e. "new") words (see [[:Category:High Valyrian lemmas]] for examples),
* adding pages for new words (see [[:Category:High Valyrian lemmas]] for examples),
* adding new words to [[High Valyrian Dictionary|the dictionary page]] and [[English-High Valyrian Dictionary|the English-High Valyrian dictionary page]],
* adding new senses of words to existing word pages and to the dictionary,
* adding pages for inflected forms of new words (see below, and see [[:Category:High Valyrian non-lemma forms]] for examples),
* adding new senses to the dictionary and existing word pages,
* adding the dialogue from episodes of ''[[House of the Dragon]]'' to [[:Category:House_of_the_Dragon_dialogue|the dialogue pages]],
* adding examples from Duolingo, the dialogue and other official sources to word pages (see [[Template:HVexp|the template documentation]] for guidelines; a general guiding principle for now can be to only add examples to lemmas i.e. citation forms),
* downloading audio from [https://dedalvs.com/work/game-of-thrones/ DJP's work folder], editing it (i.e. remove slow High Valyrian and English parts) and uploading it to the wiki and then adding links to the audio to the dialogue pages, as well as to examples and pronunciation sections in word entries (see [[:Category:High Valyrian terms with audio links]] for examples),  
* adding words to [[Rhymes:High_Valyrian|the appropriate Rhyme page]] and/or creating new ones,
* adding the dialogue from upcoming episodes of the first season of House of the Dragon to [[Season_1_House_of_the_Dragon_Dialogue|the dialogue page]],
* adding [[:Category:Hval:All topics|topic]]/[[:Category:Hval:All sets|set]] categories to existing pages using [[Template:c|<code><nowiki>{{c|hval|...}}</nowiki></code>]], and
* adding examples from the new dialogue to word pages (see [[Template:HVexp|the template documentation]] for guidelines; a general guiding principle for now can be to only add examples to lemmas i.e. citation forms),
* downloading audio from [https://dedalvs.com/work/game-of-thrones/ DJP's work folder], editing it (i.e. remove slow HV and English parts) and uploading it to the wiki and then adding links to the audio to the dialogue pages, as well as to examples and pronunciation sections in word entries (see [[:Category:High Valyrian terms with audio links]] for examples),
* adding topic categories to existing pages, and
* proofreading existing pages and correcting any errors you find.


===Adding pages for inflected word forms===
''By [[User:Juelos]]''


:''I don't know any coding so this is the solution I came up with. It's very much cobbled together and I'm aware that it's rather convoluted.
There are two main components I use for the adding of pages for inflected forms: several different spreadsheets, based on the same basic principle, to generate the pages, and Pywikibot, a Python tool for interacting with MediaWiki, to add them to the wiki. Here are [https://drive.google.com/drive/folders/1_7moscyZ8JnTFfKezrmOdIyx7xm7C9dx?usp=sharing the spreadsheets]. Each tab corresponds to some paradigm or subtype of a paradigm. You must modify the number of cells and/or their content for each paradigm if you haven't made or got a file for that purpose.  Each cell corresponds to a wiki page for an inflected word form. Each cell i.e. wiki page must begin with <code><nowiki>{{-start-}}</nowiki></code> and end with <code><nowiki>{{-stop-}}</nowiki></code>, in order for it to work with Pywikibot later. For each citation form of a word, at least one principal part (the stem) must be provided, and sometimes more if the word is somehow irregular. How you get these is up to you, but it's easiest if you simply have a list that you can copy.  
::''-- [[User:Juelos|Juelos]] ([[User talk:Juelos|talk]]) 01:59, 26 September 2022 (PDT)''
 
Different spreadsheets are used to generate the pages, and pywikibot (a Python tool for interacting with MediaWiki) is used to add them to the wiki, in short.  
Here is [https://docs.google.com/spreadsheets/d/1VGta9Av6fJPAS1AUqdGJ05uPfyT1h2qR/edit?usp=sharing&ouid=116450018339793751999&rtpof=true&sd=true the spreadsheet] used to generate pages for inflected noun forms, for example. Spreadsheets for other paradigms can be provided upon request to [[User:Juelos]]. Each tab corresponds to some paradigm or subtype of a paradigm. You must modify the number of cells and/or their content for each (sub)paradigm if you haven't made or got a file for that purpose.  Each cell corresponds to a wiki page for an inflected word form. Each cell i.e. wiki page must begin with <code><nowiki>{{-start-}}</nowiki></code> and end with <code><nowiki>{{-stop-}}</nowiki></code>, in order for it to work with pywikibot later. For each citation form of a word, at least one principal part (the stem) must be provided, and sometimes more if the word is somehow irregular. How you get these is up to you, but it's easiest if you simply have a list that you can copy.  


You then copy the whole range of cells into a text document, or just one column if you have concatenated each row into one cell. For the text document, I use Notepad++ to edit it. The main reason for this is the regex (regular expressions) search and replace features it has, as well as the ability to highlight and copy specific parts of the document, which I use a lot in the following steps. When you have the pages for inflected forms you wish to edit, you must remove tab characters and quotation marks, which are artifacts from Excel. This is simply done with a regex search for <code><nowiki>[\t"]</nowiki></code>, replacing every match with nothing.  


The next step will be to check which of your generated pages already exist on the wiki. However, due to the limits of pywikibot, it can only check against one category. I have yet to come up with a solution for this, so there may need to be some manual checking, especially for short words. Note that in your file with all generated pages, there should not be two or more pages with identical names, since then only one (probably the last) will be added. This should not be a problem unless you have very similar words in the same paradigm.
The next step will be to check which of your generated pages already exist on the wiki. However, due to the limits of Pywikibot, it can only check against one category. I have yet to come up with a solution for this, so there may need to be some manual checking, especially for short words. Note that in your file with all the generated pages, there should not be two or more pages with identical names, since then only one (probably the last) will be added. This should not be a problem unless you have very similar words in the same paradigm.


You should then add the pages of inflected forms for such words in separate sessions/pywikibot commands, in order to avoid this. The category you will most likely want to check against is "High Valyrian terms with IPA pronunciation", or "High Valyrian lemmas" or "High Valyrian non-lemma forms".
You should then add the pages of inflected forms for such words in separate sessions/Pywikibot commands, in order to avoid this. The category you will most likely want to check against is "High Valyrian terms with IPA pronunciation", or "High Valyrian lemmas" or "High Valyrian non-lemma forms".


For the checking against existing pages, you'll want to highlight the page names you generated and copy them and only them into a separate file and save that file. Then you'll want to have installed pywikibot. You will have to save your password or a bot password (safer) for your account in a login file (you can google pywikibot tutorials, there are very many and they are very detailed). Then in the command prompt, you <code><nowiki>cd</nowiki></code> to the folder where you save your files and run pywikibot, for example with <code><nowiki>cd pywikibot</nowiki></code>, followed by <code><nowiki>pwb.py login</nowiki></code>.
For the checking against existing pages, you'll want to highlight the page names you generated and copy them and only them into a separate file and save that file. Then you'll want to have installed Pywikibot. You will have to save your password or a bot password (safer) for your account in a login file (you can google Pywikibot tutorials, there are very many and they are very detailed). Then in the command prompt, you <code><nowiki>cd</nowiki></code> to the folder where you save your files and run Pywikibot, for example with <code><nowiki>cd pywikibot</nowiki></code>, followed by <code><nowiki>pwb.py login</nowiki></code>.


Then you run the command that does the checking and generates the intersection of the pages you've generated and pages already on the wiki. The command I use is <code><nowiki>pwb.py listpages -format:3 -intersect -cat:"High Valyrian terms with IPA pronunciation" -file:file_with_generated_page_names.txt</nowiki></code>. This will give you a list of pages that already exist. I then paste these in [https://docs.google.com/spreadsheets/d/1VC5wQGXsGMPq6d0WpQhA5YkfM3HKOiNJ/edit?usp=sharing&ouid=116450018339793751999&rtpof=true&sd=true another spreadsheet] in order to get a search term to use on the first file with all the pages, to highlight them and copy them into a different text document. Then you run a second replace to replace the new lines (in the attached file). This will give you commands to use with pywikibot in the command prompt that add the part of the generated pages after the pronunciation section to the existing pages.  
Then you run the command that does the checking and generates the intersection of the pages you've generated and pages already on the wiki. The command I use is <code><nowiki>pwb.py listpages -format:3 -intersect -cat:"High Valyrian terms with IPA pronunciation" -file:file_with_generated_page_names.txt</nowiki></code>. This will give you a list of pages that already exist. I then paste these in [https://docs.google.com/spreadsheets/d/1_gjZw59P7pnhOgWGWseO5pX69ZH5SUEs/edit?usp=sharing&ouid=116450018339793751999&rtpof=true&sd=true another spreadsheet] in order to get a search term to use on the first file with all the pages, to highlight them and copy them into a different text document. Then you run a second replace to replace the new lines (in the attached file). This will give you commands to use with Pywikibot in the command prompt that add the part of the generated pages after the pronunciation section to the existing pages.  


You'll have to remove the final <code><nowiki>\n</nowiki></code> (an artifact of the process) and replace it with a new line. When you paste these into the command prompt, they will execute directly to add the text to existing pages, so make sure they are correct.  
When these partial pages have been appended to existing pages (they will only look right if the existing page is a High Valyrian term), you can do the last step, which is actually the easiest. This is simply adding the rest of the newly generated pages to the wiki. In the command prompt, enter <code><nowiki>pwb.py pagefromfile -showdiff -notitle -summary:"Created page" -file:file_with_all_generated_pages.txt</nowiki></code>. The pages that already exist that you dealt with in the previous step will not be a problem; they will simply not be added. Then you simply let that run and your pages will be added.  
Then repeat the process for the next batch of inflected form pages. If you are sure there are no identical page names among them, you can speed up the process by adding inflected form pages for words in several (sub)paradigms at once.
Then repeat the process for the next batch of inflected form pages. If you are sure there are no identical page names among them, you can speed up the process by adding inflected form pages for words in several paradigms at once.
 
:''I don't know any coding so this is the solution I came up with. It's very much cobbled together and I'm aware that it's rather convoluted.
::''-- [[User:Juelos|Juelos]] ([[User talk:Juelos|talk]]) 01:59, 26 September 2022 (PDT)''
 
[[Category:High Valyrian language]]

Latest revision as of 22:19, 19 April 2024

Things to do

Tasks on the High Valyrian section of the wiki that could use a hand are:

Adding pages for inflected word forms

By User:Juelos

There are two main components I use for the adding of pages for inflected forms: several different spreadsheets, based on the same basic principle, to generate the pages, and Pywikibot, a Python tool for interacting with MediaWiki, to add them to the wiki. Here are the spreadsheets. Each tab corresponds to some paradigm or subtype of a paradigm. You must modify the number of cells and/or their content for each paradigm if you haven't made or got a file for that purpose. Each cell corresponds to a wiki page for an inflected word form. Each cell i.e. wiki page must begin with {{-start-}} and end with {{-stop-}}, in order for it to work with Pywikibot later. For each citation form of a word, at least one principal part (the stem) must be provided, and sometimes more if the word is somehow irregular. How you get these is up to you, but it's easiest if you simply have a list that you can copy.
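The spreadsheet step above could also be sketched in Python. This is only a rough illustration and not the author's method: the stem, endings and body text are hypothetical placeholders (not real High Valyrian data), and it assumes the page title is given in bold (<code>'''...'''</code>) on the first line of each page.

```python
from string import Template

START, STOP = "{{-start-}}", "{{-stop-}}"


def wrap_page(title: str, body: str) -> str:
    """Wrap one page in the start/stop markers, with the title in bold first."""
    return f"{START}\n'''{title}'''\n{body}\n{STOP}"


def build_pages(stem: str, endings: dict, body_template: str) -> str:
    """Build one wrapped page per ending; $form and $label are filled in."""
    template = Template(body_template)
    pages = []
    for ending, label in endings.items():
        title = stem + ending
        # safe_substitute leaves unknown $-placeholders and wiki braces alone
        body = template.safe_substitute(form=title, label=label)
        pages.append(wrap_page(title, body))
    return "\n".join(pages)


# Hypothetical placeholder data, for illustration only
demo = build_pages(
    "abc",
    {"o": "form 1", "i": "form 2"},
    "Placeholder text for $form ($label).",
)
print(demo)
```

The printed text can be saved to a file and used in the later steps in the same way as the text copied from the spreadsheet.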

You then copy the whole range of cells into a text document, or just one column if you have concatenated each row into one cell. For the text document, I use Notepad++ to edit it. The main reason for this is the regex (regular expressions) search and replace features it has, as well as the ability to highlight and copy specific parts of the document, which I use a lot in the following steps. When you have the pages for inflected forms you wish to edit, you must remove tab characters and quotation marks, which are artifacts from Excel. This is simply done with a regex search for [\t"], replacing every match with nothing.
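For anyone who prefers a script to Notepad++, the same clean-up can be sketched in Python with the same regex. The function name is made up for illustration. Like the search and replace above, it removes every double quote, including any that belong to the page text itself.

```python
import re


def strip_excel_artifacts(text: str) -> str:
    """Remove tab characters and double quotes, as the [\\t"] regex does."""
    return re.sub(r'[\t"]', "", text)


sample = "\"{{-start-}}\t'''abco'''\t{{-stop-}}\""
print(strip_excel_artifacts(sample))
```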

The next step will be to check which of your generated pages already exist on the wiki. However, due to the limits of Pywikibot, it can only check against one category. I have yet to come up with a solution for this, so there may need to be some manual checking, especially for short words. Note that in your file with all the generated pages, there should not be two or more pages with identical names, since then only one (probably the last) will be added. This should not be a problem unless you have very similar words in the same paradigm.
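A quick way to spot identical page names before uploading might look like the sketch below. It assumes each page sits between {{-start-}} and {{-stop-}} and that the page name is the first text in bold (<code>'''...'''</code>) inside the page. If your file marks titles differently, the title pattern needs adjusting. The sample titles are placeholders.

```python
import re
from collections import Counter

PAGE_RE = re.compile(r"\{\{-start-\}\}(.*?)\{\{-stop-\}\}", re.DOTALL)
TITLE_RE = re.compile(r"'''(.*?)'''")


def page_titles(text: str) -> list:
    """Return the first bold title of every start/stop-delimited page."""
    titles = []
    for page in PAGE_RE.finditer(text):
        title = TITLE_RE.search(page.group(1))
        if title:
            titles.append(title.group(1))
    return titles


def duplicate_titles(text: str) -> list:
    """Return the titles that occur more than once, sorted alphabetically."""
    counts = Counter(page_titles(text))
    return sorted(t for t, n in counts.items() if n > 1)


sample = (
    "{{-start-}}\n'''abco'''\nfirst\n{{-stop-}}\n"
    "{{-start-}}\n'''abci'''\nsecond\n{{-stop-}}\n"
    "{{-start-}}\n'''abco'''\nthird\n{{-stop-}}\n"
)
print(duplicate_titles(sample))
```

Any title this reports should be moved to a separate file and added in a separate session.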

You should then add the pages of inflected forms for such words in separate sessions/Pywikibot commands, in order to avoid this. The category you will most likely want to check against is "High Valyrian terms with IPA pronunciation", or "High Valyrian lemmas" or "High Valyrian non-lemma forms".

For the checking against existing pages, you'll want to highlight the page names you generated and copy them and only them into a separate file and save that file. Then you'll want to have installed Pywikibot. You will have to save your password or a bot password (safer) for your account in a login file (you can google Pywikibot tutorials, there are very many and they are very detailed). Then in the command prompt, you cd to the folder where you save your files and run Pywikibot, for example with cd pywikibot, followed by pwb.py login.

Then you run the command that does the checking and generates the intersection of the pages you've generated and pages already on the wiki. The command I use is pwb.py listpages -format:3 -intersect -cat:"High Valyrian terms with IPA pronunciation" -file:file_with_generated_page_names.txt. This will give you a list of pages that already exist. I then paste these in another spreadsheet in order to get a search term to use on the first file with all the pages, to highlight them and copy them into a different text document. Then you run a second replace to replace the new lines (in the attached file). This will give you commands to use with Pywikibot in the command prompt that add the part of the generated pages after the pronunciation section to the existing pages.
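One possible way around the one-category limit is to save a plain list of page names for each category and do the intersection offline. The sketch below assumes such lists already exist as text files with one page name per line. How they are produced is left open, and the function names are hypothetical.

```python
def read_titles(path: str) -> set:
    """Read one page name per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def already_existing(generated, *category_lists) -> list:
    """Return the generated names found in at least one of the category lists."""
    existing = set().union(*category_lists)
    return sorted(set(generated) & existing)


# Placeholder names, for illustration only
generated = ["abco", "abci", "abcu"]
lemmas = {"abco"}
non_lemmas = {"abcu", "zzz"}
print(already_existing(generated, lemmas, non_lemmas))
```

With real files, the sets would come from calls such as <code>read_titles("file_with_generated_page_names.txt")</code>.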

You'll have to remove the final \n (an artifact of the process) and replace it with a new line. When you paste these into the command prompt, they will execute directly to add the text to existing pages, so make sure they are correct. When these partial pages have been appended to existing pages (they will only look right if the existing page is a High Valyrian term), you can do the last step, which is actually the easiest. This is simply adding the rest of the newly generated pages to the wiki. In the command prompt, enter pwb.py pagefromfile -showdiff -notitle -summary:"Created page" -file:file_with_all_generated_pages.txt. The pages that already exist that you dealt with in the previous step will not be a problem; they will simply not be added. Then you simply let that run and your pages will be added. Then repeat the process for the next batch of inflected form pages. If you are sure there are no identical page names among them, you can speed up the process by adding inflected form pages for words in several paradigms at once.

I don't know any coding so this is the solution I came up with. It's very much cobbled together and I'm aware that it's rather convoluted.
-- Juelos (talk) 01:59, 26 September 2022 (PDT)