Wikidata:Bot requests/Archive/2018/11


Fixing URLs in Sources

Request date: 12 September 2018, by: MichaelSchoenitzer

Task description

After an update of the software hosting the site, all the links to tags in the GNOME GitLab no longer work. They are used heavily in sources for version numbers. Can someone update the 240 links with a bot/script? From

https://git.gnome.org/browse/([^/]*)/tag/?h=(.*)

to

https://gitlab.gnome.org/GNOME/$1/tags/$2
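
A minimal sketch of how this substitution could be scripted, assuming the pattern is applied to whole URL strings. The function name `rewrite_gnome_tag_url` is hypothetical. Note one assumption about the pattern above: the `?` in `/tag/?h=` is taken to be the literal start of the query string, so it is escaped here (unescaped, `/?` would mean "optional slash" in a regular expression).

```python
import re

# Pattern from the request, with the literal "?" of the query string escaped
# and the dots in the host name escaped as well.
OLD_TAG_URL = re.compile(r"^https://git\.gnome\.org/browse/([^/]*)/tag/\?h=(.*)$")


def rewrite_gnome_tag_url(url: str) -> str:
    """Return the gitlab.gnome.org form of an old tag URL.

    URLs that do not match the old pattern are returned unchanged.
    """
    return OLD_TAG_URL.sub(r"https://gitlab.gnome.org/GNOME/\1/tags/\2", url)


if __name__ == "__main__":
    print(rewrite_gnome_tag_url("https://git.gnome.org/browse/gedit/tag/?h=3.30.0"))
```

Writing the changed reference URLs back to Wikidata is a separate step and is not covered by this sketch.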

Here's a query searching for the cases:

select ?item ?st ?url where {
  ?item p:P348 ?st.
  ?st prov:wasDerivedFrom ?src.
  ?src pr:P854 ?url.
  FILTER CONTAINS(STR(?url), "https://git.gnome.org/browse")
  }
Discussion
Request process

Task completed (00:27, 2 November 2018 (UTC))

This section was archived on a request by: MichaelSchoenitzer (talk) 00:27, 2 November 2018 (UTC)

Import SL-language items into the WikiData medical repository

Request process

Request date: 26 November 2018, by: Vitosmo

Link to discussions justifying the request

https://sl.wikipedia.org/wiki/Uporabni%C5%A1ki_pogovor:Vitosmo#Bot_Request

Task description

A flat file of items localized for the SL language is to be imported into the Wikidata repository.

Licence of data to import (if relevant)
Discussion

request by Vitosmo (talk) 12:46, 26 November 2018 (UTC)

I've created a request for permission.

Two notes about the table:

  • There are two nearly empty lines only containing the string sl.
  • A few names contain parentheses. Normally text in parentheses is used to distinguish different pages with the same title but this is not necessary in Wikidata because there are descriptions.

Do you want to update the table?

--Pyfisch (talk) 16:24, 1 December 2018 (UTC)

I'll take care of the two questions by tomorrow at the latest. Sorry for the delay
Vitosmo (talk) 21:43, 14 December 2018 (UTC)
Pyfisch: corrected the two errors indicated - thanks, and off we go (g)
Pinky sl: Passed the texts through the spellcheck/Besana strainer
Vitosmo (talk) 10:29, 15 December 2018 (UTC)

Should the old sl labels be set as an alias? --Pyfisch (talk) 12:38, 7 December 2018 (UTC)

I work with Vitosmo on this request. I am also an admin on sl wiki. You can set old sl labels as alias. --Pinky sl (talk) 07:37, 8 December 2018 (UTC)
Request process
FischBot 6

Task completed (19:19, 18 December 2018 (UTC))

This section was archived on a request by: Pyfisch (talk) 19:19, 18 December 2018 (UTC)

Move German-language descriptions from the English to the German description field

Special:Search/Beruf/Funktion seems to find a lot. Sample edit: [1] --- Jura 16:07, 2 November 2018 (UTC)


{{Section resolved|Manually moved 9 entries. Can't find more German descriptions in the English field. Pyfisch (talk) 17:32, 1 December 2018 (UTC)}}

  • @Pyfisch: did you click on the search link ? It currently gives 15,607 results and any I clicked on aren't in English. Maybe you need to change your interface language to English --- Jura 05:52, 4 December 2018 (UTC)
Interesting. I searched with "Pages in this language: English" which only shows a single result. But if I switch the interface language to English I get a whole lot of entities. Thanks for the SPARQL query! I am not sure though if we want to move all these descriptions to German because they are rather long, unlike other descriptions that serve to disambiguate between persons of the same name. --Pyfisch (talk) 09:53, 4 December 2018 (UTC)
Feel free to improve them, but we surely don't want German text in the English description field. --- Jura 09:58, 4 December 2018 (UTC)
I am now moving descriptions from English to German. (details) --Pyfisch (talk) 09:27, 11 December 2018 (UTC)
(more) --Pyfisch (talk) 11:18, 11 December 2018 (UTC)
(more) --Pyfisch (talk) 13:37, 11 December 2018 (UTC)
(last one) I should have fixed most (all). @Jura1: If you have a query that produces more items, please tell me.--Pyfisch (talk) 19:00, 11 December 2018 (UTC)
Thanks. Seems mostly done. Searching for "Konfession" finds a few more. --- Jura 16:50, 12 December 2018 (UTC)
"Konfession" done. --Pyfisch (talk) 21:24, 14 December 2018 (UTC)
This section was archived on a request by: Matěj Suchánek (talk) 14:29, 15 February 2019 (UTC)

Populating P3722 (P3722)

Request date: 30 May 2018, by: Thierry Caro

Link to discussions justifying the request
  • None.
Task description

Take all instances of subclasses of geographical feature (Q618123). Look for those that have a Commons category (P373) statement and visit the corresponding Commons category. If it contains a subcategory named "Maps of" followed by the category's own name, import this value as P3722 (P3722) to the item. This would be useful to the French Wikipedia, where we now have Q54473574 automatically populated through Template:Geographical links (Q28528875).
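
The matching rule in this task description can be sketched as follows. This is an illustration only, not the code that was later run for the import. It assumes the subcategory titles have already been fetched from Commons by some other means, and the function name `find_maps_category` is hypothetical.

```python
from typing import Iterable, Optional


def find_maps_category(commons_category: str, subcategories: Iterable[str]) -> Optional[str]:
    """Return the "Maps of <category>" subcategory title, if one is present.

    commons_category: value of the item's Commons category (P373) statement.
    subcategories: titles of the subcategories of that Commons category.
    """
    expected = f"Maps of {commons_category}"
    for name in subcategories:
        # Titles may carry the "Category:" prefix and use underscores.
        if name.startswith("Category:"):
            name = name[len("Category:"):]
        title = name.replace("_", " ").strip()
        if title == expected:
            return title
    return None


if __name__ == "__main__":
    subcats = ["Category:Geography of Martinique", "Category:Maps_of_Martinique"]
    print(find_maps_category("Martinique", subcats))
```

The comparison is an exact match, so a category whose maps category adds an article (for example "Maps of the ...") would not be found by this sketch.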

Licence of data to import (if relevant)
  • None
Discussion
  Comment. Hi. Is there still someone here? Thierry Caro (talk) 23:27, 8 November 2018 (UTC)
@MisterSynergy: This could be for you, couldn't it? Thierry Caro (talk) 19:52, 17 February 2019 (UTC)
Uff, not sure. Let's talk about one specific example:
Correct? This would require quite some specific code to be written which I do not have yet; however, it does appear doable I think. Do you have an estimation how many statements could be created this way? —MisterSynergy (talk) 20:11, 17 February 2019 (UTC)
@MisterSynergy: Yes, that's correct. I have no idea how many imports could be made this way but I have the feeling that there could be a lot. My guess would be in the tens of thousands? Thierry Caro (talk) 20:36, 17 February 2019 (UTC)
I just ran a query here which yields ~30k categories at Commons which are named "Maps of …". However, there is stuff like commons:Category:Maps of 19th-century Europe of which I am not sure whether it qualifies as P3722 value for any item. What do you think? Do we have other "special cases"? —MisterSynergy (talk) 20:41, 17 February 2019 (UTC)
@MisterSynergy: No, I have nothing else in mind that could create problems. I think having things like commons:Category:Maps of 19th-century Europe as a value is OK. Plus I believe that most of the time, for these special cases, the item – here an item for 19th-century Europe – won't exist whatever. But if the item does exist, then fine. You may add the relevant statement through P3722 (P3722). Thierry Caro (talk) 02:27, 18 February 2019 (UTC)
I found some time to write the code for such an import, and sample diffs are these: [3][4][5]. Looks good so far. If you are fine with it, I try to get a bot job approved for this import, to perform it with User:MsynBot. I’d iterate over all ~30k "Maps of" categories from my previous comment, and my guess would be that in around 50% of the cases there could be an import. —MisterSynergy (talk) 19:47, 22 February 2019 (UTC)
@MisterSynergy: Everything is fine, as far as I'm concerned. Thank you. Thierry Caro (talk) 21:00, 22 February 2019 (UTC)
This is done now; we went from initially 167 P3722 statements up to almost 13,000 now. Quite a lot of the ~30,000 input categories do not fit anywhere right now, either because their structure has no matching Wikidata item (such as for example commons:Category:Maps of weather and climate of Sri Lanka, commons:Category:Maps of the world before Columbus, commons:Category:Maps of borders of Sweden, and many others), or because of the single value constraint on P3722, which does not allow adding historical categories (such as commons:Category:Maps of 17th-century Europe). —MisterSynergy (talk) 15:02, 1 March 2019 (UTC)
@MisterSynergy: OK. Thank you very much for your dedicated work. There are now almost a thousand active pages using this property on the French Wikipedia, as you may see here. This is great. Thierry Caro (talk) 23:59, 3 March 2019 (UTC)
@MisterSynergy: If you want to try to get more results for the sake of it, you may try to look for the Maps of category not in the main category of the given item anymore but within its Geography of subcategory if it exists. For example, go from Martinique (Q17054) to Category:Martinique, then from there to Category:Geography of Martinique and then get to Category:Maps of Martinique eventually. This should let us reach a few hundreds more. But then again I'm already fine with what you've done! Thanks. Thierry Caro (talk) 00:16, 4 March 2019 (UTC)
Request process

Task completed —MisterSynergy (talk) 15:02, 1 March 2019 (UTC)

This section was archived on a request by: MisterSynergy (talk) 15:02, 1 March 2019 (UTC)

Add annual country level unemployment rate (P1198)

It would be interesting to have annual data for each country (1 value per year for country items). I'm not sure what the most suitable sources are for each country.

When discussing a query with CalvinBall, I noticed that Q30#P1198 currently only has one value (for 2013). --- Jura 12:34, 26 October 2018 (UTC)

Hi Jura, we could use the WDBot to do this job. The source could be World Bank Data - here an example for the USA: https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS?locations=US. The World Bank uses ILO estimates, which have the following nice properties (check the Details button for the indicator on the WB's page):
Statistical Concept and Methodology: [...] The standard definition of unemployed persons is those individuals without work, seeking work in a recent past period, and currently available for work, including people who have lost their jobs or who have voluntarily left work. Persons who did not look for work but have an arrangements for a future job are also counted as unemployed. Some unemployment is unavoidable. At any time some workers are temporarily unemployed between jobs as employers look for the right workers and workers search for better jobs. It is the labour force or the economically active portion of the population that serves as the base for this indicator, not the total population. The series is part of the ILO estimates and is harmonized to ensure comparability across countries and over time by accounting for differences in data source, scope of coverage, methodology, and other country-specific factors. The estimates are based mainly on nationally representative labor force surveys, with other sources (population censuses and nationally reported estimates) used only when no survey data are available..
If this is fine for you I would make a request for bot permission (the script is already available). Datawiki30 (talk) 15:50, 26 October 2018 (UTC)
Hi Jura and thank you for your feedback. Do we really need the additional qualifier? Similar to the GDP I would just use the "stated in" = World Bank database and "reference URL" = https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS?locations=XX where XX is the ISO code of the country. My opinion is that qualifiers trying to explain the data are too short to describe the method behind the data... For new methods I would just suggest proposing a new property (like there are different properties for total, male and female population). Cheers! Datawiki30 (talk) 16:36, 26 October 2018 (UTC)
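
As a small illustration of the reference URL scheme described in this comment, the following sketch builds the URL from a country code. It assumes the two-letter ISO code is meant, as in the `locations=US` example earlier in the thread. The function name `worldbank_reference_url` is hypothetical and this is not the WDBot script.

```python
from urllib.parse import urlencode

# Indicator code taken from the World Bank URL quoted in the discussion.
INDICATOR = "SL.UEM.TOTL.ZS"


def worldbank_reference_url(iso_code: str) -> str:
    """Build the World Bank indicator page URL for one country code."""
    query = urlencode({"locations": iso_code.upper()})
    return f"https://data.worldbank.org/indicator/{INDICATOR}?{query}"


if __name__ == "__main__":
    print(worldbank_reference_url("us"))
```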
I think it's useful. The value would just be a (new) specific item. There is no need for its label or description to include the full text. I think it's an advantage as the above numbers may be useful for cross-country comparison, but some users might be looking for just one country and expect the methodology preferred in the country. For your bot it might be possible to do all in one edit, so the additional work would be marginal. --- Jura 16:44, 26 October 2018 (UTC)
Thank you Jura for your comment. There are no technical obstacles - the bot can handle this. I suppose you mean that we could have structurally different values in the same property: for example, the ILO value for country A could be 5% while the Eurostat value for the same year and country could be 3%. Is this the case? Datawiki30 (talk) 21:29, 26 October 2018 (UTC)
  • Yes. I had in mind mainly national agencies that might have 4% instead of 10% (by whatever method), but it's the same issue. --- Jura 16:05, 2 November 2018 (UTC)
@Jura1: OK. Could you please take a look here? After a discussion in the project chat and here I think that it would be best to import only the most recent data (for example for 2017). Otherwise we could have problems with the loading time of the country pages. I would be glad to see your comment there. Cheers! Datawiki30 (talk) 19:18, 12 November 2018 (UTC)
  • This isn't really helpful to look at the evolution. I think it could easily hold annual data for the last 50 years. There are several other properties that have annual data. --- Jura 04:24, 13 November 2018 (UTC)

trainer-stations

Request date: 12 November 2018, by: Fundriver

Task description

Is it possible to harvest the trainer data for coach of sports team (P6087) from the different infoboxes in the German Wikipedia? It should be pretty similar to the harvesting for member of sports team (P54) and could be done with the same syntax for different sports, because the infoboxes are similar for the different sports in the German Wikipedia (except for ice hockey): you could use trainer_tabelle in Template:Infobox Rugby Union biography (Q14373909), Template:Infobox football biography (Q5616966), Template:Infobox basketball biography (Q5831659) and Template:Infobox floorball player (Q20963207) with the same technique. You should take care not to import data that isn't totally clear. Sometimes there is a "(Co-Tr.)", "U-21" or "U21" in addition to the wikilink, and these cases need manual oversight. But a "(Co-Tr)", for example, could be used to refine a statement, if this is possible. Fundriver (talk) 09:52, 12 November 2018 (UTC)

Licence of data to import (if relevant)
Discussion


Request process

BLKÖ (one time data import)

Most pages in https://de.wikisource.org/wiki/Kategorie:BLK%C3%96 (27209 pages) seem to lack items (http://petscan.wmflabs.org/?psid=6382466 , currently 26641 pages).

I think it would be worth creating them, as well as an item for the person who is the subject of the article if it can't be matched with one of the existing items. --- Jura 07:43, 8 November 2018 (UTC)

Proposal

To get this started I propose this structure for articles. It also mentions from which source each statement is imported. As I see it besides the structure for articles the structure for volumes and person subjects with imported data also needs to be decided. Additionally described by source (P1343) should probably be added to new and existing person subjects. --Pyfisch (talk) 22:29, 11 December 2018 (UTC)

Article


I've made a preliminary data export. It contains all BLKÖ articles with GND, Bearbeitungsstand (editing status) etc. The articles are linked based on the stated GND, Wikipedia and Wikisource articles; if there was a conflict, multiple Q-numbers are given. I also searched for items linked to the article and unfortunately found many that describe the person instead of the text (they will need to be split). The last four columns state the date/place of birth/death from the text. The dates vary in accuracy:
  • year-month-day, year-month, only year
  • ~ before date describes imprecise dates
  • > before describes dates stated as "nach 1804"
  • A before dates describes "Anfang/erste Tage" start of
  • E before dates describes "Ende/letzte Tage" end of
  • M before dates describes "Mitte" middle of
  • ? BLKÖ knows the person was dead but does not know when he/she died
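
The date markers listed above could be read along the following lines. This is a sketch only: the exact cell syntax of the export is an assumption (a single-character marker directly before a `year`, `year-month` or `year-month-day` value), and the names `BlkoeDate` and `parse_blkoe_date` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class BlkoeDate(NamedTuple):
    qualifier: str          # "exact", "circa", "after", "start", "end", "middle" or "unknown"
    year: Optional[int]
    month: Optional[int]
    day: Optional[int]


# Markers as described in the list above.
PREFIXES = {"~": "circa", ">": "after", "A": "start", "E": "end", "M": "middle"}
DATE_RE = re.compile(r"^([~>AEM]?)\s*(\d{1,4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")


def parse_blkoe_date(raw: str) -> Optional[BlkoeDate]:
    """Parse one date cell; return None if the cell is empty or not understood."""
    text = raw.strip()
    if text == "?":
        # The person is known to be dead, but the date is not known.
        return BlkoeDate("unknown", None, None, None)
    match = DATE_RE.match(text)
    if match is None:
        return None
    prefix, year, month, day = match.groups()
    return BlkoeDate(
        PREFIXES.get(prefix, "exact"),
        int(year),
        int(month) if month else None,
        int(day) if day else None,
    )


if __name__ == "__main__":
    for cell in ["1804-03-12", ">1804", "E1799-12", "?"]:
        print(cell, parse_blkoe_date(cell))
```

Keeping the marker as a separate field, rather than folding it into the date, leaves open how each case is later expressed on Wikidata (precision, qualifiers, or no statement at all).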

The places will need to be manually matched to Q-items. The first column contains some metadata about the kind of page. There are:

  • empty: Person
  • L: Liste
  • F: Family, Wappen, Genealogie
  • R: Cross Reference
  • P: Prelude
  • H: note about names and alternate spellings
  • N: corrections, Nachträge

Each group should get a distinct is-a property. @Jura1: Do you like it? This is just for viewing, a later version will be editable to make manual changes before the import. --Pyfisch (talk) 22:14, 18 December 2018 (UTC)

@M2k~dewiki: Yes, the data is already prepared for the import, but I have not gotten around to writing an import script, getting approval and running the script. --Pyfisch (talk) 09:07, 11 July 2019 (UTC)

Hello @Jura1:

I started to create the wikidata objects for Kategorie:BLKÖ:Band 1 to Kategorie:BLKÖ:Band 12 (from overall 60) a few days ago with Petscan (example), since I did not want to create them manually anymore for every newly created article in the German wikipedia, which references to an entry from BLKÖ (and this proposal from 2018 has been already archived in January and the user who wrote the initial proposal did not reply anymore Topic:Vg5y62e08pcorztk, also the documents above ("data export") already have been deleted on Google Docs).

I also tried to use Harvest Tools to add the references (example) harvesting from Vorlage:BLKÖ, which only worked in a few cases.

How would you do the cross-referencing of the two items? Which tools would you use? I also checked Wikidata:WikiProject DNB, but I did not find any comment, which tools or techniques have been used. Besides from Quickstatements, HarvestTools, PetScan, I sometimes use external (PERL) scripts, in order to harvest information (e.g. using WWW::Mechanize, responses from https://query.wikidata.org/sparql, ...), to convert/merge data, or to prepare statements for Quickstatements, for example. --M2k~dewiki (talk) 17:21, 27 March 2020 (UTC)

  • Yeah, I should have followed through with this earlier. I think PetScan should do. You could also create a QuickStatements batch for new items.
MxM would probably be ideal, but Magnus would need to make a few adjustments for it to work. If you work with OpenRefine, that might be an alternative. For entries where we can get dates, it might be fairly straightforward to just create them and merge a few duplicates. --- Jura 00:23, 28 March 2020 (UTC)
Hi @M2k~dewiki:, thanks for starting again to bring some attention on this interesting and important project on deWikisource linking with Wikidata. I'm already maintaining another big project on deWikisource linking to Wikidata - Die Gartenlaube. There we have now more than 13,000 fully qualified bibliographic items on Wikidata. For this purpose i have written a python script to parse the categories and extract the bibliographic information from the infobox on each article page and create import files for QuickStatements.

BLKÖ is already on my todo-list, unfortunately time is running... The above presented data model looks very good to me, i only have some suggestions (from a librarian's perspective): ;-) we shouldn't use in bibliographic context

therefor

If I have some time within the next few days I could point my script to BLKÖ and make some imports as examples for further discussion. --Mfchris84 (talk) 21:59, 28 March 2020 (UTC) (Added a section on German Wikisource to inform the community about this ongoing discussion here: BLKÖ)

  • Yes, P1433 does look better. Feel free to move ahead as you prefer. If you create them with QuickStatements, it would have the advantage that everything gets done in one edit. BTW, I just came across an experiment I did with MxM a while ago [6]. I suppose we could get that to work. --- Jura 22:07, 28 March 2020 (UTC)

Hello @Mfchris84, Jura1:

Currently there are about 8,200 objects for BLKÖ (approx. volumes 1 to 16 out of 60), of which about 300 have a cross-reference so far (User:M2k~dewiki/Tools#Project_BLKÖ):

Today I wrote a program which can cross-reference objects based on the GND.

Examples:

Hey @Jura1:, first line (P1922) that's a great idea! i made an example here: Schmid, Anton (Musikschriftsteller) (BLKÖ) (Q27134764), so we had also to add copyright status. i add both statements in the model above. --Mfchris84 (talk) 22:53, 28 March 2020 (UTC)

@M2k~dewiki, Jura1:, thanks for these queries, they are very useful. I made some edits on Anschütz, Heinrich (BLKÖ) (Q88549749) to have a "fully qualified" item for a biographical encyclopedia article.

Do we have a consensus about the structure of the labels? I really would prefer to add BLKÖ in the label to have an easy distinction between the biographical item and the BIBLIOgraphical item. This could be very useful, especially for automatic reconciliation processes like in OpenRefine, to prevent people from adding biographical identifiers to the encyclopedic items from BLKÖ instead of to the biographical items. What are your opinions on this question? --Mfchris84 (talk) 22:47, 28 March 2020 (UTC)

Some other useful queries for updating the existing items:

@Mfchris84, Jura1: the label structure d:Q88549769 would be similar to the ADB entries, for example d:Q27568759. --M2k~dewiki (talk) 22:52, 28 March 2020 (UTC)

In a certain way this looks good to me. My doubt is that, without the encyclopedia abbreviation directly in the label, some bot scripts or less careful reconciliation processes will match these bibliographic items to biographical facts which should be stored in the biographical item on Wikidata - but those are things we could catch with Shape Expressions later on! --Mfchris84 (talk) 22:59, 28 March 2020 (UTC)

@Mfchris84: regarding d:Q88549749: is your script also able to do all or some of the other steps, like creating the entries (currently done with petscan, might take several days or weeks) or adding the cross-references ? --M2k~dewiki (talk) 22:57, 28 March 2020 (UTC)

@M2k~dewiki: of course my python script is also able to do cross-referencing based on the given wikipedia-links or gnd-ids in the infoboxes on wikisource page. e.g that's also the way how i add main subject statement for Gartenlaube articles, because Wikisource-editors add main subjects on WikiSource-Pages with Wikipedia-Links. Unfortunately i need some time to point my script against BLKÖ. But then i can do both, create new items and update all the existing items. --Mfchris84 (talk) 23:02, 28 March 2020 (UTC)
  • Personally, I tend to agree with Mfchris84, especially as I had to cleanup several 10,000 of DNB entries that kept getting mixed up. (i.e. I prefer the earlier version of ADB [7].) --- Jura 23:01, 28 March 2020 (UTC)
I would also prefer this way. With a suffix in parentheses it also doesn't look as bad as with the namespace-like prefix used in Wikisource. --Mfchris84 (talk) 23:03, 28 March 2020 (UTC)

@M2k~dewiki, Jura1: - OK, four hours of coding, and I think I have a first approach that creates some good items for BLKÖ. My script gitlab.com/blkoe is now able to create items like Majláth von Székhély, Joseph (II.) Graf (BLKÖ) (Q88911969) via QuickStatements. I see that improvement is needed for follows/followed by, but that would be better done in a second round, because in the first round these links can't be set for articles that don't have a Wikidata item yet. My script also added described by source (P1343) to the biographical item, Q1163609#P1343, with qualifiers and references. What do you think about this item? I think we could go forward this way; in a "second round" all items could get improvements like

  • missing volume/page statements
  • missing main subject (and its cross-reference on the biographical item - but maybe M2k~dewiki will do this with his/her script?)
  • adding follows/followed by (which needs that all items are created)

--Mfchris84 (talk) 01:52, 29 March 2020 (UTC)

@Mfchris84: looks good to me, please go ahead. Thanks a lot! --M2k~dewiki (talk) 02:05, 29 March 2020 (UTC)

@M2k~dewiki, Jura1: Thanks, I will go ahead with my approach. One point we have to discuss is how we should deal with the huge number of "cross-references" within the encyclopedia, like BLKÖ:Lubomirski, Georg Fürst (Verweis) Lubomirski, Georg Fürst (Verweis) (BLKÖ) (Q88898002), which only states a different name for the encyclopedic article BLKÖ:Lubomirski, Georg Fürst Lubomirski, Georg Fürst (BLKÖ) (Q88893826). We should avoid creating items like Lubomirski, Georg Fürst (Verweis) (BLKÖ) (Q88898002) - I know this happened with the PetScan import, which can't deal with such constraints. What would be a good model for such a "Verweis" (cross-reference)?
I modified Lubomirski, Georg Fürst (Verweis) (BLKÖ) (Q88898002) as follows:

  • instance of (P31): cross-reference (Q1302249), with said to be the same as (P460): Lubomirski, Georg Fürst (BLKÖ) (Q88893826) as a link to the biographical article.
  • Added volume and page information, and also follows/followed by links as given in the encyclopedia.
  • Changed the description to "cross-reference to an entry in the [BLKÖ]".
--06:27, 29 March 2020 (UTC)
  • For cross-references, main subject could be used to point to the other article. P460 isn't generally used as a qualifier. HarvestTemplates could be used to set follows/followed by, if there is interest. The main advantage of cross-reference items is that we won't miss pages. It's up to you how much detail you want to add into them, if any. --- Jura 06:54, 29 March 2020 (UTC)
Good point.

During my first QuickStatements batch job, a problem occurred when adding described by source (P1343) with the "LAST" term, so there are self-referencing statements now.

SELECT * WHERE {
  ?item p:P1343 ?descStmt.
  ?descStmt ps:P1343 wd:Q665807;
            pq:P805 ?item.
  ?blkoArticle wdt:P1433 wd:Q665807;
               wdt:P921 ?item.
}

I have cleaned those self-references up and i will split described by source (P1343) also into a second round when i have the Q-ID for the article. --Mfchris84 (talk) 06:57, 29 March 2020 (UTC)
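
The two-round split mentioned here could look roughly like the sketch below, which generates tab-separated QuickStatements (version 1) commands. It is not the script used for the import. The function names are hypothetical, and the property and item IDs are the ones that appear in the query above (P1343, P805, P1433, P921, Q665807).

```python
from typing import List

BLKOE = "Q665807"  # item the query above uses for the encyclopedia


def round_one_create_article(label_de: str, subject_qid: str) -> List[str]:
    """Round 1: create the article item. LAST always means the new item here."""
    return [
        "CREATE",
        f'LAST\tLde\t"{label_de}"',
        f"LAST\tP1433\t{BLKOE}",        # published in
        f"LAST\tP921\t{subject_qid}",   # main subject
    ]


def round_two_described_by(subject_qid: str, article_qid: str) -> str:
    """Round 2: statement on the person item, once the article's Q-ID is known."""
    if subject_qid == article_qid:
        # This is the self-reference the query above looks for.
        raise ValueError("statement would be self-referencing")
    return f"{subject_qid}\tP1343\t{BLKOE}\tP805\t{article_qid}"


if __name__ == "__main__":
    print("\n".join(round_one_create_article("Beispiel (BLKÖ)", "Q1163609")))
    print(round_two_described_by("Q1163609", "Q88911969"))
```

Because round 2 only uses explicit Q-IDs, it no longer depends on what "LAST" refers to at that point in the batch. Labels containing double quotes would need escaping, which this sketch does not handle.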

@Mfchris84: it seems that the English and German descriptions are mixed up ("(vol. 16, p. 357)" vs. " (Bd. 16, S. 357)"). Examples: d:Q88936832, d:Q67389376. --M2k~dewiki (talk) 00:51, 30 March 2020 (UTC)

@M2k~dewiki: Correct. I have already changed my script and will start an update process as soon as possible. --Mfchris84 (talk) 05:19, 30 March 2020 (UTC)
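
One way to keep the two description languages apart is to key the volume/page wording on the language code, as in this sketch. The two formats are the ones quoted in the report above; the function name `volume_page_suffix` is hypothetical and this is not the actual fix made to the script.

```python
# Volume/page wording per description language, as quoted in the report above.
TEMPLATES = {
    "en": "(vol. {volume}, p. {page})",
    "de": "(Bd. {volume}, S. {page})",
}


def volume_page_suffix(lang: str, volume: int, page: int) -> str:
    """Return the volume/page suffix for a description in the given language."""
    return TEMPLATES[lang].format(volume=volume, page=page)


if __name__ == "__main__":
    print(volume_page_suffix("en", 16, 357))
    print(volume_page_suffix("de", 16, 357))
```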

@Zabia: this project might be also for your interest, for example de:Benutzer:M2k~dewiki/Test, e.g. to check the completeness (do all wikisource articles link to an existing article in the german language wikipedia?) and correctness (does the article really exist/has the article been moved to another lemma?) of links in the Wikisource articles to the german language articles. (also see Help:Import BLKÖ from wikisource). --M2k~dewiki (talk) 11:52, 30 March 2020 (UTC)

For example, in this wikisource article the lemma in the german wikipedia has been changed, but the change has not been done in the wikisource article. --M2k~dewiki (talk) 12:00, 30 March 2020 (UTC)

@M2k~dewiki, Mfchris84:

  • Seems to be advancing fine, despite some hiccups with QuickStatements yesterday. I expanded the help page a bit. Some random points:
  1. Completed volume/page for all existing items. (9000)
  2. Added/fixed a few P31 for cross-reference (Q1302249)
  3. For vol. 1 & 2, I added the scan available at Commons with the relevant file page (Hopefully the thumbnails work one day).
  4. Wikisource has an external link to scans. If there is interest, these could be added with full work available at URL (P953).
  5. Should a "proofread"-badge be set on the sitelink?
--- Jura 21:44, 4 April 2020 (UTC)
This section was archived on a request by: continued at Help:Import BLKÖ from wikisource --- Jura 06:44, 23 April 2020 (UTC)