Wikidata:Bot requests/Archive/2020/03

Protein aliases (alias deletion)

Request date: 27 July 2019, by: SCIdude

Link to discussions justifying the request


Task description

The set of protein items contained the English alias "protein". This was fixed, but as you can see from Q24248974, the Arabic alias بروتين is still there, and who knows which other languages have aliases that are the exact translation of "protein". Optionally, the same item has a "hypothetical protein" alias, which should be moved to the description (and the same done for all proteins with that alias). As a third option, remove the "Listeria gene" alias, which is completely nonsensical (the general target would be proteins with an alias of the form taxon + "gene").
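The three clean-up rules in this request could be sketched as a pure function over an item's aliases. This is only an illustration, not the script that was eventually run: `PROTEIN_TRANSLATIONS` and `clean_aliases` are hypothetical names, the translation table holds only the two languages mentioned above, and the taxon check is a crude "ends with gene" heuristic.

```python
# Hypothetical table of exact translations of "protein". Only the English and
# Arabic entries come from the request; a real run would need many more.
PROTEIN_TRANSLATIONS = {"en": "protein", "ar": "بروتين"}


def clean_aliases(aliases):
    """Sort aliases into (kept, removed, to_description).

    `aliases` maps a language code to a list of alias strings.
    """
    kept, removed, to_description = {}, {}, {}
    for lang, values in aliases.items():
        target = PROTEIN_TRANSLATIONS.get(lang)
        for value in values:
            if target is not None and value.casefold() == target.casefold():
                # Alias is just the word "protein" in that language.
                removed.setdefault(lang, []).append(value)
            elif value.casefold() == "hypothetical protein":
                # Better suited as a description than as an alias.
                to_description.setdefault(lang, []).append(value)
            elif lang == "en" and value.endswith(" gene"):
                # Crude stand-in for taxon + "gene"; a real bot would compare
                # the prefix with the label of the item's taxon.
                removed.setdefault(lang, []).append(value)
            else:
                kept.setdefault(lang, []).append(value)
    return kept, removed, to_description
```

Keeping the decision separate from the actual edit makes it easy to review a sample of what would be removed before any bot run.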

Licence of data to import (if relevant)
Discussion
SELECT * { ?a skos:altLabel "protein"@en }

Likely all of the above. --- Jura 01:25, 27 March 2020 (UTC)

Request process

Accepted by (Edoderoo (talk) 11:18, 6 April 2020 (UTC)) and under process
Worked on this script; will run it in the coming days.
Task completed (13:19, 10 April 2020 (UTC))

This section was archived on a request by: Great thanks, Edoderoo! --- Jura 07:12, 12 April 2020 (UTC)

COVID19 recoveries: change P1561 to P8010

Please change number of survivors (P1561) to number of recoveries (P8010) on

There are multiple statements with references. --- Jura 13:01, 28 March 2020 (UTC)

✓ Done MisterSynergy (talk) 13:52, 28 March 2020 (UTC)

This section was archived on a request by: Thanks! --- Jura 07:12, 12 April 2020 (UTC)

Add OpenStreetMap relation ID (P402) to US counties (one time data import)

SELECT * { ?item wdt:P882 [] ; wdt:P17 wd:Q30. MINUS { ?item wdt:P402 [] } . }

For completeness' sake, I think it would be good to add the above. --- Jura 11:07, 1 March 2020 (UTC)

Seems still useful. --- Jura 15:21, 11 September 2020 (UTC)
OSM query for all counties with Wikidata links in Colorado: [1]. Change nist:state_fips for a different state/territory. --Pyfisch (talk) 11:08, 15 September 2020 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. For discussion of the edits see Wikidata:Edit groups/CB/9e6c4c7f9ed8. --Pyfisch (talk) 14:16, 24 September 2020 (UTC)
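For a one-time import like this, matched pairs of item and OSM relation could be turned into QuickStatements V1 lines (tab-separated item, property, quoted string). The sketch below assumes the pairs have already been matched and checked; `quickstatements_p402` is a hypothetical helper, and the values in the usage line are placeholders, not a real county.

```python
def quickstatements_p402(pairs):
    """Build QuickStatements V1 lines adding OpenStreetMap relation ID (P402).

    `pairs` is an iterable of (item ID, OSM relation ID) tuples.
    """
    lines = []
    for qid, relation_id in pairs:
        relation = str(relation_id)
        # Reject anything that is not a plain item ID and a numeric relation ID.
        if not (qid.startswith("Q") and qid[1:].isdigit() and relation.isdigit()):
            raise ValueError(f"unexpected input: {qid!r}, {relation_id!r}")
        lines.append(f'{qid}\tP402\t"{relation}"')
    return lines


# Placeholder values: the sandbox item and a made-up relation ID.
print("\n".join(quickstatements_p402([("Q4115189", 123456)])))
```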

This could probably be populated more from the template at enwiki. I did the counties.

@ديفيد عادل وهبة خليل 2, Thierry Caro, PKM, Cwf97, Gerwoman: who made/supported the property proposal. --- Jura 13:45, 1 March 2020 (UTC)

OK. ✓ Done. I've just imported IDs from multiple languages. Thierry Caro (talk) 23:28, 1 March 2020 (UTC)

Complete/fix items created by SourceMD (label cleanup)

Many of the items created by the tool have labels with a middle initial lacking a ".". The period is present in some of the sources listed. Also, it might be possible to complete these items with aliases and other identifiers from some of the references provided. --- Jura 10:13, 7 August 2019 (UTC)
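The label fix could be sketched with a regular expression that adds the missing period. This is a minimal illustration under stated assumptions: `add_initial_periods` is a hypothetical name, only ASCII capitals are treated as initials, and a real run would still want to confirm each change against the cited sources.

```python
import re

# A single ASCII capital standing alone between two whitespace characters,
# i.e. a middle initial that is not already followed by a period.
MIDDLE_INITIAL = re.compile(r"(?<=\s)([A-Z])(?=\s)")


def add_initial_periods(label):
    """Return the label with "." appended to bare middle initials."""
    return MIDDLE_INITIAL.sub(r"\1.", label)
```

Labels that already carry the period are left unchanged, so the function can be applied repeatedly without stacking periods.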


By type: qualifier fixes

Import taxon author (P405) and year of taxon name publication (P574) qualifiers for taxon name (P225) (one time data import from Wikipedia)

Similar to this edit adding qualifiers to taxon name (P225), maybe there is a good way to import year of publication of scientific name for taxon (P574)-values or even taxon author (P405) from enwiki (or another WP). --- Jura 09:10, 23 August 2019 (UTC)


Other

Reviews in articles (data import/cleanup)

When doing checks on titles, I found that some items with P31=scholarly article (Q13442814) include an ISBN in the title (P1476)-value.

Sample: Q28768784.

Ideally, these would have a statement main subject (P921) pointing to the item about the work. --- Jura 19:10, 13 December 2018 (UTC)
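Spotting such titles could be sketched as below. The pattern assumes the title contains the literal marker "ISBN" followed by a 10- or 13-digit number with optional hyphens or spaces; it does not validate the check digit, and `find_isbn` is a hypothetical helper name.

```python
import re

# "ISBN", an optional "-10"/"-13" and colon, then 10 or 13 digits that may be
# separated by hyphens or spaces; the last character may be "X" for ISBN-10.
ISBN_PATTERN = re.compile(
    r"\bISBN(?:-1[03])?:?\s*((?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx])\b",
    re.IGNORECASE,
)


def find_isbn(title):
    """Return the ISBN found in a title with separators removed, or None."""
    match = ISBN_PATTERN.search(title)
    if match is None:
        return None
    return re.sub(r"[-\s]", "", match.group(1)).upper()
```

The normalized value could then be used to look up the item about the reviewed work before adding main subject (P921).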

Discussion

@Jura1: I’ve been manually cleaning up a few of these. Some comments on the process from my perspective:

- PKM (talk) 01:33, 4 March 2019 (UTC)

"$" in the title also finds some. --- Jura 11:55, 25 August 2019 (UTC)


Monthly number of subscribers (periodic data import)

At Wikidata:Property proposal/subscribers, there is some discussion about various formats for the number of subscribers. For accounts with many subscribers, I think it would be interesting to gather monthly data in Wikidata.

Using format (D1) this could be added to items such as Q65665844, Q65676176. Initially one might want to focus on accounts with > 100 or 50 million subscribers. Depending on how it goes, we could change the threshold.

I think ideally the monthly data would be gathered in the last week or last ten days of the month. --- Jura 14:22, 19 July 2019 (UTC)


Optimize format of country items (reference consolidation)

Given that these items get larger and larger, it might be worth reviewing their structure periodically and optimizing their format, e.g. by moving references to separate items, checking for duplication, etc. --- Jura 13:33, 14 June 2019 (UTC)

Related: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2019/06#Metadata_and_reference_unification_for_Economics_and_possibly_other_projects --813gan (talk) 17:37, 17 June 2019 (UTC)

  • If <stated in>: <bla> is sufficient to locate the information within <bla>, I don't think all elements from the item <bla> should be repeated in the reference. --- Jura 14:47, 25 June 2019 (UTC)

Add description to items about articles

SELECT ?item
{
  ?item wdt:P31 wd:Q13442814 .
  OPTIONAL { ?item schema:description ?d . FILTER(lang(?d)="en") }
  FILTER( !BOUND(?d) )
}
LIMIT 10


I seem to keep coming across articles that lack descriptions. If they had long titles, that wouldn't matter, but it happens with articles that could be mistaken for items about topics. As I can't query them efficiently and just add descriptions with QuickStatements/Descriptioner, maybe a bot could run the above query every few minutes or so (once query server lag is gone) and add basic descriptions. If the standard description collides with another item, please add some variation. --- Jura 14:43, 24 June 2019 (UTC)

In English, most get a description during the import. But for people working on the other 300 language wikis this ain't no help ;-) For Dutch I have already given a big load of items a description. A tool that can also be of help is Descriptioner. If you copy your query in here, it can set the descriptions for you in the background. Edoderoo (talk) 09:13, 25 June 2019 (UTC)
I'm aware of that. I just did a few with SELECT ?item { ?item wdt:P31 wd:Q13442814 } OFFSET n LIMIT 50000
Surprisingly, I even got up to offset 3,000,000. Still, even with this approach, a bot might be the better choice.
The other query needs even smaller steps to avoid a timeout.
Maybe there is a better way to identify them.--- Jura 11:02, 25 June 2019 (UTC)
I almost got to offset 4000000 before facing a timeout in descriptioner as well. --- Jura 11:50, 25 June 2019 (UTC)
And now the initial query times out too. --- Jura 13:46, 25 June 2019 (UTC)
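The OFFSET/LIMIT stepping described in this thread could be sketched as a generator of query strings. It only builds the text of each page; sending it to the query service is left out. `paged_queries` is a hypothetical helper, and since the base query has no ORDER BY, pages are not guaranteed to be stable between requests.

```python
# Base query as used in the thread above.
BASE_QUERY = "SELECT ?item { ?item wdt:P31 wd:Q13442814 }"


def paged_queries(page_size, max_offset, start=0):
    """Yield one query string per page, from `start` up to `max_offset`."""
    for offset in range(start, max_offset, page_size):
        yield f"{BASE_QUERY} OFFSET {offset} LIMIT {page_size}"


# Example: the first three pages of 50,000 items each.
for query in paged_queries(50000, 150000):
    print(query)
```

A smaller `page_size` trades more requests for a lower chance of hitting the timeout on any single page.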

If you have access to a Linux machine, you can install pywikibot and use this script (add 'en' as a language in the structure, though). If you do not have Pywikibot, it is available to everyone on PAWS. Then copy/paste the script into a new Python file and run it. Theoretically I could do that for you too with Edoderoobot, but my bot is already stopped many times a day because of too many edits, too much lag, etc. Edoderoo (talk) 11:20, 30 March 2020 (UTC)