Wikidata:Bot requests/Archive/2020/03
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Protein aliases (alias deletion)
Request date: 27 July 2019, by: SCIdude
- Link to discussions justifying the request
- Task description
The set of protein items contained English "protein" aliases. This was fixed, but as you can see from Q24248974, there is still the Arabic alias بروتين, and possibly aliases in other languages that are exact translations of "protein". Optionally, the same item has a "hypothetical protein" alias, which should be moved to the description (and the same should be done for all proteins with that alias). As a third option, remove the "Listeria gene" alias, which is completely nonsensical (proteins with an alias of the form taxon + "gene" would be the general target).
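A minimal sketch of the three alias rules, assuming an item's aliases are available as a plain dict mapping language codes to lists of strings (a simplified view; reading and writing real items would go through Pywikibot or QuickStatements). Only the English and Arabic words come from this request; `PROTEIN_WORDS`, `TAXON_GENE` and `clean_aliases` are hypothetical names, and the taxon + "gene" pattern is an assumption about what those aliases look like.

```python
import re

# Exact translations of "protein" that should not be aliases.
# Only English and Arabic come from the request; other languages
# would need to be added after review.
PROTEIN_WORDS = {
    "en": {"protein"},
    "ar": {"بروتين"},
}

# Assumed shape of the taxon + "gene" aliases, e.g. "Listeria gene".
TAXON_GENE = re.compile(r"^[A-Z][a-z]+ gene$")


def clean_aliases(aliases):
    """Split aliases into those to keep and description candidates.

    `aliases` maps a language code to a list of alias strings.
    Returns (kept_aliases, description_candidates).
    """
    kept = {}
    descriptions = []
    for lang, values in aliases.items():
        bad = {w.casefold() for w in PROTEIN_WORDS.get(lang, set())}
        keep = []
        for alias in values:
            if alias.casefold() in bad:
                continue  # bare "protein" in this language: drop
            if lang == "en" and alias == "hypothetical protein":
                descriptions.append(alias)  # move to the description instead
                continue
            if lang == "en" and TAXON_GENE.match(alias):
                continue  # taxon + "gene" alias: drop
            keep.append(alias)
        if keep:
            kept[lang] = keep
    return kept, descriptions
```

On a made-up input such as `{"en": ["protein", "hypothetical protein", "Listeria gene", "lmo0001"], "ar": ["بروتين"]}`, only `lmo0001` is kept and `"hypothetical protein"` is returned as a description candidate.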
- Licence of data to import (if relevant)
- Discussion
SELECT * { ?a skos:altLabel "protein"@en }
Likely all of the above. --- Jura 01:25, 27 March 2020 (UTC)
- Request process
Accepted by (Edoderoo (talk) 11:18, 6 April 2020 (UTC)) and under process
I worked on this script and will run it in the coming days.
Task completed (13:19, 10 April 2020 (UTC))
- This section was archived on a request by: Great thanks, Edoderoo! --- Jura 07:12, 12 April 2020 (UTC)
COVID19 recoveries: change P1561 to P8010
Please change number of survivors (P1561) to number of recoveries (P8010) on
- number of survivors (P1561) of COVID-19 pandemic in Luxembourg (Q87250860)
- number of survivors (P1561) of COVID-19 pandemic in Colombia (Q87483673)
There are multiple statements with references. --- Jura 13:01, 28 March 2020 (UTC)
Done —MisterSynergy (talk) 13:52, 28 March 2020 (UTC)
- This section was archived on a request by: Thanks! --- Jura 07:12, 12 April 2020 (UTC)
Add OpenStreetMap relation ID (P402) to US counties (one time data import)
SELECT * { ?item wdt:P882 [] ; wdt:P17 wd:Q30. MINUS { ?item wdt:P402 [] } . }
For completeness' sake, I think it would be good to add the above. --- Jura 11:07, 1 March 2020 (UTC)
- Seems still useful. --- Jura 15:21, 11 September 2020 (UTC)
- OSM query for all counties with Wikidata links in Colorado: [1]. Change nist:state_fips to select a different state or territory. --Pyfisch (talk) 11:08, 15 September 2020 (UTC)
I think that this discussion is resolved and can be archived. If you disagree, don't hesitate to replace this template with your comment. For discussion of the edits see Wikidata:Edit groups/CB/9e6c4c7f9ed8. --Pyfisch (talk) 14:16, 24 September 2020 (UTC)
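One way to do the matching for such an import is to join the county items from the query above on their FIPS code (P882) against OSM relations, and emit QuickStatements (V1) lines. This is a hedged sketch: it assumes the Wikidata side arrives as (QID, FIPS) pairs and that a FIPS-to-relation mapping has already been extracted from OSM data; `osm_by_fips` and `p402_quickstatements` are hypothetical names.

```python
def p402_quickstatements(items, osm_by_fips):
    """Build QuickStatements (V1) lines adding OSM relation IDs.

    items: iterable of (qid, fips) pairs, e.g. rows from the query above.
    osm_by_fips: dict mapping a county FIPS code to an OSM relation ID
                 (hypothetical input, prepared separately from OSM data).
    """
    lines = []
    for qid, fips in items:
        relation = osm_by_fips.get(fips)
        if relation is None:
            continue  # no OSM match: leave for manual review
        # P402 is an external identifier, so the value is a quoted string.
        lines.append(f'{qid}\tP402\t"{relation}"')
    return lines
```

Unmatched counties are skipped rather than guessed, so they can be reviewed by hand.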
Handbook of Texas ID (P6015) from enwiki
This could probably be populated more from the template at enwiki. I did the counties.
@ديفيد عادل وهبة خليل 2, Thierry Caro, PKM, Cwf97, Gerwoman: who made/supported the property proposal. --- Jura 13:45, 1 March 2020 (UTC)
- OK. Done. I've just imported IDs from multiple languages. Thierry Caro (talk) 23:28, 1 March 2020 (UTC)
Complete/fix items created by SourceMD (label cleanup)
Many of the items created by the tool have labels in which a middle initial lacks its period ("."), although the period is present in some of the listed sources. It might also be possible to complete these items with aliases and other identifiers from some of the references provided. --- Jura 10:13, 7 August 2019 (UTC)
- still open, I think. --- Jura 21:19, 26 March 2020 (UTC)
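The label cleanup could start from a simple pattern: a lone capital letter with whitespace on both sides is treated as a middle initial missing its period. A minimal sketch, assuming Latin-script labels; it deliberately leaves leading and trailing initials alone, and a bot would still need to check that a listed source actually shows the period before editing. `add_initial_periods` is a hypothetical name.

```python
import re

# A lone capital letter with whitespace on both sides, i.e. a middle
# initial without its period ("John Q Smith"). Assumes Latin-script labels.
BARE_INITIAL = re.compile(r"(?<=\s)([A-Z])(?=\s)")


def add_initial_periods(label):
    """Return the label with a period added after each bare middle initial."""
    return BARE_INITIAL.sub(r"\1.", label)
```

For example, `add_initial_periods("Anna M K Berg")` gives `"Anna M. K. Berg"`, while `"John Q. Smith"` is left unchanged.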
By type: qualifier fixes
Import taxon author (P405) and year of taxon name publication (P574) qualifiers for taxon name (P225) (one time data import from Wikipedia)
Similar to this edit adding qualifiers to taxon name (P225), maybe there is a good way to import year of publication of scientific name for taxon (P574) values, or even taxon author (P405), from enwiki (or another Wikipedia). --- Jura 09:10, 23 August 2019 (UTC)
- Some have been done since, but maybe more can. --- Jura 21:19, 26 March 2020 (UTC)
Other
Reviews in articles (data import/cleanup)
When doing checks on titles, I found some items with P31 = scholarly article (Q13442814) that include an ISBN in their title (P1476) value.
Sample: Q28768784.
Ideally, these would have a statement main subject (P921) pointing to the item about the work. --- Jura 19:10, 13 December 2018 (UTC)
Discussion
@Jura1: I’ve been manually cleaning up a few of these. Some comments on the process from my perspective:
- Add P31 = book review (Q637866)
- The scholarly article should link to the version, edition or translation (Q3331189) associated with the ISBN, not the “work”.
- The version, edition or translation (Q3331189) should be created if it doesn’t exist.
- The version, edition or translation (Q3331189) should be set as “main subject” of the article.
- PKM (talk) 01:33, 4 March 2019 (UTC)
- Sure, it's possible to take this a step further. --- Jura 11:09, 10 March 2019 (UTC)
- I'd use genre (P136) for book review (Q637866), not P31. --- Jura 11:57, 25 August 2019 (UTC)
- I would be happy with either, but are there advantages or drawbacks to using one approach over the other?
- Adding the version, edition or translation (Q3331189) as the main subject of the book review sounds sensible. Doing a bot run to find items with an ISBN in the title and mark them as book reviews (either P31 or P136) should be pretty reliable. Richard Nevell (talk) 09:16, 27 August 2019 (UTC)
- BTW, in addition to looking for "ISBN": some of these articles are nicely titled "Book Reviews: ...", for example Book Reviews: SNYDERMAN, M., & ROTHMAN, S. (1988). The IQ Controversy, the Media and Public Policy. New Brunswick, NJ: Transaction Books, $18.95 paperback, $29.95 cloth, 310 pp (Q29397081)
- "$" in the title also finds some. --- Jura 11:55, 25 August 2019 (UTC)
- Maybe the first step is to identify them as book reviews, then find the work that is being reviewed. --- Jura 11:57, 25 August 2019 (UTC)
- I identified some items by adding book review (Q637866). Talk:Q637866 has some queries. Not all items include details of the book that was reviewed. --- Jura 17:25, 26 August 2019 (UTC)
- Would still be interesting to do. --- Jura 21:19, 26 March 2020 (UTC)
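The title heuristics mentioned in this thread ("ISBN" in the title, a "Book Reviews:" prefix, a "$" price) could serve as a first identification step before anyone decides between P31 and P136. A minimal sketch; the signal names and `review_signals` are hypothetical, and since "$" also matches titles about money, hits are better treated as candidates for review than as automatic edits.

```python
import re

ISBN = re.compile(r"\bISBN\b", re.IGNORECASE)
REVIEW_PREFIX = re.compile(r"^\s*book reviews?\s*:", re.IGNORECASE)


def review_signals(title):
    """Return which of the discussed title heuristics fire for this title."""
    signals = []
    if ISBN.search(title):
        signals.append("isbn")
    if REVIEW_PREFIX.match(title):
        signals.append("book-reviews-prefix")
    if "$" in title:
        signals.append("price")  # noisy: also matches titles about money
    return signals
```

The sample title quoted above (Q29397081) triggers both the prefix and the price signals, but not the ISBN one.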
Monthly number of subscribers (periodic data import)
At Wikidata:Property proposal/subscribers, there is some discussion about various formats for the number of subscribers. For accounts with many subscribers, I think it would be interesting to gather monthly data in Wikidata.
Using format (D1), this could be added to items such as Q65665844 and Q65676176. Initially, one might want to focus on accounts with more than 100 million or 50 million subscribers. Depending on how it goes, we could change the threshold.
I think ideally the monthly data would be gathered in the last week or last ten days of the month. --- Jura 14:22, 19 July 2019 (UTC)
- Good to have, I think. --- Jura 21:19, 26 March 2020 (UTC)
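A periodic bot could check whether today falls inside the suggested collection window (the last ten days, or the last week, of the month) before fetching subscriber counts. A minimal sketch using only the standard library; `in_collection_window` is a hypothetical name, and how the bot is scheduled is left open.

```python
import calendar
from datetime import date


def in_collection_window(day, window=10):
    """True if `day` falls in the last `window` days of its month."""
    last = calendar.monthrange(day.year, day.month)[1]  # days in the month
    return day.day > last - window
```

With the default window, 22 March qualifies and 21 March does not; passing `window=7` gives the "last week" variant.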
Optimize format of country items (reference consolidation)
Given that these items keep getting larger, it might be worth reviewing their structure periodically and optimizing their format, e.g. by moving references to separate items, checking for duplication, etc. --- Jura 13:33, 14 June 2019 (UTC)
Related: https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2019/06#Metadata_and_reference_unification_for_Economics_and_possibly_other_projects --813gan (talk) 17:37, 17 June 2019 (UTC)
- If <stated in>: <bla> is sufficient to locate the information within <bla>, I don't think all elements from the item <bla> should be repeated in the reference. --- Jura 14:47, 25 June 2019 (UTC)
- I guess it's ever more useful. --- Jura 21:19, 26 March 2020 (UTC)
Add description to items about articles
- Items used: scholarly article (Q13442814)
- Properties used: instance of (P31)
SELECT ?item
{
  ?item wdt:P31 wd:Q13442814 .
  OPTIONAL { ?item schema:description ?d . FILTER(lang(?d)="en") }
  FILTER( !BOUND(?d) )
}
LIMIT 10
I keep coming across articles that lack descriptions. For articles with long titles that wouldn't matter, but it happens with articles whose titles could be mistaken for items about topics. As I can't query them efficiently and just add descriptions with QuickStatements/Descriptioner, maybe a bot could run the above query every few minutes or so (once query server lag is gone) and add basic descriptions. If the standard description collides with another item, please add some variation. --- Jura 14:43, 24 June 2019 (UTC)
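Wikidata does not allow two items to share the same label and description in one language, so the bot needs a fallback when the standard description is taken. A minimal sketch, assuming the bot already knows which (label, description) pairs are in use; the "published in YEAR" variation and `pick_description` are assumptions for illustration, not an agreed format.

```python
def pick_description(label, taken, year=None, base="scholarly article"):
    """Choose an English description that does not collide.

    taken: set of (label, description) pairs already used by other items.
    year: publication year, used for the (assumed) fallback variation.
    Returns None when no candidate is free, so the item can be skipped.
    """
    candidates = [base]
    if year is not None:
        candidates.append(f"{base} published in {year}")
    for description in candidates:
        if (label, description) not in taken:
            return description
    return None
```

Returning `None` instead of inventing further variations keeps the bot from writing descriptions nobody reviewed.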
- In English, most get a description during the import, but for people working in the other 300 languages this is no help ;-) For Dutch, I have already given a large batch of items a description. Another tool that can help is Descriptioner: if you paste your query into it, it can set the descriptions for you in the background. Edoderoo (talk) 09:13, 25 June 2019 (UTC)
- I'm aware of that. I just did a few with SELECT ?item { ?item wdt:P31 wd:Q13442814 } OFFSET n LIMIT 50000
- Surprisingly, I even got up to offset 3,000,000. Still, even with this approach, a bot might be the better choice.
- The other query needs even smaller steps to avoid a timeout.
- Maybe there is a better way to identify them. --- Jura 11:02, 25 June 2019 (UTC)
- I almost got to offset 4,000,000 before facing a timeout in Descriptioner as well. --- Jura 11:50, 25 June 2019 (UTC)
- And now the initial query times out too. --- Jura 13:46, 25 June 2019 (UTC)
- I gave it another try with Special:Search/haswbstatement:P31=Q13442814 -article. It finds 3,534,392 items. I will do some of them. --- Jura 13:46, 25 June 2019 (UTC)
- I did some of it. Maybe a bot could run through the remaining ones. For a more general solution, see Wikidata:Property_proposal/default_description_for_instances. --- Jura 20:43, 30 June 2019 (UTC)
- Seems endless .. --- Jura 21:19, 26 March 2020 (UTC)
If you have access to a Linux machine, you can install Pywikibot and use this script (you need to add 'en' as a language in its structure, though). If you don't have Pywikibot, it is available to everyone on PAWS: copy and paste the script into a new Python file and run it. In theory I could also do this for you with Edoderoobot, but my bot already gets stopped many times a day because of too many edits, too much lag, etc. Edoderoo (talk) 11:20, 30 March 2020 (UTC)
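Whether the descriptions are set with Descriptioner or a Pywikibot script, the OFFSET/LIMIT batching used earlier in this thread can be generated mechanically. A minimal sketch; `paged_queries` is a hypothetical name and `total` would come from a count such as the search result above. Note that without ORDER BY, SPARQL does not guarantee a stable result order, so pages may overlap or skip items; ordering 3+ million results would likely time out, so re-checking each item before editing is the safer choice.

```python
BASE_QUERY = "SELECT ?item { ?item wdt:P31 wd:Q13442814 }"


def paged_queries(total, page_size=50_000):
    """Yield the OFFSET/LIMIT variant of the query, one page at a time.

    Without ORDER BY the result order is not guaranteed to be stable,
    so consecutive pages may overlap or skip items.
    """
    for offset in range(0, total, page_size):
        yield f"{BASE_QUERY} OFFSET {offset} LIMIT {page_size}"
```

For `total=120_000` this yields three queries, starting at offsets 0, 50000 and 100000.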