User talk:Daniel Mietchen/Archive/2017

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Wikidata weekly summary #242

Wikidata weekly summary #243

Wikidata weekly summary #244

Wikidata weekly summary #245

Title with footnote marks

We got "† , ‡" footnote marks in the title [1]. I wonder if this is a sourcemd issue? — Finn Årup Nielsen (fnielsen) (talk) 19:53, 5 February 2017 (UTC)

@Fnielsen: This comes from the sources, and SourceMD just forwards it. Several similar problems listed here. --Daniel Mietchen (talk) 00:53, 7 February 2017 (UTC)

Wikidata weekly summary #246

Wikidata weekly summary #247

Wikidata weekly summary #248

Research Bot

Hello Daniel,

I have a bit of a problem with article titles in capital letters, such as BENJAMIN FRANKLIN'S MEDICAL IMPRINTS (Q28776923). That is undoubtedly the title as the source gives it, but aren't all-caps article titles just an artifact of the lead-type era that some journals use to emphasize headings? Would it be an option to keep the all-caps title for P1476 but change the label of the item to Benjamin Franklin's Medical Imprints?--Kopiersperre (talk) 18:02, 20 February 2017 (UTC)

Hello Kopiersperre, these strings of capital letters bother me too, and I occasionally turn them into an alias, but I am not aware of any systematic approach to this issue, and Research Bot cannot currently help with it either. User:Research Bot#Known problems lists some similar problems that can at least be monitored by means of SPARQL queries, and I work through the corresponding lists every now and then. A query for labels or P1476 values that consist only of capital letters does not seem overly complicated to me, even though I don't know right now exactly how to go about it. Maybe we start with that and see what comes out? On that basis, we can then probably get others interested in the topic as well. --Daniel Mietchen (talk) 20:08, 20 February 2017 (UTC)
Thanks to WikidataFacts, we now have this query. I have added it to the list of known problems and will keep it in mind when I go through the list. --Daniel Mietchen (talk) 22:57, 22 February 2017 (UTC)
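The all-caps check discussed in this thread can be illustrated with a minimal Python sketch. It is not the WikidataFacts query mentioned above, only a hypothetical local filter: the function names `is_all_caps` and `find_all_caps` are assumptions, and the labels are assumed to be already available as a mapping from item ID to label string.

```python
def is_all_caps(title: str) -> bool:
    """Return True if the title contains letters and every letter is upper case."""
    letters = [ch for ch in title if ch.isalpha()]
    # Strings without any letters (e.g. pure numbers) are not flagged.
    return bool(letters) and all(ch.isupper() for ch in letters)


def find_all_caps(labels: dict) -> list:
    """Return the IDs of items whose label is written entirely in upper case.

    `labels` is a hypothetical mapping of item ID -> label string.
    """
    return [qid for qid, label in labels.items() if is_all_caps(label)]
```

Short labels that are legitimately upper case, such as acronyms, would also match, so a list produced this way would still need manual review before any label is changed.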

Wikidata weekly summary #249

Items to be deleted

In RFD, many items created by you have been proposed for deletion. If you do not agree, you can participate in the debate. --ValterVB (talk) 20:01, 3 March 2017 (UTC)

Thanks for the notification. I am fixing the items right now. --Daniel Mietchen (talk) 21:42, 4 March 2017 (UTC)

Items to be deleted

In RFD, one or more items created by you have been proposed for deletion. If you do not agree, you can participate in the debate. --ValterVB (talk) 23:32, 3 March 2017 (UTC)

I'm done going through these items — thanks again for the notification. --Daniel Mietchen (talk) 22:54, 4 March 2017 (UTC)

Wikidata weekly summary #250

Wikidata weekly summary #251

Weekly Summary #252

Wikidata weekly summary #253

Wikidata weekly summary #254

Wikidata weekly summary #255

Wikidata weekly summary #256

Wikidata weekly summary #257

Wikidata weekly summary #258

Wikidata weekly summary #259

Wikidata weekly summary #260

Wikidata weekly summary #261

Wikidata weekly summary #262

Wikidata weekly summary #263

Wikidata weekly summary #264

Wikidata weekly summary #265

Wikidata weekly summary #266

Wrongly adds short name of researcher

@Daniel Mietchen: The bot wrongly adds P2093 for authors that are already P50'ed. — Finn Årup Nielsen (fnielsen) (talk) 12:35, 29 June 2017 (UTC)

Thanks — I hadn't seen that yet. I've stopped the bot and will dig into this. Pinging T Arrow. --Daniel Mietchen (talk) 16:12, 29 June 2017 (UTC)
I posted a GitHub ticket. --Daniel Mietchen (talk) 16:15, 29 June 2017 (UTC)

Wikidata weekly summary #267

Wikidata weekly summary #268

Wikidata weekly summary #269

Papers

Hello! You are doing excellent work importing papers. Do you have an estimate of how many papers you will import? Which databases does the information come from? What percentage have you imported so far? Thank you. Emijrp (talk) 08:04, 23 July 2017 (UTC)

@Emijrp: I have no precise idea how many items about papers I (or anyone) will set up, but we currently have about one million such items (i.e. well under 1% of all scholarly articles), and I expect this to at least double over the next months. Research Bot will do a significant chunk of this, and others from WikiCite are likely to do so as well. I mainly work by topic and try to make Scholia profiles useful for a set of topics I have some knowledge about, e.g. Zika virus. As for workflows, I am currently mostly using Fatameh, which is based on WikidataIntegrator and works basically with PubMed and Europe PubMed Central. Some further details here. --Daniel Mietchen (talk) 21:23, 23 July 2017 (UTC)
Could you please join this discussion? We are trying to reach consensus for paper descriptions. Emijrp (talk) 20:20, 29 July 2017 (UTC)
Done. Thanks, --Daniel Mietchen (talk) 20:28, 30 July 2017 (UTC)

Wikidata weekly summary #270

I don't understand why you added genome-wide association study (Q1098876) as the main subject of “Ahoy Me Hearties!” Captain Pugwash, Bits of Movable Paper, and the Bible: A Tribute to John Ryan (Q30239127), which is about comics. Is this maybe a mistake? - PKM (talk) 02:23, 27 July 2017 (UTC)

@PKM: That was a mistake — fixed now. Thanks for checking! --Daniel Mietchen (talk) 07:36, 27 July 2017 (UTC)
Great. Thanks for fixing. - PKM (talk) 20:54, 27 July 2017 (UTC)

Wikidata weekly summary #271

Add short author name that is already there

Research Bot/Fatameh adds a short author name that is already there: https://www.wikidata.org/w/index.php?title=Q28079394&diff=503900567&oldid=492148862 . — Finn Årup Nielsen (fnielsen) (talk) 13:53, 3 August 2017 (UTC)

Reported at https://phabricator.wikimedia.org/T172385 . Finn Årup Nielsen (fnielsen) (talk) 13:55, 3 August 2017 (UTC)

Wikidata weekly summary #272

Wikidata weekly summary #273

Research paper authors

I noticed that Research Bot imports the authors of papers as text strings. Should the articles also be linked to the items for the authors if they exist? Should the string be retained if the item is added? Should there be an item for every author of a research paper, even if they are not otherwise notable? (As an aside, is there a way to designate lead and corresponding authors? This is important in some fields.) Antony-22 (talk) 06:18, 17 August 2017 (UTC)

@Antony-22: The bot imports the authors as strings because that's how the information is stated in the databases it queries (PubMed, PubMed Central, CrossRef). There are no good tools to convert those author name string (P2093) statements into author (P50) statements at scale, but for individual strings (which may still represent multiple authors), tools like Resolve Authors are an important first step, and in its newest version, it retains the original string through an object named as (P1932) statement (example). It could be seeded by SPARQL queries like this. We also have a dedicated Wikidata game. As for lead author (Q6508397) or corresponding author (Q36988860), we could use subject has role (P2868) as a qualifier, but we currently have no tools to help with that. I don't think there should be an item "for every author of a research paper" for all Wikidata-indexed papers anytime soon. --Daniel Mietchen (talk) 14:03, 17 August 2017 (UTC)
Awesome, thanks. Antony-22 (talk) 18:53, 17 August 2017 (UTC)

scientific papers

Is your plan to create an item for all 27 million records in PubMed? --Pasleim (talk) 16:05, 20 August 2017 (UTC)

  Yes please. Emijrp (talk) 16:47, 20 August 2017 (UTC)
In the long run (next year?), I'd say yes. Currently, however, I am concentrating on getting most of the papers from PubMed Central in (ca. 4M), along with reviews from PubMed (ca. 2M) and PubMed-based corpora on specific topics, especially papers cited from PubChem (ca. 2M) or related to viral outbreaks or general disaster response (several 100k). On that basis, my focus will shift to adding main subject (P921), author (P50), copyright license (P275) to these items for a while and on improving the workflows for that (e.g. MeSH terms). Help most welcome. --Daniel Mietchen (talk) 02:27, 21 August 2017 (UTC)

Wikidata weekly summary #274

Job queue increasing, Admin noticeboard

Please note https://phabricator.wikimedia.org/T173710 "Job queue is increasing non-stop"

Does this match your item creation?
--- Jura 11:11, 23 August 2017 (UTC)

@Jura1: Thanks for the ping. I am not familiar with this queue and do not understand much of its scope and mechanics — it's not even clear to me whether it's specific to Wikidata or not — but if it is related to Wikidata edits or page creation, then I would assume that my bot adds stuff to the queue, and in that sense, it has certainly contributed its share to the queue's growth. I do not know how the "de-queuing" happens, so cannot comment on that. I also noticed that there is an earlier peak in terms of queue growth around July 18, when I was not active (nor was my bot). --Daniel Mietchen (talk) 19:30, 23 August 2017 (UTC)
I found the explanations by Daniel in T171263 most helpful. He also linked the full documentation. The ticket was opened on the last peak. Even if it was your bot, I don't think it should have such an impact, as it seems fairly efficient in the way it creates items.
--- Jura 19:49, 23 August 2017 (UTC)

Wikidata weekly summary #275

Wikidata weekly summary #276

Problem with saving edits

Daniel, I just resent my email on my edit problem. Sorry if this stems from a stupid mistake on my part, but I'll need a solution anyway! Thanks, Walkerma (talk) 11:44, 7 September 2017 (UTC)

I think this boils down to the problem that we cannot easily indicate ranges of values. I gave more details in the email. --Daniel Mietchen (talk) 23:13, 7 September 2017 (UTC)

Wikidata weekly summary #277

Wikidata weekly summary #278

Symptoms, Medical signs, Diagnosis methods

Hi, I noticed that many (e.g. Durkan's test (Q5316666)) of these medical topics don't have any "subclass of", "instance of" fields.

As you seem to be knowledgeable in medicine, maybe you can devise a good import strategy and data source for such information. For example, import via petscan and en:Category:Medical signs, en:Category:Symptoms, en:Category:Medical tests, en:Category:Physical examination as a starting point? Or some well-known medical dictionary? I can't do that, as I'm not in medicine. Thank you! Chire (talk) 17:03, 22 September 2017 (UTC)

Hi, thanks for checking in here. Yes, this is within the scope of WikiProject Medicine that I am hereby pinging:
  Notified participants of WikiProject Medicine
My preference in such cases is to go for expert-curated databases when available, but in cases like these when we have no statement whatsoever, I agree that some basic import from other wikis might be useful as a starting point. --Daniel Mietchen (talk) 18:07, 22 September 2017 (UTC)
Regarding medical signs and symptoms, I have a MeSH Mix'n'match set up that aims to tag most of these with MeSH IDs. Once that's done, it would be nice to use the MeSH tree structure to assign subclass of statements (and instance of sign or symptom or whathaveyou). There are other resources that could also be used, but they are generally restricted by licensing issues, such as HPO and SNOMED. For medical procedures and diagnosis, the only resource I'm aware of is SNOMED, which has an enormous and well-organized set of these (I can't find the example you gave, but here's another example). The issue, again, is with the licensing. Gstupp (talk) 18:26, 22 September 2017 (UTC)

Wikidata weekly summary #279

Wikidata weekly summary #280

Missing authors

Hi. I noticed that in Q25909434 there are many authors missing (you can verify this e.g. on NCBI). I don't know if these types of omissions are common, but I am hoping you know some effortless way of fixing that. --22merlin (talk) 14:28, 6 October 2017 (UTC)

Thanks for the notification. It's not the only such case, but they are very rare. I have tried to fix it for this article, but the tools we have could not do it, since the author list provided by the PubMed API ends with "Abecasis, Gonçalo R", listing everyone else as investigator instead. The CrossRef API handles this differently, but I don't have a way to harvest that in a simple fashion. --Daniel Mietchen (talk) 23:11, 6 October 2017 (UTC)

Wikidata weekly summary #281

Taxon items with audio files

Hello, a volunteer and I have been working to share audio files from the Natural History Museum in London. As a result there are some 1,750 new audio files on Commons. Magnus Manske came up with this tool showing where file names correspond to items on Wikidata. As you've got a list of taxon items with audio files I thought this collection of files might be relevant. Thinking about adding the audio files to items, do you know of a relevant community who might be interested in that? Richard Nevell (WMUK) (talk) 12:34, 10 October 2017 (UTC)

Great work, and thanks for the ping! Yes, I'm definitely interested, and so is User:Pigsonthewing. It would be great to have a set of queries to identify accounts that have uploaded the audio files on that Wikidata list, or added them to items or other wiki pages. Worth a try might also be the various biology or audio WikiProjects. Great stuff for a hackathon/ editathon too. --Daniel Mietchen (talk) 12:53, 10 October 2017 (UTC)
PS: We could also think of opening a new channel in WikiRadio. --Daniel Mietchen (talk) 12:55, 10 October 2017 (UTC)
Good thinking about WikiRadio, that would be a nice way of presenting some of the audio.
Magnus has also created this tool which shows when an item on a taxon has an audio file but the corresponding Wikipedia page doesn't, which should help with filtering the files through to various wikis. Richard Nevell (WMUK) (talk) 13:42, 10 October 2017 (UTC)
Cool. I've done a few already and noticed that the metadata is very often in the audio track, which does not make it suitable for WikiRadio, nor for non-English wikis. --Daniel Mietchen (talk) 13:48, 10 October 2017 (UTC)
ah, you're right - most of them would need trimming for use in other wikis. Thank you for making those edits! Richard Nevell (WMUK) (talk) 14:06, 10 October 2017 (UTC)
@Richard Nevell (WMUK): They should be trimmed for use in Wikidata too; and the introductions transcribed into the text descriptions on Commons. The original files should be retained, and the trimmed versions uploaded as derivatives, under a new name. Do we have any idea of how many have spoken introductions; and which files are affected? If the answer to the latter is no, the first job would be to check, and add them to a specific category. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:38, 11 October 2017 (UTC)
It should be pretty much all of them, though there is a small proportion where there's already been some trimming so it's just the animal sound. Richard Nevell (WMUK) (talk) 16:01, 11 October 2017 (UTC)
Looks like the seed for an audio trim-athon... --Daniel Mietchen (talk) 16:12, 11 October 2017 (UTC)

Wikidata weekly summary #282

Data about citations

When it comes to creating items of new papers, there are seldom statements about citations. Is that because your data source doesn't contain information about citations? ChristianKl (talk) 16:00, 18 October 2017 (UTC)

Yes. Also, while the bibliographic data is stable, citation data is less so (i.e. the cited references may or may not have a Wikidata item initially, and that can change over time, and due to things like Initiative for Open Citations (Q29188397), even the licensing of citation information may change), so it makes some sense to have different tooling for bibliographic and citation data. Most of the latter is being brought here by User:Harej in case you'd like to dig deeper into that. --Daniel Mietchen (talk) 18:15, 18 October 2017 (UTC)

Understanding how the Wikidata community is editing items

Dear Daniel,

Together with some colleagues, I am running a survey to understand the way Wikidata editors edit items over time. We would like to know the extent to which you choose the items you want to edit, the criteria that you use to decide what to edit, the situations that trigger your edits, and the way these decisions change over time.

Before we conduct the survey, we would like to be sure that the questions are clear and we would like to get some feedback from Wikidatans. Given your expertise, I am writing you directly, to kindly ask you if you could answer the survey and give us feedback.

You will need around 10 minutes to complete the survey.

We are not interested in the edits of particular users, but rather in the editing strategies being followed by the community of editors. That’s why the responses to this survey are anonymous.

We plan to publish the anonymous results openly. We will share the results with the Wikidata community.

Thanks a lot in advance for your collaboration!

Link to the survey: https://docs.google.com/forms/d/e/1FAIpQLScmxdvsyupNDjhzV-JodQgiscXQShksczns0PGmQbLXWpb3cw/viewform

Cristina Sarasua <csarasua@uni-koblenz.de> --criscod (talk) Institute for Web Science and Technologies, University of Koblenz-Landau, Germany Member of Wikimedia Deutschland

Gianluca Demartini <g.demartini@sheffield.ac.uk> Information School, University of Sheffield

Just for the record, and to get the archiving working properly: the above entry was posted by User:Criscod on 28 August 2016. --Daniel Mietchen (talk) 15:13, 20 October 2017 (UTC)
Thanks. I should have added that in the signature back then, you are right. :) criscod (talk) 12:21, 22 October 2017 (UTC)

stop for a while please

Hey :) We currently have a very high dispatch lag. This causes changes to show up on Wikipedia only very late. This is not ok. Can you please slow down for a while until it is down? You can check it here: Special:DispatchStats. --LydiaPintscher (talk) 18:59, 19 October 2017 (UTC)

OK, will do, though I don't think my bot's edits have an effect on the dispatch lag. --Daniel Mietchen (talk) 19:34, 19 October 2017 (UTC)
They should not, yeah but something is fishy and I am trying to narrow it down. Thanks a lot! --LydiaPintscher (talk)
Makes sense. I'll keep an eye on the edits and the job queue. --Daniel Mietchen (talk) 20:47, 19 October 2017 (UTC)
Cool! There is also https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch?refresh=1m&orgId=1&panelId=17&fullscreen The graph at the bottom should be green for the lag to go down. --LydiaPintscher (talk) 20:54, 19 October 2017 (UTC)

Wikidata weekly summary #283

Wikidata weekly summary #284

Wikidata weekly summary #285

Wikidata weekly summary #286

New items

Hi Daniel Mietchen. As I'm trying to create 100 items in sequence, I was wondering when your bot will not be running. Currently it fills Special:NewPages. It can wait a couple of days.
--- Jura 10:54, 17 November 2017 (UTC)

Interesting. What would you need 100 consecutive items for? Have you checked whether any user has created 100 (still existing) items in a row already? I guess a good number of bots would meet this criterion, probably including mine. It's currently running continuously, but I can of course stop it any time. A convenient time for me to do that would be on Tuesday Nov 28 after 9pm CET. Would that be still OK or too late for your purposes? --Daniel Mietchen (talk) 23:42, 17 November 2017 (UTC)
100 Decameron stories .. ordinals have them .. not that it's really important, but as there are 100, why not. Ok for Tue.
--- Jura 23:47, 17 November 2017 (UTC)
Just saw that is Nov 28. Let's coordinate here then. --Daniel Mietchen (talk) 02:42, 18 November 2017 (UTC)
Oh in 10 days? Could we do this weekend instead?
--- Jura 07:28, 18 November 2017 (UTC)
Not sure - running an event in a different time zone. Will try tonight though, and ping you once I've stopped the bot. What time do you plan to be around tomorrow (Sunday) UTC? --Daniel Mietchen (talk) 21:22, 18 November 2017 (UTC)
@Jura1: — I just stopped it. Will resume tomorrow. --Daniel Mietchen (talk) 21:30, 18 November 2017 (UTC)
Thanks. It worked out.
--- Jura 15:30, 19 November 2017 (UTC)

Wikidata weekly summary #287

Hi, I randomly (literally) arrived at Pre-clustering of the B cell antigen receptor demonstrated by mathematically extended electron microscopy (Q42128282), which your academic bot created. I quickly found that 3 of the 6 authors have a Wikidata item (naturally, the last 3 named authors) and manually migrated them from author name string (P2093) to author (P50). I don't know how often this happens, but I thought you had better know about it, and maybe even find a way to minimize the loss of data. Good luck and thanks, DGtal (talk) 10:28, 20 November 2017 (UTC)

Thanks for checking. The bot takes the information from PubMed (Q180686) or PubMed Central (Q229883), and if author identification by way of ORCID iD (P496) is provided there for an author that has a Wikidata entry, the item about the paper will link to the item about the author by way of P50. Otherwise, just the string of the author's name will be recorded by way of P2093. No need to do the conversion in an entirely manual fashion — we have a tool that can help considerably, though manual oversight is still warranted. I am using it all the time, and I recommend that you give it a try. --Daniel Mietchen (talk) 03:23, 21 November 2017 (UTC)
Thanks for the info. Unfortunately ORCID is still much less common than it should be, so there are probably thousands of misses by now. DGtal (talk) 12:30, 21 November 2017 (UTC)
Yes, we have over 1 million P50 statements versus 43 million P2093 statements, and we're actively reaching out to institutions and libraries to share with Wikidata the author disambiguation and related curation work they are already doing. If you could think of potential partners in Israel or elsewhere who would be interested in this, I'd be happy to dig deeper. --Daniel Mietchen (talk) 21:31, 21 November 2017 (UTC)
I can't elaborate too much but the current infrastructure in Israeli academia doesn't have the relevant data yet, but should have it in a few years, so we need to wait a while. DGtal (talk) 09:19, 23 November 2017 (UTC)
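The linking rule described in this thread (author (P50) when the source supplies an ORCID iD that belongs to an existing item, author name string (P2093) otherwise) can be sketched roughly as follows. This is an illustration only, not the bot's actual code: the function name `author_statements`, the input shapes, and the `orcid_to_item` lookup table are all assumptions.

```python
from typing import Dict, List, Optional, Tuple


def author_statements(
    authors: List[Tuple[str, Optional[str]]],
    orcid_to_item: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Choose P50 or P2093 for each (name, orcid) pair, preserving author order.

    `orcid_to_item` is a hypothetical lookup from ORCID iD to the ID of an
    existing author item.
    """
    statements = []
    for name, orcid in authors:
        item = orcid_to_item.get(orcid) if orcid else None
        if item is not None:
            # ORCID iD present and matched to an existing item: link the item.
            statements.append(("P50", item))
        else:
            # No ORCID iD, or no item with that ORCID iD: keep the plain string.
            statements.append(("P2093", name))
    return statements
```

Under this rule, an author who has an item but no ORCID iD in the source record ends up as a P2093 string, which is the situation reported at the top of the thread.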

Wikidata weekly summary #288

Property migrator

I have recently shared my User:Deryck Chan/Property migration tool with Charles Matthews at a Cambridge Wikimedia meetup. Charles said you may be interested in this tool so I came to drop you a message. Let me know if my tool can be of any use - it was originally developed a few weeks ago to catch the big fish in the deprecation of P794 (P794). Deryck Chan (talk) 14:52, 27 November 2017 (UTC)

@Deryck Chan, Charles Matthews: Thanks to both of you — this looks useful indeed. --Daniel Mietchen (talk) 03:07, 28 November 2017 (UTC)

Strip [ and ] ?

Hi Daniel Mietchen, it seems that some of the sources the bot uses add square brackets around titles (" [ ] "). The bot imports these. Other sources for the same article don't add them. I don't think they are actually part of the title and could be dropped. Sample: [2].
--- Jura 10:38, 29 November 2017 (UTC)

Yes, the brackets are not part of the original title, but the bot takes the titles from PubMed, which uses the brackets to indicate that the original title was not in English, without giving the original title. I don't have a way to fix that right now but here is a query that catches such paper items, so that we can work on them systematically when we do have a fix. --Daniel Mietchen (talk) 17:19, 29 November 2017 (UTC)
Title statements would need to use the original language. Looks like I started out with one that's also published in English (Q41388344).
--- Jura 15:10, 2 December 2017 (UTC)
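A minimal sketch of how such bracketed titles could be detected and unwrapped, assuming the whole title is wrapped in a single pair of square brackets, optionally followed by a period. The function name `strip_translation_brackets` is hypothetical, and this is not something the bot does.

```python
from typing import Tuple


def strip_translation_brackets(title: str) -> Tuple[str, bool]:
    """Return (title_without_brackets, was_bracketed).

    Only titles wrapped as a whole in one pair of square brackets are changed;
    brackets that enclose just part of a title are left alone.
    """
    core = title.strip()
    if core.endswith("]."):
        core = core[:-1]  # assumption: a trailing period may follow the bracket
    if not (core.startswith("[") and core.endswith("]")):
        return title, False
    depth = 0
    for i, ch in enumerate(core):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            # The opening bracket must close only at the very end of the title.
            if depth == 0 and i != len(core) - 1:
                return title, False
    return core[1:-1].strip(), True
```

The unwrapped string is still the English translation, not the original title, so as noted above it would not be suitable as a title statement in the original language.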

Wikidata weekly summary #289

Wikidata weekly summary #290

Wikidata weekly summary #291

Author items

Hi, thanks for creating items for scientific article authors. You may consider adding occupation (P106) scientist (Q901) to these items, as this will be correct for most of them.--Jklamo (talk) 09:52, 21 December 2017 (UTC)

Thanks for the note. I'm always trying to be more specific (e.g. using entomologist (Q3055126) rather than scientist (Q901)) but agree that having a domain-general P106 statement is often better than having none. Will probably go with the even more generic researcher (Q1650915) by default, though. --Daniel Mietchen (talk) 14:18, 21 December 2017 (UTC)
For sure, more specific is better, but I randomly found those with no occupation (P106). From the P106 usage stats, it seems that scientist (Q901) is a bit more popular (3802) than researcher (Q1650915) (1711).--Jklamo (talk) 14:44, 21 December 2017 (UTC)
I'm doing a batch run for researcher (Q1650915) now, which can then be used as a basis for more fine-grained tagging. --Daniel Mietchen (talk) 15:23, 21 December 2017 (UTC)

Wikidata weekly summary #292
