Welcome to Wikidata, Charles Matthews!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards!

Bináris (talk)

Hello there! I am w:User:Charles Matthews. Charles Matthews (talk) 21:05, 16 February 2013 (UTC)

ODNB references

Hi CM, I mentioned these back in April at w:Template talk:Cite ODNB/Archive1#Subscription, that the references (and Archival material) cited by ODNB can be accessed by the URL http://www.oxforddnb.com/view/references/ODNBid. Reading the blog by Andrew, I was wondering if you have thought of somehow accessing the ODNB references by Wikidata. Solomon7968 (talk) 06:54, 1 December 2014 (UTC)

Yes, I recall the discussion. Andrew and I met the ODNB folk on Tuesday, at a party for the tenth anniversary of the ODNB; and we have cooperation with them on metadata. So that's a line I could take up with Andrew. Charles Matthews (talk) 06:58, 1 December 2014 (UTC)
Great! However I have noticed that the ODNB doesn't list the book reference author(s) in full names and the name is often difficult to figure out, see wikisource:User talk:Billinghurst#Full names regarding this but that's a separate story. On the metadata front what do they say about your filled-in references idea? Solomon7968 (talk) 07:26, 1 December 2014 (UTC)
Yes, it's not so clear to me what can be done. Charles Matthews (talk) 07:27, 1 December 2014 (UTC)

You are a composer

Hi Charles, I was Listening to Wikidata this morning, and amongst the profusion of bot edits, I saw you were contributing to the medley! Fabian Tompsett (WMUK) (talk) 09:13, 16 January 2015 (UTC)

Germanic umlaut: Gerstäcker, Friedrich

Is this edit an old / known problem? Cheers --Kolja21 (talk) 11:55, 6 April 2015 (UTC)

Yes, there are many problems with special characters in the Appleton's catalog (on the mix'n'match tool). Charles Matthews (talk) 16:59, 6 April 2015 (UTC)

Fellow of the Royal Society ID (P2070)

Fellow of the Royal Society ID (P2070) is ready. --Tobias1984 (talk) 18:00, 14 September 2015 (UTC)

Wrong merges

I had to revert all three merges of the Thai administrative units you did earlier today, because they were wrong - there are same-named units at different administrative levels, which are NOT the same. Please do not merge when you are not sure about it. Ahoerstemeier (talk) 10:40, 15 October 2015 (UTC)

Thank you for the information. Charles Matthews (talk) 10:48, 15 October 2015 (UTC)

WikiData query

Hi Charles, we met at the recent Wikidata training in London, and I have been inspired to hold a Wikidata event in the coming weeks. The aim will be to improve the DWB Wikidata using this query. The page suggests that the data should be edited in Autolist 2. Can I just confirm that it is OK for our volunteers to work off the above page?

As for Autolist2: the comment is perhaps a little misleading. Autolist2 is useful for adding a given statement to a whole list of items. It is perfectly fine to go into items and edit them from Autolist1, of course.
For instance, and I did this for the ODNB genders myself, suppose you go through and find all the women, and mark them as female. Then when you run the query again, it will be all male, given that CLAIM[31:5] guards against families and suchlike.
So to finish the job you can run it in Autolist2, and tell it to add "sex or gender = male" to all of them, a big timesaver.
Good luck with it all, and glad to help. Charles Matthews (talk) 11:51, 9 November 2015 (UTC)
Great, Thank you Charles. Jason.nlw (talk) 12:43, 9 November 2015 (UTC)

DNB property proposal

If you are in favor of the property, would you support it explicitly? --- Jura 09:34, 17 November 2015 (UTC)

Well, OK. It would help me, but I wasn't sure it was best possible. Charles Matthews (talk) 09:41, 17 November 2015 (UTC)
I suppose it depends what you want to do with it. To just let it rest there, a complex version of P1433 might be the better solution. For any practical uses, personally, I think a separate property works better. --- Jura 11:26, 17 November 2015 (UTC)
I have a general theory that "described by source" will become important for Wikisource. If W is some kind of collective work, such as the DNB, then a separate property for W does make it easy to call up the scope of W, i.e. the items here that form the main subjects of articles in W. The data items attached to the articles here make it possible to add the authors there, and so to reconstruct the author list, and (author, article) pairs of items. These are the major applications I see of Wikidata to collective works on Wikisource. So since the separate property is positive rather than negative for these applications, I have no objection. I did think others might see other issues. Charles Matthews (talk) 12:42, 17 November 2015 (UTC)
I hadn't seen it that way, but it makes sense. You could do the same with two distinct properties, but I see your approach. Given that it's mainly Billinghurst and yourself who work with it, I'd go with the solution you prefer. If you do want to test, maybe a smaller work would be better. --- Jura 15:10, 17 November 2015 (UTC)
For what I am doing now, which is hunting ODNB items which have not yet been tagged (because the initial mix'n'match list wasn't complete), it would be easier to have the value in "described by source" simply the DNB edition; and the Wikisource page could become the reference URL. This has the advantage of finding the DNB items rather explicitly. But then "described by source" still isn't the opposite relation to "main subject". Charles Matthews (talk) 15:17, 17 November 2015 (UTC)
Now that they are trying to make all identifiers into URLs, maybe an approach for Wikisource should be looked into as well. I added a few more lists here. In terms of queries, I think it's equivalent. It's just that you can't run the daily constraint reports on it (compared to a separate property). I try to finish importing main subjects from WS. Once done, maybe we can convince Magnus to do a Mix-and-Match between Wikisource and Wikidata for DNB. --- Jura 15:57, 17 November 2015 (UTC)
We talked about that idea, some time ago. I know there are some easy tasks for main subjects: the cases for DNB01 and DNB12 where there is a link to enWP. Magnus did DNB00 only (20K+) to get it started. Matching is otherwise quite hard work. Thanks for your help. Charles Matthews (talk) 16:05, 17 November 2015 (UTC)

When adding multiple VIAFs

Hi CM. When adding multiple VIAF identifiers as per Kyrle Bellew (Q5569477) it would be helpful if you could assign one of the VIAFs as the preferred rank [the top of the little boxes] (I usually choose the lower number). You will see that I have done it in this case. This will allow the AC templates on WPs/WSs to be populated properly rather than confused in their presentation. Note that the same applies for images, where a ranking is also useful. Thanks. Hope that you are well.  — billinghurst sDrewth 01:58, 4 January 2016 (UTC)

Thanks, I'll bear it in mind. Just about getting over Xmas here. Charles Matthews (talk) 08:07, 4 January 2016 (UTC)

Contributor addition, it is meant to flow from a work listing the contributors

Hi CM. I have previously done some of the contributor = DNB as you did at [1]. I was told that I had it arse about. That addition could only be done to the DNB item itself and there we would list the contributors, and do we do that per volume, or per series. There is currently no reverse property to tie a person to where they contributed, and I gave up and didn't propose it. Let us not talk about the difficulties of listing all contributors to Scientific American.  — billinghurst sDrewth 07:01, 14 July 2016 (UTC)

OK, this was driven by discussion at w:Wikipedia talk:WikiProject Dictionary of National Biography. It seemed a reasonable small project to me, and has proved useful so far, as I said there. In fact if you look at w:List of contributors to the Dictionary of National Biography, you can see we could have a better Listeria-generated page now. And also see w:User talk:Rich Farmbrough#List of contributors to the Dictionary of National Biography. I would like to treat this on the basis that Wikidata guidelines are not yet codified. Charles Matthews (talk) 07:10, 14 July 2016 (UTC)

Property proposals

Please note [2] - your edit broke an existing proposal; please paste your code into a new page for each proposal (so that they are on your watchlist). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 25 July 2016 (UTC)

Edward Nicholas birth and death dates

In recent edits you assert that the birth and death dates for Edward Nicholas are stated in the Oxford Dictionary of National Biography, and that the dates are in the Gregorian calendar. I find this highly unlikely. The United Kingdom did not adopt the Gregorian calendar until 1752, more specifically, Wednesday 2 September 1752 was followed by Thursday 14 September 1752. Articles about the United Kingdom, especially if published in the United Kingdom, normally use the calendar that was in force at the time of the event, so it is highly probable that the dates of Edward Nicholas' birth and death are stated in the Julian calendar in the Oxford Dictionary of National Biography. Please check your information. Jc3s5h (talk) 16:55, 3 September 2016 (UTC)

Yes, Julian is likely. I have not documented the ODNB's editorial policy on dates: it is probable that the default is Julian to 1752, as you suggest. The default here, unfortunately, is Gregorian flagged up, which I don't like. Charles Matthews (talk) 16:58, 3 September 2016 (UTC)
I suggest you determine the ODNB's policy. If it is not explicitly stated, you could compare birth dates in the suspect range with another source, such as American National Biography, which explicitly states on pages xxi to xxii that they use Julian when that calendar was in force, but treat January 1 as the beginning of the year even when England and Wales treated March 25 as the beginning of the year. It is your duty as an editor to either determine the correct information, or undo all your edits in the suspect range. Although Gregorian is the default calendar, the user interface allows the default to be overridden manually. Jc3s5h (talk) 18:54, 3 September 2016 (UTC)
I am in a position to enquire of the ODNB editorial staff what the exact position is, as you suggest, and as I had in mind: as far as I can see it is not in the Help page they offer, nor is their convention on the Old Style year start. I assume their authors are asked to conform to a house style, but that is only my assumption. I am aware of the override. (I think you can omit telling volunteers their duty, as a matter of wiki etiquette.) Charles Matthews (talk) 19:08, 3 September 2016 (UTC)
So I have been sent the relevant section of the style manual for the ODNB. It does support the idea that pre-1752 dates are Julian, post-1752 Gregorian. There is a caveat about dates given for non-British events, which may use the local calendar.
This then leaves a maintenance problem. My idea would be to master how to use a SPARQL query to pull out dates marked "Gregorian", and apply it with some side conditions. I'll ask at Project Chat. Charles Matthews (talk) 09:23, 14 September 2016 (UTC)
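For anyone checking the calendar arithmetic discussed in this thread, here is a minimal Python sketch that converts a Julian-calendar date to its proleptic Gregorian equivalent via the Julian Day Number. The function names are illustrative only, and the sketch assumes the year is numbered from 1 January; it does not address the Old Style 25 March year start, which would need separate handling.

```python
from datetime import date


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Same day expressed in the (proleptic) Gregorian calendar."""
    # Python's date type is proleptic Gregorian; ordinal 1 is 0001-01-01,
    # whose Julian Day Number is 1721426, hence the offset below.
    return date.fromordinal(julian_to_jdn(year, month, day) - 1721425)


# Wednesday 2 September 1752 (Julian) was followed by Thursday 14 September (Gregorian).
print(julian_to_gregorian(1752, 9, 2))  # 1752-09-13
print(julian_to_gregorian(1752, 9, 3))  # 1752-09-14
```

A helper like this could be used to sanity-check individual date statements before changing their calendar model, but it is not a substitute for checking what the source actually states.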

Tidying the ODNB import

Hi Charles,

I've been wondering for a while how best to tidy up the ODNB imports which have the "parent item" description attached to the "child item" labels. Two queries that seem useful:

  • SELECT DISTINCT ?description1 ?item1 ?item1Label ?item2 ?item2Label
    WHERE {
      {
        SELECT DISTINCT ?item1 ?description1 ?item2
        WHERE {
          ?item1 wdt:P1415 ?whatever1 .
          ?item2 wdt:P1415 ?whatever2 .
          ?item1 schema:description ?description1 .
          ?item2 schema:description ?description1 .
          FILTER(LANG(?description1) = "en" && ?item1 != ?item2 && STR(?item1) < STR(?item2))
          FILTER(CONTAINS(STR(?description1), '('))
        }
        LIMIT 1000
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
This one gets all items which have identical descriptions *and* whose identical descriptions contain a bracket; this catches most of the cases where two paired items were imported at the same time. Some related queries to find entries which still use the ODNB summary -
  • SELECT DISTINCT ?description1 ?item1 ?item1Label
    WHERE {
      {
        SELECT DISTINCT ?item1 ?description1
        WHERE {
          ?item1 wdt:P1415 ?whatever1 .
          ?item1 schema:description ?description1 .
          FILTER(CONTAINS(STR(?description1), '<'))
        }
        LIMIT 10000
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
This gets all the ones with garbled HTML in the import (there's a couple of hundred)
  • SELECT DISTINCT ?description1 ?item1 ?item1Label
    WHERE {
      {
        SELECT DISTINCT ?item1 ?description1
        WHERE {
          ?item1 wdt:P1415 ?whatever1 .
          ?item1 schema:description ?description1 .
          FILTER(CONTAINS(STR(?description1), '['))
        }
        LIMIT 10000
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
All the ones with square brackets (usually only found in ODNB style, not ours)
  • SELECT DISTINCT ?description1 ?item1 ?item1Label
    WHERE {
      {
        SELECT DISTINCT ?item1 ?description1
        WHERE {
          ?item1 wdt:P1415 ?whatever1 .
          ?item1 schema:description ?description1 .
          FILTER(CONTAINS(STR(?description1), '),'))
        }
        LIMIT 10000
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
Any with the bracketed dates followed by a comma and other text - usually a sign it's the ODNB description
  • SELECT DISTINCT ?description1 ?item1 ?item1Label
    WHERE {
      {
        SELECT DISTINCT ?item1 ?description1
        WHERE {
          ?item1 wdt:P1415 ?whatever1 .
          ?item1 schema:description ?description1 .
          FILTER(CONTAINS(STR(?description1), '–'))
        }
        LIMIT 10000
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
Any with a long dash rather than a hyphen (again, likely from ODNB)
Not sure how useful these are likely to be to you, but thought they might well be of interest to find the ones most likely to need maintenance. Andrew Gray (talk) 12:56, 27 May 2017 (UTC)

Thanks - certainly going to be useful. Now to try to find some time ... Charles Matthews (talk) 20:14, 29 May 2017 (UTC)

@Andrew Gray: important second thought. Among these pairs are going to be some from a bot batch where, erroneously, incorrect dates were added. On my conscience is the need to go over that whole batch again. If further SPARQL magic could fish out candidates, say by using P1415 and exact matches of birth and death years, perhaps that check could be made with less drudgery. Charles Matthews (talk) 20:50, 31 May 2017 (UTC)

Hmmm - interesting. All ODNB items where birth1 = birth2 and death1=death2? One complication here is that a lot of items will already have dates imported from enwiki (often wrong, via old DNB), or manually specified to day level rather than year, which will throw it out. But I'll see if I can work out something... Andrew Gray (talk) 10:29, 3 June 2017 (UTC)

Not forgotten about these. They have been copied in to User:Charles Matthews/Queries, which I suppose one day will grow into training material. Charles Matthews (talk) 09:36, 13 June 2017 (UTC)

Wikidata isn't really a triple store

In http://moore.libraries.cam.ac.uk/meet-your-wikimedian-residence/extract-transform-load you seem to write that Wikidata is a triple store. I think that's misleading. Wikidata isn't focused on 3D reality. Good Wikidata claims have references, which makes them 5D. Very often they also have qualifiers that add additional dimensions. If qualifiers are too complicated for WikiFactMine that's okay, but you are still not left with triples but with 5D entities that include sources.

Apart from that it feels strange to write blog posts without an ability for readers to leave comments when your goal is community outreach. ChristianKl (talk) 14:53, 14 June 2017 (UTC)

@ChristianKl: Thanks for the comments. I'm discussing feedback with the library webmaster.
As to "Wikidata is a triple store", that is obviously not the whole truth, but it is also not untrue: it stores triples. It is not a question of WikiFactMine, but of the intended audience, which is largely librarians rather than tech people. I could also talk about S(⌊R(x,y)⌋, z) being the way a quintuple with properties R and S, S qualifying the statement "R(x,y)", resolves into two triples, but that would be more pleasing to logicians.

Charles Matthews (talk) 15:19, 14 June 2017 (UTC)
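As an illustration of the point about S(⌊R(x,y)⌋, z), the following Python sketch shows one way a qualified statement (a quintuple) can be resolved into two triples by giving the base statement a name. The `Triple` type, the `resolve` function and the bracketed naming scheme are hypothetical choices made for this sketch; it is not a description of how Wikidata's own RDF export lays out statements.

```python
from typing import NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def resolve(x: str, r: str, y: str, s: str, z: str) -> Tuple[Triple, Triple]:
    """Resolve the quintuple (x, R, y, S, z) into two triples.

    The first triple is the base statement R(x, y). The second takes a
    name for that statement as its subject and attaches the qualifier
    S with value z to it.
    """
    base = Triple(x, r, y)
    statement_name = f"⌊{r}({x},{y})⌋"  # hypothetical label for the statement
    return base, Triple(statement_name, s, z)


base, qualifier = resolve("x", "R", "y", "S", "z")
print(base)       # Triple(subject='x', predicate='R', obj='y')
print(qualifier)  # Triple(subject='⌊R(x,y)⌋', predicate='S', obj='z')
```

References and ranks could be attached to the named statement in the same way, each as a further triple.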

blogs without comments are a commonplace now, given the headache of monitoring the rampant incivility. you should really change the thumbnail image. the 15th birthday party one is bigger and more recent. cheers. Slowking4 (talk) 19:13, 15 June 2017 (UTC)
  • @Charles Matthews: My argument isn't about technology but about ontology. I do think librarians care about ontology and how data gets modeled. A bit more than a decade ago people like Barry Smith came to the conclusion that knowledge isn't just made up of triples. Barry Smith wrote papers like Against Fantology and formulated Basic Formal Ontology, the paradigm of 4D perspectives on reality. Ontologies like the Ontology for Biomedical Investigations are based on Basic Formal Ontology.
Wikidata's qualifiers allow it to express 4D perspectives, and it's worthwhile for a librarian who wants to understand Wikidata to understand that capability.
Aside from the 3D/4D distinction of Barry Smith there's also the issue of sources. In Wikidata we don't want to store "X has relationship R to Y" but "Book B said 'X has relationship R to Y'".
Our data model is quite different from the triples that the Integrated Authority File of the German national library uses. And I haven't even talked about ranks which add another dimension to the data. ChristianKl (talk) 22:17, 19 June 2017 (UTC)
Thank you for the detailed comments, which I'll try to digest. Charles Matthews (talk) 04:04, 20 June 2017 (UTC)

On the comments: there is now an email link for me on the blog page at the Moore Library. More recent is better!? Charles Matthews (talk) 03:18, 16 June 2017 (UTC)

future scholars will want to track the graying of the beard; need to be "state of the art"; the WMUK EDU one is good also if artistically cropped. Slowking4 (talk) 01:48, 18 June 2017 (UTC)

Wow, the needs of future pogonologists! You're right that I wasn't taking those into account. Charles Matthews (talk) 03:21, 19 June 2017 (UTC)

fyi, there is talk of a GLAM user group https://meta.wikimedia.org/wiki/Wikimedia_GLAM_User_Group ; you might want to write up (copy paste) your exploits at GLAM newsletter https://outreach.wikimedia.org/wiki/GLAM/Newsletter . Slowking4 (talk) 13:47, 23 June 2017 (UTC)

Thanks, useful. Charles Matthews (talk) 09:24, 24 June 2017 (UTC)

Donald Trump pseudonyms

Greetings, Charles M., from Deborahjay, "the small and meek" but nevertheless recently bold on my reading of w:Donald Trump pseudonyms on which Item pseudonyms of Donald Trump (Q26869209) is based, not being a Wikimedia list page but rather an actual article. I'm particularly unclear on the Statements, which I regret to have left in something of a muddle. Your advice is solicited at Talk:Q26869209. -- Many thanks, Deborahjay (talk) 08:50, 28 July 2017 (UTC)

Hi - I have contributed over there. Charles Matthews (talk) 09:35, 28 July 2017 (UTC)

Pietro Chiesa

You seem to have made some mix ups in Pietro Chiesa (Q38019800) and Pietro Chiesa (Q38020071). You might want to recheck the items and links. Multichill (talk) 12:55, 7 September 2017 (UTC)

OK, I'll have a look. Charles Matthews (talk) 12:57, 7 September 2017 (UTC)
 Y Did a mix'n'match check today. Charles Matthews (talk) 09:54, 25 May 2018 (UTC)

Varvitsiotis' image

Hello! (Hoping this is the right place for such a request, if not I'm sorry for being a trouble). I've noticed that you undid some changes to Miltiádis Varvitsiótis (Q12881046) re. the image. There seems to be some confusion here. The image added clearly belongs to Varvitsiotis' grandson (Miltiades Varvitsiotis (Q12881047)) with whom they share the same name. Is it possible to somehow flag any entry or both so bots won't add the wrong image? --cubic[*]star 20:56, 25 October 2017 (UTC)

OK, sorry, that was careless of me. I'm done with politicians now. Charles Matthews (talk) 20:58, 25 October 2017 (UTC)
@CubicStar: It looks like the grandfather's entry had an interwiki & commons category for the grandson, which might be why the image got added. I've sorted them out so hopefully it won't reappear. Andrew Gray (talk) 11:55, 26 October 2017 (UTC)
@Charles Matthews: Thank you! --cubic[*]star 16:24, 26 October 2017 (UTC)

2x Thomas Scott

Can you please take a look at Thomas Scott (Q19363661) (1780-1835). There is no reliable source for this item.

I've tried to separate both items, but gave up. --Kolja21 (talk) 17:17, 12 November 2017 (UTC)

101024920 is the correct OBIN for Thomas Scott (1780–1835). As it says in the article, "Thomas Scott (1780–1835), the fourth son, born on 9 November 1780, was educated at Queens' College, Cambridge, graduating BA in 1805 and MA in 1808." It is a subarticle, and Thomas Scott (1780–1835) is (correctly) a co-subject in the article about his father Scott, Thomas (1747–1821), "Church of England clergyman and biblical scholar", who has OBIN 101024919. It is confusing here because the father and son are both called Thomas. Charles Matthews (talk) 17:33, 12 November 2017 (UTC)
Thanks for the fast reply. So I need a subscription to see this info? --Kolja21 (talk) 17:41, 12 November 2017 (UTC)
Yes, the ODNB site is behind a paywall. In the UK, one can read it with a library card. But now there are the Cambridge Alumni Database ID (P1599) and Clergy of the Church of England database ID (P3410) identifiers on the item, and they give enough identifying information. Charles Matthews (talk) 17:49, 12 November 2017 (UTC)

DNB/ODNB matches

Hi Charles,

Noticed a couple of duplicated items today where one has an ODNB ID and the other had a DNB link and for some reason we never matched them up. I've knocked up a quick query for any items with a DNB "described by source" but no ODNB entry:

SELECT ?item ?itemLabel ?instanceLabel
WHERE {
  { ?item wdt:P1343 wd:Q16014700 . }
  UNION { ?item wdt:P1343 wd:Q15987216 . }
  UNION { ?item wdt:P1343 wd:Q16014697 . } # is described by any DNB volume
  ?item wdt:P31 ?instance .
  FILTER NOT EXISTS { ?item wdt:P1415 ?odnb . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Might be of some interest to you! Andrew Gray (talk) 10:52, 18 November 2017 (UTC)

@Andrew Gray: Thanks, that's a very interesting cleanup list. By the way, I'm now of the view that section, verse, paragraph, or clause (P958) should be used to qualify, rather than stated in (P248), when giving Dictionary of National Biography, 1885–1900 (Q15987216) etc. followed by the page for the DNB article. Charles Matthews (talk) 11:42, 18 November 2017 (UTC)

Simon François Ravenet

You seem to have mixed up Simon François Ravenet (Q1771158) and Simon Jean François Ravenet (Q24685816). Can you check the authority control links you added? At least two of them were wrong and are on the other person. Multichill (talk) 21:40, 18 January 2018 (UTC)

 Y I have done a check on mix'n'match today. Charles Matthews (talk) 10:00, 25 May 2018 (UTC)

William Henry Bradley - a mixup of 2 men...


I divided William Henry Bradley (Q46690366), where you added 2 IDs, from which one was someone else. You can find the other person at William Henry Bradley (Q47484762). --Hsarrazin (talk) 20:05, 22 January 2018 (UTC)

Henry Garnett Venn - Henry Garnett Venn (Q27868514)

Who is this person who, according to source, died aged 3 days... and has no link to any other item ? what makes this item interesting ? :/ --Hsarrazin (talk) 21:32, 23 January 2018 (UTC)

The Kindred Britain link says his father is Henry Straith Venn (Q42245867). But I agree that such a short life is not interesting, usually. I did some checking, because the death date given for his mother, Maria Garnett, is inconsistent with the life of the son. It turns out that Kindred Britain is reliable on relationships; but there are some incorrect dates.
In this case, the death date of Maria Garnett is wrong: she lived to 1960.[3]. And the 1908 life of Henry Garnett Venn is correct.[4] Thank you for pointing out this issue.
Charles Matthews (talk) 21:53, 23 January 2018 (UTC)
I don't know Kindred Britain. What's the content/interest of this database? Are the people in it notable for what they did or for who they are (nobility)? --Hsarrazin (talk) 08:36, 24 January 2018 (UTC)
It's from Stanford University. It began with a study of the background of W. H. Auden (Q178698) the poet. It has grown to about 30K entries, illustrating the networks associated with British literature. Charles Matthews (talk) 08:44, 24 January 2018 (UTC)
ah, interesting ! so it's only about links between people, yes ? have a nice day ! --Hsarrazin (talk) 08:55, 24 January 2018 (UTC)


Pinging you (amongst others) since I suspect this might be relevant to the ContentMine dictionary, as to whether or not you think it would be useful to be able to record the "broader" field in thesauruses that have one, allowing one to reference the thesaurus structure in WDQS queries. Property proposal at Wikidata:Property_proposal/broader_concept. Jheald (talk) 19:26, 10 February 2018 (UTC)

@Jheald: I'm not getting this immediately - need to think some more. Most dictionary terms in the ContentMine sense are thing-like and concrete (diseases, drugs, genes ...) Charles Matthews (talk)
Ah, okay. I wasn't sure what degree of hierarchical organisation there was in the CM dictionary, nor how closely it corresponded to hierarchical statements on Wikidata.
For a different example, consider eg the Getty Art & Architecture Thesaurus (Q611299), as referenced by property Art & Architecture Thesaurus ID (P1014). The first four levels can be seen at User:Jheald/aat, or the full list of terms at User:Jheald/aat/full.
The question is whether, for ingestion / confirmation / quality control / sourcing / reference / comparison / extraction, it would be useful to have a property to record the hierarchical structure manifested in the external source, as distinct from our hierarchical structure that is represented by subclass of (P279), instance of (P31), facet of (P1269) etc. Jheald (talk) 18:11, 12 February 2018 (UTC)
There is a whole can of worms in mereology (Q1194916). If there is a semantic web version of part-whole that seems viable here, then it could be of interest. Charles Matthews (talk) 10:08, 13 February 2018 (UTC)

HoP in mix-and-match

Just noticed this edit, which is worrying me a little - in theory there shouldn't be any entries for the post-1690 volumes coming from mix-and-match, as I've done them all, so they should have been picked up by the automatching. It looks like the URLs have been garbled a bit when imported, and so they're not picking up as identical. Do you know if we can do anything about this, or whether we need to ask Magnus to rebuild the list using non-URLencoded forms?

It seems we now have about seventy like this (the Q312591 entry is a known duplicate, the others are all newish). I won't try and remove them just yet since it will only cause problems, but it's something we do need to work out Andrew Gray (talk) 17:14, 3 April 2018 (UTC)

@Andrew Gray: Yes, it started to worry me too when I had a closer look. I thought I was adding a small percentage of missing ones from the far end of the unmatched set on mix'n'match. Some at least are trivial URL variants. I may have to revert them all, which is a bit frustrating.
There will be some better matches to make in that set. It is anyway overwhelmed with soft redirects which are clearly irrelevant. I was avoiding those. The sort of thing that has happened is substitution for parentheses in the URL. That anyway is clear enough in the identifier. Feel free to revert those cases. I'll have a look now. Charles Matthews (talk) 17:44, 3 April 2018 (UTC)
The mix-and-match list has 22634 entries including redirects; there's a "redirect-free" set of URLs here which has 21404. However, they use the percent-encoded format (the same as the one currently on mix-and-match). The ones on Wikidata at the moment use unencoded forms for ( ) ’ - I think those are the only special characters, but there might be a few accented letters.
A list of all missing IDs as of last night is at [5] - there's about 6000, which sounds about right if M&M thinks there's 7000 and it's got all the redirects. Andrew Gray (talk) 18:07, 3 April 2018 (UTC)
Should be cleaned up now, and the query reflects that. I have copied you into a mail about it all. Charles Matthews (talk) 18:33, 3 April 2018 (UTC)
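The mismatch described in this thread, percent-encoded identifiers on one side and unencoded ones on the other, can be checked mechanically. Below is a minimal Python sketch that decodes identifiers and groups the raw forms that collapse to the same value. The function name and the sample identifiers are invented for illustration; they are not real History of Parliament IDs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set
from urllib.parse import unquote


def group_encoding_variants(identifiers: Iterable[str]) -> Dict[str, Set[str]]:
    """Group raw identifiers that differ only by percent-encoding.

    Returns a mapping from the decoded form to the set of raw forms,
    keeping only decoded forms that appear in more than one raw spelling.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for raw in identifiers:
        groups[unquote(raw)].add(raw)
    return {decoded: raws for decoded, raws in groups.items() if len(raws) > 1}


# Hypothetical sample identifiers, for illustration only.
sample = [
    "example-member-%281650-1720%29",
    "example-member-(1650-1720)",
    "o%E2%80%99example-john",
    "another-member",
]
print(group_encoding_variants(sample))
```

Here only the first two entries are reported as variants of one another; the apostrophe example decodes to a form containing ’ but has no unencoded twin in the sample.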


  Hello, I'm Marsupium. Before creating an item, please make sure it doesn't already exist. I've merged Walter Beck (Q38074585) with Walter Beck (Q38074573). If you have any questions, you can leave me a message on my talk page. Thanks! --Marsupium (talk) 07:24, 7 June 2018 (UTC)

Henry Kingsbury

Can you explain these edits? Henry Kingsbury (Q5724354) is obviously a different person than Henry Kingsbury (Q53508607). Could you please clean this up? Also in mix'n'match. Multichill (talk) 19:42, 28 June 2018 (UTC)

@Multichill: A mistake on my part. Thanks for pointing this out. But there was nothing to fix on mix'n'match. Looking closely at the histories, I see that I matched to the wrong item, undid the matches, and matched to the correct item, within a couple of minutes. Normally mix'n'match would handle the undoes, I think. Under some conditions it might not.
In any case, I have removed the incorrect statements from Henry Kingsbury (Q5724354). My understanding is that this was an anomaly.
Charles Matthews (talk) 19:45, 28 June 2018 (UTC)
Mix'n'match undoes nothing on wikidata, it just unmixes. One needs to separately reverse those entries. Been there, and fouled that up myself in my earlier days with the tool.  — billinghurst sDrewth 07:38, 29 June 2018 (UTC)

GTAA on Wikidata

You receive this message because you previously matched persons from the GTAA (the Thesaurus for Audiovisual Archives) with items on Wikidata. We would like to inform you about some improvements that we have made to that catalogue on Mix’n’Match. We have improved the automatic links and added additional information from our catalogue (what we’ve called ‘extracted terms’) to the terms. We hope that this makes matching the thesaurus with Wikidata that much more fun and easier. Read more about this project here or in Dutch WikiProject Dutch Media History nl here. Best! 85jesse (talk) 08:05, 18 July 2018 (UTC)

Find multiple

#Prototype focus list batch by Aleksey
SELECT ?item ?itemLabel
WHERE {
  VALUES ?doi {  "10.1186/1743-422X-7-45" "10.3748/WJG.V13.I1.48" "10.1186/1743-422x-7-45" "10.3748/wjg.v13.i1.48" }
  ?item wdt:P356 ?doi.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

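The query above lists each DOI twice, once in upper case and once in lower case, presumably because the stored DOI (P356) values are not guaranteed to be in one case. A minimal sketch of how such a VALUES clause could be generated for a longer batch follows; the function names are hypothetical and not part of any existing tool.

```python
def doi_values_clause(dois):
    """Build a SPARQL VALUES clause listing each DOI in upper and lower case."""
    variants = []
    for doi in dois:
        for candidate in (doi.upper(), doi.lower()):
            # Skip duplicates, e.g. when a DOI contains no letters at all.
            if candidate not in variants:
                variants.append(candidate)
    quoted = " ".join('"{}"'.format(v) for v in variants)
    return "VALUES ?doi { " + quoted + " }"


def doi_lookup_query(dois):
    """Assemble a full query in the same shape as the one above."""
    return "\n".join([
        "SELECT ?item ?itemLabel",
        "WHERE {",
        "  " + doi_values_clause(dois),
        "  ?item wdt:P356 ?doi.",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }',
        "}",
    ])


print(doi_lookup_query(["10.1186/1743-422X-7-45", "10.3748/wjg.v13.i1.48"]))
```

The printed text can be pasted into the query service; nothing here contacts the endpoint.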

health specialty (P1995)

Hi. Thank you for adding many health specialty (P1995) statements to disease items. But some of them are invalid. For instance, musculoskeletal disorder (Q4116663), urinary system disease (Q7900883), parasitic infectious diseases (Q1601794), cardiovascular disease (Q389735), etc. are not medical specialties but diseases. These items cannot be values of health specialty (P1995). Could you please fix them? Regards, --Okkn (talk) 14:09, 9 August 2018 (UTC)

Thanks for the feedback. The content of the statements is derived from the MeSH tree code (P672) statements that have been added to the disease items. Charles Matthews (talk) 14:22, 9 August 2018 (UTC)
What do you mean by “derived from the MeSH tree code (P672)”? How did you determine the health specialties from MeSH and reconcile them with Wikidata entities?
What I want to say is that health specialty (P1995) in premature ventricular contraction (Q26781137), which you have added, for example, should be cardiology (Q10379), not cardiovascular disease (Q389735). --Okkn (talk) 14:38, 9 August 2018 (UTC)
With a colleague I'm working on the ScienceSource project (see e.g. WD:SSFL) and as a preliminary we have done a conversion from MeSH descriptor ID (P486) to MeSH tree code (P672). The advantage is that MeSH tree code (P672) gives well-structured identifiers. I apologise if my first attempt to exploit the information is not so appropriate. I was last working in this area over a year ago, and I have the impression that there have been some changes made since then. Charles Matthews (talk) 14:46, 9 August 2018 (UTC)
I know the tree structure of MeSH, and your strategy of using MeSH codes is right. The trouble is that you didn't distinguish medical specialty (Q930752) from disease (Q12136). Do you understand the difference between urology (Q105650) and urinary system disease (Q7900883)? --Okkn (talk) 14:59, 9 August 2018 (UTC)
Yes, I understand that. I find it confusing that the English description for urology (Q105650) mentions only surgery, but reading the English Wikipedia article about the "branch of medicine that focuses on surgical and medical diseases of the male and female urinary-tract system and the male reproductive organs" is clearer. Thank you for understanding what I'm trying to do. urinary system disease (Q7900883) for me, this afternoon, was being used as a placeholder, really. I'm happy to make any replacements, cardiology (Q10379) for cardiovascular disease (Q389735), urology (Q105650) for urinary system disease (Q7900883), and so on.
To go into more detail, I'm working with the query at Wikidata talk:WikiFactMine/Core SPARQL#MeSH Code tree handling for "health speciality". That is a case analysis into nearly 30 cases: I can see I have started off in a clumsy way, but I will fix everything, of course. The query succeeds in extracting information from MeSH Code. Probably it can be refined, and certainly the handling of the cases should be changed. In 2017 I did similar things for ICD-9 and ICD-10 (and now ICD-11 is possible, I think) which was harder to code, but the specialty side was much more obvious.
So I have work to do here, and I'm grateful for the pointers you have given me. Charles Matthews (talk) 16:48, 9 August 2018 (UTC)
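The case analysis discussed in this thread can be sketched as a longest-prefix lookup from a MeSH tree code (P672) to a specialty item, so that the value is a medical specialty rather than a disease class. This is only an illustration: the Q-ids are the ones named above, but the prefix table is an assumption that would need checking against the current MeSH tree, and it is not the actual query on the WikiFactMine page.

```python
# Illustrative prefix table: MeSH tree code prefix -> health specialty item.
# The Q-ids come from the discussion above; the prefixes are assumptions
# and should be checked against the current MeSH tree before any use.
SPECIALTY_BY_PREFIX = {
    "C14": "Q10379",       # cardiovascular diseases -> cardiology
    "C12.777": "Q105650",  # urologic diseases -> urology (assumed prefix)
}


def specialty_for(tree_code, table=SPECIALTY_BY_PREFIX):
    """Return the specialty for the longest matching tree code prefix, or None."""
    parts = tree_code.split(".")
    # Try the full code first, then drop one trailing segment at a time.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in table:
            return table[prefix]
    return None


print(specialty_for("C14.280.067"))   # a code under the assumed C14 branch
print(specialty_for("C15.378.619"))   # no case defined, so None
```

Matching on whole dot-separated segments avoids accidental hits such as "C140" matching "C14", and returning None for unhandled branches keeps placeholder values out of the statements.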

Can your project pull this information?

I am still thinking about what your project may or may not do.

Is the following among the possible outcomes of what your project might accomplish: identifying Wikidata:Property proposal/risk factor as a term in papers, and linking it to structured Wikidata about which risk factor a paper identifies? Blue Rasberry (talk) 13:26, 15 August 2018 (UTC)

@Bluerasberry: Yes, in principle, it could do. What we'd usually be doing is searching for statements equivalent to "X is a risk factor for Y" where X runs over some definite list of "risky stuff" and Y runs over a list of, say, "diseases". The community discussion is clearly wide-ranging, but this could be a use case.
We certainly have such lists, under the name "dictionaries", for diseases, for example a list of cancers. It would be possible, with glyphosate (Q407232) in mind, to create a list for X of herbicides, or some such list of chemicals. Then what the tech would do is to find, in a bunch of papers, places where an X term and a Y term are close in the text.
The human-assisted step is then to have a person actually check the language: what does it assert, if anything, about an association between the herbicide and the cancer? The conclusion would take the form of an annotation. If the "risk factor" property existed, then with caveats about when it should be applied, the annotation could note the presence of a candidate statement for Wikidata. The source would still have to pass (our version of) MEDRS to be written into Wikidata as a referenced statement.
So, text-mining with definite dictionaries; human fact-check; scrutiny of the source; and the statement is passed over here if all is well. It is a standard workflow, but clearly only as good as the various inputs. Charles Matthews (talk) 13:44, 15 August 2018 (UTC)
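The text-mining step described above, finding places where an X term and a Y term are close in the text, can be sketched roughly as follows. The dictionaries, the sample sentence and the window size are all invented for illustration; this is not the ScienceSource code.

```python
import re


def find_cooccurrences(text, x_terms, y_terms, window=10):
    """Return (x, y, snippet) triples where an X term and a Y term
    start within `window` tokens of each other."""
    tokens = re.findall(r"\w+", text.lower())

    def positions(terms):
        found = []
        for term in terms:
            words = term.lower().split()
            for i in range(len(tokens) - len(words) + 1):
                if tokens[i:i + len(words)] == words:
                    found.append((i, term))
        return found

    hits = []
    for xi, x in positions(x_terms):
        for yi, y in positions(y_terms):
            if abs(xi - yi) <= window:
                lo, hi = min(xi, yi), max(xi, yi)
                # Snippet runs from the earlier match start to the later one.
                hits.append((x, y, " ".join(tokens[lo:hi + 1])))
    return hits


# Invented sentence, used only to exercise the function.
sample = "The study examined glyphosate exposure and found no clear link to lymphoma."
for hit in find_cooccurrences(sample, ["glyphosate"], ["lymphoma"]):
    print(hit)
```

The sample shows why the human step matters: the two terms co-occur, but the sentence denies an association, which proximity alone cannot tell.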

PubMed article license task from Cambridge event

I've uploaded the script I used for generating the QuickStatements commands on Saturday to https://github.com/tmtmtmtm/sciencesource-pmc-licenses --Oravrattas (talk) 21:42, 22 October 2018 (UTC)

Many thanks! I have captured the edits on User:Charles Matthews/ScienceSource. Charles Matthews (talk) 03:34, 23 October 2018 (UTC)

OpenRefine tutorials


It was great catching up with you at the meetup! If you want to give OpenRefine a try, we have loads of tutorials to get started:

I hope you enjoy! If anything is unclear I would love to know the areas where these materials can be improved. − Pintoch (talk) 00:10, 8 November 2018 (UTC)

Thanks. I enjoyed talking to you, too. Charles Matthews (talk) 06:45, 8 November 2018 (UTC)
@Pintoch: You asked about the focus list. I was writing some documentation about it just now: http://sciencesource.wmflabs.org/wiki/Focus_list_and_filtering_it . Charles Matthews (talk) 11:16, 8 November 2018 (UTC)
Great, thanks! Would you also have an example of the PMC topics we were talking about? − Pintoch (talk) 11:53, 8 November 2018 (UTC)
@Pintoch: For topics it was PubMed — PMC was for licenses! Here's a page linked from Eight blue babies. (Q35103130): https://www.ncbi.nlm.nih.gov/pubmed/?term=12685296. Expanding the + section, the topic is definitely Methemoglobinemia (Q748442). When you click on the link with text Methemoglobinemia/diagnosis, and choose the option "search in MeSH", you get https://www.ncbi.nlm.nih.gov/mesh?term=%22Methemoglobinemia%22. OK, the top hit is then the page https://www.ncbi.nlm.nih.gov/mesh/68008708, where we have
Tree Number(s): C15.378.619 MeSH Unique ID: D008708
and the problem is solved. In https://tools.wmflabs.org/wikidata-todo/resolver.php one can set P486 for MeSH ID and value D008708, and the result is Q748442 (or use SPARQL directly, who cares). Here the label confirms that the top hit is OK. For diseases the MeSH catalog on mix'n'match was completed this summer, so this works generally. (For other kinds of topics the matching would need to happen.)
It is just strange. Obviously "Methemoglobinemia" could be extracted from the PubMed page. MeSH has a SPARQL endpoint at https://id.nlm.nih.gov/mesh/query that could be useful. But I don't see quite how to avoid the search step and some kind of checking inference back from the MeSH information. Charles Matthews (talk) 12:39, 8 November 2018 (UTC)
Oh I see, okay, so PubMed is simply not aligned to MeSH, it only has topic strings which can be searched in MeSH. − Pintoch (talk) 14:25, 8 November 2018 (UTC)
Actually another way to do it is the other way round: start with a MeSH term and search PubMed: e.g. https://www.ncbi.nlm.nih.gov/pubmed/?term=%22Methemoglobinemia%22%5BMeSH+Terms%5D . That would be easy scraping. If we did enough of that, then the "alignment" would exist on Wikidata ... Charles Matthews (talk) 14:38, 8 November 2018 (UTC)
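The two lookups mentioned in this thread can be sketched without any network access: building the PubMed search URL for a MeSH term, and building the SPARQL that resolves a MeSH descriptor ID (P486) to an item. The function names are hypothetical; the URL pattern and the example values are the ones quoted above.

```python
from urllib.parse import quote_plus

PUBMED_SEARCH = "https://www.ncbi.nlm.nih.gov/pubmed/?term="


def pubmed_mesh_search_url(mesh_term):
    """Build a PubMed search URL restricted to the given MeSH term."""
    return PUBMED_SEARCH + quote_plus('"{}"[MeSH Terms]'.format(mesh_term))


def resolve_mesh_id_query(descriptor_id):
    """SPARQL that finds the Wikidata item carrying a MeSH descriptor ID (P486)."""
    return 'SELECT ?item WHERE {{ ?item wdt:P486 "{}". }}'.format(descriptor_id)


print(pubmed_mesh_search_url("Methemoglobinemia"))
print(resolve_mesh_id_query("D008708"))
```

Fetching and scraping the result page is left out on purpose, since that part depends on the page layout and on NCBI's usage rules.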

Samuel Heathcote

There does seem to be an artist Samuel Heathcote - Royal Collection has the dates 1656-1708 but perhaps two men have been conflated? Will do some research. - PKM (talk) 21:46, 9 November 2018 (UTC)

Haven't found anything on the painter so far except the Royal Collection. I expect their dates are wrong, but I have made a separate item and moved the painting links there. - PKM (talk) 23:06, 9 November 2018 (UTC)
In the matter of the Raphael derived work, I assumed Heathcote owned it. He was very wealthy[6]. Charles Matthews (talk) 06:08, 10 November 2018 (UTC)
That seems very likely. - PKM (talk) 20:04, 10 November 2018 (UTC)

Is John Dixon (Q30020905) a fictional human (Q15632617)?

Hello, you've created the statement John Dixon (Q30020905) instance of (P31) fictional human (Q15632617). Where does it say that in s:en:Dixon, John (d.1715) (DNB00)? Thanks a lot in advance! Best, --Marsupium (talk) 01:23, 18 March 2019 (UTC)

@Marsupium: On s:Dixon, John (d.1715) (DNB00) there is a note. It cites the ODNB article "Dixon, Matthew", and there it states "In her article 'Nicholas Dixon, limner, and Matthew Dixon, painter, died 1710', Mary Edmond noted that 'Vertue's “Mr John Dixon” was … non-existent', and that 'Walpole added to the confusion by combining details given by Vertue about Nicholas Dixon the limner with those about “John” Dixon, and attributing them to the latter'." That is citing Mary Edmond, Nicholas Dixon, Limner: And Matthew Dixon, Painter, Died 1710, The Burlington Magazine Vol. 125, No. 967 (Oct., 1983), pp. 610-612 (3 pages), Published by: Burlington Magazine Publications Ltd., https://www.jstor.org/stable/881428. Charles Matthews (talk) 04:17, 18 March 2019 (UTC)

Health specialization

Hi Charles, with regard to the latest changes to cancel health specialization, let me remind you that there was a discussion in the past (https://www.wikidata.org/wiki/Property_talk:P1995). In this discussion it was decided to change the specialization from medical to health, so as to be able to include Psychology (clinical psychology, in particular). I hope everything is clear, see you soon. --Dapifer (talk) 11:58, 20 March 2019 (UTC)

Query service lag

Hey, we're currently experiencing some lag on the query service. One thing that has much impact on this is edits done on fairly large items. Could you postpone your current task for a moment to give the query service some time to recover? Thanks! Sjoerd de Bruin (talk) 16:00, 24 April 2019 (UTC)

Not straightforward. Charles Matthews (talk) 16:11, 24 April 2019 (UTC)
Or restart them tomorrow? --Egon Willighagen (talk) 16:26, 24 April 2019 (UTC)
Do I understand correctly you're using QuickStatements 1 and not 2? I can recommend using the new QuickStatements (with the old format) (next time), because that one allows you to pause jobs.--Egon Willighagen (talk) 16:29, 24 April 2019 (UTC)
To explain: I do understand the issue here. I'm working on a project with a deadline, set by the WMF grant, which comes at the end of May. The processing I need to do comes in several stages: I need to move content in batches twice through QuickStatements. The batch currently running through happens to be the final one in one segment of the project. After that is done, which will be less than 30 minutes now, I can pause until tomorrow. But from the point of view of project planning, there are dependencies. Charles Matthews (talk) 16:49, 24 April 2019 (UTC)
The system seems more or less back to normal. Thanks for considering! --Egon Willighagen (talk) 17:21, 24 April 2019 (UTC)


Hello Charles,

I have visited the ScienceSource Wikibase instance site. I can't create an account there. Can you please check whether there is a problem? -- Hogü-456 (talk) 18:46, 2 May 2019 (UTC)

Subclass or cause?

About this, do you think that "subclass" or "cause" would make more sense? WhatamIdoing (talk) 15:20, 27 August 2019 (UTC)

@WhatamIdoing: Interesting point. I'm following https://www.ncbi.nlm.nih.gov/mesh/?term=Splenosis which has Splenosis subordinate to Splenic Rupture subordinate to Splenic Diseases. So thinking of everything as a disease (Q12136) broad sense, i.e. just any abnormal condition, "subclass" tends to be used to describe a more specialised condition. But MeSH is perhaps likely to miss this sort of point sometimes: obviously splenosis is a kind of side-effect of the rupture. So has cause (P828) should be OK. Charles Matthews (talk) 15:32, 27 August 2019 (UTC)
Maybe both would be the best approach? WhatamIdoing (talk) 20:47, 27 August 2019 (UTC)
@WhatamIdoing: I've used facet of (P1269), which seems appropriate here. Charles Matthews (talk) 07:47, 4 September 2019 (UTC)
That sounds like a reasonable approach. Thanks. WhatamIdoing (talk) 17:12, 4 September 2019 (UTC)

Community Insights Survey

RMaung (WMF) 17:38, 10 September 2019 (UTC)

Wrong data

Ukrainian Wikipedia (Q199698) has both official name (P1448) and title (P1476). Why? Please remove title (P1476), thanks!!! --2001:B07:6442:8903:C4AD:B849:2AF8:ED72 15:51, 20 September 2019 (UTC)

Reminder: Community Insights Survey

RMaung (WMF) 19:54, 20 September 2019 (UTC)

your batch edits

I don't think that adding "main topic"--->"enzyme" to highly specific articles helps anyone, not even an AI. I mean even copying the enzyme name from the title and finding the corresponding enzyme family by text comparison would be more valuable and could be easily implemented. --SCIdude (talk) 17:23, 30 September 2019 (UTC)

@SCIdude: Thank you for the comment. Let me explain that project as a whole.
The NCBI2wikidata tool in use is a custom tool for adding disease and other primary metadata to article items. It only adds to items about articles that are reviews and which are under a Creative Commons license. These articles, for the ScienceSource project as originally conceived, were the key ones: we were interested only in such articles, and needed 30K of them, with some other side conditions.
I would say that tagging items as reviews with a CC license is anyway a very positive thing to do. I have brought the number of articles of that kind up to 60K recently, and continue to work in that direction. So I have run numerous MeSH search items on the PubMed API used by the tool, "Enzymes" being just one of these. These are documented at
Wikidata:ScienceSource project/NCBI2wikidata rsplus1
and in particular in the section Wikidata:ScienceSource project/NCBI2wikidata rsplus1#Runs to review (non-leaf). The current run labelled "enzyme" is at the bottom of that section.
"Non-leaf" means the search term used is not a leaf of the MeSH topic tree. The QuickStatements code for "rsplus1" has the search term added to it, and that term appears as a main subject (P921) statement, which is what you are seeing.
Take an article at random: Plant Ribosome-Inactivating Proteins: Progesses, Challenges and Biotechnological Applications (and a Few Digressions). (Q41918293). By going to the PubMed link on the item I can see that "Enzymes", appearing as "enzyme" in the main subject statement, actually stands for the "Ribosome Inactivating Proteins" MeSH term; in other words Ribosome-inactivating protein (Q24788543). If I did that as a conscious check through the "enzyme" additions, I would be able to make the main subject precise, and to add another MeSH major term, "Plants". Further, as I have discovered just now, Ribosome-inactivating protein (Q24788543) should carry a MeSH descriptor ID (P486) statement that isn't there just yet.
To sum up, there is a two-step workflow set up for improving these high-level terms by replacing them by accurate MeSH major terms.
In contrast, the other main subject there, "biotechnology", is unreferenced, though one can see it came from the title. It is a minor MeSH term if you look on PubMed. The tagging I'm doing is upmarket of that.
Charles Matthews (talk) 18:17, 30 September 2019 (UTC)
Thanks for the explanation. Can you please give an example of an item after the full workflow? --SCIdude (talk) 19:04, 30 September 2019 (UTC)
@SCIdude: OK, I have worked over MYO5B, STX3 and STXBP2 mutations reveal a common disease mechanism that unifies a subset of congenital diarrheal disorders: A mutation update. (Q47269140) chosen at random. It had been in a previous pass, and I see now that "enzyme" was added without reference and date. That's a glitch in this one batch: the previous addition of "genetic variation" has the usual dated reference. Justifying your query.
So https://www.wikidata.org/w/index.php?title=Q47269140&action=history shows I took about 20 minutes in this case. I did have to create two new items, for Myosin type 5 and for Qa-SNARE proteins. The substitutions for "genetic variation" (which I got wrong first time) and for "enzyme" were forced from the MeSH terms because the MeSH code strings have known initial strings. Charles Matthews (talk) 20:28, 30 September 2019 (UTC)
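The point about known initial strings can be sketched as a prefix test on MeSH tree codes: among an article's major MeSH terms, the replacement for a high-level placeholder is one whose tree code begins with the placeholder's code. The tree code C15.378.619 for Methemoglobinemia is quoted elsewhere on this page; the other code and the function names are assumptions for illustration only.

```python
def is_descendant(tree_code, ancestor_code):
    """True if tree_code lies strictly below ancestor_code in the MeSH tree."""
    return tree_code.startswith(ancestor_code + ".")


def refine_placeholder(placeholder_code, major_terms):
    """major_terms maps a MeSH term name to its list of tree codes.
    Returns the names of the terms that sit under the placeholder."""
    return sorted(
        name
        for name, codes in major_terms.items()
        if any(is_descendant(code, placeholder_code) for code in codes)
    )


major_terms = {
    "Methemoglobinemia": ["C15.378.619"],  # tree number quoted on this page
    "Plants": ["B01.650"],                 # assumed code, illustration only
}
print(refine_placeholder("C15.378", major_terms))
```

A term can have several tree codes, which is why each name maps to a list; a placeholder is replaced only when one of those codes falls under it.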
It's a pity that you need to do this by hand. But, even if we had a complete InterPro import the items would have no MeSH link, so it can't be automated. Having a nearly complete mapping MeSH --> WD seems desirable. Do you know of any plans for such a project? --SCIdude (talk) 05:24, 1 October 2019 (UTC)
@SCIdude: I am actually working on MeSH. Indeed, with a 1-to-1 match of MeSH into Wikidata, software can take over. I'm not a developer, but it seems to me that NCBI2wiki, written in Go, could read across the major MeSH terms from PubMed with some modifications. The question is how to get there.
So the issues with MeSH were initially quite complex: a large collection of database constraint violations for MeSH descriptor ID (P486), which I have got on top of, for the D-numbers, just recently. There were many MeSH IDs in the wrong places: there are still 100 on gene items that need to be on the corresponding proteins. New properties were created, so in particular the M-numbers now have their own property.
MeSH descriptor ID (P486) now occurs on 20K items, a recent milestone. I have done some systematic work on organic chemistry, and am currently using the MeSH Organisms catalog on mix'n'match where there is some low-hanging fruit.
So, yes, I have been active to that end for some weeks now. Completeness is a few months away. When it is a bit closer, I'll think more seriously about adapting the existing metadata tool. I see the goal of automation as within reach.
I could spend the rest of my life adding these main subjects by hand. That is not my intention. Charles Matthews (talk) 05:42, 1 October 2019 (UTC)
Tools work only if items already exist so I might look into an InterPro update import. One of its problems is to make clear that InterPro domain entries actually are families that collect all proteins with that domain, so they should be instance-of protein family. Currently I'm finishing transport protein families from the TCDB (up to a certain depth), and as a survey will look today how many of them are covered by MeSH, maybe adding the most general IDs. So I think we agree on priorities. Best regards. --SCIdude (talk) 06:01, 1 October 2019 (UTC)

New page for catalogues

Hi, I created a new page where I started collecting sites that could be added to Mix'n'match and I plan to expand it with the ones that already have scrapers by category. Feel free to use, expand. Best, Adam Harangozó (talk) 15:09, 19 October 2019 (UTC)

problem with main subject additions

You are adding wrong main subject values, are you aware of this? E.g. Q21145003 has nothing to do with globulins or blood proteins, and the MeSH terms of its PMID are not the source of it. If I were you I'd stop the bot. --SCIdude (talk) 18:01, 8 March 2020 (UTC)

@SCIdude: Thanks for the comment. I have not in fact used that metadata bot recently: I had done enough with that version, and I'm trying to learn enough of the programming language to modify it.
In this case, there was no actual mistake: the information from the PubMed API was correct. The "globulins" subject can be seen on https://www.ncbi.nlm.nih.gov/mesh/?term=Antitoxins, and "Antitoxins" is one of the two starred MeSH terms on the PubMed page https://pubmed.ncbi.nlm.nih.gov/19325885-bacterial-toxin-antitoxin-systems-more-than-selfish-entities/ for the paper. One of the diagrams on the Antitoxins page shows "Globulins" five levels up from "Antitoxins".
By the way, in December I was also completing the MeSH catalogs on mix'n'match. So all of those are now matched into Wikidata. But since the annual updates in 2018, 2019 and now 2020 have happened since the catalogs were uploaded, there are some hundreds of current MeSH terms still missing. It would be interesting to have those new MeSH terms on mix'n'match. Charles Matthews (talk) 19:10, 8 March 2020 (UTC)
But matching the title with antitoxin is wrong, it should have matched with toxin-antitoxin system or pair Q3495384 which is a biological process in bacteria, while antitoxins are immunoglobulins in vertebrates. --SCIdude (talk) 19:58, 8 March 2020 (UTC)
This leads me to the question, how are main subject values updated if the item where the MeSH id was placed changes? Will your bot update the main subjects on all article items that have the old subject? In the last months I have moved lots of MeSH ids to more correct items, and the case above would need also a move. What happens in the articles? --SCIdude (talk) 20:13, 8 March 2020 (UTC)

(edit conflict)

@SCIdude: On the first point, you seem to be saying that the "Antitoxins" MeSH term assigned by PubMed is incorrect. In that case of course "globulins" should be removed as main subject. Not a software issue.
On the second point: there were numerous problems initially, of the same kind. I have a list of topics where I should check all cases. The current version of NCBI2wikidata cannot automate such a process. A more "generic" version could be more helpful, simply reading all the major MeSH terms across from the PubMed API.
For that to work smoothly, it was first necessary to complete the MeSH matching here (or have it almost done). So I did that task.
To have a really good system, there should be another advance in the bot, such as is available in principle in bots using Rust. In principle, good checking and cleanup can be available that way.
Charles Matthews (talk) 20:25, 8 March 2020 (UTC)
In summary, yes, it's their problem. And on the updating, this is a widespread issue in WD. --SCIdude (talk) 05:27, 9 March 2020 (UTC)

@SCIdude: By the way, you mentioned the issue of MeSH descriptor ID (P486) statements being moved to other items. This kind of query can find such cases:

#Check main subject referenced to PubMed for whether the topic has MeSH ID
SELECT ?item ?topic
  WHERE {?reference pr:P248 wd:Q180686.
         ?statement prov:wasDerivedFrom ?reference.
         ?item p:P921 ?statement.
         ?item wdt:P698 [ ].
         ?statement ps:P921 ?topic.
         MINUS {?topic wdt:P486 [ ]}
        }
  LIMIT 10


When I ran it just now, the first topic was insulin (Q7240673) where you moved it; and the next nine hits were for breast cancer (Q128581), where I moved it to breast neoplasm (Q58833934) in June 2019. Once these topics are found in this way, in principle a bot can fix them. Charles Matthews (talk) 17:11, 9 March 2020 (UTC)
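As a rough sketch of how a bot-style fix could look once such topics are found, the function below writes QuickStatements (version 1 format) lines that remove the old main subject (P921) value and add the new one. The function name is hypothetical, and Q4115189 (the Wikidata sandbox item) stands in for a real article item; the topic Q-ids are the breast cancer example from this thread.

```python
def replacement_commands(article_qids, old_topic, new_topic):
    """Return QuickStatements V1 lines that swap one main subject for another.
    A leading "-" removes the statement; columns are tab-separated."""
    lines = []
    for qid in article_qids:
        lines.append("-{}\tP921\t{}".format(qid, old_topic))
        lines.append("{}\tP921\t{}".format(qid, new_topic))
    return "\n".join(lines)


# Q4115189 is the sandbox item, used here only as a stand-in.
print(replacement_commands(["Q4115189"], "Q128581", "Q58833934"))
```

Note that the sketch does not carry over the PubMed reference from the removed statement; a real run would need to add the reference columns to the new line.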

Gerald Wellesley

Hi! I reverted the merge you made between Q90466544 and Gerald Wellesley (Q10504274) because you appeared to be merging a human with a disambiguation page. I would have attempted a human-to-human merge, but I cannot convince myself that the human you have identified is the same as any of the various Gerald Wellesleys we already know about. In particular, looking at the Cambridge Alumni page, is "1788" supposed to be a date of birth or a date of matriculation? Cheers, Bovlb (talk) 16:26, 14 April 2020 (UTC)

@Bovlb: Q90466544 is actually for a disambiguation page, in a database. But I agree it was a bad merge. Q90466544 should be deleted. Charles Matthews (talk) 16:35, 14 April 2020 (UTC)
Ah. It's gone. Bovlb (talk) 16:39, 14 April 2020 (UTC)


Be careful not to add duplicate main subject (P921) to each biographical article (Q19389637). I've had to undo some of your edits due to this. ミラP 19:28, 28 May 2020 (UTC)

@Miraclepine: Apologies. It was an oversight, and I wasn't aware that you were working on P921. Charles Matthews (talk) 20:00, 28 May 2020 (UTC)

wrong main subject

Hi! This bot change seems totally wrong. Can you please have a look? --SCIdude (talk) 09:10, 14 September 2020 (UTC)

@SCIdude: Yes, you are right. It is listed on Wikidata:ScienceSource project/Focus list, main subject MeSH errors, which is my page for systematic corrections; but it is not caused by MeSH. A wrong Q-number was used with a bot run. There is some virus topic that should be substituted. But I have a newer technique, now. Charles Matthews (talk) 09:21, 14 September 2020 (UTC)
Very good! --SCIdude (talk) 09:24, 14 September 2020 (UTC)

We sent you an e-mail

Hello Charles Matthews,

Really sorry for the inconvenience. This is a gentle note to request that you check your email. We sent you a message titled "The Community Insights survey is coming!". If you have questions, email surveys@wikimedia.org.

You can see my explanation here.

MediaWiki message delivery (talk) 18:45, 25 September 2020 (UTC)


Maybe solution (Q5447188) subclass of (P279) pharmaceutical preparation (Q66089252) is true for the MeSH classification tree (which includes only medical terms), but such a statement does not seem to be true for solution (Q5447188) in general. Wostr (talk) 10:09, 10 October 2020 (UTC)

@Wostr: OK, I have removed it. Charles Matthews (talk) 10:23, 10 October 2020 (UTC)

another mesh subject issue

Q60939671 got Q426145, probably from the heme part of "heme oxygenase". They should stop matching single words that are also part of multiple-word terms? --SCIdude (talk) 08:22, 11 October 2020 (UTC)

@SCIdude: Yes, in the sense that https://meshb.nlm.nih.gov/record/ui?ui=D006418 on the MeSH Tree Structures tab has "Heme" as a narrower term of "Polycyclic Compounds"; and "Heme" is given as a starred MeSH term on https://pubmed.ncbi.nlm.nih.gov/30583467/, as well as "Heme Oxygenase-1". The abstract mentions heme metabolism. Doesn't seem too bad to me. Charles Matthews (talk) 08:37, 11 October 2020 (UTC)
It is completely off the mark IMO. --SCIdude (talk) 08:52, 11 October 2020 (UTC)
OK, the broader terms are only placeholders. Charles Matthews (talk) 08:54, 11 October 2020 (UTC)
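The rule suggested at the top of this thread, not matching a single word when it is part of a longer term, can be sketched as greedy longest-match tagging. This only illustrates the suggestion: as the reply above says, the statement in question came from a starred MeSH term on PubMed rather than from word matching. The titles and the term list are invented.

```python
def longest_matches(title, terms):
    """Greedy left-to-right matching that prefers the longest dictionary term."""
    words = title.lower().split()
    # Longest terms first, so "heme oxygenase" wins over "heme".
    by_length = sorted((t.lower().split() for t in terms), key=len, reverse=True)
    found, i = [], 0
    while i < len(words):
        for term in by_length:
            if words[i:i + len(term)] == term:
                found.append(" ".join(term))
                i += len(term)  # consume the matched words
                break
        else:
            i += 1  # no term starts at this word
    return found


terms = ["heme", "heme oxygenase"]
print(longest_matches("Role of heme oxygenase in inflammation", terms))
print(longest_matches("Free heme toxicity", terms))
```

Punctuation handling is left out, so a real tagger would need to tokenise titles more carefully.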