Andrew Gray
imported from Wikimedia project: DOI query endpoint
Hey Andrew Gray, with your bot account you have created quite a lot of references with reference qualifiers imported from Wikimedia project (P143): DOI.org query endpoint (Q28946522), see for instance in the item William Henry Bragg. 1862-1942 (Q47476886). This does not make sense, as the property imported from Wikimedia project (P143) is not supposed to be used like that; there is currently a repair job running in which I am involved, in order to clean up wrong uses of imported from Wikimedia project (P143) in references.
I reckon you wanted to record in the reference that you used the DOI query endpoint for your data retrieval. Since DOI.org query endpoint (Q28946522) is just an interface and not a work by itself, this information is not crucial for those references. I thus suggest removing imported from Wikimedia project (P143): DOI.org query endpoint (Q28946522) from all references where this reference qualifier + value pair shows up, unless you come up with an alternative approach here. The reference URL (P854) reference qualifiers with links to doi.org which are found in the same references should be kept, of course. Help:Sources provides the available source models.
In order to avoid confusion: I do not expect you to make any edits; I am prepared to do them as part of the imported from Wikimedia project (P143) clean up. I just want to talk about the matter before I start, in order to figure out whether there are alternatives to the "fix by removal" approach I have suggested. —MisterSynergy (talk) 18:39, 10 May 2019 (UTC)
- MisterSynergy Thanks for checking on this (and thanks for doing all the cleanup!). I don't recall exactly why I set the references up this way but I think it was copied from another recent scholarly-publications import and I presumed it was the correct way to do it :-)
- I've gone back and checked the script I used to create them and can confirm that they all came from querying the crossref DOI endpoint. Your proposal sounds fine and makes sense to me. Andrew Gray (talk) 18:59, 10 May 2019 (UTC)
- Okay, thanks for the answer. My bot account is busy right now, but as soon as it is available for a new job—tomorrow, likely—I will remove the imported from Wikimedia project (P143): DOI.org query endpoint (Q28946522) reference qualifiers from all references where they show up, and keep the bare reference URL (P854) URLs in the same references. —MisterSynergy (talk) 19:14, 10 May 2019 (UTC)
- Special:Diff/938607984 is a sample diff. There are ~2000 items to be modified. —MisterSynergy (talk) 19:22, 10 May 2019 (UTC)
I have to bother you with another comment. While I played with the related queries a bit, I thought I should just have a look whether there are reference qualifiers of the style stated in (P248): DOI.org query endpoint (Q28946522) out there, and it turns out that there are a lot of them: ~342k statements in ~37k items are referenced like that (complemented with reference URL (P854) and retrieved (P813), sample item Judicial Interaction on the Latham Court: A Quantitative Study of Voting Patterns on the High Court 1935-1950 (Q29027915)). Looks like User:Harej has added them a while ago. I am now reconsidering the plan, and think a move from imported from Wikimedia project (P143): DOI.org query endpoint (Q28946522) to stated in (P248): DOI.org query endpoint (Q28946522) could be more appropriate than a removal, in order to have a uniform situation. Such a use of stated in (P248) is a bit different than what we usually do, but if someone wants to clean it up, they could start from a cleaner situation. What do you think? (User:Harej may also comment, of course.) —MisterSynergy (talk) 19:37, 10 May 2019 (UTC)
- MisterSynergy I think that might have been the batch I copied it from!
- I've only ever uploaded these ~2k articles so I don't really know what the best approach is, and it's very hard to figure out from Wikidata:WikiProject Source MetaData what we should do. A lot of the existing article items don't really seem to have much sourcing. Your stated in (P248) idea sounds quite reasonable and we can always change it later. Andrew Gray (talk) 20:49, 10 May 2019 (UTC)
- Now done, with the "move-to-P248" solution. Thanks for your input! —MisterSynergy (talk) 05:22, 11 May 2019 (UTC)
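For anyone repeating this kind of clean-up, the "move-to-P248" step agreed above can be sketched roughly as follows. This is only an illustration, not the script that was actually run: the reference is modelled as a simplified mapping of property ID to a list of values rather than the full Wikibase snak JSON, the function name is made up for the example, and the DOI URL is a placeholder.

```python
from copy import deepcopy

P143 = "P143"  # imported from Wikimedia project
P248 = "P248"  # stated in
DOI_ENDPOINT = "Q28946522"  # DOI.org query endpoint


def move_p143_to_p248(reference, target=DOI_ENDPOINT):
    """Return a copy of `reference` with P143:<target> re-expressed as P248:<target>.

    `reference` is a simplified {property ID: [values]} mapping (an assumption
    made for this sketch), not the real Wikibase reference structure.
    """
    fixed = deepcopy(reference)
    values = fixed.get(P143, [])
    if target not in values:
        return fixed  # nothing to move, leave the reference untouched

    # Drop only the offending value; keep any other P143 values as they are.
    remaining = [value for value in values if value != target]
    if remaining:
        fixed[P143] = remaining
    else:
        del fixed[P143]

    # Add the same item under "stated in", avoiding a duplicate value.
    stated_in = fixed.setdefault(P248, [])
    if target not in stated_in:
        stated_in.append(target)
    return fixed


example = {
    "P143": ["Q28946522"],
    "P854": ["https://doi.org/10.0000/example"],  # placeholder URL
}
print(move_p143_to_p248(example))
```

The reference URL (P854) value is passed through unchanged, matching the plan to keep the doi.org links in place.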
Blisse issue
Stephen Bisse (Q28065015) seems to be Stephen Bisse (Q64577142) but under a slightly different name. http://www.historyofparliamentonline.org/volume/1715-1754/member/bisse-stephen-1672-1746 ... left for you to merge, lest Edward has some significance. --Tagishsimon (talk) 11:42, 12 June 2019 (UTC)
- @Tagishsimon: almost certainly a typo when I imported it! Will merge. Andrew Gray (talk) 21:44, 13 June 2019 (UTC)
Also...
- William Greene (Q25845178) diff ... not sure what the House of Lords business here was; as normal, this change for your scrutiny.
- Edward Lee (Q25927448) again has P39=House of Lords when my reading of History of Parliament says he's an MP. Your bot's fingerprints. Not amended by me. --Tagishsimon (talk) 13:44, 12 June 2019 (UTC)
- Beats me where these two came from! Fixed. Thanks for spotting them. Andrew Gray (talk) 21:46, 13 June 2019 (UTC)
Constituency snafu?
Sorry to keep trying to drag you to wherever my whimsy takes me, but...
Today, I think we have a constituency snafu to consider.
We have 2,222 constituencies which are instances of constituency of the House of Commons (Q27971968), listed in Wikidata:WikiProject British Politicians/MP terms by constituency.
We also have 278 constituencies which are instances of Constituency of the Parliament of England (Q27990982) listed at https://w.wiki/4vy and these are employed in a shedload of P39s listed in User:Tagishsimon/junk.
There seems to be an overlap between the two sets. 215 of the Constituency of the Parliament of England (Q27990982) constituencies have an end date of 1707, but have exactly the same name as constituency of the House of Commons (Q27971968) constituencies which have start dates before 1707 and end dates after 1707 - they all listed here - https://w.wiki/4vz
So that seems to me to be an indication that these are duplicate items. By way of example, we have an HoC Aldborough (Q4713453) with a date range of 1558 to 1832 and a PoE Aldborough (Q60576259) with no start date and an end date of 1707.
I'm thinking we should be merging the PoE constituencies which match by name, into their HoC counterparts; and if there are any that don't match by name, we should be changing their P31s to constituency of the House of Commons (Q27971968). What think you? --Tagishsimon (talk) 01:36, 14 June 2019 (UTC)
- @Tagishsimon: So, long story here, but short answer is "this is something I had been meaning to tidy up".
- When I started doing the Historic Hansard import I found we had a lumping vs splitting question for constituencies - do we need just one item for all iterations of a constituency with a given name (as enwiki does), or do we want to split them out by dates/characteristics? (eg boundary changes, number of seats). There's also the question of whether we ought to distinguish between "UK", "GB", and "English" constituencies in much the same way that we distinguish between classes of P39 memberships. (Similarly pre-Union Scottish & Irish seats).
- My current feeling is that, in the long run, splitting will make things a lot easier, especially given the way that since 1832 seats have often been recreated then abolished, or had dramatic boundary changes while keeping the name. Otherwise we'll end up having to do a lot of heavy lifting with qualifiers once we start including things like shapefiles, or a decent model for relating constituencies to places.
- Most old constituencies will then have an English instance (to 1707), a British instance (1707-1801), and one or more UK instances (1801 onwards). This is what I started setting up when I did the initial batch of English importing.
- The problem is that the dates for the "UK" constituency pages were imported ages ago and I just hadn't got around to fixing them to be consistent yet! I'll set up a batch for this shortly to trim all these to 1801 which should remove the obvious misalignments. (I can add dummy entries for 1707-1801 GB constituencies as well, but that may as well wait until we start importing that term data). Andrew Gray (talk) 18:46, 14 June 2019 (UTC)
- That seems a perfectly cromulent direction of travel; it's the mess which distresses me, rather than a predisposition towards a particular solution. Right now it's unclear which constituency should be selected. Let me know your preference; I can have a fiddle, or keep my fingers off this area ... don't mind which. Clearly, the start dates pre 1801 & pre 1707 in UK constituencies will need to be moved to the EN constituencies. --Tagishsimon (talk) 20:41, 14 June 2019 (UTC)
- TBH some of the start dates on UK constituencies are a bit dubious - I'm happy to move them but I don't know quite how robust they are. I guess we'll find out when we compare them to known memberships! I'll have a go at setting up an edit batch this weekend. Andrew Gray (talk) 21:05, 14 June 2019 (UTC)
- Interesting question as to what you link the en.wiki article for a constituency spanning EN, GB & UK. A fourth P31=group of constituencies item? --Tagishsimon (talk) 22:32, 14 June 2019 (UTC)
- I've been keeping the link on the current (or failing that, most recent) one, and using said-to-be-the-same-as on the others - which is a bit of a hack but works well enough for now. See eg w:User:Andrew Gray/Missing MPs since 1832 where it's detecting said-to-be-the-same-as to generate WP links. Andrew Gray (talk) 23:05, 14 June 2019 (UTC)
- Okay, morning update: I think I've fixed up the start/end dates for all English seats, and I'm just about to do a batch to bring all the UK ones with pre-1801 dates up to 1801. I'll then do a second batch to update descriptions w/ correct dates and standard terms. Andrew Gray (talk) 10:54, 15 June 2019 (UTC)
- Morning back. Consider this my (EC) - you cover much of this in your update :)
- Helpfully, we also seem to have items like https://www.wikidata.org/wiki/Q56166293 ... no P31 ... I'd have a word with its creator ;)
- Per your tweets, I was trying to work out where you'd got up to, for which read, do we now have 3 items for constituencies that span the <1707 to the >1801. I know you spoke about delaying dummy constituencies for GB ... fwiw, I'd advise against & suggest a single job to sort constituencies.
- Looking at https://www.wikidata.org/wiki/Q11797248 as an example of a post-1801 constituency, let me raise two concerns: that its P31 "constituency of the House of Commons" might as easily apply to a GB or an England constituency and suggest that we should be implementing P31s on the pattern of the English constituencies - "Constituency of the Parliament of England", "Constituency of the Parliament of Great Britain" & "Constituency of the Parliament of the United Kingdom", each of which is a P279 of "constituency of the House of Commons"; and that we sort out the descriptions of the constituencies so it's clear to users which is which - again, the desc for https://www.wikidata.org/wiki/Q11797248 gives no helpful guidance.
- Sorry to be hyper-pedantic. --Tagishsimon (talk) 11:04, 15 June 2019 (UTC)
- "Constituency of the House of Commons" - I originally had this explicitly stating UK, and am ambivalent about whether it should be in there or not. Note though that we do use Member of Parliament (Q16707842) for current MPs without also stating UK and that seems to work. All of the constituency types are subclasses of constituency in the United Kingdom (Q2064521).
- I'm running a batch just now to add dates to all the descriptions, which should hopefully help with the potential for confusion on an individual basis. Will look into setting something up for the 1707-1801 seats. Andrew Gray (talk) 11:19, 15 June 2019 (UTC)
Okay! Every post-1801 constituency should now have a start date no earlier than 1801, and a standardised description along the lines of "Parliamentary constituency in the United Kingdom, 1832-1885" (or "to present"). All the English ones ditto, though there I've given the end date only (it seems simpler given there's a bit of ambiguity with some of them). Andrew Gray (talk) 11:57, 15 June 2019 (UTC)
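As a rough illustration of the rule described here (not the batch that was actually run), the trimming and the standardised description could be expressed like this. The function names are invented for the example, and the exact wording of the open-ended form ("1997 to present") is an assumption based on the "(or "to present")" remark.

```python
UK_START = 1801  # post-1801 constituencies should start no earlier than this


def clamp_start(start_year, floor=UK_START):
    """Bring a pre-1801 start year up to 1801; leave later years alone."""
    return max(start_year, floor)


def uk_description(start_year, end_year=None):
    """Build a description in the standardised style quoted above.

    An end year of None means the constituency still exists.
    """
    start = clamp_start(start_year)
    if end_year is None:
        span = f"{start} to present"  # assumed wording for current seats
    else:
        span = f"{start}-{end_year}"
    return f"Parliamentary constituency in the United Kingdom, {span}"


print(uk_description(1832, 1885))
print(uk_description(1558, 1832))  # start date trimmed to 1801
print(uk_description(1997))
```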
- Good progress. I'll haggle with you later as to whether a <1707 or a 1707-1801 constituency should be a subclass of C~ of the UK. Clearly they were not at the time they were around & I'm unconvinced they should be today. But I mainly popped in to give you some more of the Hertfordshire-like waifs & strays: https://w.wiki/4ya --Tagishsimon (talk) 13:55, 15 June 2019 (UTC)
- Ontological tree discussion... although I sense you're loath to change the structure, I'm fairly convinced that the first predicate (pre-1707 England is not in the UK) is true; I'm less certain about the other suggestions, having argued with myself for the last hour before pressing the Publish button in favour of a cup of tea and a walk outside. --Tagishsimon (talk) 17:06, 15 June 2019 (UTC)
- This discussion predicated on the assertion that this mapping, currently in use, is wrong for the reason that the UK was not a thing way back when.
- A worked opening suggestion is all constituencies should be
- English constituencies should be
- Scottish constituencies should be
- and so on for Wales, Ireland, Northern Ireland, (France).
- That we have three P279 statements in constituency of the House of Commons (Q27971968) as follows (1066 is a dummy date, other dates are illustrative):
- ⟨ constituency of the House of Commons (Q27971968) ⟩ subclass of (P279) ⟨ <constituency of the Parliament of England> ⟩
start time (P580) ⟨ 1066-00-00 ⟩
end time (P582) ⟨ 1706-12-31 ⟩
- ⟨ constituency of the House of Commons (Q27971968) ⟩ subclass of (P279) ⟨ <constituency of the Parliament of Great Britain> ⟩
start time (P580) ⟨ 1707-00-00 ⟩
end time (P582) ⟨ 1800-12-31 ⟩
- ⟨ constituency of the House of Commons (Q27971968) ⟩ subclass of (P279) ⟨ <constituency of the Parliament of the United Kingdom> ⟩
start time (P580) ⟨ 1801-00-00 ⟩
- And, next level up:
- ⟨ <constituency of the Parliament of Great Britain> ⟩ subclass of (P279) ⟨ <constituency in Great Britain> ⟩
- ⟨ <constituency of the Parliament of the United Kingdom> ⟩ subclass of (P279) ⟨ <constituency in the United Kingdom> ⟩
- And
- ⟨ <constituency in England> ⟩ subclass of (P279) ⟨ administrative territorial entity of England (Q171634) ⟩
- ⟨ <constituency in Great Britain> ⟩ subclass of (P279) ⟨ <administrative territorial entity of Great Britain> ⟩
- ⟨ <constituency in the United Kingdom> ⟩ subclass of (P279) ⟨ administrative territorial entity of the United Kingdom (Q717478) ⟩
- Also
- Sigh. and even then, we have to go further up the tree.
- ⟨ administrative territorial entity of England (Q171634) ⟩ subclass of (P279) ⟨ <administrative territorial entity of Great Britain> ⟩
start time (P580) ⟨ 1707-00-00 ⟩
- ⟨ <administrative territorial entity of Great Britain> ⟩ subclass of (P279) ⟨ administrative territorial entity of the United Kingdom (Q717478) ⟩
start time (P580) ⟨ 1801-00-00 ⟩
- Tea. Walk. Bye. --Tagishsimon (talk) 17:06, 15 June 2019 (UTC)
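To make the date-bounded classes in this proposal concrete, here is a small sketch that picks the class applying in a given year. It uses the illustrative cut-offs from the statements above (England to 1706, Great Britain 1707-1800, United Kingdom from 1801) and only covers seats that started out as English ones; pre-Union Scottish and Irish seats would need their own periods. The open-ended start replaces the dummy 1066 date, and the names are hypothetical.

```python
# (class label, first year, last year); None means open-ended.
PERIODS = [
    ("constituency of the Parliament of England", None, 1706),
    ("constituency of the Parliament of Great Britain", 1707, 1800),
    ("constituency of the Parliament of the United Kingdom", 1801, None),
]


def class_in_year(year):
    """Return the constituency class that applies in `year`."""
    for label, first, last in PERIODS:
        after_start = first is None or year >= first
        before_end = last is None or year <= last
        if after_start and before_end:
            return label
    # Unreachable with the periods above, which cover every year.
    raise ValueError(f"no period covers {year}")


for year in (1700, 1780, 1847):
    print(year, class_in_year(year))
```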
- Seen this on Wikidata:Request a query. When going through Canadian constituencies (mostly based on Wikipedia articles), I came to the conclusion that within Wikidata it should be easier to use separate items when a constituency with the same name is recreated later (Wikipedia uses just one article). Seems you came to similar conclusions for UK/Brit/etc ones. --- Jura 10:40, 7 February 2020 (UTC)
- @Jura1: Thanks for the pointer. I agree this is definitely the best way to do it for our purposes, but it's a bit of a mess trying to link all the ones with the same name back to the Wikipedia item (which should probably always be on the current/most recent version of the constituency). Is there a better way than using said to be the same as (P460)? Perhaps we need some kind of "described on Wikipedia as part of XX item" property... Andrew Gray (talk) 11:00, 7 February 2020 (UTC)
- There seem to be many more involved for the UK (compared to Canada). I'm not much worried about P460, but it can be useful to add a qualifier to indicate why one adds "same as" or "different from". For name items, this considerably reduced questions/incorrect deletions and merges. Sample at Q4925477#P1889. I agree that the item about the most recent constituency is probably the one that should link to Wikipedia, as they keep re-writing to match the present. --- Jura 11:08, 7 February 2020 (UTC)
Thanks
We have just in the last few days added references derived from Wikidata references into infoboxes in no.wikipedia. Not without some resistance though. But that makes lack of refs here visible. That is why I'm now adding some more, and actually today could use the same source as reference to the recent death date of a parliament member as I used as source when I started the article about him in 2006. We are making progress. Haros (talk) 13:07, 30 August 2019 (UTC)
- @Haros: Awesome! Really glad to see another country being added to the sets of politicians. Let me know if you need any help with the data model or anything like that. Andrew Gray (talk) 12:03, 31 August 2019 (UTC)
Community Insights Survey
Share your experience in this survey
Hi Andrew Gray,
The Wikimedia Foundation is asking for your feedback in a survey about your experience with Wikidata and Wikimedia. The purpose of this survey is to learn how well the Foundation is supporting your work on wiki and how we can change or improve things in the future. The opinions you share will directly affect the current and future work of the Wikimedia Foundation.
Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.
This survey is hosted by a third-party and governed by this privacy statement (in English).
Find more information about this project. Email us if you have any questions, or if you don't want to receive future messages about taking this survey.
Sincerely,
Reminder: Community Insights Survey
Share your experience in this survey
Hi Andrew Gray,
A couple of weeks ago, we invited you to take the Community Insights Survey. It is the Wikimedia Foundation’s annual survey of our global communities. We want to learn how well we support your work on wiki. We are 10% towards our goal for participation. If you have not already taken the survey, you can help us reach our goal! Your voice matters to us.
Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.
This survey is hosted by a third-party and governed by this privacy statement (in English).
Find more information about this project. Email us if you have any questions, or if you don't want to receive future messages about taking this survey.
Sincerely,
Merge
Can you explain why 15th Parliament of the United Kingdom (Q21084432) and 15th Parliament of Great Britain (Q21095070) shouldn't be merged?--Avilena (talk) 16:33, 28 September 2019 (UTC)
- @Avilena: Sure - 15th Parliament of Great Britain (Q21095070) was a term of the Parliament of Great Britain (GB = Scotland + England + Wales), from 1780-1784. In 1801, the Parliament was renamed as the Parliament of the United Kingdom (UK = Scotland + England + Wales + Ireland), and the numbering started again; 15th Parliament of the United Kingdom (Q21084432) was from 1847 to 1852. It seems the "Great Britain" ones don't have dates - I'll try and sort that out to make it clearer. Andrew Gray (talk) 16:49, 28 September 2019 (UTC)
I found three different birth dates about this person. Can you look into this?--GZWDer (talk) 21:33, 22 January 2020 (UTC)
- @GZWDer: Interesting - I am pretty sure they're all the same person, it's not a very common name. The 1887 date is almost certainly a typo - the same source says he was promoted to Lieutenant in 1899, which definitely wouldn't happen to someone aged twelve! 1877 and 1879 are both supported by primary sources (service record index) so it's possible the date got fudged at some point. I'll see if I can find something authoritative. Andrew Gray (talk) 20:51, 24 January 2020 (UTC)
- @GZWDer: Sorted - found a photograph of his gravestone, and his entry in the 1939 National Register, which both confirm the birthdate as 1st July 1877. 1879 is probably just a clerical error in a record. Andrew Gray (talk) 13:14, 26 January 2020 (UTC)
Hi,
did you receive my email regarding the Wikipedia template? Thanks! Adam Harangozó (talk) 13:36, 14 March 2020 (UTC)
Datasheets and chip documentation
Thanks a lot for the advice on that (Reference).
I've not been able to reply because it has been archived.
I've already started implementing the second proposal with one SOC datasheet.
More will follow when I stumble upon other SOC datasheets or add new hardware that uses SOCs with public datasheets that I know about.
GNUtoo (talk) 16:39, 19 June 2020 (UTC)
- @GNUtoo: Looks good! I've tweaked Exynos 4 Quad User Manual Public (Q96463975) so that the item itself is a little more informative - hope that's OK. Andrew Gray (talk) 17:09, 19 June 2020 (UTC)
High AGBot login rate
Hello!
Your bot is logging into Wikimedia projects over 13K times in a 48H period, which is excessive, and shouldn't be necessary.
See https://phabricator.wikimedia.org/T256533#6261565
Can you do anything about this?
https://www.mediawiki.org/wiki/API:Login#Additional_notes
>If you are sending a request that should be made by a logged-in user, add assert=user parameter to the request you are sending in order to check whether the user is logged in. If the user is not logged-in, an assertuserfailed error code will be returned.
Reedy (talk) 22:27, 2 July 2020 (UTC)
Wow! Thanks for spotting it. It definitely shouldn't be doing that so I've switched it off. I'll look into it in more detail tomorrow. Apologies! Andrew Gray (talk) 23:48, 2 July 2020 (UTC)
- @Reedy: Thanks again for letting me know about this. Looking at phab, it seems this is because wikidata-cli logs back in each time it runs a command - I hadn't realised this was the way it worked. I'll switch any future large runs to use the batch mode, which should avoid the problem. Andrew Gray (talk) 11:27, 3 July 2020 (UTC)
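For reference, the pattern in the quoted API note can be sketched as below: log in once, add assert=user to each request, and only log in again when the API answers with assertuserfailed. This is a minimal sketch with hypothetical helper names; it says nothing about how wikidata-cli itself is implemented, and the network calls are left out.

```python
def with_login_check(params):
    """Return a copy of the API parameters with assert=user added."""
    checked = dict(params)
    checked["assert"] = "user"
    return checked


def login_expired(response):
    """True if a decoded API JSON response reports that the session is not logged in."""
    error = response.get("error") or {}
    return error.get("code") == "assertuserfailed"


# Example with hand-written dictionaries standing in for real requests/responses.
request = with_login_check({"action": "wbeditentity", "id": "Q4115189"})
print(request)
print(login_expired({"error": {"code": "assertuserfailed"}}))
print(login_expired({"success": 1}))
```

A caller would keep one session for the whole run and re-authenticate only when `login_expired` returns true, rather than logging in before every command.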
sco.wiki and label filtering
A couple of days ago I wrote a query not dissimilar to your [1] and got frustrated that the filter didn't achieve my intended removal of matches. I ended up grabbing a CSV and identifying the mismatches in a spreadsheet - and putting myself in a black mood about my inadequate skill with Wikidata, not for the first time. I notice lines such as Ayr appear on your list too, though I would have expected the filter (?sco_label != ?en_label) to lose them - is there some subtlety about labels that I am missing? AllyD (talk) 06:20, 2 September 2020 (UTC)
- @AllyD: Thanks for picking up on this - somehow I'd only looked at the mismatches and not realised half of the results were still the same!
- After asking around, it turns out there was a subtlety I was missing too - the ?sco_label value renders as just "Ayr" on the website & in the CSV, but internally in the query service, it still embeds the language code (hence why we can do things like lang(?sco_label)). If we strip it down to just a plain string using str(), then it works smoothly. I'll update the other queries to use this form. Andrew Gray (talk) 16:53, 2 September 2020 (UTC)
- Thanks - applying a str wrapping makes sense when I think about it. That's something new I've learned for future queries. AllyD (talk) 18:29, 2 September 2020 (UTC)
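A toy model of the point above, for future readers: if a label is thought of as a (text, language) pair, two labels with identical text but different language codes do not compare as equal, whereas comparing the bare text does what was intended. This is only an analogy in Python with made-up names, not the query service's implementation; in the query itself the fix described above amounts to writing FILTER(str(?sco_label) != str(?en_label)).

```python
from typing import NamedTuple


class Label(NamedTuple):
    """A label as the query service holds it: text plus a language code."""
    text: str
    lang: str


def differs_as_tagged(a, b):
    """Compare whole labels, language code included (like ?sco_label != ?en_label)."""
    return a != b


def differs_as_plain(a, b):
    """Compare only the text (like str(?sco_label) != str(?en_label))."""
    return a.text != b.text


en = Label("Ayr", "en")
sco = Label("Ayr", "sco")

print(differs_as_tagged(en, sco))  # the language codes differ, so the row is kept
print(differs_as_plain(en, sco))   # the text is identical, so the row is dropped
```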
Hi, without even looking at the moral side of using Wikidata (including Wikidata:Showcase items) for escalation of sco-wiki local drama, I believe that your 3733 edits like [2][3][4][5] are formally incorrect. Wikipedia:Articles for deletion(22897) is not a WikiProject, therefore sco:Wikipedia:Votes_for_deletion/Proposal_2/list/setB is not a WikiProject focus list(51539995). Could you undo these edits, please? --Lockal (talk) 19:24, 1 October 2020 (UTC)
- @Lockal: No moral issues or drama involved! These are just for maintenance work, and I'll be removing them when that is done - hopefully in a few days. Now I look at them they are a bit weirdly named, though, so I'll give them a clearer name to explain what they're for.
- The reason I'm using items is that I need to be able to check the labels on recently deleted items, so that I can check if they're actually good Scots or not. We know a lot of pages were given titles that may not be in Scots, or have strange spelling issues, and if we leave the Scots labels in place after the items are deleted, then this will just cause problems in future.
- Regarding scope, AFD isn't a "WikiProject", but on focus list of Wikimedia project (P5008) was originally described as "property to indicate that an item is part of a group of items of particular interest for maintenance, management, or development". So I think this sort of maintenance use is OK - we already have items like Commons categories: recently created wikidata items (Q55241810) or Mechanical Curator authors - current work batch (Q51540075). Andrew Gray (talk) 23:05, 1 October 2020 (UTC)
- Ok, if you actually work on this set of items in a collaboration with other people, I guess. As a bot owner you already probably know this, but just in case: you can bulk-select labels from WDQS without adding statements. There is some limit for query length, but chunking by 1000 items works for me. --Lockal (talk) 07:24, 2 October 2020 (UTC)
- @Lockal: thanks! I hadn't realised it could cope with a thousand at a time - I think I'd only ever used it with a hundred or so, which for ~10k items was a bit small. I'm doing some filtering with SPARQL for different classes of items (since eg "places in Scotland" need a special check, and "humans" are much more straightforward since names don't change) but doing this in batches of a thousand might make it practical to do the checking offline. Will test this out with batch 3 before adding the tags :-) Andrew Gray (talk) 09:25, 2 October 2020 (UTC)
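The chunked label lookup discussed here might look something like the sketch below: split the list of item IDs into batches of a thousand and build one query per batch. Using a VALUES clause is one way to do this and is an assumption on my part, as are the function names; sending the queries to the query service is left out.

```python
def chunked(items, size=1000):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def label_query(qids, lang="sco"):
    """Build a SPARQL query returning the `lang` labels for the given item IDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  ?item rdfs:label ?label .\n"
        f'  FILTER(lang(?label) = "{lang}")\n'
        "}"
    )


# Illustrative IDs only; a real run would use the items under review.
qids = [f"Q{number}" for number in range(1, 2501)]
batches = list(chunked(qids))
print(len(batches), [len(batch) for batch in batches])
print(label_query(["Q1", "Q2"]))
```

Each batch can then be checked offline, with separate passes for classes such as places and humans if they need different handling.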
Misrepresenting facts
Is there any way to cope with a user who just constantly misrepresents facts in an attempt to humiliate another user? I've been in the corporate world long enough to know that the strategy is to muddy the waters enough so that no one wants to touch it with a ten foot pole. Is it best to ignore? Is there a place to raise the issue? I've tried raising it with one administrator but he seems uninterested in following up. I don't want to have to "pay tribute" to someone who is clearly not working very hard at the core problem. Gettinwikiwidit (talk) 13:42, 6 November 2020 (UTC)
- @Gettinwikiwidit: I wish I had an easy answer here, sorry! Wikidata is not great at dealing with difficult-but-also-productive users; we don't quite have the community willingness (confidence?) to firmly say "no" to people that some other projects do. I'm sorry that you're having to put up with it as well.
- If you do want to try and escalate it, I would suggest taking it directly to the admin noticeboard rather than approaching an individual admin - this will get more eyes on it. Andrew Gray (talk) 20:23, 6 November 2020 (UTC)
- @Andrew Gray: It's also bizarre to continually get arguments like "your model is fictitious" or "but that item isn't real". A model is real once it's been made. What it's modeling presumably was already real. The only real question is whether there is a useful mapping from one to the other. Gettinwikiwidit (talk) 09:51, 14 November 2020 (UTC)
- @Andrew Gray: Can you help me get some guidance as to how to hold him accountable? This is just a mindless edit war. Gettinwikiwidit (talk) 13:14, 14 November 2020 (UTC)
- @Gettinwikiwidit: It's really distressing that he behaves like this - he did not engage with the last set of proposed changes on the 9th/10th and then immediately appeared to complain you were making them on the 14th. And this is after you've dropped the model he originally objected to!
- I'm sorry you're the target of it. I think posting at Wikidata:Administrators' noticeboard is your best bet to try and get a resolution. Andrew Gray (talk) 14:01, 14 November 2020 (UTC)
- Thanks for the pointer. Gettinwikiwidit (talk) 14:07, 14 November 2020 (UTC)
@Andrew Gray: FWIW, I actually think it's not the right move to remove the end time (P582) qualifiers. I'd rather have them arguably accurate for now and guaranteed accurate in the future than conform to some convention and potentially be missing in the future. It does put a burden on the query writer, but that burden is always there. But really, I think feeding the trolls is a bigger risk to the long term usefulness of this project. I'm studiously avoiding giving them any air of legitimacy. We should not allow their low-energy complaints to overshadow actual contributions. We should champion contributions and shout down trolls. I heard the complaint, thought about it and made a reasonable call with only the usefulness of the data set in mind. But I don't object to you making a different call. Gettinwikiwidit (talk) 23:39, 29 November 2020 (UTC)
- @Gettinwikiwidit: For what it's worth, I had spotted it when writing some of the start/end date queries the other day and noted it down as something I meant to suggest changing anyway - but since it had been raised on PC it seemed sensible to add the note there. Apologies for not picking it up sooner. Andrew Gray (talk) 23:59, 29 November 2020 (UTC)
- @Andrew Gray: No worries. I do believe there is a case to be made either way. I also think that the risk these people pose to the entire community is real and that administrators aren't doing enough to mitigate that risk. Gettinwikiwidit (talk) 00:04, 30 November 2020 (UTC)
@Andrew Gray: Sadly he's at it again. I think there are severe personality issues at hand here. Gettinwikiwidit (talk) 23:27, 6 December 2020 (UTC)
- FWIW, this is one reason I would have liked to have removed the old model statements sooner rather than later. See Wikidata:Project_chat#electoral_district_cleanup for the latest. Gettinwikiwidit (talk) 01:53, 7 December 2020 (UTC)
- Moreover, his comments in this latest thread make no sense given that he seems to accept that all the statements using this old district type will be deleted. He simply isn't putting in any energy and I think energy spent explaining it to him is wasted. Gettinwikiwidit (talk) 03:31, 7 December 2020 (UTC)
@Andrew Gray: This seems like a positive development! Gettinwikiwidit (talk) 02:04, 20 January 2021 (UTC)
Recusants
Do you have any thoughts about modeling "recusant"? Discussion at Project chat. - PKM (talk) 00:04, 23 February 2021 (UTC)
- PKM Apologies - was away and missed this. Andrew Gray (talk) 21:34, 3 March 2021 (UTC)
- No apology necessary. - PKM (talk) 21:35, 3 March 2021 (UTC)
Tudor and Stuart politicians (and their constituencies)
Hi Andrew. Where are we at the moment for Tudor and Stuart politicians?
Asking pursuant to a collaboration with Viae Regiae (Q105547906). --- Viae Regiae's main project is to create a detailed gazetteer of places in England and Wales in the 1500s and 1600s, with a view to understanding the evolving road network at that time. (See their GIS "CarterGraph", on which they will be plotting the results.) This could be of immense value to wikidata, giving us a chance to make our content far more systematic and robust for place information in England and Wales at that time - the possibility of a really solid quality-controlled foundation. (Plus some pretty high-level interests would be thrilled for VR to link to their content, with the links potentially pulled from corresponding wikidata items -- eg VCH and TNA both very much on board, and VR have been making all sorts of other v positive connections). So, in the last two weeks, together with User:PKM and User:DrThneed I have been rushing to spin up a WikiProject here to partner them, Wikidata:WikiProject Early Modern England and Wales, or Wikidata:WikiProject EMEW for short. The central aim of the WikiProject will be to try to achieve 1:1 matching with their EMEW gazetteer, and also to be able to reflect any other information about the places that they bring to light.
As per the WikiProject front page, the main target for collaboration will be places (with focus areas for places of particular types, and also trying to cross-match items for places to sources like Survey of English Place-Names ID (P3627)). But also of significant interest are people -- who were the people connected to the places? ; what were the relationships between them? ; etc. -- eg all the names that crop up in Lord Burghley's Atlas (Q105468439) in the form of Lord Burghley's personal annotations re who lived where, and whether he considered them reliable. (VR is just about to start to make a full transcription of the Saxton maps, to retrieve the places marked, but the annotations show how connected people are also of considerable interest).
So I was wondering, would it be possible to set up some tabs and dashboards, perhaps under Wikidata:WP EMEW/Politicians to give an idea of our coverage of political people of the period; and what sort of areas we should be looking to try to improve, if people want to get involved? I'm thinking coverage of MPs may now be pretty good, given the work you've done. I don't know so much about the Lords, who of course were so important in the period. (And of course the more we can connect people to places, the better).
One specific question came in this morning:
- I wonder if a WD query might give us a georeferenced list of historical non-county parliamentary constituencies in the period 1530-1680, linked to a list of years in which members were returned for each? I know, for example, that Christchurch returned members in 1571 for the first time since the 14th(?) century, and similar developments elsewhere would clearly be significant to mapping economic development.
I adapted this from the query behind Wikidata:WikiProject British Politicians/MP terms by constituency, but I have no idea how complete it is. (Is 277 rows about right? Would one expect more?)
- ADDED: Animated here, as you may have seen.
Do let me know if this seems of interest. VR seem very together, and I have to say, so far I've found the collaboration a blast. Jheald (talk) 12:31, 3 March 2021 (UTC)
- I might also start adding named after (P138) to those constituencies, to allow them to be mapped. Jheald (talk) 12:44, 3 March 2021 (UTC)
- @Jheald: Hi James. Sorry I've not been picking up your comments on this - having a really complicated few weeks at the moment and not been able to put any time to WD. Interested in principle but a bit absent right now!
- Generally speaking, our coverage of pre-1832 MPs maps to HoP volume boundaries. At the moment:
- 1386-1421 - we have everyone known, with their terms and constituencies
- 1421-1504 - we only have fragmentary data
- 1509-1603 - we have everyone known, with their terms and constituencies
- 1603-1629 - we have everyone, with their terms and constituencies
- 1640 onwards - it gets fragmentary again, and very bad during the Civil War
- So for the sixteenth century, you're pretty good, but we don't have the parliaments of Henry VII in any detail. The caveat here is "everyone known" - for big chunks of this period, the historical record itself is incomplete, and so we just don't have good data. For example, in 1539-40 we only have about half of the members. In some cases, we know someone served but we're unclear on the seat (or know the seat but we're unclear on the date). I also haven't fully gone through it to sort out the oddities (eg some inconsistencies between the 1509-1558 and 1558-1603 volumes), but I think for the moment it's probably "good enough".
- The data available is usually P39 + seat, but not start/end dates - we usually don't know these with any detail and so it seems best to infer it from the term. (Oddly it looks like I only have P39 + seat not P39 + seat + term, though it would be really straightforward to get that added in bulk.)
- Lords - I would file this as "hopeless" for the moment, we haven't really done any systematic work on peers in this period. I can write up a quick model for what it should look like, but I wouldn't bet on there being any data there.
- Constituencies - as we discussed on Twitter a few weeks ago, we don't currently have much in the way of mapping of seats to geographies, though it sounds like it would be pretty straightforward to just bang a single P131 on everything pointing to either the county or the town. We don't currently distinguish "types" of seat, so a query looking for eg boroughs not counties would have trouble, unless we were to infer it from the P131s once those are added. The specific query you mention also hits up against the problem of partial data - distinguishing "there was no member returned in 1539" versus "the seat existed in 1539 so we assume there was a member returned but we don't know who they are". Not sure how best we would handle that.
- I'll try and have more of a think about this and get something together at the weekend if possible. Andrew Gray (talk) 21:25, 3 March 2021 (UTC)
- @Andrew Gray: Thanks, Andrew. No hurry. You take care of yourself. A model for peers would be great, just to give us an idea what we'd be shooting for, (and also a reminder of the model for MPs), but no urgency. Jheald (talk) 22:41, 3 March 2021 (UTC)
- Thanks. As a first stab, here is a quick report for the numbers of people and constituencies in each Parliament. You can see that it becomes a bit more comprehensive from about the mid-1500s on, but before that it's all over the shop. Andrew Gray (talk) 23:14, 3 March 2021 (UTC)
- @Jheald: Okay, here's the promised data model for MPs. At the moment, MPs in this period generally have claims along these lines:
- There is one statement per Parliament, except for the (unusual) case where they were returned for two distinct seats in the same Parliament, in which case they probably have two. A few items also have significant event (P793) qualifiers (to note special cases) or sourcing circumstances (P1480) (usually when it's dubious whether they sat or not, which tells you all you need to know about the data quality here!). No parliamentary term (P2937) qualifiers as yet but I can get those in place soon.
- The main change here compared to the "modern" data is that we don't list specific start time (P580)/end time (P582) qualifiers. For most cases it seems we don't really know this with any kind of accuracy, so it feels like adding dates here is going to be very piecemeal and/or introduce a lot of spurious accuracy. I would recommend using the dates on the membership item (eg Member of the 1542-44 Parliament (Q60585664)) or the term qualifier to infer the dates someone served as an MP.
- The data model for peers is not widely applied in this period as yet, but the approach I would recommend is something like:
- William Cecil, 1st Baron Burghley (Q354309)
- start time (P580): 25 February 1571
- end time (P582): 4 August 1598
- noble title (P97):baron (Q165503) (or a specific "Baron Burghley" if one exists)
- start time (P580): 25 February 1571
- end time (P582): 4 August 1598
- So one statement for his membership of the Lords, one for the peerage (since they may not align). Dates here start with the date he was created a peer, or inherited the title, and end with his death (or attainder, etc). For the modern Lords I am also using a "subject has role: hereditary peer", but that might seem a bit unnecessary in the sixteenth century! However, it might be helpful if you're doing the bishops as well. Andrew Gray (talk) 22:33, 5 March 2021 (UTC)
- @Jheald: Looking at constituencies now - given that there was mostly a 1:1 mapping of counties or towns onto seats at this point, I've started setting it up using located in the administrative territorial entity (P131), pointing to either the main item about the settlement (for boroughs) or the "historic county" item (for counties). Example query to try and automatically infer the type of seat (still have to actually assign it for most seats, but you can see the effect emerging - map). This also gives us a quick way to pick up coordinates, and it looks so far like we'll be able to georef every constituency that way - the coords from other items aren't perfect, but they're certainly good enough for "approximately around here". Andrew Gray (talk) 16:33, 6 March 2021 (UTC)
- Update: all pre-1707 English constituencies now have a relevant P131, except for the two universities. For the Welsh borough seats, it's a bit unclear whether they represented one town or a group of three-four towns in this period; I've listed them all under the main one for now. Andrew Gray (talk) 21:56, 6 March 2021 (UTC)
- Oh very nice. I like the layers. Huge number of seats in Devon and Cornwall ? Jheald (talk) 10:18, 7 March 2021 (UTC)
- @Jheald: Yes - the Cornish towns were very good at getting set up as parliamentary boroughs (w:Cornish rotten and pocket boroughs has a good outline of this). The other thing that is very striking to me is the lack of seats around London - other than the ancient seats of London, Westminster, and Southwark, there is nothing nearer than St Albans, Windsor or Reigate - in fact nothing within the modern M25. Did London exercise such dominance over its hinterland that no towns in a day's journey could get well enough established, I wonder? Andrew Gray (talk) 19:16, 7 March 2021 (UTC)
- Map coloured in differently (proof of concept). Jheald (talk) 17:47, 9 March 2021 (UTC)
- Alternative Jheald (talk) 18:17, 9 March 2021 (UTC)
- @Jheald: Very nice - the colours work well. I am not sure how accurate those inception dates are, though; a quick glance over this report suggests they're a bit hit and miss. Some like Bridgwater (Q60576318) have a spuriously late date (I think they were missed out of the Commonwealth parliaments and recreated at the Restoration); they should probably have their earlier inception dates instead. Conversely, others like Ripon (Q60576123) are down as 1295, as they were represented in the Model Parliament and in some other fourteenth-century ones, but were effectively recreated anew in the sixteenth - so perhaps they should be listed as the 1550s instead. This isn't something I've really delved into in much detail beyond breaking off "English" constituencies for pre-1707 instantiations, and I don't know if there's an obvious way to model something that is eg 1295-1337, 1553-1707, without two distinct items. Andrew Gray (talk) 22:20, 9 March 2021 (UTC)
- Possibly a new property "re-creation of" (or, poor man's version, "significant event" = "re-creation" / "of" = ...). We might want also something similar for peerages, if we were to distinguish "1st creation" from "4th creation" of a particular title. Jheald (talk) 07:46, 10 March 2021 (UTC)
- @Jheald: Came up with another way of looking at this today - here's every parliamentary term from 1485 to 1680, plus the number of distinct counties and boroughs represented (report). In 1509, there should notionally have been 37 counties and 110 boroughs; by 1558 there should have been 51 counties and 176 boroughs, with the number of boroughs steadily rising after that. So you can see that we are definitely undercounting for the first half of the century, but after that it's mostly complete with only one or two anomalies. I'm going to go back over the 1509-1629 uploads and see if there are any anomalies still needing fixed, which might add a handful more. Andrew Gray (talk) 16:04, 14 March 2021 (UTC)
- Tallies for the 1580s look to be in the right ball-park compared with this list from 1588 [6] (not checked in detail). Jheald (talk) 17:16, 7 April 2021 (UTC)
- @Jheald: Excellent. I've gone back over my notes and I think that 1509-58 is definitely "complete as it can be" (ie everything in HoP is included); 1558-1603 might be missing at most a dozen person-term pairs, and will be matched up when I find an evening. 1604-1629 seems to have maybe a hundred or so missing - will need to dig a bit more into that one to figure out the problem. But I'd be confident in the Tudor data being pretty much complete. Andrew Gray (talk) 21:01, 7 April 2021 (UTC)
- @Jheald: And this report is all parliamentary terms with peers known to have been in the Lords in that period. (Lords aren't linked to parliamentary terms, but it's inferred from known start/end dates). As you can see, the answer is "basically zero". Lots of work still to do there... Andrew Gray (talk) 17:43, 14 March 2021 (UTC)
- I'm slightly tied up working on manors at the moment, but this is really good stuff. Thank you! Jheald (talk) 09:26, 17 March 2021 (UTC)
Flexible reconciliation
Hey there, I thought you might find this interesting. I built a tool for running your own reconciliation service from a TSV file. I don't know if you use OpenRefine, but it's pretty useful for a lot of data related work. Reconciliation is about finding (fuzzy) matches for your data. The trouble with running against the Wikidata reconciliation service is that there's a lot of data to sift through and limited expressiveness on how to limit what you're searching through. With this tool, you can run a SPARQL query for the set of things you're looking to match against, export the results as a TSV and fire up a reconciliation service with that file. This way the filtering is as fine grained as your query.
In any event, I thought it was cool and thought I'd share. Hope you're doing well. Gettinwikiwidit (talk) 09:02, 14 April 2021 (UTC)
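As a rough illustration of the idea (this is not the tool described above, just a minimal sketch of fuzzy matching against an exported TSV), the following assumes a two-column, header-less TSV of item ID and label, and uses Python's standard `difflib` for the fuzzy comparison. The function names and the 0.6 cutoff are made up for the example.

```python
import csv
import difflib
import io


def load_candidates(tsv_text):
    """Read 'QID<TAB>label' rows (no header assumed) into a label -> QID map."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {label: qid for qid, label in reader}


def reconcile(name, candidates, cutoff=0.6):
    """Return (QID, matched label) for the closest label, or None if nothing is close."""
    matches = difflib.get_close_matches(name, list(candidates), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return candidates[best], best


# Two constituency items mentioned elsewhere on this page, as sample rows.
sample_tsv = "Q60576318\tBridgwater\nQ60576123\tRipon\n"
candidates = load_candidates(sample_tsv)
print(reconcile("Bridgewater", candidates))
```

Because the candidate set comes from whatever query produced the TSV, the matching only ever sees the items you chose to export, which is the point being made about fine-grained filtering.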
- @Gettinwikiwidit: This looks really neat - thanks for letting me know about it! I can definitely see ways it could be useful, and I'll give it a shot next time I do any OR matching. Andrew Gray (talk) 19:39, 21 April 2021 (UTC)
Sco language names
Hi, I recall you were involved in the review of some sco contributions. Yesterday I came across phab:T162406. Maybe you want to double-check them. These names are notably displayed if one switches to the interface in "sco" (together with the ones from fallback). Maybe only "Spaingie" for "es" is different from sco:Spainish_leid. If updates are needed, a ticket needs to be created in phabricator. I can do that if you list them here. One could also add currently missing sco names. --- Jura 03:53, 15 June 2021 (UTC)
- @Jura1: There's a recent attempt to put together some notes on this here - it's focused on country names but it covers languages as well. Of the ones on that list, it would suggest for the language that we should use -
- bg - Bulgairien
- da - Dens
- en - Inglis (or "Soothron" as an alternative, but I think everyone uses "Inglis")
- es - Spainish
- et - Eastlands
- he - Ebrew (or "Hebrew" as an alternative)
- lv - Letts
- no - Norn
- pl - Polls
- ro - Roumains
- sl - Slovein
- sq - Albainien
- sv - Swaidish
- tr - Turkish
- So I think only two would stay the same.
- There are no language names suggested for bs, fa, hy, la, or mk, but there are adjectives for the people of that country in some cases. There are languages given for hr & sr, but just "Serbo-Croats" which probably isn't right for what we need. The entry for China suggests that zh would be "Staundart Mandarin" but I don't think that's quite right either (would that be a different language code? cmn). If there isn't a generally accepted name for the language, what would be the best approach - have it left blank and therefore fall back to English for the time being? I'm not sure what's normally done here. Andrew Gray (talk) 18:06, 18 June 2021 (UTC)
- In languages I know, I'd leave it to the fall-back until I'm sure about the translation to use. This is the current name for "zh" in English: Chinese and it's currently on Chinese (Q7850). So we should remove all the ones not mentioned above and update per your list? In the meantime, I created phab:T285076 for this. --- Jura 11:18, 20 June 2021 (UTC)
- Thanks - have added a comment there. I've tracked down a couple more names and added them as well. Andrew Gray (talk) 15:26, 20 June 2021 (UTC)
Corporate or grouping
Was that ever solved? Opencorporates currently requires wikipedia entries that are being rejected by wikipedia editors. Any chance to supplement that with wikidata for groupings? 70.79.138.139 06:31, 6 July 2022 (UTC)
- I'm not sure, I'm afraid - it's not an area I've really been able to spend much time working in. In general WD items do, I think, reflect groupings more than individual legal entities, so they would seem to be a decent match here. It looks like OpenCorporates corporate grouping (P5256) has been set up but is not heavily used, so that might be worth exploring. Andrew Gray (talk) 21:32, 6 July 2022 (UTC)
Bot to update some references
Hi Andrew! There's a category on the English Wikipedia tracking errors in a module, with some 2600 articles: en:Category:Module:Wd reference errors. Many of those are articles on the municipalities of the Philippines that use en:Template:PH poverty incidence, which in turn uses references generated from Wikidata, and fails because someone didn't add titles. There are only 6 of those, repeating over some 1400 items. I never thought I'd edit so much on Wikidata, but did try to work on this problem, and got stuck with the tools I have (or know how to use). The thing is, I don't know how to update (or completely remove and re-add) the old references. Maybe you know how with your bot? This is an example of what we need for years 2000-2003-2006-2009-2012-2015, and these are (as far as I can tell) the items that need some attention: SPARQL query (my silly attempt). Would this be too much work? I could edit the enwiki template itself, which some might find as a little less elegant solution. Cheers! Ponor (talk) 17:12, 25 June 2023 (UTC)
- @Ponor Sure - happy to take a look at this. So just to get my head around what's needed:
- every poverty incidence (P8843) statement on a Philippines item should have a reference
- and that reference should have a title (P1476) value
- and all the references sourced to a specific URL can have the same title (so if one is set, I can just copy it over)
- Does that all sound correct? Looking at your 2000 sample, a handful have it set already (list) and so cribbing those should be straightforward. Andrew Gray (talk) 18:47, 25 June 2023 (UTC)
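The "copy the title over" step in the checklist above can be sketched as follows. This is only an illustration working on plain dictionaries, not on live Wikidata data: the field names (`item`, `url`, `title`) and the function name are assumptions made for the example, and the rows would in practice come from a Query Service export.

```python
def fill_missing_titles(references):
    """Give every reference the title already known for its URL, if any.

    `references` is a list of dicts with 'item', 'url' and 'title' keys
    (hypothetical field names); 'title' is None when it is missing.
    """
    # First pass: remember the first title seen for each URL.
    known = {}
    for ref in references:
        if ref.get("title"):
            known.setdefault(ref["url"], ref["title"])

    # Second pass: copy that title onto references that lack one.
    filled = []
    for ref in references:
        title = ref.get("title") or known.get(ref["url"])
        filled.append({**ref, "title": title})
    return filled


# Made-up sample rows; only the URL and title are taken from this thread.
url = "https://psa.gov.ph/sites/default/files/NSCB_LocalPovertyPhilippines_0.pdf"
rows = [
    {"item": "item-A", "url": url, "title": "Estimation of Local Poverty in the Philippines"},
    {"item": "item-B", "url": url, "title": None},
]
for row in fill_missing_titles(rows):
    print(row["item"], row["title"])
```

References whose URL has no titled example anywhere stay untitled, so those would still need checking by hand.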
- Well thank you! I wanted to see if you can do it before providing more detail. I believe there are only 6 files referenced for those 6 years. Not sure if the existing titles are exact, was gonna check it for you/us. You're correct: I believe they all have a reference, many are missing a title, and the title is the only thing I'd be adding. Ponor (talk) 19:22, 25 June 2023 (UTC)
- To be fair, I sort of said I could do it without actually checking I could, but after a lot of back and forth I've figured out the syntax :-)
- References are weird, programmatically speaking: it seems you have to create a new one and then delete the old one if using a script. I've worked out how to do this on a case-by-case basis - example, if that looks OK? - and I'm going to work out now how to do it for the batch. Will let you know when that's up and running. Andrew Gray (talk) 19:36, 25 June 2023 (UTC)
- @Ponor Okay, here's the final update: I can't quite get the script I normally use to add new references in a batch. However, what I can do is a workaround:
- Run a QuickStatements batch to add a second reference to the existing claim
- Run a script later to remove all the surplus old references
- It's a little fiddly and might have unexpected issues if the same value is present two years running, but fingers crossed it'll all go smoothly. Running 2000 now... Andrew Gray (talk) 22:01, 25 June 2023 (UTC)
- Right - looks like it's going in OK. I'll set up the 2003/6/etc batches tomorrow. For reference:
- All references on those statements, with and without title
- Check for statements that have got two dates applied to them, in case of anomalies
- Andrew Gray (talk) 22:18, 25 June 2023 (UTC)
- Ouch... It's never easy, is it? :( I don't understand why they made parts of Wikidata so different, sort of un-addressable. What you did so far looks good, and I thank you for that once again. Curious about your process (in case this new addiction of mine does not go away, haha): do you export the results from query.wikidata.org and use your scripts to work on them, or is there a better way? Ponor (talk) 12:38, 26 June 2023 (UTC)
@Ponor It's a little convoluted because I think editing existing references at scale is quite an unusual problem. I would normally use wikibase-cli for this sort of problem: it lets you target specific claims or specific references very easily and set up large batches of edits. However, the problem I hit was that the batch mode doesn't seem to play nicely with the format for adding new references (which requires several statements at once). I could get it to work if I edited one item at a time (which is what I did in the original test with San Esteban), but that is a bit antisocial since it means the script logs into the server each time, and after a few thousand of those someone would probably grumble. It is certainly possible there is an elegant solution here, I just couldn't quite work it out!
What I'm doing at the moment is exporting the existing item + desired value + year qualifier from a Query Service report, then adding a reference in the desired format and running it through QuickStatements.
Q12818 P8843 31.51 P585 +2000-00-00T00:00:00Z/9 S123 Q17067223 S577 +2005-11-29T00:00:00Z/11 S854 "https://psa.gov.ph/sites/default/files/NSCB_LocalPovertyPhilippines_0.pdf" S1476 en:"Estimation of Local Poverty in the Philippines"
This creates a new statement on the target item with the desired value, qualifier, and references. However, QuickStatements won't add a new statement if one already exists with that value - normally an annoying bug, but here it's a useful feature. It means the new reference gets tacked on to the existing statement. In the unlikely event the item has the same poverty incidence (P8843) values in two different years, it'll add the year qualifier as well and then I can check for it as a weird exception.
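For anyone wanting to generate lines like the one above from an exported report, here is a minimal sketch. It assumes the QuickStatements V1 text format with tab-separated fields (the example line above is shown with spaces), and the function and parameter names are invented for illustration. It does not escape double quotes inside titles.

```python
def quickstatements_line(item, value, year, publisher, published, url, title, lang="en"):
    """Build one tab-separated QuickStatements V1 command for a P8843 value.

    The property layout simply mirrors the example line in this thread:
    P585 qualifier at year precision (/9), then S123, S577 (day precision, /11),
    S854 and S1476 as the reference. Titles containing double quotes are not handled.
    """
    parts = [
        item, "P8843", str(value),
        "P585", f"+{year}-00-00T00:00:00Z/9",
        "S123", publisher,
        "S577", f"+{published}T00:00:00Z/11",
        "S854", f'"{url}"',
        "S1476", f'{lang}:"{title}"',
    ]
    return "\t".join(parts)


print(quickstatements_line(
    item="Q12818",
    value="31.51",
    year=2000,
    publisher="Q17067223",
    published="2005-11-29",
    url="https://psa.gov.ph/sites/default/files/NSCB_LocalPovertyPhilippines_0.pdf",
    title="Estimation of Local Poverty in the Philippines",
))
```

Passing the value as a string keeps it exactly as exported, so it matches the existing statement rather than a rounded version of it.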
Once that's done, I can then use wikibase-cli to remove the old reference. This is a lot more straightforward and I can run it as a batch:
echo ' ["Q12818$D562A8C1-D629-47EC-B5C0-D4ABD97AB80A", "6b1fed4b6069c3c14f8aeba940eaf118fd4a7878"] ' | wd rr --batch
The first value here is the statement ID, the second is the reference hash - so it knows which claim to target, and which reference to remove from it. Both can be obtained from the Query Service, though the first one needs a little formatting tweak (the first - becomes $).
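The formatting tweak can be sketched like this, assuming the Query Service returns the statement and reference as full URIs under the usual `http://www.wikidata.org/entity/statement/` and `http://www.wikidata.org/reference/` prefixes. The helper names are made up for the example; the output is one JSON array per line in the same shape as the command above.

```python
import json

STATEMENT_PREFIX = "http://www.wikidata.org/entity/statement/"
REFERENCE_PREFIX = "http://www.wikidata.org/reference/"


def statement_guid(statement_uri):
    """Turn a Query Service statement URI into a claim ID (first '-' becomes '$')."""
    local = statement_uri
    if local.startswith(STATEMENT_PREFIX):
        local = local[len(STATEMENT_PREFIX):]
    entity, sep, rest = local.partition("-")
    if not sep:
        raise ValueError(f"not a statement identifier: {statement_uri!r}")
    return f"{entity}${rest}"


def reference_hash(reference_uri):
    """Strip the reference prefix, leaving only the hash."""
    if reference_uri.startswith(REFERENCE_PREFIX):
        return reference_uri[len(REFERENCE_PREFIX):]
    return reference_uri


def remove_reference_line(statement_uri, reference_uri):
    """One batch line: a JSON array of [claim ID, reference hash]."""
    return json.dumps([statement_guid(statement_uri), reference_hash(reference_uri)])


print(remove_reference_line(
    "http://www.wikidata.org/entity/statement/Q12818-D562A8C1-D629-47EC-B5C0-D4ABD97AB80A",
    "http://www.wikidata.org/reference/6b1fed4b6069c3c14f8aeba940eaf118fd4a7878",
))
```

Only the first hyphen is replaced, since the rest of the ID is a UUID that legitimately contains hyphens.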
Sorry if that was a lot more detail than you wanted, but it seemed like it might be interesting! 2003 running just now, and will get the rest queued up. Once they're all in I'll sanity-check the output and then delete the titleless references. Andrew Gray (talk) 17:26, 26 June 2023 (UTC)
- Andrew: WOW! I did (secretly) want an answer like this, but did not want you to spend too much time writing it. TYVM! Seems like I'll have to take a look at wikibase-cli: I was adding some new data from OpenRefine, which was supposed to overwrite the existing data (delete + add) - and it did, but the old ("imported from") reference remained. Glad that you mentioned that the "...a7878" string is a reference *hash*, because I was just going to ask you why they all are the same. (as they should be — for the same year)
- Happy to learn, and you're a very kind teacher. Ponor (talk) 17:56, 26 June 2023 (UTC)
- @Andrew Gray, when you get to year 2012: there's a typo in the URL: "Estima7tes" should be "Estimates". I just noticed it checking the hashes :D Ponor (talk) 18:10, 26 June 2023 (UTC)
- @Ponor Very happy to help! References are a little weird and I don't quite fully understand it - I think the idea is that if dozens of statements all share an identical reference, it doesn't have to be stored over and over again. Which makes sense, but does mean you can't tweak an existing one. It's a pity there isn't an easy way to say "I know reference hash ...a7878 already exists, please attach it to this statement as well".
- 2006 going in now; 2009/12/15 are set up but QS can get a bit unhappy if you try running several batches at once so I'll run them in sequence. It looks like all but two for 2018 have a full citation with title - I've manually fixed those two to be consistent with the others.
- I had just spotted the dodgy link for 2012 - took me a while to figure out why the file wasn't responding :-) Andrew Gray (talk) 19:03, 26 June 2023 (UTC)
- @Ponor Had a look at this tonight to finish up - all looks good. I tried doing one manually first to check it was all OK -
- Q1478#P8843 has only correctly titled references, and w:Caloocan#Economy is now showing all five references correctly.
- Q13711#P8843 was left with broken plus correct references, and w:Abra (province)#Economy is showing broken references plus the error tracker category.
- Both still have the broken graph + no table, but I think that is the way the template is set up - the code doesn't seem to produce a table in the way that eg the census one does. So it looks like removing the faulty references is necessary, but that it'll work smoothly after that. I'll run it tonight. Andrew Gray (talk) 20:31, 27 June 2023 (UTC)
- Yay! Glad it worked. Now, I'm no Philippinian, but I'm sure some people will be grateful as much as I am for getting rid of all those warning messages and emptying the category by half. Oh the things we do... :) Ponor (talk) 21:12, 27 June 2023 (UTC)
- It's nice to have solved the problem! And doing a bit of a skim through the rest of the category suggests the vast majority of similar problems are refs with a bare URL and no title. Which I admit I am pretty sloppy at myself, so an impetus there to do better in future.
- This does sound a bit like a plausible task for a roving bot - find title-less refs, scrape a title and add it, prioritise widely used items - but I guess you'd want some manual oversight so it doesn't end up titling them all DEAD LINK 404 or PLEASE REGISTER TO READ - I'll have a think about how this might work. Andrew Gray (talk) 22:38, 27 June 2023 (UTC)
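The manual-oversight concern could be partly handled by a simple filter that holds back suspicious scraped titles for a human to look at. The sketch below is purely hypothetical: the pattern list, the minimum length and the function name are all assumptions, and a real bot would need a much longer list tuned against actual scrape results.

```python
import re

# Phrases that suggest the scraper hit an error or login page, not the document.
SUSPICIOUS = re.compile(
    r"404|not found|dead link|please register|access denied",
    re.IGNORECASE,
)


def needs_manual_review(title, min_length=5):
    """Return True if a scraped title looks too short or like an error page."""
    cleaned = title.strip()
    return len(cleaned) < min_length or bool(SUSPICIOUS.search(cleaned))


for candidate in [
    "DEAD LINK 404",
    "PLEASE REGISTER TO READ",
    "Estimation of Local Poverty in the Philippines",
]:
    verdict = "review" if needs_manual_review(candidate) else "ok"
    print(f"{verdict}: {candidate}")
```

Anything flagged would go to a review queue rather than being written to Wikidata, so the bot only auto-adds titles that pass the check.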
- Update: looks like we got caught out a little - we were only running the job on municipalities. There's a number of places still in the category that are something else. Broadening the net identifies ~230 items still with dodgy references that fit country (P17):Philippines (Q928) (report). I'll have a look at tidying those up tomorrow. Andrew Gray (talk) 22:53, 27 June 2023 (UTC)
- Yay! Glad it worked. Now, I'm no Philippinian, but I'm sure some people will be grateful as much as I am for getting rid of all those warning messages and emptying the category byhalf. Oh the things we do... :) Ponor (talk) 21:12, 27 June 2023 (UTC)
- @Ponor Had a look at this tonight to finish up - all looks good. I tried doing one manually first to check it was all OK -
- @Andrew Gray, when you get to year 2012: there's a typo in the URL: "Estima7tes" should be "Estimates". I just noticed it checking the hashes :D Ponor (talk) 18:10, 26 June 2023 (UTC)
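The roving-bot idea floated in this thread (find title-less refs, scrape a title, prioritise widely used items, keep a human in the loop) could be sketched roughly as below. This is only an illustration under stated assumptions, not an existing bot: the names `extract_title`, `needs_review` and `triage`, the `SUSPECT_PATTERNS` list, and the `(item_id, usage_count, html)` input shape are all hypothetical, and fetching the pages and writing back to Wikidata are left out.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns: a scraped title matching any of these is sent to a
# human instead of being saved (the "DEAD LINK 404" problem).
SUSPECT_PATTERNS = [
    r"\b404\b",
    r"not found",
    r"dead link",
    r"please (register|log ?in|sign ?in)",
    r"access denied",
    r"error",
]


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = _TitleParser()
    parser.feed(html)
    title = " ".join("".join(parser.parts).split())
    return title or None


def needs_review(title):
    """True when the title is missing or looks like an error/login page."""
    if title is None:
        return True
    lowered = title.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPECT_PATTERNS)


def triage(candidates):
    """Split candidates into (auto, manual) lists, most-used items first.

    candidates: iterable of (item_id, usage_count, html) tuples (assumed shape).
    """
    auto, manual = [], []
    for item_id, _usage, html in sorted(candidates, key=lambda c: c[1], reverse=True):
        title = extract_title(html)
        (manual if needs_review(title) else auto).append((item_id, title))
    return auto, manual
```

Anything landing in the `manual` list would be checked by hand before a title is added, which is the oversight step mentioned above; the pattern list would need tuning against real pages.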
Call for participation in a task-based online experiment
Dear Andrew Gray,
I hope you are doing well,
I am Kholoud, a researcher at King's College London, and I am working on a project as part of my PhD research, in which I have developed a personalised recommender model that suggests Wikidata items for the editors based on their past edits. I am collaborating on this project with Elena Simperl and Miaojing Shi.
I am inviting you to a task-based study that will ask you to provide your judgments about the relevance of the items suggested by our model based on your previous edits.
Participation is completely voluntary, and your cooperation will enable us to evaluate the accuracy of the recommender system in suggesting relevant items to you. We will analyse the results in anonymised form, and they will be published in a research venue.
The experiment should take no more than 15 minutes, and it will be held next week.
If you agree to participate in this study, please either contact me at kholoud.alghamdi@kcl.ac.uk or use this form https://docs.google.com/forms/d/e/1FAIpQLSfA1wfdBfCRlcG3WhDyc-V8lzgPNx3fDFCNXkyn4CSwahXZ_A/viewform?usp=sf_link
Then, I will contact you with the link to start the study.
For more information about my project, please read this post: https://www.wikidata.org/wiki/User:Kholoudsaa
In case you have further questions or require more information, don't hesitate to contact me through my mentioned email.
Thank you for considering taking part in this research.
Regards Kholoudsaa (talk) 22:06, 5 October 2023 (UTC)
Mass modification that follows the recommendations (AGbot)
Hi,
Following this modification, whether it is you or your bot that adds references, would it be possible to add the references as stipulated on the Help:Sources page, especially on this page?
It takes this form there.
I thank you in advance.
Cordially. ―Eihel (talk) 10:41, 29 May 2024 (UTC)
SIMD chat on Wikidata
Hi, just popping in to say thank you for your comments on the SIMD Wikidata question. I will have a think about how the data could be organised so that it is useful but not too granular. Thanks! Drkirstyross (talk) 12:31, 3 June 2024 (UTC)