Wikidata:EveryPolitician/Proposal:Term Membership Items

How we record position held (P39) statements for members of a national legislature has evolved in several stages over the last couple of years.

The most basic way to represent this is a bare member of parliament (Q486839), possibly with an of (P642) qualifier pointing to the legislature in question, e.g.:

Over time, the preferred approach shifted to using legislature-specific items for these (thanks in large part to a lot of work by Andrew Gray creating all the relevant items: Wikidata:EveryPolitician#By_country), i.e.

The initial approach for adding dates to these was to add a start time (P580) when someone first entered Parliament, and an end time (P582) when they finally lost their seat (or leave it open-ended if still serving), even when this spanned multiple terms and elections:

Trudy Harrison (Q28834855) position held (P39) Member of Parliament (Q16707842)
  start time (P580) 2017-02-24

We now realise that it's better to split these up so the membership in each legislative term (Q15238777) is a separate entry, both because it's then simpler to query the data in many cases, and because it lets us store much richer information:

I now propose that we take an additional step, and create distinct subclasses of Member of Parliament (Q16707842) (and its equivalents in other legislatures) for each legislative period: e.g.

This will have several useful benefits:

  1. Once the relevant item is created, we require a little less repetition for each P39 we create, thus reducing the chances of ending up with inconsistent data (and saving a little bit of effort/time on each edit)
  2. It lets us store information about all the MPs in a given term, distinct from information about the term itself (e.g. what the salary for being an MP was during that term)
  3. It makes many queries against the data significantly easier: for example queries that compare the average age or gender breakdown of members across multiple historical terms will only require simple wdt: queries — which beginners are generally much more comfortable with — rather than the more advanced version required for accessing qualifiers.
  4. It allows us to use QuickStatements to add/correct the vast majority of these records. This is currently not possible as the tool doesn't cope with multiple P39s pointing at the same target (e.g. Member of Parliament (Q16707842)). We would still need to either hand-edit, or use a bespoke bot, for cases where someone held multiple P39s during the same term (e.g. when they cross the floor, or — in countries with a stricter split between the legislature and executive than the UK — when they take a cabinet post and then later return to the legislature again), but such cases are relatively small in number, and often need more manual work anyway to properly record the non-standard details.
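To make benefits 1 and 3 concrete, here is a minimal Python sketch that models P39 statements as plain dictionaries rather than querying Wikidata. Apart from Member of Parliament (Q16707842), every ID in it (Q_A, TERM_1, MP_TERM_1 and so on) is an invented placeholder. Counting members per term needs a qualifier lookup in the current model, but only the statement's main value in the proposed one. This is the same shift as going from a qualifier query to a simple wdt: query.

```python
from collections import Counter

MP = "Q16707842"  # Member of Parliament (Q16707842); all other IDs are hypothetical

# Current model: statements point at the generic position and carry the term
# in a parliamentary term (P2937) qualifier.
old_model = [
    {"person": "Q_A", "value": MP, "qualifiers": {"P2937": "TERM_1"}},
    {"person": "Q_A", "value": MP, "qualifiers": {"P2937": "TERM_2"}},
    {"person": "Q_B", "value": MP, "qualifiers": {"P2937": "TERM_2"}},
]

# Proposed model: hypothetical term-specific subclasses, each recording its
# parent position (P279) and its term (P2937) once, on the item itself.
membership_items = {
    "MP_TERM_1": {"P279": MP, "P2937": "TERM_1"},
    "MP_TERM_2": {"P279": MP, "P2937": "TERM_2"},
}
new_model = [
    {"person": "Q_A", "value": "MP_TERM_1", "qualifiers": {}},
    {"person": "Q_A", "value": "MP_TERM_2", "qualifiers": {}},
    {"person": "Q_B", "value": "MP_TERM_2", "qualifiers": {}},
]


def members_per_term_old(statements):
    """Count P39 statements per term by reading each statement's qualifier."""
    return Counter(
        s["qualifiers"]["P2937"]
        for s in statements
        if s["value"] == MP and "P2937" in s["qualifiers"]
    )


def members_per_term_new(statements):
    """Count P39 statements per membership item using the main value alone."""
    return Counter(
        s["value"]
        for s in statements
        if membership_items.get(s["value"], {}).get("P279") == MP
    )
```

Both functions produce the same counts. The only difference is where the term is read from: a qualifier in the first case, the statement's value in the second.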

There are scenarios where this approach would not work, for example where a legislature doesn't have parliamentary terms (e.g. House of Lords (Q11007)). However, I am only aware of two primary chambers where this is the case (Argentine National Congress (Q646190) and Pontifical Commission for the Vatican City State (Q7478146)). In any case, this approach is almost entirely backwards-compatible with the existing one, so it could be implemented gradually wherever it makes sense to do so.

Thoughts?

--Oravrattas (talk) 11:04, 23 June 2017 (UTC)

@Oravrattas: This is a really interesting proposal and I like the way it degrades gracefully - we can always fall back on the simpler data models as a first step. Have you spoken to any of the non-UK groups about it? I am not absolutely convinced it's the best solution and I think it could get confusing for some of the more basic queries (eg a simple "count all MPs" becomes a bit trickier) but it certainly seems tempting. One possible edge case is people with staggered electoral terms - the obvious example is US senators, who are elected for a six-year term covering three "congressional terms" - but we'd have a little trouble representing them accurately in the older system anyway.
Perhaps we could try rolling it out for a small and well-defined group. Scottish MSPs would be a good training ground; there are 300 of them over five Parliaments. Andrew Gray (talk) 10:52, 24 June 2017 (UTC)
Yes, when you want to operate on the entire list of all MPs ever, that becomes a little tricky. But it is not really any more so than in the current system, if you wanted that list to include Member of Parliament (Q16707842), Member of Parliament of Great Britain (Q18015642), and Member of Parliament in the Parliament of England (Q18018860). If anything, the new approach would actually be easier, as those three don't share a unique parent at the minute. With the new approach, you would only need to change ?person wdt:P39 wd:Q16707842 to ?person wdt:P39/wdt:P279* wd:Q16707842. The * matches zero or more subclass steps, so people whose P39 still points directly at Member of Parliament (Q16707842) are included too.
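A SPARQL path such as wdt:P39/wdt:P279* follows position held (P39) and then zero or more subclass of (P279) steps. That is why it matches both old-style and new-style entries. The Python sketch below mimics that path over a hypothetical in-memory graph; the query service does not evaluate paths this way. Names such as MP_TERM_1 and person_A are placeholders.

```python
# Hypothetical subclass-of (P279) edges: child item -> parent items.
SUBCLASS_OF = {
    "MP_TERM_1": ["Q16707842"],
    "MP_TERM_2": ["Q16707842"],
}

# Hypothetical position held (P39) values per person.
POSITIONS_HELD = {
    "person_A": ["MP_TERM_1", "MP_TERM_2"],  # new-style, term-specific entries
    "person_B": ["Q16707842"],               # old-style entry on the generic item
    "person_C": ["Q_OTHER_POSITION"],        # unrelated position
}


def reaches(item, target, graph):
    """True if `target` is reachable from `item` in zero or more P279 steps."""
    seen, stack = set(), [item]
    while stack:
        current = stack.pop()
        if current == target:  # zero steps counts, as with the * operator
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return False


def all_members(target, positions=POSITIONS_HELD, graph=SUBCLASS_OF):
    """People with any P39 value equal to `target` or a transitive subclass of it."""
    return sorted(
        person
        for person, held in positions.items()
        if any(reaches(h, target, graph) for h in held)
    )
```

Here all_members("Q16707842") returns both person_A and person_B. Asking for a single term item, such as "MP_TERM_1", returns only the people who held that specific membership.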
As for other groups, I actually discovered yesterday evening (after writing this) that this is the way that the German project has already been recommending to model things: Wikidata:WikiProject Heads of state and government/Germany#Mitglied des Landtag (I only discovered this page for the first time yesterday — its proposals go a long way beyond Heads of State|Government!)
Trying this out on something like Member of the Scottish Parliament (Q1711695) might be a good idea. Do you already have any data lined up for bulk update there? If not, Member of the Legislative Assembly of Northern Ireland (Q3272410) might be slightly easier, as Scotland and Wales both have the slightly more complicated mixed-membership model with different constituency types.
--Oravrattas (talk) 11:48, 24 June 2017 (UTC)
I also think the US version isn't too bad. At the minute we would already give someone three position held (P39): United States senator (Q13217683) entries, with parliamentary term (P2937) qualifiers of 114th United States Congress (Q16146771), 115th United States Congress (Q18740945), and 116th United States Congress (Q28227688). Only the first of these would have an elected in (P2715) qualifier. The new version would simply replace each of those with an equivalent membership item. --Oravrattas (talk) 11:57, 24 June 2017 (UTC)
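One way to picture that replacement is as a mechanical rewrite of each statement. In this hypothetical Python sketch, SENATOR_114, SENATOR_115 and SENATOR_116 stand in for membership items, and ELECTION_2014 is a placeholder. The sketch assumes the term is afterwards read from the membership item, so it drops the parliamentary term (P2937) qualifier and keeps every other qualifier, such as elected in (P2715). Statements without a matching item pass through unchanged, which fits a gradual rollout.

```python
SENATOR = "Q13217683"  # United States senator (Q13217683)

# Hypothetical lookup: (generic position, parliamentary term) -> membership item.
MEMBERSHIP_ITEMS = {
    (SENATOR, "Q16146771"): "SENATOR_114",  # 114th United States Congress
    (SENATOR, "Q18740945"): "SENATOR_115",  # 115th United States Congress
    (SENATOR, "Q28227688"): "SENATOR_116",  # 116th United States Congress
}


def to_membership_statement(statement):
    """Point one P39 statement at its term-specific membership item.

    Assumes the membership item records the term itself, so the P2937
    qualifier is dropped; all other qualifiers are copied across. Returns the
    original statement unchanged when no membership item is known.
    """
    qualifiers = dict(statement["qualifiers"])  # copy, so the input is untouched
    term = qualifiers.get("P2937")
    target = MEMBERSHIP_ITEMS.get((statement["value"], term))
    if target is None:
        return statement
    del qualifiers["P2937"]
    return {"value": target, "qualifiers": qualifiers}
```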
I don't have the MSPs ready to go right now, but I've just tried doing the first parliament. It took about fifteen minutes to crosscheck against the WP list to get all the elections, by-elections, etc. in place. I'll see if I can prep the rest. Multiple constituency types shouldn't be a major issue, as you still sit for one named seat. The only real weirdness I've found so far is that list members don't have by-elections; their replacement just gets co-opted... Andrew Gray (talk) 22:12, 24 June 2017 (UTC)
Okay, done. Took me about half an hour per parliamentary term, working from the WP lists, when all was said and done. This suggests a Westminster term will probably take an hour or more. Still not unachievable, though. I'll run the upload tomorrow and we can see what happens. Andrew Gray (talk) 23:20, 24 June 2017 (UTC)
Excellent! I've also gone through all those by-elections and added the full date for each (the original bot only added years). The one tricky one was 2000 Glasgow Anniesland by-elections (Q5566745), where the death of Donald Dewar (Q333158) triggered both Westminster and Scottish Parliament by-elections, so I've split those out as separate items. (I've had lots of practice doing that for countries that hold parliamentary and presidential elections at the same time, where Wikidata very often has only a single entry, resulting in confusing properties and information.) --Oravrattas (talk) 06:18, 25 June 2017 (UTC)
Upload running now. Looking good so far; see Adam Ingram (Q349826) as an example. Andrew Gray (talk) 10:22, 25 June 2017 (UTC)
And as expected it falls over for multiple P39s in the same term - Alex Fergusson (Q559046) - but it should be easy enough to identify these. Can we run a report on any P39 with, say, two P580 qualifiers? I have a list that I can work through of probable conflicts (there are 29) but it would be good to develop a query which we can use to find them in future. Andrew Gray (talk) 10:56, 25 June 2017 (UTC)
SELECT DISTINCT ?who ?whoLabel ?positionLabel ?start1 ?start2 {
  ?position wdt:P279* wd:Q1711695 .
  ?who p:P39 [ ps:P39 ?position ; pq:P580 ?start1 ; pq:P580 ?start2 ] .
  FILTER (?start2 > ?start1) 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?start1 ?whoLabel
--Oravrattas (talk) 12:58, 25 June 2017 (UTC)
This works for all qualifier properties with one small change - != instead of > in the filter. Hurrah! Andrew Gray (talk) 20:30, 25 June 2017 (UTC)
!= will get each pair twice, but should otherwise be fine. There's probably some other way to get them only once, like comparing the Qid but I'll need to experiment a bit more with that. --Oravrattas (talk) 20:41, 25 June 2017 (UTC)
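Using != makes the duplicate check work for item-valued qualifiers, but it reports each pair twice. The usual fix is to impose an order on each pair so that every unordered pair appears only once. The Python sketch below shows this over hypothetical in-memory statements, with qualifier values held as lists. In SPARQL, the corresponding idea would be a filter such as FILTER(STR(?q1) < STR(?q2)) (variable names hypothetical), which compares the string form of the values or their Qids. That filter is an untested suggestion rather than a verified query.

```python
from itertools import combinations


def duplicate_qualifier_pairs(statements, prop):
    """Return (person, first, second) for every pair of values of `prop`
    found on a single statement.

    Values are de-duplicated and sorted by their string form, and
    combinations() then yields each unordered pair exactly once, like an
    ordering filter (<) rather than an inequality filter (!=).
    """
    found = []
    for s in statements:
        values = sorted(set(s["qualifiers"].get(prop, [])), key=str)
        for first, second in combinations(values, 2):
            found.append((s["person"], first, second))
    return found
```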
Okay, all done. There are 29 "duplicates" as noted above. I have a list of all of these and will fix them later, but will leave them as is just now so we can test some queries :-). It also looks like we've had some glitches - eg Bill Butler (Q861989) got part-way through the third entry and didn't complete it. QuickStatements doesn't seem to think there was a problem. We'll need reports to find these gaps as well. Andrew Gray (talk) 12:03, 25 June 2017 (UTC)
@Andrew Gray: that's superb! I've added a sample report to Talk:Q30580541; is that the sort of thing you had in mind, or are there other versions you think we need? Once we get this one working how we like, we can replicate it for the other terms. --Oravrattas (talk) 12:58, 25 June 2017 (UTC)
Yes, tabular reports like that are perfect. Seems to have some issues, though, with multiple entries for a person - eg David Steel is currently showing up once in the main list, although I've split his P39 into two sections and he's appearing twice in the WQS report.
I also like the "duplicate qualifiers" check above - we can write one of these for each of the standard qualifiers and then it'll be fairly easy to check they're all fine. Andrew Gray (talk) 16:47, 25 June 2017 (UTC)
Ah, yes, I've run into that before, where Template:Wikidata list only creates a single row for the main Item. I'm not sure what the best way to work around that is. Perhaps @Magnus Manske: might have a suggestion? --Oravrattas (talk) 18:44, 25 June 2017 (UTC)
Okay, final update for the day. All done and first-pass error-checked. Every item with duplicate qualifiers has been fixed; every parliament's members have been crosschecked to see that they all have constituency, party, start dates, end dates, and end reason (except current ones, obviously, who won't have the last two). Let's see what fun stats we can generate to demonstrate the utility of this approach :-) Andrew Gray (talk) 20:30, 25 June 2017 (UTC)
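That crosscheck can also be expressed as a small, repeatable gap report. The Python sketch below is a hypothetical helper, not the check that was actually run. It stores each membership as a dictionary with plain field names (constituency, party, start, end, end_reason) rather than Wikidata property IDs. It also simplifies "current" to "belongs to the current term", so a member who left part-way through the current term would not be flagged for a missing end date. A half-finished entry, like the Bill Butler (Q861989) upload glitch, would be expected to show up as missing fields.

```python
# Hypothetical field names standing in for the qualifiers being checked.
REQUIRED_ALWAYS = ("constituency", "party", "start")
REQUIRED_IF_FINISHED = ("end", "end_reason")


def missing_fields(membership, is_current):
    """Required fields that are absent or empty on one membership entry."""
    required = REQUIRED_ALWAYS + (() if is_current else REQUIRED_IF_FINISHED)
    return [field for field in required if not membership.get(field)]


def gap_report(memberships, current_term):
    """Map (person, term) -> missing fields, omitting complete entries.

    Simplification: every entry in `current_term` is treated as still serving.
    """
    report = {}
    for m in memberships:
        missing = missing_fields(m, is_current=(m["term"] == current_term))
        if missing:
            report[(m["person"], m["term"])] = missing
    return report
```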
Woohoo! A couple of quick examples:
Gender breakdown by term:
SELECT DISTINCT ?positionLabel ?genderLabel (COUNT(DISTINCT ?who) AS ?count) {  
  ?position wdt:P279 wd:Q1711695 .
  ?who wdt:P39 ?position .
  OPTIONAL { ?who wdt:P21 ?gender }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?positionLabel ?genderLabel
ORDER BY ?positionLabel ?genderLabel
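For readers who want to check the counting logic offline, here is the same grouping in Python over hypothetical data. Like COUNT(DISTINCT ?who), it counts each person once per membership item, even if their P39 has been split into several statements. Like the OPTIONAL clause, it keeps people with no recorded gender, grouping them under None. Unlike the query, it assumes a single gender value per person.

```python
from collections import Counter


def gender_breakdown(memberships, genders):
    """Count distinct people per (membership item, gender).

    memberships: iterable of dicts with 'person' and 'position' keys.
    genders: dict of person -> gender; people missing from it count as None.
    """
    pairs = {(m["position"], m["person"]) for m in memberships}
    return Counter((position, genders.get(person)) for position, person in pairs)
```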
Average age by term:
SELECT DISTINCT ?positionLabel (ROUND(AVG(?age)) AS ?avage) {
  ?position wdt:P279 wd:Q1711695 .
  ?position wdt:P2937 ?term .
  ?term wdt:P580 ?termstart .

  ?who wdt:P39 ?position ; wdt:P569 ?birthdate .
  BIND(ROUND((?termstart - ?birthdate)/365.2425) AS ?age)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?positionLabel
ORDER BY ?positionLabel
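The age query divides a day count by 365.2425 and rounds, which gives each member's age to the nearest year. Ages in completed years are the usual convention, and they are easier to compute exactly outside SPARQL. The Python sketch below does this over hypothetical data, so it can come out slightly lower than the query. It also ignores duplicate (person, term) pairs, so a member whose P39 was split into two statements is counted only once.

```python
from datetime import date
from statistics import mean


def age_on(birth: date, on: date) -> int:
    """Age in completed years on the date `on`."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def average_age_by_term(memberships, births, term_starts):
    """Average age (completed years) at each term's start, rounded to a whole year.

    memberships: iterable of (person, term) pairs; duplicates are ignored.
    births: person -> date of birth; people without one are skipped.
    term_starts: term -> date the term began.
    """
    ages = {}
    for person, term in set(memberships):
        if person in births:
            ages.setdefault(term, []).append(age_on(births[person], term_starts[term]))
    return {term: round(mean(values)) for term, values in ages.items()}
```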
Will put some more together over the next few days… --Oravrattas (talk) 20:41, 25 June 2017 (UTC)


  • @Andrew Gray, Oravrattas: When I first read this and realised what you had done, I was a bit concerned about whether it would still be easy to extract a report in the previous, more concise form, for showing in infoboxes etc.
But it seems straightforward enough. (Not sure I could translate it into Lua though!)
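One way to produce the more concise, infobox-style view from per-term entries is to merge memberships for the same seat across consecutive terms. The Python sketch below assumes each entry carries a numeric term ordinal (1 for the first parliament, and so on), a seat, and ISO start and end dates. Those field names are hypothetical; a real version would read the ordinal from the term item. Entries for the same seat within one term are merged too, for example a P39 split because of a party change.

```python
def collapse_spans(memberships):
    """Merge per-term memberships into continuous spans for an infobox.

    Each membership is a dict with a numeric 'term' ordinal, a 'seat', and
    ISO 'start'/'end' date strings ('end' is None while still serving).
    Entries for the same seat in the same or the next term are merged.
    Returns dicts with 'seat', 'start' and 'end'; the input is not modified.
    """
    spans = []
    for entry in sorted(memberships, key=lambda e: (e["term"], e["start"])):
        last = spans[-1] if spans else None
        if last and last["seat"] == entry["seat"] and entry["term"] - last["term"] in (0, 1):
            last.update(end=entry["end"], term=entry["term"])
        else:
            spans.append(dict(entry))  # copy so later updates don't touch the input
    return [{"seat": s["seat"], "start": s["start"], "end": s["end"]} for s in spans]
```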
I've gone ahead and added "List of first ministers of Scotland, and the constituencies they have represented" to Wikidata:WikiProject British Politicians/Sample Queries in time for today's workshop.
However, people might like to look at whether it could be made more robust when the data is less complete, whether any of the variable names could be made more intuitive, and whether there are edge cases it needs to be extended to cope with.
Hope it all goes well. Might get there for the very end. All best, Jheald (talk) 07:10, 19 August 2017 (UTC)
It would probably be nice to add to the data on the constituency, to say more about its nature, e.g. single-member or regional, directly elected or top-up. Perhaps a new property "method of election", with values drawn from subclasses of "voting system"? Jheald (talk) 07:22, 19 August 2017 (UTC)