Wikidata talk:WikiProject every politician/United States of America

On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/04.

Data design choices

In broad strokes, each senator previously had one position held (P39) United States senator (Q4416090) statement, but that left little room to record details about each election. Because the Senate class system was designed so that the entire Senate is never replaced in a single election, a new model was implemented with a separate position held (P39) statement for each legislative term (Q15238777); a typical election therefore results in a senator serving three consecutive congresses. The advantage of this model is that it provides a common denominator across all Senate classes. That is, it prioritizes queries about which senators served at the same time over queries about where a senator is within a particular tenure. This closely mirrors how the data is stored in the US Senate Biographical Directory and makes it easier to cross-reference against work performed in a given Congress.
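Because a full six-year term maps onto three two-year congresses, the bookkeeping behind this model is simple arithmetic. A minimal sketch, assuming Congress n convenes in the odd year 1789 + 2(n - 1) and ignoring the exact opening dates (March 4 before 1935, January 3 from 1935 on); the function names are illustrative and not part of any Wikidata tooling:

```python
# Sketch of the one-statement-per-congress model.
# Assumption: Congress n convenes in the odd year 1789 + 2 * (n - 1)
# (1st Congress: 1789, 117th Congress: 2021); exact opening dates are ignored.


def congress_start_year(n: int) -> int:
    """Odd year in which the n-th Congress convenes."""
    return 1789 + 2 * (n - 1)


def congress_for_year(year: int) -> int:
    """Congress in session for most of the given calendar year."""
    return (year - 1789) // 2 + 1


def congresses_for_full_term(first_congress: int) -> list:
    """A full six-year term yields one P39 statement per congress:
    the congress in which the term begins plus the next two."""
    return [first_congress + i for i in range(3)]


if __name__ == "__main__":
    first = congress_for_year(2021)
    print(first, congresses_for_full_term(first))  # prints: 117 [117, 118, 119]
```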

Beyond modeling which Congress a given senator served in, we're left with the task of modeling the distinct seats that the Senate class system prescribes. After some discussion, it was decided to encode this in the electoral district (P768) qualifier, whose value is a United States Senate seat (Q101500234) item for the given state; that item's located in the administrative territorial entity (P131) statement names the U.S. state (Q35657) the seat belongs to. This keeps the position held (P39) value simply United States senator (Q4416090) and relegates the seat detail to a qualifier.
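To make the shape of a statement concrete, here is a minimal sketch of one P39 statement under this model and of the "who served at the same time" question it favors. The class, field, and function names are hypothetical, and the item IDs in the usage are placeholders rather than real Q-IDs:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SenatorStatement:
    """One P39 = Q4416090 (United States senator) statement and its key qualifiers."""
    senator: str  # the person's item
    term: str     # P2937 (parliamentary term): a Congress item
    seat: str     # P768 (electoral district): a Senate seat item


def senators_serving_together(statements, seat_state, term):
    """Group the senators serving in `term` by state. `seat_state` maps each
    seat item to its P131 (located in the administrative territorial entity) value."""
    by_state = defaultdict(set)
    for st in statements:
        if st.term == term:
            by_state[seat_state[st.seat]].add(st.senator)
    return dict(by_state)
```

Because the seat carries the state, grouping by state (or by seat) needs no extra qualifiers on the statement itself.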

This structure allows us to generate tables such as this one, or infobox details such as those found here. Gettinwikiwidit (talk) 00:42, 20 January 2021 (UTC)

Data sources for Senate information

The primary sources for these data are as follows:

Unfortunately, at the time this data was collected, the Biographical Directory of the United States Congress (Q1150348) did not contain information about which United States Senate seat (Q101500234) each senator occupied. The only source for that information was Wikipedia, and there were no clear references indicating where that data came from. It would be great to have a more definitive source for this information.

Moreover, the start and end dates of each senator's service during a given congress were available only from the Biographical Directory of the United States Congress (Q1150348), embedded in a prose description. A number of heuristics were devised to tease that information out. It looks like more of this information is now being added to the more easily parsed XML data, so it might be worth revisiting it as a check on these heuristics.
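The heuristics themselves aren't documented here. As an illustration only, the sketch below assumes a hypothetical phrasing of the form "served from <date> to <date>" with dates written as "Month D, YYYY"; the Directory's actual wording varies, so a real extractor would need several such patterns:

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE = rf"(?:{MONTHS}) \d{{1,2}}, \d{{4}}"  # e.g. "March 4, 1913"
# Hypothetical phrasing; real biographies use several different wordings.
SPAN_RE = re.compile(rf"served from ({DATE}),? to ({DATE})")


def service_span(prose: str) -> Optional[Tuple[date, date]]:
    """Return (start, end) from the first "served from X to Y" phrase, if any."""
    match = SPAN_RE.search(prose)
    if match is None:
        return None
    start, end = (datetime.strptime(text, "%B %d, %Y").date() for text in match.groups())
    return start, end
```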

Gettinwikiwidit (talk) 08:58, 20 January 2021 (UTC)

Keeping Wikidata up-to-date

While the United States Senate seat (Q101500234) doesn't currently have class information for historical Senate seats, the list of current senators at https://www.senate.gov/ does. This can be scraped for updates as needed. Gettinwikiwidit (talk) 03:38, 22 January 2021 (UTC)
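A minimal sketch of the comparison step, assuming the scraped page has already been reduced to a mapping from (state, class) to senator. The page's markup isn't described here, so no parsing is shown, and the function name is hypothetical:

```python
def seat_changes(scraped, wikidata):
    """scraped, wikidata: {(state, senate_class): senator}.
    Returns {seat: (wikidata_value, scraped_value)} for every seat that differs,
    including seats present on only one side (reported as None there)."""
    changes = {}
    for seat in scraped.keys() | wikidata.keys():
        old, new = wikidata.get(seat), scraped.get(seat)
        if old != new:
            changes[seat] = (old, new)
    return changes
```

Only the seats returned here would need new or closed P39 statements.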

Remaining work for Senators

Running the checks on the Project page, you can see that all but four outlier cases have both a parliamentary term (P2937) and an electoral district (P768) qualifier. Considerably more lack a parliamentary group (P4100). Some of this is because the party system took a few years to become firmly established, but some of it simply requires more work. Also, a handful of names still need to be entered.

Aside from these checks, it would be good to have an end cause (P1534) for at least every senator who didn't serve out their term. Many of these have been done, but no systematic check has been carried out. In addition, given the staggered Senate terms of the Senate class system described above, it would be good to include an elected in (P2715) qualifier on each of these statements. This information is available, spottily, in the prose part of the Biographical Directory of the United States Congress (Q1150348) and in the various lists of senators by state (e.g. Georgia), and it can largely be inferred from the Senate class of a senator's seat. None of these options is particularly trivial, so it might be best to try all of them as cross-checks against each other.
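The class-based route can be sketched as arithmetic. Since 1914, regular elections for Class I, II, and III seats have fallen in years congruent to 2, 4, and 0 modulo 6 respectively (e.g. 2018, 2020, 2022). Assuming Congress n convenes in 1789 + 2(n - 1), a term beginning at the start of a congress was most likely won in the November before. The function below is a hypothetical helper; it returns None when a term doesn't start right after a regular election for that class, which typically points to an appointment or a special election:

```python
from typing import Optional

# Regular Senate election years, mod 6, for each class
# (e.g. 2018 = Class I, 2020 = Class II, 2022 = Class III).
CLASS_REMAINDER = {1: 2, 2: 4, 3: 0}


def expected_election_year(senate_class: int, first_congress: int) -> Optional[int]:
    """Regular election most likely to have produced a term that begins with
    `first_congress`, or None if no regular election for this class fits."""
    congress_start = 1789 + 2 * (first_congress - 1)  # odd year the congress convenes
    election_year = congress_start - 1                  # the preceding November
    if election_year < 1914:
        # Before the first popular elections under the 17th Amendment,
        # senators were chosen by state legislatures.
        return None
    if election_year % 6 != CLASS_REMAINDER[senate_class]:
        return None
    return election_year
```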

It's worth noting that senators were not directly elected until the 17th Amendment was ratified in 1913. Before that, they were elected by the state legislatures. It's not clear how, or whether, we want to account for that in the elected in (P2715) qualifier. Another issue with providing elected in (P2715) is the organization of the United States Senate election (Q24333627) items. It doesn't look like all of them have been created, the labeling is inconsistent, and some items, such as 1812 and 1813 United States Senate elections (Q5510738), look strange. I think this is because they closely follow the articles on en-wiki. Which should we use? Is it better to have a consistent set of items in Wikidata that reference the Wikipedia articles?

SELECT ?election ?electionLabel WHERE {
  ?election wdt:P31/wdt:P279* wd:Q24333627;
            wdt:P17 wd:Q30.
  ?election wdt:P585 ?pit.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY DESC(?pit)

Feel free to expand this list of TODO items. If there is sufficient interest, we can try to divide up the work.

Regards, Gettinwikiwidit (talk) 09:17, 20 January 2021 (UTC)

Election years

I've done some work on election years, focusing only on elections since the passage of the 17th Amendment. First, I labeled all off-year election (Q7078911)s and removed their instance of (P31) United States general election (Q26252880) claims (off-year election (Q7078911) is a subclass of United States general election (Q26252880)). This makes it easier to distinguish such elections from regularly scheduled elections.

Then I standardized the labels for United States general election (Q26252880) entities. I also changed all their point in time (P585) claims to be the election day rather than the election year.

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q26252880.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
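Regularly scheduled federal elections fall on the Tuesday after the first Monday in November, so election-day point in time (P585) values can be derived from the year. A minimal sketch; special elections can fall on other dates and would still need to be checked individually. The time string is the form Wikibase uses for a day-precision (precision 11) value, and the helper names are hypothetical:

```python
from datetime import date


def election_day(year: int) -> date:
    """Tuesday after the first Monday in November of `year`."""
    nov1 = date(year, 11, 1)
    first_monday = 1 + (0 - nov1.weekday()) % 7  # weekday(): Monday == 0
    return date(year, 11, first_monday + 1)


def to_wikibase_time(d: date) -> str:
    """Wikibase time string for a date; pair it with precision 11 (day)."""
    return f"+{d.year:04d}-{d.month:02d}-{d.day:02d}T00:00:00Z"
```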

I also created the missing per-state election items and added part of (P361) claims linking them to United States Senate election (Q24333627) items, which are in turn part of United States general election (Q26252880) items.

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P361/wdt:P361/wdt:P31 wd:Q26252880.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
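The two-hop structure this query relies on can also be checked offline. A sketch over hypothetical in-memory dictionaries (item to list of part of (P361) targets, item to list of instance of (P31) classes) that lists state-level election items whose chain does not reach a United States general election (Q26252880):

```python
GENERAL_ELECTION = "Q26252880"  # United States general election


def broken_chains(items, part_of, instance_of):
    """Items with no path item -P361-> x -P361-> y where y is P31 GENERAL_ELECTION,
    the same pattern as wdt:P361/wdt:P361/wdt:P31 in the query above."""
    broken = []
    for item in items:
        reaches = any(
            GENERAL_ELECTION in instance_of.get(grandparent, [])
            for parent in part_of.get(item, [])
            for grandparent in part_of.get(parent, [])
        )
        if not reaches:
            broken.append(item)
    return broken
```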

Next, I calculated which senators were the first to serve in each seat for each congressional session.

SELECT ?sen ?senLabel ?district ?districtLabel ?term ?termNum WHERE {
  {
    SELECT ?district ?term (MIN(?start) AS ?first) WHERE {
      ?sen p:P39 ?ps.
      ?ps ps:P39 wd:Q4416090;
        pq:P2937 ?term;
        pq:P768 ?district;
        pq:P580 ?start.
      FILTER ( ?start > "1913-01-01T00:00:00Z"^^xsd:dateTime )
    }
    GROUP BY ?district ?term
  }
  ?sen p:P39 ?ps.
  ?ps ps:P39 wd:Q4416090;
    pq:P2937 ?term;
    pq:P768 ?district;
    pq:P580 ?first.
  ?term (p:P31/pq:P1545) ?termNum.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY (?district) (xsd:integer(?termNum))
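The same grouping, restated as a short sketch over hypothetical in-memory rows (senator, seat, congress, start date), can be handy for testing the logic before running it against the query service:

```python
from datetime import date


def first_occupants(rows, after=date(1913, 1, 1)):
    """rows: iterable of (senator, seat, congress, start) with start as a date.
    Returns {(seat, congress): senator} for the earliest start after `after`,
    mirroring MIN(?start) ... GROUP BY ?district ?term in the query."""
    earliest = {}
    for senator, seat, congress, start in rows:
        if start <= after:
            continue
        key = (seat, congress)
        if key not in earliest or start < earliest[key][0]:
            earliest[key] = (start, senator)
    return {key: senator for key, (_, senator) in earliest.items()}
```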

I also associated election years with each seat class, recorded as significant event (P793) statements on the seat items.

SELECT ?item ?itemLabel ?event ?eventLabel WHERE {
  ?item wdt:P279 wd:Q101500234;
        wdt:P793 ?event.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

The idea is that if someone was the first person to take office in a seat after a scheduled election year, they were most likely elected in that election. I also tried excluding people who were appointed, but it looks like no one was appointed in this circumstance. Gettinwikiwidit (talk) 01:14, 22 January 2021 (UTC)
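Put together, the heuristic looks roughly like this. It's a sketch over hypothetical inputs: the first occupant of each seat per congress (as computed by the query above), the year each congress convenes, each seat's scheduled election years (from its significant event (P793) values), and the statements that carry appointed by (P748):

```python
def infer_elected_in(first_occupant, congress_start, scheduled_years, appointed=frozenset()):
    """first_occupant:  {(seat, congress): senator}
    congress_start:  {congress: year the congress convenes}
    scheduled_years: {seat: set of regular election years for that seat}
    appointed:       set of (senator, seat, congress) statements with P748
    Returns {(senator, seat, congress): inferred election year}."""
    inferred = {}
    for (seat, congress), senator in first_occupant.items():
        election_year = congress_start[congress] - 1  # November before the congress convenes
        if election_year not in scheduled_years.get(seat, set()):
            continue  # no regular election for this seat right before this congress
        if (senator, seat, congress) in appointed:
            continue  # first occupant was appointed, not elected
        inferred[(senator, seat, congress)] = election_year
    return inferred
```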

  • I've used this heuristic to fill in senators who were elected in regular elections. As mentioned above, it's probably a good idea to cross-check this against other sources. Those without an election were likely appointed, which can also be cross-checked. Gettinwikiwidit (talk) 01:38, 23 January 2021 (UTC)
For senators without elected in (P2715) qualifiers, I looked at the first congress each served in and excluded those marked with appointed by (P748). The hypothesis is that the remaining senators were all elected in special elections. There are only ~170 of them, so it should be easy enough to check.
SELECT ?sen ?senLabel ?district ?districtLabel ?termNum ?start WHERE {
  {
    # Earliest start date per senator and seat among statements without elected in (P2715).
    # Taking MIN(?term) instead would compare item IRIs as strings, not congress order.
    SELECT ?sen ?district (MIN(?s) AS ?start) WHERE {
      ?sen p:P39 ?ps.
      ?ps ps:P39 wd:Q4416090;
        pq:P2937 ?term;
        pq:P768 ?district;
        pq:P580 ?s.
      FILTER(NOT EXISTS { ?ps pq:P2715 ?election. })
      FILTER(?s > "1913-01-01T00:00:00Z"^^xsd:dateTime)
    }
    GROUP BY ?sen ?district
  }
  # Re-join on that start date to recover the statement and its congress (?first).
  ?sen p:P39 ?ps.
  ?ps ps:P39 wd:Q4416090;
    pq:P2937 ?first;
    pq:P768 ?district;
    pq:P580 ?start.
  ?first p:P31/pq:P1545 ?termNum.
  FILTER(NOT EXISTS { ?ps pq:P2715 ?election. })
  FILTER(xsd:integer(?termNum) > 65)
  # Drop statements that record an appointment.
  FILTER NOT EXISTS { ?ps pq:P748 ?appointed. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?start
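The filter logic of this query, restated as a sketch over hypothetical statement records (dicts with the keys shown), may make the hypothesis easier to review:

```python
from datetime import date

CUTOFF = date(1913, 1, 1)


def special_election_candidates(statements):
    """statements: dicts with keys senator, district, congress (int), start (date),
    elected_in and appointed_by (None when the qualifier is absent).
    Returns each senator's earliest statement per seat lacking elected in, from the
    66th Congress on, that is not marked as an appointment, sorted by start."""
    earliest = {}
    for st in statements:
        if st["elected_in"] is not None or st["start"] <= CUTOFF:
            continue
        key = (st["senator"], st["district"])
        if key not in earliest or st["start"] < earliest[key]["start"]:
            earliest[key] = st
    candidates = [st for st in earliest.values()
                  if st["congress"] > 65 and st["appointed_by"] is None]
    return sorted(candidates, key=lambda st: st["start"])
```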

Gettinwikiwidit (talk) 02:07, 23 January 2021 (UTC)

It turns out that the Wikipedia pages on elections are incredibly easy to scrape for winners. I've done this to find special election winners and to cross-check the regular election winners. Along the way I tried to standardize the items for special elections and add the missing ones. These could probably use a couple more rounds of refinement. Gettinwikiwidit (talk) 11:00, 24 January 2021 (UTC)
I believe this is pretty derned good at the moment, but it never hurts to have an extra set of eyes. Please feel free to fix any errors you find. Gettinwikiwidit (talk) 04:12, 25 January 2021 (UTC)
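A sketch of the cross-check step, assuming both sides have already been reduced to a mapping from election item to the set of winning senators. The scraping itself is not shown, and the function name is hypothetical:

```python
def winner_discrepancies(scraped, wikidata):
    """scraped, wikidata: {election: set of winners}.
    Returns {election: (only_in_scraped, only_in_wikidata)} for elections that disagree."""
    report = {}
    for election in scraped.keys() | wikidata.keys():
        from_pages = scraped.get(election, set())
        from_wd = wikidata.get(election, set())
        if from_pages != from_wd:
            report[election] = (from_pages - from_wd, from_wd - from_pages)
    return report
```

An empty report means the two sources agree; anything else is a short list to review by hand.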

Referencing the Biographical Directory

@Andrew Gray: I think I know how this should be done. We can simply use the ID in the reference. I made a sample showing how it would work. This way, the link stays as relevant as the claim. What do you think? I suppose it might mean that all the claims would have to be flushed if the link is updated, though. Hmm. Gettinwikiwidit (talk) 04:21, 25 January 2021 (UTC)

Perfect, this is what I was aiming for; I just couldn't get it to work cleanly with wd-cli! (I think "best" practice would also add stated in (P248):Biographical Directory of the United States Congress (Q1150348), but it's not really essential. It's nice to have for tidiness, though.) If you're able to process that as a batch run, then great; if not, I can dig out my notes and try again.
In terms of purging after a link update, I think this isn't really an issue, since these items already have a bioguide ID and the item would need purging for that anyway, so I guess it doesn't really add any extra problems. Andrew Gray (talk) 20:57, 26 January 2021 (UTC)
@Andrew Gray: I can do this processing. FWIW, I was more wondering about a need for purging in the future should the link change again. I still think I prefer the consistency of this approach. Regards, Gettinwikiwidit (talk) 05:55, 28 January 2021 (UTC)
Actually, what do you think about *only* putting in stated in (P248):Biographical Directory of the United States Congress (Q1150348), since the ID should exist elsewhere in the entity? Gettinwikiwidit (talk) 06:36, 28 January 2021 (UTC)
@Gettinwikiwidit: I think it's probably better to give both. It makes clear that we mean "found in this specific entry" rather than more generically "sourced to the overall work". You can get cases where a statement is sourced to Entry X while being on Item Y; none of the ones we're looking at here meet that criterion, but it does happen with other identifiers, so it's probably better to be explicit rather than implicit. If we only give one, I think the ID is probably the more important. Andrew Gray (talk) 19:10, 28 January 2021 (UTC)
@Andrew Gray: Okay, fair enough. There was a bug in wb-cli, but the author is always very responsive, so it's fixed now. I'm adding the new refs now and will delete the old-style ones once that is done. Regards, Gettinwikiwidit (talk) 22:18, 28 January 2021 (UTC)
@Gettinwikiwidit: Amazing! I had been convinced I was making a stupid error somewhere, as this was the first time I'd tried to use that syntax. Kind of relieved to know it wasn't just me :-) Andrew Gray (talk) 22:27, 28 January 2021 (UTC)
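For reference, the two-part reference agreed on in this thread (stated in (P248) plus the US Congress Bio ID (P1157) value) has roughly the following shape in Wikibase entity JSON. A sketch for illustration; the helper name is hypothetical and the ID in the test is a placeholder:

```python
import json


def bioguide_reference(bioguide_id: str) -> dict:
    """Reference with stated in (P248) = Q1150348 and US Congress Bio ID (P1157)."""
    return {
        "snaks": {
            "P248": [{
                "snaktype": "value",
                "property": "P248",
                "datavalue": {
                    "value": {"entity-type": "item", "numeric-id": 1150348, "id": "Q1150348"},
                    "type": "wikibase-entityid",
                },
            }],
            "P1157": [{
                "snaktype": "value",
                "property": "P1157",
                "datavalue": {"value": bioguide_id, "type": "string"},
            }],
        },
        "snaks-order": ["P248", "P1157"],
    }


if __name__ == "__main__":
    print(json.dumps(bioguide_reference("X000000"), indent=2))
```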
@Andrew Gray: It looks like I'm being throttled. I'm getting "maxlag: Waiting for all: 7.0666666666667 seconds lagged" errors. Do you know any way I can check whether this is just me? Gettinwikiwidit (talk) 01:46, 29 January 2021 (UTC)
@Gettinwikiwidit: This'll be affecting everyone, I think. The maxlag feature is designed so that all tools slow down a bit when database lag on the servers gets too high, giving them a chance to recover without getting completely out of sync. Last night the lag seems to have briefly spiked over the 10s limit around the time you posted, and then cleared up again. Andrew Gray (talk) 09:54, 29 January 2021 (UTC)
@Andrew Gray: I see. I was able to get all the updates done last night. I'm about a third of the way through purging the old reference URL (P854) references. Gettinwikiwidit (talk) 10:27, 29 January 2021 (UTC)
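The usual way for a bot to cooperate with this is to send a maxlag parameter and back off when the API answers with a maxlag error. A minimal sketch, with the HTTP call abstracted as a hypothetical `send` callable returning the decoded JSON; it assumes the error object carries a numeric `lag` field and falls back to a fixed wait otherwise (honoring the API's Retry-After header is another option):

```python
import time


def call_with_maxlag(send, params, maxlag=5, retries=5, fallback_wait=5.0, sleep=time.sleep):
    """Call send(params) with a maxlag parameter, sleeping and retrying while the
    API reports a maxlag error. Raises RuntimeError if lag never drops."""
    request = dict(params, maxlag=maxlag)
    for _ in range(retries):
        response = send(request)
        error = response.get("error") or {}
        if error.get("code") != "maxlag":
            return response
        # Assumption: the error reports the current lag in seconds as "lag".
        sleep(float(error.get("lag", fallback_wait)))
    raise RuntimeError(f"replication lag still above {maxlag}s after {retries} tries")
```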

Appointments after election to the subsequent term

These senators are not listed on senate.gov's list of Appointed Senators, but are described as having been appointed subsequent to a term they were elected for.

SELECT DISTINCT ?sen ?senLabel ?ps ?diff WHERE {
  #hint:Query hint:optimizer "None".
  ?sen p:P39 ?ps;
       wdt:P1157 ?bioid.
  ?ps ps:P39 wd:Q4416090;
      pq:P2937 ?term;
      pq:P580 ?start.
  ?term p:P31/pq:P1545 ?termNum;
        wdt:P582 ?termEnd.
  # Neither elected in (P2715) nor appointed by (P748) on this statement...
  FILTER NOT EXISTS { ?ps pq:P2715 [] }
  FILTER NOT EXISTS { ?ps pq:P748 [] }
  # ...and no appointment recorded for the same senator in the preceding congress.
  FILTER NOT EXISTS {
    ?term wdt:P155 ?prevTerm.
    ?sen p:P39 ?ps2.
    ?ps2 ps:P39 wd:Q4416090;
         pq:P748 [];
         pq:P2937 ?prevTerm.
  }
  FILTER(xsd:integer(?termNum) > 65)
  # Days between the statement's start and the end of its congress.
  BIND((?termEnd - ?start) AS ?diff)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY xsd:integer(?diff)

These should probably be compared against this Wikipedia list. Gettinwikiwidit (talk) 01:34, 30 January 2021 (UTC)

117th Congress replaces/replaced by

@Andrew Gray: I used the following to fill in the replaces (P1365) and replaced by (P1366) qualifiers.

SELECT ?sen ?senLabel ?prevSen ?prevSenLabel WHERE {
  ?sen p:P39 ?ps.
  ?ps ps:P39 wd:Q4416090;
      pq:P2937 wd:Q65089999;
      pq:P768 ?district.

  ?prevSen p:P39 ?ps2.
  ?ps2 ps:P39 wd:Q4416090;
      pq:P2937 wd:Q28227688;
      pq:P768 ?district .
  FILTER ( ?sen != ?prevSen )
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
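The pairing this query performs can be sketched in a few lines over hypothetical per-seat occupant sets for two consecutive congresses. Each resulting pair would become a replaces (P1365) qualifier on the newer statement and a replaced by (P1366) qualifier on the older one; as with the query, a seat with more than one occupant in a congress produces extra pairs that need a manual look:

```python
def replacement_pairs(previous, current):
    """previous, current: {seat: set of senators} for consecutive congresses.
    Returns sorted (senator, predecessor) pairs for the same seat, skipping
    senators who simply continued in the seat (as FILTER(?sen != ?prevSen) does)."""
    pairs = set()
    for seat, senators in current.items():
        for senator in senators:
            for predecessor in previous.get(seat, set()):
                if senator != predecessor:
                    pairs.add((senator, predecessor))
    return sorted(pairs)
```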

Overlapping dates

In both the Biographical Directory of the United States Congress (Q1150348) and the list of Senators of the United States, Cordell Hull (Q202979)'s final term as senator ends on March 3rd, 1933, but his successor Nathan L. Bachman (Q1768884) begins serving on February 28th, 1933. The appointment is also listed as taking place on February 28th, so I'm inclined to believe that's the correct date. Some light googling suggests that this confusion is widespread. I'll try to follow up with the Senate Historian. 2405:6580:2880:AB00:551F:CB5B:A0AF:8D69 03:40, 1 February 2021 (UTC)
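Cases like the Hull/Bachman one can be found systematically by looking for overlapping service in the same seat. A sketch over hypothetical (senator, seat, start, end) rows; a successor starting on the very day the predecessor's service ends is not counted, only a start strictly before the predecessor's end date. In the test, only the February 28 and March 3, 1933 dates come from the case above; the other dates are placeholders:

```python
from collections import defaultdict
from datetime import date


def overlapping_services(services):
    """services: iterable of (senator, seat, start, end) with datetime.date values.
    Returns (seat, earlier_senator, later_senator) for each later service that
    starts before an earlier service in the same seat has ended."""
    by_seat = defaultdict(list)
    for senator, seat, start, end in services:
        by_seat[seat].append((start, end, senator))
    found = []
    for seat, spans in by_seat.items():
        spans.sort()
        latest_end, latest_senator = None, None  # service with the latest end so far
        for start, end, senator in spans:
            if latest_end is not None and start < latest_end:
                found.append((seat, latest_senator, senator))
            if latest_end is None or end > latest_end:
                latest_end, latest_senator = end, senator
    return found
```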

OpenStates Property Proposal and Mix-n-Match

I created an OpenStates catalog in Mix-n-match with nearly 8,000 person IDs to match, along with a property proposal to enable comparisons with current legislatures based on their ongoing work. Once matched, the Q-IDs will be fed back to OpenStates too. Working on this has already made it possible to fix some Wikipedia infoboxes with errors in district numbers. Wolfgang8741 (talk) 16:12, 11 February 2022 (UTC)
