Wikidata talk:WikiProject Higher education

What about labs and institutions

What about adding research labs and institutions? (Universities normally do both teaching and research!)

This would comprise at least:

https://www.wikidata.org/wiki/Q483242

https://www.wikidata.org/wiki/Q31855 OdileB (talk) 10:33, 28 March 2018 (UTC)

@OdileB: We do have a broader scope that should include any organization under Wikidata:WikiProject Organizations; however, we definitely have some coverage of research laboratories and research institutes that aren't themselves educational institutions, via some of the databases listed here. If you look at the P31 statements on items with GRID IDs, for instance, you'll see a lot of research institutes. ArthurPSmith (talk) 14:21, 27 March 2018 (UTC)
@ArthurPSmith: OK then. Do you know how I can add references in bulk with QuickStatements, please? I have tried Q30262322 P3016 "201421581B" S854 "https://www.data.gouv.fr/fr/datasets/repertoire-national-des-structures-de-recherche-rnsr/" but it doesn't work. I'd like to add references to all my recent RNSR identifier (P3016) additions. Can you help? Should I ask elsewhere? Thanks. OdileB (talk) 12:15, 4 April 2018 (UTC)
Sorry, I'm not 100% on top of QuickStatements syntax; check the Help pages or ask on Project Chat. ArthurPSmith (talk) 12:17, 4 April 2018 (UTC)
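QuickStatements V1 commands separate their fields with tab characters, and reference properties take an "S" prefix instead of "P" (as in the S854 above). When a command is copied from a rendered page the tabs often turn into spaces, so that is one thing worth checking. A minimal sketch, assuming the batch is generated with a small Python script: it builds one tab-separated line per (item, RNSR identifier) pair with a reference URL. The `additions` list and the helper names are hypothetical; only the first pair comes from the message above.

```python
from typing import Sequence, Tuple


def quoted(text: str) -> str:
    """Wrap a string or URL value in double quotes, as in the command above."""
    return f'"{text}"'


def qs_line(item: str, prop: str, value: str,
            refs: Sequence[Tuple[str, str]] = ()) -> str:
    """Build one QuickStatements V1 command.

    Fields are joined with TAB characters. Each reference is a
    (property, value) pair whose property uses the "S" prefix
    (e.g. S854, reference URL) instead of "P".
    """
    fields = [item, prop, value]
    for ref_prop, ref_value in refs:
        if not ref_prop.startswith("S"):
            raise ValueError(f"reference property should start with 'S': {ref_prop}")
        fields.extend([ref_prop, ref_value])
    return "\t".join(fields)


RNSR_URL = "https://www.data.gouv.fr/fr/datasets/repertoire-national-des-structures-de-recherche-rnsr/"

# Hypothetical input: (item, RNSR identifier) pairs that need a reference.
additions = [("Q30262322", "201421581B")]

batch = "\n".join(
    qs_line(item, "P3016", quoted(rnsr), refs=[("S854", quoted(RNSR_URL))])
    for item, rnsr in additions
)
print(batch)
```

The resulting text can be pasted into the QuickStatements V1 box; extending `additions` covers the rest of the batch.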

Updating the progress bars

Updating the counts on this page is currently done manually, with a SPARQL query like this:

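SELECT (COUNT(DISTINCT ?inst) as ?defined) WHERE {
  ?inst wdt:P159 ?val.                       # has a headquarters location (P159)
  ?inst p:P31/ps:P31/wdt:P279* wd:Q4671277.  # instance of academic institution (Q4671277) or of a subclass
}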

Any better solution would be welcome. − Pintoch (talk) 08:14, 7 March 2017 (UTC)
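One partial step toward automating this is to generate the count queries from a single template, so every tracked property is counted the same way, together with a total count for the denominator. A minimal sketch under that assumption: the list of tracked properties is hypothetical (only P159 comes from the query above), and actually sending the queries to the query service and writing the numbers back to the page is left out.

```python
# Class used in the query above: academic institution (Q4671277).
CLASS_PATTERN = "?inst p:P31/ps:P31/wdt:P279* wd:Q4671277."


def total_query() -> str:
    """Count all instances of the class (the denominator of a progress bar)."""
    return ("SELECT (COUNT(DISTINCT ?inst) AS ?total) WHERE {\n"
            f"  {CLASS_PATTERN}\n"
            "}")


def count_query(prop: str) -> str:
    """Count instances of the class that have at least one `prop` statement."""
    return ("SELECT (COUNT(DISTINCT ?inst) AS ?defined) WHERE {\n"
            f"  ?inst wdt:{prop} ?val.\n"
            f"  {CLASS_PATTERN}\n"
            "}")


def progress_percent(defined: int, total: int) -> int:
    """Percentage for a progress bar, rounded to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * defined / total)


# Hypothetical list of tracked properties (P159 is the one in the query above).
for prop in ["P159", "P625", "P856"]:
    print(count_query(prop))
```

Each printed query has the same shape as the hand-written one, so the counts stay comparable from one update to the next.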

Endowment Property

Does anyone know if there is a property for endowment? A few searches yielded nothing by name. There were also no similar properties as far as I could tell. Will (Wiki Ed) (talk) 19:48, 28 February 2019 (UTC)

Importing GRID

@ArthurPSmith: Did you see that there is a new release of GRID? Do you plan to import it with APSbot? I noticed that your bot adds a reference for every dump of the dataset. Is that a bug or a feature? In the long term it does not look very desirable. Last question: any idea why GRID provides only 16,069 Wikidata IDs while they have many more Wikipedia links (22,103), many of which are associated with Wikidata items? Anyway, it looks like APSbot understands Wikipedia links too, so that's great! − Pintoch (talk)

I haven't been updating with every release; probably about every 3 releases. I may delete old references if they are redundant; I haven't quite worked out what to do there. Note that Wikidata now has over 28,000 links to GRID. I do plan to send GRID a list of their missing links. The extra Wikipedia links in GRID are not very reliable; I ran through them once and imported those relationships, but I've since had to delete quite a few as they violated constraints (for example linking to a position rather than a government agency, linking to a generic rather than a specific instance of something, etc.). Any other suggestions on this are of course welcome! It's a bit of a tedious job doing the matching; a lot was missed by Mix n' Match automated matching, as names just don't quite match up due to language or other issues. ArthurPSmith (talk) 20:19, 13 March 2017 (UTC)
Ok! I'd be very interested in exploring this matching problem a bit more. And yes, 28,000 GRID IDs is a very nice achievement. I have recently been working on a reconciliation interface for OpenRefine that could be used for the remaining ones. It does some fuzzy matching and can take into account fields other than names. I'll see if I can get reliable matches for this dataset. The good thing about GRID is that they provide so much metadata that it should be possible to get very solid matches. It's a fantastic database, so getting it into Wikidata is my top priority for the WikiProject. I think we can probably aim for complete coverage, as the vast majority of IDs should refer to notable institutions. − Pintoch (talk) 21:04, 13 March 2017 (UTC)
Agreed on notability! I've been occasionally adding things that don't have existing Wikidata/Wikipedia entries, but there's a lot in GRID that we don't seem to have at all, especially on the corporate side. ArthurPSmith (talk) 21:04, 14 March 2017 (UTC)
On matching, some notes from my experience:
  • If GRID has a home page, then *sometimes* abbreviating that URL (taking out the "/en" pieces etc.) quickly leads you to an exactly matching institution. That might be a good route, but note that just doing a Special:LinkSearch on the various language Wikipedias often brings up unrelated pages that link to the site in question (for example, a government agency with information about, say, environmental issues is very often linked from pages about those issues, not from the page about the agency itself).
  • The most complete Wikipedia pages for a given country are almost always in the official language of that country: for Polish institutions, for example, look on pl.wikipedia.org; that is the most likely place to find an existing page and thus a matching link to a Wikidata item. You can sometimes short-circuit this by using Google Translate to convert the GRID label into that language and searching Wikidata directly; however, these Wikidata entries are most often very minimal, and you have to check the links to make sure it's a good match.
  • Sometimes the GRID labels are in English and so need to be fully translated; sometimes they are simply a latinization of the native-language name and need to be transcribed into the character set of the country in question (this may be hard for Chinese, though it has worked for me for Russian, Ukrainian, etc.). Sometimes they are a full official name (many institutions are named after famous people, so the full official label includes those names) and sometimes a more abbreviated name; sometimes they include the country name in parentheses. It would probably be useful to be able to group these cases and handle them differently somehow.
  • Label is NOT sufficient to match. You definitely need to match on country, and city is important for institutions with almost the same name at distinct locations in the same country, which seems to happen often. However, sometimes the GRID city differs from the city given on Wikidata/Wikipedia even though they are the same institution, as one of the two might specify a more precise suburb or other geographic location, so assisting the matching with some sort of geographic distance measure might be helpful.
In general I think this is very hard to automate, but I guess more could be done; perhaps an initial focus on English-speaking countries would be easiest... ArthurPSmith (talk) 21:18, 14 March 2017 (UTC)
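The first tip, abbreviating a GRID home page URL until it lines up with the official website on Wikidata, can be sketched as a normalization step that reduces both URLs to a comparable key. A minimal sketch, assuming language segments such as "/en" can simply be dropped; the `LANGUAGE_SEGMENTS` set and the function names are hypothetical, and an equal key is only a candidate that still needs the country and city checks from the last tip.

```python
from urllib.parse import urlsplit

# Hypothetical set of path segments that only select a language version
# of a site (the "/en" pieces mentioned above); adjust to the data.
LANGUAGE_SEGMENTS = {"en", "eng", "english", "en-us", "en-gb"}


def homepage_key(url: str) -> str:
    """Reduce a homepage URL to a comparable key.

    Drops the scheme, a leading "www.", the query string, the fragment,
    language-only path segments and empty segments (trailing slashes).
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    segments = [s for s in parts.path.split("/")
                if s and s.lower() not in LANGUAGE_SEGMENTS]
    return "/".join([host] + segments)


def same_homepage(grid_url: str, wikidata_url: str) -> bool:
    """True when both URLs reduce to the same key (a candidate, not a confirmed match)."""
    return homepage_key(grid_url) == homepage_key(wikidata_url)
```

For example, `https://www.example.edu/en/` and `http://example.edu` both reduce to `example.edu`.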
Thanks a lot for these tips. A few thoughts:
  • Some other databases (at least Open ISNI and Fundref) provide alternate names for their records. As GRID is now pretty well aligned with these, it might be worth trying matching these names too.
  • The idea of narrowing down by country/language is interesting.
  • Filtering by coordinates is a tricky problem - I would be interested to know how to get that right in the OpenRefine reconciliation interface. I guess that the matching score should be roughly logarithmic in the distance between the two points.
  • Same for URLs: a specific matching strategy is required.
I am currently matching GRID in OpenRefine, using the city and country codes to refine the matching score. Very few items are automatically matched, which is to be expected, as all the low-hanging fruit was probably caught by automatic matching in Mix'n'Match. But the high-scoring ones look pretty good, so it might be possible to get a few thousand more matches with this technique. I'll share the OpenRefine project once I am satisfied with my reconciliation settings. − Pintoch (talk) 22:13, 14 March 2017 (UTC)
Note that GRID has alternate names (aliases, nominally in English, as well as labels in the native language) for many institutions, which I have not attempted to bring into Mix n Match, so those might help quite a bit with a broader matching tool. Those GRID labels might be better choices than the ISNI names: I've found ISNI alternate name lists (which tend to include historical names, names for parts of the institution, names for temporary events that happened at the institution, etc.) far too broad. On geography: I don't think you want a logarithmic preference. Closer is better, but I would think anything more than, say, 100 km apart (if the coordinates were originally to that precision) should be completely excluded as a match. ArthurPSmith (talk) 20:27, 15 March 2017 (UTC)
I just downloaded the latest GRID release and I'm running it through my matching checks to put the latest ones in. At least 50 more relationships added! ArthurPSmith (talk) 20:59, 15 March 2017 (UTC)
Great job! − Pintoch (talk) 16:58, 18 March 2017 (UTC)
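The geographic suggestion in this exchange (exclude candidates more than about 100 km apart, rather than scoring distance logarithmically) amounts to a hard filter on great-circle distance. A minimal sketch, assuming coordinates are (latitude, longitude) pairs in decimal degrees and using the standard haversine formula on a spherical Earth; the names and the exact cutoff constant are illustrative, and any name-based scoring would be applied separately to the candidates that pass.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0       # mean Earth radius (spherical approximation)
MAX_MATCH_DISTANCE_KM = 100.0  # cutoff suggested in the discussion above

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on a spherical Earth."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoots above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def within_match_distance(grid: Coord, wikidata: Coord,
                          max_km: float = MAX_MATCH_DISTANCE_KM) -> bool:
    """Hard filter: candidates farther apart than max_km are excluded outright."""
    return haversine_km(grid, wikidata) <= max_km
```

A hard cutoff keeps the rule easy to explain. The caveat from the message above still applies: it only makes sense when both coordinates are precise to well under 100 km.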

Finishing off the GRID import

  Notified participants of WikiProject Universities

So, now that GRID is fully matched to Wikidata, I would like to import everything we have not imported from them yet, so that we can finally consider that Wikidata completely includes GRID (or at least what we want to retain from them). Here is a quick overview of the state of affairs: Wikidata:WikiProject Universities/External databases/GRID.

Can you comment on the usefulness of the remaining properties? We can propose the ones we want to create all at once, and then get this done. − Pintoch (talk) 18:49, 11 December 2017 (UTC)

@Pintoch: Thanks, nice table and analysis! My thoughts on additional properties/things to import:
  • I haven't looked at HESA, UKRLP or UCAS, so no comment on those. Orgref was interesting to us for a while, but I agree it's duplicative; in fact it was largely based on enwiki IDs to start with (the first 10,000 or so of their IDs are actually the same as the enwiki numerical IDs for the related pages). Also, it hasn't been updated much recently; it has mostly just been pruned back a bit to remove small sub-institutions (departments etc.) not really of interest. So I don't think it needs its own property. They do have VIAF IDs that are not available directly from GRID, but we could pull those via the GRID relationships without requiring an independent Orgref property.
  • On "Label" - I've been adding those to Wikidata as the preferred label in the given language when I could. I don't think GRID specifies a "native language" anywhere so I don't think P1705 is right for that, unless you can do some kind of mapping from Country to language code and there's a GRID label in that language.
  • You left out the "type" property that is available from GRID; I've been using that for P31 (instance of) relations when there wasn't an existing one (mostly just for new items). We could use the "type" and/or P31 to choose between the different "location" options:
  • For type = Education, Facility, Healthcare, Archive, Other use P625 directly (not as qualifier) and P131.
  • For type = Company, Government, Nonprofit use P159 with P625 as qualifier
That won't get them all perfect, but it's a reasonable first guess I think.
  • On international organizations - I don't think there's any way to tell from GRID that's what they are, they tend to create a separate ID for each country the organization has a presence in, so it could get confusing. I wonder if we should try to add P17 = novalue to such organizations? Otherwise I would generally be careful pulling in P17 values from GRID for existing organizations, maybe leave that one for manual work rather than automation.
  • GRID does include organization relationships - part of (P361) or parent organization (P749) would be great to pull in from their data; I don't think it's very complete, but I'm sure there's a lot they have that is not yet in Wikidata.
Make sense? ArthurPSmith (talk) 19:46, 11 December 2017 (UTC)
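The "first guess" above for choosing how to model location from GRID's "type" field is essentially a lookup table. A minimal sketch of that mapping, assuming the eight GRID type names listed in the comment; the function name is hypothetical, and raising an error for anything unmapped is a design choice so that unexpected types get looked at by hand rather than silently defaulted.

```python
from typing import Dict, List

# GRID "type" values grouped as proposed above.
DIRECT_LOCATION_TYPES = {"Education", "Facility", "Healthcare", "Archive", "Other"}
HEADQUARTERS_TYPES = {"Company", "Government", "Nonprofit"}


def location_properties(grid_type: str) -> Dict[str, List[str]]:
    """Map a GRID type to {main property: [qualifier properties]}.

    P625 = coordinate location, P131 = located in the administrative
    territorial entity, P159 = headquarters location.
    """
    if grid_type in DIRECT_LOCATION_TYPES:
        # Coordinates and administrative location as plain statements.
        return {"P625": [], "P131": []}
    if grid_type in HEADQUARTERS_TYPES:
        # Headquarters location, with the coordinates as a qualifier.
        return {"P159": ["P625"]}
    raise ValueError(f"unmapped GRID type: {grid_type!r}")
```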
Thanks for the comments!
  • On "label" and native label (P1705): yes, we would need to select the correct language based on the country (I guess that is only possible when there is only one official language (P37) for that country.) And that isn't strictly speaking something that we import from them, it is rather inferred.
  • On countries - yes, the types do not help much in spotting international items, so I agree it should not be done automatically
  • For the types, yes they are worth adding when instance of (P31) is not present yet. By the way, I am not a huge fan of their "facilities": I think in most cases it actually corresponds to an organization (a research institute (Q31855) maybe?)
  • For the relationships - yes of course, it's one of the main things to import actually, I don't know why I left them out! I have added them now. − Pintoch (talk) 06:37, 12 December 2017 (UTC)
For "Facility" I've been using facility (Q13226383) but I agree that research institute (Q31855) is often the right choice. ArthurPSmith (talk) 14:21, 15 December 2017 (UTC)
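The point about native label (P1705) is that the label language is inferred rather than imported: it is only safe when the country has exactly one official language (P37) and GRID actually has a label in that language. A minimal sketch of that rule on toy data; the `OFFICIAL_LANGUAGES` table and the function name are hypothetical, and real P37 values are items, so turning them into language codes is an assumed preprocessing step.

```python
from typing import Dict, List, Optional, Tuple

# Toy data: country -> language codes of its official languages (P37).
# Real P37 values are items; turning them into language codes is assumed here.
OFFICIAL_LANGUAGES: Dict[str, List[str]] = {
    "Poland": ["pl"],
    "Switzerland": ["de", "fr", "it", "rm"],
}


def native_label(grid_labels: Dict[str, str], country: str,
                 official_languages: Dict[str, List[str]] = OFFICIAL_LANGUAGES
                 ) -> Optional[Tuple[str, str]]:
    """Return (language code, label) as a native label (P1705) candidate.

    Only inferred when the country has exactly one official language and
    GRID provides a label in that language; otherwise returns None.
    """
    languages = official_languages.get(country, [])
    if len(languages) != 1:
        return None
    lang = languages[0]
    if lang not in grid_labels:
        return None
    return lang, grid_labels[lang]
```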

Event ongoing now - Indian scientists datathon

I recognize that there is not much activity here, but right now (22-24 May 2020) this event is in progress.

This event to profile scientists includes matching them to universities, which means building out university items. Check out this event as a model for future university scientist events. Blue Rasberry (talk) 22:42, 22 May 2020 (UTC)

Significant Events

I'm looking to use significant event (P793) to indicate that a US university accepted non-white students for the first time in a given year. A few universities I've browsed through have a significant event listed for mixed-sex education (Q541394). I'm wondering if there's a similar entry for mixed-race education; maybe a new Q entity is needed? My searches have come up short. Thanks in advance for any help here. Archiverlandson (talk) 14:12, 9 September 2020 (UTC)

@Archiverlandson: I think mixed-sex education (Q541394) isn't really a great representation of the information, as it communicates an event as a characteristic. I think it'd be better to have gender educated (P7419) -> all genders (Q70853302), with a qualifier of start time (P580). It looks like we don't have a Q entity for "all races", but if you're going to create that, I'd recommend at least doing it the right way. Sdkb (talk) 01:37, 30 December 2020 (UTC)
@Sdkb: Thanks for the input, especially noting the event vs. characteristic split. I think that's a good way to look at the issue. I may look into creating some sort of "all races" Q entry. Archiverlandson (talk) 19:24, 5 January 2021 (UTC)

Project name

On en-WP, WikiProject Universities was renamed to WikiProject Higher education to better reflect that its scope includes colleges and other non-university institutions of higher education. Should we do likewise here? Sdkb (talk) 20:00, 30 December 2020 (UTC)

Seeing no objections, I'm going to boldly rename the project. I'm not very familiar with Wikidata projects, so I may miss some things. Apologies in advance to anyone who has to clean up after me to complete the move. Sdkb (talk) 22:30, 23 March 2021 (UTC)
  Done. I think I got everything. Sdkb (talk) 22:52, 23 March 2021 (UTC)

Property for head of department/college?

Hi folks - I'm looking for the correct property to identify the head of a department -- in the US university system these are often called chairs (so not the head of the whole university, but a subcomponent of it, either a college or individual department). I'm thinking chairperson (P488) is right - any thoughts? See Massachusetts Institute of Technology Department of Mechanical Engineering (Q104145690) for an example item. -- Phoebe (talk) 15:58, 6 January 2021 (UTC)

What qualifier is best when a professor has an official website in more than one department?

Does it make sense to use the qualifier applies to part (P518) with the QID for a university department when a professor has academic appointments in more than one department and therefore has two or more official websites? I'm looking for the best way to overcome the constraint violation on official websites. For a real-life example, see Phillip Thurtle (Q105086473) and have a look at the official website statement in that item: https://history.washington.edu/people/phillip-thurtle is his website in the UW Department of History and https://chid.washington.edu/people/phillip-thurtle is his website in the UW Department of Comparative History of Ideas. Does "applies to part" work as a qualifier, or is there something better that I should use? UWashPrincipalCataloger (talk) 22:02, 30 January 2021 (UTC)

It looks like this was addressed on Project Chat, with the recommendation to use the qualifier of (P642). ArthurPSmith (talk) 18:51, 1 February 2021 (UTC)

Discussion at Project chat

You are invited to join the discussion at Wikidata:Project_chat#College_admissions_statistics. Sdkb (talk) 23:00, 23 March 2021 (UTC)

IPEDS data import

Heads up, I've imported some data from the 2020 Integrated Postsecondary Education Data System (Q6042926) dataset, which covers basically all U.S. higher education institutions. Specifically, I've imported student body size, website URL, total assets value, endowment value, employee count, and admission rate. There are likely some other values that can be harvested from the "compare institutions" option here if anyone wants, and several of these ought to be updated annually in the future.

The IPEDS site isn't exactly a sterling example of web usability, but I managed to navigate it, and using OpenRefine wasn't as difficult as I feared: just run it through PAWS and follow this tutorial (which very helpfully uses as its example the UK equivalent of IPEDS). If anyone takes this up in the future and runs into difficulty, feel free to ping me. Cheers, Sdkb (talk) 22:35, 31 December 2021 (UTC)

WikiProject Clinical Trials for Wikidata - the paper

See Wikidata:WikiProject Clinical Trials. Reasons to care about this:

  1. We presented it as a model WikiProject in this new preprint - https://doi.org/10.1101/2022.04.01.22273328
  2. It is a tidy cool WikiProject which coordinated the import of ClinicalTrials.gov (Q5133746) to Wikidata
  3. I and others will continue to develop this project and further integrate it with Wikidata for medicine, universities, biographical records of medical researchers, and meta:WikiCite

Comments requested here or at the WikiProject talk page.

For this WikiProject check especially Wikidata:WikiProject_Clinical_Trials/Query#Clinical_trials_at_Vanderbilt_University as an example for how to match research with a university.

THANKS TO ANYONE WHO COMMENTS! Bluerasberry (talk) 19:25, 19 April 2022 (UTC)

NACUBO endowment data imported

Heads up, @Kiran891 has helpfully imported NACUBO endowment data for U.S. institutions. (See also this above for previous similar work.) Discussion about how to use it on English Wikipedia is taking place at w:Wikipedia talk:WikiProject Higher education#NACUBO Endowment data is now available on wikidata. Cheers, Sdkb (talk) 05:24, 14 May 2023 (UTC)

Hi @Sdkb, I'm having a bit of difficulty with the following things. Since this was my first foray into automated editing, I feel out of my depth on this.
  1. I cannot change rank with either OpenRefine or QuickStatements. It is possible to add "reason for preferred rank", but not the rank itself. Also, setting the "reason for preferred rank" qualifier to "most recent value" doesn't affect the value pulled into the infobox, meaning the reason qualifier doesn't actually decide rank. User:Snipre says here that "preferred" rank "should not be used to distinguish between two sourced data of the same rank but one being more recent than the other". How will this be handled when, say, I add data next year? Will we need to change the field and the preferred value again? Is there a way to pull up the most recent value for the property?
  2. I can't seem to figure out how to replace the "author" field in the reference with "publisher". I think the option is to delete all previous additions and re-add them with the new fields. I can't seem to just "replace" a value; it's either add, merge or delete. All my test edit attempts result in either property deletion (all of the endowment values) or merging/duplication of references. QuickStatements cannot remove just the reference without removing the statement itself. I will check again, but let me know if you know how to do this.
I checked out the changes at two randomly chosen pages:
  • On the w:College of St. Scholastica (Q7726780) page, we need to figure out a workaround to pull the most recent data. Also, I don't know why all the excess refs appear here (the IPEDS ref is used once but shown twice in the reflist, and the NACUBO ref is in the reflist but nowhere in the article).
  • On w:Cedar Crest College (Q5056666), I manually changed the preferred endowment value to 2022 and corrected author to publisher, so it now has the correct endowment in the infobox and no error in the ref, but it is still shown twice in the reflist.
Let me know if you would like me to send you the data, so you can check out different schemas. Thanks. Kiran891 (talk) 15:19, 16 May 2023 (UTC)
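Since, per the message above, neither OpenRefine nor QuickStatements could set the rank itself, one option is to decide ranks in a separate script and write them back with a tool that can set ranks. A minimal sketch of only the deciding step, assuming the convention of giving the value with the latest point in time (P585) preferred rank (the "most recent value" reason mentioned above). The `Statement` type is a hypothetical stand-in for real Wikibase statements; reading and writing them is left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Statement:
    amount: int            # e.g. an endowment value
    point_in_time: date    # the P585 qualifier
    rank: str = "normal"   # "preferred", "normal" or "deprecated"


def prefer_most_recent(statements: List[Statement]) -> Optional[Statement]:
    """Give the statement with the latest point in time preferred rank and
    reset the other non-deprecated statements to normal rank.

    Deprecated statements are left untouched. On a tie, the first of the
    latest statements in list order wins.
    """
    candidates = [s for s in statements if s.rank != "deprecated"]
    if not candidates:
        return None
    latest = max(candidates, key=lambda s: s.point_in_time)
    for s in candidates:
        s.rank = "preferred" if s is latest else "normal"
    return latest
```

Rerunning it after next year's import would move the preferred rank to the new value. The old preferred value would be reset to normal.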
@Kiran891, hmm, if I recall correctly from the IPEDS data import I did, I couldn't figure out how to do preferred rank either. And that was my only foray into automated Wikidata editing, so I'm not exactly an expert. I would ask at wikidata:Project chat, as others there may be able to help. My understanding is that preferred rank is quite commonly used these days to highlight the most recent value (which is why most recent value (Q71533355) exists), so I don't think that talk page comment from 2013 represents a consensus practice. Cheers, Sdkb (talk) 15:26, 16 May 2023 (UTC)
Hi @Sdkb, per the Help page I have figured out that to replace the author (S50) field with a publisher (S123) field, I have to delete the previous statement (-) and then re-add it with the new references. Here's QuickStatements code for such a change, performed in the sandbox.
-Q4115189 P6589 421189000U4917 P585 +2024-00-00T00:00:00Z/9 S50 Q30268203 S854 "https://www.nacubo.org/Research/2022/Public-NTSE-Tables" S1476 en:"U.S. and Canadian 2022 NTSE Participating Institutions Listed by Fiscal Year 2022 Endowment Market Value, Change in Market Value from FY21 to FY22, and FY22 Endowment Market Values Per Full-time Equivalent Student (Excel)" S813 +2023-05-15T00:00:00Z/11 S577 +2023-02-17T00:00:00Z/11
Q4115189 P6589 421189000U4917 P585 +2024-00-00T00:00:00Z/9 S123 Q30268203 S854 "https://www.nacubo.org/Research/2022/Public-NTSE-Tables" S1476 en:"U.S. and Canadian 2022 NTSE Participating Institutions Listed by Fiscal Year 2022 Endowment Market Value, Change in Market Value from FY21 to FY22, and FY22 Endowment Market Values Per Full-time Equivalent Student (Excel)" S813 +2023-05-15T00:00:00Z/11 S577 +2023-02-17T00:00:00Z/11
I will try this out on a few items and will add archive links etc. Thanks. Kiran891 (talk) 12:02, 19 May 2023 (UTC)
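The remove-then-re-add pattern in the two commands above can be generated for a whole batch instead of being edited by hand. A minimal sketch, assuming each command is held as a tab-separated QuickStatements V1 line: following the sandbox example, it emits the original line prefixed with "-" for removal, then a copy in which property slots equal to S50 are rewritten to S123. The function name is hypothetical, and only the property positions after the main value are touched, so values are never rewritten.

```python
from typing import Tuple


def swap_reference_property(command: str, old: str = "S50",
                            new: str = "S123") -> Tuple[str, str]:
    """Return (removal line, re-add line) for a tab-separated QS V1 command.

    Layout assumed: item, property, value, then (property, value) pairs
    for qualifiers (P...) and references (S...).
    """
    fields = command.split("\t")
    if len(fields) < 3 or (len(fields) - 3) % 2:
        raise ValueError("expected item, property, value, then property/value pairs")
    readd = list(fields)
    for i in range(3, len(readd), 2):  # property slots after the main value
        if readd[i] == old:
            readd[i] = new
    removal = "-" + "\t".join(fields)
    return removal, "\t".join(readd)
```

Printing both lines for every affected statement gives a batch in the same shape as the sandbox test above.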

Adding "Events" tab and subpages edit

I'd like to add an events tab to allow for adding some event based subpages, e.g. "2023 - Editathon at X University" where people could collect relevant data during the event. Like query statistics before and after the event, what kind of data is missing from their target and instructions for students. Would this make sense to you, or could that be done somewhere else? Kristbaum (talk) 10:47, 25 July 2023 (UTC)Reply

Confusion around freshmen

This discussion, Wikidata:Project chat/Archive/2023/08#How to depict freshmen, revealed a bunch of overlapping/ill-defined items that need to be fixed, and the original question was never answered. Members of this project may be interested in helping.

  Notified participants of WikiProject Higher education. Sdkb (talk) 19:12, 9 August 2023 (UTC)

Strategy for dealing with missing Wikipedia entries for a school

I'm trying to get all the dental schools in the US properly added to Wikidata, but I'm facing a couple of challenges. First, the US accrediting body for dental schools (Commission on Dental Accreditation) doesn't have its own Wikipedia page (it's a creature of the American Dental Association). I've drafted one and submitted it for approval, but it says it could take as long as 4 months to show up (!). So until then, I can't fill in the "accredited by" information?

Similarly, there are a dozen or so colleges of dentistry that don't have their own pages. For example, The Dental College of Georgia (a college of Augusta University) doesn't have a page. If we want it entered as a dental school in Wikidata, does it have to be added to Wikipedia? I'm not enthusiastic about creating a bunch of dental school Wikipedia articles.

Any suggestions? Am I missing something?

Thanks! XKL (talk) 22:08, 22 October 2023 (UTC)

Hello, thank you for the contributions. I am not an admin or bureaucrat, so I cannot answer precisely. But from what I understand about the nature of Wikidata and Wikipedia, items in Wikidata do not need counterparts in Wikipedia. Wikidata serves as the basic data foundation for Wikipedia articles to come, and both Wikipedia and Wikidata depend on the community, so you can either wait for the page to be accepted or wait for others to create the other pages. Athayahisyam (talk) 03:50, 24 October 2023 (UTC)
@XKL: No, you don't have to create Wikipedia pages; adding things directly to Wikidata is fine. You should make sure you have at least one reference or identifier as you fill out each item. ArthurPSmith (talk) 18:00, 24 October 2023 (UTC)
Thanks for the suggestions. I'm glad having a Wikipedia entry isn't an absolute requirement. Now I'm off to see if I can find another identifier for these. I'm not at all well versed in other options. XKL (talk) 13:58, 28 October 2023 (UTC)
Ok, I have begun my journey of getting all the dental schools in the US added or updated here.
An example of a new school I added: Q123226289
The information I have readily available, and am adding where missing, is: grants (degree), accredited by, instance of (dental school), country, state, city, official website, inception, member of (ADEA), and parent organization. I plan on using the lists at https://www.adea.org/dentalschools/ and https://coda.ada.org/find-a-program/search-dental-programs, a few at a time, in the coming weeks. Any suggestions for fields I should be filling in? XKL (talk) 20:08, 28 October 2023 (UTC)
Hi, thanks for the contributions. You could add coordinates, upload a picture of the school to Commons, and link it to the item with the image property. Please also check the definition of the grants property. I think it is not intended for "degree granted" but rather for funds, prizes or awards received by or given to an entity. HA (talk) 04:18, 29 October 2023 (UTC)
I disagree about the grants comment. Look at the description of grants: "confers degree, honor, award, prize, title, certificate or medal denoting achievement to a person or organization". The examples included are specifically degrees.
Thanks for the other suggestions. XKL (talk) 15:09, 29 October 2023 (UTC)
Thank you for clearing that up, and apologies for the mistake. I also suggest following the example of other university or institution items. HA (talk) 16:08, 29 October 2023 (UTC)