Wikidata:WikiProject Wikidata for research/Data models/Researchers

History

This page grew out of the need to centralize content for literacy purposes.

The inclusion of researchers' profiles (or authors of peer-reviewed articles) is crucial for the completeness and reusability of bibliometric data on Wikidata. In particular, given that this field constitutes a significant portion of Wikidata's content, it is essential to have a dedicated model for these types of biographical items.[1]

Previous guidelines have been created, for example here (in Italian); this page is a product of the Wikidata:WikiCite/Researchers in Switzerland (Q125422363) project and brings together, in English, information from various discussions.

Termbox

For the termbox, please bear in mind above all that the vast number of automatically generated items produced over the years tends to be quite minimal, so enriching them is recommended:

  • for labels and aliases, most citation databases can be imprecise; the most common mistakes are the inadvertent omission of diacritics and the incorrect ordering of first and last names during copy-pasting;
  • for descriptions please also take a look at the help page; it's advisable to refrain from referencing databases directly, as such details are already included in the statements (e.g. ORCID number); instead, descriptions should focus on providing more specific information relevant to the individual, such as their profession, especially in order to help human readers in disambiguating homonyms;
  • for aliases please also take a look at the help page; the use of initials with surnames should only be considered if explicitly mentioned in the sources. The form "surname, name" is not considered appropriate and previous occurrences are now progressively removed.[2]
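
The two most common termbox problems described above (lost diacritics and the "surname, name" form) can be spotted with a small check. The following is only a minimal sketch: the function names and the comma-based pattern are assumptions made for illustration, not an existing Wikidata tool, and letters that do not decompose into a base letter plus a combining mark (e.g. "ø") are not handled.

```python
import re
import unicodedata

# Hypothetical pattern: exactly one comma separating two non-empty parts.
INVERTED_NAME = re.compile(r"^[^,]+,\s*[^,]+$")


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lost_diacritics(label, source_name):
    """True if the label equals the source name except for missing diacritics."""
    return label != source_name and label == strip_diacritics(source_name)


def inverted_aliases(aliases):
    """Return the aliases written in the "surname, name" form."""
    return [alias for alias in aliases if INVERTED_NAME.match(alias)]
```

Flagged labels and aliases still need human review: a comma can be legitimate in some names, so the output is best treated as a list of candidates rather than as automatic corrections.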

Statements

Education

Occupation, employer, affiliation

It's not uncommon for properties to overlap on Wikidata.[3] The issue lies not in the existence of these properties but in establishing guidelines for their usage; both oversimplified use and removal should be avoided.

Occupation

  • occupation (P106): In the first position, always put the specific occupation(s) (e.g. "historian", "biologist", "philologist"); in the second position, if applicable, add the generic university teacher (Q1622272) (with the possible qualifiers start time (P580) and end time (P582), but no others). Occupation tends to be more broadly defined and can encompass various phases of a career. It's typically sourced from national authority files (e.g. GND). Alternatively, it may be derived from self-descriptions in professional databases like LinkedIn, though this is considered less reliable since it's not third-party information. In bibliographic databases, occupation can often be inferred from the topic of research, but this strategy is also less reliable.
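
The ordering and qualifier rule for occupation (P106) can be sketched as plain data. This is a minimal illustration, assuming statements are represented as simple Python dictionaries (this is not the Wikibase JSON format); the helper names are hypothetical and "Q_HISTORIAN" in the usage note is a placeholder, not a real item ID.

```python
UNIVERSITY_TEACHER = "Q1622272"        # generic university teacher, as named above
ALLOWED_QUALIFIERS = {"P580", "P582"}  # start time, end time


def occupation_statements(occupations, teaches=False, start=None, end=None):
    """Build P106 statements: specific occupations first, generic teacher last."""
    statements = [
        {"property": "P106", "value": qid, "qualifiers": {}} for qid in occupations
    ]
    if teaches:
        qualifiers = {}
        if start:
            qualifiers["P580"] = start
        if end:
            qualifiers["P582"] = end
        statements.append(
            {"property": "P106", "value": UNIVERSITY_TEACHER, "qualifiers": qualifiers}
        )
    return statements


def check_occupations(statements):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    values = [s["value"] for s in statements]
    if len(values) > 1 and values[0] == UNIVERSITY_TEACHER:
        problems.append("generic university teacher listed before the specific occupation")
    for s in statements:
        extra = set(s["qualifiers"]) - ALLOWED_QUALIFIERS
        if s["value"] == UNIVERSITY_TEACHER and extra:
            problems.append(
                "unexpected qualifiers on university teacher: " + ", ".join(sorted(extra))
            )
    return problems
```

For example, `occupation_statements(["Q_HISTORIAN"], teaches=True, start="2015")` yields the specific occupation followed by university teacher (Q1622272) carrying only a start time (P580) qualifier.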

Affiliation and employer

At one point some users used only employer (P108) and considered affiliation (P1416) unnecessary, but the two properties are not interchangeable. One should be extremely careful in drawing inferences, as partial or incorrect personal information may be provided in some public databases.

Affiliations hold significant value for bibliometric analysis, reflecting an individual's academic network and institutional connections.

"Employer" is a delicate concept that also encodes personal information, specifically who pays whom, with a very specific (or sometimes convoluted) legal meaning. For example, Ph.D. students might not receive a salary; educated at (P69) covers the description of unpaid students, yet their affiliation still needs to be specified because they can have more than one. Likewise, some professors or researchers may have dual affiliations while being paid by only one institution.

In some countries (e.g. France) or for some positions (Ph.D. students), there may be multiple employers (the people or institutions paying the salary), and the affiliations (work done in a lab, for instance) may also be independent and/or multiple. People can therefore list both affiliations and employers in the "employment" section on ORCID. In addition, research agencies (ERC, ANR in France, NSF, etc.) distribute money (sometimes salaries) and ask to be explicitly mentioned in the articles; some journals then list them under "affiliation", which is the only section available.

These are some established key concepts as of now:

  • employer (P108) is always used as a standalone statement, not as a qualifier. The qualifier position held (P39) is used to indicate the specific position (research fellow (Q1706722), associate professor (Q9344260), full professor (Q25339110), etc.), and start time (P580) and end time (P582) are added if known. If a person has held multiple different positions at the same university, add a new value of P108 with a different value for the qualifier P39.
  • employer (P108) can be of limited use because of privacy concerns when actual employer information cannot be disclosed, whereas affiliation (P1416) is typically publicly available.
  • affiliation (P1416) is a piece of information that can often be inferred from specific articles (which is why it also has a twin property for string values, affiliation string (P6424)). It can be sourced with bibliometric identifiers, making it manageable for those working on top-down massive bibliometric imports who have no interest in adding more details or going into the nuances of the researcher's fiscal position.
  • employer (P108) could ideally be sourced with links to official CVs stored on institutional databases or perhaps LinkedIn profiles. A complete LinkedIn profile can correctly state, for example, when a Ph.D. position is also paid, so that it appears in both sections (employment and education). ORCID could be used, but it says "employment", not "employer", which in some cases may cause confusion: some funding agencies are not technically employers in every respect, but they could be included as such in some countries.
  • employer (P108) should not be used for Ph.D. students whose grant is not a real salary. This is the case in Portugal or in Italy[4], so they are affiliated with and educated at a certain university but not employed by it. It can be used in other countries such as Germany or Switzerland, where a salary for Ph.D. students is also formally provided in exchange for some tasks, commonly as a teaching assistant.
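
The key concepts above can be sketched as data: one employer (P108) statement per position held, each qualified with position held (P39) and, where known, start time (P580) and end time (P582), while an unsalaried Ph.D. student gets educated at (P69) and affiliation (P1416) but no P108. This is a minimal illustration assuming plain Python dictionaries rather than the Wikibase JSON format; the helper names and the "Q_UNIVERSITY" placeholder are hypothetical.

```python
def employer_statements(employer, positions):
    """Build one standalone P108 statement per position held at the same employer."""
    statements = []
    for position in positions:
        qualifiers = {"P39": position["held"]}  # position held
        if position.get("start"):
            qualifiers["P580"] = position["start"]  # start time, if known
        if position.get("end"):
            qualifiers["P582"] = position["end"]  # end time, if known
        statements.append(
            {"property": "P108", "value": employer, "qualifiers": qualifiers}
        )
    return statements


def unsalaried_phd_statements(university):
    """Ph.D. student without a real salary: educated at and affiliation, no employer."""
    return [
        {"property": "P69", "value": university, "qualifiers": {}},
        {"property": "P1416", "value": university, "qualifiers": {}},
    ]


# Example: research fellow (Q1706722), then associate professor (Q9344260),
# at the same placeholder university.
career = employer_statements(
    "Q_UNIVERSITY",
    [
        {"held": "Q1706722", "start": "2012", "end": "2016"},
        {"held": "Q9344260", "start": "2016"},
    ],
)
```

Keeping each position as its own P108 statement, rather than stacking several P39 qualifiers on one statement, keeps the start and end dates unambiguous for each position.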

Identifiers

For conflations and duplicates, please use the page and subpages listed in Wikidata:WikiCite/Refinements to get support from more experienced users.

Be sure to use identifiers that can serve as good and reliable sources, such as:

References

  1. As of April 20th, 2024, there are approximately 41.5 million items on Wikidata describing scholarly articles out of a total of 109.5 million. This accounts for roughly 38% of the items, not including other types of publications and grey literature, as well as the items related to their authors. Hence, approximately half of the entire Wikidata ecosystem revolves around this topic.
  2. Such information can be included in the statement with subject named as (P1810).
  3. For instance, we have languages spoken, written or signed (P1412) and writing language (P6886), and despite the latter being described as a "subproperty of" the former, both are still utilized.
  4. In Italy, doctoral students are not fiscally independent and can remain on their parents' income tax declaration.