Wikidata:ScienceSource project/Focus list

This project page is open to editing by all, but please consider using the talk page before making major changes. The top section on the talk page has a query for tracking the list.
There is a sublist of the focus list at Wikidata:ScienceSource focus list/Main subject needed. It consists of the entries that still lack a main subject (P921) statement.
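A sublist like this can be reproduced with a SPARQL query. The sketch below is an assumption about how such a query might look, not the query kept on the talk page. It selects items carrying P5008 = Q55439927 and keeps only those with no main subject (P921) statement. The helper name `main_subject_needed_query` is hypothetical.

```python
FOCUS_LIST_PROP = "P5008"        # on focus list of Wikimedia project
SCIENCESOURCE_QID = "Q55439927"  # ScienceSource
MAIN_SUBJECT_PROP = "P921"       # main subject


def main_subject_needed_query(limit: int = 100) -> str:
    """Return SPARQL selecting focus-list items that have no main subject (P921) yet."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Doubled braces in the f-string become literal SPARQL braces.
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:{FOCUS_LIST_PROP} wd:{SCIENCESOURCE_QID} .
  FILTER NOT EXISTS {{ ?item wdt:{MAIN_SUBJECT_PROP} ?subject . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(main_subject_needed_query(50))
```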

ScienceSource introduction

This is the project page for the Wikidata focus list of the ScienceSource project. It will be the starting point for downloads to the project's own wiki. It will also attempt to be a compact bibliography of open access biomedical literature, one that addresses the systematic biases of the literature as a whole, most obviously the concentration on diseases that affect richer countries.

For a broad introduction to ScienceSource see the first video link.

Further materials for the project are:

What is the focus list for?

ScienceSource will annotate 30,000 biomedical papers on its own wiki site. Which papers? The starting point will be the focus list created here. For the reasons why this corpus of open access biomedical papers will be more generally significant, see the second video link.

ScienceSource focus list video

The papers ScienceSource downloads will be drawn mainly from the list here. Since Wikidata now has a very large science bibliography, the list can mostly be built here, by tagging existing items with an on focus list of Wikimedia project (P5008) statement whose value is ScienceSource (Q55439927).
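Because the tag is an ordinary statement, the whole list can be retrieved with SPARQL. A minimal sketch, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its `format=json` parameter. The helper `wdqs_url` is hypothetical and only builds the URL; the HTTP request is left to whatever client you prefer.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Counts items tagged: on focus list of Wikimedia project (P5008) = ScienceSource (Q55439927).
COUNT_FOCUS_LIST = """SELECT (COUNT(?item) AS ?count) WHERE {
  ?item wdt:P5008 wd:Q55439927 .
}"""


def wdqs_url(query: str) -> str:
    """Build a GET URL asking the Wikidata Query Service for SPARQL JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(wdqs_url(COUNT_FOCUS_LIST))
```

When fetching such a URL from a script, Wikimedia's User-Agent policy asks for a descriptive User-Agent header.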

Anyone in the wider community can contribute. The focus list is a bibliographical project in its own right, and subsets of it will be taken for downloading. Having a machine-readable list here means that articles from a given publisher can be selected easily, and the Listeria tool allows any kind of sublist to be displayed on wiki pages. While the ScienceSource wiki is being set up, work and discussions can go on here. Please use the talk page here to ask for clarifications and to make suggestions.
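As an illustration of selecting by publisher, the sketch below builds a query under one modelling assumption: that an article item points to its journal via published in (P1433) and that the journal item carries publisher (P123). Items modelled differently will not match. The function name is hypothetical, and "Q12345" stands in for a real publisher item that you would look up first. A query of this kind is also the sort of input a Listeria list can display.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def focus_list_by_publisher_query(publisher_qid: str, limit: int = 500) -> str:
    """SPARQL for focus-list articles whose journal (P1433) has the given publisher (P123)."""
    if not QID_PATTERN.fullmatch(publisher_qid):
        raise ValueError(f"not a Wikidata item ID: {publisher_qid!r}")
    return f"""SELECT ?item ?itemLabel ?journalLabel WHERE {{
  ?item wdt:P5008 wd:Q55439927 ;
        wdt:P1433 ?journal .
  ?journal wdt:P123 wd:{publisher_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # Q12345 is a placeholder, not a real publisher item.
    print(focus_list_by_publisher_query("Q12345"))
```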

The focus list has its own hashtag, #tagmedrefs.

Introduction to the list

The intended use of the list is to tag instances of scholarly article (Q13442814), its subclasses,[1] and possibly other sources. It will therefore be used as a bibliography. The project's scope is biomedical literature. We hope that the list will build up into a first pass at a collection of the most valuable sources for referencing medical articles on Wikipedia.

ScienceSource's stated aim is to be able to screen articles for compliance with the Wikipedia guideline for medical referencing, w:WP:MEDRS. Its algorithm will certainly exclude articles that are marginal in terms of MEDRS. The focus list, on the other hand, can include articles that are broadly suitable. To tune the algorithm, it would be a mistake to exclude too much, too soon. Therefore the focus list, which will be maintained here, will include a broader range of material.

The focus list is likely to be the main source for articles that will be downloaded to the ScienceSource wiki.

How you can help

Participation in the focus list will be in two ways: adding single articles, and adding lists of articles in batches.

We also welcome contact with the project through other routes. We do not expect to build up a comprehensive bibliography all at once. Given the goal of comprehensiveness, finding appropriate material for rare diseases and overcoming geographical biases will require help. (Even from bots! If you are a bot operator, you are welcome to help develop the focus list.)

Adding a single article to the focus list

Before creating a new Wikidata item for an article, it is important to check whether the article is already on Wikidata. There are several ways to do this:

  • If you have a DOI, you can look it up quickly using the Resolver tool with DOI (P356); keep in mind that DOIs on Wikidata are stored in ALL CAPS.
  • Similarly if you have another identifier, such as PubMed ID (P698), you can look it up, knowing just the property number.
  • You can search Wikidata for the paper's title.
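The same identifier lookups can also be written as queries. A sketch, assuming the Wikidata Query Service and the convention noted above that DOIs are stored upper-case. `normalize_doi` and `identifier_lookup_query` are hypothetical helpers, and the DOI 10.1000/abc used below is made up.

```python
import re

DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)
PROP_PATTERN = re.compile(r"P[1-9][0-9]*")


def normalize_doi(doi: str) -> str:
    """Strip resolver prefixes and upper-case the DOI, since Wikidata stores DOIs in capitals."""
    return DOI_PREFIX.sub("", doi.strip()).upper()


def identifier_lookup_query(prop: str, value: str) -> str:
    """SPARQL for items whose `prop` statement equals `value`, e.g. P356 (DOI) or P698 (PubMed ID)."""
    if not PROP_PATTERN.fullmatch(prop):
        raise ValueError(f"not a Wikidata property ID: {prop!r}")
    if prop == "P356":
        value = normalize_doi(value)
    # Escape backslashes and double quotes so the value is a valid SPARQL string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:{prop} "{escaped}" . }}'


if __name__ == "__main__":
    print(identifier_lookup_query("P356", "https://doi.org/10.1000/abc"))
```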

With over 17 million items for articles on Wikidata, a DOI search is likely to succeed: DOIs used as references on English Wikipedia have been deliberately added here.

In case of difficulty, please leave a note on the Talk page here. Items can be created for missing articles.

Then add the on focus list of Wikimedia project (P5008) statement to the item, with ScienceSource (Q55439927) as the object. Thank you!
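For contributors who script their edits (bot operators, for instance), the same statement can be added through the MediaWiki Action API module `wbcreateclaim` at https://www.wikidata.org/w/api.php. A sketch, assuming an already-authenticated session and a CSRF token obtained separately. The function name and the example item ID are hypothetical, and the function only assembles the POST parameters. It is worth checking first that the item does not already carry the statement, to avoid duplicates.

```python
import json


def focus_list_claim_params(item_qid: str, csrf_token: str) -> dict:
    """Assemble POST parameters for wbcreateclaim adding P5008 = Q55439927 to one item."""
    return {
        "action": "wbcreateclaim",
        "entity": item_qid,
        "property": "P5008",  # on focus list of Wikimedia project
        "snaktype": "value",
        # Item values are passed as a JSON entity reference; 55439927 is ScienceSource.
        "value": json.dumps({"entity-type": "item", "numeric-id": 55439927}),
        "token": csrf_token,
        "format": "json",
    }


if __name__ == "__main__":
    # "Q123456" and "TOKEN" are placeholders, not real values.
    print(focus_list_claim_params("Q123456", "TOKEN"))
```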

Adding lists

If you have bibliographical lists that you would like to add to the focus list, you will need to do a batch upload to Wikidata. There are some generic instructions for Wikimedia batch uploads on Wikimedia Commons, and these may be helpful to some users.

We can provide specific help for this campaign, so please look at Wikidata talk:ScienceSource focus list for a workflow. Using the QuickStatements tool is not very difficult here: a list of Q-numbers needs only to be followed by two constant columns, P5008 and Q55439927.
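Such a batch can be generated from a plain list of Q-numbers. A minimal sketch, assuming QuickStatements' tab-separated V1 command format (item, property, value on each line). The function name is hypothetical, and the workflow on the talk page remains the reference.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def quickstatements_v1(qids: list) -> str:
    """Emit one tab-separated V1 command per item: <item> P5008 Q55439927."""
    lines = []
    # De-duplicate while keeping the original order.
    for qid in dict.fromkeys(q.strip() for q in qids):
        if not qid:
            continue  # skip blank entries
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
        lines.append("\t".join((qid, "P5008", "Q55439927")))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical article items.
    print(quickstatements_v1(["Q111", "Q222"]))
```

The output can be pasted into QuickStatements as a new batch.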

Caveat: the number of items for articles on Wikidata is now so large that queries may fail. For example, searching titles for a keyword, without further filtering, is unlikely to run. There are workarounds, so please formulate proposals for us.
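One possible workaround, offered as an assumption rather than the project's agreed method, is to start from a structured statement instead of title text. The query asks for scholarly articles (Q13442814) whose main subject (P921) is a given topic item and excludes those already tagged. Matching a statement is usually far cheaper than scanning every title string. The subject QID is yours to supply, and the helper is hypothetical. Note that it matches only items whose instance of (P31) is exactly Q13442814, not its subclasses.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def candidates_by_main_subject_query(subject_qid: str, limit: int = 200) -> str:
    """Scholarly articles with main subject (P921) = subject_qid not yet on the focus list."""
    if not QID_PATTERN.fullmatch(subject_qid):
        raise ValueError(f"not a Wikidata item ID: {subject_qid!r}")
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P921 wd:{subject_qid} ;
        wdt:P31 wd:Q13442814 .
  FILTER NOT EXISTS {{ ?item wdt:P5008 wd:Q55439927 . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # Q12345 is a placeholder for a disease or topic item.
    print(candidates_by_main_subject_query("Q12345"))
```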

Criteria to use

Participation will require a small amount of familiarity with the MEDRS criteria, but going outside them will not be a big deal. The list will be filtered in several ways before it is put to use.

The main criteria are each subject to qualifications.

  • ScienceSource aims to download open access literature, and cannot do so for closed-access literature. On the other hand, the licensing status cannot always be determined easily. Having some closed-access papers on the list is not in itself a problem, but rather an opportunity. For example, there are now over 50 million statements on Wikidata of the type "article A cites article B", and the articles cited in a closed-access paper, or those citing it, can be interesting to study or a way to grow the list.
  • The major factor in MEDRS is "use the secondary literature". Most of the desirable articles for ScienceSource will therefore be review articles of various types. But excluding primary sources entirely would be unhelpful in terms of the larger goal of comprehensiveness. It is important to recognize that for some medical topics there may be no suitable secondary literature.
  • Publishers should be respectable. Articles from "predatory publishers" should rarely appear among Wikidata's article items; if you find any, please help by flagging them, for example on the Talk page here. There should be no assumption that the project wants to limit itself to the most prominent publishers, though. Again, doing so could undermine the aim of comprehensiveness. Briefly put, the blacklist here is more important than whitelists.
  • Normally medical referencing is to recent literature, meaning roughly the last half a dozen years by date of publication. Once more, exceptions can be made for the sake of better coverage, and that guideline should not be regarded as rigid.
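The "cites" statements and the recency guideline can be combined when growing the list around one paper. A sketch, assuming citations are recorded with cites work (P2860) and dates with publication date (P577). The six-year default mirrors the half-a-dozen-year rule of thumb above and should be treated as adjustable. Articles with no date are kept for manual review. The helper name and the example QID are hypothetical.

```python
import datetime
import re
from typing import Optional

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def citation_neighbourhood_query(article_qid: str, min_year: Optional[int] = None) -> str:
    """Articles cited by, or citing, article_qid (P2860), published in or after min_year."""
    if not QID_PATTERN.fullmatch(article_qid):
        raise ValueError(f"not a Wikidata item ID: {article_qid!r}")
    if min_year is None:
        min_year = datetime.date.today().year - 6  # rough "last six years" default
    return f"""SELECT DISTINCT ?other ?direction ?date WHERE {{
  {{ wd:{article_qid} wdt:P2860 ?other . BIND("cited" AS ?direction) }}
  UNION
  {{ ?other wdt:P2860 wd:{article_qid} . BIND("citing" AS ?direction) }}
  OPTIONAL {{ ?other wdt:P577 ?date . }}
  FILTER(!BOUND(?date) || YEAR(?date) >= {min_year})
}}"""


if __name__ == "__main__":
    # Q12345 is a placeholder for a (possibly closed-access) article item.
    print(citation_neighbourhood_query("Q12345"))
```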

Open access versions

Existing infrastructure (Unpaywall, with over 18M articles, and OABot) deals with searching for more openly available versions of articles. We'll be interested in exploring that whole area, clarifying how it relates to ScienceSource's aims, and spreading understanding of the impact of intellectual property issues on Wikimedia.
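As an illustration of that infrastructure, Unpaywall offers a REST API keyed by DOI (version 2, at https://api.unpaywall.org/v2/ followed by the DOI and an `email` parameter). Its JSON records include `is_oa` and a `best_oa_location` object with a `url`. The sketch assumes those field names; the helper names are hypothetical, and it does not perform the request itself.

```python
from typing import Optional
from urllib.parse import quote


def unpaywall_url(doi: str, email: str) -> str:
    """Build an Unpaywall v2 lookup URL; the API asks callers to identify themselves by email."""
    return f"https://api.unpaywall.org/v2/{quote(doi.strip(), safe='/')}?email={quote(email)}"


def best_oa_url(record: dict) -> Optional[str]:
    """Return the best open-access URL from a parsed Unpaywall record, or None if none is known."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url")


if __name__ == "__main__":
    # Made-up DOI and address, for illustration only.
    print(unpaywall_url("10.1000/xyz", "me@example.org"))
```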