Wikidata talk:WikiCite/Archive 5

Bibliography of Wikidata

Hi All,

I've created a Wikidata:Bibliography of Wikidata page, feel free to expand if interested.

Best, Adam Harangozó (talk) 13:01, 13 June 2020 (UTC)

@Adam Harangozó: Nice work. It might be good to ensure that all of those items have Wikidata (Q2013) as a main subject (P921) and have a listeria table or link to a query so that it doesn't fall out of date (avoid the fate of this page). T.Shafee(evo&evo) (talk) 12:52, 14 June 2020 (UTC)
Thanks! Sadly I don't know how to do that but I hope someone will. I'll add articles then, which don't have a Wikidata item yet, and the others can go through listeria. --Adam Harangozó (talk) 19:00, 14 June 2020 (UTC)
@Adam Harangozó: No worries. I'm only just learning to set up automatically-updating listeria tables, so I'll have a look into it. You might also find typing "wikidata" into this tool interesting. T.Shafee(evo&evo) (talk) 03:06, 15 June 2020 (UTC)
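For anyone wanting to try the query route suggested above, here is a minimal sketch that only builds the SPARQL text for "all items whose main subject (P921) is Wikidata (Q2013)". The function name and the default limit are illustrative assumptions; the resulting text is meant to be pasted into the Wikidata Query Service rather than run from this snippet.

```python
def works_about(topic_qid: str, limit: int = 500) -> str:
    """Return a SPARQL query listing items whose main subject (P921) is topic_qid."""
    # Basic sanity check so that arbitrary text is not spliced into the query.
    if not (topic_qid.startswith("Q") and topic_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P921 wd:{topic_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


print(works_about("Q2013"))
```

Items on the bibliography page that do not show up in the results would be the ones still missing the main subject statement.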

Affiliation mismatch with ORCID

While curating author affiliations, I noted that the Escola Superior de Agricultura Luiz de Queiroz (a unit of the University of São Paulo, though Wikidata does not know that yet) has a good number of people — e.g. Wanderley Marques Bernardo (Q89675057) — having employer (P108) statements with them that are referenced to ORCID but where their ORCID profile does not mention the school itself (though at least some of them mention the University of São Paulo) and who work on things not closely related to agriculture. Posting this here because I could not sort it out on the spot. Also pinging Sic19, since they seem to have been behind at least some of these edits. --Daniel Mietchen (talk) 21:00, 24 July 2020 (UTC)

@Daniel Mietchen: thanks for flagging this batch. I can tell you exactly what has caused this to happen; the Ringgold ID (P3500) for the University of São Paulo (28133) was erroneously added to Luiz de Queiroz College of Agriculture (Q5397345). When I reconciled the ORCID data against Wikidata using the organization identifiers this mismatching occurred. I've corrected the identifiers to prevent a repeat and will update the affiliations soon. There are other organization items with incorrect external identifiers and therefore we could have other mismatched affiliations - it is difficult to catch them, especially in large batches. It is more common for an identifier to be added to two different organizations but that is not a problem in this context because the reconciliation will fail. If you spot any similar problems in the future I would be grateful if you can send me the details. Simon Cobb (User:Sic19 ; talk page) 22:13, 24 July 2020 (UTC)
Resolved. Employer claims have been corrected per the referenced ORCID data. Simon Cobb (User:Sic19 ; talk page) 21:06, 25 July 2020 (UTC)
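To illustrate why the two kinds of identifier error behave differently, here is a rough sketch of identifier-based reconciliation. It is not the code used for the actual batch; the function names and the placeholder item `Q_UNIVERSITY` are assumptions made for the example. Only Q5397345 and Ringgold ID 28133 come from the discussion above.

```python
from collections import defaultdict


def build_index(org_ids: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each external identifier to the list of items that carry it."""
    index = defaultdict(list)
    for qid, ids in org_ids.items():
        for ext_id in ids:
            index[ext_id].append(qid)
    return dict(index)


def reconcile(ext_id: str, index: dict[str, list[str]]):
    """Return the single matching item, or None if the ID is unknown or ambiguous."""
    matches = index.get(ext_id, [])
    return matches[0] if len(matches) == 1 else None


# Case 1: the ID sits only on the wrong item, so the match succeeds silently.
only_wrong = build_index({"Q5397345": ["28133"]})
print(reconcile("28133", only_wrong))

# Case 2: the ID sits on two items (Q_UNIVERSITY is a hypothetical placeholder),
# so reconciliation refuses to pick one and nothing wrong gets written.
on_both = build_index({"Q5397345": ["28133"], "Q_UNIVERSITY": ["28133"]})
print(reconcile("28133", on_both))
```

The first case is the hard one to catch in a large batch, because nothing in the reconciliation step signals a problem.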

Property to use when describing the methods and equipment used in a scholarly publication?

also mentioned at Wikidata:Project chat

Are there existing properties to state the methods and equipment used in scientific publications, or are new ones needed? For example, the publication Q27643422 uses the method Molecular replacement (Q17104122) and used the equipment Advanced Photon Source (Q2825375) (but clearly neither is a main subject (P921)). Eventually I'd like to see the methods and equipment used for all publications listed in their Wikidata items. T.Shafee(evo&evo) (talk) 10:47, 25 August 2020 (UTC)

Maybe we can extend the scope of describes a project that uses (P4510)? --Stevenliuyi (talk) 04:32, 26 August 2020 (UTC)
@Stevenliuyi: Ah thank you, that's ideal I think! I'd not even come across it previously! Its current usage certainly covers the equipment/reagent side of things from its listed examples, but I think it should be clear in context when also using it for a technique/method. I'm not wild about its long label, but no huge problem. (ping @Jura1, Pasleim, TomT0m, JakobVoss, Fnielsen: some people involved in its creation/discussion) T.Shafee(evo&evo) (talk) 12:00, 27 August 2020 (UTC)
I’d tend to be usually in favor of creating an item for the experiment. Create an item, link it to the publication, then use the usual properties on the experiment item to describe the experiment. It seems useless to duplicate all the properties, like « uses » with « describes something that uses » or « studies » with « describes something that studies », in my opinion. author TomT0m / talk page 12:56, 30 August 2020 (UTC)
@TomT0m: Interesting, I'd not thought of that structure. Definitely a bit more complex, and a single publication could easily have a dozen experiments as part of it. Inclusion of methods and equipment is going to start off being extremely gappy (low coverage) since it's mostly not automatable yet. To start with I'm inclined to keep the information on the publication's item itself, but I can see it all being automatically migrated into linked items for each experiment later once it becomes more common and we want to make the structure more complex. T.Shafee(evo&evo) (talk) 00:29, 1 September 2020 (UTC)
I suppose we would also end up with "Wikidata project" in addition to Wikidata (Q2013). There is some discussion of this at Property_talk:P4510 --- Jura 05:50, 4 September 2020 (UTC)
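If the simpler option discussed above were adopted (statements directly on the publication item), the example from the opening question could be prepared as tab-separated QuickStatements rows. This is only a sketch: the function name is made up, and whether P4510 is the right property for methods is still the open question in this thread.

```python
def uses_statements(publication: str, used_items: list[str], prop: str = "P4510") -> list[str]:
    """Build one tab-separated 'item, property, value' row per method or piece of equipment."""
    return ["\t".join((publication, prop, item)) for item in used_items]


# Q27643422 used Molecular replacement (Q17104122) and Advanced Photon Source (Q2825375).
rows = uses_statements("Q27643422", ["Q17104122", "Q2825375"])
print("\n".join(rows))
```

Under the per-experiment structure suggested by TomT0m, the same rows would instead target a separate experiment item linked to the publication.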

How to model scholar articles produced in a project?

Hi, I'm having some trouble to figure this and any help would be appreciated.

Let's say a researcher is associated with an institution (not necessarily paid), and produces articles for that institution's project. How do we model that? funder (P8324) explicitly says the relation involves money; sponsor (P859) doesn't seem right either. I'm thinking something more like on focus list of Wikimedia project (P5008) or a "part of the project" sort of property.

Thanks in advance and good contributions, Ederporto (talk) 04:54, 4 September 2020 (UTC)

@Ederporto: Conceivably affiliation (P1416) could be useful. T.Shafee(evo&evo) (talk) 05:45, 4 September 2020 (UTC)
Thanks, @Evolution and evolvability:. Re-reading my question, I think it was unclear what I wanted help with. I am thinking about how to model the articles. How do I say an article is part of a project? Again, thanks for your answer. Good contributions, Ederporto (talk) 15:17, 11 September 2020 (UTC)
For example, Modelos de redes neurais com neurônios estocásticos e diferentes topologias: construção e análise (Q98799554). I added part of (P361) NeuroMat (Q18477654). As you can see on page 6 of the pdf of this article ([1]), "This thesis was produced as part of the activities of FAPESP Research, Innovation and Dissemination Center for Neuromathematics (grant #2013/07699-0, S.Paulo Research Foundation".
@Ederporto: Ah, I see what you mean now, how to model that the article is part of a project (my previous answer was about modelling a contributor's relationship with an institution). I agree that part of (P361) is pretty sensible. It's also flexible in case there are multiple projects that a single publication could be a part of. T.Shafee(evo&evo) (talk) 03:19, 12 September 2020 (UTC)

merging articles and their peer reviews into single items?

Hello all. I just noticed that the peer review and author response for an elife paper has been created as a separate item (there are likely others that have been similarly imported):

I think it's best to link to the url mentioned in the second item in the former as peer review URL (P7347), since all other information is duplicate. Any opinions? T.Shafee(evo&evo) (talk) 12:16, 14 September 2020 (UTC)

Two WikiCite grant programs - applications closing soon

I just wanted to make sure that people who frequent this project page are aware of these two grant programs currently open, highly relevant to this Wikiproject's activities. They were announced on the main project chat a month ago (and various other places) but I wanted to write here specifically too. Apply by 1 October.

1. Project & events [$2-10k]

2. e-Scholarships [per-diem calculated on your city; 1-5 people (single, or as a 'remote group') for 2-4 days, for COVID-era "stay at home" projects. Paid in advance living allowance, no expense report required.]

There is lots of documentation, eligibility requirements, selection criteria, program design principles at those links. Please check them out. Sincerely, LWyatt (WMF) (talk) 13:57, 18 September 2020 (UTC)

Meet the committee, grant deadline (apply/endorse), video tutorials

As we draw near to the application deadline of the two WikiCite grant programs, the steering committee thought it would be useful to have a video call for anyone to chat with us; to encourage you to review/endorse proposals; to remind you that you can still apply yourself; and to announce a video tutorials call for applications.


1. Meet with the steering committee.
We will be hosting TWO open calls using GoogleMeet in the next few days, at different times to be available to different timezones. [this is short notice, sorry. But we wanted to make them before the grant application deadline].


2. Please comment, endorse and/or apply for the WikiCite grants. The deadline is 1 October.


3. Call for proposals: Video tutorials.
Today we are also launching a "Project Brief & call for proposals" for a series of video tutorials about some of the main tools and topics that are used in WikiCite activities. Please share with anyone you think might wish to apply for this contract.

Sincerely, LWyatt (WMF) (talk) 16:21, 24 September 2020 (UTC)

JEL classification

Hi, I just discovered gestion financira (Q1501332) (original source). I was curious: is there any plan to create a property to improve the metadata descriptions of the items of articles in the field of economics? It looks quite used there, but I don't see any link to that item, and no previous discussion. It's not my field, but I leave a comment for the future.--Alexmar983 (talk) 20:53, 11 October 2020 (UTC)

Court documents in Wikidata

What do you suggest one do with court documents and legal cases in Wikidata?

What documentation exists for how these different classes of items should be entered and linked in Wikidata?

Thanks, DavidMCEddy (talk) 04:03, 12 November 2020 (UTC)

What to do if Source MetaData doesn't generate an item

I created U.S. Geological Survey data release (Q104355278) and provided a DOI prefix (P1662) 10.5066. I am curious if there is anything else I must do so that items such as DOI: 10.5066/P98J7DRO generate an item fully. The DOI resolves at doi.org, but Source MetaData for that DOI generates "CrossRef lookup has failed!" Please let me know if I can do anything to make these USGS data release DOIs work through Source MetaData. Thanks, Trilotat (talk) 18:08, 19 December 2020 (UTC)

I don't know the answer to this, but if it helps others in troubleshooting, below is the data that Crossref has on the article above (looked up using rcrossref):
  doi = {10.5066/P98J7DRO},
  url = {https://www.sciencebase.gov/catalog/item/5fa4abb4d34ed3698f90ea3a},
  author = {Hempel, Laura and Kisfalusi, Zach and Creighton, Andrea L},
  keywords = {Geomorphology},
  title = {Elevation Data from Fountain Creek between Colorado Springs and the Confluence of Fountain Creek at the Arkansas River, Colorado, 2020},
  publisher = {U.S. Geological Survey},
  year = {2020}
Hope that helps in tracking down the issue. T.Shafee(evo&evo) (talk) 01:16, 20 December 2020 (UTC)
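As a small troubleshooting aid, the DOI prefix can be split off and compared with the DOI prefix (P1662) value stored on the publisher item. The sketch below does only that string handling; the function name is an assumption, and it does not explain the "CrossRef lookup has failed!" message itself.

```python
def doi_prefix(doi: str) -> str:
    """Return the registrant prefix (the part before the first '/') of a DOI."""
    cleaned = doi.strip()
    # Drop common resolver or label prefixes before parsing.
    for lead in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    cleaned = cleaned.strip()
    prefix, sep, suffix = cleaned.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return prefix


print(doi_prefix("DOI: 10.5066/P98J7DRO"))
```

A prefix that matches the publisher item would at least rule out a mismatch between the DOI and the P1662 statement as the cause.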

Adding a ShortDOI property

Proposal

I would like to propose adding a ShortDOI property. ShortDOIs are used to provide shorter DOIs than things like this: 10.1002/(SICI)1097-0258(19980815/30)17:15/16<1661::AID-SIM968>3.0.CO;2-2

It gets translated to 10/aabbe

They are created by an official DOI service (which has an API): http://shortdoi.org/

And can be used on [[2]] interchangeably with DOIs on all their services and will always refer to what the original DOI points to.

I would use the same rules as for DOI in terms of restrictions until we can find more from DOI Foundation (they are likely only using letters and numbers, but we need to make sure).
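A possible format check for such a property might look like the sketch below. The pattern is an assumption based on the guess above that short DOIs use only letters and digits after `10/`; it has not been confirmed against DOI Foundation documentation, and the names are illustrative.

```python
import re

# Assumed shape: "10/" followed by lower-case letters and digits only (unconfirmed).
SHORT_DOI = re.compile(r"10/[a-z0-9]+")


def is_short_doi(value: str) -> bool:
    """True if the whole string matches the assumed ShortDOI shape."""
    return SHORT_DOI.fullmatch(value) is not None


print(is_short_doi("10/aabbe"))
print(is_short_doi("10.1002/(SICI)1097-0258(19980815/30)17:15/16<1661::AID-SIM968>3.0.CO;2-2"))
```

Because a full DOI starts with `10.` rather than `10/`, the two forms cannot be confused by this check.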

Alternatives

The other solution would be to have two statements with a DOI property, but we would see that as messier as we wouldn't know which one is the short doi unless using a qualifier.

Hope it makes sense.

@AdrianoRutz:@GrndStt:@Egon_Willighagen:

For the Wikidata:WikiProject_Chemistry/Natural_products project, Bjonnh (talk) 16:09, 4 January 2021 (UTC)

@Bjonnh: This sounds like a good idea - you should propose it at Wikidata:Property proposal. ArthurPSmith (talk) 19:36, 4 January 2021 (UTC)
@ArthurPSmith: I added a proposal for the new property: Wikidata:Property_proposal/Creative_work#Short_DOI. Also the service is provided by the DOI Foundation not CrossREF, and CrossREF cannot use those shortened DOIs; they have to be resolved first (I edited my question above to reflect that) Bjonnh (talk) 20:17, 4 January 2021 (UTC)

Encyclopedia articles and notability

Hi all! I've recently proposed the deletion of some tens of thousands of items created by @LargeDatasetBot: having instance of (P31) encyclopedia article (Q13433827) + title (P1476) + publication date (P577) + DOI (P356) + rarely author (P50) + rarely (but it should be always) published in (P1433) but no sitelink to Wikisource or whichever Wikimedia project; specifically:

In my opinion, as I have expressed in the deletion request, they don't fall into WD:N, but I think the policy is unclear in the case of encyclopedia articles: while the ones present in Wikisource are surely notable per N1, I think that articles not present on Wikisource aren't notable because all the information present in these items can be stored as qualifiers of the respective IDs (Benezit ID (P2843), Electronic Enlightenment ID (P3429), Grove Art Online ID (P8406)) present in the items which are subjects of these articles and, if no item about the subject of the article exists, it can be created and then it will be possible to act in the above way. Additionally, Help:Sources doesn't seem to mention the need of items about encyclopedia articles. And here in Project chat there was seemingly consensus for non-notability of these items.

I am writing here in order to reach a clear decision about the notability or non-notability of encyclopedia articles in absence of Wikisource pages containing them. If there is consensus for notability (overturning previous consensus in Project chat), it should be clearly stated in Help:Sources. If there is consensus for non-notability (or silent consensus about the past consensus in Project chat), it should similarly be clearly stated in Help:Sources and I will ask for the bot-deletion of the above items. --Epìdosis 10:16, 11 January 2021 (UTC)

Considering silent consensus, I have edited Help:Sources (here) and filed Wikidata:Bot requests#Admin bot for deletion of 100k non-notable items for the deletion of the items. --Epìdosis 16:52, 19 January 2021 (UTC)
@Epìdosis: I see this message a bit late, sorry. I kindly disagree, there is value in having specific items about specific articles (not having the main item carry thousands of claims, making it unreadable, for instance) and it's easier to use as a reference. Plus, I don't see why we could accept all articles (magazine article, scientific article, etc.) but not encyclopedia articles.
That said, the items you speak about are indeed bad, but I think improvement is better than deletion, and we shouldn't generalise from some bad examples (even if there are a lot of bad examples, we have even more bad examples of book (Q571) and clearly we should not delete them all).
Cheers, VIGNERON (talk) 12:12, 20 January 2021 (UTC)
Thanks, I substantially agree; given also the discussion in Wikidata:Bot requests#Admin bot for deletion of 100k non-notable items yesterday, I have undone my edit on Help:Sources. --Epìdosis 13:15, 20 January 2021 (UTC)

Predatory publishers

There are two open but out-of-date databases ([3] [4]), and one up to date but proprietary list ([5]). I've put an intended plan of action and some record of what's being done below. Feel free to directly edit. T.Shafee(evo&evo) (talk) 05:44, 21 September 2020 (UTC)

Plan

  1. Check which predatory publishers already have items (First: 2016 StopPP list, Second: 2017 Beall's list)
    • Ensure reference included for instance of (P31) statement
    • Neither database includes easy identifiers (just name and URL, sometimes a common abbrev), so matches will require checking.
  2. Create items for missing predatory publishers
  3. Check which predatory and hijacked journals already have items (2017 StopPP list)
  4. Check for items in DOAJ (evidence against predatory status?)

Record of activity

Note: Relevant previous work by the ScienceSource project. Of course, many of the publishers and journals are included in Wikidata, but their instance of (P31) doesn't indicate predatory nature (predatory publisher (Q65770389)/Q65770378). T.Shafee(evo&evo) (talk) 05:44, 21 September 2020 (UTC)

Size of predatory journal/publisher lists:
| List | Last updated | Journals | Publishers |
|---|---|---|---|
| Beall | 2016-12-31 | 1250 | 1163 |
| anon update to Beall | 2020-06-09 | Beall + 168 | Beall + 141 |
| StopPP | 2017-05-07 | 1317 predatory + 115 hijacked | 1177 |
| Cabell | 2020-09-13 | 13757 + ~500 'under review' | |
| Wikidata (2020-09-21) | 2020-09-21 | 1 | 0 |

I've now gotten WikidataR to the point where it can be used for downloading the data from the sources in the table above and trying to resolve it into Wikidata. I'll post the code as I work through it, so anyone with R experience is welcome to assist! T.Shafee(evo&evo) (talk) 05:13, 12 February 2021 (UTC)

Discussion

The most current alternative is Cabells' proprietary paywalled list (terms of use), which probably precludes any use at all. I have access through my institution and have spot-checked random items, and there are a lot not covered by the StopPP database.

Questions:

  • What other qualifiers are needed for a predatory publisher?
  • For hijacked journals, any ideas on how to identify dates of hijack?
  • Both main lists haven't been updated since 2017. How can more recent predatory journals best be indicated without those as a reference?

T.Shafee(evo&evo) (talk) 05:44, 21 September 2020 (UTC)

Retracted articles

I've put an intended plan of action and some record of what's being done below. Feel free to directly edit. T.Shafee(evo&evo) (talk) 05:44, 21 September 2020 (UTC)

Plan

  1. Pull all retracted and retraction items from pubmed (retracted+retractions)
  2. Where possible link retracted papers to their notices and vice versa
    • About 10% of retraction notice items indicate the corresponding doi of the retracted article in their abstract (download as 'PubMed' format)
  3. Pull all retracted items from crossref
  4. Pull non-copyrighted info from retractionwatch (retractions table; terms of use)
    • I think we can fair-use retractionwatch’s list of corresponding dois for the retracted papers and their retraction notice
    • We probably can’t use retractionwatch’s “Reasons” column, since that’s data directly created by them so is their ‘expression of an idea’.
    • Their table will only display 600 items at a time
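For step 2 of the plan above, where some retraction notices name the retracted article's DOI in their abstract, a first pass could simply pull DOI-like strings out of the abstract text. This is a sketch with an approximate pattern and a made-up sample sentence, not the WikidataR code; the names are assumptions, and stripping trailing punctuation may occasionally cut a DOI that really ends in one of those characters.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text: str) -> list[str]:
    """Return DOI-like strings found in text, in order, without duplicates."""
    found = []
    for match in DOI_PATTERN.findall(text):
        doi = match.rstrip(".,;)")  # drop sentence punctuation glued to the DOI
        if doi not in found:
            found.append(doi)
    return found


# Hypothetical abstract text, for illustration only.
sample = "This article has been retracted: see doi:10.1234/example.5678."
print(extract_dois(sample))
```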

Record of activity

Size of retraction lists:
| List | Retracted papers | Retraction notices |
|---|---|---|
| Pubmed | 7914 | 8274 |
| Crossref | 4936 | |
| RetractionWatch | >24000 | >24000 |
| Wikidata (2020-09-21) | 1108 (of which 979 connected to notice) | 1661 |

As with the previous section, I've now gotten WikidataR to the point where it can be used for downloading the data from the sources in the table above and trying to resolve it into Wikidata. I'll post the code as I work through it, so anyone with R experience is welcome to assist! T.Shafee(evo&evo) (talk) 05:13, 12 February 2021 (UTC)

Discussion

Any other ideas for databases? Is there a way to pull retraction (or erratum) info from crossref? T.Shafee(evo&evo) (talk) 05:45, 21 September 2020 (UTC)

We do have is retracted by (P5824) and corrigendum / erratum (P2507).--Stevenliuyi (talk) 19:36, 21 September 2020 (UTC)
@ArthurPSmith, Stevenliuyi: The example I've been using is:
  • Q61957492 is the original article, and stated when the retraction started (and ended)
  • Q29030043 is the original retraction notice
  • Q58419365 is the retraction of that retraction notice
I think it's probably the most complicated case (of a retracted retraction), so if we can get a system that works for that, then it should be pretty robust overall! Here are three ways to encode the retracted status:
Dates are included as qualifiers - of course, it'd be great to include qualifiers for the reason for retraction but that's far harder to automate. T.Shafee(evo&evo) (talk) 23:49, 21 September 2020 (UTC)
Ah, that looks good, I'd forgotten we had those properties! ArthurPSmith (talk) 00:11, 22 September 2020 (UTC)
Edited comment above to correct to retraction (Q45203135) now that that item has been separated from rectification (Q26256296). T.Shafee(evo&evo) (talk) 11:46, 23 February 2021 (UTC)

Also of relevance is this category in enwp: w:Category:Articles_unintentionally_citing_retracted_publications. T.Shafee(evo&evo) (talk) 05:28, 1 March 2021 (UTC)

Discussion about descriptions of scientific articles

See https://www.wikidata.org/wiki/Wikidata:Project_chat#Let's_reach_consensus_on_descriptions_for_our_scientific_article_descriptions and continue here.--So9q (talk) 22:29, 26 January 2021 (UTC)

More detail is probably generally better, but we shouldn't make descriptions overly long. There are some cases where the same title ("Reply" for example) is used for a very large number of articles, so a more detailed description may be needed in those cases. ArthurPSmith (talk) 19:17, 27 January 2021 (UTC)

Standardised Data on Initiatives

There's a paper I've been working on with user:JackNunn that's highly relevant to this community: Standardised Data on Initiatives – STARDIT: Beta Version. From the Wikidata point of view, it's a way of getting people to provide standardised metadata on different projects (especially the sorts of metadata that are often omitted). Additional co-authors to the paper are being invited to contribute before it's finally submitted (likely co-submission to Research Involvement and Engagement and also simultaneously published in the WikiJournal of Science). If it's the sort of thing you'd like to pitch in on, here's:

Obviously, feel free to share this with anyone else who might be interested in being involved! T.Shafee(evo&evo) (talk) 11:29, 17 February 2021 (UTC)

It might be easier to read the mapping here with links to the relevant properties/items. --- Jura 11:50, 17 February 2021 (UTC)

Good point. I've pasted some initial examples below (and collapsed the table above for ease of reading)
Example wikidata items created
| | Original plain text PDF | Converted to wiki plain text | Converted to wikidata item | DOI of resulting publication |
|---|---|---|---|---|
| ASPREE-XT | AdditionalFile5 | wikispore:STARDIT/Q98539361 | Q98539361 | 10.21203/rs.3.rs-54058/v1 |
| Rare condition genomics | AdditionalFile2 | wikispore:STARDIT/Q100403236 | Q100403236 | 10.21203/rs.3.rs-62242/v1 |
| what is a systematic review | - | wikispore:STARDIT/Q101116128 | Q101116128 | 10.15347/WJM/2020.005 |
T.Shafee(evo&evo) (talk) 22:36, 17 February 2021 (UTC)

Harvesting ORCID from Wikipedia

Has anyone done that? The authority control template has 1713168 transclusion(s) and we should have them all in WD IMO. https://templatecount.toolforge.org/index.php?lang=en&namespace=10&name=Authority_control#bottom WDYT? @pigsonthewing:--So9q (talk) 10:16, 25 February 2021 (UTC)

This was done in the early days of ORCID iD (P496); these days most instances of {{Authority control}} on Wikispecies pull their data from Wikidata, so I'm not sure we would get much, if anything, from Wikipedias. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:54, 25 February 2021 (UTC)
I wonder how many articles refer to an ORCID, but link to an item without. If there is also an item with an ORCID for that person, they should be merged. Is this what you're looking to resolve @So9q:? Trilotat (talk) 14:10, 1 March 2021 (UTC)
I don't understand what you mean, could you give an example?--So9q (talk) 21:48, 2 March 2021 (UTC)
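If ORCID iDs were harvested from template parameters, it would probably be worth validating each one before comparing it with ORCID iD (P496) values. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 scheme; the sketch below implements that check. The function names are illustrative, and the harvesting step itself is not shown.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the final check character from the first 15 digits (ISO 7064 MOD 11-2)."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string is 16 characters (hyphens ignored) with a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    body = compact[:15]
    if not (body.isascii() and body.isdigit()):
        return False
    return orcid_check_char(body) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A valid checksum only shows that the string is well-formed; it says nothing about whether the iD belongs to the person the article is about.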

Shared Citations

Many of the people interested in this WikiProject will be interested in a new technical proposal I am currently working on called "Shared Citations". It is not one of the items in the 2018 "WikiCite roadmap" document, it is not the same as WikiCite, but it is definitely building on the momentum and could be considered the 'spiritual successor'.
This is a proposal for the Wikimedia Foundation to create a database of Wikimedia Projects' citation records - linked to from any given footnote/reference; and associated improvements to cross-wiki monitoring and editing. It would be agnostic to the notability/reliable source rules of any given wiki (including WD). These two pillars would empower community-managed workflows and tools to:

  • Make citations easier for the editor,
  • more useful for the reader,
  • and more efficient for our architecture.

This would obviously have a strong relationship to the data structures built on WD (many of which were built by this wikiproject's efforts) and there's a specific subsection of the proposal page explaining how it relies upon, but is different to, Wikidata – how it is shaped as a deliberately "support" or "service" wiki, not a new competing Sister project.
Read more about this proposed project at Meta:WikiCite/Shared Citations.

-- LWyatt (WMF) (talk) 15:06, 1 March 2021 (UTC)

I don't think it would "solve" the books/works/edition problem - that is a problem that library catalogue system designers have been trying to solve for decades too... Nonetheless, it might provide some clarity because of the deliberately practical scope of the Shared Citations database. The intention, as proposed, is that it would have records for the specific work being referenced in a Wikimedia page (e.g. a Wikipedia footnote). That therefore implies that the Shared Citation record is at the level of "'Les Miserables', Penguin books, hardcover, Klingon translation (2019)". Other editions/translations/versions would receive their own separate Shared Citation records. There would not be a Shared Citations record for "Les Miserables" in general - that belongs in Wikidata. Wikidata could, if it wanted, also have records for the lower-order levels - down to "specific individual object" if it was a notable item (e.g. a famous medieval bible).
I'm not sure that there's really any work/edition problem to be solved, except for some inconsistencies that have been created in Wikidata. An "edition" item should be a particular distribution of a work, which can be used as a reference, by being able to rely on the presence of expected text and even page numbers. I.e., the connection between edition items and "shared citations" would be one-to-one. Work items, on the other hand, are used to connect edition items and link with external databases of works. The inconsistencies come from the creation of "work" items which should have been editions, including items for things like journal, magazine or newspaper articles, which refer to a particular distribution of a work (and creating a work item won't usually be necessary, except in articles which have been published multiple times and each is wanted in a citation). Ghouston (talk) 03:50, 3 March 2021 (UTC)
Shared Citations should be agnostic to whether a Wikidata item exists for the overarching work, or author, or publisher. But if there is one in Wikidata - so much the better. Hopefully, it would be able to help create 'redlists' of commonly re-cited works/authors that do not [yet] have wikidata items. I've implied something to that effect in this wireframe example. The benefits of being able to do more fancy queries (e.g. assessing the gender/century/birthplace of authors cited in all articles in a Wikipedia category) would come with the connectivity to Wikidata (not by Shared Citations itself).
I don't think I've quite accurately answered your specific question but hopefully I came close. LWyatt (WMF) (talk) 11:42, 2 March 2021 (UTC)
  • At some point in my involvement in Wikimedia projects I was very hopeful and enthusiastic at the idea that WD was going to be a repository of citations for Wikipedias (like the Commons is for multimedia files). I believed WD would help users with creating references and prevent needless duplication of the same references across different language versions of the same article. As time goes by and millions of items of scientific articles accumulate in WD the idea still seems to be far off. Kpjas (talk) 18:53, 1 March 2021 (UTC)
I've tried to design this proposal to deliberately stay agnostic about the debates about notability, scale, purpose of Wikidata. This is a proposal to create a database to support other Wikimedia projects - not to compete as a new sister project. Crucially, it has the very practical/limited (but still large/ambitious) scope of only dealing with things used in Wikimedia's "references". When WikimediaCommons was created it originally had the idea to be a service-project (merely to centralise media storage) but rapidly grew its scope to also host any freely-licensed multimedia. I do not believe "Shared Citations" would follow that path precisely because Wikidata already exists. This is a narrower scope than Wikidata with a very specific ontology required - supporting Wikimedia's references. Wikidata exists for everything more ambitious :-) LWyatt (WMF) (talk) 11:42, 2 March 2021 (UTC)
  • I'm pretty excited to see this moving forward as one thread of getting citation metadata more freely available. I suspect this whole topic starts to get into the topic of federated wikibases, which I don't purport to fully understand. However, I think we are at least getting to consensus that a structured item for every cited item is needed and I'm pretty agnostic as to whether that's technically on wikidata or a connected but separate knowledgebase, so long as the front-end user doesn't need to care. T.Shafee(evo&evo) (talk) 09:44, 3 March 2021 (UTC)

Diamond OA journals

A new list of diamond OA journals has just been published (many of which are not in DOAJ) as part of a project:

What do people think about inclusion of non-DOAJ listed diamond OA journals in Wikidata? It could end up being the most complete repository for such items. T.Shafee(evo&evo) (talk) 08:03, 10 March 2021 (UTC)

@Evolution and evolvability: I support the indexing in Wikidata of course and I could even support labeling green/gold/black/diamond/whatever as properties. I have a personal opinion that all the labeling and distinction about green etc. has been a really distracting error that takes up too much of the documentation and prevents people from having conversations about open access. It is hard enough to explain to newbies the difference between open and non-open, and my opinion is that it is too much of a communication challenge to overcome to also bring into the conversation new terms like diamond. The open access community is not very large and Wikipedia is a big part of it. As a stylistic decision for clarity, I wish in wiki we could stay on point just talking open/non-open, and tell the rest of the open access community to simplify the conversation after our lead. "Diamond" does not even have anything to do with ability to access content, and is just about the journal's history of business practice. Blue Rasberry (talk) 20:45, 11 March 2021 (UTC)

Bias in medicine in all journal indexing

Everyone here already expects this but check out this search for law journals (thanks Daniel Mietchen for the query):

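# Count indexed works per journal whose title mentions law, legal, or court.
SELECT DISTINCT ?item ?title (COUNT(?work) AS ?works)
WITH {
  # Subquery: items with an ISSN (P236) whose lower-cased title (P1476) matches.
  SELECT ?item ?title WHERE {
    ?item wdt:P236 ?issn ;
          wdt:P1476 ?title .
    FILTER REGEX (LCASE(?title), "(\\blaw|\\blegal|\\bcourt)")
  }
}
AS %journals

WHERE {
  INCLUDE %journals
  # Works published in (P1433) one of those journals.
  ?work wdt:P1433 ?item .
}
GROUP BY ?item ?title
ORDER BY DESC(?works)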

Here we see that among the top law journals indexed in Wikidata there are only a few examples outside the field of medicine. This is not just a Wikidata bias but a library cataloging bias due to medicine being much more open with metadata and law intentionally being much more closed with metadata. This came up because I was talking with some researchers at my university law school who were asking about representation of law in Wikipedia. I have no expectation that there is an easy solution to this, however, I am going to have a conversation through our librarians with HeinOnline (Q5699635) which is a major United States legal database for law research. I want their metadata. Blue Rasberry (talk) 20:53, 11 March 2021 (UTC)

  • I don't think we do "all journal indexing". So there isn't really a bias. People interested in some fields contribute, that's all. --- Jura 23:22, 11 March 2021 (UTC)
    • There is more to it. To import articles, metadata is needed. There is actually a bias towards physics, biology, compsci, medicine when looking at availability of open metadata. --SCIdude (talk) 16:07, 17 May 2021 (UTC)
@SCIdude, Jura1: SCIdude - right, I am talking about the availability of metadata to import. Jura, for the past several months I collaborated with a team of researchers looking at education journals and who wanted to make scholarly research available for local administrators at primary education schools in the English speaking world. It is an odd idea to think that some kindergarten teachers would want to explore academic articles if they were not researchers themselves, but I came to observe that enough such people exist. When such people are interested to contribute, they may not have a peer network and existing metadata to make editing as easy as it is in those fields SCIdude has named.
I expect that everyone here knows this. The new information that I have to share is that I have seen growing interest in other fields getting access to source metadata. By end of summer I expect to be able to share some data in Wikicite for education papers processed with machine learning to identify general main subjects, like "mathematics education" or other simple top level classifications. Blue Rasberry (talk) 20:17, 17 May 2021 (UTC)

Grant proposal to build a Visual Editor for Citoid Web Translators

Hi everyone! With Diegodlh we are presenting a grant proposal to build a visual editor for Citoid web translators. The goal of this visual editor is to make it easier for non-technical users to create and edit Citoid web translators to increase website coverage of this citation metadata retrieval service. The grant proposal is open for feedback and endorsement, and we are also seeking collaborators and volunteers, so if you are interested, please visit the grant proposal page! --Scann (talk) 12:25, 18 March 2021 (UTC)

Wikicite-Zotero plugin: simple entity creation vs author disambiguation

Hi, all! I'm developing a WikiCite plugin for Zotero that provides citation metadata support. One of the plugin's features is synchronization of "cites work" (P2860) relationships between a local Zotero library and Wikidata. For that, both citing and cited items must be entities in Wikidata (i.e., they must have QIDs). The plugin can fetch QIDs from Wikidata; it currently does so using unique identifiers (DOI and ISBN) and SPARQL queries, but the next pre-release will use Wikidata's reconciliation API to use the title as well.
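
A minimal sketch of the identifier-based lookup described above, assuming the DOI is stored on the Wikidata item under DOI (P356) in upper case, which is the usual convention there. The function name `build_doi_lookup_query` is hypothetical and is not taken from the plugin; the sketch only builds the query text and leaves sending it to the query service to the caller.

```python
import re


# Hypothetical helper, not the plugin's actual code: build a SPARQL query
# that looks up Wikidata items by DOI (P356).
def build_doi_lookup_query(doi: str, limit: int = 5) -> str:
    # Drop a leading resolver prefix such as https://doi.org/ or "doi:".
    bare = re.sub(
        r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi.strip(), flags=re.IGNORECASE
    )
    # Assumption: Wikidata stores DOIs in upper case, so normalise before matching.
    bare = bare.upper()
    # Escape backslashes and double quotes so the value is a valid SPARQL string literal.
    literal = bare.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE {\n"
        f'  ?item wdt:P356 "{literal}" .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


print(build_doi_lookup_query("https://doi.org/10.1000/xyz123"))
```

An empty result would then be the point at which a title-based reconciliation step, or the candidate list mentioned in the post, takes over.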

If no exact match is found in Wikidata, the plugin will offer a series of possible candidates to choose from. If no relevant candidates are found, the user will be offered to create a new entity in Wikidata. In principle, I wanted to have the plugin handle this, using the Wikidata API. However, I thought that having to deal with author disambiguation (to use P50 -author- instead of P2093 -author name string- where possible) would be too much work for now.

So, in the end, I inclined toward using zotkat's QuickStatements translator to output QS commands. However, I see QuickStatements doesn't provide a simple way to disambiguate author name string (P2093) statements. The user has to manually: (1) locate these commands, (2) disambiguate them, and (3) replace them with author (P50) statements.
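
The three manual steps above could be sketched roughly as follows. This assumes the translator emits tab-separated QuickStatements V1 lines of the form subject, property, value, followed by qualifier pairs; the exact output of zotkat's translator may differ. The function name and the name-to-QID mapping are hypothetical, and the mapping would have to come from the user's own disambiguation.

```python
# Hypothetical sketch: rewrite QuickStatements V1 lines (tab-separated) so that
# author name string (P2093) statements become author (P50) statements
# wherever the user has already picked a matching author item.
def disambiguate_authors(commands: str, name_to_qid: dict) -> str:
    out = []
    for line in commands.splitlines():
        fields = line.split("\t")
        # Assumed shape: subject, property, value, then qualifier pairs.
        if len(fields) >= 3 and fields[1] == "P2093":
            name = fields[2].strip('"')
            qid = name_to_qid.get(name)
            if qid is not None:
                # Keep existing qualifiers (e.g. series ordinal, P1545) and
                # record the original spelling as a P1932 qualifier.
                fields = [fields[0], "P50", qid, *fields[3:], "P1932", fields[2]]
        out.append("\t".join(fields))
    return "\n".join(out)


qs = 'LAST\tP2093\t"Jane Doe"\tP1545\t"1"\nLAST\tP2093\t"John Roe"\tP1545\t"2"'
# Placeholder QID for illustration only, not a real author item.
print(disambiguate_authors(qs, {"Jane Doe": "Q99999999999"}))
```

Names without a match are left as author name string statements, so the batch can still be submitted and resolved later.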

My questions, therefore, are:

  • My assumption was that enabling the user to easily create new entities from the plugin using author name string P2093 statements by default (where P50 statements might be possible) would be undesired. But maybe my assumption is wrong. Would it be OK to do this, as long as the reconciliation API is used to minimize chances that duplicates would be created?
  • If, otherwise, QuickStatements is preferred, am I missing a simpler way of replacing author name string with author statements? If not, does it make sense that one such simpler way would be worth developing?

Thank you! --Diegodlh (talk) 18:41, 29 March 2021 (UTC)

@Diegodlh: Are you familiar with Author Disambiguator? Reconciling author names to author items is quite tricky in general (even ORCIDs are not perfectly reliable - often the data doesn't include exactly *which* author has the ORCID and you still have to do a name match of some sort). So I would encourage you to proceed with just directly creating the items using author name string (P2093) as you suggest was the original plan. ArthurPSmith (talk) 19:08, 29 March 2021 (UTC)
@ArthurPSmith: Thank you very much for your reply! I'm familiar with the Author Disambiguator tool, yes. Great tool! User:Egon_Willighagen suggested trying to fetch ORCIDs from Crossref and use these to try and disambiguate author name strings into author entities. It was commented this is what the citation.js QuickStatements plugin does. I will consider this option as an intermediate option between no disambiguation at all, and full author disambiguation from the plugin. But in the meantime, creating the items using author name string (P2093) sounds great. Thanks! --Diegodlh (talk) 19:53, 29 March 2021 (UTC)
Yes, the Crossref data is probably a good source, I think they do link the ID to the specific author, at least for most papers where that relation is there. ArthurPSmith (talk) 12:31, 30 March 2021 (UTC)
I may use the QuickStatements export translator anyway, as an intermediary step, to take advantage that it is already taking care of part of the translation from Zotero item to Wikidata entity. I have therefore opened a thread in the QS translator repository asking whether fetching ORCIDs from Crossref (and reconciling published in (P1433) values) might be something to do within the translator itself, or afterwards by a tool converting QS commands into MediaWiki API requests. --Diegodlh (talk) 02:27, 31 March 2021 (UTC)
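
As a rough illustration of the Crossref idea discussed in this thread: the Crossref REST API returns a work's authors as a list under `message.author`, where each entry may carry an `ORCID` field given as a URL. The sketch below only parses such a record; the function name and the sample record are made up for the example, and matching the resulting ORCID iDs against author items (for instance via ORCID iD, P496) is left out.

```python
import re

# An ORCID iD is four hyphen-separated groups of four characters; the last may be X.
ORCID_RE = re.compile(r"(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


# Hypothetical helper: list (ordinal, name, ORCID iD or None) for each author
# in the "message" part of a Crossref works response.
def authors_with_orcid(message: dict) -> list:
    rows = []
    for ordinal, author in enumerate(message.get("author", []), start=1):
        parts = (author.get("given"), author.get("family"))
        # Fall back to "name", which Crossref uses for organisational authors.
        name = " ".join(p for p in parts if p) or author.get("name", "")
        match = ORCID_RE.search(author.get("ORCID", ""))
        rows.append((ordinal, name, match.group(1) if match else None))
    return rows


# Made-up sample record for illustration; not fetched from Crossref.
sample = {
    "author": [
        {"given": "Jane", "family": "Doe", "ORCID": "http://orcid.org/0000-0002-1825-0097"},
        {"given": "John", "family": "Roe"},
    ]
}
for row in authors_with_orcid(sample):
    print(row)
```

Because the ORCID is attached to a specific entry in the author list, the ordinal can be carried over as the series ordinal of the resulting statement.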

All upper case for titles?

I've run across some scientific article items where the Title is in ALL CAPS. I suspect this is how they were entered in whatever database a bot pulled them from. My question is: is there a preferred case for this property for scholarly articles? Should the case in Wikidata match the case used in the journal? Convention in many journals is capitalization of first letter and proper nouns, but this is certainly not consistent across time or journals. Thanks.--Friesen5000 (talk) 20:04, 27 April 2021 (UTC)

As far as possible the statements should match whatever the journal article currently shows (if it is available online), and not what other databases may say about it. ArthurPSmith (talk) 21:13, 27 April 2021 (UTC)
@Friesen5000: I agree with ArthurPSmith. A complicating factor, however, is that sometimes journals will use allcaps almost as a 'font' rather than really meaning that the letters are really all capitals. T.Shafee(evo&evo) (talk) 09:00, 28 April 2021 (UTC)
@Evolution and evolvability: Yes, I find the upper case 'font' for article titles is especially prevalent in older literature. Thank you and ArthurPSmith for your input. I'll leave them as I find them.--Friesen5000 (talk) 18:22, 28 April 2021 (UTC)

Wikipedia Citations in Wikidata

Several people in this group were interested in/endorsers of the WikiCite grant proposal: m:Wikicite/grant/Wikipedia Citations in Wikidata.
Today, the team at OpenCitations tweeted that the code for the project was now available here: https://github.com/opencitations/wcw

As described on that page:

"It's a collection of scripts that can be used to extract citations from the English Wikipedia to external bibliographic resources, and then to upload them to Wikidata. Our goal is to develop four software modules in Python (the codebase from now on) that can be easily reused by developers in the Wikidata community:
  • extractor a module to extract citation and bibliographic information from articles in the English Wikipedia;
  • converter a module to convert extracted information into a CSV-based format compliant with a shareable bibliographic data model, e.g., the OpenCitations Data Model;
  • enricher a module for reconciling bibliographic resources and people (obtained in step 2) with entities available in Wikidata via their persistent identifiers (primarily DOIs, QIDs, ORCIDs, VIAFs, then also persons, places and organisations if time allows);
  • pusher a module to disambiguate, deduplicate, and load citation and bibliographic data in Wikidata that reuses code already developed by the wikidata community as much as possible.
The repository folder structure reflects these same modules that constitute the entire workflow."

There are more details on GitHub. I will leave it to the team themselves to answer any questions etc., I just wanted to share :-) LWyatt (WMF) (talk) 13:25, 3 May 2021 (UTC)

Translated scientific articles

My name is Victor Venema and I am a member of a new initiative on translations of scientific articles/texts. We want to make it easier to find translations and (thus) make it more worthwhile to make them. Wikidata would be a good place to store such information, but there seem to be many ways to do so and I was wondering whether we could give people some guidance on the best way.

Jakob Voß made a query to find translations of scientific works for us. This query filters out scientific works by asking whether a publication has a DOI. (To see how adding a translation works I recently added a WMO report I wrote, which does not have a DOI.)

Everyone does it differently, but there seem to be two main methods. 1) One Item, which uses the "full work available at URL" with the language as qualifier. 2) Create one Item for the work and multiple Items for the translations as editions of the work. The former creates fewer Items, but may not always work: sometimes we only know, e.g., that the British Library has the translation, but do not have a URL. The latter may be how librarians like it, especially when it comes to books, which may have multiple editions anyway, but is more work.

Titles are sometimes translated in multiple descriptions; sometimes an Item has multiple titles (not always with language as qualifier, sometimes just between brackets in the title). Translated descriptions are a bit of a problem because multiple Items cannot have the same English description; I solved that for my report by adding the language in brackets to the description.

What do you think? VVenema (talk) 16:18, 13 April 2021 (UTC)

@VVenema: (2) is probably the best approach for the way Wikidata works; a translation may be published in a different journal and have a number of other differing attributes (DOI, page number for older cases, etc.) and presumably has at least the additional attribute of a translator, though that could be added as a qualifier for full work available at URL (P953) too. Otherwise if all but a small number of attributes of the translation are identical to the original then maybe (1) would be ok. But generally I would go with (2). ArthurPSmith (talk) 17:21, 13 April 2021 (UTC)
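
To make approach (2) concrete, here is a small sketch that generates QuickStatements V1 commands for a separate translation item. It assumes the translation is linked to the original with edition or translation of (P629) and also carries title (P1476), language of work or name (P407) and, where known, translator (P655). The function name is hypothetical and the property choice is one possible modelling rather than an agreed convention.

```python
# Hypothetical sketch: QuickStatements V1 commands that create one item for a
# translation and link it to the item of the original work.
def translation_commands(original_qid, title, lang_code, lang_qid, translator_qid=None):
    # Note: the title is inserted as-is, so it must not contain double quotes or tabs.
    lines = [
        "CREATE",
        f'LAST\tL{lang_code}\t"{title}"',          # label in the translation's language
        f'LAST\tP1476\t{lang_code}:"{title}"',     # title as monolingual text
        f"LAST\tP407\t{lang_qid}",                 # language of work or name
        f"LAST\tP629\t{original_qid}",             # edition or translation of
    ]
    if translator_qid:
        lines.append(f"LAST\tP655\t{translator_qid}")  # translator
    return "\n".join(lines)


# Q4115189 (the Wikidata Sandbox item) stands in for the original work;
# Q188 is German. The title is invented for the example.
print(translation_commands("Q4115189", "Beispieltitel", "de", "Q188"))
```

Attributes that differ from the original, such as a separate DOI or journal, would be added to the new item in the same way.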
@ArthurPSmith: I am new here and was waiting for more feedback, but I guess this was it. May I assume that more people have read this exchange than people who responded and that they did not respond because they mostly agreed? I am fine with suggestion (2), it was also how I in the end did it myself for the report I used to test how adding a translation works. When I did this test I was a bit overwhelmed by the large number of options on how to do this. Would it make sense to write some sort of guidance on how to add translations to Wikidata? It would have saved me quite some time exploring the options and picking one. VVenema (talk) 16:59, 26 April 2021 (UTC)
You'll get the most feedback with a post on Project chat. If you are looking for some sort of vote or consensus, there is an RFC process... but yes I am sure at least a few other interested people track this page and would have responded if they thought they had something to add. Adding a piece of documentation on what you are doing and guidance would be great, and this is the place for it - you can see all the sub-pages of this WikiProject on the main project page here. ArthurPSmith (talk) 17:34, 26 April 2021 (UTC)
  • @VVenema: Scientific articles generally just have one item (not a work and an edition item as some try to do for books). WMF struggles already with the current load, so we can't really maintain a second set of 30 million items for the same articles on Wikidata.
For translations of an article, it would probably be good to have a separate item for each translation and link that to the article item. Maybe we should have a new pair of properties "has translation" and "translation of" to link them together. The more general P:P747 and P:P144 have a different primary focus.
As it's fairly common for articles to have an abstract in English, I don't think abstracts only should lead to the creation of new items. The same for title translations. As you probably noticed, both aspects are already handled (partly) on the existing items.
Essentially, this would lead to the creation of a fairly short item about the translation linking to a more general one about the article that was translated (avoid repeating any information present there). --- Jura 14:42, 3 May 2021 (UTC)