Wikidata software profiling hackathon, June 6&8

Those interested in software + Wikidata are invited to the Scholia Hackathon 6&8 June 2022.

WD:Scholia is a Wikidata front end which does scholarly profiling, and is best known as a tool for browsing the WikiCite collection of WD:WikiProject Source Metadata.

An example Scholia profile for the software Stata (Q1204300) is

Anyone interested in examining any part of Wikidata connecting to software is welcome. Bluerasberry (talk) 20:40, 19 May 2022 (UTC)

Suggestions on adding affiliation string to author names

Hi there. I used my automated tool to create a scholarly article item with affiliation strings from the ADS database added to each author. Link is here: https://www.wikidata.org/wiki/Q113322652 Would adding an affiliation string to author or author name string statements be useful? I'd like to hear your advice and suggestions. Feliciss (talk) 12:29, 28 July 2022 (UTC)

@Feliciss: Yes, adding these would be useful, but I would prefer they be exactly as in the article, not parsed/edited. For your example, the Stanford affiliation in the article is listed as "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A.", so I would think that should be the string used here? ArthurPSmith (talk) 20:36, 28 July 2022 (UTC)
@ArthurPSmith Can you link to where the string "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A." comes from? In my case, it's exactly the same as what we see in the article. I get the whole affiliation strings for each author from https://ui.adsabs.harvard.edu/abs/1982CMAME..32..199B/abstract. You can view the affiliations of each author by clicking "Show affiliations". Feliciss (talk) 07:28, 29 July 2022 (UTC)
I see. It's from the DOI in the article. Since my bot only gets affiliation strings from ADS, it's not possible (or does not make sense) to get the affiliation strings a second time from the DOI in the article. Feliciss (talk) 07:37, 29 July 2022 (UTC)
Ok, you are adding the reference to the ADS bibcode there so I guess that's fine. Obviously ADS is doing some parsing of affiliations but I think they're pretty reliable about that so this is ok. ArthurPSmith (talk) 17:11, 29 July 2022 (UTC)
Another example: https://www.wikidata.org/wiki/Q113380669
I think ADS is parsing some but not all affiliation strings. Feliciss (talk) 08:27, 2 August 2022 (UTC)

Who is the author "JC Shakespeare"?

A cautionary tale: https://shkspr.mobi/blog/2022/08/who-is-the-author-jc-shakespeare/ - Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:28, 20 August 2022 (UTC)

Reports published by policy and research organisations, can they be considered generally reliable?

also posted at w:WP:Village pump (policy) since it’s relevant there too

I’m looking for opinions on institutional policy and research reports in general as reliable sources as part of the WikiProject Policy Reports project. The example source types on WP:RS (scholarship, news, vendor etc) don’t quite cover our area of interest: reports, conference papers, discussion and briefing papers, strategies, policies and other docs (sometimes called grey literature). These are generally self-published by organisations (e.g. the WHO publishes WHO reports) but it’s obviously not the same as someone’s self-published blog or book.

I realise that for specific citations in WP it’s case-by-case. However, we’re looking for some guidance on what principles or criteria we could use to prioritise/sort organisations into 1) Generally reliable / 2) unclear / 3) generally unreliable since these sorts of items are likely often useful as potential WP sources in addition to books/journals/newspapers. As part of the project we’re looking to prioritise which organisations’ reports are most useful to upload metadata to Wikidata about. If general principles aren’t really possible, it’d be helpful to have some examples to calibrate on e.g. these five organisations:

  • The Australia Institute is an independent public policy think tank based in Canberra, Australia that carries out research on a broad range of economic, social, and environmental issues (APO-listed reports)

Thanks in advance for the feedback on these! We’ve >70 publishing organisations that we’re focusing on so these will help us calibrate which sorts of organisations are worth prioritising when uploading metadata to Wikidata. If anyone has an interest in the full list, please let me know and I can loop you in on the full project. Brigid vW (talk) 07:06, 28 September 2022 (UTC)

Notability of the organization might be a useful guide, for example the number of sitelinks to language wikipedias for the organization? ArthurPSmith (talk) 21:19, 28 September 2022 (UTC)

Some scholarly article (Q13442814) statistics

As of 4th Apr 2023

Kpjas (talk) 20:19, 4 April 2023 (UTC)

Interesting observations. I think that "Articles without main subject" is especially important because:
  • "Main subject" is the main reason to have articles in WD, since WD is not an authoritative article source, and doesn't have Abstracts.
  • WD is flooded with articles about X but the item X itself is missing. Example:
  • From your statistics, one might think that 1/3 of the articles have main subject: not so bad, right? However, an article should typically have at least 5-10 subjects, and there's no assessment whether those that have at least one, have adequate subjects
Vladimir Alexiev (talk) 07:01, 30 April 2023 (UTC)
@Vladimir Alexiev AFAIK the options for adding main subject (P921) to scientific article items are:
  • by hand -- rather impractical
  • with a batch tool such as QuickStatements (QS) -- carefully select scientific article items whose titles contain phrases that we assume would make an adequate main subject, e.g. "BRCA1 mutation" or "Huntington's disease"
  • provided by other tools or bots like SourceMD
  • Pubmed metadata contain keywords and MeSH -- why not pull these (copyright issue?)
  • lo and behold ChatGPT is quite good at summarizing, perhaps also at providing main subjects for scientific articles, huh?
Kpjas (talk) 19:38, 30 April 2023 (UTC)
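To make the title-phrase option above concrete, here is a minimal Python sketch. It assumes you already have a list of article items with their titles and a hand-curated mapping from phrase to subject item; it only prints QuickStatements V1 lines (tab-separated item, property, value) for review and does not edit Wikidata. All item IDs and titles below are made-up placeholders, not real lookups, and `quickstatements_for_titles` is a hypothetical helper name.

```python
import re


def quickstatements_for_titles(titles, phrase_to_subject):
    """Build QuickStatements V1 lines adding main subject (P921).

    titles: dict mapping article item ID -> article title
    phrase_to_subject: dict mapping a title phrase -> subject item ID
    """
    lines = []
    for article_id, title in titles.items():
        for phrase, subject_id in phrase_to_subject.items():
            # Match the whole phrase, case-insensitively, and not as
            # part of a longer word.
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, title, flags=re.IGNORECASE):
                lines.append(f"{article_id}\tP921\t{subject_id}")
    return lines


# Placeholder IDs and invented titles, for illustration only.
titles = {
    "Q_ARTICLE_1": "A novel BRCA1 mutation in familial breast cancer",
    "Q_ARTICLE_2": "Sleep disturbance in Huntington's disease",
    "Q_ARTICLE_3": "Semantic Web in Healthcare",
}
phrase_to_subject = {
    "BRCA1 mutation": "Q_SUBJECT_BRCA1",
    "Huntington's disease": "Q_SUBJECT_HD",
}

statements = quickstatements_for_titles(titles, phrase_to_subject)
print("\n".join(statements))
```

A phrase match in a title is only a heuristic, so the generated lines would still need a human check before being pasted into QuickStatements.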
I asked GPT-4:
classify these PubMed papers with MESH descriptors and qualifiers.
(I provided title and abstract, but showing only title below)
1. Semantic Web in Healthcare: A Systematic Literature Review of Application, Research Gap, and Future Research Avenues
  • MESH Descriptors:
Semantic Web
Health Care
Systematic Reviews
Electronic Health Records
Medical Informatics
Interoperability
Decision Support Systems, Clinical
Information Management
Medical Informatics Applications
  • MESH Qualifiers:
utilization
trends
organization & administration
methods
standards
2. Semantic Web Technologies for Sharing Clinical Information in Health Care Systems
  • MESH Descriptors:
Semantic Web
Health Care
Medical Informatics
Interoperability
Information Management
Ontologies (Information Science)
Electronic Health Records
Clinical Information Systems
Information Storage and Retrieval
  • MESH Qualifiers:
methods
trends
organization & administration
standards
utilization
-- Vladimir Alexiev (talk) 07:17, 1 May 2023 (UTC)
I've created a subpage for statistics at /Statistics. I'm gonna make a little bot to keep this updated, will reply with the repository when I do this. I'll add the metrics you mention, let me know if you have other ideas :) Carlinmack (talk) 11:06, 16 June 2023 (UTC)
@Kpjas @Vladimir Alexiev I've created a bot User:UpdateWikiprojectBot to update these statistics weekly and added a summary like you mentioned above to the project page Wikidata:WikiProject_Source_MetaData#Statistics. Let me know if you have any suggestions or improvements Carlinmack (talk) 17:09, 30 June 2023 (UTC)
Thanks a lot for this great resource! Re https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData/Statistics#External_Identifiers
  • Please add Arnet Miner publication ID (P7292), the main Chinese science knowledge graph. Back in Jan they had 332,899,739 PUBLICATIONS
  • Where do you get "Total in source" from? I wonder about DOI=318,158,353 because the main DOI source CrossRef has 147,086,458
--Vladimir Alexiev (talk) 07:26, 3 July 2023 (UTC)
I use the following query to get the list of identifiers: https://w.wiki/6xjm
  1. I've added Wikidata property for items about scholarly articles (Q29548341) to Arnet Miner publication ID (P7292) so it will be added to the list when I run the script next
  2. I use number of records (P4876) to fill in the "Total in source" and if you go to DOI (P356) you can see the reference URL for this number :) I'll add a link to this property to the table
Carlinmack (talk) 17:36, 4 July 2023 (UTC)

Source reliability + assessment

I recently drafted Wikidata:WikiProject Source Reliability to capture efforts to annotate source entities with information related to their reliability. I'd love feedback on how to make that a useful complement to SourceMD. Sj (talk) 17:07, 25 April 2023 (UTC)

Please see property proposal: Wikidata:Property proposal/assessed source reliability. Harej (talk) 03:01, 26 April 2023 (UTC)
Nice, that seems the minimal property that could capture a range of different evaluations. Sj (talk) 16:38, 28 April 2023 (UTC)

Finalize a rename to WikiProject WikiCite

Move Wikidata:WikiProject Source MetaData -> Wikidata:WikiProject WikiCite

Previously discussed and approved in 2017 with mostly approvals, some abstention, and no opposition.

I called for comment, got it, then neglected to execute the move.

I am posting again to express intent to do the rename soon based on past approval. I do not expect objections, and in my view, over the years the scope of activity here has always overlapped with WikiCite activities.

Comments from anyone? Bluerasberry (talk) 17:59, 5 May 2023 (UTC)

@Bluerasberry, ArthurPSmith, Sj, Daniel Mietchen, Kpjas, Olea: WikiProject Source MetaData has now been renamed to WikiCite. Harej (talk) 19:29, 8 August 2023 (UTC)

ok "my" page was moved while working on it. I take note.--Alessandro Marchetti (WMCH) (talk) 20:16, 8 August 2023 (UTC)

Other research resources

There is a concept of "research resources", which includes anything that a research project uses. This is relevant here because we index scholarly papers, and those papers name the resources which research projects use.

This month a recommendation came out from Research Data Alliance which instructs on making persistent identifiers for scientific instruments.

This paper is one of many. This paper alone may not be so remarkable, but this is part of a trend for identifying all resources. Eventually we should delineate all these resources in Wikidata, connected to this project. Bluerasberry (talk) 16:03, 4 August 2023 (UTC)

Deletion of authors of notable publications

I report this discussion, which is relevant for the project since it regards the possible deletion of parts of bibliographic data. Your comments are welcome.

I also report that I am unable to use {{Ping project}} with your two lists of participants, i.e. Wikidata:WikiCite/Participants and Wikidata:WikiCite/More/Participants, because the template doesn't work with pages not containing the word WikiProject. So, after the project was moved from Wikidata:WikiProject Source Metadata to the present Wikidata:WikiCite, it has become impossible to use ping project for its members. If possible, I would suggest finding some solution for this issue. --Epìdosis 00:25, 1 October 2023 (UTC)

WikiCite in continued limbo

I have no update but someone asked me again how WikiCite is. The general situation is that from 2017 Wikidata developers have said that WikiCite is too big for Wikidata. The effect is that while WikiCite development continues, contributors have slowed and limited their development of WikiCite content. Another effect is that because WikiCite is limited, anytime anyone proposes a project which is comparable in size to WikiCite, that project is halted immediately due to the size problem. Presumably Wikidata would have a much larger community with many more projects if we could manage the data hosting and querying.

WikiCite content is the biggest single component of Wikidata, and Wikidata is at its limit for database architecture due to hosting lots of content. There have been longstanding proposals to fork WikiCite content away from the main Wikidata. The standing counterargument is that WikiCite is only the beginning and if invited, lots of people would upload much more content to Wikidata about many things. By forking off WikiCite, we also establish the precedent that others should plan to fork off their large collections, and then we commit to federating the various interconnected Wikibase instances hosting all these. To my knowledge the current state of things is that no one representing Wikidata or the Wikimedia Foundation has any concise or comprehensible statement describing future plans, but the developers would like the community to be aware that all this WikiCite stuff is to remain in perpetual limbo until such time as a decision is made. This position began in about 2017. Here is the documentation.

If anyone has anything else to share then please do. Bluerasberry (talk) 18:55, 3 October 2023 (UTC)

@Bluerasberry: A frank summary. Federation has been discussed off and on for a long time; maybe we can talk here about what is missing in the current wikibase ecosystem that would be needed to make it work smoothly? Mastodon is an example of a federated social media ecosystem that seems to be working fairly well. Of course the web as a whole has been that way from the start. What would be needed to extend Wikidata with, say, a WikiCite wikibase and make it all work smoothly? ArthurPSmith (talk) 19:20, 3 October 2023 (UTC)
Maybe it's time to start thinking about what a federated WikiCite should look like. Could the coming Data Modelling Days 2023 be the moment to start writing the WikiCite model for its own Wikibase instance? —Ismael Olea (talk) 09:06, 4 October 2023 (UTC)
So I think the key to this has to be simple contextual namespaces somehow. Within Wikidata Q\d+ (and P... etc) has a clear meaning, but it's referring only to the entity on wikidata.org. How do we name things to simply relate them across wikibases? If I understand it correctly, to this point federation implementations within wikibase only essentially copy stuff from one wikibase to another (the collection of properties, for instance). What we need is to be able to refer to entities across wikibases without creating new entities within the referring wikibase. The mastodon-style approach would suggest attaching a locator symbol - Q5@wikidata for instance. The local wikibase would need to fetch remote entities as needed and probably cache them somehow for local use. Suppose we have two wikibases, wikidata as it is and a "wikicite" wikibase. So we could have Qxxxx@wikidata items and Qxxxx@wikicite items, Pxxxx@wikidata and Pxxxx@wikicite properties, etc. This is obviously relatively easily extensible, but then how do the @namespaces get translated into actual API endpoints etc? Would simply using the DNS names - wikidata.org etc. be ok? ArthurPSmith (talk) 00:43, 5 October 2023 (UTC)
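As a rough illustration of the Q5@wikidata idea above, here is a minimal Python sketch of how an @namespace could be translated into an entity URI through a small registry. This is only one possible design, not an existing Wikibase feature: the registry, the `resolve` function and the "wikicite" base URL are all hypothetical; only the Wikidata entity URI prefix is real.

```python
import re

# Hypothetical registry mapping an @namespace to an entity URI prefix.
# The "wikicite" entry is a placeholder; no such Wikibase is assumed to exist.
WIKIBASE_REGISTRY = {
    "wikidata": "http://www.wikidata.org/entity/",
    "wikicite": "https://wikicite.example.org/entity/",
}

# An item (Q...) or property (P...) ID, then "@", then a namespace name.
FEDERATED_ID = re.compile(r"^([QP]\d+)@([a-z0-9.-]+)$")


def resolve(federated_id, registry=WIKIBASE_REGISTRY):
    """Translate e.g. 'Q5@wikidata' into a full entity URI."""
    match = FEDERATED_ID.match(federated_id)
    if match is None:
        raise ValueError(f"not a federated entity ID: {federated_id!r}")
    entity_id, namespace = match.groups()
    if namespace not in registry:
        raise KeyError(f"unknown wikibase namespace: {namespace!r}")
    return registry[namespace] + entity_id


print(resolve("Q5@wikidata"))
print(resolve("P921@wikicite"))
```

Using DNS names directly as the namespace (Q5@wikidata.org) would remove the need for a registry, at the cost of longer identifiers; the pattern above accepts either form, but the registry would then need an entry per host name.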
Sadly I don't have the knowledge to answer. But I can see a parallel case with Structured Data on Commons, which, AFAIK, runs its own Wikibase instance. Would be great if you could lead an activity proposal for exploring and discussing this case in the Data Days: Wikidata talk:Events/Data Modelling Days 2023. —Ismael Olea (talk) 11:59, 5 October 2023 (UTC)
Huh - thanks, I somehow missed that was happening. Will check into it. ArthurPSmith (talk) 17:10, 5 October 2023 (UTC)

If Wikicite is going to be split I was under the impression that this would be only on the triple store level, so you would still have the data in one Wikibase (Wikidata) but you would split Q items based on whether they were scholarly article (Q13442814) or not? — Finn Årup Nielsen (fnielsen) (talk) 16:20, 18 October 2023 (UTC)

Pretty much, yes. Federation for the most part only affects the SPARQL queries. There is no short-to-medium-term need to migrate the citation data off of Wikidata, so most tools that don't do SPARQL should still work. Infrastruktur (talk) 17:52, 18 October 2023 (UTC)
Yes I just saw this in the plan published the other day - that was new to me though, previously it sounded like they wanted to split the scholarly articles off into a completely separate wikibase. Maybe I just misunderstood. Anyway, I guess we'll get a feel for this in early 2024 when this starts being tested? ArthurPSmith (talk) 18:51, 19 October 2023 (UTC)
The short term plan per the WMDE is to split the graph. Scholarly articles are atm half of Wikidata. This won't affect Wikicite much in the short term. Long term maybe the wikicite stuff needs to be split off, but we are nowhere near close to that, and it would be stupid to do this prematurely without a plan since a lot of labor is involved. I do indeed run into certain queries on Wikidata that time out, but none of that strikes me as critical. Yes, Wikidata is slow for some things, but only the query side of things is really in trouble at this point in time. Infrastruktur (talk) 19:23, 19 October 2023 (UTC)

Grants data

Many papers and other research resources which WikiCite indexes are grant funded. We could make Wikidata items for grants and interlink papers, software, data, people, institutions, and funders. We have some pilots in this, but not any showcased complete datasets.

I want to share some ongoing news and trends.

  • It is a small thing, but Open Grants (Q109929664) is convening some events in the next month as described at https://www.ogrants.org/upcoming_events . This organization is actually trying to get published grants, and they have a collection of about 200 of them. Here in Wikidata we are mostly interested in grant metadata, but since this is only 200 grants and since that makes a complete corpus, it would be interesting to profile this collection. I am going to their meetings.
  • Open Research Funders Group (Q45759536) is a consortium of most of the big United States foundations along with some in the UK. They have representatives from all these foundations meeting in this group to pilot standardization and open publishing of grant metadata. They brokered a relationship with Crossref, and the plan is to assign DOIs to all grants so that grant citations can be generated. When Crossref has this index, then we will be able to interlink grants like we do for papers.
  • I am personally an investigator on a small project called SEEKCommons (Q118147033) where we are trying to raise access and visibility of open environmental resources of interest to community groups. A strategy that we are exploring in this is identifying about 100 National Science Foundation grants where the commitment was producing such resources, then matching each grant to all those resources indexed in Wikidata. The current state of things - and this may surprise non-scientists and outsiders - is that for any given grant, it is hard for anyone to identify what outcome it had. Where we want to take things is that anyone can see whatever results come of sponsored research.

There is not quite a WikiProject which matches the needs for grants, but I am thinking of developing Wikidata:WikiProject Award to include grants. If anyone has thoughts then share. Bluerasberry (talk) 14:50, 20 October 2023 (UTC)

How to ping the project?

I tried using the old and new name, both fail it seems, see https://www.wikidata.org/w/index.php?title=Wikidata%3ARequests_for_permissions%2FBot%2FLccnBot&diff=2029017611&oldid=2029012620 So9q (talk) 08:37, 13 December 2023 (UTC)

Clarifying guidelines for "Affiliation" and "Employer" properties

FYI I'm currently raising some doubts regarding the current utilization of affiliation (P1416) and employer (P108) (along with some related properties) in the discussion here. I appreciate any feedback in advance. Alexmar983 (talk) 16:17, 17 February 2024 (UTC)

I second that. Kpjas (talk) 08:30, 25 February 2024 (UTC)

PagesBot

There is currently an RFP for a bot that reads page(s) statements on items of type scholarly article, infers number of pages statements from them, and adds those with a reference.

Please see: Wikidata:Requests for permissions/Bot/PagesBot

Any comments would be much appreciated! Cheers, Aluxosm (talk) 12:34, 17 March 2024 (UTC)
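For readers wondering what the inference step described above might look like, here is a minimal Python sketch. It is not PagesBot's actual code; `infer_number_of_pages` is a hypothetical helper that assumes the page(s) value is a simple "first-last" range of plain numbers and gives up on anything else.

```python
import re

# "first-last" with a hyphen or an en dash between two plain numbers.
PAGE_RANGE = re.compile(r"^\s*(\d+)\s*[-\u2013]\s*(\d+)\s*$")


def infer_number_of_pages(pages):
    """Return the page count for a range like '199-259', else None."""
    match = PAGE_RANGE.match(pages)
    if match is None:
        return None  # single page, article number, roman numerals, etc.
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        return None  # e.g. an abbreviated range such as '123-9'
    return last - first + 1


for value in ["199-259", "12\u201315", "e1234", "123-9"]:
    print(repr(value), "->", infer_number_of_pages(value))
```

Returning None for anything ambiguous keeps such a bot from adding wrong counts for article numbers or abbreviated ranges, which would need their own handling.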

April 2024 Wikidata content pie chart

 
April 2024 Wikidata statistics

We have an updated pie 🥧 chart for Wikidata:Statistics! Wow the WikiCite slice sure is large and tasty.

Thanks user:VIGNERON for generating this. Bluerasberry (talk) 16:44, 8 April 2024 (UTC)

Wikidata Query Service graph split for WikiCite

I am posting to share early news and to recruit anyone to join future unplanned discussions on the split of Wikidata into two pieces, WikiCite and everything else. There is no other published news, but discussions like this are chaotic.

See Wikidata_talk:WikiCite#WikiCite_in_continued_limbo for background. Post here if you want to join discussions or get news. This is a big deal, but I hardly have any more published information to share. I am entering this as a heavy contributor to WikiCite and Wikidata:Scholia, and while there are people who have more information, this is a discussion where people are exchanging information and I do not see anyone with answers to many basic questions. One basic question is "Why?", and the answer to that is that queries to this content are breaking multiple Wikidata:WikiProject Limits of Wikidata. Something is not sustainable; determining what the problem is and what to do in response is a challenge. Split of the graph is a proposal in exploration. Bluerasberry (talk) 16:54, 8 April 2024 (UTC)

Would love to stay in the loop! Recently I was talking about the WQS performance, and I thought I remembered a replacement (query service) being worked on / looked at. I'm not managing to find much discussion about that, however. It looks like this ticket might be a good starting point for a deep dive? My interest in the subject is still mostly as a casual contributor. --Azertus (talk) 12:57, 10 April 2024 (UTC)
Do you know if those queries timing out are running into fundamental limits of the back-end or are they related to (mostly arbitrary) performance limits set to be able to keep a public service running well? Azertus (talk) 13:01, 10 April 2024 (UTC)
@Azertus: I believe there will be a significant announcement on this soon, but one of the relevant pages on the current plan is Wikidata:SPARQL query service/WDQS graph split which also has some relevant subpages. ArthurPSmith (talk) 15:49, 11 April 2024 (UTC)

Best paper award

Some scientific conferences have (multiple) best paper award(s). How should we model this? @NandanaM: has added best paper information to The 22nd International Semantic Web Conference (Q119153957) and I currently have a Synia panel with this information: https://synia.toolforge.org/#scientificeventseries/Q6053150 It is using the winner (P1346) property and the qualifier object has role (P3831). Does anyone have feedback on this? — Finn Årup Nielsen (fnielsen) (talk) 16:41, 9 April 2024 (UTC)
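For anyone who wants to inspect this modelling, here is a minimal Python sketch that builds a Wikidata Query Service SPARQL query for it. It is not the query behind the Synia panel; it assumes the winner (P1346) statements sit on the conference item and that the award is given in the object has role (P3831) qualifier, as described above. `best_paper_query` is a hypothetical helper name, and the sketch only builds the query text without sending it anywhere.

```python
def best_paper_query(event_qid):
    """Build a SPARQL query listing winner (P1346) values of an event,
    with the optional 'object has role' (P3831) qualifier of each statement."""
    if not (event_qid.startswith("Q") and event_qid[1:].isdigit()):
        raise ValueError(f"not an item ID: {event_qid!r}")
    return f"""
SELECT ?winner ?winnerLabel ?role ?roleLabel WHERE {{
  wd:{event_qid} p:P1346 ?statement .
  ?statement ps:P1346 ?winner .
  OPTIONAL {{ ?statement pq:P3831 ?role . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


query = best_paper_query("Q119153957")
print(query)
```

The printed query can be pasted into the query service to check which statements carry the qualifier and which do not.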

Weird item about an article that has the title of a book, what is it?

The item Hidden harmonies: the lives and times of the Pythagorean theorem (Q114012895) is classified as a scientific article but has the title of a book I want to cite, and it's not clear whether it's actually about the book or a review of the book. Can someone sort that out? Is there a need to create an item for the book? author  TomT0m / talk page 08:42, 16 April 2024 (UTC)

@TomT0m: If you follow the DOI or publisher links you'll see it is almost certainly a review, not the actual book. ArthurPSmith (talk) 14:10, 16 April 2024 (UTC)