Wikidata talk:WikiCite/Archive 4
This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Detection of deprecated DOIs
Does the sourcemd tool detect DOIs that are marked as deprecated, e.g. Q51394575, and avoid creating a new item in that sort of case? In the example given, the deprecated DOI actually redirects to the other DOI. Thanks. Trilotat (talk) 17:32, 26 February 2019 (UTC)
- @Trilotat: Maybe the easiest thing to do is just to try on an example - if the tool creates a new item by mistake I am happy to delete it. − Pintoch (talk) 12:29, 28 February 2019 (UTC)
- @Pintoch: I tested at [1] and the tool created a new (duplicate) item; Q61913611. Drat. Pintoch, would you delete Q61913611? I can post at the "Request for Deletion" page, if necessary. Trilotat (talk) 16:07, 28 February 2019 (UTC)
- deleted − Pintoch (talk) 16:12, 28 February 2019 (UTC)
I think SourceMD should NOT create a duplicate item if the DOI exists in deprecated form. I've created a proposal on Magnus Manske's BitBucket to resolve the issue at [2]. Do you think I've stated the issue effectively there? Trilotat (talk) 13:13, 1 March 2019 (UTC)
Duplicated bad DOIs - Wiley journals
There are a substantial number of duplicate scholarly article (Q13442814) items that are being created because SourceMD is not adding the correct DOI and thus does not subsequently filter the DOI if it is added again in another batch. This query of duplicate DOIs shows the scale of the problem. Some of the DOIs conflate several articles, others just have multiple items for a single article. Either way, this needs to be sorted out.
Is it better to merge or delete the duplicates? If we delete items there is a risk of losing author (P50) and other claims that have been added to duplicate items. Merging, on the other hand, is likely to add author name string (P2093) statements that have already been removed, or that exist with slight variations in the name formatting.
The incorrect DOIs shouldn't be too difficult to fix but it would be sensible to find a solution to the problem in SourceMD that is creating these duplicate items. Simon Cobb (User:Sic19 ; talk page) 14:16, 16 March 2019 (UTC)
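For reference, a duplicate-DOI query along the lines Simon links above can be sketched like this (an illustration, not necessarily the exact query he used): group scholarly article (Q13442814) items by their DOI (P356) value and keep only DOIs that occur on more than one item.

```sparql
SELECT ?doi (COUNT(?item) AS ?items) WHERE {
  ?item wdt:P31 wd:Q13442814 ;  # scholarly article
        wdt:P356 ?doi .         # DOI
}
GROUP BY ?doi
HAVING (COUNT(?item) > 1)
ORDER BY DESC(?items)
```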
- or by limiting it to Geology (Q5535339) journal, you get some with almost 400 appearances of the faulty DOI. I've been trying to fix that journal's articles, but so many versions to merge... Trilotat (talk) 20:48, 16 March 2019 (UTC)
- This Quick Statements batch fixes 1840 of the bad Geology (Q5535339) DOIs. I need to make some extra checks before running the next batch to get the right DOI for things like replies and comments, which have identical or very similar titles to the original article. Simon Cobb (User:Sic19 ; talk page) 22:41, 17 March 2019 (UTC)
- See also WD:AN#Problematic_batch_creation_by_GerardM. Please stop using SourceMD until the problem is resolved. − Pintoch (talk) 08:05, 18 March 2019 (UTC)
- @Pintoch: This problem concerns DOIs that include <...> and SourceMD omitting that section. For example, SourceMD added the DOI
10.1002/(SICI)1096-8628(19960726)67:4 3.0.CO;2-K
to Systematic screening for mutations in the 5′-regulatory region of the human dopamine D1 receptor (DRD1) gene in patients with schizophrenia and bipolar affective disorder (Q57655680) - the correct DOI is 10.1002/(SICI)1096-8628(19960726)67:4<424::AID-AJMG21>3.0.CO;2-K
. From what I have seen, this is only a problem with SourceMD and not the fault of a specific editor. Simon Cobb (User:Sic19 ; talk page) 18:02, 18 March 2019 (UTC)
- @Sic19: That's an awesome QS query. I have been plugging away one issue at a time trying to fix each DOI individually. I didn't know that a query could do so many so quickly! Thanks! Edit: Can I see the text of the QS Query so I can do other duplicated DOIs for Geology journal? Trilotat (talk) 01:30, 19 March 2019 (UTC)
- Yes, of course. The first line will delete the existing DOI and the second line is adding the correct DOI:
- -Q59262974 P356 "10.1130/0091-7613(1976)4 2.0.CO;2"
- Q59262974 P356 "10.1130/0091-7613(1976)4<798:GRPEAR>2.0.CO;2"
- I used the Crossref API for a title & journal title query to retrieve the DOIs, e.g.:
https://api.crossref.org/works?query.title=Greenland's+rapid+postglacial+emergence:+A+result+of+ice-water+gravitational+attraction:+Comment+and+reply&query.container-title=Geology&rows=5
(rows is the number of results). Simon Cobb (User:Sic19 ; talk page) 07:18, 19 March 2019 (UTC)
- @Sic19: I'm not sure what that is, but it appears only as a long line of text on my screen. I'm not a member (not in academic circles, sadly), so maybe I'm unable to use the site. Perhaps one DOI at a time is the best I can do in this campaign. Drat. Trilotat (talk) 16:22, 19 March 2019 (UTC)
- @Trilotat: this is an API, in other words it is meant to be read by machines directly. So it is normal that it does not look great to you. You do not need to be a member of anything to use it though. − Pintoch (talk) 16:00, 24 March 2019 (UTC)
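To make the "meant to be read by machines" point concrete, here is a minimal Python sketch of the Crossref query above. The endpoint and the parameter names (query.title, query.container-title, rows) are the ones shown in the example URL; the helper function itself is illustrative.

```python
# Build a Crossref /works query URL for a title + journal-title search,
# as in the example above. Fetching and parsing the JSON response is
# left to urllib.request or the requests library.
from urllib.parse import urlencode

def crossref_works_url(title, container_title, rows=5):
    """Return a Crossref works-search URL; rows is the number of results."""
    params = {
        "query.title": title,
        "query.container-title": container_title,
        "rows": rows,
    }
    return "https://api.crossref.org/works?" + urlencode(params)

url = crossref_works_url(
    "Greenland's rapid postglacial emergence: A result of "
    "ice-water gravitational attraction: Comment and reply",
    "Geology",
)
# The DOI of the best match would then be found in the response at
# message["items"][0]["DOI"].
```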
- Thanks Pintoch. @Trilotat: Sorry for the delay, this batch fixes another 1397 Geology DOIs. If you spot any problems in these fixes, I would be grateful if you could send me the details so I can apply stricter criteria when matching. Best, Simon Cobb (User:Sic19 ; talk page) 10:06, 6 April 2019 (UTC)
@Sic19: I've been plugging away at the individual DOI repairs for Geology. This is what's left here. I'm sure there are lots of duplicate titles included in the list of bad DOI. I'll continue to clean that up. Incidentally, I'm to blame for the overwhelming majority of these duplicates and bad DOIs since I've been curating this journal for a while now. Of course, I welcome any help from your QS batch(es) to get closer to the finish line. Trilotat (talk) 06:05, 9 April 2019 (UTC)
- Comment Note that this problem has now been fixed in SourceMD, and we've cleaned up all the bad DOIs and merged the duplicates associated with this problem of trimming out the '<...>' portion of the DOI. There were also a relatively small number of DOIs with URL '%' encoding of some of the characters ('#', '[', ']', ':', ';', '(', ')'), which should also now be fixed. ArthurPSmith (talk) 16:46, 15 April 2019 (UTC)
Source MetaData neglects title text after additional colon
@Sic19: @Pintoch: I suspect everyone concerned with this problem may know this, but Source MD doesn't add the text after a second (or third) colon in the title. Consider these examples:
- Q59381736 should end with "... detrital garnets: Comment and reply: REPLY" but Source MD didn't create the item with the ": REPLY" portion
- Q59381586 should have a title with three colons, but SourceMD neglected text only after the third colon.
Those items with the same title should have text after the "Comment and reply" portion. I'm not sure if these colon problems (no pun intended) are related to some of the items that Simon's batch fixes are missing. Trilotat (talk) 15:34, 6 April 2019 (UTC)
- This isn't a problem with SourceMD - that part of the title is not in the Crossref data and therefore SourceMD can't include it. It does make it difficult to correctly match items and DOIs though because there can be one or more COMMENT with a REPLY but all with the same title. Simon Cobb (User:Sic19 ; talk page) 18:52, 6 April 2019 (UTC)
SourceMD warnings
I am now unable to use SourceMD. These warnings are all I get. What should I do? I would have logged out and logged in, but I cannot even figure out how to logout of the tool.
Warning: parse_ini_file(/data/project//replica.my.cnf): failed to open stream: No such file or directory in
/data/project/magnustools/public_html/php/ToolforgeCommon.php on line 162
Warning: assert(): Unable to connect to database [Access denied for user @'##.##.##.###' (using password: NO)] failed in
/data/project/magnustools/public_html/php/ToolforgeCommon.php on line 180
Warning: mysqli::real_escape_string(): Couldn't fetch mysqli in /data/project/sourcemd/public_html/index.php on line 104
string(122) "SELECT * FROM batch WHERE `user`= ORDER BY FIELD(status,'TODO','STOPPED','DONE') ASC,last_action DESC LIMIT 100 OFFSET 0"
string(125) "#0 /data/project/sourcemd/public_html/index.php(106): ToolforgeCommon->getSQL(Object(mysqli), 'SELECT * FROM b...') #1 {main}"
Fatal error: Uncaught exception 'Exception' in /data/project/magnustools/public_html/php/ToolforgeCommon.php:241 Stack trace: #0
/data/project/sourcemd/public_html/index.php(106): ToolforgeCommon->getSQL(Object(mysqli), 'SELECT * FROM b...') #1 {main} thrown in
/data/project/magnustools/public_html/php/ToolforgeCommon.php on line 241
What does it mean, "Access denied for user @'##.##.##.###' (using password: NO)"?
Thanks. Trilotat (talk) 14:58, 23 March 2019 (UTC)
- Any help? I can use author disambiguator and quickstatements, but not sourcemd. Trilotat (talk) 23:32, 24 March 2019 (UTC)
- Fixed I don't know what happened, but this appears to have resolved itself. Trilotat (talk) 01:48, 30 March 2019 (UTC)
Academia Europeana
We at ONTO (Niko, Vladimir and Nikola) are now busy matching Academia Europeana members to WD: https://tools.wmflabs.org/mix-n-match/#/catalog/1985. These are 4.3k leading European (or Europe-related) researchers. Thanks to @Gerwoman: for importing that catalog!
Then we'll add the structured data from the AE membership page (AE section -> occupation/field; current affiliation; country). I have asked AE for collaboration, let's see what they say. I also discovered a mix-up of 2 people on the AE pages, very unusual.
And then we can turn to a wider ORCID, continuing the work of @Sic19:.
Cheers! --Vladimir Alexiev (talk) 15:21, 25 March 2019 (UTC)
Scientific articles: labels truncated at 250 chars
There's a little problem: labels/titles imported into WD are truncated at only 250 characters, e.g.
Kpjas (talk) 07:39, 7 April 2019 (UTC)
- A longer title can be included in the title (P1476) statement. I believe the character limit is 350. Simon Cobb (User:Sic19 ; talk page) 22:31, 7 April 2019 (UTC)
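A sketch of the split an importer might apply, assuming the limits quoted in this thread (250 characters for the label, with the full title kept in a title (P1476) statement); the helper function and the ellipsis convention are hypothetical, not how any particular tool actually does it.

```python
# Split a long article title between a truncated label and a full
# "title" (P1476) value, using the 250-character limit reported above.
LABEL_LIMIT = 250

def split_title(title, limit=LABEL_LIMIT):
    """Return (label, full_title): label truncated to the limit,
    full title kept intact for the P1476 statement."""
    if len(title) <= limit:
        return title, title
    label = title[: limit - 1].rstrip() + "…"  # leave room for the ellipsis
    return label, title

long_title = "A" * 300
label, full = split_title(long_title)
```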
Bad transcriptions in titles
A Few Notes of the Early Church-Wardens’ Accounts of the Town of Ludlow (Q59635962) has a transcription error ("Church-Vardens" for "Church-Wardens") in the Taylor & Francis database. In this case, I have changed the label and title to the correct word, and added the "bad" imported title with deprecated rank and <reason for deprecation> "transcription error". Any objections to this practice? - PKM (talk) 22:32, 14 April 2019 (UTC)
Unstable DOIs?
An example: The VALIDATION LIST NO.85: Validation of publication of new names and new combinations previously effectively published outside the IJSEM was published in 2002 with the doi:10.1099/ijs.0.02358-0 but the DOI is not resolving anymore. Earlier I merged Validation List no. 85: Validation of publication of new names and new combinations previously effectively published outside the IJSEM. International Journal of Systematic and Evolutionary Microbiology (Q45746347) with Validation List no. 85: Validation of publication of new names and new combinations previously effectively published outside the IJSEM. International Journal of Systematic and Evolutionary Microbiology (Q45315045) but it was recreated as Validation List no. 85: Validation of publication of new names and new combinations previously effectively published outside the IJSEM. International Journal of Systematic and Evolutionary Microbiology (Q56341240). Not sure if the source is PubMed (Q180686), but maybe we have around 10,000 issues. --Succu (talk) 21:31, 22 April 2019 (UTC)
- Rather than removing the bad DOI you should probably deprecate it after merging. Though I think there's still an issue with SourceMD not noticing deprecated statements, but in principle that's the right thing to do. ArthurPSmith (talk) 15:18, 23 April 2019 (UTC)
- At the moment we have only 37 deprecated DOIs. If you are using SPARQL you have to explicitly include deprecated values. For example at Is Justified True Belief Knowledge? (Q55868521) the "10.2307/3326922" is deprecated (in fact a redirected DOI). The usual SPARQL query gives no result. Only the generalized query finds the value. Not sure this is widely known. The Crossref-API serves the deprecated DOIs as valid ones. Instead they should be redirected to the now valid DOIs. It's frustrating to do the manual fixing again and again. --Succu (talk) 17:58, 23 April 2019 (UTC)
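To illustrate the difference Succu describes, using the DOI from Is Justified True Belief Knowledge? (Q55868521): the usual truth-table pattern (wdt:) excludes deprecated values, while walking the statement node explicitly (p:/ps:) includes them. A sketch:

```sparql
# Usual query: wdt: only sees "truthy" values, so a DOI that exists
# only with deprecated rank gives no result.
SELECT ?item WHERE { ?item wdt:P356 "10.2307/3326922" . }

# Generalized query: go through the statement node, so deprecated
# ranks are found as well.
SELECT ?item ?rank WHERE {
  ?item p:P356 ?statement .
  ?statement ps:P356 "10.2307/3326922" ;
             wikibase:rank ?rank .
}
```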
Journal series & sections
Where a work is cited as, say:
- "Variations in the Notes and Songs of Birds in different Districts". The Zoologist. 4th series, vol. 4 (issue 710 (August, 1900), section 'Notes and Queries'): 382–383.
what property do we use for "4th series"? And what for "section 'Notes and Queries'"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:48, 9 June 2019 (UTC)
- I'd also be interested in the answer, also with stuff like 'new series'. Richard Nevell (talk) 16:12, 12 June 2019 (UTC)
Failing batches
I've had a string of failed batches. Anyone else experiencing this or am I just having a bad day? Trilotat (talk) 17:33, 9 June 2019 (UTC)
- A batch of 1300 DOIs failed (mostly) for reasons I can't discern. I was using sourcemd, was that where your batches failed? Richard Nevell (talk) 16:20, 12 June 2019 (UTC)
- @Richard Nevell: My batches have been one DOI each. I've tried for a few days. This morning I return to see many of the most recent attempts are still pending (many hours after submission). Peculiar. Trilotat (talk) 16:28, 12 June 2019 (UTC)
- @Magnus Manske: rewrote the QuickStatements batch code recently, it's possible this is related? ArthurPSmith (talk) 13:31, 13 June 2019 (UTC)
- It's working now. Thanks for reply. Trilotat (talk) 19:31, 13 June 2019 (UTC)
- 👍 Richard Nevell (talk) 18:58, 14 June 2019 (UTC)
Citationgraph bot 2 hasn't run since last year, are there any plans to revive it? Richard Nevell (talk) 18:59, 14 June 2019 (UTC)
Is Source MetaData still down?
The url I use is returning "500 - Internal Server Error", but maybe the problem is on my end. I miss this tool, so please let me know if there is a working version somewhere. Thanks. Trilotat (talk) 19:50, 7 August 2019 (UTC)
Bad scientific articles' titles in SourceMD batches
There seems to be a problem with SourceMD that probably dates back at least to November 2018. It mangled some titles of scientific articles in newly created items.
Apparently the problem is that SourceMD leaves out fragments of titles that are set in italics.
Compare:
Joint Genomic and Proteomic Analysis Identifies Meta-Trait Characteristics of Virulent and Non-virulent Strains
or PubMed
So far I've come across two such items but there are probably more.
instance of (P31) for replies and comments on scholarly articles?
What is the appropriate instance of (P31) for replies and comments on scholarly articles? I don't want to tie the object of the comment or replies via main subject (P921). Perhaps that is the right connection. Please advise. Trilotat (talk) 17:56, 7 September 2019 (UTC)
Is CrossRef down?
SourceMD seems to be failing. I'm getting "503 Service Unavailable" when visiting https://api.crossref.org Is CrossRef failing for anyone else? Trilotat (talk) 18:22, 28 September 2019 (UTC)
- Actually, it's working intermittently. Issue resolved. Trilotat (talk) 20:04, 28 September 2019 (UTC)
Source MD generating spurious main subject (P921)
Does anyone know why SourceMD creates articles for Journal of Geophysical Research (Q2738009) with main subject (P921): Atmospheric science (Q58005389), aquatic science (Q4782809) and Palaeontology (Q58944531)? Thanks. Trilotat (talk) 04:24, 30 September 2019 (UTC)
Wikipedia templates enabling citing of books/articles on the basis of their items on Wikidata
Do such templates already exist in any Wikipedia? --eugrus (talk) 11:36, 14 October 2019 (UTC)
New Tool for Creating Items from a Pubmed ID
Hi All, I made a tool to help create items for journal articles from a Pubmed ID. It uses WikidataIntegrator, which is a python package created by User:Sebotic for creating bots and interacting with Wikidata. Check out the page here, and let me know any comments or suggestions. – The preceding unsigned comment was added by Gstupp (talk • contribs) at 20:50, 28 January 2017 (UTC).
- Just signing off here, so the archiver can do its work. --Daniel Mietchen (talk) 21:16, 7 November 2019 (UTC)
SourceMD appears to be not functioning
From my end, I've been unable to get past "500 - Internal Server Error" for a couple days. Frustrating that such a useful tool doesn't work. Like half the tools in my garage, but that's my fault. Trilotat (talk) 17:03, 18 October 2019 (UTC)
- Same issue here, I hope it's temporary. Richard Nevell (talk) 18:15, 18 October 2019 (UTC)
- Still appears to be offline after a bug came up: Topic:V2fzk650ojg2n6l1. Do people think it might be worth trying to get a grant together to hire a dev to go through and fix it? It feels like a pretty vital piece of the WikiCite infrastructure. T.Shafee(evo&evo) (talk) 09:03, 31 October 2019 (UTC)
- Probably a good idea to try to make this tool more robust and reliable. --Daniel Mietchen (talk) 21:20, 7 November 2019 (UTC)
SourceMD creating duplicates where DOI capitalization varies
I've created some SourceMD batches and found that items exist with the same DOI, but with differences in capitalization. Example, I ran a batch with DOI 10.1130/0091-7613(1974)2<281B:CSOTHT>2.0.CO;2 and it was going to create a new item. Problem is, Q58832663 exists already. I don't know where to post this problem other than here. Trilotat (talk) 18:55, 29 October 2019 (UTC)
- @Trilotat: I think SourceMD (and other DOI import tools) have been capitalizing the DOI's routinely, so that shouldn't be the problem. However there HAVE been problems with DOI's having '<' and related characters, where some versions of the DOI (particularly from Europe PMC) have that character URL-encoded (as %3C) - which makes it hard to match the strings! ArthurPSmith (talk) 17:32, 30 October 2019 (UTC)
- @ArthurPSmith: So, if the DOI is already present on Wikidata with lowercase characters, it shouldn't matter if I'm trying to add the same DOI with only capitalized letters? Trilotat (talk) 22:46, 30 October 2019 (UTC)
- I'm not sure what happens with lowercase - I believe SourceMD does a SPARQL query for whether the ID is already there and that's case sensitive, but it may try both upper- and lower-case. I am sure mixed case DOI's in Wikidata would lead to duplicates. ArthurPSmith (talk) 11:56, 31 October 2019 (UTC)
- SourceMD and most of the other tools uppercase all DOIs in their input and check for uppercase duplicates. I also think that in this case, the likely culprit are the pointed brackets. --Daniel Mietchen (talk) 21:14, 7 November 2019 (UTC)
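Combining the two points above (uppercasing, and the URL-encoded '<' characters from Europe PMC), a duplicate check could normalize both forms before comparing. A sketch under those assumptions; the helper name is hypothetical, not from any of the tools discussed here:

```python
# Normalize a DOI before duplicate checking: uppercase it and decode
# URL-escaped characters such as %3C ('<') and %3E ('>'), so the two
# variants of a Geology-style DOI below compare equal.
from urllib.parse import unquote

def normalize_doi(doi):
    """Return a canonical (uppercased, URL-decoded) form of a DOI string."""
    return unquote(doi.strip()).upper()

a = "10.1130/0091-7613(1974)2%3C281b:CSOTHT%3E2.0.CO;2"  # URL-encoded, mixed case
b = "10.1130/0091-7613(1974)2<281B:CSOTHT>2.0.CO;2"      # literal brackets
```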
Using ArticlePlaceholder for bibliographic metadata?
Has anyone explored whether ArticlePlaceholder would be of any use for representing bibliographic metadata in Wikipedias that have this extension enabled? I started looking into this and put my notes into Wikidata:WikiProject Source MetaData/ArticlePlaceholder. --Daniel Mietchen (talk) 21:19, 7 November 2019 (UTC)
Community wishlist 2020 and bot request
The following items may be of interest to this project:
Papers with pages in Wikispecies, not linked in Wikidata
Please see discussion here. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:16, 13 November 2019 (UTC)
Bot request: complete WikiJournal dataset as subset of wikicite
I've a WikiCite-adjacent bot request (link below). It's about integrating WikiJournals' metadata into Wikidata (including peer reviewers, handling editors, etc.) and might be a useful concise dataset for WikiCite in general. However, currently none of the project members have the requisite skills to put something like that together. Would there be a logical place or person to ask for help?
Wikidata:Bot_requests#Automated_addition_of_WikiJournal_metadata_to_Wikidata T.Shafee(evo&evo) (talk) 04:29, 20 November 2019 (UTC)
Best way to store per-article submission date, acceptance date, & peer review info
Using Lysenin (Q76846397) as an example item (for doi:10.15347/wjs/2019.006), what do people think is the best way to record the following information:
- Submission date (currently using author (P50) start date)
- Acceptance date (no idea)
- Publication date (just use publication date (P577)?)
- Peer review date, reviewer, author response date, url link (currently using reviewed by (P4032) with qualifiers, but could use peer review URL (P7347) with qualifiers?)
I'd be interested in knowing what people's thoughts are for best practices before implementing more widely. T.Shafee(evo&evo) (talk) 03:58, 30 November 2019 (UTC)
- @Evolution and evolvability: I think the general rule for this sort of thing is to use significant event (P793) with values generic items for "article submission", "article acceptance", etc. and the point in time (P585) qualifier to specify the date. For reviewers use reviewed by (P4032) with qualifiers. ArthurPSmith (talk) 13:07, 30 November 2019 (UTC)
- +1. No need of new property. Snipre (talk) 14:11, 30 November 2019 (UTC)
I was formulating an email response (which seems to be in line with ArthurPSmith) before I saw the wikilink:
- A generic property is P793 - significant event. There you can put in various events. For scientific events, I have been using that to record "paper submission deadline" and "abstract submission deadline". One could create a "paper submission" Q-item and associate that with the P793. We have also been using that for retraction of papers, - if I remember correctly. IMHO P50 on submission date is a bit odd. I suppose it could also mean when the author started writing the paper. For publication date I would use P577. The use of P4032 (peer reviewed by) as a property and with its qualifiers looks fine by me. Perhaps I wouldn't use P4032 as qualifier for P7347 (Peer review URL). I suppose an issue is that a peer review is not a fixed date. Multiple reviewers may submit at different time, the manuscript might be re-reviewed. There may be multiple author responses. These cases could be handled with P793. — Finn Årup Nielsen (fnielsen) (talk) 20:01, 30 November 2019 (UTC)
- @ArthurPSmith, Snipre, Fnielsen: Thank you! I've updated Lysenin (Q76846397) with the suggestions. reviewed by (P4032) works quite well, since you can use date ranges that differ for different reviewers. Having proposed peer review URL (P7347) a few months back, I now wonder whether it is still necessary, given that the structure of reviewed by (P4032)+URL (P2699) is more versatile. I've also created submission (Q76903164) for the concept of submission (any assistance with its values or translations welcome). T.Shafee(evo&evo) (talk) 01:28, 1 December 2019 (UTC)
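For anyone wanting to apply this model in bulk, the significant event (P793) + point in time (P585) pattern suggested above can be expressed as a tab-separated QuickStatements V1 command. A sketch only: the item and property IDs are the ones named in this thread, but the date is a placeholder, not the actual submission date of the article.

```
Q76846397	P793	Q76903164	P585	+2019-06-01T00:00:00Z/11
```

Here Q76903164 is the "submission" item created above, and the /11 suffix marks day precision on the placeholder date.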
Preprint import and Wikipedia export info
Followup question: For articles that are submitted by import from another source (e.g. Preprint server or Wikipedia), what's the best way to link to where they were imported from? Again, using Lysenin (Q76846397) as an example, I've put in the reviewed by (P4032) of submission, with the qualifier based on (P144)=Wikipedia (Q52) and a URL (P2699) link. Such a structure would have to be compatible with import from any preprint server for other publications. Similarly I've added the significant event of merge (Q1921621) with qualifier derivative work (P4969)=Wikipedia (Q52) and a URL (P2699) link. Is this logical or is there a better organisation? Thanks again! T.Shafee(evo&evo) (talk) 22:34, 2 December 2019 (UTC)
- I don't think we have a consistent model for this - I've seen many cases where there are separate items for the preprint and published versions of an article, but that doesn't seem entirely right to me. Though maybe given there are likely revisions of the work... ArthurPSmith (talk) 21:33, 3 December 2019 (UTC)
SourceMD-relevant discussions with Open Journal Systems
As part of discussions about using OJS to address the 'back end' manuscript handling for Wikijournals, we have also been touching on the idea of exporting publication metadata from OJS to Wikidata. This could be considerably richer than is commonly imported from crossref (e.g. editor and reviewer information, submission dates). Relevant meeting minutes at this link (2019-11-29 & 2019-12-18). I've been using the WikiJournal of Science article Lysenin (Q76846397) as an example for the metadata structure, however it may be the sort of thing that other journals that use OJS might be interested in. If people have input, please let me know at v:Talk:WikiJournal User Group, since I'm sure this community has great ideas. T.Shafee(evo&evo) (talk) 01:13, 20 December 2019 (UTC)
Proposal to shuffle millions of claims
Quote: "What on earth is happening here?"
The issue is determining how much to distinguish similar concepts including
- academic journal (Q737498)
- scientific journal (Q5633421)
- academic journal article (Q18918145)
- scholarly article (Q13442814)
Good -
- lots of community engagement
- strong opinions of high urgency on matters of source metadata
- people expressing useful requests to learn more
Less useful -
- people want information when documentation is lacking
- some people have taken action before discussion
- large conversations without preparation can be frustrating and lead to discussion fatigue
Blue Rasberry (talk) 01:16, 5 March 2020 (UTC)
- My personal opinion is that a more general class is better than trying to be too specific with instance of (P31) claims. This seems to work well for example with human (Q5). You can clarify what something really is with its other properties - like where it was published. Maybe everything should just be scholarly work (Q55915575)? ArthurPSmith (talk) 14:36, 5 March 2020 (UTC)
- Mostly agree with Arthur, but currently everything is Q13442814, so I'd rather leave it that way .. no shuffling and future reshuffling. Less useful: .. --- Jura 19:11, 5 March 2020 (UTC)
Cochrane Reviews duplicates (?)
I've been trying to merge cases where we have multiple items with the same DOI and I'm running into my own confusion with regard to Cochrane Reviews. There seem to be a lot of items with the same DOI but slightly different titles, sometimes slightly different author lists, different publication dates or issue numbers. Below is a list, though I've merged a few of them already (maybe wrongly?). Could somebody else who knows more about this take a look? ArthurPSmith (talk) 18:50, 5 March 2020 (UTC)
Mining COVID19 research using [R] and Wikidata
For people interested in [R], textmining, Wikidata, COVID19 and open data: project posted to text mine and analyse the covid literature, and annotate publications' wikidata items with main subject (P921) values. Details at Wikidata_talk:WikiProject_COVID-19 and github repo. T.Shafee(evo&evo) (talk) 03:53, 19 March 2020 (UTC)
EntitySchema for preprints
There is a draft one at EntitySchema:E185, named preprint (E185). --Daniel Mietchen (talk) 16:26, 7 April 2020 (UTC)
- Related note, do we have any automatic importing of preprint info into wikidata (discussion at Wikidata_talk:WikiProject_COVID-19)? T.Shafee(evo&evo) (talk) 10:44, 15 April 2020 (UTC)
- @Evolution and evolvability: Most of the COVID-19 preprints so far have been imported by Konrad Foerstner. Not sure where his code is, though, and it did not account for author order by way of series ordinal (P1545), which causes problems in terms of author disambiguation. --Daniel Mietchen (talk) 18:32, 25 April 2020 (UTC)
- Related note, do we have any automatic importing of preprint info into wikidata (discussion at Wikidata_talk:WikiProject_COVID-19)? T.Shafee(evo&evo) (talk) 10:44, 15 April 2020 (UTC)
Towards more consistent P31 usage across the WikiCite corpus
I would like to take the bulk fixing error discussion as a starting point to review the data model for bibliographic items. In particular, I think it would be useful to have one (or a few) standard instance of (P31) for all of them, similar to how human (Q5) is used for all items about humans. A candidate for such a generic value of that P31 statement could be something like publication (Q732577), written work (Q47461344) or document (Q49848). Once we have sorted that out, we could go for additional properties (think "publication type"/ "document type" or similar) that would specify things like monograph (Q193495), preprint (Q580922) or technical report (Q3099732). --Daniel Mietchen (talk) 16:07, 20 March 2020 (UTC)
- The approach we usually followed is to just use monograph (Q193495), preprint (Q580922) or technical report (Q3099732) as the value of instance of (P31). In the end these values would not even scratch the surface of the publication type, and scientific articles, for example, would also be distinct from « news article », while both being articles. In the end, for this specialized property the ontology-of-publications problem would be exactly the same as if we just used instance of (P31). author TomT0m / talk page 16:50, 20 March 2020 (UTC)
- @Daniel Mietchen: Presumably you have had a look at Wikidata:WikiProject_Books? I think more consistency would be great - I also have some concerns that I raised in this discussion. This may also be of interest: Wikidata:Requests_for_comment/Wikidata_to_use_data_schemas_to_standardise_data_structure_on_a_subject. I think the wikibooks model makes quite a lot of sense but it should be clarified for specific cases - under which wikiproject it should go is hard to say. We do need better ontologies, but I agree with TomT0m that new properties as you suggest won't help much. Iwan.Aucamp (talk) 16:54, 20 March 2020 (UTC)
TomT0m, the difference is that P31 needs to be stable throughout the life cycle of an academic work to accommodate data flows and feed apps like Scholia. Thus classes like "preprint" and "conference paper" vs "journal article" are not suitable. "Monograph" (scientific book) vs "scientific article" is ok. Vladimir Alexiev (talk) 19:28, 23 March 2020 (UTC)
I would say, in terms of immediate practical usefulness, tidying up P31 statements for scientific journal (Q5633421) should be given more attention. A random example, Developmental Dynamics (Q59752), shows it is also instance of academic journal (Q737498), a second subclass of magazine genre (Q21114848). It is instance of hybrid open access journal (Q5953270), and there are certainly people who believe that open-access status should be dealt with by a dedicated property. And it is instance of society journal (Q73364223), which is a rather awkward way of dealing with important information on publisher. Charles Matthews (talk) 05:48, 20 May 2020 (UTC)
- I'd welcome having an instance of (P31) publication (Q732577) statement on all publication items to facilitate querying. The number of subclasses is constantly growing, with 2,172 at the moment. On tidying up P31 statements my opinion is more mixed: sure, tidying up is good, but the class hierarchy should be allowed to stay dynamic; adding new classes makes more sense the more items we have. Just one possibly less-relevant remark: non-publications can be cited too, see this classical piece of library and information science for a discussion. -- JakobVoss (talk) 07:04, 3 June 2020 (UTC)
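The querying that a generic P31 statement would facilitate can be sketched as follows. This is an illustrative WDQS query, not one from the discussion: it surveys how items are currently spread across subclasses of publication (Q732577); on the full corpus it would likely need restriction or a LIMIT to avoid timeouts.

```sparql
# Survey how bibliographic items are distributed across
# subclasses of publication (Q732577). With a generic
# P31 -> Q732577 statement on every item, the subclass
# traversal in the first triple pattern would be unnecessary.
SELECT ?class ?classLabel (COUNT(?item) AS ?count) WHERE {
  ?class wdt:P279* wd:Q732577.
  ?item wdt:P31 ?class.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?class ?classLabel
ORDER BY DESC(?count)
```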
Discontinued journals as a test corpus
As part of exploring the limits of WikiCite, I am looking into discontinued journals, for which (i) we can hope for a good degree of completeness, and (ii) some documentation may be harder to find than for current ones. Is anyone working along these lines? To get things started, here is a query for items that
- have a ISSN (P236) statement, hinting that it is a periodical
- have a end time (P582) qualifier on the publisher (P123) statement, meaning that they ceased publishing with at least one publisher
- have a official website (P856) statement, so there is a chance that additional information could be gathered
- do not have any items published in them:
The following query uses these:
- Properties: ISSN (P236) , publisher (P123) , official website (P856) , published in (P1433) , end time (P582)
SELECT DISTINCT ?journal ?journalLabel WHERE {
  {
    SELECT DISTINCT ?journal WHERE {
      ?journal wdt:P236 ?issn;
               wdt:P123 ?publisher;
               p:P123 ?publisherStatement.
      ?publisherStatement pq:P582 ?endTime.
      ?journal wdt:P856 ?website.
    }
    LIMIT 100
  }
  FILTER(NOT EXISTS { ?item wdt:P1433 ?journal. })
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Amongst the currently 38 results, several look like a suitable starting point. I am particularly attracted to Scholia (Q15755172) because
- it is not biomedical
- I found this blog post that describes its corpus as "Scholia and Scholia Reviews published 862 contributions by 392 scholars and academics at 193 universities and other institutions in 36 countries", which seems a useful size for such a test corpus
- it is a namesake of our tool Scholia (Q45340488)
--Daniel Mietchen (talk) 20:56, 25 April 2020 (UTC)
- @Daniel Mietchen: This sounds like a great idea. A discontinued journal that I am familiar with and could get you full metadata for is Physics Physique Физика (Q85793224) - it doesn't match your WDQS query because there are no statements on the Wikidata item at all, but it also appears to have none of its (some quite significant) papers listed in Wikidata yet either. ArthurPSmith (talk) 14:34, 27 April 2020 (UTC)
- @ArthurPSmith: Sounds great — let's give that a try then! --Daniel Mietchen (talk) 14:56, 27 April 2020 (UTC)
- PS: I linked some seed papers to Physics Physique Физика (Q85793224). --Daniel Mietchen (talk) 20:22, 27 April 2020 (UTC)
- It will be interesting to see how this develops. What particular limits do you expect there to be? If you're looking for another option, there's Sussex Notes and Queries (Q92202389) which stopped in 1971 so might present a challenge as something less recent; the contents are documented in the Archaeology Data Service and the Sussex Records Society's website. Richard Nevell (talk) 17:32, 27 April 2020 (UTC)
- @Richard Nevell: I expect a number of potential limits, e.g. in terms of our ability to track down things like
- the identities (and perhaps affiliations) of authors referred to by author name strings (and perhaps affiliation strings) for publications in the journal — see P B Gatne (Q41919945) or D A Nordlund (Q92128997) for examples
- exact publication dates
- citations from and especially to
- periods with specific editors, publishers, place of publication etc.
- If we had some test corpora, we could quantify such things, which could help guide further curation efforts. --Daniel Mietchen (talk) 20:22, 27 April 2020 (UTC)
- Sounds good. I've sometimes wondered if there's a way to tap into Google Scholar for citations to works. It can be noisy sometimes, including websites based on Wikipedia for example, but does surface some interesting uses such as references from MA theses where they're available online. Richard Nevell (talk) 18:54, 28 April 2020 (UTC)
Speaking of which, I added end time (P582) for Greater Manchester Archaeological Journal (Q42721106) and Cheshire Past (Q44323342). Do they need official website (P856) to show up in the query, or does the filter exclude journals where Wikidata has items for articles published in them? Helps if I read the query properly. Richard Nevell (talk) 19:09, 28 April 2020 (UTC)
- Update: With the help of ArthurPSmith, I have now created all the missing items for articles published in Physics Physique Физика (Q85793224) . Next steps: annotating topics, citations (to and from) and authors, and the latter with affiliations etc. --Daniel Mietchen (talk) 21:39, 28 April 2020 (UTC)
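For the next steps mentioned above, a backlog query along the following lines could help. This is a sketch, not part of the original discussion; it assumes topic annotation would use main subject (P921), which is one plausible choice.

```sparql
# Articles published in Physics Physique Физика (Q85793224)
# that do not yet have a main subject (P921) statement --
# candidates for the topic-annotation step.
SELECT ?article ?articleLabel WHERE {
  ?article wdt:P1433 wd:Q85793224.
  FILTER NOT EXISTS { ?article wdt:P921 ?topic. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Analogous queries with missing author (P50) or cites work (P2860) statements would cover the author and citation steps.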
Handle preprints?
Do we have any consistency in how we should handle preprints? For instance, [5] has a journal paper, a "versionless" preprint and a versioned preprint. There only seems to be a single version. I have currently used based on (P144) to link from the journal paper to one of the preprints. Perhaps the two preprints should be merged into one. Should further merging be done? — Finn Årup Nielsen (fnielsen) (talk) 09:34, 30 April 2020 (UTC)
- @Fnielsen: I can imagine merging the preprints being logical in the majority of circumstances currently. I think significant event (P793) + submission (Q76903164) would be favourable (it could even be used to record all preprints), see example Q57912487#P793. I'd strongly prefer avoiding publication date (P577) for preprint (Q580922) items, since preprint servers usually carefully avoid saying something is 'published' as such. Using significant event (P793) would also make timelines of articles easy to track. What do you reckon? T.Shafee(evo&evo) (talk) 07:45, 3 May 2020 (UTC)
- Maybe there should be a generic preprint property that would work like arXiv ID (P818) but link to arbitrary URLs. Setting up preprints as editions would be a bit tedious. Ghouston (talk) 09:34, 3 May 2020 (UTC)
- Additionally, I think Q56795015 and Q57912487 could definitely be merged. In most cases, there'll be a single preprint version and a single published version. Articles in F1000Research (Q27701587) will be tricky ones, since they often mint a lot of versions, and the preprint server and the published versions are hosted by the same organisation. That's partly why I favour a single item, with statements indicating the preprint version(s). @Ghouston:, what do you think of just using URL (P2699) as the arbitrary URL property? T.Shafee(evo&evo) (talk) 10:21, 3 May 2020 (UTC)
- I'd expect URL (P2699) and/or official website (P856) to refer to the location of the final version, not a preprint. Ghouston (talk) 14:14, 3 May 2020 (UTC)
- I agree that URL (P2699) for preprints would be better as qualifiers for a significant event (P793) of submission (Q76903164) of a preprint version. T.Shafee(evo&evo) (talk) 05:10, 7 May 2020 (UTC)
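The modelling discussed in this thread - significant event (P793) with value submission (Q76903164), and the preprint location as a URL (P2699) qualifier - could be queried along these lines. This is a sketch of the proposal as stated above, not of established practice:

```sparql
# Works that record a preprint submission via
# significant event (P793) = submission (Q76903164),
# optionally with the preprint location attached
# as a URL (P2699) qualifier on that statement.
SELECT ?work ?workLabel ?url WHERE {
  ?work p:P793 ?event.
  ?event ps:P793 wd:Q76903164.
  OPTIONAL { ?event pq:P2699 ?url. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
```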
About that roadmap
A roadmap for WikiCite was laid out in August 2018. People voted on the options, and the outcome is more or less:
For now:
- 1. Centralized: 4 (+2 alternative choice, +1 second choice)
- 2. Namespace: 3 (+2 alternative choice)
- 3. Sister site: 0 (+1 second choice)
- 4. Federated: 0
Eventually:
- 1. Centralized: 2 (+2 alternative choice, +1 second choice)
- 2. Namespace: 3 (+2 alternative choice)
- 3. Sister site: 0 (+1 second choice)
- 4. Federated: 2
I'm not sure what this practically means but it would be good to record this somewhere I think if it is considered finalized. If the matter is not finalized then it would be good to know under what option we should operate for now. Iwan.Aucamp (talk) 10:31, 10 May 2020 (UTC)
- Hello Iwan - my sense is, for now we should add citation data to Wikidata, in the centralized approach; eventually moving to some combination of 2,3,4. Sj (talk) 12:03, 10 May 2020 (UTC)
- I agree with SJ. 1 is what is possible now, 2 and 3 seem only possible with Wikimedia movement financial investment and community organization which does not appear to exist, and 4 can happen anytime any external organization such as a university invests in a Wikibase instance, which also does not appear to be in the works anywhere. The practical development which I think has happened is that the growth rate of Wikicite has slowed. This tension came to a head when Wikicite was 60% of all the items in Wikidata. Now Wikicite content is about 31% of all the items in Wikidata because Wikidata is growing to expand capacity for a range of projects. When Wikicite was the only project using scarce space it seemed more like an emergency, and now we instead have to do longer-term planning for many projects which all will grow over time. Solving only Wikicite does not address the many other projects of comparable size which are also incoming. I think everyone is getting the idea to seek lots of feedback and be selective when a large upload is possible. Blue Rasberry (talk) 15:13, 11 May 2020 (UTC)
Functioning of ORCIDator
Doesn't work? I've tried to run it for Q60023087 and it does nothing. --Infovarius (talk) 19:10, 18 May 2020 (UTC)
Recent media & publications
Have you given/created a presentation, paper, tutorial, poster, research or documentation related to WikiCite and open citations since the WikiDataConference 2019?
If so, please add it to the list of Media & Events on Meta wiki so we can all keep track:
Meta:WikiCite/media#2020
Many events – including most of those which were approved under the 'satellite grants' program – have had to be cancelled or indefinitely postponed in recent months. But that does not mean people have stopped producing excellent work relating to linked bibliographic data in the Wikidataverse. Quite the opposite! So it would be very helpful if you could help ensure that the work is easily findable by adding it to the list linked above.
Relatedly, I am currently preparing the 2019/20 WikiCite annual report – following in the sequence of the last three annual reports. I would like to mention many of these things if possible [not to take credit for them - but to demonstrate the variety and quality of work that is being done in our sector].
Sincerely,
LWyatt (WMF) (talk), project manager for WikiCite. 17:24, 20 May 2020 (UTC)
Items for individual volumes and item for work as a whole
Hi, new editor/contributor here, so looking for some help. Scoggan's Flora of Canada was a four-part series. What is the recommended way to enter this? One item for the entire work, or one item for each part? There is currently one item for parts 1-3 and two items that both appear to be for part 4 (Q51408010 and Q51407106). Thanks. Friesen5000 (talk) 15:52, 7 June 2020 (UTC)
- If it's a single work, it can be represented as a single item for a work and edition, with a number of parts of this work (P2635) statement. Sometimes the number of volumes varies in different editions. Perhaps there are also cases where it's worth describing each volume separately, when they are more like works in their own right. Ghouston (talk) 01:43, 8 June 2020 (UTC)