Wikidata talk:WikiProject Source MetaData/Archive 2

Links to Green Open Access versions?

Hi all, there is a steep increase in Open Access literature. Unfortunately, it is not all Gold Open Access. There is a grey area of "green" Open Access where the document cannot be legally shared by non-authors, but at least the author can share it via repositories etc. Now, the closed access publishers won't link to these articles. Google Scholar finds many of them, and new websites like oadoi.org do a reasonable job at finding them. That said, I was wondering how authors can help these tools? What about adding a default way of adding a link to the Wikidata entry for a particular DOI to one or more green OA versions? I do not mean preprint versions of the paper, but really the "green OA" version of the final paper. --Egon Willighagen (talk) 10:52, 31 October 2016 (UTC)

I use full work available at URL (P953) for such links. --Succu (talk) 11:05, 31 October 2016 (UTC)
Ah, cool! --Egon Willighagen (talk) 11:25, 31 October 2016 (UTC)
Hi, we have a project to do that on the English Wikipedia: OAbot. Currently submitted for bot approval. User:Daniel_Mietchen has raised the question of adapting it to Wikidata, which should not be hard, but I don't have time for this. The source code is available. − Pintoch (talk) 16:13, 31 October 2016 (UTC)
@Egon Willighagen, Succu, Pintoch, Daniel_Mietchen: I think it would be useful to start figuring out more broadly how to map OAdoi.org's API response fields to Wikidata's bibliographic data model. In other words, if a Wikidata item represents a work, and – per Succu – we can use full work available at URL (P953) to represent free_fulltext_url, how are we going to store additional properties such as: evidence, is_boai_license, is_free_to_read, is_subscription_journal, license, oa_color? Are they always properties of the item or sometimes qualifiers of a specific resource identified via a URL? I think we should focus our efforts on coming up with a good data model for all possible combinations of these fields rather than simply representing links to green OA versions. Also, AFAIK there isn't an easy way of telling from any available API if a given preprint is the author version or a self/institutionally archived green-OA copy matching the version of record.--Dario (WMF) (talk) 16:33, 31 October 2016 (UTC)
For the "is_" properties, Wikidata does not provide a boolean datatype. They can be replaced by putting the articles in classes like
⟨ the article ⟩ instance of (P31)   ⟨ free access content ⟩
or
⟨ the article ⟩ instance of (P31)   ⟨ paywalled content ⟩
for example. We have a licence property as well, which allows further defining classes like "free access content" by means of Q7257715, for example. author TomT0m / talk page 16:49, 31 October 2016 (UTC)
I like the idea of using copyright license (P275) as a qualifier on full work available at URL (P953). Further qualifiers could indicate alternative formats (HTML, PDF, ...), e.g. with media type (P1163). --Egon Willighagen (talk) 14:15, 5 November 2016 (UTC)
@Egon Willighagen, TomT0m: the advantage of using a top-level property for the version of record (which describes the work and is probably the only version for which we have reliable provenance data for the license), instead of a qualifier, is that it enormously simplifies querying.--DarTar (talk) 07:05, 8 November 2016 (UTC)
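To make the qualifier-based option discussed above concrete, here is a minimal Python sketch, not a definitive data model. It assumes an OAdoi-style record is available as a plain dict with the field names quoted in this thread (free_fulltext_url, license), and it builds one full work available at URL (P953) statement with copyright license (P275) and media type (P1163) as qualifiers. The function name, the shape of the output dict, and the license lookup table are hypothetical; the mapping from license strings to Wikidata items is left to the caller, and the item ID in the usage example is a placeholder, not a real item.

```python
from typing import Optional

# Property IDs named in the discussion above.
P_FULL_WORK_URL = "P953"      # full work available at URL
P_COPYRIGHT_LICENSE = "P275"  # copyright license
P_MEDIA_TYPE = "P1163"        # media type


def oadoi_to_statement(record: dict,
                       license_items: dict,
                       media_type: Optional[str] = None) -> Optional[dict]:
    """Build a P953 statement with qualifiers from an OAdoi-style record.

    `license_items` is a caller-supplied (hypothetical) lookup from the
    record's license string to a Wikidata item ID.
    Returns None when the record has no free full-text URL.
    """
    url = record.get("free_fulltext_url")
    if not url:
        return None

    qualifiers = {}
    license_item = license_items.get(record.get("license"))
    if license_item:
        qualifiers[P_COPYRIGHT_LICENSE] = license_item
    if media_type:
        qualifiers[P_MEDIA_TYPE] = media_type

    return {"property": P_FULL_WORK_URL, "value": url, "qualifiers": qualifiers}


# Usage with made-up data; "Q_LICENSE_PLACEHOLDER" stands in for a real item.
example = oadoi_to_statement(
    {"free_fulltext_url": "https://example.org/paper.pdf", "license": "cc-by"},
    license_items={"cc-by": "Q_LICENSE_PLACEHOLDER"},
    media_type="application/pdf",
)
```

The top-level alternative raised by DarTar would instead emit the license as its own statement on the item, which is simpler to query but ties the license to the work rather than to one specific URL.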

Creating many items on journal articles

  Notified participants of WikiProject Medicine

Currently, we have 200,672 papers from PubMed Central. I would like to add an additional 166,204, comprising open-access review articles from the last ~5 years. I would like to accomplish two things with this:

  • Continue to build out the Wikidata Citation Graph, which is over 500,000 connections strong and currently in the process of growing to 800,000
  • Help make high-quality biomedical literature discoverable for the benefit of the WikiProject Medicine community through the Librarybase project (my inclusion criteria are based on Wikipedia's medical reliable sources criteria).

In a few months' time, I will want to discuss having a Wikidata item for every source cited on Wikipedia (up to a certain level of granularity). That will require more research on the part of myself and the Librarybase team. But I think this would be a good set to try it out with. I will note that this import will be based on workflows that have been used many times before. Harej (talk) 15:06, 7 November 2016 (UTC)
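As a rough illustration of what the "connections" in the citation graph amount to, the sketch below treats each connection as a directed (citing, cited) pair and counts how often each item is cited. It is a toy example under stated assumptions: the identifiers are made up, and nothing here reflects how the actual import or Wikidata queries work.

```python
from collections import Counter


def citation_counts(edges):
    """Count how often each item is cited, given (citing, cited) pairs."""
    return Counter(cited for _citing, cited in edges)


# Made-up identifiers; each pair reads "first item cites second item".
edges = [
    ("paper_a", "review_x"),
    ("paper_b", "review_x"),
    ("paper_b", "paper_a"),
]
counts = citation_counts(edges)

# Items ordered from most to least cited.
ranking = counts.most_common()
```

On a real graph the same count would come from a query over the stored citation statements rather than from an in-memory list.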

@Harej: On which criteria do you select the articles which will be imported into WD? Currently the main criterion is the reuse of the article item as a reference in a statement. Is that the case? Snipre (talk) 19:41, 7 November 2016 (UTC)
Snipre, that is definitely an important use case. But I am also interested in modeling this data for its own sake; in building an open Citation graph (Q5122403), we can create an open alternative to commercial citation graphs like Web of Science, allowing us to identify "hubs" of research without the use of commercial services. I also want to help make scientific papers, especially open access papers representing reviews of the literature, more easily discoverable by connecting papers with concepts on Wikidata. Daniel Mietchen, who has been working on this with Zika, can explain more. So I would say the criteria for inclusion is the extent to which those two goals are accomplished. Yes, this casts a wide net, but we can move gradually on this. Harej (talk) 20:06, 9 November 2016 (UTC)
Harej Wikidata requirements to create new items are quite broad, but the minimal requirement is that the new item should be used in at least one other item or in WPs. I know that this rule is not always fully respected, but most of the time this concerns thousands of items. Here you are speaking about several hundred thousand new items, and the order of magnitude is about a million if we consider bibliographic evolution over time.
Your argument about an open alternative to commercial citation graphs is not relevant in the case of WD: WD is not primarily dedicated to offering an alternative to commercial data offerings but to supporting WP editing. Again, a discussion at community level should take place here in order to amend Wikidata:Notability. Snipre (talk) 14:12, 11 November 2016 (UTC)
Knowledge dissemination is a weird beast. When I read a primary article, it builds on knowledge in cited article. In fact, in many cases you refer to the cited article too, as commonly the citing article requires you to look up details in the cited article (common writing standard). Therefore, any cited article is at least as notable as the citing article. The other way around, a highly cited paper is more notable, so you actually do want to know how often it is cited, and the citing articles note how the notable article is being used. That use is, to me, notable too. When I read Wikidata:Notability I think articles that are active part of a citation network fulfill the second notability criterion: they are clearly identifiable conceptual entities and can be described using serious and publicly available references. Now, you could argue that that reference must be independent from the identifiable concept, which would imply only cited articles can be added if no sitelink exists. I can only assume the DOI of the article to be used as reference in Wikipedia falls under that category. There is a good bit of interpretation possible, but my impression right now is that the proposal is suitable and at least has my support. The argument about an open alternative is, to me, not about free alternatives, but about the ability to provide serious and publicly available references which happens to be a core value of Wikidata. --Egon Willighagen (talk) 15:30, 11 November 2016 (UTC)
Snipre, I'm fairly certain each item created here will be used in at least one other item. I'm not concerned about failing to meet notability requirements; I just want to make sure that if people have issues with the work that has been done to date that we don't make the problem worse (e.g. if there is a problem with data quality). Harej (talk) 22:49, 12 November 2016 (UTC)
  • Yeah I know that science is more of a focus here but in regards to the other parts of Wikipedia that are using full citations, this new model MUST be addressed fully. I get the ease of these very structured PubMed citations and how they are more workable, but I am concerned that there will be significant loss of data when it comes to unstructured citations in the Wild West that is Wikipedia. And trying to create pathways from citations on Wikipedia to Wikidata will be catastrophic.
And I think that most importantly, it is CRITICAL that Wikipedians (again assuming English Wikipedia apologies) are consulted and this project is not presented as (a) a done deal and (b) a rigid, lossy structure that is hard to edit (i.e., is on Wikidata, not Wikipedia) and has no documentation.
So while it may be logical to just push these PubMed citations through this system, it's not really being up front about the end goal. It's a little bit manipulative, un-transparent, dare I say, unethical at base.
You haven't gotten any sort of buy-in from editors, and there's no public notification of what is coming down the pike. I think it will upset a lot of old school editors who don't want to edit Wikidata.
We need a form similar to the RefToolbar that will reduce and/or eliminate the need to go into the Wikidata project. It needs to be so seamless people don't realize they are on a different project.
So I see a lot of problems here. Not to rain on your parade or dissuade you from this work James, but this is sort of a huge deal. And should not be undertaken without great care and consideration. This is not all about Wikidata or programming, which is what I get the sense every time these solutions are put forth. This is going to fundamentally change Wikipedia and Wikipedia editing. Wikipedia is based on citations in my mind. This will alter the experience for editors at its very heart, and unless the user experience is elegant and well thought out, people will be very unhappy. So... -- Erika aka BrillLyle (talk) 01:49, 11 November 2016 (UTC)
Erika, I don't understand how creating Wikidata items for papers used as references on Wikipedia changes how Wikipedia uses articles as references. Can you give an example of what you are worried about? People will still be able to add any reference to Wikipedia, right? --Egon Willighagen (talk) 15:34, 11 November 2016 (UTC)
@Egon Willighagen: Apologies for the delay in responding.
I focus on BLP entries on Wikipedia and am not a scientist, but if you look at the Zika virus references, the ones without PMID, DOI, etc. are the ones I would be worried about. The citations would become a franken-construction between "manual" old school cites and these automated cites of this project. And this is obvious to all I am sure, but citations without these identifiers are just as valid. Unless I am misunderstanding. I am concerned about how Wikipedia editors will be able to navigate this experience.
Yes, people will still be able to add any references to Wikipedia (I think), but eventually would all citations move to a structured union library type of repository? I think James understood the concern I was expressing and addresses this above pretty clearly. -- Erika aka BrillLyle (talk) 10:51, 15 November 2016 (UTC)
BrillLyle, my goals are outlined at m:Grants:Project/Harej/Librarybase: an online reference library, linked above. I do not, at the present time, have any plans to deviate from that. I am proposing this as a very early, preliminary step. I do not have proposals for the other steps yet because they are not ready. I can do this particular step earlier because it builds off of a substantial body of work that has already been done; biomedical journal articles, for whatever reason, have a wonderful ecosystem for this kind of work. From there there is the matter of devising models for other types of work in other fields (books, for instance, are notoriously difficult). All of this will take place on Wikidata and/or a dedicated wiki, and I am dedicated to having a public discussion on where the boundaries are. Tools to find sources for Wikipedia articles and (perhaps) build citations would make use of existing templates and systems; changing the fundamental on-wiki citation infrastructure is a huge technical challenge that we are not going to solve any time soon. But once we are ready for that, it will be something the community will have to discuss so that we get it right. ("We" being the broader Wikipedia community; I don't know yet who will actually do it.) In the meantime, discussions about tools that help Wikipedians find sources will take place on Wikipedia, since I think that is a more appropriate venue than here. (My plan is for a "microsurvey" that will be posted to several discussion pages.) Harej (talk) 19:34, 11 November 2016 (UTC)
@Harej: Apologies for the delay in responding. I am very glad to hear your thoughts on this, as well as the plans to integrate Wikipedians in the discussion with "microsurvey" etc. Very appreciative of this response. Thank you! -- Erika aka BrillLyle (talk) 10:48, 15 November 2016 (UTC)
I think @BrillLyle: just misinterpreted a few things: she's (rightly) worried that this new vision will seriously impact Wikipedia citations overnight (i.e. we put citation data here on WD, and then suddenly on WP), but here we're discussing the first part. We need data here on Wikidata, and that's what we're talking about. When the data are here (or in Librarybase) we can of course discuss brand-new tools like the RefToolbar, so that Wikipedians can edit Wikidata data on Wikipedia (which I think is the goal for everyone: making the life of Wikipedians easier, making them more efficient). But right now we are far from that, so no worry.
We are discussing the very first premise: do we want Wikidata as a huge and rich bibliographic dataset? At Wikicite, in May, we discussed a simple "universal library model":
  • A = Library of Babel, aka all documents that can be published.
  • B = Universal library, aka all documents that have been published (and recorded somewhere).
  • C = Wikipedia library, aka all documents that have been cited in Wikipedia.
  • D = Notable library, aka all documents that are notable (for Wikipedia and/or Wikidata).
Of course, D ⊆ C ⊆ B ⊆ A: each is a subset of the next. Our interest, right now, is C and/or D, but B could also be a future concern (A is a joke ;-)).
Deciding between C and D is complicated, because there is an important trade-off:
If we want C, we will have a beautifully rich, but very complicated Wikidata full of bibliographic records related to scientific articles and books (but also, maybe, videos, newspapers, websites, etc.). We gain in richness, we lose in control and simplicity.
With D, it's the other way around.
This, IMHO, is the single most important question we have to face (modeling properties is complex, but it can be done, especially with scientific/academic articles). Because it's a social, political, community decision.
Sometimes I wonder if the "federated wikibase" approach could help us in this: one of the options is to stuff all the bibliographic data in Librarybase, and the rest in Wikidata, and connect the two databases when we have ways to do that. But this approach raises as many questions as it answers.
I think the deal is pretty much this: are we OK with putting all these data here on Wikidata, or not? Where do we draw the line? Aubrey (talk) 11:02, 12 November 2016 (UTC)
Thank you, @Aubrey: You always seem to understand what I am very inelegantly trying to say, and then provide a clear and detailed illustration like this. It's amazing. I also really like the questions you ask re: the fundamentals. And that you describe this as a "social, political, community decision," not just an engineering or programming issue to fix. And yes I understand it is a significant engineering and programming issue. I am just really happy with this framework as it seems both constructive and balanced. Thanks again.... -- Erika aka BrillLyle (talk) 10:48, 15 November 2016 (UTC)
Thank you for referencing the past work done by Wikicite, Aubrey. If we had a federated wikibase system, then it would be the best of both: Wikidata data model with Librarybase wiki's inclusion policies. But without knowing how soon it's coming (and with software project time estimation being a wicked problem), I'm reluctant to wait for it. In the meantime, do we want to come up with the definitive statement of what is worth including in Wikidata for its own sake? I think there is a strong case for representing papers that are influential and are themselves summaries of other pieces of research that came before it.
Further, it may not be appropriate to frame it in terms of a presence on Wikipedia. Something being cited on Wikipedia may potentially be an indicator of notability, but given that there is not (yet) a structural need to represent, on Wikidata, every citation that appears on Wikipedia, I don't think we necessarily need to go in that direction. The Librarybase wiki can and will be used for that. Wikidata should represent papers and their networks for its own sake—as a form of knowledge, or meta-knowledge, representing how information is developed and published. If we are modeling papers for its own sake, then it shouldn't matter whether they are already cited on Wikipedia. So then how does Librarybase relate to this? If I know what's already being done on Wikidata, I won't have to do it again on Librarybase and create duplicated data. Harej (talk) 22:46, 12 November 2016 (UTC)
@BrillLyle: I think I understand your concerns, which I believe Aubrey captured well, but please refrain from using language that implies bad faith or calls other participants' behavior unethical. Let's disagree and argue vigorously – we need to make this proposal work for all editors and communities, not just Wikidata, and there will definitely be mistakes, corrections and (hopefully) several iterations to improve our workflows and documentation – but let's please maintain a civil tone and keep personal attacks out of this space, thanks.--DarTar (talk) 06:15, 15 November 2016 (UTC)
@DarTar: I have re-read what I wrote and I am a bit befuddled that what I wrote is perceived as a personal attack. This is James' project full-stop, and I appreciate the hard work and diligence with which he is undertaking the work. I am trying to very clearly express my concerns here because this is a significant project that will have a massive impact upon the Wikimedia projects. And when I say the project might be considered unethical it was meant as a concern that this is a massive scaled project that typical Wikipedians don't have a clue about. I was not saying James was unethical; concern was for the blowback of the project and its impact -- without discussion, which James very clearly is addressing. So apologies if I didn't coddle my language in dulcet tones so the concerns are more easy to swallow. As I have felt throughout my exposure to Wikidata, I am asking uncomfortable questions, but does that mean they should not be asked? Or that an active discussion that may ruffle feathers is not still valuable. I don't know. It does me no favors to put my concerns out there and stick my neck out here and via other avenues re: concerns about Wikidata usability and end-user issues. I can step away no problem but the issues will not go away. Also I have confidence James is more than able to respond as he did and doesn't need protection from a critical and/or concerned eye. Lastly I dislike having to repeat this but I am actually a big supporter of Wikidata and WikiCite. I evangelize the strengths and potentialities at every turn. But to be afraid of frank discussions I think does a disservice to all efforts. If this is consensus however I can stop participating and just focus on editing. -- Erika aka BrillLyle (talk) 10:22, 15 November 2016 (UTC)
@BrillLyle: It doesn't help the discussion to call James' behavior manipulative and un-transparent when so far the project is only affecting Wikidata. OTOH, I don't think you are asking any uncomfortable questions; they are questions that so far are far-fetched. Perhaps if some day there is a working prototype or some development that could be used in any of the Wikipedias, then the editors should look at its advantages/disadvantages. AFAIK, I only see good faith here, so please keep participating in the discussions if you think it is relevant. Normally the more input the better.--Micru (talk) 16:28, 15 November 2016 (UTC)
@Micru: Thanks for the feedback. I appreciate it. I want to clarify one thing: I was in no way calling into question James' behavior. I was questioning the impact and scope of the project itself, which I believe is a very different thing. -- Erika aka BrillLyle (talk) 05:53, 16 November 2016 (UTC)
Wanted to add that I understand I did not express my concerns in a politic or constructive manner. I in no way meant to create an environment that was upsetting to anyone. I believe in inclusive and friendly communication and I have failed at making my point in this manner. Apologies. -- BrillLyle (talk) 13:56, 16 November 2016 (UTC)

As far as I understand (please correct me if not), we are talking about 2 different things here:

  • @Harej: asked for "permission/opinion" on importing ~200k articles into WD. He thinks this is important for WD. And for WP, in the future. He argues that putting stuff here is worth it for its own sake and I personally agree with him, but I know that many people (for example Wikidata team people) have serious concerns. Putting hundreds of thousands of items here has pros and cons. This is also connected to the following:
  • @BrillLyle: is concerned about how these articles, here on WD, are useful on WP. Looking at the w:Zika virus article on WP I see a lot of cite journal and cite web templates, and from what I understand none is using WD data, even if the articles are already present here on WD. In a sense, Erika's right that at the moment there is some confusion, because it seems that we are just importing article metadata here on WD for its own sake and are not interested in WP integration. Is this the case? Fine for me, but we need to explain our roadmap better. We need to state that we (or others) will think about changing the "cite" templates in the future, and make them integrated with Wikidata. As far as I understand, this integration is not in the scope of the Librarybase project, nor do other volunteer editors interested in Wikicite have the time, skills and resources to embark on a project as difficult as that (i.e. rewriting Lua templates, integrating them in RefToolbar or Visual Editor). Aubrey (talk) 13:33, 16 November 2016 (UTC)
    • @Aubrey: This is a very accurate summary of my concerns. Thanks for clarifying this so well. It seems that this is the consistent feedback, that the resources required for this integration are significant. It is also outside the scope of James' project. Well said. -- BrillLyle (talk) 13:56, 16 November 2016 (UTC)
      • @Aubrey:@CFCF:@jmh649: Various editors on English-language WP (and other large WPs) have shown great concern about even small changes to citation formatting templates over the years. It will not be an easy sell. OTOH, translators working from that WP would almost certainly welcome the use of WD-based citations that were auto-generated from an identifier seen in that original. There's a rather scattered discussion, but this might be as good a place as any to start.LeadSongDog (talk) 19:52, 16 November 2016 (UTC)
        • @Aubrey, CFCF, BrillLyle, LeadSongDog: I have talked over the past 2 weeks to a number of people who share Erika's same concerns. I want to say that these are very legitimate questions that – in part– are broader than the scope of the present wikiproject (they are about norms and expectations on cross-project content reuse and the design of workflows or processes that may conflict with any individual community's best practices). Broader as these issues may be, however, they need to be addressed here given the way in which the WikiCite initiative has advertised itself (as a source repository built in Wikidata to serve all Wikimedia projects). I concur that even if we don't have solutions to the bigger problem of how this integration is going to happen yet, we need to start discussing this as early as possible. I have some thoughts on this I'd like to share with all of you in some form, probably not buried in this thread. I hope to have a moment over the weekend to put them all together and solicit your feedback.DarTar (talk) 03:55, 18 November 2016 (UTC)
@Aubrey, CFCF, BrillLyle, DarTar: The chief difficulty I foresee on English-language WP is one of interpreting the 2006 Arbcom ruling behind w:en:wp:CITEVAR. If approached correctly, I think this change could be accomplished in a non-disruptive manner, which could then fit within the "generally considered helpful" part of that discussion. It will, however, take great patience to get this right. Over-reaching haste will only raise hackles. It will be necessary to ensure that in each case where a carefully manicured citation is converted from project-local to WD storage it meticulously maintains unchanged appearance in the finished article. If high-visibility FA-Class articles are mangled, we will see massive resistance, and rightly so. OTOH, stub articles with a sloppy assortment of citation styles will likely welcome the change, so long as it does not degrade verifiability. LeadSongDog (talk) 17:15, 18 November 2016 (UTC)
@LeadSongDog: Agree 100%. I don't think haste is a good thing, but I am concerned that there is a lot of entrenchment -- and the very real possibility this could devolve into the ever-unproductive Wiki Rulez experience that many folks are fond of using to justify hostility to content and change. If there is no WP:BOLD, this initiative will stall and become nightmarish.
That said: If the process was communicated effectively to end-user Wikipedia editors as being about improving citation creation (i.e., RefToolbar/Visual Editor cite create forms that are actually Wikidata input boxes #wishlist), making it easier to create and re-use citations easily across Wikis (and do API lookups via SPARQL, etc.), I think editors would be very receptive. The big issue (for me) has been a lack of communication and transparency in the process -- and absence of cheatsheet type documentation (see: Authority control). Letting folks know and being responsive to concerns is really key for buy-in. And guiding them through the process as it is tested and implemented, also key. It is not enough to just do something technical and let the end-users catch up. Not for something this impactful. To me: Wikipedia is citations, it's the cornerstone of all of the information.
If citations are built using piped formats found in established citation templates, it would be a matter of curating pathways and having no lost data in the conversion process from Wikipedia to Wikidata. Not a trivial amount of work, of course. And Wikidata does not seem to be built to have this much granularity at this point. Is the unique identifier the key element, like with the PMID project? These piped citations could be a similar test-case for what James is doing with this PMID project. Looking at WP:CITEVAR, I probably violate this with my systemized approach to citations as well, as I attach a unique identifier using the "ref name=x" for collocating citations. I think a unique identifier would really help. -- Erika aka BrillLyle (talk) 21:46, 18 November 2016 (UTC)

I am a bit late here. I think it is a good idea to import all these articles. I appreciate James Hare's work. It is not necessary to listen to Wikipedians. Whether the Wikipedians want to use the data from Wikidata or not is up to them. For those with an interest in that direction: I ran into a page on the Russian Wikipedia that had made some connection to Wikidata bibliographic data. — Finn Årup Nielsen (fnielsen) (talk) 21:14, 19 December 2016 (UTC)

Thank you all for commenting thus far. I am interested by how this discussion turned out. While it is true that this work will not affect Wikipedia right away, it is important to keep their needs in mind if we want to eventually have this as a resource for them. I especially like LeadSongDog's comment: "If high-visibility FA-Class articles are mangled, we will see massive resistance, and rightly so." This highlights the need for the conversion from free text citation to structured data to be lossless, that one could easily convert from one format to another. I know at Wikicite there were discussions on how not only to model individual works, but also how to model citations, i.e., the event of Wikipedia Article A citing Paper B. When Librarybase models citations, it will absolutely need to preserve all this data. However, for Wikidata's purposes, we are interested in works in the abstract; a given citation will either be to a work on Wikidata, or to a different work. Mapping citations to sources correctly will be crucial once we get to it. But because this import is not necessarily based on Wikipedia, we do not have to worry about it at the moment. Harej (talk) 20:53, 27 December 2016 (UTC)

Springer

I am unsure about "Springer". The company structure is not clear to me. The link.springer.com says "Springer International Publishing AG. Part of Springer Nature.", see, e.g., [1]. On the other hand [2] about "Springer International Publishing AG" says "Springer, Part of Springer Science+Business Media". We do not have an item about Springer International Publishing AG it seems. "Springer" journals may also be an issue. What should we do? — Finn Årup Nielsen (fnielsen) (talk) 20:43, 12 November 2016 (UTC)

Organizations are a hard beast to model. To quote Geoff Bilder: unlike humans, organizations can do a number of irritating things. I am hoping the Org ID project will help solve these issues (that affect not just publishers, but orgs, companies, institutions in general) in a manageable way. You can read more about the initiative here. In the short term, I believe we should settle on some simplification based on external publisher registries.--DarTar (talk) 06:31, 15 November 2016 (UTC)

Wikidata Content Schema

I have a proposal for a Wikidata Content Schema, a concept for representing different data models on Wikidata. I devised it with WikiProject Source MetaData in mind; I would like to create schemas for representing different types of contents such as books, articles, etc. But first we need an underlying system. Please comment on the linked page. Harej (talk) 01:07, 13 November 2016 (UTC)

2016 Community Wishlist Survey proposal regarding citation quality and the reliability of sources

Greetings to everyone concerned about the reliability of sources used in the Wikipedia. For the 2016 Community Wishlist Survey, I have created a proposal that addresses some aspects of this called "Citation quality assessment". Please check it out, and consider giving the proposal your support in the two-week voting period beginning November 28 (Monday). Any ideas to improve upon the proposal are also very much welcome. Stevietheman (talk) 19:36, 27 November 2016 (UTC)

What should we do with bioRxiv articles later being "published"?

What should we do with bioRxiv articles later being "published"? In this situation two items might have been created because they have different DOIs, e.g., A practical guide for improving transparency and reproducibility in neuroimaging research (Q22914736) https://doi.org/10.1101/039354 and A Practical Guide for Improving Transparency and Reproducibility in Neuroimaging Research (Q27826377) https://doi.org/10.1371/journal.pbio.1002506 . I have merged two such items before. I suppose that is the ok way to do it? One issue is the publication date (P577) though. Any other opinions on whether we should merge or not? — Finn Årup Nielsen (fnielsen) (talk) 13:25, 13 January 2017 (UTC)

As I understand it, this project is intended to support citations. When works are republished there may be differences; the text may be corrected, the page numbers may be different. Even if these are the same, if the publisher and the publication date are different, a person who is accessing the source to see if the statement in Wikidata or Wikipedia is really true may wonder if there were changes between two different versions. So different publications should not be merged. Jc3s5h (talk) 14:36, 13 January 2017 (UTC)
For papers on Arxiv there might be different versions and the last version might be published in a journal. In this case we would rarely make two (or three!?) entries for the paper but just use the journal version as the canonical one and provide an Arxiv link with arXiv ID (P818). Usually Arxiv papers are formatted the same as the journal version and links might be available from Arxiv to the journal version. Splitting such an article into several Wikidata items seems to me to be an unnecessary babelification. — Finn Årup Nielsen (fnielsen) (talk) 13:16, 26 January 2017 (UTC)

Cross-post: Canonicalizing DOIs

Please see Property talk:P356#Canonicalizing DOIs and comment there. Harej (talk) 00:44, 15 January 2017 (UTC)

WikiCite 2017 applications are open through February 27

Hey all, we just announced that applications for WikiCite 2017 are open. WikiCite 2017 is a 3-day conference, summit and hack day to be hosted in Vienna, Austria, on May 23-25, 2017. It expands efforts started last year with WikiCite 2016 to design a central bibliographic repository, as well as tools and strategies to improve information quality and verifiability in Wikimedia projects. Our goal is to bring together Wikimedia contributors, data modelers, information and library science experts, software engineers, designers and academic researchers who have experience working with citations and bibliographic data in Wikipedia, Wikidata and other Wikimedia projects. Members of this WikiProject are the most relevant and knowledgeable community in Wikidata on the topic of structured data for sources and it would be fantastic to see you there. We have (limited) travel funding available and I'd like to encourage you all to submit an application if you're interested in contributing. This year's event will be held at the same venue as the Wikimedia Hackathon and we'll be able to accommodate up to 100 participants. Any questions? Get in touch with the organizers at: wikicite@wikimedia.org --DarTar (talk) 18:18, 11 February 2017 (UTC)

Indicating the institution of a thesis

This Wikiproject might be interested in a recent bulk upload of data about 3238 doctoral theses from Oxford University, all of which have full text available online. I have yet to link theses to their notable authors, but that is coming soon. There is also this discussion about whether the institution of a thesis should be indicated with a property or with a qualifier. MartinPoulter (talk) 15:52, 7 March 2017 (UTC)

Please add long titles as well as short ones.

When adding a DOI, could you please add long titles, too? Example diff: https://www.wikidata.org/w/index.php?title=Q30050797&diff=prev&oldid=489716912 Jodi.a.schneider (talk) 15:56, 25 May 2017 (UTC)

Rename to Wikidata:WikiCite

The name of this project page only makes sense if you are familiar with the history of naming habits in Wikimedia projects ("WikiProject"? "MetaData?" WTF?!). I'd prefer to rename it to Wikidata:WikiCite. Renaming requires a user of group Wikidata staff or Translation admins. What do you think? -- JakobVoss (talk) 15:19, 14 June 2017 (UTC)

One goal which this project has but which Wikicite does not have is to make systems for moving information from sources into Wikipedia infoboxes. Now that there have been some more years of conversation, I think that it might be better to drop that as a goal and instead become aligned with the Wikicite goals. The combining of Wikidata and other Wikimedia projects is a separate issue of which Wikicite is a part, and Wikicite might address some points for that project, but I think now it is not for Wikicite alone nor for this project alone. Is there already a Wikicite WikiProject? If not, then yes, I think this group could become it. Blue Rasberry (talk) 16:25, 14 June 2017 (UTC)
Fine with me. --Daniel Mietchen (talk) 18:36, 14 June 2017 (UTC)
Makes sense to me. - PKM (talk) 19:12, 14 June 2017 (UTC)
I would not feel comfortable with this rename without checking with Dario first.... -- BrillLyle (talk) 19:47, 14 June 2017 (UTC)
@Dario_(WMF): @DarTar: — Finn Årup Nielsen (fnielsen) (talk) 10:04, 13 July 2017 (UTC)
@BrillLyle, Fnielsen, JakobVoss, Daniel Mietchen, Bluerasberry, Pigsonthewing: So sorry I missed this. My 2 cents: I do agree proliferation of names loosely related to WikiCite is a problem, and I'd be in favor of renaming this for consistency. However, there's a perception shared by many in the community that Wikidata:WikiProject Source MetaData is about scholarly papers only and that as a result, for example, Wikidata:WikiProject Books is not within its scope. This is a problem. I wouldn't want to perpetuate with a rename the idea that WikiCite is about scholarly citations and scholarly work metadata only. This is the reason why I suggested we put some effort into creating a portal and a set of templates for bibliographic data modeling purposes. I think a WikiCite project could complement really well a portal, if it engaged volunteers and librarians interested in modeling bibliographic properties for any kind of work. What do people think? --Dario (WMF) (talk) 12:36, 13 July 2017 (UTC)
Abstain, though I think "source metadata" is more widely understood than "Wikicite". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:08, 14 June 2017 (UTC)
Whatever is more simple for the user. Aubrey (talk) 07:12, 15 June 2017 (UTC)
I collected all related pages at Category:WikiCite but none is named Wikidata:WikiCite. -- JakobVoss (talk) 08:38, 16 June 2017 (UTC)
Makes sense to me. Sj (talk) 03:10, 17 June 2017 (UTC)
Fine by me. — Finn Årup Nielsen (fnielsen) (talk) 11:43, 13 July 2017 (UTC)
I agree with Aubrey - whatever is easier for the end user. Personally I find the capitalized D in Metadata to be more problematic and weirdly off-putting. It's very clear there is a close relationship between these projects / initiatives. I think as long as they somewhat clearly link to each other it is fine. But yes, I agree that a focus on end-user usability is the priority. Best, -- Erika aka BrillLyle (talk) 16:38, 13 July 2017 (UTC)

Why limited?

Why is this Wikiproject limited to science? Is it possible to open it up for other fields like history, religion, etc? I have recently proposed a similar thing in which we fetch citation parameters from Wikidata. Capankajsmilyo (talk) 16:48, 14 August 2017 (UTC)

@Capankajsmilyo: Where did you see a statement that this be limited to science? In any case, it's open to all fields of knowledge, and if that is not clear where people look, we'll have to make it clearer. --Daniel Mietchen (talk) 21:31, 27 August 2017 (UTC)
I saw the video and example. Capankajsmilyo (talk) 01:57, 28 August 2017 (UTC)
It's also listed under "Science WikiProjects" in the WikiData WikiProjects directory.--Carwil (talk) 17:37, 18 September 2017 (UTC)

Cite Q template nominated for deletion on English Wikipedia

There's an active conversation of the future of a Wikidata-driven citation template ({{cite Q}}) on English Wikipedia here: en: Wikipedia:Templates for discussion/Log/2017 September 15--Carwil (talk) 17:42, 18 September 2017 (UTC)

Indicating canonical paper for topic

How would one indicate the canonical paper for a topic? For instance, that the canonical paper for Stanford CoreNLP (Q32998961) is The Stanford CoreNLP Natural Language Processing Toolkit (Q32999098)? Do we need a new property? Should we use described by source (P1343)? Finn Årup Nielsen (fnielsen) (talk) 11:34, 13 July 2017 (UTC)

I suppose a topic could have multiple canonical papers, e.g., Natural Language Toolkit (Q1635410) could be said to have NLTK: the natural language toolkit (Q30452988) and Natural Language Processing with Python (Q28193986). Finn Årup Nielsen (fnielsen) (talk) 11:37, 13 July 2017 (UTC)
I think described by source (P1343) is sufficient here. --Daniel Mietchen (talk) 20:29, 25 July 2017 (UTC)
I was thinking about described by source (P1343). I am unsure about it. It seems it is mostly for encyclopedias. — Finn Årup Nielsen (fnielsen) (talk) 09:38, 29 July 2017 (UTC)
I have started to use described by source (P1343). — Finn Årup Nielsen (fnielsen) (talk) 14:45, 26 September 2017 (UTC)

Usability test…

So, as an academic curious about WikiCite, I ran an article of mine through the process to see how it worked. I produced Q38229173. The Zotero to QuickStatement process stored my name as an "author name string." Of course this means losing some of the metadata on first and last names. Then, I tried the {{Cite Q}} template on Wikipedia. Result: no author at all.

Back here on Wikidata, there's no straightforward guide to entering a citation, but Wikidata:WikiProject_Source MetaData does have a list that includes author (P50). And it requires another Wikidata item for each author. Okay, great: Q40230904. So now I try to populate my personal author data item with my name, and… double error: neither my first name nor my last name is a Wikidata item. Moreover, suppose that I create my family name, Bjork-James, as a new Wikidata item (and avoid any further difficulties). But also I include my family name at birth as a property of Q40230904. How does {{Cite Q}} know which family name to include in the citation?

Academics, bibliographers, and style manual writers have a solution to this problem: use the author information in the publication in your citation! Which makes first name, last name, and name order properties of the article itself. In the rush to simplify, it seems WikiCite has chosen to ignore all these difficulties and build a very difficult-to-use architecture as well.--Carwil (talk) 22:06, 18 September 2017 (UTC)

If you have the DOI for the paper in your ORCID profile, an author item link will be added to the item about the paper (and the "author name string" removed) as soon as the bot gets around to it (or it can be triggered manually, here, as I have just done). As to your surnames, Cite Q (which, as you acknowledge elsewhere, is a prototype) currently uses the label, not the properties. I see no "rush to simplify"; just a solve-the-majority-before-the-edge-cases approach typical of the wiki way of working. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:04, 18 September 2017 (UTC)
This is very useful, and points to WikiCite's "missing manual" problem. Is that being worked on? And is this page the crossroads for WikiCite?
But on names, the "solve-the-majority-before-the-edge-cases approach" is going to get this project into a motherlode of trouble. My situation is not an edge case. It's the situation of everyone who changes their name upon marriage. And the assumption that FamilyName,GivenName belongs in a citation is not about "non-edge cases" but about non-Hispanic Western countries (which happen to dominate academic publishing). For the vast variety of such cases, see: W3C, Personal names around the world. If you don't build a schema that recognizes and includes such cases from the ground up, you will create a system that gets entire continents of authors wrong.
Even worse, the user interface implications of not being able to input your own name (or other vital aspects of your identity) are a good red flag for designing an effective system. See these presentations on user interface design: Hello, my name is __________. and Designing forms for gender diversity and inclusion.
I know this is all still in alpha, but that's the best time to chart out the space you will be working towards.--Carwil (talk) 00:19, 19 September 2017 (UTC)
"the assumption that FamilyName,GivenName belongs in a citation" Nobody made that assumption; the opposite is true. A label can hold any form of name. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:20, 19 September 2017 (UTC)
So the citation format (and here I'm referring to {{cite Q}} on English Wikipedia) is just a comma separated list of author's full Wikidata names? Whether or not those names are the ones listed in the cited article? And with no parsing of family names? (If I'm having to guess and check here, it's because there's no documentation.) This does avoid many cases becoming a problem, but at the expense of inverting reference list expectations. Also, author-date citations have to be figured out by hand by the editor? Isn't using data to automatically create appropriate citation references the selling point of the whole system?--Carwil (talk) 23:29, 19 September 2017 (UTC)

Author names

I'm coming to the conclusion that we need to vastly increase the use of object named as (P1932) qualifiers on author (P50), on items about works, and to encourage colleagues to copy values from author name string (P2093), rather than discarding them when a P50 is added. We currently have <59K P1932 qualifiers.

For example, on Updated world map of the Köppen-Geiger climate classification (Q21128894), the label for Murray Peel (Q40701704) in most languages is "Murray Peel"; yet the work credits him as "M. C. Peel", and it is the latter value which should be used in citations. The issue is compounded where an author has a transliterated name, or has changed surname on marriage, divorce, or for whatever reason.

As can be seen on old versions of the above example, the series ordinal (P1545) qualifier ties together the two forms of identity for a single person.

At the very least we need a temporary moratorium on removing P2093 values when a P50 is added, so that we do not discard data while we discuss the best model to use. @Magnus Manske: Please could you update your bot in this regard?

I have deactivated the P2093 removal. --Magnus Manske (talk) 08:49, 26 September 2017 (UTC)

It is possible that this could apply to things like journal names, also, but that is likely to be an issue less frequently; and to be easier to resolve programmatically.

@Daniel Mietchen, DarTar, Fnielsen, Egon Willighagen, Zuphilip: with whom I have been discussing this; and @RexxS: who has kindly been helping with the 'Cite Q' template on en.Wikipedia, which is a prototype for calling Wikidata citation metadata into a sister project.

What does everyone think is the best way forward? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:36, 23 September 2017 (UTC)

So the recommended order of precedence for rendering author names in citations would be:

  1. object named as (P1932) qualifier on author (P50)
  2. author name string (P2093)
  3. author (P50) label in local language
  4. author (P50) label in any other language

Also; regardless of which of the above is used, link to the author biography article if available, via author (P50). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:16, 23 September 2017 (UTC)
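The order of precedence listed above can be sketched in a few lines of Python. This is only an illustration: the flat dictionary used for an author entry is a hypothetical simplification (it is not the Wikibase JSON format), and the function name `render_author_name` is made up for this sketch.

```python
from typing import Optional


def render_author_name(author: dict, local_lang: str) -> Optional[str]:
    """Pick a display name using the four-step precedence from the list above.

    `author` is a hypothetical, simplified record with optional keys:
    "P1932" (object named as), "P2093" (author name string) and
    "labels" (a mapping of language code to label of the P50 item).
    """
    # 1. object named as (P1932) qualifier on author (P50)
    if author.get("P1932"):
        return author["P1932"]
    # 2. author name string (P2093)
    if author.get("P2093"):
        return author["P2093"]
    labels = author.get("labels", {})
    # 3. author (P50) label in the local language
    if labels.get(local_lang):
        return labels[local_lang]
    # 4. author (P50) label in any other language (first by language code)
    for lang in sorted(labels):
        if labels[lang]:
            return labels[lang]
    return None


# The Murray Peel example from this thread: the work credits "M. C. Peel".
peel = {"P1932": "M. C. Peel", "labels": {"en": "Murray Peel"}}
print(render_author_name(peel, "en"))
```

With the qualifier present the sketch returns "M. C. Peel"; if the qualifier and name string are both absent it falls back to the label, first in the local language and then in any other.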

For the first question, I definitely prefer qualifiers in an author (P50) statement to indicate how an author's name is stated in this particular paper. Doubling the author statements with author (P50)- and author name string (P2093)-property and then linking these two just by a number seems very fragile and complex for querying.
But, IMO using object named as (P1932)-qualifier should state the name of the author as it is in the paper, independently of any rules from citation styles. There are all kinds of citation styles which require the full first names, just initials, normal first-name-then-last-name order, or inverted order (and e.g. Spanish names are known to be hard to split into first and last names). I don't think that one text string can help us to do all these citations. --Zuphilip (talk) 10:20, 23 September 2017 (UTC)
Yes, it is important to distinguish between
  1. how the information is represented in Wikidata (which should be discussed here) and
  2. how it should be rendered for downstream uses (which would have to be discussed there, e.g. on a given Wikipedia).
Ad 1, I think this example represents current best practice on Wikidata. The main problem I see with it is that different sources (e.g. publisher PDF, publisher HTML, publisher API, Crossref API, PubMed API etc.) may provide different author name strings (e.g. John Smith vs. J. SMITH vs. Smith J etc.), but references can not be tied to a specific qualifier value. --Daniel Mietchen (talk) 10:28, 23 September 2017 (UTC)
"qualifier should state the name of the author as it is in the paper" Agreed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:02, 23 September 2017 (UTC)
It is a serious mistake to think that downstream use is only an issue for other sites. Wikidata may be one of the easiest databases to put information into, but unfortunately it is one of the most difficult to extract that information from. More thought needs to be put into how use by other users may be facilitated, otherwise this huge store of data will remain untapped. For example, if you decide that names are stored as a single string to save a little effort here, you'll force every single downstream user to develop their own parsing of the string for the circumstances when that name is required to be in two parts. That's a colossal waste of developer time across multiple destinations for what benefit? The better the quality of the data that goes into a database, the more usable it is.
With regards to Andy's musings above, on English Wikipedia it is perfectly acceptable to use either "Murray Peel" or "M C Peel" in citations - in fact it's been encouraged for as long as I can remember to supply the most detail possible in a citation on the grounds that electrons are far cheaper than printer's ink, so there's little incentive in any online medium to abbreviate. --RexxS (talk) 12:49, 23 September 2017 (UTC)
Interesting points, User:RexxS, but it's not just a case of "M C" vs. "Murray"; note also my comment that "The issue is compounded where an author has a transliterated name, or has changed surname on marriage, divorce, or for whatever reason". And that Murray Peel (Q40701704) has both family name (P734) and given name (P735) values. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:48, 23 September 2017 (UTC)
@Pigsonthewing: I hope I didn't give the impression that I was merely quibbling about using "M C" or "Murray", but I wanted to make clear that the finer the granularity of the data which is stored, the richer it is, and the easier it is for downstream users to use it. I think you may want someone to write some code for use in enwp that looks for the presence of family name (P734) and given name (P735) in an item linked by author (P50) and uses them in preference to author name string (P2093) or the author's label, although I would expect such code to be able to make use of the label when transliteration is needed. Obviously, the more information that is stored, the more choice we get on how we can use it, so I'd echo your request above not to throw information away. It's not difficult in code to follow an author (P50) link and look for names, and that is not hampered by the presence of an author name string (P2093), which may well be a useful fall-back in some circumstances. --RexxS (talk) 17:21, 23 September 2017 (UTC)
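A rough sketch of the lookup described in the comment above, written in Python rather than as the Lua module an enwp template would actually need. The input shapes are hypothetical simplifications: `author_item` stands for the item linked by author (P50), with name values already resolved to plain strings, and `name_string` stands for a P2093 value.

```python
def author_name_parts(author_item, name_string=None, lang="en"):
    """Prefer structured names, then the name string, then the label."""
    given = author_item.get("P735")    # given name, as a plain string (assumption)
    family = author_item.get("P734")   # family name, as a plain string (assumption)
    if given and family:
        return {"given": given, "family": family}
    if name_string:
        # Fall back to author name string (P2093), kept as one literal
        return {"literal": name_string}
    label = author_item.get("labels", {}).get(lang)
    return {"literal": label} if label else None


peel_item = {"P735": "Murray", "P734": "Peel", "labels": {"en": "Murray Peel"}}
print(author_name_parts(peel_item, name_string="M. C. Peel"))
```

Returning the name as separate parts when they exist leaves the choice of order and punctuation to whichever citation style is rendering it.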
How would the local language be determined? languages spoken, written or signed (P1412), original language of film or TV show (P364), language of work or name (P407), or something else? Lagewi (talk) 10:37, 23 September 2017 (UTC)
That's "local to the person or system using the citation". So on English Wikipedia, {{Cite Q}} would use the English label by preference; on French Wikipedia, the French label, and so on. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:02, 23 September 2017 (UTC)
This does not really resolve the issue, but I would like to point to a comment by Donald Knuth (Q17457) that involves citation: "I try to make the indexes to my books as complete as possible, or at least to give the illusion of completeness. Therefore I have adopted a policy of listing full names of everyone who is cited." [3].  – The preceding unsigned comment was added by Fnielsen (talk • contribs) at 12:13, 23 September 2017 (UTC).
Would we want to use "stated as" in all cases? I can see editors only using it when it differs from the preferred label in the language of the work, but of course labels can change over time. - PKM (talk) 20:14, 23 September 2017 (UTC)
...and over languages. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:45, 23 September 2017 (UTC)

Question: Instead of object named as (P1932), should that be subject named as (P1810)? What is the difference between the two? —seav (talk) 16:19, 24 September 2017 (UTC)

  • I think a different approach here would be good: rather than storing the author names as text, store the name formatting that the paper is using, and figure out ways that we can then derive the full reference formatting from that and the info in the entries about the authors. It reduces data duplication and storage complexity, and might also make it easier to show the reference with different formats. Thanks. Mike Peel (talk) 22:53, 24 September 2017 (UTC)
    • With author names, a proper citation in most English-language formats requires knowing three things: how the name is stated in the published article, how that name is broken into given name and surname, and the style of the name being used (e.g., is the name normally written surname-first or surname-last). The third quantity determines whether a comma appears in the surname-first citation styles that dominate proper citation in English.--Carwil (talk) 01:38, 25 September 2017 (UTC)
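The comma rule in the last comment can be illustrated with a minimal Python sketch. The function name and the `family_first` flag are hypothetical; the flag stands for "this name is normally written surname-first", which is information the sketch assumes has been stored somewhere alongside the name parts.

```python
def inverted_name(given: str, family: str, family_first: bool) -> str:
    """Format a name for a surname-first citation style.

    A name normally written surname-last is inverted, so it gets a comma.
    A name normally written surname-first is already in citation order,
    so no comma is inserted.
    """
    if family_first:
        return f"{family} {given}"
    return f"{family}, {given}"


print(inverted_name("Murray", "Peel", family_first=False))
print(inverted_name("Zedong", "Mao", family_first=True))
```

The second name is only an illustrative example of a name conventionally written surname-first; it does not come from the discussion above.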

Wrt. Magnus Manske's deactivation of author name string (P2093) removal: the P2093 value is often not sufficiently correct to be used for object named as (P1932). Take Software tools for analysis and visualization of fMRI data (Q32145705): a bot adds "Cox RW" as author. After Manske's resolver we got Robert W. Cox (Q26236233) with qualifier object named as (P1932) set to "Cox RW" while the original paper states "Robert W. Cox" [4]. — Finn Årup Nielsen (fnielsen) (talk) 12:14, 6 October 2017 (UTC)

That's common-or-garden GIGO. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:28, 16 October 2017 (UTC)

How to make a list of journals?

There are a few examples of lists of journals, e.g., "Beall's list" and "Big five" and Nature Index [5]. How can we represent information about this relationship? catalog (P972) would be a candidate. I note that GerardM points to the use of catalog (P972) for Black Lunch Table (Q28781198) and Colección Patricia Phelps de Cisneros (Q27430435). — Finn Årup Nielsen (fnielsen) (talk) 12:24, 6 October 2017 (UTC)

Lists of journals are instances of bibliography (Q1631107) or its subclasses. The connection between a journal and a list could be made by part of (P361) or a more specific subproperty. Both catalog (P972) and collection (P195) may suit and both are already used for this purpose. The difference seems to be having a copy of the journal or only its description. -- JakobVoss (talk) 09:32, 11 October 2017 (UTC)

Names in botany

Hoi, authors of publications for taxons in botany have a fixed format. There is also a fixed format that indicates all the relevant authors. Is this a proper place to raise this issue as well? Thanks, GerardM (talk) 11:12, 23 September 2017 (UTC)

If you refer to the above section, then it is about storing the names as used in the work in Wikidata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:22, 23 September 2017 (UTC)
I do not understand your point. Thanks, GerardM (talk) 12:29, 6 October 2017 (UTC)
I do not understand your question, GerardM. Could you please be a little bit more specific? --Succu (talk) 19:02, 21 October 2017 (UTC)
The format for authors in botany is like "(Britton & Rose) Luetzelb.". How do we annotate this in Wikidata? Thanks, GerardM (talk) 20:59, 21 October 2017 (UTC)
See Melocactus bahiensis (Q290609). (Via taxon author (P405) and basionym (P566)) --Succu (talk) 21:06, 21 October 2017 (UTC)

Percentage of contribution

In multi-author research papers, the percentage contribution of each author varies. More recently, authors have started specifying the percentage of contribution, and sometimes state that despite the order, their contributions are equal. Do we introduce some qualifiers like 'percentage of contribution' etc. to represent such information? Is there any other way? John Samuel 18:48, 21 October 2017 (UTC)

Discussion on Wikidata integration with Wikisource's Proofreadpage index template

Here's an ongoing discussion on it that might interest the contributors of this project. ~nmaia d 01:00, 31 October 2017 (UTC)

How do we deal with ghostwritten documents?

Trump: The Art of the Deal (Q7847758) uses statement disputed by (P1310) but doesn't express the information that the paper is ghostwritten and whose name is on the cover.

Martin Keller (Q6775864) role in Study 329 (Q34082892) is a similar case but there the status of the authorship seems to be in less dispute.

Verfassung und Verfassungsvertrag (Q2515430) is another important case.

Likely a good solution would be to use subject has role (P2868) in some way. Is there existing prior art about how to model this? How do we want to model the relationship? ChristianKl (talk) 15:25, 5 November 2017 (UTC)

How will "subject has role" identify a ghostwriter? Thanks, GerardM (talk) 15:28, 5 November 2017 (UTC)
Being a ghostwriter is a role. However, being a ghostwriter implies having actually written the paper in question without being named on the official author list. The charge against Keller is that he hasn't written the paper. "Plagiarist" would be a word that's more in line with the academic view of the behavior of passing off writing that one hasn't written under one's own name, but there might be a better label that librarians have for the status. ChristianKl (talk) 15:38, 5 November 2017 (UTC)
After thinking about it, it should probably be "Object has role" given that the author is the object. ChristianKl (talk) 16:33, 5 November 2017 (UTC)

The Source MetaData WikiProject does not exist. Please correct the name. ChristianKl (talk) 18:50, 9 November 2017 (UTC)

  • The people who wrote "fraud" wrote a comment instead of an answer. Given that the person who wrote the top-voted answer brought links to papers I went through the papers and it seems like "honorary author" is the term most commonly used (and I renamed the item correspondingly). https://www.ncbi.nlm.nih.gov/pubmed/9676661 seems to define the terms in the paragraph "Objectives.— To determine the prevalence of articles with honorary authors (named authors who have not met authorship criteria) and ghost authors (individuals not named as authors but who contributed substantially to the work) in peer-reviewed medical journals and to identify journal characteristics and article types associated with such authorship misappropriation."
If you want another name, I suggest you search for papers that use another name to talk about the phenomenon. ChristianKl () 01:35, 11 November 2017 (UTC)
You are wrong; there are two phenomena here. There is indeed the "honorary author", who is mentioned as a thank-you. This is not the principal author. Then there is what is supposed to be the principal author, who is a fraud because he is NOT the author at all. His paper is written by special interests. Thanks, GerardM (talk) 07:34, 11 November 2017 (UTC)
Do you dispute that the peer reviewed papers that talk about the problem use the term "honorary author"? ChristianKl () 14:46, 12 November 2017 (UTC)

Why not have a property for "ghostwriting"? I think that there are too many cases (hundreds of thousands in the near future?) to capture this under author (P50). A related concept is campaigns which have astroturf backing, like National Smokers Alliance (Q6978472). This was supposedly a citizen's effort to lobby for smokers rights, but was secretly backed by a tobacco company. I wonder if there is some commonality between the hidden writing of a document and the hidden organization of any sort of work, like one company operating a secret shell, or any individual doing any project without associating their name with the public presentation of the project. Blue Rasberry (talk) 19:38, 14 November 2017 (UTC)

There are two sides of ghost-writing. In the vocabulary of the linked paper, there are honorary authors (who are named, but haven't written) and ghost authors (who aren't named but who have written). If you start calling the honorary authors (who are named, but haven't written) ghost writers (or ghost authors) there's a good chance this will confuse someone.
Even when there are hundreds of thousands of both, we unfortunately don't have data in that quantity. Given the nature of the matter, people usually don't publish information about either.
Let's imagine we would have a property named P:honorary_author; what do we do with the original author statement? Completely removing it feels like throwing away data. Deprecating would likely be the best way to deal with it. That means that you basically have a deprecated author statement every time P:honorary_author is used. To me that feels more complex than storing that information as a qualifier. Given that we store the authors with series ordinal (P1545), there's also an interest in having all the kinds of authors in one property.
One advantage of a qualifier is also that it's possible to easily model shades of gray where an author is something between a full ghost-writer and a real author by introducing additional items. ChristianKl () 22:59, 14 November 2017 (UTC)
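As a minimal sketch of the qualifier-based model argued for above, assume every kind of author sits in one list of author statements, ordered by series ordinal (P1545) and optionally carrying a role. Everything in the Python below is hypothetical: the record layout, the placeholder author names, and the role strings "honorary author" and "ghost author" (terms taken from the paper quoted earlier in this thread) stand in for real statements, qualifiers and items.

```python
# Hypothetical, simplified author statements for one work. Each entry stands
# for one author (P50) statement with a series ordinal (P1545) qualifier and
# an optional role qualifier. Ordinals are strings, as qualifier values
# usually are; the ghost author is not on the byline and has none.
statements = [
    {"author": "Author B", "series_ordinal": "2", "role": None},
    {"author": "Author A", "series_ordinal": "1", "role": "honorary author"},
    {"author": "Author C", "series_ordinal": None, "role": "ghost author"},
]


def byline(stmts):
    """Names as printed on the work: everyone except ghost authors, in order.

    Assumes every named author has a series ordinal.
    """
    named = [s for s in stmts if s["role"] != "ghost author"]
    named.sort(key=lambda s: int(s["series_ordinal"]))
    return [s["author"] for s in named]


def writers(stmts):
    """People who actually wrote the text: everyone except honorary authors."""
    return [s["author"] for s in stmts if s["role"] != "honorary author"]


print(byline(statements))
print(writers(statements))
```

Because nothing is deprecated or moved to another property, the printed byline and the list of actual writers both come from the same statements, and an intermediate role can be added later as one more role value.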