Wikidata talk:WikiProject Books/2020

Labels for edition items, use of parentheses

In Project chat recently, I noticed a comment that we shouldn't use parentheses in labels for items. I looked at the Help:Label page and sure enough, in a section titled Disambiguation information belongs in the description, it does appear to discourage the use of parenthetical information to disambiguate in the label.

I have been routinely adding parentheticals such as (1st edition) to the labels for edition items and find them helpful. And many examples on the WikiProject Books page do as well. I'm not sure what to make of this. How else would we distinguish editions from works in the label? Comments? --Lrobare (talk) 21:19, 12 September 2019 (UTC)

I don't distinguish works from editions in the label; I do that in the description. - PKM (talk) 23:15, 12 September 2019 (UTC)
I personally add the edition number in parentheses for book editions. Just type CRC Handbook of Chemistry and Physics in the search engine: without the edition number in the label you will see the same label 7 times. The same happens when you select a value while creating or modifying a statement. I think that for editions this information is important, especially if the item can be reused several times, as is the case for books used in references. Snipre (talk) 10:10, 13 September 2019 (UTC)
The label should name the book; the description should give the edition particulars. Both fields show up in a search. @Snipre:, the example you give is a good reason not to use parentheses. The book title is so long that the parenthetical information is off the edge of the search window. --EncycloPetey (talk) 17:28, 13 September 2019 (UTC)
@EncycloPetey: I don't know the resolution of your screen, but in my case I am able to select the right edition item directly from the search window. The label is a Wikidata field used to identify the item in the Wikidata environment. If you want to use the title in any kind of external application, you should use the appropriate property, title (P1476). Why did we create that property if the label and the value of title (P1476) are exactly the same? Snipre (talk) 19:27, 13 September 2019 (UTC)
Because labels are meant to be read together with descriptions. --- Jura 19:34, 13 September 2019 (UTC)
In German I use a full stop as a delimiter, see e.g. Taxonomic literature : a selective guide to botanical publications and collections with dates, commentaries and types (Q56649865). --Succu (talk) 19:47, 13 September 2019 (UTC)
We don't use parentheses to distinguish songs of the same title, or films of the same title, or TV episodes of the same title. Search for "Tobacco Road" as an example. It's helpful to enable the "Descriptions" gadget in your preferences so that descriptions appear on hover. - PKM (talk) 20:14, 13 September 2019 (UTC)
@Snipre: "Why do we create that property if the label and the value of property title (P1476) are exactly the same?" It will not always be the same. The label and title will probably be the same in the language of publication, but may be different in other language labels. Also, some editors use the library record form of the title, which drops initial articles from the title, such as "the", "a", "le", "una", or "el". In these situations the label and property value will not match. --EncycloPetey (talk) 01:02, 14 September 2019 (UTC)
@EncycloPetey: So if the label can be translated and be different from the official title of the book, why can't we modify the label to distinguish the item from other items? Just take as an example Gone with the Wind (Q2870): why do we accept the translation of the label? The official title in the original language is Gone with the Wind, so this should be the same for all languages. If we accept that the label is translated for the work item in order to facilitate item identification for every contributor, why can't we add the edition number to the label? Snipre (talk) 21:31, 16 September 2019 (UTC)
Why not? Because Wikidata norms disfavor doing so. Why use parentheses when the description exists for precisely that purpose? Why not just use the description, the way it was intended, instead of stuffing everything into a label? --EncycloPetey (talk) 01:14, 17 September 2019 (UTC)
Speaking for myself, I add parentheses with either the edition number or the date, every single time. It makes it far clearer, when the item is the value of a statement, what the statement refers to. Similarly when it is the value of a reference. It also reduces the risk of someone accidentally making the item for the book the value of a statement, when what they want is the subject that the book is named for. It makes finding the right item in search more straightforward, and it gives a better, clearer caption in a Wikidata-driven category infobox on Commons. If people want the actual title, then title (P1476) is available for that. Because P1476 is available for works, the reasons that apply for labels of other types of items not to include disambiguators (better generality of reuse) simply don't have the same force; whereas it makes life much easier in so many ways if there is something in the label to indicate that it is a work, or which edition it represents. Jheald (talk) 00:20, 23 October 2019 (UTC)
I've found that the edition number and year of publication are often inadequate. Sometimes an edition is the first book edition or the first UK edition, or the first US edition, and many 19th-century books have a UK and US edition that came out in the same year. Also, many 19th-century novels were serialized prior to being collected as novels, and some first book editions have multiple volumes published in different years. --EncycloPetey (talk) 01:27, 23 October 2019 (UTC)
By all means give further information in the description, and of course try to give as full information as possible in the statements. But some indication in the label (i) that the item is an edition, or an article, or a painting, or a map, or whatever, not a regular subject; and (ii) which edition, etc, is IMO useful. Jheald (talk) 09:07, 23 October 2019 (UTC)
@EncycloPetey, PKM: A question for you: see Wikidata:Project_chat#a_thought_for_longer-term:_works,_editions,_and_searching. Snipre (talk) 07:49, 5 November 2019 (UTC)

I am in favor of not including editions or years of books in their labels. I think this approach is the most consistent with Wikidata policy, is preferable from a usability perspective, and is easier to maintain.

  • From the Help:Label page, the general policy is "to use the name that an item would be known by to the most readers" for the label, and to "reflect common usage." If someone asks me what I'm reading, I would say "Moby Dick" or "the CRC Handbook of Chemistry and Physics" without mentioning the edition. In fact, in common usage virtually no one even knows what edition of a book they're reading. As I understand it, distinguishing between multiple editions/versions of things is almost the entire point of having descriptions.
  • I don't see any real downside from a usability perspective to this approach. As has been noted, descriptions come up in searches, so it's not hard to decide which edition to select in that context (and with long titles, adding editions to a label will cut things off or make them longer and more unwieldy than necessary). And while it's true that people wouldn't be able to determine which edition of a book is being referred to when it's the value of a statement in another item, I don't think most people would care about the specific edition of a book in that situation (back to the whole common usage point). Also, using the same format for book labels as for songs and other items means that a Wikidata user who hasn't done any editing on books specifically would be less likely to "mess up" (aka make busywork for others to clean up) if they create a new entry for a book, or if they remove the edition from a book item they come across because they think that the book entry people don't know how to use Wikidata.
  • Including the edition in the label duplicates information (because the description would certainly have it as well), which I always hate because if for some reason there's a need to edit the edition information then there's now an extra field that needs to be changed, making it harder/slower to maintain data and increasing the risk of inconsistencies if both label and description aren't changed simultaneously.

Of course, when an edition or year really is inherently key to the item (like the Wicked Bible) it could be appropriate to include that in the label, but I think as a general policy it's best not to make books an exception to the usual rule. Biggins (talk) 12:14, 16 January 2020 (UTC)

Physical book vs. a literary work published in book form

It is apparently confusing many editors that we don't seem to have an item for "a literary work published in book form" which would be distinct from Q571, which I now understand is the item for a physical book. Does such an item exist or should we create one? User:Bodhisattwa is currently replacing instances of Q571 with Q7725634 "literary work". These might be better replaced with an item that was defined as "a literary work published or intended to be published in book form", which could be a subclass of Q7725634. --Robert.Allen (talk) 01:47, 15 December 2019 (UTC)

How about digital "books"? Can the definition be expanded to somehow include those? --Robert.Allen (talk) 01:54, 15 December 2019 (UTC)
The reason that many editors are confused is that they haven't been paying attention to the data model used by WikiProject Books, which has decided against using book (Q571) at all because it could mean any of several things (a work, an edition, a particular physical copy in a specific library). So instead we use literary work (Q7725634) but only on "work" data items and only when a more specific descriptor does not exist. But this is not limited to publication in book form; it is used for the "work" regardless of form. We then create separate data items for each edition or translation of a work.
So, for example, Silas Marner (Q638327) is the primary data item for George Eliot's novel (independent of year, edition, publisher, language, or format). But data item Silas Marner (Q62579127) is for the first UK book-format edition; Silas Marner (Q61654288) is the data item for the 1896 French translation by Malfroy; and Silas Marner (Q61654298) is for the 1907 English edition with illustrations by Hugh Thomson. Notice that this later edition links to the digital copies at Wikisource and Gutenberg, which are digital forms of this same edition, with the same text and illustrations. Note also that not all Gutenberg editions are linked this way, because for many Gutenberg editions the source and integrity of the text are not so clear, so the Gutenberg edition gets a separate data item of its own. But in all of these cases, the version, edition, or translation (Q3331189) has a separate data item from the work as a whole. Many of the edits by User:Bodhisattwa have failed to take this into account, so editors will have to check by hand whether the data item was for the "work" or for an "edition/translation". If it is the latter, then Q571 should be replaced with version, edition, or translation (Q3331189) instead of literary work (Q7725634).
For a fuller view see Wikidata:WikiProject Books/Works which shows the data model used here, and which has been adopted from the FRBR bibliographic ontology. --EncycloPetey (talk) 05:25, 15 December 2019 (UTC)
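As a rough illustration of the work/edition split described above, here is a minimal Python sketch. It is not the WikiProject's tooling: the Item class, its field names, and the description strings are hypothetical and only paraphrase the Silas Marner example; the Q-ids are the ones quoted in the comment. All four items share one label, and the description is what tells them apart.

```python
from dataclasses import dataclass
from typing import List, Optional

LITERARY_WORK = "Q7725634"  # literary work (Q-id quoted in the discussion)
EDITION = "Q3331189"        # version, edition, or translation (quoted in the discussion)


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified stand-in for a Wikidata item."""
    qid: str
    label: str
    description: str                   # illustrative wording, not the live descriptions
    instance_of: str
    edition_of: Optional[str] = None   # Q-id of the work; set only on edition items


work = Item("Q638327", "Silas Marner", "novel by George Eliot", LITERARY_WORK)

editions = [
    Item("Q62579127", "Silas Marner", "first UK book-format edition", EDITION, "Q638327"),
    Item("Q61654288", "Silas Marner", "1896 French translation by Malfroy", EDITION, "Q638327"),
    Item("Q61654298", "Silas Marner", "1907 English edition illustrated by Hugh Thomson", EDITION, "Q638327"),
]


def editions_of(work_qid: str, items: List[Item]) -> List[Item]:
    """Return the edition items that point back at the given work."""
    return [i for i in items if i.instance_of == EDITION and i.edition_of == work_qid]


def search_line(item: Item) -> str:
    """Render label, Q-id and description together, roughly as a search result would."""
    return f"{item.label} ({item.qid}): {item.description}"


for entry in [work] + editions_of(work.qid, editions):
    print(search_line(entry))
```

The point of the sketch is only that one work item can have any number of edition items pointing back at it, which is why a one-to-one "book" item cannot cover both roles.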
There must be many editors who create book items, but who are unaware of and will remain unaware of WikiProject Books. Creating a minimum of two items for every book seems like overkill to me. Can't we just create one item for the first edition, which in most cases will be the only edition? --Robert.Allen (talk) 16:04, 15 December 2019 (UTC)
In short, can you guarantee that the "book" will never have another edition, and that it will never be translated, digitized, or re-issued? I suggest you look back through the many discussions we've had on this issue. Explaining it to each person on the server one-by-one (like this) is not an efficient method. --EncycloPetey (talk) 17:03, 15 December 2019 (UTC)
Naturally I can't guarantee a second edition won't be published, but if a second edition appears, can't we create the "primary" data item (like Silas Marner (Q638327)) at that time? Perhaps we need more flexibility, so that less knowledgeable editors can also contribute. --Robert.Allen (talk) 19:56, 15 December 2019 (UTC)
@EncycloPetey:, we don't create new items for a digitization, do we? E.g., on Silas Marner (Q62579127), which was a hardback book edition, we've got a document file on Wikimedia Commons (P996) and Internet Archive ID (P724) for the scan, not a separate item. Ghouston (talk) 05:20, 18 December 2019 (UTC)
Actually, my primary intention was to replace physical book (Q571) with abstract work as per WD:WikiProject Books. Now, there are thousands of items which have statements belonging to both work and edition, and which thus trigger constraint violations. For example, items linked with Wikipedia are works, but many of them have ISBN-10 (P957) or ISBN-13 (P212), which are used for editions. These additions were made by some bots which do not follow the data model used by WikiProject Books, creating more problems for clean-up work. -- Bodhisattwa (talk) 18:54, 15 December 2019 (UTC)
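The kind of mixed item described here could be detected with a check along the following lines. This is only a sketch under stated assumptions: claims are represented as a plain dictionary from property id to a list of values, the function name is hypothetical, and the ISBN strings are placeholders rather than real identifiers. The property and item ids are the ones named in the comment.

```python
LITERARY_WORK = "Q7725634"       # literary work
ISBN_PROPERTIES = ("P957", "P212")  # ISBN-10 and ISBN-13, as named in the comment


def mixes_work_and_edition(claims):
    """Return True if an item is typed as a work but also carries an ISBN.

    `claims` is an assumed, simplified shape: {property id: [values]}.
    """
    is_work = LITERARY_WORK in claims.get("P31", [])
    has_isbn = any(claims.get(prop) for prop in ISBN_PROPERTIES)
    return is_work and has_isbn


# Placeholder data, not real items or real ISBNs.
work_with_isbn = {"P31": ["Q7725634"], "P212": ["<isbn-13 placeholder>"]}
clean_work = {"P31": ["Q7725634"]}
clean_edition = {"P31": ["Q3331189"], "P957": ["<isbn-10 placeholder>"]}

for name, claims in [("work_with_isbn", work_with_isbn),
                     ("clean_work", clean_work),
                     ("clean_edition", clean_edition)]:
    print(name, mixes_work_and_edition(claims))
```

Items flagged this way would still need a human decision about whether the ISBN should move to a new edition item or the item itself should be retyped.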
It seems rather odd to me that editors contributing to a project that is called "Books" have apparently decided to banish items with any kind of definition for "book" from Wikidata and have extended its purview to all sorts of items that are not even included in the usual definition of a book. --Robert.Allen (talk) 02:17, 16 December 2019 (UTC)
In my view, the batch process run by User:Bodhisattwa has caused a degradation in the specificity of information for a large number of items which are books. I would suggest that it be reverted and that we spend some time improving the definitions of items (like Q571) for instances of books. --Robert.Allen (talk) 02:43, 16 December 2019 (UTC)
I have not done anything new. Other editors have done normal and batch editing based on the model established in WikiProject Books. There have been long discussions on this topic in the past and I suggest checking those. I strongly oppose reverting all those edits to books. That would break the workflows of many Wikisources and of tools like Inventaire.io. -- Bodhisattwa (talk) 04:20, 16 December 2019 (UTC)
@Robert.Allen: The degradation was present before Bodhisattwa started the batch. We have many, many "books" which have ISBN information that applies to a specific edition, but is placed on the general item for the work. Bodhisattwa's batch run does not fix everything, but it is a positive first step towards aligning Wikidata content with the consensus data model. --EncycloPetey (talk) 04:53, 16 December 2019 (UTC)

  Comment a literary work is the idea; the intellectual property. An edition (book, electronic, tattoo on your backside) are all the manifestations of that idea. Remember that a literary work can be unfinished and not be published, whereas an edition has that further idea of the reproduction. And yes, it would be great if there did not have to be two forms, but tell me how we will manage a work that has multiple editions, and multiple translations, or even translations of a later edition. As long as the world produces a work more than once, we cannot have a schema that says there is a one to one representation.  — billinghurst sDrewth 05:20, 17 December 2019 (UTC)

  • The first two definitions of "book" in the OED are:
    • 1a. A portable volume consisting of a series of written, printed, or illustrated pages bound together for ease of reading.
    • 1b. A written composition long enough to fill one or more such volumes.
  • Two non-obsolete definitions of "edition" are:
    • 3a. One of the differing forms in which a literary work (or a collection of works) is published, either by the author himself, or by subsequent editors.
    • 3b. An impression, or issue in print, of a book, pamphlet, etc.; the whole number of copies printed from the same set of types and issued at the same time.

In your terminology I would say that for "book", "1a" is a "manifestation" and "1b" is an "idea", while for edition, "3a" is an "idea" and "3b" is a "manifestation". So, I think I have a problem with version, edition, or translation (Q3331189): "versions" or "translations" are "ideas", and, while "edition" can have two meanings "3a" or "3b", in this context it is not a "manifestation" in your sense but must correspond to 3a, an "idea". Another problem I see is that an instance of literary work (Q7725634) corresponds to "written composition", but "written composition long enough to fill a book" is a subset of that. That is why I argue that replacing an instance of "book" with literary work (Q7725634) resulted in a loss of specificity. --Robert.Allen (talk) 09:04, 17 December 2019 (UTC)

I'm unsure that it's useful to distinguish between the manifestation and the idea of a book, particularly when it comes to ISBNs. It seems to me the ISBN refers to the manifestation, yes, but also the idea that is manifested. In most cases, all we need is an item for the first edition of a book and it may or may not have subsequent editions and they can be linked if both Wikidata items exist with "has edition" and "edition of". That seems to be a more common sense approach that most editors will understand. --Robert.Allen (talk) 09:41, 17 December 2019 (UTC)
With regard to ISBN: If you don't see the difference between the manifestation and the idea, then please tell me the ISBN for Shakespeare's Hamlet (Q41567)? If there is no difference between the "work" and "edition", then there should be only a single ISBN, right? With regard to the claim that, in most cases, we only need a data item for the first edition: If you worked on Wikisource, you'd understand that no, this is untrue. There are multiple instances where no first edition exists, and many where multiple first editions exist. We cannot link all translations under a first edition, because quite often, the translation is not made from the first edition. And in the case of some notable works, such as Moby-Dick, neither of the two first editions is the standard modern edition. And for most of Dickens' novels, the first edition was a serialized publication, and not in book form at all.
And why should it matter which definitions are listed first in the OED? We're creating a database, not writing an essay. Did you look at the FRBR bibliographic ontology, as I recommended? That is the international standard for library databases, not the OED. The entry for "book" in the OED may be easier to look up, but it's not as applicable when you're creating an international bibliographic database. If you believe that librarians the world over should adopt your terminology, instead of the internationally agreed-upon system, then the International Federation of Library Associations and Institutions (IFLA) is the place to make that proposal. --EncycloPetey (talk) 17:17, 17 December 2019 (UTC)
So an item with an ISBN cannot be called an instance of a "book"? Is Wikidata a bibliographic database? --Robert.Allen (talk) 04:48, 18 December 2019 (UTC)
Bibliography: systematic description and history of books, their authorship, printing, publication, editions, etc. So an ontology derived from a bibliographic database does not need to label items as book, since everything is about books. Wikidata is about more than books, so we need to identify items that are books as instances of book. --Robert.Allen (talk) 05:43, 18 December 2019 (UTC)
By the way, Hamlet is a play, not a book, so of course it does not have an ISBN. Plus it's old. Old books also do not have an ISBN. --Robert.Allen (talk) 05:46, 18 December 2019 (UTC)
I have a book on my shelf called "Hamlet" that contains an edition of Shakespeare's play. The book has an ISBN. Books can contain many different kinds of contents, including plays. This is why we need separate data items for works (such as "Hamlet") and editions of those works, such as the edition on my bookshelf. --EncycloPetey (talk) 05:52, 18 December 2019 (UTC)
Fine, I don't see any problem with that. The item with the edition of Hamlet is a book. No problem. If a book does not have a first edition, fine, handle it differently. But don't try to impose that more complicated method on other book items that are simple cases. --Robert.Allen (talk) 05:58, 18 December 2019 (UTC)
Probably many of the items with ISBNs were created so they could be cited in Wikidata as references with page numbers. We need to keep them as instances of book (Q571). The item version, edition, or translation (Q3331189) does not seem to be equivalent to book to me. Couldn't this also be applied to other formats, like software, films, or maps? It was a mistake to replace book (Q571) with literary work (Q7725634), because the latter applies to the content, not the format, i.e., a book, which has page numbers and sometimes volume numbers. (Perhaps "format" and "content" may be better terms than "manifestation" and "idea".) --Robert.Allen (talk) 16:50, 18 December 2019 (UTC)
"Probably?" Most of the ISBN items were imported en masse from Wikipedia, where ISBNs had been stuck on everything, apparently randomly. Now that the information is on Wikidata, it is being cleaned up and collated. Wikidata information must do more than simply interface with Wikipedia for use in citations. Bibliographic information must also connect to databases of the Internet Archive, Hathi Trust, Google Books, and the databases of major world libraries, such as the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, and the Library of Congress. All of these databases use the same data structure adopted by WikiProject Books. They all have two sets of data, one for the authority files (works and authors) and one set for bibliographic information (specific editions or shelf copies). This is an internationally used data structure, and if Wikidata is to connect and interface with them (as we are already doing), then we have to have the same data structure to pair with them. What you are asking Wikidata to do is to not connect with the collection of world data on publications.
Also, when citing using page numbers, it is very important to cite a specific edition, or even a specific printing. Each edition is a new edition because it has changed from earlier editions. This is true on Wiktionary as it is anywhere else, where example sentences demonstrating the use of a word are quoted following a definition. Wikidata supports data for edition information so that verification by the correct edition can be made. And no, there is a separate subclass of version, edition, or translation (Q3331189) for software version (Q20826013), so the two are not interchangeable. For information about films, see Wikidata:WikiProject Movies. --EncycloPetey (talk) 17:07, 18 December 2019 (UTC)
I've created numerous book items on Wikidata. I've used most of them for Wikidata references ("stated in") not on Wikipedia. All of them have either an ISBN number or an OCLC number (usually older books) to identify the specific edition. --Robert.Allen (talk) 19:02, 18 December 2019 (UTC)
  Comment In my view book (Q571) needs to be orthogonal to the work / edition / physical copy distinction. Similarly for the main items for map, poem, novel, encyclopedia, etc.
In most cases, in my view, an item can and should have instance of (P31) = edition, or instance of (P31) = work, AND instance of (P31) = book (Q571).
We already have a specific item for individual copy of a book (Q53731850), so whatever Q571 is meant to stand for, it is not specifically an individual copy of a book. (Though I would have individual copy of a book (Q53731850) subclass of (P279) book (Q571)).
I believe book (Q571) is best held to represent the form of the work/edition/copy: ie that it is a work/edition/copy of a book, rather than saying anything more about which of those three classes the item in question belongs to.
This I think is how people most commonly understand the idea of "book" in everyday life; and I think is what the sitelinks for Q571 represent.
It would also avoid all of the confusion caused by arbitrarily assigning Q571 to any one of work/edition/copy, against the expectations of readers.
So that's how I think we should go forward, and similarly for map, poem, etc. Jheald (talk) 17:17, 18 December 2019 (UTC)
@Jheald: Wrong due to concept confusion between edition and work. Ontology requires that one concept = one item. By indicating that all works and all editions are books, you mix 2 concepts in one item. If you are able to distinguish work from edition, this means there are 2 concepts so why do you want to mix them up ? Later when people will perform data extraction through query, they will never use book as search criteria, because they are interested in works or in editions. Snipre (talk) 03:57, 19 December 2019 (UTC)
@Snipre: Not all works or editions are works or editions of books. You can have an edition of a map, or a music score. There are two different dimensions here. The dimension of whether an item is for a work or for an edition is different from the dimension of whether an item has the form of a book or a map or a music score. In my view, inheritance from Q571 is giving useful information about that latter dimension, not the former one. You appreciate the distinction? Jheald (talk) 10:04, 19 December 2019 (UTC)
@Jheald: Please don't tell me that you have never seen written work (Q47461344) or literary work (Q7725634) in use. When we speak about works, this doesn't mean we mix all works together: there are dedicated items for written works, so I don't know why you are introducing music into this discussion. Snipre (talk) 10:39, 13 January 2020 (UTC)
If a book has a limited edition of say 300 copies and the copies are numbered, then I guess individual copy of a book (Q53731850) would apply to a particular exemplar? Often these are books of prints, where the quality declines with the exemplar number. --Robert.Allen (talk) 19:11, 18 December 2019 (UTC)
Yes. If you want to indicate that the item is for the particular copy that was number 12 in the run, or was the particular copy used by Lincoln to swear the oath of office, use individual copy of a book (Q53731850). Jheald (talk) 19:43, 18 December 2019 (UTC)
Since software version (Q20826013) is a subclass of version, edition, or translation (Q3331189), maybe we should create a new subclass of version, edition, or translation (Q3331189) for book editions. Would it also be a subclass of book (Q571)? Then we could use "instance of" that new subclass for items such as The Dictionary of Art (Q66415298), Catalogues de la collection d'estampes de Jean V, roi de Portugal (Q65555135), Gaspare & Carlo Vigarani: Dalla corte degli Este a quella di Luigi XIV (Q63067141), etc. Would that help solve the problem we have here? Also, should there be a property called "has subclass", so Wikidata users can find subclasses more readily? --Robert.Allen (talk) 19:58, 18 December 2019 (UTC)
It depends on what you mean by "book". As you discovered before, "book" means more than you previously thought. Even "Hamlet" can be a book. This is why "book" is problematic as an identifier: it means different things to different people in different circumstances. What we do have is distribution format (P437) which can accept values such as hardback (Q193955). This may be closer to what you're looking for. And no, "has subclass" isn't necessary as a reverse property, but you might petition the developers for better tools to make it easier to search for items that are subclasses. --EncycloPetey (talk) 20:09, 18 December 2019 (UTC)
@Robert.Allen: The problem is the definition: please give us your definition of book. Then we will see if we can use your definition to merge work and edition. Snipre (talk) 03:57, 19 December 2019 (UTC)
Perhaps you are asking too much. "Edition of a book" is a very common term. It is not our job to define it. I have seen almost no similar class items on Wikidata which have extensive or exact definitions. Usually they have a short description, which is often rather vague or even inexact. For instance private mansion (Q1365179) has a short description in English and the French term hôtel particulier as an alternative name. One can infer this item is actually intended for buildings that meet the definition of a French hôtel particulier. That definition is complex and varies with different authors and historical periods. Wikidata does not need to define it to use it. When an instance is added, if there is any question about whether the item fits the class of hôtel particulier, a reference identifying the building as hôtel particulier can be added to the "instance of" statement or another statement for the item. We could employ a similar procedure for the class "edition of a book", and just provide a short, not necessarily definitive description. Any Wikidata item with a statement identifying it as an instance of an "edition of a book" can provide a statement or reference that supports that it is indeed such an instance. For example, if the item has an ISBN, that is evidence. Or a book listing at WorldCat, e.g., OCLC 634752628. You may notice there is an item at the top of the WorldCat listing with the label "Edition/Format:", which in this case says "Print book". Or if you look at the results of a search, e.g. here, on the left there is a check box for book and some subcategories of book. It's these kinds of information that would confirm that an item is indeed an instance of an "edition of a book". I'm sure there are plenty of other sources which could be used to confirm that an item fits the class. An example of a possible description might be "written or illustrated composition published or intended to be published as a book."
--Robert.Allen (talk) 10:12, 19 December 2019 (UTC)
@Robert.Allen: "It is not our job to define it." By writing that sentence, you are completely outside the scope of what is happening in WD: WD aims to create an item for each concept. So if you don't want to define concepts, I think you are not in the right place. The descriptions of concepts are improved when the cases require more precise definitions, and new items are created to distinguish concepts which were mixed. In the case of book, it was decided to avoid that term and use more precise concepts (work, edition, exemplar). As our system is able to provide what is necessary to classify work, edition and particular book, I don't think that saying "edition of a book" instead of "edition" is a large change in the concept. So if, after this discussion, your proposal is that we should use "work of a book" or "edition of a book" instead of work or edition, I think I am wasting my time discussing this with you. Snipre (talk) 10:50, 13 January 2020 (UTC)

We shouldn't remove the "book" claim from the "Instance Of" statement in short story collections. The problem we have is that only by adding "book" we can differentiate if the item refers to a single short story or to a short story collection, as both have the claim "short story" in the genre statement. There is a "short story collection" item, but that one can't be added in the genre statement because a "short story collection" isn't a literary genre, "short story" is. So, having the "book" claim in the "instance of" is the only way of separating them. And this is used in many wikipedias to autocategorize articles, so by deleting the book claim User:Bodhisattwa has inadvertently broken the categories in different wikipedias of dozens of articles. We should put the "book" claim again, at least in the short story collections, to correct this problem.--Freddy eduardo (talk) 14:39, 3 January 2020 (UTC)

@Freddy eduardo: You mention the solution yourself: all short stories have to be defined with short story as genre. No need to use book for that. Snipre (talk) 10:59, 13 January 2020 (UTC)
@Snipre: as I mentioned, the problem with that is that we can't differentiate short stories from short story collections, as both have "short story" as its genre. There's the "short story collection" item, but that isn't a literary genre. So, we need a way to state whether an item is a short story collection or a single short story. As I said, the best way is to leave the "book" statement in short story collections, as the modules in wikipedia need a way to tell them apart in order to categorize them correctly. All the items where @Bodhisattwa: removed the book claim got their categories messed up because they lost the categories related to short story collections and got the ones related to single short stories. Many of them still haven't been corrected.--Freddy eduardo (talk) 14:03, 13 January 2020 (UTC)
@Freddy eduardo: Modules should use the value of instance of (P31) instead. --JavierCantero (talk)
@Snipre: I thought it was already clear that neither novel (Q8261) nor short story (Q49084) (same for short story collection (Q1279564)) were literary genres. I don't know why we keep discussing this. A film (such as 2001: A Space Odyssey (Q103474)) is an instance of (P31) film (Q11424), not an instance of (P31) audiovisual work (Q2431196) with genre (P136) set to film (Q11424). Compare that with what we are doing with the literary equivalent: 2001: A Space Odyssey (Q835341), which apparently can't be a novel (Q8261) (even though that is the very definition of the item in Wikipedia), so it's defined as a written work (Q47461344) with the "novel" attribute moved to genre (P136). Novels are a very common, well-known, easily recognizable family of items (as films are) and should naturally be defined as such, keeping written work (Q47461344) for the ones that don't fit in any other common category. --JavierCantero (talk)
I agree. This is yet another instance where common sense should prevail. --Robert.Allen (talk) 23:14, 13 January 2020 (UTC)
Novel, etc., are classified as literary form (Q4263830) or narrative form (Q6630149). Perhaps there should be a property for that. Ghouston (talk) 06:53, 14 January 2020 (UTC)
It shouldn't be literary forms, it's too limiting. There are novels that are spread through several books, with different ISBNs and even different publication dates in some cases (example: French and Spanish editions of Cryptonomicon (Q534975)). There are books that contain several novels (example: "American Science Fiction: Nine Classic Novels of the 1950s", ISBN 978-1-59853-157-2). Today, novels are published simultaneously on paper, as ebooks and also as audiobooks, each of them with its own ISBN. A "novel" is not a property of a book; it is a written intellectual work (Q15621286) that can be published in different formats. That "property" approach is not going to work to express such real-world diversity. --JavierCantero (talk) 11:42, 14 January 2020 (UTC)
A literary form could be a property of an intellectual work (Q15621286), in the same way that genre (P136) can be used that way? We've got items like science fiction novel (Q12132683) which are conflating genre and form. Ghouston (talk) 12:35, 14 January 2020 (UTC)
Using one value field to code two separate concepts (such as science fiction novel (Q12132683)) is an infinite source of pain and never ends well. Whoever did that, they haven't thought that to support it they have to maintain all the possible combinations of genre and form (M x N elements being M the number of genres and N the number of forms). The explosive growth makes it totally unmanageable in the long run. --JavierCantero (talk)
Agreed, although I don't blame anyone on Wikidata, since the item was created by a bot to link with a Wikipedia article, and needs to exist as long as it has sitelinks. However, that doesn't mean it should be used as a genre on Cryptonomicon (Q534975), for example. Ghouston (talk) 22:41, 14 January 2020 (UTC)
@Ghouston: literary form (Q4263830) or narrative form (Q6630149) could be encoded using a property, if one was introduced. But what would be the advantage of doing so, compared to using instance of (P31) ? Jheald (talk) 23:48, 14 January 2020 (UTC)
I think the two methods are equivalent, since novel (Q8261) is an indirect subclass of written work (Q47461344), and so can be used as an instance of (P31) value. I don't know if there's any reason to prefer one method or the other. Cryptonomicon (Q534975) could be an instance of novel (Q8261), although preferably not science fiction novel (Q12132683), since there's a property for genre. Ghouston (talk) 07:19, 16 January 2020 (UTC)
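
Two points from this thread can be illustrated with a minimal sketch, assuming items are held as plain Python dictionaries keyed by property ID. The first function shows how a consuming module could tell a single short story from a collection by looking only at instance of (P31), as suggested above. The other two compare the number of class items needed when genre and form are fused into one item (M x N) with the number needed when they are kept separate (M + N). The function names and the sample items are hypothetical; this is not how any existing Wikipedia module is known to work.

```python
SHORT_STORY = "Q49084"
SHORT_STORY_COLLECTION = "Q1279564"


def categorize(item):
    """Pick a category from the item's instance of (P31) values only."""
    p31 = set(item.get("P31", []))
    if SHORT_STORY_COLLECTION in p31:
        return "short story collection"
    if SHORT_STORY in p31:
        return "short story"
    return "uncategorized"


def fused_class_count(num_genres, num_forms):
    """Items needed if every genre/form pair gets its own item (M x N)."""
    return num_genres * num_forms


def separate_class_count(num_genres, num_forms):
    """Items needed if genre and form are modelled independently (M + N)."""
    return num_genres + num_forms


# Hypothetical items: both carry genre (P136) = short story,
# but only P31 tells them apart.
collection = {"P31": [SHORT_STORY_COLLECTION], "P136": [SHORT_STORY]}
single_story = {"P31": [SHORT_STORY], "P136": [SHORT_STORY]}

print(categorize(collection))       # short story collection
print(categorize(single_story))     # short story
print(fused_class_count(20, 5), separate_class_count(20, 5))  # 100 25
```

The sketch ignores subclass relationships; a real consumer would probably need to follow subclass of (P279) rather than compare against two fixed IDs.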

Publishing number in a collection

Hi, how can I add the publishing number of a book within a collection? series ordinal (P1545) is not authorised as a qualifier for collection (P195), and inventory number (P217) doesn't seem to fit. Thanks. Ayack (talk) 14:46, 10 February 2020 (UTC)

Could you explain more of what you're trying to encode? Do you mean the volume (P478) in the series? --EncycloPetey (talk) 15:44, 10 February 2020 (UTC)
Yes, indeed. The French label is wrong so I haven't seen it... Thanks. Ayack (talk) 17:04, 10 February 2020 (UTC)

Always a separate item for an edition ?

Hi, I created my first book item Q87480748 and I want to make sure I do things right. Do I also need to create a separate item for the edition, given that there is only one edition of the book? I plan to use this book as a reference in several Wikidata statements.--Kimdime (talk) 14:27, 11 March 2020 (UTC)

@Kimdime: Yes, unless you are sure that nobody will produce a new edition in the future. Snipre (talk) 21:11, 11 March 2020 (UTC)
What about when you are reasonably sure, or at least it hasn't happened yet, like a conference proceedings? Can you create an item which is an instance of edition and some kind of written work category? Items like Biomedical Engineering and Computer Science (ICBECS), 2010 International Conference on (Q21684049), does it need an edition item just to hold the ISBN-13? Ghouston (talk) 21:28, 11 March 2020 (UTC)
@Snipre: I'm positive about this: published in 1970, no reedition expected, and the book was made available as a PDF online. But then, should I add that this is an instance of Q3331189? Because right now there is a constraint message on the ISBN field.--Kimdime (talk) 08:54, 12 March 2020 (UTC)
@Kimdime, Ghouston: Digitization of editions can lead to a new edition. Snipre (talk) 19:51, 16 March 2020 (UTC)
The only thing I can see is that different publication formats, like hardback, softback, and ebook, are likely to have different page numberings and different ISBNs, even though they are distributions of the same edition. But perhaps that could be recorded with qualifiers on a single item. The guidelines already say that separate edition items aren't needed for every reprinting. Ghouston (talk) 00:04, 17 March 2020 (UTC)

Lemony Snicket (pseudonyms revisited)

A question that has come up previously was around how to handle cases where a book is written under a pseudonym, and Wikidata has items both for the author and the pseudonym. One interesting example of this is the A Series of Unfortunate Events (Q213841) series. We have items both for Daniel Handler (Q1060636) and Lemony Snicket (Q458346) (complicated further by the latter being a character in the books as well as the pseudonymous author of them). Currently many of the books in the series (e.g. When Did You See Her Last? (Q17066462)) have each as a separate author (P50) statement, which doesn't really seem to be the best approach. But having author (P50): Daniel Handler (Q1060636) with a qualifier of named as (P1810): "Lemony Snicket" seems lossy, as that's only a string, rather than connected to the relevant article. Has there been any consensus yet on how to best handle this sort of case? If not, can we come up with something? --Oravrattas (talk) 16:17, 14 March 2020 (UTC)

This seems like a general discussion about authorship and names rather than a Books-specific topic. Mark Twain (Samuel Clemens) wrote short stories as well as "books". There are also many actors, musicians, and other entertainers who work under a stage name or pseudonym. This might be worth a general discussion in the Project Chat instead of a local discussion. --EncycloPetey (talk) 20:18, 16 March 2020 (UTC)
Thanks. I've raised it at Wikidata:Project_chat/Archive/2020/03#Notable_pseudonyms --Oravrattas (talk) 05:17, 18 March 2020 (UTC)

Book items

I find book items rather convoluted and counter-intuitive, especially bot-generated items, and am unsure if the purpose of Wikidata is to track editions of books or individual copies in some libraries somewhere, some of which happen to have been digitized. Case in point: Flora of Mount Desert Island, Maine (Q5862880) is the human curated item for a book, while bot-generated items Q51431260 and Q51475451 refer meticulously to individual copies on Internet Archive, with identifiers unique to the scan, but otherwise identical. There is another copy on Internet Archive that differs only in number of page scans (but not page numbers), and according to WorldCat, copies are held in over 80 libraries worldwide. They don't seem to represent separate editions. In this case can all three be merged? -Animalparty (talk) 23:03, 26 April 2020 (UTC)

The latter two items have different DOI values. Will it cause problems to have two different DOI values on the same data item? --EncycloPetey (talk) 00:24, 27 April 2020 (UTC)
I think the items should be merged if it's just two scans of the same underlying work. See Wikidata:Project_chat/Archive/2020/03#The_large_game_and_natural_history_of_South_and_South-East_Africa: I merged the duplicates I found in that case. The resulting item The large game and natural history of South and South-East Africa (Q51422566) does indeed have a number of duplicate identifiers. Ghouston (talk) 01:15, 27 April 2020 (UTC)
I went ahead and merged them. I think it is the pinnacle of frivolity to track individual copies of books of the same edition, as if we care that this copy can be found in Library X and has a coffee stain on page 9, while this copy in Library Z was scanned by a grant from the Rockefeller Foundation. Any book on your bookshelf has thousands if not millions of virtually identical copies around the world: they don't all warrant items. -Animalparty (talk) 05:30, 10 May 2020 (UTC)

Chronicles and other narrative sources

As I have noticed again recently, items for a particular chronicle (Q185363) or other historical source (Q5369651) are very often empty, no matter how old the respective items are. This appears to be true across languages, not just for those few I can easily read. There have been systematic efforts to open up these primary sources for research for centuries now (see e.g. this tutorial for students of medieval history). Much of this knowledge is still in printed books (like this particular Quellenkunde), but increasingly there are also online databases like Narrative Sources, Geschichtsquellen des deutschen Mittelalters or Historiography of Early Modern Ottoman Europe. I would appreciate it if others could keep an eye on those (relatively) empty items as well, especially those able to read Cyrillic (Q8209), as e.g. ru.wikipedia seems to have quite a few solid articles on medieval Latin chronicles. --HHill (talk) 12:18, 12 May 2020 (UTC)

Short-story collections

Do we not have a specific item for "collection" in the sense of "collection of short works by a single author" (as opposed to "anthology", a collection of short works by multiple authors)? Am I just missing it? - PKM (talk) 20:48, 13 May 2020 (UTC)

We have short story collection (Q1279564). --EncycloPetey (talk) 23:50, 13 May 2020 (UTC)

Library of Congress Classification

In the section "Work item properties" one of the items is: Library of Congress Classification (Property:P1149). It has an example: The Party Journalist. But this example had a link to LCC, that did not work. I did not manage to find a working link. Can anyone please help/explain? Thanks, --Dick Bos (talk) 12:42, 18 July 2020 (UTC)

I'm not sure of the details for this particular situation, but the LCC is a general system. In principle, any book can have a classification assigned under this system, even if no copy of the work is present in the Library of Congress collections. This is necessary because many university libraries use this cataloging system for their own collections. So it is possible for a work to be assigned a value under this system but have no valid link at the LCC because the Library does not possess a copy. --EncycloPetey (talk) 04:40, 27 July 2020 (UTC)

How much propagating of replicated detail into subparts of works

When people are putting WD items for subparts of works, how much of the parent item detail is expected to drop down through the subsidiary articles? I have typically been minimalistic when creating these subparts where the parts mimic the parent.

For example with chapters of a non-fiction work, I would not replicate the language, the author, etc. from the work detail down to the chapters. Not certain what others are doing, and what we are recommending. Thanks for your thoughts and information about what and why you do something.  — billinghurst sDrewth 03:45, 27 July 2020 (UTC)

That's a tough one to generalize about, because some nonfiction works have parts by different authors, or have parts that are in different languages. I have created individual data items replicating everything when the parts of a larger work were individual poems, since there are published translations where the component poems have been translated. For a significant enough work, with translations into other languages, the information can be useful to replicate. --EncycloPetey (talk) 04:34, 27 July 2020 (UTC)
Yep. Where they are different authors, or any difference then I put that detail in. We almost need a field that says AS THE PARENT, so that 1) you don't have to go farnarkling on the detail, and 2) if the parent ever changes that it self-propagates (sort of like our relative links in WS works).  — billinghurst sDrewth 05:44, 27 July 2020 (UTC)
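
The "AS THE PARENT" idea can be sketched on the consumer side, assuming items are plain dictionaries and subparts point to their parent with part of (P361). A lookup that finds no local value walks up to the parent, so details changed on the parent propagate without being copied into every chapter. Wikidata itself has no such inheritance mechanism; the `resolve` helper and the placeholder item IDs below are hypothetical.

```python
def resolve(items, item_id, prop):
    """Return the values of prop for item_id, falling back to the
    parent reached through part of (P361) when the item has none."""
    seen = set()  # guards against accidental part-of cycles
    while item_id is not None and item_id not in seen:
        seen.add(item_id)
        item = items.get(item_id, {})
        if prop in item:
            return item[prop]
        parents = item.get("P361", [])
        item_id = parents[0] if parents else None
    return []


# Placeholder IDs and values, for illustration only.
ITEMS = {
    "work": {"P50": ["author A"], "P407": ["English"]},
    "chapter1": {"P361": ["work"]},                       # mimics the parent
    "chapter2": {"P361": ["work"], "P50": ["author B"]},  # differs: stated locally
}

print(resolve(ITEMS, "chapter1", "P50"))   # ['author A'], inherited
print(resolve(ITEMS, "chapter2", "P50"))   # ['author B'], local value wins
print(resolve(ITEMS, "chapter2", "P407"))  # ['English'], inherited
```

This matches the minimalistic practice described above: only the details that differ from the parent are recorded on the subpart.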

Adding a book (subtitle and automatic import questions)

I want to add this book.

My first question: Would you add A Guide to Fearlessness in Difficult Times as a subtitle? Or the whole The Places That Scare You: A Guide to Fearlessness in Difficult Times as the title? How do you decide something like that? Is there an official book database which treats subtitles separately?

Second question: Is there a tool to automatically import book data like this example? Can you connect with an existing database like goodreads?

Third question: If I put the author property on the book, will the inverse property 'author of' be generated automatically, or should I add both manually? What's best practice here?

Fourth question: Would you ask this question here or in the project chat?

So, that's all. Thanks. Franzsimon (talk) 21:34, 27 February 2020 (UTC)

The first question is a good one. I suppose the title should be complete on its own, and the subtitle is likely to be displayed in a smaller font. For the book you give, I'd call "The Places That Scare You" the title and "A Guide to Fearlessness in Difficult Times" the subtitle. en:Subtitle (titling) Ghouston (talk) 02:23, 28 February 2020 (UTC)
@Ghouston: The other questions are also good ;). I had the same solution in mind, but was not sure because the 'official' sources just have a title. So 'officially' the whole string is the title, and I was asking myself if Wikidata should somehow follow the 'official' version. But it's probably not that important. I guess I will use what you suggest: splitting into title and subtitle. Franzsimon (talk) 21:58, 2 March 2020 (UTC)
Third question, I don't think that there is any "author of" property. There's author of (Q65970010), but that's an "inverse label" and you don't have to do anything with it. Ghouston (talk) 22:13, 16 March 2020 (UTC)
Good questions, Franzsimon! OCLC WorldCat may be close to an official database, but it tends to have duplicates because it aggregates records from member organisations. Some libraries purchase their MARC records from the book supplier, but others still hand-catalogue their acquisitions, leading to variation. For example this page currently lists 35 entries, but at a quick glance I see around 10 distinct editions. To bulk import from there without curation would replicate the problem. Perhaps a de-dupe on ISBN would work in many (but not necessarily all) cases. Like Ghouston, I would lean toward splitting out a subtitle at the first colon or (if I have the book in hand, or there is an image of cover or title page) based on font, regardless of “official” sources. Pelagic (talk) 23:57, 21 August 2020 (UTC)
MARC21 field 245 (Title Statement) has $a and $b subfields that can be used to separate title from subtitle, but the usage instructions say to split on punctuation, and examples show $b used also for translated title. 240 (Uniform Title) doesn’t have any special provisions for subtitles. [1] So the way cataloguing data is exchanged may tend to lose subtitles or store them as part of the title. An automated import from librarian-style title records would need to deal with punctuation conventions, where : ; = / signify different things. Pelagic (talk) 23:57, 21 August 2020 (UTC)
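
The two heuristics mentioned in this thread, splitting a subtitle at the first colon and de-duplicating imported records on ISBN, could look roughly like the following. This is a minimal sketch: the record layout (a dictionary with an `isbn` key) and both function names are assumptions, not the format of any actual WorldCat or MARC export.

```python
def split_title(full_title):
    """Split a title string at the first colon into (title, subtitle).
    The subtitle is None when there is no colon or nothing after it."""
    title, sep, subtitle = full_title.partition(":")
    if not sep:
        return full_title.strip(), None
    return title.strip(), subtitle.strip() or None


def dedupe_by_isbn(records):
    """Keep the first record seen for each ISBN (dashes and spaces ignored).
    Records without an ISBN are always kept, since they cannot be compared."""
    seen = set()
    unique = []
    for record in records:
        isbn = record.get("isbn")
        key = isbn.replace("-", "").replace(" ", "") if isbn else None
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        unique.append(record)
    return unique


print(split_title("The Places That Scare You: A Guide to Fearlessness in Difficult Times"))
# ('The Places That Scare You', 'A Guide to Fearlessness in Difficult Times')
```

The colon rule needs manual review: a title such as "2001: A Space Odyssey" would be split incorrectly, and as noted above, library records also use ; = / with different meanings.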

ISBN format constraints

Is there a bot that automatically resolves format constraint violations of ISBN-13 (P212) and ISBN-10 (P957), that is, adds the finicky dashes in the right spots? These mandatory dashes are rarely printed "correctly" in full on online sources, precluding easy copy and pasting. And if the answer is no, then why not? There is a bot that automatically adds the required spaces in ISNI (P213). Let's make bots work harder than people. -Animalparty (talk) 22:25, 1 August 2020 (UTC)

Plbot has that mission, although I failed to ascertain when it last did. While it is trivial to add dashes that satisfy the constraint, there are several possible ways to do so. Only one of these is "correct" in the this-gets-extra-credit sense of the term, because the third segment (of ISBN13s) is a variable-length segment identifying the publisher. The bot apparently uses an external file with that data, but last time I tried I could not find any free source that was actually complete. That would be a possible reason to nag users with the constraint. Matthias Winkelmann (talk) 01:49, 4 August 2020 (UTC)

@Animalparty, Matthias Winkelmann: My bot is doing this for ISBN-10. It is based on this file in which for each language/country the block lengths for publisher and publication are defined. I will try to expand it to ISBN-13 as soon as I have time. --Pasleim (talk) 19:32, 10 August 2020 (UTC)
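
To make the difficulty concrete, here is a minimal sketch of ISBN-13 hyphenation. The check-digit rule (alternating weights 1 and 3) is standard. The segment lengths are not: they depend on range data for each registration group and publisher, which this sketch does not include, so the caller must supply them. The function names and the `group_len` and `publisher_len` parameters are hypothetical and do not describe how Plbot or any other bot actually works.

```python
def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit from the first 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def hyphenate_isbn13(isbn13, group_len, publisher_len):
    """Insert dashes into an ISBN-13, given the lengths of the registration
    group and publisher segments (assumed to come from external range data)."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected 13 digits")
    if digits[12] != isbn13_check_digit(digits[:12]):
        raise ValueError("check digit mismatch")
    group_end = 3 + group_len
    publisher_end = group_end + publisher_len
    if group_len < 1 or publisher_len < 1 or publisher_end >= 12:
        raise ValueError("segment lengths leave no room for the title segment")
    return "-".join([
        digits[:3],                     # prefix element
        digits[3:group_end],            # registration group
        digits[group_end:publisher_end],  # publisher
        digits[publisher_end:12],       # title
        digits[12],                     # check digit
    ])


# ISBN quoted earlier on this page, with the segment lengths it was printed with.
print(hyphenate_isbn13("9781598531572", group_len=1, publisher_len=5))
# 978-1-59853-157-2
```

With other lengths the same digits yield a string that still has four dashes, for example 978-15-9853-157-2, which is the "several possible ways" problem described above: the format looks valid, but the segmentation is wrong.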

Clarifying the use of identifiers for editions of written works on written works themselves.

ISBN-13 (P212), OCLC control number (P243) and Goodreads version/edition ID (P2969) (possibly more) are identifiers for editions of written works (?item instance of (P31) version, edition, or translation (Q3331189)) and NOT written works themselves. However, currently all these properties are allowed by constraint to be used on written works (?item instance of (P31) written work (Q47461344)). I think we should try to agree on the rules for the use of edition identifiers on written works and then document it.

See also:

Iwan.Aucamp (talk) 22:15, 9 May 2020 (UTC)
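
The stricter rule under discussion can be expressed as a small check, sketched here under the assumption that items are plain dictionaries keyed by property ID. It flags edition identifiers that sit on an item which is not an instance of version, edition, or translation (Q3331189). The function name and sample data are hypothetical, and the sketch does not model subclasses of Q3331189 or the actual constraint definitions on these properties.

```python
EDITION = "Q3331189"  # version, edition, or translation
EDITION_ID_PROPS = {"P212", "P243", "P2969"}  # ISBN-13, OCLC control number, Goodreads edition ID


def misplaced_edition_ids(item):
    """Return the edition-identifier properties found on a non-edition item."""
    if EDITION in item.get("P31", []):
        return []
    return sorted(EDITION_ID_PROPS.intersection(item))


# Hypothetical items for illustration.
work_item = {"P31": ["Q47461344"], "P212": ["978-1-59853-157-2"], "P50": ["some author"]}
edition_item = {"P31": [EDITION], "P212": ["978-1-59853-157-2"]}

print(misplaced_edition_ids(work_item))     # ['P212']
print(misplaced_edition_ids(edition_item))  # []
```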

Notified participants of WikiProject Books, WikiProject Source MetaData and WikiProject Authority control


@Sic19:


Discussion

  • Option A, Allow Edition data on Written Work items: What makes sense to me is to allow identifiers of editions on written works if they are qualified and if there is no Q-item for the written work edition. If a Q-item for the edition is created later then the identifier should be moved to the edition Q-item. Iwan.Aucamp (talk) 22:15, 9 May 2020 (UTC)
  • Do you mean allow identifiers on written works if there's no item for an edition? I guess edition items don't really make sense without a corresponding item for the work. There are properties like publication date and publisher which can be placed on the work item, even though they really only apply to an edition. I think a principle that could be used is that identifiers for a first edition can be placed on the work item directly, just like the publication date, which would avoid the need for edition items for works that only have one edition. Ghouston (talk) 00:10, 10 May 2020 (UTC)
  • I disagree. If we allow "first edition" information on the work data item, then how do we determine which "first edition" is meant. Many 19th-century novels were published in magazines before the "first edition" of the book came out. Many novels in English have a first UK book edition and a first US edition. Many nonfiction books have different ISBNs for the first hardback and for the first paperback "editions". Allowing "first edition" information on the work data item creates a hodgepodge of information and sends the wrong signal to editors who import data here from Wikipedia infoboxes. If edition information appears on a work data item, then an edition data item should be created. --EncycloPetey (talk) 01:15, 10 May 2020 (UTC)
  • If the magazine version came out first, then that would be the first edition. The other cases are typically just different formats or printing locations of the same edition. Identifiers can take a format qualifier without needing to create an item for every printing. Ghouston (talk) 02:09, 10 May 2020 (UTC)
  • But those generalizations do not hold. There are magazine-first editions that get new titles when released in book form, or new chapters, or chapters removed, or other editorial changes by the publisher. And publication of an edition can also mean new or different artwork, and certainly a different publisher. That kind of information cannot all be shoehorned into a single data item. I challenge your assertion that "other cases are typically just different formats"; the two first editions of Moby-Dick (Q174596) are a case in point; each had enormous editorial differences from the other. --EncycloPetey (talk) 05:01, 10 May 2020 (UTC)
  • Yes, there will be cases where the edition items will be needed. Another case would be if you wanted to use an item in a reference with a specific page number. However, there are also cases where there's only a single edition, or where nobody cares enough to make multiple items, and it's still nice to be able to record at least one ISBN so that it can link to other databases. Ghouston (talk) 08:06, 10 May 2020 (UTC)
  • It's inconsistent that some "edition" properties can go on "work" items, without complaint, like publication date (P577) and OCLC control number (P243), but others like ISBN are flagged as constraint violations. Ghouston (talk) 08:33, 10 May 2020 (UTC)
    • OCLC is a mess. I agree with you about publication date, but I think it's being used as shorthand for first publication date. It would be nice if we could require a qualifier on the publication date that identifies which edition that date applies to. --EncycloPetey (talk) 20:45, 11 May 2020 (UTC)
  • @Ghouston, EncycloPetey: I agree that the idea of first edition is a bit ambiguous. To me, though, this is more an issue related to WikiCite and source metadata. If the aim is to use WikiData as a citation database, it would be useful to have ISBNs in WikiData, so that one can do ISBN lookup. But I also don't think we necessarily want to have an item for each edition of a written work; only in some cases would editions warrant items. I think there is maybe another option: to have something similar to software version identifier (P348) and then put the edition metadata on those. I can see all of this gets messy, and I think the simplest solution is not to conflate edition with written work, but I'm not sure there is enough commitment for that, as we already have 100s of ISBNs on written works.
Whatever the solution, we have to deal with the inconsistency that Ghouston mentioned at some point in time. I would be somewhat open to insisting that edition identifiers only go on editions items, but then we have to be strict and consistent about it and fix all properties which do not have this constraint at the moment. Iwan.Aucamp (talk) 12:44, 10 May 2020 (UTC)
  • Option B, Insist that edition data only go on edition items:
    •   Weak support This is the "right" option, and the cleanest and simplest option. So I can't really oppose it. But I worry about the implications of this especially given this has not been enforced (even by constraints) at all. We would need to be a lot more proactive in patrolling new violations and in fixing existing violations. But I'm up for it if others are up for it. Iwan.Aucamp (talk) 12:44, 10 May 2020 (UTC)
    •   Support --EncycloPetey (talk) 15:55, 10 May 2020 (UTC)
    • Would this be for all written works, or would there still be an exception for the likes of newspaper, magazine and academic journal articles? Ghouston (talk) 03:26, 11 May 2020 (UTC)
      • @Ghouston: The problem is related to multiple editions: do we have several editions for newspaper, magazine and academic journal articles ? Is it a common practice to reedit newspaper, magazine and academic journal articles ? Snipre (talk) 08:46, 11 May 2020 (UTC)
        • It can happen, but it's probably much less common than for books. However, this is also true for certain types of books, such as conference proceedings. Ghouston (talk) 09:01, 11 May 2020 (UTC)
          • @Ghouston, Snipre, EncycloPetey: Conference proceedings will indeed be a bit verbose if we have to create both work and edition items for them. Maybe the solution is to just create editions without written work items for them? Iwan.Aucamp (talk) 21:52, 15 May 2020 (UTC)
        • Well, there's already the example of a novel serialized in parts before it's published as a book. Another very common example these days is preprints of scientific articles: if we wanted to be strict, they are separate editions. Ghouston (talk) 09:06, 11 May 2020 (UTC)
    • This is the only correct way. The current situation with publication date (P577) and OCLC control number (P243) is due to a bad model at the beginning of the data importation from WP. Snipre (talk) 08:46, 11 May 2020 (UTC)
    •   Weak support I think this will lead to a lot of work with thousands of ISBNs that are linked to works right now. I'm also not 100% sure it is reasonable to create an edition for a work that only has one edition. But this doesn't change the fact that only editions should have ISBNs, not works. This is also the way the authorities we link to are handling this topic. -- Discostu (talk) 10:10, 11 May 2020 (UTC)
    •   Support --JavierCantero (talk) 14:37, 12 May 2020 (UTC) I also would rather prefer to change the publication date (P577) in instance of (P31) written work (Q47461344) items to a new specific "first publication date" property.
    •   Support Consistent modelling. I don't like properties where the meaning changes depending on the applied domain and think Wikidata should generally avoid them. --SilentSpike (talk) 12:09, 23 May 2020 (UTC)
  • Support option A. Option B would require creating at a minimum 1 duplicate of every single work with an ISBN, even if there is only a single edition. I guess robots have no problem with this (they don't care about user friendliness), but it sure makes work more confusing, redundant and tedious for us bumbling humans. I have no opposition to the creation of secondary items for separate editions if they exist and are notable and add to the project, but when an item can be fully explained with a single item (a single literary work with a single edition and ISBN, regardless of how many copies of the same work are scanned on Internet Archive or available in my local library), I don't think we should forbid placing an OCLC or ISBN on it, in the interest of reducing redundancy. -Animalparty (talk) 18:18, 17 May 2020 (UTC)
    If you want to create a single item, then create the item for the edition, and put the ISBN on that. --EncycloPetey (talk) 20:30, 17 May 2020 (UTC)
    @Animalparty: I share your hesitation here, but I think what EncycloPetey suggests, to just create edition items, is probably a reasonable solution and will work. In the end the result will be more clearly modelled data. Iwan.Aucamp (talk) 23:27, 19 May 2020 (UTC)
  • Option B2, Edition/Version is the core item, Work is optional (for books). Per 'Petey and Iwan above, and this discussion where I was pointed to here. Late addition, will add ping notifications separately. Pelagic (talk) 04:49, 22 August 2020 (UTC)
    • (1) We already recommend that Wikidata references should cite the edition, not the work. (2) Identifiers like ISBN are edition-specific. (3) There is no info in the Work that isn’t also in the Edition (e.g. we re-enter the titles and authors on each item). (4) Where we have multiple Q-items for different editions, then the Work is required to bind them together. Work may also be needed for Interwiki links if it has a page in Wikisource or Wikipedia.
      I can’t say about the practicalities of creating constraint violations, but suggest start with this as a recommended practice with or without constraints. Open questions: How would this affect existing queries that look for instances of work (Q386724) or written work (Q47461344) but not version, edition, or translation (Q3331189)? What about non-book media? (My thoughts about academic papers are opposite to those on books, I probably should expand on that.) Should we do anything retrospectively to existing Works without Editions, or just recommend this for future? — Pelagic (talk) 04:49, 22 August 2020 (UTC)

Written Work / Book / Novel / ...

I feel like more specific guidelines regarding instance of (P31) for works along the following ideas would be beneficial. Please let me know how misguided I am:

Works should be instances of one of the subclasses of written work (Q47461344), such as dictionary (Q23622), comics (Q1004), manga (Q8274), play (Q25379), or book (Q571). Choose the most specific class that clearly applies and avoid the tautologically redundant use of multiple classes from within this hierarchy. When in doubt, gravitate to commonly used concepts such as book (Q571) and avoid more artificial concepts such as written work (Q47461344) itself unless the situation clearly requires it.

(at #Work_item_properties)

And, for editions:

Editions should be instances of version, edition, or translation (Q3331189) or one of its subclasses. Do not use items from the work-level hierarchy outlined above on editions.

Matthias Winkelmann (talk) 22:44, 31 July 2020 (UTC)

Except that book (Q571) should not be used. That term can mean a "work", or an edition, or a specific physical copy, or a format in which a work can be published. It is too common and too variable a term to be used. --EncycloPetey (talk) 00:32, 3 August 2020 (UTC)
Just remember this easy tip: whatever seems logical, don't do that. -Animalparty (talk) 06:08, 22 August 2020 (UTC)

What to put as work author if edition authors change

It's not uncommon for things like textbooks to be written by a pair or team of authors, and for the team members to change over time as the work goes through decades of editions -- some textbooks and reference works are now a modified version of something more than a century old and the entire authorship team has been replaced. So what should we put as the author in the work record? The union of all authors; the authors of the first edition; just the "lead" author of the first edition if this can be determined; or what? — Levana Taylor (talk) 16:59, 2 August 2020 (UTC)

There was some discussion recently at Wikidata:Project_chat/Archive/2020/05#Different_authors_of_different_editions, which has a couple of examples of such works. I'm not sure that it came to any conclusion. Ghouston (talk) 00:28, 3 August 2020 (UTC)
Well, if ace data modellers are having their brains broken by this, I don't feel bad about being confused :-). Now that you mention it, it does seem that a work-level author is something different than an edition level author, where Voltaire is the ultimate author of Candide no matter how many translators, editors, abridgers, and commentators put their hands to one edition, and where if there is no such unifying authorship, it's almost like there is no work author. Now I'm reminded of one of the other great unsolved book-data-modelling issues, the inability to model editions-of-editions (all the editions of Martin Gardner's annotated Alice in Wonderland are just editions of Alice in Wonderland and nothing groups them together as derived from each other). If we thought of it as starting from the most specific editions as basic units and grouped them into an ascending hierarchy of ever larger categories that might help? An author could be an author at some level and everything lower but not higher -- all sorts of properties could inherit downwards from wherever the highest place they occur is. That is not how data is structured here now, though, is it? Not at all simple to implement. — Levana Taylor (talk) 04:26, 3 August 2020 (UTC)
Some of the "editions" are also just reprintings, the only difference may be a printing date, if you are lucky. Alternatively, a reprinting may change page layout or the number of volumes. Sometimes you may be able to work out that a certain printing is most closely related to some other printing, but it's probably too hard / too time consuming to do in general, look at the edition items linked to On the Origin of Species (Q20124). Presumably only the "edition" items are supposed to correspond to works that actually exist; the "work" item is just a place to link all the edition items. It just becomes hard, if you want to list all of an author's works: often you wouldn't want to list every edition, but on the other hand, what if an author is only mentioned on a particular edition item? Ghouston (talk) 06:11, 3 August 2020 (UTC)
The nice thing about thinking of written-work data as hierarchical generalizations is that your hierarchy could have only one level if you don't know subgroupings. Even overlapping groups would be possible as long as the computer checked to make sure that they didn't conflict: you could have a group for the editions authored by Jane Jones who worked on the project between 1971 and 1984, and another for those authored by Harry Huang who worked on it between 1980 and 1986, lowest-level items could belong to either or both groups; if you added something like publisher information to those groups, but it was different between the two, the computer would have to tell you that there can't be different publishers overlapping, although authors can overlap. The difficulty is not conceptual but practical -- huge practical problems!
If you want to keep the conceptual distinction between abstract highest-level properties like topic, and specific properties like publisher, then you'd have to think of the lowest-level objects as being part of two different sorts of hierarchies. Nonetheless, both of them would inherit downward and both of them could "collapse" if there's only one specific instance of the work, the way we now avoid having separate work-items and edition-items for scholarly articles. — Levana Taylor (talk) 14:37, 3 August 2020 (UTC)
Just add the author property to edition item with all the authors of the edition and this data will have priority on the author data from the work level. Snipre (talk) 15:48, 4 August 2020 (UTC)
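The priority rule suggested here can be sketched in Python. This is only an illustration of the idea, not how Wikidata or any consumer of its data actually behaves: the `Work` and `Edition` classes and the `effective_authors` helper are hypothetical names, and the author names are the made-up ones used earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Work:
    # Hypothetical stand-in for a work-level item.
    title: str
    authors: List[str] = field(default_factory=list)


@dataclass
class Edition:
    # Hypothetical stand-in for an edition-level item linked to its work.
    work: Work
    authors: List[str] = field(default_factory=list)


def effective_authors(edition: Edition) -> List[str]:
    """Return the edition's own authors if any are recorded,
    otherwise fall back to the authors recorded on the work."""
    return edition.authors if edition.authors else edition.work.authors


textbook = Work(title="Some Textbook", authors=["Jane Jones"])
first = Edition(work=textbook)                          # no edition-level authors
later = Edition(work=textbook, authors=["Harry Huang"])  # edition-level override

print(effective_authors(first))   # falls back to the work-level authors
print(effective_authors(later))   # uses the edition-level authors
```

The open question from the thread remains what to store on the work itself; the sketch only shows that edition-level data can take precedence wherever it exists.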

  Comment I would have thought that the authors for the work are simply cumulative, though qualified, you can use start and finish time. The detail is buried in editions.

Hmm... not a bad idea, but if, before reading your comment, I had seen an author qualified with start and end times, I'm sure I would have had no idea what that was supposed to mean. — Levana Taylor (talk) 02:03, 7 August 2020 (UTC)
@Levana Taylor: The other option is to call later editions where the authors change a different work, and have one follow the other. I have only had to face this issue for multi-volume works where the editors change in time, and I cannot say that I have overly fussed the work item, and been more involved with the edition level.  — billinghurst sDrewth 13:56, 7 August 2020 (UTC)
Rather than start-end dates, it would make more sense to have a qualifier "Property:contributor to edition QID" (and not split up editions into separate works). That way the work record could act as a true summary but also you could filter out some of the information if you wanted. — Levana Taylor (talk) 03:01, 8 August 2020 (UTC)
Date qualifiers seem like an elegant way to present it. Pelagic (talk) 05:05, 22 August 2020 (UTC)

Library Reference Model (LRM) Considered?

I read with enthusiasm the good work happening here in WikiProject Books. I see a reference to the FRBR bibliographic ontology and wondered if there has been consideration of the Library Reference Model (LRM), which sought to revise and update the IFLA FRBR ontology?

Excellent Primer recorded here on the LRM: http://www.ala.org/alcts/confevents/upcoming/webinar/120518 Full LRM Document: https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla-lrm-august-2017.pdf

Unsigned comment Special:Diff/1166091991 by Jimfhahn 13:46, 25 April 2020 (UTC)

Thanks for posting here, Jim! I am not a Librarian (IANAL?!), so I hadn’t heard of LRM before reading it. I created a stub on English Wikipedia, w:en:IFLA Library Reference Model. Someone else beat me to IFLA Library Reference Model (Q54410458), which already has links to articles in Italian and French. Pelagic (talk) 23:45, 22 August 2020 (UTC)
Maybe on the project page we could change “FRBR” to “FRBR and LRM”? The main WEMI model is similar across both. Wikidata has entities for Agents (as either Humans or Organizations), and has some mechanisms for nomina via “stated as” qualifiers and “also known as” alternative labels, though I wonder how LRM deals with the “Lemony Snicket problem”. We can put statements (e.g. language) on Works that correspond to properties of the Representative Expression. Is there anything that we ought to do differently in light of LRM? Do LRM entities and relations have URIs that could be added as “same as” statements on their equivalent Wikidata items and properties? The mapping can't be exact since we collapse Expression and Manifestation into a single Edition/Version layer, and individual Items are outside our scope. Pelagic (talk) 23:45, 22 August 2020 (UTC)

inception (P571) vs publication date (P577)

Wikidata:WikiProject_Books#Bibliographic properties states that inception (P571) should be used on works and publication date (P577) on editions, that raises some questions for me, as until now I was believing that P577 should be used for both levels:

  • the edition-specific P577 description states that it should be the "date or point in time when a work was first published or released": that should be rephrased to "date or point in time when an edition was first published or released" right?
  • Should all works' P577 statements be replaced by P571 statements, or are there cases where P571 makes sense for works (and is it worth having several properties for pretty much the same role)?
  • the French label for P571 ("date de fondation ou de création" [foundation or creation date]) sounds a bit weird for a written work. Does "Inception" make sense in this context for English native speakers?

-- Maxlath (talk) 22:14, 30 August 2020 (UTC)

I guess that for most works, we only know the date of first publication. For all we know, a work may have been more or less finished years before it was published. Ghouston (talk) 23:07, 30 August 2020 (UTC)
@Maxlath:
  • I would agree but you should also ask on Property talk:P577 (and also ask for changing the constraint to remove work as an allowed value?).
  • probably not, we can't be sure if it's the date of creation or of first publication... (see @Ghouston:'s remark; I'd add the cases of posthumous publication or lost works as an extreme: a work can be created decades before it is finally published, which also resets the copyright duration by the way).
  • I don't really find it weird; creation and inception are synonyms, no? Plus, "date of foundation" and "date of creation" are both aliases for "inception", so it seems fine.
Cheers, VIGNERON (talk) 12:53, 1 September 2020 (UTC)

For "modern" works publication date (P577) is usually much easier to determine, unless the work is of high literary significance and has had studies published about the history of its creation. However, for classical and ancient works, inception (P571) is usually more meaningful as a date of origin, since those works were not "published" in any meaningful sense until centuries after they were written.

It should also be noted that Commons is applying inception (P571) to scans of editions, which is the reverse of what is currently recommended on Wikidata. --EncycloPetey (talk) 14:55, 1 September 2020 (UTC)

The ISBN13 Format Regex is broken

I believe the format constraint is wrong. Of the 2157 violations, 1833 are of the format 978-9X-.... But according to the prose explanation of the format, and the booksellers I checked, a two-digit second group is fine when it starts with a 9. Even one of the two property examples listed on the property does not validate!

To be honest, I would suggest scrapping the dashes and simplifying the format constraint to ^97[89]\d{10}$, i.e. "Start with 978 or 979 and have 13 digits" and the checksum. It's terribly annoying to try to guess where the dashes belong when websites routinely leave them out, and the data is practically useless: on the standard page for ISBNs, there's a link to this atrocity of a 40-line SPARQL query:

SELECT ?isbn13 ?statement WHERE{
 ?statement prov:wasDerivedFrom ?ref .
 ?ref pr:P212 ?isbn13 .
 BIND(lcase(?isbn13)AS?l)
 BIND(lcase("9780954771003")AS?i)
 BIND(SUBSTR(?i,4,1)AS?i41)BIND(SUBSTR(?i,4,2)AS?i42)
 BIND(SUBSTR(?i,4,3)AS?i43)BIND(SUBSTR(?i,4,4)AS?i44)
 BIND(SUBSTR(?i,4,5)AS?i45)BIND(SUBSTR(?i,6,7)AS?i67)
 BIND(SUBSTR(?i,7,6)AS?i76)BIND(SUBSTR(?i,8,5)AS?i85)
 BIND(SUBSTR(?i,9,4)AS?i94)BIND(SUBSTR(?i,10,3)AS?i103)
 BIND(SUBSTR(?i,11,2)AS?i112)BIND(SUBSTR(?i,12,1)AS?i121)
 BIND("-"AS?h)BIND(CONCAT(SUBSTR(?i,1,3),?h)AS?i13)
 BIND(CONCAT(?h,SUBSTR(?i,13,1))AS?x)
 FILTER(CONTAINS(?l, ?i)||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,1),?h,?i67,?x))||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,2),?h,?i76,?x))||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,3),?h,?i85,?x))||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,4),?h,?i94,?x))||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,5),?h,?i103,?x))||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,6),?h,?i112,?x))||
  CONTAINS(?l,CONCAT(?i13,?i41,?h,SUBSTR(?i,5,7),?h,?i121,?x))||
  CONTAINS(?l,CONCAT(?i13,?i42,?h,SUBSTR(?i,6,1),?h,?i76,?x))||
  CONTAINS(?l,CONCAT(?i13,?i42,?h,SUBSTR(?i,6,2),?h,?i85,?x))||
  CONTAINS(?l,CONCAT(?i13,?i42,?h,SUBSTR(?i,6,3),?h,?i94,?x))||
  CONTAINS(?l,CONCAT(?i13,?i42,?h,SUBSTR(?i,6,4),?h,?i103,?x))||
  CONTAINS(?l,CONCAT(?i13,?i42,?h,SUBSTR(?i,6,5),?h,?i112,?x))||
  CONTAINS(?l,CONCAT(?i13,?i42,?h,SUBSTR(?i,6,6),?h,?i121,?x))||
  CONTAINS(?l,CONCAT(?i13,?i43,?h,SUBSTR(?i,7,1),?h,?i85,?x))||
  CONTAINS(?l,CONCAT(?i13,?i43,?h,SUBSTR(?i,7,2),?h,?i94,?x))||
  CONTAINS(?l,CONCAT(?i13,?i43,?h,SUBSTR(?i,7,3),?h,?i103,?x))||
  CONTAINS(?l,CONCAT(?i13,?i43,?h,SUBSTR(?i,7,4),?h,?i112,?x))||
  CONTAINS(?l,CONCAT(?i13,?i43,?h,SUBSTR(?i,7,5),?h,?i121,?x))||
  CONTAINS(?l,CONCAT(?i13,?i44,?h,SUBSTR(?i,8,1),?h,?i94,?x))||
  CONTAINS(?l,CONCAT(?i13,?i44,?h,SUBSTR(?i,8,2),?h,?i103,?x))||
  CONTAINS(?l,CONCAT(?i13,?i44,?h,SUBSTR(?i,8,3),?h,?i112,?x))||
  CONTAINS(?l,CONCAT(?i13,?i44,?h,SUBSTR(?i,8,4),?h,?i121,?x))||
  CONTAINS(?l,CONCAT(?i13,?i45,?h,SUBSTR(?i,9,1),?h,?i103,?x))||
  CONTAINS(?l,CONCAT(?i13,?i45,?h,SUBSTR(?i,9,2),?h,?i112,?x))||
  CONTAINS(?l,CONCAT(?i13,?i45,?h,SUBSTR(?i,9,3),?h,?i121,?x)))}

Try it! ("Try it" the template says. Because it's in on the joke that the query does not even work.)

There is some information encoded in these groups, but by my estimation there are about two people on the planet that intuitively know the publisher or country just from looking at it, and neither of them are well-liked by their friends. But, in any case, the ISBN Institute (it's a thing) publishes the data needed to dasherize an ISBN if one ever feels the urge. --Matthias Winkelmann (talk) 13:43, 9 August 2020 (UTC)
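A minimal Python sketch of the simplified check proposed above: 13 digits starting with 978 or 979, plus the checksum. The function name `is_valid_isbn13` is hypothetical, and the checksum rule used here (alternating weights 1 and 3, total divisible by 10) is the standard ISBN-13/EAN-13 rule rather than anything taken from the property's current constraint.

```python
import re

# The pattern proposed in this thread; re.ASCII keeps \d to the digits 0-9.
ISBN13_PATTERN = re.compile(r"97[89]\d{10}", re.ASCII)


def is_valid_isbn13(value: str) -> bool:
    """Check an unhyphenated ISBN-13: format first, then the check digit."""
    if not ISBN13_PATTERN.fullmatch(value):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(value))
    return total % 10 == 0


print(is_valid_isbn13("9780954771003"))      # the ISBN used in the query above
print(is_valid_isbn13("978-0-9547710-0-3"))  # hyphenated input is rejected by this check
```

Hyphenated values are deliberately rejected in this sketch; a caller that wants to accept them would strip hyphens and spaces first.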

There is https://www.loc.gov/publish/pcn/isbncnvt_pcn.html to hyphenate ISBNs. Guessing the locations of the hyphens is not possible. The blocks have variable lengths in order to have enough numbers for a few large publishers, but also numbers for many small publishers. --Pasleim (talk) 19:40, 10 August 2020 (UTC)
+1 with @Pasleim:, no guessing needed or involved here. And the hyphens are needed and should be kept for many obvious reasons already explained. « there are about two people on the planet » I'm pretty sure there are more than 2 librarians/booksellers/publishers/editors on the planet :P
That said, @Matthias Winkelmann: is right and the constraint is indeed wrong and should be corrected (and maybe completed and/or split into several simpler constraints to make them more understandable). Could someone locate the problem and fix it? edit: the more I look, the more I find errors in this regex (most if not all 979 generate errors too). I suggest adopting a simpler regex, something like 97[89]-\d{1,5}-\d{2,7}-\d{1,6}-\d maybe?
@Matthias Winkelmann: the query does seem to work fine, what is the problem you've seen? I see no result but that's to be expected, no item uses this ISBN as a ref, does it?
Cdlt, VIGNERON (talk) 11:49, 24 August 2020 (UTC)
I've edited the query above to include the ISBN13 of Ulysses (2004) (Q28599849), and it doesn't find the item. Neither in the format without dashes, nor with them. The main search finds the edition, but only if you manage to enter the ISBN with dashes, in all the right places. Even assuming that query can somehow be fixed, it is completely insane. This is probably the most well-known identifier there is. But to use it requires copy and pasting this...thing, which probably consumes half of the allotted time the query service gives you. In reality, finding items by ISBN would tend to be just part of a query. But any such query is immediately rendered unreadable and, for people with remnants of professional pride, unsharable.
The main problem with the dashed format is that ISBNs formatted in that way are not unique: there are dozens of different ways to split 13 numbers into five sets that satisfy the regular expression constraint, even if we manage to fix it.
Yes, there is in fact the one "true" format for any given ISBN. But because that truth is essentially a list of ISBNs (instead of some rules), users have no way of getting to that format from a dash-less ISBN except to find some tool that does it for them. That probably happens quite often: I just checked Google, OpenLibrary, and Amazon. The first two show ISBNs without any dashes, while Amazon splits off the initial 978. With the possibilities as they are, we are indeed capable of running a bot to occasionally correct them. But we can't offer any help at time of entry. That means users wishing to add an ISBN get constraint violations in return, and not knowing about the bot, or the relative weakness of constraints' prescriptive power, will waste time or give up in frustration. I believe the lack of citations for even potentially controversial statements on high-profile items is among WD's top problems, and entering ISBNs is on the critical path for adding them (even though it's a small part of how terrible that process is, but I digress...).
A user might then add dashes in appropriate places that satisfy the regex without actually being "correct" (this is what I meant by "guessing", above). Unless that bot does its work immediately, this leads to several problems: multiple items can have the same ISBN, but they will not be marked as such because they are cosmetically different (this is also a problem between ISBN10s and 13s). They also cannot be found by search, except by trying all possible combinations. Reconciliation is unlikely to work, etc. Essentially none of the benefits of an identifier actually materialise, even though the ISBN system is pretty close to being the "canonical", widely known, identifier.
I'm not too invested in this issue since books aren't items I usually work with for my actual job, and it's fine if people want to fix the obvious problems, and the world will keep turning. But if it's the work involved that scares people, I'd be willing to handle everything that's within my power. --Matthias Winkelmann (talk) 18:33, 24 August 2020 (UTC)
@Matthias Winkelmann: « it doesn't find the item » oh I see now! This query was not made to find the item but to find the items citing this ISBN in references. Indeed, we probably should change that; we could either make 2 separate queries or expand the existing query, both are very easy (it's just the first two lines of the query: replace these first two lines by ?item wdt:P212 ?isbn13 . and add ?item in the SELECT to find the item itself). Just tell me which is best and clearer.
« A user might then add dashes in appropriate places that satisfy the regex without actually being "correct" » is it even possible? For a given ISBN there is one and only one possible position for the dashes (AFAIK), so if the regex is correct (which is a big if, the current regex still not being perfect) then the ISBN is correct too. I don't see any exception or how "guessing" can be wrong.
Cheers, VIGNERON (talk) 07:28, 25 August 2020 (UTC)
The hyphenation of ISBNs is a presentation "human readable" form, and spaces are just as correct as hyphens, according to the (latest, 7th ed) ISBN Users' Manual (section 5). "Note: The use of hyphens or spaces has no lexical significance and is purely to enhance readability."
The position, or lack of, hyphens or spaces does not change the 13 digit ISBN id number. The current regex is over-restrictive and is not a good way to store ISBNs in a database. If hyphenation is important, it should be added back in the view layer, and not affect storage or querying. This is why other publishing and library organisations use un-hyphenated formats for cataloguing and interchange (see the ONIX metadata description in section 8 of the User's Manual). ISBN-13 (P212) is a subproperty of Global Trade Item Number (P3962), but anything matching the current ISBN regex cannot validate against the EAN13/GTIN13 regex, which would be appropriate, and potentially useful. Storing hyphens in ISBN is akin to storing thousands separators in a database, and I don't consider that an exaggeration. I can see why ISBNs on Wikipedia might want to have hyphens for citations, which are more of a display format, and my opinion on that is less formed. For a sub-property of Wikidata property to identify books (Q29547399) I think hyphens are inappropriate.
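The storage argument can be illustrated with a short Python sketch. It assumes, as the Users' Manual quote above says, that hyphens and spaces carry no lexical significance; `normalize_isbn13` and `same_isbn` are hypothetical helper names, not part of any Wikidata tooling.

```python
def normalize_isbn13(value: str) -> str:
    """Drop hyphens and spaces, which are presentation only."""
    return value.replace("-", "").replace(" ", "")


def same_isbn(a: str, b: str) -> bool:
    """Compare two ISBNs by their digits, ignoring how they are grouped."""
    return normalize_isbn13(a) == normalize_isbn13(b)


# Two differently grouped spellings of one identifier compare equal
# once the separators are removed.
print(normalize_isbn13("978-7-5436-4474-8"))
print(same_isbn("978-7-5436-4474-8", "978 7 5436 4474 8"))
```

Comparing on the normalized form is what lets duplicate detection, search and reconciliation work regardless of how a contributor typed the separators.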
The LOC ISBN hyphenation form linked above can easily be outwitted by entering an ISBN from a non-Bowker region, e.g. for the Chinese ISBN 9787503829901, the forms 978-7-50-382990-1, 978-7-503-82990-1 and 978-7-5038-2990-1 are all hyphenation guesses that validate against the regex. For this ISBN, LOC says "Unable to hyphenate this ISBN!" The 'safest' guess of 978-7-50382990-1 does not validate. The regex demands an arbitrary hyphen to split the last non-check-digit block. Putting that hyphen in the wrong place does not change the registrant or their publication element; that's always correctly represented by the ISBN, it simply makes it harder to search for or compare. Unfortunately putting the hyphen in the correct place also makes search and comparison at scale harder.
Another example is 9787543644748, which according to the printing on the book is correctly hyphenated as 978-7-5436-4474-8. The Bowker based online tools cannot split this as Bowker does not administer that region/range, and the https://www.isbn-international.org/range_file_generation mentioned does not help correctly split the 'registrant' from the 'publication element' as it does not list individual registrants by region. I believe this provides examples of what 'guessing' means, and also that there is no single source of information to make the full 'dasherization' for all possible ISBN -- the link from the official ISBN Institute is not sufficient. A lot of the publisher registrant information is commercially protected also, so I wouldn't consider that information is even supposed to be freely or publicly available.
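To make the "guessing" concrete, here is a Python sketch that enumerates every five-group hyphenation of one ISBN accepted by the looser pattern suggested earlier in this thread (97[89]-\d{1,5}-\d{2,7}-\d{1,6}-\d). That pattern is used only as a stand-in: it is not the property's actual format constraint, and `candidate_hyphenations` is a hypothetical helper.

```python
import re
from typing import List

# Looser pattern proposed earlier in the thread, NOT the live constraint.
LOOSE_PATTERN = re.compile(r"97[89]-\d{1,5}-\d{2,7}-\d{1,6}-\d", re.ASCII)


def candidate_hyphenations(isbn13: str) -> List[str]:
    """List every way to cut the nine middle digits into three groups
    such that the hyphenated result matches LOOSE_PATTERN."""
    prefix, body, check = isbn13[:3], isbn13[3:12], isbn13[12]
    results = []
    for i in range(1, len(body) - 1):        # end of the first middle group
        for j in range(i + 1, len(body)):    # end of the second middle group
            candidate = "-".join([prefix, body[:i], body[i:j], body[j:], check])
            if LOOSE_PATTERN.fullmatch(candidate):
                results.append(candidate)
    return results


candidates = candidate_hyphenations("9787503829901")
print(len(candidates))
for candidate in candidates:
    print(candidate)
```

For 9787503829901 this yields 20 distinct strings, including the three guesses quoted above, all of which denote the same 13 digits. A format check of this shape cannot tell which one matches the registrant ranges.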
The 2 people who can interpret the split groupings can still do with or without the explicit hyphens or spaces, and if it's important, it's generally better to strip out any well-meaning hyphens and form the groupings specific to the particular registrant or region of interest. At least that's my experience, maybe I shouldn't speak for the other one ;)
I support a simplification of the ISBN regex (@Matthias Winkelmann:'s first regex suggestion looks like a correct ISBN 13 validation to me) rather than more bot effort into splitting stored ISBNs into (possibly incorrect?) human readable forms. Removing hyphens would make the data more useful and interoperable with other datasets. Any existing tools which can reliably make the splits should be used usefully in display contexts. Salpynx (talk) 23:24, 3 September 2020 (UTC)

Two WikiCite grant programs - applications closing soon

I just wanted to make sure that people who frequent this project page are aware of these two grant programs currently open, highly relevant to this Wikiproject's activities. They were announced on the main project chat a month ago (and various other places) but I wanted to write here specifically too. Apply by 1 October.

1. Project & events [$2-10k]

2. e-Scholarships [per-diem calculated on your city; 1-5 people (single, or as a 'remote group') for 2-4 days, for COVID-era "stay at home" projects. Paid in advance living allowance, no expense report required.]

There is lots of documentation, eligibility requirements, selection criteria, program design principles at those links. Please check them out. Sincerely, LWyatt (WMF) (talk) 13:58, 18 September 2020 (UTC)
