Wikidata talk:WikiProject Books/2018

Berg Encyclopedia of World Dress and Fashion

Berg Encyclopedia of World Dress and Fashion (Q4891400) is a 10-volume encyclopedia published in hardcover, ebook, and online. Each volume has its own ISBNs, DOI, editors and subject area (e.g. African dress). Should I make a work item for each volume or can they all be editions of one work item Berg Encyclopedia of World Dress and Fashion (Q4891400) annotated as to volume number, editors, and subject area? (see contents) - PKM (talk) 23:55, 12 December 2017 (UTC)

After much thought, I am going to add a single edition for the online Encyclopedia and just include the volume information and reference URLs in individual references until I see if this source is useful enough to create items for each volume. - PKM (talk) 20:08, 14 December 2017 (UTC)
I have now added Berg Encyclopedia of World Dress and Fashion (Q55816150) specifically for the online edition with <has part> items for each of the 10 volumes. I'll be going through and updating my <stated in> references over time. - PKM (talk) 23:50, 29 July 2018 (UTC)

Proposing change to qualifiers— remove P248, add P805

Following a discussion in Wikidata:Project chat about the constraint that stated in (P248) is only to be used for references, I will replace it with the identified preferred property, statement is subject of (P805). I will look to set up a references section on the same page and have P248 entered there.  — billinghurst sDrewth 04:50, 14 December 2017 (UTC)

I don’t get why you need a qualifier and not a reference. The examples do not help. author  TomT0m / talk page 16:42, 11 January 2018 (UTC)
@TomT0m: Where we are using described by source (P1343) it is not a reference, it is a qualifier to the work, as such it is a statement of origin. There has been widespread use of P248 in this situation, and this is a constraint violation, see Property:P248. So this is to offer a contextually corrected property to use for P1343.  — billinghurst sDrewth 04:55, 13 January 2018 (UTC)
See Charles Dickens (Q5686) for some examples of difference. If you have the "show constraint violations" gadget operating, you will see the highlights.  — billinghurst sDrewth 04:58, 13 January 2018 (UTC)

NB: I changed behavior of s:ru:Модуль:Другие источники to use P805 instead of P248 (diff). -- Sergey kudryavtsev (talk) 06:50, 13 January 2018 (UTC)

NB2: w:ru:Модуль:External links uses P805 too.

@billinghurst, TomT0m: Can you run a bot to replace P248 with P805? -- Sergey kudryavtsev (talk) 07:02, 13 January 2018 (UTC)

Well outside my skill set. I have placed this request for a bot to be run. There is a discussion section there if there is any comment to be made about the requested replacement.  — billinghurst sDrewth 10:53, 13 January 2018 (UTC)
Under way with user:PLbot undertaking.  — billinghurst sDrewth 13:21, 24 January 2018 (UTC)
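For readers who want to see the shape of the requested replacement, here is a minimal sketch in Python. It assumes a simplified, hypothetical statement layout (plain dictionaries, not the real Wikibase JSON), and the identifiers `Q_SOURCE`, `Q_ARTICLE` and `Q_BOOK` are placeholders. It is not the code the bot ran; it only illustrates moving stated in (P248) to statement is subject of (P805) among the qualifiers of described by source (P1343) statements while leaving references alone.

```python
from copy import deepcopy

STATED_IN = "P248"                 # only meant for references
STATEMENT_IS_SUBJECT_OF = "P805"   # preferred as a qualifier
DESCRIBED_BY_SOURCE = "P1343"


def replace_qualifier(statement, old=STATED_IN, new=STATEMENT_IS_SUBJECT_OF):
    """Return a copy of `statement` whose `old` qualifier values are moved to `new`.

    Only the "qualifiers" mapping is changed; "references" are left as they are.
    """
    fixed = deepcopy(statement)
    qualifiers = fixed.setdefault("qualifiers", {})
    if old in qualifiers:
        moved = qualifiers.pop(old)
        qualifiers.setdefault(new, []).extend(moved)
    return fixed


def fix_statements(statements):
    """Apply the replacement to described by source (P1343) statements only."""
    return [
        replace_qualifier(s) if s["property"] == DESCRIBED_BY_SOURCE else s
        for s in statements
    ]


# Hypothetical example statement (placeholder identifiers, simplified layout).
example = {
    "property": DESCRIBED_BY_SOURCE,
    "value": "Q_SOURCE",
    "qualifiers": {STATED_IN: ["Q_ARTICLE"]},
    "references": [{STATED_IN: ["Q_BOOK"]}],
}
fixed_example = fix_statements([example])[0]
```

In this sketch the original statement is not modified, so a before/after comparison stays possible.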

Pseudonyms

Currently it seems we are assuming that pseudonyms do not have their own items. That is not the case in external databases that assign a proper identifier (a Qid here) to pseudonyms. This raises questions from our users (see Talk:Q7245 or Topic:U5ied71lz96i7r8m). I think we should think about this. Any previous discussion about this? Any known documentation? author  TomT0m / talk page 17:07, 11 January 2018 (UTC)

There are definitely cases where the pseudonyms have items, though the article needs to be about the pseudonym, not about the individual, i.e. there is an article at WP because the pseudonym itself is notable. I have seen this more in cases of collective pseudonym (Q16017119).  — billinghurst sDrewth 05:02, 13 January 2018 (UTC)
Here is a related question: is there a case where a Wikimedia project has two different pages, one for the person and one for the pseudonym (and is it a common practice on a Wikimedia project)? I don't know of any, but if there is, Wikidata would have to deal with it. Cdlt, VIGNERON (talk) 11:14, 13 January 2018 (UTC)
@VIGNERON: check for instance of (P31) -> pseudonym (Q61002) and see what is there. I know that there are articles for collective pseudonyms.  — billinghurst sDrewth 12:52, 13 January 2018 (UTC)
My unknown case is when we know a text is signed with a pseudonym but we know nothing about who the author actually is. Is the currently proposed model
⟨ text ⟩ author (P50)   ⟨ unknown value ⟩
credited as ⟨ string pseudonym ⟩
 ?
This raises another question: a pseudonym is supposed to be its own identifier. What happens if several authors use the same pseudonym at some point in time, and we don’t know who one or two of them actually are, but we are rather sure they are not the same author? In other words, there are two « personas », each with an author, sharing the same signature. Are there anonymous authors of whom we only know the pseudonyms? If an anonymous author has several pseudonyms and we create a « human » item for each, this implies that one person can have several « human » Wikidata items. This is not the case if we choose to have « persona » items. If we have « persona » items, we can also refer to a pseudonym without using its signature string, and we can have several « personas » for one pseudonym string. author  TomT0m / talk page 12:27, 13 January 2018 (UTC)
If there is no article/item for an author, just a pseudonym, and an unknown one, you probably should just consider using author name string (P2093). I see little point generating items for people who are basically anonymous.  — billinghurst sDrewth 12:53, 13 January 2018 (UTC)
Good questions.
The precise meaning and use of subject named as (P1810) is clearly unclear (there is a constraint, used as qualifier constraint (Q21510863), but the given example is a direct property :/ I will raise this point on the talk page, but there are other unclear points, among others: is it limited to people or not?). Nonetheless, your model seems good, with just one detail: it's not always an unknown value; it can be used with a known value too, for alternative names which act the same way as pseudonyms
⟨ some old edition of the 'Sonnets' ⟩ author (P50)   ⟨ William Shakespeare (Q692)      ⟩
subject named as (P1810)   ⟨ Shake-speares ⟩
(and with statement is subject of (P805) = spelling of William Shakespeare's name (Q7575898)). author name string (P2093) is a good solution too (but it depends on the context).
At least, one point is sure : anonymous (no name) and pseudonymous (some name) are mutually exclusive. It's either one or the other.
Cdlt, VIGNERON (talk) 12:56, 13 January 2018 (UTC)
@VIGNERON: I can also see it being used as qualifier to a reference, see Chet Baker (Q2274) which is causing constraints issues too. From my reading of the English description, it is used for proper nouns, rather than people.

Re your Shakespeare example, does it not come under my earlier explanation? I would have said that would just be the addition of the pseudonym property item added to Shakespeare, and then on the work, use author -> Shakespeare, then qualify with "named as" -> given pseudonym/alternate spelling/whichever  — billinghurst sDrewth 15:20, 13 January 2018 (UTC)

@billinghurst: indeed, I see that this point is already discussed on Property talk:P1810.
Maybe, but I'm not sure I understand which « earlier explanation » you are talking about.
To get back to the original question, some databases have several identifiers but some have only one (BnF has only one for Samuel Clemens/Mark Twain). Cdlt, VIGNERON (talk) 15:52, 13 January 2018 (UTC)
Once more we’re discussing a global issue (pseudonyms) by looking at the small picture. This tends to spread discussions everywhere :( This amounts to questioning Wikidata's objective on this. I tend to think we’re one place where we can add information that is not held by other databases. Wikidata has a large scope and tends to be inclusive. As a consequence, I think we should allow it to hold information about personas. author  TomT0m / talk page 16:04, 13 January 2018 (UTC)
@billinghurst: I wonder if the lack of a « persona » concept in this model makes it rather hard to treat cases in a generic way. There are a lot of properties and ways to use them. It is hard to take all the possible cases into account without forgetting something. If a writer likes to play with the histories of his identities and invents false biographies for them (see https://en.wikipedia.org/wiki/Romain_Gary for example, who let his cousin play the role of one pseudonym for the press), it is hard to model any of this. If we consider « Emile Ajar » a fictional character, then we can have an item for it and link it to the item of Gary’s cousin. Authors have also been known to change pseudonyms according to the field of work, e.g. Special:EntityPage/Q309240, who signed « Moebius » only for his science fiction work. We can’t really link the pseudonym with science fiction properly if we don’t have an item for Moebius. As a qualifier for the pseudonym maybe … but that’s a limited approach. Also, a single persona may have several signature strings. The « persona item » model allows us to treat all kinds of corner cases elegantly, and it seems to me easier to query while being more flexible. I think we should have « persona » items and a property to link them to their puppeteers. author  TomT0m / talk page 15:56, 13 January 2018 (UTC)
(ec) I said above:
There are definitely cases where the pseudonyms have items, though the article needs to be about the pseudonym, not about the individual, i.e. there is an article at WP because the pseudonym itself is notable. I have seen this more in cases of collective pseudonym (Q16017119).
So no items for pseudonyms unless there is a wikidata item that says "this is a pseudonym" and not about the person for who it was a pseudonym.

So, for where there are multiple authority controls they are usually both entered against the person and each is qualified with "named as." If there is more than one BnF, then it will have corresponding multiple VIAFs, and it is my understanding that this will put the duplicates into a queue to be considered for merging.  — billinghurst sDrewth 16:02, 13 January 2018 (UTC)

@TomT0m: You can list multiple pseudonyms against one author. The task is to link a work to the author, irrespective of the name used, where the additional names are qualified.  — billinghurst sDrewth 16:05, 13 January 2018 (UTC)
If someone is creating false biographies for a pseudonym, then that sounds like it reaches into one of those where an article is being written about the pseudonym, and it does get its own item.  — billinghurst sDrewth 16:07, 13 January 2018 (UTC)
Then remember that I discussed collective pseudonym (Q16017119) so Ellery Queen (Q586362) and Michael Field (Q839369) have articles and have multiple people involved.  — billinghurst sDrewth 16:09, 13 January 2018 (UTC)
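The approach described above (link the work to the author's item and qualify the statement with the name actually used, falling back to an author name string when no item exists) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `AuthorCredit` class, the `LABELS` table, the text keys and the name "A. Nonymous" are hypothetical, and nothing here is an actual Wikidata API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AuthorCredit:
    """One author (P50) statement on a text, in a simplified, hypothetical form."""
    author_item: Optional[str] = None   # Q-id of the person, None if unknown
    named_as: Optional[str] = None      # qualifier: name as printed on the text
    name_string: Optional[str] = None   # author name string (P2093) fallback


# Hypothetical label lookup; on Wikidata this would come from the item itself.
LABELS: Dict[str, str] = {"Q692": "William Shakespeare"}


def printed_name(credit: AuthorCredit) -> Optional[str]:
    """Name to display: the qualifier if present, else the label, else the string."""
    if credit.named_as:
        return credit.named_as
    if credit.author_item:
        return LABELS.get(credit.author_item, credit.author_item)
    return credit.name_string


def texts_by(credits: Dict[str, AuthorCredit], author_item: str) -> List[str]:
    """All texts linked to one person, irrespective of the name used on them."""
    return sorted(t for t, c in credits.items() if c.author_item == author_item)


credits = {
    "old_sonnets_edition": AuthorCredit(author_item="Q692", named_as="Shake-speares"),
    "modern_edition": AuthorCredit(author_item="Q692"),
    "anonymous_pamphlet": AuthorCredit(name_string="A. Nonymous"),
}
```

A « persona » item, as proposed earlier in the thread, would instead put a second Q-id in `author_item`; this sketch does not model that alternative.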

How to include books in a practical manner

I have read the documentation and there is only one concern that I have. It is wonderful as a database but it fails me in several ways. I want to add all the books of all the authors we know. The objective is to provide information about books that are available for reading.

When I read about the database model, I find that there is nothing practical in there. The notion that LUA should be the glue to bind it all is not even an excuse. There are a few scenarios that I want an effective answer for.

  • I want all Wikisource books to be effectively registered so that we know what books are available for reading in what language. I really want us to advertise those books, I want them to be read.
  • I want us to import all books from the Open Library that have an author we have an identifier to the Open Library for. I do not mind to restrict it at first to include only the books with ebooks. To be truthful, I also want to include the books the Biodiversity Heritage Library has at the Internet Archive. For them we have to import many more authors .. but it is an option to treat them like we do scientific publications where authors are only added at a later date.

Now, when it comes to database design: it is one thing to subscribe to what libraries do, and it makes sense when we accomplish things in this way. My challenge is how we can effectively register books and find an audience for these books. Thanks, GerardM (talk) 19:03, 25 January 2018 (UTC)

For Wikisource texts, work is being done now, by frwikisource and Tpt, to allow a rather automatic import of texts as editions and to ease the creation of work items. But it is not complete yet. You may read what's been done for now here (sorry, it's in French). --Hsarrazin (talk) 19:21, 25 January 2018 (UTC)
That is cool, even important. It is obvious that without data we cannot do much. But how is this going to enable more readers? How will this be a template for all the other Wikisources? How about all the other issues that I raise? To paraphrase a Wendy's advert: Where is the beef? Thanks, GerardM (talk) 19:53, 25 January 2018 (UTC)
My two cents: Wikisource is still in the initial stages of adding to WD, and only the French and English Wikisources are really large enough and varied enough to be doing much. Many other Wikisources are small, poorly staffed, and have little oversight to maintain consistent formatting and data. Even on the English Wikisource, we face the issue that many older works and editions are so poorly curated, that they practically have to be done over again from scratch.
We've managed to do a decent job of adding authors and author data, but works, editions, and translations still have many challenges to overcome. I have requested a customizable tool for the addition of Wikisource works, but such tools seem to take low priority with the developers, who favor Wikipedia-tools because of the much larger participation. --EncycloPetey (talk) 00:59, 9 February 2018 (UTC)

Works

What is the current best practice on instance of (P31) for non-fiction works? And if book (Q571) is not the correct P31 for works - and I am sure it's not - could we please change the example on the project page? - PKM (talk) 19:47, 8 February 2018 (UTC)

I would say it depends on the item being added. For entries in the 1911 Encyclopædia Britannica, it's common to use encyclopedia article (Q17329259). There are also options for textbook (Q83790), academic journal article (Q18918145), etc. The use of book (Q571) is simply the most generic sort of example, and sometimes the only meaningful option. --EncycloPetey (talk) 00:54, 9 February 2018 (UTC)
I would say: in theory, all documents, fiction or non-fiction, should follow the FRBR. In practice, since almost all non-fiction has only one edition (in the FRBR sense) per work (in the FRBR sense), there is no real need to use the FRBR and most wikidatians only create one item (which ideally is more or less wrong but pragmatically is more or less right). But if you follow the FRBR, I see no reason not to use book (Q571) for the FRBR work or, as @EncycloPetey: said, any subclass of it; for instance, a general and obvious choice is non-fiction work (Q20540385). Do you have a specific work in mind? Cdlt, VIGNERON (talk) 07:41, 9 February 2018 (UTC)
Mmm we’re actually not really « using FRBR ». You mean « create a work item » ? author  TomT0m / talk page 12:22, 9 February 2018 (UTC)
@TomT0m: mmm too, the first sentence of the first section on Wikidata:WikiProject Books is literally « We used the Functional Requirements for Bibliographic Records (FRBR) model ». We adapted it (like everybody: nobody uses exactly the FRBR, and even the FRBR has adapted itself several times since 1997), but adaptation is still usage. Anyway, that doesn't matter that much; as PKM was speaking of « non-fiction works », I guessed (maybe wrongly) that she was indeed talking about creating a work item. Cdlt, VIGNERON (talk) 13:16, 9 February 2018 (UTC)
I’m of the opinion that, if we create a single item, it’s maybe best to create the work one anyway. author  TomT0m / talk page 14:12, 9 February 2018 (UTC)
@TomT0m: well yes, I think I get your idea but if you have only one item, you're outside the FRBR and work/edition separation, the item is neither and for the constraints you have to be both. Cdlt, VIGNERON (talk) 14:51, 9 February 2018 (UTC)
I don’t understand. FRBR describes a model; it does not require us to have items for every part of it? Or does it? author  TomT0m / talk page 15:15, 9 February 2018 (UTC)
Nobody is going to hold a knife to a Wikidatian's throat to make them create both an item about the work and one about the edition. But logically, we are creating items about works and editions. One is less meaningful and useful without the other. Cdlt, VIGNERON (talk) 15:21, 9 February 2018 (UTC)
@VIGNERON: The problem with using non-fiction work (Q20540385) for instance of (P31) is that it's not simply a form ("instance") but a form/genre combination item. That is, "book" is a form but "non-fiction" is a genre. So I wouldn't use that value at all. I would also point out that many non-fiction works have gone through multiple editions. I have books on my shelf about anatomy, botany, Greek theatre, and Latin grammar, as well as dictionaries, encyclopedias, biographies, writing guides, and statistical reference works which have all gone through multiple editions. --EncycloPetey (talk) 16:41, 9 February 2018 (UTC)
@EncycloPetey: There is nothing in instance of (P31) or in Help:BMP that says it classifies works of art by form. It’s a generic property that can handle classification by genre as well; it classifies by many criteria (and that is its strength: no need to reinvent the wheel to classify things). As both genre classes and form classes are subclasses of « work », it follows that there is no problem in creating a subclass of both non-fiction and books. We don’t have to, though: using only instance of (P31), we could just as well add statements with the two values. It seems practical, however, to create such classes for common combinations. author  TomT0m / talk page 17:38, 9 February 2018 (UTC)
@TomT0m: Yet we have no guiding philosophy or principle on this matter. I would argue that instance of (P31) should be limited to a form or structure, and leave the genre (for fiction works) to its own separate statement, and likewise the main subject (for non-fiction works) should be kept separate from the "instance of" statement to the greatest extent possible. --EncycloPetey (talk) 17:46, 9 February 2018 (UTC)
@EncycloPetey: I’d argue that other ontology projects have handled taxonomies with a small number of properties (two, actually: one to link instances to their class and one for the subclass relationship) and several class trees, instead of creating one property for each taxonomy, with great success. See for example https://en.wikipedia.org/wiki/OBO_Foundry (and they have many class trees). Following their path would probably help interoperability if we share common principles with them. And we will have a hierarchy of artistic genres anyway (if not several, as there may be several ways to classify genres), so having a specific property to deal with them is not much help in my opinion. author  TomT0m / talk page 18:02, 9 February 2018 (UTC)
@TomT0m: Unfortunately, that is an encyclopedic categorization primarily for a single scientific subject field, and for a project like Wikisource, the structure quickly collapses. On Wikisource, we have followed the classification principles of the w:Library of Congress Classification. --EncycloPetey (talk) 18:26, 9 February 2018 (UTC)
@EncycloPetey: This is quite a large field, with many subfields and subontologies that are designed to work well together, which is not easy, as there are many, many ways to model things such that the models won’t be easily combinable with each other. It is definitely comparable with Wikisource in complexity, if not way more complex. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2814061/ for example. From a quick look at Library_of_Congress_Classification, it’s actually one topic classification among many others, like the ACM one for example; there is no objective reason to align with only one. As Wikidata is inclusive, this highlights the fact that we will indeed have to deal with several classification systems. Plus, Wikidata has a very rich system of items to precisely describe the topics of a book and the relations between them, so it might be more efficient to use the « topic » property for the precise subject and to use the knowledge-field items themselves to find close topics, rather than a rigid topic classification tree designed for the pre-digital library era. author  TomT0m / talk page 19:01, 9 February 2018 (UTC)
And as for the collapsing, I don’t know what you are referring to precisely, but if you’re referring to the category system and its loops, instance of (P31) and subclass of (P279) are in no way comparable to it. Sure, there are problems in our class tree, but projects like OBO give very good principles to avoid them by design. We try to ensure that a class is never a subclass of itself through a chain of « subclass of » statements, for example, as by definition that would mean all of the classes in that chain are equal. Classification (should) obey strong, well-established principles like the https://en.wikipedia.org/wiki/Type%E2%80%93token_distinction, while categories are … way less structured. author  TomT0m / talk page 19:01, 9 February 2018 (UTC)
@PKM: could you give some context or an example so we can see more clearly here? Cdlt, VIGNERON (talk) 14:51, 9 February 2018 (UTC)
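The rule that no class should reach itself through a chain of « subclass of » statements can be checked mechanically. The following is a minimal Python sketch using a depth-first search; the function name `find_subclass_cycle` and the toy class names are hypothetical, and the input is assumed to be a plain mapping from each class to its direct superclasses rather than live Wikidata data.

```python
from typing import Dict, List, Optional


def find_subclass_cycle(subclass_of: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one chain of classes that loops back on itself, or None.

    `subclass_of` maps each class to its direct superclasses, i.e. the
    values of its subclass of (P279) statements.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in subclass_of.get(node, ()):
            if state.get(parent, UNSEEN) == IN_PROGRESS:
                # The parent is already on the current chain: a loop.
                return path[path.index(parent):] + [parent]
            if state.get(parent, UNSEEN) == UNSEEN:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(subclass_of):
        if state.get(node, UNSEEN) == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None


# Toy hierarchies with made-up class names.
clean_tree = {"novel": ["book"], "book": ["work"], "non-fiction book": ["book"]}
looping_tree = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

The recursion is fine for toy inputs; a very deep real class tree would need an iterative version to stay within Python's recursion limit.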
Sure! In general, I always make both work and edition items for my references, since (1) I am not always using the most recent edition of physical books and (2) I frequently use Oxford reference works which have separate editions for the online edition which will have a different date and ISBN than the physical publication. Also, I make heavy use of "main subject" and "genre" on these.
The work I was struggling with most recently was Patterns of Fashion 4 (Q48046762), a work on costume history, (edition: Patterns of Fashion 4 (Q48047403)). I used "book" here because I'm not sure what is better.
Another example is The Concise Oxford Companion to English Literature (Q47463825) (editions: The Concise Oxford Companion to English Literature (Q47463849) online 3rd, The Concise Oxford Companion to English Literature (Q47463828) online 4th). I used "creative work" here, again with some uncertainty. I have used "reference work" for items identified as "dictionary of..." or "encyclopedia of..." in the past, but I'm not sure that's right for a general history of something. - PKM (talk) 19:24, 9 February 2018 (UTC)

can someone help me create a property for books?

hello! I've never created a new property. Is someone available to help? thanks! שילוני (talk) 14:03, 12 February 2018 (UTC)

Some tricky properties of a book

I'm clearly out of my depth here. It would be appreciated if someone else can take on cleaning up Q1366818 (the book Escape to Life) and then ping me to look at how it would be done correctly.

BEGIN: copied from Wikidata:Project chat.

Q1366818 (the book Escape to Life) presents an interesting situation on several counts. I'm wondering what, if anything, of the following we can somehow convey.

  • The book was originally published in 1939 by Houghton Mifflin. We have an existing entity Q390074 for present-day publisher Houghton Mifflin Harcourt, but not for this predecessor. It would be inaccurate to say that the book was published by Houghton Mifflin Harcourt; what should we do?
  • Klaus and Erika Mann originally wrote the book in German, but it was first published in English translation. A German edition did not come out until 1991. Is there any way to convey that the book was written in German, but first published in English? Is there any way to indicate the first German edition as being just that?

Jmabel (talk) 05:37, 16 February 2018 (UTC)

@Jmabel: - There's quite a lot of prior art at Wikidata:WikiProject Books; they seem to list the pertinent statements for the Work, and for the Edition, as far as i can see from a quick glance. (Sorry I'm pointing you elsewhere rather than answering in detail.) hth --Tagishsimon (talk) 09:02, 16 February 2018 (UTC)
Interesting. As a relatively casual user of Wikidata, how would I be likely to have found that page, other than coming here to ask? - Jmabel (talk) 16:13, 16 February 2018 (UTC)
@Tagishsimon: Even after reading that page, I don't see answers to either of the questions I asked above. Did you read the page and see answers to my questions? Or was this just "there's a lot of stuff about books at Wikidata:WikiProject Books, your questions might be answered there"? I think someone more expert than I on Wikidata would do well to see if this can be expressed with current properties (and if so I'd be interested in learning how). In particular, I'm guessing that for Houghton Mifflin there is some way to do this with custom properties, but I haven't been able to work out how to create one of those. - Jmabel (talk) 16:27, 16 February 2018 (UTC)
@Jmabel:
In fact, Escape to Life (Q1366818) is flawed because it is defined as a work (Q7725634) but contains publication information. If you read the Wikidata:WikiProject Books page, you've seen that works and version, edition or translation (Q3331189) are two different types. The work should contain only information about the authors, the original language (German), the original (German) title, the genre, and links to edition items.
Information about editions, both in English and German, must each go into a version, edition or translation (Q3331189) item, which would then have all the properties about the publisher, the year of publication, the title of the said publication, etc., like a traditional library catalog. There must be a different item for each edition, and it would be preferable if you could add a library ID for the edition, LoC for example, to be able to differentiate editions and have a reference.
Then, each edition is linked to the work item through edition or translation of (P629), and in the work item, you may link to the publications through has edition or translation (P747). Then, you can indicate on the English edition that it was the first edition, like I did with Escape to Life (Q48914392). It should also be done for the first German edition, for which I have no info at all.
This may seem a little complicated, but it is the only way to manage data about the work and data about the different editions without mixing them up.
If you need help, you may seek it on the discussion page of the project.
As for your question about the publisher, on Houghton Mifflin Harcourt (Q390074) I see it was created in 1880, so it is the right publisher. Publishers often change their name through time, and it is written differently on many books, and it is still the same publisher... If it is the actual denomination in 1939 that bothers you, you can add an object named as (P1932) qualifier to set the exact name of the publisher at the time of publication. :) --Hsarrazin (talk) 16:50, 16 February 2018 (UTC)
Houghton Mifflin Harcourt doesn't seem to me like just a "change of name" of Houghton Mifflin. It represents a merger with the historically equally important Harcourt Brace Jovanovich (previously Harcourt Brace, then Harcourt, Brace, and World, then Harcourt Brace Jovanovich). Aside: there used to be a joke in the publishing industry that the name was changed because Jovanovich thought he was more important than the world.
I'm clearly out of my depth here. I'll bring it to Wikidata talk:WikiProject Books. - Jmabel (talk) 17:04, 16 February 2018 (UTC)

END: copied from Wikidata:Project chat. - Jmabel (talk) 17:06, 16 February 2018 (UTC)

@Jmabel: I totally agree with Hsarrazin, you should have one item for each edition; it's the easiest and simplest way to go. Cdlt, VIGNERON (talk) 08:35, 23 February 2018 (UTC)
Hsarrazin VIGNERON So no item at all for the book as a work, just for editions? Because that is not at all the way that, for example, Hamlet (Q41567) is handled. - Jmabel (talk) 16:51, 23 February 2018 (UTC)
Also, I still see no way to express that the work was written in German, but first published in English translation. - Jmabel (talk) 16:52, 23 February 2018 (UTC)
this is deduced from the fact that the work item's language is German, while the first edition's language is English. --Hsarrazin (talk) 17:04, 23 February 2018 (UTC)
@Jmabel: you obviously need to keep the current item (Escape to Life (Q1366818)) about the work, but you also need items for the editions (ideally for all the editions). Reminder: a work is an intangible object; it's *never* published. What is published is de facto an edition. When you say "the work is published in English", it's in fact "the work has an edition in English". If you have several items, then it's easy to say "give me the date and language of the first (or all, or the last) edition(s) of this work". Cdlt, VIGNERON (talk) 17:24, 23 February 2018 (UTC)
One solution:
one item Qxx0 for the work, with language in German
one item Qxx1 for the manuscript, with language in German
one item Qxx2 for the first edition, with language in English, translated from Qxx1, edition of Qxx0
one item Qxx3 for the second edition, with language in German, edition of Qxx0
For the publisher problem, two cases:
1) Houghton Mifflin bought Harcourt Brace Jovanovich and changed its name on the same occasion. In that case, one item is sufficient, with two significant events, one for the acquisition and the second for the name change.
2) Houghton Mifflin merged with Harcourt Brace Jovanovich into a new entity called Houghton Mifflin Harcourt, and in that case a new item is necessary for Houghton Mifflin Harcourt. Snipre (talk) 22:47, 23 February 2018 (UTC)
@Jmabel: Snipre (talk) 23:08, 23 February 2018 (UTC)
The history of Houghton Mifflin & Harcourt is even more complicated: Reed Elsevier had bought Harcourt, turned it into a couple of divisions of Reed Elsevier while keeping the names, then eventually sold those divisions (and also, I believe, some things that were never part of Harcourt) to Houghton Mifflin, which changed its name to Houghton Mifflin Harcourt at the time of the acquisition. So I guess it's more like your case 1, though I doubt we have entities in Wikidata that describe exactly what Houghton Mifflin acquired. - Jmabel (talk) 00:14, 24 February 2018 (UTC)
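To make the work/edition separation discussed in this thread concrete, here is a minimal Python sketch. The `Work` and `Edition` classes and the helper functions are hypothetical, the German edition's identifier `Qxx3` is the placeholder used above (no real item is implied), and its publisher is left empty because it is not given in the discussion. It shows how "written in German, first published in English" falls out of comparing the work's language with the language of its earliest edition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edition:
    qid: str
    language: str              # language code of this edition
    year: int                  # year of publication
    publisher: Optional[str]   # None when not known


@dataclass
class Work:
    qid: str
    title: str
    original_language: str
    editions: List[Edition] = field(default_factory=list)  # has edition or translation (P747)


def first_edition(work: Work, language: Optional[str] = None) -> Optional[Edition]:
    """Earliest edition overall, or earliest in one language if `language` is given."""
    candidates = [e for e in work.editions if language is None or e.language == language]
    return min(candidates, key=lambda e: e.year) if candidates else None


def first_published_in_translation(work: Work) -> bool:
    """True when the earliest edition is not in the work's original language."""
    first = first_edition(work)
    return first is not None and first.language != work.original_language


escape_to_life = Work(
    qid="Q1366818",
    title="Escape to Life",
    original_language="de",
    editions=[
        Edition(qid="Qxx3", language="de", year=1991, publisher=None),  # placeholder id
        Edition(qid="Q48914392", language="en", year=1939, publisher="Houghton Mifflin"),
    ],
)
```

Publication details live only on the editions here, which mirrors the advice that the work item should carry authors, original language and links to editions, and nothing about publishers or dates.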

Collection

Is there any way to include an edition in a collection of books? I mean, if I want to say some French edition belongs to Le Livre de poche (Q1629027), I can't use collection (P195) without triggering constraint issues, since this property is intended just for paintings, sculptures, etc. I don't know how to handle it -- maybe "part of", "series" or anything else. Any ideas? Thanks. Wikidelo (talk) 14:54, 18 April 2018 (UTC)

You should definitely not use collection (P195), as this is used to link the item to a collection assembled by a collector or collecting organization. Your example rather fits the definition of schema:Series; series (Q20937557) has <equivalent class> schema:Series. However, schema:Series is defined as a sub-class of Creative Work, while series (Q20937557) is not. Instead, Wikidata uses a qualifier to indicate the type of items of a series, e.g. Welsh Triads (Q2542444) (manuscripts), Zanja de Alsina (Q301895) (fortifications), Triumph Tiger (Q3539718) (motorcycles). In addition, several sub-classes of series (Q20937557) have been defined - you may explore them using the Wikidata Ontology Explorer. Some of them relate to creative works. Maybe this would require some tidying up. --Beat Estermann (talk) 06:09, 19 April 2018 (UTC)
Maybe the following queries answer your question better. You may just replace the "P1433" in the second query to output the same list for the other properties. --Beat Estermann (talk) 06:51, 19 April 2018 (UTC)
#List of properties linking an item to a book series, ordered by frequency of use
SELECT ?property ?propertyLabel ?count WITH {
  # Named subquery: count the distinct items that each property links to a book series
  SELECT ?property (COUNT(DISTINCT ?item) AS ?count) WHERE {
    ?bookseries wdt:P31/wdt:P279* wd:Q277759.
    ?item ?wdt ?bookseries.
    ?property a wikibase:Property;
              wikibase:directClaim ?wdt.
    # Exclude "instance of" itself
    FILTER(?property != wd:P31)
  }
  GROUP BY ?property
  ORDER BY DESC(?count)
  LIMIT 10
} AS %results WHERE {
  INCLUDE %results.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?count)
# List of item / book series pairs linked with the property P1433 (published in)
SELECT ?item ?itemLabel ?bookseries ?bookseriesLabel 
WHERE
{
  ?bookseries wdt:P31/wdt:P279* wd:Q277759.
  ?item wdt:P1433 ?bookseries.
   
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
@Beat Estermann: But a collection like Le Livre de poche (Q1629027) can be considered a collection of written books assembled by an editor. This is similar to a collection of paintings assembled by a collector. Written works are creative works, so I don't really see why we should make a distinction. Snipre (talk) 07:40, 19 April 2018 (UTC)
In fact, you're talking about a series which is called "collection". Let's take the example of stamps - you may have distinct series of stamps issued by the postal service which may respond to specific design criteria, and you may have collectors building collections of stamps according to their own criteria. I believe the case of Livre de poche is more like the case of the postal service. And yes, I think it makes sense to keep the two cases apart in the data model. Cheers, Beat Estermann (talk) 07:48, 19 April 2018 (UTC)
@Beat Estermann: Thank you very much for the examples. My skills with SPARQL are just slightly better than my personal best at pole vaulting (4 inches). Anyway, I see your point. I've been looking at several wikis and it seems the concept of "collection" is handled in two different ways: for French, Italian, Spanish, and a few more wikis, the idea of "collection" is more or less consistent [[1]]; for enwiki, however, there's no main category for "collection" (Penguin Classics, Everyman's Library, Folio...), just "series" [[2]]. Maybe I'm too picky, but for me, "series" should be Zuckerman Bound (Q17054053) and all the Zuckerman books, or Harry Potter (Q8337), while Pocket Penguins (Q25037402) or Colección Austral (Q5776710) are quite another thing. I've also seen that there is little or no consistency in how those items are populated in Wikidata. A lot of them don't even have a single statement, some are qualified as editorial collection (Q20655472), which is also weird because it's an instance of "art collection" and "catalog", and some are instances of "imprint" (Penguin Classics (Q11281443)). Messy. I agree with @Snipre: why should we make a distinction? But I also agree with @Beat Estermann:: maybe mixing art collections with paperback editions is not a good idea for the data model. Wikidelo (talk) 21:47, 19 April 2018 (UTC)
@Infovarius: It seems a good candidate. I'll try it. Thanks! Wikidelo (talk) 14:29, 20 April 2018 (UTC)
I don't think published in (P1433) is a good solution; the definition implies that it should be used for example for papers published in a scientific journal or for individual stories published in a larger work. In fact, many of the hits the above query produces point to scientific "book series" which are treated like journals (they also get a journal identifier). I would rather use part of the series (P179) instead; and possibly use different classes for different types of "series"/"collections". --Beat Estermann (talk) 17:38, 20 April 2018 (UTC)
@Beat Estermann, Wikidelo: Please provide an objective criterion for excluding groups of books from collection. Please explain what a collection of paintings and a collection of sculptures have in common that a collection of books does not share.
part of the series (P179) is not appropriate, as part of the series (P179) implies an order, a sequence: each element of a series has to be connected to one or two other members of the series using qualifiers like followed by (P156) or follows (P155). A collection is a set of items grouped according to an arbitrary criterion.
@Jura1: Do you have any criteria to distinguish part of the series (P179) from collection (P195), or do we have to merge both properties? Snipre (talk) 23:20, 20 April 2018 (UTC)
@Snipre: I think I provided the distinction above - a "collection" has been collected by someone according to some criteria and is different from a "book series" (definition: "a set of independent books in a common format or under a common title or supervised by a common general editor", Oxford Dictionary). I am totally aware of the fact that the latter definition coincides with one of the definitions for "collection"@fr ("Série d'ouvrages, de publications ayant une unité", Le Petit Robert). – "collection"@en does not have this meaning; and as the definition given in Le Petit Robert implies, the property <"series"@en; "série"@fr> should be absolutely fine in this context - even from a francophone point of view. This said, we still may want to distinguish different types of series/collections at the level of the class definitions. The following distinguishing criteria come to mind (I don't have time right now to do a thorough terminological research across several languages, but we should maybe do that at some point - create a table with multi-lingual definitions and alignment with the respective WD class):
  • Have similar items been collected ex post or have they been designed to be similar from the beginning?
  • Do the items have one common originator (publishing house, editor, etc.) or do they have different originators?
  • Can the items be put in a natural order (are they even numbered)?
  • Does the order of the series apply to its content or rather only to its publication date or similar?
  • Is the series complete (we know all the items) or open (further items may be added)?
So much for now... --Beat Estermann (talk) 07:27, 21 April 2018 (UTC)
@Beat Estermann: I don't know if it answers all your questions, but here's an example at LibraryThing [[3]] (note that the covers are randomly displayed; they aren't the actual covers of these particular editions). Wikidelo (talk) 11:47, 21 April 2018 (UTC)
@Jura1, Beat Estermann, Snipre: I guess part of the series (P179), catalog (P972) or collection (P195) could be tweaked to somehow solve the issue, but I can't figure out which one is the best candidate or in which way it could distort the main purpose of each property. I agree with all of your points of view but, as a non-librarian and a noob, I can't tell which one is best. I think the LibraryThing approach is quite interesting [[4]]. They use "Publisher series" for the "collection" concept I was talking about in the beginning. The closest we have here is editorial collection (Q20655472). So, we can broaden editorial collection (Q20655472) to include "Publisher series" or we can create a new "Publisher series" item. Then we can consistently make Penguin Classics (Q11281443), Pocket Penguins (Q25037402), Le Livre de poche (Q1629027), etc, instances of editorial collection (Q20655472) or the new Q:"Publisher series". Wikidelo (talk) 11:34, 21 April 2018 (UTC)
@Jura1: I think that should be only applied to version, edition or translation (Q3331189), otherwise we could end up with a mess, like book (Q571) with one or multiple ISBN. Wikidelo (talk) 11:34, 21 April 2018 (UTC)

Edition item properties

I've found a couple of properties that could be useful for Wikidata:WikiProject_Books#Edition_item_properties:

What do you think? Wikidelo (talk) 20:07, 21 April 2018 (UTC)

yep, seems interesting :
typeface/font used (P2739) would be mostly interesting on ancient books, where editions can be discriminated through font (I worked on a 1502 Venetian edition (Alde) this week, and can very well see where it would be useful). --Hsarrazin (talk) 09:25, 19 May 2018 (UTC)

Some questions, based on my first items for works / editions

Hi everybody. I've just created my first couple of sets of items for editions and works, at England Delineated (Q52228403) + editions, and A Topographical Dictionary of Wales (Q52240439) + editions, and I wondered if somebody could sanity-check what I've done, to see if there's anything I've done that's not right, or could be done better.

I've got quite a few editions/works that I'm about to organise some images for on Commons, that I'm intending to create Wikidata items for at the same time, so it would be good to know whether I've got things basically right.

The item-sets above I've hand-created as sort of a target to aim for. For the items going forward, I hope to be working much more from existing metadata, doing much less by hand, so they probably won't end up being as complete. But I thought if I do a couple by hand, then I could see where the different fields would fit, when or if I have them.

A few questions occurred to me along the way (below), that didn't seem entirely explained in the existing guidance at Wikidata:WikiProject_Books#Bibliographic_properties.

Sorry if it is rather a lot of questions, but it was my first time making works / editions items, so I was very much feeling my way, and hoping I was getting things right (or at least not too wrong!) So many many thanks in advance for any thoughts or comments, to set me on the right way. Regards, Jheald (talk) 23:59, 28 April 2018 (UTC)

Titles and subtitles of works/editions

Since nineteenth-century book titles can be quite long, what should go where? Are title (P1476) and subtitle (P1680) the right properties to use? Where should a title be split between them? Should one try to get everything into title (P1476) if one can, eg as done at England Delineated (Q52228403), or should I have tried to split this title?
With A Topographical Dictionary of Wales (Q52240439) I ran into the apparent problem that string fields can only be 400 characters long, leading to a rather unnatural split between the two parts of the title. Is there a better way round this? (eg is there any alternative property that ought to be used instead? Are there some properties that Wikidata allows to have longer strings?)
In practice this probably won't be a problem, because long titles in my metadata look as if they have already been truncated with ellipsis (...); so I shall probably go with just dumping the whole of that title into title (P1476). But it would be good to know what the 'right' thing to do is considered to be.
Query results for longest book titles tinyurl.com/ycnqu9s3, showing a few other examples
Usually there is a short 'convenience' title, which is typically what has been used for the item label -- should we have a property for this? If these Wikidata entries were being used to power reference citations, how would one most normally expect the title to be stated in such a reference? Are we storing the information to generate that?
MARC 245 divides a title into: 'title' ($a) and 'remainder of title' ($b), stating that
"In records formulated according to ISBD principles, subfield $a includes all the information up to and including the first mark of ISBD punctuation (e.g., an equal sign (=), a colon (:), a semicolon (;), or a slash (/)) or the medium designator (e.g., [microform])."
Is that something we should try to emulate, or does it bring its own problems?
If the "remainder of title" would not fit into 400 characters, do we need a new property such as "subtitle (continuation)" ? Perhaps as a qualifier, since it would need to be linked to a particular value of "subtitle" ? Jheald (talk) 11:46, 29 April 2018 (UTC)
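To make the MARC 245 idea above a little more concrete, here is a minimal Python sketch, not an established tool or an agreed project convention. It assumes the split point is the first ISBD punctuation mark listed in the quote (=, :, ;, /) and, unlike MARC (which keeps the mark in $a), it drops the mark. It ignores the medium-designator case. The function names and the 400-character constant (taken from the limit mentioned above) are assumptions made for illustration.

```python
import re

MAX_LEN = 400  # string-length limit mentioned above; treated as an assumption here

# First ISBD punctuation mark per the MARC 245 quote: = : ; /
_ISBD_MARK = re.compile(r"\s*[=:;/]\s*")


def split_title(full_title):
    """Return (title, remainder); remainder is '' if no ISBD mark is found."""
    match = _ISBD_MARK.search(full_title)
    if match is None:
        return full_title.strip(), ""
    return full_title[:match.start()].strip(), full_title[match.end():].strip()


def chunk(text, limit=MAX_LEN):
    """Break text into pieces of at most `limit` characters, splitting on spaces."""
    pieces, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = word  # a single over-long word is kept whole
    if current:
        pieces.append(current)
    return pieces
```

With something like this, the first element of split_title() would be a candidate for title (P1476), and chunk() applied to the remainder shows how many subtitle-like values a long remainder would need. Whether splitting at the first mark is actually desirable for nineteenth-century titles is exactly the open question above.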

Naming of items

I've gone with shortened titles for item names, pretty much truncating at the first punctuation. I presume this is okay. One thing I wondered about was preference for title-case / sentence-case (ie how many words to capitalise in item names -- and in book titles). Again, I'm likely to follow the metadata, but I notice that for full titles, there seems to be an avoidance of title-case, presumably because it simply makes them too unreadable. For the shorter forms used for item names I think I prefer title case, but I did notice that a couple of existing items (England delineated (Q28872042) -- unrelated; and A topographical dictionary of Wales (Volume I) (Q25219289) -- a bit mixed up) both preferred sentence case. Are there any strong feelings one way or the other?
Also, naming of editions -- I've gone with England Delineated (1st edition) (Q52281940), England Delineated (2nd edition) (Q52229333), etc. Is this appropriate; what is the preference between this or using years for the suffix instead? (Or using both, eg "(1st edition, 1790)" -- are years helpful to include?) It may be easier to extract years from the data; on the other hand, there may be multiple years for the same edition (as we shall see).


Edition number

I'm getting a constraint warning for including series ordinal (P1545) as a qualifier to make the edition names more machine interpretable. Is there an objection to these? It seemed to me they might be helpful.
Also, does anyone know of any gadget or add-on to make it easier to re-order statements? eg the sequence of editions on England Delineated (Q52228403) is a bit of a mess at the moment. (I was kind of hoping that if I added series ordinal (P1545) qualifiers the software might sort them itself, but no joy). Yes, it's no big deal to do by hand; but if they were machine-added it could be a bit of a pain.
Your solution is not sufficient to sort editions: if several publishers each produced several editions, you need additional qualifiers to identify which editions belong together before they can be sorted.
Ex. Publisher Y published a first and a second edition of a work, and publisher X published a first, a second and a third edition of the same work: how will your system help to sort the editions of publisher Y? To work correctly, you should not use data from the work item to handle data about edition items. You are thinking in the Wikipedia format, where everything is mixed together in the same document. WD is a database: data are split across several containers, or items, and you have to extract the data from the different items instead of duplicating it across them. Snipre (talk) 22:30, 29 April 2018 (UTC)
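The grouping-then-sorting point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are imagined to have been extracted from the edition items themselves (for instance from publisher (P123) and edition number (P393)), and the dictionary keys 'label', 'publisher' and 'ordinal' as well as the sample data are hypothetical.

```python
from collections import defaultdict


def group_and_sort(editions):
    """Group edition records by publisher, then sort each group by ordinal.

    `editions` is an iterable of dicts with the hypothetical keys
    'label', 'publisher' and 'ordinal'.
    """
    groups = defaultdict(list)
    for edition in editions:
        groups[edition["publisher"]].append(edition)
    return {
        publisher: sorted(items, key=lambda e: e["ordinal"])
        for publisher, items in sorted(groups.items())
    }


# Made-up records mirroring the publisher X / publisher Y example above.
editions = [
    {"label": "X, 2nd edition", "publisher": "X", "ordinal": 2},
    {"label": "Y, 1st edition", "publisher": "Y", "ordinal": 1},
    {"label": "X, 3rd edition", "publisher": "X", "ordinal": 3},
    {"label": "Y, 2nd edition", "publisher": "Y", "ordinal": 2},
    {"label": "X, 1st edition", "publisher": "X", "ordinal": 1},
]

sorted_groups = group_and_sort(editions)
for publisher, items in sorted_groups.items():
    print(publisher, [e["label"] for e in items])
```

Sorting on the ordinal alone would interleave the two publishers' editions; grouping first keeps each publisher's sequence intact.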

Different scans of the same edition

Turning to the edition items now, I have sometimes found cases where there are multiple different scans in circulation for the same edition -- see eg A Topographical Dictionary of Wales (3rd Edition) (Q52243033) for a particular case.
The constraint checker doesn't seem to like there being multiple different Google Books ID (P675) values for the same item. (Similarly for Open Library ID (P648) and Internet Archive ID (P724)). It also doesn't seem to like them being qualified with volume (P478). Are there real objections to consider here, or would it be appropriate for these constraint conditions to be relaxed a bit?
It probably also doesn't like me using publication date (P577) as a qualifier. In this case, the same edition has gone through various impressions, eg 1843, 1844, 1845. These differ little, apart from a few entries in the errata page, and a change of address at the end of 1843 for the publisher. To me it very much makes sense to group the different impressions together into a single item for the edition -- it makes it much easier to see the major developments of the text.
On the other hand, is it sometimes helpful to track particular scans? For example, very often the Internet Archive or Hathi scan corresponds to a particular scan from Google (although not always). Is it helpful to indicate this in some way? But if so then how?
Also, it may sometimes make sense to put images extracted from different scans of the same edition into different Commons categories (so that each category corresponds to images from a single exemplar of the book). In that case I'm guessing it may make sense to treat them as different exemplars from within the same edition, connected back by exemplar of (P1574) -- but most of the time I'm thinking that is probably an unnecessary complication? Is there an inverse property, to announce that the edition may have different exemplars?
Scans are similar to exemplars and should not be mixed into the edition level. If you want to add data about one scan, you should create a new item for the scan and link that item to the edition item, just as you link the edition item to the work item.
Ex.: if you want to add the location of one exemplar of the Gutenberg Bible, you should not add this data to the edition item, but create a new item. If you have three different IDs and a property such as publication date that differs between the scans, this justifies new items to avoid confusion. Snipre (talk) 22:36, 29 April 2018 (UTC)
Okay, so let's take a look at how this might work.
I have created a new class individual copy of a book (Q53731850) for distinct printed copies of books (a few more eyes checking over its statements would be very welcome!), and changed the statements on a pair of existing items (hat-tip to User:Sic19), namely On the laws and practice of horse racing, etc., etc (Q51425849) and On the laws and practice of horse racing (Q51514189) to make them instances of it, and exemplar of (P1574) a new item On the laws and practice of horse racing (1866 edition) (Q53738443), to which I have moved statements that were specific to the edition rather than the copies.
However, with respect to information about online copies, I have duplicated this on On the laws and practice of horse racing (1866 edition) (Q53738443) as well as the items for the distinct copies, using statement is subject of (P805) to distinguish which copy each scan is taken from. I think it's useful to collect together information about all online copies on the edition item, because I think this is where people will look for that information, both directly as humans, and when writing queries. This creates an issue in the form of a violation of the 'unique values' constraint, but this can perhaps be worked around.
In practice I wouldn't expect many such items for distinct printed copies to be created. Internet Archive ID (P724) already allows collection (P195) to be used as a qualifier, and that should usually be enough to distinguish different scan families without the need for new items. So I would expect individual copy of a book (Q53731850) items only to be created rather lazily, in particularly complicated cases, or when there is specific information related to particular copies that people want to record. Jheald (talk) 22:10, 18 May 2018 (UTC)

Annotations for edition number (P393), publisher (P123), and printed by (P872)

In each case I have used object named as (P1932) to indicate how the name was actually stated on the title page. I hope this is acceptable. IMO it's for example quite useful to know that eg all copies of England Delineated (2nd edition) (Q52229333) were stated to be "Second Edition, with Additions and Corrections" -- this doesn't eg indicate an edition "2.1" following on from an initial release "2.0".
Where there isn't yet an item for the printer or publisher, I have used the special <some value> value, and then annotated it with the text from the page -- as eg at Q52283171#P872. In fact, this is what I am intending to do systematically for printers and publishers on initial item upload, then going back to see if there are ones I can match. I hope this is acceptable.
Also, where there is an identifiable address, I have put this in a street address (P969) qualifier. Again, it's not suggested on the Books item style page, but I hope this is considered reasonable. With enough of these, it may be quite nice to be able to track the different addresses for a printer or publisher over time, with the works issued from each one.
The only complication I found was with A Topographical Dictionary of Wales (3rd Edition) (Q52243033), where the publisher's address changed during the print-run and/or re-issues of the edition. You can see how I've dealt with this, but I am open to suggestions, if anyone has a better thought.
Wrong. You are mixing the person and the company. You have to create an item for the company, i.e. S. Lewis and Co., and the address should be stored in the company's item, not in the book's item. Again, you are mixing data about different concepts in one item. An address is not a characteristic of a book but of a company. Snipre (talk) 22:43, 29 April 2018 (UTC)

Annotations for full work available at URL (P953)

What qualifiers/annotations are recommended for P953? You can see what I have done at eg Q52243033#P953. Is this appropriate, or are other things that should be added? There are quite a lot of potential qualifiers in current use, eg tinyurl.com/ya5g4mxa. Are there any that it would be particularly valuable to try to make a point of including?
(BTW, I am presuming that if an edition has Google Books ID (P675) or Open Library ID (P648) or Internet Archive ID (P724), then that suffices and it is unnecessary to add a P953 to the same scan?)
Also (perhaps related to the question that User:MartinPoulter raised a few threads above), what is the best way to indicate that a site offers eg a cleaned-up transcription of the full text, such as at Q52243156#P953, rather than the more common page scans + OCR ?
One other question that came up was how best to indicate multiple volumes available at the same link -- for example with Q52241009#P1844, both volumes are available at the same link (but as two different files). I indicated this by adding both volume (P478) = 1 and volume (P478) = 2 as qualifiers on the same statement. But at Q52241558#P953 the two volumes have been combined together into a single scan-file (they may also have been bound together). I tried to indicate this using volume (P478) = "1 & 2", but the constraint checker doesn't like this. Is the preferred way therefore to do what I did for the Hathi trust case? Or is there a different way to indicate two volumes together?
Same as above: instead of creating a bunch of qualifiers, create one item for the scan or the electronic version with all data including the link to the online version. Snipre (talk) 22:46, 29 April 2018 (UTC)
  • There is a qualifier to identify a specific page in a pdf: title page number (P4714), information that can't be stored otherwise. For the remainder, I think it depends how far you want to go. If you think a detailed description is needed, it might be preferable to create separate items.
    --- Jura 11:42, 4 May 2018 (UTC)

wikisource texts

Also, I would say: I have encountered items where full work available at URL (P953) had been used to link to a wikisource page... which was already present in the wikisource section of the same item. This is really useless and dirty. When an edition is available on wikisource, just link it as a wikisource link! --Hsarrazin (talk) 09:41, 18 August 2018 (UTC)

England Described (1818) (Q52284408)

I wasn't sure how to treat this. Should it be treated as an edition, or would it be more appropriate to treat it as a new work in its own right?
On the one hand, it is a much more extensive enlargement and rewriting of England Delineated (Q52228403) than the previous new editions. But on the other hand, it is an enlargement of Q52228403, albeit with a lot of new material, leading to a somewhat different focus.
If one is looking down the list of editions at England Delineated (Q52228403), is it helpful to see it included? (Google in fact titles it as such). Or would a stand-alone item and based on (P144) have made more sense?
There is no rule for that: at what point does a modified edition become a new edition, or even a new work? Usually the contributor who is adding this version has to choose based on expert or historical considerations. Snipre (talk) 22:51, 29 April 2018 (UTC)
I'm not sure what to use for the P31-statement but to express the relationship to England Delineated (Q52228403) you could use modified version of (P5059) instead of edition or translation of (P629) (or based on (P144)) to express that it is not a direct edition or translation but a modified version. - Valentina.Anitnelav (talk) 15:31, 3 May 2018 (UTC)

Does there *always* need to be a separate work and edition item?

Finally, if (as would be the case with England Described (1818) (Q52284408)), this is the only time the title was issued, is it appropriate to try to combine 'work' and 'edition' in the same item (as OpenLibrary does, or at least displays) ? Or is it still required to create two items, even though they will be rather redundant to each other?
Thanks in advance, Jheald (talk) 00:10, 29 April 2018 (UTC)
Any more thoughts about this?
Per guide to item structure on the project page, a lot of properties are expected to be located on version, edition or translation (Q3331189) items.
So, in cases when there has only ever been one edition, if we do accept that only a single item should be created, it would make sense to me for it to be made instance of (P31) both version, edition or translation (Q3331189) and book (Q571).
If we do go down this route, it might be helpful to include an edition or translation of (P629) statement pointing to the item itself -- I think query writers would find this useful, so that separate edition and work items and combined edition/work items could both be dealt with in the same way.
Does that seem a sensible suggestion to people? Jheald (talk) 22:28, 18 May 2018 (UTC)
Yes, if and only if both concepts are merged in the same item, meaning that all properties linking the edition and the work, as well as all edition properties and all work properties, are present in the same item, then we can consider that solution. The risk is more about constraints: it will complicate the monitoring of property use. Snipre (talk) 23:50, 20 May 2018 (UTC)
@Snipre: So you're saying it would also need has edition or translation (P747) on the item, pointing to itself? Jheald (talk) 09:24, 21 May 2018 (UTC)
@Jheald: Exactly. That's the only way for Lua scripts that retrieve data from both work and edition items to keep working without complicating the code. But again, this solution is possible but not recommended, as we will have problems with some constraint definitions. Snipre (talk) 11:13, 21 May 2018 (UTC)
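As a rough illustration of why the self-pointing statements would help script and query writers, here is a small Python sketch over a toy in-memory data set. It is not how Lua modules or the query service actually access Wikidata; the item IDs are placeholders rather than real items, and the helper names are hypothetical. It only shows that the same two lookups work unchanged for separate work/edition items and for a combined item.

```python
def editions_of(items, work_id):
    """Return the IDs listed under P747 (has edition or translation) for a work."""
    return items[work_id].get("P747", [])


def work_of(items, edition_id):
    """Return the IDs listed under P629 (edition or translation of) for an edition."""
    return items[edition_id].get("P629", [])


# Toy data; the IDs are placeholders, not real Wikidata items.
items = {
    "WORK_A": {"P747": ["EDITION_A1", "EDITION_A2"]},
    "EDITION_A1": {"P629": ["WORK_A"]},
    "EDITION_A2": {"P629": ["WORK_A"]},
    # Combined work/edition item: both statements point back to the item itself.
    "COMBINED_B": {"P747": ["COMBINED_B"], "P629": ["COMBINED_B"]},
}

for work_id in ("WORK_A", "COMBINED_B"):
    for edition_id in editions_of(items, work_id):
        # The round trip holds in both cases, with no special-casing.
        print(work_id, "->", edition_id, "->", work_of(items, edition_id))
```

The cost noted above remains: a combined item carries both sets of properties, so constraint checks written for pure work or pure edition items would need exceptions.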
Question: Has England Described (1818) (Q52284408) even been published in translation? That would require a separate data item for the edition. So, we're talking about finding a way to have a single data item for a work that was issued only once, in only one language, by only one publisher, from only one location, and never translated nor reprinted. --EncycloPetey (talk) 00:16, 21 May 2018 (UTC)
@EncycloPetey: Agreed. But it's not such an uncommon case -- in fact I would think it is the most common situation for most classes of old books. Jheald (talk) 09:22, 21 May 2018 (UTC)
That's not my experience with old books. My experience is that many were published as UK/US editions, or were published in another language, or had the contents appear later in another edition. --EncycloPetey (talk) 15:07, 21 May 2018 (UTC)

subject areas and genres

Sometimes genre (P136)-statements have subject areas/academic disciplines as their values. The most frequent are philosophy (Q5891), art history (Q50637) and history (Q309), but there are also cases like statistics (Q12483) and finance (Q43015). I see that this somehow mirrors the practice in book shops but I'm rather sceptical whether it is the best way to express the fact that a work is of interest for a certain discipline. I see the following options to deal with subject areas used as genres:

  1. Generally allow instances of academic discipline (Q11862829) to be used as values in genre (P136)-statements
  2. Create a new genre-item for each subject area that is used as a genre
  3. Expand the scope of an already existing property to be applicable for those cases, too (field of work (P101) is the one that comes to my mind, but maybe there are others)
  4. Create a new property <subject area> that has works as its domain and subject areas as its value

I don't really like the first two approaches (they tend to misuse genre (P136) as a catch-all), but what do you think about this issue? - Valentina.Anitnelav (talk) 14:53, 3 May 2018 (UTC)

@Valentina.Anitnelav: Yes, I agree that there is some confusion, mainly because no clear classification exists for written texts.
In my opinion we need 4 properties to describe correctly
If I take the examples you provided (history, finance, statistics, ...), these are subjects, not genres. If I had to characterize your examples, I would propose textbook (Q83790) as written format and essay (Q35760), treatise (Q384515), scientific style (Q1965486), ... as written genre.
History, finance and statistics are not genres but subjects, and the property main subject (P921) should be used instead of genre (P136). Snipre (talk) 01:04, 4 May 2018 (UTC)
I mainly agree with you, Snipre, and I especially like the idea to separate between form (or written format) and genre.
I also thought about using main subject (P921) for subject areas. I abandoned the idea as this is actually not very accurate: the subject area of a work is seldom the main topic. See for example The religious and historical paintings of Jan Steen (Q29589359). It is a catalogue about Jan Steen (Q205863), not about art history. Art history is the subject area this book is written in or of interest for. On the Genealogy of Morality (Q230302) is about morality, not about philosophy (unlike The Problems of Philosophy (Q3393210)) - Valentina.Anitnelav (talk)
@Valentina.Anitnelav: You can add several subjects, so I think you can really say that the book is about art history and Jan Steen (Q205863), and even add the list of works mentioned in the catalog. Snipre (talk) 10:03, 4 May 2018 (UTC)
@Snipre: I see a problem with the use of main subject (P921) because those statements would be inaccurate (not because of the number of values). On the Genealogy of Morality (Q230302) is not about philosophy (e.g. its principles, questions, methods, development) and The religious and historical paintings of Jan Steen (Q29589359) is not about art history (e.g. its principles, questions, methods, development). It should be possible to get all books having philosophy as its main topic (e.g. The Problems of Philosophy (Q3393210) and What is Philosophy? (Q7991586)) without getting every book in the field of philosophy. - Valentina.Anitnelav (talk) 10:56, 4 May 2018 (UTC)
That's why libraries often use "Schlagwortketten" (subject strings) like "Philosophie - 19. Jahrhundert - Nietzsche - Moral". Imho main subject (P921) is the right property, since an editor is free to add a second "main subject" or replace a general subject like "philosophy" with a more precise term like "Frankfurt School". --Kolja21 (talk) 01:36, 11 June 2018 (UTC)

Time for a new "subject facet" property ?

@Valentina.Anitnelav, Snipre: Further to the above, I wonder if it would be useful to propose a new "subject facet" property ?

For the Biodiversity Heritage Library (BHL) books, discussed in this section below, for which we now have 60,000 items, the BHL releases a 'keywords' dataset, and it would be useful to think about how best to add it.

Looking at keywords that have more than 400 hits (from the volumes of the whole collection, not just the items we have titles for), a few we might consider to relate to the form of the item (ie what the item is), viz:

Periodicals (43829); Catalogs (10750); Pictorial works (1635); Internet resource (1455); Electronic books (929); Collected Works (704); Early works to 1800 (682); Catalogs and collections (571);

But mostly they are indicative of the subject matter, ie:

Natural history (10273); Science (9892); Botany (8022); Nursery stock (7413); United States (6392); Plants (6044); Birds (6012); Seeds (5556); Zoology (4745); Nurseries (Horticulture) (4643); Flowers (4330); Plants, Ornamental (3622); Agriculture (3482); Entomology (3219); Gardening (2985); Trees (2984); Seedlings (2855); Vegetables (2828); Geology (2746); Insects (2739); Fruit (2705); Insect pests (2670); Societies, etc (2635); Forests and forestry (2491); Fruit trees (2414); Great Britain (2329); Paleontology (2210); Control (2184); California (2184); Shrubs (2109); New York (State) (2078); Biology (2068); Angiospermas (2017); Bulbs (Plants) (1939); Flora (1803); Mollusks (1799); Germany (1717); Classification (1703); North America (1661); Fisheries (1651); Bibliography (1604); Horticulture (1577); Fishes (1512); Horses (1264); Equipment and supplies (1262); Australia (1220); France (1215); Ornithology (1177); Massachusetts (1155); Diseases and pests (1114); Montana (1091); Grasses (1089); England (1049); Canada (1043); Description and travel (1005); Research (1004); Pennsylvania (999); Alberta (965); History (924); Mexico (916); Anatomy (871); Fruit-culture (868); Illinois (845); India (834); Italy (818); Europa (809); Washington (State) (809); physiology (797); Oceanography (796); Taxonomía (782); Ohio (771); Hunting (750); Varieties (740); Península Ibérica (739); Iowa (725); Marine biology (708); Mammals (689); Pteridófitos (682); Gimnospermas (655); Scientific Expeditions (643); Botanical illustration (637); Roses (630); Lepidoptera (614); Bees (611); America (601); Agricultural implements (585); Berries (585); Statistics (584); North Carolina (583); Prices (582); Beetles (581); Europe (578); Animals (572); Forest reserves (566); Poultry (552); Fishing (544); Obras clásicas (543); Antiquities (534); Seattle (533); Game and game-birds (532); New Jersey (516); University of Washington Botanic Gardens (516); Anatomy, Comparative (514); Colorado (508); Diseases (508); Hongos y líquenes (494); Michigan (491); Evolution (488); Environmental aspects (482); Fungi (481); Veterinary medicine (477); 1809-1884 (463); Plant diseases (462); Engelmann, George, (461); Forest management (461); Wildlife conservation (460); Briófitos (458); Ethnology (455); Medicine (453); Beneficial insects (453); Natuurlijke historie (442); Austria (440); Floriculture (439); New York (439); Field notes (438); Africa (435); Indonesia (429); Plant collecting (427); Florida (424); Learned institutions and societies (421); Microscopy (413); Asia (412); Plantas útiles o venenosas (412); Minnesota (402); Identification (401);

But these are not, in almost all cases, the main subject (P921) of the item. Instead they are more like en:faceted search terms.

So, for example, if we take a book like A systematic arrangement of British plants. 4th edition (Q51423679), library catalogues might give the subject as "Botany -- Great Britain" and "Botany -- Ireland" (those two from OCLC, which for copyright reasons we can't take; but an edition at the LoC might have something quite similar). This would correspond to our P921.

On the other hand, the BHL [5] gives keywords "Great Britain", "Ireland", "Plants".

For the reasons Valentina was expressing above, I think these need a different property, that might perhaps be called "subject facet". What do people think? Jheald (talk) 10:55, 8 June 2018 (UTC)
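
As a rough illustration of the distinction drawn above, the BHL keywords attached to one title could be split into "form" terms and "subject facet" terms before deciding which property each should feed. This is only a sketch: the set of form keywords is copied from the list earlier in this comment, and the function name split_keywords is hypothetical, not part of any BHL or Wikidata tooling.

```python
# Keywords that describe what an item is, taken from the list in this discussion.
FORM_KEYWORDS = {
    "Periodicals",
    "Catalogs",
    "Pictorial works",
    "Internet resource",
    "Electronic books",
    "Collected Works",
    "Early works to 1800",
    "Catalogs and collections",
}


def split_keywords(keywords):
    """Return (form_terms, facet_terms) for one title's keyword list.

    Anything not recognised as a form keyword is treated as a subject facet,
    which is an assumption made for this sketch only.
    """
    form_terms = [k for k in keywords if k in FORM_KEYWORDS]
    facet_terms = [k for k in keywords if k not in FORM_KEYWORDS]
    return form_terms, facet_terms


# Keywords BHL gives for "A systematic arrangement of British plants" (Q51423679)
form, facets = split_keywords(["Great Britain", "Ireland", "Plants"])
print(form, facets)
```

None of the three facet terms would on its own be a sensible main subject (P921) value for that book, which is the motivation for a separate property.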

Proposed, at Wikidata:Property_proposal/Creative_work#subject_facet Jheald (talk) 23:20, 10 June 2018 (UTC)

Pauly-Wissowa

Hi! I've just noticed that the volumes of Paulys Realenzyklopädie der klassischen Altertumswissenschaft (Q1138524) have instance of (P31)Wikimedia category (Q4167836) (e.g. Pauly-Wissowa vol. I,2 (Q26414652)); however, this conflicts with the constraint on published in (P1433) (e.g. in Ancites (Pauly-Wissowa) (Q15892059)). The problem affects thousands of items and creates thousands of constraint violations. My proposal is to transform the volume items from categories into book items (e.g. Pauly-Wissowa vol. S I (Q26469375) is actually both!). What do you think? --Epìdosis 15:22, 9 May 2018 (UTC)

  Done. --Epìdosis 08:40, 30 May 2018 (UTC)

Award for book or for author ?

Hi, I wonder where award received (P166) should go: on the author item or on the book item. Is it acceptable to duplicate it on both? What about inconsistencies? Excuse me if this has been discussed before and I couldn't find it. Thanks, Amadalvarez (talk) 07:27, 12 May 2018 (UTC)

I have added many awards to people. When an award is also associated with a book, it is easy enough to add the book as a qualifier. A secondary consideration is that authors are more likely to have an item than books are. Thanks, GerardM (talk) 08:08, 12 May 2018 (UTC)
@GerardM: When you say "... to add them as qualifiers.", under which property would you use award received (P166) as a qualifier? Maybe author (P50)? Thanks, Amadalvarez (talk) 22:35, 12 May 2018 (UTC)
In awards such as the Hugos, sometimes the same author can be nominated (finalist) to the same category two times for two different works, so I interpret that in those cases the award winners are not the authors but the works. However, since the winner (P1346) property is a person, I add award received (P166) to both person and literary work. Also, if your local Wikipedia templates support Wikidata integration, this way they can automatically show the awards received by the person in his/her infobox, and by the literary work in its infobox too. --JavierCantero (talk) 08:28, 13 May 2018 (UTC)

Recording the edition format

One of the English-language aliases for property distribution format (P437) is "book format".

Is distribution format (P437) appropriate to record the format of the books in a particular edition -- eg folio (Q772267), quarto (Q2122442), octavo (Q1307353), duodecimo (Q1266414) etc -- as at eg Q53576187#P437 ?

Or should the values allowed for distribution format (P437) be restricted to the currently permitted hardback (Q193955), paperback (Q193934), pamphlet (Q190399), softcover (Q990683), library binding (Q6542551); and perhaps a new property be introduced for book format, akin to newspaper format (P3912) ? Jheald (talk) 20:36, 19 May 2018 (UTC)

Number of pages

We have the property number of pages (P1104).

If a source gives the number of pages for a book as eg "viii, 187 p." or "338, xlviii p.", are there agreed values to attach to an applies to part (P518) qualifier to denote the number of pages of front matter (front matter (Q24033349)), main content, and appendices respectively ? Jheald (talk) 20:47, 19 May 2018 (UTC)

I would love to hear a cataloguer's professional point of view. From my experience with reproducing old works there is no such thing as uniformity, so at a best guess you are seeing the last numbered page of each section. Page numbering in the front sections of a work is often variable; some will number plates, some will not. "Number of pages" as a concept is itself problematic, and it changes through time. If we are going to go by sections, then we would also need to start recording the number of plates in a work. Then, to make things more complex, when there are additional editions you can even see inserted pages numbered nnnA, nnnB, ... so that the whole work didn't have to be renumbered. Dashed variability and changes in time!  — billinghurst sDrewth 23:28, 20 May 2018 (UTC)
"last numbered page of each section" makes a lot of sense. My second example above is in fact actually stated as "238 [338], xlviii p." in the catalogue -- presumably the last numbered page was also wrongly numbered in this case!
Do you think it would be worth a specific new property, last numbered page? Jheald (talk) 09:18, 21 May 2018 (UTC)
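
To make the "last numbered page of each section" reading concrete, here is a minimal sketch that splits a catalogue pagination statement into one entry per numbering sequence. It assumes statements shaped like the examples in this thread ("viii, 187 p.", "238 [338], xlviii p."), and it assumes that a bracketed number is the cataloguer's correction of a misprinted page number. The function names are hypothetical, and real catalogue data will contain forms this does not handle (plates, unnumbered leaves, inserted pages such as nnnA).

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(text):
    """Convert a roman numeral (any case) to an integer."""
    total = 0
    prev = 0
    for ch in reversed(text.lower()):
        value = ROMAN[ch]
        if value < prev:
            total -= value  # subtractive notation, e.g. the "x" in "xl"
        else:
            total += value
            prev = value
    return total


def parse_pagination(statement):
    """Return a list of (numbering, last_numbered_page) pairs.

    A bracketed value such as "[338]" replaces the printed number before it.
    Parts that are neither arabic nor roman are returned as ("unparsed", text).
    """
    body = re.sub(r"\s*p\.?\s*$", "", statement.strip())  # drop trailing "p."
    sections = []
    for part in body.split(","):
        part = part.strip()
        corrected = re.search(r"\[(\w+)\]", part)
        token = corrected.group(1) if corrected else part
        if token.isdigit():
            sections.append(("arabic", int(token)))
        elif token and all(ch in ROMAN for ch in token.lower()):
            sections.append(("roman", roman_to_int(token)))
        else:
            sections.append(("unparsed", part))
    return sections


print(parse_pagination("viii, 187 p."))
print(parse_pagination("238 [338], xlviii p."))
```

Each pair could then become one number of pages (P1104) statement, with a qualifier saying which part of the book it applies to, or one value of a "last numbered page" property if that were created.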

Using volume as a unit

I have been using volume (Q1238720) as a unit for the property number of parts of this work (P2635), eg at Q53574199#P2635.

Does anyone know if there is a way to make volume appear in the plural, ie as volumes ? Jheald (talk) 09:36, 20 May 2018 (UTC)

@Jheald: wouldn't the application of plural be something that is more general, and probably be language specific? As we know for many languages there is a general rule, and maybe it could be managed by a general rule with exceptions, though still that will be extensive when you get to number of languages.  — billinghurst sDrewth 23:18, 20 May 2018 (UTC)
@billinghurst: I was thinking of perhaps a slightly more general mechanism, as to whether there was a string that could be specified (in each language) to be shown when the item is being used as a unit. (Though that would still leave the single/plural issue, but we don't usually state when a book or edition is only a volume). I found and added P558 (P558) and unit symbol (P5061), but they don't seem to help. Jheald (talk) 08:46, 21 May 2018 (UTC)

Do we keep both or merge?

How would we handle a situation such as Our native ferns and their allies; with synoptical descriptions of the American Pteridophyta north of Mexico (Q51515725) and Our native ferns and their allies; with synoptical descriptions of the American Pteridophyta north of Mexico (Q51515726)?

These two items are for two different scans of the same edition of a book. The first was scanned by Cornell University from their holdings, but the second was scanned from the University of California libraries. It is the same edition, just different scans from copies at different libraries. --EncycloPetey (talk) 01:22, 30 May 2018 (UTC)

@EncycloPetey: You need one item for the edition, without any data about the scan (available at URL,...) and 2 items, one for each scan. The items about the scans should be linked to the edition item using exemplar of (P1574) and defined as instance of exemplar (Q512674). We already discussed that problem above (see Wikidata_talk:WikiProject_Books#Different_scans_of_the_same_edition). @Jheald: You proposed to use instance of individual copy of a book (Q53731850) for a particular exemplar: I don't like the term "book", because this term is not well defined and some people can use it for the edition or even for the work. I think exemplar (Q512674) is more neutral: we have work, edition and exemplar, and all can be a book depending on the point of view. Snipre (talk) 11:57, 30 May 2018 (UTC)
That method presents a very real problem, however. Each item on Wikisource has been labelled an "edition" or "translation" up until now, but those "editions" are typically backed by a specific scan. So, if we do what you're suggesting, then every single Wikisource-hosted copy will have to be redone, because you're suggesting they are actually exemplars. And thus, every Wikisource copy will have (1) an exemplar item where the Wikisource copy is linked, (2) a separate edition item, and (3) an associated work item where the Wikipedia entry is linked. --EncycloPetey (talk) 15:02, 30 May 2018 (UTC)
@EncycloPetey, Jheald: Wikisource does what it wants and I am not concerned with which model they choose. WD has to deal with other kinds of data. First, some exemplars can have a history, like the bible of George Washington. Second, particular exemplars have some characteristics like library identifiers, so WD has to have specific items to deal with that kind of data. Finally, you didn't distinguish between editions and print runs: one edition can have several print runs, and each print run has small differences which can change the references (for example, some data can be printed on a different page number). For these reasons WD has to have a specific item for each exemplar. Snipre (talk) 22:06, 7 June 2018 (UTC)
@Snipre: If somebody has a particular need for a particular exemplar then they can create an item for it. But in general, WD does not need to have a specific item for each exemplar. Lazy creation at the time of need will usually be quite sufficient. Jheald (talk) 22:15, 7 June 2018 (UTC)
I would merge the two into one item, and use collection (P195) as a qualifier to distinguish where a particular scan-set is taken from.
There may be a few complicated cases where it may make sense to create distinct items for specific exemplars, but in most cases I think that would be an unnecessary complication. Jheald (talk) 15:22, 30 May 2018 (UTC)
@EncycloPetey, Snipre, Jheald: in this case, I would merge too; maybe create items about exemplars but not for the scans. Snipre: is your point to consider scans as a fifth level of FRBR? Because scans are not exemplars at all. If a library scanned the same exemplar 10 times (which is not unusual at all), do you really suggest creating items for 1 work, 1 edition, 1 exemplar and 10 scans? PS: does anyone know how to contact openlibrary to ask for a merge of OL26454530M and OL7247817M (indicated as two different editions but it's clearly the same one)? Cdlt, VIGNERON (talk) 13:31, 12 June 2018 (UTC)
@EncycloPetey, VIGNERON, Jheald: Not exactly, I am not creating a new level, I just consider a hard copy of a book and its scan as two exemplars in terms of FRBR. So if I take your example of a library making 10 copies of a paper book, then we have one work, one edition and eleven exemplars. I don't consider the scan as a sublevel of the hard copy but as an equal level. Why? That is the only way to correctly record all the data of the scans without mixing them. Each scan can have one specific identifier, one specific URL for online access, one specific creation date,... how can you treat all that information in one item? How can you retrieve the specific data of one scan when everything is mixed? The question is not whether we can merge the items; the question is which data about scans can be described in WD. If you have 2, 3 or more data values per scan, then you HAVE to create several items. So unless you can ensure that, now and in the future, no more than one data value per scan will be possible, we have to create a separate item for each scan or hard copy.
By the way explain me how you plan to add the data about one scan if nobody specifies the data about the hard copy which was used for the scanning ? The scan, if it can be identified by any set of specific data, is an exemplar, and the scanning can be considered as the translation operation: we still need a property to indicate which was the edition used as original text for a translation. In some cases the original version in the original language was not used as text for translation. Ex.: an original version in English was translated in German and the German version, not the English one, was used to generate a French version. In that case, we should be able to specify that relation. We could use the same property to create a relation between a hard copy and a scan, if we have enough data about both "texts" to justify the creation of 2 items. Snipre (talk) 22:39, 13 June 2018 (UTC)
@Snipre: oh ok, I see. That can make some sense. For the FRBR, exemplar are only physical (and not always unique), see FRBR, pages 24, 47-48 but as FRBR itself says « dynamic nature of entities recorded in digital formats merit further analysis ».
In my example, there is only one physical document, the 10 scans are not materialized (reprint exists but are rare, most digital contents stay only digital). I don't see any problem to put 10 URLs of the same physical exemplar on one item for the exemplar, the URL is the only property specific to the scans, everything else is specific to the exemplar (and often specific only to the edition). I agree with your thought but you miss an important point: except for URL (and derivative of the URL like identifiers), 99% of the time, there is no specific data about specific scans. For exemplar, it's already quite rare to have specific data (the collection and the history of owners and that's it).
To go back to the original example here, what data would justify having 2 exemplar items? I can understand one item for the edition and one for the exemplar, but right now both are about the edition (and the same 6th edition). I would propose to merge the two current items and maybe create an item for the exemplar (but do we really need an item about the exemplar? I'm not even sure)
Cdlt, VIGNERON (talk) 08:39, 14 June 2018 (UTC)

Biodiversity Heritage Library

Announcing: Wikidata:WikiProject BHL

The Biodiversity Heritage Library (Q172266) is a large multi-institution project to digitise and make available literature from the past relating to zoology, botany, and the diversity of life.

A few weeks ago, Magnus's Reinheitsgebot created Wikidata items for 63,000 BHL titles out of the (currently) 136,000 in Mix'n'match catalogue 1131. Since then some progress has been made identifying Wikidata items for BHL creators and adding BHL creator ID (P4081); replacing author name string (P2093) with author (P50); and adding some further fields from online sources; but there is still a considerable way to go.

For current statistics, see the dashboard pages for title progress and creator progress now created at Wikidata:WikiProject BHL.

The data has its quirks. The BHL 'title' dataset combines various different sorts of material, including books, periodicals, catalogues, individually bound article reprints, technical reports, etc. Initially these have all been given instance of (P31) = publication (Q732577); a few (but not all periodicals) have now been given instance of (P31) = periodical (Q1002697) based on keywords from BHL. It will be quite a challenge to further refine the identification of the material.

Also be aware that the BHL dataset of 'creators' for the titles (currently imported as author (P50) / author name string (P2093)) actually includes people with a considerable variety of relationships to the printed material -- including authors, editors, illustrators, corporate sponsors, various other contributors to works, even former owners of the texts in a few cases. This too could usefully use quite a lot of refinement.

But it's an important collection. Commons currently includes almost 250,000 files from the BHL, coordinated through the c:Commons:Biodiversity Heritage Library project page -- so work structuring the information here may make a real difference to building new pathways to make those Commons images more accessible. With 60,000 titles, I think it's also a very useful test-set to work on, to put our ideas for book data into practice, and to see what practical issues and questions arise, when applying them to a (very diverse) real-world sample of this size.

Anyone with an interest in this data, and/or ideas on how to improve it, is very welcome to add themselves to the Participants section at the bottom of Wikidata:WikiProject BHL page. Jheald (talk) 16:12, 7 June 2018 (UTC)

Just to clarify Jheald: do you have any connection to the BHL? --Succu (talk) 21:21, 24 July 2018 (UTC)
@Succu: None at all. :-) I just was doing some work on the items, and thought that a WikiProject with property-use statistics, a progress page to record work done / doable, and a talk page might be a useful thing to create, as there were likely to be other people also interested in this data. Jheald (talk) 21:30, 24 July 2018 (UTC)

How many edition items for On the Origin of Species ?

We currently have 24 different items for English-language versions of On the Origin of Species (Q20124), mostly arising from different copies scanned for the Biodiversity Heritage Library (Q172266) (plus four more versions that are translations).

This query, tinyurl.com/yd4yjod9, gives a summary, because the list on Q20124 has become pretty much impossible to navigate.

Question: How many of these items should we keep, and which (if any) should we merge?

Background: Darwin himself produced six editions of the text, the first and the last being

His official authorised publishers were John Murray (Q1232629) in London, and D. Appleton & Company (Q3011053) in New York. With the exception of On the Origin of Species (Q20968204), all of the Murray and Appleton items that we have correspond to the 1872 version of the text.

The page-counts in the 'pp' column of the query correspond to the number of scan frames, so small differences here may not be that significant: the highest-numbered page of the Murray 1872, 1880, and 1886 copies is in each case 458; on the other hand that for the 1910 "popular impression" is 432. The Appleton copies are complicated by being in two volumes, sometimes bound together and sometimes not. The highest-numbered page of the second volume is 338 for the two 1889 copies, and 339 for the 1899, 1909, 1915, and 1917 copies.

Other publishers produced versions that may or may not correspond to Darwin's final 6th edition, depending on copyright observance and/or expiry and/or their own particular whims.

The 1902 and 1905 Collier copies (numbered as two volumes) both conclude at page 356; the 1909 "Harvard Classics" edition (single-volume) from the same publisher concludes at page 552.

The 1872(?) and 1899 Burt copies would both conclude at page 538 (except this is missing in the 1899 copy); the earlier copy then adds several pages giving a list of other works in Burt's "Library of the World's Best Books". This pagination also matches the Merrill and Baker copy, from a series called "World's Famous Books".

The Hurst and the Caldwell copies appear to have identical pagination (final page 501), though their title pages are different.

The Books Inc. copy re-orders the material (moving the historical preface to the end), and omits both the index and the comparison of the 6th with earlier editions.

... etc ...

So: how to bring sense to all of this?

We have a large number of copies based on the same text; some are also based on the same typography and pagination; some may even be from the same printing (Q51515167 / Q51515141); though even then we have different scannings, with each scanning available from multiple different sources (ie BHL vs IA).

The page at On the Origin of Species (Q20124) brings little sense of any of this; it certainly doesn't group together the copies based on the same underlying text.

Does it make sense to differentiate eg the three John Murray texts between 1872 and 1886, all with the same pagination, or would it make sense to group these in some way?

How can we best bring some order to all of this? Jheald (talk) 19:46, 12 July 2018 (UTC)
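
One way to start on the grouping question is to bucket copies by publisher and highest-numbered page, and then inspect each bucket by hand. The sketch below uses only the figures quoted earlier in this comment (for the Appleton copies, the highest-numbered page of the second volume). The tuple layout and the function name group_by_pagination are hypothetical, and identical pagination is only a hint that copies share a setting of type, not proof that they belong to the same edition or printing.

```python
from collections import defaultdict

# (publisher, year, highest-numbered page) as reported in this discussion.
# For Appleton the page number is that of the second volume.
COPIES = [
    ("John Murray", 1872, 458),
    ("John Murray", 1880, 458),
    ("John Murray", 1886, 458),
    ("John Murray", 1910, 432),
    ("D. Appleton", 1889, 338),
    ("D. Appleton", 1889, 338),
    ("D. Appleton", 1899, 339),
    ("D. Appleton", 1909, 339),
    ("D. Appleton", 1915, 339),
    ("D. Appleton", 1917, 339),
]


def group_by_pagination(records):
    """Map (publisher, last_page) to the list of years sharing that pagination."""
    groups = defaultdict(list)
    for publisher, year, last_page in records:
        groups[(publisher, last_page)].append(year)
    return dict(groups)


for key, years in group_by_pagination(COPIES).items():
    print(key, years)
```

On these figures the three Murray copies of 1872, 1880 and 1886 fall into one bucket and the 1910 "popular impression" into another, which matches the grouping suggested in the question above.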

Start with the six editions issued during Darwin's lifetime, published by John Murray (Q1232629). Remove all other items from On the Origin of Species (Q20124). Then match the rest by hand. Systema Naturae (Q29270) or Genera Plantarum (Q1501516) are probably easier to handle, because BHL is missing scans of most editions. --Succu (talk) 19:15, 24 July 2018 (UTC)
Hm, The Complete Works of Charles Darwin Online (Q7727209) gives 497 results! --Succu (talk) 19:46, 24 July 2018 (UTC) ... etc ...

Publishers and imprints

I'd like to do some work on publishers and imprints. Does anyone know of a standard reference or database (preferably freely accessible online) with info about the dates of publisher mergers, acquisitions, spinoffs, etc.? - PKM (talk) 19:22, 13 July 2018 (UTC)

Novel or book

I've started a discussion at Wikidata:Administrators' noticeboard#Novel or book, which impacts this project. --EncycloPetey (talk) 18:39, 21 July 2018 (UTC)

Why exactly did you ask for the intervention of an administrator without informing me first? The only reasons I can imagine is that you think I needed to be blocked right away because I was endangering the project, or that I have abused the extra tools that I have as an administrator. Is that what happened? Anyway, the place to start a discussion is the Project Chat or the corresponding WikiProject, and it's just basic courtesy to let the other person know. Moving on, if you are writing in this WikiProject, I am going to assume you are aware of the continuous discussions about modeling books and literary works. You can always start a new discussion, but it was agreed beforehand that all books should have instance of book. You also stated that novel (Q8261) is not a literary genre (Q223393), and yet one is an instance of the other. Currently, there are only 4738 items with instance of (P31) novel (Q8261), so I started checking some cases. Out of the 10 items I checked, most had instance of (P31) book (Q571) before being changed to novel (Q8261) by an inexperienced editor. Any reason why you (an experienced editor) made such changes without prior discussion? Andreasm háblame / just talk to me 20:12, 21 July 2018 (UTC)
I have explained my reasons there.
Re: "it was agreed beforehand"; I have seen discussion that touches this subject obliquely, and found multiple proposals to make "novel" a value for "instance of", but have seen no discussion that concluded it should not be used that way. --EncycloPetey (talk) 21:00, 21 July 2018 (UTC)
@EncycloPetey: If you want to understand the book classification, then start by reading Wikidata:WikiProject_Books#Bibliographic_properties. The first principle is to use the FRBR model, so please explain how novel can be included in that model. Snipre (talk) 18:13, 22 July 2018 (UTC)
I have read that. A "novel" is a form of literature, so it would be used at the WORK level of the model for WORKs that are novels. I don't see why this is so hard to understand. This proposal has been made many times, and I have found no objections. --EncycloPetey (talk) 20:46, 22 July 2018 (UTC)

We need a clear model in WD if we want to follow a FRBR structure

It is time to define a clear model for written works in order to be able to continue the data import in WD. We agreed to use the FRBR model, but we never adapted the WD model to that model.

The FRBR model is composed of 4 levels. Currently the WD model (described in Wikidata:WikiProject_Books#Bibliographic_properties) is trying to follow that model but without a clear success. We need to clarify the WD structure and to link clearly the WD structure with the FRBR structure.

FRBR model | Current WD model | Proposed WD model
Work | Book | Work
Expression | - | -
Manifestation | Edition | Edition
Item | Exemplar, manuscript | Exemplar

Propositions:

  1. Proposition 1: Whatever classification is chosen, the classification levels have to be used as the unique value for property instance of (P31). There are 4 levels in the FRBR model, so if WD wants to use that model, we should have maximal 4 values for instance of (P31) when used to define a written item.
  2. Proposition 2: We should avoid any use of the term "book" in any of the 4 levels. The term "book" can be used at each level, so confusion is at its maximum when that term is used with no clear definition. And even if we clearly define the term in WD, its use will always be a source of misunderstanding, as few people pay attention to WD definitions.
  3. Proposition 3: The lowest level, "item" in FRBR terms, should be defined by a unique value and not by manuscript AND individual book as is currently described in Wikidata:WikiProject_Books#Bibliographic_properties. A manuscript is an individual book, so a separate manuscript level is not necessary: a manuscript can be described as an individual book written by hand. The proposition is to use exemplar, but other possibilities exist, like version or individual book.
  4. Proposition 4: More properties are necessary to describe some characteristics of a written item:
  • a property to describe the format of the written element. For example, a novel can't be used as a value for genre; values for genre can be romantic, erotic, dramatic, but we need something to define the format of the text.
  • a property is needed to define whether an exemplar is written by hand or by mechanical means. This property will be used to define a manuscript.
  • a property is needed to describe the support. Currently a scroll is defined by "instance of scroll". As this is in contradiction with Proposition 1, we need a new property to define the physical format of the document.
Examples:
A novel can be written by hand on a scroll made of papyrus. A poem can be printed in a codex made of vellum.

Comments ? Snipre (talk) 19:23, 22 July 2018 (UTC)
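
Proposition 4 can be pictured as a record in which instance of (P31) carries only the FRBR-style level, while text format, production method and physical support each get their own field. The sketch below is only an illustration of that separation: the class and field names are hypothetical and do not correspond to existing Wikidata properties, and the two records are the examples given above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    WORK = "work"
    EDITION = "edition"
    EXEMPLAR = "exemplar"


class Production(Enum):
    HANDWRITTEN = "handwritten"
    PRINTED = "printed"


@dataclass
class WrittenItem:
    label: str
    level: Level            # the only thing stored in "instance of"
    text_format: str        # e.g. novel, poem (first property in Proposition 4)
    production: Production  # by hand or by mechanical means (second property)
    support: str            # e.g. scroll, codex (third property)
    material: str           # e.g. papyrus, vellum


def is_manuscript(item):
    """A manuscript is an exemplar produced by hand (Proposition 3)."""
    return item.level is Level.EXEMPLAR and item.production is Production.HANDWRITTEN


novel = WrittenItem("a novel on a papyrus scroll", Level.EXEMPLAR,
                    "novel", Production.HANDWRITTEN, "scroll", "papyrus")
poem = WrittenItem("a poem in a vellum codex", Level.EXEMPLAR,
                   "poem", Production.PRINTED, "codex", "vellum")
print(is_manuscript(novel), is_manuscript(poem))
```

With this layout "manuscript" and "scroll" no longer need to be values of instance of (P31); they are derived from, or stored in, the other fields.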

General comments

Thanks for pulling this together. I suggest we organize comments by proposal. - PKM (talk) 20:47, 22 July 2018 (UTC)

this is indeed clarifying some very essential problems that have been very problematic to work with these past years. --Hsarrazin (talk) 08:59, 18 August 2018 (UTC)

Comments on Proposition 1 (classification levels/unique values)

@PKM: Why is creative work (Q17537576) too broad? Just take the example of Gone with the Wind (Q2870): this is a work which exists not only in the form of a written document but also as a movie or songs. That's why I want to avoid the use of book at the level of work, because often a book has been adapted into a movie or TV series, and the work should be able to connect all those different forms. For me creative work (Q17537576) is fine. Snipre (talk) 14:29, 23 July 2018 (UTC)
There are books (as physical objects) that contain several literary work (Q7725634) (any anthology, but also all of the "Double Ace" collection, for example, contain two novels per book), and there are even literary work (Q7725634)s that span over several books (such as the spanish translation of Cryptonomicon (Q534975), published in three separate books with different title (P1476) and publication date (P577)). There is not going to be any "point" in the hierarchy that matches "book", neither creative work (Q17537576) nor literary work (Q7725634) or something in between. They are simply unrelated concepts. --JavierCantero (talk) 18:53, 23 July 2018 (UTC)
The description of creative work (Q17537576) includes the word "artistic". There are many works that contain the minimal level of creativity required to qualify for copyright, but are not considered artistic. There are also works that aren't even creative enough to qualify for copyright (at least in the US), but they are certainly books. Example: telephone book. Jc3s5h (talk) 14:27, 26 July 2018 (UTC)
The phrase in the proposal "we should have maximal 4 values for instance of (P31) when used to define a written item" needs to be clarified. In general, an item may have several instance of (P31) statements. Does this mean that any of the items under discussion may have up to 4 instance of (P31) statements? Or does it mean that if we travel from an exemplar, through the connections from more specific to less specific, we will encounter at most 4 items before we come to an item that has no instance of (P31) and only has subclass of (P279)? What bothers me is that some items will clearly be instances of several other items, even after we allow for items implied by the hierarchy. For example, an exemplar could be an instance of a version, and also an instance of a manuscript. Another exemplar could be both an instance of a version, and also an instance of printed matter (Q1261026). Jc3s5h (talk) 14:39, 26 July 2018 (UTC)
  • Did we all notice that "edition" has been renamed "version" in English (version, edition or translation (Q3331189))? Is there general support for this (and if so, let's use it.) - PKM (talk) 20:58, 22 July 2018 (UTC)
  •   Oppose I don't see the justification for this. Everywhere else on Wikidata, we are happy with allowing items to be instances of a subclass (or chain of subclasses) of a particular item. So why not here?
I also think it's a poor fit with where we currently are.
Here's a query for current subclasses of written work (Q47461344), by count of number of items: tinyurl.com/ybkkemsw
What's wrong with identifying that an item is a biographical article (Q19389637) or a dictionary entry (Q4423781) or a field study (Q26840222) or a travel book (Q1164267) or a biographical dictionary (Q1787111) or an autobiography (Q4184) or an atlas (Q162827) or an encyclopedic dictionary (Q975413) -- or indeed a novel (Q8261) or a novella (Q149537) or a novelette (Q472808) or a short story (Q49084) ?
Yes, we probably want to put limits on the permitted classes, but it seems to me a lot more useful and intuitive than saying Uncle Tom's Cabin (Q2222) is a literary work (Q7725634). Jheald (talk) 23:00, 22 July 2018 (UTC)
  • Currently biographical dictionary (Q1787111) is <instance of> literary genre (Q223393) and subclass of both "literary genre" and "written work" (several steps removed). Is that acceptable? I think the proposal is parallel to what Wikidata has decided about occupations - Sandy Koufax isn't <instance of> baseball player, he has <occupation> baseball player. In the case of books, we might choose to say "all books are <instance of> written work" in the same way that all people are <instance of> human. But we don't have to make that choice. If we decide to accept <instance of> "novel/book/dictionary/play/chanson de geste" then I think we would have to make sure all of those things are subclasses of written work (at some remove) (as well as subclasses of genres), and we would have to agree that an item can be both a work and genre. - PKM (talk) 23:34, 22 July 2018 (UTC)
@PKM: Agreed. I think that is the exact analogy.
With regard to work and genre, we don't want the (implied) statement that Dictionary of National Biography (Q1210343) instance of (P31) literary genre (Q223393), so I think the statement biographical dictionary (Q1787111) subclass of (P279) literary genre (Q223393) is wrong and needs to come out.
On the other hand, using P31 to say biographical dictionary (Q1787111) (or some superclass of it) instance of (P31) literary genre (Q223393) is I think perfectly okay. Some care may be needed, to be vigilant for potential issues like this one; but essentially this is a pattern we use right across the site. Jheald (talk) 20:56, 23 July 2018 (UTC)
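The inference described above can be sketched with a toy model. This is only an illustration, not how Wikidata itself computes anything: the statement that Dictionary of National Biography (Q1210343) is an instance of biographical dictionary (Q1787111) is assumed for the example, and the function names are hypothetical.

```python
# Toy model of "instance of" (P31) and "subclass of" (P279) statements.
# Assumption for illustration: Q1210343 (DNB) is an instance of Q1787111.

def superclasses(cls, subclass_of):
    """Return every class reachable from cls by following P279 upwards."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_classes(item, instance_of, subclass_of):
    """Classes an item belongs to, directly or via the subclass hierarchy."""
    result = set()
    for cls in instance_of.get(item, ()):
        result.add(cls)
        result |= superclasses(cls, subclass_of)
    return result


# Modelling A: biographical dictionary is a SUBCLASS of literary genre.
instance_of_a = {"Q1210343": {"Q1787111"}}
subclass_of_a = {"Q1787111": {"Q223393"}}

# Modelling B: biographical dictionary is an INSTANCE of literary genre.
instance_of_b = {"Q1210343": {"Q1787111"}, "Q1787111": {"Q223393"}}
subclass_of_b = {}

classes_a = implied_classes("Q1210343", instance_of_a, subclass_of_a)
classes_b = implied_classes("Q1210343", instance_of_b, subclass_of_b)
```

Under modelling A the DNB is implied to be a literary genre; under modelling B it is not, because instance-of statements on the class are not inherited by the class's own instances.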
@PKM, Jheald, EncycloPetey: +1. The main problem is that a written document can be defined according to a long list of characteristics: the format of the text (novel,...), the genre (romantic, dramatic,...), the support (codex, scroll,...), the way the text is written (hand written (manuscript), printed, electronic,...), and plenty of others if we take the time to list all possible characteristics. So the question is on what basis we can use one particular characteristic, like the text format, to classify a written document? This is completely based on each person's opinion.
Using a completely external structure allows us to avoid any conflict: no need to define which characteristic is the best one, no need to create an item for all possible combinations of characteristics, no need to define whether a novel can be considered as a work or not. By creating one dedicated property for each characteristic of books, we keep a uniform treatment of the data, and by using the FRBR structure, we define only the level of classification without trying to apply a personal judgment. Snipre (talk) 16:21, 23 July 2018 (UTC)
Your argument is stuffed with straw: level 1 has nothing to do with the "support" or the "way the text is written"; those are properties of manifestations or exemplars. But the work in abstraction will still be a play, or a novel, or a poem. This is invariant regardless of the specific edition or exemplar examined, and is therefore a property of the work itself. --EncycloPetey (talk) 16:26, 23 July 2018 (UTC)
@EncycloPetey: My argument is perhaps not really focused on the work level, but it is still consistent: why do we classify a book according to its text format and not according to its genre? In a library, it is more common to classify by genre than by text format. So please provide your argumentation, because until now I didn't see anything which has the appearance of an argument from your side. Snipre (talk) 19:11, 23 July 2018 (UTC)
You are still arguing at straw. This is not an either...or issue. We can do both. Please stop misinterpreting my comments. --EncycloPetey (talk) 19:29, 24 July 2018 (UTC)
@Snipre: There is a spectrum of possible practice here -- at one end, having very few, very generic classes and then conveying everything with properties; at the other, having a great number of very specific classes, ultimately stretching to creating a new class for every possible intersection.
Neither extreme is desirable, IMO. Over the years on Wikidata I think we've found the sweet spot is somewhere in the middle: having a limited number of classes that capture the basic essence of the thing, with further details about it conveyed by statements.
It wouldn't be an approach that would work well with a classic relational database, where one would want to force information as far as possible into strict columns; but it happens to be an approach that does work rather well with a triplestore query engine, like WDQS.
With a few exceptions (such as our key rule for human beings, that their defining essential feature is always that they are a Q5), the tipping point for switching from narrowing classes to adding a statement instead varies from context to context and is always slightly fuzzy and up for grabs (though as a community we can sometimes put down a few key markers). But a good consideration may be how strongly distinctive a class is, as opposed to being a mere intersection. So to say a particular poem is instance of (P31) sonnet (Q80056) would seem reasonable; whereas to say a manuscript is instance of (P31) "British illuminated manuscript" seems rather forced, unless British illuminated manuscripts are such a distinctive thing, with a tendency to such a distinct set of characteristics that set them apart from other illuminated manuscripts, that it truly makes sense to treat them as a distinctive class to themselves.
Finding this 'sweet spot' reasonably well has a number of advantages. Firstly, on the one hand it makes instance of (P31) feel intuitive, a good statement capturing what the thing fundamentally is, without on the other hand requiring too vast a vocabulary of different classes. It groups like-things well. Secondly it means it may be possible to state some characteristics of the things just once, for the class, rather than requiring users to accurately and redundantly input the same piece of information over and over again, per each individual item.
Fortunately SPARQL makes it very easy to extract items, via constructions like (wdt:P31/wdt:P279*)?/wdt:Pxxxx, whether a property is given as a statement directly on the item itself, or as a statement applying to a whole class that the item is an instance of.
So this is the model which has been applied pretty much across the whole of Wikidata; and which I also see substantial benefits for here, with very little against. (Though defining version, edition or translation (Q3331189) as the unique class for an edition does I think work quite well). Jheald (talk) 20:44, 23 July 2018 (UTC)
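As a rough sketch of the path construction mentioned above, the following helper assembles a full query string around it. The function name and the choice of genre (P136) with comedy (Q27640800) as example values are assumptions for illustration; the `wd:` and `wdt:` prefixes are the ones predefined on WDQS. Nothing here contacts the query service, and a query this broad may well need narrowing before it runs within the service's time limit.

```python
def build_direct_or_inherited_query(prop: str, value: str, limit: int = 100) -> str:
    """Build a SPARQL query matching items that carry prop=value directly,
    or that are instances of a class (or of a subclass of a class) carrying it.

    The optional group (wdt:P31/wdt:P279*)? is what lets one pattern
    cover both the direct and the inherited case.
    """
    path = f"(wdt:P31/wdt:P279*)?/wdt:{prop}"
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        f"  ?item {path} wd:{value} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Example: items whose genre (P136) is comedy (Q27640800), stated either
# on the item itself or on a class the item belongs to.
query = build_direct_or_inherited_query("P136", "Q27640800")
print(query)
```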
@Jheald: I know how WD works, and just looking at the topics appearing on this page, the current model is not clear and solutions are found case by case. But this way is not possible for databases, and especially for machines, which need a rigorous classification to perform data processing. Your sweet spot is just the nightmare of everyone who has to do data extraction or data manipulation: sometimes the model is like this and sometimes the model is different. No way to create automatic tools. Accepting that text formats are defined using instance of and genre using a dedicated property doesn't allow one tool to extract books according to their characteristics: we need to code the queries differently. Then if we start to use values like illuminated manuscript, this is a third way to extract data: to extract all manuscripts, we need to query the sum of 2 classes (illuminated manuscript and manuscript). This is the problem, because people don't know how a database works. A database needs structures and rules and not individual solutions. Snipre (talk) 08:58, 24 July 2018 (UTC)
@Snipre: This is a graph database, not a relational database. I've given you the query fragment above which will handle it -- in fact, in a graph database it is rather more efficient to have things in a hierarchy of relevant classes, rather than to have everything in a single mega-class and then to have to filter all of it. Jheald (talk) 09:14, 24 July 2018 (UTC)

To be honest, I can't make head or tail of the proposition here. Could someone try wording it differently? I can think of several ways to read the statement as currently given, and they contradict each other, and none of them strike me as good policy, so perhaps I'm missing something. - Jmabel (talk) 04:10, 26 July 2018 (UTC)

@Snipre: If I understand this proposition, this means there could only be 4 possible instance of (P31) values, which would be FRBR-1, FRBR-2, FRBR-3 and FRBR-4 (let's call them this way to avoid confusion with items actually existing on wikidata), without any possibilities to use subclasses  :)

Let's be clear: you tend to refuse subclasses as P31, because of their potential weakness to any change made to the WD ontology that could break the system, and propose to distribute the characteristics of specific subclasses into properties instead, which would ensure that no modification to the ontology would result in the loss of info.

Comparing with the work achieved on biographic data : it was a hard fight to group all people as human (Q5), since many contributors thought that writer, politician, athlete, etc, would be more precise. It is now widely accepted, but it was a very long process, and every biographical specificity is in a specific property (almost).

well… having worked a lot on biographies, for authors mainly, I, for one, tend to think that it would be an enormous amount of work, but it could be worth it.

Specific properties should be defined very clearly though :)--Hsarrazin (talk) 10:24, 18 August 2018 (UTC)

Comments on Proposition 2 ("book")

  • I think it's essential to use "book" as an alias where a novice user would expect it. - PKM (talk) 20:47, 22 July 2018 (UTC)
  • Book or novel or play or poem or whatever sort of "work" it is. Not all literary works are books, and even novels may be serialized in a magazine during their first release, or collected within a volume with other works in a later release. "Book" is extremely misleading for many works, but "work" is so generic that it is uninformative and should only be used in situations where no further precision is possible. --EncycloPetey (talk) 20:54, 22 July 2018 (UTC)
  •   Weak support I have some sympathy with this. The problem with "book" is that it might suggest either the work in abstract at its most essential level, or a physical manifestation of it, or even a particular edition. At the moment I believe we mostly use it at the 'work' level (and very heavily, with in excess of 100,000 uses). I would prefer to see those 'work' level uses replaced with classes like novel (Q8261) or thesaurus (Q179797) or encyclopedic dictionary (Q975413), which make much clearer that their instance is being considered for its content in abstract, rather than as a physical object. Jheald (talk) 23:15, 22 July 2018 (UTC)
  • Novices don't use "book" as a synonym of FRBR's Work but of Edition: they expect the book item to have an ISBN, since they are talking about the physical object and not the creative work. So if you want to keep the "book" term as a more user-friendly choice, you should use it as the third FRBR level. --JavierCantero (talk) 17:36, 23 July 2018 (UTC)
  •   Strong support, clearly book is too polysemic. For aliases, all four levels of the FRBR can have "book" as an alias; I'm not sure that putting the aliases there would really help. Cdlt, VIGNERON (talk) 22:15, 25 July 2018 (UTC)
  •   Support; "book" should be alias for each level, so all are easily found if someone enters "book" in the UI. - Jmabel (talk) 04:08, 26 July 2018 (UTC)
  •   Strong support - book (Q571), as described in French, is a physical item, an object described as having pages (a codex in fact) and thus, at best, a format of publication; in other languages (nl) it is merely a document or a printed work (es), or a medium for distribution of a text (en) (paper or electronic), which are very different things; in no case can it be a work, or even an edition (except by metonymy (Q41966)). An exemplar could be a book but many exemplars are not books; they can be serial issues, manuscripts, cds, volumen, etc. The fact that it is currently used for all 4 levels by people who are not book professionals (and also by book professionals) is indeed a very clear indication that it is totally inappropriate for our goals with FRBR. It could be an alias for each level though :) --Hsarrazin (talk) 08:38, 18 August 2018 (UTC)

Comments on Proposition 3 ("exemplar")

@Jheald: Your proposition is the same system used for categories in WP: if you have a novel written by hand on a scroll with decorations, do you create an item combining all these features (i.e. "novel written by hand on a scroll with decorations")? And if now you don't have a novel but a poem, do you create a new item "poem written by hand on a scroll with decorations"? And if you have a novel written by hand on a scroll without decoration, would you create the item "novel written by hand on a scroll without decorations" or the item "novel written by hand on a scroll"? Do I need to continue the demonstration, or is it clear that combining terms is just a nightmare when considering all possible characteristics of a written document? Snipre (talk) 11:40, 23 July 2018 (UTC)
@Jheald, Snipre: I think it's reasonable to say that the Ellesmere Chaucer (Q1227831) (poorly modeled currently) is <instance of > illuminated manuscript and <exemplar of> Canterbury Tales. - PKM (talk) 19:17, 23 July 2018 (UTC)
@PKM: Yes, I think that would be exactly the right way to model it.
A complication that might arise for some manuscripts (but not I think here) would be if the manuscript collected together a number of texts. The answer in such a case I think would be to have a corresponding number of exemplar of (P1574) statements, but it might be useful to qualify each one with a new property "folio(s)", akin to the existing page(s) (P304), to indicate which part of the MS corresponded to each text.
Pinging @MartinPoulter: here, who I think has recently been working with items for a number of manuscripts from the Bodleian Library. Jheald (talk) 19:47, 23 July 2018 (UTC)
@PKM: Not sure if your example helps the discussion: if we accept "illuminated manuscript" why can we accept "English illuminated manuscript" for Ellesmere Chaucer (Q1227831) ? Snipre (talk) 19:25, 23 July 2018 (UTC)
@Snipre:, if an editor thought it was reasonable to make an item for "Hiberno-Saxon illuminated manuscript" (which I think is a reasonable class of manuscripts), I'd have no problem with using <instance of> "Hiberno-Saxon illuminated manuscript" for the Lindisfarne Gospels (Q80935) (which today has three <instance of> statements). - PKM (talk)
@PKM: If you choose that system then you can forget about extracting data in an exhaustive way: if one contributor creates an item "Hiberno-Saxon illuminated manuscript" for one case and another creates "Saxon illuminated manuscript" and "Hiberno illuminated manuscript" and chooses to add two instances ("Saxon illuminated manuscript" and "Hiberno illuminated manuscript") instead of using "Hiberno-Saxon illuminated manuscript", no query will be able to handle those cases. The main problems are to have an overview of which descriptions are available and to update old descriptions with new ones. If contributors feel free to create items when they need them, they won't take the time to look for existing descriptions and will create duplicates, as this is the easiest way. People are lazy, so don't expect they will try to look for what is already available. Snipre (talk) 08:40, 24 July 2018 (UTC)
@Snipre: Not true. This is exactly what SPARQL is designed to be good at. Path queries ( wdt:P279* ) are your friend. Jheald (talk) 09:23, 24 July 2018 (UTC)
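A toy illustration of what a path such as wdt:P31/wdt:P279* does, emulated in plain Python over a hand-made class tree. The class labels and the subclass links between them are assumptions for the example, not the actual state of Wikidata, and the helper names are hypothetical.

```python
# Hypothetical class tree: each class maps to its direct superclasses (P279).
SUBCLASS_OF = {
    "Hiberno-Saxon illuminated manuscript": {"illuminated manuscript"},
    "illuminated manuscript": {"manuscript"},
}

# Hypothetical "instance of" (P31) statements on individual items.
INSTANCE_OF = {
    "Lindisfarne Gospels": {"Hiberno-Saxon illuminated manuscript"},
    "Ellesmere Chaucer": {"illuminated manuscript"},
    "Uncle Tom's Cabin": {"novel"},
}


def is_subclass_or_same(cls, target):
    """True if target is cls itself or reachable via zero or more P279 steps."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return False


def instances_of(target):
    """Emulates `?item wdt:P31/wdt:P279* <target>` over the toy data."""
    return sorted(
        item
        for item, classes in INSTANCE_OF.items()
        if any(is_subclass_or_same(cls, target) for cls in classes)
    )


all_manuscripts = instances_of("manuscript")
```

A single query on the broad class picks up items typed with any of its narrower subclasses. The sketch also shows the limit of the technique: it only works for classes that are actually linked into the tree by subclass-of statements, so duplicate classes created without such links would still be missed.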
@Jheald: Please read my comment once more: I never said that SPARQL is not able to perform the queries I mentioned, I just pointed out the fact that, having 3 models, we need 3 different SPARQL queries to find the same kind of items. Don't you understand that querying documents by genre requires a query based on a dedicated property and querying documents by text format requires a query based on instance/subclass? Don't you see the problem of querying all illustrated documents if we create different classes like illustrated novel, illustrated anthology, ...? Snipre (talk) 15:07, 25 July 2018 (UTC)
@Snipre: The query fragment that I posted in the Proposition 1 discussion checks both ways: whether the item itself has a given property and value directly, or whether the item is an instance of a class that has that property and value. It's not particularly difficult. Jheald (talk) 15:22, 25 July 2018 (UTC)
  •   Comment I've read the discussion thus far, but don't think I've seen enough specific examples of how this would be applied, or what the actual results would look like. I'd like to see this explored more with varied examples before forming an opinion. Are there others who feel as I do? --EncycloPetey (talk) 22:29, 25 July 2018 (UTC)

Comments on Proposition 4 (new properties, genres)

4A: genre

  • I don't know that Wikidata can solve the world's imprecision around what is meant by "genre". I believe the proper solution is to encourage multiple genre values, so that Childhood's End has genres "novel" and "science fiction", while "The Roads Must Roll" has genres "short story" and "science fiction" (genre-by-form and genre-by-subject, if you will). If we were to create a new property for "genre-by-form", I'm not sure we'd achieve clarity, because some of the minor forms can be confused with genres-by-subject (lyric poetry). Even if we made a new property like "literary form", how would editors know what values to assign, since novel (Q8261) is an <instance of> literary genre, which can be supported by quality citations in multiple languages? (Aside from not supporting splitting genre in the way proposed, I would not want to use the term "format" here, as that has a strong connotation of book format (Q18602566) like hardback, paperback, etc.) I'm going to quote the Oxford Dictionary of Literary Terms at length re: genre, because I think this is important:
Genre The French term for a type, species, or class of composition. A literary genre is a recognizable and established category of written work employing such common conventions as will prevent readers or audiences from mistaking it for another kind. Much of the confusion surrounding the term arises from the fact that it is used simultaneously for the most basic modes of literary art (lyric, narrative, dramatic); for the broadest categories of composition (poetry, prose fiction), and for more specialized sub-categories, which are defined according to several different criteria including formal structure (sonnet, picaresque novel), length (novella, epigram), intention (satire), effect (comedy), origin (folktale), and subject-matter (pastoral, science fiction).

- PKM (talk) 20:47, 22 July 2018 (UTC)

  • We could certainly be more precise about what we mean by "genre", and really ought to disentangle several other salient features. For example, when I work with translations of Greek poetry (whether drama, or lyrical odes, etc.) it can be very important to know whether the translation was done in prose or in verse. The original text may have been poetic, but the translation may not be in the same literary form as the original. Currently, we have no means to indicate this aspect of works, at any level, where it differs from the higher levels. --EncycloPetey (talk) 21:00, 22 July 2018 (UTC)
  • It's perhaps worth noting that at least one major resource, viz the Library of Congress Genre/Form Terms (Q47537953) thesaurus, doesn't think that trying to create a wall between genre and form is a game that is worth the candle.
Per my answer to Propositions 1 and 2 above, I would think that the broad literary form -- eg novel (Q8261), poem (Q5185279), encyclopedic dictionary (Q975413) -- should be given as the main instance of (P31) on a work-level item.
Beyond that, in values for genre (P136), I see no merit in trying too hard to systematically segregate genre from form -- comic novel (Q2561390) is as meaningful a genre term as horror film (Q200092). I don't see any particular value in requiring that this instead be given as comedy (Q27640800). Both should be equally acceptable as values.
We do, however, probably need to agree guidance as to where P136 rather than P31 becomes more appropriate to specify the nature of the content.
In terms of subject matter, we have main subject (P921). But it may be useful to be able to individually specify aspects of an overall subject beyond this, as proposed at Wikidata:Property_proposal/Creative_work#subject_facet, based eg on the kind of subject keyword information available from databases like the Biodiversity Heritage Library (Q172266). Jheald (talk) 23:58, 22 July 2018 (UTC)

4B: support, and other properties

The idea of using distribution format (P437) at the 'Manifestation' level is interesting.
One thing that is interesting is what to do if an exemplar is the only known copy of a text. Do we need to have a manifestation level as well? Do we need to have a work level even?
We already have this issue on a larger scale, where we only have one known edition of a work. Do we need to have a work-level item as well? Many many systems (eg OpenLibrary) avoid this, and only create a separate work-level item when they actually need one, to avoid the difficulties of trying to keep two items with largely overlapping property-values (eg author, title, publication date, etc, etc) in sync -- potentially a huge amount of duplication, that users have so far largely run away from. Is there some semaphore we could adopt, to indicate a combined item? Jheald (talk) 00:20, 23 July 2018 (UTC)
@Jheald: The problem with your approach is how do you treat a novel written by hand on a scroll ? Following your reasoning this implies instance of novel + instance of scroll + instance of manuscript ? So you can already foresee the extension of that list when more details are added. So when you have a 3-7 instance of, I think we can really start to ask what is really the concept of the item. Snipre (talk) 11:04, 23 July 2018 (UTC)
I would support some way to handle a single-edition book without multiple items, especially for things like museum exhibition catalogues, if we can agree on how to model them. - PKM (talk) 20:20, 23 July 2018 (UTC)
@JavierCantero: So can you explain why we use a property for genre and we are not defining everything using instance of (P31) ? Why the text format and the support have more importance to be defined using instance of (P31) and not other characteristics ? You should be coherent: if "it's not possible to categorize everything with a single taxonomy" like you said, we should not create properties.
Just for your information, I am not against using instance of (P31) for describing most characteristics of a book, I just want a coherent system: if you accept using genre (P136), why is it a problem to have a dedicated property for text format? Snipre (talk) 19:19, 23 July 2018 (UTC)
genre (P136) is a natural fit as a property since its value is independent of any other property, qualifier or value of the specific item. That can't be said about a property whose value could be book (Q571) or scroll (Q720106), since these have specific properties to state related to the item, such as ISBN-13 (P212) or the number of pages for a book, properties that the item shouldn't have if it is a scroll (and vice versa: if a scroll had its own specific properties, they shouldn't be set when using a different "text format" value). Using instance of (P31) you ensure that only items defined as books would have book properties (such as ISBN-13 (P212)) and only items defined as scrolls would have their own (the data model enforces that). --JavierCantero (talk) 08:41, 24 July 2018 (UTC)
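The kind of check described above can be sketched as follows. This is a hypothetical illustration only, not Wikidata's actual property-constraint mechanism: the rule table and function name are invented for the example, and the only rule encoded is the one mentioned in the comment, that ISBN-13 (P212) belongs on items typed as book (Q571).

```python
# Hypothetical rule table: property -> classes whose instances may carry it.
# Properties absent from the table (e.g. genre, P136) are unrestricted.
CLASS_SPECIFIC_PROPERTIES = {
    "P212": {"Q571"},  # ISBN-13 only on items that are instance of book
}


def violations(item_classes, item_properties):
    """Return the properties used on an item whose classes do not allow them."""
    bad = []
    for prop in item_properties:
        allowed = CLASS_SPECIFIC_PROPERTIES.get(prop)
        if allowed is not None and not (set(item_classes) & allowed):
            bad.append(prop)
    return sorted(bad)


# A scroll (Q720106) carrying an ISBN-13 is flagged; its genre is not.
scroll_issues = violations({"Q720106"}, {"P212", "P136"})

# A book (Q571) carrying an ISBN-13 passes.
book_issues = violations({"Q571"}, {"P212", "P136"})
```

The same shape of check works whichever way the "text format" ends up being modelled; only the lookup of the item's classes would change.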

distribution format (P437) seems perfectly ok to me for manuscript, codex, volumen, cd, etc. --Hsarrazin (talk) 10:43, 18 August 2018 (UTC)

Other observations

  • Not really germane to any of the above, but the English-language label + definition of literary work (Q7725634) "creative work by a writer created with aesthetic or recreative purposes" seemed off to me.
Having a class with that definition may well be quite useful. But (perhaps the result of too many years exposed to copyright law), it seemed dubious to me to apply the label "literary work" to categorically exclude all non-fiction, biography, history, encyclopedic writings etc. In copyright law at least, these would all be considered literary works, embodying a degree of originality and creativity, and often a degree of literary style as well. Jheald (talk) 23:36, 22 July 2018 (UTC)
    • I agree with that ! for me "literary work" is some work made with "letters" (i.e. any textual work - written or not), not necessarily 'artistic' ; non-fiction literature is literature. --Hsarrazin (talk) 08:57, 18 August 2018 (UTC)
  • Above I've questioned whether we always need to have separate items for all levels. But there is also a converse question: do just these levels always provide a sufficiently fine grouping of different items, in particular of different versions at the edition/manifestation level.
For two examples, consider (i) different scans of a particular edition: A number of libraries, including eg OCLC, and the British Library consider that a particular scanning creates a new version of a particular edition. But it should be tied to the edition that it is a version of.
As a second example, consider (ii) the 28 versions of On the Origin of Species that we currently have items for, as discussed above. These different versions show some clear groupings, which it would be useful to model. Jheald (talk) 00:32, 23 July 2018 (UTC)
@Jheald: Re: scans, how about item "scan" and new property <scan of> "edition"?
For the Origin of Species question, I assume scholars name these groups in some way. Perhaps these are <instance of> a new item "version group" which has parts that are versions? - PKM (talk) 18:47, 24 July 2018 (UTC)
A property <scan of> would also be useful at The Decoration of Houses (Q55740549), published in 2007 from a photographic reproduction of the 1897 first edition. - PKM (talk) 20:41, 24 July 2018 (UTC)

Everyone seems to be ignoring the "expression" level. Seems to me it can be quite useful, especially to distinguish translations of the same work, when (for example) there might be multiple translations into a given language and/or multiple editions of a given translation. - Jmabel (talk) 04:16, 26 July 2018 (UTC)

Proposed refinement of model

Here's a first-cut model based on Snipre's proposed model and the comments above.

FRBR model | Current WD model | Proposed WD model <instance of> | Property notes
Work | Book | Any subclass of creative work (Q17537576) [include literary forms, see Note 1] | <genre> and <main subject> go here (+ others)
Expression | - | - | -
Manifestation | Edition | Any subclass of version, edition or translation (Q3331189) | <edition or translation of>; include all fields required for bibliographic citation here
Item | Exemplar, manuscript | Any subclass of document (Q49848) | <exemplar of>

Notes:

  1. Literary forms (novel, dictionary) should be subclasses of creative work or its subclasses. Literary forms may be <instance of> genre but not subclass of genre

I think this may meet Snipre's objective of a simple model while allowing the use of a variety of <instance of> statements.

Thoughts on this model? - PKM (talk) 18:39, 24 July 2018 (UTC)

We currently have a limited number of classes that are currently in the subtrees of both document (Q49848) and creative work (Q17537576) -- see tinyurl.com/y8hgtw58
These are going to need some work to separate out the physical from the essential. Some we may need to think about more closely. Jheald (talk) 19:06, 24 July 2018 (UTC)
BTW, the talk pages of both document (Q49848) and creative work (Q17537576) have the {{Item documentation}} autodescription template, which gives a very useful view of what is in the subtrees of the two classes at the moment. Striking, because that template goes up rather than down. But it does have a link to the tree tool, which does go down. Jheald (talk) 19:43, 24 July 2018 (UTC)
On a first look, I am not 100% sure that document (Q49848) is going to work. We need our class to be something such that it is 100% clear that, for it and all of its subclasses, for anything that is instance of (P31) one of those classes, it should be self-evident from that statement that the thing is a definite single concrete physical object.
But document (Q49848) has subclasses like papal bull (Q189867) -- which I think really we would consider a type of work. (cf: en:List_of_papal_bulls) But it is hard to deny that it is also a kind of document. So this may need some refinement. Jheald (talk) 20:01, 24 July 2018 (UTC)
@Jheald:, I think you meant document (Q49848) not Maximilian von Schwerin-Putzar (Q89848). :-) (And {{Item documentation}} is my life-saver tool.) I agree some cleanup of document (Q49848) is necessary, but it looks like that is needed in any case. Is an ebook a physical object? Is it a document? We certainly must include ebooks at the edition level, as we cite them in references. However we structure our base model, I can see us having a "problem" list of things to discuss in order to truly standardize how we treat books - but having guidelines and best practices around all aspects of books would be valuable, and the lack of a base model has kept us from focusing on the finer points, IMHO. - PKM (talk) 20:44, 24 July 2018 (UTC)
Fixed now, thanks!! Jheald (talk) 21:24, 24 July 2018 (UTC)
@PKM: The key problem I think is that, in the subtrees of both document (Q49848) and creative work (Q17537576), there are fair number of class items that don't really clearly embody a work/edition/exemplar distinction, and so wouldn't solidly lead the editor of an item to accurately encode such a distinction.
Also, is eg Mona Lisa (Q12418) a work or an exemplar? Arguably it's a work in a different sense of creative work (Q17537576), one that's not a well-spring for multiple distinct exemplars, unlike say a text as a work.
It would be nice to get some clearer lines and inheritances in our classes here, but it's going to be a lot of work. So being less ambitious, and going back to something like "exemplar", rather than "document", may make sense as the key class at the head of the tree for classes that instances are individual distinct copies of works of which many individual notable copies may exist. It also gives us a tighter tree of subclasses to watch over and police, to some extent addressing Snipre's (very fair) comment below.
Mirroring this on the other side, it might also make sense to have a specific class for a work in the FRBR sense, ie a "work of which many individual notable copies may exist", that again one could more tightly police the subclasses of, rather than using creative work (Q17537576) for this.
It's a problem we have in all sorts of areas across Wikidata, as my original is-it-abstract-or-is-it-concrete query tinyurl.com/ya4spc62 tried to highlight. But actually I think we may be in rather better shape than the above may indicate, because the important thing to distinguish (as regards properties that may or may not be appropriate) are whether items are versions or whether they are exemplars, or not. And that I think with our existing structure we probably can already do, at least reasonably well, at least in principle (even if there is work to do on some/many individual items). Jheald (talk) 14:23, 26 July 2018 (UTC)
@Jheald:, if you're recommending that "individual book" and "illuminated manuscript" should be <subclass of> "exemplar", I would be 100% happy with that. "Document" was a quick choice for the Item level of FRBR, and is the category choice I was the least confident about.
Same thing with "work" - if we can define a class whose subclasses are clearly Works in the FRBR sense, and get agreement on that, I'm fine.
My one goal here is to get a model that we have consensus on and move on to other challenges. - PKM (talk) 19:03, 26 July 2018 (UTC)
On the other hand, having dedicated properties for text format, genre, etc. allows us to have constraints that raise an alert when the instance/subclass values of the item where the property is used do not respect the rules. Snipre (talk) 14:53, 25 July 2018 (UTC)
@EncycloPetey: So do you have a class tree to propose ? And if possible a class tree which is ok for everyone. It is already now so difficult to agree on which value should be used for work level (book or work) so I don't expect any agreement on several dozen of items classified in a certain order.
But in any case we can't create a model based on something which doesn't exist (the class tree) so unless you propose a class tree right now, my model doesn't rely on something which doesn't exist. Snipre (talk) 08:49, 28 July 2018 (UTC)
Ah, you're taking that strategy, are you? Blame me for your reluctance while dissembling over the question I asked. You were the person who brought up the issue of trees. I only asked you to clarify, and you've tried to turn that against me. My question is straightforward: Are you proposing that we avoid making use of any class trees at all? --EncycloPetey (talk) 13:42, 28 July 2018 (UTC)
@EncycloPetey: I don't blame you, I am just tired of discussing with people who never propose an alternative. You support the use of a class tree? No problem, but please show me how long you have been working on the development of this class tree and especially what you are doing to build a global agreement for that initiative. Snipre (talk) 20:23, 29 July 2018 (UTC)
You have tried to put responsibility on me each time, but have failed to answer my question. Therefore, I assume that you're not going to answer my question, since I've asked it twice now without getting an answer. --EncycloPetey (talk) 20:27, 29 July 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Have we examined FaBiO, the FRBR-aligned Bibliographic Ontology as a guide for our books ontology? It might be helpful for sorting out our class trees. I note they have "novel" as work > artistic work (our creative work?) > literary artistic work > novel. I like that. - PKM (talk) 19:34, 28 July 2018 (UTC)

Use cases for modeling

What are your use cases? If the purpose is to "import data", maybe it would be preferable to outline what you want to import and how it's a problem in the current approach. If the purpose is to discuss a theoretical model that may be in use elsewhere, maybe this isn't really a suitable forum.
--- Jura 04:33, 26 July 2018 (UTC)

@Jura1:, I think there is broad consensus to follow the FRBR model; what we seem to keep going round and round about is what to use for <instance of> at each level.
The current recommended best practice on our project page is to use <instance of> "book" for the work level, but that has been widely disputed in discussions in favor of something more like "work".
My use case is: As an editor, I want to understand a clear recommended best practice for adding a work-level item in WD to go with a new edition-level item (usually one I intend to use as a source for a <stated in> reference). - PKM (talk) 19:23, 26 July 2018 (UTC)
  • It might work better with more specific import questions at hand. Most Wikidata items can be interpreted within one theoretical approach or the other, but this doesn't really answer import questions.
    --- Jura 09:20, 28 July 2018 (UTC)

proposal for Copyright status of a work

This proposal should be of interest to all Wikisource contributors who are part of this project too... Wikidata:Property proposal/copyright status --Hsarrazin (talk) 11:52, 25 July 2018 (UTC)

Will we mark the copyright status of the work for each and every country? Laws differ considerably in each nation.
We'd also have to have a means of differentiating the status of content within a volume. Sometimes the text is in public domain, but the illustrations are still under copyright. Or a volume may contain multiple works, some of which are under copyright and some of which are not. Sometimes the primary work is free of copyright, but the annotations, or the introduction are copyrighted. We'd have to have a system that indicates the status of individual components of a work before we can handle copyright issues. --EncycloPetey (talk) 14:41, 25 July 2018 (UTC)
@EncycloPetey: you probably mean, indicate the status of individual components of an edition, since each component is the edition of an individual work whose status can be handled on the corresponding work item? --Hsarrazin (talk) 09:26, 18 August 2018 (UTC)
No, I did not. I avoided being specific, because it is still not clear whether we would mark the work's data item or the edition's data item (or both) for the components, nor how this would be coordinated for a data item that represents a composite collection.

Chaucer

I've started some work on Chaucer. Can folks look at these items and see how they can be improved?

I am particularly stumped by how to show the relationship between the 1894 and 1900 versions of volume IV (edition of an edition? based on?).

I eventually want to get to the goal of indicating that the versions of Chaucer's works in Kelmscott Chaucer (Q4219142) are based on Skeat's 1894 editions, but that's a couple of levels of complexity beyond where I am right now.

 Y Also, I find complete edition (Q16968990) and complete works (Q1978454) confusing. Author A's "collected works" are logically a subset of her "complete works", but a "complete works" is a type of "collected works". And currently one is a book and one is a group of works. Ideas how to sort this? - PKM (talk) 21:43, 27 July 2018 (UTC)

  • Not that we necessarily currently have the properties, but in FRBR terms wouldn't the 1894 and 1900 versions of volume IV be two manifestations of the same expression, with the latter based closely on the former? based on (P144) would capture the relationship, no? - Jmabel (talk) 21:55, 27 July 2018 (UTC)
    Probably not. based on (P144) is usually used on derivative works, not editions of the same thing. We haven't yet determined how we want to handle items that are editions of other editions, or possibly an edition of two separate items simultaneously. --EncycloPetey (talk) 23:12, 27 July 2018 (UTC)
There are volumes in The complete works of Geoffrey Chaucer (Q55776699) that are editions of two (or more) works. - PKM (talk) 18:50, 28 July 2018 (UTC)
Re: complete edition (Q16968990) and complete works (Q1978454) - I am going to follow the ontology at FaBiO, which has anthology > collected works > complete works. I'll build these out accordingly. - PKM (talk) 19:00, 28 July 2018 (UTC)
I note that AAT has only one item "collected works" with the meaning "complete works" (which is the likely source of some of our poor labels), and that they class this as a document genre under "information" (that is, as FRBR:Item). AAT also classes "novel" as a literary genre. This is an illustration, I suppose, that controlled vocabulary =/= ontology. Just commenting for the purpose of general discussion. - PKM (talk) 20:38, 28 July 2018 (UTC)
It turns out that our item labeled "collected works" in English should actually be "complete edition" based on the single sitelink. Andreasmperu and I are sorting that out, and I'll likely move the AAT link which would solve the problem mentioned above. - PKM (talk) 15:01, 31 July 2018 (UTC)

Proposal to clean up the Books class tree using FaBiO

@Snipre, EncycloPetey, Jheald, JavierCantero, ArthurPSmith, Jura1: EncycloPetey has asked for more examples and Snipre has asked for a proposal on an improved class tree. I have spent some time over the last few days looking at FaBiO, the FRBR-aligned Bibliographic Ontology as a guide for making an FRBR-compliant class tree for Wikidata. Here are my preliminary thoughts:

  • Using a professionally-developed ontology rather than re-inventing the wheel is probably a good idea.
  • FaBiO was first published in 2010 and Version 2 was published in February 2018.
  • FaBiO is widely used and has been mapped to other ontologies.
  • FaBiO is less granular than Wikidata and includes some concepts which are out of scope for the Books project.
  • Getty AAT, while widely used for arts concepts in Wikidata, is not a good source for a primary book structure as it is not based on FRBR.

I think FaBiO would provide a solid basis for a class tree for Books in Wikidata, with the usual caveats that there will be individual concepts where Wikidatans choose to structure our class tree differently, especially if we skip the FRBR:expression level. I would recommend the following best practices:

  • Don't follow FaBiO exactly, but use it as a guide, to be expanded and collapsed where needed.
  • Where we do follow FaBiO, use a <stated in> FRBR-aligned Bibliographic Ontology (Q44955004) reference on the <subclass of> statement.
  • Use exact match (P2888) on concepts that are exact matches to FaBiO concepts.
  • Encourage the use of multiple parents within the same FRBR class (e.g. "biographical dictionary" <subclass of> "dictionary, biography").
  • Encourage the use of <instance of> "genre" on work-level items (especially where so classed in AAT or other vocabularies and referenced).
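As a rough illustration of the multiple-parent recommendation above, the following Python sketch walks a small <subclass of> tree transitively, much as a `wdt:P279*` path would in a query. The class labels come from this discussion, but the parent links in `SUBCLASS_OF` are illustrative assumptions, not the current state of Wikidata or of FaBiO.

```python
# Hypothetical <subclass of> links, keyed by label.
# A class may list several parents, as in the "biographical dictionary" example.
SUBCLASS_OF = {
    "biographical dictionary": ["dictionary", "biography"],
    "dictionary": ["reference work"],
    "reference work": ["written work"],
    "biography": ["written work"],
    "novel": ["literary work"],
    "literary work": ["creative work"],
}


def ancestors(cls, tree=SUBCLASS_OF):
    """Return every class reachable from cls by following parent links."""
    seen = set()
    stack = list(tree.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(tree.get(parent, []))
    return seen


def is_subclass_of(cls, target, tree=SUBCLASS_OF):
    """True if target is cls itself or one of its (transitive) parents."""
    return cls == target or target in ancestors(cls, tree)
```

With multiple parents, "biographical dictionary" is found under both "dictionary" and "biography", which is the behaviour the recommendation is after.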

The following table maps the FRBR:works tree (only) in FaBiO to WD concepts. It's also available in Google Sheets with more details at bit.ly/fabio2wd. (It's been years since I've done a Wikitable - if you can improve the format of this, I'd be delighted!)

Mapping FRBR:works in FaBiO to Wikidata
Wikidata class tree (proposed) | FaBiO class tree | FaBiO comment
intellectual work (Q15621286) or written work (Q47461344) | work | "subclass of FRBR work, restricted to works that are published or potentially publishable, and that contain or are referred to by bibliographic references"
announcement (Q567303) | announcement |
no item | notification of receipt |
retraction notice (Q7316896) | retraction |
creative work (Q17537576) | artistic work |
literary work (Q7725634) | literary artistic work |
composed musical work (Q207628) | musical composition |
novel (Q8261) | novel |
novella (Q149537) | not in FaBiO |
novella (Q43334491) (Renaissance) | not in FaBiO |
novelette (Q472808) | not in FaBiO |
play (Q25379) (many subclasses) | play |
poem (Q5185279) (many subclasses) | poem |
screenplay (Q103076) | screenplay |
short story (Q49084) | short story |
biography (Q36279) | biography |
autobiography (Q4184) | not in FaBiO |
hagiography (Q208628) | not in FaBiO |
(many more) | |
no item | case for support |
correction (Q5172784) | correction |
historical-critical edition (Q680458) (?) | critical edition |
data set (Q1172284) | dataset |
essay (Q35760) | essay |
no item | examination paper |
no item | grant application |
not WD Books item | image |
no item | instructional work |
not WD Books item | metadata |
not WD Books item | model |
not WD Books item | opinion |
not WD Books item | policy |
not WD Books item | proposition |
not WD Books item | questionnaire |
reference work (Q13136) | reference work |
encyclopedia (Q5292) | not in FaBiO |
dictionary (Q23622) | |
(many more) | |
not WD Books item | reply | "A work that is a reply, either to a letter or other direct communication, or to feedback or comments"
report (Q10870555) | report |
review (Q265158) | review |
scholarly work (Q55915575) (added) | scholarly work |
scholarly article (Q13442814) | not in FaBiO |
not WD Books item at work level | sound recording |
specification (Q2101564) | specification |
vocabulary (Q6499736) (poor match?) | vocabulary |
group of works (Q17489659) | work collection |
not WD Books item | work package | "component of the case for support of a grant application"
working paper (Q1228945) | working paper |

FaBiO data from: Peroni, S., Shotton, D. (2012). FaBiO and CiTO: ontologies for describing bibliographic resources and citations. In Journal of Web Semantics, 17: 33-43. https://doi.org/10.1016/j.websem.2012.08.001. Open Access at: http://speroni.web.cs.unibo.it/publications/peroni-2012-fabio-cito-ontologies.pdf CC-BY 4.0

https://sparontologies.github.io/fabio/current/fabio.html#toc

Similar mappings can be done for editions/exemplars, but I'd like some general feedback on this idea before I do that work (it's done by hand and it's very time-consuming). What do you think of this approach? - PKM (talk) 21:15, 30 July 2018 (UTC)

Overall, I like this approach. We shouldn't be trying to reinvent something that experts have already tackled. I do see a few items not listed, such as patent application (Q3022019), and dissertation (Q1385450) / doctoral thesis (Q187685) (where some clarification or a merger may be needed), and learning material (Q6006020) which should be the same as "instructional work", including textbook (Q83790). I also think group of works (Q17489659) does not quite describe a volume that is a published collection or anthology (Q105420). We'd need to be sure we have that part of the classification tree worked out as well. --EncycloPetey (talk) 21:23, 30 July 2018 (UTC)
FaBiO is very different from our current model on frbr:expression and frbr:manifestation - their "expression" is closer to our "version/edition". I've added the FaBiO side of the mappings to the Google sheet for reference. We might effectively blend these levels. - PKM (talk) 00:30, 31 July 2018 (UTC)
I think this makes sense. When you say "Encourage the use of <instance of> "genre" ...", by genre here are you referring to work types like "novel", etc., or even more specifically the subject matter (e.g. romance novel (Q858330), crime novel (Q208505), science fiction novel (Q12132683), etc.)? I guess this is ok, but I don't believe it's been common practice for this project up to now... ArthurPSmith (talk) 14:37, 31 July 2018 (UTC)

Why is "musical composition" under "literary work"? Seems unintuitive to me. - Jmabel (talk) 19:22, 31 July 2018 (UTC)

@Jmabel: I agree, though I double-checked, and that's where they place it, with subclass "song". - PKM (talk) 20:56, 31 July 2018 (UTC)

More on FaBiO

Karen Coyle's FRBR, Before and After: A Look at Our Bibliographic Models has some good general info on FaBiO (and other models) from a librarian's perspective, noting its emphasis on fields relating to the workflow of academic publishing. She also notes "Along with the classes derived from FRBR entities, FaBiO has dozens of properties for bibliographic description, few of which would be considered exact equivalents of descriptive elements in library data." Coyle's work is available as a PDF here. My take today (subject to further exploration) is that FaBiO would be a great reference for our "work" class tree but possibly of lesser value for "versions" and "items". FaBiO also highlights properties we don't have today and might want to add. I'd really like to hear from some of our librarians on this. - PKM (talk) 20:56, 31 July 2018 (UTC)

I've written on "scholarly work" on my blog. It's a common concept but needs a good definition. I like the definition that (I assume you) gave it on Wikidata: "work that reports the result of study and analysis of a topic using scholarly methods". However note that others define it as "anything published in a scholarly journal." Some insist that it means "peer-reviewed." Others tie it into the fact of having citations (which would make all of WP scholarly...). Also, using the same term "scholarly" in the name and the definition is a kind of definitional "no-no." It would be longer, but how about "work that reports the result of study and analysis of a topic, usually peer-reviewed." Also, chatted at WikiCite2018 about getting better advice/instructions to editors into WD pages. Means that we wouldn't need the whole banana in the definition as long as further info would show up prominently on the page. Kcoyle (talk) 18:41, 1 December 2018 (UTC)

Too many names to ping

There are too many users (66) on Wikidata:WikiProject Books/Participants for Echo to work (max 50). Ran into it when proposing my property. —Dispenser (talk) 02:01, 6 August 2018 (UTC)

There's an open Phabricator ticket for this problem. - PKM (talk) 20:01, 6 August 2018 (UTC)

Distribution

Currently distribution format (P437) is valid for works but not versions/editions. I've proposed fixing that on the property's Talk page. -PKM (talk) 20:01, 6 August 2018 (UTC)

Refining my proposed model (again)

Here are my updated thoughts on tackling this problem.

Proposed general approach

Books and other written works in Wikidata are modeled in 3 layers based on Functional Requirements for Bibliographic Records (Q16388). These layers are:

  • Work, corresponding to frbr:work and representing the intellectual content of a written work.
  • Version or edition, similar to both frbr:expression and frbr:manifestation, but not exactly equal to either of these. The "version" is a published or otherwise distributed version of a "work", with full bibliographic information, that can be searched for online or in a library or archive, and used as a citation to support statements in Wikidata.
  • [exemplar], a physical or digital object that is one and only one instantiation of a "version", such as an individual book in a collection or an illuminated manuscript. This is equivalent to frbr:item.

Proposed class trees

Works

The class tree for works is based on version 2 of the FRBR-aligned Bibliographic Ontology (Q44955004). We use the fabio:works hierarchy with modifications and extensions as agreed to by the Wikidata community.

  • Types of works should be <subclass of> [pick work item] or one or more of its subclasses.
  • Types of works may also be <instance of> genre (Q483394) or one of its subclasses, but should not be <subclass of> a genre.
  • Individual works should be <instance of> one or more subclasses of [pick work item].
  • Individual works should be linked to their versions using <has edition>.

Versions/editions

The class tree for versions is developed specifically for Wikidata.

Items

  • Bibliographic items should be <instance of> their object type (book, illuminated manuscript, codex)
  • Bibliographic items should be linked to a version using <exemplar of> the version.
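To make the three layers concrete, here is a minimal Python sketch of the proposed model. It is only an illustration under stated assumptions: the class names `Work`, `Edition` and `Exemplar`, the helper functions, and the `problems` check are hypothetical and do not correspond to any Wikidata tool or API. The links mirror <has edition> and <exemplar of> as described above.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: "book" is flagged because it is ambiguous
# across the three layers, as discussed on this page.
DISCOURAGED_WORK_CLASSES = {"book"}


@dataclass(eq=False)
class Work:
    label: str
    instance_of: str                                # e.g. "written work" or a subclass
    editions: list = field(default_factory=list)    # <has edition>


@dataclass(eq=False)
class Edition:
    label: str
    edition_of: Work
    exemplars: list = field(default_factory=list)


@dataclass(eq=False)
class Exemplar:
    label: str
    instance_of: str                                # object type, e.g. "individual book"
    exemplar_of: Edition                            # <exemplar of>


def add_edition(work, label):
    """Create an edition and link it to its work in both directions."""
    edition = Edition(label, edition_of=work)
    work.editions.append(edition)
    return edition


def add_exemplar(edition, label, object_type):
    """Create an exemplar that instantiates exactly one edition."""
    exemplar = Exemplar(label, instance_of=object_type, exemplar_of=edition)
    edition.exemplars.append(exemplar)
    return exemplar


def problems(work):
    """Return a list of human-readable modelling problems for a work."""
    found = []
    if work.instance_of in DISCOURAGED_WORK_CLASSES:
        found.append(f"{work.label}: <instance of> {work.instance_of!r} is ambiguous")
    if not work.editions:
        found.append(f"{work.label}: no <has edition> link")
    return found
```

Each exemplar points to one and only one edition, and each edition to one work, so the FRBR level of any object can be read off its type rather than inferred from its <instance of> value.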

Comments

So many people have dropped out of this conversation that we may never be able to reach consensus. However, here are a few more thoughts based on the conversation:

  • Instructing people to stop using <instance of> "book" for work-level items would be a big deal, since that behavior is recommended in several places, but I think it's the one idea everyone pretty much agrees on.
  • There is enough opposition to the "all humans are Q5 model" (that is, "all frbr:work-type items are <instance of> [some item]") that I don't believe we'd ever get consensus to go that way.
  • So my question is, how can we move forward? - PKM (talk) 19:39, 18 August 2018 (UTC)
    If we can agree on (at least) the proposal for work and version/edition items, it would make for a good start. The item level seems to be generating more conversation, and may need a deeper look. We also need to agree explicitly (with specific examples) how we will handle Wikisource data items. --EncycloPetey (talk) 01:40, 19 August 2018 (UTC)
    Thanks for writing this up, PKM. I fully agree with your model. Concerning "book": I think we should try to discourage people from using it because it can refer to any of the 3 levels. As an alternative I propose to use written work (Q47461344) or any of its subclasses. --Pasleim (talk) 12:05, 20 August 2018 (UTC)
@Pasleim: Change of book to written work: Don't modify Help:source without changing the recommendations on Wikidata:WikiProject Books. Everything should be coherent. Snipre (talk) 18:35, 12 September 2018 (UTC)
@Pasleim, Snipre, EncycloPetey: I'll be sure to modify both Help:source and the recommendations on Wikidata:WikiProject Books, just for works and editions at this time, unless someone else beats me to it. It seems like we have consensus (or at least no objections) to making these changes. - PKM (talk) 19:54, 12 September 2018 (UTC)
@PKM, Pasleim, Snipre, EncycloPetey: I'm all for implementing those 3 layers. Otherwise, if works' instance of (P31) can be so many different things (subclasses of written work (Q47461344) or genre (Q483394)), I would be in favour of having a dedicated "FRBR level" property to be able to know this level without having to do SPARQL requests. (I know this possibility of a dedicated property has been discussed somewhere, it was at least orally discussed at WikiCite 2017, but I can't find the notes). That would be a way to get a "all humans are P31:Q5" equivalent without having to confront resisting forces. -- Maxlath (talk) 16:24, 5 November 2018 (UTC)
This is definitely one option, even if I prefer the other one: use a new property for genre, text format or other characteristics and keep only the 3 FRBR levels as the only values for instance of (P31). Why? For the same reasons we avoid the use of book. If we use subclasses of work as values for instance of (P31), then we have to be clear with the labels of the subclasses to avoid any misunderstanding. Snipre (talk) 10:22, 7 November 2018 (UTC)

Publication

Does anyone want to take a stab at fixing the parents of publication (Q732577)? It's clearly not a "work" in the FRBR sense. It's currently a parent of "version". - PKM (talk) 20:10, 23 August 2018 (UTC)

Work item properties: Deleting RVK identifiers

Should Regensburg Classification (P1150) not be used as shown in "Work item properties"? For example Bottroper Protokolle, SWB-Online Katalog: RVK-Notation: GN 9999, MS 1420 etc. @JakobVoss: Why do you remove this property for books? --2A00:C1A0:4882:2F00:4538:E01:D5DA:B609 00:15, 12 September 2018 (UTC)

RVK notations on work items should better be replaced by main subject (P921), genre (P136) and related properties. Regensburg Classification (P1150) is more useful as mapping between RVK classes and Wikidata items. To give an example:

Updating the Project page

I have changed the example of the project page to show <instance of> written work, not book (which is what the sample item says anyway). I have added a second example with instance of scholarly work. I have added simple one-liners that "works" should be instances of "written work" or its subclasses and that editions should be instances of "version, edition or translation" or its subclasses.

I also suggest:

  • Specifically calling out that <instance of> "book" is deprecated (and making sure "book" is no longer a subclass of "work")
  • Adding a section on class tree and FaBiO (or possibly referencing a subpage on this topic?)

Either way, I'll announce these changes on Project Chat when I am done. Comments? - PKM (talk) 20:02, 16 September 2018 (UTC)

Thank you very much! I will not add instance of (P31)book (Q571) anymore! --Epìdosis 20:09, 16 September 2018 (UTC)
Re: Works should be instances of written work (Q47461344) or one of its subclasses.
We also need to allow for books that consist of printed musical scores and for books that consist primarily (or entirely) of artwork or photographs. --EncycloPetey (talk) 01:34, 17 September 2018 (UTC)
Do we have classes for those? FaBiO has “musical composition” but our item with that name = “act of composing music”. What would you want to call them? - PKM (talk) 02:20, 17 September 2018 (UTC)
composed musical work (Q207628) is a subclass of musical work (Q2188189), although the translations seem to be divided over whether the item refers to the act or the result. --EncycloPetey (talk) 02:48, 17 September 2018 (UTC)
Ugh, then someone will eventually need to duplicate and disambiguate them. I’ve been thinking that works that are primarily artworks or photographs might be “scholarly works” or “reference works”. How would you feel about “Works will generally be instances of written work (Q47461344) or one of its subclasses. There may be exceptions for other types of works issued in book form.” - PKM (talk) 19:58, 17 September 2018 (UTC)
It would help if we could also link to a page or discussion with additional information, such as a copy of the FaBiO table and comments about musical score books, etc. In other words, have the short blurb, but with a link to additional and more detailed information. --EncycloPetey (talk) 00:36, 18 September 2018 (UTC)
I’ll put that on my to-do list. I want to format the FaBiO table a bit differently than above, and I’m thinking about how best to do that. - PKM (talk) 20:23, 18 September 2018 (UTC)
This is not a good idea to start the classification like that: we have musical work (Q2188189), work of art (Q838948), creative work (Q17537576), intellectual work (Q15621286), anonymous work (Q567620), derivative work (Q836950), collective work (Q3594128), recreative work (Q17538258), revoiced work (Q26160672), written work (Q47461344), work of science (Q11826511), posthumous work (Q17518461), scholarly work (Q55915575),... before saying that "Works should be instances of written work (Q47461344)", we need to have a clear understanding of the top classification but currently we are missing a global picture. Snipre (talk) 12:31, 19 September 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I don't agree. Many of these items (intellectual work (Q15621286), creative work (Q17537576), anonymous work (Q567620), derivative work (Q836950), collective work (Q3594128), recreative work (Q17538258), revoiced work (Q26160672), and posthumous work (Q17518461)) are broader concepts than works in the FRBR sense. FRBR-works may be any of these classes but they are also "book-works" whatever that is in WD. I don't believe work of art (Q838948) is a type of "book" in the scope of this project. musical work (Q2188189) is an outlier. I think it is important to define a parent class of which "all of these items are (FRBR works)".

We could take a cue from FaBiO and add a new "parent" "bibliographic work" = "work that is published or potentially publishable, and that contains or is referred to by bibliographic references". I'd also be happy using intellectual work (Q15621286) as the recommended parent class, although I think "written work" is better.

And I would not have made the changes to the project page if I didn't think that we had consensus from those who are still participating in this conversation after (something like) three years. - PKM (talk) 00:15, 20 September 2018 (UTC)

@EncycloPetey: Wikidata:WikiProject Books/Works is a draft of a "more information" page for works. Feel free to edit or suggest changes. - PKM (talk) 20:13, 20 September 2018 (UTC)

Pictorial works?

@EncycloPetey, kcoyle: I am thinking of a new item pictorial work = "creative work consisting primarily of a selection of images, with minimal or no accompanying text" for the FRBR work-level item for books of photographs or artworks. The name is analogous to pictorial map (Q162206). Thoughts? Other suggestions? - PKM (talk) 20:10, 21 September 2018 (UTC)

Hmm. The terminology seems to be overly vague if you are intending this to be limited to book-like works. It also doesn't seem to match well with the concept used for pictorial map (Q162206). However, I haven't been able (yet) to think of an alternative label. I think the question we need to answer first is "What sort of things would we need to include in such a category?" Would we want to include cartographic works? collections of bird / plant / natural history illustrations? photographic collections? museum catalogs? If we can decide what sorts of items might be included, then we may be better able to assign a label to the concept. --EncycloPetey (talk) 00:58, 22 September 2018 (UTC)
It was suggested that using written work as our base class, we don't have a place for FRBR-works that are primarily artworks or photographs, so I am trying to resolve that. For some reason, map is currently a <subclass of> written work, so that's covered unless we change it. In my mind, atlases are reference works. Art and exhibition catalogs are often (but not always?) scholarly works. So I think this class—if we make it—would include natural history illustrations and non-scholarly photographic and artwork collections. - PKM (talk) 20:26, 22 September 2018 (UTC)
I think these would be good questions to take up at Wikicite 2018, if you will be there. Categorizing works is very hard, not straight-forward, so we need to give good guidance. There are definitely works that are not "written" - including music, dance, maps, films, and all of the visual arts. Let's make sure we have a way to cite all works. I'm game to put some time into this. 184.23.19.186 21:17, 23 November 2018 (UTC)

Thousands of bad edits

It seems User:Simon Villeneuve is running his account as an unsupervised bot, making thousands of additions, many of which do not fit the WikiProject Books data structure (among other issues). He is adding publisher (P123) to work items, instead of edition items. I have posted this issue to his talk page, but as I said, his account seems to be running automatically and without supervision. There will be a big mess to clean up afterwards. --EncycloPetey (talk) 13:43, 24 September 2018 (UTC)

Hi,
I have made a mistake. It is easy to correct it, but before, I want to be sure that there is a consensus here. I'll write my questions in a moment. Simon Villeneuve (talk) 14:11, 24 September 2018 (UTC)
Ok. We have about 39,000 books that have a publisher entry:
SELECT DISTINCT ?b ?bLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  ?b wdt:P31/wdt:P279* wd:Q571 ;
     wdt:P123 [] .
  MINUS { ?b wdt:P31/wdt:P279* wd:Q3331189 . }
}
I didn't enter all of them myself, so it seems that I'm not the only one who is making that "mistake" (if it is one).
My values come from enwiki infoboxes. EncycloPetey (EP) told me earlier on my talk page that there is a consensus not to put a publisher, release date or other property associated with a version/edition on an item dedicated to a book. When I asked him to show me a discussion about this consensus, he didn't point me to one, just telling me to come here to talk about it. I forgot the discussion on my talk page and I put publishers back this morning, but I have stopped again until we agree here on what to do.
For now, there are about 129,000 items about books and about 38,000 items about editions/versions. Many items classified as edition/version seem to be wrongly classified, such as Catholic Bible (Q591016) (or, at least, they can't have an editor, release date and other properties like that if I follow EP's logic).
Here are my questions :
1- Does the project plan to create an item for every edition/version of a book ?
2- If so, can we put all publishers of a particular book on the item dedicated to the book until all of these items have been created, or must we wait until every edition/version has been created to do so ?
3- If there is a rough consensus not to put a publisher, release date and other such properties on books, why has nobody blocked publisher (P123), publication date (P577), translator (P655) and so on with a none-of constraint (Q52558054) for book (Q571) ?
I can compare this situation with items dedicated to films. We don't create an item for every language version of a film, or an item for every different publication date (P577) or distributed by (P750) in every country. We put all this information on the item dedicated to the film. Simon Villeneuve (talk) 14:52, 24 September 2018 (UTC)
1. see Wikidata_talk:WikiProject_Books#Does_there_*always*_need_to_be_a_separate_work_and_edition_item?. There is not really consensus about this.
2. In my opinion, you can leave the statements there until an edition item is created. But I suggest that we remove publisher data on these 1300 items.
3. It is worse. There is not only a missing none-of constraint (Q52558054) but the constraints on publisher (P123) even enforce that publisher can only be used on work items and not on edition items. --Pasleim (talk) 16:57, 24 September 2018 (UTC)
Ok, thank you for the link. I'll read this discussion later.
I forgot to mention the ~31,000 items about books that have an ISBN-10 or ISBN-13.
SELECT DISTINCT ?b ?bLabel WHERE { SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" } {?b wdt:P31/wdt:P279* wd:Q571 . ?b wdt:P957 [] .} UNION {?b wdt:P31/wdt:P279* wd:Q571 . ?b wdt:P212 [] .} }
What do we do with them? Simon Villeneuve (talk) 18:34, 24 September 2018 (UTC)
For these items, I believe the "correct" process is to duplicate the item, change the one with the sitelinks to a "work" and the duplicate to an "edition", and remove the incorrect statements on the resulting items. It's a lot of effort and I don't know if there is a way to automate it. Until recently, our published best practice was to use <instance of> book for work items (and "book" was still a subclass of "creative work" and "literary work" last time I looked). We haven't yet broadly communicated this change in recommended practice. This is going to be a long-term effort. - PKM (talk) 20:08, 24 September 2018 (UTC)
It depends on where the sitelinks point. If the sitelinks are to WP, then it should usually be a work, unless the Wikipedia article is about a particular edition or exemplar. If the sitelinks are to Wikisource or Commons, then you will have to look at the Wikisource page or Commons page/category to see whether that page is for an edition or a work.
But when it comes to ISBNs, any values taken from Wikipedia pages will be a mess. The Wikipedia editors routinely added ISBNs for a current edition to all manner of articles about publications, irrespective of the edition. --EncycloPetey (talk) 02:09, 25 September 2018 (UTC)
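The split PKM describes above can be sketched roughly as follows. This is only an illustration on plain Python dictionaries, not a bot or an API client: the item layout, the WORK_CLASS placeholder, and the choice of which properties count as "edition-level" are assumptions taken from the properties named in this thread, and there is no firm consensus behind them. Deciding which statements are wrong on the resulting edition, and whether the sitelinks really belong to the work, still needs manual review.

```python
# Properties named in this thread as belonging to editions (an assumption, not a rule):
# publisher, publication date, translator, ISBN-10, ISBN-13.
EDITION_LEVEL = {"P123", "P577", "P655", "P957", "P212"}
EDITION_CLASS = "Q3331189"        # the version/edition class excluded in the first query
WORK_CLASS = "WORK_CLASS_QID"     # hypothetical placeholder, the thread names no QID


def split_work_and_edition(item):
    """Return (work, edition) copies of a simplified item dict.

    `item` is assumed to look like:
    {"sitelinks": {site: title}, "claims": {property_id: [values]}}
    """
    claims = item["claims"]

    # The copy that keeps the sitelinks becomes the work and loses
    # the edition-level statements.
    work = {
        "sitelinks": dict(item["sitelinks"]),
        "claims": {p: list(v) for p, v in claims.items() if p not in EDITION_LEVEL},
    }
    work["claims"]["P31"] = [WORK_CLASS]

    # The duplicate becomes the edition and keeps every statement for later review.
    edition = {
        "sitelinks": {},
        "claims": {p: list(v) for p, v in claims.items()},
    }
    edition["claims"]["P31"] = [EDITION_CLASS]
    return work, edition


# Hypothetical example item; all values are made up for illustration.
book = {
    "sitelinks": {"enwiki": "Example Book"},
    "claims": {
        "P31": ["Q571"],
        "P50": ["Q_AUTHOR"],
        "P123": ["Q_PUBLISHER"],
        "P957": ["0-00-000000-0"],
    },
}
work, edition = split_work_and_edition(book)
```

The original dictionary is left untouched, so the result can be compared against it before anything is saved.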

Why author (P50) both on works and editions?

Since (I suppose) the author of the work is the same as the author of every single edition, wouldn't it be sufficient to use author (P50) only on work items?--Malore (talk) 14:14, 16 October 2018 (UTC)

There might be a few exceptions (for example works with multiple authors) but, more important, without the author it will be difficult to use the item in different contexts like Template:Cite Q (Q22321052). --Kolja21 (talk) 20:31, 16 October 2018 (UTC)
The three editions of Bayesian Data Analysis (Q29167237) have had slightly different authors, see [6]. When the editions are cited, it is easier if all the information is available on the edition item. — Finn Årup Nielsen (fnielsen) (talk) 00:02, 8 November 2018 (UTC)
The biggest problem with placing an author on an edition occurs when the edition is a translation. There are bots that will automatically label the translation as "edition by author" in other languages, which is incorrect for a translation. Conversations about this issue with people who run such bots have yielded no results. --EncycloPetey (talk) 00:41, 8 November 2018 (UTC)
The typical case I know of is textbooks where the authors update the content over several editions; when one author dies or stops contributing, new people contribute as replacements, but the original authors are still credited alongside the new authors. This is a particular case but quite a recurrent one, and we need to adjust our model for it.
For the case mentioned by EncycloPetey, I think this is a data import problem and not a structural problem requiring a change to the model. This way of importing data should not be accepted, and measures to prevent it are necessary. @EncycloPetey: If you encounter that problem again and receive no positive feedback from bot operators, please report it here and we can take action as a WikiProject: this can have a bigger effect, especially when several people support withdrawal of the bot flag. Snipre (talk) 10:08, 8 November 2018 (UTC)
@Snipre: You mean like this one? e.g. This edit identifies a Gutenberg e-book as an "uitgave van Stephen Crane" (edition of/from Stephen Crane). --EncycloPetey (talk) 15:38, 17 November 2018 (UTC)
@EncycloPetey: No, your edit is just a confusing way to indicate that the item is about an edition, but the author is the same for the work and the edition. My case is the following:
work item
author: XX
first edition
author: XX
second edition item
author: XX and YY
And the label or description is not really part of WD modelling. Snipre (talk) 13:41, 3 December 2018 (UTC)
But the description is still highly misleading. Stephen Crane was not responsible for that edition. --EncycloPetey (talk) 17:47, 3 December 2018 (UTC)
@EncycloPetey: I don't speak that language well enough to judge clearly what that sentence means. My opinion is that descriptions and labels should not be considered relevant when assessing an item. Do you speak that language? Wikidata is the right structure to bypass all language differences and use more absolute characteristics to define a concept. Snipre (talk) 19:45, 4 December 2018 (UTC)

editions of a book

Editions of a book are often described in a non-numeric (or not purely numeric) way. How should this information be provided in a Wikidata item?

Just a few examples:

  1. s:en:Page:Ossendowski - Beasts, Men and Gods.djvu/8: Ninth Printing. Is this equivalent to edition number (P393) = 9?
  2. s:pl:Strona:Asnyk Adam - Pisma 03. Wydanie nowe zupełne.djvu/007: Wydanie nowe zupełne (Edition new and complete); we do not know how many earlier editions were.
  3. s:pl:Strona:Józef Piłsudski - Wspomnienia o Gabrjelu Narutowiczu (1923).djvu/04: Pierwszy — dziesiąty tysiąc (First — tenth thousand); OK, here we can assume this is edition number (P393) = 1.
  4. s:mul:Page:H.M. Der Untertan.djvu/10: Dreiundachtzigstes bis neunundneunzigstes Tausend (Eighty-third to ninety-ninth thousand). Which edition?

As you can see, at least for Polish and German books of the early 20th century, editions were named based on the size of the print run. You can also find editions like "corrected", "cheap", "hardcover", etc.; the latter may vary slightly in content, not only in cover. And they definitely have different IDs in library catalogues.

Any hints? Ankry (talk) 10:25, 6 November 2018 (UTC)

My opinion:
1) Yes, use 9 as value for edition number (P393). Numbers should be preferred to allow better parsing when comparing data.
2) edition number (P393) = unknown value. There is a value but it is undetermined with the current information.
3) edition number (P393) = 1. Same as 1).
4) edition number (P393) = unknown value. There is a value but it is undetermined with the current information.
Snipre (talk) 10:09, 7 November 2018 (UTC)
1) No. The 9th printing is not the 9th edition. Printings are typically considered the same edition as whichever was the previous edition because no new editorializing has taken place. The book is re-printed from the same typesetting.
We will not always be able to describe the edition with a numerical value. In English publications, there is often a "US" and a "UK" edition published simultaneously. Sometimes the "first" edition is a translation (published in a different language) because the translation goes to press before the original. Sometimes the "first" edition is delayed and a revised edition is published before the original makes it to press. Sometimes even the scholars disagree over the numbering of editions. --EncycloPetey (talk) 12:08, 7 November 2018 (UTC)
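To make the two positions above easier to compare, here is a minimal sketch of how a transcribed edition statement might be turned into a value for edition number (P393). The EditionStatement class, its "kind" labels, and the printings_as_editions switch are all hypothetical names invented for this illustration; the "Nth thousand" statements from the examples are deliberately left out because the replies above do not settle them.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Stand-in for Wikidata's special "unknown value"; the string itself is an assumption.
UNKNOWN_VALUE = "unknown value"


@dataclass
class EditionStatement:
    text: str                      # the wording found in the book
    kind: str                      # hypothetical label: "numbered_edition", "printing" or "descriptive"
    number: Optional[int] = None   # the ordinal, if the wording gives one


def edition_number_value(
    stmt: EditionStatement, printings_as_editions: bool = False
) -> Union[int, str, None]:
    """Suggest a P393 value, or None when no P393 statement should be added."""
    if stmt.kind == "numbered_edition" and stmt.number is not None:
        return stmt.number
    if stmt.kind == "printing":
        # Snipre's reading: the printing number is used as the edition number.
        # EncycloPetey's reading (default here): a printing is not an edition.
        return stmt.number if printings_as_editions else None
    # Descriptive wording such as "new and complete edition": a number exists
    # but cannot be determined from the book itself.
    return UNKNOWN_VALUE


ninth_printing = EditionStatement("Ninth Printing", "printing", 9)
new_complete = EditionStatement("Wydanie nowe zupełne", "descriptive")
```

With the default setting, the "Ninth Printing" example gets no edition number at all, while the "new and complete" example gets the unknown value; flipping the switch reproduces the other answer for the printing.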

LCCN (bibliographic)

Currently, Library of Congress Control Number (LCCN) (bibliographic) (P1144) is described as an authority control for works, but the example and any links to the Library of Congress you care to examine are all for editions. Shouldn't the property restrictions be corrected to apply to editions? --EncycloPetey (talk) 02:40, 11 November 2018 (UTC)

Yes. - PKM (talk) 20:33, 11 November 2018 (UTC)

So, how do we get this fixed? --EncycloPetey (talk) 14:45, 14 November 2018 (UTC)

I've changed the constraint [7] --Pasleim (talk) 14:57, 14 November 2018 (UTC)
For instance of (P31) it still has "Wikidata property for authority control for works". Will this create a problem? --EncycloPetey (talk) 01:01, 15 November 2018 (UTC)

Reprints

Should simple reprints of a book (that have no changes in content, no change in copyright date, and no designation as a new edition) get separate items in Wikidata? If the answer is yes, does it matter if the publisher is different? For example, if Penguin Books originally publishes a book in 2015 and then reprints it again in 2017 (but with no changes in content except for a one-line note that it's a reprint), should each get a separate Wikidata item? For case #2, let's imagine that a modern publisher reprints the original edition of Frankenstein; or, The Modern Prometheus, should that get a new Wikidata item? Please ping me on any response. Thanks. Kaldari (talk) 01:49, 6 December 2018 (UTC)

I bet EncycloPetey will know the answer to this! Kaldari (talk) 01:54, 6 December 2018 (UTC)
If a modern publisher reprints an old book, it will either be a new edition or a facsimile edition. And if it's a new publisher, that will necessitate a new data item because the date of publication and the publisher are different from the other edition. If it's a facsimile reprint edition, it's de facto a new edition, usually with an ISBN that the original didn't have. For example, the Methuen facsimile reprint of Shakespeare's First Folio, printed 300+ years after the original can't be fit into the data item for the original FF; the publisher, date, etc. are all different.
If it's a new printing of an edition, it could get a new data item, but Wikiproject:Books hasn't really tackled that issue yet. Thus far, we haven't had to worry about that question.
A rule of thumb is: if the data of the "new" edition / printing differs from previous editions / printings, then it needs a new data item. --EncycloPetey (talk) 02:02, 6 December 2018 (UTC)
@EncycloPetey: Here's an actual example of the problem (to prove it's worth worrying about)... Q51499531 is the original U.S. printing of a book in 1912; Q51499519 is a 1915 reprint by a different publisher, while Q51499528 is a 1916 reprint by the same publisher as the 1915 reprint. All three have absolutely identical content except for slight changes to the title page: same pagination, same typesetting, everything. Should all three of these have separate items in Wikidata? Kaldari (talk) 04:58, 6 December 2018 (UTC)
@Kaldari: A different publication date and different publishers are enough for them to be considered different editions. Pagination and typesetting can be the same, but the author of the foreword, the illustrator, or the number of pages can be different. Just remember that an item can be used as a reference with a page number to source a statement, so everything should be the same, especially the page numbers, for an edition to be considered a reprint. This is not the case in your examples, so different items are required. Snipre (talk) 11:09, 6 December 2018 (UTC)
@Snipre: I believe it is the case in my examples. Judging by the linked scans at the Internet Archive, all three have the exact same pagination and content so they are considered reprints, right? What about the case of Q51499519 vs Q51499528. The only difference between these is that one is a 1915 reprint and the other is a 1916 reprint. Otherwise, they are identical and have the same publisher, pagination, and content as each other. Should we have separate items for both? Kaldari (talk) 17:35, 6 December 2018 (UTC)
Personally, my opinion is that we should not have separate items for books that differ only in publication date (but have the same publisher, content, pagination, etc.). Some books are reprinted nearly every year and having separate items for every year creates a needless maintenance burden and makes it difficult to figure out which item is appropriate to link to. Kaldari (talk) 17:43, 6 December 2018 (UTC)
On a tangent here: How does a book printed in 1912 have an ISBN? --EncycloPetey (talk) 03:43, 8 December 2018 (UTC)

Proposal

I propose the following guidelines for editions and reprints:
Each edition of a book should have a separate Wikidata item. If the content, pagination, or publisher changes, a new item should be created for that edition. If a book is an identical reprint of a previous edition by the same publisher (with the same content and pagination) it does not need a new item.
Kaldari (talk) 01:30, 10 December 2018 (UTC)

@Kaldari: Another formulation:
Each edition of a book should have a separate Wikidata item. If the content (foreword, afterword, illustration), pagination (page number), or publication data (publication date, publication place, publisher) changes, a new item should be created for that edition. If a book is an identical reprint of a previous edition (no change in the mentioned properties), it does not need a new item.
Snipre (talk) 10:26, 10 December 2018 (UTC)
@Snipre: I like your wording, except for one detail. I don't think we should include publication date in the second sentence. Although technically the publication date data would stay the same for an identical reprint anyway (since publication date (P577) specifies "date or point in time when a work was first published or released"), it may confuse people if we say "publication date" here since reprints by definition have new publication dates (but not new first publication dates). Kaldari (talk) 17:47, 10 December 2018 (UTC)
So, applying this to the question above, #How many edition items for On the Origin of Species ?, it seems that most of the entities mentioned in that discussion would end up with different items (creating quite a chaos!) Are there properties that could be used to group together the main different variants of the actual text (ie disregarding differences of pagination, publisher, etc), in order to bring some structure to the set? And/or that would allow one to group together the versions that are most similar to each other? Jheald (talk) 18:28, 10 December 2018 (UTC)
If there are no objections within the next couple days, I'll add a tweaked version of Snipre proposal to the WikiProject page so that we have some guidance on this. We can always refine it further if needed. Kaldari (talk) 19:16, 12 December 2018 (UTC)
OK to leave publication date out of the properties list above.
@Jheald: What is the problem with having hundreds of editions for On the Origin of Species? The only problem is to describe the editions clearly. And perhaps, as a first step, it is best to focus on the editions with the highest number of exemplars. Nobody requires us to have all editions of all books in WD. Snipre (talk) 20:08, 12 December 2018 (UTC)
It's useful to be able to identify that, although the book was published many times in many different forms, there were only quite a limited number of texts for the book -- so eg if somebody wants to look at how the text varied, they're not faced with a huge number of items with no guidance as to how which relates to which. Similarly, if somebody has an exemplar of a book (or a new scan set) it is useful to be able to identify which other of the published forms it is closest to. Jheald (talk) 16:34, 14 December 2018 (UTC)
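As a rough illustration of the guideline proposed in this section (Snipre's wording, with publication date left out as Kaldari suggests), the "new item or not" decision could be sketched like this. The Printing class and its field names are hypothetical, and the publisher names, place, and page count below are invented stand-ins loosely modelled on the 1912/1915/1916 example discussed above, not real data from those items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Printing:
    content: str            # hypothetical fingerprint of text, foreword, afterword, illustrations
    page_count: int         # stands in for pagination
    publisher: str
    publication_place: str
    year: int               # year of this printing; deliberately ignored by the rule


def needs_new_item(existing: Printing, candidate: Printing) -> bool:
    """True if the candidate differs in content, pagination, or publication data
    (place, publisher) and therefore should get its own edition item."""
    def key(p: Printing):
        return (p.content, p.page_count, p.publisher, p.publication_place)
    return key(existing) != key(candidate)


# Invented values for illustration only.
original_1912 = Printing("text-A", 300, "Publisher A", "US", 1912)
reprint_1915 = Printing("text-A", 300, "Publisher B", "US", 1915)
reprint_1916 = Printing("text-A", 300, "Publisher B", "US", 1916)
```

Under these assumptions the 1915 reprint needs its own item because the publisher changed, while the 1916 reprint does not, since it differs from the 1915 one only in the year of printing.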

Introduction to notability

Hi, I have never made an item about a written work, but I would like to do something for some local history books I have here in my room. I am giving them away, and I usually "wikimedianize" all this stuff before I get rid of it; for example, I use them as a source in some language editions. I feel it's time to also create items on Wikidata for them, but I'd like to know more about the notability guidelines before I start. I did a quick sampling, and it does not seem to me that we have a lot of detailed entries in my language (Italian). Even common books are probably missing.

So, for example, this book is used as a source on itwikipedia and it's one of the books I am giving away. Are those the sort of books whose items we aim to have sooner or later? Or do we just stick to creating items that have at least one clear ID? (Maybe some languages are more common in certain databases, so this could be biased; I'm not sure.)--Alexmar983 (talk) 18:58, 7 December 2018 (UTC)

You mean La fortezza di San Barnaba, Firenze 2001? You can create an item for this edition with the properties ISBN-10 (P957) and SWB editions (P1044). Every book that is cited in Wikipedia is notable. --Kolja21 (talk) 01:40, 8 December 2018 (UTC)
Thank you that's what I needed.--Alexmar983 (talk) 15:18, 8 December 2018 (UTC)

What are the best modelled items for your areas of interest?

Hi all

Over the past few months, others and I have been thinking about the best way to help people model subjects consistently on Wikidata and to give new contributors a simple way to understand how to model content on different subjects. Our first solution is to provide some best-practice examples of items for different subjects, which we are calling Model items. E.g., the item for William Shakespeare (Q692) is a good example to follow for creating items about playwright (Q214917). These model items are linked to from the item for the subject to make them easier to find, and we have tried to make the instructions simple to understand.

We would like subject matter experts to contribute their best examples of well modelled items. We are asking all the WikiProjects to share with us the kinds of subjects you most commonly add information about and the best examples you have of this kind of item. We would like to have at least 5 model items for each subject to show the diversity of the subject; e.g., just having William Shakespeare (Q692) as a model item for playwright (Q214917), while helpful, may not provide a good example for people trying to model modern poets from Asia.

You can add model items yourself by using the instructions at Wikidata:Model items. It may be helpful to have a discussion here to collate information first.

Thanks

John Cummings (talk) 15:29, 17 December 2018 (UTC)
