Wikidata talk:WikiProject Manuscripts/Data Model

Alignment with TEI P5-Guidelines


This is a draft for an alignment of Wikidata statements with the P5: Guidelines for Electronic Text Encoding and Interchange issued by the Text Encoding Initiative, Version 4.6.0 (last updated on 4th April 2023, revision f18deffba, link to source).

Details on Manuscript Description can be found in Chapter 10. The msdescription module has a variety of tags and parameters for this purpose.

10.3 Phrase-level Elements


10.3.1 Origination

<origDate>: (origin date) contains any form of date, used to identify the date of origin for a manuscript, manuscript part, or other object.

OK inception (P571): time when an entity begins to exist; for date of official opening use P1619

<origPlace>: (origin place) contains any form of place name, used to identify the place of origin for a manuscript, manuscript part, or other object.

OK location of creation (P1071): place where the item was conceived or made; where applicable, location of final assembly

10.3.2 Material and Object Type

<material> (material) contains a word or phrase describing the material of which the object being described is composed.

@function describes the function or use of the material in relation to the object as a whole. Sample values include: binding; endband; slipcase; support; tie

OK made from material (P186) with qualifier applies to part (P518)
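To make the proposed mapping concrete, here is a minimal Python sketch that turns TEI <material> elements into made from material (P186) statements with an applies to part (P518) qualifier. The sample XML, the function name and the plain-string values are assumptions for illustration only; real statements would point to Wikidata items rather than labels.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment; the content is invented for illustration.
SAMPLE = """
<p>
  <material function="support">parchment</material>
  <material function="binding">leather</material>
  <material>paper</material>
</p>
"""

def material_statements(xml_text):
    """Map each <material> to a P186 statement; @function becomes a P518 qualifier."""
    statements = []
    for el in ET.fromstring(xml_text.strip()).iter("material"):
        statement = {
            "property": "P186",
            "value": (el.text or "").strip(),
            "qualifiers": {},
        }
        function = el.get("function")
        if function:
            # Labels only: resolving them to Wikidata items is outside this sketch.
            statement["qualifiers"]["P518"] = function
        statements.append(statement)
    return statements

for statement in material_statements(SAMPLE):
    print(statement)
```

A <material> without @function simply yields a statement without a qualifier, which matches how P186 is used for the object as a whole.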

10.3.3 Watermarks and Stamps

<watermark> (watermark) contains a word or phrase describing a watermark or similar device.

<stamp> (stamp) contains a word or phrase describing a stamp or similar device.

missing

10.3.4 Dimensions


<dimensions> (dimensions) contains a dimensional specification.

@type indicates which aspect of the object is being measured. Sample values include: leaves; ruled; pricked; written; miniatures; binding; box

<height> (height) contains a measurement measured along the axis at a right angle to the bottom of the object.

OK height (P2048): vertical length of an entity

<width> (width) contains a measurement of an object along the axis parallel to its bottom, e.g. perpendicular to the spine of a book or codex.

OK width (P2049): width of an object

<depth> (depth) contains a measurement from the front to the back of an object, perpendicular to the measurement given by the width element.

OK thickness (P2610): extent from one surface to the opposite

<dim> contains any single measurement forming part of a dimensional specification of some sort.

missing
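The dimension alignments above could be applied roughly as follows. This is a sketch under assumptions: the sample measurements are invented, the dictionary and function names are hypothetical, and @type is only carried along as a plain value because no qualifier has been proposed for it yet.

```python
import xml.etree.ElementTree as ET

# Alignment proposed in this section; <dim> has no counterpart yet.
DIMENSION_PROPERTIES = {"height": "P2048", "width": "P2049", "depth": "P2610"}

# Hypothetical TEI fragment with invented measurements.
SAMPLE = (
    '<dimensions type="leaves" unit="mm">'
    "<height>250</height><width>180</width><depth>45</depth><dim>12</dim>"
    "</dimensions>"
)

def dimension_statements(xml_text):
    """Turn the children of <dimensions> into quantity statements."""
    root = ET.fromstring(xml_text)
    default_unit = root.get("unit")
    statements = []
    for child in root:
        prop = DIMENSION_PROPERTIES.get(child.tag)
        if prop is None:
            continue  # e.g. <dim>, still marked "missing" above
        statements.append({
            "property": prop,
            "amount": float(child.text),
            "unit": child.get("unit", default_unit),
            "measured_aspect": root.get("type"),  # from @type; no qualifier agreed yet
        })
    return statements

for statement in dimension_statements(SAMPLE):
    print(statement)
```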

10.3.5 References to Locations within a Manuscript


<locus> (locus) defines a location within a manuscript, manuscript part, or other object typically as a (possibly discontinuous) sequence of folio references.

@from (from) specifies the starting point of the location in a normalized form, typically a page number.

@to (to) specifies the end-point of the location in a normalized form, typically as a page number.

@scheme (scheme) identifies the foliation scheme in terms of which the location is being specified by pointing to some foliation element defining it, or to some other equivalent resource.

<locusGrp> (locus group) groups a number of locations which together form a distinct but discontinuous item within a manuscript, manuscript part, or other object.

@scheme (scheme) identifies the foliation scheme in terms of which all the locations contained by the group are specified by pointing to some foliation element defining it, or to some other equivalent resource.

Comment: Our current methods for modelling loci in Wikidata are a bit sketchy. For example, in Laurentianus Plutei 70.5 we list all the works (texts) transmitted in this manuscript (disregarding scholia) by using exemplar of (P1574) (see statements permalink) with a qualifier page(s) (P304) for the range. Instead of P304, section, verse, paragraph, or clause (P958) could be used as well. Both have the disadvantage that their datatype is string, so the values cannot easily be rendered as, e.g., links to the exact location on a scan of the manuscript. Jonathan Groß (talk) 18:09, 12 November 2023 (UTC)
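Because these qualifier values are plain strings, any consumer has to parse them itself. A minimal sketch of what that could look like, assuming folio references of the form "12r" or "27v" joined by a hyphen or en dash (the notation and all function names are assumptions, not an agreed convention):

```python
import re

FOLIO = re.compile(r"^(\d+)([rv])$")

def parse_folio(ref):
    """Parse a single reference such as '12r' into (12, 'r')."""
    match = FOLIO.match(ref.strip())
    if match is None:
        raise ValueError(f"not a folio reference: {ref!r}")
    return int(match.group(1)), match.group(2)

def parse_range(value):
    """Split 'from-to' on a hyphen or en dash; a single reference is its own range."""
    parts = re.split(r"\s*[-\u2013]\s*", value.strip())
    if len(parts) == 1:
        parts = parts * 2
    if len(parts) != 2:
        raise ValueError(f"not a folio range: {value!r}")
    return parse_folio(parts[0]), parse_folio(parts[1])

def image_index(folio):
    """Assumption: the scan has one image per page, with 1r at index 0."""
    number, side = folio
    return (number - 1) * 2 + (0 if side == "r" else 1)

start, end = parse_range("12r-27v")
print(start, end, image_index(start), image_index(end))
```

The image index only holds for scans without covers, flyleaves or missing leaves, which is exactly why a structured datatype would be preferable to string parsing.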

10.3.6 Names of Persons, Places, and Organizations

This section is about modelling persons, so it does not apply here.

<msIdentifier> (manuscript identifier) contains the information required to identify the manuscript or similar object being described.

<country> (country) contains the name of a geo-political unit, such as a nation, country, colony, or commonwealth, larger than or administratively superior to a region and smaller than a bloc.

<region> (region) contains the name of an administrative unit such as a state, province, or county, larger than a settlement, but smaller than a country.

<settlement> (settlement) contains the name of a settlement such as a city, town, or village identified as a single geo-political or administrative unit.

<institution> (institution) contains the name of an organization such as a university or library, with which a manuscript or other object is identified, generally its holding institution.

<repository> (repository) contains the name of a repository within which manuscripts or other objects are stored, possibly forming part of an institution.

<collection> (collection) contains the name of a collection of manuscripts or other objects, not necessarily located within a single repository.

<idno> (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.

<altIdentifier> (alternative identifier) contains an alternative or former structured identifier used for a manuscript or other object, such as a former catalogue number.

<msName> (alternative name) contains any form of unstructured alternative name used for a manuscript or other object, such as an ‘ocellus nominum’, or nickname.

10.5 The Manuscript Heading


<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc.

OK title (P1476)

10.6 Intellectual Content


<msContents> (manuscript contents) describes the intellectual content of a manuscript, manuscript part, or other object either as a series of paragraphs or as a series of structured manuscript items.

<msItem> (manuscript item) describes an individual work or item within the intellectual content of a manuscript, manuscript part, or other object.

<msItemStruct> (structured manuscript item) contains a structured description for an individual work or item within the intellectual content of a manuscript, manuscript part, or other object.

<summary> contains an overview of the available information concerning some aspect of an item or object (for example, its intellectual content, history, layout, typography etc.) as a complement or alternative to the more detailed information carried by more specific elements.

10.6.1 The msItem and msItemStruct Elements


<author> (author) in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.

<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work.

<title> (title) contains a title for any kind of work.

@type classifies the title according to some convenient typology. Sample values include: 1] main; 2] sub (subordinate); 3] alt (alternate); 4] short; 5] desc (descriptive)

<rubric> (rubric) contains the text of any rubric or heading attached to a particular manuscript item, that is, a string of words through which a manuscript or other object signals the beginning of a text division, often with an assertion as to its author and title, which is in some way set off from the text itself, typically in red ink, or by use of different size or type of script, or some other such visual device.

<incipit> contains the incipit of a manuscript or similar object item, that is the opening words of the text proper, exclusive of any rubric which might precede it, of sufficient length to identify the work uniquely; such incipits were, in former times, frequently used as a means of reference to a work, in place of a title.

Needs discussion: The item incipit (Q1161138) is linked through Wikidata property (P1687) to first line (P1922), so we could use that. Still, we need a way to model the locus (page(s) (P304) and section, verse, paragraph, or clause (P958) come to mind).

<quote> (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.

OK quotation or excerpt (P7081) seems to be the equivalent concept.

<explicit> (explicit) contains the explicit of an item, that is, the closing words of the text proper, exclusive of any rubric or colophon which might follow it.

Needs discussion: The item explicit (Q3062109) is linked through Wikidata property (P1687) to last line (P3132), so we could use that. Still, we need a way to model the locus (page(s) (P304) and section, verse, paragraph, or clause (P958) come to mind).

<finalRubric> (final rubric) contains the string of words that denotes the end of a text division, often with an assertion as to its author and title, usually set off from the text itself by red ink, by a different size or type of script, or by some other such visual device.

<colophon> (colophon) contains the colophon of an item: that is, a statement providing information regarding the date, place, agency, or reason for production of the manuscript or other object.

<decoNote> (note on decoration) contains a note describing either a decorative component of a manuscript or other object, or a fairly homogenous class of such components.

<listBibl> (citation list) contains a list of bibliographic citations of any kind.

<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.

<filiation> (filiation) contains information concerning the manuscript or other object's filiation, i.e. its relationship to other surviving manuscripts or other objects of the same text or contents, its protographs, antigraphs and apographs.

<note> (note) contains a note or annotation.

<textLang> (text language) describes the languages and writing systems identified within the bibliographic work being described, rather than its description.
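For tracking purposes, the alignments stated on this page so far can be collected in a small lookup table. This is only a sketch: the status labels and the helper name are assumptions, and TEI elements without an explicit status line on this page are deliberately left out rather than guessed.

```python
# TEI element -> (Wikidata property or None, status as noted on this page)
ALIGNMENT = {
    "origDate": ("P571", "ok"),
    "origPlace": ("P1071", "ok"),
    "material": ("P186", "ok"),           # with qualifier P518 for @function
    "stamp": (None, "missing"),
    "height": ("P2048", "ok"),
    "width": ("P2049", "ok"),
    "depth": ("P2610", "ok"),
    "dim": (None, "missing"),
    "head": ("P1476", "ok"),
    "incipit": ("P1922", "needs discussion"),
    "quote": ("P7081", "ok"),
    "explicit": ("P3132", "needs discussion"),
}

def by_status(status):
    """Return the TEI element names with the given status, sorted alphabetically."""
    return sorted(tag for tag, (_, s) in ALIGNMENT.items() if s == status)

print("missing:", by_status("missing"))
print("needs discussion:", by_status("needs discussion"))
```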

10.6.2 Authors and Titles

This section is about phrasing author names and titles, so it is not applicable here.

10.6.3 Rubrics, Incipits, Explicits, and Other Quotations from the Text


10.6.4 Filiation


10.6.5 Text Classification


10.6.6 Languages and Writing Systems


10.7 Physical Description


10.8 History


10.9 Additional Information


10.10 Manuscript Parts


10.11 Manuscript Fragments


The above is a work in progress. --Jonathan Groß (talk) 18:09, 12 November 2023 (UTC)

An overview of terminology


Gumbert (2004): the Codicological Unit

J. Peter Gumbert, "Codicological Units: Towards a Terminology for the Stratigraphy of the Non-Homogenous Codex", in: Segno e testo 2 (2004), 17–42.

Gumbert describes manuscripts mainly from the perspective of the creation and history of the physical object, while taking its intellectual contents into account. After a brief survey of previous attempts at creating a terminology and a discussion of a few cases of composite codices, Gumbert goes on to propose and explain his own terminology. What follows is an excerpt from p. 25, with bold highlighting instead of italics:

A codicological unit (≈ Munk Olsen's 'élément codicologique') is a discrete number of quires, worked in a single operation, containing a complete text or set of texts.

It is unarticulated,

either uniform if there are no boundaries in it (except possibly quire and text boundaries)

or homogenous if it is divided by boundaries (for instance of hands) into sections, but it is not divisible (because the boundaries do not coincide with quire boundaries);

it is articulated if it is divided by caesuras into blocks (= 'codicological unit' according to Mazurelle), which makes it divisible.

After this, Gumbert goes on to describe two classes of codices. Excerpt from p. 29:

A monomerous codex is a manuscript which contains a single codicological unit.

A composite codex is a manuscript which contains two or more codicological units.
These can be

independent (and they form a paratactic composite),

or dependent if they have been made to fit a pre-existent kernel [core] (and they form a hypotactic composite);

these can be

monogenetic (if they have been written by the same scribe)

or homogenetic (if they come from the same circle and time)

or allogenetic.

Note that the blocks within a codicological unit can also be described as monogenetic or homogenetic.
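As a first test of how these definitions might translate into a data model, here is a minimal Python sketch. The class and field names are hypothetical, and reducing Gumbert's criteria to three boolean flags is a simplifying assumption, not something he proposes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CodicologicalUnit:
    label: str
    has_internal_boundaries: bool = False   # e.g. a change of hands
    boundaries_match_quires: bool = False   # caesuras, which make the unit divisible
    dependent: bool = False                 # made to fit a pre-existent kernel

    def articulation(self) -> str:
        if not self.has_internal_boundaries:
            return "uniform"
        if self.boundaries_match_quires:
            return "articulated"
        return "homogenous"

@dataclass
class Codex:
    units: List[CodicologicalUnit] = field(default_factory=list)

    def kind(self) -> str:
        if not self.units:
            raise ValueError("a codex needs at least one codicological unit")
        if len(self.units) == 1:
            return "monomerous"
        if any(unit.dependent for unit in self.units):
            return "hypotactic composite"
        return "paratactic composite"

# Invented example: a kernel with a later unit made to fit it.
kernel = CodicologicalUnit("A", has_internal_boundaries=True)
supplement = CodicologicalUnit("B", dependent=True)
print(kernel.articulation(), Codex([kernel, supplement]).kind())
```

The mono-, homo- and allogenetic distinction is left out here because it describes a relation between units (or blocks) rather than a property of a single unit.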

In relation to volumes, Gumbert states that a codicological unit can be

part of a volume, or

identical to a volume, or

span across multiple volumes.

On pp. 30–33, Gumbert takes the developments of the codicological unit into account. A codicological unit, he says, ceases to be an undisturbed unit when it is altered in any way after its creation.

It may become smaller by parts being removed intentionally or lost unintentionally. What remains is a defective codicological unit (if substantial: torso, if small: fragment).

A codicological unit may also be broken up (codex discissus) "with the intention of allowing it[s removed part] a separate existence as a severed unit, apart from its trunk".

The codicological unit can also grow by additions.

If this happens without adding new leaves or quires to the unit, Gumbert speaks of an enriched codicological unit (A. I. Doyle had suggested "enhanced"). Examples include "the completion of decorations which had been only partially executed", addition of glosses (marginalia, scholia) or of guest-texts (Maniaci: "microtesti") on blank pages at the end of a unit (or even in the margins). Gumbert calls the boundary between these texts and those of the original production process a suture. These additions can be specified as being allogenetic, homogenetic or even monogenetic; but if they are difficult to discern, Gumbert suggests speaking of continuous enrichment.

In contrast to that, if new leaves or other material (like inserted miniatures on small strips of paper) are added to the codicological unit, Gumbert calls it an enlarged unit. The added pieces he calls infixes, and the corresponding boundaries joins (sic, not joints).

Even if part of the unit is replaced with something new, the result may still be called an enlarged unit. Gumbert relates a case for mono- or homogenetic replacement in a Cicero manuscript (BPL 127 B) where the scribe (Sozomenos, early 15th century) had later found another manuscript with a more complete text and used that to replace the text he himself had written a few years earlier. – For cases of much later (= allogenetic) replacement, Gumbert prefers to speak of repair and suggests regarding them as dependent units (see above) in what is now a hypotactic composite codex.

If the additions become even more substantial (scribe B starts a guest text on a blank page and then adds a full quire or more), Gumbert calls the result an extended codicological unit (≈ Kwakkel's "extended production unit"). The accretions (new content on old material + those on new, added material) can be allo-, homo- or monogenetic.

Summary of this step: A codicological unit can

remain undisturbed;

or become smaller
by loss: defective unit or fragment,
by removal of a severed unit (remains: a trunk);

or grow
by addition of a new layer: enriched unit,
by addition of a guest text: enriched unit,
by addition of an infix: enlarged unit,
by addition of an accretion: extended unit.

With this in mind, Gumbert updates his definition of the codicological unit thus: a codicological unit is

a discrete number of quires,
worked in a single operation
– unless it is an enriched, enlarged or extended unit,
containing a complete text or set of texts
– unless it is an unfinished, defective or dependent unit.
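The three kinds of growth can be told apart by two questions: was new content written on the old material, and was new material added? The following sketch encodes that reading. It is my interpretation of the summary above, not Gumbert's own formulation, and it ignores replacements (which he also counts as enlargement) as well as any shrinking of the unit.

```python
def classify_growth(new_content_on_old_material: bool, new_material_added: bool) -> str:
    """Classify a codicological unit after additions (simplified reading of Gumbert)."""
    if not new_content_on_old_material and not new_material_added:
        return "undisturbed"
    if not new_material_added:
        return "enriched"    # new layer or guest text, no new leaves
    if new_content_on_old_material:
        return "extended"    # accretion: starts on old material, continues on new
    return "enlarged"        # infix: added leaves or other material only

for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, classify_growth(*flags))
```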

For combinations of codicological units, Gumbert (p. 34) proposes the term file (= Kwakkel: "usage unit"; A. I. Doyle: "string"; Maniaci: "assemblagio") for a number of codicological units (1 or greater) that have, at some point, been used together in one binding.

Thus far Gumbert. Let's see how much of his terminology we can apply here. Jonathan Groß (talk) 19:52, 20 November 2023 (UTC)

As a postscript for any future readers: Gumbert's article is not always easy to find (libraries seem unsure whether to treat Segno e testo as a journal or a series), but it is available online: https://www.csmc.uni-hamburg.de/files/2004-gumbert-codicological-units.pdf CRolker (talk) 09:55, 30 May 2024 (UTC)

On adaptations of Gumbert's model, see Patrick Andrist, "New Tools and Database Models for the Study of the Architecture of Complex Manuscripts", in: Ancient Manuscripts and Virtual Research Environments. Classics@ 18.1 (2021), link to full text

I have created codicological unit (Q123476271) as it was missing; it can surely be refined, especially regarding its properties. I have added labels in some European languages. --Epìdosis 14:37, 18 November 2023 (UTC)

Thank you, and I just created dismembered codex (Q123485401). Jonathan Groß (talk) 19:22, 19 November 2023 (UTC)

Number of folia


At the moment, Property:P1104 "number of pages" is described as follows in the table: "Specifies the number of folia of a manuscript." This is very odd, as the number of pages will always be different from the number of folia. For this reason perhaps, "foliation" is listed among the missing properties at the foot of the very same table. The optional "unit" for P1104 is, in my view, not helpful. Rather, a separate property "number of folia" would seem the best solution. What do you think? atb CRolker (talk) 12:59, 15 May 2024 (UTC)

I agree that the practice of using number of pages (P1104) is not only not ideal, but wrong (strictly speaking) for manuscript codices. I faintly remember a discussion on Wikidata:Property proposals which was against creating additional properties when number of pages (P1104) with a qualifier and / or a specific unit can be used instead, but I cannot seem to find a link to it.

Maybe we can argue from the duality of folio(s) (P7416) and page(s) (P304) that it would make sense to have a new property dedicated to stating the number of folios. In any case, I am all for this new property. Jonathan Groß (talk) 15:20, 27 May 2024 (UTC)

Thanks for the feedback. In a sense, P1104 works as long as the unit is provided; but then again, the very name "number of pages" suggests that the unit is less important if the object has pages, and it is plain odd when used for folios. Renaming the property would be a possible solution, but only if we had a proper term referring to "pages and folios". So, a separate property would indeed be better. atb CRolker (talk) 21:56, 27 May 2024 (UTC)

Renaming (repurposing) properties is generally not a good idea; it is better to create a new property in this instance. Jonathan Groß (talk) 10:12, 28 May 2024 (UTC)
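To show why a P1104 value is ambiguous without its unit: each leaf has a recto and a verso, so the same amount means twice as many pages when it counts folia. A small sketch (the unit labels and the function name are assumptions, not existing Wikidata units):

```python
# Pages represented by one counted unit; a folio has a recto and a verso.
PAGES_PER_UNIT = {"page": 1, "folio": 2}

def as_pages(amount: int, unit: str) -> int:
    """Normalise a 'number of pages' value to pages, given the unit it was counted in."""
    if amount < 0:
        raise ValueError("amount must not be negative")
    return amount * PAGES_PER_UNIT[unit]

# The same amount, read with two different units:
print(as_pages(120, "page"), as_pages(120, "folio"))
```

A dedicated "number of folia" property would make this conversion unnecessary, because the unit would be implied by the property itself.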

inventory number


May I ask another question, again as a newbie with little practical Wikidata experience (though some experience with medieval mss)? Of course it makes sense that mss are described by inventory number (P217) and collection (P195); it is very hard to think of any properties which are more important. But does it really make sense to use "inventory number" and "collection" independently? In my view, "inventory number" cannot be a property on its own, i.e. without being qualified with the collection in question. If, however, you use "inventory number" only together with "collection" (as we do in our showcase item), the property "collection" becomes completely redundant. Nothing wrong with some mild redundancy (e.g. inventory numbers often reappear in aliases), but as inventory numbers only make sense in the context of a collection, I think inventory number and collection should not exist as two separate properties. Best wishes, CRolker (talk) 22:09, 27 May 2024 (UTC)

Thanks CRolker, this is a very important point you're raising. Our modelling of "inventory numbers" (call numbers, shelf numbers...) is really less than ideal, because it is such a semantic and context-dependent type of data. Finding a consistent, stable and straightforward way of modelling inventory numbers should be a priority of any data model like ours.

Pinakes made an excellent choice when they decided to organise their manuscript items hierarchically (country > city > institution > fonds > shelf number), but Wikidata unfortunately does not feature hierarchical integration of properties, at least as far as I know. @Epìdosis: sorry to bother you, but I believe that your input would be invaluable here. Jonathan Groß (talk) 10:18, 28 May 2024 (UTC)

A minor issue with shelf marks (and not a problem for us at the moment): at any given time, any ms should have one and only one shelf mark. However, we should not assume that this is always the case. Apart from mss sine numero, it may not always be clear which shelf marks were introduced when. I am thinking of a collection which was catalogued and given new shelf marks in the 19th century; the new shelf marks were duly used, and no one ever cited the catalogue numbers. Yet the next catalogue (20th c.), using these shelf marks, also had catalogue numbers which were not meant to be shelf marks, but tended to be cited alongside them; after a while, they were regarded as (new) shelf marks, and when the library produced another catalogue (21st c.), they used them too. So in the century between the publication of the second and third catalogues, the mss received new shelf marks, but it would be very difficult to pin down a date for this change, and it is perhaps best to say that during this time they had two shelf marks. Given how long it takes for new shelf marks to become generally used, this is perhaps not so unusual. atb CRolker (talk) 10:05, 30 May 2024 (UTC)

And some manuscripts have old shelfmarks, but the cataloguers at the current holding library don't know (yet) to which collection those shelfmarks belonged. HHill (talk) 12:18, 2 June 2024 (UTC)
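The cases raised in this thread (a shelf mark tied to its collection, two shelf marks valid at the same time, and an old shelf mark of unknown provenance) could be sketched like this. All names, years and shelf marks are invented for illustration, and the structure is only one possible reading, not a proposal for concrete properties.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ShelfMark:
    value: str
    collection: Optional[str] = None   # None: collection not (yet) identified
    start_year: Optional[int] = None   # None: start unknown
    end_year: Optional[int] = None     # None: still in use, or end unknown

    def in_use(self, year: int) -> bool:
        after_start = self.start_year is None or self.start_year <= year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end

def shelf_marks_in(year: int, marks: List[ShelfMark]) -> List[str]:
    """Return every shelf mark valid in the given year; there may be more than one."""
    return [mark.value for mark in marks if mark.in_use(year)]

# Invented data: overlapping validity and one mark without a known collection.
marks = [
    ShelfMark("Cod. A 12", "Example Library", 1850, 2010),
    ShelfMark("Ms. 7", "Example Library", 1950, None),
    ShelfMark("XIV", None, None, 1849),
]
print(shelf_marks_in(1980, marks))
print(shelf_marks_in(1800, marks))
```

Keeping the collection on the shelf mark itself, rather than as an independent statement, reflects the point made above that an inventory number only makes sense within its collection.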
