Wikidata:Property proposal/Thesaurus Linguae Aegyptiae object ID

Thesaurus Linguae Aegyptiae object ID

Originally proposed at Wikidata:Property proposal/Creative work

Done: Thesaurus Linguae Aegyptiae object ID (P12185)

Description	identifier of a text artefact/object and of abstract texts in the Thesaurus Linguae Aegyptiae (a lemmatized corpus of Ancient Egyptian texts)
Data type	External identifier
Domain	written work (Q47461344), text (Q234460), inscribed object (Q111326612), manuscript (Q87167), ..., Egyptian (language branch) (Q34610803)
Allowed values	[a-zA-Z0-9\-_]+
Example 1	Edwin Smith Papyrus (Q842363)→Q7G6FQJU7VFUNE43F6CB75K64Y
Example 2	Westcar Papyrus (Q591301)→SEPNO7UILFCDPKLYRADUNRV3QA
Example 3	Rosetta Stone (Q48584)→FF4FQ4TTYBAE3L7VW22DCNJ3XI
Example 4	Tale of the Shipwrecked Sailor (Q1517484)→2LZUFWY2WRDNHKXADEBXGL372Y
Example 5	The Maxims of Ptahhotep (Q963743)→PG6PVAQFHND67GCROZAIT3AA64
Example 6	Book of Caverns (Q853296)→MIO4AUI6CFAMBKH3NNEFU2O67A
Source	https://thesaurus-linguae-aegyptiae.de/info/text-corpus
Expected completeness	always incomplete (Q21873886)
Formatter URL	https://thesaurus-linguae-aegyptiae.de/object/$1
See also	Trismegistos text ID (P8532)
Applicable "stated in"-value	Thesaurus Linguae Aegyptiae (Q122748326)
Distinct-values constraint	yes
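The Allowed values pattern and the Formatter URL above work together: a value is accepted only if the whole string matches the pattern, and its TLA landing page is obtained by substituting the value for $1. Below is a minimal Python sketch of both steps. It assumes that, as with Wikidata format constraints, the pattern must match the entire value. The helper names are hypothetical, and this is not Wikidata's own constraint-checking code.

```python
import re
from urllib.parse import quote

# Allowed-values pattern from the proposal. fullmatch() anchors it to the whole
# string (assumption: mirrors how a Wikidata format constraint is applied).
TLA_ID_PATTERN = re.compile(r"[a-zA-Z0-9\-_]+")

# Formatter URL from the proposal; "$1" is the placeholder for the identifier.
FORMATTER_URL = "https://thesaurus-linguae-aegyptiae.de/object/$1"


def is_valid_tla_id(value: str) -> bool:
    """Return True if the entire value matches the allowed-values pattern."""
    return TLA_ID_PATTERN.fullmatch(value) is not None


def tla_object_url(value: str) -> str:
    """Build the TLA landing-page URL for an identifier (hypothetical helper)."""
    if not is_valid_tla_id(value):
        raise ValueError(f"not a valid TLA object ID: {value!r}")
    # quote() leaves letters, digits, '-' and '_' untouched, so valid IDs pass
    # through unchanged; it only guards against unexpected characters.
    return FORMATTER_URL.replace("$1", quote(value, safe=""))


if __name__ == "__main__":
    # Example 3 from the proposal: Rosetta Stone (Q48584)
    print(tla_object_url("FF4FQ4TTYBAE3L7VW22DCNJ3XI"))
```

Running it prints the URL for Example 3, https://thesaurus-linguae-aegyptiae.de/object/FF4FQ4TTYBAE3L7VW22DCNJ3XI. Values with spaces or other characters outside the pattern are rejected before any URL is built.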

Motivation

To support LOD in the realm of Egyptian texts, we would like to

import a subset of the TLA's Egyptian text objects into Wikidata, creating independent Wikidata IDs (so far, only a few Ancient Egyptian text artefacts are in Wikidata),
refer to Wikidata IDs in TLA text object entries, and
link Wikidata entries of Ancient Egyptian text artefacts to the TLA via the requested property.

--Dwer (talk) 20:29, 16 November 2023 (UTC)

Discussion

Support Sphekaleon (talk) 07:57, 23 November 2023 (UTC)
Support This would be useful for using TLA objects as linked open data --Somiyagawa (talk) 13:49, 22 November 2023 (UTC)
Support I think it is a useful addition to be able to link hieroglyphic texts to external repositories. Situxx (talk) 14:46, 19 November 2023 (UTC)
Support Maculosae tegmine lyncis (talk) 10:43, 20 November 2023 (UTC)
Support Tobias Paul (talk) 10:08, 21 November 2023 (UTC)
Comment I would rather propose separate properties: Thesaurus Linguae Aegyptiae work ID for written works (TLA objects with the data type "Sammelüberschrift", i.e. collective heading) and Thesaurus Linguae Aegyptiae object ID for objects (TLA objects with the data type "Objekt"). Thesaurus Linguae Aegyptiae work ID would have the subject type constraint written work (Q47461344), text (Q234460); Thesaurus Linguae Aegyptiae object ID would have the subject type constraint artificial object (Q16686448). Otherwise things get mixed up. For example, Westcar Papyrus (Q591301) is a Wikidata entry for both the papyrus and the tales, but the TLA has two separate IDs: SEPNO7UILFCDPKLYRADUNRV3QA for the papyrus and TI5T3F4AMZHSXK5NXBTT2Y6N4Q for the tales. Hence, the Wikidata entry cannot be unambiguously linked to a TLA object using a combined work/object property as proposed above.

Work ID example: Instructions of Kagemni (Q3069960)→U45UGJQ3WFEBRDS6TDCRBST66Y

Work ID example: The Maxims of Ptahhotep (Q963743)→PG6PVAQFHND67GCROZAIT3AA64

Object ID example: Prisse Papyrus (Q1632596)→NHXAUFJ7AZAYFKTYQUXLHWKXUE

--Ailintom (talk) 15:39, 21 November 2023 (UTC)

Comment It is true that, from a conceptual perspective, (a) text artefacts (objects) and (b) abstract texts ('The Bible', 'The Teachings of Ptahhotep') should be differentiated.
However, the TLA uses the same data structure (BTSTCObject) and the same URL endpoint thesaurus-linguae-aegyptiae/object/$1 for both, differentiating between the two only via additional metadata.
I wonder whether it is allowed to propose two conceptually different properties that point to one and the same URL endpoint. Is this an option?
BTW: The TLA differentiates between text artifacts ('Papyrus Prisse'; BTSTCObject; URL endpoint thesaurus-linguae-aegyptiae/object/$1) and abstract texts ('The Teachings of Ptahhotep'; BTSTCObject; URL endpoint thesaurus-linguae-aegyptiae/object/$1) on the one hand, and concrete written texts ('the copy/version of The Teachings of Ptahhotep on Papyrus Prisse'; BTSText; URL endpoint thesaurus-linguae-aegyptiae/text/$1) on the other. Is the differentiation between a concrete written work/text and an abstract work/text perhaps still alien to Wikidata?
--Dwer (talk) 17:54, 21 November 2023 (UTC)

  Comment
Jahl de Vautban
Tolanor
JASHough
Jonathan Groß
Ahc84
Carbidfischer
Epìdosis
JBradyK
Joan Gené
DerMaxdorfer
Falten-Jura
DerHexer
Alexmar983
Demadrend
Liber008
Rybesh
ELexikon
Digitalphilologist
paregorios

Notified participants of WikiProject Antiquity --Jahl de Vautban (talk) 21:17, 21 November 2023 (UTC)
  Comment Regarding the concerns raised by Ailintom and Dwer's reply: In my opinion, we don't need to create different properties for the TLA objects just because the TLA puts textual artifacts (papyri, inscriptions etc.) in the same category as abstract concepts of texts (written work (Q47461344)). We must, however, make sure to keep the distinction between the two clear on Wikidata; it is far from alien to us. For comparison, we have Berlin Chronicle (Q21100459) for the abstract work and Egyptian Museum and Papyrus Collection, P 13296 (Q21100575) for the textual artifact on which it is transmitted. Two concepts, two items. This can be done easily with one and the same property, so there is no need to change the URL endpoint on the TLA's end.
On another note, I find it curious that the TLA has opted to treat concrete objects and (intangible, abstract) texts in this way, but it is no cause for concern in the creation of this property. Trismegistos has chosen to label its entries as "texts" when they are clearly textual artifacts (tangible objects). This had led to confusion on Wikidata, which I have since corrected.
Oh, and BTW, I Support the creation of this property. Jonathan Groß (talk) 16:48, 22 November 2023 (UTC)
  Comment Jonathan Groß, in that case we would probably need a clear guideline for deciding which of the two different TLA text/object IDs to provide for the still very numerous Wikidata entries that describe both the artefact (manuscript) and the abstract work (literary work). The problem is that the TLA does not have a SPARQL interface; hence, when querying Wikidata with SPARQL, you will have no idea whether a given TLA ID refers to an artefact or an abstract work. In many cases, this will render the property useless when you want to extract TLA IDs from Wikidata for reuse in other LOD projects, or at least cumbersome to use, since one can of course scrape the TLA website with a Python script or the like. (And if one day the TLA provides a semantic interface to its data, I guess it will no longer use the same class for artefacts and works, and we will have to split the property anyway.) I think we are lucky that the TLA, unlike Trismegistos, does differentiate between abstract works, artefacts, and concrete written texts, and we should make use of this distinction to provide a more semantically transparent mapping. --Ailintom (talk) 07:38, 23 November 2023 (UTC)
@Ailintom: As a guideline, I suggest disambiguating the Wikidata items that currently conflate artifact and work. See our discussion here. It would be a boon if you could list examples of these for us to fix. Best, Jonathan Groß (talk) 08:03, 23 November 2023 (UTC)

  Comment The core business of the TLA is text artifacts and concrete texts (versions). These text artifacts, however, are organized in a corpus tree which also contains various kinds of captions such as "Letters", and in some cases also works such as the Teachings of Ptahhotep. So there are excellent landing pages for works, under which you will also find the linked text artifacts and concrete texts, but the caption was not tailor-made for works. The actual question is therefore why text artifacts and captions were differentiated only on the metadata level and not on the data type level (which technically determines the URL endpoint). No one had foreseen these technical implications. It is probably indeed wise to restrict this proposal to text artifacts and to make a separate proposal for Thesaurus Linguae Aegyptiae work IDs (leading to the same URL endpoint), as suggested by @Ailintom:. --Dwer (talk) 11:22, 25 November 2023 (UTC)
  Support I'm on board with that, too. Jonathan Groß (talk) 12:25, 25 November 2023 (UTC)

@Dwer, Ailintom, Sphekaleon, Somiyagawa, Situxx, Maculosae tegmine lyncis: @Tobias Paul, Jahl de Vautban: Done: I've created two properties: Thesaurus Linguae Aegyptiae object ID (P12185) for objects (textual artifacts) and Thesaurus Linguae Aegyptiae textual work ID (P12186) for works (intellectual contents, texts). Both use the same formatter URL but have slightly different property constraints. This should facilitate data evaluation. BTW, if you want to import a lot of data from the TLA into Wikidata, I suggest seeking help and guidance at the WikiProject Antiquity. Jonathan Groß (talk) 21:16, 28 November 2023 (UTC)
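With the two properties in place, a SPARQL client no longer has to guess whether a TLA ID refers to an artefact or a work: the property used says so, which addresses the concern about querying Wikidata raised earlier in the discussion. Below is a minimal Python sketch. It assumes the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql) and its standard SPARQL JSON results format. The function names and the User-Agent string are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# P12185 = TLA object ID (textual artifacts), P12186 = TLA textual work ID (works).
PROPERTIES = {"P12185": "object", "P12186": "work"}


def build_query(limit: int = 100) -> str:
    """Return a SPARQL query listing items with either TLA property, plus which one."""
    values = " ".join(f"wdt:{pid}" for pid in PROPERTIES)
    return (
        "SELECT ?item ?prop ?tlaId WHERE {\n"
        f"  VALUES ?prop {{ {values} }}\n"
        "  ?item ?prop ?tlaId .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def classify(results: dict) -> list[tuple[str, str, str]]:
    """Turn SPARQL JSON results into (QID, 'object' or 'work', TLA ID) tuples."""
    rows = []
    for binding in results["results"]["bindings"]:
        # URIs end in the QID / property ID, e.g. .../entity/Q123, .../prop/direct/P12185
        qid = binding["item"]["value"].rsplit("/", 1)[-1]
        pid = binding["prop"]["value"].rsplit("/", 1)[-1]
        rows.append((qid, PROPERTIES[pid], binding["tlaId"]["value"]))
    return rows


def fetch(limit: int = 100) -> list[tuple[str, str, str]]:
    """Run the query against WDQS (requires network access)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": build_query(limit)})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder User-Agent; WDQS expects clients to identify themselves.
            "User-Agent": "tla-wikidata-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return classify(json.load(response))
```

Calling fetch() needs network access, while build_query() and classify() can be checked offline. The VALUES clause keeps artefacts and works in one result set while still recording which property each row came from.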