Wikidata:Property proposal/Thesaurus Linguae Aegyptiae object ID

Thesaurus Linguae Aegyptiae object ID


Originally proposed at Wikidata:Property proposal/Creative work

Description: identifier of a text artefact/object and of abstract texts in the Thesaurus Linguae Aegyptiae (a lemmatized corpus of Ancient Egyptian texts)
Data type: External identifier
Domain: written work (Q47461344), text (Q234460), inscribed object (Q111326612), manuscript (Q87167), ..., Egyptian (language branch) (Q34610803)
Allowed values: [a-zA-Z0-9\-_]+
Example 1: Edwin Smith Papyrus (Q842363) → Q7G6FQJU7VFUNE43F6CB75K64Y
Example 2: Westcar Papyrus (Q591301) → SEPNO7UILFCDPKLYRADUNRV3QA
Example 3: Rosetta Stone (Q48584) → FF4FQ4TTYBAE3L7VW22DCNJ3XI
Example 4: Tale of the Shipwrecked Sailor (Q1517484) → 2LZUFWY2WRDNHKXADEBXGL372Y
Example 5: The Maxims of Ptahhotep (Q963743) → PG6PVAQFHND67GCROZAIT3AA64
Example 6: Book of Caverns (Q853296) → MIO4AUI6CFAMBKH3NNEFU2O67A
Source: https://thesaurus-linguae-aegyptiae.de/info/text-corpus
Expected completeness: always incomplete (Q21873886)
Formatter URL: https://thesaurus-linguae-aegyptiae.de/object/$1
See also: Trismegistos text ID (P8532)
Applicable "stated in" value: Thesaurus Linguae Aegyptiae (Q122748326)
Distinct-values constraint: yes
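As a rough illustration of how the allowed-values pattern, the formatter URL and the distinct-values constraint above fit together, the following Python sketch validates an ID, expands it into a TLA object URL, and flags IDs stated on more than one item. It assumes the pattern must match the whole value; the helper names are hypothetical and not part of any Wikidata or TLA tooling.

```python
import re
from collections import defaultdict

# Allowed-values pattern from the proposal; re.fullmatch requires the whole ID to match.
ALLOWED_VALUES = re.compile(r"[a-zA-Z0-9\-_]+")
# Formatter URL from the proposal; "$1" is replaced by the identifier.
FORMATTER_URL = "https://thesaurus-linguae-aegyptiae.de/object/$1"


def tla_object_url(tla_id: str) -> str:
    """Validate a TLA object ID and expand it with the formatter URL (hypothetical helper)."""
    if not ALLOWED_VALUES.fullmatch(tla_id):
        raise ValueError(f"invalid TLA object ID: {tla_id!r}")
    return FORMATTER_URL.replace("$1", tla_id)


def distinct_value_violations(claims: dict[str, str]) -> dict[str, list[str]]:
    """Map each TLA ID used by more than one item to those items (hypothetical helper).

    `claims` maps a Wikidata QID to the TLA ID stated on that item.
    """
    items_by_id: dict[str, list[str]] = defaultdict(list)
    for qid, tla_id in claims.items():
        items_by_id[tla_id].append(qid)
    return {tla_id: qids for tla_id, qids in items_by_id.items() if len(qids) > 1}


# The six examples from the proposal.
EXAMPLES = {
    "Q842363": "Q7G6FQJU7VFUNE43F6CB75K64Y",  # Edwin Smith Papyrus
    "Q591301": "SEPNO7UILFCDPKLYRADUNRV3QA",  # Westcar Papyrus
    "Q48584": "FF4FQ4TTYBAE3L7VW22DCNJ3XI",  # Rosetta Stone
    "Q1517484": "2LZUFWY2WRDNHKXADEBXGL372Y",  # Tale of the Shipwrecked Sailor
    "Q963743": "PG6PVAQFHND67GCROZAIT3AA64",  # The Maxims of Ptahhotep
    "Q853296": "MIO4AUI6CFAMBKH3NNEFU2O67A",  # Book of Caverns
}

for qid, tla_id in EXAMPLES.items():
    print(qid, tla_object_url(tla_id))
print(distinct_value_violations(EXAMPLES))  # prints {}: no ID is shared
```

Using `fullmatch` rather than `match` mirrors the assumption that a value such as "ABC def" must be rejected outright instead of being accepted for its valid prefix.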

Motivation


To support Linked Open Data (LOD) in the realm of Egyptian texts, we would like to

  • import a subset of the TLA's Egyptian text objects into Wikidata, creating independent Wikidata items (so far, only a few Ancient Egyptian text artefacts are in Wikidata),
  • refer to Wikidata IDs in TLA text object entries,
  • link Wikidata entries of Ancient Egyptian text artefacts to the TLA via the requested property.

--Dwer (talk) 20:29, 16 November 2023 (UTC)

Discussion

Work ID example: Instructions of Kagemni (Q3069960) → U45UGJQ3WFEBRDS6TDCRBST66Y
Work ID example: The Maxims of Ptahhotep (Q963743) → PG6PVAQFHND67GCROZAIT3AA64
Object ID example: Prisse Papyrus (Q1632596) → NHXAUFJ7AZAYFKTYQUXLHWKXUE
--Ailintom (talk) 15:39, 21 November 2023 (UTC)
  Comment It is true that, from a conceptual perspective, (a) text artefacts (objects) and (b) abstract texts ('The Bible', 'The Teachings of Ptahhotep') should be distinguished.
However, the TLA uses the same data structure (BTSTCObject) and the same URL endpoint thesaurus-linguae-aegyptiae/object/$1 for both, differentiating between the two only via additional metadata.
I wonder whether it is allowed to propose two conceptually different properties that point to one and the same URL endpoint. Is this an option?
BTW: The TLA distinguishes text artifacts ('Papyrus Prisse'; BTSTCObject; URL endpoint thesaurus-linguae-aegyptiae/object/$1) and abstract texts ('The Teachings of Ptahhotep'; BTSTCObject; URL endpoint thesaurus-linguae-aegyptiae/object/$1), on the one hand, from concrete written texts ('the copy/version of The Teachings of Ptahhotep on Papyrus Prisse'; BTSText; URL endpoint thesaurus-linguae-aegyptiae/text/$1), on the other. Is the differentiation between a concrete written work/text and an abstract work/text still alien to Wikidata?
--Dwer (talk) 17:54, 21 November 2023 (UTC)
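The record classes and endpoints described in the comment above can be summarized in a small Python sketch. It assumes the host name from the proposal's formatter URL also serves the text/$1 endpoint, and the `kind` field is a hypothetical stand-in for the TLA's additional metadata. What it illustrates is that an artefact and an abstract text produce URLs of identical shape, so neither the URL nor the ID reveals which is which.

```python
from dataclasses import dataclass

# Assumption: both endpoints live on the host used in the proposal's formatter URL.
BASE_URL = "https://thesaurus-linguae-aegyptiae.de"
# Record class -> URL path segment, as described in the discussion.
ENDPOINT_BY_CLASS = {"BTSTCObject": "object", "BTSText": "text"}


@dataclass(frozen=True)
class TLARecord:
    tla_id: str
    record_class: str  # "BTSTCObject" or "BTSText"
    kind: str  # hypothetical metadata label, e.g. "artefact" or "abstract text"

    def url(self) -> str:
        """Build the landing-page URL from the record class alone."""
        return f"{BASE_URL}/{ENDPOINT_BY_CLASS[self.record_class]}/{self.tla_id}"


# IDs taken from the examples given in the discussion.
prisse = TLARecord("NHXAUFJ7AZAYFKTYQUXLHWKXUE", "BTSTCObject", "artefact")
ptahhotep = TLARecord("PG6PVAQFHND67GCROZAIT3AA64", "BTSTCObject", "abstract text")

# Both resolve to the same endpoint; only the metadata field tells them apart.
for record in (prisse, ptahhotep):
    print(record.kind, record.url())
```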
  Comment Notified participants of WikiProject Antiquity --Jahl de Vautban (talk) 21:17, 21 November 2023 (UTC)
      Comment Regarding the concerns raised by Ailintom and Dwer's reply: In my opinion, we don't need to create different properties for the TLA objects just because the TLA puts textual artifacts (papyri, inscriptions, etc.) in the same category as abstract concepts of texts (written work (Q47461344)). We must, however, make sure to keep the distinction between the two clear on Wikidata; it is far from alien to us. For comparison, we have Berlin Chronicle (Q21100459) for the abstract work and Egyptian Museum and Papyrus Collection, P 13296 (Q21100575) for the textual artifact on which it is transmitted. Two concepts, two items. This can easily be done with one and the same property, so there is no need to change the URL endpoint on the TLA's end.
    On another note, I find it curious that the TLA has opted to treat concrete objects and (intangible, abstract) texts in this way, but it need not concern us in the creation of this property. Trismegistos has chosen to label its entries as "texts" when they are clearly textual artifacts (tangible objects). This led to confusion on Wikidata, which I have since corrected.
    Oh, and BTW, I Support the creation of this property. Jonathan Groß (talk) 16:48, 22 November 2023 (UTC)
      Comment Jonathan Groß, in that case we would probably need a clear guideline for deciding which of the two different TLA text/object IDs to provide for the still very numerous Wikidata entries that describe both the artefact (manuscript) and the abstract work (literary work). The problem is that the TLA does not have a SPARQL interface; hence, when querying Wikidata with SPARQL, you will have no idea whether the provided TLA ID refers to an artefact or an abstract work. In many cases, this will render the property useless (or at least cumbersome to use, since one can of course scrape the TLA website with a Python script or the like) when you want to extract TLA IDs from Wikidata for reuse in other LOD projects. (And if one day the TLA provides a semantic interface to its data, I guess it will no longer use the same class for artefacts and works, and we will have to split the property anyway.) I think we are lucky that the TLA, unlike Trismegistos, does differentiate between abstract works, artefacts, and concrete written texts, and we should make use of this distinction to provide a more semantically transparent mapping. --Ailintom (talk) 07:38, 23 November 2023 (UTC)
    @Ailintom: As a guideline, I suggest disambiguating the Wikidata items that are currently conflations of artifact and work. See our discussion here. It would be a boon if you could list examples of these for us to fix. Best, Jonathan Groß (talk) 08:03, 23 November 2023 (UTC)
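Listing conflated items, as requested above, could start from a simple check on each item's instance of (P31) values. The Python sketch below is a hypothetical helper, not existing tooling: it treats written work (Q47461344) as a work class and manuscript (Q87167) and inscribed object (Q111326612) as artefact classes, all taken from the proposal's domain line, and flags items that carry both. A real check would also need to follow subclass of (P279) chains, for example with a SPARQL query on the Wikidata Query Service, which this sketch omits.

```python
# Class QIDs taken from the proposal's domain line; the grouping is an assumption.
WORK_CLASSES = {"Q47461344"}  # written work
ARTEFACT_CLASSES = {"Q87167", "Q111326612"}  # manuscript, inscribed object


def classify(instance_of: set[str]) -> str:
    """Classify an item by its direct P31 values only (no P279 traversal)."""
    is_work = bool(instance_of & WORK_CLASSES)
    is_artefact = bool(instance_of & ARTEFACT_CLASSES)
    if is_work and is_artefact:
        return "conflated"  # candidate for splitting into a work item and an artefact item
    if is_work:
        return "work"
    if is_artefact:
        return "artefact"
    return "unknown"


def conflated_items(items: dict[str, set[str]]) -> list[str]:
    """Return, sorted, the items whose P31 values mix work and artefact classes."""
    return sorted(key for key, classes in items.items() if classify(classes) == "conflated")


# Hypothetical input: the item keys are invented placeholders, not real Wikidata items.
sample = {
    "item-A": {"Q47461344"},
    "item-B": {"Q87167"},
    "item-C": {"Q47461344", "Q87167"},
}
print(conflated_items(sample))  # ['item-C']
```

Once such items are split, the work item and the artefact item can each carry the TLA ID that matches their own kind.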
      Comment The core business of the TLA is text artifacts and concrete texts (versions). These text artifacts, however, are organized in a corpus tree that also contains various kinds of captions, such as "Letters", but eventually also works, such as the Teachings of Ptahhotep. So there are excellent landing pages for works, under which you will also find the linked text artifacts and concrete texts, but the captions were not tailor-made for works. The actual question is therefore why text artifacts and captions were differentiated only at the metadata level and not at the data type level (which technically implies the URL endpoint). No one had foreseen these technical implications. It is probably wise, indeed, to restrict this proposal to text artifacts and to make a separate proposal for Thesaurus Linguae Aegyptiae work IDs (leading to the same URL endpoint), as suggested by @Ailintom. --Dwer (talk) 11:22, 25 November 2023 (UTC)
      Support I'm on board with that, too. Jonathan Groß (talk) 12:25, 25 November 2023 (UTC)[reply]