Wikidata:Property proposal/from language & to language

from language

Originally proposed at Wikidata:Property proposal/Generic

Not done

Description	qualifier stating that a value pertains to the translation from a particular language
Data type	Item
Example 1	Gregory Rabassa (Q2342303) → occupation (P106) → translator (Q333634); from language: Spanish (Q1321), Portuguese (Q5146); to language: English (Q1860)
Example 2	GNU C Compiler (Q105514255) → has use (P366) → compilation (Q12769326); from language: C (Q15777); to language: machine code (Q55813)
Example 3	TypeScript (Q978185) → has goal (P3712) → compilation (Q12769326); to language: JavaScript (Q2005)
Example 4	javac (Q306461) → has use (P366) → compilation (Q12769326); from language: Java (Q251); to language: Java bytecode (Q137496)
Example 5	KPHP (Q15912350) → has use (P366) → compilation (Q12769326); from language: PHP (Q59); to language: C++ (Q2407)
Example 6	DeepL Translator (Q43968444) → has use (P366) → translation (Q7553); from language: Bulgarian (Q7918), Simplified Chinese (Q13414913), etc.; to language: Bulgarian (Q7918), Simplified Chinese (Q13414913), etc.
See also	readable file format (P1072), writable file format (P1073)

to language

Originally proposed at Wikidata:Property proposal/Generic

Not done

Description	qualifier stating that a value pertains to the translation to a particular language
Data type	Item
Example 1	see #from language
Example 2	see #from language
Example 3	see #from language

See #from language for the motivation and discussion.

Motivation

I propose the introduction of two new qualifiers "from language" and "to language" to qualify the source and target languages of translators (be they humans or software).

--Push-f (talk) 12:04, 5 November 2022 (UTC)

Discussion

Arlo Barnes (talk) 23:30, 15 November 2020 (UTC) Pamputt Germartin1 (talk) 11:01, 29 December 2021 (UTC) Prefuture (talk) عُثمان (talk) 15:04, 8 October 2024 (UTC)
Notified participants of WikiProject Languages. WikiProject Informatics has more than 50 participants and couldn't be pinged; please post on the WikiProject's talk page instead. --Push-f (talk) 13:12, 5 November 2022 (UTC)
I would suggest broadening the application of these qualifiers from "languages" to just about any kind of object that is either the source or the target of a transformation process, and therefore use the labels "source object"/"target object" (or rather "source form"/"target form"), respectively.
This would allow the same qualifiers to be used with many different transformations, not just translation and compilation, but also adaptation, transliteration, or physical processing:
- custom software (Q339628) → has use (P366) → data conversion (Q1783551); source form: ISO/IEC 8859-1 (Q935289); target form: UTF-8 (Q193537)
- typist (Q58487031) → field of work (P101) → transliteration (Q134550); source form: Cyrillic script (Q8209); target form: Mongolian (Q1055705)
- screenwriter (Q28389) → occupation (P106) → adaptation (Q1213562); source form: novel (Q8261); target form: television series (Q5398426)
- graphic artist (Q1925963) → field of work (P101) → technical drawing (Q192521); source form: sketch (Q5078274); target form: construction drawing (Q77419822)
- spinning wheel (Q58966) → has use (P366) → hand spinning (Q4140198); source form: cotton (Q11457); target form: yarn (Q49007)
In addition, translators could also be described as specializing in certain types of source documents, such as contract (Q93288) or advertising (Q37038), and not only in languages. In those cases, only the "source form" qualifier should be used, as the target form is typically the same.

As there are already more specialized properties handling some of the above cases, they would be defined as sub-properties of either "source form" or "target form", to be preferred in case of semantic overlap. However, some contexts may be too specialized or insignificant to warrant the creation of two additional qualifiers, in which case "source form" and/or "target form" may be convenient to use. SM5POR (talk) 17:16, 5 November 2022 (UTC)
Comment I support the qualifiers "from/to language or script". For compilers and document converters such as Pandoc (Q2049294), I would use readable file format (P1072) and writable file format (P1073) instead of qualifiers. --Dexxor (talk) 18:02, 5 November 2022 (UTC)
I agree that readable file format (P1072) and writable file format (P1073) fit well for document converters; however, I do not think they fit compilers, because compilers translate languages rather than convert file formats. For example, machine code (Q55813) is a language, not a file format. If we decide to broaden the scope of this proposal, as suggested by SM5POR above (which does sound reasonable to me), I think it would also make sense to use the qualifiers for document converters, because X → has use (P366) → document conversion (Q5287638); source form: A; target form: B is certainly more descriptive than X → readable file format (P1072) → A together with X → writable file format (P1073) → B. The former can be used to answer the question "What can X be used for?", while the latter would require Wikidata consumers to know about those specific properties to answer that question. Furthermore, readable file format (P1072) and writable file format (P1073) are not linked to document conversion (Q5287638) in any way. While they could probably be linked, it would not be as direct as in X → has use (P366) → document conversion (Q5287638); source form: A; target form: B, and I think directly linking the relevant concepts is a good thing. --Push-f (talk) 18:30, 5 November 2022 (UTC)
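The consumer-side difference described in this comment can be illustrated with a minimal Python sketch. It is not how Wikidata stores or queries data: the `Statement` class, the `uses` helper, and the placeholder values A and B are assumptions made for illustration, and "source form"/"target form" are the labels proposed in this thread, which have no property IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Simplified stand-in for a statement: property, value, qualifiers."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


# Model 1: one "has use" (P366) statement carrying the proposed qualifiers.
qualified = [
    Statement("P366", "Q5287638", {"source form": ["A"], "target form": ["B"]}),
]

# Model 2: two independent statements using P1072 and P1073.
separate = [
    Statement("P1072", "A"),
    Statement("P1073", "B"),
]


def uses(statements):
    """Answer 'what can X be used for?' by reading only has use (P366)."""
    return [(s.value, s.qualifiers) for s in statements if s.prop == "P366"]


print(uses(qualified))  # the use and both forms come back together
print(uses(separate))   # empty: the consumer must know P1072/P1073 exist
```

Under these assumptions, a consumer that only knows has use (P366) finds the conversion and its source and target in the first model, and finds nothing in the second.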
A text file has one format, an image file has another format, and an audio file has a third format. Likewise you have different programming languages, such as Lisp, C and Python. You can use the text file format to represent the source code of a program in almost any programming language.

But can you express, say, C source code in any file format, be it text, image or audio? I'd say the answer is no. Well, you can certainly create an image showing a few lines of Lisp, or record yourself reciting (maybe even singing) the entire source code to a Python module in an audio file, as a proof-of-concept, but I doubt you will find an interpreter or a compiler that accepts either as syntactically valid input.

When the value of one property dictates the value of another, those properties aren't mutually independent, but the former is in effect a subtype of the latter. Thereby you will never need to specify both properties simultaneously. Either the programming language is unspecified (because it's irrelevant), or the file format is given by the language (and therefore redundant, unnecessary to specify).

You are right that we don't call these properties the same thing, but that's the only distinction, as Wikidata doesn't care about the labels used, it merely stores and displays them for our convenience.

And in order to simplify the task of composing SPARQL queries, we should try to minimize the number of properties needed, not increase it, and especially not define properties with identical semantics but reserved for different item categories. The trick is to find labels that describe what the properties mean, not to define properties that satisfy our linguistic conventions.
It's perfectly ok to label a property "language or data format encoded" if that explains what the property is about. SM5POR (talk) 23:47, 7 November 2022 (UTC)
Right, I agree that we shouldn't have separate properties with the same semantics. Yes, we could of course also broaden the scope of readable file format (P1072) and writable file format (P1073) to include computer languages; however, that would certainly not be as flexible as the proposed qualifier properties. The nice thing about modelling this via qualifiers is that they can be used together with all sorts of properties like occupation (P106), has use (P366), or has goal (P3712), whereas readable file format (P1072) and writable file format (P1073) always carry only the semantics of has use (P366). --Push-f (talk) 00:42, 8 November 2022 (UTC)
I don't actually suggest broadening the scope of readable file format (P1072) or writable file format (P1073); I merely objected to the notion that file format and programming language are inherently different characteristics that can't be combined in the same property.

However, as I mentioned in the thread Wikidata:Property proposal/subfunction, your mention of how Wikidata Query GUI (Q114902143) overloads has use (P366) made me look for a more appropriate property, and I found output device (P5196) which has a pretty generic label, yet is mainly used with VR interaction gear.

What do you think of that? As I mention in Property talk:P5196#Value domain I would have preferred calling it "output form" rather than "output method", but as neither label exactly pins down the meaning of the property, I'm content with either, and above all I do not want both defined as separate properties, because I can't tell the difference between those labels.

Instead I suggest adding qualifiers such as writable file format (P1073) and encoding (P3294), to give you the flexibility you need. There is also input device (P479), which could have mostly the same qualifiers, only replacing writable file format (P1073) with readable file format (P1072) (it would work just as well with a single qualifier "data format").

The only issue not resolved in this way is the special association between the input and output streams that comes with conversion, compilation or any other transformative process, but not with an application that simply reads a configuration file in one format and writes an image unrelated to that configuration in another format. You have suggested "to-" and "from-" as prefixes for different properties, but you would either have to do the same to every other qualifier (encoding etc), or you could group a set of qualifiers under two generic properties indicating the direction of the data in relation to the transformation.
application → input device (P479) → source; applies to part (P518): image converter; readable file format (P1072): PNG

application → output device (P5196) → target; applies to part (P518): image converter; writable file format (P1073): PNM

The applies to part (P518) item "image converter" is what links those two statements together, if there are several input device (P479) or output device (P5196) statements within the same "application" item, and that can be specified in constraint form too. Summary: No new properties needed. SM5POR (talk) 08:11, 8 November 2022 (UTC)
Yes, you found a way to model this by using two existing properties in ways they weren't at all intended to be used. The end result is that you need two statements and four qualifiers instead of one statement and two qualifiers, which is incredibly awkward and cumbersome and would clearly hinder both the entry and the consumption of such data; I am not sure which is worse. I strongly believe that data should be modeled so that it is easy to consume. Requiring data consumers to perform their own "group by" operation on statements just to have the input and output of a particular process next to each other is not an option we should consider, in my opinion. --Push-f (talk) 09:14, 8 November 2022 (UTC)
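The "group by" step mentioned in this comment can be sketched in Python as follows. This is an illustration under stated assumptions, not a Wikidata API: statements are represented as plain dictionaries, `pair_by_part` is a hypothetical helper, and the single-statement variant uses "image conversion" and the proposed "source form"/"target form" labels as placeholders.

```python
from collections import defaultdict

# Two-statement model from the thread, using its placeholder values.
statements = [
    {"prop": "P479", "value": "source",
     "qualifiers": {"P518": "image converter", "P1072": "PNG"}},
    {"prop": "P5196", "value": "target",
     "qualifiers": {"P518": "image converter", "P1073": "PNM"}},
]


def pair_by_part(statements):
    """Group input and output statements by their applies-to-part (P518) value."""
    grouped = defaultdict(lambda: {"reads": [], "writes": []})
    for s in statements:
        q = s["qualifiers"]
        part = q.get("P518")
        if s["prop"] == "P479" and "P1072" in q:
            grouped[part]["reads"].append(q["P1072"])
        elif s["prop"] == "P5196" and "P1073" in q:
            grouped[part]["writes"].append(q["P1073"])
    return dict(grouped)


# Single-statement model with the proposed qualifiers (hypothetical values).
single = {"prop": "P366", "value": "image conversion",
          "qualifiers": {"source form": "PNG", "target form": "PNM"}}

print(pair_by_part(statements))  # input and output paired only after grouping
print(single["qualifiers"])      # input and output already side by side
```

In the two-statement model the pairing exists only after the consumer joins on P518; in the single-statement model it can be read directly from one set of qualifiers.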
I'm sorry, I didn't limit my example to either one of those you have mentioned above, but I made one up in my head (an image file processing application containing, among other things, a utility to convert a file from one image data format to another), and kind of assumed this could be inferred from the example, but I realize now it may not have been that trivial.

However, let's take Google Translate (Q135622) for an actual example. How would you describe its functionality in detail using from-language and to-language? Since there is no point in listing all of its currently 133 supported languages, let's limit our example to three of them: English (Q1860), French (Q150), and German (Q188). I'm taking this first guess as a starting point, based on your example with DeepL Translator (Q43968444) (which I haven't used myself):
Google Translate (Q135622) → has use (P366) → translation (Q7553); from-language: English (Q1860); from-language: French (Q150); from-language: German (Q188)

Google Translate (Q135622) → has use (P366) → translation (Q7553); to-language: English (Q1860); to-language: French (Q150); to-language: German (Q188)

I'm using two main statements here partly due to limitations of the "statement" template, but also to hint at the possibility of adding qualifiers specific to the input or output sides of the translation. What would you add, change, or remove? Any other qualifiers, such as writable file format (P1073) HyperText Markup Language (Q62626012) or protocol (P2700) HTTP (Q8777), or would you make them separate main statements? SM5POR (talk) 07:46, 10 November 2022 (UTC)
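One consequence of listing several from-language and to-language values can be sketched in Python. The sketch assumes, purely for illustration, that every listed source language can be translated into every other listed target language; the qualifier lists themselves do not state which pairs are supported, and `implied_pairs` is a hypothetical helper.

```python
from itertools import product

# Item IDs from the example: English, French, German.
from_languages = ["Q1860", "Q150", "Q188"]
to_languages = ["Q1860", "Q150", "Q188"]


def implied_pairs(sources, targets):
    """List every (source, target) pair a consumer could infer, skipping identity pairs."""
    return [(s, t) for s, t in product(sources, targets) if s != t]


pairs = implied_pairs(from_languages, to_languages)
print(len(pairs))  # 6 pairs for three languages
print(pairs[0])    # ('Q1860', 'Q150'), i.e. English to French
```

If a translator supported only some of these pairs, the two flat qualifier lists alone could not express that; whether that matters for the proposal is left open here.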
Google Translate (Q135622) is an instance of website (Q35127), so I would add neither writable file format (P1073) HyperText Markup Language (Q62626012) nor protocol (P2700) HTTP (Q8777), because that holds true for any website. If the translation were only provided via some special protocol, then yes, I think it would make sense to add that as a qualifier. --Push-f (talk) 12:46, 10 November 2022 (UTC)
Comment Would this work? YAML (Q281876) → has use (P366) → compatibility (Q2175596); to language: JSON (Q2063) -wd-Ryan (Talk/Edits) 04:07, 6 November 2022 (UTC)
Compatibility isn't a use; it's rather a goal. So you could state YAML (Q281876) → has goal (P3712) → compatibility (Q2175596); to language: JSON (Q2063). Though I'd say compatibility isn't really the right term: "compatible" often means "can be used together", and you certainly cannot directly insert arbitrary YAML into a JSON document; it needs to be converted first. So I think YAML (Q281876) → has goal (P3712) → data conversion (Q1783551); to language: JSON (Q2063) is the best way to express this relationship. --Push-f (talk) 04:49, 6 November 2022 (UTC)
Neutral Nepalicoi (talk) 04:16, 3 January 2023 (UTC)
Oppose this is conflating a bunch of things and is too specific at the same time BrokenSegue (talk) 00:32, 31 January 2023 (UTC)
Comment an alternative way of modelling this would be via field of work (P101): translation from English (Q113785224) (this is how Czech National Authority Database (Q13550863) models it). Vojtěch Dostál (talk) 10:31, 2 March 2023 (UTC)
Not done, no consensus for the proposed property at this time based on the above discussion. Regards, ZI Jony ^(Talk) 06:26, 25 January 2024 (UTC)