Wikidata:Property proposal/WordNet 3.1 Synset Id

WordNet 3.1 Synset Id

Originally proposed at Wikidata:Property proposal/Authority control

Description: Synset identifier in Princeton’s WordNet Version 3.1
Represents: WordNet (Q533822)
Data type: External identifier
Domain: item
Allowed values: \d{8}\-[nvarsp]
Example 1: dog (Q144) → 02086723-n
Example 2: pawl (Q55629301) → 03907626-n
Example 3: hot dog (Q181055) → 07692347-n
Source: https://wordnet.princeton.edu/
Planned use: There is already word-sense disambiguation software that produces WordNet synsets. This property would allow such software to target Wikidata.
Number of IDs in source: 117,000
Expected completeness: eventually complete (Q21873974)
Formatter URL: http://wordnet-rdf.princeton.edu/id/$1
Robot and gadget jobs: See below
See also: Interlingual Index ID (P5063), BabelNet ID (P2581)

Motivation

WordNet is a substantial and widely used set of concepts. Having a mapping between Wikidata and WordNet would assist those who want to use Wikidata for word-sense disambiguation.

I note two previous proposals from two years ago: Wikidata:Property_proposal/Wordnet_synset_ID and Wikidata:Property proposal/WordNet ID. The first of those gained significant support, but was withdrawn because of concerns that WordNet IDs change between versions, and because of questions about the part of speech being required to convert the ID into a URL. The second proposal was opposed because its relationship to the prior proposal was unclear and because of the forthcoming integration with Wiktionary.

Regarding the issue with versions, this proposal solves that problem by being specific to one version and using the "offsets" for that version. (Arguably, we ought to have properties for both 3.0 and 3.1, the two major versions in use today, but only 3.1 is proposed here.)

The issue of the part-of-speech being part of the URL is resolved by making it part of the identifier.
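
As a rough illustration of how the identifier and the formatter URL fit together, here is a minimal Python sketch. It uses only the "Allowed values" pattern and the formatter URL given in this proposal; the function name synset_url is made up for the example and is not part of any existing tool.

```python
import re

# Pattern taken from the proposal's "Allowed values" field:
# eight digits, a hyphen, then a one-letter part-of-speech code.
SYNSET_ID = re.compile(r"\d{8}-[nvarsp]")

# Formatter URL from the proposal; "$1" stands for the identifier.
FORMATTER_URL = "http://wordnet-rdf.princeton.edu/id/$1"


def synset_url(synset_id: str) -> str:
    """Return the URL for a synset ID (hypothetical helper for illustration)."""
    if not SYNSET_ID.fullmatch(synset_id):
        raise ValueError(f"not a valid WordNet 3.1 synset ID: {synset_id!r}")
    # The part of speech is already inside the ID, so plain substitution suffices.
    return FORMATTER_URL.replace("$1", synset_id)


print(synset_url("02086723-n"))  # the "dog (Q144)" example from the proposal
```

Because the part-of-speech letter travels with the offset, the URL can be built from the stored value alone, which is the point made above.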

This property is not directly related to the lexicographical data because these identifiers are for lexical concepts, which are better modelled as Wikidata items. WordNet has a separate namespace for lemmata, e.g. http://wordnet-rdf.princeton.edu/lemma/dog.

As can be seen in this query, we already have 188 mappings to WordNet 3.1 using exact match (P2888). While these are usable, it is better to have a specific property, and the existing mappings can be used to populate it. Mappings between WordNet 3.0 and 3.1 and with other resources like ILI and BabelNet are available from various sources.

I anticipate being able to populate the property with hundreds or thousands of values with a few rounds of QuickStatements that do some of the obvious steps to populate the property from the existing data. This might need to be run from time to time, but probably does not require a bot.
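
To give a sense of what one such round might look like, here is a small Python sketch that turns existing exact match (P2888) values into tab-separated QuickStatements lines. Several things in it are assumptions: the property ID P0000 is a placeholder (no ID has been assigned yet), the exact-match URLs are assumed to have the same shape as the formatter URL, and the input rows are hand-written rather than fetched from the query service.

```python
import re

# Assumption: existing P2888 values look like the formatter URL with an ID appended.
ID_URL = re.compile(r"^https?://wordnet-rdf\.princeton\.edu/id/(\d{8}-[nvarsp])$")

# Hypothetical placeholder; the real property ID is not known at proposal time.
PROPERTY = "P0000"


def to_quickstatements(mappings):
    """Convert (item ID, exact-match URL) pairs into QuickStatements lines."""
    lines = []
    for qid, url in mappings:
        match = ID_URL.match(url)
        if match:  # skip exact-match values that do not point at WordNet
            lines.append(f'{qid}\t{PROPERTY}\t"{match.group(1)}"')
    return lines


# Hand-written sample rows using two examples from the proposal.
rows = [
    ("Q144", "http://wordnet-rdf.princeton.edu/id/02086723-n"),
    ("Q181055", "http://wordnet-rdf.princeton.edu/id/07692347-n"),
    ("Q1", "https://example.org/not-wordnet"),
]

for line in to_quickstatements(rows):
    print(line)
```

The output would still need a manual review before being pasted into QuickStatements, since an exact match to a WordNet URL does not by itself guarantee the mapping is correct.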

Bovlb (talk) 18:23, 5 August 2020 (UTC)

Discussion

  Notified participants of WikiProject Linguistics

Hello Bovlb. Given my history, I am a "Boeotian" on the lexicographical side of WD, and I had in fact expected a reply from you. Still, it seems more worthwhile to me to capture most of the site's IDs. With this in mind, I won't block goodwill: I'm changing my opinion, although I'm not "excited" about including this property only for Qs. (I have been "expeditious" with a large number of proposals since yesterday (creation, closure, etc.), because the new proposals were queuing up to display correctly. I apologize.) For example, you can always set the status field to on hold, as long as you include an explanatory message in the Discussion section, such as: {{Wait}} requesting advice from the lexico specialists blahblah. Looking forward to reading you. —Eihel (talk) 03:43, 25 August 2020 (UTC)
P.S. If the proposal changes significantly, you need to notify everyone involved in this section. —Eihel (talk) 03:49, 25 August 2020 (UTC)