Wikidata:Property proposal/Wordnet synset ID
Wordnet synset ID
Originally proposed at Wikidata:Property proposal/Generic
Description | identifier for a set of cognitive synonyms in Princeton University's WordNet database |
---|---|
Represents | synset (Q1673963) |
Data type | External identifier |
Example | |
Source | https://wordnet.princeton.edu/ |
Formatter URL | http://imagenet.stanford.edu/synset?wnid=$1 |
See also | BabelNet ID (P2581) |
- Motivation
WordNet (Q533822) is a central semantic network and the first major word net (Q2594143) AFAIK. We already have a BabelNet (Q4837690) property, but WordNet (Q533822) synsets are used, e.g., in ImageNet (Q24901201). It was discussed on the Wikidata mailing list in 2016, see https://lists.wikimedia.org/pipermail/wikidata/2016-April/008517.html — Finn Årup Nielsen (fnielsen) (talk) 14:58, 15 August 2017 (UTC)
- Updates
Update: This property could potentially be an external identifier. To my knowledge there exists no resolver at the canonical site at Princeton, but one could use, e.g., the ImageNet resolver here: http://imagenet.stanford.edu/synset?wnid= — Finn Årup Nielsen (fnielsen) (talk) 19:14, 15 August 2017 (UTC)
Notified participants of WikiProject Linguistics
Further update: After looking a bit more into WordNet and its identifiers I now see two issues:
- WordNet synset identifiers are not persistent between WordNet versions. ImageNet apparently uses 3.0, while it seems to me that BabelNet uses WordNet 2.1.
- WordNet synset identifiers have no canonical aggregation - AFAIU - with respect to the "pos" and the "offset" part: n04380533 is apparently not the canonical form but the format that ImageNet uses. n04380533 may also be written as 04380533n or 04380533-n.
I suggest we stick to ImageNet's format and their use of version 3.0 (for now). I have emailed ImageNet about their plans for the identifier. WordNet's Fellbaum suggests using the WordNet sense key instead. I suppose we could potentially have multiple WordNet keys. — Finn Årup Nielsen (fnielsen) (talk) 21:54, 17 August 2017 (UTC)
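To make the format issue concrete, here is a minimal Python sketch that maps the three spellings listed above onto the ImageNet-style form and fills in the proposed formatter URL. The function names are hypothetical, and the pattern assumes an 8-digit offset and a one-letter part-of-speech code from n, v, a, s, r; neither assumption is confirmed in this discussion.

```python
import re

# Assumption: offsets are 8 digits and the part-of-speech letter is one of n, v, a, s, r.
# Matches "n04380533" (pos first) as well as "04380533n" and "04380533-n" (pos last).
_SYNSET_RE = re.compile(
    r"^(?:(?P<pos1>[nvars])(?P<off1>\d{8})|(?P<off2>\d{8})-?(?P<pos2>[nvars]))$"
)

# Formatter URL taken from the proposal table; "$1" stands for the identifier.
FORMATTER_URL = "http://imagenet.stanford.edu/synset?wnid=$1"


def to_imagenet_wnid(synset_id: str) -> str:
    """Return the pos-first spelling (e.g. 'n04380533') for any accepted spelling."""
    match = _SYNSET_RE.match(synset_id.strip())
    if match is None:
        raise ValueError(f"unrecognised synset identifier: {synset_id!r}")
    pos = match.group("pos1") or match.group("pos2")
    offset = match.group("off1") or match.group("off2")
    return f"{pos}{offset}"


def resolver_url(synset_id: str) -> str:
    """Substitute the normalised identifier into the formatter URL."""
    return FORMATTER_URL.replace("$1", to_imagenet_wnid(synset_id))
```

Calling `resolver_url("04380533-n")` builds the same link as `resolver_url("n04380533")`. The sketch does nothing about the version problem: an offset is only meaningful together with the WordNet version it came from.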
Yet further update: We already have the possibility to link to WordNet 3.0, WordNet 3.1 and the global wordnet. The wordnets have LOD URIs; see, e.g., the dog at https://www.wikidata.org/wiki/Q144#P2888, which links to the resource http://globalwordnet.org/ili/i46360. I am not sure that the property is necessary. — Finn Årup Nielsen (fnielsen) (talk) 21:09, 21 August 2017 (UTC)
Here is an example that will fetch Wordnet-Wikidata links with the exact match (P2888) I have typed in so far:
SELECT ?item ?itemLabel ?uri WHERE {
  {
    SELECT ?item ?uri WHERE {
      ?item wdt:P2888 ?uri .
      FILTER STRSTARTS(str(?uri), 'http://wordnet-rdf.princeton.edu/wn30/')
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
— Finn Årup Nielsen (fnielsen) (talk) 00:30, 22 August 2017 (UTC)
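As a hedged illustration of how the results of such a query could be related back to ImageNet-style identifiers, the Python sketch below turns a WordNet 3.0 URI into the pos-first form and groups the identifiers by Wikidata item. It assumes that the local part of a wn30 URI has the shape `<offset>-<pos>` (for example `02084071-n`), which is not confirmed in this discussion. The function names and the row layout (dicts with `item` and `uri` keys) are hypothetical.

```python
WN30_PREFIX = "http://wordnet-rdf.princeton.edu/wn30/"


def wnid_from_wn30_uri(uri: str) -> str:
    """Convert a wn30 URI into the pos-first form, e.g. 'n02084071'.

    Assumption: the part after the prefix looks like '<offset>-<pos>'.
    """
    if not uri.startswith(WN30_PREFIX):
        raise ValueError(f"not a WordNet 3.0 URI: {uri!r}")
    local = uri[len(WN30_PREFIX):]
    offset, sep, pos = local.partition("-")
    if not sep or not offset.isdigit() or len(pos) != 1:
        raise ValueError(f"unexpected local part: {local!r}")
    return f"{pos}{offset}"


def index_by_item(rows):
    """Group identifiers by QID; rows are dicts with 'item' and 'uri' keys."""
    index = {}
    for row in rows:
        # The QID is the last path segment of the entity URI.
        qid = row["item"].rsplit("/", 1)[-1]
        index.setdefault(qid, []).append(wnid_from_wn30_uri(row["uri"]))
    return index


# Illustrative row only, not taken from an actual query result.
sample_rows = [
    {
        "item": "http://www.wikidata.org/entity/Q144",
        "uri": "http://wordnet-rdf.princeton.edu/wn30/02084071-n",
    }
]
print(index_by_item(sample_rows))
```

An index of this kind supports lookups from an item to its synsets. Reversing the dictionary gives the lookup from a synset to its items.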
- Discussion
- Support. I can see how we would want to be able to describe these in Wikidata. YULdigitalpreservation (talk) 15:47, 15 August 2017 (UTC)
- Comment Should we wait until lexicographical data is supported? Mahir256 (talk) 18:21, 15 August 2017 (UTC)
- I do not see why we should wait. If I understand correctly then Wikidata "senses" are linked via the values to q-items, see, e.g., http://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L15 with the links to leader and conductor. There seems to be a good number of WordNet synsets that fit well with current Wikidata items. I suppose that a WordNet ID could potentially help to populate Wikidata lexeme entities once they come. — Finn Årup Nielsen (fnielsen) (talk) 17:17, 16 August 2017 (UTC)
- Support. While this is obviously English-specific, the "synset" is in principle a conceptual entity, not purely based on the English text. Some of the relationships in WordNet are lexical and relate to what we're doing with Wiktionary, so I can see how they could help add data once the lexical stuff is in, but the IDs should be just conceptual and it makes sense to me to link them to Wikidata QIDs. ArthurPSmith (talk) 19:52, 17 August 2017 (UTC)
- Support. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:29, 21 August 2017 (UTC)
- Support. WordNet is a very useful, and used, resource. Peter F. Patel-Schneider (talk) 21:09, 21 August 2017 (UTC)
- Comment I have made several updates to the 'motivation'. We need to address the issues that have come up. I wonder if there is anyone that has some input? — Finn Årup Nielsen (fnielsen) (talk) 21:30, 21 August 2017 (UTC)
- You are the proposer; if you think it's not necessary you can withdraw this proposal! I do largely agree that "exact match" works well if there's a well-defined URI for a concept, so if you think that's where WordNet synsets are now, I'm fine with not creating a special property for this. ArthurPSmith (talk) 18:16, 23 August 2017 (UTC)
- Support ديفيد عادل وهبة خليل 2 (talk) 20:40, 1 October 2017 (UTC)
- Comment I am perhaps most comfortable with withdrawing this proposal, as exact match (P2888) is sufficient in most cases. I suppose there could be an interest in having an "ImageNet-WordNet synset ID" perhaps? — Finn Årup Nielsen (fnielsen) (talk) 11:44, 2 October 2017 (UTC)
- Support Can we resurrect this property? Even though exact match (P2888) allows recording the information, a specific WordNet property would be better, as it would allow querying the WordNet synset for an item, or finding all items that have a WordNet synset. exact match (P2888) only supports the reverse query, i.e., given a synset ID, find the corresponding Wikidata item.