About this board


Previous discussion was archived at User talk:So9q/Archive 1 on 2019-11-18.

GZWDer (talkcontribs)

Do you have any comments on my recent edit?

GZWDer (talkcontribs)

I planned to add references to a large number of words (50,000 in the original source, of which at least 6,000 exist in Wikidata), but I think we need the community to discuss it first. See Wikidata_talk:Lexicographical_data#Moby_Part_of_Speech_List. (For the time being, only existing lexemes will be edited.)

So9q (talkcontribs)

Sounds good to me. As long as it's a limited-scope bot job with some example edits to judge the quality by, it sounds like a good idea to me :) I'm very happy you're helping find these CC0 sources.

GZWDer (talkcontribs)

I want to get some opinions on what counts as "everything from before is cleaned up"; Wikidata is a work in progress and edits are not required to be perfect. Usually, most items created by bots do not have most of their information filled in, but this is not usually considered an issue. Regarding duplicates, many tasks will create new duplicates and it is not possible to check them one by one (given that the number of items created runs to several million), but what would be acceptable for already created items?

So9q (talkcontribs)

Yeah, it might not be a reasonable request at all. It's not a demand anyway, just what I would do myself and wish of others. I actually have something to clean up myself from an old QS batch that was, eh, misguided. 😅 The difference here is that you have someone nagging you and I don't. Nobody else seems to back me up, so maybe you're fine and in good standing? Ask Nikki, he is the only one I remember having mentioned you on Telegram (concerning lexemes). If you make any further bot requests I would very much like them to be limited in scope and to come with example edits. I would also love to see a bot that, for scientific articles for example:

  1. finds a missing DOI
  2. looks up the authors
  3. imports any missing authors with an ORCID, with data from at least one source
  4. imports the paper and links to any matched authors, putting the rest in author name strings

Since no one has written that, I'm writing one now 😃
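Purely as a sketch, steps 1-4 could look something like this in Python, assuming the public Crossref REST API for the lookups; the Wikidata write steps and the properties author (P50) / author name string (P2093) are only indicated in comments, not implemented:

  # Hypothetical sketch of the pipeline above; the Crossref lookup is real,
  # the Wikidata writes are only indicated as comments.
  import requests

  def fetch_work(doi):
      # Steps 1-2: look up the paper and its author list via Crossref.
      r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
      r.raise_for_status()
      return r.json()["message"]

  def split_authors(work):
      # Step 3 preparation: authors with an ORCID can be matched or
      # imported as items; the rest stay as plain name strings.
      with_orcid, name_strings = [], []
      for author in work.get("author", []):
          name = f"{author.get('given', '')} {author.get('family', '')}".strip()
          if "ORCID" in author:
              with_orcid.append((name, author["ORCID"]))
          else:
              name_strings.append(name)
      return with_orcid, name_strings

  work = fetch_work("10.1371/journal.pone.0029797")  # an example DOI
  linkable, strings = split_authors(work)
  # Step 3: create or match items for `linkable` (e.g. via the ORCID public API).
  # Step 4: create the paper item, linking matched authors with author (P50)
  # and keeping the rest as author name string (P2093).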

GZWDer (talkcontribs)

For lexemes, I have stated before that I will discuss every import seven days in advance at Wikidata talk:Lexicographical data. Wikidata's lexeme coverage is very limited and many very basic words are missing (you can see the contributions of GZWDer (flood); many of the words are very common), but due to the quality of the sources we have, I proposed the Lexeme Mix'n'Match import approach. There are several kinds of online resources:

  (i) databases like WordData, WordNet and Flexion (Q101183911) (from which I also imported a part), which contain a large number of invalid and duplicated entries (and for German, we still need to discuss the proper way to handle all inflected forms);
  (ii) online dictionaries, which may be more reliable than (i), but their senses are copyrighted (so we may use a Mix'n'Match-like approach to match them manually);
  (iii) older text dictionaries, which may be in the public domain but either do not provide part of speech at all or only allow it to be extracted through some complicated process;
  (iv) word lists that provide nothing but words (and may contain plenty of non-lemma forms).

For authors of articles, there are some sources, but they either: (1) conflate many people into one (Semantic Scholar); (2) contain multiple profiles for one person (Microsoft Academic); or (3) do not allow data mining per their terms of use (MathSciNet, Scopus). The nearest thing is the ORCID API, which Magnus Manske's bot (currently inactive) was working with.

So9q (talkcontribs)

I like the idea of a Mix'n'Match approach. I read up on it yesterday and I'm now using the user script, which is very user-friendly. 😃 This is probably the best tool Manske ever wrote. QS is also good but seems to be almost abandoned, and that detracts a lot from its value IMO. QS also needs training on the user's part and is not very intuitive IMO.

Reply to "Wikidata:Requests for permissions/Bot/GZWDer (flood) 6"
Arlo Barnes (talkcontribs)

Please find a good place on Wikidata to summarise progress made, for those who don't have a phone number to use Telegram.

Reply to "Re: topic:Vmaxvvzztj2kun8c"
Mateusz Konieczny (talkcontribs)
GZWDer (talkcontribs)

This new idea needs some input: in the future, data should be imported into a new system instead of directly into Wikidata, so that invalid words can be filtered out in advance.

Reply to "Wikidata_talk:Lexicographical_data#Tools_idea:_Lexeme_Mix'n'Match"
GZWDer (talkcontribs)

In WordData the basic unit is the sense, and there are no entries for lexemes; e.g. this refers to three Wikidata lexemes and 9 senses. See as an example here: if you query for the synonyms of "group", it will first query all senses (verb and noun) of "group" and then find the synonyms of each sense. See Wikidata:Property proposal/Wolfram language WordData sense.

  • WordData["group", "Synonyms"] works like {x: [i.lemma for i in synonyms(x)] for x in senses("group")}
  • WordData["group", "Synonyms", "List"] works like sum([[i.lemma for i in synonyms(x)] for x in senses("group")], [])
  • WordData["group", "Synonyms", "Rules"] works like {x: [i for i in synonyms(x)] for x in senses("group")}
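For comparison, here is a minimal runnable stand-in for those three query shapes, using NLTK's WordNet corpus instead of WordData (an assumption purely for illustration; NLTK bundles WordNet 3.0):

  # Illustrates the three query shapes above with NLTK's WordNet corpus.
  from nltk.corpus import wordnet as wn  # needs: nltk.download('wordnet')

  senses = wn.synsets("group")  # all senses (noun and verb) of "group"

  # "Synonyms": synonym lemmas per sense, keyed by sense
  per_sense = {s.name(): s.lemma_names() for s in senses}

  # "Synonyms", "List": all synonym lemmas flattened into one list
  flat = [lemma for s in senses for lemma in s.lemma_names()]

  # "Synonyms", "Rules": sense -> synonym objects (here, Lemma objects)
  rules = {s.name(): s.lemmas() for s in senses}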
So9q (talkcontribs)
GZWDer (talkcontribs)

Wolfram Alpha can access some general information about a word, but not a specific sense. You need to install Wolfram Mathematica or Wolfram Engine, or use Wolfram Cloud, to access data about a specific sense.

So9q (talkcontribs)

It seems we are miscommunicating here. I'm not interested in the senses if they cannot be accessed with a public URL.

GZWDer (talkcontribs)
So9q (talkcontribs)
GZWDer (talkcontribs)

Did you notice the discussion in the Telegram groups?

So9q (talkcontribs)

Did you mean to ask whether I participate in the chat in the Telegram groups? Yes, I participate more or less every day atm. You are very welcome to join. :)

GZWDer (talkcontribs)

I did not use the correct grammar in the previous comment. I meant that you should notify the Telegram group about the WordData discussion you started.

So9q (talkcontribs)

Yes I posted it in the lexical group.

GZWDer (talkcontribs)
So9q (talkcontribs)

Can you give an example URL? Do we have an external ID property for that?

GZWDer (talkcontribs)

You may replace the word in the above URL with any arbitrary word. See also WordNet 3.1 Synset ID (P8814) - you may find the Synset ID via OPTIONS (top right) => Show Synset Identifier - which is only used three times. WordNet does not provide IDs for lexemes nor for (lexeme-dependent) senses of lexemes.
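As an aside, identifiers of that offset-plus-part-of-speech shape can also be derived programmatically; a small sketch using NLTK, which bundles WordNet 3.0, so its offsets will not match the 3.1 values P8814 expects and this is only illustrative:

  # Derive "offset-pos" style synset identifiers from NLTK's WordNet corpus.
  # NLTK ships WordNet 3.0, so these differ from WordNet 3.1 (P8814) values.
  from nltk.corpus import wordnet as wn

  for s in wn.synsets("group"):
      print(s.name(), f"{s.offset():08d}-{s.pos()}")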

So9q (talkcontribs)

WordNet seems like a goldmine to link to. Could you also ask whether they could provide a URL endpoint that accepts a sense_key or ID?

Reply to "WordData"
VIGNERON (talkcontribs)
Reply to "Lang consistency"
Fnielsen (talkcontribs)

I think you should stop using MachtSinn for the Danish language. It introduces too many errors, which require a considerable amount of cleanup. "en af de tre klassiske samlede stater i sagen" as a gloss is not correct Danish; that is presumably a machine translation gone wrong. There are a number of questionable non-nouns linked, and cases where the correct lexeme was not created or chosen for the sense. "studie", "kriminel", "arbejde" and "nød" are some of the lexemes with questionable edits on 25 October 2020, as far as I can see.

Fnielsen (talkcontribs)

Yet another: "kokosnød" is probably not the same as "kokospalme". One is the nut, the other is the tree.

So9q (talkcontribs)

No problem for me. Will you use it? I can work on Swedish and English instead. Thanks for checking up on my edits 😃

Fnielsen (talkcontribs)

Thanks. :) I have rarely used it. I find it difficult to use.

Reply to "MachtSinn"
Hjart (talkcontribs)

Please refrain from using Google Translate for descriptions, or be very, very careful when doing so. Lots of those you added today were fairly hopeless.

Fnielsen (talkcontribs)
So9q (talkcontribs)
Fnielsen (talkcontribs)

Thanks. I fixed the grammatical gender (et/en). :)

Reply to "Google translating description"
Fnielsen (talkcontribs)

Others: omgang (unit of length), løb (sports discipline)

So9q (talkcontribs)
Fnielsen (talkcontribs)

I have mentioned it in a new issue and commented on an old issue.

Reply to "Problemer med MachtSinn"
Andreasmperu (talkcontribs)

You seem to have added a lot of wrong statements for P641. Please undo those wrong edits.

So9q (talkcontribs)

Thanks for the heads up. This is obviously an error. I intended to use P1269. I'm working on fixing it now.

Reply to "P641"