User:So9q/Tool ideas

LexUse

https://www.wikidata.org/wiki/Wikidata:LexUse is the first script, written in Python. It is in alpha state. It has been used to add hundreds of usage examples from Riksdagen to Swedish lexemes. It is about to be deprecated and replaced by LexUtils.

LexUtils

A toolbox of modules that each do one job. Starts in a REPL. Separate out the TUI code into its own module.

Make it delightful to choose and change the working language, and remember the choice between sessions.

Pseudo code

  1. check when the language list was last updated (update once a year or when languages.json is missing)
  2. download and cache a list of all languages in WD that currently have at least 1 lexeme (a few hundred) to languages.json

https://w.wiki/3A62

  3. ask the user to choose a work language by searching the list by English name and picking one (make a completer list) (YAGNI: or accept a QID)
  4. if a QID was given, check that it is in languages.json or die with a clear error
  5. store the QID in a shared settings.json in a dict under the key "current_work_language"

Always offer to use a previously used language first.
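
The cache check and the settings step could be sketched roughly like this. It is only a sketch under assumptions: the function names are made up for illustration, and languages.json is assumed to hold a mapping from QID to English language name, which the notes above do not specify.

```python
from __future__ import annotations

import json
import time
from pathlib import Path

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def needs_update(cache: Path, now: float | None = None) -> bool:
    """Return True if the cache file is missing or older than one year."""
    if not cache.exists():
        return True
    now = time.time() if now is None else now
    return now - cache.stat().st_mtime > ONE_YEAR_SECONDS


def save_work_language(settings: Path, languages: dict, qid: str) -> None:
    """Store the QID under "current_work_language", or exit with a clear error.

    `languages` is assumed to be the parsed languages.json (QID -> name).
    """
    if qid not in languages:
        raise SystemExit(f"{qid} is not listed in languages.json")
    data = json.loads(settings.read_text()) if settings.exists() else {}
    data["current_work_language"] = qid
    settings.write_text(json.dumps(data, indent=2))


def load_work_language(settings: Path) -> str | None:
    """Return the previously used language QID, if any, so it can be offered first."""
    if not settings.exists():
        return None
    return json.loads(settings.read_text()).get("current_work_language")
```

A caller would run `needs_update(Path("languages.json"))` at startup, refresh the list from the query above when it returns True, and offer the result of `load_work_language` before asking for a new choice.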

UsageExamples

Successor of LexUse with dataframes, bug fixes and pretty formatting. Incorporated in LexUtils to make it easy to share code, e.g. UI code, between the tools.

LexCombine

This is an idea for a tool by So9q to help add combines lexemes (P5238) statements to lexemes whose lemma consists of multiple words.

Pseudo code

  1. fetch lexemes via SPARQL that have a space in the lemma and are missing combines lexemes (P5238)
  2. loop
     1. split the lemma X into words
     2. search via the API for lexemes in language A whose lemma matches each word
     3. if only one lexeme is found for each word in the lemma X
        1. ask the user to validate the whole combines lexemes (P5238) statement
        2. upload to Wikidata
        3. add to the user's watchlist

  1. fetch lexemes which have an interfix (like -s- in Swedish)
  2. recognize affixes
  3. guess and propose to the user
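
The "exactly one lexeme per word" check for lemmas with spaces could look something like the sketch below. `propose_combines` and the `search` callable are hypothetical names; `search` stands in for whatever API lookup ends up being used and is assumed to return a list of lexeme IDs for a word in the work language.

```python
from typing import Callable, List, Optional


def propose_combines(
    lemma: str,
    search: Callable[[str], List[str]],
) -> Optional[List[str]]:
    """Return one lexeme ID per word of the lemma, in order.

    Returns None when any word has zero or several matches, because the
    proposal is then too ambiguous to show to the user for validation.
    """
    parts = []
    for word in lemma.split():
        matches = search(word)
        if len(matches) != 1:
            return None
        parts.append(matches[0])
    # A single-word lemma has nothing to combine.
    return parts if len(parts) > 1 else None


# Placeholder data; the lexeme IDs are invented for illustration.
fake_index = {"hot": ["L1"], "dog": ["L2"], "bank": ["L3", "L4"]}
print(propose_combines("hot dog", lambda w: fake_index.get(w, [])))
```

Passing the lookup in as a function keeps the matching logic testable without network access.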

Lex500

Help make sure the 500 most translated lexemes are translated into your language.

Pseudo code
- ask the user for the work language (worklang)
  + e.g. da
- ask for a list of source languages (fromlang)
  + e.g. en,sv
- fetch the most translated QIDs, sorted in descending order
- loop through them
  + e.g. "cat" "domesticated animal"
  + check if worklang translation is missing
  + check if fromlang glosses exist for at least one of them
  + print sense(s) from fromlang(s)
  + ask for which lexeme to link to this sense (target_lemma)
    - e.g. "kat"
  + try finding a lexeme in worklang with target_lemma
  + if found at least one
    - ask user to choose if more than one
    - else ask the user if it is the correct lexical category (noun in this example)
      + if yes, list the senses, if any
      + if there are no senses, ask if the user wants to create one
        - if yes, fetch the description from the QID
        - if there is none, ask the user to input a description
          + if given, save the description to the QID and create a sense with it as gloss and P5137 linking to the QID
          + upload to Wikidata
          + add to watchlist
      + else list senses and ask user to pick one to validate the linking of P5137
        - if validated
          + upload to Wikidata
          + add to watchlist
  + skip if none found
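
The first filtering steps of the loop (rank by number of translations, skip items already translated into worklang, require a gloss in at least one fromlang) could be sketched as below. The `Candidate` shape and `todo_list` are assumptions made for this example, and the sample data is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Candidate:
    qid: str
    translated_into: Set[str]  # language codes that already have a linked sense
    glosses: Dict[str, str] = field(default_factory=dict)  # language code -> gloss


def todo_list(
    candidates: Iterable[Candidate], worklang: str, fromlangs: List[str]
) -> List[Candidate]:
    """Most translated first, keeping only items the user can work on."""
    ranked = sorted(candidates, key=lambda c: len(c.translated_into), reverse=True)
    return [
        c
        for c in ranked
        if worklang not in c.translated_into
        and any(lang in c.glosses for lang in fromlangs)
    ]


# Invented sample data for illustration only.
items = [
    Candidate("Q146", {"en", "sv"}, {"en": "domesticated animal"}),
    Candidate("Q144", {"en", "sv", "da"}, {"en": "domesticated canine"}),
]
for c in todo_list(items, "da", ["en", "sv"]):
    print(c.qid, [c.glosses[lang] for lang in ("en", "sv") if lang in c.glosses])
```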

LexHype

Help enter hyphenation according to rules.

Pseudo code
# hyphenation rules keyed by language code
rules = {
    "da": {
        "vowels": ["a", "e", "i"],
    },
}

LexDescription

Improve descriptions on items based on lexeme glosses. https://github.com/dpriskorn/LexUtils/issues/3

See also label collector which might be a better tool to find good descriptions for items.

ImproveWikidata

Free-standing tool in its own repo; see User:So9q/Tool_ideas/ImproveWikidata

LexAlias

Tool to create new lexemes using aliases in Wikidata. The idea is to get a list of all items currently linked from a lexeme in any language and work on the aliases of those.

Pseudo code

  1. choose language
  2. get a list of 1000 items via SPARQL

https://w.wiki/3A5m

  3. loop through the list and split each alias into words with split(" ")
  4. download the list of all Swedish lexemes via SPARQL, using LIMIT and OFFSET
  5. cache the list for 1 month
  6. if a word is not in the list
  7. check with SPARQL whether a lexeme with the word exists
  8. if not
  9. create a simple lexeme
  10. for Swedish, look up the lexical category via Wiktionary and SAOB
  11. ask the user to verify the lexical category
  12. upload
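
The split-and-compare part of the loop could be sketched like this. `missing_words` is a made-up helper name, and `known_lemmas` is assumed to be the cached list of lemmas loaded into a set; the live SPARQL double check and the upload are left out.

```python
from typing import Iterable, List, Set


def missing_words(aliases: Iterable[str], known_lemmas: Set[str]) -> List[str]:
    """Return words from the aliases that are not in the cached lemma list.

    Each word is reported once, in the order it was first seen.
    """
    seen: Set[str] = set()
    result: List[str] = []
    for alias in aliases:
        for word in alias.split(" "):
            # Consecutive spaces produce empty strings; skip them.
            if word and word not in known_lemmas and word not in seen:
                seen.add(word)
                result.append(word)
    return result


print(missing_words(["stora huset", "huset"], {"stora"}))
```

Words returned here are only candidates; they would still go through the SPARQL existence check before a lexeme is created.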

LexEtymology

Scrape and parse the etymology sections of other dictionaries linked from lexemes, such as SAOB, and present the user with options to add derived statements.

LexDerived

Detect and enable semi-automatic addition of derived from lexeme (P5191) based on rules for the language. Present 10 at a time for approval.
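
Presenting proposals 10 at a time could be done with a small batching helper like the sketch below. `batches` is a hypothetical name, and the proposals are plain placeholders here since the detection rules are not written yet.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int = 10) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder proposals; real ones would come from the language rules.
for batch in batches(range(25)):
    print(len(batch), "proposals to approve")
```

Because it consumes an iterator lazily, the same helper works whether the proposals come from a list or are generated on the fly.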

Rules for Swedish

Nouns
Adjectives