Babel user information
kea-N Es uzuáriu se língua maternu e kriolu.
pt-5 Este utilizador tem um nível profissional de português.
en-4 This user has near native speaker knowledge of English.
pt-BR-3 Este usuário pode contribuir com um nível avançado de português brasileiro.
es-2 Este usuario tiene un conocimiento intermedio del español.
gl-2 Este usuario ten un coñecemento intermedio de galego.
Users by language

About me: Waldir@meta.wikimedia

Work in progress

To do

Recurring tasks

  • Clean up Repology's Problems in Wikidata report
  • Clean up the tricky items in Mix'n'match Exoplanets list

Assorted todo

Geography

Population

  • Population of cities of Portugal: https://w.wiki/45s
    • Needs distinction between cities and municipalities (e.g. Braga the city vs. Braga the municipality)
    • Needs more data :)
    • Talk to Rui Cavaco Barrosa about this
    • INE.pt: Conceitos por tema: Território (Concepts by theme: Territory)
      • Distrito (district), Município/Concelho (municipality), Freguesia (civil parish)
      • Vila (town): Continuous population cluster with more than 3000 voters, having at least half of the following: a) medical post; b) pharmacy; c) cultural or performance centre; d) collective public transport; e) post office; f) commercial and hospitality establishments; g) school; h) bank.
      • Lugar (locality): Population cluster with ten or more buildings intended for housing, and with its own name, regardless of whether it belongs to one or more freguesias. The buildings should not be more than 200 metres apart, except where collective facilities such as roads, sports fields, gardens, etc. lie between them.
      • Lugar urbano (urban locality): Locality with a population of 2000 or more inhabitants.
      • Quarteirão (city block): Group of buildings located in an urban area bounded by streets.
      • Subúrbio (suburb): Urbanised territory on the periphery of a markedly urban population centre (i.e. the city centre).
      • Cidade (city): Continuous population cluster with more than 8000 voters, having at least half of the following: hospital with inpatient service; pharmacies; fire brigade; performance venue and cultural centre; museum and library; hotels and inns; primary and secondary schools; pre-primary schools and nurseries; public transport, urban and suburban; public parks or gardens.
      • TODO: create a Wikidata item for these concepts
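The Vila and Cidade definitions above are both of the form "more than N voters and at least half of a list of amenities", which can be sketched as a small check. This is only an illustration: the amenity labels are my own English shorthand for the INE wording, the function names are hypothetical, and the "continuous population cluster" requirement is not modeled.

```python
# Amenity lists paraphrased from the INE definitions (labels are shorthand, not official terms).
VILA_AMENITIES = frozenset({
    "medical post", "pharmacy", "cultural or performance centre",
    "collective public transport", "post office",
    "commercial and hospitality establishments", "school", "bank",
})
CIDADE_AMENITIES = frozenset({
    "hospital with inpatient service", "pharmacies", "fire brigade",
    "performance venue and cultural centre", "museum and library",
    "hotels and inns", "primary and secondary schools",
    "pre-primary schools and nurseries", "urban and suburban public transport",
    "public parks or gardens",
})


def meets_criteria(voters, amenities, min_voters, required):
    """True if voters exceed min_voters and at least half of `required` are present."""
    present = len(set(amenities) & required)
    return voters > min_voters and 2 * present >= len(required)


def classify(voters, amenities):
    """Return "cidade", "vila" or None, checking the stricter definition first."""
    if meets_criteria(voters, amenities, 8000, CIDADE_AMENITIES):
        return "cidade"
    if meets_criteria(voters, amenities, 3000, VILA_AMENITIES):
        return "vila"
    return None
```

Note that both thresholds are strict ("superior a"), so a place with exactly 3000 voters does not qualify as a vila.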

Maps

Map of PPP median incomes
  • WIP: auto-generated map of median incomes in European countries
    • Work in progress! Still needs:
      • Display the results (value and corresponding year, if available) in map form
        • Here's a start
        • Here's a complex example that colors the shape files based on data: tinyurl.com/yckrqpcl (w.wiki URL shortening fails, possibly due to excessive length)
        • documentation
      • Ensure the units are PPS rather than raw Euros or other currency
        • That ought to be done by using psn instead of psv, but it doesn't seem to be working :(

Country and subdivision codes

Portugal
International

Cape Verde

Lexemes (kea dictionary)

  • Books I own
    • Dicionário Caboverdiano—Português (Manuel Veiga)
    • Léxico do dialecto crioulo do Arquipélago de Cabo Verde (Armando Napoleão Rodrigues Fernandes)
  • Google Sheets file
  • Stats in the Ordia tool
  • Tools to work with lexemes (To experiment with)
  • TODO: Create Wikidata:Lexicographical data/Documentation/Languages/kea, documenting the intended structure for kea lexemes, showing examples of lexemes for various word classes, etc.
    • Examples
    • Contents
      • Guidelines: what language to use for "spelling variant"; how to model dialect variations; which properties to set in the senses, which grammatical features to set in the forms, etc.
      • Instructions for how to create lexemes, add pronunciation, etc.
  • TODO: set up completion dashboards for lexemes
    • Senses: translations, glosses in en/pt/kea, item for this sense, etc.
    • Forms: badiu & sampadjudu, IPA transcription, pronunciation audio
    • Perhaps a simplified interface (as a Toolforge tool?), similar to what ninjawords.com did for Wiktionary.
      • Hauki (listed here, source code here) appears to be one such attempt (example)
Automatic list of kea lexemes

This list is periodically updated by a bot. Manual changes to the list will be removed on the next update!

Query: SELECT ?item ?lexemeLabel WHERE { ?item a ontolex:LexicalEntry ; dct:language wd:Q35963 ; wikibase:lemma ?lexemeLabel . }
?item ?lexemeLabel
kaza kaza
kaza kaza
kaza kaza
garafa garafa
mankara mankara
odju odju
katxor katxor
agu agu
adivogádu adivogádu
abril abril
adju adju
abakati abakati
pai pai
mai mai
país país
komunidadi komunidadi
mundu mundu
oji oji
manhan manhan
Kriolu Kriolu
alfabétu alfabétu
mudjer mudjer
bandoná bandoná
ilha ilha
End of automatically generated list.
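For running the same query outside the bot-maintained list, here is a minimal sketch that builds a request URL for the public query service endpoint. The helper name wdqs_url is hypothetical, and the sketch only constructs the URL; it does not send the request.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as the one used for the automatic list above.
KEA_LEXEMES_QUERY = """
SELECT ?item ?lexemeLabel WHERE {
  ?item a ontolex:LexicalEntry ;
        dct:language wd:Q35963 ;
        wikibase:lemma ?lexemeLabel .
}
"""


def wdqs_url(query, fmt="json"):
    """Return a GET URL for the query service (hypothetical helper)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query.strip(), "format": fmt})


url = wdqs_url(KEA_LEXEMES_QUERY)
```

The resulting URL can be passed to any HTTP client; a descriptive User-Agent header is advisable for automated requests.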

TEDxPraia

Goal: Add items for the TEDxPraia event (tedxpraia.com, TED event ID 19377) and talks

Scripts / gadgets

  • Auto-fill item titles (labels) with corresponding language's article title (using the same algorithm as link piping to cut out parentheticals, etc.)
  • Suggest content for unfilled descriptions, with:
    • First sentence of corresponding language's article
    • Automatic translation of description in other languages, in the order defined by translatewiki's fallback chain (should be accessible through API), ultimately falling back to English
  • Highlight (bolden) the label for the current interface language, or move it to the top
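The label auto-fill idea in the first bullet could start from something like the sketch below. It approximates the link-piping ("pipe trick") behaviour as I understand it: drop a trailing parenthetical, or, if there is none, drop everything from the first comma. The function name label_from_title is hypothetical, and namespace prefixes are deliberately not handled.

```python
import re

# A title followed by one trailing parenthetical, e.g. "Mercury (planet)".
_TRAILING_PAREN = re.compile(r"^(.+?)\s*\([^()]*\)$")


def label_from_title(title):
    """Derive a candidate label from an article title (hypothetical helper)."""
    title = title.strip()
    match = _TRAILING_PAREN.match(title)
    if match:
        return match.group(1)
    head, sep, _rest = title.partition(", ")
    # Only cut at the comma if something is left in front of it.
    return head if sep and head else title
```

Cutting at the comma suits place names such as "Braga, Portugal", but it would mangle titles where the comma is part of the name, so a gadget should offer the result as a suggestion rather than save it automatically.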

Musical chords

Goal: model musical chords in Wikidata.

Software data

  • Repology
    • See Comment by Repology's maintainer
    • See discussion in the property talk page
    • Automated report of outdated software versions in Wikidata
      • TODO: to connect this with one of the software version updater tools
    • Automated reports of packages missing in Wikidata that are in other repos: Arch, DistroWatch, etc.
      • TODO: convert these into a Mix'n'match catalog. Ideally weighted/filtered by number of (unrelated) repos?

Unix distro manifests

I.e. the set of packages that come pre-installed with (specific versions of) Unix-like operating systems (distros)

Listeria

Try replacing the table at pt:Prémio Camões#Premiados with Listeria, based on a query like this: https://w.wiki/LJB

Lexemes

See also Useful stuff § Lexemes below, for general information about these.

Fonts

Useful stuff

Assorted useful stuff

  • languages to skip on wikidata game: zh,ja,ru,uk,hu,ko,pl,tr,et,el,ar,bg,vi
  • languages to prefer on wikidata game: pt,gl,es,it,ro,fr
  • Narrowing down search results: To search for Wikidata items by their title on a given site, use Special:ItemByTitle.
  • According to Special:MyLanguageFallbackChain, the languages that appear in item pages are determined by the contents of the {{#babel}} box in the userpage.

Data model

OpenRefine

TABernacle

  • TABernacle: provide a list of items for the rows and a list of properties for the columns; the tool fills in the matrix, helps identify missing data, and lets you add it directly.
  • There are short descriptions at Wikidata:Tools/Query data and the Tools directory
  • Issues / needed improvements
    • No way to sort the table columns (e.g. to locate empty cells)

SPARQL queries

Query building interfaces

Notes:

  • Neither VizQuery nor Wikidata Query Builder allows combining conditions with OR
  • Neither VizQuery nor Wikidata Query Builder allows specifying non-property conditions (number of sitelinks, label/description, ...)

REST endpoint

Documentation

Introduction / general reference
Prefixes
  • General reference
    • Prefixes (wd:, wdt:, etc.) are used to qualify elements of a query (operators and operands) depending on their type (e.g. item, property, value, etc.)
    • About prefixes
    • Full list
  • List of prefixes
    • wd = Wikidata entity (e.g. ___)
      • wds = Wikidata statement (e.g. ___)
      • wdv = Wikidata value (e.g. ___)
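      • wdt = direct ("truthy") property value; roughly equivalent to p + ps as shown below, but it only returns best-rank statements
    • p = a property statement (e.g. ?item p:P123 ?prop.)
      • ps = prop/statement/: the simple value of a property statement (e.g. ?item p:P123 ?prop. ?prop ps:P123 ?propValue.)
        • psv = prop/statement/value/: the full value node of the statement; for quantities it carries the amount and the unit as written in the statement (wikibase:quantityAmount, wikibase:quantityUnit)
        • psn = prop/statement/value-normalized/: like psv, but the value node holds the quantity normalized to the base unit of the measured quantity
      • pq = prop/qualifier/: a qualifier for a property statement (e.g. ?item p:P123 ?prop. ?prop pq:P456 ?propQualifier.)
        • pqv = prop/qualifier/value/: the full value node of a qualifier (the qualifier counterpart of psv)
      • pr = prop/reference/: the simple value of a property inside a reference
        • prv = prop/reference/value/: the full value node of a reference property (the reference counterpart of psv)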
    • TODO: add examples above where missing
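To make the p / psv / psn chain concrete, here is a small sketch that assembles the triple patterns for reading a quantity together with its unit. The function name quantity_pattern and the variable names are hypothetical; the wikibase:quantityAmount and wikibase:quantityUnit predicates are the ones the query service exposes on value nodes.

```python
def quantity_pattern(pid, normalized=False, item_var="?item"):
    """Build triple patterns that read a quantity's amount and unit (hypothetical helper)."""
    # psn: points at the normalized value node, psv: at the value as entered.
    prefix = "psn" if normalized else "psv"
    return (
        f"{item_var} p:{pid} ?statement .\n"
        f"?statement {prefix}:{pid} ?valueNode .\n"
        "?valueNode wikibase:quantityAmount ?amount ;\n"
        "           wikibase:quantityUnit ?unit ."
    )


pattern = quantity_pattern("P123", normalized=True)
```

P123 is the same placeholder property used in the list above, not a recommendation. If a psn pattern returns nothing, it may be worth checking whether the statement's unit has a normalized form at all.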
Query syntax cheatsheet

(Condensed/edited from the excellent, but awfully verbose, Wikidata:SPARQL tutorial)

  • The core structure of any query is a semantic triple (subject, predicate, object).
    The "predicate" represents the relationship between subject and object, so I'll call it "relation" to make this clearer:
    • ?subject wdt:relation wd:object.
  • The object of one triple can be the subject of another triple, which allows building more complex queries:
    • ?nephew wdt:child ?father. ?father wdt:brother wd:uncle.
    • There are also two shorthands for this:
      • ?nephew wdt:child/wdt:brother wd:uncle. — using the path separator character / to chain predicates together, creating a "property path" from the subject to the object.
      • ?nephew wdt:child [ wdt:brother wd:uncle ]. — using [] to nest a partial triple, where the omitted part is the missing piece in the outer triple.
  • Use , to append another object to the previous triple, reusing both the subject and the predicate:
    • ?subject wdt:relation wd:object1, wd:object2.
  • Use ; to append a predicate-object to the previous triple's subject:
    • ?subject wdt:relation1 wd:object1;
      wdt:relation2 wd:object2.
  • Predicates can be combined using regex-like syntax:
    • Use the regex-like quantifiers *, + and ? to represent how many times a predicate appears in the query:
      • ?descendant wdt:child+ ?ancestor.
    • The two constructs above are commonly used to specify the notion "instance of X or of any subclass of X":
      • ?subject wdt:P31/wdt:P279* ?object.
    • As in regex, () groups expressions.
    • As in regex, | means OR:
      • ?itemA wdt:relation1|wdt:relation2 wd:itemB.
      • Note that this is not an OR for entire triples, but for parts thereof!
      • Parentheses may be needed to mark the limits of the OR expression: ?itemA (wdt:prop1|wdt:prop2)/wdt:prop3 wd:itemB.
  • Sorting results
    • Add ORDER BY ?fooBar after the closing } of the SELECT statement
  • Negative assertions
    • MINUS { ?item wdt:P3999 ?closure_date }
    • FILTER NOT EXISTS { ?item wdt:P3999 ?closure_date }
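    • Both of the above work here, because the inner pattern shares ?item with the rest of the query. They differ when no variable is shared: MINUS then removes nothing, while FILTER NOT EXISTS still tests the pattern.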
    • Remove specific items
      • FILTER(?item != wd:Q12345) for a single item
      • FILTER(?item NOT IN (wd:Q123,wd:Q456,wd:Q789)) for multiple items
  • Optional assertions
    • OPTIONAL { ?city wdt:P1082 ?population. }
  • More useful info: https://www.slideshare.net/LeeFeigenbaum/sparql-cheat-sheet
    • UNION / MINUS (slide 8)
    • literal values (strings, numbers, ...)
    • comparison operators (!, &&, ||, <, =, !=, ...)
    • more predicate path operators (^, !, ...)
    • underspecified triples (e.g. two or even 3 variables)
  • Wikidata-specific helpers
    • label and description
      • Include SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      • For a variable ?foo representing an item, that automatically binds its label to ?fooLabel, and its description to ?fooDescription
      • To add custom names for the label variables (other than ?fooLabel), use rdfs:label:
        SERVICE wikibase:label {
          bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
          ?variableName rdfs:label ?customLabel .
        }
    • Associated Wikipedia article ("sitelink")
    • lexeme representations (lemmas)
      • ?item wikibase:lemma ?lemma.
      • Filter lemmas by plain string matching: FILTER (str(?lemma) = "foobar")
      • Filter lemmas by regex matching: FILTER (regex(?lemma, '^foobar$'))
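Several of the constructs in this cheatsheet tend to appear together, so here is a minimal sketch that assembles them into one query: "instance of X or of any subclass of X", a negative assertion, an exclusion list, the label service and ORDER BY. The function name build_query and its parameters are hypothetical, and Q12345 is the same placeholder used above; only P3999 is taken from the examples.

```python
def build_query(class_qid, absent_pid, excluded_qids=()):
    """Assemble a SPARQL query from the cheatsheet's building blocks (hypothetical helper)."""
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        # Property path: instance of the class or of any of its subclasses.
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .",
        # Negative assertion: keep only items lacking this property.
        f"  FILTER NOT EXISTS {{ ?item wdt:{absent_pid} ?value . }}",
    ]
    if excluded_qids:
        excluded = ", ".join(f"wd:{qid}" for qid in excluded_qids)
        lines.append(f"  FILTER(?item NOT IN ({excluded}))")
    lines += [
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }',
        "}",
        "ORDER BY ?itemLabel",
    ]
    return "\n".join(lines)


query = build_query("Q12345", "P3999", excluded_qids=["Q123", "Q456"])
```

The generated text can be pasted into the query service as is; the ORDER BY clause sits after the closing brace, as described in the sorting note.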

Example queries

Tools to work with scholarly works

aka academic publications (scientific papers, theses / dissertations, books, etc.)

Books

QuickStatements reference

  • Help:QuickStatements
  • QuickStatements v1 (deprecated)
  • QuickStatements v2 (recommended)
    • Can import commands in the v1 format
    • The CSV format is pretty straightforward (and actually easier to author than v1):
      • cells are comma-separated instead of tab-separated
      • there's a header row, which allows avoiding the repetition of the same prefixes in every row (thus halving the number of fields per row)
      • Queries in the query service can be tweaked to produce near-ready QuickStatements commands; see e.g. these steps to remove the "NAME" prefix from the label of exoplanets.
    • The batch mode ("run in background") doesn't seem too reliable; I got some errors, but then wasn't able to see what they were
      • Update Nov 2021: still getting some "no API success flag set" errors. Better just stick to the sequential mode.

Examples

Example 1 → RescueTime (Q34637733): software version identifier (P348) = "2.12.5.1503"; publication date (P577) = +2017-06-09T00:00:00Z/11; platform (P400) = Microsoft Windows (Q1406); version type (P548) = stable version (Q2804309) (others listed here); reference URL (P854) = "https://www.rescuetime.com/updates/win_release_notes.html"; title (P1476) = "RescueTime for Windows Release Notes" (English).

Q34637733	P348	"2.12.5.1503"	P577	+2017-06-09T00:00:00Z/11	P400	Q1406	P548	Q2804309	S854	"https://www.rescuetime.com/updates/win_release_notes.html"	S1476	en:"RescueTime for Windows Release Notes"

Simplified template:

<item>	P348	"<version number>"	P577	+<date>T00:00:00Z/11	S854	"<url>"	S1476	en:"<title>"

Observations:

  • Note how source (reference) properties must be provided using the nonstandard "S" prefix — so "S854" instead of "P854".
  • Note that the whitespace characters are tabs, not spaces
  • Note that timestamps must have zero time
  • Note that the reference title requires a language specifier, here indicated by the en: prefix.
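Since tabs and quoting are easy to get wrong by hand, a small generator may help. The sketch below rebuilds the Example 1 command from its parts; the helper names (qs_string, qs_date, qs_line) are hypothetical, and the default precision of 11 simply mirrors the examples on this page.

```python
def qs_string(text):
    """Wrap a string value in the double quotes QuickStatements v1 expects."""
    return f'"{text}"'


def qs_date(iso_date, precision=11):
    """Format a YYYY-MM-DD date with zero time and the given precision suffix."""
    return f"+{iso_date}T00:00:00Z/{precision}"


def qs_line(item, *pairs):
    """Join an item and its (property, value) pairs with tabs."""
    fields = [item]
    for prop, value in pairs:
        fields += [prop, value]
    return "\t".join(fields)


line = qs_line(
    "Q34637733",
    ("P348", qs_string("2.12.5.1503")),
    ("P577", qs_date("2017-06-09")),
    ("P400", "Q1406"),
    ("P548", "Q2804309"),
    # Reference properties use the S prefix instead of P.
    ("S854", qs_string("https://www.rescuetime.com/updates/win_release_notes.html")),
    ("S1476", "en:" + qs_string("RescueTime for Windows Release Notes")),
)
```

Printing line gives a command ready to paste into the v1 input box, with real tab characters between the fields.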

Example 2 → (TODO: human-readable translation)

CREATE
LAST	Len	"Buying Lumber"
LAST	Den	"song from the soundtrack of the 2000 game The Sims"
LAST	P361	Q7764364	P1545	"4"	P2047	306U11574
LAST	P31	Q217199
LAST	P86	Q943225
CREATE
LAST	Len	"Mall Rat"
LAST	Den	"song from the soundtrack of the 2000 game The Sims"
LAST	P361	Q7764364	P1545	"5"	P2047	164U11574
LAST	P31	Q217199
LAST	P86	Q943225

Observations:

  • Note the usage of CREATE and LAST directives, since we're creating new items, rather than adding statements to an existing item
  • Note the Len and Den, for the English label and description
  • Note how each line can only contain a single statement triplet, but a given statement (e.g. part of (P361)) can have any number of properties/qualifiers.
  • Note how the duration (P2047) is provided in seconds, with the unit written as U11574, i.e. the item second (Q11574) with the Q replaced by U.
  • Note how the number for series ordinal (P1545) is provided as a string, even though a plain number should work as a quantity, according to the docs ("unit is optional")

Example 3 → (TODO: human-readable translation)

Q2986828	P348	"CLDR 30.0.1"	S854	"http://cldr.unicode.org/index/downloads/cldr-30#TOC-CLDR-30.0.1-Maintenance-Release"	S1476	en:"CLDR 30 Release Note"	S958	"CLDR 30.0.1 Maintenance Release"

Example 4 → (TODO: human-readable translation)

Q839063	P1324	"http://git.savannah.gnu.org/cgit/oddmuse.git/"	P8423	Q186055

Example 5 → Add software versions, release dates and reference URLs

qid,P348,qal577,S854
Q109462071,"""v0.0.3""",+2020-05-17T00:00:00Z/11,"""https://github.com/bigskysoftware/htmx/releases/tag/v0.0.3"""
Q109462071,"""v0.0.4""",+2020-05-26T00:00:00Z/11,"""https://github.com/bigskysoftware/htmx/releases/tag/v0.0.4"""
Q109462071,"""v0.0.5""",+2020-06-19T00:00:00Z/11,"""https://github.com/bigskysoftware/htmx/releases/tag/v0.0.5"""

Observations:

  • Note the usage of triple quotes for string values
  • Note the same clunky v1 syntax for dates
  • Other than that, this is actually quite an improvement: less repetition, and no reliance on tabs
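The triple quotes do not need to be typed by hand: they are what standard CSV escaping produces when a field itself contains double quotes. A minimal sketch using Python's csv module, assuming the same header as the example above (the function name qs_csv is hypothetical):

```python
import csv
import io


def qs_csv(rows, header=("qid", "P348", "qal577", "S854")):
    """Render (qid, version, date, url) tuples as QuickStatements v2 CSV (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for qid, version, iso_date, url in rows:
        # String values carry their own quotes; csv doubles them and wraps the field.
        writer.writerow([qid, f'"{version}"', f"+{iso_date}T00:00:00Z/11", f'"{url}"'])
    return buffer.getvalue()


output = qs_csv([
    ("Q109462071", "v0.0.3", "2020-05-17",
     "https://github.com/bigskysoftware/htmx/releases/tag/v0.0.3"),
])
```

The date field contains no commas or quotes, so the writer leaves it unquoted, matching the rows shown in the example.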

Lexemes

FAQ

TODO: Create a quickstart / FAQ / examples page in Wikidata:Lexicographical data. See also Wikidata:Lexicographical data/Glossary (which isn't linked from the main page, for some reason)

  • What are lexemes?
    • words, phrases/expressions, prefixes, acronyms, etc.
  • Wikidata vs. Wiktionary
    • User:Rua/Wikidata for Wiktionarians
    • In Wiktionary each page contains all homographs of a word, with sections for each language, and subsections for each lexical category (verb, noun, etc.)
    • In Wikidata, each Lexeme page contains the homographs that share the same spelling+language+grammatical class (verb, noun, etc.)
      • The same Wikidata Lexeme page groups the different forms of the same word, e.g. "houses" is represented as a form in the house (noun) lexeme
      • Words that are spelled the same but belong to different languages are placed in different Lexeme pages. These homographs in other languages can be connected via homograph lexeme (P5402)
      • Additionally, there can be the same word spelled in alternative ways (e.g. loiça/louça). These can be connected via alternative form (P8530) (or probably synonym (P5973), for those that aren't similar, e.g. cruzeta/cabide)
  • Lexemes (L...) vs. items (Q...)
    • A lexeme has statements that describe the word
    • An item has statements that describe the concept
  • Senses and forms
  • Spelling variations
    • A lexeme can have different representations in different spelling variants (e.g. color vs. colour in en-us and en-gb).
    • These spelling variants need to have an official language code assigned, so things like Sampadjudu (Q2217638) and Sal Creole (Q18707467) can't be used.
    • A poor man's spelling variant can be done with separate lexemes connected via alternative form (P8530) (in the Forms section of the lexeme)

Tools

Data model

  • Lexeme (diagram)
    1. Top level
      1. Lemma
        • e.g. "run"
      2. Language
      3. Lexical category
      4. Statements (properties of the lexeme that are not specific to a Form or Sense). E.g. derived from, region, period, homonym, etc.
    2. Forms (i.e. inflections)
      • string representations of variants per gender, number, conjugation, etc.
      • one for each combination, tagged with the relevant qualifiers/properties (e.g. 2nd person, singular, past tense...)
      • Representation
      • Grammatical features
      • e.g. (TODO)
    3. Senses
      • string representations of different meanings
      • link to items for the actual concepts
      • e.g. the lexeme "bank" (English noun) would have the senses "financial institution" and "edge of a body of water"
      • Gloss

Also:

Problems

  • The creation form shows a "language variant" field when the entered language is not recognized.
    • See Help:Monolingual text languages and the tracking ticket phab:T144272.
    • For some reason the list of languages is restricted to the one approved by the Language committee for new Wikipedias, rather than e.g. the full list of languages from CLDR
    • To request a language to be supported, a new Phabricator task needs to be created in the same model as one of the child tasks of the one linked above
    • As a workaround, new lexemes can still be created by using the mis language code, as described in the help page linked above.
    • For Kabuverdianu, the code is actually already available/linked (since phab:T127435), but the extra field was appearing nonetheless in the creation form; possibly that was due to a "no value" value for the language code, which was just removed in this edit.
      • TODO: if the problem persists, maybe a new Phabricator issue needs to be created.
      • Update Jun 2021: the problem still occurs — may be related to phab:T284870?

Question-answering

Tools:

See also:

Benchmarks:

Maybe there should be a completeness/coverage/parity dashboard, similar to w:Wikipedia:WikiProject Missing encyclopedic articles, that maps how much of the Wolfram Language can be modeled in terms of properties/qualifiers.

Mini-bios

Thanks to the Wikidata Game, it will be possible to move quickly to a state where Wikidata will have all the information needed to build automated mini-bios in the form

<label> (<place of birth>, <date of birth> – <place of death>, <date of death>) was a <country of citizenship> <occupation> who <description>.
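The template above can be sketched as a plain formatting function. The function name mini_bio is hypothetical and the sample values are made up for illustration; a real version would pull each field from the corresponding Wikidata statements and would need to handle living people and the a/an distinction.

```python
def mini_bio(label, birth_place, birth_date, death_place, death_date,
             citizenship, occupation, description):
    """Fill the mini-bio template with already-resolved labels (hypothetical helper)."""
    return (
        f"{label} ({birth_place}, {birth_date} \u2013 {death_place}, {death_date}) "
        f"was a {citizenship} {occupation} who {description}."
    )


# Entirely fictitious sample data.
bio = mini_bio(
    "Ana Exemplo", "Praia", "1 January 1900", "Lisbon", "31 December 1980",
    "Cape Verdean", "poet", "wrote in both Portuguese and Kriolu",
)
```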

In fact, the description field for people in Wikidata should probably forgo occupation and nationality, and go straight to their claim to notability, since the former are redundant with the corresponding fields.

This proposal was originally posted here.

Related resources