User:Christopher Johnson (WMDE)/Wikidata:Glossary


Wikidata is a knowledge base that anyone can edit. Before you get started, it is a good idea to familiarize yourself with the Wikidata Glossary, so that editors can "speak the same language". We hope that this will help to improve discussion and communication among editors.

The Glossary is ordered conceptually rather than alphabetically, with the more general concepts presented first wherever possible. This is because it is automatically translated into several languages, and the concepts have different names in different languages, so an alphabetical order would differ from one language to the next. In some cases it is not obvious how to organize the entries; in these cases, "see also" has been added to the appropriate section.

Basic terms

Data

Data (Q42848) is information recorded in some kind of code. Wikidata is essentially a collection of structured data, or database content. This data generally comprises everything entered by Wikidata editors and bots through the entity pages and the public programming interface. The wiki pages from which a user can view and enter data are organized into three data namespaces:

  1. the main namespace (for items), grouping pages in which we can view and enter information about a specific entity;
  2. the property namespace, in which we can view information about properties, which are used to structure the information we enter into statements; and
  3. the query namespace, in which we can define additional ways, beyond the main namespace, to extract and display the information.

The data in those namespaces is said to be structured because the Wikibase software ensures that it is all organized according to a certain data model (Q1172480), and because the community decides on and enforces correct ways to enter information.

Other Wikidata pages are conventional wiki pages consisting of unstructured or semi-structured data (Q2336004), for example running text or wikitext. These are meta pages, such as community discussion pages.

An especially important kind of data is property data. A property data value is a value associated with a property to build a claim, the basic organizational unit of the structured data. Each property is associated with a datatype, which defines the values that can be used in claims built with that property.

Dataset

A dataset is, in general, any collection of (structured) data.

In Wikidata, a dataset is often associated with an entity: the dataset of an entity is all the information shown on the entity's wiki page (the set of statements in the database that have this entity as their subject, the links to articles describing this entity on Wikimedia projects, and so on).

Other datasets can be built by combining the datasets of several entities.

Datasets can be represented in different ways: on their entity wiki page for human readers, or as an XML or JSON file for bots and other programs. In the Wikidata user interface messages specifically, "dataset" refers to the data associated with an entity (an item, a property or a query).
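As an illustration of the JSON form, the sketch below reads an abbreviated, hand-written sample shaped like the output of Special:EntityData. The field names (labels, descriptions, aliases, claims, sitelinks) are assumed from that format, and real documents carry many more fields; the helper summarize_dataset is hypothetical and simply pulls out the parts of an entity's dataset that its page displays.

```python
import json

# Abbreviated, hand-written sample shaped like Special:EntityData JSON output.
# Real documents contain many more fields, languages and statements.
ENTITY_JSON = """
{"entities": {"Q42": {
  "id": "Q42", "type": "item",
  "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
  "descriptions": {"en": {"language": "en", "value": "English writer and humorist"}},
  "aliases": {"en": [{"language": "en", "value": "Douglas Noel Adams"}]},
  "claims": {"P31": [{
    "type": "statement", "rank": "normal",
    "mainsnak": {"snaktype": "value", "property": "P31", "datatype": "wikibase-item",
                 "datavalue": {"type": "wikibase-entityid",
                               "value": {"entity-type": "item", "numeric-id": 5, "id": "Q5"}}}}]},
  "sitelinks": {"enwiki": {"site": "enwiki", "title": "Douglas Adams", "badges": []}}
}}}
"""

def summarize_dataset(raw: str, entity_id: str, lang: str = "en") -> dict:
    """Collect the parts of an entity's dataset that its page displays (hypothetical helper)."""
    entity = json.loads(raw)["entities"][entity_id]
    return {
        "label": entity["labels"][lang]["value"],
        "description": entity["descriptions"][lang]["value"],
        "aliases": [a["value"] for a in entity["aliases"].get(lang, [])],
        "properties": sorted(entity["claims"]),  # property IDs used in statements
        "sitelinks": {site: link["title"] for site, link in entity["sitelinks"].items()},
    }

summary = summarize_dataset(ENTITY_JSON, "Q42")
```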

Dereferenceable URIs

These are used during content negotiation to supply a description of a resource even when it is the entity itself that is addressed. Content negotiation also makes it possible to supply either a human-readable description or a machine-readable one (RDF data), depending on which is more suitable for the client. The content that the dereferenced URIs point to is available through the page Special:EntityData.
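The idea of content negotiation can be sketched as a function that maps an entity URI and an HTTP Accept header to the document that would be served. The URL patterns (http://www.wikidata.org/entity/ID as the concept URI, Special:EntityData/ID.ext for the data) and the small MIME-type table are assumptions for illustration; the function negotiate is hypothetical and, for brevity, ignores q-weights.

```python
# Hypothetical sketch of content negotiation for a Wikidata entity URI.
# Assumed URL patterns: concept URI http://www.wikidata.org/entity/<ID>,
# data at Special:EntityData/<ID>.<ext>, human-readable page at /wiki/<title>.
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/"
WIKI = "https://www.wikidata.org/wiki/"

# Illustrative subset of machine-readable formats: MIME type -> file extension.
FORMATS = {
    "application/json": "json",
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
}

def negotiate(entity_uri: str, accept: str) -> str:
    """Return the URL a client asking with this Accept header would be sent to."""
    entity_id = entity_uri.rstrip("/").rsplit("/", 1)[-1]
    # Media ranges are comma-separated and may carry ";q=" weights; for
    # simplicity, weights are ignored and the first supported type wins.
    for media_range in accept.split(","):
        mime = media_range.split(";")[0].strip().lower()
        if mime in FORMATS:
            return f"{ENTITY_DATA}{entity_id}.{FORMATS[mime]}"
    # No machine-readable type requested (e.g. a browser): use the entity page.
    title = f"Property:{entity_id}" if entity_id.startswith("P") else entity_id
    return WIKI + title
```

A browser sending text/html ends up on the ordinary entity page, while an RDF client asking for text/turtle receives the Turtle serialization.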

Export

This refers to the way data and meta page content from Wikidata are made available for further consumption. The intention is to make machine-readable exports of the data available in widely used formats such as JSON or RDF/XML.

Linked data

This is a method for publishing structured data so that it can be interlinked and become more useful. It is closely related to how Wikidata works: entities are connected to one another, and data is attached to them on linked data pages, as Wikidata does for items.

List of pages linked to an item

See Sitelink

Initial Wikidata development

This refers to the work done by the initial Wikidata development team in the April 2012 to March 2013 timeframe, as described in the technical proposal.

Ontology

This is an explicit and formal specification of a conceptualization. It is important that an ontology convey a shared understanding of a domain. In Wikidata, this is expressed by using the properties in statements to describe external entities (represented by internal items) in relation to objects and literal data.

Provenance

This is the chronology of the ownership of the data, or of its contributors (as would be the case in a crowdsourced project), together with where the data originated.

Vocabulary

This is the set of terms that is used to describe the ontology. The terms used in one vocabulary can be the same as (owl:sameAs) some terms from another vocabulary. Sameness is more strict than equality.

Wikibase

This is the software behind Wikidata. It consists of three MediaWiki extensions: Wikibase, Wikibase client, and WikibaseLib:

  1. The Wikibase extension (for the Wikidata server, often called repository or just repo) allows a dedicated MediaWiki installation to collect and maintain structured data and is used on the Wikidata website.
  2. The Wikibase client extension (often called just client) enables MediaWiki installations such as the Wikipedias to query and display data from a Wikidata server on their own pages. It will be deployed on Wikipedias in different languages, and probably on other sites.
  3. The WikibaseLib extension contains libraries shared by both of the major extensions.

Wikidata

This Wikimedia project runs an instance of MediaWiki with the Wikibase extensions. It allows Wikidata editors to enter data and browse pages.

Wikidata editors

These are users of Wikidata who create and collaboratively maintain its content. Together, they are part of the Wikidata community. Anyone can be a Wikidata editor.


Pages

Page

An internal or external web page with a unique title, for example an article in the Wikipedia main namespace or an item in the Wikidata main namespace. In Wikidata, the term "page" may refer to an item or property page in the data namespaces, a meta page in other namespaces, or an external page on Wikipedia, another Wikimedia site or another external site that is referenced using a sitelink. Pages in the main namespace of Wikidata are about items, and one page can hold only one item.

Meta pages

These are all pages that are not entities, that is, pages that do not belong to the data namespaces. Wikidata meta pages contain unstructured content written in conventional MediaWiki markup, and possibly also future Wikidata client-side inclusion code. Examples are talk pages, category pages, project pages (in the Wikidata namespace) and help pages (in the Help namespace). Meta pages also include content and data generated automatically by the MediaWiki software (for example, the edit history of a page, or special pages).

Namespace

A division of pages in MediaWiki that groups them according to their overall use or some additional behavior. Examples are namespaces for categories, files and users, and, in the case of Wikidata, three data namespaces: items (in the main namespace), properties and queries. See the list of namespaces.

Mainspace

This is the namespace where all items are located. It is distinguished by its lack of a prefix.

Title

This is the name of an external linked page (known as the Sitelink-title), the name of a meta page, or the entity ID of an entity page. If the page does not belong to the main namespace, the title includes the namespace prefix (namespace:id).

For items, properties and queries, the Wikidata entity title is an identifier containing the namespace prefix (if any), followed by a letter and a numeric ID. Example titles are Property:P17 for a property and Q6256 for an item. The page URL consists of www.wikidata.org/wiki/ followed by the title. In search results, the localized label (also known as name) is presented, followed by the identifier in parentheses (without the namespace prefix) and by the description, to make the overall string more readable. For example, if you search for "country" using the Special:Search interface, the search result will include the property "country (P17): sovereign state of this item", as well as the item "country (Q6256), region legally identified as a distinct entity in political geography".
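A small sketch can make the relationship between entity IDs, titles, URLs and search-result strings concrete. The function names are hypothetical, and handling only the Q and P prefixes (queries were not yet available) is a simplifying assumption.

```python
import re

# Hypothetical helpers illustrating the title rules described above.
# Assumption: only items (Q, main namespace) and properties (P, "Property:")
# are handled; queries were not yet available.
NAMESPACE_PREFIX = {"Q": "", "P": "Property:"}

def entity_title(entity_id: str) -> str:
    """Page title for an entity ID, e.g. 'P17' -> 'Property:P17'."""
    match = re.fullmatch(r"([QP])[1-9][0-9]*", entity_id)
    if match is None:
        raise ValueError(f"not an item or property ID: {entity_id!r}")
    return NAMESPACE_PREFIX[match.group(1)] + entity_id

def page_url(entity_id: str) -> str:
    """The page URL is www.wikidata.org/wiki/ followed by the title."""
    return "https://www.wikidata.org/wiki/" + entity_title(entity_id)

def search_result(label: str, entity_id: str, description: str) -> str:
    """Localized label, then the ID without namespace prefix, then the description."""
    return f"{label} ({entity_id}): {description}"
```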

When used for sitelinks, the title is a canonical string that identifies a page on an external site. The Special:ItemByTitle interface can be used to find the item for a page, given the page's title on a given Wikipedia. Together, the site and the title form the complete sitelink. During validation, the title string goes through a normalization procedure, so that in the end the title is the external site's canonical page name. Only after the normalization is completed and site-specific constraints are satisfied can a new sitelink be stored.
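The normalization step can be approximated as follows. This normalize_title function is a deliberately simplified, hypothetical sketch that assumes only the most common MediaWiki rules (underscores are equivalent to spaces, whitespace is collapsed and trimmed, and the first letter is upper-cased on sites configured that way); real normalization also resolves namespaces, Unicode details and other site-specific rules.

```python
def normalize_title(raw: str, capitalize_first: bool = True) -> str:
    """Simplified MediaWiki-style title normalization (illustrative only)."""
    # Treat underscores as spaces, collapse whitespace runs, trim both ends.
    title = " ".join(raw.replace("_", " ").split())
    # Many sites upper-case only the first character; the rest keeps its case.
    if capitalize_first and title:
        title = title[0].upper() + title[1:]
    return title
```

For example, "  douglas__Adams " normalizes to "Douglas Adams", while a site without first-letter capitalization would keep "word_form" as "word form".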

When used for a meta page in a non-entity namespace, the title is spelled out as is and identifies the meta page. The namespace is normally prefixed to the string, and also to the URL. An example title is Wikidata:Glossary.

Language attributes

Language attributes

These are the language-specific labels, aliases and descriptions that are assigned to items, properties and queries. They are human-readable texts that help readers understand the scope of the item, for example the specific type of real-world entity it represents. If some of them are missing, they can be replaced by strings from alternative languages, following the language fallback chains.

Language fallbacks (language chains)

These are methods for systematically replacing missing language attributes with strings from alternative languages. The exact replacement rules can be chosen depending on the type of page, whether the user is logged in, or the user's preferred languages.
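A fallback chain can be modelled as an ordered list of language codes that is walked until a term is found. The function resolve_term is hypothetical, and the chain shown is an illustrative example; the actual chains Wikidata uses depend on the language and the user settings mentioned above.

```python
from __future__ import annotations

def resolve_term(terms: dict[str, str], chain: list[str]) -> tuple[str, str] | None:
    """Return (language, text) for the first language in the fallback chain
    that has a term, or None if none of them do."""
    for lang in chain:
        if lang in terms:
            return lang, terms[lang]
    return None

# Hypothetical chain for a reader of Austrian German: de-at, then de, then en.
labels = {"de": "Wien", "en": "Vienna"}
chosen = resolve_term(labels, ["de-at", "de", "en"])
```

Returning the language along with the text lets an interface mark a fallback string as coming from another language.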

Label

Also known as name (not to be confused with title), this is a language-specific name used for items, properties and queries. It is usually the most important name under which the entry is known, or the most general or easily understandable phrase by which it will be known within the project. Within Wikidata, the label takes the role that the title has in Wikipedia and is the primary means of distinguishing entries. For items, it does not need to be unique, either within the language or across the overall project, but it must be unique in combination with the description. For properties and queries (queries are not defined yet), it must be unique within the given language. Uniqueness of the combination of label and description is a hard constraint that must be satisfied before a change can be saved, although it may be removed in the future.
Labels should follow the language-specific conventions for capitalizing proper names and phrases, as fits the specific entry. In listings, the label is followed by the description so that the two join into a single list entry. Both labels and descriptions can be extracted and used independently.

See Help:Label.

Description

This is a language-specific descriptive phrase for an item, property or query. It provides context for the label (for example, there are many items about places with the label "Cambridge"). The description does not need to be unique on its own, either within a language or across the overall project, but it must be unique in combination with the label. Uniqueness of the combination of label and description is a hard constraint that must be satisfied before a change can be saved.

See Help:Description for more information, including proper styling of descriptions.
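The label-plus-description constraint can be illustrated by a check that groups entities by the pair (label, description) in one language. The item IDs below are made-up placeholders, and find_conflicts is a hypothetical helper, not part of Wikibase.

```python
from __future__ import annotations

def find_conflicts(terms_by_entity: dict[str, dict[str, tuple[str, str]]],
                   lang: str) -> list[tuple[str, str]]:
    """Return (first_id, clashing_id) pairs whose label AND description are
    identical in `lang`. Sharing only a label is allowed."""
    seen: dict[tuple[str, str], str] = {}
    conflicts: list[tuple[str, str]] = []
    for entity_id, terms in terms_by_entity.items():
        if lang not in terms:
            continue
        key = terms[lang]  # (label, description)
        if key in seen:
            conflicts.append((seen[key], entity_id))
        else:
            seen[key] = entity_id
    return conflicts

# Placeholder IDs: all three share the label "Cambridge", but only the third
# entry repeats both the label and the description of the first.
places = {
    "Q1001": {"en": ("Cambridge", "city in England")},
    "Q1002": {"en": ("Cambridge", "city in Massachusetts, United States")},
    "Q1003": {"en": ("Cambridge", "city in England")},
}
```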

Aliases

These are marked as "Also known as" in the user interface. They are language-specific alternative names for items, properties and queries that can be used for lookup in the same way as labels. Unlike labels, there can be as many aliases per language as necessary.

See Help:Aliases.
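Because aliases work like labels for lookup, a search index can treat them the same way. The sketch below is hypothetical (the data layout, the placeholder ID and the case-insensitive matching are assumptions) and builds a name-to-IDs index for one language.

```python
from __future__ import annotations
from collections import defaultdict

def build_lookup(entities: dict[str, dict], lang: str) -> dict[str, set[str]]:
    """Index entity IDs under their label and every alias in one language.
    Keys are case-folded so lookups ignore capitalization (an assumption)."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for entity_id, terms in entities.items():
        names = list(terms.get("aliases", {}).get(lang, []))
        label = terms.get("labels", {}).get(lang)
        if label is not None:
            names.append(label)
        for name in names:
            index[name.casefold()].add(entity_id)
    return dict(index)

# Placeholder ID; one label but any number of aliases per language.
people = {"Q1001": {"labels": {"en": "Douglas Adams"},
                    "aliases": {"en": ["Douglas Noël Adams", "DNA"]}}}
lookup = build_lookup(people, "en")
```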

Entities, Items, Properties and Queries

Entity

(in the Wikidata user interface messages sometimes called data set) is the data content of a Wikidata page, which may be either an item (in the main namespace), a property (in the property namespace) or a query (in the query namespace). Every entity is uniquely identified by an entity ID, which is a prefixed number: for example, Q for an item, P for a property and U for a query. An entity is also identified by a unique combination of label and description in each language. The entity can also be assigned a set of alternative multilingual aliases. (In ontologies and library catalogues that are used as references for Wikidata, an entity is typically a real-life topic or subject, or its database representation, and corresponds in that context to what Wikidata calls an item.)

Item

(in some languages translated to words for subject, object or element in the user interface) is a page in the Wikidata main namespace that represents a real-life topic, concept, or subject. Items are identified by a prefixed ID, by a sitelink to an external page, or by a unique combination of multilingual label and description. Items may also have aliases to ease lookup. The main data part of an item is the list of statements about the item. An item can be viewed as the subject part of a triplet in linked data.

Property

(in some languages translated to attribute) is the descriptor for a data value, or some other relation or composite or possibly missing value, but not the data value or values themselves. Each statement on an item page links to a property, and assigns the property one or several values, or some other relation or composite or possibly missing value. The property is stored on a page in the Property namespace, and includes a declaration of the datatype for the property values. Compared to linked data, the property represents a triplet's predicate. New properties are suggested and documented at Wikidata:Property proposal before creation. After creation, the documentation is placed on the corresponding Property talk page, where the usage of the property can be further discussed. All properties are manually listed at Wikidata:List of properties (WD:P). Most properties can be mapped to Wikipedia infobox parameters and categories; see the property documentation, as well as Wikidata:Infoboxes task force (Wikidata phase II) and various subject task force pages. The inclusion of property values in Wikipedia infoboxes (using inclusion syntax) is done for each infobox and Wikipedia version individually.

Query

(future feature) is a predefined search across items. A query is the descriptor for the predefined search, not the hits generated by the search. A query can be executed to acquire search results, which may be useful for automatically generating and translating list articles. See Wikidata:Lists task force (Wikidata phase III). Each query is an entity, described and defined on its own page, and has its own prefixed identifier with the prefix Query:U. The query engine is not ready yet, as shown by the <no-query-yet> message received when attempting to create a new query.

Claims and Statements

 
Elements of a statement
Claim

is a piece of data about the entity on whose page the claim appears. A claim consists of a property (such as "Location") and a value (e.g., "Germany"), or some other relation or composite or missing value. A claim can have qualifiers, such as temporal qualifiers saying that the claim is valid within a specific time frame. Compared to the triplets used in linked data, a claim uses a property to express the predicate of a triplet and a value to express the object of a triplet. Claims form part of statements on item pages, where they can be augmented with references and ranks; they can also occur on non-item data pages.

Statement

is a piece of data about an item, recorded on the item's page. A statement consists of a claim (a property-value pair such as "Location: Germany", together with optional qualifiers), augmented by optional references (giving the source for the claim) and an optional rank (used to distinguish between several claims containing the same property). Wikidata makes no assumptions about the correctness of statements, but merely collects and reports them with a reference to a source. See Data model and Help:Statements.
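The nesting described here (a claim inside a statement, with references and a rank around it) can be sketched as two Python dataclasses. The class and field names are hypothetical simplifications, not Wikibase's own classes. P17 (country) and Q183 (Germany) serve as property and value, and P854 (reference URL) with an example.org address stands in for a source.

```python
from __future__ import annotations
from dataclasses import dataclass, field

RANKS = ("preferred", "normal", "deprecated")

@dataclass
class Claim:
    """A property-value pair plus optional qualifiers (simplified)."""
    property_id: str                     # e.g. "P17"
    value: object                        # e.g. the item ID "Q183"
    qualifiers: dict[str, list[object]] = field(default_factory=dict)

@dataclass
class Statement:
    """A claim augmented by optional references and a rank."""
    claim: Claim
    references: list[dict[str, object]] = field(default_factory=list)
    rank: str = "normal"

    def __post_init__(self) -> None:
        if self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank!r}")

    def is_sourced(self) -> bool:
        """True if at least one reference backs the claim."""
        return bool(self.references)

# "country: Germany", with a placeholder reference URL as its source.
stmt = Statement(Claim("P17", "Q183"),
                 references=[{"P854": "https://example.org/source"}])
```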

Values

(or datavalues) are the information pieces embedded in each claim. Depending on their datatype, they can be a single value (like a number) or a value consisting of several parts (like a geographical position with longitude and latitude).

 

No value is a marker used when the property is known to have no value for the entity. Lack of a value for a claim is very different from negation of a claim.

Unknown value (also called some value) is a marker used when the property has some value, but the exact value is not known. It means that nothing is known about the value except that it exists; it does not imply a negation of the claim.

Custom value is the marker used when the value of the property is known.

Snak

is a single, basic assertion in Wikidata, including property-value assertions, "no value" assertions, and others. Statements are composed of one or more snaks. Snaks are an integral part of the data model, but this term is normally not exposed to editors and users of Wikidata. For more information, see meta:Wikidata/Data_model#Snaks.
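In the JSON representation of an entity, each snak carries a snaktype field: value for a custom value, somevalue for an unknown value and novalue for no value, with a datavalue present only in the first case. The hypothetical describe_snak below dispatches on that field; the property IDs in the examples are arbitrary.

```python
SNAK_TYPES = ("value", "somevalue", "novalue")

def describe_snak(snak: dict) -> str:
    """Render a snak (a dict in Wikibase's JSON layout) as short text."""
    kind = snak["snaktype"]
    if kind not in SNAK_TYPES:
        raise ValueError(f"unknown snaktype: {kind!r}")
    prop = snak["property"]
    if kind == "novalue":
        return f"{prop}: no value"        # the property is known to have no value
    if kind == "somevalue":
        return f"{prop}: unknown value"   # some value exists but is not known
    return f"{prop}: {snak['datavalue']['value']}"  # custom (explicit) value
```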

Datatype

(data value type or value type) is the kind of data values that may be assigned to a property, and specifies how the data values are stored in each claim. Each property is assigned a pre-defined datatype. See also Special:ListDatatypes for currently available datatypes.
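The role of a datatype can be sketched as a lookup from property to datatype, followed by a validity check on each claim value. The identifiers string, wikibase-item and quantity follow the names Wikibase uses for these datatypes, but the validators, the one-entry property table and check_claim_value itself are simplified, hypothetical assumptions.

```python
import re

# Simplified, hypothetical validators keyed by datatype identifier.
VALIDATORS = {
    "string": lambda v: isinstance(v, str),
    "wikibase-item": lambda v: isinstance(v, str) and re.fullmatch(r"Q[1-9][0-9]*", v) is not None,
    "quantity": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}

# Each property declares exactly one datatype, e.g. P17 (country) takes items.
PROPERTY_DATATYPES = {"P17": "wikibase-item"}

def check_claim_value(property_id: str, value: object) -> bool:
    """Accept a value only if it fits the datatype declared by the property."""
    datatype = PROPERTY_DATATYPES[property_id]
    return VALIDATORS[datatype](value)
```

A free-text string such as "Germany" is rejected for P17 because the property expects a link to an item, not a string.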

String

(short for character string) is a general term for a sequence of freely chosen characters interpreted as text (e.g. "Hello"), as opposed to data interpreted as a numerical value (3.14), a link to an item (e.g. [[Q1234]]) or a more complex datatype (the set {1,3,5,7}). In addition to a string datatype, Wikidata will support language-specific texts ("monolingual-text" and "multilingual-text") as property values.

Qualifier

is a part of the claim that says something about the specific claim, often in a descriptive way. A qualifier might be a term according to a specific vocabulary but can also be a variant descriptive phrase (whether those terms or phrases are free text or part of some vocabulary would probably be up to the Wikidata community).

Rank

is a quality factor used for simple selection/filtering in cases where there are many statements for a given property (see Help:Ranking). There are three possible ranks:

Deprecated rank is used for a statement that contains information that may not be considered reliable or that is known to include errors. (For example, a statement that documents a wrong population figure that was published in some historic document. In this case the statement itself is not wrong, since the historic document given as a reference really made the erroneous claim, but the statement should not be used in most cases.)

Normal rank is used for a statement that contains relevant information that is believed to be correct, but may be too extensive to be shown by default. (For example, historic population figures for Berlin over the course of many years.)

Preferred rank is used for a statement with the most important and most up-to-date information. Such a statement will be shown to all users and will be displayed in Wikipedia infoboxes by default. (For example, the most recent population figures for Berlin.)
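The default display rule implied by the ranks can be sketched as a filter over the statements for one property. This is an assumed simplification (preferred statements if present, otherwise normal ones, never deprecated ones), not the exact rule of any particular infobox; best_statements is hypothetical, and the population values are placeholders rather than real figures for Berlin.

```python
def best_statements(statements: list) -> list:
    """Choose the statements shown by default for one property.

    Assumed rule: preferred statements if any exist, otherwise normal ones;
    deprecated statements are never chosen.
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]

# Made-up population statements for one city (values are placeholders).
population = [
    {"value": 100, "rank": "deprecated"},   # figure known to be wrong
    {"value": 200, "rank": "normal"},       # older figure
    {"value": 300, "rank": "preferred"},    # most recent figure
]
```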

Reference

(or source) describes the origin of a statement in Wikidata. A source is often an item in its own right; for example, a book. Wikidata does not aim to answer the question of whether a statement is correct, but merely whether the statement appears in a reference. What constitutes valid references is expected to be a question of debate among the Wikidata editors.

Sitelinks

Sitelink

(in the user interface called List of pages linked to this item) is an identification of a linked page or article on another Wikimedia site, such as a Wikipedia language version. It consists of a site identifier and a Sitelink-title (the article title), and is attached to an individual item in Wikidata. Sitelinks are used both for identifying an item from an external site and as a central storage of interwiki (interlanguage) links. Previously, interwiki links were stored locally in every Wikipedia version of each article and regularly synchronized by bots, which sometimes made them inconsistent. Wikidata phase I aimed at replacing these locally stored interwiki links with globally defined ones, stored as sitelinks in the Wikidata items. Sitelinks can have badges attached, which usually show that a page is a featured article or has a similar status. See Help:Sitelinks.

Site

is a reference to an external website in general, but in sitelinks it refers to specific registered wiki websites run by the Wikimedia Foundation, for example a Wikipedia language version, that can easily be linked. These sites are referenced by global site identifiers, or siteid for short. For example, the English Wikipedia's siteid is enwiki, while its interwiki code is en:. Usually the siteid consists of the subdomain of the registered site (such as the language code) followed by a project suffix (wiki for Wikipedia). Linking to such sites can have constraints. In the current setup, each external page can have only one link registered in Wikidata, and one item can have only one link to each external site.
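Both the siteid pattern and the linking constraints can be expressed in a few lines. make_site_id and SitelinkStore are hypothetical names; the store is an in-memory toy that only mirrors the two rules stated above (one item per external page, one page per site per item), and the item IDs are placeholders.

```python
from __future__ import annotations

def make_site_id(subdomain: str, project_suffix: str = "wiki") -> str:
    """Combine a subdomain and project suffix, e.g. 'en' + 'wiki' -> 'enwiki'."""
    return subdomain + project_suffix

class SitelinkStore:
    """Toy store enforcing the two sitelink constraints (hypothetical)."""

    def __init__(self) -> None:
        self.by_item: dict[str, dict[str, str]] = {}   # item ID -> {siteid: title}
        self.by_page: dict[tuple[str, str], str] = {}  # (siteid, title) -> item ID

    def add(self, item_id: str, site: str, title: str) -> None:
        # Constraint 1: an external page may be linked from only one item.
        owner = self.by_page.get((site, title))
        if owner is not None:
            raise ValueError(f"{site}:{title} is already linked from {owner}")
        # Constraint 2: an item may link to only one page per site.
        links = self.by_item.setdefault(item_id, {})
        if site in links:
            raise ValueError(f"{item_id} already links to a page on {site}")
        links[site] = title
        self.by_page[(site, title)] = item_id
```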

Badges

(future feature) is a kind of marker attached to a sitelink, which could identify, for example, that the article is a "featured article" on a specific site. They do not describe the external entity but the page on the specific site.

Related terms

MediaWiki

is the software that runs projects like Wikipedia and Wikimedia Commons; see MediaWiki.

JSON

is a JavaScript-based notation for exchanging data; see JSON.

RDF

is a W3C standard for the data model of the semantic web; see Resource Description Framework.

RDF/XML

is a serialization format of RDF in XML; see RDF/XML.

Triplet

(commonly called triple) is the form in which a single data entry is stored in linked data. It consists of a subject, a predicate and an object. In Wikidata this corresponds roughly to the item, property and value.
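A claim whose value is an item maps naturally onto a triple. The sketch below writes such claims in N-Triples syntax using the entity and direct-property namespaces of Wikidata's RDF export (http://www.wikidata.org/entity/ and http://www.wikidata.org/prop/direct/). The function to_ntriples is hypothetical and deliberately ignores qualifiers, references, ranks and non-item values.

```python
from __future__ import annotations

# Namespaces used in Wikidata's RDF for entities and direct (simple) properties.
WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

def to_ntriples(item_id: str, claims: dict[str, list[str]]) -> list[str]:
    """Emit one N-Triples line (subject, predicate, object) per item-valued claim."""
    lines = []
    for property_id, values in sorted(claims.items()):
        for value in values:
            lines.append(f"<{WD}{item_id}> <{WDT}{property_id}> <{WD}{value}> .")
    return lines
```

Here the item becomes the subject, the property the predicate and the value the object, matching the rough correspondence described above.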

XML

is a W3C standard for an extensible markup language; see XML.

Interlanguage link:

see Sitelink

Interwiki:

see Sitelink

See also

mw:Manual:Glossary – MediaWiki software Glossary

Meta:Glossary – Multilingual glossary for all Wikimedia projects

w:Wikipedia:Glossary – Wikipedia glossary

mw:Help:Extension:Translate/Glossary - Translation extension glossary

Category:Help-en – Wikidata English help pages

Wikibase Data Model Primer

The Wikibase Data Model

Wikidata technical proposal