Wikidata is a knowledge base that anyone can edit. Before you get started, it is a good idea to familiarize yourself with the Wikidata Glossary. That way, editors can "speak the same language" (so to say). We hope that this will help to improve discussion and communication amongst editors.
The Glossary is ordered conceptually. General concepts related to Wikidata are presented first. This is followed by sections on Items and Properties identifying all the elements of the pages for these. Note that this glossary is generally related to concepts relevant to the User Interface on Wikidata. Functions which are accessed on the Wikibase clients or through the API are not discussed here. At the end we have listed a number of terms generally used in structured data which you may come across in discussions on Wikidata.
Names and Projects
Wikipedia is a project to write an encyclopedia in over two hundred languages and make it available to every person on earth not just to read but also to edit and use.
The Wikimedia Foundation (WMF) is a charitable foundation based in San Francisco, USA, which provides the computer servers used by Wikipedia and other Wikimedia projects including Wikidata, Wikimedia Commons, Wiktionary, and others. These projects are usually divided into multiple individual wikis - one for each language - as with Wikipedia, but some, like Wikidata, use one multilingual wiki. There are about 800 different wikis in total for Wikimedia projects. For now, only Wikimedia projects can be linked with Wikidata.
Wikimedia Commons is a project which stores images and other multimedia files in one central location and makes them available to be used on all Wikimedia projects. There is a proposal to use the Wikibase software on Commons to make it easier to describe files there and to improve localisation and search, and this is likely to proceed in the near future.
Wiktionary is a WMF project for dictionaries in every language. There are proposals for using the Wikibase software for Wiktionary but this will need a lot of development and is probably some years off.
A wiki is a website which can be edited over the internet by many users, working together to improve the various pages of the wiki. The MediaWiki software originally developed by WMF for Wikipedia is the most widely used wiki software.
MediaWiki is the wiki software on which all Wikimedia projects are based. It allows thousands of people to collaborate to write articles and pages. As the software is free, there are thousands of installations on other websites besides the wikis which are run by the WMF. The MediaWiki software is licensed as Free and Open Source Software which anyone can use and modify. Although most of the code is written by developers employed by WMF, there are also lots of contributions from others. The Wikibase software, for instance, is written by WMDE.
Wikidata is a Wikimedia project that runs an instance of MediaWiki with the Wikibase extensions. It allows Wikidata editors to enter data and browse pages but it restricts the format in which the data is entered so that pages can be easily compared and translated. This will make it easy for Wikipedias in different languages to share basic facts from Wikidata and for editors to update those facts in one place and see the change reflected across all language versions of Wikipedia and other projects using that data.
Wikibase is the software behind Wikidata. It consists of three extensions that can be integrated into the MediaWiki software: Wikibase, Wikibase client, and WikibaseLib:
The Wikibase extension (for the Wikidata server, often called repository or just repo) allows a dedicated MediaWiki installation to collect and maintain structured data and is used on the Wikidata website.
The Wikibase client extension (often called just client) enables MediaWiki installations such as the Wikipedias to query and display data from a Wikidata server on its own pages, and is in use on Wikipedias in different languages and several other sister sites.
The WikibaseLib extension has common libraries for both of the major extensions.
Logged in Users can set a wide variety of "Preferences" via the link at the top of the page (only visible when you are logged in). Amongst the preferences is a tab labelled "Gadgets" which lists a wide variety of enhancements to the user interface and additional tools to make tasks easier and Users can enable any of these. These are not part of the Wikibase extension but they work with it.
Wikimedia Deutschland (WMDE) is a charitable foundation based in Germany which is independent of the Wikimedia Foundation but acts to support the Wikimedia movement. WMDE has sponsored the development of Wikibase and Wikidata and employs the development team. Employees of WMDE are identified by having (WMDE) appended to their usernames.
On Wikidata an entity is a page with structured data. The term is also used for the data content of these pages. Wikidata has three types of entity - item, property and query - each with a separate namespace.
Pages in our wikis are grouped by namespaces identified by the prefix before the page name.
In Wikidata the Main namespace (with no prefix) is reserved for Item pages.
Properties are in the property namespace. The names of Property pages all start with "Property:".
Other namespaces include "Wikidata:" (pages for general discussions about Wikidata), "Help:" (Help pages like this one), "User:" (for each logged in user to write about themselves), and other more specialised pages as listed on the advanced search page. Associated with each of these namespaces is a discussion namespace prefixed "Property talk:", "Talk:", "Wikidata talk:", "Help talk:" etc. These are for discussion of the associated page and can be reached via the "Discussion" tab near the top of each page.
An item (in some languages translated in the user interface to words for subject, object or element) refers to a real world object, concept or event that is given an identifier (an equivalent of a name) in Wikidata together with information about it. Each item has a corresponding wiki page in the Wikidata main namespace. Listed below are the various elements of an item page, starting from the top of the page.
Item IDs, Labels, Descriptions and Aliases
Items are identified by a unique id starting with Q and followed by a number (like Q5).
At the top of each Item is a label in your language (Logged in users can change their default language). If no label has been set in your language then a greyed out message will say this. The item ID is shown after the label. Labels are not unique. Multiple items can have the same label. See Help:Label for more information on labels.
Below the label is a description in your language or a greyed out message if none has been entered. Although labels need not be unique the combination of label and description should be unique for each language. Remember that each item represents a concept - not a word. Some languages may need much longer descriptions than others to describe the concept of the item. See Help:Description for more information on descriptions.
Below the description is a space for alternative names for this item. Aliases are used in the Wikidata search as alternative terms. See Help:Alias for more information on aliases.
The other languages button is next. If you select this then it will show labels, descriptions and aliases in other languages selected by the software based on guesses about you based on your set language and location. For logged in users it tracks languages the user has used and uses that to select languages to display here.
The edit button in the labels, descriptions and aliases section will reveal the other languages box and let you edit the labels, descriptions and aliases in these languages.
The next section of each item is the statements. Each of these has a claim with a <property:value> pair - each a combination of a predefined property with a suitable value telling us something about the item. Claims can be supported by qualifiers and references. Each item can have multiple statements. See Help:Statement for more information on statements.
The first part of a statement is the claim which is a <property:value> pair. (<property:value> pairs are called snaks in the datamodel. See mw:Wikibase/DataModel#Snaks) Click the add button for a new statement and the property entry box will appear with suggestions for suitable properties, based on the other statements the item has. If you think of a claim as a sentence with the item as the subject then the property can be seen as similar to the verb in the sentence describing the relationship between the item and the value.
Once you select a property for your claim, the software will display a box for you to enter a value to go with that property. Each property expects values whose datatype matches that of the property and any other values will be rejected. For properties that take an item as a value the software will suggest suitable items when you begin typing the item label in your language. Other datatypes include properties, geographical coordinates, dates, URLs, strings, monolingual text, numbers and numbers with units.
Next to the value entry box are three little rectangles. Clicking on these reveals the option to enter "no value" - this property does not have a value - or "unknown value" - this property has a value but it isn't known - or "custom value" - returns to the value entry box.
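The three options above can be pictured as three kinds of snak. The following is a minimal sketch, assuming the type names `value`, `somevalue` and `novalue` that Wikibase uses in its JSON output; the dictionary layout is simplified and the helper `describe_snak` is hypothetical, not part of Wikibase.

```python
def describe_snak(snak):
    """Render a simplified snak dict as text, covering all three snak types."""
    snaktype = snak["snaktype"]
    if snaktype == "novalue":
        # The property is known to have no value for this item.
        return f"{snak['property']}: no value"
    if snaktype == "somevalue":
        # The property has a value, but it is not known.
        return f"{snak['property']}: unknown value"
    if snaktype == "value":
        # The ordinary case ("custom value" in the user interface).
        return f"{snak['property']}: {snak['value']}"
    raise ValueError(f"unexpected snak type: {snaktype!r}")


print(describe_snak({"snaktype": "value", "property": "P31", "value": "Q3231690"}))
print(describe_snak({"snaktype": "somevalue", "property": "P21"}))
```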
Each claim property can have as many additional values as apply to that item. Just click the "add" button inside the statement box.
To add a claim using a different property click on the "add" button just below the statement box.
Between the property and the value of the claim are three squares. These let you select the rank of that claim. See Help:Ranking for more information.
For most cases this will be normal rank.
If there are multiple values then you can select one as the preferred rank. This will typically be the current value.
Where we have a value which we know is incorrect then this should be marked as deprecated rank. We sometimes include incorrect values just so we can mark them as deprecated and reduce the chance that they will be added by others. Deprecated values are not passed to Wikipedia or other clients.
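How a client might pick values according to rank can be sketched as follows. This is an illustration only: it assumes that preferred statements take priority when present, that normal statements are used otherwise, and that deprecated statements are never returned. The function name `best_values` and the sample data are hypothetical.

```python
def best_values(statements):
    """Return the values a client would typically show for one property.

    Preferred statements win if any exist; otherwise normal ones are used.
    Deprecated statements are never returned.
    """
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


# Hypothetical statements for a single property on one item.
mayors = [
    {"value": "former mayor", "rank": "normal"},
    {"value": "current mayor", "rank": "preferred"},
    {"value": "misreported name", "rank": "deprecated"},
]
print(best_values(mayors))  # ['current mayor']
```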
Qualifier property:value pairs provide additional information related to the claim - start and end dates for instance. Click "edit" then "add qualifier" and a box will appear for you to enter a property. Select a property and you can then enter its value. You can add as many qualifiers as you need. See Help:Qualifiers for more information.
Click "references" then "add reference". This allows you to add property:value pairs to describe references where the statement can be confirmed. imported from Wikimedia project (P143) is used to identify where information has been imported from Wikipedia or another source but does not yet have references.
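The parts of a statement described above (claim, rank, qualifiers and references) fit together roughly as in this sketch. The class and field names are assumptions chosen for illustration and do not mirror the actual Wikibase code; the example claim is the one used for Lincoln Continental Mark IV (Q1128805) later in this glossary.

```python
from dataclasses import dataclass, field


@dataclass
class Snak:
    """A <property:value> pair."""
    property_id: str  # for example "P279"
    value: object     # an item ID, a string, a date and so on


@dataclass
class Statement:
    """A claim plus its rank, qualifiers and references."""
    claim: Snak
    rank: str = "normal"                            # "preferred", "normal" or "deprecated"
    qualifiers: list = field(default_factory=list)  # list of Snak
    references: list = field(default_factory=list)  # each reference is a list of Snak


# <subclass of (P279):luxury vehicle (Q5581707)>, with no qualifiers or references yet.
statement = Statement(claim=Snak("P279", "Q5581707"))
print(statement.claim.property_id, statement.claim.value, statement.rank)
```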
The next section on the item page is the sitelinks. These link to articles and pages on other WMF wikis which are about this item. See Help:Sitelinks for more information.
Multiple language projects
There is a sitelink section for each of the multiple language projects - Wikipedia, Wikibooks, Wikinews, Wikisource, Wikiquote, Wikivoyage. Each of these sections can have a link to one (and only one) page on each language version of that project.
Select edit on a section and sitelinks to the wikis in that multiple language project can be added, deleted or changed. Each sitelink has a language label, a page name and a badge.
The language codes on the sitelinks correspond to the various language versions of that project and should match the language codes in the prefix to the urls for that language version.
Different projects have different names for their pages. Wikipedia calls them articles, Wikimedia Commons calls them files. Enter the name of the page to be linked here and the software checks that the spelling of the page name matches that on the wiki. This links that page to this wikidata item.
The various wikis use the sitelinks to add links from the wikipages linked using the sitelinks to wikipages in other languages which are also sitelinked to that item. The software only allows one sitelink per wiki so the other wikis in that project will have, at most, one link to each of the wikis in other languages.
Next to the link is a strange dotted symbol. Click on this and a menu of badges appears that can be added to the sitelink. These describe the status of the wiki page that the sitelink links to (rather than describing the Wikidata item). These are used to add additional information to language links.
The various wikis use the sitelinks to add language links to the wikipages linked using the sitelinks. The software only allows one sitelink per wiki so the other wikis in that project will only have one language link to each language wiki.
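The one-sitelink-per-wiki rule and the way language links follow from it can be sketched with a simple mapping. This is an illustration, not the Wikibase implementation: the class `Sitelinks` is hypothetical, and the site identifiers such as `enwiki` and the page titles are assumed example values.

```python
class Sitelinks:
    """Sitelinks for one item: at most one page per wiki."""

    def __init__(self):
        self._links = {}  # site identifier -> page title

    def set(self, site, title):
        # A wiki can hold only one link, so a new title replaces the old one.
        self._links[site] = title

    def language_links(self, current_site):
        """Links a page on current_site would show to the other wikis."""
        return {site: title for site, title in self._links.items()
                if site != current_site}


links = Sitelinks()
links.set("enwiki", "Human")
links.set("dewiki", "Mensch")
print(links.language_links("enwiki"))  # {'dewiki': 'Mensch'}
```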
The other sites section links to projects, like Wikimedia Commons, which have one multilingual wiki rather than multiple separate language wikis. Instead of entering a language code you should enter the project name.
Properties and Property pages are Wikidata entities. Properties are used in statements to describe the relationship between entities and values. Each property can accept values of one datatype.
Property IDs, Labels, Descriptions and Aliases
The Labels, ID, Description and Aliases for properties are the same as for items except for the following.
Properties are in the property namespace. The label is preceded by "Property:". To limit a search to property pages only, put "P:" in front of your search term.
Property IDs start with P followed by a number.
Property labels in each language should be unique - no two properties should have the same label.
Property descriptions may include some guidance on the usage of the property.
Property aliases in each language should be unique. Two different properties should not have the same alias.
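The ID conventions for items (Q plus a number) and properties (P plus a number) can be checked mechanically. A minimal sketch, assuming the number is a positive integer written without leading zeros; the function name `entity_kind` is hypothetical.

```python
import re

# Assumption: an ID is "Q" or "P" followed by a positive integer without leading zeros.
_ID_PATTERN = re.compile(r"([QP])([1-9][0-9]*)")


def entity_kind(entity_id):
    """Return "item", "property", or None if the string is not a recognisable ID."""
    match = _ID_PATTERN.fullmatch(entity_id)
    if match is None:
        return None
    return "item" if match.group(1) == "Q" else "property"


print(entity_kind("Q5"))    # item
print(entity_kind("P279"))  # property
print(entity_kind("5"))     # None
```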
Each property has values of only one datatype - as listed below. It is not possible to have a property which can have values of more than one datatype.
Most properties have values which are items - entities with ID number starting with Q. Just type in the label or alias of the item and wait for the software to autocomplete. If there is no item corresponding to the value of a property with item datatype then a new item will have to be created before the value can be entered.
Some properties have values which are other properties.
The time datatype shows a date value. Time values can have an uncertainty - to the nearest day, month, year, decade, century, millennium. Note this is an uncertainty, not an interval. Dates are entered as "day month year". For example, 5 3 1983 is displayed as 5 March 1983. You can select which calendar to use to display the date - the Julian and Gregorian calendars are available so far. More accurate time values (to a fraction of a second) can only be entered via the API.
Note that to describe a period of time you need two claims or qualifiers with different properties - one for a start date and another for an end date.
Note that this datatype is not used for recurring dates like Christmas day or May 5th. Those are items.
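The idea that a date is shown no more exactly than its stated uncertainty can be sketched as below. This is an illustration only: the function `display_date` and its output formats for month, year and decade are assumptions, and only the day format ("5 March 1983") is taken from the text above.

```python
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def display_date(day, month, year, precision="day"):
    """Format a date according to how precisely it is known."""
    if precision == "day":
        return f"{day} {MONTHS[month - 1]} {year}"
    if precision == "month":
        return f"{MONTHS[month - 1]} {year}"
    if precision == "year":
        return str(year)
    if precision == "decade":
        return f"{year // 10 * 10}s"
    raise ValueError(f"unsupported precision: {precision!r}")


print(display_date(5, 3, 1983))            # 5 March 1983
print(display_date(5, 3, 1983, "decade"))  # 1980s
```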
geographic coordinates datatype
The longitude and latitude coordinates of a place and what globe they are on (Yes! Wikidata has coordinates for places on the Moon and Mars and other planets! Extraterrestrial coordinates cannot be edited through the item pages; only through the API).
Enter the coordinates as two numbers separated by a space: the latitude (between -90 and +90) followed by the longitude (between -180 and +180). -30.0 150.1234 will be displayed as '30°0'0.0"S, 150°7'24.2"E'. If you enter 30.0S 150.1234E or 30°0'0.0"S 150°7'24.2"E you get the same result.
'Advanced adjustments' lets you select the precision with which the coordinates are displayed.
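The conversion from decimal degrees to the degrees, minutes and seconds display shown above can be worked through in a few lines. This is a sketch of the arithmetic, not the code Wikidata uses; the function names are hypothetical and the precision is fixed at a tenth of an arcsecond.

```python
def to_dms(value, positive, negative):
    """Convert decimal degrees to a degrees/minutes/seconds string."""
    # Work in whole tenths of an arcsecond to avoid rounding up to 60.0 seconds.
    tenths = round(abs(value) * 36000)
    degrees, rest = divmod(tenths, 36000)
    minutes, rest = divmod(rest, 600)
    hemisphere = negative if value < 0 else positive
    return f"{degrees}°{minutes}'{rest / 10:.1f}\"{hemisphere}"


def format_coordinates(latitude, longitude):
    """Format a latitude/longitude pair the way the example in the text is displayed."""
    return f"{to_dms(latitude, 'N', 'S')}, {to_dms(longitude, 'E', 'W')}"


print(format_coordinates(-30.0, 150.1234))  # 30°0'0.0"S, 150°7'24.2"E
```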
A web address. This must include http:// or https:// at the beginning.
commons media file datatype
The name of a file on Wikimedia Commons.
Any sequence of alphanumeric unicode characters.
monolingual text datatype
Unicode text with a language code to identify the language.
A number. This can have an uncertainty interval. Example 1234+-5 will display as 1234±5.
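Reading the "1234+-5" notation can be sketched as follows. The parser is an illustration under the assumption that a quantity is a decimal number optionally followed by "+-" or "±" and a non-negative uncertainty; the function names are hypothetical and the real input rules may accept more forms.

```python
import re
from decimal import Decimal

_QUANTITY = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*(?:(?:\+-|±)\s*(\d+(?:\.\d+)?))?\s*")


def parse_quantity(text):
    """Parse "1234+-5" or "1234±5" into (amount, uncertainty).

    The uncertainty is None when the input does not give one.
    """
    match = _QUANTITY.fullmatch(text)
    if match is None:
        raise ValueError(f"not a quantity: {text!r}")
    amount = Decimal(match.group(1))
    uncertainty = Decimal(match.group(2)) if match.group(2) else None
    return amount, uncertainty


def display_quantity(amount, uncertainty):
    """Render the quantity with a ± sign when an uncertainty is present."""
    return str(amount) if uncertainty is None else f"{amount}±{uncertainty}"


print(display_quantity(*parse_quantity("1234+-5")))  # 1234±5
```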
quantity with units datatype
A measurement with uncertainty and a label to identify the units.
Statements for properties provide information about the property. A lot of this information is used by bots to check how the properties should be used in statements and to identify discrepancies that need to be checked.
Query entities are not yet available in Wikidata.
Here are some terms which are used in structured data and knowledge engineering generally and which you may come across in discussions here on Wikidata.
Wikipedia contains a lot of information and some of the most advanced artificial intelligence programs have tried to understand it (we don't charge a license fee to do this so any PhD can have a go). So far the results are disappointing with a lot of guesswork and mistakes. Structured data turns this problem on its head. The AI doesn't try to understand Wikipedia; instead Wikipedia is translated into a format that is easy for a computer to understand. We use a controlled vocabulary of carefully defined terms. Where two things are similar we use the same properties in the same way to describe them both. Structured data has been used in industry and commerce for years by businesses to describe their transactions, customers and suppliers, and various standard techniques have been developed.
Note that even with years of development Wikidata will not have the quality of information contained in a Wikipedia good article, much less a featured article; nevertheless it is still worth doing for a number of reasons.
- many languages do not have a lot of good articles. Wikidata is designed so that, with a little localisation every language wiki can at least have a comprehensive infobox that is kept up to date from a shared source on millions of topics. It's not a featured article but it's a lot more than we have now.
- it can be difficult to keep even the busiest wikis up to date on basic facts like the names of mayors of small towns or the death of an actor who specialised in voiceover for video games. With Wikidata this kind of basic fact can be updated centrally and all wikis get the updated information.
- The Categorisation system used on the wikis is useful for connecting articles on the same topic but it has limits. When it is done Wikidata search will be able to construct a category like list of articles from many more queries than would ever become Categories.
- Wikidata has an open API which means that anyone can write a program to read information from wikidata and create new ways to visualise that data, including links back to the wikidata items and the associated wiki pages on wikipedia and other WMF projects which are linked to those items.
- The data in wikidata is available under a CC0 license so anyone can use it. While industry and commerce will have extensive data stores for information about their own products and services they will need a shared vocabulary for communicating with others. Wikidata has the potential to become that shared backbone - a human curated and checked data store, localised in many languages and available at no cost. This is a resource even the largest companies cannot afford to duplicate. We want to put a free culture data model, designed from the ground up to be as inclusive as we can make it, at the beating heart of the next generation web.
Meta data is information about data. The meta data of a phone call is the start time, end time, number called and number making the call. This then links to the meta data for the number called - location, bank account number that pays the fees etc.
Wikidata stores meta data about real world concepts and topics and things that have wikipedia articles and wikisource documents and wikivoyage entries - the geographical coordinates of a place, the date a politician was elected and to what, the atomic weight of an element, what actors appeared in a movie and what roles they played.
Wikidata doesn't just give you the names of the actors that appeared in a movie and what roles they played. It links you to items for each of those actors and roles/characters so you can find more data - the actor's date of birth, the book that character first appeared in.
An ontology is an explicit and formal specification of a conceptualization. It is important that an ontology convey a shared understanding of a domain. In Wikidata this would be given by using the properties and their intended meaning in statements to describe the real world entities and concepts, through their Wikidata counterpart, associated to literal data and other entities.
For example, to describe a football league table you need properties to specify the teams and the various columns in the table - games played, games won, games drawn, games lost, goals for, goals against, league points, ranking - and a standard way to associate these with each other. Once all of these have been agreed then the information (or dataset) for every season of every football league can be entered in this standard format ready to be searched and compared and translated into a hundred languages and a bunch of visualisations and graphs.
As you can see from the above example, the ontology is not part of the wikibase software - rather it is the web of properties and items that the editors of wikidata have built using the tools provided by the wikibase software. While we have tried to make our choices as logical and neutral as possible nevertheless there are real editorial decisions in the choices we have made. Most of these choices are made in the Wikidata:Property proposals pages where each proposal for a new property is discussed and (to a lesser extent) in Wikidata:Requests for comment and Wikidata:Project chat and the various WikiProjects.
Language is imprecise. Some words have multiple meanings. Some meanings can be described by multiple different words. Add in different languages and the opportunities for misunderstandings multiply. Structured data manages this uncertainty by having a limited number of terms that can be used, each with an agreed meaning. In Wikidata these are the wikidata items and properties. Wikidata has over 200 items called 'John Smith' but each has a different ID and we use labels, descriptions and statements to distinguish between them so that the correct 'John Smith' is used in each case.
A group of instances. A class item will have a claim using the subclass of (P279) property to link it to a larger class and so on to a root class. Some other ontologies have class as another type of entity but in Wikidata a class is just another item. Note that a class can be an instance of (P31) of a type of class so Lincoln Continental Mark IV (Q1128805) has the claims <subclass of (P279):luxury vehicle (Q5581707)> and <instance of (P31):automobile model (Q3231690)>.
Statements on an item for a class should relate to the class, not to the instances that make up the class. Bishop of London (Q1587771) should not have the claim <sex or gender (P21):male (Q6581097)> even though every Bishop of London has been a male. See Help:Basic membership properties.
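Following subclass of (P279) links from a class up to a root class can be sketched as a simple graph walk. The data below is a toy example: only the link from Q1128805 to Q5581707 comes from the text, while `Q_VEHICLE` and `Q_ROOT` are made-up placeholders, and the function name is hypothetical.

```python
# Toy data: item -> direct superclasses via "subclass of (P279)".
SUBCLASS_OF = {
    "Q1128805": ["Q5581707"],   # Lincoln Continental Mark IV -> luxury vehicle
    "Q5581707": ["Q_VEHICLE"],  # placeholder ID, not a real item
    "Q_VEHICLE": ["Q_ROOT"],    # placeholder ID, not a real item
}


def all_superclasses(item, subclass_of):
    """Collect every class reachable from item by following P279 links."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in seen:  # guards against cycles in the data
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(all_superclasses("Q1128805", SUBCLASS_OF)))
```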
The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications.
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Some Wikidata properties are the same as OWL properties. These are identified by statements on the Wikidata property pages.
(commonly called Triple) is how to store data as a single data entry in linked data. It consists of a subject, a predicate and an object. In Wikidata this corresponds roughly to the item, property and value.
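The rough correspondence between triples and Wikidata claims can be shown with the two claims given earlier for Lincoln Continental Mark IV (Q1128805). This is a simplified sketch that ignores qualifiers, references and ranks; the names `Triple` and `claims_to_triples` are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the item
    predicate: str  # the property
    obj: str        # the value


def claims_to_triples(item_id, claims):
    """Turn (property, value) pairs for one item into subject-predicate-object triples."""
    return [Triple(item_id, prop, value) for prop, value in claims]


triples = claims_to_triples("Q1128805", [("P279", "Q5581707"), ("P31", "Q3231690")])
for triple in triples:
    print(triple.subject, triple.predicate, triple.obj)
```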
The Application Programming Interface (API) is the software interface to Wikidata by which other programs and websites can read data from and (if authorised) write data to Wikidata. This means that software developers can write software that interacts with the data in Wikidata with minimal permission or approval from Wikidata or anyone else. We hope that this will lead to thousands of new and interesting apps and websites reusing the data in ways we haven't thought of.
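As an illustration of reading data through the API, the sketch below only builds a request URL and does not send it. It assumes the standard MediaWiki endpoint at www.wikidata.org and the `wbgetentities` module with the parameters shown; check the API documentation before relying on them. The function name `entity_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the usual MediaWiki API endpoint for Wikidata.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_url(entity_id, language="en"):
    """Build a URL asking for the labels and descriptions of one entity as JSON."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels|descriptions",
        "languages": language,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(entity_url("Q5"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the response format is not shown here.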
Internationalisation, often abbreviated as I18N (I plus 18 letters plus N), is the process by which a software program is adapted to make it easy to create local versions in other languages. The Wikibase software is internationalised in that there is provision for every entity to have labels in multiple languages and for the program to show users labels in their preferred language (if these have been entered).
After a program has been internationalised, the next stage is localisation, often abbreviated L10N (L plus 10 letters plus N). This is the process by which labels in each language are entered. Because of the way Wikidata works, with a controlled vocabulary of properties and items, providing a label in your language for one property or one item means that every statement using that property or item will now display the new label to every user with your default language.
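Why a single new label changes what is shown everywhere can be seen in a small lookup sketch. It assumes labels are stored once per entity and language and looked up at display time; falling back to the entity ID when no label exists is an assumption made for the example, and the function name and sample labels are illustrative.

```python
def display_label(entity_id, labels, preferred_languages):
    """Return the first available label in the user's preferred languages."""
    for language in preferred_languages:
        if language in labels:
            return labels[language]
    # Assumed fallback for this sketch: show the ID when no label has been entered.
    return entity_id


labels_for_q5 = {"en": "human", "de": "Mensch"}
print(display_label("Q5", labels_for_q5, ["de", "en"]))  # Mensch
print(display_label("Q5", labels_for_q5, ["fr", "en"]))  # human

# Localisation: once a French label is entered, every use of Q5 shows it.
labels_for_q5["fr"] = "être humain"
print(display_label("Q5", labels_for_q5, ["fr", "en"]))  # être humain
```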