Help:About data

Wikidata is a free knowledge base that can be read and edited by both humans and machines. It is one of the wiki-based projects hosted and maintained by the Wikimedia Foundation, a non-profit free-content organization. Each Wikimedia Foundation project has its own focus: for example, Wikipedia for encyclopedic content, Wikimedia Commons for images and other media files, and Wiktionary for lexical information about words, such as definitions and synonyms. For Wikidata, the focus is structured data.

This page is intended as an overview of structured data. If you are already familiar with structured data but would like to learn more about how it is used in Wikidata, how to access data in Wikidata, or how to contribute your project's data to Wikidata, please skip to the section on linking data.

Understanding Wikidata

Structured data refers to data that has been organized and stored in a defined way, chiefly to encode meaning and preserve the relationships between different data points in a dataset.

But what is data anyway? And why should we care about structured data in particular?

Defining data

Big data, experimental data, open data, metadata: you may have come across some or all of these terms before.

Each term means something slightly different, but all of them build on a common understanding of data and its potential to describe and improve our understanding of the world around us.

As an abstract concept, data can be viewed as the precursor to information, meaning that information can be produced or derived from data.

This is because data, when boiled down to its essence, is a collection of values about things. These values can be numerical or quantitative, like measurements or amounts, but they can also be qualitative, like descriptions or comparisons. For example, we could say that "8,848 m" is a data value for the height of Mount Everest and that "red" is a data value for the colour of a car.

As mentioned previously, information is not the same as data but is instead an outcome of gathering and analysing data. For example, 8,848 (data) is a number that means little by itself, even if we know that it is the height of a mountain; we can only say that Mount Everest is the highest mountain in the world at 8,848 m (information) if we know the standard unit of height and the heights of other mountains. Drawing conclusions like this, adding to knowledge, and establishing facts all become much easier when the data is structured. We will discuss this next.
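The step from data to information can be sketched in a few lines of Python. This is only an illustration: the function name `highest` is made up for this page, and the figures are the metre values for three of the mountains listed in Table 1. The point is that the conclusion depends on every value sharing the same unit and on having other heights to compare against.

```python
# Raw data: heights that are all known to be in metres.
heights_m = {
    "Mount Everest": 8848,
    "K2": 8611,
    "Kanchenjunga": 8586,
}

def highest(heights):
    """Return (name, height) for the entry with the largest height."""
    name = max(heights, key=heights.get)
    return name, heights[name]

# Information: a statement derived by comparing the data values.
name, height = highest(heights_m)
print(f"{name} is the highest mountain in this dataset at {height} m")
```

With a single number and no unit, no such comparison would be possible.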

Where is the data?

Data is all around us. There are many different sources of data, including financial, biological, and social data. Even this page contains data! For example, this page has a word count, a creation date and a last-modified date, a topic, a number of views, and a set of languages in which it is available.

However, although just about anything can be a source of data, data that is not recorded and organized might as well not exist. Without form or structure, data has no meaning and can fail to provide useful information.

By organized, we mean categorized in a standard and unambiguous way. The organized and categorized data is what we refer to when we say structured data.

 
Wikidata features form-based input for adding data to items

Where is the structure?

On the web, structure is king. Most websites are built using HTML, a markup language that provides the layout, or structure, of a web page.

Markup languages are also used to mark up and describe the contents of a page so that search engines, bots, and applications such as RSS feeds can easily process and "understand" it. For example, the <title> tag tells a machine what the name of a website is.
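To make this concrete, here is a minimal sketch of how a program might read the <title> tag, using Python's standard `html.parser` module. The class and function names (`TitleParser`, `extract_title`) and the sample page are invented for illustration.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside the title element.
        if self._in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

# Hypothetical sample page, not fetched from any real site.
page = "<html><head><title>Help:About data</title></head><body><p>Hello</p></body></html>"
print(extract_title(page))
```

The program never has to guess which text is the name of the page, because the markup says so explicitly.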

Rather than describing the structure and common elements of a web page, Wikidata provides structure for the information stored in Wikipedia and the other Wikimedia projects. Like every other Wikimedia project, Wikidata is based on the MediaWiki software; it is extended by Wikibase, the software which powers Wikidata and is designed to manage large amounts of structured data. Structure is not added directly to the content of Wikipedia or other Wikimedia site pages, as in tables or lists, nor do Wikidata users need any knowledge of markup languages, data schemas, object notation, or other special syntax; instead, data is added to and edited in Wikidata through user-friendly input forms.

Any data stored in Wikidata can be used to generate all sorts of automated, up-to-date lists, tables, or other structured pages on any Wikimedia site or anywhere else.

Table 1
Data for Mountains
Mountain Property Value
Mount Everest height 8,848 m
K2 hauteur 8,611 m
Kanchenjunga height 8,586 m
Lhotse height 27940 ft

Structuring data

For an example of the importance of structure, let's look at Table 1. In this table we can see data for the four highest mountains on Earth. If we would like to know a particular piece of information, such as the height of the second highest mountain in the world, we should be able to look at the provided data and find the correct value. However, only three of the four mountains have their data categorized as a height value, and only two of those three mountains have values in metres. While we know that height and hauteur (French for height) can be understood as equal to each other, and how to convert metres to feet or vice versa, a machine, such as a bot or a computer program, may not.

It would be much easier for both humans and machines to process the information and answer the original question about the second highest mountain when all underlying data is recorded in a similar way even if the presentation differs.

Modeling data

Collections of structured data, like Wikidata, are organized according to a data model. Data models are machine-readable, meaning they can be understood by a computer. While computers are powerful, they are often not as smart as we are when it comes to simple reasoning. For instance, in the example above, a machine would not be able to know that height and hauteur are the same unless it were explicitly told that this was the case.

Table 2
Data for Mountains
Mountain Property Value
Mount Everest continent Asia
K2 continent Asia
Kanchenjunga continent Asia
Lhotse continent Asia
 

Data models vary based on the analysis needs, scope and conceptual framework of the dataset, and the technical requirements of a system. However, all data models typically will specify what kind of data can be supported by a system and what relationships between values can be understood and represented. For example, a data model could specify that height and hauteur be mapped to each other so that both terms represent one concept, or that measurements in feet be automatically converted into metres. The Wikidata data model shapes the way that data can be edited and added to the system by users. It is also a work in progress, with new data types being added to the model over time.
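The two rules mentioned above (mapping hauteur to height, and converting feet to metres) can be sketched as follows. This is not how the Wikidata data model is implemented; the names `PROPERTY_ALIASES`, `FEET_TO_METRES`, and `normalise` are hypothetical, and the rows are those of Table 1.

```python
# Hypothetical mapping of property names to one canonical concept.
PROPERTY_ALIASES = {"height": "height", "hauteur": "height"}
FEET_TO_METRES = 0.3048  # exact definition of the international foot

# Rows from Table 1: (mountain, property, value, unit).
rows = [
    ("Mount Everest", "height", 8848, "m"),
    ("K2", "hauteur", 8611, "m"),
    ("Kanchenjunga", "height", 8586, "m"),
    ("Lhotse", "height", 27940, "ft"),
]

def normalise(row):
    """Map the property to its canonical name and the value to metres."""
    mountain, prop, value, unit = row
    canonical = PROPERTY_ALIASES[prop]
    if unit == "ft":
        value = value * FEET_TO_METRES
    elif unit != "m":
        raise ValueError(f"unsupported unit: {unit}")
    return mountain, canonical, round(value), "m"

normalised = [normalise(row) for row in rows]

# With uniform data, the original question becomes a simple sort.
ranked = sorted(normalised, key=lambda row: row[2], reverse=True)
second = ranked[1]
print(f"Second highest: {second[0]} at {second[2]} {second[3]}")
```

Once every row uses the same property name and unit, a machine can answer the question about the second highest mountain without any knowledge of French or of unit conversion.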

The data model also essentially translates human natural language patterns into something that can be processed by machines. For example, in English we might say:

"Mount Everest is the highest mountain in the world"

This is also the raw, unstructured format of content currently on Wikipedia and all other Wikimedia sites.

On Wikidata, this would be represented by a statement, which consists of a property-value pair about an item, in this case Earth:

Earth (Q2) (item) | highest point (P610) (property) | Mount Everest (Q513) (value)

Additionally, Wikidata would also hold a statement about the item for Mount Everest (indicating it is a mountain):

Mount Everest (Q513) (item) | instance of (P31) (property) | mountain (Q8502) (value)

Note that because other items can be used as the values for statements, and all items have their own unique page on Wikidata, this means that all items in the system can be linked together through a series of statements. Because Wikidata uses a machine-readable format, this interlinking of data allows new relationships and connections to be discovered and processed by machines. For example, in Table 2 we see new data for our mountains, this time about their geographical location by continent but nothing about their heights. Assuming this continent data was linked to the mountain height data, we would feel more confident making predictions or drawing certain conclusions about it, like saying that Asia is home to the world's highest mountains.
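The way statements link items together can be sketched with plain tuples. The identifiers below are the ones quoted on this page; the in-memory list and the helper names `values_of` and `follow` are assumptions made for illustration and bear no relation to how Wikidata stores its data.

```python
# Each statement is an (item, property, value) tuple.
statements = [
    ("Q2", "P610", "Q513"),    # Earth | highest point | Mount Everest
    ("Q513", "P31", "Q8502"),  # Mount Everest | instance of | mountain
]

labels = {
    "Q2": "Earth",
    "Q513": "Mount Everest",
    "Q8502": "mountain",
}

def values_of(item, prop):
    """Return every value stated for the given item and property."""
    return [v for (i, p, v) in statements if i == item and p == prop]

def follow(item, *props):
    """Follow a chain of properties, starting from one item."""
    current = [item]
    for prop in props:
        current = [v for c in current for v in values_of(c, prop)]
    return current

# What kind of thing is the highest point on Earth?
result = follow("Q2", "P610", "P31")
print([labels[q] for q in result])
```

Because the value of the first statement is itself an item, the second statement can be reached from it, and a machine can connect the two facts without being told to.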

Linking data

Besides being a collection of structured data, Wikidata also supports linked data. Linked data refers to the practice of publishing structured data so that it can be interlinked.

For Wikidata this means that volunteer-contributed data can also be linked to other datasets, databases, and data sources from all around the web and from diverse initiatives outside of the Wikimedia family. For example, Wikidata currently allows interlinking with datasets and databases as diverse as Google Books, Canmore (one of the Historic Environment Scotland databases), the Vatican Library, OmegaWiki, and MusicBrainz.

 
example of a simple statement consisting of one property-value pair
 
example of a more complicated statement consisting of one property-value pair, qualifiers, and a reference

By following linked data principles and practices, Wikidata is also able to support and be used by other projects.

Linked data principles

Wikidata uses unique identifiers, or uniform resource identifiers (URIs), for all its items as per linked data standards.

While Wikidata uses its own data model, its content can be exported in RDF, a widely used standard format for linked data. In Wikidata terms, a statement is composed of an item and a property-value pair. For those familiar with linked data concepts, an item can be viewed as the subject of a triple, the property represents the triple's predicate, and the value expresses the triple's object.
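As a rough sketch, the statement about Mount Everest used earlier on this page could be written as a single line in N-Triples syntax. The two URI prefixes below do not appear on this page; they are assumed to match the ones used in Wikidata's RDF exports and should be checked against the RDF documentation. The helper names `entity_uri` and `to_ntriple` are hypothetical.

```python
# Assumed prefixes; verify against Wikidata's RDF export documentation.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"
DIRECT_PROPERTY_PREFIX = "http://www.wikidata.org/prop/direct/"

def entity_uri(item_id):
    """Build the URI that identifies an item."""
    return ENTITY_PREFIX + item_id

def to_ntriple(item_id, property_id, value_id):
    """Serialize an item-valued statement as one N-Triples line."""
    subject = entity_uri(item_id)
    predicate = DIRECT_PROPERTY_PREFIX + property_id
    obj = entity_uri(value_id)
    return f"<{subject}> <{predicate}> <{obj}> ."

# Mount Everest (Q513) | instance of (P31) | mountain (Q8502)
print(to_ntriple("Q513", "P31", "Q8502"))
```

This only covers statements whose value is another item; values such as quantities or dates would need a different representation for the object.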

However, Wikidata statements may also contain elements beyond the subject-predicate-object, such as references and qualifiers (for more information, see Help:Statements). This makes it complicated to fully represent Wikidata's content using the language of RDF—more information on these challenges can be found in the document "Introducing Wikidata to the Linked Data Web".
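The difficulty can be illustrated with a small sketch. The `Statement` class below is a hypothetical simplification, not Wikidata's actual data model, and the qualifier and reference entries are invented placeholders. It shows that projecting a statement onto a plain triple leaves no room for the extra context.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    item: str
    prop: str
    value: str
    # Extra context, stored as (property, value) pairs.
    qualifiers: list = field(default_factory=list)
    references: list = field(default_factory=list)

    def as_triple(self):
        """Project the statement onto subject, predicate, object."""
        return (self.item, self.prop, self.value)

height = Statement(
    item="Mount Everest",
    prop="height",
    value="8,848 m",
    # Placeholder entries, invented purely for illustration.
    qualifiers=[("hypothetical qualifier", "hypothetical value")],
    references=[("hypothetical reference property", "hypothetical source")],
)

triple = height.as_triple()
lost = len(height.qualifiers) + len(height.references)
print(triple)
print(f"{lost} pieces of context do not fit in the triple")
```

An RDF export therefore needs additional triples or nodes to carry the qualifiers and references, which is where the added complexity comes from.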

Contributing data

If you have datasets you would like to contribute to Wikidata, please see Wikidata:Data donation.

Accessing data

The data in Wikidata is published under the Creative Commons Public Domain Dedication 1.0, allowing the free reuse of the data. You can copy, modify, distribute and perform the data, even for commercial purposes, all without asking permission.

See Data access for details about the different ways to programmatically access Wikidata's data.
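As one possible starting point, the sketch below builds the URL of the JSON document for an item and reads a label from it. It assumes the `Special:EntityData` endpoint and the `entities` / `labels` / `value` layout of its JSON output, neither of which is described on this page, so check them against the Data access page before relying on them. The function names and the User-Agent string are hypothetical, and the sample document is a hand-written fragment rather than real output.

```python
import json
from urllib.request import Request, urlopen

def entity_data_url(item_id):
    """Build the assumed URL of the JSON document for one item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"

def label_of(document, item_id, language="en"):
    """Read an item's label from an entity JSON document."""
    return document["entities"][item_id]["labels"][language]["value"]

def fetch_entity(item_id):
    """Download the entity document (needs network access; not called here)."""
    request = Request(
        entity_data_url(item_id),
        # Hypothetical client name; replace it with your own.
        headers={"User-Agent": "about-data-example/0.1"},
    )
    with urlopen(request) as response:
        return json.load(response)

# Hand-written fragment in the assumed layout, used instead of a live request.
sample = {
    "entities": {
        "Q513": {"labels": {"en": {"language": "en", "value": "Mount Everest"}}}
    }
}
print(label_of(sample, "Q513"))
```

Calling `label_of(fetch_entity("Q513"), "Q513")` would perform the same lookup against live data.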

See also

For additional information and guidance, see:

  • Project chat, for discussing all and any aspects of Wikidata
  • Wikidata:Glossary, the glossary of terms used in this and other Help pages
  • Help:FAQ, frequently asked questions asked and answered by the Wikidata community
  • Help:Contents, the Help portal featuring all the documentation available for Wikidata