Atlasowa

Joined 31 October 2012
STATS! and State of Wikidata

 
Wikidata item creation growth

Who is editing wikidata? Wikipedians? Bots? How many claims have references? How much vandalism? How much is wikidata used in Wikipedias?

Stats


 
User:Succu/Statistics/NoClaims/20150330: Distribution of items without a claim (2015-03-30). This statistic was made from the last dump (2015-03-30). Item numbers are subdivided into intervals of 100,000. So for instance (X,Y) = (192,73720) represents the interval [Q19200000, Q19299999] having 73,720 out of 100,000 items without a single claim. --Succu (talk) 21:27, 2 April 2015 (UTC) (Wikidata:Project_chat/Archive/2015/09#About_stalagmite_.3Csmall.3E.28Q181312.29.3C.2Fsmall.3E_and_stalactite_.3Csmall.3E.28Q177197.29.3C.2Fsmall.3E)
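The interval scheme described in the caption above can be sketched in a few lines. This is only an illustration of the bucketing arithmetic, not Succu's actual script; the function names `bucket_index`, `bucket_range` and `count_per_bucket` are made up for this example.

```python
from collections import Counter

BUCKET_SIZE = 100_000  # interval width stated in the caption


def bucket_index(qid: str) -> int:
    """Return the interval index X for an item ID such as 'Q19234567'."""
    return int(qid[1:]) // BUCKET_SIZE


def bucket_range(index: int) -> tuple[str, str]:
    """Return the first and last item ID covered by interval `index`."""
    low = index * BUCKET_SIZE
    return f"Q{low}", f"Q{low + BUCKET_SIZE - 1}"


def count_per_bucket(qids) -> Counter:
    """Count how many of the given item IDs fall into each interval."""
    return Counter(bucket_index(q) for q in qids)
```

With this, `bucket_range(192)` gives the interval [Q19200000, Q19299999] from the caption, and feeding `count_per_bucket` the IDs of all items without claims would yield the Y values of the plot.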
Distribution of items (2015-09-07)
[Chart: number of items (0 to 100,000) per interval of 100,000 item IDs, intervals 10 to 210 (QID)]
  •   items without claims/statements
  •   items with claims/statements
  •   "missing" items (deleted, redirected or not created)
 
 
  •   SuccuBot (de): 45,212,327 (14.8%)
  •   Research Bot (de): 19,776,148 (6.5%)
  •   PLbot (de): 12,487,091 (4.1%)
  •   Reinheitsgebot (de): 11,624,213 (3.8%)
  •   QuickStatementsBot (de): 11,367,426 (3.7%)
  •   ProteinBoxBot (de/nl): 12,211,661 (4.0%)
  •   Edoderoobot (nl): 22,833,627 (7.5%)
  •   RobotMichiel1972 (nl): 10,017,807 (3.3%)
  •   BotNinja (bg): 26,965,135 (8.8%)
  •   ValterVBot (it): 25,361,973 (8.3%)
  •   Dexbot (fa): 23,857,242 (7.8%)
  •   KrBot (ru): 19,804,772 (6.5%)
  •   Mr.Ibrahembot (ar): 19,685,878 (6.4%)
  •   Emijrpbot (es): 18,644,530 (6.1%)
  •   Harej (en): 15,190,706 (5.0%)
  •   GZWDer (flood) (zh): 10,554,910 (3.5%)

Wikidata content stats

Wikidata (52,976,107)
  •   human: 5,250,513 (9.9%)
  •   taxon: 2,627,026 (5.0%)
  •   administrative territorial entity: 1,886,201 (3.6%)
  •   architectural structure: 2,313,883 (4.4%)
  •   occurrence: 3,898,674 (7.4%)
  •   chemical compound: 1,188,724 (2.2%)
  •   film: 285,742 (0.5%)
  •   thoroughfare: 607,912 (1.1%)
  •   astronomical object: 305,734 (0.6%)
  •   Wikimedia list article: 322,759 (0.6%)
  •   Wikimedia disambiguation page: 1,330,206 (2.5%)
  •   Wikinews article: 196,623 (0.4%)
  •   scholarly article: 23,058,725 (43.5%)
  •   other P31/P279: 5,729,916 (10.8%)
  •   no P31/P279: 3,973,469 (7.5%)
Module:Statistical data/by project/classes, 2019-07-23
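The class counts listed above add up exactly to the item total of 52,976,107, and the percentages follow from a plain division. A small sketch to check both, using only the figures from the list (the names `class_counts` and `shares` are chosen for this example):

```python
TOTAL_ITEMS = 52_976_107  # "Wikidata (52,976,107)"

# Item counts per class, copied from the list above.
class_counts = {
    "human": 5_250_513,
    "taxon": 2_627_026,
    "administrative territorial entity": 1_886_201,
    "architectural structure": 2_313_883,
    "occurrence": 3_898_674,
    "chemical compound": 1_188_724,
    "film": 285_742,
    "thoroughfare": 607_912,
    "astronomical object": 305_734,
    "Wikimedia list article": 322_759,
    "Wikimedia disambiguation page": 1_330_206,
    "Wikinews article": 196_623,
    "scholarly article": 23_058_725,
    "other P31/P279": 5_729_916,
    "no P31/P279": 3_973_469,
}


def shares(counts: dict, total: int) -> dict:
    """Return each count as a percentage of `total`, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


class_shares = shares(class_counts, TOTAL_ITEMS)
```

Because the groups partition the total, an item is counted in only one class here, even if it has several P31/P279 values.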
Wikidata in % of all statements (about 70 million)
  •   instance P31/P279: 20.9
  •   identifiers: 20.8
  •   person fields: 16.5
  •   taxonomy: 9.1
  •   location (coordinates, P131, ..): 7.9
  •   country: 5.4
  •   names: 4.4
  •   main category and other internals: 3.3
  •   images and media: 1.7
  •   author, performer, etc.: 1
  •   bibliographic fields: 1
  •   sport: 0.8
  •   other: 6.4

Groups by number of statements with specific properties.
In % of all statements (about 70 million), qualifiers not generally included. November 2015.

Identifiers for persons and taxa are in "identifiers", not "person fields" or "taxonomy". Location doesn't include P17.

Revert stats

  • Wikidata:Project_chat/Archive/2015/06#Revert_analysis: I analyzed all 2073 reverts of IP edits during the last 30 days and identified the country of origin of the IP's. In the graph below all countries with more than ten reverts are listed. Remarkably, seven out of the ten top countries are Spanish speaking countries. In total, IP's from these seven countries are responsible for 1117 reverted edits or 55% of all reverted edits during the studied period. I hypothesize that many IP's are coming from Spanish Wikipedia articles with a [editar datos en Wikidata] link as in es:Jesé Rodríguez. Such links are one hand very welcome as Wikipedia authors can faster improve the connected Wikidata item, on the other hand they open a new playground for vandals. I checked if the vandalizing IP addresses from Spanish speaking countries are also active on Spanish Wikipedia. Though only 14% of these IP's have also reverted edits on Spanish Wikipedia. This means that vandals on Wikidata and Spanish Wikipedia are different people but most probably many Wikidata vandals are coming from Spanish Wikipedia --Pasleim (talk) 20:26, 15 June 2015 (UTC)
I created a second plot showing the number of reverts by country relative to the total number of IP edits made in the country. Only countries with more than 100 edits are shown. In average, 7% of all IP edits get reverted. --Pasleim (A) (talk) 20:08, 16 June 2015 (UTC)
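The second plot normalizes reverts by the number of IP edits per country and leaves out countries with few edits. A minimal sketch of that calculation follows; the per-country counts behind the plot are not given here, so the input dictionaries and the name `revert_rates` are hypothetical and only the threshold of 100 edits comes from the comment above.

```python
MIN_EDITS = 100  # only countries with more than 100 IP edits are shown


def revert_rates(edits_by_country, reverts_by_country, min_edits=MIN_EDITS):
    """Return the reverted share of IP edits per country, highest first.

    Both arguments map a country to a count. Countries with `min_edits`
    edits or fewer are skipped.
    """
    rates = {}
    for country, edits in edits_by_country.items():
        if edits > min_edits:
            rates[country] = reverts_by_country.get(country, 0) / edits
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder numbers, not taken from the analysis.
example_edits = {"country_a": 1000, "country_b": 150, "country_c": 80}
example_reverts = {"country_a": 50, "country_b": 30, "country_c": 40}
example_rates = revert_rates(example_edits, example_reverts)
```

In the placeholder data, `country_c` is dropped because it has fewer than 100 edits, even though half of its edits were reverted.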
 
 
  • Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels, SIGIR’15, August 09 - 13, 2015, Santiago, Chile. http://dx.doi.org/10.1145/2766462.2767804 : "Our corpus is based on a database dump of the full revision history of Wikidata until November 7, 2014 (...) about 85% of revisions are made automatically by bots approved by Wikidata’s community. (...) As we are interested in detecting ill-intentioned contributions by humans, and not errors in bots, we base our corpus on the 24 million manual revisions."
    • Revisions made on Wikidata: 167,802,227 (100%)
    • Revisions made on meta pages: 1,211,628 (1%)
    • Revisions made on special items: 11,167 (0%)
    • Revisions made automatically: 142,574,999 (85%)
    • Revisions made manually: 24,004,433 (14%)
  • "Of the 24 million manual revisions made on Wikidata, a total of 103,205 have been reverted via rollbacks, and 64,820 via undo/restore. Based on our below validity analysis, we label roll-back revisions as vandalism, whereas this cannot be done with confidence for undo/restore revisions."
  • (Figure 2: Manual revisions on Wikidata per month. Revisions affecting textual content (labels, descriptions, and aliases) are distinguished from revisions affecting structural content (statements and sitelinks). Major growth events are labeled.) "The first jump of growth rate was caused by enabling statement creation for first time. In the months around this event, Wikidata was connected to the Wikipedias in various languages, adding millions of statements and sitelinks. (...) The second growth rate increase is due to the emergence of semi-automatic editing tools for Wikidata, most notably the Wikidata Game."
  • Vandalized Item Categories: "Table 1: Top vandalized items Cristiano Ronaldo, Lionel Messi, One Direction, Portal:Featured content, Justin Bieber, Barack Obama, English Wikipedia, Selena Gomez (...) the least vandalized category Places gets almost 4 times as much attention by all editors (31%) (...) The focus of vandals deviates significantly from typical editors (...) while categorizing the revision samples, we noticed that 11% of the vandalized items concerned India, cross-cutting all categories, compared to 0.5% overall."
  • Vandalized Content Types: "About 57% of the vandalism happens in textual content like labels, descriptions, and aliases; and about 40% happens in structural content like statements and sitelinks. The remaining 2% of miscellaneous vandalism includes merging of items and indecisive cases."
  • Vandals: "About 86% (88,592 of 103,205) of vandalism on Wikidata originates from anonymous users. (...) Unregistered users primarily vandalize textual content and sitelinks, whereas registered users primarily vandalize statements and sitelinks."
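The revision counts quoted from the paper can be cross-checked with a few divisions. The sketch below uses only numbers from the quotes above; the variable names are chosen for this example, and the rollback rate among manual revisions is derived here, not stated in the paper excerpt.

```python
TOTAL_REVISIONS = 167_802_227

# Breakdown of all revisions, as quoted from the paper.
breakdown = {
    "meta pages": 1_211_628,
    "special items": 11_167,
    "automatic": 142_574_999,
    "manual": 24_004_433,
}

ROLLBACK_REVERTS = 103_205      # labeled as vandalism in the corpus
UNDO_RESTORE_REVERTS = 64_820   # not labeled as vandalism with confidence
ANONYMOUS_VANDALISM = 88_592    # rollback-reverted revisions by anonymous users


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


# Share of each revision type among all revisions, rounded to whole percent.
revision_shares = {k: round(percent(v, TOTAL_REVISIONS)) for k, v in breakdown.items()}

# Share of manual revisions that were rolled back, and share of that
# vandalism coming from anonymous users.
rollback_rate = percent(ROLLBACK_REVERTS, breakdown["manual"])
anonymous_share = percent(ANONYMOUS_VANDALISM, ROLLBACK_REVERTS)
```

The four categories add up exactly to the total, and rolled-back vandalism comes to well under 1% of the 24 million manual revisions.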

Quarry

Wikidata in Wikipedia

State of Wikidata 2015 (Wikimania Mexico 2015)

 
Wikimania 2015 - State of Wikidata

"Tell us about Wikidata at "your" Wikipedia"