Wikidata:Lexicographical coverage

This page shows how much of a corpus in each language is covered by Wikidata's lexicographical data. Unless the entry for a language says otherwise, the corpora are based on Wikipedia (available here).
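
The figures listed for each language below can be read as simple set and count arithmetic. The following is a minimal sketch of that arithmetic, not the bot's actual code: the function names `coverage` and `percent`, the dictionary keys, and the toy data are all hypothetical, and it assumes forms are matched by exact string equality (the real pipeline may tokenise or normalise differently).

```python
from collections import Counter


def coverage(corpus_counts, wikidata_forms):
    """Compare corpus word forms against a set of Wikidata forms.

    corpus_counts: mapping of form -> number of occurrences (tokens).
    wikidata_forms: set of form strings known to Wikidata.
    Both names are illustrative assumptions, not the bot's API.
    """
    total_forms = len(corpus_counts)            # distinct forms in the corpus
    total_tokens = sum(corpus_counts.values())  # all occurrences in the corpus
    covered = {f: n for f, n in corpus_counts.items() if f in wikidata_forms}
    covered_forms = len(covered)
    covered_tokens = sum(covered.values())
    return {
        "forms_in_corpus": total_forms,
        "tokens": total_tokens,
        "covered_forms": covered_forms,
        "missing_forms": total_forms - covered_forms,
        "covered_tokens": covered_tokens,
        "missing_tokens": total_tokens - covered_tokens,
    }


def percent(part, whole):
    """Percentage rounded to one decimal place, as in the lists on this page."""
    return round(100 * part / whole, 1) if whole else 0.0


# Toy corpus: 6 distinct forms, 8 tokens.
corpus_counts = Counter("the cat sat on the mat the end".split())
# Toy lexicon: "dog" never occurs in the corpus, so it covers nothing.
wikidata_forms = {"the", "cat", "dog"}

stats = coverage(corpus_counts, wikidata_forms)
form_pct = percent(stats["covered_forms"], stats["forms_in_corpus"])
token_pct = percent(stats["covered_tokens"], stats["tokens"])
```

In the toy data, two of six forms are covered but they account for four of eight tokens. This is the same pattern seen in many entries below: a small share of frequent forms can cover a large share of tokens.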

These pages are updated weekly on Wednesdays by NikkiBot, although no edit will be made if nothing has changed.

The code for the bot is at https://github.com/nikkiwd/lexcover and is based on the original PAWS notebook by Denny. Report issues to Nikki (either on User talk:Nikki or on Telegram). Requests for additional languages, improvements and suggestions are also welcome.

Words can be filtered out by adding them to the "Filter" subpage for the language (e.g. Wikidata:Lexicographical coverage/nb/Filter) and the entries in the list can be customised, e.g. to add search links, by editing the "Missing/row" subpage (e.g. Wikidata:Lexicographical coverage/nb/Missing/row). It is also possible to add things before and after the list, e.g. if you want the output to be a table, by editing the "Missing/head" and "Missing/foot" subpages.
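
As a rough illustration of how a filter list and the row, head and foot pieces could fit together, here is a hedged sketch. It is not taken from the bot: `most_frequent_missing`, `render`, and the `{form}`/`{count}` placeholders are hypothetical names, and it assumes that filtered words are simply dropped from the list of missing forms. The real subpages use wikitext, whose placeholder syntax may differ.

```python
from collections import Counter


def most_frequent_missing(corpus_counts, wikidata_forms, filtered_words, limit=10):
    """Return the most frequent corpus forms that are neither in Wikidata
    nor on the filter list, as (form, count) pairs."""
    missing = Counter({
        form: count
        for form, count in corpus_counts.items()
        if form not in wikidata_forms and form not in filtered_words
    })
    return missing.most_common(limit)


def render(rows, head="", row_template="* {form} ({count})", foot=""):
    """Join an optional head, one formatted line per row, and an optional foot."""
    lines = [head] if head else []
    lines += [row_template.format(form=form, count=count) for form, count in rows]
    if foot:
        lines.append(foot)
    return "\n".join(lines)


# Toy data (hypothetical).
corpus_counts = Counter({"the": 3, "cat": 1, "sat": 2, "Wikipedia": 5, "mat": 1})
wikidata_forms = {"the", "cat"}
filtered_words = {"Wikipedia"}  # stands in for the contents of a "Filter" subpage

rows = most_frequent_missing(corpus_counts, wikidata_forms, filtered_words)

# Default output is a plain bullet list.
as_list = render(rows)

# Supplying a head, row and foot turns the same rows into a wikitext table.
as_table = render(
    rows,
    head='{| class="wikitable"',
    row_template="|-\n| {form} || {count}",
    foot="|}",
)
```

Keeping the row format separate from the head and foot means the same list of missing forms can be shown as bullets, a table, or rows with search links without changing how the list itself is computed.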

Further information:

Further statistics:

ar

  • Forms in Wikidata: 1,346
  • Forms in Wikipedia: 246,598
  • Tokens: 69,840,956
  • Covered forms: 210 (0.1%)
  • Missing forms: 246,388 (99.9%)
  • Covered tokens: 1,375,846 (2.0%)
  • Missing tokens: 68,465,110 (98.0%)
  • Most frequent missing forms

bg

  • Forms in Wikidata: 233
  • Forms in Wikipedia: 118,514
  • Tokens: 33,132,887
  • Covered forms: 200 (0.2%)
  • Missing forms: 118,314 (99.8%)
  • Covered tokens: 775,767 (2.3%)
  • Missing tokens: 32,357,120 (97.7%)
  • Most frequent missing forms

br

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 8,703
  • Forms in Wikipedia: 9,552
  • Tokens: 1,459,030
  • Covered forms: 1,719 (18.0%)
  • Missing forms: 7,833 (82.0%)
  • Covered tokens: 1,021,384 (70.0%)
  • Missing tokens: 437,646 (30.0%)
  • Most frequent missing forms

bn

(This analysis was performed separately from all the others on this page, using the corpus linked here and custom counting code.)

  • Forms in Wikidata: 46,504
  • Forms in Wikipedia: 534,894
  • Tokens: 13,306,025
  • Covered forms: 13,900 (2.60%)
  • Missing forms: 520,994 (97.40%)
  • Covered tokens: 5,079,352 (38.17%)
  • Missing tokens: 8,226,673 (61.83%)
  • Most frequent missing forms

bs

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 7
  • Forms in Wikipedia: 35,431
  • Tokens: 3,876,195
  • Covered forms: 4 (0.0%)
  • Missing forms: 35,427 (100.0%)
  • Covered tokens: 392 (0.0%)
  • Missing tokens: 3,875,803 (100.0%)
  • Most frequent missing forms

ca

  • Forms in Wikidata: 178
  • Forms in Wikipedia: 176,311
  • Tokens: 108,297,498
  • Covered forms: 130 (0.1%)
  • Missing forms: 176,181 (99.9%)
  • Covered tokens: 14,815,847 (13.7%)
  • Missing tokens: 93,481,651 (86.3%)
  • Most frequent missing forms

cs

  • Forms in Wikidata: 193,385
  • Forms in Wikipedia: 261,374
  • Tokens: 74,084,890
  • Covered forms: 46,136 (17.7%)
  • Missing forms: 215,238 (82.3%)
  • Covered tokens: 46,846,313 (63.2%)
  • Missing tokens: 27,238,577 (36.8%)
  • Most frequent missing forms

cy

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 125
  • Forms in Wikipedia: 10,844
  • Tokens: 1,442,683
  • Covered forms: 79 (0.7%)
  • Missing forms: 10,765 (99.3%)
  • Covered tokens: 27,189 (1.9%)
  • Missing tokens: 1,415,494 (98.1%)
  • Most frequent missing forms

da

  • Forms in Wikidata: 529,006
  • Forms in Wikipedia: 111,139
  • Tokens: 30,879,404
  • Covered forms: 56,467 (50.8%)
  • Missing forms: 54,672 (49.2%)
  • Covered tokens: 28,024,135 (90.8%)
  • Missing tokens: 2,855,269 (9.2%)
  • Most frequent missing forms

de

  • Forms in Wikidata: 204,118
  • Forms in Wikipedia: 1,008,036
  • Tokens: 596,433,479
  • Covered forms: 108,779 (10.8%)
  • Missing forms: 899,257 (89.2%)
  • Covered tokens: 474,269,921 (79.5%)
  • Missing tokens: 122,163,558 (20.5%)
  • Most frequent missing forms

el

  • Forms in Wikidata: 38,901
  • Forms in Wikipedia: 129,276
  • Tokens: 40,452,744
  • Covered forms: 16,792 (13.0%)
  • Missing forms: 112,484 (87.0%)
  • Covered tokens: 18,421,523 (45.5%)
  • Missing tokens: 22,031,221 (54.5%)
  • Most frequent missing forms

en

  • Forms in Wikidata: 110,846
  • Forms in Wikipedia: 965,225
  • Tokens: 1,508,248,447
  • Covered forms: 76,257 (7.9%)
  • Missing forms: 888,968 (92.1%)
  • Covered tokens: 1,401,680,029 (92.9%)
  • Missing tokens: 106,568,418 (7.1%)
  • Most frequent missing forms

eo

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 5,759
  • Forms in Wikipedia: 27,201
  • Tokens: 4,222,541
  • Covered forms: 2,516 (9.2%)
  • Missing forms: 24,685 (90.8%)
  • Covered tokens: 2,458,658 (58.2%)
  • Missing tokens: 1,763,883 (41.8%)
  • Most frequent missing forms

es

  • Forms in Wikidata: 526,096
  • Forms in Wikipedia: 372,589
  • Tokens: 405,914,020
  • Covered forms: 87,693 (23.5%)
  • Missing forms: 284,896 (76.5%)
  • Covered tokens: 371,639,384 (91.6%)
  • Missing tokens: 34,274,636 (8.4%)
  • Most frequent missing forms

et

  • Forms in Wikidata: 2,637,156
  • Forms in Wikipedia: 123,073
  • Tokens: 16,832,892
  • Covered forms: 72,700 (59.1%)
  • Missing forms: 50,373 (40.9%)
  • Covered tokens: 13,668,062 (81.2%)
  • Missing tokens: 3,164,830 (18.8%)
  • Most frequent missing forms

eu

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 1,002,593
  • Forms in Wikipedia: 26,466
  • Tokens: 3,138,442
  • Covered forms: 16,173 (61.1%)
  • Missing forms: 10,293 (38.9%)
  • Covered tokens: 2,381,172 (75.9%)
  • Missing tokens: 757,270 (24.1%)
  • Most frequent missing forms

fa

  • Forms in Wikidata: 38,562
  • Forms in Wikipedia: 100,251
  • Tokens: 44,426,012
  • Covered forms: 8,601 (8.6%)
  • Missing forms: 91,650 (91.4%)
  • Covered tokens: 15,267,539 (34.4%)
  • Missing tokens: 29,158,473 (65.6%)
  • Most frequent missing forms

fi

  • Forms in Wikidata: 8,906
  • Forms in Wikipedia: 276,898
  • Tokens: 46,847,582
  • Covered forms: 5,120 (1.8%)
  • Missing forms: 271,778 (98.2%)
  • Covered tokens: 12,720,120 (27.2%)
  • Missing tokens: 34,127,462 (72.8%)
  • Most frequent missing forms

fr

  • Forms in Wikidata: 251,929
  • Forms in Wikipedia: 465,138
  • Tokens: 474,988,250
  • Covered forms: 54,526 (11.7%)
  • Missing forms: 410,612 (88.3%)
  • Covered tokens: 415,313,730 (87.4%)
  • Missing tokens: 59,674,520 (12.6%)
  • Most frequent missing forms

ha

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 1,174
  • Forms in Wikipedia: 4,816
  • Tokens: 859,259
  • Covered forms: 438 (9.1%)
  • Missing forms: 4,378 (90.9%)
  • Covered tokens: 304,017 (35.4%)
  • Missing tokens: 555,242 (64.6%)
  • Most frequent missing forms

he

  • Forms in Wikidata: 328,190
  • Forms in Wikipedia: 249,890
  • Tokens: 76,643,376
  • Covered forms: 54,458 (21.8%)
  • Missing forms: 195,432 (78.2%)
  • Covered tokens: 41,906,877 (54.7%)
  • Missing tokens: 34,736,499 (45.3%)
  • Most frequent missing forms

hi

  • Forms in Wikidata: 7,585
  • Forms in Wikipedia: 54,443
  • Tokens: 18,734,831
  • Covered forms: 3,050 (5.6%)
  • Missing forms: 51,393 (94.4%)
  • Covered tokens: 12,433,553 (66.4%)
  • Missing tokens: 6,301,278 (33.6%)
  • Most frequent missing forms

hr

  • Forms in Wikidata: 4,868
  • Forms in Wikipedia: 135,627
  • Tokens: 28,543,040
  • Covered forms: 2,816 (2.1%)
  • Missing forms: 132,811 (97.9%)
  • Covered tokens: 13,433,679 (47.1%)
  • Missing tokens: 15,109,361 (52.9%)
  • Most frequent missing forms

hu

  • Forms in Wikidata: 154
  • Forms in Wikipedia: 274,652
  • Tokens: 64,674,851
  • Covered forms: 100 (0.0%)
  • Missing forms: 274,552 (100.0%)
  • Covered tokens: 268,172 (0.4%)
  • Missing tokens: 64,406,679 (99.6%)
  • Most frequent missing forms

id

  • Forms in Wikidata: 392,850
  • Forms in Wikipedia: 100,137
  • Tokens: 40,049,055
  • Covered forms: 16,561 (16.5%)
  • Missing forms: 83,576 (83.5%)
  • Covered tokens: 22,919,883 (57.2%)
  • Missing tokens: 17,129,172 (42.8%)
  • Most frequent missing forms

ig

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 3,171
  • Forms in Wikipedia: 1,153
  • Tokens: 113,878
  • Covered forms: 421 (36.5%)
  • Missing forms: 732 (63.5%)
  • Covered tokens: 74,877 (65.8%)
  • Missing tokens: 39,001 (34.2%)
  • Most frequent missing forms

it

  • Forms in Wikidata: 405,463
  • Forms in Wikipedia: 341,080
  • Tokens: 284,500,580
  • Covered forms: 95,286 (27.9%)
  • Missing forms: 245,794 (72.1%)
  • Covered tokens: 260,323,554 (91.5%)
  • Missing tokens: 24,177,026 (8.5%)
  • Most frequent missing forms

ko

  • Forms in Wikidata: 495
  • Forms in Wikipedia: 290,844
  • Tokens: 34,282,183
  • Covered forms: 403 (0.1%)
  • Missing forms: 290,441 (99.9%)
  • Covered tokens: 2,449,289 (7.1%)
  • Missing tokens: 31,832,894 (92.9%)
  • Most frequent missing forms

la

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 778,655
  • Forms in Wikipedia: 11,551
  • Tokens: 1,031,544
  • Covered forms: 8,317 (72.0%)
  • Missing forms: 3,234 (28.0%)
  • Covered tokens: 884,683 (85.8%)
  • Missing tokens: 146,861 (14.2%)
  • Most frequent missing forms

lb

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 874
  • Forms in Wikipedia: 10,240
  • Tokens: 1,365,293
  • Covered forms: 369 (3.6%)
  • Missing forms: 9,871 (96.4%)
  • Covered tokens: 488,116 (35.8%)
  • Missing tokens: 877,177 (64.2%)
  • Most frequent missing forms

lt

  • Forms in Wikidata: 84
  • Forms in Wikipedia: 92,063
  • Tokens: 13,288,668
  • Covered forms: 39 (0.0%)
  • Missing forms: 92,024 (100.0%)
  • Covered tokens: 62,019 (0.5%)
  • Missing tokens: 13,226,649 (99.5%)
  • Most frequent missing forms

lv

  • Forms in Wikidata: 1,863
  • Forms in Wikipedia: 60,189
  • Tokens: 8,004,635
  • Covered forms: 1,111 (1.8%)
  • Missing forms: 59,078 (98.2%)
  • Covered tokens: 2,344,086 (29.3%)
  • Missing tokens: 5,660,549 (70.7%)
  • Most frequent missing forms

ml

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 748,786
  • Forms in Wikipedia: 28,789
  • Tokens: 1,992,352
  • Covered forms: 8,837 (30.7%)
  • Missing forms: 19,952 (69.3%)
  • Covered tokens: 1,074,456 (53.9%)
  • Missing tokens: 917,896 (46.1%)
  • Most frequent missing forms

ms

  • Forms in Wikidata: 4,239
  • Forms in Wikipedia: 51,515
  • Tokens: 16,143,659
  • Covered forms: 3,309 (6.4%)
  • Missing forms: 48,206 (93.6%)
  • Covered tokens: 11,829,507 (73.3%)
  • Missing tokens: 4,314,152 (26.7%)
  • Most frequent missing forms

mt

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 607
  • Forms in Wikipedia: 5,941
  • Tokens: 371,515
  • Covered forms: 296 (5.0%)
  • Missing forms: 5,645 (95.0%)
  • Covered tokens: 127,767 (34.4%)
  • Missing tokens: 243,748 (65.6%)
  • Most frequent missing forms

nan

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 42
  • Forms in Wikipedia: 5,791
  • Tokens: 1,119,898
  • Covered forms: 2 (0.0%)
  • Missing forms: 5,789 (100.0%)
  • Covered tokens: 63 (0.0%)
  • Missing tokens: 1,119,835 (100.0%)
  • Most frequent missing forms

nb

  • Forms in Wikidata: 156,060
  • Forms in Wikipedia: 153,555
  • Tokens: 49,620,256
  • Covered forms: 47,918 (31.2%)
  • Missing forms: 105,637 (68.8%)
  • Covered tokens: 44,362,321 (89.4%)
  • Missing tokens: 5,257,935 (10.6%)
  • Most frequent missing forms

nl

  • Forms in Wikidata: 921
  • Forms in Wikipedia: 260,266
  • Tokens: 130,343,371
  • Covered forms: 647 (0.2%)
  • Missing forms: 259,619 (99.8%)
  • Covered tokens: 38,067,221 (29.2%)
  • Missing tokens: 92,276,150 (70.8%)
  • Most frequent missing forms

nn

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 86,379
  • Forms in Wikipedia: 23,956
  • Tokens: 4,198,152
  • Covered forms: 8,735 (36.5%)
  • Missing forms: 15,221 (63.5%)
  • Covered tokens: 3,429,112 (81.7%)
  • Missing tokens: 769,040 (18.3%)
  • Most frequent missing forms

pa

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 15,629
  • Forms in Wikipedia: 21,156
  • Tokens: 4,611,923
  • Covered forms: 3,319 (15.7%)
  • Missing forms: 17,837 (84.3%)
  • Covered tokens: 3,380,564 (73.3%)
  • Missing tokens: 1,231,359 (26.7%)
  • Most frequent missing forms

pl

  • Forms in Wikidata: 18,922
  • Forms in Wikipedia: 333,225
  • Tokens: 117,356,732
  • Covered forms: 7,411 (2.2%)
  • Missing forms: 325,814 (97.8%)
  • Covered tokens: 43,226,962 (36.8%)
  • Missing tokens: 74,129,770 (63.2%)
  • Most frequent missing forms

pnb

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 15,681
  • Forms in Wikipedia: 21,465
  • Tokens: 5,029,117
  • Covered forms: 1,850 (8.6%)
  • Missing forms: 19,615 (91.4%)
  • Covered tokens: 2,438,667 (48.5%)
  • Missing tokens: 2,590,450 (51.5%)
  • Most frequent missing forms

pt

  • Forms in Wikidata: 34,520
  • Forms in Wikipedia: 214,847
  • Tokens: 158,056,230
  • Covered forms: 13,767 (6.4%)
  • Missing forms: 201,080 (93.6%)
  • Covered tokens: 118,278,858 (74.8%)
  • Missing tokens: 39,777,372 (25.2%)
  • Most frequent missing forms

ro

  • Forms in Wikidata: 56
  • Forms in Wikipedia: 119,245
  • Tokens: 40,889,103
  • Covered forms: 47 (0.0%)
  • Missing forms: 119,198 (100.0%)
  • Covered tokens: 370,880 (0.9%)
  • Missing tokens: 40,518,223 (99.1%)
  • Most frequent missing forms

ru

  • Forms in Wikidata: 912,866
  • Forms in Wikipedia: 651,825
  • Tokens: 290,067,562
  • Covered forms: 139,752 (21.4%)
  • Missing forms: 512,073 (78.6%)
  • Covered tokens: 177,467,539 (61.2%)
  • Missing tokens: 112,600,023 (38.8%)
  • Most frequent missing forms

sd

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 968
  • Forms in Wikipedia: 11,146
  • Tokens: 1,533,326
  • Covered forms: 80 (0.7%)
  • Missing forms: 11,066 (99.3%)
  • Covered tokens: 394,013 (25.7%)
  • Missing tokens: 1,139,313 (74.3%)
  • Most frequent missing forms

se

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 57,028
  • Forms in Wikipedia: 1,705
  • Tokens: 95,453
  • Covered forms: 50 (2.9%)
  • Missing forms: 1,655 (97.1%)
  • Covered tokens: 3,279 (3.4%)
  • Missing tokens: 92,174 (96.6%)
  • Most frequent missing forms

sk

  • Forms in Wikidata: 128,106
  • Forms in Wikipedia: 109,573
  • Tokens: 18,366,700
  • Covered forms: 46,034 (42.0%)
  • Missing forms: 63,539 (58.0%)
  • Covered tokens: 12,451,506 (67.8%)
  • Missing tokens: 5,915,194 (32.2%)
  • Most frequent missing forms

sl

  • Forms in Wikidata: 94
  • Forms in Wikipedia: 106,577
  • Tokens: 19,924,659
  • Covered forms: 76 (0.1%)
  • Missing forms: 106,501 (99.9%)
  • Covered tokens: 114,079 (0.6%)
  • Missing tokens: 19,810,580 (99.4%)
  • Most frequent missing forms

sr

  • Forms in Wikidata: 32
  • Forms in Wikipedia: 183,777
  • Tokens: 42,439,136
  • Covered forms: 23 (0.0%)
  • Missing forms: 183,754 (100.0%)
  • Covered tokens: 127,781 (0.3%)
  • Missing tokens: 42,311,355 (99.7%)
  • Most frequent missing forms

sv

  • Forms in Wikidata: 260,572
  • Forms in Wikipedia: 219,718
  • Tokens: 72,173,155
  • Covered forms: 66,994 (30.5%)
  • Missing forms: 152,724 (69.5%)
  • Covered tokens: 64,272,412 (89.1%)
  • Missing tokens: 7,900,743 (10.9%)
  • Most frequent missing forms

ta

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 5,790
  • Forms in Wikipedia: 31,721
  • Tokens: 2,539,025
  • Covered forms: 988 (3.1%)
  • Missing forms: 30,733 (96.9%)
  • Covered tokens: 265,649 (10.5%)
  • Missing tokens: 2,273,376 (89.5%)
  • Most frequent missing forms

tg

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 105
  • Forms in Wikipedia: 9,793
  • Tokens: 1,252,518
  • Covered forms: 32 (0.3%)
  • Missing forms: 9,761 (99.7%)
  • Covered tokens: 4,586 (0.4%)
  • Missing tokens: 1,247,932 (99.6%)
  • Most frequent missing forms

th

  • Forms in Wikidata: 15
  • Forms in Wikipedia: 27,089
  • Tokens: 2,068,858
  • Covered forms: 13 (0.0%)
  • Missing forms: 27,076 (100.0%)
  • Covered tokens: 4,870 (0.2%)
  • Missing tokens: 2,063,988 (99.8%)
  • Most frequent missing forms

tl

  • Forms in Wikidata: 27
  • Forms in Wikipedia: 20,893
  • Tokens: 3,583,109
  • Covered forms: 19 (0.1%)
  • Missing forms: 20,874 (99.9%)
  • Covered tokens: 21,296 (0.6%)
  • Missing tokens: 3,561,813 (99.4%)
  • Most frequent missing forms

tr

  • Forms in Wikidata: 2,118
  • Forms in Wikipedia: 151,341
  • Tokens: 30,211,406
  • Covered forms: 1,326 (0.9%)
  • Missing forms: 150,015 (99.1%)
  • Covered tokens: 6,790,230 (22.5%)
  • Missing tokens: 23,421,176 (77.5%)
  • Most frequent missing forms

uk

  • Forms in Wikidata: 238,597
  • Forms in Wikipedia: 356,409
  • Tokens: 114,386,141
  • Covered forms: 26,771 (7.5%)
  • Missing forms: 329,638 (92.5%)
  • Covered tokens: 16,775,930 (14.7%)
  • Missing tokens: 97,610,211 (85.3%)
  • Most frequent missing forms

ur

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 7,670
  • Forms in Wikipedia: 17,576
  • Tokens: 4,872,849
  • Covered forms: 1,364 (7.8%)
  • Missing forms: 16,212 (92.2%)
  • Covered tokens: 2,285,370 (46.9%)
  • Missing tokens: 2,587,479 (53.1%)
  • Most frequent missing forms

vi

  • Forms in Wikidata: 46
  • Forms in Wikipedia: 60,377
  • Tokens: 75,656,151
  • Covered forms: 29 (0.0%)
  • Missing forms: 60,348 (100.0%)
  • Covered tokens: 3,180,600 (4.2%)
  • Missing tokens: 72,475,551 (95.8%)
  • Most frequent missing forms