Wikidata:Lexicographical coverage

This page shows how well Wikidata's lexicographical data covers a corpus in each language, that is, what share of the word forms and tokens found in the corpus already exist as forms on Wikidata lexemes. Unless the entry for a language says otherwise, the corpus is based on Wikipedia (available here).
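
As a rough illustration of what the figures in each entry mean, the sketch below computes form and token coverage for a toy corpus. It is not the bot's code: the function names, the dictionary keys and the whitespace tokenisation are assumptions made for this example only.

```python
from collections import Counter


def coverage(corpus_counts, lexicon_forms):
    """Compare corpus word-form counts against a set of known forms.

    corpus_counts maps each distinct form in the corpus to its number
    of occurrences (tokens); lexicon_forms is the set of forms that
    exist in the lexicon. Both names are hypothetical.
    """
    total_forms = len(corpus_counts)
    total_tokens = sum(corpus_counts.values())
    covered = {f: n for f, n in corpus_counts.items() if f in lexicon_forms}
    covered_forms = len(covered)
    covered_tokens = sum(covered.values())
    return {
        "forms_in_lexicon": len(lexicon_forms),
        "forms_in_corpus": total_forms,
        "tokens": total_tokens,
        "covered_forms": covered_forms,
        "missing_forms": total_forms - covered_forms,
        "covered_tokens": covered_tokens,
        "missing_tokens": total_tokens - covered_tokens,
    }


def most_frequent_missing(corpus_counts, lexicon_forms, limit=10):
    """Return the most frequent corpus forms that the lexicon lacks."""
    missing = Counter(
        {f: n for f, n in corpus_counts.items() if f not in lexicon_forms}
    )
    return missing.most_common(limit)


# Toy data: a whitespace-tokenised sentence and a tiny lexicon.
corpus_counts = Counter("the cat sat on the mat and the dog sat on".split())
lexicon_forms = {"the", "cat", "sat", "dogs"}

stats = coverage(corpus_counts, lexicon_forms)
top_missing = most_frequent_missing(corpus_counts, lexicon_forms, limit=1)
```

Because frequent words account for most tokens, token coverage is usually much higher than form coverage, which is the pattern visible in most entries below. The lexicon can also contain more forms than the corpus does, since many forms in the lexicon never occur in the corpus.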

These pages are updated weekly on Wednesdays by NikkiBot, although no edit will be made if nothing has changed.

The code for the bot is at https://github.com/nikkiwd/lexcover and is based on the original PAWS notebook by Denny. Report issues to Nikki (either on User talk:Nikki or on Telegram). Requests for additional languages, improvements and suggestions are also welcome.

To filter words out of the list, add them to the "Filter" subpage for the language (e.g. Wikidata:Lexicographical coverage/nb/Filter). To customise the entries in the list, e.g. to add search links, edit the "Missing/row" subpage (e.g. Wikidata:Lexicographical coverage/nb/Missing/row). To add content before and after the list, e.g. if you want the output to be a table, edit the "Missing/head" and "Missing/foot" subpages.
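
The way the filter, row, head and foot pieces fit together can be pictured with the following sketch. It is only an illustration: the real subpages are wikitext, and the function name, the `{form}` and `{count}` placeholders and the default row format are all hypothetical, not the bot's actual template syntax.

```python
def render_missing_list(missing, filtered, row="# {form} ({count})",
                        head="", foot=""):
    """Build the text of a "missing forms" list.

    missing  : list of (form, count) pairs, most frequent first
    filtered : set of forms to leave out (the role of the Filter subpage)
    row      : format string applied to each remaining entry
    head/foot: optional text placed before and after the rows
    """
    lines = []
    if head:
        lines.append(head)
    for form, count in missing:
        if form in filtered:
            continue  # skip anything listed on the filter page
        lines.append(row.format(form=form, count=count))
    if foot:
        lines.append(foot)
    return "\n".join(lines)


# Example: output the list as a wikitext table instead of a numbered list.
table = render_missing_list(
    [("on", 2), ("mat", 1), ("2019", 1)],
    filtered={"2019"},
    row="|-\n| {form} || {count}",
    head='{| class="wikitable"',
    foot="|}",
)
```

Keeping the row, head and foot separate means the same list of missing forms can be shown as a plain list, a table or a set of search links without changing how the list itself is computed.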

More statistics:

  • Forms in Wikidata: 1,321
  • Forms in Wikipedia: 246,598
  • Tokens: 69,840,956
  • Covered forms: 208 (0.1%)
  • Missing forms: 246,390 (99.9%)
  • Covered tokens: 1,375,457 (2.0%)
  • Missing tokens: 68,465,499 (98.0%)
  • Most frequent missing forms
  • Forms in Wikidata: 233
  • Forms in Wikipedia: 118,514
  • Tokens: 33,132,887
  • Covered forms: 200 (0.2%)
  • Missing forms: 118,314 (99.8%)
  • Covered tokens: 775,767 (2.3%)
  • Missing tokens: 32,357,120 (97.7%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 8,651
  • Forms in Wikipedia: 9,552
  • Tokens: 1,459,030
  • Covered forms: 1,707 (17.9%)
  • Missing forms: 7,845 (82.1%)
  • Covered tokens: 1,009,815 (69.2%)
  • Missing tokens: 449,215 (30.8%)
  • Most frequent missing forms
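
The figures within an entry are related by simple arithmetic: covered plus missing equals the total, and each percentage is the corresponding share of that total. A small check using the numbers from the entry directly above is sketched here; the helper name is made up, and rounding to one decimal place is an assumption about how the page is generated.

```python
def check_entry(total, covered, missing, covered_pct, missing_pct):
    """Return True if the counts add up and the percentages match."""
    if covered + missing != total:
        return False
    return (
        f"{100 * covered / total:.1f}" == covered_pct
        and f"{100 * missing / total:.1f}" == missing_pct
    )


# Forms: 9,552 in the corpus, 1,707 covered, 7,845 missing.
forms_ok = check_entry(9_552, 1_707, 7_845, "17.9", "82.1")

# Tokens: 1,459,030 in total, 1,009,815 covered, 449,215 missing.
tokens_ok = check_entry(1_459_030, 1_009_815, 449_215, "69.2", "30.8")
```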

(This analysis was performed separately from all the others on this page, using the corpus linked here and custom counting code.)

  • Forms in Wikidata: 46,504
  • Forms in Wikipedia: 534,894
  • Tokens: 13,306,025
  • Covered forms: 13,900 (2.60%)
  • Missing forms: 520,994 (97.40%)
  • Covered tokens: 5,079,352 (38.17%)
  • Missing tokens: 8,226,673 (61.83%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 7
  • Forms in Wikipedia: 35,431
  • Tokens: 3,876,195
  • Covered forms: 4 (0.0%)
  • Missing forms: 35,427 (100.0%)
  • Covered tokens: 392 (0.0%)
  • Missing tokens: 3,875,803 (100.0%)
  • Most frequent missing forms
  • Forms in Wikidata: 178
  • Forms in Wikipedia: 176,311
  • Tokens: 108,297,498
  • Covered forms: 130 (0.1%)
  • Missing forms: 176,181 (99.9%)
  • Covered tokens: 14,815,847 (13.7%)
  • Missing tokens: 93,481,651 (86.3%)
  • Most frequent missing forms
  • Forms in Wikidata: 193,380
  • Forms in Wikipedia: 261,374
  • Tokens: 74,084,890
  • Covered forms: 46,136 (17.7%)
  • Missing forms: 215,238 (82.3%)
  • Covered tokens: 46,846,313 (63.2%)
  • Missing tokens: 27,238,577 (36.8%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 125
  • Forms in Wikipedia: 10,844
  • Tokens: 1,442,683
  • Covered forms: 79 (0.7%)
  • Missing forms: 10,765 (99.3%)
  • Covered tokens: 27,189 (1.9%)
  • Missing tokens: 1,415,494 (98.1%)
  • Most frequent missing forms
  • Forms in Wikidata: 527,906
  • Forms in Wikipedia: 111,139
  • Tokens: 30,879,404
  • Covered forms: 56,365 (50.7%)
  • Missing forms: 54,774 (49.3%)
  • Covered tokens: 28,008,930 (90.7%)
  • Missing tokens: 2,870,474 (9.3%)
  • Most frequent missing forms
  • Forms in Wikidata: 204,047
  • Forms in Wikipedia: 1,008,036
  • Tokens: 596,433,479
  • Covered forms: 108,758 (10.8%)
  • Missing forms: 899,278 (89.2%)
  • Covered tokens: 474,266,894 (79.5%)
  • Missing tokens: 122,166,585 (20.5%)
  • Most frequent missing forms
  • Forms in Wikidata: 38,901
  • Forms in Wikipedia: 129,276
  • Tokens: 40,452,744
  • Covered forms: 16,792 (13.0%)
  • Missing forms: 112,484 (87.0%)
  • Covered tokens: 18,421,523 (45.5%)
  • Missing tokens: 22,031,221 (54.5%)
  • Most frequent missing forms
  • Forms in Wikidata: 110,752
  • Forms in Wikipedia: 965,225
  • Tokens: 1,508,248,447
  • Covered forms: 76,243 (7.9%)
  • Missing forms: 888,982 (92.1%)
  • Covered tokens: 1,401,671,705 (92.9%)
  • Missing tokens: 106,576,742 (7.1%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 5,759
  • Forms in Wikipedia: 27,201
  • Tokens: 4,222,541
  • Covered forms: 2,516 (9.2%)
  • Missing forms: 24,685 (90.8%)
  • Covered tokens: 2,458,658 (58.2%)
  • Missing tokens: 1,763,883 (41.8%)
  • Most frequent missing forms
  • Forms in Wikidata: 523,788
  • Forms in Wikipedia: 372,589
  • Tokens: 405,914,020
  • Covered forms: 87,603 (23.5%)
  • Missing forms: 284,986 (76.5%)
  • Covered tokens: 371,630,501 (91.6%)
  • Missing tokens: 34,283,519 (8.4%)
  • Most frequent missing forms
  • Forms in Wikidata: 2,637,156
  • Forms in Wikipedia: 123,073
  • Tokens: 16,832,892
  • Covered forms: 72,700 (59.1%)
  • Missing forms: 50,373 (40.9%)
  • Covered tokens: 13,668,062 (81.2%)
  • Missing tokens: 3,164,830 (18.8%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 1,002,593
  • Forms in Wikipedia: 26,466
  • Tokens: 3,138,442
  • Covered forms: 16,173 (61.1%)
  • Missing forms: 10,293 (38.9%)
  • Covered tokens: 2,381,172 (75.9%)
  • Missing tokens: 757,270 (24.1%)
  • Most frequent missing forms
  • Forms in Wikidata: 38,669
  • Forms in Wikipedia: 100,251
  • Tokens: 44,426,012
  • Covered forms: 8,675 (8.7%)
  • Missing forms: 91,576 (91.3%)
  • Covered tokens: 15,439,001 (34.8%)
  • Missing tokens: 28,987,011 (65.2%)
  • Most frequent missing forms
  • Forms in Wikidata: 8,830
  • Forms in Wikipedia: 276,898
  • Tokens: 46,847,582
  • Covered forms: 5,106 (1.8%)
  • Missing forms: 271,792 (98.2%)
  • Covered tokens: 12,719,489 (27.2%)
  • Missing tokens: 34,128,093 (72.8%)
  • Most frequent missing forms
  • Forms in Wikidata: 251,898
  • Forms in Wikipedia: 465,138
  • Tokens: 474,988,250
  • Covered forms: 54,502 (11.7%)
  • Missing forms: 410,636 (88.3%)
  • Covered tokens: 415,303,686 (87.4%)
  • Missing tokens: 59,684,564 (12.6%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 1,163
  • Forms in Wikipedia: 4,816
  • Tokens: 859,259
  • Covered forms: 437 (9.1%)
  • Missing forms: 4,379 (90.9%)
  • Covered tokens: 303,991 (35.4%)
  • Missing tokens: 555,268 (64.6%)
  • Most frequent missing forms
  • Forms in Wikidata: 328,189
  • Forms in Wikipedia: 249,890
  • Tokens: 76,643,376
  • Covered forms: 54,458 (21.8%)
  • Missing forms: 195,432 (78.2%)
  • Covered tokens: 41,906,877 (54.7%)
  • Missing tokens: 34,736,499 (45.3%)
  • Most frequent missing forms
  • Forms in Wikidata: 7,546
  • Forms in Wikipedia: 54,443
  • Tokens: 18,734,831
  • Covered forms: 3,040 (5.6%)
  • Missing forms: 51,403 (94.4%)
  • Covered tokens: 12,433,105 (66.4%)
  • Missing tokens: 6,301,726 (33.6%)
  • Most frequent missing forms
  • Forms in Wikidata: 4,868
  • Forms in Wikipedia: 135,627
  • Tokens: 28,543,040
  • Covered forms: 2,816 (2.1%)
  • Missing forms: 132,811 (97.9%)
  • Covered tokens: 13,433,679 (47.1%)
  • Missing tokens: 15,109,361 (52.9%)
  • Most frequent missing forms
  • Forms in Wikidata: 154
  • Forms in Wikipedia: 274,652
  • Tokens: 64,674,851
  • Covered forms: 100 (0.0%)
  • Missing forms: 274,552 (100.0%)
  • Covered tokens: 268,172 (0.4%)
  • Missing tokens: 64,406,679 (99.6%)
  • Most frequent missing forms
  • Forms in Wikidata: 392,850
  • Forms in Wikipedia: 100,137
  • Tokens: 40,049,055
  • Covered forms: 16,561 (16.5%)
  • Missing forms: 83,576 (83.5%)
  • Covered tokens: 22,919,883 (57.2%)
  • Missing tokens: 17,129,172 (42.8%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 3,171
  • Forms in Wikipedia: 1,153
  • Tokens: 113,878
  • Covered forms: 421 (36.5%)
  • Missing forms: 732 (63.5%)
  • Covered tokens: 74,877 (65.8%)
  • Missing tokens: 39,001 (34.2%)
  • Most frequent missing forms
  • Forms in Wikidata: 405,316
  • Forms in Wikipedia: 341,080
  • Tokens: 284,500,580
  • Covered forms: 94,654 (27.8%)
  • Missing forms: 246,426 (72.2%)
  • Covered tokens: 259,212,966 (91.1%)
  • Missing tokens: 25,287,614 (8.9%)
  • Most frequent missing forms
  • Forms in Wikidata: 491
  • Forms in Wikipedia: 290,844
  • Tokens: 34,282,183
  • Covered forms: 401 (0.1%)
  • Missing forms: 290,443 (99.9%)
  • Covered tokens: 2,448,961 (7.1%)
  • Missing tokens: 31,833,222 (92.9%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 778,649
  • Forms in Wikipedia: 11,551
  • Tokens: 1,031,544
  • Covered forms: 8,317 (72.0%)
  • Missing forms: 3,234 (28.0%)
  • Covered tokens: 884,683 (85.8%)
  • Missing tokens: 146,861 (14.2%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 874
  • Forms in Wikipedia: 10,240
  • Tokens: 1,365,293
  • Covered forms: 369 (3.6%)
  • Missing forms: 9,871 (96.4%)
  • Covered tokens: 488,116 (35.8%)
  • Missing tokens: 877,177 (64.2%)
  • Most frequent missing forms
  • Forms in Wikidata: 84
  • Forms in Wikipedia: 92,063
  • Tokens: 13,288,668
  • Covered forms: 39 (0.0%)
  • Missing forms: 92,024 (100.0%)
  • Covered tokens: 62,019 (0.5%)
  • Missing tokens: 13,226,649 (99.5%)
  • Most frequent missing forms
  • Forms in Wikidata: 1,780
  • Forms in Wikipedia: 60,189
  • Tokens: 8,004,635
  • Covered forms: 1,048 (1.7%)
  • Missing forms: 59,141 (98.3%)
  • Covered tokens: 2,297,967 (28.7%)
  • Missing tokens: 5,706,668 (71.3%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 748,786
  • Forms in Wikipedia: 28,789
  • Tokens: 1,992,352
  • Covered forms: 8,837 (30.7%)
  • Missing forms: 19,952 (69.3%)
  • Covered tokens: 1,074,456 (53.9%)
  • Missing tokens: 917,896 (46.1%)
  • Most frequent missing forms
  • Forms in Wikidata: 4,170
  • Forms in Wikipedia: 51,515
  • Tokens: 16,143,659
  • Covered forms: 3,268 (6.3%)
  • Missing forms: 48,247 (93.7%)
  • Covered tokens: 11,802,729 (73.1%)
  • Missing tokens: 4,340,930 (26.9%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 605
  • Forms in Wikipedia: 5,941
  • Tokens: 371,515
  • Covered forms: 296 (5.0%)
  • Missing forms: 5,645 (95.0%)
  • Covered tokens: 127,767 (34.4%)
  • Missing tokens: 243,748 (65.6%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 42
  • Forms in Wikipedia: 5,791
  • Tokens: 1,119,898
  • Covered forms: 2 (0.0%)
  • Missing forms: 5,789 (100.0%)
  • Covered tokens: 63 (0.0%)
  • Missing tokens: 1,119,835 (100.0%)
  • Most frequent missing forms
  • Forms in Wikidata: 154,811
  • Forms in Wikipedia: 153,555
  • Tokens: 49,620,256
  • Covered forms: 47,736 (31.1%)
  • Missing forms: 105,819 (68.9%)
  • Covered tokens: 44,346,741 (89.4%)
  • Missing tokens: 5,273,515 (10.6%)
  • Most frequent missing forms
  • Forms in Wikidata: 912
  • Forms in Wikipedia: 260,266
  • Tokens: 130,343,371
  • Covered forms: 638 (0.2%)
  • Missing forms: 259,628 (99.8%)
  • Covered tokens: 38,051,158 (29.2%)
  • Missing tokens: 92,292,213 (70.8%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 85,080
  • Forms in Wikipedia: 23,956
  • Tokens: 4,198,152
  • Covered forms: 8,674 (36.2%)
  • Missing forms: 15,282 (63.8%)
  • Covered tokens: 3,421,951 (81.5%)
  • Missing tokens: 776,201 (18.5%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 15,596
  • Forms in Wikipedia: 21,156
  • Tokens: 4,611,923
  • Covered forms: 3,312 (15.7%)
  • Missing forms: 17,844 (84.3%)
  • Covered tokens: 3,380,398 (73.3%)
  • Missing tokens: 1,231,525 (26.7%)
  • Most frequent missing forms
  • Forms in Wikidata: 18,892
  • Forms in Wikipedia: 333,225
  • Tokens: 117,356,732
  • Covered forms: 7,399 (2.2%)
  • Missing forms: 325,826 (97.8%)
  • Covered tokens: 43,225,789 (36.8%)
  • Missing tokens: 74,130,943 (63.2%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 15,643
  • Forms in Wikipedia: 21,465
  • Tokens: 5,029,117
  • Covered forms: 1,844 (8.6%)
  • Missing forms: 19,621 (91.4%)
  • Covered tokens: 2,437,803 (48.5%)
  • Missing tokens: 2,591,314 (51.5%)
  • Most frequent missing forms
  • Forms in Wikidata: 34,520
  • Forms in Wikipedia: 214,847
  • Tokens: 158,056,230
  • Covered forms: 13,767 (6.4%)
  • Missing forms: 201,080 (93.6%)
  • Covered tokens: 118,278,858 (74.8%)
  • Missing tokens: 39,777,372 (25.2%)
  • Most frequent missing forms
  • Forms in Wikidata: 56
  • Forms in Wikipedia: 119,245
  • Tokens: 40,889,103
  • Covered forms: 47 (0.0%)
  • Missing forms: 119,198 (100.0%)
  • Covered tokens: 370,880 (0.9%)
  • Missing tokens: 40,518,223 (99.1%)
  • Most frequent missing forms
  • Forms in Wikidata: 912,842
  • Forms in Wikipedia: 651,825
  • Tokens: 290,067,562
  • Covered forms: 139,743 (21.4%)
  • Missing forms: 512,082 (78.6%)
  • Covered tokens: 177,464,353 (61.2%)
  • Missing tokens: 112,603,209 (38.8%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 968
  • Forms in Wikipedia: 11,146
  • Tokens: 1,533,326
  • Covered forms: 80 (0.7%)
  • Missing forms: 11,066 (99.3%)
  • Covered tokens: 394,013 (25.7%)
  • Missing tokens: 1,139,313 (74.3%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 57,028
  • Forms in Wikipedia: 1,705
  • Tokens: 95,453
  • Covered forms: 50 (2.9%)
  • Missing forms: 1,655 (97.1%)
  • Covered tokens: 3,279 (3.4%)
  • Missing tokens: 92,174 (96.6%)
  • Most frequent missing forms
  • Forms in Wikidata: 128,106
  • Forms in Wikipedia: 109,573
  • Tokens: 18,366,700
  • Covered forms: 46,034 (42.0%)
  • Missing forms: 63,539 (58.0%)
  • Covered tokens: 12,451,506 (67.8%)
  • Missing tokens: 5,915,194 (32.2%)
  • Most frequent missing forms
  • Forms in Wikidata: 94
  • Forms in Wikipedia: 106,577
  • Tokens: 19,924,659
  • Covered forms: 76 (0.1%)
  • Missing forms: 106,501 (99.9%)
  • Covered tokens: 114,079 (0.6%)
  • Missing tokens: 19,810,580 (99.4%)
  • Most frequent missing forms
  • Forms in Wikidata: 32
  • Forms in Wikipedia: 183,777
  • Tokens: 42,439,136
  • Covered forms: 23 (0.0%)
  • Missing forms: 183,754 (100.0%)
  • Covered tokens: 127,781 (0.3%)
  • Missing tokens: 42,311,355 (99.7%)
  • Most frequent missing forms
  • Forms in Wikidata: 260,534
  • Forms in Wikipedia: 219,718
  • Tokens: 72,173,155
  • Covered forms: 66,988 (30.5%)
  • Missing forms: 152,730 (69.5%)
  • Covered tokens: 64,271,303 (89.1%)
  • Missing tokens: 7,901,852 (10.9%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 5,790
  • Forms in Wikipedia: 31,721
  • Tokens: 2,539,025
  • Covered forms: 988 (3.1%)
  • Missing forms: 30,733 (96.9%)
  • Covered tokens: 265,649 (10.5%)
  • Missing tokens: 2,273,376 (89.5%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 116
  • Forms in Wikipedia: 9,793
  • Tokens: 1,252,518
  • Covered forms: 32 (0.3%)
  • Missing forms: 9,761 (99.7%)
  • Covered tokens: 4,586 (0.4%)
  • Missing tokens: 1,247,932 (99.6%)
  • Most frequent missing forms
  • Forms in Wikidata: 15
  • Forms in Wikipedia: 27,089
  • Tokens: 2,068,858
  • Covered forms: 13 (0.0%)
  • Missing forms: 27,076 (100.0%)
  • Covered tokens: 4,870 (0.2%)
  • Missing tokens: 2,063,988 (99.8%)
  • Most frequent missing forms
  • Forms in Wikidata: 27
  • Forms in Wikipedia: 20,893
  • Tokens: 3,583,109
  • Covered forms: 19 (0.1%)
  • Missing forms: 20,874 (99.9%)
  • Covered tokens: 21,296 (0.6%)
  • Missing tokens: 3,561,813 (99.4%)
  • Most frequent missing forms
  • Forms in Wikidata: 2,111
  • Forms in Wikipedia: 151,341
  • Tokens: 30,211,406
  • Covered forms: 1,323 (0.9%)
  • Missing forms: 150,018 (99.1%)
  • Covered tokens: 6,789,825 (22.5%)
  • Missing tokens: 23,421,581 (77.5%)
  • Most frequent missing forms
  • Forms in Wikidata: 238,597
  • Forms in Wikipedia: 356,409
  • Tokens: 114,386,141
  • Covered forms: 26,771 (7.5%)
  • Missing forms: 329,638 (92.5%)
  • Covered tokens: 16,775,930 (14.7%)
  • Missing tokens: 97,610,211 (85.3%)
  • Most frequent missing forms

These statistics use corpus data from the Leipzig Corpora Collection.

  • Forms in Wikidata: 7,630
  • Forms in Wikipedia: 17,576
  • Tokens: 4,872,849
  • Covered forms: 1,357 (7.7%)
  • Missing forms: 16,219 (92.3%)
  • Covered tokens: 2,284,826 (46.9%)
  • Missing tokens: 2,588,023 (53.1%)
  • Most frequent missing forms
  • Forms in Wikidata: 46
  • Forms in Wikipedia: 60,377
  • Tokens: 75,656,151
  • Covered forms: 29 (0.0%)
  • Missing forms: 60,348 (100.0%)
  • Covered tokens: 3,180,600 (4.2%)
  • Missing tokens: 72,475,551 (95.8%)
  • Most frequent missing forms