Wikidata:Item quality/Pilot analysis result

The figure below shows the distribution of the quality scales given by two different labelers, and the number of labelers who gave each quality scale.
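As a rough sketch of how such a distribution can be tallied, the Python below counts items per pair of grades. It assumes the A-E quality scale and treats each pair as unordered, so an item graded A by one labeler and D by the other lands in the "A & D" cell regardless of order. The input format and the `pair_distribution` helper are hypothetical.

```python
from collections import Counter


def pair_distribution(grade_pairs):
    """Count items per unordered pair of grades, e.g. ("D", "A") is counted as "A & D"."""
    counts = Counter()
    for grade_1, grade_2 in grade_pairs:
        low, high = sorted((grade_1, grade_2))
        counts[f"{low} & {high}"] += 1
    return counts


# Hypothetical sample: one (labeler 1 grade, labeler 2 grade) tuple per item.
sample = [("A", "D"), ("D", "A"), ("B", "B"), ("C", "B"), ("B", "D")]
distribution = pair_distribution(sample)
for cell, count in sorted(distribution.items()):
    print(cell, count)
```

Cells such as "B & B", where both grades are equal, are the items on which the two labelers agreed.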

The labelers agreed on the same quality scale for 105 items (42%). The Cohen's kappa coefficient for this distribution is 0.313. According to Landis JR and Koch GG, Cohen's kappa coefficient can be interpreted as follows:

Kappa coefficient    Interpretation
< 0                  No agreement between two labelers
0.0 - 0.2            Slight agreement between two labelers
0.2 - 0.4            Fair agreement between two labelers
0.4 - 0.6            Moderate agreement between two labelers
0.6 - 0.8            Substantial agreement between two labelers
0.8 - 1.0            Almost perfect agreement between two labelers

Since 0.313 falls in the 0.2 - 0.4 range of the table above, the labelers in the pilot campaign show fair agreement with each other.
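For reference, unweighted Cohen's kappa compares the observed agreement with the agreement expected by chance from each labeler's grade frequencies. The Python sketch below computes it for two equal-length label lists and maps the result onto the Landis and Koch bands. The function names are illustrative, treating each band's upper bound as inclusive is an assumption, and this is not necessarily how the 0.313 figure was produced.

```python
from collections import Counter


def cohens_kappa(labels_1, labels_2):
    """Unweighted Cohen's kappa for two labelers who graded the same items."""
    if len(labels_1) != len(labels_2) or not labels_1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_1)
    # Observed agreement p_o: share of items given the same grade by both labelers.
    p_o = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    # Chance agreement p_e: product of the two labelers' marginal grade frequencies.
    c1, c2 = Counter(labels_1), Counter(labels_2)
    p_e = sum(c1[grade] * c2[grade] for grade in c1) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Landis and Koch bands; upper bounds are treated as inclusive (an assumption,
# since adjacent ranges in the table share their endpoints).
BANDS = [
    (0.2, "Slight agreement"),
    (0.4, "Fair agreement"),
    (0.6, "Moderate agreement"),
    (0.8, "Substantial agreement"),
    (1.0, "Almost perfect agreement"),
]


def interpret_kappa(kappa):
    """Map a kappa value to its Landis and Koch interpretation."""
    if kappa < 0:
        return "No agreement"
    for upper, label in BANDS:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


print(interpret_kappa(0.313), interpret_kappa(0.309))
```

With real data, the two lists would hold each labeler's grades in the same item order.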

Note, however, that this result includes items for Wikimedia disambiguation pages, Wikinews articles, and redirect pages. For brevity, I refer to these as "unwanted pages". The next section describes the result when they are excluded.
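A minimal sketch of the exclusion step, assuming each labeled item already carries a page-type tag. The `LabeledItem` record, its `kind` values and the helper names are hypothetical; in practice the page type would have to be determined from Wikidata itself.

```python
from typing import NamedTuple


class LabeledItem(NamedTuple):
    item_id: str  # hypothetical identifier
    kind: str     # hypothetical page-type tag
    grade_1: str  # grade from the first labeler
    grade_2: str  # grade from the second labeler


# Hypothetical tags for the "unwanted pages".
UNWANTED_KINDS = frozenset({"disambiguation", "wikinews_article", "redirect"})


def exclude_unwanted(items):
    """Drop items whose page type is one of the unwanted pages."""
    return [item for item in items if item.kind not in UNWANTED_KINDS]


def exact_agreement(items):
    """Return (number of items graded identically, share of all items)."""
    same = sum(item.grade_1 == item.grade_2 for item in items)
    return same, (same / len(items) if items else 0.0)


sample = [
    LabeledItem("item-1", "article", "B", "B"),
    LabeledItem("item-2", "disambiguation", "E", "E"),
    LabeledItem("item-3", "article", "A", "D"),
    LabeledItem("item-4", "redirect", "E", "D"),
]
kept = exclude_unwanted(sample)
print(len(kept), exact_agreement(kept))
```

Keeping the filter separate from the agreement calculation makes it easy to report both the full and the filtered figures from the same data.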

Analysis Result (Excluding Unwanted Pages)

The figure below shows the distribution of the quality scales given by two different labelers, and the number of labelers who gave each quality scale, after excluding the unwanted pages.

 

After excluding the unwanted pages, the sample drops to 223 items. The number of items on which both labelers gave the same quality scale decreases slightly to 91 (41%), and Cohen's kappa becomes 0.309.
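As a worked check, Cohen's kappa relates the observed agreement p_o to the chance agreement p_e as shown below. Assuming the kappa is unweighted and that "the same quality scale" means exact agreement, the filtered figures imply a chance agreement of roughly 0.14:

```latex
\begin{aligned}
\kappa &= \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \frac{91}{223} \approx 0.408,\\
p_e &= \frac{p_o - \kappa}{1 - \kappa} \approx \frac{0.408 - 0.309}{1 - 0.309} \approx 0.143.
\end{aligned}
```

Under these assumptions, the observed agreement of about 41% is well above what chance alone would give, but still far from perfect.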

This suggests that excluding the unwanted pages changes little: kappa drops only from 0.313 to 0.309 and stays in the fair-agreement range.

In the figure above, I was interested in the spike for items labeled "A & D" and "B & D". When I looked at what these items are, I found that most of them (19 out of 25) are biology-related, e.g. proteins and microRNAs. I suspect the labelers were confused when grading these items because such items typically have no sitelinks but are rich in external references. Perhaps we should revise the sitelink criterion to resolve this issue.
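To reproduce this kind of breakdown, the sketch below selects the items whose grade pair is "A & D" or "B & D" (in either order) and counts them per subject area. The tuple layout, the `domain` tag and the sample records are hypothetical; deciding which items are biology-related would in practice require inspecting the items themselves.

```python
from collections import Counter

# Unordered grade pairs behind the spike; frozenset ignores which labeler gave which grade.
SPIKE_PAIRS = {frozenset({"A", "D"}), frozenset({"B", "D"})}


def spike_items(items):
    """items: iterable of (item_id, domain, grade_1, grade_2) tuples (hypothetical layout)."""
    return [item for item in items if frozenset(item[2:]) in SPIKE_PAIRS]


def domain_breakdown(items):
    """Count items per hypothetical domain tag."""
    return Counter(domain for _, domain, _, _ in items)


sample = [
    ("item-1", "biology", "A", "D"),
    ("item-2", "biology", "D", "B"),
    ("item-3", "geography", "B", "D"),
    ("item-4", "biology", "C", "C"),
]
spikes = spike_items(sample)
print(len(spikes), domain_breakdown(spikes))
```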