User:PAC2/Gender diversity

Gender diversity in Wikipedia articles is a project to measure the gender diversity of the people cited in individual Wikipedia articles.

Gender bias is a major challenge for the Wikipedia community. Humaniki provides a global overview of gender imbalance across Wikipedia projects[1]. The share of biographies by gender is a very useful statistic, but we need to go further: what if women are simply not cited in a general article?

My idea is very simple: take the list of articles linked from an article (aka blue links), keep those that are about people, get each person's gender using sex or gender (P21), and compute gender statistics for the whole article.
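The idea above can be sketched in a few lines of Python. This is only an illustration: the function name `gender_statistics` and the input format (pairs of link title and gender label, with `None` for links that are not about people) are assumptions made for the example, and the sample data is hand-written rather than taken from a real article.

```python
from collections import Counter


def gender_statistics(links):
    """Count genders among linked people and compute their shares.

    `links` is an iterable of (title, gender) pairs. `gender` is None
    when the linked article is not about a person (hypothetical format).
    Returns {gender: (count, share_in_percent)}.
    """
    # Keep only links that concern people, i.e. that carry a gender value.
    genders = [gender for _, gender in links if gender is not None]
    total = len(genders)
    if total == 0:
        return {}
    counts = Counter(genders)
    return {g: (n, 100 * n / total) for g, n in counts.items()}


# Hand-written sample, not the real link list of any article.
sample_links = [
    ("Adam Smith", "male"),
    ("Joan Robinson", "female"),
    ("Inflation", None),
    ("John Maynard Keynes", "male"),
    ("Karl Marx", "male"),
]
stats = gender_statistics(sample_links)
```

The SPARQL queries in the Methodology section do the same filtering and counting directly on Wikidata.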

Interpretation may be difficult. No one knows what the fair share of women in an article is. However, when the share of women is really low, we may have forgotten some women, and it is worth looking at the article to see whether we can reduce the gender imbalance.

Contributions are welcome. Leave a message on the talk page if you want to suggest any improvement.

This project won an award at the WikidataCon Community Awards 2021 in the category Gender Equality.

Insights

 
Share of women in articles about occupations in Wikipedia in French[2]

Tools

Discussions

Methodology

Measuring gender diversity using the Wikimedia API and SPARQL queries

In this section we explore several SPARQL queries that measure gender diversity.

These queries can be run directly in a Jupyter notebook using PAWS.
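As a rough sketch of what a notebook cell could look like, the Python below sends a query to the public Wikidata Query Service endpoint and flattens the standard SPARQL JSON results into plain dictionaries. It uses only the standard library; the helper names (`build_request`, `rows_from_response`, `run_query`) and the User-Agent string are assumptions made for this example, not part of the project.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="gender-diversity-example/0.1"):
    """Prepare a GET request asking for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # placeholder, replace with your own
    }
    return urllib.request.Request(url, headers=headers)


def rows_from_response(payload):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return rows_from_response(json.load(response))
```

Calling `run_query` with the text of any query from this section would return one dictionary per result row, with all values as strings.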

List of all links with gender

SELECT ?item ?itemLabel ?gender ?genderLabel 
WHERE {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "links";
                     mwapi:titles "Economics";.
     ?item wikibase:apiOutputItem mwapi:item.
  } 
  FILTER BOUND (?item)
  ?item wdt:P31 wd:Q5 . 
  ?item wdt:P21 ?gender . 
  ?item rdfs:label ?itemLabel filter (lang(?itemLabel) = "en") . 
  ?gender rdfs:label ?genderLabel filter (lang(?genderLabel) = "en") .
}
ORDER BY ?gender

Simple count

SELECT ?gender ?genderLabel (COUNT(*) AS ?count) 
WHERE {
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "links";
                     mwapi:titles "Economics";.
     ?item wikibase:apiOutputItem mwapi:item.
  } 
  FILTER BOUND (?item)
  ?item wdt:P31 wd:Q5 . 
  ?item wdt:P21 ?gender . 
  ?gender rdfs:label ?genderLabel filter (lang(?genderLabel) = "en") .
}
GROUP BY ?gender ?genderLabel
ORDER BY DESC(?count)

Computing the share of males, females and non-binary people

This query computes the share of women, men, intersex and non-binary people in the article "Economics". I group "transgender male" together with "male" and "transgender female" together with "female".

Caveat: the ROUND function rounds to the nearest integer, not to a given number of decimal places.

SELECT 
(SUM(?female) AS ?count_females)
(SUM(?male) AS ?count_males)

(SUM(?nonbinary) AS ?count_nonbinary) 
(SUM(?intersexual) AS ?count_intersexual) 
(COUNT(*) AS ?count) 
(ROUND(100 * ?count_females / ?count) AS  ?share_females) 
(ROUND(100 * ?count_males / ?count) AS  ?share_males) 
(ROUND(100 * ?count_nonbinary / ?count) AS  ?share_nonbinary)
(ROUND(100 * ?count_intersexual / ?count) AS ?share_intersexual)

{
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "links";
                     mwapi:titles "Economics";.
     ?item wikibase:apiOutputItem mwapi:item.
  } 
  FILTER BOUND (?item)
  ?item wdt:P31 wd:Q5 . 
  ?item wdt:P21 ?gender . 
  BIND(IF(?gender IN(wd:Q6581097, wd:Q2449503), 1, 0) AS ?male ) 
  BIND(IF(?gender IN(wd:Q6581072, wd:Q1052281), 1, 0 ) AS ?female)
  BIND(IF(?gender = wd:Q48270, 1, 0) AS ?nonbinary) 
  BIND(IF(?gender = wd:Q1097630, 1,0) AS ?intersexual)
}
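One possible workaround for the rounding caveat, offered as a suggestion rather than something this project uses, is to scale the value before rounding and divide afterwards, for example `ROUND(1000 * ?count_females / ?count) / 10` for one decimal place. The Python sketch below reproduces that arithmetic; the function names are illustrative, and `sparql_round` mimics the SPARQL rule that ties round towards positive infinity.

```python
import math


def sparql_round(x):
    """Round to the nearest integer, ties towards positive infinity."""
    return math.floor(x + 0.5)


def share_percent(part, total, decimals=0):
    """Share of `part` in `total` as a percentage.

    With decimals=0 this mirrors ROUND(100 * part / total).
    With decimals=1 it mirrors ROUND(1000 * part / total) / 10.
    """
    scale = 10 ** decimals
    return sparql_round(100 * scale * part / total) / scale


# 1 woman among 8 linked people: integer rounding hides the half point.
rounded_to_integer = share_percent(1, 8)
rounded_to_one_decimal = share_percent(1, 8, decimals=1)
```

Here `rounded_to_integer` is 13.0 while `rounded_to_one_decimal` is 12.5, which shows how much detail integer rounding can hide when the counts are small.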

Comparing different articles

# This query takes a list of Wikipedia articles, looks up the gender of every person linked from each article, and computes the share of males, females, non-binary and intersex people.
# The goal is to measure gender diversity inside Wikipedia articles.
# Feedback and comments are welcome on my talk page User:PAC2
SELECT ?article
(SUM(?female) AS ?count_females) 
(SUM(?male) AS ?count_males)
(SUM(?nonbinary) AS ?count_nonbinary) 
(SUM(?intersexual) AS ?count_intersexual) 
(COUNT(*) AS ?count) 
(ROUND(100 * ?count_females / ?count) AS  ?share_females) 
(ROUND(100 * ?count_males / ?count) AS  ?share_males) 
(ROUND(100 * ?count_nonbinary / ?count) AS  ?share_nonbinary)
(ROUND(100 * ?count_intersexual / ?count) AS ?share_intersexual)
{
  VALUES ?article {
  "Anthropology"
  "Philosophy"
  "Economics"
  "Sociology"
  "Demography"
  }
  SERVICE wikibase:mwapi {
     bd:serviceParam wikibase:endpoint "en.wikipedia.org";
                     wikibase:api "Generator";
                     mwapi:generator "links";
                     mwapi:titles ?article;.
     ?item wikibase:apiOutputItem mwapi:item.
  } 
  FILTER BOUND (?item)
  ?item wdt:P31 wd:Q5 . 
  ?item wdt:P21 ?gender . 
  BIND(IF(?gender IN(wd:Q6581097, wd:Q2449503), 1, 0) AS ?male ) 
  BIND(IF(?gender IN(wd:Q6581072, wd:Q1052281), 1, 0 ) AS ?female)
  BIND(IF(?gender = wd:Q48270, 1, 0) AS ?nonbinary) 
  BIND(IF(?gender = wd:Q1097630, 1,0) AS ?intersexual)
}
GROUP BY ?article
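When the list of articles changes often, the VALUES block of the query above can be generated instead of typed by hand. The Python helper below is a small sketch of that; `values_clause` is a hypothetical name, and the escaping only covers backslashes and double quotes in titles.

```python
def values_clause(variable, titles):
    """Build a SPARQL VALUES block of string literals for article titles."""

    def literal(text):
        # Escape backslashes first, then double quotes, for a SPARQL string.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    lines = "\n".join(f"    {literal(title)}" for title in titles)
    return f"VALUES ?{variable} {{\n{lines}\n  }}"


clause = values_clause("article", ["Economics", "Sociology"])
```

The resulting text can be pasted in place of the hand-written VALUES block, with the rest of the query left unchanged.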

Related work

Isaac (WMF) has developed something similar with a user script (w:User:Isaac (WMF)/link gender, w:User:Isaac (WMF)/link gender.js) which calls an API (https://article-gender-data.wmcloud.org/api/v1/out links-details).

Gendered News is a research project measuring gender diversity in French newspapers. It uses first names to estimate the probability that a person mentioned is male or female and returns gender counts at the article level.

OpenSexism has created the Wednesday Index, a Twitter thread that measures gender diversity in 26 Wikipedia articles each Wednesday[3]. OpenSexism has also published an article about the Wednesday Index[4].

Dsp13 has created a wiki page to improve gendered citation statistics in Wikipedia: w:User:Dsp13/Gendered citation bias.

“If you start at any given article on Wikipedia, you're much less likely to eventually reach an article about a woman artist than you are about a male artist – and this was true for women across the board.”[5]

References

See also