Wikidata:WikiProject Journalists

The goal of this project is to organize and add information about journalists to Wikidata. Wikimedia projects rely on high-quality sources as references, yet many of these sources are written by people who do not have an item in Wikidata. How can we help fix that?


For now we will simply pose some questions and fill them in as we go.

Existence?

Should this project exist or are there other projects that it should be merged with?

Current State

What is the current state of journalist data in Wikidata?

What are the most common items and properties?

Items

Properties

What are some useful SPARQL queries that can be used to assess the current data?
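
As one possible starting point, here is a minimal Python sketch that builds a SPARQL query counting journalists grouped by the values of a chosen property, and optionally sends it to the public Wikidata Query Service. It assumes journalists are modelled with occupation (P106) set to Q1930187 and uses sex or gender (P21) only as an example property. The function names and the User-Agent string are illustrative, not part of any existing tool, and a query over every journalist may be slow or time out.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
JOURNALIST = "Q1930187"  # journalist


def count_by_property_query(prop: str, occupation: str = JOURNALIST, limit: int = 20) -> str:
    """Build a SPARQL query that counts people with the given occupation,
    grouped by the value of `prop` (for example P21, sex or gender)."""
    return (
        "SELECT ?value (COUNT(?person) AS ?n) WHERE {\n"
        f"  ?person wdt:P106 wd:{occupation} ;\n"
        f"          wdt:{prop} ?value .\n"
        "}\n"
        "GROUP BY ?value\n"
        "ORDER BY DESC(?n)\n"
        f"LIMIT {limit}"
    )


def run_query(query: str) -> list:
    """Send the query to the Wikidata Query Service and return the result rows."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; replace it with one that identifies your own tool.
    request = urllib.request.Request(
        url, headers={"User-Agent": "WikiProjectJournalists-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    print(count_by_property_query("P21"))
```

The printed query can also be pasted directly into the Wikidata Query Service web interface instead of calling `run_query`.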

What does the ontology look like?

Sub-classes of journalist graph

https://angryloki.github.io/wikidata-graph-builder/?property=P279&item=Q1930187&mode=reverse

What is the Gender gap?

Gender gap report from Denelezh

SQID Report on Q1930187 (journalist)

Reasonator Report on Q1930187 (journalist)

Scholia

Scholia is a linked data project focused on academic publications, but it can sometimes generate interesting reports for journalists. It is a good showcase of what becomes possible once enough linked data about journalists is in Wikidata.

Deduplication and Record Linkage

There are already on the order of 100k journalists in Wikidata. Any attempt to add new data in bulk will need to resolve collisions between incoming records and existing items. This is a very common problem, and solutions typically involve record linkage. Can we design record linkage solutions for existing databases? Will we need a custom record linkage model for each database we try to incorporate, or are there common features that we can use across multiple databases?
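
To make the question concrete, here is a minimal sketch of one very simple record linkage approach: normalise names, compare them with a string-similarity ratio, and let a conflicting birth year veto a match. The `Person` fields, the 0.1 employer bonus, and the 0.85 threshold are all arbitrary assumptions for illustration; a real pipeline would need blocking, more features, and manual review of candidates.

```python
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Person:
    # Hypothetical minimal record; real sources will carry more fields.
    name: str
    birth_year: Optional[int] = None
    employer: Optional[str] = None


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def match_score(a: Person, b: Person) -> float:
    """Return a similarity in [0, 1]; conflicting birth years veto the match."""
    if a.birth_year and b.birth_year and a.birth_year != b.birth_year:
        return 0.0
    score = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    if a.employer and b.employer and normalize(a.employer) == normalize(b.employer):
        score = min(1.0, score + 0.1)  # small, arbitrary bonus for a shared employer
    return score


def find_candidates(
    incoming: Person, existing: List[Person], threshold: float = 0.85
) -> List[Tuple[float, Person]]:
    """Return existing records scoring at or above the threshold, best first."""
    scored = [(match_score(incoming, e), e) for e in existing]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

Whether the same features (name, birth year, employer) are available in every source database is exactly the open question above; the scoring function is the part that would likely need adapting per database.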

Currently the closest thing we have to a unique ID for journalists is their Twitter ID.

OpenRefine

OpenRefine is one well-supported tool for this kind of reconciliation.

Existing Databases

What databases of journalists already exist, and how can we integrate their data?

Muck Rack

Good visibility on Google. It seems to have a page for every journalist, and each page has a summary of who they have written for, excerpts of their work, links to their social media, and their Twitter feed.

The journalist can take ownership of their page, and corrections are dealt with via a chat mechanism that can be actioned within a few hours.

The unique ID of the page is proprietary, and the links they show also point to proprietary sites such as Twitter. I would like to see them add and use an open, cross-platform ID.

The Factual

Not publicly available or offered commercially, but they maintain an internal database of journalists.

Standards

How should journalism data be structured?

What information do we need about journalists, publishers, newspapers?

What is the best way to handle freelancers?

Should we link news sites to their ratings on

Can we incorporate data from Wikipedia:Reliable_sources/Perennial_sources, or the other way around?

Is there a unique ID for each journalist that we can use, issued by an open and independent organisation?

For example, instead of a Twitter handle, which has become the de facto unique ID, it should be something like:

  • Integrated Authority File, ISNI, VIAF or WorldCat

Also, judging by how disorganised most journalists' media presence can be, the ID would need to be assigned to them automatically rather than being something they have to apply for.
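
One way to ground this discussion is to check how many journalist items already carry each identifier. The sketch below only builds the SPARQL text for such a coverage count, and the function name is illustrative. It assumes the usual Wikidata properties for these identifiers (P227 for GND, P213 for ISNI, P214 for VIAF, P2002 for the Twitter username) and that journalists are modelled with occupation (P106) set to Q1930187.

```python
# Wikidata properties assumed for the identifiers discussed above.
IDENTIFIER_PROPS = {
    "GND (Integrated Authority File)": "P227",
    "ISNI": "P213",
    "VIAF": "P214",
    "Twitter username": "P2002",
}


def coverage_query(prop: str, occupation: str = "Q1930187") -> str:
    """Build a SPARQL query counting people with the given occupation
    who have at least one value for the identifier property `prop`."""
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?n) WHERE {\n"
        f"  ?person wdt:P106 wd:{occupation} ;\n"
        f"          wdt:{prop} [] .\n"
        "}"
    )


if __name__ == "__main__":
    for label, prop in IDENTIFIER_PROPS.items():
        print(f"# {label}")
        print(coverage_query(prop))
        print()
```

Each printed query can be run in the Wikidata Query Service and the counts compared against the total number of journalist items.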

Can we develop ShEx expressions that encapsulate these expectations?

How should we handle referencing?

Related External Projects

JournalList

"A Networked List of News Publishers" --https://journallist.net/

Managers of the trust.txt framework (see this video for a short introduction and a comparison to existing "txt" solutions such as robots.txt and ads.txt).

Related Wikimedia Projects and Sites

What lessons can we learn from existing projects? How can we collaborate?

Tools