User:Periglio/Biography

I have a personal project involving birth and death dates, for which I have used Persondata as my source. As that has been deprecated, I am now switching to Wikidata. I maintain a local database which currently holds 1,186,238 records.

Source of data

  • Wikidata
    • Name
    • Sex
    • Nationality
    • Occupation
    • Wikilink (EN)
    • Date of birth
    • Date of death
  • Wikipedia
    • Primary index
    • Secondary index
    • Still alive

My current database consists of all Wikipedia articles containing the Persondata template. In time, this will become all items with the property P31 (instance of) set to Q5 (human), subject to analysis of the results. Primary and secondary indexes are a proposed property for Wikidata; in the meantime they will be picked up from Wikipedia's DEFAULTSORT.
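
As a rough illustration of picking the indexes up from DEFAULTSORT, here is a minimal sketch. The function name `split_defaultsort` is hypothetical, and it assumes that the primary index is the part of the sort key before the first comma (typically the surname) and the secondary index is the remainder; the proposed Wikidata property may define them differently.

```python
import re

# Matches the usual {{DEFAULTSORT:Key}} form in article wikitext.
DEFAULTSORT_RE = re.compile(r"\{\{\s*DEFAULTSORT\s*:\s*([^{}]*?)\s*\}\}", re.IGNORECASE)


def split_defaultsort(wikitext):
    """Return (primary, secondary) index keys, or None if there is no DEFAULTSORT.

    Assumption: the key looks like "Surname, Forenames"; the split on the
    first comma is illustrative only.
    """
    match = DEFAULTSORT_RE.search(wikitext)
    if match is None:
        return None
    primary, _, secondary = match.group(1).partition(",")
    return primary.strip(), secondary.strip()
```

An article without a DEFAULTSORT returns `None`, which is the case a check such as "Article had no defaultsort" would need to catch.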

Still alive is a flag that records whether the person is living, so a person can be marked as dead even when the death date is unknown. Wikipedia has a Living people category; the only equivalent on Wikidata appears to be an unspecified death date. I am not sure at this stage whether this is standardised.

Stage 1 - Look up all Q codes for existing database entries

Currently in progress. It is running slowly because I am analysing the results to check the quality of the information. I find a lot of enwiki articles that are now redirects; these are corrected and put back in the work queue.

At the end of this stage, I will have a list of enwiki articles that are not represented on Wikidata.
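
One way the lookup could work is through the Wikidata `wbgetentities` API, asking for items by enwiki title. The sketch below is not my actual software: the function names and the User-Agent string are made up for illustration, and the response shape (missing titles reported as entries carrying a `missing` key) is my understanding of the API rather than something verified against every case. The API also caps how many titles one request may carry, so a real run would need to send small batches.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_lookup_url(titles):
    """Build a wbgetentities URL that looks up items by enwiki article title."""
    params = {
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": "|".join(titles),
        "props": "sitelinks",
        "sitefilter": "enwiki",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def split_found_and_missing(response):
    """Split a decoded API response into {title: Q code} and a list of missing titles."""
    found, missing = {}, []
    for entity in response.get("entities", {}).values():
        if "missing" in entity:
            missing.append(entity.get("title"))
        else:
            found[entity["sitelinks"]["enwiki"]["title"]] = entity["id"]
    return found, missing


def lookup(titles):
    """Fetch one batch of titles; the User-Agent value is a placeholder."""
    request = urllib.request.Request(
        build_lookup_url(titles),
        headers={"User-Agent": "biography-project-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request) as reply:
        return split_found_and_missing(json.load(reply))
```

The `missing` list from each batch would accumulate into the list of enwiki articles not represented on Wikidata.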

Update 25 January 2015

1,188,131 records from Persondata - 686,279 (58%) checked so far - 1,790 not found on Wikidata

  • Current prediction: 3,099 Persondata records without a link to Wikidata

Stage 2 - Extract birth/death from Wikidata

This is being worked on now. The intention is to extract dates from Wikidata into my personal database. Some validation will be done at this stage, such as making sure the dates show a reasonable life span.

Fields that correspond to Wikidata

Code    Label                     Sample
P19     place of birth            London
P20     place of death            Norwich
P21     sex or gender             female
P27     country of citizenship    United Kingdom
P31     instance of               human
P106    occupation                politician
P569    date of birth             07 September 1533
P570    date of death             24 March 1603
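
To show what extracting P569 and P570 involves, here is a small sketch that reads the date claims out of an item's JSON. The function name `read_dates` is hypothetical. It assumes the standard Wikidata JSON layout, where each claim has a `mainsnak` whose `snaktype` is `value`, `somevalue` or `novalue`, and where time values look like `+1533-09-07T00:00:00Z` with a numeric precision (9 for year, 10 for month, 11 for day).

```python
import re

# Wikidata time strings carry a signed year, e.g. "+1533-09-07T00:00:00Z".
TIME_RE = re.compile(r"([+-]\d+)-(\d{2})-(\d{2})T")


def read_dates(entity, prop):
    """Return one dict per claim on `prop` (e.g. "P569"), keeping every claim.

    Claims with snaktype "somevalue" or "novalue" have no datavalue, so only
    the snaktype is reported for them.
    """
    dates = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim["mainsnak"]
        if snak["snaktype"] != "value":
            dates.append({"snaktype": snak["snaktype"]})
            continue
        value = snak["datavalue"]["value"]
        year, month, day = (int(part) for part in TIME_RE.match(value["time"]).groups())
        dates.append({
            "snaktype": "value",
            "year": year,
            "month": month,
            "day": day,
            "precision": value["precision"],
        })
    return dates
```

Returning a list rather than a single date means an item with two birth dates shows up as a list of length two, which can then be flagged instead of silently picking one.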

Update 25 January 2015

My software now grabs a block of random people from my database and extracts the fields I am interested in: label, gender, nationality, date of birth, date of death and enwiki link. Up to now I have been doing small runs, discovering the nuances of Wikidata, e.g. records with multiple birth dates, twins!, "somevalue" entries. Today, I have run my first 1,000-record update.

Out of 1000 records, I found 116 where my Persondata does not match.

  • 70 Date on Wikidata, not in Persondata
  • 28 Date on Persondata, not on Wikidata
  • 30 Dates that differ (some have different precision, some are just wrong)

I am concentrating on the question of whether Persondata should be copied to Wikidata. Of the 28 dates that would have been copied, only 25 were valid. Although it is a relatively small sample size, extrapolated up to the full database of 1,187,996 records, this gives 33,264 records that could be copied, of which 3,564 would be dodgy!
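
The extrapolation is a simple proportion: the count seen in the sample, scaled by the ratio of the full database to the sample size. A sketch of the arithmetic, with a hypothetical helper name:

```python
def extrapolate(sample_count, sample_size, population):
    """Scale a count seen in a sample up to the whole population."""
    return round(sample_count * population / sample_size)


SAMPLE_SIZE = 1000
POPULATION = 1187996

# 28 dates were on Persondata but not on Wikidata; only 25 of them were valid.
copyable = extrapolate(28, SAMPLE_SIZE, POPULATION)
dodgy = extrapolate(28 - 25, SAMPLE_SIZE, POPULATION)

print(copyable, dodgy)  # 33264 3564
```

With only three invalid dates in the sample, the second figure is a rough indication rather than a firm estimate.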

At the moment, I am validating only what I find on Wikidata. The list below shows my current error messages.

  • D001_Missing Instance of
  • D002_Instance of {0}, should be human
  • D003_No gender specified
  • D004_Gender of {0} expecting male or female
  • D005_No citizenship specified
  • D006_More than one citizenship found
  • D007_Multiple instances
  • D008_Missing {0} label
  • D009_Missing {0} description
  • D010_Multiple birth dates
  • D011_Multiple death dates
  • D012_Future birth
  • D013_Future death
  • D014_Date of birth too recent
  • D015_Lived to too great an age
  • D016_Age too young
  • D017_Too old for no death date
  • D018_Birth date after death date
  • W001_Unbalanced HTML comment found
  • W002_Article had no defaultsort
  • W003_Cannot extract defaultsort
  • W004_Unbalanced template brackets
  • W005_Unbalanced category brackets
  • W006_Birth category does not match
  • W007_Birth category found, no wikidata dob
  • W008_Death category does not match
  • W009_Death category found, no wikidata dod
  • W010_Living people category on a dead person
  • W011_Living person without Living people category
  • W012_No birth year and no explanation
  • W013_No birth date and no explanation
  • W014_No death year and no explanation
  • W015_No death date and no explanation
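
To give a feel for how some of the date checks in this list could be implemented, here is a minimal sketch covering D012, D013, D015, D017 and D018. It is not my actual code: the function names are hypothetical, the 120-year limit is an assumed threshold, and Python's `date` type cannot represent BC dates, which a real implementation would need to handle.

```python
from datetime import date

MAX_AGE = 120  # assumed upper limit for a plausible life span, in years


def age_in_years(start, end):
    """Whole years elapsed between two dates."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def check_dates(birth, death, today):
    """Return the error messages that apply; birth and death may be None."""
    errors = []
    if birth is not None and birth > today:
        errors.append("D012_Future birth")
    if death is not None and death > today:
        errors.append("D013_Future death")
    if birth is not None and death is not None:
        if birth > death:
            errors.append("D018_Birth date after death date")
        elif age_in_years(birth, death) > MAX_AGE:
            errors.append("D015_Lived to too great an age")
    if birth is not None and death is None and birth <= today:
        if age_in_years(birth, today) > MAX_AGE:
            errors.append("D017_Too old for no death date")
    return errors
```

For example, `check_dates(date(1533, 9, 7), date(1603, 3, 24), date(2015, 1, 25))` returns an empty list, while a birth in 1850 with no death date is reported as D017.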

Stage 3

This will incorporate the validation from my proddata project. Wikidata will be compared against the information on the associated enwiki page. For example, mismatched categories will be flagged for investigation.
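
As an example of a category comparison, the sketch below checks an article's "births" category against the birth year held on Wikidata. The function name is hypothetical, and it assumes the category takes the form `[[Category:1533 births]]`, optionally with a sort key after a pipe; other forms such as decade or century categories are not handled.

```python
import re

# Matches [[Category:1533 births]] and [[Category:1533 births|Sort key]].
BIRTH_CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*(\d{1,4}) births\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def check_birth_category(wikitext, wikidata_birth_year):
    """Return a warning message, or None when there is nothing to flag.

    `wikidata_birth_year` is an int, or None when Wikidata has no date of birth.
    """
    match = BIRTH_CATEGORY_RE.search(wikitext)
    if match is None:
        return None
    if wikidata_birth_year is None:
        return "W007_Birth category found, no wikidata dob"
    if int(match.group(1)) != wikidata_birth_year:
        return "W006_Birth category does not match"
    return None
```

The same pattern with "deaths" in place of "births" would cover the matching death category checks.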

Once this stage is in operation and shows that Wikidata can be relied upon, I will be happy to withdraw my objection to the removal of Persondata. That said, my experience in security printing, which uses overt and covert anti-copy techniques, suggests that having hidden birth/death details is a way to catch vandalism.

Stage 4

Probably expand into using Wikipedias in other languages.