Wikidata talk:WikiProject 20th Century Press Archives/Tools & tasks
Latest comment: 5 years ago by Jneubert in topic Best strategy to add new items derived from PM20?
Best strategy to add new items derived from PM20?
- Producing all available properties in one script,
- (current implementation for companies with type/official-name/incepted/abandoned/gnd-id: query companies_missing_in_wikidata.rq, which also restricts the result to entries checked with Mix'n'match, and script create_missing_wikidata.pl)
- or alternatively, adding items with a
- very basic query/script covering only label(s), perhaps description(s), type and pm20 id,
- complemented by many queries/scripts, which will add a single property for all linked items (already existing or created from PM20) lacking that property.
- (example implementation for organization type (P31) in missing_class_via_pm20.rq)
The second strategy requires more queries/scripts, more invocations and better documentation. However, it allows for progressive improvement (e.g., incrementally mapping more and more professions). It can also be combined with manual addition of items (e.g., via Mix'n'match), where likewise only very few properties are populated automatically.
-- Jneubert (talk) 11:42, 28 April 2019 (UTC)
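The core of the second strategy can be sketched as a filter over linked items: for one target property, keep only the items that lack it and for which PM20 supplies a value. This is a minimal Python sketch for illustration, not the project's actual queries or scripts; the record layout (`LinkedItem`, `wikidata_props`, `pm20_values`), the function name and all sample identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedItem:
    """Hypothetical record for a Wikidata item linked to a PM20 folder."""
    qid: str
    pm20_id: str
    wikidata_props: set = field(default_factory=set)  # properties already on the item
    pm20_values: dict = field(default_factory=dict)   # property -> value derivable from PM20


def missing_property(items, prop):
    """Return (qid, pm20_id, value) for items lacking `prop` where PM20 has a value."""
    return [
        (item.qid, item.pm20_id, item.pm20_values[prop])
        for item in items
        if prop not in item.wikidata_props and prop in item.pm20_values
    ]


# Made-up sample data; none of these identifiers are real.
items = [
    LinkedItem("Q-example-1", "pe/000001", {"P31"}, {"P31": "Q5", "P227": "gnd-A"}),
    LinkedItem("Q-example-2", "pe/000002", {"P31", "P227"}, {"P227": "gnd-B"}),
    LinkedItem("Q-example-3", "pe/000003", set(), {"P31": "Q5"}),
]

# Only the first item both lacks P227 and has a PM20-derived value for it.
candidates = missing_property(items, "P227")
```

Running the same filter once per property is what makes the approach incremental: each pass touches only one property and can be repeated whenever the mapping improves.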
- Implementation: One script, multiple queries, multiple invocations for enhancement
- One script (add_missing_wikidata.pl), which can be called in 'create' or 'enhance' mode. For item creation, it uses a query such as persons_missing_in_wikidata.rq. For enhancing Wikidata one property at a time, the script uses either a generic query (property_missing_in_wikidata.rq, which works if the source property can be addressed directly or via a property path) or a configurable property-specific query. Jneubert (talk) 10:04, 6 May 2019 (UTC)
- Examples
- Script invocation with add_missing_wikidata.pl pm20_pe create and running the lines below in QS added Hagenbeck (Q63527827):
CREATE
LAST|Lde|"Hagenbeck"
LAST|Len|"Hagenbeck"
LAST|Dde|"Hamburger Zoodirektoren-Familie"
LAST|Den|"family"
LAST|P4293|"pe/006937"
LAST|P31|Q8436|S248|Q36948990|S4293|"pe/006937"|S1810|"Hagenbeck <Familie>"|S813|+2019-05-06T00:00:00Z/11
LAST|P227|"118700537"|S248|Q36948990|S4293|"pe/006937"|S1810|"Hagenbeck <Familie>"|S813|+2019-05-06T00:00:00Z/11
- Script invocation with add_missing_wikidata.pl pm20_pe enhance P227 and running the line below in QS adds GND ID to Georges Ibrahim Abdallah (Q3102918):
Q3102918|P227|"118848658"|S248|Q36948990|S4293|"pe/000021"|S1810|"Abdallah, Georges Ibrahim"|S813|+2019-05-06T00:00:00Z/11
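The pipe-separated QuickStatements lines in the two examples above follow a regular pattern: item (or LAST), property, value, then source fields (S248, S4293, S1810, S813). The following Python sketch shows how such lines could be assembled. It is not the Perl script add_missing_wikidata.pl, and all function and parameter names are hypothetical; only the property IDs, the source item Q36948990 and the date format are taken from the examples.

```python
def qs_quote(text):
    """Wrap a string value in double quotes, as in the examples above."""
    return '"' + text + '"'


def reference_parts(pm20_id, name_in_source, retrieved):
    """Source fields as used in the examples: S248, S4293, S1810, S813."""
    return [
        "S248", "Q36948990",
        "S4293", qs_quote(pm20_id),
        "S1810", qs_quote(name_in_source),
        "S813", f"+{retrieved}T00:00:00Z/11",  # /11 = day precision
    ]


def enhance_line(qid, prop, value, pm20_id, name_in_source, retrieved):
    """One 'enhance' statement: add a single sourced property to an existing item."""
    parts = [qid, prop, value] + reference_parts(pm20_id, name_in_source, retrieved)
    return "|".join(parts)


def create_lines(labels, descriptions, pm20_id):
    """Basic 'create' block: labels, descriptions and the PM20 folder ID only."""
    lines = ["CREATE"]
    for lang, text in labels.items():
        lines.append(f"LAST|L{lang}|{qs_quote(text)}")
    for lang, text in descriptions.items():
        lines.append(f"LAST|D{lang}|{qs_quote(text)}")
    lines.append(f"LAST|P4293|{qs_quote(pm20_id)}")
    return lines


# Rebuilds the GND ID statement from the 'enhance' example above.
line = enhance_line(
    "Q3102918", "P227", qs_quote("118848658"),
    "pe/000021", "Abdallah, Georges Ibrahim", "2019-05-06",
)
```

Sourced statements for a newly created item can reuse `enhance_line` with `LAST` in place of the item ID.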
Data corrections
Errors which should be fixed in the upstream IFIS database.
PM20 companies en
- Done http://purl.org/pressemappe20/folder/co/070866 should read: Millom and Askam Hematite Iron Co. Ltd.
- Done http://purl.org/pressemappe20/folder/co/065801 'Benutze für' ("use for") should read: Lonrho Ltd.
- Done http://purl.org/pressemappe20/folder/co/049888: Bernam-Perak Rubber Plantation
- Done http://purl.org/pressemappe20/folder/co/070878: Moabund Tea Company
- Done http://purl.org/pressemappe20/folder/co/067119: Glendon Rubber
- Done http://purl.org/pressemappe20/folder/co/061599: Jeremiah Rotherham & Co
- Done http://purl.org/pressemappe20/folder/co/071980: Scottish Tea and Lands Company of Ceylon
- Done http://purl.org/pressemappe20/folder/co/050054: Blue Funnel Line
- Done http://purl.org/pressemappe20/folder/co/072148: Sengat Rubber Estate
- Done http://purl.org/pressemappe20/folder/co/068673: English Association of American Bond and Share Holders
- Done http://purl.org/pressemappe20/folder/co/019860: J. Henry Schroder & Co
- Done http://purl.org/pressemappe20/folder/co/050404: ... Goldfields ...
- Done http://purl.org/pressemappe20/folder/co/022735: UDS-Group -> UDS Group
PM20 companies de
- Done http://purl.org/pressemappe20/folder/co/045446 Tiefbau-Berufsgenossenschaft (without spaces)
- Done http://purl.org/pressemappe20/folder/co/005405 delete start and end dates
- Done Handelsbank Leipzig duplicated?
- Handelsbank AG - changed to Leipziger Handels- und Verkehrs-Bank AG: 38 articles, 1919-1942 (until the name change in early 1942)
- Handelsbank AG (Leipzig): 12 articles, 1942-1943
- two Wikidata items
- Done http://purl.org/pressemappe20/folder/co/010208 -> Metrawatt AG
- "Yarrow & Co" -> co/070438 covers only 24 of the 80 listed documents. For the remainder I have introduced a new company: "The Yorkshire Electric Power Co." -> http://purl.org/pressemappe20/folder/co/070438. A cutting error made when the folders were formed. (maxwan, 6.8.21)
PM20 companies fr
- Done http://purl.org/pressemappe20/folder/co/004828 delete start date
Wrong GND in IFIS
Wrong predecessor / successor relation
- http://purl.org/pressemappe20/folder/co/023922 -> http://purl.org/pressemappe20/folder/co/010042, http://purl.org/pressemappe20/folder/co/011919