Wikidata:WikiProject 20th Century Press Archives/Tools & tasks
Tools
on Toolforge
- https://pm20-search.toolforge.org - Full-text search for folders
- https://pm20-report.toolforge.org - Lists and Reports
- https://pm20-filmlink - Link PM20 film sections
other
- Wikidata SPARQL endpoint and corresponding queries
- PM20 SPARQL endpoint and corresponding queries
- Classifications used for subjects and wares (via Skosmos, in German):
- Mix-n-match catalogs:
- PM20 folder ID (10/2017)
- PM20 corp de (10/2017)
- PM20 corp international (10/2017)
- PM20 newspapers (zdb) (09/2019)
Task: Link individual PM20 folders to Wikidata
Often, the titles of PM20 folders can be matched to Wikidata items only one by one, through manual lookup or browsing. Tools like Mix-n-match do not work well for certain types of folders, particularly those in the subject archives (Länder-/Sacharchiv). The corresponding manual workflow is therefore briefly described here:
- Search or discover the folder via the web application or via lists of folders, e.g. from the country/topics archive (sortable and filterable folder list).
- We use the folder Deutschland (bis 1945) : Enteignung von Juden, Arisierung (1933-1945) (Expropriation of Jews in Germany 1933-1945, Aryanization) as example here.
- Copy the persistent link of the folder, which is the target of the icon "Mappen-Zitier-Link". This is normally done via right-click and "Copy link address" (or a similarly named function) in the browser.
- Search for the matching Wikidata item - e.g. Aryanization (Q664017).
- Go to the bottom of the item page and click "add statement".
- Start typing "pm20 folder" in the Property input box and select "PM 20 folder ID".
- Paste the persistent URL copied in step 2 and shorten it (e.g., from http://purl.org/pressemappe20/folder/sh/126128,208307 to "sh/126128,208307").
- Click "publish".
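The shortening of the persistent URL described above can also be scripted. The following is a minimal sketch, assuming that every persistent folder link starts with the prefix shown in the example; the function name `folder_id_from_purl` is hypothetical and not part of any PM20 tool.

```python
# Prefix of the persistent folder links, taken from the example above.
PURL_PREFIX = "http://purl.org/pressemappe20/folder/"


def folder_id_from_purl(url: str) -> str:
    """Return the folder ID (e.g. "sh/126128,208307") for a persistent folder URL."""
    url = url.strip()
    if not url.startswith(PURL_PREFIX):
        raise ValueError(f"not a PM20 folder link: {url!r}")
    # Everything after the prefix is the value for "PM 20 folder ID".
    folder_id = url[len(PURL_PREFIX):].strip("/")
    if not folder_id:
        raise ValueError("folder link contains no folder ID")
    return folder_id


print(folder_id_from_purl("http://purl.org/pressemappe20/folder/sh/126128,208307"))
# prints: sh/126128,208307
```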
Add links to Wikipedia, if appropriate
Sometimes, PM20 folders may be a valuable external complement to Wikipedia articles. Links to Wikipedias in different languages are displayed at the bottom, or in the right-hand column, of the Wikidata item. For each Wikipedia, there are rules on when and how to add external links - please check them carefully.
English Wikipedia
- See the rules on w:Wikipedia:External links.
- In order to be able to receive feedback on your edits, log into Wikipedia as a named user.
- If the folder contents look like a valuable addition to the corresponding WP article, edit its "External links" section (or add == External links == at the article bottom, but above categories and the like) - see example.
- Use the PM20 template with the folder ID described above to add a link, e.g.
* {{PM20|FID=sh/126128,208307}}
By default, the WP article name is inserted into the link. If this does not fit well, you can insert an additional |NAME=... inside the curly brackets with a better-fitting description of the folder content.
- Adding a short description of your edit in the "Summary" field helps watchers of the article.
German Wikipedia
- See the rules at de:Wikipedia:Weblinks.
- In order to be able to receive feedback on your edits, log into Wikipedia as a named user.
- If the folder contents look like a valuable addition to the corresponding WP article, edit its "Weblinks" section (or add == Weblinks == at the article bottom, but above the section "Einzelnachweise" (individual citations), categories and the like) - see example.
- Use the Pressemappe template with the folder ID described above to add a link, e.g.
* {{Pressemappe|FID=sh/126128,208307}}
By default, the WP article name is inserted into the link. If this does not fit well, you can insert an additional |NAME=... inside the curly brackets with a better-fitting description of the folder content.
- Adding a short description of your edit in the "Zusammenfassung und Quellen" field helps watchers of the article.
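The template lines for both Wikipedias follow the same pattern, so they can be generated from the folder ID. This is a minimal sketch; the helper name `external_link_line` is hypothetical, and it assumes only the FID and NAME parameters described above.

```python
from typing import Optional


def external_link_line(folder_id: str, template: str = "PM20",
                       name: Optional[str] = None) -> str:
    """Build the wikitext list item for a PM20 folder link.

    template: "PM20" on English Wikipedia, "Pressemappe" on German Wikipedia.
    name: optional value for the NAME parameter (a better-fitting description).
    """
    params = [f"FID={folder_id}"]
    if name:
        params.append(f"NAME={name}")
    return "* {{" + template + "|" + "|".join(params) + "}}"


print(external_link_line("sh/126128,208307"))
# prints: * {{PM20|FID=sh/126128,208307}}
print(external_link_line("sh/126128,208307", template="Pressemappe", name="Arisierung"))
# prints: * {{Pressemappe|FID=sh/126128,208307|NAME=Arisierung}}
```

The sketch does not escape "|" or "}}" inside the NAME value, so such descriptions would have to be checked by hand.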
Regular maintenance tasks
Add PM20 ID via GND ID ("pm20 via gnd")
This was run initially for 1600+ IDs. When GND IDs that are known in not-yet-linked PM20 folders are added to Wikidata items, we can automatically add the PM20 ID to those items.
cd /opt/sparql-queries/bin
perl make_qs_input.pl ../wikidata/missing_pm20_id_via_gnd.rq qsStatement
The query and the script are available on GitHub.
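The matching idea behind "pm20 via gnd" can be illustrated independently of the actual query and script, whose contents are not shown here. The sketch below assumes two inputs that would normally come from SPARQL results: a mapping from GND ID to PM20 folder ID, and a list of Wikidata items that have a GND ID (P227) but no PM20 folder ID (P4293). The function name and the tab-separated QuickStatements V1 output shape are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def qs_lines_via_gnd(pm20_by_gnd: Dict[str, str],
                     items_missing_pm20: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one QuickStatements V1 line per item whose GND ID is known in PM20.

    pm20_by_gnd: GND ID -> PM20 folder ID (from the PM20 side).
    items_missing_pm20: (QID, GND ID) pairs of items without P4293.
    """
    lines = []
    for qid, gnd in items_missing_pm20:
        folder_id = pm20_by_gnd.get(gnd)
        if folder_id is None:
            continue  # this GND ID does not occur in a not-yet-linked folder
        # String values are quoted in QuickStatements V1 input.
        lines.append(f'{qid}\tP4293\t"{folder_id}"')
    return lines
```

The resulting lines would be pasted into QuickStatements; any source or reference columns the real script may add are left out of this sketch.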
Set qualifiers ("pm20 folder name" / "pm20 doc count")
QuickStatements input files for subject named as (P1810), number of works (P3740) and number of works accessible online (P5592) are generated via
cd /opt/sparql-queries/bin
perl make_qs_input.pl ../pm20/folder_names_qs.rq qsStatement
perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
Because company names are currently being cleaned up, creation of "named as" qualifiers is restricted to sh wa pe for now.
The folder names / doc counts queries and the script are available on GitHub.
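For orientation, a QuickStatements V1 line that sets these qualifiers on a PM20 folder ID (P4293) statement could look like the output of the sketch below. This is an assumption about the general input shape (item, property, value, then qualifier property/value pairs, tab-separated), not a description of what make_qs_input.pl actually emits; the function name is hypothetical and the counts used with it would come from the PM20 queries.

```python
from typing import Optional


def qualifier_line(qid: str, folder_id: str, name: Optional[str] = None,
                   total: Optional[int] = None, online: Optional[int] = None) -> str:
    """Build one tab-separated line: statement plus optional qualifiers."""
    fields = [qid, "P4293", f'"{folder_id}"']
    if name is not None:
        fields += ["P1810", f'"{name}"']   # subject named as
    if total is not None:
        fields += ["P3740", str(total)]    # number of works
    if online is not None:
        fields += ["P5592", str(online)]   # number of works accessible online
    return "\t".join(fields)
```

Folder names that contain double quotes would need escaping or manual handling, which this sketch does not attempt.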
Consistency checks for the PM20 Subject Categories system
The PM20 Subject Categories system is kept as a set of interlinked items in Wikidata, insofar as the categories are linked to PM20 folder items.
Fix folder "main subject" statements (if checks reveal errors)
Remove main subject (P921) statements linking to non-PM20 Subject Category items:
cd /opt/sparql-queries/bin
perl make_qs_input.pl ../pm20/folder_subject_remove_qs.rq qs
Add correct statements:
perl make_qs_input.pl ../pm20/folder_subject_add_qs.rq qs
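The logic of this two-step fix can be sketched as a set comparison. The sketch assumes three inputs that the actual queries would supply: the current main subject (P921) targets per folder item, the set of PM20 Subject Category items, and the correct targets per folder item. The function name is hypothetical; the leading "-" is the QuickStatements V1 convention for removing a statement.

```python
from typing import Dict, List, Set, Tuple


def subject_fix_lines(current: Dict[str, Set[str]],
                      category_items: Set[str],
                      correct: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Return (removal lines, addition lines) for P921 on folder items."""
    remove, add = [], []
    for folder, targets in sorted(current.items()):
        # Targets that are not PM20 Subject Category items are removed.
        for target in sorted(targets - category_items):
            remove.append(f"-{folder}\tP921\t{target}")
    for folder, targets in sorted(correct.items()):
        # Only statements that do not exist yet are added.
        for target in sorted(targets - current.get(folder, set())):
            add.append(f"{folder}\tP921\t{target}")
    return remove, add
```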
One-time tasks
Add items for all un-linked person folders
After extended Mix-n-match work, manual lookup of heads of state and of folders with multiple documents, and some testing, items for all 346 remaining person folders were created automatically. As discussed on the talk page,
perl add_missing_wikidata.pl pm20_pe create
(script, query) was executed and the output pasted into Quickstatements. Jneubert (talk) 15:08, 13 June 2019 (UTC)
- Rather minimal example item: Albert Hopff (Q64589732)
Add person information from PM20 to WD
- DONE Enhance items with missing date of birth (P569) and date of death (P570)
- DONE Enhance items with missing GND ID (P227)
- DONE occupation (P106) for economist (Q188094) and business economist (Q1860032) (from "Tätigkeitsfeld") (all items linked to PM20 - query)
perl add_missing_wikidata.pl pm20_pe enhance P106
- DONE occupation (P106) for other identifiable professions (social scientist (Q15319501), earth scientist (Q11424604)) (items linked to PM20 folders with docs)
- DONE link families and their members (see here)
- DONE link companies and their staff (founded by (P112), board member (P3320), supervisory board member (P5052))
- Example: Friedrich Krupp AG (Q679201)
- ABORTED Add work location (P937) to all persons with "Wirkungsbereich: Kolonialwesen" (used in Archivführer Deutsche Kolonialgeschichte) (ca. 50 items for Germany, see list)
- work location (P937) is not a sufficient criterion for German colonial history: Persons in the colonial administration in Germany (such as Karl Rathgen (Q71153) (Kolonialinstitut), Johannes Bell (Q458248) (Reichsminister Kolonialfragen)) did not actually work in the German colonies. Others prepared for colonization (such as Georg August Schweinfurth (Q63126) (Expeditionen, Deutsche Kolonialgesellschaft)), worked in colonies of other countries (Eduard Amandus Lippert (Q1289175)), or had business in multiple colonies (Friedrich Lenz (Q1460735)). --Jneubert (talk) 14:54, 14 October 2019 (UTC)
- POSTPONED field of work (P101) for other "Tätigkeitsfelder" (see here) and work location (P937) (partly depending on use in applications)
- POSTPONED "Herkunftsland" (country of citizenship (P27) missing in c. 620 items) (=> some countries are tricky because of history; perhaps citizenship cannot even be derived from "Herkunftsland")
- RESULTS Statements in WD sourced by PM20
Create Mix-n-match catalog for newspapers
DONE A Mix-n-match catalog for newspapers and journals from PM20 was created, comprising 1359 entries from the internal "publikation" database table, with the ZDB ID as key. Records without ZDB ID were omitted, and some duplicates (e.g. same ZDB ID for paper and supplement) were skipped. (input file) --Jneubert (talk) 06:52, 8 September 2019 (UTC)
Replace Wikipedia links which do not use the templates
Links to (webopac|webopac0).hwwa.de and zbw.eu/beta/p20 will become obsolete, probably by the end of 2020. Therefore, all such links have to be replaced.
Folder links
- German Wikipedia: de:Vorlage:Pressemappe
- webopac0.hwwa.de DONE (for Weblinks only)
- zbw.eu/beta/p20 giftbot search article namespace DONE
- webopac.hwwa.de giftbot search article namespace DONE
Document links
Direct links to documents or pages have to be replaced, too. This depends on the introduction of persistent addresses for documents. DONE
Add PM20 geo/subject folders
- Add PM20 geo codes to linked items according to the existing mapping
- DONE for the geo codes connected to subject folders (query, script) Jneubert (talk) 16:29, 13 August 2020 (UTC)
- Upper level categories (first and second level)
- DONE Translate subject category labels to (British) English
- DONE Create items for PM20 subject categories (160 in total)
- perl add_missing_wikidata.pl pm20_subject_category
- perl add_missing_wikidata.pl pm20_subject_category enhance P361 (partOf hierarchy)
- Two dozen items which link to special intermediate levels not transferred to Wikidata got no partOf link and need to be fixed
- DONE Create items for folders (3776 in total)
- perl add_missing_wikidata.pl pm20_subject_folder - temporarily interrupted because of Quickstatements creating duplicates
- All remaining categories
- DONE Translate subject category labels to English
- DONE Fix hierarchy
- DONE Create items for PM20 subject categories (exactly the 1452 categories from "klassifikator WHERE klass_code='JE' and mappen_anzahl is not null")
- perl add_missing_wikidata.pl pm20_subject_category
- DONE Create category hierarchy
- perl add_missing_wikidata.pl pm20_subject_category enhance P361 (partOf hierarchy)
- DONE Create category sort label
- perl add_missing_wikidata.pl pm20_subject_category enhance P8484
- perl add_missing_wikidata.pl pm20_geo_category enhance P8483
- DONE Create items for folders
- perl add_missing_wikidata.pl pm20_subject_folder
- DONE Set document counts
- perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
- perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
- OPTIONAL (later)
- Map subject categories to WD items (via main subject (P921))
- Create all known geo and subject categories, even when for now without folders (for later use in film sections)
- Create reverse has part statements (issues: meaningful order, completeness)
- Create film sections for countries not or incompletely represented as folders, create pages and add the corresponding geo codes
Add company/institution folders
- Retrieving and using direct links
- DONE via GND
- DONE via linked Wikipedia page in PM20
- Segment the set of companies according to main languages, using the country code field
- DONE - see segments and statistics
- DONE for each segment
- DONE for Dutch, for English, for German, for French, for div (Mnm, search, QS, errors), ...
- Mapping
- Rules for inexact matches, expressed via mapping relation type (P4390):
- related match (Q39894604) for relevant items of another class (e.g. founder, building, brand)
- narrow match (Q39893967) for folders which cover only a certain aspect of a company (e.g., Bank of England (B3) - Administration)
- exact match (Q39893449) NOT USED, because PM20 is not based on a formal legal definition of a company (neither in general is Wikidata)
- Create mnm catalog for company folders with documents, ordered by document count, matching against organization and Wikipedia for the respective language
- for all entries, including already mapped
- with synonyms (altLabel), names with GND excluded
- ./make_mnm_input.sh pm20 nl
- Map from top
- Create openrefine in same order (matching English labels) (???)
- only for unmapped entries (after mnm automatch)
- Create list of QS insert statements and use in parallel for creating missing items
- Create QS inserts for all unmapped entries (using the country code lists above)
- TODO exclude only exactly or unqualified mapped items
- ./add_missing_wikidata.pl pm20_co
- Update Mix-n-match (Action -> Katalog manuell synchronisieren -> Mix-n-match aktualisieren)
- Mapping
- DONE Cleanup large intersected companies (e.g., Deutsche Reichsbahn)
- DONE Add standard qualifiers for P4293 (name, counts)
- Cleanup / extension re. inexact mappings
- DONE set "related match" mapping relation qualifier for all co/person, co/building etc. mappings
- DONE cleanup duplicate exact/unqualified mappings
- DONE fix missing folders (~50, lost with change from create_rdf.pl to create_rdf1.pl)
- DONE match items for missing folders
- DONE repeat creation of QS inserts
- DONE Fix missing French labels
- DONE Adding inception/demolition date
- DONE check inception before demolition date
- DONE Adding GND
- DONE Adding instance-of statements (if not existent)
- DONE Interlinking with persons
- founder: perl add_missing_wikidata.pl pm20_co enhance P112
- board: perl add_missing_wikidata.pl pm20_co enhance P3320
- supervisory board: perl add_missing_wikidata.pl pm20_co enhance P5052
- DONE Interlinking with companies
- successor/predecessor
- subsidiary/mother
- DONE Mapping (via Geonames ID) and import of headquarters location
- POSTPONED Add items derived from missing Geonames IDs
- DONE Add derived country
- DONE Mapping and import of industry sector
- DONE Assign industry according to PM20 NACE code (add all assignments)
- DONE Map SK values to WD industries, based on German label (sometimes very coarse-grained)
- DONE Map to targets with NACE code if some confidence in the mapping to NACE is given
- POSTPONED Derive a partial SK-NACE mapping at the end (could this be extended with NT relations to include inexact mappings for e.g. Metallindustrie?)
- DONE Assign more industries derived from SK mapping
- DONE For NACE-equivalent, do not add if the same assignment with PM20 as source already exists
- For very broad targets, do not add if any assignment exists
- DONE Create systematic display of industries used in SK mapping
- POSTPONED Fix missing German and English labels (use existing or PM20 label for pre-existing entity?)
- POSTPONED Which synonyms can be added safely?
- POSTPONED Perhaps, link or create separate items for companies identified by GND (zbwext:includesInstitutionNamed)
- Note as part of the description
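The rules for inexact matches listed in this section can be written down as a small decision function. This is a sketch under stated assumptions: the two boolean inputs and the function names are hypothetical, the precedence (another class wins over partial coverage) is an assumption, and the folder ID passed in would be a real company folder ID.

```python
from typing import Optional

RELATED_MATCH = "Q39894604"  # relevant item of another class (founder, building, brand)
NARROW_MATCH = "Q39893967"   # folder covers only a certain aspect of a company


def mapping_relation(item_is_other_class: bool,
                     folder_covers_only_aspect: bool) -> Optional[str]:
    """Pick the mapping relation type (P4390) value, or None for an unqualified mapping."""
    if item_is_other_class:
        return RELATED_MATCH
    if folder_covers_only_aspect:
        return NARROW_MATCH
    return None  # exact match (Q39893449) is deliberately not used


def mapping_line(qid: str, folder_id: str, relation: Optional[str]) -> str:
    """Build a tab-separated QuickStatements V1 line, with P4390 as qualifier if set."""
    fields = [qid, "P4293", f'"{folder_id}"']
    if relation is not None:
        fields += ["P4390", relation]
    return "\t".join(fields)
```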
Add wares folders
- DONE Add property PM20 ware ID (P10890)
- DONE Create OpenRefine mapping of ware names
- DONE Add PM20 ware ID (P10890) links to ware items
- DONE Create special categories items (w/o existing ware items)
- perl add_missing_wikidata.pl pm20_ware_category
- DONE Create folder items
- perl add_missing_wikidata.pl pm20_ware_folder
- DONE Add counts
- perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
- perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
- DONE Add reverse WD links in PM20
- DONE Add mapping for missing country names (countries not required for subjects)
- DONE Add PM20 geo code (P8483) links to geo items
- DONE Create folder items
- DONE Remove duplicates created by QS
- DONE Add names and counts
- perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
- perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
- DONE Verify completeness of mapping
- DONE Check completeness of wikidata extract in PM20 endpoint
- Fix missing hierarchy levels in PM20 dataset (https://w.wiki/6Csx)
- Recreate category pages with reverse links to WD