Wikidata:WikiProject 20th Century Press Archives/Tools & tasks

 

Tools

on Toolforge

other

Often, PM20 folder titles can be matched to Wikidata items only one by one, through manual lookup or browsing. Tools like Mix-n-match do not work well for certain types of folders, particularly the folders in the subject archives (Länder-/Sacharchiv). The corresponding manual workflow is therefore described briefly here:

  1. Search for or discover the folder via the web application or via lists of folders, e.g. from the country/topics archive (sortable and filterable folder list).
    We use the folder Deutschland (bis 1945) : Enteignung von Juden, Arisierung (1933-1945) (Expropriation of Jews in Germany 1933-1945, Aryanization) as an example here.
  2. Copy the persistent link of the folder, which is attached to the icon "Mappen-Zitier-Link". This is normally done by right-clicking and choosing "Copy link address" (or a similarly named function) in the browser.
  3. Search for the matching Wikidata item - e.g. Aryanization (Q664017).
  4. Go to the bottom of the item page and click "add statement".
  5. Start typing "pm20 folder" in the Property input box and select "PM 20 folder ID".
  6. Paste the persistent URL copied in step 2 and shorten it (e.g., from http://purl.org/pressemappe20/folder/sh/126128,208307 to "sh/126128,208307").
  7. Click "publish".
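
The shortening in step 6 can also be scripted when many links have to be processed. The following is a minimal sketch, assuming that all persistent folder links share the prefix shown in the example above; the function name `folder_id_from_purl` is made up for illustration and is not part of any PM20 tooling.

```python
from urllib.parse import urlparse

# Path prefix taken from the example URL in step 6; assumed to be the same
# for all persistent folder links.
PURL_PATH_PREFIX = "/pressemappe20/folder/"


def folder_id_from_purl(url: str) -> str:
    """Return the folder ID part of a PM20 persistent folder URL."""
    parsed = urlparse(url.strip())
    if parsed.netloc != "purl.org" or not parsed.path.startswith(PURL_PATH_PREFIX):
        raise ValueError(f"not a PM20 folder URL: {url!r}")
    folder_id = parsed.path[len(PURL_PATH_PREFIX):].strip("/")
    if not folder_id:
        raise ValueError(f"no folder ID in URL: {url!r}")
    return folder_id


print(folder_id_from_purl("http://purl.org/pressemappe20/folder/sh/126128,208307"))
# prints: sh/126128,208307
```

The resulting string is what gets pasted as the value of "PM 20 folder ID".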

Sometimes, PM20 folders may be a valuable external complement to Wikipedia articles. Links to Wikipedias in different languages are displayed at the bottom, or in the right-hand column, of the Wikidata item page. Each Wikipedia has its own rules on when and how to add external links - please check them carefully.

English Wikipedia

  • See the rules on w:Wikipedia:External links.
  • In order to be able to receive feedback on your edits, log into Wikipedia as a named user.
  • If the folder contents look like a valuable addition to the corresponding WP article, edit its "External links" section (or add == External links == at the bottom of the article, but above categories and the like) - see example.
  • To add a link, use the PM20 template with the folder ID described above, e.g.
   * {{PM20|FID=sh/126128,208307}}
By default, the WP article name is inserted into the link. If this does not fit well, you can insert an additional |NAME=... inside the curly braces with a better-fitting description of the folder content.
  • Adding a short description of your edit in the "Summary" field helps watchers of the article.

German Wikipedia

  • See the rules at de:Wikipedia:Weblinks.
  • In order to be able to receive feedback on your edits, log into Wikipedia as a named user.
  • If the folder contents look like a valuable addition to the corresponding WP article, edit its "Weblinks" section (or add == Weblinks == at the bottom of the article, but above the section "Einzelnachweise" (individual citations), categories and the like) - see example.
  • To add a link, use the Pressemappe template with the folder ID described above, e.g.
   * {{Pressemappe|FID=sh/126128,208307}}
By default, the WP article name is inserted into the link. If this does not fit well, you can insert an additional |NAME=... inside the curly braces with a better-fitting description of the folder content.
  • Adding a short description of your edit in the "Zusammenfassung und Quellen" field helps watchers of the article.
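
When links for several folders are prepared in advance, the template call can be assembled mechanically. The sketch below only builds the wikitext line shown above for either template; the helper name `pm20_weblink` and the example NAME value are illustrative assumptions, and the result still has to be pasted into the article by hand.

```python
from typing import Optional


def pm20_weblink(folder_id: str, name: Optional[str] = None,
                 template: str = "PM20") -> str:
    """Build the wikitext list entry for a PM20 folder link.

    template is "PM20" for English Wikipedia and "Pressemappe" for German
    Wikipedia, as described above. name fills the optional NAME parameter.
    """
    parts = [template, f"FID={folder_id}"]
    if name:
        parts.append(f"NAME={name}")
    return "* {{" + "|".join(parts) + "}}"


print(pm20_weblink("sh/126128,208307"))
# prints: * {{PM20|FID=sh/126128,208307}}
print(pm20_weblink("sh/126128,208307", name="Aryanization in Germany",
                   template="Pressemappe"))
# prints: * {{Pressemappe|FID=sh/126128,208307|NAME=Aryanization in Germany}}
```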

Regular maintenance tasks

Add PM20 ID via GND ID ("pm20 via gnd")

This task was run initially for 1600+ IDs. When a GND ID that is known from a not-yet-linked PM20 folder has been added to a Wikidata item, the PM20 ID can be added to that item automatically.

 cd /opt/sparql-queries/bin
 perl make_qs_input.pl ../wikidata/missing_pm20_id_via_gnd.rq qsStatement

The query and the script are available on GitHub.
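
The matching idea behind this task can be sketched as follows. This is not the logic of make_qs_input.pl, whose output is not shown on this page; it is a minimal illustration that assumes QuickStatements V1 syntax (tab-separated item, property, quoted string value) and uses PM20 folder ID (P4293) as the target property. The GND IDs and the folder ID are invented placeholders, and the item IDs are used purely as stand-ins.

```python
PM20_FOLDER_ID = "P4293"  # PM20 folder ID property


def qs_statements_via_gnd(pm20_by_gnd, items_with_gnd):
    """Build QuickStatements V1 lines for items whose GND ID matches a PM20 folder.

    pm20_by_gnd: dict mapping GND ID -> PM20 folder ID (not yet linked folders)
    items_with_gnd: iterable of (item ID, GND ID) pairs lacking a PM20 ID
    """
    lines = []
    for qid, gnd_id in items_with_gnd:
        folder_id = pm20_by_gnd.get(gnd_id)
        if folder_id is None:
            continue  # this GND ID is not known in any PM20 folder
        lines.append("\t".join([qid, PM20_FOLDER_ID, f'"{folder_id}"']))
    return lines


# Placeholder data, not real GND or folder IDs.
pm20_by_gnd = {"000000000": "pe/000000"}
items_with_gnd = [("Q4115189", "000000000"), ("Q13406268", "999999999")]

for line in qs_statements_via_gnd(pm20_by_gnd, items_with_gnd):
    print(line)
```

Only the first pair yields a statement, because the second GND ID has no PM20 folder in the lookup table.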

Set qualifiers ("pm20 folder name" / "pm20 doc count")

QuickStatements input files for subject named as (P1810), number of works (P3740) and number of works accessible online (P5592) are generated via

 cd /opt/sparql-queries/bin
 perl make_qs_input.pl ../pm20/folder_names_qs.rq qsStatement
 perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
 perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement

Because company names are currently being cleaned up, creation of "named as" qualifiers is restricted to sh, wa and pe folders for now.

The folder names / doc counts queries and the script are available on GitHub.
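
For orientation, a qualifier statement of this kind could look like the output of the sketch below. It assumes QuickStatements V1 syntax, where qualifier property/value pairs follow the main statement on the same tab-separated line, and it applies the current restriction of "named as" to sh, wa and pe folders. The function name, the document counts and the co folder ID are invented for illustration; the real input files come from the queries listed above.

```python
# Folder ID prefixes for which "named as" qualifiers are currently created.
NAMED_AS_PREFIXES = {"sh", "wa", "pe"}


def qs_folder_qualifiers(qid, folder_id, name=None, total=None, online=None):
    """Build one QuickStatements V1 line with qualifiers on a P4293 statement.

    Names containing double quotes are not handled in this sketch.
    """
    fields = [qid, "P4293", f'"{folder_id}"']
    prefix = folder_id.split("/", 1)[0]
    if name is not None and prefix in NAMED_AS_PREFIXES:
        fields += ["P1810", f'"{name}"']   # subject named as
    if total is not None:
        fields += ["P3740", str(total)]    # number of works
    if online is not None:
        fields += ["P5592", str(online)]   # number of works accessible online
    return "\t".join(fields)


# Counts are invented placeholder values.
print(qs_folder_qualifiers("Q664017", "sh/126128,208307",
                           name="Enteignung von Juden, Arisierung (1933-1945)",
                           total=120, online=80))
# A company folder gets counts but no "named as" qualifier for now.
print(qs_folder_qualifiers("Q4115189", "co/000000", name="Example AG", total=5))
```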

Consistency checks for the PM20 Subject Categories system

The PM20 Subject Categories system is kept as a set of interlinked items in Wikidata, insofar as the categories are linked to PM20 folder items.

Various queries for checking

Fix folder "main subject" statements (if checks reveal errors)

Remove main subject (P921) statements linking to non-PM20 Subject Category items:

 cd /opt/sparql-queries/bin
 perl make_qs_input.pl ../pm20/folder_subject_remove_qs.rq qs

Add correct statements:

 perl make_qs_input.pl ../pm20/folder_subject_add_qs.rq qs
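
The two steps amount to a removal followed by an addition for each affected folder item. As a rough sketch, assuming QuickStatements V1 syntax (a leading "-" marks a statement for removal), the lines for one folder could be produced as below. The function name is hypothetical and the item IDs are placeholders, not actual folder or category items; the real statements are generated by the two queries above.

```python
import re

QID = re.compile(r"^Q[1-9][0-9]*$")


def qs_replace_main_subject(folder_qid, wrong_qid, correct_qid):
    """Return QuickStatements V1 lines that swap a main subject (P921) value."""
    for qid in (folder_qid, wrong_qid, correct_qid):
        if not QID.match(qid):
            raise ValueError(f"not an item ID: {qid!r}")
    return [
        f"-{folder_qid}\tP921\t{wrong_qid}",    # remove the wrong link
        f"{folder_qid}\tP921\t{correct_qid}",   # add the correct category
    ]


# Placeholder item IDs for illustration only.
for line in qs_replace_main_subject("Q4115189", "Q13406268", "Q15397819"):
    print(line)
```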

One-time tasks

Add items for all un-linked person folders

After extended Mix-n-match work, manual lookup of heads of state and of folders with multiple documents, and some testing, items for all 346 remaining person folders were created automatically. As discussed on the talk page,

 perl add_missing_wikidata.pl pm20_pe create

(script, query) was executed and the output pasted into QuickStatements. Jneubert (talk) 15:08, 13 June 2019 (UTC)

Rather minimal example item: Albert Hopff (Q64589732)

Add person information from PM20 to WD

 perl add_missing_wikidata.pl pm20_pe enhance P106

Create Mix-n-match catalog for newspapers

DONE A Mix-n-match catalog for newspapers and journals from PM20 was created, comprising 1359 entries from the internal "publikation" database table, with the ZDB ID as key. Records without a ZDB ID were omitted, and some duplicates (e.g. the same ZDB ID for a paper and its supplement) were skipped. (input file) --Jneubert (talk) 06:52, 8 September 2019 (UTC)
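
The selection rule described above (ZDB ID as key, no entry without a ZDB ID, first occurrence wins for duplicates) can be sketched as follows. The record keys `zdb_id`, `title` and `description`, the sample records, and the id/name/description row layout are assumptions made for this illustration; the actual input file was produced from the "publikation" table.

```python
def mnm_rows(publications):
    """Select catalog rows keyed by ZDB ID.

    Records without a ZDB ID are omitted; for duplicate ZDB IDs only the
    first record is kept.
    """
    seen = set()
    rows = []
    for pub in publications:
        zdb_id = (pub.get("zdb_id") or "").strip()
        if not zdb_id or zdb_id in seen:
            continue
        seen.add(zdb_id)
        rows.append((zdb_id, pub["title"], pub.get("description", "")))
    return rows


# Invented sample records; the ZDB IDs are placeholders.
publications = [
    {"zdb_id": "0000001-0", "title": "Example Daily", "description": "newspaper"},
    {"zdb_id": None, "title": "Paper without ZDB ID"},
    {"zdb_id": "0000001-0", "title": "Example Daily, Supplement"},
]

for row in mnm_rows(publications):
    print("\t".join(row))
```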

Links to (webopac|webopac0).hwwa.de and zbw.eu/beta/p20 will become obsolete, probably by the end of 2020. Therefore, all references to such links have to be replaced.
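
To find references that still need replacing, text can be scanned for the obsolete hosts and path named above. The following is a minimal sketch; the optional "www." prefix and the sample paths in the example text are assumptions, and the pattern does not trim trailing punctuation.

```python
import re

# Matches links to webopac.hwwa.de, webopac0.hwwa.de and zbw.eu/beta/p20.
OBSOLETE_LINK = re.compile(
    r"https?://(?:webopac0?\.hwwa\.de|(?:www\.)?zbw\.eu/beta/p20)\S*",
    re.IGNORECASE,
)


def find_obsolete_links(text: str) -> list:
    """Return all obsolete PM20 links found in text, in order of appearance."""
    return OBSOLETE_LINK.findall(text)


# The paths in this sample text are invented.
sample = (
    "old: http://webopac.hwwa.de/some/page "
    "ok: http://purl.org/pressemappe20/folder/sh/126128,208307 "
    "old: http://zbw.eu/beta/p20/some/page"
)
for link in find_obsolete_links(sample):
    print(link)
```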

Direct links to documents or pages have to be replaced, too. This depends on the introduction of persistent addresses for documents. DONE

Add PM20 geo/subject folders

  • Add PM20 geo codes to linked items according to existing mapping
  • Upper level categories (first and second level)
    • DONE Translate subject category labels to (British) English
    • DONE Create items for PM20 subject categories (160 in total)
      perl add_missing_wikidata.pl pm20_subject_category
      perl add_missing_wikidata.pl pm20_subject_category enhance P361 (partOf hierarchy)
      Two dozen items that link to special intermediate levels not transferred to Wikidata got no partOf link and need to be fixed
    • DONE Create items for folders (3776 in total)
      perl add_missing_wikidata.pl pm20_subject_folder - temporarily interrupted because of QuickStatements creating duplicates
  • All remaining categories
    • DONE Translate subject category labels to English
    • DONE Fix hierarchy
    • DONE Create items for PM20 subject categories (exactly the 1452 categories from "klassifikator WHERE klass_code='JE' and mappen_anzahl is not null")
      perl add_missing_wikidata.pl pm20_subject_category
    • DONE Create category hierarchy
      perl add_missing_wikidata.pl pm20_subject_category enhance P361 (partOf hierarchy)
    • DONE Create category sort label
      perl add_missing_wikidata.pl pm20_subject_category enhance P8484
      perl add_missing_wikidata.pl pm20_geo_category enhance P8483
    • DONE Create items for folders
      perl add_missing_wikidata.pl pm20_subject_folder
    • DONE Set document counts
      perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
      perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
  • OPTIONAL (later)
    • Map subject categories to WD items (via main subject (P921))
    • Create all known geo and subject categories, even if they have no folders for now (for later use in film sections)
    • Create reverse has part statements (issues: meaningful order, completeness)
    • Create film sections for countries not or incompletely represented as folders, create pages and add the corresponding geo codes

Add company/institution folders

  • Retrieving and using direct links
    • DONE via GND
    • DONE via linked Wikipedia page in PM20
  • DONE for each segment
    DONE for Dutch, for English, for German, for French, for div (Mnm, search, QS, errors), ...
    • Mapping
      • Rules for inexact matches, expressed via mapping relation type (P4390):
      • Create Mix-n-match catalog for company folders with documents, ordered by document count, matching against organization and Wikipedia for the respective language
        • for all entries, including already mapped
        • with synonyms (altLabel), names with GND excluded
          ./make_mnm_input.sh pm20 nl
      • Map from top
        • Create OpenRefine project in same order (matching English labels) (???)
        • only for unmapped entries (after mnm automatch)
      • Create list of QS insert statements and use in parallel for creating missing items
    • Create QS inserts for all unmapped entries (using the country code lists above)
      TODO exclude only exactly or unqualified mapped items
      ./add_missing_wikidata.pl pm20_co
    • Update Mix-n-match (Action -> Katalog manuell synchronisieren [synchronize catalog manually] -> Mix-n-match aktualisieren [update Mix-n-match])
  • DONE Cleanup large intersected companies (e.g., Deutsche Reichsbahn)
  • DONE Add standard qualifiers for P4293 (name, counts)
  • Cleanup / extension re. inexact mappings
    • DONE set "related match" mapping relation qualifier for all co/person, co/building etc. mappings
    • DONE cleanup duplicate exact/unqualified mappings
    • DONE fix missing folders (~50, lost with change from create_rdf.pl to create_rdf1.pl)
    • DONE match items for missing folders
    • DONE repeat creation of QS inserts
  • DONE Fix missing French labels
  • DONE Adding inception/demolition date
    • DONE check inception before demolition date
  • DONE Adding GND
  • DONE Adding instance-of statements (if not existent)
  • DONE Interlinking with persons
    • founder: perl add_missing_wikidata.pl pm20_co enhance P112
    • board: perl add_missing_wikidata.pl pm20_co enhance P3320
    • advisory board: perl add_missing_wikidata.pl pm20_co enhance P5052
  • DONE Interlinking with companies
    • successor/predecessor
    • subsidiary/mother
  • DONE Mapping (via Geonames ID) and import of headquarter location
    • POSTPONED Add items derived from missing Geonames IDs
    • DONE Add derived country
  • DONE Mapping and import of industry sector
    • DONE Assign industry according to PM20 NACE code (add all assignments)
    • DONE Map SK values to WD industries, based on German label (sometimes very coarse-grained)
      • DONE Map to targets with NACE code if some confidence in the mapping to NACE is given
      • POSTPONED Derive a partial SK-NACE mapping at the end (could this be extended with NT relations to include inexact mappings for e.g. Metallindustrie?)
    • DONE Assign more industries derived from SK mapping
      • DONE For NACE-equivalent, do not add if the same assignment with PM20 as source already exists
      • For very broad targets, do not add if any assignment exists
    • DONE Create systematic display of industries used in SK mapping
  • POSTPONED Fix missing German and English labels (use existing or PM20 label for pre-existing entity?)
  • POSTPONED Which synonyms can be added safely?
  • POSTPONED Perhaps, link or create separate items for companies identified by GND (zbwext:includesInstitutionNamed)
    • Note as part of the description

Add wares folders

  • DONE Add property PM20 ware ID (P10890)
  • DONE Create OpenRefine mapping of ware names
  • DONE Add PM20 ware ID (P10890) links to ware items
  • DONE Create special categories items (w/o existing ware items)
    perl add_missing_wikidata.pl pm20_ware_category
  • DONE Create folder items
    perl add_missing_wikidata.pl pm20_ware_folder
  • DONE Add counts
    perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
    perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
  • DONE Add reverse WD links in PM20
  • DONE Add mapping for missing country names (countries not required for subjects)
  • DONE Add PM20 geo code (P8483) links to geo items
  • DONE Create folder items
    • DONE Remove duplicates created by QS
  • DONE Add names and counts
    perl make_qs_input.pl ../pm20/folder_doc_total_count.rq qsStatement
    perl make_qs_input.pl ../pm20/folder_doc_online_count.rq qsStatement
  • DONE Verify completeness of mapping
  • DONE Check completeness of wikidata extract in PM20 endpoint
  • Fix missing hierarchy levels in PM20 dataset (https://w.wiki/6Csx)
  • Recreate category pages with reverse links to WD

Activity log

Rough log of PM20-related activities