Wikidata:WikiProject 20th Century Press Archives/Data structure
The folder metadata from the data source is available in a preliminary RDF structure:
- SPARQL endpoint (with example query)
- RDF data format documentation (in German)
- further example queries
- Mapping to IFIS data structure
For discussion about modeling issues, see the talk page.
External identifier property for PM20 folders
PM20 folder ID (P4293) (documentation and discussion)
Qualifiers
The following qualifiers are in regular use with PM20 folder ID (P4293):
- subject named as (P1810) - folder names may differ widely from item labels (see this discussion about the use of "named as")
- number of works (P3740) - articles and other documents accessible only in ZBW premises
- number of works accessible online (P5592) - articles and other documents freely accessible online
- mapping relation type (P4390) - in particular, "related match" when types do not match (e.g., a company folder with the item for the founder of that company)
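The qualifier set above can be inspected with a query along the following lines. This is a minimal sketch that only builds a SPARQL string; it assumes the predefined `p:`, `ps:` and `pq:` prefixes of the Wikidata Query Service, and the function and variable names are illustrative, not part of the project.

```python
# Qualifier properties named in the list above, with illustrative variable names.
QUALIFIERS = {
    "P1810": "named_as",
    "P3740": "works",
    "P5592": "works_online",
    "P4390": "mapping_relation",
}


def build_folder_query(limit: int = 10) -> str:
    """Return a SPARQL query string for P4293 statements and their qualifiers.

    The query is only assembled here, not sent to any endpoint.
    """
    variables = " ".join(f"?{name}" for name in QUALIFIERS.values())
    optionals = "\n".join(
        f"  OPTIONAL {{ ?statement pq:{pid} ?{name} . }}"
        for pid, name in QUALIFIERS.items()
    )
    return (
        f"SELECT ?item ?folderId {variables} WHERE {{\n"
        "  ?item p:P4293 ?statement .\n"
        "  ?statement ps:P4293 ?folderId .\n"
        f"{optionals}\n"
        "}\n"
        f"LIMIT {limit}"
    )


print(build_folder_query(5))
```

Each qualifier is wrapped in `OPTIONAL` because not every statement carries all four of them.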
Considered qualifiers:
Reference statements
For the properties below use:
Persons
Still incomplete mapping of the metadata from person folders:
type | property | pid | datatype | cardinality | rdf | transformation |
---|---|---|---|---|---|---|
Lde:Len | skos:prefLabel | adjust_label (label_type=last_first) | ||||
Dde | schema:hasOccupation | |||||
Den | ||||||
I | instance of | P31 | item | 1.1 | Q5 (fix) OR Q8436 (fix) if label contains "<Familie>" | |
P | PM20 Folder ID | P4293 | string | 1.1 | dcterms:identifier | |
P | GND ID | P227 | string | 0.1 | gndo:gndIdentifier | |
P | date of birth | P569 | date | 0.n | schema:birthDate | format_date() |
P | date of death | P570 | date | 0.n | schema:deathDate | format_date() |
P | occupation | P106 | item | 0.n | zbwext:activity/schema:about | map to wd items, for certain fields of activity |
P | field of work | P101 | item | 0.n | zbwext:activity/schema:about | map to wd items, for certain fields of activity |
P | family | P53 | item | 0.n | dct:hasPart (inv) | not for families |
types: L=label, D=description, P=property, I=implied property
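Two of the rules in the table can be sketched in code. The `instance of` rule follows the table directly. The body of `adjust_label` is an assumption: the table only names the transformation and its `label_type=last_first` argument, so the "Last, First" to "First Last" conversion below is one plausible reading, and the example label is a placeholder.

```python
def person_instance_of(label: str) -> str:
    """Q8436 (family) if the folder label contains "<Familie>", otherwise Q5 (human)."""
    return "Q8436" if "<Familie>" in label else "Q5"


def adjust_label(label: str, label_type: str = "last_first") -> str:
    """Hypothetical reading of adjust_label: turn "Last, First" into "First Last"."""
    if label_type != "last_first" or "," not in label:
        return label.strip()
    last, first = label.split(",", 1)
    return f"{first.strip()} {last.strip()}"


print(person_instance_of("Doe <Familie>"))
print(adjust_label("Doe, Jane"))
```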
Organizations/companies
- Considerations, early examples and feedback in the Project Chat
- about type business/company/... on Companies project talk page
- discussed at https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2020/05#Business_v._enterprise_v._company
Interlinking
- Companies (create reverse relations)
- Location
General considerations about categories (subjects, wares, and geographical locations)
- Persistent URIs for categories follow the pattern http://purl.org/pressemappe20/category/(geo|subject|ware)/i/{numerical-id}. The "numerical-id" stems from the IFIS database, so no more persistent URIs can be coined once the IFIS database is no longer available.
- Persistent URIs resolve to HTML pages if there are folders for the category
- Countries and probably wares can be mapped to real world items; subjects are too arbitrary (e.g., "Land und Leute, Politik und Wirtschaft, Allgemein | Country and people, politics and economy, general", "Postwesen. Telegraphenwesen und Fernsprechwesen | Postal services, telegraphy and telephony" (and subcategories), "Geschichtliche Vorgänge 1900-1914 | Historical events 1900-1914")
- A completed country (location) mapping to Wikidata items already exists
- Subject categories must be represented in Wikidata as one item per category, for
- normalization (functional dependency of signature and category names (in German and other languages) on the id)
- representation of the category hierarchy
- extensibility, when new folders should be added from films (versus a frozen HTML version of the classification at a certain point in time)
- Ware categories can mostly be represented by existing Wikidata items. A few items need to be created anew (see below).
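The persistent URI pattern for categories can be illustrated with a small helper. This is a sketch only: the function name is hypothetical, and the id 140905 is borrowed from the geo category example used elsewhere on this page.

```python
CATEGORY_TYPES = ("geo", "subject", "ware")


def category_uri(category_type: str, numerical_id: int) -> str:
    """Build a persistent category URI from a category type and an IFIS numerical id."""
    if category_type not in CATEGORY_TYPES:
        raise ValueError(f"unknown category type: {category_type!r}")
    return f"http://purl.org/pressemappe20/category/{category_type}/i/{numerical_id}"


print(category_uri("geo", 140905))
```

Because the numerical id is the only variable part besides the type, new URIs of this form depend on ids issued by the IFIS database.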
Country/subject folder items, defined by existing location + PM20 subject classification (Draft)
Each folder in the country/subject archive is defined by a combination of
- a geographical entity (countries, geographical regions, a few cities, the whole world)
- a subject category (within a hierarchy of subject categories)
The model developed here has to be extensible, because currently only parts of the digitized material have been organized in well described folders, while others only exist as ranges of images of digitized roll films.
New properties
Open question: Use ID (precise, but closed) or notation (more fuzzy, extendable)? Probably: use notation/signature/code as external id, for future extensibility without an external database (beyond the film material itself).
- Implementation: The purl.org address, e.g. http://purl.org/pm20/category/geo/s/A9, redirects to https://pm20.zbw.eu/category/geo/s/A9, which will redirect to https://pm20.zbw.eu/category/geo/i/140905 (numerical identifier via Apache RewriteMap). Currently it redirects to http://zbw.eu/beta/pm20voc/ag/140905, and that to https://zbw.eu/beta/skosmos/pm20ag/en/page/140905, a Skosmos concept page. --Jneubert (talk) 08:02, 24 July 2020 (UTC)
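The signature-to-id step of this redirect chain amounts to a table lookup. The sketch below imitates it with a Python dictionary standing in for the Apache RewriteMap; the single entry is the A9 example from the text, and the function name is hypothetical.

```python
from typing import Optional

# Stand-in for the RewriteMap: (category type, signature) -> numerical id.
# The only entry is the example given in the text.
SIGNATURE_TO_ID = {("geo", "A9"): 140905}


def resolve_signature(category_type: str, signature: str) -> Optional[str]:
    """Return the id-based target URL, or None if the signature is not in the map."""
    numerical_id = SIGNATURE_TO_ID.get((category_type, signature))
    if numerical_id is None:
        return None
    return f"https://pm20.zbw.eu/category/{category_type}/i/{numerical_id}"


print(resolve_signature("geo", "A9"))
```

A signature missing from the map yields `None`, which a web server would typically turn into a "not found" response.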
name | pid | datatype | links to | comment | temporary use for example creation |
---|---|---|---|---|---|
PM20 subject code (P8484) | external | term entry and list of folders by "country" | at the PM20 subject category item | - | |
PM20 geo code (P8483) | external | term entry and list of folders by subject and ware | at the real world location item | - | |
PM20 ware ID (P10890) | external | term entry and list of folders by "country" | at the real world ware/product/product class item or a special PM20 ware category (Q111973176) (in a few cases) | catalog code (P528) | |
PM20 film section ID (P11822) | external | image from PM20 film or fiche | with mandatory qualifier number of pages (P1104), indicating a range of follow-up images (possibly across multiple films/fiches) | inventory number (P217) | |
Alternatives:
- one generic property for notation (something like skos:notation - already existing? preliminarily use short name (P1813)?), similar to catalog code (P528), in combination with catalog (P972)
- Or: Use with formatter URL instead of list property? (No - only works as a non-extensible list (generated HTML page) + lookup mechanism (notation -> id).)
- Preliminary implementation of external-id target pages with Skosmos leaves the lists almost hidden as rdfs:seeAlso links.
- Future implementation will use a signature -> id redirect
- To do later: static HTML page, perhaps with customized 404 message "Signature not defined in the static representation of PM20 vocabulary (as of ...)"
Non-uniqueness problems:
- PM20 folder ID (P4293) in regular items (e.g., petroleum in the United States (Q7178964)) and in folder items. (Probably will occur rarely)
- PM20 location code in location items and in qualifiers at folder items (e.g., B42a in example British India : Health situation, general (Q92867525) - is that a problem? Probably no: The qualifier only complements the location item, with the unique identifier repeated redundantly)
currently ca. 1400 categories
example items (partOf/hasPart hierarchy):
PM20 subject category system (Q92732036)
type | property | pid | datatype | cardinality | source property | transformation |
---|---|---|---|---|---|---|
Lde | skos:prefLabel @de | |||||
Len | skos:prefLabel @en | |||||
Dde | "Systematikstelle des Pressearchiv 20. Jahrhundert" (fix) | |||||
Den | "subject category of the 20th Century Press Archives" (fix) | |||||
Ade | PM20 subject code + labelDe | |||||
Aen | PM20 subject code + labelEn | |||||
P | instance of | P31 | item | 1.1 | PM20 subject category (Q92707903) (fix) | |
P | part of | P361 | item | 1.1 | super_class() | |
P | PM20 subject code | P8484 | external | 1.1 | ||
P | main subject | P921 | item | 0.1 | manual lookup | |
P | has part | P527 | item | 0.1 | ||
type: L=label, D=description, P=property, I=implied property
PM20 subject code (P8484) holds short form of the notation (e.g., n24 Sm12), as it was used in the signatures put onto the clippings. A fuller form useful for sorting (e.g., n 24 SM 012) could be added to the category item as a label with the language code zxx (no linguistic content (Q22282939)) (does not work - neither with QS (Lzxx) nor interactively). Other possible option: use series ordinal (P1545) as a qualifier for PM20 subject code (P8484) (example).
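How the fuller, sortable form might be derived from the short form can be sketched as follows. The rule is inferred from the single example pair above (n24 Sm12 and n 24 SM 012) and is an assumption, not a documented PM20 convention: the first part keeps its letters and digits, later parts get upper-cased letters and digits zero-padded to three places.

```python
import re


def sortable_subject_code(code: str) -> str:
    """Expand a short notation like "n24 Sm12" into a sortable form (assumed rule)."""
    expanded = []
    for index, part in enumerate(code.split()):
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", part)
        if match is None:
            # Leave parts that do not look like letters + digits untouched.
            expanded.append(part)
            continue
        letters, digits = match.groups()
        if index == 0:
            expanded.append(f"{letters} {digits}")
        else:
            expanded.append(f"{letters.upper()} {int(digits):03d}")
    return " ".join(expanded)


print(sortable_subject_code("n24 Sm12"))
```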
currently ca. 9000 subject folders
example items:
- Germany : Individual diseases and their control (Q91257808)
- British India : Health situation, general (Q92867525) (this folder does not exist in the current PM20 application, because the material has not yet been checked and published - it is accessible on the ZBW premises)
- Poland : PM20 (Q93270979) (ditto - reference to "all about Poland in the country/subject archives")
- Nordslesvig : Historical events (Q104977549) publicly accessible film section
type | property | pid | datatype | cardinality | restriction | source property | transformation |
---|---|---|---|---|---|---|---|
Lde | skos:prefLabel | ||||||
Len | derived from English location and class labels? | ||||||
Dde | "Mappe aus dem Pressearchiv 20. Jahrhundert" (fix) | ||||||
Den | "folder of the 20th Century Press Archives" (fix) | ||||||
I | instance of | P31 | item | 1.1 | PM20 country/subject folder (Q91257459) (fix) | ||
P | part of | P361 | item | 1.1 | 20th Century Press Archives (Q36948990) (fix) | ||
P | facet of | P1269 | item | 1.1 | subclass/instance of human-geographic territorial entity (Q15642541) | zbwext:country | qualified with PM20 geo code (P8483), used for lookup |
P | main subject | P921 | item | 1.1 | instance of PM20 subject category (Q92707903) | zbwext:subject | qualified with PM20 subject code (P8484), used for lookup |
P | IIIF manifest | P6108 | url | 1.1 | manifest_url() TODO display with plugin? | ||
P | PM20 folder ID | P4293 | external | 1.1 | starting with "sh/" | dct:identifier | when also used for films, alternatively with new "PM20 film" property |
P | number of works | P3740 | quantity | 1.1 | zbwext:totalDocCount | ||
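The rows of this table can be read as a recipe for the statements of one folder item. The sketch below assembles them as plain Python dictionaries; it does not use any Wikidata client library, the function name and dictionary layout are illustrative, and the item ids and folder id in the example call are placeholders. The IIIF manifest statement is left out because manifest_url() is not specified here.

```python
def subject_folder_statements(folder_id, location_qid, geo_code,
                              category_qid, subject_code, total_docs):
    """Assemble the statements of a country/subject folder item as plain dicts."""
    if not folder_id.startswith("sh/"):
        raise ValueError("country/subject folder IDs are expected to start with 'sh/'")
    return [
        {"property": "P31", "value": "Q91257459"},   # fixed: PM20 country/subject folder
        {"property": "P361", "value": "Q36948990"},  # fixed: 20th Century Press Archives
        {"property": "P1269", "value": location_qid,
         "qualifiers": {"P8483": geo_code}},         # facet of + PM20 geo code
        {"property": "P921", "value": category_qid,
         "qualifiers": {"P8484": subject_code}},     # main subject + PM20 subject code
        {"property": "P4293", "value": folder_id},   # PM20 folder ID
        {"property": "P3740", "value": total_docs},  # number of works
    ]


# Placeholder values for illustration only.
for statement in subject_folder_statements(
        "sh/EXAMPLE", "Q_LOCATION", "B42a", "Q_CATEGORY", "n24 Sm12", 0):
    print(statement)
```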
Possible extension for image ranges on films or fiches
- May be applied to countries, wares, company folders, subject folders (missing person folders are not digitized)
- Additional property: PM20 film section ID (P11822) (see above) - created
- described as: range of microfilm images of the 20th Century Press Archives (start position given as property value, qualified with number of pages). (Later on perhaps extended to digitized microfiches, or otherwise the latter in a separate property)
- Formatter URL: https://pm20.zbw.eu/film/
- value example: h2/sh/S2043H/1151
- mandatory qualifier: number of pages (P1104)
- on films, one image normally contains two pages
- calculation of P1104 for films uses the start of the next range: ( ( {number of images of start film} - {value} ) + {number of images of all intermediate films} + {next range start} - 1 ) * 2 (two pages per image - not necessarily exact due to start/end page, images with only one page, etc.)
- optional qualifier: including (P1012) subcategory (Q92876464)
- indicates the inclusion of sub-folder hierarchy
- multiple entries per item are possible! (h1, h2, h3, ...)
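The page estimate and the link target for a film section can be sketched as follows. The arithmetic mirrors the calculation stated in the list above; appending the value directly to the formatter URL is an assumption, and the image counts in the example call are made-up numbers.

```python
FILM_BASE_URL = "https://pm20.zbw.eu/film/"


def film_section_url(section_id: str) -> str:
    """Append the property value to the formatter URL (assumed to be plain concatenation)."""
    return FILM_BASE_URL + section_id


def estimated_pages(images_on_start_film: int, start_position: int,
                    images_on_intermediate_films: list, next_range_start: int) -> int:
    """Estimate number of pages (P1104) for a range that ends where the next range starts."""
    images = (
        (images_on_start_film - start_position)
        + sum(images_on_intermediate_films)
        + next_range_start
        - 1
    )
    return images * 2  # two pages per image; an approximation


print(film_section_url("h2/sh/S2043H/1151"))
# Made-up numbers: start film has 1200 images, next range starts at image 10 of the following film.
print(estimated_pages(1200, 1151, [], 10))
```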
Ware folders
- Same way as subject folders, linking to subject codes? Probably not - the notations used in the category system derived from the IFIS database were obviously not in use for the period before 1998 (?). The main entry was the ware name.
- If some kind of notation for ware categories is needed, see https://stackoverflow.com/questions/4009281/how-can-i-generate-url-slugs-in-perl for a "slugified" version of the category names. --Jneubert (talk) 17:46, 28 July 2020 (UTC)
- However, different from subject categories, we can assume that all ware categories are known. So all ware categories are covered by numeric IDs and we have no need to construct an extendable naming/URI scheme.
- On the Wikidata level, the ID in a ware item can link directly to the category (= list of folders by country). If no ware item exists (e.g. for category Axe, hatchet, hammer), a PM20 ware category (Q111973176) item could be created.
- Open question: Link all categories to Wikidata, or only the ones with folders? --Jneubert (talk) 09:59, 23 March 2022 (UTC)
- For each ware, general articles were collected in front of the country-specific folders, organized by a general schema or, in some cases (such as coal), a ware-specific one. In the legacy application, all these articles were assigned to the "country" H World. The categories of the corresponding schema were added as keywords (disregarding the notation and hierarchy) on the article level (examples Abfall, Kohle). Jneubert (talk) 11:06, 9 May 2022 (UTC)
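Should a "slugified" notation for ware category names ever be wanted, it could look roughly like this. The linked discussion covers Perl; the sketch below is a Python equivalent using only the standard library, and the exact slug rules (lower case, ASCII only, hyphen as separator) are assumptions.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a category name into a lower-case, hyphen-separated ASCII slug."""
    # Decompose accented characters, then drop everything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of other characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


print(slugify("Axe, hatchet, hammer"))
```

Characters without an ASCII decomposition, such as "ß", are simply dropped by this approach, so German names may need an explicit transliteration step first.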
Only used for special categories, mostly collections of concepts (like Axe, hatchet, hammer (Q113376049)). Normally, commodities and wares categories use already defined normal items.
type | property | pid | datatype | cardinality | source property | transformation |
---|---|---|---|---|---|---|
Lde | skos:prefLabel @de | |||||
Len | skos:prefLabel @en | |||||
Dde | "spezielle Kategorie von Waren des Pressearchiv 20. Jahrhundert" (fix) | |||||
Den | "special category of commodities/wares of the 20th Century Press Archives" (fix) | |||||
Ade | ? | |||||
Aen | ? | |||||
P | instance of | P31 | item | 1.1 | PM20 ware category (Q111973176) (fix) | |
P | PM20 ware ID | P10890 | external | 1.1 | ||
P | main subject | P921 | item | 0.1 | manual lookup | |
type: L=label, D=description, P=property, I=implied property
type | property | pid | datatype | cardinality | restriction | source property | transformation |
---|---|---|---|---|---|---|---|
Lde | skos:prefLabel | ||||||
Len | derived from English location and class labels? | ||||||
Dde | "Mappe aus dem Pressearchiv 20. Jahrhundert" (fix) | ||||||
Den | "folder of the 20th Century Press Archives" (fix) | ||||||
I | instance of | P31 | item | 1.1 | PM20 ware/country folder (Q113376528) (fix) | ||
P | part of | P361 | item | 1.1 | 20th Century Press Archives (Q36948990) (fix) | ||
P | facet of | P1269 | item | 1.1 | subclass/instance of human-geographic territorial entity (Q15642541) | zbwext:country | qualified with PM20 geo code (P8483), used for lookup |
P | main subject | P921 | item | 1.1 | instance of PM20 ware category (Q111973176) or other commodity/ware item | zbwext:ware | qualified with PM20 ware ID (P10890), used for lookup |
P | IIIF manifest | P6108 | url | 1.1 | manifest_url() TODO display with plugin? | ||
P | PM20 folder ID | P4293 | external | 1.1 | starting with "wa/" | dct:identifier | when also used for films, alternatively with new "PM20 film" property |
P | number of works | P3740 | quantity | 1.1 | zbwext:totalDocCount | ||
Related on Wikidata
Other collections
- Wikidata:WikiProject DNB (Dictionary of National Biography) - Wikisource entries and their modelling and cross-linking in WD
Other classifications
- The Carnegie Classification of Institutions of Higher Education seems to be imported completely into WD (top Carnegie Classification of Institutions of Higher Education (Q4223026), 158 subclasses, 3180 uses of Carnegie Classification of Institutions of Higher Education (P2643))
- Nickel–Strunz Classification (of minerals) Nickel–Strunz classification (Q3679719) (e.g., phosphate mineral (Q3092395))
- modelled differently: International Statistical Classification of Diseases and Related Health Problems (Q50018) with properties like ICD-10 ID (P494) , Hornbostel–Sachs (Q496327) - ??
Possibly interesting properties
- has list (P2354) (item) Wikimedia list related to this subject, list of works (P1455) link to the article with the works of a person
- scope and content (P7535) (monolingual text) a summary statement providing an overview of the archival collection
- full work available at URL (P953) (url) online version (vs. pm20intern film stretches)
- external data available at URL (P1325) (url) URL where external data on this item can be found
- URL (P2699) (url) location of a resource. To qualify with of (P642)
- applies to part, aspect, or form (P518) qualifier (item) part, aspect, or form of the item to which the claim applies
- valid in place (P3005) qualifier (item) place/country where a statement is valid
- statement is subject of (P805) qualifier (item) can link to an item for an aspect (examples)
- partially coincident with (P1382) relation between NACE items and other classes?
- floruit (P1317) date when the person was known to be active or alive, when birth or death not documented (for human, not organization!)