Wikidata:WikiProject Datasets/reports/Ingesting data about historical monuments in the city of Zurich

Introduction to the case report

Background

This report describes the process of ingesting data about Historical monuments in the City of Zurich, Switzerland.

Besides ingesting the data into Wikidata, the following three documents / guidelines are verified during the process in order to gain further insights:

Data Sources

There are two different data sources which are used during the case studies:

  • Data from the City of Zurich, Monuments Protection Service (main data source for the import, Link dataset)
  • Data from the Swiss Inventory of Cultural Property of National and Regional Significance (KGS) Link dataset. Some items have already been imported from this data source with the ingest described in this report.

Procedure

The procedure followed contains the nine steps introduced in Ingesting Swiss heritage institutions and further illustrated (and enhanced by a 10th step) in A practical beginners user-guideline for ingesting datasets into Wikidata.

Field notes

Step 1: Goals, data structure and data quality

Data structure diagram

The diagram below shows the structure of the data source about historical monuments in the city of Zurich. Only distributions that are used for ingesting the data are illustrated with all properties and data record structure. Only bold properties are used for the ingest.

 

Data quality assessment

The following table shows the baseline defined (the minimal quality needed to ingest the data set into Wikidata) and the results of the data quality assessment itself. The assessment table is based on Cai and Zhu (2015); the criteria used are based on Rohweder et al. (2015).

| Information quality category | Information quality dimension | Indicators | Baseline | Baseline explanation | Assessment |
| --- | --- | --- | --- | --- | --- |
| A) accessibility data quality | 1) accessibility | The information can be accessed in an easy and direct manner. | x | Data cannot be imported into Wikidata if it's not accessible. | x |
| | 2) ease of manipulation | The information can easily be manipulated and can be used for different purposes. | | Data should be easy to manipulate; however it is sufficient if the data is usable in the context of cultural heritage. | x |
| | 3) compatible license* | The license under which the data has been published is compatible with Wikidata's CC Zero License. | x | Compulsory for data ingestion into Wikidata. | x |
| B) intrinsic data quality | 1) reputation | Information enjoys high reputation if the source of the data, transport media and processing system have a high reputation of trustworthiness and competence. | x | The organization responsible for collecting the data must have a high reputation. | x |
| | 2) free of error | Information is free of error if it is consistent with reality. | | Some errors can be corrected during the import; no serious consequences if there are any left. | |
| | 3) objectivity | Information is objective if it is strictly factual and non-judgemental. | x | Information on cultural heritage must be objective. | x |
| | 4) believability | Information is believable if certificates demonstrate a high quality standard or the information is acquired and disseminated at high effort. | x | Information must be accurately collected; no certificates needed. | x |
| | 5) verifiability* | The origin of the data is clear. | x | The origin of the data must be clear so statements in Wikidata can be covered with meaningful references. | x. Statements on items that result from the data can be covered with meaningful references. |
| C) representational data quality | 1) understandability | Information is directly understood by the users and can be used for their purposes. | x | Information must be directly understandable in the context of cultural heritage. | x |
| | 2) concise representation | Exactly the required information is presented in a suitable and easily graspable format. | | Additional information can be present. Data can be cleansed before it is ingested. | x |
| | 3) consistent representation | The information is represented throughout in the same way. | x | Necessary for efficient automated ingestion. | x |
| | 4) interpretability | Information is described in the same, professionally correct manner. | x | No unstructured data must be ingested into Wikidata. | x. The part of the data intended for the ingestion is structured data. |
| D) contextual data quality | 1) timeliness | Information represents the actual properties of the described object in a timely manner. | | Timely manner is difficult to define in the context of cultural heritage and depending on the actual properties. Therefore not part of the baseline. | (x) |
| | 2) value-added | The use of the information can lead to a quantifiable increase in a monetary target function. | | Monetary target function not mandatory for intended use. | |
| | 3) completeness | Information is not missing and is available at the defined time points in the respective process steps. | x | A dataset should be ingested as a complete set into Wikidata. Label and descriptions should be unique or at least significant. | x. For all mandatory properties, statements with reasonable values can be created. Label and description (terms) can be used with meaningful and easily retrievable values. |
| | 4) appropriate amount of data | The amount of information available satisfies the requirements. | x | Requirements must be fulfilled; Wikidata must not be flooded with unnecessary data. | x |
| | 5) relevancy | Data is relevant when providing information necessary to the user. | x | Information must provide information necessary for cultural heritage. | x |
| | 6) notability* | The data to be imported meet Wikidata's need for notability. | x | Every item ingested into Wikidata must meet Wikidata's need for notability. | x |

The results of the data quality assessment show that the baseline has been reached, so the data set will be ingested.
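The comparison behind this conclusion can be written down as a set check. The sketch below is only an illustration: the short codes such as "A1" (category letter plus dimension number) are shorthand introduced here, and "(x)" in the assessment column is counted as fulfilled, which is an assumption.

```python
# Criteria marked "x" in the baseline column of the assessment table.
# The codes (category letter + dimension number) are shorthand for this sketch.
BASELINE = {
    "A1", "A3",
    "B1", "B3", "B4", "B5",
    "C1", "C3", "C4",
    "D3", "D4", "D5", "D6",
}

# Criteria marked "x" or "(x)" in the assessment column.
ASSESSED = {
    "A1", "A2", "A3",
    "B1", "B3", "B4", "B5",
    "C1", "C2", "C3", "C4",
    "D1", "D3", "D4", "D5", "D6",
}


def missing_baseline_criteria(baseline, assessed):
    """Return the baseline criteria that the assessment did not fulfil."""
    return sorted(baseline - assessed)


def baseline_reached(baseline, assessed):
    """The baseline is reached when no baseline criterion is missing."""
    return not missing_baseline_criteria(baseline, assessed)
```

With the values from the table, `baseline_reached(BASELINE, ASSESSED)` is true; criteria outside the baseline (such as "free of error") do not affect the result.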

Step 2: Mapping

Data record

In step 2 the mapping was created. For the mapping to Schema.org, the schema LandmarksOrHistoricalBuildings has been used.

Property in datafile Property in Schema.org Property in Wikidata Refers to class in Wikidata Value range / Example values Remarks
ID (technically: ID) http://schema.org/identifier inventory number (P217) Wikidata property for cultural heritage identification ID08582 The prefix "ID" indicates which source field the value comes from and lets the numerical value be ingested into a property of data type string. The ID is written in 5-digit format (with leading zeros) to match the property's format requirements.
Objektbezeichnung (technically: OBJEKTBEZE) http://schema.org/name label multilingual text (label) Rule 1, row with ID 5798: Rote Fabrik

Rule 2, ID 880: Sprüngli-Haus

Rule 3, ID 238: Denkmalschutzobjekt, Talstrasse 82 (Zürich, Schweiz)

Rule 4, ID 1111: Denkmalschutzobjekt (Zürich, Schweiz)

Rule 5, ID 2: Mehrfamilienhaus, Baurstrasse 40 (Zürich, Schweiz)

To ensure items are ingested with a unique (or at least significant) label, the following rules are applied:
  1. If the label (Objektbezeichnung) itself can be considered significant, only the label is used.
  2. If no label (Objektbezeichnung) is available from the source (2052 rows), the description (NaehereBezeichnung) is used instead (106 rows).
  3. If there is neither a label nor a description (1946 rows), the terms "Denkmalschutzobjekt" (de) and "cultural property" (en) in combination with the address (Adresse) are used as label.
  4. If there is no label, no description and no address (9 rows), the rows are ingested anyway to keep the dataset complete. For these rows the terms "Denkmalschutzobjekt" (de) and "cultural property" (en) are used (the items can be identified via the inventory number and updated when more accurate data becomes available).
  5. If the rules above don't yield a significant label (e.g. multiple rows with the same label), address, municipality and country can be added to make the labels unique (see the sample values above).
  6. If the description (NaehereBezeichnung) is already used as label, the terms "Denkmalschutzobjekt in der Stadt Zürich, Schweiz" (de) and "historical monument in the city of Zurich, Switzerland" (en) are used as description, unless this is already clearly stated in the label.
NaehereBezeichnung (technically: NAEHEREBEZ) http://schema.org/description description multilingual text (description) Verwaltungsgebäude Since items without a description are difficult to identify in Wikidata search the following rules are applied:
  • if there is no description (NaehereBezeichnung) or this one has been used as label, a generic description is set: "historical monument in the city of Zurich, Switzerland"
Inventarkategorie (technically: INVENTARKA) http://schema.org/additionalProperty curator (P1640) designation for an administrative territorial entity (Q15617994) value "kommunal": municipality (Q15284)

value "kantonal": canton (Q2311958)

value "regional": administrative region (Q3455524)

source value range: "kommunal", "kantonal", "regional"
Schutzstatus (technically: UNTERSCHUT) http://schema.org/additionalProperty heritage status (P1435) historic monument in Switzerland (Q3323397) value "Nein" -> just in inventory: Swiss cultural property of cantonal or local significance (Q28971394)

value "Ja" -> heritage preservation: Swiss cultural property under the protection of cultural heritage (Q30246026)

source value range: "Nein", "Ja"
Adresse (technically: ADRESSE) http://schema.org/address located at street address (P969) address (Q319608)

street address (Q24574749)

Seestrasse 395, 8038 Zürich If there are multiple addresses stated in the source, multiple statements are created. For property P969 the entire address is needed including ZIP and city. Therefore these two have to be mapped (e.g. using this directory)
Baujahr (technically: BAUJAHR) http://schema.org/additionalProperty inception (P571) year (Q577) 1896
Stadtkreis (technically: STADTKREIS) http://schema.org/additionalProperty located in the administrative territorial entity (P131) administrative territorial entity (Q56061) District 2 (Q456153)
Vermessungsbezirk (technically: VERMBEZIRK) http://schema.org/additionalProperty located in the administrative territorial entity (P131) administrative territorial entity (Q56061) Wollishofen (Q642353) only if different from the item used for Stadtkreis
coordinates (extracted from KML file, mapped with FID (different from ID!) and id5) http://schema.org/geo coordinate location (P625) geographic coordinate system (Q22664)

World Geodetic System 1984 (Q11902211)

27°59'17"N, 86°55'31"E The coordinates in the original data source appear in a unique Swiss format (CH1903+) in the dbf shape file. They need to be mapped from the KMZ file in the international WGS-84 format. They can be extracted from KML with the tool KMLCSV Converter for instance. The format has to be changed afterwards from «<Longitude>,<Latitude>,<Altitude>» to «<Latitude>,<Longitude>» (e.g. in Excel with this formula: " =CONCATENATE(LEFT(RIGHT(B2;19);17);",";LEFT(B2;17)) ").
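The German label rules listed for Objektbezeichnung in the table above can be sketched as follows. This is a simplified illustration, not the procedure actually used: rows are assumed to be dictionaries keyed by the technical field names (ID, OBJEKTBEZE, NAEHEREBEZ, ADRESSE), empty strings count as missing, and "significant" is approximated as "not shared with another row", which is narrower than the editorial judgement the rules allow. The function names are hypothetical.

```python
from collections import Counter

GENERIC_DE = "Denkmalschutzobjekt"   # generic German term from rules 3 and 4
PLACE_DE = "Zürich, Schweiz"         # municipality and country, as in the samples


def base_label(row):
    """Rules 1 and 2: take the label, else the description, else None."""
    return row.get("OBJEKTBEZE") or row.get("NAEHEREBEZ") or None


def build_labels(rows):
    """Return a dict mapping each row ID to a German label."""
    # Assumption: a name is "significant" when no other row shares it.
    counts = Counter(base_label(row) for row in rows)
    labels = {}
    for row in rows:
        name = base_label(row)
        address = row.get("ADRESSE") or ""
        if name and counts[name] == 1:
            label = name                                       # rules 1 and 2
        elif name and address:
            label = f"{name}, {address} ({PLACE_DE})"          # rule 5
        elif name:
            label = f"{name} ({PLACE_DE})"                     # rule 5, no address
        elif address:
            label = f"{GENERIC_DE}, {address} ({PLACE_DE})"    # rule 3
        else:
            label = f"{GENERIC_DE} ({PLACE_DE})"               # rule 4
        labels[row["ID"]] = label
    return labels
```

Labels built under rule 4 remain identical for all rows without an address, so those items can only be told apart by their inventory number, as the rule itself notes.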

Additional values ingested

Property in datafile Property in Schema.org Property in Wikidata Refers to class in Wikidata Possible values Remarks
- http://schema.org/additionalProperty located in the administrative territorial entity (P131) administrative territorial entity (Q56061) Zürich (Q72) All items from the data source are located in the administrative territorial entity of the City of Zurich
- http://schema.org/additionalProperty located in the administrative territorial entity (P131) administrative territorial entity (Q56061) Canton of Zürich (Q11943) All items from the data source are located in the administrative territorial entity of the Canton of Zurich
- http://schema.org/additionalProperty country (P17) country (Q6256) Switzerland (Q39)
- http://schema.org/additionalProperty instance of (P31) architectural structure (Q811979) architectural structure (Q811979) https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt:

"Diese Objekte sind Gebäude, Gebäudeteile, Brunnen, Brücken oder Denkmäler." (English: "These objects are buildings, parts of buildings, fountains, bridges or monuments.")

Since monument (Q4989906) is a subclass of architectural structure (Q811979), architectural structure is used for all new items.

Step 3: Data formats and data cleansing

Checking data formats in OpenRefine resulted in the following findings:

  • All rows have an ID.
  • Different spellings used for same meaning:
    • ehem. vs Ehem. vs. Ehemalige vs. ehemalige vs. Ehemaliges vs. ehemaliges
    • Schweiz. vs Schweizerische vs Schweizerischer
  • Lots of rows with the same label (Objektbezeichnung) only differing by description (Naehere Bezeichnung) or even only by address (e.g. "Wohnsiedlung Auzelg")
    • see rules to form unique (or at least significant) labels in the mapping of step 2
  • Non-numeric values (184 rows) for inception (Baujahr), e.g. "vor 1812" (before 1812), "um 1770" (around 1770).
  • No range errors among the numeric values (all years lie between 1100 and 2004).
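The findings for Baujahr were obtained in OpenRefine; a comparable check can be sketched in Python. The function name and the category names are hypothetical, and only the two qualifiers quoted above ("vor", "um") are recognised, so any other wording lands in the "other" bucket for manual review.

```python
import re

YEAR_RE = re.compile(r"^\d{4}$")
# "vor" = before, "um" = around; the only qualifiers quoted in the findings.
QUALIFIED_RE = re.compile(r"^(vor|um)\s+(\d{4})$", re.IGNORECASE)


def classify_baujahr(value, lo=1100, hi=2004):
    """Classify a Baujahr value as (category, year).

    Categories: "year", "out_of_range", "vor", "um", "missing", "other".
    The default range mirrors the years observed in the source data.
    """
    text = "" if value is None else str(value).strip()
    if not text:
        return ("missing", None)
    if YEAR_RE.match(text):
        year = int(text)
        return ("year", year) if lo <= year <= hi else ("out_of_range", year)
    match = QUALIFIED_RE.match(text)
    if match:
        return (match.group(1).lower(), int(match.group(2)))
    return ("other", None)
```

Counting the categories over all rows gives a quick view of how many values need a qualifier or manual handling before they can become inception (P571) statements.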

To ease later updates, no data cleansing is done for different spellings with the same meaning.

Step 4: Unique identifier

The property "ID" of the source data set is used as unique identifier and assigned to the property inventory number (P217) in Wikidata. As described in step 3, all rows of the source data have a unique identifier assigned. To ease later updates of the data set, this property is also set for all items already present with a PCP reference number (P381, German: KGS-DS-Nummer).
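The inventory number format described in the mapping of step 2 (prefix "ID" followed by five digits with leading zeros, e.g. ID08582) can be produced with a small helper. The function name is hypothetical, and rejecting values outside 0 to 99999 is an assumption made here to keep the five-digit format intact.

```python
def inventory_number(source_id):
    """Format a source ID as the string used for inventory number (P217)."""
    number = int(source_id)
    if not 0 <= number <= 99999:
        raise ValueError(f"ID does not fit the 5-digit format: {source_id!r}")
    # Prefix "ID" plus the number padded with leading zeros to five digits.
    return f"ID{number:05d}"
```

For example, `inventory_number(8582)` yields "ID08582", matching the sample value in the mapping table.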

Step 5: Mapping to existing data

This query can be used to get all cultural properties with Swiss heritage status class A or B in the city of Zurich (Query).

A mapping of the resulting list with the source data of the city of Zurich can be done using OpenRefine.
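The query referred to above is only linked, not reproduced in this report. As a stand-in, the sketch below selects items that carry a PCP reference number (P381) and are located in Zürich (Q72); it is not the original query and may return a different set of items. The function names and the `user_agent` parameter are hypothetical; the endpoint is the public Wikidata Query Service.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Stand-in query (assumption): items with a PCP reference number in Zürich.
QUERY = """
SELECT ?item ?itemLabel ?kgs ?inventory WHERE {
  ?item wdt:P381 ?kgs ;          # PCP reference number (KGS-DS-Nummer)
        wdt:P131* wd:Q72 .       # located, directly or indirectly, in Zürich
  OPTIONAL { ?item wdt:P217 ?inventory . }   # inventory number, if already set
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en" . }
}
"""


def build_url(query):
    """Build the GET URL that asks the query service for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def fetch_rows(query, user_agent):
    """Run the query and return one flat dict of values per result row."""
    request = urllib.request.Request(
        build_url(query), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]
```

The resulting rows can be exported to CSV and loaded into OpenRefine for the mapping against the source data.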

Step 6: Model the data source in Wikidata

One data set and two distributions are modeled based on this data source.

Data set

Item: historical monuments in the city of Zurich, Switzerland (Q30237745)

Property in source DCAT property name

(* mandatory, + recommended)

Representation in Wikidata Value Remarks
Titel title* label en: historical monuments in the city of Zurich, Switzerland

de: Inventar der kunst- und kulturhistorischen Schutzobjekte der Stadt Zürich, Schweiz

Beschreibung description* description en: Inventory of built heritage objects in the city of Zurich, Switzerland

de: Dataset zur Datenbank der städtischen Denkmalpflege zu den inventarisierten und geschützten Objekten in der Stadt Zürich

Datenowner publisher+ publisher (P123) Denkmalpflege, Amt für Städtebau, Hochbaudepartement, Zürich (Switzerland) (Q30322523)
Kontakt contact point+ [NEW PROPERTY] contact point TBD: New Item: Open Data Zürich (opendata@zuerich.ch) as instance of (P31) contact point (Q30322502) with qualifier subject has role (P2868) with value contact point (Q30322502) TBD: Property does not yet exist. See property proposal discussion.
Datentyp - - - Not mapped to a property, represented implicitly by structure
Datenlieferant - [NEW PROPERTY] contact point TBD: New Item: Geomatik + Vermessung Zürich, Tiefbau- und Entsorgungsdepartement as instance of (P31) contact point (Q30322502)

with qualifier subject has role (P2868) with value vendor (Q1762621)

TBD: Property does not yet exist. See property proposal discussion.
Räumliche Beziehung spatial/ geographical coverage location (P276) Zürich (Q72)
Rechtsgrundlage - laws applied (P3014) Planungs- und Baugesetz (PBG) (Q30246760)

with qualifier section, verse, or paragraph (P958) and value § 203 (StRB Nr. 635 vom 28.02.1964)

Datenqualität - - - omitted since this is only a reference to the remarks section which is unstructured
Erstmalige Veröffentlichung release date publication date (P577) 12.09.2014, 13:20
Zeitraum temporal coverage start time (P580)

end time (P582)

start time (P580): 2014-08-01

end time (P582): 2014-08-31

Aktualisierungsdatum update/modification date significant event (P793) data set modification (Q30241577)

with qualifier

point in time (P585)  2015-04-15

Version version edition number (P393) 1.0
Aktualisierungsintervall frequency publication interval (P2896) monatlich (monthly)
Bemerkungen - - - Not mapped because this is unstructured data
- dataset distribution+ dataset distribution (P2702) distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (zip) (Q30243079)

distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (kmz) (Q30243243)

Additional recommended property of DCAT not present in Source
- theme/category+ main subject (P921) cultural heritage (Q210272) Additional recommended property of DCAT not present in Source
- - instance of (P31) data set (Q1172284)

heritage register (Q15097084)

item's classification in Wikidata
- - country (P17) Switzerland (Q39)
- - applies to territorial jurisdiction (P1001) Zürich (Q72)
- - official website (P856) https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt used as reference URL for references in other statements

Distribution (ZIP)

Item: distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (zip) (Q30243079)

This item is used as source of all statements mapped in step 2 apart from coordinate location (see distribution below) and additional values ingested.

Property in source DCAT property name

(* mandatory, + recommended)

Representation in Wikidata Value Remarks
Titel title label en: distribution of historical monuments in the city of Zurich, Switzerland, June 2017

de: Distribution des Inventars der kunst- und kulturhistorischen Schutzobjekte der Stadt Zürich, Schweiz, Juni 2017

Beschreibung description+ description en: distribution as zip file of the inventory of built heritage objects in the city of Zurich, Switzerland (modified: June 2017)

de: Distribution als ZIP-Datei zur Datenbank der städtischen Denkmalpflege zu den inventarisierten und geschützten Objekten in der Stadt Zürich (zuletzt aktualisiert: Juni 2017)

[page URL] access URL URL (P2699) https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/4abb2915-6ed7-4759-8969-15061235d669 used as reference URL for references in other statements
URL download URL URL (P2699) https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/4abb2915-6ed7-4759-8969-15061235d669/download/denkmalschutzobjekt.zip with the qualifier "of" pointing to download (Q7126717)
Zuletzt aktualisiert significant event (P793) digital distribution modification (Q30243125)

with qualifier

point in time (P585) 2017-06-08

Format format+ file format (P2701) ZIP
Lizenz licence+ license (P275) Creative Commons CCZero

-> CC0 (Q6938433)

Erstellt release date publication date (P577) "vor über 1 Jahr" (more than 1 year ago) -> 2016 (as the statement was added on June 12 2017) inaccurate value
format - - ZIP same as "Format" above (listed twice on web page)
id - - 4abb2915-6ed7-4759-8969-15061235d669 Not mapped, not a part of DCAT, no additional value in the context of Wikidata
last modified - - Vor 3 Tagen (3 days ago) same as "Zuletzt aktualisiert" above (listed twice on web page)
on same domain - - 1 Not mapped, not a part of DCAT, no additional value in the context of Wikidata
package id - - denkmalschutzobjekt Not mapped, not a part of DCAT, no additional value in the context of Wikidata
resource type media type media type (P1163) file
revision id checksum checksum (P4092) with qualifier determination method (P459) and value globally unique identifier (Q254972) e7a69750-f643-459e-8fd3-a7cbfeb054e1 Remark: It's actually a GUID, not an actual checksum. This DCAT property best represents the source's property.
state status [NEW PROPERTY] active TBD: Property does not yet exist.

publication status might be used (see property proposal discussion)

url type - - upload Not mapped, not a part of DCAT, no additional value in the context of Wikidata
- - instance of (P31) digital distribution (Q269415) item's classification in Wikidata
part of (P361) historical monuments in the city of Zurich, Switzerland (Q30237745)

Distribution (KMZ)

Item: distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (kmz) (Q30243243)

This item is used as the source of the coordinate location statement mapped in step 2, since the zip distribution contains coordinates in the Swiss projection (CH1903+) only.
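Step 2 describes reordering the extracted KML coordinates from «<Longitude>,<Latitude>,<Altitude>» to «<Latitude>,<Longitude>» with an Excel formula that relies on fixed string widths. A Python sketch of the same reordering that splits on commas instead is shown below; the function name is hypothetical and the range check is an added safeguard, not part of the original procedure.

```python
def kml_to_lat_lon(kml_coordinate):
    """Turn a KML "lon,lat[,alt]" string into a "lat,lon" string."""
    parts = [part.strip() for part in kml_coordinate.split(",")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected coordinate format: {kml_coordinate!r}")
    lon_text, lat_text = parts[0], parts[1]
    lon, lat = float(lon_text), float(lat_text)
    # Catch swapped or corrupt values before they reach Wikidata.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinate out of range: {kml_coordinate!r}")
    # Reuse the original text so that no decimal places are lost.
    return f"{lat_text},{lon_text}"
```

Because the original digits are passed through unchanged, the precision of the KMZ distribution is preserved regardless of how many decimal places each value has.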

Property in source DCAT property name

(* mandatory, + recommended)

Representation in Wikidata Value Remarks
Titel title label en: distribution of historical monuments in the city of Zurich, Switzerland, June 2017

de: Distribution des Inventars der kunst- und kulturhistorischen Schutzobjekte der Stadt Zürich, Schweiz, Juni 2017

Beschreibung description+ description en: distribution as kmz file of the inventory of built heritage objects in the city of Zurich, Switzerland (modified: June 2017)

de: Distribution als KMZ-Datei zur Datenbank der städtischen Denkmalpflege zu den inventarisierten und geschützten Objekten in der Stadt Zürich (zuletzt aktualisiert: Juni 2017)

[page URL] access URL URL (P2699) https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/c92191f6-2bec-43a8-b4ea-43b3af282f6f used as reference URL for references in other statements
URL download URL URL (P2699) https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/c92191f6-2bec-43a8-b4ea-43b3af282f6f/download/denkmalschutzobjekt.kmz with the qualifier "of" pointing to download (Q7126717)
Zuletzt aktualisiert significant event (P793) digital distribution modification (Q30243125)

with qualifier

point in time (P585) 2017-06-08

Format format+ file format (P2701) KMZ
Lizenz licence+ license (P275) Creative Commons CCZero

-> CC0 (Q6938433)

Erstellt release date publication date (P577) "Vor 9 Monaten" (9 months ago) -> 2016 (as the statement was added on June 12 2017) inaccurate value
format - - KMZ same as "Format" above (listed twice on web page)
id - - c92191f6-2bec-43a8-b4ea-43b3af282f6f Not mapped, not a part of DCAT, no additional value in the context of Wikidata
last modified - - Vor 3 Tagen (3 days ago) same as "Zuletzt aktualisiert" above (listed twice on web page)
on same domain - - 1 Not mapped, not a part of DCAT, no additional value in the context of Wikidata
package id - - denkmalschutzobjekt Not mapped, not a part of DCAT, no additional value in the context of Wikidata
resource type media type media type (P1163) file
revision id checksum checksum (P4092) with qualifier determination method (P459) and value globally unique identifier (Q254972) 9d06bab5-9cd2-44d7-bc2f-8c9728c218eb Remark: It's actually a GUID, not an actual checksum. This DCAT property best represents the source's property.
state status [NEW PROPERTY] active TBD: Property does not yet exist.

publication status might be used (see property proposal discussion)

url type - - upload Not mapped, not a part of DCAT, no additional value in the context of Wikidata
- - instance of (P31) digital distribution (Q269415) item's classification in Wikidata
part of (P361) historical monuments in the city of Zurich, Switzerland (Q30237745)

Step 7: Clean up existing data on Wikidata

[Contribution needed]

This step has not been executed yet.

Document findings and cleanups done here.

Step 8: Ingest the data

[Contribution needed]

Only a few sample items have been created. The data ingest of the whole data set still needs to be done.

Sample items

Items without a significant label

Items with a non numeric inception

Items already existing

Items that are part of an existing item (part of Zürich Hauptbahnhof (Q224494))

Information about the data set and distributions above and in the corresponding items should be updated at the time of the actual ingest.
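The report does not say which tool is intended for the bulk ingest. Assuming QuickStatements (version 1, tab-separated commands) were used, the commands for one new item could be generated roughly as follows. Several things here are assumptions: the tool choice, the use of stated in (P248, written as S248) pointing to the ZIP distribution item as the reference, and the function and constant names. The sketch covers only a few of the mapped properties, and the constant statements are left without a reference.

```python
# Source value -> heritage status (P1435) item, from the mapping in step 2.
HERITAGE_STATUS = {"Ja": "Q30246026", "Nein": "Q28971394"}

# Values ingested for every item, from "Additional values ingested".
CONSTANT_STATEMENTS = [
    ("P31", "Q811979"),   # instance of: architectural structure
    ("P17", "Q39"),       # country: Switzerland
    ("P131", "Q72"),      # located in: Zürich
    ("P131", "Q11943"),   # located in: Canton of Zürich
]

SOURCE_ITEM = "Q30243079"  # ZIP distribution item used as the source


def quickstatements_for(row, label_de):
    """Build QuickStatements V1 commands that create one item."""
    reference = ["S248", SOURCE_ITEM]  # assumption: "stated in" as reference
    commands = [
        ["CREATE"],
        ["LAST", "Lde", f'"{label_de}"'],
        ["LAST", "P217", f'"ID{int(row["ID"]):05d}"'] + reference,
        ["LAST", "P1435", HERITAGE_STATUS[row["UNTERSCHUT"]]] + reference,
    ]
    commands += [["LAST", prop, value] for prop, value in CONSTANT_STATEMENTS]
    return "\n".join("\t".join(command) for command in commands)
```

Labels that contain double quotes or tab characters would need escaping before being passed in; the sketch does not handle that case.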

Step 9: Visualize the data

[Contribution needed]

The SPARQL queries from step 5 can be used again to check ingested and updated data. Check for errors.
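One way to check for errors after the ingest is to compare the inventory numbers from the source with those returned from Wikidata. The sketch below assumes both sides are available as lists of strings in the same format (e.g. "ID08582"); the function name and result keys are hypothetical.

```python
from collections import Counter


def compare_inventory_numbers(source_numbers, wikidata_numbers):
    """Report numbers that are missing, unexpected or duplicated in Wikidata."""
    counts = Counter(wikidata_numbers)
    in_wikidata = set(counts)
    in_source = set(source_numbers)
    return {
        # In the source but not found on any Wikidata item.
        "missing": sorted(in_source - in_wikidata),
        # On a Wikidata item but not present in the source.
        "unexpected": sorted(in_wikidata - in_source),
        # Used on more than one Wikidata item.
        "duplicated": sorted(n for n, count in counts.items() if count > 1),
    }
```

An empty result in all three lists indicates that every source row corresponds to exactly one item; it does not verify the other statements on those items.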

Step 10: Case Report

See above.