Wikidata:WikiProject Datasets/reports/Ingesting data about historical monuments in the city of Zurich
Introduction to the case report
Background
This report describes the process of ingesting data about historical monuments in the city of Zurich, Switzerland.
Besides ingesting the data into Wikidata, the process is used to validate the following three documents/guidelines and to gain further insights:
- Procedure described in A practical beginners user-guideline for ingesting datasets into Wikidata
- Mapping data to DCAT as described in DCAT - Wikidata - Schema.org mapping
- Recommendations on datasets that serve as sources for data ingestion in Wikidata
Data Sources
Two data sources are used in this case study:
- Data from the City of Zurich, Monuments Protection Service (main data source for the import, Link dataset)
- Data from the Swiss Inventory of Cultural Property of National and Regional Significance (KGS) Link dataset. Some items from this data source had already been imported into Wikidata before the ingest described in this report.
Procedure
The procedure follows the nine steps introduced in Ingesting Swiss heritage institutions and further illustrated (and extended by a 10th step) in A practical beginners user-guideline for ingesting datasets into Wikidata.
Field notes
Step 1: Goals, data structure and data quality
Data structure diagram
The diagram below shows the structure of the data source about historical monuments in the city of Zurich. Only the distributions used for the ingest are shown with all their properties and the data record structure. Only the properties shown in bold are used for the ingest.
Data quality assessment
The following table shows the defined baseline (the minimal quality needed to ingest the data set into Wikidata) and the results of the data quality assessment itself. The assessment table is based on Cai and Zhu (2015); the criteria used are based on Rohweder et al. (2015).
information quality category | information quality dimension | indicators | baseline | baseline explanation | assessment |
---|---|---|---|---|---|
A) accessibility data quality | 1) accessibility | The information can be accessed in an easy and direct manner. | x | Data cannot be imported into Wikidata if it is not accessible. | x |
A) accessibility data quality | 2) ease of manipulation | The information can easily be manipulated and used for different purposes. | | Data should be easy to manipulate; however, it is sufficient if the data is usable in the context of cultural heritage. | x |
A) accessibility data quality | 3) compatible license* | The license under which the data has been published is compatible with Wikidata's CC0 license. | x | Compulsory for data ingestion into Wikidata. | x |
B) intrinsic data quality | 1) reputation | Information enjoys a high reputation if the source of the data, the transport media and the processing system have a high reputation of trustworthiness and competence. | x | The organization responsible for collecting the data must have a high reputation. | x |
B) intrinsic data quality | 2) free of error | Information is free of error if it is consistent with reality. | | Some errors can be corrected during the import; remaining errors have no serious consequences. | |
B) intrinsic data quality | 3) objectivity | Information is objective if it is strictly factual and non-judgemental. | x | Information on cultural heritage must be objective. | x |
B) intrinsic data quality | 4) believability | Information is believable if certificates demonstrate a high quality standard or the information is acquired and disseminated with high effort. | x | Information must be accurately collected; no certificates are needed. | x |
B) intrinsic data quality | 5) verifiability* | The origin of the data is clear. Statements on items that result from the data can be covered with meaningful references. | x | The origin of the data must be clear so that statements in Wikidata can be covered with meaningful references. | x |
C) representational data quality | 1) understandability | Information is directly understood by the users and can be used for their purposes. | x | Information must be directly understandable in the context of cultural heritage. | x |
C) representational data quality | 2) concise representation | Exactly the required information is presented in a suitable and easily graspable format. | | Additional information may be present; the data can be cleansed before it is ingested. | x |
C) representational data quality | 3) consistent representation | The information is represented in the same way throughout. | x | Necessary for efficient automated ingestion. | x |
C) representational data quality | 4) interpretability | Information is described in the same, professionally correct manner. The part of the data intended for the ingestion is structured data. | x | No unstructured data may be ingested into Wikidata. | x |
D) contextual data quality | 1) timeliness | Information represents the actual properties of the described object in a timely manner. | | "Timely" is difficult to define in the context of cultural heritage and depends on the actual properties; timeliness is therefore not part of the baseline. | (x) |
D) contextual data quality | 2) value-added | The use of the information can lead to a quantifiable increase in a monetary target function. | | A monetary target function is not mandatory for the intended use. | |
D) contextual data quality | 3) completeness | Information is not missing and is available at the defined points in time in the respective process steps. For all mandatory properties, statements with reasonable values can be created. Labels and descriptions (terms) can be filled with meaningful and easily retrievable values. | x | A data set should be ingested into Wikidata as a complete set. Labels and descriptions should be unique or at least significant. | x |
D) contextual data quality | 4) appropriate amount of data | The amount of information available satisfies the requirements. | x | Requirements must be fulfilled; Wikidata must not be flooded with unnecessary data. | x |
D) contextual data quality | 5) relevancy | Data is relevant when it provides information necessary to the user. | x | The data must provide information that is necessary in the context of cultural heritage. | x |
D) contextual data quality | 6) notability* | The data to be imported meets Wikidata's notability requirements. | x | Every item ingested into Wikidata must meet Wikidata's notability requirements. | x |
The results of the data quality assessment show that the baseline has been reached; the data set will therefore be ingested.
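The decision rule behind the table can be stated compactly: the data set qualifies when every dimension that is part of the baseline is also marked as fulfilled ("x") in the assessment. The following is a minimal Python sketch of that rule, assuming the table is transcribed by hand into a dictionary; the names `ASSESSMENT`, `unmet_baseline` and `baseline_reached` are hypothetical and not part of any tool used in this report.

```python
# Hand transcription of the quality table: dimension -> (part of baseline?, assessment mark).
# Marks: "x" = fulfilled, "(x)" = partly fulfilled, "" = not fulfilled / not assessed.
ASSESSMENT = {
    "accessibility": (True, "x"),
    "ease of manipulation": (False, "x"),
    "compatible license": (True, "x"),
    "reputation": (True, "x"),
    "free of error": (False, ""),
    "objectivity": (True, "x"),
    "believability": (True, "x"),
    "verifiability": (True, "x"),
    "understandability": (True, "x"),
    "concise representation": (False, "x"),
    "consistent representation": (True, "x"),
    "interpretability": (True, "x"),
    "timeliness": (False, "(x)"),
    "value-added": (False, ""),
    "completeness": (True, "x"),
    "appropriate amount of data": (True, "x"),
    "relevancy": (True, "x"),
    "notability": (True, "x"),
}


def unmet_baseline(assessment):
    """Return the baseline dimensions whose assessment is not a full "x"."""
    return [dim for dim, (in_baseline, mark) in assessment.items()
            if in_baseline and mark != "x"]


def baseline_reached(assessment):
    """The data set may be ingested only if no baseline dimension is unmet."""
    return not unmet_baseline(assessment)


print(baseline_reached(ASSESSMENT))  # True
```

Dimensions outside the baseline (such as timeliness or value-added) do not influence the result, which mirrors the conclusion drawn above.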
Step 2: Mapping
Data record
In step 2, the mapping was created. For the mapping to Schema.org, the type LandmarksOrHistoricalBuildings was used.
Property in datafile | Property in Schema.org | Property in Wikidata | Refers to class in Wikidata | Value range / Example values | Remarks |
---|---|---|---|---|---|
ID (technically: ID) | http://schema.org/identifier | inventory number (P217) | Wikidata property for cultural heritage identification | ID08582 | The prefix "ID" indicates which field of the source is used and allows the numerical value to be ingested into a property of data type string. The ID is used in a 5-digit format (with leading zeros) to match the property's format requirements. |
Objektbezeichnung (technically: OBJEKTBEZE) | http://schema.org/name | label | multilingual text (label) | Rule 1, ID 5798: Rote Fabrik; Rule 2, ID 880: Sprüngli-Haus; Rule 3, ID 238: Denkmalschutzobjekt, Talstrasse 82 (Zürich, Schweiz); Rule 4, ID 1111: Denkmalschutzobjekt (Zürich, Schweiz); Rule 5, ID 2: Mehrfamilienhaus, Baurstrasse 40 (Zürich, Schweiz) | To ensure that items are ingested with a unique (or at least significant) label, the following rules are applied: |
NaehereBezeichnung (technically: NAEHEREBEZ) | http://schema.org/description | description | multilingual text (description) | Verwaltungsgebäude | Since items without a description are difficult to identify in Wikidata search, the following rules are applied: |
Inventarkategorie (technically: INVENTARKA) | http://schema.org/additionalProperty | curator (P1640) | designation for an administrative territorial entity (Q15617994) | value "kommunal": municipality (Q15284); value "kantonal": canton (Q2311958); value "regional": administrative region (Q3455524) | source value range: "kommunal", "kantonal", "regional" |
Schutzstatus (technically: UNTERSCHUT) | http://schema.org/additionalProperty | heritage status (P1435) | historic monument in Switzerland (Q3323397) | value "Nein" (no) -> only listed in the inventory: Swiss cultural property of cantonal or local significance (Q28971394); value "Ja" (yes) -> under heritage protection: Swiss cultural property under the protection of cultural heritage (Q30246026) | source value range: "Nein", "Ja" |
Adresse (technically: ADRESSE) | http://schema.org/address | located at street address (P969) | address (Q319608) | Seestrasse 395, 8038 Zürich | If there are multiple addresses stated in the source, multiple statements are created. For property P969 the entire address is needed including ZIP and city. Therefore these two have to be mapped (e.g. using this directory) |
Baujahr (technically: BAUJAHR) | http://schema.org/additionalProperty | inception (P571) | year (Q577) | 1896 | |
Stadtkreis (technically: STADTKREIS) | http://schema.org/additionalProperty | located in the administrative territorial entity (P131) | administrative territorial entity (Q56061) | District 2 (Q456153) | |
Vermessungsbezirk (technisch: VERMBEZIRK) | http://schema.org/additionalProperty | located in the administrative territorial entity (P131) | administrative territorial entity (Q56061) | Wollishofen (Q642353) | only if different from the item used for Stadtkreis |
coordinates (extracted from the KML file, mapped with FID (different from ID!) and id5) | http://schema.org/geo | coordinate location (P625) | geographic coordinate system (Q22664) | 27°59'17"N, 86°55'31"E | The coordinates in the DBF shape file of the original data source use a Swiss-specific format (CH1903+). They therefore have to be taken from the KMZ file, which uses the international WGS 84 format. They can be extracted from the KML file, for instance with the tool KMLCSV Converter. Afterwards, the format has to be changed from «<Longitude>,<Latitude>,<Altitude>» to «<Latitude>,<Longitude>» (e.g. in Excel with this formula: " =CONCATENATE(LEFT(RIGHT(B2;19);17);",";LEFT(B2;17)) "; note that this formula assumes both coordinate values are exactly 17 characters long and the altitude is a single character). |
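The value mappings in the table above can be expressed as lookup tables. The following is a minimal Python sketch, not the tooling used for this ingest: it assumes a source row is available as a dictionary keyed by the technical field names (`ID`, `INVENTARKA`, `UNTERSCHUT`, `BAUJAHR`), and the function names are hypothetical. The QIDs are the ones listed in the mapping table. The label and description rules are not reproduced in this report and are therefore left out, as is completing addresses with ZIP code and city.

```python
# Lookup tables taken from the mapping table above.
CURATOR_BY_CATEGORY = {          # INVENTARKA -> curator (P1640)
    "kommunal": "Q15284",        # municipality
    "kantonal": "Q2311958",      # canton
    "regional": "Q3455524",      # administrative region
}
HERITAGE_STATUS = {              # UNTERSCHUT -> heritage status (P1435)
    "Nein": "Q28971394",         # only listed in the inventory
    "Ja": "Q30246026",           # under heritage protection
}


def format_inventory_number(raw_id):
    """Zero-pad the numeric ID to five digits and prefix it with "ID" (P217 is a string)."""
    return f"ID{int(raw_id):05d}"


def map_row(row):
    """Map one source row to a dict of Wikidata property -> value (hypothetical helper)."""
    statements = {
        "P217": format_inventory_number(row["ID"]),
        "P1640": CURATOR_BY_CATEGORY[row["INVENTARKA"]],
        "P1435": HERITAGE_STATUS[row["UNTERSCHUT"]],
    }
    year = str(row.get("BAUJAHR", "")).strip()
    if year.isdigit():
        # Non-numeric values such as "um 1770" are left for separate handling.
        statements["P571"] = int(year)
    return statements
```

For example, `map_row({"ID": "8582", "INVENTARKA": "kommunal", "UNTERSCHUT": "Ja", "BAUJAHR": "1896"})` returns `{'P217': 'ID08582', 'P1640': 'Q15284', 'P1435': 'Q30246026', 'P571': 1896}`. An unknown category or status value raises a `KeyError`, which surfaces unexpected source values instead of silently dropping them.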
Additional values ingested
Step 3: Data formats and data cleansing
Checking data formats in OpenRefine resulted in the following findings:
- All rows have an ID.
- Different spellings are used for the same meaning:
  - ehem. vs. Ehem. vs. Ehemalige vs. ehemalige vs. Ehemaliges vs. ehemaliges
  - Schweiz. vs. Schweizerische vs. Schweizerischer
- Many rows share the same label (Objektbezeichnung) and differ only by description (NaehereBezeichnung) or even only by address (e.g. "Wohnsiedlung Auzelg")
  - see the rules for forming unique (or at least significant) labels in the mapping of step 2
- 184 non-numeric values for inception (Baujahr), e.g. "vor 1812" (before 1812) or "um 1770" (around 1770).
- No errors in the numeric values (years between 1100 and 2004)
To ease later updates, different spellings with the same meaning are not cleansed.
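The checks above were done in OpenRefine. Comparable checks can be sketched in plain Python, assuming the rows are available as dictionaries with the fields `BAUJAHR` and `OBJEKTBEZE`; `check_years` and `case_variants` are hypothetical helpers. The sketch separates numeric from non-numeric inception values, reports the range of the numeric years, and groups words that differ only in capitalisation. It catches pairs such as "Ehemalige"/"ehemalige" but not abbreviations such as "ehem.", which would need a manual synonym list.

```python
from collections import defaultdict


def check_years(rows):
    """Split BAUJAHR values into numeric years and non-numeric strings."""
    years, non_numeric = [], []
    for row in rows:
        value = str(row["BAUJAHR"]).strip()
        if value.isdigit():
            years.append(int(value))
        else:
            non_numeric.append(value)  # e.g. "vor 1812", "um 1770" or an empty value
    year_range = (min(years), max(years)) if years else None
    return year_range, non_numeric


def case_variants(rows, field="OBJEKTBEZE"):
    """Group words that occur with different capitalisation in the given field."""
    spellings = defaultdict(set)
    for row in rows:
        for word in row[field].split():
            spellings[word.lower()].add(word)
    return {key: sorted(forms) for key, forms in spellings.items() if len(forms) > 1}
```

Applied to the full data set, `len(check_years(rows)[1])` would give the number of non-numeric inception values.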
Step 4: Unique identifier
The property "ID" of the source data set is used as the unique identifier and assigned to the property inventory number (P217) in Wikidata. As described in step 3, every row of the source data has a unique identifier. To ease later updates of the data set, this property is also added to all existing items that already have a PCP reference number (P381, German: KGS-DS-Nummer).
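Because the whole update process relies on ID being present and unique, both conditions can be re-checked before each ingest. A minimal sketch, assuming rows are dictionaries with an `ID` field; `duplicate_ids` is a hypothetical helper:

```python
from collections import Counter


def duplicate_ids(rows):
    """Return (sorted list of IDs occurring more than once, number of rows without an ID)."""
    ids = [str(row.get("ID", "")).strip() for row in rows]
    missing = sum(1 for value in ids if not value)
    counts = Counter(value for value in ids if value)
    return sorted(value for value, n in counts.items() if n > 1), missing
```

A result of `([], 0)` confirms that ID can serve as the unique identifier for the current version of the source data.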
Step 5: Mapping to existing data
The following query retrieves all items with Swiss heritage status class A or B in the city of Zurich (Query).
The resulting list can be matched with the source data of the city of Zurich using OpenRefine.
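The linked query is not reproduced in this report. As a rough stand-in, the sketch below retrieves items that have a PCP reference number (P381) and are located, directly or through a district, in Zürich (Q72); this filter is an assumption and not necessarily identical to the class A/B query. The matching that the report does in OpenRefine is approximated by comparing normalised labels, which is also an assumption (existing labels may be phrased differently from the source). `fetch_existing_items`, `normalise` and `match_by_label` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"
# Assumed approximation of the report's query: items with P381 located (transitively) in Zürich.
QUERY = """
SELECT ?item ?itemLabel ?pcp WHERE {
  ?item wdt:P381 ?pcp ;
        wdt:P131* wd:Q72 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en". }
}
"""


def fetch_existing_items():
    """Run QUERY on the Wikidata Query Service; return (QID, label, PCP number) tuples."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "zurich-monuments-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        bindings = json.load(response)["results"]["bindings"]
    return [(b["item"]["value"].rsplit("/", 1)[-1], b["itemLabel"]["value"], b["pcp"]["value"])
            for b in bindings]


def normalise(text):
    """Case-fold and collapse whitespace so that trivial differences do not block a match."""
    return " ".join(text.casefold().split())


def match_by_label(existing, rows, field="OBJEKTBEZE"):
    """Map each existing QID to the source IDs whose label matches after normalisation."""
    by_label = {}
    for row in rows:
        by_label.setdefault(normalise(row[field]), []).append(row["ID"])
    return {qid: by_label[normalise(label)]
            for qid, label, _pcp in existing if normalise(label) in by_label}
```

Items mapped to more than one source ID (for example several rows labelled "Wohnsiedlung Auzelg") still need a manual decision.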
Step 6: Model the data source in Wikidata
One data set and two distributions are modeled based on this data source.
Data set
Item: historical monuments in the city of Zurich, Switzerland (Q30237745)
Property in source | DCAT property name (* mandatory, + recommended) | Representation in Wikidata | Value | Remarks |
---|---|---|---|---|
Titel | title* | label | en: historical monuments in the city of Zurich, Switzerland; de: Inventar der kunst- und kulturhistorischen Schutzobjekte der Stadt Zürich, Schweiz | |
Beschreibung | description* | description | en: Inventory of built heritage objects in the city of Zurich, Switzerland; de: Dataset zur Datenbank der städtischen Denkmalpflege zu den inventarisierten und geschützten Objekten in der Stadt Zürich | |
Datenowner | publisher+ | publisher (P123) | Denkmalpflege, Amt für Städtebau, Hochbaudepartement, Zürich (Switzerland) (Q30322523) | |
Kontakt | contact point+ | [NEW PROPERTY] contact point | TBD: New Item: Open Data Zürich (opendata@zuerich.ch) as instance of (P31) contact point (Q30322502) with qualifier subject has role (P2868) with value contact point (Q30322502) | TBD: Property does not yet exist. See property proposal discussion. |
Datentyp | - | - | - | Not mapped to a property, represented implicitly by structure |
Datenlieferant | - | [NEW PROPERTY] contact point | TBD: New Item: Geomatik + Vermessung Zürich, Tiefbau- und Entsorgungsdepartement as instance of (P31) contact point (Q30322502) with qualifier subject has role (P2868) with value vendor (Q1762621) | TBD: Property does not yet exist. See property proposal discussion. |
Räumliche Beziehung | spatial/ geographical coverage | location (P276) | Zürich (Q72) | |
Rechtsgrundlage | - | laws applied (P3014) | Planungs- und Baugesetz (PBG) (Q30246760) with qualifier section, verse, or paragraph (P958) and value § 203 (StRB Nr. 635 vom 28.02.1964) | |
Datenqualität | - | - | - | omitted since this is only a reference to the remarks section which is unstructured |
Erstmalige Veröffentlichung | release date | publication date (P577) | 12.09.2014, 13:20 | |
Zeitraum | temporal coverage | start time (P580) | start time (P580): 2014-08-01; end time (P582): 2014-08-31 | |
Aktualisierungsdatum | update/modification date | significant event (P793) | data set modification (Q30241577) with qualifier point in time (P585) 2015-04-15 | |
Version | version | edition number (P393) | 1.0 | |
Aktualisierungsintervall | frequency | publication interval (P2896) | monatlich (monthly) | |
Bemerkungen | - | - | - | Not mapped because this is unstructured data |
- | dataset distribution+ | dataset distribution (P2702) | distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (zip) (Q30243079); distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (kmz) (Q30243243) | Additional recommended property of DCAT not present in Source |
- | theme/category+ | main subject (P921) | cultural heritage (Q210272) | Additional recommended property of DCAT not present in Source |
- | - | instance of (P31) | data set (Q1172284) | item's classification in Wikidata |
- | - | country (P17) | Switzerland (Q39) | |
- | - | applies to territorial jurisdiction (P1001) | Zürich (Q72) | |
- | - | official website (P856) | https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt | used as reference URL for references in other statements |
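If the statements in the table are added with QuickStatements (the report does not prescribe a tool, so this is an assumption), they can be generated instead of typed. The sketch below emits QuickStatements V1 lines (tab-separated item, property, value, followed by optional qualifier pairs) for a subset of the rows above; dates use day precision (`/11`) and URLs are given as quoted strings. The helper names are hypothetical, and references are omitted for brevity.

```python
DATASET = "Q30237745"  # historical monuments in the city of Zurich, Switzerland


def qs_time(iso_date):
    """Format a YYYY-MM-DD date as a QuickStatements time value with day precision."""
    return f"+{iso_date}T00:00:00Z/11"


def qs_line(*fields):
    """Join item, property, value (and optional qualifier pairs) with tabs."""
    return "\t".join(fields)


def dataset_statements():
    return [
        qs_line(DATASET, "P31", "Q1172284"),              # instance of: data set
        qs_line(DATASET, "P123", "Q30322523"),            # publisher
        qs_line(DATASET, "P276", "Q72"),                  # location: Zürich
        qs_line(DATASET, "P577", qs_time("2014-09-12")),  # publication date (12.09.2014)
        qs_line(DATASET, "P793", "Q30241577",             # significant event: data set modification
                "P585", qs_time("2015-04-15")),           #   with qualifier point in time
        qs_line(DATASET, "P2702", "Q30243079"),           # dataset distribution (zip)
        qs_line(DATASET, "P2702", "Q30243243"),           # dataset distribution (kmz)
        qs_line(DATASET, "P921", "Q210272"),              # main subject: cultural heritage
        qs_line(DATASET, "P17", "Q39"),                   # country: Switzerland
        qs_line(DATASET, "P1001", "Q72"),                 # applies to territorial jurisdiction
        qs_line(DATASET, "P856", '"https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt"'),
    ]


print("\n".join(dataset_statements()))
```

Generating the lines from one place makes it easier to apply the same update to the item when a new version of the data set is published.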
Distribution (ZIP)
Item: distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (zip) (Q30243079)
This item is used as the source for all statements mapped in step 2, except for coordinate location (see the KMZ distribution below) and the additional values ingested.
Property in source | DCAT property name (* mandatory, + recommended) | Representation in Wikidata | Value | Remarks |
---|---|---|---|---|
Titel | title | label | en: distribution of historical monuments in the city of Zurich, Switzerland, June 2017; de: Distribution des Inventars der kunst- und kulturhistorischen Schutzobjekte der Stadt Zürich, Schweiz, Juni 2017 | |
Beschreibung | description+ | description | en: distribution as zip file of the inventory of built heritage objects in the city of Zurich, Switzerland (modified: June 2017); de: Distribution als ZIP-Datei zur Datenbank der städtischen Denkmalpflege zu den inventarisierten und geschützten Objekten in der Stadt Zürich (zuletzt aktualisiert: Juni 2017) | |
[page URL] | access URL | URL (P2699) | https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/4abb2915-6ed7-4759-8969-15061235d669 | used as reference URL for references in other statements |
URL | download URL | URL (P2699) | https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/4abb2915-6ed7-4759-8969-15061235d669/download/denkmalschutzobjekt.zip with qualifier of pointing to download (Q7126717) | |
Zuletzt aktualisiert | update/modification date | significant event (P793) | digital distribution modification (Q30243125) with qualifier point in time (P585) 2017-06-08 | |
Format | format+ | file format (P2701) | ZIP | |
Lizenz | licence+ | license (P275) | Creative Commons CCZero | |
Erstellt | release date | publication date (P577) | "vor über 1 Jahr" (more than a year ago) -> 2016 (the statement was added on June 12, 2017) | inaccurate value |
format | - | - | ZIP | same as "Format" above (listed twice on web page) |
id | - | - | 4abb2915-6ed7-4759-8969-15061235d669 | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
last modified | - | - | Vor 3 Tagen (3 days ago) | same as "Zuletzt aktualisiert" above (listed twice on the web page) |
on same domain | - | - | 1 | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
package id | - | - | denkmalschutzobjekt | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
resource type | media type | media type (P1163) | file | |
revision id | checksum | checksum (P4092) with qualifier determination method (P459) and value globally unique identifier (Q254972) | e7a69750-f643-459e-8fd3-a7cbfeb054e1 | Remark: the value is actually a GUID, not a checksum; nevertheless, this DCAT property represents the source's property best. |
state | status | [NEW PROPERTY] | active | TBD: Property does not yet exist. publication status might be used (see property proposal discussion) |
url type | - | - | upload | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
- | - | instance of (P31) | digital distribution (Q269415) | item's classification in Wikidata |
- | - | part of (P361) | historical monuments in the city of Zurich, Switzerland (Q30237745) | |
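The field names on the resource page (package id, revision id, url type, on same domain, state) look like the resource metadata of a CKAN portal. Assuming the portal exposes the standard CKAN action API under `/api/3/action/resource_show` (an assumption; the report only uses the web page), the values could be read in machine-readable form, which might also provide absolute timestamps instead of relative ones such as "vor über 1 Jahr". The endpoint URL and the field names `url`, `format`, `created`, `last_modified` and `revision_id` are assumptions; `resource_metadata` and `to_wikidata` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

API = "https://data.stadt-zuerich.ch/api/3/action/resource_show"  # assumed CKAN endpoint


def resource_metadata(resource_id):
    """Fetch one resource's metadata dict through the (assumed) CKAN action API."""
    url = API + "?" + urllib.parse.urlencode({"id": resource_id})
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed for resource {resource_id}")
    return payload["result"]


def to_wikidata(resource):
    """Keep the fields that the table above maps to Wikidata properties."""
    return {
        "P2699": resource.get("url"),                   # URL (download URL)
        "P2701": resource.get("format"),                # file format
        "P577": (resource.get("created") or "")[:10],   # publication date, YYYY-MM-DD
        "P585": (resource.get("last_modified") or "")[:10],  # qualifier of P793 (modification)
        "P4092": resource.get("revision_id"),           # GUID, see remark in the table
    }
```

A call such as `to_wikidata(resource_metadata("4abb2915-6ed7-4759-8969-15061235d669"))` would then yield the values for the ZIP distribution, provided the API and fields exist as assumed.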
Distribution (KMZ)
Item: distribution of historical monuments in the city of Zurich, Switzerland, June 2017 (kmz) (Q30243243)
This item is used as the source of the coordinate location statement mapped in step 2, since the ZIP distribution contains coordinates only in the Swiss-specific projection (CH1903+).
Property in source | DCAT property name (* mandatory, + recommended) | Representation in Wikidata | Value | Remarks |
---|---|---|---|---|
Titel | title | label | en: distribution of historical monuments in the city of Zurich, Switzerland, June 2017; de: Distribution des Inventars der kunst- und kulturhistorischen Schutzobjekte der Stadt Zürich, Schweiz, Juni 2017 | |
Beschreibung | description+ | description | en: distribution as kmz file of the inventory of built heritage objects in the city of Zurich, Switzerland (modified: June 2017); de: Distribution als KMZ-Datei zur Datenbank der städtischen Denkmalpflege zu den inventarisierten und geschützten Objekten in der Stadt Zürich (zuletzt aktualisiert: Juni 2017) | |
[page URL] | access URL | URL (P2699) | https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/c92191f6-2bec-43a8-b4ea-43b3af282f6f | used as reference URL for references in other statements |
URL | download URL | URL (P2699) | https://data.stadt-zuerich.ch/dataset/denkmalschutzobjekt/resource/c92191f6-2bec-43a8-b4ea-43b3af282f6f/download/denkmalschutzobjekt.kmz with qualifier of pointing to download (Q7126717) | |
Zuletzt aktualisiert | update/modification date | significant event (P793) | digital distribution modification (Q30243125) with qualifier point in time (P585) 2017-06-08 | |
Format | format+ | file format (P2701) | KMZ | |
Lizenz | licence+ | license (P275) | Creative Commons CCZero | |
Erstellt | release date | publication date (P577) | "Vor 9 Monaten" (9 months ago) -> 2016 (the statement was added on June 12, 2017) | inaccurate value |
format | - | - | KMZ | same as "Format" above (listed twice on web page) |
id | - | - | c92191f6-2bec-43a8-b4ea-43b3af282f6f | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
last modified | - | - | Vor 3 Tagen (3 days ago) | same as "Zuletzt aktualisiert" above (listed twice on the web page) |
on same domain | - | - | 1 | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
package id | - | - | denkmalschutzobjekt | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
resource type | media type | media type (P1163) | file | |
revision id | checksum | checksum (P4092) with qualifier determination method (P459) and value globally unique identifier (Q254972) | 9d06bab5-9cd2-44d7-bc2f-8c9728c218eb | Remark: the value is actually a GUID, not a checksum; nevertheless, this DCAT property represents the source's property best. |
state | status | [NEW PROPERTY] | active | TBD: Property does not yet exist. publication status might be used (see property proposal discussion) |
url type | - | - | upload | Not mapped, not a part of DCAT, no additional value in the context of Wikidata |
- | - | instance of (P31) | digital distribution (Q269415) | item's classification in Wikidata |
- | - | part of (P361) | historical monuments in the city of Zurich, Switzerland (Q30237745) | |
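As an alternative to KMLCSV Converter and the Excel formula from step 2, the coordinates can be read from the KMZ file directly. A KMZ file is a ZIP archive containing a KML document whose `<coordinates>` elements hold `longitude,latitude[,altitude]` tuples, so the sketch below swaps them into the latitude/longitude order needed for coordinate location (P625). It assumes one point per placemark and returns the placemark `<name>` as the join key; which KML element actually carries FID/id5 in this file is not documented here, so that key is an assumption. `kml_points` and `kmz_points` are hypothetical names.

```python
import zipfile
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}


def kml_points(kml_bytes):
    """Yield (placemark name, latitude, longitude) for every point placemark."""
    root = ET.fromstring(kml_bytes)
    for placemark in root.iter(f"{{{KML_URI}}}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(".//kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # polygons and lines are skipped in this sketch
        lon, lat = coords.strip().split(",")[:2]  # KML order is lon,lat[,alt]
        yield name, float(lat), float(lon)


def kmz_points(path):
    """Read the first .kml document inside a KMZ archive and return its points."""
    with zipfile.ZipFile(path) as archive:
        kml_name = next(n for n in archive.namelist() if n.lower().endswith(".kml"))
        return list(kml_points(archive.read(kml_name)))
```

Unlike the fixed-width Excel formula, splitting on commas does not depend on the number of decimal places in the source values.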
Step 7: Clean up existing data on Wikidata
[Contribution needed]
This step has not been executed yet.
Document findings and cleanups done here.
Step 8: Ingest the data
[Contribution needed]
Only a few sample items have been created. The data ingest of the whole data set still needs to be done.
Sample items
Items without a significant label
- Mehrfamilienhaus, Dufourstrasse 117 (Zürich, Switzerland) (Q30321669)
- Mehrfamilienhaus, Lindenstrasse 33 (Zürich, Switzerland) (Q30321715)
Items with a non-numeric inception
Items already existing
Items that are part of an existing item (part of Zürich Hauptbahnhof (Q224494))
- Zürich Hauptbahnhof, Bahnhof mit Bahnhofhalle (Q30321751)
- Zürich Hauptbahnhof, Querhalle (Q30322662)
- Zürich Hauptbahnhof, Geleisehalle (Q30322678)
- Zürich Hauptbahnhof, Zentralstellwerk Hauptbahnhof (Q30322683)
- Zürich Hauptbahnhof, Alfred Escher-Brunnen (Q30322685)
- Zürich Hauptbahnhof, Teil des Hauptbahnhofs (Q30322689)
Information about the data set and distributions above and in the corresponding items should be updated at the time of the actual ingest.
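For the full ingest, the non-numeric inception values found in step 3 need a rule. The sketch below generates a QuickStatements V1 block for one new item, assuming QuickStatements is the ingest tool (the report does not name one). It references each statement with stated in (P248) the ZIP distribution item (Q30243079), which follows the role of that item described in step 6 but is an assumed choice of reference property. Values like "um 1770" become a year-precision inception with the qualifier sourcing circumstances (P1480) = circa (Q5727902); other forms such as "vor 1812" are left for manual handling. All function names are hypothetical.

```python
STATED_IN = "Q30243079"  # ZIP distribution item, used as reference for step-2 statements


def year_value(year):
    """QuickStatements time value with year precision (/9)."""
    return f"+{year}-00-00T00:00:00Z/9"


def inception_fields(baujahr):
    """Return the fields for inception (P571), or None if the value needs manual review."""
    value = baujahr.strip()
    if value.isdigit():
        return ["P571", year_value(value)]
    if value.startswith("um ") and value[3:].isdigit():  # "um 1770" = around 1770
        return ["P571", year_value(value[3:]), "P1480", "Q5727902"]  # circa
    return None  # e.g. "vor 1812"


def create_block(label_de, inventory_number, baujahr):
    """Build a CREATE block with a German label, P217 and (if possible) P571."""
    lines = ["CREATE", f'LAST\tLde\t"{label_de}"']
    statements = [["P217", f'"{inventory_number}"']]
    inception = inception_fields(baujahr)
    if inception:
        statements.append(inception)
    for fields in statements:
        lines.append("\t".join(["LAST", *fields, "S248", STATED_IN]))
    return "\n".join(lines)
```

The remaining mapped statements (curator, heritage status, address, location, coordinates) would be appended in the same way.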
Step 9: Visualize the data
[Contribution needed]
The SPARQL query from step 5 can be reused to check the ingested and updated data for errors.
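One concrete check is to compare the inventory numbers in the source with the P217 values found on Wikidata after the ingest. A minimal sketch, assuming the (QID, P217 value) pairs have already been retrieved and restricted to this data set (P217 is used by many collections, so an unrestricted query would return unrelated values) and that source IDs are formatted as in step 2; `compare_ingest` is a hypothetical name.

```python
from collections import Counter


def compare_ingest(source_ids, wikidata_pairs):
    """Compare source inventory numbers with (QID, P217 value) pairs from Wikidata."""
    source = set(source_ids)
    counts = Counter(inventory_number for _qid, inventory_number in wikidata_pairs)
    ingested = set(counts)
    return {
        "missing_on_wikidata": sorted(source - ingested),
        "unknown_in_source": sorted(ingested - source),
        "duplicated_on_wikidata": sorted(n for n, c in counts.items() if c > 1),
        "matched": len(source & ingested),
    }
```

Duplicates on Wikidata usually indicate that an existing item was not matched in step 5 and a second item was created.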
Step 10: Case Report
See above.