Wikidata:Dataset Imports/Previous version discussions
Estonian businesses (test)
- Name of dataset: A subset of Estonian businesses (test)
- Source: Register OÜ
- Link: https://courses.cs.ut.ee/2017/Ontoloogiadisain/spring/uploads/Main/organizations-12052017.xml
- Description: A subset of Estonian businesses (for testing linked data import before import of full dataset)
- Request by:
Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
- Name of dataset: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
- Source: Philippine Statistics Authority
- Link: https://archive.org/download/PhilippinesCensusofPopulationLGUs19032007 (As Philippines public domain FOI request)
- Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
- Request by: --Exec8 (talk) 04:51, 28 January 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
Source: Philippine Statistics Authority Link: Web.Archive.org upload (As Philippines public domain FOI request) Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007) |
Link: Web.Archive.org upload (As Philippines public domain FOI request)
Done: To do: - Notes: - |
Structure: Population (P1082)
Example item: Dasol (Q41917), Urdaneta (Q43168), Pangasinan (Q13871), Ilocos Region (Q12933) Done: To do: - |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: - Notes: |
Date complete:
Notes: |
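The structure above maps census figures to population (P1082). As a minimal sketch, the rows for one item could be generated as tab-separated QuickStatements (v1) lines with point in time (P585) as qualifier; the census figures below are invented placeholders, not real values from the dataset:

```python
# Sketch: build tab-separated QuickStatements (v1) rows adding
# population (P1082) claims with a point in time (P585) qualifier.
# The census figures below are invented placeholders.
def population_rows(qid, censuses):
    rows = []
    for year, population in sorted(censuses.items()):
        rows.append("\t".join([
            qid, "P1082", str(population),
            "P585", "+%d-00-00T00:00:00Z/9" % year,  # /9 = year precision
        ]))
    return rows

# Dasol (Q41917) with made-up figures, for illustration only
for row in population_rows("Q41917", {1903: 8000, 2007: 25000}):
    print(row)
```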
Discussion:
World Heritage Sites
- Name of dataset: World Heritage sites
- Source: UNESCO World Heritage Centre
- Link: http://whc.unesco.org/en/list
- Description: A database of the World Heritage sites
- Request by: John Cummings (talk) 14:43, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: World Heritage sites
Source: UNESCO World Heritage Centre Link: http://whc.unesco.org/en/list Description: A database of the World Heritage sites |
Link: here
Done: All To do: - Notes: - |
Structure: World Heritage Site ID (P757), World Heritage criteria (2005) (P2614), heritage status (P1435) = World Heritage Site (with start time as qualifier)
Example item: Q4176 Done: All To do: - |
Done: All
To do: Notes: |
Done:
To do: Inception (P571): remaining items (dates can be found in the site descriptions on the World Heritage website) Notes: |
Done: All except construction date (inception)
To do: - Notes: |
Date complete:
Notes: |
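The structure described above can be sketched as QuickStatements rows. This assumes Q9259 is the World Heritage Site item; the site ID "252" and the year 1983 are placeholders for illustration, not real values for the example item:

```python
# Sketch: QuickStatements (v1) rows following the structure above:
# World Heritage Site ID (P757), and heritage status (P1435) =
# World Heritage Site with start time (P580) as qualifier.
# Q9259 (World Heritage Site), the ID "252" and the year 1983 are
# assumptions for illustration only.
def whs_rows(qid, whs_id, inscribed_year):
    return [
        "%s\tP757\t\"%s\"" % (qid, whs_id),
        "%s\tP1435\tQ9259\tP580\t+%d-00-00T00:00:00Z/9"
        % (qid, inscribed_year),
    ]

for row in whs_rows("Q4176", "252", 1983):
    print(row)
```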
Discussion:
UNESCO list of journalists who were killed in the exercise of their profession
- Name of dataset: journalists who were killed in the exercise of their profession
- Source: UNESCO
- Link: http://www.unesco.org/new/en/communication-and-information/freedom-of-expression/press-freedom/unesco-condemns-killing-of-journalists/
- Description: Yearly lists of journalists killed in the exercise of their profession, collated by UNESCO
- Request by: John Cummings (talk) 15:20, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: journalists who were killed in the exercise of their profession
Source: UNESCO Description: Yearly lists of journalists killed in the exercise of their profession, collated by UNESCO |
Link: here
Done: Import data To do: manual work on job and employer columns Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
I don't understand how to include the official condemnation of the killing by UNESCO and the responses by the governments --John Cummings (talk) 15:44, 6 December 2016 (UTC)
UNESCO Atlas of the World's Languages in danger
- Name of dataset: UNESCO Atlas of the World's Languages in danger
- Source: UNESCO
- Link: http://www.unesco.org/languages-atlas/
- Description: A database of the world's endangered languages
- Request by: John Cummings (talk) 15:58, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Atlas of the World's Languages in danger
Source: UNESCO Link: http://www.unesco.org/languages-atlas/ Description: A database of the world's endangered languages |
Link: here
Done: All To do: - Notes: |
Structure:
Example item: Done: To do: |
Done: All
To do: Notes: |
Done:
To do: Matching in Mix n' Match Notes: |
Done: Imported into Mix n' Match
To do: Notes: |
Date complete:
Notes: |
Discussion:
UNESCO Art Collection
- Name of dataset: UNESCO Art Collection
- Source: UNESCO
- Link: http://www.unesco.org/artcollection/jsps/welcome.jsp
- Description: A catalogue of art held by UNESCO
- Request by: John Cummings (talk) 16:37, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: here
Done: Imported data on all the artworks To do: Add links to the individual pages of the artworks Notes: Not available as a structured database, database created by hand |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
UNESCO Memory of the World Programme
- Name of dataset: UNESCO Memory of the World Programme
- Source: UNESCO
- Link: http://www.unesco.org/new/en/communication-and-information/flagship-project-activities/memory-of-the-world/homepage/
- Description: An international initiative launched to safeguard the documentary heritage of humanity
- Request by: John Cummings (talk) 16:55, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Memory of the World Programme
Source: UNESCO Description: An international initiative launched to safeguard the documentary heritage of humanity |
Link: here
Done: All To do: - Notes: |
Structure:
Example item: Done: To do: |
Done: All
To do: Notes: |
Done: Mix n' Match
To do: Notes: |
Done: Mix n' Match
To do: Next steps Notes: |
Date complete:
Notes: |
Discussion:
UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
- Name of dataset: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
- Source: UNESCO
- Link: http://www.unesco.org/culture/ich/en/lists
- Description: The UNESCO international register of Intangible Cultural Heritage
- Request by: John Cummings (talk) 17:20, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
Source: UNESCO Link: http://www.unesco.org/culture/ich/en/lists Description: The UNESCO international register of Intangible Cultural Heritage |
Link: here
Done: All To do: Notes: |
Structure:
Example item: Done: To do: |
Done: All
To do: Notes: |
Done: Imported into Mix n' Match
To do: Match on Mix n' Match Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
European Red List of Habitat
- Name of dataset: European Red List of Habitat
- Source: European Commission
- Link: http://forum.eionet.europa.eu/european-red-list-habitats/library/project-deliverables-data/database/raw-database-13_1_17
- Description: From the website: "The first ever European Red List of Habitats reviews the current status of all natural and semi-natural terrestrial, freshwater and marine habitats and highlights the pressures they face. Using a modified version of the IUCN Red List of Ecosystems categories and criteria, it covers the EU28, plus Iceland, Norway, Switzerland and the Balkan countries and their neighbouring seas. Over 230 terrestrial and freshwater habitats were assessed.
The European Red List of Habitats provides an entirely new and all embracing tool to review commitments for environmental protection and restoration within the EU2020 Biodiversity Strategy. In addition to the assessment of threat, a unique set of information underlies the Red List for every habitat: from a full description to distribution maps, images, links to other classification systems, details of occurrence and trends in each country and lists of threats with information on restoration potential. All of this is publicly available in PDF and database format (see links below), so the Red List can be used for a wide range of analysis. The Red List complements the data collected on Annex I habitat types through Article 17 reporting as it covers a much wider set of habitats than those legally protected under the Habitats Directive."
- Request by: GoEThe (talk) 12:04, 23 February 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: European Red List of Habitat
Source: European Commission Link: [1] Description: Current status of all natural and semi-natural terrestrial, freshwater and marine habitats in Europe. |
Link: [2]
Done: All data imported to spreadsheet To do: Check coding in sheet "European Red List of Habitats", formatting of names with diacritics. Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
DB Netz Betriebsstellenverzeichnis
- Name of dataset: DB Netz Betriebsstellenverzeichnis (Open-Data-Portal)
- Source: DB Netz AG (infrastructure department of Germany’s national railway company)
- Link: https://data.deutschebahn.com/dataset/data-betriebsstellen (the latest one, currently from 2017-01)
- Description:
- Abk: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Import to station code (P296).
- Name: The full name. Import to official name (P1448).
- Kurzname: Name variant abbreviated to fit within 16 characters. Import to short name (P1813).
- Typ: Type of location. Import to instance of (P31). I’m suggesting to restrict the import to Bf (Bahnhof (station) (Q27996466)), Hp (Haltepunkt (train stop) (Q27996460)), Abzw (junction (Q27996464)), Üst (Überleitstelle (Q27996463)), Anst (Anschlussstelle (Q27996461)), Awanst (Ausweichanschlussstelle (Q27996462)) and Bk (Q27996465) (including combinations of those like „Hp Anst“, but not the variants like „NE-Hp“) for now.
- Betr-Zust: Whether the location is only planned or no longer exists. I’m suggesting not to automatically import anything with a value here.
- Primary Location Code: The code from TSI-TAP/TSI-TAF. Import to station code (P296).
- UIC: Which country the location is in. I’m suggesting to restrict the import to Germany (80) for now.
- RB: Which regional section of DB Netz is responsible for this location. I’m suggesting not to automatically import those which don’t have a value after the other suggested filterings. Or in other words: don’t import those without a value here, but ignore the value otherwise.
- gültig von: Literally translates to „valid from“, but honestly I don’t know which date exactly this refers to. Anyway: Not relevant, or maybe don’t import those newer than 2017-01-01.
- gültig bis: Literally translates to „valid until“, same as before just whatever end. Not relevant.
- Netz-Key: Add zeroes on the left until it’s six digits long, prepend the UIC country code and import to UIC station code (P722).
- Fpl-rel: Whether this can be ordered as part of a train path. Not relevant.
- Fpl-Gr: Whether the infrastructure manager (for the Germans around: that’s the EIU) responsible for creating the train’s timetable may change here. Not relevant.
- Note about my usage of „P296“ in the description section above: It’s not really clear to me how P296 is supposed to be used. Maybe a new property or whatever would be better. So read this as „P296 or new property“. Note that there are already Items with those codes in P296, which would need to be changed to whatever representation is chosen.
- Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)
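The suggested filters and the Netz-Key transformation above can be sketched as follows. Column names follow the description; the sample row is invented for illustration:

```python
# Sketch of the transformations suggested above for the DB Netz
# Betriebsstellenverzeichnis. Column names follow the description
# above; the sample row is invented for illustration.
ALLOWED_TYPES = {"Bf", "Hp", "Abzw", "Üst", "Anst", "Awanst", "Bk"}

def importable(row):
    """Apply the suggested filters: only the listed "Typ" values
    (combinations like "Hp Anst" allowed, variants like "NE-Hp" not),
    no "Betr-Zust" value (planned/closed locations), Germany only
    (UIC 80), and a non-empty "RB" value."""
    parts = row["Typ"].split()
    return (bool(parts)
            and all(p in ALLOWED_TYPES for p in parts)
            and not row.get("Betr-Zust")
            and row.get("UIC") == "80"
            and bool(row.get("RB")))

def uic_station_code(row):
    """Build the UIC station code (P722) value: left-pad the Netz-Key
    with zeroes to six digits and prepend the UIC country code."""
    return row["UIC"] + row["Netz-Key"].zfill(6)

sample = {"Typ": "Hp Anst", "Betr-Zust": "", "UIC": "80",
          "RB": "Südost", "Netz-Key": "8011"}
print(importable(sample), uic_station_code(sample))  # True 80008011
```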
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
Protected Planet dataset for Germany
- Name of dataset: Protected Planet dataset for Germany
- Source: Protected Planet
- Link: https://www.protectedplanet.net/country/DE
- Description: A dataset of IUCN protected areas in Germany
- Request by: John Cummings (talk) 12:44, 1 April 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
List of Museums of São Paulo/Brazil
- Name of dataset: List of Museums of São Paulo/Brazil
- Source: Secretaria do Estado da Cultura de São Paulo (Sao Paulo State Secretariat of Culture)
- Link: http://estadodacultura.sp.gov.br/busca/##(global:(enabled:(space:!t),filterEntity:space,map:(center:(lat:-22.61401087437028,lng:-49.2022705078125),zoom:7)),space:(filters:(type:!(%2761%27,%2760%27))))
- Description: List of 485 museums of São Paulo, the Brazilian state with the largest number of museums
- Request by: Juliana Monteiro
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Debates of the Constituent Assembly of Jura
- Name of dataset: Debates of the Constituent Assembly of Jura
- Source: Jura cantonal archives
- Link: http://www.jura.ch/DFCS/OCC/ArCJ/Projets/Archives-cantonales-jurassiennes-Projets.html
- Description: Sound collection of the plenary sessions of the Constituent Assembly of Jura
- Request by: Marcolurati (talk) 14:22, 3 April 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Debates of the Constituent Assembly of Jura
Source: Jura cantonal archives Link: http://www.jura.ch/DFCS/OCC/ArCJ/Projets/Archives-cantonales-jurassiennes-Projets.html Description: Sound collection of the plenary sessions of the Constituent Assembly of the canton Jura in Switzerland |
Link: https://docs.google.com/spreadsheets/d/1dqt8hwk9Wp8o5n9i4umoLX-uorW3q7YSpmOpd1FeRD4/edit?usp=sharing
Done: To do: Notes: The Wikimedia Commons page with the sound tracks already exists |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
For the Workshop Wiki SUPSI - Chapter 2 we are looking into how to add this database to Wikidata. The database was provided by Ilario Valdelli of Wikimedia Switzerland to act as a case study for the viability of adding the metadata of Wikimedia content (in this specific case an audio recordings collection).
We will document the process in order to provide a real example for archives and institutions in Switzerland, to encourage them to use Wikidata as a database too.
As this is the first time we are uploading to Wikidata, we would like to have the chance to discuss and find the best way to import these data and define the properties for the audio contents.
Ethnologue's EGIDS language status
- Name of dataset: Ethnologue's EGIDS language status
- Source: Ethnologue.com
- Link: https://www.ethnologue.com/browse/codes
- Description:
- Request by: Beeyan (talk) 03:43, 6 April 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: EGIDS language status
Source: Ethnologue Link: https://www.ethnologue.com/browse/codes Description: Import the "Language Status" value from each language's page |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
What is the difference between these three-letter codes and the ISO 639-3 codes also maintained by Ethnologue? (As far as I am aware they are the same.) Thanks, GerardM (talk) 17:29, 27 June 2017 (UTC)
- @GerardM: it's the same --Beeyan (talk) 08:23, 9 August 2017 (UTC)
DB SuS and RNI Stationsdaten
- Name of dataset: Stationsdaten DB Station&Service und DB RegioNetz Infrastruktur (Open-Data-Portal)
- Source: DB Station&Service AG (passenger train station department of Germany’s national railway company) and DB RegioNetz Infrastruktur GmbH (infrastructure department of a regionally oriented subsection of Germany’s national railway company)
- Link: https://data.deutschebahn.com/dataset/data-stationsdaten and https://data.deutschebahn.com/dataset/data-stationsdaten-regio (the latest one respectively, currently from 2016-07 and from 2016-01)
- Description:
- Bundesland: Which federal state the station is in. Import to located in the administrative territorial entity (P131), if there isn’t a more specific value already (see also row 9 „Ort“).
- BM: (DB SuS) Which station management (subregions of the regional areas; yes, Berlin central station has its own station management) is responsible for the station. Not sure how it should be imported. Probably the same as „RB“ in the import from DB Netz above.
- Regionalbereich: (DB RNI) Which regional section operates the station. Not sure how it should be imported.
- Bf. Nr.: Station number in DB’s own system. Import to station code (P296).
- Station: The full name. Import to official name (P1448).
- Bf DS 100 Abk.: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Be careful about importing, as one passenger station may map to multiple operational stations (Famous example: Passenger station 1071 is all of BL, BLS, BHBF and BHBT).
- Kat. Vst / Kategorie Vst: Category of the passenger station. Import to instance of (P31) with the appropriate subitem of DB InfraGO station category (Q550637).
- Straße: Postal address. Ignore.
- PLZ: Postal code. Ignore.
- Ort: Which city the station is in. I’m not sure how accurate this is, but it seems good enough to import to located in the administrative territorial entity (P131) where there isn’t such a statement already.
- Aufgabenträger: Which authority (transportation authority (Q29471795)) is mainly responsible for ordering the regional passenger transport services. Not sure if it should be imported.
- (the following three rows in the RNI table can be ignored)
- Note about my usage of „P296“ in the description section above: See #DB Netz Betriebsstellenverzeichnis.
- Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)
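A hedged sketch of the per-column mapping suggested above: "Bf. Nr." to station code (P296), "Station" to official name (P1448), and "Ort" (falling back to "Bundesland") to located in the administrative territorial entity (P131) only where the item has no P131 statement yet. Values are left as raw strings, since matching place names to items is a separate step; the sample row is invented for illustration:

```python
# Sketch of the per-column mapping suggested above for the
# Stationsdaten. Returns (property, raw value) pairs; resolving
# place names to Wikidata items is left as a separate step.
def station_claims(row, has_p131):
    claims = [
        ("P296", row["Bf. Nr."]),    # station number -> station code
        ("P1448", row["Station"]),   # full name -> official name
    ]
    if not has_p131:
        # Prefer the city ("Ort") over the federal state ("Bundesland"),
        # and only add P131 where the item has none yet
        claims.append(("P131", row.get("Ort") or row["Bundesland"]))
    return claims

sample = {"Bf. Nr.": "1071", "Station": "Berlin Hbf",
          "Ort": "Berlin", "Bundesland": "Berlin"}
print(station_claims(sample, has_p131=False))
```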
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
Berliner Malweiber
- Name of dataset: Berliner Malweiber
- Source: Stiftung Stadtmuseum Berlin
- Link: https://www.stadtmuseum.de/ausstellungen/berlin-stadt-der-frauen
- Description: Metadata produced by the digitisation project Berliner Malweiber undertaken by the Stadtmuseum Berlin, related to its collection of portraits by female artists displayed in the exhibition Berlin – Stadt der Frauen (March–August 2016).
- Request by: Hgkuper (talk) 12:14, 4 May 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Berliner Malweiber
Source: Stiftung Stadtmuseum Berlin Link: https://www.stadtmuseum.de/ausstellungen/berlin-stadt-der-frauen Description: Metadata relating to the museum's digitisation project Berliner Malweiber, involving works by female artists displayed by the museum in its exhibition Berlin – Stadt der Frauen (March–August 2016). |
Link: here
Done: Initial import of data into spreadsheet; metadata complemented with GND IDs where available. |
Structure: link |
Done |
Done |
Done |
Date complete: 2019-09-18
Notes: Will tweak (enrich) selected entries where possible, but the ingest of the fundamental metadata is complete. |
Discussion
The data will be imported by User:Hgkuper in preparation for the digiS workshop A gentle introduction to WIKIDATA.
UNESCO Atlas of World Languages in Danger
- Name of dataset: UNESCO Atlas of World Languages in Danger
- Source: UNESCO
- Link: http://www.unesco.org/languages-atlas/index.php
- Description: UNESCO’s Atlas of the World’s Languages in Danger is intended to raise awareness about language endangerment; it provides information on numbers of speakers, relevant policies and projects, sources, ISO codes and geographic coordinates.
- Request by: John Cummings (talk) 06:59, 10 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done: Added to Mix n' Match
To do: Notes: |
Done:
To do: Notes: There are several languages currently existing in Wikidata that the AWLD has more detail on, having a specific entry for each dialect. Once the data has been imported, a query should be run to find the items matched to multiple AWLD entries; these should be separated out into separate items for each dialect, each linking back to the non-dialect item. |
Date complete:
Notes: |
Discussion
JPL Small-Body Database
- Name of dataset: JPL Small-Body Database (SBDB)
- Source: JPL Small-Body Database
- Link: https://www.jpl.nasa.gov/
- Description: A database about astronomical objects. It is maintained by Jet Propulsion Laboratory (JPL) and NASA and provides data for all known asteroids and several comets, including orbital parameters and diagrams, physical diagrams, and lists of publications related to the small body. The database is updated on a daily basis.
- Request by: Noobius2 (talk) 20:47, 16 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Gatehouse Gazetteer (Wales)
Imported into new dataset imports here
- Name of dataset: Gatehouse Gazetteer (Wales)
- Source: http://www.gatehouse-gazetteer.info/download.html
- Link: http://www.gatehouse-gazetteer.info/download.html
- Description: Database of castle sites in Wales (including sites which are uncertain)
- Request by: Richard Nevell (WMUK) (talk) 12:30, 19 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: https://docs.google.com/spreadsheets/d/1o-fZ7HbMieFEJ6vHT61Kp9e97Ix7hTl4utI8Jl0AXos/edit#gid=132680949
Done: Import into Mix n' Match, matched in Mix n' Match To do: Quickstatements Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done: Added to Mix n' Match
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
@Richard Nevell (WMUK): I see that you have created multiple items for castles that have no statements and no sources. I have looked at two items and they seem to be duplicates of already existing items Berry's Wood (Q17647484) and Hen Domen (Q5712905). Duplicate items produce additional workload and the general expectation in Wikidata is that contributors make a reasonable effort to avoid duplicate items when mass creating new items.
I created a property proposal for an ID to reference this website at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gatehouse_Gazetteer_place_ID . Importing via mix-and-match is likely a better idea.
There's also the question of whether data besides the name/description can be imported. ChristianKl (talk) 15:44, 27 June 2017 (UTC)
- Hello @ChristianKl:. The two Berry's Wood enclosures are in different countries (England and Wales) while Hen Domen, Llansantffraid Deuddwr is a different site to the Hen Domen near Montgomery. How many items have you merged so far? The items have been created ahead of matching with Mix'n'match (catalogue here); information on county, country, coordinates, and instance would be imported. Richard Nevell (WMUK) (talk) 16:14, 27 June 2017 (UTC)
- Only the two. Hen Domen, Llansantffraid Deuddwr is located in the historic county of Montgomeryshire (according to http://www.gatehouse-gazetteer.info/Welshsites/664.html). What makes you think that it isn't near Montgomery? ChristianKl (talk) 16:24, 27 June 2017 (UTC)
- @ChristianKl: The distance between those two sites is 20 km as the crow flies. I realise that's not clear from what I added to Wikidata as there were no coordinates on the new item. Both sites are in the historic county of Montgomeryshire, but it does cover something like 2,000 km².
I've been using Mix'n'match's game mode to match Wikidata items to the catalogue. The only options for entries without a match are 'new item' (which is what I've been using) and 'N/A'. Have I been using the wrong option? I understand that having Wikidata items without statements isn't particularly helpful, but it is meant to be only temporary. Richard Nevell (WMUK) (talk) 11:35, 29 June 2017 (UTC)
- Hi all, creating new items without statements is just how Mix n' Match works. The statements will be added to the items once the matching is complete; there are around 300 matches still to go. Thanks, --John Cummings (talk) 11:59, 29 June 2017 (UTC)
@ChristianKl: Is it ok to resume using Mix'n'match to create items? I reckon I could get the rest done by the end of the week so we'll be ready to import statements. Richard Nevell (WMUK) (talk) 13:15, 5 July 2017 (UTC)
- Given that @ValterVB: is the person who's at the moment deleting the items, it might be better to have his opinion. Given that I created the property proposal, I can't create the property for the identifier. If another admin or property creator creates it, it would help a lot with making clear that the items are notable. ChristianKl (talk) 13:25, 5 July 2017 (UTC)
- Which item? Without an example it isn't easy to say; it was probably an item without sources and with only a label and description. --ValterVB (talk) 20:05, 5 July 2017 (UTC)
- ValterVB: Mix'n'match will create items with only a description, but more statements are then added. Is it OK to resume the matching process? I could try to go quicker so items don't stay empty for long. Richard Nevell (WMUK) (talk) 14:18, 6 July 2017 (UTC)
- Do you have an example to show which statements you add? And after how long do you add the other statements? --ValterVB (talk) 17:43, 6 July 2017 (UTC)
- @ValterVB: Q30758975 is an item you deleted (twice) that is in the property proposal ChristianKl mentions up above. ArthurPSmith (talk) 18:06, 6 July 2017 (UTC)
- @ValterVB: I think it's bad that you delete items (and especially redelete them when undeleted) without knowing what you delete. Engaging with Richard Nevell (WMUK) makes much more sense than blindly deleting his items. ChristianKl (talk) 18:13, 6 July 2017 (UTC)
- I know what I delete: an item without statement, wihout sitelink without back link, no notable, it's in our guideline, If I found this kind of item I delete them withou doubt. --ValterVB (talk) 19:44, 6 July 2017 (UTC)
- @ValterVB: Items that are linked from a property proposal dicussion fulfill a structural need and are thus notable. Aside from that those castles are clearly identifiable entities that can be described with public sources and thus also notable under 3. ChristianKl (talk) 23:34, 6 July 2017 (UTC)
- @ValterVB: I understand that you deleted them because they had no statements, and that's what the policy says. But if I am allowed to match the rest of the set through Mix'n'match I intend to added statements to each item (including instance and location). Would you be happy letting me try that before deleting them? Richard Nevell (WMUK) (talk) 17:44, 7 July 2017 (UTC)
- @Richard Nevell (WMUK): Yesterday I asked "And After how long you add the others statements?", @ChristianKl: If you add "public sources" that clearly identify the item we don't delete the item, nobody can force someone else to look for sources. If a user create an item can do a little effort and add the sources. The items are judged for the state they are in, not for potential that they can have. --ValterVB (talk) 19:30, 7 July 2017 (UTC)
- @ValterVB: The criterion is whether there are public sources that can be used to describe the item, not whether the item is already described by public sources. ChristianKl (talk) 19:35, 7 July 2017 (UTC)
- OK, add a link to a public source in the item so we can check it and possibly not delete it. --ValterVB (talk) 20:07, 7 July 2017 (UTC)
- ValterVB Does six days sound reasonable? Richard Nevell (WMUK) (talk) 12:32, 8 July 2017 (UTC)
- 6 days? Why so much time? There is no technical reason to wait one week. For me 48 hours is the maximum acceptable. --ValterVB (talk) 13:15, 8 July 2017 (UTC)
- Addendum: If you have a list with sources I can do it with my bot: creation and addition of sources in one edit. --ValterVB (talk) 13:18, 8 July 2017 (UTC)
- I imagine it could be done reasonably quickly by someone who is well versed with the process, however I am still learning the ropes. Six days should be enough for me to complete the matching, get help with quick statements, and get the information imported while also accommodating other calls on my time. Richard Nevell (WMUK) (talk) 15:55, 8 July 2017 (UTC)
- In 6 days you can lose the item because someone has changed things, you could win the lottery and forget Wikidata, and the process is very dangerous. If you use QuickStatements it is safer and more correct to add the reference right after creating the item using the "LAST" command. --ValterVB (talk) 20:29, 10 July 2017 (UTC)
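For reference, the pattern described here, creating an item and immediately attaching a sourced statement with the LAST command, looks like this in QuickStatements (columns are tab-separated; the label, the class Q23413 (castle) and the reference URL are placeholder examples, not values from the actual import):

```
CREATE
LAST	Len	"Example Castle"
LAST	P31	Q23413	S854	"https://example.org/record/123"
```

The item is thus never left without a sourced statement, even if the batch is interrupted partway through.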
- Which item? Without an example it isn't easy; probably it was an item without a source and with only a label and description. --ValterVB (talk) 20:05, 5 July 2017 (UTC)
ValterVB, Richard Nevell (WMUK) I think there is a misunderstanding: creating empty items is how Mix n' Match works. If you delete empty items created by Mix n' Match you are breaking the data import process for the catalogues. There are currently 100s of catalogues being imported using this tool, some of which take several months to go through to match correctly to existing data. If the policy is incompatible with one of the main data import methods for Wikidata then I suggest we have a larger problem... --John Cummings (talk) 15:12, 30 August 2017 (UTC)
- Then we have a big problem. --ValterVB (talk) 19:18, 30 August 2017 (UTC)
- If the intention is to add further statements to an item, what is the harm? Richard Nevell (WMUK) (talk) 10:30, 31 August 2017 (UTC)
- Creating items and leaving them blank for a while is a problem, mainly because in the meantime someone might try to match the same concept to Wikidata and not detect your item (because it is blank and therefore hard to find). So they might create their own, and we end up with duplicates. − Pintoch (talk) 12:21, 31 August 2017 (UTC)
- Granted, it's possible but with a suitably short time period the likelihood of this is small. Richard Nevell (WMUK) (talk) 14:42, 31 August 2017 (UTC)
- The problem is: given how Mix and Match currently works, this time period can be rather long. Also, it is totally possible for someone to dump a dataset in Mix'n'Match, start matching some of it, and get bored at some point: in this case, the created items will remain empty forever… − Pintoch (talk) 14:00, 7 September 2017 (UTC)
Hi ValterVB and Pintoch, can I suggest we start a discussion on the main project chat to discuss this possible incompatibility between the main import tool and Wikidata policy? There are 10s of Mix n' Match catalogues being processed at the moment, so it does not seem realistic or practical to stop using the tool whilst this is discussed. Thanks, --John Cummings (talk) 15:13, 4 September 2017 (UTC)
- Totally! By the way, I am also working on an alternative to Mix'n'Match: OpenRefine. − Pintoch (talk) 14:00, 7 September 2017 (UTC)
- @ValterVB: please can you undelete all the items created by @Richard Nevell (WMUK): and myself asap? We are trying to import data into all the items but it's breaking because you deleted the items. We can populate the items quickly after you undelete them. Thanks, --John Cummings (talk) 14:06, 7 September 2017 (UTC)
- ValterVB, don't worry about undeleting these items. I'm going to recreate them now, along with some basic statements. Best NavinoEvans (talk) 10:17, 8 September 2017 (UTC)
Protected Planet Sites in Niger
- Name of dataset: PPSNE
- Source: protectedplanet.net
- Link: https://www.protectedplanet.net
- Description: Protected Planet Sites for Niger
- Request by: Battleofalma (talk) 12:32, 19 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: https://docs.google.com/spreadsheets/d/1tlqH0TggjqYL-nv2VWKsSYLRK9IQr4gC3jJtvnxwfBw/edit#gid=998871309
Done: 25% To do: 75% Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Mix N Match Catalogue: https://tools.wmflabs.org/mix-n-match/#/catalog/483
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Localisation and Information about all Fountains in the City of Zurich
- Name of dataset: Brunnen der Stadt Zürich (Fountains in the City of Zurich)
- Source: Open-Data-Catalog of the City of Zürich
- Link: https://data.stadt-zuerich.ch/dataset/brunnen
- Description: This geodataset shows the locations of the ~1280 fountains which are maintained by the Water Supply Department of the City of Zurich (Wasserversorgung Stadt Zürich). The geodataset contains interesting attributes like the historical year of construction, the description of the fountain, the kind of water it contains or what kind of fountain it is. The dataset is under the CC0 license and can be used freely.
- Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Brunnen
Source: Open-Data-Catalog City of Zurich Link: https://data.stadt-zuerich.ch/dataset/brunnen Description: This Geodataset - available as GeoJSON, Geopackage, KML, Shapefile, Web Map Service and Web Feature Service - shows the locations of the ~1280 fountains which are maintained by the Water Supply Department of the City of Zurich (Wasserversorgung Stadt Zürich). The Geo-Dataset contains interesting attributes like the historical year of construction, the description of the fountain, the kind of water it contains or what kind of fountain it is. |
Link: https://github.com/opendata-zurich/wikidata/blob/master/fountains/20170918_brunnen_zuerich.xls
Done: Manually converted from GeoJSON to Excel by the author. To do: Notes: The conversion from GeoJSON to spreadsheet is not automated yet |
Structure:
Example item: Done: Properties accepted, data imported. Link:
|
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
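The workflow notes above say the GeoJSON-to-spreadsheet conversion is not yet automated. A minimal sketch of one way to automate it in Python follows; the property names (`baujahr`, `brunnenart`) and the sample feature are made-up placeholders, not taken from the actual dataset:

```python
import csv
import json

def geojson_to_rows(geojson_text, property_names):
    """Flatten GeoJSON point features into flat rows: one row per
    feature, with the requested properties plus lon/lat columns."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection.get("features", []):
        props = feature.get("properties", {})
        lon, lat = feature["geometry"]["coordinates"][:2]
        row = {name: props.get(name, "") for name in property_names}
        row["lon"], row["lat"] = lon, lat
        rows.append(row)
    return rows

def write_csv(path, rows, columns):
    """Write the flattened rows to a CSV file ready for a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical single-feature collection mirroring the attributes the
# description mentions (year of construction, kind of fountain):
example = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [8.5417, 47.3769]},
        "properties": {"baujahr": 1870, "brunnenart": "Trinkbrunnen"},
    }],
})
rows = geojson_to_rows(example, ["baujahr", "brunnenart"])
```

`write_csv` would then produce the .csv; the real attribute names would need to be checked against the GeoJSON download.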
Population by Stadtquartier since 1970 in the City of Zurich
- Name of dataset: Population by Stadtquartier in the City of Zurich since 1970
- Source: Open-Data-Catalog of the City of Zurich
- Link: https://data.stadt-zuerich.ch/dataset/bev-bestand-jahr-quartier-seit1970
- Description: This dataset contains the population of the City of Zurich since 1970, per statistical Stadtquartier (~district). Data owner: Statistik Stadt Zürich
- Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Bevölkerung nach Stadtquartier, seit 1970 (Resident population per District since 1970)
Source: Data owner: Statistik Stadt Zürich. Source of this file: Open-Data-Catalog of the City of Zurich Link: https://data.stadt-zuerich.ch/dataset/bev-bestand-jahr-quartier-seit1970 Description: Attributes: [Ereignisjahr (technical: StichtagDatJahr): time stamp of when the population number is representative, usually 31.12.YEAR] [Stadtquartier (Sort) (technical: QuarSort): official ID of the district, called «Statistisches Stadtquartier» (integer)] [Stadtquartier (lang) (technical: QuarLang): official name of the district, called «Statistisches Stadtquartier» (string)] [Wirtschaftliche Bevölkerung (technical: AnzBestWir): size of the resident population (wirtschaftlich anwesende Personen) (integer)] The population number follows the definition of «resident population»[3], which is different from the «permanent resident population»[4]; the Federal Statistical Office publishes data for the latter. |
Link: https://github.com/opendata-zurich/wikidata/blob/master/population_quartiere_since1970/bev324od3240.xlsx
Done: CSV2Excel by author. To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Public Art on public ground in the City of Zurich
- Name of dataset: Kunst im Stadtraum (KiS) / Public Art on public ground in the City of Zurich
- Source: Open-Data-Catalog of the City of Zürich
- Link: https://data.stadt-zuerich.ch/dataset/kunst-im-stadtraum
- Description: This dataset is a collection of public art objects which are in the possession of the City of Zurich and stand on public ground. The information in this dataset comes from the responsible departments «Kunst im öffentlichen Raum» and «Kunst und Bau». It contains basic information about these objects and the artists who created them. All objects are georeferenced as well.
- Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Kunst im Stadtraum
Source: Open-Data-Catalog City of Zurich. Link: https://data.stadt-zuerich.ch/dataset/kunst-im-stadtraum Description:
|
Link: https://github.com/opendata-zurich/wikidata/blob/master/public_art/kunstimstadtraum.xlsx
Done: Marco Sieber To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
UNESCO Honorary and Goodwill Ambassadors
- Name of dataset: UNESCO Honorary and Goodwill Ambassadors
- Source: UNESCO
- Link: http://www.unesco.org/new/en/goodwill-ambassadors/
- Description: A list of UNESCO Honorary and Goodwill Ambassadors
- Request by: John Cummings (talk) 09:45, 2 October 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Honorary and Goodwill Ambassadors
Source: UNESCO Link: http://www.unesco.org/new/en/goodwill-ambassadors/ Description: A list of UNESCO Honorary and Goodwill Ambassadors |
Link: https://docs.google.com/spreadsheets/d/1mZCj9ZYGxrzex-9IlEtHlFrXldU97cwrChz5Ym1GSfo/edit?usp=sharing
Done: Import data To do: extract URLs Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Northern Ireland Sites and Monuments Record
- Name of dataset: Northern Ireland Sites and Monuments Record
- Source: Northern Ireland Government
- Link: https://www.opendatani.gov.uk/dataset/sites-and-monuments-record
- Description: National built heritage register for Northern Ireland
- Request by: John Cummings (talk) 21:01, 25 October 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: https://docs.google.com/spreadsheets/d/1QVlCe6qDJVPNjVHuk-rAaoZwQ5UKZf0xD2n5M8121zU/edit?usp=sharing
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Maliku Lama, Kanamit, Maliku, Pulang Pisau, Kalimantan Tengah, Indonesia
- Name of dataset: Dusun Maliku Lama
- Source:
- Link:
- Description:
- Request by: RapiNazwa (talk) 22:06, 3 November 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017)
- Name of dataset: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017) (all politically independent municipalities with selected characteristics as of 30 September 2017, 3rd quarter 2017)
- Source: Statistisches Bundesamt
- Link: https://www.destatis.de/DE/ZahlenFakten/LaenderRegionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/GVAuszugQ/AuszugGV3QAktuell.html;jsessionid=F39E8370DC8C1DA3F40804D32989D763.InternetLive1
- Description: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (all politically independent municipalities with selected characteristics as of 30 September 2017)
- Request by: Aloi baf (talk) 10:43, 30 November 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017)
Source: Statistisches Bundesamt Description: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 |
Link: https://docs.google.com/spreadsheets/d/1HP_4IV-VHWtll0YGjKpUaThPYeXsq9u9vuILE36NViA/edit?usp=sharing
Done: all To do: Notes: Formatted .xlsx to an easily readable .csv. Separated some columns. Reformatted numbers to the English decimal delimiter '.'. Added 'Name' and 'Titel' columns. |
Structure: Population (P1082), coordinate location (P625), postal code (P281), area (P2046)
Example item: Rahden (Q182979) Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
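One of the spreadsheet preparation steps noted in the table above is reformatting German-style numbers to the English decimal delimiter '.'. A minimal sketch of that conversion (the sample values are invented, not taken from the Destatis file):

```python
def de_to_en_number(value):
    """Convert a German-formatted number string (e.g. '1.234,5')
    to English formatting ('1234.5'): drop the thousands separators,
    then turn the decimal comma into a dot."""
    return value.replace(".", "").replace(",", ".")

converted = [de_to_en_number(v) for v in ["1.234,5", "12,75", "1.000.000"]]
```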
CBDB
- Name of dataset: CBDB
- Source: https://projects.iq.harvard.edu/cbdb/home
- Link: https://projects.iq.harvard.edu/cbdb/home
- Description: SQLite dump is available.
- Request by: Fantasticfears (talk) 12:45, 13 December 2017 (UTC)
Workflow
Phase 1:
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: CBDB
Source: SQLite dump Description: Large amount of linked information |
Link:
Done: Done To do: Notes: Already structured. |
Structure:
Example item: Q720 Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done: Import most male in the database.
To do: More claims and link to other entities. Notes: |
Date complete:
Notes: |
Discussion
Why did you execute this import without discussion? One million edits and 300,000 new items without useful content is not very constructive. Also, adding this data with an edit rate of 500 to 800 is harmful to the servers. It's also quite a violation of our bot policy. Sjoerd de Bruin (talk) 18:16, 18 December 2017 (UTC)
- I usually don't get a response, and yes, I didn't start such a discussion, which I should have. I'm aware of your concerns now. As for content: for a complicated dataset like this, it's fairly hard to build a working one-time pywikibot script without an existing item, so I decided to import all of the items first. I planned to import more linked information based on that. But it seems that the community doesn't allow such behaviour at all, and I don't know where to take it from here. Since it has an index from CBDB, it certainly meets the notability criteria. As for server load, I didn't know that in the beginning. I thought that PAWS would allow a faster import, but it was down, so I simply went for quick action. Anyway, it seems that I've made a mistake here. What should I do next to correct it? The data is still well worth importing, and hopefully we can both learn something. --Fantasticfears (talk) 19:44, 18 December 2017 (UTC)
- A subgroup to experiment with would have been better. What will be added to the items in the next round? I hope date of birth and death, gender and profession at least? We do matching of Chinese artists to Wikidata and this data makes it impossible to verify the right person. We have added https://www.wikidata.org/wiki/Q46346812 but we see 李琦 Li Qi many times in Wikidata without any info. Another question: traditional Chinese is used as the language for the label and Chinese is not filled in. Is that correct? --Hannolans (talk) 13:53, 21 December 2017 (UTC)
- Sorry about the late reply. I agree that a subgroup is better, though I've already imported quite a lot. As for Li Qi, that's because there are several different people named Li Qi. CBDB has a lot of occupations in its database and some of them are not listed in Wikidata; in that case, the occupation doesn't get imported. --Fantasticfears (talk) 12:10, 18 January 2018 (UTC)
- There are two issues:
- Was our bot policy violated?
- Should we keep the data?
- As far as (1) goes, technically it wasn't, and the core problem is that our bot policy doesn't speak about QuickStatements. To the extent that we want it to cover QuickStatements, we should likely amend it.
- As far as (2) goes, I'm in favor of having the data, given that it's data from a high-quality source. When data about birth/death/floruit is added, disambiguation will be easier, and in addition the source dataset has family relations that are useful to have. ChristianKl ❪✉❫ 09:21, 22 December 2017 (UTC)
- @ChristianKl: I appreciate that. Since every item has a link to CBDB, it's not hard to import more claims. I'd like to take it at a slower pace and start to make a proposal for the bot. --Fantasticfears (talk) 12:10, 18 January 2018 (UTC)
- @Fantasticfears: ok, now what? What are you going to do with these items? Please explain why you think these items are notable. That's not clear from your proposal. Multichill (talk) 21:32, 22 December 2017 (UTC)
- I think the next step should be a proper bot proposal that explains what you want to do with the existing items. ChristianKl ❪✉❫ 14:01, 24 December 2017 (UTC)
- I noticed that the bot created many items that are duplicates of existing items (e.g. Wang Yangming (Q45417762), Li Wenzhong (Q45484249), Hu Dahai (Q45485882)) when the existing items didn't have CBDB ID (P497). I manually merged dozens of them, and I guess there may be thousands more. I think after basic information is added from the database, we should use a bot to merge the duplicates based on some criteria (perhaps two items with the same name + date of birth + date of death could be determined to be duplicates and should be merged). --Stevenliuyi (talk) 23:22, 24 December 2017 (UTC)
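A duplicate-candidate query along these lines might look like the following SPARQL sketch; it assumes dates of birth (P569) and death (P570) have already been imported, and pairs CBDB items with same-labelled, same-dated items lacking CBDB ID (P497):

```sparql
SELECT ?cbdb ?other WHERE {
  ?cbdb wdt:P497 [] ;        # item created from CBDB (has CBDB ID)
        rdfs:label ?name ;
        wdt:P569 ?born ;
        wdt:P570 ?died .
  ?other rdfs:label ?name ;  # same label, same dates...
         wdt:P569 ?born ;
         wdt:P570 ?died .
  FILTER NOT EXISTS { ?other wdt:P497 [] }  # ...but no CBDB ID
  FILTER (?cbdb != ?other)
}
```

The resulting pairs would still need review before merging, since homonyms with identical dates are possible.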
- I guess the next step would be to add as much info about each person as possible. I am looking at Du Youlan (Q45728367) linked to [5]. We should get Chinese name, sex or gender (P21), country of citizenship (P27) and 種族部族 ethnic group (P172) (?). "Algorithmically generated index year" might be equivalent to floruit (P1317) with sourcing circumstances (P1480) = circa (Q5727902). I do not know if some of the people in the database have dates of birth and death or occupations. They would be useful for matching and detecting duplicates. Also family relations would be handy. --Jarekt (talk) 18:32, 27 December 2017 (UTC)
- @Multichill: Per the notability criteria, it meets 2) it refers to an instance of a clearly identifiable conceptual or material entity, and 3) it fulfills some structural need. CBDB is a collection of all the notable people involved in Chinese history; some of them may not be famous enough to meet Wikipedia's notability criteria, but they were still involved with notable people, and CBDB has relation data about them. Nevertheless, my code had some faults and added some less notable people to Wikidata. I would also like to remove those.
- @Stevenliuyi: Neither Wikidata nor pywikibot prevents duplicate claims. Do we have another bot to remove them?
- @Jarekt: I'll include that in the proposal.
--Fantasticfears (talk) 12:10, 18 January 2018 (UTC)
- Given that all the data is well-sourced and describes clearly identifiable conceptual or material entities, I think that they should all have a place in Wikidata, but the forum for that discussion should be a bot request. ChristianKl ❪✉❫ 13:35, 18 January 2018 (UTC)
- @Fantasticfears: you haven't edited this site for over a month. You just created a lot of nearly empty items on Wikidata of which some (or a lot) are duplicates. Please explain how you plan to expand these items in 2018 and contribute in a meaningful way to this project. Multichill (talk) 18:38, 18 January 2018 (UTC)
- @Multichill: Sorry for the late response. That said, it doesn't make sense to focus on my personal life, mistakes and commitment to Wikidata; improvements should be the point from here on. I've mentioned this import to the CBDB authors. Here is the code for the last import. Since most people in CBDB are imported and a bot is required, I'll continue this thread in a proposal. --Fantasticfears (talk) 12:28, 24 January 2018 (UTC)
FAMCL
- Name of dataset: FAMCL
- Source: Frye Art Museum
- Link: http://fryemuseum.org/collection_list/
- Description: Frye Art Museum Collection List: Catalog of works owned by the Frye Art Museum
- Request by: Peaceray (talk) 20:36, 24 January 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
I am completely new to this, so please let me know about anything else that I need to do or any faux pas I might make. Peaceray (talk) 20:43, 24 January 2018 (UTC)
Directory of Open Access Journals
Moved to Wikidata:Dataset_Imports/Directory_of_Open_Access_Journals
Car models
- Name of dataset: car_dataset
- Source:
- Link: http://ai.stanford.edu/~jkrause/cars/car_dataset.html
- Description: dataset used for automobile model recognition
- Started by:
--- Jura 20:12, 7 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
- Listing this here mainly because I don't plan to import this myself. Licensing tbd. Maybe just the list of models should be imported.
--- Jura 20:12, 7 March 2018 (UTC)
UNESCO field offices
- Name of dataset: All UNESCO field offices
- Source: UNESCO
- Link: http://www.unesco.org/new/bfc/all-offices/
- Description: A link of all UNESCO field offices by region
- Started by: John Cummings (talk) 10:23, 9 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
New York Times Obituaries
- Name of dataset: New York Times Obituaries
- Source: New York Times
- Link: https://www.nytimes.com/section/obituaries
- Description: There are 439,000 obituaries in the New York Times database; I don't know how they could be exported.
- Started by: John Cummings (talk) 12:36, 9 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Video Game Companies
Moved to Wikidata:Dataset Imports/Video Game Companies Jean-Fred (talk) 10:17, 30 May 2018 (UTC)
Anagraphical data of Italian schools
- Name of dataset: Anagraphical data of Italian schools
- Source: Italian Ministry of Education, Universities and Research
- Link: http://dati.istruzione.it/opendata/opendata/catalogo/elements1/?area=Scuole
- Description: A dataset of Italian school data
- Licensing: data are licensed under the Italian Open Data License v2.0. I know it's not CC0; could it be suitable anyway?
- Started by: Floatingpurr (talk) 11:34, 20 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Anagraphical data of Italian schools
Source: Italian Ministry of Education, Universities and Research Link: http://dati.istruzione.it/opendata/opendata/catalogo/elements1/?area=Scuole Description: A dataset of Italian school data
|
Link: spreadsheet here
Done: Merged the 4 datasets of the source link. Added references to school records already present in Wikidata. Added references to Wikidata locations. To do: Notes: A lot of schools seem to be slightly different flavors of the same school; this happens because the same institute may offer different educational paths. |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Just updated the spreadsheet. I'm open to suggestions about how to import the data. Floatingpurr (talk) 15:07, 17 April 2018 (UTC)
PubMed Central Articles
- Name of dataset: PubMed Central Articles
- Source: PubMed Central (PMC)
- Link: https://www.ncbi.nlm.nih.gov/pmc/
- Description: More than 80% of PMC articles are currently on Wikidata. This is an effort to add the missing ones.
- Started by: Mahdimoqri (talk) 04:34, 23 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: PMC
Source: PMC Link: https://www.ncbi.nlm.nih.gov/pmc/ Description: 4.7 million free full-text biomedical and life sciences articles from NIH/NLM |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
PubMed Central Journals
- Name of dataset: PubMed Central Journals
- Source: National Center for Biotechnology Information (NCBI)
- Link: https://www.ncbi.nlm.nih.gov/pmc/journals/#csvfile
- Description: Currently, more than 100 PMC journals are either missing from Wikidata or missing an NLM Unique ID (P:P1055). This is an effort to add/complete them.
- Started by: Mahdimoqri (talk) 19:09, 23 March 2018 (UTC)
Workflow edit
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: PubMed Central Journals
Source: National Center for Biotechnology Information (NCBI) Link: https://www.ncbi.nlm.nih.gov/pmc/journals/ Description: |
Link: missing or incomplete items
Done: To do: Find which ones are missing and which ones are only missing P:P1055 Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion edit
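The journals missing only P1055 could be listed with a query on the Wikidata Query Service. A sketch of one such query, built as a string; the exact query (using Q5633421 for "scientific journal") is an assumption about the approach, not a statement of how this import was done.

```python
# SPARQL one might run on query.wikidata.org to list scientific journals
# (Q5633421) that lack an NLM Unique ID (P1055). Hypothetical approach.
query = """
SELECT ?journal ?journalLabel WHERE {
  ?journal wdt:P31 wd:Q5633421 .
  FILTER NOT EXISTS { ?journal wdt:P1055 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
print(query)
```

Items returned here would only need a P1055 statement, while journals absent from the result set entirely may need to be created first.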
RNSR (Répertoire National des Structures de Recherche) edit
- Name of dataset: RNSR (Répertoire National des Structures de Recherche)
- Source: Ministère de l'Enseignement Supérieur et de la Recherche
- Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/
- Description: This dataset presents public research structures, active or inactive, referenced in the national register of research structures (RNSR).
- Started by: OdileB (talk) 08:31, 27 March 2018 (UTC)
Workflow edit
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: RNSR (Répertoire National des Structures de Recherche)
Source: Ministère de l'Enseignement Supérieur et de la Recherche Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/ Description: Identification data + some information on 6542 French research labs or structures |
Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/download/?format=csv&use_labels_for_header=true
Done: Done. To do: Notes: Additionally, I have checked the websites. |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done: For a first try, I matched 67 labs or research centers with both QID and website.
To do: Notes: |
Done: 694 new French national research structure identifiers added.
To do: Double-check the remaining labs in the RNSR data, then create them in Wikidata. Notes: I plan to import this myself. |
Date complete: 12th April 2018
Notes: |
Discussion edit
I intend to use QuickStatements. Can anybody explain the use of the "source property"? OdileB (talk) 07:54, 29 March 2018 (UTC)
- @OdileB: I am not a user of QuickStatements, but concerning sources, you can have a look at Help:Sources. Snipre (talk) 09:08, 13 April 2018 (UTC)
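For the "source property" question: in QuickStatements v1, references are appended to a statement line as S-prefixed properties (S854 for reference URL P854, S813 for retrieved date P813). A sketch that builds one such tab-separated line; the item (the Wikidata sandbox Q4115189), value, and URL are placeholders, not data from this import.

```python
# Assemble one QuickStatements v1 line: item, property, value, then
# reference columns with the property number prefixed by S instead of P.
item, prop, value = "Q4115189", "P1082", "12345"  # placeholder triple
ref_url = "https://data.enseignementsup-recherche.gouv.fr/"  # placeholder
retrieved = "+2018-04-12T00:00:00Z/11"  # QS date format, day precision

# S854 = reference URL (P854), S813 = retrieved (P813); URLs are quoted.
line = "\t".join([item, prop, value, "S854", f'"{ref_url}"', "S813", retrieved])
print(line)
```

Pasting such lines into QuickStatements adds the statement together with its reference in one step.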
Crossref Journals (moved) edit
GNIS Domestic (moved) edit
Moved to Wikidata:Dataset Imports/Geographic Names Information System (GNIS) Domestic