Wikidata:Dataset Imports/Previous version discussions
Estonian businesses (test)
- Name of dataset: A subset of Estonian businesses (test)
- Source: Register OÜ
- Link: https://courses.cs.ut.ee/2017/Ontoloogiadisain/spring/uploads/Main/organizations-12052017.xml
- Description: A subset of Estonian businesses (for testing linked data import before import of full dataset)
- Request by:
Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
- Name of dataset: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
- Source: Philippine Statistics Authority
- Link: https://archive.org/download/PhilippinesCensusofPopulationLGUs19032007 (As Philippines public domain FOI request)
- Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
- Request by: --Exec8 (talk) 04:51, 28 January 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
Source: Philippine Statistics Authority Link: Web.Archive.org upload (As Philippines public domain FOI request) Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007) |
Link: Web.Archive.org upload (As Philippines public domain FOI request)
Done: To do: - Notes: - |
Structure: Population (P1082)
Example item: Dasol (Q41917), Urdaneta (Q43168), Pangasinan (Q13871), Ilocos Region (Q12933) Done: To do: - |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: - Notes: |
Date complete:
Notes: |
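The structure above maps census figures to population (P1082). As a minimal sketch, the rows for one item could be generated as tab-separated QuickStatements (v1) lines with point in time (P585) as qualifier; the census figures below are invented placeholders, not real values from the dataset:

```python
# Sketch: build tab-separated QuickStatements (v1) rows adding
# population (P1082) claims with a point in time (P585) qualifier.
# The census figures below are invented placeholders.
def population_rows(qid, censuses):
    rows = []
    for year, population in sorted(censuses.items()):
        rows.append("\t".join([
            qid, "P1082", str(population),
            "P585", "+%d-00-00T00:00:00Z/9" % year,  # /9 = year precision
        ]))
    return rows

# Dasol (Q41917) with made-up figures, for illustration only
for row in population_rows("Q41917", {1903: 8000, 2007: 25000}):
    print(row)
```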
Discussion:
World Heritage Sites
- Name of dataset: World Heritage sites
- Source: UNESCO World Heritage Centre
- Link: http://whc.unesco.org/en/list
- Description: A database of the World Heritage sites
- Request by: John Cummings (talk) 14:43, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: World Heritage sites
Source: UNESCO World Heritage Centre Link: http://whc.unesco.org/en/list Description: A database of the World Heritage sites |
Link: here
Done: All To do: - Notes: - |
Structure: World Heritage Site ID (P757), World Heritage criteria (2005) (P2614), heritage status (P1435) = World Heritage Site (with start time as qualifier)
Example item: Q4176 Done: All To do: - |
Done: All
To do: Notes: |
Done:
To do: Inception (P571): remaining items (dates can be found in the site descriptions on the World Heritage website) Notes: |
Done: All except construction date (inception)
To do: - Notes: |
Date complete:
Notes: |
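The structure described above can be sketched as QuickStatements rows. This assumes Q9259 is the World Heritage Site item; the site ID "252" and the year 1983 are placeholders for illustration, not real values for the example item:

```python
# Sketch: QuickStatements (v1) rows following the structure above:
# World Heritage Site ID (P757), and heritage status (P1435) =
# World Heritage Site with start time (P580) as qualifier.
# Q9259 (World Heritage Site), the ID "252" and the year 1983 are
# assumptions for illustration only.
def whs_rows(qid, whs_id, inscribed_year):
    return [
        "%s\tP757\t\"%s\"" % (qid, whs_id),
        "%s\tP1435\tQ9259\tP580\t+%d-00-00T00:00:00Z/9"
        % (qid, inscribed_year),
    ]

for row in whs_rows("Q4176", "252", 1983):
    print(row)
```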
Discussion:
UNESCO list of journalists who were killed in the exercise of their profession
- Name of dataset: journalists who were killed in the exercise of their profession
- Source: UNESCO
- Link: http://www.unesco.org/new/en/communication-and-information/freedom-of-expression/press-freedom/unesco-condemns-killing-of-journalists/
- Description: Yearly lists of journalists killed in the exercise of their profession, collated by UNESCO
- Request by: John Cummings (talk) 15:20, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: journalists who were killed in the exercise of their profession
Source: UNESCO Description: Yearly lists of journalists killed in the exercise of their profession, collated by UNESCO |
Link: here
Done: Import data To do: manual work on job and employer columns Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
I don't understand how to include the official condemnation of the killing by UNESCO and the responses by the governments --John Cummings (talk) 15:44, 6 December 2016 (UTC)
UNESCO Atlas of the World's Languages in danger
- Name of dataset: UNESCO Atlas of the World's Languages in danger
- Source: UNESCO
- Link: http://www.unesco.org/languages-atlas/
- Description: A database of the world's endangered languages
- Request by: John Cummings (talk) 15:58, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Atlas of the World's Languages in danger
Source: UNESCO Link: http://www.unesco.org/languages-atlas/ Description: A database of the world's endangered languages |
Link: here
Done: All To do: - Notes: |
Structure:
Example item: Done: To do: |
Done: All
To do: Notes: |
Done:
To do: Matching in Mix n' Match Notes: |
Done: Imported into Mix n' Match
To do: Notes: |
Date complete:
Notes: |
Discussion:
UNESCO Art Collection
- Name of dataset: UNESCO Art Collection
- Source: UNESCO
- Link: http://www.unesco.org/artcollection/jsps/welcome.jsp
- Description: A catalogue of art held by UNESCO
- Request by: John Cummings (talk) 16:37, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: here
Done: Imported data on all the artworks To do: Add links to the individual pages of the artworks Notes: Not available as a structured database, database created by hand |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
UNESCO Memory of the World Programme
- Name of dataset: UNESCO Memory of the World Programme
- Source: UNESCO
- Link: http://www.unesco.org/new/en/communication-and-information/flagship-project-activities/memory-of-the-world/homepage/
- Description: An international initiative launched to safeguard the documentary heritage of humanity
- Request by: John Cummings (talk) 16:55, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Memory of the World Programme
Source: UNESCO Description: An international initiative launched to safeguard the documentary heritage of humanity |
Link: here
Done: All To do: - Notes: |
Structure:
Example item: Done: To do: |
Done: All
To do: Notes: |
Done: Mix n' Match
To do: Notes: |
Done: Mix n' Match
To do: Next steps Notes: |
Date complete:
Notes: |
Discussion:
UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
- Name of dataset: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
- Source: UNESCO
- Link: http://www.unesco.org/culture/ich/en/lists
- Description: The UNESCO international register of Intangible Cultural Heritage
- Request by: John Cummings (talk) 17:20, 6 December 2016 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices
Source: UNESCO Link: http://www.unesco.org/culture/ich/en/lists Description: The UNESCO international register of Intangible Cultural Heritage |
Link: here
Done: All To do: Notes: |
Structure:
Example item: Done: To do: |
Done: All
To do: Notes: |
Done: Imported into Mix n' Match
To do: Match on Mix n' Match Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
European Red List of Habitat
- Name of dataset: European Red List of Habitat
- Source: European Commission
- Link: http://forum.eionet.europa.eu/european-red-list-habitats/library/project-deliverables-data/database/raw-database-13_1_17
- Description: From the website: "The first ever European Red List of Habitats reviews the current status of all natural and semi-natural terrestrial, freshwater and marine habitats and highlights the pressures they face. Using a modified version of the IUCN Red List of Ecosystems categories and criteria, it covers the EU28, plus Iceland, Norway, Switzerland and the Balkan countries and their neighbouring seas. Over 230 terrestrial and freshwater habitats were assessed.
The European Red List of Habitats provides an entirely new and all embracing tool to review commitments for environmental protection and restoration within the EU2020 Biodiversity Strategy. In addition to the assessment of threat, a unique set of information underlies the Red List for every habitat: from a full description to distribution maps, images, links to other classification systems, details of occurrence and trends in each country and lists of threats with information on restoration potential. All of this is publicly available in PDF and database format (see links below), so the Red List can be used for a wide range of analysis. The Red List complements the data collected on Annex I habitat types through Article 17 reporting as it covers a much wider set of habitats than those legally protected under the Habitats Directive."
- Request by: GoEThe (talk) 12:04, 23 February 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: European Red List of Habitat
Source: European Commission Link: [1] Description: Current status of all natural and semi-natural terrestrial, freshwater and marine habitats in Europe. |
Link: [2]
Done: All data imported to spreadsheet To do: Check coding in sheet "European Red List of Habitats", formatting of names with diacritics. Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
DB Netz Betriebsstellenverzeichnis
- Name of dataset: DB Netz Betriebsstellenverzeichnis (Open-Data-Portal)
- Source: DB Netz AG (infrastructure department of Germany’s national railway company)
- Link: https://data.deutschebahn.com/dataset/data-betriebsstellen (the latest one, currently from 2017-01)
- Description:
- Abk: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Import to station code (P296).
- Name: The full name. Import to official name (P1448).
- Kurzname: Name variant abbreviated to fit within 16 characters. Import to short name (P1813).
- Typ: Type of location. Import to instance of (P31). I’m suggesting to restrict the import to Bf (Bahnhof (station) (Q27996466)), Hp (Haltepunkt (train stop) (Q27996460)), Abzw (junction (Q27996464)), Üst (Überleitstelle (Q27996463)), Anst (Anschlussstelle (Q27996461)), Awanst (Ausweichanschlussstelle (Q27996462)) and Bk (Q27996465) (including combinations of those like „Hp Anst“, but not the variants like „NE-Hp“) for now.
- Betr-Zust: Whether the location is only planned or no longer exists. I’m suggesting not to automatically import anything with a value here.
- Primary Location Code: The code from TSI-TAP/TSI-TAF. Import to station code (P296).
- UIC: Which country the location is in. I’m suggesting to restrict the import to Germany (80) for now.
- RB: Which regional section of DB Netz is responsible for this location. I’m suggesting not to automatically import those which don’t have a value after the other suggested filterings. Or in other words: don’t import those without a value here, but ignore the value otherwise.
- gültig von: Literally translates to „valid from“, but honestly I don’t know which date exactly this refers to. Anyway: Not relevant, or maybe don’t import those newer than 2017-01-01.
- gültig bis: Literally translates to „valid until“, same as before just whatever end. Not relevant.
- Netz-Key: Add zeroes on the left until it’s six digits long, prepend the UIC country code and import to UIC station code (P722).
- Fpl-rel: Whether this can be ordered as part of a train path. Not relevant.
- Fpl-Gr: Whether the infrastructure manager (for the Germans around: that’s the EIU) responsible for creating the train’s timetable may change here. Not relevant.
- Note about my usage of „P296“ in the description section above: It’s not really clear to me how P296 is supposed to be used. Maybe a new property or whatever would be better. So read this as „P296 or new property“. Note that there are already Items with those codes in P296, which would need to be changed to whatever representation is chosen.
- Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)
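The suggested filters and the Netz-Key transformation above can be sketched as follows. Column names follow the description; the sample row is invented for illustration:

```python
# Sketch of the transformations suggested above for the DB Netz
# Betriebsstellenverzeichnis. Column names follow the description
# above; the sample row is invented for illustration.
ALLOWED_TYPES = {"Bf", "Hp", "Abzw", "Üst", "Anst", "Awanst", "Bk"}

def importable(row):
    """Apply the suggested filters: only the listed "Typ" values
    (combinations like "Hp Anst" allowed, variants like "NE-Hp" not),
    no "Betr-Zust" value (planned/closed locations), Germany only
    (UIC 80), and a non-empty "RB" value."""
    parts = row["Typ"].split()
    return (bool(parts)
            and all(p in ALLOWED_TYPES for p in parts)
            and not row.get("Betr-Zust")
            and row.get("UIC") == "80"
            and bool(row.get("RB")))

def uic_station_code(row):
    """Build the UIC station code (P722) value: left-pad the Netz-Key
    with zeroes to six digits and prepend the UIC country code."""
    return row["UIC"] + row["Netz-Key"].zfill(6)

sample = {"Typ": "Hp Anst", "Betr-Zust": "", "UIC": "80",
          "RB": "Südost", "Netz-Key": "8011"}
print(importable(sample), uic_station_code(sample))  # True 80008011
```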
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
Protected Planet dataset for Germany
- Name of dataset: Protected Planet dataset for Germany
- Source: Protected Planet
- Link: https://www.protectedplanet.net/country/DE
- Description: A dataset of IUCN protected areas in Germany
- Request by: John Cummings (talk) 12:44, 1 April 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
List of Museums of São Paulo/Brazil
- Name of dataset: List of Museums of São Paulo/Brazil
- Source: Secretaria do Estado da Cultura de São Paulo (Sao Paulo State Secretariat of Culture)
- Link: http://estadodacultura.sp.gov.br/busca/##(global:(enabled:(space:!t),filterEntity:space,map:(center:(lat:-22.61401087437028,lng:-49.2022705078125),zoom:7)),space:(filters:(type:!(%2761%27,%2760%27))))
- Description: List of 485 museums of São Paulo, the Brazilian state with the largest number of museums
- Request by: Juliana Monteiro
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Debates of the Constituent Assembly of Jura
- Name of dataset: Debates of the Constituent Assembly of Jura
- Source: Jura cantonal archives
- Link: http://www.jura.ch/DFCS/OCC/ArCJ/Projets/Archives-cantonales-jurassiennes-Projets.html
- Description: Sound collection of the plenary sessions of the Constituent Assembly of Jura
- Request by: Marcolurati (talk) 14:22, 3 April 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Debates of the Constituent Assembly of Jura
Source: Jura cantonal archives Link: http://www.jura.ch/DFCS/OCC/ArCJ/Projets/Archives-cantonales-jurassiennes-Projets.html Description: Sound collection of the plenary sessions of the Constituent Assembly of the canton Jura in Switzerland |
Link: https://docs.google.com/spreadsheets/d/1dqt8hwk9Wp8o5n9i4umoLX-uorW3q7YSpmOpd1FeRD4/edit?usp=sharing
Done: To do: Notes: The Wikimedia Commons page with the sound tracks already exists |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
For the Workshop Wiki SUPSI - Chapter 2 we are looking into how to add this database to Wikidata. The database was provided by Ilario Valdelli of Wikimedia Switzerland to act as a case study for the viability of adding the metadata of Wikimedia content (in this specific case an audio recordings collection).
We will document the process in order to provide a real example for archives and institutions in Switzerland, to encourage them to use Wikidata as a database too.
As this is the first time we are uploading to Wikidata, we would like to have the chance to discuss and find the best way to import these data and define the properties for the audio contents.
Ethnologue's EGIDS language status
- Name of dataset: Ethnologue's EGIDS language status
- Source: Ethnologue.com
- Link: https://www.ethnologue.com/browse/codes
- Description:
- Request by: Beeyan (talk) 03:43, 6 April 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: EGIDS language status
Source: Ethnologue Link: https://www.ethnologue.com/browse/codes Description: Import the "Language Status" value from each language's page |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
What is the difference between these three-letter codes and the ISO 639-3 codes also maintained by Ethnologue? (As far as I am aware they are the same.) Thanks, GerardM (talk) 17:29, 27 June 2017 (UTC)
- @GerardM: it's the same --Beeyan (talk) 08:23, 9 August 2017 (UTC)
DB SuS and RNI Stationsdaten
- Name of dataset: Stationsdaten DB Station&Service und DB RegioNetz Infrastruktur (Open-Data-Portal)
- Source: DB Station&Service AG (passenger train station department of Germany’s national railway company) and DB RegioNetz Infrastruktur GmbH (infrastructure department of a regionally oriented subsection of Germany’s national railway company)
- Link: https://data.deutschebahn.com/dataset/data-stationsdaten and https://data.deutschebahn.com/dataset/data-stationsdaten-regio (the latest one respectively, currently from 2016-07 and from 2016-01)
- Description:
- Bundesland: Which federal state the station is in. Import to located in the administrative territorial entity (P131), if there isn’t a more specific value already (see also row 9 „Ort“).
- BM: (DB SuS) Which station management (subregions of the regional areas; yes, Berlin central station has its own station management) is responsible for the station. Not sure how it should be imported. Probably the same as „RB“ in the import from DB Netz above.
- Regionalbereich: (DB RNI) Which regional section operates the station. Not sure how it should be imported.
- Bf. Nr.: Station number in DB’s own system. Import to station code (P296).
- Station: The full name. Import to official name (P1448).
- Bf DS 100 Abk.: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Be careful about importing, as one passenger station may map to multiple operational stations (Famous example: Passenger station 1071 is all of BL, BLS, BHBF and BHBT).
- Kat. Vst / Kategorie Vst: Category of the passenger station. Import to instance of (P31) with the appropriate subitem of DB InfraGO station category (Q550637).
- Straße: Postal address. Ignore.
- PLZ: Postal code. Ignore.
- Ort: Which city the station is in. I’m not sure how accurate this is, but it seems good enough to import to located in the administrative territorial entity (P131) where there isn’t such a statement already.
- Aufgabenträger: Which authority (transportation authority (Q29471795)) is mainly responsible for ordering the regional passenger transport services. Not sure if it should be imported.
- (the following three rows in the RNI table can be ignored)
- Note about my usage of „P296“ in the description section above: See #DB Netz Betriebsstellenverzeichnis.
- Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)
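A hedged sketch of the per-column mapping suggested above: "Bf. Nr." to station code (P296), "Station" to official name (P1448), and "Ort" (falling back to "Bundesland") to located in the administrative territorial entity (P131) only where the item has no P131 statement yet. Values are left as raw strings, since matching place names to items is a separate step; the sample row is invented for illustration:

```python
# Sketch of the per-column mapping suggested above for the
# Stationsdaten. Returns (property, raw value) pairs; resolving
# place names to Wikidata items is left as a separate step.
def station_claims(row, has_p131):
    claims = [
        ("P296", row["Bf. Nr."]),    # station number -> station code
        ("P1448", row["Station"]),   # full name -> official name
    ]
    if not has_p131:
        # Prefer the city ("Ort") over the federal state ("Bundesland"),
        # and only add P131 where the item has none yet
        claims.append(("P131", row.get("Ort") or row["Bundesland"]))
    return claims

sample = {"Bf. Nr.": "1071", "Station": "Berlin Hbf",
          "Ort": "Berlin", "Bundesland": "Berlin"}
print(station_claims(sample, has_p131=False))
```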
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion:
Berliner Malweiber
- Name of dataset: Berliner Malweiber
- Source: Stiftung Stadtmuseum Berlin
- Link: https://www.stadtmuseum.de/ausstellungen/berlin-stadt-der-frauen
- Description: Metadata produced by the digitisation project Berliner Malweiber undertaken by the Stadtmuseum Berlin, related to its collection of portraits by female artists displayed in the exhibition Berlin – Stadt der Frauen (March–August 2016).
- Request by: Hgkuper (talk) 12:14, 4 May 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Berliner Malweiber
Source: Stiftung Stadtmuseum Berlin Link: https://www.stadtmuseum.de/ausstellungen/berlin-stadt-der-frauen Description: Metadata relating to the museum's digitisation project Berliner Malweiber, involving works by female artists displayed by the museum in its exhibition Berlin – Stadt der Frauen (March–August 2016). |
Link: here
Done: Initial import of data into spreadsheet; metadata complemented with GND IDs where available. |
Structure: link |
Done |
Done |
Done |
Date complete: 2019-09-18
Notes: Will tweak (enrich) selected entries where possible, but the ingest of the fundamental metadata is complete. |
Discussion
The data will be imported by User:Hgkuper in preparation for the digiS workshop A gentle introduction to WIKIDATA.
UNESCO Atlas of World Languages in Danger
- Name of dataset: UNESCO Atlas of World Languages in Danger
- Source: UNESCO
- Link: http://www.unesco.org/languages-atlas/index.php
- Description: UNESCO’s Atlas of the World’s Languages in Danger is intended to raise awareness about language endangerment; it provides information on numbers of speakers, relevant policies and projects, sources, ISO codes and geographic coordinates.
- Request by: John Cummings (talk) 06:59, 10 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done: Added to Mix n' Match
To do: Notes: |
Done:
To do: Notes: There are several languages currently existing in Wikidata that the AWLD has more detail on, having a specific entry for each dialect. Once the data has been imported, a query should be run to find the items matched to multiple AWLD entries; these should be separated out into separate items for each dialect, each linking back to the non-dialect item. |
Date complete:
Notes: |
Discussion
JPL Small-Body Database
- Name of dataset: JPL Small-Body Database (SBDB)
- Source: JPL Small-Body Database
- Link: https://www.jpl.nasa.gov/
- Description: A database about astronomical objects. It is maintained by Jet Propulsion Laboratory (JPL) and NASA and provides data for all known asteroids and several comets, including orbital parameters and diagrams, physical diagrams, and lists of publications related to the small body. The database is updated on a daily basis.
- Request by: Noobius2 (talk) 20:47, 16 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Gatehouse Gazetteer (Wales)
Imported into new dataset imports here
- Name of dataset: Gatehouse Gazetteer (Wales)
- Source: http://www.gatehouse-gazetteer.info/download.html
- Link: http://www.gatehouse-gazetteer.info/download.html
- Description: Database of castle sites in Wales (including sites which are uncertain)
- Request by: Richard Nevell (WMUK) (talk) 12:30, 19 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: https://docs.google.com/spreadsheets/d/1o-fZ7HbMieFEJ6vHT61Kp9e97Ix7hTl4utI8Jl0AXos/edit#gid=132680949
Done: Import into Mix n' Match, matched in Mix n' Match To do: Quickstatements Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done: Added to Mix n' Match
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
@Richard Nevell (WMUK): I see that you have created multiple items for castles that have no statements and no sources. I have looked at two items and they seem to be duplicates of already existing items Berry's Wood (Q17647484) and Hen Domen (Q5712905). Duplicate items produce additional workload and the general expectation in Wikidata is that contributors make a reasonable effort to avoid duplicate items when mass creating new items.
I created a property proposal for an ID to reference this website at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gatehouse_Gazetteer_place_ID . Importing via mix-and-match is likely a better idea.
There's also the question of whether data besides the name/description can be imported. ChristianKl (talk) 15:44, 27 June 2017 (UTC)
- Hello @ChristianKl:. The two Berry's Wood enclosures are in different countries (England and Wales) while Hen Domen, Llansantffraid Deuddwr is a different site to the Hen Domen near Montgomery. How many items have you merged so far? The items have been created ahead of matching with Mix'n'match (catalogue here); information on county, country, coordinates, and instance would be imported. Richard Nevell (WMUK) (talk) 16:14, 27 June 2017 (UTC)
- Only the two. Hen Domen, Llansantffraid Deuddwr is located in the historic county of Montgomeryshire (according to http://www.gatehouse-gazetteer.info/Welshsites/664.html). What makes you think that it isn't near Montgomery? ChristianKl (talk) 16:24, 27 June 2017 (UTC)
- @ChristianKl: The distance between those two sites is 20 km as the crow flies. I realise that's not clear from what I added to Wikidata as there were no coordinates on the new item. Both sites are in the historic county of Montgomeryshire, but it does cover something like 2,000 km².
I've been using Mix'n'match's game mode to match Wikidata items to the catalogue. The only options for entries without a match are 'new item' (which is what I've been using) and 'N/A'. Have I been using the wrong option? I understand that having Wikidata items without statements isn't particularly helpful, but it is meant to be only temporary. Richard Nevell (WMUK) (talk) 11:35, 29 June 2017 (UTC)
- Hi all, creating new items without statements is just how Mix n' Match works. The statements will be added to the items once the matching is complete; there are around 300 matches still to go. Thanks, --John Cummings (talk) 11:59, 29 June 2017 (UTC)
@ChristianKl: Is it ok to resume using Mix'n'match to create items? I reckon I could get the rest done by the end of the week so we'll be ready to import statements. Richard Nevell (WMUK) (talk) 13:15, 5 July 2017 (UTC)
- Given that @ValterVB: is the person who's at the moment deleting the items, it might be better to have his opinion. Given that I created the property proposal, I can't create the property for the identifier. If another admin or property creator creates it, it would help a lot with making clear that the items are notable. ChristianKl (talk) 13:25, 5 July 2017 (UTC)
- Which item? Without an example it isn't easy to say; it was probably an item without sources and with only a label and description. --ValterVB (talk) 20:05, 5 July 2017 (UTC)
- ValterVB: Mix'n'match will create items with only a description, but more statements are then added. Is it OK to resume the matching process? I could try to go quicker so items don't stay empty for long. Richard Nevell (WMUK) (talk) 14:18, 6 July 2017 (UTC)
- Do you have an example to show which statements you add? And after how long do you add the other statements? --ValterVB (talk) 17:43, 6 July 2017 (UTC)
- @ValterVB: Q30758975 is an item you deleted (twice) that is in the property proposal ChristianKl mentions up above. ArthurPSmith (talk) 18:06, 6 July 2017 (UTC)
- @ValterVB: I think it's bad that you delete items (and especially redelete them when undeleted) without knowing what you delete. Engaging with Richard Nevell (WMUK) makes much more sense than blindly deleting his items. ChristianKl (talk) 18:13, 6 July 2017 (UTC)
- I know what I delete: an item without statement, wihout sitelink without back link, no notable, it's in our guideline, If I found this kind of item I delete them withou doubt. --ValterVB (talk) 19:44, 6 July 2017 (UTC)
- @ValterVB: Items that are linked from a property proposal dicussion fulfill a structural need and are thus notable. Aside from that those castles are clearly identifiable entities that can be described with public sources and thus also notable under 3. ChristianKl (talk) 23:34, 6 July 2017 (UTC)
- @ValterVB: I understand that you deleted them because they had no statements, and that's what the policy says. But if I am allowed to match the rest of the set through Mix'n'match I intend to added statements to each item (including instance and location). Would you be happy letting me try that before deleting them? Richard Nevell (WMUK) (talk) 17:44, 7 July 2017 (UTC)
- @Richard Nevell (WMUK): Yesterday I asked "And After how long you add the others statements?", @ChristianKl: If you add "public sources" that clearly identify the item we don't delete the item, nobody can force someone else to look for sources. If a user create an item can do a little effort and add the sources. The items are judged for the state they are in, not for potential that they can have. --ValterVB (talk) 19:30, 7 July 2017 (UTC)
- @ValterVB: The criterion is whether there are public sources that can be used to describe the item, not whether the item is already described by public sources. ChristianKl (talk) 19:35, 7 July 2017 (UTC)
- OK, add a link to a public source in the item so we can check it and possibly not delete it. --ValterVB (talk) 20:07, 7 July 2017 (UTC)
- ValterVB Does six days sound reasonable? Richard Nevell (WMUK) (talk) 12:32, 8 July 2017 (UTC)
- 6 days? Why so much time? There is no technical reason to wait one week. For me 48 hours is the maximum acceptable. --ValterVB (talk) 13:15, 8 July 2017 (UTC)
- Addendum: If you have a list with sources I can do it with my bot: creation and addition of sources in one edit. --ValterVB (talk) 13:18, 8 July 2017 (UTC)
- I imagine it could be done reasonably quickly by someone who is well versed with the process, however I am still learning the ropes. Six days should be enough for me to complete the matching, get help with quick statements, and get the information imported while also accommodating other calls on my time. Richard Nevell (WMUK) (talk) 15:55, 8 July 2017 (UTC)
- In 6 days you can lose the item because someone has changed things, you could win the lottery and forget Wikidata, and the process is very dangerous. If you use QuickStatements it is safer and more correct to add the reference right after creating the item using the "LAST" command. --ValterVB (talk) 20:29, 10 July 2017 (UTC)
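For reference, the pattern described here, creating an item and immediately attaching a sourced statement with the LAST command, looks like this in QuickStatements (columns are tab-separated; the label, the class Q23413 (castle) and the reference URL are placeholder examples, not values from the actual import):

```
CREATE
LAST	Len	"Example Castle"
LAST	P31	Q23413	S854	"https://example.org/record/123"
```

The item is thus never left without a sourced statement, even if the batch is interrupted partway through.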
- Which item? Without an example it isn't easy; probably it was an item without a source and with only a label and description. --ValterVB (talk) 20:05, 5 July 2017 (UTC)
ValterVB, Richard Nevell (WMUK) I think there is a misunderstanding: creating empty items is how Mix n' Match works. If you delete empty items created by Mix n' Match you are breaking the data import process for the catalogues. There are currently 100s of catalogues being imported using this tool, some of which take several months to go through to match correctly to existing data. If the policy is incompatible with one of the main data import methods for Wikidata then I suggest we have a larger problem... --John Cummings (talk) 15:12, 30 August 2017 (UTC)
- Then we have a big problem. --ValterVB (talk) 19:18, 30 August 2017 (UTC)
- If the intention is to add further statements to an item, what is the harm? Richard Nevell (WMUK) (talk) 10:30, 31 August 2017 (UTC)
- Creating items and leaving them blank for a while is a problem, mainly because in the meantime someone might try to match the same concept to Wikidata and not detect your item (because it is blank and therefore hard to find). So they might create their own, and we end up with duplicates. − Pintoch (talk) 12:21, 31 August 2017 (UTC)
- Granted, it's possible but with a suitably short time period the likelihood of this is small. Richard Nevell (WMUK) (talk) 14:42, 31 August 2017 (UTC)
- The problem is: given how Mix and Match currently works, this time period can be rather long. Also, it is totally possible for someone to dump a dataset in Mix'n'Match, start matching some of it, and get bored at some point: in this case, the created items will remain empty forever… − Pintoch (talk) 14:00, 7 September 2017 (UTC)
Hi ValterVB and Pintoch, can I suggest we start a discussion on the main project chat to discuss this possible incompatibility between the main import tool and Wikidata policy? There are 10s of Mix n' Match catalogues being processed at the moment, so it does not seem realistic or practical to stop using the tool whilst this is discussed. Thanks, --John Cummings (talk) 15:13, 4 September 2017 (UTC)
- Totally! By the way, I am also working on an alternative to Mix'n'Match: OpenRefine. − Pintoch (talk) 14:00, 7 September 2017 (UTC)
- @ValterVB: please can you undelete all the items created by @Richard Nevell (WMUK): and myself asap? We are trying to import data into all the items but it's breaking because you deleted the items. We can populate the items quickly after you undelete them. Thanks, --John Cummings (talk) 14:06, 7 September 2017 (UTC)
- ValterVB, don't worry about undeleting these items. I'm going to recreate them now, along with some basic statements. Best NavinoEvans (talk) 10:17, 8 September 2017 (UTC)
Protected Planet Sites in Niger
- Name of dataset: PPSNE
- Source: protectedplanet.net
- Link: https://www.protectedplanet.net
- Description: Protected Planet Sites for Niger
- Request by: Battleofalma (talk) 12:32, 19 June 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: https://docs.google.com/spreadsheets/d/1tlqH0TggjqYL-nv2VWKsSYLRK9IQr4gC3jJtvnxwfBw/edit#gid=998871309
Done: 25% To do: 75% Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Mix N Match Catalogue: https://tools.wmflabs.org/mix-n-match/#/catalog/483
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Localisation and Information about all Fountains in the City of Zurich
- Name of dataset: Brunnen der Stadt Zürich (Fountains in the City of Zurich)
- Source: Open-Data-Catalog of the City of Zürich
- Link: https://data.stadt-zuerich.ch/dataset/brunnen
- Description: This geodataset shows the locations of the ~1280 fountains which are maintained by the Water Supply Department of the City of Zurich (Wasserversorgung Stadt Zürich). The geodataset contains interesting attributes like the historical year of construction, the description of the fountain, the kind of water it contains or what kind of fountain it is. The dataset is under the CC0 license and can be used freely.
- Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Brunnen
Source: Open-Data-Catalog City of Zurich Link: https://data.stadt-zuerich.ch/dataset/brunnen Description: This Geodataset - available as GeoJSON, Geopackage, KML, Shapefile, Web Map Service and Web Feature Service - shows the locations of the ~1280 fountains which are maintained by the Water Supply Department of the City of Zurich (Wasserversorgung Stadt Zürich). The Geo-Dataset contains interesting attributes like the historical year of construction, the description of the fountain, the kind of water it contains or what kind of fountain it is. |
Link: https://github.com/opendata-zurich/wikidata/blob/master/fountains/20170918_brunnen_zuerich.xls
Done: Manually converted from GeoJSON to Excel by the author. To do: Notes: The conversion from GeoJSON to spreadsheet is not automated yet |
Structure:
Example item: Done: Properties accepted, data imported. Link:
|
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
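The workflow notes above say the GeoJSON-to-spreadsheet conversion is not yet automated. A minimal sketch of one way to automate it in Python follows; the property names (`baujahr`, `brunnenart`) and the sample feature are made-up placeholders, not taken from the actual dataset:

```python
import csv
import json

def geojson_to_rows(geojson_text, property_names):
    """Flatten GeoJSON point features into flat rows: one row per
    feature, with the requested properties plus lon/lat columns."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection.get("features", []):
        props = feature.get("properties", {})
        lon, lat = feature["geometry"]["coordinates"][:2]
        row = {name: props.get(name, "") for name in property_names}
        row["lon"], row["lat"] = lon, lat
        rows.append(row)
    return rows

def write_csv(path, rows, columns):
    """Write the flattened rows to a CSV file ready for a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical single-feature collection mirroring the attributes the
# description mentions (year of construction, kind of fountain):
example = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [8.5417, 47.3769]},
        "properties": {"baujahr": 1870, "brunnenart": "Trinkbrunnen"},
    }],
})
rows = geojson_to_rows(example, ["baujahr", "brunnenart"])
```

`write_csv` would then produce the .csv; the real attribute names would need to be checked against the GeoJSON download.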
Population by Stadtquartier since 1970 in the City of Zurich
- Name of dataset: Population by Stadtquartier in the City of Zurich since 1970
- Source: Open-Data-Catalog of the City of Zurich
- Link: https://data.stadt-zuerich.ch/dataset/bev-bestand-jahr-quartier-seit1970
- Description: This dataset contains the population of the City of Zurich since 1970, per statistical Stadtquartier (~district). Data owner: Statistik Stadt Zürich
- Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Bevölkerung nach Stadtquartier, seit 1970 (Resident population per District since 1970)
Source: Data owner: Statistik Stadt Zürich. Source of this file: Open-Data-Catalog of the City of Zurich Link: https://data.stadt-zuerich.ch/dataset/bev-bestand-jahr-quartier-seit1970 Description: Attributes: [Ereignisjahr (technical: StichtagDatJahr): time stamp of when the population number is representative, usually 31.12.YEAR] [Stadtquartier (Sort) (technical: QuarSort): official ID of the district, called «Statistisches Stadtquartier» (integer)] [Stadtquartier (lang) (technical: QuarLang): official name of the district, called «Statistisches Stadtquartier» (string)] [Wirtschaftliche Bevölkerung (technical: AnzBestWir): size of the resident population (wirtschaftlich anwesende Personen) (integer)] The population number follows the definition of «resident population»[3], which is different from the «permanent resident population»[4]; the Federal Statistical Office publishes data for the latter. |
Link: https://github.com/opendata-zurich/wikidata/blob/master/population_quartiere_since1970/bev324od3240.xlsx
Done: CSV2Excel by author. To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Public Art on public ground in the City of Zurich
- Name of dataset: Kunst im Stadtraum (KiS) / Public Art on public ground in the City of Zurich
- Source: Open-Data-Catalog of the City of Zürich
- Link: https://data.stadt-zuerich.ch/dataset/kunst-im-stadtraum
- Description: This dataset is a collection of public art objects which are in the possession of the City of Zurich and stand on public ground. The information in this dataset comes from the responsible departments «Kunst im öffentlichen Raum» and «Kunst und Bau». It contains basic information about these objects and the artists who created them. All objects are georeferenced as well.
- Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Kunst im Stadtraum
Source: Open-Data-Catalog City of Zurich. Link: https://data.stadt-zuerich.ch/dataset/kunst-im-stadtraum Description:
|
Link: https://github.com/opendata-zurich/wikidata/blob/master/public_art/kunstimstadtraum.xlsx
Done: Marco Sieber To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
UNESCO Honorary and Goodwill Ambassadors
- Name of dataset: UNESCO Honorary and Goodwill Ambassadors
- Source: UNESCO
- Link: http://www.unesco.org/new/en/goodwill-ambassadors/
- Description: A list of UNESCO Honorary and Goodwill Ambassadors
- Request by: John Cummings (talk) 09:45, 2 October 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: UNESCO Honorary and Goodwill Ambassadors
Source: UNESCO Link: http://www.unesco.org/new/en/goodwill-ambassadors/ Description: A list of UNESCO Honorary and Goodwill Ambassadors |
Link: https://docs.google.com/spreadsheets/d/1mZCj9ZYGxrzex-9IlEtHlFrXldU97cwrChz5Ym1GSfo/edit?usp=sharing
Done: Import data To do: extract URLs Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Northern Ireland Sites and Monuments Record
- Name of dataset: Northern Ireland Sites and Monuments Record
- Source: Northern Ireland Government
- Link: https://www.opendatani.gov.uk/dataset/sites-and-monuments-record
- Description: National built heritage register for Northern Ireland
- Request by: John Cummings (talk) 21:01, 25 October 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link: https://docs.google.com/spreadsheets/d/1QVlCe6qDJVPNjVHuk-rAaoZwQ5UKZf0xD2n5M8121zU/edit?usp=sharing
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Maliku Lama, Kanamit, Maliku, Pulang Pisau, Kalimantan Tengah, Indonesia
- Name of dataset: Dusun Maliku Lama
- Source:
- Link:
- Description:
- Request by: RapiNazwa (talk) 22:06, 3 November 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017)
- Name of dataset: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017) (all politically independent municipalities with selected characteristics as of 30 September 2017, 3rd quarter 2017)
- Source: Statistisches Bundesamt
- Link: https://www.destatis.de/DE/ZahlenFakten/LaenderRegionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/GVAuszugQ/AuszugGV3QAktuell.html;jsessionid=F39E8370DC8C1DA3F40804D32989D763.InternetLive1
- Description: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (all politically independent municipalities with selected characteristics as of 30 September 2017)
- Request by: Aloi baf (talk) 10:43, 30 November 2017 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017)
Source: Statistisches Bundesamt Description: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 |
Link: https://docs.google.com/spreadsheets/d/1HP_4IV-VHWtll0YGjKpUaThPYeXsq9u9vuILE36NViA/edit?usp=sharing
Done: all To do: Notes: Formatted .xlsx to an easily readable .csv. Separated some columns. Reformatted numbers to the English decimal delimiter '.'. Added 'Name' and 'Titel' columns. |
Structure: Population (P1082), coordinate location (P625), postal code (P281), area (P2046)
Example item: Rahden (Q182979) Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
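One of the spreadsheet preparation steps noted in the table above is reformatting German-style numbers to the English decimal delimiter '.'. A minimal sketch of that conversion (the sample values are invented, not taken from the Destatis file):

```python
def de_to_en_number(value):
    """Convert a German-formatted number string (e.g. '1.234,5')
    to English formatting ('1234.5'): drop the thousands separators,
    then turn the decimal comma into a dot."""
    return value.replace(".", "").replace(",", ".")

converted = [de_to_en_number(v) for v in ["1.234,5", "12,75", "1.000.000"]]
```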
CBDB
- Name of dataset: CBDB
- Source: https://projects.iq.harvard.edu/cbdb/home
- Link: https://projects.iq.harvard.edu/cbdb/home
- Description: SQLite dump is available.
- Request by: Fantasticfears (talk) 12:45, 13 December 2017 (UTC)
Workflow
Phase 1:
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: CBDB
Source: SQLite dump Description: Large amount of linked information |
Link:
Done: Done To do: Notes: Already structured. |
Structure:
Example item: Q720 Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done: Import most male in the database.
To do: More claims and link to other entities. Notes: |
Date complete:
Notes: |
Discussion
Why did you execute this import without discussion? One million edits and 300,000 new items without useful content is not very constructive. Also, adding this data with an edit rate of 500 to 800 is harmful to the servers. It's also quite a violation of our bot policy. Sjoerd de Bruin (talk) 18:16, 18 December 2017 (UTC)
- I usually don't get a response, and yes, I didn't start such a discussion, which I should have. I'm aware of your concerns now. As for content: for a complicated dataset like this, it's fairly hard to build a working one-time pywikibot script without an existing item, so I decided to import all of the items first. I planned to import more linked information based on that. But it seems that the community doesn't allow such behaviour at all, and I don't know where to take it from here. Since it has an index from CBDB, it certainly meets the notability criteria. As for server load, I didn't know that in the beginning. I thought that PAWS would allow a faster import, but it was down, so I simply went for quick action. Anyway, it seems that I've made a mistake here. What should I do next to correct it? The data is still well worth importing, and hopefully we can both learn something. --Fantasticfears (talk) 19:44, 18 December 2017 (UTC)
- A subgroup to experiment with would have been better. What will be added to the items in the next round? I hope date of birth and death, gender and profession at least? We do matching of Chinese artists to Wikidata and this data makes it impossible to verify the right person. We have added https://www.wikidata.org/wiki/Q46346812 but we see 李琦 Li Qi many times in Wikidata without any info. Another question: traditional Chinese is used as the language for the label and Chinese is not filled in. Is that correct? --Hannolans (talk) 13:53, 21 December 2017 (UTC)
- Sorry about the late reply. I agree that a subgroup is better, though I've already imported quite a lot. As for Li Qi, that's because there are several different people named Li Qi. CBDB has a lot of occupations in its database and some of them are not listed in Wikidata; in that case, the occupation doesn't get imported. --Fantasticfears (talk) 12:10, 18 January 2018 (UTC)
- There are two issues:
- Was our bot policy violated?
- Should we keep the data?
- As far as (1) goes, technically it wasn't, and the core problem is that our bot policy doesn't speak about QuickStatements. To the extent that we want it to cover QuickStatements, we should likely amend it.
- As far as (2) goes, I'm in favor of having the data, given that it's data from a high-quality source. When data about birth/death/floruit is added, disambiguation will be easier, and in addition the source dataset has family relations that are useful to have. ChristianKl ❪✉❫ 09:21, 22 December 2017 (UTC)
- @ChristianKl: I appreciate that. Since every item has a link to CBDB, it's not hard to import more claims. I'd like to take it at a slower pace and start to make a proposal for the bot. --Fantasticfears (talk) 12:10, 18 January 2018 (UTC)
- @Fantasticfears: ok, now what? What are you going to do with these items? Please explain why you think these items are notable. That's not clear from your proposal. Multichill (talk) 21:32, 22 December 2017 (UTC)
- I think the next step should be a proper bot proposal that explains what you want to do with the existing items. ChristianKl ❪✉❫ 14:01, 24 December 2017 (UTC)
- I noticed that the bot created many items that are duplicates of existing items (e.g. Wang Yangming (Q45417762), Li Wenzhong (Q45484249), Hu Dahai (Q45485882)) when the existing items didn't have CBDB ID (P497). I manually merged dozens of them, and I guess there may be thousands more. I think after basic information is added from the database, we should use a bot to merge the duplicates based on some criteria (perhaps two items with the same name + date of birth + date of death could be determined to be duplicates and should be merged). --Stevenliuyi (talk) 23:22, 24 December 2017 (UTC)
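A duplicate-candidate query along these lines might look like the following SPARQL sketch; it assumes dates of birth (P569) and death (P570) have already been imported, and pairs CBDB items with same-labelled, same-dated items lacking CBDB ID (P497):

```sparql
SELECT ?cbdb ?other WHERE {
  ?cbdb wdt:P497 [] ;        # item created from CBDB (has CBDB ID)
        rdfs:label ?name ;
        wdt:P569 ?born ;
        wdt:P570 ?died .
  ?other rdfs:label ?name ;  # same label, same dates...
         wdt:P569 ?born ;
         wdt:P570 ?died .
  FILTER NOT EXISTS { ?other wdt:P497 [] }  # ...but no CBDB ID
  FILTER (?cbdb != ?other)
}
```

The resulting pairs would still need review before merging, since homonyms with identical dates are possible.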
- I guess the next step would be to add as much info about each person as possible. I am looking at Du Youlan (Q45728367) linked to [5]. We should get Chinese name, sex or gender (P21), country of citizenship (P27) and 種族部族 ethnic group (P172) (?). "Algorithmically generated index year" might be equivalent to floruit (P1317) with sourcing circumstances (P1480) = circa (Q5727902). I do not know if some of the people in the database have dates of birth and death or occupations. They would be useful for matching and detecting duplicates. Also family relations would be handy. --Jarekt (talk) 18:32, 27 December 2017 (UTC)
- @Multichill: Per the notability criteria, it meets 2) it refers to an instance of a clearly identifiable conceptual or material entity, and 3) it fulfills some structural need. CBDB is a collection of all the notable people involved in Chinese history; some of them may not be famous enough to meet Wikipedia's notability criteria, but they were still involved with notable people, and CBDB has relation data about them. Nevertheless, my code had some faults and added some less notable people to Wikidata. I would also like to remove those.
- @Stevenliuyi: Neither Wikidata nor pywikibot prevents duplicate claims. Do we have another bot to remove them?
- @Jarekt: I'll include that in the proposal.
--Fantasticfears (talk) 12:10, 18 January 2018 (UTC)
- Given that all the data is well-sourced and describes clearly identifiable conceptual or material entities, I think that they should all have a place in Wikidata, but the forum for that discussion should be a bot request. ChristianKl ❪✉❫ 13:35, 18 January 2018 (UTC)
- @Fantasticfears: you haven't edited this site for over a month. You just created a lot of nearly empty items on Wikidata of which some (or a lot) are duplicates. Please explain how you plan to expand these items in 2018 and contribute in a meaningful way to this project. Multichill (talk) 18:38, 18 January 2018 (UTC)
- @Multichill: Sorry for the late response. That said, it doesn't make sense to focus on my personal life, mistakes and commitment to Wikidata; improvements should be the point from here on. I've mentioned this import to the CBDB authors. Here is the code for the last import. Since most people in CBDB are imported and a bot is required, I'll continue this thread in a proposal. --Fantasticfears (talk) 12:28, 24 January 2018 (UTC)
FAMCL
- Name of dataset: FAMCL
- Source: Frye Art Museum
- Link: http://fryemuseum.org/collection_list/
- Description: Frye Art Museum Collection List: Catalog of works owned by the Frye Art Museum
- Request by: Peaceray (talk) 20:36, 24 January 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
I am completely new to this, so please let me know about anything else that I need to do or any faux pas I might make. Peaceray (talk) 20:43, 24 January 2018 (UTC)
Directory of Open Access Journals
Moved to Wikidata:Dataset_Imports/Directory_of_Open_Access_Journals
Car models
- Name of dataset: car_dataset
- Source:
- Link: http://ai.stanford.edu/~jkrause/cars/car_dataset.html
- Description: dataset used for automobile model recognition
- Started by:
--- Jura 20:12, 7 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
- Listing this here mainly because I don't plan to import this myself. Licensing tbd. Maybe just the list of models should be imported.
--- Jura 20:12, 7 March 2018 (UTC)
UNESCO field offices
- Name of dataset: All UNESCO field offices
- Source: UNESCO
- Link: http://www.unesco.org/new/bfc/all-offices/
- Description: A link of all UNESCO field offices by region
- Started by: John Cummings (talk) 10:23, 9 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
New York Times Obituaries
- Name of dataset: New York Times Obituaries
- Source: New York Times
- Link: https://www.nytimes.com/section/obituaries
- Description: There are 439,000 obituaries in the New York Times database; I don't know how they could be exported.
- Started by: John Cummings (talk) 12:36, 9 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name:
Source: Link: Description: |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Video Game Companies
Moved to Wikidata:Dataset Imports/Video Game Companies Jean-Fred (talk) 10:17, 30 May 2018 (UTC)
Anagraphical data of Italian schools
- Name of dataset: Anagraphical data of Italian schools
- Source: Italian Ministry of Education, Universities and Research
- Link: http://dati.istruzione.it/opendata/opendata/catalogo/elements1/?area=Scuole
- Description: A dataset of Italian school data
- Licensing: data are licensed under the Italian Open Data License v2.0. I know it's not CC0; could it be suitable anyway?
- Started by: Floatingpurr (talk) 11:34, 20 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: Anagraphical data of Italian schools
Source: Italian Ministry of Education, Universities and Research Link: http://dati.istruzione.it/opendata/opendata/catalogo/elements1/?area=Scuole Description: A dataset of Italian school data
|
Link: spreadsheet here
Done: Merged the 4 datasets of the source link. Added references to school records already present in Wikidata. Added references to Wikidata locations. To do: Notes: A lot of schools seem to be slightly different flavors of the same school; this happens because the same institute may offer different educational paths. |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
Just updated the spreadsheet. I'm open to suggestions about how to import the data. Floatingpurr (talk) 15:07, 17 April 2018 (UTC)
PubMed Central Articles
- Name of dataset: PubMed Central Articles
- Source: PubMed Central (PMC)
- Link: https://www.ncbi.nlm.nih.gov/pmc/
- Description: More than 80% of PMC articles are currently on Wikidata. This is an effort to add the missing ones.
- Started by: Mahdimoqri (talk) 04:34, 23 March 2018 (UTC)
Workflow
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: PMC
Source: PMC Link: https://www.ncbi.nlm.nih.gov/pmc/ Description: 4.7 million free full-text biomedical and life sciences articles from NIH/NLM |
Link:
Done: To do: Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion
PubMed Central Journals
- Name of dataset: PubMed Central Journals
- Source: National Center for Biotechnology Information (NCBI)
- Link: https://www.ncbi.nlm.nih.gov/pmc/journals/#csvfile
- Description: Currently, more than 100 PMC journals are either missing from Wikidata or missing an NLM Unique ID (P:P1055). This is an effort to add/complete them.
- Started by: Mahdimoqri (talk) 19:09, 23 March 2018 (UTC)
Workflow edit
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: PubMed Central Journals
Source: National Center for Biotechnology Information (NCBI) Link: https://www.ncbi.nlm.nih.gov/pmc/journals/ Description: |
Link: missing or incomplete items
Done: To do: Find which ones are missing and which ones are only missing P:P1055 Notes: |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Done:
To do: Notes: |
Date complete:
Notes: |
Discussion edit
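The journals missing only P1055 could be listed with a query on the Wikidata Query Service. A sketch of one such query, built as a string; the exact query (using Q5633421 for "scientific journal") is an assumption about the approach, not a statement of how this import was done.

```python
# SPARQL one might run on query.wikidata.org to list scientific journals
# (Q5633421) that lack an NLM Unique ID (P1055). Hypothetical approach.
query = """
SELECT ?journal ?journalLabel WHERE {
  ?journal wdt:P31 wd:Q5633421 .
  FILTER NOT EXISTS { ?journal wdt:P1055 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
print(query)
```

Items returned here would only need a P1055 statement, while journals absent from the result set entirely may need to be created first.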
RNSR (Répertoire National des Structures de Recherche) edit
- Name of dataset: RNSR (Répertoire National des Structures de Recherche)
- Source: Ministère de l'Enseignement Supérieur et de la Recherche
- Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/
- Description: This dataset presents public research structures, active or inactive, referenced in the national register of research structures (RNSR).
- Started by: OdileB (talk) 08:31, 27 March 2018 (UTC)
Workflow edit
Description of dataset | Create and import data into spreadsheet | Structure of data within Wikidata | Format the data to be imported | Match the data to existing data | Importing data into Wikidata | Date import complete and notes |
---|---|---|---|---|---|---|
Name: RNSR (Répertoire National des Structures de Recherche)
Source: Ministère de l'Enseignement Supérieur et de la Recherche Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/ Description: Identification data + some information on 6542 French research labs or structures |
Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/download/?format=csv&use_labels_for_header=true
Done: Done. To do: Notes: Additionally, I have checked the websites. |
Structure:
Example item: Done: To do: |
Done:
To do: Notes: |
Done: For a first try, I matched 67 labs or research centers with both QID and website.
To do: Notes: |
Done: 694 new French national research structure identifiers added.
To do: Double-check the remaining labs in the RNSR data, then create them in Wikidata. Notes: I plan to import this myself. |
Date complete: 12th April 2018
Notes: |
Discussion edit
I intend to use QuickStatements. Can anybody explain the use of the "source property"? OdileB (talk) 07:54, 29 March 2018 (UTC)
- @OdileB: I am not a user of QuickStatements, but concerning sources, you can have a look at Help:Sources. Snipre (talk) 09:08, 13 April 2018 (UTC)
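For the "source property" question: in QuickStatements v1, references are appended to a statement line as S-prefixed properties (S854 for reference URL P854, S813 for retrieved date P813). A sketch that builds one such tab-separated line; the item (the Wikidata sandbox Q4115189), value, and URL are placeholders, not data from this import.

```python
# Assemble one QuickStatements v1 line: item, property, value, then
# reference columns with the property number prefixed by S instead of P.
item, prop, value = "Q4115189", "P1082", "12345"  # placeholder triple
ref_url = "https://data.enseignementsup-recherche.gouv.fr/"  # placeholder
retrieved = "+2018-04-12T00:00:00Z/11"  # QS date format, day precision

# S854 = reference URL (P854), S813 = retrieved (P813); URLs are quoted.
line = "\t".join([item, prop, value, "S854", f'"{ref_url}"', "S813", retrieved])
print(line)
```

Pasting such lines into QuickStatements adds the statement together with its reference in one step.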
Crossref Journals (moved) edit
GNIS Domestic (moved) edit
Moved to Wikidata:Dataset Imports/Geographic Names Information System (GNIS) Domestic