Wikidata:Dataset Imports/Previous version discussions

Estonian businesses (test)

Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)

Source:Philippine Statistics Authority

Link: Web.Archive.org upload (As Philippines public domain FOI request)

Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)

Link: Web.Archive.org upload (As Philippines public domain FOI request)

Done:

To do: -

Notes: -

Structure: population (P1082), one value per census year (a formatting sketch follows at the end of this workflow section)

Example item: Dasol (Q41917), Urdaneta (Q43168), Pangasinan (Q13871), Ilocos Region (Q12933)

Done:

To do: -

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do: -

Notes:

Date complete:

Notes:
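Since the structure above targets population (P1082) per census year, the following is a minimal Python sketch of how matched rows from the spreadsheet could be turned into QuickStatements (v1) commands. The file name, column names and reference URL are placeholders, and the year-precision qualifier (P585) and reference URL (S854) are assumptions about how the statements would be qualified and sourced.

  import csv

  SOURCE_URL = "https://web.archive.org/<archived-PSA-census-table>"  # placeholder reference URL

  # Assumed spreadsheet columns: qid (matched Wikidata item), year (census year), population
  with open("philippine_census.csv", newline="", encoding="utf-8") as f, \
       open("quickstatements.txt", "w", encoding="utf-8") as out:
      for row in csv.DictReader(f):
          # population (P1082) with point in time (P585) qualifier and reference URL (S854)
          out.write(f'{row["qid"]}\tP1082\t{row["population"]}'
                    f'\tP585\t+{row["year"]}-01-01T00:00:00Z/9'
                    f'\tS854\t"{SOURCE_URL}"\n')

The resulting file can then be pasted into the QuickStatements tool.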

Discussion:

World Heritage Sites

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: World Heritage sites

Source: UNESCO World Heritage Centre

Link: http://whc.unesco.org/en/list

Description: A database of the World Heritage sites

Link: here

Done: All

To do: -

Notes: -

Structure: World Heritage Site ID (P757), World Heritage criteria (2005) (P2614), heritage status (P1435) = World Heritage Site (with start time as qualifier)

Example item: Q4176

Done: All

To do: -

Done: All

To do:

Notes:

Done:

To do: Inception (P571) for the remaining items (dates can be found in the site descriptions on the World Heritage website; a query sketch for finding these items follows at the end of this workflow section)

Notes:

Done: All except construction date (inception (P571))

To do: -

Notes:

Date complete:

Notes:
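A possible way to find the remaining items mentioned in the to-do above (World Heritage Sites still lacking inception (P571)) is a query against the Wikidata Query Service. This is a hedged sketch, assuming the items carry heritage status (P1435) = World Heritage Site (Q9259).

  import requests

  QUERY = """
  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P1435 wd:Q9259 .                 # heritage status = World Heritage Site
    FILTER NOT EXISTS { ?item wdt:P571 [] }    # no inception / construction date yet
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  """

  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": QUERY, "format": "json"})
  for binding in r.json()["results"]["bindings"]:
      print(binding["item"]["value"], binding["itemLabel"]["value"])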

Discussion:

UNESCO list of journalists who were killed in the exercise of their profession

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: journalists who were killed in the exercise of their profession

Source: UNESCO

Link: http://www.unesco.org/new/en/communication-and-information/freedom-of-expression/press-freedom/unesco-condemns-killing-of-journalists/

Description: Yearly lists journalists who were killed in the exercise of their profession collated by UNESCO

Link: here

Done: Import data

To do: manual work on job and employer columns

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion:

I don't understand how to include the official condemnation of the killing by UNESCO and the responses by the governments --John Cummings (talk) 15:44, 6 December 2016 (UTC)[reply]

UNESCO Atlas of the World's Languages in danger

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: UNESCO Atlas of the World's Languages in danger

Source: UNESCO

Link: http://www.unesco.org/languages-atlas/

Description: A database of the world's endangered languages

Link: here

Done: All

To do: -

Notes:

Structure:

Example item:

Done:

To do:

Done: All

To do:

Notes:

Done:

To do: Matching in Mix n' Match

Notes:

Done: Imported into Mix n' Match

To do:

Notes:

Date complete:

Notes:

Discussion:

UNESCO Art Collection

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link: here

Done: Imported data on all the artworks

To do: Add links to the individual pages of the artworks

Notes: Not available as a structured database, database created by hand

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion:

UNESCO Memory of the World Programme

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: UNESCO Memory of the World Programme

Source: UNESCO

Link: http://www.unesco.org/new/en/communication-and-information/flagship-project-activities/memory-of-the-world/homepage/

Description: An international initiative launched to safeguard the documentary heritage of humanity

Link: here

Done: All

To do: -

Notes:

Structure:

Example item:

Done:

To do:

Done: All

To do:

Notes:

Done: Mix n' Match

To do:

Notes:

Done: Mix n' Match

To do: Next steps

Notes:

Date complete:

Notes:

Discussion:

UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match data to existing data Importing data into Wikidata Date import complete and notes
Name: UNESCO Lists of Intangible Cultural Heritage and the Register of Best Safeguarding Practices

Source: UNESCO

Link: http://www.unesco.org/culture/ich/en/lists

Description: The UNESCO international register of Intangible Cultural Heritage

Link: here

Done: All

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done: All

To do:

Notes:

Done: Imported into Mix n' Match

To do: Match on Mix n' Match

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion:

European Red List of Habitat

The European Red List of Habitats provides an entirely new and all-embracing tool to review commitments for environmental protection and restoration within the EU 2020 Biodiversity Strategy. In addition to the assessment of threat, a unique set of information underlies the Red List for every habitat: from a full description to distribution maps, images, links to other classification systems, details of occurrence and trends in each country, and lists of threats with information on restoration potential. All of this is publicly available in PDF and database format (see links below), so the Red List can be used for a wide range of analyses. The Red List complements the data collected on Annex I habitat types through Article 17 reporting, as it covers a much wider set of habitats than those legally protected under the Habitats Directive.

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: European Red List of Habitat

Source: European Commission

Link: [1]

Description: Current status of all natural and semi-natural terrestrial, freshwater and marine habitats in Europe.

Link: [2]

Done: All data imported to spreadsheet

To do: Check coding in sheet "European Red List of Habitats", formatting of names with diacritics.

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion:

DB Netz Betriebsstellenverzeichnis

  • Name of dataset: DB Netz Betriebsstellenverzeichnis (Open-Data-Portal)
  • Source: DB Netz AG (infrastructure department of Germany’s national railway company)
  • Link: https://data.deutschebahn.com/dataset/data-betriebsstellen (the latest one, currently from 2017-01)
  • Description:
    1. Abk: The abbreviation used for operational purposes („Ril 100“, formerly „DS 100“). Import to station code (P296).
    2. Name: The full name. Import to official name (P1448).
    3. Kurzname: Name variant abbreviated to fit within 16 characters. Import to short name (P1813).
    4. Typ: Type of location. Import to instance of (P31). I’m suggesting to restrict the import to Bf (Bahnhof (station) (Q27996466)), Hp (Haltepunkt (train stop) (Q27996460)), Abzw (junction (Q27996464)), Üst (Überleitstelle (Q27996463)), Anst (Anschlussstelle (Q27996461)), Awanst (Ausweichanschlussstelle (Q27996462)) and Bk (Q27996465) (including combinations of those like „Hp Anst“, but not the variants like „NE-Hp“) for now.
    5. Betr-Zust: Whether the location is only planned or no longer exists. I’m suggesting not to automatically import anything with a value here.
    6. Primary Location Code: The code from TSI-TAP/TSI-TAF. Import to station code (P296).
    7. UIC: Which country the location is in. I’m suggesting to restrict the import to Germany (80) for now.
    8. RB: Which regional section of DB Netz is responsible for this location. I’m suggesting not to automatically import those which don’t have a value here after the other suggested filterings. In other words: don’t import those without a value here, but ignore the value otherwise.
    9. gültig von: Literally translates to „valid from“, but honestly I don’t know which date exactly this refers to. Anyway: Not relevant, or maybe don’t import those newer than 2017-01-01.
    10. gültig bis: Literally translates to „valid until“, same as before just whatever end. Not relevant.
    11. Netz-Key: Add zeroes on the left until it’s six digits long, prepend the UIC country code and import to UIC station code (P722) (a rough sketch of this transformation follows this list).
    12. Fpl-rel: Whether this can be ordered as path of a train path. Not relevant.
    13. Fpl-Gr: Whether the infrastructure manager (for the Germans around: that’s the EIU) responsible for creating the train’s timetable may change here. Not relevant.
  • Note about my usage of „P296“ in the description section above: It’s not really clear to me how P296 is supposed to be used. Maybe a new property or whatever would be better. So read this as „P296 or new property“. Note that there are already Items with those codes in P296, which would need to be changed to whatever representation is chosen.
  • Request by: --Nenntmichruhigip (talk) 19:52, 21 March 2017 (UTC)[reply]
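A rough Python sketch of the filtering and of the Netz-Key → UIC station code transformation suggested above. The column names follow the description, while the file name, delimiter and encoding are assumptions, and P296 stands for "P296 or a new property" as noted.

  import csv

  ALLOWED_TYPES = {"Bf", "Hp", "Abzw", "Üst", "Anst", "Awanst", "Bk"}  # combinations like "Hp Anst" are allowed

  def wanted(row):
      typ_parts = row["Typ"].split()
      return (bool(typ_parts)
              and all(part in ALLOWED_TYPES for part in typ_parts)   # only the plain operational types
              and row["UIC"] == "80"                                 # Germany only
              and not row["Betr-Zust"].strip()                       # skip planned / closed locations
              and bool(row["RB"].strip()))                           # RB must have a value

  with open("betriebsstellen.csv", newline="", encoding="utf-8") as f:
      for row in csv.DictReader(f, delimiter=";"):
          if not wanted(row):
              continue
          # pad Netz-Key to six digits and prepend the UIC country code -> UIC station code (P722)
          uic_station_code = row["UIC"] + row["Netz-Key"].zfill(6)
          print(row["Abk"], row["Name"], row["Kurzname"], uic_station_code)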

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion:

Protected Planet dataset for Germany

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

List of Museums of São Paulo/Brazil

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Debates of the Constituent Assembly of Jura

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Debates of the Constituent Assembly of Jura

Source: Jura cantonal archives

Link: http://www.jura.ch/DFCS/OCC/ArCJ/Projets/Archives-cantonales-jurassiennes-Projets.html

Description: Sound collection of the plenary sessions of the Constituent Assembly of the canton Jura in Switzerland

Link: https://docs.google.com/spreadsheets/d/1dqt8hwk9Wp8o5n9i4umoLX-uorW3q7YSpmOpd1FeRD4/edit?usp=sharing

Done:

To do:

Notes: The Wikimedia Commons page with the sound tracks already exists

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

For the Workshop Wiki SUPSI - Chapter 2 we are looking into how to add this database to Wikidata. The database was provided by Ilario Valdelli of Wikimedia Switzerland to act as a case study for the viability of adding metadata for Wikimedia content (in this specific case an audio recordings collection).

We will work on documenting the process in order to provide a real example for the archives and institutions in Switzerland, to encourage them to use Wikidata as a database too.

As this is the first time we are uploading to Wikidata, we would like to have the chance to discuss and find the best way to import this data and to define the properties for the audio content.

Ethnologue's EGIDS language status

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: EGIDS language status

Source: Ethnologue

Link: https://www.ethnologue.com/browse/codes

Description: Import the "Language Status" in every page of languages

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

What is the difference between these three-letter codes and ISO 639-3, which is also maintained by Ethnologue? (As far as I am aware they are the same.) Thanks, GerardM (talk) 17:29, 27 June 2017 (UTC)[reply]

@GerardM: it's the same --Beeyan (talk) 08:23, 9 August 2017 (UTC)[reply]

DB SuS and RNI Stationsdaten

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion:

Berliner Malweiber

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Berliner Malweiber

Source: Stiftung Stadtmuseum Berlin

Link: https://www.stadtmuseum.de/ausstellungen/berlin-stadt-der-frauen

Description: Metadata relating to the museum's digitisation project Berliner Malweiber, involving works by female artists displayed by the museum in its exhibition Berlin – Stadt der Frauen (March–August 2016).

Link: here

Done: Initial import of data into spreadsheet; metadata complemented with GND IDs where available.

Structure: link

Format the data to be imported: Done

Match the data to existing data: Done

Importing data into Wikidata: Done

Date complete: 2019-09-18

Notes: Will tweak (enrich) selected entries where possible, but the ingest of the fundamental metadata is complete.

Discussion

The data will be imported by User:Hgkuper in preparation for the digiS workshop A gentle introduction to WIKIDATA.

UNESCO Atlas of World Languages in Danger

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done: Added to Mix n' Match

To do:

Notes:

Done:

To do:

Notes: There are several languages currently existing in Wikidata that AWLD has more detail on, having a specific entry for each dialect. Once the data has been imported, a query should be run to find the items with multiple AWLD IDs (a sketch of such a query follows at the end of this workflow section), and these should be separated out into separate items for each dialect that link back to the non-dialect item.

Date complete:

Notes:
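The query mentioned in the note above could look roughly like this; PXXXX is a placeholder for the Atlas's identifier property and would have to be replaced with the actual property number.

  import requests

  # PXXXX is a placeholder for the AWLD identifier property (replace before running)
  QUERY = """
  SELECT ?item (COUNT(?id) AS ?ids) WHERE {
    ?item wdt:PXXXX ?id .
  }
  GROUP BY ?item
  HAVING (COUNT(?id) > 1)
  """

  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": QUERY, "format": "json"})
  for binding in r.json()["results"]["bindings"]:
      print(binding["item"]["value"], binding["ids"]["value"])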

Discussion

JPL Small-Body Database

  • Name of dataset:JPL Small-Body Database (SBDB)
  • Source:JPL Small-Body Database
  • Link:https://www.jpl.nasa.gov/
  • Description: A database about astronomical objects. It is maintained by Jet Propulsion Laboratory (JPL) and NASA and provides data for all known asteroids and several comets, including orbital parameters and diagrams, physical diagrams, and lists of publications related to the small body. The database is updated on a daily basis.
  • Request by: Noobius2 (talk) 20:47, 16 June 2017 (UTC)[reply]

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Gatehouse Gazetteer (Wales)

Imported into new dataset imports here

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link: https://docs.google.com/spreadsheets/d/1o-fZ7HbMieFEJ6vHT61Kp9e97Ix7hTl4utI8Jl0AXos/edit#gid=132680949

Done: Import into Mix n' Match, matched in Mix n' Match

To do: Quickstatements

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done: Added to Mix n' Match

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

@Richard Nevell (WMUK): I see that you have created multiple items for castles that have no statements and no sources. I have looked at two items and they seem to be duplicates of already existing items Berry's Wood (Q17647484) and Hen Domen (Q5712905). Duplicate items produce additional workload and the general expectation in Wikidata is that contributors make a reasonable effort to avoid duplicate items when mass creating new items.

I created a property proposal for an ID to reference this website at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gatehouse_Gazetteer_place_ID . Importing via mix-and-match is likely a better idea.

There's also the question of whether data besides the name/description can be imported. ChristianKl (talk) 15:44, 27 June 2017 (UTC)[reply]

Hello @ChristianKl:. The two Berry's Wood enclosures are in different countries (England and Wales) while Hen Domen, Llansantffraid Deuddwr is a different site to the Hen Domen near Montgomery. How many items have you merged so far? The items have been created ahead of matching with Mix'n'match (catalogue here); information on county, country, coordinates, and instance would be imported. Richard Nevell (WMUK) (talk) 16:14, 27 June 2017 (UTC)[reply]
Only the two. Hen Domen, Llansantffraid Deuddwr is located in the historic county of Montgomeryshire (according to http://www.gatehouse-gazetteer.info/Welshsites/664.html). What makes you think that isn't near to Montgomery? ChristianKl (talk) 16:24, 27 June 2017 (UTC)[reply]

@ChristianKl: The distance between those two sites is 20km as the crow flies. I realise that's not clear from what I added to Wikidata as there were no coordinates on the new item. Both sites are in the historic county of Montgomeryshire, but it does cover something like 2,000km2.

I've been using Mix'n'match's 'game mode' to match Wikidata items to the catalogue. The only options for entries without a match are 'new item' (which is what I've been using) and 'N/A'. Have I been using the wrong option? I understand that having Wikidata items without statements isn't particularly helpful, but it is meant to be only temporary. Richard Nevell (WMUK) (talk) 11:35, 29 June 2017 (UTC)[reply]

Hi all, creating new items without statements is just how Mix n' Match works. The statements will be added to the items once the matching has been completed; there are around 300 matches still to go. Thanks, --John Cummings (talk) 11:59, 29 June 2017 (UTC)[reply]

@ChristianKl: Is it ok to resume using Mix'n'match to create items? I reckon I could get the rest done by the end of the week so we'll be ready to import statements. Richard Nevell (WMUK) (talk) 13:15, 5 July 2017 (UTC)[reply]

Given that @ValterVB: is the person who's at the moment deleting the items, it might be better to have his opinion. Given that I created the property proposal, I can't create the property for the identifier. If another admin or property creator creates it, it would help a lot with making clear that the items are notable. ChristianKl (talk) 13:25, 5 July 2017 (UTC)[reply]
Which item? Without an example it isn't easy; probably it was an item without a source and with only a label and description. --ValterVB (talk) 20:05, 5 July 2017 (UTC)[reply]
ValterVB Mix'n'match will create items with only a description, but then add more statements. Is it OK to resume the matching process? I could try to go quicker so items don't stay empty for long. Richard Nevell (WMUK) (talk) 14:18, 6 July 2017 (UTC)[reply]
Do you have an example to see which statements you add? And after how long do you add the other statements? --ValterVB (talk) 17:43, 6 July 2017 (UTC)[reply]
@ValterVB: Q30758975 is an item you deleted (twice) that is in the property proposal ChristianKl mentions up above. ArthurPSmith (talk) 18:06, 6 July 2017 (UTC)[reply]
@ValterVB: I think it's bad that you delete items (and especially redelete them when undeleted) without knowing what you delete. Engaging with Richard Nevell (WMUK) makes much more sense than blindly deleting his items. ChristianKl (talk) 18:13, 6 July 2017 (UTC)[reply]
I know what I delete: an item without statements, without sitelinks, without backlinks, not notable; it's in our guideline. If I find this kind of item I delete it without doubt. --ValterVB (talk) 19:44, 6 July 2017 (UTC)[reply]
@ValterVB: Items that are linked from a property proposal discussion fulfill a structural need and are thus notable. Aside from that, those castles are clearly identifiable entities that can be described with public sources and are thus also notable under criterion 3. ChristianKl (talk) 23:34, 6 July 2017 (UTC)[reply]
@ValterVB: I understand that you deleted them because they had no statements, and that's what the policy says. But if I am allowed to match the rest of the set through Mix'n'match I intend to add statements to each item (including instance and location). Would you be happy letting me try that before deleting them? Richard Nevell (WMUK) (talk) 17:44, 7 July 2017 (UTC)[reply]
@Richard Nevell (WMUK): Yesterday I asked "and after how long do you add the other statements?". @ChristianKl: If you add "public sources" that clearly identify the item, we don't delete the item; nobody can force someone else to look for sources. If a user creates an item they can make a little effort and add the sources. The items are judged by the state they are in, not by the potential they could have. --ValterVB (talk) 19:30, 7 July 2017 (UTC)[reply]
@ValterVB: The criterion is whether there are public sources that can be used to describe the item, not whether the item is already described by public sources. ChristianKl (talk) 19:35, 7 July 2017 (UTC)[reply]
OK, add a link to a public source in the item so we can check and, if appropriate, not delete it. --ValterVB (talk) 20:07, 7 July 2017 (UTC)[reply]
ValterVB Does six days sound reasonable? Richard Nevell (WMUK) (talk) 12:32, 8 July 2017 (UTC)[reply]
6 days? Why so much time? There is no technical reason to wait one week. For me 48 hours is the maximum acceptable. --ValterVB (talk) 13:15, 8 July 2017 (UTC)[reply]
Addendum: If you have a list with sources I can do it with my bot: creation and addition of sources in one edit. --ValterVB (talk) 13:18, 8 July 2017 (UTC)[reply]
I imagine it could be done reasonably quickly by someone who is well versed with the process, however I am still learning the ropes. Six days should be enough for me to complete the matching, get help with quick statements, and get the information imported while also accommodating other calls on my time. Richard Nevell (WMUK) (talk) 15:55, 8 July 2017 (UTC)[reply]
In 6 days you could lose track of the item because something has changed, or you could win the lottery and forget Wikidata, so the process is very risky. If you use QuickStatements it is safer and more correct to add the reference right after creating the item, using the "LAST" command. --ValterVB (talk) 20:29, 10 July 2017 (UTC)[reply]

ValterVB, Richard Nevell (WMUK) I think there is a misunderstanding: creating empty items is how Mix n' Match works. If you delete empty items created by Mix n' Match you are breaking the data import process for the catalogues. There are currently 100s of catalogues being imported using this tool, some of which take several months to go through to match correctly to existing data. If the policy is incompatible with one of the main data import methods for Wikidata then I suggest we have a larger problem.... --John Cummings (talk) 15:12, 30 August 2017 (UTC)[reply]

Then we have a big problem. --ValterVB (talk) 19:18, 30 August 2017 (UTC)[reply]
If the intention is to add further statements to an item, what is the harm? Richard Nevell (WMUK) (talk) 10:30, 31 August 2017 (UTC)[reply]
Creating items and leaving them blank for a while is a problem, mainly because in the meantime someone might try to match the same concept to Wikidata and not detect your item (because it is blank and therefore hard to find). So they might create their own, and we end up with duplicates. − Pintoch (talk) 12:21, 31 August 2017 (UTC)[reply]
Granted, it's possible but with a suitably short time period the likelihood of this is small. Richard Nevell (WMUK) (talk) 14:42, 31 August 2017 (UTC)[reply]
The problem is: given how Mix and Match currently works, this time period can be rather long. Also, it is totally possible for someone to dump a dataset in Mix'n'Match, start matching some of it, and get bored at some point: in this case, the created items will remain empty forever… − Pintoch (talk) 14:00, 7 September 2017 (UTC)[reply]

Hi ValterVB and Pintoch, can I suggest we start a discussion on the main project chat to discuss this possible incompatibility between the main import tool and Wikidata policy? There are 10s of Mix n' Match catalogues being processed at the moment; it does not seem realistic or practical to stop using the tool whilst this is discussed. Thanks, --John Cummings (talk) 15:13, 4 September 2017 (UTC)[reply]

Totally! By the way, I am also working on an alternative to Mix'n'Match: OpenRefine. − Pintoch (talk) 14:00, 7 September 2017 (UTC)[reply]
@ValterVB: please can you undelete all the items created by @Richard Nevell (WMUK): and myself asap? We are trying to import data into all the items but it's breaking because you deleted the items. We can populate the items quickly after you undelete them. Thanks, --John Cummings (talk) 14:06, 7 September 2017 (UTC)[reply]
ValterVB, don't worry about undeleting these items. I'm going to recreate them now, along with some basic statements. Best NavinoEvans (talk) 10:17, 8 September 2017 (UTC)[reply]

Protected Planet Sites in Niger

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link: https://docs.google.com/spreadsheets/d/1tlqH0TggjqYL-nv2VWKsSYLRK9IQr4gC3jJtvnxwfBw/edit#gid=998871309

Done: 25%

To do: 75%

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Mix N Match Catalogue: https://tools.wmflabs.org/mix-n-match/#/catalog/483

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Localisation and Information about all Fountains in the City of Zurich

  • Name of dataset: Brunnen der Stadt Zürich (Fountains in the City of Zurich)
  • Source: Open-Data-Catalog of the City of Zürich
  • Link: https://data.stadt-zuerich.ch/dataset/brunnen
  • Description: This geodataset shows the locations of the ~1280 fountains which are maintained by the Water Supply Department of the City of Zurich (Wasserversorgung Stadt Zürich). The geodataset contains interesting attributes like the historical year of construction, the description of the fountain, the kind of water it contains, and what kind of fountain it is. The dataset is under a CC0 license and can be used freely.
  • Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Brunnen

Source: Open-Data-Catalog City of Zurich

Link: https://data.stadt-zuerich.ch/dataset/brunnen

Description: This Geodataset - available as GeoJSON, Geopackage, KML, Shapefile, Web Map Service and Web Feature Service - shows the locations of the ~1280 fountains which are maintained by the Water Supply Department of the City of Zurich (Wasserversorgung Stadt Zürich). The Geo-Dataset contains interesting attributes like the historical year of construction, the description of the fountain, the kind of water it contains or what kind of fountain it is.

Link: https://github.com/opendata-zurich/wikidata/blob/master/fountains/20170918_brunnen_zuerich.xls

Done: Manually converted from GeoJSON to Excel by author.

To do:

Notes: The conversion from GeoJSON to spreadsheet is not automated yet (a possible automation sketch follows at the end of this workflow section).

Structure:

Example item:

Done: Properties accepted, data imported.

Link:


To do:


Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:
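As noted above, the GeoJSON-to-spreadsheet conversion is still done by hand. A minimal Python sketch of how it could be automated, assuming the published GeoJSON file as input and writing each feature's properties plus its WGS84 coordinates to a CSV file:

  import csv
  import json

  with open("brunnen.geojson", encoding="utf-8") as f:
      features = json.load(f)["features"]

  # union of all property names, plus the coordinates
  fieldnames = sorted({key for ft in features for key in ft["properties"]}) + ["lon", "lat"]

  with open("brunnen.csv", "w", newline="", encoding="utf-8") as out:
      writer = csv.DictWriter(out, fieldnames=fieldnames)
      writer.writeheader()
      for ft in features:
          lon, lat = ft["geometry"]["coordinates"][:2]  # GeoJSON points are [longitude, latitude]
          writer.writerow({**ft["properties"], "lon": lon, "lat": lat})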

Discussion

Population by Stadtquartier since 1970 in the City of Zurich

  • Name of dataset: Population by Stadtquartier in the City of Zurich since 1970
  • Source: Open-Data-Catalog of the City of Zurich
  • Link: https://data.stadt-zuerich.ch/dataset/bev-bestand-jahr-quartier-seit1970
  • Description: This dataset contains all the population since 1970 in the City of Zurich per Statistical Stadtquartier (~district). Dataowner: Statistik Stadt Zürich
  • Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Bevölkerung nach Stadtquartier, seit 1970 (Resident population per District since 1970)

Source: Dataowner Statistik Stadt Zürich. Source of this file: Open-Data-Catalog of the City of Zurich

Link: https://data.stadt-zuerich.ch/dataset/bev-bestand-jahr-quartier-seit1970

Description: Attributes:

  • Ereignisjahr (technisch: StichtagDatJahr): time stamp for which the population figure is representative, usually 31.12. of the given year.
  • Stadtquartier (Sort) (technisch: QuarSort): official ID of the district («Statistisches Stadtquartier», integer).
  • Stadtquartier (lang) (technisch: QuarLang): official name of the district («Statistisches Stadtquartier», string).
  • Wirtschaftliche Bevölkerung (technisch: AnzBestWir): size of the resident population (wirtschaftlich anwesende Personen, integer). This follows the definition of «Resident Population»[3], which is different from the «Permanent Resident Population»[4]; the Federal Statistical Office publishes data for the latter.

Link: https://github.com/opendata-zurich/wikidata/blob/master/population_quartiere_since1970/bev324od3240.xlsx

Done: CSV2Excel by author.

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Public Art on public ground in the City of Zurich

  • Name of dataset: Kunst im Stadtraum (KiS) / Public Art on public ground in the City of Zurich
  • Source: Open-Data-Catalog of the City of Zürich
  • Link: https://data.stadt-zuerich.ch/dataset/kunst-im-stadtraum
  • Description: This dataset is a collection of Public Art Objects, which are in possession of the City of Zurich and stand on public ground. The information stored in this data are coming from the responsible departements «Kunst im öffentlichen Raum» and «Kunst und Bau». It contains basic information about these objects and the artists who created them. All objects are georeferenced as well.
  • Request by: Marco Sieber, Open-Data-Zürich-Team, Stadt Zürich

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Kunst im Stadtraum

Source: Open-Data-Catalog City of Zurich

Link: https://data.stadt-zuerich.ch/dataset/kunst-im-stadtraum

Description:

  • Attributes:
    • Titel: Title or description of the piece of art. If the official title is not known, there's a description within brackets [].
    • Künstler_IN : Artist.
    • Datierung : Date of creation of the piece of Art.
    • Gattung : Type of Art (e.g. fountain, installation, architectural sculpture, etc.)
    • Material_Technik : Material or technique used.
    • Standort : Description on where the object can be found.
    • ID: ID used for the objects. Is supposed to be stable.
    • Easting_WGS: longitude value in WGS84
    • Northing_WGS: latitude value in WGS84
Link: https://github.com/opendata-zurich/wikidata/blob/master/public_art/kunstimstadtraum.xlsx

Done: Marco Sieber

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

UNESCO Honorary and Goodwill Ambassadors

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: UNESCO Honorary and Goodwill Ambassadors

Source: UNESCO

Link: http://www.unesco.org/new/en/goodwill-ambassadors/

Description: A list of UNESCO Honorary and Goodwill Ambassadors

Link: https://docs.google.com/spreadsheets/d/1mZCj9ZYGxrzex-9IlEtHlFrXldU97cwrChz5Ym1GSfo/edit?usp=sharing

Done: Import data

To do: extract URLs

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Northern Ireland Sites and Monuments Record

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link: https://docs.google.com/spreadsheets/d/1QVlCe6qDJVPNjVHuk-rAaoZwQ5UKZf0xD2n5M8121zU/edit?usp=sharing

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Maliku Lama, Kanamit, Maliku, Pulang Pisau, Kalimantan Tengah, Indonesia

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017)

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (3. Quartal 2017)

Source: Statistisches Bundesamt

Link: https://www.destatis.de/DE/ZahlenFakten/LaenderRegionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/GVAuszugQ/AuszugGV3QAktuell.html;jsessionid=F39E8370DC8C1DA3F40804D32989D763.InternetLive1

Description: Alle politisch selbständigen Gemeinden mit ausgewählten Merkmalen am 30.09.2017 (all politically independent German municipalities with selected characteristics as of 30 September 2017)

Link: https://docs.google.com/spreadsheets/d/1HP_4IV-VHWtll0YGjKpUaThPYeXsq9u9vuILE36NViA/edit?usp=sharing

Done: all

To do:

Notes: Formatted the .xlsx into an easily readable .csv. Separated some columns. Reformatted numbers to the English decimal separator '.'. Added 'Name' and 'Titel' columns.

Structure: population (P1082), coordinate location (P625), postal code (P281), area (P2046) (a formatting sketch follows at the end of this workflow section)

Example item: Rahden (Q182979)

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:
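A rough Python sketch of how the formatted spreadsheet could be turned into QuickStatements for the properties listed under Structure. The column names are assumptions, the point in time (P585) is taken from the dataset's reference date (30.09.2017), and area (P2046) is left out because it needs a unit-aware quantity.

  import csv

  # Assumed columns after formatting: qid, population, lat, lon, postal_code
  with open("gemeinden_2017q3.csv", newline="", encoding="utf-8") as f, \
       open("quickstatements.txt", "w", encoding="utf-8") as out:
      for row in csv.DictReader(f):
          qid = row["qid"]
          # population (P1082) with the reference date of the dataset as point in time (P585)
          out.write(f'{qid}\tP1082\t{row["population"]}\tP585\t+2017-09-30T00:00:00Z/11\n')
          # coordinate location (P625) and postal code (P281)
          out.write(f'{qid}\tP625\t@{row["lat"]}/{row["lon"]}\n')
          out.write(f'{qid}\tP281\t"{row["postal_code"]}"\n')
          # area (P2046) would need the square-kilometre unit attached and is omitted from this sketch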

Discussion

CBDB

Workflow

Phase 1:

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:CBDB

Source: SQlite dump

Link: https://hu-my.sharepoint.com/personal/hongsuwang_fas_harvard_edu/_layouts/15/guestaccess.aspx?docid=07ade27f1bf524247b8b2295d04111975&authkey=AaFG1lZv0b8DCjUbfUG8uIU

Description: Large linked biographical information (the China Biographical Database)

Link:

Done:   Done

To do:

Notes: Already structured.

Structure:

Example item: Q720

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done: Imported most males in the database.

To do: More claims and links to other entities.

Notes:

Date complete:

Notes:

Discussion

Why did you execute this import without discussion? One million edits and 300,000 new items without useful content is not very constructive. Also, adding this data with an edit rate of 500 to 800 is harmful to the servers. It's also quite a violation of our bot policy. Sjoerd de Bruin (talk) 18:16, 18 December 2017 (UTC)[reply]

I usually don't get a response, and yes, I didn't start such a discussion, which I should have. I'm aware of your concerns now. As for the content: with a complicated dataset like this, it's fairly hard to build a working one-time pywikibot script without the items already existing, so I decided to import all of them first. I planned to import more linked information based on that. But it seems that the community doesn't allow such behaviour at all, and I don't know where to go from here. Since each entry has an index in CBDB, it certainly meets the notability criteria. As for server load, I didn't know about that at the beginning. What I thought was that PAWS would allow a faster import, but it was down, so I simply went for quick action. Anyway, it seems that I've made a mistake here. What should I do next to correct it? The data is still quite worth importing and hopefully we can both learn something.--Fantasticfears (talk) 19:44, 18 December 2017 (UTC)[reply]
A subgroup to experiment with would have been better. What will be added to the items in the next round? I hope date of birth and death, gender and profession at least? We do matching of Chinese artists to Wikidata, and this data makes it impossible to verify the right person. We have added https://www.wikidata.org/wiki/Q46346812 but we see 李琦 Li Qi many times in Wikidata without any info. Another question: Traditional Chinese is used as the language for the label and Chinese is not filled in. Is that correct? --Hannolans (talk) 13:53, 21 December 2017 (UTC)[reply]
Sorry about the late reply. I agree that a subgroup would be better, though I've already imported quite a lot. As for Li Qi, that's because there are several different people named Li Qi. CBDB has a lot of occupations in the database and some of them are not listed in Wikidata; in that case the occupation doesn't get imported. --Fantasticfears (talk) 12:10, 18 January 2018 (UTC)[reply]
  • There are two issues:
  1. Was our bot policy violated
  2. Should we keep the data
As far as (1) goes, technically it wasn't, and the core problem is that our bot policy doesn't speak about QuickStatements. To the extent that we want it to cover QuickStatements, we should likely amend it.
As far as (2) goes, I'm in favor of keeping the data, given that it's data from a high-quality source. When data about birth/death/floruit is added, disambiguation will be easier, and in addition the source dataset has family relations that are useful to have. ChristianKl 09:21, 22 December 2017 (UTC)[reply]
@ChristianKl: I appreciate that. Since every item has a link to CBDB, it's not hard to import more claims. I'd like to take it at a slower pace and start making a proposal for the bot.--Fantasticfears (talk) 12:10, 18 January 2018 (UTC)[reply]
@Fantasticfears: ok, now what? What are you going to do with these items? Please explain why you think these items are notable. That's not clear from your proposal. Multichill (talk) 21:32, 22 December 2017 (UTC)[reply]
I think the next step should be a proper bot proposal that explains what you want to do with the existing items. ChristianKl 14:01, 24 December 2017 (UTC)[reply]
I noticed that the bot created many items that are duplicates of existing items (e.g. Wang Yangming (Q45417762), Li Wenzhong (Q45484249), Hu Dahai (Q45485882)) when the existing items didn't have CBDB ID (P497). I manually merged dozens of them, and I guess there may be thousands more. I think after basic information are added from the database, we should use a bot to merge the duplicates based on some criteria (perhaps two items with the same name + date of birth + date of death could be determined as duplicates and should be merged).--Stevenliuyi (talk) 23:22, 24 December 2017 (UTC)[reply]
I guess the next step would be to add as much info about each person as possible. I am looking at Du Youlan (Q45728367) linked to [5]. We should get Chinese name, sex or gender (P21), country of citizenship (P27) and 種族部族 ethnic group (P172) (?). "Algorithmically generated index year" might be equivalent to floruit (P1317) with sourcing circumstances (P1480) = circa (Q5727902). I do not know if some of the people in the database have dates of birth and death or occupations. They would be useful for matching and detecting duplicates. Also family relations would be handy. --Jarekt (talk) 18:32, 27 December 2017 (UTC)[reply]
@Multichill: Per the notability criteria, it meets 2) it refers to an instance of a clearly identifiable conceptual or material entity, and 3) it fulfills some structural need. CBDB is a collection of all notable people involved in Chinese history; some of them may not be famous enough to meet Wikipedia's notability criteria, but they were still involved with notable people and CBDB has relationship data about them. Nevertheless, my code had some faults and added some less notable people to Wikidata. I would also like to remove those.
@Stevenliuyi: Wikidata doesn't prevent duplicate claims, and neither does pywikibot. Do we have another bot to remove those?
@Jarekt: I'll include that in the proposal.

--Fantasticfears (talk) 12:10, 18 January 2018 (UTC)[reply]

Given that all the data is well-sourced and describes clearly identifiable conceptual or material entities, I think that they should all have a place in Wikidata, but the forum for that discussion should be a bot request. ChristianKl 13:35, 18 January 2018 (UTC)[reply]
@Fantasticfears: you haven't edited this site for over a month. You just created a lot of nearly empty items on Wikidata of which some (or a lot) are duplicates. Please explain how you plan to expand these items in 2018 and contribute in a meaningful way to this project. Multichill (talk) 18:38, 18 January 2018 (UTC)[reply]
@Multichill: Sorry for the late response, though it doesn't make sense to focus on my personal life, mistakes and commitment to Wikidata; improvements should be the point going forward. I've mentioned this import to the CBDB authors. Here is the code for the last import. Since most people in CBDB have been imported and a bot is required, I'll continue this thread in a proposal.--Fantasticfears (talk) 12:28, 24 January 2018 (UTC)[reply]
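One possible way to surface the duplicate candidates discussed above (items sharing a label, date of birth and date of death with an item that carries CBDB ID (P497)) is a query along these lines; the matching criteria are only the ones suggested in the discussion, and the query would probably need narrowing to stay within query-service limits.

  import requests

  QUERY = """
  SELECT ?cbdbItem ?other WHERE {
    ?cbdbItem wdt:P497 ?cbdb ; rdfs:label ?name ; wdt:P569 ?born ; wdt:P570 ?died .
    ?other rdfs:label ?name ; wdt:P569 ?born ; wdt:P570 ?died .
    FILTER(?cbdbItem != ?other)
    FILTER NOT EXISTS { ?other wdt:P497 [] }
  }
  LIMIT 500
  """

  r = requests.get("https://query.wikidata.org/sparql",
                   params={"query": QUERY, "format": "json"})
  for binding in r.json()["results"]["bindings"]:
      print(binding["cbdbItem"]["value"], binding["other"]["value"])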

FAMCL

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

I am completely new to this, so please let me know about anything else that I need to do or any faux pas I might make. Peaceray (talk) 20:43, 24 January 2018 (UTC)[reply]

Directory of Open Access Journals

Moved to Wikidata:Dataset_Imports/Directory_of_Open_Access_Journals

Car models

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

  • Listing this here mainly because I don't plan to import this myself. Licensing tbd. Maybe just the list of models should be imported.
    --- Jura 20:12, 7 March 2018 (UTC)[reply]

UNESCO field offices

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

New York Times Obituaries

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name:

Source:

Link:

Description:

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Video Game Companies

Moved to Wikidata:Dataset Imports/Video Game Companies Jean-Fred (talk) 10:17, 30 May 2018 (UTC)[reply]

Anagraphical data of Italian schools

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: Anagraphical data of Italian schools

Source: Italian Ministry of Education, Universities and Research

Link: http://dati.istruzione.it/opendata/opendata/catalogo/elements1/?area=Scuole

Description: A dataset of Italian school data


Link: spreadsheet here

Done: Merged the 4 datasets of the source link. Added references to school records already present in Wikidata. Added references to Wikidata locations.

To do:

Notes: A lot of schools seem to be slightly different flavours of the same one. This happens because the same institute may offer different educational paths.

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

Just updated the spreadsheet. I'm open to suggestions about how to import the data. Floatingpurr (talk) 15:07, 17 April 2018 (UTC)[reply]

PubMed Central Articles

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: PMC

Source: PMC

Link: https://www.ncbi.nlm.nih.gov/pmc/

Description: 4.7 Million free full-text biomedical and life sciences articles from NIH/NLM

Link:

Done:

To do:

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

PubMed Central Journals

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: PubMed Central Journals

Source: National Center for Biotechnology Information (NCBI)

Link: https://www.ncbi.nlm.nih.gov/pmc/journals/

Description:

Link: missing or incomplete items

Done:

To do: Find which ones are missing and which ones are only missing P:P1055

Notes:

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done:

To do:

Notes:

Done:

To do:

Notes:

Date complete:

Notes:

Discussion

RNSR (Répertoire National des Structures de Recherche)

Workflow

Description of dataset Create and import data into spreadsheet Structure of data within Wikidata Format the data to be imported Match the data to existing data Importing data into Wikidata Date import complete and notes
Name: RNSR (Répertoire National des Structures de Recherche)

Source: Ministère de l'Enseignement Supérieur et de la Recherche Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/

Description: Identification data + some information on 6542 French research labs or structures

Link: https://data.enseignementsup-recherche.gouv.fr/explore/dataset/fr-esr-repertoire-national-structures-recherche/download/?format=csv&use_labels_for_header=true

Done: it's done

To do:

Notes: Additionally, I have checked the websites.

Structure:

Example item:

Done:

To do:

Done:

To do:

Notes:

Done: For a first try, I matched 67 labs or research centres with both a QID and a website.

To do:

Notes:

Done: 694 new French national research structure identifiers added.

To do: Double-check the remaining labs in the RNSR data, then create them in Wikidata.

Notes: I plan to import this myself.

Date complete: 12th April 2018

Notes:

Discussion

I intend to use QuickStatements. Can anybody explain the use of the "source property" ? OdileB (talk) 07:54, 29 March 2018 (UTC)[reply]

@OdileB: I am not a user of QuickStatements, but concerning sources you can have a look at Help:Sources. Snipre (talk) 09:08, 13 April 2018 (UTC)[reply]

Crossref Journals (moved)

Moved to Wikidata:Dataset Imports/Crossref journals

GNIS Domestic (moved)

Moved to Wikidata:Dataset Imports/Geographic Names Information System (GNIS) Domestic