Wikidata:Dataset Imports/Higher Education Institutions of Brazil (2011)


Guidelines for using this page

Documenting the import

  • Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
  • Please include notes on all steps of the process.
  • Once a dataset has been imported into Wikidata please edit the page to change the progress status from in progress to complete.
  • It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.

Creating a Wikidata item for the dataset

  • Please create a Wikidata item for the dataset. This will allow us to improve the coverage of datasets on Wikidata and to understand which datasets are available on the topic and which of them have been added to Wikidata.
  • If you are working with a very large dataset, you can break it into smaller Mix n' Match catalogues, but create only one Wikidata item.
  • Link the dataset Wikidata item to this page using Wikidata Dataset Imports URL (P5195).

Getting help

  • If your dataset import runs into issues, please edit the page to change the progress status from in progress to help needed.
  • You can ask for help on Wikidata:Project chat.

Overview

Dataset name

Higher Education Institutions of Brazil (2011)

Source

Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP)

Link

http://dados.gov.br/dataset/instituicoes-de-ensino-superior

Dataset description

Information about the 2365 institutions present in the 2011 registry of higher education institutions.

Additional information

The latest available dataset is from 2011, although more recent information from 2014 already exists.

INEP will probably update this particular dataset only upon request.

Progress of import

The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet - you can change some column headings or add new columns as required to best describe the progress of this import.

Wikidata item for the dataset | Import data into spreadsheet | Format the spreadsheet to import the data | Structure of data within Wikidata | Importing data into Wikidata | Visualisations
list of Higher Education Institutions of Brazil (Q56599716) | Link: Original Dataset; Link: Structured for Wikidata

Done:

- Converted labels to title case, except for a few common stopwords (o, a, os, as, de, do, da, dos, das, para)

- Reconciled municipalities (most were automatic, a few manual matches, fairly certain of quality)

- Reconciled municipality/state/country as source of income, no conflicts with previous matches

- Standardized website URLs and removed invalid ones

- Retrieved coordinates by combining the Google Geocode API on addresses with the Google Places API on organization names, accepting a 1 km error and prioritizing the Places response

- Generated short descriptions

- Substituted original columns with their properties/statements values

- Names will be imported as rdfs:label, and short names as both rdfs:altLabel and short name (P1813)

- Descriptions are generated from a summary of NOMEORG, REDE and DEPADM5, in Portuguese

- Items are going to be instance of (P31) either university (Q3918) or the more general higher education institution (Q38723), depending on NOMEORG

- Items are going to be instance of (P31) either public university (Q875538) or private university (Q902104), depending on REDE

- Items are going to be instance of (P31) private not-for-profit educational institution (Q23002054), depending on DEPADM5

- Municipalities will be imported as located in the administrative territorial entity (P131)

- Addresses will be translated into coordinate location (P625)

- Sanitized website URLs will be added as official website (P856)
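The instance of (P31) rules listed above could be sketched roughly as follows. This is only an illustration, not the procedure actually used for the import: the function name and the literal column values ("Universidade", "Pública", "Privada", "sem fins lucrativos") are assumptions about how NOMEORG, REDE and DEPADM5 are encoded in the spreadsheet. Only the item IDs are taken from this page.

```python
# Item IDs as listed on this page.
UNIVERSITY = "Q3918"
HIGHER_EDUCATION_INSTITUTION = "Q38723"
PUBLIC_UNIVERSITY = "Q875538"
PRIVATE_UNIVERSITY = "Q902104"
PRIVATE_NOT_FOR_PROFIT = "Q23002054"


def instance_of_values(nomeorg, rede, depadm5):
    """Return the P31 values for one row (hypothetical helper).

    The string comparisons below are assumed encodings of the
    source columns, not values confirmed by the dataset.
    """
    values = []

    # NOMEORG: university vs. the more general higher education institution.
    if nomeorg.strip().lower() == "universidade":
        values.append(UNIVERSITY)
    else:
        values.append(HIGHER_EDUCATION_INSTITUTION)

    # REDE: public vs. private network.
    rede_norm = rede.strip().lower()
    if rede_norm == "pública":
        values.append(PUBLIC_UNIVERSITY)
    elif rede_norm == "privada":
        values.append(PRIVATE_UNIVERSITY)

    # DEPADM5: private not-for-profit institutions.
    if "sem fins lucrativos" in depadm5.lower():
        values.append(PRIVATE_NOT_FOR_PROFIT)

    return values
```

For example, `instance_of_values("Faculdade", "Privada", "Privada sem fins lucrativos")` would yield Q38723, Q902104 and Q23002054 under these assumptions.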

Wikidata:Requests for permissions/Bot/GupyBotProWD profile
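The title-casing step listed under "Done" above can be illustrated with a short sketch. The stopword list is the one given on this page. The function name is hypothetical, and always capitalizing the first word is an assumption, since the page does not say how a leading stopword was handled. The actual conversion may well have been done differently, for instance inside OpenRefine.

```python
# Stopwords kept in lower case, as listed on this page.
STOPWORDS = {"o", "a", "os", "as", "de", "do", "da", "dos", "das", "para"}


def title_case_label(name):
    """Title-case a label, leaving common Portuguese stopwords lower case.

    Assumption: the first word is always capitalized, even if it is
    a stopword. Hyphenated words and acronyms are not treated specially.
    """
    words = name.lower().split()
    result = []
    for index, word in enumerate(words):
        if index > 0 and word in STOPWORDS:
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(title_case_label("UNIVERSIDADE FEDERAL DO RIO DE JANEIRO"))
```

Acronyms such as short names would be lower-cased by this sketch, so they would need separate handling.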

Edit history

Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.

Date | Description | Method | Properties | Qualifiers | References | Statements added | Statements removed | Link to import sheet

Discussion of import

These headings are generally useful, please change this section to suit your needs.

Import completion notes

The data was imported successfully with OpenRefine, but batch editing failed a few times, so the dataset had to be split into smaller batches. These failures suggest that some reconciliation matches conflict with each other. Such conflicts are few, probably affecting at most 10 entities, and are candidates for hand curation afterwards. That curation can probably be done by investigating constraint violations, such as items with multiple "official website" statements.
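One possible way to look for such items is a SPARQL query that lists institutions with more than one official website (P856) value. The sketch below only builds the query and a request URL for the public Wikidata Query Service; it makes no network call. The helper name is hypothetical, and restricting by country (P17) to Brazil (Q155) is an assumption, since this page does not state that the imported items carry a country statement.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Items typed with the classes used in this import that have
# more than one distinct official website (P856) value.
# Assumption: imported items have country (P17) = Brazil (Q155).
QUERY = """
SELECT ?item (COUNT(DISTINCT ?site) AS ?sites) WHERE {
  VALUES ?type { wd:Q3918 wd:Q38723 }
  ?item wdt:P31 ?type ;
        wdt:P17 wd:Q155 ;
        wdt:P856 ?site .
}
GROUP BY ?item
HAVING (COUNT(DISTINCT ?site) > 1)
"""


def build_query_url(query):
    """Return a GET URL for the query, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = build_query_url(QUERY)
```

The resulting URL can be opened with any HTTP client. The list of items returned would then be the starting point for manual review.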

Visualisations

Maintenance

Queries and expected results

Query link | Description | Expected results

Schedule of new data released