Wikidata:Dataset Imports/Higher Education Institutions of Brazil (2011)
Guidelines for using this page
Documenting the import
- Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
- Please include notes on all steps of the process.
- Once a dataset has been imported into Wikidata please edit the page to change the progress status from in progress to complete.
- It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.
Creating a Wikidata item for the dataset
- Please create a Wikidata item for the dataset. This will allow us to improve the coverage of datasets on Wikidata and to understand which datasets are available on a topic and which of them have been added to Wikidata.
- If you are working with a very large dataset, you can break it into smaller Mix'n'match catalogues, but create only one Wikidata item.
- Link the dataset Wikidata item to this page using Wikidata Dataset Imports page (P5195)
- If your dataset import runs into issues please edit the page to change the progress status from in progress to help needed.
- You can ask for help on Wikidata:Project chat.
Higher Education Institutions of Brazil (2011)
Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira (INEP)
Information about the 2365 institutions present in the 2011 registry of higher education institutions.
The latest published dataset is from 2011, although up-to-date information for 2014 already exists.
INEP will probably update this particular dataset only upon request.
Progress of import
The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet - you can change some column headings or add new columns as required to best describe the progress of this import.
|Wikidata item for the dataset||Import data into spreadsheet||Format the spreadsheet to import the data||Structure of data within Wikidata||Importing data into Wikidata||Visualisations|
|list of Higher Education Institutions of Brazil (Q56599716)||Link: Original Dataset||Link: Structured for Wikidata
- Converted labels into titlecase except a few common stopwords (o, a, os, as, de, do, da, dos, das, para)
- Reconciled municipalities (most were automatic, a few manual matches, fairly certain of quality)
- Reconciled municipality/state/country as source of income, no conflicts with previous matches
- Standardized website URLs and removed invalid ones
- Retrieved coordinates by combining the Google Geocoding API (on addresses) with the Google Places API (on organization names), accepting a 1 km error and prioritizing the Places response
- Generated short descriptions
- Substituted original columns with their properties/statements values
|- Names will be imported as rdfs:label and short names as rdfs:altLabel and short name (P1813)
- Descriptions will be generated in Portuguese as a summary of NOMEORG, REDE and DEPADM5
- Items will be instance of (P31) a class such as private not-for-profit educational institution (Q23002054), chosen according to DEPADM5
- Municipalities will be imported as located in the administrative territorial entity (P131)
- Addresses will be translated into coordinate location (P625)
- Sanitized website URLs will be added as official website (P856)
|Wikidata:Requests for permissions/Bot/GupyBot||ProWD profile|
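The formatting steps listed in the table above (title-casing labels, sanitizing website URLs, reconciling the two coordinate sources) can be sketched in Python as follows. This is only an illustration, not the code used for the import: the function names are hypothetical, capitalizing a stopword when it is the first word is an assumption, defaulting bare hosts to http is an assumption, and `pick_coordinates` is just one possible reading of "accepting a 1 km error and prioritizing the Places response".

```python
import math
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Stopwords listed in the formatting notes; they stay lowercase inside a label.
STOPWORDS = {"o", "a", "os", "as", "de", "do", "da", "dos", "das", "para"}

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def titlecase_label(raw: str) -> str:
    """Title-case a label, keeping the listed stopwords in lowercase."""
    out = []
    for i, word in enumerate(raw.lower().split()):
        if i > 0 and word in STOPWORDS:
            out.append(word)  # assumption: only non-initial stopwords stay lowercase
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)


def sanitize_url(raw: str) -> Optional[str]:
    """Return a normalized http(s) URL, or None when the value looks invalid."""
    value = raw.strip()
    if not value or " " in value:
        return None
    if "://" not in value:
        value = "http://" + value  # assumption: bare hosts default to http
    try:
        parts = urlsplit(value)
    except ValueError:
        return None
    host = parts.netloc.lower()
    if parts.scheme not in ("http", "https") or "." not in host:
        return None
    # Lowercase the host and drop any fragment; keep path and query as given.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_coordinates(places: Optional[LatLon], geocode: Optional[LatLon],
                     max_km: float = 1.0) -> Optional[LatLon]:
    """Prefer the Places result when both sources agree within max_km."""
    if places is None or geocode is None:
        return places or geocode  # assumption: a single source is accepted as is
    if haversine_km(places, geocode) <= max_km:
        return places
    return None  # assumption: disagreements are left for manual review
```

For example, `titlecase_label("UNIVERSIDADE FEDERAL DO RIO DE JANEIRO")` gives `Universidade Federal do Rio de Janeiro`, and `sanitize_url("n/a")` gives `None`.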
Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.
|Date||Description||Method||Properties||Qualifiers||References||Statements added||Statements removed||Link to import sheet|
Discussion of import
These headings are generally useful; please change this section to suit your needs.
Import completion notes
The data was imported successfully with OpenRefine, but batch editing failed a few times, so the dataset had to be split into smaller batches. The failures suggest that a few reconciliation matches conflict, probably affecting at most 10 entities. These cases call for manual curation afterwards, which can probably be done by investigating constraint violations such as the presence of multiple "official website" statements.
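As a rough illustration of the two ideas in these notes, the Python sketch below splits rows into smaller batches and flags items that would end up with more than one distinct official website. It is not the procedure used for this import: the row keys `qid` and `website` and both function names are hypothetical, and the rows are assumed to come from an export of the reconciled spreadsheet.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence


def split_batches(rows: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def multiple_websites(rows: Sequence[dict]) -> Dict[str, List[str]]:
    """Map each matched item to its websites when it has more than one distinct value."""
    sites = defaultdict(set)
    for row in rows:
        # Hypothetical keys: "qid" is the reconciled item, "website" the sanitized URL.
        if row.get("qid") and row.get("website"):
            sites[row["qid"]].add(row["website"])
    return {qid: sorted(urls) for qid, urls in sites.items() if len(urls) > 1}
```

Items returned by `multiple_websites` are candidates for the manual review described above, since two source rows matched to the same item usually point to a conflicting reconciliation match.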