User:ProteinBoxBot/2020 complex portal
Overall summary
Build a bot that creates a Wikidata item for each Complex Portal entry.
There are already 22 existing entries that should act as examples. 11 of these are for SARS-CoV-2 and were created during the virtual COVID-19 BioHackathon in April 2020 using OpenRefine followed by some manual curation. Preliminary ShEx schemas were also developed (see below).
Considerations
- Update methods: Complex Portal releases occur roughly every 2 months
- Location: Wikidata or EBI end?
Status
Kickoff meeting
We had an initial kickoff meeting (minutes). Moving forward:
- Complex Portal is available under a CC BY 4.0 license. We assume that, since we are not importing all of Complex Portal but creating references and pointers to the original content, it is eligible for inclusion in Wikidata (see the EBI Terms of Use).
- The bot will be managed by the Complex Portal team, but built by the members of this sprint group
- One of the next steps is to finalize the semantic model (Entity Schema)
- This semantic model will then drive the bot development, which will be in Python and hosted primarily on GitHub.
Participants
Gameplan
- Define and write up when two items are the same, which is needed to determine whether a new item needs to be created (done)
- Update EntitySchema for Macromolecular complex & Complex Portal entity (Andra/Jose)
- Create a draft bot to populate Wikidata with information from Complex Portal (done)
- Run the bot on a single complex: CPX-5742 SARS-CoV-2 polymerase complex ("missing" SARS-CoV-2 complex) (done)
- Adapt the bot to handle other complexes: first other coronavirus complexes, then yeast (as a publication is in preparation)
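The first gameplan step, deciding whether an item already exists before creating a new one, can be sketched as below. This is a minimal illustration rather than the project's actual rule: it assumes that two items count as the same when they share a Complex Portal accession ID (P7718), that accessions look like `CPX-5742`, and that the public Wikidata Query Service is used for the lookup. The function names and the create/update/review policy are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_lookup_query(accession: str) -> str:
    """Return a SPARQL query for items carrying this Complex Portal accession (P7718)."""
    # Assumption: accessions follow the CPX-<digits> pattern seen in CPX-5742.
    if not re.fullmatch(r"CPX-\d+", accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f'SELECT ?item WHERE {{ ?item wdt:P7718 "{accession}" . }}'


def extract_qids(sparql_json: dict) -> list:
    """Pull the QIDs out of a standard SPARQL JSON result document."""
    return [
        row["item"]["value"].rsplit("/", 1)[-1]
        for row in sparql_json["results"]["bindings"]
    ]


def decide_action(qids: list) -> str:
    """Hypothetical policy: create on no match, update on one, flag anything else."""
    if not qids:
        return "create"
    if len(qids) == 1:
        return "update"
    return "review"


def find_existing(accession: str) -> list:
    """Query the endpoint (needs network access) and return matching QIDs."""
    params = {"query": build_lookup_query(accession), "format": "json"}
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "complex-portal-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_qids(json.load(response))
```

Calling `decide_action(find_existing("CPX-5742"))` would then tell a bot run whether to create, update, or hand the entry to a curator. Keeping the query building and result parsing separate from the network call makes the matching rule easy to test offline.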
Properties
Property label | Property ID |
---|---|
instance of (P31) | P31 |
found in taxon (P703) | P703 |
has part(s) (P527) | P527 |
.. | .. |
Property label | Property ID |
---|---|
Complex Portal accession ID (P7718) | P7718 |
RNACentral ID (P8697) | P8697 |
.. | .. |
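As a rough illustration of how the properties in these tables might be combined for one complex, the sketch below maps a simplified entry to a list of (property, value) pairs. The `ComplexEntry` class, its fields, and `to_statements` are hypothetical, and the QIDs are passed in as placeholders rather than looked up; the real bot's data model may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexEntry:
    """Hypothetical, simplified view of one Complex Portal entry."""
    accession: str   # Complex Portal accession, e.g. "CPX-5742"
    taxon_qid: str   # Wikidata item of the organism (placeholder value)
    component_qids: list = field(default_factory=list)  # items of the parts


def to_statements(entry: ComplexEntry, complex_class_qid: str) -> list:
    """Map an entry to (property ID, value) pairs using the properties listed above."""
    statements = [
        ("P31", complex_class_qid),    # instance of
        ("P703", entry.taxon_qid),     # found in taxon
        ("P7718", entry.accession),    # Complex Portal accession ID
    ]
    # One has part(s) statement per component.
    statements += [("P527", qid) for qid in entry.component_qids]
    return statements
```

For example, `to_statements(ComplexEntry("CPX-5742", "Q_TAXON", ["Q_PART_A", "Q_PART_B"]), "Q_CLASS")` yields five pairs, with the placeholder strings standing in for real QIDs.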
Proposed
Entity Schema
- E186 Macromolecular complex
- E194 Complex Portal entity
- Complex Portal accession ID (P7718)
Bot development
In progress
Example complexes
- SARS-CoV-2 primase complex (Q90012271) - manually curated after OpenRefine import (SARS-CoV-2 primase complex)
- Pyruvate dehydrogenase E1 heterotetramer (Q50265809) - created by pathwaybot (Pyruvate dehydrogenase E1 heterotetramer (human))
- Mitochondrial respiratory chain complex I (Q50265911) - created by pathwaybot (Mitochondrial respiratory chain complex I)
Example non-coding RNA
- long non-coding RNA NONMMUT046978.2 (Q99841998) - created by andrawaag and bmeldal for property proposal Wikidata:Property_proposal/Natural_science#RNACentral_ID
Results
In progress
WikiPathways SPARQL query to list yeast complexes
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT (STR(?label) AS ?complex) ?wpIdentifier ?pathway ?page
WHERE {
  ?complex a wp:Complex ;
           dct:isPartOf ?pathway .
  OPTIONAL { ?complex rdfs:label ?label }
  ?pathway dc:title ?title ;
           foaf:page ?page ;
           dc:identifier ?wpIdentifier ;
           wp:organismName "Saccharomyces cerevisiae"^^xsd:string .
}
ORDER BY ?wpIdentifier
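A query like this could be run from Python roughly as follows. This is a sketch using only the standard library; the endpoint URL is an assumption that should be checked against the WikiPathways documentation, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint URL; verify before relying on it.
WIKIPATHWAYS_ENDPOINT = "https://sparql.wikipathways.org/sparql"


def build_request(query: str, endpoint: str = WIKIPATHWAYS_ENDPOINT) -> urllib.request.Request:
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def flatten(sparql_json: dict) -> list:
    """Turn SPARQL JSON bindings into plain dicts; unbound variables are omitted."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in sparql_json["results"]["bindings"]
    ]


def run_query(query: str) -> list:
    """Send the query (needs network access) and return one dict per result row."""
    with urllib.request.urlopen(build_request(query)) as response:
        return flatten(json.load(response))
```

Passing the query text above to `run_query` would return one dict per row, with the keys `wpIdentifier`, `pathway` and `page`, plus `complex` wherever the optional label is bound.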