User:ProteinBoxBot/2020 complex portal

Overall summary

Build a bot that creates a Wikidata item for each Complex Portal entry.

22 entries already exist and should serve as examples. 11 of these are for SARS-CoV-2 and were created during the virtual Covid-19 BioHackathon in April 2020 using OpenRefine, followed by some manual curation. Preliminary ShEx schemas were also developed (see below).

Considerations

  • update methods - Complex Portal releases are roughly every 2 months
  • location: Wikidata or EBI end?

Status

Kickoff meeting

We had an initial kickoff meeting (minutes). Moving forward:

  • Complex Portal is available under a CC BY 4.0 license. We assume that since we are not importing all of Complex Portal, but creating references and pointers to the original content, it is eligible for inclusion in Wikidata (see the EBI Terms of Use).
  • The bot will be managed by the Complex Portal team, but built by the members of this sprint group.
  • One of the next steps is to finalize the semantic model (Entity Schema).
  • This semantic model will then drive the bot development, which will be in Python and hosted primarily on GitHub.

Participants

Gameplan

  • Define and write up when two items are the same, which is needed to determine whether a new item has to be created (done)
  • Update EntitySchema for Macromolecular complex & Complex Portal entity * Andra/Jose *
  • Create a draft bot to populate Wikidata with information from Complex Portal (done)
  • Run the bot on a single complex: CPX-5742 SARS-CoV-2 polymerase complex ("missing" SARS-CoV-2 complex) (done)
  • Adapt the bot to handle other complexes: first other coronavirus complexes, then yeast (a publication is in preparation)
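The "are two items the same?" step in the list above can be sketched in Python. This is only an illustration under stated assumptions, not the agreed definition: it assumes that two items are the same when they share a Complex Portal accession ID (P7718), and that accessions look like CPX-5742. The function names build_lookup_query and decide_action are hypothetical. The query relies on the wdt: prefix that the Wikidata Query Service predefines.

```python
import re
from typing import Dict, List, Tuple

# Assumed accession format, inferred from the example CPX-5742.
ACCESSION_RE = re.compile(r"^CPX-\d+$")


def build_lookup_query(accession: str) -> str:
    """Return a SPARQL query that finds items carrying this P7718 value."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return f'SELECT ?item WHERE {{ ?item wdt:P7718 "{accession}" . }}'


def decide_action(bindings: List[Dict]) -> Tuple[str, List[str]]:
    """Map SPARQL JSON bindings for ?item to an action and the matching QIDs.

    No match means a new item is needed, one match means an update,
    and several matches are flagged for manual review.
    """
    # Entity URIs end in the QID, so keep only the last path segment.
    qids = sorted({b["item"]["value"].rsplit("/", 1)[-1] for b in bindings})
    if not qids:
        return "create", qids
    if len(qids) == 1:
        return "update", qids
    return "review", qids
```

Keeping the decision separate from the network call means the rule can be tested offline and swapped out once the written definition of sameness is final.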

Properties

Statements
Property label | Property ID
instance of | P31
found in taxon | P703
has part(s) | P527
.. | ..

Identifiers
Property label | Property ID
Complex Portal accession ID | P7718
RNACentral ID | P8697
.. | ..
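As a rough illustration of how the properties above could be applied to one complex, the following Python sketch turns a record into a flat list of (property, value) pairs. The ComplexRecord class and to_claims function are hypothetical names, and the QIDs are left as caller-supplied values because the actual target items are defined by the Entity Schema, not here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ComplexRecord:
    """Minimal, assumed view of one Complex Portal entry."""
    accession: str                  # e.g. "CPX-5742"
    class_qid: str                  # value for instance of (P31)
    taxon_qid: str                  # value for found in taxon (P703)
    component_qids: List[str] = field(default_factory=list)  # has part(s) (P527)


def to_claims(record: ComplexRecord) -> List[Tuple[str, str]]:
    """List the (property ID, value) pairs the bot would write for a record."""
    claims = [("P31", record.class_qid), ("P703", record.taxon_qid)]
    # One has part(s) statement per component of the complex.
    claims += [("P527", qid) for qid in record.component_qids]
    # The accession is the external identifier linking back to Complex Portal.
    claims.append(("P7718", record.accession))
    return claims
```

A plain list of pairs is independent of any particular Wikidata client library, so the same mapping could feed whichever one the bot ends up using.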

Proposed

Entity Schema

Bot development

In progress

Example complexes

Example non-coding RNA

Results

In progress

WikiPathways SPARQL query to list yeast complexes

PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dct:     <http://purl.org/dc/terms/>
PREFIX foaf:    <http://xmlns.com/foaf/0.1/> 
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

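# The label is projected as ?complexLabel because SPARQL 1.1 does not allow
# AS to reuse ?complex, which is already bound in the WHERE clause.
SELECT DISTINCT (STR(?label) AS ?complexLabel) ?wpIdentifier ?pathway ?page
WHERE {
  # Every complex node and the pathway it belongs to
  ?complex a wp:Complex ;
           dct:isPartOf ?pathway .
  # Labels are optional, so unlabelled complexes are still listed
  OPTIONAL { ?complex rdfs:label ?label }
  # Restrict to yeast pathways
  ?pathway dc:title ?title ;
           foaf:page ?page ;
           dc:identifier ?wpIdentifier ;
           wp:organismName "Saccharomyces cerevisiae"^^xsd:string .
} ORDER BY ?wpIdentifier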
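
Because the label is inside an OPTIONAL block, some result rows will lack it. A minimal Python sketch for consuming the response, assuming the endpoint returns the standard SPARQL 1.1 JSON results format, could look like this. The function name rows_from_sparql_json is hypothetical, and fetching the response from the endpoint is left out.

```python
from typing import Any, Dict, List, Optional


def rows_from_sparql_json(payload: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Flatten SPARQL JSON results into one dict per row.

    Every variable named in head.vars appears in every row; variables
    left unbound (for example by an OPTIONAL block) map to None.
    """
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # Unbound variables are simply absent from the binding object.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows
```

Filling unbound variables with None keeps every row the same shape, which makes it easier to load the results into a table afterwards.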

Scholia aspect patch