In the following we try to solve the problem of maintaining a subset of Wikidata.
Bot maintainers can either dump an external database/ontology into Wikidata and let it rot (external data WILL change), or they can build infrastructure to keep Wikidata in sync with the external data. For curators this means that curation tasks on that data need to be (semi-)automated, because they depend on it. Taken together, we are dealing with a pipeline (Q2165493) based on a directed acyclic graph (Q1195339) (DAG) that has (several) external sources and one internal source, namely the current state of the WD subset. Since some of the processes are semiautomatic, the whole maintenance problem cannot be solved by a single cloud-based process. Usually the import is done by a bot, and curation uses SPARQL queries as input for its subprocesses. These queries are not even necessary if there is a single maintenance application that holds a copy of the WD subset.
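As a concrete sketch of a curation subprocess consuming SPARQL results (all names and data here are hypothetical, and the live call to the Wikidata Query Service is replaced by mocked JSON bindings in the standard SPARQL results format):

```python
# Hypothetical curation subprocess: compare an external identifier as
# stored in the WD subset (obtained via a SPARQL query) against the
# current state of the external source, and flag items out of sync.

def flag_stale_items(sparql_bindings, external_lookup):
    """Yield (qid, wd_value, ext_value) for items whose external
    identifier in Wikidata no longer matches the external source."""
    for b in sparql_bindings:
        qid = b["item"]["value"].rsplit("/", 1)[-1]
        wd_value = b["extId"]["value"]
        ext_value = external_lookup.get(qid)
        if ext_value is not None and ext_value != wd_value:
            yield qid, wd_value, ext_value

# Mocked WDQS result bindings instead of a live query:
bindings = [
    {"item": {"value": "http://www.wikidata.org/entity/Q42"},
     "extId": {"value": "OLD-1"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q64"},
     "extId": {"value": "ID-2"}},
]
external = {"Q42": "NEW-1", "Q64": "ID-2"}

stale = list(flag_stale_items(bindings, external))
# stale == [("Q42", "OLD-1", "NEW-1")]
```

A real subprocess would fetch the bindings from the WDQS endpoint and feed the flagged items to a curator or a bot; the comparison logic stays the same.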
Trying to fully automate semiautomatic processes requires constant feedback, which changes the subprocess code and therefore its output. So, regardless of whether we have a single application or a collection of scripts, the interdependencies between subprocesses need to be defined, and they will have the topology of a DAG.
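Such a dependency definition can be kept very small. A minimal sketch, using Python's standard-library `graphlib` and entirely hypothetical subprocess names, shows how the DAG yields a valid execution order for the pipeline:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph of maintenance subprocesses:
# each key runs after the subprocesses it maps to. The two sources
# mirror the text: external data plus the current WD subset state.
deps = {
    "import_bot":     {"external_dump", "wd_subset_state"},
    "curation_query": {"wd_subset_state"},
    "merge_check":    {"import_bot", "curation_query"},
    "report":         {"merge_check"},
}

# static_order() raises CycleError if the graph is not a DAG,
# so the topology requirement is checked for free.
order = list(TopologicalSorter(deps).static_order())
```

The sources (`external_dump`, `wd_subset_state`) appear first in the order and `report` last; any scheduler, from a cron-driven script collection to a single application, can execute the subprocesses in that sequence.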