Wikidata:WikiProject PersonalData/Jupyter wikidata

Context

Wikidata is all about federated production of data.
Jupyter has progressively focused on federated production of data analysis pipelines.

In the context of research:

Jupyter is very much associated with good practices in Open Science.
Wikidata has the Wikidata for Research project.

This project aims to work on a better integration between the two.

Past work

Unsuccessful "Wikidata for research" grant application

In 2014-2015, several members of the Wikidata community jointly applied to an EU infrastructure call. They published their proposal, whose Wikidata entry has the list of authors, affiliations, and Wikipedia handles.

Successful OpenDreamKit application

In response to the same call, a group of mathematicians and some computer scientists applied for and obtained funding for the OpenDreamKit project. The project proposal is visible here. Work Package 6, in particular, concerns the structuring of Data, Knowledge and Software through explicit semantics, for more efficient science. See the publication (arXiv), which also has a Wikidata entry: Interoperability in the OpenDreamKit Project: The Math-in-the-Middle Approach (Q57389301).

A substantial part of OpenDreamKit concerns Jupyter.

PAWS

Wikimedia itself has identified the usefulness of Jupyter, and launched PAWS. The motivation is described here.

OpenHumans

Tim Head of WildTreeTech has worked with OpenHumans to implement Personal Data Notebooks, which help individuals run Jupyter notebooks on their own data.

A growing collection of notebooks is here.

Possible future work

BOSSEE

As a follow-up to OpenDreamKit, a project proposal is currently being written for a new infrastructure call. The project is called BOSSEE and is centered around Jupyter. The brainstorm is here. At the moment BOSSEE mostly involves previous participants in OpenDreamKit, but also Tim Head of WildTreeTech.

PersonalData.IO

A lot of open science is concerned with reproducible data analysis pipelines. This often means that the analysis pipeline can be redeployed in completely different contexts. One of the participants in OpenDreamKit, Paul-Olivier Dehaye, has gone on to found a nonprofit, PersonalData.IO, focused on personal data empowerment. He is also on the board of MyData, a global organisation focused on those topics (as is Mad Ball, who leads OpenHumans). He thinks that these analysis pipelines, and the associated infrastructure, can be useful for redistributing power in the personal data economy, and that Wikidata and Jupyter are part of a solution because they offer possibilities for federating both data and analysis.

Ideas

Any step that would contribute to a more flexible integration of Wikidata and Jupyter.

Concretely:

congruent deployment of integrated Wikidata and Jupyter, à la PAWS, at all scales (Wikidata + Jupyter in the enterprise, and eventually for the individual)
modeling of a federated Wikidata for state in a Jupyter notebook (useful when the notebook is deployed through Binder, for instance)
modeling of a federated Jupyter for data processing operations as a Wikidata-formalized workflow
modular notebook generation based on data stored in Wikidata (first cell from here, second cell from there, etc.; for instance data cleaning, the core data use, and the data resharing)
modularized Binder along the same lines
possible integration of Solid with Jupyter (as an add-on)
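
As an illustration of the modular notebook generation idea above, here is a minimal sketch in Python. It assumes that the source of each cell has already been retrieved from somewhere (in the idea above, from Wikidata); the FRAGMENTS dictionary, the stage names and the snippets in it are hypothetical placeholders, not an existing Wikidata schema or PAWS feature. The sketch only assembles a notebook in the Jupyter nbformat 4 JSON layout and performs no network access.

```python
import json

# Hypothetical stand-in for cell sources that would be looked up from
# Wikidata items; the stage names and snippets are illustrative assumptions.
FRAGMENTS = {
    "clean": "rows = [r.strip() for r in raw if r.strip()]",
    "use": "count = len(rows)",
    "reshare": "summary = {'n_rows': count}",
}


def code_cell(source):
    """Return a minimal nbformat 4 code cell holding the given source."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(stages, fragments=FRAGMENTS):
    """Assemble a notebook dict from the named stages, in the given order."""
    return {
        "cells": [code_cell(fragments[name]) for name in stages],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# First cell from one fragment, second from another, and so on.
notebook = build_notebook(["clean", "use", "reshare"])
notebook_json = json.dumps(notebook, indent=1)
```

Writing notebook_json to a file with the .ipynb extension should give a notebook that Jupyter can open. Swapping or reordering the stage names changes the generated notebook without touching the fragments themselves, which is the modularity the idea is after.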