User:Fnielsen/Open science data in wikis with data mining

Open science data in wikis with data mining: representing scientific data in wikis for large-scale meta-analysis, with the goal of presenting an overview of qualitative results across all of science.

Ten hundred words summary

Imagine numbers from all studies presented in books and papers put into a computer and carefully put up so everyone from all over the world quickly can see them. So everyone can add new numbers by themselves. So everyone can show the numbers against each other to see if they agree or not between studies. We will study ways to make this possible. We start from brain studies and take out numbers from brain study papers and put them into a computer store. We need to find a way to do this fast and do it exactly. We need to decide on a way to handle the numbers in the computer so taking numbers in and out of the store is easy and the form is easy for others to understand. We need to find a way to make the computer understand whether the studies agree or not. And a way to find what is the cause when they do not agree. And finally we need to find a way to show whether the numbers agree or not to people from the whole world sitting at their computers.

(should validate with http://splasho.com/upgoer5/)

Stages

Collection
Identification of studies, extraction of data and entry into the system. This is related to the StrepHit proposal, Magnus Manske's SourceMD and QuickStatements tools, and the Primary Sources tool.
Representation
Data should be represented succinctly, both for ease of entry and for further query and analysis. A meta-analysis ontology is wanted. Physical units (e.g., milliliter) should be supported. This stage could easily be regarded as trivial (a typical meta-analysis is often represented in a simple spreadsheet), but it is important to get this right in order to obtain a flexible and extensible format.
Selection
Research on how data is selected. Can predefined queries retrieve all relevant sets of data for a meta-analysis? Would a user/researcher need an interactive interface to select and deselect items?
Analysis
Implementation of various statistical meta-analysis methods from the literature. Research on new statistical methods for analysis of multiple sets of meta-analysis results, e.g., unsupervised factorization methods.
Aggregation
Research on various means to aggregate meta-analysis results, particularly beyond funnel and forest plots.
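
As an illustration of the Analysis stage, here is a minimal sketch of one standard method from the literature: inverse-variance fixed-effect pooling, together with Cochran's Q as a rough indicator of whether the studies agree. The record type `EffectSize`, the function name and the three example studies are hypothetical and only serve the illustration; they are not taken from any of the systems mentioned on this page.

```python
import math
from dataclasses import dataclass


@dataclass
class EffectSize:
    """One study's effect size and sampling variance (hypothetical record)."""
    study: str
    effect: float
    variance: float


def fixed_effect_pool(effects):
    """Return the inverse-variance pooled effect, its standard error and Cochran's Q."""
    weights = [1.0 / e.variance for e in effects]
    total_weight = sum(weights)
    pooled = sum(w * e.effect for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    # A Q much larger than (number of studies - 1) hints at disagreement.
    q = sum(w * (e.effect - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, standard_error, q


# Invented numbers, for illustration only.
studies = [
    EffectSize("Study A", 0.5, 0.04),
    EffectSize("Study B", 0.3, 0.04),
    EffectSize("Study C", 0.4, 0.02),
]
pooled, se, q = fixed_effect_pool(studies)
print(f"pooled={pooled:.3f} se={se:.3f} Q={q:.3f}")
```

A random-effects model would probably be preferable when Q indicates heterogeneity; the fixed-effect version is shown only because it is the shortest.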

State of the art

Brede Wiki for personality genetics
A concise web service with collaborative input and versioned storage for personality genetics, with meta-analysis and presentation of results. After mass meta-analysis it shows an overview of all candidate gene personality genetics. Its limitation is that it does not handle data other than personality genetics.
Brede Wiki
A MediaWiki-based wiki with structured data stored in MediaWiki templates and comma-separated values format, and with off-wiki meta-analysis. Can handle data from a variety of domains. The wiki contains a considerable number of studies from Matthew Kempton's and his coworkers' meta-analyses on brain regions. The present version has an inconvenient editing facility and little semantic markup of information in the CSV-formatted pages.
Gene Wiki
A bot that aggregates information from bioinformatics resources and pushes it to Wikipedia and Wikidata. Work around the bot has been documented in a number of published papers.
NoRDF-RDF translation
NoRDF (non-triple-like) formats such as Wikibase/Wikidata need conversion to a triple format; see, e.g., "Reifying RDF: What Works Well With Wikidata?" and the Statistical Core Vocabulary (SCOVO).
Internet Brain Volume Database
(The original link http://www.cma.mgh.harvard.edu/ibvd/ is apparently down.) IBVD is an online presentation of results from brain volume studies with structural neuroimaging, as reported in published papers. Meta-analytic plotting is on the mean rather than the standardized difference.
AlzGene and related resources
Online meta-analyses of gene association studies for Alzheimer's Disease, Parkinson's Disease, ALS, MS and schizophrenia.
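
The distinction noted for IBVD, plotting the raw mean versus the standardized difference, can be sketched as follows. The function computes a standardized mean difference (Hedges' g, i.e., Cohen's d with the usual small-sample correction) and its approximate variance from group sizes, means and standard deviations. The function name and the example numbers are hypothetical; this is a sketch of the textbook formulas, not the computation used by any of the resources above.

```python
import math


def standardized_mean_difference(n1, mean1, sd1, n2, mean2, sd2):
    """Return Hedges' g and its approximate variance for two groups."""
    df = n1 + n2 - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    # Large-sample approximation of the variance of d.
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    # Small-sample correction factor (approximate form).
    j = 1 - 3 / (4 * df - 1)
    return j * d, j ** 2 * var_d


# Invented numbers: 20 patients (mean 6.0, SD 2.0) versus 20 controls (mean 5.0, SD 2.0).
g, var_g = standardized_mean_difference(20, 6.0, 2.0, 20, 5.0, 2.0)
print(f"g={g:.3f} variance={var_g:.3f}")
```

Because g is unit-free, studies that report a volume in milliliters and studies that report a ratio could in principle be shown in the same plot, which a plot of raw means does not allow.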

Methods

  • Ontology research for defining a meta-analysis schema.
  • Research on representation of scientific data for meta-analysis in Web-based collaborative environments.
  • Using Wikibase (Q16354758) for representing data suitable for meta-analysis (Q815382), either in Wikidata or in a separate Wikibase installation.
  • Other possibilities for data representation are MediaWiki with templates, MediaWiki with CSV data, and versioned ORMs. The data is n-ary, which requires some thought in connection with translation to a triple format. Exploration of other representations (besides MediaWiki and Wikibase) for online, versioned and structured data.
  • Research on statistical methods for multiple meta-analysis results. Unsupervised learning methods.
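
The n-ary issue mentioned in the list above can be illustrated with a small sketch: a single measurement carries several fields (group, sample size, mean, standard deviation, unit, source), so it does not fit into one subject-predicate-object triple. One common approach is to introduce an intermediate statement node and hang the fields on it. The function name, the `ex:` predicates and the example record are all hypothetical; no particular ontology is implied.

```python
def reify_measurement(record, statement_id):
    """Turn one n-ary measurement record into subject-predicate-object triples."""
    statement = f"_:{statement_id}"  # intermediate (blank) node for the statement
    # One triple links the measured item to the statement node ...
    triples = [(record["item"], "ex:measurement", statement)]
    # ... and every remaining field becomes a triple about the statement node.
    for key, value in record.items():
        if key != "item":
            triples.append((statement, f"ex:{key}", value))
    return triples


# Invented record, for illustration only.
record = {
    "item": "ex:ventricle_brain_ratio",
    "group": "patients",
    "n": 20,
    "mean": 6.0,
    "sd": 2.0,
    "unit": "percent",
    "source": "ex:hypothetical_study",
}
triples = reify_measurement(record, "statement1")
for triple in triples:
    print(triple)
```

This mirrors, in simplified form, the way a statement with qualifiers and references is expanded into several triples when exported to RDF.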

Examples

Wikibase example with ventricle-brain ratio

Example with data from a neuroimaging (Q551875) study called "Ventricular enlargement in major depression (Q21100980)". Here is one form of "wikibasification":

Wikidata SPARQL example with ventricle-brain ratio

Individual "notable values" may also be represented in Wikidata. Here is a query that fetches numerical data from the ventricle-brain ratio item:

PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX pr: <http://www.wikidata.org/prop/reference/>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>

# Numeric values (P1181) of the ventricle-brain ratio item (Q17141282),
# with the source each value is stated in and the publication date of that source.
SELECT ?value ?year ?referenced ?referencedLabel WHERE {
   wd:Q17141282 p:P1181 ?value_statement .       # full statement node
   ?value_statement v:P1181 ?value .             # the number itself
   ?value_statement prov:wasDerivedFrom ?ref .   # reference node of the statement
   ?ref pr:P248 ?referenced .                    # "stated in" source
   ?referenced wdt:P577 ?year .                  # publication date of the source
   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}

Short link: preview.tinyurl.com/ncny58y
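
For use in an off-wiki analysis, a query of this kind could be sent to the Wikidata Query Service from a script. The following is a minimal sketch using only the Python standard library. The endpoint URL and the `format=json` parameter are those of the public query service; the function names, the user-agent string and the sample response are hypothetical, and the sample is hand-written in the SPARQL JSON results layout so that the parsing step can be tried without network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query):
    """Build a GET URL that asks the query service for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def fetch(query, user_agent="meta-analysis-sketch/0.1 (example)"):
    """Send the query and return the decoded JSON (requires network access)."""
    request = urllib.request.Request(build_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def numeric_values(result):
    """Extract (value, source label) pairs from a SPARQL JSON result."""
    rows = []
    for binding in result["results"]["bindings"]:
        value = float(binding["value"]["value"])
        label = binding["referencedLabel"]["value"]
        rows.append((value, label))
    return rows


# Hand-written sample response (not real data), used to exercise the parser.
SAMPLE = {
    "results": {
        "bindings": [
            {
                "value": {"type": "literal", "value": "0.1"},
                "referencedLabel": {"type": "literal", "value": "Hypothetical study"},
            }
        ]
    }
}
print(numeric_values(SAMPLE))
```

With network access, `numeric_values(fetch(query))` would return the live values, where `query` is a string holding the SPARQL text given above.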