Wikidata:FactGrid/blog-archive

Problems and more: a brief status report on the FactGrid Project, Friday, 13 April 2018

Well, not for the US Army, that's the good news...

Four months have passed and the FactGrid project has run into its first unexpected problems. We are confident that we will solve the issues, which are primarily technical, but one lesson we have learned so far is that we will need the support of a larger community in order to give the FactGrid Project more impact within the Wikidata community.

What do we want to achieve? We are trying to launch a Wikibase installation with the aim of offering a platform for original research. Data hosted on the FactGrid will be free to be used by Wikidata. Data will leave the FactGrid database, however, with the personal authorisations of research, which Wikidata itself is not able to generate.

Digital humanities projects interested in working on the collective FactGrid platform will sponsor software developments with their respective DH funding. The cooperation with Wikimedia should make sure that the tools we sponsor become part of the wider Wikibase software package. We want to prevent island solutions.

What kinds of problems have we been facing? And where do we need you?

Problem 1: The software is free but the vital tools do not work outside the Wikidata environment. Wikibase is relatively easy to install, but the central tools you need in order to get data into and out of the database, QuickStatements and the SPARQL query service, proved to be hard-wired to the original Wikidata compound. Lucas Werkmeister has managed to free QuickStatements from these ties. SPARQL remains on his agenda. We have no idea how tools that use SPARQL (in order to create visualisations, for instance) will work once we have the independent SPARQL version. The software problems have wrecked our entire schedule.

Problem 2: Getting the first sets of data into the FactGrid. We have four larger spreadsheets of data from the Gotha Illuminati project which we want to use in order to create an attractive showcase.

Our data sets are intriguing and able to attract wider public interest without further advertisement. They should create steam for the engine if we manage to convince all the parties involved (Freemasons, Berlin State Archive, Gotha Research Centre and the Wikimedia Community) to risk a crowdsourced identification of the roughly 6,000 digitised documents which we have been gathering over the last four years. We have underestimated, however, the problems that an empty database (a database without any properties and any primary items) causes.

If you feel comfortable with QuickStatements and if you think an empty Wikibase installation is just the free space you have been dreaming of, join the team and help us to learn how we can use our data with the brilliant software.
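To give an idea of the kind of work involved, here is a minimal sketch of how one spreadsheet row might be turned into QuickStatements (version 1, tab-separated) commands. The property and item IDs (`P1`, `P2`, `Q1`) and the column names `name` and `birth_year` are purely hypothetical: an empty Wikibase has no properties yet, so they would have to be created first and the IDs replaced with the real ones. The sample row is made up.

```python
import csv
import io

# Hypothetical IDs: a fresh Wikibase has no properties or items yet,
# so these must be replaced with the IDs actually created there.
PROP_INSTANCE_OF = "P1"
ITEM_HUMAN = "Q1"
PROP_DATE_OF_BIRTH = "P2"


def row_to_quickstatements(row: dict) -> list:
    """Turn one spreadsheet row into QuickStatements V1 command lines."""
    lines = ["CREATE"]  # start a new item; LAST refers to it afterwards
    lines.append(f'LAST\tLen\t"{row["name"]}"')  # English label
    lines.append(f"LAST\t{PROP_INSTANCE_OF}\t{ITEM_HUMAN}")
    birth = (row.get("birth_year") or "").strip()
    if birth:
        # "/9" marks year precision in the QuickStatements date syntax
        lines.append(
            f"LAST\t{PROP_DATE_OF_BIRTH}\t+{int(birth):04d}-00-00T00:00:00Z/9"
        )
    return lines


# Made-up sample data standing in for one of the spreadsheets.
SAMPLE_CSV = "name,birth_year\nExample Person,1750\nUndated Person,\n"

if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
        print("\n".join(row_to_quickstatements(row)))
```

The sketch does not escape quotation marks inside names and ignores negative (BC) years; real spreadsheet data would need both checked before a batch is pasted into QuickStatements.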

Problem 3: We will need a more massive Wikidata and/or GND input. We will need our own landscape of information, ready to be improved, if we want to attract other projects of historical research and regular internet users (with a wider genealogical project, for instance). A strategist is needed here, someone with ideas about how we could (for instance) acquire all the names of people with birth dates between 1400 and 1800 from Wikidata and/or the GND for our database. (To keep the database clean we might focus on basic data like names, birth dates, places of birth and death, and family connections.) If you feel you could organise such an import, we would offer you all the freedom of experiment you would ask for.
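As a starting point for such an import, the request could be phrased as a SPARQL query against the public Wikidata query service. The sketch below only builds the query text and the request URL; it does not send anything. The identifiers are Wikidata's own (P31 "instance of", Q5 "human", P569 and P570 for dates of birth and death, P19 and P20 for places of birth and death); the function names and the `limit` parameter are our assumptions. A query over the full range 1400 to 1800 would very likely hit the service's time limit, so in practice one would probably ask for a few years at a time.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_birth_range_query(first_year: int, last_year: int, limit: int = 1000) -> str:
    """Return SPARQL text for humans born between two years (inclusive)."""
    return f"""
SELECT ?person ?personLabel ?birth ?death ?birthPlace ?deathPlace WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P569 ?birth .
  FILTER(YEAR(?birth) >= {first_year} && YEAR(?birth) <= {last_year})
  OPTIONAL {{ ?person wdt:P570 ?death . }}
  OPTIONAL {{ ?person wdt:P19 ?birthPlace . }}
  OPTIONAL {{ ?person wdt:P20 ?deathPlace . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,de" . }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Small slice of the range, to keep a real request within time limits.
    print(build_request_url(build_birth_range_query(1400, 1409, limit=10)))
```

Family connections are left out here to keep the query short; they would need further optional clauses and a decision on which relationship properties to import.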

Problem 4: We will need something like forms which users can fill in to create standardised CVs with the Wikibase software. Adrian Heine has taken the first steps in this project. Our aim is a Wikibase environment which ordinary people can interact with the way they have been interacting with the Wikipedia software so far. You pick a person of interest and you get a questionnaire with modules (on places and addresses where the person has lived, on employments he or she has held, on the person's genealogy, on personal contacts we can prove). At present, Wikibase is not exactly ready to be edited by ordinary people.

Problem 5 (a project for the future): Wikibase needs something like a standard Wikibase-Interpreter. Magnus Manske’s Reasonator has been the cool thing in all my presentations of the Wikibase software in DH circles. You can pick your language and you get an organised data sheet.

Things get difficult if you want to correct or augment the Reasonator's information sheet, and things get even more difficult if you want to run the Reasonator on your own platform. The development of an immediate interface that produces smooth pages of structured information will be necessary in order to motivate people to gather information for Wikidata (or any affiliate). The Wikibase-Interpreter would be ready to offer the complete knowledge on any field of interest. It would be ready to list all the places a person is known to have visited and all the contacts he or she is known to have had, whether face to face or through letters. Think of the thousands of contacts in Leibniz’s correspondence: a problem to be solved with pages that give an overview and “more” on the user’s particular request. The Reasonator is, so far, not reading Wikidata directly, nor is it part of the Wikimedia software development. Wikidata will need its own Interpreter in order to become an independent source of information, one that also serves all the Wikipedias around the globe.

We have been able to offer a couple of grants in 2017 in order to get the project going. We should be able to use the further funding of DH projects interested in the software and the collaborative platform to inspire, if not to fully finance, future tools. DH projects will, however, only risk a cooperation with the FactGrid project and with Wikimedia as the software and data partner if we manage to offer an attractive showcase of what can be done. The Illuminati files are an immensely cool project to begin with. The global interest in these files is huge: everyone has heard of the Illuminati, and here you get their most secret files. Visualisations of networks and of the geographical spread of the secret order will find a good test case here. If we should be able to inspire a crowdsourced annotation of all the known documents, that would stir up global press attention.

At this moment, however, we are far from a showcase that we could present anywhere. --Olaf Simons (talk) 09:23, 13 April 2018 (UTC)

The First FactGrid Workshop, Berlin 2018-12-1/2

FactGrid Workshop Berlin at Wikimedia Deutschland

For our first meetup in real life, we chose the rooms of Wikimedia Deutschland. Wikidata volunteers and historians put their heads together to tackle the problem of filling the database for a very specific historic project: bringing the knowledge from the collection of books from the order of the Illuminati (the so-called “Schwedenkiste”) online. The project with the files from the order is explained here.

The group started with an introduction to Wikidata and the historic project. An Etherpad documented our discussion (in German).

As an example, let’s look at some results:

For demonstration purposes, we corrected the date of death for August Bohse (Q760965). While the correction was easy enough, giving a reference for the fact (adding the book, author etc.) was unnecessarily complicated. It was decided to prefill the Wikibase instance with:

Wikidata objects
GND objects
bibliographic information (catalogs such as VD16-VD18, ESTC, STCN …)

Duplicate entries will become a problem. Something like a Wikidata game looks very promising as a way to weed them out.
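Before any such game can be played, candidate pairs have to be found. One very simple way to do that, sketched here under our own assumptions, is to group records by a normalised name plus birth year and hand every group with more than one member to human reviewers. The record layout (`id`, `name`, `birth_year`) and the function names are hypothetical, and the matching rule is deliberately crude.

```python
import unicodedata
from collections import defaultdict


def normalise_name(name: str) -> str:
    """Lower-case a name, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.lower().split())


def duplicate_candidates(records):
    """Group record IDs that share a normalised name and a birth year.

    Returns only groups with two or more members; these are candidates
    for review, not confirmed duplicates.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalise_name(rec["name"]), rec.get("birth_year"))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Made-up records, e.g. one from a Wikidata import and one from the GND.
    sample = [
        {"id": "a", "name": "Jöhn  Example", "birth_year": 1700},
        {"id": "b", "name": "john example", "birth_year": 1700},
        {"id": "c", "name": "John Example", "birth_year": 1701},
    ]
    print(duplicate_candidates(sample))
```

Exact matching on a normalised key will miss spelling variants and will pair distinct people who share a name and birth year, which is why the final decision is best left to players of the game.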

Tools like SQID and Reasonator look especially attractive for presenting the content of the FactGrid Wikibase installation.

We also briefly looked into license questions.

What happens next? The data has to be prepared and put into the Wikibase instance for FactGrid.

A workshop will follow in early 2018 to discuss progress. We would also like to present and share ideas at the following upcoming events:

Barcamp Open Science 2018: 12 March 2018, Wikimedia, Berlin
Coding Da Vinci Ost: Kick-Off, 14–15 April 2018, Universitätsbibliothek Leipzig
Wikimedia Hackathon 2018: 18–20 May 2018, Barcelona

Further coordination will also happen on the mailing list factgrid@wikimedia.de.