Wikidata:Development plan/archive2020/status updates

Wikidata as a platform

Increase data quality and trust

Feedback loops with data re-users

We want to work with large data re-users to get their feedback and improvements for our data in a way that works for our community.

Outcome: We have talked to a number of large users of Wikidata’s data to better understand what mistakes they identify themselves and if/how their users flag mistakes to them. We have helped them understand that reporting the mistakes they find back to us is crucial for the long-term sustainability of Wikidata. We will continue working with them on building up workflows and processes that can work for everyone involved, but this takes time given the size of the organisations involved.

Automated finding of references based on semantic markup

Too many statements on Wikidata don't have a reference. For some of these statements we can automatically find references. We can do this by comparing our data with data in linked websites (marked up with schema.org or similar mark-up). These websites could be in the identifiers, references or even connected Wikipedia articles.

Outcome: We have built a tool that looks at the websites linked from an Item through its external identifiers. It analyzes these pages for schema.org markup that corresponds to the statements on the Item that do not have a meaningful reference. We have collected more than 50,000 potential references this way. Some of them have been made available through a Wikidata game. We have also released a dashboard analyzing the game decisions and published the remaining potential references as a dump for further import.
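The matching step described above can be illustrated with a minimal sketch. It is not the code of the tool itself: the function name `find_potential_references`, the shape of the `unreferenced` dictionary, and the property-to-schema.org `mapping` are all assumptions made for illustration, and only JSON-LD markup is considered.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup instead of failing the whole page


def find_potential_references(html, page_url, unreferenced, mapping):
    """Return (property_id, value, page_url) for every unreferenced statement
    whose value also appears in the page's schema.org markup.

    `unreferenced` maps property IDs to values (hypothetical input format);
    `mapping` maps property IDs to schema.org keys (hypothetical mapping).
    """
    parser = JsonLdExtractor()
    parser.feed(html)
    matches = []
    for doc in parser.documents:
        if not isinstance(doc, dict):
            continue  # this sketch ignores JSON-LD arrays and graphs
        for property_id, value in unreferenced.items():
            schema_key = mapping.get(property_id)
            if schema_key is not None and doc.get(schema_key) == value:
                matches.append((property_id, value, page_url))
    return matches
```

A real implementation would also need to normalize values (dates, units, names) before comparing them; the sketch only matches exact strings.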

Checks against 3rd party databases

Via external identifiers we have connections to a lot of other databases. We can compare our data against their data and highlight differences so editors can look into them and fix them when needed. We want to build a system that is extensible so that anyone can do the mapping for another 3rd party database in the future.

Outcome: We have researched the topic and interviewed a number of editors as well as operators of 3rd-party databases to better understand the problem space. The implementation will be done in 2021.
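Since the implementation is still to come, the following is only a rough sketch of the kind of comparison meant here, not a design decision. The function name `find_mismatches`, the dictionary-based record format, and the per-database `mapping` are hypothetical. The mapping is passed in as data, which is one possible way to let anyone add another 3rd party database without changing the comparison logic.

```python
def find_mismatches(item_values, external_record, mapping):
    """Return (property_id, wikidata_value, external_value) for each mapped
    field whose value differs between Wikidata and the external database.

    `mapping` maps property IDs to field names in the external record
    (a hypothetical, per-database configuration).
    """
    mismatches = []
    for property_id, field_name in mapping.items():
        # Only compare fields that both sides actually provide.
        if property_id not in item_values or field_name not in external_record:
            continue
        ours = item_values[property_id]
        theirs = external_record[field_name]
        if ours != theirs:
            mismatches.append((property_id, ours, theirs))
    return mismatches
```

A mismatch found this way only says that the two sources disagree, not which one is wrong, so the result is a list for editors to review rather than a list of automatic fixes.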

Improve quality scoring for Items

Scoring of Items in Wikidata is possible now using ORES. However, the scoring is not yet accurate enough and needs improvement.

Outcome: We have improved the existing signals that ORES takes into account when scoring the quality of an Item and added new signals. We have also gathered new training data for the machine learning system to make sure that it understands the current state of Wikidata after significant shifts in the content over the past years. We have retrained the model with new signals and training data and made an improved Item quality score available through ORES. We have then analyzed the average quality development over time and started investigating additional ways of evaluating quality beyond what ORES is able to capture.

Finding problems in the ontology

We want to make it easier for people to find issues in Wikidata's ontology because modeling inconsistencies and related problems are a big obstacle for data reuse.

Outcome: We made some progress on this through the Curious Facts project (mentioned further down) but the main part of the work slipped into 2021.

Tainted references - persistence

We have developed the first version of tainted references. Depending on feedback from the editors, we want to make the indicator persistent in order to allow more people to see and clean up mismatching value/reference pairs.

Outcome: When changing the value of a statement while leaving the reference untouched, the editor making the change is now notified that this might not be the right thing to do and asked to double-check. Making these cases available to other editors has not been implemented yet.
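The condition described above can be expressed as a small check. This is a minimal sketch under assumptions, not the Wikibase implementation: the `Statement` class and the `is_tainted` function are hypothetical names, and statements are reduced to a value plus a tuple of references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Simplified stand-in for a statement: one value and its references."""
    value: str
    references: tuple = ()


def is_tainted(before, after):
    """True if the value changed while existing references stayed untouched.

    In that case the references may no longer support the new value,
    so the editor should be asked to double-check them.
    """
    return (
        before.value != after.value
        and len(before.references) > 0
        and before.references == after.references
    )
```

A statement without any reference is never flagged by this check, because there is no reference that the new value could contradict.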

Expose data about current events (Prototype)

Current events are a common target of vandalism while also being important for reusers. We should find ways to expose them on Wikidata so editors can better keep an eye on them.

Outcome: A dashboard has been developed to show Items related to current events. It has not been rolled out yet due to some infrastructure issues caused by the requirement to have near-real-time access to the underlying data. Roll-out will happen in 2021.

Finding gaps and biases

We want to find more ways for people to find biases and gaps in the data in Wikidata so we can work on making our data less biased and more complete.

Outcome: This topic has slipped to 2021.

Research for easier monitoring of Items

Institutions are interested in more easily monitoring how their data is changed. We need to research how to make this possible.

Outcome: We researched the topic but decided to delay further work on it in favor of infrastructure improvements.

Evaluation of data quality of a subset of Items

It is important that data re-users have trust in Wikidata's data quality and can easily access useful information about the data quality of the specific parts of Wikidata that are important to their work.

Outcome: We have done the preparatory work by improving the Item quality scoring via ORES and by exploring the number of constraint violations on an Item as a measure of data quality. We will build the tool itself in 2021.

Completeness indicators

Within Wikidata, it is sometimes difficult to determine the completeness of an Item or a particular area of knowledge. Reusers of our data are interested in knowing this information when evaluating whether Wikidata is the right data source for their needs.

Outcome: We deprioritized this topic in favor of infrastructure improvements.

Curious facts (prototype)

Curious facts are potential indicators for wrong data. After analyzing the data in Wikidata for curious facts, we can then make them visible to users within the interface. This will make it possible for editors to check on the data to determine which, if any, corrections are needed.

Outcome: The prototype for this has been developed and will be published for feedback in early 2021.

Encourage more data use

Page rank for Items (Prototype)

Wikidata has a lot of Items. When querying for a list of Items you often want to order them by some kind of importance measure. Each Item in Wikidata should get a score that tries to represent how important it is compared to other Items.

Outcome: We have analyzed how we can rank the individual Items in a query result with little overhead and have a reasonable default ranking now. Integration in the Query Service is still to be done.

Easier access to data for programmers

We want to improve our APIs to make it easier for programmers to access our data.

Outcome: We have drafted and published a specification for a REST API, gathered feedback on it, and incorporated that feedback. Implementation will be done in 2021.

Query Service UI improvements

We will enhance the usability of the Query Service by making improvements and adjustments to the UI.

Outcome: We deprioritized this in favor of infrastructure improvements and the Query Builder.

Incorporate feedback into partnership model V1

The first version of the partnership model has been published and we now need to incorporate feedback we received for it.

Outcome: Version 2 of the partnership model has been written and published, taking into account feedback for version 1.

Data partnership model expansion - version 2

After reviewing and incorporating feedback received on the first version of the partnership model, we will expand the model and publish a second version to the community.

Outcome: Version 2 of the partnership model has been written and published.

Enable more diverse data and users

Investigate PanLex integration

By transforming thousands of translation dictionaries into a single common structure, the PanLex database makes it possible to derive billions of lexical translations that are not found in any single dictionary. We are investigating how Wikidata/Wikibase and PanLex can work together for the benefit of all.

Outcome: We talked with people from PanLex about setting up their own Wikibase instance and potentially feeding some of that data into Wikidata in the future. The project is on hold because of time constraints on their side.

Accessibility evaluation

We don’t want to exclude any users through technical barriers. As a first step we need to understand the current state.

Outcome: We have deprioritized this in favor of working on the design system. However, design system components will follow accessibility standards, so we have started investigations into this topic.

Usability problems/UX debt

Over the years a number of UX issues have been accumulating on Wikibase / Wikidata. We need to tackle the worst ones.

Outcome: We have started looking into them and taken them into account for the work on the Design System. More work needs to be done in the future. The work on the lexicographical data UI in 2021 will feed into this.

Support the creation of underrepresented knowledge

To support the mission of giving more people more access to more knowledge, it is important to increase the diversity of knowledge in Wikidata. We will explore how to best connect this data to Wikidata and try to identify if there are any unintentional ways that Wikidata's method of storing knowledge might not support participation by certain communities.

Outcome: We have deprioritized this in favor of the work on biases and gaps in 2021.

Interface improvements for lexicographical data

To increase the usability of lexicographical data in Wikidata, we will ensure that there is a stable interface that allows people to edit and reuse this data more easily.

Outcome: We have made small improvements but the majority of the work has slipped to 2021.

Query builder for lists

Queries and lists are an integral part of accessing the data in Wikidata and making sense of it. Right now creating lists requires knowledge of SPARQL. We want to make it easier for people to create lists without having to know SPARQL.

Outcome: A test system for the query builder exists and allows editors to create their first meaningful queries. Additional work on it will happen in 2021.

Other

Design system

We want all parts of Wikidata and Wikibase to be consistent, to provide an overall better user experience, and we want to save design and development time through reusing UI components. The design system will help us with that by defining a number of standard components, patterns, and guidelines.

Outcome: The Design System setup is in place now and is being used to build the Query Builder. It will be used and expanded for future work on other features. We are also aligning with WMF colleagues on how to reuse components between the organizations.

Normalization of wb_terms table

The wb_terms table has grown in size to the point where the infrastructure can't cope with its growth anymore. We need to rearchitect the table to scale Wikidata better.

Outcome: The table has been removed and the area is now less of a bottleneck than before.

Infrastructure analysis

We will have a company provide an outside view of our infrastructure, provide an evaluation and give recommendations. We then need to review the feasibility and fit of these recommendations.

Outcome: The infrastructure analysis has been worked on and is ongoing in 2021.

Prototype of list/simple query storage and lookup infrastructure (Prototype)

We want to prototype the infrastructure that will allow us to make a query builder possible.

Outcome: This has been deprioritized in favor of work on the wb_terms table.

Vue component library

In order to make development of new features for Wikibase easier we need to build out a library of UI components that can be reused across all of Wikibase.

Outcome: This has been done.

Wikibase ecosystem

Build out the ecosystem

User research with catalogers at GND

Despite COVID, we completed user research (virtually) with catalogers at the GND.

Wikibase community calls

Wikibase Live sessions were initiated in 2020, serving as both a regular community touch-point and a venue for sharing knowledge.

Wikibase community meetups

Due to COVID, Wikibase community meetups were not held in 2020.

Access Wikidata Properties in custom Wikibase instance (aka Federation)

We built a first version of Wikidata-Wikibase Federation that allows for the use of Wikidata's properties in place of local properties in a Wikibase. The feature entered a pre-release testing phase in the latter months of 2020, and is scheduled for official release with Wikibase 1.36.

Expand GLAM strategy

We conducted research into the GLAM sector (galleries, libraries, archives, and museums) to create an addendum to the product strategy for Wikibase.

Documentation for things to do after installing Wikibase

The Wikibase post-installation documentation was updated and improved in 2020. It is now available on the Wikibase website.

Strategy and infrastructure for releasing Wikibase packages

As Wikibase becomes more and more used outside Wikimedia we need to set up proper release infrastructure and processes. We began the “Release Strategy and Infrastructure” initiative in 2020, and are continuing it through the first months of 2021.

Update instructions for Wikibase installations

We need to document how to upgrade an existing Wikibase installation. This effort is currently delayed as we finish implementing the new release infrastructure for the Wikibase suite, and will be tackled in 2021.

Merge UI

Wikidata has a gadget to improve the workflow of merging two Items. In 2020, we conducted user research (expert interviews) with some of the most frequent users of the Merge.js gadget on Wikidata. This research allowed us to create initial designs for an improved/integrated merge tool for Wikibase. Development of this tool will be revisited in the future, most likely at the end of 2021.

Research for data input forms

A regular request from people running Wikibase instances is that they would like an easy way to build forms for their editors so they can easily enter complete and accurate data. We were unable to tackle this research in 2020.

Support structures in the Wikibase Ecosystem

In 2020, we worked with institutional users of Wikibase to understand their most common needs for service and support. This resulted in the launch of the Wikibase Consultants and Support Providers directory, which providers are invited to add themselves to.

Installation pingback (prototype)

We developed a prototype of an opt-in pingback mechanism for Wikibase at the end of 2020. It is currently in its final stages and we expect to include it in the next Wikibase release.

Wikibase website

We began improving the content of the Wikibase website in 2020, particularly user documentation. A more thorough redesign is planned for next year.

MVP for Wikibase as a service platform

We have defined a possible minimum viable product (MVP) for Wikibase as a Service and built a prototype of this platform at the end of 2020. We will continue our efforts toward creating the MVP in 2021.

Automated configuration discovery

In 2020, we tackled the “automated configuration discovery” topic by developing the WikibaseManifest extension and conducting initial testing with Wikidata tool builders. The extension will be released to Wikibase users with Wikibase 1.36.

Explore opportunities for more organisations to use Wikibase in their projects

We continued to support organisations in the GLAM sector and beyond in their explorations, evaluations, and implementations of Wikibase.

Investigate and include core gadgets as part of Wikibase (excl. merge gadget)

This effort was not tackled in 2020, but we anticipate that full adoption of the WikibaseManifest extension by users and tool builders will help to expand the range of tools and add-ons available in Wikibase.

Federation version 2

Due to delays, development of Federation version 2 will take place in 2021.

Wikibase service defaults for non-Wikidata.org installations

Due to delays, this initiative was not tackled in 2020.

Link to media on wikis other than Commons in statements

Wikibase users, particularly those in the GLAM sector, want to be able to link to/display media in Wikibase that is not and cannot be on Commons.

Make Query Service less specific to Wikidata

Due to delays, this initiative was not tackled in 2020.

Automatic inclusion of local client site for sitelinks

Due to delays, this initiative was not tackled in 2020.

Other

Improve Wikibase lower and midlevel documentation

Outcome: Done

Wikibase extension registration and decoupling

We completed the decoupling of Wikibase Client and Wikibase Repository and implemented extension registration for Wikibase.

OPEN!NEXT

OPEN!NEXT is in progress and our efforts will continue as planned.