Wikidata:First Birthday/Op-ed

Sven Manguard has been an editor on Wikidata since its third day and is an administrator on Wikidata and Wikimedia Commons. Initially an item creator before bots began handling that task, Sven now spends his time closing Properties for Deletion requests, doing gnomish cleanup and background work, and building Wikidata's coverage of video game content rating systems. The views expressed here are those of the author only.



State of the project: What we're doing right, what we're not, and what to look for in the year ahead (October 28, 2013)


What we're doing right

The community

Wikidata is a pleasant project to work on, in very large part because it has a welcoming, friendly, and enthusiastic community. While every Wikimedia project has tensions and disagreements, Wikidata has had very few disagreements escalate to the point that they became disruptive. While the relationship between developers and users is often rocky and suffers from unclear lines of communication, the relationship between the Wikidata development team and the community is overwhelmingly positive. This is helped by clear lines of communication in the form of weekly news bulletins and a project manager who is almost always available on IRC during European business hours (and often well after business hours). The high level of activity in Wikidata's project chat, IRC channels, and mailing list is a testament to the continued enthusiasm that contributors have for the project and, along with impressive numbers of new users and active users, is an indication of the project's continued health and productivity.


Interwiki links

There were plenty of ideas to get excited about when Wikidata was announced. The ability to update a piece of information in one central location and have that update reflected on 200 sister projects was exciting. The ability to run complex queries that are capable of generating lists like "female poets from South America born after 1955" or "critically endangered species of birds native to Canada" or "video games with a USK rating of 12+ and a PEGI rating of 18+" was exciting. Compared to features like those, the ability to move interwiki links off of the mainspace pages in sister projects and into a central repository doesn't sound particularly exciting. Useful, yes, but not exciting.

The consequences of moving interwiki links to Wikidata, though, were definitely exciting. Because Wikidata does not allow multiple articles from the same project to link to the same item (for example, two different Danish Wikipedia articles can't share a Wikidata item, but a Danish Wikipedia article and a Swedish Wikipedia article can), tens of thousands of duplicate articles were discovered during the import process and merged or deleted on the projects that were hosting them. The archives at User:Soulkeeper/dups show only a fraction of the duplicates that have been caught and fixed thus far. Hundreds, possibly thousands, of incorrect interwiki links have been detected and fixed, and almost two hundred more are being worked through at Wikidata:Interwiki conflicts. With interwiki links now imported from both Wikipedia and Wikivoyage, and with those links sharing an item on Wikidata when they cover the same location, we may someday soon see inter-project links on the same sidebar as inter-language links. There is more work to do in the year ahead, but the impact of bringing interwiki links over to Wikidata has already more than justified the project.


Administration

A well-administered project requires more than just keeping backlogs down, although considering the volume of deletion requests handled on Wikidata, keeping backlogs down is still an impressive feat. To be a well-administered project, the relationship between members of the community that hold advanced permissions and those that do not needs to be built on mutual trust and respect. To be a well-administered project, there needs to be a balance between having established policies to fall back on and having the freedom to make decisions based on the case at hand. If a project doesn't have enough policies in place, or its policies don't cover the right areas, simple decisions become chaotic. If a project has too many policies, and they either cover too much or are too rigidly enforced, simple decisions become overcomplicated, complex decisions aren't looked at with nuance, and some people may begin to view proper enforcement of policy as being more important than finding an ideal solution.

Wikidata has, in its first year, managed to be overwhelmingly successful in both of these areas. The project was very lucky in that it got off on the right foot during the administrator elections that took place right after the project went live. The community showed candidates a tremendous amount of trust and opted to elect a large and diverse group of initial administrators. As the project matured, a conscious effort was made to keep the barriers to becoming an administrator low enough that users saw being elected as achievable. This is in no small part because the community looked to English Wikipedia as an example of what not to do. Regardless of the motivations, the results speak for themselves. Wikidata does not have an "us versus them" relationship between members of the community who hold advanced permissions and those who do not. Thus, the tension that the "us versus them" mentality causes on other projects hasn't migrated to Wikidata. As for the balance between too many policies and too few, Wikidata leans rather strongly towards having too few. Some users would be more comfortable with additional policies in certain areas, but it is not so unstructured that people feel that the project is going off the rails on a crazy train. While a lot of decisions were made on the fly in the beginning, a year in, the project is in a good place administratively. If Wikidata continues down the path it's gone thus far, there is little doubt that it will continue to run smoothly.


Development

No list of achievements can be complete without noting the tremendous job that both the Wikidata development team and the community tool and bot developers have done in the past year. While people are certainly eagerly waiting for the remainder of the datatypes to go live, and for queries to become possible, Wikidata has six datatypes live and has deployed to three sister projects in its first year. Additionally, members of the community have come up with dozens of powerful editing aids, most of which can be viewed at Wikidata:Tools, that have expanded Wikidata's capabilities beyond what even the Wikidata developers had envisioned. Over 100 bots have been approved, and collectively they have made tens of millions of edits, populating Wikidata with enough information that maps, timelines, and other diagrams can be generated using only information stored on Wikidata. As more datatypes go online, and more data is brought over to Wikidata, the project will only become more powerful and more useful to the Wikimedia community and the wider world.

What we're not

Deployments to sister projects

The deployment of Wikidata to Wikimedia Commons, while handled well by the development team, was botched by the Wikidata community. With weeks of advance notice that Wikidata was coming to Commons, the community failed to settle on a model for linking Commons categories and gallery pages to Wikidata items. On the day that the deployment went live, the planning page looked like this, with every idea marked as being on hold and no final decision reached. The notability policy wasn't updated to include Commons pages until a day after the deployment. Even now, a month after the deployment, the policy isn't clear.

The Wikivoyage deployment was easy. Structurally, Wikipedia and Wikivoyage are very similar, and so most of the important decisions were easy. Deploying to Commons should also have been easy, as we were only dealing with interwiki links for a few types of pages and because Commons is a project that many editors on Wikidata are familiar with. We're running out of easy projects. Wikispecies and Wikiquote are probably the easiest ones left, but Wiktionary, Wikisource, Wikiversity, Wikibooks, and Wikinews all differ significantly from the projects that Wikidata has been deployed to thus far, and will require a great deal more planning and cross-community interaction than we've done to date.

If the Wikidata community does not do a better job going forward in properly preparing for deployments to sister projects, it will continue to suffer the problems and confusion that come from trying to clean up after botched deployments. Equally important, Wikidata's utility depends on the editors of our sister projects wanting to integrate Wikidata with those projects. Every time we fail to involve the sister projects during deployment, we lose an opportunity to build an effective working partnership with those projects.


Decision making

The troubles in the run-up to the Commons deployment are symptomatic of a larger struggle that Wikidata has had in making major decisions. While many major decisions were resolved with relative ease during the first few months of the project, the recent lack of participation in requests for comment discussions is hampering the development of the project. As a result, decisions are being made over IRC because it has become the only place where someone can get a reasonable number of opinions in a reasonable amount of time.

One illustration of how this is problematic is the request for comment asking whether we should be using a large number of specific properties or a smaller number of generic properties when importing data. This is a fundamental structural question that has no business still being unresolved a year into the project. The RfC itself has been open since the first day of June and has been linked to in the watchlist notices for most of the time it's been open, but it only has nine participants. The issue is too important to close without a consensus, but despite being open for five months, there aren't enough voices to form a consensus. With no guiding consensus on the issue, the question winds up playing out at Properties for Deletion discussions, which puts the burden of reaching a decision on the admins closing the discussions instead of on the community at large. That request for comment is the oldest one currently open, but there are a dozen more open RfCs, with another one open since June, two since July, and four since August.

Some of the requests for comment on Wikidata are more important than others, but almost all, if not all, of them are still major decisions that deserve a hearing and a response from the community. Going forward, Wikidata should aim to have a dozen and a half or two dozen people weigh in on every single RfC, and should aim to have RfCs resolved within a month's time. There are 4,000 active users, including 600 who have made more than 100 edits in the past month, and 90 administrators (a group that you would certainly hope is generally active), so it's not that much to ask for two dozen people to show up twice a month and spend an hour or two going through the open RfCs and weighing in. The project would be that much healthier from having major decisions made in a timely manner.


Translations and multilingualism

In theory, Wikidata is a language-neutral project in which users can work and communicate in whatever language works best for them. In practice, much like Commons (the other language-neutral project), almost every important discussion happens exclusively in English. Trying to discuss highly nuanced situations through Google Translate isn't ideal, but neither is making people with little or no English language fluency feel left out. So long as the community remains open to people contributing to discussions in languages other than English, and is not condescending or hostile to non-English speakers, the community can take its time finding a solution that works. It's something that should be worked on in the year ahead, but it's not something that must be handled immediately.

One thing that should be handled immediately, or at least over the next few weeks and months, is the lack of translations for help pages, policy pages, and properties. With 13.5 million items, it is unreasonable to expect that every item have a label and description in every major language, but there is no excuse for not having up-to-date translations for properties, of which there are only a thousand, and for policy pages and help pages, of which there are fewer than two dozen. Help:Label is only in 12 languages and Help:Description in 11. Among those not represented are Chinese, Arabic, and Hindi. The numbers are slightly better in policy pages, but many of the translations in both help pages and policy pages are out of date. In order for Wikidata to be useful to its sister projects, editors from those projects need to be able to identify the properties they want to import data from. In order for Wikidata to be successful in attracting new users who speak languages underrepresented on Wikidata today, we need to have policy and help pages waiting for them so that they know how to jump in and help.


What to look for in the year ahead

Additional datatypes

Wikidata is on track to get several new datatypes in the coming year, and has already approved properties that will go live once the new datatypes are deployed. The "Number (dimensionless)" datatype will allow Wikidata to store sports statistics, population figures, mortality rates, and data that is stored in ordered lists, such as an element's atomic number or a university's place in a best colleges ranking. The "Number (with dimension)" datatype will allow Wikidata to store information that has an attached measurement, such as economic data (GDP per capita, net profit, total assets) and measurements of physical phenomena (a person's height, a planet's orbital period, an element's boiling point). Between the two, over 100 properties are in the wings. Other datatypes that are still waiting to be deployed include the "Monolingual text" datatype, which will allow for nicknames and mottos to be recorded, and the "Geo-shape" datatype, which will allow Wikidata to store objects such as roads, rivers, or the Great Wall of China as a series of lines instead of as one or more specific points along those routes.


Queries

Several users have already created tools that approximate queries in certain cases, but there's no substitute for the real thing. As mentioned above, the ability to generate lists of articles as specific as "female poets from South America born after 1955" is exciting. Queries will be a tremendously powerful tool in the hands of Wikimedia project editors, researchers, and ordinary users. Although it may take years to reach that point, queries have the potential to render the Wikipedia category system obsolete. While the impact of the datatypes set to release in the coming year is significant, the impact of queries will dwarf all other aspects of the project. It will take significant forethought and careful planning by both the developers and the community to properly roll out queries, but once they do roll out, queries will be the focal point of Wikidata's relationship with its sister projects, and will attract significant outside interest.


Final words

A year in, there is no doubt that Wikidata is on the right track. While there are areas for improvement, Wikidata has attracted a thriving community, filled with friendly and enthusiastic editors and supported by a talented development team. Everything that has happened in the past year indicates that Wikidata will meet the challenges ahead, and will do so with the same class and the same energy that has built the project to where it is today.

Congratulations and best wishes for another great year.

Sven Manguard