User:Tpt/Fifth Birthday

The past year has seen huge growth in the project: the number of items has risen from 24M to 38M, thanks to initiatives like WikiCite, and the average number of statements per item has grown from 4.9 to 8.3. Google Scholar references more than 1,000 scholarly works mentioning Wikidata. The OpenStreetMap community has started a great project to map its content to Wikidata, leading to the addition of more than 1M Qids in OpenStreetMap. The year has also seen the start of development work on two amazing new projects: Lexemes support and Structured Data on Commons.

For next year, I really look forward to the addition of the lexeme entity type to Wikidata, which will hopefully allow us to build the broadest structured cross-language dictionary ever made. It is going to have a huge impact for languages that have little presence on the internet, by creating free lexical data that could be used to internationalize many projects and to do cross-language research. For example, it would enable automated translation and text generation for these languages.

I also hope next year will see the creation of mature toolkits for writing smart and efficient tools that contribute to and reuse Wikidata, such as a stable version of the Primary Sources tool, which would greatly help with large-scale donations of lower-quality data.

But I am a bit concerned by the speed of Wikidata's growth: we have to make sure that the data we have in Wikidata will still be clean, up to date and usable in 10 years, even if the people who did the initial data import are gone. In my opinion this is especially important for data that changes a lot, like data about organizations. For example, in France there are around 600k municipal councillors, elected every 6 years. Keeping all this data up to date in Wikidata would mean doing a big update every 6 years (with the deduplication effort that requires) and continuous updates to reflect resignations. This is why I think that the most important challenge we are going to face in the coming years will no longer be "how to get more good data" but "how are we going to keep our existing data clean and updated". We should build tools able to do automated or human-supervised updates.

The Wikimedia world has seen much bigger challenges in the past, like Wikipedia's reliability, and we managed to overcome them. So I am sure that we will be able to sustain Wikidata's growth as we did for Wikipedia, and that in the coming years Wikidata will become the most important free and reliable data hub on the web.

Tpt