Wikidata:WikidataCon 2017/Notes/ClassRank: discovering the relevance of each class in Wikidata

Title: ClassRank: discovering the relevance of each class in Wikidata.

Speaker(s)

Name or username: Daniel Fernández-Álvarez
Contact (email, Twitter, etc.): danifdezalvarezgmail.com

Useful links: slides: https://www.slideshare.net/DanielFernndezlvarez1/presentation-classrank-wikidatacon-2017

Abstract

The knowledge contained in Wikidata is provided by a wide and heterogeneous community, which expands the graph in ways that are hard to predict. How can we maintain summaries of this huge amount of information? Which topics are the most linked? And what kinds of SPARQL queries allow us to access that content?

We think that the notion of "class" can be a key element in answering those questions, and we have developed the ClassRank algorithm for that purpose. ClassRank borrows ideas from PageRank-like algorithms and adapts them to the domain of classes. Our approach identifies the most relevant classes in an RDF graph according to the centrality of their instances. This can be helpful for several reasons:

A class is an abstract concept that can be seen as a topic grouping a set of individuals (instances). A ranking of class relevance can therefore be used as a ranking of topic relevance, which allows us to summarize the content of the graph.
All the instances of a class share a common nature and are expected to conform to a certain basic set of properties (a schema). These shared properties make it possible to design SPARQL queries that involve all those individuals at once.
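
As an illustration of the second point, here is a minimal Python sketch that builds a SPARQL query over all instances of one class using properties they are expected to share. The helper name `instances_query` is hypothetical and not part of ClassRank. The sketch assumes the `wd:` and `wdt:` prefixes predefined by the Wikidata Query Service, and uses `P31` (instance of), `Q5` (human), `P569` (date of birth) and `P19` (place of birth) as example identifiers.

```python
def instances_query(class_id, property_ids, limit=100):
    """Build a SPARQL query for the instances of a class and some shared properties.

    Hypothetical helper for illustration only. It assumes the wd:/wdt:
    prefixes that the Wikidata Query Service predefines.
    """
    # One result variable per shared property, e.g. ?P569.
    variables = " ".join(f"?{pid}" for pid in property_ids)
    # OPTIONAL keeps instances that lack one of the expected properties.
    optional_lines = "\n".join(
        f"  OPTIONAL {{ ?item wdt:{pid} ?{pid} . }}" for pid in property_ids
    )
    return (
        f"SELECT ?item {variables} WHERE {{\n"
        f"  ?item wdt:P31 wd:{class_id} .\n"
        f"{optional_lines}\n"
        f"}}\n"
        f"LIMIT {limit}"
    )


# Example: humans (Q5) with their date of birth (P569) and place of birth (P19).
query = instances_query("Q5", ["P569", "P19"])
print(query)
```

The shared properties are wrapped in `OPTIONAL` because instances are only expected, not guaranteed, to follow the schema of their class.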

During this talk, I would like to present and discuss the ideas behind our approach. We have implemented a prototype of ClassRank and used it to measure class relevance in Wikidata, and I would also like to present and discuss the results obtained.

Collaborative notes of the session

PageRank is used to measure ClassRank
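
The note above can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: it runs a plain power-iteration PageRank over the links between entities and then adds up the score of each instance into its class. The function names, the toy triples and the choice to sum the scores are all assumptions made for this example. `P31` is Wikidata's "instance of" property.

```python
from collections import defaultdict

INSTANCE_OF = "P31"  # Wikidata's "instance of" property


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def class_scores(triples):
    """Sum the PageRank of each instance into its class (assumed aggregation)."""
    memberships = [(s, o) for s, p, o in triples if p == INSTANCE_OF]
    links = [(s, o) for s, p, o in triples if p != INSTANCE_OF]
    rank = pagerank(links)
    scores = defaultdict(float)
    for instance, cls in memberships:
        # Instances that take part in no link contribute nothing.
        scores[cls] += rank.get(instance, 0.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy graph with made-up identifiers: two people who both link to one city.
triples = [
    ("alice", INSTANCE_OF, "Person"),
    ("bob", INSTANCE_OF, "Person"),
    ("paris", INSTANCE_OF, "City"),
    ("alice", "born_in", "paris"),
    ("bob", "born_in", "paris"),
]
ranking = class_scores(triples)
for cls, score in ranking:
    print(cls, round(score, 3))
```

In this toy graph the city is the only node that receives links, so its class ends up ranked above the class of the two people, even though that class has more instances.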

http://boa.weso.es