Wikidata:Federation input

We want to enable more and more people to run their own Wikibase instance. As part of that, we want to let people deeply connect these Wikibase instances with each other, and especially with Wikidata. We also want them to benefit from work done in other Wikibase instances, especially Wikidata, such as the definition and refinement of properties. When people talk and think about this, they often call it Federation. The problem is that we do not all mean the same thing when we say Federation; instead, the word covers a multitude of related features. To move development forward, we need to sort out what people actually want and need, so that we develop something useful. This page is for collecting your input on this.

What already exists

There are two things that already exist that we would count under the Federation umbrella:

  1. Structured Data on Commons: Here a Wikibase instance (Commons) uses Wikidata's Items and Properties to make statements on media files. Restrictions that apply: There are no Properties and Items that are defined locally on Commons. The two wikis are in the same database cluster.
  2. Wikidata Query Service: You can write a SPARQL query that combines data from Wikidata with data from another SPARQL endpoint in the same query. Restrictions that apply: You can only federate with other SPARQL endpoints that are on a whitelist.
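
As a rough illustration of the second item, a federated query wraps the part to be answered by the other endpoint in a SPARQL 1.1 SERVICE clause. The sketch below only assembles such a query as a string. The function name buildFederatedQuery, the endpoint https://example.org/sparql and the remote predicate are hypothetical, and the query would only run if that endpoint were on the whitelist.

```javascript
// Sketch: assemble a SPARQL 1.1 federated query for the Wikidata Query Service.
// The remote endpoint and predicate passed in are hypothetical placeholders.
function buildFederatedQuery(remoteEndpoint, remotePredicate) {
  return [
    'SELECT ?item ?remoteValue WHERE {',
    // Local part, answered by Wikidata: items that are instances of human.
    '  ?item wdt:P31 wd:Q5 .',
    // Remote part, delegated to the other SPARQL endpoint.
    `  SERVICE <${remoteEndpoint}> {`,
    `    ?item <${remotePredicate}> ?remoteValue .`,
    '  }',
    '}',
    'LIMIT 10',
  ].join('\n');
}

console.log(buildFederatedQuery('https://example.org/sparql', 'https://example.org/prop/size'));
```

The remote pattern here assumes the other endpoint describes things using Wikidata entity URIs; in practice a mapping property is often needed to join the two datasets.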

What should exist

Please copy the section template and fill it out. The more precise you can be the better.

Input by User:Maxlath

  • I would like to be able to do this: be able to reuse entities from a whitelist of remote Wikibase instances, either as snak property or snak entity value. From a property or value input, the interface should suggest those remote entities.
  • I think this would be great because: it means we don't have to redefine in each Wikibase the properties and entities that are already defined elsewhere. I could, for instance, state that <my local item> -> main subject (P921) -> September (Q123), reusing P921 and Q123 from Wikidata, without having to redefine those entities locally and maintain them or keep them in sync.
  • Things we should keep in mind:
    • When your entities could possibly be accepted on another Wikibase instance (typically, following the acceptability criteria and with a compatible data license), having a mechanism to easily move an entity from one Wikibase to another would be quite convenient. Technically, that means re-creating the entity on the remote Wikibase (requires due authorizations, OAuth?), and turning the local entity into a redirection to that remote entity.
    • Merging a local entity into a remote entity would just mean turning that local entity into a redirection to that remote entity while transferring the data that can be transferred to the remote entity (involving the same authorization requirements as the point above on moving entities). This means we need a concept of a redirect to a remote entity.
    • Wikibase backend would need to have a mechanism to detect and propagate redirections happening on a remote instance
    • You might run into the case where you would like to set the value of a claim on a remote entity to a local entity. That's only a problem when that local entity can't, for some reason (e.g. notability criteria in Wikidata), be re-created on the remote instance; otherwise just move it there.
    • Reusing data from other places is what makes linked data great, but fully delegating the definition of a concept to a remote instance can be itchy. Due to your instance's specificities (different focus, different schema/data constraints, different copyright policy, etc), you might want to locally extend (or even overwrite?) the properties and values on that remote entity: having the possibility to edit a local "shadow entity" or "entity extension" of a remote entity could make sense in different use cases
    • As of today, every Wikibase instance re-attributes P and Q ids. To avoid conflicts and make things easier to understand, we could invite every Wikibase instance to centrally claim its preferred prefix (like wd: for Wikidata entities). Wikibase could then associate a prefix with each of its whitelisted remote Wikibase instances and use those prefixes to serialize WikibaseProperty, WikibaseItem (etc.) values. Those prefix conventions could then also be used and adapted in each Wikibase Query Service installation: if my instance's entity prefix is abc:, the convention would be to use abct:, abcp:, abcps:, abcpq:, abcpr: etc., as equivalents of Wikidata's wdt:, p:, ps:, pq:, and pr: prefixes.

-- Maxlath (talk) 20:28, 1 September 2019 (UTC)
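
The prefix convention proposed in the last point above can be sketched as a small helper that derives the query-service prefixes from an instance's claimed entity prefix. This only illustrates the naming scheme from the proposal; the function name derivePrefixes and the shape of the returned object are assumptions, not an existing Wikibase feature.

```javascript
// Sketch: derive query-service prefix names from a claimed entity prefix,
// following the proposed abc: -> abct:, abcp:, abcps:, abcpq:, abcpr: convention.
// derivePrefixes is a hypothetical helper, not part of Wikibase.
function derivePrefixes(base) {
  if (!/^[a-z][a-z0-9]*$/.test(base)) {
    throw new Error(`Invalid prefix: ${base}`);
  }
  return {
    entity: `${base}:`,           // counterpart of wd:
    truthy: `${base}t:`,          // counterpart of wdt:
    claim: `${base}p:`,           // counterpart of p:
    statementValue: `${base}ps:`, // counterpart of ps:
    qualifier: `${base}pq:`,      // counterpart of pq:
    reference: `${base}pr:`,      // counterpart of pr:
  };
}

console.log(derivePrefixes('abc'));
```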

Discussion and questions about this input

Input by Snipre

  • The first parameter to define is whether WD will continue to cover all subjects or whether some dedicated areas will be completely transferred to other wikibase DBs.
    So we have 2 possibilities:
    • WD keeps one item about every subject and other wikibase DBs only provide more detailed information. In that case the link between WD and other wikibase DBs can be handled in the same way as we manage URLs. The only thing to develop is the ability to perform an extraction over all databases sharing the same subject.
    • Some subjects (like scientific articles, books, ...) move completely to another wikibase DB, and then WD has to decide whether to copy the data into WD or to create a link between the data and the item where the data are used. Snipre (talk) 07:48, 2 September 2019 (UTC)

Discussion and questions about this input

This is a key point, and there should be a "meeting of the minds" here prior to any significant technical activities. It does not have to be all or nothing. There may be significant and ongoing elements of data donation, as described in my own input below. But federation is a kind of joint venture, and the parties will want to carefully define the relationship between themselves. External projects will not want the rug to be pulled from under them by having crucial data or relationships removed - either by community processes or Wikimedia fiat. Maybe Wikidata does not have the expertise to cover all topics, or the desire to do so for any number of reasons - performance, policy, liability. But before you set out limits, be aware that they may reduce the utility and quality of what you do allow, just as it has on Wikipedia for certain topic areas. If you do not, for example, allow all elements of a set to be on Wikidata because some are not 'notable' and a sister project complains, there may be little benefit to federation because people have to fill in everything Wikidata will not allow. And you are likely to lose enrichment on the items you do want. GreenReaper (talk) 09:18, 2 September 2019 (UTC)

Input by GreenReaper

  • I would like to be able to do this: Integrate [some of] WikiFur's data on furry conventions with Wikidata; and base lists, charts, maps and templates off said data, including content from both WikiFur and Wikidata/Commons.
  • I think this would be great because: Trying to manage data on 160+ events reveals a need for a centralized data store, especially as representations (various lists; info/navboxes; EasyTimelines; an event map) and languages multiply. We'd also like to avoid being a "furry silo"; sharing data Wikidata is willing to accept, which could prove useful for Wikimedia projects and other consumers within and outside the fandom.
  • Things we should keep in mind:
    • Wikipedia's own data on such events is limited by a paucity of 'reliable' sources, in part because many events have been unwilling to admit so-called 'reliable' media (having been burned in the past). This federation proposal is based in part on the premise that Wikidata will be willing to accept more than Wikipedia. (From what I've read the answer is "yes, but maybe not as much as you'd like". We need to be clear on where the bounds lie to avoid a situation where data or links are removed a few years down the road, breaking applications and templates we've built, as we have limited editorial or development capacity to respond to sudden changes.)
    • Our goal is to be comprehensive for annual organized events (including those with attendance of ~11,000 right down to 11). Ideally, this would include ticket/membership data, which we currently publish (via the map) as Event JSON-LD. A few events already have Wikidata items; the vast majority do not. It's not a priority to track the more-frequent "furmeets", though some larger ones have articles.
    • A non-trivial portion of data is provided by fans going to convention closing ceremonies and reporting announced attendance and charity donation figures. In some cases this could be substantiated by later postings on official websites or social media accounts. In other cases it will not. Clarity over how such data should be qualified, so that data consumers can use it or not as they wish, would be appreciated. If it is not acceptable at all, then it may be that we need a local Wikibase repo with a SPARQL union or similar. (Examples of relevant queries are likely to be useful for onboarding similar federated partners.)
    • Vandalism has been an issue for articles on WikiFur, but less so for data. However, it would help to be able to review relevant changes, since our domain experts may spot issues that Wikidata misses. Ideally this could be integrated into our own Recent Changes; we probably do not have sufficient use of personal watchlists for this to be an effective means of review (nor can we rely on editors checking watchlists on Wikimedia properties).
    • In rare cases, data is known to be a lower bound, not a final sum or there is concern over the accuracy of official data. If anyone knows where this applies, it's editors within the community; but such data will need to be appropriately qualified if imported to Wikidata (or even if searchable from Wikidata).
    • Some data relating to conventions relates to living people (although we do not record any yet). This is most likely to come up in relation to either key staff or guests of honour and other entertainers (an as-yet unfilled Event property). A related concern is that some people may not want their real name, date of birth, location, etc. linked to their fan name. WikiFur has a personal exclusion policy, but it would not eliminate all references of this kind, because we consider someone taking a key role in an event to have put themselves out there, at least under their fan name. I could see this resulting in the creation of an item just about the fan persona in some cases.
    • It'd help to have training opportunities to get up to speed with SPARQL and Lua as they relate to Wikibase. I'd then have a better idea of what is practical and could pass skills on to others in my community. I imagine this is true of other fan communities of which WikiFur is a somewhat-representative example - communities are not guaranteed to have staff with skills in system administration, data modelling, query planning or programming.
    • Fandom may not have the cultural cachet of GLAM. But leaving the field open will just let commercial providers soak up much of the youth audience for editing which could otherwise help to improve Wikimedia projects. Moreover, there is interest in such topics from the perspective of both scientific research and literary history (Fred Patten, author of this fandom retrospective, has also contributed to UC Riverside's Eaton collection). This aspect might become more relevant if we get into templates about works (which would be our "phase 2") because there is a significant number of furry fandom comics, novels and magazines (as well as several publishing businesses). Most are unlikely to ever feature in Wikipedia, although a few authors have been covered - but they exist and there is data about them which is of interest to us, and perhaps to others.
    • Our map currently contains image file suffixes for logos and other promotional images relating to events. These are typically only fair use and cannot be uploaded to Commons; nor to Wikipedia because in most cases they would not form part of an article. It isn't obvious whether references to such data would be suitable as a Wikidata property (maybe it'd be useful as a full URL), or if we need to record it locally.

Here's the documentation of data items for WikiFur's furry convention map (one of the more complicated data representations; most just correlate event instances with date, a numeric figure, and maybe location):

/* Add map points, details and optional pre-reg per https://developers.google.com/search/docs/data-types/event - escape internal apostrophes, e.g. 'Dealer\'s'.
addLocation(latitude, longitude, 'Name' {English WikiFur article}, 'XX/Logo.XXX' or ['XX/Logo.XXX','YY/Flyer.YYY',...] or null {last part of original image URL on en.wikifur.com, or full URL; logo first; current flyer second, ideally width 720px+, 50K+ px total}, 'furry.web/site' {no http[s]://}, 'Place name<br>Street<br>City, State Code<br>Country [notes in brackets]', 'Phone Number' {or ''}, 'Jan 1[-[Feb ]3] 20XX' {or 'Dec YY 20XX-Jan Z'}, attendance {or null}, minimum age {restrictions allowed; or null}, 'https://event/registration' {or null}, 'reg currency code' {ISO 4217; 'USD' not '$'; or ['USD','ARS'] if available in multiple currencies; or null}, [
['Name', 'Brief description', Price {include fees; 50 or 50.50, not 50,50; use [14.50,504.20] if multiple currencies}, 'Available from' {at price, datetime with offset; check DST}, 'Available until', 'SoldOut'/'PreOrder'/'InStoreOnly' {optional, use if definitively sold out/put to waiting list/only at-con respectively}],
[...]
] {or null if no registration}
);
*/

An example of a fully-detailed item for map and event listing purposes (more are available in the complete convention map script):

addLocation(52.474897, 13.459467, 'Eurofurence', '2f/Eurofurence_logo_crop.gif', 'eurofurence.org', 'Estrel Hotel Berlin<br>Sonnenallee 225<br>12057 Berlin<br>Germany', '+49 391 5949 0', 'Aug 19-23 2020', 3412, 18, 'https://reg.eurofurence.org/regsys/start.jsp', 'EUR', [
['Basic (Early Bird)','Wednesday-Sunday',90,'2019-01-19T07:00+0100','2019-02-01T00:00+0100'],
['Sponsor (Early Bird)','Wednesday-Sunday; T-shirt; Sponsor gifts; conbook mention',155,'2019-01-19T07:00+0100','2019-02-01T00:00+0100'],
['Super Sponsor (Early Bird)','Wednesday-Sunday; exclusive event access; T-shirt; Sponsor gifts; conbook mention',250,'2019-01-19T07:00+0100','2019-02-01T00:00+0100'],
['Basic','Wednesday-Sunday',100,'2019-02-01T00:00+0100','2019-04-01T00:00+0200'],
['Sponsor','Wednesday-Sunday; T-shirt; Sponsor gifts; conbook mention',165,'2019-02-01T00:00+0100','2019-04-01T00:00+0200'],
['Super Sponsor','Wednesday-Sunday; exclusive event access; T-shirt; Sponsor gifts; conbook mention',270,'2019-02-01T00:00+0100','2019-04-01T00:00+0200'],
['Basic (Late)','Wednesday-Sunday',125,'2019-04-01T00:00+0200','2019-08-01T00:00+0200'],
['Sponsor (Late)','Wednesday-Sunday; T-shirt; Sponsor gifts; conbook mention',190,'2019-04-01T00:00+0200','2019-08-01T00:00+0200'],
['Super Sponsor (Late)','Wednesday-Sunday; exclusive event access; T-shirt; Sponsor gifts; conbook mention',295,'2019-04-01T00:00+0200','2019-08-01T00:00+0200'],
['Single Day','Access for one day',60,'2019-08-01T00:00+0200','2019-08-17T24:00+0200']
]);
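
To show how such a record could map onto the Event JSON-LD mentioned above, here is a minimal sketch that turns a few of the addLocation fields into a schema.org Event object. It is not WikiFur's actual script: the function name toEventJsonLd, the object-style argument and the https:// prefix added to the website are assumptions, and the free-text date, address, phone and age fields are left out because they would need parsing.

```javascript
// Sketch: map a subset of the addLocation fields to a schema.org Event.
// toEventJsonLd and its argument shape are hypothetical.
function toEventJsonLd({ name, latitude, longitude, website, currency, tiers }) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Event',
    name,
    url: `https://${website}`, // assumption: the site is served over https
    location: {
      '@type': 'Place',
      geo: { '@type': 'GeoCoordinates', latitude, longitude },
    },
    // Each tier is [name, description, price, availableFrom, availableUntil, availability?].
    offers: (tiers || []).map(([tierName, description, price, validFrom, validThrough, availability]) => {
      const offer = {
        '@type': 'Offer',
        name: tierName,
        description,
        price,
        priceCurrency: currency,
        validFrom,
        validThrough,
      };
      // 'SoldOut', 'PreOrder' and 'InStoreOnly' are schema.org ItemAvailability values.
      if (availability) offer.availability = `https://schema.org/${availability}`;
      return offer;
    }),
  };
}

const eurofurence = toEventJsonLd({
  name: 'Eurofurence',
  latitude: 52.474897,
  longitude: 13.459467,
  website: 'eurofurence.org',
  currency: 'EUR',
  tiers: [['Basic', 'Wednesday-Sunday', 100, '2019-02-01T00:00+0100', '2019-04-01T00:00+0200']],
});
console.log(JSON.stringify(eurofurence, null, 2));
```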

Discussion and questions about this input

Input by User:Jumtist

  • I would like to be able to do this: Reuse a whitelist of Wikidata properties on another Wikibase instance.
  • I think this would be great because: it is a first step. We are coming from a centralized Wikibase world where Wikidata is omnipresent. It is great to hear about federation, but it is technically challenging. So let's start small by sharing ontologies/properties.
  • Things we should keep in mind: Sharing properties can be defined as the possibility of reusing external properties, which eases setting up a Wikibase instance because the same properties are not redefined in every instance. It would work like symbolic links, leaving the data elsewhere. But this feature should probably come with the possibility of splitting/forking a property when necessary (i.e. if a Wikidata property changes its datatype and no longer fits the schema of your instance, you would need to be able to create your own property "based on" the WD prop). We are not talking about any merge possibility yet, but if we could share some props (and, in the longer run, entities), we would want to be able to fork them easily.

Discussion and questions about this input

I agree and think it's necessary. However, it should be kept in mind that Wikidata properties have not been defined with this concern in mind. When I tried to reuse a bunch of properties in my local Wikibase, I noticed that some of them don't fit well outside Wikidata. For instance, take occupation (P106), whose description states: occupation of a person; see also "field of work" (Property:P101), "position held" (Property:P39). I see two potential problems:

  1. the numerical identifiers mentioned in the description won't be mapped to the Wikibase property numbers/won't make sense for a new user
  2. even if a mapping were created to adapt the P numbers, the related properties (P101 and P39) may be judged irrelevant or too specific in a certain context and, therefore, would not be imported into the 'local' Wikibase...

Moreover, and this applies to the reuse of both properties and items, some statements associated with a given item may be considered 'noise' and irrelevant (again, either "meta" properties related to Wikidata in particular, or overly specific ones such as the flag or the motto of a town item), reducing the value of an automated import. --Anchardo (talk) 16:41, 2 September 2019 (UTC)

To take a page from programming: what may be needed for interaction between Wikibase instances is an interface. I believe the rough equivalent in this domain is a Shape Expression (ShEx); there is a WikiProject for this and it is in the development plan (albeit perhaps not specifically with federation in mind), which led to the creation of the EntitySchema extension. Of course you are right that there must be agreement on the semantics as well, but it might be best to use these to focus on the minimum details on which there needs to be agreement and use federated queries relying on such shapes, rather than trying to import entire properties. GreenReaper (talk) 19:19, 2 September 2019 (UTC)

Input by Yurik (talk)

  • I would like to be able to do this: Easily extend WDQS query UI for my own site with the following customizations:
    • Add more than one custom "data freshness" indicator, i.e. how stale the Wikidata data is plus the freshness of my local Blazegraph instance (last update). Note that non-WDQS sites might have a different definition of how old is stale, i.e. a once-an-hour update is reasonable in some cases.
    • Be able to quickly re-target examples, help, and other links to a different site
    • Be able to extend rather than replace pre-defined namespaces. For example, sophox.org (OpenStreetMap data as RDF) defines its own namespaces, such as osmd: instead of wd: for the entities defined in Wikidata. The issue is that most Sophox users are also interested in federation, and for that they need to use Wikidata namespaces, so they want to continue using wd:, wdt: and the others in a federated subquery. The UI should be able to use both my Blazegraph instance and WDQS, depending on the namespace used.
  • I think this would be great because: make it easy to use WDQS together with a custom data store.
  • Things we should keep in mind:
    • Developers, especially the more seasoned ones, would rather code to the latest standard and use modern tools than manually (in their heads) backport to an older browser. Modern development tools such as Babel and a bundling/compilation step would be good to use, to make sure no one accidentally uses some new construct like "for of" or a promise and later finds out that it doesn't automatically get backported.
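
The namespace-based routing described in the third customization above could look roughly like the following sketch, which scans a query for known prefixes and reports which endpoints it touches. The helper endpointsForQuery, the routing table and the local endpoint URL are hypothetical; a real UI would more likely rely on a SPARQL parser than on a regular expression.

```javascript
// Sketch: decide which SPARQL endpoints a query needs, based on the prefixes it uses.
// The routing table is illustrative; the local endpoint URL is a placeholder.
const endpointByPrefix = {
  wd: 'https://query.wikidata.org/sparql',
  wdt: 'https://query.wikidata.org/sparql',
  osmd: 'https://example.org/sparql', // hypothetical local Blazegraph instance
};

function endpointsForQuery(query, routes) {
  // Drop full IRIs such as <https://example.org/x> so they are not mistaken for prefixes.
  const withoutIris = query.replace(/<[^<>\s]*>/g, ' ');
  const found = new Set();
  // Match prefixed names such as wd:Q5 and look up the part before the colon.
  for (const match of withoutIris.matchAll(/\b([A-Za-z][A-Za-z0-9]*):[A-Za-z0-9_]/g)) {
    const endpoint = routes[match[1]];
    if (endpoint) found.add(endpoint);
  }
  return [...found];
}

console.log(endpointsForQuery('SELECT * WHERE { ?x osmd:example ?y . ?y wdt:P31 wd:Q5 }', endpointByPrefix));
```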

Discussion and questions about this input

--Yurik (talk) 22:21, 2 September 2019 (UTC)

Input by ArthurPSmith

  • I would like to be able to do this: create a very large wikibase instance that overlaps somewhat with what Wikidata holds, and automatically synchronize data for linked items between my wikibase and Wikidata. Example: all of the billion stars in the GAIA archive (Wikidata currently has items for about 50,000 stars). Or all of the hundreds of millions of business enterprises in the world, e.g. from OpenCorporates (Wikidata currently has about 200,000 items for businesses).
  • I think this would be great because: these databases are too big for Wikidata, but it would be great to transform them into a common data model and coordinate/cross-reference them, linking this data to the rest of the universe of data we have.
  • Things we should keep in mind:
  1. QIDs of linked items will be different in my wikibase and in Wikidata - a special property like "Wikidata ID" could hold the relationship perhaps.
  2. It should be possible to reference Wikidata items as values for item-valued properties directly without having to copy those items to my wikibase (for example for references, "stated in" xxxx should allow for the reference to be a Wikidata item, not a local wikibase one). If this isn't possible then synchronization becomes more difficult because any new item value for a statement on a linked item on the Wikidata side would need to also be copied.
  3. Property IDs would probably be different too but should have similar relationship possibilities (i.e. synchronized/linked or referenced without copying)
  4. User IDs for people making updates may need to be translated somehow in the synchronization process... Should updates be applied incrementally exactly as they happen, or batched?
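
Points 1 and 2 above can be sketched as a lookup from Wikidata QIDs to local QIDs, built from a "Wikidata ID" link property, with a fallback to a direct remote reference when no local copy exists. The item structure below is a simplified stand-in rather than the real Wikibase JSON, and the names buildLinkIndex and resolveValue and the property id P1 are hypothetical.

```javascript
// Sketch: map Wikidata QIDs to local QIDs via a "Wikidata ID" link property.
// Items use a simplified shape { id, claims: { <property>: [values] } },
// not the real Wikibase entity JSON.
function buildLinkIndex(localItems, linkProperty) {
  const toLocal = new Map();
  for (const item of localItems) {
    for (const wikidataId of item.claims[linkProperty] || []) {
      toLocal.set(wikidataId, item.id);
    }
  }
  return toLocal;
}

// Resolve an item value coming from Wikidata: use the linked local item if one
// exists, otherwise keep a reference to the Wikidata item without copying it.
function resolveValue(wikidataId, toLocal) {
  return toLocal.has(wikidataId)
    ? { repository: 'local', id: toLocal.get(wikidataId) }
    : { repository: 'wikidata', id: wikidataId };
}

const index = buildLinkIndex(
  [{ id: 'Q7', claims: { P1: ['Q123'] } }, { id: 'Q8', claims: {} }],
  'P1' // hypothetical local "Wikidata ID" property
);
console.log(resolveValue('Q123', index), resolveValue('Q999', index));
```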

Discussion and questions about this input

Input by Susanna Ånäs

  • I would like to be able to do this:
    • Wikidocumentaries creates a page for any Wikidata item (later also local wikibase items), displays related data in a visual way and allows users to enrich the data. I would like to be able to create new local items in the Wikidocumentaries wikibase and use Wikidata properties to describe them. Also, I would like to use Wikidata items or local items as values in statements. The local database will also need its own properties for some application-related features. The local items could be imported to Wikidata based on a schema when they are mature enough.
    • For items already displayed by Wikidocumentaries (Wikidata/local), Wikidocumentaries can make federated queries to other SPARQL endpoints to retrieve and display data and content. A crowdsourcing tool would allow the user to verify if the related piece of data is valid to enrich the item. The tool would then write a statement with all necessary sources to the local wikibase if the item existed locally, or Wikidata, if it existed in Wikidata.
    • In the federated landscape, Wikidocumentaries could display a page about items that exist in any federated Wikibase, based on search matches. For this, the search (pulldown) needs to take them all into account. The number of hits will grow remarkably, and the search features need to be developed. In Wikidocumentaries, the user's task is to collect and import data and content to Wikimedia projects. The item from a federated source could be saved/linked to the local Wikibase and/or Wikidata, or not saved at all, if the federated data is enough. The local item will be needed when there are contributions that cannot be saved in any Wikimedia projects, such as testimonial articles, user-defined picture collections etc. related to the topic.
    • Local wiki articles are the counterparts of Wikipedia articles as part of Wikidocumentaries pages. I would like to solve how they would migrate with the items, how they are linked to the data item in the local wikibase, how to create links in these articles across the federated landscape, and how translations are maintained and if translation tools can be used.
  • I think this would be great because: The purpose of Wikidocumentaries is to give tools to users to explore GLAM content from across open repositories and associate them with items in Wikidata. New items could be added based on other sources or own research. Family or local researchers can add topics that have only limited interest, and be able to collect, arrange and remix material about that as well. These topics can be pulled to Wikidata upon some level of maturity or notability via the federated infrastructure. This will allow a flow of data from reliable sources evaluated by users and user communities to Wikidata.
  • Things we should keep in mind: It is very important to keep track of data provenance.

Susanna Ånäs (Susannaanas) (talk) 05:32, 5 September 2019 (UTC)

Discussion and questions about this input

Input by Pintoch

  • I would like to be able to do this:
    • First, be able to replicate the federation between Commons and Wikidata outside the Wikimedia infrastructure. In other words, be able to configure a Wikibase instance X to take all its properties and property values from Wikibase instance Y. If I remember correctly this is currently only possible for instances relying on the same SQL database cluster: this goal would therefore amount to lifting this restriction.
    • Second, generalize that, in the ways hinted by others above (see User:ArthurPSmith's suggestions for instance)
  • I think this would be great because:
    • As a tool maintainer it would be massively useful if the relationship between Commons and Wikidata could be one particular instance of a generic federation scenario. This would make it easier to write tools that would be applicable beyond Wikimedia wikis, supporting the wider Wikibase ecosystem.
    • This first goal looks like a good first step because Commons already demonstrates the practical usability and usefulness of this sort of federation. It is therefore a sane exercise to generalize this functionality to make it available outside Wikimedia infrastructure.
  • Things we should keep in mind:
    • I think it would be good to dedicate a lot of attention to the discoverability of these federation scenarios. As a user (i.e. an API client), how do I discover that the Wikibase endpoint served at https://commons.wikimedia.org/w/api.php uses the Wikibase endpoint https://www.wikidata.org/w/api.php for its properties and values? In the JSON serialization of a MediaInfo on Commons, how do I discover that the Qids refer to Wikidata? (By discover I mean programmatically infer from the API itself.) This is a basic principle of the semantic web, which promotes the use of URIs rather than ids, since ids are meaningless when taken out of context.

Pintoch (talk) 14:28, 25 September 2019 (UTC)
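
The point about URIs versus bare ids can be illustrated with a small sketch that expands a repository-scoped id into a full concept URI and back. The Wikidata base http://www.wikidata.org/entity/ is the one Wikidata uses for its entities; the second repository, its URL and the helper names toConceptUri and fromConceptUri are hypothetical.

```javascript
// Sketch: convert between repository-scoped ids and full concept URIs.
// The "mywiki" entry is a hypothetical second Wikibase.
const conceptBaseUris = {
  wikidata: 'http://www.wikidata.org/entity/',
  mywiki: 'https://wikibase.example/entity/',
};

function toConceptUri(repository, id) {
  const base = conceptBaseUris[repository];
  if (!base) throw new Error(`Unknown repository: ${repository}`);
  return base + id;
}

// Returns { repository, id } for a known base URI, or null when the URI is foreign.
function fromConceptUri(uri) {
  for (const [repository, base] of Object.entries(conceptBaseUris)) {
    if (uri.startsWith(base)) return { repository, id: uri.slice(base.length) };
  }
  return null;
}

console.log(toConceptUri('wikidata', 'Q123'));
console.log(fromConceptUri('https://wikibase.example/entity/Q123'));
```

With full URIs, the two Q123 entities stay distinguishable even after the data leaves the API response that produced it.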

Discussion and questions about this input

Input by Luca Mauri

  • I would like to be able to do this: To link the items from my own Wikibase instance to corresponding items in Wikidata and be able to get existing properties from it.
    • This should be done with a mechanism similar to the Sitelinks
    • In the item page of my custom Wikibase, users should be able to add additional properties, when they don't exist in Wikidata and for any reason they cannot be added there
    • From the regular wiki pages, users should be able to read both the properties existing in my custom Wikibase and the properties of the related linked item in Wikidata
  • I think this would be great because:
    • As Wikidata is designed to be a repository of anything, there is no point in inserting in a custom Wikibase the same information that exists there already. A custom Wikibase should be able to take advantage of what already exists, reuse it as it is, and expand it with custom properties
  • Things we should keep in mind:
    • A system must be introduced to differentiate Properties from the custom Wikibase and from Wikidata: in both repositories there will eventually be a P1, P2, P3 and so on. They will most likely represent different kinds of data under the same identifier, so there must be a way to use unique numbers. Either a central repository of Property numbers must be created or some prefix specific to the repository must be added to the Property number.
      • The first solution means that anybody can claim any number of Properties and all the Wikibase installations inside the Federation must comply with this standard. This is probably too difficult to coordinate and too prone to errors. Alternatively, GUIDs could be used instead of numbers, but this would further complicate handling for users
      • The second solution is probably easier, as the existence of prefixes is a de facto standard (as seen in the sites table, for instance): the syntax can be formalized so that something like {{#statements:P102}} always points to the local Wikibase installation while something like {{#statements:swb:P102}} points to the Property numbered 102 in some remote Wikibase installation prefixed swb.

--Lucamauri (talk) 15:16, 6 October 2019 (UTC)
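
The prefixed syntax proposed in the second solution could be recognized along the lines of the sketch below, which splits a call into an optional repository prefix and a Property id. The {{#statements:swb:P102}} form is the proposal made here, not existing parser-function syntax, and parseStatementsCall is a hypothetical helper.

```javascript
// Sketch: parse the proposed {{#statements:[prefix:]Pnnn}} syntax.
// Returns { repository, property }, with repository 'local' when no prefix is given,
// or null when the text does not match the expected form.
function parseStatementsCall(text) {
  const match = /^\{\{#statements:(?:([a-z][a-z0-9]*):)?(P[1-9][0-9]*)\}\}$/.exec(text);
  if (!match) return null;
  return { repository: match[1] || 'local', property: match[2] };
}

console.log(parseStatementsCall('{{#statements:P102}}'));
console.log(parseStatementsCall('{{#statements:swb:P102}}'));
```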

Discussion and questions about this input

Input by XXX

  • I would like to be able to do this:
  • I think this would be great because:
  • Things we should keep in mind:

Discussion and questions about this input