Wikidata talk:REST API feedback round/Archive2


Additional clarifications after initial spec feedback

Simplification of structure for labels, descriptions and aliases

Hi everyone,

We are now in the process of implementing the first version of the Wikibase REST API. We came across one thing that we think we should change from the spec in a non-trivial way because it just seems to make more sense. The REST API as specified returns labels, descriptions and aliases for Items in a rather elaborate structure. We think it is better to simplify it, without losing any information. You can see the details of the current and the proposed structure on Phabricator: phab:T305362. I'm quickly checking back with you here to see whether anyone sees a good reason not to make this change.

Cheers --Lydia Pintscher (WMDE) (talk) 14:10, 11 April 2022 (UTC)

  Support Looks fine to me! ArthurPSmith (talk) 14:59, 11 April 2022 (UTC)
  Comment So much better with the flatter structure! A few further suggestions/questions for further flattening and simplifying the schema:
  • Could sitelinks[siteid]->site be removed, since sitelinks[siteid] already provides the ID of each sitelink as its key?
  • statement and statement->mainsnak have a fair bit of duplication from the look of things. Could statements[propertyid]->statement[0..n]->mainsnak->datavalue become statements[propertyid]->statement[hash]->datavalue (and similar for datatype)? Could statements[propertyid]->statement[0..n]->id and statements[propertyid]->statement[0..n]->mainsnak->hash then be removed (as well as other seemingly unnecessary fields such as mainsnak->property, etc.)?
  • Could statements[propertyid]->(qualifiers|references)[0..n]->hash be replaced with statements[propertyid]->(qualifiers|references)[hash]?
  • Could snaks-order and qualifiers-order be removed or does Wikibase still plan to advertise an ability to users to change the order by replacing all qualifiers and references for a statement at once?
--Dhx1 (talk) 13:00, 1 August 2022 (UTC)
When creating a new REST API it would be great to also support https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2021/Wikidata/Allow_access_to_some_inverse_statements_through_Lua_and_parser_functions . This feature would allow Wikidata to be used better by the Wikipedias and allow us to have less duplicated data. ChristianKl 15:36, 6 August 2022 (UTC)

Feedback by Dhx1

  • I’m currently using the action API in this way: Indirectly with Pywikibot (Q15169668) (predominantly for User:RfcBot), wikibase-cli (Q87194660) and OpenRefine (Q5583871).
  • I think this is great about the proposed REST API (incl. why): Easier and potentially more efficient (fewer requests across the Internet) integration of Wikibase with other systems/tools.
  • I think this could be improved about the proposed REST API (incl. why):
    • SPARQL integration would greatly assist with GET, PATCH and DELETE actions. For example, GET /entities/items where the parameter for selecting items is a SPARQL query against Wikidata Query Service (Q20950365) (or other endpoint) that returns a list of items. Or as another example, PATCH /entities/items/ where the parameter for selecting items to patch is a SPARQL query against Wikidata Query Service (Q20950365) (or other endpoint). (Perhaps separate permissions would be needed for the bulk PATCH approach and only applied to approved bot accounts)
    • PATCH and DELETE actions could be improved by allowing a list of items to be supplied. For example, allowing deletion of multiple statements or references as a single action.
    • For JSON Patch (RFC 6902: JavaScript Object Notation (JSON) Patch (Q47467655)) the test operation only supports simple equality tests. To be more useful this could perhaps be extended to also support other tests such as:
      • PCRE regex such that updates only occur when a new value is NOT a:
        • match against the case insensitive old value
        • match against the old value when ignoring whitespace
        • match against the old URL with or without a trailing slash
        • ...etc
      • less precise than number/date (allows for values to be replaced only with more precise numbers/dates)
    • Merging of items could be included as a new API method.
    • API methods for retrieving item history would be valuable to many external clients (mostly useful to anti-spam/vandalism bots).
  • Concerns I have:
    • PUT actions appear dangerous due to race conditions. For example, what happens if the entity has been changed by client B between client A reading an entity with GET /entities/items and then client A attempting to replace the entity via PUT /entities/item/{entity_id}?
    • Why is snaks-order / qualifiers-order part of the schema? I thought there was no guarantee of order being consistent?
    • Could documentation of the schema include information on expected Snak->datatype and Snak->datavalue, particularly for more complex types such as coordinates, quantities with units and precisions and timestamps with precisions?
    • The example value in the documentation for GET /entities/{entity_type}/{entity_id}/statements is seemingly missing an ID value that GET /statements/{statement_id} is stated to require (e.g. in the format of Q42$F078E5B3-F9A8-480E-B7AC-D97778CBBEF9).
  • Other comments:
    • datatype is returned to clients, but this information is probably not of much value to clients on its own. Clients would likely need to have first downloaded statements against the relevant property to understand other details beyond datatype such as constraints on the property values (minimum value, maximum value, regex for required format, etc).

--Dhx1 (talk) 10:40, 1 August 2022 (UTC)
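
To make the JSON Patch suggestion above concrete, here is a minimal client-side sketch of the looser comparisons listed there. RFC 6902's test operation only checks strict equality; the function names and the `mode` values below are hypothetical and are not part of RFC 6902 or the Wikibase REST API.

```python
import re


def values_equivalent(old: str, new: str, mode: str = "exact") -> bool:
    """Compare two values; every mode except 'exact' is a hypothetical extension."""
    if mode == "exact":
        # The only comparison the RFC 6902 "test" operation offers.
        return old == new
    if mode == "case-insensitive":
        return old.casefold() == new.casefold()
    if mode == "ignore-whitespace":
        return re.sub(r"\s+", "", old) == re.sub(r"\s+", "", new)
    if mode == "url-trailing-slash":
        # Treat URLs that differ only by trailing slashes as the same.
        return old.rstrip("/") == new.rstrip("/")
    raise ValueError(f"unknown mode: {mode}")


def should_replace(old: str, new: str, mode: str) -> bool:
    """Update only when the new value is NOT a trivial variant of the old one."""
    return not values_equivalent(old, new, mode)
```

With this, `should_replace("https://example.org/", "https://example.org", "url-trailing-slash")` would skip the update, while a genuinely different URL would go through.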

The PUT race condition scenario could be solved by making the if-match header required instead of optional. I think this might be a good idea. VeniVidiVicipedia (talk) 12:15, 1 August 2022 (UTC)
Looks like a good way to solve that problem :) Dhx1 (talk) 13:28, 1 August 2022 (UTC)
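
For readers unfamiliar with the idea, the required If-Match approach can be sketched with a toy in-memory store. This is only an illustration of optimistic concurrency in general: `ItemStore`, its methods and the use of a revision number as the ETag are assumptions made for the example, not how the Wikibase REST API is implemented.

```python
class PreconditionRequired(Exception):
    """Stands in for HTTP 428: the client sent no If-Match value."""


class PreconditionFailed(Exception):
    """Stands in for HTTP 412: the If-Match value is stale."""


class ItemStore:
    """Toy server-side store; each item carries a revision used as its ETag."""

    def __init__(self):
        self._items = {}  # item id -> (revision number, data)

    def create(self, item_id, data):
        self._items[item_id] = (1, dict(data))

    def get(self, item_id):
        revision, data = self._items[item_id]
        return str(revision), dict(data)  # (ETag, copy of the data)

    def put(self, item_id, data, if_match=None):
        revision, _ = self._items[item_id]
        if if_match is None:
            raise PreconditionRequired("If-Match is required for PUT")
        if if_match != str(revision):
            raise PreconditionFailed("item changed since it was read")
        self._items[item_id] = (revision + 1, dict(data))
        return str(revision + 1)  # new ETag
```

If clients A and B both read revision 1 and B writes first, A's later `put` with the old ETag raises `PreconditionFailed` instead of silently overwriting B's change; A then has to re-read and retry.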
Thank you so much for your feedback! Here are a few replies:
  • SPARQL integration: We are currently not planning to integrate SPARQL into the REST API because we believe the areas where it would be needed are better served with other APIs in the future (e.g. GraphQL). But we will keep looking into it as we build out the API.
  • Allowing PATCH and DELETE actions to be supplied with a list of Items: That generally sounds like a good idea we’ll put on the list for a future release.
  • Expanding the test operation for JSON PATCH: Those sound useful, yes. We’ll put that on the list for a release after the first version.
  • Merging two Items: Very good point. The REST API will need to support it. We now have T314859 for tracking it for one of the next releases.
  • API module for accessing the history of an Item: We delegate that to the MediaWiki REST API and it can already do some of that: https://www.wikidata.org/w/rest.php/v1/page/Q42/history
  • PUT actions appear dangerous due to race conditions: We need to discuss further how to address this. We now have T314860 for it.
  • Snak and qualifier order: Good point. We now have T314861 for it.
  • Documentation and examples: Yes, we’ll definitely polish and expand that. We now have T314874 for it.
  • Including additional information about the datatype: It seems to depend quite a lot on what you’re trying to build if that is enough or not. We’ll keep an eye on it.
Thanks a lot again for having a look and giving feedback. - Mohammed Sadat (WMDE) (talk) 14:34, 11 August 2022 (UTC)
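
As a rough illustration of the history point above, the following sketch builds the MediaWiki REST API history URL mentioned in the reply and fetches it as JSON. The helper names and the User-Agent string are made up for the example, and nothing is assumed here about the structure of the response.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the URL quoted in the reply above.
BASE = "https://www.wikidata.org/w/rest.php/v1"


def history_url(title: str) -> str:
    """Build the page history URL for a page title such as 'Q42'."""
    return f"{BASE}/page/{quote(title, safe='')}/history"


def fetch_history(title: str) -> dict:
    """Fetch and decode the history response (performs a network request)."""
    request = Request(
        history_url(title),
        headers={"User-Agent": "rest-api-feedback-example/0.1"},  # placeholder
    )
    with urlopen(request) as response:
        return json.load(response)
```

Calling `history_url("Q42")` gives back exactly the URL from the reply; `fetch_history("Q42")` needs network access.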

Feedback by Nicereddy

So, I spent a few hours today (plus a few hours building out a scaffold a few weeks ago when I found the REST API Swagger docs) putting together a now-mostly-working Ruby gem for the new REST API: https://github.com/connorshea/wikidatum

I know most of the Wikidata/Wikibase community uses Python, but I like Ruby and I use the existing MediaWiki APIs a lot for various bot scripts and to pull video game item data into vglist, so I'm writing this mostly just for my own personal usage :)

Anyway, I have some feedback based on that experience. I think the REST API works really well overall, and is *a lot* better than the old API, but there are definitely some rough edges which could use smoothing out.

  • The lack of docs for all the possible formats of `datavalue` is pretty frustrating, although I know there's this page in the Wikimedia docs explaining the different datavalue formats (it's just not linked from anywhere, as far as I know?).
  • Ideally, datavalue would have its own Swagger schema that covers the various possible formats of the values in it (e.g. each DataValue type should have its own Schema at the bottom defining what it looks like).
  • It's a bit weird that the structure of the sitelinks repeats the site name like this? Would it make sense to remove the "site" attribute, or are there cases where the site and key here would differ?
"sitelinks": {
  "dewiki": {
    "site": "dewiki",
    "title": "Data-Bridge",
    "badges": []
  },
  "enwiki": {
    "site": "enwiki",
    "title": "Data bridge",
    "badges": []
  }
}
  • novalue and somevalue statements having a different structure for mainsnak really kicked me in the teeth as I tried to implement this. I got stuck for a good while trying to figure that one out. Can we give it a datavalue of some kind, even if it's just { "type": "novalue", "value": null }? If not, we should at least make note of that in the REST API docs as a major caveat for integrating with the REST API.
  • I don't see any Swagger documentation - or any docs anywhere else - about how authentication works with the REST API. There's also not much information about things like error states or rate limiting.
  • There aren't examples in the Swagger docs for how references or qualifiers work for PUT/POST/PATCH requests as far as I can tell, those would be very nice to have.
  • Similarly, there isn't anything in the Swagger docs that goes over how to use different types of datavalue formats in PUT/POST/PATCH requests. Adding a simple string property is easy enough, but things like external IDs are more complex and not very obvious. They should definitely be covered either by the Swagger docs or some docs elsewhere.
  • Would it be possible to have the qualifier orders and snak orders in the qualifiers/snaks themselves? E.g. rather than a separate array listing the order, each qualifier/snak would have its own attribute like order: 1.
  • I noticed that there's no Swagger documentation on the badges attribute for sitelinks. It'd be good to explain the format of that array, since I didn't know what it'd look like until I saw in the Wikimedia Docs for the Wikibase JSON format that it's an array of Q-IDs representing the specific badges an article has.
  • Does the beta Wikidata instance have a list of good items somewhere? I've managed to find a few, but a lot of items don't have any statements, so it's not great for testing the REST API.
  • I'm not sure if it's a limitation of Swagger, but the types defined for descriptions, aliases, and labels could be a bit clearer to show that the keys in the map are language codes.
  • I'd definitely like to see the editing of labels, descriptions, and aliases be a priority after the initial MVP of the REST API launches on stable Wikidata :)


This has progressed a ton and is looking great, I'm looking forward to further improvements and the stable release of the API :D

Nicereddy (talk) 00:00, 7 August 2022 (UTC)
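
Two of the points above (novalue/somevalue snaks having no datavalue, and sitelinks repeating the site ID) can be handled defensively on the client side. This is a minimal sketch based only on the structures described in this thread; the function names are hypothetical.

```python
def snak_value(mainsnak: dict):
    """Return (snaktype, value); value is None for novalue/somevalue snaks."""
    snaktype = mainsnak.get("snaktype")
    if snaktype == "value":
        return snaktype, mainsnak["datavalue"]["value"]
    if snaktype in ("novalue", "somevalue"):
        # These snaks carry no "datavalue" key at all, so do not index into it.
        return snaktype, None
    raise ValueError(f"unexpected snaktype: {snaktype!r}")


def drop_redundant_site(sitelinks: dict) -> dict:
    """Return a copy of the sitelinks map without the repeated 'site' attribute."""
    return {
        site_id: {key: val for key, val in link.items() if key != "site"}
        for site_id, link in sitelinks.items()
    }
```

Returning the snaktype alongside the value keeps "no value" and "unknown value" distinguishable even though both map to `None`.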

Am I supposed to be able to delete statements without auth? Because I just did while building out the delete statement method for my gem 😅 Nicereddy (talk) 03:31, 7 August 2022 (UTC)
It’s really cool to see your Ruby gem. That was quick! And it looks like it’s a great way to find some issues we need to address, especially documentation. Here are replies to the individual points:
  • Various documentation issues: Yes we didn’t prioritize this yet but definitely need to polish it for the first proper release. Thanks for collecting the specific examples. That helps to prioritize. We now have T314874 for collecting these and tackling them.
  • Sitelink structure repeating the site name: We took that over from the action API so far and will discuss simplifying it. We’ll have to do some digging for why this structure was initially chosen.
  • Qualifier and snak order: We are looking into removing these in T314861.
  • List of good Items on beta Wikidata: No such list exists but feel free to make changes to any existing Item on beta to have the data you need for your testing. It’s a free-for-all testing ground.
  • Prioritizing editing of labels, descriptions and aliases: Noted. Thanks!
  • Deleting statements without authentication on beta: Yes that should be allowed as editing without an account is allowed on beta. And there isn’t really a technical difference between editing and deleting a statement for that.
Thanks again for going through all of it and the effort to build the Ruby gem. It’s very cool to see that happen in this early stage. - Mohammed Sadat (WMDE) (talk) 14:41, 11 August 2022 (UTC)
@Mohammed Sadat (WMDE) Regarding the auth on deleting statements, I'd be a bit worried about the API enabling mass-vandalism if that's the case. I guess that's already possible by just automating a browser, but it definitely lowers the barrier somewhat (not sure if Wikimedia uses anything like Cloudflare to protect against bots hitting the site via a browser?). You can still IP ban them if the vandalism is discovered, but if they just spin up new virtual servers on some cloud provider, the IP could keep changing and that may not be very easy to prevent.
My understanding is that the "old" MediaWiki API always requires authentication, at least based on how the Ruby gem for the MediaWiki API works. Nicereddy (talk) 18:38, 12 August 2022 (UTC)
We will dig deeper into the topic as it is important, of course. There is phab:T316354 for it now. -Mohammed Sadat (WMDE) (talk) 10:05, 1 September 2022 (UTC)
@Mohammed Sadat (WMDE) So, I spent a good chunk of time running into a wall while implementing the "add statement" method in my library, because I was sending this as the body of the POST request to /items/Q123/statements:
{
  "statement": {
    "mainsnak": {
      "snaktype": "value",
      "property": "P625",
      "datatype": "string",
      "datavalue": {
        "type": "string",
        "value": "test data"
      }
    }
  },
  "bot": true,
  "tags": [],
  "comment": "foo"
}
The error message was just { "code": "invalid-statement-data", "message": "Invalid statement data provided" } which wasn't very useful. Apparently, the Swagger docs are missing the "type" property that should be a sibling of mainsnak. I only figured that out by looking at the unit tests in the Wikibase git repo. 😅
So *this* works:
{
  "statement": {
    "mainsnak": {
      "snaktype": "value",
      "property": "P625",
      "datatype": "string",
      "datavalue": {
        "type": "string",
        "value": "test data"
      }
    },
    "type": "statement"
  },
  "bot": true,
  "tags": [],
  "comment": "foo"
}
So 1) it'd be great if the error message could specify what attributes are missing for the statement to be valid (this is probably easier said than done, but it'd be significantly better than the current situation), and 2) the Swagger docs definitely need to be updated to include the "type" attribute for the POST request, and it'd probably be worth checking for that problem in the other statement-related requests. (EDIT: I've created https://phabricator.wikimedia.org/T315164 for this issue).
Thanks, Nicereddy (talk) 03:28, 13 August 2022 (UTC)
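
For anyone hitting the same wall, here is a small sketch of a helper that builds the request body in the shape that worked in the comment above, always including the "type": "statement" sibling of mainsnak. The helper name and its parameters are hypothetical, and it only covers the string case shown in this thread; the accepted format may have changed since this discussion.

```python
import json


def build_add_statement_body(property_id, value, comment="", bot=False, tags=None):
    """Build a body for POST .../statements with a single string-valued mainsnak."""
    return {
        "statement": {
            # Sibling of "mainsnak"; reported above as missing from the Swagger docs.
            "type": "statement",
            "mainsnak": {
                "snaktype": "value",
                "property": property_id,
                "datatype": "string",
                "datavalue": {"type": "string", "value": value},
            },
        },
        "bot": bot,
        "tags": list(tags) if tags else [],
        "comment": comment,
    }


body = build_add_statement_body("P625", "test data", comment="foo", bot=True)
payload = json.dumps(body)  # string to send as the request body
```

Serialising through `json.dumps` also rules out hand-written JSON mistakes such as a missing comma between keys.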
Thanks! We’re also looking into whether we even need it in phab:T316077. -Mohammed Sadat (WMDE) (talk) 10:14, 1 September 2022 (UTC)
A few more issues I've hit while working on the Ruby gem yesterday and this morning:
  • The "datatype" and "value.type" in a mainsnak sometimes differ, and not just "string" vs "url" (which I understand the need for), but stuff like "globecoordinate" vs "globe-coordinate". This is very silly, and I'd appreciate it if they were using the same names.
  • The error messages for invalid statements on the POST endpoint for statements are really frustrating. No matter what you get wrong, it'll just tell you "Invalid statement data provided" and nothing else. I mentioned this problem in a previous comment, but I wanted to emphasize the importance of the error message having specific details. If I'm missing the "type: statement" key-value pair, it should tell me in the error message that I'm missing the "type" key. If I'm using the wrong format for a property (e.g. passing an integer instead of "P123") it should ideally tell me that as well. If I completely skip the "mainsnak", or nest "property" under "datavalue", or give an invalid "snaktype" of "blahvalue", these things should be pointed out specifically. There are tons of ways the statement can be invalid, so being specific about the problem would be incredibly useful. This was one of my biggest problems with the old Wikidata API, and I'd love if it wasn't also a problem in the new REST API.
Thanks, Nicereddy (talk) 14:46, 13 August 2022 (UTC)
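
Until the server-side messages improve, a client can catch the mistakes listed above before sending anything. The following is a hypothetical client-side pre-check, not the API's own validation; the accepted snaktypes and the P-number pattern are assumptions based on what is described in this thread.

```python
import re

VALID_SNAKTYPES = {"value", "novalue", "somevalue"}
PROPERTY_ID = re.compile(r"^P[1-9]\d*$")  # e.g. "P123"


def statement_problems(statement: dict) -> list:
    """Return a list of specific, human-readable problems (empty if none found)."""
    problems = []
    if statement.get("type") != "statement":
        problems.append('"type" is missing or is not "statement"')
    mainsnak = statement.get("mainsnak")
    if not isinstance(mainsnak, dict):
        problems.append('"mainsnak" is missing or is not an object')
        return problems  # nothing further can be checked
    prop = mainsnak.get("property")
    if not isinstance(prop, str) or not PROPERTY_ID.match(prop):
        problems.append(f'"property" must look like "P123", got {prop!r}')
    snaktype = mainsnak.get("snaktype")
    if snaktype not in VALID_SNAKTYPES:
        problems.append(f'"snaktype" {snaktype!r} is not one of {sorted(VALID_SNAKTYPES)}')
    if snaktype == "value" and "datavalue" not in mainsnak:
        problems.append('"datavalue" is required when "snaktype" is "value"')
    return problems
```

For example, a statement with no "type", an integer property and a "blahvalue" snaktype yields three separate messages rather than one generic error.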
Thanks! The inconsistencies like globe-coordinate vs globecoordinate etc are not great indeed. We will take a closer look at the data structures used in requests and responses. That would include unifying certain things like the one you mention.
You also raise a good point with the not super helpful error messages. We’ll have to improve them. We’re tracking that in phab:T316718. -Mohammed Sadat (WMDE) (talk) 10:16, 1 September 2022 (UTC)

Feedback by VeniVidiVicipedia

  • I see OpenAPI version 3.0 is being used. The API might be more future-proof if OpenAPI version 3.1 is used instead. This newer version is compatible with JSON Schema.
  • A search endpoint for Wikidata items would be nice. A convenient search endpoint is not available at the moment. The wbsearchentities endpoint is difficult to use and only matches from the beginning (a query for 'jackson' won't find 'michael jackson'). Eventually we found another endpoint (action=query) where it is possible to search with partial matches. But it was quite a struggle involving lots of guessing and trial and error. Having an endpoint for searching Wikidata that is easier to use and understand would be great.

VeniVidiVicipedia (talk) 14:52, 11 August 2022 (UTC)
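
For comparison, the two action API searches mentioned above can be built as follows. This sketch assumes that the "action=query" variant refers to list=search with srsearch, which the comment does not state explicitly; the helper names are made up, and the code only builds URLs without sending requests.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def prefix_search_url(text: str, language: str = "en") -> str:
    """wbsearchentities: matches entity labels/aliases from the beginning."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def fulltext_search_url(text: str) -> str:
    """action=query with list=search: full-text search (assumed variant)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": text,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```

A query such as `fulltext_search_url("jackson")` is the kind that can surface 'michael jackson', which the prefix search is reported above not to find.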

We have not spent a lot of time thinking about the OpenAPI version to be honest. Looking at it a bit more closely it seems switching isn’t immediately possible because it’s not yet supported by the tool we use for generating automated documentation. Judging from https://github.com/swagger-api/swagger-ui/issues/5891#issuecomment-1172274595 it might also still take a while.
A search endpoint in the future probably makes sense, yeah. We have not included it in the current work because we wanted to get the basics for reading and editing Item data right first. We’ll keep it on the list. -Mohammed Sadat (WMDE) (talk) 10:19, 1 September 2022 (UTC)