Wikidata:Property proposal/applies if regular expression matches
applies if regular expression matches id
Originally proposed at Wikidata:Property proposal/Authority control
Description | the statement is only true if the ID matches this regex |
---|---|
Data type | String |
Domain | property |
Allowed values | valid regular expression with at least one capture group |
Example 1 | |
Example 2 | |
Example 3 | |
Example 4 | |
Example 5 | |
Example 6 | https://www.imdb.com/title/$1/ |
See also | |
Motivation
Fandom article ID (P6262) and P6623 (P6623) use third-party services to resolve IDs to URLs, but I think it could be done entirely on Wikidata if it were possible to pass multiple variables to the formatter URL (P1630). This is a proposal to do that.
We'd need a qualifier holding a regular expression.
- The regex will be used to determine which formatter URL shall be used. Therefore it must not match if the supplied ID does not hold the required number of variables.
- The regex will also be used to extract the variables from the ID and pass them to the formatter URL.
- As a fallback, an external resolver may be used if no regex matches the ID. This fallback should be highlighted somehow; for this proposal I chose no value. Formatter URLs that match the regular expression must be preferred.
--Shisma (talk) 21:00, 16 February 2020 (UTC)
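The three rules above can be sketched in Python as follows. This is only an illustration of the proposed behaviour under stated assumptions, not how Wikibase resolves formatter URLs. The names `resolve_url`, `expand` and `FORMATTERS`, the example.org URLs and both regular expressions are hypothetical; only the IMDb formatter URL is taken from the examples on this page. A pattern of `None` stands for the fallback formatter whose qualifier is set to no value.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical formatter statements: (formatter URL, regex qualifier).
# A pattern of None models the fallback ("no value" qualifier).
FORMATTERS: List[Tuple[str, Optional[str]]] = [
    ("https://www.imdb.com/title/$1/", r"(tt\d+)"),             # assumed regex
    ("https://$1.example.org/wiki/$2", r"([a-z0-9-]+):(.+)"),   # made-up two-variable case
    ("https://resolver.example.org/?id=$1", None),              # made-up external resolver
]


def expand(template: str, match: "re.Match[str]") -> str:
    """Replace $1, $2, ... in the template with the captured groups."""
    # A template that references a group the regex does not define raises
    # IndexError here; the proposal expects the regex to supply every variable.
    return re.sub(r"\$(\d+)",
                  lambda ref: match.group(int(ref.group(1))) or "",
                  template)


def resolve_url(identifier: str,
                formatters: List[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Pick the first formatter whose regex matches the whole identifier."""
    fallback = None
    for template, pattern in formatters:
        if pattern is None:
            # Remember the first fallback, but keep looking for a regex match.
            if fallback is None:
                fallback = template
            continue
        match = re.fullmatch(pattern, identifier)
        if match is not None:
            return expand(template, match)
    if fallback is not None:
        # The fallback receives the unmodified identifier as $1.
        return fallback.replace("$1", identifier)
    return None


print(resolve_url("tt0111161", FORMATTERS))      # https://www.imdb.com/title/tt0111161/
print(resolve_url("muppet:Kermit", FORMATTERS))  # https://muppet.example.org/wiki/Kermit
print(resolve_url("xyz", FORMATTERS))            # https://resolver.example.org/?id=xyz
```

The sketch does not URL-encode the captured values and uses Python's backtracking `re` module, so it says nothing about how safely such patterns could be evaluated on the server side.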
Discussion
- Do you intend to engage the Wikidata developers so this can be supported in the UI, or how otherwise would you envision this to be actually used? ArthurPSmith (talk) 15:31, 18 February 2020 (UTC)
- @Shisma: Also maybe relevant - see Phabricator Task T150939 ArthurPSmith (talk) 19:27, 20 February 2020 (UTC)
- Support --Tinker Bell ★ ♥ 19:45, 19 February 2020 (UTC)
- Support Germartin1 (talk) 13:53, 23 May 2020 (UTC)
- @ArthurPSmith: it is not clear to me what the link is between this property proposal and the Phabricator task. Does it mean this property can be created even if the Phab task is not fixed? Pamputt (talk) 05:55, 17 June 2020 (UTC)
- @Pamputt: I'm not sure the property will be much use without having something like the phab task actually looked at and getting some input from developers on feasibility of this approach. But on the other hand the outline presented here seems well-thought-out, so it would at least provide some input on how the phab task could be done. So I don't have a problem with the property being created soon. ArthurPSmith (talk) 18:38, 17 June 2020 (UTC)
- Given the way Wikibase works with formatter URL (P1630) on external-id-properties, on these, the property should probably only be used with third-party formatter URL (P3303). On string-datatype properties, it could replace format as a regular expression (P1793). --- Jura 11:33, 17 June 2020 (UTC)
- @Shisma, ArthurPSmith, Tinker Bell, Germartin1, Jura1: Done applies if regular expression matches (P8460) Pamputt (talk) 13:10, 18 July 2020 (UTC)
- @Shisma, ArthurPSmith, Tinker Bell, Germartin1, Jura1, Pamputt: :( It would have been good to quickly ping the dev team about this. It doesn't look like we can implement this, among other things for security reasons. (We can't just work with arbitrary regular expressions, and the constraint check we have is an exception.) Property:P345#P1630 for example now makes it considerably harder for 3rd parties to work with the data if there is no preferred formatter statement. What do we do? --Lydia Pintscher (WMDE) (talk) 17:42, 20 July 2020 (UTC)
- @Lydia Pintscher (WMDE): I'm not sure I follow the security concern here - do you have an example of a regular expression that could cause a security problem? Can we restrict the types of regular expressions to avoid the problem somehow? ArthurPSmith (talk) 17:53, 20 July 2020 (UTC)
- https://www.regular-expressions.info/catastrophic.html We have solved it for the constraint checks by using the query service because it has functionality to prevent this when evaluating a regex. For formatting this will not be possible this way. --Lydia Pintscher (WMDE) (talk) 18:00, 20 July 2020 (UTC)
- I think a very limited collection of regexes should be ok though. For this, if we disallow any nesting of groupings and any quantifiers on groupings (i.e. (..(..)) or (..)* or (..)+ or (..)? are all forbidden), would that still pose a problem? ArthurPSmith (talk) 18:10, 20 July 2020 (UTC)
- Maybe also no lazy matching (*?, +? etc) and require the regex to match the entire identifier string (implicit ^ at start and $ at end). That should keep things pretty efficient I think. ArthurPSmith (talk) 18:19, 20 July 2020 (UTC)
- @ArthurPSmith: Well, Wikibase can’t just trust that the regexes will be safe, and I’m not convinced it’s easy to detect whether they are or not. (T214378 proposed an even more limited subset of regexes, and that hasn’t gone anywhere, either.) --Lucas Werkmeister (WMDE) (talk) 14:33, 23 July 2020 (UTC)
- @Lucas Werkmeister (WMDE): Is there maybe a PHP library available that can handle just the simplest regexes without the exponential growth problem? It seems like a very common issue! ArthurPSmith (talk) 15:15, 23 July 2020 (UTC)
- @ArthurPSmith: The closest thing is probably RE2 (Q7299973). More generally, solutions to this problem are the subject of the RFC T240884, but I don’t know when that will move forward, and without it we can’t implement support for this new property. (And, like Lydia says, I would’ve preferred to testify this before the property was created :/ ) --Lucas Werkmeister (WMDE) (talk) 12:24, 24 July 2020 (UTC)
- @Lucas Werkmeister (WMDE): Ah, RE2 was exactly the sort of thing I was thinking of, it's unfortunate there's no native PHP support. Would it be helpful to chime in on T240884 that there are other reasons we might want this for Wikidata use? ArthurPSmith (talk) 14:57, 24 July 2020 (UTC)
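The restrictions ArthurPSmith suggests in the thread above (no nested groups, no quantifier directly on a group, no lazy quantifiers, and an implicit ^ and $ around the pattern) can be sketched as a small checker. This is a hypothetical illustration in Python, not an existing Wikibase feature; the function names `is_restricted_regex` and `match_whole` are made up for this example. As Lucas Werkmeister notes, it is not established that such a syntactic check is enough to make evaluation safe, so the sketch only shows what the proposed subset would accept and reject.

```python
import re
from typing import Optional


def is_restricted_regex(pattern: str) -> bool:
    """Return True if the pattern stays inside the subset suggested in the thread."""
    try:
        re.compile(pattern)          # reject patterns that are not valid at all
    except re.error:
        return False

    depth = 0                        # current group nesting depth
    in_class = False                 # inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2                   # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
            i += 1
            continue
        nxt = pattern[i + 1:i + 2]   # next character, or "" at the end
        if ch == "[":
            in_class = True
        elif ch == "(":
            if depth > 0:
                return False         # nested group: (..(..))
            depth += 1
        elif ch == ")":
            depth -= 1
            if nxt in ("*", "+", "?", "{"):
                return False         # quantifier on a group: (..)* (..)+ (..)?
        elif ch in "*+?}":
            if nxt == "?":
                return False         # lazy quantifier: *? +? ?? {n}?
        i += 1
    return True


def match_whole(pattern: str, identifier: str) -> Optional["re.Match[str]"]:
    """Match with implicit ^ and $, refusing patterns outside the subset."""
    if not is_restricted_regex(pattern):
        raise ValueError("pattern is outside the restricted subset")
    return re.fullmatch(pattern, identifier)
```

For instance, `(tt\d+)` passes the check, while `(a+)+`, `((a)b)` and `(a*?)` are rejected. The scanner treats the first `]` as the end of a character class, which is a simplification.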