Wikidata:Property proposal/URL match pattern

URL match pattern

Originally proposed at Wikidata:Property proposal/Generic

Description: regex pattern of a URL from which an external ID can be extracted. The qualifier "URL match replacement value" can override the default \1. Use non-capturing groups "(?:)" when needed.
Data type: String
Domain: property
Example 1: IMDb ID (P345) → (one of multiple values) https:\/\/www\.imdb\.com\/(?:title|name|news)\/([a-z0-9]+)(\/.*)?
<replacement value> \1
Example 2: PubMed ID (P698) → https:\/\/pubmed\.ncbi\.nlm\.nih\.gov\/(\d+)(-[^\/]*)?\/
<replacement value> \1
Example 3: ISNI (P213) → https?:\/\/www\.isni\.org\/(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})
<replacement value> \1 \3 \5 \7
Example 4: ZVG number (P679) → http:\/\/gestis-en\.itrust\.de\/nxt\/gateway\.dll\/gestis_en\/0+([1-9]\d+)\.xml.*
<replacement value> \1
Example 5: CricketArchive player ID (P2698) → https:\/\/cricketarchive\.com\/Archive\/Players\/\d+\/\d+\/(\d+)\.html
<replacement value> \1
Example 6: Fandom article ID (P6262) → https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*)
<replacement value> \1:\3
Example 7: Geni.com profile ID (P2600) → https:\/\/www\.geni\.com\/(?:profile|people)\/[^\/]+\/(\d+)(#.*)?
<replacement value> \1
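
As a rough illustration of how a pattern and its optional replacement value could be applied, here is a minimal Python sketch. The helper name `extract_id`, the use of a full-string match, and the sample URLs are assumptions made for illustration; the proposal itself does not say whether a pattern must match the whole URL.

```python
import re

# Pattern and replacement value copied from Example 3 (ISNI, P213).
ISNI_PATTERN = r"https?:\/\/www\.isni\.org\/(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})(| |%20)(\d{4})"
ISNI_REPLACEMENT = r"\1 \3 \5 \7"

# Pattern copied from Example 5 (CricketArchive player ID, P2698); it uses the default \1.
CRICKET_PATTERN = r"https:\/\/cricketarchive\.com\/Archive\/Players\/\d+\/\d+\/(\d+)\.html"


def extract_id(url, pattern, replacement=r"\1"):
    """Return the external ID for `url`, or None if the pattern does not match.

    Assumption: the pattern has to match the entire URL (re.fullmatch).
    """
    match = re.fullmatch(pattern, url)
    if match is None:
        return None
    # Match.expand substitutes \1, \2, ... with the captured groups.
    return match.expand(replacement)


# Illustrative URLs, not taken from the proposal.
print(extract_id("https://www.isni.org/0000%200001%202103%202683", ISNI_PATTERN, ISNI_REPLACEMENT))
print(extract_id("https://cricketarchive.com/Archive/Players/0/12/345.html", CRICKET_PATTERN))
```

The separator groups (| |%20) are captured but skipped by the replacement value, so the same ID comes out whether the URL separates the four blocks with nothing, a space, or %20.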
See also

URL match replacement value

Description: (qualifier only) optional qualifier to override the default \1
Data type: String
Example 1: see above

Motivation

This will provide a way to extract the property and ID from a given URL. A future tool or gadget may benefit from this. GZWDer (talk) 23:46, 26 February 2020 (UTC)

Discussion

  Comment Here's an example of how this would look on Fandom article ID (P6262):

URL match pattern
  https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*)
URL match replacement value \1:\3

If a tool wanted to automatically generate a Fandom article ID (P6262) from the URL https://minecraft.fandom.com/wiki/Sheep, for example, it would match the regex specified on the property against that URL. There are three capturing groups in the regex. The first one is ([a-z0-9\.-]+) and matches "minecraft", the second one is (wikia|fandom) and matches "fandom", and the third one is (.*) and matches "Sheep". The URL match replacement value allows these capturing groups to be put together: \1:\3 turns into minecraft:Sheep, since \N is replaced with the value of the Nth capturing group. --SixTwoEight (talk) 01:52, 4 March 2020 (UTC)
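
The walkthrough above can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and matching against the whole URL with `re.fullmatch` is an assumption rather than something the proposal specifies.

```python
import re

# Pattern and replacement value from the Fandom article ID (P6262) example.
pattern = r"https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*)"
replacement = r"\1:\3"
url = "https://minecraft.fandom.com/wiki/Sheep"

match = re.fullmatch(pattern, url)

# The three capturing groups, in order.
groups = match.groups()
print(groups)  # ('minecraft', 'fandom', 'Sheep')

# \1 and \3 are replaced by the first and third groups; the second is dropped.
article_id = match.expand(replacement)
print(article_id)  # minecraft:Sheep
```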

Replacement for external id extraction
  \1:\3
applies if regular expression matches https:\/\/([a-z0-9\.-]+)\.(wikia|fandom)\.com\/wiki\/(.*)


  •   Comment Thanks for the ping, Lockal. Something like this will almost certainly help. I don't have time right now to get my head around the alternatives, but I'm very glad to see this. --99of9 (talk) 11:17, 6 October 2020 (UTC)
  • @GZWDer: sorry for the interruption, but what do you think about my suggestion above (reusing "applies if regular expression matches")? Also, inverting pattern and replacement gives a somewhat cleaner representation when different URL patterns use the same replacement (the opposite is impossible: a single URL pattern can never use multiple replacement patterns). If you agree, could you update your proposal, please? --Lockal (talk) 07:58, 13 October 2020 (UTC)
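
To make the inverted layout concrete, here is a small Python sketch in which the replacement is the main value and the URL patterns hang off it. Everything in it is hypothetical: the dictionary name, the helper name, and the two patterns, which were obtained by splitting the Fandom pattern's (wikia|fandom) alternation purely for illustration (so the article title becomes group 2 instead of group 3).

```python
import re

# Hypothetical inverted layout: one replacement value, several patterns that share it.
REPLACEMENT_TO_PATTERNS = {
    r"\1:\2": [
        r"https:\/\/([a-z0-9\.-]+)\.wikia\.com\/wiki\/(.*)",
        r"https:\/\/([a-z0-9\.-]+)\.fandom\.com\/wiki\/(.*)",
    ],
}


def extract_with_inverted_layout(url):
    """Try every pattern under every replacement; return the first expanded ID or None."""
    for replacement, patterns in REPLACEMENT_TO_PATTERNS.items():
        for pattern in patterns:
            match = re.fullmatch(pattern, url)
            if match is not None:
                return match.expand(replacement)
    return None


print(extract_with_inverted_layout("https://minecraft.fandom.com/wiki/Sheep"))
print(extract_with_inverted_layout("https://minecraft.wikia.com/wiki/Sheep"))
```

Both URLs yield the same ID, which is the case the inverted layout is meant to express compactly: several patterns, one replacement.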