Wikidata:Property proposal/location URL match pattern

location URL match pattern

Originally proposed at Wikidata:Property proposal/Generic

Description: regex pattern of URLs that individual shops/amenities within a chain can be matched against.
Data type: String
Domain: chain (Q65553774)
Example 1: Greggs (Q3403981) -> ^https:\/\/www\.greggs\.co\.uk\/shop-finder\?shop-code=(\d+)$
Example 2: Sainsbury's (Q152096) -> ^https:\/\/stores\.sainsburys\.co\.uk\/(\d+)\/([-\w]+)$
Example 3: Bupa (Q931628) -> ^https:\/\/www\.bupa\.co\.uk\/dental\/dental-care\/practices\/([-\w]+)$
Planned use: match URLs to brands
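A minimal Python sketch of the planned use (matching a URL to a brand), assuming the three example values above are stored as plain pattern strings keyed by brand. The dictionary name and `match_brand` are hypothetical, for illustration only; the Bupa URL in the usage line is the one cited in the discussion.

```python
import re
from typing import Optional, Tuple

# The three example values from this proposal, keyed by brand label (with QID).
LOCATION_URL_PATTERNS = {
    "Greggs (Q3403981)": r"^https:\/\/www\.greggs\.co\.uk\/shop-finder\?shop-code=(\d+)$",
    "Sainsbury's (Q152096)": r"^https:\/\/stores\.sainsburys\.co\.uk\/(\d+)\/([-\w]+)$",
    "Bupa (Q931628)": r"^https:\/\/www\.bupa\.co\.uk\/dental\/dental-care\/practices\/([-\w]+)$",
}


def match_brand(url: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (brand, capture groups) for the first pattern the URL matches, or None."""
    for brand, pattern in LOCATION_URL_PATTERNS.items():
        m = re.match(pattern, url)
        if m:
            return brand, m.groups()
    return None


if __name__ == "__main__":
    # Prints ('Bupa (Q931628)', ('brampton',))
    print(match_brand("https://www.bupa.co.uk/dental/dental-care/practices/brampton"))
```

Because every example pattern is anchored with `^` and `$`, a URL only counts as a match if the whole URL fits the pattern, not just a prefix of it.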

Motivation

In the OSM community we have (at least) three homes for this data: All The Places (Q115707984), Name Suggestion Index (Q62108705), and QA tools. We're duplicating effort and falling out of sync [1]. We need a better home for this data, and one accessible to everyone, not just coders on GitHub. CjMalone (talk) 14:01, 16 December 2022 (UTC)

Discussion

  • Do we want to try to capture the structure of the URLs at all? For example, using something akin to URL match replacement value (P8967)? I'm generally OK with creating this property either way. BrokenSegue (talk) 02:06, 17 December 2022 (UTC)
    I've looked up a few examples of URL match replacement value (P8967), but I'm not sure I understand it.
    Similarly, you'll notice I've put capture groups in the examples. There was a time when I really liked those as unique IDs, but over time I've come to the conclusion that they aren't as valuable or simple as I thought. The Greggs one is a nice simple integer; however, it's their CMS ID, not the "store ID" the staff use. The Bupa one is a classic SEO slug, including the region and the name; in this case it's just lowercased with spaces replaced by hyphens, but unfortunately it's often more complex. The Sainsbury's one is a mix of both.
    Going from ref -> URL is hard, and I don't think it's feasible to do it for all the chains in a unified way. CjMalone (talk) 12:01, 18 December 2022 (UTC)
  • Support (Don't ping me) NMaia (talk) 07:02, 18 December 2022 (UTC)
  • Comment There could be an infinite number of variants of this property, with regex patterns for things other than locations, like products, people, projects, publications, etc. Can we make this property more generic somehow? Ainali (talk) 19:50, 22 December 2022 (UTC)
    Interesting thought. We could maybe use qualifiers such as applies to part (P518)? What to call it is tricky, though: it's for any relationship between the entity and some other entity. Maybe just reusing URL match pattern (P8966) would be OK? I guess @CjMalone: suggested that originally and I vetoed it. Oops. BrokenSegue (talk) 20:19, 22 December 2022 (UTC)
  • Oppose Shouldn't a separate property be created each time? For example, for the provided example of https://www.bupa.co.uk/dental/dental-care/practices/brampton, "brampton" would be the value of a new property "Bupa Dental Care Practice ID", and this new property would have an OpenStreetMap tag or key (P1282) value corresponding to an OSM tag "ref:GB:bupa:..." that also has the value "brampton". --Dhx1 (talk) 04:14, 2 January 2023 (UTC)
      Comment For example, https://github.com/alltheplaces/alltheplaces/blob/HEAD/locations/spiders/hilton.py corresponds to the Wikidata property Hilton Hotel ID (P11388), and https://github.com/alltheplaces/alltheplaces/blob/HEAD/locations/spiders/accor.py corresponds to the Wikidata property Accor hotel ID (P11222). Wikidata has a severe lack of authority control identifiers of the types that All The Places (Q115707984) would need, but some progress is being made towards this goal, particularly in some categories such as hotels. Many or most restaurant and retail chains have individual store codes/identifiers, but they are often not easy to find, as they may, for example, only be printed on invoices/receipts and otherwise only used internally by the business. Dhx1 (talk) 04:37, 2 January 2023 (UTC)
    @Dhx1: are you proposing we make hundreds of new properties to meet this use case? Also, that would require us to make items for every individual branch of Greggs and such. That may or may not be desirable. I see this proposal as providing value. BrokenSegue (talk) 04:53, 2 January 2023 (UTC)
    Is the intent, then, that this property is an "official URL pattern for instances of a class", and that matching groups are not used in the regex patterns (because the matched group text has no definition/linked property)? Thus if you have a matching URL, you know that it is the official URL of a particular instance (with or without a Wikidata item) of a known class. We'd then seemingly need items for "McDonald's restaurant", "Starbucks store", etc., to which this property applies? Dhx1 (talk) 05:40, 2 January 2023 (UTC)
    I think the idea is to attach it to McDonald's and Starbucks and have the property mean "URL match pattern for locations of this business". BrokenSegue (talk) 06:27, 2 January 2023 (UTC)
  • Comment I think having a central store for this information would be very useful. I currently maintain my own list at https://osm.mathmos.net/chains/top-brands.cgi for UK brands in OSM, to help with QA tools I run. I'm not sure whether Wikidata is the right place for the regexes, though. Certainly, I think there's a bit more nuance to the values than simply assigning one to each Wikidata brand. In particular, we might want different expressions for different countries, or possibly different expressions for different types of outlet within the same brand. These issues make me wonder if the Name Suggestion Index (which already categorizes entries based on such distinctions) might be a better home for the data. -- Rjw62 (talk) 22:34, 9 January 2023 (UTC)
    We can use qualifiers to specify the country each regex applies to. I don't see why Wikidata wouldn't be a good place for this data, though maybe OSM would be a better steward. I'm indifferent as long as the information is tied together between OSM and Wikidata anyway. BrokenSegue (talk) 23:47, 9 January 2023 (UTC)
    I actually proposed it in NSI, just because I know that world better than Wikidata. The answer seemed to be "maybe", and then nothing happened.
    If we think about this technically, NSI already consumes Wikidata, so it should be relatively easy for them to copy the data from Wikidata into their output when/if they decide to. Wikidata consumers don't consume NSI, so keeping the data only in NSI adds an extra step for every consumer that wants it.
    And then there is a less technical argument: it's easier for people to contribute to Wikidata than to JSON files on GitHub. CjMalone (talk) 11:22, 13 January 2023 (UTC)
  • Support This property would be more broadly useful with a complementary "location URL formatter" property (with class of non-item property value (P10726): format string (Q3748135)). Then consumers could not only match existing URLs to extract location IDs but also construct URLs for individual locations from known IDs. – Minh Nguyễn 💬 20:31, 9 February 2023 (UTC)
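CjMalone's point that going from ref -> URL is hard can be illustrated with a deliberately naive guess for the Bupa pattern (Example 3), assuming the slug is only the lowercased practice name with spaces replaced by hyphens. `naive_slug`, `guess_bupa_url`, `BUPA_PREFIX` and the practice name "St Mary's" are hypothetical; the sketch only shows that such a simple rule can already produce URLs the pattern rejects.

```python
import re

# Example 3 from the proposal.
BUPA_PATTERN = r"^https:\/\/www\.bupa\.co\.uk\/dental\/dental-care\/practices\/([-\w]+)$"
# Hypothetical constant: the literal URL prefix copied from the pattern.
BUPA_PREFIX = "https://www.bupa.co.uk/dental/dental-care/practices/"


def naive_slug(name: str) -> str:
    """Lowercase the name and replace each run of whitespace with one hyphen."""
    return re.sub(r"\s+", "-", name.strip().lower())


def guess_bupa_url(practice_name: str) -> str:
    """Hypothetical ref -> URL guess built from the naive slug rule."""
    return BUPA_PREFIX + naive_slug(practice_name)


if __name__ == "__main__":
    for name in ["Brampton", "St Mary's"]:  # second name is made up
        url = guess_bupa_url(name)
        print(url, bool(re.match(BUPA_PATTERN, url)))
```

The "Brampton" guess matches, but "st-mary's" is rejected because the apostrophe is outside `[-\w]`, so any real slug rule for that chain must do something more.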
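Rjw62's concern about per-country expressions, and BrokenSegue's suggestion to handle it with country qualifiers, could be modelled roughly as follows. This is a sketch under assumptions: the `PatternStatement` class, the fallback rule, the `example.co.uk`/`example.com` URLs and the bare `"GB"`/`"FR"` strings standing in for country items are all hypothetical, and the discussion does not say which qualifier property would be used.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatternStatement:
    """One hypothetical 'location URL match pattern' statement on a brand item."""
    pattern: str
    country: Optional[str] = None  # None = statement has no country qualifier


def patterns_for(statements: List[PatternStatement], country: str) -> List[str]:
    """Use statements qualified with this country if any exist, else the unqualified ones."""
    qualified = [s.pattern for s in statements if s.country == country]
    return qualified or [s.pattern for s in statements if s.country is None]


def url_matches(statements: List[PatternStatement], country: str, url: str) -> bool:
    """True if the URL matches any pattern that applies in the given country."""
    return any(re.match(p, url) for p in patterns_for(statements, country))


# Hypothetical brand with a UK-specific pattern and a generic fallback.
EXAMPLE_BRAND = [
    PatternStatement(r"^https:\/\/www\.example\.co\.uk\/stores\/(\d+)$", country="GB"),
    PatternStatement(r"^https:\/\/www\.example\.com\/stores\/(\d+)$"),
]
```

Whether a consumer should fall back to unqualified statements, or combine them with the country-specific ones, is left open in the discussion; the fallback used here is just one option.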
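Minh Nguyễn's suggested "location URL formatter" would let consumers go in both directions. A minimal sketch, assuming the formatter follows the `$1` placeholder convention of Wikidata's existing formatter URL (P1630) property and that the ID is percent-encoded on substitution (an assumption of this sketch). The Greggs formatter string is derived by hand from Example 1 and is not part of the proposal, and the helper names are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import quote

# Example 1 from the proposal, plus a hypothetical formatter derived from it by hand.
GREGGS_MATCH = r"^https:\/\/www\.greggs\.co\.uk\/shop-finder\?shop-code=(\d+)$"
GREGGS_FORMATTER = "https://www.greggs.co.uk/shop-finder?shop-code=$1"


def format_location_url(formatter: str, location_id: str) -> str:
    """Substitute the percent-encoded ID for the $1 placeholder."""
    return formatter.replace("$1", quote(location_id, safe=""))


def extract_location_id(pattern: str, url: str) -> Optional[str]:
    """Return the first capture group if the URL matches the pattern, else None."""
    m = re.match(pattern, url)
    return m.group(1) if m else None


if __name__ == "__main__":
    url = format_location_url(GREGGS_FORMATTER, "1234")  # made-up shop code
    print(url, extract_location_id(GREGGS_MATCH, url))
```

The round trip is only this simple when the pattern has a single capture group. Example 2 (Sainsbury's) captures both a number and a slug, so a single-placeholder formatter like this one could not rebuild those URLs from one ID.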