Wikidata:Property proposal/title match pattern

website title match pattern

Originally proposed at Wikidata:Property proposal/Authority control

Description: a regular expression extracting a probable label from the <title /> of a website
Data type: String
Domain: property
Allowed values: regular expression with a single capture group
Example 1: IMDb ID (P345)
    URL match pattern (P8966): ^https?:\/\/(?:(?:www|m)\.)?imdb\.com\/(?:(?:search\/)?title(?:\?companies=|\/)|name\/|event\/|news\/|company\/|list\/)(\w{2}\d+)
    title match pattern: ^(.*)\s-\sIMDb$
Example 2: X username (P2002)
    URL match pattern (P8966): ^https?:\/\/(?:mobile\.)?twitter\.com\/(?:intent.+screen_name=)?(?!home|hashtag|explore|settings)([0-9A-Za-z_]{1,15})
    title match pattern: ^(.+)\s\(@[^\)]+\)\s\/\sTwitter$
Example 3: MusicBrainz artist ID (P434)
    URL match pattern (P8966): ^https?:\/\/musicbrainz\.org\/artist\/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})
    title match pattern: ^(.+)\s-\sMusicBrainz$

Motivation

Wikidata for Web (Q99894727) is a browser extension that uses the URL match pattern (P8966) property to recognise, from a page's URL, websites whose identifier is the value of an external-ID property on Wikidata.


It can also create external-ID statements and add them to an entity chosen by the user (or to a new entity).

The user, however, has to enter the label of an existing entity manually. Most websites already carry an appropriate label in their <title/>. It is, however, usually diluted with website-specific words that are most likely not part of the label.

The Twitter Profile of Tim Berners-Lee for example has a title element that looks like this: <title>Tim Berners-Lee (@timberners_lee) / Twitter</title>

In order to find the Wikidata label, we only need whatever precedes the opening bracket. A regular expression to extract that string could be ^(.+)\s\(@[^\)]+\)\s\/\sTwitter
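As a rough illustration of the extraction described above, here is a minimal Python sketch. It applies the proposed pattern to the title text from the Tim Berners-Lee example. The function name extract_label is hypothetical and only for illustration; this is not how Wikidata for Web itself is implemented.

```python
import re
from typing import Optional

# The title match pattern proposed above for Twitter profile pages.
TITLE_PATTERN = re.compile(r"^(.+)\s\(@[^\)]+\)\s\/\sTwitter")


def extract_label(title: str) -> Optional[str]:
    """Return the probable label captured from a page title, or None.

    Hypothetical helper: the single capture group holds the label.
    """
    match = TITLE_PATTERN.search(title)
    return match.group(1) if match else None


# Text content of the <title> element from the example above.
print(extract_label("Tim Berners-Lee (@timberners_lee) / Twitter"))
# prints: Tim Berners-Lee
```

A title that does not fit the pattern yields None, so a caller can fall back to asking the user for the label.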

This property is meant to be used as a qualifier for URL match pattern (P8966) (see examples). --Shisma (talk) 12:53, 18 June 2022 (UTC)

Discussion

  •   Comment I'm not sure why you need this for an existing entry - they should already have a label? But I could see this being useful for new entities - is that what you meant here? ArthurPSmith (talk) 16:09, 20 June 2022 (UTC)
For existing items this would be merely a convenience feature. Often the title of the thing (the relevant part I'm trying to extract) is a 1:1 match to some Wikidata label or alias. I could also use this property to add subject named as (P1810) to each statement, or to add aliases. --Shisma (talk) 06:37, 21 June 2022 (UTC)