
Previous discussion was archived at User talk:Shisma/Archive 1 on 2019-05-26.

Kirilloparma (talkcontribs)

When you want to add a reference to any statement, please avoid using reference URL (P854) property (as here) when there is an external identifier property in Wikidata. In this case, you should have used this property (as reference) instead of P854. I don't know who created the Wikidata for Firefox tool, but this should be fixed as soon as possible, as it is semantically incorrect. See for example how the UseAsRef tool works, this is exactly what is expected from Wikidata for Firefox. In a nutshell: when we have an external identifier property in Wikidata, it is that property that should be used in references, not P854.

Shisma (talkcontribs)
Shisma (talkcontribs)

Correct me if I'm wrong: it seems to me that UseAsRef allows a human user to use an external ID statement as a source for a different statement, once that user has verified that the external entry intrinsically contains the information to be referenced.

For example:

  1. title (P1476) → Quizzical
  2. Playdate Catalog ID (P12125) → 318641

A human could assume that if statement 1 is true, a URL that can be derived from statement 2 would contain proof of it.

Is this the idea?

Kirilloparma (talkcontribs)

Yes, that's the idea. When a person sees that a statement is true (say, date of birth (P569)), an external identifier property (say, danskfodbold.com player ID (P12827)) can serve as the reference that confirms it. Instead of the semantically incorrect and overly broad reference URL (P854), the reference should use the danskfodbold.com player ID (P12827) property itself, as in the example.

This is what it's supposed to look like:

Karim Zaza (Q945746)

date of birth
Normal rank 9 January 1975
1 reference



And NOT like this as the Wikidata for Firefox tool suggests on a regular basis:

Karim Zaza (Q945746)

date of birth
Normal rank 9 January 1975
1 reference
reference URL https://danskfodbold.com/spiller.php?spillerid=10156
title danskfodbold.com - DBU's Officielle Statistikere
retrieved 21 June 2024



Regards

Shisma (talkcontribs)

I think both reference types are alright. stated in works best in situations where a statement is derived from a database dump/extract, where the statement might not even be visible under a certain URL. reference URL is appropriate where a URL holding the statement is known but the presence of an intrinsic connection to the database entry is uncertain. Neither option is better or worse: the difference is merely how the information was retrieved.

Wikidata for Web does not know database extracts: it only ever sees urls and can therefore only make reference statements for urls. Therefore I think reference url is a good default. However I could allow the user to choose which reference type is a better source for a particular statement. What do you think about this?

Kirilloparma (talkcontribs)

> I think both reference types are alright.

In this case yes, and in this case no. In the first case, you can use any tool you want, because we are referencing a news article and Wikidata does not have a separate property for news articles of this database. In the second case, Wikidata for Firefox is not suitable because we already have an external identifier property and it is that property that should be in the references, not P854, which is the crux of the problem.

> stated in works best in situations where a statement is derived from a database dump/extract, where the statement might not even be visible under a certain URL. reference URL is appropriate where a URL holding the statement is known but the presence of an intrinsic connection to the database entry is uncertain. Neither option is better or worse: the difference is merely how the information was retrieved.

It has nothing to do with stated in (P248) or reference URL (P854) for those cases where we are referring to a URL that does not contain any identifiers. In our case, I am talking about the URL that is used by the formatter URL (P1630) property. We don't need to use reference URL (P854) at all if we already have an external identifier property. stated in (P248) again has nothing to do with it, although it is the one that increases "what links here", which is an advantage. I still suggest you avoid using the problematic Wikidata for Firefox for references, since we have UseAsRef, which does a great job and doesn't use P854 for external identifiers that exist on Wikidata.
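
To make the formatter URL (P1630) point concrete, here is a minimal Python sketch of how a URL can be rebuilt from an external identifier value. Formatter URLs mark the ID position with `$1`; the formatter string below is an assumption inferred from the danskfodbold.com URL quoted earlier in this thread, and `url_from_external_id` is a hypothetical helper, not part of any tool mentioned here.

```python
def url_from_external_id(formatter_url: str, external_id: str) -> str:
    """Fill the $1 placeholder of a formatter URL (P1630) with an ID value."""
    return formatter_url.replace("$1", external_id)


# Assumed formatter URL for danskfodbold.com player ID (P12827),
# inferred from the reference URL shown in this thread.
FORMATTER = "https://danskfodbold.com/spiller.php?spillerid=$1"

derived_url = url_from_external_id(FORMATTER, "10156")
print(derived_url)
```

Under that assumption, the identifier alone is enough to rebuild the URL, which is the premise behind referencing through the identifier property.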

> Wikidata for Web does not know database extracts: it only ever sees urls and can therefore only make reference statements for urls

And that's the problem.

> Therefore I think reference url is a good default.

I don't think this is the right approach. By using reference URL (P854) we are only increasing its usability, and the external identifier property is essentially not used at all. What's the point of creating external identifier properties then if we don't even use them later and prefer P854? It's such a waste! Question: why did you propose this property that I created later? Let's say we're going to use this property and P854 in references (see ), okay, but what about the property itself? And here is the result, the property itself is used only 2 times in the references, that's the "waste" I'm talking about. Regards

Kirilloparma (talkcontribs)

Hi! Last time we discussed this topic, I made it clear that this tool should not use P854 if an external identifier property exists, but I see that you are still using the first option. I would ask you to respect the choices of those who not only prefer, but also care about Wikidata external identifiers. Constantly using P854 instead of an existing external identifier defeats the whole point of Wikidata properties. Here's an example: this property mentions that its scope also includes the use of an external ID as a reference. But the question is: why is this indicated if P854 is going to be used instead of the external ID anyway? That's what I'm talking about: the whole point of using the property as a reference is lost here, and it becomes useless if the P854 option is always applied. Regards

Shisma (talkcontribs)

I apologise for not answering earlier. I thought I did.

> Question: why did you propose this property that I created later? Let's say we're going to use this property and P854 in references (see ), okay, but what about the property itself? And here is the result, the property itself is used only 2 times in the references, that's the "waste" I'm talking about.

When I proposed the property, I did not intend it to be used in references. I don't mind if anyone does, though. It was meant as a main statement, and I think it perfectly fulfils its function.

> …not for experienced Wikidata editors, as no one in their right mind would use P854 in references.

Well, I'm sorry, but I don't see that the use of reference URL (P854) is universally agreed to be a bad practice by the community.

> Here's an example: this property mentions that its scope also includes the use of an external ID as a reference.

I think the constraint only implies that the property may be used in the reference scope, not that it has to be.


The Chrome extension and big entities and regular expressions…

Trade (talkcontribs)

I don't know if you've noticed, but the heavy lag makes the extension very hard to use when matching entries to countries.

Any chance you could let us just not show the statements? Just show whether or not the entry is connected to an item, along with the description, label and P31.

Shisma (talkcontribs)

Opening big items like say United States of America (Q30) is indeed very slow. I surely can make some optimisations for those cases. But from looking at your contributions I gather you are using the Chrome version which is unlikely to see such optimisations soon as it is currently unmaintained. 😕

Trade (talkcontribs)

What you got against Chrome /:

Shisma (talkcontribs)

Chrome is soon going to deprecate the webRequest API in order to prevent adblockers from working. Unfortunately, the same API is currently crucial for this extension to work. I'd still like to support Chrome derivatives if possible, but I'm not going to spend any time on it until the dust around this has settled. I don't want to spend multiple weekends on something that could break the second I'm done.

I hope that puts things in perspective.

Shisma (talkcontribs)

I don't have anything against Chrome. It's just additional work, and this is a project that I maintain in my spare time.

Trade (talkcontribs)

So which versions are actually maintained?

Shisma (talkcontribs)
Trade (talkcontribs)

I noticed that every time I use the extension with Rate Your Music track ID (P13056), it adds a / at the end.

Also, it keeps trying to add the name of the artist and "Lyrics and ratings" as an alias to the tracks, and this is enabled by default.

"Wenche Myhre - Ei snerten snelle - Lyrics and ratings" is not an alias you really want on Q130534993.

Shisma (talkcontribs)
Trade (talkcontribs)
Shisma (talkcontribs)

I don't understand. Do you want me to join ZI Jony's conversation, or do you want ZI Jony to join this conversation? 😅

Trade (talkcontribs)

First

Trade (talkcontribs)

Oh and also add a URL match pattern and web page title extract pattern to Push Square series ID and VG247 series ID if you get time

Shisma (talkcontribs)

Yes, I'll have a look at it later. How about I write down a short guide on how to write a URL regular expression? It's pretty straightforward, and then you won't depend on me.

Shisma (talkcontribs)

The general workflow looks like this:

  1. Click the link in one of the property's examples that leads to the external website, and copy the URL.
  2. (When the URL contains %) go to urldecoder.io, paste the URL into the top field, and copy the decoded URL from the bottom field.
  3. Go to regex101.com
  4. Paste the URL into the field that says Test String.
  5. Now paste the same URL into the Regular expression field.
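
Step 2 can also be done locally. This is a minimal sketch using Python's standard library instead of urldecoder.io; the percent-encoded URL below is a hypothetical variant of the Push Square example, made up for illustration.

```python
from urllib.parse import unquote

# Hypothetical percent-encoded form of the Push Square example URL
# ("%3A" is the encoded form of ":").
encoded_url = "https://www.pushsquare.com/games/browse?title=series%3Aoverwatch"

# unquote() turns %XX sequences back into the characters they stand for.
decoded_url = unquote(encoded_url)
print(decoded_url)
```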

We will now fine-tune the Regular Expression so that the part of the Test string we want to extract is highlighted in green.

Things we need to change:

Some parts of the URL will be highlighted in the Regular Expression field:

Expression: https://www.pushsquare.com/games/browse?title=series:overwatch

These characters have a specific meaning in regular expressions. At this stage, we don't want that, so we convert them to regular characters by adding a \ in front of them.

Expression: https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:overwatch
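
The effect of escaping can be checked outside regex101 as well. The following is a rough sketch in Python's `re` module, which is an assumption on my part: the thread does not say which regex engine the extension uses. Note that Python's `re.escape` only escapes characters that are special in its own flavour, so it leaves `/` alone, while regex101 asks for `\/` in its default flavour.

```python
import re

url = "https://www.pushsquare.com/games/browse?title=series:overwatch"

# Used unescaped, "?" makes the preceding "e" optional and "." matches any
# character, so the raw URL does not even match itself as a pattern.
unescaped_match = re.search(url, url)

# re.escape() adds the backslashes for the special characters.
escaped_pattern = re.escape(url)
escaped_match = re.fullmatch(escaped_pattern, url)

print(unescaped_match)
print(escaped_pattern)
print(escaped_match is not None)
```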

At this stage, the Test string should appear completely blue. This means that the pattern will now match exactly this URL.

We now need to capture the actual id. We do that by wrapping the id in ( and )

Expression: https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:(overwatch)

Test string: https://www.pushsquare.com/games/browse?title=series:overwatch

The id should now be highlighted in green. This pattern will now only find the id overwatch. It won't match any other id. In order to change that we have to express the pattern of the id in the expression.

What is the pattern? It looks like the id only consists of one or more lowercase characters. You can just make a list of all possible characters and wrap it into [ and ]. Like [abcdefghijklmnopqrstuvwxyz] but you can shorten it down to [a-z]:

Expression: https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:([a-z])

Test string: https://www.pushsquare.com/games/browse?title=series:overwatch

This will now find only the first character of the id. In order to match one or more characters, you can just add a + after the group.

Expression: https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:([a-z]+)

This pattern should now successfully highlight the id overwatch from

Test string: https://www.pushsquare.com/games/browse?title=series:overwatch
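
The difference between the two capture groups above can be reproduced with a short sketch. It uses Python's `re` module as a stand-in for regex101 (an assumption; the patterns themselves are copied from this guide).

```python
import re

url = "https://www.pushsquare.com/games/browse?title=series:overwatch"
prefix = r"https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:"

# A character class without a quantifier captures a single character.
single = re.match(prefix + r"([a-z])", url)

# Adding "+" lets the class repeat, so the whole id is captured.
whole = re.match(prefix + r"([a-z]+)", url)

print(single.group(1))
print(whole.group(1))
```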

But what's that? It doesn't work with:

Test string: https://www.pushsquare.com/games/browse?title=series:assassin's-creed 😱

That's because we didn't expect characters like ' or - to appear in our ID.

To match them, we add these characters to the list of possible characters, wrapped in [ and ]:

Expression: https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:([a-z'-]+)

This pattern should now match all the provided examples.
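
As a final check, here is a small sketch that runs both character classes against the two example URLs. It again assumes Python's `re` module; how the extension itself applies a URL match pattern (partial or full match) is not stated in this thread, so both behaviours are shown.

```python
import re

PATTERN = re.compile(
    r"https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:([a-z'-]+)"
)
NARROW = re.compile(
    r"https:\/\/www\.pushsquare\.com\/games\/browse\?title=series:([a-z]+)"
)

urls = [
    "https://www.pushsquare.com/games/browse?title=series:overwatch",
    "https://www.pushsquare.com/games/browse?title=series:assassin's-creed",
]

# The narrower class stops at the apostrophe: a partial match captures a
# truncated id, and a full match fails outright.
truncated = NARROW.match(urls[1]).group(1)
narrow_full = NARROW.fullmatch(urls[1])

# The widened class captures the complete id for both examples.
ids = [PATTERN.fullmatch(u).group(1) for u in urls]

print(truncated)
print(narrow_full)
print(ids)
```

Placing the `-` last inside the brackets keeps it a literal hyphen rather than the start of a range such as `a-z`.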

There is a quick introduction on YouTube that should help with other scenarios you might come across (the relevant part begins at minute 1).

Shisma (talkcontribs)
Back ache (talkcontribs)

Looks pretty good :-) There are some "URL match pattern"s out there that need a little maintenance, because of things like a URL variation they didn't expect, or the pattern having initially been autogenerated from another property. Back ache (talk) 07:38, 19 October 2024 (UTC)


Duplicated "Star Trek: The Classic Episodes" editions

Maxlath (talkcontribs)

Hi! Back in 2018, it seems that you did a QuickStatements import that resulted in creating 10 almost identical items, sharing the same ISBN as well as other identifiers. There seem to be some differences in distribution format (P437) though:

Can those be merged? Was that intentional or a mistake?

Maxlath (talkcontribs)
Shisma (talkcontribs)

It sure looks like it. Let's merge them.

Maxlath (talkcontribs)
Shisma (talkcontribs)

It would seem that these three books – the only Star Trek Books published by Weltbild (Q883522) – actually all have the same ISBN. Is this possible?

They are definitely individual books.

Maxlath (talkcontribs)
Shisma (talkcontribs)

I'll work through it

Shisma (talkcontribs)

I think my script back then had issues with miscellanies. I'm sorry. Thanks for making me aware.

Maxlath (talkcontribs)
Colin R Robinson (talkcontribs)

I made this SPARQL query, which gives all items that have both a P212 violation and a P12204 ELMCIP ID.

SELECT ?item ?itemLabel ?p212_value ?elmcip_id WHERE {
  ?item wdt:P212 ?p212_value ;
        wdt:P12204 ?elmcip_id .
  FILTER(EXISTS {
    SELECT ?item (COUNT(?p212_value) AS ?count) WHERE {
      ?item wdt:P212 ?p212_value .
    } GROUP BY ?item HAVING(?count > 1)
  })
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

There were 2 results, so I have quickly fixed them.

46.212.78.229 (talkcontribs)

Is there an easy way to see Unique Value Violations by ELMCIPBot? I will correct those immediately - User:Colin R Robinson

Richard Nevell (talkcontribs)

Indeed, if there's effectively a work list by user I'd like to fix the ones I'm responsible for.

Maxlath (talkcontribs)

I tried to find an effective way, and now have a list of users contributing to several items with the same ISBN; for you that gives this:

I'm going to post other users' constraint violations on Property_talk:P212.

Maxlath (talkcontribs)

Missing or faulty url match patterns

Summary by Shisma

Added new match patterns. Updated faulty ones

Trade (talkcontribs)

Could you make it work with MetalTabs.com musician ID (P13021)?

Shisma (talkcontribs)
Shisma (talkcontribs)