Wikidata:Requests for permissions/Bot/Soweego bot 2

Feedback

@Jura1: thank you very much for your feedback. I have posted a reply here to avoid further modification of this archived page: User_talk:Jura1#Thanks_for_your_feedback_on_User:Soweego_bot_task_2


Dear Jura1,

In Wikidata:Requests_for_permissions/Bot/Soweego_bot_2, you raised two important points. Let me address them.

  1. IMDb: the bot sees raw URLs from target sources like MusicBrainz (Q14005) and does its best to convert them into known external identifiers. To do so, it matches each input URL against all formatter URL (P1630) values of external identifier properties, and extracts the identifier using the corresponding format as a regular expression (P1793) value. This is done via SPARQL queries. Unfortunately, IMDb ID (P345) appears to have an unusual formatter URL marked as the preferred statement: https://tools.wmflabs.org/wikidata-externalid-url/?p=345&url_prefix=https://www.imdb.com/&id=$1. Since this formatter URL never matches an IMDb input URL, such URLs are added as-is, which is what you are seeing. Do you have any suggestions to avoid this? Keep in mind that building a custom rule for each exception is not a sustainable solution;
  2. official website (P856) vs. described at URL (P973): I fully understand this point and will implement an extra check for official website (P856) values.
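To illustrate point 1, here is a minimal sketch of the URL-to-identifier matching, assuming the formatter URL (P1630) and format as a regular expression (P1793) values have already been fetched (the bot does this via SPARQL; here they are hard-coded). The PROPERTIES table and the simplified IMDb pattern are hypothetical sample data, and the function names are illustrative, not the bot's actual code. It shows why the tool-based IMDb formatter URL can never match a plain imdb.com URL.

```python
import re
from typing import Optional, Tuple

# Hypothetical sample data: property ID -> (formatter URL, format regex).
# In the bot, these values come from SPARQL queries; here they are hard-coded.
PROPERTIES = {
    "P434": (  # MusicBrainz artist ID
        "https://musicbrainz.org/artist/$1",
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    ),
    "P345": (  # IMDb ID, using the preferred formatter URL quoted above
        "https://tools.wmflabs.org/wikidata-externalid-url/?p=345"
        "&url_prefix=https://www.imdb.com/&id=$1",
        r"(?:nm|tt)\d{7,8}",  # simplified pattern, an assumption
    ),
}


def formatter_to_regex(formatter_url: str, id_regex: str) -> "re.Pattern":
    """Turn a formatter URL into a regex whose first group captures $1."""
    prefix, _, suffix = formatter_url.partition("$1")
    # Escape the literal parts of the URL; the format regex fills the $1 slot.
    return re.compile(re.escape(prefix) + "(" + id_regex + ")" + re.escape(suffix))


def url_to_identifier(url: str) -> Optional[Tuple[str, str]]:
    """Return (property ID, identifier) for the first matching property."""
    for pid, (formatter_url, id_regex) in PROPERTIES.items():
        match = formatter_to_regex(formatter_url, id_regex).fullmatch(url)
        if match:
            return pid, match.group(1)
    return None  # no formatter URL matched: the raw URL would be kept as-is


print(url_to_identifier(
    "https://musicbrainz.org/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da"))
# ('P434', '5b11f4ce-a62d-471e-81fc-a69a8278c7da')
print(url_to_identifier("https://www.imdb.com/name/nm0000123/"))
# None
```

A plain IMDb URL falls through to None because the only IMDb formatter URL considered starts with the tools.wmflabs.org prefix, which reproduces the behaviour described in point 1.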

Thanks again for your valuable comments.

Cheers,
Hjfocs (talk) 09:46, 19 November 2018 (UTC)

Sounds good. BTW, I also copied the comment above to this page, as it might be easier for other people to find here. Further, I added the recommended "new section" header. --- Jura 06:49, 20 November 2018 (UTC)