Wikidata:Project chat

Wikidata project chat
A place to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.

Please use {{Q}} or {{P}} the first time you mention an item or property, respectively.

On this page, old discussions are archived after 7 days. An overview of all archives can be found at this page's archive index. The current archive is located at 2021/10.

Elo ratings update

I am ready to do the following regularly:

  • Update the chess players' ratings on a monthly basis according to the FIDE monthly reports
  • Sort rating entries by date
  • Add the maximum rating value, if such a property exists (I do not know whether it does). This property is needed in the chess player's card on Wikipedia. If there is no such property, maybe it should be created?
  • Delete a chess player's duplicate rating if he has not played that month and his rating has not changed. Example: one contributor flooded Wikidata with useless data; Garry Kasparov, for instance, has 201 rating entries, but the last 140 entries are identical. Why duplicate a rating if it hasn't changed since the previous month?
  • Add a chess player if he is in FIDE's list, regardless of his achievements. Otherwise, clear criteria for adding are needed, although I do not see the need for them. Such criteria are necessary for Wikipedia, in my opinion, but Wikidata is for everyone, and we don't know who might need this data.
  • Replace the links to the player's page on FIDE's site with a more reliable link to the monthly rating file on FIDE's site. The player's page is deleted after death, but the monthly files remain.

Please express your opinion. Игорь Темиров (talk) 17:41, 28 September 2021 (UTC)

@Игорь Темиров Where do you envision these monthly Elo ratings going in ten or twenty years' time? The items of chess players will have hundreds of statements for Elo ratings. Wouldn't it be better to just move all Elo ratings to Tabular data on Commons now? Vojtěch Dostál (talk) 18:54, 28 September 2021 (UTC)
@Vojtěch Dostál. I have been posting the ratings of the best chess players on commons.wikimedia.org for a year now. They are used by Module:RatingFIDE to set the Elo rating in the chess player's card.
But there is a page size limit: a page accommodates about 30 thousand chess players out of 300 thousand. This is good for the current chess rating, but bad for the history of rating changes.
Apparently, Wikidata is the best option for this, but on the condition that duplicate, useless entries are deleted for periods when the chess player has no games. Игорь Темиров (talk) 19:41, 28 September 2021 (UTC)
Re: size limit - that's unfortunate. I wish there was a way to store these numerical, periodically changing data that doesn't involve Wikidata :( Vojtěch Dostál (talk) 19:49, 28 September 2021 (UTC)
I am not currently active in the chess field, so take this only as a hint for those actually involved to discuss. Regarding the duplicate entries, you could, if you want, use the two properties start time and end time. I have already done this with chart placements, which, like the chess player ranking, change from time to time. --Gymnicus (talk) 21:21, 28 September 2021 (UTC)
  • Do we still do ELO ratings for zombies? At some point people kept getting ELO ratings even for dates after their known date of death.
It seemed to me that @Steak: put some order in it and updates them on a quarterly basis. Isn't this sufficient?
There is some discussion also at Wikidata_talk:WikiProject_Chess#Elo_ratings_update. --- Jura 05:56, 29 September 2021 (UTC)
"Duplicate ratings" is a misconception by the thread opener. A rating is not simply a duplicate when it does not change from one rating period to the next. FIDE provides Elo ratings at regular intervals (currently once per month; formerly, e.g., twice a year), and our task is to provide this data. There is no gain in simply omitting ratings that do not change. A complete list of the best ratings for the current month would not be possible if players without a rating change were simply left out. And yes, we also provide Elo ratings of people who are dead when FIDE also provides this data, but these ratings are marked as deprecated. Steak (talk) 06:00, 29 September 2021 (UTC)
I don't think Wikidata's task is to provide nonsensical ratings. If people are interested in zombies, they should go directly to FIDE. --- Jura 06:17, 29 September 2021 (UTC)
I agree with Jura here. If rating value does not change over a time period, it is unnecessary to state it several times. Instead, a longer time window of start time (P580) and end time (P582) can be used to aggregate this information, without any information loss. Vojtěch Dostál (talk) 10:14, 29 September 2021 (UTC)[]
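The aggregation described above can be sketched in Python. This is only an illustration under stated assumptions: the function names `compress` and `rating_in` are hypothetical, periods are plain 'YYYY-MM' strings, and the sample ratings in the usage note are made up rather than taken from FIDE or Wikidata.

```python
from itertools import groupby


def compress(ratings):
    """Collapse consecutive equal ratings into (start, end, rating) intervals.

    `ratings` is a list of (period, rating) pairs sorted by period, where
    each period is a 'YYYY-MM' string for one published rating list.
    """
    intervals = []
    # groupby only merges *adjacent* entries that share the same rating.
    for rating, run in groupby(ratings, key=lambda pair: pair[1]):
        periods = [period for period, _ in run]
        intervals.append((periods[0], periods[-1], rating))
    return intervals


def rating_in(intervals, period):
    """Return the rating whose interval covers `period`, or None."""
    for start, end, rating in intervals:
        # 'YYYY-MM' strings compare correctly as plain strings.
        if start <= period <= end:
            return rating
    return None
```

For example, four monthly entries `[("2020-01", 2752), ("2020-02", 2752), ("2020-03", 2752), ("2020-04", 2757)]` collapse into two intervals, and every original month can still be looked up with `rating_in`. Note that a lookup inside an interval also returns a value for a month in which no rating list was published, so this representation does not record publication gaps.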
"Duplicate ratings is a misconception by the thread opener" - FIDE provides the full rating because it is its responsibility. But we don't need unchanging ratings. Игорь Темиров (talk) 15:56, 29 September 2021 (UTC)[]
Removing duplicate ratings makes it hard or impossible to query lists for specific months. Can you show me how you would get the Elo top 100 for, e.g., January 2009 if not every player has a rating statement for that month? Currently the query looks like this:
SELECT ?item ?elo ?fide_url WHERE {
  ?item wdt:P31 wd:Q5; p:P1087 [ ps:P1087 ?elo; pqv:P585/wikibase:timeValue ?time ] .
  FILTER (YEAR(?time) = 2009 && MONTH(?time) = 1) .
} ORDER BY DESC(?elo) LIMIT 100
How would you modify it?
The next issue is with de:Template:Elo-Diagramm: if a rating is removed, the connecting line between the ratings is no longer straight but skewed, because the intermediate rating is missing. See e.g. de:Sergei_Alexandrowitsch_Karjakin: if all ratings between February 2020 and November 2020 were missing, there would be a diagonal line between January 2020 and January 2021, which would of course be wrong. For Bent Larsen on the page de:Template:Elo-Diagramm, it is even more apparent that the line would be wrongly diagonal if the constant ratings between January 2005 and October 2008 were removed. Steak (talk) 16:16, 29 September 2021 (UTC)
  • Why do we need the top 100 Elo, for example, for January 2009?
"Which would be of course wrong" - Not at all, the graph only shows the trend, and not the super-accurate value of the rating. Игорь Темиров (talk) 16:30, 29 September 2021 (UTC)
You simply don't know what people need. Why would you not need a top Elo list of a given rating period? You cannot simply say "We don't need it". Currently the possibility is there, and you want to destroy it without need. And regarding the graph: no, if the graph shows a line at rating 2700 when in reality the rating was 2750, then the graph is at least misleading. Steak (talk) 16:38, 29 September 2021 (UTC)
@Steak presumably there should be no diagonal lines, because the rating is not merely non-smooth in places but actually discontinuous. So interpolate: step-after is probably what you need somewhere in there. See https://vega.github.io/vega/examples/line-chart/ for an example. Inductiveload (talk) 16:32, 29 September 2021 (UTC)
Yes, maybe, I don't know how this works exactly. Steak (talk) 16:38, 29 September 2021 (UTC)
@Steak Specifically, with interpolate: step-after you only need a data point when the value changes, so all the intermediate points are redundant. Inductiveload (talk) 23:20, 29 September 2021 (UTC)
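To make the step-after idea concrete, here is a minimal Python sketch of the lookup it implies: the value in force at any date is the value of the most recent change point on or before that date. The function name `step_after` and the sample data in the usage note are hypothetical; this is not how the Vega graph or the German template is implemented.

```python
from bisect import bisect_right


def step_after(points, when):
    """Return the value in force at `when` under step-after semantics.

    `points` is a list of (date, value) change points sorted by date, with
    dates as ISO 'YYYY-MM-DD' strings (which sort chronologically).
    Returns None if `when` precedes the first change point.
    """
    dates = [date for date, _ in points]
    # Index of the first change point strictly after `when`.
    i = bisect_right(dates, when)
    return points[i - 1][1] if i else None
```

With the made-up change points `[("2020-01-01", 2750), ("2021-01-01", 2700)]`, a lookup for "2020-06-01" gives 2750, the value that held until the next change, whereas a straight line drawn between the two points would pass through lower values in between.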
Okay, for this use case this might be fine. Still, for rating lists of a given month, all ratings of that month are needed. Steak (talk) 15:33, 30 September 2021 (UTC)
  • Actually, I don't mind the periodical updates. The frequency is IMHO debatable. I (still) find the additions after a person died problematic. I thought we had stopped with that. For some uses, I just skip chess players. Occasionally, it happens that this drops a person that is otherwise notable. --- Jura 16:31, 29 September 2021 (UTC)
A more general comment, not specific to this Elo issue: would it make more sense to farm out data series like this (and other "heavy" uses like the famous found in taxon (P703); see e.g. (E)-p-coumaric acid (Q99374)) to a separate, dedicated item? Specific templates can follow a relevant property (e.g. probably some kind of generic "has data series", qualified as needed) to the list and then happily inhale as much as they want from there. Meanwhile, the actual item itself is not accruing an unbounded amount of data (multiplied by the number of "heavy" items certain projects want to use).
Are people using queries for which that would be impractical? Inductiveload (talk) 16:43, 29 September 2021 (UTC)

Shouldn't this discussion be at Wikidata:Requests for permissions/Bot? BrokenSegue (talk) 23:28, 29 September 2021 (UTC)

Sure, if the thread opener wants to make changes on a large scale, he should apply for a bot. Steak (talk) 15:33, 30 September 2021 (UTC)

The ratings for September have appeared. To sum up some of the discussion:

  • The edits are done by my bot.
  • It does not add new untitled players.
  • It sorts Elo entries by date without removing duplicates.
  • It adds ratings for September and the following months without duplicates.
  • It replaces the link to FIDE's player card with a link to FIDE's regular rating list (example: Q108360564) or to an OlimpBase file (Q108680205).

Is that OK? Игорь Темиров (talk) 07:09, 2 October 2021 (UTC)

I don't agree with the sorting or with the last point. Sorting is not needed. If you sort without changing the statements, I actually don't care if you do it or not. But don't remove the retrieval date and the FIDE IDs in the references. They are needed. For example, take Roman Popov (Q27530047) and Roman Popov (Q27530048). If you look at an Elo list, how would you distinguish between them? In this case, you could use the DoB, but this might be missing or even be the same. The only unique identifier is the ID, and therefore it is needed for every single statement. For this use case, it does not matter if the profile page still exists or not. Steak (talk) 13:45, 3 October 2021 (UTC)
Your example is useless. If we know the ID of a chess player, then we will find him in the rating list. But if his card has disappeared, then your link leads nowhere. For example, grandmasters Pal Benko (Q465247), Stanislav Bogdanovich (Q18544873), Oleg Chernikov (Q4513619), István Csom (Q874008), Gildardo García (Q11699820), Dmitry Kayumov (Q4218359), Gennady Kuzmin (Q1231236), Yrjo A. Rantanen (Q1338668), Radoslav Simic (Q4419655), Markus Stangl (Q90543), Sulava, Nenad (Q2284127), Dmitry Svetushkin (Q3774368), Miroslav Tosic (Q4461611), Predrag Trajkovic (Q4461781), Wolfgang Uhlmann (Q1510108), Arsen Yegiazarian (Q2120169). See for yourself that all your Elo links for these chess players lead nowhere. These are only grandmasters and only for the last two years. What to do with your links that have stopped working?! But the players can still be found in the rating lists. Of course, replacing links with more reliable ones is needed. Игорь Темиров (talk) 14:38, 3 October 2021 (UTC)
@Игорь Темиров: Again, if you are running a bot you must get approval at Wikidata:Requests for permissions/Bot. Looking at your history I see lots of bot edits without approval. Sorting Elo statements is not valuable. Also, can you go and nominate all the non-notable items you created for deletion? Example: Q108686706. BrokenSegue (talk) 16:42, 3 October 2021 (UTC)
@BrokenSegue: Thanks. All this has already been discussed above, except for the deletion. A request for deletion was submitted, but did not find support. But, as I wrote above, I will not add chess players without a title in the future. Игорь Темиров (talk) 17:17, 3 October 2021 (UTC)
@Игорь Темиров: Can you give me a link to where the request for deletion was submitted? I'm very surprised we decided to keep clearly non-notable people. BrokenSegue (talk) 17:56, 3 October 2021 (UTC)
@BrokenSegue: yes. Игорь Темиров (talk) 18:19, 3 October 2021 (UTC)
That request for deletion is still active and you are the only one opposed to deletion... BrokenSegue (talk) 18:33, 3 October 2021 (UTC)
This topic is not about that. Игорь Темиров (talk) 19:30, 3 October 2021 (UTC)

Steak wrote "How would you modify it?" I would simply add tests for start time (P580) and end time (P582). There is no problem in that, but the query will be slower because you need a filter (unlike for point in time (P585)!):

SELECT ?item ?elo
WHERE
{
  ?item wdt:P31 wd:Q5 .
  ?item p:P1087 ?elo_stm .
  ?elo_stm ps:P1087 ?elo .
  {
    ?elo_stm pq:P585 "2009-01-01T00:00:00Z"^^xsd:dateTime .
  }
  UNION
  {
    ?elo_stm pq:P580 ?start_time .
    ?elo_stm pq:P582 ?end_time .
    FILTER (?start_time <= "2009-01-01T00:00:00Z"^^xsd:dateTime &&
            "2009-01-01T00:00:00Z"^^xsd:dateTime <= ?end_time)
  }
}
ORDER BY DESC(?elo)
LIMIT 100

The German Elo graph template could also read P580/P582. Using them gives no loss of data. --Dipsacus fullonum (talk) 21:55, 7 October 2021 (UTC)

I still doubt that this is useful. Let's say I want to know Magnus Carlsen's rating in January 2009. Currently, I can simply query P585 = 01/01/2009. This would need to be replaced by a filter using start time and end time. This is not user friendly. Or, you can also reverse it: if I want to know someone's rating in June 1999, I would currently get no result, because no rating was published then. This would be totally correct! With your method, you would get some anachronistic, misleading rating. Steak (talk) 18:08, 12 October 2021 (UTC)
And you also can't find out how many chess players had brown eyes on January 1, 2009. And who cares? We give real reasons, but you come up with useless ones. What for? Игорь Темиров (talk) 11:23, 17 October 2021 (UTC)
But if you really want to, here is, for example, a query for Carlsen's rating as of August 1, 2021. As you can see, it returns the March rating, so no rating was entered after March:
SELECT ?value ?maxdate WHERE {
  {
    # Latest rating date on or before 1 August 2021
    SELECT (MAX(?date) AS ?maxdate) WHERE {
      wd:Q106807 p:P1087 [ pq:P585 ?date ] .
      FILTER (?date <= "2021-08-01T00:00:00Z"^^xsd:dateTime)
    }
  }
  wd:Q106807 p:P1087 ?rating .
  ?rating ps:P1087 ?value .
  ?rating pq:P585 ?date .
  FILTER (?date = ?maxdate)
}

Игорь Темиров (talk) 15:29, 17 October 2021 (UTC)

datatype for en:Coxeter–Dynkin diagram?

Is it possible to create a property about Coxeter–Dynkin diagram (Q169451)? Currently, Coxeter–Dynkin diagram in English Wikipedia uses a Lua module to output multiple pictures. I have no idea which datatype can store those symbols.--[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 09:53, 30 September 2021 (UTC)

@A2569875 I don't know about a datatype, but if there was a canonical serialisation format, you could use that (e.g. like w:SMILES) as a string.
If you have an image, you could use image (P18) with the file as its value, and then qualify it with object has role (P3831) set to Coxeter–Dynkin diagram (Q169451)? Inductiveload (talk) 13:54, 30 September 2021 (UTC)
Is it possible to create a datatype like Help:Data_type#Chess?--[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 05:10, 1 October 2021 (UTC)
You may want to propose a new property instead of a new datatype.--GZWDer (talk) 05:33, 1 October 2021 (UTC)
But how? Convert the node-edge graph into a string?--[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 06:24, 1 October 2021 (UTC)
Is there any standard way of representing these graphs as strings? If not, then using the "commons media file" is probably the best approach. — Martin (MSGJ · talk) 07:17, 1 October 2021 (UTC)[]
Asking... en:Wikipedia_talk:WikiProject_Mathematics#about_Coxeter–Dynkin_diagram, en:Talk:Coxeter–Dynkin_diagram#store_into_wikidata. Please wait. --[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 10:10, 3 October 2021 (UTC)[]
@Tomruen:. --[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 13:28, 10 October 2021 (UTC)[]
I got a comment on English Wikipedia at en:Wikipedia_talk:WikiProject_Mathematics#about_Coxeter–Dynkin_diagram. Tom Ruen says that a Coxeter–Dynkin diagram (Q169451) can be represented as an ASCII string such as x4o3o ({{CDD|node_1|4|node|3|node}}). In Wikidata:Property proposal/Bowers acronym, we noticed that some of the symbol text used in the URL (https://bendwavy.org/klitzing/incmats/$1.htm) is actually a Coxeter–Dynkin diagram (Q169451), not a Bowers acronym (P9997). For example, the Bowers acronym of Infinite-order triangular tiling (Q17077551) is aztrat, and the link is https://bendwavy.org/klitzing/incmats/x3oinfino.htm ; the symbol “x3oinfino” (x3o∞o) is not a Bowers acronym, it is an en:Coxeter–Dynkin diagram. On Klitzing's website there is also a page that explains how to represent these graphs as strings, https://bendwavy.org/klitzing/explain/dynkin-notation.htm . @MSGJ, Inductiveload:, can we use this symbol system to create a property for Coxeter–Dynkin diagram (Q169451)? If allowed, I will propose the property later. --[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 15:34, 17 October 2021 (UTC)
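As a rough illustration of how such a string could be mapped back to the {{CDD}} template call, here is a small Python sketch. It handles only simple linear diagrams made of x (ringed) and o (unringed) nodes joined by integer or "infin" edge labels, which is all that the two examples above require. The function name `to_cdd` is hypothetical, and passing "infin" through unchanged as a template parameter is an assumption, not something confirmed in this thread; the full notation on Klitzing's page covers many more cases.

```python
import re

# Linear diagrams only: node, (edge label, node)*, e.g. "x4o3o" or "x3oinfino".
_LINEAR = re.compile(r"(?:[xo](?:\d+|infin))*[xo]")
_TOKEN = re.compile(r"[xo]|\d+|infin")

# x = ringed node, o = unringed node; this matches the x4o3o example above.
_NODE = {"x": "node_1", "o": "node"}


def to_cdd(symbol):
    """Convert a simple linear symbol such as 'x4o3o' to a {{CDD}} call."""
    if not _LINEAR.fullmatch(symbol):
        raise ValueError(f"unsupported symbol: {symbol!r}")
    # Nodes are translated; edge labels (digits, 'infin') pass through as-is.
    parts = [_NODE.get(token, token) for token in _TOKEN.findall(symbol)]
    return "{{CDD|" + "|".join(parts) + "}}"
```

Calling `to_cdd("x4o3o")` reproduces the template call quoted above, `{{CDD|node_1|4|node|3|node}}`. Anything outside the simple linear form raises `ValueError` rather than guessing.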

DOI (P356)

Could somebody have a look at Property:P356? Somebody changed the constraint, and now a dot is reported as a violation. Thx --Chris.urs-o (talk) 02:37, 13 October 2021 (UTC)

@Chris.urs-o: User:Ivan A. Krestinin removed some '\' escape characters from one of the format constraints a couple of weeks ago, but maybe that's not the problem you're seeing? Can you point to a case where there's a problem right now? ArthurPSmith (talk) 17:57, 13 October 2021 (UTC)
I was the one removing the unnecessary escapes. The issue was there already though, my changes only made it visible - before my edit, the evaluation of the regex produced an error, so it was unable to determine whether there were any constraint violations. - Nikki (talk) 12:06, 19 October 2021 (UTC)

The allowed units constraint and unit dimension

Hello.

The allowed units constraint (Q21514353) lists units allowed for a quantity-valued property and notifies an editor if a statement was entered using a unit not in that list. If the editor sees the notification then there are two possible things to do: Add the unit to the list of allowed units or consider using (or proposing) a different property.

To help editors in deciding what to do I'll give a quick explanation: A unit, say, joule per metre (Q56023789), is used to express a variety of quantities, indicated by measured physical quantity (P111). This list need not be exhaustive - there might be subclasses of any of those quantities (like radius (Q173817) being a subclass of length (Q36253)), and those can be expressed in the same unit. All quantities related to a unit in this way have something in common: They have the same value for ISQ dimension (P4020). When two units express quantities with the same dimension then those units are called compatible - they can be converted into each other and values expressed in those units can be compared. Now, for our quantity-valued properties we should try to allow only compatible units. An example: If a property initially allowed energy (Q11379) units, say, joule (Q25269) and electronvolt (Q83327), and you want to enter a value given in joule per kilogram (Q57175225), then you should rather find a different property - one that expresses specific energy (Q3023293).
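The compatibility rule can be illustrated with a short Python sketch. The dimension table here is a hand-written, hypothetical stand-in for what measured physical quantity (P111) and ISQ dimension (P4020) provide on Wikidata, and the function names are made up for the example; it covers only the three units mentioned above.

```python
from collections import defaultdict

# Hypothetical lookup table: unit label -> ISQ dimension, written by hand
# for this example rather than fetched from Wikidata.
DIMENSION = {
    "joule": "L^2 M T^-2",               # energy
    "electronvolt": "L^2 M T^-2",        # energy
    "joule per kilogram": "L^2 T^-2",    # specific energy
}


def group_by_dimension(units):
    """Group unit labels by their dimension."""
    groups = defaultdict(list)
    for unit in units:
        groups[DIMENSION[unit]].append(unit)
    return dict(groups)


def is_compatible(units):
    """True if all units share one dimension (an empty list is trivially so)."""
    return len({DIMENSION[unit] for unit in units}) <= 1
```

A property allowing joule and electronvolt passes the check, while adding joule per kilogram to the same list produces two dimension groups, which is the signal that a different property is the better home for that value.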

I recently did a survey of properties and found 25 which violated that principle. Some were easy or slightly difficult to fix, now we are down to 17. Here are the remaining ones:

select distinct ?prop ?propLabel where {
  ?prop wikibase:propertyType wikibase:Quantity .
  ?prop p:P2302 [
    ps:P2302 wd:Q21514353 ;
    pq:P2305 / wdt:P111 / wdt:P4020 ?dim1 ;
    pq:P2305 / wdt:P111 / wdt:P4020 ?dim2 ;
  ] .
  filter (! sameTerm(?dim1, ?dim2))
  service wikibase:label { bd:serviceParam wikibase:language "en" }
}

Feel free to have a look and see whether any of those remaining ones can be improved or split into more specific properties.

Best wishes, Toni 001 (talk) 08:22, 14 October 2021 (UTC)

  • @Toni 001: I don't think that anything can be done with concentration (P6274) and solubility (P2177). These two are used with so many different units that splitting them into different properties would be impractical. Also, I see that molar mass added using mass (P2067) is being deleted in some items; years ago a 'molar mass' property was rejected because of the existence of mass (P2067). It seems that a 'molar mass' property should be proposed once again. Wostr (talk) 17:26, 15 October 2021 (UTC)
    Yes, previously abandoned property proposals could indeed be revived using the argument of unit compatibility.
    Regarding concentration (P6274): With its 80 values it so far borders on being not-so-notable; I'd suggest that any contributor planning to enter some hundred values consider proposing a new property with a definite unit dimension. Toni 001 (talk) 08:12, 21 October 2021 (UTC)

geographic region (Q82794) and territorial entity (Q1496967)

Hello,

The two look very similar to me. Should they be merged? Is one a subclass of the other? --GrandEscogriffe (talk) 20:59, 14 October 2021 (UTC)

According to the German-language Wikipedia, geographic region (Q82794) and territorial entity (Q1496967) are two different things. I would have to take a closer look at the German-language articles to explain why this is so. But you cannot merge the two data objects as long as the separate articles de:Region (Q82794) and de:Gebiet (Q1496967) exist. --Gymnicus (talk) 21:18, 14 October 2021 (UTC)
I would love to hear more from a German speaker. But it seems like de:Gebiet is not so much about a specific concept as about a word which can mean de:Region or more specific things. At this point I am in favor of disconnecting de:Gebiet from the item (and perhaps giving it another item about the word Gebiet) and merging. --GrandEscogriffe (talk) 10:57, 15 October 2021 (UTC)
I have to admit that I don't understand the breakup you are aiming for. What is the point of this separation, except that the data object then has a higher number? The data object territorial entity (Q1496967) was created in 2012 for the German article de:Gebiet. Why should you change that now? If the information in the data object is incorrect, then you have to correct it, but not simply create a new data object. --Gymnicus (talk) 11:31, 15 October 2021 (UTC)
@GrandEscogriffe Isn't geographic region (Q82794) any defined 3D or 2D space anywhere? E.g. a region of a galaxy, a region in mathematics, etc. Vojtěch Dostál (talk) 11:33, 15 October 2021 (UTC)
@Vojtěch Dostál: In the German-language Wikipedia, the region is defined as follows: „Region bezeichnet in der Geographie und der Raumordnung ein anhand bestimmter Merkmale abgegrenztes Teilgebiet der Erdoberfläche.” (English: “In geography and spatial planning, a region denotes a sub-area of the earth's surface that is delimited on the basis of certain characteristics.”) --Gymnicus (talk) 11:36, 15 October 2021 (UTC)
@Vojtěch Dostál: there is a mismatch between the descriptions of geographic region (Q82794) on one side (according to which it can be geographic, spatial or mathematical), and its placement within larger classes and the content of the wp articles on the other side (according to which it is only geographical, i.e. on earth). The descriptions of Q1496967 actually seem better suited to what Q82794 does.
@Gymnicus: The problem that I want to solve is that about one thousand items which are clearly geographic regions get marked as instances or subclasses of territorial entity (Q1496967) instead of geographic region (Q82794). --GrandEscogriffe (talk) 13:49, 15 October 2021 (UTC)
@GrandEscogriffe I guess we should make geographic region (Q82794) more general then, shouldn't we? Generalization of the concept (in line with its description) will not harm the existing uses too much. territorial entity (Q1496967) will then be a subclass of geographic region (Q82794). This way, Wikidata can accommodate both of these items. Would that make sense? Vojtěch Dostál (talk) 15:43, 18 October 2021 (UTC)

Described at url

With this edit someone deleted my addition of a url for a human. They wrote "a mere mention by name is not a 'describe' ", but I believe we have always allowed that. We do not need a full biography of someone. Should the deletion be reversed? --RAN (talk) 21:42, 14 October 2021 (UTC)

The page found at the URL does not in any meaningful way at all describe the item's subject, and so it's a completely unsuitable value for described at URL (P973), which as a bare minimum, requires the linked page to describe the subject. Here's the full section relating to the item's subject: "For the 2011/12 season Schöneiche has budgeted 33,000 euros for the disposal of street leaves in front of private properties, says Beate Cyron, deputy head of the construction depot." In what way does that describe Beate Cyron? SMH. --Tagishsimon (talk) 21:53, 14 October 2021 (UTC)[]
If I was to write a biography of an obscure person, I would want to find every scrap of information on that person. Other than your personal animosity towards me, what makes it "completely unsuitable"? How information-dense is it required to be? It perfectly describes him as the "deputy head of the construction depot". --RAN (talk) 22:00, 14 October 2021 (UTC)
You are confusing what might legitimately be used as a reference URL (P854) for an occupation (P106) property statement, with something that describes the subject. For sure there's a continuum between the two, and though it's hard to articulate where the line is drawn, your article is not within many dustbin wagon's length of it, and on the wrong side. --Tagishsimon (talk) 22:32, 14 October 2021 (UTC)[]
Then tell me exactly how described_at_url must be used, so we can construct a bot to delete all that do not meet the requirement. We need objective rules, not subjective ones that are followed by ad hoc deletion; that is how you end up with a database skewed by selection bias. If it can't be described so that a bot can make the decision, we should not have it. --RAN (talk) 23:22, 14 October 2021 (UTC)
Your supposition that a bot could be written to analyse whether a page meets a rule-based set of requirements is niave or petulant, not least in the context of "though it's hard to articulate where the line is drawn". Whereas objective rules, to the extent they can be written, are doubtless desirable, we are still left with the unbridgeable chasm between an incidental mention in an article about street cleaning, and a description of a person. If a single take-away will pacify you, the linked page should be substantially (i.e. mostly) about the subject and/or provide substantial (a plurality of) information about the subject. --Tagishsimon (talk) 00:07, 15 October 2021 (UTC)
  • So substantially=51% of the sentences, or 51% of the paragraphs, or 51% of the pages of a book, which is it? By your rule I could not use the url for a list of war dead for a WWI veteran killed in action, since they are just one name on the list. I could not use a url of a list of Holocaust victims. Would that be true? Both of these are active projects. Argue the case before us, not the person presenting the case. There is no need to call me "niave [sic] or petulant". I would love to hear other people's opinions, thank you for yours. --RAN (talk) 02:48, 15 October 2021 (UTC)[]
Yes, you're getting there. You would not use the url for a list of war dead for a WWI veteran killed in action, since they are just one name on the list. You might use that list as a reference for a claim. Meanwhile, I'm trying not to make rules. You are - still petulantly - wobbling along with "51% of the sentences, or 51% of the paragraphs, or 51% of the pages of a book, which is it?" nonsense, as if the duck test didn't exist. --Tagishsimon (talk) 03:19, 15 October 2021 (UTC)[]
Subjective "duck tests" lead to selection bias, which we should avoid; it's another way of saying "I don't like it". --RAN (talk) 06:07, 15 October 2021 (UTC)
I don't think described at URL (P973) is appropriate here. --99of9 (talk) 06:42, 15 October 2021 (UTC)
Would that URL not be more suitable as a reference for, say, a position held (P39) statement? It seems to me that described at URL (P973) implies that the URL is specifically about the subject (i.e. if a URL were an item, it would have a main subject (P921), the value of that property being the subject). Inductiveload (talk) 11:39, 15 October 2021 (UTC)[]


  • (Someone edit conflicted this response into non-existence so here it is again) I would think it should provide more than a passing mention of the subject to count. If it just provides one or two facts then use it as a reference for those facts instead. This is always going to be subjective though. BrokenSegue (talk) 02:56, 15 October 2021 (UTC)[]
Of course, if the deleter had done that instead of deleting, I would not have complained. Not all facts fit neatly into our pre-defined categories, so it is helpful to have a place to store facts that do not fit easily into those categories, so the next editor can find them. Once deleted, no one sees them. --RAN (talk) 03:32, 18 October 2021 (UTC)

Gadgets broken

Does anyone know how to fix these editing scripts? See User talk:Magnus Manske/wikidata useful.js#Stopped working; User:MichaelSchoenitzer/quickpresets.js seems to be affected too. --Infovarius (talk) 22:12, 15 October 2021 (UTC)

I had the same issue with WUS and repaired it according to this. JAn Dudík (talk) 08:42, 21 October 2021 (UTC)

Population density

Hello friends,

Do we have a property for "Population density"? Population density is an important component of population. If we don't already have a property for population density, can we create it as a qualifier for population (Property:P1082) or as a separate value? Regards. T Cells (talk) 11:40, 16 October 2021 (UTC)

@T_Cells: Isn't it easily derived by doing population/area (which you can do in SPARQL)?
Or you might be able to use density (P2054) and make a new unit item for "people per unit area". Justin0x2004 (talk) 14:06, 16 October 2021 (UTC)
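The derivation mentioned here is a single division. As a minimal Python sketch (the function name and the sample numbers in the usage note are hypothetical, and a real query would first have to pick a population value and an area value from the same point in time):

```python
def population_density(population, area_km2):
    """Return inhabitants per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2
```

For a made-up place with 1,000 inhabitants on 4 square kilometres this gives 250 people per square kilometre. The result depends on both inputs being current and consistent, so it can differ from a density figure quoted directly in a source.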
@Justin0x2004:, I wasn't asking how to determine population density. I was asking whether we already have a property for population density, either as a separate value or as a qualifier for population (Property:P1082). Regards. T Cells (talk) 11:04, 18 October 2021 (UTC)
@T Cells I know but there are some things that we wouldn't store in Wikidata. For example we wouldn't store the number of characters in a person's name because it is easy to derive on the fly. I was thinking that population density might be such a thing. Justin0x2004 (talk) 14:16, 18 October 2021 (UTC)
No, it's not such a thing. Population density is an important component of population. T Cells (talk) 14:45, 18 October 2021 (UTC)
@T Cells Yes, it's important, but easily calculable from population and area. Vojtěch Dostál (talk) 15:38, 18 October 2021 (UTC)
Also volumetric number density (Q176449) is related... though you want "count per area." Justin0x2004 (talk) 14:34, 16 October 2021 (UTC)
It seems that we don't have such a property. We should maintain consistent units within a property, therefore we can't reuse existing population or (mass) density properties. Instead a new property could be proposed. I took the liberty to create items for the corresponding quantity (human population areal density (Q108913965)) and a unit (person per square kilometre (Q108913970)). Toni 001 (talk) 08:47, 17 October 2021 (UTC)[]
@Toni_001:
> We should maintain consistent units within a property
I don't think that is true. For example concentration (P6274) has at least a dozen different units that are allowed. Creating a new property for each distinct unit makes Wikidata less able to semantically generalize. A property proposal for human population areal density (Q108913965) would be excessively specific. e.g. What if I want to represent the population density of Danaus plexippus (Q212398) or badger (Q638105)? Do I need to go through more property proposal processes?
I think we could accommodate this request in a more general way by looking for or proposing a corresponding property of areal number density (Q108914965).
Justin0x2004 (talk) 13:16, 17 October 2021 (UTC)
Hello. concentration (P6274) is an outlier and should not be followed, see my detailed explanation here.
Note that human population areal density (Q108913965) is a subclass of population areal density (Q108913962); the latter refers to any species. There are different ways to model population density: one would be to create a property for each species; the other option is to create a more general one and add a species-qualifier. That's up to the community to discuss. Toni 001 (talk) 07:55, 18 October 2021 (UTC)[]
@Toni 001 I think we are creating duplicates.
areal number density (Q108914965) and number of entities per area (Q108914597). Justin0x2004 (talk) 14:25, 18 October 2021 (UTC)[]

I don't think there is a point to explicitly adding population density as a new property if it's easily computable from other properties we already have. BrokenSegue (talk) 19:20, 18 October 2021 (UTC)[]

An argument in favor is being able to enter values as given in a source - without any computation which might incur errors. Toni 001 (talk) 08:32, 19 October 2021 (UTC)[]
@Toni 001 Fair point. Justin0x2004 (talk) 12:31, 22 October 2021 (UTC)[]
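
The derivation that several participants describe (population divided by area) can be sketched as follows. This is only an illustration: the function name and the sample figures are made up for the example and are not taken from any Wikidata item.

```python
def population_density(population: float, area_km2: float) -> float:
    """Return inhabitants per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


# Illustrative figures only (not real data).
density = population_density(1_000_000, 400.0)
print(f"{density:.1f} persons per square kilometre")
```

A value computed this way is only as good as its inputs: the population and area statements would need to refer to the same date and the same boundary, which may be one reason a density quoted directly from a source could differ from the computed one.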

postal code

postal code (P281) is singular (all the labels and descriptions). It is equivalent to <http://www.w3.org/2006/vcard/ns#postal-code> whose label and comment are also singular.

But it looks like we have some use of an em dash to designate a range of zip codes (plural). I think the only reason for doing so is that enumerating all the postal codes for a region such as New York City (Q60) makes the web UI content take up a lot of space... and it adds more triples. But clearly, if web UI content length and triple count weren't concerns, we would want to store each of the postal codes in a range individually.

Do we need another property called "postal code range?" It seems like a mistake to put a range in the same property that we use for single codes.

I wonder how many kludges like this (using a property intended for singular thing for a plural thing instead) we have just because we are worried about web UI content length and triple count.

cc @NM1982:

Justin0x2004 (talk) 13:52, 16 October 2021 (UTC)[]

@Justin0x2004 It sounds very reasonable to me to create a new property, probably two of them - something like "postal code range minimum" and "postal code range maximum". Vojtěch Dostál (talk) 15:35, 18 October 2021 (UTC)[]
I think that would be problematic if a place has two disjoint ranges of postal codes. BrokenSegue (talk) 19:23, 18 October 2021 (UTC)[]
@BrokenSegue True. What about postal code (P281) : somevalue with those two properties in qualifiers? Does it make some sense? Vojtěch Dostál (talk) 08:23, 19 October 2021 (UTC)[]
@Vojtěch Dostál
It could work but I think the simple solution is staring us in the face: just put each postal code in Wikidata.
That is the most SPARQL query friendly thing to do. Justin0x2004 (talk) 23:39, 21 October 2021 (UTC)[]
@Justin0x2004 Yes, SPARQL friendly, but some pages have so many statements they can hardly be loaded in a browser. I wonder if we should try to prevent properties with a potential to have so many statements per item. Or is this not a big deal? Vojtěch Dostál (talk) 05:11, 22 October 2021 (UTC)[]
@Vojtěch Dostál I don't think Wikidata should make data representation sacrifices just because the Wikidata web UI doesn't load defensively. That is to say, I think the Wikidata Web UI should collapse statements groups if it thinks there are too many to display. I contribute to and use Wikidata for the data not the Web UI and I suspect most Wikidata contributors feel the same way. Justin0x2004 (talk) 12:22, 22 October 2021 (UTC)[]
@Justin0x2004 I agree, but it's not only about Wikidata UI. Pages with many statements also have issues in infobox Lua modules and quite possibly in some tools as well. Vojtěch Dostál (talk) 14:04, 22 October 2021 (UTC)[]
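
For consumers who meet a range value in postal code (P281), expanding it into individual codes could look roughly like the sketch below. It assumes purely numeric codes of equal length separated by a hyphen, en dash or em dash; `expand_postal_range` is a hypothetical helper written for this illustration, not an existing tool.

```python
import re

# Hyphen, en dash (U+2013) or em dash (U+2014), with optional spaces around it.
RANGE_SEPARATOR = re.compile(r"\s*[-\u2013\u2014]\s*")


def expand_postal_range(value: str) -> list:
    """Expand '00501-00503' style ranges into individual codes."""
    parts = RANGE_SEPARATOR.split(value.strip())
    if len(parts) == 1:
        return [parts[0]]  # a single code, nothing to expand
    if len(parts) != 2:
        raise ValueError(f"not a simple range: {value!r}")
    start, end = parts
    if not (start.isdigit() and end.isdigit()) or len(start) != len(end):
        raise ValueError(f"unsupported range format: {value!r}")
    if int(start) > int(end):
        raise ValueError(f"range is reversed: {value!r}")
    width = len(start)  # keep leading zeros
    return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]


# Two disjoint ranges, as in the objection raised above, are simply two values.
codes = []
for statement_value in ["00501\u201300503", "10001"]:
    codes.extend(expand_postal_range(statement_value))
print(codes)
```

Alphanumeric schemes and codes that themselves contain a hyphen are deliberately rejected here, which hints at why a free-text range is awkward to consume compared with one statement per code.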

Introducing P10000: Research Vocabularies Australia ID

Research Vocabularies Australia ID (P10000) has arrived.

"Research Vocabularies Australia (Q41147961) helps you find, access, and reuse vocabularies for research." This site is part of Australian Research Data Commons (Q4824459). The property, created today by User:UWashPrincipalCataloger (Thanks!), is an identifier for a vocabulary (not necessarily Australian) in the R.V.A. database. Many of these vocabularies are themselves well known to us, and have identifier properties on Wikidata, such as Getty Thesaurus of Geographic Names ID (P1667), ANZSRC 2020 FoR ID (P8529), and Australian Faunal Directory ID (P6039). My hope is that we may find more databases in their growing set that we in turn may use as useful properties. If you would like to help matching this catalogue, head over to (brilliant tool) Mix'n'Match set 4770.

This milestone arrived after a week of 42 property approvals, and there are plenty more to discuss in the pipeline. (15 of those proposed in the last week are Australia-related, because I was angling at hooking this milestone! Amongst those, I think R.V.A. is perhaps the most appropriate meta-vocabulary-data for a celebration of our project.)

I'm a particular fan of identifier properties, because they make Wikidata the backbone of the authority control network. They also expand the utility of Entity Explosion, a free browser extension I built to showcase and make use of Wikidata external links. So I've been keen for a long time on making sure all high-quality online databases in my country have properties and Mix'n'Match sets, and use a properties-by-country dashboard to keep track of how each country is going (using the brilliant integraality). If you're interested in proposing a property about your speciality, your interests, or your location, please don't hesitate. The template is intimidating at first, but you can always view the source of other proposals, and other users will come to your aid anyway. If you're already an experienced user, especially if you are scraping or regex-savvy, I'd suggest challenging yourself by setting up a Mix'n'Match set for a property that doesn't yet have one.

Congratulations to the community on jointly building the amazing project that is Wikidata. --99of9 (talk) 03:31, 17 October 2021 (UTC)[]

+1 ArthurPSmith (talk) 16:55, 18 October 2021 (UTC)[]

Please add an underscore as part of the scheme for Swedish Portrait ID

See Gustaf Edström (Q5627420) where the url scheme uses an underscore. --RAN (talk) 07:00, 18 October 2021 (UTC)[]

Perfect, thanks! --RAN (talk) 00:14, 19 October 2021 (UTC)[]

How to set quantity (P1114)/numeric value (P1181) as infinity, or other special values?

Some geometric shapes (Q815741) need to be described by their Euler characteristic (Q852973). For example, a cube (Q812880) has 6 faces, 12 edges, and 8 vertices:

has parts of the class:
  • face (quantity: 6; shape: square)
  • side (quantity: 12)
  • vertex (quantity: 8)

has facet polytope:
  • square (quantity: 6)
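
As a small worked example, the Euler characteristic of the cube follows from the three counts above as V − E + F. The sketch below uses a helper name invented for this illustration; it also shows that naive counting stops working once a count is infinite, because floating-point infinity minus infinity is not a number.

```python
import math


def euler_characteristic(vertices: float, edges: float, faces: float) -> float:
    """Return V - E + F for the given element counts."""
    return vertices - edges + faces


cube = euler_characteristic(vertices=8, edges=12, faces=6)
print(cube)

# With infinitely many vertices and edges the arithmetic is undefined (NaN),
# so such a value cannot be derived from the counts alone.
undefined = euler_characteristic(math.inf, math.inf, 2)
print(math.isnan(undefined))
```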

But some geometric shapes (Q815741) have infinitely many faces, edges or vertices, for example apeirogonal tiling (Q4779315):

has parts of the class:
  • face (quantity: 2; shape: apeirogon)
  • side (quantity: infinity)
  • vertex (quantity: infinity)

Another example is apeirogon (Q4779316). It contains infinitely many edges:

has facet polytope:
  • side (quantity: infinity)

So, how can quantity (P1114) be set to infinity?--[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 08:20, 18 October 2021 (UTC)

To my knowledge there is no way to represent infinity in a quantity-valued property. Toni 001 (talk) 11:01, 18 October 2021 (UTC)[]
@Toni 001: So, how can I describe "apeirogon (Q4779316) contains infinitely many edges" and "Mucube (Q11420123) contains infinitely many square (Q164) faces" in Wikidata? With value known, but too large for datatype (Q54767019)?--[雪菲🐉蛋糕🎂] >[娜娜奇🐰鮮果茶☕](☎️·☘️) 14:13, 18 October 2021 (UTC)
Perhaps use <no value> on any numeric properties and then use has quality (P1552) instead to state that certain property values are infinite? defining formula (P2534) could potentially be used in some way to represent irrational numbers. --Dhx1 (talk) 13:53, 19 October 2021 (UTC)[]

Compiling Wikiquote contributions stats through Wikidata

Hello, how can I use Wikidata to compile stats on how many Wikiquote pages have been contributed in the last 30 or 60 days? Is there a query for this? Regards, user:Shoodho

@Shoodho: Wikidata may not be the best tool for that, depending on how you define "contributing a Wikiquote". Do you mean all new pages created on any Wikiquote language version in a given time window? Or do you only count new Wikidata items, or updated Wikidata items? Vojtěch Dostál (talk) 15:33, 18 October 2021 (UTC)[]
Hello, is it possible to get a sorted list of new Wikiquote pages per language, and also to find improved or edited Wikiquote articles? Which tool would you recommend if Wikidata is not the best one? Shoodho (talk) 17:27, 18 October 2021 (UTC)
For new pages I'd recommend using Special:NewPages (exists in all language versions). For statistics, you'd better use https://quarry.wmcloud.org/, but I'm unable to write a quick query for you there; someone else might be able to help. The same goes for edited pages. Vojtěch Dostál (talk) 09:01, 19 October 2021 (UTC)
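
Besides Special:NewPages and Quarry, the MediaWiki Action API can list recently created pages. The sketch below only builds the request URL for one Wikiquote language version and does not send it; the helper name is invented for this example, and the choice of parameters (main namespace only, page creations only) is an assumption about what "contributed" should mean.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def new_pages_query_url(lang: str, days: int, now: datetime) -> str:
    """Build a list=recentchanges request for pages created in the last `days` days."""
    oldest = now - timedelta(days=days)
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",          # page creations only
        "rcnamespace": "0",       # main namespace only
        "rcend": oldest.strftime("%Y-%m-%dT%H:%M:%SZ"),  # stop at this older bound
        "rcprop": "title|timestamp",
        "rclimit": "max",
        "format": "json",
    }
    return f"https://{lang}.wikiquote.org/w/api.php?" + urlencode(params)


url = new_pages_query_url("en", 30, datetime(2021, 10, 19, tzinfo=timezone.utc))
print(url)
```

Results come back in batches, so a real script would need to follow the continuation values in each response. Recent changes are also only retained for a limited period, so a 60-day window may still need Quarry as suggested above.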

Wikidata weekly summary #490

Coolest Tool Award 2021: Call for nominations

The third edition of the m:Coolest Tool Award is looking for nominations!

Tools play an essential role for the Wikimedia projects, and so do the many volunteer developers who experiment with new ideas and develop and maintain local and global solutions to support the Wikimedia communities. The Coolest Tool Award aims to recognize and celebrate the coolest tools in a variety of categories.

The awarded projects will be announced and showcased in a virtual ceremony in December. Deadline to submit nominations is October 27. More information: m:Coolest Tool Award. Thanks for your recommendations! -- 2021 Coolest Tool Academy team

Complex multi level property ontology design?

Hi there,

I am considering using Wikidata as a backing database for an academic project (read: eventually described in a literature publication) to have computationally accessible & rich metadata for all the bioluminescent species (aka taxa) of the world.

As I see it now, this would involve proposing quite a few Wikidata properties. I understand the process to submit a new Wikidata property proposal (See: https://www.wikidata.org/wiki/Property:P6800 , proposal here: https://www.wikidata.org/wiki/Wikidata:Property_proposal/has_sequenced_genome), but what I don't understand is how to design a more complex Wikidata ontology, e.g. that has sub-properties or uses qualifiers. For example, I might propose this property or other sub-properties (or qualifiers):

  • taxon has bioluminescence (boolean true/false)
  • taxon has autogenic bioluminescence (boolean true/false) (could be a qualifier on the above property, or an independent "sub-property"?)
  • taxon has symbiotic bioluminescence (boolean true/false)


  • taxon has body-internal bioluminescence (boolean true/false) [This could even be divided into cell intrinsic vs body cavity, as some taxa secrete into a body cavity]
  • taxon has body-external secreted bioluminescence (boolean true/false) [e.g. Cypridinidae, some ostracods]


  • taxon has neuronal control of bioluminescent intensity and/or kinetics (boolean true/false) [Restricted to animals]
  • taxon effector neurotransmitter for bioluminescent control is (Wikidata compound link, e.g. https://www.wikidata.org/wiki/Q424979, for Photinus pyralis)
  • taxon produces single-color bioluminescence (boolean true/false)
  • taxon produces multi-color bioluminescence (boolean true/false)


  • taxon in vivo bioluminescent peak emission wavelength (integer / float, would be wavelengths in nanometers, and would have to be flexible as there may be multicolor bioluminescence)
  • taxon luciferase in vitro bioluminescent peak emission wavelength (integer / float, would be wavelengths in nanometers, and would have to be flexible as there may be multicolor bioluminescence)



  • taxon high-level habitat is (Something like marine vs terrestrial vs freshwater)

The other trick is that, while the most "explicit" way to apply these properties would be to apply them to lower-level taxon Wikidata items (e.g. all 2000+ firefly [Lampyridae] species/Wikidata items that are thought to be bioluminescent), an "easier" way would be to apply them to a higher-level taxon like the family Lampyridae (https://www.wikidata.org/wiki/Q25420). So, is there a way to specify in Wikidata that the property would automatically apply to all descendant taxa that do not yet have an annotation, or would a Wikidata bot be the only way to do that?
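
One way to picture "apply to the family, inherit below" is to resolve the annotation when the data is read rather than copying it to every species. The sketch below is a minimal illustration on an in-memory parent map: the function name is invented, the "hypothetical dark" taxa are made-up placeholders for a non-luminous lineage, and nothing here is an existing Wikidata feature.

```python
from typing import Dict, Optional


def inherited_annotation(taxon: str,
                         parent_of: Dict[str, str],
                         annotations: Dict[str, bool]) -> Optional[bool]:
    """Return the nearest explicit annotation on the taxon or one of its ancestors."""
    seen = set()
    current: Optional[str] = taxon
    while current is not None and current not in seen:
        if current in annotations:
            return annotations[current]
        seen.add(current)  # guard against accidental cycles in the data
        current = parent_of.get(current)
    return None  # nothing stated anywhere up the chain


parent_of = {
    "Photinus pyralis": "Photinus",
    "Photinus": "Lampyridae",
    "hypothetical dark species": "hypothetical dark genus",  # placeholder taxa
    "hypothetical dark genus": "Lampyridae",
}
annotations = {
    "Lampyridae": True,                # stated once, at family level
    "hypothetical dark genus": False,  # explicit exception lower down
}

print(inherited_annotation("Photinus pyralis", parent_of, annotations))
print(inherited_annotation("hypothetical dark species", parent_of, annotations))
```

Because the nearest statement wins, an exception recorded on a lower taxon overrides the family-level default. On the query service, a comparable walk is usually written as a property path over parent taxon (P171).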

Another complexity: even in some higher-level taxonomic groupings that are almost entirely bioluminescent (e.g. Ctenophora), there are exceptions to the rule, with lineages that seemingly are not luminescent. Would using the "opposite of" qualifier with the ``taxon has bioluminescence`` property be the correct way to annotate this fact?

Another complexity: bioluminescence has independently evolved 100+ times in 2 of the 3 domains of life (Bacteria and Eukaryota), and varies hugely in its physiology and ecological roles. So the ontology has to take this into consideration: an ontology that is too restrictive or too broad will almost immediately run up against edge cases in bioluminescence. In a way, I think bioluminescence is a way to "stress test" the other aspects of taxonomy, cheminformatics, & ecology-informatics that are aspired to be developed on Wikidata or similar databases.

So, as you can see, I have a lot of thoughts. But the main thought: there could be a lot of ways to structure these properties/qualifiers as they do have inter-relationships that should be described / controlled.

Your thoughts? Are there some good guides/books on the structure of triplestore / graph databases & Wikidata in particular? And are there any favorite tools to scope out such an ontology structure, or is it just drawing it out in a graph diagram tool like Draw.IO (https://github.com/jgraph/drawio-desktop)?

I will mention that the luciferins used in bioluminescence are natural products. I have been having some discussions with others in the cheminformatics / Wikidata field (like @Egon_Willighagen) that the existing "found in taxon" property (https://www.wikidata.org/wiki/Property:P703) alone is, IMO, too generic for cheminformatics in the bioluminescence field. A compound or luciferase being "found in a taxon" is necessary but not sufficient to demonstrate other interesting biological things, like small-molecule or protein biosynthesis in the annotated taxon. For example, coelenterazine is widely distributed in oceanic food webs, is acquired in the diet, is accumulated in specialized ways, and is possibly even recycled; likewise, a luciferase can be obtained from the diet rather than encoded in the genome of the organism, which has specialized physiology to use and control it (see here: https://en.wikipedia.org/wiki/Kleptoprotein).

Apologies for pinging everybody if this is not the right way or place to discuss things. I am relatively new to Wikidata so please let me know if this is against norms. I also appreciate frank feedback if Wikidata is not the right place / not technically well suited for such a project.

Tobias1984 (talk) Andy Mabbett (Pigsonthewing); Talk to Andy; * *Andy's edits TypingAway (talk) Daniel Mietchen (talk) Tinm (talk) Tubezlob Vincnet41 Netha Hussain Fractaler Tris T7 TT me Photocyte Nomen ad hoc GoEThe (talk)
  Notified participants of WikiProject Biology

Saehrimnir
Leyo
Snipre
Jasper Deng
Dcirovic
Walkerma
Egon Willighagen
Denise Slenter
Daniel Mietchen
Kopiersperre
Emily Temple-Wood
Pablo Busatto (Almondega)
Antony Williams (EPA)
TomT0m
Wostr
Devon Fyson
User:DePiep
User:DavRosen
Benjaminabel
99of9
Kubaello
Fractaler
Sebotic
Netha
Hugo
Samuel Clark
Tris T7
Leiem
Christianhauck
SCIdude
Binter
Photocyte
Robert Giessmann
Cord Wiljes
Jonathan Bisson
GrndStt
Ameisenigel
Charles Tapley Hoyt
ChemHobby
Peter Murray-Rust
Erfurth
  Notified participants of WikiProject Chemistry

Photocyte (talk) 21:52, 18 October 2021 (UTC)[]

Unconvinced that new properties are required in many cases. has quality (P1552) with appropriate values (e.g. "has bioluminescence", "has autogenic bioluminescence", "has symbiotic bioluminescence" &c) might possibly work just as well?

Thank you, I was not aware of such an approach. I will consider that as well. Photocyte (talk) 05:57, 19 October 2021 (UTC)[]

Agreed. That is a good path forward for most of your objectives. You can start by making items for things like "autogenic bioluminescence" which we haven't yet described in linked data! --99of9 (talk) 06:39, 19 October 2021 (UTC)[]
There are two levels to annotate this, as there are two types of evidence. The evidence you probably have in mind is the observation of bioluminescence by the biologist describing the species. I agree with User:Photocyte how to handle this. The second level is the biochemical characterization of molecules and enzymes taking part in the biological process bioluminescence (Q179924). This usually is done by imports from the external UniProt database and results in statements like
If you are planning to add biochemical evidence please use these four statements as template. --SCIdude (talk) 07:09, 19 October 2021 (UTC)[]

Thanks for your feedback & for highlighting these existing properties, @SCIdude! I will note that I added the P703 / "found in taxon" claims which you cite (https://www.wikidata.org/wiki/Q27125143#P361). However, that property is not sufficient to indicate biosynthesis in that taxon (vs dietary acquisition), which is really what I want to annotate. Such an additional property, biosynthesized by taxon, is a broader natural products concept than something that would apply to bioluminescence alone, so I would love to hear feedback from the cheminformatics / natural products community, as there may already be plans to structure this concept on Wikidata. Photocyte (talk) 19:39, 21 October 2021 (UTC) edit of: 16:19, 19 October 2021 (UTC)

Just as a general note, Wikidata does not have a "Boolean" datatype, and I don't believe we have any properties that can be regarded as Boolean-valued. Boolean data can best be accommodated as suggested above, with an item-valued property. The non-Boolean properties suggested above are probably fine as far as I am aware. ArthurPSmith (talk) 16:52, 19 October 2021 (UTC)[]
We have some pseudo-Boolean properties, for external identifiers where the value, if any, is always the QID of the item on which it appears. We don't need to store the QID a second time: all we are really doing is indicating "true". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:53, 19 October 2021 (UTC)[]
I have also just discovered emergency services (P6855), which has example values including "yes" and "no". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:47, 23 October 2021 (UTC)

Please consider: either you know from analysis/mass spectroscopy that a molecule occurs in a taxon, which is modeled by P703 / "found in taxon", or you know from characterization of a specific enzyme that a molecule is biosynthesized. Biosynthesis of molecules is modeled in the Gene Ontology; in that case the enzyme item gets a P682 process statement. If your specific process is not in GO, submit an addition to them. --SCIdude (talk) 06:43, 20 October 2021 (UTC)

Thanks for the feedback, all! @SciDude, regarding GO annotation of enzyme biosynthetic activity: understood. However, you can know that a taxon biosynthesizes a given molecule without any enzyme characterization. I would even argue that enzyme characterization alone (e.g. heterologous expression) is insufficient to demonstrate an actual functional role in the taxon's biosynthesis, even if a literature source states that. See glycosyltransferases, which have known broad substrate scopes & are a very abundant family of enzymes in genomes: it is definitely possible that "the" glycosyltransferase for a given biosynthesis in vitro is not "the" one in vivo. But in short, totally understood that there is a way to structure this by pushing more annotations to the gene/enzyme level & using Gene Ontology (GO) type annotations. At this stage, I'm not looking to recapitulate at Wikidata the functional annotations that UniProt is doing (IMO, a pretty good job), but am more interested in properties that make sense as a taxon annotation, both for bioluminescence (where they are unique) & for the greater natural products / cheminformatics community (where they intersect). Photocyte (talk) 19:39, 21 October 2021 (UTC)

is this possible?

At the English Wikipedia there are 87,000+ individual taxonomy templates. These templates form one-way linked lists working up from an initial taxon. en:Module:Autotaxobox is the tool that stitches them together. This works well enough for en.wiki, but making it usable on other-language wikis requires importing all 87,000+ data templates plus the base templates and modules that render the taxoboxen. And, of course, there is the accompanying maintenance headache of keeping the data at the various individual wikis synchronized.

One possible solution to this problem is to convert all of the 87,000+ templates into a series of lua data modules. I have written some awb scripts and supporting lua functions to demonstrate that the 87,000+ templates can be converted to 100 or so lua data modules. That still leaves the maintenance headache, because how does the everyday editor create a new taxon in the data? It is likely that the everyday editor will shy away from editing lua code.

There have been suggestions that wikidata is a possible solution. But, the existing wikidata taxonomy structure is not compatible with the autotaxobox structure at en.wiki. For the purposes of demonstration, I have hacked a bit of lua code that, beginning at the genus Felis, crawls the taxonomy tree through the lua data tables (left column) and through wikidata (center column). The output from Module:Autotaxobox is on the right as a reference. See en:Module talk:Sandbox/trappist the monk/taxonomy. The lua data version is more or less correct, though there are still bugs in my code, but the wikidata version is wildly different.

My question: Is it possible to clone the data from the 87,000+ taxonomy templates into wikidata where they would exist in some sort of isolation from the current taxonomy data? As I see it there is no reason for the Autotaxobox data to impinge on the existing taxonomy data nor any desire for the two to share. Certainly, they could share properties (P225 taxon name, P105 taxon rank, P171 parent taxon). As separate data items the taxon are easier for editors to create and edit so there is a reduced chance that inexperienced editors will break something wholly unrelated. As a separate collection of data, other-language wikipedias would only need the base templates and modules that render the taxoboxen.

Is this possible?

Trappist the monk (talk) 23:41, 18 October 2021 (UTC)[]

This sounds possible, though I do wonder what new properties you are imagining. Wikidata already has a lot of taxonomic information. What information is missing? Why would you want the information to live in isolation from the current data? BrokenSegue (talk) 03:30, 19 October 2021 (UTC)
At en.wiki there is an essay that describes why the current taxonomy structure in Wikidata is not appropriate for the task of creating taxoboxen.
Each of the 87,000+ taxon names (one per taxonomy template at en.wiki) gets a unique qid. For example there will be a new Felis qid with a description that includes the word 'Autotaxobox'. Each of these autotaxobox qids will have a property for each parameter supported by the en.wiki taxonomy templates. These parameters and how their functionality might be implemented as properties are:
|parent= – holds the qid for the current taxon name's parent – much as parent taxon (P171) does at Felis (Q228283) – perhaps P171 can be used for the autotaxobox taxonomies
|rank= – holds the taxon name's taxonomic rank – a text string is sufficient though taxon rank (P105) might be used to hold the qid for the taxon name's rank. There are about 130 taxonomy ranks used by the en.wiki taxonomy templates. I don't know how many of those ranks are listed in wikidata. Is there an easy way to get a list of all of the P105 taxon ranks?
|link= – an internationalized link label (should be named as a label) that falls back to English. The link label itself is plain text (I don't know how i18n would work for this – a language code qualifier?). The link label is used with the site link retrieved from the taxon name's qid so:
[[:<lang>:<site link>|<link label>]]
[[:vi:Abaciscus (bướm đêm)|Abaciscus]] (renders as: Abaciscus)
|extinct= – boolean true when a taxon name is extinct; boolean false or no value when not extinct
|always_display= – boolean true to force the display of a taxon name in a taxonomy list; normally boolean false or no value
|refs= – one or more plain-text references that support the choice of values above; may contain template markup ({{cite book |...}})
|same_as= – holds the qid of a taxon name that this taxon name will take data from; this allows the taxonomy list to skip over sections of the taxonomy tree that aren't appropriate or necessary for the current taxonomy list
I thought that I answered your "Why would you want the information to live in isolation" question in my initial post. Tell me what was unclear in that post and I'll try to do better.
Trappist the monk (talk) 13:47, 19 October 2021 (UTC)[]
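
On the question of listing the taxon ranks known to Wikidata, one possible approach is a query against the Wikidata Query Service. The sketch below only builds the request URL. It assumes that rank items are modeled as instances of taxonomic rank (Q427626); that modeling assumption should be checked before relying on the result, and the helper name is invented for this example.

```python
from urllib.parse import urlencode

# Assumption: rank items carry "instance of" (P31) = taxonomic rank (Q427626).
RANK_QUERY = """
SELECT ?rank ?rankLabel WHERE {
  ?rank wdt:P31 wd:Q427626 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def rank_query_url() -> str:
    """Return a Wikidata Query Service URL for the rank query above."""
    params = {"query": RANK_QUERY.strip(), "format": "json"}
    return "https://query.wikidata.org/sparql?" + urlencode(params)


print(rank_query_url())
```

Comparing the returned labels with the roughly 130 ranks used by the en.wiki templates would show which ranks are missing.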
You wrote "as separate data items the taxon are easier for editors to create and edit so there is a reduced chance that inexperienced editors will break something wholly unrelated". I don't get either concern honestly. BrokenSegue (talk) 14:36, 19 October 2021 (UTC)[]
en:Template:Taxonomy/Felis is one of the 87,000+ taxonomy templates. The internals of that template look like this:
{{Don't edit this line {{{machine code|}}}
|rank=genus
|link=Felis
|parent=Felinae
}}
Were we to transfer that template to wikidata, we would create an autotaxobox qid for it and give it 'rank', 'link', and 'parent' properties. The other properties 'extinct', 'always_display', 'refs', and 'same_as' should probably be present but set to no value. This collection of properties assigned to the new Felis autotaxobox qid is the separate data item to which I referred. Editors working on autotaxobox taxa have no need to edit taxonomy data at wikidata that is not related to autotaxoboxen. Especially for non-expert editors, I believe that simple is best. The autotaxobox qids do not need: image (P18), start time (P580), taxon range map image (P181), collage image (P2716), described by source (P1343), Commons category (P373), topic's main category (P910), earliest date (P1319), or any of the 30-ish identifiers available at Felis (Q228283). All of that is just stuff that serves no purpose for the 87,000+ autotaxobox qids that would make up the complete data set. The data set should be simple to maintain.
Is this a better explanation?
Trappist the monk (talk) 17:09, 19 October 2021 (UTC)[]
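
To make the proposed mapping concrete, here is a rough sketch that reads the parameters out of the Felis template shown above and lays them out as one flat record per taxon, with "no value" for the parameters that are absent. The property IDs for `parent` and `rank` are the candidates mentioned in the thread; the helper names and the record layout are assumptions made for this illustration.

```python
import re

FELIS_TEMPLATE = """{{Don't edit this line {{{machine code|}}}
|rank=genus
|link=Felis
|parent=Felinae
}}"""

# Candidate properties named in the thread; the other parameters have none yet.
CANDIDATE_PROPERTY = {"parent": "P171", "rank": "P105"}
ALL_PARAMS = ["parent", "rank", "link", "extinct",
              "always_display", "refs", "same_as"]

PARAM_LINE = re.compile(r"^\|\s*(\w+)\s*=\s*(.*?)\s*$")


def parse_taxonomy_template(wikitext: str) -> dict:
    """Collect |name=value lines from a taxonomy template."""
    params = {}
    for line in wikitext.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def to_record(params: dict) -> dict:
    """Fill every supported parameter, using None to stand for 'no value'."""
    return {name: params.get(name) for name in ALL_PARAMS}


def link_markup(lang: str, site_link: str, label: str) -> str:
    """Build the [[:<lang>:<site link>|<link label>]] form described above."""
    return f"[[:{lang}:{site_link}|{label}]]"


record = to_record(parse_taxonomy_template(FELIS_TEMPLATE))
print(record)
print(link_markup("vi", "Abaciscus (bướm đêm)", "Abaciscus"))
```

In an actual import, the `parent` and `rank` strings would still have to be resolved to item IDs, which is where most of the matching work would likely be.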
AFAIK, all taxodata was copied from enwiki long time ago in an early stage of Wikidata. Also the thing you are probably missing is Template:Automatic taxobox (Q6705326) - this template exists in 75 editions of Wikipedia (including enwiki), but does not exist in dewiki/frwiki/eswiki. Ruwiki have already replaced ~80% of classic taxoboxes with wikidata-based taxoboxes. The problems there were mostly related to Lua memory, not to lack of data. Lockal (talk) 07:37, 19 October 2021 (UTC)[]
I don't know what you mean by "Also the thing you are probably missing is Template:Automatic taxobox (Q6705326)". I don't think that I'm missing that template. Template:Automatic taxobox is implemented by Module:Automated taxobox (Q61472014) which takes the value assigned to the template's |taxon= parameter as the starting point when it creates the taxonomy list. The taxonomy list is created using the data from a handful of the 87,000+ taxonomy templates.
Yes, lua memory is an issue that is solved at en.wiki by discarding a data module after it has been used. The issue for a lua implementation is making the data 'editable' by non-technical editors who know nothing about lua syntax. And you still have the interwiki importation and synchronization issues to deal with.
Trappist the monk (talk) 13:47, 19 October 2021 (UTC)[]
Ah, sorry, you are correct, and my information is wrong. I thought ruwiki was able to get rid of NN000 templates in implementation, but I lost the track and no one managed to do it. Lockal (talk) 14:56, 19 October 2021 (UTC)[]
99of9
Achim Raschka (talk)
Andrawaag (talk)
Brya (talk)
CanadianCodhead (talk)
Canley
Circeus
Dan Koehl (talk)
Daniel Mietchen (talk)
Eewilson (talk)
Enwebb
Faendalimas
FelixReimann (talk)
Hyperik (talk)
Infomuse (talk)
Infovarius (talk)
Jean-Marc Vanel
Joel Sachs
Klortho (talk)
Lymantria (talk)
Magnefl (talk)
MPF
Manojk
MargaretRDonald
Mellis (talk)
Michael Goodyear
Mr. Fulano (talk)
Myrmoteras (talk)
Nis Jørgensen
Oronsay
PEAK99
Peter Coxhead
PhiLiP
Andy Mabbett (talk)
Plantdrew
Prot D
pvmoutside
RaboKarbakian
Rod Page
Strobilomyces (talk)
Stuchka (talk)
Succu (talk)
TiagoLubiana (talk)
Tinm
Tom.Reding
TomT0m
Tommy Kronkvist (talk)
Totodu74 (talk)
Tris T7 TT me
Tubezlob
William Avery
Minorax
Culex
Koala0090
Mike Krüger
Friesen5000
Salgo60
TED
GoEThe (talk)
Estopedist1
Leptospira
Melissadilara
Lagewi
Luca.favorido
JJ Ford BHL
Mzaki
Metacladistics
  Notified participants of WikiProject Taxonomy; pinging the experts --Azertus (talk) 14:29, 19 October 2021 (UTC)[]
I'm the user who started this conversation some weeks ago. I'm an admin at SqWiki, and when I wanted to update our current taxonomy system to the EnWiki standard I found out that I needed to import ~90k templates to do that, while keeping up with an unusual way of template usage, which started this whole conversation. Ever since, I've been following the discussions about the subject and I have a somewhat naive question. One of the most common arguments that gets thrown around is that editors need to be able to easily edit each taxon often. Is the whole taxonomic system really supposed to be THAT dynamic? I fully support technical simplifications, but I don't really grasp the emphasis on the dynamic part. I do understand that, like many other things, no taxon is ever set in stone, and that we continuously get new information that changes our Tree of Life, but I'm not sure this case is that different from other scientific systems that use tree-like structures. I haven't seen the dynamic aspect put forward as a main criterion in their technical designs. I mean, sure, if the system could also satisfy that point then why not, but having that as a main aim of the technical design looks a bit strange to me, like a bakery project with one of its main aims being an infrastructure that allows selling bread to each customer in 0.3 seconds. But maybe I'm wrong and the dynamic aspect should be an important part of the design. - Klein Muçi (talk) 12:15, 20 October 2021 (UTC)
If you look at en:Module talk:Sandbox/trappist the monk/taxonomy and compare the left-most taxonomy list with the reference list on the right, you will see that the two lists are not the same. The data for the leftmost list was compiled from the 87,000+ taxonomy templates on 15 October. And then over the next two days an editor changed three of those templates and created another:
en:Template:Taxonomy/Mammalia/skip changed parent 16 October
en:Template:Taxonomy/Theriimorpha created 16 October
en:Template:Taxonomy/Theriiformes changed parent 16 October
en:Template:Taxonomy/Trechnotheria changed parent 17 October 2021
This, of course, highlights a drawback of the lua data module system... Editors will edit. Making it easy for them to do so in a way that they don't break unrelated stuff is a good thing.
If you can sell bread faster than your competitor and thereby sell more bread than your competitor, why wouldn't you do that? Of course what that may require is that you now sell something that only vaguely resembles bread à la 'Wonderbread'.
Trappist the monk (talk) 13:25, 20 October 2021 (UTC)[]
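Editorial note: the drift described above (a compiled data snapshot going stale as editors change parents and create new templates) can be detected by diffing two snapshots. The following is only a minimal sketch in Python, not the autotaxobox system's actual code. The function name `diff_taxonomy` and the taxon names are hypothetical placeholders, and each snapshot is assumed to be a plain mapping from taxon to parent.

```python
def diff_taxonomy(old, new):
    """Compare two {taxon: parent} snapshots (assumed structure).

    Returns three sorted lists: taxa created since the old snapshot,
    taxa removed since then, and (taxon, old_parent, new_parent)
    tuples for taxa whose parent changed.
    """
    created = sorted(t for t in new if t not in old)
    removed = sorted(t for t in old if t not in new)
    reparented = sorted(
        (t, old[t], new[t]) for t in old if t in new and old[t] != new[t]
    )
    return created, removed, reparented


# Hypothetical snapshots; the names are placeholders, not real templates.
snapshot_old = {"TaxonB": "TaxonA", "TaxonC": "TaxonB", "TaxonD": "TaxonB"}
snapshot_new = {
    "TaxonB": "TaxonA",
    "TaxonX": "TaxonB",   # newly created
    "TaxonC": "TaxonX",   # parent changed from TaxonB to TaxonX
    "TaxonD": "TaxonB",
}

created, removed, reparented = diff_taxonomy(snapshot_old, snapshot_new)
print("created:", created)
print("removed:", removed)
print("reparented:", reparented)
```

A check like this only reports that a compiled snapshot is out of date; it does not by itself solve the editability question discussed in this thread.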
I examined your examples above and I'm still in a bit of a dilemma as to whether the need for dynamic edits comes from a lack of information currently available on Wikipedia about the subject or from advances/changes in new studies.
For example, the date in this ref in this edit would suggest that the need for dynamic edits comes from the current lack of information.
On the other hand, the summary on this edit makes you think that the need for dynamic edits comes from new advances in scientific discourse.
(Of course, the sample evaluated was only 4 out of ~90k cases.)
But why does this matter, some may say, given that in both cases you still need to edit dynamically? Well, that's because if the first case is true in most of the cases then the problem can be simplified into "we all wait a bit more until the information is there to support the core of the auto-taxonomic system". And then we start with the Lua method anyway given that it will support 80% of the cases. 20% of the cases can be added by experienced Lua editors (if we are to agree that Lua modules are beyond the tech threshold of normal editors) and that percentage is expected to get lower as more time passes. If the second case is true for most of the cases and the bio-taxonomy is inherently dynamic in nature or we live in an era of grand genetics discoveries, then the problem of dynamic edits should really be considered a priority in devising whatever system shall be used for it because the aforementioned supposed percentage of "unsupported cases" won't get lower as time passes, at least relatively speaking. - Klein Muçi (talk) 15:05, 20 October 2021 (UTC)[]
Alas, referencing in the 87,000+ taxonomy templates is rather lacking. en:Module:Sandbox/trappist the monk/taxonomy T4 (one of the 34 lua data modules needed to create the left-hand list) has the data from 1000 taxonomy templates. Of those 1000 templates, 368 have references. References in taxonomy templates are not required (see en:Wikipedia:Automated taxobox system/taxonomy templates#refs). That a reference was added is a good thing.
In the :en:WT:TOL discussions a recurring topic is editability so it isn't just me who is saying that. Right now, the autotaxobox system data is eminently editable. Moving that data here keeps the data editable and also makes the whole data set available to all wikipedias for the remarkably low cost of importing a few modules and templates to do the data fetching and rendering.
Trappist the monk (talk) 16:10, 20 October 2021 (UTC)[]
Yes, I'm aware that that topic is a recurring one. That's why I said that I'm seeing the "editability" argument thrown around a lot. Anyway, if a solution is thought to have been found, of course there are no objections to it. It's just that it was starting to seem a bit strange to me that we were arguing that taxonomy had this inherent attribute of being dynamic by itself, which was starting to "veto" some of the ideas that were being proposed even more than the Lua memory usage issue, which seemed like a technical hard block. I have some past experience with somewhat similar occasions (though still quite different) where "the opposition" was mostly coming from the overall inconvenience that new things/workflows bring in general. But as I said, if we can have an automatic system that also provides dynamic editability it would be even better. And given that you argue that we already have a solution (at least a blueprint of it) that satisfies all factors, then of course further arguing over already solved problems is detrimental. - Klein Muçi (talk) 21:43, 20 October 2021 (UTC)[]
en.wiki's set of taxonomy templates is far from being complete. Currently it is important that it be easy to create new taxonomy templates. Estimates of accepted genera range from 175,000 to 300,000+, and while most of the 87,000 existing templates are for genera, some are for higher taxonomic ranks. en.wiki is very unlikely to have a taxonomy template for a taxon which doesn't yet have an article there. Around 1/3 of en.wiki's taxon articles are not using the automatic taxobox system (I compile statistics on use of automatic taxoboxes in en.wiki: this is the most recent update). There probably aren't very many taxon articles on sq.wiki that don't have a corresponding taxonomy template on en.wiki, but that isn't true for all languages (ceb, war and sv wikis have more taxon articles than en does). Plantdrew (talk) 23:14, 20 October 2021 (UTC)[]

I don't think that I've gotten an answer to my is-this-possible question. Have I? Am I to interpret the lack of a definitive answer, as an answer in the negative? Did I ask this question in the correct place? If not, where is the better place for me to ask it?

Trappist the monk (talk) 13:35, 23 October 2021 (UTC)[]

I think the answer is it's technically possible, but not sensible (and therefore not possible to get consensus to do it). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:50, 23 October 2021 (UTC)[]
@Pigsonthewing: Why is it "not sensible"?
Trappist the monk (talk) 11:24, 24 October 2021 (UTC)[]
For all the reasons given and implied above. You said you don't think that you've gotten an answer. I'm pointing out that you have. (But see also DRY.) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:11, 24 October 2021 (UTC)[]

Coolest Tool Award 2021: Call for nominations

The third edition of the m:Coolest Tool Award is looking for nominations!

Tools play an essential role for the Wikimedia projects, and so do the many volunteer developers who experiment with new ideas and develop and maintain local and global solutions to support the Wikimedia communities. The Coolest Tool Award aims to recognize and celebrate the coolest tools in a variety of categories.

The awarded projects will be announced and showcased in a virtual ceremony in December. Deadline to submit nominations is October 27. More information: m:Coolest Tool Award. Thanks for your recommendations! -- SSethi (WMF) for the 2021 Coolest Tool Academy team 05:57, 19 October 2021 (UTC)[]

tbh, this goes down like a cup of baby sick, in the context of the tools we actually depend on - Petscan and Listeria, for instance - being somewhat broken. I'd very much trade 'cool tools' for WMF support for basic working tools & fit for purpose UI. WMF's concentration on frothy initiatives like this, rather than on assisting in the provision of core tools, seems to me to be an abdication of responsibility by a group of people who do not eat their own dogfood, are unaffected by the shortcomings, and who are frankly semi- or wholly-detached from - even ignorant of - the practical work involved in adding and curating content on WP and WD. --Tagishsimon (talk) 02:14, 21 October 2021 (UTC)[]

Constraints for Podcasts

Why are there constraint flags on, for example, dot com: The Wikipedia Story (Q108929215), saying it should not have author (P50), inception (P571) or full work available at URL (P953) values? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:38, 19 October 2021 (UTC)[]

@Pigsonthewing: there's a couple of cases where the community seems to prefer start time (P580) and end time (P582) instead of inception (P571) & co. I believe author (P50) is thought to be ambiguous in the case of podcasts. There's always the generic creator (P170), but founded by (P112), producer (P162) and presenter (P371) are often a good fit as well.
The constraint against full work available at URL (P953) is debatable. I'd say the intention is to only link podcasts episodes directly to their download links. Ideally, the acast and Amazon link would be external identifiers. If the full work/podcast was made available on a one-off website by the creator, I'd say full work available at URL (P953) would be the best fit. You could also change the acast/amazon links to go directly to the feed and use web feed URL (P1019), or keep them and use described at URL (P973). --Azertus (talk) 14:19, 19 October 2021 (UTC)[]
Thank you. I'm looking for the reasons why the community might prefer such constraints (and indeed, looking to determine whether or not there is consensus). In this specific example, the writer is also the producer, but is not the presenter; "creator" is simply too vague. She is credited as the "writer", for which "author" is a synonym. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:56, 19 October 2021 (UTC)[]

Link Wikidata ids to Wikipedia text

I am trying to build a multilingual dataset from Wikipedia. As part of its features, I would like the entities mentioned in the text to be related to their Wikidata id.

For example, given the page "Wikidata", the intro states: Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation.[2] It is a common source of open data that Wikimedia projects such as Wikipedia,[3][4] and anyone else, can use under the CC0 public domain license. [...]

Here, each link points to a Wikipedia page, which should have its own Wikidata id.

At minimum, for collaboratively edited (page: Wiki) I would like to get Q171 (Wikidata item for Wiki). At best, I would like to get the id, the item name, the position in text.

I need to do this for the whole Wikipedia, in multiple languages. Thus, I am looking for a solution that is very fast (a dump that already contains this info would be great, but I am not sure it exists).

Do you have any advice?

I've worked some on this. I do not think there is a pre-assembled dataset that includes this. You'll have to assemble it yourself. BrokenSegue (talk) 16:39, 19 October 2021 (UTC)[]
Thanks for your answer. Do you have any reference for where to start? It seems to me it should be relatively easy (basically each internal link is a Wikidata item) but I cannot find many references and it is the first time I try to work with Wiki-related APIs and files. 5.170.104.126 17:01, 19 October 2021 (UTC)[]
It sounds similar to KELM-corpus, which is also described here. You can probably get what you want after combining quadruples from TEKGEN and kelm_generated_corpus.jsonl. Lockal (talk) 12:51, 20 October 2021 (UTC)[]
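Editorial note: for small experiments, the title-to-id mapping asked about above can be obtained from the MediaWiki Action API, which can return the `wikibase_item` page property for every page linked from an article (`generator=links` combined with `prop=pageprops`). The following is only a rough sketch in Python. The helper names `build_links_query` and `extract_qids` are hypothetical, the sample response is hand-written in the general shape of an API reply rather than fetched, and the page ids in it are placeholders. It gives neither the position of each link in the text (that would need the wikitext or HTML to be parsed) nor the speed needed for a whole Wikipedia, where working from the database dumps is probably the better route.

```python
from urllib.parse import urlencode


def build_links_query(title, lang="en"):
    """Build an Action API URL listing the pages linked from `title`,
    with the Wikidata id of each linked page (if it has one)."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "links",        # iterate over links on the page
        "titles": title,
        "gplnamespace": 0,           # article namespace only
        "gpllimit": "max",
        "prop": "pageprops",
        "ppprop": "wikibase_item",   # the linked page's Wikidata id
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def extract_qids(response):
    """Map page title -> Wikidata id from a decoded JSON response.
    Pages without a Wikidata item (e.g. red links) are skipped."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page["pageprops"]["wikibase_item"]
        for page in pages.values()
        if "wikibase_item" in page.get("pageprops", {})
    }


# Hand-written sample (not real fetched data; page ids are placeholders).
sample = {
    "query": {
        "pages": {
            "1": {"pageid": 1, "ns": 0, "title": "Wiki",
                  "pageprops": {"wikibase_item": "Q171"}},
            "-1": {"ns": 0, "title": "Some red link", "missing": ""},
        }
    }
}

url = build_links_query("Wikidata")
qids = extract_qids(sample)
print(url)
print(qids)
```

Long articles return their links in several batches, so a real client would also have to follow the API's continuation parameters.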

edit warring

I'm having a little conflict with Kirilloparma (talk • contribs • logs). In my opinion, the number of available languages via {{Label}} is more important than its singular/plural form. - Coagulans (talk) 15:22, 19 October 2021 (UTC)[]

From here. After my explanation, the user proceeded to revert my edits instead of discussing. He believes that this is the standardization of country navigation templates; however, I don't see any consensus on that and, as I explained, there are dozens of templates that use the plural (see Template:Games properties, Template:Authority control properties, Template:Denmark properties, Template:Italy properties, Template:Lithuania properties etc.). I don't see any problem here, as each language should have its own grammatically correct label, so please let other users feel free to improve this template in good faith instead of removing or reverting ([2], [3]) without any good reason. Coagulans for some reason is not allowing me to improve this template even though I'm assuming good faith. He suggests using the automated template {{Label}}, which is not always the best option here. So the question is: why am I not free to improve this template without using the automated template {{Label}}? Is there any consensus that it must necessarily be used? Looking at Template:Authority control properties or Template:Geology properties as examples, I see no such schema as the one suggested by Coagulans. Thoughts? Regards Kirilloparma (talk) 16:41, 19 October 2021 (UTC)[]
Available languages:
for video game (Q7889): {{Label}} - 131
for video games: {{LangSwitch}} - 9
Both forms are grammatically correct. - Coagulans (talk) 11:34, 20 October 2021 (UTC)[]

A little idea came to me

Hello everyone,
I suggest you merge Q100000000 (non-existent) with Wikipedia 20 (Q100235488) (Merge with the oldest Item). CC @Kaganer, 99of9:. Cordially. —Eihel (talk) 14:21, 20 October 2021 (UTC)[]

If it is technically feasible, that would be a nice idea! This way, a very symbolic QID would bear a very symbolic item, celebrating a long-time success! --Hsarrazin (talk) 18:03, 20 October 2021 (UTC)[]
Good idea! If it's technically possible, I encourage its realization: it's not so disturbing and it's nice to have an additional symbolic number (as has been done on some items at the very beginning). — Baidax 💬 06:25, 22 October 2021 (UTC)[]
<head desk> --Tagishsimon (talk) 08:09, 22 October 2021 (UTC)[]

new girl

I am new to Wikipedia. Can anyone tell me what else we do here?  – The preceding unsigned comment was added by Szastrocky.2023 (talk • contribs) at 14:47, 20 October 2021 (UTC).

Responded on user talk Vahurzpu (talk) 18:47, 20 October 2021 (UTC)[]

number of matches played/races/starts (P1350)

Why not number of competitions played? Eurohunter (talk) 18:11, 20 October 2021 (UTC)[]

A competition - such as the FIFA World Cup - involves multiple matches, and so number of competitions played != number of matches played. Also, whilst we're here: you must surely know that if you use a template for a talk page section header, users cannot navigate from watch and history lists to the section. It's very much a suboptimal thing to do. --Tagishsimon (talk) 18:21, 20 October 2021 (UTC)[]

Merge scary - me run away

Well, I'm buried at the moment and can't take the time to learn about the scary merge process. If someone else can take this on, it'd be great. Otherwise it will be a long time before I have enough free time for it.

Creating author entry at en.wikisource. Thought to look for entry here for Horace James. Found two entries:

https://www.wikidata.org/wiki/Q96381241     Horace James
https://www.wikidata.org/wiki/Q94683234     Horace James

They both seem to contain info not present in the other. The wikipedia article w:Horace James (minister) links to the first listed entry. But the first entry doesn't have date of death like the second does!

And if you are feeling capable, while looking for Horace James I found

https://www.wikidata.org/wiki/Q108427284 Scovell, Horace James
https://www.wikidata.org/wiki/Q76201686 Horace James Scovell

which again seems like duplicate entries for the same person. Shenme (talk) 19:16, 20 October 2021 (UTC)[]

@Shenme: I merged the first pair. Thanks for noticing it, it's a very common situation that Wikidata has an item for something (like this person) and then enwiki creates an article about them and a new Wikidata item is created based on that, completely ignoring the fact that Wikidata already has it. There really should be a UI fix to reduce this sort of problem somehow. Anyway... On the second pair, the first is an article about the second, so two different items. ArthurPSmith (talk) 19:26, 20 October 2021 (UTC)[]
Thank you very much. But now he was born twice? Scary! :-) Shenme (talk) 19:34, 20 October 2021 (UTC)[]
Thank you again. Now I can sleep soundly. Shenme (talk) 20:33, 20 October 2021 (UTC)[]

Learn how Movement Strategy Implementation Grants can support your Movement Strategy plans

Participate in the Movement Charter Drafting Committee election!

Your participation is needed. Community elections for the Movement Charter Drafting Committee last until October 24 (23:59 AoE). We have gathered 600 votes. It would be great to increase community participation. Let’s try to double that number! Please vote before October 24.

Learn how Movement Strategy Implementation Grants can support your Movement Strategy plans

The Movement Strategy Implementation Grants give the support you need for your strategy plans. The Movement Strategy and Governance team is here to support your ideas and plans. Learn more.

--*Youngjin (talk) 03:49, 21 October 2021 (UTC)[]

Two similar albums, two Wikidata items?

Q108266229 is the item for w:Sticker (album). In four days another version of the album will be released named Favorite with a few different tracks and, obviously, a different title. Should this be integrated into Q108266229 (how?) or should a new item be created for this new version? — Alexis Jazz (talk or ping me) 12:21, 21 October 2021 (UTC)[]

Most of the claims will be different, it will be the same artist/label, but a different release date, release#. And as one item can have only one title (in a language) you will have no choice: two items. Edoderoo (talk) 13:10, 21 October 2021 (UTC)[]

Interwiki language links

There is an article in Spanish, and also one in Russian; I would like to add it so that it also appears in the language selection bar. How is this done?

It would be helpful if you could provide links to the two articles. If they have the same subject, then they should be linked to a single WD item, which will make the interwiki links work. Once we know which articles you're talking about, we can advise further. --Tagishsimon (talk) 17:02, 21 October 2021 (UTC)[]

Can I add "arbitrary persons" to wikidata (current use case: add scientific publications)

I am a new user, so I will probably ask seemingly naive questions. I am a researcher in engineering and I am a bit disappointed by how little literature in my field is covered by wikidata (cf. https://www.wikidata.org/wiki/Wikidata:ORCIDator). To help improve this situation I thought of using https://www.wikidata.org/wiki/Wikidata:ORCIDator. The first step there is: "Find or create the Wikidata item for a person". Thus I want to ask: 1. Can I just add persons with their name and maybe more information (e.g. the university they are with, date of birth) to wikidata? 2. What about privacy? 3. Is there any relevance threshold? Cark84 (talk) 20:22, 21 October 2021 (UTC)[]

See WD:N. --Tagishsimon (talk) 20:32, 21 October 2021 (UTC)[]
@Cark84: See also Wikidata:Living people. In general published authors do qualify as notable for Wikidata purposes, but some of their personal information should still be considered private. ArthurPSmith (talk) 21:12, 21 October 2021 (UTC)[]
  • Always have enough information to disambiguate them from someone with the same name. For some people with common names we may have a dozen people listed as only an author of a book or only an author of a research article, and no way to know if they are duplicates or distinct people. Link to their profile page at the university, if they have one. --RAN (talk) 01:14, 24 October 2021 (UTC)[]

Classes and instances

I was unable to find an answer in the help pages to what is probably a beginner question, but an important one nonetheless.

If I'm understanding things correctly, the categorization tree on wikidata is constructed entirely by use of subclasses, and the entities used for this shall not be an instance of anything else. All "real things" however should be an instance of one of these categorization entities, but it is probably strictly an error to make them a subclass of something else. Have I understood the organization of data correctly or am I way off? Thank you in advance. --Infrastruktur wdt:P31 wd:Q5 (T | C) 00:58, 22 October 2021 (UTC)[]

It's actually fairly common in WD to have items that are both instances and subclasses - see xylographer (Q1437754) for an example. Most occupations are modeled this way. - PKM (talk) 01:14, 22 October 2021 (UTC)[]
@Infrastruktur: so plenty of "real thing"s are subclass of (P279) of something. Generally physical/tangible things are instance of (P31) of something but this is not always true. Concepts are more likely to be subclass of (P279) but that's not close to a rule see for example biology (Q420). BrokenSegue (talk) 02:02, 22 October 2021 (UTC)[]
Do entities that use subclass of (P279) imply that they are meant to be used for classification, or at least can be used for that? And conversely, can items that don't use subclass of (P279) not be used for classification? As for instance of (P31), I guess it would be an error to have an entity that is an instance of something that is an instance itself? (unless it also has the subclass statement) --Infrastruktur wdt:P31 wd:Q5 (T | C) 02:40, 22 October 2021 (UTC)[]
You should only be instance of (P31) of something that is a subclass. It's a little hard to say if something is used for classification in general. Items can represent concepts and sometimes those concepts have instances. computer model (Q55990535) is used for classification and it's both instance of (P31) and subclass of (P279). Honestly wikidata isn't super self-consistent and you need to learn the style used. For example we never say something is instance of (P31) horror film (Q200092) even though it's a subclass of film (Q11424) (because we have genre (P136)). BrokenSegue (talk) 02:57, 22 October 2021 (UTC)[]
Some instances are only so because of convenience. Protein instances can include a lot of things so are actually sets and would need to be subclasses. My take after two years is that P31 and P279 are practically the same. If I were an AI developer using WD I would not handle these statements separately. --SCIdude (talk) 09:44, 22 October 2021 (UTC)[]
@Infrastruktur: The responses above indicate some confusion among the community here but I believe the underlying ontological logic is sound, if not always applied correctly. instance of (P31) and subclass of (P279) are indeed very different. subclass of (P279) for example is transitive while instance of (P31) is not. instance of (P31) points from the level of individual physical objects (locations, particular people, individual vehicles, etc.) to their classes, but it can also point from the level of classes to metaclasses, metaclasses to second-order metaclasses, etc. subclass of (P279) relationships stay within a single class/metaclass layer. Though we do have some concepts that cross metaclass levels which confuses things a bit. The explanation at Help:BMP is quite good though. See also Wikidata:WikiProject Ontology. ArthurPSmith (talk) 16:58, 22 October 2021 (UTC)[]
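Editorial note: the difference described above (subclass of (P279) is transitive, instance of (P31) is not) can be illustrated with a toy graph. The following is only a sketch in Python, not how Wikidata or its query service is implemented. The item labels and the helper names `superclasses` and `is_instance_of` are made up for illustration. The membership test follows one P31 step and then any number of P279 steps, which is the same idea as the SPARQL property path `wdt:P31/wdt:P279*` often used in queries.

```python
def superclasses(cls, subclass_of):
    """Return cls plus every class reachable via zero or more
    'subclass of' steps (the transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def is_instance_of(item, cls, instance_of, subclass_of):
    """True if item reaches cls via exactly one 'instance of' step
    followed by zero or more 'subclass of' steps."""
    return any(
        cls in superclasses(direct, subclass_of)
        for direct in instance_of.get(item, ())
    )


# Toy data with made-up labels (not real Wikidata items).
subclass_of = {"sports_car": ["car"], "car": ["vehicle"]}
instance_of = {
    "my_car": ["sports_car"],        # individual object -> class
    "sports_car": ["vehicle_type"],  # class -> metaclass
}

# P279 is transitive: my_car is (indirectly) an instance of vehicle.
print(is_instance_of("my_car", "vehicle", instance_of, subclass_of))

# P31 is not transitive: my_car is not an instance of the metaclass.
print(is_instance_of("my_car", "vehicle_type", instance_of, subclass_of))
```

In this toy graph `sports_car` carries both kinds of statement, which matches the point made earlier in the thread that an item can be both an instance and a subclass.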

What is the difference between these items?

These items seem to be duplicates as far as I can see, but I'm not sure.

Is there a way to know whether these are the same or not? --Suisui (talk) 14:25, 22 October 2021 (UTC)[]

@Suisui: To me this looks like double or multiple creations. Something must have gone wrong with MonicaMu's batch. At first glance it seems to me that you can merge the respective data objects. --Gymnicus (talk) 14:55, 22 October 2021 (UTC)[]
There's a known quickstatements issue, which is that it will create 2 or more items per CREATE statement, for a subset of CREATE statements, in batch-mode. Doesn't occur when using client-side quickstatements. Presumably a race condition. --Tagishsimon (talk) 15:04, 22 October 2021 (UTC)[]
Thanks. hmmm known issue.. Should I redirect one to another, or just leave it as it is? --Suisui (talk) 16:39, 22 October 2021 (UTC)[]
Merge away, Suisui. Cleanup in aisle 9. --Tagishsimon (talk) 17:48, 22 October 2021 (UTC)[]

Canadian Women Artists History Initiative ID (P8631)

Can someone change Canadian Women Artists History Initiative ID (P8631) to allow use in a reference (as reference (Q54828450)) (as, for instance, is allowed by Dictionary of Canadian Biography ID (P2753))? It's a biographical dictionary that seems reliable and so I think it should be permissible as a reference. AleatoryPonderings (talk) 14:45, 22 October 2021 (UTC)[]

Done. --Tagishsimon (talk) 15:06, 22 October 2021 (UTC)[]

Talk to the Community Tech

Read this message in another language

Hello!

We, the team working on the Community Wishlist Survey, would like to invite you to an online meeting with us. It will begin on 27 October (Wednesday) at 14:30 UTC on Zoom, and will last an hour. Click here to join.

Agenda

  • Become a Community Wishlist Survey Ambassador. Help us spread the word about the CWS in your community.
  • Update on the disambiguation and the real-time preview wishes
  • Questions and answers

Format

The meeting will not be recorded or streamed. Notes without attribution will be taken and published on Meta-Wiki. The presentation (all points in the agenda except for the questions and answers) will be given in English.

We can answer questions asked in English, French, Polish, Spanish, German, and Italian. If you would like to ask questions in advance, add them on the Community Wishlist Survey talk page or send to sgrabarczuk@wikimedia.org.

Natalia Rodriguez (the Community Tech manager) will be hosting this meeting.

Invitation link

We hope to see you! SGrabarczuk (WMF) (talk) 23:00, 22 October 2021 (UTC)[]

Sandbox string

Should Sandbox-String (P370) be used on non-sandbox items? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:03, 23 October 2021 (UTC)[]

"harmful" descriptions?

Recently (I can't recall the exact item), I tried to add the description "American businessman and foundation administrator" to a human item (try it on any item yourself). However, I repeatedly get the message "This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please inform an administrator of what you were trying to do. A brief description of the abuse rule which your action matched is: LTA AI". I have never encountered such warnings before. Is there a log somewhere of material deemed "harmful" and "disallowed"? And I fail to see how this particular phrase is "harmful" or abuse: it was the same description used in a reference source (I believe Prabook (Q25328680), which draws heavily from published "Who's Who..." biographical dictionaries). What gives? -Animalparty (talk) 20:44, 24 October 2021 (UTC)[]