The Wikidata Barnstar
Thank you for your quick and effective help with the Česká divadelní encyklopedie template!
Vojtěch Dostál
This data source seems to be riddled with errors (I've caught 5 so far today). I've been deprecating the erroneous claims as I find them, but do you have any other thoughts on what could be done about these? Do we need to import every error on that website, or can you do some additional checking (e.g. if the site says a person died before they were born and/or were born in the far future), and maybe reach out to them to fix the issues? Thanks,
Hi, if it's just five or so out of 5,000, that's not so bad. However, I'll do a check for these 'obvious' mistakes. I usually do that with OpenRefine, but I had to use QuickStatements because OpenRefine has an unpleasant bug.
Most of the wrong dates I found were already found by you, good job. I fixed just one or two more.
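The kind of pre-import check for 'obvious' mistakes discussed above could look roughly like the sketch below. It is only an illustration: the function name, the 130-year plausibility limit and the wording of the messages are assumptions, not the check that was actually run.

```python
from datetime import date
from typing import Optional


def find_date_problems(born: Optional[date], died: Optional[date],
                       today: Optional[date] = None,
                       max_age_years: int = 130) -> list[str]:
    """Return a list of obvious problems; an empty list means none were found."""
    today = today or date.today()
    problems = []
    if born and born > today:
        problems.append("born in the future")
    if died and died > today:
        problems.append("died in the future")
    if born and died:
        if died < born:
            problems.append("died before birth")
        elif died.year - born.year > max_age_years:
            # Assumed threshold; not an error by itself, only worth a manual look.
            problems.append("implausibly long life")
    return problems
```

Rows that return a non-empty list could be held back from the import and reported to the source website instead of being added and then deprecated.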
Hi! I'm having a look at cases of descritto nella fonte (P1343) with qualifier URL (P2699) and I found these items (https://w.wiki/BBwY) with descritto nella fonte (P1343) Q124625144. I think these could be converted into an external identifier: do you see any possible drawback? Thanks as always!
Hi! I see no drawback for now, but these websites tend to be terribly short-lived and I can't predict how long it will last.
I see. At least with an external ID we can very easily change the formatter URL to a possible new domain or, in the unfortunate case of a shutdown, to an archival URL. If you want, I can make the property proposal.
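To illustrate why a formatter URL makes such a move easy: the identifier values stay the same and only the URL pattern changes. In the sketch below, "$1" is the placeholder that Wikidata formatter URLs use for the identifier; both URL patterns and the identifier are made-up examples, not the real website.

```python
def build_url(formatter_url: str, identifier: str) -> str:
    """Expand a formatter URL, where "$1" stands for the identifier value."""
    return formatter_url.replace("$1", identifier)


# Hypothetical patterns, for illustration only.
ORIGINAL = "https://example.org/person/$1"
ARCHIVED = "https://web.archive.org/web/https://example.org/person/$1"

print(build_url(ORIGINAL, "12345"))
print(build_url(ARCHIVED, "12345"))
```

With P1343 plus a URL qualifier, every statement would have to be edited instead.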
I have no problem with that, thanks.
You added a wrong date of birth, causing a constraint violation and a Lua error.
@Bean49 Thanks for reporting it. Have you ever seen something like that?
I just reported it at https://github.com/OpenRefine/OpenRefine/issues/6798
Thanks. Not until now.
I fixed all the occurrences manually.
Why do you duplicate the value? Without a preferred rank, it breaks many uses on Wikipedias, and it violates a constraint here on Wikidata. I reported it before.
Please can you provide an example where the same value was duplicated?
I provided one. See my first message. First you duplicated it with the tool, then manually.
Twice with the tool, removed one manually.
Maybe you are mixing up two different 'errors', one of which is described at https://github.com/OpenRefine/OpenRefine/issues/6798 and the other (presence of two seemingly identical dates, but with different calendars) is not an error at all. Or am I not seeing something?
Please set preferred rank for one of them, as required by the constraint and many use cases. Thank you.
That's a nice theory, but how do you propose to do that automatically for thousands of cases where the birth dates differ by a few days? There's no way to determine which of them is correct. Same with calendar dates, we can't really be sure which piece of data is correct.
The problem is artificially created by you, not by nature. Now we have two identical dates. Great! Perfect! Thank you for your contributions. Please fix at least this item. Thank you.
Please advise, was he born on 1 December in the Gregorian calendar or in the Julian calendar? Happy to fix it then.
Why is only one displayed in cs:Vilém Slavata z Chlumu a Košumberka? As a result of which parameter? That is what the rank is for.
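To show why the calendar question matters, here is a small sketch that converts a Julian-calendar date to the Gregorian calendar through the Julian Day Number. The function name is an assumption, and the sketch only covers dates that Python's date type can represent.

```python
from datetime import date


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the proleptic Gregorian calendar.

    Raises ValueError if the result falls outside Python's supported date range.
    """
    # Julian Day Number of the Julian-calendar date (standard integer formula).
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # Ordinal 1 in Python is 0001-01-01 (Gregorian), which is JDN 1721426.
    return date.fromordinal(jdn - 1721425)


# The day the Gregorian calendar was introduced: 5 October 1582 (Julian).
print(julian_to_gregorian(1582, 10, 5))
```

The same written day and month therefore denote two different days depending on the calendar model, which is why two statements that look identical are not necessarily duplicates.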
I think Czech Wikipedia uses limit=1 to display only one value. But the implementation is different in each wiki...
I kindly ask you, pick one and set preferred rank, doesn't matter which one, because you duplicated it arbitrarily. As an alternative, you can delete one of them. Or do you intend to duplicate all dates where the calendar isn't specified in the source?
What is kind about that? You are pressing me to disrupt Wikidata, seemingly to serve the purposes of some arbitrary Wikipedia version which does not use Wikidata properly.
You get me wrong. You created an artificial problem. I kindly ask you to clear it. Otherwise I will have to ask the community. I don't understand why you think that what you did is right.
Yes, please go and ask the community if it is really OK to arbitrarily choose one value as preferred. Let's find a systematic solution, not an ad hoc decision on one item.
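For readers following the rank argument, here is a rough sketch of how "best rank" selection works: preferred statements win if there are any, otherwise all normal-rank statements are returned, and deprecated ones never are. The dict layout and the sample values are simplified assumptions, not the real Wikibase JSON, and limit_one only mimics the limit=1 behaviour guessed at above.

```python
def best_values(statements: list[dict]) -> list[str]:
    """Return the values at the best available rank: preferred if any, else normal."""
    preferred = [s["value"] for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s["value"] for s in statements if s["rank"] == "normal"]


def limit_one(statements: list[dict]) -> list[str]:
    """Mimic a client that displays only the first best-ranked value."""
    return best_values(statements)[:1]


# Hypothetical statements: the same day and month in two calendar models.
birth = [
    {"value": "1 December (Julian)", "rank": "normal"},
    {"value": "1 December (Gregorian)", "rank": "normal"},
]
print(best_values(birth))  # both values, since neither is preferred
print(limit_one(birth))    # whichever happens to come first
```

So without a preferred rank, a client either shows both values or picks one by statement order, which is the behaviour being disputed in this thread.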
Hi Vojtěch,
all links to aleph.nkp.cz have been offline for a few days. Do you know more about this problem?
@Bernice Heiderman Because of the national holidays in Czechia (Q15054220, Q2379466 and Sunday), it is probably planned maintenance.
@Bernice Heiderman, JAn Dudík: I would hope that planned maintenance would be properly announced. But it is possible that the inability to react immediately to the outage was related to the holidays. In some institutions, I suspect that some overzealous worker shuts down too many computers when he leaves his workplace.
Hi Vojta,
just for fun, I ran your saved query https://w.wiki/3MsN for oddly doubled dates of birth. From what I have looked at, it seems that some of them arose from two REGO records being glued together. I have already resolved some of them by creating a new item (e.g. Jan Drábek) and I will continue with that. It would be better, though, to come up with some way of preventing the improper linking.
Hi Helena, I have noted your message here but have not had time to reply yet, sorry.
The gluing together of two items is a rare bug in OpenRefine. I have been following it for a long time and have not managed to understand its exact causes.
Hopefully there should not be many of them. Or have you found more such merged REGO records? I will try to write a query for it, but it is complicated: some REGO records are duplicates and therefore really do belong in one item.
So I examined some of them and split them. But for the new items I do not create everything, only the most important things. Hopefully you or someone else occasionally runs the filling-in of given names and family names from labels where they are still missing...
Hi, now and then I create items for weirs, and only now have I noticed that you imported them from the database Centrální evidence vodních toků. However, your import did not fill in the administrative unit, so the items you imported do not show up on some maps and duplicates are created needlessly (I use the Wikidata Query Service with a filter by district, so I did not notice your import until today). Also, the weir names in your import sometimes do not match the current state of the watercourse register (the item Q123547704 jez Choceň II that you created is listed in the source database, according to the map, as Choceň I: see https://voda.gov.cz/?page=jezy-mapa, and conversely Choceň I as Choceň II). Could these two errors (administrative units and names) be fixed by machine somehow?
Regarding the weirs in Choceň: in Mapy.cz it is the other way round, and the content of Category:Jez_Choceň_II matches that too. I probably went by that. Which is correct? What matters is that the objects are matched to each other (based on the river kilometre); the name is only secondary...
I will add the administrative units by bot where possible, based on coordinates; they are not in the source database.
Thanks for adding the units. As for the names, I noticed discrepancies at Brandýs nad Orlicí too, and there is no commonscat there. The only thing I can think of is that the database may have changed since the import.
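Assigning an administrative unit from coordinates, as mentioned above, comes down to a point-in-polygon test. The sketch below uses plain ray casting on simple polygons. The unit names and boundary coordinates are made up, and this is not necessarily how the bot did it; a real run would use actual boundary data and probably a GIS library.

```python
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[i - 1]  # previous vertex; wraps around for i == 0
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_unit(point: Point, units: dict[str, list[Point]]) -> Optional[str]:
    """Return the name of the first unit whose boundary contains the point, if any."""
    for name, polygon in units.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical square "districts", for illustration only.
UNITS = {
    "unit A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "unit B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
print(find_unit((1.0, 1.0), UNITS))
```

Items for which find_unit returns None (for example, coordinates that fall outside every known boundary) would need a manual check.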
Often, the names in ÚSOP are not official names but just the taxon names. For example https://www.wikidata.org/w/index.php?title=Q11775258&oldid=1901567939 contains "Lípa malolistá" as the official name, which is only the taxon. "Lípa malolistá" also shouldn't be an alias there.
There are ~100 items with "Lípa malolistá" as the official name, ~50 more with "Lípa velkolistá" and others with different taxon names.
Yes, unfortunately the taxon name is sometimes (often) the official name. There is not much we can do.
Sometimes there's a better (descriptive) name in the official database and we try to use it, but if I remember correctly it's not easy to find. I cannot find a different name for the example you have given in the official databases.
Hi,
I see that you imported a lot of trees from the Czech Republic. That's great work, thanks a lot! We might take inspiration from it for importing trees from France.
That said, I have a couple of questions: why use arbre remarquable (Q811534) in nature de l’élément (P31) and not arbre (Q10884) in nature de l’élément (P31) plus distinction reçue (P166) or statut patrimonial (P1435) (like we do for protected buildings)? Some data are a bit strange and contradictory: Q26779918 is not a tree but a group of trees, so why not just leave bosquet (Q1510380)? Also, the quantities in comprend des éléments de type (P2670) are strange: are there 2, 3 or 8 trees? (Looking at the source, it seems it's 3 trees with only 2 of them protected, but it's unclear and my Czech is not good.)
I'd love to hear what you think.
Cheers,
The DRÚSOP register has two columns: "poč vyhl." (počet vyhlášený, number of originally declared) and "poč. souč." (počet současný, current count). The importer did not use any qualifiers to distinguish the two numbers.
Hi, it's been some time since the import happened and it's true that I would change some modelling nowadays.
At first glance, having P31 : strom (Q10884) or skupina stromů (Q1510380) sounds like a good idea but I am not sure where to put památný strom (Q811534). Protected tree is something like a nature reserve (a category of protected area) in the Czech Republic and we tend to use P31 for these protected area designations (see Prachovské skály (Q452242) for example). Therefore, we understand památný strom (Q811534) as a type of protection designation similar to national monument or national reserve, no matter how many trees are included. The strange data in zahrnuje (P2670) are a mistake by LinkedPipes ETL Bot and I'll try to look into it when time allows...
Thanks for the quick answer.
As I said, I would put arbre remarquable (Q811534) (or a more specific subclass?) in distinction reçue (P166) or statut patrimonial (P1435). For me, a label or a protection is the same thing, whether it's a Nobel prize, a protected building or a protected tree.
There is no hurry, we can take time to think about it. For more points of view, I'm also pinging Nikola Tulechki who worked on trees in Bulgaria, Nemo bis for Italy, Lodewicus de Honsvels in Germany and Pere prlpz in Catalonia.
Hello.
In my view, arbre singular (Q811534) means any notable tree, that is any tree that is covered individually by reputable sources. Some of them are included in official natural heritage catalogues or have some kind of legal protection, like arbre d'interès local (Q115867635). However, arbre singular (Q811534) is a value of instància de (P31), but arbre d'interès local (Q115867635) is a value of estatus patrimonial (P1435), as we do for buildings.
It would be possible to use arbre (Q10884) as instància de (P31) instead of using arbre singular (Q811534) and it would be fairly reasonable, but I see a couple of problems with that:
- It's different from what we do with animals. For example, we have Orca Ulisses (Q7879048) instància de (P31) animal individual (Q26401003), not animal (Q729). In fact, "individual tree" is an English alias for arbre singular (Q811534).
- Notable trees should have arbre singular (Q811534) somewhere, but if they aren't officially protected (for example, trees covered in a book of notable trees but without legal protection) we can't use arbre singular (Q811534) with estatus patrimonial (P1435) or premi rebut (P166), because it's neither a status nor an award; it just means that some reputable source decided that the tree is worth mentioning or describing. Then, if arbre singular (Q811534) is neither instància de (P31), nor estatus patrimonial (P1435), nor premi rebut (P166), what is it?
Of course, there is an inconsistency in Wikidata between how we treat trees, buildings and people, especially in instància de (P31). For buildings we take a quite concrete instància de (P31) (like church or cathedral), for people we stick to human and all individual characteristics go to other properties, and for living beings we take the middle ground of animal individual (Q26401003) and arbre singular (Q811534). I suppose we could take a different and unified approach and try to reduce the number of values of instància de (P31) (or expand them) across Wikidata, but that would go far beyond trees.
Where I'm usually doubtful is about what to do with small sets of trees, but also small sets of anything else (two buildings, two people, two hills...). To make things more complex, as far as I know, such sets of a few trees are usually protected in Barcelona as arbre d'interès local (Q115867635) and not as the equivalent protection for groves ("arbreda d'interès local", still not present in Wikidata). Therefore, I tend to use for them the same properties as for a single tree, which doesn't feel like a very satisfactory solution - although I think I've encountered only a few of such cases.
I've not looked into the import and I don't have a specific opinion to add. Where there is some doubt, I prefer a statement to be repeated in multiple properties: if Q811534 is stated both in P31 and P1435, then it will be easier for people to find what they need with an individual query even if they're not aware of the more specific classes or properties. What matters is only that it's possible for those who care to narrow down the results to more specific definitions (e.g. designations which use a specific official source as reference).
(Unrelatedly, P1435 has a horrible label in French and Italian, as "patrimonio" sounds like everything needs to be treated for its property/capital/money value. I despise it.)
A few informative queries about instància de (P31):
There are some thousands of arbre singular (Q811534) https://w.wiki/7gET but only a couple of arbre (Q10884) https://w.wiki/7gEW
By looking at the map of all items with coordinates and individu del tàxon (P10241) https://w.wiki/7gEb I would say that:
- Somebody in Portugal, Estonia or some Austrian land may be interested in this discussion. I can't check now and notify.
- There are a lot of legal statuses used as instància de (P31). That's different from what we do with buildings, AFAIK, where we put the status in estatus patrimonial (P1435).
In line with what @Pere prlpz had said, we definitely can add památkový status (P1435) : památný strom v Česku (Q21296252) and keep the current instances as they are.
There is some inconsistency in labels (and aliases) of "Q811534". Some of them mean a specific type of protection (regardless of the number), some of them a general significance of any type, and in some languages ("es" and surroundings) just any "single tree".
My current reasoning is as follows:
- the current values of P31 are a bit of a mess, with a lot of different values (I missed the Portuguese trees because of that) and sometimes several values on the same item, which doesn't make them easy to find
- arbre remarquable (Q811534) is very general and fuzzy; for nature de l’élément (P31) we need an item truly meaning "individual tree" (and only that, regardless of protection, heritage, status, etc., like we do on most items); it could be arbre (Q10884) or a new item
- most items about "remarkable" trees have nothing to indicate why they are "remarkable"; regardless of the previous point, we should add distinction reçue (P166) or statut patrimonial (P1435), and the value needs to be something more specific than arbre remarquable (Q811534) (like Tree of Public Interest (Q52062847) or monument naturel en Allemagne (Q21573182); we already have a lot of these specific items)
What do you think?
Cheers,
The wordings of the labels of arbre singular (Q811534) are quite different, but the ones I can understand convey a similar meaning: "tree of interest", "tree of heritage value", "tree of cultural or natural significance", "notable tree"... Am I missing labels that mean a specific type of protection or that imply legal protection?
Aliases are more varied and sometimes have disparate meanings for the same language (for example, for Romanian I'd say they range from individual tree to protected tree). I take this just as a consequence of not having items for protected tree or monument tree and using a single item for the instances of all individual trees.
About the inconsistency between meanings "individual tree" and "notable tree":
- For now, I would say that they are quite equivalent in Wikidata. If a tree has an item, it follows the rules in Wikidata:Notability and this means that it has been described in a reliable source. Therefore, all individual trees present in Wikidata are notable trees, just as all animal individual (Q26401003) are notable animals (Talk:Q26401003#Label is an interesting short debate about the same question for animals).
- The notability threshold in Wikidata is pretty low. After seeing that somebody uploaded to Wikidata all the streets of Brussels or Toulouse, all the hotels in Barcelona or all the houses in some neighbourhoods of Prague, among other sets of non-famous things, I wonder if somebody else will eventually create items for all the individual trees in the streets of Paris or Sydney. If that happens we could need different items for "notable tree" and "individual tree", although at the moment I can't see that coming.
My previous answer was written at the same time as Vigneron's. This is an addition after reading his one.
I don't oppose creating different items for "famous/notable tree" and "individual tree", although I find it difficult to tell one from the other. The only criterion I can think of is that "notable trees" have a proper name or legal protection as an individual tree or small group, and I'm not sure if this criterion is consistent even in my city.
For me, if you get rid of the notion of remarkable tree, which means everything and nothing, the difference seems easy and obvious: all trees are individual trees, and only the few with a specific protection or award are protected/awarded trees. Hence, we use P31 = tree (and just that) for all of them, and for the protected or awarded ones we add P166 or P1435.
You have a point that remarkable tree means everything and nothing.
My biggest doubt about using P31 = tree for all trees is what happens if at some point Wikidata is flooded with trees from an exhaustive register of trees somewhere, because we would need some way to tell the notable ones (the ones covered individually by some reliable source) apart from all trees. That situation seems unlikely for trees in the short term, but something similar happened in France with instal·lació esportiva (Q1076486), and since even the smallest private sports center has P31 of sports venue, it would be very hard to make a list of notable sports venues in France (libraries in Spain are in a similar situation).
Using legal protection and awards may be useful, but there are notable trees (covered by reputable sources) that don't have legal protection. For example, https://patrimonicultural.diba.cat/element/roure-sam or the trees marked (with a proper name) in Mapa Topogràfic de Catalunya (Q63431924), both of which are official reputable sources but aren't legal protections nor heritage classifications.
Maybe I'm overthinking this and I'm preparing for a too unlikely risk.
I hear your concerns (and yes, sport venues/facilities are a mess in France, with a lot of duplicates) and you're right, it may happen with trees *but* there is still WD:N to solve that, and I don't think that "instance of tree" instead of "instance of remarkable tree" will really impact this.
I've just learned that we have important tree (Q10065268) and arbre singular (Q811534). I can't see any difference between them and I think we should merge them unless we want to use one for "notable tree" and the other one for "individual tree".
Both items have sitelinks to Czech Wikipedia so we can use those articles as hints. památný strom (Q811534) is for trees protected by state, while významný strom (Q10065268) is for just about any remarkable tree. This distinction was introduced to Czech Wikipedia by @Xth-Floor and he might be interested in this discussion. I am afraid that the other sitelinks in those two items do not correspond to 'our' definition and it may need some reshuffling, but let's see.
The sitelinks of arbre singular (Q811534) seem to be mixing both meanings, sometimes in the same article. You have a point that we could use an item for "tree" and another for "protected tree", although that's quite different from what we do for buildings.
Just for completeness, we have 555 instances of important tree (Q10065268).
Thank you for pinging me. I will certainly follow this discussion and I can try to apply the consensus to the Árvore de Interesse Público (Q52062847) instances I imported, but I do not have strong feelings about what the "proper" way is. I would certainly like them to be in better alignment with other protected trees in Wikidata, to make them more findable, so any tips in this regard are welcome.
I am currently working on importing all the trees in my hometown into OpenStreetMap. That makes some sense, because it allows my local community to detect fallen, sick or missing trees. But just a few trees are notable, i.e., have a name. In my opinion, only those having a name or being notable in some sense should have the right to be in WD. A similar analysis for streets shows a key difference: streets have a name, and importing them into WD may allow analyses of names, lengths, etc. to be carried out, even though you might do that as well from OSM data if OSM streets were well labeled with proper keys.
Hello @VIGNERON @Pere prlpz @Nemo bis Can we try to wrap up this discussion and identify the key action points, before this discussion is archived?
I agree, but I'm not sure what conclusion can be drawn (and since I started the discussion, it's maybe better if someone else closes it).
I've not re-read everything but I can't identify any action points here except that it would be nice to document how some of these properties and classes have been used so far. Is there an appropriate project page?
I can try to make a summary, but I'm afraid it will be a summary about how we disagree, because we didn't agree on much despite the very interesting talk.
- Trees arbre (Q10884) and singular trees arbre singular (Q811534) in instància de (P31):
  - It has been argued that the distinction is quite meaningless and it would be better to use arbre (Q10884) for any individual tree.
  - However, the current practice is the opposite, since arbre singular (Q811534) is overwhelmingly more common (about 14k to 14).
  - There is even a third item important tree (Q10065268) with a single sitelink but 555 instances. Meanings of important tree (Q10065268) and arbre singular (Q811534) are very mixed according to their sitelinks, as far as I can understand them. If we wanted to use those items we should define clearly the meaning of each one.
  - Merging the three items (or two of them) is not possible, since they have different articles in some wikipedias.
- About protected status of trees:
  - There seems to be some consensus on putting the protection status (e.g. arbre d'interès local (Q115867635) or arbre d'interès públic (Q52062847)) in heritage designation estatus patrimonial (P1435), as we do for buildings.
  - Some participants favour using the protection status in instància de (P31) as we do for national parks and other protected areas.
- The usual practices in Wikidata for instància de (P31) when dealing with trees, animals, buildings, protected areas and people are rather different, even when within some of those areas they are reasonably consistent (e.g. people). That problem lies beyond the scope of this discussion.
In light of our disagreements, any global action we could take or any global recommendation will either leave a lot of redundancy or go against the opinions and practices of some participants, and therefore I can't see a good conclusion that more or less pleases everyone:
- We could add arbre (Q10884) to all trees, but since we don't have consensus to remove arbre singular (Q811534) or important tree (Q10065268) the result would be very redundant. I think I wouldn't dare to do that.
- We could add the protection status to estatus patrimonial (P1435) even if we don't have consensus to remove it from instància de (P31). That part is probably more reasonable.
Thanks @Pere prlpz. Maybe we could bring this up to the Wikidata:Project chat to get more advice?
Thanks @Pere prlpz! After reading the conversation again, I think I will merge významný strom (Q10065268) to památný strom (Q811534) and make it clear in Czech label and description that památný strom (Q811534) is not *only* about trees protected by law, as they now suggest. The official item for Czech law-protected trees will then be památný strom v Česku (Q21296252). We can keep památný strom (Q811534) in the instances while památný strom v Česku (Q21296252) should go to památkový status (P1435) if we agree to use this property.
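As a sketch of what that plan would mean for a single item, assuming the use of P1435 is agreed: the existing P31 value stays and the Czech designation is added under P1435. The dict below is a simplified stand-in for an item's claims, not the real Wikibase JSON, and the function name is hypothetical.

```python
P31, P1435 = "P31", "P1435"
NOTABLE_TREE = "Q811534"            # památný strom
CZECH_PROTECTED_TREE = "Q21296252"  # památný strom v Česku


def add_designation(claims: dict[str, list[str]], designation: str) -> dict[str, list[str]]:
    """Return a copy of the claims with the designation under P1435; P31 is left untouched."""
    updated = {prop: list(values) for prop, values in claims.items()}
    values = updated.setdefault(P1435, [])
    if designation not in values:  # avoid adding the same value twice
        values.append(designation)
    return updated


tree_item = {P31: [NOTABLE_TREE]}
print(add_designation(tree_item, CZECH_PROTECTED_TREE))
```

Because nothing is removed from P31, existing queries on Q811534 keep working while the more specific designation becomes queryable through P1435.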
@Adam Hauner @Xth-Floor FYI
@Vojtěch Dostál, thank you for letting me know about this. I'm not sure if památkový status (P1435) is appropriate: protection of "památný strom v Česku (Q21296252)" is primarily protection of a part of nature/the natural environment; only some of these protected trees are also protected for cultural heritage or historical significance. Could you find a better suited property from the area of nature protection?
Just for the record: IMHO estatus patrimonial (P1435) should not be used anymore for anything related to protected nature. Here you can find a discussion against this practice and the proposed alternative to model protected areas.
Olea I'm still unsure about the use of statut patrimonial (P1435)... I think that first, we should really start a broader discussion here on Wikidata to get more points of view (unrelatedly, we can talk about it IRL this weekend) and then indeed maybe propose to create a new property for "natural designation".
@VIGNERON it will be great to meet you in person :-)
And I think the idea of not using estatus patrimonial (P1435) for protected areas doesn't translate well to not using it for trees. An individual tree is not a protected area (nor an area). It's an individual item like a building or a sculpture. Interestingly, there are values of estatus patrimonial (P1435) like art públic de Barcelona (Q15945449) that apply to sculptures and to some trees.
Additionally, I'm not sure about what you propose. In your link you propose a new property "protection status of a natural area", but as far as I know it has not been adopted, and therefore it couldn't be used even if it were suitable for trees.
Nowadays, the alternatives to state the status of a tree are using instància de (P31) and using estatus patrimonial (P1435), unless I'm missing some alternative. In other places you have argued for a flatter ontology, which has some merit, and using estatus patrimonial (P1435) provides a flatter ontology and less granularity in instància de (P31).
@Pere prlpz With the proposed data model, a tree (say QXXXXX) would have P31 tree (or maybe a subclass). If the tree is protected, there should be a related designation (say QYYYYY). Then one only needs to state QXXXXX localizado en el área protegida (P3018) QYYYYY. Check page 14.
> using estatus patrimonial (P1435) provides a flatter ontology and less granularity in instancia de (P31).
The proposal flattens the ontology thanks to a new property and a consistent data model. Check page 17. This would also fix practical data reuse problems like telling UNESCO’s World Heritage cultural sites from natural ones.
I'm not saying this is THE proposal, but it has a lot of previous thought.
This change by your bot was erroneous because of the ambiguity of the Czech word "překlad": szemöldökfa (Q1370517) and fordítás (Q7553). I corrected one instance but there seem to be 500+ more occurrences. I created a bot request to fix all the instances. Your assistance would be welcome.
Thanks for finding this mistake, I will fix all occurrences ASAP.
BTW it stems from my earlier mistake here https://www.wikidata.org/w/index.php?title=Q1370517&diff=1647540140&oldid=1623073388
Hello, Mr Vojtěch Dostál, you made a mistake: I speak Romanian, not Romani, which is a different language. Could you please rectify it? Many thanks and have a nice day! Radu Alexandru Negrescu-Suţu
Thank you, I deprecated the information in Wikidata (marked it as false) and informed the people who run the source database.
The mistake is already fixed in the source database.