
Wikidata:Project chat/Archive/2017/10

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

M7 highway

Hello. Can you merge Q39072913 and Q1739369? --Drabdullayev17 (talk) 07:42, 5 October 2017 (UTC)

They are indeed the same thing. Done, thanks for spotting! Syced (talk) 08:18, 5 October 2017 (UTC)
This section was archived on a request by: Matěj Suchánek (talk) 16:27, 5 October 2017 (UTC)

How to express the number of articles or records?

… in a reference work (Q13136)/database (Q8513)/data set (Q1172284). collection or exhibition size (P1436)? (The current type constraint with collection (Q2668072) would allow its use for database (Q8513)/data set (Q1172284), but Property talk:P1436 says the domain is: museum (Q33506), library (Q7075), archives (Q166118), exhibition (Q464980).)
Thank you, --Marsupium (talk) 16:45, 5 October 2017 (UTC)

I forgot about number of parts of this work of art (P2635), sorry! --Marsupium (talk) 16:59, 5 October 2017 (UTC)
This section was archived on a request by: Marsupium (talk) 16:59, 5 October 2017 (UTC)

Property proposal: UK National Fruit Collection ID

Please review Wikidata:Property proposal/UK National Fruit Collection ID, which has been open for over seven days with no comments. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:02, 3 October 2017 (UTC)

Now created. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:20, 6 October 2017 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:20, 6 October 2017 (UTC)

Geographic coordinates: precision and import tools

I have a bunch of geographic coordinates to import using the coordinate location (P625) property. The input data is given in sexagesimal notation, i.e. degrees, arcminutes and arcseconds, with a precision of one arcsecond.

  • I suppose the decimal precision is then 1/3600 degree = 0.00027777778 degree, right? How many significant digits are recommended?
  • QuickStatements tells me that it expects “Location in the form of @LAT/LON, with LAT and LON as decimal numbers.” Apparently I cannot provide a precision. If this is correct, the tool is not suitable for my task. Does anybody know whether it meanwhile supports importing coordinates in sexagesimal notation?
  • Which other tools do we have?

Thanks, MisterSynergy (talk) 07:36, 29 September 2017 (UTC)
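For reference, the sexagesimal-to-decimal conversion discussed above can be sketched like this (a minimal sketch; the helper name is mine, and the sample input is the 38° 53′ 23″ N, 77° 00′ 32″ W example from en:Decimal degrees):

```python
def dms_to_decimal(degrees, minutes, seconds, direction="N"):
    """Convert degrees/arcminutes/arcseconds to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention
    return -value if direction in ("S", "W") else value

# Precision of one arcsecond, expressed in decimal degrees:
ARCSECOND = 1 / 3600  # ~0.000277777...

lat = dms_to_decimal(38, 53, 23, "N")   # ~38.889722
lon = dms_to_decimal(77, 0, 32, "W")    # ~-77.008889
```

The repeating decimals mentioned below arise because most arcsecond values are not exactly representable in a finite number of decimal digits.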

Anything more than six decimal places is superfluous; see the table on en:Decimal degrees. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:12, 29 September 2017 (UTC)
Thanks, that is a comprehensive overview of practical aspects. However, I am also interested in limitations of our software. I have to re-calculate sexagesimals to decimals anyway, and that inevitably leads to repeating decimals starting at five or six decimal positions for most input values. To be reasonably close to the sexagesimal value, I need to consider some of those repeating decimals, and I have meanwhile found that the Wikibase software displays values (latitudes, longitudes, and precisions) with a total of 14 significant decimal digits when sexagesimal input was provided (example in testwikidata; input was 38° 53′ 23″ N, 77° 00′ 32″ W as in en:Decimal degrees). This cutoff after 14 digits seems a little large to me given the practical aspects raised by Andy, but I tend to do so as well in order to avoid unexpected rounding errors. Still, I am happy to hear more input about this… —MisterSynergy (talk) 12:53, 29 September 2017 (UTC)
This was discussed in the past month or so, with no consensus. Please search the page history to find the discussion. After that discussion I searched the web for a reliable document suitable for the level of sophistication appropriate for Wikidata. There is a web page from ISO which is far too sophisticated for this environment. I think Good Laboratory Practice from the National Institute of Standards and Technology is more appropriate for this environment. It advocates rounding the uncertainty to two significant figures, so if d-m-s are converted to decimal degrees, and the uncertainty is 1 arcsecond, the decimal uncertainty would be 0.00028°.
Reconstructing the character string in the original source should not be a goal, for two reasons. First, it is not standard practice to attempt this in science and technology. Second, the original coordinates and uncertainty could have been expressed in many ways, such as decimal degrees, degrees and decimal minutes, meters, feet, or miles. There is no hope of reconstructing the character string in the original source. Jc3s5h (talk) 14:50, 30 September 2017 (UTC)
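The NIST-style rounding of the uncertainty to two significant figures mentioned above can be sketched as (the helper name is mine):

```python
import math

def round_to_sig_figs(x, sig=2):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    # Shift the rounding position according to the magnitude of x
    return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

# One arcsecond in decimal degrees, rounded to two significant figures:
print(round_to_sig_figs(1 / 3600))  # 0.00028
```

This reproduces the 0.00028° figure given above for a one-arcsecond uncertainty.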

──────────────────────────────────────────────────────────────────────────────────────────────────── Thanks for the hint that there are past discussions. I have found some of them, including:

There are likely more, but I assume that most important aspects have already been mentioned in them. Unfortunately, geoPrecision is still not well defined at Wikidata (see mw:Wikibase/DataModel#Geographic locations). From what I understand, there are at least three interpretations:

Precision expresses dimension
Geographic coordinates represent the position of a point on a given sphere, but effectively most entities with coordinates have lateral dimensions, i.e. they are not points. This approach tries to express the lateral dimensions of the entity via precision, often derived from other properties of the entity (e.g. human settlements can be assumed to have lateral dimensions of 10 kilometers according to mw:Extension:GeoData). A problem of this approach is that lateral dimensions can be quite irregular. Furthermore, geoPrecision is a single value for latitude and longitude, although the precision value translates to different lateral dimensions for any point not on the equator of the sphere. This approach aims to provide a value for a “standard zoom” on maps.
Precision expresses measurement uncertainty
If one considers coordinates as a result of a measurement process, this is the kind of precision one would calculate e.g. by standard deviations (stdev) of independent, repeated individual measurement results. Rounding to two significant digits is then indeed a pretty common procedure, not only for coordinates. However, coordinates are rarely determined within a defined measurement process that a physicist would be happy with. Consider the following situation: one wants to measure the coordinates of a city with some GPS tracker (or read it from an online map service—it doesn’t really matter). On the one hand, one could walk to the central city hall, read N measurement results from the device, and then determine a stdev value. It would be related to the quality of the measurement device and completely neglect any accuracy issues, but it wouldn’t be a value really related to the measured entity unless we define where to measure depending on the entity type (e.g. city hall for human settlements). On the other hand, one could visit N random sites within the area of the city, and compute a stdev value from the results. The outcome would then be correlated to the lateral dimension of the city (largest value of either NS or EW dimension in degrees, i.e. depending on latitude as well).
Precision expresses number of significant digits
Finally the precision is also frequently used to express how many significant digits coordinates are meant to have. This approach is somewhat implemented in the GUI, which guesses the precision based on the input given by a user. In practice, reasonable values depend on the entity type (and thus its typical lateral dimension) as well.

Another aspect regarding rounding is that the software internally only accepts decimal values, but displays sexagesimal notation for values and precision in most cases. Rounding to two significant decimal digits is a bit pointless when the data is in fact sexagesimal, and the closest decimal representation of the sexagesimal precision should be used then (i.e. 1/3600°, which needs to be rounded at some point in computers; for some reason, in Wikidata the decimal representation of sexagesimal input has 14 significant digits right now by default). —MisterSynergy (talk) 08:29, 1 October 2017 (UTC)
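Python's decimal module can illustrate the 14-significant-digit representation discussed above (a sketch; the 14-digit cutoff is Wikibase behaviour as observed in the thread, not something this snippet derives):

```python
from decimal import Decimal, getcontext

# Mimic the observed 14-significant-digit cutoff in Wikibase output
getcontext().prec = 14

# Closest 14-digit decimal representation of the sexagesimal precision 1/3600 degree:
arcsecond = Decimal(1) / Decimal(3600)
print(arcsecond)  # 0.00027777777777778
```

The repeating sevens are truncated and the last retained digit rounds up, which is exactly the kind of rounding that decimal storage of sexagesimal input cannot avoid.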

Problem with @Andreasmperu:

Does any person disagree with my edit that any physical object or area can be used for renting?

Do we have a better property for this other than "item operated"?

I got 38 notifications like this just today.

Nobody really supports or cares to answer about previous edits from this account:

1

If nobody supports @Andreasmperu: I will ask for a global block. But I may also ask for a global block for anyone who supports this behaviour. For whatever reason they do their thing, it should be limited.

Example of damage 18 September 2017‎-29 September 2017:

Don't get me wrong, the edits are pure harm, go under the radar, and span almost a month (I will ignore the stalking aspect here). d1g (talk) 10:10, 29 September 2017 (UTC)

update below Example of damage 29 September 2017:

d1g (talk) 16:13, 29 September 2017 (UTC)

  • d1g I don't understand why you continue to be at the center of these edit wars. I understand you are trying to do some good here on wikidata, but contentious addition of dubious claims to abstract concepts isn't likely to improve things here much. There is an awful lot of work needed on concrete items about people, places, organizations, etc, I suggest you focus your energies on Mix n' Match or vandalism fighting rather than trying to fiddle with statements that may make sense in some languages but look wrong in others (how could "renting" be the operator of something?). Because of the huge diversity of wikidata we must be very careful to try to understand the concerns of others, and not to dismiss them out of hand. ArthurPSmith (talk) 18:48, 29 September 2017 (UTC)
    • Renting will involve real objects in any language
    • This is not specific to any language.
    • All edit wars come from one person.
    • You blamed me before, you blame me now instead of admitting that @Andreasmperu: doesn't use talk pages as other editors.
    • For exact reason of word games discussions are not optional.
    • But @Andreasmperu: ignores talk pages with unacceptable frequency.
    • @ArthurPSmith: point me how many times @Andreasmperu: took their time to discuss things in last month? d1g (talk) 18:57, 29 September 2017 (UTC)
      • I don't know who is "stalking" who here, but both of you should stop the revert wars and work on things that other people won't object to or find controversial. I can understand somebody being reluctant to discuss matters with you in detail as your responses often don't make sense. As here - the issue with your edit to "renting" is the use of the property item operated (P121) - to me I don't see how a type of contract (what "renting" is) can be an "operator" of something. Aside from the fact that rentals may involve non-physical items: software and services, for example. I don't see how in any way your addition of that statement improves the state of wikidata. What practical purpose is served by that edit? What concrete thing could somebody do with our information after your edit that they could not do before? ArthurPSmith (talk) 19:26, 29 September 2017 (UTC)
        • @ArthurPSmith: short story: when you see missing claims 1. you add them 2. you don't remove everything just some items were missing (or remove everything and re-add everything)
        • Clear case where we need a new property "subject of agreement"
        • here we are not removing meaningless P279 claims because we don't have better option.
        • I can discuss everything and you blame me?
        • Up to you to continue this song. d1g (talk) 20:13, 29 September 2017 (UTC)


@ArthurPSmith: I may understand your protectionism.

But this section about person who removes "straight line : curve" claims

1 attempt to spot disruptive editor:

d1g (talk) 19:57, 29 September 2017 (UTC)

  • I have no idea what you are trying to argue here regarding your edits. You seem to have claimed that "line" (which has "straight line" as an alias in most of the languages I can see) has "curvature", and engaged in an edit war on that? Why? Curvature of 0 surely? A one-dimensional geometric object that actually has non-zero curvature would be a "curve", which has its own item. Though perhaps there are mathematical subtleties in the various definitions. In any case, once again your addition of something that seems to have no useful purpose to an abstract concept led to an edit war. Why are you doing this? ArthurPSmith (talk) 20:16, 29 September 2017 (UTC)
    • >has "curvature"
    • Does not have a curvature - this is from non-specialized dictionary.
    • >Why are you doing this?
    • I will re-add "straight line does not have curvature" because it is true.
    • @ArthurPSmith: do you really support development of Wikidata?.. d1g (talk) 20:39, 29 September 2017 (UTC)
Two points:
  1. a "physical object" is indeed inappropriate for renting. Renting is a legal concept, so should be described in legal terms. It is not possible to rent a specific rock at specified coordinates on the backside of the moon, although obviously it is a "physical object".
  2. I also disagree with ArthurPSmith's "both of you should [...] work on things that other people won't object to or find controversial." Everybody should work on things he understands and knows, not just on anything where there is no effective opposition. - Brya (talk) 02:53, 30 September 2017 (UTC)
@Brya: in practical sense it is possible to rent space anywhere on land. Exceptions are very limited: terra nullius (Q312461) or to sell nearly any land Alaska (Q797).
I disagree with ArthurPSmith too:
User should absolutely stop editing true claims like
  • "bus driver operates bus"
  • "service is an economic sector"
  • "straight line is without curvature"
Many of the complaints and "edit wars" involving this user are about things that shouldn't be contested in the first place.
I had no problems like this before September 2017 d1g (talk) 12:27, 30 September 2017 (UTC)
@Brya: good point that we should edit where we understand things; however, it is also important to recognize that other people may understand things better or differently (sources for claims help of course, but then who judges those sources?) Those who keep getting reverted should take a step back and think about why. D1g seems to be editing all over the place in abstract concepts, and has repeatedly been notified that some of their edits are not helpful. Many of them are fine, I'm sure. A little self-analysis on this would probably be a good idea. ArthurPSmith (talk) 13:17, 30 September 2017 (UTC)
@ArthurPSmith: It is very true that there are cases where other users may understand things better. The issue is to recognize which cases those are. I have noticed many times that users write something like "things are in a complete mess here" and that more often than not those users don't know what they are doing at all; but there are indeed some areas where things are in a complete mess. The only way to tell which is which is by testing against the literature, and to do that a minimum knowledge of the field is necessary.
        Indeed, User:D1g has been editing very enthusiastically all over the place, not only in abstract concepts. Many of these edits were not carefully considered, and it seems a matter of chance which of them are correct. I have not been following him around, but have the feeling that there is a great deal of work to be done in checking these edits. - Brya (talk) 14:49, 30 September 2017 (UTC)

@Brya: I have a silly number of items on my watch list, so (unfortunately) I came across his edits a few weeks ago, and from what I have seen so far his rate of wrong edits could easily reach 50%, which is quite worrying because he has racked up over 51,000 edits. That is how bad the problem is.

Sorry I just came back to answer this thread, but I had a terrible migraine. It is disheartening to face continuous damage to a project which already has tons of problems to deal with. Even though this editor has been warned to stop his editing behaviour and encouraged to help in other much-needed tasks, he has continued editing in the same way without taking any notice. Maybe he finds fighting vandalism boring, or merging or dividing items not that challenging. I can only talk about myself: I am willing to do those boring, less stimulating tasks, but I have limited time. On top of that, I realised that there are fewer administrators available from my time zone, so wasting time is a huge problem. It is frustrating to waste it on one single editor, who hasn't been able to acknowledge any wrongdoing but just blamed others.

People who have crossed paths with me might know that I prefer to communicate through edit summaries. I believe that, in most cases, this is an effective way of saving time for me and the other editor (I strongly dislike inconveniencing people with unnecessary talk). I have tried this approach with this editor, and it clearly didn't work. When I reached out, I noticed that he has a particular understanding of what a discussion is. He didn't get any feedback on Property talk:P121, so he went ahead and made the undiscussed change. As you can see in the above discussion and in many, many talk pages, he doesn't react well to criticism, so he reverts and starts with constant pinging and personal attacks. He only dialogues with people who agree with him.

Just to mention one case: my involvement with line (Q1228250) started with a merge. The previous item included two sitelinks in Chinese, and a Hong Kong-based IP address had added the statements that were later going to be removed over and over by D1g. Since the item was on my watchlist, I realised those edits didn't make sense, so I restored the status quo. Reverting to the situation before a controversial contribution is standard procedure. He doesn't understand that either. He believes there is a persecution (because his edits keep appearing on my watchlist), and yet I am not the only one reverting him.

This user dismissed my last attempt at an understanding (Topic:Tywjpqr3xgjve5a9). I assumed his enthusiasm could be due to a young age, so he could use some guidance. But he doesn't take any advice, nor show any gratitude to people taking the time to point out the mistakes he is making. So, I stick with my opinion: there is no space in a collaborative project for a person with his current attitude. Andreasm háblame / just talk to me 03:10, 1 October 2017 (UTC)

@Andreasmperu: you have my sympathy. I see he has been blocked for a month. - Brya (talk) 08:07, 1 October 2017 (UTC)

Wikidata tours

The first "Start this tutorial" button on the Wikidata:Tours page doesn't seem to be working. It doesn't load any instruction box. 88.115.40.66 06:17, 1 October 2017 (UTC)

Someone merged the item. :| All good now. Sjoerd de Bruin (talk) 06:22, 1 October 2017 (UTC)

Property proposal: BFI person ID

Please review Wikidata:Property proposal/BFI person ID, which has been open for over seven days, with no comments. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:42, 1 October 2017 (UTC)

Looking for a Wikidata UI focussed on a reader-friendly table

Dear all,

My pet Wikidata-powered database of embassies has become quite popular, and query.wikidata.org is not felt as user-friendly enough. I want to create a page focussed on showing tabular data, without showing technical details. It must have the following features:

  • Table with columns/rows/sort/paging (like the query.wikidata.org default view)
  • SPARQL querying starts immediately, no need to press a button
  • The SPARQL code is not shown (just a small link that opens the query on query.wikidata.org)
  • "+" sign (or similar) in empty cells for people to add missing data. It might just link to the Wikidata item after an explanation popup, or even better allow in-place addition (like TABernacle)
  • Images shown as thumbnails (rather than URL)
  • Related QID+label couples shown as hyperlink rather than two separate columns. Example for an embassy in Paris: the SPARQL request would generate both locationLabel="Paris" and locationQID="Q90", and the tool would automatically recognize this pattern and make a hyperlink Paris.
  • URL that can be easily posted on any Wiki (so not query.wikidata.org URLs and not shortened URLs)

Do you know something that fits these requirements? Or most of them? Thanks a lot! Syced (talk) 07:13, 2 October 2017 (UTC)
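The QID+label pairing from the wish list above could be detected and rendered roughly like this (a sketch; the pairing convention between a `…Label` column and its matching entity column is the assumption, and the function name is mine):

```python
def link_cell(label, entity_uri):
    """Render a (label, entity URI) pair as a single hyperlink cell."""
    # Entity URIs look like "http://www.wikidata.org/entity/Q90"; take the QID
    qid = entity_uri.rsplit("/", 1)[-1]
    return f'<a href="https://www.wikidata.org/wiki/{qid}">{label}</a>'

# Example from the request above: locationLabel="Paris", location=wd:Q90
print(link_cell("Paris", "http://www.wikidata.org/entity/Q90"))
```

A tool could apply this whenever a result set contains both `xLabel` and `x` columns, collapsing two columns into one linked column.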

How close to these requirements do you find Listeria (example 1, example 2) gets for you? Jheald (talk) 07:34, 2 October 2017 (UTC)
Jheald: The table looks great! The biggest problem is that there is no download button (most people come to download the CSV and use it in other tools). Also, it does not guide users to add missing data on Wikidata, so viewers feeling generous will probably modify the wikisource without knowing that it will be overwritten. Finally, the lack of paging might restrict the quantity of data shown sooner than the SPARQL timeout would. Cheers! Syced (talk) 08:28, 2 October 2017 (UTC)

Wikidata weekly summary #280

Is it OK to note that somebody is an artist using P31?

This question was already asked at Property_talk:P31#Usage without an answer, Help:Basic_membership_properties#instance_of_.28P31.29 is not helpful. See User_talk:Sjoerddebruin#why.3F for an initial discussion.

Is it acceptable to add artist (Q483501) in instance of (P31) to note that somebody is an artist? User:Sjoerddebruin advocates using occupation (P106) and removing instance of (P31), and I am unsure whether that is preferable. Mateusz Konieczny (talk) 15:27, 24 September 2017 (UTC)

Using occupation (P106) is the de facto convention at least: for the same statement, there are 13687 cases using P106 while there are only 6 using P31 (which should probably be corrected). Maxlath (talk) 15:59, 24 September 2017 (UTC)
If someone is an actor (Q33999) would you add that to instance of (P31) too? In both cases it's an occupation/profession so it belongs in occupation (P106). Mbch331 (talk) 17:08, 24 September 2017 (UTC)
I added note to Property_talk:P31#Usage - maybe there is a better place to describe how property should not be used Mateusz Konieczny (talk) 18:51, 24 September 2017 (UTC)
I don't expect many people to look at that talk page. Most people look at other items. You're not the only one though, the label of instance of (P31) might be confusing for some people. If they use that property wrong, they will not get the best suggestions. Sjoerd de Bruin (talk) 19:25, 24 September 2017 (UTC)
Well, at least I checked - so I expect that at least in some rare cases it will be useful (and this is one of the most popular properties) Mateusz Konieczny (talk) 21:41, 24 September 2017 (UTC)
When I see artist in combination with P31 I do and will actively remove them. Thanks, GerardM (talk) 06:35, 28 September 2017 (UTC)
Hopefully you replace it with better tagging rather than simply remove it Mateusz Konieczny (talk) 12:58, 28 September 2017 (UTC)
Check my record. Thanks, GerardM (talk) 05:37, 3 October 2017 (UTC)

'used for values only constraint' issues

The use of used for values only constraint (Q21528958) on, for example, audio recording of the subject's spoken voice (P990), causes a constraint violation for the Wikidata property example (P1855). However, it is not possible to add an exception qualifier, because it is on a property, not an item. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:08, 2 October 2017 (UTC)

Lucas Werkmeister (WMDE)
Jarekt - mostly interested in properties related to Commons
MisterSynergy
John Samuel
Sannita
Yair rand
Jon Harald Søby
Pasleim
Jura
PKM
ChristianKl
Sjoerddebruin
Salgo60
Fralambert
Manu1400
Was a bee
Malore
Ivanhercaz
Peter F. Patel-Schneider
Pizza1016
Ogoorcs
ZI Jony
  Notified participants of WikiProject property constraints

Good catch! (May I ask how you found this? I’m not aware of any constraint checking system that currently checks qualifiers – the extension/gadget will do this later this month (phab:T176863), and KrBot doesn’t do it as far as I know.) I think there’s more than one possible solution here:
  1. Define a new property “exception to constraint for properties” (or something like that), analogous to Wikidata property example for properties (P2271).
  2. Hard-code Wikidata property example (P1855) as a special property where the roles of main value and qualifiers are somewhat swapped.
Which one do you think is better? --Lucas Werkmeister (WMDE) (talk) 09:55, 2 October 2017 (UTC)
@Lucas Werkmeister (WMDE): It's reported at Wikidata:Database reports/Constraint violations/P990#Value only. I'm not sure which solution would be best. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:23, 2 October 2017 (UTC)
Oh, I see – KrBot only looks at qualifiers for the “value only” constraints. Okay, thanks. --Lucas Werkmeister (WMDE) (talk) 11:31, 2 October 2017 (UTC)
The general issue with qualifiers in Wikidata property example (P1855) (you can only add qualifiers as additional qualifiers on the statement that this is a property example, rather than as qualifiers on the example itself) would be nice to have dealt with, but I don't know how practical that is... ArthurPSmith (talk) 15:52, 2 October 2017 (UTC)
As far as I see, this affects all properties with that type of constraint. A special treatment of Wikidata property example (P1855) within User:KrBot's evaluation (@Ivan A. Krestinin) seems to be the best approach to me. The upcoming extension/gadget might need to do the same in the future. —MisterSynergy (talk) 19:06, 2 October 2017 (UTC)
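The special treatment suggested above could look roughly like this in a checker (a toy sketch; the data shape and function name are mine, not KrBot's or the extension's actual code):

```python
# Contexts where a "used for values only" property may legitimately appear
# as a qualifier: P1855 (Wikidata property example), whose main value and
# qualifiers have their roles somewhat swapped.
VALUES_ONLY_EXEMPT_CONTEXTS = {"P1855"}

def violates_values_only(role, context_property=None):
    """Check one usage of a values-only property.

    role: "value", "qualifier", or "reference" -- where the property is used.
    context_property: the property of the enclosing statement (for qualifiers).
    """
    if role == "value":
        return False
    if role == "qualifier" and context_property in VALUES_ONLY_EXEMPT_CONTEXTS:
        return False  # exempt: qualifier on a property example
    return True

print(violates_values_only("qualifier", "P1855"))  # False: exempt context
print(violates_values_only("qualifier", "P31"))    # True: genuine violation
```

The same exemption set could be shared by KrBot-style reports and the gadget so the two stay consistent.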

properties for this type (P1963)

These 45 statements seem rather excessive, some of them already included in human (Q5), which currently has 35 such statements. Any idea about the scope of this property? Andreasm háblame / just talk to me 01:57, 3 October 2017 (UTC)

Unless we wish to claim that the dogs, foot powders, and whatnot that were ever elected, whether legitimately or otherwise, rightfully deserve "occupation → politician", those statements which overlap with those in human (Q5) should be removed from the item. Mahir256 (talk) 04:11, 3 October 2017 (UTC)

A new "key" item

Hi! I have just created Q41546637, wikimedian, which was missing. A sysop told me to go on and create it. Can you improve it as well, so it looks as good as Q23038345 (wikipedian)?

Any advice?--Alexmar983 (talk) 02:56, 3 October 2017 (UTC)

Cebuano

Has anyone else noticed that there are a lot of geographical items in Cebuano duplicating existing English ones? For example, Abano Terme (Q30021869) and Abano Terme (Q34603). I found dozens and merged the ones where the GeoNames IDs matched exactly. --Richard Arthur Norton (1958- ) (talk) 01:46, 24 September 2017 (UTC)

Yes, I believe the Cebuano (ceb) and Swedish (sv) Wikipedias used a bot to auto-generate hundreds of thousands of Wikipedia articles from GeoNames data dumps and other sources. However, because there seems to have been no inclination to link these to existing interlanguage articles, Wikidata items or even GeoNames IDs, another bot detects each article as a distinct concept and creates a new Wikidata item. --Canley (talk) 03:17, 24 September 2017 (UTC)
Just search "Cebuano" or "cebwiki" in Wikidata:Project chat/Archive and see. Matěj Suchánek (talk) 08:05, 24 September 2017 (UTC)
Note that one is the municipality seat and the other is the municipality itself. Cebwiki and svwiki describe those separately, but most Wikipedias combine them. Sjoerd de Bruin (talk) 14:42, 24 September 2017 (UTC)
Also note that the "populated places" that have been created from GeoNames do not all exist in reality outside of the GeoNames database. That could include some of these municipality seats. GeoNames has simply copied all names on some maps and interpreted them as "populated places". -- Innocent bystander (talk) 18:47, 24 September 2017 (UTC)
It's not just the duplication due to the "populated place" vs. "administrative subdivision" distinction in GeoNames; there are also many hills, rivers and others which already had an item here, duplicated with a new item from this GeoNames import. And to make it worse, 95% of the hills/mountains I merged so far had a totally bogus height value, sometimes more than 100 meters lower than the real value. But all the hills which had nothing to merge still keep the misleading height value. Ahoerstemeier (talk) 19:27, 24 September 2017 (UTC)
This is Cebearth. We are so respectful of individual Wikipedias here that we let them create an entirely new planet here with our congratulations. Thierry Caro (talk) 20:35, 24 September 2017 (UTC)
I wonder whether it is feasible to ban mass-creating Wikidata entries that are doomed to leave mass-scale duplicates, and to revert such mass-scale invalid edits in future. Mateusz Konieczny (talk) 13:13, 28 September 2017 (UTC)
Should a geography expert be doing the merges, or should I merge them when they are obvious? Some take an extraordinary amount of effort to distinguish. I might have spent 30 minutes determining that two cemeteries were the same. It was listed in two different towns with different GPS coordinates. It was the same, just large so was in two towns. Wasn't that "entirely new planet" supposed to collide with Earth and destroy Wikipedia when you wrote that? --Richard Arthur Norton (1958- ) (talk) 03:38, 25 September 2017 (UTC)
Cebuano spam generated a massive amount of articles, and in the worst case merging (probably it is preferable to combine it with deleting unsourced statements from the Cebuano duplicate) will result in a wrong link to a joke Wikipedia. So I would recommend merging. Mateusz Konieczny (talk) 19:35, 27 September 2017 (UTC)
Right, Cebuano is a curious case of massive bot creation. Ceb.wp has only 4 admins, 2 very active editors, and 99% of their edits from bots. Today, they have nearly the same number of articles as English Wikipedia, and seem to be generating 10x as many articles per day as en.wp. At this rate, they will overtake English Wikipedia fairly soon. [1] -- Fuzheado (talk) 13:52, 25 September 2017 (UTC)
The project is currently in the US. Not much left of the alphabet. Thereafter it will probably shrink because of clean up. -- Innocent bystander (talk) 14:00, 25 September 2017 (UTC)
I have lately been merging dozens of cebwiki and hewiki geographic articles, almost all in Israel. I assume that had I not been a he-en bilingual, some of these articles would have stayed separate for decades. DGtal (talk) 20:07, 27 September 2017 (UTC)
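Finding merge candidates by matching GeoNames IDs, as described in this thread, can be sketched as follows (the (QID, GeoNames ID) pairs below are made up for illustration; real ones would come from a SPARQL dump of P1566 values):

```python
from collections import defaultdict

def merge_candidates(pairs):
    """Group item QIDs by GeoNames ID; any group with >1 item is a merge candidate."""
    by_geonames = defaultdict(list)
    for qid, geonames_id in pairs:
        by_geonames[geonames_id].append(qid)
    return {g: qids for g, qids in by_geonames.items() if len(qids) > 1}

# Hypothetical (QID, P1566 value) pairs:
pairs = [("Q34603", "1111111"), ("Q30021869", "1111111"), ("Q90", "2222222")]
print(merge_candidates(pairs))  # {'1111111': ['Q34603', 'Q30021869']}
```

Exact-ID matches still need a human check (seat vs. municipality, as noted above), but they cut the search space considerably.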

There needs to be a community-wide discussion about this. Due to the mammoth burden this creates for the Wikidata community, a bot making almost 600,000 new articles a month that are composed entirely of "This is a hill with [incorrect value] height" or "This is a type of beetle" is beyond useless. What is the appropriate venue for discussing this at m:? —Justin (koavf)TCM 23:49, 27 September 2017 (UTC)

Wow. Even worse, the bot actually making a quarter-million new articles this month isn't tagged as a bot, even though it definitely has those user rights! —Justin (koavf)TCM 23:54, 27 September 2017 (UTC)
@Koavf: yes it is a bot... you inverted the condition and excluded bots from the list - those new lists are a little difficult to get used to, aren't they ? :) --Hsarrazin (talk) 08:25, 28 September 2017 (UTC)
So we can soon expect to see 0.25M spam articles about springs that will duplicate all existing entries about US springs? Wonderful. Mateusz Konieczny (talk) 13:06, 28 September 2017 (UTC)
@Hsarrazin: Thanks! —Justin (koavf)TCM 15:54, 28 September 2017 (UTC)

When will we start to collaborate?

The Cebuano Wikipedia has all these articles you object to because of a few simple reasons.

  • We do not collaborate
  • Wikidata is downhill for every Wikipedia; when they create articles we gain items
  • We do not consider Wikidata as the source for auto generated texts that are cached.

The Cebuano and other Wikipedias get their data from sources we do not care for. At that we are spectacularly picky because "when a source is not authoritative we should not include their data". The negative consequences abound. When a source is considered authoritative, we accept the complete stamp collection without a second thought EVEN THOUGH other authorities object to the data and prove it wrong. We do not accept data from sources that are maintained by a community (outside of the WMF) because they are not authoritative EVEN THOUGH they want to and often do actively cooperate with (some of) us.

We need to collaborate and this is much more relevant to "authoritative". When there is no interaction with a secondary source, when it is a one way street only, we should at best link to these sources. We have mechanisms to combine with the "authoritative" data but we do not need to increase the mess by importing wholesale. When we do collaborate, when we and them change the data based on the collaboration, it makes sense to include their data and change it as per our customs.

Arguably we do not really collaborate with many Wikipedias at least it seems that way because of the vocalists who oppose any form of collaboration. This alone prevents discussion on how cooperation could exist between multiple Wikipedias who use Wikidata as a tool.

Our negative attitudes that become apparent when we discuss Cebuano are part and parcel of this feeling of superiority that we want to feel. Now managing projects like the Cebuano Wikipedia properly actually gives us the superiority and also a tool to do better not only for this but for many more languages including English. Thanks, GerardM (talk) 06:19, 28 September 2017 (UTC)

What exactly do you mean by "collaborate"? I think the Wikidatans who start merging some of these duplicate items are very cooperative and show a lot of good will. What else do you think Wikidata could do? I suggest that you show the way by initiating the collaboration that you mean. − Pintoch (talk) 07:18, 28 September 2017 (UTC)
"we do not collaborate" - I fully agree that Cebuano is a joke Wikipedia, not cooperating with anybody and polluting Wikidata with useless entries. Unfortunately I see little that can be done to fix it. I merged many duplicated entries, but it is a drop in the ocean. What's worse, soon they will create yet another set of duplicates by copy-pasting the next database. With a normal Wikipedia we would at least be able to expect that no more than one article exists for a given object (to be more exact, some limited number of articles is duplicated, but it is not likely to be on a massive scale). Mateusz Konieczny (talk) 13:01, 28 September 2017 (UTC)
Unfortunately, due to extreme quality problems, data consumers are forced to spend time on filtering out invalid Cebuano spam - see for example https://github.com/EdwardBetts/osm-wikidata/issues/159 In the end every user of Wikidata data will be forced to waste time on excluding part of the dataset as useless spam (currently it is not terrible, but Cebuano will sooner or later start dumping databases without noticing that it already has a full set of articles) Mateusz Konieczny (talk) 13:04, 28 September 2017 (UTC)
No we do not collaborate. If we did, we would not have this problem. The fact that some spend time coping with this issue is exactly because we do not share this burden. When all the data were imported and we maintained it, we could share our effort with the source and at the ceb.wikipedia they would gain updates based on this effort. The community at the source want(ed) to collaborate but we said no. Thanks, GerardM (talk) 16:03, 28 September 2017 (UTC)
"The community at the source want(ed) to collaborate but we said no" - when? where? how? Maybe I missed something but my contact with ceb is limited to finding bot-created duplicates - over and over again. Mateusz Konieczny (talk) 16:50, 28 September 2017 (UTC)
@GerardM: again, I repeat my message above: if you see a way to collaborate that would solve the issue, just do it! You have written this "we do not collaborate" rant multiple times, but I do not think it is going to magically attract a crowd of Wikidatans who will happily spend hours merging items manually. We need practical solutions, not appeals for them. − Pintoch (talk) 08:13, 29 September 2017 (UTC)
@Pintoch: Given that we refused in the past to even consider collaboration, we are in the mess we are in. When we reconsider, we can replace the Cebuano articles with information based in Wikidata and show the cached results. This involves collaboration from WMF developers. I think it is in our best interest to do this. Changes could be shared and consequently our work and their work will effectively improve the information we provide. The first thing though is that we accept that this problem is one of our own making. Blaming others will not help us move forward. Thanks, GerardM (talk) 10:51, 29 September 2017 (UTC)
Lsjbot was already active before Wikidata existed, so I don't see how we should have been able to consider a collaboration. The only solution I see for cebwiki is to install the ArticlePlaceholder extension, delete all bot-created articles and show automatically generated pages displaying data from Wikidata. Otherwise, all the quality improvements we do here will never be fed back to cebwiki. --Pasleim (talk) 11:10, 29 September 2017 (UTC)
@GerardM: Thanks for starting to spell out something a bit more explicit. But I do not see at all how overwriting the Cebuano articles with information drawn from Wikidata would solve the issue. The Wikidata items that are associated with these Cebuano pages contain the same information as in the article (they are created by Lsjbot). So, overwriting the Cebuano articles with that information will not change anything. The problem is to merge these new items with the older items that they duplicate, and fix the inaccurate information imported from Geonames. These older items rarely contain enough information to generate alternate Cebuano articles. So, if I understood your proposal correctly, and if my understanding of the problem is right, your proposal does not address the problem at all. − Pintoch (talk) 11:12, 29 September 2017 (UTC)
When we remove the articles at ceb.wikipedia and replace the data with cached data, it will not fix anything to start off with. What it does is replace the incorrect data at ceb.wikipedia when we amend things, thanks to the caching. When we merge items, the statements available will be combined, and as we fix things, we can share the information with Geonames as well. This will consequently start the collaboration needed. As a consequence we can offer LSJBOT to import data in Wikidata and prevent the negative effects we currently experience. Thanks, GerardM (talk) 12:14, 29 September 2017 (UTC)
@GerardM: What cached data are you talking about? I do not understand at all. I can see that you have a lot of good intentions (and I love collaboration too!), but we are talking here about a technical issue, so a proper technical understanding is required to tackle it. If you do not have that technical understanding, please stop telling others what they should do to handle the problem. In the case where I am just too stupid to understand your solution, why don't you just go to cebWP and run the bot you are talking about? You do not need any help from the WMF for that. − Pintoch (talk) 12:26, 29 September 2017 (UTC)
@GerardM: "As a consequence we can offer LSJBOT to import data in Wikidata" - NONONO! Importing cebwiki rubbish is the main problem, importing more useless duplicates is not going to fix anything! Mateusz Konieczny (talk) 12:33, 29 September 2017 (UTC)
@Mateusz Konieczny: What is it that you prefer: to get the data through a Wikipedia or that we get the data and actively maintain the complete set from the start and make the most of it? We cannot pick and choose. We support all Wikipedias. Thanks, GerardM (talk) 14:16, 29 September 2017 (UTC)
@GerardM: We can pick and choose. For example by blocking editors (especially bots) adding duplicated articles and reverting their edits, including mass deletion of added entries Mateusz Konieczny (talk) 22:19, 29 September 2017 (UTC)
No we cannot. There is no we in your statement. Thanks, GerardM (talk) 06:49, 30 September 2017 (UTC)
Are you seriously claiming that malfunctioning bots may not be blocked on Wikidata? Mateusz Konieczny (talk) 08:21, 30 September 2017 (UTC)
They are not malfunctioning. They do as they are designed. We cannot tell other projects what to do and our function is to reflect what happens in other projects. What we can do is collaborate; take the information in at the start and fix it. This will work much better than the whining about the mishap that happened because we refused to collaborate.
@GerardM: "Given that we refused in the past to even consider collaboration" - I ask again - when? where? how? Maybe I missed something but my contact with ceb is limited to finding bot-created duplicates - over and over again Mateusz Konieczny (talk) 11:39, 29 September 2017 (UTC)
@GerardM: You claimed that "we refused in the past to even consider collaboration" - can you link to such a situation, where the consensus of Wikidata editors was to refuse to consider collaboration? You repeated this extraordinary claim without any proof Mateusz Konieczny (talk) 22:22, 29 September 2017 (UTC)
The proof is in the pudding. It is in your own words. It is in the notion that we should only include "authoritative" sources. It is lazy, it is parasitic and it is a folly. Thanks, GerardM (talk) 06:49, 30 September 2017 (UTC)
So in your opinion "It is lazy, it is parasitic and it is a folly" to expect that bots will not create duplicate entries? Mateusz Konieczny (talk) 08:20, 30 September 2017 (UTC)
It is a folly and parasitic that we only accept data from "authoritative" sources. Sources that do not accept that their data is questioned. Your obsession with bots is neither here nor there. It is the consequence after the fact that we decided not to cooperate. Thanks, GerardM (talk) 12:55, 30 September 2017 (UTC)
I don't think those Cebuano derived items are useful for Wikidata. I would advocate to stop autogenerating items for new Cebuano articles. ChristianKl (talk) 19:01, 28 September 2017 (UTC)
It is the purpose of Wikidata to link items from any and all Wikipedias. As you consider the result of our practice of not collaborating with non "authoritative" projects problematic, you could / should revisit the arguments for collaboration. Thanks, GerardM (talk) 04:31, 29 September 2017 (UTC)
The main page states :
Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.
Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others.
If this is irreconcilable with linking to everything that calls itself a Wikipedia, then the latter will have to give way. It would be nice if it were possible to close down some of these dubious 'Wikipedias'. - Brya (talk) 05:38, 29 September 2017 (UTC)
"to link items from any and all Wikipedias" - it does not mean that everybody has the right to run a low-quality bot that creates duplicated entries. Nobody complains that ceb is linked - the main problem is the creation of duplicates on a massive scale. There is no real cebwiki community and the quality of articles is extremely low. As a result, autogenerating Wikidata entries requires extensive checking of whether such entries already exist, including duplicates within cebwiki itself (which should be done, as stubs on cebwiki were created). Mateusz Konieczny (talk) 07:49, 29 September 2017 (UTC)
You do not get what happens. We could enable the use of our data in generated texts. When we do, the script writers can fulfill their wish and create texts that will be cached. They have to be enticed to do so. Importing the data FIRST in Wikidata helps us and them. We get the data and merge it properly from the start. The script runs and prepares texts that are cached or saved and marked as identical. When the data or the script and consequently the text changes, the text is updated. In this way we always have all the data available to us for us to maintain. Much to be preferred to the current mess. Thanks, GerardM (talk) 05:47, 3 October 2017 (UTC)
@GerardM: please, just stop! It is blatant that you do not understand the issue: what you are proposing is just not practical at all. If you want to prove us wrong: just implement it. You do not need any special permission for that. − Pintoch (talk) 08:17, 3 October 2017 (UTC)
@Pintoch: I do not have to implement anything at all. Check this out. It is an example of text generated on the fly, for several years now. This is not rocket science. Thanks, GerardM (talk) 08:37, 3 October 2017 (UTC)
Oh, I am glad to know that you just solved the cebwiki issue with that link! Congratulations! − Pintoch (talk) 09:12, 3 October 2017 (UTC)
Oh, I am so sad that you do not understand the cebwiki issue. The link demonstrates that technically we can provide the functionality sought for the Cebuano Wikipedia. They are going to do more databases that we find "problematic" and there is nothing we can do about it. By seeking collaboration we will do much better than we do now. Thanks, GerardM (talk) 10:58, 3 October 2017 (UTC)

Copyright

Geonames is a database under CC BY 4.0. Single facts don't have a copyright but the whole database has it. If Lsjbot is copying the whole Geonames database to cebwiki and we copy whole cebwiki to Wikidata, is this still legal? --Pasleim (talk) 11:17, 29 September 2017 (UTC)

They have their copyright; when we collaborate we can discuss what we do and when they are fine with it, there is no problem. They may have all our changes anyway. Thanks, GerardM (talk) 12:27, 29 September 2017 (UTC)
@GerardM: "when we collaborate we can discuss what we do and when they are fine with it, there is no problem" Seriously? Repeating "collaborate" would not make copyright violation acceptable Mateusz Konieczny (talk) 12:36, 29 September 2017 (UTC)
@Mateusz Konieczny: When the copyright holder indicates permission, there is no problem. Thanks, GerardM (talk) 08:39, 3 October 2017 (UTC)

Soweego: wide-coverage linking of external identifiers. Call for support

Pardon me if you have already read this in the Wikidata mailing list.

Hi everyone,

Remember the primary sources tool?
While the StrepHit team is building its next version, I'd like to invite you to have a look at a new project proposal. The main goal is to add a high volume of identifiers to Wikidata, ensuring live maintenance of links.

Do you think that Wikidata should become the central linking hub of open knowledge?

If so, I'd be really grateful if you could endorse the soweego project:
m:Grants:Project/Hjfocs/soweego

Of course, any comment is more than welcome on the discussion page.

Looking forward to your valuable feedback.
Best,

--Hjfocs (talk) 09:12, 25 September 2017 (UTC)

I remember there being a grant for improving the tool in a way that solves its usability challenges. From what I see the tool has the same problems it had a year ago. I feel like there is now an additional problem of some claims having no approve/reject button, but I don't see any improvement. Can you give us an update about what happened in the last year? ChristianKl (talk) 19:10, 28 September 2017 (UTC)
@ChristianKl: while this proposal will upload its output to the primary sources tool, it's not strictly related to it.
To answer your question, the project to uplift the primary sources tool started at the end of May 2017,[1] has posted an uplift proposal,[2] notified the community,[3] and is in active development.[4] Browsing and de-duplication features are deployed in the current gadget, but the new version is estimated to be delivered in February 2018.[5]
Cheers,
--Hjfocs (talk) 11:20, 3 October 2017 (UTC)
  1. m:Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal/Timeline#Overview
  2. Wikidata:Primary_sources_tool
  3. https://lists.wikimedia.org/pipermail/wikidata/2017-July/010902.html
  4. phab:project/view/2788/
  5. m:Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal/Timeline

Delete less precise dates

When I see "16 December 1986" and also "1986" (usually with the reference VIAF for year-only values) in a date field, can I delete the less precise date? These seem to be the product of merges. The problem is that it confuses the search for imprecise dates that need to be researched to find the precise dates. --Richard Arthur Norton (1958- ) (talk) 00:58, 2 October 2017 (UTC)

I would say the reference for the more precise date should be compared to the reference for the less precise date. If the reference for the more precise date is better, then the source should be consulted to see if it really supports the date. Only then should the less precise date be deleted. Jc3s5h (talk) 02:31, 2 October 2017 (UTC)
I also have the same question. I have been deleting the less precise dates, but maybe if the less precise date is older (so it appears first in the list) then it should just be updated with the more precise date (assuming it is confirmed, etc) and the newer date deleted. This is I guess also a question of whether it matters (remember we went and undeleted masses of items after we invented the redirect function). Jane023 (talk) 07:21, 2 October 2017 (UTC)
Simple answer: whatever can be or is externally referenced should be kept, even if you have a co-existence of a less precise and a more precise date. You might want to set preferred rank for the more precise variant: that way data users typically see only one date, although the less precise one is still considered valid. —MisterSynergy (talk) 07:59, 2 October 2017 (UTC)
Thanks for the answer. A little background: I work a lot with paintings and their painters, so most dates have been mass-uploaded through Commons or some Wikipedia, or even added by myself through some bot edit. Do you mean to say that all of the "imported from" things should be kept? I feel no moment of doubt when I delete things I myself might have created, but for these other imports I am not so sure. Jane023 (talk) 08:10, 2 October 2017 (UTC)
Well, claims with “imported from: some Wikimedia project” are not externally referenced and thus a different case. It would be worthwhile, however, to check whether the project the data was imported from contains an external reference that backs the less precise date variant. If so, keep the claim and add the external reference; otherwise overwrite it with the more precise variant, or remove it if both are already present.
Perhaps it is worth mentioning that we do want to have all data that can be (seriously) externally referenced. It does not matter whether it is “less precise” than some other data we have about this entity, or even whether it is known to be wrong. Ranks help in both cases to manage such claims in a way that Wikidata reports only one claim if necessary, and never reports any wrong (yet seriously referenced) data. —MisterSynergy (talk) 08:20, 2 October 2017 (UTC)
I can see the point of keeping deprecated, but seriously referenced, values on the basis of informing readers of mistaken values (or even lies) that have received serious attention. But I can't see the value of stating looser precision values of dates that are in harmony with well-referenced precise dates. For example, I see no value in adding values to the inauguration date for Bill Clinton (Q1124) "January 1993", "1993", "20th century", and "2nd millennium", even though I'm sure we could find reliable sources for all those values. It suffices to indicate the start of his presidency was "20 January 1993". (But the item lacked a reliable source, which I will add.) Jc3s5h (talk) 17:59, 2 October 2017 (UTC)
  • Well stated; most of the year-only dates came from VIAF, which only lists years. I see no need to keep less precise dates, just well-publicized errors, like if the tombstone is incorrect or a major obituary is incorrect. When the SSDI came online I was able to correct seven birth year errors for actors that were in the Britannica online and repeated in Wikipedia. I can see keeping them, to deprecate them as widely reported errors with the source included. The temptation of a future editor will be to change it to the more publicized version, even if now known to be incorrect.

OK so in summary, I will continue to delete dates of the form "1930" when there is also an equally trustworthy, but more precise date of the form "1 June 1930" whether or not it is referenced (not all statements need to be individually referenced to be trustworthy). And even if it is a baptismal date other than the birth date, it is still more accurate than "1930", so I will delete the statement for "1930" in such cases. Jane023 (talk) 08:56, 3 October 2017 (UTC)
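For anyone who wants to hunt these pairs systematically, a query of roughly this shape should list items with two birth dates of different precision. This is only a sketch following the Wikibase RDF mapping (full statements via p:/psv:, with wikibase:timePrecision values 9 = year, 11 = day):

```sparql
# Sketch: items with two date of birth (P569) statements whose
# precisions differ (wikibase:timePrecision: 9 = year, 11 = day)
SELECT DISTINCT ?item ?lowPrec ?highPrec WHERE {
  ?item p:P569/psv:P569/wikibase:timePrecision ?lowPrec ;
        p:P569/psv:P569/wikibase:timePrecision ?highPrec .
  FILTER (?lowPrec < ?highPrec)
}
LIMIT 100
```

The result set would then still need the reference comparison discussed above before anything is deleted.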

FYI, my bot has already been doing it. Matěj Suchánek (talk) 12:19, 3 October 2017 (UTC)
Good to know! Jane023 (talk) 12:41, 3 October 2017 (UTC)
yes, indeed! Matěj Suchánek, how often does your bot run on this? --Hsarrazin (talk) 13:09, 3 October 2017 (UTC)
Whenever I recall I'm supposed to run (there are many tasks to do here). Matěj Suchánek (talk) 13:39, 3 October 2017 (UTC)
Another useful task would be to set the higher-precision date as the "preferred" one if there are two non-conflicting dates with the same level of referencing. --Jarekt (talk) 13:49, 3 October 2017 (UTC)
The question of ranks is also mentioned in the permission request but it's not clear whether bots should do this. Matěj Suchánek (talk) 14:22, 3 October 2017 (UTC)
@Jane023: If only date of baptism in early childhood (P1636) is set, please don't remove date of birth (P569). Many date consumers like c:Template:Creator can't handle date of baptism in early childhood (P1636) only (yet). Thank you, --Marsupium (talk) 14:11, 3 October 2017 (UTC)
Don't worry - if only date of baptism in early childhood (P1636) is set, then I am likely to add it to a date-of-birth statement, not remove it. Jane023 (talk) 14:26, 3 October 2017 (UTC)

As I was manually importing many dates from Commons Creator page (Q24731821) I usually used the following set of rules:

  • If I see both "imported from" (internal reference) and "stated in" (external) references for a single date, I was deleting "imported from" to reduce clutter.
  • If I was adding a higher-precision date to a statement that already had a low-precision date, I would look at the references and
  • delete the old date if it had no external references and the new one did
  • keep both but "prefer" the new one if both had external references
  • keep both but "prefer" the old one if it had an external reference and the new date did not

In this last case I occasionally spend a lot of time chasing references. One thing that was really slowing me down is that there seems to be no way to quickly add an identifier (like ULAN, RKD, Benezit, etc.) as a "stated in" reference. I was hoping for some cut-and-paste or "drag" solution but found none. --Jarekt (talk) 13:45, 3 October 2017 (UTC)
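Statements carrying both kinds of references, as in the first rule above, can be listed with a query of roughly this shape (a sketch for date of birth (P569); the prov:/pr: reference modeling follows the Wikibase RDF mapping, with P143 = imported from Wikimedia project and P248 = stated in):

```sparql
# Sketch: P569 statements that carry both an "imported from" (P143)
# reference and a "stated in" (P248) reference, where the internal
# P143 reference is the clutter candidate
SELECT ?item ?statement WHERE {
  ?item p:P569 ?statement .
  ?statement prov:wasDerivedFrom [ pr:P143 [] ] ,
             [ pr:P248 [] ] .
}
LIMIT 100
```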

There is a gadget for copying references but I have not gotten it to work successfully. As a shortcut I just say "stated in" = the item for RKDimages or the item for RKDartists, and make sure the property is set in the external IDs. Jane023 (talk) 14:29, 3 October 2017 (UTC)
I agree with MisterSynergy and Jarekt (?) that statements from reliable or serious sources shouldn't be removed in general. I have encountered cases where a year-precision date from RKDartists (Q17299517) – which often has month- or day-precision dates – was more reliable than a more precise date from another source. --Marsupium (talk) 14:11, 3 October 2017 (UTC)

Please help create a list of useful spreadsheet functions for formatting data to import into Wikidata

Hi all

I'm slowly learning the ropes of Wikidata; one of the steepest parts of the learning curve is knowing what spreadsheet functions I need to transform the data I have (often copied from website text rather than a structured database) into something Wikidata can use.

I've started to put together a page of the most commonly used functions and where they are useful. I'm not sure where this would go, but it feels important to collate it for people new to Wikidata and spreadsheets.

User:John_Cummings/Useful_spreadsheet_processes_for_Wikidata

If any of you know any useful spreadsheet tricks to wrangle data for Wikidata please do add them.

Thanks

--John Cummings (talk) 14:14, 2 October 2017 (UTC)

@John Cummings: I've added things about OpenRefine in your page. It's not a spreadsheet software strictly speaking but it's designed exactly for this purpose. − Pintoch (talk) 09:56, 3 October 2017 (UTC)
@Pintoch:, perfect, thanks very much, I have been looking for some kind of solution to the 'select every Nth line' problem for ages, it's a real pain. --John Cummings (talk) 10:03, 3 October 2017 (UTC)

fragmented SCOPUS IDs for scientists

I am trying to improve Wikidata items about scientists, and this is the third time I have faced the same problem.

Basically, one of the most useful sources of IDs is Scopus, on which I can rely in the vast majority of cases for every established name (that is, more than 10 publications). Unfortunately, working in China, I can say that Scopus is 80% right with Euro-American names, but usually not precise with other people.

Here is the latest example, this guy: Mas Subramanian Q22096519. This is quite a simple case to discuss, where only 4 old articles are lost in the Scopus ID 24489671200, alongside a more complete main one.

Sometimes it gets worse; for example, this one is clearly a merge of two profiles, both inactive (the younger person is a school teacher now), so there is no hope of fixing them soon. But let's focus on the fragmented case for now.

In the end Scopus is not so bad; it is one of the best when it is updated, with a lot of useful graphs and summaries and the list of aliases, but it can make mistakes. A fragmented ID is in any case not the worst-case scenario, but it is quite common, especially for Chinese, Indian and Korean people.

I think it is good sense to assume that in the long term some "left-over profiles" will be merged, but often they have a lower number, and I have no idea what their policy is, whether they keep the older ID or the ID with more entries... and so I don't know what to do. Do I keep ignoring this and not insert Scopus IDs? Do I put both of them? Do I rely on future bots; do you know if they can handle this decently? Any previous experience?

I still think that creating some sort of automatic mail to be sent to their professional address (to be input manually) to inform them about the issue would be interesting. I am making some tests with people I know and they like Wikidata, so I think it could work. Anyway, at present, if I have time, I sometimes send a private email and wait; young researchers care, but big professors have better things to do.--Alexmar983 (talk) 05:14, 2 October 2017 (UTC)

@Alexmar983: I haven't worked directly with SCOPUS id's but I think the best approach here is to add both id's to the wikidata item if you are sure it's the same person, and then at some point you can do a query (or look at a constraint violations report) and send all the instances of multiple id's to SCOPUS to suggest they get them fixed. ArthurPSmith (talk) 15:49, 2 October 2017 (UTC)
Good point. Do you have any idea, ArthurPSmith, where to suggest this strategy? I mean, a guideline. I face this problem once a month; I'll never reach a critical mass soon enough to disturb Scopus by myself. Also, I know how a single author can contact Scopus: from here I find this form, which should be OK.
So we link them a query and suggest that they run it. But maybe I can contact them citing this discussion and ask about their policy of ID management. If we knew whether they make redirects, or always select the lower ID, we could run a bot more easily in the future.
I have to say they have a lot of mistakes here and there, but they are putting a lot of effort into improving them, and there is no big difference whether the notification comes from us or from a single scientist.--Alexmar983 (talk) 02:50, 3 October 2017 (UTC)
It would probably be helpful to leave a note on the talk page of Scopus Author ID (P1153) so others know what to do with this data, and sure let Scopus know where to look for possible duplicate records. ArthurPSmith (talk) 14:19, 3 October 2017 (UTC)
Property_talk:P1153#Handling_duplicates. Thank you ArthurPSmith, I will come back to write more in the following days; let's hear from the other Wikimedians.--Alexmar983 (talk) 06:01, 4 October 2017 (UTC)
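For reporting batches to Scopus, a constraint-style query along these lines could collect the items holding more than one Scopus Author ID (P1153). A sketch:

```sparql
# Sketch: items with more than one Scopus Author ID (P1153),
# candidates to report to Scopus as possibly fragmented profiles
SELECT ?item (GROUP_CONCAT(?id; separator=", ") AS ?ids) WHERE {
  ?item wdt:P1153 ?id .
}
GROUP BY ?item
HAVING (COUNT(?id) > 1)
LIMIT 100
```

The same list should show up on the property's constraint violations report if a single-value constraint is set.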

Inactive identifiers

Is there a way to mark identifiers that are no longer valid because the website is no longer active? Identifier Property:P3520 is no longer active; it would be nice to have the website status listed in a field. That way people will know it is not just down for them, or that there is a formatting error. --Richard Arthur Norton (1958- ) (talk) 14:14, 3 October 2017 (UTC)

Mark the formatter URL as deprecated, with a suitable qualifier for the reason. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:56, 3 October 2017 (UTC)
P.S. As indeed was done last May. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:57, 3 October 2017 (UTC)

New openstreetmap object property?

Hi, I am trying to find a practical solution for infobox maps so that they can be automatically positioned and zoomed. A related problem is how the template can decide whether an item should be drawn as a marker or a shape.

Basically, Kartographer can pull shapes and set position and zoom automatically if there is an OpenStreetMap object with a matching wikidata key. However, template or Lua code cannot check whether the key exists in OpenStreetMap, which means that in practice this still needs to be done manually before you can be sure that the map will work.

One way to solve this would be to store OpenStreetMap object information (per Overpass API: Permanent ID) in Wikidata items too, so the information could serve as an indicator of whether the OSM backlink exists.

So I am trying to get comments before I decide whether or not it is a good idea to make a proposal for a new OpenStreetMap property for this (more details here). And if it is not a good idea, I would like to hear what a good alternative route to solve the problem would be. --Zache (talk) 18:07, 3 October 2017 (UTC)

Hi, thank you very much for raising this problem here. I believe that such a new property is a not-so-good solution for a real problem: there should definitely be a way to allow Lua modules to get this data, but I am not sure that having the Wikidata-OSM mapping in both Wikidata and OSM is a good idea: we would have to maintain the two mappings and make sure that both of them are synchronized, and that would be quite a big effort for a very specific use case. Tpt (talk) 20:18, 3 October 2017 (UTC)
My guess is that this is a more general problem, even though I have a specific use case it affects. Anyway, a second example of the problem: if you are drawing maps using SPARQL queries (Example map), then your SPARQL result will be fine, but because of missing wikidata keys in OSM the map will fail. Because the data cannot be accessed using SPARQL, you cannot see where the error is. Again, you can check them one by one, but it is slow, and there is also the problem that if something changes in OSM then your map can fail and you again need to check things manually. Technically the solution to this would be SPARQL federation and all that stuff, but like access from Lua this is something which doesn't exist, and implementation can take years even once it takes off. However, you are correct that it takes effort to create two-way linking instead of a one-way link and to keep them in sync, but I think it is useful to do even so, because the work can be automated and it will make the data more accessible to us. --Zache (talk) 04:46, 4 October 2017 (UTC)
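As an illustration of the federation idea mentioned above, a query of this shape might check OSM backlinks directly. This is only a sketch: the Sophox endpoint URL and its osmt:wikidata predicate are assumptions, and WDQS would need the endpoint on its federation whitelist:

```sparql
# Sketch only: federated check of which Wikidata items have a matching
# OSM object, via an OSM SPARQL service (endpoint and osmt: vocabulary
# are assumptions here, not a tested setup)
SELECT ?item ?osm WHERE {
  ?item wdt:P31 wd:Q8502 .            # e.g. mountains
  SERVICE <https://sophox.org/sparql> {
    ?osm osmt:wikidata ?item .
  }
}
LIMIT 10
```

If federation of this kind worked reliably, a live check would make a duplicated property mapping unnecessary.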

Professional relations

There's a bunch of professional relations between people:

  • P184 doctoral advisor
  • P185 doctoral student
  • P737 influenced by (partly, range may be different from person)
  • P802 student
  • P1066 student of
  • P1327 professional or sports partner
  • P1775 follower

(to try to find more, see this query, but I don't know if it's possible to restrict it to range "person"):

select * {
  ?p wdt:P31 wd:Q18608871; wikibase:propertyType wikibase:WikibaseItem; rdfs:label ?label
  filter (lang(?label)="en")
}

Try it!
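One possible way to restrict the results to person-valued properties is to filter on each property's value-type constraint. A sketch only: it assumes the relevant properties carry a value-type constraint (Q21510865) with class (P2308) human (Q5), which not all of them do, so some person properties may be missed.

```sparql
SELECT ?p ?label WHERE {
  ?p wdt:P31 wd:Q18608871 ;
     wikibase:propertyType wikibase:WikibaseItem ;
     rdfs:label ?label .
  # keep only properties whose value-type constraint requires class human (Q5)
  ?p p:P2302 ?constraint .
  ?constraint ps:P2302 wd:Q21510865 ;   # value-type constraint
              pq:P2308 wd:Q5 .          # class: human
  FILTER(lang(?label) = "en")
}
```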

In the religious domain there is:

These are very narrow. They express the idea that X (or X&Y) promoted Z to some capacity. There's also "beatified" (proclaimed someone blessed), "coronated" (proclaimed someone a king), "knighted" (proclaimed someone a knight), etc. These could be related to an event (you may well want to capture when the promotion happened), whereas the examples on top are straight relations.

  • We have significant person (P3342), which is a generic mechanism that uses the qualifier "as" to specify the kind of relation.
  • We have relative (P1038), a subproperty to express "familial relation", used with the qualifier "type of kinship".

Questions:

  • Should we create a sub-prop of significant person (P3342) to express "professional relation"? (DNB's AGRELON ontology has 2 prop hierarchies for Familial and Professional relations)
  • Should we replace the above specific relations with this generic mechanism?

Otherwise, e.g. @Ergo Sum: is right to argue that "if consecrator (P1598) is allowed, we should allow co-consecrator"

--Vladimir Alexiev (talk) 17:38, 2 October 2017 (UTC)

The problem with condensing all these properties to a single one is that there is often a need to express several of these relationships in a single item and, specifically, the need to differentiate among the types of relationships. As just one example, en:Template:Ordination, which was the reason I proposed the two properties, calls several properties from different parameters that would be condensed into one under this proposal, rendering the template useless. Besides the structural reason for keeping them, it is clearer for editors and those retrieving information to know the type of relationship of an item to a value rather than just many relationships existing. Ergo Sum (talk) 22:24, 2 October 2017 (UTC)
Of course the relations will be differentiated, that's what the qualifier is for (see examples in significant person (P3342)). If Ordination is modeled as an Event, we should model consecrator, co-consecrator etc as roles in this event rather than direct relations from the beneficiary of the event --Vladimir Alexiev (talk) 07:30, 3 October 2017 (UTC)
@Vladimir Alexiev: Forgive me for not being intimately familiar with Wikidata; I am primarily a Wikipedia editor. So, are you suggesting that properties such as consecrator (P1598) and the proposed "elevated cardinal by" merely be treated as sub-properties of significant person? For example, the Francis (Q450675) item currently looks like Property: significant event (P793) → Value: consecration (Q125375) → Qualifier: consecrator (P1598) → Value: Antonio Quarracino (Q604515). How does your proposal differ? Ergo Sum (talk) 02:33, 4 October 2017 (UTC)

Irish-American

Irish-American has two meanings, a migrant from Ireland to the United States and someone whose ancestral home (ethnicity) is Ireland (like African-American). Should we restrict the use in the description field to migrants, and leave the other to the field "Ancestral home"? --Richard Arthur Norton (1958- ) (talk) 22:53, 2 October 2017 (UTC)

If there are two meanings then they should be two items with descriptions suitable to disambiguate.  — billinghurst sDrewth 23:06, 2 October 2017 (UTC)
The question was about the use of the phrase in descriptions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:01, 3 October 2017 (UTC)
Yes, it should be restricted to the usage for migrants. We usually don't (and shouldn't) include ethnicity or race in the short descriptions of modern people, and certainly not if the country of citizenship ("American") is clear. --Anvilaquarius (talk) 13:05, 4 October 2017 (UTC)

Can we use Wikipedia's citation mechanism?

I really like Wikipedia's new citation mechanism; it makes citing easy and provides good results. It works by making a call out to a citation bot used by universities. Could the same mechanism be added to Wikidata?

The best way to make people do the right thing is to make the right thing the easiest option!

Have you seen this page yet? m:WikiCite? -- Fuzheado (talk) 13:51, 3 October 2017 (UTC)
I'm not very sure if the IP that asked the question wanted a tool to extract data from wikidata, like in the citing mechanism on wp, or, on the contrary, a tool that would allow to inject data from libraries directly into wikidata items (i.e. input an isbn and the title, author, language, editor, date, etc. would automatically be added)...
I, for one, would be very interested in the second option ;) --Hsarrazin (talk) 14:16, 3 October 2017 (UTC)
Wikipedia's mechanism is called 'Citoid'; it uses a tool called Zotero, which you can use on your own computer, too. See WD:Zotero. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:59, 3 October 2017 (UTC)
If I understand the question correctly, I would suggest CiteTool from User:Aude, which adds citation metadata extracted from a URL. (GitHub project) − Pintoch (talk) 09:30, 4 October 2017 (UTC)
It would be a good idea to add Aude's CiteTool to gadgets. --Zache (talk) 09:56, 4 October 2017 (UTC)
Could you please explain how to use it? I don't see any button, or menu, or...? --Hsarrazin (talk) 10:25, 4 October 2017 (UTC)
Yes, now I remember being confused about that too!
  • Create a new reference
  • Input a URL with reference URL (P854)
  • An autofill button appears in the top bar of the reference
  • Click on that autofill button and wait for a few seconds
  • You should see new entries in your reference (retrieved date and title usually, you might get richer results with scholarly pages)
Pintoch (talk) 10:33, 4 October 2017 (UTC)

Redundant Wikipedia citations

Where we have two or more references on a statement, and one of them is "imported from [a] Wikipedia", and one of the others has, say, a DOI or URL, should we remove the Wikipedia citation? If so, what types of citations should trigger such a removal?

If there is consensus, I will request a bot to undertake this task. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:38, 3 October 2017 (UTC)

URLs alone are unreliable. What is the imperative to remove the "imported from" statements with a bot? I would like to see a primary or an authoritative secondary statement. I also see nothing wrong with eyeball assessment and removal at this time, and no need for bot removal yet.  — billinghurst sDrewth 21:18, 3 October 2017 (UTC)
See also #Delete less precise dates above. I think we really have to find a solution for this. There are already items with very cluttered sources. Often I don't remove imported from Wikimedia project (P143) references – it can take a lot of time – waiting for a future bot to do this. It's the right way in my eyes. We just need to define the criteria when to remove them. Another reference with the combination of stated in (P248), a (fitting) external ID and retrieved (P813) for example would be such a case for me. --Marsupium (talk) 07:34, 4 October 2017 (UTC)
personally, I use imported from Wikimedia project (P143) references to find the original source on wp, and import it in wikidata (using Drag'n'Drop gadget for instance), and THEN I remove the original imported from Wikimedia project (P143) claim... don't know if this could be more automatized, though :( --Hsarrazin (talk) 07:58, 4 October 2017 (UTC)
There are external sources that are better than Wikipedia, and ones that are worse. A bot can't tell the difference. Jc3s5h (talk) 11:30, 4 October 2017 (UTC)
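Marsupium's criterion above (another reference combining stated in (P248), a fitting external ID and retrieved (P813)) could be sketched in code. This is a hypothetical helper, not an existing bot: it operates on the JSON reference structure returned by the wbgetclaims API, where each reference has a "snaks" map from property ID to snak lists and each snak carries its "datatype".

```python
def is_strong_reference(ref):
    """A reference counts as 'strong' if it has stated in (P248),
    retrieved (P813) and at least one external-ID snak."""
    snaks = ref.get("snaks", {})
    has_external_id = any(
        snak.get("datatype") == "external-id"
        for claims in snaks.values()
        for snak in claims
    )
    return "P248" in snaks and "P813" in snaks and has_external_id

def redundant_wikipedia_refs(references):
    """Return the 'imported from' (P143) references that could be dropped
    because a strong reference sits on the same statement."""
    if not any(is_strong_reference(r) for r in references):
        return []
    return [r for r in references if "P143" in r.get("snaks", {})]
```

With no strong sibling reference present, the function returns nothing, so a bot built on this rule would leave lone Wikipedia imports untouched.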

Wikidata in Wikipedia

Hi, some time ago I translated the WikidataIB module from English Wikipedia for Portuguese Wikipedia, which allowed us, Portuguese-speaking wikipedians, to create new automatic infoboxes that take data directly from Wikidata. However, there has been a community debate about using Wikidata within Wikipedia and some points are persistent there, so I'm here to question about them, in hope that here I can obtain referrals or answers that help me to understand and answer these questions properly.

  • 1st: "There is little patrolling at Wikidata"
  • 2nd: "Pages (at Wikipedia) that transclude data from Wikidata take more time to load"
  • 3rd: "How is the process of (or who) watching if there's vandalism (even subtle)?"
  • 4th: If an article in a Wikipedia (like w:pt:Suga in ptwiki) is protected, allowing edits only by autoconfirmed editors, the information extracted from Wikidata remains vulnerable to vandalism. What should be done? Ask to protect the item on Wikidata? (If yes, is there an automatic way?)

If this is not the place, or there's a better place to ask these questions, please tell me. Good contributions! Ederporto (talk) 00:28, 4 October 2017 (UTC)

Hello Wikidata community. We are indeed having a vivid discussion on ptwiki. I have two more questions, please:
  • 5th: As far as being able to patrol edits made on Wikidata from Wikipedia goes, how is this being developed? Filters to only see from Wikipedia what matters on Wikidata edits appear not to be as good as they should be.
  • 6th: Could we integrate on Wikipedia the history of edits on Wikidata, so we can see from Wikipedia the evolution of information in the infobox?
I appreciate your effort! --Joalpe (talk) 02:53, 4 October 2017 (UTC)
Answers

4th: As a fiwiki admin, if a page is protected and it is vandalised using Wikidata as a proxy, then I would override the values from Wikidata with fiwiki's values, because this will remove the reason to vandalise Wikidata while fiwiki's page is protected. --Zache (talk) 04:55, 4 October 2017 (UTC)

@Ederporto: good questions. Recent Changes works the same here as in other wikis for patrolling and looking for vandalism. There is a lot of "bot" activity, but if you filter out that and changes from long-established users, the rate of changes in wikidata is not very great (less than 100 per hour needing to be patrolled). However, I think we do not currently have enough people to catch all vandalism. It would probably be helpful to organize some sort of patrol schedule so things don't slip through as much as they have been. Much is caught quickly though. Admins can protect pages and can block people if attention is brought to this - just post on the admin noticeboard here. There isn't an automated way for protected pages in other wikis to have that protection pushed through to wikidata, it might be something to propose to the developers. ArthurPSmith (talk) 14:04, 4 October 2017 (UTC)

Hey :) John and others are currently working on Wikidata:Wikidata in Wikimedia projects to give an overview. Let us know if that is helpful for you. --Lydia Pintscher (WMDE) (talk) 17:28, 4 October 2017 (UTC)

About Filters to only see from Wikipedia what matters on Wikidata edits appear not to be as good as they should be: my guess is that the New filters for edit review (in beta features settings) is the first place where progress with this will be seen. However, they need testing and feedback, so enable it in your home wiki and say what is good/bad and what needs to be fixed, etc. Their feedback page is here. --Zache (talk) 17:52, 4 October 2017 (UTC)

Vandalism

An IP user changed the EN label for island (Q23442) to the emoji of a palm tree - cute - and it stayed live for three and a half hours until I happened to stumble across it. Perhaps some tool should catch emoji used as labels? - PKM (talk) 07:06, 4 October 2017 (UTC)

Recent changes with Tag:Emoji. —MisterSynergy (talk) 07:30, 4 October 2017 (UTC)
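As a sketch of what such an automated check might look like (the codepoint ranges below are an illustrative subset of the emoji blocks, not a complete list, and this is not an existing Wikidata tool):

```python
import re

# A few common emoji blocks; a real filter would cover more ranges.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF"   # symbols and pictographs (palm tree is U+1F334)
    "\u2600-\u27BF"            # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF]"   # regional indicator symbols (flags)
)

def label_contains_emoji(label: str) -> bool:
    """True if the label contains at least one character from the ranges above."""
    return bool(EMOJI_RE.search(label))
```

A patrol query or abuse filter built on a pattern like this would have flagged the palm-tree label immediately instead of after three and a half hours.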

Terse descriptions

Is it a general rule to keep the description of a Q-value as terse as possible? I see some geographic entries as "city in Bergen County, New Jersey, United States" and others as just "city in New Jersey". Is it a rule to keep them to the minimum needed to disambiguate them from ones with identical titles? I can see expanding if it was "Smithtown: city in Bergen County, New Jersey" and "Smithtown: city in Morris County, New Jersey". How should I be harmonizing them? --Richard Arthur Norton (1958- ) (talk) 22:44, 1 October 2017 (UTC)

The descriptions as available are problematic as they are. It has been argued over the years that the automated descriptions are superior. This is acknowledged but it makes no difference. Same as with terse descriptions, they lack even more information to help disambiguate. Thanks, GerardM (talk) 04:14, 2 October 2017 (UTC)
@Richard Arthur Norton (1958- ): see Help:Description. Descriptions should be terse. Whether you write "city in Bergen County, New Jersey" or only "city in New Jersey" depends on whether there are multiple cities with the same name in New Jersey. So descriptions should be as short as possible but still disambiguate items with the same label. --Pasleim (talk) 23:34, 2 October 2017 (UTC)
Excellent, and I agree 100%. --Richard Arthur Norton (1958- ) (talk) 23:39, 2 October 2017 (UTC)
Some of the longer descriptions are created via Mix'n'Match, depending on what’s in the matched catalog. For example, the NARA places Thesaurus of Geographical Names catalog created Chehalis Indian Reservation (Q41522276) with the description “Grays Harbor, Washington, United States, North and Central America, World”. I trimmed that to “Grays Harbor, Washington, United States” and added “reservation in...”. - PKM (talk) 23:55, 2 October 2017 (UTC)
A recent comment on en-wp suggested that for some mobile applications, the descriptions should ideally be less than 36 characters -- it is not a shipwreck if they are longer than that, but the content after 36 characters may not be seen in some searches.
I certainly feel we could do with some more detailed guidance at Help:Description -- see for example my comment at 12:36, 20 September 2017 in this en-wiki thread (en:Wikipedia_talk:Wikidata/2017_State_of_affairs#Strategies_for_improving_the_descriptions), asking for guidance on what would be best practice for descriptions for eg civil parishes in West Sussex.
In some areas, eg people, we do have a format that is used quite widely. But in other areas there is much less clarity, and some discussion about what we want for best practice, and then efforts to improve what we have, would be useful. (Particularly if en-wiki are about to fork the lot -- it would be good to do our best to improve and consolidate descriptions here before that happens.) Jheald (talk) 21:53, 4 October 2017 (UTC)
The benefit of automated descriptions is that you can have multiple algorithms. The drawback of descriptions less than 36 characters is that they are useless for disambiguation purposes. Thanks, GerardM (talk) 05:13, 5 October 2017 (UTC)

Problem solving use of contributor(s) to the creative work or subject (P767) — incorrect classification of building (Q41176)?

I am looking at Wikidata:Database reports/Constraint violations/P767#Types statistics and I see for contributor(s) to the creative work or subject (P767) that it includes types that are patently incorrect

  • building (Q41176)
  • pharmacy (Q13107184)
  • church (Q16970)

it would seem to come about through the classification of building (Q41176) -> architectural structure (Q811979).

Could someone please sort out the class and instances? Thanks.  — billinghurst sDrewth 23:06, 2 October 2017 (UTC)

building (Q41176) -> architectural structure (Q811979) -> construction (Q811430) -> artificial physical object (Q8205328) -> work (Q386724). I think the claim that an artificial physical object is a subclass of a work is wrong. --Pasleim (talk) 23:24, 2 October 2017 (UTC)
Thanks. I agree with your thoughts, so I have been bold enough to remove that. @Pasleim: Is there an easy tool to see that framework or is it just a matter of backtracking?  — billinghurst sDrewth 00:31, 3 October 2017 (UTC)
There is Wikidata generic tree --Pasleim (talk) 09:44, 3 October 2017 (UTC)
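For backtracking a chain like this in a query rather than a tree browser, a property-path sketch along these lines would enumerate every superclass of building (Q41176):

```sparql
SELECT ?class ?classLabel WHERE {
  wd:Q41176 wdt:P279* ?class .   # building (Q41176) and all its superclasses
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Spotting work (Q386724) in the result set, one can then repeat with `wdt:P279` hops to find which link in the chain is the dubious one.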

Seeking alternate properties

Seeking feedback about alternate properties that could be used to describe people's involvement in the construction of a building, or a component. At Wikidata:Database reports/Constraint violations/P767 the report now shows components

If there is no better set of properties, and people feel comfortable that this is the set to use, then should we be looking to expand the constraints?  — billinghurst sDrewth 23:01, 4 October 2017 (UTC)

Food ingredients

I added a claim that Q27643250 is a subclass of food (Q2095), but this was removed by User:Infovarius with the question: "you sure that e.g. starch (Q41534) can be used in food directly?".

Given that we define food as "any substance consumed to provide nutritional support for the body; form of energy stored in chemical form", I think my edit was correct, but I thought it sensible to check whether anyone has a better way of modelling this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:48, 3 October 2017 (UTC)

Maybe User:Arwiniwra (created Q27643250) could clarify this? --Succu (talk) 19:21, 3 October 2017 (UTC)
I support the claim as a scientist. Some starches are not digestible, such as microcellulose, but they are still used as food ingredients to alter texture. Non-nutritive starches, which do not add calories but create the mouth feel of fat, are in great demand. --Richard Arthur Norton (1958- ) (talk) 01:11, 4 October 2017 (UTC)
E175, used as food ingredient (dye) is an extreme example. And I am pretty sure that gold (Q897) should not be classified as food. Mateusz Konieczny (talk) 19:32, 4 October 2017 (UTC)
I created Food Ingredient initially as a category that could hold all ingredients as listed on food product ingredient lists on product labels. Now I know the whole discussion of subclassing can be pretty complicated. Sometimes, it makes more sense to consider something a property of a food, and an ingredient could certainly be considered as such. An extreme example would be to say that a water molecule is a subclass of food - it’s certainly in there, but does it make sense as a subclass of food? Is a pillow or a cloth a subclass of a couch? On the other hand, fruit is perhaps a much more obvious subclass of food. So I am inclined to agree that food ingredient would not be a good subclass of food. More specific relationships would probably make more sense, such as a component / part-whole relationship type. I’ve been away from wikidata for too long to know what relationship makes more sense here, but at least I can certainly see objections against subclass in this case.

Relations vs Events

@Ergo Sum: Note that consecrator (P1598) gives this simpler example:

 Francis (Q450675) 
   consecrator (P1598) - Antonio Quarracino (Q604515)

That's what I call "pure relation" and my proposal here is to replace it with a generic approach:

 Francis (Q450675) -
   significant person (P3342) - Antonio Quarracino (Q604515)
   P794 (P794) - consecrator (Q4231232)

Notice that we use property Consecrator for the specific approach, but entity Consecrator for the generic approach. Wikidata has 10k properties (and that's already too much) but 30M entities, so adding entities is a lot better.

The actual value you cite doesn't comply with the property example:

 Francis (Q450675) 
   significant event (P793) - consecration (Q125375) 
   consecrator (P1598) - Antonio Quarracino (Q604515)

Such non-compliance is harmful because it confuses editors about which approach to use.

It's what I call an Event-based approach. Rather than saying "X is related to Y", you're saying "X and Y participated in an event, and Y had a certain role in that event". This approach is quite nice because you can also describe date, place etc. But you have failed to relate the event to this info about Francis:

 Francis
   position held - auxiliary bishop
   start time - 20 May 1992

Francis was affirmed in this position by the event. (I thought the position is the "outcome" of the event, but now I notice the event happened after he took the position, so it's a sort of "ceremonial affirmation" I guess).

This shortcoming aside, we can model the actor's role in the event in a generic way:

 Francis (Q450675) 
   significant event (P793) - consecration (Q125375) 
   event actor - Antonio Quarracino (Q604515)  
   P794 (P794) - consecrator (Q4231232)

(Maybe prop "event actor" already exists, I haven't looked). But this only works if there's one actor. If we need to model more participants (consecrator, co-consecrator; as you say "to express several of these relationships in a single item"), then we need to make the event an entity in its own right. This will also allow us to link it freely to other events:

 Francis
   position held - auxiliary bishop
   start time - 20 May 1992
   end time - 3 June 1997
   diocese - Roman Catholic Archdiocese of Buenos Aires
   affirmed by event (promotion event) - Consecration of Francis as auxiliary bishop
 Consecration of Francis as auxiliary bishop
   type - consecration (Q125375) 
   event recipient - Francis
   event actor - Antonio Quarracino
     as - consecrator
   event actor - Whoever
     as - co-consecrator
   point in time - 27 June 1992
   location - Buenos Aires Metropolitan Cathedral

--Vladimir Alexiev (talk) 08:00, 4 October 2017 (UTC)

@Vladimir Alexiev: Thank you for clarifying. I cannot see how creating a new item to correspond with every Catholic and Orthodox bishop (there are thousands of such items) would be helpful. Every Catholic and Orthodox bishop has consecrators and co-consecrators (as well as other relations related to the ceremony). This would involve duplicating a tremendous amount of information and gives rise to the very likely possibility that information could be discordant between the e.g. 'Francis' and 'Consecration of Francis' items. (As an aside, the consecration relates to the creation of a bishop as a bishop, not as a particular administrative position of a bishop like an auxiliary).
The initial, 'simple', method in which the information is still contained within one item would not then work either, because having one qualifier as 'event actor' and another as 'as' would not allow templates to call the information, which is one of the points of having structured, machine-readable information in Wikidata. Ergo Sum (talk) 21:42, 4 October 2017 (UTC)
Note: if possible, don't use "as", use subject has role (P2868) or object has role (P3831) to make explicit whether it is the subject of the statement or the object of the statement that was playing the role you are about to give. Jheald (talk) 21:40, 4 October 2017 (UTC)
@Jheald: Did you mean to write different Qs there? Ergo Sum (talk) 21:44, 4 October 2017 (UTC)
@Ergo Sum: Sorry, wrong template. Now fixed, should be clearer. Jheald (talk) 21:54, 4 October 2017 (UTC)

Compiling general guidance and tips on matching external databases into Wikidata (Mix n' Match etc)

Hi all

Do you have any tips for matching external databases to Wikidata using Mix n' Match etc?

I've been going over previous databases that I've imported and I'm definitely learning things as I go and getting better at it. Some of this is because I have more knowledge on the subjects, some more general rules and ways of avoiding problems further down the road.

I want to write this up as some general guidance so that our data matching process gets better, please brain dump here and I will put it somewhere sensible.

Thanks

--John Cummings (talk) 10:05, 4 October 2017 (UTC)

Make sure to include or generate descriptions that make matching easier. For example, for humans: gender, date of birth and death, etc. Sjoerd de Bruin (talk) 11:36, 4 October 2017 (UTC)
Good idea! It would be useful to have a part on what not to match, e.g. items for disambiguation pages with external disambiguations (there hasn't be certainty e.g. here). Also something on how to handle withdrawn identifiers, if to deprecate them or not. The only short discussion on this I remember is this one, but I think there have been more. Regards, --Marsupium (talk) 12:11, 4 October 2017 (UTC)
I'd say to try to match as much as possible with comparison against systematic queries and Quick Statements first, and only put things in to Mix'n'match as a last resort. But perhaps I'm being too pessimistic? How good is MnM at picking up obvious matches these days? And at not suggesting matches that are obviously not correct (eg born & died in quite different centuries) ? It is handy to have matched items in MnM, because it is very useful that they then show up in the MnM search function. (Which btw can be extremely useful when going down a list of items to check). I do prefer to do as much matching myself as I can first, because then I feel I have more control & I know what criteria are being applied. But what do others think? Jheald (talk) 21:36, 4 October 2017 (UTC)
Thanks @Jheald:, what strategies do you use when you have to do the matching by hand? --John Cummings (talk) 07:42, 5 October 2017 (UTC)
Hi John. It's mostly identifiers for geographical places I've been doing most recently -- I had quite a blitz in the spring. That has a huge advantage that you can match on coordinates being reasonably near, as well as an approx match on the names. I found a nice Perl library (yes, I *am* still basically living in the 90s...) for geospatial indexing ("Algorithm::SpatialIndex"), so I could load in say 20,000 UK places from Wikidata, and then run a similar sized set against it, all in a matter of a couple of minutes or so -- so that for each identifier from the set I was importing, I could get a list of the top 10 potential matches for settlements/parishes within a cut-off of about 8 km. Those I would put in a file, then on that file I'd run a second script to compare the names of the potential matches, stripping out spaces, punctuation, full stops, anything after a first comma or bracket, and standardising the most common elements that could be variable (eg 'upon -> on', 'saint' -> 'st', etc.), to give a list of refined matches.
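The name-standardisation step described above could look roughly like this. A sketch only: the substitution list is an illustrative subset, not the actual Perl script, and `normalise_place_name` is a hypothetical helper name.

```python
import re

# Illustrative subset of the variable elements mentioned above.
SUBSTITUTIONS = [(r"\bupon\b", "on"), (r"\bsaint\b", "st")]

def normalise_place_name(name: str) -> str:
    # drop anything after a first comma or opening bracket
    name = re.split(r"[,(\[]", name)[0].lower()
    # standardise common variable elements
    for pattern, repl in SUBSTITUTIONS:
        name = re.sub(pattern, repl, name)
    # strip spaces and punctuation
    return re.sub(r"[^a-z0-9]", "", name)
```

Under this scheme "Stratford-upon-Avon" and "Stratford on Avon" reduce to the same key, so they would count as a name match for the coordinate-filtered candidates.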
If there was any other information in the source (eg the county), I would also check that, and put on one side for a closer look anything that didn't match (was it due to a boundary change perhaps?). Also if there was more than one good match, I'd put that to one side (merge candidate? something more subtle going on?).
But if neither of those filters pinged, then I'd just upload the matches with QuickStatements. After that, next step would be a query to see if any of the WD items had been matched to more than one identifier -- again, something to investigate. Then, having taken a look at the ones needing more care, I'd take a look at how much of the identifier set there was left to do.
That's about as far as I took it (at least, as far as I have so far). I'm conscious that there are still quite a lot of places from eg Vision of Britain that are unmatched, and more Domesday places, and KEPN places, and I think even some Geonames places. It's quite likely that some are in, but without coords or a P31 (or, in some cases, without almost any statements at all) - so I have been trying to think how to find these (eg sometimes matching on name + county, or name + geonames), to try to uncover more items that are UK settlements, but not identified as such. Then I'll run the process above again.
The question arises as to how much you can do, in an organised controlled way like this, before that's it. Sometimes identifying a new match can reveal a whole class of potential items that more matches might be found in -- eg the bot imports from sv-wiki with no P31 -- which can give a few more clues. But probably the point comes when you just say, "right, everything else into MnM", or "everything else I'm just going to create an item for" -- eg User:Charles Matthews got to that stage with imports of DNB people and of Art UK people, and User:Multichill with artists, when importing items for Sum of All Paintings.
Going ahead and creating live items on Wikidata has its upsides and downsides. On the one hand, having unmerged duplicates running wild in Wikidata is a menace, because you don't know which statements may get put on to one item and not on to others -- links and information may get split between the two contenders in a quite unpredictable way. But on the other hand, one advantage of having the items here is that you then have items, on which you can put the information you know about them; and you can then extract it with queries, and in particular with Listeria, to produce auto-updating tracking lists. I've been particularly impressed with Multichill's Wikidata:WikiProject sum of all paintings/Creator no authority control, which produces a list of items for painters with works in major collections, that have no (or next to no) identifiers attached to them -- something they absolutely should have, given that they have work in a major collection. Two really neat things about these pages are (i) it's a Listeria list, so updates daily automatically with no further intervention required; and (ii) Multichill has coded it to an MnM search for each un-AC'd artist. This is hugely powerful, because it's easy to adapt the search in MnM -- eg lose the middle name; replace the end of forename or a surname or both with a wildcard (George -> Geo*); etc -- and then MnM will give you all people (particularly strong for artists) who match this name, summaries and/or links for the database entries for checking more details (dates, workplaces, work and family connections, etc), to match to all the other databases in MnM at once. Such an MnM search also very usefully gives an idea of who else may have a similar name, that you don't want your item to be confused with.
It is possible to create pages to drive links from unmatched/under-matched names to MnM searches without Listeria (and so without requiring an item here) -- for instance, for Art UK I occasionally update a set of pages like this one that do the same sort of thing, that I originally used to systematically go through the artists with the largest numbers of works on Art UK, that did not have an identifier RDK-artists, and look and see what I could find for them, going down the sorted list, opening 10 MnM search tabs in my browser at a time. It does have the advantage that an entry in the list doesn't absolutely have to have an item here yet, an old-fashioned 'redlinks' page like this can still work. But having now seen Multichill's Listeria-powered pages, with their fully automated update, I'm now very jealous, and thinking quite soon I'll need to explore how much of the functionality of the Art UK tracking pages can re-created with Listeria...
Sorry, a bit TL;DR - but some ideas, anyway. Jheald (talk) 15:12, 5 October 2017 (UTC)

@John Cummings:, I find this a big and interesting subject: thank you for trying to document it. There are a few remarks that apply quite generally, I think. Being sure about matches is hard, in the worst case. I tried out a training exercise this summer on it, and that only reinforced the conclusion.

Matching is usually done by circumstantial evidence, and that usually works. So, on average, one can apply plausibility. That will sometimes be wrong, and even professional historians can fall into the traps. Wikipedians will be horrified to understand that verifiability doesn't really apply here, or in other words a fair measure of "original research" is behind the matching process.

Therefore, people should be advised to skip freely, and aim to take the low-hanging fruit in the matching. This works until it is nearly all gone, and mix'n'match can then cycle through more-or-less tricky cases. (At least that used to be the case: Magnus may now have changed the pseudorandom generation of what you get. I could expand on this point.)

My main point is that wanting to match a dataset completely requires a rather different set of techniques from just wishing to add numerous statements to Wikidata. I've done both in my time, and the former is quite strenuous. Charles Matthews (talk) 15:53, 5 October 2017 (UTC)

@Jheald:, @Charles Matthews:, thanks very much both for your thoughtful responses, I'm going to work on compiling this in the next month or so. --John Cummings (talk) 16:38, 5 October 2017 (UTC)

How to efficiently check the existence of many Wikidata items?

I have a list of thousands of QIDs, and must check whether they are correct QIDs.

To do so, I am thinking of calling https://www.wikidata.org/w/api.php?action=wbgetentities&props=info&ids=Q42455342&format=json etc for each QID.

Is there a more efficient solution, for instance to check a hundred QIDs in a single request?

By the way, does this check make sense? Or are all QIDs from Q1 to the most recent QID created "valid"? Can you think of another check that would make more sense and be efficient?

Thanks! Syced (talk) 09:13, 5 October 2017 (UTC)

The ids parameter in your API request can take up to 50 values; separate the values with |. --Pasleim (talk) 10:00, 5 October 2017 (UTC)
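A minimal Python sketch of the batching Pasleim describes, assuming the standard Action API endpoint. The helper names (`batched`, `request_urls`) are mine, not part of any library; this only builds the request URLs, it does not perform the HTTP calls:

```python
# Hypothetical helpers: build wbgetentities request URLs in batches of
# up to 50 QIDs, joining the ids with '|' as described above.
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def batched(qids, size=50):
    """Yield successive chunks of at most `size` QIDs."""
    for i in range(0, len(qids), size):
        yield qids[i:i + size]

def request_urls(qids):
    """One API URL per batch of up to 50 QIDs."""
    urls = []
    for chunk in batched(qids):
        params = {
            "action": "wbgetentities",
            "props": "info",
            "ids": "|".join(chunk),
            "format": "json",
        }
        urls.append(API + "?" + urlencode(params))
    return urls

# 119 dummy QIDs fit into 3 requests instead of 119:
urls = request_urls(["Q%d" % n for n in range(1, 120)])
print(len(urls))  # 3
```

The '|' is percent-encoded by urlencode, which the API accepts. In each JSON response, an ID that does not exist comes back flagged as a missing entity, so validity checking is then a matter of scanning the parsed result.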
You can expect regular items (valid), redirects (probably valid), and deleted items (not valid). The query service might help you to do it with a very small amount of requests, but since I cannot find the list of Q-IDs easily, I can’t really test it now… —MisterSynergy (talk) 10:18, 5 October 2017 (UTC)
Found it. Something like this should work:
SELECT ?item ?type WHERE {
  VALUES ?item { # list of items to test, with wd: prefix, separated by spaces only
    wd:Q10000136 wd:Q1000257 wd:Q1000321 wd:Q1001543 wd:Q1009423 wd:Q1011703 # ... and so on; just extend the list here
  }
  { # blue links
    ?item wikibase:statements [] . 
    BIND('regular' as ?type) .
  } UNION { # redirects
    ?item owl:sameAs [] . 
    BIND('redirect' AS ?type) .
  } UNION { # red links
    MINUS { ?item owl:sameAs [] }
    FILTER NOT EXISTS { ?item schema:version [] }
    BIND('invalid' AS ?type) .
  }
} ORDER BY ASC(?type)
Try it! You have to put all your Q-IDs into the third line. Works with “thousands of QIDs”, but there may be a limit which I am not aware of. —MisterSynergy (talk) 10:41, 5 October 2017 (UTC)
@Syced: Perhaps PetScan can help with this. Just paste the list of QIDs in the "Manual list" field under "Other sources" (and fill in wikidatawiki in the next field). An example test here. Items Q40999999 and Q41000001 exist, while Q41000000 doesn't exist and gives a Page ID of "0". Don't know how well it scales, but I know the "manual list" field can take a lot more than 50 items at a time. Jon Harald Søby (talk) 12:32, 5 October 2017 (UTC)
Petscan would be my first choice too... just paste the list...quick and easy ;) --Hsarrazin (talk) 16:37, 5 October 2017 (UTC)

Global ban RfC on user INeverCry

Please share & contribute to: m:Requests for comment/Global ban of INeverCry. --78.53.71.61 11:11, 5 October 2017 (UTC)

Strange anonymous RfC started by IP. --Jarekt (talk) 13:23, 5 October 2017 (UTC)
Unusual indeed Jarekt, though it is explained in the RFC. It is a requirement that where such an RFC is undertaken, all communities where the account is active are notified, and in this case I believe the notification here should be regarded as a formality.  – The preceding unsigned comment was added by Billinghurst (talk • contribs) at 5. 10. 2017, 13:58 (UTC).

Partial merge

I was amazed: half of an item didn't move. How is it possible? --Infovarius (talk) 12:09, 5 October 2017 (UTC)

A glitch: you can explicitly ignore some fields blocking the merge. (Didn't I already answer this to you?) Matěj Suchánek (talk) 16:27, 5 October 2017 (UTC)

Wikidata:Copyright_rules (AKA - is Wikidata CC0 in Europe?)

Commons has a nice page explaining how copyright affects commons ( https://commons.wikimedia.org/wiki/Commons:Copyright_rules ).

Is there an equivalent for Wikidata? Database licensing is tricky, especially given the major differences between the EU and the USA. From what I have observed, it seems that Wikidata is using USA rules, which allow copying not only public-domain databases but also licensed ones. As a consequence, Wikidata would not be CC0 in the EU. Mateusz Konieczny (talk) 21:59, 1 October 2017 (UTC)

The comment above may be hilariously wrong, which is why I am looking for something that would allow me to verify my understanding of the situation. Mateusz Konieczny (talk) 22:00, 1 October 2017 (UTC)
See also Wikidata:Copyright and m:Wikilegal/Database_Rights; but if you have special insight into database licensing, feel free to pontificate. Slowking4 (talk) 03:14, 2 October 2017 (UTC)
Please be aware that Mateusz is citing the above discussion as evidence that "people editing Wikidata are unconcerned about copyright". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:39, 6 October 2017 (UTC)
Given that it is apparently not documented anywhere whether Wikidata is CC0 under USA or EU rules, and there is no good explanation of what kind of databases can be imported, I think it is a not entirely incorrect conclusion. Mateusz Konieczny (talk) 15:18, 6 October 2017 (UTC)

Help:QuickStatements

We started Help:QuickStatements page with User:Syced, documenting QuickStatements tool by @Magnus Manske:. Please read, correct, expand. --Jarekt (talk) 17:22, 4 October 2017 (UTC)

Nice logo! Jane023 (talk) 10:30, 5 October 2017 (UTC)
Thanks, I hope others use it, so when I use it in a template, people will know what it stands for. --Jarekt (talk) 13:09, 5 October 2017 (UTC)
Great, this was missing for a long time! One could mention https://tools.wmflabs.org/ash-dev/wdutils/csv2quickstatements.php, I haven't used it yet, but it looks useful. --Marsupium (talk) 10:42, 5 October 2017 (UTC)
I added a "See also" section with a link to it. I am missing the point of this tool a bit, since once you have the data in a spreadsheet you can just cut and paste from it. But maybe someone will find it useful. I am hoping that when others edit the page I might learn something new. --Jarekt (talk) 15:30, 5 October 2017 (UTC)
Some really common problems should be highlighted:
  1. strings and derived datatypes (external-id, files, etc.) need quotes ("") around them
  2. monolingual strings also need a language prefix
Matěj Suchánek (talk) 16:38, 5 October 2017 (UTC)
Matěj, I thought the current version has that, but if you want to emphasize it more, please do. Also, can someone help with the batch_mode section? I also started FAQ and best practices sections that should be looked over. --Jarekt (talk) 17:18, 6 October 2017 (UTC)
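To illustrate the two quoting rules mentioned above, here is a small Python sketch. The helper `qs_value` is hypothetical, only meant to show the expected value format; real QuickStatements V1 input also needs TAB-separated columns:

```python
# Hypothetical formatting helper, only to illustrate the quoting rules
# for QuickStatements values; it is not part of QuickStatements itself.
def qs_value(value, language=None):
    """Format a QuickStatements value field.

    Items (Q...) stay bare; strings and derived datatypes get double
    quotes; monolingual strings additionally get a language prefix.
    """
    if value.startswith("Q") and value[1:].isdigit():
        return value                          # item: no quotes
    if language:
        return '%s:"%s"' % (language, value)  # monolingual: lang + quotes
    return '"%s"' % value                     # string/external-id: quotes

print(qs_value("Q42"))                  # Q42
print(qs_value("0000 0001 2345 6789"))  # "0000 0001 2345 6789"
print(qs_value("Douglas Adams", "en"))  # en:"Douglas Adams"
```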

How to add a Wikipedia article to a Wikidata item?

This might sound obvious, but for someone trying to do it for the first time, how do you do it? The how-to has screenshots that look like nothing I am seeing (I see no table). I have tried to use the tours, but the first one won't launch (yes, I reported it on the talk pages a week or so ago and nobody replied).

My specific task is to move the links to the two Swedish Wikipedia articles between Q18208207 and Q18208207 (they are assigned to the wrong items). The ceb Wikipedia article is also attached to the wrong Wikidata item. Thanks Kerry Raymond (talk) 03:43, 6 October 2017 (UTC)

between Q18208207 and Q18208207 – you mention the same item twice. Help:Sitelinks should help you. Matěj Suchánek (talk) 07:00, 6 October 2017 (UTC)
I sorted out the specific problem Kerry mentioned at Boyne River (Q18208207) and Boyne River (Q31843660). But I think the general frustration was that the documentation was not sufficient for Kerry to learn how. --99of9 (talk) 07:14, 6 October 2017 (UTC)
This help page seems indeed outdated. --Anvilaquarius (talk) 07:05, 6 October 2017 (UTC)

Diplomatic mission to an international body: what property?

In what property should I put United Nations (Q1065) at Permanent Observer Mission of the Holy See to the United Nations (Q11782064)?

Usually diplomatic missions (such as embassies) use located in the administrative territorial entity (P131), but it is not the right choice here as United Nations (Q1065) does not have jurisdiction over the place where its headquarters happen to be. The same would apply for the United States Mission to the European Union (Q23541051).

Related: How to describe France in the Embassy of Swaziland in France (Q16303712) which happens to be in Belgium?

How about operating area (P2541) maybe?

Thanks! Syced (talk) 04:31, 6 October 2017 (UTC)

Journal de Bruxelles links

This is a 1799/1800 journal; see the Commons categories. I have also put one issue on the French Wikisource and am busy transcribing the next one. How does one link this at two levels: the Journal de Bruxelles and the individual issues and pages? Be careful, as there are several journals using the same name. Smiley.toerist (talk) 08:07, 6 October 2017 (UTC)

SHARE Virtual Discovery Environment project

There is a project called SHARE Virtual Discovery Environment project working on integrating library data with the semantic web. Here is an example of how they display data related to Mary Shelley. The Wikidata logo displayed on the page is a link to Shelley's Wikidata item. YULdigitalpreservation (talk) 14:54, 6 October 2017 (UTC)

Doubble Wikinews-items

Wikinews NL has been reopened after a while as an Incubator project, but something went wrong in creating Wikidata items. There are now several duplicate items for the same articles. Can anyone help remove the items on Wikidata without a link to Wikinews? An example: double items. I have no idea how many duplicates there are. --Livenws (talk) 09:10, 12 October 2017 (UTC)

We can make a bot for cleanup. I will try to make one. Meanwhile, imports of new items for nlwikinews should stop. Matěj Suchánek (talk) 10:21, 12 October 2017 (UTC)
Ok, Thank you for helping to clean up! --Livenws (talk) 10:51, 12 October 2017 (UTC)
This section was archived on a request by: Matěj Suchánek (talk) 15:28, 12 October 2017 (UTC)

Distributed game: Pasleim project merge — is this what we truly want?

I just had a look at the gamification of Pasleim's merge stuff. I had a play [2], and found out that it defines merge as moving all the links from one item to another and deleting them from the initial item, rather than a more traditional merge as we do using the javascript. So we strip data and leave an empty shell of an item without the ability to merge it easily. Presumably any linked items will remain on the original item. To me that just seems wrong. Is that the result that we really want?  — billinghurst sDrewth 03:51, 4 October 2017 (UTC)

woah... I had a pass at it also, but did not check the "merge" since I mostly skipped and hit "different" ... Is it not really a "merge" ? --Hsarrazin (talk) 07:54, 4 October 2017 (UTC)
I'm not the programmer of this game, but I can see that the merging is done in the canonical way, i.e. if there is conflicting data, that chunk of data is not edited and no redirect is created. In the above example there were conflicting descriptions. It is only the merge gadget which trashes conflicting descriptions for the sake of creating a redirect; direct API requests, QuickStatements and Special:MergeItems don't do this. There is also an open phabricator ticket dealing with this issue. --Pasleim (talk) 09:17, 4 October 2017 (UTC)
Seems to be pretty weird: somebody merged a footballer into a baseball player, although there were sitelinks to both in enwiki and eswiki, and the 'Game' let him. - Brya (talk) 10:41, 4 October 2017 (UTC)
Users should be careful and avoid wrong decisions. Whether the redirect is created or not does not matter; they will leave a mess anyway.
Wikidata:Administrators' noticeboard#Why haven't these games been blocked yet? Matěj Suchánek (talk) 18:46, 4 October 2017 (UTC)
Of course that is correct, Matěj Suchánek, however that is not helpful. Users will make wrong decisions. That the system doesn't show or otherwise flag the failed merge is clearly our problem. That an item is stripped and becomes unrecognisable for a merge is our problem. For that we should not be blaming the poor person who uses a tool where they believe that they are helping, and cannot even see that the merge failed. Our system, one of our tools, our responsibility; let us not blame the users.  — billinghurst sDrewth 03:49, 7 October 2017 (UTC)

Councils are organisations, not places

English metropolitan district council (Q19414242), a type of organisation, is listed as a subclass of local authority (Q837766), which is subclass of political territorial entity (Q1048835), a territorial entity. How should this be resolved? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:38, 4 October 2017 (UTC)

For parish council (Q7137435) we have subclass of (P279) assembly (Q1752346) and applies to jurisdiction (P1001) civil parish (Q1115575)
But a council (at least, one with more substance than a parish council) is an authority, not just an assembly.
English unitary authority council (Q21561328) is subclass of (P279) local authority (Q837766) and English unitary authority council (Q21561328) and legislature (Q11204)
Not sure whether these are good models.
Sjoerddebruin (talkcontribslogs) has changed local authority (Q837766) to be subclass of (P279) corporation (Q167037), but I am not sure that is quite right either -- corporations typically aren't answerable to an assembly.
So: not sure. Jheald (talk) 20:52, 4 October 2017 (UTC)
Definitely not a corporation when you look at its parents. Why not link it straight to separate legal entity (Q7451779)?  — billinghurst sDrewth 23:17, 4 October 2017 (UTC)
The German article on local authority (Q837766) states that it is a special kind of corporation (Q167037), which has a legal power over a territory. If I understand correctly, local authority (Q837766) relates to the corresponding administrative territorial entity (Q56061) like government (Q7188) relates to country (Q6256). I am not sure if every local authority (Q837766) is a juridical person (Q155076). Ahoerstemeier (talk) 09:02, 5 October 2017 (UTC)
There are no clean cuts here. I think we have to either live with the mess or give up the idea of describing these things with broad subclasses. Creating new items and new class structure every time something in the law on local authorities changes is not easily done either. -- Innocent bystander (talk) 09:15, 5 October 2017 (UTC)
Note that "corporation" specifically states that it is a subclass of private company (Q5621421), so that cannot apply, hence why I suggested linking to the same parent as corporation.  — billinghurst sDrewth 11:07, 5 October 2017 (UTC)
Being bold, I removed "private company" from "corporation" as the English description for corporation was broader than private.  — billinghurst sDrewth 03:43, 7 October 2017 (UTC)

Since update, lost cursor focus in field

Since the update this week, after I manually select a property the cursor is no longer focused and active in the field for the item. Instead I am needing to click into the field. I am wondering whether others are seeing that behaviour? (I'm using Firefox). If that is confirmed, I hope that we could resolve that for next week's release.  — billinghurst sDrewth 11:26, 5 October 2017 (UTC)

Same here on Firefox and Chromium. --Marsupium (talk) 11:45, 5 October 2017 (UTC)
Here too. It's totally ruining my flow. Another thing that changed is that when you tab on from the field to input an item (or string or whatever), the "Add qualifier" link is highlighted, but when I press Enter it saves the statement instead of adding a qualifier. Jon Harald Søby (talk) 12:20, 5 October 2017 (UTC)
Oh, that happened to me as well, now that I think of it.  – The preceding unsigned comment was added by Billinghurst (talk • contribs) at 5. 10. 2017, 12:40 (UTC).
Yes, I confirm too (in Firefox)... worse, as I use shortcuts to add statements (a) and labels (l), these letters trigger the shortcut even when typed in a field... I had to stop editing directly and de-activate shortcuts to be able to type values :(( --Hsarrazin (talk) 16:35, 5 October 2017 (UTC)
same here (Firefox), my AutoHotKey macros don't work anymore, very annoying --Anvilaquarius (talk) 07:03, 6 October 2017 (UTC)
Argh! Sorry. We are looking into it but have issues pinning it down. It'd be great if you could help clarify it in the ticket. Thiemo is posting his findings/questions there in a few minutes. --Lydia Pintscher (WMDE) (talk) 16:56, 5 October 2017 (UTC)
Added my workflow to the ticket as well. - PKM (talk) 23:01, 5 October 2017 (UTC)
Noting that the issue has been identified and has been put into the fixing queue.  — billinghurst sDrewth 03:39, 7 October 2017 (UTC)

Chrome change

In Chrome, when I add any field like "gender" to a biographical entry, it no longer tabs over to the data field; I now have to click my cursor on the field, or my typing does not get recorded. Did something change at Wikidata, or at Chrome, or am I imagining that I never had to do this before? --Richard Arthur Norton (1958- ) (talk) 17:51, 6 October 2017 (UTC)

Sounds like this issue discussed above. YULdigitalpreservation (talk) 18:23, 6 October 2017 (UTC)

Commons category (P373) and coordinate location (P625) external links have disappeared

While I didn't change my preferences, the links on Commons category (P373) and coordinate location (P625) have disappeared, which is really annoying. Is anyone else having the same issue? — Ayack (talk) 18:17, 6 October 2017 (UTC)

Yes, me too, unfortunately. --Marsupium (talk) 18:38, 6 October 2017 (UTC)
This happens randomly to me. It normally seems to be a caching issue in my web browser - on a Mac using Firefox doing apple-shift-R normally clears it. Thanks. Mike Peel (talk) 22:00, 6 October 2017 (UTC)
Thanks, but I’ve already done that, without success. I’ve tried several OSes (macOS/iOS/Windows) and browsers (Safari/Firefox/Chrome/Brave); it no longer works. — Ayack (talk) 06:43, 7 October 2017 (UTC)

Property:P2736

Biographical Directory of Federal Judges ID (P2736) is for the URL https://www.fjc.gov as far as I can tell, but the property links to the Wayback Machine now. It looks like they renumbered their database. Cooper, Frank is our Q5485947‎. --Richard Arthur Norton (1958- ) (talk) 02:15, 7 October 2017 (UTC)

  Done: updated URL formatter  — billinghurst sDrewth 04:06, 7 October 2017 (UTC)

Should sitelink de:"Betriebswirtschaftslehre" be better linked from item "business economics" (instead of current "business administration")?

According to its description, "application of economic theory to business", business economics (Q24208053) seems to be much better suited to the German academic discipline "Betriebswirtschaftslehre" than business administration (Q2043282), which is described as "process of managing a business or non-profit organization". That would match the usage in STW Thesaurus for Economics (Q26903352), too.

Since I have no experience in dealing with sitelinks, I'm asking here, if there is a process for discussing and making such changes. One part of the problem is that I (and probably most other users) have no idea what the Wikipedia page titles in most of the interlinked languages may mean:

Help very much appreciated - Jneubert (talk) 08:52, 6 October 2017 (UTC)

I just checked Google Translate, and it thinks "Master in Business Administration" = "Master in Betriebswirtschaft"
But really I think one needs to look at the content of the German wiki page, and then the pages in the most developed language wikis for the various possible links, and see what seem the most similar. Jheald (talk) 10:32, 6 October 2017 (UTC)
The best place for a discussion of where the page should link to is the talk page of the article on dewiki. ChristianKl (talk) 15:39, 7 October 2017 (UTC)

number names

Hi, can somebody use a tool to add labels (from sitelinks) and descriptions to the following items:

billion (Q41650356), trillion (Q41650357), Q41650358, Q41650359, Q19929310, Q41650361, Q41650362

Background you can find here. Greetings Bigbossfarin (talk) 08:57, 6 October 2017 (UTC)

Why have you done this massive change without waiting for some feedback? Some articles have a name as their title but actually deal with a specific number, so each article needs to be checked beforehand. Andreasm háblame / just talk to me 01:30, 7 October 2017 (UTC)

Opensofias
Tobias1984
Micru
Arthur Rubin
Cuvwb
TomT0m
Tylas
Physikerwelt
Lymantria
Bigbossfarin
Infovarius
Helder
PhilMINT
Malore
Nomen ad hoc   Notified participants of WikiProject Mathematics Andreasm háblame / just talk to me 01:34, 7 October 2017 (UTC)

@Andreasmperu: Hi, I had a discussion with Infovarius eleven days ago about this topic and did a big analysis of what the Wikipedia articles are about. Five days ago I described my proposal at Interwiki conflicts and waited four days for feedback. Please tell me what you think about it and suggest improvements. --Bigbossfarin (talk) 14:15, 7 October 2017 (UTC)

Dams and reservoirs

Do we have a property to indicate the relationship between a dam and its reservoir? I would expect “impounds” /“impounded by” but perhaps something more generic can be used? - PKM (talk) 02:59, 5 October 2017 (UTC)

I often use located in or next to body of water (P206) in the dam item, but I'm not sure this is the best property, since it could also be used by the river. --Fralambert (talk) 03:26, 5 October 2017 (UTC)
Reservoir -> has part-> dam, qualified with has role -> dam? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:05, 5 October 2017 (UTC)
I’ve modeled Santa Fe Flood Control Basin (Q41680235) using part of/has part, and I’m thinking about whether I am happy with that. The proposed property “contains” might be better for a park/recreation area inside a flood control basin (we do that a lot in California). - PKM (talk) 02:14, 7 October 2017 (UTC)

I personally think it should be dam > has part > reservoir. Because the reservoir is created because of the dam's existence, the same reason why most reservoir+dam topics on Wikipedia are covered under the Dam's title rather than the reservoir (i.e. Victoria Dam instead of Victoria Reservoir). But I'm not sure if that is the best way for Wikidata... Rehman 08:22, 8 October 2017 (UTC)

Wikidata in Siri

It looks like Siri was answering 'What is the anthem of Bulgaria?' with 'Despacito' for a while, and a number of outlets picked up on that.

Whereas the reporters were quick to check Wikipedia's history for vandalism, they seem not to know about, or not yet be able to check, Wikidata:

Curious to see how quickly this skill will spread. --Denny (talk) 21:01, 5 October 2017 (UTC)

Deplorable vandals. How could they miss the opportunity to use Never Gonna Give You Up (Q57)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:13, 5 October 2017 (UTC)
while this is funny, it looks to me like the wikidata edit happened *after* the news reports of Siri getting confused, not before. What the source of the confusion is I have no idea, but I don't really think it's us. ArthurPSmith (talk) 23:59, 5 October 2017 (UTC)
The item has been vandalized three times: [3] (Sep 21–Oct 1), [4] (Oct 5), and [5] (Oct 5). The second and third occasion were reverted rather quickly, but the first time it took 10 days to remove the vandalism. —MisterSynergy (talk) 04:59, 6 October 2017 (UTC)
And now there are trolls: [6]. Item is semi-protected for a month now. —MisterSynergy (talk) 06:45, 6 October 2017 (UTC)

I wish there was a way to watch all changes to the anthem property over all items. My assumption is that it is likely that the trolls will simply deflect to other similar patterns. --Denny (talk) 14:47, 6 October 2017 (UTC)

I was just told that I can, using Listeria: https://en.wikipedia.org/wiki/User%3ADenny%2FAnthemes -- that is cool! :) --Denny (talk) 15:06, 6 October 2017 (UTC)
I’m already watching for such patterns, since there is a Reddit thread about this incident and they have realized that this information came from Wikidata. However, there aren’t any unpatrolled edits to anthem (P85) claims since yesterday evening at the moment. The reCh tool is useful for filtering recent changes of new users… —MisterSynergy (talk) 15:07, 6 October 2017 (UTC)
@Denny: I just added to the reCh tool a function to filter unpatrolled changes by property ID. For that, select in the tool first the option to show only edits on claims, in the next step you can filter it down to property IDs. However, there is only one unpatrolled edit left with anthem (P85), so expect long loading times but few results. --Pasleim (talk) 16:05, 8 October 2017 (UTC)
New feature of the month, IMO. Thanks a lot! MisterSynergy (talk) 16:15, 8 October 2017 (UTC)
@Pasleim: that is awesome, thank you! --Denny (talk) 17:25, 8 October 2017 (UTC)

Problematic edit by Jura1 - duplicate creation

There is https://en.wikipedia.org/wiki/Prizren_(disambiguation) which at https://en.wikipedia.org/wiki/Prizren_(disambiguation)#Interwiki has a link to Q22149442. What then is the reason to create https://www.wikidata.org/wiki/Q41493160 ???? 77.180.123.82 03:12, 6 October 2017 (UTC)

Obviously, that wasn't the correct way to create interwiki links. Matěj Suchánek (talk) 07:01, 6 October 2017 (UTC)
Matěj Suchánek, should User:Jura1 be blocked for that misedit? 77.179.250.59 14:47, 6 October 2017 (UTC)
No, we block people for vandalism not for mistakes. --Jarekt (talk) 18:02, 6 October 2017 (UTC)
When someone makes an edit, start a discussion with them, resolve misunderstandings. Assume good faith.  — billinghurst sDrewth 03:38, 7 October 2017 (UTC)
user:billinghurst - why would that be applied here, while for other users it was not? 85.181.108.182 03:07, 8 October 2017 (UTC)
Now that is just being argumentative. AGF exists and should be generally applied. Blocking should be applied mindfully, not punitively.  — billinghurst sDrewth 03:23, 8 October 2017 (UTC)
What I meant that was wrong is this. Matěj Suchánek (talk) 14:21, 7 October 2017 (UTC)
User:Matěj Suchánek - why was that wrong? The article space page is named "Prizren (disambiguation)" and the Wikidata item Q22149442 has label "Prizren" and description "Wikimedia disambiguation page". And User:Pasleim merged the wrong item created by User:Jura1 there. 85.181.108.182 03:06, 8 October 2017 (UTC)
Do other disambiguations link to Wikidata like this? Matěj Suchánek (talk) 14:17, 8 October 2017 (UTC)

How do we identify weird-looking "green" items from constraint reports?

I am looking at the report Wikidata:Database reports/Constraint violations/P767 for contributor(s) to the creative work or subject (P767), and in the "Type statistics" report I see numbers of accepted/green types that are edge cases for the property. How does one easily find some of these items, to backtrack and review the data? [Trying to use "What links here" is too hard.] Thanks.  — billinghurst sDrewth 03:59, 7 October 2017 (UTC)

SPARQL? --99of9 (talk) 12:14, 7 October 2017 (UTC)
That may be possible, but I doubt that I could ever sparkle. I was looking for something that I can do.  — billinghurst sDrewth 12:31, 7 October 2017 (UTC)
Adding links to details list like that in constraint reports has been discussed in the past but dismissed as it would have them grow too much while some are already too heavy to be manageable. So SPARQL is still the way to go. Try this. You can just change the type you want to investigate in the left pane. You'll see you can sparkle ;-) --Nono314 (talk) 13:51, 7 October 2017 (UTC)
Thanks Nono314 (talkcontribslogs), I will see how I go. Do you think that having the sparql quer(y|ies) built into the constraint reports would be possible or useful? (I understand that writing streams of reports for good data is not a sustainable idea.)  — billinghurst sDrewth 02:35, 8 October 2017 (UTC)
It would definitely be possible, and I think also useful in use cases like yours, or even for handling actual violations without having to comb through all of them looking for a specific type. The single point of concern is avoiding side effects that would have a negative impact on the larger reports. My idea would be to use a template for rendering each row of the type statistics table, but I have no idea whether transcluding it many times would break the rendering limit for the page. --Nono314 (talk) 15:23, 8 October 2017 (UTC)

Next Wiki Science Competition and P18 of scientists

Hi, in the framework of the next Wiki Science Competition I am revising the workflow of the previous years in order to improve the interface with the Wikimedia platform. We can rely on the wiki-expertise of some of the jurors at the international and national levels, when present, but this is only one possible pathway. We expect thousands of pictures, some of them very specific, so we must act to reduce the backlog.

First of all, I have informed the Commons village pump, some Wikiversities and Wikipedias, etc. In the last week I started to review the uploads from the category People in science.

In the 2015 edition, many pictures were lacking enough information about the depicted scientist, so I told User:Kruusamägi we should improve something in that direction. Personally, when presenting the competition in my country, I will stress the importance of avoiding too-generic descriptions.

What's the link with Wikidata? A very clear gap that can be filled, IMHO, is the P18 (image) of scientists; considering that getting images of non-"pop-related" people is less easy, they are worth some effort of "contextualisation". Linking them to a profile on Wikidata will probably help their long-term categorization on Commons. For "mid-level" profiles, this means that some missing items should be created. I'll do my part, which is why I tried to improve my expertise on that front recently, but there is work to do and even some backlog. Some of my recent edits are now focused on fixing this gap; examples: Q41691001 or Q41694266.

In summary, since I see more interest in the topic of bibliometrics now on Wikidata, and considering that the number of images will be higher, this is a good occasion to do better than the previous time, so every bit of help is welcome. --Alexmar983 (talk) 08:55, 7 October 2017 (UTC)

Also, at which number of publications do you think it is worth creating an item, independently of the IDs? Just to set a priority. Like 10? I am assuming (but this is a long discussion) that wiki-centric notability is different; personally, if someone has more than a certain number of publications or more than one ID I create the item, but again I wouldn't mind having some "operative priority" on which we can agree. For example, with this person Q41692643 I am not really sure now; I did it by mistake, as I misread a total number of publications on Scopus referring to another person. At this moment I regret that we don't yet have good "structured data" on Commons. --Alexmar983 (talk) 09:58, 7 October 2017 (UTC)
  Done, Conny (talk) 20:49, 7 October 2017 (UTC).

Describing the two sides of a coin

I'm going to add several new coin types to Wikidata, and I'm trying to model them properly, in particular with respect to a rich description of their obverse and reverse, which could also be used to guide the identification efforts of collectors or archaeologists.

For instance, let's take denarius of L. Censorinus (Q5256588). There, I enriched an existing coin type (i.e., a specific denarius of the Roman Republic), replicating the approach of 2 euro coin (Q1981571), i.e., using has part (P527), then obverse (Q257418), and finally depicts (P180) with a certain value (e.g., Apollo (Q37340) or laureate head (Q40794959)). That's pretty neat and straightforward. However, coin descriptions are usually something like this: "Obverse: Laureate head of Apollo right; before, control mark. Border of dots. Reverse: Marsyas walking left, with right arm raised and holding wine-skin over left shoulder; behind, column bearing statue of Victory; on right, control mark. Border of dots." I'm not sure one can reach such a degree of detail on Wikidata, but, for identification purposes, it is maybe useful to specify that the head of Apollo is looking right or left.

To describe the fact that the head of Apollo is looking right, I tried to use applies to part (P518) as in the example (see denarius of L. Censorinus (Q5256588)), but this does not work very well in cases in which you have, e.g., an emperor both on the obverse and on the reverse, and the emperor is looking right on the obverse and left on the reverse... in these cases, you may lose context if you use applies to part (P518), while it would be easier to use a new property (like "obverse description / depicts on obverse" and "reverse description / depicts on reverse"), allowing for an additional level of specification, e.g., "denarius of L. Censorinus (Q5256588), obverse description, Apollo, direction, right".

Any comments/suggestions? Should I be happy with has part (P527) and applies to part (P518), paying something in terms of ambiguity, but gaining in generality? Or would it be appropriate to propose two new properties for the description of obverse and reverse? I personally like the idea of fighting property proliferation, so I would be glad to receive suggestions to properly describe all these data with existing properties. (Even a comment like "I think that denarius of L. Censorinus (Q5256588) is OK as it is" would be useful to me, because I want to create similar items and I'd like to check that they are OK before proceeding.) --FedericoMorando (talk) 23:21, 6 October 2017 (UTC)

There may be reasons why this is not such a good idea, but my first inclination would be to have depicts (P180) as the main property, then applies to part (P518) = reverse (Q1542661) as a qualifier. I think that is how the paintings project would do it.
This then for instance makes it easy to have two P180 statements for the emperor, one qualified to apply to the obverse, the other qualified to the reverse.
You could also have depicts (P180) = laureate head (Q40794959) with qualifiers of (P642) = Apollo (Q37340), applies to part (P518) = obverse (Q257418), direction (P560) = right (Q14565199) ...
I think that would be more neat than the current arrangement at denarius of L. Censorinus (Q5256588).
But for searchability it is important that a standard is agreed and documented, and that items are conformed to it. A numismatics style guide page is needed. Jheald (talk) 01:55, 7 October 2017 (UTC)
The qualifier stated as (P1932) can be used to record the exact wording a particular source used to describe the part of an object, which can be useful for retrieval, e.g. into infoboxes; but the breakdown into statements is valuable for querying. Jheald (talk) 01:59, 7 October 2017 (UTC)
@FedericoMorando: Thank you for your work on that! I agree with everything Jheald said. Currently, denarius of L. Censorinus (Q5256588) is definitely badly modeled, misusing depicts (P180) and applies to part (P518). depicts (P180) shouldn't be used as a qualifier to the has part (P527) statements, but in a statement of its own. applies to part (P518), on the other hand, should only be used as a qualifier (has used as qualifier constraint (Q21510863)). The reference for that can be found at Wikidata:WikiProject Visual arts/Item structure#Individual objects and parts. It's not very visible, unfortunately. (Any help to improve that is welcome!) And I should have stressed more the encouragement to create separate items for single images. That is the way to go if it gets too complicated, I think. Here you could create an item each for the obverse and the reverse, connect them (instead of the general obverse (Q257418) and reverse (Q1542661)) with has part (P527) and part of (P361) to denarius of L. Censorinus (Q5256588), and do the detailed description there. --Marsupium (talk) 09:14, 7 October 2017 (UTC)
Another useful qualifier is shown with features (P1354) which can be used to indicate that the depicted person has a laurel wreath, crown, mantle, etc. - PKM (talk) 20:02, 7 October 2017 (UTC)
Please have look at Property talk:P1354#Use wears (P3828) too! and perhaps respond! Thanks, —Marsupium (talk) 23:53, 7 October 2017 (UTC)
Is proposing the two properties obverse and reverse still a way to go, or would that be a waste of time? Breg Pmt (talk) 15:26, 8 October 2017 (UTC)

@Jheald: @Marsupium: and @FedericoMorando: I have tried to use Eisenhower dollar (Q1312952) with properties and qualifiers as described by user Jheald, and Kennedy half dollar (Q631456) with properties and qualifiers as described by user:Marsupium. Can you please comment on this? (Note this is just a start for those coins.) Breg Pmt (talk) 16:52, 8 October 2017 (UTC)

What I wrote shouldn't contradict Jheald's comment. Eisenhower dollar (Q1312952) looks fine. In Kennedy half dollar (Q631456) I think you have misunderstood me, I have changed it to the model of the first. IMO the use of depicts (P180) in both looks fine now. --Marsupium (talk) 17:15, 8 October 2017 (UTC)
@Marsupium: thank you very much. I suppose @FedericoMorando: wants to correct denarius of L. Censorinus (Q5256588) according to the comments. @Jheald: creating a numismatics style guide sounds nice, but how is that done? Maybe a project for coins and medals should be started. Breg Pmt (talk) 18:02, 8 October 2017 (UTC)
You are welcome, and I thank you! Wikidata:WikiProject Cultural heritage/Navigation already has the topic "coins and medals", but waiting to link to some place … --Marsupium (talk) 18:06, 8 October 2017 (UTC)
@Jheald: @Marsupium: @Pmt: @PKM: @AlessioMela: and all: thanks for your comments. I'm going to change denarius of L. Censorinus (Q5256588) accordingly. As far as it's possible, I would like not to propose new properties. In this regard, I'm happy just adding quite a few new specialized identifiers for coins (see, e.g., Wikidata:Property proposal/Coinage of the Roman Republic Online ID). About a numismatics style guide, I tried collecting some stuff (now to be updated on the basis of this discussion) here Wikidata:WikiProject_Coins. Considering the existence of a community focused on numismatics (at least on various Wikipedia editions, if not on Wikidata so far), maybe it makes sense to have an ad hoc project, but of course I would conform as much as possible to broader cultural heritage projects. --FedericoMorando (talk) 09:53, 9 October 2017 (UTC)
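For bulk editing, the qualifier pattern sketched in the comments above can be written in QuickStatements v1 (tab-separated) syntax. This is only a sketch of one statement, mirroring Jheald's example of laureate head (Q40794959) with of (P642), applies to part (P518) and direction (P560) qualifiers; check that your QuickStatements version supports qualifiers before running anything:

```
Q5256588	P180	Q40794959	P642	Q37340	P518	Q257418	P560	Q14565199
```

Each further property-value pair after the main value is interpreted as a qualifier on the statement.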

subclasses of subclasses of scientist: several problems

see also this query: [7]

Qualifying an ulema as a "person engaging in a systematic activity to acquire knowledge that describes and predicts the natural world" looks wrong

I suspect there is a confusion between scholar and scientist. scientist should be a subclass of scholar, as should humanities scholar, visiting scholar and philosopher

gerontologist and scientific illustrator should simply be removed from being subclasses of scientist.


I was actually trying to get a list of famous scientists, and for that trying to list the scientists. In doing so I eventually found two more problems, which are somewhat less convoluted. First, physician is a subclass of scientist and should not be, but medical researcher should. Second, application programmer, game programmer and web developer should be subclasses of software engineer, which should be split off from computer scientist. Many items currently labeled as computer scientist should be labeled as software engineer. The confusion comes from the fact that English has no generic term for "informaticien", which in French and other languages covers both computer scientist and software engineer. I believe most of the persons wrongly labeled as computer scientist were imported from non-English Wikipedia categories.

Here is the full tree of scientist subclasses [8] Xavier Combelle (talk) 04:37, 8 October 2017 (UTC)
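For reference, a tree like the one linked above can be regenerated at query.wikidata.org with a query along these lines (a sketch, assuming scientist is Q901):

```sparql
SELECT ?class ?classLabel WHERE {
  ?class wdt:P279* wd:Q901 .                                      # all (transitive) subclasses of scientist
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

Replacing `wdt:P279*` with `wdt:P279` limits the result to direct subclasses only.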

I changed a few statements. Feel free to make your own edits if you see wrong or missing statements. --Pasleim (talk) 12:23, 9 October 2017 (UTC)

Short URLs

Can it be made possible to shorten long URLs so that they don't push article text into a silly thin margin in Wikidata-sourced infoboxes, e.g. Woolworths in Welsh? Thanks AlwynapHuw (talk) 06:22, 15 October 2017 (UTC)

@AlwynapHuw: Would it be better to pipe the URLs and have some text there in the form of a traditional link rather than the raw link? Does the template have that ability already?  — billinghurst sDrewth 07:36, 15 October 2017 (UTC)
Please don't; the URL in infoboxes is used as part of an embedded microformat; using link text other than the URL corrupts that. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 15 October 2017 (UTC)
The item in question, Woolworths Group (Q958479), was using https://web.archive.org/web/20071010031624/http://www.woolworthsgroupplc.com/ as the value for official website (P856). That was wrong; Woolworths never owned the archive.org website. I have changed it to the correct value, http://www.woolworthsgroupplc.com/, with an end date qualifier. I've preserved the archive URL both using archive URL (P1065) and as a reference. The Infobox should be modified to exclude (or delink using <nowiki>) values with an end date qualifier. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 15 October 2017 (UTC)
Thank you very much, it looks much better now. AlwynapHuw (talk) 16:03, 15 October 2017 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:08, 15 October 2017 (UTC)

Merge request

Hi I have never edited here before and am not sure of the procedure. Would someone be able to merge Q6234102 with Q41967968? They are the same topic (civil engineer John Frederick Bateman 1810-1889). Many thanks - Dumelow (talk) 09:43, 15 October 2017 (UTC)

  Done John Samuel 10:47, 15 October 2017 (UTC)
This section was archived on a request by: Matěj Suchánek (talk) 18:59, 15 October 2017 (UTC)

Describing international days, weeks, years etc

Hi all

I'd appreciate a bit of advice on describing International Days, International Weeks, International Decades, International Years and United Nations Anniversaries, things like World Radio Day (Q1359258) and World Space Week (Q2915488). Currently they're mostly instance of (P31) world day (Q2558684), which has a weird 'instance of' statement and is not a week.... Also, how would I describe the organisations the days are observed by?

Once I know how to format them I'll import some data on them from the UN to get better coverage of the subject.

Thanks

--John Cummings (talk) 10:05, 6 October 2017 (UTC)

world day (Q2558684) looks ok to me; why not create a similar item for "world week" for the cases of weeks. Not sure what to do about years, that seems to be a different sort of thing. ArthurPSmith (talk) 12:51, 6 October 2017 (UTC)
@ArthurPSmith: 👍. --John Cummings (talk) 09:54, 10 October 2017 (UTC)

Wikidata weekly summary #281

Wikidata recent changes on other wikis

It seems that displaying Wikidata recent changes on other wikis has run into some serious scalability problems, see [9]. As far as I can tell, it has been disabled on Commons and ruwp for now; it's not clear whether other projects are at risk of it also being disabled. This is a fairly fundamental part of using Wikidata on other wikis, so I hope this can be resolved soon. Thanks. Mike Peel (talk) 13:08, 10 October 2017 (UTC)

We're working on getting it back for Commons and Russian Wikipedia. It currently looks fine for others. --Lydia Pintscher (WMDE) (talk) 18:13, 10 October 2017 (UTC)

Celebrity userpages

Today there have been several instances of new users registering with the names of various celebrities, and creating userpages that are essentially copies of those celebrities' articles on enwiki. Examples include User:Matt Baker (Presenter), User:Lindsay Dee Lohan, User:Beyonce (Singer) and User:Anna Katherine Popplewell. Anyone have any idea what's going on with these and what should be done with them? It's doubtful that any of these editors are the celebrities under whose names they are editing, and so far none of them have done any editing outside of creating those userpages. Nikkimaria (talk) 16:48, 12 October 2017 (UTC)

Just ignore for as long as they're not breaking any rules. Danrok (talk) 17:06, 12 October 2017 (UTC)
Impersonating other people is breaking rules according to the EnWiki rules (https://en.wikipedia.org/wiki/Wikipedia:Username_policy#Misleading_usernames). I see no good reason to allow this on our Wiki. ChristianKl (talk) 17:21, 12 October 2017 (UTC)
I suggest to add {{Delete}} with deletion rationale “out of project scope” or copyvio (if that is the case) on these pages, so that they automatically appear at Wikidata:Requests for deletions#Pages tagged with .7B.7BDelete.7D.7D. If you think that these accounts shall be blocked, you can request it at the Administrators' noticeboard. However, the Blocking policy does not cover such cases yet, so I am not sure how this would be handled… —MisterSynergy (talk) 17:33, 12 October 2017 (UTC)
Tagged and deleted. Nikkimaria (talk) 18:12, 12 October 2017 (UTC)
This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:14, 16 October 2017 (UTC)

Subtypes of geothermal power stations

Hi. There are three types of geothermal power station (Q30565277):

  1. dry steam power station (Q41722780)
  2. flash steam power station (Q41722964)
  3. binary cycle power station (Q4086827)

Can someone confirm whether the way I've linked them to each other is correct? I would like to use the same method for other types of power stations as well. Thanks in advance. Rehman 12:06, 8 October 2017 (UTC)

Using subclass of (P279) is the right way to go. But has parts of the class (P2670) is not the inverse property of subclass of (P279), so just add a link from the more specific classes to the more general class but not the other way round. --Pasleim (talk) 16:08, 8 October 2017 (UTC)
Thanks Pasleim. So you mean there is no backward linkage at all? Rehman 15:40, 9 October 2017 (UTC) Especially since these subclasses are exclusive to geothermal power stations.... Rehman 15:40, 9 October 2017 (UTC)
Backward linking is neither possible nor needed. --Pasleim (talk) 18:41, 10 October 2017 (UTC)
Thanks for clarifying. Regards, Rehman 04:12, 11 October 2017 (UTC)
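To illustrate why no backward link is needed: the subtypes can always be recovered by querying subclass of (P279) in the reverse direction at query.wikidata.org, e.g. (a sketch):

```sparql
SELECT ?subtype ?subtypeLabel WHERE {
  ?subtype wdt:P279 wd:Q30565277 .                                # direct subclasses of geothermal power station
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```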

Need help with a SPARQL query

Hello, I would like to find the items that have an entry (or statement) at Wikimedia Commons and an entry at any Wikipedia project, but not on English Wikipedia. Is it possible to do that by making a SPARQL query at query.wikidata.org? If yes, can I get help for building such a query? If yes, then where should I ask for help? Thank you. Fructibus (talk) 21:21, 9 October 2017 (UTC)

@Fructibus: I guess Wikidata:Request a query is the place to ask for query help.
I tried the following query, but it times out if not limited, maybe someone can come up with a more efficient query. In the below code the number of results is limited to 1,000.
SELECT DISTINCT ?item ?commons
WHERE
{
  ?commons   schema:about ?item . FILTER(CONTAINS(str(?commons),'commons.wikimedia.org'))
  ?wikipedia schema:about ?item . FILTER(CONTAINS(str(?wikipedia),'wikipedia.org'))
  OPTIONAL { 
    ?enwp    schema:about ?item . FILTER(CONTAINS(str(?enwp),'en.wikipedia.org'))
  } FILTER(!BOUND(?enwp))
}
LIMIT 1000

Try it!

--Larske (talk) 06:03, 10 October 2017 (UTC)
You might want to filter out all of the category items first. Since most things on Commons are *not* on Wikidata yet, it might be more helpful in general if your query first counts all of the various types of things that have commons categories, which might point you in the general direction you are looking for. It is hard to make such general queries without timing out. Commons is very big. Jane023 (talk) 08:56, 10 October 2017 (UTC)
No time to test now and give you a link, but you could try PetScan, maybe. It is very efficient at filtering sitelinks in or out ;) --Hsarrazin (talk) 09:17, 10 October 2017 (UTC)
Here's a slightly tweaked version of User:Larske's query, without the string comparisons, so it may be a bit quicker; but it still has to be LIMITed to only a small number
SELECT DISTINCT ?item ?commonscat WHERE {
    ?item wdt:P373 ?commonscat .
    ?wikipedia schema:about ?item . 
    ?wikipedia schema:isPartOf ?site .
    ?site wikibase:wikiGroup "wikipedia" .
    MINUS { 
      ?enwp schema:about ?item .
      ?enwp schema:isPartOf <https://en.wikipedia.org/>
    } 
  } LIMIT 1000
Try it! Jheald (talk) 09:36, 10 October 2017 (UTC)

@Larske: - @Jane023: - @Hsarrazin: - @Jheald: Thanks a lot for all the answers, awesome queries! This query actually generates a list of articles that are missing from the English Wikipedia, complementing en:Wikipedia:Requested articles. Is it possible to make a version that excludes lists, human settlements and administrative divisions? And is it possible to start the search from row X of the results, so I can skip some of the results I got in previous queries?

I never heard about PetScan before, a lot of goodies developed lately! Where can I ask for help with making a PetScan script? Fructibus (talk) 18:42, 10 October 2017 (UTC)
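On excluding lists and settlements: one approach is to add MINUS clauses on instance of (P31) to Jheald's query, as in the sketch below. I am assuming Wikimedia list article (Q13406463), human settlement (Q486972) and administrative territorial entity (Q56061) cover the classes to drop; the class tree may need adjusting:

```sparql
SELECT DISTINCT ?item ?commonscat WHERE {
  ?item wdt:P373 ?commonscat .
  ?wikipedia schema:about ?item .
  ?wikipedia schema:isPartOf ?site .
  ?site wikibase:wikiGroup "wikipedia" .
  MINUS { ?enwp schema:about ?item .
          ?enwp schema:isPartOf <https://en.wikipedia.org/> }
  MINUS { ?item wdt:P31/wdt:P279* wd:Q13406463 }   # list articles
  MINUS { ?item wdt:P31/wdt:P279* wd:Q486972 }     # human settlements
  MINUS { ?item wdt:P31/wdt:P279* wd:Q56061 }      # administrative divisions
} LIMIT 1000
```

To resume from earlier results, an OFFSET clause can be added before LIMIT, though without an ORDER BY the result order is not guaranteed to be stable between runs.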

PetScan is a development and expansion of the old CatScan gadget on Commons, and of Autolist here, by User:Magnus Manske.
It allows you to mix info from Wikipedia, Wikisource or even Commons with Wikidata, or between them...
It is much more intuitive than SPARQL, and that's a good thing for people who are dumb at query syntax, like me :) - you have a small manual here.
You may combine categories with the presence or absence of templates, the presence or absence of a Wikidata item, or specific wikilinks... - you may also use a SPARQL query as input, and filter only items which have, or don't have, some specific statements…
You can also easily adapt your search to refine what you want.
From a list of articles (or pages) that still aren't linked on Wikidata, you can use Duplicity to find matches in existing items :)
And finally, each PetScan query is given a specific PSID, which can then be used to call back the same search easily whenever you want :)
I'm trying to build a query to give you an idea. --Hsarrazin (talk) 20:13, 10 October 2017 (UTC)

About number of deaths (P1120)

Why does number of deaths (P1120) have two constraint statements: property constraint (P2302) range constraint (Q21510860) / maximum value (P2312) 250000000, and property constraint (P2302) range constraint (Q21510860) / maximum value (P2312) 10000?--林勇智 13:36, 10 October 2017 (UTC)

I have removed one of the constraints. --Pasleim (talk) 21:54, 10 October 2017 (UTC)

Survey of Scottish Witchcraft Database

Hi, at the University of Edinburgh we are looking to offer students on the Data Science for Design MSc course the opportunity to take part in a data import and work through the process of how to map one dataset to be imported in to Wikidata. The dataset we are looking at working with is the Survey of Scottish Witchcraft Database with records on 3000-4000 accused witches and their trials. We have cleared that the database can be imported to Wikidata and we think it would make for a really interesting dataset addition to Wikidata once it has been linked with other datasets on Wikidata as the visualisations have not been updated in some 14 years. NavinoEvans has already suggested a 3 step process:

  1. MATCH matching all of the witches to Wikidata items manually (using Mix'n'Match).
  2. CREATE MISSING ITEMS using QuickStatements and
  3. ADD EXTRA DATA (e.g. date of birth, gender etc) through a series of QuickStatements imports for adding additional data they have about the witches.

I am looking to offer this as a group project for students at the Data Science for Design MSc's "Data Fair" on 26 October 2017 so please let me know if this would be a viable dataset to work with as the interested students would then work on it until 11 December. This is when they would present what they did in terms of the import and showcase some visualisations of their work. The database is stored in MS Access so is there a standard way of converting from Access? Finally, is the first step matching up all the properties that would need created? Nav mentioned that the inclusion of a property such as 'Mentioned in' may be problematic if say we wanted to include a link to people mentioned in a witch trial. Can you let me know what your initial thoughts are on the import anyway? (I am hoping that a successfully managed Wikidata assignment may lead on to other further datasets being imported in future assignments). Many thanks, Stinglehammer (talk) 16:42, 5 October 2017 (UTC)

I'm ambivalent about whether these should go into Wikidata - we've historically been a bit cautious about completely importing historical people databases of this kind, where the subjects are (mostly) unlikely to be represented elsewhere in the historical record. Many of the entries are very minimalist - eg A/EGD/2341 amounts to "in 1660, there was a woman called Jean Campbell, who was from Kirriemuir or Bute, and she was involved in a witch trial, about which we know nothing.". As a result, matching to mix-and-match is unlikely to get many useful hits, and the imported data would mostly not link to anything else. It feels like you wouldn't get very rewarding results from this particular dataset - are there any others you could look at? Andrew Gray (talk) 12:28, 9 October 2017 (UTC)
I think it's great that you plan to give your students a task of importing data into Wikidata.
To make a good import it would be great to have external IDs on your end. Afterwards we can create properties on our end that hold the external ID values, likely one ID for people and another one for trials. I think it would be great to have the data inside Wikidata. In cases where a person died, there's a date of death that can be used to match entries in other databases. In other cases "first name / last name / parish / old enough to be tried in year X" might also be enough for someone to later match against other data sources.
As far as links to other data sets go, it's plausible that we have entries for the ministers involved in the witch trials from other data sets, and I think we also have existing items for parishes.
I don't think Mix'n'match will help you much, given that most of the people in your database won't have entries in Wikidata. The biggest problem is likely matching the parish that's written in your database to parish items in our database.
If you want to go forward, a good next step would be to create a property proposal for the external ID properties. It would make sense to make that proposal and get agreement from our community to create the properties before October 26 (and given that proposals take at least a week, it might make sense to open it sooner). ChristianKl (talk) 15:09, 11 October 2017 (UTC)
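Step 2 of the workflow above (creating missing items with QuickStatements) might look like the following in QS v1 tab-separated syntax. The label, description and gender values here are purely illustrative (Jean Campbell is the example record Andrew Gray cites above), not actual import commands:

```
CREATE
LAST	Len	"Jean Campbell"
LAST	Den	"person accused in a 17th-century Scottish witch trial"
LAST	P31	Q5
LAST	P21	Q6581072
```

Once an external-ID property exists, one more line per item (LAST, the new property ID, and the record ID such as "A/EGD/2341") would link each item back to its database record.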

Add a BikeTouring Wiki to Wikidata

A new bike-touring wiki, BikeWOW, has been started; what is the process for adding it to Wikidata? - Salgo60 (talk) 08:53, 9 October 2017 (UTC)

I don't think we should support a just-started project that is editable by everyone. I highly doubt it is notable. Sjoerd de Bruin (talk) 09:00, 9 October 2017 (UTC)
But if it has been going on for +years and has good content then it can be a candidate? - Salgo60 (talk) 09:08, 9 October 2017 (UTC)
Maybe as a property, but that depends on consensus. Sjoerd de Bruin (talk) 09:17, 9 October 2017 (UTC)
Just some info: There are 120 articles in that wiki currently, and nearly all of the edits this month seem to be by two authors. Syced (talk) 09:50, 10 October 2017 (UTC)
It's possible to add an external ID property. At this time there's however no reason to do that to link to 120 articles. ChristianKl (talk) 18:06, 11 October 2017 (UTC)

language property

People told me to use language of work or name (P407). Other people told me to use original language of work (P364). Which is correct? See User talk:Freakymovie. Freakymovie 14:12, 9 October 2017

That is not what they have said. The deprecation discussion for the property determined that it should be retained for movies, though it should be migrated for written works. I believe some have complained that you have been migrating movies. There was also some level of agreement that there would be a coordinated process to undertake the moves; I stopped paying attention to the details, but it is probably in or around the aforementioned discussion.  — billinghurst sDrewth 11:39, 9 October 2017 (UTC)
Actually, consensus was reached to merge original language of film or TV show (P364) with language of work or name (P407) (without exceptions). Only in the subsequent discussions did users ask for exceptions, and no consensus on that has been reached yet. Nevertheless, no massive movement of properties should be done yet, especially not with QuickStatements, because with QuickStatements we lose all references and qualifiers. --Pasleim (talk) 12:27, 9 October 2017 (UTC)
I've read somewhere a request for a tool to move a statement (identical, with qualifiers and sources) from one property to another... which, in fact, would be the same as changing the property's number code (PID)... this could be very useful for this kind of case. Is it doable? --Hsarrazin (talk) 13:48, 9 October 2017 (UTC)
  • The problem we have is that a consensus was reached, but the proposal didn't actually include a plan. Now the few remaining supporters struggle to come up with one. We obviously don't want to lose any information or complicate maintenance going forward. If you add data, please follow whatever is suggested by the relevant WikiProject.
    --- Jura 15:04, 9 October 2017 (UTC)
    • OK, I will follow WikiProjects. I don't delete references, only edited items without references. FreakyMovie 08:02, 10 October 2017
@Freakymovie: See WD:PFD#P407 and P364; there are huge sections there laying out all the problems of the language properties. --Liuxinyu970226 (talk) 15:04, 11 October 2017 (UTC)

Google doodles

Has there been any discussion about creating a property for these? I was wondering because I saw we have Google Doodle (Q18156042) but not much seems to be done with it. Jane023 (talk) 08:52, 10 October 2017 (UTC)

Wikidata:Project chat/Archive/2017/06#Google Doodles (you can search archives from Wikidata:Project chat/Archive). Matěj Suchánek (talk) 11:57, 10 October 2017 (UTC)
OK thanks, I guess nobody cares, so a property is a bit premature. I just added one the same way. Jane023 (talk) 12:17, 10 October 2017 (UTC)
I care: Wikidata:Property proposal/Google Doodle. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:49, 10 October 2017 (UTC)
Thanks! I totally support this, even though I have no idea how it will pan out (so many different topics it boggles my mind). Jane023 (talk) 10:33, 11 October 2017 (UTC)

Data from Wikivoyage

What data from Wikivoyage may be kept on Wikidata? Wikivoyage does not have a restriction against original research, and has no requirement for verifiable sources. Cheers, Pbsouthwood (talk) 15:38, 10 October 2017 (UTC)

Well, I can imagine most visitor attractions would be desirable, being locally important enough for any Wikipedia. Likewise all major transportation options for local access. Jane023 (talk) 18:48, 10 October 2017 (UTC)
Wikivoyage does not independently generate data, and, in fact, uses Wikidata a lot via the Listing template and also page banners. Information about attractions (including working hours etc.), hotels and restaurants can be stored on Wikidata; there was an RfC about this. There are also map contours, but that is pretty much it.--Ymblanter (talk) 18:59, 10 October 2017 (UTC)
Would that include attractions which are not mentioned in "reliable sources"? Wikivoyage has articles on dive sites which are researched and described originally on Wikivoyage.
Which properties are appropriate to add?
Do you know where I can find that RFC? I am new here and finding things takes longer. Pbsouthwood (talk) 19:09, 10 October 2017 (UTC)
This is the RfC I mentioned; I guess most of the properties have never been created. I do not think the dive sites you mention are eligible for Wikidata. OTOH, on Wikivoyage we sometimes use pages for territories which are subjectively defined (usually from the point of view of the traveller); these can be the same or different across language versions. This is OR, but it is accepted on Wikidata, since any object which has an article on any Wikimedia project is notable for Wikidata. (In a sense, the same way as Commons galleries - these are OR as well.)--Ymblanter (talk) 19:24, 10 October 2017 (UTC)
Wikivoyage's banners are an example of Wikivoyage original content that is present in Wikidata and reused by many tools such as Reasonator, and recently by a French government online GIS database whose name I don't remember. Listings in the "See" and "Do" sections are worth including in Wikidata. It is true that Wikivoyage does not require a reference for each fact, but similarly most infoboxes imported into Wikidata have zero references... Each listing has a URL; maybe that could be considered a reference? The URL is always the listing's homepage, for instance the homepage of a museum. The museum's name, opening hours, latitude/longitude, address and phone number are not going to get any better reference than the museum's homepage anyway. Cheers! Syced (talk) 05:39, 11 October 2017 (UTC)
Sometimes we do collect say opening times even from museums which do not have a homepage, and I see nothing wrong even in importing these to Wikidata (though it should be appropriately tagged as unsourced).--Ymblanter (talk) 07:18, 11 October 2017 (UTC)
Taking a random example: Wikivoyage English has solid information about Bait al-Baranda Museum whereas Wikidata has almost no details (Bait al-Baranda Museum (Q12200363)), and the only Wikipedia article is in Arabic.
I would consider most cases to be unproblematic and can't immediately come up with Wikivoyage content that would be a bad fit. In the case of a museum, the museum itself is clearly a reliable source for whatever information Wikidata could store about it (and the source doesn't have to be online). I would recommend that you err on the side of including data in Wikidata. If there's an edge case, we can discuss it. ChristianKl (talk) 14:33, 11 October 2017 (UTC)

Instances of humans made additional instance of something else

https://www.wikidata.org/w/index.php?title=Q11608&diff=568098673&oldid=543029921 ??? Found by looking at the user's contributions, since s/he tagged smartphone models as instances of smartphones. Does Wikidata store any article about an instance of a smartphone, i.e. a specific phone that was produced, sold, owned, ...? 80.171.238.123 19:29, 10 October 2017 (UTC)

So - why not fix the problems yourself, or maybe better, start a conversation with this user on their talk page? ArthurPSmith (talk) 16:11, 11 October 2017 (UTC)

Editing an existing article

I am new to Wikidata, but I want to edit to correct and improve an existing article. I want to add some text, but mainly some inline book references and one external link (a web site).

I don't want my incompetent editing to mess up the existing article

May I have some help? BFP1BFP1 (talk) 14:01, 11 October 2017 (UTC)

On Wikidata we have items, not articles. Are you sure that you are in the right place and don't want to contribute to a Wikipedia article instead? Otherwise, can you describe in more detail what you want to do and which item you want to edit? ChristianKl (talk) 14:34, 11 October 2017 (UTC)

pywikibot - setting rank to "preferred" using setRank() or changeRank()

Hi Project chat,

first of all: if this is the wrong place to ask, please let me know.

I am currently writing a bot based on Python 3 and pywikibot (latest version) and running into an issue when trying to set the rank of a given claim (in my case I want to set the latest value to "preferred") which I have not been able to resolve so far - maybe someone here has an idea.

I checked the API doc of pywikibot (https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html) and found two approaches, namely

setRank(rank)

and

changeRank(rank)

My first approach is to set the rank of the claim before adding it to the item:

alexa_ranking_claim = pywikibot.Claim(repo, _ALEXA_RANKING_PROPERTY_IDENTIFIER)
target = pywikibot.WbQuantity(alexa_ranking, site=site)
alexa_ranking_claim.setTarget(target)
alexa_ranking_claim.setRank("preferred")
item.addClaim(alexa_ranking_claim, summary=u'Updating Alexa ranking')

While the claim itself gets added and shows up on the respective Wikidata item page, the desired rank of "preferred" is not taken into account; instead the "normal" (I guess the default or fallback) rank is shown.

My second approach is to change the rank of the claim after it has been added to the item, using changeRank("preferred"):

item.addClaim(alexa_ranking_claim, summary=u'Updating Alexa ranking')
alexa_ranking_claim.changeRank("preferred")

This way the claim itself gets added, however the call to

changeRank()

results in an Exception with an empty exception message:

print(exception) -> ""

I am running out of ideas and therefore would like to ask you if you have a hint where my error is or how to set the rank of a certain claim to "preferred". Thanks for your input and help! --Tozibb (talk) 19:50, 15 October 2017 (UTC)

The first approach doesn't work because the API module wbcreateclaim can't handle ranks (it's a question whether this should be implemented).
I tried the second approach and it worked, so there may be a mistake in your code. Matěj Suchánek (talk) 16:42, 16 October 2017 (UTC)
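For anyone landing here later, a rough illustration of what changeRank() is changing: the rank is simply a field of the claim's JSON, and the call ultimately writes an equivalent structure via the wbsetclaim API module. The following plain-dict sketch (my own illustration - no pywikibot, no network, and the claim content is invented) shows where the field lives:

```python
# Minimal sketch of where the rank lives in a Wikibase claim's JSON.
# The claim content below is invented purely for illustration.
VALID_RANKS = ("preferred", "normal", "deprecated")

def with_rank(claim_json, rank):
    """Return a copy of the claim JSON with its rank changed."""
    if rank not in VALID_RANKS:
        raise ValueError("invalid rank: %r" % (rank,))
    updated = dict(claim_json)
    updated["rank"] = rank
    return updated

claim = {
    "mainsnak": {"snaktype": "value", "property": "P1661"},
    "type": "statement",
    "rank": "normal",  # the rank a freshly created claim ends up with
}
print(with_rank(claim, "preferred")["rank"])  # preferred
```

This also makes it visible why the first approach above fails silently: wbcreateclaim only creates the claim, so a rank set on the unsaved claim object never reaches the server.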
(Continued at Topic:U04qxj2j9t1o3oay.) Matěj Suchánek (talk) 12:38, 17 October 2017 (UTC)
This section was archived on a request by: Matěj Suchánek (talk) 12:38, 17 October 2017 (UTC)

member of (P463) and parliamentary terms, which way to fix?

At the moment the practice for marking members of the UK (and some other) parliaments has been to say "member of" then "nnth United Kingdom Parliament"; see for example Ramsay MacDonald (Q166646). The constraint violation is that "member of" needs to be an organisation. So we either have to fix upstream, or review whether the use of "member of" is correct. At this moment, I am thinking that for parliamentarians we should consider migrating to participant of (P1344), as parliamentary terms seem to be more of an event. I would appreciate others' thoughts.  — billinghurst sDrewth 00:39, 11 October 2017 (UTC)

Using member of (P463) for these is largely deprecated, and (for the UK at least) they are all in the process of being moved to position held (P39) statements instead. --Oravrattas (talk) 02:55, 11 October 2017 (UTC)
Yes, this was a bit of an improvised approach from a couple of years ago. I've just this week started importing the "correct" P39 format back to the 1830s. I'll be removing the P463 versions as we go along; they're useful as a backup error-check so I don't want to remove them preemptively. Andrew Gray (talk) 12:23, 11 October 2017 (UTC)
@Andrew Gray: and @Oravrattas: What about the property participant of (P1344), as mentioned by user:billinghurst? Breg Pmt (talk) 18:41, 11 October 2017 (UTC)
@Oravrattas, billinghurst, Pmt: To be honest I don't think it's worth going to the effort of doing that - this information will be in the P39 values anyway (being part of 51st United Kingdom Parliament (Q21084468) is implicitly stated by Member of the 51st Parliament of the United Kingdom (Q41582627), and we will try and make sure a sensible query finds that) so it would just be duplication. Cutting down the number of ways to say the same thing seems a good idea :-) Andrew Gray (talk) 19:28, 11 October 2017 (UTC)
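To make "a sensible query finds that" concrete, here is a sketch of such a Wikidata Query Service query, built as a plain Python string. The query shape is my own assumption; Q41582627 is the position held (P39) value item mentioned in this thread:

```python
# Sketch of a WDQS query that finds members of the 51st UK Parliament
# via position held (P39). The shape of the query is illustrative;
# the item ID comes from the discussion above.
term_item = "Q41582627"  # Member of the 51st Parliament of the United Kingdom

query = """
SELECT ?member ?memberLabel WHERE {
  ?member wdt:P39 wd:%s .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""" % term_item

print(query)
```

Because the term membership is expressed in the P39 value itself, no separate participant of (P1344) statement is needed to answer this kind of question.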

Any showcase items or examples for military figures

I am trying to add to Arthur Henry Grant (Q16943854) that he was an officer in the Royal Inniskilling Fusiliers (Q7374326). I cannot find how to represent the military unit. So can anyone suggest a good example of a military officer to follow? Wikidata:Showcase wasn't obviously helpful.  — billinghurst sDrewth 12:26, 11 October 2017 (UTC)

More asking than answering, but what about using "organization or club to which the subject belongs", i.e. member of (P463)? Breg Pmt (talk) 19:09, 11 October 2017 (UTC)
@Pmt, billinghurst: We discussed this a couple of years back and the consensus was that member of (P463) was a good approach - see notes on Property talk:P463 and Wikidata:Project chat/Archive/2014/05#Military units. You probably also want to add military branch (P241). At some point I want to write up some general guidance on how to handle military people, but it's hard to find the time! Andrew Gray (talk) 19:32, 11 October 2017 (UTC)
@Andrew Gray: Thanks a lot. And military rank (P410) as a qualifier in position held (P39) as officer? Breg Pmt (talk) 19:51, 11 October 2017 (UTC)
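Summing up the suggestions in this thread as data, the statements for Arthur Henry Grant (Q16943854) could be sketched like this (plain property → value pairs for illustration only; the branch item is left as a comment because none is named above):

```python
# Property -> value pairs suggested in this thread for a military officer,
# here for Arthur Henry Grant (Q16943854). Illustrative only - this is
# not an actual import format.
statements = {
    "P463": "Q7374326",  # member of: Royal Inniskilling Fusiliers
    # "P241": military branch (no specific item is named in the thread)
    # P410 (military rank) would go as a qualifier on a P39 statement
}
print(statements["P463"])  # Q7374326
```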

Page to outline the benefits and challenges of using facts from Wikidata on other Wikimedia projects

Dear all

Over the past week I have been working on a page to provide information about reusing Wikidata data on other Wikimedia projects which is now at Wikidata:Wikidata in Wikimedia projects.

I wrote it with the following things in mind:

  • To be as simple as possible with as little assumed knowledge as possible other than that the reader may have some experience contributing to another Wikimedia project.
  • People who are reading it may not know much about Wikidata, have misconceptions about it, or may be against Wikidata data being used on their Wikimedia project.
  • That the page could reduce the number of repeat discussions over the same objections to Wikidata data being used on other Wikimedia projects, and could act as a combination of an introduction to Wikidata, a FAQ about Wikidata data on other Wikimedia projects, and examples of where it is happening already.

When reading it, please keep in mind:

  • It is a first version; it will need some love to get it to a high standard. I think the weakest section is the one on Wikidata data quality; if anyone knows of any studies that could be added, that would be great.
  • When adding text please be concise; TLDR is a big problem
  • What is planned in future and putting this in to address any issues
  • As always typos and grammar
  • Anything I’ve missed or fudged, I’m slowly working through a list of discussions I've added to the discussion page (I’ve only really done the first one).

While writing it, it has become clear that whilst Wikidata's basic instructions are better than some other Wikimedia projects', they still need improving. If we expect contributors from other Wikimedia projects to trust, use and contribute to Wikidata, we have to make it more understandable and easier to use. Much of the resistance to reusing Wikidata data on other Wikimedia projects stems from concerns that Wikidata is confusing; improving the instructions will go some way towards addressing this. The most common activity of contributors from other projects will be adding statements and references; the Help:Sources learning curve is too steep, we need simple instructions for simple tasks, and videos would be really helpful also.

Thanks

--John Cummings (talk) 10:05, 29 September 2017 (UTC)

Thanks for starting the page. I think it's a bit too long already, actually. Maybe we should make a short list that links into this one? I am already starting to think maybe it should be a project and not just one page. Each page could then be a set of links to relevant discussions on various language Wikipedia village pumps (the recurring discussions). I know that English has had quite a few recurring discussions, but I am not sure these are exactly the same discussions in other languages (or for Commons or Wikisource). Poking around the various encyclopedia datasets, for example, I noticed the ADB and a few others have created items for each article on German Wikisource, but the ODNB hasn't done this and opted to index each article in English Wikisource to an item about the topic of the article (which I think is better, and of course many of those topic items already existed). In order to centralize such data mapping decisions we really need to be able to centralize community discussions somehow, but I am not sure this page does that right now, as is. Jane023 (talk) 11:49, 29 September 2017 (UTC)
Thanks Jane023, I think the examples section is just going to keep growing; I don't really know what to include and what to exclude. I feel like the other sections of the page are basically all leading up to the "Examples of data reuse" section, and information on things like what control the wiki has when using the data (something users on en.wiki have made very clear they want) is the essential information, but the information before is context which is needed..... --John Cummings (talk) 12:37, 29 September 2017 (UTC)
This is an outstanding first effort - outstanding because it is not only text; the layout and the use of some graphics make it easier to read. While I understand a wish for a shorter format, this easily becomes a self-defeating exercise. There is already too much splattered around Wikidata to follow, and the discussions have lately become more angry and acrimonious. The tone is friendly and that too is an achievement.
If anything keep this format and have more of these easy to read explanatory texts. Thanks, GerardM (talk) 05:21, 2 October 2017 (UTC)
@GerardM:, you're making me blush :). I've hit the limit of my knowledge of Wikidata; I think linking to the planned developments for tracking Wikidata changes from other Wikimedia projects (I guess in Phabricator) is really important for adding specifics - do you know where I could find them? Thanks, --John Cummings (talk) 09:52, 2 October 2017 (UTC)
I like this page. Breg Pmt (talk) 19:58, 4 October 2017 (UTC)

OK I was confused because I was expecting this page: User:John Cummings/Wikidata in Wikimedia projects. Did you write that first and then try to flip it in order to make it more positive? Both pages are good, for different reasons. The first tells people how to go about fixing and contributing, and the other tells people how to reply to blanket negative assertions. Both are needed. Jane023 (talk) 11:54, 5 October 2017 (UTC)

@Jane023: thanks, I need to go back and add this in, maybe as a FAQ or something... I'll keep thinking about it. I'm going to do a lot of Wikidata documentation over the next 12 months. --John Cummings (talk) 16:33, 5 October 2017 (UTC)
My first instinct is that I find the claim that individuals adding data is the most common way new statements get created to be doubtful. I would expect more bot creations. Especially among the data that was added in the last months. ChristianKl (talk) 10:49, 12 October 2017 (UTC)

Best practices for using Wikidata Properties on a separate site?

I'm looking for pointers on best practices in using the Wikidata properties/ontologies on a separate Wikibase instance. I'm going to create a dataset of things that are too obscure/of local interest only and will not meet the Wikidata Notability guidelines and so shouldn't go into wikidata.org. I plan on setting up my own Wikibase instance and domain, but I'd really want to be able to use the properties that the Wikidata community has put together: P31 should mean the same thing for people editing my site as it does for people editing Wikidata, etc.

Also, is there a good way to keep identifiers namespaced separately in different instances? I'd like 'P31' to mean the same thing and be called the same thing, but I think it'd be nice not to call my entities 'Q1234' etc. (Someday, in a linked data setup, if someone tries to combine the two datasets, it might be nice to be very clear that the identifiers are from different universes.)

Are there any sites out there that I should look at for inspiration/best practices? Is it possible to set up a wikibase instance and import only the properties from a backup but not any of the entities?

Also, apologies if this is the wrong forum to ask questions about this that are wikidata-like but not explicitly wikidata.org - if there's somewhere else I should go, I'd appreciate the redirect. Thanks! Erik s paulson (talk) 02:58, 5 October 2017 (UTC)

There are at least one or two talks scheduled for WikidataCon that address this - see Wikidata:WikidataCon 2017/Submissions/Integrating a custom Wikibase Instance (Rhizome) and Wikidata via SPARQL for instance - I'd suggest contacting the presenters or others involved for more details. ArthurPSmith (talk) 12:19, 5 October 2017 (UTC)
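On the "import only the properties" question: one possible approach (a sketch under my own assumptions, not an official Wikibase feature) is to filter a Wikidata JSON dump down to property entities before feeding it to your importer:

```python
# Sketch: keep only property entities ("P...") from the lines of a
# Wikidata JSON dump (one entity per line, wrapped in "[" / "]"),
# e.g. before importing into a fresh Wikibase instance.
import json

def property_entities(lines):
    for raw in lines:
        line = raw.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        entity = json.loads(line)
        if entity.get("id", "").startswith("P"):
            yield entity

# Tiny fake dump for illustration:
dump = [
    "[",
    '{"id": "Q42", "type": "item"},',
    '{"id": "P31", "type": "property"}',
    "]",
]
print([e["id"] for e in property_entities(dump)])  # ['P31']
```

Whether the resulting subset can be loaded with the same IDs (so your P31 stays P31) depends on the import tooling you use, so it's worth confirming with the WikidataCon presenters mentioned above.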
@Erik s paulson: What kind of data do you want to store? ChristianKl (talk) 14:38, 12 October 2017 (UTC)

Do we need both encephalon (Q75865) and brain (Q1073)?

Currently, both are listed to refer to FMA50801. Is there any difference in the two concepts? In the languages I speak (English, German) there's no article for encephalon (Q75865), so it's hard for me to evaluate the content. ChristianKl (talk) 18:47, 11 October 2017 (UTC)

I used Google Translate for the French version and encephalon (Q75865) seems to be about Chordata brains. ChristianKl (talk) 00:19, 12 October 2017 (UTC)
indeed, brain (Q1073), the organ, is comprised in encephalon (Q75865). Two different French articles. --Hsarrazin (talk) 12:52, 12 October 2017 (UTC)
@Hsarrazin: If that's the relationship, can you map out with has part (P527) what the encephalon (Q75865) is supposed to contain according to the French? ChristianKl (talk) 13:15, 12 October 2017 (UTC)
since anatomy is clearly not my domain of expertise, I would not risk it. :/ --Hsarrazin (talk) 13:35, 12 October 2017 (UTC)

Recent changes

Hi: I'm a little confused by the new update of Recent Changes. Previously I could click "IP contribs" to show anonymous edits, but with the new filters, when I click it the IP contribs aren't shown. I've tried clearing all the filters, but still I can't see them. Esteban16 (talk) 19:12, 11 October 2017 (UTC)

If you were used to the old interface, you can still switch to it. This link will work for the new one. Matěj Suchánek (talk) 09:59, 12 October 2017 (UTC)

director/manager (P1037)

From an English-language point of view, this property does not make much sense as it is.

For one, the label "director/manager" is nonsensical. A manager is one thing, and a director is another. Of course, the exact definition of either can vary from one organisation to another, or between countries.

I also see that it is being used to list commanders in United States Army Europe (Q181197). Which doesn't look right to me.

Perhaps it would be better if we were to follow the same model as is already used for political offices such as Prime Minister of the United Kingdom (Q14211). This would mean having to create new items for specific positions, such as "Commander of the 5th Division", and linking that item to the main item for the "5th Division". In other words, a way in which we can create specific job titles without having to create a new property for each one (there could be 1000s of them). Danrok (talk) 00:15, 12 October 2017 (UTC)

@Danrok: For a military commander, you can use commander of (P598) and a link to the unit - so Ben Hodges (Q19798487) would have commander of (P598):United States Army Europe (Q181197) - this avoids having to create specific values to go in P39. There isn't really an appropriate property to link from the unit back to the commander, so to be honest, I'd advise just doing the commander > unit links and leave it there. This is generally the preferred way of handling positions/roles of people, anyway; we don't need all political office items to contain a list of holders. Andrew Gray (talk) 12:41, 12 October 2017 (UTC)

Top 10% economists and Top 10% female economists now in Wikidata

During the last half year, and building on the work of @Bamyers99: and others, I've linked and added items for the complete sets of "top economists" as identified in the widely recognized Research Papers in Economics (Q206316) author rankings. The necessary checks to avoid duplicates have been largely facilitated by the latest enhancements of Mix-n-match, particularly the new "Multiple matches" list and the enriched item information, by @Magnus_Manske: Thanks!! -- Jneubert (talk) 08:23, 12 October 2017 (UTC)

Problems with properties

Hello. There are problems with the properties:

  1. Problem with YouTube
  2. The problem of the proposals: there is no distinction between proposals, so please distinguish, using templates such as c:Template:DeletionHeader and c:Template:DeletionFooter, between each of the:
    1. open (on hold) proposals
    2. properties ready for creation
    3. not done proposals
    4. withdrawn proposals
    5. ready proposals

Thank you ديفيد عادل وهبة خليل 2 (talk) 08:27, 12 October 2017 (UTC)

I can't figure out what you mean. Please can you clarify? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:21, 12 October 2017 (UTC)
Hi ديفيد عادل وهبة خليل 2 - property proposal status is distinguished by the value in the "status" field of the property proposal template - the allowed values are empty (still under discussion), "ready" (ready for creation), "not done" and "withdrawn" (no longer under discussion), "hold" (if on hold for some reason) and a property ID if created. Empty, ready, and hold proposals are listed on the various property proposal pages and all together on the overview page which is updated by a bot daily. "Ready" properties are also listed automatically in the Category:Properties ready for creation page. I think this is all working nicely right now, but if you're seeing a problem please clarify! ArthurPSmith (talk) 13:09, 12 October 2017 (UTC)
@Pigsonthewing, ArthurPSmith: The problem is that the color of the page itself is fixed and does not change by status, unlike the deletion requests on Wikipedia and Commons. ديفيد عادل وهبة خليل 2 (talk) 13:28, 12 October 2017 (UTC)
Ah, that makes sense. @Pasleim: it's your bot, do you think you could show status or use it for color-coding somehow on the Overview page? ArthurPSmith (talk) 13:38, 12 October 2017 (UTC)
Oh, or are you suggesting that a property proposal page itself should be color coded based on status? ArthurPSmith (talk) 13:40, 12 October 2017 (UTC)
And on the proposal pages themselves. Thank you ديفيد عادل وهبة خليل 2 (talk) 13:41, 12 October 2017 (UTC)

Update on creating a list of commonly used processes to format data to import it into Wikidata

Thanks for everyone's input on creating a list of commonly used processes to format data for import into Wikidata. I've turned it into a table, which is now available on this page. It is still missing a lot of information, so please take a look and add what you can.

Thanks again

--John Cummings (talk) 10:07, 12 October 2017 (UTC)

Difference between hamlet and village

I'm looking for guidance on when to use the term "hamlet" and when to use "village" when giving an "instance of/P31" to a small human settlement. Apologies if this is obvious or guidance is listed somewhere, but I can't seem to find anything definitive. The term village is widely used in preference to hamlet, in many cases for settlements that are very small. In addition, can either of those terms be used as a "located in the administrative territorial entity/P131" for associating a building/monument etc. with a location? I ask because I associated a listed building with a small village using "located in the administrative territorial entity", only to have it removed by another user. Perhaps there is a more appropriate way of associating a building with a location that is more specific? Many thanks.  – The preceding unsigned comment was added by JerryL2017 (talk • contribs) at 10:52, 16 September 2017‎ (UTC).

@JerryL2017:. Good question.
Taking the point about located in the administrative territorial entity (P131) first, technically P131 should point to an entity that has some administrative status -- so for example in England a civil parish. One advantage of limiting P131 in this way is that we can quality control items at the civil parish level, making sure that they all have Commons categories, official identifiers, etc; and P131 statements set in turn, to make it possible to hierarchically extract all the items in that tree eg in a particular county. The 'administrative' territory is also the level that will have some kind of body with some kind of formal responsibility (or at least consultation input) over the item, so this is a useful thing to identify.
For more precise relations to things that are not administrative units, the property location (P276) is recommended instead -- this could be appropriate for smaller villages. In addition the property located at street address (DEPRECATED) (P969) is available for the full postal address, given separately.
As to the village/hamlet question, I've recently been coming across this a bit myself. I've been going through the last few civil parishes not yet identified as such (113 currently to go), merging any doppelgangers, giving them an instance of (P31) = civil parish (Q1115575), and making sure that they have a GSS code (2011) (P836), Commons category (P373), located in the administrative territorial entity (P131), etc.
Part of what I've been doing is asking myself whether they should also be tagged instance of (P31) = village (Q532), or perhaps instance of (P31) = hamlet (Q5084), or instance of (P31) = human settlement (Q486972) (which is sometimes what has been imported from Dutch or Polish wikis).
I don't have a bright-line answer for this. I think there are probably a lot of things at present that aren't tagged village (Q532) that maybe should be; also some that are that probably shouldn't be. For myself, if en-wiki or Commons say that it is a "village and civil parish" then I tend to go with that; also if Googling for "<place-name> village" produces some sensible-looking hits using the word village with the place-name (as opposed to pages that are just auto-generated search-engine fodder), then I go with that. Otherwise, if there are coordinates, I click through to an Ordnance Survey map and see whether it looks like it has many houses.
Ultimately what we need is a reliable external source on how to classify it. I am not sure whether (for the UK) the census or the ONS has this. I think the Ordnance Survey identifies hamlets in its database (sometimes linked through a TOID (P3120) property, with an ID-number starting with "4"); but I am not sure that we can legally harvest and re-use that on a systematic basis.
At some point this is something that does need to be checked over systematically. But for the moment I've just been doing my best on an item-by-item basis. Jheald (talk) 12:19, 16 September 2017 (UTC)
Wikidata does not seem to have properties to support the distinction between an incorporated village, found in the northeastern United States, and a "village" in ordinary speech, which might or might not have any government organization associated with it. In the northeastern US people often call a certain area, such as Hydeville, Vermont (Q30624489), a "village" in casual conversation, but a "hamlet" when discussing government matters such as elections, because Hydeville has no government of any kind associated with it. In that area, several states issue charters to incorporated villages which do exercise government functions. I can say that because I'm a member of the Board of Civil Authority of the containing town, Castleton (Q1049714). Jc3s5h (talk) 12:37, 16 September 2017 (UTC)
A good approach here is to create specific item types for local concepts like this - eg village municipality of Quebec (Q27676420) for a specific class of village in Quebec, or city of the United States (Q1093829) for US "cities" (in the administrative sense, not the common one), with liberal uses of subclasses to tie everything together. Of course, creating enough items for all the admin types can get very time-consuming... Andrew Gray (talk) 16:31, 16 September 2017 (UTC)
Many thanks for some interesting input. On the village/hamlet question, given that there doesn't seem to be a globally applicable definition of the difference, and no systematic way of reviewing those that are in the system (my count shows 293,000+ instances of village (Q532)), it seems an acceptable approach is to classify any settlement smaller than a town as a village, even if it may only have a few houses. But I'm happy to be corrected here? Thanks again. – The preceding unsigned comment was added by JerryL2017 (talk • contribs) at 08:10, 17 September 2017‎‎ (UTC).
There is an old definition of hamlet in the UK that it is any small settlement that does not have a church, or more broadly, any services, such as a shop, public house, or post office. Typically in the UK, when ecclesiastical parishes were effectively the lowest level of administration, this would mean that the settlement in a parish that had the church would be the village, and all other settlements would be regarded as hamlets, regardless of size of population etc. However, this definition doesn't really take into account changes in demographics and the move towards the use of civil parishes as the administrative area, most of which were based on the older ecclesiastical parishes, but sometimes later merged into bigger civil parishes.
As Jheald stated, the Ordnance Survey (OS) classifies any Named Place (TOID (P3120) property, with an ID number starting with "4") with a field called "Populated Places", which can be city, town, village, or hamlet. Unfortunately, the links to the ontology of the data don't work, so we can't see what their definitions of the terms are... There do seem to be contradictions with the older definition. For example, Lansallos (Q1762721), a village and former civil parish, is identified by the OS as a hamlet. You could argue that it should be marked as a village until some unknown point in the past, when it came to be regarded as a hamlet...
Interesting point from Jheald on using OS data. Most of it is from OS OpenData, which is free to use, but you should acknowledge the source with "Contains OS data © Crown copyright and database right (year)". Does this mean that every statement of coordinates, area, adjoining areas etc. that we use from the OS should be acknowledged? If so, how? Robevans123 (talk) 11:14, 17 September 2017 (UTC)

OGL licence for data

Even though the data is there in the OS Open Data website, and there are several civil parishes we don't have co-ordinates for, I have been cautious about extracting it from the OS, because I am not sure what the answers are to the above licensing questions. (So eg instead I have averaged coordinates for items located in the parish, or looked the place up on Streetmap.co.uk which has quite a nice facility for reading off GPS coordinates for a point on the map). But I think these are questions we need to think about, and perhaps sooner than later. In particular:

  • Is the Open Government Licence viral -- if somebody in turn reuses the coordinates from Wikidata, would they too need to say that their report "Contains OS Data"? Reading the OGL, the question doesn't seem to be spelt out. But if there were such a passed-on requirement, it would not be compatible with our headline CC0 licence for Wikidata.
  • If we can use OGL data, how do we implement the credit line. Do we need a new item-valued property "credit line for data" ? Since they are specifically requiring a credit line, simply referencing the source of the data is presumably not enough.
  • If we do use such data, does Wikidata need to have a project-level page with the credit line on it (and any other similar credit lines), presumably prominently linked to, perhaps also giving a count of the number of statements with the credit line & a query to list them.
  • Even if that were legally possible (ie OGL not viral), is it something we would actually want to do? Jheald (talk) 09:06, 18 September 2017 (UTC)
Pinging @Jdforrester (WMF): to see whether he has any input on this. Jheald (talk) 14:07, 18 September 2017 (UTC)
@Jheald: Hey there. I'm not sure I'm the best person to ask with regard to this (I'm not a lawyer after all), and this is mostly asking me in my personal capacity as something I used to work on before I was at the Wikimedia Foundation, but my initial thought would be that yes, the OGL requirement to "acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence"[OGL 1] would be incompatible with CC0, sorry. You should seek better-informed advice than mine, however.
  1. http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
  Jdforrester (WMF) (talk) 15:31, 18 September 2017 (UTC)
    Thanks very much @Jdforrester (WMF): for taking the time out to give your thought on this.
    It is certainly clear that we would need to provide that attribution statement, per the license. But it's not quite so clear (at least not to me), whether any obligation is placed on us to similarly require such attribution by anyone who obtains and uses that information point from us. Maybe I'm missing something, but the licence doesn't seem to impose that obligation. And if there was no downstream obligation on people reusing the data from us, then it would be compatible with CC0. On the other hand, I've felt sufficient discomfort about the question that I haven't used any OS data so far (apart from providing links to them, which I think is fair game).
    But you're probably right, we probably need to ask the lawyers. And maybe whoever owns the OGL, if we can put together a case for why requiring induced attribution on further downstream use would be counterproductive. Jheald (talk) 16:02, 18 September 2017 (UTC)
    I've spent some time reading about the OGL and am now pretty sure that:
    • there is no requirement to impose the license conditions on re-users of the data - so using the data within Wikidata is compatible with our own CC0 license.
    • there is a requirement to acknowledge the source - this can be done on a single page or file somewhere reasonably accessible. If multiple OGL data sets are used it is possible to merge the acknowledgements into one statement, such as "Contains public sector information licensed under the Open Government Licence v3.0".
    There are some further restrictions (no personal data, government logos, military insignia, individual's copyright etc..), and also you must not make any claim that you are endorsed by the government or that your data is current when it is not etc.
    I think the rationale behind the OGL is basically that the OGL data is out there for anyone to use with as little restriction as possible, but if you use it, people should be able to see when and where you got it from and ideally be able to access the original, or up-to-date, information easily.
    So yes, I agree with Jheald that Wikidata does need "a project-level page with the credit line on it (and any other similar credit lines)" - perhaps a page labelled "Credits" at the same level as the links to "Privacy policy", "About Wikidata", "Disclaimers" etc. shown at the bottom of the main page.
    The OGL also suggests that it is good practice to "maintain a record or list of sources and attributions in another file or location, if it is not practical to include these prominently within your product". I think this would be a good idea anyway, to record details of databases that have been incorporated into Wikidata. For example, we have a lot of data on scheduled monuments and listed buildings in the UK that was imported from English Heritage (Q936287)/Historic England (Q19604421) (there was an operational split in 2015) in 2014 and expanded in 2016. Since their database is regularly updated, it is useful to know what was added and when, so we can occasionally look for updates.
    Finally, I read somewhere that although a database may be copyright, the individual items in the database are not. So I think it is possible to occasionally add civil parish coordinates from Ordnance Survey (Q548721) and such use would be regarded as insubstantial. If someone took a copy of the OS OpenData data set and extracted all the civil parish coordinates and added them to Wikidata we would certainly need to include an OGL acknowledgement.
    Disclaimer - I am not a legal expert - all the above is based on reading of available information. We really should get legal advice from the Foundation on this matter...Robevans123 (talk) 11:37, 19 September 2017 (UTC)

    Here's a forum post about coordinate conversion. It isn't clear from the discussion in this sub-thread what kind of coordinates are being obtained from OS Open Data. Wikidata uses WGS84. I don't know about the UK, but in the US, copyright does not apply to facts, it applies to how facts are expressed. If the way the facts were expressed were different, because they are in a different coordinate system, would that avoid the OS copyright? Jc3s5h (talk) 14:32, 19 September 2017 (UTC)

    According to the EU en:Database Directive, "a person infringes a database right if they extract or re-utilise all or a substantial part of the contents of a protected database without the consent of the owner." The key word here is probably "re-utilise". For what it's worth, the UK Ordnance Survey provides latitudes and longitudes based on WGS84 as well as UK national grid coordinates based on OSGB 36. Extraction of the OSGB coordinates in the first place would count as extraction. Even though there seems to be no explicit use of the phrase "derivative work" in the directive, conversion to WGS84 would I fear nevertheless count as "re-utilisation".
    And -- contra what I wrote above -- I think the phrase "re-utilisation" probably does sink us on the third-party re-use question as well. Yes, there seem to be no explicit licensing conditions the OGL tells us we have to impose on re-users of our data. But if those re-users are re-utilising information ultimately derived from an OGL database, I think they too are caught by the "re-utilisation" phrase of the database directive, and therefore they too can only legitimately re-use the information if they comply with the terms of the OGL. And therefore we cannot say honestly, as CC0 would require, that the information is freely re-usable, no strings attached.
    The OGL doesn't oblige us to impose downstream re-use conditions; but it does oblige us to advertise that our data contains OGL data, to make anyone reusing a substantial part of it reasonably aware that there was database right claimed in some of the data, and they would need to comply with the OGL if they in turn were to reuse a substantial part of the OGL originated material.
    So sadly I don't think I can legitimately import land-area information from the ONS db file for all 12,000+ entities we have GSS code (2011) (P836) for. Instead, I think we do have to follow the summary section of WMF Legal's page on database rights over at Meta, to keep "extraction and use of data" from such sources to a minimum.
    But I will drop a note on the talk page there, to see if someone from there can look over this discussion and tell us whether they agree. Jheald (talk) 16:42, 19 September 2017 (UTC)
    Actually, CC0 just waives the rights of the owners/contributors (the affirmers) of the work (Wikidata) to be recognised and acknowledged. But the affirmer also disclaims any responsibility for clearing rights of other persons that may apply to the Work (section 4c of the full (legal code) text of the CC0). Effectively, third party users are responsible for their own actions - if we've acknowledged that geo-spatial information in Wikidata includes information from Ordnance Survey and has been licensed under an OGL, then someone takes all of that OS data and re-uses it in a product without adding an OGL acknowledgment - that is their problem (and also would not be a good business choice since they could get guaranteed up-to-date info from the OS for free anyway). So although Wikidata is freely available, it doesn't mean that it is necessarily a "do what you want with it, no strings attached" resource.
    I do believe that WMF Legal's advice to keep "extraction and use of data" to a minimum is valid for databases that people want to protect, but I don't think it fully recognises that at least some UK government departments really want you to use their data! Robevans123 (talk) 18:25, 19 September 2017 (UTC)
    Hi, I co-wrote the Wikilegal piece quoted above, and I'd be happy to help here. I understand that the question was raised whether the OGL (a license I was unfamiliar with) is viral (which I think it is not) and whether the OGL allows us to incorporate licensed data into Wikidata (which I think it does). But I need to re-think this, and it's quite late now, so maybe bug me again tomorrow :-) --Gnom (talk) 20:28, 19 September 2017 (UTC)
    @Gnom: Any more thoughts? How should we acknowledge the data source? With a credits page? Cheers Robevans123 (talk) 19:52, 25 September 2017 (UTC)
    Thank you for reminding me, Robevans123. I think we can incorporate the data into Wikidata simply by linking to the appropriate government page under "source", and that's it. --Gnom (talk) 21:17, 26 September 2017 (UTC)
    @Gnom: Sorry to pick this up so late, I meant to get back to you on this earlier. But when eg the UK Ordnance Survey specify that under the OGL they want the specific attribution text "Contains OS data © Crown copyright and database right (year)", surely we would have to give them that attribution? And if so, if the statement in that attribution is true for the data, how is that compatible with saying that the data is available CC0 ? Jheald (talk) 07:45, 2 October 2017 (UTC)
    @Jheald: Good questions. I think this FAQ answers some of the questions. CC0 only gives away our (wikidatians?) rights to be recognised. We can't give away other people's rights, but we can include their data (provided that they are allowing their data to be used elsewhere) and ideally we should acknowledge them as the source and copyright holders.
    The question of acknowledgement is less clear - I haven't yet found any good examples - there is nothing quite like Wikidata! Most seem to favour a simple statement as you've given above, and UK Ordnance Survey seem happy with that.
    Gnom is suggesting that using "source" is sufficient. I'm not sure whether he means imported from Wikimedia project (P143) or source website for the property (P1896), or either/or, or both. It would also be useful to add retrieved (P813) so people can easily get an idea of how old the data is (better than having to trawl through the history page).
    I would favour both:
    I'm pretty sure that such a "belt and braces" approach ensures that we meet the requirements of the Open Government License, and also provides users with enough information, so they are aware that there is copyright on some of the data, and can also go and find the latest and most accurate version of the data from the original source.
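    As a concrete illustration of the "belt and braces" reference, a statement could be entered with its reference in QuickStatements v1 syntax, where reference properties are tab-separated fields prefixed with S (the item here is the Wikidata sandbox, and the value, URL and date are all hypothetical examples, not real OS data):

    ```
    Q4115189	P2046	1234	S854	"https://www.ordnancesurvey.co.uk/opendata/example"	S813	+2017-10-02T00:00:00Z/11
    ```

    Note that the OGL attribution statement itself ("Contains OS data © Crown copyright and database right (year)") would still need to live on a project-level credits page, since per-statement references alone do not display it.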
    Incidentally, I didn't even realise we had a copyright page until it was mentioned in a later discussion on this page. It should probably be in a slightly more prominent place. Robevans123 (talk) 12:13, 2 October 2017 (UTC)
    I am deeply uncomfortable about this. Even if you are correct, and extensive use of OGL data may be compatible with the letter of the CC0 licence, it seems to me to run strongly against the spirit of it, and against our purpose in warranting that the data here is CC0. To me, that carries the implication (in spirit, if it is true that it is not in letter), that to the best of our awareness there is no copyright or database right held on the data or any extensive portion of it by anyone -- just as all copyrights in Commons images are required to be released or licensed under open licences.
    I would really value hearing specifically from User:Gnom why he thinks that OGL is compatible with Wikidata and CC0. Jheald (talk) 12:23, 2 October 2017 (UTC)
    I think the spirit of the license is defined in the letter of the license. Surely that is the whole point of a license. See section 1, with regard to sub item ii of the legal text.
    I think there is a deep misunderstanding of what CC0 means - all we are doing is waiving our rights as contributors. Almost all of the data is derived from other sources, some in copyright (with permission or fair use), some out of copyright, some in public domain, much of it released by others using Creative Commons licenses. All we are releasing is our contributions (which is primarily how the data is structured and linked through statements and properties etc).
    Take a look at the section beginning "Even though you are not making any warranties of copyright ownership...". In fact, using OGL data extensively does not seem to be any different to using data created from bots crawling through Wikipedia (using data that has been released under a CC BY-SA 3.0 license). Accordingly, we should also include an acknowledgement of that source (and other Wikipedias) on the copyright page.
    We should also acknowledge Wikimedia Commons as a source as well - the images, associated text, and the structure of the categories are all released under CC-BY-SA licenses...
    We not only want to avoid Wikidata getting sued, but we also don't want data reusers to get sued. Even within our own usage of the data, we publish truthy dumps that strip away information contained in qualifiers, and thus also any credits that are given in those qualifiers. ChristianKl (talk) 17:11, 11 October 2017 (UTC)
    "Actually, CC0 just waives the rights of the owners/contributors (the affirmers) of the work (Wikidata) to be recognised and acknowledged (...) then someone takes all of that OS data and re-uses it in a product without adding an OGL acknowledgment - that is their problem" - so you claim that users of Wikidata are supposed to magically guess what kind of licences apply to given statements? How are the requirements for attribution and the like documented in Wikidata entries? Mateusz Konieczny (talk) 09:52, 6 October 2017 (UTC)
    Just to avoid confusion - the licence in question (Open Government Licence for public sector information, version 3) says the following in the most relevant section:
    You are free to:
    • copy, publish, distribute and transmit the Information;
    • adapt the Information;
    • exploit the Information commercially and non-commercially for example, by combining it with other Information, or by including it in your own product or application.
    You must (where you do any of the above):
    • acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence; [...]
    This means that when I want to incorporate a database licenced under this licence into Wikidata, I am free to do so under one (and only one) condition:
    • I must acknowledge the source of the database by
    • linking to the attribution statement provided, if any, and
    • where possible(!) providing a link to the licence.
    Since Wikidata allows me to add links and name authorities in connection to any statement to indicate a source, I can do that and thereby fulfill the requirement of the licence.
    Once I have successfully acknowledged the source of the data, I am set. Nothing else needs to be done or ever cared about. End of story.
    Again, to be extra clear: This licence is not viral. This means that the fact that Wikidata is licenced under CC0 does not, and cannot, pose any problem at all. The only case in which this would pose a problem is if the licence were viral. Since the licence is, again, not viral, the CC0 attribution of Wikidata is just fine.
    If the authors of this licence had wanted to create a viral licence, they could have done that. Since they haven't, the licence is not viral.
    I am welcome to any comments or questions regarding this reasoning. Please also let me know if my reasoning is flawed or incomplete. Thanks, --Gnom (talk) 17:16, 2 October 2017 (UTC)
    Maybe I'm missing something, but on one side we have a site that says "use my data but provide attribution", while on the other side we have Wikidata, which says "use my data and do whatever you want, no attribution is necessary". The result is « use wikidata for "cleaning" some type of license ». I don't think that in this manner we respect the license. --ValterVB (talk) 18:10, 2 October 2017 (UTC)
    @ValterVB: could you clarify what you mean by « use wikidata for "cleaning" some type of license »? I'm not sure what this means... Which license do you think we are not respecting - the OGL or CC0? Robevans123 (talk) 00:38, 3 October 2017 (UTC)
    @Robevans123: Example: I want to use some important data from site X whose license says "provide attribution", but I don't want to provide attribution; I add the data to Wikidata with attribution, so I can use the data without attribution in my work, and if someone says something I can reply: "Wikidata is CC0, I don't have to provide attribution". --ValterVB (talk) 06:33, 3 October 2017 (UTC)
    @ValterVB: - thanks - that's a great example and clarifies your concerns. I'll reword it slightly, and expand it, and try to explain what the copyright issues are.
    1. Let us say that Angharad creates a database, entirely of her own making and releases it to be freely used with a CC-BY license (CC-BY licenses do not include a requirement for re-attribution - the license is not "viral").
    2. Bronwen sees that the data may be useful to her work, and copies it into the Dilwyn Knowledge Base, which is an open CC0 resource hosted on servers run by the Eurwen Foundation.
    3. Bronwen provides attribution to Angharad's work on the Dilwyn Knowledge Base.
    4. Bronwen copies Angharad's work into her own product or research and releases it without acknowledging Angharad's work.
    The result is that Bronwen is in breach of copyright of Angharad's work - it is no defence to say that she got the data from the Dilwyn Knowledge Base which is available freely under a CC0 license. A CC0 license releases data purely "as is" with no guarantee that it is complete or accurate, and more importantly, the legal text specifically says "a work made available under CC0 may be protected by copyright and related or neighboring rights" (such as the moral rights/copyright of the original author - in this case Angharad).
    Angharad would be able to sue Bronwen for breach of copyright, and since Bronwen has deliberately tried to evade copyright I would expect the damages due would be very high!
    The question is why Bronwen would do this. Even if Bronwen was creating a commercial product, let's call it Ffion's Ultimate Normalizer (FUN), she would not have to pay anything to Angharad if she just gave Angharad the correct attribution somewhere in FUN.
    Incidentally, Angharad would not be able to take action against the Dilwyn Knowledge Base or the Eurwen Foundation.
    Let's say that a third developer, called Cerys, takes Angharad's work and includes it in a product. Again, Cerys is in breach of copyright of Angharad's work - yet again it is no defence to say that she got the data from the Dilwyn Knowledge Base (because the CC0 license offers no warranties, and also says that it "disclaims responsibility for clearing rights of other persons that may apply to the work"). Cerys is required to perform due diligence on the source of the data she is using if she's using the data in a published work or product.
    Hope all this helps. Robevans123 (talk) 19:07, 3 October 2017 (UTC)
    Thanks, I understand your explanation but I don't think that CC0 works the way you describe. The person who says "it is CC0" implicitly says "you can use this data and do whatever you want with it", with no restriction on the use of the data. In your example Bronwen does an illegal thing: she takes "something with a licence" and transforms it into "something without a licence", or rather "something that you can use as you like without constraints". Naturally this is only my idea and I'm not a lawyer :-) . --ValterVB (talk) 19:52, 3 October 2017 (UTC)
    Addendum: This is an important point and I think that an explicit paper from the legal team is fundamental: "Can I add data under CC by in Wikidata if I add citation?" If the official answer is yes, probably we can add much more data than now, but in this case I don't understand the CC0 license. --ValterVB (talk) 20:06, 3 October 2017 (UTC)
    To further clarify, not only is Bronwen doing an illegal (and deliberate) thing, but so is Cerys (although not deliberate, but careless).
    Yes - we definitely need some explicit text from the WMF legal team. I think the answer to the first question: "Can I add data under CC by in Wikidata if I add citation?" [assuming citation is the same as reference] is:
    Yes - if you are creating one statement for one item
    No - if you are importing all (or a substantial part) of a dataset
    But there should be a second question: "Can I add large parts or all of a data set under CC by in Wikidata if I add a copyright statement (such as 'Contains data from Angharad's Research © copyright and database rights, Angharad Evans, 2017, (link to license), (link to original work)?", to which I think the answer is Yes.
    I think CC0 is as I've described it, but I think many people who work on Wikidata think it is something different. I provided a link to the Creative Commons faq on CC0 earlier, but here it is again. One important thing from the faq is that "CC0 does not affect other persons’ rights in the work". Anyone who says "is CC0" means that "you can use this data and do anything you want with it without restriction" is wrong.
    However, it is certainly possible to create a knowledge base that only includes data in the public domain and mass donations where the donors release their own data under a CC0 license. You can then license such a knowledge base under a CC0 license, and also say that "you can use this data and do anything you want with it without restriction", and that would be correct.
    However, as soon as you add something where the copyright is held by someone else (which you should attribute properly) you can still release under CC0 but you can no longer truthfully say "you can use this data and do anything you want with it without restriction".
    As an aside, the occasional extraction of one parameter from a database can be regarded as fair use, but the wholesale extraction of data and structure from a copyright database (such as CC-BY or OGL) would require attribution, and that is the point where you can no longer truthfully say "you can use this data and do anything you want with it without restriction". Wikidata has passed that point sometime ago. I know of one mass import in 2014 that would certainly pass that point, but I suspect there are many others, and quite possibly earlier... I think there are three options:
    • If Wikidata wants to truthfully say "you can use this data and do anything you want with it without restriction", we probably need to remove an awful lot of statements and items.
    • If Wikidata wants to expand and continue to include more data released under CC-BY and OGL, we should stop saying things like "you can use this data and do anything you want with it without restriction". We would need to add attribution statements somewhere, but we could still release under a CC0 license (but without making exaggerated claims about what you can do with it).
    • A third option might be to also include data from CC-BY-SA sources, and only release or show that data under a share alike license. Technically possible, but a nightmare to administer...
    I think I favour the second option - there is a mass of CC-BY and OGL data out there, which would be even more useful when combined with the interlinking of data that Wikidata provides. However, if the Wikidata community wants to keep saying "you can use this data and do anything you want with it without restriction", then I'll help with the cleanup... Robevans123 (talk) 00:52, 4 October 2017 (UTC)
    Hi, I'm not sure if Robevans123 is correct in his example above. If Cerys takes Angharad's work from Wikidata and includes it in a product, is Cerys really in breach of copyright of Angharad's work? --Gnom (talk) 13:30, 4 October 2017 (UTC)
    @Gnom: I'm pretty sure that I am correct. The only absolute defence against copyright infringement is if someone has created something original and totally independent of another person's efforts. There are guidelines that allow re-use to some extent (fair use/research/non-commercial educational works etc), but even unintentional use is copyright infringement - the way it is treated in law may be different; Bronwen's actions may breach criminal law, Cerys's action may be treated under civil law. In the case that I described Angharad's first recourse would be a polite message to Cerys asking for acknowledgement, which she would probably be happy to do. If, for some strange reason, Cerys refused to do this, then Angharad could take civil action to force acknowledgement (or removal of data from Cerys's product), and seek legal costs.
    Many years ago George Harrison (Q2643) was successfully sued for royalties over the song My Sweet Lord (Q1476003) in which he, unintentionally and unconsciously, copied much of the theme of He's So Fine (Q5688613) by The Chiffons (Q1386995)... Robevans123 (talk) 21:08, 12 October 2017 (UTC)
    Hi Robevans123, please note that we are talking about database law, and not about music: Doesn't the incorporation of the previously protected data into Wikidata "lift" the copyright protection, since the individual data points (now incorporated in Wikidata) cannot be copyrighted? --Gnom (talk) 06:04, 13 October 2017 (UTC)
    Back to Gnom's post. Thanks for clarifying the fact that we can, if we choose to, add any data covered by an Open Government License into Wikidata. But I still need some clarity on exactly how we provide sufficient and correct acknowledgement. As JHeald pointed out, in the case of data provided by Ordnance Survey (the Information Providers in the OGL), the attribution statement they require is "Contains OS data © Crown copyright and database rights (year)". I don't think "simply by linking to the appropriate government page under "source"" is sufficient. It really is a matter of where we put that statement (probably the Wikidata copyright page, if it was a bit more prominent), or possibly a separate "Sources" page, again reasonably prominent (a link from the main page).Robevans123 (talk) 00:38, 3 October 2017 (UTC)

    honorary degree: is it an award or an academic degree

    I am wondering how others have been handling honorary degrees. Are they seen as an academic degree (P512) and somehow qualified as being "honorary"; or are they considered an award received (P166)?  — billinghurst sDrewth 04:43, 13 October 2017 (UTC)

    The latter. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 05:45, 13 October 2017 (UTC)
    Then we need to fix the constraints somehow, as the constraint system complains that this use of the property is not associated with awards; and I could not see a means to use it differently when conferred academically versus as an honorary award.  — billinghurst sDrewth 10:48, 13 October 2017 (UTC)

    Navigation tool (maps, etc.) for navigation by navigator

    We have breadcrumb (Q846205). What are the names of the activity, the tools and the object involved when a person uses breadcrumb (Q846205) to go through the links? --Fractaler (talk) 07:14, 13 October 2017 (UTC)

    Structured Commons focus group - please consider joining!

    Hello all! For the upcoming development of Structured Data on Commons, I notice that it would be very helpful for me (and the entire Structured Commons team) to be able to work with a group of dedicated community members, from Commons and from Wikidata, whom we can approach for input regularly. Consider it a group of people who are OK to be pinged every now and then with (smaller) requests for feedback (not for larger decision-making, which should take place with the Commons community at large). So I would like to experiment with a focus group (see more info here). We can figure out how we can work best as we go along! I'd very much appreciate it if people who are very interested in Structured Commons would consider signing up. Many thanks! SandraF (WMF) (talk) 12:03, 13 October 2017 (UTC)

    Idea of a query example

    Is it possible to get this Wikipedia list – List of top international rankings by country – from a Wikidata query?  – The preceding unsigned comment was added by Freedatum (talk • contribs) at 13. 10. 2017, 12:51‎ (UTC).

    I don't think so; the data in the list is relatively unstructured and not easily modelled with Wikidata. ChristianKl (talk) 14:14, 13 October 2017 (UTC)
    I think one approach would be to make the wikidata items for each of the individual ranking lists into instances of some sort of "international ranking list", then maybe add the values in the ranking for all the countries via a property like numeric value (P1181) with country (P17) qualifier, and then I expect you could generate something like this list via a SPARQL query. I'm not sure it's worth all that effort though... ArthurPSmith (talk) 16:04, 13 October 2017 (UTC)
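    If that modelling were adopted, a query could look roughly like the following sketch (the "international ranking list" class Qxxxxx does not exist, and the P1181-with-P17-qualifier modelling is the assumption described above, not current practice):

    ```sparql
    # Hypothetical: list each ranking with its per-country values, assuming
    # numeric value (P1181) statements qualified by country (P17)
    SELECT ?ranking ?rankingLabel ?countryLabel ?value WHERE {
      ?ranking wdt:P31 wd:Qxxxxx .   # instance of "international ranking list"
      ?ranking p:P1181 ?st .
      ?st ps:P1181 ?value ;
          pq:P17 ?country .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    ORDER BY ?rankingLabel DESC(?value)
    ```

    Whether maintaining hundreds of per-country values as qualifiers on each ranking item would be sustainable is another question.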

    Q24702432

    Who is Q24702432? Visite fortuitement prolongée (talk) 20:12, 16 October 2017 (UTC)

    @Ipigott: please check this; which you created. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:22, 16 October 2017 (UTC)
    I strongly doubt it is notable.--Ymblanter (talk) 20:27, 16 October 2017 (UTC)
    I certainly did not intend to create this item. I don't know how it came about. It should be deleted.--Ipigott (talk) 08:57, 17 October 2017 (UTC)
      Done--Ymblanter (talk) 09:06, 17 October 2017 (UTC)
    This section was archived on a request by: Matěj Suchánek (talk) 15:27, 19 October 2017 (UTC)

    Protecting country names

    Following from Wikidata:Project_chat/Archive/2017/09#People_changing_country_names, it looks like the amount of this type of vandalism seems to be increasing (some examples from the last 24 hours: [10], [11], [12], for more see [13]) and it's not always caught immediately ([14] took 7 hours). Country names are very high profile, and vandalism for them affects Wikipedias using Wikidata information.

    Can we protect the labels for specific entries - and if so, can that be applied to country names asap? Thanks. Mike Peel (talk) 21:58, 6 October 2017 (UTC)

    We can only protect entire items, not parts of it (such as labels only). If you feel that protection is necessary for a specific item due to excessive vandalism, you can request it at the Administrators' noticeboard. —MisterSynergy (talk) 22:08, 6 October 2017 (UTC)
    Ah, that's not good. What I'm after is the equivalent of move-protection on Wikipedias - people should still be able to edit the content, but not the label. Does this need a technical request to the developers, or is it something that needs community consensus (or both)? Thanks. Mike Peel (talk) 22:57, 6 October 2017 (UTC)
    Labels are content of items just as sitelinks and statements are, thus it would definitely require technical changes by the developers. However, given the fact that this is a rather drastic restriction of the openness of this project, I would also suggest asking for community consensus. An alternative without technical changes would be to define a policy for permanently semi-protected items via community consensus. The core items are by now in fact very robust, but on the other hand this project lives from the idea that anyone can edit. Just as Wikipedias do. —MisterSynergy (talk) 05:43, 7 October 2017 (UTC)
    From watching the fall-out of problems caused by vandals/edit warriors changing the names & values of Wikidata objects on the other side, I'd say this is the point of the spear of any content wars that either have spilled over from a given Wikipedia project -- or will. Thus I believe it is reasonable to request a technical change so that specific values of a Wikidata object can be protected. Better to have advance notice & flexibility for Wikidata admins to respond to bad-faith edits, than to find out a data object was suborned 6 months after the fact. -- Llywrch (talk) 21:08, 8 October 2017 (UTC)
    I guess that with special:abusefilter it could be possible to protect just the labels and descriptions of the country names, in cases where we know that all such labels are correct. However, country names are a corner case where this can be done with AbuseFilter. --Zache (talk) 01:56, 9 October 2017 (UTC)
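    A sketch of what such a filter might look like (the item list, the summary matching and the group check below are all assumptions, not a tested filter; Wikibase API edits record the module name, e.g. wbsetlabel, in the edit summary):

    ```
    /* Hypothetical AbuseFilter rule: stop new/anonymous users from editing
       labels or descriptions on a hand-maintained list of country items */
    page_namespace == 0 &
    page_title in ["Q30", "Q145", "Q183"] &
    contains_any(summary, "wbsetlabel", "wbsetdescription") &
    !("autoconfirmed" in user_groups)
    ```

    The obvious maintenance cost is the hard-coded item list, which would need community agreement and occasional updates.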
    @Zache: That sounds like a good option, is there a place where we can propose that? BTW, after some more investigation, it looks like a good chunk of these are coming from mobile traffic - where, it seems, you can't edit statements, you can only edit labels/descriptions (at the top of the page) and sitelinks (at the bottom). So maybe it's not a surprise that there's more vandalism of the names of countries than any other info in their entries. Thanks. Mike Peel (talk) 18:58, 13 October 2017 (UTC)

    References to wmflabs?

    On enwp, @Alsee: asked why Wikidata has references to wmflabs. Beyond seeing that Wiki Loves Monuments is involved, this has me stumped - can anyone provide some insight into this? Alsee's comment in full was "Regarding wmflabs refs: I spent a long time trying to figure out how to use the regular search box to search for wmflabs in wikidata refs, but I couldn't find any way to do so. Am I missing something simple, or is this content really not indexed?!? I finally resorted to teaching myself the wikidata database query language to search refs that way. That's crazy. Here's a query that pulls out some wmflabs refs. Anyway, the large majority of wmflabs hits go to tools.wmflabs.org/heritage. Those are all refs to content extracted from Wikipedia. I also found isolated instances of circular refs to wikidata itself via tools.wmflabs.org/reasonator and tools.wmflabs.org/scholia. There's also tools.wmflabs.org/whois and tools.wmflabs.org/geohack which are god-awful ways to effectively ref external sources." (I'm also trying to explain Wikidata's reference system to Alsee on enwp, help/insight would be appreciated.) Thanks. Mike Peel (talk) 20:30, 11 October 2017 (UTC)
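    For context, a query along roughly these lines (a sketch, not necessarily Alsee's exact one) pulls out statements whose reference URL (P854) points at wmflabs:

    ```sparql
    # Sketch: statements carrying a reference URL (P854) on wmflabs.org
    SELECT ?item ?prop ?url WHERE {
      ?item ?prop ?statement .
      ?statement prov:wasDerivedFrom ?ref .
      ?ref pr:P854 ?url .
      FILTER(CONTAINS(STR(?url), "wmflabs.org"))
    }
    LIMIT 100
    ```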

    @Mike Peel:
    it indeed seems an awful way to make a reference to ... any info.
    your query would be much more useful if it was possible to see on which items this is used, and if a bot or a human contributor added those.
    I tried to tweak it, but I'm no good at those. Could you ? --Hsarrazin (talk) 12:07, 12 October 2017 (UTC)
    ok, finally found one : Gänsehäufel (Q23256) - it seems property Cultural heritage database in Austria ObjektID (P2951) indeed has a formatter URL (P1630) pointing to "https://tools.wmflabs.org/denkmalliste/index.php?action=EinzelID&ID=" - but it is an id, in fact. The whole address can easily be changed by changing P1630, if the info were accessible elsewhere on the web from the same ID.
    in fact, Commons too, uses the same database like commons:Category:Gänsehäufel - guess it's a workaround because the official database for Heritage monuments in Austria is not publicly accessible, see original discussion. --Hsarrazin (talk) 12:19, 12 October 2017 (UTC)
    500k references within ~30 seconds, somewhat easy to find and investigate. The total number is unknown to me. Special:LinkSearch does not help here (no namespace filter available), but one might want to run an SQL query against the externallinks table of the wikidatawiki_p database. —MisterSynergy (talk) 12:33, 12 October 2017 (UTC)
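    Such an SQL query might look like the following untested sketch (it assumes the 2017-era MediaWiki externallinks schema on the wikidatawiki_p replica, where el_index stores the URL with the hostname reversed):

    ```sql
    -- Sketch: count recorded external links to wmflabs.org per target URL.
    -- el_index holds URLs in reversed-host form, e.g.
    -- "https://org.wmflabs.tools./heritage/..."
    SELECT el_to, COUNT(*) AS n
    FROM externallinks
    WHERE el_index LIKE 'https://org.wmflabs.tools.%'
    GROUP BY el_to
    ORDER BY n DESC
    LIMIT 50;
    ```

    An http variant of the LIKE pattern would be needed as well to catch older links.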
    @MisterSynergy: Thanks, that query definitely makes it a lot easier to see what's going on! Thanks. Mike Peel (talk) 17:41, 12 October 2017 (UTC)
    I have myself added quite a lot of references to wmflabs (e.g. https://www.wikidata.org/wiki/Q1622083#P3761 ). The references I have added point to https://tools.wmflabs.org/whois/gateway.py , which does not draw its data from any Wikimedia project, but from the WHOIS system. Let me know if this was still problematic. − Pintoch (talk) 12:38, 12 October 2017 (UTC)
    @Pintoch: In those cases, why not reference arin directly? There's a link to that page from the wmflabs page. Thanks. Mike Peel (talk) 17:41, 12 October 2017 (UTC)
    @Mike Peel: that would only work for ARIN-issued IP addresses, whereas this gateway works for everything… − Pintoch (talk) 19:18, 12 October 2017 (UTC)
    True, but the gateway generally links to the IP address issuer's page describing the IP address, so can't you just follow that link each time and use that? Thanks. Mike Peel (talk) 19:26, 12 October 2017 (UTC)
    In principle, yes of course! In practice, this means one more slow HTTP request in a context where they are expensive. − Pintoch (talk) 20:02, 12 October 2017 (UTC)
    @Pintoch: I'm not sure what you mean by "slow HTTP request ... expensive"? I wonder if this would be something a bot could do - look for cases of these links and try to replace them with the direct link. What do you think? Thanks. Mike Peel (talk) 19:00, 13 October 2017 (UTC)
    In general references to wmflabs are a poor way to indicate where the information actually came from. The topic really came up because the wmflabs.org/heritage refs are effectively an obscure synonym for "Imported from Wikipedia". On Wikipedia we have been considering filtering out information which is unsourced, or sourced as Imported from Wikipedia. The wmflabs/heritage references effectively cloak the source, and are bypassing that filter. I'm also surprised to discover the scale of the issue. Wikidata has over 1.1 million wmflabs.org/heritage refs, at which point the search times out. Alsee (talk) 21:31, 12 October 2017 (UTC)

    American Civil War

    Do we have anyone with expertise on or interest in the American Civil War? We have a number of items for individual Union Army Divisions, Departments and Districts (Q7885358), but none of these have <instance of> as far as I can tell. It would be great for an expert to build out at least a simple hierarchy of items for the Union and Confederate armies so these things can be categorized. - PKM (talk) 20:55, 11 October 2017 (UTC)

    I've posted a note at en.Wikipedia's Military History project, asking for help. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:25, 12 October 2017 (UTC)
    @PKM: I am a project coordinator of WikiProject Military history on the English Wikipedia. Could you please elaborate on the need, so that I can help you in the best manner possible? --Krishna Chaitanya Velaga (talk) 16:16, 12 October 2017 (UTC)
    @Pigsonthewing, Krishna Chaitanya Velaga: Thanks! I don't know how parts of armies are typically structured in Wikidata. We need classes and subclasses for things like Confederate Army of Kentucky (Q2917624): no description, Department of the Susquehanna (Q5260564): unit of the Union Army during the American Civil War and District of California (Q5283395): Union Army unit during the American Civil War. These units are related to specific geographical areas, and in the Union Army it seems that a district is part of a department. Union Army (Q1752901) itself has no <instance of>, although Confederate States Army (Q1125021) is <instance> of "army". We might want a hierarchy so that "Union Army" <has parts of the class> "?Department of the Union Army", and those subdivisions have classes for their parts. But if we don't want to build a class structure for individual armies, we could standardize on something like <instance of> "?command division" (of) "Union Army". So I'd like someone who works with armies in Wikidata to advise. - PKM (talk) 19:39, 12 October 2017 (UTC)
    @PKM: and @Pigsonthewing, Krishna Chaitanya Velaga: I think military unit (Q176799), as explained in its German description ("In the armed forces, a formation (Verband) is the organizational or temporarily limited grouping of several military units of battalion or regiment strength..."), is the correct <instance of>. It has already been used for Army of the Potomac (Q653089). Breg Pmt (talk) 18:39, 13 October 2017 (UTC)
    @Pmt: - military unit (Q176799) sounds good! Thanks for finding that! - PKM (talk) 19:29, 13 October 2017 (UTC)

    @PKM: CORRECTION The correct <instance of> should be major military unit (Q4005772): organizational level unit, usually regiment or larger; as the German description puts it, "the major formations (Großverbände) include, in ascending order, the brigade, the division, the corps or army corps, the army and the army group, as well as comparable temporary or permanent bodies of troops such as the theatre of war or, following Clausewitz, the theatre of operations." Breg Pmt (talk) 21:16, 13 October 2017 (UTC)

    Ah, okay! Thanks, I'll do those now. - PKM (talk) 23:54, 13 October 2017 (UTC)

    OTRS-permit

    Hi, in Finland government data is currently mostly CC-BY, which is not enough for Wikidata, which needs CC0. However, at least in some cases it is possible to ask for permission to store the data as CC0, and on Commons this is marked using an OTRS permit. How does this work in Wikidata, and are there examples of how it has been done before? --Zache (talk) 14:13, 13 October 2017 (UTC)

    To my knowledge there is no OTRS service at Wikidata right now. In this topic (German language, some weeks old) @Lydia Pintscher (WMDE) said that there is a WMF draft document about such a process (?), but she didn’t know about the progress and wanted to ask the legal team. I haven’t heard any news after that, though… —MisterSynergy (talk) 19:50, 13 October 2017 (UTC)

    Generational suffix

    In a June 2013 request for comment, Wikidatians discussed how to create statements about various name components, including generational suffixes. I'd like to state that Johann Michael Keller (Q1695557) has 'generational suffix' the Younger (Q19838173). Should I propose a new property, or use an existing name property with qualifier, or another strategy? Runner1928 (talk) 21:20, 13 October 2017 (UTC)

    new filters for reCh tool

    Patrolling recent changes is a cumbersome but necessary way to keep vandalism at a low level. To make patrolling easier, I created the reCh tool a while ago. With the tool you can, for example, display only edits in a language you understand, or mass-patrol edits by a specific user.

    I have extended the tool with two new filters now:

    • only show edits on the most important 10,000 items (importance determined by the number of incoming links).
    • only show edits on items specified by a PagePile. PagePile is a tool by User:Magnus Manske where you can create a list of pages using various sources (SPARQL, PetScan etc.). For example, I created a PagePile with all sovereign states. The ID of the PagePile (11023) can be inserted in the reCh tool, so that you see only edits made on items about sovereign states.--Pasleim (talk) 11:00, 14 October 2017 (UTC)

    Number of descendants

    At WikiProject Genealogy/numbers/descendants, I added a few counts of the number of descendants. It does one count per query to ensure that it keeps getting updated weekly. We can add more people, but preferably not Charlemagne's children.
    --- Jura 19:14, 14 October 2017 (UTC)

    Importing GNIS data

    For US locations, does anyone have a bot for importing and referencing "feature" (= instance of), county, state, country, and coordinates based on the GNIS ID? Adding all of these items by hand is tedious (and I've been doing a lot of them). Here's an example: Spike Island (Q41983719). - PKM (talk) 22:13, 14 October 2017 (UTC)

    Birthday logo update

     
    one year ago

    We need a new version of last year's birthday logo; and the cropped version. @Incabell: Can you help, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:17, 12 October 2017 (UTC)

    Here ya go :D --Incabell (talk) 12:23, 12 October 2017 (UTC)
    Don't we need something really new? Previous logo's always expressed their numbers. Sjoerd de Bruin (talk) 19:28, 12 October 2017 (UTC)
    @Sjoerddebruin: If that's something that's wanted, I can do that over the weekend. I had somehow understood that last year's version should be updated and took it as "fix the number please" :) --Incabell (talk) 10:18, 13 October 2017 (UTC)
    Since no one confirmed this, I have not made another logo. --Incabell (talk) 14:07, 15 October 2017 (UTC)
    Thank you very much @Incabell: :) Lea Lacroix (WMDE) (talk) 09:35, 13 October 2017 (UTC)

    Can I use wikidata to link to wikipedia via ICD10 code?

    I am setting up a mediawiki to document a database that uses a subset of ICD10. I would like to link to en.wikipedia articles using the code, since actual diagnosis names on our side aren't necessarily consistent with the standard. So is there a way to build a link knowing that a disease has an ICD-10-CM (P4229) of <something>? For example, I would know that appendicitis Q121041 has an ICD-10-CM (P4229) of K37. Appendicitis Q121041 also has a corresponding en.wikipedia article. Can I generate a URL that will use wikidata to send me to the wikipedia article? Thanks!  – The preceding unsigned comment was added by Tenbergen (talk • contribs) at 15. 10. 2017, 15:29‎ (UTC).

    ──────────────────────────────────────────────────────────────────────────────────────────────────── You can get to the Wikidata item by using Resolver; for example:

    https://tools.wmflabs.org/wikidata-todo/resolver.php?prop=P4229&value=K37

    Maybe User:Magnus_Manske can kindly add a switch to force the use of a given Wiki as the target? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:14, 15 October 2017 (UTC)

    Thanks Pigsonthewing, this set me off in the right direction. I had a look at the page in the debugger and found that the following will send me to the corresponding en.wikipedia article:

    https://tools.wmflabs.org/wikidata-todo/resolver.php?prop=P4229&value=K37&project=enwiki

    --  – The preceding unsigned comment was added by 142.161.44.83 (talk • contribs) at 22:55, 15 October 2017‎ (UTC).

    Places and their (local) government bodies

    What is the best way to indicate that Birmingham City Council (Q4916650) is the local authority/ governance body for Birmingham (Q2256)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:57, 20 October 2017 (UTC)

    legislative body (P194) for the legislature, if you have a broader item like "local government of Birmingham" then perhaps .authority (P797).--Pharos (talk) 18:36, 20 October 2017 (UTC)
    Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:49, 21 October 2017 (UTC)
    This section was archived on a request by: Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:49, 21 October 2017 (UTC)

    Why don't the Wikidata Tours work?

    Hi

    Can someone tell me what the technical issue with the Wikidata:Tours is that stops them from working? I'd really like to help create new tours. Looking at the talk page, it seems that it's been a problem for a few years, but it is not clear what the issue is; there seems to be a Phabricator ticket that was opened and then closed again (T85719), but it's still broken (or broken again).

    To me this seems like a major barrier to people learning how to contribute to Wikidata.

    Thanks

    --John Cummings (talk) 13:48, 14 October 2017 (UTC)

    Filed phab:T178224. Still, I wonder why a ticket from 2015 would be relevant for the current issue. Sjoerd de Bruin (talk) 15:00, 14 October 2017 (UTC)
    @Sjoerddebruin:, thanks very much. --John Cummings (talk) 22:36, 15 October 2017 (UTC)
    The problem appears to have been caused by overlays, unfortunately disabling the overlays has caused other issues making the Statements tour malfunction. If anyone could help with fixing this it would be greatly appreciated. --John Cummings (talk) 22:49, 15 October 2017 (UTC)
    I can't see any issue with the statements tour. Can you try to describe in which step of the tour you are encountering problems? --Pasleim (talk) 23:20, 15 October 2017 (UTC)
    @Pasleim:, sorry for not being specific enough: when I start the Statements tour, the 3rd pop-up, which starts "All item pages have a Statements section which can include...", has a problem. It appears in the correct part of the page, but the page does not jump to the right section, so it appears as though the tour has just stopped. Interestingly, if you scroll down the screen it seems to 'unstick' the scrolling, and it will then jump up and down the page fine. You may not be able to see it if you have a very high resolution screen. --John Cummings (talk) 14:50, 16 October 2017 (UTC)

    Notability of Wikivoyage listings

    In terms of notability, does a place having a listing on Wikivoyage by itself warrant its own Wikidata entry? ~nmaia d 15:52, 15 October 2017 (UTC)

    Yes, see https://www.wikidata.org/wiki/Wikidata:Notability ChristianKl (talk) 15:59, 15 October 2017 (UTC)
    @NMaia: What do you mean by "a listing on Wikivoyage"? A page, or an entry in a list on a page? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:08, 15 October 2017 (UTC)
    An entry, like a restaurant, for instance. ~nmaia d 17:02, 15 October 2017 (UTC)
    • Yes, but tracking them at Wikidata isn't optimal yet. Initially, you might want to focus on listings for
       Get in
       Get around
       See
       Do
       Learn
       Sleep
    
    And later only:
       Buy
       Eat
       Drink
    
    Tracking use of items in listings at Wikivoyage (en/fr) is already done.
    --- Jura 16:32, 15 October 2017 (UTC)
    Thanks. How is the tracking done? ~nmaia d 17:02, 15 October 2017 (UTC)
    Currently the listings read the labels only. For a list used in an article, see https://en.wikivoyage.org/w/index.php?title=Aarhus&action=info#mw-wikibase-pageinfo-entity-usage
    This currently includes Q1138832, which has enwikivoyage listed under "Wikis subscribed to this entity" at https://www.wikidata.org/w/index.php?title=Q1138832&action=info - clicking it gets you to https://en.wikivoyage.org/wiki/Special:EntityUsage/Q1138832
    --- Jura 17:15, 15 October 2017 (UTC)
    Fascinating stuff, thanks. So, to be clear, I can create Wikidata entries for individual listings within Wikivoyage articles, correct? ~nmaia d 19:29, 15 October 2017 (UTC)
    Yes. They fulfil notability rules 2 and 3. It is useful to then use the Wikidata entries in the articles after their creation, for instance in a listing template. --RolandUnger (talk) 05:07, 16 October 2017 (UTC)

    German Business Registry, Handelsregister

    Hi all, I created a bot request here: Wikidata:Requests for permissions/Bot/Handelsregister The main goals are:

    1. add data from the German business registry to existing wikidata items
    2. if 1 is done sufficiently, then discuss whether to include the rest with new identifiers

    I was asked on the list to move the discussion to this wiki. So let's discuss on the Wikidata:Requests for permissions/Bot/Handelsregister bot page. SebastianHellmann (talk) 07:47, 16 October 2017 (UTC)

    WordPress plugin to associate tags with Wikidata IDs: opinions sought

    Work is underway on a WordPress plugin to associate a blog's tags/ categories with Wikidata items. How should this work? Please comment on Phabricator (preferred) or here. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:07, 16 October 2017 (UTC)

    Is a Wikiproject a valid catalog?

    Since the discussion on RfD got closed without consensus about this part, I'll ask the question again to a broader audience. This currently violates constraints as well; I don't think we should use this property for in-crowd projects. Sjoerd de Bruin (talk) 16:41, 10 October 2017 (UTC)

    I agree with you. The whole Black Lunch Table thing has been explained to me about 4 times now and I still don't get it. I do understand the need for list monitors for these things (also local WLM initiatives where there is no "list-by-legislation" and only "list-by-historical-society-recommendation"). Not sure how to set this up properly. Jane023 (talk) 16:53, 10 October 2017 (UTC)
    Are you sure that you have linked to the correct discussion, @Sjoerddebruin? I don’t see how a WikiProject catalog is important there. #Q28914245 on the same page seems more related. —MisterSynergy (talk) 17:25, 10 October 2017 (UTC)
    Fixed, sorry. Sjoerd de Bruin (talk) 17:27, 10 October 2017 (UTC)
    Okay thanks; ping @Marsupium, GerardM, ValterVB, Pasleim as participants of that discussion.
    First of all, the catalog property should be used as qualifier of a catalog code (P528) only, which is not the case here. Therefore, without a publicly accessible identifier in the catalog I wouldn’t accept that. I also fear that we could be in a situation that we consider each Wikimedian notable for an item—although there are no external references available. That is certainly not desirable. On the other hand, the item in question has an external identifier (I just assume at this point that it is about the same person), so it is notable independently of this “catalog”. —MisterSynergy (talk) 17:44, 10 October 2017 (UTC)
    I closed the discussion on RfD because the person is notable independently of the catalog statement. Note that the item is not about a Wikimedian but it is a candidate to create an article during a Black Lunch Table edit-a-thon, see en:Wikipedia:Meetup/Black Lunch Table/Lists of Articles. --Pasleim (talk) 18:39, 10 October 2017 (UTC)
    I would like more input on this. Sjoerd de Bruin (talk) 12:27, 16 October 2017 (UTC)
    Sorry for the late reply. Personally, I think that this kind of use is out of the scope of Wikidata. I don't think that items like this, this or this (but there are hundreds like these) are useful to Wikidata; it's also impossible to check whether a person is really in Black Lunch Table - I asked but got no answer about this. If they are really artists, we need some reference to confirm the fact so we can keep them here. Instead, if the only use is monitoring for some Wikimedia initiative, it is better to find a different solution that doesn't "pollute" this project. --ValterVB (talk) 19:22, 16 October 2017 (UTC)
    That horse has left the barn. Also, do consider what Wikidata is there for: all these projects have their people, and they are the ones that include items on a list. All these projects have their user stories, and consequently, in their opinion, the entries are valid. When you talk about "pollution", it is as if a bear does not shit in a lake because the water is fresh. Wikidata is there to be used, and the more use we get out of it the more relevant it becomes. Thanks, GerardM (talk) 05:48, 17 October 2017 (UTC)
    Sorry, but Google Translate doesn't help me in this case; can you try to write more simply? --ValterVB (talk) 06:00, 17 October 2017 (UTC)
    • I don't think the (incorrect) use of the property helps you determine if the items should be kept or deleted.
      --- Jura 06:31, 17 October 2017 (UTC)

    Wikidata weekly summary #282

    Wikidata weekly summary #282

    operator (P137) for embassies

    Until recently, the English description of operator (P137) stated: "person or organization that operates the equipment, facility, or service". Given this long-standing description, and the most recent relevant suggestion on the property talk page, and the guidance on Wikidata:WikiProject International relations that operator (P137) should be an instance of either organization (Q43229) or sovereign state (Q3624078), I think any of the following is valid:

    But similar to how we prefer to be as specific as possible for other properties, like located in the administrative territorial entity (P131), I think we should prefer the last statement above. If one really wants to determine the sending country of an embassy or consulate, then it's a simple matter to get the country (P17) of the item specified by operator (P137). Does anybody have any comments, objections, or suggestions? —seav (talk) 19:23, 14 October 2017 (UTC)

    • There was extensive discussion in the past about where to add the sending country (P17 or elsewhere), and people came up with P137. Somehow this hadn't found its way into the property description. Most items on Wikidata_talk:Wikivoyage/Lists/Embassies use that. For some statistics, see Wikidata:Wikivoyage/Lists/Embassies/count by country
      --- Jura 19:30, 14 October 2017 (UTC)
      • Here's a list of links to previous discussions here in Project chat in chronological order: Feb 2016, May 2016, June 2016, July 2016, August 2016, September 2016, October 2016. Looking at these discussions, while there was rough agreement that we use country (P17) for the country hosting the embassy, there were also suggestions to use owned by (P127) or allegiance (P945) instead of operator (P137) for the sending country. There are also suggestions that operator (P137) be used to indicate the government or government agency, to be more specific, instead of just the country (as was suggested for military bases on foreign soil). So I don't think the usage of operator (P137) to refer to the sending country is a decision based on consensus, but rather on fait accompli. And these discussions were only started last year. Furthermore, not being able to indicate which ministry, agency, or department actually operates these embassies (it's certainly possible that not all embassies sent by a country are operated by just a single government organization) means we can't represent that information in Wikidata using the entirely appropriate operator (P137) property. This defect is a situation that can be improved in a simple manner: either use another property, or extract the country from the item stated in the operator (P137) property. —seav (talk) 23:34, 14 October 2017 (UTC)
        • I don't think P137 is necessarily the best choice, but it might be the optimal one: I think the country should be indicated on the item in one way or the other. If you want to use another property than P137, I don't mind. Once implemented, please ping me and I will update the queries.
          --- Jura 06:52, 15 October 2017 (UTC)
          • Well, an alternative option to using operator (P137) for the agency that operates embassies is parent organization (P749). I'm not sure which between the two is semantically/ontologically better to represent this relation between a government agency and an embassy. I agree though that specifying the sending country directly on the item is warranted because that is how most people look at embassies—as an institution between two sovereign states. —seav (talk) 15:01, 16 October 2017 (UTC)
    I totally agree with seav that operator (P137) should have the value Office of Foreign Missions (Q7079230), because it is the lowest-level agency that operates the embassy. Anyone wanting to know which country this agency belongs to has a simple way to ask for it in SPARQL. I am guilty of assigning countries as operator (P137) values in hundreds of cases, because I am too lazy to figure out what Zimbabwean agency is responsible for running Zimbabwe's embassies, but anyone familiar with Zimbabwe is warmly encouraged to change operator (P137) to the lowest-level appropriate agency that exists on Wikidata. Cheers and thanks for caring :-) Syced (talk) 07:11, 16 October 2017 (UTC)
    seav: Are you sure that Office of Foreign Missions (Q7079230) operates the US embassies and consulates? From reading the Wikipedia article, it seems that it checks/regulates/educates, but that it does not operate them, meaning that it gives them neither goals nor funds. So maybe United States Department of State (Q789915) is the best item to specify as the operator? Syced (talk) 08:54, 17 October 2017 (UTC)

    Words having multiple pages in enwiki

    There are lots of non-English words which have pages on enwiki, leading to duplicates like nonviolence (Q76611) and Ahimsa (Q178498). How can this issue be resolved? Capankajsmilyo (talk) 12:53, 17 October 2017 (UTC)

    Wikidata items are not about words but about concepts. The English page is about the Hindi/Buddhist version of the concept. I don't see any need for resolving anything in this case. ChristianKl (talk) 14:31, 17 October 2017 (UTC)
    The English page is about the Hindi/Buddhist version of the concept As is Q178498. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:21, 17 October 2017 (UTC)

    Technical issue

    This project chat page is loaded fully expanded in mobile view, and it takes ages to reach the desired topic. Can this be loaded in a condensed view? Capankajsmilyo (talk) 12:55, 17 October 2017 (UTC)

    Hm, the same problem exists on other project chats. It seems like the behaviour differs by namespace. Sjoerd de Bruin (talk) 13:20, 17 October 2017 (UTC)

    Is there a page on Wikidata that shows how other Wikimedia projects are using Wikidata?

    Hi all

    Is there a page on Wikidata that shows how other Wikimedia projects are using Wikidata? I've been working on a page to explain more about how and why other Wikimedia projects use Wikidata, and I started a conversation on en.wiki to ask for their opinion (although it's mainly been co-opted by one or two users who don't like Wikidata to complain about it). They have started a kind of hand-curated list on en.wiki, but I would like to have something on Wikidata that gives examples over many languages and projects.

    Thanks

    --John Cummings (talk) 09:36, 11 October 2017 (UTC)

    AFAIK that’s a bit difficult, since “using Wikidata” is not a well-defined scenario. I am aware of the “Page information” page of each item (linked in the left menu), which has a field “Wikis subscribed to this entity” in the “Page properties” section. However, having a sitelink is already enough for a subscription, and I am not sure whether this subscription is always updated properly. Projects that “use” the item without having a sitelink appear there as well, but I cannot explain which kind of “use” triggers a subscription. —MisterSynergy (talk) 10:25, 11 October 2017 (UTC)
    Category:Templates using data from Wikidata (Q11985372)? Matěj Suchánek (talk) 16:41, 11 October 2017 (UTC)
    Just for the numbers there is the Grafana dashboard, but it doesn't really tell you how Wikidata is used. --Zache (talk) 01:38, 12 October 2017 (UTC)
    •   Comment @John Cummings: The English Wikisource is utilising Wikidata to populate some of the fields in its author header - image, birth and death data, plus sister interwikis - and the broader authority control templates. Commons is using Wikidata in its Creator: namespace. In general, for people's names, the data is still insufficiently complete; e.g. many Wikidata items are missing the "family name" field, and "given name" is partial - either missing names, or with no series ordinal to order the names. The Wikisources seem okay with migrating to data pulls, though they need reassurance that the local data is present at WD prior to removing local fields. It is a journey, as there are many Wikisource editions that do not even have their data set here at this time, and there is no easy means to push the data from Wikisource to Wikidata, so there we await those tools in development.  — billinghurst sDrewth 04:18, 12 October 2017 (UTC)
    @billinghurst:, perfect, thanks. --John Cummings (talk) 08:34, 12 October 2017 (UTC)
    •   Comment @John Cummings, billinghurst: - The French Wikisource has migrated all its authors to Wikidata, even if some data still needs to be migrated to some properties. We are now starting a project to migrate editions/works data, and it will be a long process. There is currently no easy way to push the data to Wikidata, and most of the work has to be done manually (or at best semi-manually with PetScan or QS), book by book, after creating each publisher, each collection, each work... I just added links on the discussion page about this. --Hsarrazin (talk) 08:58, 18 October 2017 (UTC)
    Thanks very much @Hsarrazin:. --John Cummings (talk) 10:53, 18 October 2017 (UTC)
    Our outstanding list is only ~400 authors, and that bounces up and down depending on the research done. For our works, I am waiting until the more native tools exist, as we have metadata already in the Index: ns or in files at Commons - good enough data for basic data-mining.  — billinghurst sDrewth 11:49, 18 October 2017 (UTC)

    UEFA ranking

    Wikidata:Property proposal/UEFA ranking

    Hello. The above proposal is going to be rejected. A user has proposed another way to add the data. I am fine with it, but there are 2 problems. Please read the discussion and give your opinion there. I have the data ready and I want to add it to Wikidata, but I need a reliable way to do that. According to the discussion, no new property is needed, so I need your opinion on how to use the properties we already have. Xaris333 (talk) 15:34, 15 October 2017 (UTC)

    Anyone? Xaris333 (talk) 16:09, 16 October 2017 (UTC)

    Examples

    Country coefficient (men football)

    Royal Spanish Football Federation (Q207615)

       ranking (P1352) - 1
       review score by (P447)- Union of European Football Associations (Q35572)
       point in time (P585) - 2017
       points for (P1358) - 89.212
    

    Country coefficient (women football)

    Royal Spanish Football Federation (Q207615)

       ranking (P1352) - 6
       review score by (P447)- Union of European Football Associations (Q35572)
       point in time (P585) - 2017
       points for (P1358) - 41.000
    

    Club coefficient

    Real Madrid CF (Q8682)

       ranking (P1352) - 1
       review score by (P447)- Union of European Football Associations (Q35572)
       point in time (P585) - 2017
       points for (P1358) - 134.00
    

    National team coefficient

    Germany national football team (Q43310)

       ranking (P1352) - 1
       review score by (P447)- Union of European Football Associations (Q35572)
       point in time (P585) - 2017
       points for (P1358) - 40.236
    

    2 problems:

    1) One problem is the country ranking. These rankings are for the countries that are members of UEFA - the football associations. But nowadays there are men's and women's rankings, while the association items are the same for both cases. We need a way to show that in the items.

    2) The second problem is that we need some way to link to UEFA coefficient (Q491781) or UEFA coefficient (women) (Q2981732) in all cases.

    Xaris333 (talk) 16:09, 16 October 2017 (UTC)

    @Xaris333: How about adding determination method (P459) as qualifier pointing to UEFA coefficient (women) (Q2981732) etc.? ArthurPSmith (talk) 17:30, 16 October 2017 (UTC)
    @ArthurPSmith: Good idea. And for men determination method (P459) as qualifier pointing to UEFA coefficient (Q491781). Thanks. Xaris333 (talk) 18:02, 16 October 2017 (UTC)

    @ArthurPSmith: Please check how I added it to England national football team (Q47762). A lot of issues... Xaris333 (talk) 18:43, 17 October 2017 (UTC)

    Hi @Xaris333: what are the problems? I think you've done it perfectly, and I think that "FIFA Ranking" should be reworked to this general scheme. --Vladimir Alexiev (talk) 06:26, 18 October 2017 (UTC)
    Hi @ArthurPSmith: I think that if constraints are inconvenient, they should be relaxed/fixed. It's much better to generalize prop use rather than bow to bad constraints. I see two that may need fixing:
      • "mandatory qualifier constraint: sport" will make each entry too verbose, because "score by" implies the sport.
      • "type constraint: class = human or group of humans" may be too restrictive: I haven't checked whether "national association football team" is an appropriate subclass. And surely other things may have rankings, eg robots in the "robot wars" competitions.
      • Is there some other offending constraint? --Vladimir Alexiev (talk) 06:26, 18 October 2017 (UTC)
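    The scheme agreed above can then be queried directly. A sketch (my own, untested against live data), assuming the ranking (P1352) statements are qualified with determination method (P459) and point in time (P585) exactly as in the England example:

    ```sparql
    # Hypothetical sketch: men's UEFA coefficient ranking for 2017,
    # assuming P1352 statements qualified with
    # determination method (P459) = UEFA coefficient (Q491781)
    # and point in time (P585), as discussed above.
    SELECT ?team ?teamLabel ?rank WHERE {
      ?team p:P1352 ?stmt .
      ?stmt ps:P1352 ?rank ;
            pq:P459 wd:Q491781 ;   # men's coefficient; use Q2981732 for women
            pq:P585 ?date .
      FILTER(YEAR(?date) = 2017)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    ORDER BY ?rank
    ```

    Swapping the Q-id in the pq:P459 triple is all that is needed to switch between the men's and women's rankings, which is the point of the determination-method qualifier.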

    WikiProject Gendergap

    I am thinking of creating a WikiProject to consolidate some discussions about how to track and monitor items about women and their works. Is anyone interested in helping out? I am thinking of listing queries that can be tailored per language or country and listing the basic statements desired for Q5 items, as well as discussing female-specific occupations such as "queen consort", the "female form of label" for occupations, and also the various ways to link women to their notable works. Suggestions for a project name are welcome too. Jane023 (talk) 12:20, 16 October 2017 (UTC)

    • Great idea. I/we would love to collaborate. WikiProject Bridging Gender Gap might be the project name. --Titodutta (talk) 12:50, 16 October 2017 (UTC)
    Hi, there already is Wikidata:WikiProject_Las_Imprescindibles, which was inspired by Spanish-speaking women in Mexico last summer. Coordination of all the women-related projects would be nice :)

    Harmonia Amanda Exilexi Ash Crow Manu1400 OdileB GrandCelinien Camelia (WikiDonne - Le Imprescindibili) Kvardek du User:Nattes à chat   Notified participants of WikiProject Las Imprescindibles --Hsarrazin (talk) 12:57, 16 October 2017 (UTC)
    OK great. I don't like "Gender Gap" by the way, because it implies "Gender Pay Gap" in English, and this has nothing to do with that, but more about the lack of female editors and thus indirectly, a lack of content for, by and related to women. I always try to use "Gendergap" because that is the name of the Wikimedia mailing list and anchors the subject better. I understand though that people might object to something they see as a spelling mistake (which it isn't). I think I will just call it WikiProject Women for now, as a short form of "Women in Red", because this is not about red links in the sense of missing items, but more about describing the items we already have. People often talk about the "gender binary" as if both sides are equal, which they definitely aren't on Wikimedia projects. Jane023 (talk) 13:31, 16 October 2017 (UTC)

    See here for now: Wikidata:WikiProject Women. Jane023 (talk) 13:55, 16 October 2017 (UTC)
    I'm entirely in favor of an initiative like this one. Seeing Wikidata:WikiProject Women, I'm thinking that it would be amazing to get people to start contributing to Wikidata with a Wikidata-game-like app! — Exilexi (talk) 15:58, 16 October 2017 (UTC)
    Yes my gut feeling is that there is a lot of item improvement that can be done in a Wikidata-game like way, but I honestly have no idea how to set it up. Just by creating these lists I am hoping it might help attract a few contributions. Jane023 (talk) 17:17, 16 October 2017 (UTC)
    Good initiative Jane. I've been playing around with the work lists a bit. In my experience work lists should be of a human scale so people can actually finish them, and most items should be easily solvable; otherwise people get frustrated about not being able to solve any items.
    The first list I created was Wikidata:WikiProject Women/Wiki monitor/nlwiki date of birth. The Dutch Wikipedia has date of birth for most people so this seems to be quite easy (just did a couple). It's sorted from new to old so you can raise the quality of new items and slowly work on the backlog. Around a thousand items, that's actually very low given the number of items we have about people.
    The second list I created was Wikidata:WikiProject Women/Wiki monitor/nlwiki occupation. Here we run into an interesting problem: What occupation (P106) do we add to nobility? I filtered out items containing family (P53) or noble title (P97) to remove most of the clutter. With the filtering applied it seems to be a more workable list, but it does leave us with a second problem: How to handle mother/wife/sister (etc.) of some famous person?
    So we might want to fork these two issues:
    1. How to handle occupation (P106) for nobility?
    2. How to handle occupation (P106) for people who are only notable because they relate to a notable person?
    Can't recall a previous discussion about these. Any pointers?
    Multichill (talk) 12:05, 18 October 2017 (UTC)
    Super! I don't see the nobility as "clutter", but as potentially a gold mine for paintings in private collections. Just by clicking on a few of these I found they often lead to very well-populated Commons categories with portrait paintings and prints. I agree that it would be ideal to track these separately though. Maybe a bunch of these can quickly be assigned "occupation=noble" so we can list them separately for a potential painting scan. I will see if I can do this. You're right about "clutter" in the sense that most of these have very low Q numbers and have been occupation-less for a very long time. Jane023 (talk) 12:32, 18 October 2017 (UTC)
    On second thought, maybe we need an occupation "portrait model" for these. Jane023 (talk) 12:35, 18 October 2017 (UTC)
    Occupation "portrait model" suggests that a person earns their living with it. A person who is a portrait model for a handful of portraits shouldn't be labeled this way. I don't think being of nobility automatically implies an occupation. There's no reason to fill occupation (P106) when we don't have information about it. ChristianKl (talk) 12:46, 18 October 2017 (UTC)

    Facto Post – Issue 5 – 17 October 2017

    Facto Post – Issue 5 – 17 October 2017
     

    Editorial
    Annotations

    Annotation is nothing new. The glossators of medieval Europe annotated between the lines, or in the margins of legal manuscripts of texts going back to Roman times, and created a new discipline. In the form of web annotation, the idea is back, with texts being marked up inline, or with a stand-off system. Where could it lead?

     
    1495 print version of the Digesta of Justinian, with the annotations of the glossator Accursius from the 13th century

    ContentMine operates in the field of text and data mining (TDM), where annotation, simply put, can add value to mined text. It now sees annotation as a possible advance in semi-automation, the use of human judgement assisted by bot editing, which now plays a large part in Wikidata tools. While a human judgement call of yes/no, on the addition of a statement to Wikidata, is usually taken as decisive, it need not be. The human assent may be passed into an annotation system, and stored: this idea is standard on Wikisource, for example, where text is considered "validated" only when two different accounts have stated that the proof-reading is correct. A typical application would be to require more than one person to agree that what is said in the reference translates correctly into the formal Wikidata statement. Rejections are also potentially useful to record, for machine learning.

    As a contribution to data integrity on Wikidata, annotation has much to offer. Some "hard cases" on importing data are much more difficult than average. There are for example biographical puzzles: whether person A in one context is really identical with person B, of the same name, in another context. In science, clinical medicine requires special attention to sourcing (w:WP:MEDRS), and is challenging in terms of connecting findings with the methodology employed. Currently decisions in areas such as these, on Wikipedia and Wikidata, are often made ad hoc. In particular there may be no audit trail for those who want to check what is decided.

    Annotations are subject to a World Wide Web Consortium standard, and behind the terminology constitute a simple JSON data structure. What WikiFactMine proposes to do with them is to implement the MEDRS guideline, as a formal algorithm, on bibliographical and methodological data. The structure will integrate with those inputs the human decisions on the interpretation of scientific papers that underlie claims on Wikidata. What is added to Wikidata will therefore be supported by a transparent and rigorous system that documents decisions.

    An example of the possible future scope of annotation, for medical content, is in the first link below. That sort of detailed abstract of a publication can be a target for TDM, adds great value, and could be presented in machine-readable form. You are invited to discuss the detailed proposal on Wikidata, via its talk page.

    Links

    Editor Charles Matthews. Please leave feedback for him.

    If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add w:Category:Opted-out of message delivery to your user talk page.
    Newsletter delivered by MediaWiki message delivery

    For Wikidatans, the invitation at the end of the editorial can be repeated: please come to Wikidata talk:WikiFactMine/Annotation for fact mining and help us clarify and improve our project ideas. Charles Matthews (talk) 08:57, 17 October 2017 (UTC)

    What is the "Scaling up Wikidata editing" link for? It contains mostly outdated information from 4 years ago. Cheers! Syced (talk) 03:58, 18 October 2017 (UTC)

    Ah, strange mistake by me! Fixed now, for the latest blog by Magnus. Thank you. Charles Matthews (talk) 16:30, 18 October 2017 (UTC)

    w:Template talk:Marriage

    A debate on how to handle marriage data is going on at w:Template talk:Marriage#Death. The debate is about how to handle the end date of a marriage when one partner dies, and whether the end date is obvious when the subject of the article is dead. You are welcome to participate whatever your opinion is. --Richard Arthur Norton (1958- ) (talk) 13:32, 18 October 2017 (UTC)

    Half a year of avg. weekly vandalism for Israel (Q801)

    See Talk:Q801 where I've summarized 24 instances of vandalism edits in the past 26 weeks, in the label and description fields of Israel (Q801). All were by IP account editors. Is there a policy of item protection, and does this qualify? -- Deborahjay (talk) 15:57, 18 October 2017 (UTC)

    I added semi-protection without a time limit. I didn't add a time limit because I expect Israel to continue to be a topic that draws vandalism if the semi-protection were lifted after a short amount of time. ChristianKl (talk) 16:14, 18 October 2017 (UTC)

    Possible vandalism

    Could an Arabic speaker verify whether this edit which popped up in my watchlist is correct? I am not sure if Google Translate is correct but experiment (Q101965) probably shouldn't be described as "mythical stories". Jc86035 (talk) 14:52, 19 October 2017 (UTC)

    @ديفيد عادل وهبة خليل 2, علاء, Sky xe, Mr. Ibrahem: Mahir256 (talk) 04:57, 20 October 2017 (UTC)
    I removed it, it's wrong --Mr. Ibrahem (talk) 08:25, 20 October 2017 (UTC)
    @Mahir256, Jc86035:   Done by @Mr. Ibrahem:. Also I added the correct description --Alaa :)..! 10:46, 20 October 2017 (UTC)
    This section was archived on a request by: Matěj Suchánek (talk) 12:30, 24 October 2017 (UTC)

    Pseudocraterellus undulatus

    If Craterellus sinuosus (Q829662) and Pseudocraterellus undulatus (Q28492054) are "different taxa", why do they both have NBN System Key (P3240) BMSSYS0000015454 and the same labels in several pairs of languages (en/es/fr etc: "Pseudocraterellus undulatus" & sv: "Kruskantarell")? And why do Wikipedia articles for the same taxon appear in different Wikidata items? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:38, 18 October 2017 (UTC)

    It seems like it was a mess before, but the labels were not fixed. Sjoerd de Bruin (talk) 21:55, 18 October 2017 (UTC)
    The UK National Biodiversity Network (Q6970988) changed its taxonomic concept of Pseudocraterellus undulatus (Q28492054) recently and now includes Craterellus sinuosus (Q829662) as a heterotypic synonym of Pseudocraterellus undulatus (Q28492054). You can use taxon synonym (P1420) with a reference if you want to express this relationship. --Succu (talk) 22:37, 18 October 2017 (UTC)

    Input for Property proposal/Generic#depth

    Property proposals benefit from more input from more people. What do you think about Wikidata:Property proposal/depth ? ChristianKl (talk) 11:51, 19 October 2017 (UTC)

    Country historical continuity

    I guess this has been discussed but doesn't seem to be easy to find. Maybe it wasn't. Let's see.

    There are nation-states all around the world, with long histories of changes of state, government, name, flag, capital, etc. It would be nice to have a relation which would connect these entities, which are usually chained together by "replaces" / "replaced by" relations, so queries would be nice and simple and wouldn't need to walk the whole chain. Merges and splits aren't impossible to handle either: an object is a member of either more than one "country continuity object" or just one (and multi-membership hardly ever occurs on Earth).

    I see there isn't such a thing. Would it be useless, and if so, why? If not, how could it best be achieved?

    [answered] Additionally a SPARQL question: right now how would you pick Human country_of_citizenship (All countries which ever have been Hungary [throughout the replaced_by chain])?

    Thanks, it's ?country wdt:P1366* wd:Q28 ;)

    --grin 16:09, 19 October 2017 (UTC)
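    Expanded into a complete query, the property-path answer above would look roughly like this (a sketch, not tested against live data):

    ```sparql
    # People whose country of citizenship (P27) is Hungary (Q28) or any
    # predecessor state, found by walking the replaced by (P1366) chain.
    SELECT DISTINCT ?person ?personLabel WHERE {
      ?country wdt:P1366* wd:Q28 .   # Q28 itself plus all states it transitively replaced
      ?person wdt:P27 ?country .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 100
    ```

    The `*` makes the path zero-or-more steps long, which is why no dedicated "country continuity" item is strictly needed for this kind of query, as long as the replaced-by chain is complete.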

    Correspondence

    Hoi, many libraries have been given the correspondence of someone relevant at their death. What does it take to register this and what does it take to register the people he or she corresponded with (not individual letters). Thanks, GerardM (talk) 06:09, 18 October 2017 (UTC)

    Use archives at (P485)? You may wish to qualify it if only a sub-type of archive.  — billinghurst sDrewth 09:23, 18 October 2017 (UTC)
    GerardM, I don't understand what your question is here. What does it take? Time, mostly... on Wikisource we have complete sets of correspondence for people like George Sand (Q3816). We are able to (and we will) have an item per letter... each one using addressee (P1817), but nothing can be automated so far... so: time! --Hsarrazin (talk) 09:28, 18 October 2017 (UTC)
    The "Archives at" works for me. There is data on with whom the correspondence was, not the individual letters. These are beyond my scope. Thanks, GerardM (talk) 12:54, 18 October 2017 (UTC)
    A library is generally a serious public source and as such the people who write or receive the letters are notable according to our criteria. In the ideal case the library has its own authority control IDs for the people and/or uses VIAF. Furthermore, it's also welcome when the library uploads the data to Wikisource. ChristianKl (talk) 16:43, 18 October 2017 (UTC)
    Concur with @billinghurst: - At a recent training at the Library of Congress, we encouraged academics to help populate archives at (P485) if they held papers of politicians or famous figures. For now that seems the best approach. -- Fuzheado (talk) 22:45, 19 October 2017 (UTC)

    Term vs physical quantity

    fineness (Q1401905) is the percentage of a specific metal inside a coin. At the moment it is an instance of physical quantity. The term "fineness" is also a numismatics term; in fact it appears in the Wikipedia articles of glossary of numismatics (Q1093389). If I create an item for "numismatics term", may I add it to fineness (Q1401905) via instance of (P31), or do we prefer to "duplicate" items for terms? If we use only one item, then, for example when querying, we have no assurance of the meaning: is it the percentage or the linguistic form? This is a specific example but I'm looking for a general rule, because this is quite common. Thanks! --AlessioMela (talk) 19:28, 19 October 2017 (UTC)

    if the meaning is the same (i.e. for a given coin the "fineness" value would be the same with either definition) then it should be fine to leave it as a single item. If the meanings are different in some significant way then it's better to create a new item, and link them with different from (P1889) so nobody will merge them later. ArthurPSmith (talk) 21:40, 19 October 2017 (UTC)
    I think it's a specialization of mass fraction (Q899138). --Succu (talk) 22:07, 19 October 2017 (UTC)

    Cover's depiction: Qualifier?

    It would be awesome to link publications with the stuff depicted on the cover. I was thinking mainly about people: periodicals such as Sports Illustrated (Q275837), Madrid Cómico (Q16746795), La Novela Teatral (Q32860631), Playboy (Q150820), but the scope could be widened to books too. For periodicals, items for individual issues would be needed, though.

    An alternative would be something like "Sports Illustrated February 18, 2002 issue, item-not-created-yet" -> depicts (P180) -> LeBron James (Q36159) // (qualifier applies to part (P518) -> book cover (Q1125338)).

    Is this "qualifier option" OK enough or it would be better proposing a new specific property? (such as "cover's depiction"?). Strakhov (talk) 21:25, 19 October 2017 (UTC)

    we almost had a "cover illustration" property but it took a wrong turn - see Wikidata:Property proposal/Illustrateur de la couverture - feel free to propose a new one specifically for the illustration! ArthurPSmith (talk) 21:42, 19 October 2017 (UTC)
    Thanks for the input! Hmmm. I'll give it some thought. After all, "datatype" could be: 1) Wikimedia Commons file 2a) item with the cover itself 2b) item with the stuff the cover represents. Strakhov (talk) 22:12, 19 October 2017 (UTC)

    SPARQL query parsing values of string property

    Currently we use two systems to describe the element composition of chemicals: the chemical formula, and the element composition based on several statements using the "has part" property. Some contributors propose to use the chemical formula to derive the element composition. The question is then: is it possible?

    For example, how can we extract the items of chemicals having a chlorine atom based on the chemical formula property? This requires 1) extracting all items of chemicals having a chemical formula (P274) statement and 2) parsing the value of this statement to look for the "Cl" substring. But how can we write that? Thanks for your help. Snipre (talk) 21:57, 19 October 2017 (UTC)

    SELECT ?compound ?formula WHERE {
      ?compound wdt:P31 wd:Q11173 ;
                wdt:P274 ?formula.
      FILTER(CONTAINS(?formula, "Cl")).
    }
    

    Try it!

    write wdt:P274 instead of wdt:274. Then you get some 18k results. --Pasleim (talk) 22:24, 19 October 2017 (UTC)
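    One caveat worth noting (my own addition, not from the thread): CONTAINS works for "Cl" because no other element symbol contains that substring, but a one-letter symbol such as "C" would also match Ca, Cl, Co, and so on. A regex that forbids a following lowercase letter avoids this; an untested sketch:

    ```sparql
    # Compounds containing carbon: match "C" only when it is not
    # followed by a lowercase letter (which would make it part of
    # a two-letter symbol like Ca, Cl or Co).
    SELECT ?compound ?formula WHERE {
      ?compound wdt:P31 wd:Q11173 ;
                wdt:P274 ?formula .
      FILTER(REGEX(STR(?formula), "C([^a-z]|$)"))
    }
    ```

    This is still string parsing, which illustrates why the "has part" statements remain the more robust way to express element composition.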

    Congressional Data Challenge

    play with Library of Congress API - win prizes. https://labs.loc.gov/experiments/congressionalchallenge/ https://labs.loc.gov/ Slowking4 (talk) 01:58, 20 October 2017 (UTC)

    'Stated as' vs. 'author name string'

    It seems that the above are mutually redundant. Can anyone explain why two separate properties are needed, and if not, which should we deprecate/ delete? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 06:43, 13 October 2017 (UTC)

    • In general, we suggest that people read property descriptions.
      --- Jura 06:55, 13 October 2017 (UTC)
    author name string (P2093) is a workaround property for author (P50) in cases where there is no reliable way to identify the author.
    stated as (P1932) is used as qualifier on author (P50) to indicate how the value is printed in the source --Pasleim (talk) 07:12, 13 October 2017 (UTC)
    As helpful as usual, Jura. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 07:23, 13 October 2017 (UTC)
    The question is if anything Pasleim wrote can't be found on the property description page. What is missing?
    --- Jura 07:26, 13 October 2017 (UTC)
    author name string (P2093) is a workaround property for author (P50), when there is NO item for the author, or when it is impossible to know which item would be correct (i.e. author (P50) cannot be identified just from what is written on the publication).
    stated as (P1932) is used when author (P50) is known, but the name on a book differs from the name on the item (authors with many pseudonyms, initials, etc.); this is not at all the same. stated as (P1932) cannot be used as a property, only as a qualifier, and it can also be used for other properties (like publishers or actors; I even recall seeing it, recently, on a book). --Hsarrazin (talk) 08:00, 13 October 2017 (UTC)
    Noting that P1932 is a qualifier, whereas P2093 is a direct property, and a much broader one than just author. Really useful for old references where a place name or business has morphed.  — billinghurst sDrewth 10:44, 13 October 2017 (UTC)

    And there is a third item, similar: named as (P1810) - it is used in movie distributions, but the use is exactly the same as stated as (P1932). Shouldn't these 2 be merged? --Hsarrazin (talk) 12:45, 16 October 2017 (UTC)
    I have used the author string in the past in addition to the author property when the spelling in the book is not covered by the author item. It didn't occur to me that this was only used for when the author doesn't exist as an item. I didn't know about "stated as" at all. It might be helpful to point these things out better in the property discussion. I suppose alternative spellings of authors' names should probably go into the alias field of the author item (if it exists) rather than the publication item (assuming that item in turn links out via various property identifiers to the literary work in question). Jane023 (talk) 14:39, 13 October 2017 (UTC)

    stated as (P1932) is older, and has been specifically created to answer the librarians' need to be able to catalog the author AND how he is stated on the work ([15]). Even if the pseudonym is stated as an alias in the author field, how will you know which books were published under a specific pseudonym, if you don't have the info on the book? Also, many works are signed by title or rank (Captain X, Count Y...), which are very ambiguous, and not pseudonyms.
    author name string (P2093) was only created because of the preparation of massive imports of scientific articles, as an easy dump, to allow keeping the info at hand and creating the author items afterwards. Please read Property talk:P2093 and Wikidata:Property proposal/Archive/39#P2093. This is very explicit. It should not be used instead of stated as (P1932). --Hsarrazin (talk) 15:59, 13 October 2017 (UTC)
    It looks like P2093 is missing a "value only" constraint. (The English description is clear about its use, but somehow we omitted a constraint).
    --- Jura 16:21, 13 October 2017 (UTC)
    @Jane023: I would suggest that where you need to add "stated as" that you would consider adding that to the alias label of the target, and also consider whether the target requires an additional item. I don't see this as alternative situation, I see it more as additional data.

    Plus, in the situations that Hsarrazin mentions, where you find a use of P2093 and the author is now identifiable, you update to the specific item.  — billinghurst sDrewth 22:52, 13 October 2017 (UTC)

    Yes that makes sense. I will also remove "author name string" when "author" is possible. Jane023 (talk) 23:34, 13 October 2017 (UTC)
    Recent discussion with colleagues involved in the WikiCite initiative resulted in agreement to keep "author name string" when "author" is added, so that the available metadata is not lost. Of course, such metadata can be transferred to "stated as" (hence my suggestion that the two are mutually redundant). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:13, 16 October 2017 (UTC)
    Well they can't be mutually redundant in the case where there is no item for the author yet. Jane023 (talk) 12:30, 16 October 2017 (UTC)
    Consider: author (P50) -> "Unknown value"; qualified with stated as (P1932). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:34, 16 October 2017 (UTC)
    Now that just looks like a kludge to get around using the "author name string". I don't like that and would say it is a misuse of the "unknown value" concept. Jane023 (talk) 14:28, 20 October 2017 (UTC)
    For movie credits named as (P1810) is used as a qualifier to indicate how the name of the subject of the item was recorded. (And this is also how named as (P1810) is used for Authority Control databases). stated as (P1932) is used to indicate how the job they did was recorded. Both qualifiers may be present on the same credit.
    stated as (P1932) rather than named as (P1810) should be used as a qualifier for author (P50), because it is the value of P50 that is being qualified, not its subject. Jheald (talk) 22:02, 17 October 2017 (UTC)

    VIAF ID sync

    Heads-up: Test bot run to add IDs to (mostly) people items. Example. Bot edits (bot does other edits too).

    • I loaded a VIAF ID dump with ~30 million entries into a Toolforge database
    • This contains matches to ~50 other databases (DNB, BNF, SUDOC)
    • I added the Wikidata item for entries where possible, via matches to these IDs on Wikidata
    • I am now checking these Wikidata items for some of the 50 databases where there is a Property
    • I am adding these if they are missing in Wikidata, unless
      • the value does not fit the regular expression given in the property, or
      • a statement for that property already exists in that item, or
      • a statement with that property was removed from that item at some point (irrespective of value), or
      • any of the properties checked have a value mismatch for the VIAF set (e.g., WD says DNB is 123 but VIAF set says it's 234, no edit for any property on that item will occur)

    I am running batches of 100 items right now for testing. It also adds the date, and "imported from: VIAF" as a reference (or rather, for source tracking). Please let me know if there are issues, before I scale this up to larger batches or a continuous service.

    Also, I am internally logging any issues the bot encounters, and may present those through an interface at a later date, for manual fine-tuning. --Magnus Manske (talk) 13:16, 14 October 2017 (UTC)

    Nice!
    The rule "unless a statement with that property was removed from that item at some point" seems useful, as otherwise some other bot would re-import the same again.
    Some clusters include DNB ids for their disambiguations, but possible you already filter them out.
    In the past, people got annoyed when VIAF IDs for locations were imported (e.g. to the many Dutch streets). Maybe you want to focus on items for people only.
    It's not entirely clear if ISNI adds much value. In the past, we also had to disable imports for some other property (which added 100s of identifiers to items). You probably want to skip these as well.
    It seems that I regularly encounter people with several VIAF IDs, but maybe we should let the VIAF engines sort them out before importing them automatically.
    Once done, maybe items could be created for the remaining VIAF clusters with several components but without any conflicts. If you import all film festivals, I would merge any duplicates.
    --- Jura 14:32, 14 October 2017 (UTC)
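    The "people with several VIAF" cases mentioned above can be listed with a simple aggregate query (an untested sketch of my own):

    ```sparql
    # Items carrying more than one VIAF ID (P214): candidates for a
    # merge, either on the VIAF side or on the Wikidata side.
    SELECT ?item (COUNT(?viaf) AS ?ids) WHERE {
      ?item wdt:P214 ?viaf .
    }
    GROUP BY ?item
    HAVING(COUNT(?viaf) > 1)
    ORDER BY DESC(?ids)
    ```

    A list like this would make it easy to skip multi-VIAF items during the import, as suggested, and to revisit them manually later.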
    Thanks Jura! I am now excluding non-human (not P31:Q5) items, as well as those with multiple VIAF values. I am not sure how to tell apart the "type n" DNBs just from the ID; will the GND ID (P227) regex filter take care of that? Or is there another way? --Magnus Manske (talk) 16:05, 14 October 2017 (UTC)
    As for ISNI, my data structure has (max) one ISNI for one VIAF, so I'll import that for now. I can turn it off easily if it's important, though. --Magnus Manske (talk) 16:10, 14 October 2017 (UTC)
    User:KasparBot/GND_Type_N has a list with some samples. In the VIAF webinterface, they are marked with "undifferentiated". As it's not really clear to me how ISNI are maintained, personally, I'd omit them. As they come from the same source, it's unlikely they add much.
    --- Jura 17:15, 14 October 2017 (UTC)
    Thanks, I have now made my own GND blacklist, seeded from KasparBot, and the bot will check every GND that's not on that list with the GND website before adding it. --Magnus Manske (talk) 19:00, 14 October 2017 (UTC)
    Magnus I am a big fan of continued work on improving identifier coverage. We had a lot of big bot runs in the past, but many of them slowed down after the initial push.
    Magnus, by the way last month Help:QuickStatements was created. Could you look over it and verify we got it right and point us in the right direction if we missed something. --Jarekt (talk) 17:43, 14 October 2017 (UTC)
    Thanks Jarekt! I changed P143 to P248, and add the VIAF ID to the reference (unless the property if VIAF, in which case it's redundant). Example edit. --Magnus Manske (talk) 18:30, 14 October 2017 (UTC)
    Little information just makes it more likely that it's incorrect. I might be mistaken, but it seems it collects labels from incorrect past matches. With all the others, people should have better ways to match.
    --- Jura 18:36, 14 October 2017 (UTC)
    Sorry Jura, I don't know what you mean by this. --Magnus Manske (talk) 19:17, 14 October 2017 (UTC)
    Jarekt wrote that ISNI only includes little information. We can't verify if the link is about the correct person. It seems that when ISNI gets re-clustered in VIAF, VIAF labels from other people remain in the ISNI file. If there is a good description of their algorithms, I'd be interested. In any case, it's not as transparent as VIAF. As you add other identifiers, users might be better served by these.
    --- Jura 19:37, 14 October 2017 (UTC)
    OK, thanks. I have deactivated adding ISNI for now. --Magnus Manske (talk) 11:31, 15 October 2017 (UTC)
    When adding identifiers derived from VIAF, may I suggest to add the VIAF ID used as part of the reference? (Not just "imported from VIAF".) And similarly, if a VIAF id is deduced from another id (say, an ISNI), add the ISNI as reference to the VIAF claim? I think it would be really nice to encourage this practice as it makes life a lot easier when trying to understand how an item was built. We have had this discussion with ArthurPSmith and Mike Peel recently. − Pintoch (talk) 18:49, 14 October 2017 (UTC)
    As was already suggested, I am now adding the VIAF ID into the reference, unless the property I add is VIAF itself. I have matched the VIAF IDs in my database to Wikidata via their own Wikidata mapping, via VIAF IDs on Wikidata, and then via a few other IDs (though I did not record which for each case), unless the VIAF was already matched to Wikidata. I believe the "collision avoidance" on ID values, as described above, should limit the issue considerably. --Magnus Manske (talk) 19:04, 14 October 2017 (UTC)

    Note: Example where the bot added a VIAF ID, in this case VIAF had already matched their entry to Wikidata, but we didn't have the "backlink"! --Magnus Manske (talk) 19:15, 14 October 2017 (UTC)

    • Magnus Manske: maybe more identifiers have "undifferentiated" as annotation (occasionally, I come across some from LOC), some others have "sparse". Probably none of these should be imported. Hope my explanation on ISNI was convincing. BTW, my offer for film festivals still stands. ;)
      --- Jura 06:58, 15 October 2017 (UTC)

    When it comes to links to sources, VIAF and Wikidata are very much alike in that they bring together the links of many other sources. For one person there should be one identifier in either. However, it happens that the same person has multiple identifiers, and this is reason for a merge. This happens in both VIAF and Wikidata. Wikidata is one of the sources in VIAF. When we find duplicates in VIAF, we can identify both. In the future these two will be merged. We do not need to keep a link to the redirect; there is no value in it. All the links at VIAF for people are relevant because all of them link to libraries in one part of the world. ISNI is indeed a product of OCLC, but it is not about authors but about people. They too should include only one link to one person, but for some time double entries may exist and will eventually be merged. Adding both to Wikidata helps this process, but once one becomes a redirect, it can be removed from Wikidata.

    The point is that both VIAF and Wikidata represent a process. Our work strengthens what is done at OCLC. This is a two-way process, so let's import as much as we can and link to all the libraries in the world. Thanks, GerardM (talk) 08:31, 15 October 2017 (UTC)

    "We do not need to keep a link to the redirect, there is no value in it." The opposite is true. If a third party has the old, redirected VIAF identifier in their database, they should be able to use that when querying Wikidata, to find the matching item. That is the very reason we have withdrawn identifier value (Q21441764) available as a qualifier for reason for deprecation (P2241). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:08, 15 October 2017 (UTC)
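    A query sketch illustrating the point about keeping deprecated identifiers: using the p:/ps: prefixes matches statements of any rank, so an item can still be found by a withdrawn VIAF ID that was kept at deprecated rank (the ID below is a placeholder, not a real example):

    ```sparql
    # Look up an item by a VIAF ID (P214), including deprecated-rank
    # (withdrawn) values; wdt: alone would only match truthy statements.
    SELECT ?item WHERE {
      ?item p:P214/ps:P214 "102333412" .   # placeholder VIAF ID
    }
    ```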
    When people have a VIAF identifier and it does not match ours, they can query VIAF and get an updated identifier. Our purpose is not to keep old identifiers around. As to the fact that there is a property for this... Not impressed; there are many properties that have little value and are hard if not impossible to maintain. Thanks, GerardM (talk) 12:46, 16 October 2017 (UTC)

    OK, I think I'll set this to run continuously now, so the entire 30M set (as far as it is matched to Wikidata) will be processed, in time. Let me know if there are things to change/fix. --Magnus Manske (talk) 16:37, 15 October 2017 (UTC)

    I’d just leave an impression of the imported data: there are surprisingly many cases where identifiers about different persons are mixed up at VIAF, and they are now imported to Wikidata in that poor condition as well. I recommend to check all the imported values for plausibility whenever they show up on the watchlist, and to split items in case this is necessary. —MisterSynergy (talk) 05:50, 18 October 2017 (UTC)
    No. That is not realistic. When we identify mismatches, we can add the right items to the identifiers that are misplaced. VIAF has its processes to pick up such issues. Conversely, when VIAF has an identifier that is different from what we think, it is also for us something to consider. Anything else is part of the "Nirvana fallacy". Thanks, GerardM (talk) 06:05, 18 October 2017 (UTC)
    Not sure what you mean by “not realistic”; the messed-up VIAF entries I’ve seen in this bot run are very realistic, and maybe they can fix it by crawling Wikidata every now and then. Effectively they leverage Wikidata’s work force to maintain their database (which is okay to some extent).
    I also do not understand how you “add the right items to the identifiers that are misplaced”. Typically identifiers are added to items, not the other way round. —MisterSynergy (talk) 06:12, 18 October 2017 (UTC)
    There is nothing wrong with errors when the errors are in a work in progress, particularly not when the organisation of that work is a partner of ours. The notion that things have to be perfect is a fallacy. It prevents us from achieving more. Thanks, GerardM (talk) 09:11, 18 October 2017 (UTC)
    This is what I wanted to raise awareness for. There is this bot run, and we should not expect it to be perfect. In fact, it contains a lot of mistakes which need manual fixes. Let’s be honest: most errors will likely just be duplicated to Wikidata, since most items are unwatched and 30M data sets cannot be watched by the community. “Work in progress” is a euphemistic description for the quasi-permanent nature of the errors which are replicated here. —MisterSynergy (talk) 10:13, 18 October 2017 (UTC)
    Your conclusion is wrong. Yes, mistakes will be included in Wikidata but work will continue both in Wikidata and VIAF. So we should compare and delete existing data using the future versions of VIAF. An entry should only exist once and needs to be merged to the correct items. There should be nothing permanent in Wikidata as far as VIAF is concerned because the authority of data is with VIAF and not with us. Disambiguation is at VIAF not at Wikidata either. Thanks, GerardM (talk) 13:09, 18 October 2017 (UTC)
    I am not convinced. From the cases that I saw until now it seems that they have performed some automatic mapping of Wikidata items to their data, likely based on same name + year of birth, but ignoring occupations and so on. This approach is somewhat robust and efficient, but not 100% safe and the mistakes are really difficult to solve. So to be clear: they have messed it up, and the situation has not been solved at VIAF’s side for 2.5 years now. I don’t have much confidence in their ability to spot and correct things by themselves in acceptable times, so “compare … future versions of VIAF” is not an acceptable option here. Instead I spent a lot of time to keep the Wikidata items I work on clean from bad data, and as a side effect I maintain VIAF’s records. —MisterSynergy (talk) 15:43, 18 October 2017 (UTC)
    The notion of "100% safe" is a fallacy. It will never happen. Luckily we are engaged in a lot of work by associating a Wikidata item not only with the VIAF id but also with the national library identifiers. This gives even more scope to collaborate on the international scale that is both Wikidata and VIAF. Your reply is only about your past experiences and there is nothing in there indicating how to move forward. IMHO, continued synchronisation between VIAF (including the national library identifiers) and Wikidata will help. Yes, we need to get rid of all the chaff in the process. Thanks, GerardM (talk) 06:28, 20 October 2017 (UTC)
    If I only saw how VIAF moved forward… unfortunately I cannot see anything there.
    I just spent 15 minutes to figure out what to do with another obviously poor VIAF entry, but I wasn’t able to solve the problem anywhere. There are also four bad national library database entries, three of which are already in the Wikidata item. Do we have a help page that indicates proper workflows for correcting misassignments? You seem to be more experienced with that task than I am, so please give advice… Thanks, MisterSynergy (talk) 09:35, 20 October 2017 (UTC)

    Well, I do a lot of work around VIAF, and when I notice a wrong VIAF clustering, I just use the link at the bottom of their site to send them an error report, specifying which library entries should not have been merged... and giving them links to sources for the correction. Generally, they do not answer, but the correction is made ^^ --Hsarrazin (talk) 09:49, 20 October 2017 (UTC)

    Don't we have mechanisms within Wikidata or Wikimedia to deal with that? I am not willing to reveal my identity (email, IP address, real name or Wikimedia nickname) to VIAF. —MisterSynergy (talk) 12:38, 20 October 2017 (UTC)

    How to add "reference URL" property?

    That has been breaking my mind for 10 minutes now. Say Q2474366: for the budget field I choose add reference, property reference URL, and paste the value https://www.kinopoisk.ru/film/gitler-kaput-2008-396473/box/.
    It is all fine but now what do I have to do, jump 3 times, pray, something else? Because there is no Save option anywhere of any kind, only remove and global cancel. --NeoLexx (talk) 18:41, 23 October 2017 (UTC)

    Hopefully it is not the same answer from 2013 (that the interface is presented but is not working yet). --NeoLexx (talk) 23:10, 23 October 2017 (UTC)

    Um... It suddenly works. Sorry, it must be something wrong on my side. --NeoLexx (talk) 00:26, 24 October 2017 (UTC)

    This section was archived on a request by: Matěj Suchánek (talk) 17:09, 26 October 2017 (UTC)

    Help with undoing a merge

    Someone erroneously merged Category:Love songs (Category:Love songs (Q9578554)) with Category:Torch songs (Category:Torch songs (Q13297379)). Can someone please help and fix it? thanks, DGtal (talk) 13:50, 24 October 2017 (UTC)

    I restored version before 17 october. --ValterVB (talk) 16:43, 24 October 2017 (UTC)
    User:ValterVB thanks. DGtal (talk) 08:10, 26 October 2017 (UTC)
    This section was archived on a request by: Matěj Suchánek (talk) 17:09, 26 October 2017 (UTC)

    Merge

    Q5334980 and William Edward Irwin (Q5334979), I get an error message when I try and merge, I already merged the Wikipedia entries. Both created from duplicate Baseball Reference entries. --Richard Arthur Norton (1958- ) (talk) 13:35, 26 October 2017 (UTC)

    You found a true duplicate pair, ie. two items have the same sitelink. Just remove it from one of them and merge. Matěj Suchánek (talk) 13:42, 26 October 2017 (UTC)
    That did it, thanks. --Richard Arthur Norton (1958- ) (talk) 14:25, 26 October 2017 (UTC)
    This section was archived on a request by: Matěj Suchánek (talk) 17:07, 26 October 2017 (UTC)

    Rank of surnames

    I'm trying to add some data for the ranking of surnames in Norway. I added an example in Hansen (Q2712367). However, there are multiple issues. One is that the property I used (ranking (P1352)) seems to have constraints limiting it to be used for humans only, though this isn't explained in the property's label or description. The other issue is that many of the source qualifiers I've used, which are the ones laid out in Help:Sources#Web page, give constraint errors saying they're only meant to be used on items that are subclasses of work (Q386724). So before I proceed with the other 3467 names I'd like some advice on what to do. Should the constraints be changed, or should I be using different properties? Jon Harald Søby (talk) 09:47, 20 October 2017 (UTC)

    • You could try the approach used for given names in the Netherlands, e.g. Q4925477#P793.
      --- Jura 09:51, 20 October 2017 (UTC)
      • Still don't think it is the correct property for that. Not sure what property is better, though. Sjoerd de Bruin (talk) 13:03, 20 October 2017 (UTC)
    • This is a very similar issue to what was discussed above under UEFA ranking. I think ranking (P1352)'s constraints need to be relaxed considerably, there's no reason it should be limited to humans or have that limited list of qualifiers. ArthurPSmith (talk) 15:23, 20 October 2017 (UTC)
      • It could work as a qualifier on the same statement. The outcome would be mostly the same. The current approach for sports rankings (at least for cyclists) seems to be to do a property per ranking.
        --- Jura 04:50, 21 October 2017 (UTC)

    UI - entry screen for labels and descriptions

    As far as I can tell, when I click "edit" for labels and descriptions on an item, I randomly get the single-language edit form or I just get editable fields in the multi-language display. Is there any way to control which interface opens up? - PKM (talk) 17:54, 20 October 2017 (UTC)

    There may be a problem with scripts, please follow Wikidata:Contact the development team/Archive/2017/10#Unable to edit description in item main page and tell if it helps you. Matěj Suchánek (talk) 18:20, 20 October 2017 (UTC)
    That discussion is archived. Is User:Mxn aware of the issue? I'll ping him at en.WP. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:26, 20 October 2017 (UTC)

    Correct property for a Site of Special Scientific Interest (SSSI) in the UK

    Quick question and sorry if obvious: What is the correct property for Site of Special Scientific Interest (Q422211)? I can find 1932 instances where heritage designation (P1435) has been used and 134 where instance of (P31) has been used. There are 19 instances where both are used. There is also a separate property, Site of Special Scientific Interest (England) ID (P2621), which is also commonly applied, often in addition to the two mentioned above. Any advice welcome. Many thanks JerryL2017 (talk) 18:35, 17 October 2017 (UTC)

    Not P31, I think. P31 should be used to indicate what kind of place the site is -- eg marsh, ancient wood, open moorland, etc. Jheald (talk) 21:49, 17 October 2017 (UTC)
    I don't have the answer, but I see that a Site of Special Scientific Interest has a WDPA ID (P809) and also an IUCN protected areas category (P814). It's like they have a double nature as a heritage site and a protected area. --Fralambert (talk) 23:39, 17 October 2017 (UTC)
    Just a suggestion but maybe we need a new property, as a sub-property of instance of (P31), alongside heritage designation (P1435) called "Natural Heritage Designation". Items with this property could include Site of Special Scientific Interest (Q422211) but also Ramsar site (Q20905436) , nature reserve (Q179049) and Special Protection Area (Q2463705), lots of others too probably. JerryL2017 (talk) 06:59, 19 October 2017 (UTC)
    I don't think this is a good idea, since most protected areas, like national park (Q46169) and nature reserve (Q179049), have only one instance of (P31): their legal status. --Fralambert (talk) 14:08, 21 October 2017 (UTC)
    heritage designation (P1435) has "Heritage" as the primary use, but it's also defined as "protected area", and the constraints allow use of subclasses of natural heritage (Q386426). I think using this for SSSIs and similar is reasonable. Andrew Gray (talk) 12:14, 19 October 2017 (UTC)
    I had the same problem with the historic provincial park (Q28059516) in Saskatchewan and special place (Q14916958) in Nova Scotia. The only solution I found was to put the result in instance of (P31) and heritage designation (P1435). --Fralambert (talk) 14:08, 21 October 2017 (UTC)

    Import books from Open Library

    Hoi, I want to add books available at the Open Library. For the first batch they will be books by authors that have both a Wikidata and an Open Library identifier. The books will be known because they have an identifier like ISBN or a Library of Congress identifier. The books will be available from the Open Library so there is an obvious benefit to the users of Wikidata. Thanks, GerardM (talk) 13:08, 20 October 2017 (UTC)

    These "books" would be... works or editions? strakhov (talk) 13:32, 20 October 2017 (UTC)
    I think this is a task for a bot (or an equivalent use of QuickStatements) and I would therefore appreciate a bot request. ChristianKl (talk) 13:33, 20 October 2017 (UTC)
    They would be whatever is defined by the ISBN or the LoC identifier. A primary consideration is that they link directly to something you can read at Open Library. This is something we do not do for Wikisource. Thanks, GerardM (talk) 10:24, 21 October 2017 (UTC)

    Birthday template

     This user is celebrating Wikidata's 5th birthday.

    {{User Wikidata birthday 2017}} (shown above) is now available for use on your user pages. It can be added to Babel, thus:

    {{#babel:xx|Wikidata birthday 2017|}}

    Happy birthday, Wikidata! Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:22, 20 October 2017 (UTC)

    Some languages (like mine) show number four. I can fix those with "4" but what about the rest? Are they all really about the fifth, not the fourth? Matěj Suchánek (talk) 07:59, 21 October 2017 (UTC)
    See the template's talk page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:20, 21 October 2017 (UTC)

    located in the administrative territorial entity (P131) constraint violation

    Tyrolean Art Cadastre inventory ID (P4219) is just an example here with its constraint violations. I feel there is a much more general problem not related to Tyrolean Art Cadastre inventory ID (P4219) only.

    The violations located in an administrative territorial entity different from an Austrian municipality are IMHO false positive, as the true reason behind is that the property located in the administrative territorial entity (P131) has two values, one of them is a district and one is a municipality. This is not related to Tyrolean Art Cadastre inventory ID (P4219) but only to located in the administrative territorial entity (P131) and should be only reported there. A bot could do a cleanup to remove parent administrative units.
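    The bot cleanup suggested above could start from a query along these lines (an untested sketch): it lists items with a Tyrolean Art Cadastre ID where one located in the administrative territorial entity (P131) value is itself the administrative parent of another, i.e. the district/municipality duplication described:

    ```sparql
    # Sketch: P4219 items whose P131 includes both a unit and its parent unit
    SELECT ?item ?parent ?child WHERE {
      ?item wdt:P4219 [] ;
            wdt:P131 ?parent, ?child .
      ?child wdt:P131 ?parent .
      FILTER(?parent != ?child)
    }
    ```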

    @UV: for [16]. Maybe it would be useful to add a hint to {{Complex constraint violations report}}, where to leave - in general - related problem reports? best --Herzi Pinki (talk) 13:36, 21 October 2017 (UTC)

    • I tried to fix the query accordingly. Obviously, if the administrative layer is meaningless altogether, maybe it shouldn't be on most (or all) items.
      --- Jura 13:45, 21 October 2017 (UTC)
    thanks, your modified query left only one constraint violation, which was reported correctly. --Herzi Pinki (talk) 14:10, 21 October 2017 (UTC)

    Wikidata:Requests for comment/Defining account creators

    I have started this RFC about account creators. --MediaWiki message delivery (talk) 22:00, 21 October 2017 (UTC) (for Rschen7754)

    How to define granting of a royal charter?

    What should the correct pattern be for defining when an organisation has been assigned a royal charter? There are several approaches, none of which seem right to me:

    Using award received seems close, but is a royal charter considered an award? Pauljmackay (talk) 07:28, 22 October 2017 (UTC)

    A royal charter is awarded (try a Google search for "awarded+a+royal+charter", including the quote marks), so yes it is an award. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 08:22, 22 October 2017 (UTC)
    Pretty much agree with Andy here that it is awarded, though I would have used the term "granted" however that isn't a specific property, and P166 is the closest. I would qualify with a start/end date, as applicable. If there is other pertinent detail in the charter, that can be qualifiers too (conferred by, has cause, ...).  — billinghurst sDrewth 13:32, 22 October 2017 (UTC)
    Charter is both a noun and a verb; the noun is the document, the verb is the act of granting the authority that is memorialized in the document. So the question becomes: when assigning items as values for properties, should we pay strict attention to whether the item is defined as a noun or a verb in its label or description? If so, we would have to create many additional items, so that whenever applicable there is both a noun item and a verb item.
    Alternatively, we could be flexible, and expect the reader to interpret an item as either a noun or verb as applicable. If we accept the verb form of "royal charter" as "while acting as a sovereign, to grant authority which is memorialized in a document" then it certainly could be the inception of a unit of government or an organization. Jc3s5h (talk) 15:24, 22 October 2017 (UTC)
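    The qualifier approach suggested above could be modeled roughly like this (a sketch only; the value item and qualifier choice depend on the discussion's outcome, and the placeholders in angle brackets are not real values):

    ```
    <organisation>
      award received (P166): royal charter
        start time (P580): <date the charter was granted>
        conferred by (P1027): <granting monarch or council>
    ```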

    Poughkeepsie vs Poughkeepsie and Princeton vs Princeton

    Should we create a Wikidata-specific criterion (Q27949687) for all the locations that are within an administrative area of the same name? They are constantly getting confused by people who list places of death as the smaller entity or the adjacent entity, when it should be the other entity. We use two persons with the same date of birth and date of death (Q20978290) for a similar reason, to let people know that they are distinct, and may be confused. --Richard Arthur Norton (1958- ) (talk) 12:59, 18 October 2017 (UTC)

    I am not sure I understand the proposal, so let me rephrase it. Currently we use two persons with the same date of birth and date of death (Q20978290) to differentiate easy to confuse people, for example Ivan Turgenev (Q42831) different from (P1889) Sebastián Evans (Q6122889) / criterion used (P1013) two persons with the same date of birth and date of death (Q20978290). And you would like to create a similar Wikidata-specific criterion (Q27949687) for easy to confuse locations, that have the same name and are within the same administrative area? If so, then that sounds reasonable. --Jarekt (talk) 15:13, 18 October 2017 (UTC)
    We ran the doppelganger search in the past and merged duplicates that were synonyms of the same person and flagged the others, that were distinct people, with that tag. That way when it was run again we would not have to look at the old ones again. --Richard Arthur Norton (1958- ) (talk) 19:49, 18 October 2017 (UTC)
    This property sounds like a good idea to me. - PKM (talk) 17:50, 18 October 2017 (UTC)
    There is an alternative that would deal with many of these, that is, to give the full official name as the label. For example, "City of Poughkeepsie" and "Town of Poughkeepsie" or "Poughkeepsie, City of" and "Poughkeepsie, Town of". Jc3s5h (talk) 18:34, 18 October 2017 (UTC)
    The flag would be to remind people to be careful about which one is chosen. In most cases the source material only says "Poughkeepsie" or "Princeton", and the editor must do more research to find which one is correct.
    I added two adjacent locations with the same name (Q42304567), any suggestions on how we can automate locating doppelganger locations and flagging them the way we did doppelganger people? --Richard Arthur Norton (1958- ) (talk) 20:14, 22 October 2017 (UTC)

    looking for data on lipstick

    HI, I am searching for data sets on lipstick. Sales, brands, colors, historical data, any information that may be accessible. I am new to Wikidata so thought to post my situation. Thank you kindly. ART  – The preceding unsigned comment was added by 2601:647:8104:45c0:95e5:4bc4:85b7:8d81 (talk • contribs) at 21. 10. 2017, 13:50‎ (UTC).

    Unfortunately, it doesn't seem like a lot has been done in relation to lipstick (Q184191) brands and specific products. This seems to also be the case in English Wikipedia, where one would expect that if any edition might have this info, it would be English. A good place to start investigating the topic would be to look at what instances and subclasses are there, with a tool like SQID - [17]. Feel free to propose some ideas or reliable sources where Wikidata could start in this area. -- Fuzheado (talk) 16:28, 21 October 2017 (UTC)
    Outside of Wikidata https://opendata.stackexchange.com/ might be a good place to look for data sets. ChristianKl (talk) 12:04, 22 October 2017 (UTC)
    Yes, definitely post your question at the opendata URL above, that's your best bet. Cheers! Syced (talk) 06:31, 23 October 2017 (UTC)

    Update Wikidata:Statistics

    The Wikidata:Statistics page has a nice diagram showing "What is in Wikidata?" on the right. But these numbers are dated 2015-10-25 and are therefore quite outdated. They seem to be integrated with {{Site content|wikidata}}. Can someone update it and make sure that it stays up to date? --Zuphilip (talk) 16:11, 22 October 2017 (UTC)

    Here is an updated one. Sjoerd de Bruin (talk) 19:56, 22 October 2017 (UTC)
    Okay, thank you. I tried to update the numbers and the diagram already looks more up-to-date now. However, it looks like the value "None" is not correct, because the sum now exceeds the total number. Moreover, the last category, about the number of scholarly article (Q13442814), does not show up (maybe the number of publication (Q732577) would be even better). User:Sjoerddebruin, User:Zolo Any idea how we can change that, so that the number of publications will also show up?

    ID property issue because of URL structure

    Hi all

    There is an issue with the Gatehouse Gazetteer place ID (P4141), currently the link to the item doesn't work because the URL sequence is broken. e.g Laugharne Castle (Q911714).

    To fix this I just need to change the 'formatter URL' on Gatehouse Gazetteer place ID (P4141), but it looks from the website like the sites in England, Wales, and 'the islands' have different URL structures, e.g.

    Is there any way to fix this using qualifiers or something?

    Thanks

    --John Cummings (talk) 09:12, 17 October 2017 (UTC)

    • Currently all values seem to be 1 to 4 digits (except the sample). Maybe a qualifier could be added and this read by the authority control gadget.
      --- Jura 09:22, 17 October 2017 (UTC)
    @Jura1:, thanks but I don't understand how this fixes the issue, all I want to do is make the click throughs work when you click on the ID number on the item, which is governed by the 'formatted URL'. It is very unlikely there will be any new data to import. --John Cummings (talk) 09:40, 17 October 2017 (UTC)
    We can have identifiers without formatter urls. If it ceases to work, the formatter url should be set to deprecated rank or, if it never worked, deleted.
    --- Jura 09:43, 17 October 2017 (UTC)
    OK, but is it possible to have one 'formatter URL' statement that can have more than one URL structure? --John Cummings (talk) 09:58, 17 October 2017 (UTC)
    Not at the moment, no. Andrew Gray (talk) 11:07, 17 October 2017 (UTC)

    Not good news. Well, I suppose this is what happens when identifiers said to be stable turn out not to be. Given over 1000 instances it would not be so great to make the formatter http://www.gatehouse-gazetteer.info/$ and then change the entries. But if the site changes things once, it might do it again.

    SPARQL based on located in the administrative territorial entity (P131) could probably figure out which of the locations are in Wales. So there is scope for a spreadsheet-style bit of automation prefixing Welshsites/, for example. Charles Matthews (talk) 11:11, 17 October 2017 (UTC)
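    A sketch of such a query (untested), in the spirit of what was later run with QuickStatements: it selects Gatehouse Gazetteer items whose P131 chain leads transitively to Wales (Q25), which are the candidates for the Welshsites/ prefix:

    ```sparql
    # Sketch: P4141 items with a bare numeric ID that are located in Wales (Q25)
    SELECT ?item ?ggid WHERE {
      ?item wdt:P4141 ?ggid ;
            wdt:P131+ wd:Q25 .
      FILTER regex(?ggid, "^\\d+$")
    }
    ```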

    We could use a bot to check for the different possible prefixes, which could update the ID using the prefix that doesn't return a 404 error. Alternatively, maybe split the identifier into three, each with the different prefix. Thanks. Mike Peel (talk) 11:19, 17 October 2017 (UTC)
    It shouldn't really be an external identifier property if the URL keeps changing.
    --- Jura 13:15, 17 October 2017 (UTC)
    • Note the problem was there from the time of property creation, it is not something new due to a change in their URL's. I fixed the formatter URL though using my wikidata-externalid service at wmflabs, so it works with the original id formats that were in the proposal. I'm not sure this is how the property is actually being used at the moment. There's a property constraint right now that seems to expect a numeric id. ArthurPSmith (talk) 14:39, 17 October 2017 (UTC)
      • Please see my comment above (09:22, 17 October 2017)
        --- Jura 14:41, 17 October 2017 (UTC)
        • Ok, so clearly this property has been entered with incorrect values relative to the plan expressed in the original proposal. @John Cummings: I have fixed Laugharne Castle (Q911714), but the other 1000+ entries will need to be similarly fixed to prefix the numeric id with the correct location string. Also the constraint on the property should be fixed. Note that the numeric ID is NOT sufficient to be an external identifier, because the same number is reused by this site for the different locations (there is both a Welsh and an English site with numeric value 225). ArthurPSmith (talk) 15:08, 17 October 2017 (UTC)
        • Actually it looks like all (?almost?) the sites entered so far are in Wales, so prepending "Welshsites/" to all the ids would fix (most of?) them. @NavinoEvans: can you take a look at this? ArthurPSmith (talk) 15:16, 17 October 2017 (UTC)
          • You could try {{Autofix}}.
            --- Jura 16:20, 17 October 2017 (UTC)
      • @ArthurPSmith: I think you just made the problem discussed above at Wikidata:Project_chat#References_to_wmflabs.3F worse by changing the formatter URL. :-/ Thanks. Mike Peel (talk) 16:29, 17 October 2017 (UTC)
        • @Mike Peel: then this thread is a good illustration why links to wmflabs should not blindly be considered detrimental. It's the self-citation situation which is detrimental, and such a situation can happen with other domain names (for instance, other databases harvesting wikidata). − Pintoch (talk) 16:42, 17 October 2017 (UTC)
        • @Mike Peel: I really don't think that's a legitimate complaint in any way from the enwiki folks, at least as regard these types of links. It's no more "obscuring" the original URL than archive.org URL's obscure the original - less so in a way since it generates a redirect that any machine can follow to find the actual source. ArthurPSmith (talk) 17:28, 17 October 2017 (UTC)
          • FYI I've fixed (with Quickstatements) all the cases with a numeric P4141 identifier where the item was in a located in the administrative territorial entity (P131) chain leading to Wales (Q25) - this was over 900 of them. There are a handful left over that somebody could perhaps check by hand. ArthurPSmith (talk) 16:52, 18 October 2017 (UTC)
          • @Pintoch, ArthurPSmith: I'm being caught in the middle here - technically, what you say makes sense, but it's still an internal link rather than one to an external source. In the case of archive.org links, at least we have archive URL (P1065).
          • Having said that, thanks for making the fix. Is there a link to the 'handful left over' that I can help work through? Thanks. Mike Peel (talk) 00:35, 19 October 2017 (UTC)
    • @Mike Peel: actually it's 237. I think these are (almost?) all in Wales also, but their P131 is wrong or missing; the import job seems to have been a little haphazard. Try
      select ?item ?itemLabel ?ggid where {
        ?item wdt:P4141 ?ggid .
        FILTER regex(?ggid, "^\\d+$")
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
       } order by xsd:integer(?ggid)
      
      Try it! ArthurPSmith (talk) 12:56, 19 October 2017 (UTC)
      These are all fixed now. Turns out that they were all Welsh sites. Thanks. Mike Peel (talk) 19:31, 21 October 2017 (UTC)
      @ArthurPSmith: Is there really no way to include spaces in formatter URLs apart from going through your external tool? If so, is this on phabricator somewhere? Thanks. Mike Peel (talk) 19:40, 21 October 2017 (UTC)
        • @Mike Peel: Spaces do work in other formatters, I'm not sure what the issue is with this particular website, but I could not get anything to work when I tried replacing the %20's with spaces originally. ArthurPSmith (talk) 13:23, 23 October 2017 (UTC)
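    For reference, the {{Autofix}} approach Jura suggested earlier in this thread is configured on the property's talk page; for this migration it might look roughly like the following (parameter names and back-reference syntax assumed from common usage of the template, please check its documentation before use):

    ```
    {{Autofix|pattern=(\d+)|replacement=Welshsites/\1}}
    ```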

    Two items on the same topic

    Bussokusekika (Q2928770) and Bussokusekika stone (Q18234955) are both on the same topic. The en.wiki and fr.wiki articles at Q2928770 were both misnamed, and the actual topic "Bussokusekika" only currently has an article on ja.wiki. I fixed the title of the English article, but am having a dog of a time making the interwiki links go to the right place. When I try to edit the links, though, I am told that I can't because Q18234955 exists and that I can merge the two if they are on exactly the same topic, but they aren't. What's the normal procedure here? Hijiri88 (talk) 07:52, 21 October 2017 (UTC)

    It seems to me that both items have ja.wiki links and when ja.wiki considers both topics important enough to have their own article, that's two topics. ChristianKl (talk) 13:04, 21 October 2017 (UTC)
    @ChristianKl: Yeah, but the ja.wiki articles need to be switched: the en.wiki and fr.wiki articles correspond to one topic, but currently are linked to the other. The en.wiki article currently links to the ja.wiki article 仏足石歌, but it should link to the ja.wiki article 仏足跡歌碑. I don't know how to fix that, so I came here. I gather that the correct path would be to remove Q18234955, swap out the ja.wiki entry for Q2928770 so it links to the right article, and create a new item that includes the ja.wiki article currently linked to Q2928770. Forgive me if I am missing something. Hijiri88 (talk) 13:58, 21 October 2017 (UTC)
    And FWIW, I want to create an article on the topic 仏足石歌 on en.wiki, but I want the credit for "creating the new page" for en:WP:WAM purposes. To do that, I have to get the bad redirect created by my move deleted. But I don't want to do that until I've fully cleaned up after my move, and doing that requires fixing the interwiki links. Hijiri88 (talk) 14:00, 21 October 2017 (UTC)
    I added label+description+statements for clarity. Syced (talk) 06:57, 23 October 2017 (UTC)

    Why sections on this page are opened by default on mobile?

    Who can tell me why? --111.160.133.118 02:27, 23 October 2017 (UTC)

    See Wikidata:Project chat#Technical issue. Sjoerd de Bruin (talk) 07:39, 23 October 2017 (UTC)

    Advice from Russian speakers requested

    Our website account on (P553) is used for:

    Both appear to be Russian-language social media networks, and so used in Russia (and perhaps neighbouring countries). Are there likely to be any more applicable values, which we have not yet included? Are there templates for these on ru.Wikipedia, etc, which we could scrape? In other words, should we have specific properties for them? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:30, 20 October 2017 (UTC)

    I don't think we need specific properties for them. They are slowly but steadily losing their audience to VK (Q116933), Instagram (Q209330) and Facebook (Q355) --Ghuron (talk) 11:54, 24 October 2017 (UTC)

    Cleaning up stated in (P248)

    Enwiki only wants to import statements that are sourced. Currently, stated in (P248) is sometimes used to refer to another Wikidata item from which information is taken. That usually violates the constraints on stated in (P248) and produces problems for Wikipedia, as described at https://en.wikipedia.org/wiki/Module_talk:WikidataIB#Not_sourced_to_Wikipedia_or_wmflabs.2C_and_oh_crap . I would advocate running a bot that removes constraint-violating stated in (P248) statements. ChristianKl (talk) 15:16, 22 October 2017 (UTC)

    Is there a smarter way to do this?

    Namely, the usage of closed on (P3026) in Esperanto-Domo (Q42306289). ~nmaia d 03:08, 23 October 2017 (UTC)

    I don't know of one, at least. I guess the system for describing opening times will need an overhaul somehow anyway. I think it could be worthwhile to evaluate the syntax of OpenStreetMap's osmwiki:Key:opening_hours, as perhaps the most powerful currently available. Unfortunately, that doesn't help you at the moment, of course. Best regards, --Marsupium (talk) 18:40, 23 October 2017 (UTC)
    @Marsupium: I agree, OSM's syntax for that is pretty good. Do you know of the latest developments on opening hours? I remember seeing a property proposal that was shot down, and nothing else since. ~nmaia d 23:35, 23 October 2017 (UTC)
    As far as my search could reconstruct it, the proposal Wikidata:Property proposal/Archive/51#opening hours was "shot down" in May 2016 and followed by Wikidata:Property proposal/opening hours, which ended with the creation of closed on (P3026) and other properties in August 2016. Those make it possible to model days (with some drawbacks compared to OpenStreetMap), but not hours. The OpenStreetMap format was mentioned at Wikidata:Property proposal/Archive/49#scheduled services days and again by Andy Mabbett in the first proposal: "OpenStreetMap's method, which is semi structured, is at http://wiki.openstreetmap.org/wiki/Key:opening_hours Can we learn anything from that? The microformats community also did some analysis, see microformats.org/wiki/opening-hours", but it was not investigated further, it seems. More work is definitely needed on our model for opening/collection, … times. --Marsupium (talk) 11:17, 24 October 2017 (UTC)
    BTW: Our current system is quite similar to http://schema.org/OpeningHoursSpecification. --Marsupium (talk) 11:19, 24 October 2017 (UTC)

    Wikidata weekly summary #283

    Wikidata weekly summary #282

    Dupes marked as dupes but unmerged

    Using the VIAF dupe tool I found Q20525494 and Jelica Belović-Bernadžikovski (Q12633600) marked as dupes with "Wikimedia duplicated page" but not merged. Is there a reason they are categorized as dupes yet left unmerged? --Richard Arthur Norton (1958- ) (talk) 04:37, 24 October 2017 (UTC)

    • You didn't merge the articles yet?
      --- Jura 04:51, 24 October 2017 (UTC)
    No, I wasn't sure if they were marked as duplicates with Wikimedia duplicated page (Q17362920) for some other reason, and not meant to be merged. Or if other work needs to be done first. Best to be cautious. --Richard Arthur Norton (1958- ) (talk) 04:56, 24 October 2017 (UTC)
    There is a list of them at Wikidata:Database reports/Identified duplicates. If you want, you could move all statements to one item. A bot (Matěj Suchánek's) periodically checks the list for mergers and then merges the items.
    --- Jura 05:04, 24 October 2017 (UTC)
    But some of them are missing the "duplicated from" item, e.g. Didiscus benthamii Domin (1908) (Q17133890) is a duplicate of which item? --Liuxinyu970226 (talk) 06:17, 24 October 2017 (UTC)
    Then it's probably best to search in the history and/or ask the relevant user. Matěj Suchánek (talk) 11:56, 24 October 2017 (UTC)
    So @Brya:? --Liuxinyu970226 (talk) 00:01, 25 October 2017 (UTC)
    Yes, Didiscus benthamii Domin (1908) (Q17133890) is not right. At the time I was trying to figure out what to do with such items: it is one of a set produced by Lsjbot, derived from a then error in CoL (not much later corrected). CoL had copied a database and had promoted every name in it into a species (although in the database this was marked as being a piece of junk). It will be a duplicate of some sort, but nobody seems to know (or care) of what. The best solution would be for the ceb, sv and warwiki to delete such items. Unfortunately, such things are not particularly rare (there are several known sets of junk in ceb, sv and warwiki, and who knows how many unknown sets). - Brya (talk) 03:14, 25 October 2017 (UTC)

    Feature suggestion: Automated testing

    This is an idea I have been mulling over for a while, and as far as I can tell, nothing like this exists. The idea is to have a system that allows automated testing to spot errors in Wikidata entries, a little like Unit testing in software development, for example:

    • Test that for any item with both P17 and P131, the location in P131 is in the country P17 (taking into account the possibility of multiple values for each, and also former countries)
    • Test that for every item with P131 and a coordinate location, the two correspond
    • Test that P576 (dissolved date) is not before P571 (inception)
    • Test that the death date is not before the birth date

    While in theory, this could easily be done by anyone using the API, it might be nice if there were some central framework and collection of tests to do something like this. I would be interested to know what other people think of this, or whether it does exist somewhere else and I've just missed it.  – The preceding unsigned comment was added by Danielt998 (talk • contribs) at 15:27, October 24, 2017‎ (UTC).
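    Several of the tests suggested above can already be expressed as queries against the SPARQL endpoint; as a sketch (untested here), the birth/death check might look like this:

```sparql
# Find items where the date of death (P570) precedes the date of birth (P569)
SELECT ?person ?born ?died WHERE {
  ?person wdt:P569 ?born ;
          wdt:P570 ?died .
  FILTER(?died < ?born)
}
LIMIT 100
```

    A central framework could then be little more than a curated collection of such queries run on a schedule, with non-empty result sets reported as test failures.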

    @Danielt998: What you are suggesting is I think a more sophisticated form of constraint checking. Some wikiprojects do this now on an ad hoc basis with auto-run SPARQL queries. I think this is also the sort of thing that W3C Shape Expressions are intended for - there's a talk coming up at Wikidatacon on this - Wikidata:WikidataCon 2017/Submissions/Using Shape Expressions for data quality and consistency in Wikidata. ArthurPSmith (talk) 19:34, 24 October 2017 (UTC)

    Wikisyntax in media legend

    Hi to everybody.

    Is there any policy or recommendation about using wikisyntax in media legend (P2096)? If we use it, we can add italics or wikilinks to media legends in Wikipedia, but if we use only plain text, its content will be easier to share in other places. So, I'm not sure about what to do.

    Thanks in advance. Paucabot (talk) 08:18, 22 October 2017 (UTC)

    • Wikilinks in media legends are problematic in my opinion. They only work in a specific Wikimedia project, but the media legend does not express which one. Most likely the Wikipedia project corresponding to the media legend language of course, but this does not need to be the case. Another problem is that in case of page moves we would need to update the media legend manually, which would easily be forgotten. —MisterSynergy (talk) 08:25, 22 October 2017 (UTC)
    • Wikisyntax is evil. I say this always. The problem is many users got used to it because of wrong implementations of Lua modules and infoboxes that do not escape arbitrary strings properly. Matěj Suchánek (talk) 08:36, 22 October 2017 (UTC)
      • Also note that Wikidata is for everyone, not just Wikipedia and other sister projects. The media legend should be useable for everyone. Sjoerd de Bruin (talk) 09:30, 22 October 2017 (UTC)
    • They are kind of essential when using this property for caption for images in infoboxes. Note that not all wikitext formatting works, though - wikilinks do, but | doesn't, nor does italics. Thanks. Mike Peel (talk) 16:59, 22 October 2017 (UTC)
    • I think the text should be written in order to be comprehensible without wikilinks. For P18 on items about a person, this could be limited to "2nd person from left" with a date read from the qualifier "point in time".
      --- Jura 08:42, 23 October 2017 (UTC)
    Thanks to everybody for the input. Paucabot (talk) 18:42, 25 October 2017 (UTC)

    inverse family relationship (Q42248293): Q38791127/Q42260742 or Q38791127/child (Q38693655)?

    Q38791127 is Q39155785. Q42260742 also is Q39155785. child (Q38693655) is not pair, is Q30103061. --Fractaler (talk) 12:38, 22 October 2017 (UTC)

    • I haven't thought of these pairs when creating inverse family relationship (Q42248293). If it can't be sorted out, maybe you could try to do more specific items (for use in "criterion used" qualifier) that describe how they relate?
      --- Jura 08:38, 23 October 2017 (UTC)
    inverse family relationship (Q42248293): "for "opposite of"-statements on kinship type items. Sample use: if A is "father" of B, B is "child" of A". Q38791127/Q42260742 is a Q39155785's kinship type (=if F and M is "father and mother" of S and D, S and D is "son and daughter" of F and M), ). Previous is parent (Q7566)/child (Q38693655) (Q30103061's kinship type (=if A is "father" of B, B is "child" of A"). Previous is Q39627308's kinship type (when objects so far only have the ability to create a kinship, but at the moment do not have this; or when may be child (Q26262236)/may be parent (Q26262238)) --Fractaler (talk) 09:49, 23 October 2017 (UTC)
    I don't see the purpose of such items. Where do you want use Q42260742 except for claiming that it is the inverse of Q38791127? --Pasleim (talk) 11:24, 23 October 2017 (UTC)
    Ok, what is the inverse family relationship (Q42248293) for 1) Q38791127, 2) parent (Q7566)? --Fractaler (talk) 19:55, 23 October 2017 (UTC)
    There is no answer because inverse family relationship (Q42248293) was created without defining it and outside of Wikidata this term is almost not in use. --Pasleim (talk) 09:17, 25 October 2017 (UTC)
    Ok, what about the antipode for 1) Q38791127, 2) parent (Q7566)? --Fractaler (talk) 12:54, 25 October 2017 (UTC)

    Is this an appropriate home for a large collection of data on Texas public school finances and performance?

    I have a large collection of data, over 2000 rows and 5000 columns, on Texas public schools, all at the district grain, covering their financial and academic performance from 2000 to 2016. Is Wikidata the right place to share that data? I am moderately familiar with the concepts of linked data and a capable Python developer. Happy to hear the best way to integrate it, if it's appropriate at all.

    Thanks!

    Jareyes210 (talk) 20:10, 22 October 2017 (UTC)

    Maybe not all, but at least some of this information is relevant; for instance, the number of students each year is definitely on-topic.
    Here is the page detailing properties for schools: Wikidata:WikiProject_Education#Educational_institutions.
    Here is a showcased university; it does not seem to contain any financial/performance data, though: University of Konstanz (Q835440). Cheers! Syced (talk) 06:29, 23 October 2017 (UTC)
    • Interesting question. I don't think much has been done in this field as the focus was mostly on colleges/universities.
      From a Wikidata perspective, it would be good to include most if not all information that doesn't change every year and that could be maintained going forward. If available, this could include details on every school in the district.
      Looking at w:List_of_school_districts_in_Texas and (e.g. as random sample) w:Aransas Pass Independent School District, the data on the articles could be stored either at Wikidata or in Commons (in tabular form). The information that is in the infobox might fit directly at Wikidata, maybe even basic financials as for companies (infobox at w:Alphabet Inc., properties at Wikidata:WikiProject_Companies/Properties).
      Tabular data on Commons is preferable for details with many datapoints. For a list of items, see this query.
      --- Jura 08:36, 23 October 2017 (UTC)
    • @Jareyes210, Jura1: Many schools and districts already have items, and I've been adding NCES District ID (P2483) and NCES School ID (P2484) as I can. I also would like to have complete coverage of primary through secondary schools and to treat those items with the attention we pay to items of postsecondary institutions. We can chat at Wikidata:WikiProject Education about the best way to move forward. For instance, I'm curious if the data you have is a subset of what's in the NCES databases or if there's Texas-specific information. That will inform the best way to import data. Runner1928 (talk) 16:47, 25 October 2017 (UTC)

    Universe (Q1)?

    Hello! I've noticed Universe (Q1) (Universe!) has inconsistent naming in tl (Tagalog). "Sansinukob", which is used in the article, is different from "uniberso", used in this entry. Why? Are there any policies that explain this inconsistency? - Gacelperfinian (talk) 07:32, 19 October 2017 (UTC)

    The content of the article is the business of tlwiki. If tlwiki wants it to be linked to a different Wikidata item that would also be okay. ChristianKl (talk) 10:23, 19 October 2017 (UTC)
    Are they different notions? Infovarius (talk) 09:27, 26 October 2017 (UTC)

    Formatting dates

    There seems to be no info on how to control the output format of dates.

    I see that in some cases the SERVICE wikibase:label would somehow format it in some ways, but first, I cannot use SERVICE (it triples the query time and puts it over the limit), and second, rdfs:label doesn't quite seem to create the same output.

    Anyway. Is there any sane and direct way to format the dates, be that ISO (my preference) or whatever else? Apart from doing it manually, like cutting it into date parts and then concatenating the whole damn thing? --grin 14:16, 25 October 2017 (UTC)

    A related question from my side: I want to use the template "Age" in an infobox for persons, but in order to fill in the fields of this template correctly, I would need to know how to access the year of birth, month of birth and day of birth separately. Does anybody know? --Otets (talk) 14:35, 25 October 2017 (UTC)
    @Otets: In infoboxes you can use {{#statements:P569}}. This returns directly a formatted date.
    @grin: Can you explain where you want to control the output format? API, SPARQL query service GUI, SPARQL Endpoint? --Pasleim (talk) 14:58, 25 October 2017 (UTC)
    @Pasleim: You tell me: I need the JSON result of (name, birthday, deathday, birthplace, deathplace, occupation, citizenship) for all the people living or having lived in European countries. So far I have used the GUI, but it possibly isn't the best tool for this particular task (but maybe it is, since as far as I can see the SPARQL endpoint uses the same engine and limits, and the API isn't for this). And I would prefer a sane date format (±yyyy-MM-dd) instead of the default.
    "ISO" rings alarm bells. If you mean w:ISO 8601 you should be aware that some dates in Wikidata are in the Julian calendar, and ISO 8601 does not allow the Julian calendar. Also, when computing ages, if the birth date is in the Julian calendar and the death date is in the Gregorian calendar, your software will have to know how to convert Julian to Gregorian before computing the age. Jc3s5h (talk) 15:33, 25 October 2017 (UTC)
    @Jc3s5h: You say that like the current output format would actually handle this but it doesn't seem that way
    SELECT ?dob WHERE { wd:Q1394 wdt:P569 ?dob . }
    
    So it's probably not related to the original question. Still, it's a valid point for my case, and I don't readily see how to query for the calendar model. Anyway, I want year-month-day and not something else, regardless of the calendar model. --grin 22:00, 25 October 2017 (UTC)
    I'm not terribly familiar with SPARQL, but apparently it derives its results from the RDF format, which is an external specification. That specification says the calendar must be Gregorian, so when an RDF dump format is exported, a Julian calendar date is converted to Gregorian. I understand that the SPARQL RDF is similar to, but not identical to, the RDF dump format. JSON, on the other hand, does output the calendar model and does not convert calendars.
    You might find it helpful to experiment with people who only have a Julian calendar birth date, like John Dee (Q201484), rather than a person who has both a Julian and a Gregorian birth date stored. Experiment shows that for John Dee, who the user interface indicates was born 13 July 1527 in the Julian calendar*, a SPARQL query shows Jul 23, 1527, so SPARQL is converting Julian to Gregorian.
    *You can tell it's Julian because the user interface displays a Julian superscript for Julian-calendar dates after about 1582, and a Gregorian superscript for Gregorian-calendar dates before then, to flag calendars used before they were created or after they were superseded (but reality is more complicated than that). Jc3s5h (talk) 00:23, 26 October 2017 (UTC)
    @grin: You mentioned ±yyyy-MM-dd. You should be on alert that the RDF format considers year 0 to exist, so 0000 is equivalent to 1 BC (or BCE if you prefer) and -0001 is equivalent to 2 BC. The user interface uses BCE. JSON follows some quirky outdated standards that deem year 0 to not exist, so -0001 is equivalent to 1 BC, -0002 is equivalent to 2 BC, etc. Jc3s5h (talk) 00:42, 26 October 2017 (UTC)
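    The two year-numbering conventions Jc3s5h describes can be illustrated with a small worked example (hypothetical helper functions, purely for illustration):

```python
def rdf_year_to_bce(year: int) -> str:
    """Convert an astronomical year number (RDF convention, where year 0
    exists) to a human-readable era year: 0 -> '1 BCE', -1 -> '2 BCE'."""
    if year > 0:
        return f"{year} CE"
    return f"{1 - year} BCE"


def json_year_to_bce(year: int) -> str:
    """Convert a year in the JSON convention, where year 0 does not exist:
    -1 -> '1 BCE', -2 -> '2 BCE'. Year 0 is deliberately not handled."""
    if year > 0:
        return f"{year} CE"
    return f"{-year} BCE"


print(rdf_year_to_bce(0))    # 1 BCE
print(json_year_to_bce(-1))  # 1 BCE
```

    So the same underlying year 1 BC surfaces as 0000 in RDF exports but as -0001 in JSON, and any software merging the two has to normalize one to the other.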
    @Jc3s5h: As long as it's known and consistent I don't really worry about that, but for me it's using some braindead format like "23 Oct 1956" while I clearly desire to get "1956-10-23". The calendar problem was a slightly unrelated, though interesting, topic. How can I get the standard date format in the JSON output, please? :) --grin 10:32, 26 October 2017 (UTC)
    I'm far from being an expert on either JSON or RDF. I am not aware of any way to change the format of either output. To have a better chance of getting a useful response from someone else, you might want to say exactly how you are obtaining the output and what you want it for. Jc3s5h (talk) 10:41, 26 October 2017 (UTC)
    I have replied to Pasleim above. I use the JSON of persons to merge with an external database and import the matches, and I passionately hate doing absolutely unnecessary date parsing of various braindead string formats. In my ideal world all dates are in yyyy[-MM[-dd]] format (with the optional additions of calendars, times, timezones and precision) and use a defined calendar. I need it in the SPARQL output. --grin 11:44, 26 October 2017 (UTC)
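    For the SPARQL side of this, note that the query service's JSON results already contain the underlying xsd:dateTime literal; the "23 Oct 1956" style is only the GUI's table rendering. If only the yyyy-MM-dd part is wanted, it can be cut out of the literal in the query itself; a sketch building on the earlier example (untested here):

```sparql
SELECT ?isoDate WHERE {
  wd:Q1394 wdt:P569 ?dob .
  # wdt: values are xsd:dateTime literals like "1956-10-23T00:00:00Z";
  # the first 10 characters of the string form are the ISO date.
  BIND(SUBSTR(STR(?dob), 1, 10) AS ?isoDate)
}
```

    The calendar-model and year-zero caveats discussed above still apply to whatever comes out of this.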

    Economic data

    I tried to input some economic data to add to Consumer spending (Q5164722), but I couldn't find a suitable way to put it in. The data is concerning each country's consumer spending as a % of GDP. Which statement/identifier would I use? 0Jak (talk) 22:59, 25 October 2017 (UTC)

    @0Jak: does total expenditure (P2402) fit your needs? Probably qualified by determination method (P459) or applies to part (P518) with Consumer spending (Q5164722) I guess. It would expect a total value, not a % of GDP, but that should be computable if GDP has also been entered. There are often multiple ways to model data in wikidata, so look around at how this has been done in similar cases to try to be consistent. ArthurPSmith (talk) 08:44, 26 October 2017 (UTC)

    country of citizenship (P27) on properties

    When I first came to Wikidata, I added a few inception (P571) to properties in order to document the date those properties were actually created. A handful of contributors came to my talk page to stress how bad an idea that was because it breached some kind of symbolic gap between properties for items and properties for other properties. I said that instance of (P31) was already breaching this on every single property page, and the debate stopped soon after, as I had stopped adding inception (P571) to other properties. Flash forward to today. country of citizenship (P27) is now widely used on properties and I'm certainly partly responsible for this. It is very useful to be able to easily find all properties related to this or that country. But this also breaches the said symbolic gap. I therefore wonder if things should stay like this. And I think I have a suggestion. Instead of having country (P17) → Canada (Q16) on Panthéon des sports du Québec ID (P4416), for instance, I suggest we instead have instance of (P31) → new 'Wikidata property related to Canada' item. I know we would have to recode Template:Property documentation accordingly, but I see at least one other advantage on top of country of citizenship (P27) not breaching the gap – applies to jurisdiction (P1001) would not breach it either! In my example, Panthéon des sports du Québec ID (P4416), we have, so far, applies to jurisdiction (P1001) → Quebec (Q176). Wouldn't it be more convenient to have instance of (P31) → new 'Wikidata property related to Québec' item, with the latter a subclass of (P279) the new 'Wikidata property related to Canada' item? I would totally support such a move. Thierry Caro (talk) 00:43, 26 October 2017 (UTC)

    I personally think it is fine to use a property on both properties and regular items, as long as the meaning is substantively the same in both cases. instance of (P31) is our standard class designator and it really does have the same meaning. But we don't use subclass of (P279) on properties as properties can't be classes - the relevant relationship property for properties instead is subproperty of (P1647). In the case of countries, country (P17) seems fine to me, it just means this entity is associated with that country. However, I would question using country of citizenship (P27) - a property can't be a "citizen" I don't think! ArthurPSmith (talk) 08:38, 26 October 2017 (UTC)
    The big problem with country in combination with inception is that often the country did not exist at the time. It is a bad idea. The location implies a country, theoretically even across time. Thanks, GerardM (talk) 09:56, 26 October 2017 (UTC)

    For buses, instance of (P31) bus route (Q3240003) vs. transit bus (Q1373492)

    Nearly 5,000 items are linked to bus route (Q3240003), and nearly all are instance of (P31) bus route (Q3240003); otoh 8 items are linked to transit bus (Q1373492) (all except one of them P31-linked), so what are the rules here? --Liuxinyu970226 (talk) 04:07, 26 October 2017 (UTC)

    Am I missing something? A transit bus is a physical object or class of objects which people drive or are passengers within, whereas a bus route is a planned scheme.  — billinghurst sDrewth 04:45, 26 October 2017 (UTC)

    Diocese

    Columbus Hall (Q19866795) is part of the Roman Catholic Archdiocese of Newark (Q1365140). There are three ways to display that relationship; which of the three is best, or should we include all? We have "diocese", "owned by" and "part of". All three are correct but redundant. Any ideas? --Richard Arthur Norton (1958- ) (talk) 15:34, 26 October 2017 (UTC)

    No, a diocese is an administrative organisation that parishes et al. are part of. The suggestions are imho all wrong. Thanks, GerardM (talk) 19:58, 26 October 2017 (UTC)

    Catholic schools and cemeteries are also "part of"/"owned" by the diocese according to their website. --Richard Arthur Norton (1958- ) (talk) 21:56, 26 October 2017 (UTC)
    That would be country specific, you should provide evidence/reference.  — billinghurst sDrewth 22:47, 26 October 2017 (UTC)
    Definitely not diocese; I think that should be removed, as the diocesan function is different from that of a cathedral, which is inherent within the see. "Owned by" would be okay if evidence exists that the diocese owns it, rather than some other entity within the church, and it would need a start and end date. "Part of" you would need to qualify to make it acceptable IMO.  — billinghurst sDrewth 21:09, 26 October 2017 (UTC)
    Also to note that a building may have the diocese as an occupant (P466) again would need start and end dates.  — billinghurst sDrewth 22:47, 26 October 2017 (UTC)

    Hoi, "diocese: a district under the pastoral care of a bishop in the Christian Church." This makes whatever you refer to part of an area. The diocese itself may have an office ... Thanks, GerardM (talk) 13:26, 27 October 2017 (UTC)

    Wikidata is becoming a proper citizen of the linked open data web

    Hey everyone,

    Wikidata’s birthday is still a few days away but since there are no deployments on Sundays here is early birthday present number 2 ;-)

    One of Wikidata’s most important but often under-appreciated areas is the external identifiers. They link Wikidata with, by now, more than 2000 databases, knowledge bases, catalogs and more. External identifiers give people and machines access to more information about a given topic and help identify the same concept in other databases.

    The identifier can often be expanded to a full URI. (For example, LoC ID n81114174 becomes http://id.loc.gov/authorities/names/n81114174.) This full URI can then be used in the linked open data web to match our data with other datasets and use both of them together easily.

    From today on, Wikidata has full URIs for statements that represent external identifiers in its RDF exports, and thereby becomes a proper citizen of the linked open data web. To make this work, the property for the external ID needs to have a statement with the property formatter URI for RDF resource (P1921). I’m looking forward to seeing what new things are going to be built with this and how we will show up on http://lod-cloud.net.

    Cheers --Lydia Pintscher (WMDE) (talk) 17:22, 26 October 2017 (UTC)

    You can see the RDF details in RDF Format docs (and for qualifiers/references too). Smalyshev (WMF) (talk) 17:27, 26 October 2017 (UTC)
    This is very interesting, but I'm having trouble wrapping my head around it. Can someone give an example of a query which combines data from wikidata with something from another of these databases using this RDF mapping? --99of9 (talk) 03:02, 27 October 2017 (UTC)
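    As one minimal illustration (a sketch, untested here; see the RDF format docs linked above for the authoritative details): the normalized external-ID values live under the wdtn: prefix, so full URIs can be queried directly instead of bare identifier strings, and then joined against another dataset that uses the same URIs:

```sparql
# Items with a Library of Congress authority ID (P244), returned as
# full id.loc.gov URIs rather than bare identifier strings.
SELECT ?item ?locUri WHERE {
  ?item wdtn:P244 ?locUri .
}
LIMIT 10
```

    Property names and the P244 example are mine, not from the announcement; the general mechanism is what Lydia describes above.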

    SPARQL about relations to an entity, human readable

    How do I get a list containing all entities connected to lemon as a target, human readable as "itemid, itemLabel, propertyLabel"? I only seem to be able to get statements, not entities, and I wasn't able to figure out from an example how to get entities and their labels. (Anyway, is there some terse guide to using the various RDF prefixes like wikidata: and schema:?)

    # things related somehow to lemon!
    SELECT ?item ?itemLabel ?propertyLabel  WHERE {
      ?item ?rel wd:Q1093742.
      ?property wikibase:statementProperty ?rel .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    LIMIT 50
    


    Thanks! --grin 10:22, 27 October 2017 (UTC)

    mw:Wikibase/Indexing/RDF Dump Format (linked from Help menu as RDF Data Model).
    In your example, you could replace statementProperty with directClaim. Matěj Suchánek (talk) 13:08, 27 October 2017 (UTC)
    But the referenced page is unreadably detailed, and not quite tailored to Wikidata+SPARQL examples; the opposite of what I would usually describe as "terse". As for "directClaim", thank you, it works! I am accumulating my failures to create examples out of them. ;-) --grin 14:00, 27 October 2017 (UTC)
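    For reference, Matěj's substitution applied to the query above gives (a sketch, untested here):

```sparql
# Things related somehow to lemon, with human-readable labels
SELECT ?item ?itemLabel ?propertyLabel WHERE {
  ?item ?rel wd:Q1093742 .
  ?property wikibase:directClaim ?rel .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 50
```

    wikibase:directClaim links the property entity to its wdt: predicate, so ?property binds to the entity whose label the label service can then resolve.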

    Suspected vandalism: Q7877165

    Hi! Can someone please take a look at Q7877165, as the recent edits by User:197.200.75.251 seem to be vandalism. Thanks, Daylen (talk) 15:15, 27 October 2017 (UTC)

    • I also felt so and reverted the edits, which is of course open for discussion, if I am wrong. Thanks. --Titodutta (talk) 18:33, 27 October 2017 (UTC)

    pywikibot / warning & errors when adding claim sources

    Hi Project chat,

    I am currently developing a bot, User:Alexabot, based upon python3/pywikibot (latest version). When I did a test run with a small number of items to add, I ran into the following issues:

    • Warning on editing my newly added claim Alexa rank (P1661), namely setting the old claims to "normal" rank and my newly added one to "preferred"

    WARNING: API warning (wbsetclaim) of unknown format: {'messages': [{'name': 'wikibase-conflict-patched', 'type': 'warning', 'parameters': [], 'html': {'*': 'Your edit was patched into the latest version.'}}]}

    This one I consider minor, since it's only a warning. However I would like to know the source of this and how to avoid it.

    • Error on adding sources to my newly added claim

    editconflict: Edit conflict. Could not patch the current revision. [messages:[{'name': 'wikibase-api-editconflict', 'parameters': [], 'html': {'*': 'Edit conflict. Could not patch the current revision.'}}, {'name': 'edit-conflict', 'parameters': [], 'html': {'*': 'Edit conflict.'}}]; help:See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.]

    This one is my major problem. It occurs always, except in two cases, when I try to add the sources (point in time (P585), archive URL (P1065), reference URL (P854)) to the claim I just made.

    Examples of bad cases (successfully adding Alexa rank (P1661) but failing to add the claim sources) are

    Examples of good cases (successfully adding Alexa rank (P1661) as well as the claim sources)

    While researching the source of the error, I came across the information that one cause is someone editing the item at the same time. However, my observation is that it always works with Internet Movie Database (Q37312) and Urban Dictionary (Q310004) and fails with other items. Furthermore, I considered the case that the timespan between adding the claim, setting the rank to preferred and adding the claim sources is too short. When going through my logs I can see the throttling (i.e. Sleeping for 9.1 seconds, 2017-10-18 23:20:57) done by pywikibot (see the log snippet below), so I assume that this isn't the issue either.

    updating alexa rank for Q918 with page https://twitter.com and ranking 13
    creating alexa ranking claim...
    Sleeping for 8.6 seconds, 2017-10-18 23:21:28
    setting all existing claim ranks from preferred to normal
    setting RANK to preferred...
    Sleeping for 9.1 seconds, 2017-10-18 23:21:37
    WARNING: API error editconflict: Edit conflict. Could not patch the current revision.
    error occured updating alexa rank for item:Q918 with offical webpage https://twitter.com
    editconflict: Edit conflict. Could not patch the current revision. [messages:[{'name': 'wikibase-api-editconflict', 'parameters': [], 'html': {'*': 'Edit conflict. Could not patch the current revision.'}}, {'name': 'edit-conflict', 'parameters': [], 'html': {'*': 'Edit conflict.'}}]; help:See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.]
    

    Any suggestions, hints, or tips & tricks to resolve the error and/or the warning described above are welcome and appreciated.

    Thanks in advance.

    --Tozibb (talk) 10:30, 31 October 2017 (UTC)

    Hi. I'm also a Pywikibot user.
    The first (informative) warning occurs when you make changes on a single item and the internals are not properly updated. I assume you use changeRank() to change the rank. When I compare the method with other pieces of code, I can't see how claim.on_item.latest_revision_id is updated when the change is successful, unlike other methods. So this actually looks like a bug in Pywikibot.
    I believe the second issue is basically the same problem but in this case, your edit is blocked because you try to edit the same thing providing the same revision id, ie. not updating the internals.
    I'll see what I can fix in Pywikibot and post here again. Matěj Suchánek (talk) 13:34, 31 October 2017 (UTC)
    Hi Matěj Suchánek,
    thanks for your update as well as the explanation! I do use changeRank() as you suggested some time ago. If I understood you correctly, I might be able to solve this problem by pulling the latest revision of the item each time I make a change/edit to it. I will give it a try and report back here. That said, I think this issue should be solved/handled in the framework/library itself - so thanks for taking care of that.
    --Tozibb (talk) 13:59, 31 October 2017 (UTC)
    Hi Matěj Suchánek,
    I solved the problem by following the approach described above, namely pulling the latest revision of the item via pywikibot.ItemPage(repo, qCode) before making further edits/changes. Neither errors nor warnings are shown in the log - though I only tested 10 items. All of the changed items show property Alexa rank (P1661) along with the claim sources (point in time (P585), archive URL (P1065) and reference URL (P854)). The test items I updated are

    Please let me know if you want me to file a bug/improvement request against Pywikibot, if I should provide sample code to reproduce it, or if I can help you otherwise (I am not a Python pro, but I have a good understanding of software development and the processes associated with it).

    Again, thanks for your help!

    --Tozibb (talk) 14:53, 31 October 2017 (UTC)
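    The refresh-before-edit fix above works because MediaWiki uses optimistic concurrency: a save is rejected when the base revision id it carries is no longer current. A minimal, self-contained model of that mechanism (this is an illustrative sketch, not Pywikibot's actual API; the class and names are made up):

```python
# Simplified model of MediaWiki's optimistic concurrency check, to
# illustrate why stale revision ids cause "editconflict" errors.
class FakeRepo:
    """Stores one page with a revision id that bumps on every save."""
    def __init__(self):
        self.revision = 1
        self.content = {}

    def save(self, base_revision, new_content):
        # The server rejects edits whose base revision is stale.
        if base_revision != self.revision:
            raise RuntimeError("editconflict: Edit conflict.")
        self.content = new_content
        self.revision += 1
        return self.revision

repo = FakeRepo()

# First edit succeeds and bumps the revision to 2.
rev = repo.save(1, {"rank": "preferred"})

# Re-using the old revision id (1) now fails, just like the bot did
# when it changed the rank and then added sources without refreshing.
try:
    repo.save(1, {"rank": "preferred", "source": "P854"})
except RuntimeError as e:
    print(e)  # editconflict: Edit conflict.

# The fix: always edit against the latest revision id.
repo.save(rev, {"rank": "preferred", "source": "P854"})
```

    Re-fetching the item (or having the framework update its cached revision id after each successful write, as discussed above) keeps the client on the latest revision, so the conflict never triggers.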
    Your original code would definitely help in deciding whether there was an error in your code or in Pywikibot (as well as the "fixed" one to show what needed to get fixed). If you are familiar with Wikimedia's Phabricator, you can also file a bug report there.
    I would also like to ask you to only make test edits on testing items (like Wikidata Sandbox (Q4115189)) or testing repository, so that item histories and recent changes don't get spammed. Thank you. Matěj Suchánek (talk) 15:04, 31 October 2017 (UTC)
    Hi Matěj Suchánek,
    I created task T179409 providing error details, sample code and mentioned your findings as well. Let me know if you need more details/information on this. I will delete this post in a day or so. Regards, Tozibb (talk) 14:14, 2 November 2017 (UTC)
    Let's just have it archived. Matěj Suchánek (talk) 16:15, 2 November 2017 (UTC)
    This section was archived on a request by: Matěj Suchánek (talk) 16:15, 2 November 2017 (UTC)

    Proposal on citation overkill

    I propose that the spirit of the essay on the English Wikipedia "Citation overkill" should apply to Wikidata. If a property already has one or more high-quality online sources that actually verify the claim, it is not good practice to indiscriminately add additional sources. Furthermore, when a high-quality online source is found, citations to unreliable sources such as Wikipedia should be removed.

    Citations to paper sources should not be removed unless the editor removing them can actually read the source and determine either that it doesn't support the claim, or that it is a less reliable source than the other sources that are also cited.

    The property described by source (P1343) is available if it's felt there is a need to list every source that covers the item.

    This issue came up at User talk:Magnus Manske#Turn off your bot. I did not start that thread, but did express the view that it isn't appropriate for a bot to add references to a property without considering what references are already present. Jc3s5h (talk) 12:12, 25 October 2017 (UTC)

    That thread was started for a different topic (bad date imports because of upstream errors), and then "hijacked" by you for this discussion (after a lengthy, fruitless chat about calendar models below). My thoughts on this:
    • Wikidata is not Wikipedia. The deliberately built reference mechanisms in Wikidata allow for, and encourage, a large number of references
    • As long as there is no "ranking" of "high quality sources", this discussion is rather academic: do we rank VIAF, GND, or EB higher? What about MusicBrainz?
    • For the most part, the sources for the references my bot adds have their own Wikidata property. That would qualify them as "high quality" by default, for Wikidata, in this context
    • I agree that "imported from Wikipedia" should be removed once there are actual, citable sources. But that's a different discussion that doesn't need to be mashed into this one
    As you know, I am happy to make reasonable adjustments to the bot, so let's discuss! --Magnus Manske (talk) 12:46, 25 October 2017 (UTC)
    I fully support Magnus’ position. A further point to consider is that “citation overkill” is said to complicate reading and editing Wikipedia articles. However, Wikidata is not read in the Wikipedia sense, and also editing works differently (both via machines, which do not struggle with lots of citations). So the main aspect of the Wikipedia “citation overkill” essay does not apply here… —MisterSynergy (talk) 13:13, 25 October 2017 (UTC)
    @Jc3s5h: The problem with your proposition is "what is a high quality source"? Just starting to answer that question can lead to endless discussions in some topics. The real solution is to constrain large data imports by requiring an agreement before importation. The problem is not coming from the number of sources added; the problem is coming from the fact that a lot of people are working in a solitary way and just import data without any preliminary discussion with a WikiProject. Snipre (talk) 14:29, 25 October 2017 (UTC)
    Exactly, there is no "citation overkill" in Wikidata. Syced (talk) 04:08, 26 October 2017 (UTC)
    In cases where some sources say X and other sources say Y it's often quite useful to have multiple citations for each value. High-quality sources can be wrong, and when someone tries to figure out what's actually right it's beneficial to have multiple citations. ChristianKl (talk) 14:46, 25 October 2017 (UTC)
    Sometimes the relative quality of sources is not obvious, but sometimes it is. If there's any doubt, and they both say the same thing, by all means keep both.
    If a source that is universally esteemed and widely consulted clearly has the wrong information, it would be useful to keep the wrong value and the erroneous source, and give the property a rank of deprecated. Jc3s5h (talk) 15:05, 25 October 2017 (UTC)
    You don't know beforehand which sources will turn out to be wrong. There's a moment when multiple values get entered and there's a time to reconcile the different statements. When that moment comes it's useful to have multiple sources at which you can look to make the decision which values you want to deprecate. ChristianKl (talk) 15:42, 25 October 2017 (UTC)
    I have a fair idea which sources I regard as reliable, and on the whole I would not regard other databases as high quality sources (there are databases which are pretty good, but also plenty which are not). - Brya (talk) 16:49, 25 October 2017 (UTC)
    I see no problem with a handful of citations on a single datapoint. Overuse of described by source (P1343) - which says "there is some information on this topic in an external source" but without specifying what it actually sources - seems much less useful. Andrew Gray (talk) 18:24, 25 October 2017 (UTC)
    I second the view that automatically adding sources just because they are there isn't helpful, however well-considered it might be. The bot has also added sources that were already there, like here, and added dates where the original source states them with "circa", which the bot did not respect. --Marsupium (talk) 23:17, 25 October 2017 (UTC)
    That is indeed a case where, essentially, the same citation is duplicated, and the others should be removed (I will look into that). However, this is a special case, a technical artefact, not "too many sources" as was the original point of this thread.
    As for the "ca.", I work off the description I got for Mix'n'match, which might have come from an older/abbreviated version (see "catalog description" here). I have taken great care to avoid "ca." and the like in the descriptions I have, but I may have faulty ones in some cases. If there are systematic problems with a specific site (e.g., RKD artists) in my dataset, I can deactivate it for bot use. However, again, a specific fault with this specific bot operating on a specific dataset is not the topic of this thread, which is much larger. --Magnus Manske (talk) 07:44, 26 October 2017 (UTC)
    I doubt it is only about "a specific fault … on a specific dataset". The task you are trying to tackle is giant and has flaws in many respects. I do not think that it makes sense to ask: "Do we want to import all dates of birth and death collected for Mix'n'match to Wikidata?" Indeed this depends on the specific source, its quality, and on the specific problems which arise when trying to scrape it in an automated way. That can't be separated from the question about the number of sources for a statement. Most will agree, I guess, that we do not want the myriads of newspaper articles and school books that mention the end date of the Second World War to source the corresponding statement. We need a discussion on where to draw the line. For this case, an "opt-in" is just as possible as an "opt-out". BTW: I think there wasn't a bot request, but some kind of announcement when the bot started; thanks in advance to anyone who can provide a link! --Marsupium (talk) 14:56, 28 October 2017 (UTC)

    Military branch

    Should we be using the smallest subdivision like Signal Corps (Q736213), or even smaller if we have them, like we do for places of death? We add in the hospital if we know it. Or only the largest part, like United States Army (Q9212)? I will add instructions like we have for place of death. Place of death (P20) "the most specific known (e.g. city instead of country, or hospital instead of city)". --Richard Arthur Norton (1958- ) (talk) 17:57, 26 October 2017 (UTC)

    The guidance on military branch (P241) seems to suggest the top-level branch (US Army, Danish Navy, etc); for a specific unit, member of (P463) is best. We still need to do a general overhaul of military properties, though... I'll try and set aside some time soon to work on this. Andrew Gray (talk) 22:30, 26 October 2017 (UTC)
    A related matter was discussed on 11 October 2017 here Wikidata:Project chat/Archive/2017/10#Any showcase items or examples for military figures Breg Pmt (talk) 19:13, 27 October 2017 (UTC)

    Stated in ... website?

    When I format a reference as <stated in> Recreation.gov (Q28649055), I get a constraint violation "Recreation.gov is not a subclass of information". Should I just use <reference URL> for these, or should websites be subclasses of information somehow? - PKM (talk) 18:19, 27 October 2017 (UTC)

    I think websites should classify as information. I therefore added information (Q11028) to intellectual work (Q15621286), which is further up the chain. ChristianKl (talk) 19:24, 27 October 2017 (UTC)
    Excellent, thank you. - PKM (talk) 02:29, 28 October 2017 (UTC)
    Please use reference URL as well as the stated in property, so it's clear which page in the source has been used. Thanks. Mike Peel (talk) 12:36, 28 October 2017 (UTC)

    Q and P input lookahead has become case sensitive—can someone else confirm?

    (Up to some hours ago) When entering a property or its corresponding item by typing the underlying identifier, that is, typing "p31", "q3331189" or "q1860", the code lookup would be case-insensitive. Overnight—presumably during a code update—such lookups have become case-sensitive, meaning that it needs to be "P31", etc. Could others please confirm that behaviour and I will poke in a Phabricator ticket.  — billinghurst sDrewth 21:55, 25 October 2017 (UTC)

    @Smalyshev (WMF): I am guessing that my report is related to your change over to the new search function.  — billinghurst sDrewth 21:57, 25 October 2017 (UTC)
    Indeed, looks like ID match is case-sensitive, even though label match is not. This is not right, I'll fix that. Smalyshev (WMF) (talk) 22:02, 25 October 2017 (UTC)
      Done this is now fixed. Smalyshev (WMF) (talk) 17:28, 26 October 2017 (UTC)
    Only working for items, not properties. Type "p31" and match not found, type "P31" and "instance of" appears.  — billinghurst sDrewth 23:07, 26 October 2017 (UTC)
    @billinghurst: please re-check, I think that was the consequence of the rollbacks due to unrelated performance issues. Should be ok now. If still isn't, please tell me. Smalyshev (WMF) (talk) 23:34, 1 November 2017 (UTC)
    (-: resolved for me in Q and in P. Thanks for your work.  — billinghurst sDrewth 00:42, 2 November 2017 (UTC)
    This section was archived on a request by: Matěj Suchánek (talk) 16:52, 3 November 2017 (UTC)

    half (Q39373172)

    "one of two equal parts". What is the name of the antipode? --Fractaler (talk) 09:06, 20 October 2017 (UTC)

    What should an antipode of "half" be? --Anvilaquarius (talk) 07:51, 22 October 2017 (UTC)
    "half" is "one of two equal parts"? --Fractaler (talk) 18:17, 22 October 2017 (UTC)
    antonym (Q131779) (antipode) for "one of two equal parts" is "one of two unequal parts" --Fractaler (talk) 17:08, 29 October 2017 (UTC)

    Q5705

    Any idea what we should do with this protected edit request? I am sure qualifiers are needed which would express the POV of the Spanish government vs Catalonian government, but I am not sure what exactly should be there.--Ymblanter (talk) 12:02, 28 October 2017 (UTC)

    We have two worldviews here:
    1. Spanish government view (damn rebels!)
    2. Government of Catalonia (we're a new country now!)
    Assuming we want to incorporate both views, we need to add the statements as viewed by both camps and use statement disputed by (P1310) to indicate the other camp is opposing it. But before we do that, maybe just wait a couple of days to see how it all evolves? Rushing anything will not be very good for quality and will turn the item into some sort of political battleground. Multichill (talk) 13:31, 28 October 2017 (UTC)
    I agree, though I specifically meant the question whether Carles Puigdemont is still the president: From the POV of Spain, he has been dismissed (see references to the Constitution), from the POV of Catalonia he is in the office.--Ymblanter (talk) 13:53, 28 October 2017 (UTC)
    Simply wait a little bit. Another recent example is Crimean Peninsula (Q7835). --Succu (talk) 22:00, 28 October 2017 (UTC)
    Just interesting, how long to wait? >3 years has gone for Q7835, for example. --Infovarius (talk) 09:25, 29 October 2017 (UTC)
    Crimean Peninsula (Q7835) isn't protected, so you don't have to wait to add statement disputed by (P1310) wherever you like and can source it. ChristianKl (talk) 09:57, 29 October 2017 (UTC)

    Treaty

    Regarding treaties. A treaty is signed on a point in time (P585); it then gets a publication date (P577) on the same date. But it has not yet entered into force (start time (P580)). Should point in time (P585) be used at all? Ref International Convention on Tonnage Measurement of Ships (Q1979787) Breg Pmt (talk) 16:13, 28 October 2017 (UTC)

    As far as I understand, many treaties are signed by different treaty parties on different days, and a treaty text is often published publicly before it's signed by all treaty parties. ChristianKl (talk) 20:49, 28 October 2017 (UTC)
    What treaty? This treaty (Q131569) one? --Succu (talk) 21:35, 28 October 2017 (UTC)

    @Succu: Specifically, it is International Convention on Load Lines (Q1473305), which is instance of (P31) treaty (Q131569) and IMO Code (Q30143872). @ChristianKl: this treaty is constructed such that it is signed after a meeting held in the member organisation on a point in time (P585), and is then likely to be public with a publication date (P577) on the same date, since it is signed. But since the treaty only enters into force when two thirds of the signatories have ratified it, it will then become valid at start time (P580). For an individual country such as Norway, who is participant (P710), I will use start time (P580) as a qualifier. I am asking as I will continue to work with several other IMO Code (Q30143872) items. Breg Pmt (talk) 22:37, 28 October 2017 (UTC)

    For the date that a treaty or a law enters into force, I think the best solution is to add significant event (P793) --> coming into force (Q490812) with point in time (P585) as a qualifier. --Fralambert (talk) 23:24, 28 October 2017 (UTC)

    @Fralambert: would start time (P580) be better to use than point in time (P585), as a span of time is involved? The preceding Load Line Convention of 1930 was in force start time (P580) 1930 end time (P582) 1966. Breg Pmt (talk) 23:56, 28 October 2017 (UTC)

    @Pmt: Yes, start time (P580) is probably better than point in time (P585). --Fralambert (talk) 01:39, 29 October 2017 (UTC)

    Entering date only known with lunar calendar

    So, I want to put in dates only known in a lunisolar calendar (Q194235), which is fairly frequent in Korea, especially for people from Joseon (Q28179) or earlier. But it seems the native date format doesn't have support for lunar calendars - how do I put in such data? Via a qualifier and a property? Then which property? — regards, Revi 18:11, 29 October 2017 (UTC)

    if the date can be translated to a Gregorian date (at least roughly) you could put that in as the date entry and then maybe use stated as (P1932) to add the string form of the date as it was actually originally recorded? We also have refine date (P4241) that was intended to help with a similar problem but I don't think it helps in this case. ArthurPSmith (talk) 20:42, 29 October 2017 (UTC)
    Not sure stated as (P1932) would be helpful: that property has instance of (P31) Wikidata property for items about works (Q18618644) and facet of (P1269) work (Q386724), so it is not really intended for dates, imo. I found things like 22nd day of the 3rd month in the Chinese calendar (Q839162), which might be useful with refine date (P4241), but I am not sure that is the right property either. — regards, Revi 06:09, 30 October 2017 (UTC)

    George Patton as Cavalryman

    Can I express George S. Patton (Q186492) as cavalryman by using field of work (P101) -> cavalry (Q47315), as military branch (P241) seems to be used only for national armies, navies and air forces Breg Pmt (talk) 19:39, 29 October 2017 (UTC)

    VIAF duplicates

    Is there any way to separate out the two types of VIAF conflicts we have into two different reports? At Wikidata:Database reports/Constraint violations/P214 we have:

    1. Ones we can correct, where we have two entities assigned the same VIAF number; one is usually an import error, or if both are correct they need to be evaluated for merger. It is a great way to look for duplicate entries, like the doppelganger tool that looked for people that were born and died on the same day.
    2. Ones only the VIAF people can correct, there are two or more VIAF numbers assigned to the same entity and the VIAF people need to merge them. We need to let them know that these need to be double checked to see if a merger on their part is appropriate. --Richard Arthur Norton (1958- ) (talk) 04:53, 24 October 2017 (UTC)
     ? they are in 2 separate sections of the page - you can point the VIAF people at the relevant portion via the direct link - here - or you could pull out that part of the page into a document to send to them I guess? ArthurPSmith (talk) 13:36, 24 October 2017 (UTC)
    I agree there should be a way to flag the cases that we cannot fix on the items themselves, so that we could generate a clean list of ids we believe should be merged in the source… Otherwise, the full list might contain items which have other issues that we can fix ourselves. Of course if we can solve all these problems then the constraint violation report will contain a clean list. But in general the list is too long for a human editor to do all this by themselves, and we don't want many editors to stumble upon the same cases that are waiting for a fix upstream…
    I'm not sure how they should be flagged though (ranks? qualifiers?) − Pintoch (talk) 15:19, 24 October 2017 (UTC)
    In the first case (2 items with the same VIAF id) there are also cases (that only the VIAF people can correct) of mixed-up VIAF clusters, where info about 2 different people was mixed up. Those are very problematic, and lead to repeated imports of wrong other ids that need to be cleaned up. I notified VIAF directly of a bunch of those 2 weeks ago, going down an error report by Magnus Manske, but a better solution for notifying them of flagrant cluster errors would be a good thing. --Hsarrazin (talk) 10:14, 30 October 2017 (UTC)
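    For case 1 (two items sharing one VIAF id), candidate duplicate pairs can also be pulled straight from the query service, independently of the constraint report. A sketch (not a definitive report query; large result sets may need a smaller LIMIT or pagination):

```sparql
# Pairs of distinct items that carry the same VIAF ID (P214).
SELECT ?viaf ?item1 ?item2 WHERE {
  ?item1 wdt:P214 ?viaf .
  ?item2 wdt:P214 ?viaf .
  # Keep each unordered pair only once.
  FILTER(STR(?item1) < STR(?item2))
}
LIMIT 100
```

    Each resulting pair is either a Wikidata merge candidate (case 1) or, if the items are genuinely different people, a cluster error to report upstream.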

    Barnstars

    Hello
    I was checking this page Wikidata:Barnstars, is it the central page of our barnstars? If so, perhaps we can do some work to improve the page, I would love to work here. Ideas and comments please? Many Template:Barnstar wand --thanks. --Titodutta (talk) 18:30, 27 October 2017 (UTC)

    Yes, I think this is the central page. I do not think I have ever seen anybody receive a barnstar, though.--Ymblanter (talk) 10:54, 28 October 2017 (UTC)
    Why don't we simply import the "love" icon like enwiki? You know, that heart shape? In the end I used it enough on enwiki and also on Commons; it is not bad. I usually thank newbies for a bunch of edits, but giving them a barnstar... why not?--Alexmar983 (talk) 11:13, 28 October 2017 (UTC)
    I do not think we have any community decisions concerning the barnstars, and you are perfectly welcome to import here whatever you want and use it (the files are on Commons anyway).--Ymblanter (talk) 12:50, 28 October 2017 (UTC)
    I believe Alexmar983 was referring to enabling the WikiLove extension on Wikidata. Jon Harald Søby (talk) 13:56, 28 October 2017 (UTC)
    yes, that's its name. I didn't see the ping BTW.--Alexmar983 (talk) 09:18, 30 October 2017 (UTC)
    • Ymblanter and all, should we go ahead and improve the portal? Once the portal is ready, it would be easier to install extensions such as WikiLove or work on necessary arrangements. Regards. --Titodutta (talk) 15:10, 28 October 2017 (UTC)
    we definitely need a magic wand of ontology barnstar. Template:Barnstar wand -- Slowking4 (talk) 00:35, 30 October 2017 (UTC)
    I added documentation to Wikidata:Barnstars. Runner1928 (talk) 15:54, 30 October 2017 (UTC)

    People with incorrect BNA data

    Hello, I found two errors that I am unable to fix:

    How can we fix this and tell BNA about these errors? Maybe there are more of these? Thanks, --Gnom (talk) 17:42, 29 October 2017 (UTC)

    The external database seems fine, that was a bad import on our side. Just remove the claims and it is fixed for these two items. @Magnus Manske might want to have a look at them as well to make sure that it does not happen again. —MisterSynergy (talk) 18:08, 29 October 2017 (UTC)
    Are there any other items that are affected by this error? --Gnom (talk) 18:12, 29 October 2017 (UTC)
    Yes, it seems: Wikidata:Database reports/items with P569=P570. Matěj Suchánek (talk) 07:50, 30 October 2017 (UTC)

    Accessing the Wikidata games by focusing on all games that deal with a specific item

    After WikidataCon I was thinking about the fact that WikiFactMine now wants to start with a Distributed Game instead of using the Primary Source Tool, because the Distributed Game provides more opportunities to show the user the justification for adding a new statement. While I see the advantages of that approach, one of the great things about the Primary Source Tool is that a person who's interested in item QXXXXXXX can go to QXXXXXXX and see all statements that the Primary Source Tool has for the given item. A scientist who cares about a given protein might not be interested in going through random items with the Distributed WikiFactMine Game, but would be interested in going through the potential statements for QXXXXXXX.

    It would be great if the Distributed Game had a capability that allows browsing all games that contain potential statements for a given item. Furthermore, it would be nice to have a gadget that can automatically show a user who browses a given item when there are potential statements available for that item. ChristianKl (talk) 18:46, 29 October 2017 (UTC)

    If there was an API, this would be simple. The problem is, the Game only serves as a wrapper around many other APIs. So you need to query for all of them which won't be very effective. Matěj Suchánek (talk) 07:48, 30 October 2017 (UTC)

    Death at sea

    Is there a way we can track all the people that died at sea, i.e. people who died aboard a ship? --Richard Arthur Norton (1958- ) (talk) 19:28, 29 October 2017 (UTC)

    I propose place of death (P20) Atlantic Ocean (Q97), of (P642) Vrouw Maria (Q54887), but there should be a better way.--Bigbossfarin (talk) 20:03, 29 October 2017 (UTC)
    That is the way I have been handling it too, but it would be nice if we could have one umbrella term or property for us to be able to tally how many people die this way, without searching every ocean as place of death. Any suggestions?
    I would always have the qualifier stated as (P1932) with the free text "at sea" for such items. Also noting that we are not always lucky enough to be able to name the ocean, so there the option is to use the "unknown value" component.  — billinghurst sDrewth 04:12, 30 October 2017 (UTC)
    Related: Wikidata:Project chat/Archive/2015/06#place of death (P20) - at sea. Matěj Suchánek (talk) 07:45, 30 October 2017 (UTC)
    This may relate to my question some time back about how to record "lost at sea" (meaning presumed dead). We never came to a good conclusion on that. - PKM (talk) 17:48, 30 October 2017 (UTC)
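    With the current modelling, a rough tally is already possible by querying for people whose place of death is classified under an ocean or sea. A sketch (it will of course miss items where only a ship or an "unknown value" was recorded as place of death):

```sparql
# Humans whose place of death falls under ocean (Q9430) or sea (Q165).
SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {
  VALUES ?class { wd:Q9430 wd:Q165 }
  ?person wdt:P31 wd:Q5 ;
          wdt:P20 ?place .
  ?place wdt:P31/wdt:P279* ?class .
}
```

    An umbrella item or a dedicated qualifier convention would make such tallies more reliable than walking the class tree of every body of water.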

    Doing a simple query and extract

    I'd like to do a query which I suspect is very simple, but despite trying the intro to the Wikidata Query Service I don't quite see how to do it.

    I'd like to do an extract of the latitude and longitude of all members of this category en:Category:State parks of Connecticut


    Can someone help me do this? (I'd be happy to do it interactively in IRC)--Sphilbrick (talk) 20:35, 29 October 2017 (UTC)

    #defaultView:Map
    SELECT ?item ?title ?c WHERE {
      SERVICE wikibase:mwapi {
        bd:serviceParam wikibase:api "Generator" .
        bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
        bd:serviceParam mwapi:gcmtitle "Category:State parks of Connecticut" .
        bd:serviceParam mwapi:generator "categorymembers" .
        bd:serviceParam mwapi:gcmprop "ids|title|type" .
        bd:serviceParam mwapi:gcmlimit "max" .
        ?item wikibase:apiOutputItem mwapi:item .
        ?title wikibase:apiOutput mwapi:title .
      }
      FILTER EXISTS { ?item wdt:P625 [] }
      ?item wdt:P625 ?c
    }
    Try it!
    For future similar requests, you can ask them on Wikidata:Request a query --Pasleim (talk) 23:32, 29 October 2017 (UTC)
    I also did a PetScan query: https://petscan.wmflabs.org/?psid=1366635 ;-p Slowking4 (talk) 00:25, 30 October 2017 (UTC)
    Thanks, that's the type of output I was expecting. Now I have to decide whether to get lat long and use location maps, or try to use OSM.--Sphilbrick (talk) 13:23, 30 October 2017 (UTC)

    WikidataCon & Wikidata birthday

    Hello all,

    As you know, this weekend we celebrated Wikidata's fifth birthday, during the WikidataCon in Berlin, but also in Tokyo, Seoul and Munich.

    We collected a lot of amazing presents and stories, that you can find on the birthday page.

    During the WikidataCon, 200 members of the community met, shared their knowledge, experiences and favorite tools, and discussed a lot of topics. We're currently trying to document the outcomes of the conference as well as possible. You can already find some content on:

    We are also very happy to announce that the WikidataCon will be back in October 2019 in Berlin, to host even more attendees and more exciting content.

    In 2018, we encourage all the local communities to organize their own events around the sixth birthday, in order to build a Wikidata birthday around the world. Meetups or trainings, conferences or parties, big or small, all the ideas are welcome to celebrate Wikidata with your local community and engage new editors. More information will come in the next months, but you can already mention your projects on this page.

    Cheers, Lea Lacroix (WMDE) (talk) 12:38, 30 October 2017 (UTC)

    Wikidata Concepts Monitor

    Hey folks :)

    As you might already have seen in the birthday presents list there is another birthday present: the Wikidata Concepts Monitor (WDCM - http://wdcm.wmflabs.org). It is a tool that enables you to browse and build an understanding of the way Wikidata is used across the Wikimedia projects.

    Here’s the technical gist behind it: Currently 789 projects have client-side Wikidata usage tracking enabled, which allowed us to build a system that counts the number of pages using a particular Wikidata item per project. The count data were subjected to statistical modeling (1) by an unsupervised statistical learning algorithm (2) that is typically used in distributional semantics (3) to discover the most natural groupings of Wikidata items in 14 semantic categories (4) with respect to the way they are used across the Wikimedia universe by the respective communities.

    We hope for the WDCM system to become a tool that helps you discover. Beyond Wikidata’s syntax and semantics we are now beginning to learn about its pragmatics: the way Wikidata items cluster with respect to how they are used is not necessarily the same as the way they go together in the Wikidata formal ontology. WDCM is the first step towards building an understanding of the highly complicated structure of Wikidata usage. This system can help you discover which Wikidata client projects are similar and in what respect, which semantic categories of items are used more or less frequently across 789 projects, how items connect with respect to how similarly they are used by our communities, what the most popular items per project are, and many more (hopefully) interesting things.

    Check out the WDCM and don’t forget to let us know what you think on the WDCM Wikidata project discussion page! I'd love to hear about any cool or interesting things you find in the visualizations.

    Thanks to Goran who put in a lot of time to get this up and running and everyone who helped him.

    Cheers --Lydia Pintscher (WMDE) (talk) 17:12, 30 October 2017 (UTC)

    Wikidata prefix search is now Elastic

    Wikidata’s birthday is still a few days away but since there are no deployments on Sundays we’ll get started with an early present ;-)

    Wikidata and Search Platform teams are happy to announce that Wikidata prefix search (aka the wbsearchentities API, aka the thing you use when you type into that box on the top right or any time you edit an item or property and use the selector widget) is now using a new and improved Elasticsearch backend. You should not see any changes except for relevancy and ranking improvements.

    Specifically improved are:

    • better language support (matches along the fallback chain and can also match in any language, with a lower score)
    • flexibility - we can now use Elasticsearch rescore profiles, which can be tuned to take advantage of any fields we index for both matching and boosting, including link counts, statement counts, label counts, (some) statement values, etc. More improvements are coming soon in this area, e.g. scoring disambig pages lower, scoring units higher in the proper context, etc.
    • optimization - we do not need to store all search data in both DB tables and Elastic indexes anymore; all the data needed for search and retrieval of the results is stored in the Elastic index and retrieved in a single query.
    • maintainability - since it is now part of the general Wikimedia search ecosystem, it can be maintained together with the rest of the search mechanisms, using the same infrastructure, monitoring, etc.

    Please tell us if you have any suggestions, comments or experience any problems with it.

    Smalyshev (WMF) (talk) 20:57, 25 October 2017 (UTC)

    @Smalyshev (WMF): Any idea why the character য় (U+09DF) or any other character whose decomposition contains nukta (U+09BC) returns nothing even if a label begins with it? I recall a demo you had set up of ElasticSearch that did not have this problem. Mahir256 (talk) 22:16, 25 October 2017 (UTC)
    No idea, but I'll look into it. Smalyshev (WMF) (talk) 22:27, 25 October 2017 (UTC)
    I still just don't get how the search results can be so bad. Search for "theatre": neither Q24354 nor Q11635 are on the first page. Search for "theater": nearly the same here. Search for "United States": Q30 not on the first page. Search for "Germany": the relevant item is only #11 in the results. --Anvilaquarius (talk) 07:01, 26 October 2017 (UTC)
    The examples you gave all look rather good to me. What language do you have your interface set to? --Lydia Pintscher (WMDE) (talk) 10:30, 26 October 2017 (UTC)
    "en - English". --Anvilaquarius (talk) 12:20, 26 October 2017 (UTC)
    Same results for me with the interface language set to "en - English". I would add that there is a disparity between the search suggestions that appear when you type into the search box, and the results you get if you click on the "containing..." link or hit Enter. If you search for "theatre", the suggestions are indeed good: Q11635 and Q24354 are the top suggestions. But in the search results, these items are not among the top results. Korg (talk) 23:33, 26 October 2017 (UTC)
    It has long been the case with stem searching at WD, in that it gives different results in the lookahead and the full body search. To me it is just different, not wrong. Full search is presumably contextual on other factors. Type "missionary" and see what you get, same with "newton", "shakespeare". In fact type "shakesp" and hit search for nada results, and compare that with the same search at Google.  — billinghurst sDrewth 00:05, 27 October 2017 (UTC)
    Thanks, Korg, for the information. I admit I never use the search box itself, but have a browser shortcut for its results, so this never occurred to me. I will have to use the box more often to get better results. Kind of silly, though. --Anvilaquarius (talk) 07:31, 27 October 2017 (UTC)
    Also, I think that free text in properties like P969 (street address) and the labels of properties like P131 must be included somehow in the results. They aren't now, I think. --Anvilaquarius (talk) 07:05, 26 October 2017 (UTC)
    You mean you want to find properties in the top-right search box too? --Lydia Pintscher (WMDE) (talk) 10:30, 26 October 2017 (UTC)
    No, I mean that stored text data like that in the P969 field must somehow be findable other than with a complicated SPARQL query. There is no way to easily search for it if the address is not also included in the labels or descriptions. --Anvilaquarius (talk) 12:24, 26 October 2017 (UTC)
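    For reference, a query for P969 values on the Wikidata Query Service looks roughly like this (a sketch; the search string is a placeholder, and an unconstrained scan like this is likely to be slow on the public endpoint):

    ```sparql
    # Items with a street address (P969) containing a given string
    # ("unter den linden" is just a placeholder)
    SELECT ?item ?address WHERE {
      ?item wdt:P969 ?address .
      FILTER(CONTAINS(LCASE(?address), "unter den linden"))
    }
    LIMIT 50
    ```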
    We could find a qualifier by including the URL. This no longer works.. Could this really useful function be resurrected? Thanks, GerardM (talk) 09:53, 26 October 2017 (UTC)
    Yes. Thiemo is looking into that right now. --Lydia Pintscher (WMDE) (talk) 10:30, 26 October 2017 (UTC)
    It seems Stas needs to have a look. I filed phabricator:T179061. --Lydia Pintscher (WMDE) (talk) 10:34, 26 October 2017 (UTC)
    •   Comment with the previous lookahead functionality when manually editing a field, where I had a copy/paste of text that included spaces or a tab, the previous 'search' functionality used to be able to ignore the spaces/tab and chomp the text for its lookup. That functionality has disappeared; I have added a phabricator ticket, though it will need to be properly curated.  — billinghurst sDrewth 00:24, 27 October 2017 (UTC)
    • I have definitely seen ranking improvements with the selector widget - when I type <stated in> = "TGN", Thesaurus of Geographic Names appears first instead of fourth, and it's always what I want. So thanks! - PKM (talk) 18:14, 27 October 2017 (UTC)
    • Both "(Q1073)" and "https://www.wikidata.org/wiki/Q1073" used to find brain (Q1073) but no longer do after this update. Especially the second is important to my workflow, given that I frequently copy-paste properties and "rightclick/copy address of link" gives the full link. ChristianKl (talk) 11:30, 30 October 2017 (UTC)
    +1 - I've run into both of these with the recent change. It was really nice before that it seemed to notice a Qxxxxx in the middle of any string, that seems to have been lost. ArthurPSmith (talk) 20:43, 30 October 2017 (UTC)

    Strength of an ID

    I often see that some items are created based on a single ID in a database. This can lead to massive creation, and I think I have spotted different examples here and there.

    Although IDs are carefully discussed, where is it discussed which types of IDs can justify creating an item that has no other structural notability?

    Let's consider the case I have been studying for months: scientists. I have dozens of new possible IDs to suggest, so besides their consistency I am also trying to understand their impact. It is not even the first time I raise the question; one month ago I asked about limits on publications.

    So I share what I have learned, but I am quite sure that similar examples apply, for example, to musicians or actors. I wish we had a project for bibliometrics, but I have asked around with no results.

    For example, Twitter username (P2002) is not the same type as ResearchGate profile ID (P2038), but that one is generic and can still be manually created, and does not compare to, for example, ORCID iD (P496), which is specifically academic but, if I remember correctly, can still be created by the researchers themselves, and is therefore not the same as VIAF ID (P214), which is totally third-party. Right?

    But where is such a de facto difference encoded? How, once you approve an ID, do you also decide whether that ID is suitable for mass creation, or simply a justification for items that already carry it? The entire bibliographic sector makes no sense, for example, if you don't have items even for people with one publication. You might not create them immediately, but you will have to one day.

    This is the same for many database aspects: there are things you need, in theory, to aim at some completeness. So what defines an ID worthy of giving birth to an item per se in a certain field, and what does not?

    I think that the ID should be a better driving force than some internal scenario. For example, we have bots creating items for authors of publications with no ID at all. They might be important authors, but maybe there is no big point in creating items for researchers based on the presence of their name strings in publications uploaded here (there might also be some biomedical tendency, as far as I can see) rather than in using a good ID from a specific archive, which is maybe more "third-party".

    Shouldn't we have a scale that discriminates "first-rate" IDs from "supporting" IDs? Shouldn't this be an aspect discussed during the ID creation? We only have the definition of an external identifier for people, but the fact that something is external is just one factor to consider.--Alexmar983 (talk) 03:30, 27 October 2017 (UTC)

    • @Alexmar983: The main place such things are discussed before the import process is the bot approval page - Wikidata:Requests for permissions/Bot. It might be helpful for more people to watch that page and contribute to discussions there if this is becoming a major concern. While a bot is importing things, if there seems to be a problem, it should be raised with the person operating the bot and perhaps brought to a wider group if that doesn't fix things. Mix n Match lists many databases with identifiers that people are individually reviewing for suitability to add to Wikidata, so that's another path by which something may be created with just one identifier, but it's human-curated and I think has worked pretty well. I don't believe anybody is proposing to import the entire ORCID or VIAF database into Wikidata at this point, so I'm not sure there really exists any identifier with millions of items that anybody has proposed is completely suitable for Wikidata with no questions asked. I may have missed something though! ArthurPSmith (talk) 09:49, 27 October 2017 (UTC)
    That is not my core concern, actually; the core concern is what makes the strength of an ID. That aspect is not the same as creating all items with such IDs directly, which is a step further. On many platforms we have bots, but we don't discuss the policies implemented by bots on the bot pages; we discuss them on the policy pages.
    To me it sounds like we are missing a functional step here. It looks like we are implicitly delegating to whoever does or does not speak up there before massive work is done, which is a weak base for the development of a big archive. This way you lose sight of whether an ID gives dignity to an item per se, which should be a clearly focused, standardized and archived discussion, substituting it with something that sounds like "some people there said it was ok to go all the way down".
    If this is the workflow, then it does not have solid premises for functional growth; it is easier for it to become unbalanced and flawed than in other scenarios. IDs should, IMHO, be standardized by their quality, clearly discriminating between those that support an item alone and those that don't, because that develops your global concept, role and strategy for IDs, which is actually more appropriate to do without the pressure of the number of items involved. The moment you decide to use a bot to import them all, that should be a different step.
    You can, for example, agree that millions of items should be uploaded within a general framework (interactions with OpenStreetMap, for example) but then agree that they won't be uploaded at a fast pace by bots. Instead, if one database has 99% "reliable" entries for 100000 possible items and another has 90% "reliable" entries for only 1000 items, it might currently be easier to approve the second and create them quickly; but all of those various mis-entries combined from small sources over the years are much more toxic to the definition of a reliable and coherent work strategy than a clearly defined 1% of a slowly growing big number, which probably remains isolated and can be identified and removed.---Alexmar983 (talk) 10:53, 27 October 2017 (UTC)
    It sounds like you are thinking of proposing a new property to provide some sort of rating of external id properties? I think we've had some suggestions along those lines before but I'm not aware of a full proposal, so that might be what to do now. As to how this "workflow" works in practice, we've got a currently example right along these lines with the Handelsregister proposal. ArthurPSmith (talk) 13:32, 27 October 2017 (UTC)
    More than a proposal, it's my analytical side that sees three steps: 1) confirm that an ID is sound and can be created; 2) evaluate the quality of the ID, i.e. whether it is worth an item just by its presence (WLM Italy? Dutch photographers database? We have many examples); 3) the bot phase: add all items with such IDs if relevant, or add the IDs to existing items if not, based on external and internal factors. Now, maybe there are historical reasons why "2" and "3" have been merged into one step de facto, but there are also reasons why "2" could be merged, for example, with "1". More generally, "2" is a step on its own and it has to be clearly stated somewhere; it should not be implicit. Then if this emerges as a rating, fine; if it is a simple help page to be enlarged later, ok... but we should be aware that there are different levels of discussion and should have a clear workflow order now that the project is "mature". Right?--Alexmar983 (talk) 14:16, 27 October 2017 (UTC)

    The problem is that you see things in isolation. A person can be associated with multiple identifiers: a person may have an ORCID, a VIAF, a Twitter, a GND, whatever. The more identifiers associated, the more confidence that it is the same person. Other sources may share the same attributes, like date of birth, education, whatever. This is yet another clue that it is about the same person. When you add a person, as I did recently to complete a list of award winners, you may find that a Wikipedia article is written and consequently that additional values can be added. Obviously there are many more people who have been added who are waiting for an article.

    When you have a huge database that is 99% correct and not connected to other sources, there are a few considerations. When there is no link to be expected to existing data, you can import it. This is particularly true when there is a user story that explains why this data is worthwhile to have. When there is no point to having the data.. why bother. When the data has a large overlap with other data you do want to integrate the data. You want to compare it with the existing data because this data is likely more correct than the data we hold. When our data is 94% correct and the new data 99% we will be able to largely automate finding the overlap. The data that differs, the six percent of the one and the one percent of the other needs to be merged with more attention to detail. All the data that has no likely overlap is fine to import. Thanks, GerardM (talk) 14:28, 27 October 2017 (UTC)

    • Those items are usually notable under section two of your notability policy: "It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references. If there is no item about you yet, you are probably not notable"
    I see "clearly identifiable" as meaning that if you have two sources that describe the entity and two Wikidata items get created for the entity, there should be enough information available to merge them in most cases. If you have a "Caroline Smith who was a painter" and there's no additional information about her, we can't say that she is the same person as the "Caroline Smith who has yoga as a hobby" in another database, and thus I would judge the described entities as not clearly identifiable. When it comes to "serious", the collection of a museum usually qualifies as "serious". On the other hand, a Facebook profile isn't a "serious" source for our purposes. ChristianKl (talk) 18:33, 27 October 2017 (UTC)
    What is your point? "Serious" has nothing to do with it. When we decide for whatever reason to import a database, we do so. When we cannot match, we don't. It is not "my" notability policy; when someone is added because of something notable, he is added and hopefully merged with other information. Thanks, GerardM (talk) 06:58, 28 October 2017 (UTC)
    But again, shouldn't whether an ID makes you "notable" be a real, structured discussion? Also, GerardM talks about many aspects of post-automation revision which are interesting (and I had rather imagined them), but it still sounds to me that automation and the discussion of ID "notability" are different aspects.
    ChristianKl, even if we all think "facebook" (or Twitter, or a Wikimedia username) is not a serious ID, that is an easy guess. With "secondary" IDs an expert eye is needed. For example, I have found at least 6 or 7 new IDs to create which are perfect for scientists and reliable, but I don't think they make a difference for notability per se (combined, they are probably fine for "robust identifiability", if you want a solid item for bibliometric purposes, for example). Now, should this information be encoded? Some IDs are complementary and not fully reliable for all items (facebook), some are generally reliable but do not provide notability (MathSciNet), some combined provide a base for completeness of some kind (if you want to use Wikidata or a spin-off for bibliometric purposes), some clearly provide notability, even if that remains a vague aspect the more Wikidata grows (the historical member of an elite academic institution)... an example for a site could be similar. I really thought this was something we planned to discuss organically somewhere, like a property of the ID itself.--Alexmar983 (talk) 11:28, 28 October 2017 (UTC)
    Having an ID does not make you notable. Having an identifier is only relevant when we import ALL records of a database. The purpose of secondary databases is that they help with disambiguation and possibly add gravity to a given statement. For instance, for scientists we link to ORCID, but it does not follow that we need to include all people with an ORCID iD. Thanks, GerardM (talk) 12:55, 28 October 2017 (UTC)
    Stating something we agree on does not solve the original question: where or how (and why) do we decide that an ID entails the need to include all items carrying it? Then we can also discuss the role of combinations of IDs. But, for example, to make lists work, the Wiki Loves Monuments ID created a thousand items which are not more important or relevant than others we don't have, and I am quite sure it will remain "temporary" for a long time. Flanders Arts Institute venue ID (P3820) induced the creation of items that sometimes do not have an existing Wikipedia article; are they all important museums and performing centers? I cannot stop thinking that we need something more rationalized.--Alexmar983 (talk) 09:15, 30 October 2017 (UTC)
    You are not adding any new point. When a list is maintained, its entries are notable. When an identifier exists only as a reference it is not such a list (an example would be Facebook or Twitter). So stop thinking and add value where it makes a difference. Thanks, GerardM (talk) 09:38, 30 October 2017 (UTC)
    One of Wikidata purposes is to serve the other Wikiprojects. If Wiki Loves Monuments needs an ID to identify something I don't see a reason why Wikidata shouldn't provide that functionality. Flanders Arts Institute venue ID (P3820) contains 1170 items. In contrast to the millions of scientific papers we added in the last months it's an insignificant amount of items. Furthermore the items fulfill the notability criteria (2) of being clearly identified conceptual entities who can be described with serious public sources. Wikidata doesn't have Wikipedia notability criteria but much wider criteria. ChristianKl (talk) 21:29, 30 October 2017 (UTC)

    w:Template talk:Marriage

    Just a reminder that a discussion is going on in Wikipedia about a template that we use to import data to Wikidata, w:Template talk:Marriage. The discussion is about including, or not including the end date of a marriage. I don't care which side you take, but more eyes are needed on the issue, since it affects Wikidata. --Richard Arthur Norton (1958- ) (talk) 01:15, 31 October 2017 (UTC)

    Comment - geographic items from ceb Wikipedia

    I can't speak to items outside of the USA, because I haven't been working on those, but just as a data point: so far, every instance that looked like multiple ceb wiki articles for the same US geographic feature that I have investigated turned out to be legitimate, separate features in the USGS GNIS database. Yes, there are three Bluff Points in Clinton County, New York, two of them facing each other across a strait. The assumption I made from conversation here that these items would prove to be duplicates of the same item from different maps was incorrect. - PKM (talk) 21:18, 17 October 2017 (UTC)

    There are a few real duplicates in the GNS database (from which geonames imported the items outside the US, which were then imported into ceb) for those areas I looked into; mostly, however, for elongated items like rivers or mountain ranges, whereas for more point-like items like hills the apparent duplicates are usually legitimate. As I usually add the GNS id whenever I look into a ceb-imported item, the P2326 constraint violations show these duplicates. However, geonames added its own duplicates, especially for rivers. Ahoerstemeier (talk) 09:12, 18 October 2017 (UTC)
    It takes a huge amount of effort to determine whether they are distinct entities or duplicates; I gave up working on them. We should come up with a way to mark that they may be the same as each other, without merging them, and have it done automatically. This is maybe the 5th time this has come up, and it will keep coming up unless we address it somehow. --Richard Arthur Norton (1958- ) (talk) 13:57, 18 October 2017 (UTC)
    I ran across one duplicate in the GNIS database yesterday. I don't know how we could mark things as “said to be the same as” automatically, since it seems the geo coordinates would not be particularly close together or the problem wouldn’t occur. One approach might be pulling the qualifying location information (in parentheses) from the Cebuano wiki article title, doing some cleanup and translation, and putting that into the description field for as many languages as we care to work with. I think descriptions are a better tool than marking as potentially duplicates without them. - PKM (talk) 17:46, 18 October 2017 (UTC)

    Some of the suggestions we got to solve the cebwiki thingy:

    • work by country
    • apply possibly invalid entry requiring further references (Q35779580)
    • stop this nonsense
    • collaborate with other websites/wikis to solve it globally
    • differentiate municipalities and their main locality
    • delete entries with only cebwiki/svwiki sitelinks
    • be aware of duplicate entries, rounded coordinates, mislocated places, elevations derived from rounded coordinates, locations that can't be found (possibly fictional) ..
    • add additional references

    Maybe we should start working on some of this. I regret that I haven't looked into this earlier.
    --- Jura 07:29, 21 October 2017 (UTC)
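    For the "delete entries with only cebwiki/svwiki sitelinks" suggestion above, candidates can be listed with a Wikidata Query Service query along these lines (a sketch; svwiki is analogous, and the coordinate filter is optional):

    ```sparql
    # Items with coordinates whose only sitelink is a Cebuano Wikipedia article
    SELECT ?item ?article WHERE {
      ?item wikibase:sitelinks 1 ;
            wdt:P625 [] .
      ?article schema:about ?item ;
               schema:isPartOf <https://ceb.wikipedia.org/> .
    }
    LIMIT 100
    ```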

    Any suggestions for automation? It took me about 20 minutes of research to do two of them: one I merged, and one I recognized as being similar but was not sure enough to merge. It is maddening. The longer we wait, the more people are linking to the newer one, which isn't always correct. --Richard Arthur Norton (1958- ) (talk) 19:47, 22 October 2017 (UTC)
    Cebuano is now the second largest Wikipedia, I assume from all these automated additions. -- Just a few tenths of a percent behind English shown here. Richard Arthur Norton (1958- ) (talk) 12:46, 25 October 2017 (UTC)
    • Practically we could end notability of Cebuano pages. The Cebuano Wiki could still do what it wants but it wouldn't be our issue anymore. ChristianKl (talk) 13:36, 30 October 2017 (UTC)
      • No, we cannot end the notability of Cebuano pages; it is not the only project that makes use of this functionality. What we need is to allow scripts that generate text in combination with cached pages. In this way we can remove the generated pages without delivering less content in Cebuano. However, we should not destroy what Wikidata is there for. Thanks, GerardM (talk) 15:48, 30 October 2017 (UTC)
    • Dealing with the bot created pages by a single person on Cebuano Wiki isn't "what Wikidata is there for". ChristianKl (talk) 08:45, 31 October 2017 (UTC)

    Sorting Identifiers

    As identifiers continue to be added, we need to sort them so the most used ones appear at the top of the list on any given page. Any ideas on how to implement? I imagine ones like VIAF and LCCN will be the most populated ones. --Richard Arthur Norton (1958- ) (talk) 21:21, 22 October 2017 (UTC)

    I would like to have them in alphabetical order, actually. Maybe with an option to prefer some? Sjoerd de Bruin (talk) 21:29, 22 October 2017 (UTC)
    LCCN is not in our Top 100. ;) --Succu (talk) 21:41, 22 October 2017 (UTC) OK, Library of Congress authority ID (P244) is... --Succu (talk) 21:49, 22 October 2017 (UTC)
    This can easily be done by adding them to or changing MediaWiki:Wikibase-SortedProperties. --Lydia Pintscher (WMDE) (talk) 09:38, 23 October 2017 (UTC)
    Should we have a formal RFC to determine the most popular sort order? --Richard Arthur Norton (1958- ) (talk) 04:39, 24 October 2017 (UTC)
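    Actual usage counts to inform such a sort order can be pulled from the query service with something like this (a sketch; it may time out on the public endpoint, in which case the property-usage statistics dumps are an alternative):

    ```sparql
    # External-identifier properties ranked by number of uses
    SELECT ?prop ?propLabel ?uses WHERE {
      {
        SELECT ?prop (COUNT(*) AS ?uses) WHERE {
          ?prop wikibase:propertyType wikibase:ExternalId ;
                wikibase:directClaim ?claim .
          ?item ?claim [] .
        }
        GROUP BY ?prop
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    ORDER BY DESC(?uses)
    LIMIT 100
    ```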
    well, actually, I guess the most popular ones will depend on the users. I, as French, prefer BnF ID (P268) to VIAF ID (P214) and Library of Congress authority ID (P244) is not interesting to me. I guess a German user would prefer GND ID (P227), etc.
    would it be possible to do for IDs like for prefered wikilinks, and be able to have the prefered ones on top ? (can't remember if it's a gadget or regular pref.) --Hsarrazin (talk) 09:34, 30 October 2017 (UTC)
    It would likely be possible to have a smarter sorting system. On the other hand, there's a lot of open tasks and I don't think the impact is high enough that it warrants developmental resources. ChristianKl (talk) 15:46, 31 October 2017 (UTC)

    Controversial items

    Some new classification items from Fractaler, which has "structural need" only in 1 (or 0) other items: 42306417, 42306669, 42306565. Should we encourage their creation (and their use in more items)? Or are they redundant and should be deleted? Added: see also related things for them in Reasonator. --Infovarius (talk) 06:21, 25 October 2017 (UTC)

    It seems that he creates his own classification tree and does not try to fit into the existing classification. Again the question: encourage or deny? --Infovarius (talk) 06:36, 25 October 2017 (UTC)
    I looked at 42306417 - I don't have a problem with it as such, but you seem to think it's duplicative? What would you merge this with? ArthurPSmith (talk) 08:27, 26 October 2017 (UTC)
    1) It has no sitelinks (because all of them should be at elasticity (Q62932) anyway). 2) Should it be used on every such object? 3) Should we create a class of objects per each property? --Infovarius (talk) 09:49, 27 October 2017 (UTC)
    Ok, those are good points. @Fractaler: do you care to explain your plan here? What function does "elastic object" fulfil in wikidata that is not already handled by "physicsl object" and "elasticity"? I would think just about every real physical object has elasticity anyway, and even a theoretical one with an elasticity of zero still has a value for the property (just that it's zero). You do seem to have created a lot of partitions of abstract classes of things along these lines. Are you following some particular source? What is the basis for all this work, and where is it going? ArthurPSmith (talk) 20:35, 27 October 2017 (UTC)
    Description from WP: If the material is elastic, the object will return to its initial shape and size when these forces are removed --Fractaler (talk) 11:15, 28 October 2017 (UTC)
    Ok - good point, liquids and gases obviously aren't elastic and there are some others between liquid and solid along those lines. Nevertheless, I'm not sure "elastic object" is a common technical term for this - maybe "elastic body" or "elastic material" would be better. That's why I asked for a source. If enwiki is your source, it still doesn't seem to use the term "elastic object" in that quote. This seems a little along the lines of "original research" to me... ArthurPSmith (talk) 11:25, 28 October 2017 (UTC)
    w:Elasticity (physics): ability of a body to resist a distorting influence or deforming force and to return to its original size and shape when that influence or force is removed. Solid objects will deform when adequate forces are applied on them. If the material is elastic, the object will return to its initial shape and size when these forces are removed. So, we have: 1) body, 2) object, 3) material. I think we can + also 4) media. Link (to WP) added. --Fractaler (talk) 12:02, 29 October 2017 (UTC)
    The fact that there's a property called elasticity doesn't mean that "elastic objects" is a category that is used by anyone. We don't use instance of (P31) "red object" either, but would find another way to describe that an object is red. ChristianKl (talk) 18:29, 29 October 2017 (UTC)
    Ok, no problem. What is, for example, spring (Q102836) by your version? --Fractaler (talk) 08:02, 30 October 2017 (UTC)
    If you look at the actual item you find that it subclasses machine element (Q839546). ChristianKl (talk) 08:22, 30 October 2017 (UTC)
    Then the next question is: should we distinguish these items (by assigning supersets)? For example, spring (Q102836) and shaft (Q309383) --Fractaler (talk) 09:29, 30 October 2017 (UTC)
    Using has quality (P1552) works fine for distinguishing them. ChristianKl (talk) 12:34, 30 October 2017 (UTC)
    Fine for whom? How does has quality (P1552) provide transitivity? How do you go from spring (Q102836) to a superset of spring-like objects, then to the next superset, etc., and back to spring (Q102836), then to subsets, subsets of subsets, etc.? Can you do that with has quality (P1552)? But the triad (Q29430681) of set theory (Q12482) terms (the tools subset (Q177646)<->set (Q36161)<->superset (Q15882515) for navigation) works fine. --Fractaler (talk) 13:50, 30 October 2017 (UTC)
    If A is a subset of B and all B have property X, then of course all A have property X. There's transitivity. You are trying to do original research here in a field where there's existing work. There are various bodies from TÜV and ISO to IEEE that might have thought about how to classify springs. If you find a standardization body that feels the need to classify springs the way you suggest, feel free to provide references. ChristianKl (talk) 14:04, 30 October 2017 (UTC)
    Your tool is only for info about A <-> B navigation ("breadcrumb (Q846205)" of 2). set theory (Q12482)'s tool is for A <-> B <-> C <-> D...All ("breadcrumb (Q846205)" of full). --Fractaler (talk) 14:35, 30 October 2017 (UTC)

    Another is Q42326012 - should we create a singular item for each item with a plural label? --Infovarius (talk) 09:49, 27 October 2017 (UTC)

    No way! I merged into foster parent (Q2427941) -- JakobVoss (talk) 12:57, 31 October 2017 (UTC)
    Why does no one want to use math in Wikidata? "foster parents" ("foster father and foster mother") is dyad (Q29431432). "foster parent" ("foster mother or foster father") is Q30103061. Mathematics asserts that dyad (Q29431432)Q30103061. So, "foster parents" is not also "foster parent" --Fractaler (talk) 13:10, 31 October 2017 (UTC)

    Consistency Task Force: part of / has part usage and documentation is inconsistent

    Hello, let me start it here, instead of starting to fix it up (and making a general mess without warnings) or pushing it to an upper discussion forum. Your input is most welcome, especially if you're a seasoned Wikidata editor. Be gentle. :-)

    I started creating queries related to countries using various filters, and I have realised that - at least within historical Europe - countries are classified into parts (like Eastern Europe), regions (like the Baltics) and various other entities, which is fine and dandy. Except that some countries contain (in my humble opinion correctly) partOf parts/regions, while other countries are listed as Region hasPart Country; some have continent Europe, while others are partOf it; and some countries are partOf Europe instead of their region (like Denmark). (And possibly there are some subclassOf or mutant partOf relations as well; I haven't checked yet.)

    I agree with the documentation of partOf that hasPart should be deprecated and avoided where possible, and while it doesn't inherently hurt to have inverse relations, it makes data consistency fragile and queries ambiguous; and since the relations are exact opposites, there shouldn't be any case where it's required. So I would replace all hasPart with partOf in the relevant [country related] entity pairs (which means removing hasPart, mind you; as with the continent point below, I may easily be convinced to keep it and even insert it everywhere, to make it easier to query [← except when people actually grokking the query engine say "it's crap, they both take the same query time"]).
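    A consistency check for these inverse pairs can be sketched as a Wikidata Query Service query; this lists countries reached from a region via hasPart that lack the inverse partOf (Q6256, P527 and P361 are the country item and the two properties discussed here):

    ```sparql
    # Countries linked from a region via has part (P527)
    # but missing the inverse part of (P361)
    SELECT ?region ?regionLabel ?country ?countryLabel WHERE {
      ?country wdt:P31 wd:Q6256 .
      ?region wdt:P527 ?country .
      FILTER NOT EXISTS { ?country wdt:P361 ?region . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 100
    ```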

    Then I believe continent should be set on every country, where applicable, apart from the partOf continent, which is a different and similarly important attribute. I'm saying this while knowing that partOf chains should inherently lead back to the continent in the end, so I'm open to being convinced otherwise.

    Then I would ensure that every present and historical country's partOf chain leads back to the continent it's located in, through the associated regions and parts.

    I'd come up with more consistency checks along the way, mostly based on highly developed entries like Germany (hi to German wikidaters! :)).

    Objections, advice, praise, flames, general wit? Thanks! --grin 12:02, 25 October 2017 (UTC)
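A consistency sweep like the one proposed above could be prototyped offline before touching any items. A minimal sketch in Python over toy data (the item names and claim structure here are illustrative stand-ins, not real Wikidata statements; a real check would read claims from the SPARQL endpoint or a dump):

```python
# Toy claim store: item -> {property: set of target items}.
# P361 = "part of", P527 = "has part"; names stand in for real QIDs.
claims = {
    "Denmark": {"P361": {"Europe"}},
    "Estonia": {"P361": {"Baltics"}},
    "Baltics": {"P527": {"Estonia", "Latvia"}},
    "Europe":  {"P527": {"Denmark"}},
}

def missing_inverses(claims):
    """Find P527 (has part) statements whose P361 (part of) inverse is absent."""
    missing = []
    for whole, props in claims.items():
        for part in props.get("P527", set()):
            parts_of = claims.get(part, {}).get("P361", set())
            if whole not in parts_of:
                missing.append((part, whole))
    return sorted(missing)

# Latvia lacks a "part of: Baltics" statement in the toy data above.
print(missing_inverses(claims))  # → [('Latvia', 'Baltics')]
```

The same loop, run the other way around, would find hasPart statements to remove once the partOf side is confirmed.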


    There seems to have been sporadic discussion about automagically ensuring (or creating) inverse pairs of relations, e.g. for every partOf there should be a hasPart created on the other end of the relation. I am unable to find a satisfying conclusion. Anyone with the hidden knowledge around, maybe? --grin 12:13, 25 October 2017 (UTC)

    If we are talking about adding a superset (Q15882515) to a set (Q36161) and automatically obtaining a list of all subset (Q177646)s of the set (as happens with the categories system when linking via +[[Category: xxxxxxx]]), that is, as far as I know, so far absent in Wikidata. --Fractaler (talk) 12:48, 25 October 2017 (UTC)
    I had specifically part of (P361) and has part (P527) in mind, but ArthurPSmith's (implied :)) example about a glass of lemonade (which hasPart lemon, but where it's not desirable to have lemon be partOf lemonade) showed that an automagical update wouldn't be practical. --grin 14:08, 25 October 2017 (UTC)
    A glass of lemonade (a set) consists of components (subsets). The lemon is a component (subset) of a glass of lemonade (a set). --Fractaler (talk) 07:08, 26 October 2017 (UTC)
    @Fractaler: That is true, but we do not expect to have half a million "component of" statements on lemon, from lemonade to bathroom scaling stuff. Indeed there are cases where it's not desirable to list, for a really common component, all the possible sets it's a member of. --grin 10:28, 26 October 2017 (UTC)
    @grin: Is a half-million-entry list a problem for Wikidata? --Fractaler (talk) 10:58, 26 October 2017 (UTC)
    @Fractaler: Possibly, yes; I've already reached the limits of the SPARQL query engine and had to apply some pretty ugly tricks just to be able to handle the large amount of auxiliary (and, for me, unwanted) information. I'd say it'd be clearly suboptimal, since such lists would absolutely kill the natural human interface of Wikidata. --grin 12:30, 26 October 2017 (UTC)
    @grin: OK. Then, while Wikidata does not have a machine interface (for its machine-readable data (Q6723621)) and has only the natural human interface (for its Q28777989), we can use a "group by" tool. For example, group by state of matter (Q11430). Then for the set "lemon components" we have only 2 subsets: 1) "lemon components that are liquid objects" (liquid objects (Q30129399)); 2) "lemon components that are solid objects" (solid object (Q29052015)). --Fractaler (talk) 14:21, 26 October 2017 (UTC)
    @grin: See Help:Basic membership properties for the main documentation I think we have on these issues. "Has part" and "part of" are inverses in most cases but not always - sometimes has parts of the class (P2670) is the right inverse for "part of". And sometimes one side or the other of the relation is not really appropriate at all - an ingredient in a recipe should probably only have the "has part" relation, not the "part of" as each ingredient is likely part of an almost infinite number of recipes; conversely a grouping with a very large number of parts which are concrete instances with wikidata items should probably have only the "part of" relationships, as the "has part" list would be very large and unwieldy. ArthurPSmith (talk) 13:45, 25 October 2017 (UTC)
    @ArthurPSmith: I'd seen all the docs, but hadn't quite examined the subject. You are right about has parts of the class (P2670), but in my case all the objects are instances (countries, regions, continents), where, as far as I see, part of (P361) and has part (P527) are exact inverses. I like your example of the lemon, which wouldn't have partOf lemonade; it covers the general case well and makes it obvious why a general automagical update wouldn't be proper. I hope someone will actually answer my very specific question above, too. Thanks for your reply! --grin 14:08, 25 October 2017 (UTC)
    continent (P30) currently doesn't subclass "part of" because there are countries like Russia that have continent (P30) "Europe" and "Asia", which means that with a transitive "part of" property Russia would be neither part of Europe nor part of Asia. ChristianKl (talk) 15:23, 25 October 2017 (UTC)
    "Europe" + "Asia" = Eurasia (Q5401) (a supercontinent of the two) --Fractaler (talk) 08:29, 26 October 2017 (UTC)
    Are you arguing that we shouldn't say that Russia has Asia as a continent? Even if you solve the issue of Eurasia, you still have Indonesia, Panama, Spain and Egypt as countries that span multiple continents. ChristianKl (talk) 19:35, 29 October 2017 (UTC)
    OK, so do we have a problem creating some superset like "land of countries that span multiple continents"? --Fractaler (talk) 08:45, 30 October 2017 (UTC)
    I don't think we have any problem with the status quo. "Continent" works the way it currently does precisely because it is not a subclass of "part of". ChristianKl (talk) 11:26, 30 October 2017 (UTC)
    Can we use concept "Continent" for "Indonesia, Panama, Spain and Egypt as countries that span multiple continents"? --Fractaler (talk) 11:32, 30 October 2017 (UTC)
    Yes, because "Continent" doesn't subclass part_of and doesn't contain a suggestion that it's transitive. ChristianKl (talk) 14:07, 30 October 2017 (UTC)
    Can we use a set like "land of countries that span or do not span multiple continents" instead of "Continent" to avoid the limitations imposed by the definition of "Continent"? --Fractaler (talk) 07:36, 31 October 2017 (UTC)

    How to move Statement into Identifiers section?

    On the page Brodmann area (Q924684), "FMA-ID" (P:P1402) is shown in the "Statements" section, not in the "Identifiers" section. Although I don't know the mechanism behind this, I wonder whether my property proposal was wrong. Is it possible to fix this even now? --Was a bee (talk) 11:06, 26 October 2017 (UTC)

    Yes, it is. (A request has to be delegated to the developers who can change the datatype.) On the migration page, it is listed among Mix of good to convert and disputed. I think those lists should be revisited and a new batch of properties to convert should be decided on. Matěj Suchánek (talk) 11:22, 26 October 2017 (UTC)
    I didn't know that process. I'll check it. Thank you Suchánek. --Was a bee (talk) 11:53, 26 October 2017 (UTC)
    You might want to review the constraint violations, there seem to be a relatively large number for this property (but some may be legitimate), that was the reason it was not in the group initially converted to external id type. ArthurPSmith (talk) 13:18, 26 October 2017 (UTC)
    Oh, I see. Currently there is no clear consensus (or even discussion) on that point: how to treat "human-specific" and "species-general" pages like "Human brain" and "Brain". I'll consider that. Thank you, ArthurPSmith. --Was a bee (talk) 21:54, 26 October 2017 (UTC)
    I don't see a reason why "Human brain" and "Brain" shouldn't have both an "FMA-ID" and at the same time have the external-id datatype. ChristianKl (talk) 09:35, 29 October 2017 (UTC)
    That would make it more a categorization system rather than an identifier. ArthurPSmith (talk) 11:18, 29 October 2017 (UTC)
    One of the main effects of labeling something as an external-id is that the value goes in the external-id section. I think the naive user would expect to find something like the FMA ID in the external-id section. Given that FMA is about human anatomy it might be more precise to add it to "Human brain" than to "Brain", but I don't think that's in the spirit of usability of the data. ChristianKl (talk) 15:52, 31 October 2017 (UTC)

    Happy Birthday, Wikidata!

    Happy Birthday, Wikidata!

    Today is Wikidata's 5th birthday. Some days I can't really believe it's been 5 years already since we set up this wiki. I am proud of what we are building together every single day and I am excited about what is coming next. Here is to many more years together building the best free knowledge base there is! Head over to the birthday page to read stories, reflections, birthday wishes and most importantly presents ;-)

    What's your highlight of the past year? What are you looking forward to the most over the coming year?


    Cheers --Lydia Pintscher (WMDE) (talk) 22:30, 28 October 2017 (UTC)

    Oops! Missed the date! Happy birthday, Wikidata! --Hsarrazin (talk) 10:34, 30 October 2017 (UTC)
    Happy birthday, Wikidata File:CIS-A2K_Wikidata_5th_birthday_gif.gif.   --Titodutta (talk) 17:31, 31 October 2017 (UTC)

    Picture on item page

    Hello, many items have an "image". Why doesn't the item page simply show that image, so that you see it immediately? Wouldn't that be quite useful? Z. (talk) 21:13, 29 October 2017 (UTC)

    Showing the image immediately might make people think it's a part of Wikidata and thus licensed under CC0. A lot of the images however aren't. ChristianKl (talk) 21:24, 29 October 2017 (UTC)
    Just like everyone at Wikipedia must think the rare fair-use images "must" be CC-BY. A better argument is load time, or that it's not helpful to machines; but it's apparently useful to the understanding of this editor. Slowking4 (talk) 23:08, 29 October 2017 (UTC)
    Module:Taxobox has had this feature from its very beginning. --Succu (talk) 21:32, 29 October 2017 (UTC)
    It is a data-driven form, not a presentation-level page. What is the benefit/need of an image on the page?  — billinghurst sDrewth 00:50, 30 October 2017 (UTC)
    It would be a significant benefit to have an idea whether the image in the property is an appropriate one, without having to click through to a separate page on Commons.--Pharos (talk) 01:56, 30 October 2017 (UTC)
    But that's true of every link. The various Wikipedia links may or may not be appropriate as well, such as a different person or place by the same name. The Wikisource copy might be an edition instead of a work. The Wikispecies page might be for an animal instead of a plant with the same name. Any sort of checking will necessarily involve clicking a link. --EncycloPetey (talk) 02:03, 30 October 2017 (UTC)
    Pharos, then maybe it should be a gadget. For the vast bulk of people items where I edit, I wouldn't know if they are right or wrong, so it would give me no benefit to see the image on every item. About the only time that it would help is those occasions where someone slips in another image, e.g. a gravestone or artwork, though those are usually distinguishable by name in the first place.  — billinghurst sDrewth 04:04, 30 October 2017 (UTC)
    Enable ImageHeader in your preferences. Matěj Suchánek (talk) 07:43, 30 October 2017 (UTC)
    I would think we would want "Imagehelper" as a default gadget: it's a very basic utility, and does a good job of showing that the image is not in fact part of Wikidata. Sadads (talk) 08:15, 30 October 2017 (UTC)
    Currently "image" has only Q36987233 in Wikidata. So it is a tool for Wikipedia's human readers and for Wikimedia's human editors (a visual assistant when creating an ontology with Wikidata's tools). Images are not yet recognized by Wikidata's tools; the software does not yet understand what is depicted in a picture. For a program to understand what a picture is meant to explain to it, we currently have, as far as I know, only depicts (P180). --Fractaler (talk) 09:08, 30 October 2017 (UTC)
    Hello, thanks for the ideas and explanations. I just tried the "imageheader" gadget, but I still don't see an image on the page. / If the concern is licensing: it would be possible to indicate the correct license next to the picture. At least a link to the picture on Wikimedia Commons would be an improvement. Z. (talk) 11:16, 31 October 2017 (UTC)

    College athletes

    Hoi, the USA has this thing where college students play sports. These players are often in a category only for their particular sport and not recognised as alumni. My question: is it OK to have all these college athletes also included as alumni? Thanks, GerardM (talk) 16:13, 30 October 2017 (UTC)

    Do you have sources that claim that they are alumni? If so add the claim with the reference. If you don't have a reference, don't. ChristianKl (talk) 16:26, 30 October 2017 (UTC)
    GerardM is talking about categories and that doesn't sound like a Wikidata thing. Gerard are you sure this is the right forum for your question? If you are asking about whether it would be ok to add educated at (P69) institution statements for people in a list of college athletes at a given institution, that seems fine to me. ArthurPSmith (talk) 20:51, 30 October 2017 (UTC)
    America also has this ugly thing where a student athlete gets injured, loses his scholarship and does not graduate with a degree. So there is a correlation but not equality. Slowking4 (talk) 21:34, 30 October 2017 (UTC)
    In that case the sports categories should not be subcategories of the alumni but subcategories of the college/university. Correct? Thanks, GerardM (talk) 09:10, 31 October 2017 (UTC)
    @Slowking4, GerardM: From the Wikidata perspective, this shouldn't matter much - educated at (P69) is a broad "was educated at" not a narrowly defined "alumni who graduated". Someone who attends for a while and then drops out for whatever reason can still be listed that way. For the WP categorisation, that would probably depend on how the local Wikipedia chooses to define "alumni". Andrew Gray (talk) 12:30, 31 October 2017 (UTC)
    I have suggested on en.wp that the categories for the athletes should be placed under the category for the college, on an equal level with the alumni. They should decide who counts as an alumnus. We use "educated at" and it works, sort of. Thanks, GerardM (talk) 12:35, 31 October 2017 (UTC)
    The categories on Wikipedia and Commons are "alumni", so they are narrower. In the infobox you have two fields: education and alma mater. So now you have the problem of matching a subset. Slowking4 (talk) 12:35, 31 October 2017 (UTC)

    On an only slightly related note, those who are interested in college athletics (Q5146583) may want to work on properties listed through the new Wikidata property related to college sports (Q42428442) item here. Thierry Caro (talk) 14:02, 31 October 2017 (UTC)

    Corporate Contribution

    Hello,

    I am designing an artificial intelligence system for a client. As part of this system, we need to verify the dates that various technologies were invented. For example, we need to know that if someone mentions C++14, the earliest they could possibly have used this technology is 2014. Wikidata appears to be a very valuable resource for this, particularly the "software version" statements that many technology entries have.

    When the date for a technology is unknown, our workers must manually look it up and enter it into our AI system. I am wondering what the community would think if we contributed these edits back to Wikidata. If we depend on Wikidata as the authoritative source of this information, we need for our workers to be able to make those modifications. We can have our own database in tandem with Wikidata, but I would like us to contribute back to the community when possible, since we are relying heavily on the data being provided to us from the community for the MVP.

    Is this possible / accepted by Wikidata policies? Would this count as a bot? How should we design the interface from our own technology to Wikidata?

    Thanks!

    Genixpro (talk) 19:54, 30 October 2017 (UTC)
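    One way such contributions are typically made programmatically is through the Wikibase API's wbcreateclaim module. A minimal sketch that only builds the request payload for a publication date (P577) claim with year precision; the QID is a deliberate placeholder, and a real bot would additionally need authentication, a CSRF token and (for automated editing) bot approval:

```python
import json

def publication_date_claim(qid, year):
    """Build a wbcreateclaim payload asserting publication date (P577) = year,
    with year precision (9) and the proleptic Gregorian calendar model."""
    value = {
        "time": "+%04d-00-00T00:00:00Z" % year,
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": 9,  # 9 = year precision in the Wikibase data model
        "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
    }
    return {
        "action": "wbcreateclaim",
        "entity": qid,       # placeholder QID; look up the real item first
        "property": "P577",  # publication date
        "snaktype": "value",
        "value": json.dumps(value),
        "format": "json",
        # a real request also needs token=<CSRF token> and bot=1
    }

payload = publication_date_claim("Q0", 2014)  # "Q0" is a placeholder
```

The payload would then be POSTed to https://www.wikidata.org/w/api.php by an account that discloses its corporate affiliation, as noted in the replies.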

    Tobias1984
    Emw
    Zuphilip
    Danrok
    Bene*
    콩가루
    TomT0m
    DrSauron
    Ruud Koot
    Andreasburmeister
    Ilya
    Toto256
    MichaelSchoenitzer
    Metamorforme42
    Pixeldomain
    User:YULdigitalpreservation
    Dipsode87
    Pintoch
    Daniel Mietchen
    Jsamwrites
    Tinker Bell
    FabC
    Jasc PL
    putnik
    Dhx1
    Tris T7
    Peb Aryan
    lore.mazza004
    Rc1959
    Premeditated
      Notified participants of WikiProject Informatics ChristianKl (talk) 20:55, 30 October 2017 (UTC)

    @Genixpro: have you read Wikidata:Data donation yet? As long as the data you are sharing is acknowledged to be compatible with the CC-0 license, especially since you are planning only to add missing dates and not add new items that might have notability issues, this sounds fine to me. If you run this in an automated mode you should request a special bot account and request bot permissions - see Wikidata:Requests for permissions/Bot for examples and the process needed. Whatever accounts you have doing edits should acknowledge that they are funded by your organization to do this to be clear on any possible conflicts of interest. ArthurPSmith (talk) 21:05, 30 October 2017 (UTC)
    I think it's not only possible but also very welcome. Of course there are rules: besides the rules for everyone, your accounts should mention which company they work for, and you shouldn't edit items about yourself or your products (or anything else where you might have a conflict of interest). But that's probably no big deal, so a warm welcome from my side. -- MichaelSchoenitzer (talk) 13:12, 31 October 2017 (UTC)

    Date-adding bot

    After some issues were reported, I wrote a bit about my date-adding bot here. Best to discuss there, rather than on my talk page in a dozen threads. --Magnus Manske (talk) 14:28, 31 October 2017 (UTC)

    Concerning Gregorian/Julian issues: Russia continued to use the Julian calendar until 1918. If it is at all possible, I would disable import of dates if (i) they are before 1918; AND (ii) the source does not specify the calendar; AND (iii) the item does not specify P31, or one of the values of P31 is Russia or Russian Empire.--Ymblanter (talk) 14:52, 31 October 2017 (UTC)
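    The three conditions above could be expressed as a simple guard in the import script. A sketch on a simplified data model (which claim supplies the country values per condition (iii) is left abstract here and would need adapting to the bot's actual data; Q159 and Q34266 are the items for Russia and the Russian Empire):

```python
JULIAN_CUTOFF_YEAR = 1918
RUSSIAN_ITEMS = {"Q159", "Q34266"}  # Russia, Russian Empire

def should_skip(year, source_calendar, country_values):
    """Skip the import when the date predates 1918, the source gives no
    calendar, and the item's country is unknown or Russian."""
    if year >= JULIAN_CUTOFF_YEAR:
        return False  # Gregorian everywhere by then
    if source_calendar is not None:
        return False  # the source tells us which calendar applies
    # Condition (iii): no country claim at all, or a Russian one.
    return not country_values or bool(country_values & RUSSIAN_ITEMS)

print(should_skip(1900, None, {"Q159"}))  # True: pre-1918, no calendar, Russian
```

A pre-1918 date with a known calendar, or on an item whose country is known and non-Russian, would still be imported under this rule.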