User talk:Smalyshev (WMF)

About this board

Previous discussion was archived at User talk:Smalyshev (WMF)/Archive 1 on 2015-08-21.

Structured Data on Commons - IRC office hours this week, 18 July

MediaWiki message delivery (talkcontribs)

The Structured Data team is hosting an IRC office hour this week on Thursday, 18 July, from 17:00-18:00 UTC. Joining information as well as date and time conversion is available on Meta. Potential topics for discussion are the testing of "other statements", properties that may need to be created for Commons on Wikidata soon, plans for the rest of SDC development, or whatever you might want to discuss. The development team looks forward to seeing you there. -- Keegan (WMF) (talk) 18:51, 16 July 2019 (UTC)

Reply to "Structured Data on Commons - IRC office hours this week, 18 July"

Structured Data - testing qualifiers for depicts

MediaWiki message delivery (talkcontribs)

As you might have seen, testing is underway for adding qualifiers to depicts statements. If you have not left feedback already, the Structured Data on Commons development team is very interested in hearing about your experience using qualifiers on the file page and in the UploadWizard. To get started, you can visit Test-Commons and choose a random file to test out, or upload your own file to try out the UploadWizard. Questions, comments, and concerns can be left on the Structured data talk page, and the team will address them as best they can. Thank you for your time. -- Keegan (WMF) (talk) 19:08, 11 June 2019 (UTC)

Reply to "Structured Data - testing qualifiers for depicts"
MisterSynergy (talkcontribs)

Hey Stas (again)

There are sometimes changes which for whatever reason do not make it to the WDQS servers. As far as I know, you can somehow manually re-load complete items to the servers, or remove them from the servers if they have been deleted. Can we also do this as regular Wikidata users? If so, how? If not, would it be possible to make a tool that allows us to do this?

Thanks!

Smalyshev (WMF) (talkcontribs)

There's no tool (except for making another edit to the same item), but if you tell me where the problem is I could check it. It's not easy to make a user-accessible tool, since it requires access that regular users would not have.

MisterSynergy (talkcontribs)

"Making another edit to the same item" is sometimes not so easy; unfortunately, we do not have "null edits" in Wikidata which one could make instead of actual changes.

There are quite a lot of problems, and I continuously stumble upon new ones: deleted items which are still listed in results months after deletion, or changes which are obviously missing on the servers. IMO it is not really feasible for Wikidata users to contact you each time, which is why I am asking for a tool. Would it be possible to provide some kind of (internal) interface where a tool or gadget could report entity IDs (items, properties, or lexemes), with the actual re-loading, which requires elevated access rights, done internally on the server?

MisterSynergy (talkcontribs)

As there is no tool right now, and it would probably take some time to make one if it is possible at all, I'll report some items that need a purge anyway. They have all been deleted a while ago, but they are still listed in query results:

Q59829205, Q63052311, Q61055977, Q62778891, Q2895918, Q55659535, Q25632774, Q63058667, Q62781555, Q61138466, Q62008697, Q62781871, Q62713985, Q31828394, Q63069534, Q13178905, Q31828368, Q63040612, Q63074731, Q63081983, Q13046358, Q62780082, Q31828391, Q62702741, Q5686578, Q62721811, Q31828254, Q62676575, Q62675942, Q62717645, Q63075268, Q63041370, Q62781636, Q57792502, Q25390124, Q62777557, Q62986415, Q62716765, Q63065908, Q31828417, Q62698081, Q60995829, Q62703963, Q13183001, Q57793838, Q62750924, Q62734411, Q31828378, Q12962093, Q62703188, Q62772788, Q62667304, Q62716697, Q62753008, Q62728287, Q63036112, Q63120710, Q19682182, Q61117736, Q63067830, Q60578103, Q31828297, Q62752914, Q56327604, Q63068802, Q31828393, Q62667294, Q62755014, Q63065668, Q62738379, Q56306520, Q25538365, Q62754296, Q4229444, Q63084259, Q62705445, Q62747408, Q62724098, Q31828248, Q4229681, Q5281896, Q31828203, Q62476887, Q62675605, Q63007134, Q62734388, Q61059034, Q18815002, Q62765154, Q20086083, Q30526954, Q61129050, Q62761201, Q31828209, Q62690844, Q62706496, Q12111894, Q61118152, Q63043840, Q63066755, Q4291259, Q61359472, Q61131551, Q31344605, Q62762259, Q62766460, Q62752269, Q62690936, Q62667019, Q31828270, Q56415788, Q5793370, Q61118135, Q63077625, Q62745809, Q63077570, Q31828295, Q62746804, Q31828273, Q62719860, Q63343783, Q63065650, Q61123705, Q63065828, Q31828388, Q62792320, Q31828208, Q10878213, Q19730075, Q62702659, Q62709576, Q56312327, Q5151641, Q62708234, Q62988192, Q22236231, Q62764570, Q62588329, Q31840388, Q22906251, Q31828389, Q62755868, Q62713821, Q62987821, Q20084441, Q12962086, Q62762851, Q63064990, Q25678713, Q63043793, Q62675661, Q61065578, Q62702796, Q61117845, Q31828383, Q16962127, Q31828250, Q63065876, Q31828253, Q62703932, Q56321182, Q16437754, Q62711587, Q63079898, Q62738118, Q62705223, Q62750164, Q48961760, Q62695842, Q30893350, Q63121654, Q7195950, Q61065775, Q62704029, Q61082295, Q25708204, Q61061661, Q15925722, Q62688910, Q39096647, Q63079427, Q31828227, Q62711995, Q63044886, Q62758567, Q62749994, Q17047948, Q62754781, Q62751976, Q4105233, Q62736653, Q25691916, Q13025162, Q63069480, Q56309320, Q56306154, Q30892496, Q63043649, Q62706322, Q62700193, Q59618332, Q63033949, Q31828382, Q62751507, Q62760325, Q31828207, Q20086755
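
For anyone who wants to double-check such reports, here is a minimal sketch (assuming Python 3 with the requests library; the per-item ASK pattern is just one possible way to do it) that tests which of these IDs still return triples from the public WDQS endpoint:

# Sketch: check which deleted items still have triples in WDQS.
# Assumes Python 3 and the "requests" library.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
ITEMS = ["Q59829205", "Q63052311", "Q61055977"]  # sample from the list above

for qid in ITEMS:
    query = "ASK { wd:%s ?p ?o }" % qid
    r = requests.get(ENDPOINT,
                     params={"query": query, "format": "json"},
                     headers={"User-Agent": "stale-entity-check/0.1 (example)"})
    if r.json()["boolean"]:
        print(qid, "still has triples in WDQS")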

MisterSynergy (talkcontribs)

Something went wrong with this Structured Discussions thread. A post from earlier this evening is not displayed here properly, and it is missing from the topic history.

Reply to "changes missing in WDQS servers"
MisterSynergy (talkcontribs)

Hey Stas

There is an RDF type wikibase:GeoAutoPrecision used in WDQS that seems to be essentially undocumented everywhere I look (mediawiki.org, wikidata.org, Phabricator), except for this bit at wikiba.se itself. It seems to mark statement values of coordinate properties where the user did not specify a geo precision when the statement was added. WDQS always reports 2.7777777777778E-4 for wikibase:geoPrecision in such cases, which happens to affect ~100,000 items. From what I can see, this appears to be a technical legacy: the coordinates in question were all added in 2013. At some point the validator was probably changed so that coordinates without a user-specified precision were no longer accepted, probably due to phab:T55796.
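
For reference, a query sketch along these lines (run through Python and requests here; P625 is assumed as the example coordinate property) lists value nodes that carry the auto-assigned precision marker:

# Sketch: list coordinate values typed as wikibase:GeoAutoPrecision.
# Assumes Python 3 + requests; P625 is just an example property.
import requests

QUERY = """
SELECT ?item ?precision WHERE {
  ?item p:P625/psv:P625 ?valueNode .
  ?valueNode a wikibase:GeoAutoPrecision ;
             wikibase:geoPrecision ?precision .
} LIMIT 10
"""
r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"})
for row in r.json()["results"]["bindings"]:
    print(row["item"]["value"], row["precision"]["value"])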

The problem now is that these items are not editable via the API if no user-specified precision is provided, even when I am trying to modify something completely unrelated in these items. When editing with pywikibot, I receive the following error message:

WARNING: API error modification-failed: Missing required field "precision"
Edit to page Q854403 failed:
modification-failed: Missing required field "precision" [messages:[{u'html': {u'*': u'Missing required field "precision"'}, u'name': u'wikibase-validator-missing-field', u'parameters': [u'precision']}]; help:See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce for notice of API deprecations and breaking changes.]
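
A per-item workaround sketch (assuming pywikibot, and P625 as the affected coordinate property) would be to re-save the coordinate with an explicit precision before making the unrelated edit:

# Workaround sketch: re-save the coordinate with an explicit precision
# so that later, unrelated edits to the item can go through.
# Assumes pywikibot; Q854403 is the item from the error above.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, "Q854403")
item.get()

for claim in item.claims.get("P625", []):
    old = claim.getTarget()
    if old.precision is None:  # the auto-precision case
        new = pywikibot.Coordinate(lat=old.lat, lon=old.lon,
                                   precision=1.0 / 3600,  # = 2.777...e-4
                                   site=repo)
        claim.changeTarget(new, summary="add explicit coordinate precision")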

I am not sure what should be done here, thus I ask for your opinion:

  • Add documentation about wikibase:GeoAutoPrecision at appropriate places?
  • Tidy this whole thing up, with a bot run that adds "user-specified" precisions for all coordinates in question? This would eliminate wikibase:GeoAutoPrecision completely.
  • Change the validator?
  • …? (something else)

Thanks!

Smalyshev (WMF) (talkcontribs)

GeoAutoPrecision is set when the data, for whatever reason, does not have a precision set. You are right, it's a patch for legacy data; null precision is no longer allowed to be entered, and that's the right thing to do. Fixing the legacy data is of course welcome. I'm not sure a bot is the right tool for it, since the implied precision might differ from case to case, but I didn't research the subject in any detail, so maybe bot-fixing it is OK.

MisterSynergy (talkcontribs)

Thanks.

  • For a bot fix, one would have to seek community consensus anyway via Wikidata:Requests for permissions/Bot.
  • Coordinate precision is one of the least understood features here at Wikidata. Barely any editor knows what to enter there, and most just use the value suggested by the UI; for bots, there are different approaches to computing the value to enter as precision (one common heuristic is sketched below). Coordinate precision is in fact a broken feature anyway, and I don't know whether anyone is using it at all. AFAIR, there are some discussions about the problem in the WD:PC archives, on Phabricator, and maybe on some other pages.
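
One possible heuristic, as a sketch only (my own illustration, not an agreed-upon rule): derive the precision from the number of decimal places in the source coordinate string.

# Sketch of one heuristic: infer precision from decimal places.
def implied_precision(coord_str):
    """E.g. "52.3667" has four decimals, giving a precision of 0.0001."""
    if "." not in coord_str:
        return 1.0
    return 10.0 ** -len(coord_str.split(".")[1])

print(implied_precision("52.3667"))  # 0.0001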

As I am busy with another bot job (which surfaced this problem), I will not immediately propose a bot fix to the community, but it might be worth doing at some point in the future when I have more time.

Reply to "wikibase:GeoAutoPrecision"

Structured Data - early depicts testing

MediaWiki message delivery (talkcontribs)

The Structured Data on Commons development team has a very basic version of depicts statements available for early testing on Test-Commons. You can add basic depicts statements to the file page by going into the new “Structured Data” tab located below the "Open in Media Viewer" button. You can use the Latest Files link in the left sidebar to select existing images, or use the UploadWizard to upload new ones to test with (although those images won’t actually show up on the site). The test site is not a fully functional replica of Commons, so there may be some overall problems in using the site, but you should be able to get a general idea of what using the feature is like.

Early next week I will call for broad, community-wide testing of the feature similar to what we did for Captions, with instructions for testing, known bugs, and a dedicated space to discuss the feature as well as a simple help page for using statements. Until then, you're welcome to post on the SDC talk page with what you might find while testing depicts.

Thanks in advance for trying it out, you'll be hearing more from me next week. -- Keegan (WMF) (talk) 22:00, 21 March 2019 (UTC)

Reply to "Structured Data - early depicts testing"
Multichill (talkcontribs)
Smalyshev (WMF) (talkcontribs)
Multichill (talkcontribs)

The URI is http, not https, just like the Wikidata URI for this item is http://www.wikidata.org/entity/Q1455955. It's just a redirect, so for the human-centered formatter URL we send people straight to https. So both fields on VIAF ID (P214) are filled out correctly:

Is the wdtn: generating code accidentally using formatter URL (P1630) instead of formatter URI for RDF resource (P1921)?

Smalyshev (WMF) (talkcontribs)

Yeah, I see the difference, but I am not sure why that's the case. I think WDQS normalizes all VIAF URIs to the https version. I am not sure why the http version is declared for the RDF export; is there any reason for that?
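
The difference is easy to inspect by comparing the plain statement value with the normalized URI WDQS generates for it. A sketch (assuming Python 3 + requests), using the Q1455955 example from above:

# Sketch: compare the raw VIAF ID string with the normalized wdtn: URI.
# Q1455955 is the example item from this thread.
import requests

QUERY = """
SELECT ?viaf ?viafUri WHERE {
  wd:Q1455955 wdt:P214 ?viaf ;
              wdtn:P214 ?viafUri .
}
"""
r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"})
for row in r.json()["results"]["bindings"]:
    print(row["viaf"]["value"], "->", row["viafUri"]["value"])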

Multichill (talkcontribs)
Smalyshev (WMF) (talkcontribs)

I see. So I imagine that means we should normalize to http, not https. I'll see how easy it would be to make it work.

Multichill (talkcontribs)

No, you shouldn't normalize at all. Just generate the respective URLs based on what is set in the different formatter properties.

Smalyshev (WMF) (talkcontribs)

Not normalizing means we'd have two sets of URIs, http and https... not optimal. Also, it looks like changing this may require a full DB reload. If so, this may take time :(

Multichill (talkcontribs)

By definition the URI http://viaf.org/viaf/5853377 is not the same as https://viaf.org/viaf/5853377, so changing either of them is wrong. It's one or the other, so I don't understand why we would end up with two sets of URIs. Or do you mean one half with the incorrect https URIs and one half with the correct http URIs?

Probably time to file a bug for this in Phabricator, or did you already do that? Do you happen to know in which change this normalization was added?

See also Property talk:P1921#Incorrect URI's

Smalyshev (WMF) (talkcontribs)

By definition it's not the same, but people are not always supplying clean data according to the definitions, especially from before RDF formatter URIs were even established. The normalization is several years old and probably predates the "formatter URI for RDF resource" property itself.

Yes, we can make a Phabricator task and continue the discussion there; that'd probably be better.

Multichill (talkcontribs)
Smalyshev (WMF) (talkcontribs)

Not that I can find. Please feel free to create one and add me to the subscribers.

Multichill (talkcontribs)
Reply to "Where is the https coming from?"

Structured Data - development update, March 2019

MediaWiki message delivery (talkcontribs)

This text is also posted on the Structured Data hub talk page. You can reply there with questions, comments, or concerns.

A development update for the current work by the Structured Data on Commons team:

After the release of multilingual file captions, work began on getting depicts and other statements ready for release. These were originally scheduled for release in February and into March; however, there are currently two major blockers to finishing this work (T215642, T217157). We will know more next week about when depicts and statements are likely to be ready for testing and then release; until then I've tentatively updated the release schedule.

Once the depicts feature is ready for testing, it will take place in two stages on Test-Commons. The first is checking the very basics: is the design comfortable, does the simple workflow of adding/editing/removing statements work, and building up help and process pages from there. The second part is a more detailed test of depicts and other statements, checking edge-case examples of using the features, bugs that did not come up during simple testing, etc. Additionally, we'll be looking with the community for bugs in interactions with bots, gadgets, and other scripts once the features are live on Commons. Please let me know if you're interested in helping test and fix these bugs if they show up upon release; it is really hard to find them in a test environment or, in some cases, bugs won't show up in a testing environment at all.

One new thing is definitely coming within the next few weeks, pending testing: the ability to search for captions. This is done using the inlabel keyword in search strings, and will be the first step in helping users find content that is specifically structured data. I'll post a notice when that feature is live and ready for use.
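
(To illustrate the general shape, a search string like inlabel:sunset@en, where the search term and the @language suffix are hypothetical examples, would find files whose English caption contains "sunset", assuming the keyword behaves like other CirrusSearch label-search keywords.)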

Thanks, let me know if you have questions about these plans. Keegan (WMF) (talk) 21:34, 12 March 2019 (UTC)

Reply to "Structured Data - development update, March 2019"

Structured Data - file captions coming this week (January 2019)

MediaWiki message delivery (talkcontribs)

Hi all, following up on last month's announcement...

Multilingual file captions will be released this week, on either Wednesday, 9 January or Thursday, 10 January 2019. Captions are a feature to add short, translatable descriptions to files. Here are some links you might want to follow before the release, if you haven't already:

  1. Read over the help page for using captions - I wrote the page on mediawiki.org because captions are available for any MediaWiki user; feel free to host/modify a copy of the page here on Commons.
  2. Test out using captions on Beta Commons.
  3. Leave feedback about the test on the captions test talk page, if you have anything you'd like to say prior to release.

Additionally, there will be an IRC office hour on Thursday, 10 January with the Structured Data team to talk about file captions, as well as anything else the community may be interested in. Date/time conversion, as well as a link to join, are on Meta.

Thanks for your time, I look forward to seeing those who can make it to the IRC office hour on Thursday. -- Keegan (WMF) (talk) 20:22, 7 January 2019 (UTC)
Reply to "Structured Data - file captions coming this week (January 2019)"
MediaWiki message delivery (talkcontribs)

The Structured Data on Commons team has begun beta testing of the first feature, multilingual file captions, and all community members are invited to test it out. The captions feature is based on designs discussed with the community, and the team is looking forward to hearing about your testing. If all goes well during testing, captions will be turned on for Commons around the second week of January 2019.

Multilingual captions are plain text fields that provide brief, easily translatable details of a file in a way that is easy to create, edit, and curate. Captions are added during the upload process using the UploadWizard, or they can be added directly on any file page on Commons. Adding captions in multiple languages is a simple process that requires only a few steps.
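
(For those scripting uploads: under the hood, a caption is a label on the file's MediaInfo entity, so it can also be set through the Action API. A hedged sketch, assuming the standard wbsetlabel module accepts MediaInfo IDs and that you already have a logged-in session and CSRF token; the M-id and caption text below are made-up examples:)

# Sketch: set a file caption (MediaInfo label) via the Action API.
# Assumes an already logged-in requests.Session and a valid CSRF token.
import requests

API = "https://commons.wikimedia.org/w/api.php"
session = requests.Session()  # assumed logged in already
csrf_token = "..."            # from action=query&meta=tokens

r = session.post(API, data={
    "action": "wbsetlabel",
    "id": "M12345",  # MediaInfo ID: "M" + the file's page ID (example)
    "language": "en",
    "value": "A short caption describing the file",
    "token": csrf_token,
    "format": "json",
})
print(r.json())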

The details:

  • There is a help page available on how to use multilingual file captions.
  • Testing will take place on Beta Commons. If you don’t yet have an account set up there, you’ll need one.
  • Beta Commons is a testbed, and not configured exactly like the real Commons site, so expect to see some discrepancies with user interface (UI) elements like search.
  • Structured Data introduces the potential for many important page changes to happen at once, which could flood the recent changes list. Because of this, Enhanced Recent Changes is enabled as it currently is at Commons, but with some UI changes.
  • Feedback and commentary on the file caption functionality are welcome and encouraged on the discussion page for this post.
  • Some testing has already taken place and the team is aware of some issues. A list of known issues can be seen below.
  • If you discover a bug/issue that is not covered in the known issues, please file a ticket on Phabricator and tag it with the “Multimedia” tag. Use this link to file a new task already tagged with "Multimedia."

Known issues:

Thanks!

-- Keegan (WMF) (talk), for the Structured Data on Commons Team 20:43, 17 December 2018 (UTC)
Reply to "Multilingual captions beta testing"

Structured Data - copyright and licensing statements

MediaWiki message delivery (talkcontribs)
I've posted a second round of designs for modeling copyright and licensing in structured data. These redesigns are based on the feedback received in the first round of designs, and the development team is looking for more discussion. These designs are extremely important for the Commons community to review, as they deal with how copyright and licensing are translated from templates into structured form. I look forward to seeing you over there. -- Keegan (WMF) (talk) 16:25, 2 November 2018 (UTC)
Reply to "Structured Data - copyright and licensing statements"