
User:Magnus Manske/quick statements2


This is an informal brainstorming page to collect ideas for a version 2 of QuickStatements (aka QS). The underlying assumption here is a complete rewrite of the tool from scratch, allowing backwards compatibility with the current version without being limited by it. All constructive suggestions are welcome, no promises as to implementation!

See also the current QS issues on Bitbucket, which include feature requests that should probably be copied here to give a better overview.


PRE-ALPHA VERSION IS UP FOR TESTING

I have started implementing a new version here. It should be able to perform all the version 1 commands (see "import commands"). Co-developers welcome!

CAUTION! The code is mostly untested and under heavy development. It might nuke Wikidata and steal your lunch. Try it on the Sandbox item (Wikidata Sandbox (Q4115189)) only, for now.

Initialising

  • Keep import for tab-delimited (V1) but also allow for other paste/upload formats (CSV, JSON?) (issue)
  • Some form of API
  • "Upload" a list of commands to be done by a bot, to eliminate the necessity of keeping a browser window open/having continued internet access. This could be done through the proposed API. Would require an interface to emergency-stop QS bot edits for a single user (not blocking the entire bot).
  • Integration with the Wikipedia and Wikidata tools plugin for Chrome. Once the statements are prepared with the proper QIDs, push a button to launch QuickStatements, similar to the workflow with SourceMD and references. Andrawaag (talk) 18:27, 17 November 2016 (UTC)
    • Moving this here, as it is a way to initialise QuickStatements. This would be JS code in the Chrome plugin for the most part, right? --Magnus Manske (talk) 18:36, 17 November 2016 (UTC)
    • Fantastic idea, I'm all for this. Pinging @Tomayac: Spinster 💬 18:30, 20 November 2016 (UTC)
    • Seeing this only now. Please email me at tomac@google.com with concrete asks and details on this request, or open a GitHub issue on the project page.

Interface

  • STOP (or pause) button   Done
  • It should be possible to add multiple references to a statement. ProteinBoxBot is a good example of good references made up from multiple statements. ChristianKl (talk) 10:20, 17 November 2016 (UTC)
  • Make the UI a table view instead of a single text field. ChristianKl (talk) 10:18, 17 November 2016 (UTC)   Done
    • I would imagine a table at least as an intermediate page, but there needs to be a way to get a large number of commands in to populate the table. Maybe start with a table, but have an "import QS1 commands" option? --Magnus Manske (talk) 10:33, 17 November 2016 (UTC)
  • Multi-language interface via ToolTranslate   Done
  • "Fuzzy matching" mode (request). Would presumably trigger an intermediate interface step between uploading/submitting and executing.
  • Duplicate functionality from PetScan (same small set of operations, done on a lot of items)?
  • "Interactive mode": Show an intermediate table (row=item, column=operation), let the user edit the values. As part of this:
    • Query-style autocomplete for item/property names, replace with Q number.   Done
    • Columns with "show" values (e.g. for translation; show columns with English label, edit column with French label)
  • "third-party match": Give a list of values for property X, matched to values for property Y; get a list of items, with Y values filled in for the respective X
  • When a statement is not understood (typically because the user made a syntax error), output an error about it. Syntax highlighting would be wonderful but I guess it is too difficult to implement.
  • Not sure if it belongs in this tool, but it would be a really helpful addition to one of the tools if it automatically matched names of countries and other common terms to the Q numbers you need when importing data about existing items. John Cummings (talk) 21:57, 20 November 2016 (UTC)
  • The Pending / Done counter at the bottom of the page should have an error counter too. A stop-on-error option would also be nice. Zache (talk) 13:07, 14 February 2017 (UTC)
  • Doing the same action for many items more easily:

Instead of writing: Qx|Q17|Q801||Qa|Q17|Q801||Qb|Q17|Q801||Qc|Q17|Q801... ==> Qx,Qa,Qb|Q17|Q801. Possible? Thanks. Mikey641 (talk) 14:57, 2 April 2017 (UTC)
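The requested shorthand could be handled as a pre-processing step that expands the subject list before the commands reach the normal parser. A minimal Python sketch, assuming V2 syntax (`|` between fields, `||` between commands) and string values that never contain `|`; the comma-list syntax is only the proposal above, not an existing QS feature, and the example is written with a property ID (P17) in the second field.

```python
def expand_subjects(command: str) -> list[str]:
    """Expand 'Qx,Qa,Qb|P17|Q801' into one V2 command per subject.

    Hypothetical shorthand: only the first field may hold a comma-separated
    list; everything after the first '|' is copied unchanged.
    """
    subjects, sep, rest = command.partition("|")
    if not sep:
        raise ValueError(f"not a V2 command: {command!r}")
    return [f"{s.strip()}|{rest}" for s in subjects.split(",") if s.strip()]


def expand_batch(text: str) -> str:
    """Apply expand_subjects to every '||'-separated command and re-join."""
    commands = [c for c in text.split("||") if c.strip()]
    return "||".join(line for c in commands for line in expand_subjects(c))


if __name__ == "__main__":
    print(expand_batch("Qx,Qa,Qb|P17|Q801"))
    # Qx|P17|Q801||Qa|P17|Q801||Qb|P17|Q801
```

Commands without a comma in the first field pass through unchanged, so the expansion can run on every batch.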

Bugfixes for V1

  • Restarting web browser runs QuickStatements again (issue)   Done in V2
  • Better parsing of date/time
  • Quantities with a decimal point are extended with many spurious digits (for instance, 0.641 becomes 0.6410000000000000142108547152020037174224853515625). 123 (talk) 19:58, 9 April 2017 (UTC)
  • Using LAST as target of a claim works in QuickStatements v1 but not in QuickStatements v2. − Pintoch (talk) 16:09, 19 June 2017 (UTC)
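The long tail in the quantity bug looks like the exact decimal expansion of the binary float nearest to 0.641, which suggests the amount passes through a float somewhere before being written out. A minimal sketch of the usual fix, assuming the amount is kept as the user's original text: parse it as a decimal, not a float, and emit the signed amount string that Wikibase quantities use (e.g. "+0.641"). The function name is illustrative, not QS code.

```python
from decimal import Decimal, InvalidOperation


def quantity_amount(raw: str) -> str:
    """Turn the user's text (e.g. '0.641') into a signed amount string ('+0.641').

    Parsing the original text with Decimal keeps exactly the digits the user
    typed; going through a binary float first is what produces long tails.
    """
    try:
        value = Decimal(raw.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite number: {raw!r}")
    text = format(value, "f")  # fixed-point notation, no exponent
    return text if text.startswith("-") else "+" + text


if __name__ == "__main__":
    print(quantity_amount("0.641"))  # +0.641
    print(Decimal(0.641))            # exact value of the float: many extra digits
```

Keeping trailing zeros ("2.50") is deliberate in this sketch, since they can carry precision information.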

Requirements over current version

  • unit support (may already work, I don't remember...)
    • The NIOSH fork supports units in this way: 13.5U11573, where 13.5 is the numeric amount of the measurement and U11573 refers to metre (Q11573) as the unit.
  • edit properties
  • edit statements that link to properties
  • remove statements/references/qualifiers
  • remove items (admins only)
  • group multiple reference statements into a single reference on creation
    • "Adding 'retrieved' dates to references, which is not yet possible now" Spinster 💬 18:34, 20 November 2016 (UTC)
  • work on multiple Wikibase installations (Commons, Wiktionary, Librarybase)
  • support "external items" type (Commons, Wiktionary)
  • group as many edits as possible into a single Wikidata API action, to cut down on the number of edits made (RC flood, DB revisions etc.). For example, the existing QS PHP code already groups all statements for a CREATE into a single action (example).
  • adding descriptions without replacing the current ones
    • There is only one description per item/language, same as labels. Maybe a "do not overwrite" checkbox? Or "append when description exists"? --Magnus Manske (talk) 10:34, 17 November 2016 (UTC)
  • Precision option when adding latitude/longitude coordinate statements. Currently always sets precision to a millionth of a degree (±0.000001°) NavinoEvans (talk) 16:16, 21 November 2016 (UTC)
  • use properties as qualifier values
  • output item-ID in same line when outputting an error (and link the items):   Done
    Instead of: "ERROR (set_desc) : Item [[Q5|Q5]] already has label "Human" associated with language code en, using the same description text."
    Output: "Q2999999": ERROR (set_desc) : Item Q5 already has label "Human" associated with language code en, using the same description text."
  • The ability to set "unknown value" / "no value" settings for properties would be really useful - this was the one thing I felt was a real gap in v1. Andrew Gray (talk) 18:52, 10 December 2016 (UTC)
    • I totally agree, furthermore when setting the property Property:P1448, one should be able to set the language, as required for this property. Affom (talk) 12:49, 29 April 2017 (UTC)
  • Currently, there seems to be no log of the batches, nor entries in the user's contribution history (or did I miss it?) So it's a bit difficult to keep track of what someone did. Jneubert (talk) 09:25, 28 April 2017 (UTC)
  • It would be useful to have possibility to add ranks with QS. XXN, 12:29, 3 May 2017 (UTC)
  • Adding new versions of existing properties. At the moment, adding sources/qualifiers to a Pxxx:Qyyy pair just drops all the qualifiers into the existing pair. It would be great if we could force a new pair to be created with a new set of qualifiers; this would be really helpful for e.g. people who hold the same office on different occasions with different start/end dates. At the moment we need to edit these by hand. Andrew Gray (talk) 16:09, 11 June 2017 (UTC)
  • It should be easy to retrieve the Q-numbers for newly created items so they can be fed back into the original dataset. I could imagine a feature like the following:
    • Following the CREATE command users may on the same line provide a string that contains a unique ID from the original dataset.
    • Upon data ingestion, the QuickStatements Tool outputs a CSV containing the provided string along with the Q-number of the newly created item.
    • The user could then easily use the resulting CSV file to feed the Q-numbers of the newly created Wikidata items back into the original dataset.
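The CREATE-with-ID idea could be prototyped along these lines. A minimal sketch, assuming a hypothetical `CREATE<TAB>external-id` line syntax; `create_item` is a stand-in callback for whatever actually creates the item and returns its Q-number, not a real QS or Wikidata API function. Only the ID-to-Q-number bookkeeping is shown.

```python
import csv
import io
from typing import Callable, Iterable


def created_items_csv(lines: Iterable[str], create_item: Callable[[], str]) -> str:
    """Return CSV 'external_id,qid' for every 'CREATE<TAB>external-id' line.

    Hypothetical syntax: the text after CREATE is the user's own ID.
    create_item stands in for the real item creation and returns the new Q-number.
    Other commands (LAST ...) are skipped here; a real run would execute them too.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["external_id", "qid"])
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "CREATE":
            continue
        external_id = fields[1] if len(fields) > 1 else ""
        writer.writerow([external_id, create_item()])
    return out.getvalue()


if __name__ == "__main__":
    fake_ids = iter(["Q1001", "Q1002"])  # stand-in for real item creation
    batch = ["CREATE\tbook-17", "LAST\tP31\tQ571", "CREATE\tbook-18"]
    print(created_items_csv(batch, lambda: next(fake_ids)), end="")
```

The resulting CSV can be joined back onto the source dataset by `external_id`.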

Reconciliation

After importing some bibliographic items, I miss support for reconciliation (matching strings to items). How about this: the character ? before a string value indicates that the value must be reconciled manually. Example:

   LAST P50 ?"Larry Wall"
   LAST P123 ?"O'Reilly"

The user interface should ask the user to select an item via dropdown/autosuggest, e.g. Larry Wall (Q92597) and O’Reilly Media (Q1065097). Suggestions could be improved by limiting the list to items of the type required by the property, e.g. author (P50) has a value type constraint of human (Q5). But even without this optimization, reconciliation would be a great help.

Optional support for specifying the language of the string:

   LAST P17 ?de:"Deutschland"

-- JakobVoss (talk) 08:01, 17 August 2017 (UTC)
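A first step for the proposed `?` syntax would be finding the values that need manual matching before anything is executed. A minimal sketch, assuming the syntax exactly as proposed above (`?"string"` and `?lang:"string"`) and lines with subject, property and value separated by tabs or spaces; the names are illustrative. The candidate search itself (for example via the `wbsearchentities` API module) and the dropdown UI are left out.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class ReconcileRequest(NamedTuple):
    text: str                # the string the user wants matched to an item
    language: Optional[str]  # e.g. "de" for ?de:"Deutschland", else None


# Proposed syntax from above: ?"string" or ?lang:"string"
_TOKEN = re.compile(r'\?(?:(?P<lang>[a-z]{2,3}(?:-[a-z0-9]+)*):)?"(?P<text>.*)"')


def parse_value(token: str) -> Optional[ReconcileRequest]:
    """Return a ReconcileRequest for ?-prefixed values, None for ordinary ones."""
    m = _TOKEN.fullmatch(token.strip())
    if m is None:
        return None
    return ReconcileRequest(m.group("text"), m.group("lang"))


def pending_reconciliations(
    lines: Iterable[str],
) -> Iterator[Tuple[int, str, ReconcileRequest]]:
    """Yield (line number, property, request) for every value the UI must resolve."""
    for number, line in enumerate(lines, start=1):
        parts = line.split(None, 2)  # subject, property, value (tabs or spaces)
        if len(parts) != 3:
            continue
        request = parse_value(parts[2])
        if request is not None:
            yield number, parts[1], request
```

Knowing the property of each pending value is what would allow the suggested filtering by value type constraint.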

Avoid duplication

Could the tool give a warning when someone tries to create a new item that already exists in Wikidata? I am thinking especially of publications, where an identical title may indicate that the item already exists. --Zuphilip (talk) 12:14, 8 June 2017 (UTC)

To do so, QuickStatements would need to check, for selected LAST ... statements, whether an item with this statement already exists. How do we indicate which statement should be used to check for duplicates? Maybe check properties that have a distinct values constraint (Q21502410)? -- JakobVoss (talk) 08:01, 17 August 2017 (UTC)
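For a property with a distinct values constraint, the check could be a single query against the Wikidata Query Service. A minimal sketch, assuming a plain string value such as a DOI (P356); the function names are illustrative and the network request itself is not shown. Monolingual-text properties such as titles would need a language-tagged literal, which this sketch does not handle.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def duplicate_check_query(prop: str, value: str, limit: int = 10) -> str:
    """SPARQL listing items that already have prop = value (plain string values)."""
    if not re.fullmatch(r"P\d+", prop):
        raise ValueError(f"not a property ID: {prop!r}")
    # escape backslashes, quotes and newlines for a SPARQL string literal
    literal = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'SELECT ?item WHERE {{ ?item wdt:{prop} "{literal}" . }} LIMIT {limit}'


def duplicate_check_url(prop: str, value: str) -> str:
    """GET URL for the query service; any returned ?item is a possible duplicate."""
    params = {"query": duplicate_check_query(prop, value), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(duplicate_check_query("P356", "10.1000/XYZ123"))
```

An empty result set would mean no existing item has that value, so the CREATE can go ahead without a warning.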

Thoughts on Code and implementation

  • Original QS uses JavaScript and WiDaR
  • A QS PHP class (source) which uses its own bot exists; could become the backend for QS2?
  • Suggestion: an intermediate format (JSON) that other input formats are converted to. The JSON would then be used to run, store etc. It could also be submitted directly (API). It could use Wikidata API JSON to create/edit, but likely needs to be more flexible (e.g. adding to an existing statement if the property/value combination already exists, as QS currently does)
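Converting V1 input into the intermediate JSON could look roughly like this. A minimal sketch, assuming the command objects from the Internal JSON format section of this page and Wikidata API datavalue notation for items and strings; only the first three tab-separated columns are handled (statements, labels/descriptions/aliases, sitelinks), and qualifiers, sources and other value types are left out.

```python
import json
import re


def to_datavalue(raw: str) -> dict:
    """Map a V1 value to a datavalue in Wikidata API notation (items and strings only)."""
    if re.fullmatch(r"Q\d+", raw):
        return {"type": "wikibase-entityid",
                "value": {"entity-type": "item", "numeric-id": int(raw[1:]), "id": raw}}
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return {"type": "string", "value": raw[1:-1]}
    raise ValueError(f"value type not covered by this sketch: {raw!r}")


_TERMS = {"L": "label", "D": "description", "A": "alias"}


def v1_line_to_command(line: str) -> dict:
    """Convert the first three columns of a tab-separated V1 line; the rest is ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields: {line!r}")
    item, key, value = fields[:3]
    if re.fullmatch(r"P\d+", key):
        return {"action": "add", "what": "statement", "item": item,
                "property": key, "value": to_datavalue(value)}
    if re.fullmatch(r"[LDA][a-z-]+", key):  # Len, Dde, Aen, ...
        return {"action": "add", "what": _TERMS[key[0]], "item": item,
                "language": key[1:], "value": value.strip('"')}
    if re.fullmatch(r"S[a-z_]+", key):  # Senwiki, ... (S + digits would be a source)
        return {"action": "add", "what": "sitelink", "item": item,
                "site": key[1:], "value": value.strip('"')}
    raise ValueError(f"unknown V1 column: {key!r}")


if __name__ == "__main__":
    print(json.dumps(v1_line_to_command("Q4115189\tP31\tQ5")))
```

Once every input format produces these dicts, running, storing and reviewing commands only has to deal with one representation.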

Model

First attempt at modelling the data flow.

  • A "command" is an atomic unit, representing one change/edit, when seen on its own.
  • Multiple commands can be grouped together by the user (e.g. multiple reference statements as one reference "group").
  • Multiple commands can be grouped together automatically for execution on Wikidata, if technically feasible (e.g. all statements for a CREATE command).
  • Server code can execute a single command, and report back on success. Accessible via API.
  • Web interface (JS) can fire individual commands against the API, mark commands in interface on success/failure. This would be running from the browser, as QS1 is doing.
    • Should those be logged in QS2 database? Review/rollback?
  • API can take a command batch, store the individual commands in a database associated with the batch, and run them from the server as a bot user, adding user name and batch ID in the edit comment.
  • Web interface can also submit a batch via API.
  • Commands stored in the database can be used for review/rollback.
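The batch part of this model can be sketched with two tables: one row per batch and one row per command with its status. A minimal sketch using SQLite, assuming hypothetical table and column names and a hypothetical edit-summary format; `execute` is a stand-in callback for the code that actually edits Wikidata as the bot user.

```python
import json
import sqlite3
from typing import Callable, Iterable

SCHEMA = """
CREATE TABLE batch   (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE command (id INTEGER PRIMARY KEY,
                      batch_id INTEGER NOT NULL REFERENCES batch(id),
                      payload TEXT NOT NULL,               -- one command as JSON
                      status TEXT NOT NULL DEFAULT 'INIT', -- INIT / DONE / ERROR
                      message TEXT);                       -- error text, kept for review
"""


def submit_batch(db: sqlite3.Connection, username: str, commands: Iterable[dict]) -> int:
    """Store a batch and its commands; nothing is executed yet."""
    batch_id = db.execute("INSERT INTO batch (username) VALUES (?)", (username,)).lastrowid
    db.executemany("INSERT INTO command (batch_id, payload) VALUES (?, ?)",
                   [(batch_id, json.dumps(c)) for c in commands])
    db.commit()
    return batch_id


def run_batch(db: sqlite3.Connection, batch_id: int,
              execute: Callable[[dict, str], None]) -> None:
    """Run pending commands in order; execute(command, summary) does the actual edit."""
    (username,) = db.execute("SELECT username FROM batch WHERE id = ?",
                             (batch_id,)).fetchone()
    summary = f"batch #{batch_id} by [[User:{username}]]"  # hypothetical summary format
    pending = db.execute("SELECT id, payload FROM command "
                         "WHERE batch_id = ? AND status = 'INIT' ORDER BY id",
                         (batch_id,)).fetchall()
    for command_id, payload in pending:
        try:
            execute(json.loads(payload), summary)
            status, message = "DONE", None
        except Exception as exc:  # record any failure and carry on with the batch
            status, message = "ERROR", str(exc)
        db.execute("UPDATE command SET status = ?, message = ? WHERE id = ?",
                   (status, message, command_id))
        db.commit()
```

Because only INIT commands are picked up, a stopped or crashed run can be restarted without repeating finished edits, and the stored status/message rows are what a review or rollback view would read.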

Internal JSON format

Each command is a JSON object. Qxxx is an item, Pxxx is a property, PQxxx is either. {datavalue} is a JSON object representing a datavalue in Wikidata API notation.

Adding a statement   Done
{"action":"add","what":"statement","item":"PQxxx","property":"Pxxx","value":{datavalue}}
Adding a qualifier   Done
{"action":"add","what":"qualifier","item":"PQxxx","property":"Pxxx","value":{datavalue},"qualifier":{datavalue_qualifier}}
requires either a statement as above, or a statement ID as "id":"...". {datavalue_qualifier} is the datavalue of the qualifier, in Wikidata API notation.
Adding sources   Done
{"action":"add","what":"sources","item":"PQxxx","property":"Pxxx","value":{datavalue},"sources":[{datavalue_source},...]}
requires either a statement as above, or a statement ID as "id":"...". [{datavalue_source},...] is an array of source datavalues, in Wikidata API notation.
Adding a label/description/alias   Done
{"action":"add","what":"label","item":"PQxxx","language":"ISO code","value":"The new label"}
what can be label, description, or alias; alias adds to the existing aliases, while label and description replace the current value
Adding a sitelink   Done
{"action":"add","what":"sitelink","item":"PQxxx","site":"enwiki","value":"Wiki page name"}
Remove statement/sources/qualifier
Use "action":"remove" on any of the above to remove. A value may be "any" to remove all instances.
Create a new item   Done
{"action":"create","type":"item"}
type can be item or property
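As an example of a {datavalue} that is not a plain item or string, the unit syntax of the NIOSH fork (13.5U11573) could be turned into an "add statement" command like this. A minimal sketch, assuming the Wikibase JSON quantity shape (signed amount string, unit as an entity URI, "1" for no unit, optional bounds omitted); the function names are illustrative.

```python
import re

# NIOSH-fork style token: amount, optionally followed by U<numeric item ID of the unit>
_QUANTITY = re.compile(r"(?P<amount>[+-]?\d+(?:\.\d+)?)(?:U(?P<unit>\d+))?")


def quantity_datavalue(token: str) -> dict:
    """'13.5U11573' -> quantity datavalue of 13.5 with unit metre (Q11573)."""
    m = _QUANTITY.fullmatch(token.strip())
    if m is None:
        raise ValueError(f"not a quantity: {token!r}")
    amount = m.group("amount")
    if amount[0] not in "+-":
        amount = "+" + amount  # Wikibase amounts carry an explicit sign
    unit = m.group("unit")
    return {"type": "quantity",
            "value": {"amount": amount,
                      # "1" is the Wikibase convention for "no unit"
                      "unit": f"http://www.wikidata.org/entity/Q{unit}" if unit else "1"}}


def add_quantity_command(item: str, prop: str, token: str) -> dict:
    """Build an 'add statement' command in the internal JSON format above."""
    return {"action": "add", "what": "statement", "item": item,
            "property": prop, "value": quantity_datavalue(token)}


if __name__ == "__main__":
    print(add_quantity_command("Q4115189", "P2048", "13.5U11573"))
```

The amount is copied as text rather than converted to a float, so no extra digits can appear.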

Documentation for v2

While the first version of the tool had a good explanation, the UI for QuickStatements2 does not offer any hint about how to create a command. How about a short example showing how to write and run a sample command? The only way I can find to enter a command is the "Import from v1" option; I can't believe it is the only way, right? Or is it? Thank you! Syced (talk) 09:20, 7 September 2017 (UTC)

"Import from v1" has the only documentation I can find. The tool is also backward compatible with the older version, which is well documented. --Jarekt (talk) 11:41, 7 September 2017 (UTC)
Same question as Syced, is "Import from v1" the only option? (it's the only one that I used).
Jarekt, the documentation for v1 is quite good but could be improved. For instance, it doesn't mention subtleties like the calendar for dates, ranks or badges (I guess these are not possible, but ideally that should be stated clearly). In the end, I think that specific documentation for v2 is necessary, and maybe a central place for asking "How can I do this?" would be welcome (instead of asking on Project chat, on talk pages, on Twitter or on IRC, where the discussions get lost).
Cdlt, VIGNERON (talk) 17:46, 3 October 2017 (UTC)
Maybe we should create an unofficial project page Help:Quick statements where we would copy the current V1 documentation and start expanding it with what we know about V2. --Jarekt (talk) 17:55, 3 October 2017 (UTC)

I started a documentation page: Help:QuickStatements. Feel free to enrich it, thanks! Syced (talk) 04:05, 4 October 2017 (UTC)

Syced, Great. I am working on expanding it. --Jarekt (talk) 13:21, 4 October 2017 (UTC)
Thanks! I agree https://tools.wmflabs.org/wikidata-todo/quick_statements.php is gentler on new users. I had forgotten the exact syntax so I had no clue what to do on the new version, but after I checked the old documentation it's working great. :) Nemo 08:23, 17 October 2017 (UTC)

Upload report for batch uploads

Thank you for creating such a useful tool. I am using it to write descriptions in Malayalam. I have some suggestions for improving the tool:

  • For batch uploads done using 'Run in background' option, it will be nice to have a 'batch report' that explains the errors in detail. As of now, the user has to choose 'Your last batches' and can only see the number of successful uploads and errors. It will be nice to have details about the errors shown in the 'batch report' so that the user knows which items had errors and why.
  • When I tried to upload 40,000 descriptions in one go, I could not run it either by using commands or by using batches. Is there a limit to the maximum number of uploads that can be done at a particular point of time? If yes, what is the limit and why do we have it? Netha Hussain (talk) 08:01, 22 September 2017 (UTC)

I want to back this request by Netha Hussain. I think there is indeed a need for a better report feature for uploads via QuickStatements 1 & 2. Has this already been considered? The following overviews would really help keep Wikidata clean:

  • clear overview of items that gave errors
  • a structured list of all new items you have just created using LAST (just looking at your user contributions doesn't help as you can not see the edits made via User:QuickStatementsBot).

@Magnus Manske: I would really like to know your opinion on that. Thank you! - Alina data (talk) 14:26, 21 February 2018 (UTC)
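Both requested overviews can be derived from per-command results, assuming the tool keeps each command's item, status and error message (as in the Model section of this page). A minimal sketch with a hypothetical `CommandResult` record; grouping errors by message makes systematic problems (e.g. many "already has label" conflicts) visible at a glance.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class CommandResult(NamedTuple):
    line: int          # position of the command in the uploaded batch
    item: str          # item the command edited or created
    action: str        # e.g. "create", "add"
    status: str        # "DONE" or "ERROR"
    message: str = ""  # error text, empty on success


def batch_report(results: Iterable[CommandResult]) -> str:
    """Plain-text report: created items first, then errors grouped by message."""
    created = []
    errors = defaultdict(list)
    for r in results:
        if r.status == "ERROR":
            errors[r.message].append(f"{r.item} (line {r.line})")
        elif r.action == "create":
            created.append(r.item)
    lines = [f"{len(created)} item(s) created: " + ", ".join(created)]
    # most frequent error first, so systematic problems stand out
    for message, places in sorted(errors.items(), key=lambda kv: -len(kv[1])):
        lines.append(f"{len(places)} x {message}: " + ", ".join(places))
    return "\n".join(lines)
```

The created-items line covers the LAST case as well, since the bot's contributions do not show up in the user's own history.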

Removing a property does not seem to work

I am trying to remove a property from an item.

First I clicked "Import commands" then "Version 1 format" then I entered the following, which contains a proper TAB character:

-Q5332409	P571

Upon clicking "Import", nothing happens. Just "No data available in table".

Second I read in the documentation that Remove is a V2-only feature, so I decided to try the V2 syntax, which is apparently:

Q5332409|P571

There is nowhere to import this command, the Import command menu is only about V1, strangely. So I tried running it via URL: https://tools.wmflabs.org/quickstatements/#v2=-Q5332409%7CP571 but that does not work either, only "No data available in table" is shown.

Has anyone managed to remove a property? If yes, how?

Thank you! Syced (talk) 06:20, 6 November 2017 (UTC)

Syced, I was just removing some properties with lines like -Q33169363|P373|"Aoike-chō, Nagoya" and it worked just fine. My guess is that you still need to give the value of the property you are removing, which makes sense in case there are several. That is also what the documentation suggests. --Jarekt (talk) 18:36, 6 November 2017 (UTC)
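Following Jarekt's observation, a removal command carries the leading "-" and the value being removed, and must be percent-encoded when passed through the `#v2=` URL used in the report above. A minimal sketch; the function names are illustrative, and whether several `||`-joined commands are accepted in one URL fragment is an assumption.

```python
from typing import Iterable
from urllib.parse import quote

# Base URL taken from the report above
QS_V2_URL = "https://tools.wmflabs.org/quickstatements/#v2="


def removal_command(item: str, prop: str, value: str) -> str:
    """V2 removal line: leading '-', and the value to remove must be given."""
    return f"-{item}|{prop}|{value}"


def quickstatements_url(commands: Iterable[str]) -> str:
    """Percent-encode V2 commands ('|' -> %7C, non-ASCII as UTF-8) into the URL fragment."""
    return QS_V2_URL + quote("||".join(commands), safe="")


if __name__ == "__main__":
    cmd = removal_command("Q33169363", "P373", '"Aoike-chō, Nagoya"')
    print(cmd)
    print(quickstatements_url([cmd]))
```

For a date value such as P571, the value would have to be written in QS date syntax, matching the statement exactly.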

Add multiple sources in one claim

Hi. I'd like to add two sources to one claim, but I cannot do that in the new QuickStatements V2 interface. The code below works fine in the older interface: LAST|P5205|14|S143|Q33109119||LAST|P5205|14|S854|"http://kanji.jitenon.jp/kanjid/1895.html", but in the new one it creates two duplicate claims (https://www.wikidata.org/w/index.php?title=Q55406214&oldid=707356668). This does not happen when no item is created (https://www.wikidata.org/w/index.php?title=Q12174424&type=revision&diff=707251396&oldid=707250957). --Okkn (talk) 06:31, 7 July 2018 (UTC)
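A possible workaround until this is fixed is to merge commands that repeat the same subject|property|value into one line, so the claim is only added once. A minimal sketch, assuming that QS attaches all source pairs on one line to the single statement it adds (not confirmed on this page) and that string values contain no `|`; grouping restarts at each CREATE because LAST then refers to a different item.

```python
def merge_sources(lines: list[str]) -> list[str]:
    """Merge V2 commands that repeat the same subject|property|value.

    Extra pairs (S143|Q..., S854|"...") are appended to the first occurrence,
    so they end up on one command line. Exact duplicates without extra pairs
    are dropped.
    """
    merged: list[str] = []
    first_seen: dict[tuple[str, ...], int] = {}
    for line in lines:
        if line.strip() == "CREATE":
            first_seen.clear()  # LAST now points to a new item
            merged.append(line)
            continue
        fields = line.split("|")
        key, extra = tuple(fields[:3]), fields[3:]
        if key in first_seen:
            if extra:
                merged[first_seen[key]] += "|" + "|".join(extra)
        else:
            first_seen[key] = len(merged)
            merged.append(line)
    return merged


if __name__ == "__main__":
    batch = ["CREATE", "LAST|P5205|14|S143|Q33109119",
             'LAST|P5205|14|S854|"http://kanji.jitenon.jp/kanjid/1895.html"']
    print("||".join(merge_sources(batch)))
```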

Ability to export QS batch to tab/csv

As a use case ... my batch has completed. It has 89 errors. I want to download the errors in the same (tab, csv) format as was used to upload them, so that I can work through them to fix issues. (Sure I could work from the error list in the QS interface, but I cannot, for instance, track my progress as I wade through issues, and I have 30 batches each with ~50 errors per batch & am bear of small brain.) So. I'm thinking it would be good to be able to extract the contents of a batch, perhaps suffixed by a #status. thx --Tagishsimon (talk) 08:12, 17 July 2018 (UTC)

Would also be useful to be able to view errors for a QS temporary 'browser-based' batch when viewing the batch through the discuss/revert link - cf. https://tools.wmflabs.org/editgroups/b/QSv2T/1533259904195/ - albeit I appreciate right now you likely do not store the data? Errors tend to be the most interesting thing for me (playing with labels & descriptions) as they often point to duplicate items. But I lack the time, now, to fix those from my last batch, and lack the discipline to keep a browser tab open until I do have time, days or weeks later. --Tagishsimon (talk) 11:50, 3 August 2018 (UTC)
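The requested export could keep each original V1 command and append the outcome as a final `#status` column, so errors can be worked through offline and fixed lines re-imported. A minimal sketch, assuming the tool stores (command, status, message) per command; the `#status` column is the suggestion above, not an existing QS format, so a helper strips it again before re-import.

```python
from typing import Iterable, Tuple


def export_batch(rows: Iterable[Tuple[str, str, str]], only_errors: bool = False) -> str:
    """rows: (original V1 command, status, message). Returns tab-separated text."""
    out = []
    for command, status, message in rows:
        if only_errors and status != "ERROR":
            continue
        note = f"#{status}"
        if message:
            # collapse whitespace so the note stays one tab-free last column
            note += ": " + " ".join(message.split())
        out.append(f"{command}\t{note}")
    return "".join(line + "\n" for line in out)


def strip_status(line: str) -> str:
    """Drop the trailing '#status' column before re-importing a fixed line."""
    command, sep, _ = line.rstrip("\n").rpartition("\t#")
    return command if sep else line.rstrip("\n")
```

Since the note never contains a tab, the last tab-plus-`#` in a line always marks where the status column starts.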

Petscan to QS

Hello Magnus,

I don't understand why this Petscan https://petscan.wmflabs.org/?psid=5146689&al_commands=-P50%3AQ8254925%0AP50%3AQ4233718%0A when I click on "Start QS" and run, gives errors for removal of erroneous value, while still adding the new value. In Petscan, it runs OK, so it's not a syntax problem.

Can you please explain what I am doing wrong? Thanks for your help. Hsarrazin (talk) 23:42, 24 July 2018 (UTC)

PS: I noticed that in the REMOVE action, the QID lacks the Q at the beginning. Could this be the cause? Hsarrazin (talk) 13:03, 25 July 2018 (UTC)
I saw today that it is now fixed... Thanks ! \o/
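The missing "Q" noticed above is the kind of slip that a conversion step can guard against. A minimal sketch of turning PetScan's `al_commands` (from the URL above) into V1 lines, normalising item IDs so a bare number never reaches a removal command; how PetScan hands item IDs to QS internally is not documented here, so the names and the normalisation are illustrative.

```python
import re
from urllib.parse import parse_qs, urlparse

_PETSCAN_COMMAND = re.compile(r"(?P<remove>-)?(?P<prop>P\d+):(?P<value>Q\d+)")


def petscan_commands(url: str) -> list[str]:
    """Read the al_commands parameter (one command per line) from a PetScan URL."""
    values = parse_qs(urlparse(url).query).get("al_commands", [""])
    return [c.strip() for c in values[0].splitlines() if c.strip()]


def petscan_to_qs(items: list[str], commands: list[str]) -> list[str]:
    """Expand 'P50:Qn' / '-P50:Qn' commands into tab-separated V1 lines per item."""
    lines = []
    for item in items:
        qid = item if item.startswith("Q") else "Q" + item  # never emit a bare number
        for command in commands:
            m = _PETSCAN_COMMAND.fullmatch(command)
            if m is None:
                raise ValueError(f"unsupported command: {command!r}")
            prefix = "-" if m.group("remove") else ""
            lines.append(f"{prefix}{qid}\t{m.group('prop')}\t{m.group('value')}")
    return lines
```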

Can't submit literals in CSV format?

Hi @Magnus Manske: thanks for this great tool!

But how can we submit literals (strings) using the new CSV format? Its documentation Help:QuickStatements#CSV_file_syntax says "The double quotes for string values seem to interfere with CSV syntax. Empirically four double quotes before and one after the string have been found to work" but

  • that works only for one literal on the line, the second literal fails
  • you may agree that's a rather baroque way of denoting strings

Is it possible for you to use TSV as the basic format, and then hopefully quotes will make it through unmangled? Otherwise the utility of the CSV format is very limited, and we'll have to stick to the v1 format --Vladimir Alexiev (talk) 16:06, 19 November 2018 (UTC)
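For comparison, standard CSV quoting (RFC 4180 style, as implemented by Python's csv module) handles any number of string literals per row: a field whose content is "Larry Wall" including its QS quotes is written with three quotes on each side. Whether the QS CSV importer decodes fields this way is an assumption, not something confirmed on this page (the Help page reports the four-before, one-after workaround instead). The sketch also shows the TSV alternative, where the quotes pass through untouched; header rows are omitted and the function names are illustrative.

```python
import csv
import io


def qs_string(text: str) -> str:
    """QS string literal: the double quotes are part of the value."""
    return f'"{text}"'


def csv_row(fields: list[str]) -> str:
    """One row with standard CSV quoting (embedded quotes are doubled)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def tsv_row(fields: list[str]) -> str:
    """V1-style tab-separated row: quotes pass through untouched."""
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("tab or newline inside a field")
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    row = ["Q4115189", qs_string("Larry Wall"), qs_string("O'Reilly")]
    print(csv_row(row), end="")  # Q4115189,"""Larry Wall""","""O'Reilly"""
    print(tsv_row(row), end="")
```

Reading the CSV row back with any conforming CSV parser yields the original fields, quotes included, which is what a TSV-free importer would need.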

QuickStatements for Commons

Structured data is being rolled out on Commons, for now only with labels. Would it be possible to enable QuickStatements for Commons? (Ping for @Keegan (WMF): who knows about possible technical issues) --GPSLeo (talk) 09:43, 11 January 2019 (UTC)