User talk:Vladimir Alexiev/Archive 1
This page is an archive. Do not edit the contents of this page. Please direct any additional comments to the current talk page.
See also User talk:Vladimir Alexiev/Archive
VIAF - update
Hi! I've noticed the beautiful tables you posted in Wikidata:WikiProject Authority control#VIAF Games. Would it be possible to update them in the future? The last import of 570k VIAF identifiers (an increase of over 30%) might have changed the situation since a year ago. Thank you very much, --Epìdosis 17:02, 24 November 2019 (UTC)
- @Epìdosis: Updated number of VIAF links in WD.
- Re-counted 2019-11 with viaf-links-count.pl, see this gist. Tightened the counts: duplicated IDs are not counted twice (see the gist)
- I also tried counting WD external-ids with SPARQL (.rq) but those queries timed out. HELP NEEDED.
- Below are the new VIAF link stats but I don't have the time to merge them in Wikidata:WikiProject_Authority_control#VIAF_Links_per_Source. Could someone help with that?
- You will notice some new types of ID. I've created two proposals but we need to explore the rest and make more proposals.
Notified participants of WikiProject Authority control
Help needed. Cheers! --Vladimir Alexiev (talk) 16:28, 28 November 2019 (UTC)
Count Source
16121 ARBABN
377331 B2Q
376803 BAV
1824982 BIBSYS
300092 BLBNB
185349 BNC
206029 BNCHL
666416 BNE
2498551 BNF
29007 BNL
647483 CAOONL
95020 CYT
181601 DBC
111686 DE663
8451411 DNB
54624 EGAXA
69054 ERRR
180762 FAST
151623 GeoNames
178348 ICCU
9991 IMAGINE
8535276 ISNI
8454581 Identities-lccn
11464602 Identities-viaf
242578 JPG
594543 KRNLK
10560894 LC
747513 LIH
210682 LNB
15879 LNL
8569 MRBNR
228054 N6I
1144734 NDL
1735345 NII
925678 NKC
1119857 NLA
3940 NLB
1902439 NLI
159825 NLR
547775 NSK
33727 NSZL
2735696 NTA
1875069 NUKAT
87316 ORCID
1228 PERSEUS
1619777 PLWABN
440339 PTBNP
2194878 RERO
220157 SELIBR
61360 SIMACOB
51717 SKMASNL
209 SRP
3528681 SUDOC
138138 SZ
85957 UIY
12342 VLACC
143730 W2Z
1951619 WKP
8012817 Wikipedia
1518 XA
2142739 XR
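For anyone who wants to reproduce counts like these without the Perl script, here is a minimal Python sketch. It is not viaf-links-count.pl; it assumes an input where each line holds a VIAF URI, a tab, and a `SOURCE|id` value, and it reads "duplicated IDs are not counted twice" as "each distinct (VIAF URI, source ID) pair counts once per source". Both the line format and that reading are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_links_per_source(lines: Iterable[str]) -> Dict[str, int]:
    """Count distinct (VIAF URI, source ID) pairs per source code."""
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        viaf_uri, link = line.split("\t", 1)
        # Values without a "SOURCE|id" shape are ignored in this sketch.
        source, sep, source_id = link.partition("|")
        if not sep:
            continue
        seen[source].add((viaf_uri, source_id))
    return {source: len(pairs) for source, pairs in sorted(seen.items())}


# Made-up sample lines in the assumed format.
sample = [
    "http://viaf.org/viaf/1\tDNB|100",
    "http://viaf.org/viaf/1\tDNB|100",  # exact duplicate, counted once
    "http://viaf.org/viaf/1\tLC|n 1",
    "http://viaf.org/viaf/2\tDNB|200",
]
for source, count in count_links_per_source(sample).items():
    print(count, source)
```

For a real dump, the same function can be fed an open file object line by line; memory use grows with the number of distinct pairs, so a full VIAF file may need a different approach.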
Worldcat Identities
I'm inserting 1.7M Worldcat Identities links. I've added them as 85 batches of 40k statements each (https://tools.wmflabs.org/quickstatements/#/batches/Vladimir%20Alexiev), and it will take some months for them to trickle through QS.
Related:
- https://www.wikidata.org/wiki/Topic:Vfnr6v8lfqagpi7g "how to add 1.7M claims with QS?" discussion with @Magnus_Manske:
- User_talk:Florentyna#Worldcat_Identity, where Florentyna disputes one link based on the quality of the target page.
- https://github.com/maxlath/wikibase-cli/issues/62 discusses the possibility of enabling wikibase-cli to consume the QS format @Maxlath, JakobVoss:
--Vladimir Alexiev (talk) 10:34, 6 February 2020 (UTC)
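As an illustration of the batching described above, the following Python sketch turns (item, identity) pairs into tab-separated QuickStatements V1 lines and splits them into fixed-size batches. The property ID `P7859`, the sample item and identity values, and all function names are assumptions for illustration only and should be checked before any real upload. The last helper shows that 85 batches of 40k is consistent with 1.7M links if each link costs two commands (property plus source), as mentioned in the "Bot vs QS" thread.

```python
import math
from typing import Iterable, Iterator, List

# Assumed property ID for WorldCat Identities; verify before use.
WORLDCAT_IDENTITIES_PROP = "P7859"


def to_qs_line(qid: str, identity: str) -> str:
    """Build one tab-separated QuickStatements V1 line with a string value."""
    return f'{qid}\t{WORLDCAT_IDENTITIES_PROP}\t"{identity}"'


def split_into_batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` lines."""
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def batch_count(links: int, commands_per_link: int, batch_size: int) -> int:
    """Number of batches needed for the given number of links."""
    return math.ceil(links * commands_per_link / batch_size)


# Hypothetical sample pairs (sandbox item, made-up identities).
pairs = [("Q4115189", "viaf-0000001"), ("Q4115189", "viaf-0000002"),
         ("Q4115189", "viaf-0000003")]
for number, batch in enumerate(split_into_batches(
        (to_qs_line(q, i) for q, i in pairs), size=2), start=1):
    print(f"batch {number}: {len(batch)} lines")

print(batch_count(1_700_000, 2, 40_000))  # 85
```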
Bot vs QS
Please check batch https://tools.wmflabs.org/quickstatements/#/batch/25559 in QuickStatements. It stopped after editing 522 items and then changed its status to finished. I think you need to start this batch again. -- Hogü-456 (talk) 19:09, 4 February 2020 (UTC)
- Thanks for notifying me, as I plan to run about 200 more such batches. It is running now and making progress. I did not restart it, so it was some WD-internal thing. Vladimir Alexiev (talk) 01:52, 5 February 2020 (UTC)
- I asked at Wikidata:Bot requests whether a bot could add property pairs to Wikidata from a QuickStatements-like format. I think this would be faster than QuickStatements. I don't know how high the capacity of QuickStatements is, but I think it is lower than what a bot can add.
- I agree with you but I don't know how to do it with a bot. I won't hold my breath about someone making a bot who can eat QS, but see https://github.com/maxlath/wikibase-cli/issues/62
- A bot can add more things at the same time, which is not always possible in QuickStatements. Maybe you can consider using a bot, since the number of batches you want to upload is so high. I think a bot would need 1.5 months to add them. With QuickStatements you would probably need 5 months.
- Can you suggest a bot that I can use?
- Because there are also other users who want to upload something, and there are 2 commands if you add the property and the source. These are some numbers for your information. Please take care not to upload too many batches in one day.
- Why not? They are all posted, but only 2 of them are moving (at the snail speed of 30 sec per claim). They are not blocking batches by other people
- I think QuickStatements can have problems handling such a large amount, but I am not sure whether this was the reason for the problems with the tool last year. Usually you can't do more than 100,000 edits in one day. In the end, it is a good thing that it exists, and most of the time it works well. -- Hogü-456 (talk) 20:56, 5 February 2020 (UTC)
- If QS can't add 1.7M that's a serious problem in QS that needs fixing. In a normal RDF repo, I could add that number with SPARQL Update in a few seconds --Vladimir Alexiev (talk) 10:34, 6 February 2020 (UTC)
- Your comments are interesting, especially the last point about the time needed to add that number. Maybe my information about how the batches run is wrong. I thought they followed a first-in, first-out principle, which would have been a problem, and some suggestions on my talk page seem to confirm it. In the last two days QuickStatements has not been running well. You can talk with Magnus and tell him if you have ideas to make it faster. As far as I can see, there are other databases that are much faster at adding information. -- Hogü-456 (talk) 17:46, 6 February 2020 (UTC)
https://m.wikidata.org/wiki/Wikidata:Contact_the_development_team#Batch_QuickStatements_has_become_unsustainably_slow shows a speedup of QS Vladimir Alexiev (talk) 22:48, 6 March 2020 (UTC)
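The rates quoted in this thread can be turned into a rough duration estimate. The Python sketch below is back-of-the-envelope arithmetic only: it takes 1.7M claims, 30 seconds per claim, two batches moving in parallel, and a cap of 100,000 edits per day straight from the comments above, and it assumes (hypothetically) that the rates stay constant and that one claim equals one edit.

```python
SECONDS_PER_DAY = 86_400


def days_at_rate(claims: int, seconds_per_claim: float,
                 parallel_batches: int = 1) -> float:
    """Days needed when each batch adds one claim every `seconds_per_claim`."""
    return claims * seconds_per_claim / parallel_batches / SECONDS_PER_DAY


def days_at_daily_cap(claims: int, edits_per_day: int) -> float:
    """Days needed when the only limit is a fixed number of edits per day."""
    return claims / edits_per_day


# 1.7M claims, 30 s per claim, 2 batches running in parallel.
print(round(days_at_rate(1_700_000, 30, parallel_batches=2)))  # 295

# 1.7M claims at the quoted ceiling of 100,000 edits per day.
print(days_at_daily_cap(1_700_000, 100_000))  # 17.0
```

Under these assumptions the observed per-claim speed, not the daily edit ceiling, is what stretches the upload to many months.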
Semantic Scholar
Have you seen the recent progress in adding SemScholar links and metadata? [1] Sj (talk) 14:34, 21 February 2020 (UTC)
- @Sj: Thanks! But the numbers are quite small... --Vladimir Alexiev (talk) 14:24, 24 February 2020 (UTC)
Request translation Isabelle de Charriere
Hello Vladimir Alexiev, Could you write/translate the article of Isabelle de Charrière for the Bulgarian Wikipedia or find someone else to do that? That would be appreciated. Boss-well63 (talk) 10:44, 13 March 2020 (UTC)
- @Boss-well63: Please make a request at https://bg.wikipedia.org/wiki/%D0%A3%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B8%D1%8F:%D0%97%D0%B0%D1%8F%D0%B2%D0%BA%D0%B8_%D0%BA%D1%8A%D0%BC_%D0%BF%D0%BE%D0%BB%D0%B8%D0%B3%D0%BB%D0%BE%D1%82%D0%B8 Vladimir Alexiev (talk) 09:05, 15 March 2020 (UTC)
- Sorry but I can't read or write Bulgarian. That's why I asked you. Boss-well63 (talk) 16:18, 16 March 2020 (UTC)
Errors with WorldCat
Hi Vladimir. I've noticed that a number of your WorldCat additions have been matching WorldCat entries for geographic locations to the Wikidata entry for railway stations; see for example here, here, and here - and matching a musical group to a station here. I'm guessing that these are errors in the WorldCat data set. If the dataset has that many errors, should it really be added? Pi.1415926535 (talk) 22:58, 24 March 2020 (UTC)
- @Pi.1415926535: Thanks for the reverts! I've collected all reported errors at https://en.wikipedia.org/wiki/Wikipedia:VIAF/errors#WorldCat_Identities_errors and will report them to OCLC. But please quantify "that many errors": in any dataset of 55M entries there are bound to be some errors. So far, for about 1.5M WorldCat links I got about 20 reverts and over 100 thanks. VIAF is a crucial authority control dataset and we shouldn't just dismiss it. (The errors are in the VIAF clustering, not the WorldCat link). --Vladimir Alexiev (talk) 15:41, 25 March 2020 (UTC)
- I reverted this in Africa (Q15). Paweł Ziemian (talk) 20:49, 27 March 2020 (UTC)
Hi @Nezdek:! About your revert: If I understand correctly, Q60846274 and Loire Forez Agglomération (Q2986901) are different "incarnations" of the same Public institution of intermunicipal cooperation. I think that https://www.worldcat.org/identities/viaf-138726920 and http://viaf.org/viaf/138726920 describe the same institution. I think that VIAF and WorldCat don't have separate entries for the different "incarnations", so it's correct to apply them to any of those incarnations? --Vladimir Alexiev (talk)
@Vladimir Alexiev: I suggest you check the French Wiki, or the period of existence on Wikidata. There was a merger of 4 organizations in 2017. One of them had approximately the same name as the new one. That's why! Nezdek (discussion) 14:41, 7 April 2020 (UTC)
Joconde IDs
- Is data.culture.fr still working? (I'm not able to open it).
- If the IDs are still used, is the old-style ID (Txxx-yyyy) deprecated? Is there always a GUID for each old-style ID entry?
- I propose to merge all Joconde UUIDs to one property, but the first two questions should be answered first.
--GZWDer (talk) 19:00, 3 April 2020 (UTC)
- Are there entities with old style ID only?--GZWDer (talk) 06:23, 4 April 2020 (UTC)
@GZWDer: It appears to be down at the moment. But the thesauri are still valid. I found another description page at https://data.culture.gouv.fr/explore/dataset/les-vocabulaires-du-ministere-de-la-culture-et-de-la-communication/.
Each concept has one URL. The old style URLs (segmented per thesaurus) were migrated from some older system. But the new system GINKO can only allocate UUIDs.
I think we should merge their ID props into one, and that neither of these answers invalidates this.
Unfortunately we cannot have very powerful validation regexps because of that historic inconsistency. Vladimir Alexiev (talk) 06:18, 4 April 2020 (UTC)
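To illustrate why a merged property ends up with a loose validation pattern, here is a small Python sketch of a regex that accepts either ID style. The exact shapes are assumptions: the old style is taken to be `T`, digits, a hyphen, and more digits (from the "Txxx-yyyy" notation above), and the new style is taken to be a lowercase hyphenated UUID. The real constraint on the property may differ.

```python
import re

# Assumed shape of old-style IDs, based on the "Txxx-yyyy" notation.
OLD_STYLE = r"T\d+-\d+"
# Assumed shape of new-style IDs: lowercase hyphenated UUID.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# A merged property has to accept both shapes.
MERGED_ID = re.compile(rf"(?:{OLD_STYLE}|{UUID})")


def is_valid_id(value: str) -> bool:
    """Return True if the whole value matches one of the two assumed shapes."""
    return MERGED_ID.fullmatch(value) is not None


# Made-up sample values for illustration.
for candidate in ["T523-1234",
                  "0f8fad5b-d9cb-469f-a165-70867728950e",
                  "T523"]:
    print(candidate, is_valid_id(candidate))
```

Because the alternation must allow both shapes, the pattern cannot check that a given ID uses the style expected for its thesaurus.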
Municipality vs. seat in WorldCat
Hi. Regarding Special:Diff/1152172243 and Special:Diff/1152172503: why should we use these identifiers inaccurately? This just encourages users to mess things up further, as illustrated by the addition of an inaccurate WorldCat id based on an inaccurate VIAF id. A separate VIAF entity for the municipality may be created later. Until then, I believe it's better to keep the municipality item without a VIAF link. Please note that earlier I also moved these VIAF ids to the items about the settlements (seats), Q191106 and Q3044083, and so the values are not distinct anymore. 2001:7D0:81F7:B580:4926:AA1B:8734:8BCA 11:47, 7 April 2020 (UTC)
- Hi! How can you tell the VIAF entry and WorldCat page are about the seat and not the municipality, given that there is no other entry in VIAF? There is no requirement, nor is it realistic, that WD and other global databases will have 1:1 correspondence. If and when VIAF creates another entry, we will split them. --Vladimir Alexiev (talk) 13:26, 7 April 2020 (UTC)
- The current municipality of Haapsalu was created recently, in 2017. As the VIAF entry is linked to other databases that refer to older sources, it must match something else, like the settlement. As for Padise, VIAF links only to LCCN (besides Wikidata), which refers to a "populated place", i.e. also not the (former) municipality. Also, note that the name of the latter municipality in Estonian is "Padise vald", and not just "Padise".
- Of course there isn't 1:1 correspondence between Wikidata and other databases (or sometimes entries in these other databases really are way too vague, and so it's impossible to tell what they are about). As a consequence, it seems natural to me that there is no reason to try to link all entries in another database that has no 1:1 correspondence. Otherwise, what's the point of providing these links if it only contributes to further confusion and errors? 2001:7D0:81F7:B580:4926:AA1B:8734:8BCA 15:13, 7 April 2020 (UTC)
- I claim that the VIAF record is about both municipality and seat because the two are so closely related, and because there's no other VIAF record. The WorldCat page is NOT wrong because it shows docs relevant to both the municipality and the seat. What confusion and errors do you mean in this case?
- BTW, thank you for all your other fixes, now those were real examples of confusion and errors. I'm collecting them at https://en.wikipedia.org/wiki/Wikipedia:VIAF/errors#WorldCat_Identities_errors and will report them to OCLC --Vladimir Alexiev (talk) 11:23, 8 April 2020 (UTC)
- What docs does this WorldCat entry show that are specifically about the municipality that was created in 2017? I don't see any.
- I doubt that we can conclude anything from the fact that there's no other VIAF record. There are many Wikidata items for which there is no matching or even closely related VIAF entity anyway, for example most of the 4k+ settlements in Estonia. I hope you don't suggest that these should be linked to something that is "close enough", e.g. the VIAF entity about the country.
- Municipality and seat may be closely related (or poorly distinguishable) in some other country, but in Estonia at least the distinction is very clear: both have distinct official boundaries, and if the municipality is dissolved then the settlement generally remains as such (as is the case for Padise (Q3044083)), and even their names match only partly if at all.
- Inaccurate use of identifiers suggests that the municipality and seat are the same, while actually they are clearly distinct. Based on inaccurate identifiers, further inaccurate data is added, which seems to be pretty much what already happened when these WorldCat ids were added based on VIAF.
- OK, I agree with you as long as you don't merely delete the WorldCat ID but move it to the correct item --Vladimir Alexiev (talk) 12:10, 10 April 2020 (UTC)
- As for Special:Diff/1152998089: this WorldCat entry, per its label and per the works that it mentions, is about some historical school. This WorldCat entry links to VIAF and LCCN entries, but these other entries are about the settlement; nothing suggests that they are about the school instead. So most likely the WorldCat entry is erroneously linked to these other databases. 2001:7D0:81F7:B580:8C08:F986:5B46:F52E 16:38, 8 April 2020 (UTC)
- Yes, the label is about a school. But the labels on the right are various names of the village. There's also a military map "Reihe V. Blatt 5. Fellin. / bearbeitet i. d. Kartogr. Abt. d. Stellv. Generalstabes d. Armee. - Maassstab 1:126 000. - [Berlin], Druckauflage 1915" that's likely about the village. If the books are about a school in the village, then by extension they are also about the village. "WorldCat entry is erroneously linked to these other databases": which other databases do you mean? This page merely shows books from the WorldCat catalog that are indexed with the VIAF and LCSH terms. --Vladimir Alexiev (talk) 12:10, 10 April 2020 (UTC)
- Please make a user account so I can ping you --Vladimir Alexiev (talk) 12:10, 10 April 2020 (UTC)