Wikidata:Requests for permissions/Bot/Symac bot 4
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- withdrawn, license problem --Pasleim (talk) 09:39, 18 November 2014 (UTC)
Symac bot 4
Symac bot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Symac (talk • contribs • logs)
Task/s: Using a list of ~8000 records from Autolist (all items with Bibliothèque nationale de France ID (P268) but without date of birth (P569) that are instance of (P31) == human (Q5)), I get the birth date and death date from the French national library.
Code: see the import script on gist.
Function details: With some previous scripts I have built a database of ~8k items of which there are 2520 records whose birth date provided by the French national library is ok (no ?, no 19..). They are going to be added to the items, with a reference to the French national library.
The script has been run on a few records so far:
- Melchior Lorck (Q70893) (diff)
- Bartholomaeus Keckermann (Q67136) (diff)
- Zdeněk Fiala (Q25140) (diff)
Let me know if you have any questions about this script.
P.S.: This is my second RfP of the day, but only because I had some time to build the scripts today; I won't run them simultaneously. --Symac (talk) 00:36, 8 November 2014 (UTC)
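The date filter described under "Function details" (keep a BnF birth date only if it has no "?" and no elided digits such as "19..") could look roughly like the sketch below. It is not the operator's gist script: the function names and the accepted date format (a four-digit year, optionally followed by month and day) are assumptions made for illustration.

```python
import re

# Assumed format: YYYY, YYYY-MM or YYYY-MM-DD. The actual BnF field format
# is not given in the discussion, so this pattern is hypothetical.
_COMPLETE_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def is_usable_birth_date(raw: str) -> bool:
    """Return True when the date has no '?' and no elided digits like '19..'."""
    value = raw.strip()
    if "?" in value or "." in value:
        return False
    return bool(_COMPLETE_DATE.match(value))


def usable_records(records):
    """Keep only (item_id, date) pairs whose date passes the filter."""
    return [(qid, date) for qid, date in records if is_usable_birth_date(date)]
```

Applied to the ~8k collected records, a filter of this kind is what would reduce the set to the records considered safe to import.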
- Thanks for doing that. A couple of comments.
- user:Ricordisamoa had made a similar proposal. There appeared to be copyright issues, but the person in charge of the data at the BNF said it was fine, so I think we can do it.
- Can you add the reference URL (P854) of the page from which the data are extracted?
- The BNF also uses the "naissance" field for organizations, so the task should probably be restricted to instances of humans, or better, use inception (P571) for instances of organizations (these ones)
- (if possible) when we already have a date, but without a source or with source = imported from Wikipedia, I think you could check that it matches the BNF; when it does, add the BNF as a source, or else add a statement with the BNF date (it seems that the BNF now sometimes uses Wikipedia as a source though...) --Zolo (talk) 08:39, 8 November 2014 (UTC)
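The last suggestion in the comment above (compare an existing, weakly sourced date with the BNF date, then either add the BNF as a source or add a separate statement) can be sketched as a small decision function. This is only an illustration: the `Statement` class, the action names, the source label string, and the choice to skip statements that already have another source are all assumptions, not part of the bot.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WIKIPEDIA_IMPORT = "imported from Wikipedia"  # hypothetical source label


@dataclass
class Statement:
    value: str                                    # date as a string, e.g. "1527"
    sources: List[str] = field(default_factory=list)


def plan_action(existing: Optional[Statement], bnf_date: str) -> str:
    """Decide what to do with a BNF date given the item's current statement."""
    if existing is None:
        return "add_statement_with_bnf_reference"
    # "Weak" means unsourced, or sourced only by an import from Wikipedia.
    weak = all(source == WIKIPEDIA_IMPORT for source in existing.sources)
    if not weak:
        return "skip"  # assumption: leave independently sourced dates alone
    if existing.value == bnf_date:
        return "add_bnf_reference"
    return "add_second_statement_with_bnf_reference"
```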
- We need a formal permission from the BNF, not an informal "okay".
- Wikidata policies require that we honour sui generis database rights in EU countries. So, we need a waiver explicitly allowing automated extraction from their database. --Dereckson (talk) 09:11, 9 November 2014 (UTC)
- @Dereckson: as indicated below, I don't think that a single piece of data from an authority record can be copyrighted, can it? Symac (talk) 09:32, 9 November 2014 (UTC)
- Facts aren't copyrightable.
- I'm concerned with sui generis database right.
- You'll find more information about how they apply to Wikidata here. --Dereckson (talk) 09:39, 9 November 2014 (UTC)
- Admittedly, the BNF person does not appear to have realized that releasing the data in Wikidata would mean they would become CC0, with no attribution required; maybe we should insist on that. But if we have informal permission from the BNF, do we really need to go into bureaucratic waiver requests? We are just importing a small part of their CC-BY data to fill a small part of the holes in our data. Getting their agreement is more a matter of good manners and reputation than of credible legal risk. Actually, we may well have indirectly imported more data from other copyrighted databases when we imported data from Wikipedia. It is just less visible. --~~
- We need a formal permission from the BNF, not an informal "okay", to allow REUSERS to use the data under CC-0 and not under CC-BY. You're entering the danger zone by arguing that we ignore the law when it's convenient. --Dereckson (talk) 20:25, 9 November 2014 (UTC)
- I am all for getting permission from the BNF, but considering that it is not very clear that using the data even without permission would break French law, and that we almost certainly do not break US law, I am wondering if we really need to make things so formal. It is not really clear that there would be any database right, because the part of their database we import may not be considered "substantial" and because database right stems from "the selection or arrangement of their contents". Here the selection is not made by the BNF (we are only importing data for which we already have items), and we are not reusing the structure of the BNF data. --Zolo (talk) 07:20, 11 November 2014 (UTC)
- Sorry, but I don't understand what is "not very clear" about breaking French law.
- The Loi n° 98-536 du 1 juillet 1998 portant transposition dans le code de la propriété intellectuelle de la directive 96/9/CE du Parlement européen et du Conseil, du 11 mars 1996, concernant la protection juridique des bases de données says explicitly: "The producer of a database, understood as the person who takes the initiative and the risk of the corresponding investments, benefits from protection of the content of the database where its constitution, verification or presentation attests to a substantial financial, material or human investment." It adds: "This protection is independent of, and applies without prejudice to, the protections resulting from copyright or from any other right over the database or one of its constituent elements."
- It couldn't be clearer: in addition to copyright (droit d'auteur), the content is protected (protection du contenu de la base) as soon as the database producer has invested time, money or effort to prepare the database.
- We're here in the process of extracting data.
- This is explicitly covered by the law too: "the extraction, by permanent or temporary transfer, of all or a qualitatively or quantitatively substantial part of the content of a database to another medium, by any means and in any form whatsoever"
- We're here in the process of republishing data.
- This is explicitly covered by the law too: "the reuse, by making available to the public all or a qualitatively or quantitatively substantial part of the content of the database, whatever its form"
- To know what is or is not substantial, we have both national and CJEU decisions that consider a hundred or so records a substantial part, and that consider substantial any database production with a high cost (including salaries) or special care taken in assembling the data.
- What is not clear? --Dereckson (talk) 20:17, 11 November 2014 (UTC)
- What was not clear to me was mostly what could be considered substantial. It seems that we can roughly expect to get about 10 000 dates ([1]) from a database of about 4 million entries ([2]). If we consider that life dates are 20% of the content of an entry, that means 0.05% of the database. But if parts of a hundred records can be considered substantial, then those 10 000 dates may indeed be substantial.
- The BNF data are under an "open license" [3], which seems to have two issues:
- it requires showing the retrieval date. We can do that, but we cannot require our reusers to do it.
- it demands that we cite the source. The CC0 license states that this should normally not be required, but adds "Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer's express Statement of Purpose.". Apparently, it is not clear that the "no need to cite the source" part would be valid in French law [www.wipo.int/edocs/lexdocs/laws/fr/fr/fr082fr.pdt]. If so, CC0 as applicable in France would be essentially equal to the "licence ouverte", except for the date of retrieval issue.
- Well I guess this suggests once again that the use of a CC0 license in Wikidata makes things very complicated, but the Wikidata team has been totally unwilling to debate the license choice. --Zolo (talk) 07:32, 12 November 2014 (UTC)
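The share estimate in the comment above (about 10 000 dates out of about 4 million entries, with life dates counted as 20% of an entry) can be checked with a short calculation. The function name is hypothetical, and the 20% weight is the commenter's own rough assumption rather than a measured figure.

```python
def imported_share(n_dates: int, n_entries: int, field_weight: float) -> float:
    """Fraction of the database imported, weighting each date as part of an entry."""
    return n_dates / n_entries * field_weight


share = imported_share(10_000, 4_000_000, 0.20)
print(f"{share:.2%}")  # prints 0.05%
```

The result only describes a quantitative share; it says nothing about whether the part is qualitatively substantial, which is the point disputed in the thread.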
- @Dereckson, Zolo: It's less than 10k records, and only 2250 should be updated, which can't be considered substantial with regard to the size of the database. If we add that the data are released under an open license by the producer, can we really think there's a problem importing these single dates? Symac (talk) 14:50, 16 November 2014 (UTC)
- Ask on Meta, and ask the WMF legal department.
- Did you know that the last time people acted the way you are proposing, we closed a project to restart it from scratch?
- See https://lists.wikimedia.org/pipermail/foundation-l/2006-March/019857.html --Dereckson (talk) 21:51, 17 November 2014 (UTC)
- I have found what the BnF employee said; I didn't remember that this kind of job had already been done. And I am not sure that a single piece of information extracted from an authority record can be copyrighted, can it?
- I have added the reference URL (P854) in my script.
- Already limited to instance of (P31) == human (Q5), yes.
- I am already logging when the info is already there; I will rerun my script afterwards to check (and yes, they sometimes use Wikipedia as a source, especially when people die, from what I have seen so far).
- Thanks for the input. Symac (talk) 09:32, 9 November 2014 (UTC)