Welcome to Wikidata, 0x010C!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards!

--Alexmar983 (talk) 18:05, 17 June 2015 (UTC)

You asked why this deletion?

Thank you for restoring "Nighthawks is a 1978 gay-themed film by Ron Peck" to Q20892688. I was actually trying to remove all data from that item because it was preventing me from adding the English link to the earlier Q11791996, which was created for the Polish version of that page. However, when I ultimately merged Q20892688 into Q11791996, your restoration got copied there automatically, so your edit was helpful.

@Thisisnotatest: yeah, I've seen. Regards — 0x010C ~talk~ 00:32, 31 August 2015 (UTC)

The Game

Hi 0x010C. You recently set a lot of gender values using The Game. Thank you for your efforts here. But please be a little bit more careful when you play the game. There were comparatively many edits that had to be corrected afterwards (e.g. [1], [2], [3], [4], [5], [6], [7]). If you are unsure about one item, please skip it rather than guessing. If you're working too fast for really checking the data (The Game tends to provoke this ;)), try to reduce your pace. Thank you very much. Otherwise, keep on enjoying working on Wikidata! --YMS (talk) 07:06, 27 May 2016 (UTC)

Hi YMS!
Those 2 IP addresses were me. I corrected those 5 cases right afterwards because I saw that I had misclicked (I hadn't noticed that I wasn't logged in on Wikidata itself). Sorry for having missed the other errors... I will follow your advice in the future.
Enjoy your day — 0x010C ~talk~ 09:08, 27 May 2016 (UTC)
I almost thought that those IPs were you, but then I saw another edit that you corrected logged-in and two that no one corrected so far, so I assumed that it's more likely the IPs being someone else. If that's not the case, then your error rate looks much better, of course. --YMS (talk) 09:38, 27 May 2016 (UTC)

Requesting rollback of some of my edits

Hi, as part of my efforts to add professional translations of place names to Wikidata items, I recently updated 1382 zh-hans labels by copying and simplifying from the zh label. It seems that some of them were not simplified as expected, and I would rather have these edits rolled back while I investigate what went wrong.

Would be great if you could help or let me know who I can contact to get this done, thank you --Planemad (talk) 08:14, 19 April 2017 (UTC)

Hi @Planemad:!
I'm pinging @Alphos:, who has a bot to do this kind of stuff if I remember correctly.
Best regards — 0x010C ~talk~ 08:56, 19 April 2017 (UTC)
Hi @Planemad:
RollBot seems to be the right tool for the job indeed, and that would even make a great testbed for it - merely as a proof it works, to request full bot status.
Just making sure that you need ALL edits since that one reverted. In other words, are there any edits other than the zh-hans edits during that timeframe?
Feel free to add details on my talk page, or contact me publicly on the #wikidata IRC channel or privately under the nickname Alphos.
Alphos (talk) 09:22, 19 April 2017 (UTC)
Thank you! Yes, can confirm that all edits since the one mentioned can be reverted, there was nothing else in the time frame that needs to be exempt --Planemad (talk) 10:34, 19 April 2017 (UTC)

Wikibase User Group - mailing list

Hello! Nice to meet you at the hackathon :-) Here's the very new mailing list for the Wikibase Community User Group: https://lists.wikimedia.org/mailman/listinfo/wikibaseug - Cheers! SandraF (WMF) (talk) 17:11, 18 May 2018 (UTC)

Pronunciations

Hello, maybe they should be added to Lexemes rather than to Items? --Infovarius (talk) 19:42, 26 June 2018 (UTC)

Hi @Infovarius:,
Adding pronunciations to the new Lexemes is planned, but as they are new and the property structure is not totally frozen, I prefer to wait a bit. Furthermore, in my opinion the two are not mutually exclusive.
0x010C ~talk~ 19:21, 27 June 2018 (UTC)

Audio recordings of books

For books, each audio recording should be associated with a particular edition, not with the "work" data item. A French recording should be added to a French edition, an English recording to an English edition. The primary data item for a book will not be the correct place to link an audio recording of that book. --EncycloPetey (talk) 14:07, 19 August 2018 (UTC)


Create lexemes?

Hi, Looking at the audio files your bot is adding, I think it would be interesting to also create lexemes for some of these. Maybe not some of the very specific item labels, but common nouns, place names, verbs, etc. If you add French ones, you could use TBD (Q344602) as lexical class. I will then update them. --- Jura 13:18, 28 October 2018 (UTC)

Hi @Jura1:,
It's already planned that Lingua Libre Bot will add the audio pronunciation files to Lexemes, but as there are still many discussions ongoing, I'm waiting for the architecture of the properties and the API to be stabilized before activating it.
Best regards — 0x010C ~talk~ 14:26, 28 October 2018 (UTC)
  • I think the format for these files should be as on Lexeme:L12038#F1. I'm not aware of any discussions to do it differently. If there are any, please point me to them. You could try to do the French ones and can then review them. You might want to skip those by Nattes à chat for now, as they might be more suitable for items than lexemes. I hope the API remains stable .. --- Jura 17:00, 29 October 2018 (UTC)
(edit conflict) @Jura1: (cc @Pamputt, VIGNERON:) In fact, Lea Lacroix just told me today that the API is now stable and usable for Lexemes. So I started to update User:Lingua Libre Bot to add audio pronunciation files to existing Lexemes. I still have two questions:
  • If I've correctly understood the structure, two homographs like "fils"@fr (a son) and "fils"@fr (wires), or "dealer"@fr (noun) and "dealer"@fr (verb) will be represented in two different lexemes, am I right? If so, how should these cases be handled by Lingua Libre Bot?
  • Should the pronunciation audio (P443) statement containing the audio record be added to the statements of the lexeme itself or always to one of its forms? E.g. I have recorded "aller"@fr, should it be added to aller (L750) or L750-F1?
Best regards — 0x010C ~talk~ 17:47, 29 October 2018 (UTC)
Hi 0x010C ;)
Indeed, the lexemes (like the API and the structures) are not stable yet. Lexeme:L12038#F1 is a good example but it's not perfect: there is a constraint violation (a strange one, though) and it's not exactly in the Wikidata spirit to put several values without a way to discriminate between them (here, for instance, I'm thinking at least of the origin and the gender of the person, which are stored in LinguaLibre IIRC).
For homographs, you're right, they will be different lexemes (see tour (L2330), tour (L2331), tour (L2332) for an extreme and tricky example). I don't know how you can deal with that.
P443 should go on forms (or more arguably on senses) but not on the lexeme (a lexeme doesn't have a pronunciation).
PS: we all speak English here but we could all speak French.
Cdlt, VIGNERON (talk) 18:25, 29 October 2018 (UTC)
  • I think you can follow Lexeme:L12038#F1. The constraint violation isn't relevant to lexemes. Eventually, additional data should come directly from the new "M"-entity on Commons, so I wouldn't bother too much with qualifiers. If there is a homograph, I'd skip it (you could list them all somewhere). If there is no F1 yet, you'd need to create it. Note that F1 is generally the form identical to the lemma (label on top of a lexeme), but it could be F2. --- Jura 19:10, 29 October 2018 (UTC)
@VIGNERON, Jura1: Here is the first workflow for Lingua Libre Bot I've imagined:
  1. A new audio pronunciation record is available;
  2. If the wikibase item of this record on LinguaLibre is linked to a form of a lexeme:
  3. Otherwise if it is linked to a wikidata item:
  4. Otherwise search for all the forms in the given language which match exactly the transcription of the audio record:
    • If 0 result: --> do nothing;
    • If 1 result: --> add pronunciation audio (P443) to this form;
    • If >=2 result: --> log the record on a specific page with the possible forms.
Does it seem good to you?
0x010C ~talk~ 16:53, 30 October 2018 (UTC)
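As an illustration of point 4 only (the actions for points 2 and 3 are not spelled out above, so they are left out), here is a minimal sketch of the exact-match decision. The names `Form` and `match_record`, the action labels, and the F1 form ids given to the "tour" lexemes are hypothetical and are not taken from Lingua Libre Bot's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    """A lexeme form, reduced to what the matching step needs (hypothetical type)."""
    form_id: str          # e.g. "L750-F1"
    representation: str   # the written form, e.g. "aller"


def match_record(transcription: str, forms: list[Form]) -> tuple[str, list[str]]:
    """Decide what to do with a record, following point 4 of the workflow.

    `forms` is assumed to already be restricted to the language of the record.
    Returns an action label and the ids of the forms that matched exactly.
    """
    candidates = [f.form_id for f in forms if f.representation == transcription]
    if not candidates:
        return ("do_nothing", candidates)        # 0 results
    if len(candidates) == 1:
        return ("add_p443", candidates)          # 1 result: add P443 to this form
    return ("log_for_review", candidates)        # >= 2 results: log the possible forms


if __name__ == "__main__":
    french_forms = [
        Form("L750-F1", "aller"),
        Form("L2330-F1", "tour"),   # assumed form ids for the homograph example
        Form("L2331-F1", "tour"),
    ]
    for word in ("aller", "tour", "guépard"):
        print(word, match_record(word, french_forms))
```

Returning the candidate list in every case keeps the ambiguous matches available for the log page mentioned in the last bullet.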
Globally, it sounds good.
One remark on point 3: several lexemes could link to the same item with item for this sense (P5137), and inside these lexemes several forms can be homographs, so maybe you should have a procedure similar to the one in point 4.
Point 4 will probably be very resource-consuming; maybe we can find a point 3 bis to narrow it down beforehand (I'm a bit busy this week but I'll try to look into it more next week).
Cdlt, VIGNERON (talk) 17:39, 30 October 2018 (UTC)
  • Is this for new uploads or existing ones? Ideally, new uploads would include a Lexeme or QID. I think it's fairly easy to add audio files to existing Lexemes. The main problem we have now is that usage of files on forms isn't tracked on Commons. For existing French files, I would create new lexemes as suggested above. --- Jura 19:27, 30 October 2018 (UTC)
@Jura1: Some new uploads will be linked to lexemes, but as LinguaLibre is designed to serve many use cases and wikis, many records will not have links to Wikidata. I will not create new lexemes, for two reasons: it's far beyond the goal of Lingua Libre Bot, and more importantly many records will not be in the scope of lexemes (like Amélie Humbert-Droz, Archives de l'Institut et Musée Voltaire, ATSEM, 1er arrondissement de Paris,...). — 0x010C ~talk~ 20:04, 30 October 2018 (UTC)

Hi! I accidentally found this thread about adding pronunciation. In Polish lexemes, pronunciation is added differently. It is a subproperty of the IPA property. See Belgia (L34270) for an example. I also think that it would be good to announce this work at Wikidata talk:Lexicographical data. KaMan (talk) 15:06, 31 October 2018 (UTC)


Hi @VIGNERON, Jura1, Pamputt:,

During the Lingua Libre hackathon in Paris last weekend, @Ash_Crow: created the lexeme module for Lingua Libre Bot! For the moment, it will start simple (it only does points 1 and 2 of what we discussed above); we can add more complicated logic in the future. I've made a new bot permission request for this task here.

0x010C ~talk~ 16:31, 17 December 2018 (UTC)

Hi,
I wanted to help this weekend but I've been busy, sorry.
This bot looks great, I was a bit surprised by L802-F2 but all seems correct.
Cdlt, VIGNERON (talk) 17:19, 17 December 2018 (UTC)
@VIGNERON: No problem. I also apologize for not taking enough time for you two who were working remotely; in fact I already had many requests from people at the hackathon... I've continued your work and pushed it further to create a generator which can take as input either a Wikidata Query Service URL or a PetScan URL. In the first case, it can handle both Items and Lexemes; the only restrictions are that it should not be a tinyurl and that the columns should always be named ?id and ?label. I've deployed it for all users; maybe you can give it a try with the following URL (we still have no recordings in the Breton language ;) )?
https://query.wikidata.org/#SELECT%20%3Fid%20%3Flabel%20WHERE%20%7B%0A%20%20%3Fl%20a%20ontolex%3ALexicalEntry%20%3B%20dct%3Alanguage%20wd%3AQ12107%20%3B%20wikibase%3AlexicalCategory%20wd%3AQ1084%20%3B%20ontolex%3AlexicalForm%20%3Fid%20.%0A%20%20%3Fid%20ontolex%3Arepresentation%20%3Flabel%20.%0A%20%20FILTER%20NOT%20EXISTS%20%7B%20%3Fid%20wdt%3AP443%20%3Faudio.%20%7D%0A%7D
Best regards — 0x010C ~talk~ 10:51, 18 December 2018 (UTC)
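To make the two restrictions concrete, here is a small sketch that decodes a Wikidata Query Service URL of the kind shown above and checks it against them. The function name `extract_generator_query` and the way the checks are performed are assumptions made for illustration; the actual generator may validate its input differently.

```python
import re
from urllib.parse import unquote, urlsplit


def extract_generator_query(url: str) -> str:
    """Return the SPARQL query carried in the fragment of a full WDQS URL.

    Raises ValueError if the URL is not a full query.wikidata.org URL
    (for example a shortened one) or if the SELECT clause does not expose
    both ?id and ?label, the two restrictions described in the thread.
    """
    parts = urlsplit(url)
    if parts.netloc != "query.wikidata.org":
        raise ValueError("expected a full query.wikidata.org URL, not a shortened one")

    # The query is percent-encoded after the '#'.
    query = unquote(parts.fragment)

    select = re.search(r"SELECT\s+(.*?)\s+WHERE", query, flags=re.IGNORECASE | re.DOTALL)
    if select is None:
        raise ValueError("no SELECT ... WHERE clause found in the URL fragment")

    columns = set(re.findall(r"\?(\w+)", select.group(1)))
    if not {"id", "label"} <= columns:
        raise ValueError("the query must select columns named ?id and ?label")
    return query


if __name__ == "__main__":
    example = (
        "https://query.wikidata.org/#SELECT%20%3Fid%20%3Flabel%20WHERE%20%7B"
        "%20%3Fid%20ontolex%3Arepresentation%20%3Flabel%20.%20%7D"
    )
    print(extract_generator_query(example))
```

Pasting the Breton URL above into this function would print the readable form of the query, which makes it easier to adapt to another language.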

Naming convention for audio files

Hi! Is there any description of the naming convention for audio files from Lingua Libre? At Polish Wiktionary we are used to working with (and have bots automated for) a naming convention like File:En-Paris.ogg, and files from Lingua Libre are currently "invisible" to our automations. Is naming like File:LL-Q150 (fra)-0x010C-guépard.wav final? In this example I don't understand why there is a need for "LL" (probably Lingua Libre), "Q150" and "0x010C", and why the language is marked as "fra" rather than "fr" as in ISO 639-1 code (P218) or Wikimedia language code (P424). I have no idea whether wav or ogg is the better file format from a quality point of view, but ogg seems more compressed in size, which can be important for the Commons servers. KaMan (talk) 13:41, 18 December 2018 (UTC)

Hi @KaMan:!
Thanks for your interest in audio pronunciation files!
Yes, this naming convention is final. We have changed from the traditional {iso639-2}-{transcription}.wav, because we want to have as much diversity as possible in our audio recordings.
  • The LL is for Lingua Libre, more a disambiguation thing (we could have done without it);
  • To indicate the language, we use its Wikidata Qid (Q150 for example) and its ISO 639-3 code (if it has one, fra for example): the Wikidata Qid to be able to support basically every language and dialect in the world, and not only those which are lucky enough to have an ISO code; and the ISO 639-3 code (the one covering the largest number of languages) to still have something human-readable in most cases.
  • Then we include the username of the contributor who made the recording, so that if two people have recorded the same word in the same language, the second one will not overwrite the first one. Having several recordings of the same word is very important to reflect local accents within the same language.
  • And last the transcription, nothing changed here.
Concerning the file format, the quality is slightly better in WAV (as it is uncompressed), but the real reason we use it instead of OGG is a technical limitation. Lingua Libre uses a web-based recording studio, and it is far easier to produce a WAV audio recording in a web browser than an OGG one.
If the Polish Wiktionary is interested, I can adapt Lingua Libre Bot to run there too, as it already does on the French and Occitan Wiktionaries. We can add an extended set of metadata for each recording, like the place of residence of the speaker or their gender if needed. Just tell me.
Best regards — 0x010C ~talk~ 17:09, 18 December 2018 (UTC)
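The convention described above can be sketched as a small formatting function. The name `lingua_libre_filename` is hypothetical, and the sketch only covers languages that have an ISO 639-3 code, since the thread does not say what the name looks like when there is none.

```python
def lingua_libre_filename(
    language_qid: str,
    iso639_3: str,
    username: str,
    transcription: str,
    extension: str = "wav",
) -> str:
    """Assemble a file name in the pattern 'LL-{Qid} ({iso 639-3})-{username}-{transcription}.wav'.

    Illustrative only: the case of a language without an ISO 639-3 code
    is not handled, because its format is not described in the thread.
    """
    return f"LL-{language_qid} ({iso639_3})-{username}-{transcription}.{extension}"


if __name__ == "__main__":
    # Reproduces the example file name discussed in this section.
    print(lingua_libre_filename("Q150", "fra", "0x010C", "guépard"))
```

Tools that currently look for names like En-Paris.ogg would need the Qid, the username and the transcription to locate a Lingua Libre file.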
Just curious: what's the best way to split the username from the transcript? --- Jura 18:19, 18 December 2018 (UTC)
@Jura1: a regex: .*-.*-(.*)-.* give you the username. Cheers, VIGNERON (talk) 07:47, 19 December 2018 (UTC)
Not sure if it's fully reliable, see [8]. I'm more interested in the transcript part. At some point, I tried to extract them to add them to Lexemes. --- Jura 08:23, 19 December 2018 (UTC)
Actually, with a tweak, it is [9]. Thanks! --- Jura 08:27, 19 December 2018 (UTC)
@Jura1: If you think Lingua Libre Bot should add more qualifiers, I'll update its code.
In fact, your query may work on some examples, but it won't on all (usernames can contain dashes...). The best way to get the transcription is to use the metadata stored in Lingua Libre's Wikibase instance! From our endpoint you can make federated requests (Wikidata + Lingua Libre) to get the result you want; see the example in the collapsed box below.
For more information, you can read Help:Querying Lingua Libre and Lingua Libre query examples.
0x010C ~talk~ 10:12, 19 December 2018 (UTC)
  • re: qualifiers: I don't think it should add the language qualifier. If it's different from the language of the lexeme, I don't think the audio file should generally be added. I was trying to find a sample of a MediaInfo entity that will be used on Commons, but I currently can't find one. --- Jura 11:57, 19 December 2018 (UTC)
@Jura1: Hm, you're right; I was doing it for Items (where it is relevant), but for Lexemes it is not. I've just updated the code accordingly. — 0x010C ~talk~ 12:09, 19 December 2018 (UTC)

Two questions about Lingua Libre

First of all: good work, French jumped from 50 audio files to 782. But I have two questions:

  1. This morning I recorded a few Polish forms, but when you started your bot today, my files were not added to Polska (L9751). Do you run every language separately by hand, or does it catch all possible additions?
  2. Is it possible to copy audio files in Lingua Libre? I recorded a file for "Polską", but the same audio file should be used for "polską". Can I copy this audio file, or do I need to make another recording?

KaMan (talk) 10:50, 19 December 2018 (UTC)

@KaMan: Thanks! The bot runs continuously on all languages. But in fact, the lexeme module of Lingua Libre Bot is a bit lazy for the moment: it only adds a record when it is 100% confident of the match.
This morning, you manually typed the Polish forms you wanted to record; as a result, the records were not associated with a Lexeme id and the bot will not find them.
But instead of typing a list by hand, during the 3rd step of the RecordWizard you can use a word list generator to automatically fetch words. One is called External Tools and takes as input a URL to a Wikidata SPARQL query. By fetching words this way, the id of the Lexemes (or Items, it also works) will be kept as metadata, and Lingua Libre Bot will be able to make the link!
For Polish lexemes, you can try to paste this URL in the ExternalTools word list generator: https://query.wikidata.org/#SELECT%20%3Fid%20%3Flabel%20WHERE%20%7B%0A%20%20%3Fl%20a%20ontolex%3ALexicalEntry%20%3B%20dct%3Alanguage%20wd%3AQ809%20%3B%20wikibase%3AlexicalCategory%20wd%3AQ1084%20%3B%20ontolex%3AlexicalForm%20%3Fid%20.%0A%20%20%3Fid%20ontolex%3Arepresentation%20%3Flabel%20.%0A%20%20FILTER%20NOT%20EXISTS%20%7B%20%3Fid%20wdt%3AP443%20%3Faudio.%20%7D%0A%7D (sorry, shortened URLs don't work).
For your second question, no, it is not possible to copy an audio recording... :/
If I was not clear enough or if you have further questions, don't hesitate to ask!
0x010C ~talk~ 11:22, 19 December 2018 (UTC)
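For other languages, a URL of the same shape can be assembled programmatically. This is a minimal sketch: the function name `build_generator_url` is hypothetical, and the query text is simply the decoded form of the URL quoted above with the language Qid made a parameter.

```python
from urllib.parse import quote

# Decoded form of the query in the URL above; {language} and {category} are
# placeholders, and doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """SELECT ?id ?label WHERE {{
  ?l a ontolex:LexicalEntry ; dct:language wd:{language} ; wikibase:lexicalCategory wd:{category} ; ontolex:lexicalForm ?id .
  ?id ontolex:representation ?label .
  FILTER NOT EXISTS {{ ?id wdt:P443 ?audio. }}
}}"""


def build_generator_url(language_qid: str, category_qid: str = "Q1084") -> str:
    """Build a full (non-shortened) WDQS URL listing forms that have no P443 yet.

    The default category Q1084 is the lexical category used in the URL quoted
    in this thread; the columns are named ?id and ?label as the generator expects.
    """
    query = QUERY_TEMPLATE.format(language=language_qid, category=category_qid)
    # safe="" percent-encodes every reserved character, including '?', ':' and ';'.
    return "https://query.wikidata.org/#" + quote(query, safe="")


if __name__ == "__main__":
    print(build_generator_url("Q809"))  # Q809 is the language Qid used in the URL above
```

Keeping the `FILTER NOT EXISTS` clause means the generated list only proposes forms that still lack a recording.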
Ah, thanks for the info. So I assume I have to add my recordings by hand now? And currently there is no easy way to copy the recording of "Polską" to "polską"? KaMan (talk) 11:38, 19 December 2018 (UTC)
@KaMan: I've manually added the Lexeme Form id to your audio recordings, and so Lingua Libre Bot has added those on Wikidata!
And no, with Lingua Libre you'll have to record it again, because in many languages having a capital letter at the beginning can change the pronunciation...
0x010C ~talk~ 11:56, 19 December 2018 (UTC)
Thank you. What about the same transcription appearing in two forms? For example, you have added https://lingualibre.fr/wiki/Q53101 to L9751-F3 but it should also be added to L9751-F6. Does this need another recording too? KaMan (talk) 11:59, 19 December 2018 (UTC)
@KaMan: For the moment, yes. This case is a bit touchy. I don't know how it is in Polish, but in French for example, we have some homograph words, that share their spelling but not their pronunciation like the plural of fil (L10371) (F2) and fils (L15917). That's why 1 audio recording = 1 form at the moment. (but you can add it manually on the lexeme on Wikidata ;) ) — 0x010C ~talk~ 12:19, 19 December 2018 (UTC)
would adding second value in LL as in https://lingualibre.fr/index.php?title=Q53101&diff=prev&oldid=58314 work? KaMan (talk) 12:24, 19 December 2018 (UTC)
I was wondering about that as well. I don't think the pronunciation of F1 and F2 of autotraduction (L31635) are different, at least if the same person pronounces them. Making separate files tends to obscure this. I don't think it's a problem to add the same file to both forms. --- Jura 12:44, 19 December 2018 (UTC)
@KaMan: Yes, it works! — 0x010C ~talk~ 12:55, 19 December 2018 (UTC)

Hi again! I tried to complete audios today and tried to add audio where forms are identical and the previous run added it to only one of them. For example L23224-F1 and L23224-F4. But the automation on the LL site replaced the Lexeme ID instead of adding a new one, see https://lingualibre.fr/index.php?title=Q53521&diff=prev&oldid=59170 . Moreover, I added audios to L23224-F1 and L23224-F8 but your bot on Wikidata ignored this lexeme. Also, in today's second run I recorded three forms for L39299: L39299-F19, L39299-F2, L39299-F6, but the bot added only L39299-F19 to Wikidata. KaMan (talk) 06:29, 21 December 2018 (UTC)

Lingua Libre Bot stopped

Hi once again! I have recorded a series of audios but your bot has not added any of them to Wikidata. Is there a serious problem? KaMan (talk) 05:53, 22 December 2018 (UTC)

Hi @KaMan:,
Can you check again? Lingua Libre Bot can sometimes take more time to do the job, depending on the resources available on the server.
0x010C ~talk~ 17:59, 22 December 2018 (UTC)
Six minutes before your answer, the bot added the first part of my audios but not all of them. For example https://lingualibre.fr/wiki/Q57086 has still not been added after 13 hours. I just made some new audios and they have still not been added. KaMan (talk) 06:01, 23 December 2018 (UTC)
Ok, the bot was crashing due to a redirection on Wikidata, so all following records were forgotten. I'll upload a patch tomorrow and manually re-run it on all the records from the period when this happened, no worries :). Best regards — 0x010C ~talk~ 22:12, 23 December 2018 (UTC)
Ah, thanks for the information. I recorded another set this morning and it was not moved to Wikidata again, but if you have a cure for it then indeed there is nothing to worry about. I'm looking forward to the fix, thank you for the explanation. BTW, Lingua Libre is very cool. All the best, KaMan (talk) 06:13, 24 December 2018 (UTC)
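The failure mode described above, where one problematic record causes all the following ones to be dropped, is commonly avoided by isolating errors per record. The sketch below shows that general pattern only; it is not the patch that was applied to Lingua Libre Bot, and `process_records` and `handle` are hypothetical names.

```python
from typing import Callable, Iterable


def process_records(
    records: Iterable[str],
    handle: Callable[[str], None],
) -> list[tuple[str, Exception]]:
    """Handle every record, collecting failures instead of stopping at the first one.

    Returns the (record, exception) pairs that failed so they can be
    logged and re-run later.
    """
    failed: list[tuple[str, Exception]] = []
    for record in records:
        try:
            handle(record)
        except Exception as exc:  # deliberately broad: one bad record must not stop the run
            failed.append((record, exc))
    return failed


if __name__ == "__main__":
    def handle(record: str) -> None:
        # Stand-in for the real work; "Q2" simulates a record that triggers an error.
        if record == "Q2":
            raise RuntimeError("target entity is a redirect")
        print("processed", record)

    print(process_records(["Q1", "Q2", "Q3"], handle))
```

Collecting the failures rather than silently skipping them keeps a list that can be re-run manually, as was done for the affected period.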

Lingua Libre Bot stopped again :( KaMan (talk) 07:31, 30 December 2018 (UTC)

... and again KaMan (talk) 07:36, 31 December 2018 (UTC)
... and again KaMan (talk) 08:43, 5 January 2019 (UTC)
... and again KaMan (talk) 16:07, 7 January 2019 (UTC)
... and again :( KaMan (talk) 16:02, 10 January 2019 (UTC)
Yeah, sorry for that. There is a bug causing the bot to shut down randomly, and when I'm far from my laptop (like these last two weeks), I cannot restart it quickly. I should have a bit of time in the coming days to try to fix this bug; I've put it at the top of my Lingua Libre to-do list. Have a nice day — 0x010C ~talk~ 17:28, 10 January 2019 (UTC)
Ok, I understand, I'm a programmer too so I know software sometimes behaves this way. Sorry for bugging you so many times. I understand you cannot restart it now? Because about 200 audios are waiting? That's ok. I'll wait. KaMan (talk) 03:24, 11 January 2019 (UTC)

Are you still too far from your computer to restart it? I hope all is fine with you. But the absence of Lingua Libre Bot is really a problem for me. I record forms alphabetically and have problems with querying what's already recorded. I wonder if you could share the code of your bot so that someone (me?) could replace you in case of a longer absence. KaMan (talk) 12:28, 17 January 2019 (UTC)

@KaMan: Yes, and I will be so for at least a month... :/
But I just pushed a bugfix for that error, Lingua Libre Bot should stay live now.
By the way, its code is open, you can find it on this GitHub repo :)
Thanks for your patience and all the audio recordings you made!
Best regards — 0x010C ~talk~ 08:23, 18 January 2019 (UTC)

Not working for my entries

Hi, the bot doesn't seem to be adding my entries. Could you let me know what I need to do differently to make it happen? Back ache (talk) 20:09, 21 April 2019 (UTC)

Wikidata:Federation input

Hi 0x010C, just to let you know that the Wikidata devs are waiting for feedback on the best way to run Wikibase instances. It's happening here. Since you know every technical detail of Lingua Libre, I figure you may have ideas for improving or solving the problems you ran into during development. See you. Pamputt (talk) 14:56, 25 September 2019 (UTC)

Is the bot working for lexemes?

Hello! I just recorded five lexeme forms using LinguaLibre and I see that the bot has not uploaded any lexeme since the 1st of October. As I don't know how long it takes to upload them to Wikidata, I would like to know whether it has stopped or whether it is working but will take some time to upload pronunciations. Thanks! -Theklan (talk) 12:29, 26 October 2019 (UTC)

P443 in items

Why is this bot adding pronunciation audio (P443) to regular items? This property is an instance of (P31) Wikidata property for lexicographic forms (Q54275221), so it's correct for lexemes, not for items (at least not for most items). Wostr (talk) 22:11, 24 November 2019 (UTC)

Q25347

This addition [10] should be made to the data item for the Basque word. The data item you added to is for the taxon Bryophyta. A sound file for "Bryophyta" would be appropriate. --EncycloPetey (talk) 03:01, 25 November 2019 (UTC)

I blocked the bot. Please see the admin's noticeboard. Pamputt (talk) 22:09, 29 November 2019 (UTC)

Stats

Hi, hope you are fine. I updated the statistics for items at Property talk:P443/languages. --- Jura 07:36, 10 December 2019 (UTC)