Wikidata:Property proposal/audio transcription 2

audio transcription

Originally proposed at Wikidata:Property proposal/Commons

Description: transcription of the word/text being spoken in this file
Data type: Monolingual text
Template parameter: "transcription" in commons:Template:Pronunciation file and commons:Template:Lingua Libre record
Domain: short spoken audio files, predominantly files from commons:Category:Pronunciation
Example 1: File:De-Katze.ogg, File:De-Katze2.ogg, File:LL-Q188 (deu)-Sebastian_Wallroth-Katze.wav → "Katze"@de
Example 2: File:De-at-Katze.ogg, File:LL-Q188 (deu)-Natschoba-die Katze.wav → "die Katze"@de
Example 3: File:Fr-chat.ogg → "un chat"@fr
Example 4: File:LL-Q150 (fra)-Aemines6-chat.wav, File:LL-Q150 (fra)-Benoit Rochon-chat.wav, File:LL-Q150 (fra)-DSwissK-chat.wav → "chat"@fr
See also: IPA transcription (P898), media legend (P2096), Timed Text

Motivation

(Someone else's proposal from a few years ago: Wikidata:Property proposal/audio transcription)

There are hundreds of thousands of pronunciation files on Commons. It can be hard to tell whether a file exists for a given word or phrase, because the files follow various naming schemes and the filenames do not always exactly match the text being spoken.

Having the text as part of the structured data would allow us to use the data in queries, e.g.:

  • To find files for a specific word
  • To find words that we have pronunciations for but that don't exist as lexeme forms
  • To find inconsistencies such as a form representation not matching the text of the linked audio file
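As a rough illustration of the first use case, the sketch below builds a SPARQL query that looks up files by their transcription. It uses P9533, the ID the property received once it was created. The helper name `build_transcription_query` is hypothetical, and the assumption that the Commons query service exposes structured data statements under the `wdt:` prefix should be checked against the service's own documentation. The function only builds the query string and does not send it anywhere.

```python
import re


def build_transcription_query(text: str, lang: str, limit: int = 50) -> str:
    """Build a SPARQL query for files whose audio transcription is text@lang.

    Hypothetical helper: assumes statements are reachable as wdt:P9533.
    """
    # Accept only simple language tags such as "de" or "de-at".
    if not re.fullmatch(r"[A-Za-z]+(-[A-Za-z0-9]+)*", lang):
        raise ValueError(f"invalid language tag: {lang!r}")
    # Escape backslashes and double quotes so the text stays a valid literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?file WHERE {\n"
        f'  ?file wdt:P9533 "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {limit}"
    )


print(build_transcription_query("Katze", "de"))
```

Because monolingual text values carry a language tag, the query matches "Katze"@de only, and a file transcribed as "die Katze"@de would need its own lookup.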

The data could also be used in other ways:

  • To generate a short description of the file, e.g. "Pronunciation of (text) in (language)"
  • To display the text being spoken when using the file, e.g. like on wikt:fr:chat#Prononciation.
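The short-description idea could look something like the following minimal sketch. The function name and the `LANGUAGE_NAMES` table are made up for illustration; a real tool would presumably take language names from an existing source rather than a hard-coded dictionary.

```python
# Hypothetical lookup table, limited to the languages used in the examples.
LANGUAGE_NAMES = {"de": "German", "fr": "French"}


def short_description(text: str, lang: str) -> str:
    """Return a description of the form 'Pronunciation of "text" in language'."""
    # Fall back to the raw language code when no name is known.
    language = LANGUAGE_NAMES.get(lang, lang)
    return f'Pronunciation of "{text}" in {language}'


print(short_description("Katze", "de"))
```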

This would not be the same as media legend (P2096). That property would be expected to contain more of a description, not just the words being spoken.

This is only intended for short words and phrases (there is a limit on the length of monolingual text statements anyway), and I suggest using Timed Text for transcripts of longer audio files.

- Nikki (talk) 21:14, 25 March 2021 (UTC)

Discussion

@Nikki, Lucas Werkmeister: Done. Now audio transcription (P9533). --Lymantria (talk) 17:06, 9 May 2021 (UTC)