Wikidata:Property proposal/fingerprint

fingerprint

Originally proposed at Wikidata:Property proposal/Creative work

   Not done
Description: Acoustic fingerprint for media. This includes any kind of audio, whether from music, speeches, audiobooks or films.
Data type: String
Example 1: See here: https://github.com/acoustid/chromaprint
Example 2: See here: https://github.com/acoustid/acoustid-fingerprinter
Example 3: See here: https://github.com/acoustid/chromaprint-build-base
Planned use: a) identify audio (and, by way of audio, also video) by content rather than by metadata, which may or may not be correct,

b) set and correct metadata after the content has been identified via its fingerprint,

c) remove content duplicates, even if they are not exact binary duplicates.
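
Use c) relies on comparing fingerprints approximately rather than for exact equality. A minimal sketch of that idea follows, assuming a fingerprint is a list of unsigned 32-bit integers and that both fingerprints are already aligned in time. The function names, the sample values and the 0.15 threshold are hypothetical and not taken from any particular library; real matchers typically also search over time offsets.

```python
def bit_error_rate(a, b):
    """Fraction of differing bits over the overlapping part of two fingerprints.

    Assumes each fingerprint is a sequence of unsigned 32-bit integers
    and that both start at the same point in the audio.
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both fingerprints must be non-empty")
    # zip() stops at the shorter fingerprint, so exactly n pairs are compared.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (32 * n)


def is_probable_duplicate(a, b, threshold=0.15):
    """Treat two fingerprints as the same content if few bits differ.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return bit_error_rate(a, b) <= threshold


# Made-up values: a track, a slightly altered copy, and unrelated audio.
original = [0xFFFFFFFF, 0x00000000, 0xAAAAAAAA]
reencoded = [0xFFFFFFFE, 0x00000001, 0xAAAAAAAA]
unrelated = [0x00000000, 0xFFFFFFFF, 0x55555555]

print(is_probable_duplicate(original, reencoded))
print(is_probable_duplicate(original, unrelated))
```

Comparing by bit error rate instead of exact equality is what lets re-encoded or slightly altered copies still count as duplicates.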
See also

Motivation

Any content should ultimately be identifiable by the content itself – not by metadata such as tags or descriptions.

This is obvious for text, where full-text indexing is the standard. It is not yet common for audio, images and video, but it should be. The way to do this is fingerprinting (also known as perceptual hashing). A property proposal for image fingerprints already exists (https://www.wikidata.org/wiki/Wikidata:Property_proposal/Imagehash_perceptual_hash). This proposal is for audio. Since the vast majority of video content (even "silent pictures") comes with audio, the property can be widely used for video identification as well.  – The preceding unsigned comment was added by MalEbenSo (talk • contribs).


Discussion

Note: The datatype could actually be something other than string; it depends on the algorithm used for fingerprinting. For example, AcoustID/chromalib creates a 2.5k binary fingerprint for a typical music track. That binary can, of course, be converted to a string.  – The preceding unsigned comment was added by MalEbenSo (talk • contribs).
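
To illustrate the binary-to-string conversion mentioned in the note, here is a hedged sketch using only the Python standard library. It assumes the binary fingerprint is a sequence of unsigned 32-bit integers and encodes it as URL-safe Base64 text. The function names and sample values are hypothetical, and this is not necessarily the encoding any fingerprinting tool uses itself.

```python
import base64
import struct


def fingerprint_to_string(values):
    """Pack unsigned 32-bit integers (little-endian) and Base64-encode them."""
    raw = struct.pack(f"<{len(values)}I", *values)
    return base64.urlsafe_b64encode(raw).decode("ascii")


def string_to_fingerprint(text):
    """Reverse fingerprint_to_string: decode Base64 and unpack the integers."""
    raw = base64.urlsafe_b64decode(text.encode("ascii"))
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


# Made-up sample values, not a real fingerprint.
sample = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F]
encoded = fingerprint_to_string(sample)
print(encoded)
print(string_to_fingerprint(encoded) == sample)
```

Base64 grows the data by roughly a third, so a binary fingerprint of a few kilobytes still fits in an ordinary text value, subject to whatever length limit the string datatype imposes.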

  •   Comment: Can you add three actual samples (see other proposals for how this is generally done)? If this is for Commons only, please add that to "domain=" above. Is this property meant to be used with different methods on Commons or just one? In the first case, the determination method might need to be specified; in the second case, it could be worth including the method in the label or description. --- Jura 13:00, 5 February 2021 (UTC)

  Not done: incomplete proposal. --- Jura 20:38, 15 March 2021 (UTC)