Wikidata:WikiProject Informatics/Structures/File formats
Subpages
Goals
Long term goals
- For Wikidata to become the most comprehensive resource for information on file formats
- For Wikipedia to extensively use data from Wikidata on articles relating to file formats and software
Short term goals
- Define and reach agreement on an ontology for file formats
- Advertise for and encourage new contributors to join the project, particularly from digital preservation organisations
- Commence detailed definition of common file formats (PDF, JPEG, etc) to encourage development of an ontology, and to raise awareness of this project
Automatic lists
Useful queries
- Return a list of all items that have a Library of Congress Format Description Document ID (P3266):
SELECT ?format ?formatLabel ?fdd
WHERE {
  ?format wdt:P3266 ?fdd .  # Library of Congress Format Description Document ID
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
- Return the names of all file formats for which there is a PRONOM file format identifier in Wikidata:
SELECT ?format ?formatLabel ?puid
WHERE {
  ?format wdt:P2748 ?puid .  # PRONOM file format ID
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
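- Find items sharing the same Library of Congress Format Description Document ID (P3266):
SELECT * WHERE {
  {
    # Count the items attached to each identifier value
    SELECT ?id (COUNT(?obj) AS ?count) (GROUP_CONCAT(?obj; SEPARATOR = " , ") AS ?items)
    WHERE { ?obj wdt:P3266 ?id . }
    GROUP BY ?id
  }
  FILTER(?count > 1)  # keep only identifiers used by more than one item
}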
- Find items sharing the same PRONOM file format ID (P2748):
SELECT * WHERE {
  {
    # Count the items attached to each identifier value
    SELECT ?id (COUNT(?obj) AS ?count) (GROUP_CONCAT(?obj; SEPARATOR = " , ") AS ?items)
    WHERE { ?obj wdt:P2748 ?id . }
    GROUP BY ?id
  }
  FILTER(?count > 1)  # keep only identifiers used by more than one item
}
- Return a list of software applications ranked in descending order by the number of writable file formats that have been listed in Wikidata:
#defaultView:BubbleChart
SELECT ?app ?appLabel (COUNT(?format) AS ?count)
WHERE {
  ?app (p:P31/ps:P31/wdt:P279*) wd:Q7397 .  # instance of software, or of any subclass of it
  ?app wdt:P1073 ?format .                  # writable file format
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?app ?appLabel
ORDER BY DESC(?count)
- Return a list of software applications ranked in descending order by the number of readable file formats that have been listed in Wikidata:
#defaultView:BubbleChart
SELECT ?app ?appLabel (COUNT(?format) AS ?count)
WHERE {
  ?app (p:P31/ps:P31/wdt:P279*) wd:Q7397 .  # instance of software, or of any subclass of it
  ?app wdt:P1072 ?format .                  # readable file format
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?app ?appLabel
ORDER BY DESC(?count)
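- Return a list of items that have a PRONOM file format ID (P2748), a Library of Congress Format Description Document ID (P3266), and a File Format Wiki page ID (P3381):
SELECT DISTINCT ?format ?formatLabel ?puid ?fdd ?solve
WHERE {
  ?format wdt:P2748 ?puid .   # PRONOM file format ID
  ?format wdt:P3266 ?fdd .    # Library of Congress Format Description Document ID
  ?format wdt:P3381 ?solve .  # File Format Wiki page ID
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}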
Properties & structure of items
Ontology for an item which is an instance of (P31) file format family (Q26085352)
A file format family (Q26085352) is a group of file formats which are closely associated with each other, for example:
- The file formats are incremental versions of earlier file formats
- The file formats are variations of a base or common file format
Property | Expected values | Expected qualifier properties |
---|---|---|
instance of (P31) | file format family (Q26085352) | none |
has part(s) (P527) | one or more items, each an instance of (P31) file format (Q235557) | none |
based on (P144) | one or more of the following: | none |
developer (P178) | one or more of the following: | none |
PRONOM file format ID (P2748) | valid PRONOM database identifier where the PRONOM database entry is for a file format family (supertype/group of related file formats) | none |
Library of Congress Format Description Document ID (P3266) | valid Library of Congress Format Description Document identifier where the LoCFDD ID is for a file format family (supertype/group of related file formats) | none |
File Format Wiki page ID (P3381) | Wiki page identifier from the Just Solve the File Format Problem wiki | none |
topic's main category (P910) | one item which is an instance of (P31) Wikimedia category (Q4167836) | none |
Commons category (P373) | valid category name on Wikimedia Commons | none |
Stack Exchange tag (P1482) | valid URL for tag associated with file format family on Stack Overflow | none |
official website (P856) | valid URL for the official website of the developer/maintainer of the file format | none |
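The family ontology above can be explored with a query along the following lines. This is only a sketch: it assumes that families are linked to their members through has part(s) (P527) as the table proposes, and it matches only members that are direct instances of file format (Q235557), so members typed with a subclass would be missed.

```sparql
# List file format families together with their member formats.
SELECT ?family ?familyLabel ?format ?formatLabel
WHERE {
  ?family wdt:P31 wd:Q26085352 .  # instance of: file format family
  ?family wdt:P527 ?format .      # has part(s)
  ?format wdt:P31 wd:Q235557 .    # the member is itself a file format
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
ORDER BY ?familyLabel ?formatLabel
```

Dropping the third triple pattern would also return parts that are not yet typed as file formats, which may help when looking for items that still need an instance of (P31) statement.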
Ontology for an item which is an instance of (P31) file format (Q235557)
A file format (Q235557) should generally be defined by a document (a standard, a specification or otherwise).
Property | Expected values | Expected qualifier properties |
---|---|---|
instance of (P31) | file format (Q235557) | none |
part of (P361) | where applicable, one item which is an instance of (P31) file format family (Q26085352) | none |
based on (P144) | where applicable, one or more of the following: | none |
replaces (P1365) | where applicable, one or more items, each an instance of (P31) file format (Q235557) | none |
replaced by (P1366) | where applicable, one or more items, each an instance of (P31) file format (Q235557) | none |
described by source (P1343) | one or more of the following: | none |
developer (P178) | one or more of the following: | none |
media type (P1163) | where applicable, one or more Internet media types | none |
Uniform Type Identifier (P3641) | where applicable, one or more Uniform Type Identifiers (see Apple developer documentation for an example source) | none |
file extension (P1195) | one or more file extensions | none |
programmed in (P277) | where the file format contains computer code, one or more items, each an instance of (P31) computer language (Q629206) or an instance of (P31) one of its subclasses (subclass of (P279)) | none |
endianness (P3374) | one item which is an instance of (P31) endianness (Q339338) or an instance of (P31) one of its subclasses (subclass of (P279)) | none |
file format identification pattern (P4152) | one or more file format identification patterns | |
PRONOM file format ID (P2748) | valid PRONOM database identifier where the PRONOM database entry is for a file format family (supertype/group of related file formats) | none |
Library of Congress Format Description Document ID (P3266) | valid Library of Congress Format Description Document identifier where the LoCFDD ID is for a file format family (supertype/group of related file formats) | none |
File Format Wiki page ID (P3381) | Wiki page identifier from the Just Solve the File Format Problem wiki | none |
topic's main category (P910) | one item which is an instance of (P31) Wikimedia category (Q4167836) | none |
Commons category (P373) | valid category name on Wikimedia Commons | none |
Stack Exchange tag (P1482) | valid URL for tag associated with file format on Stack Overflow | none |
official website (P856) | valid URL for the official website of the developer/maintainer of the file format | none |
URL (P2699) | valid URL of a resource related to the file format (for example, a schema which can be used to validate the correct formatting of a file) | |
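One way to check items against the table above is to look for file formats that lack an expected property. The sketch below assumes that file extension (P1195) is the property of interest and that only direct instances of file format (Q235557) are wanted; the LIMIT is an arbitrary choice to keep the result list short.

```sparql
# File formats that have no file extension (P1195) statement yet.
SELECT ?format ?formatLabel
WHERE {
  ?format wdt:P31 wd:Q235557 .                         # instance of: file format
  FILTER NOT EXISTS { ?format wdt:P1195 ?extension . }  # no file extension recorded
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 200
```

Swapping P1195 for another property from the table, such as media type (P1163), gives the corresponding check for that property.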
Wikipedia Infoboxes
The template Infobox: File format could be rewritten in Lua to pull all values from Wikidata. Here is a first attempt at how the template parameters could be mapped to Wikidata properties:
Infobox file format parameter | Wikidata property |
---|---|
name | label |
icon | image (P18) |
iconcaption | qualifier media legend (P2096) of the icon |
iconsize | we could recreate this in the Lua template |
screenshot | image (P18) |
screenshot size | qualifier media legend (P2096) of the screenshot |
noextcode | this is intended to avoid the use of <code> formatting of the extension property |
extension | file extension (P1195) |
nomimecode | this is intended to avoid the use of <code> formatting of the mime property |
mime | media type (P1163) |
type code | Mac OS type code (P7126) |
uniform_type | Uniform Type Identifier (P3641) |
conforms_to | needs to be created |
magic | file format identification pattern (P4152), needs to handle the qualifiers, especially encoding (P3294) |
developer | developer (P178) |
released | publication date (P577) |
latest_release_version | software version identifier (P348) |
latest_release_date | would likely need to be handled by a qualifier |
genre | genre (P136) has a note that suggests main subject (P921) might be more appropriate for this use |
container_for | has part(s) (P527) |
contained_by | This would be modeled by the containing item's has part(s) (P527) |
extended_from | based on (P144) |
extended_to | This would be modeled by the containing item's based on (P144) |
standard | described at URL (P973) or ISO standard (P503) |
free | I'm not sure about this one |
url | official website (P856) |
Let me know what you think of this. Feedback welcome. YULdigitalpreservation (talk) 19:19, 8 November 2016 (UTC)
- @YULdigitalpreservation: Merge "latest release date", "latest release version" and so on. First, the "latest" part should be handled by a preferred rank, so these parameters are unnecessary in Wikidata. Second, I think it's better to have a single item per release and to handle all of these through one property, which could be has edition or translation (P747). For example (the example is for software, but the same applies to a file format):
- (normal rank)
- https://www.wikidata.org/wiki/Q8038 <edition> Gimp 2.9.4 (preferred rank)
- author TomT0m / talk page 11:45, 9 November 2016 (UTC)
- noextcode and nomimecode could be modeled using concept of no-value in Wikibase (Q19798647)? --Azertus (talk) 20:13, 5 December 2016 (UTC)
- @YULdigitalpreservation: For "icon", shouldn't we use "icon image" (P154) instead?
I also think that "based on" is ambiguous and that its use for file formats should be made more precise. I guess it should be treated as an "is a restriction of" relationship (instances of format B, based on format A, are also valid instances of format A). Is that how this property has been used so far? For different types of relationships between formats, couldn't the GDFR Format Model be a source of inspiration? --Dipsode87 (talk) 11:28, 20 January 2017 (UTC)
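Part of the parameter mapping proposed in the table above can be tried out as a query before any Lua is written. The sketch below covers only four parameters (extension, mime, uniform_type and url); the output variable names simply mirror the infobox parameter names, and picking a single website with SAMPLE is an assumption, not something the template requires.

```sparql
# Collect candidate infobox values for each file format.
SELECT ?format ?formatLabel
       (GROUP_CONCAT(DISTINCT ?ext; SEPARATOR = ", ") AS ?extension)
       (GROUP_CONCAT(DISTINCT ?mediaType; SEPARATOR = ", ") AS ?mime)
       (GROUP_CONCAT(DISTINCT ?uti; SEPARATOR = ", ") AS ?uniform_type)
       (SAMPLE(?website) AS ?url)
WHERE {
  ?format wdt:P31 wd:Q235557 .                  # instance of: file format
  OPTIONAL { ?format wdt:P1195 ?ext . }         # file extension
  OPTIONAL { ?format wdt:P1163 ?mediaType . }   # media type
  OPTIONAL { ?format wdt:P3641 ?uti . }         # Uniform Type Identifier
  OPTIONAL { ?format wdt:P856 ?website . }      # official website
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?format ?formatLabel
LIMIT 50
```

Each property sits in its own OPTIONAL block so that a format with no media type, for example, still appears in the results with an empty mime column.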
Properties for specification or standard
This section describes where information about the specification or standard of a file format can be recorded.
The intent is to feed the |standard parameter of Template:Infobox file format (Q10986167); see above.
Note that this information is different from the official website (P856).
- easy cases when a property exists:
- ISO (ISO standard (P503))
- RFC (RfC ID (P892))
- other cases:
- use described at URL (P973) if there is a link (URI) to the standard
- use described by source (P1343) if there is a Wikidata item for the standard
- if the kind of standard is known, add a genre (P136) qualifier to record it, for example W3C Recommendation (Q2661442)
- if the organisation in charge of the standard has a Wikidata item, add a standards body (P1462) qualifier, for example International Telecommunication Union (Q376150)
Title | ID | Data type | Description | Examples | Inverse |
---|---|---|---|---|---|
ISO standard | P503 | External identifier | ISO standard: numeric identifier of this ISO standard | JPEG 2000 <ISO standard> 15444 | - |
RfC ID | P892 | External identifier | Request for Comments: identifier for an item in Request for Comments, a publication of IETF and the Internet Society (without "RFC" prefix) | Opus <RfC ID> 6716 | - |
RfC ID with qualifier publication date | P577 | Point in time | publication date: date or point in time when a work was first published or released | Opus <RfC ID> 6716 <publication date> September 2012 | - |
described at URL with qualifier genre | P136 | Item | genre and by genre: creative work's genre or an artist's field of work (P101). Use main subject (P921) to relate creative works to their topic | XML Schema <described at URL> http://www.w3.org/TR/xmlschema-0/ <genre> W3C Recommendation | - |
described at URL with qualifier standards body | P1462 | Item | standards organization: organisation that published or maintains the standard governing an item | <described at URL> https://www.itu.int/rec/T-REC-T.851-200509-I/en <standards body> International Telecommunication Union | - |
Feel free to add or modify the above table. Toto256 (talk) 10:25, 25 February 2017 (UTC)
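The modelling described in this section could be read back with a query like the following. It is a sketch that assumes the genre (P136) and standards body (P1462) qualifiers are attached to the described at URL (P973) statement, as in the examples in the table; the variable names and the LIMIT are arbitrary.

```sparql
# Standards information recorded for file formats.
SELECT ?format ?formatLabel ?iso ?rfc ?specUrl ?genreLabel ?bodyLabel ?sourceLabel
WHERE {
  ?format wdt:P31 wd:Q235557 .                   # instance of: file format
  OPTIONAL { ?format wdt:P503 ?iso . }           # ISO standard
  OPTIONAL { ?format wdt:P892 ?rfc . }           # RfC ID
  OPTIONAL {
    ?format p:P973 ?statement .                  # described at URL (full statement)
    ?statement ps:P973 ?specUrl .
    OPTIONAL { ?statement pq:P136 ?genre . }     # qualifier: genre
    OPTIONAL { ?statement pq:P1462 ?body . }     # qualifier: standards body
  }
  OPTIONAL { ?format wdt:P1343 ?source . }       # described by source
  # Keep only formats with at least one of the four pointers to a standard
  FILTER(BOUND(?iso) || BOUND(?rfc) || BOUND(?specUrl) || BOUND(?source))
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 100
```

The p:/ps:/pq: prefixes are needed only for described at URL, because qualifiers cannot be reached through the simpler wdt: form.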