Wikidata:Property proposal/version control system
version control system
Originally proposed at Wikidata:Property proposal/Generic
Description | version control system used by a content repository |
---|---|
Represents | version control system (Q3257930) |
Data type | Item |
Domain | Use as qualifier for source code repository URL (P1324) |
Allowed values | Instances of version control system (Q3257930) or of its subclasses |
Example 1 | MediaWiki (Q83) > source code repository URL (P1324) = https://phabricator.wikimedia.org/source/mediawiki/ → Git (Q186055) |
Example 2 | Linux Libertine (Q165925) > source code repository URL (P1324) = https://sourceforge.net/p/linuxlibertine/svn → Apache Subversion (Q46794) |
Example 3 | Leo (Q6523506) > source code repository URL (P1324) = https://code.launchpad.net/~leo-editor-team/leo-editor/trunk3 → GNU Bazaar (Q812656) |
Robot and gadget jobs | Bots should convert the existing usages of protocol (P2700) as qualifiers of source code repository URL (P1324) to this property. |
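As a rough illustration of the modelling in the table above, the three examples can be written as a small Python sketch. The class name `RepositoryStatement`, the placeholder property ID `P_VCS`, and the flat allow-list are assumptions made for illustration only. The proposal does not give the new property's ID, and the real constraint covers every instance of version control system (Q3257930) or of its subclasses.

```python
from dataclasses import dataclass

# Item IDs below are taken from the proposal's examples.
GIT = "Q186055"
SUBVERSION = "Q46794"
BAZAAR = "Q812656"

# Assumption: a flat allow-list standing in for "instances of version
# control system (Q3257930) or of its subclasses".
ALLOWED_VCS = {GIT, SUBVERSION, BAZAAR}

# Hypothetical placeholder: the proposal does not state the new property's ID.
VCS_PROPERTY = "P_VCS"


@dataclass(frozen=True)
class RepositoryStatement:
    """A source code repository URL (P1324) statement with one VCS qualifier."""

    subject: str  # item the statement is made on, e.g. "Q83"
    url: str      # value of P1324
    vcs: str      # qualifier value, expected to be in ALLOWED_VCS

    def is_valid(self) -> bool:
        """Check the qualifier value against the allow-list."""
        return self.vcs in ALLOWED_VCS

    def qualifiers(self) -> dict:
        """Return the qualifier mapping for this statement."""
        return {VCS_PROPERTY: self.vcs}


EXAMPLES = [
    RepositoryStatement("Q83", "https://phabricator.wikimedia.org/source/mediawiki/", GIT),
    RepositoryStatement("Q165925", "https://sourceforge.net/p/linuxlibertine/svn", SUBVERSION),
    RepositoryStatement("Q6523506", "https://code.launchpad.net/~leo-editor-team/leo-editor/trunk3", BAZAAR),
]
```

Under this sketch, a protocol item such as HTTP (Q8777) used as the qualifier value would fail `is_valid()`, which is the ambiguity the proposal aims to remove.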
Motivation
This has been agreed by members of WikiProject Informatics/FLOSS in Property talk:P1324#Change protocol constraint to version control system, as a replacement for protocol (P2700) which was not intended for this use, and introduces ambiguity in the context of source code repositories. --Waldyrious (talk) 11:10, 14 June 2020 (UTC)
Discussion
- Support ArthurPSmith (talk) 16:04, 15 June 2020 (UTC)
- Oppose protocol (P2700) (or a replacement for this property that has the meaning of "protocol implemented") is the most accurate qualifier to use with source code repository URL (P1324). Uniform Resource Identifier (Q61694) prefixes of "git:" and "svn:" provide a hint at the protocol in use, but don't convey the specific protocol or version of the protocol standard implemented. Modelling should cater for the following types of scenarios:
- Repository is hosted on a Gitblit (Q84415496) server with Git Wire Protocol, version 2 (Q53756006) as a supported protocol.
- Repository is hosted on a Git (Q186055) server with Git Smart HTTP over TLS (Q63085261) as a supported protocol.
- Alternatively, this proposal as it stands could be renamed "class of version control technology used" with newly created vague items such as "subversion", "git", etc used in place of specific qualifiers capturing computer network protocol (Q15836568), server (Q44127), etc. I still don't like this approach however because it generally duplicates Uniform Resource Identifier (Q61694) prefixes, except in cases such as the Uniform Resource Identifier (Q61694) prefix being as generic as "https:". --Dhx1 (talk) 10:47, 16 June 2020 (UTC)
- Support I don't think that the precise protocol is what is most relevant in the use case described here. There are many many ways to download source code that is hosted in git repositories:
- - stateful/stateless http/https
- - ssh, git (through the git daemon)
- - file
- - through packed clone.bundle
- - the bare repo packed in a tar.xz
- - It also changes over time and depends on the hosting systems / providers.
- What seems relevant here is to express which version control system a given project (Linux, for instance) has chosen for a specific repository.
- Having svn as the authoritative format enables you to use git-svn to download it and work with it, but the authoritative information will still be stored in SVN.
- For instance if a version control system doesn't record certain details (like the date at which the author made the commit, or the date at which it was pushed) then you don't have that info. The revision is also typically tied to a given version control system.
- For git, you might download it over any protocol, from any mirror, and it will still be git. However if you download a git mirror of an svn repository (that has been converted to git for the mirroring), then it's not the authoritative repository / information. You often need to do some specific things to properly convert source code from one version control system to another.
- It's typically a project decision to choose a specific version control system (git vs svn for instance) and it can take quite some time to decide, like in the case of flashrom, the migration from svn to git is/was something important that deeply affects the project, whereas the transport system used might be relevant too but it's a totally different issue that is orthogonal to this one.
- As for the examples, you could probably allow broader usage too. Some tools use git as a backend (and not specific git protocols). Here are some examples that come to mind: etckeeper, some wikis, repo, gitlab, github, gitorious, etc. Though some tools (maybe git-annex, not sure) only use the git content storage mechanism, which is different from git. GNUtoo (talk) 19:51, 20 June 2020 (UTC)
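To make the point about URL prefixes concrete, here is a minimal Python sketch that tries to infer the version control system from the URL scheme alone. The function name `vcs_hint_from_scheme` and the two-entry hint table are assumptions for illustration, and the `example.org` URLs in the usage note are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: only these two schemes are treated as VCS-specific hints,
# following the "git:" and "svn:" prefixes mentioned in the discussion.
SCHEME_HINTS = {"git": "Git", "svn": "Apache Subversion"}


def vcs_hint_from_scheme(url: str) -> Optional[str]:
    """Return a VCS name if the URL scheme alone implies one, else None."""
    scheme = urlsplit(url).scheme.lower()
    return SCHEME_HINTS.get(scheme)


# The three repository URLs from the proposal's examples.
EXAMPLE_URLS = [
    "https://phabricator.wikimedia.org/source/mediawiki/",
    "https://sourceforge.net/p/linuxlibertine/svn",
    "https://code.launchpad.net/~leo-editor-team/leo-editor/trunk3",
]
```

All three example URLs use the generic `https:` scheme, so the function returns `None` for each of them even though they point at Git, Subversion and Bazaar repositories respectively. A hypothetical `git://example.org/project.git` would yield a hint, but that is the exception rather than the rule.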
- Strong support by GNUtoo. --Tinker Bell ★ ♥ 23:01, 20 June 2020 (UTC)
- Support Naming the property "version control system" makes it much clearer what the values should be and therefore more useful for querying Wikidata because HTTP(S) and Git Wire Protocol do not creep in, see Talk:Q186055#Git_Protocol. The only "advantage" of "protocol" is that it allows HTTP(S) as a value to specify that the repository can be accessed with a web interface. However, there is a cleaner way to encode this information, for example: https://invent.kde.org/office/kexi → software engine (P408) → GitLab (Q16639197) and https://anongit.kde.org/kexi.git → software engine (P408) → no value. —Dexxor (talk) 09:59, 4 July 2020 (UTC)
- Support The version control system is what is important to users wanting to make use of the URL; they need to know whether to use it with `git clone`, `svn checkout`, `hg clone`, or something else. The lower layer protocols are not important to store (other than what is already part of the URL), since those are determined automatically and can often vary depending on what version of the client is being used. The source code repository URL (P1324) protocol (P2700) qualifier was intended to point to the version control system, e.g. Git (Q186055), Apache Subversion (Q46794), Mercurial (Q476543), etc., and that is the way it is documented on the project page. Although the current usage of the source code repository URL (P1324) protocol (P2700) qualifier mostly follows what is documented there, with Git (Q186055) having much more usage than all other protocol (P2700) values combined, it is clear that reuse of this existing property has led to confusion over what values to use. I therefore support this new unambiguous property to replace protocol (P2700) in the context of a source code repository.
- However, as alluded to above, I think that what is missing from this proposal is how to represent the case where the source code is available only via a human-readable web page or download link and not via a version control system. For that I would prefer something like version control system = web page (Q36774), or some other agreed-upon value that is fairly clear even to those who haven't read the documentation. I think that is clearer than the current convention of setting protocol (P2700) = HTTP (Q8777) or HTTPS (Q44484), since version control systems also use `http:` and `https:` URLs. –LiberatorG (talk) 23:21, 4 July 2020 (UTC)
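The use case in this comment, choosing the right client command from the qualifier value, could be sketched in Python as follows. The function name `checkout_command` and the lookup table are assumptions for illustration. The Git, Subversion and Mercurial commands are the ones named in the comment, while `bzr branch` for GNU Bazaar (Q812656) is added here only to cover Example 3.

```python
import shlex

# Client subcommands keyed by the item ID of the version control system.
CHECKOUT_COMMANDS = {
    "Q186055": ["git", "clone"],    # Git
    "Q46794": ["svn", "checkout"],  # Apache Subversion
    "Q476543": ["hg", "clone"],     # Mercurial
    "Q812656": ["bzr", "branch"],   # GNU Bazaar (assumption, not in the comment)
}


def checkout_command(url: str, vcs_item: str) -> str:
    """Build the shell command a user would run for this repository URL."""
    try:
        argv = CHECKOUT_COMMANDS[vcs_item] + [url]
    except KeyError:
        # Values such as web page (Q36774) have no client command.
        raise ValueError(f"no known client command for {vcs_item}") from None
    # shlex.join quotes any argument that is not shell-safe.
    return shlex.join(argv)
```

The lookup needs only the version control system item; nothing about the transport protocol enters into it, which matches the argument that the lower layer protocols do not need to be stored.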
- User:Dexxor, User:LiberatorG: couldn't we use full work available at URL (P953) as a property of source code repository URL (P1324) to link to the web-browsable version of the repository? Although in many cases (e.g. for GitHub repositories) this would be the same as the repository URL, that isn't always the case, so we would be able to encode this difference. And we could use full work available at URL (P953) → no value, as Dexxor suggested, for the cases where such an online browser isn't available. --Waldyrious (talk) 10:55, 6 July 2020 (UTC)
- So values for full work available at URL (P953) can either be the URL again or no value? If there is a web interface available under a different URL, it should be added as another statement, not as a qualifier on URLs without a web interface (if we added it as both statement and qualifier, we would need to edit the URL in two places when it changes).
- I think that my idea to use software engine (P408) has a higher information density: custom values like GitHub (Q364), GitLab (Q16639197), cgit (Q28974765), Gitea (Q28714270), Trac (Q765385) and unknown mean that the URL provides a web interface and no value means that it does not. Additionally, you can use this information to query which web interfaces are most popular, for instance. If software engine (P408) is not specific enough, we could create a new property called “web interface software”. –Dexxor (talk) 14:34, 6 July 2020 (UTC)
- What about using "repository service" (as used in the GNU Ethical Repository Criteria Evaluations document) or "Forge" (for the software)? software engine (P408) seems to be used for things like game engines, frameworks and so on. I would assume that software engine (P408) would refer to things like Django (Q842014) or Mono (Q722656) if a project is heavily dependent on the various libraries that come with them. To me the current proposal seems to fit the intended use very well. Adding something like "Forge software" (like gitlab) and/or "repository service" (like github) could be useful too once we have a version control system property added. GNUtoo (talk) 06:00, 8 July 2020 (UTC)
- Good points, Dexxor. I do think that software engine (P408) is too broad for this, so I agree with your suggestion to use a new property. I would go with something associated to web hosting (Q5892272). A property analog for forge (Q3077240) is tempting, but the truth is that not all forges provide a web interface to browse the source code and its version history (or even if they do, projects may choose not to activate it while still using the forge's facilities for hosting the VCS server). WDYT? Waldyrious (talk) 22:34, 6 July 2020 (UTC)
- The new property should represent web user interface (Q1981057). forge (Q3077240) seems to exclude software that only provides a web interface to the VCS and no extra features like bug tracking. And not all web interfaces are a web application (Q189210): stagit is just a static HTML generator. I think we all agree that protocol (P2700) is not needed to specify that the URL has no VCS, or no web interface (but at least one of the two is required to be a value for source code repository URL (P1324)). For now let’s focus on adding a property for VCS. —Dexxor (talk) 06:16, 7 July 2020 (UTC)
- I realize this discussion is supposed to be about the proposed property, but this is a really good discussion and I thought I would point out that forge (Q3077240) is a somewhat dated term. As the key ways for software developers to collaborate have themselves developed, we have seen the proliferation of other, perhaps more grandiose, terms like collaborative development environment (Q5145831) and application lifecycle management (Q621590). And there is even DevOps (Q3025536), which attempts to combine these "dev" concepts with software deployment (Q2297740) (a la continuous deployment (Q57261400)) and "IT ops" concepts. It is always tricky to design a data structure that is general enough to be flexible in the face of change while being specific enough to still be useful. —Uzume (talk) 20:00, 9 July 2020 (UTC)
- Support This took long enough. It has been discussed multiple times going back more than four years (and apparently only every two years or so): Wikidata talk:WikiProject Informatics/FLOSS#source code repository URL (P1324) protocol (P2700) qualifier, Wikidata talk:WikiProject Informatics#Git as Protocol?, etc. We already have some properties that have somewhat similar uses, e.g., issue tracker URL (P1401) and package management system (P3033) (of course we will likely want this as a Wikidata qualifier (Q15720608) on source code repository URL (P1324)). The URL scheme already specifies the lower level protocol(s) to use, so things like HTTP (Q8777), HTTPS (Q44484) and Secure Shell (Q170460) are not particularly useful. URL dereferences can have input pushed to the server and the server can return different content (e.g., HTTP POST (Q2764521) can send things to the server beyond the URL, and content negotiation (Q1128629) can be used to return a different media type (Q1667978); we even have the property media type (P1163)). I also agree that most people care more about how to use such identifiers than about the minutiae of exactly how they are implemented, so exactly which Git Wire Protocol (Q53755957) or Git HTTP transfer protocols (Q63085076) are used is typically less important than which version control system (Q3257930) is needed to get access to the source code (Q128751) in the repository (Q3133368). That is not to say we cannot additionally specify such things, just that there is typically a difference in the importance and thus priority of assembling such data. protocol (P2700) should continue to be used when specifying actual communication protocols (even as a Wikidata qualifier (Q15720608) on source code repository URL (P1324), though I expect that to be considerably less interesting once we have this proposed property). —Uzume (talk) 19:11, 9 July 2020 (UTC)
- Support. YULdigitalpreservation (talk) 15:47, 10 July 2020 (UTC)
- @Waldyrious, ArthurPSmith, Dhx1, GNUtoo, Dexxor, LiberatorG: and @Uzume, YULdigitalpreservation: Done --Tinker Bell ★ ♥ 00:44, 14 July 2020 (UTC)