Wikidata:Property proposal/version control system

version control system

Originally proposed at Wikidata:Property proposal/Generic

Motivation

This has been agreed by members of WikiProject Informatics/FLOSS in Property talk:P1324#Change protocol constraint to version control system, as a replacement for protocol (P2700), which was not intended for this use and introduces ambiguity in the context of source code repositories. --Waldyrious (talk) 11:10, 14 June 2020 (UTC)

Discussion

Alternatively, this proposal as it stands could be renamed "class of version control technology used", with newly created vague items such as "subversion", "git", etc. used in place of specific qualifiers capturing computer network protocol (Q15836568), server (Q44127), etc. I still don't like this approach, however, because it generally duplicates Uniform Resource Identifier (Q61694) prefixes, except in cases where the Uniform Resource Identifier (Q61694) prefix is as generic as "https:". --Dhx1 (talk) 10:47, 16 June 2020 (UTC)
Support I don't think that the precise protocol is what is most relevant in the use case described here. There are many, many ways to download source code that is hosted in git repositories:
- stateful/stateless http/https
- ssh, git (through the git daemon)
- file
- through a packed clone.bundle
- as the bare repo packed in a tar.xz
The available protocols also change over time and depend on the hosting systems/providers.
What seems relevant here is to express which version control system a given project (Linux, for instance) has chosen for a specific repository.
If SVN is the authoritative format, you can still use git-svn to download the code and work with it, but the authoritative information will remain stored in SVN.
For instance, if a version control system doesn't record certain details (like the date at which the author made the commit, or the date at which it was pushed), then you don't have that information. Revision identifiers are also typically tied to a given version control system.
With git, you might download it over any protocol, from any mirror, and it will still be git. However, if you download a git mirror of an svn repository (that has been converted to git for the mirroring), then it's not the authoritative repository/information. Properly converting source code from one version control system to another often requires specific extra steps.
Choosing a specific version control system (git vs. svn, for instance) is typically a project decision, and it can take quite some time to make. In the case of flashrom, for example, the migration from svn to git is/was an important change that deeply affects the project. The transport protocol used might be relevant too, but that is a totally different issue, orthogonal to this one.
As for the examples, you could probably have broader usage too. Some tools use git as a backend (and not specific git protocols). Here are some examples that come to mind: etckeeper, some wikis, repo, gitlab, github, gitorious, etc. However, some tools (maybe git-annex, not sure) only use git's content storage mechanism, which is different from git itself. GNUtoo (talk) 19:51, 20 June 2020 (UTC)
However, as alluded to above, I think that what is missing from this proposal is how to represent the case where the source code is available only via a human-readable web page or download link and not via a version control system. For that I would prefer something like version control system = web page (Q36774), or some other agreed-upon value that is fairly clear even to those who haven't read the documentation. I think that is clearer than the current convention of setting protocol (P2700) = HTTP (Q8777) or HTTPS (Q44484), since version control systems also use http: and https: URLs. –LiberatorG (talk) 23:21, 4 July 2020 (UTC)
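The ambiguity described above can be illustrated with a short Python sketch (the example.org URLs are hypothetical placeholders). The URL scheme, which is roughly what the current protocol (P2700) convention records, is identical for a Git and a Subversion repository served over HTTPS, while a single Git repository can be reached through several different schemes:

```python
from urllib.parse import urlsplit

# Hypothetical placeholder URLs: a Git repository and a Subversion
# repository, both served over HTTPS.
GIT_REPO = "https://git.example.org/project.git"
SVN_REPO = "https://svn.example.org/repos/project/trunk"


def transport_scheme(url: str) -> str:
    """Return the URL scheme, i.e. the transport used to reach the repository."""
    return urlsplit(url).scheme


# Same transport, different version control systems.
print(transport_scheme(GIT_REPO), transport_scheme(SVN_REPO))  # https https

# Conversely, one (hypothetical) Git repository reachable over several transports.
git_urls = [
    "https://git.example.org/project.git",
    "ssh://git@git.example.org/project.git",
    "git://git.example.org/project.git",
    "file:///srv/git/project.git",
]
print(sorted({transport_scheme(u) for u in git_urls}))  # ['file', 'git', 'https', 'ssh']
```

In other words, the scheme identifies the transport, not the version control system, which is the distinction the proposed property is meant to capture.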
User:Dexxor, User:LiberatorG: couldn't we use full work available at URL (P953) as a qualifier on source code repository URL (P1324) to link to the web-browsable version of the repository? Although in many cases (e.g. for GitHub repositories) this would be the same as the repository URL, that isn't always the case, so we would be able to encode this difference. And we could use full work available at URL (P953) = no value, as Dexxor suggested, for the cases where such an online browser isn't available. --Waldyrious (talk) 10:55, 6 July 2020 (UTC)
So values for full work available at URL (P953) can be either the URL again or no value? If there is a web interface available under a different URL, it should be added as a separate statement, not as a qualifier on URLs without a web interface (if we added it as both a statement and a qualifier, we would need to edit the URL in two places when it changes).
I think that my idea to use software engine (P408) has a higher information density: custom values like GitHub (Q364), GitLab (Q16639197), cgit (Q28974765), Gitea (Q28714270), Trac (Q765385) and unknown mean that the URL provides a web interface, and no value means that it does not. Additionally, you can use this information to query which web interfaces are most popular, for instance. If software engine (P408) is not specific enough, we could create a new property called “web interface software”. –Dexxor (talk) 14:34, 6 July 2020 (UTC)
What about using "repository service" (as used in the GNU Ethical Repository Criteria Evaluations document) or "Forge" (for the software)? software engine (P408) seems to be used for things like game engines, frameworks and so on. I would assume that software engine (P408) would refer to things like Django (Q842014) or Mono (Q722656) if a project is heavily dependent on the various libraries that come with them. To me, the current proposal seems to fit the intended use very well. Adding something like "Forge software" (like gitlab) and/or "repository service" (like github) could be useful too once we have a version control system property. GNUtoo (talk) 06:00, 8 July 2020 (UTC)
Good points, Dexxor. I do think that software engine (P408) is too broad for this, so I agree with your suggestion to use a new property. I would go with something associated with web hosting (Q5892272). A property analogous to forge (Q3077240) is tempting, but the truth is that not all forges provide a web interface to browse the source code and its version history (or even if they do, projects may choose not to activate it while still using the forge's facilities for hosting the VCS server). WDYT? Waldyrious (talk) 22:34, 6 July 2020 (UTC)
The new property should represent web user interface (Q1981057). forge (Q3077240) seems to exclude software that only provides a web interface to the VCS, with no extra features like bug tracking. And not all web interfaces are a web application (Q189210): stagit is just a static HTML generator. I think we all agree that protocol (P2700) is not needed to specify that the URL has no VCS or no web interface (but every value of source code repository URL (P1324) must have at least one of the two). For now, let’s focus on adding a property for VCS. —Dexxor (talk) 06:16, 7 July 2020 (UTC)
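The qualifier semantics in the approach Dexxor suggests (a specific value, "unknown value", or "no value", plus the rule that each repository URL needs at least one of a VCS or a web interface) can be sketched in Python. This is a minimal illustration over a simplified version of the Wikibase JSON snak format, not an existing tool or constraint; P_VCS and P_WEB_INTERFACE are hypothetical placeholders rather than real property IDs, and Q_GIT stands in for the Git item:

```python
from typing import Dict, List, Optional

# Simplified, hypothetical shape of the qualifiers on one
# source code repository URL (P1324) statement: property ID -> list of snaks.
# Real Wikibase JSON carries more fields (e.g. a full "datavalue" object).
Qualifiers = Dict[str, List[dict]]

# Placeholder IDs, NOT real Wikidata properties: the proposed
# "version control system" property and a possible "web interface software" one.
VCS = "P_VCS"
WEB_INTERFACE = "P_WEB_INTERFACE"


def qualifier_state(qualifiers: Qualifiers, prop: str) -> Optional[bool]:
    """True if a (possibly unknown) value is asserted, False if only
    "no value" is asserted, None if the qualifier is absent."""
    snaks = qualifiers.get(prop, [])
    if not snaks:
        return None
    # Wikibase snak types: "value", "somevalue" (unknown value), "novalue".
    return any(s["snaktype"] in ("value", "somevalue") for s in snaks)


def satisfies_rule(qualifiers: Qualifiers) -> bool:
    """At least one of VCS / web interface must be asserted for the URL."""
    return (qualifier_state(qualifiers, VCS) is True
            or qualifier_state(qualifiers, WEB_INTERFACE) is True)


def item(qid: str) -> dict:
    """Hypothetical helper building a simplified item-valued snak."""
    return {"snaktype": "value", "datavalue": {"value": {"id": qid}}}


cgit_only = {WEB_INTERFACE: [item("Q28974765")]}  # cgit (Q28974765)
bare_vcs = {VCS: [item("Q_GIT")], WEB_INTERFACE: [{"snaktype": "novalue"}]}
neither = {WEB_INTERFACE: [{"snaktype": "novalue"}]}

print(qualifier_state(cgit_only, WEB_INTERFACE))  # True
print(qualifier_state(bare_vcs, WEB_INTERFACE))   # False
print(qualifier_state(cgit_only, VCS))            # None
print(satisfies_rule(cgit_only), satisfies_rule(bare_vcs), satisfies_rule(neither))  # True True False
```

Treating "unknown value" as asserting that a web interface exists mirrors the proposal above, where unknown means the URL provides some web interface whose software is not identified.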
I realize this discussion is supposed to be about the proposed property, but this is a really good discussion and I thought I would point out that forge (Q3077240) is a somewhat dated term. As the ways software developers collaborate have themselves developed, we have seen the proliferation of other, perhaps more grandiose, terms like collaborative development environment (Q5145831) and application lifecycle management (Q621590). And there is even DevOps (Q3025536), which attempts to combine these "dev" concepts with software deployment (Q2297740) (à la continuous deployment (Q57261400)) and "IT ops" concepts. It is always tricky to design data structures that are general enough to stay flexible in the face of change while being specific enough to actually still be useful. —Uzume (talk) 20:00, 9 July 2020 (UTC)