Wikidata:Property proposal/Source of file

source of file

Originally proposed at Wikidata:Property proposal/Commons

   Done: source of file (P7482) (Talk and documentation)
Description: information about where the file is from
Data type: Item
Domain: Files
Example 1: original creation by uploader (Q66458942)
Example 2: original creation by uploader (Q66458942)
Example 3: Flickr (Q103204), qualified with URL (P2699) https://www.flickr.com/photos/16782093@N03/3422126298
Example 4: Belvedere (Q303139), qualified with URL (P2699) https://digital.belvedere.at/objects/3456/eduard-kosmack
Planned use: Use on files on Commons
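
As an illustration only, the shape of the examples above (an item value, optionally qualified with a URL) could be sketched as follows. The `Statement` class and the `source_urls` helper are hypothetical simplifications for this page, not the Wikibase JSON format or any existing API; only the P/Q identifiers and the Flickr URL come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SOURCE_OF_FILE = "P7482"  # source of file
URL = "P2699"             # URL, used here as a qualifier


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a statement with qualifiers."""
    property_id: str
    value_id: str
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)


# Example 1: original creation by uploader, no qualifiers.
own_work = Statement(SOURCE_OF_FILE, "Q66458942")

# Example 3: Flickr, qualified with the URL given in the proposal.
from_flickr = Statement(
    SOURCE_OF_FILE,
    "Q103204",
    {URL: ["https://www.flickr.com/photos/16782093@N03/3422126298"]},
)


def source_urls(statement: Statement) -> List[str]:
    """Return the URL qualifiers of a statement, or an empty list."""
    return statement.qualifiers.get(URL, [])
```

In this sketch an own-work statement simply carries no URL qualifier, while a statement for an external source carries one or more.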

Motivation

For structured data on Commons we're discussing how to model source. We could name the property just "source" but that would be extremely confusing on Wikidata itself so I opted for "source of file". We haven't completely figured out all cases, but at least for the simple cases we will need this property. For more complicated cases this property can be used for the immediate source and for the underlying source in combination with some qualifiers. Multichill (talk) 16:28, 13 October 2019 (UTC)

Discussion

  •   Support A thousand times this. We need to do a better job of image provenance. Cite your sources, people. Gamaliel (talk) 18:20, 13 October 2019 (UTC)
  •   Support --GPSLeo (talk) 18:41, 13 October 2019 (UTC)
  •   Comment When I workshopped this at Wikimania [1], I suggested this top-level property should take a very limited set of values so that they could be effectively constraint-checked, so that every file had one from a very limited list. I think Multichill's proposal above, to mix some values for the nature of the source with some values for the ultimate source is a mistake, because it will make the property far harder for software to interpret, to recognise different types of case, and to apply constraint checking accordingly, appropriately for the different types of case.
In my view, a better data-model for files from particular websites on the internet would be:
Having the generic "file available on the internet" as the main value would allow software to readily identify the nature of the source, and then accordingly apply an appropriate constraint model -- which would be different for different types of source.
We should also strongly distinguish an available URL of the digital image itself from a URL of a webpage describing the digital image. These are two different things; we need to design the data model to make it easy to retrieve one or the other.
In the case of the Eduard Kosmack image, it appears that the JPG itself is not available via the URL above (and possibly not directly available via any URL); the URL above was actually the description page; the image may have been extracted using a process like dezoomify. All of this information should be available from the data model.
Clearly we want some kind of property like this. But it needs rather more careful thought about what model we want to allow appropriate constraint management, and also make possible storage and retrieval of the different kinds of information we want to be accessible. I don't want to see the property approved before that thought has been put in, applied by someone with a bot to several million images, and then for us to realise the data model doesn't do what we want it to do. So until we have a data model specified in more detail that we agree actually works, I say Wait before creating this. !vote changed, see below. Jheald (talk) 01:31, 14 October 2019 (UTC)
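
To make the argument above concrete: if the main value names only the type of source, software can look up which additional fields that type must carry. The sketch below is a hypothetical illustration, not the model proposed in this discussion. The field names `description_page_url` and `file_url` and the rule table are assumptions, chosen only to show the description-page URL being kept apart from the URL of the image file itself.

```python
# Hypothetical rule table: for each type of source, the qualifier fields
# that must be present. The field names are illustrative assumptions.
REQUIRED_FIELDS = {
    "original creation by uploader": set(),
    "file available on the internet": {"description_page_url"},
}


def missing_fields(source_type, qualifiers):
    """Return the required fields absent from `qualifiers`, sorted by name."""
    if source_type not in REQUIRED_FIELDS:
        raise ValueError(f"no constraint model for source type: {source_type!r}")
    return sorted(REQUIRED_FIELDS[source_type] - set(qualifiers))


# The Eduard Kosmack case: a description page is known, but no direct
# URL for the image file is recorded, so `file_url` is simply left out.
kosmack = {
    "description_page_url": "https://digital.belvedere.at/objects/3456/eduard-kosmack",
}
```

With these assumptions, `missing_fields("file available on the internet", kosmack)` returns an empty list, while the same source type with no qualifiers would report `description_page_url` as missing.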
@Jheald: if we go for the broad version, we need this property, if we go for the narrow version, we need this property. So what's the point of waiting? And who are we waiting for exactly? Not much discussion currently happening.
Don't forget that we're operating wikis. We try, we reflect and we might change. What you're suggesting sounds more like a waterfall approach where we try to design everything beforehand and then go to actual implementation. I'd rather have the property, do some tests, discuss that and maybe change the approach. Multichill (talk) 18:01, 14 October 2019 (UTC)
I think the only argument to wait with this and many other things in the structured data is that all none datatype values are not supported by the GUI yet. Of course the API works well, but having information that is not visible could be confusing. --GPSLeo (talk) 19:35, 14 October 2019 (UTC)
But that wouldn't be the case for this property, right? You mean not doing any edits until the none thing has been implemented? Multichill (talk) 19:46, 14 October 2019 (UTC)
If URL (P2699) is used. I would say to create and use this property but only with the features that are properly displayed right now. --GPSLeo (talk) 08:52, 15 October 2019 (UTC)
@Multichill: Unsurprisingly, I'm not a waterfall man, and am generally a great fan of the content side of wiki's general "get something where everyone can see it. Try. Fail. Try again. Fail better" approach. But. This is a property that is going to be used on 40 million pages. It's worth spending a few more breaths to get rough consensus on what we want to do, and how it's going to work, before anybody bombs ahead and adds it to 4 million items off their own bat, creating facts on the ground. @GPSLeo: As you say, at the moment statements with most datatypes are not available, or at least not visible, on Structured Data. I don't see any harm in waiting until those are in place, so we can see and consider some fully worked-through examples, with all fields present, before wider roll-out. (And Commons users have suggested no shortage of test images, for us to think about). I also think it would be a good idea to take things quite steadily and cautiously until WDQS is available for Commons, so we see live how properties are getting used. Fortunately it looks like both of those tickets are making active progress again, so with luck those should both be in place really quite soon now. A fundamental thing, about the property proposal process, is that we don't sign off a property as "ready" here until there's a decent-enough consensus on how it is to be used. But with luck that should be achievable. Jheald (talk) 11:47, 24 October 2019 (UTC)
@Jheald: as discussed in person: Let's start this property tightly scoped to only own work files. That way we can get started with some of the files in Commons:Category:Self-published work.
We'll continue discussing how to model source on Commons:Commons talk:Structured data/Modeling/Source and, when we reach consensus, expand the scope. Multichill (talk) 13:01, 26 October 2019 (UTC)
Okay, Support. Particularly, given the hope now that the full range of properties of all datatypes should become visible on SDC within the next couple of weeks or so, I am persuaded that there is (i) now a compelling need to have this property to be able to experiment with the data modelling in the more complicated cases; and (ii) a lot of value in now being able to roll it out for particular sub-sets of cases which we can agree are particularly straightforward (eg images we can reliably trust to be a user's own personal creation and own direct upload, such as WLM images). These would have real value in letting us start to see how SDC performs at scale, and start to see how the interaction with templating will work. The property should maybe carry a note that, for now, only a limited set of potential values are agreed for mass use out 'in the wild', other than test-case images in discussion environments. But I now withdraw my previous general reservation, and agree that this is something that it would be useful to be able to start working with. Jheald (talk) 07:21, 27 October 2019 (UTC)
On Wikidata we could express such restrictions with a property constraint (P2302) = one-of constraint (Q21510859) with appropriate values, and qualifiers constraint status (P2316) = suggestion constraint (Q62026391) and constraint clarification (P6607) = "only agreed values should be used, other than on test images being considered in discussion environments". I'm not sure how fully the SDC interface displays constraints so far, but if not then this might be a useful concrete example for the interface team to play with. Jheald (talk) 07:52, 27 October 2019 (UTC)
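
A rough sketch of how such a one-of constraint with suggestion status might behave when checked by software. The dictionary layout and the `check_one_of` function are hypothetical and do not reflect how Wikibase stores or evaluates constraints. The allowed list of a single value (Q66458942, original creation by uploader) is likewise an assumption based on the narrow initial scope discussed above.

```python
# Hypothetical, simplified representation of the constraint described above.
ONE_OF_CONSTRAINT = {
    "constraint_type": "Q21510859",  # one-of constraint
    "status": "Q62026391",           # suggestion constraint (not mandatory)
    "clarification": (
        "only agreed values should be used, other than on test images "
        "being considered in discussion environments"
    ),
    # Assumed allowed list: original creation by uploader only.
    "allowed_values": {"Q66458942"},
}


def check_one_of(value_id, constraint=ONE_OF_CONSTRAINT):
    """Return None if the value is allowed, else a suggestion message."""
    if value_id in constraint["allowed_values"]:
        return None
    return (
        f"suggestion: {value_id} is not an agreed value "
        f"({constraint['clarification']})"
    )
```

Because the status is a suggestion rather than a mandatory constraint, the check only reports a message for other values such as Flickr (Q103204); it does not reject the statement.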
  Support, but after giving it some thought, I support Jheald's approach to it. A bit of deeper thinking beforehand doesn't hurt, this proposed approach is not overly complicated, and I think it makes a lot of sense. It would indeed allow for better constraint modeling, querying, and clustering of similar types of files. Also fully agree with Jheald that we need to separately and differently model URLs of the source webpage where the file comes from, and the actual source file's URL itself. With Structured Data on Commons we now have the opportunity to be a bit more thorough and consistent in our data modelling, take a step back, and rethink some things. This does not prevent us from then getting started with a subset of files as soon as we have figured those out. Spinster 💬 18:59, 16 October 2019 (UTC)
  Support --Jarekt (talk) 03:09, 26 October 2019 (UTC)

@Multichill, Gamaliel, GPSLeo, Jheald: Done: source of file (P7482). − Pintoch (talk) 08:44, 27 October 2019 (UTC)