Wikidata talk:Requests for comment/Dimensions and units for the quantity datatype

Default system of measure

We should default to the SI standard. --Izno (talk) 00:22, 14 November 2013 (UTC)

The default unit should be set per property. For some properties it makes no sense to use pure SI: for instance, the speed of vessels is measured in knots and that of land vehicles in km/h, which are not SI units but are "accepted for use with the SI". --Micru (talk) 08:51, 14 November 2013 (UTC)
The Universal Language Selector should also be a universal unit-of-measurement selector. It should also let users see coordinates in their favorite format and projection. --Tobias1984 (talk) 12:21, 14 November 2013 (UTC)

Swedish mile/mil

A Swedish "mil" (mile) today is 10 km, but before 1889 it was 36,000 feet or 10,688.54 metres. Before 1665 there was no national standard. See sv:Mil. That article also says that the old Norwegian mil was 11,298 m. -- Lavallen (talk) 11:15, 14 November 2013 (UTC)

  • The times before international standards are going to be really difficult to put into Wikidata. Every marketplace in Europe had different rods and weights to compare lengths and masses. Wars changed the territories of those reference materials on a monthly basis, and the reference materials themselves were probably damaged or tinkered with on a daily basis. Putting all this into Wikidata will probably still take years if not decades (especially because we need scientists to investigate it first, so we can source it). --Tobias1984 (talk) 12:15, 14 November 2013 (UTC)
    Agreed. I only added this note here since "Swedish mil" had already been added to the page. In Sweden we sometimes distinguish the term "mil" (mile = 10,000 m) from "svensk mil" (Swedish mile = 10,688 m), and since there is a 688-metre difference between the two, it can be confusing. The "mil" should not have any priority, even if the term is used daily here. Technical specifications and traffic signs are always in kilometres, never in mil. -- Lavallen (talk) 12:24, 14 November 2013 (UTC)
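
Editorial sketch: the figures quoted above can be read as one unit label with different definitions over time. The following minimal Python sketch assumes that reading; the table layout and the function name `mil_to_metres` are hypothetical and only illustrate why a conversion for "mil" needs a date.

```python
from typing import Optional

# Metres per "mil", by period, using only the figures quoted in this thread.
# Each entry is (first_year_inclusive, last_year_inclusive_or_None, metres).
# Assumption: the 36,000-foot definition applied from 1665 up to 1888.
MIL_DEFINITIONS = [
    (1665, 1888, 10_688.54),  # 36,000 Swedish feet
    (1889, None, 10_000.0),   # metric mil
]


def mil_to_metres(value: float, year: int) -> Optional[float]:
    """Convert a length in mil to metres for a given year.

    Returns None when no national standard is known for that year.
    """
    for start, end, metres in MIL_DEFINITIONS:
        if year >= start and (end is None or year <= end):
            return value * metres
    return None


if __name__ == "__main__":
    print(mil_to_metres(2, 1900))  # metric definition
    print(mil_to_metres(1, 1700))  # older definition
    print(mil_to_metres(1, 1600))  # no national standard
```

A value from before 1665 deliberately yields no result rather than a guess, which matches the point that such values need sourced research first.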

Missing units

Not a priority, but the sverdrup (Q39274) is commonly used in oceanography. --Tobias1984 (talk) 12:17, 14 November 2013 (UTC)

Dalton

The current draft doesn't include the "Dalton" or atomic mass unit. Because we have a few atomic properties pending, it would be nice if we could enter numbers in daltons. Otherwise people will convert the unit themselves, and we will have a black-box conversion factor between the value in the source and the entered number. Which brings me to my next point: we need to assemble a list of constants with references which both the community and the developers need to be able to access. I propose we create a page, e.g. Wikidata:Physical constants, for this. --Tobias1984 (talk) 12:55, 19 November 2013 (UTC)

The atomic mass unit is there, not in the selection for the test, but in the second mass section. Why do we need a page for constants? Can't we just query all "instance of: physical constant"? I guess they will contain the value with references too once the corresponding datatype is available. --Micru (talk) 14:09, 19 November 2013 (UTC)
My mistake. I added "Da" as one of the aliases of the unit. The page for constants would let us see how certain numbers are reached through conversion factors. If somebody inputs a weight in atomic mass units and somebody else views it in kg, then the second number depends on the constant which is used for the conversion. I would prefer it if everybody could view these conversion factors and not just the programmers. It could be an edit-protected page where changes or alternative values could be discussed and added. Plus, citations could be added there which would be displayed at the bottom of each query using those constants. --Tobias1984 (talk) 15:16, 19 November 2013 (UTC)
The conversion factors are open-sourced per the first bullet in the RFC lead. Isn't that good enough? :) --Izno (talk) 00:20, 20 November 2013 (UTC)
Making a page similar to MediaWiki talk:Gadget-AuthorityControl.js (based on the GNU file) would have some advantages. The GNU file only has warning messages and some sources. We could expand the list of sources or link to the item that holds the source. As User:Lavallen pointed out, we might need additional conversions in the future. The list covers a lot of US and English units, but what about the Swedish units? The GNU file might be missing some numbers: I can't find the Avogadro constant in the file. Ideally an expansion would immediately be transferred to the UI. --Tobias1984 (talk) 09:35, 20 November 2013 (UTC)
I guess we will sometimes have situations where we do not know the relation between the SI unit and the unit used in the source. How tall was Goliath (Q192785), as an example? The written records give us some numbers, but we know very little about those units. -- Lavallen (talk) 15:10, 21 November 2013 (UTC)
Regarding adding "Swedish units": in contrast to the UK and USA, Sweden has more or less fully adopted the metric system. There is the "mil" that I talked about above, and the barrel of land (Q1770545). The latter also has several definitions, I'm afraid. Today, it's considered to be 0.5 hectare (Q35852). None of these is used in technical specifications or reports today. -- Lavallen (talk) 10:46, 22 November 2013 (UTC)
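
Editorial sketch: the earlier point in this thread, that a mass entered in daltons and viewed in kilograms depends on which constant is used, can be illustrated as follows. This is a minimal sketch, not the Wikidata implementation. The two factors are the kilogram values of the atomic mass unit from the CODATA 2010 and CODATA 2018 adjustments; the table `KG_PER_DALTON` and the function `daltons_to_kg` are hypothetical names.

```python
from decimal import Decimal
from typing import Tuple

# Kilograms per dalton, keyed by the source of the constant.
# Keeping the source next to the factor is the point of the sketch:
# the displayed result can always cite where its factor came from.
KG_PER_DALTON = {
    "CODATA 2010": Decimal("1.660538921e-27"),
    "CODATA 2018": Decimal("1.66053906660e-27"),
}


def daltons_to_kg(value: Decimal, source: str) -> Tuple[Decimal, str]:
    """Convert daltons to kilograms and return the result with its source."""
    return value * KG_PER_DALTON[source], source


if __name__ == "__main__":
    carbon_12 = Decimal("12")  # carbon-12 is 12 Da by definition of the unit
    for source in KG_PER_DALTON:
        kg, cited = daltons_to_kg(carbon_12, source)
        print(f"{kg} kg (factor from {cited})")
```

The two results differ from the seventh significant digit onward, which is why a visible, citable list of factors matters for anyone comparing values against a source.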

What the datatype should be able to do

I think we should also assemble a list of things the datatype should be able to do before it is released. Handling of standard deviations and of mathematical numbers (both for display and for input) are two things I think would be useful. --Tobias1984 (talk) 13:55, 21 November 2013 (UTC)

We had some kind of talk about it when we discussed orbital elements (Q272626). Such numbers often come with a standard deviation, but the developers talked about using the same kind of extensions as the time datatype has today, with "precision", "from" and "to". It then sounded like we would have to use qualifiers for standard deviation. -- Lavallen (talk) 15:16, 21 November 2013 (UTC)
Yes, that's my understanding too. Standard errors are a common method of expressing uncertainty, but certainly not the only one, so it's good to express uncertainty flexibly through qualifiers rather than baking one uncertainty measure into the data model.
By "mathematical numbers", do you mean scientific notation (Q219142)? --Avenue (talk) 16:13, 21 November 2013 (UTC)
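
Editorial sketch: one way to picture "uncertainty through qualifiers" is a statement whose main value is a plain quantity and whose uncertainty measure is attached under a named qualifier. The classes `Quantity` and `Statement`, the helper `interval`, the property name and the numbers below are all hypothetical placeholders, not the actual Wikibase data model and not sourced values.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, Optional, Tuple


@dataclass
class Quantity:
    amount: Decimal
    unit: str


@dataclass
class Statement:
    prop: str
    value: Quantity
    # Any uncertainty measure can be attached by name, so the data model
    # does not privilege one of them.
    qualifiers: Dict[str, Quantity] = field(default_factory=dict)


def interval(stmt: Statement,
             qualifier: str = "standard deviation"
             ) -> Optional[Tuple[Decimal, Decimal]]:
    """Return (value - spread, value + spread), or None if the qualifier is absent."""
    spread = stmt.qualifiers.get(qualifier)
    if spread is None:
        return None
    return (stmt.value.amount - spread.amount,
            stmt.value.amount + spread.amount)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    stmt = Statement(
        prop="example property",
        value=Quantity(Decimal("1.50"), "m"),
        qualifiers={"standard deviation": Quantity(Decimal("0.02"), "m")},
    )
    print(interval(stmt))
    print(interval(stmt, "standard error"))  # qualifier not present
```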

Which dimensions?

  • Please be selective, rather than comprehensive. Let's find out which units (and dimensions) should be implemented first, to cover the vast majority of cases. Keep in mind that it would be annoying to have to pick out ft² from a list with hundreds of obscure area units. I agree that historical units can be useful sometimes, but perhaps we can concentrate on the most common units first. We already have a comprehensive list (from GNU Units); what we need now is a selective list. -- Duesentrieb (talk) 11:12, 14 November 2013 (UTC)
  • I think for testing purposes length, mass and area should be done first. At first, only SI units with SI prefixes should be allowed. When that works and the initial storm of property creations and mass number imports has passed, we can add more stuff. Making people look for metric measures will also decrease the risk of rounding errors or plain wrong numbers (US newspapers usually "clean up" numbers after converting from metric, which is probably the biggest systemic bias in human history). --Tobias1984 (talk) 12:47, 14 November 2013 (UTC)
  • We should be doing our best to store numerical values that have dimensions in such a way that they can be searched in a sensible way. When the time comes for us to be able to query Wikidata - for use in sister projects or elsewhere - we ought to be able to set up a single query that returns all of the data available. For example, if I were looking for all localities that had an area between 10 and 20 square kilometres, I wouldn't want to have to also search for localities that had an area between 10,000,000 and 20,000,000 square metres or an area between 3.861 and 7.722 square miles or an area between 2,471 and 4,942 acres - and did I forget hectares? The point should be apparent, I hope - we can either store the underlying data in a single format and force a conversion on input (then provide conversions for display to suit the application), or we store lots of different numbers with different units that actually represent the same thing and make searching a nightmare. If we don't plan for the future uses of the data now, we'll end up with a far harder task when we try to implement searches later. The real value of a database is the ability to search it - just storing stuff doesn't need all the effort we've been putting into this project. --RexxS (talk) 23:08, 16 November 2013 (UTC)
Ideally the original user input would be saved. It would be nice to know whether somebody entered acres or m². Back to your question: I'm pretty sure the developers are building the database in such a way that it will be able to handle queries with different measurement systems. --Tobias1984 (talk) 14:48, 17 November 2013 (UTC)
Yes :) --Lydia Pintscher (WMDE) (talk) 12:07, 21 November 2013 (UTC)
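
Editorial sketch: the two wishes in this sub-thread (one query across all units, and keeping the unit the user originally entered) are compatible if each record keeps its input and the comparison is done on a normalized value. The sketch below assumes square metres as the normalization target; the factors are the exact definitions of the international acre and square mile, while the place names, the sample areas and the function names are hypothetical.

```python
from typing import List, Tuple

# Square metres per unit of area.
M2_PER_UNIT = {
    "m2": 1.0,
    "km2": 1_000_000.0,
    "ha": 10_000.0,
    "acre": 4_046.8564224,     # international acre
    "mi2": 2_589_988.110336,   # international square mile
}

# Each record keeps the value and unit exactly as entered: (name, amount, unit).
Record = Tuple[str, float, str]


def to_m2(amount: float, unit: str) -> float:
    """Normalize an area to square metres."""
    return amount * M2_PER_UNIT[unit]


def areas_between(records: List[Record], low_km2: float, high_km2: float) -> List[str]:
    """Names of records whose area lies in [low_km2, high_km2], whatever unit was entered."""
    low, high = to_m2(low_km2, "km2"), to_m2(high_km2, "km2")
    return [name for name, amount, unit in records
            if low <= to_m2(amount, unit) <= high]


if __name__ == "__main__":
    # Hypothetical localities, for illustration only.
    localities = [
        ("A", 15.0, "km2"),
        ("B", 1_200.0, "ha"),
        ("C", 3_000.0, "acre"),
        ("D", 2.0, "mi2"),
        ("E", 25_000_000.0, "m2"),
    ]
    print(areas_between(localities, 10, 20))
```

Because the entered unit stays in the record, display code can still show "3,000 acres" for locality C even though the range test ran on square metres.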
  • I'm slightly skeptical about the need for *all* SI prefixes. For instance, for distances the most common prefixes range from atto to kilo; larger ones are not used that much, and light-years or km plus scientific notation are used instead. The same happens with other units: there is a usual range of prefixes and the rest are not used at all. --Micru (talk) 14:15, 19 November 2013 (UTC)
  • I think there are two points to this. If the system can handle prefixes, then handling all of them should not be a problem anymore. The kg is a bit of an exception because the base unit itself has the k prefix. I just think the developers should make sure all of these work before the datatype is released. The other point is that for display purposes most of these prefixes are rarely used. In my opinion this would be best handled by letting the user choose one or two preferred number formats. Even better would be to allow the user to choose the preferred unit for each property. Atomic weights could then be viewed in daltons while the weight of people would be displayed in kg. Maybe the property creation page will allow us to set a preferred display language. This is why it would be helpful to get more input or a UI mockup from one of the developers. --Tobias1984 (talk) 15:02, 19 November 2013 (UTC)
  • At the beginning we will only have a small selection of units. This is why we're trying to find out here which these should be. The most helpful thing for us would be to end up with a list of the 5 most important ones that should come first. More can come later, but we need to start somewhere. --Lydia Pintscher (WMDE) (talk) 12:07, 21 November 2013 (UTC)
  • I don't want to be too demanding, but 5 units doesn't seem like a lot to me. Just the most important units for length (metre, kilometre, foot, and mile) and mass (kilogram, pound) would already take us to 6. Or do you mean the 5 most important dimensions (length, mass, area, etc.) and the most important units for those? --Avenue (talk) 13:42, 21 November 2013 (UTC)
  • In my opinion these would be the most useful and widely needed: unit-less number, area (m²), mass (kg), distance (m) and speed (m/s). If more can be done, then temperature (Celsius), time (seconds) and atomic mass unit (Dalton) would be useful. If the system can't handle the prefixes, it should at least handle mathematical numbers correctly. Please also remember that the UI should treat all zeros as significant, even if they are behind the decimal point. Sometimes a measurement will yield e.g. one or two zeros at the end, and they are significant. Removing them from the data would make measurements look one or two orders of magnitude less precise. --Tobias1984 (talk) 13:44, 21 November 2013 (UTC)
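
Editorial sketch: the point about trailing zeros can be shown with Python's standard `decimal` module, which keeps the digits as entered, whereas binary floats drop them. This only illustrates the requirement; how the Wikidata UI stores numbers is not specified here, and the helper name `decimal_places` is hypothetical.

```python
from decimal import Decimal


def decimal_places(text: str) -> int:
    """Number of digits after the decimal point, as entered."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


if __name__ == "__main__":
    entered = "12.300"
    print(str(float(entered)))      # a float forgets the trailing zeros
    print(str(Decimal(entered)))    # a Decimal keeps them
    print(decimal_places(entered))  # precision implied by the input
    print(decimal_places("12.3"))
```

The two inputs compare equal as numbers, but they carry different implied precision, which is exactly the information that would be lost by stripping zeros.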


To clarify: unlike GNU Units, we will (most likely, initially) not support prefixes for factors, nor automatic resolution of "combined" units. That means that second, metre, hour, m/s and km/h, g and kg will all have to be defined separately and manually, with their respective conversion factors. This also means they will all show up separately in your dropdown list for picking units, and if localization is needed, translations have to be created for all of these separately. -- Duesentrieb (talk) 13:42, 26 November 2013 (UTC)
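
Editorial sketch: a flat unit table of the kind described above might look like this. Every unit, including prefixed and combined ones, is its own entry with a manually supplied factor; nothing is derived from prefixes or from other units. The `Unit` class, the `UNITS` table and the function names are hypothetical, not the actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    label: str       # would need a translation per language
    dimension: str
    to_base: float   # factor to the dimension's reference unit


# Each entry is defined separately, as described in the comment above.
UNITS = {
    "s": Unit("second", "time", 1.0),
    "h": Unit("hour", "time", 3600.0),
    "m": Unit("metre", "length", 1.0),
    "g": Unit("gram", "mass", 0.001),
    "kg": Unit("kilogram", "mass", 1.0),
    "m/s": Unit("metre per second", "speed", 1.0),
    "km/h": Unit("kilometre per hour", "speed", 1000.0 / 3600.0),
}


def convert(amount: float, src: str, dst: str) -> float:
    """Convert between two units of the same dimension via the reference unit."""
    a, b = UNITS[src], UNITS[dst]
    if a.dimension != b.dimension:
        raise ValueError(f"cannot convert {a.dimension} to {b.dimension}")
    return amount * a.to_base / b.to_base


def units_for(dimension: str) -> List[str]:
    """Unit symbols that would appear in the dropdown for one dimension."""
    return sorted(sym for sym, u in UNITS.items() if u.dimension == dimension)


if __name__ == "__main__":
    print(convert(36, "km/h", "m/s"))
    print(units_for("speed"))
```

The cost of this design is visible in the table itself: "km/h" carries its own factor even though it could be computed from "m" and "h", and each label needs its own translations.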

That's pretty much what I had expected. If possible, it would be nice to have the units grouped in two sets, either "SI / other" or "frequent / other"; that way we wouldn't go over 5-10 units per dimension and per list. The unit triage is complete up to "Time" (8 dimensions); maybe it is better to wait to see the first implementation before continuing. --Micru (talk) 23:06, 26 November 2013 (UTC)

Currency

These (and every other currency - see Special:WhatLinksHere/Q8142) are essential for the Global Economic Map task force. Filceolaire (talk) 19:42, 14 November 2013 (UTC)

I think we need at least every currency in ISO 4217, as the most accurate statistics are usually provided in local currency. We could arguably convert everything into US dollars, but that would cause a substantial loss of information. Is that possible? I do not think we need any hardcoded conversion, as most exchange rates fluctuate at least from time to time (of course, 1 USD is always 100 cents, but I do not think we need to support cents). --Zolo (talk) 10:11, 15 November 2013 (UTC)
I think we should use the currency the source uses. A tricky one may be the Faroese króna (Q191068) and others with the same kind of history. -- Lavallen (talk) 10:29, 15 November 2013 (UTC)
All currencies should have a time and place qualifier. Prices fluctuate in space and time, and without that information the numbers are almost meaningless. --Tobias1984 (talk) 10:54, 15 November 2013 (UTC)
I do not think the same system that works for scientific unit conversion can also handle currencies. If you add conversion factors with time qualifiers, then you might end up with a huge number of entries and still no good results. I would definitely use the currency the source uses. Some exchange rates fluctuate quite a lot (e.g. bitcoins), and in order to make a reasonable conversion one would need to know the exchange rate at the relevant time. Therefore one needs a lot of data on past exchange rates. If someone forgot to specify the time, or the relevant information on exchange rates is missing, it just does not work. Converting cents to dollars is far easier. --Debenben (talk) 21:32, 19 November 2013 (UTC)
The way they could work is by considering currency and year as a single unit. Saying "US dollars" or "Yen" in general is not the same as saying "US dollars (1953)" or "Yen (2003)". You can even make conversions between the latter, but it is such a complicated and controversial issue (see w:Measuring economic worth over time and MeasuringWorth) that it might be easier to only allow conversion between currencies for the same year and leave worth adjustments to external tools. Currency is probably one of the most complicated dimensions to handle, and it might require more development time than others. I am inclined to leave it for the end. --Micru (talk) 22:19, 19 November 2013 (UTC)
Yes, if the quantity + unit datatype requires convertibility, then I do not think we can realistically use it for monetary data. Even in the medium term, that seems much too complicated and, for various reasons, not nearly as objective as a mile-to-kilometre conversion. I guess we can use a "currency" qualifier instead. That may not be as convenient to use, but on the upside, it does not impose any restriction on the units that can be used, and we will probably need qualifiers for the date anyway. --Zolo (talk) 22:32, 19 November 2013 (UTC)
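
Editorial sketch: the "currency and year as a single unit" idea from this thread could be modelled as below, with conversion allowed only within the same year and refused otherwise. Everything here is hypothetical: the `CurrencyYear` type, the `convert` function, the placeholder codes "AAA" and "BBB", and the rate of 2, which is an invented number and not a real exchange rate.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Tuple


@dataclass(frozen=True)
class CurrencyYear:
    code: str  # e.g. an ISO 4217 code
    year: int


# (from_code, to_code, year) -> units of to_code per unit of from_code
Rates = Dict[Tuple[str, str, int], Decimal]


def convert(amount: Decimal, src: CurrencyYear, dst: CurrencyYear,
            rates: Rates) -> Decimal:
    """Convert between currencies of the same year only.

    Cross-year conversion (worth adjustment) is refused on purpose and
    left to external tools, as suggested in the thread.
    """
    if src.year != dst.year:
        raise ValueError("cross-year conversion is not supported")
    if src.code == dst.code:
        return amount
    key = (src.code, dst.code, src.year)
    if key not in rates:
        raise LookupError(f"no rate for {key}")
    return amount * rates[key]


if __name__ == "__main__":
    # Placeholder rate table with an invented value, for illustration only.
    rates: Rates = {("AAA", "BBB", 1953): Decimal("2")}
    print(convert(Decimal("10"), CurrencyYear("AAA", 1953),
                  CurrencyYear("BBB", 1953), rates))
```

A missing rate raises an error instead of returning a guess, which mirrors the concern that conversions without the relevant exchange-rate data "just do not work".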

Years

There are different kinds of years, mainly:

In my opinion the most important is the Julian year, 365.25 days of 86,400 SI seconds each, because it is the one used in astronomy and it has a regular length (the calendar year has leap years). Since we want years for timekeeping, we could simplify: just use the Julian year and base (timekeeping) decades, centuries, etc. on multiples of it. What do you think? --Micru (talk) 11:17, 22 November 2013 (UTC)

  Done --Micru (talk) 11:47, 23 November 2013 (UTC)
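
Editorial sketch: the arithmetic behind the proposal, with the Julian year as the reference and longer timekeeping units as plain multiples of it. The table `YEARS_PER_UNIT` and the function `to_seconds` are hypothetical names used for illustration.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_JULIAN_YEAR = 365.25

# 365.25 days of 86,400 SI seconds each.
JULIAN_YEAR_SECONDS = int(DAYS_PER_JULIAN_YEAR * SECONDS_PER_DAY)

# Longer units as multiples of the Julian year.
YEARS_PER_UNIT = {
    "year": 1,
    "decade": 10,
    "century": 100,
    "millennium": 1000,
}


def to_seconds(amount: int, unit: str) -> int:
    """Duration in SI seconds for a whole number of Julian-year-based units."""
    return amount * YEARS_PER_UNIT[unit] * JULIAN_YEAR_SECONDS


if __name__ == "__main__":
    print(JULIAN_YEAR_SECONDS)
    print(to_seconds(1, "century"))
```

Because every unit reduces to a fixed number of seconds, these are usable as durations; they are not calendar dates, which have leap years.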

You are doing a very good job! For time units, is it possible to add the Julian day (or one of its variants)? It is used in the definition of orbital elements in multiple-star or extrasolar systems. --Paperoastro (talk) 09:22, 27 November 2013 (UTC)

AFAIK, there are two different approaches to time: either to represent a specific point in time, or to represent a duration. Julian Day Numbers (JDN) are a bit special since they are both: they record a span of time since a pre-set moment in order to represent a date. In my opinion they should be implemented as a calendar representation of the Time datatype (maybe it would be worth mentioning that on Bugzilla8385 Bugzilla57704). If you want to use them to represent a duration, then you can either use a standard 24 h day count or use start/end dates in JDN representation. --Micru (talk) 15:58, 27 November 2013 (UTC)
Yes, you are right! It should be represented by the time datatype. Thanks for the suggestions! --Paperoastro (talk) 08:29, 28 November 2013 (UTC)
Sorry, I was not pointing to the right bug; this is the one: Bugzilla57704 --Micru (talk) 09:29, 28 November 2013 (UTC)
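
Editorial sketch: the dual nature of Julian Day Numbers (a date expressed as a day count, and a duration as the difference of two counts) can be shown with Python's standard `datetime.date`, which uses the proleptic Gregorian calendar. The offset constant and the function names are hypothetical helpers; this is not how the Time datatype is implemented.

```python
from datetime import date

# date.toordinal() counts 0001-01-01 (proleptic Gregorian) as day 1.
# Adding this offset gives the Julian Day Number of that calendar day.
JDN_OFFSET = 1_721_425


def to_jdn(d: date) -> int:
    """Julian Day Number of a proleptic Gregorian calendar date."""
    return d.toordinal() + JDN_OFFSET


def from_jdn(jdn: int) -> date:
    """Proleptic Gregorian calendar date for a Julian Day Number."""
    return date.fromordinal(jdn - JDN_OFFSET)


def duration_days(start: date, end: date) -> int:
    """Duration in whole days as the difference of two JDNs."""
    return to_jdn(end) - to_jdn(start)


if __name__ == "__main__":
    print(to_jdn(date(2000, 1, 1)))
    print(from_jdn(2_451_545))
    print(duration_days(date(2013, 1, 1), date(2013, 11, 27)))
```

Astronomical usage often needs fractional Julian dates starting at noon; the sketch deliberately stays with whole day numbers.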

Multiple or fraction of unit

Is it possible to avoid saving data as a fraction or multiple of a unit, like kilometre or milligram? The best solution would be to have an interface that allows selecting a multiple or fraction of a unit, while behind the scenes the value is saved in a scientific format. This would help later with the conversion work in the templates on Wikipedia. And I propose that the developers avoid building a conversion system inside Wikidata: a Lua template can do that job, and this will leave the flexibility of data formatting to the Wikipedia users. Snipre (talk) 10:48, 23 November 2013 (UTC)

From what I understood from this conversation, the numbers are stored fully expressed, with the biggest number being 10^{126}-1, enough to represent the volume of the universe in cubic metres (3.5×10^{80} m³) and much more. Of course those numbers can be converted later on with templates, and I hope there will be support for scientific notation. --Micru (talk) 11:18, 23 November 2013 (UTC)

Did you look at QUDT?

I know I am coming very late to this game, but rather than doing all of this work by yourselves, manually, did you consider trying to do a mass import of the QUDT ontologies? I am not an expert, but I think that it has a very rich set of dimension and unit definitions. And the fact that it is a Semantic Web ontology would make it relatively easy to import, I would think. I checked your mailing lists, and I only see two mentions of QUDT. Klortho (talk) 06:21, 23 December 2013 (UTC)
