Wikidata talk:WikiProject Companies



Industry classifications

 
WikidataCon 2021 Industry classifications

There was a helpful presentation and discussion at WikidataCon 2021 about how to move forward with industry classifications. The present property P452, industry, can be informal, I think. But there are detailed official classifications for industries -- ISIC, NACE, NAICS, SIC, and so forth, and if we can identify a company as being in such categories, e.g. the category NAICS 21 (Q6952268), that will be helpful for machine reasoning. This property would suit, I think: NAICS code (P3224). The claim can sometimes be cited to an official source, either the company/organization or a government classification. There are complications -- levels of detail, international differences, change over time. The effort overlaps with WikiProject Organizations and WikiProject Occupations. It was suggested to have a workshop to discuss alternatives and move forward with proposals. I'd like that a lot. -- econterms (talk) 19:53, 30 October 2021 (UTC)

Indeed, thanks to Fjjulien for giving this talk! Industry classifications are an area of interest for me as well. I've been working with company-related items a lot over the past couple of years in support of Name Suggestion Index (Q62108705). NAICS-based statistics have been a very tempting benchmark to gauge OpenStreetMap's coverage of points of interest, but first OSM needs to develop a more comprehensive correspondence between NAICS and OSM tags and also fill in some of the gaps in OSM tagging standards. There may be opportunities to tie into Wikidata's own modeling of industry classifications.

I had the pleasure of chatting with econterms after WikidataCon to better understand how NAICS codes are applied. There's a lot more nuance than appears at first sight, but also a lot of opportunities. It would be wonderful to organize a workshop to work through some of these challenges.

 – Minh Nguyễn 💬 01:45, 1 November 2021 (UTC)

@Mxn I am glad to hear you found this presentation useful. One thing I may have failed to explain clearly is the kind of entities to which industry classification codes are assigned. NAICS, NACE and ISIC codes are attributed to "establishments" based on their primary economic activity. "Establishments" can be entire businesses/organizations or units/divisions within organizations. Points of interest in OSM are fuzzy concepts that do not always make a clear distinction between a physical bricks-and-mortar building and the organization operating it. The building itself has no agency and therefore cannot be deemed to conduct an economic activity. The organization running the building, on the other hand, does conduct an economic activity and can be assigned a NAICS code by Statistics Canada's Business Register (or any other national agency responsible for attributing industry codes to businesses/organizations). Consequently, I would not recommend attempting to map industry codes to OSM tags. Rather, we should map industry codes to Wikidata items describing subclasses of organizations. Fjjulien (talk) 02:40, 22 September 2022 (UTC)
@Fjjulien: OSM's tags indeed run the gamut from describing facilities (many of the amenity values) to describing economic activities that happen to be located in facilities (many of the craft values). The incomplete NAICS–OSM tagging correspondence has already come in handy on countless occasions when mappers faced uncertainty about how to tag something, so fleshing the table out further would be of benefit to the OSM community. But I think that's largely because, compared to OSM's organic tagging system based on a sort of English pidgin, NAICS is more coherently designed and its industry names often more intuitive to laypeople. (Just to be clear, no one is suggesting to tag individual POIs with their NAICS codes.) Minh Nguyễn 💬 05:39, 23 September 2022 (UTC)
  • @Econterms, Mxn: From the property proposal discussion for NAICS code (P3224) - "In theory, items about individual businesses could use this property, but I don't think it should work like that—rather, the code should be applied to the item about the sector, and then the item about the sector used to describe the item about the business." - given that this code is now an external identifier, it by definition should only be used on the sector, not (at least not as a direct property) on items about specific organizations, despite the first example shown on the property page (which should be deleted). ArthurPSmith (talk) 17:22, 1 November 2021 (UTC)
  • @Econterms, Mxn: If necessary we could have a second property to apply these codes to organizations - it would need to be string-valued, not external-id though; see the distinction between Library of Congress Classification (works and editions) (P8360) (string-valued) and Library of Congress Classification (P1149) (external-id). ArthurPSmith (talk) 17:24, 1 November 2021 (UTC)
    @ArthurPSmith: I'm entirely comfortable with the practice of tagging NAICS code (P3224) on items about sectors rather than individual organizations. econterms was also concerned about NAICS being a code for places of business, not the businesses themselves. Ideally, a headquarters location and a factory would link to the NAICS sector item, but linking the organization to the NAICS sector item could be a less precise first pass for organizations whose facilities don't have separate items yet. Minh Nguyễn 💬 18:11, 3 November 2021 (UTC)
    All of this would be contrary to how the European and international classification systems are modeled, according to Fjjulien's presentation, but as long as that reflects the usage of these systems in the real world, then I see no problem with that inconsistency. Minh Nguyễn 💬 18:13, 3 November 2021 (UTC)
    @ArthurPSmith, @Mxn: I am personally not in favour of stating industry codes at the individual organization level, for the following reasons:
    • We will never have coverage exhaustive enough to be able to interrogate this data;
    • The attribution of industry codes by users who are unfamiliar with them would lead to data quality issues;
    • Industry code attribution is not a statement that can be sourced - the only way to verify such a claim would be to look for supporting statements (such as field of work) and to visit the organization's website to learn more about them.
    For these reasons, I would rather recommend that NAICS strings be stated in class items describing types of organizations and/or that NAICS ID be stated in items describing the exact NAICS classifications themselves (which would require creating a Wikidata item for each of the hundreds of NAICS classifications). Fjjulien (talk) 02:54, 22 September 2022 (UTC)
I anticipate following up here to organize a workshop as was suggested. Today I'm presenting this issue to some US government statistical staff to get their views. -- econterms (talk) 15:10, 3 December 2021 (UTC)
@Econterms Where does this stand? I see there is a suggestion that the NAICS code could just be applied to a sector item, e.g. battery industry. I have some concern about that - if someone classified a coal company with the sector "mining," which in turn had the NAICS value of 21, and someone else added a more specific industry such as "coal mining," which has the NAICS code of 2121, then to get the fidelity I wanted I would have to query all the sectors and their respective NAICS codes. I would like to be able to set and retrieve the NAICS (and/or other categories) directly on a specific company; then I could use the NAICS code to derive the sector information at whichever level of fidelity I wanted. If a sector was only applied by name it would be harder to enforce any taxonomical hierarchy - also complicated by the different category names in different systems. Some people have been using NAICS as an identifier, e.g. Juniper Networks, but I assume this is wrong - because it is not a one-to-one relationship, it is closer to a "member of" relationship. Is there some way to enter a SIC, NAICS, and/or GICS value directly on the company item - much like people have tried to do with the identifiers - but rather as a property? Could there be something like "has sector category" of type "NAICS" with the value "35467" and then handle multiple categorizations? Korimako (talk) 16:44, 2 April 2022 (UTC)
@Econterms would it be a bad move to simply add NAICS code as a statement (not identifier) on companies? It seems like a straightforward approach - are there any objections? If not, I would like to go ahead and do it for at least the companies I am currently concerned with (the top US companies ordered by revenue). The current industry statement is too vague to be useful. Having the NAICS code would not prevent other categories from also being used, and in and of itself it is useful to know even if it contradicts other industry/sector classifications. Korimako (talk) 04:08, 3 April 2022 (UTC)
@Korimako: What do you mean by "add NAICS code as a statement (not identifier)"? You need a different property with a different data type for that to work, you can't use the same property, if I understand what you mean. ArthurPSmith (talk) 17:13, 4 April 2022 (UTC)
@ArthurPSmith I took a screen shot of what I was thinking here:
https://drive.google.com/file/d/1buBGkYTO3RA3Vod50UYf0gEKR03M0F_q/view?usp=sharing
I went to the Coca-Cola page and clicked "Add statement", chose "NAICS code" as the property and added 312111 as the NAICS code value. I didn't save it, so you won't see it there, but that is what I was suggesting (from a position of ignorance!) Korimako (talk) 00:58, 6 April 2022 (UTC)
Yeah, if you actually save it and reload the page you'll see it's among the identifiers, not the regular statements. Doesn't work that way, sorry. ArthurPSmith (talk) 13:41, 6 April 2022 (UTC)
The industry property is not bad in and of itself. Now, I am of the opinion that it should be tightened to allow fewer types of values. It could even be used only to state Wikidata items that are instances of industry classification (Q2976602) or of economic activity (Q8187769) (see performing arts (Q29586005) as an example). This modelling strategy would result in this kind of statement:
Of course, implementing and maintaining entire industry classifications in Wikidata would require considerable time, but I'm pretty sure these classifications are available in JSON-LD, XML, CSV or some other format that could be quickly loaded to Wikidata. Fjjulien (talk) 03:05, 22 September 2022 (UTC)

My very strong opinion is the same as ArthurPSmith: classification codes should be externalID, and attached to industries not to companies.

  • An encyclopedic KG like WD should not bow to any one classification: rather, it should integrate them and point to them.
  • If you read https://www.wikidata.org/wiki/Property_talk:P4496, there are lots of examples of defects in WD items when they bow to a single classification (in this case NACE).
  • There are not tens of industry classifications but hundreds! Just dumping them in WD does not add any value.
  • Since industry classifications differ greatly by granularity (and indeed purpose), effectively merging/harmonizing them in WD is a difficult task, but valuable indeed.
  • There are a number of coreference tables by UNStats that can be leveraged for this purpose. Comment here if you are volunteering for such tasks.
  • From this point of view, the item NAICS 21 (Q6952268) is unfortunate (but ok, there's a Wikipedia page for it): instead, NAICS "21" should be attached to mining (Q44497).

--Vladimir Alexiev (talk) 09:08, 29 February 2024 (UTC)
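
For anyone who wants to try the sector-based modelling discussed in this thread, here is a minimal query sketch. It assumes the NAICS code (P3224) sits on the industry item and that companies point to that item via industry (P452); the prefix "21" is just the mining example used above, and coverage of both properties is probably far from complete. Filtering on a code prefix is one possible way to pick the level of detail without storing the code on each company.

```sparql
# Sketch: companies reached through their industry item's NAICS code.
# Assumes NAICS code (P3224) is stated on the sector item, not on the company.
SELECT ?company ?companyLabel ?sector ?sectorLabel ?naics WHERE {
  ?sector wdt:P3224 ?naics.            # sector item carries the NAICS code
  ?company wdt:P452 ?sector.           # company is linked via industry (P452)
  FILTER(STRSTARTS(?naics, "21"))      # "21" also matches 211, 2121, and so on
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

Changing the prefix to "2121" would narrow the result to the more specific industry, provided such sector items exist and carry the code.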

Total (Gross) Revenue vs Net Revenue

I have been entering the revenues for the top US companies (see above) and started to notice a discrepancy between what I was entering and what I saw in the Wall Street Journal. I was looking in the 10-Ks for the statement of income or statement of operations and taking the revenue or sales information from there. But I wasn't distinguishing between net and gross revenue figures. For example, Fannie Mae's 10-K shows USD 25,328 million for "Net Revenue" in 2020 but the WSJ [1] lists Fannie Mae's 2020 revenue at USD 109,451 million. I cannot find where WSJ (or Forbes or Fortune) found these higher figures. Can we just use WSJ as the source? I think we need to be consistent and show Gross Revenue, but I do not know where to find it if it isn't listed in a 10-K. Could we have two statement types - one for Gross Revenue and one for Net Revenue? --Korimako (talk) 23:49, 20 November 2021 (UTC)

Revenue is a bit problematic for financial institutions. Some accounting standards tend to prefer gross revenue (thus including gross interest income, for example Chinese banks) while some prefer net revenue (including net interest income only, for example EU banks). Fannie Mae's 10-K is reporting net revenues (on page 67), but gross interest revenue is reported as well (p. 70). WSJ is trying to produce comparable figures, thus performing its own calculation of gross revenue (gross interest income+investment income+trading income+fees) using figures from FM 10-K. It cannot be said that one method of calculation (gross interest income vs net) is right or wrong. It would be useful to indicate the method of calculation (as qualifier), but we should stick to primary sources and not make our own calculations even at the cost of losing comparability and consistency.--Jklamo (talk) 01:46, 21 November 2021 (UTC)
Ahh - oops, I just read this, and I had followed the WSJ formulation. I see your point though. I can change it back to the simpler figure that is listed on p. 70. Thanks for your thoughtful answer. I have been keeping a spreadsheet of this revenue and board project as I go, and am hoping to recruit others to it soon. You can see that I have also performed additions on two other companies to keep them in line with Fortune and/or WSJ. Maybe I will have to rethink. Korimako (talk) 03:31, 24 November 2021 (UTC)
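
To see how a calculation-method qualifier like the one suggested above could be read back, here is a rough query sketch. The property IDs are not from this discussion and should be double-checked before use: it assumes revenue is stored as total revenue (P2139), dated with point in time (P585), and that the method would be recorded with determination method (P459). Whether P459 is the right qualifier for gross versus net is an open modelling question, not an established convention.

```sparql
# Sketch: list revenue statements with their date and, where present,
# a qualifier describing how the figure was determined.
# P2139, P585 and P459 are assumptions; verify them before relying on this.
SELECT ?company ?companyLabel ?revenue ?date ?methodLabel WHERE {
  ?company p:P2139 ?statement.                 # full statement node, so qualifiers are reachable
  ?statement ps:P2139 ?revenue.                # the revenue amount itself
  OPTIONAL { ?statement pq:P585 ?date. }       # reporting date, if given
  OPTIONAL { ?statement pq:P459 ?method. }     # calculation method, if given
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
```

Statements where the method column comes back empty would be the ones whose gross or net basis is still undocumented.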

Data donation of companies data with links to press clippings and reports (first half of 20th century)

As a result of ZBW's data donation from its 20th Century Press Archives (PM20), 3897 items were supplemented with links to PM20 folders of digitized reports and press clippings, and 5085 items with such links were created from scratch. The PM20 metadata was added to the linked items, e.g. about location (map), industry (list by NACE), or inter-company relationships (example). You can find more about the matching process, details about the metadata added and in particular the application of industry classification in this ZBW Labs blog entry. --Jneubert (talk) 10:18, 14 December 2021 (UTC)

Businesses and role

Hello,

Over at Wikidata:WikiProject_Video_games we deal with entities − in general companies − that make (video game developer (Q210167)) and publish (video game publisher (Q1137109)) games. For example, Stray (Q96247255) has developer (P178) BlueTwelve Studio (Q113156099) and publisher (P123) Annapurna Interactive (Q38805988).

My question is about the instance of (P31) to use on these companies. Historically we have been using video game developer (Q210167) and video game publisher (Q1137109), and they are by far the most common pattern (see queries below [note that plenty of companies do both development and publication]). We also have plenty of constraints to enforce that modelling on external identifiers for such entities (see e.g. TheGamesDB publisher ID (P7642) or OGDB company ID (P7570)).

I have noticed using such P31s runs afoul of the recommendations of this project, which recommend to simply use, I believe, business (Q4830453) (the difference with enterprise (Q6881511) or company (Q783794) is a bit beyond me ^_^). That’s fine − but how then are we supposed to express that Annapurna Interactive (Q38805988) is a video game publisher (Q1137109)? We already use industry (P452) with video game industry (Q941594) ; product or material produced or service provided (P1056) feels a stretch and could apply to both developers and publishers (depending how you might interpret 'produced') ; occupation (P106) is only scoped to people… Thoughts? Jean-Fred (talk) 08:36, 28 September 2022 (UTC)

#title: Most common P31 for entities that develop games
SELECT ?type ?typeLabel (COUNT(?game) as ?games) WHERE {
  ?game wdt:P31 wd:Q7889.
  ?game wdt:P178 ?developer.
  ?developer wdt:P31 ?type.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} GROUP BY ?type ?typeLabel
ORDER BY DESC(?games)

#title: Most common P31 for entities that publish games
SELECT ?type ?typeLabel (COUNT(?game) as ?games) WHERE {
  ?game wdt:P31 wd:Q7889.
  ?game wdt:P123 ?publisher.
  ?publisher wdt:P31 ?type.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} GROUP BY ?type ?typeLabel
ORDER BY DESC(?games)

Jean-Fred (talk) 08:36, 28 September 2022 (UTC)

Some companies, such as Croteam, are known for both developing their own games and publishing others; some, like THQ Nordic, are known for just publishing; and others are only known to develop games rather than publish them. I'm perfectly fine with how things are now. --Trade (talk) 21:02, 9 October 2022 (UTC)
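
As a rough way to see how many companies currently carry both roles under the existing P31 pattern described in this section, a query along these lines might help. It is only a sketch: it relies on instance of (P31) being set to both video game developer (Q210167) and video game publisher (Q1137109), so it will miss companies modelled with business (Q4830453) or similar.

```sparql
#title: Entities that are both video game developer and video game publisher
# Sketch: only catches items that state BOTH classes directly via P31.
SELECT ?company ?companyLabel WHERE {
  ?company wdt:P31 wd:Q210167, wd:Q1137109.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 200
```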

Request comment on Wikimedia Foundation as a financial model and test case

Bluerasberry (talk) 19:19, 5 April 2023 (UTC)

Add more info about Registers

https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Companies/Archive_3#More_Registers outlines a number of tasks to get more company/org register data into WD.

  • This is important because such registers are the "data anchors" for company existence and basic information.
  • Any volunteers to work on this?

Vladimir Alexiev (talk) 09:14, 29 February 2024 (UTC)

(Nominal) share capital and authorised capital

share capital (Q330601) and authorised capital (Q144368) got mixed up in some language descriptions. Likewise, nominal share capital (P8247) and authorised capital (P12651) got mixed up, because the former has existed longer and was used instead of the latter, including in many infoboxes on local wikis. I need help from project participants in cleaning those up. Best, Fordaemdur (talk) 11:01, 21 April 2024 (UTC)
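
One possible starting point for the cleanup is a worklist of items that use nominal share capital (P8247), showing authorised capital (P12651) next to it where it exists. This is only a sketch: the query cannot tell which values were entered under the wrong property, so each hit would still need a manual check against its source.

```sparql
# Sketch: worklist for reviewing P8247 values that may belong under P12651.
SELECT ?item ?itemLabel ?nominal ?authorised WHERE {
  ?item wdt:P8247 ?nominal.                      # nominal share capital
  OPTIONAL { ?item wdt:P12651 ?authorised. }     # authorised capital, if already stated
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 200
```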
