Wikidata talk:WikiProject Companies


@Jklamo: What about centralizing our discussion about companies here? To your recent change at Q335415 I have a remark: pulp and paper industry (Q2283886) is the industry, papermaking (Q335415) is the process.--Kopiersperre (talk) 16:20, 12 October 2016 (UTC)

Organizations generally?

I was thinking about starting a wikiproject on "organizations" and noticed this one. Is there an interest in generalizing this project to nonprofit, educational organizations, government agencies etc? I believe the Legal Entity Identifier (P1278), for example, covers a lot more than just companies. I was going to work on getting that more fully populated (only 77 ids so far). ArthurPSmith (talk) 10:10, 10 November 2016 (UTC)

There is, provided we don't get swamped with school boards and sports teams. Many customers need not just company data but also org data (eg who legislated what when, or which govt agencies are working in the particular domain). Eg DnB data includes all kinds of gov departments (so the biggest Global Ultimate parent is the US government, then the Chinese government). --Vladimir Alexiev (talk) 17:37, 23 February 2017 (UTC)
On the other hand, GLEI is mostly companies, and a lot of them in the financial sector. --Vladimir Alexiev (talk) 17:39, 23 February 2017 (UTC)

Just how much data does Wikidata want to hold?

Not sure I understand Wikidata scope. For example, will WikiData follow Wikipedia and exclude most smaller companies from coverage, or will WikiData cover every company (or at least every public company)?

For the U.S., perhaps other countries, there is already highly structured financial statement data for all public companies. See for example:

Of course quarterly financial statements are only the tip of the iceberg. Will there be 8K (news releases) included in WikiData, currency reporting in real time, last sale transaction by transaction, bids and offers, etc.?

To cover companies you need financials plus market data (last sale, closing price, bids/offers), plus a database of who is trading (Legal Entity Identifiers). All the notes to the financials and all the text filings are very important too. Will WikiData swallow the entire thing? Much of the usage of the financial data sets is high speed; does it make any sense to try to load it into WikiData rapidly?

Rjlabs (talk) 00:14, 5 December 2016 (UTC)

  • @Rjlabs: good questions. I don't believe wikidata as it stands is suited to capture time-series data generally - for example stock prices over time - there's no datatype appropriate for time series other than quantity datatype with qualifiers, which would mean one statement for every point in time, which is really really messy. However, as far as containing some data about "every company", I think that may legitimately be within wikidata's scope. WD:N basically requires only that the entity be something in the real world that is described by reliable sources, so as long as we have some third party dataset with appropriate information, and with a reasonably compatible license, we can pull that in. Or portions of it depending on the license. I've been working with the GLEIF data for US entities via Mix N' Match - here - and found a huge number of entities there which aren't in wikidata but maybe could be. I've marked all the mutual fund/hedge fund/retirement fund entities in GLEIF as NOT suitable for wikidata - there are a lot of those as well and I doubt we really need a wikidata entity for every "municipal bond index fund" or whatever. Of course a large fund that is covered in wikipedia for some reason would be fine. Also not everything else in GLEIF is a company - there are a lot of churches, cities and counties, private colleges and universities, etc. and there are a few individuals. A lot of the "small companies" in GLEIF seem to be related to real estate - property management or rental etc. So I think by-hand filtering is still appropriate there but in the long run a good fraction of those small companies probably ought to be in wikidata too. ArthurPSmith (talk) 14:08, 5 December 2016 (UTC)
Although WD notability policy is broad enough to cover all companies, I think we have no human resources to maintain millions of items about companies. But I have no problem with having items about all listed companies and other notable companies (by means of enwiki notability).
Even if we have market capitalization (P2226), I think it is better to use it on an annual (or quarterly) basis, rather than a daily basis. About total revenue (P2139) (etc.) I think WD is able to swallow quarterly financial statements.
For ownership changes (parent organization (P749) and owned by (P127)) it will be useful to store only sizable changes, not purchases of a few stocks by management. But when we are talking about ownership, it will be useful to clarify the use of these two properties first.--Jklamo (talk) 10:36, 6 December 2016 (UTC)

Number of customers property

Often when I read WP articles on subscription based companies (e.g. telecommunication companies, newspapers, pay-tv channels, etc.) the first sentence in the WP article is something like 'company XYZ is the largest company (based on number of customers)'. Seems like a pretty obvious piece of company information. Unfortunately I didn't find a way to get that, often detailed, information into wikidata.

So I was wondering: is there really no wikidata property for 'Number of customers'?

Did I miss something?

Any hint appreciated..

Givegivetake (talk) 21:26, 16 January 2017 (UTC)

hmm, closest I can think of for this right now is has parts of the class (P2670) with value customer (Q852835) and a qualifier with the count, but that doesn't seem right. I think you should propose a new property for number of customers... ArthurPSmith (talk) 13:50, 17 January 2017 (UTC)
Number of subscribers would be more fitting. The number of customers of a normal newspaper that don't subscribe is likely unavailable. ChristianKl (talk) 16:03, 3 February 2017 (UTC)
Thanks for your feedback! I added a first proposal at https://www.wikidata.org/wiki/Wikidata:Property_proposal/Number_of_subscribers Looking forward to the discussion! Givegivetake (talk) 20:01, 25 February 2017 (UTC)
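To make the workaround mentioned above concrete, here is a minimal Python sketch of a has parts of the class (P2670) statement carrying a count as a qualifier. The dictionary layout is a simplified, hand-written stand-in and not the actual Wikibase JSON format; the choice of quantity (P1114) as the qualifier, the count value, and the helper name `customer_count` are all assumptions made for illustration.

```python
# Simplified, hand-written representation of one statement; this is NOT the
# actual Wikibase JSON format. Using P1114 (quantity) as the qualifier and the
# count itself are assumptions made for illustration.
statement = {
    "property": "P2670",                 # has parts of the class
    "value": "Q852835",                  # customer
    "qualifiers": {"P1114": 1_200_000},  # hypothetical number of customers
}


def customer_count(stmt):
    """Return the qualifier count if the statement follows the pattern above."""
    if stmt.get("property") == "P2670" and stmt.get("value") == "Q852835":
        return stmt.get("qualifiers", {}).get("P1114")
    return None


print(customer_count(statement))
```

The indirection through a qualifier is what makes this pattern feel awkward compared with a dedicated property holding the number directly.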

Sources of Data and How Much is Too Much

My impression is that WP/WD are not "business friendly" and have 150-200k true "Companies" (the Company class includes Independent Cities and things that we won't normally consider companies). --Vladimir Alexiev (talk) 15:22, 16 February 2017 (UTC)

I think the issue isn't so much "business friendliness" as a lack of open online resources that provide reliable information about companies. The GLEI data is a good starting point. Also there is a bit of a history of small companies creating their own wikipedia/wikidata entries as a form of advertising which is frowned upon... Anyway, please join the discussion at the existing wikiproject! ArthurPSmith (talk) 15:34, 16 February 2017 (UTC)

GLEI has 500k, but that's very low. DnB has 280M (not just companies, eg a Walmart store could have a DUNS). OpenCorporates has 127M from 160 registers (there's like 280 registers in the world). We'll be adding the BG Trade Register in a datathon end of March, together with OpenCorporates. My current project is H2020 euBusinessGraph http://cordis.europa.eu/project/rcn/206353_en.html, and we'll likely convert OpenCorporates to RDF.

The question is: WD won't accept 160M companies, and for good reason. Many companies have various registrations in different jurisdictions (even those that don't seek to hide money flows), plenty of them inactive... If you're doing business for 50 years, registrations "grow organically" and get many and messy. Also, I could register a company in 2-3h and then not do anything with it; or consider mom & pop shops with 2-3 employees. So even if WD Notability is lax, we really need some notability conditions, but it's hard to define such. Market cap or turnover could be such conditions, but they are not always available: many registers don't publish them in structured form, and some important companies are not public. --Vladimir Alexiev (talk) 19:39, 23 February 2017 (UTC)

@Vladimir Alexiev: these are good points. OpenCorporates could certainly be considered a valid source of verifiable data, but 160 million+ items would definitely overwhelm things here. It may be up to us to define reasonable criteria for inclusion. Here are some thoughts:
  • If a company has a wikipedia page (in any language) then they certainly should be included (presumably the 200k already has these?)
  • All public companies with stock market listings probably should be included? Do you know how many that includes?
  • Maybe anything that qualifies as more than a "small business" (i.e. $50 million or more in annual revenue, 50+ employees) should be included?
  • Factories, warehouses, research centers or other facilities in different locations might be worth having as separate items if they are particularly notable or have large staff (over 50?)
separate stores and franchises probably shouldn't be included generally though. I think with criteria like this we might be looking at up to several hundred thousand to 1 million company items, which seems reasonable to handle here. ArthurPSmith (talk) 20:38, 23 February 2017 (UTC)

@ArthurPSmith: If a company has a wikipedia page: sure, that proves notability but even that is hard to identify. Eg go to OpenCorporates and search for goldman sachs. Which of the 1.7k entries corresponds to https://en.wikipedia.org/wiki/Goldman_Sachs? OCorp has about 1000 https://opencorporates.com/corporate_groupings that are user-contributed (eg Grouping of Goldman Sachs but seems to be currently broken), but that's a small number. --Vladimir Alexiev (talk) 15:45, 11 March 2017 (UTC)

Or maybe someone just deleted from https://opencorporates.com/corporate_groupings/Goldman+Sachs+Group+Inc? Tracking here https://twitter.com/valexiev1/status/840590062169542657 --Vladimir Alexiev (talk) 15:49, 11 March 2017 (UTC)

Entities vs. Establishments (a/k/a Facilities)

Large corporations typically own numerous subsidiaries. These entities are often spread around the globe. Further, large corporations, through those subsidiaries, frequently own several facilities (or factories), again, typically spread out in different locations. Many establishments (factories / facilities) keep their own set of accounting records, have their own "reputations", pay on time or extend out payments, borrow "locally" based on a local incorporation, have local assets/liabilities/profitability, and expand or contract at their own local rates. This is why DUNS has separate numbers for establishments. Economic reporting relies on the carefully constructed national registry of entities, as does state level and county level economic accounting. Considerable effort is expended maintaining the national registry, including good coverage of entities (facilities) so data may be aggregated properly by industry and geographic boundary (county / city, state, national).

There are numerous company pages currently on Wikipedia of companies with fewer than 500 employees. Likewise there are innumerable establishments that have over 500 employees, yet they do not have dedicated Wikipedia pages. While establishments of over 500 employees may be considered "lower interest" in terms of an encyclopedia, they are of critical interest in terms of industry data, and data at the County and State level. In my opinion, WikiData, given its data (vs. encyclopedia) orientation, should be designed up front to handle well both Companies (a legal concept) and entities, especially since companies have "fuzzy" geo coordinates and are often engaged in dozens of industries. Taxation and regulation of workers and business activities also occurs at the Federal, State and local levels. Legal jurisdiction for liability claims, workers compensation, etc. is often highly dependent on where the establishments are located vs. where the ultimate, top level, consolidated entity happens to be domiciled. Wikidata needs granular data at a low level, that is well structured. The establishments link together upwards to various legal entities (subsidiaries), and the subsidiaries link together upwards to form companies that issue consolidated financial statements. There are a vast number of data uses including different types and levels of aggregations that depend on excellent structuring at the granular level. Rjlabs (talk)

@Rjlabs: your signature here was missing a timestamp - does something need to be adjusted in your settings? Anyway I could see from history this bit was recent. On your comment: I absolutely agree, I think each facility (not necessarily each building, but each corporate entity of significant size at a specific geographic location) should have its own item. I think we do have all the properties we need for this - parent organization (P749) and subsidiary (P355) for example - but maybe you're thinking we need something else here? We're certainly missing the items themselves. For the most part they would not have individual wikipedia pages and so these would be wikidata items without sitelinks. ArthurPSmith (talk) 14:19, 23 March 2017 (UTC)
@ArthurPSmith: Arthur, I'm very much on board with that approach. As for properties and objects I think it's best for WikiData to follow rather than try to lead. I think we should "go to school" on what is already in place. Specifically: 1. LEI (especially good for "company" as a legal entity view, including the new hierarchy of ownership/control), 2. EDGAR (Reverse engineer the current company infoboxes on wikipedia and "refactor" them so they can be auto populated from EDGAR xml (via WikiData) - including anything we can extract in terms of segments, sectors, industries, officers and directors, key executives, major owners, exchanges traded on (including the specific exchange ticker identifiers), etc.) 3. "Reverse engineer" how the US Census / BLS / SS Administration structures its surveys of the economy (various surveys of manufacturing, services, etc.) Specifically how they maintain a very high quality database of establishments that they survey periodically, (and constantly interpolate from those periodic surveys) and cast the sample data into NIPA accounts (for things like quarterly GDP / economic reporting, input/output reporting). I see the challenge here more as a "plumbing project" where we hook up wikidata to other well structured source "wells" vs. trying to reinvent "unique" data structures here, then populate by hand. I'm sure there are additional outside expert sources to tap here as well. International accounting standards is close with their own taxonomy, EU?, UK sources of xml company data?, Worldbank/UN economic accounts schema, etc. To be really valuable, company data (including establishment data) must be disciplined enough to "add up" at the county level, state level, nation...all the way up to global. To do that we need to merge accounting and economic schema. Ultimately some of the high level economic "guesstimates" could give way to just adding up detailed, granular data, no need for statistical sampling and hand editing.
Give us any geo polygon, or industry/sector "polygon" (or both) and have WikiData sum up the most current detailed info for you. (Yes that is a long term goal.) Are you from the U.S.? (we need EU/UK and Asian input too, surely there must be schemas/structured data repositories outside the U.S. that are very important). I'm willing to do some serious investigation into establishment level data in the U.S., and merging accounting and economic schema. Would be great to link up with like minded international counterparts. Rjlabs (talk) 07:01, 26 March 2017 (UTC)
Yes, I'm in the US, definitely agree international input on this would be a big plus. ArthurPSmith (talk) 17:26, 27 March 2017 (UTC)
@ArthurPSmith: got to thinking about your interest in research establishments and their geo locations (as entities in themselves) vs. a company (a different but related ownership entity) which is based on top level consolidated financial statements. This is very similar to a large company having many significant "factories" located in several different U.S. Counties, and spread out internationally. Always the analyst, I was thinking how you could back into a full list of academically/scientifically significant R&D centers (public and private). A few possibilities - one is the "Description of property" section (required under SEC Reg SK and often captioned Item 102) of a 10K report. Those might note major R&D facilities physically (typically down to city and state, particularly if they occupied many square feet of office or lab space). Another possibility - mine Google Scholar, Microsoft Academic Search, ssrn.com, etc. for authors who are well referenced, grab their emails and see how many match a local, geo-locatable domain? Another possibility is reversing patent filings which contain both the inventor, including (typically) residential city and state and assignee (often a company) including city and state. Rjlabs (talk) 18:23, 27 March 2017 (UTC)

European Companies

I'm in Europe. Two relevant initiatives:

  • euBusinessGraph (see above) aims to integrate data on EU companies, but started recently.
  • BRIS allows both national registries to exchange data (eg DE to mirror AT since there is a lot of trade between the two countries), and general customers to get such data. The data is not free though.

We can talk all we want, but at the end of the day the question is what open data sources exist... --Vladimir Alexiev (talk) 08:05, 8 April 2017 (UTC)

External Identifiers used for Companies

Please help: (removed old link because it was lost in an old archive, and copied below by rjlabs)--Vladimir Alexiev (talk) 19:40, 23 February 2017 (UTC)

Trying to find all External Identifier props used for Companies. This finds all external identifier props:

select ?wd ?lab ?desc {
  ?wd wikibase:directClaim ?wdt.
  ?wdt a owl:DatatypeProperty 
  filter (exists{?wd wdt:P31/wdt:P279* wd:Q19847637} # Unique Identifier
      || exists{?wd wikibase:propertyType wikibase:ExternalId})
  #filter exists {[?wdt []; wdt:P31/wdt:P279* wd:Q783794]} # a Company: causes timeout
  ?wd rdfs:label ?lab filter(lang(?lab)="en")
  optional {?wd schema:description ?desc filter(lang(?desc)="en")}
}


But if I uncomment the part about Company, I get a timeout. The bracketry is probably confusing, so we can expand it like this for clarity:

   filter exists {?company ?wdt ?any_prop; wdt:P31/wdt:P279* wd:Q783794}

--Vladimir Alexiev (talk) 17:05, 23 February 2017 (UTC)
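
For anyone who wants to experiment outside the query editor, here is a minimal Python sketch that assembles the query above with the expanded company filter switched on or off. The names `build_query` and `run_query` and the User-Agent string are made up for illustration. `run_query` assumes the public endpoint at https://query.wikidata.org/sparql and is deliberately not called here, since the company-restricted variant is reported above to time out.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (assumed here; prefixes such as
# wd:, wdt: and wikibase: are predefined by that service).
ENDPOINT = "https://query.wikidata.org/sparql"

# Expanded form of the company filter from the discussion above.
COMPANY_FILTER = (
    "  filter exists {?company ?wdt ?any_prop; "
    "wdt:P31/wdt:P279* wd:Q783794}"
)


def build_query(only_companies=False):
    """Return the external-identifier query, optionally limited to companies."""
    lines = [
        "select ?wd ?lab ?desc {",
        "  ?wd wikibase:directClaim ?wdt.",
        "  ?wdt a owl:DatatypeProperty",
        "  filter (exists{?wd wdt:P31/wdt:P279* wd:Q19847637}",
        "      || exists{?wd wikibase:propertyType wikibase:ExternalId})",
    ]
    if only_companies:
        lines.append(COMPANY_FILTER)
    lines += [
        '  ?wd rdfs:label ?lab filter(lang(?lab)="en")',
        '  optional {?wd schema:description ?desc filter(lang(?desc)="en")}',
        "}",
    ]
    return "\n".join(lines)


def run_query(query, user_agent="companies-talk-example/0.1"):
    """Send the query to the endpoint and return the decoded JSON result."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Comparing `build_query()` with `build_query(only_companies=True)` makes it easy to time the two variants separately and confirm which clause causes the timeout.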

Annual reports

Please see Wikidata:Property proposal/annual report.
--- Jura 11:25, 26 February 2017 (UTC)

Franzsimon Kopiersperre Jklamo ArthurPSmith S.K. Givegivetake fnielsen rjlabs ChristianKl Vladimir Alexiev User:Pintoch Parikan User:Cardinha00 User:zuphilip MB-one User:Simonmarch User:Jneubert Mathieudu68 User:Kippelboy User:Datawiki30 User:PKM User:RollTide882071 Kristbaum Andber08 Sidpark SilentSpike Susanna Ånäs (Susannaanas) User:Johanricher User:Celead


  Notified participants of WikiProject Companies
--- Jura 06:56, 5 March 2017 (UTC)

Discussion of Financial Statements

moved from the Annual report property discussion to here because it has broad implications

  •   Comment I'm looking at the entire "company" and related object hierarchy (more here). So, only preliminary ideas for this property. Here you might want to think about "generalizing" to Public financial statement preferably with a URL to the real thing. (perhaps as an entity of its own, not a property). Adding "public" qualifies that the statements issued are publicly accessible without restriction. Private financial statements are of little use to WikiData. Under that you might want periodicity (week/month/quarter/annual - annual only would be very restrictive), typical release date, accounting principles followed (enumerated list with link back to accounting standard setter, and year), dates released. If possible have a Legal Entity Identifier (what legal entity is publishing these financial statements). Would be good to mark statements audited or unaudited. If audited it would be good to have an enumerated list of audit opinions (clean, subject to (specifically stating what that is), going concern questioned, etc.). The auditor should be identified, and the date of the audit opinion. Special care should be taken to extract as much of this as possible from public XML repositories such as EDGAR in the U.S. and 990 Public Tax Filings from the IRS for non profits. Also need to remember that many entities that issue public financial statements are not for-profit entities (cities, municipalities, non profits, PACs, non profits via 990 tax filings, charitable groups, etc.). In centrally planned governments there are often public financial statements of various state owned entities. These vary from limited "production reports" to full, detailed accounting statements. Any "financial data" released to the public in the form of a regular statement is within scope for WikiData. Rjlabs (talk) 19:15, 10 March 2017 (UTC)
    • This is meant for organizations in general, not just companies. I fail to see the advantage of adding "public". If there is an url, it is publicly available.
      A general problem we have with Wikidata:Property proposal/Economics is that many properties are requested and created (see Wikidata_talk:WikiProject_Economics#Sample_items.3F), but not necessarily used. This is possibly due to people attempting to follow schemes without much interest in actually contributing to the project or even the capacity to create sample items with their proposals (Wikidata_talk:WikiProject_Economics#Sample_items.3F). This leads to huge overhead of unused properties.
      A capacity limitation of Wikidata is also that we can't mirror financial statements in their entirety.
      Even with this property, I think people can still create items for specific reports or other reports, but apparently no one is actually interested in doing this (at least, I only found 4 (four) items when I checked). Items for annual reports could easily be cross-referenced with statement is subject of (P805).
      --- Jura 08:50, 11 March 2017 (UTC)
    • @Rjlabs: Your comments show good domain expertise, but you're fairly swamping us with info, and as a result it's unlikely that info will be acted upon. Please look around some other property proposals: they're focused and describe one property only. Please study how some other domains are structured to get a feel of the Wikidata data structure. Certainly read the intro about "claims" and "qualifiers". --Vladimir Alexiev (talk) 11:36, 14 March 2017 (UTC)

Further dialog responding to @Jura1: and @Vladimir Alexiev:

  • Agree if there is a URL, it's public, so no need for that word in the label (provided all instances actually have the URL). Not sure I like "annual" as quarterly (and other time periods) are very important.
  • Would like to hear much more about A capacity limitation of Wikidata is also that we can't mirror financial statements in their entirety I think I've been misinformed on company data scope from other project posters. The lead goal on the Company project page used to be to build a system to rival Bloomberg. (that is a LOT of data!) Who at WikiData/Wikimedia is the final authority on "scope" for company data here, or is there authoritative written guidance?
    • @Rjlabs: There is no authority, like on all wikipedias it's by consensus. You make property proposals (like this one), and they get discussed (but you have muddled this one). I'm not familiar with the Bloomberg data but if it's in-depth data about important companies, maybe that would be appropriate. However, I personally don't believe WD is the place to store full accounting reports in a structured way: URLs to such, and income/networth/profit yes, but all the numbers no. It's not so much about the tech capacity, it's about the people/crowd capacity. WD currently has 25M items: before talking of adding 100M items, you need to have a plan who'll maintain them and use them. A separate WD (WikiBase) instance could be created for that sort of data, like the EC project EAGLE has done for Epigraphy. Of course, it takes effort and enthusiasm to maintain such instance, EAGLE may have died, see https://www.facebook.com/groups/Wikidata.GLAM/permalink/933245130111585/ --Vladimir Alexiev (talk) 15:30, 19 March 2017 (UTC)
  • I would like to create a category: Economic properties and go through the property pages and tag them to category: Economic properties if possible. It would be a great benefit to easily know what has already been established.
  • It strikes me that other standard schema, outside of Wikidata, need to be followed where standardization of data already exists vs. brewing up something quick and "homemade" for one off projects here at WikiData. Company data is pretty complex and very advanced outside of WikiData. How much of that does WikiData ultimately want to try to "improve" upon? Wouldn't it be better, and much more efficient, to identify and merely follow the best external standards/schema? And, to work towards better linking between various standing standards around the globe (further developing concordances, cross references, etc.)
  • In terms of scope. Here is an estimate of data items covering just the basic financial statements in the U.S. alone. 10,000 companies file in xml at EDGAR; each year 3 quarterly statements on form 10-Q, and one annual on form 10-K; each of those contain four main statements (income statement, balance sheet, cash flow statement, changes in equity); guessing each of those have 100 "mainstream" data points. So over a 10 year period that is 160 million data points. Ultimately you need much more than that to include the data in the annual proxy statements (including all the officers, directors, major shareholders, etc) plus the detailed notes to the financial statements (many running more than 100 pages), managements discussion and analysis, and additional Reg SK disclosures etc.
  • What "slice" of that data does WikiData want to store directly, and how is it going to import that automatically from EDGAR on a timely basis? Will the schema at WikiData accommodate the xBRL schema at EDGAR or will a transformation be required? Wouldn't it be better to talk EDGAR into offering a SPARQL node, store and serve up the data "through" WikiData?
  • re please study how some other domains are structured to get a feel of the Wikidata data structure - would like to look at domains that are currently well structured on WikiData that are most similar to the large amounts of "table data" as outlined immediately above. Recommendations as to which domains to study?
  • Would also like to have pointers to tools that "visualize" or at least output the WikiData class hierarchy, class properties, inheritance, specific enumerations, data types, etc. Anything like XML spy documentation generators? Is there an xsd for WikiData I can just load in XML spy and look at it? I know there are some tools available but am totally new here.
  • re Certainly read the intro about "claims" and "qualifiers" Pointers to these? Any pointers to how enumerations are implemented?
  • re you're fairly swamping us with info Sorry, do not intend to be overwhelming. Oddly, I have the same feeling here trying to acclimate. I have limited time to devote to this so I'm really trying to avoid being misinformed or misled (as in the "duplicate Bloomberg here" comment in the company project page.) Rjlabs (talk) 18:51, 14 March 2017 (UTC)
    • The problem is not this discussion, the problem is that it's in the wrong place. I'd suggest cutting it out of here and putting it on the "wikiproject company" discussion page. --Vladimir Alexiev (talk) 15:30, 19 March 2017 (UTC)
    •   Comment interesting comments, but somehow this goes far beyond this (relatively simple) property proposal.
      --- Jura 08:47, 18 March 2017 (UTC)
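
The 160 million figure in the scope estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: every input is a rough guess quoted in the discussion (in particular the 100 data points per statement), and the variable names are made up for illustration.

```python
# Figures quoted in the estimate above; all are rough guesses from the discussion.
companies = 10_000          # companies filing in XML at EDGAR
filings_per_year = 3 + 1    # three 10-Q filings plus one 10-K
statements_per_filing = 4   # income, balance sheet, cash flow, changes in equity
points_per_statement = 100  # guessed number of "mainstream" data points
years = 10

data_points = (
    companies
    * filings_per_year
    * statements_per_filing
    * points_per_statement
    * years
)
print(f"{data_points:,}")  # prints 160,000,000
```

For scale, that is several times the 25M items Wikidata is said to hold in the reply above, before counting notes, proxy statements or other disclosures.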

End of block copied from Wikidata:Property proposal/annual report.

  •   Comment I'm sure eventually you will find people interested in expanding it to the field suggested above, but to start, it might be preferable to try something less ambitious, such as the proposal at Wikidata:Property proposal/annual report. This even if it wasn't primarily proposed for companies, but organizations in general. The resulting structured data might also give you a better basis to formulate more property proposals.
    --- Jura 10:40, 25 March 2017 (UTC)

Related project: WikiProject Universities

Hello!

I have started a somewhat related project, WikiProject Universities, which should overlap this project for private educational institutions. Many of the datasets of interest to our project also cover companies (for instance Open ISNI for Organizations (Q28527677)). Feel free to join, and suggestions are very welcome! Cheers − Pintoch (talk) 09:31, 7 March 2017 (UTC)

Thanks for the ping @Pintoch: I looked at OpenISNI (by Ringgold) and was surprised that amongst 400k entries, there are many non-research orgs (eg Merrill Lynch India, Tata Motors, etc). Lots of overlap so the two projects should definitely collaborate --Vladimir Alexiev (talk) 15:36, 11 March 2017 (UTC)
Great! And thanks for joining! The dataset is on Mix'n'Match (373 and 375), so if you get bored, you can always match things there… My understanding is that most of the institutions there don't have a Wikidata item yet. Not sure if they meet the inclusion criteria though. − Pintoch (talk) 16:28, 11 March 2017 (UTC)

Subsidiaries of multinational companies

Hi, I'm trying to match GRID to Wikidata items and I run into the problem of representing national branches of multinational companies. Should each of these subsidiaries have their own item? Let's consider an example: Honeywell (Q898208). This item already bears multiple GRID ids, each of them corresponding to a national subsidiary of the multinational group. Should I add others, such as grid.410336.3 (Honeywell Canada)? Or is it better to create individual items for all these? I guess each branch has its own headquarters, national company number, number of employees, and so on. Or do these things just exist for legal reasons, and they don't actually represent anything different from the main group? But that conflicts with the uniqueness constraint on the identifiers. Ping @ArthurPSmith: who is involved in this import. − Pintoch (talk) 23:03, 18 March 2017 (UTC)

Hi, separate item for each national subsidiary is much more appropriate. --Jklamo (talk) 12:58, 19 March 2017 (UTC)
Agree with Jklamo. I haven't been creating a lot of new items for GRID identifiers as I've occasionally been finding some errors (duplicates) but we probably should feel freer to do that. Not only national subsidiaries, but individual corporate research labs may have separate records such as IBM's IBM Thomas J. Watson Research Center (Q476208). ArthurPSmith (talk) 20:08, 20 March 2017 (UTC)
I've created items for subsidiaries of business (Q4830453) which had multiple GRID with different countries (when the item had only one country itself). I hope I did more good than harm. I'm reasonably confident that these subsidiaries did not have items before as they would have been detected during the GRID import otherwise. At least the constraint violation report should go from 610 items with duplicate GRIDs down to 230. The remaining cases look more subtle. Some of them indeed look like duplicates on GRID's end, as in Red Universitaria Nacional (Q5841811). I've reported one but we might just point them to the list… − Pintoch (talk) 22:48, 3 April 2017 (UTC)

D-U-N-S number (P2771) and Bloomberg company ID (P3377)

Aren't these proprietary, subject to copyright and subject to licensing fees? If so they have to be removed unless they have been donated and that is documented. Same thing for memberships in published indexes such as Standard and Poor's (S&P), MSCI Inc. (formerly Morgan Stanley Capital International and then MSCI Barra), Dow Jones, CUSIP, ABA bank and routing numbers, SWIFT numbers, etc. Tickers and last sale prices tend to be declared public, after 10-15 minutes. Prior to that the data tends to be owned and licensed by the exchange. Rjlabs (talk) 00:00, 30 March 2017 (UTC)

The identifiers themselves cannot be copyrighted: technically, Wikidata just links to their websites by building the URL from the ID. We don't need to ask for permission to link to http://www.bloomberg.com/research/stocks/private/snapshot.asp?privcapId=27444752, for instance. However, automatically scraping bloomberg.com to add many Bloomberg company ID (P3377) to our items is probably forbidden. For the rest of your concerns, facts cannot be copyrighted: if I learn a fact on a public website and write it in Wikidata with a reference to that source, there is nothing wrong about that. − Pintoch (talk) 08:16, 30 March 2017 (UTC)
OpenCorporates claims that DUNS cannot be used openly. Which would be very strange because it's used in the US Government procurement system. If I tell our DUNS to a potential client or partner, would I be in violation of some right? Weird --Vladimir Alexiev (talk) 07:58, 8 April 2017 (UTC)

Franzsimon Kopiersperre Jklamo ArthurPSmith S.K. Givegivetake fnielsen rjlabs ChristianKl Vladimir Alexiev User:Pintoch Parikan User:Cardinha00 User:zuphilip MB-one User:Simonmarch User:Jneubert Mathieudu68 User:Kippelboy User:Datawiki30 User:PKM User:RollTide882071 Kristbaum Andber08 Sidpark SilentSpike Susanna Ånäs (Susannaanas) User:Johanricher User:Celead


  Notified participants of WikiProject Companies because many eyes should be focused on not violating Wikimedia's rather conservative copyright policies. Disagree that identifiers which are privately compiled and maintained over time (at considerable expense) can't be copyrighted and made subject to strict licensing and usage fees. Look at the legal controversy around ISIN (P946) which attempted to embed CUSIP without paying the toll. At the end of the day a reduced (but not zero) licensing fee was required for Europe, and U.S. consumers can't just use ISIN (P946) for U.S. companies without paying their own usage fees. (more at [[1]] and [[2]].) Here is a little quote out of a CUSIP license:

“Subscriber agrees and acknowledges that the CUSIP Database and the information contained therein is and shall remain valuable intellectual property owned by, or licensed to, Standard & Poor’s CUSIP Service Bureau (“CSB”) and the American Bankers Association (“ABA”), and that no proprietary rights are being transferred to Subscriber in such materials or in any of the information contained therein. Any use by Subscriber outside of the clearing and settlement of transactions requires a license from the CSB, along with an associated fee based on usage.” [[3]]

ABA, S&P, SWIFT, D&B - all have extensive teams of lawyers whose mission is to monetize the databases they create. Rjlabs (talk) 17:43, 30 March 2017 (UTC)

Wikipedia has hosted CAS numbers for chemicals for years despite the American Chemical Society claiming to own the numbers. Nothing I saw in the Wiki article about ISIN suggests to me that Wikimedia's hosting of the numbers produced any problems. ChristianKl (talk) 19:08, 30 March 2017 (UTC)
Indeed this is a very sensitive topic. ChristianKl I strongly disagree with this approach because it means they could ask for all these numbers to be removed from WD at any point in time. Without permission from the interested parties, these IDs should not be on Wikidata, and even more so if they are part of a business model. Either permission is officially given or these IDs have to be deleted. Another solution would be to create our own open source financial IDs system, which could then be used freely by Wikidata, but obviously it would be a very large undertaking. Parikan (talk) 02:50, 31 March 2017 (UTC)
Why do you believe that the IDs are protected by copyright? The fact that an organisation that gathers IDs wants them to be protected doesn't mean that they are. There are a lot of different IDs hosted in Wikidata and the fact that these IDs have a financial background doesn't distinguish them as far as copyright goes. If you think the law prevents this large gathering of IDs, how about contacting https://meta.wikimedia.org/wiki/Legal about it? ChristianKl (talk) 07:12, 31 March 2017 (UTC)
Again, these IDs are just hypertext links! The only reason why we could be required to remove URLs from Wikidata would be that the content that is available there is illegal in its own right (copyvio, child pornography, and so on), which is not the case as far as I can tell. If these institutions do not want people to link to these pages, they should not run websites like that. Some people did try to create a "link tax" in the EU but the project was abandoned, and I would be very surprised if anything similar was in place in the US. − Pintoch (talk) 13:18, 31 March 2017 (UTC)

Hierarchy

Let's open the hierarchy issues. My thoughts and practices (feel free to comment directly in the list):

  • owned by (P127) - direct owner if possible, listing all notable owners. In case of listed company with fragmented ownership I go as far as 3%.
  • parent organization (P749) - usually only one top consolidated entity, which may be multiple layers away
  • subsidiary (P355) - direct controlled subsidiaries, I tend to not include indirect subsidiaries, if they are subsidiaries of subsidiary with item
  • owner of (P1830) - minor shares in other companies, joint ventures, again I prefer only direct ownership
  • business division (P199) - only non legal entities

--Jklamo (talk) 10:49, 30 March 2017 (UTC)

Do part of (P361) and has part (P527) play a role at all? For example chapters of a national nonprofit organization are not really "owned" by the national organization. Although I guess parent organization (P749) suffices there... ? ArthurPSmith (talk)
  Comment suggest a hard look at LEI's approach to hierarchy, they have spent a lot of time with it, and they are likely to be the de facto standard going forward - on the legal entity side. I also have inquiries into the U.S. Census as to how they maintain their Business Register - which includes Establishments (roughly factory or facility locations that are geo located, have local employees, are assigned 1+ industry codes, etc.). Establishments aggregate to industry by industry statistics and reporting at the County, State and National level. Federal Employer IDs tie to payroll and geo location. The LEI system is very much legal jurisdiction based (only a loose connection to actual geography). We need to incorporate both legal concepts and physical/on the ground Establishment/Facilities/Properties concepts into the ultimate hierarchy. Again, I highly recommend that all designers of WikiData as it relates to "companies" thoroughly brief themselves on the specifics of LEI as it's rolling out, and also on how economic accounting tied to geography currently works. When you get to areas that are not based on capitalism there is still a great deal of economic accounting and the WikiData structures need to be able to accommodate all types of systems. Rjlabs (talk) 16:45, 30 March 2017 (UTC)
Rjlabs can you give us a pointer to LEI hierarchy work somewhere? ArthurPSmith (talk) 17:54, 30 March 2017 (UTC)

ArthurPSmith start here [[4]] with an overview and see the detailed references listed below the article, including detailed xml schemas. They uncovered many issues when it comes to reporting (and hopefully independently verifying) chains of ownership. Issues that WikiData will surely face. Yet, they are forging ahead. Best of all LEI info is free and open, not subject to copyright and fees. Rjlabs (talk) 18:24, 30 March 2017 (UTC)

Databases of companies listed on org-id.guide

Hi!

I have recently discovered org-id.guide, which lists databases of organizations (mostly companies). We now have org-id.guide ID (P4824) to link these lists to items about the database.

One thing we can do is find out which ones do not have Wikidata properties yet: User:Pintoch/orgid. So if you are looking for your next database of companies to import in Wikidata, you can have a look there. Org-id provides explanations in English about the structure of the ids, the openness of the database and other things like that, so it can spare you some time when creating the proposal. I have made Wikidata:Property_proposal/UK_Provider_Reference_Number based on that for instance.

Pintoch (talk) 11:13, 14 March 2018 (UTC)

Cool, thanks! ArthurPSmith (talk) 15:09, 14 March 2018 (UTC)

Bloomberg database as a reference?

  Notified participants of WikiProject Companies

Hello,

I have created Kadimastem (Q52145643) and used statement supported by (P3680) to cite Bloomberg as a reference, but it isn't correct since statement supported by (P3680) can be used as a qualifier only.

Item GoViral Inc. (Q32137229) uses stated in (P248), but it isn't correct either since Bloomberg LP is not a work.

Should we create a specific item about the Bloomberg database and use it with imported from Wikimedia project (P143) ? — Mathieudu68 talk 09:39, 23 April 2018 (UTC)

Certainly yes to creating the item for the Bloomberg Database, but instead of imported from Wikimedia project (P143), I would suggest using stated in (P248). --MB-one (talk) 10:04, 23 April 2018 (UTC)
Agree. As alternative/addition reference URL (P854) with direct url to Bloomberg database (not sure if using Bloomberg company ID (P3377) in refs is appropriate).--Jklamo (talk) 10:21, 23 April 2018 (UTC)
I try to encourage using identifiers in references. If an identifier was wrongly added to an item, that makes it easier to identify the claims that were derived from its record and delete them. It is also useful if the formatter URL changes. − Pintoch (talk) 11:52, 23 April 2018 (UTC)
@Mathieudu68: so to be clear:
  • imported from Wikimedia project (P143) should not be used for this - it is intended only to indicate imports of data from the wikipedias, not from reliable external sources
  • You should use stated in (P248) with a new item for the Bloomberg private company database (I'm surprised we don't already have one, but I certainly can't find it!)
  • It is also encouraged to add Bloomberg company ID (P3377) with the company id, if that is the reference you are relying on.
  • retrieved (P813) is also a good idea.
thanks for working on this! ArthurPSmith (talk) 12:50, 23 April 2018 (UTC)
Okay, thanks a lot. I've created Bloomberg private company database (Q52148486), but the database seems not to have any formal name.
By the way I came across Q41804121 and I'm not sure that this item is relevant.
Mathieudu68 talk 14:03, 23 April 2018 (UTC)
If you click "What links here" on that item you'll see it was created as a reference for Yves Fortier (Q3205904) - it should probably be nominated for deletion and the reference replaced by something like what we've suggested above. ArthurPSmith (talk) 19:37, 23 April 2018 (UTC)

Small businesses and notability

The Wikidata:Notability guidelines are significantly looser than those of the English Wikipedia. The requirement is that an entry refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references. For "small businesses", would an entry in a chamber of commerce directory (for example [5]) which includes a business website and hours of operation be sufficient to meet this threshold?

Slightly differently, is there a rule regarding individual Walmart/McDonalds franchise locations? Right now these appear to be excluded, is this by policy or simply because they haven't been added yet?

I posed this at Wikidata:Project chat, but this may be a better place for the discussion. Power~enwiki (talk) 20:45, 28 April 2018 (UTC)

Thanks for posting here. We have discussed notability of small businesses and franchises here and here on this page; the conclusion generally was that Wikidata just doesn't have the human capacity to deal with all such entities, and so "small businesses" or facilities of larger companies (with fewer than 50 employees) would normally be excluded unless they met other notability criteria (in particular having a wikipedia page of their own). However, if you think the human capacity issue is maybe not such a problem, we can certainly discuss this further! ArthurPSmith (talk) 23:08, 28 April 2018 (UTC)
If I have any ideas as to how to do mass creations without a ton of labor, I'll come back here. Government records of business listings are too indiscriminate, and Chamber of Commerce / YellowPages listings aren't licensed correctly and also aren't formatted for automatic processing. Power~enwiki (talk) 21:01, 9 May 2018 (UTC)
It's worth noting that if you want to add many records for businesses automatically, that should go through a bot request, where the discussion would be had whether or not the data should be added. ChristianKl 19:08, 22 July 2018 (UTC)

Business enterprise as a collection of companies?

I'm not sure exactly how a company like Marmalade Insurance (Q8058284) should be represented in Wikidata. It's a marginally notable enterprise that has an enwiki article at Young Marmalade. Like many enterprises, it has multiple companies that change over time. I've identified four (with British Companies House IDs):

I'd be reluctant to create items for all of these in Wikidata, since even if they are all considered notable, and it would be feasible in this case, it doesn't seem very useful. It's easier to just treat them as a single business enterprise. E.g., it would be unclear which of the companies should be linked with the enwiki article: not 08676228, which seems to now be the parent company, because the inception date is wrong, and not 04627884, which was apparently the original company but doesn't have the main insurance entity.

Larger enterprises may have dozens or hundreds of subsidiary companies, constantly changing, and keeping track of them could be a full-time job for somebody.

Would it be feasible to make a way of listing these subsidiary companies on a single Wikidata item for the enterprise? Ghouston (talk) 01:40, 15 June 2018 (UTC)

Currently, business (Q4830453) is a subclass of company (Q783794), which seems like strange logic. Businesses can be structured in various ways, including various types of company depending on jurisdiction. There are also partnerships and sole traders, which are businesses but not companies. Ghouston (talk) 07:43, 15 June 2018 (UTC)
So we have sole proprietorship (Q2912172) as a subclass of business (Q4830453), which is a good reason not to make business (Q4830453) a subclass of company (Q783794). Ghouston (talk) 07:45, 15 June 2018 (UTC)
corporate group (Q197952) is what I'm looking for, as an item for a group of related companies. Ghouston (talk) 07:47, 15 June 2018 (UTC)
  • @Ghouston: I guess the question for separate items vs one item is mainly, can statements on one item represent all of them adequately (as a group in some way) - for example, are they in the same country, have the same headquarters location, produce the same products, etc. If it's essentially one company that's just changed its official name a few times, I would definitely say one item with the official name (and any associated ID's) qualified by start and end dates etc. If it's subdivided into separate business units that do different things, I would say separate items, but maybe not all of the sub-units need an item of their own if they are small or otherwise not really notable. ArthurPSmith (talk) 14:44, 15 June 2018 (UTC)
  • Yes, it seems to me that one item would be sufficient in this particular example, since the enterprise only seems to be known for one thing in one place. The individual companies are a not very interesting detail, but it would still be nice to name them with their Companies House ids, now that I've discovered them. Ghouston (talk) 03:28, 16 June 2018 (UTC)
  • Perhaps creating items for the companies is the only way of retaining all the information, including full names, name changes, inception dates, and Companies House ids. The group page would also be needed for linking with enwiki, presumably the companies would be part of (P361) the group. Ghouston (talk) 05:39, 17 June 2018 (UTC)

Adding historical information of companies

I wonder whether I should add historical information about a company simply because it is currently possible in Wikidata. From a historical perspective it is really interesting to collect information about companies over time: events like changes of the official name or moves of the headquarters location. I added an example of this at Level-5 (Q674686), with two statements for headquarters location (P159). It works fine for me but I am not sure if that's in the spirit of Wikidata. Do you have any thoughts about it? Diggr (talk) 14:26, 10 July 2018 (UTC)

It is totally OK to store old hq and names of company at Wikidata. Just do not forget to mark current data with proper rank.--Jklamo (talk) 17:19, 10 July 2018 (UTC)
Okay, I will do that. Thank you! Diggr (talk) 14:57, 11 July 2018 (UTC)

Bot for Legal Entity Identifier (LEI)

The GLEIF (Global Legal Entity Identifier Foundation) publishes a CSV file with BIC-LEI mappings every month. [6]

What do you think of the idea of using a bot to read the CSV and write the LEI to all banks that already have a BIC identifier? Datawiki30 (talk) 09:49, 21 July 2018 (UTC)

@Datawiki30: that's a brilliant idea! I have recently been working with that dataset, adding LEI ids and French company numbers for a few French financial institutions. https://tools.wmflabs.org/editgroups/b/OR/11cc9a3/ . I used OpenRefine for that. − Pintoch (talk) 17:14, 21 July 2018 (UTC)
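
A minimal sketch of the bot idea above, not anyone's actual implementation. It assumes the CSV has header columns named LEI and BIC (the real file layout should be checked), and that a BIC-to-item lookup has already been built, for example from a query. The sample BIC, LEI and item id are hypothetical placeholders. Output is one QuickStatements line per match, adding Legal Entity Identifier (P1278).

```python
import csv
import io


def lei_statements(csv_text, qid_by_bic):
    """Yield QuickStatements lines adding an LEI (P1278) to items matched by BIC.

    csv_text   -- contents of the mapping file; column names "LEI" and "BIC" are assumed
    qid_by_bic -- dict from BIC to Wikidata item id, built beforehand (hypothetical input)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        bic = row["BIC"].strip().upper()
        lei = row["LEI"].strip().upper()
        qid = qid_by_bic.get(bic)
        if qid is None:
            continue  # no item with this BIC yet, nothing to write
        if len(lei) != 20 or not lei.isalnum():
            continue  # an LEI is 20 alphanumeric characters; skip malformed rows
        # Tab-separated: item, property, quoted string value
        yield f'{qid}\tP1278\t"{lei}"'


if __name__ == "__main__":
    # Made-up sample row and a placeholder item id, for illustration only
    sample = "LEI,BIC\n529900EXAMPLE0000001,EXAMDEFFXXX\n"
    for line in lei_statements(sample, {"EXAMDEFFXXX": "Q4115189"}):
        print(line)
```

Generating statements as text first, rather than editing directly, leaves room for a manual review of the batch before anything is written.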

Importing data from OpenCorporates

  Notified participants of WikiProject Companies

I have recently had a look at ways to improve our connections with OpenCorporates (Q7095760). I have discovered that the OpenCorporates ID (P1320) consists of two predictable parts:

  • a code for the jurisdiction of the entity;
  • the company number for this legal entity in that jurisdiction - this is the original id from the national company register, not a made-up one from OpenCorporates.

Therefore it is possible to deduce a lot of OpenCorporates ID (P1320) from properties for national registers (and conversely). For instance OpenCorporates ID (P1320) fr/527678262 corresponds to SIREN number (P1616) "527678262", and OpenCorporates ID (P1320) gb/02906991 corresponds to Companies House ID (P2622) 02906991.
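
As a rough sketch of this derivation (an illustration only, not the workflow actually used): the mapping below contains just the two prefixes named above, and the function names are made up for this example. Other jurisdictions would have to be checked one by one before being added.

```python
# Jurisdiction prefix -> Wikidata property for the national register id.
# Only the two pairs mentioned in the post above are included.
PROPERTY_BY_JURISDICTION = {
    "fr": "P1616",  # SIREN number
    "gb": "P2622",  # Companies House ID
}


def to_opencorporates_id(prop, value):
    """Build an OpenCorporates ID (P1320) from a national register id, or None."""
    for code, known_prop in PROPERTY_BY_JURISDICTION.items():
        if known_prop == prop:
            return f"{code}/{value}"
    return None  # property not in the mapping


def from_opencorporates_id(oc_id):
    """Split an OpenCorporates ID into (property, national id), or None."""
    code, _, number = oc_id.partition("/")
    prop = PROPERTY_BY_JURISDICTION.get(code)
    if prop is None or not number:
        return None  # unknown jurisdiction or missing company number
    return prop, number
```

The company number is kept as a string throughout so that leading zeros, as in 02906991, are preserved.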

I have done a few of these derivations for various jurisdictions and created a table summarizing the correspondence between national prefixes and Wikidata properties.

I think this makes OpenCorporates ID (P1320) really interesting for Wikidata: it connects in a completely transparent way with national registers that we already link to. Moreover, they have an OpenRefine reconciliation interface that makes it easy to look for matches and upload the ids to Wikidata. (I am thinking about writing a tutorial about this workflow.)

For now, the limiting factor is the license: we cannot pull more than the ids because their API is under CC-BY-SA. I will have a call on Tuesday with them to see if they could still allow some data import in Wikidata. What sort of data would you be most interested in? − Pintoch (talk) 17:41, 21 July 2018 (UTC)

That is interesting. I think corporate relationships would be very useful - this entity is a subsidiary of that, etc. Official website if there is one. More detailed headquarters location info if applicable. Inception & dissolution/merger dates, replaces/replaced etc. if possible. If a more detailed corporate type is available and can be matched to a wikidata ID for P31 that could be useful. Stock index/ticker symbol I guess. ArthurPSmith (talk) 18:38, 21 July 2018 (UTC)
Also financial data (total revenue (P2139), total assets (P2403), e. g.), information of leadership (director / manager (P1037)), seat (headquarters location (P159)) and legal status (legal form (P1454)) will be of interest. --MB-one (talk) 01:24, 22 July 2018 (UTC)
OpenCorporates suggest that we narrow down to the top three most important data fields that we would like to import, to see if they could make a special agreement to import these in Wikidata. − Pintoch (talk) 15:11, 25 July 2018 (UTC)
Well that's promising. I assume that's 3 fields beyond the ID? Do you have a list of their fields somewhere, and how often they are populated? Looking at a few examples, I think I'd say "Incorporation Date" and "Jurisdiction" may be the most important. ArthurPSmith (talk) 16:59, 26 July 2018 (UTC)

Automating tier 1 capital ratio extraction from published pdf

  Notified participants of WikiProject Companies

This capital ratio (P2663) is a very important regulatory indicator for banks. Usually every EU bank publishes this ratio on its own homepage. The problem is that the data is in a PDF, together with other data. I would like to try to automatically download and read the data in order to import it into Wikidata.

Once I have downloaded the PDF files, I would have more than 1500 files for Germany alone. Does anyone have experience with searching and extracting information from multiple PDF files? I found two interesting tools which could help:

  1. pdfgrep [7] - this command-line utility can read multiple files and write the results to a text file. This is fine, but after that I would need a script to structure/cut only the data I need.
  2. Tabula [8] - this should also work very well, but it would have the same problem as pdfgrep.

You can see one example file here [9]. The advantage is that these PDFs are highly standardised. Datawiki30 (talk) 21:27, 30 July 2018 (UTC)

  Notified participants of WikiProject Companies

I have written a Python script to help find and extract the data. The code has the following main tasks:
1. Search for the selected company and the document with the data using the Qwant.com API
2. Get the document
3. Search the document for keywords using pdfgrep and regular expressions and extract the value
4. Write the values in QuickStatements format
5. Optionally save the page to the Web Archive
Are you interested in the code? Maybe we can use this for other purposes... --Datawiki30 (talk) 20:03, 10 September 2018 (UTC)
Have you tested this yet? Usually for bulk imports of this sort you should use a bot account for which there's an approval process - at the least it would be good to have somebody review what you've done! ArthurPSmith (talk) 20:06, 10 September 2018 (UTC)
Yes. The first time, I ran this for 10 banks and checked the values. After that I did the same for about 200 banks. Of course I cross-checked the values before importing them into WD. But indeed there was no second person to check the values again.
I would like to use the same method to update the values for another 600-700 German cooperative banks. I've tested this for about 50 banks - see here: WD-Query
What should I do next? Should I use a separate bot account? --Datawiki30 (talk) 20:30, 10 September 2018 (UTC)
Thanks for the query example - however, I think it would be better if you could list some specific changes that followed your current process. When I look at for example Volksbank Eisenberg (Q2531693) I see some changes that I wonder where they came from - for example modifying the point in time value after the fact. Also I looked at the reference and I don't see where the value "20.6" came from - are you doing some computation beyond just extracting the values from the PDF? ArthurPSmith (talk) 19:20, 11 September 2018 (UTC)
Thank you for the review! I remember that there were about 20 edits where I had to correct the point in time property after the batch. You have to search for "Harte Kernkapitalquote". To find the value you have to search for "20,6". (Sometimes there are scanned PDFs, or PDFs in a format that is not searchable.) Here you can find another QuickStatements batch for German savings banks: https://tools.wmflabs.org/quickstatements/#/batch/3777. --Datawiki30 (talk) 21:06, 11 September 2018 (UTC)
Ok - sorry to be slow getting back to you - I've taken a look and you seem to be approaching this in a reasonable manner, the added data looks good. As far as I'm concerned you can go ahead! ArthurPSmith (talk) 20:03, 13 September 2018 (UTC)
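
For readers who want to try something similar, here is a minimal sketch of steps 3 and 4 from the list in this thread. It is not Datawiki30's script. It assumes the PDF text has already been extracted (for example with pdfgrep), that the value follows the keyword "Harte Kernkapitalquote" with a German decimal comma, and that the statement is written without a unit. The function names, the reporting date and the source URL are hypothetical.

```python
import re

# Keyword, then up to 80 non-digit characters, then a number such as "20,6".
# A simplification: a year or footnote number right after the keyword would be
# picked up instead, so extracted values still need a manual check.
RATIO_RE = re.compile(r"Harte Kernkapitalquote\D{0,80}?(\d{1,3}(?:,\d+)?)")


def extract_ratio(text):
    """Return the ratio as a string with a decimal point, or None if not found."""
    match = RATIO_RE.search(text)
    if match is None:
        return None  # e.g. scanned PDFs without searchable text
    return match.group(1).replace(",", ".")


def quickstatement(qid, ratio, date_iso, source_url):
    """Build one QuickStatements line: value, point in time (P585), reference URL (S854)."""
    return "\t".join([
        qid,
        "P2663",                       # the capital ratio property used in this thread
        ratio,                         # plain quantity; no unit is assumed here
        "P585",
        f"+{date_iso}T00:00:00Z/11",   # /11 means day precision
        "S854",
        f'"{source_url}"',
    ])
```

Keeping the value as a string avoids floating-point artefacts, so "20,6" in the PDF becomes exactly "20.6" in the statement.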

type of business entity (Q1269299) and list of business entities (Q53400657) HELP!

  Notified participants of WikiProject Companies

If you ever hated the term legal person welcome to the mosh pit!

see this for prior discussion, moving here because it rapidly rolled off "Project Chat" https://www.wikidata.org/w/index.php?diff=680757693&oldid=680755302&title=Wikidata:Project_chat#type_of_business_entity_(Q1269299)_and_list_of_business_entities_(Q53400657)

type of business entity https://www.wikidata.org/wiki/Q1269299

list of business entities (Q53400657) (has now been updated to "list of legal entity types by country") https://www.wikidata.org/wiki/Q53400657

... there a real difference between "type of business entity" and "business entity". Q1269299 uses the first as label and the second as alias, maybe we can switch them to make it clearer (or even change to "legal form", which is another alias but also the label of the corresponding property legal form (P1454)). I'm not a specialist in this subject either, can someone else pitch in?

What's been published in the English Wikipedia does not fit well into the Wikidata structure. I have moved list of business entities (Q53400657) in Wikipedia to list of legal entity types by country. That article might be broken down into separate articles ("list of legal entity types in the United States", and likewise for Japan, U.K. ...), and that might then play better in Wikidata. The wikipedia article Legal person roughly translates to Legal entity and roughly translates to legal entity type (either list or by geolocation). (Note that the article Legal person has been moved to Legal entity and been reverted; see talk page.) Is your head spinning yet? Help stop the spin. Thanks! Rjlabs (talk) 19:34, 6 August 2018 (UTC)
Indeed, there seems to be another structure in the English Wikipedia to describe the legal entity form. The Global LEI Foundation also publishes the entity legal form (see this - GLEIF Legal Entity Form Code ). They also published their list of legal forms, derived from ISO 20275 (see PDF-legal-Form ). When I scrolled to the USA I found that there are different legal forms for different states... I don't think that we can solve this situation. Maybe the type of business entity could be the parent entity of all the other legal entity forms according to the GLEIF list. --Datawiki30 (talk) 19:57, 10 August 2018 (UTC)
It's normal that each US state has a separate ELF sublist, because each is a separate jurisdiction. I had high expectations about the ELF list, but there are misspellings and some unexpected values (at least comparing to the BG Trade Register), and commonly-established abbreviations are not used in the ELF --Vladimir Alexiev (talk) 09:05, 23 August 2018 (UTC)
@ Vladimir: Are you talking about "Командиртно дружество с акции" :-)? Why don't you write them an e-mail? They can correct the misspellings you found. You can also propose the abbreviations you have in mind to them. --Datawiki30 (talk) 21:11, 10 September 2018 (UTC) PS: Other things are probably not so easy to correct. I found some incomplete legal addresses in GLEIF for companies with two or more legal addresses. I challenged the LEI and the issuer pointed to the German register. When I checked the register I found that the address there is also incomplete - there were two towns but only one ZIP code. So I called the local court to ask them about that, but they said the company is responsible for the address data. So I called the company - they said that they have never had problems with their partners about the address... :-/ All in all this was a very interesting experience ;-)

Bot for nominal GDP-values - Data from the World Bank

  Notified participants of WikiProject Companies

I tried to start a discussion about this on the Economy WD-Project, but there is no activity there. I would like to request a bot for reading GDP values from the World Bank DB and writing them back to Wikidata. The prototype is already done and it worked well on test.wikidata.org. The scripts can easily be adapted to write other statistics like inflation, GDP per capita etc. What do you think of the idea? --Datawiki30 (talk) 20:10, 10 September 2018 (UTC)

Good idea.--Jklamo (talk) 13:59, 25 September 2018 (UTC)
Thank you Jklamo. I would appreciate your support for the request for the bot-permission here: Wikidata:Requests_for_permissions/Bot/WDBot. Other members of the Companies-Project are also welcome to support the bot request :-). Cheers! --WDBot (talk) 09:26, 30 September 2018 (UTC)
The bot has been approved and is running. Cheers! Datawiki30 (talk) 20:53, 20 October 2018 (UTC)

Company, business, etc.

The basic concepts about companies are a mess. I'm not talking about different types of companies, but about the basic definitions, for which we now have the following 5 items:

I think we should have instead only 3 distinct items that refer to 3 distinct concepts:

  • a legal entity that carries on a business activity (in English "company", in Italian "società", in French "compagnie/société");
  • the business activity carried out by the entrepreneur (in English "business", in Italian "impresa", in French "entreprise");
  • the complex of assets (goods and human capital) organized by the entrepreneur to carry on the business activity (in Italian "azienda").

How can we match these concepts with the items above? I opened an Interwiki conflict some months ago, but I didn't receive any answer. I ping @Alan ffm: because I've seen that he made some edits on this theme. --BohemianRhapsody (talk) 22:16, 26 September 2018 (UTC)

Note business (Q4830453) used to be labeled "business enterprise" in English, and was sort of the catch-all for the organization/corporation etc. - there are thousands (hundreds of thousands?) of items that use this item as their value for instance of (P31). I guess this corresponds to your first concept? business (Q19862406) I believe is your second concept - an activity rather than an organization. ArthurPSmith (talk) 23:27, 26 September 2018 (UTC)

"Replaced by" (P1366) or "dissolved, abolished or demolished" (P576) for company mergers

I have asked myself which property is best when two (or more) companies merge. I would suggest the following:

a) The old companies and the new merged company are available on Wikidata

  • "Replaced by" (P1366) for the old companies pointing at the new company
  • "replaces" (P1365) for the new company pointing at the old companies
  • in this case the property "dissolved, abolished or demolished" (P576) should not be used (redundant)

b) The new company is not available on Wikidata -> "dissolved, abolished or demolished" (P576) for the old companies

What do you think about this? --Datawiki30 (talk) 18:10, 29 September 2018 (UTC)

It depends on the type of merger, as something referred to as a merger can actually mean different things, like:
  • creation of a new entity into which the old entities are merged
  • creation of a new entity, old entities become subsidiaries of this entity
  • renaming one entity, second entity merged into the renamed entity
  • renaming one entity, second entity becomes a subsidiary of the renamed entity.
--Jklamo (talk) 10:28, 30 September 2018 (UTC)
Thank you for the 4 examples. I would suggest:
  • creation of a new entity into which the old entities are merged -> option a) above
  • creation of a new entity, old entities become subsidiaries of this entity -> parent organization (P749) on the old entities pointing at the new entity, and subsidiary (P355) on the new entity pointing at the old entities.
  • renaming one entity, second entity merged into the renamed entity -> option a) above
  • renaming one entity, second entity becomes a subsidiary of the renamed entity -> parent organization (P749) on the other entity pointing at the renamed entity, and subsidiary (P355) on the renamed entity pointing at the other entity.
--Datawiki30 (talk) 13:51, 30 September 2018 (UTC)
  • Companies can be difficult. Recently Fairfax Media (Q1393218) (an Australian company owning numerous newspapers, websites etc.) was taken over by Nine Entertainment Co. (Q16999054), which is apparently extinguishing the Fairfax brand. Newspaper articles say that Fairfax has ceased to exist, and editors at en:Fairfax Media have followed suit. However, the company didn't go away on the day of the takeover, but continues to exist as a subsidiary of Nine (see government database entries at [10] and [11]). There's unlikely to be enough public information available in future to work out its on-going status. On Wikidata, it still owns numerous newspapers (this may be technically correct at present, but may change over time). I've put a dissolved, abolished or demolished (P576) on the Wikidata item, but I'm not really sure how it should be handled. Ghouston (talk) 03:54, 12 December 2018 (UTC)
  • Now the English Wikipedia has changed its mind and reinstated Fairfax Media as en:Nine Publishing. The other sitelinks on the item still call it Fairfax Media. Yet the Fairfax Media company still exists, according to the Australian business register [12]. Ghouston (talk) 01:19, 5 March 2019 (UTC)
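The property choices proposed in this thread can be summarised as a small decision function. This is only a sketch of the proposal as discussed here, not an agreed project guideline. The function name, its parameters and the triple layout are hypothetical. The direction of P749/P355 follows the usual reading of those properties (the subsidiary carries P749, the parent carries P355).

```python
REPLACED_BY = "P1366"   # replaced by
REPLACES = "P1365"      # replaces
PARENT_ORG = "P749"     # parent organization
SUBSIDIARY = "P355"     # subsidiary
DISSOLVED = "P576"      # dissolved, abolished or demolished


def merger_statements(old_items, new_item, old_survive_as_subsidiaries,
                      dissolved_on=None):
    """Return (subject, property, value) triples for a merger.

    old_items: item IDs of the companies that merged or were taken over.
    new_item: item ID of the resulting company, or None if it has no item.
    old_survive_as_subsidiaries: True if the old companies continue to exist
        under the new one, False if they are absorbed into it.
    dissolved_on: date string, only used when new_item is None (option b).
    """
    if new_item is None:
        # Option b): no item for the new company, so only record the end date.
        if dissolved_on is None:
            raise ValueError("dissolved_on is required when new_item is None")
        return [(old, DISSOLVED, dissolved_on) for old in old_items]

    triples = []
    for old in old_items:
        if old_survive_as_subsidiaries:
            triples.append((old, PARENT_ORG, new_item))
            triples.append((new_item, SUBSIDIARY, old))
        else:
            # Option a): link old and new items in both directions, no P576.
            triples.append((old, REPLACED_BY, new_item))
            triples.append((new_item, REPLACES, old))
    return triples


# Fairfax Media (Q1393218) continuing as a subsidiary of
# Nine Entertainment Co. (Q16999054), as described above.
fairfax_example = merger_statements(["Q1393218"], "Q16999054",
                                    old_survive_as_subsidiaries=True)
```

Under this reading the Fairfax case would get parent/subsidiary statements rather than P576, since the business register still lists the company.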

Where would website or privacy policy go?

I am working on a project aiming to catalog data tied to how companies handle data protection concerns. For instance I would like to have a framework to add website URLs, as well as privacy policy URLs and content. That's very much just a start. What would be the best way to do this? I would appreciate any help as I am new to Wikidata. Thank you. Pdehaye (talk) 13:13, 19 November 2018 (UTC)

Website URLs can be added using the property official website (P856). I don't think we currently have a property to specifically link to privacy policies - you could do this now using URL (P2699) with a qualifier applies to part, aspect, or form (P518) privacy policy (Q1999831); that seems the right way to me at least but somebody else may suggest a better way to model this. ArthurPSmith (talk) 15:26, 19 November 2018 (UTC)
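To make the suggested modelling concrete, here is a sketch of what such a statement could look like in the Wikibase JSON layout, as far as I understand that format (URL values are stored as plain strings, item values as entity IDs). The helper name and the example URL are hypothetical, and the dict is only built here, not sent to any API.

```python
def privacy_policy_claim(url: str) -> dict:
    """Build a URL (P2699) claim qualified with
    applies to part, aspect, or form (P518) = privacy policy (Q1999831)."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P2699",
            "datavalue": {"value": url, "type": "string"},
        },
        "qualifiers": {
            "P518": [
                {
                    "snaktype": "value",
                    "property": "P518",
                    "datavalue": {
                        "value": {
                            "entity-type": "item",
                            "numeric-id": 1999831,
                            "id": "Q1999831",
                        },
                        "type": "wikibase-entityid",
                    },
                }
            ]
        },
    }


# Placeholder URL, not a real company's policy page.
claim = privacy_policy_claim("https://example.org/privacy")
```

The official website itself would stay on official website (P856); only the policy link uses the qualified URL (P2699) statement.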
What if I want to create a new item for a specific privacy policy?

Mix n Match with SIRENE, French register for French companies

I've launched a mix n match to link Wikidata to the SIRENE database, i.e. the French register of companies. Only big companies with more than 1,000 employees have been selected --PAC2 (talk) 07:43, 30 January 2019 (UTC)

CX and similar

At Q55841490, I tried to add some more data on this one. Obviously active ones could be more interesting, but Q55841490 might be a good benchmark for the type of data that should be available. --- Jura 18:06, 11 February 2019 (UTC)

Gender pay gap data

Disclosure requirements in the UK for the gender pay gap of most companies have produced a treasure trove of data: https://gender-pay-gap.service.gov.uk/ Countless secondary sources have published articles about this: see an example of usage inline in English Wikipedia articles.

The CSV dump contains CompanyNumber and/or SicCodes for each line which should help match Wikidata items. The Open Government License was previously discussed at Wikidata:Project_chat/Archive/2017/10#OGL licence for data. Nemo 09:41, 12 August 2019 (UTC)
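A rough sketch of how the CompanyNumber column could be used for matching follows. It assumes an index from Companies House numbers to Wikidata items has already been built (for example from Companies House company ID statements, which I believe is P2622). The `EmployerName` column, the `Q_EXAMPLE` placeholder and the sample rows are made up for illustration. Zero-padding purely numeric values to 8 characters is also an assumption, added because spreadsheets often drop leading zeros.

```python
import csv
import io


def normalise_company_number(raw: str) -> str:
    """Upper-case the value and zero-pad purely numeric numbers to 8 chars."""
    value = raw.strip().upper()
    return value.zfill(8) if value.isdigit() else value


def match_rows(csv_text: str, qid_by_company_number: dict) -> list:
    """Return (qid, row) pairs for rows whose CompanyNumber is in the index."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        number = normalise_company_number(row.get("CompanyNumber") or "")
        qid = qid_by_company_number.get(number)
        if qid:
            matches.append((qid, row))
    return matches


# Made-up rows; only CompanyNumber and SicCodes are column names taken
# from the description of the dump above.
sample_csv = (
    "EmployerName,CompanyNumber,SicCodes\n"
    "Example Ltd,123456,47710\n"
    "No Number Org,,84110\n"
)
index = {"00123456": "Q_EXAMPLE"}  # placeholder, not a real item ID
matched = match_rows(sample_csv, index)
```

Rows without a CompanyNumber (public bodies, for instance) fall through unmatched and would need name-based matching or manual review.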

Company name and business changes over time

Should a company (listed, public company in this example) be coded in a single Wikidata item with time-qualified name, ticker code etc, or as multiple items linked by replaces (P1365)/replaced by (P1366)? My question is triggered by an EnWiki AFD discussion about a stub article about a company. The company had already changed its name, and in trying to clean up the article, I discovered there was also a Wikipedia article about the company under an earlier name. For now, I have created a third Wikidata entity and linked them with follows/followed by, but I wonder if the three should be merged and have time qualifiers on the properties that have changed. Miller's Retail (Q6859005) already existed, Specialty Fashion Group (Q28183744) was the article brought to AFD, I created City Chic Collective (Q83484143) for its current name.

To add to the complexity, the company ran a number of retail chains of shops, and has recently sold off all but one of those (the one that matches its current name - City Chic). The chains it sold include one with the original company name (Miller's/Millers). Should each chain/brand also have its own Wikidata item so that ownership changes can be tracked over time (or is that a question for another wikiproject?)? --ScottDavis (talk) 23:59, 23 January 2020 (UTC)

@ScottDavis: If it's clearly the same entity with a different name, then a single item should suffice. If there was some structural change along with the name change (for example a relocation, merger, change of business activity, etc.) then maybe two items would be better. ArthurPSmith (talk) 15:42, 24 January 2020 (UTC)

Adding the Forbes 2000 rank to a company

I would like to add the Forbes 2000 rank to companies (for many years). This can be done by the statement "part_of" Forbes_Global_2000, with the qualifier ranking and point_in_time (see https://www.wikidata.org/wiki/Q26463 for an example). The full Forbes 2000 list can be found on Kaggle for the year 2017. To do so I would need to use a bot.

My question: according to Forbes' terms of use (https://www.forbes.com/terms-and-conditions), the list is clearly protected by copyright. On Wikipedia, we can already find the top 20 ranks. How should I proceed? Do I need to ask Forbes whether they agree to have their list in Wikidata?

I'm unsure about the copyright status, but don't think part of (P361) is the right property for indicating something is present in a list. --SilentSpike (talk) 10:14, 3 March 2020 (UTC)

Model items

I think it would be a good idea to establish some model item (P5869) items for business (Q4830453) and company (Q783794) (and any other items that should be maintained by this WikiProject - currently just those two). McDonald's (Q38076) seems like a decent starting point as potentially the most globally well known business which could be easily fleshed out with sourced information. --SilentSpike (talk) 23:12, 14 April 2020 (UTC)

Seems to be in good shape, but some statements are not referenced. Apple Inc. (Q312) is in even better shape.--Jklamo (talk) 07:40, 15 April 2020 (UTC)
@Jklamo: Good pick, looks to be very fleshed out. I think we could add both as model items (as they are) and the data can be cleaned up where needed (things I notice missing most are start/end times and non-wikipedia import sources) only serving to make them even more model. Have gone ahead and done so now. --SilentSpike (talk) 11:11, 15 April 2020 (UTC)

Navboxes

Currently some properties related to companies are on Template:Organisation_properties. Should all properties for companies go there or should we make a separate navbox for companies?

Franzsimon Kopiersperre Jklamo ArthurPSmith S.K. Givegivetake fnielsen rjlabs ChristianKl Vladimir Alexiev User:Pintoch Parikan User:Cardinha00 User:zuphilip MB-one User:Simonmarch User:Jneubert Mathieudu68 User:Kippelboy User:Datawiki30 User:PKM User:RollTide882071 Kristbaum Andber08 Sidpark SilentSpike Susanna Ånäs (Susannaanas) User:Johanricher User:Celead


  Notified participants of WikiProject Companies

Iwan.Aucamp (talk) 14:19, 19 May 2020 (UTC)

Thanks for the link, I was not aware of that template. We do not have a similar template here and the properties seem to be similar. I think it would be a good idea to add company properties there.--Jklamo (talk) 16:43, 19 May 2020 (UTC)