User talk:Rjlabs/WikiData Company Data Project

Latest comment: 6 years ago by Rjlabs in topic Events taxonomy notes

Just how much data does Wikidata want to hold?

Not sure I understand Wikidata's scope. For example, will WikiData follow Wikipedia and exclude most smaller companies from coverage, or will WikiData cover every company (or at least every public company)?

For the U.S., perhaps other countries, there is already highly structured financial statement data for all public companies. See for example:

Of course quarterly financial statements are only the tip of the iceberg. Will there be 8K (news releases) included in WikiData, currency reporting in real time, last sale transaction by transaction, bids and offers, etc.?

To cover companies you need financials plus market data (last sale, closing price, bids/offers), plus a database of who is trading (Legal Entity Identifiers). All the notes to the financials and all the text filings are very important too. Will WikiData swallow the entire thing? Much of the usage of the financial data sets is high speed; does it make any sense to try to load it into WikiData rapidly?

Rjlabs (talk) 00:14, 5 December 2016 (UTC)

  • @Rjlabs: good questions. I don't believe wikidata as it stands is suited to capture time-series data generally - for example stock prices over time - there's no datatype appropriate for time series other than the quantity datatype with qualifiers, which would mean one statement for every point in time, which is really messy. However, as far as containing some data about "every company", I think that may legitimately be within wikidata's scope. WD:N basically requires only that the entity be something in the real world that is described by reliable sources, so as long as we have some third party dataset with appropriate information, and with a reasonably compatible license, we can pull that in. Or portions of it, depending on the license. I've been working with the GLEIF data for US entities via Mix N' Match - here - and found a huge number of entities there which aren't in wikidata but maybe could be. I've marked all the mutual fund/hedge fund/retirement fund entities in GLEIF as NOT suitable for wikidata - there are a lot of those as well and I doubt we really need a wikidata entity for every "municipal bond index fund" or whatever. Of course a large fund that is covered in wikipedia for some reason would be fine. Also, not everything else in GLEIF is a company - there are a lot of churches, cities and counties, private colleges and universities, etc., and there are a few individuals. A lot of the "small companies" in GLEIF seem to be related to real estate - property management or rental etc. So I think by-hand filtering is still appropriate there, but in the long run a good fraction of those small companies probably ought to be in wikidata too. ArthurPSmith (talk) 14:08, 5 December 2016 (UTC)
Although WD notability policy is broad enough to cover all companies, I think we do not have the human resources to maintain millions of items about companies. But I have no problem with having items about all listed companies and other notable companies (in the sense of enwiki notability).
Even though we have market capitalization (P2226), I think it is better to use it on an annual (or quarterly) basis rather than a daily basis. As for total revenue (P2139) and similar properties, I think WD is able to swallow quarterly financial statements.
For ownership changes (parent organization (P749) and owned by (P127)) it will be useful to store only sizable changes, not purchases of a few shares by management. But when we are talking about ownership, it would be useful to clarify the use of these two properties first.--Jklamo (talk) 10:36, 6 December 2016 (UTC)
I think the company information that can be imported from external databases can well be important. When it comes to stock prices, I don't think that daily prices should be listed in Wikidata in its current form. ChristianKl (talk) 16:10, 3 February 2017 (UTC)
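
To make the "one statement for every point in time" concern in this thread concrete, here is a rough sketch comparing daily and quarterly recording of market capitalization (P2226). It assumes each value carries a point in time (P585) qualifier. The dictionary layout is a simplified stand-in rather than the real Wikibase JSON, the amount is a placeholder, and weekdays are used as a crude approximation of trading days.

```python
from datetime import date, timedelta


def build_statements(prop, observations):
    """Return one simplified statement per (day, amount) observation."""
    return [
        {
            "property": prop,
            "amount": amount,
            "qualifiers": {"P585": day.isoformat()},  # point in time
        }
        for day, amount in observations
    ]


def weekdays(start, end):
    """Yield Monday-to-Friday dates from start to end, inclusive."""
    day = start
    while day <= end:
        if day.weekday() < 5:
            yield day
        day += timedelta(days=1)


PLACEHOLDER_AMOUNT = 1_000_000  # illustrative value only, not real data

daily_observations = [
    (d, PLACEHOLDER_AMOUNT)
    for d in weekdays(date(2016, 1, 1), date(2016, 12, 31))
]
quarterly_observations = [
    (date(2016, month, day), PLACEHOLDER_AMOUNT)
    for month, day in [(3, 31), (6, 30), (9, 30), (12, 31)]
]

daily_statements = build_statements("P2226", daily_observations)
quarterly_statements = build_statements("P2226", quarterly_observations)

print(len(daily_statements), "statements per company per year if recorded daily")
print(len(quarterly_statements), "statements per company per year if recorded quarterly")
```

Under these assumptions the daily approach adds a few hundred statements per company per year for a single property, which is the messiness described above, while the quarterly approach adds four.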

Events taxonomy notes

from a project chat

Company data project - R&D and notes on company EVENTS structuring / meta data

(notified project Companies participants) Finally found a spare moment and wanted to add a few comments to this item, which rolled off: Wikidata:Project_chat/Archive/2018/01#Wikidata_data_model_"deficiency" ...ONE event with 1 preceding entity and 2 succeeding entities. Really, I wanted to add more general comments about company events, and the meta data around events, "sooner rather than later". IMHO having a robust and working notability policy for company inclusion/exclusion should come first. Events are a very tough area and seem like a potential "black hole". My notes were longer so I put them here: User:Rjlabs/WikiData Company Data Project#Company Events Rjlabs (talk) 21:18, 16 March 2018 (UTC)

I think we should err on the side of creating more items, rather than trying to stuff too much into a single item. We have properties like replaces (P1365) and replaced by (P1366) that can track the sequence of entities in the cases of mergers and splits. If an "event" has reputable sources for its existence (e.g. there are plenty of news stories etc. about the HP split) then create an item for the event also and link to the various company items. For less important events, I assume adding significant event (P793) with generic values (initial public offering (Q185142) for example) and appropriate qualifiers would be sufficient? Then the question is, what events count as "significant"? ArthurPSmith (talk) 23:48, 16 March 2018 (UTC)
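
A rough sketch of the two approaches in the comment above, using plain Python dictionaries rather than real Wikibase JSON. The item labels (`OLD_CO`, `NEW_CO_A`, `NEW_CO_B`) and the date are hypothetical placeholders, not real Wikidata IDs or facts, and using point in time (P585) as the qualifier is an assumption about what "appropriate qualifiers" might mean.

```python
def link_split(predecessor, successors):
    """Link one preceding entity to its succeeding entities.

    The predecessor gets replaced by (P1366) for each successor, and each
    successor gets replaces (P1365) pointing back at the predecessor.
    """
    claims = {predecessor: {"P1366": list(successors)}}
    for successor in successors:
        claims[successor] = {"P1365": [predecessor]}
    return claims


def significant_event(event_item, point_in_time):
    """Build a significant event (P793) statement with a date qualifier."""
    return {
        "property": "P793",
        "value": event_item,
        "qualifiers": {"P585": point_in_time},  # assumed qualifier choice
    }


# One event with 1 preceding entity and 2 succeeding entities.
split_claims = link_split("OLD_CO", ["NEW_CO_A", "NEW_CO_B"])

# A less important event recorded with a generic value:
# Q185142 is initial public offering; the date is illustrative only.
ipo_statement = significant_event("Q185142", "2018-01-01")

print(split_claims)
print(ipo_statement)
```

A well-sourced split would additionally get its own event item linked from each company item, which this sketch does not model.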

Arthur, great to hear your reasoned input. Agree that "plenty of news stories / from reputable sources" is a reasonable test for inclusion. More general comment at User:Rjlabs/WikiData_Company_Data_Project#Birth_&_death,_acquisitions,_merger_and_spin_off. As for which corporate events are eligible to become entities/items/objects in themselves, how about a "white list" and "black list" for corporate events? Examples:

White list

  • Event (creates or destroys Wikipedia or WikiData entity/establishment objects, or changes data held in an entity/establishment info box in use at Wikipedia for companies) AND there is at least one reliable secondary source covering the event
  • 8-K reported event that is filed (or international equivalent) for any public company that has stock or bonds actively traded on an established exchange.
  • Similar events for private companies that have at least 500 full time equivalent participants (employees, officers, directors, managers...) provided it was carried by at least one reasonably reliable secondary news source.

Black list

  • employee promotion news events unless the person promoted has an article in Wikipedia
  • promotional news release for new products in the ordinary course of business unless three or more tier one news outlets cover the new product
  • shareholder proposals receiving or likely to receive less than 10% of the share vote
  • etc.

– The preceding unsigned comment was added by Rjlabs (talk • contribs) at 03:57, 18 March 2018 (UTC).
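
The white list / black list proposal above could be sketched as a simple filter, as one possible reading rather than a settled policy. All field names are hypothetical. Two things here are assumptions not stated in the proposal: the white list bullets are treated as alternatives (any one is enough), and the black list is checked first.

```python
from dataclasses import dataclass


@dataclass
class CorporateEvent:
    # All fields are hypothetical names introduced for this sketch.
    kind: str  # e.g. "merger", "promotion", "product_release"
    reliable_secondary_sources: int = 0
    tier_one_outlets: int = 0
    changes_entity_or_infobox: bool = False
    filed_8k_or_equivalent: bool = False
    actively_traded_public_company: bool = False
    full_time_equivalents: int = 0
    person_has_wikipedia_article: bool = False
    expected_vote_share: float = 1.0  # fraction of the share vote


def is_blacklisted(event):
    """Black list rules, each with its stated exception."""
    if event.kind == "promotion":
        return not event.person_has_wikipedia_article
    if event.kind == "product_release":
        return event.tier_one_outlets < 3
    if event.kind == "shareholder_proposal":
        return event.expected_vote_share < 0.10
    return False


def is_whitelisted(event):
    """White list rules, treated as alternatives (an assumption)."""
    sourced = event.reliable_secondary_sources >= 1
    if event.changes_entity_or_infobox and sourced:
        return True
    if event.filed_8k_or_equivalent and event.actively_traded_public_company:
        return True
    # Anything not publicly traded is treated as private (a simplification).
    is_private = not event.actively_traded_public_company
    return is_private and event.full_time_equivalents >= 500 and sourced


def qualifies_for_item(event):
    """Black list takes precedence over white list (an assumption)."""
    return not is_blacklisted(event) and is_whitelisted(event)


merger = CorporateEvent(
    kind="merger", changes_entity_or_infobox=True, reliable_secondary_sources=2
)
promotion = CorporateEvent(
    kind="promotion",
    filed_8k_or_equivalent=True,
    actively_traded_public_company=True,
)
print(qualifies_for_item(merger))
print(qualifies_for_item(promotion))
```

Writing the rules down this way exposes open questions, such as which list wins when an 8-K filing reports a routine employee promotion.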
