User:Rjlabs/WikiData Company Data Project

This is a work-in-progress draft that is itself now largely out of date. The Company Data project's actual scope is greatly reduced from what is expressed here. Wikidata infrastructure does not tolerate large volumes of table and time-series data well, and such data is the foundation of economic and financial reporting. There appears to be little desire to import into WikiData large volumes of data readily available in other electronic repositories (such as EDGAR in the U.S.). For the next several years it's likely that Company data on WikiPedia will house only the select highlights data, manually entered, that underlies various Wikipedia info boxes and similar analytical templates. For the current official page see Wikidata:WikiProject_Companies

This is an evolving and very rough draft/outline to set scope, clarify, organize and sequence many facets of the Companies Project. It attempts to be "top down" and forward looking. Yet it also attempts to note "bottom up" (highly granular) issues, particularly those which impact the ontology in this and adjacent areas, which if ignored may end up requiring large refactoring(s) of even the basic company data schema. As the project fills up with useful data, changes to the base schema will become more and more difficult and unwelcome. This effort is a broad attempt to flesh out the important use-cases for Company data up front, to help minimize change down the road.

Overview

  • Long term and short term perspective
  • Addresses (potentially) a great deal of data
  • Provides straight talk application of the notability criterion (for example are virtually all publicly traded companies notable and therefore to be included in the data sets?)
  • Starts to address the wholesale import of public company data from large governmental sources, including update cycles
  • Addresses the need for historical and real time FX data to perform conversions historically and currently "on the fly"
  • Covers public companies in depth but also potentially covers all "organizations" that have any structured data and whose output eventually consolidates into Gross Domestic Product.
  • Accommodates public, private, sole prop, governmental, non profit, etc. organizations. (Any organization with anyone on a payroll, sole proprietors, partnerships, corporations, anyone who borrows in the debt markets such as incorporated cities, counties, etc.)
  • Includes investing entities, holding companies, hedge funds, etc.
  • Strives to "see through" intermediaries.
  • Strives to be location agnostic (global friendly, report numbers in the user's currency of choice).
  • Store "one gold copy" on WikiData, serve that to various data "consumers" at various WikiPedia projects
  • Provides Wikipedia authors with easy instructions to query and draw company data from Wikidata, transform it to the currency of choice, and display the results on Wikipedia pages that are built in real time.
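The currency bullets above imply looking up an exchange rate for a given date and converting a stored figure at display time. A minimal sketch in Python, assuming a hypothetical in-memory rate table `FX_RATES` (the rates are made-up placeholders, not market data) and a hypothetical helper `convert`; neither name comes from Wikidata or Wikipedia tooling:

```python
from datetime import date
from decimal import Decimal

# Hypothetical rate table: (currency, as-of date) -> USD per 1 unit of that currency.
# The rates below are illustrative placeholders only.
FX_RATES = {
    ("EUR", date(2016, 12, 31)): Decimal("1.05"),
    ("GBP", date(2016, 12, 31)): Decimal("1.23"),
    ("USD", date(2016, 12, 31)): Decimal("1"),
}


def convert(amount, from_ccy, to_ccy, as_of):
    """Convert amount from one currency to another via USD, using the as_of date's rates."""
    usd = amount * FX_RATES[(from_ccy, as_of)]
    return usd / FX_RATES[(to_ccy, as_of)]


# A figure stored once in its reporting currency, shown in the reader's currency.
revenue_eur = Decimal("200")
print(convert(revenue_eur, "EUR", "GBP", date(2016, 12, 31)))
```

Keying the table by date is what allows a historical figure to be converted at the rate of its own reporting date rather than at today's rate.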

Update

  Comment 11/2017 updated thoughts

To the best of my knowledge there is no objective notability standard that clearly delineates which companies are sufficiently "notable" to permit inclusion in WikiData. I understand there are very real resource constraints for WikiData such that large amounts of table data are strongly discouraged. A current "de facto" standard seems to be that if the company has an article in Wikipedia, then it's sufficiently notable for WikiData. This approach has some significant drawbacks. A "boring" company may lack coverage in Wikipedia, yet be economically quite significant. At Wikipedia no interest means no coverage. While there is pretty good coverage of the largest companies, significant gaps start to appear after you get past the top 20-100 firms in each country. In WikiData the best approach would be the same as that taken by a statistical business register. Ultimately you want a sampling plan where all of the economically significant entities are actively curated, and a statistically valid, representative sample of the remainder is included, such that you can "add up" all the detailed company data (interpolating as needed to fill in for the multitude of missing small company data) and arrive at meaningful aggregate figures such as total industry statistics, on up to national and international macroeconomic statistics. In essence the WikiData of the future should be much like the better run national statistical business registers, and much less like the popularity/fashion standard currently in place at WikiPedia. However there is no way WikiData (at least in the next several years) is going to have the volunteer talent to exceed the resources available in the worldwide pool of statistical business registrars, so another, much less costly approach ("beggars can't be choosers") must be found.
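The sampling plan described above (curate every economically significant entity, sample the rest, then "add up") can be sketched as follows. This is only an illustration under stated assumptions: the company names, revenue figures, the `THRESHOLD` cut-off and the function `estimate_total` are all hypothetical, and a real register would know only the count of small firms, not their individual values.

```python
# Hypothetical company revenues (illustrative numbers only).
revenues = {"A": 900, "B": 700, "C": 40, "D": 30, "E": 20, "F": 10}
THRESHOLD = 500  # assumed cut-off for "economically significant"


def estimate_total(revenues, threshold, sampled_small):
    """Estimate total revenue from a take-all stratum plus a sampled stratum.

    Large firms are summed exactly. Each sampled small firm is scaled by an
    expansion weight so that it stands in for the small firms not sampled.
    """
    large_total = sum(v for v in revenues.values() if v >= threshold)
    small_count = sum(1 for v in revenues.values() if v < threshold)
    weight = small_count / len(sampled_small)
    sampled_total = sum(revenues[name] for name in sampled_small)
    return large_total + weight * sampled_total


# Only two of the four small firms are actually curated.
print(estimate_total(revenues, THRESHOLD, ["C", "F"]))
```

The estimate depends on which small firms happen to be sampled, which is why the sample has to be drawn by a statistically valid method rather than by popularity.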

IMHO every public company that trades daily on a public exchange has already met a significant, and completely objective, "notability" standard. It's not taxing on the WikiData system to keep, say, 10 items of up to date and highly accurate data on all public companies in the world. The key here is of course automation for sustainability. The initial population task is not so hard; it's the endless trail of maintenance that is the real challenge.

LEI is a free global database, with a disciplined globally unique identifier. US, UK, EU and much of Asia already have LEI established. Plus, this data is freely available without any copyright/license issues. All major corporations worldwide already have a LEI.

The challenge is that the LEI itself doesn't have the 10 items of data that you seek for every company to be included in WikiData. Further, the LEI has much too broad coverage to be included wholesale in WikiData, at least initially. There are LEIs for every major subsidiary of every company, and LEIs for financial companies who own portions of public companies, or lend to them (mutual funds, hedge funds, banks, exchange traded funds, etc.). How to hive off the financial companies and subsidiaries and be left with just the uppermost LEIs remains a reasonable current challenge.

Getting back to the goal of having about 10 data items on every publicly traded company the LEI is an excellent start. The LEI is hands down the best unique key. However the FULLY AUTOMATIC population of the 10 CURRENT data items for each public company remains outside the LEI database. For this you must have well structured data available from each country that supplies it for public companies, free of license charges or copyright issues.

Outside of WikiData you need the LEI as the master, and data for all public companies that is kept up to date and made available free. Data on all public companies (excluding financial entities) must be drawn from the convergence of those two data sources and fed to WikiData, with regular updates. Sure, it can be performed on a country by country basis, as long as it is automatic. Thankfully LEI is already working on hierarchy upgrades, so getting to the "top" of the ownership chains may soon be much easier. All that would be left is the "join" and "merge" operation between LEI and a country by country repository of structured company data (for example, EDGAR in the U.S.).
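The "join" and "merge" step described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the field names (`parent_lei`, `is_financial`, `revenue`, `employees`), the placeholder identifiers and the two helper functions do not reflect the actual LEI or EDGAR schemas.

```python
# Hypothetical extracts; identifiers and field names are placeholders,
# not the real LEI or EDGAR data formats.
lei_records = [
    {"lei": "LEI-0001", "name": "Alpha Corp", "parent_lei": None, "is_financial": False},
    {"lei": "LEI-0002", "name": "Alpha Sub Ltd", "parent_lei": "LEI-0001", "is_financial": False},
    {"lei": "LEI-0003", "name": "Beta Fund", "parent_lei": None, "is_financial": True},
]
filings = [
    {"lei": "LEI-0001", "revenue": 1200, "employees": 50},
    {"lei": "LEI-0003", "revenue": 300, "employees": 5},
]


def top_level_operating(records):
    """Keep only entities with no parent that are not financial companies."""
    return {
        r["lei"]: r
        for r in records
        if r["parent_lei"] is None and not r["is_financial"]
    }


def merge(records, filings):
    """Join filing data onto the retained LEI records, keyed by LEI."""
    keep = top_level_operating(records)
    return [{**keep[f["lei"]], **f} for f in filings if f["lei"] in keep]


for row in merge(lei_records, filings):
    print(row)
```

Using the LEI as the only join key is the design point: a subsidiary and a fund drop out at the filter, and what remains can be refreshed automatically on each update cycle.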

Sometime "down the road" WikiPedia company data curators can figure out how to handle 1. private companies, 2. legal entity subsidiaries that are major, 3. "establishments" (including divisions and major facilities, i.e. plant and major office sites by GEO boundaries), and how to tease out clean industry statistics from legal entities and establishments, all as currently done by the (mostly hidden) army of statistical business registrars. Statistical business registrars operate with substantial taxpayer supported resources. In many, if not most, cases strong laws preclude them from making their data available to the public. Typically their master repositories of entities and establishments (and their linkages) are highly confidential. While they must have substantial granular data to operate, they are almost universally pledged to mask all individual data and report only aggregate summaries. These are long term legal and cultural wars that WikiData must fight. How will sunshine be brought into the statistical business registrar offices and publicly paid for data be hauled out, in excellent order and without charge, without violating confidentiality concerns and laws which tend to be archaic? An excellent first step here would be to raise the coverage and quality of the article statistical business register.

*** End of Update ***

The following is a very rough starter / brainstorming outline for various facets of "company data on wikidata and wikipedia, now and in the future".

Scope

Goal

  • Make Wikidata a relevant company data repository, capable (over time) of holding for all:

Public companies

  • Worldwide coverage - a (potentially) large amount of data on each

Private companies with 20+ employees

  • worldwide coverage - a (potentially) large amount of data on each, though likely less than for public companies with automated reporting systems

Private companies with <20 employees

  • sporadic worldwide coverage - limited "directory" style information only (to start).
  • Includes all fields stored in the Legal Entity Identifier (LEI) database (except LEI itself), plus (where available):
    • hours
    • geo coordinates of the main public entrance
    • geo coordinates of freight delivery
    • websites
    • owner(s)/proprietor(s)/contact(s)
    • sales
    • number of employees
    • industry codes
    • for retail operations all data that OSM would need for typical "points of interest" mapping applications.

The above goal will only be achieved gradually

  • The vast majority of company data will come from external databases, machine transferred.
  • "All" and "worldwide" are aspirational and futuristic. We start with the largest and most important companies worldwide and work down.
  • Private companies with <20 employees may be the only area where some of the limited directory information is not "second sourced": the companies themselves do all the data entry and maintenance.

Adjacent Projects

For projects adjacent to this one, under the umbrella of Business and economics Wikidata Projects see: Category:Business_and_economics_WikiProjects

Store Once

  • data is to be stored only once, in a single master database, along with many different language labels
  • the "design" of the data is worldwide, country agnostic
  • this enables the data to be served in a specific end user language at "run-time"
  • "Store once" simplifies error correction and updating vs. having multiple copies of the same thing.
  • Above is ideal and aspirational, especially where different accounting standards are involved. Until accounting standards converge there must be tagging that shows what specific accounting standard was followed.

No "shoehorning"

  • data that's merely "similar" but in reality actually different should not be lumped together to "simplify" it.
  • For example if a line item in a financial statement is calculated differently under US Generally Accepted Accounting Principles (US-GAAP) vs. International Accounting Standards Board (IASB) standards (and both are known) it should be stored twice.
  • individual data must be tagged with the standard(s) it complies with.
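The "store twice, tag with the standard" rule above can be illustrated with a small record type. This is a sketch only: the class `ReportedValue`, its fields, the company name and the amounts are hypothetical and do not describe any existing Wikidata property or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedValue:
    """One reported figure, always carrying the standard it was prepared under."""
    company: str
    line_item: str
    period: str
    standard: str  # e.g. "US-GAAP" or "IASB"
    amount: int
    currency: str


# The same line item for the same period, stored once per accounting standard
# rather than merged into a single "simplified" number. Figures are made up.
values = [
    ReportedValue("ExampleCo", "net income", "FY2016", "US-GAAP", 100, "USD"),
    ReportedValue("ExampleCo", "net income", "FY2016", "IASB", 95, "USD"),
]


def lookup(values, company, line_item, period, standard):
    """Return the single value matching the key, including the standard tag."""
    key = (company, line_item, period, standard)
    matches = [
        v for v in values
        if (v.company, v.line_item, v.period, v.standard) == key
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one value for {key}, found {len(matches)}")
    return matches[0]


print(lookup(values, "ExampleCo", "net income", "FY2016", "IASB").amount)
```

Because the standard is part of the lookup key, a consumer can never receive a figure without knowing which rules produced it.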

Birth & death, acquisitions, merger and spin off

see also Update above.

In addition to the challenge of evolving a notability standard (company is either notable enough to warrant inclusion in WikiData, or it isn't), there are several issues around when a company is born (then perhaps reaches the notability threshold for Wikidata inclusions) and evolves via acquisition, merger and spinoffs, then eventually goes out of existence in one form or another.

Wikidata, like Wikipedia, is "noun" biased: one company, one encyclopedia article. This finds equivalence in WikiData as one company, one element (object). De facto rules as to when to add, remove, subdivide and combine elements (objects) at WikiData in terms of companies are not yet well formed. Right now they just follow WikiPedia. Likewise there is no garbage collector that periodically goes through and removes elements/objects/companies when they are no longer of value.

Note that the volunteer community seems to favor "creation" over maintenance and destruction of elements/objects/companies.

An establishment can typically be described as fully contained in a small geographic polygon that does not (typically) cross traditional, well known geographic boundaries such as counties, states, nations, major waterways, (and in some cases even down to a single street block). Even virtual companies have employees and revenues which have geographic attributes.

Wikidata is faced with a choice - let fashion dictate or take a methodical statistical sampling approach. Thus far the fashion approach has worked well at Wikipedia. Popular companies get coverage.

In the future, when company coverage at WikiData expands sufficiently to unearth the establishment issues, a statistical approach will become more and more necessary. Prior to that undertaking, volunteers would do well to become intimately familiar with statistical business registers, and to greatly beef up that article.

To produce meaningful economic data both entities and establishments must be tracked (at the object level), along with some mechanism linking them to ownership hierarchies, financial consolidations, etc.

Only at a fixed point in time are entities and establishments static. The very day after a perfect object model of both entities and establishments (including the interconnections) is created, it will be out of date. Maintainability features must be inherent in the object modeling up front to accommodate change.

Company events, a concept even wider than corporate actions, are not covered in this section. Obviously major events can require splitting, combining, creating or destroying objects. The same "event model" that drives object creation and destruction may possibly be extended to include normal corporate events (issuance of annual statements, changing a CEO or Chairman, having a major accident, etc.) However the entire subject of corporate news, SEC reportable events, and major corporate events is simply ignored for the moment.

Ultimately, the only way to establish the entity/establishment model is via some form of a statistical business registry. Every advanced country maintains such a registry, and they all address the same fundamental operational challenges to produce their registry. Those groups have already solved the meta data problems. Of course the better statistical business registry operations are amply funded. Much of the bookkeeping and data collection work must be tedious and boring. Unfortunately most countries' registries are held confidentially. However the techniques they use and the meta data models are most often not. It's unlikely that WikiData will ever have the volunteer resources to completely duplicate the work of each country's registry. So a lightweight solution will be needed that fits with very real resource constraints.

Strongly suggest improving WikiPedia's coverage of statistical business registers prior to attempting to model WikiData's inclusion of companies (and entities).

Existing Core Projects

Company Infobox

Data must "add up"

Company data must "add up" neatly to economic aggregates

Individual company data, originating from the company (reported directly to the public and via accessible governmental filings) must be tagged accurately, down deep

  • The bulk of company data originates internally in individual accounting departments, following the design of that company's internal chart of accounts.
  • The exact chart of accounts is unique to each company.
  • The most granular data generated by a company is tagged according to the chart of accounts. Data at that level of granularity (transaction by transaction) is considered highly proprietary and never released
  • Financial statements released to the public are prepared according to a specified body of accounting principles (for example US Generally Accepted Accounting Principles, different in UK, EU, France, Germany, etc.)
  • Public financial statements are released quarterly (unaudited), and annually (typically audited).
  • The design of the chart of accounts used internally facilitates (but rarely fully automates) the preparation of periodic public financial statements, which are generally constructed according to the standards then current in that specific country.
  • Publicly accessible financial statements (at least in the US) are subject to an additional elaborate tagging scheme called xBRL. In other countries different tagging schemes and database structures are applied to public financial statements, and they have highly varying degrees of cross-compatibility.
  • Very few companies have applied the external reporting model's tagging (a/k/a xBRL in the U.S.) to their internal chart of accounts. As such there is a translation step between internal and external reporting models.
  • Therefore public financial reports, a prime source of Company Data, have already gone through one level of translation.
  • Across countries, with different accounting standards, similar translations must occur to ensure cross-compatibility.
  • One might conclude that international accounting standards would be the best "ontology" for Wikidata to adopt at its core, however most of the largest companies out there (say the top 500 public firms based in each country) do not all report under that standard and external database vendors that purport to show cross country compatibility are currently very expensive as much of the translation work is done by hand.

Not all company data originates with public financial statements

  • The public financial statements (income statement, balance sheet, cash flows, owners equity) including the notes (management discussion and analysis, financial statement notes, audit report, etc.) constitute a significant proportion of Company Data available to the public, but hardly all.
    • Segment data, one of the standard notes to the financial statements, is the most granular generally available data that enables determining industry encoding. Segmentation however is subject to company by company interpretation (and an overwhelming propensity to aggregate so as to not reveal sources of high and low profitability to competitors and shareholders).
  • Other sources of Company Data, originating from the company itself, are also critical. Notably, this includes the rich information available in the 10-K, 10-Q and proxy statements (in the U.S.).
    • Executives and Officers, including backgrounds and compensation
    • Key facilities (factories and office floor space by location)
    • Large shareholders
    • Risk factors
    • Major brand names
    • Major litigation

Data that originates from a company and is reported to government, may only be accessible in aggregate

One use-case here might be a small town about to gain or lose a major manufacturing facility: calculating the economic effect of that single company facility moving in or out.

  • Payroll by geographic area
    • by industry
    • by job title
    • employment and unemployment
    • weekly and monthly data
  • Tax data
    • for proprietors, partnerships, corporations
  • Census data
    • Surveys of manufacturing & services
  • Economic data
    • Surveys of industry
    • Surveys and estimates used in input/output accounting
    • GDP accounting
      • by region
      • by country
      • worldwide
    • Transportation data
      • Commodity flow
      • Rail, ship, truck transportation data
      • Imports and exports by category
    • Other governmental data
      • Hazardous materials
      • Pollution
      • Industry specific governmental data troves

Significant company data that does not originate from the company itself

  • stock price (actual sales, bids and offers)
  • news items not originating from the company
  • ownership changes
  • what investment funds own large portions of the companies
  • key executives of large investment funds (critical to "unite" in corporate action proposals)
  • Tender offers and mergers

Converting from accounting data at the company level to economic data at higher levels of aggregation

  • One company sells intermediate goods to another. If sales are added, there is duplication when calculating collective output.
  • Accounting is oriented to net income; Economics is oriented to Value Added.
    • Determining what is "purchased" by one company, from another, is a data challenge when starting from company financial statements (particularly the typical accounting reporting by segments)
    • too much "hand adjusting" needs to be done, company by company, to allow economic aggregates to be computed. Better tagging could very much help.
    • raw material and finished goods inventory changes impact the calculation of economic value added
    • changes to work in process inventory are complex to evaluate
  • Special attention to tagging, down at the granular level, allows company level data to "add up" at different levels of aggregation (industry, sector, national, global). This requires "value added" tagging down to the company level.
  • Industry and sector tagging is also critical. Segment reporting (an accounting concept) must be mapped to industry reporting (an economic concept), including the use of translation maps (what percentage of all purchases were from industry 1,2,3... at time0; applied to all purchases at time1, etc.)
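The double counting described above is easy to see in a two-company example. The sketch below uses invented companies and figures, and it deliberately ignores inventory changes, which the list above notes would complicate a real calculation; the field name `purchases_from_companies` is a hypothetical tag, not an existing one.

```python
# Two hypothetical companies: SteelCo sells all its output to CarCo,
# and CarCo sells finished goods to final customers. Figures are invented.
companies = {
    "SteelCo": {"sales": 100, "purchases_from_companies": 0},
    "CarCo": {"sales": 250, "purchases_from_companies": 100},
}


def value_added(company):
    """Value added = sales minus intermediate goods bought from other companies."""
    return company["sales"] - company["purchases_from_companies"]


# Adding up sales counts SteelCo's output twice (once on its own, once inside CarCo's sales).
total_sales = sum(c["sales"] for c in companies.values())

# Adding up value added counts each unit of output once.
total_value_added = sum(value_added(c) for c in companies.values())

print("sum of sales:", total_sales)
print("sum of value added:", total_value_added)
```

The sum of value added equals CarCo's sales to final customers, which is the figure that should roll up into an economic aggregate. Without a tag for purchases from other companies, this subtraction cannot be automated.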

LEI

Go to school on the LEI

The Legal Entity Identifier project, initiated after the stock market crash of 2008, is critically important for WikiDataens to really go to school on.

  • Global in scope
  • Very current, very high quality development
  • Many eyes looked over the project
  • Excellent data hygiene practices
  • Excellent data modeling
  • Highly reliable
  • Phase II (who owns who) will be critical going forward
  • Fully maintained and updated.

@Rjlabs: Yea? The data is IMHO very (TOO) simple, eg see this gist. A typical trade register (eg Bulgarian trade register fields) is a lot more complex. Furthermore, GLEI has only 500k companies, compare to OpenCorporates that has 160M. --Vladimir Alexiev (talk) 20:34, 23 February 2017 (UTC)

LEI Data File Format

The LEI is an ISO standard. You need to purchase the standard from your ISO organization to get access to the very latest standard documentation, which has a copyright.

On Wikidata:

Future ("early 2017"):

  • more to come here from notes and emails

! https://ir.nist.gov/feiii/

Wikidata Ontology & Taxonomy Issues

→ All statements must be sourced

  • Transfer industry classification currently done by categories to Wikidata
  • Develop a controlled dictionary for industry (P452) (currently ~100 industries are used)

Industry

Sector

Nation (Economics, including National Accounting, Value Added, Input/Output)

Company Events

General comments

  • Notable?
  • 8K SEC filings (catalogue of events, event tagging)
  • News Articles parsing & linking

In standardizing data around companies (legal entities) and establishments (facilities with set geo locations) the standardization of events rapidly comes into play. Here in Wikidata (and more so in Wikipedia) the historical emphasis has been on "nouns", examining things that exist as of a certain time. Over time companies are founded, grow, acquire, spin off, go bankrupt, etc. Over time they issue financial reports and news releases, and have major events happen. Events often change the "noun" structure of things. Sometimes the long history of events "changing the nouns" is critical to understanding the company as it is today, or to predicting how it will perform in the future.

If you look backwards to printed encyclopedias they had articles built around nouns in a multi volume set that stayed static for 10+ years. "News events" by year were published in a separate book, purportedly covering the major events of that year. None of this translates well to 24/7/365 release of news and constant editing of Wikipedia articles.

I consider Events to be perhaps the hardest area for the company data project to address, and perhaps best left to the future. Here however are some rough notes on how to "standardize" (create meta data) for company events.

Illustration of why company events are very important

Wikidata data model "deficiency" pulled from archive that rolled off https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/01

The WD data model is more powerful than RDF because of references and qualifiers. But IMHO it's also weaker in that people are less willing to create items than general RDF nodes.

Eg how to model that Hewlett Packard split into HP Inc and Hewlett Packard Enterprise in 2015? I added two "followed by" values https://www.wikidata.org/wiki/Q80978#P156 but it really is ONE event with 1 preceding entity and 2 succeeding entities.

Guess I could make a prop "significant event" but there's no standard property to link the participants (succeeding companies).

And I don't think many people would agree to make out the Event as an Item... and to standardize a vocabulary of event participant roles. --Vladimir Alexiev (talk) 16:12, 8 January 2018 (UTC)

And I guess we have the same issue in the other direction for mergers? I could argue that if a specific corporate event (like a breakup or merger) had sufficient significance in itself it should have its own item. I'm not sure that all such events deserve an item, but the HP example seems like a good one. ArthurPSmith (talk) 16:55, 8 January 2018 (UTC)
Mergers and splits are complicated, but there is no deficiency.
In fact in most cases of mergers there is no new entity, just second company is merged to the first and first change its name. The correct way to represent this is no new item, but record appropriate changes to existing items. Problem is that sometime new wiki article is created for "new" entity. But that is more likely deficiency of wikpedia, not wikidata.
Same for splits, in most cases there is no one old entity and two new entities, just new entity is created and old one change its name. Again the correct way to represent this is one new item for new entity and record appropriate changes to existing item. Again the problem is that sometimes two new articles are created.
That is the case of HP, "Hewlett Packard Enterprise" is really new entity, but "HP Inc." is nothing but renamed "Hewlett-Packard Company". So correct way to record this is rename Hewlett-Packard (Q80978) to HP Inc. (also add appropriate official name (P1448)), leave followed by (P156) and Hewlett Packard Enterprise (Q19923099) but delete followed by (P156) HP Inc. (Q21404084), and at HP Inc. (Q21404084) left just permanent duplicated item (P2959) and Hewlett-Packard (Q80978). But problem it that articles connected to HP Inc. (Q21404084) already exist.--Jklamo (talk) 17:04, 8 January 2018 (UTC)
There are different scenarios:
  • A + B = A
  • A + B = C
  • A = A + B
  • A = B + C
So we need 4 different event names and 4 different ways to model these events. Snipre (talk) 07:58, 9 January 2018 (UTC)
  • As statements about HP before and after might be considerably different, I'm not entirely convinced by the purely formal approach suggested by Jklamo.
    --- Jura 10:46, 9 January 2018 (UTC)
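The four scenarios listed in the discussion above can be told apart mechanically if a restructuring is recorded as a single event with a set of predecessors and a set of successors. The following is only a sketch of that idea: the class `RestructuringEvent`, the function `classify` and the scenario labels are hypothetical, and they are not existing Wikidata properties or an agreed vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestructuringEvent:
    """One corporate event linking the entities before it to the entities after it."""
    year: int
    predecessors: frozenset
    successors: frozenset


def classify(event):
    """Map an event onto one of the four scenarios from the discussion."""
    # An entity present on both sides is treated as surviving the event.
    survivors = event.predecessors & event.successors
    if len(event.predecessors) > 1 and len(event.successors) == 1:
        return "A + B = A" if survivors else "A + B = C"
    if len(event.predecessors) == 1 and len(event.successors) > 1:
        return "A = A + B" if survivors else "A = B + C"
    return "other"


# The 2015 Hewlett-Packard example, modeled both ways argued in the thread.
as_two_new_entities = RestructuringEvent(
    2015,
    frozenset({"Hewlett-Packard"}),
    frozenset({"HP Inc.", "Hewlett Packard Enterprise"}),
)
as_rename_plus_new_entity = RestructuringEvent(
    2015,
    frozenset({"Hewlett-Packard"}),
    frozenset({"Hewlett-Packard", "Hewlett Packard Enterprise"}),
)

print(classify(as_two_new_entities))
print(classify(as_rename_plus_new_entity))
```

The disagreement in the thread then reduces to a data question (is HP Inc. the same entity as Hewlett-Packard or a new one?) rather than a modeling question.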

Form 8-K

Endless arguments ensue at Wikidata regarding notability requirements for companies. Down from there will be endless arguments as to what events are actually notable, related to the set of companies that WikiData/Wikipedia actually provides coverage for.

Thankfully the U.S. SEC has prescribed, in some detail, what is a significant event. See Form 8-K and detailed SEC guidance for what constitutes a "must report" 8-K event. The actual filing of an 8-K report is thankfully "objective" from a WikiData perspective.

  • Signing, amending or terminating material definitive agreements not made in the ordinary course of business
  • Bankruptcies or receiverships
  • Mine shutdowns or violations of mine health and safety laws
  • Consummation of a material asset acquisition or sale
  • Results of operations and financial condition
  • Creating certain financial obligations, such as incurrence of material debt
  • Triggering events that accelerate material obligations (such as defaults on a loan)
  • Costs associated with exit or disposal plans
    • Layoffs
    • shutting down a plant
    • material change in services or outlets
  • Material impairments
  • Delisting from a securities exchange or failing to satisfy listing requirements
  • Unregistered Stock sales (private placements)
  • Modifications to shareholder rights
  • Change in accountants
  • Determinations that previously issued financial statements cannot be relied upon
  • Change in control
  • Senior officer appointments and departures
  • Director elections and departures
  • Amendments to certificate/articles of incorporation or bylaws
  • Changes in fiscal year
  • Trading suspension under employee benefit plans
  • Amendments or waivers of code of ethics
  • Changes in shell company status
  • Results of shareholder votes
  • Disclosures applicable to issuers of asset-backed securities
  • Disclosures necessary to comply with Regulation FD
  • Other material events
  • Certain financial statements and other exhibits
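Because each 8-K is filed under numbered items, the list above lends itself to event tagging. The sketch below shows the idea with a small subset of items. The item numbers are not part of the list above; they are given as I understand the SEC form and should be checked against the current Form 8-K before use. The function `tag_filing`, the company name and the date are hypothetical.

```python
# Small subset of Form 8-K item numbers mapped to event labels.
# Verify the numbers against the SEC's current form before relying on them.
ITEM_LABELS = {
    "1.01": "Entry into a material definitive agreement",
    "1.03": "Bankruptcy or receivership",
    "2.02": "Results of operations and financial condition",
    "5.02": "Departure or appointment of directors or certain officers",
    "8.01": "Other events",
}


def tag_filing(company, filed, items):
    """Turn one filing's item numbers into a list of tagged event records."""
    return [
        {
            "company": company,
            "date": filed,
            "item": item,
            "event": ITEM_LABELS.get(item, "unmapped item"),
        }
        for item in items
    ]


# One hypothetical filing that reports two items.
for record in tag_filing("ExampleCo", "2017-11-01", ["2.02", "5.02"]):
    print(record)
```

Items missing from the table are kept and labeled "unmapped item" rather than dropped, so that gaps in the mapping stay visible.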

Corporate Actions

Many "corporate actions" constitute a significant event. Attempts to construct a taxonomy of events that everyone likes, however, have been quite elusive.

Major broker / dealers must maintain security master files and carefully update them, track and communicate major corporate actions out to all shareholders, analysts and brokers (lots of intense documents!)

Here is an attempt, published as a standard taxonomy in 2012 but not updated since then, and currently "downgraded" to the lowly status of "research and data" only. It was developed by xBRL, no strangers to taxonomies and standardization: [[1]]

Sipping from the fire hose - event driven news feeds

see Bloomberg event driven feeds as an example.

Corporate events calendar

  • earnings release dates
  • sales results
  • shareholder and board meeting information
  • etc.
  • Event dates are predicted up to one year in advance.

Textual news flow

  • headlines and stories
  • web and social media content
  • information is paired with metadata covering companies, topics and people.
  • 151 global bureaus generating (Bloomberg alone)
  • 10,000 headlines and stories per day (Bloomberg alone)
  • press release wires
  • social media
  • blogs & industry sources
  • company press releases
  • government website content
  • links and tags

At Bloomberg for example

  • 75,000 securities / tickers
  • 10,000 topics (tags that are assigned to text news)
  • People
  • Topics, tickers and people "Relevancy scores"
  • Historical archive (the entire fire hose contents) dating back to 1992
  • Classification and hierarchy of topics

Company Analysts Events

A little known group called RIXML publishes XML standards that include event types for company analysts (who typically issue buy/hold/sell recommendations) to code to.

EventTypeEnum Indicates the type of the event. Each type is either conference, corporate event, government meeting, or sellside firm event.

Enumeration Values

  • IndustryConference
  • CountryRegionConference
  • SeminarConference
  • ThematicConference
  • TradeShowConference
  • Conference
  • AnnualShareholderMeeting
  • IPO
  • EarningsRelease
  • EarningsReleaseDiscussion
  • EarningsGuidancePreliminaryDiscussion
  • SpecialShareholdersMeeting
  • CorporateActionCommentary
  • RoadShow
  • GovernmentMeeting


EventVenueTypeEnum Describes the venue of an event.

Enumeration Values:

  • ConferenceCall
  • OneOnOneMeeting
  • GroupMeeting
  • Webcast
  • WrittenRelease
  • InternetAudio
  • InternetVideo
  • MediaAppearance
  • Transcript
  • Brief: short synopsis of transcript taking key points.
  • Podcast: a type of digital media consisting of an episodic series of audio, video, PDF, or ePub files subscribed to and downloaded through web syndication or streamed online to a computer or mobile device.
  • Interview: a conversation between two people (the interviewer and the interviewee) where questions are asked by the interviewer to obtain information from the interviewee.
  • Panel: a discussion forum in which a moderator directs questions from an audience or other sources to a group of speakers with expertise or other valued perspectives on the topic at hand.
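
The two enumerations above map naturally onto typed constants. The following is a minimal sketch, assuming the value lists are exactly those shown here; `AnalystEvent` and `is_conference` are hypothetical names for illustration and are not part of the RIXML schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

# Values copied from the EventTypeEnum list above.
_EVENT_TYPES = [
    "IndustryConference", "CountryRegionConference", "SeminarConference",
    "ThematicConference", "TradeShowConference", "Conference",
    "AnnualShareholderMeeting", "IPO", "EarningsRelease",
    "EarningsReleaseDiscussion", "EarningsGuidancePreliminaryDiscussion",
    "SpecialShareholdersMeeting", "CorporateActionCommentary", "RoadShow",
    "GovernmentMeeting",
]

# Values copied from the EventVenueTypeEnum list above.
_VENUE_TYPES = [
    "ConferenceCall", "OneOnOneMeeting", "GroupMeeting", "Webcast",
    "WrittenRelease", "InternetAudio", "InternetVideo", "MediaAppearance",
    "Transcript", "Brief", "Podcast", "Interview", "Panel",
]

# Member names and member values are both the RIXML strings.
EventTypeEnum = Enum("EventTypeEnum", {name: name for name in _EVENT_TYPES})
EventVenueTypeEnum = Enum("EventVenueTypeEnum", {name: name for name in _VENUE_TYPES})


@dataclass(frozen=True)
class AnalystEvent:
    """Hypothetical record tying a company item to one coded event."""
    company_qid: str
    event_type: EventTypeEnum
    venue: EventVenueTypeEnum
    event_date: date


def is_conference(event_type):
    """Naming heuristic (an assumption): conference types end in 'Conference'."""
    return event_type.name.endswith("Conference")
```

Looking a value up by name, for example `EventTypeEnum["EarningsRelease"]`, raises `KeyError` for a string that is not in the list, which gives a cheap validation step when importing coded events.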

See also talk at talk page.

Sources of company data and statistics - government & free

  • EDGAR (SEC filings)
  • XBRL
  • IRS
  • DOL
  • OpenCorporates (Q7095760)

Sources of company data and statistics – private sector/closed source/proprietary

  • S&P
  • MSCI
  • others such as: Fortune, Hoovers

Fundamental Data

  • Financial statement data
  • Sector, industry, index assignments and segment data
  • Earnings releases
  • MD&A Narrative
  • Notes to financial statements
  • Regulatory Filings
  • 10K, 8K
  • Registration statements
  • Proxy statements
  • Ownership and change in ownership data
  • Spin-off & Reorganization
  • Stock exchange/exchanges where the company is traded
  • Stock ticker symbol/symbols

Market data

  • Markets where company securities (stocks, bonds, futures, options) are traded.
  • Trade executions (volume and price by time bucket, or every trade)
  • Dividend and split data
  • Bids and offers
  • Options and Futures
  • Bond data
  • Important Related Data (macro economic data releases impacting the markets)
  • Market indexes
  • Industry indexes
  • Pairs data?

FX

  • FX information is critical especially in a global context. Need to be able to “enter total assets once and have that expressed in different countries using their home currency – with the option to display it as of a certain date, or as the reported date but converted as of today’s conversion value”
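
The requirement above amounts to a dated rate lookup: store the amount once in its reported currency, then convert it with the rate in force on whichever date the reader picks (the reporting date, a chosen date, or today). A minimal sketch follows, assuming a table of historical rates for one currency pair is available. `FxTable`, `rate_on` and `convert` are hypothetical names, not part of any Wikidata API, and the rates in the example are placeholders, not real quotes.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal


class FxTable:
    """Historical rates for one currency pair (target units per 1 source unit)."""

    def __init__(self, rates):
        # rates: dict mapping a date to the Decimal rate observed on that date
        self._dates = sorted(rates)
        self._rates = [rates[d] for d in self._dates]

    def rate_on(self, as_of):
        """Return the most recent rate observed on or before `as_of`."""
        i = bisect_right(self._dates, as_of)
        if i == 0:
            raise LookupError(f"no rate on or before {as_of}")
        return self._rates[i - 1]


def convert(amount, table, as_of):
    """Convert `amount` using the rate in force on `as_of`."""
    return amount * table.rate_on(as_of)


# Placeholder rates for illustration only.
table = FxTable({
    date(2016, 12, 31): Decimal("1.05"),
    date(2017, 6, 30): Decimal("1.14"),
})
total_assets = Decimal("250")   # entered once, in the reporting currency
reported = date(2016, 12, 31)   # the reporting date

at_reported_date = convert(total_assets, table, reported)
at_latest_rate = convert(total_assets, table, date.today())
```

Using `Decimal` rather than `float` keeps reported figures exact, and falling back to the most recent earlier rate covers weekends and holidays when no rate is published.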

People (company people)

  • Executives & officers
  • Board members — board member (P3320)
  • Owners of significant proportion (including via intermediaries)
  • Lead scientists & researchers
  • Unions
  • Standardization of job titles & functions
  • Employment statistics
  • Value added
  • Key “outside” people
  • Lobbyists
  • Outside accountants
  • Outside attorneys
  • Campaign contributions
  • PACs

Places (company geo-coordinates)

  • Where products are made
  • Where products are sold
  • Commodity flow in
  • Product flow out
  • Pollution & environmental hazards

Products (produced by a company)

  • Products contained neatly by Industry, contained by Sector, contained by GDP, contained by world GDP
  • Bar code to company
  • Product/company data as contained in shopping bots

Templates

Show superclasses

and more The following line

expands to:

Classification of the class business (Q4830453)     
For help about classification, see Wikidata:Classification.
Parent classes (classes of items which contain this one item)
Subclasses (classes which contain special kinds of items of this class)
business⟩ on wikidata tree visualisation (external tool)(depth=1)
Generic queries for classes

Query Examples

Billionaires

#Billionaires
#added before 2016-10

SELECT ?item ?itemLabel ?billions
WHERE
{
  ?item wdt:P2218 ?worth.
  FILTER(?worth>1000000000).
  BIND(?worth/1000000000 AS ?billions).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?billions)
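
Queries like this one can also be submitted from a script instead of the web interface. The sketch below assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and its `format=json` parameter. `build_wdqs_url` and `run_query` are hypothetical helper names, and the User-Agent string is a placeholder to replace with real contact details.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_url(query, fmt="json"):
    """Return a GET URL that submits `query` to the endpoint."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": fmt})


def run_query(query, user_agent="CompanyDataSketch/0.1 (placeholder contact)"):
    """Fetch the result rows. Needs network access; nothing calls it on import."""
    request = Request(build_wdqs_url(query), headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        # Standard SPARQL JSON results: one dict per row under results/bindings.
        return json.load(response)["results"]["bindings"]
```

Each returned row is a dict keyed by variable name, so a value from the query above would be read as `row["billions"]["value"]`.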

All subclasses of a class example 1

# All subclasses of a class example
# here all subclasses of P279 Organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
	?item wdt:P279 wd:Q43229 .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)

All subclasses of a class example 2

# All subclasses of a class example 2
# here all subclasses of P279 Business Enterprise(Q4830453)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel

WHERE
{
	?item wdt:P279 wd:Q4830453 .
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)

List all "normally applying" properties (including inherited)

This can be auto-generated from a very simple template.

Properties for the class <oil company>

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix entity: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wikibase: <http://wikiba.se/ontology#>

  # Query to find, for a given starting class ?tree0 (in this case Q14941854),
  # what properties are registered as normally applying
  # to it, or to a super-class of it
  # Query generated by [[Template:PropertyForThisType]]

SELECT ?class ?classname ?property ?propertyname WHERE {
  BIND (entity:Q14941854 AS ?tree0)
  ?tree0 (wdt:P279)* ?class .
  ?class wdt:P1963 ?property .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?class rdfs:label ?classname .
    ?property rdfs:label ?propertyname .
  }
}
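
Generating this query for an arbitrary class is a plain string substitution. The sketch below is one possible approach and not the actual logic of [[Template:PropertyForThisType]]. `properties_query` is a hypothetical name, and the query text relies on the `wd:`, `wdt:`, `rdfs:`, `bd:` and `wikibase:` prefixes that the Wikidata Query Service predefines.

```python
import re
from string import Template

# Same pattern as the query above, with the class QID and label language
# left as placeholders.
PROPERTIES_QUERY = Template("""\
SELECT ?class ?classname ?property ?propertyname WHERE {
  BIND (wd:$qid AS ?tree0)
  ?tree0 (wdt:P279)* ?class .
  ?class wdt:P1963 ?property .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "$lang" .
    ?class rdfs:label ?classname .
    ?property rdfs:label ?propertyname .
  }
}
""")


def properties_query(qid, lang="en"):
    """Return the query text for the class item `qid`, e.g. 'Q14941854'."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not an item ID: {qid!r}")
    if not re.fullmatch(r"[A-Za-z-]+(,[A-Za-z-]+)*", lang):
        raise ValueError(f"not a language list: {lang!r}")
    return PROPERTIES_QUERY.substitute(qid=qid, lang=lang)
```

Validating both arguments before substitution keeps arbitrary text out of the query string.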

All instances of the class "Organization (Q43229)"

# All instances of the class "Organization (Q43229)".
# Since "Organization" is a broad class this returns many sub classes and instances, limits to 10,000
# see All subclasses of a class to return just the class names (not the instances)

SELECT ?item ?itemDescription ?itemLabel 

WHERE {
  ?item wdt:P31 ?sub0 .
  ?sub0 (wdt:P279)* wd:Q43229  .
  
 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en"  }  
}

# ORDER BY ASC(?item)  causes a timeout
LIMIT 10000

All instances of the class "oil company (Q14941854)", a subclass of Organization

# All instances of the class "oil company (Q14941854)", a sub class of Organization
# the class structure under "Organization" is chaotic and needs extensive reorganization work. 
# results are few and major oil companies are missing. 

SELECT ?item ?itemDescription ?itemLabel 

WHERE {
  ?item wdt:P31 ?sub0 .
  ?sub0 (wdt:P279)* wd:Q14941854  .
  
 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en"  }  
}

ORDER BY ASC(?item)
LIMIT 10000

External Identifiers used for Companies

Trying to find all External Identifier props used for Companies. This finds all external identifier props:

select ?wd ?lab ?desc {
  ?wd wikibase:directClaim ?wdt.
  ?wdt a owl:DatatypeProperty 
  filter (exists{?wd wdt:P31/wdt:P279* wd:Q19847637} # Unique Identifier
      || exists{?wd wikibase:propertyType wikibase:ExternalId})
  #filter exists {[?wdt []; wdt:P31/wdt:P279* wd:Q783794]} # a Company: causes timeout
  ?wd rdfs:label ?lab filter(lang(?lab)="en")
  optional {?wd schema:description ?desc filter(lang(?desc)="en")}
}

But if I uncomment the part about Company, I get a timeout. The bracketry is probably confusing, so we can expand it like this for clarity:

   filter exists {?company ?wdt ?any_prop; wdt:P31/wdt:P279* wd:Q783794}

--Vladimir Alexiev (talk) 17:05, 23 February 2017 (UTC)

Map of places of birth of economists

#Map of places of birth of economists
#added before 2016-10

#defaultView:Map
SELECT ?person ?name ?coord ?place ?birthplace ?birthyear
WHERE
{
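   # Occupation (P106) economist, or field of work (P101) economics
   {?person wdt:P106 wd:Q188094 .} UNION {?person wdt:P101 wd:Q8134 .}
   # Drop people matched by both branches
   MINUS {?person wdt:P106 wd:Q188094 . ?person wdt:P101 wd:Q8134 .}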
   ?person wdt:P19 ?place .
   ?place wdt:P625 ?coord .
      OPTIONAL { ?person wdt:P569 ?dob .}.
	BIND(YEAR(?dob) as ?birthyear).
   ?person rdfs:label ?name filter (lang(?name) = "en")
   ?place rdfs:label ?birthplace filter (lang(?birthplace) = "en")
}

Average lifespan by occupation

#Average lifespan by occupation
#added before 2016-10

# Select the desired columns and get labels
SELECT ?occ ?occLabel ?avgAge ?avgBirthYear ?count
WHERE
{
  {
    # Group the people by their occupation and calculate age
    SELECT
    	?occ
        (count(?p) as ?count)
        (round(avg(?birthYear)) as ?avgBirthYear)
        (avg(?deathYear - ?birthYear) as ?avgAge)
    WHERE {
      {
        # Get people with occupation + birth/death dates; combine multiple birth/death dates using avg
        SELECT
        	?p
            ?occ
            (avg(year(?birth)) as ?birthYear)
            (avg(year(?death)) as ?deathYear)
        WHERE {
           ?p  wdt:P31 wd:Q5 ;
              wdt:P106 ?occ ;
              p:P569/psv:P569 [
                wikibase:timePrecision "9"^^xsd:integer ; # matches dates stored with year precision (9) only
                wikibase:timeValue ?birth ;
              ] ;
              p:P570/psv:P570 [
                wikibase:timePrecision "9"^^xsd:integer ; # matches dates stored with year precision (9) only
                wikibase:timeValue ?death ;
              ] .
        }
        GROUP BY ?p ?occ
      } 
    }
    GROUP BY ?occ
  }
  
  FILTER (?count > 300) # arbitrary number to weed out values with 'few' observations
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ASC(?avgAge)
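
The nested sub-selects perform a two-level aggregation: first average the birth and death years per person and occupation, then average the resulting lifespans per occupation and drop occupations with too few people. The same steps, sketched in plain Python over rows that stand in for the query's inner solutions. `average_lifespan_by_occupation` and its input tuple layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_lifespan_by_occupation(rows, min_count=300):
    """rows: iterable of (person, occupation, birth_year, death_year).

    Returns (occupation, average_age) pairs sorted by ascending age,
    keeping only occupations with more than `min_count` people.
    """
    # Level 1: combine multiple birth/death years per (person, occupation).
    years = defaultdict(lambda: ([], []))
    for person, occupation, birth_year, death_year in rows:
        births, deaths = years[(person, occupation)]
        births.append(birth_year)
        deaths.append(death_year)

    # Level 2: collect one lifespan per person under each occupation.
    lifespans = defaultdict(list)
    for (person, occupation), (births, deaths) in years.items():
        lifespans[occupation].append(mean(deaths) - mean(births))

    averages = [
        (occupation, mean(ages))
        for occupation, ages in lifespans.items()
        if len(ages) > min_count  # same role as FILTER (?count > 300)
    ]
    return sorted(averages, key=lambda pair: pair[1])
```

Averaging per person first means someone with several recorded birth dates counts once per occupation instead of once per date.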

Stock-listed US companies with founders who (or whose parents) were born outside the US

# stock-listed US companies with founders who (or whose parents) were born outside the US
SELECT ?companyLabel (GROUP_CONCAT(DISTINCT ?founderLabel; separator=", ") AS ?founders) WHERE {
  ?company wdt:P112 ?founder;
           wdt:P17 wd:Q30.
  FILTER EXISTS { ?company wdt:P414|wdt:P249 []. }
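  # Founder, or the founder's father (P22) or mother (P25), born (P19) in a country (P17) other than the US
  ?founder (wdt:P22|wdt:P25)?/wdt:P19/wdt:P17 ?country.
  FILTER(?country != wd:Q30)
  # Used only for ordering: total equity, total revenue, market capitalization, net profit
  OPTIONAL { ?company wdt:P2137|wdt:P2139|wdt:P2226|wdt:P2295 ?money. }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?company rdfs:label ?companyLabel.
    ?founder rdfs:label ?founderLabel.
  }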
}
GROUP BY ?companyLabel
ORDER BY DESC(MAX(?money)) ?companyLabel

Companies with largest revenue in United States dollar (Q4917)

#Companies with largest revenue in USD (Q4917)
SELECT ?item ?revenue ?unit ?revenue_USD ?date WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4830453;
        p:P2139 ?statement .
  OPTIONAL { ?item wdt:P159 ?hq } .
  ?statement wikibase:rank ?rank .
  FILTER( ?rank != wikibase:DeprecatedRank ) .
  OPTIONAL {
    FILTER( ?rank != wikibase:PreferredRank ) .
    ?item p:P2139 ?statement1 .
    ?statement1 wikibase:rank wikibase:PreferredRank .
    FILTER( ?statement1 != ?statement ) .
  } .
  FILTER( !BOUND( ?statement1 ) ) .
  OPTIONAL { ?statement pq:P585 ?date } .
  {
    ?statement psv:P2139 [
      wikibase:quantityAmount ?revenue; wikibase:quantityUnit wd:Q4917
    ] .
    BIND( wd:Q4917 AS ?unit ) .
    BIND( ?revenue AS ?revenue_USD ) .
  } UNION {
    ?statement psv:P2139 [
      wikibase:quantityAmount ?revenue; wikibase:quantityUnit ?unit
    ] .
    FILTER( ?unit != wd:Q4917 ) .
    ?unit p:P2284 ?unit_statement .
    ?unit_statement wikibase:rank ?unit_rank;
                    psv:P2284 [ wikibase:quantityUnit wd:Q4917; wikibase:quantityAmount ?usd ] .
    FILTER( ?unit_rank != wikibase:DeprecatedRank ) .
    OPTIONAL {
      FILTER( ?unit_rank != wikibase:PreferredRank ) .
      ?unit p:P2284 ?unit_statement1 .
      ?unit_statement1 psv:P2284/wikibase:quantityUnit wd:Q4917;
                       wikibase:rank wikibase:PreferredRank .
      FILTER( ?unit_statement1 != ?unit_statement ) .
    } .
    FILTER( !BOUND( ?unit_statement1 ) ) .
    BIND( ?revenue * ?usd AS ?revenue_USD ) .
  } .
}
ORDER BY DESC(?revenue_USD)
LIMIT 10
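
The query above does two things for each company: it keeps only the "best rank" statements (preferred ones if any exist, otherwise the normal ones, never deprecated ones), and it expresses every amount in USD by multiplying by the unit's price. The same logic in plain Python, as a simplified sketch. `best_rank` and `top_by_usd` are hypothetical names, the statement dicts are an assumed input shape, and `usd_price` stands in for the P2284 lookup, whose own rank filtering is omitted here.

```python
from collections import defaultdict

USD = "Q4917"  # United States dollar, as in the query above


def best_rank(statements):
    """Keep preferred statements if any exist, otherwise the normal ones."""
    live = [s for s in statements if s["rank"] != "deprecated"]
    preferred = [s for s in live if s["rank"] == "preferred"]
    return preferred or live


def top_by_usd(statements, usd_price, limit=10):
    """Return (item, amount_in_usd) pairs, largest first.

    statements: dicts with keys item, rank, amount, unit.
    usd_price: maps a unit QID to its price in USD.
    """
    by_item = defaultdict(list)
    for s in statements:
        by_item[s["item"]].append(s)

    rows = []
    for item, group in by_item.items():
        for s in best_rank(group):
            if s["unit"] == USD:
                rows.append((item, s["amount"]))
            elif s["unit"] in usd_price:
                rows.append((item, s["amount"] * usd_price[s["unit"]]))
            # Amounts in a unit with no known USD price are skipped,
            # as they are in the query.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

Like the query, this converts with a single current price per currency, so amounts reported in different years are not adjusted for historical exchange rates.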

Companies with largest assets in United States dollar (Q4917)

# Companies with largest assets in USD (Q4917)
SELECT ?item ?revenue ?unit ?revenue_USD ?date WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4830453;
        p:P2403 ?statement .
  OPTIONAL { ?item wdt:P159 ?hq } .
  ?statement wikibase:rank ?rank .
  FILTER( ?rank != wikibase:DeprecatedRank ) .
  OPTIONAL {
    FILTER( ?rank != wikibase:PreferredRank ) .
    ?item p:P2403 ?statement1 .
    ?statement1 wikibase:rank wikibase:PreferredRank .
    FILTER( ?statement1 != ?statement ) .
  } .
  FILTER( !BOUND( ?statement1 ) ) .
  OPTIONAL { ?statement pq:P585 ?date } .
  {
    ?statement psv:P2403 [
      wikibase:quantityAmount ?revenue; wikibase:quantityUnit wd:Q4917
    ] .
    BIND( wd:Q4917 AS ?unit ) .
    BIND( ?revenue AS ?revenue_USD ) .
  } UNION {
    ?statement psv:P2403 [
      wikibase:quantityAmount ?revenue; wikibase:quantityUnit ?unit
    ] .
    FILTER( ?unit != wd:Q4917 ) .
    ?unit p:P2284 ?unit_statement .
    ?unit_statement wikibase:rank ?unit_rank;
                    psv:P2284 [ wikibase:quantityUnit wd:Q4917; wikibase:quantityAmount ?usd ] .
    FILTER( ?unit_rank != wikibase:DeprecatedRank ) .
    OPTIONAL {
      FILTER( ?unit_rank != wikibase:PreferredRank ) .
      ?unit p:P2284 ?unit_statement1 .
      ?unit_statement1 psv:P2284/wikibase:quantityUnit wd:Q4917;
                       wikibase:rank wikibase:PreferredRank .
      FILTER( ?unit_statement1 != ?unit_statement ) .
    } .
    FILTER( !BOUND( ?unit_statement1 ) ) .
    BIND( ?revenue * ?usd AS ?revenue_USD ) .
  } .
}
ORDER BY DESC(?revenue_USD)
LIMIT 10

Companies with largest net profit in United States dollar (Q4917)

#Companies with largest net profit in USD (Q4917)
SELECT ?item ?revenue ?unit ?revenue_USD ?date WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4830453;
        p:P2295 ?statement .
  OPTIONAL { ?item wdt:P159 ?hq } .
  ?statement wikibase:rank ?rank .
  FILTER( ?rank != wikibase:DeprecatedRank ) .
  OPTIONAL {
    FILTER( ?rank != wikibase:PreferredRank ) .
    ?item p:P2295 ?statement1 .
    ?statement1 wikibase:rank wikibase:PreferredRank .
    FILTER( ?statement1 != ?statement ) .
  } .
  FILTER( !BOUND( ?statement1 ) ) .
  OPTIONAL { ?statement pq:P585 ?date } .
  {
    ?statement psv:P2295 [
      wikibase:quantityAmount ?revenue; wikibase:quantityUnit wd:Q4917
    ] .
    BIND( wd:Q4917 AS ?unit ) .
    BIND( ?revenue AS ?revenue_USD ) .
  } UNION {
    ?statement psv:P2295 [
      wikibase:quantityAmount ?revenue; wikibase:quantityUnit ?unit
    ] .
    FILTER( ?unit != wd:Q4917 ) .
    ?unit p:P2284 ?unit_statement .
    ?unit_statement wikibase:rank ?unit_rank;
                    psv:P2284 [ wikibase:quantityUnit wd:Q4917; wikibase:quantityAmount ?usd ] .
    FILTER( ?unit_rank != wikibase:DeprecatedRank ) .
    OPTIONAL {
      FILTER( ?unit_rank != wikibase:PreferredRank ) .
      ?unit p:P2284 ?unit_statement1 .
      ?unit_statement1 psv:P2284/wikibase:quantityUnit wd:Q4917;
                       wikibase:rank wikibase:PreferredRank .
      FILTER( ?unit_statement1 != ?unit_statement ) .
    } .
    FILTER( !BOUND( ?unit_statement1 ) ) .
    BIND( ?revenue * ?usd AS ?revenue_USD ) .
  } .
}
ORDER BY DESC(?revenue_USD)
LIMIT 10