Wikidata:Property proposal/OpenSanctions ID
OpenSanctions ID
Originally proposed at Wikidata:Property proposal/Organization
Description | identifier of persons, companies, luxury vessels of political, criminal, or economic interest at www.opensanctions.org |
---|---|
Represents | OpenSanctions (Q110087116) |
Data type | External identifier |
Domain | persons, organizations, companies, political parties, luxury vessels |
Example 1 | Birgit Honé (Q20606538) → eu-cor-2030915 |
Example 2 | André Viola (Q2848794) → eu-cor-2032180 |
Example 3 | Semion Mogilevich (Q471862) → Q471862 |
Example 4 | Irbis Air Company (Q3396960) → ch-seco-4058 |
Example 5 | Илья Романович Абрамович (new) → rupep-person-16554 |
Example 6 | Pavel Evgenyevich PRIGOZHIN (new) → NK-knteb2sJu9NwaJ79aK6D9T |
Source | https://www.opensanctions.org/datasets |
Number of IDs in source | 167028 on 24.01.2022; Over 204,469 on 14.04.2022 (ref https://opensanctions.org/datasets/default/) |
Expected completeness | eventually complete (Q21873974) |
Formatter URL | https://www.opensanctions.org/entities/$1 |
Robot and gadget jobs | import from OpenSanctions; 53% already have WD ID |
Distinct-values constraint | yes |
Wikidata project | Wikidata:WikiProject_Organizations Wikidata:WikiProject_Companies |
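As a rough illustration of how the Formatter URL above turns an identifier into a link, the sketch below substitutes an ID for the `$1` placeholder. The function name `entity_url` is hypothetical; the IDs are taken from the examples in this proposal.

```python
# Formatter URL as given in the proposal; $1 is the placeholder for the ID.
FORMATTER_URL = "https://www.opensanctions.org/entities/$1"


def entity_url(os_id: str) -> str:
    """Substitute an OpenSanctions ID for the $1 placeholder (hypothetical helper)."""
    return FORMATTER_URL.replace("$1", os_id)


# IDs copied from the proposal's examples.
for os_id in ("eu-cor-2030915", "Q471862", "NK-knteb2sJu9NwaJ79aK6D9T"):
    print(entity_url(os_id))
```

Note that Example 3 uses the Wikidata QID itself as the OpenSanctions ID, so the same substitution covers both reconciled and unreconciled entities.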
Motivation
OpenSanctions is a database of persons and companies of political, criminal, or economic interest. The project has received financial support from the German Federal Ministry for Education and Research (Bundesministerium für Bildung und Forschung, BMBF). The data sources are listed at https://www.opensanctions.org/datasets. RShigapov (talk) 15:48, 24 January 2022 (UTC)
Discussion
Hi! This is Friedrich, I'm the principal maintainer of OpenSanctions.org. Since I'm still somewhat new to WD (and the social processes around it), please bear with me. The weakness (to me) of adding OpenSanctions IDs to WD is that we're at the same time trying to converge on QIDs for entities where available - so the "OpenSanctions ID" would only ever make sense in scenarios where our database is incomplete/unreconciled. Maybe the better thing for us (at OpenSanctions) to do would be to add a form on each entity that's not identified with a QID yet to let people submit whatever WD item they think this entity corresponds with - so that we can then re-write the ID on OpenSanctions to match. --OpenSanctions (talk) 14:26, 26 January 2022 (UTC)
- I like the idea with a form! RShigapov (talk) 08:25, 27 January 2022 (UTC)
- I also like that idea. But @OpenSanctions: there are 2 reasons to also have this external-ID: 1. Do you believe all your entities should be created in WD (I do), and will be created in a short period of time (I don't)? 2. Even if you use WD for ALL of your entities, this external-ID will say which WD entities are in OpenSanctions, and provide a link from WD to OpenSanctions. We have a precedent on WD: Altmetric DOI (P5530) is a subset of DOI, but was created as a separate property (despite the fact that "DOI" already has over 10 formatters to various external sites), to indicate which papers have altmetrics, and because the altmetrics is a new distinct piece of info, so somehow "more valuable" than the other alternative formatters
- You've done an amazingly good job with OpenSanctions, and your reuse of external identifiers and your efforts to link to WD are much appreciated. But a link from WD to OpenSanctions is also valuable, and will increase the positive exposure of OpenSanctions significantly. WD has maybe 6-7k external-ID properties (links to external datasets), so is the world's most significant coreferencing hub --Vladimir Alexiev (talk) 11:28, 14 April 2022 (UTC)
- If you agree with these arguments (and my "support" arguments below), please vote "support" --Vladimir Alexiev (talk) 11:28, 14 April 2022 (UTC)
As an alternative, what I would like to suggest is an "OpenSanctions Dataset" property. This could be used to show which datasets in the OpenSanctions corpus a particular Wikidata item is part of (e.g. `us_ofac`, `ch_seco_sanctions`, and `sanctions`). One item (e.g. "Saddam Hussein") could be part of many OpenSanctions datasets. We could upload such claims as part of our ETL pipeline in the future for all OpenSanctions entities using Wikidata QIDs. It would allow users to see a) that there's additional details about an entity on OpenSanctions and b) that they are -- according to us -- sanctioned or in some other way a person/entity of interest. --OpenSanctions (talk) 14:26, 26 January 2022 (UTC)
- In principle, that can be modelled without extra property. The items representing the OpenSanctions Datasets can be created (or may be they already exist). Then to any entity from a sanctions list you could add a statement with part of (P361) and those items. You could add a reference to OpenSanctions additionally. RShigapov (talk) 08:38, 27 January 2022 (UTC)
- @OpenSanctions: I don't think "OpenSanctions Dataset" is an alternative to "OpenSanctions ID". If you want, make a separate proposal for "OpenSanctions Dataset", but I think I agree with Shigapov that it's not necessary --Vladimir Alexiev (talk) 11:28, 14 April 2022 (UTC)
Vladimir Alexiev (talk) 11:28, 14 April 2022 (UTC):
OpenSanctions is a truly excellent resource:
- Incorporates 44 datasets, including
- "US OFAC Consolidated (non-SDN)" and "US OFAC Specially Designated Nationals (SDN)". Note: Wikidata:Property proposal/OFAC sanction ID has been rejected because of defects in its description. So let's approve this proposal quickly
- Wikidata Entities of Interest and 5-6 other WD datasets
- Tracks a variety of identifiers, from passport numbers to IBANs.
- Reuses WD IDs whenever available
- Offers download in 4 formats (eg see EveryPolitician):
- just names
- simple CSV
- FollowTheMoney JSON (application/json+ftm)
- enriched "Targets as nested" JSON (application/json)
- Has provenance info for every statement.
- Has excellent search, eg
- Slobodan+Milošević finds not just him, but also 6 members of his family
- "Bi Sidi Souleymane" finds "Bi Sidi SOULEMAN"
- Has OpenRefine reconciliation API
I ran some counts on the biggest collection, https://opensanctions.org/datasets/default/, using its CSV export (https://data.opensanctions.org/datasets/latest/default/targets.simple.csv):
$ wc -l targets.simple.csv
196347
$ grep -cP '\bQ\d+\b' targets.simple.csv
112513
$ grep -cP '^"Q\d+"' openSanctions-default.csv
112503
- 57% of its entities (about 109,024 of 190,258) have a WD ID.
- Almost all of them use WD as their main identifier
- 81,234 or 43% don't have a WD identifier
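The grep counts above can also be reproduced with a CSV parser, which avoids matching a QID that merely appears inside some other field. This is a minimal sketch, not the commands actually run: the column name `id` is an assumption about the CSV layout, `count_qids` is a hypothetical helper, and the sample rows are made up for illustration from IDs quoted in this proposal.

```python
import csv
import io
import re

# A Wikidata item ID: "Q" followed by digits, and nothing else.
QID = re.compile(r"Q\d+")


def count_qids(csv_text: str, id_column: str = "id") -> tuple[int, int]:
    """Return (total rows, rows whose main ID is a Wikidata QID).

    The column name "id" is an assumption about targets.simple.csv.
    """
    total = with_qid = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        if QID.fullmatch(row[id_column]):
            with_qid += 1
    return total, with_qid


# Illustrative rows only; the schema values are placeholders.
SAMPLE = '''"id","schema"
"Q471862","Person"
"ch-seco-4058","Company"
"NK-knteb2sJu9NwaJ79aK6D9T","Person"
'''

total, with_qid = count_qids(SAMPLE)
print(f"{with_qid} of {total} rows use a QID as main ID ({with_qid / total:.0%})")
```

For the real file, the same function could be fed the contents of `targets.simple.csv` read from disk.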
Out of 204,469 total targets in OpenSanctions, this collection has 190,258 or 93%. 2 datasets are missing from this collection (https://github.com/opensanctions/opensanctions/issues/199 asks for an "all" pseudo-collection):
- Russian terrorists & extremists list. "it's nutty": @OpenSanctions: does that mean useless?
- INTERPOL missing children (Yellow Notices): ok, maybe this doesn't belong in WD (for privacy reasons)
Voting
- Support This will be immensely helpful for entity/record linking
- Strong Support. Let's help Ukraine! --Vladimir Alexiev (talk) 11:28, 14 April 2022 (UTC)
- Support This project is useful and coreferencing it via Wikidata can only make it more useful. --Nikola Tulechki (talk) 11:45, 14 April 2022 (UTC)
- Support This is useful --Borko1990 (talk) 12:20, 14 April 2022 (UTC)
- Support - Salgo60 (talk) 12:30, 14 April 2022 (UTC)
- Support So9q (talk) 13:00, 14 April 2022 (UTC)
- Support LydiaPintscher (talk) 13:46, 14 April 2022 (UTC)
- @RShigapov, Vladimir Alexiev, So9q, OpenSanctions, Nikola Tulechki, Borko1990: Done: OpenSanctions ID (P10632) —MasterRus21thCentury (talk) 06:45, 15 April 2022 (UTC)
@RShigapov, MasterRus21thCentury, So9q, OpenSanctions, Nikola Tulechki, Borko1990: I'm currently importing 112504 OS ids that are the same as WD id: https://quickstatements.toolforge.org/#/batch/82849
- There are about 15% errors, some are due to WD capacity limits, others not sure why. Eg https://www.wikidata.org/wiki/Q1000800 shows as an error, but in fact the OpenSanctions ID was recorded
- Question: Although there is a Formatter URL, the IDs don't come out as links. Does anyone know why? Eg https://www.opensanctions.org/entities/Q100983102/ is a link, but 10 other OS IDs on WD that I checked don't come out as links
- Details in https://github.com/opensanctions/opensanctions/issues/198
- I'll also make a WD Mix-n-Match catalog with the remaining entries. Breakdown per type:
csvtk freq -f schema openSanctions-no-WD.csv
schema,frequency
Airplane,269
Vessel,415
Company,2455
CryptoWallet,7457
Organization,4335
LegalEntity,4581
Person,57754
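For the import of IDs that coincide with the WD ID, the batch commands themselves are not shown in this discussion. The sketch below is one plausible way to generate them, assuming the QuickStatements V1 tab-separated format (item, property, quoted string value) and the newly created property P10632. The function name `quickstatements_v1` is hypothetical.

```python
import re

# A Wikidata item ID: "Q" followed by digits, and nothing else.
QID = re.compile(r"Q\d+")

# Property created as a result of this proposal.
PROPERTY = "P10632"


def quickstatements_v1(os_ids):
    """Build V1 commands for OpenSanctions IDs that are themselves QIDs.

    Each command is: item <TAB> property <TAB> "string value".
    IDs that are not QIDs are skipped; they need manual matching instead.
    """
    lines = []
    for os_id in os_ids:
        if QID.fullmatch(os_id):
            lines.append(f'{os_id}\t{PROPERTY}\t"{os_id}"')
    return "\n".join(lines)


# IDs copied from the proposal's examples; only the QID yields a command.
print(quickstatements_v1(["Q471862", "ch-seco-4058", "eu-cor-2030915"]))
```

The skipped, non-QID entries correspond to the remainder that would go into a Mix-n-Match catalog.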