
Previous discussion was archived at User talk:Magnus Manske/Archive 9 on 2015-08-10.

Uploading MnM catalog of huge size

Solidest (talkcontribs)

Hi. I spent several days preparing a complete and detailed list of Discogs master ID (P1954) IDs for Mix'n'match. I ended up with a file of 2,042,316 elements, weighing 240 MB. Only after that did I find out that MnM has a limit of 1000 elements at a time :) . Is there any way around this limitation, or could you please help me with the upload?


Wikispecies - Vernacular names>>Wikidata Property:P1843

Rosičák (talkcontribs)

Hello,

I noticed that you are importing (or have imported) by bot the individual national names from the Vernacular names section on Wikispecies into Wikidata under (P1843). This effort is great, but because of the rules and recommendations of the Wikispecies project, it leads to orthographically incorrect taxon names in some languages. In many languages, the correct spelling of taxon names has a lowercase letter at the start of the name. Could I ask for a bulk correction of the taxon names in Czech? They all begin with a lowercase letter.


QuickStatements is running only with limitations

Summary by CennoxX

Thanks, it works again.

CennoxX (talkcontribs)
Antonsusi (talkcontribs)

Can't add P10700 to catalog 5115

Ayack (talkcontribs)
Gerwoman (talkcontribs)

I did it without a problem. Perhaps it was a transient error.

Ayack (talkcontribs)
Vladimir Alexiev (talkcontribs)
Epìdosis (talkcontribs)
Vladimir Alexiev (talkcontribs)

@Epìdosis:

How can we tell coherent from not coherent?

It seems to me that both of your examples, 1002087146 and A2087146, would map to the same thing: 1002087146.

Epìdosis (talkcontribs)

Surely 1002087146 and A2087146 both map to the same thing, but only A2087146 fits the current formatter URL of the property.

Epìdosis (talkcontribs)

I think that we can tell coherent from not coherent through the formatter URL https://doi.org/10.1093/gmo/9781561592630.article.$1:

The regex effectively allows A2087146 but not 1002087146.
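One way to make the "coherent or not" test concrete is to check a candidate ID against a pattern before substituting it into the formatter URL. This is only a minimal sketch: the pattern `[A-Z]\d+` (one capital letter followed by digits) is an assumption chosen to fit the two examples in this thread, not the property's actual format constraint, and the function names are hypothetical.

```python
import re

# Formatter URL quoted in this thread; "$1" is the placeholder for the ID.
FORMATTER_URL = "https://doi.org/10.1093/gmo/9781561592630.article.$1"

# ASSUMPTION: illustrative pattern only, not the property's real regex.
ASSUMED_ID_PATTERN = re.compile(r"[A-Z]\d+")


def is_coherent(identifier: str) -> bool:
    """Return True if the whole identifier matches the assumed pattern."""
    return ASSUMED_ID_PATTERN.fullmatch(identifier) is not None


def build_url(identifier: str) -> str:
    """Substitute a coherent identifier into the formatter URL."""
    if not is_coherent(identifier):
        raise ValueError(f"identifier does not fit the assumed format: {identifier}")
    return FORMATTER_URL.replace("$1", identifier)
```

With this assumed pattern, `is_coherent("A2087146")` is true and `is_coherent("1002087146")` is false, which mirrors the distinction described above.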

Effectively I think that:

Gerwoman (talkcontribs)
Epìdosis (talkcontribs)
Gerwoman (talkcontribs)
Epìdosis (talkcontribs)
Vladimir Alexiev (talkcontribs)

Is OMO a superset of GMO? Or are the two partially overlapping?

Epìdosis (talkcontribs)
Gerwoman (talkcontribs)
Vladimir Alexiev (talkcontribs)
Epìdosis (talkcontribs)

Looks good to me too; I think we can sync 5173 and delete 3802.

Epìdosis (talkcontribs)

Upload of a large catalog a Subset of Tax Exempt Organizations from IRS dump mapped to P1297

Wolfgang8741 (talkcontribs)

I'm not aware of any place where one might collect all EINs, but I post-processed the US IRS Exempt Organizations Business Master File Extract, which could provide a catalog of a subset of IRS Employer Identification Number (P1297) values covering current and past tax-exempt organizations in the USA.


Uncompressed, the CSV is 372.1 MB; compressed, it is 56.0 MB. Neither would upload through the web interface. The id column holds the identifier value for P1297.


File format of the CSV, for comment:

id,name,description,url,P6733

000019818,PALMER SECOND BAPTIST CHURCH - 3514,"A tax exempt Association registered in PALMER, Massachusetts - NTEE type record was empty: ",https://apps.irs.gov/app/eos/allSearch,

000029215,ST GEORGE CATHEDRAL,"A tax exempt Corporation registered in SOUTH BOSTON, Massachusetts - NTEE type record was empty: ",https://apps.irs.gov/app/eos/allSearch,

000587764,IGLESIA BETHESDA INC,"A tax exempt Corporation registered in LOWELL, Massachusetts with NTEE primary exempt activity type: Protestant",https://apps.irs.gov/app/eos/allSearch,X21

000635913,MINISTERIO APOSTOLICO JESUCRISTO ES EL SENOR INC,"A tax exempt Corporation registered in LAWRENCE, Massachusetts with NTEE primary exempt activity type: Protestant",https://apps.irs.gov/app/eos/allSearch,X21

000765634,MERCY CHAPEL INTERNATIONAL,"A tax exempt Corporation registered in MATTAPAN, Massachusetts with NTEE primary exempt activity type: Christian",https://apps.irs.gov/app/eos/allSearch,X20

000841363,AGAPE HOUSE OF PRAYER,"A tax exempt Corporation registered in MATTAPAN, Massachusetts with NTEE primary exempt activity type: Christian",https://apps.irs.gov/app/eos/allSearch,X20

000852649,BETHANY PRESBYTERIAN CHURCH,"A tax exempt Corporation registered in BROOKLINE, Massachusetts with NTEE primary exempt activity type: Christian",https://apps.irs.gov/app/eos/allSearch,X20

001028397,UNITY IN THE CITY,"A tax exempt Association registered in BROOKLINE, Massachusetts - NTEE type record was empty: ",https://apps.irs.gov/app/eos/allSearch,

001347537,RESTORATION OF HOPE CHURCH MINISTRIES,"A tax exempt Corporation registered in ROXBURY, Massachusetts with NTEE primary exempt activity type: Christian",https://apps.irs.gov/app/eos/allSearch,X20


Vladimir Alexiev (talkcontribs)

Please use OpenRefine to match these to US organizations. The first few records are churches and include "city, state", which should help with matching.

Wolfgang8741 (talkcontribs)

@Vladimir Alexiev: Are you suggesting that OpenRefine just match on the EIN, and that QuickStatements then be used to import the rest? I'm not sure how large the intersection of US organizations with an associated EIN is, and the collaborative entity resolution of Mix'n'match would avoid importing a significant number of duplicates.

Vladimir Alexiev (talkcontribs)

@Wolfgang8741: I'm suggesting using OR to match these churches to existing WD items by name. Before that, match the State and City, and then use these fields while matching the main field (name).

This catalog as imported does not have a good chance for "collaborative entity resolution" because City and State are not split out as separate fields, and because MnM cannot do subsidiary reconciliation steps like OR can.
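As an illustration of splitting City and State out before reconciliation, here is a minimal Python sketch. It assumes every description follows the wording seen in the sample rows above ("... registered in CITY, State - NTEE ..." or "... registered in CITY, State with NTEE ..."); the function names and the new `city` and `state` column names are hypothetical, and rows that do not match are left with empty values rather than guessed.

```python
import csv
import io
import re

# ASSUMPTION: descriptions follow the wording of the sample rows in this thread.
LOCATION_RE = re.compile(
    r"registered in (?P<city>.+?), (?P<state>.+?) (?:-|with) NTEE"
)


def split_location(description: str) -> tuple[str, str]:
    """Return (city, state) parsed from a description, or ("", "") if absent."""
    match = LOCATION_RE.search(description)
    if match is None:
        return "", ""
    return match.group("city"), match.group("state")


def add_location_columns(csv_text: str) -> list[dict[str, str]]:
    """Read the catalog CSV and add hypothetical 'city' and 'state' columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["city"], row["state"] = split_location(row["description"])
        rows.append(row)
    return rows
```

The resulting rows could be written back out with `csv.DictWriter` and loaded into OpenRefine, where the separate columns can be used as additional matching fields. IDs are kept as strings, so leading zeros in the EIN are preserved.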


Reasonator dates and external links not translated

Back ache (talkcontribs)

In Reasonator, a couple of things are not being translated and stay in English: the date names and the descriptions of the links in the external links box.

Back ache (talk) 15:15, 22 March 2022 (UTC)


Update Catalog 3092 for P2190 to numeric ID from string format

Wolfgang8741 (talkcontribs)

The discussion on P2190 suggests moving from the string format to the numeric format for C-SPAN person IDs (see Property talk:P2190). This catalog currently uses the string IDs and will add unwanted strings to Wikidata. These are not reliable for links because redirects are not being added, although a query to the URL may return the numeric ID in the response URL if the string still resolves correctly. The templates have been notified, numeric pairs have been uploaded to Wikidata, and a bot request has been made to root out the string IDs on English Wikipedia. I tried to deactivate this catalog to prevent further string inserts, but the save doesn't seem to take effect.


Catalog 817 uses incorrect url for items and larger considerations

Wolfgang8741 (talkcontribs)

A catalog created by Magnus Manske, https://mix-n-match.toolforge.org/#/catalog/817, either needs to be retired or needs its item URLs updated to the format now used by the Smithsonian's website. All links return 404. Also, works have a Linked Open Data URL of the format https://edan.si.edu/saam/id/object/1979.150, which uses a different ID than the one in the broken URLs. The property may need discussion too: current link examples lead to https://americanart.si.edu/artwork/portrait-girl-21713, while the corresponding linked open data focuses on https://edan.si.edu/saam/id/object/1967.6.13, as highlighted at https://americanart.si.edu/about/lod
