Wikidata:Property proposal/ISOF place

ISOF Place

   Under discussion
Description: Place name register of Swedish places created by ISOF
Represents: Swedish Institute for Language and Folklore (Q7654721)
Data type: External identifier
Domain: socken (Q1523821)
Example 1: Söderala parish (Q10688474) → parish-id=176
Example 2: Gävleborg County (Q103699) → county-id=20
Example 3: Algutsrum Hundred (Q4724394) → district-id=154
Example 4: Sala Silver Mine (Q3314828) → place-names/1771147; also the same as Hembygdsportalen ID (P6192) 171278
Source: Webrest API PlaceNameService; see also Notebook
Planned use: Wikipedia articles, and connecting with other external identifiers
Number of IDs in source: 2 million objects
Expected completeness: as many as possible, at least as many as Swedish place name register SOFI (P5536)
Formatter URL: https://placename.isof.se/PlaceNamePublic/place-names?$1 (this project is in beta and not yet stable, so this will be updated)
See also: Swedish place name register SOFI (P5536), the old version
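
A minimal sketch of how the proposed formatter URL would expand the first three example values, assuming the usual substitution of the identifier for `$1`. The helper name `expand_formatter` is made up for illustration, and since the service is in beta the resulting URLs may not resolve. Example 4 (place-names/1771147) has a different shape and is left out, as the proposal does not say how it fits the formatter.

```python
# Formatter URL as given in the proposal; it is expected to change while in beta.
FORMATTER_URL = "https://placename.isof.se/PlaceNamePublic/place-names?$1"


def expand_formatter(formatter: str, identifier: str) -> str:
    """Hypothetical helper: put the identifier where the $1 placeholder is."""
    return formatter.replace("$1", identifier)


# Identifier values taken from Examples 1-3 of the proposal.
example_ids = ["parish-id=176", "county-id=20", "district-id=154"]
urls = [expand_formatter(FORMATTER_URL, i) for i in example_ids]

for url in urls:
    print(url)
```

Because the type prefix (parish-id, county-id, district-id) is stored in the value itself, one formatter URL can serve all three types, which is the point debated in the discussion.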

Motivation

A new application is being developed, and the IDs will change. The old application is covered by Swedish place name register SOFI (P5536), but since they will run both applications in parallel and are changing the structure, we propose creating a new property for the new application. Hopefully this new application can become an authority for Swedish places - Salgo60 (talk) 11:59, 11 November 2020 (UTC)

Discussion

  Comment I believe we should have a property for the new ISOF places; it looks like they can give a lot of background information. However, the IDs in the examples seem conflated with what a URL formatter should do, and the IDs in their system are not named like that at all. So this would be a new ID constructed by us rather than an official ISOF ID. In your notebook, however, there seems to be a so far unnamed column with a unique identifier. If that became something referenceable, I would like us to adjust the proposal and move forward. As it stands I would oppose, and instead suggest splitting this into separate properties for each of the types (parish, county, district, place) so that we could use their native IDs. Ainali (talk) 16:26, 11 November 2020 (UTC)

Ainali What we suggest above is a unique pattern. I will double-check that with the developer, but I feel that splitting the property adds no value. E.g. the property Uppsala University Alvin ID (P6821), see also T225522, has nearly the same pattern, which I feel makes sense.
I hope Alvin moves in a direction that also supports "things" like Alvin-taxon:nnnn, Alvin-parish, Alvin-museum.....
What I miss at ISOF is that they don't display this key in the user interface as Alvin does. My understanding is that ISOF has plans for an identifier field, but we have seen no specification for it. Changing the value is no major problem; we did that with Hembygdsportalen ID (P6192), which changed platform last year and made the bad decision to also change all "persistent" identifiers
- Salgo60 (talk) 20:34, 11 November 2020 (UTC)
Hmm. This is not how that property was presented when approved, and it has deviated from that approval without any discussion on the discussion page. Perhaps that one needs a cleanup as well? Ainali (talk) 20:56, 11 November 2020 (UTC)
@Ainali: do what you want... I feel the problem is the lack of dialogue WD <-> Alvin. Since 2019 I have been trying to get a meeting with the Alvin people, see T226099, and I also hope to get some commitments from them. I have met Per Cullhed twice at other events and spoken with wadskog 20200914 on the phone, but we haven't sat down and shared future plans and issues. I feel Alvin doesn't have a linked data vision and has no interest in connecting better with Wikidata/Wikipedia. I am also interested in getting them to better support typed species, see T236310 - Salgo60 (talk) 21:47, 13 November 2020 (UTC)
Another approach that we could move forward with immediately would be to switch this from the type external-id (since this is several different external IDs) to URL, which seems to be what this really is. Ainali (talk) 21:16, 11 November 2020 (UTC)
URLs are a bad pattern. I think we should try to follow the DRY pattern and have in the argument something connected to a "thing", and keep the things that can differ in the formatter URL... we have seen problems before with "moving" applications at SOFI that were never fixed, see T200979. Below I draft a checklist of what we (Wikidata) want from an external identifier - Salgo60 (talk) 21:47, 13 November 2020 (UTC)


One way to design a system to be a good external identifier for Wikidata

A first attempt at writing down a checklist of best practices; please add/change - Salgo60 (talk) 10:38, 14 November 2020 (UTC)

  1. have persistent unique IDs for container objects such as parishes, counties and places, so that Wikidata can state "same as" for them
    1. the "leaf", which in this case is the register card, should also have a persistent unique ID
    2. those container objects shall have landing pages, and the persistent unique ID should be visible to the UI user; compare Alvin Söderala
      1. all landing pages should be reachable with GET, i.e. you can address the page with a URL and don't need POST; we have that problem with the SCB Regina database, see T200700
  2. to be a good member and reach level 5 ("link your data to other data to provide context"), they should show "same as" links to external authorities. A small first step is "same as" with the Wikidata Q-number; other candidates are
    1. Swedish county code (P507)
    2. TORA ID (P4820)
    3. Swedish civil parish code/ATA code (P777)
    4. and nice to have is
      1. Hembygdsportalen ID (P6192)
      2. I hope museums will get better place identifiers. We are trying to connect to Gotlands museum (P7068) and Malmö Museer ID (P8773); what we see is that they have a place but don't say "same as", so we don't know which administrative level is meant. E.g. if they say "Söderala" we don't know whether it is Söderala (Q2673411), Söderala parish (Q10688474), Söderala church parish (Q10688470) or Söderala (Q21779139)
      3. identifiers for streets. My understanding is that en:Lantmäteriet has no persistent unique ID for objects like streets, see blog
  3. objects should have a version history and support merges by redirecting from the old item to the "new" item
    1. this should also be supported by the API; compare Wikidata's owl:sameAs and a query for merges
  4. they shall have a SPARQL endpoint and/or JSON access so that we can easily check differences between Wikidata and the external system, see e.g. SKBL, Nobel prize...
    1. good documentation of the API, e.g. using Swagger; see ISOF, JobTech, Nobelprize
  5. have timestamps for created and changed
  6. nice to have
    1. a change API like Wikidata
    2. support for a Query language like WDQS
    3. linking back to Wikipedia pages e.g. Litteraturbanken
  7. deleted items should be easy to find; compare the problems Europeana has with Wikidata items that get deleted
  8. support for more languages
    1. SKBL supports Swedish and English by changing the URL, e.g. Greta Garbo json sv en ---> we now support both with templates in en:Wikipedia and sv:Wikipedia --> 9 million visitors to those Wikipedia pages this year sv / en
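
Point 4 of the checklist asks for machine-readable access so that differences can be checked, and point 3 asks for redirects after merges. A rough sketch of such a check follows, assuming both sides have already been fetched into plain Python structures. The function name, the report keys and the sample identifiers are all hypothetical, and no real API is called.

```python
def compare_links(wikidata_links, source_ids, redirects):
    """Classify each Wikidata link against the external source.

    wikidata_links: Wikidata item -> external ID stored on that item
    source_ids:     set of IDs that currently exist in the external source
    redirects:      old external ID -> ID it was merged into
    """
    report = {"ok": [], "redirected": [], "missing_in_source": []}
    for item, ext_id in sorted(wikidata_links.items()):
        if ext_id in source_ids:
            report["ok"].append(item)
        elif ext_id in redirects:
            # The source merged this object; Wikidata should be updated.
            report["redirected"].append((item, ext_id, redirects[ext_id]))
        else:
            # Deleted or changed without a trace: the case the checklist warns about.
            report["missing_in_source"].append(item)
    # External objects that no Wikidata item links to yet.
    linked = set(wikidata_links.values())
    report["not_in_wikidata"] = sorted(source_ids - linked)
    return report


# Made-up sample data, for illustration only.
wikidata_links = {"item-A": "id-1", "item-B": "id-2", "item-C": "id-3"}
source_ids = {"id-1", "id-4", "id-5"}
redirects = {"id-2": "id-4"}

report = compare_links(wikidata_links, source_ids, redirects)
print(report)
```

Without a redirect table from the source, item-B would end up in the same bucket as item-C, and the link could only be repaired by hand.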

Examples of possibilities and problems we find

  1. Persistent
    1. Graves at www.svenskagravar.se. Quote from svenskagravar: "they are persistent IF we don't reload all graves". They have since reloaded more than once --> i.e. the IDs are not persistent
    2. A Swedish site containing local history material, "Sveriges Hembygdsförbund", upgraded to a modern platform and also "upgraded all ids" --> we needed to delete all linked items, see T248875, and start from scratch
    3. Europeana was an external identifier in P727 (P727), but the lesson learned was that it was not persistent, so they implemented a new approach and I created Europeana entity (P7704). As you see below, the lesson learned there was that the new approach has quality problems
  2. Quality: Europeana copied 160 000 items from dbpedia/Wikidata for artists BUT they haven't done the homework of connecting the right objects to the right artists; instead they used text strings and guessed --> bad quality, see T243764
  3. Error reporting: when connecting two domains you find problems/errors, e.g. Wikidata indicates many duplicates in the Uppsala University Alvin database, but we have no easy way to report errors, or they don't use Wikidata / Phabricator where we track issues; see list duplicates or Task T243764
  4. Uniqueness: the Swedish National Archives has NAD, i.e. IDs for archives. We have reported that they are not unique, and now we see some redesign using en:GUID to fix this, see also Task T200046
    1. disambiguation pages: if a namespace gets more items that can be described with the same name, create disambiguation pages. In the new design of NAD from the Swedish National Archives it looks like they skip this, which from a user perspective is a nightmare if you only have the old ID
  5. Lack of a helpdesk where we get a unique helpdesk ID when we ask a question / report an issue. We have this problem with the Swedish National Archives, SCB, ISOF, "Lantmäteriet" .... The Swedish "Naturvårdsverket" has unique numbers but no easy way to see the status, e.g. 2018NV38321
    1. workarounds
      1. be active on Wikipedia ==> then we can ping them and discuss issues and agree how we solve things and get feedback of errors in Wikipedia/Wikidata
      2. GitHub: Litteraturbanken and SKBL are active on GitHub and have issue trackers that we use
        1. Litteraturbanken spraakbanken/littb-frontend/issues
        2. SKBL spraakbanken/skbl-portal/issues
  6. Easy way to ask questions and see what questions others have asked. A good example is Libris; most other institutions don't have this
  7. Easy way to subscribe to an issue and get a notification when it's moved to production. We have this in Phabricator, used by Wikidata; also see change stream
  8. Data roundtrip: as we now support linked data on pictures, it is getting more and more important to have a data roundtrip approach, i.e. changes in Wikidata need to be tracked and taken care of in both systems so that we can keep both systems in sync. Today we try to fix that ad hoc, but it would be better if we agreed on a "framework"/"model". Examples of what we do today
    1. JSON and structured data
      1. Nobelprize.org
      2. Swedish female biographies
      3. The Swedish Literature Bank
    2. Webpages with no API: we web-scrape and compare with Wikidata
      1. Swedish National Archive SBL
      2. Graves Uppsala
      3. Swedish Academy
    3. WikiTree, a genealogy site with 180 000 connections to Wikidata via WikiTree person ID (P2949) and 22 million profiles
      1. they regularly check the quality of the family tree against > 250 rules, a number of which are checks against Wikidata; see Data doctors report
  9. GET/PUT: we need an easy way of linking using a URL. E.g. SCB Regina is designed so that a record can only be accessed using POST, which doesn't work with Wikipedia, see T200700
  10. Clean URLs that don't use redirects: in a perfect world everything is linked data and WEB 2.0, and data is presented as data. As a workaround to use the power of Wikidata, an Australian researcher has created d:Wikidata:Entity_Explosion --> we can get old platforms like SBL, LibrisXL... to use Wikidata for finding "same as". If we install this web browser extension, see video, we get the magic of Wikidata, and we also see the problems with e.g. Alvin, which has a redirect and a rather "noisy" URL
  11. Active agile product management and an easy way to discuss / get updates on changes (see video about agile product owner). In Wikidata we have
    1. Prioritized open backlog: everyone can register and ask questions / subscribe
    2. Weekly status updates Wikidata:Status_updates/2020_11_16 / all
    3. Telegram groups Wikidata and Wikidata Sweden.....
    4. Project chats Wikidata:Project_chat / Wikidata Swedish plus on all pages e.g. property Dictionary of Swedish National Biography ID (P3217) you have Property_talk:P3217
    5. A meeting every second year that is available online, e.g. Wikidata:WikidataCon_2019/Program; example featured talks 2017 / 2019
    6. We have more research oriented meetings like wikidataworkshop 2 nov 2020, key note
      1. Research papers about Wikidata
  12. Missing vision statements and sharing of future development: for example, we try to connect to the Europeana network and we see a lack of quality; it would be of great help if they shared the next step they will take. Without information it looks like they have given up. We have the same "challenge" with the Swedish Riksdagen, blog/video, where we have no understanding of the vision for classification, or of small things such as whether they will support who is the substitute for a position. Today we have heard that they are moving towards using Eurovoc, and we need to read documents to find who is the substitute for a specific position. Is that the vision?
    1. Public prioritized backlog: the best pattern for success is to have a prioritized backlog open for questions and subscription, see the usage of Phabricator for the Wikidata project - video about active product management
    2. Epics: share your epics, for example Wikidata's
      1. Improve Search Suggestions with NLP
      2. Growth: Newcomer tasks 1.0
      3. Better support for References in Content Translation
      4. Structured data backlog
      5. Feedback processes and tools for data-providers
  13. good tools for measuring uptime of the service and its usage; compare the Wikidata Grafana Dashboard
    1. tools for measuring Wiki pageviews, e.g. article Greta Garbo, sv:Wikipedia articles linking Svenskt kvinnobiografiskt lexikon, same for en:Wikipedia
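
The uniqueness problem in point 4 can be tested mechanically once the records are available as (identifier, object) pairs. A small sketch follows, assuming such pairs have already been extracted. The function name and the sample records are hypothetical and do not come from NAD or any other real source.

```python
from collections import defaultdict


def find_non_unique(records):
    """Return identifiers that are attached to more than one distinct object."""
    objects_by_id = defaultdict(set)
    for identifier, obj in records:
        objects_by_id[identifier].add(obj)
    # An exact repeat of the same pair is harmless; only differing objects count.
    return {
        identifier: sorted(objs)
        for identifier, objs in objects_by_id.items()
        if len(objs) > 1
    }


# Made-up sample records, for illustration only.
records = [
    ("id-1", "archive A"),
    ("id-2", "archive B"),
    ("id-1", "archive C"),  # same ID, different object: not unique
    ("id-2", "archive B"),  # exact repeat: ignored
]

clashes = find_non_unique(records)
print(clashes)
```

Each clash is a candidate for the kind of disambiguation page described under point 4.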