Wikidata:Requests for permissions/Bot/ConferenceCorpusBot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Not done, no progress for a while. Feel free to re-open this if work resumes. Thanks. Mike Peel (talk) 18:41, 24 September 2022 (UTC)
ConferenceCorpusBot
ConferenceCorpusBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: WolfgangFahl (talk • contribs • logs)
Task/s: Import scientific events and event series from diverse sources, e.g. dblp
Code: https://github.com/WolfgangFahl/ConferenceCorpus
Function details: The intended functionality builds on the preparatory work done by Simon Cobb with OpenRefine, based on the discussions in:
https://github.com/SmartDataAnalytics/OpenResearch/discussions/127
So far some 4000 event series have been imported this way and linked to dblp, Microsoft Academic Knowledge Graph, GND and so on.
See https://confident.dbis.rwth-aachen.de/dblpconf/wikidata for a list of event series available in wikidata.
The coverage of scientific events is unfortunately less complete, and this bot is intended to remedy the situation. In the end there should be proper scientific event/proceedings pairs instead of the many proceedings items that exist but are not linked to the corresponding conference. See https://github.com/SmartDataAnalytics/OpenResearch/discussions/162 for an overview, or https://diagrams.bitplan.com/render/svg/0x589f1ec0.svg
--WolfgangFahl (talk) 15:42, 6 November 2021 (UTC)
- What do you want to import? Conferences, individual editions of conferences, or articles published there? --GZWDer (talk) 15:57, 6 November 2021 (UTC)
- This request feels a little vague. Are you creating new items? Modifying existing items? Can we see a sample? BrokenSegue (talk) 16:44, 6 November 2021 (UTC)
Examples
Today I imported some events and their corresponding proceedings with OpenRefine: see https://scholia.toolforge.org/event-series/Q64852380 and my contributions page https://www.wikidata.org/wiki/Special:Contributions/WolfgangFahl. The sources are https://confident.dbis.rwth-aachen.de/or/index.php?title=K-CAP, where you'll find pointers to e.g. https://dblp.org/db/conf/kcap/index.html. We intend to create a bot that does similar work without OpenRefine and with better quality, by matching more thoroughly against different sources/references.
See also the Proceedings Title Parser, a matching source we used until recently that matches only via acronyms. The corresponding event series can be found at https://confident.dbis.rwth-aachen.de/or/index.php?title=TPDL and https://scholia.toolforge.org/event-series/Q5412433
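To illustrate why matching only via acronyms is weaker than matching on a fuller signature, here is a minimal Python sketch. It is not taken from the ConferenceCorpus code; the normalization rule, the function names and the sample records are hypothetical. An acronym-only key collapses every edition of a series onto one key, while adding the year separates the individual editions.

```python
import re
from collections import defaultdict


def normalize_acronym(acronym: str) -> str:
    """Hypothetical normalization: uppercase and drop non-alphanumerics,
    so 'K-CAP' and 'KCAP' compare equal."""
    return re.sub(r"[^A-Z0-9]", "", acronym.upper())


def acronym_key(record: dict) -> str:
    # Acronym-only key: all editions of a series share the same key.
    return normalize_acronym(record["acronym"])


def signature_key(record: dict) -> tuple:
    # Acronym plus year: distinguishes individual editions of a series.
    return (normalize_acronym(record["acronym"]), record["year"])


def group_by(records, key):
    """Group records from different sources by a matching key."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


# Illustrative input records from two sources (made-up sample data).
records = [
    {"source": "dblp", "acronym": "K-CAP", "year": 2019},
    {"source": "wikidata", "acronym": "KCAP", "year": 2019},
    {"source": "dblp", "acronym": "K-CAP", "year": 2021},
]
```

With these records, `group_by(records, acronym_key)` yields a single group containing all three entries, whereas `group_by(records, signature_key)` yields two groups, one per edition. A real matcher would presumably also compare title, location and dates.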
The goal is to complete the event/proceedings pairs and link them via P4745 (is proceedings from).
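As a sketch of what such a link amounts to, the Python helper below emits the statement in QuickStatements V1 syntax (tab-separated subject, property, value). This is only an illustration: the thread says the imports so far use OpenRefine, not QuickStatements, and the helper name is hypothetical. P4745 is stated on the proceedings item, with the event item as its value.

```python
import re

# A Wikidata item id: "Q" followed by a positive integer without leading zero.
QID = re.compile(r"Q[1-9][0-9]*")


def proceedings_from_command(proceedings_qid: str, event_qid: str) -> str:
    """Hypothetical helper: build a QuickStatements V1 command stating that
    the proceedings item 'is proceedings from' (P4745) the event item."""
    for qid in (proceedings_qid, event_qid):
        if not QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    # V1 syntax: subject, property and value separated by tab characters.
    return "\t".join((proceedings_qid, "P4745", event_qid))
```

For placeholder ids, `proceedings_from_command("Q1", "Q2")` returns `"Q1\tP4745\tQ2"`; malformed ids are rejected before any command is produced.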
New items will be created where missing, and existing items will be amended if, e.g., library references to k10plus, GND or dblp are missing. A core set of information will be used as an "event signature":
- acronym
- year
- title
- location
- country
- starttime
- endtime
- homepage
These fields are mapped to the appropriate Wikidata properties as shown in the examples. WolfgangFahl (talk) 15:59, 9 November 2021 (UTC)
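A minimal Python sketch of such an event signature and its mapping to Wikidata properties follows. The property ids are common Wikidata properties chosen as an assumption (short name P1813, title P1476, location P276, country P17, start time P580, end time P582, official website P856); the thread does not list the mapping the bot actually uses. The year gets no property here and is only checked against the start time. Values are kept as plain strings; real statements would need datatype-specific values (monolingual text, time, item ids).

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Assumed mapping from signature fields to Wikidata property ids
# (not taken from the ConferenceCorpus code).
PROPERTY_MAP = {
    "acronym": "P1813",    # short name
    "title": "P1476",      # title
    "location": "P276",    # location (item id of the venue city)
    "country": "P17",      # country (item id)
    "starttime": "P580",   # start time
    "endtime": "P582",     # end time
    "homepage": "P856",    # official website
}


@dataclass
class EventSignature:
    acronym: str
    year: int
    title: Optional[str] = None
    location: Optional[str] = None
    country: Optional[str] = None
    starttime: Optional[str] = None  # ISO date string, e.g. "YYYY-MM-DD"
    endtime: Optional[str] = None
    homepage: Optional[str] = None

    def to_claims(self) -> dict:
        """Map the filled-in fields to {property_id: value}; 'year' has no
        entry in PROPERTY_MAP and is therefore skipped."""
        return {
            PROPERTY_MAP[field]: value
            for field, value in asdict(self).items()
            if field in PROPERTY_MAP and value is not None
        }

    def year_matches_start(self) -> bool:
        """Consistency check: the year should agree with the start date."""
        return self.starttime is None or self.starttime[:4] == str(self.year)
```

Only fields that are actually known produce claims, so a partial signature (say, acronym, year and homepage from one source) still maps cleanly and can be completed from other sources later.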
- OK. It sounds like the bot isn't implemented yet? Are all the details ironed out? Is the source available anywhere? Do you know how many edits you'll be making? By the way, you should actually register the bot user account User:ConferenceCorpusBot so nobody can steal it (and probably make future test edits under that account). Generally this seems like a good bot, though. BrokenSegue (talk) 23:24, 9 November 2021 (UTC)
@WolfgangFahl, BrokenSegue: This seems to be stale, is this still active? Perhaps @Ymblanter, Lymantria: could comment? Thanks. Mike Peel (talk) 22:17, 18 January 2022 (UTC)
- Please stay tuned; see e.g. WEBIST OpenResearch Series versus WEBIST Series in Wikidata (Scholia) versus WEBIST series information from different datasources versus WEBIST from different datasources (JSON). Currently the process is still semi-automatic, and we are using OpenRefine to populate Wikidata. We now need to implement the backend to achieve the bot functionality. WolfgangFahl (talk) 06:39, 19 January 2022 (UTC)
- Support, but I recognize that more documentation and community review would be helpful. I invite this tool and its community to Wikidata:WikiProject Events, where anyone can show examples, host discussion, and build out documentation. A data model for conferences is already documented there, so if the tool can follow that model, use this as a community hub for explaining the tool. I posted some guidance for next steps at Wikidata_talk:WikiProject_Events#Bot_for_importing_conference_data. We need to see and discuss some test data to proceed, and again, I suggest that the Events WikiProject is an appropriate place for central conversation. Let me know if I can assist with organizing conversation and review. Bluerasberry (talk) 18:04, 28 January 2022 (UTC)
- Yes, please. Today I started a semi-automatic import of https://scholia.toolforge.org/event-series/Q105692764, which is an interesting example since it is incomplete in dblp. WolfgangFahl (talk) 08:50, 20 April 2022 (UTC)
@Bluerasberry, WolfgangFahl: A couple dozen events have been imported using a semi-automated approach in the meantime. The bits and pieces of the infrastructure are now available. How do I proceed from here? --WolfgangFahl (talk) 06:42, 28 May 2022 (UTC)
@Bluerasberry, WolfgangFahl, Tholzheim: As part of https://github.com/ceurws/lod/issues/25, some 3200 events have been added based on http://ceur-ws.org. Since we don't know how to use or activate our bot account, we have used our personal accounts instead. How could we use the proposed ConferenceCorpusBot account instead? The software we are using is now mostly https://github.com/WolfgangFahl/pyCEURmake, which in turn uses https://pypi.org/project/pyGenericSpreadSheet/, which provides the necessary Wikidata integration in https://github.com/WolfgangFahl/PyGenericSpreadSheet/blob/main/spreadsheet/wikidata.py