Wikidata:Property proposal/Semantic Scholar paper ID

Semantic Scholar paper ID

Originally proposed at Wikidata:Property proposal/Creative work

Description: identifier for an article in the Semantic Scholar database
Represents: Semantic Scholar (Q22908627)
Data type: External identifier
Domain: work (Q386724)
Allowed values: \w+
Example: The Semantic Web Revisited (Q29037447) -> 5acd1dd3da5752e1de4c5b46f75b7aec2bc50503
Formatter URL: https://www.semanticscholar.org/paper/$1
See also
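
To illustrate how the allowed-values pattern and the formatter URL in this proposal fit together, here is a minimal Python sketch. The function name `paper_url` is made up for this example and is not part of any Wikidata or Semantic Scholar tooling. The sketch assumes the pattern must match the whole value and that `$1` is replaced by the identifier.

```python
import re

# Both values are copied from the proposal above.
ALLOWED_VALUES = re.compile(r"\w+")
FORMATTER_URL = "https://www.semanticscholar.org/paper/$1"


def paper_url(paper_id: str) -> str:
    """Expand the formatter URL for an ID (hypothetical helper)."""
    # Assumption: the whole value has to satisfy the pattern, hence fullmatch.
    if not ALLOWED_VALUES.fullmatch(paper_id):
        raise ValueError(f"not a valid paper ID: {paper_id!r}")
    return FORMATTER_URL.replace("$1", paper_id)


# The example ID given for The Semantic Web Revisited (Q29037447).
print(paper_url("5acd1dd3da5752e1de4c5b46f75b7aec2bc50503"))
```

Note that \w+ is quite permissive: it also accepts underscores and strings of any length, not only hex hashes like the one in the example.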
Motivation

Semantic Scholar (Q22908627) is a nice paper archive with great statistics and recommendations/links. Most papers have open-access full text (it draws on arXiv and CiteSeerX). It's still much smaller than Google Scholar (e.g. a search for "semantic" finds 140k results on Semantic Scholar and 3.9M on Google Scholar), but I think it's increasing in importance. I'll talk to them about getting API access for en:Wikipedia:OABOT, as discussed with @Pintoch:. It is NOT limited to semantic web or computer science only. #WikiCite2017 Vladimir Alexiev (talk) 13:33, 25 May 2017 (UTC)

#WikiCite :-) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:47, 28 May 2017 (UTC)
Discussion

SemanticScholar Corpus

I contacted them; let's see what they answer. My message: "Is there an API we can use to find Open Access sources for papers? After the recent WikiCite conference, we created props for Semantic Scholar on Wikidata, eg see https://www.wikidata.org/wiki/Wikidata:Property_proposal/Semantic_Scholar_paper_ID. We'd use this API to feed https://en.wikipedia.org/wiki/Wikipedia:OABOT. Thanks in advance!" --Vladimir Alexiev (talk) 08:16, 6 June 2017 (UTC)

The corpus is described as "Over 7 million published research papers in Computer Science and Neuroscience". So it's a great basis for matching papers against WP/WD by title and author names.

The format is rather simple and uses the IDs we already have:

{
  "id": "060e50b8752fdd799201fd9570e0bb668f017402",
  "title": "A review of Web searching studies and a framework for future research",
  "paperAbstract": "Research on Web searching is at an incipient stage. ...",
  "keyPhrases": [
    "OPAC",
    "..."
  ],
  "authors": [
    {
      "ids": [
        "7981846"
      ],
      "name": "Bernard J. Jansen"
    },
    "..."
  ],
  "inCitations": [
    "81027fc698ca6f49f506c3d5cf679178f3c74df1",
    "..."
  ],
  "outCitations": [
    "3811f1176f27b4030bda7b6e431e6ce45cb89996",
    "2b0a8ac61e63a6c4dca5290b93b7622976a6b273",
    "..."
  ],
  "year": 2001,
  "s2Url": "http://semanticscholar.org/paper/060e50b8752fdd799201fd9570e0bb668f017402",
  "venue": "Seattle Tech Conf"
}
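
A rough Python sketch of how a record like the one above could be matched by title and author names. The helpers `normalise` and `match_key` are made up for this illustration; nothing here is an existing OAbot, dissemin or Semantic Scholar function. Using the last name token as a surname is a simplifying assumption, and the "..." placeholders in the sample are skipped.

```python
import json
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", " ", no_accents.lower()).split())


def match_key(record: dict) -> tuple:
    """Build a (title, sorted surnames) key from one corpus record."""
    surnames = []
    for author in record.get("authors", []):
        if not isinstance(author, dict):
            continue  # skip "..." placeholders such as those in the sample
        tokens = normalise(author.get("name", "")).split()
        if tokens:
            # Assumption: the last token of the name is the surname.
            surnames.append(tokens[-1])
    return normalise(record.get("title", "")), tuple(sorted(surnames))


# Trimmed copy of the sample record above.
record = json.loads("""
{
  "id": "060e50b8752fdd799201fd9570e0bb668f017402",
  "title": "A review of Web searching studies and a framework for future research",
  "authors": [{"ids": ["7981846"], "name": "Bernard J. Jansen"}, "..."],
  "year": 2001
}
""")

# A made-up WP/WD-side candidate with different capitalisation and name form.
candidate = {
    "title": "A Review of Web Searching Studies and a Framework for Future Research.",
    "authors": [{"name": "B. J. Jansen"}],
}

print(match_key(record) == match_key(candidate))
```

Exact key equality is deliberately strict; real matching would probably need fuzzy comparison and the publication year as an extra check.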
Hi Vladimir Alexiev, thanks a lot for looking into this! If I remember correctly, Semantic Scholar uses only PDFs crawled by CiteSeerX (which is already covered by OAbot through BASE). Could you ask your contact at Semantic Scholar if that is still the case? Otherwise, if these new PDFs do not overlap with any other existing source in the bot, it should be possible to import the dump into dissemin's backend, but I would rather avoid that (since this is a static dataset that will not be updated). − Pintoch (talk) 12:59, 14 June 2017 (UTC)
Hi Pintoch, I don't think SemScholar is limited to CiteSeerX. E.g. compare Mario Lipinski on CiteSeerX (1 paper) with the same name on SemScholar (20 papers). SemScholar seems to have merged two authors with the same name, but even so it has 6-7 papers (on computer science and extraction from PDF) by the one who is present on CiteSeerX. --Vladimir Alexiev (talk) 13:50, 14 June 2017 (UTC)

SemanticScholar Citeomatic

Citeomatic: http://labs.semanticscholar.org/citeomatic

I tried this tool on a recent paper (http://vladimiralexiev.github.io/pubs/Tagarev2017-DomainSpecificGazetteer.pdf) and the results are impressive: http://labs.semanticscholar.org/citeomatic/url/56780d97eac3744403ddaf551dcad872811692d0.

  • It correctly parsed the title, the abstract and most of the authors (but parsed "Toloşi" as "Tolo¸sitolo¸si").
  • I don't know where it got "2014" for the year.
  • Most importantly: "We found 49 new citations and 1 that you have already cited...". You can explore/read the papers right there, and export them one by one for your bibliography.
  • So that's a great new way of adding citations to papers you've never read, thus making your paper a lot more scholarly ;-)
  • That was just a light joke: the value of this tool for exploring areas of science is huge! --Vladimir Alexiev (talk) 12:28, 14 June 2017 (UTC)