Wikidata talk:ScienceSource project/Focus list


Tracking

Here is the basic SPARQL query for tracking the focus list. Right now there are just over 600 pages on it. Charles Matthews (talk) 08:11, 11 July 2018 (UTC)

#Tracking query for the ScienceSource focus list (WD:SSFL on Wikidata)
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P5008 wd:Q55439927 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
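To track the size of the list without opening the query editor, the same triple pattern can be wrapped in a COUNT and sent to the query service. The following is a minimal sketch in Python, assuming the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The COUNT variant of the query and the helper names `build_wdqs_url` and `read_count` are illustrative assumptions, not part of the original post.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed target for this sketch).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# COUNT variant of the tracking query; this rewrite is an assumption,
# the original query lists the items instead of counting them.
COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?item) AS ?count)
WHERE {
  ?item wdt:P5008 wd:Q55439927 .
}
"""


def build_wdqs_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query.strip(), "format": "json"})


def read_count(result: dict) -> int:
    """Read ?count from a parsed SPARQL JSON result with a single row."""
    return int(result["results"]["bindings"][0]["count"]["value"])
```

The URL from `build_wdqs_url(COUNT_QUERY)` can be fetched with any HTTP client, and the decoded JSON body passed to `read_count`.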

Workflow from hackathon

See Phabricator task T199329 and https://etherpad.wikimedia.org/p/Wikimania2018-Hackathon_Showcase.

The following steps could be taken, starting from a Wikipedia article, to add its DOIs to the focus list, assuming they occur in {{Cite journal}}.

  • Run JavaScript over the Parsoid HTML to extract the external links beginning //doi.org/, e.g.
//doi.org/10.1186/1743-422x-7-45, //doi.org/10.3748/wjg.v13.i1.48
  • Take the identifier parts and convert all the letters to upper case, e.g.
10.1186/1743-422X-7-45, 10.3748/WJG.V13.I1.48
  • Run this kind of SPARQL query:
#Prototype focus list batch by Aleksey
SELECT ?item
WHERE {
  VALUES ?doi { "10.1186/1743-422X-7-45" "10.3748/WJG.V13.I1.48" }
  ?item wdt:P356 ?doi .
}

Charles Matthews (talk) 05:45, 22 July 2018 (UTC)
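The three steps above can be sketched end to end. The post proposes JavaScript for the extraction step; the sketch below uses Python instead, with only the standard library, and assumes the article HTML is already available as a string. The names `DoiLinkParser`, `extract_dois` and `build_doi_query` are hypothetical helpers, not part of any existing tool.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Matches hrefs such as //doi.org/10.1186/1743-422x-7-45, with or without a scheme.
DOI_LINK = re.compile(r"^(?:https?:)?//doi\.org/(.+)$")


class DoiLinkParser(HTMLParser):
    """Collect upper-cased DOIs from <a href="//doi.org/..."> links."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = DOI_LINK.match(href)
        if match:
            # Step 2: take the identifier part and render it in upper case.
            doi = unquote(match.group(1)).upper()
            if doi not in self.dois:
                self.dois.append(doi)


def extract_dois(html: str) -> list:
    """Step 1: extract the DOI links from the article HTML."""
    parser = DoiLinkParser()
    parser.feed(html)
    return parser.dois


def build_doi_query(dois: list) -> str:
    """Step 3: build a SPARQL query matching items by DOI (P356)."""
    quoted = " ".join(
        '"%s"' % d.replace("\\", "\\\\").replace('"', '\\"') for d in dois
    )
    return (
        "SELECT ?item ?doi\n"
        "WHERE {\n"
        "  VALUES ?doi { %s }\n"
        "  ?item wdt:P356 ?doi .\n"
        "}" % quoted
    )
```

Returning `?doi` alongside `?item` is a small addition to the prototype query, made here so that DOIs with no matching item can be spotted by comparing the input list with the results.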

Census

I have written some new code that can read and update DOIs, PubMed, and PubMed Central links from wiki pages, based on the article HTML. I am currently importing Wikispecies and English Wikipedia. Publications are automatically checked against Wikidata, and the Q number, as well as other IDs, are added. Obvious applications include checking often-used papers within/across wiki projects, and adding them to Wikidata if required. If you have a Toolforge (aka Labs) account, you can see the database as s52680__science_source_p. --Magnus Manske (talk) 18:11, 23 July 2018 (UTC)

Proj Med has a paid, part-time editor who maintains a list of updates to Cochrane reviews. She coordinates the effort to remove old sources and replace them with newer review articles. Is there a way to flag journal articles so that we can see which ones have been updated? I also think that you might be missing the vast potential that this kind of project will have when medical researchers can do a keyword search on a medical topic. When they do this, it will be an easy task to use this database to generate new review articles and meta-analyses. It's pretty exciting, actually. Also, here is just a little more info on med sources: for the rarer diseases and conditions, what may seem to be 'lower' quality sources are actually still very good and can even be case studies. Also, we use government publications as good and valid sources for med articles. Best Regards, Barbara (WVS) (talk) 22:18, 27 July 2018 (UTC)
I actually hope that we can demonstrate that SPARQL queries are superior to keyword searches, in a few ways. Thanks for dropping by. I believe that the ScienceSource Wikibase site will be useful for the sort of data you mention, where one article can be considered to be a replacement for another. That is a particular kind of annotation, such as we'll be using, and possibly not so acceptable here on Wikidata. Charles Matthews (talk) 14:08, 1 August 2018 (UTC)

Bubble charts

Here is a query breaking down the diseases that are main subjects of focus list papers by medical specialty:

#ScienceSource focus list (WD:SSFL) disease subjects by medical specialty
#defaultView:BubbleChart
SELECT ?spec ?specLabel ?count
WHERE {
  {
    SELECT ?spec (COUNT(?item) AS ?count)
    WHERE {
      ?item wdt:P31 wd:Q12136 .
      ?item wdt:P1995 ?spec .
      ?SSpaper wdt:P5008 wd:Q55439927 .
      ?SSpaper wdt:P921 ?item .
    }
    GROUP BY ?spec
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY DESC(?count)
LIMIT 100

Right now that chart is dominated by infectious diseases (Q788926). Charles Matthews (talk) 10:44, 16 August 2018 (UTC)

Automatically add main subjects contained in the publication title?

Dear

It seems obvious to me that a publication having Fibromyalgia in the title is heavily related to the disease named Fibromyalgia.

Could we then automatically add Fibromyalgia as a main topic for all publications having Fibromyalgia in the title? And do the same for all other defined medical concepts?

If yes, a way to do this could be to use a query to list the Qids of all scholarly articles having "Fibromyalgia" in the title and then copy the list into a Google form. This would make it possible to quickly build a QuickStatements batch that adds Fibromyalgia as a main topic to all scholarly articles having Fibromyalgia in the title.

It could even be possible to create a script where you only have to list the names and Qids of such defined medical concepts, and it adds each one as a main topic to all scholarly articles having it in the title.

Regards

-- Thibdx (talk) 00:02, 26 December 2018 (UTC)
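The batch-building step in this proposal can be sketched without a spreadsheet. The following is a minimal illustration in Python, assuming the tab-separated QuickStatements version 1 command format (item, property, value) and a list of article Qids already obtained from a query. The function name `quickstatements_lines` is hypothetical, and the sketch deliberately adds no qualifiers or references.

```python
import re

# Wikidata item identifiers look like Q followed by a positive integer.
QID = re.compile(r"^Q[1-9]\d*$")

# main subject (P921), the property named in the proposal.
MAIN_SUBJECT = "P921"


def quickstatements_lines(paper_qids, subject_qid):
    """Return one tab-separated QuickStatements V1 command per paper.

    Each command adds subject_qid as main subject (P921) of a paper.
    Duplicate papers are skipped; malformed identifiers raise ValueError.
    """
    if not QID.match(subject_qid):
        raise ValueError("not a Qid: %r" % subject_qid)
    lines = []
    seen = set()
    for qid in paper_qids:
        if not QID.match(qid):
            raise ValueError("not a Qid: %r" % qid)
        if qid in seen:
            continue
        seen.add(qid)
        lines.append("\t".join([qid, MAIN_SUBJECT, subject_qid]))
    return lines
```

Joining the returned lines with newlines gives text that can be pasted into the QuickStatements input box. Validating the identifiers first keeps a stray label or header row from ending up in the batch.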

@Thibdx: I have only just seen this suggestion. Some comments:
  • A number of people are already using this heuristic. There are additions for main subject (P921) with a qualifier saying that the title is being used as a heuristic.
  • It is known that this heuristic is not great. Some research shows that even using the abstract to judge the topics is not necessarily good: you sometimes need the whole text.
  • In 2019, I have been using a tool, NCBI2wikidata, that makes it possible to add MeSH main subjects to items. In practice this is much more satisfactory than the title, which obviously omits many aspects. At present I would say that this direction is the most promising. MeSH main subjects are available for most papers in PubMed (those indexed for Medline), and those marked with * as "major topics" are the best to use.
  • Automation for those PubMed papers is certainly possible. In fact at present the bottleneck is really that the tool produces QuickStatements code much faster than QuickStatements can post it.
So, I think the right direction is to develop some better bots using MeSH. Charles Matthews (talk) 08:32, 2 August 2019 (UTC)

Focus list maintenance

#Focus list articles not marked as reviews
SELECT DISTINCT ?paper
WHERE {
  ?paper wdt:P5008 wd:Q55439927 ;
         wdt:P577 ?date .
  MINUS { ?paper wdt:P31 wd:Q7318358 }
}
ORDER BY ASC(?date)

The intention now is to move the ScienceSource focus list to being a collection of sources that pass the MEDRS test. There are more than 2K articles on it not marked as a review: one task is therefore to check them against PubMed for publication type. As of today, there are 2211 hits for the query above.

The number should come down as NCBI2wikidata runs find further reviews. Those runs can be targeted towards MeSH major topics of papers on the focus list (see section above). If anyone has suggestions for MeSH descriptor ID (P486) values that should be run in this way, contact me. NCBI2wikidata plus some post-processing can yield main subjects that are any of the 26K MeSH topics (on review articles with CC licenses). Charles Matthews (talk) 08:44, 2 August 2019 (UTC)
