
Wikidata:WikiProject Zika Corpus








Welcome to WikiProject Zika Corpus
This is a WikiProject dedicated to the creation of a rich corpus on all scholarly knowledge related to the Zika Virus.


Wikidata hosts different layers of information about the Zika virus
Scholia profile for publications related to the Zika virus

In February 2016, the World Health Organization declared a public health emergency over the Zika virus outbreak and its links (then suspected, by now confirmed) to microcephaly and Guillain-Barré syndrome. By that time, around 150 scholarly articles had been published about the virus since its discovery in 1947, and the majority of these articles had already been assigned Wikidata items.

Since then, the literature on the topic has grown by an order of magnitude (see the timeline below), and the Wikidata coverage has mostly kept pace, with a typical time lag of less than a week. While not complete, this corpus covers most PubMed-indexed English-language articles reporting or reviewing original research about the Zika virus and the infections it can cause in mosquitoes, humans and animal models, as well as about approaches to prevention, diagnostics, therapy, or surveillance.

The Zika corpus served as a nucleus for creating a citation graph on Wikidata and for exploring co-author networks and similar information on Wikidata. It is now slowly expanding to encompass literature about related subjects, e.g., Flaviviridae and mosquito-borne diseases more broadly, epidemiological modeling, or data sharing in public health emergencies.


We define the Zika Corpus as consisting of

  • all items linking to Zika virus (Q202864) or to entities that are part of it
  • topics that co-occur with Zika virus (Q202864) as main subject (P921) of creative works
  • authors of these works, and institutions or organizations they are affiliated with or employed by
  • venues and publishers through which these works have been published
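The first two parts of this definition can be sketched as queries against the Wikidata Query Service. The following is a minimal illustration, not the project's own tooling: the function names, the result limits, the User-Agent string and the reading of "linking to" as "has Zika virus as main subject" are all assumptions made for this example. Only Q202864 and P921 come from the definition above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ZIKA_VIRUS = "Q202864"   # item named in the corpus definition
MAIN_SUBJECT = "P921"    # property named in the corpus definition
ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata SPARQL endpoint


def works_query(limit=100):
    """SPARQL for works whose main subject is the Zika virus (hypothetical helper)."""
    return f"""
SELECT ?work ?workLabel WHERE {{
  ?work wdt:{MAIN_SUBJECT} wd:{ZIKA_VIRUS} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def cooccurring_topics_query(limit=50):
    """SPARQL for topics that share a work with the Zika virus as main subject."""
    return f"""
SELECT ?topic (COUNT(DISTINCT ?work) AS ?works) WHERE {{
  ?work wdt:{MAIN_SUBJECT} wd:{ZIKA_VIRUS} ;
        wdt:{MAIN_SUBJECT} ?topic .
  FILTER(?topic != wd:{ZIKA_VIRUS})
}}
GROUP BY ?topic
ORDER BY DESC(?works)
LIMIT {int(limit)}
""".strip()


def run_query(query):
    """Send a query to the endpoint and return the JSON result bindings."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    request = Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Placeholder User-Agent; replace with a real contact string.
        "User-Agent": "zika-corpus-sketch/0.1 (example)",
    })
    with urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(works_query(10))` would need network access. The two query builders can be inspected or pasted into the query service directly.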


  • Curate the corpus by
    • enriching the items in the corpus
    • using the works included in the corpus to reference Wikidata statements
  • Create a demonstrator for WikiCite: a consistent/interesting/visualizable dataset
  • Prototype data visualization/storytelling ideas for exploring the corpus

Overview of publications related to the Zika virus



The timeline of Zika publications indexed in Wikidata can be visualized in various ways.
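As one hedged illustration of a timeline view, publication dates can be aggregated into counts per year before plotting. The sketch below assumes the dates arrive as ISO 8601 strings, the form in which the Wikidata Query Service typically returns time values. The function name is hypothetical and the sample dates are made-up placeholders, not corpus data.

```python
from collections import Counter


def publications_per_year(dates):
    """Count publications per year from ISO 8601 date strings.

    Entries that are empty or do not start with a four-digit year are skipped.
    """
    years = Counter(int(d[:4]) for d in dates if d and d[:4].isdigit())
    return dict(sorted(years.items()))


# Placeholder input for illustration only (not real corpus data).
sample = [
    "2016-02-01T00:00:00Z",
    "2016-05-10T00:00:00Z",
    "2015-11-20T00:00:00Z",
    "",
]
print(publications_per_year(sample))
```

The resulting year-to-count mapping can be passed to any charting tool to draw a bar chart of the corpus over time.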

Recent changes

A good overview of ongoing activity in curating the corpus is provided by the 100 most recent changes related to the list of items about the Zika virus.

Target Audiences

  • Sociologists of science (including STS, information science, bibliometrics, and the social sciences; these may need to be described as multiple separate groups)
    • democratizing access to datasets that have traditionally been controlled by a small group of academic players
    • which topics were the current authors of Zika research previously studying?
  • The general public
    • public understanding of research on Zika and how this research evolved, e.g. timelines of when the news media first covered the virus and when it became public knowledge, compared with when the papers were published, with social media coverage, and with the geographic spread of the virus and cases over time
  • Journalists
    • how much is Zika research costing? Where is the funding coming from? Is the funding coming from tax dollars, and is the research coming from government organizations? This matters because the ability of our representatives and institutions to respond to global health crises depends on budget
    • what treatments are currently available? Are there advances that may provide treatment in the near future?
    • how the public understands, or potentially distorts, trustworthy information on the topic
    • personal stories
  • Primary Researchers

To Do

  • Define a property to help set the boundaries of the bibliographic corpus
  • Extract and store author affiliations
  • Extend coverage of statements supported by specific sources
  • Add funder organizations from the CrossRef Funder Registry to Wikidata. It is CC0 and "a unique taxonomy of grant-giving organizations", and is downloadable as RDF or CSV.
  • Add funder information for papers
    • PMC API
    • NLP may help:
      • Councill, Isaac G., C. Lee Giles, Hui Han, and Eren Manavoglu. "Automatic acknowledgement indexing: expanding the semantics of contribution in the CiteSeer digital library." In Proceedings of the 3rd international conference on Knowledge capture, pp. 19-26. ACM, 2005. Q30046394
      • Giles, C. Lee, and Isaac G. Councill. "Who gets acknowledged: Measuring scientific contributions through automatic acknowledgment indexing." Proceedings of the National Academy of Sciences of the United States of America 101, no. 51 (2004): 17599-17604. Q30046493
      • Khabsa, Madian, Pucktada Treeratpituk, and C. Lee Giles. "Ackseer: a repository and search engine for automatically extracted acknowledgments from digital libraries." In Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries, pp. 185-194. ACM, 2012. Q30050797
      • Khabsa, Madian, S. Koppman, and C. Lee Giles. "Towards Building and Analyzing a Social Network of Acknowledgments in Scientific and Academic Documents." In Social Computing, Behavioral-Cultural Modeling and Prediction (SBP 2012), edited by S.J. Yang, A.M. Greenberg, and M. Endsley. Lecture Notes in Computer Science, vol. 7227. Springer, Berlin, Heidelberg, 2012.
  • Add paper topics
    • Extract and add MeSH (Q199897) terms


See also


The participants listed below can be notified using the following template in discussions:

{{Ping project|Zika Corpus}}