Text Corpus for Disease Names and Adverse Effects

This page provides the corpus associated with the following publication:

Harsha Gurulingappa, Roman Klinger, Martin Hofmann-Apitius, and Juliane Fluck. An Empirical Evaluation of Resources for the Identification of Diseases and Adverse Effects in Biomedical Literature. In 2nd Workshop on Building and Evaluating Resources for Biomedical Text Mining (7th edition of the Language Resources and Evaluation Conference), Valletta, Malta, May 2010.

Annotated entity classes:

  • DISEASE (for diseases)
  • ADVERSE (for adverse effects)

Each entry starts with ### followed by its PMID (PubMed identifier).
The columns are:

  1.   Token
  2.   Start index
  3.   End index
  4.   Full untokenized entity
  5.   Class (B-class|I-class|O), where class is one of the entity classes listed above and:
         • B- marks the beginning of an entity
         • I- marks the continuation of an entity
         • O marks a token that belongs to none of the defined entities
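
The layout described above could be read with a short Python script like the following. This is only a sketch under stated assumptions: the page does not say how columns are separated, so tab-separated columns are assumed (adjustable through the sep parameter), and the names Token, parse_corpus and entity_spans are hypothetical helpers introduced here, not part of the corpus distribution.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Token(NamedTuple):
    """One row of the corpus, in the column order listed above."""
    text: str
    start: int
    end: int
    entity: str
    label: str


def parse_corpus(lines: Iterable[str], sep: str = "\t") -> Dict[str, List[Token]]:
    """Group token rows by PMID.

    Assumption: columns are separated by `sep` (tab by default);
    the corpus description does not state the delimiter.
    """
    entries: Dict[str, List[Token]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("###"):
            # Entry header: "###" followed by the PMID.
            current = line[3:].strip()
            entries[current] = []
        elif line.strip() and current is not None:
            cols = line.split(sep)
            if len(cols) != 5:
                continue  # skip rows that do not match the assumed layout
            text, start, end, entity, label = cols
            entries[current].append(Token(text, int(start), int(end), entity, label))
    return entries


def entity_spans(tokens: Iterable[Token]) -> List[Tuple[str, int, int]]:
    """Merge B-/I- labelled tokens into (class, start, end) spans."""
    spans: List[Tuple[str, int, int]] = []
    open_span = None  # [class, start, end] of the entity being built
    for tok in tokens:
        if tok.label.startswith("B-"):
            if open_span:
                spans.append(tuple(open_span))
            open_span = [tok.label[2:], tok.start, tok.end]
        elif tok.label.startswith("I-") and open_span and open_span[0] == tok.label[2:]:
            open_span[2] = tok.end  # extend the current entity
        else:
            # "O", or an I- tag with no matching open entity, closes the span.
            if open_span:
                spans.append(tuple(open_span))
            open_span = None
    if open_span:
        spans.append(tuple(open_span))
    return spans
```

Calling parse_corpus on an open file handle returns a dictionary from PMID to its token rows, and entity_spans on one of those lists returns the annotated DISEASE and ADVERSE spans with their start and end indices. If the file turns out to use a different delimiter, pass it as sep.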