Files made available through this work are:

1. miRNA Corpus Version 1.0 (test and training)
2. Annotation Guidelines for specific miRNA mentions
3. Relation Dictionary Version 1.0
4. Non-Specific MiRNAs Dictionary Version 1.0
5. Supplementary Tables

The detailed description of each file is given below.

=============================================================================================================================================================================

miRNA Corpus Version 1.0 (as of October 29th, 2012)

The corpus consists of 301 Medline citations. The documents were screened for mentions of miRNA in the abstract text. Gene, disease and miRNA entities were manually annotated. The corpus comprises two separate files, a training set and a test set, drawn from 201 and 100 documents respectively. Due to license restrictions of MEDLINE, abstracts are not contained in the corpus, but can be downloaded from MEDLINE using eUtils.

The corpus can be used to assess and improve the performance of algorithms for information extraction. It is published for academic use only; use for the development of commercial products is not permitted.
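
Since the abstracts have to be retrieved separately, the following is a minimal sketch of how a request to the eUtils EFetch service could be assembled from the origId values (PubMed identifiers) in the corpus. The helper name build_efetch_url and the identifiers in the usage note are placeholders chosen for illustration and are not part of the corpus distribution.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pubmed_ids):
    """Return an EFetch URL asking for plain-text abstracts.

    pubmed_ids: iterable of PubMed identifiers (the origId values).
    This helper is hypothetical; it only builds the URL and does not
    contact the service.
    """
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pubmed_ids),
        "rettype": "abstract",
        "retmode": "text",
    }
    # Keep commas readable in the id list instead of percent-encoding them.
    return EFETCH_URL + "?" + urlencode(params, safe=",")
```

The resulting URL can be opened with any HTTP client, for example urllib.request.urlopen(build_efetch_url([11111111, 22222222])), where the two identifiers are placeholders. Requests are best sent in small batches to stay within the NCBI usage limits.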

Corpus statistics:

              TRAIN set | TEST set
sentences          1877 | 788
entities           5828 | 2480
pairs              1996 | 867
pos.pairs           497 | 312
neg.pairs          1499 | 555

The corpus contains the following information:

DOCUMENT:
  - id: the document id specific to the corpus
  - origId: the PubMed identifier

SENTENCE:
  - id: the sentence id specific to the corpus
  - origId: the PubMed identifier
  - text: the sentence text

ENTITY:
  - charOffset: start and end positions of the entity text relative to the sentence
  - id: the entity id specific to the corpus
  - text: the entity text
  - type: the type of entity (Specific_miRNAs, Non-Specific_miRNAs, Diseases, Genes/Proteins, Relation_Trigger)

PAIR:
  - e1: internal (corpus) id of the first entity that is part of the pair
  - e2: internal (corpus) id of the second entity that is part of the pair
  - id: the pair id specific to the corpus
  - interaction: presence of an interaction (True or False)

P.S.: Some of the symbols have been replaced with an entity:

Special symbols | Entity
-----------------------------------
>               | &gt;
<               | &lt;
&               | &amp;
"               | &quot;

The offset positions of the original symbols have been retained in the corpus.

=============================================================================================================================================================================

Annotation Guidelines for specific miRNA mentions

This file contains the guidelines developed and followed for the annotation of specific miRNA mentions in text during the course of this work. What has to be annotated and what should not be annotated is described in detail, with examples.

=============================================================================================================================================================================

Relation Dictionary Version 1.0

The relation dictionary consists of 207 terms identified during manual annotation, describing the miRNA-related relations.
The dictionary consists of the following information:
  - Unique identifiers (e.g. REL0001)
  - Class of the identified term (e.g. RELATION)
  - Normalized name to be used, mentioned before the colon symbol (e.g. regulate:)
  - Synonyms for the normalized names (separated by "|"), mentioned after the colon symbol

=============================================================================================================================================================================

Non-Specific MiRNAs Dictionary Version 1.0

The dictionary consists of 17 terms identified during manual annotation, describing the general miRNA mentions in text.

The dictionary consists of the following information:
  - Unique identifiers (e.g. MIR0001)
  - Class of the identified term (e.g. GENERALMIRNA)
  - Normalized name to be used, mentioned before the colon symbol (e.g. MICRORNA:)
  - Synonyms for the normalized names (separated by "|"), mentioned after the colon symbol

=============================================================================================================================================================================

Supplementary Tables

TableA.pdf
  Provides a quantitative estimate of the entries available in the dictionaries. MeSHAbbr is the merged dictionary in which the MeSH terms and synonyms (from the MeSH dictionary) are additionally tagged with the abbreviations from the Allie database.

TableB.pdf
  Gives detailed evaluation results for the disease dictionaries used. MeSHAbbr is the merged dictionary in which the MeSH terms and synonyms (from the MeSH dictionary) are additionally tagged with the abbreviations from the Allie database.

TableC.pdf
  Shows the impact of the NERTri approach on relation extraction, with detailed results of the evaluation.

TableD.pdf
  Provides the detailed results of the comparisons between the different relation extraction approaches.
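
The Relation Dictionary and the Non-Specific MiRNAs Dictionary share one entry layout: identifier, class, then a normalized name and its synonyms separated by a colon. The sketch below shows one way such an entry could be parsed. It assumes that the three columns are tab-separated, which this description does not state, and the synonyms in the usage note are invented for illustration. DictEntry and parse_entry are hypothetical names.

```python
from typing import List, NamedTuple


class DictEntry(NamedTuple):
    identifier: str      # e.g. REL0001 or MIR0001
    term_class: str      # e.g. RELATION or GENERALMIRNA
    normalized: str      # name before the colon
    synonyms: List[str]  # names after the colon, split on "|"


def parse_entry(line: str, sep: str = "\t") -> DictEntry:
    """Parse one dictionary line.

    ASSUMPTION: columns are separated by `sep` (a tab by default);
    adjust it to match the actual dictionary files.
    """
    identifier, term_class, names = line.rstrip("\n").split(sep, 2)
    # Everything before the first colon is the normalized name.
    normalized, _, synonym_part = names.partition(":")
    synonyms = [s.strip() for s in synonym_part.split("|") if s.strip()]
    return DictEntry(identifier.strip(), term_class.strip(),
                     normalized.strip(), synonyms)
```

For example, parse_entry("REL0001\tRELATION\tregulate:regulates|regulated") yields an entry whose normalized name is "regulate" and whose synonym list holds the two made-up variants.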

=============================================================================================================================================================================

Please cite the following paper if you publish or present any research result obtained using this corpus:

@ARTICLE{Bagewadi14,
  author = {Shweta Bagewadi and Tamara Bobi\'c and Martin Hofmann-Apitius and Juliane Fluck and Roman Klinger},
  title = {Detecting miRNA mentions and relations from biomedical literature},
  journal = {submitted},
  year = {2014}
}
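
As an illustration of the DOCUMENT, SENTENCE, ENTITY and PAIR fields described under "miRNA Corpus Version 1.0", the sketch below lists the entity pairs marked with interaction "True". It assumes that the corpus files are XML with elements named document, sentence, entity and pair that carry the listed fields as attributes. The element names, the root element and the sample content are all assumptions made for illustration and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Invented sample in the ASSUMED layout; not real corpus content.
SAMPLE = """<corpus>
  <document id="d0" origId="00000000">
    <sentence id="d0.s0" origId="00000000" text="miR-1 represses GENEX.">
      <entity id="d0.s0.e0" charOffset="0-4" text="miR-1" type="Specific_miRNAs"/>
      <entity id="d0.s0.e1" charOffset="16-20" text="GENEX" type="Genes/Proteins"/>
      <pair id="d0.s0.p0" e1="d0.s0.e0" e2="d0.s0.e1" interaction="True"/>
    </sentence>
  </document>
</corpus>"""


def positive_pairs(xml_text):
    """Return (text1, type1, text2, type2) for every interacting pair."""
    root = ET.fromstring(xml_text)
    results = []
    for sentence in root.iter("sentence"):
        # Index the entities of this sentence by their corpus id.
        entities = {e.get("id"): e for e in sentence.iter("entity")}
        for pair in sentence.iter("pair"):
            if pair.get("interaction") != "True":
                continue  # skip negative pairs
            first = entities[pair.get("e1")]
            second = entities[pair.get("e2")]
            results.append((first.get("text"), first.get("type"),
                            second.get("text"), second.get("type")))
    return results
```

Calling positive_pairs(SAMPLE) on the invented sample returns a single tuple for the miR-1 / GENEX pair. A standard XML parser also decodes entities such as &amp; back to the original symbol when reading attribute values.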