Text Mining Symposium 2007
Fraunhofer Institute for Algorithms and Scientific Computing SCAI

5th Fraunhofer-Symposium on Text Mining
24th and 25th of September 2007
Bonn-Aachen International Center for Information Technology – Bonn, Germany
Introduction
The 5th Fraunhofer-Symposium on Text Mining in Life Sciences will take place at the Bonn-Aachen International Center for Information Technology in Bonn on Monday (Reception) and Tuesday, September 24th and 25th.
The symposium is a conference well regarded across the community, where selected professionals from industry and academic research meet to discuss advances in information extraction and knowledge discovery. This year’s event will focus on combining textual and image information in biomedical and chemical research.
The organisers look forward to welcoming you to the modern building of the Bonn-Aachen International Center for Information Technology in Bonn, located directly on the bank of the river Rhine. Please mark your calendar for this event, which attracted more than 100 participants in 2006.
25th September
Pre-Symposium Reception
Time: 17:00
Time | Subject | Name
9:00 – 9:15 | OPENING ADDRESS | Martin Hofmann-Apitius – Head of Department Bioinformatics, Fraunhofer Institute SCAI
9:15 – 10:00 | The Second BioCreative Evaluation: Lessons Learned and Future Directions | Lynette Hirschman – MITRE Corporation, Bedford, Massachusetts
10:00 – 10:30 | | Valentina Eigner-Pitto – InfoChem GmbH, Munich
10:30 – 11:00 | DISCUSSION AND COFFEE BREAK |
11:00 – 11:30 | Information Extraction from Patents: Combining Text- and Image-Mining | Martin Hofmann-Apitius – Fraunhofer Institute SCAI
11:30 – 12:00 | | Erick Gaussens – Product Life and Gilles Montier – TEMIS
12:00 – 13:00 | SOFTWARE DEMONSTRATION |
13:00 – 14:00 | LUNCH |
14:00 – 14:40 | Mining clinical electronic data for research and patient care: Challenges and solutions | David Hanauer – University of Michigan, USA
14:40 – 15:10 | Efficient Patent Management with Text Mining Technologies | Gert Jäger – PATEV GmbH & Co. KG, Munich
15:10 – 15:35 | | Corinna Kolárik – Fraunhofer Institute SCAI
15:35 – 16:05 | COFFEE BREAK |
16:05 – 16:35 | Towards a Semantic Web for Life Sciences at Novartis Institutes for Biomedical Research | Thérèse Vachon – Novartis Pharma, Basel, Switzerland
16:35 – 17:00 | | Roman Klinger – Fraunhofer Institute SCAI
17:00 – END | CONCLUSION | Martin Hofmann-Apitius – Fraunhofer Institute SCAI
Take home messages
Take home messages from the 5th Symposium on Text Mining in the Life Sciences by Martin Hofmann-Apitius
The 5th Symposium on Text Mining in the Life Sciences has again seen a broad spectrum of interesting talks underlining the growing relevance of text mining technology in biology, chemistry and medicine. In her keynote presentation, Dr. Lynette Hirschman from the MITRE Corporation presented an overview of the BioCreative critical assessment of text mining in molecular biology. She pointed out that this public critical assessment of technology has become an important event not only for the text mining technology community, but also for the consumers of text mining technology, namely the database community. BioCreative II has seen a significant increase in the number of participating groups, which is indicative of the increased perception of text mining as an important scientific discipline. Another observation from the second BioCreative competition is that there is a strong trend towards using machine learning for gene mention identification; there also seems to be a shift from Hidden Markov Models to Conditional Random Fields as the preferred machine learning technology for the recognition of gene mentions. In her conclusions, Dr. Hirschman called for suggestions for the next BioCreative competition, which may take place in 2008.
Dr. Valentina Eigner-Pitto from InfoChem GmbH in Munich gave a presentation on chemical information extraction and the underlying workflow. She stated that the overall costs of information extraction and database population in chemistry are currently too high to allow chemical information to be extracted from enterprise archival documents or patents at reasonable cost. Therefore, she concluded, automated methods will become increasingly important in the future, and this is the reason why InfoChem has started to establish a pipeline for automated information extraction in chemistry built on IBM's chemical name recognition approach and SCAI's chemical structure reconstruction tool chemoCR. In her presentation, Dr. Eigner-Pitto also mentioned a recent benchmark that InfoChem ran on name-to-structure tools. She convincingly demonstrated that, in a benchmarking experiment using a randomized corpus of more than 2000 chemical names, none of the tools currently on the market reached a satisfactory balance of recall and precision.
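As an illustration of the two measures named in this benchmark, the following minimal Python sketch computes precision and recall for a name-to-structure tool. It is not InfoChem's evaluation procedure: the function name, the tiny data set and the use of exact SMILES string comparison are assumptions made for the example (a real evaluation would compare canonicalised structures).

```python
def precision_recall(tool_output, reference):
    """Precision and recall of a name-to-structure tool.

    tool_output: dict mapping chemical name -> structure string,
                 or None where the tool produced no structure.
    reference:   dict mapping chemical name -> correct structure string.
    Structures are compared as plain strings (a simplification).
    """
    attempted = {name: s for name, s in tool_output.items() if s is not None}
    correct = sum(1 for name, s in attempted.items() if reference.get(name) == s)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical four-name corpus with SMILES strings as structures.
reference = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
    "methane": "C",
}
# Hypothetical tool result: one wrong structure, one name not converted.
tool_output = {
    "ethanol": "CCO",
    "benzene": "C1CCCCC1",   # wrong: this is cyclohexane
    "acetic acid": None,     # tool gave no answer
    "methane": "C",
}

precision, recall = precision_recall(tool_output, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the tool answers three of four names and gets two right, so precision is 2/3 and recall is 2/4. A tool can raise one measure at the expense of the other, which is why the benchmark looked at both together.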
In the following presentation, Prof. Dr. Martin Hofmann-Apitius gave an overview of recent advances in the research and development work done at Fraunhofer SCAI. ProMiner, the technology for named entity recognition, has been extended by new dictionaries for chemical and medical entities and a new named entity recognition approach based on machine learning. Moreover, a persistence layer based on an Oracle database has been developed which stores the results of ProMiner analyses. Beyond the analysis of text entities, SCAI has made a lot of progress with the chemoCR tool, software that translates chemical structure depictions into computer-understandable chemical information. Prof. Hofmann-Apitius demonstrated how ProMiner, the persistence layer for extracted information @neuLink, and chemoCR can be combined for the analysis of full-text patent documents. He clearly stated that this work has only just started, but the encouraging news is that multi-modal information extraction from text and chemical structure depictions works.
In the next talk, Erick Gaussens and Gilles Montier reported on the use of text mining technology for drug safety risk management. This is particularly relevant for non-quantitative data in monitoring regimes and for scientific watch. In the latter, text mining is used to screen the scientific literature and referenced databases for new evidence on drug-drug interactions, contra-indications and observations that are reported after the official end of phase IV.
Dr. David Hanauer from the University of Michigan gave an overview of the real needs of medical practitioners for information retrieval and information extraction in clinical records. The dimensions of the challenge he outlined are impressive: the University of Michigan hospital alone holds about 450 GB of free-text data in its patient data repository. His examples demonstrated that text mining in medical records faces quite different challenges from scientific literature mining. One of the major sources of corrupted clinical information is that doctors' clinical diagnoses are usually dictated onto voice recording systems; these voice records are then transcribed into text in countries such as India or Pakistan, where labour is cheap. This step in the information chain introduces many errors, and consequently text mining systems used to analyse medical records have to be able to cope with them.
Gert Jäger of PATEV, a patent analysis company in Munich, highlighted the challenges associated with the analysis of patent literature. The challenge, he stated, lies not only in the sheer number of patents in the world (more than 80 million, growing by about 2200 patents per day) but also in the way patents sometimes tend to camouflage the real claim or invention instead of revealing it. One very interesting statement from his talk was his belief that about 80% of all technical knowledge resides in patents and not in other scientific literature.
In the following talk, Corinna Kolárik, a PhD student working at Fraunhofer SCAI, gave a nice example of the application of text mining in the area of pharmaceutical chemistry. She presented an approach by which the annotation of bioactive chemical compounds can be greatly enhanced through automated information extraction procedures. The approach is based on so-called Hearst patterns that identify factual statements about drugs and other bioactive compounds. The work she presented has been published in the ISMB conference proceedings issue of Bioinformatics.
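Hearst patterns are lexico-syntactic templates such as "X such as A, B and C", from which one can read off that A, B and C are kinds of X. The Python sketch below shows the general idea with a single pattern. It is a simplified illustration, not the implementation presented in the talk: the regular expression, the function name and the example sentence are assumptions, and the class term is limited to one word.

```python
import re

# One Hearst-style pattern: "<class> such as <item>, <item> and <item>".
# Simplification: the class term is a single word and the item list
# runs until the next full stop or semicolon.
SUCH_AS = re.compile(r"\b(?P<hypernym>\w+) such as (?P<items>[^.;]+)")

# Separators inside the item list: commas (optionally followed by
# "and"/"or") or a bare "and"/"or" between two items.
ITEM_SEPARATOR = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_such_as(sentence):
    """Return (item, class) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hypernym")
        for item in ITEM_SEPARATOR.split(match.group("items")):
            item = item.strip()
            if item:
                pairs.append((item, hypernym))
    return pairs


# Hypothetical example sentence.
sentence = "Patients received antibiotics such as amoxicillin, doxycycline and ciprofloxacin."
for compound, compound_class in extract_such_as(sentence):
    print(f"{compound} is-a {compound_class}")
```

Each extracted pair is a candidate annotation (compound, class) that could be attached to a compound record. A practical system would use several patterns and recognise multi-word terms, typically with the help of a named entity recogniser.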
In the following presentation, Dr. Thérèse Vachon from Novartis Pharma demonstrated the implementation at Novartis of a system that combines literature mining, competitive intelligence and patent mining with bioinformatics data sources and tools. The system supports queries that shed light on the current state of knowledge on certain targets, defined types of chemistry or system-oriented information such as pathway information. Information extraction covers not only named entity recognition, but is broadened to support passage and fact retrieval as well as the mapping of textual entities to database-resident information. Semantic interoperability of textual information and database information is achieved through the automated construction and mapping of thesauri and ontologies to text and databases.
Finally, in the last talk of the symposium, Roman Klinger, another PhD student from SCAI, gave a nice overview of machine learning approaches in text mining and in particular the use of conditional random fields (CRF) for named entity recognition. As Lynette Hirschman had already pointed out in her talk, there is a strong tendency in the international text analysis research community to use CRFs to identify named entities in scientific text. Roman Klinger gave an overview of the mathematical and algorithmic basis of CRFs and demonstrated how he applied this machine learning approach to the problem of identifying allelic gene variation (SNP) information in scientific literature. We learned from his presentation how named entities such as SNPs, but also IUPAC names, can be identified in text and how this approach complements dictionary-based approaches as they have been widely used, for example in the ProMiner technology.
With Roman Klinger's talk the scientific programme of the 5th Symposium on Text Mining in the Life Sciences concluded. We heard really interesting presentations and gained an impression of the speed at which this field of automated text analysis in the life sciences is developing. The coming year will see the 6th Symposium on Text Mining in the Life Sciences, and we are already thinking about topics to be covered in the next symposium.
Photo Gallery
Photos by Bianca Backert, media design trainee