Information extraction / Semantic text analysis
Fraunhofer Institute for Algorithms and Scientific Computing SCAI
Efficient information access is a major challenge in life science research today. As a result, we observe a steadily growing demand to integrate information from various sources and across different disciplines in life sciences.
However, a large fraction of this information is only available as free text in scientific articles, database text fields, or patents. Moreover, the volume of this literature is growing nearly exponentially.
The Fraunhofer Institute SCAI develops text mining methods to support
- Named entity recognition,
- Terminology extraction and classification,
- Relation extraction,
- Condensation, retrieval and search abilities,
- Usability in systematic analysis and interpretation of the experimental data
of relevant terminology and information in the life science field. Processing the text mining results so that end users can update, curate, search, and visualize them is indispensable. In two projects we develop an information system and web services for different applications (see AIDB, @neurIST).
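As an illustration of the first task in the list above, the following minimal sketch shows dictionary-based named entity recognition: synonyms found in free text are mapped to a normalized identifier. It is not the method used at SCAI. The dictionary entries, the identifier scheme, and the names `Mention` and `find_mentions` are assumptions made for this example only.

```python
import re
from typing import Dict, List, NamedTuple


class Mention(NamedTuple):
    start: int       # character offset where the match begins
    end: int         # character offset just past the match
    text: str        # the matched surface form
    entity_id: str   # normalized identifier from the dictionary


# Hypothetical mini-dictionary: synonym -> illustrative identifier.
GENE_DICTIONARY: Dict[str, str] = {
    "tumor necrosis factor": "GENE:TNF",
    "TNF": "GENE:TNF",
    "interleukin 6": "GENE:IL6",
    "IL-6": "GENE:IL6",
}


def find_mentions(text: str, dictionary: Dict[str, str]) -> List[Mention]:
    """Return non-overlapping dictionary matches in reading order."""
    # Longest synonyms first, so the alternation prefers the longer match.
    synonyms = sorted(dictionary, key=len, reverse=True)
    # The lookarounds reject matches embedded in a longer word (e.g. "TNFalpha").
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(s) for s in synonyms) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {syn.lower(): ident for syn, ident in dictionary.items()}
    return [
        Mention(m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "IL-6 and tumor necrosis factor (TNF) are cytokines."
    for mention in find_mentions(sentence, GENE_DICTIONARY):
        print(mention)
```

Real biomedical dictionaries contain many ambiguous synonyms and spelling variants, so a production system needs considerably more than exact, case-insensitive matching.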
Integration in the UIMA framework
The Unstructured Information Management Architecture (UIMA) enables interoperability of components so that they can be combined into customizable, powerful workflows. ProMiner, with various dictionaries, and different instances of conditional random fields (CRFs) are already integrated in UIMA and can be combined in larger text processing structures. Because processing and storage requirements are high in this context, we use UIMA to create customizable text- and image-mining solutions on distributed resources in cluster computing as well as to create grid services.
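The idea of interoperable components can be sketched in a few lines: each component reads from and writes to a shared document structure, so later components can build on the annotations of earlier ones. This sketch does not use the UIMA API, where analysis engines and the Common Analysis Structure (CAS) play the corresponding roles. All names (`Document`, `Annotation`, `run_pipeline`, and the three annotators) are hypothetical, and the regular expressions are toy stand-ins for real recognizers.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str   # annotation type, e.g. "Sentence" or "Gene"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


@dataclass
class Document:
    """Shared analysis structure: the text plus all annotations so far."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, kind: str) -> List[Annotation]:
        return [a for a in self.annotations if a.kind == kind]

    def covered(self, a: Annotation) -> str:
        return self.text[a.start:a.end]


Annotator = Callable[[Document], None]


def sentence_annotator(doc: Document) -> None:
    # Naive splitter: a sentence runs up to the next '.', '!' or '?'.
    for m in re.finditer(r"\S[^.!?]*[.!?]?", doc.text):
        doc.annotations.append(Annotation("Sentence", m.start(), m.end()))


def gene_annotator(doc: Document) -> None:
    # Toy recognizer: upper-case tokens of three or more characters.
    for m in re.finditer(r"\b[A-Z][A-Z0-9]{2,}\b", doc.text):
        doc.annotations.append(Annotation("Gene", m.start(), m.end()))


def cooccurrence_annotator(doc: Document) -> None:
    # Consumes the output of both earlier components.
    genes = doc.select("Gene")
    for s in doc.select("Sentence"):
        inside = [g for g in genes if s.start <= g.start and g.end <= s.end]
        if len(inside) >= 2:
            doc.annotations.append(Annotation("CoOccurrence", s.start, s.end))


def run_pipeline(text: str, annotators: List[Annotator]) -> Document:
    doc = Document(text)
    for annotate in annotators:  # the order of the list defines the workflow
        annotate(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(
        "BRCA1 binds RAD51. TP53 is a tumor suppressor.",
        [sentence_annotator, gene_annotator, cooccurrence_annotator],
    )
    for ann in result.select("CoOccurrence"):
        print(result.covered(ann))
```

Because the components communicate only through the shared document, one recognizer can be replaced by another, or further components appended, without changing the rest of the workflow.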