SCAIView
Fraunhofer Institute for Algorithms and Scientific Computing SCAI
Knowledge Discovery and Semantic Search
SCAIView is an advanced search engine that addresses questions of interest to biomedical researchers. Most of the relevant knowledge is available as unstructured text (publications, text fields in databases). SCAIView supports full-text and biomedical concept searches, backed by large biomedical terminologies and outstanding text mining technologies. Advanced retrieval technology makes it possible to answer complex queries.
- Situation
- Document retrieval
- Knowledge Discovery
- Advantages
- Technology
Situation
SCAIView is an advanced search engine that addresses questions of interest to biomedical researchers. Most of the relevant knowledge is available as unstructured text (publications, text fields in databases). SCAIView supports full-text and biomedical concept searches, backed by large biomedical terminologies and outstanding text mining technologies.
Advanced retrieval technology allows answering complex queries such as:
- Which genes/proteins are related to a certain context (e.g. disease/pathway/epigenetics)?
- Give me an overview of relevant biomedical concepts in my subcorpus
- Which drugs are relevant for this context?
- To which diseases is my gene associated?
- Which chromosomes show linkage to the disease?
- Which variations are mentioned in the context of the disease and could they be found in dbSNP?
- What other diseases are possibly co-occurring with my relevant disease?

- Entity View with aggregated Resultsets and linkout possibilities.
Document retrieval
Documents are retrieved via free-text queries combined with semantic or ontological search for biomedical entities of interest. The biomedical entities are embedded in searchable hierarchies and range from genes, proteins and their associated SNPs to chemical compounds and medical terminology.
With Ontological Filtering, the result can be restricted to a subset, e.g. genes on a KEGG pathway or in a cytoband region.
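The filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers, the pathway name and the `filter_by_pathway` helper are hypothetical and are not part of SCAIView or KEGG.

```python
# Hypothetical data: identifiers and pathway membership are made up
# for illustration and do not come from KEGG or SCAIView.
PATHWAY_MEMBERS = {
    "pathway:example_signalling": {"GENE_A", "GENE_B", "GENE_C"},
}


def filter_by_pathway(hits, pathway_id, members=PATHWAY_MEMBERS):
    """Keep only the hits whose entity belongs to the given pathway."""
    allowed = members.get(pathway_id, set())
    return [hit for hit in hits if hit["entity"] in allowed]


# Entities found in a query result, with the number of supporting documents.
hits = [
    {"entity": "GENE_A", "documents": 12},
    {"entity": "GENE_X", "documents": 40},
    {"entity": "GENE_C", "documents": 3},
]

filtered = filter_by_pathway(hits, "pathway:example_signalling")
print([hit["entity"] for hit in filtered])  # ['GENE_A', 'GENE_C']
```

GENE_X is dropped even though it has the most supporting documents, because it is not a member of the chosen subset.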
Knowledge Discovery
The most important feature of any knowledge discovery tool or search engine is ranking results by relevance. For this we use a measure called relative entropy. Even if a protein such as insulin is mentioned frequently in the context of a search, it is ranked low unless it is mentioned disproportionately often in your specific query result set.
The other hallmark of real knowledge discovery, novelty detection, has been demonstrated in several biomedical applications.
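The text names relative entropy but gives no formula, so the following Python sketch is an assumption rather than SCAIView's actual scoring. It uses the pointwise term p * log2(p / q), where p is the share of documents in the query result that mention an entity and q is the share in the whole corpus. All counts and names are made up.

```python
import math


def relative_entropy_score(in_result, result_size, in_corpus, corpus_size):
    """Pointwise relative-entropy term p * log2(p / q) for one entity.

    p: fraction of documents in the query result mentioning the entity
    q: fraction of documents in the whole corpus mentioning the entity
    """
    p = in_result / result_size
    q = in_corpus / corpus_size
    if p == 0 or q == 0:
        return 0.0
    return p * math.log2(p / q)


def rank_entities(counts, result_size, corpus_size):
    """Return entity names sorted by descending score."""
    scores = {
        name: relative_entropy_score(in_result, result_size, in_corpus, corpus_size)
        for name, (in_result, in_corpus) in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up counts: (documents in query result, documents in whole corpus).
counts = {
    "insulin": (50, 50_000),   # frequent everywhere: p == q
    "GENE_RARE": (20, 200),    # rare overall, enriched in the result
}

ranking = rank_entities(counts, result_size=1_000, corpus_size=1_000_000)
print(ranking)  # ['GENE_RARE', 'insulin']
```

In this toy data insulin appears in more result documents than GENE_RARE, but at the same rate as in the background corpus, so its score is zero and it ranks below the enriched entity.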
Examples
In a review on the “Genetics of intracranial aneurysms”, 18 associated genes are mentioned. A SCAIView query for “intracranial aneurysm AND MESH: genetics”, restricted to human genes/proteins, retrieves 122 genes whose top-ranked hits are contained in the expert review. Novel associated genes not described in the review were also found.

- Documents for retrieved entities could be visualized and entity types are highlighted in different colours.
Advantages
- Superior text mining technology based on approximate search and machine learning
- Support for Confidence Information (adjustment of precision/recall)
- Combination of full text, semantic and ontology search
- Very fast retrieval from large corpora and relevance ranking of retrieved results
- Support for large result sets (e.g. 1 million hits)
- Relevance Ranking on aggregated Entity search results
- Overview of found terminology in defined sub corpora
- Links to relevant biomedical databases (e.g. EntrezGene, dbSNP, KEGG, GO, DrugBank)
- Document Visualisation with user defined highlighting
- Export to Excel or Cytoscape
- Programmatic access via an Application Programming Interface (API)
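The precision/recall adjustment mentioned in the list above can be illustrated with a small Python sketch. How SCAIView exposes its confidence setting is not described here, so the `precision_recall` helper, the confidence values and the gold-standard set are all hypothetical.

```python
def precision_recall(matches, gold, threshold):
    """Precision and recall of the matches at or above a confidence threshold."""
    accepted = {m["entity"] for m in matches if m["confidence"] >= threshold}
    true_positives = len(accepted & gold)
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


# Made-up entity matches with confidence values.
matches = [
    {"entity": "GENE_A", "confidence": 0.95},
    {"entity": "GENE_B", "confidence": 0.70},
    {"entity": "GENE_X", "confidence": 0.40},  # a false positive
]
gold = {"GENE_A", "GENE_B"}  # entities that are actually correct

strict = precision_recall(matches, gold, threshold=0.9)
lenient = precision_recall(matches, gold, threshold=0.3)
print(strict)   # (1.0, 0.5)
print(lenient)  # precision 2/3, recall 1.0
```

Raising the threshold keeps only the most certain matches (higher precision, lower recall); lowering it admits more matches, including the false positive.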
Technology
The selected biomedical entities are found by an approximate search algorithm implemented in ProMiner®, the Fraunhofer-Gesellschaft information extraction tool, which additionally disambiguates synonyms of entities and maps them to unique identifiers in publicly available entity databases. ProMiner® was evaluated as one of the best tools for protein and gene detection in the 2004 and 2006 BioCreAtIvE contests.
In addition, non-enumerable entities such as IUPAC names are found by a machine-learning-based ProMiner plugin.
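ProMiner's algorithm is not described here, so the following Python sketch substitutes a generic technique: approximate string matching with the standard library's `difflib`, followed by a lookup of the matched synonym's identifier. The synonym dictionary, the identifiers and the `normalize` helper are hypothetical.

```python
import difflib

# Hypothetical synonym dictionary: surface form -> unique identifier.
SYNONYMS = {
    "insulin": "ID:0001",
    "ins": "ID:0001",
    "tumor necrosis factor": "ID:0002",
    "tnf-alpha": "ID:0002",
}


def normalize(mention, cutoff=0.8):
    """Map a text mention to an identifier via approximate string matching.

    Returns None when no dictionary entry is similar enough.
    """
    candidates = difflib.get_close_matches(
        mention.lower(), list(SYNONYMS), n=1, cutoff=cutoff
    )
    return SYNONYMS[candidates[0]] if candidates else None


print(normalize("Tumour necrosis factor"))  # ID:0002 (spelling variant)
print(normalize("TNF alpha"))               # ID:0002 (punctuation variant)
print(normalize("hemoglobin"))              # None (not in the dictionary)
```

Both spelling variants resolve to the same identifier, which is the effect the synonym disambiguation aims for. A lower `cutoff` tolerates more variation at the cost of more false matches.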
Requirements
Client
Browser: Firefox >2.0, Internet Explorer >6.0
Server
- Minimum RAM: 2 GB (more is better)
- Operating System: Linux, Windows XP, Windows 7, Solaris
- Application Server: Tomcat >5.5
- Multi-core processors: recommended for near-linear scale-up