Positions for undergraduates and graduates

The Department of Bioinformatics at Fraunhofer SCAI offers positions to students who intend to write a Diploma, Bachelor's or Master's thesis in computer science. Candidates should be interested in one of the following research topics.

Some examples of successful “external” Diploma or Master's thesis projects:


  • Peter Kral, Ludwig Maximilian University of Munich, Germany (Dipl. Bioinformatics, Apr 07) 'Chemical structure recognition via an expert system based graph exploration'
  • Antje Wolf, Freie Universität Berlin, Germany (M.Sc. Bioinformatics, May 06) 'Jenseits von Consensus Scoring: Qualitativer Vergleich von Docking Tools' (Beyond consensus scoring: a qualitative comparison of docking tools)
  • Le Thuy Bui Thi, Bayerische Julius-Maximilians-Universität Würzburg, Germany (Dipl. Computer Science, Mar 05) 'Graph-Rekonstruktion im Rahmen chemischer Strukturrepräsentationen' (Graph reconstruction in the context of chemical structure representations)
  • Oliver Wäldrich, Hochschule Bonn-Rhein-Sieg, Germany (M.Sc. Informatics, Nov 05) 'MetaScheduling-Architekturen in komplexen Grid-Umgebungen' (Meta-scheduling architectures in complex grid environments)

Master's/Diploma Theses or PhD

Chemical Image Analysis

Chemical entities can appear in scientific texts as trivial and brand names, assigned catalog names, or IUPAC names. However, the preferred representation of a chemical entity is often a two-dimensional depiction of its chemical structure. Such depictions appear as images in nearly all electronic sources of chemical information (e.g. journals, reports, patents, and web interfaces of chemical databases).

Nowadays these images are generated with special drawing programs, either automatically from computer-readable file formats or by the chemist through a graphical user interface. Although drawing programs can produce and store the information in a computer-readable format, chemical structure depictions are usually published as bitmap images (e.g. GIF for web interfaces or BMP for text documents). As a consequence, the structure information can no longer be used as input to chemical analysis software. To make published chemical structure information available in a computer-readable format, these images currently have to be converted manually by redrawing each structure, a time-consuming and error-prone process.

To solve the problem of recognizing and translating chemical structures in image documents, our chemoCR system combines pattern recognition techniques with supervised machine learning. The method is based on identifying the most significant semantic entities in structural formulas (e.g. chiral bonds, super atoms, reaction arrows). The workflow consists of three phases: image preprocessing, semantic entity recognition, and molecule reconstruction with validation of the result. Every step of the process uses chemical knowledge to detect and fix errors. The system can be adapted to different sets of input images, and the reconstructed connection table can be used as input to standard chemistry software.
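To illustrate how chemical knowledge can catch recognition errors, here is a minimal Java sketch of one such check: after atoms and bonds have been recognized, any atom whose summed bond orders exceed a standard maximum valence is flagged. The class `ValenceCheck`, the `Bond` record and the simplified valence table are hypothetical assumptions for illustration; this is not the chemoCR implementation, which the text above does not describe at code level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flag atoms whose recognized bonds exceed a simple maximum
// valence. Not the chemoCR code; names and valence values are assumptions.
class ValenceCheck {

    // Simplified maximum valences (assumption: ignores charges and expanded
    // valences, e.g. for S or P). Unlisted elements are not checked.
    static final Map<String, Integer> MAX_VALENCE =
            Map.of("C", 4, "N", 3, "O", 2, "H", 1, "Cl", 1);

    // A recognized bond between atom indices a and b with bond order 1, 2 or 3.
    record Bond(int a, int b, int order) {}

    // Returns the indices of atoms whose bond-order sum exceeds their max valence.
    // Implicit hydrogens are assumed to fill any remaining valence, so an atom
    // with fewer explicit bonds than its valence is not an error here.
    static List<Integer> violations(List<String> atoms, List<Bond> bonds) {
        int[] used = new int[atoms.size()];
        for (Bond bond : bonds) {
            used[bond.a()] += bond.order();
            used[bond.b()] += bond.order();
        }
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < atoms.size(); i++) {
            Integer max = MAX_VALENCE.get(atoms.get(i));
            if (max != null && used[i] > max) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Heavy-atom skeleton of ethanol, C-C-O: no violations.
        List<String> ethanol = List.of("C", "C", "O");
        List<Bond> ok = List.of(new Bond(0, 1, 1), new Bond(1, 2, 1));
        System.out.println(violations(ethanol, ok)); // prints []

        // A misread image: oxygen with a double bond plus an extra single bond.
        List<Bond> misread = List.of(new Bond(0, 1, 1), new Bond(1, 2, 2), new Bond(2, 0, 1));
        System.out.println(violations(ethanol, misread)); // prints [2]
    }
}
```

A flagged atom would point a reconstruction step toward the bond most likely to have been misrecognized; how chemoCR actually repairs such errors is not specified in the text.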

You should be interested in applying your computer science background to cheminformatics. You should have a strong background in at least one of the following fields: graph algorithms and data structures, pattern recognition and machine learning, image analysis, or formal languages. Solid experience in software development (we use Java and Eclipse) is required.

We offer a challenging Master's thesis topic within an industrial project at Germany's largest organization for applied research. You will become part of our software development team. For excellent students who have completed their Master's thesis with us, we can offer a PhD topic.

Text Mining

Providing relevant information on all molecules (genes, proteins, metabolites, drugs) involved in different biological phenomena and diseases is of central importance for developing new hypotheses and analyzing large-scale experiments.

A large fraction of this information is available only as free text in scientific articles, database text fields or patents. Therefore, we develop text mining methods that support the following tasks for terminology and information relevant to the life sciences:

  • named entity recognition
  • terminology extraction and classification
  • relation extraction
  • condensation, retrieval and search.
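As a rough illustration of the first task, named entity recognition, the following Java sketch tags mentions by greedy longest-match lookup against a small dictionary. This deliberately swaps in a very simple dictionary-based technique; the class `DictionaryNer`, its `tag` method and the sample terms are hypothetical and do not describe the methods the group actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary-based named entity recognizer using greedy longest
// match. Illustrative only; not the group's actual text mining method.
class DictionaryNer {

    // A recognized mention: token span [start, end) and its entity class.
    record Mention(int start, int end, String type) {}

    // maxLen is the longest dictionary entry, measured in tokens (assumption:
    // dictionary keys are lower-case, space-separated token sequences).
    static List<Mention> tag(List<String> tokens, Map<String, String> dictionary, int maxLen) {
        List<Mention> mentions = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedEnd = -1;
            String matchedType = null;
            // Try the longest candidate first so "tumor necrosis factor" beats "tumor".
            for (int end = Math.min(tokens.size(), i + maxLen); end > i; end--) {
                String candidate = String.join(" ", tokens.subList(i, end)).toLowerCase();
                String type = dictionary.get(candidate);
                if (type != null) {
                    matchedEnd = end;
                    matchedType = type;
                    break;
                }
            }
            if (matchedType != null) {
                mentions.add(new Mention(i, matchedEnd, matchedType));
                i = matchedEnd; // continue after the matched span
            } else {
                i++;
            }
        }
        return mentions;
    }

    public static void main(String[] args) {
        Map<String, String> dict = Map.of(
                "tumor necrosis factor", "Protein",
                "tumor", "Disease",
                "aspirin", "Drug");
        List<String> tokens = List.of("Aspirin", "lowers", "tumor", "necrosis", "factor", "levels");
        System.out.println(tag(tokens, dict, 3));
    }
}
```

Pure dictionary lookup cannot resolve ambiguous names or spelling variants, which is why the section below mentions statistical approaches such as graphical models for recognition and word sense disambiguation.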

In addition to these natural-language-based methods, we develop application-side methods for:

  • combining textual and database resources on biomedical networks
  • using and systematically analyzing biomedical networks to interpret experimental data
  • different visualization and selection strategies.

As a computer scientist or computational linguist, you should be interested in applying your background to bioinformatics or biomedical text mining. You should have a strong background in at least one of the following fields: graph algorithms and data structures, pattern recognition and machine learning, image analysis, formal languages, or one of the natural language topics above. Solid experience in software development (we use Java and Eclipse) is required.

We also offer interdisciplinary topics for biologists that combine applied molecular biology with methods from computational biology. A background in computer science is helpful but not a prerequisite.

Machine Learning/Data Mining

Machine learning/data mining is a cross-sectional discipline at Fraunhofer SCAI/Bio, and topics for Master's theses are aligned with our application groups in text mining and cheminformatics. In cheminformatics, we are interested in machine-learning-based pattern recognition in images and in improving our structure recognition technology with data mining methods. Our text mining applications cover named entity recognition and word sense disambiguation; here we focus on graphical models.

Additional topics include finding associations in biomedical information and generating disease-specific probabilistic models.

The requirements for a Master's thesis in data mining are good knowledge of statistics and good programming skills. Basic knowledge of databases and familiarity with statistical languages such as R are a plus.

Information Systems

A central scientific challenge we address in information systems is developing intelligent methods that link semantically related databases, with the aim of integrating heterogeneous data sources in the life sciences. This research is motivated by the observation that most life science databases represent selected aspects of, e.g., molecular biology in reasonable detail, but fail to cover the conceptual space spanning from chemistry via biology to medicine.

Rather than pursuing data warehousing approaches, we use abstraction layers such as ontologies to mediate between heterogeneous information sources.

Young scientists interested in data integration and data interoperability are welcome to join our information systems team. A good background in semantic mediation and an open mind toward life science domain knowledge are necessary to succeed in this research area.

Grid Computing

Grid computing is essentially about sharing resources and collaborating. The resources are often geographically dispersed, and collaboration frequently crosses administrative domains. Such scenarios occur in scientific environments, but also in commercial environments with suitable business models.

Using distributed resources for applications or services within a single administrative domain or across multiple domains raises a number of issues that must be resolved in the middleware layer. These include authentication and authorization, orchestration of resources, mapping of applications to suitable compute resources, management of licenses for commercial software, and Service Level Agreements (SLAs). Today, only limited or proprietary solutions to these issues exist, and they usually support only local environments. However, grids and Service-Oriented Architectures (SOA), as evolving technologies for executing applications or services in both scientific and commercial environments, will only become a real option once interoperable, standards-based solutions are available.

The grid middleware research group focuses on solutions to the problems mentioned above. The group is involved in a number of European projects, the German D-Grid and the Open Grid Forum (OGF). Research and development topics we are currently working on include:

  • interoperable authentication and authorization mechanisms
  • mapping of applications to suitable compute resources
  • orchestrating multiple distributed resources and services to improve the quality of service (QoS) for applications in the grid
  • management of licenses for commercial applications in distributed environments
  • negotiation and management of SLAs
  • text mining in the grid.

Additional topics related to grid computing for the life sciences include:

  • grid infrastructures supporting life science informatics
  • gridification of tools and workflows
  • meta-information and service annotation
  • the role of ontologies in service discovery.

You should have a background in distributed computing and a good understanding of the grid concept and the major grid middleware systems. Moreover, as most of our developments target web-service or grid-service environments, you should have experience in web-service programming. Solid experience in software development is required; we develop in Java.

We offer challenging Master's thesis topics in a number of projects. The environment is multidisciplinary, and depending on your background and expertise we are happy to offer thesis topics that bridge the gap between life science applications and grid middleware. You will become part of our research group. For excellent students who have completed their Master's thesis with us, we can offer a PhD topic.