Research Group in Bioinformatics

title page of JCC, showing our cluster analysis

Biological research at AWI in fields such as biodiversity, biomineralisation and temperature adaptation increasingly relies on biological information systems such as genome and proteome databases, combined with tools for information query, management and mapping. The research group in Bioinformatics, part of the scientific computing group of AWI's computing center, was established in 2004 as a facility to provide services to projects that require a background in bioinformatics and data analysis. The group provides access to databases and develops new algorithms and integrated software for use in scientific environments such as the in-house molecular genetics laboratories. In co-operation with Hochschule Bremerhaven and others, students can work in the field of sequence analysis by participating in scientific projects at AWI.
The group also provides technical support and infrastructure for the hosting of databases and applications running on compute systems such as high-performance Linux-clusters installed at the computing center.

We also provide data management support, e.g. for the publication of sequence data and data products in standard data repositories.

Research topics

The bioinformatics working group participates in data analyses in diverse projects at the AWI in phylogenetics, phylogenomics, population genetics, EST, metatranscriptomic and phylochip analyses. Protein targeting and the identification of signal peptides is a further topic of the group in evolution research. We also support research in chemical structure prediction by means of NMR and cluster analysis of mass spectra (LC/MS). This is complemented by research based on molecular dynamics simulations. Within the AWI research program PACES II the group contributes insights into evolutionary adaptation to cold climate, characterising cold-active enzymes by their biophysical properties and in their proteomic context. Recently we entered the field of large-scale, multi-sample metatranscriptomic analyses within the Sea of Change consortium, with an emphasis on efficient data storage and fast taxonomic/functional analysis pipelines at the terabyte scale. Contributions to upcoming genome projects on brown algae (Phaeoexplorer, in collaboration with France) are underway.

Clustering Algorithms

Graph clustering requires an algorithm to cut edges hierarchically. MCL is an example of this, applied here to cut a sequence similarity graph into clusters of similar sequences. Similarity can be computed from BLAST scores or e-values. For cDNA libraries from different species it makes sense to run a full all-against-all tblastx sequence comparison. We have ideas to augment this library overlap estimate with functional data on a core set of eukaryotic functions as proposed, e.g., in CEGMA (see Harms L, Frickenhaus S, Schiffer M, Mark FC, Storch D, Pörtner HO, Held C, Lucassen M. (2013) Characterization and analysis of a transcriptome from the boreal spider crab Hyas araneus. Comparative Biochemistry and Physiology, Part D 8, 344-351, doi:10.1016/j.cbd.2013.09.004, online 2013 Oct 10).
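The idea of clustering a similarity graph with MCL can be illustrated with a small sketch. The following is a minimal pure-Python version of the Markov clustering iteration (expansion, inflation, column normalisation), not the group's actual pipeline, which would normally use the mcl program itself. The function names, the -log10(e-value) edge weight with its cap of 200, the inflation value of 2.0 and the toy edge list are all assumptions made for illustration.

```python
import math


def evalue_to_weight(evalue, cap=200.0):
    """Map a BLAST e-value to an edge weight -log10(e), clipped to [0, cap].

    The transform and the cap are illustrative assumptions.
    """
    if evalue <= 0.0:
        return cap
    return max(0.0, min(cap, -math.log10(evalue)))


def normalize_columns(m):
    """Scale every column of the square matrix m in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0.0:
            for i in range(n):
                m[i][j] /= s
    return m


def mcl(edges, n, inflation=2.0, iterations=50, tol=1e-9):
    """Cluster nodes 0..n-1 from weighted undirected edges (i, j, weight)."""
    m = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        m[i][j] = m[j][i] = w
    for i in range(n):
        m[i][i] = max(m[i]) or 1.0  # self-loop, as strong as the best edge
    normalize_columns(m)

    for _ in range(iterations):
        # Expansion: one step of the random walk (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalise the columns.
        inflated = normalize_columns([[x ** inflation for x in row]
                                      for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break

    # Read clusters as connected components of the remaining non-zero entries.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > 1e-6:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy data: two groups of three sequences, joined by one weak hit.
    strong, weak = evalue_to_weight(1e-50), evalue_to_weight(1e-2)
    toy_edges = [(0, 1, strong), (0, 2, strong), (1, 2, strong),
                 (3, 4, strong), (3, 5, strong), (4, 5, strong),
                 (2, 3, weak)]
    print(mcl(toy_edges, 6))
```

In this toy graph the single weak hit between the two groups is removed by the inflation step, so the two groups of strongly similar sequences end up in separate clusters. A higher inflation value generally yields finer clusters.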

Tool development

Our software development projects include:

  • the standalone phylogenetic annotation tool PhyloGena
  • the software PhylochipAnalyzer for the analysis of phylochips with hierarchical probes
  • tools for sequence-analysis workflows, developed and integrated into the Staden package
  • a set of R scripts for the cluster analysis of molecular dynamics dihedral angles (see publications)
  • R scripts for comparative transcriptomics, providing statistical treatment and display of digital gene-expression profiles and functional relations

See our software page for more information.

ARB support

We support ARB for molecular phylogeny. For example, we help with installation and database merging. Customized ARB databases are available for download. There is also the SINA aligner for aligning your LSU/SSU sequences into