Genome Function Topology


The Genome Function Topology project aims to discover novel principles of the spatial and functional organization of the human genome, by taking into account the global three-dimensional organization of the genome and large-scale knowledge graphs.

The current state of the art for connecting knowledge of genome structure to knowledge of function is to consider the local “genome context”. That is, the sequence and topology of the genome are taken into account, but only for annotating individual genes or gene-disease associations in narrowly defined functional contexts.

A significant limitation of this approach has become apparent in recent years. Advances in high-throughput chromatin interaction analysis have shown that the spatial organization of chromatin in the nucleus is strongly connected to function. Some of this is well established, but much is still being uncovered. For example, it is known that the regulatory regions of a gene, such as promoters and enhancers, can be located far from its coding portion in linear sequence, because the chromosome bends and loops during transcription. Such interactions can span megabases. Less is known, on the other hand, about inter-chromosomal interactions. Moreover, the “clumping” of chromatin into distinct domains, some preferentially located near the center of the nucleus and others near the nuclear envelope, is known to have a regulatory effect. Accordingly, any approach that focuses only on local context will necessarily miss significant links between the spatial and functional organization of the genome.

We follow a fundamentally different approach: we consider the full genome topology by adding global functional context via data-driven annotations and knowledge-based modeling in the Biological Expression Language (BEL). Knowledge graphs derived from full-text articles will be anchored in the human genome, along with a representation of three-dimensional genome structure. One example is functional topological enrichment analysis: given a target gene, find all genes whose enhancer is “close” to the enhancer of the target gene, and identify the pathways in which these genes are involved. The significance of this enrichment can then be assessed against a predefined benchmark analysis. Once this methodology is in place, we can identify other genetic objects of interest for functional enrichment and extend the methodology to them. We can also explore how the genes involved in a pathway of interest are distributed over the genome, or how the knowledge graph differs between conserved and non-conserved regions of the genome.
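The functional topological enrichment analysis described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the project's actual method or data: the gene names, the 3D enhancer coordinates, the pathway memberships, the Euclidean distance with a fixed radius as the notion of “close”, and the one-sided hypergeometric test standing in for the unspecified benchmark analysis.

```python
from math import comb, dist

# Hypothetical toy data: 3D enhancer coordinates (arbitrary units) per gene.
ENHANCER_POS = {
    "GENE_A": (0.0, 0.0, 0.0),
    "GENE_B": (0.5, 0.0, 0.0),
    "GENE_C": (0.0, 0.8, 0.0),
    "GENE_D": (5.0, 5.0, 5.0),
    "GENE_E": (9.0, 1.0, 2.0),
    "GENE_F": (0.3, 0.3, 0.3),
}

# Hypothetical pathway membership (pathway name -> set of genes).
PATHWAYS = {
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C", "GENE_F"},
    "pathway_2": {"GENE_D", "GENE_E"},
}


def spatial_neighbors(target, positions, radius):
    """Genes whose enhancer lies within `radius` of the target's enhancer."""
    origin = positions[target]
    return {
        gene
        for gene, pos in positions.items()
        if gene != target and dist(origin, pos) <= radius
    }


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from
    `population` genes, of which `successes` belong to the pathway."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def pathway_enrichment(target, positions, pathways, radius):
    """Return the spatial neighbors of `target` and, per pathway,
    (overlap count, one-sided hypergeometric p-value)."""
    neighbors = spatial_neighbors(target, positions, radius)
    background = set(positions) - {target}  # all candidate neighbor genes
    results = {}
    for name, members in pathways.items():
        in_background = members & background
        overlap = neighbors & in_background
        p_value = hypergeom_tail(
            len(overlap), len(background), len(in_background), len(neighbors)
        )
        results[name] = (len(overlap), p_value)
    return neighbors, results


if __name__ == "__main__":
    neighbors, results = pathway_enrichment("GENE_A", ENHANCER_POS, PATHWAYS, 1.0)
    print("neighbors:", sorted(neighbors))
    for name, (count, p_value) in results.items():
        print(name, count, p_value)
```

In a real setting, the neighbor sets would presumably come from chromatin interaction data rather than explicit coordinates, and the p-values would need correction for multiple testing across pathways.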

Project duration: 11/2018 – 10/2021