The first human genome was published in 2004, after a sequencing effort that took 13 years to complete [1]. The scientific community was eager to map every human gene, and at the time the genome was estimated to encode 25,000 to 40,000 genes [2]. That number proved to be a significant overestimate: the current estimate is about 19,000 to 20,000 protein-coding genes. These genes account for only about 2% of our genetic material, which raises an important question: what about the other 98% of the genome, often referred to as the dark genome?
Very little is known about the dark genome, but it is thought to encode gene regulatory elements, including repeat sequences, enhancers, and non-coding RNAs. Research into these regions has been limited because drug discovery and biomedical research have focused on identifying and modulating protein-coding genes associated with disease phenotypes [3]. However, some studies of the dark genome have reported that gene regulatory elements occupy more sequence space than the protein-coding genes themselves, and that many single nucleotide polymorphisms (SNPs) lie outside protein-coding genes [2]. These findings suggest that the dark genome holds valuable information and potentially a large number of drug targets. Scientists are using powerful tools such as the CRISPR-Cas9 gene editing system to probe the dark genome by disrupting specific regions and observing the outcome in human cells. Other high-throughput tools will also help identify the roles of specific regions of the dark genome. To date, it is estimated that over 200,000 sequences in the dark genome encode proteins that may be disrupted during disease development [4]. This is not unexpected, as RNA- and protein-coding sequences have been reported throughout the genome in mouse neurons, certain fish, and parasites [4].
A recent study has linked genes encoded in the dark genome directly to disease development. Researchers at the University of Cambridge probed the dark genome to identify new targets for bipolar disorder and schizophrenia, and published their findings at the end of 2021 [4]. Both conditions have a strong genetic component, with heritability of about 70%. However, conventional analyses of known genes have so far been unable to account for this strong heritability, so the researchers turned to the dark genome to identify factors associated with these diseases. Because schizophrenia and bipolar disorder have been described primarily in humans and are associated with cognition, the researchers searched for novel open reading frames (nORFs) in specific regions of the genome called human accelerated regions (HARs). HARs are genomic regions that are largely conserved in other vertebrates but have changed rapidly in the human lineage.
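To make the idea of searching a genomic region for open reading frames more concrete, here is a minimal Python sketch. It is not the pipeline used in the Cambridge study. It assumes an ORF is simply an ATG start codon followed in frame by a stop codon, it scans only the forward strand, and the function names `find_orfs` and `orfs_in_region`, the toy sequence, and the region coordinates are all hypothetical.

```python
# Illustrative ORF scan; a simplified sketch, not the study's actual method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for ATG...stop ORFs in the three forward frames.

    start is the 0-based index of the A in ATG; end is exclusive and
    includes the stop codon. min_codons counts the start and stop codons too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # close this ORF and look for the next ATG
    return sorted(orfs)


def orfs_in_region(orfs, region_start, region_end):
    """Keep ORFs lying entirely inside the half-open region [region_start, region_end)."""
    return [(s, e) for s, e in orfs if s >= region_start and e <= region_end]


# Toy sequence (made up): one ORF, ATG AAA TTT TAG, starting at index 2.
toy_seq = "CCATGAAATTTTAGGG"
toy_orfs = find_orfs(toy_seq)
print(toy_orfs)                          # [(2, 14)]
print(orfs_in_region(toy_orfs, 0, 16))   # [(2, 14)]
print(orfs_in_region(toy_orfs, 5, 16))   # []
```

A real analysis would also need to scan the reverse strand, handle much longer sequences, and check for evidence that a candidate ORF is actually transcribed and translated, none of which this sketch attempts.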
The study identified over 3,000 nORFs, of which 56 were associated with schizophrenia and 40 with bipolar disorder, and the researchers hypothesized that some of the proteins encoded by these nORFs could be viable drug targets. nORF sequences were also identified in transposable elements (TEs), suggesting that nORF products may play a role in gene regulation. Interestingly, changes in the expression level of specific nORF-encoded peptides correlated with specific phenotypes, including psychosis and suicide. This suggests that specific nORFs could be investigated further to develop disease management therapies. Research and drug target discovery in the dark genome are still in their early days, and it will be important to understand the significance of nORF-encoded proteins in disease development and to correlate nORF expression with the heritability of specific diseases.
If the nORF-encoded proteins prove to be viable drug targets, accelerated research into the dark genome may answer several questions about the genetic causes of diseases that have so far perplexed scientists.