We identify ohnologs by content-based synteny comparison of a vertebrate genome with itself and with six outgroup genomes. A flowchart of the overall approach is shown below:
To rule out spurious synteny, we developed a quantitative approach to assess the statistical confidence of each ohnolog pair (Singh et al. PLoS Comput Biol 2015). The resulting "q-score", ranging from 0 to 1, estimates the probability that an ohnolog pair is identified simply by chance. Hence, lower q-scores indicate more statistically significant ohnolog pairs.
Finally, we filter and combine all the initial ohnolog candidates following the computational flowchart above and generate three sets of ohnolog pairs/families with varying levels of statistical significance:
A gene search leads to the Ohnolog Family Page, where the ohnolog families constructed by our approach are displayed. Ohnolog partners within a family are displayed in different columns. A typical ohnolog family, here for the human ERAS gene, is depicted below.
For any gene search, the ohnolog family obtained with the most stringent criteria is displayed. If relaxing the q-score criteria changes the ohnolog family, we show the family under each of the criteria concerned. In the above example, the RAS family is identified only with the relaxed criteria. In the example below, however, relaxing the q-score criteria for the human RGL1 family adds an additional gene, RALGDS, with the intermediate q-score criteria. The relaxed q-score criteria produce no further change in the family, so that version is not shown here.
Genes within the same cell are small-scale duplicates (SSDs), e.g. RGL4 – RALGDS. We use two different separators for SSDs: a comma (,) for a recent SSD (after the 2R-WGD) and a pipe (|) for an ancient SSD (before or around the same time as the 2R-WGD). Hence, RGL4 – RALGDS above were duplicated by a recent SSD, while UBQLN3 | UBQLNL below were duplicated by an ancient SSD.
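As a rough illustration of the separator convention, the sketch below splits the text of one family cell into groups. It assumes that a cell is available as a plain string using exactly the comma and pipe separators described here. The function name parse_cell and the nested-list output are hypothetical and not part of the server.

```python
def parse_cell(cell):
    """Split one family cell into groups of genes.

    Genes inside the same inner list are separated by commas in the cell
    (recent SSDs, after the 2R-WGD). Separate inner lists are divided by a
    pipe in the cell (ancient SSD, before or around the 2R-WGD).
    The plain-string cell format is an assumption for illustration.
    """
    groups = []
    for part in cell.split("|"):
        # Keep non-empty gene names, trimmed of surrounding whitespace.
        genes = [g.strip() for g in part.split(",") if g.strip()]
        if genes:
            groups.append(genes)
    return groups


# Recent SSD: both genes end up in one group.
recent = parse_cell("RGL4, RALGDS")
# Ancient SSD: the genes end up in separate groups.
ancient = parse_cell("UBQLN3 | UBQLNL")
```

A cell containing a single gene yields one group with one entry, so the same function can be applied to every cell of a family row.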
If you wish to generate custom ohnolog families using different criteria, you can use the generate functionality. To generate families, we start with all possible ohnolog pairs, i.e. pairs with any q-score, and filter them according to the user's specifications. We then run a depth-first search to build the custom families and merge old and recent SSDs. These families can be downloaded from the result page.
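The filter-then-search step can be sketched as follows. This is a minimal illustration, not the server's implementation. It assumes that pairs arrive as (gene, gene, q-score) tuples, that a pair is kept when its q-score is at or below a user-chosen cutoff, and that a family is a connected component of the resulting pair graph. The function name build_families, the placeholder gene names and the q-scores are hypothetical, and the SSD merging step is left out.

```python
from collections import defaultdict


def build_families(pairs, max_q):
    """Group genes into families from ohnolog pairs passing a q-score cutoff.

    pairs: iterable of (gene_a, gene_b, q_score) tuples (assumed format).
    max_q: user-chosen cutoff; pairs with q_score <= max_q are kept
           (the inclusive comparison is an assumption).
    Returns a list of families, each a sorted list of gene names.
    """
    # Build an undirected graph from the pairs that pass the filter.
    graph = defaultdict(set)
    for gene_a, gene_b, q_score in pairs:
        if q_score <= max_q:
            graph[gene_a].add(gene_b)
            graph[gene_b].add(gene_a)

    seen = set()
    families = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collecting one connected component.
        stack = [start]
        seen.add(start)
        family = []
        while stack:
            gene = stack.pop()
            family.append(gene)
            for neighbour in graph[gene]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        families.append(sorted(family))
    return families


# Placeholder genes and made-up q-scores, for illustration only.
example_pairs = [
    ("GENE_A", "GENE_B", 0.001),
    ("GENE_B", "GENE_C", 0.02),
    ("GENE_D", "GENE_E", 0.0005),
]
strict_families = build_families(example_pairs, max_q=0.01)
relaxed_families = build_families(example_pairs, max_q=0.05)
```

With the placeholder data, loosening the cutoff lets the GENE_B and GENE_C pair through, so GENE_C joins the family that already contains GENE_A and GENE_B. This mirrors how relaxing the q-score criteria can add a gene to a family.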
More details on the server and methods can be found in our upcoming article. We highly appreciate your feedback to improve the server. Please write to us with bugs or suggestions.