OHNOLOGS

A Repository of Genes Retained from Whole Genome Duplications in the Vertebrate Genomes

Algorithm

We identify ohnologs by content-based synteny comparison of each vertebrate genome with itself and with six outgroup genomes. An overview of our approach is shown in the flowchart below:

[Flowchart: WGD-Outgroups Comparison (left) and WGD Genome Self Comparison (right). Candidate ohnolog pairs from both comparisons, identified from synteny blocks and anchors at window sizes W = 100, 200, 300 and 400, are filtered and combined into the final ohnolog pairs and ohnolog families.]

q-score

To rule out spurious synteny, we developed a quantitative approach to assess the statistical confidence of each ohnolog pair (Singh et al. PLoS Comput Biol 2015). For each pair, this approach computes a "q-score" between 0 and 1 that estimates the probability that the pair was identified by chance. Lower q-scores therefore indicate more statistically significant ohnolog pairs.
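The flowchart combines per-window q-scores by geometric mean, combines the outgroup q-scores by multiplication, and then takes the geometric mean across vertebrate genomes. The following is a minimal Python sketch of those combination steps only, assuming the per-window q-scores for one candidate pair are already available and strictly positive. The function names and the dictionary layout are hypothetical illustrations, not the server's actual code.

```python
from statistics import geometric_mean
from typing import Dict, List


def outgroup_q_score(q_by_outgroup: Dict[str, List[float]]) -> float:
    """Combine outgroup q-scores for one candidate ohnolog pair (hypothetical helper).

    q_by_outgroup maps an outgroup name to its q-scores, one per window size
    (e.g. W = 100, 200, 300, 400). Window sizes are combined by geometric mean,
    then the per-outgroup values are multiplied together.
    """
    combined = 1.0
    for window_scores in q_by_outgroup.values():
        combined *= geometric_mean(window_scores)  # requires positive values
    return combined


def combine_across_genomes(q_scores: List[float]) -> float:
    """Geometric mean of one pair's q-scores from several vertebrate genomes."""
    return geometric_mean(q_scores)


# Two hypothetical outgroups with two window sizes each:
# geometric means are 0.01 and 0.5, so the product is about 0.005.
example = outgroup_q_score({"outgroup-1": [0.1, 0.001], "outgroup-2": [0.5, 0.5]})
```

Multiplying across outgroups means that consistent support from several outgroups drives the combined q-score down, whereas the geometric mean over window sizes averages out the choice of W.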

Finally, we filter and combine all initial ohnolog candidates following the computational flowchart above and generate three sets of ohnolog pairs/families with varying levels of statistical significance:

  • Strict: q-score(outgroup) < 0.01 AND q-score(self) < 0.01
  • Intermediate: q-score(outgroup) < 0.05 AND q-score(self) < 0.3
  • Relaxed: [q-score(outgroup) < 0.05] OR [q-score(outgroup) < 0.5 AND q-score(self) < 0.01]
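The three criteria are nested (every Strict pair is also Intermediate, and every Intermediate pair is also Relaxed), so a pair can be labelled with the most stringent set it satisfies. A minimal Python sketch of this classification, using the thresholds listed above; the function name `classify_pair` is a hypothetical illustration:

```python
from typing import Optional


def classify_pair(q_outgroup: float, q_self: float) -> Optional[str]:
    """Return the most stringent set an ohnolog pair belongs to, or None."""
    if q_outgroup < 0.01 and q_self < 0.01:
        return "Strict"
    if q_outgroup < 0.05 and q_self < 0.3:
        return "Intermediate"
    if q_outgroup < 0.05 or (q_outgroup < 0.5 and q_self < 0.01):
        return "Relaxed"
    return None  # fails all three criteria


print(classify_pair(0.005, 0.005))  # Strict
print(classify_pair(0.3, 0.005))    # Relaxed (second clause)
```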


    Understanding Ohnolog Families

    A gene search leads to the Ohnolog Family page, which displays the ohnolog families constructed by our approach, with the ohnolog partners of each family shown in separate columns. A typical ohnolog family, here for the human ERAS gene, is depicted below:

    RAS Family

    For any gene search, the ohnolog families for the most stringent criteria are displayed. If relaxing the q-score criteria changes an ohnolog family, we show the family for each of the criteria concerned. In the example above, the RAS family is identified only with the Relaxed criteria. In the example below, however, relaxing the q-score criteria for the human RGL1 family adds the gene RALGDS under the Intermediate criteria. The family does not change further under the Relaxed criteria, so it is not shown again.
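The display rule described above can be sketched in Python as follows: walk the criteria from most to least stringent, skip criteria under which the gene has no family, and show a family only when it differs from the last one shown. This is an illustrative sketch under the assumption that the family for each criterion is available as a set of gene names; the function name and the placeholder genes are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

CRITERIA = ("Strict", "Intermediate", "Relaxed")  # most to least stringent


def families_to_display(
    families: Dict[str, Optional[FrozenSet[str]]]
) -> List[Tuple[str, FrozenSet[str]]]:
    """Return (criterion, family) pairs to show for one gene search."""
    shown: List[Tuple[str, FrozenSet[str]]] = []
    last: Optional[FrozenSet[str]] = None
    for criterion in CRITERIA:
        family = families.get(criterion)
        if family is None:
            continue  # no family under this criterion
        if family != last:
            shown.append((criterion, family))
            last = family
    return shown


# Placeholder genes: Intermediate adds "C", Relaxed adds nothing new.
example = families_to_display({
    "Strict": frozenset({"A", "B"}),
    "Intermediate": frozenset({"A", "B", "C"}),
    "Relaxed": frozenset({"A", "B", "C"}),
})
```

For a RAS-like case where a family exists only under the Relaxed criteria, the function returns a single Relaxed entry.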


    RGL Family

    Genes within the same cell are small-scale duplicates (SSDs), e.g. RGL4, RALGDS. We use two different separators for SSDs: a comma (,) marks a recent SSD (after the 2R-WGD, the two rounds of whole genome duplication at the base of vertebrates), and a pipe (|) marks an ancient SSD (before or around the same time as the 2R-WGD). Hence, RGL4 and RALGDS above were duplicated by a recent SSD, while UBQLN3 | UBQLNL below were duplicated by an SSD older than the 2R-WGD.
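When working with downloaded family tables, a cell can be split into its SSD groups. The following Python sketch assumes that a pipe separates the outer, ancient-SSD groups and that commas separate recent-SSD genes within a group; this nesting is an assumption for illustration, and `parse_cell` is a hypothetical helper, not part of the server.

```python
from typing import List


def parse_cell(cell: str) -> List[List[str]]:
    """Split one family-table cell into SSD groups.

    Assumed format: '|' separates groups related by an ancient SSD,
    ',' separates genes related by a recent SSD within one group.
    """
    groups: List[List[str]] = []
    for group in cell.split("|"):
        genes = [gene.strip() for gene in group.split(",") if gene.strip()]
        if genes:
            groups.append(genes)
    return groups


print(parse_cell("RGL4, RALGDS"))     # [['RGL4', 'RALGDS']]
print(parse_cell("UBQLN3 | UBQLNL"))  # [['UBQLN3'], ['UBQLNL']]
```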


    UBQLN1 Family

    Generate Ohnolog Families

    If you wish to generate custom ohnolog families using different criteria, you can use the Generate functionality. To generate families, we start with all possible ohnolog pairs (i.e. with any q-score) and filter them based on the user's specifications. We then run a depth-first search to build the custom families and merge old and recent SSDs. The resulting families can be downloaded from the result page.
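The family-building step can be sketched as a graph problem: genes are nodes, filtered ohnolog pairs are edges, and each connected component found by depth-first search is one family. This minimal Python sketch assumes each pair is given as (gene1, gene2, q_outgroup, q_self) and that the user specifies one maximum q-score for each; the tuple layout and function name are hypothetical, and merging old and recent SSDs is not covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str, float, float]  # (gene1, gene2, q_outgroup, q_self)


def generate_families(
    pairs: Iterable[Pair], max_q_outgroup: float, max_q_self: float
) -> List[List[str]]:
    # Build an undirected graph from the pairs that pass the user's thresholds.
    graph: Dict[str, Set[str]] = defaultdict(set)
    for gene1, gene2, q_out, q_self in pairs:
        if q_out < max_q_outgroup and q_self < max_q_self:
            graph[gene1].add(gene2)
            graph[gene2].add(gene1)

    # Iterative depth-first search: each connected component is one family.
    seen: Set[str] = set()
    families: List[List[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component: List[str] = []
        while stack:
            gene = stack.pop()
            component.append(gene)
            for partner in graph[gene]:
                if partner not in seen:
                    seen.add(partner)
                    stack.append(partner)
        families.append(sorted(component))
    return families


pairs = [("A", "B", 0.001, 0.001), ("B", "C", 0.002, 0.003), ("D", "E", 0.2, 0.2)]
print(generate_families(pairs, 0.01, 0.01))  # [['A', 'B', 'C']]
```

Because families are connected components, two genes can end up in the same family without being a direct ohnolog pair, as A and C do here through B.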


    More details on the server and methods can be found in our upcoming article. We would greatly appreciate your feedback to help improve the server; please write to us with bug reports or suggestions.