The Codon Adaptation Index (CAI)[1] is the most widespread technique for analyzing codon usage bias. Unlike other measures of codon usage bias, such as the 'effective number of codons' (Nc), which quantify deviation from uniform codon usage (the null hypothesis), CAI measures how far the codon usage of a given protein-coding gene sequence deviates from that of a reference set of genes. CAI is used as a quantitative method for predicting the expression level of a gene from its codon sequence.[1]
Rationale
Ideally, the reference set in CAI is composed of highly expressed genes, so that CAI indicates gene expression level under the assumption that translational selection optimizes gene sequences according to their expression levels. The rationale is twofold: in fast-growing organisms, highly expressed genes must compete for resources (i.e. ribosomes), and they also benefit from being translated more accurately. Both hypotheses predict that highly expressed genes mostly use codons read by tRNA species that are abundant in the cell.
Implementation
For each amino acid in a gene, the weight of each of its codons, represented by a parameter termed relative adaptiveness (w_i), is computed from a reference sequence set as the ratio between the observed frequency of the codon, f_i, and the frequency of the most frequent synonymous codon, f_j, for that amino acid.
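Written out, the ratio described above takes the following form. The subscript notation and the index set S(i), the set of codons synonymous with codon i (including i itself), are introduced here for illustration and are not taken from the cited source.

```latex
w_i = \frac{f_i}{\max_{j \in S(i)} f_j}, \qquad 0 < w_i \le 1
```

Under this definition the most frequent codon for each amino acid has weight 1, and rarer synonymous codons have proportionally smaller weights. The lower bound is strict only if every codon is observed at least once in the reference set.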
The CAI of a gene is defined as the geometric mean of the weights associated with each codon over the length (L) of the gene sequence, measured in codons.[2]
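The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`split_codons`, `relative_adaptiveness`, `cai`) are hypothetical, the standard genetic code is assumed, codons never observed in the reference set are given a pseudocount of 0.5 so the geometric mean stays defined, and stop codons plus amino acids with a single codon (Met, Trp) are skipped, which is a common convention. The geometric mean is computed as the exponential of the mean log weight, which is equivalent to (w_1 × w_2 × ... × w_L)^(1/L).

```python
import math
from collections import Counter

# Standard genetic code, built in TCAG order (assumption: no alternative code).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def split_codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Return {codon: w_i} computed from a reference set of coding sequences."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in split_codons(seq) if c in CODON_TABLE)

    # Group codons by the amino acid they encode.
    synonymous = {}
    for codon, aa in CODON_TABLE.items():
        synonymous.setdefault(aa, []).append(codon)

    weights = {}
    for aa, codons in synonymous.items():
        # Skip stop codons and single-codon amino acids (assumed convention).
        if aa == "*" or len(codons) == 1:
            continue
        # Unobserved codons get the pseudocount (an assumption of this sketch).
        freqs = {c: counts[c] or pseudocount for c in codons}
        top = max(freqs.values())
        for c in codons:
            weights[c] = freqs[c] / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scored codons in seq."""
    ws = [weights[c] for c in split_codons(seq) if c in weights]
    if not ws:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(math.log(w) for w in ws) / len(ws))
```

For example, calling `relative_adaptiveness(["GCTGCTGCTGCC", "AAAAAAAAG"])` on a toy reference set gives GCT and AAA a weight of 1, so a gene made only of those codons scores a CAI of 1, while a gene using the rarer GCC and AAG scores lower. Here L counts only the scored codons, which differs from the raw gene length when excluded codons are present.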
References
- Sharp, Paul M.; Li, Wen-Hsiung (1987). "The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications". Nucleic Acids Research. 15 (3): 1281–1295. doi:10.1093/nar/15.3.1281. PMC 340524. PMID 3547335.
- Gerstein, Mark; Bussemaker, Harmen J.; Jansen, Ronald (2003-04-15). "Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models". Nucleic Acids Research. 31 (8): 2242–2251. doi:10.1093/nar/gkg306. ISSN 0305-1048. PMC 153734. PMID 12682375.