Alternate RNA decoding results in stable and abundant proteins in mammals

(a) Distributions of substitution ratios for all SAAP identified in each dataset, computed for each experiment (TMT set, CPTAC data) or sample (label-free data) using MS1 precursor ion intensities. N indicates the number of RAAS computed at the MS1 level in each dataset. (b) Median RAAS were computed for each unique SAAP-BP pair using reporter ion intensities (MS2, CPTAC data) or precursor ion intensities (MS1, label-free data). Distributions of median RAAS across all SAAP in a dataset are shown. N indicates the number of unique SAAP-BP pairs identified in each dataset. (c) Median RAAS across all SAAP identified in each sample were computed using reporter ion intensities (MS2, CPTAC data) or precursor ion intensities (MS1, label-free data). The distributions shown are of these medians across all samples in a dataset. N indicates the number of samples in each dataset. (d) The substitution ratio distributions shown in (a), (b), and (c) have consistent medians, highlighting variability in RAAS across datasets. (e) Upset plot showing the overlap in unique SAAP identified across all datasets. A dataset combination must share at least 10 SAAP to be included in the visualization. (f) Heatmap displaying the percentage of samples in each dataset in which SAAP identified in 6+ datasets are found. Hierarchical clustering shows a cluster of shared SAAP that are commonly identified across the majority of samples in addition to 6+ datasets. (g) To confirm the variability in RAAS across datasets, we examined the subset of SAAP that were identified in at least 1 sample in at least 6 datasets. LUAD and LSCC substitutions consistently have the lowest RAAS, while PDAC substitutions have the highest RAAS. N indicates the number of RAAS computed for shared SAAP in each dataset. (h) Boxplots highlighting the difference between RAAS in CPTAC datasets and RAAS computed in LUAD. Only SAAP shared between LUAD and the compared dataset are used.
Each data point is a log10(RAAS) difference computed for a unique SAAP-BP pair. (i) RAAS as a function of the minimum number of codon-anticodon mismatches needed to incorporate the detected amino acid, across all datasets. (j) An example of a substitution that can be partially explained by synthesis errors arising from a significantly (t-test) higher abundance of the aminoacyl-tRNA ligase supplying the alternatively translated amino acid relative to the abundance of the aminoacyl-tRNA ligase supplying the encoded amino acid. *: q-value < 10⁻³, **: q-value < 10⁻⁵, ***: q-value < 10⁻²⁰. (k) RAAS correlates negatively with the codon stability coefficient, an empirical measure of codon usage. n denotes the number of codons, r is the Pearson correlation, p is the two-sided correlation p-value computed against a non-correlating beta distribution, and the red line is the ordinary least squares fit. (l) RAAS distributions for SAAP identified and validated in human hepatocytes. (m) The stability of SAAP relative to BP in primary human B cells is inversely proportional to their RAAS. r is the Pearson correlation and p is the two-sided correlation p-value computed against a non-correlating beta distribution. (n) Same as (m) but in NK cells. The lower, middle, and upper lines of the boxplots in panels h-j correspond to the first quartile, median, and third quartile, respectively. The upper whisker extends from the third quartile to the largest value and the lower whisker extends from the first quartile to the smallest value, each at most 1.5 × IQR from the hinge. Data beyond the whiskers are outliers that are plotted as individual data points.
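
The per-pair medians in (b) and the boxplot convention described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that RAAS is the ratio of SAAP intensity to BP intensity (the caption does not define it), that quartiles use linear interpolation, and that input records are simple tuples. The function names and peptide sequences are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def log10_median_raas(records):
    """Return log10 of the median RAAS for each unique (SAAP, BP) pair.

    records: iterable of (saap, bp, saap_intensity, bp_intensity).
    ASSUMPTION: RAAS = saap_intensity / bp_intensity.
    """
    ratios = defaultdict(list)
    for saap, bp, saap_int, bp_int in records:
        # Skip missing or zero intensities, which have no defined ratio.
        if saap_int > 0 and bp_int > 0:
            ratios[(saap, bp)].append(saap_int / bp_int)
    return {pair: math.log10(statistics.median(r)) for pair, r in ratios.items()}


def boxplot_stats(values, whisker_factor=1.5):
    """Quartiles, whisker ends, and outliers following the caption's rule.

    Whiskers reach the most extreme data points lying within
    whisker_factor * IQR of the hinges; points beyond them are outliers.
    ASSUMPTION: quartiles use linear interpolation (method="inclusive").
    Requires at least two values.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Hypothetical intensities for one SAAP-BP pair across three samples.
    records = [
        ("PEPTLDE", "PEPTIDE", 1.0, 100.0),
        ("PEPTLDE", "PEPTIDE", 10.0, 100.0),
        ("PEPTLDE", "PEPTIDE", 100.0, 100.0),
    ]
    print(log10_median_raas(records))

    # Hypothetical log10(RAAS) values; 2.0 lies beyond the upper fence.
    print(boxplot_stats([-3.0, -2.5, -2.0, -1.5, -1.0, 2.0]))
```

Taking the median on the linear scale and then applying log10 gives the same result as the median of log10 values whenever a pair has an odd number of measurements. With an even count the two conventions can differ slightly, so the choice here is also an assumption.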