Welcome to CUBAP’s documentation!

Codon Frequency

Graphs:

Average Codon Frequency
For the selected gene(s)/isoform(s), the average number of times each codon occurs across all samples.
Standard Deviation of Frequency
For the selected gene(s)/isoform(s), the standard deviation of the number of times each codon occurs across all samples.
Average Codon Frequency by Superpopulation
This violin plot shows the frequency of the selected codon, in the selected gene(s)/isoform(s), across all populations. Thicker sections indicate that more samples contain that number of codon occurrences.
Average Frequency by Population
This graph shows the same information as the violin plot except individual subpopulation averages are shown. Each point represents the average frequency of the selected codon for all samples in that subpopulation.
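The two summary graphs above can be illustrated with a minimal sketch. It assumes each sample contributes one in-frame coding sequence and uses the population standard deviation; the documentation does not say which form of standard deviation CUBAP uses. The names `codon_counts`, `codon_summary`, and the toy sequences are hypothetical and are not taken from the CUBAP scripts.

```python
from collections import Counter
from statistics import mean, pstdev


def codon_counts(cds):
    """Count each codon in a coding sequence, reading in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_summary(sample_sequences):
    """Return {codon: (mean count, standard deviation)} across all samples."""
    per_sample = [codon_counts(seq) for seq in sample_sequences]
    codons = sorted(set().union(*per_sample))
    summary = {}
    for codon in codons:
        # A Counter returns 0 for codons that a sample does not contain.
        values = [counts[codon] for counts in per_sample]
        # Assumption: population standard deviation (pstdev), not sample.
        summary[codon] = (mean(values), pstdev(values))
    return summary


# Hypothetical toy data: one short sequence per sample.
samples = ["ATGGCTGCTTAA", "ATGGCTGCCTAA", "ATGGCTGCTTAA"]
summary = codon_summary(samples)
```

Here `summary["GCT"]` holds the average and spread of the GCT count over the three toy samples, which is the kind of value a single bar in each graph represents.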

Options:

Select Gene
Use the search and dropdown box features to find your gene(s) of interest. Click on the gene name to query it. You can query multiple genes by holding either the ‘CTRL’ or ‘command’ key. Next to the genes are the isoform numbers; the longest ones are marked. You must next select an isoform for every gene you’ve selected in order to view the data in the graphs.
View Options
If you have queried multiple genes or isoforms, you can view the data in two different ways. Select the ‘Compare Genes’ button to view the codon frequencies of different genes side by side, or select the ‘Average of Genes’ button to view the average codon frequencies across all selected genes/isoforms. By default, the ‘Average of Genes’ view option is selected. These view options only affect the Average Codon Frequency and Standard Deviation of Frequency graphs.
Select Population
Choose a super- or subpopulation to view only the codon frequency data from those populations. This will filter every graph.
Select Codon
Choose a codon of interest to view how its frequency, in the selected gene(s)/isoform(s), varies across all populations. This only alters the Average Codon Frequency by Superpopulation and Average Frequency by Population graphs.

Identical Codon Pairing & Co-tRNA Codon Pairing

These visuals are set up identically to the Codon Frequency visual, except that instead of showing the frequency of each codon, they show the frequency of each codon pair. For co-tRNA codon pairing, synonymous (but not identical!) pairs are shown by their common amino acid.

Codon Aversion

Graphs:

Total Codon Aversion Across all Genes by Superpopulation
This graph shows how often each codon is averted. Specifically, it is the total number of alleles, summed for each gene, in which the codon is not present.
Total Number of Alleles per Superpopulation with Codon
Across the x-axis is the codon aversion motif (all the codons that are missing in the selected gene(s)/isoform(s)), plotted against the number of alleles, per superpopulation, that are missing those codons.
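As a rough illustration of the two graphs above, the sketch below derives a codon aversion motif (the set of codons absent from a sequence) and counts, per superpopulation, the alleles that lack a given codon. It assumes all 64 codons are considered, since the documentation does not say whether stop codons are excluded. The function names and toy alleles are hypothetical, and the `AFR`/`EUR` labels are used only as example group keys.

```python
from itertools import product

# Assumption: all 64 codons are considered, stop codons included.
ALL_CODONS = {"".join(p) for p in product("ACGT", repeat=3)}


def aversion_motif(cds):
    """Return the set of codons that never occur in the in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    present = {cds[i:i + 3] for i in range(0, usable, 3)}
    return frozenset(ALL_CODONS - present)


def alleles_missing_codon(alleles_by_superpop, codon):
    """Count, per superpopulation, the alleles in which `codon` is absent."""
    return {
        pop: sum(codon in aversion_motif(seq) for seq in seqs)
        for pop, seqs in alleles_by_superpop.items()
    }


# Hypothetical toy alleles grouped by superpopulation label.
alleles = {"AFR": ["ATGGCTTAA", "ATGGCCTAA"], "EUR": ["ATGGCTTAA"]}
missing_gcc = alleles_missing_codon(alleles, "GCC")
```

In this toy data, one of the two `AFR` alleles and the single `EUR` allele lack GCC, so each group contributes one allele to the count for that codon.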

Options:

Select Gene
Use the search and dropdown box features to find your gene(s) of interest. Click on the gene name to query it. You can query multiple genes by holding either the ‘CTRL’ or ‘command’ key. Next to the genes are the isoform numbers; the longest ones are marked. You must next select an isoform for every gene you’ve selected in order to view the data in the graphs.
Compare Subpopulations
Click this button to view the number of alleles that are missing codons by subpopulation instead of superpopulation. This only affects the Total Number of Alleles per Superpopulation with Codon graph. Click the Reset button to view superpopulation data again.
Select Subpopulations
This filter only applies to the Total Number of Alleles per Subpopulation with Codon graph. It allows you to view the number of alleles for only certain subpopulations.

Ramp Sequences

Graphs:

Ramp Harmonic Mean RSCU by Subpopulation
For all selected gene(s)/isoform(s), the harmonic mean of all RSCU values for each codon in the ramp sequence. This is plotted in a box and whiskers plot by subpopulation.
Ramp Harmonic Mean RSCU by Superpopulation
For all selected gene(s)/isoform(s), the harmonic mean of all RSCU values for each codon in the ramp sequence. This is plotted in a box and whiskers plot by superpopulation.
Gene Harmonic Mean RSCU by Subpopulation
For all selected gene(s)/isoform(s), the harmonic mean of all RSCU values for each codon in the entire gene sequence. This is plotted in a box and whiskers plot by subpopulation.
Gene Harmonic Mean RSCU by Superpopulation
For all selected gene(s)/isoform(s), the harmonic mean of all RSCU values for each codon in the entire gene sequence. This is plotted in a box and whiskers plot by superpopulation.
Percent Samples With Ramp
A pie chart that shows what percentage of individuals in the selected superpopulation(s) or subpopulation(s) have a ramp sequence in the selected gene. Only populations that have at least one individual with a ramp sequence are shown.
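The harmonic mean RSCU values plotted above can be sketched as follows. RSCU (relative synonymous codon usage) is taken here in its usual sense: a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. Several points are assumptions rather than CUBAP's documented method: RSCU is computed from the gene's own codon counts, the standard genetic code is used, stop codons are treated as one synonymous family, sequences contain only A/C/G/T, and the ramp is passed in as a hypothetical `n_codons` prefix length. Ramp detection itself is not shown.

```python
from collections import Counter
from itertools import product
from statistics import harmonic_mean

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu_table(codons):
    """RSCU per observed codon: count * family size / family total."""
    counts = Counter(codons)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    table = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            if counts[c]:
                table[c] = counts[c] * len(family) / total
    return table


def harmonic_mean_rscu(cds, n_codons=None):
    """Harmonic mean RSCU over the first n_codons codons (all if None)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    table = rscu_table(codons)  # assumption: RSCU from this gene's counts
    return harmonic_mean([table[c] for c in codons[:n_codons]])


# Hypothetical toy gene; the first 3 codons stand in for a ramp.
gene = "ATGGCTGCTGCCTAA"
ramp_value = harmonic_mean_rscu(gene, n_codons=3)
gene_value = harmonic_mean_rscu(gene)
```

Computing one such value per sample and grouping the results by subpopulation or superpopulation would give the data behind each box in the plots.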

Averages:

Ramp Sequence

  • RSCU: the average RSCU value for all codons in the ramp sequence, from all populations
  • Length: the average length of the ramp sequence, in number of codons, from all populations

Entire Gene

  • RSCU: the average RSCU value for all codons in the entire gene sequence, from all populations
  • Length: the average length of the entire gene sequence, in number of codons, from all populations

If multiple genes/isoforms are selected, these values are also averaged across all of them.

Options:

Select Gene
Use the search and dropdown box features to find your gene(s) of interest. Click on the gene name to query it. You can query multiple genes by holding either the ‘CTRL’ or ‘command’ key. Next to the genes are the isoform numbers; the longest ones are marked. You must next select an isoform for every gene you’ve selected in order to view the data in the graphs.
Show Superpopulation
Switch the box plots to group RSCU values by superpopulation.
Show Subpopulation
Switch the box plots to group RSCU values by subpopulation.
Pie Chart
Show the frequency of individuals/samples in a population that have a ramp sequence in a pie chart.
Table
Show the frequency of individuals/samples in a population that have a ramp sequence in a table. All populations are shown.

Nucleotide Composition

Graphs:

Average Nucleotide Frequency
The average number of occurrences of each nucleotide in the selected gene(s)/isoform(s).
Standard Deviation of Nucleotide Frequency
The standard deviation of the number of occurrences of each nucleotide in the selected gene(s)/isoform(s).
GC Content Across Populations
A violin plot of GC content compared across superpopulations.
Nucleotide Frequency by Superpopulation
The frequency of the selected nucleotide for each superpopulation.
Nucleotide Frequency by Superpopulation and Subpopulation
The frequency of the selected nucleotide for each subpopulation, ordered by superpopulation. The Average GC Content % is also shown. This is computed from all selected genes/isoforms and from all populations.
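The quantities behind these graphs are simple per-sequence counts. The sketch below assumes GC content is the percentage of G and C among the A, C, G, and T bases of a sequence; the function names are hypothetical and the toy sequence is for illustration only.

```python
from collections import Counter


def nucleotide_counts(seq):
    """Count A, C, G and T in a sequence (other symbols are ignored)."""
    counts = Counter(seq.upper())
    return {base: counts[base] for base in "ACGT"}


def gc_content_percent(seq):
    """Percentage of G and C among the counted bases; 0.0 for no bases."""
    counts = nucleotide_counts(seq)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts["G"] + counts["C"]) / total


# Hypothetical toy sequence.
counts = nucleotide_counts("ATGGCTTAA")
gc = gc_content_percent("ATGGCTTAA")
```

Averaging these per-sample values within each population would give the numbers summarized in the graphs.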

Options:

Select Gene
Use the search and dropdown box features to find your gene(s) of interest. Click on the gene name to query it. You can query multiple genes by holding either the ‘CTRL’ or ‘command’ key. Next to the genes are the isoform numbers; the longest ones are marked. You must next select an isoform for every gene you’ve selected in order to view the data in the graphs.
GC Content
This is selected by default. It shows the GC Content Across Populations graph.
Nucleotide Frequency
Click this button to view the Nucleotide Frequency by Superpopulation and Nucleotide Frequency by Superpopulation and Subpopulation graphs.

Frequently Asked Questions

  • Why are some of the graphs blank? Because these graphs compare population differences, but the selected gene(s)/isoform(s) have only one allele in the entire 1000 Genomes sample and thus do not differ at all between populations.
  • Why do some of the isoforms say “longest”? These are the longest isoforms for their particular genes. If several isoforms of a gene were equally long, the first isoform was marked as the longest. If a gene has only one isoform, that isoform was marked as the longest.
  • How can I upload my own data to CUBAP? This feature is not available due to constraints on the free license of Power BI Desktop and how Power BI connects data to its visuals. To perform your own analyses, please see our GitHub for scripts that will allow you to compute codon usage biases: https://github.com/kauwelab/cubap
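The tie-breaking rule for the longest isoform described in the FAQ can be sketched in a few lines. This assumes isoforms are given as `(isoform_id, length)` pairs in the order they are listed; the function name and the toy isoforms are hypothetical.

```python
def longest_isoform(isoforms):
    """Return the id of the longest isoform; ties go to the earliest one."""
    # max() returns the first maximal item it encounters, which matches
    # the "first isoform wins" rule for equally long isoforms.
    return max(isoforms, key=lambda iso: iso[1])[0]


# Hypothetical isoforms as (id, length in codons), in listed order.
marked = longest_isoform([("iso1", 300), ("iso2", 450), ("iso3", 450)])
only = longest_isoform([("iso1", 120)])
```

With the toy input, `iso2` is marked because it appears before the equally long `iso3`, and a gene with a single isoform always has that isoform marked.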