Bioinformatics and Genomics

Main article text

Introduction

Materials and Methods

Sequence alignment

Testing for positive selection

Testing for conservation

Testing for recombination

Evaluating polymorphic diversity in the pandemics of 2020

Analysis of RNA and protein structures

Results

Positive and negative selection are highly localized within coronavirus genomes

The gene encoding Spike protein is under persistent positive selection

Genes encoding Nsp4 and Nsp16 contain branch-specific signals of positive selection

Recombination does not account for most signals of positive selection

Recent changes in allele frequency may result from positive selection and hitch-hiking

Discussion

Conclusions

Supplemental Information

Derived and ancestral states of substitutions in Nsp16.

Derived and ancestral substitutions in Nsp16, relative to the ancestral sequence reconstruction. Derived substitutions are those that occurred in the SARS-CoV-2 lineage; ancestral substitutions are those that occurred on the branch leading to Bat-CoV-RaTG13.
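The derived/ancestral labelling described above can be illustrated with a minimal Python sketch. It assumes that the SARS-CoV-2, Bat-CoV-RaTG13, and reconstructed ancestral Nsp16 sequences are already aligned to the same length. The function name polarize_substitutions is hypothetical, and this is not the pipeline used to build the table. A site where SARS-CoV-2 differs from the ancestor is labelled derived, and a site where RaTG13 differs from the ancestor is labelled ancestral, following the definitions given for this table.

```python
# Hypothetical helper, not the authors' pipeline: polarize substitutions in an
# aligned region against a reconstructed ancestral sequence.
def polarize_substitutions(sars2: str, ratg13: str, ancestor: str) -> list[tuple[int, str, str, str]]:
    """Return (1-based position, ancestral base, new base, label) tuples."""
    if not (len(sars2) == len(ratg13) == len(ancestor)):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos, (s, r, a) in enumerate(zip(sars2, ratg13, ancestor), start=1):
        if "-" in (s, r, a):
            continue  # skip alignment columns containing a gap
        if s != a:
            # Change on the SARS-CoV-2 branch.
            calls.append((pos, a, s, "derived"))
        if r != a:
            # Change on the branch leading to Bat-CoV-RaTG13.
            calls.append((pos, a, r, "ancestral"))
    return calls
```

For example, polarize_substitutions("ACGT", "ACGA", "ACGT") reports a single substitution at position 4 (T to A) on the RaTG13 branch.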

DOI: 10.7717/peerj.10234/supp-1

Raw data, analytical pipelines, R scripts to analyze the data.

To decompress the archive, use the Linux unzip command. Files in bedGraph format can be opened with the libraries and commands implemented in the R script.
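The R script in the archive is the intended way to read these files. For readers working outside R, the following is a minimal Python sketch of a bedGraph reader. It assumes the standard four-column bedGraph layout (chromosome, 0-based start, exclusive end, value) and skips "track", "browser", and comment lines. The exact meaning of the value column in the supplemental files is not confirmed here, and the names BedGraphRecord and parse_bedgraph are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BedGraphRecord:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    value: float


def parse_bedgraph(lines: Iterable[str]) -> Iterator[BedGraphRecord]:
    """Yield one record per data line of a four-column bedGraph file."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, header lines, and comments.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        # Fields may be tab- or space-separated; only the first four are used.
        chrom, start, end, value = line.split()[:4]
        yield BedGraphRecord(chrom, int(start), int(end), float(value))
```

Because parse_bedgraph accepts any iterable of lines, an open file handle can be passed to it directly.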

The raw data show the rates of evolution for each branch, together with their p-values and adjusted p-values.
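The source does not state which multiple-testing correction was used to produce the adjusted p-values. Purely for illustration, the following sketch assumes the Benjamini-Hochberg false discovery rate procedure, which is the same rule as R's p.adjust with method = "BH". The function name benjamini_hochberg is hypothetical. Each p-value is scaled by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this assumption, the inputs [0.01, 0.04, 0.03, 0.20] become approximately [0.04, 0.0533, 0.0533, 0.20].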

DOI: 10.7717/peerj.10234/supp-2

Validation of adaptiPhy performance.

(A) Distribution of selection along the SARS-CoV-2 reference genome. (B) A SARS-CoV-2 strain that currently dominates pandemic sampling (which we call the D614G strain, after its mutation in the Spike protein). (C) Four artificial mutations added near the mutation at position 14,408 of the D614G strain. (D) Nine artificial mutations added near the mutation at position 14,408 of the D614G strain.

DOI: 10.7717/peerj.10234/supp-3

Structural changes at the protein and RNA level of Nsp4.

(A) Tertiary structure predictions of Nsp4 in SARS-CoV-2 (red) overlaid with those of three other species (gray). We observe structural differences in the overlap of Nsp4 between SARS-CoV-2 and Pan-CoV-GD, and between SARS-CoV-2 and SARS-CoV (black arrows). (B) Thermodynamic ensemble predictions and MFE mountain plots for Nsp4 in SARS-CoV-2, Bat-CoV-RaTG13, Pan-CoV (Guangdong) and SARS-CoV at 37 °C. Regions under positive selection are shaded in red.

DOI: 10.7717/peerj.10234/supp-4

Structural changes at the protein and RNA level of Nsp16.

(A) Tertiary structure predictions of Nsp16 in SARS-CoV-2 (red) overlaid with those of three other species (gray). We observe structural differences in the overlap of Nsp16 between SARS-CoV-2 and SARS-CoV (black arrows). (B) Thermodynamic ensemble predictions and MFE mountain plots for Nsp16 in SARS-CoV-2, Bat-CoV-RaTG13, Pan-CoV (Guangdong) and SARS-CoV at 37 °C. Regions under positive selection are shaded in red.

DOI: 10.7717/peerj.10234/supp-5

Structural changes at the RNA level of Nsp16 relative to a common ancestor of SARS-CoV-2 and Bat-CoV-RaTG13.

MFE mountain plots for the forward and reverse strands of Nsp16 in SARS-CoV-2, Bat-CoV-RaTG13, the reconstructed ancestor of SARS-CoV-2 and RaTG13, Pan-CoV-GD and SARS-CoV at 37 °C. Regions under positive selection shaded in red.
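A mountain plot represents a secondary structure as a height profile, where the height at each position is the number of base pairs that are open at that point. The following minimal sketch computes that profile from a dot-bracket string of the kind RNAfold prints for the MFE structure. It assumes that the energy annotation has already been stripped and that the structure uses only "(", ")" and ".", so pseudoknot notation is not handled. The function name mountain_heights is hypothetical. This sketch covers only the MFE curve and not the thermodynamic ensemble, which is derived from base-pair probabilities.

```python
def mountain_heights(dot_bracket: str) -> list[int]:
    """Height after each position = number of '(' minus number of ')' so far."""
    heights = []
    level = 0
    for ch in dot_bracket:
        if ch == "(":
            level += 1  # a new base pair opens
        elif ch == ")":
            level -= 1  # the innermost open base pair closes
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
        if level < 0:
            raise ValueError("unbalanced structure: ')' without matching '('")
        heights.append(level)
    if level != 0:
        raise ValueError("unbalanced structure: unclosed '('")
    return heights
```

For example, the hairpin "((..))" gives the profile [1, 2, 2, 2, 1, 0]. Plotting this profile against nucleotide position produces the mountain shape.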

DOI: 10.7717/peerj.10234/supp-6

Secondary structure of the 5’ UTR (first 474 bp) as predicted by RNAfold.

The C>U mutation at position 241 is within the SL5B stem-loop structure, indicated with a red arrow. The Pan-CoV-GD 5’ UTR sequence is missing the first 129 bp relative to the Wuhan SARS-CoV-2 reference sequence and so is not included here.

DOI: 10.7717/peerj.10234/supp-7

Site Frequency Spectrum.

The x axis shows the alternative allele frequency across 5,000 SARS-CoV-2 genomes, and the y axis shows the relative frequency of mutations in each allele-frequency class. Alternative alleles were identified by comparing each SARS-CoV-2 genome with the NC_045512 reference genome.
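The following minimal Python sketch shows how a site frequency spectrum of this kind can be assembled. It assumes that every genome is already aligned to reference coordinates as an uppercase string with the same length as the reference. Only A, C, G and T are counted as calls, so N and gap characters are ignored. The y value is the fraction of variable sites in each frequency bin. The function names and the choice of ten equal-width bins are hypothetical and are not taken from the original analysis.

```python
from collections import Counter


def alt_allele_frequencies(reference: str, genomes: list[str]) -> dict[int, float]:
    """Map each variable 1-based position to its non-reference allele frequency."""
    freqs = {}
    for pos, ref_base in enumerate(reference, start=1):
        # Keep only unambiguous base calls at this position.
        called = [g[pos - 1] for g in genomes if g[pos - 1] in "ACGT"]
        if not called:
            continue
        alt = sum(1 for base in called if base != ref_base)
        if alt > 0:
            freqs[pos] = alt / len(called)
    return freqs


def site_frequency_spectrum(freqs: dict[int, float], n_bins: int = 10) -> list[float]:
    """Fraction of variable sites falling in each equal-width frequency bin."""
    # A frequency of exactly 1.0 is placed in the last bin.
    counts = Counter(min(int(f * n_bins), n_bins - 1) for f in freqs.values())
    total = len(freqs)
    return [counts[b] / total if total else 0.0 for b in range(n_bins)]
```

For example, with the reference "ACGT" and the genomes "ACGT", "ACGA", "TCGA" and "ACNA", positions 1 and 4 have alternative allele frequencies of 0.25 and 0.75.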

DOI: 10.7717/peerj.10234/supp-8

Protein structures of Nsp4, Nsp5, and Nsp16 were predicted by PHYRE2.

Pairwise comparisons of structural similarity with FATCAT show that all were significantly similar.

DOI: 10.7717/peerj.10234/supp-9

Additional Information and Declarations

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Alejandro Berrio conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Valerie Gartner conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Gregory A. Wray conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability

The following information was supplied regarding data availability:

The raw data are available in the Supplemental Files.

The analytical pipelines and raw data are available in GitHub: https://github.com/wodanaz/adaptiPhy/blob/master/applications/.

Funding

This work received no direct funding, but publication costs were supported by the COPE fund at Duke University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.