Introduction

Atypical porcine pestivirus (APPV), first identified in the USA in 2015, is a novel pestivirus that causes congenital tremor (CT) type A-II and has a worldwide distribution (Arruda et al. 2016; de Groof et al. 2016; Hause et al. 2015; Mosena et al. 2018). In China, our group first reported APPV infection in swine herds in 2016, and subsequent investigations showed outbreaks of the disease in China (Zhang et al. 2017). APPV is an enveloped, single-stranded, positive-sense RNA virus with a genome of 11 to 12 kb encoding a polyprotein of 3635 amino acids. This polyprotein is cleaved into mature viral proteins, including four structural proteins (C, Erns, E1 and E2) and eight non-structural proteins (Npro, P7, NS2, NS3, NS4A, NS4B, NS5A and NS5B) (Hause et al. 2015; Zhang et al. 2017). Recently, a new uniform naming system for pestiviruses was proposed with the format pestivirus X, where X is a different capital letter for each species; under this system APPV belongs to pestivirus K (Smith et al. 2017).

E2 is the major envelope glycoprotein of APPV and, as in other pestiviruses, it is highly immunogenic (Cagatay et al. 2019). APPV is a highly variable RNA virus (up to 21% genetic distance between clinical strains), and E2 shows similar variability (Postel et al. 2017). Although a subunit vaccine based on APPV E2 induced strong humoral and cellular immune responses in mice, much work remains to be done (Zhang et al. 2018a). A comprehensive analysis of the genetic characteristics of E2 of this emerging virus is necessary for further research on virus evolution and for developing new control strategies.

Synonymous codons are not used at random in viruses, prokaryotes and eukaryotes, a phenomenon called codon usage bias (CUB) (Butt et al. 2014). Mutation pressure, natural selection, gene length, tRNA abundance and RNA structure can all affect codon usage bias, and the first two are thought to be the most important factors for RNA viruses (Butt et al. 2014). Given the genetic variability of APPV complete genomes and the E2 gene, and the lack of codon usage analyses for this virus, we investigated the codon usage bias for the first time to understand the mechanism of codon distribution and the evolution of the virus.

Materials and Methods

Sequence Data and Recombination Analysis

After removing duplicate and cloned sequences, E2 gene coding sequences from 71 APPV isolates and 60 complete coding sequences available up to November 2019 were collected from the GenBank database (http://www.ncbi.nlm.nih.gov). The sequences originated from countries worldwide, and the GenBank accession number, strain name, collection year and country of each are given in Supplementary Table S1. Because recombination events could influence the codon usage bias of genomes or genes, recombination analysis was performed with Recombination Detection Program 4.0 (RDP4) using the RDP, Bootscan, MaxChi, GENECONV, Chimaera, SiScan and 3Seq methods (Martin et al. 2015). The P value cut-off was set to 0.05 and the Bonferroni correction was applied in all analyses. A genome was considered recombinant if at least three of the seven algorithms consistently supported the event, and all recombinants were excluded from further analysis.
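The decision rule described above (a corrected P value below 0.05 in at least three of the seven methods) can be illustrated with a minimal sketch. RDP4 applies this kind of filtering internally; the function name `is_recombinant`, the input layout and the P values below are hypothetical and serve only to make the rule concrete.

```python
ALPHA = 0.05      # P value cut-off stated in the text
MIN_METHODS = 3   # at least three of the seven methods must agree


def is_recombinant(p_values, alpha=ALPHA, min_methods=MIN_METHODS):
    """Apply the consensus rule to one candidate recombination event.

    p_values maps a method name to its Bonferroni-corrected P value,
    or to None when the method did not detect the event (assumed layout).
    """
    supporting = [m for m, p in p_values.items() if p is not None and p < alpha]
    return len(supporting) >= min_methods


# Hypothetical P values for one candidate event (not taken from the study).
example_event = {
    "RDP": 1e-6, "Bootscan": 3e-4, "MaxChi": 0.02, "GENECONV": None,
    "Chimaera": 0.30, "SiScan": None, "3Seq": 0.08,
}
print(is_recombinant(example_event))
```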

Nucleotide Content and Codon Usage Composition

The overall nucleotide compositions (A%, C%, U% and G%) of each sequence were analyzed with BioEdit (version 7.0.9.0). The nucleotide compositions at the third codon position (A3s, U3s, C3s and G3s) and the GC content at the third codon position (GC3s) were computed using CodonW 1.4.4.
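As an illustration of these quantities, the following sketch computes the overall base composition and a simplified GC3s for one coding sequence. It is not the BioEdit or CodonW implementation: the function names are hypothetical, and GC3s is taken here as the G + C fraction at the third position of all codons except AUG, UGG and the stop codons, which is an assumption about how the index is defined.

```python
from collections import Counter

# Codons without synonymous alternatives (Met, Trp) and the three stop codons.
NON_SYNONYMOUS = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def to_rna(seq):
    """Upper-case a coding sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def nucleotide_percentages(seq):
    """Return the overall A, C, G and U content as percentages."""
    counts = Counter(to_rna(seq))
    total = sum(counts[base] for base in "ACGU")
    return {base: 100.0 * counts[base] / total for base in "ACGU"}


def gc3s(seq):
    """G + C fraction at the third position of synonymous codons (simplified)."""
    rna = to_rna(seq)
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    usable = [codon for codon in codons if codon not in NON_SYNONYMOUS]
    return sum(codon[2] in "GC" for codon in usable) / len(usable)


# Toy coding sequence, not an APPV sequence.
toy = "ATGGCCAAAGGGTTTTAA"
print(nucleotide_percentages(toy), gc3s(toy))
```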

Relative Synonymous Codon Usage (RSCU)

The RSCU value is the ratio of the observed usage frequency of a specific codon to the frequency expected if all synonymous codons for the same amino acid were used equally; it is widely applied to standardize codon usage bias between different sequences (Chen et al. 2014). An RSCU value of 1 indicates no codon bias, that is, the synonymous codons for that amino acid are used equally. An RSCU value > 1 indicates positive codon usage bias, whereas an RSCU value < 1 indicates negative codon usage bias. Moreover, codons with RSCU values above 1.6 or below 0.6 are considered over-represented or under-represented, respectively (Singh and Pandey 2017; Taylor et al. 2017). AUG and UGG, which have no synonymous codons, and the three stop codons (UAG, UAA and UGA) were excluded from the RSCU analysis (Butt et al. 2016). The RSCU index was calculated using the following equation:

$${\text{RSCU}} = \frac{X_{ij}}{\sum_{j = 1}^{n_{i}} X_{ij}} n_{i}$$

In this formula, Xij is the number of occurrences of the jth codon for the ith amino acid, and ni is the number of synonymous codons (the degeneracy) encoding the ith amino acid. The RSCU value of each codon was calculated using the online software EMBOSS cusp (http://emboss.toulouse.inra.fr/cgi-bin/emboss/cusp). Swine is the host of APPV, and the RSCU values of swine were downloaded from the codon usage database (http://www.kazusa.or.jp/codon/).
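The RSCU formula can be made concrete with a short sketch. This is not the EMBOSS cusp implementation; it assumes the standard genetic code, and the names `CODON_TABLE`, `SYNONYMOUS_FAMILIES`, `rscu` and `classify` are introduced here for illustration only. The classification thresholds are the 1.6 and 0.6 cut-offs given above.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# The 59 codons analysed: stop codons, AUG (Met) and UGG (Trp) are excluded.
SYNONYMOUS_FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa not in ("*", "M", "W"):
        SYNONYMOUS_FAMILIES.setdefault(_aa, []).append(_codon)


def rscu(seq):
    """Return {codon: RSCU} for every amino acid present in the sequence."""
    rna = seq.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3))
    values = {}
    for codons in SYNONYMOUS_FAMILIES.values():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue  # amino acid absent from this sequence
        for codon in codons:
            # X_ij * n_i / sum_j X_ij
            values[codon] = counts[codon] * len(codons) / family_total
    return values


def classify(value):
    """Label an RSCU value with the cut-offs used in the text."""
    if value > 1.6:
        return "over-represented"
    if value < 0.6:
        return "under-represented"
    return "neither"


toy = rscu("GCUGCUGCUGCC")  # toy alanine-only sequence
print(toy["GCU"], classify(toy["GCU"]))
```

Within each amino acid the RSCU values sum to the degeneracy of that amino acid, which is a quick sanity check on any implementation.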

Effective Number of Codon (ENC)

The ENC value is a simple and absolute measure of codon usage bias in genes and genomes; it quantifies the degree of CUB and reflects the extent of preference among synonymous codons. ENC values range from 20 to 61. A value of 61 indicates no CUB, and a value of 20 indicates maximum bias, with only one codon used for each amino acid. A gene is generally considered to have an obvious codon usage bias if its ENC value is equal to or less than 35.

The ENC value was calculated using the following equation:

$${\text{ENC}} = 2 + \frac{9}{{F_{2} }} + \frac{1}{{F_{3} }} + \frac{5}{{F_{4} }} + \frac{3}{{F_{6} }}$$

where Fi (i = 2, 3, 4, 6) is the average of the Fi values for i-fold degenerate codon families. The Fi value is calculated using the following equation:

$$F_{i} = \frac{{n\mathop \sum \nolimits_{j = 1}^{i} \left( {\frac{{n_{j} }}{n}} \right)^{2} - 1}}{n - 1}$$

where n is the total number of observed codons for one amino acid and nj is the number of occurrences of the jth codon for that amino acid.
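The two equations above can be combined in a short sketch. It is a simplified reading of the ENC calculation, not the software used in the study: it assumes the standard genetic code, skips amino acids observed fewer than twice or with a non-positive F value, raises an error when a degeneracy class has no usable amino acid, and caps the result at 61. These handling rules and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional UCAG ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def effective_number_of_codons(seq):
    """Simplified ENC: 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    rna = seq.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    f_by_class = defaultdict(list)  # degeneracy -> F values of its amino acids
    for codons in families.values():
        degeneracy = len(codons)
        n = sum(counts[c] for c in codons)
        if degeneracy == 1 or n < 2:
            continue  # Met, Trp, or too few observations
        f = (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)
        if f > 0:
            f_by_class[degeneracy].append(f)

    missing = [k for k in (2, 3, 4, 6) if not f_by_class[k]]
    if missing:
        raise ValueError(f"no usable amino acid for degeneracy classes {missing}")

    mean_f = {k: sum(f_by_class[k]) / len(f_by_class[k]) for k in (2, 3, 4, 6)}
    enc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(enc, 61.0)
```

Two limiting cases are useful checks: a sequence that uses a single codon for every amino acid gives 20, and a sequence that uses all synonymous codons equally gives a value that is capped at 61.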

To determine the factors influencing codon usage bias, an ENC-plot of ENC values against GC3s was generated using GraphPad Prism 6.0. The expected ENC for each GC3s value was calculated as follows:

$${\text{ENC}}_{{{\text{expected}}}} = 2 + s + \left( {\frac{29}{{s^{2} + \left( {1 - s} \right)^{2} }}} \right)$$

where s represents the frequency of G + C at the third position of synonymous codons (GC3s). In the ENC-plot, if codon usage is constrained only by GC3s, the observed ENC values lie on or around the expected curve, which indicates that mutation pressure is almost the only factor driving evolution. If several factors constrain codon usage, the observed ENC values lie well below the expected curve (Zhang et al. 2018b).
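The expected curve is simple enough to evaluate directly. The sketch below implements the expected-ENC formula and the vertical distance of an observed point from the curve; the function names and the example values (an observed ENC of 55.0 at GC3s = 0.5) are hypothetical and not taken from the data set.

```python
def expected_enc(s):
    """Expected ENC when codon usage is constrained only by GC3s (s in [0, 1])."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def enc_deficit(observed_enc, s):
    """Distance by which an observed ENC value falls below the expected curve."""
    return expected_enc(s) - observed_enc


# Hypothetical strain: observed ENC 55.0 at GC3s 0.5.
print(expected_enc(0.5), enc_deficit(55.0, 0.5))
```

A positive deficit places the point below the curve, which the text interprets as evidence that factors other than mutation pressure also constrain codon usage.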

General Average Hydropathicity (Gravy) and Aromaticity (Aroma)

CodonW 1.4.4 was used to calculate the Gravy and Aroma values. Gravy and Aroma reflect the frequencies of hydrophobic and aromatic amino acids, respectively, which strongly affect the codon usage pattern (Xu et al. 2015). They are also two major indices of natural selection.
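For orientation, both indices can be sketched for a translated protein sequence. The sketch assumes that Gravy is the mean Kyte-Doolittle hydropathy value over all residues and that Aroma is the fraction of Phe, Tyr and Trp residues; this is the usual definition, but the exact CodonW implementation may differ in details, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = {"F", "W", "Y"}


def _residues(protein):
    """Keep only the 20 standard residues (drops stops and ambiguity codes)."""
    return [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]


def gravy(protein):
    """Mean hydropathy of the protein; positive values are more hydrophobic."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein):
    """Fraction of residues that are Phe, Trp or Tyr."""
    residues = _residues(protein)
    return sum(aa in AROMATIC for aa in residues) / len(residues)


print(gravy("AIL"), aromaticity("FWYA"))  # toy peptides
```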

Principal Component Analysis (PCA)

PCA is a multivariate statistical method that identifies major trends of variation by analyzing the relationship between variables and samples (Bera et al. 2017; Chen et al. 2014). To further explore the trends in codon usage pattern among the different APPV strains, PCA was performed with the Statistical Package for the Social Sciences (SPSS) 22. In the analysis, the RSCU values of each APPV strain were arranged into a 59-dimensional vector corresponding to the 59 synonymous codons (excluding AUG, UGG and the three termination codons). In other words, the RSCU values were transformed into uncorrelated variables. PCA plots were constructed with the first two axes, which accounted for the largest share of the codon usage variation among genes. PCA figures were drawn with GraphPad Prism 6.0.
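The idea of extracting the first two axes and their share of the total inertia can be sketched without any statistics package. The example below is not the SPSS procedure: it uses the covariance matrix of the centred data and plain power iteration with deflation, chosen only to keep the sketch dependency-free, and the function name and toy data are hypothetical. In practice each row would be the 59-value RSCU vector of one strain.

```python
import math
import random


def principal_axes(rows, n_axes=2, iterations=1000, seed=0):
    """Return [(share_of_inertia, axis_vector), ...] for the leading axes."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total = sum(cov[i][i] for i in range(d))  # total inertia (trace)
    if total == 0:
        raise ValueError("data have no variance")

    rng = random.Random(seed)  # random start avoids a start vector in the null space
    axes = []
    for _ in range(n_axes):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        eigenvalue = 0.0
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                eigenvalue = 0.0
                break
            v = [x / norm for x in w]
            eigenvalue = norm  # converges to the leading eigenvalue
        axes.append((eigenvalue / total, v))
        # Deflation: remove the axis just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return axes


# Toy two-dimensional data, not RSCU values.
toy_rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print([round(share, 3) for share, _ in principal_axes(toy_rows)])
```

The shares returned for the first two axes correspond to the "data inertia" percentages reported for the PCA, and their sum is the cumulative inertia explained by the two-axis plot.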

Neutrality Plot Analysis

Neutrality plot analysis was carried out to explore the effects of natural selection and mutation pressure in shaping the CUB by regressing GC12s on GC3s (Chakraborty et al. 2019). Figures were drawn using GraphPad Prism 6.0. In the plot, each point represents an individual APPV strain. A regression slope close to 1 indicates that mutation pressure is the main force shaping the CUB, whereas a slope close to 0 indicates that natural selection plays the dominant role. Moreover, a slope close to ± 0.5 was taken to indicate no or weak external selection pressure.
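A minimal sketch of the regression behind the neutrality plot follows. It fits GC12s on GC3s by ordinary least squares and converts the slope into the two percentages commonly reported, assuming that relative neutrality is the absolute slope times 100 and relative constraint is the remainder. The function names and the four data points are hypothetical.

```python
def least_squares_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def neutrality_summary(gc3s, gc12s):
    """Regress GC12s on GC3s and express the slope as percentages."""
    slope, intercept = least_squares_line(gc3s, gc12s)
    neutrality = abs(slope) * 100.0  # assumed: mutation pressure share
    return {
        "slope": slope,
        "intercept": intercept,
        "relative_neutrality_pct": neutrality,
        "relative_constraint_pct": 100.0 - neutrality,
    }


# Hypothetical strains (one point per strain), not values from the study.
gc3s_values = [0.40, 0.45, 0.50, 0.55]
gc12s_values = [0.480, 0.475, 0.470, 0.465]
print(neutrality_summary(gc3s_values, gc12s_values))
```

For these toy points the slope is about -0.1, which under the stated assumption corresponds to roughly 10% relative neutrality and 90% relative constraint.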

Correlation Analysis

Correlation analyses were conducted using Pearson's method to investigate the relationships among the nucleotide compositions, codon compositions, Gravy, Aroma and the principal axes.
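Pearson's correlation coefficient, used throughout the correlation analyses, can be written in a few lines. The sketch below returns only r; it does not compute the P values reported in the tables, and the function name and the toy inputs are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy values standing in for, e.g., per-strain Gravy and ENC.
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```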

Results

Recombination Analysis

Two recombination events, involving the complete sequences MG792803 and MH493896, were detected by the RDP software. These two recombinant isolates were excluded, and the remaining coding sequences were carried forward to the subsequent analyses.

Nucleotide Compositions of the APPV E2 Gene and Complete Genomes

Analysis of the composition of the E2 coding sequences showed the following: (1) The mean nucleotide contents, from high to low, were A% (30.75 ± 0.48), G% (27.42 ± 0.51), U% (21.45 ± 0.52) and C% (20.36 ± 0.48). (2) The mean compositions at the third codon position, from high to low, were A3s (39.91 ± 1.85), C3s (30.87 ± 2.04), G3s (28.95 ± 1.61) and U3s (26.18 ± 2.10). (3) The mean AU% (52.21 ± 0.64) was higher than the mean GC% (47.79 ± 0.64). The same pattern was observed for complete genomes. Detailed information is listed in Supplementary Tables S2 and S3.

RSCU and ENC Analysis

RSCU values of the 59 synonymous codons were calculated to investigate the codon usage bias of APPV complete genomes and the E2 gene, and the effect of the pig host on the codon usage pattern. As shown in Table 1, of the 18 preferred synonymous codons, six were shared between the host and both the APPV complete genomes and the E2 gene. Moreover, twelve preferred codons of the APPV E2 gene were A/U-ended (A-ended: 8; U-ended: 4) and six were G/C-ended (G-ended: 2; C-ended: 4), and the result for complete genomes was similar (A-ended: 8; U-ended: 3; G-ended: 3; C-ended: 4). Additionally, seven codons of the APPV E2 gene were over-represented (mean RSCU value > 1.6) and thirteen were under-represented (mean RSCU value < 0.6), whereas only two codons of complete genomes were over-represented and seven were under-represented. These findings show that codon usage bias exists in APPV genomes and that the bias of the E2 gene is more pronounced than that of complete genomes.

Table 1 Overall RSCU of collected sequences of the APPV E2 gene and complete genomes

The ENC values of the E2 gene ranged from 49.15 to 61.00 with a mean of 52.74 ± 2.69 (mean ± SD), and those of complete genomes ranged from 53.85 to 55.14 with a mean of 54.52 ± 0.25. These high ENC values (> 40) indicate a slightly low CUB. Of note, the mean ENC of APPV complete genomes was higher than that of the E2 gene and its standard deviation was much lower, which suggests that the CUB of APPV complete genomes is more stable and weaker than that of the E2 gene. Detailed ENC values are shown in Supplementary Tables S2 and S3.

ENC-Plot Analysis and Correlation Analyses of Nucleotide Compositions and ENC

An ENC-plot was used to investigate the role of mutational pressure in shaping codon usage bias. As shown in Fig. 1, most of the points lay below the expected curve for both APPV complete genomes and the E2 gene, which indicates that several factors besides mutational pressure, including natural selection, gene length and RNA structure, influenced the codon usage bias. In particular, the E2 gene of one strain isolated from China lay slightly above the expected curve and a few other values lay around the curve, which reveals that several strains had an obviously low codon usage bias and that codon usage across the E2 genes was relatively unstable.

Fig. 1

The relationship between the ENC values and GC3s. a ENC plot for the E2 gene. Most of the points were lower than the standard curve, which indicates that mutational pressure and other factors both influenced the codon usage bias. b ENC plot for APPV complete genomes. All the points were lower than the standard curve. The enlarged view is indicated by the arrow

To further determine the effect of mutational pressure on the codon usage bias, correlation analyses between nucleotide compositions and ENC were carried out for APPV complete genomes and the E2 gene (Table 2). More than half of the pairwise correlations were strong (r > 0.5), and strong correlations appeared more frequently for APPV complete genomes. Furthermore, the ENC value was only weakly correlated with each nucleotide content. Taken together, these results suggest that mutation pressure and natural selection both affect the codon usage bias of APPV and that natural selection has a more obvious influence on the E2 gene than on complete genomes.

Table 2 Correlation analyses of nucleotide compositions and ENC

Principal Component Analysis (PCA)

PCA of the RSCU values of the coding sequences was carried out to obtain the distribution of each vector (Fig. 2a, b). For the E2 gene, the first axis accounted for 23.23% of the data inertia and the second axis accounted for 17.86%. For APPV complete genomes, the first axis accounted for 34.67% and the second axis accounted for 19.69%. The first two axes therefore explained 41.10% and 54.35% of the inertia, respectively, which suggests that they capture the characteristics of the overall codon usage trend of the E2 gene and complete genomes. The PCA plot of the first axis against the second axis was then drawn according to country (Fig. 2c, d). Points from different countries were dispersed, confirming that mutation pressure was one of the factors shaping the CUB. In particular, the majority of Chinese strains clustered into two groups, with only several strains showing diversity, which reveals that, besides mutation pressure, geographic selection (natural selection) contributed more to shaping the CUB of Chinese APPV strains than of strains from other countries.

Fig. 2

Principal component analysis. a, b The distributions of the first 20 vectors by PCA for APPV E2 gene and complete genomes, respectively. Columns represent the relative inertia and the curve represents the cumulative inertia. c, d The PCA plot 1st axis against 2nd axis for E2 gene and complete genomes. Regarding E2 gene, the first axis accounted for 23.23% of the data inertia and the second axis accounted for 17.86%. Regarding APPV complete genomes, the first axis accounted for 34.67% and the second axis accounted for 19.69%. Different countries are represented by different colors

Correlation analyses were also performed to determine the correlations between the first two axes and the codon compositions of the E2 gene and complete genomes (Table 3). However, the majority of correlations between codon compositions and axes were weak, which suggests that mutational pressure was a minor factor contributing to the codon usage bias of the virus.

Table 3 Summary of correlation between the first two axes and nucleotide constraints

The Role of Natural Selection in the Codon Usage Bias

To determine the role of natural selection, correlation analyses were applied to evaluate the relationship between the Gravy and Aromaticity values and the codon bias indices (Axis1, Axis2, ENC, GC3s and GC). As shown in Table 4, most correlations were significant, with P values well below 0.01. In particular, for the E2 gene, strongly significant correlations were found between Gravy and Axis1 as well as between Gravy and ENC. These analyses suggest that the hydrophobicity and aromaticity of amino acids are associated with the codon usage variation in APPV genomes, and reveal that natural selection played an important role in shaping the codon usage pattern.

Table 4 Correlation analyses among AROMO, GRAVY, the first two axes, ENC, GC3s and GC

Neutrality Plot Analysis

To determine the main factor shaping the codon usage pattern of the APPV E2 gene and complete genomes, neutrality plot analysis was performed by comparing the value of GC12s and GC3s (Fig. 3a, b). Regarding APPV E2 gene, the GC3s was significantly correlated with GC12s (r = − 0.243, P < 0.0001) and the slope of the regression line indicated that relative neutrality (mutation pressure) was 5.34% and relative constraint (natural selection) was 94.66%. Regarding APPV complete genomes, a significant correlation was also found between GC3s and GC12s (r = − 0.495, P < 0.0001) and the slope of the regression line indicated that relative neutrality (mutation pressure) was 10.30% and relative constraint (natural selection) was 89.70%. Therefore, natural selection was the main force while mutation pressure was a minor force influencing the codon usage pattern of the APPV E2 gene and complete genomes.

Fig. 3

Neutrality analysis with GC3s plotted against GC12s. a, b The neutrality analysis for the E2 gene and complete genomes, respectively. The regression line is represented by the straight line and the regression equation is shown on the plot

Discussion

APPV is an emerging virus with a worldwide distribution that is regarded as a threat to global swine health. Owing to marked differences in genetic and biological properties, APPV is thought to be distinct from established pestiviruses (Postel et al. 2017; Zhang et al. 2017). Although the variability and evolution of APPV have been investigated previously, a deeper and more systematic investigation is needed to fill the gaps (Pan et al. 2019). E2 is the major envelope glycoprotein of APPV and the crucial target for vaccine development, yet its codon usage bias during evolution is still obscure. Systematic analysis of the codon usage bias of the E2 gene and complete genomes will help to clarify these questions.

Viruses constantly evolve to adapt to their environment and host. Codon usage bias is an important manifestation of gene evolution, and ENC is a simple measure of the degree of codon usage bias. Generally, the higher the ENC value, the lower the codon usage bias. For the E2 gene of APPV, ENC values ranged from 49.15 to 61.00 with a mean of 52.74 ± 2.69, which is higher than that of other pestiviruses, including CSFV (52.33) and BVDV (51.43) (Chen et al. 2017; Ma et al. 2018). It is worth noting that the ENC value of one Chinese APPV strain even reached the maximum, which suggests that there is no codon usage bias in this strain. A low codon usage bias allows the virus to use several codons for each amino acid, which benefits viral replication in host cells and persistent infection (Chen et al. 2014; Zhang et al. 2018b). This may account for the observation that adult sows and some piglets with APPV viremia did not display any symptoms of CT (Munoz-Gonzalez et al. 2017).

Generally, mutational pressure is considered a more crucial factor than natural selection for RNA viruses (Peixoto et al. 2003; Tao and Yao 2019). In this study, first, although correlation analyses of nucleotide compositions and ENC showed some strong correlations (r > 0.5), weak correlations were also detected, especially between ENC values and each nucleotide content. Secondly, almost all points lay below the standard curve for both APPV complete genomes and the E2 gene, although a few values lay around the curve. Additionally, significant correlations were found with the Gravy and Aromaticity values, which are two major indices of natural selection. Finally, neutrality plot analysis showed that relative constraint (natural selection) was 94.66% for the E2 gene and 89.70% for complete genomes. Overall, these results indicate that natural selection was the main force and mutational pressure played a minor role in determining the CUB of APPV genomes and the E2 gene.

In summary, the E2 gene of APPV displayed a slightly low codon usage bias, and natural selection was the main force, with mutation pressure a minor force, influencing the codon usage pattern of APPV complete genomes and the E2 gene. These results on the codon usage pattern of APPV genomes provide valuable basic data for explaining the evolution of this emerging virus.