Introduction

Capsicum belongs to the family Solanaceae and comprises more than 200 species. Capsicum annuum L., Capsicum chinense Jacq., Capsicum frutescens L., Capsicum pubescens L. and Capsicum baccatum L. are the most frequently cultivated species. Mexico, America, Australia, Britain, Bangladesh, Sri Lanka and India are the major producers of chilli. Bhut-Jolokia (C. chinense Jacq.) is widely grown in the North-East region of India, specifically in Assam, Nagaland and Manipur. In 2007, Bhut-Jolokia entered the Guinness World Records as the world’s hottest chilli, with 1,001,304 Scoville heat units (SHU) [1]. It is commonly known as Bhut-Jolokia (Ghost Chilli) or Bih Jolokia (Poison Chilli) in Assam, U-morok (Tree Chilli) in Manipur and Naga-Jolokia in Nagaland. Random amplification of polymorphic DNA (RAPD) marker analysis revealed Bhut-Jolokia to be an interspecific hybrid of C. chinense and C. frutescens [1]. However, based on phenotype, molecular similarity of internal transcribed spacer (ITS) regions and proteomic evaluation, it has also been designated as a distinct species (C. assamicum) [2]. Bhut-Jolokia is used as a spice and is also known to possess great medicinal properties [3]. However, its production is seriously affected by viral diseases, primarily those caused by begomoviruses. Begomovirus is the largest genus of the family Geminiviridae; its members infect dicotyledonous plants and are transmitted by the whitefly (Bemisia tabaci Genn.). Since viruses are obligate parasites, a host and a vector are indispensable for their replication and transmission. The family Geminiviridae is classified into nine genera on the basis of genome structure, mode of transmission and host range; of these, the begomoviruses form the largest group and affect food crops such as tomato, potato, chilli, brinjal and radish.
Chilli leaf curl disease is the most common infection; it causes leaf curl in the early stages, leading to stunted growth, abscission of flower buds and obstruction of pollen development. The viral infection further enhances infestation by pests such as thrips and mites, which can cause complete crop loss [4]. More than twenty species of geminiviruses have been identified in association with leaf curl disease of chilli/pepper. These include Chilli leaf curl virus (ChiLCV), Chilli leaf curl Palampur virus (ChiLCPV), Tomato leaf curl New Delhi virus (ToLCNDV), Tomato leaf curl Joydebpur virus (ToLCJoV) and Chilli leaf curl Vellanad virus (ChiLCVeV) [5,6,7]. The typical symptoms of leaf curl disease are upward and downward leaf curling, puckering of leaves, leaf rolling, yellowing of veins and stunted growth. Association of betasatellites further aggravates disease development through interference with host defense pathways and causes severe phenotypic abnormalities, resulting in complete yield loss in several economically important plants [8,9,10,11].

The present paper reports two full-length DNA-A genomes of Cotton leaf curl Multan virus (CLCuMuV) from infected chilli cv. Bhut-Jolokia plants collected from eight different places in Manipur. Screening of the viral population also led to the isolation of Tomato leaf curl Patna betasatellite (ToLCuPaB) associated with CLCuMuV. Phylogenetic analysis of the cloned begomovirus molecules confirmed their relatedness to CLCuMuV and ToLCuPaB. Further analysis of recombination breakpoints in the begomovirus genome, together with the per cent GC content of the genome at the putative recombination sites, revealed the probable pattern of genetic variation that occurred during the course of evolution. To the best of our knowledge, this is the first report of the association of CLCuMuV with infected Bhut-Jolokia plants, marking the spread of the virus to a new host in the North-East region of India.

Materials and methods

Field survey and sample collection

Chilli “Bhut-Jolokia” growing fields were surveyed during 2017 at different places in Manipur state, India, viz., Thongju, Heingang, Lamphel, Kiyamgei, Khurai, Koirengei and Andro (Fig. 1A). Infected leaf samples were collected based on symptoms of leaf curling, leaf puckering, leaf thickening, yellowing of veins and curling of the stem (Fig. 1B). The samples were stored at −80 °C until processed for the detection and cloning of viral genomes.

Fig. 1

Field locations and sample collection. A Map displaying the geographical locations of the chilli var. Bhut-Jolokia fields used for sample collection in Manipur: (a) Thongju (b) Heingang (c) Lamphel (d) Kiyamgei (e) Khurai (f) Heingang I (g) Koirengei and (h) Andro. B Pictorial representation of infected (I, II, III, IV) and healthy (V) plants of Bhut-Jolokia

Total DNA extraction and rolling circle amplification (RCA)

Total DNA extraction was carried out using the CTAB method [12, 13], and the integrity and quantity were checked using agarose gel electrophoresis and a spectrophotometer (Thermo Scientific, USA), respectively. The isolated total DNA samples were then subjected to rolling circle amplification (RCA) using ϕ29 DNA polymerase (Thermo Scientific, USA) to specifically enrich the begomoviral DNA and associated satellite components. The RCA products were then used as templates for individual polymerase chain reactions (PCR) with Phusion high-fidelity DNA polymerase (Thermo Scientific, USA) to amplify DNA-A, DNA-B, alphasatellite and betasatellite molecules using degenerate primer pairs (available in Molecular Virology Lab, JNU) (Supplementary Table S1).

Cloning of viral genomes

PCR-amplified products were purified using a gel-extraction kit (GeneJET™ Gel Extraction Kit, Thermo Scientific, USA) and ligated into the blunt-end cloning vector pJET1.2/blunt (Thermo Scientific, USA). The ligated product was used to transform competent cells of E. coli strain DH5α. Transformed clones were randomly selected from each transformation experiment, grown in liquid Luria–Bertani (LB) medium and used for plasmid isolation by the alkaline-lysis method [14]. The plasmids were subjected to restriction digestion with EcoRV for DNA-A clones and KpnI for betasatellite clones.

Restriction fragment length polymorphism (RFLP) and sequencing of viral genome

The positive viral clones were further digested with various restriction enzymes to analyse the restriction pattern for any variation in the viral molecules. Clones carrying the DNA-A molecule were screened using BamHI, PstI, EcoRI and HindIII, whereas clones carrying the betasatellite molecule were screened using EcoRI and PstI (GeNei, India; Thermo Scientific, USA). Up to 24 clones were assessed from each sample to study the polymorphism, and the digested products were resolved on a 1.2% agarose gel. Since the profiles obtained after restriction analysis showed a similar pattern, one representative clone from each sample was submitted for Sanger sequencing at a commercial DNA sequencing facility, University of Delhi South Campus, India.
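The logic of an RFLP comparison can be illustrated in silico. The following is a minimal sketch, not the procedure used in this study (which relied on gel electrophoresis of digested clones): it assumes a circular sequence is available as a string, uses the standard recognition sequences of the four enzymes named above, and reports fragment sizes after complete digestion. The function names are hypothetical.

```python
# Recognition sequences of the enzymes named in the text (all palindromic).
SITES = {
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
}


def site_positions(seq, site):
    """Return 0-based start positions of `site` in a circular sequence."""
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so that a site spanning the
    # origin of the circular molecule is also detected.
    wrapped = seq + seq[: len(site) - 1]
    return [i for i in range(n) if wrapped.startswith(site, i)]


def fragment_lengths(seq, site):
    """Fragment sizes (bp) after complete digestion of a circular molecule.

    The exact cut offset inside the site is ignored: it shifts every cut
    by the same amount and therefore does not change fragment sizes.
    """
    n = len(seq)
    cuts = site_positions(seq, site)
    if not cuts:
        return []  # the circle stays uncut
    sizes = []
    for i, p in enumerate(cuts):
        q = cuts[(i + 1) % len(cuts)]  # next cut, wrapping around the circle
        sizes.append((q - p - 1) % n + 1)  # a single cut yields the full length
    return sorted(sizes, reverse=True)


def rflp_profile(seq, enzymes=SITES):
    """Map each enzyme name to its sorted list of fragment sizes."""
    return {name: fragment_lengths(seq, site) for name, site in enzymes.items()}
```

Two clones would be considered to share a restriction profile, under this simplification, when `rflp_profile` returns identical dictionaries for both sequences.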

Sequence analysis

The sequence chromatograms obtained were manually inspected, and the overlapping reads were assembled into the complete viral genome. The reconstructed viral sequences were analysed using NCBI ORF finder (https://www.ncbi.nlm.nih.gov/orffinder/) to examine the coding regions. Sequences were then subjected to standard nucleotide BLAST (blastn) to find homologous sequences deposited in GenBank. Sequence datasets were then aligned using the multiple sequence comparison by log-expectation algorithm (MUSCLE), followed by molecular evolutionary analysis with the Molecular Evolutionary Genetics Analysis tool (MEGA-X) [15].
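The idea behind scanning a genome for coding regions can be sketched as follows. This is a simplified illustration only, not the NCBI ORF finder algorithm: it assumes ATG as the only start codon and the standard stop codons, treats the sequence as linear (so an ORF spanning the origin of a circular genome would be missed), and uses hypothetical function names.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_len=90):
    """Return (start, end) pairs of ATG...stop ORFs in the three frames.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    found.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return found


def find_orfs(seq, min_len=90):
    """Scan both strands; report (strand, start, end) in forward coordinates."""
    n = len(seq)
    forward = [("+", s, e) for s, e in orfs_in_strand(seq, min_len)]
    reverse = [("-", n - e, n - s) for s, e in orfs_in_strand(revcomp(seq), min_len)]
    return forward + reverse
```

In a begomovirus DNA-A, ORFs reported on the `+` strand would correspond to virion-sense genes such as AV1 and those on the `-` strand to complementary-sense genes, provided the input is given in virion sense.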

Phylogenetic relatedness of the sequences obtained was analyzed using the neighbor-joining (NJ) method with 1000 bootstrap replicates. Based on the earlier literature, the genome sequences of begomoviruses known to infect chilli, cotton, okra and hollyhock plants were selected for the analysis (Supplementary Table S2). Per cent nucleotide identity-based heat maps for both DNA-A and the associated betasatellite were generated using the sequence demarcation tool (SDT), version 1.2 [16].
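The quantity displayed in such a heat map is a pairwise per cent nucleotide identity. A minimal sketch of that calculation is given below; it assumes the two sequences are already aligned to equal length and excludes any column in which either sequence has a gap. That gap convention is an assumption made here for illustration and is not a description of the internals of SDT. The function names are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Per cent identity between two aligned sequences of equal length.

    Columns where either sequence carries a gap are excluded from both
    the numerator and the denominator.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned):
    """All-against-all identities for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }
```

The resulting dictionary is symmetric with 100.0 on the diagonal, which is the matrix that a colour-coded identity plot visualizes.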

As geminiviruses are well known for their recombination potential, the sequences were examined for possible recombination events and breakpoints using a set of algorithms comprising RDP, GENECONV, Bootscan/Recscan, MaxChi, Chimaera, SiScan and 3Seq, integrated in the recombination detection program (RDP) version 4.97 [17]. Further, the begomoviral genome was analyzed for its GC content using the per cent GC-plot generated with the Artemis DNA plotter analysis tool v18.1.0 (http://www.sanger.ac.uk/Software/Artemis) [18]. Since both isolated DNA-A molecules were similar at the genome sequence level, one of the DNA-A sequences (GenBank accession no. MT886450) was used for further analysis.

The evolutionary divergence between the sequences was calculated using the maximum composite likelihood (MCL) model available in MEGA-X, including both transition and transversion substitutions, with variance estimated by the bootstrap method (1000 replicates).
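The principle of estimating a pairwise distance together with a bootstrap variance can be sketched in a few lines. The example below deliberately substitutes the simple p-distance and the Jukes-Cantor correction for the MCL model implemented in MEGA-X, so its values would not reproduce those of the present analysis; it assumes pre-aligned sequences, and all function names are hypothetical.

```python
import math
import random


def ungapped_columns(a, b, gap="-"):
    """Aligned column pairs in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(x, y) for x, y in zip(a.upper(), b.upper()) if gap not in (x, y)]


def p_distance(a, b):
    """Proportion of differing sites among ungapped columns."""
    cols = ungapped_columns(a, b)
    return sum(x != y for x, y in cols) / len(cols)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (defined for p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_se(a, b, replicates=1000, seed=0):
    """Bootstrap standard error of the p-distance.

    Alignment columns are resampled with replacement, the distance is
    recomputed for each replicate, and the standard deviation of the
    replicate estimates is returned.
    """
    cols = ungapped_columns(a, b)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    estimates = []
    for _ in range(replicates):
        sample = rng.choices(cols, k=len(cols))
        estimates.append(sum(x != y for x, y in sample) / len(sample))
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return math.sqrt(variance)
```

Resampling columns rather than sequences is the design choice that matters here: it treats alignment sites as the independent observations, which is the usual assumption behind bootstrap variance estimates for sequence distances.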

Results

Cloning and sequencing of begomoviral DNA and associated satellite molecule

The viral genome and its associated satellite molecules were PCR amplified after RCA-mediated enrichment of viral ssDNA. Bands of the expected sizes, ≈ 2.8 and ≈ 1.4 kb, were obtained using the degenerate primer pairs designed for DNA-A and betasatellite, respectively (Supplementary Fig. S1). However, DNA-B and α-satellite molecules could not be amplified from the collected samples. These results indicated the presence of a monopartite begomovirus complex. The PCR-amplified molecules were then cloned into the pJET1.2 cloning vector, and the clones obtained harbored the desired DNA products of ≈ 2.8 and ≈ 1.4 kb, as confirmed by restriction digestion using EcoRV and KpnI for DNA-A and betasatellite, respectively (Supplementary Fig. S2 & S3A). Further digestion of the positive clones with restriction enzymes revealed the presence of similar molecules in all the clones (Supplementary Fig. S3B & S4); therefore, one representative clone from each sample was sent for sequencing. Since the partial sequences of all the samples were similar, only two DNA-A molecules and one betasatellite molecule were fully sequenced and analyzed bioinformatically to study their phylogeny, divergence and recombination pattern.

Phylogenetic relatedness

The sequences obtained were compared with sequences already deposited in the database using NCBI BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Begomovirus sequences isolated from infected chilli (Bhut-Jolokia) plants with leaf curl symptoms (GenBank accession nos. MT886450 and MT886451) were found to be 99.6% similar to the ‘Indian’ strain of Cotton leaf curl Multan virus (CLCuMuV-IN; Accession no. MG373556), which was reported earlier from an infected hollyhock plant (family Malvaceae). In addition, the betasatellite molecule cloned from an infected chilli plant (GenBank accession no. MT886452) was found to be 99.8% identical to Tomato leaf curl Patna betasatellite (ToLCPaB, Accession no. EU862324).

Phylogenetic analysis with MEGA-X also supported the homology of the isolated DNA-A (GenBank accession nos. MT886450 and MT886451) with CLCuMuV-IN and of the betasatellite (GenBank accession no. MT886452) with ToLCPaB. The DNA-A sequences formed a separate clade with CLCuMuV-IN:ND:Hollyhock (GenBank accession nos. MG373556 and MG373551) and were also found to share a common ancestor with the CLCuMuV-PK:Cotton sequence (GenBank accession no. EU365616) (Fig. 2A). The whole cluster was more closely related to other cotton leaf curl virus species reported from Pakistan and India, such as CLCuBuV (GenBank accession no. AM774303), CLCuKoV (GenBank accession no. GU385879), CLCuBaV (GenBank accession no. AY705380) and CLCuAlV (GenBank accession no. AJ002452). However, the CLCuMuV:Manipur isolates (MT886450 and MT886451) were only distantly related to ChiLCV isolates, possibly because CLCuMuV was earlier known to infect members of the Malvaceae family and may have only recently broadened its host range to infect chilli, which belongs to the Solanaceae. The associated betasatellite (GenBank accession no. MT886452) formed a clade with ToLCuPaB-IN:Bihar:tomato (GenBank accession no. EU862324) and was also found to be closely related to ToLCuPaB-IN:Bihar:tobacco (GenBank accession no. HQ180393) (Fig. 3A). The entire cluster was also found to share a common origin with other betasatellites isolated from chilli (GenBank accession nos. JN663863, HM007112, JN663862 and EF190216). However, the newly isolated betasatellite (GenBank accession no. MT886452) was only distantly related to the betasatellites isolated from cotton (GenBank accession nos. AY744380, JF502376, JF502377, AY083590 and JF502393). Analysis of the available viral sequence dataset also suggests the occurrence of CLCuMuV in several distinct hosts such as hollyhock, chilli, cotton, okra, potato and amaranthus species [19].

Fig. 2

Phylogenetic relatedness and color-coded matrix of pairwise similarity scores of isolates of CLCuMuV-IN:Manipur:Chilli. A Maximum-likelihood dendrogram generated through MEGA-X. B Per cent nucleotide identity-based heat map created using sequence demarcation tool (SDT), version 1.2. Tomato mottle virus (ToMoV) (GenBank accession no. AY965901) was chosen as an outgroup for the analysis

Fig. 3

Phylogenetic relatedness and color-coded matrix of pairwise similarity scores of the cloned betasatellite, ToLCuPaB-IN:Manipur:Chilli. A Maximum-likelihood dendrogram generated through MEGA-X. B Per cent nucleotide identity-based heat map created using sequence demarcation tool (SDT), version 1.2. Ageratum leaf curl disease associated betasatellite (GenBank accession no. FM164738) was considered as an outgroup for the analysis

The color-coded matrices of pairwise similarity scores based on per cent nucleotide identity for DNA-A and betasatellite were generated using SDT, which further confirmed that the cloned CLCuMuV-IN:Manipur:Chilli (Accession nos. MT886450 and MT886451) belonged to CLCuMuV (Fig. 2B) and the betasatellite (Accession no. MT886452) to ToLCuPaB (Fig. 3B).

GC plot analysis

Guanine-cytosine (GC) content refers to the proportion of guanine (G) and cytosine (C) in a given stretch of the genome. In the GC-plot, the innermost circle and bars represent above-average (green) and below-average (purple) GC content of the genome, calculated with a window size of 100. In the CLCuMuV genome (GenBank accession no. MT886450), low GC content was found in the AV1 region (Fig. 4A), whereas the GC distribution plot of ToLCuPaB revealed low GC content between the end of the βC1 ORF and the beginning of the A-rich region (Fig. 4B). Since a low GC content region represents a probable hot-spot for recombination, both sequences were subjected to RDP analysis to find the correlation between the GC content and potential recombination breakpoints.
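A sliding-window GC calculation of this kind can be sketched as follows. This is an illustrative approximation rather than the Artemis DNA plotter implementation: it assumes a circular genome supplied as a string, wraps windows around the origin, and flags windows whose GC fraction falls below the genome-wide average. The function names and the step size are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=100, step=1):
    """Return (start, gc_fraction) for windows along a circular genome."""
    seq = seq.upper()
    n = len(seq)
    if window > n:
        raise ValueError("window must not exceed the genome length")
    # Append the first window-1 bases so windows can wrap past the origin.
    wrapped = seq + seq[: window - 1]
    return [(i, gc_fraction(wrapped[i : i + window])) for i in range(0, n, step)]


def below_average_windows(seq, window=100, step=1):
    """Windows whose GC fraction is lower than the whole-genome average."""
    mean = gc_fraction(seq)
    return [(i, gc) for i, gc in gc_windows(seq, window, step) if gc < mean]
```

The window with the lowest value, for example `min(gc_windows(genome), key=lambda w: w[1])` for a hypothetical `genome` string, marks the region that would appear as the deepest below-average bar in a plot of this type.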

Fig. 4

Genome analysis of isolated viral molecules. The outermost circle represents the nucleotide position in the circular genome; the inner colour code represents the respective ORFs encoded by the DNA-A (A) and betasatellite (B); the innermost circle and bars represent the GC-plot with above-average (green bar) and below-average (purple bar) GC content of the genome, with a window size of 100, showing the highest and lowest possible regions of recombination, respectively. This analysis was performed using Artemis DNA plotter version 18.1.0 (http://www.sanger.ac.uk/Software/Artemis). Illustrative representation of recombination patterns observed in the DNA-A (C) and betasatellite (D) genomes. Since the recombination events were similar in both the DNA-A sequences (MT886450 and MT886451), a single graphical image is shown. Details of the probable recombination patterns are listed in Supplementary Table S3

Recombination analysis

The isolated DNA-A sequence showed possible recombination events in the AV1 region with high statistical significance. The analysis revealed that the isolated DNA-A molecules have CLCuMuV-PK (EU365616) and CLCuBuV-IN (KM070821) as major and minor parents, respectively. The p values range from 4.004 × 10⁻²⁵ to 1.290 × 10⁻⁶³, indicating the significance of the predicted recombination events (Supplementary Table S3). The breakpoint positions were also common to most of the sequences analysed, signifying their common origin and relatedness. In the case of the betasatellite, the recombination breakpoint was observed in the A-rich region. A pictorial representation of the data predicted by the RDP tool is shown (Fig. 4C, D). This deeper genome analysis by recombination and phylogenetic relatedness further suggests that the CLCuMuV isolates were present in neighbouring countries such as Pakistan and might have been introduced into India via its Western part, from where the virus is now spreading to the North-East region.

Evolutionary divergence analysis

Virus populations are very dynamic and are influenced by several factors, such as population heterogeneity, geographical distribution, broad host range and mixed infections. The pairwise distance between diverse begomovirus groups was estimated using ten distinct begomovirus isolates for comparison with the present CLCuMuV-IN:Manipur:Chilli isolate (Accession no. MT886450). Significance was assessed by p value at the 95% confidence level (alpha = 0.05). Interestingly, the minimum sequence diversity (17 nucleotides) was observed with CLCuMuV-PK:Cotton (GenBank accession no. EU365616) and the maximum (74 nucleotides) with ChiLCV-PK IN:Chilli (GenBank accession no. JN663861) (Table 1). This further suggests that the cloned sequence isolated from infected plants of chilli var. Bhut-Jolokia is more closely related to CLCuMuV.

Table 1 Estimation of evolutionary divergence between sequences of selected begomoviruses

Discussion

Bhut-Jolokia (a variety of hot chilli) is an important spice crop, well known globally for its extreme pungency and great medicinal properties [3]. Bhut-Jolokia is an interspecific hybrid chilli pepper cultivated in a few states of North-East India, including Assam, Nagaland, Manipur and Arunachal Pradesh. The pungency of Bhut-Jolokia is attributed to the presence of capsaicin and its analogs, broadly known as capsaicinoids, which possess anti-inflammatory and antioxidant properties. These qualities have increased the demand for the crop, but its production is hampered mostly by leaf curl disease caused by begomoviruses. Recent studies have shown the effect of viral infection on secondary metabolite accumulation in plants [20]. In saffron, which is highly appreciated for its colour and aroma, the presence of viral disease has been shown to modify the content of the important secondary metabolites responsible for its valuable properties [21, 22]. These reports suggest that virus infection not only causes yield loss but can also reduce crop quality at the nutritional level in the surviving plants. Hence, continuous monitoring of infected field samples and thorough molecular analysis are priorities for developing management strategies and preventing any epidemic.

The presence of CLCuMuV and its associated ToLCPaB satellite molecules in infected Bhut-Jolokia plants is reported in the present study. The genome organization suggests the existence of Old World monopartite begomoviral molecules in Manipur, one of the North-Eastern states of India. Cloning, RFLP analysis and sequencing data further confirmed the presence of similar begomoviral molecules in all the samples collected at eight different locations in Manipur.

CLCuMuV is well documented to cause cotton leaf curl disease (CLCuD) in several countries. In India, cotton infected with CLCuD was first reported at IARI, New Delhi in 1989. Subsequently, the virus was also isolated from other malvaceous crops in North-Western states of India such as Rajasthan, Punjab and Haryana [23, 24]. The spread of CLCuMuV from Pakistan to India and other adjoining countries has been reported earlier [25, 26]. It has been estimated that the cotton-infecting begomoviruses were introduced into India from Africa, Pakistan, the Philippines and China [27,28,29]. Furthermore, analysis of genetic relatedness using the MEGA-X tool, taking into account the major chilli-infecting begomoviruses, suggested the homology of the CLCuMuV:Manipur isolates (GenBank accession nos. MT886450 and MT886451) with Cotton leaf curl Multan virus reported from the ornamental plant hollyhock in New Delhi, India [30], and the whole cluster was found to be more closely related to other cotton leaf curl virus isolates/strains such as CLCuBuV, CLCuKoV, CLCuBaV and CLCuAlV reported from Pakistan and India. As per the current ICTV recommendations for begomoviruses [31], the cloned components showed pairwise nucleotide identity above the species demarcation threshold (≥ 91%) and therefore belong to another strain of CLCuMuV and ToLCPaB rather than to novel species. However, the CLCuMuV:Manipur isolates formed a separate clade from ChiLCV, possibly because CLCuMuV was known to infect members of the family Malvaceae, whereas chilli belongs to the family Solanaceae. This also highlights the spread of CLCuMuV to a new host and the emergence of a novel and successful pathogen of chilli. Betasatellites are known to exhibit host-specific adaptation in crop plants and occasionally extend the host range across families [32, 33].

In India, very few CLCuMuV isolates (GenBank accession no. MF737345) have been reported from infected chilli plants; however, a detailed analysis is still lacking. Evolution is often linked with recombination, and our data further suggest that CLCuMuV-PK might act as the major and CLCuBuV-PK as the minor parent donor in the recombination process, with the hot-spot lying within the coat-protein (AV1) region of the begomovirus genome. It has already been estimated and reported that the nucleotide substitution rate of CLCuMuV-CP (1.6 at codon position 1) was considerably higher than that of other cotton leaf curl viruses, suggesting a high mutation frequency in the CP region [27]. The most recent outbreak of CLCuD took place in 2011 [25], where Cotton leaf curl Kokhran virus (CLCuKoV) was found to be one of the putative parent donors in recombination analysis. Lately, an inter-species recombinant, CLCuMuV-Rajasthan, has been reported as the prime begomovirus causing CLCuD in North-West India [34]. These reports further suggest the prospect of inter-strain or inter-species recombination, which could be a potential threat to many economically important crops [35]. In the case of the associated betasatellite molecule (ToLCuPaB), the recombination breakpoint was found in the A-rich region. A recombination event in Tomato leaf curl Patna betasatellite has been reported previously, which further strengthens our observation [36]. Earlier reports suggest a viable mechanism of pseudorecombination of ToLCuPaB with ToLCGV DNA-A, which resulted in the successful establishment of infection in both host (tomato) and non-host (Nicotiana benthamiana) plants [36]. This raises the possibility that a similar event may have resulted in infection even with a non-cognate CLCuMuV (DNA-A).
A detailed insight into the various betasatellites associated with begomoviruses causing chilli leaf curl disease [6] indicates a likelihood that ToLCuPaB helps the non-cognate CLCuMuV (DNA-A) to cause infection in the host. The GC content analysis also shows below-average GC content (the lowest) in the coat-protein region (AV1) of the DNA-A molecule and in the region spanning the A-rich region of the betasatellite molecule, which further supports the possibility that the viral genome undergoes recombination [37]. A low GC region has also been reported to be a potential recombination site in human adenovirus, which may facilitate evolution [38]. A nucleotide sequence with high GC content is more stable because of the three hydrogen bonds in each G–C pair and the stacking interactions between the nitrogenous bases [39]. The higher the number of bonds, the more energy is required to break them. However, GC dinucleotide repeats are also linked with the topology and orientation of the DNA strand. In some viruses, such as herpes simplex virus (HSV), higher GC content is also found in the intergenic region [40], suggesting a possible role in virus evolution and a link with pathogenesis. Furthermore, the analysis of evolutionary divergence suggests that begomoviruses associated with leaf curl disease have significant genetic variation in their genomes. This diversity not only provides enormous scope for virus molecules to evolve but also allows a species to adapt to a new host and a new environmental niche. A similar analysis has been reported for SARS-CoV-2 genome sequences, where nucleotide differences were found between isolates reported from India and other countries [41]. This could be one of the factors behind the spread of CLCuMuV to other crops.
In recent years, CLCuMuV has been reported from many malvaceous (okra, China rose), non-malvaceous (sunnhemp, jute) and solanaceous (chilli/pepper) plants, possibly due to broadening of its host range. However, the roles of viruliferous whitefly-mediated virus transmission, anthropogenic activities and international trade in ornamental and other crop plants cannot be ruled out. The spread of CLCuMuV among economically important families of crop plants and its presence in distinct geographical areas is alarming and needs further detailed investigation, which may ultimately be used to develop successful and sustainable control strategies.