The nucleic acid of SARS-CoV-2 in pharyngeal swabs was determined by RT-qPCR test targeting RBD sequence of SARS-CoV-2 spike protein. The Ct values of the 11 positive samples are shown in Table 1. The extracted RNA of the 22 samples was sent for transcriptome sequencing to verify the results of the molecular diagnosis. Sequencing results confirmed the presence of SARS-CoV-2 in the 11 RT-qPCR positive samples and no SARS-CoV-2 RNA in the 11 RT-qPCR negative samples. After analyzing the transcriptome data of the 11 SARS-CoV-2 positive samples, four complete genome sequences with high quality, four draft genomes and three genomes with some fragments were obtained (Table 1). Furthermore, the phylogenetic tree constructed with the four high-quality SARS-CoV-2 genomes and 232 reference genomes showed that the four SARS-CoV-2 genomes are different (Supplementary Fig. S1).
Sample Seq length (bp) Number of contigs Number of reads Ct value Avg depth S1 30,062 1 200,476 17.5 886 S2 29,977 1 295,310 17.1 316 S3 30,026 1 225,212 23.1 1,009 S4 29,531 7 3632 26.2 11 S5 29,660 1 292,054 23.6 35 S6 393 1 82 28.4 30 S7 27,958 10 3268 26.3 10 S8 29,889 23 2836 28.6 11 S9 1061 2 12 31.2 2 S10 2553 5 148,226 26.4 103 S11 29,296 3 8050 25.7 36
Table 1. Sequencing and RT-qPCR results of the 11 SARS-CoV-2 positive samples.
The chi-square test and the Student's t test confirmed that the SARS-CoV-2 detection results were not correlated with gender and age (All P > 0.05, Supplementary Table S2). For the microbiome analysis, each sample had an average of 9,852,978 reads after filtering the host genome and low-quality sequences. An average of 39,000 contigs was obtained per sample, and the detailed information is available in Supplementary Table S3. For each sample, about 75% contigs were longer than 350 bp and the average length of the contigs was about 425 bp, as shown in Fig. 1A. Contigs with the length longer than 350 bp were used for the species annotation. Altogether, a total of 6502 species were identified including 5896 bacterial taxa and 606 viral taxa belonging to 70 orders and 123 families. In particular, most of the 606 viral taxa identified were phages. In addition, the species accumulation curve in Fig. 1B shows that the total number of species gradually becomes saturated as the number of the samples increases.
Based on the annotation information, a bacterial and viral composition profile with the abundance information was achieved (Supplementary Table S4). To investigate the difference in richness and evenness of the 6502 species among the three groups, we calculated and compared the alpha diversity index, including Observed species, Shannon index as well as Pielou evenness index between the groups (Han et al. 2018). Significant differences were observed in the alpha diversity of the two patients' samples (the COVID-19 group and the Non-COVID-19 group) from the healthy people group as shown in Fig. 2A-2C (P < 0.05 in Kruskal-Wallis test). However, the Observed species and the Shannon index indicated there was no significant difference between the COVID-19 and the Non-COVID-19 groups, as shown in Fig. 2A and 2B. Interestingly, compared with the Non-COVID-19 group, the COVID-19 group had a significantly lower Pielou evenness index (Fig. 2C, P = 0.019). Due to the large number of species taxa, the top twelve abundant orders in the samples, including Bacteroidales, Lactobacillales, Caudovirales etc., were used for analyzing the taxonomic compositions. At the order level, the UPGMA cluster analysis revealed that the samples from healthy control were clustered obviously together and different from the two patient cohorts. While the taxonomic composition of the COVID-19 group showed similar to that of the Non-COVID-19 group (as shown in Fig. 2D).
Figure 2. The microbiome difference in the pharyngeal swabs of the COVID-19 group (n = 11), the Non-COVID-19 group (n = 11) and the Healthy group (n = 7). Comparison of the alpha diversity indexes using Observed species (A), Shannon index (B), and Pielou index (C) based on the metagenomic profiles at the species level. The Kruskal-Wallis test is used for significance calculation. D Samples clustered by UPGMA using Bray-Curtis distance (left), and top twelve abundant orders with the relative abundance in the corresponding samples (right) at the order level. E Principal Coordinate Analysis (PCoA) of the species present in more than 20% of all samples based on the Bray-Curtis distance. ANOSIM, R = 0.53, P = 0.001.
Then the Principal Coordinates Analysis (PCoA) was performed to investigate the similarity of the microbiome in the three cohorts based on the Bray-Curtis distance. In order to reduce individual difference, 3069 species which present in at least twenty percentage of all samples (≥ 6 samples) except SARS-CoV-2, were chosen for analysis. Similar to the results of alpha diversity, there was a clear separation in the beta diversity results between the healthy control group and the patient groups. (Fig. 2E, ANOSIM, R = 0.53, P = 0.001).
We further studied the distribution of the 1090 species presented in over 60% of all samples (≥ 18 samples) to find if there are some opportunistic pathogens enriched in the COVID-19 group, which may contribute to the secondary infections in COVID-19. From the linear discriminant analysis (LDA), it was found that there were 37 species with highly discriminated abundance among the groups. As shown in Fig. 3A, all these species had much higher abundance in the patient groups than that of the healthy group. It was also worth noting that most of these species had higher abundance in the COVID-19 group than that of the Non-COVID-19 group, including some opportunistic pathogens such as Streptococcus and Prevotella (Fig. 3A). Comparative analysis of the differential abundance species by the generalized linear model showed that there were 81 species except SARS-CoV-2 having higher abundance in the COVID-19 group than that of the Non-COVID-19 group, and no species having lower abundance (Fig. 3B). The species ranked by log2 fold increase in the abundance were shown in the Supplementary Table S5. Among them, 51 species were found enriched at least 8 folds in the COVID-19 group compared with that in the Non-COVID-19 group (Supplementary Table S5), and the top three enriched genera were Streptococcus, Prevotella, and Campylobacter. Mapping the reads of each sample back to the genome of each enriched species identified by the generalized linear model confirmed the enrichment (Supplementary Figure S2, all with P < 0.05). Furthermore, the heatmap clearly showed that the abundance of the enriched species identified via LDA method and generalized linear model in the 3 groups (Fig. 3C).
Figure 3. Analysis of species with differential abundance among the COVID-19, Non-COVID-19 and healthy cohorts. A Average relative abundance of the most discriminated species identified by linear discriminant analysis in three cohorts. B Differential abundance species identified by comparative analysis of the COVID-19 and the Non-COVID-19 group via generalized linear model. C Heatmap showing the abundance changes of all the significantly differential species among the three cohorts.
Cell protein ACE2 is considered as the receptor for SARS-CoV-2 binding to the host. Previous studies found the dynamic variation of ace2 gene expression can be in response to Pseudomonas aeruginosa infection (Sodhi et al. 2019), and periodontopathic bacteria may enhance the SARS-CoV-2 infection by increasing the expression of ACE2 (Takahashi et al. 2020). In order to understand if the enriched opportunistic pathogens may play certain roles in SARS-CoV-2 infection, some enriched bacteria in the COVID-19 group were chosen to study their effects on the expression of ACE2 of Vero cells in vitro. Because most species among the 51 species enriched at least 8 folds belong to Streptococcus genus, two Streptococcus species, S. suis and S. agalactiae, together with other three species (Acinetobacter baumannii, Bacillus cereus and Escherichia coli) with no significant difference in abundance from the metagenomic profiles were selected for the experiment. Surprisingly, compared with the pathogen-free blank control, the two Streptococcus strains were found to be able to significantly promote the expression level of ace2 gene at 8 h after interacting the bacteria with the cells (Fig. 4), while the other three bacteria exhibited no significant effects on the expression of ACE2 (Fig. 4).
Figure 4. Relative expression level of ACE2 with time after interaction of the Vero cells with the bacteria. Each of treatment was performed three replications. The relative expression level of ACE2 was determined using the 2−ΔΔCt method (Livak and Schmittgen 2001) and compared to that of the blank control group using the Student's t test. Each point in the line chart represents the mean value of relative expression level of ACE2 at the different infection time, and the error bar represents the standard deviation.