A Strategy to Optimize the Oligo-Probes for Microarray-based Detection of Viruses<sup>*</sup>

Zhuo ZHOU; Zhi-xun DOU; Chen ZHANG; Hou-qing YU; Yi-jie LIU; Cui-zhu ZHANG; You-jia CAO

August 2007

Citation: Zhuo ZHOU, Zhi-xun DOU, Chen ZHANG, Hou-qing YU, Yi-jie LIU, Cui-zhu ZHANG, You-jia CAO. A Strategy to Optimize the Oligo-Probes for Microarray-based Detection of Viruses. VIROLOGICA SINICA, 2007, 22(4): 326-335.

Zhuo ZHOU ^1,#,
Zhi-xun DOU ^2,#,
Chen ZHANG ^1,
Hou-qing YU ^2,
Yi-jie LIU ^3,
Cui-zhu ZHANG ^1,2,
You-jia CAO ^1,2,4

1. Department of Biochemistry and Molecular Biology, College of Life Sciences, Nankai University, Tianjin 300071, China
2. The Key Laboratory of Bioactive Materials, Ministry of Education, Nankai University, Tianjin 300071, China
3. School of Computing, National University of Singapore, 3 Science Drive 2, 117543, Singapore
4. Tianjin Key Laboratory of Microbial Functional Genomics, College of Life Sciences, Nankai University, Tianjin 300071, China

Corresponding author: You-jia CAO, caoyj@nankai.edu.cn
# These authors contributed equally to this work.
Received Date: 08 January 2007
Accepted Date: 20 May 2007
Available online: 01 August 2007

Fund Project: NSFC grant 30270308; Tianjin grant 05YFJZJC01301; NSFC grant 30370053

Abstract

DNA microarrays have been acknowledged as a promising approach for the detection of viral pathogens. However, the probes designed for current arrays cover only part of the given viral variants, which could result in false-negative or ambiguous data. If all the variants are to be covered, the requirement for more probes would lead to much higher spot density and thus higher array cost. Here we have developed a new strategy for oligonucleotide probe design. Using the type I human immunodeficiency virus (HIV-1) tat gene as an example, we designed the array probes and validated the optimized parameters in silico. Results show that the oligo number is significantly reduced compared with existing methods, while specificity and hybridization efficiency remain intact. Adopting this method to reduce oligo numbers could increase the detection capacity of DNA microarrays and would significantly lower the manufacturing cost of array chips.

Keywords: Microarray, Oligonucleotide, Viral Detection

References
1. Altschul S F, Madden T L, Schaffer A A, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997, 25 (17): 3389-3402.
  doi: 10.1093/nar/25.17.3389
2. Boriskin Y S, Rice P S, Stabler R A, et al. DNA microarrays for virus detection in cases of central nervous system infection. J Clin Microbiol, 2004, 42 (12): 5811-5818.
  doi: 10.1128/JCM.42.12.5811-5818.2004
3. Bystricka D, Lenz O, Mraz I, et al. Oligonucleotide-based microarray: a new improvement in microarray detection of plant viruses. J Virol Methods, 2005, 128 (1-2): 176-182.
  doi: 10.1016/j.jviromet.2005.04.009
4. Chizhikov V, Wagner M, Ivshina A, et al. Detection and genotyping of human group A rotaviruses by oligonucleotide microarray hybridization. J Clin Microbiol, 2002, 40 (7): 2398-2407.
  doi: 10.1128/JCM.40.7.2398-2407.2002
5. He Z, Wu L, Li X, et al. Empirical establishment of oligonucleotide probe design criteria. Appl Environ Microbiol, 2005, 71 (7): 3753-3760.
  doi: 10.1128/AEM.71.7.3753-3760.2005
6. Ivshina A V, Vodeiko G M, Kuznetsov V A, et al. Mapping of genomic segments of influenza B virus strains by an oligonucleotide microarray method. J Clin Microbiol, 2004, 42 (12): 5793-5801.
  doi: 10.1128/JCM.42.12.5793-5801.2004
7. Kane M D, Jatkoe T A, Stumpf C R, et al. Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res, 2000, 28 (22): 4552-4557.
  doi: 10.1093/nar/28.22.4552
8. Korimbocus J, Scaramozzino N, Lacroix B, et al. DNA probe array for the simultaneous identification of herpesviruses, enteroviruses, and flaviviruses. J Clin Microbiol, 2005, 43 (8): 3779-3787.
  doi: 10.1128/JCM.43.8.3779-3787.2005
9. Lee I, Dombkowski A A, Athey B D. Guidelines for incorporating non-perfectly matched oligonucleotides into target-specific hybridization probes for a DNA microarray. Nucleic Acids Res, 2004, 32 (2): 681-690.
  doi: 10.1093/nar/gkh196
10. Markham N R, Zuker M. DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res, 2005, 33: W577-W581.
  doi: 10.1093/nar/gki591
11. Oh T J, Kim C J, Woo S K, et al. Development and clinical evaluation of a highly sensitive DNA microarray for detection and genotyping of human papillomaviruses. J Clin Microbiol, 2004, 42 (7): 3272-3280.
  doi: 10.1128/JCM.42.7.3272-3280.2004
12. Rimour S, Hill D, Militon C, et al. GoArrays: highly dynamic and efficient microarray probe design. Bioinformatics, 2005, 21 (7): 1094-1103.
  doi: 10.1093/bioinformatics/bti112
13. Rouillard J M, Zuker M, Gulari E. OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res, 2003, 31 (12): 3057-3062.
  doi: 10.1093/nar/gkg426
14. Urakawa H, Fantroussi S E, Smidt H, et al. Optimization of single-base-pair mismatch discrimination in oligonucleotide microarrays. Appl Environ Microbiol, 2003, 69 (5): 2848-2856.
  doi: 10.1128/AEM.69.5.2848-2856.2003
15. Wang D, Coscoy L, Zylberberg M, et al. Microarray-based detection and genotyping of viral pathogens. Proc Natl Acad Sci USA, 2002, 99 (24): 15687-15692.
  doi: 10.1073/pnas.242579699
16. Wilson W J, Strout C L, DeSantis T Z, et al. Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Mol Cell Probes, 2002, 16 (2): 119-127.
  doi: 10.1006/mcpr.2001.0397
17. Young R A. Biomedical discovery with DNA arrays. Cell, 2000, 102 (1): 9-15.
  doi: 10.1016/S0092-8674(00)00005-2
18. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res, 2003, 31 (13): 3406-3415.
  doi: 10.1093/nar/gkg595
Figures(5) / Tables(1)

The detection of viral pathogens is an essential process in many areas. DNA microarrays offer a high-throughput approach that has proven capable of detecting viral pathogens precisely and sensitively (17). However, some bottlenecks in the technique have been recognized, including nucleotide probe spotting/synthesis and probe design. In the production of microarrays, the trade-off between density and cost has been a major problem. Thus, for high-throughput viral detection, new approaches are needed to reduce the oligo numbers without compromising detection capacity and efficiency. The design of suitable sets of oligonucleotide probes is another important step in DNA microarray experiments. For viral detection, a number of groups have designed oligos for representative strains, but the remaining variants are beyond the capacity of the probes (2, 3, 11, 16). An alternative method involves the design of oligos within conserved regions to cover a wider range of variants (4, 8, 15). However, a significant number of variants are still missed by this method, such as those with highly mutated sequences carrying insertions or deletions at the probe sites (6). To avoid false-negative or ambiguous results, one can increase the number of oligos, designing them from sequences in available databases to cover as many known variants as possible. However, this leads to higher probe numbers.

An effective oligo-probe set for viral detection should satisfy several criteria. Specificity is of the highest priority and has been studied intensively. Using current computer programs (such as OligoArray 2.1) to design 50-mer probes for the type I human immunodeficiency virus (HIV-1) tat genes, we found that over 50% of the probes carry a risk of cross-hybridization according to Kane's specificity criteria (7). A common problem with oligos longer than 50-mer is that specificity decreases as oligo length increases (12). Thus, a proportion of viral variants would not have specific probes if 50-mer or longer oligo design methods were used. An alternative strategy is to design probes using the GoArrays approach. In this strategy, the oligonucleotide probe consists of the concatenation of two subsequences that are complementary to their target cDNA, with a short random linker, e.g. 6 bases, inserted between them to facilitate formation of the transcript loop (Fig. 1A). This strategy has been shown to broaden the range of oligo selection and to increase specificity and sensitivity (12). In addition to the specificity criteria, the oligo numbers should also be considered. When we tested the existing methods, such as designing oligos within the conserved regions or using the GoArrays approach, we found that all methods would require at least 1500 probes to cover the 1881 tat variants. Therefore, a strategy that circumvents the limitations in specificity and oligo number is needed for microarray-based viral detection.

Figure 1. Schemes of existing and newly developed strategies for oligo design. A: The GoArrays strategy (12), in which the oligo-probe is composed of two disjoint subsequences from one coding sequence (CDS). The subsequences are concatenated via a short linker, e.g., 6 bases. Hybridization between the composite probe and the target induces the formation of a loop, as depicted. B: The modified strategy, in which the mismatch theory is adapted into the GoArrays method, i.e., non-perfect match(es) are incorporated in the concatenated subsequences that were selected from conserved regions.
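The composite-probe layout in Fig. 1A can be illustrated with a minimal Python sketch. It is not the GoArrays implementation: the function name `composite_probe`, its parameters, and the uniform random choice of linker bases are assumptions made for illustration, and the sketch works in the orientation of the input CDS (whether the result must be reverse-complemented depends on which target strand is labeled). With the 22-nt subsequence length used for GoArrays in this study and a 6-base linker, the composite is 22 + 6 + 22 = 50 nt long.

```python
import random

LINKER_ALPHABET = "ACGT"


def composite_probe(cds, start1, start2, sub_len=22, linker_len=6, rng=None):
    """Join two disjoint subsequences of `cds` with a short random linker.

    Hypothetical helper for illustration only; GoArrays itself also scores
    candidate subsequences for specificity, which is not modeled here.
    """
    rng = rng or random.Random()
    end1 = start1 + sub_len
    end2 = start2 + sub_len
    # Require a gap between the two subsequences so that the target can
    # loop out when it hybridizes to the composite probe.
    if start1 < 0 or end1 >= start2 or end2 > len(cds):
        raise ValueError("subsequences must be ordered, separated, and inside the CDS")
    linker = "".join(rng.choice(LINKER_ALPHABET) for _ in range(linker_len))
    return cds[start1:end1] + linker + cds[start2:end2]


if __name__ == "__main__":
    toy_cds = "ACGT" * 20  # 80-nt toy sequence, not a real tat sequence
    print(composite_probe(toy_cds, 0, 40, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes the linker reproducible, which is convenient when the same probe set has to be regenerated.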

In this paper, we describe a new strategy for oligonucleotide design that aims to cover all known variants of the viral species of interest with a minimal number of oligos. This approach was developed on the basis of the GoArrays approach (12) and the non-perfect match theory (9). Our results show that the oligo number is significantly reduced in comparison with other published methods, while specificity and hybridization efficiency remain intact. This approach could be generally applied to oligo design and to the development of low-cost, high-throughput viral detection arrays for clinical applications.

MATERIALS AND METHODS

Database and programs

HIV-1 sequences corresponding to exon 1 of the tat gene for 2605 strains (as of August 2005) were downloaded from the LANL HIV Sequence Database (http://hiv-web.lanl.gov/content/hiv-db/mainpage.html). The aligned tat database (in FASTA format) was obtained by searching for all tat exon 1 sequences at http://hiv-web.lanl.gov/components/hiv-db/combined_search_s_tree/search.html. Redundant sequences and sequences containing bases other than A, T, G, C were removed. In addition, strain AF443106 was removed because it lacks a large portion of the nucleotides. After these filtering steps, tat exon 1 sequences of 1881 strains remained.
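The filtering steps described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the script used in the study: the function names are hypothetical, alignment gap characters ("-") are assumed to be ignored when checking bases and redundancy, and an excluded strain is assumed to be recognizable by its accession appearing in the FASTA header.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_records(records, excluded_ids=()):
    """Drop excluded, ambiguous, and redundant sequences (first copy is kept).

    `excluded_ids` is an iterable of accession strings (assumption: the
    accession appears somewhere in the header line).
    """
    seen, kept = set(), []
    for header, seq in records:
        if any(acc in header for acc in excluded_ids):
            continue
        # Assumption: gap characters from the alignment are not "bases".
        ungapped = seq.upper().replace("-", "")
        if not ungapped or set(ungapped) - set("ATGC"):
            continue  # empty, or contains a base other than A, T, G, C
        if ungapped in seen:
            continue  # redundant sequence
        seen.add(ungapped)
        kept.append((header, seq))
    return kept
```

A typical call would be `filter_records(parse_fasta(open(path).read()), excluded_ids=("AF443106",))`, where `path` points to the downloaded alignment.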

The BLAST program version 2.2 was downloaded from http://www.ncbi.nlm.nih.gov/BLAST/download.shtml (1). The BioPerl (http://bio.perl.org/) StandAloneBlast module was used to invoke the blastall program, to parse its results, and to check the specificity of the designed probes. All BLAST programs were run using the parameter '-p n' to specify BLAST on nucleotides, with default settings for the other parameters. Unless otherwise indicated, a BLAST score of 32.2 for a 50-mer probe was used as the specificity threshold. The human mRNA database used for BLAST to ensure specificity was downloaded from ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/.

OligoArray 2.1 (13) was downloaded from http://berry.engin.umich.edu/oligoarray2_1/; the required OligoArrayAux was downloaded from http://www.bioinfo.rpi.edu/applications/hybrid/OligoArrayAux.php. The oligo length was set to 50-mer and the maximum oligo number to 1; other parameters were left at default. GoArrays (12) was downloaded from http://www.isima.fr/bioinfo/goarrays/ and run following the authors' instructions. The maximum length for identity was set to 16 and the subsequence length to 22; other parameters were left at default.

Melting temperature (Tm) calculation

The Tm value was calculated with the program downloaded from the DINAMelt web server (10) (http://www.bioinfo.rpi.edu/applications/hybrid/hybrid2.php) and run locally as instructed. The initial concentrations of the two strands were set to 10^-7 mol/L; [Na⁺] was set to 0.5 mol/L; other parameters were left at default (see 'specificity evaluation' for an explanation of these choices).

Free energy and hybridization structure prediction

The free energy (dG) of hybridization was calculated with the Mfold web server (18) (http://www.bioinfo.rpi.edu/applications/mfold/). To predict the dG of structure stability, the probe sequence was fixed whereas the target sequences were varied. Point mutation(s) in the matched regions were generated by replacing perfectly matched pair(s) with one of the other three non-matching nucleotides.
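The generation of mismatched targets can be sketched in Python as below. This is an assumption-laden illustration rather than the procedure actually used: the function name `point_mutants` is hypothetical, the target is assumed to be an upper-case A/C/G/T string, and the sketch enumerates all three alternative bases at each mutated site, whereas the text only states that one of the three non-matching nucleotides was substituted.

```python
from itertools import combinations, product


def point_mutants(target, positions, n_mismatches=1):
    """Yield variants of `target` carrying exactly `n_mismatches` substitutions.

    `positions` lists the 0-based indices inside the matched region that may
    be mutated (hypothetical parameter; the probe itself stays fixed).
    """
    for sites in combinations(positions, n_mismatches):
        # For every chosen site, the three bases that differ from the original.
        alternatives = [[b for b in "ACGT" if b != target[i]] for i in sites]
        for bases in product(*alternatives):
            mutant = list(target)
            for i, base in zip(sites, bases):
                mutant[i] = base
            yield "".join(mutant)


if __name__ == "__main__":
    for variant in point_mutants("ACGT", range(4)):
        print(variant)
```

Each yielded variant could then be paired with the fixed probe and submitted to the free-energy calculation; for `k` candidate positions and `n` mismatches the generator produces C(k, n) · 3^n variants, so the position list should be kept short for larger `n`.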

Determination of conserved regions

A conserved region of a given length is identified from the sequence variety within each window. The window is shifted from the beginning to the end of the exon in steps of 5 nt, and at each position the subsequences of all strains in the database are compared. A window with a higher number of distinct subsequences is considered more variable and less conserved. Each sequence selected from a conserved region was subjected to BLAST to confirm that it is specific enough to be used as a probe.
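The window scan described above can be sketched in Python as follows. The function names and the tie-breaking rule (earlier windows first) are assumptions for illustration; the window length is left as a parameter because the text does not fix it, and the input is assumed to be a list of aligned sequences of equal length.

```python
def window_variety(aligned_seqs, window_len, step=5):
    """Return (start, distinct_count) for each window along the alignment.

    The window slides in `step`-nt increments (5 nt in the text); a lower
    count of distinct subsequences means a more conserved window.
    """
    length = min(len(s) for s in aligned_seqs)
    scores = []
    for start in range(0, length - window_len + 1, step):
        distinct = {s[start:start + window_len] for s in aligned_seqs}
        scores.append((start, len(distinct)))
    return scores


def most_conserved(aligned_seqs, window_len, step=5, top=1):
    """Return the `top` windows with the fewest distinct subsequences."""
    ranked = sorted(window_variety(aligned_seqs, window_len, step),
                    key=lambda item: (item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    toy = ["AAAAACCCCCGGGGG", "AAAAACCCCTGGGGG", "AAAAACCCCAGGGGG"]
    print(window_variety(toy, window_len=5))
```

Candidate probe subsequences would then be taken from the best-ranked windows and passed to the BLAST specificity check.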

DISCUSSION

During our in silico study, we found that existing specificity criteria for oligonucleotides (7, 13) can be further improved. More recently, He and colleagues proposed improved criteria that combine identity, stretch and free energy (5). However, we noticed that even the combination of these three parameters is not sufficient to ensure specificity, because we were able to create sequences that satisfy all three conditions but still produce non-specific cross-hybridization regardless of hybridization temperature (our unpublished data). Therefore, other thermodynamic parameters, such as Tm and the hybridization rate, should be investigated.

A requirement for DNA microarrays in virological applications is that the methodology can detect unknown variants of the given viral species as well as previously identified variants (15). However, the rapid point mutation and segment reassortment of viruses have been a challenge for probe design. When we updated the database in October 2005, we found that another 58 new HIV-1 tat variants had been posted in addition to those in the database in August of that year. Promisingly, 46 of the 58 new variants (about 79%) can be covered by the 281-probe set we deduced, indicating the strong detection capacity of the probe design.

Further considerations should also be noted. First, there is still room for improvement in the probe number. We found that a number of probes each cover only one target in the later recurrence steps of the dynamic programming. Performing the linking combination at more than one pair of regions simultaneously could be one solution. Another plausible approach is to design artificial sequences that do not exist in the subsequence pool but satisfy the mismatch threshold for the targets in later steps. Second, although our calculation tools are currently the most up to date and the in silico data are statistically reliable, there are still certain deviations from experimental results. Finally, our strategy and its validation are based on a theoretical model; therefore, further experimental verification is imperative.
