hybseek: pathogen primer design tool for diagnostic multi-analyte assays


Computer Methods and Programs in Biomedicine 94 (2009) 152–160

journal homepage: www.intl.elsevierhealth.com/journals/cmpb

hybseek: Pathogen primer design tool for diagnostic multi-analyte assays ☆

Christian Frech (a), Karin Breuer (a), Bernhard Ronacher (b), Thomas Kern (a,∗), Christof Sohn (c), Gerhard Gebauer (c)

(a) University of Applied Sciences, Softwarepark 11, 4232 Hagenberg, Austria
(b) Anagnostics Bioanalysis GmbH, Hafenstr. 47-51, 4020 Linz, Austria
(c) Universitaetsfrauenklinik Heidelberg, Voszstr. 9, 69115 Heidelberg, Germany

Article info

Article history:
Received 18 June 2008
Received in revised form 12 November 2008
Accepted 17 December 2008

Keywords:
Primer design
DNA signature
DNA fingerprint
Solid-phase amplification
Multi-analyte assay
Diagnostics
HPV
Pathogen

Abstract

Due to recent advances in genome sequencing, the detection of pathogens by DNA signatures, i.e. by oligonucleotide sequences that uniquely identify a specific genome, is becoming increasingly popular in modern clinical diagnostics. However, currently available screening methods, such as PCR and microarrays, lack multiplexing and sensitivity, respectively. Solid-phase amplification (SPA) is an emerging approach with the potential to overcome these limitations. SPA-based diagnostic assays require both pathogen-specific and compatible primer pairs for many, often closely related pathogens. Currently, none of the available tools supports an automated design of such primer sets, making it an iterative, labor-intensive, and often difficult procedure. Here we describe hybseek, a Web interface for efficient design of both pathogen-specific and compatible primer pairs for DNA-based diagnostic multi-analyte assays. hybseek achieves pathogen-specificity by selecting only candidates with a unique 3′ subsequence, and the degree of this uniqueness is quantitatively expressed by a specificity score. qPCR experimental data confirm the feasibility of our design strategy. The service is freely available at https://www.hybseek.com.

© 2009 Elsevier Ireland Ltd. All rights reserved.

☆ hybseek is freely available on the Web along with additional documentation at https://www.hybseek.com.
∗ Corresponding author. Tel.: +43 7236 3888 7110.
E-mail address: [email protected] (T. Kern).
0169-2607/$ – see front matter © 2009 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.cmpb.2008.12.007

1. Introduction

Polymerase chain reaction (PCR) is a well-established technique with a broad field of applications, including clinical diagnostics. PCR has excellent sensitivity and specificity, but is limited by the number of targets it can detect simultaneously. Multiplex PCR reactions exist, but generally they are not quantitative and can only detect few agents concurrently [1]. This lack of multiplexing makes PCR-based diagnostic assays time-consuming, inefficient and expensive. In clinical gynecology, for example, cervical smears contain up to 200 relevant germs, and each one is detected by a separate test. Thus, obtaining a complete picture of the microbial and viral status by means of conventional PCR is a cumbersome procedure.

In contrast, microarrays allow analysis of hundreds or even thousands of targets in parallel. This feature makes them attractive for diagnostic applications, such as virus identification and subtyping [2,3]. For applications in clinical diagnostics, however, the ability to get unambiguous and reproducible results from sometimes low-abundant pathogens is crucial and remains challenging for microarrays, although some advances have been reported in the field [4,5].

Solid-phase amplification (SPA) strives to combine advantages of both methods, PCR and microarrays. This on-chip amplification method with immobilized primers combines the amplification process with the sequence-specific detection on a solid phase [6,7]. All potentially harmful micro-organisms and viruses of a specific milieu can be quantitatively measured and analyzed within a single, multi-analyte assay [8,9]. For bacteria, information about resistance against certain antibiotics can also be obtained. The application of such assays in gynecology, for example, would significantly improve clinical diagnostics and drastically enhance preventive women’s medicine.

SPA-based diagnostic multi-analyte assays require at least one specific primer pair for each tested pathogen. In addition, all primer pairs must operate under identical experimental conditions, such as annealing time or temperature. Meeting all these requirements manually is challenging and time-consuming. The computational design of such primer sets in a fully integrated and automated fashion is highly desirable. The necessary program requirements can be defined as follows.

Given a composition consisting of one host genome and a set of pathogen genomes, both targets and non-targets, the task is to find at least one specific primer pair for each target in the set. Both forward and reverse primers have to be highly sensitive, i.e. able to amplify their intended target, specific, i.e. able not to amplify any non-target, and uniform, i.e. have similar physicochemical properties to be simultaneously applicable in one assay.

2. Background

Supplementary Table S1 provides an overview of freely available primer and probe design applications. Most programs deal with the selection of gene-specific probes or primers for expression analysis. Given a set of genes as DNA or RNA sequences, they design one or more unique oligonucleotides per (coding) sequence. Although this task is somewhat related to our needs, software for expression profiling generally aims at genes instead of genome-wide DNA signatures, and therefore ensures specificity only within a single transcriptome and not over multiple genomes. OligoWiz, for instance, is a very popular example, with several attractive features. This prompted us to evaluate OligoWiz’ usefulness for our purpose (see Section 4).

Currently, few tools support the computation of DNA signatures [10–14]. Kaderali and Schliep developed ProbeSel, which selects DNA signature probes to identify organisms [10]. Although this tool allows for DNA fingerprinting, it does not scale well. For example, the authors state that it takes ProbeSel 2 weeks to design probes for the comparatively small eukaryotic yeast genome. This is too slow for applications in clinical diagnostics where complete mammalian host genomes need to be considered to reach probe specificity. Another example is YODA, which incorporates a fast multiple-use algorithm for probe selection [11]. On a standard PC, it takes about 20 min for YODA to design probes for all yeast genes. Besides probe selection for gene expression analysis, YODA supports the design of DNA signature probes for single-genome, multiple-genome, pathogen–host, and species-strain identification. However, like ProbeSel, YODA does not support the design of primer pairs, and the inclusion of complete mammalian genomes for specificity analysis leads to overly long runtimes. Tembe et al. presented TOFI, a high-throughput pipeline for designing microarray-based pathogen diagnostic assays [12]. Similar to ProbeSel and YODA, TOFI was created to design microarray probes but not primers, and computation time remains critical. In the performance-improved version TOFI-beta [15], it takes 4 h in a high-performance computing (HPC) environment with 74 processors to design probes for Francisella tularensis, a prokaryotic genome of only 2 megabases. Because TOFI is not available as a Web interface, it needs to be installed locally in an HPC environment, which is beyond the means of most investigators. Livermore’s KPATH system [16,14] is a fully automated DNA-based signature ‘pipeline’ that delivers microbial signature candidates in minutes to hours. KPATH compares the genome of the target pathogen to a continuously updated library of microbial genomes, searching for those areas that are unique to the target organism. However, KPATH was developed within a biodefence program and is thus not publicly available.

Insignia is currently unique in its ability to combine DNA fingerprinting with primer design functionality [13]. It identifies DNA signatures of any length in bacterial and viral genomes, and once a set of signatures has been decided upon, the integrated Primer3 [17] software can be used to choose suitable primers from the signatures. Insignia performs comprehensive preprocessing of sequence data and is thus capable of delivering results almost instantaneously. However, Insignia lacks several important features that are crucial for the problem stated above. First, it does not support the design of multi-analyte assays, i.e. the computation of whole sets of uniform primer pairs in a single step. Rather, primers need to be computed on a per-target basis and their uniformity has to be ensured manually. Secondly, the Insignia Web interface operates on a preselected list of publicly available sequence data. Although Insignia can be installed locally to include also unpublished sequences, preprocessing is time-intensive. Thus genomes or other DNA sequences cannot be added easily ‘on the fly’ as they become available. Thirdly, Insignia does not provide a quality measure for its designed primers, and it is up to the user to decide which of them is presumably the best one.

Here, we present a new solution, hybseek, with the following unique features: (1) one-step design of primer pairs for multiple genomic targets; (2) rapid assessment of a primer’s pathogen-specificity based on a new method for 3′ subsequence analysis; (3) an efficient algorithm capable of processing even large prokaryotic targets ‘on the fly’, without the need for time-consuming preprocessing. Importantly, our service is free of charge, easy to operate, and fast enough to allow for an interactive and iterative design pipeline.

3. Systems and methods

3.1. Algorithm

In this paper, a composition refers to a collection of genomic DNA sequences. These sequences are either targets or non-targets, collectively referred to as entries. For each target, a specific primer pair has to be designed that must be found neither in any other entry nor in the host organism. A candidate is a suitable primer (forward or reverse) proposed by hybseek that matches the predefined design criteria.

Fig. 1 illustrates the overall primer design pipeline. In a first step, two word lists are collected from all entry sequences: a list of composition-wide unique 15mers to locate candidate sequences for primer design (this check includes the host genome), and a list of composition-wide unique 9mers for more detailed specificity analysis (this check does not include the host genome). The computation of both lists can be accomplished in O(n) time by (a) bit-encoding all input sequences and (b) storing word frequencies in a bit-encoded list in RAM. A detailed description of this procedure can be found in [11].
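The word-list step can be illustrated with a minimal Python sketch. It is not hybseek's implementation (which is written in C# and follows [11]): the function names `encode_kmers` and `unique_kmers` are hypothetical, a `Counter` stands in for the bit-encoded frequency list, and the host genome and reverse strands are ignored for brevity.

```python
from collections import Counter

# 2-bit code per nucleotide; an assumption of this sketch
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq, k):
    """Yield (start position, 2-bit-packed integer) for every k-mer of seq.

    The rolling update keeps the pass linear in len(seq). Windows that
    contain a non-ACGT symbol (e.g. N) are skipped.
    """
    mask = (1 << (2 * k)) - 1
    value, valid = 0, 0
    for pos, base in enumerate(seq.upper()):
        if base not in CODE:
            value, valid = 0, 0
            continue
        value = ((value << 2) | CODE[base]) & mask
        valid += 1
        if valid >= k:
            yield pos - k + 1, value


def unique_kmers(entries, k):
    """Map each entry name to the start positions of its k-mers that
    occur exactly once in the whole composition."""
    counts = Counter()
    for seq in entries.values():
        counts.update(code for _, code in encode_kmers(seq, k))
    return {
        name: [pos for pos, code in encode_kmers(seq, k) if counts[code] == 1]
        for name, seq in entries.items()
    }
```

For example, `unique_kmers({"a": "AAAC", "b": "AACG"}, 3)` reports position 0 for entry `a` (AAA) and position 1 for entry `b` (ACG), because the shared word AAC is excluded.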

In a second step, the previously identified unique 15mers are elongated upstream (for both forward and reverse primers) until a preconfigured primer melting temperature is reached. If a candidate exceeds the maximum length, it is rejected. To further assess the specificity of each unique 15mer, a specificity score for both terminal 9mers is computed (see Section 3.3).
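A minimal sketch of the elongation step for a forward primer might look as follows. The names `elongate_seed` and `wallace_tm` are hypothetical, and the Wallace rule is only a crude stand-in for the nearest-neighbor Tm that hybseek uses (Section 3.2); any other Tm function can be passed in through the `tm` argument.

```python
def wallace_tm(primer):
    """Crude stand-in Tm estimate (Wallace rule), NOT the nearest-neighbor model."""
    gc = sum(primer.count(b) for b in "GC")
    at = sum(primer.count(b) for b in "AT")
    return 2.0 * at + 4.0 * gc


def elongate_seed(template, seed_start, seed_len=15, target_tm=70.0,
                  max_len=55, tm=wallace_tm):
    """Extend a seed towards 5' until tm(primer) >= target_tm.

    The seed stays at the 3'-end of the primer. Returns the primer
    string, or None if the target Tm is not reached within max_len
    nucleotides or before the start of the template.
    """
    end = seed_start + seed_len
    start = seed_start
    while True:
        primer = template[start:end]
        if tm(primer) >= target_tm:
            return primer
        if start == 0 or len(primer) >= max_len:
            return None  # candidate rejected
        start -= 1
```

Reverse primer candidates could be handled by applying the same function to the reverse complement of the template; that is an assumption of this sketch rather than a documented detail.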

Note that this design strategy ensures that all successfully identified candidates carry at least one composition-wide unique 15mer at their 3′-end. In cases where not enough unique 15mers can be located due to very closely related pathogens, it is possible to configure hybseek to group these entries together and to design group-specific primers. This is similar to Insignia’s functionality to retrieve shared DNA signatures for a reference genome and one or more subspecies or strains.

The algorithm outputs a list of top-scoring candidates for each target. What is left to the human expert is the final choice of primer pairs from automatically identified candidate pairs. hybseek assists this selection process through several sorting and filter options. A detailed description of all available features can be found in hybseek’s online quickstart tutorial, accessible via the hybseek homepage.

Although the above algorithm does not depend on BLAST, an online NCBI BLAST search can be initiated for selected candidates by a single mouse-click. We parameterized BLAST with very relaxed values for maximum sensitivity (see footnote 1). Alternatively, NCBI’s recently released Primer-BLAST service can be used for that purpose [18].

3.2. Sensitivity, specificity and uniformity

Here, sensitivity refers to a primer’s ability to produce amplicons of its intended target, specificity refers to a primer’s ability not to align to and amplify non-targets, and uniformity refers to similar physicochemical properties of multiple primers, such as melting temperature.

One way hybseek ensures sensitivity is by designing primers that are 100% identical to the target. Additionally, all designed primers are screened for their potential to form secondary structures, homodimers, or heterodimers (i.e. dimers of forward and reverse primer) by looking for short stretches of complementary sequences. We use the same algorithm as described in [19] to determine if any such structure is likely.
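The idea of screening for short complementary stretches can be sketched as below. This is not the algorithm of [19], whose details are not given here; it simply reports the longest contiguous run in which two primers could base-pair in antiparallel orientation. The function names and the `min_stretch` cut-off of 5 nt are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(a, b):
    """Length of the longest contiguous run in which primer a can pair
    with primer b (antiparallel). Implemented as the longest common
    substring of a and the reverse complement of b."""
    a, target = a.upper(), reverse_complement(b)
    best = 0
    prev = [0] * (len(target) + 1)
    for ch_a in a:
        cur = [0] * (len(target) + 1)
        for j, ch_b in enumerate(target, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def may_dimerize(a, b, min_stretch=5):
    """Flag a primer pair (or a primer with itself) as a dimer risk."""
    return longest_complementary_stretch(a, b) >= min_stretch
```

Calling `may_dimerize(p, p)` checks for homodimers, and `may_dimerize(forward, reverse)` for heterodimers. A hairpin check would additionally require the two stretches to lie within the same primer, separated by a loop.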

1 E-value less than 1000, word size 7, gap open cost 1, gap extension cost 1, mismatch penalty −2.

Specificity is frequently discussed in the literature [20–22]. Because the aim of hybseek is to design primers compatible with SPA methods, we consider specificity criteria established for probe design as relevant. Among the most cited probe specificity criteria are (1) no continuous match to non-targets longer than 15 bp and (2) sequence identity to non-targets lower than 75%. In addition, we follow the hypothesis that (3) especially the terminal 3′ subsequence of probes and primers should carry mismatches to non-targets [23–25]. hybseek reports the number of unique 15mers for each candidate, thus the first criterion can easily be guaranteed by selecting only candidates in which all 15mers are unique. The second criterion expresses the percentage of matching nucleotides in an alignment to all non-targets of the composition. In hybseek, this criterion can be met in two ways. The first possibility is to activate the option ‘compute non-target identity’ before primer computation. In this case, hybseek performs an ungapped alignment to all non-targets in the composition (excluding the host genome) and reports the highest sequence identity. Because this computation is exhaustive, it is only recommended for small compositions with, for example, viral pathogens. The second possibility, which is recommended by the authors, is to leave this option deactivated and to determine sequence identity for the most promising candidates by an online NCBI BLAST search. Based on the results of our experimental mismatch analysis (see Section 4), we consider the third criterion pertaining to the uniqueness of 3′ subsequences as the most important one. To meet this criterion, we devised a novel scoring scheme that reflects the number and location of mismatches found in a primer’s terminal 3′ subsequence (see Section 3.3).
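The exhaustive non-target identity check (criterion 2) can be sketched as a sliding ungapped comparison. The function name `max_ungapped_identity` is hypothetical, and the sketch only scans the given strand of each non-target; whether hybseek also scans the reverse strand is not stated here.

```python
def max_ungapped_identity(primer, non_targets):
    """Highest fraction of matching positions over all ungapped
    placements of the primer on any non-target sequence."""
    primer = primer.upper()
    n = len(primer)
    best = 0.0
    for seq in non_targets:
        seq = seq.upper()
        for start in range(len(seq) - n + 1):
            window = seq[start:start + n]
            matches = sum(p == s for p, s in zip(primer, window))
            best = max(best, matches / n)
    return best
```

A candidate would satisfy criterion (2) if the returned value stays below 0.75. The cost grows with primer length times total non-target length per candidate, which illustrates why the option is only recommended for small compositions.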

Uniformity is ensured by selecting only candidates with similar melting temperatures (Tm). This is a minimum requirement for their parallel application in multi-analyte assays. Tm calculation is based on nearest-neighbor thermodynamics [26]. We use a fixed salt concentration of 50 × 10^−3 M and, because the actual amount of DNA is not known, a fixed putative (or approximate) DNA concentration of 50 × 10^−9 M. In addition, hybseek allows the specification of a common amplicon length to select only primer pairs that satisfy a given genomic distance.
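A nearest-neighbor Tm calculation can be sketched as follows. The parameter table is assumed to be the unified SantaLucia (1998) set with its entropy-based salt correction; the parameters of ref. [26] may differ, so the absolute values are illustrative only. The function name `nn_tm` and the use of Ct/4 (non-self-complementary duplex) are likewise assumptions of this sketch. The default concentrations are the ones stated above.

```python
import math

# Assumed nearest-neighbor parameters (SantaLucia 1998, unified set):
# dH in kcal/mol, dS in cal/(mol*K), keyed by the 5'->3' dinucleotide.
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)   # initiation with a terminal G/C pair
INIT_AT = (2.3, 4.1)    # initiation with a terminal A/T pair
R = 1.987               # gas constant in cal/(mol*K)


def nn_tm(primer, na_molar=50e-3, dna_molar=50e-9):
    """Melting temperature in degrees Celsius for an ACGT-only primer."""
    primer = primer.upper()
    dh, ds = 0.0, 0.0
    for end in (primer[0], primer[-1]):
        init_h, init_s = INIT_GC if end in "GC" else INIT_AT
        dh += init_h
        ds += init_s
    for i in range(len(primer) - 1):
        step_h, step_s = NN[primer[i:i + 2]]
        dh += step_h
        ds += step_s
    # Salt correction applied to the entropy term
    ds += 0.368 * (len(primer) - 1) * math.log(na_molar)
    return 1000.0 * dh / (ds + R * math.log(dna_molar / 4.0)) - 273.15
```

Uniformity could then be enforced by keeping only candidates whose `nn_tm` lies within a narrow window around the configured target temperature.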

3.3. Specificity score

Several studies have been conducted to assess the effect of mismatch position and mismatch type on polymerase extension efficiency and PCR product yield [23,24]. These studies suggest that terminal 3′ mismatches have a major impact on primer efficiency. This can be explained by the fact that both hybridization and extension are crucial for DNA polymerization. The latter is an enzymatic activity of DNA polymerase, which is very sensitive to mismatches near its catalytic center [27,28].

Our main idea is summarized in Fig. 2: the primer sequence is split into a hybridization region, the hyb-box, and a polymerization region, the pol-box. We assume that the DNA polymerase is more sensitive to mismatches located within the pol-box because they directly interfere with the polymerization initiation reaction. For practical reasons the length of the pol-box is set to 9 nucleotides, because our investigations have shown that the existence of shorter unique oligonucleotides becomes unlikely in larger compositions with prokaryotic pathogens.

For the computation of the specificity score, only mismatches in the pol-box are considered. To identify the number and location of mismatches contained in the pol-box, up to two nucleotides of the 9mer are ‘mutated’ in silico and the generated hits are registered (a hit in this context is the presence of the altered 9mer in the composition, excluding the host). Each hit is assigned a ‘score’ proportional to the number and positions of induced mutations (Eq. (1)). The nearer the mutation is located to the 9mer’s 3′-end, the higher the score, whereas two mutations almost always score better than one (Fig. 3). The score is normalized by division by the highest score possible in this scheme. If no hit is registered, the maximum specificity of 1 is assigned to the candidate. If multiple hits are registered, the average of all scores is calculated. In this analysis, no more than two changed nucleotides are allowed because otherwise too many hits would be produced, resulting in a noisy and thus meaningless score. However, all primers in our experimental mismatch analysis (see Section 4) already showed a significant loss of product yield with two mismatches within their 3′-terminal 9mer.

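In the following, w is a candidate’s unique terminal 3′ 9mer, n its length (i.e. n = 9), i and j are nucleotide positions within w starting at 5′, w_x is w with a changed nucleotide at position x (and, correspondingly, w_{i,j} is w with changed nucleotides at positions i and j), hit(w_x) returns 1 if w_x is found in the list of all occurring 9mers and 0 otherwise, and h is the total number of hits produced.

Then a candidate’s specificity score S(w) is

\[
S(w) = \frac{\displaystyle\sum_{i=1}^{n} \sum_{j=1}^{n}
\begin{cases}
\mathrm{hit}(w_{i})\, b^{i} & \text{if } i = j \\
\mathrm{hit}(w_{i,j})\,\bigl(b^{i} + b^{j}\bigr) & \text{if } i \neq j
\end{cases}}
{\bigl(b^{n} + b^{n-1}\bigr)\, h}
\qquad (1)
\]

with base b for the exponential function set to the value 1.1. This produces the output shown in Fig. 3.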
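Eq. (1) can be sketched in Python as below. The sketch makes two interpretive assumptions that the text leaves open: every concrete substitution variant found in the composition counts as one hit, and each unordered position pair is visited once (the average over hits is unaffected as long as h is counted the same way). The function name `specificity_score` and the `known_9mers` set are hypothetical.

```python
from itertools import combinations, product

BASES = "ACGT"


def specificity_score(w, known_9mers, b=1.1):
    """Score a terminal 3' 9mer w against the set of 9mers occurring
    in the composition (host excluded). Positions are 1..n from 5'."""
    n = len(w)
    max_score = b ** n + b ** (n - 1)   # two mismatches at the 3'-end
    hit_scores = []
    for r in (1, 2):                    # one or two induced mutations
        for positions in combinations(range(n), r):
            alternatives = [[x for x in BASES if x != w[p]] for p in positions]
            for subs in product(*alternatives):
                mutant = list(w)
                for p, x in zip(positions, subs):
                    mutant[p] = x
                if "".join(mutant) in known_9mers:
                    hit_scores.append(sum(b ** (p + 1) for p in positions))
    if not hit_scores:
        return 1.0                      # no hit: maximum specificity
    return sum(hit_scores) / (max_score * len(hit_scores))
```

With b = 1.1, a single hit that differs only at position 9 scores 1.1^9 / (1.1^9 + 1.1^8) = 1.1/2.1 ≈ 0.52, whereas a single hit that differs only at position 1 scores about 0.24, which reflects the stronger weight of mismatches near the 3′-end.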

Note that the host genome cannot be considered in this kind of 9mer specificity analysis, because host genomes are generally large, and the existence of unique oligonucleotides (≤ 9 nt) is very unlikely in large genomes. However, for this ‘fine-grained’ specificity analysis we do not regard the exclusion of host genomes as critical, for several reasons. First, hybseek always guarantees the composition-wide uniqueness of all candidates, including the host genome (Fig. 1). Secondly, in PCR experiments it is fairly unlikely that both forward and reverse primers will cross-hybridize within a short, critical genomic distance. This is potentially a plausible scenario for other, closely related pathogens, but a rather unlikely one for the host genome. Thirdly, hybseek allows submission of an online NCBI BLAST search of any promising candidate by a single mouse click, which permits a quick, final specificity analysis against virtually all publicly available sequences, including the host genome.

3.4. Implementation

hybseek is implemented in C# and runs on an ASP.NET 2.0 platform. It accesses a Microsoft SQL Server 2005 database over NHibernate 1.0.1 middleware. The service was tested with Microsoft Internet Explorer 6.0 and Mozilla Firefox 1.0.7. It is currently hosted on a Windows 2003 Server with an AMD Opteron Processor 280 with 2.41 GHz and 3 GB RAM.

As genomes of host organisms may be large, preprocessed word lists for host genomes are stored in the database. This reduces computation time considerably. For example, hybseek is able to detect all 15mers contained in Escherichia coli but not in the human genome within 10 s on a 3 GHz Pentium 4 CPU, considering forward and reverse complementary sequences. Currently, host genomes of human, rat, mouse, fly, yeast, worm and E. coli are available. Note that preprocessing involves host genomes but not pathogen genomes, which allows the user to add sequence data to the composition ‘on the fly’.
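The role of the preprocessed host word list can be illustrated with a short sketch. A Python set stands in for the database table, words are stored in a strand-independent canonical form so that forward and reverse complementary sequences are covered by one lookup, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent key: the lexicographically smaller of the
    k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def build_word_list(genome, k=15):
    """Preprocessing step, done once per host genome."""
    genome = genome.upper()
    return {canonical(genome[i:i + k]) for i in range(len(genome) - k + 1)}


def host_free_positions(pathogen, host_words, k=15):
    """Start positions of pathogen k-mers absent from both host strands."""
    pathogen = pathogen.upper()
    return [i for i in range(len(pathogen) - k + 1)
            if canonical(pathogen[i:i + k]) not in host_words]
```

Because only the host list is precomputed, a newly uploaded pathogen sequence needs just one linear scan against it, which is consistent with adding entries ‘on the fly’.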

4. Results and discussion

4.1. Primer validation

To validate the usefulness of primers designed by hybseek experimentally, a screening for human papilloma viruses (HPVs) in cervical smears was performed. The HPV family was chosen as a test bed because many viruses of this family have been completely sequenced, and because this pathogen is highly relevant in gynecology. Complete genomes of five prevalent HPV types (6, 11, 16, 18 and 33) were obtained from NCBI. The XIST gene located on the human X chromosome (Xq13.2) was used as positive control. This gene encodes a very large exon, which increased the probability for hybseek to find a primer pair satisfying all predefined design constraints. Primers were designed with a Tm of 70 °C and a maximum primer length of 55 bp. Homo sapiens was configured as host organism. Supplementary Table S2 shows the number of computed candidates and possible amplicons for each target. Supplementary Table S3 lists the chosen primers. The primers were applied to DNA samples derived from clinically collected cervical smears. For the qPCR experimental setup and parameters, refer to Supplementary Information I1.

HPV infections were verified using the Digene Hybrid Capture 2 (hc2) test, although type-specific testing was not possible with this procedure. From 64 screened cervical smears, hc2 reported 21 (33%) HPV-positive in total, 7 low-risk and 14 high-risk HPV infections. With our primers we detected 12 of these: 1, 0, 8, 2, and 1 samples were reported as infected by HPV 6, 11, 16, 18 and 33, respectively. Because hc2 additionally screens for HPVs 31, 35, 39, 42, 43, 44, 45, 51, 52, 56, 58, 59 and 68, for which we had no primers designed, it is possible that our false negatives were among these types. All positive samples found with our primers could be confirmed by hc2.
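The counts reported above can be cross-checked with a few lines of Python. The derived detection fraction (12 of 21) is not a figure stated in the text and should be read with the caveat given there: hc2 covers HPV types for which no primers were designed.

```python
screened = 64
hc2_positive = {"low risk": 7, "high risk": 14}
detected_by_type = {"HPV 6": 1, "HPV 11": 0, "HPV 16": 8,
                    "HPV 18": 2, "HPV 33": 1}

hc2_total = sum(hc2_positive.values())
detected = sum(detected_by_type.values())

# Share of hc2-positive smears among all screened smears
print(f"hc2-positive: {hc2_total}/{screened} = {hc2_total / screened:.0%}")
# Share of hc2-positive smears also found with the five primer pairs
print(f"detected: {detected}/{hc2_total} = {detected / hc2_total:.0%}")
```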

4.2. Tool comparison

As stated above, currently no other software performs the same task as hybseek. Nevertheless, in simplified scenarios with only few and small genomic targets, other tools can be used if missing features are compensated for manually.

Primer3 (see footnote 2) is an accepted and widely used standard tool for primer design [17]. To assess the quality of the primers designed by hybseek, we used Primer3 to design reference primer pairs for HPV 6, 16 and 18 and compared their performance. Primer3 parameters were: product size min 400, opt 450, max 500; all other settings were left at their defaults. For each HPV genome, we let Primer3 identify the best primer pair. Primer uniqueness was ensured manually by NCBI BLAST.

As a second reference method, OligoWiz 2.0 (see footnote 3) was chosen [29]. OligoWiz is a popular and widely used tool for microarray probe design. Using OligoWiz as a representative, we tested the applicability of probe design tools for the problem described in this paper. Useful features of OligoWiz include cross-hybridization check, folding, low-complexity filtering and specificity testing against the human transcriptome. All settings were left at their defaults. Two high-scoring oligonucleotides were manually selected at an amplicon distance between 400 and 500 bp, and the reverse complement of one of the oligos was used as the reverse primer. Again, uniqueness was ensured manually by NCBI BLAST afterwards.

Supplementary Table S4 lists the primers designed by Primer3, OligoWiz and hybseek. They were applied to the same clinically collected cervical samples as used for HPV detection (see Section 4.1), predefined as HPV-positive and HPV-negative. To test the specificity of the obtained primers, qPCRs with an annealing temperature of 70 °C followed by a melting curve at predefined conditions were performed. Results are summarized in Table 1. Three of the primer pairs show two peaks instead of one. Only hybseek primer pairs performed as expected.

We conclude that the primers designed by hybseek performed better than those of Primer3 and OligoWiz. In at least one case, both Primer3 and OligoWiz produced primer pairs that were not specific to their respective targets. Although the sample size of this study is small, this points to the principal problem of a ‘naive’ approach and underlines the importance of a specialized tool like hybseek for the design of primers for diagnostic purposes.

2 http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www_slow.cgi/.
3 http://www.cbs.dtu.dk/services/OligoWiz2/.

Table 1 – Peaks produced by different primer sets.

Target   Program    C(t)    Major peak   Minor peak
HPV 18   Primer3    20.60   85 °C        n
HPV 18   hybseek    23.05   84 °C        n
HPV 18   OligoWiz   28.33   83 °C        n
HPV 6    Primer3    31.81   78 °C        82 °C
HPV 6    hybseek    30.54   82 °C        n
HPV 6    OligoWiz   30.12   84 °C        78 °C
HPV 16   Primer3    20.28   82 °C        n
HPV 16   hybseek    21.69   83 °C        n
HPV 16   OligoWiz   20.40   83 °C        78 °C

All qPCR reactions were performed under the same parameters and an annealing temperature of 70 °C. The C(t) value was set as 6 sigma within the first 20 cycles. Major peak and minor peak represent peaks determined by melting curve analysis (Supplementary Fig. 3). The major peak corresponds to the most prominent peak within the melting curve. The existence of a minor peak indicates unspecific amplification of a second target.

Fig. 1 – Primer design pipeline. First, all composition-wide unique 15mers are identified (considering the host genome). For each unique 15mer (or seed), both terminal 9mers are analyzed to assess the difference to all other 9mers found within the composition (without considering the host genome). For both terminal 9mers, this information is quantitatively expressed by a specificity score between 0 and 1. Only top-scoring seeds are further elongated towards 5′ (forward primer candidates) and 3′ (reverse primer candidates) until the configured melting temperature is reached. For reverse primer candidates, the reverse complement of the input sequence is taken afterwards (indicated by the switch of primer orientation). Along with several computed primer properties, the resulting list of candidates is presented to the user for final annotation, selection and download.

Fig. 2 – Primer boxes. We defined two distinct regions (boxes) that should contribute differently to a primer’s specificity. Mismatches in the hybridization-box (hyb-box) only account for a lowering of the melting temperature. Mismatches within the polymerization-box (pol-box) additionally interfere with the catalytic activity of the polymerase. We reasoned that the nearer mismatches are located to a primer’s 3′-end, the better for its specificity. The specificity score is exclusively based on the mismatches found in the pol-box. For our experimental specificity analysis, we further divided the pol-box into two distinct regions (P1: 4 nt, P2: 5 nt) and the hyb-box into four distinct regions (H1–H4, each 7 nt long).

Fig. 3 – Possible mismatch patterns and associated scores. Each row represents a 9mer with mismatch (black squares) and match positions (white squares). All possible combinations are ordered ascending by their assigned specificity scores. The order shown here is a consequence of the exponential function used to weigh mismatch positions (Eq. (1)). The more mismatches there are and the closer they are located to the 3′-end (right side), the higher the assigned score.
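The reading of Table 1 can be reproduced with a short Python tally. The dictionary simply transcribes the table (with `None` for the entry ‘n’); the function name `programs_with_minor_peaks` is hypothetical.

```python
# (target, program): (C(t), major peak in deg C, minor peak in deg C or None)
TABLE_1 = {
    ("HPV 18", "Primer3"): (20.60, 85, None),
    ("HPV 18", "hybseek"): (23.05, 84, None),
    ("HPV 18", "OligoWiz"): (28.33, 83, None),
    ("HPV 6", "Primer3"): (31.81, 78, 82),
    ("HPV 6", "hybseek"): (30.54, 82, None),
    ("HPV 6", "OligoWiz"): (30.12, 84, 78),
    ("HPV 16", "Primer3"): (20.28, 82, None),
    ("HPV 16", "hybseek"): (21.69, 83, None),
    ("HPV 16", "OligoWiz"): (20.40, 83, 78),
}


def programs_with_minor_peaks(table):
    """Group the targets that show a second melting peak by program."""
    flagged = {}
    for (target, program), (_, _, minor) in table.items():
        if minor is not None:
            flagged.setdefault(program, []).append(target)
    return flagged


print(programs_with_minor_peaks(TABLE_1))
```

The tally flags three primer pairs (Primer3 for HPV 6, OligoWiz for HPV 6 and HPV 16) and none designed by hybseek, in line with the statement that three primer pairs show two peaks.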

4.3. Primer mismatch analysis

To assess the validity of our specificity score, we inserted up to three nucleotide mismatches into the hybseek forward primers previously used for the tool comparison, and applied them to the same clinically collected cervical samples already predefined as HPV-positive (see Section 4.1). The reverse primers remained unchanged. Original and mutated primers are listed in Supplementary Table S5.

Results are shown in Fig. 4. Single nucleotide mismatches generally showed no influence on C(t), except for HPV6, where a C:C (primer:template) mismatch at the last position was very effective. Double mismatches had no significant influence when both mismatches were located upstream of the 7th nucleotide from 3′. However, for all primers that contain mismatches within their pol-box (downstream of the 9th nucleotide from 3′), the performance decreased rapidly. This is consistent with our assumption outlined in Fig. 2.

Fig. 4 – Influence of nucleotide mismatches on primer efficiency. Previously used forward primers of HPV6, 16 and 18 were ‘mutated’ with up to three mismatches at different locations and their efficiency was measured. The x-axis accounts for the different regions that carry these mismatches, according to the classification scheme of Fig. 2. The second line of the x-axis shows the range of specificity scores that has been computed for these mismatches. The y-axis depicts the number of PCR cycles after which a signal had been detected. For HPV6, primers (P2+P1) and (H1+P2+P1) produced no signal and their C(t) values were set to 40. PM: perfect match.

Fig. 4 also illustrates the specificity score assigned to each mutated primer. This computed score correlates well with the experimental results. In all cases, primers with a specificity score above 70% showed a significant decrease in PCR product yield, and therefore we recommend this value as a lower threshold for the selection of primer candidates.
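The region scheme used on the x-axis of Fig. 4 can be sketched as a small lookup. The layout is an assumption read from Fig. 2 and from labels such as (H1+P2+P1): P1 is taken to be the 3′-terminal 4 nt, P2 the next 5 nt, and H1 to H4 follow towards 5′ with 7 nt each. The function names are hypothetical, and the template site is expected in primer orientation.

```python
# Assumed layout, counted from the 3'-end: P1 (4 nt) and P2 (5 nt) form
# the 9-nt pol-box, H1..H4 (7 nt each) form the hyb-box.
REGIONS = [("P1", 4), ("P2", 5), ("H1", 7), ("H2", 7), ("H3", 7), ("H4", 7)]


def region_of(distance_from_3prime):
    """Map a 1-based distance from the 3'-end to its region name."""
    if distance_from_3prime < 1:
        raise ValueError("distance must be >= 1")
    upper = 0
    for name, length in REGIONS:
        upper += length
        if distance_from_3prime <= upper:
            return name
    return "outside"  # beyond the 37 nt covered by the scheme


def mismatch_regions(primer, template_site):
    """Regions that carry mismatches between a primer and an equally
    long template site given in primer orientation."""
    if len(primer) != len(template_site):
        raise ValueError("sequences must have equal length")
    n = len(primer)
    found = []
    for i, (p, t) in enumerate(zip(primer.upper(), template_site.upper())):
        if p != t:
            name = region_of(n - i)
            if name not in found:
                found.append(name)
    return found
```

A candidate filter following the recommendation above would then keep only primers whose specificity score (Section 3.3) is at least 0.7.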

4.4. In silico HPV case study

To assess the applicability of hybseek in cases where many closely related pathogens need to be discriminated, an in silico analysis of the whole Papillomaviridae family infecting humans was performed. It took 3 min for hybseek’s algorithm to successfully design unique primer candidates for all known 106 HPV variants (human host, non-target identity computation ‘off’). The number of found forward and reverse primer candidates with a configured melting temperature of 70 °C varies from 102 in HPV16 to over 2000 in HPV41 (unpublished data). Only 7.5% (subtypes 1a, 2a, 5b, 6a, 6b, 16 African type 1, 16 African type 2 and 16 Asian-American variant) of all HPVs were hard to discriminate with our design strategy. After configuring hybseek to group those with their closest relatives, common primer candidates could be successfully identified for these targets as well.

5. Conclusion

With currently available primer and probe design tools, the compilation of primer sets for SPA-based diagnostic multi-analyte assays is a time- and labor-intensive endeavor. We developed hybseek for this purpose, an integrated, interactive, and Web-based tool. Its efficient design algorithm delivers results within seconds, even for compositions with eukaryotic hosts and prokaryotic targets. Results from qPCR proved the high quality of designed primers. In particular, they showed high target-specificity, which is essential for discriminating closely related pathogens.

Nucleic acid-based diagnostics are gradually replacing or complementing culture-based, biochemical, and immunological assays in routine microbiology laboratories. With the ongoing maturation of parallelized SPA-based systems and their application in diagnostics, hybseek has great potential to become a very valuable tool for the scientific community.

Conflict of interest statement

All authors hereby state that they have no potential conflict of interest related to any for-profit company or institution in any way.

Acknowledgements

hybseek is the result of the research project MAAGYS, molecular biological screening (multi-analyte) assay for gynaecological smear diagnostic in routine laboratories (http://maagys.fh-hagenberg.at/). MAAGYS was partially funded as subproject of REGINS, a project within the European Community Initiative INTERREG III C, financed under the European Regional Development Fund (ERDF). We would like to thank the personnel of our research partner in Germany, the University of Heidelberg (Gynecological Hospital), for sample collection, molecular analysis, clinical data preparation, and manuscript revision; the team of Anagnostics Bioanalysis GmbH for assay prototyping; and the anonymous reviewers for their valuable comments and suggestions.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.cmpb.2008.12.007.

References

[1] I.M. Mackay, K.E. Arden, A. Nitsche, Real-time PCR in virology, Nucleic Acids Res. 30 (6) (2002) 1292–1305.

[2] T.J. Oh, C.J. Kim, S.K. Woo, T.S. Kim, D.J. Jeong, M.S. Kim, S. Lee, H.S. Cho, S. An, Development and clinical evaluation of a highly sensitive DNA microarray for detection and genotyping of human papillomaviruses, J. Clin. Microbiol. 42 (7) (2004) 3272–3280, http://dx.doi.org/10.1128/JCM.42.7.3272-3280.2004.

[3] D. Wang, L. Coscoy, M. Zylberberg, P.C. Avila, H.A. Boushey, D. Ganem, J.L. DeRisi, Microarray-based detection and genotyping of viral pathogens, Proc. Natl. Acad. Sci. U.S.A. 99 (24) (2002) 15687–15692, http://dx.doi.org/10.1073/pnas.242579699.

[4] C.W. Wong, C.L.W. Heng, L.W. Yee, S.W.L. Soh, C.B. Kartasasmita, E.A.F. Simoes, M.L. Hibberd, W.-K. Sung, L.D. Miller, Optimization and clinical validation of a pathogen detection microarray, Genome Biol. 8 (5) (2007) R93, http://dx.doi.org/10.1186/gb-2007-8-5-r93.

[5] D.R. Call, Challenges and opportunities for pathogen detection using DNA microarrays, Crit. Rev. Microbiol. 31 (2) (2005) 91–99.

[6] D. Bing, C. Boles, F. Rehman, M. Audeh, M. Belmarsh, B. Kelley, C. Adams, Bridge amplification: a solid phase PCR system for the amplification and detection of allelic differences in single copy genes, Genetic Identity Conference Proceedings, Seventh International Symposium on Human Identification, 1996. http://www.promega.com/geneticidproc/ussymp7proc/0726.html.

[7] C. Adessi, G. Matton, G. Ayala, G. Turcatti, J.J. Mermod, P. Mayer, E. Kawashima, Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms, Nucleic Acids Res. 28 (20) (2000) 87.

[8] G. Mitterer, M. Huber, E. Leidinger, C. Kirisits, W. Lubitz, M.W. Mueller, W.M. Schmidt, Microarray-based identification of bacteria in clinical samples by solid-phase PCR amplification of 23S ribosomal DNA sequences, J. Clin. Microbiol. 42 (3) (2004) 1048–1057.

[9] A.J. Alvarez, M.P. Buttner, G.A. Toranzos, E.A. Dvorsky, A. Toro, T.B. Heikes, L.E. Mertikas-Pifer, L.D. Stetzenbach, Use of solid-phase PCR for enhanced detection of airborne microorganisms, Appl. Environ. Microbiol. 60 (1) (1994) 374–376.

[10] L. Kaderali, A. Schliep, Selecting signature oligonucleotides to identify organisms using DNA arrays, Bioinformatics 18 (10) (2002) 1340–1349.


[11] E.K. Nordberg, Yoda: selecting signature oligonucleotides, Bioinformatics 21 (8) (2005) 1365–1370, http://dx.doi.org/10.1093/bioinformatics/bti182.

[12] W. Tembe, N. Zavaljevski, E. Bode, C. Chase, J. Geyer, L. Wasieloski, G. Benson, J. Reifman, Oligonucleotide fingerprint identification for microarray-based pathogen diagnostic assays, Bioinformatics 23 (1) (2007) 5–13, http://dx.doi.org/10.1093/bioinformatics/btl549.

[13] A.M. Phillippy, J.A. Mason, K. Ayanbule, D.D. Sommer, E. Taviani, A. Huq, R.R. Colwell, I.T. Knight, S.L. Salzberg, Comprehensive DNA signature discovery and validation, PLoS Comput. Biol. 3 (5) (2007) e98, http://dx.doi.org/10.1371/journal.pcbi.0030098.

[14] J.E. Allen, S.N. Gardner, T.R. Slezak, DNA signatures for detecting genetic engineering in bacteria, Genome Biol. 9 (3) (2008) R56, http://dx.doi.org/10.1186/gb-2008-9-3-r56.

[15] R.V. Satya, N. Zavaljevski, K. Kumar, J. Reifman, A high-throughput pipeline for designing microarray-based pathogen diagnostic assays, BMC Bioinform. 9 (2008) 185, http://dx.doi.org/10.1186/1471-2105-9-185.

[16] J.P. Fitch, S.N. Gardner, T.A. Kuczmarski, S. Kurtz, R. Myers, L.L. Ott, T.R. Slezak, E.A. Vitalis, A.T. Zemla, P.M. McCready, Rapid Dev. Nucleic Acid Diagn. 90 (11) (2002) 1708–1721.

[17] S. Rozen, H. Skaletsky, Primer3 on the WWW for general users and for biologist programmers, Methods Mol. Biol. 132 (2000) 365–386.

[18] NCBI, Primer-blast (2008). http://www.ncbi.nlm.nih.gov/tools/primer-blast/.

[19] P.M. Vallone, J.M. Butler, Autodimer: a screening tool for primer-dimer and hairpin structures, Biotechniques 37 (2) (2004) 226–231.

[20] Z. He, L. Wu, X. Li, M.W. Fields, J. Zhou, Empirical establishment of oligonucleotide probe design criteria, Appl. Environ. Microbiol. 71 (7) (2005) 3753–3760, http://dx.doi.org/10.1128/AEM.71.7.3753-3760.2005.

[21] A. Relógio, C. Schwager, A. Richter, W. Ansorge, J. Valcárcel, Optimization of oligonucleotide-based DNA microarrays, Nucleic Acids Res. 30 (11) (2002) e51.


[22] M.D. Kane, T.A. Jatkoe, C.R. Stumpf, J. Lu, J.D. Thomas, S.J. Madore, Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays, Nucleic Acids Res. 28 (22) (2000) 4552–4557.

[23] F. Miura, C. Uematsu, Y. Sakaki, T. Ito, A novel strategy to design highly specific PCR primers based on the stability and uniqueness of 3′-end subsequences, Bioinformatics 21 (24) (2005) 4363–4370, http://dx.doi.org/10.1093/bioinformatics/bti716.

[24] S. Ayyadevara, J.J. Thaden, R.J.S. Reis, Discrimination of primer 3′-nucleotide mismatch by Taq DNA polymerase during polymerase chain reaction, Anal. Biochem. 284 (1) (2000) 11–18, http://dx.doi.org/10.1006/abio.2000.4635.

[25] T.R. Hughes, M. Mao, A.R. Jones, J. Burchard, M.J. Marton, K.W. Shannon, S.M. Lefkowitz, M. Ziman, J.M. Schelter, M.R. Meyer, S. Kobayashi, C. Davis, H. Dai, Y.D. He, S.B. Stephaniants, G. Cavet, W.L. Walker, A. West, E. Coffey, D.D. Shoemaker, R. Stoughton, A.P. Blanchard, S.H. Friend, P.S. Linsley, Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer, Nat. Biotechnol. 19 (4) (2001) 342–347, http://dx.doi.org/10.1038/86730.

[26] J. SantaLucia, A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc. Natl. Acad. Sci. U.S.A. 95 (4) (1998) 1460–1465.

[27] M.M. Huang, N. Arnheim, M.F. Goodman, Extension of base mispairs by Taq DNA polymerase: implications for single nucleotide discrimination in PCR, Nucleic Acids Res. 20 (17) (1992) 4567–4573.

[28] S. Kwok, D.E. Kellogg, N. McKinney, D. Spasic, L. Goda, C. Levenson, J.J. Sninsky, Effects of primer-template mismatches on the polymerase chain reaction: human immunodeficiency virus type 1 model studies, Nucleic Acids Res. 18 (4) (1990) 999–1005.

[29] R. Wernersson, H.B. Nielsen, Oligowiz 2.0–integrating sequence feature annotation into the design of microarray probes, Nucleic Acids Res. 33 (Web Server issue) (2005) W611–W615, http://dx.doi.org/10.1093/nar/gki399.