problem definition

DNA Barcoding: Preliminary Studies with Avian Influenza Virus

J. Duitama1,*, D.M. Kumar2,*, E. Hemphill3, S. Babapoor2, I.I. Mandoiu1, C. Nelson3, and M.

1Computer Science & Engineering, 2Pathobiology & Veterinary Science, 3Molecular & Cell Biology, U. of Connecticut, *Contributed equally to this work

• Avian influenza belongs to the influenza type A genus of the Orthomyxoviridae family of RNA viruses. It is a highly mutable virus, with the Haemagglutinin (HA) and the Neuraminidase (NA) genes being the most variable. To date, 16 HA and 9 NA subtypes have been identified.

• There has been much recent work on developing rapid methods for detection and subtype identification of avian influenza infections. Nucleic acid based analyses such as the Polymerase Chain Reaction (PCR) are becoming the method of choice, largely replacing the much more labor- and time-consuming serotyping techniques.

• Common primer design packages such as Primer3 [6] are not well suited for designing PCR primers for subtype identification since they seek to amplify just one known target sequence, not an unknown target from the set of highly variable sequences that comprise a subtype.

• The high sequence variability within subtypes also renders unfeasible “common substring” approaches such as [2, 5].

• Methods for degenerate primer selection such as [3, 8] can be used to ensure amplification of a large fraction of known sequences of a given subtype, but they ignore primer specificity, i.e., preventing amplification of closely related virus subtypes.

• As in [2, 5], our tool takes as input sets of both target and non-target sequences. However, instead of searching for substrings shared by the target sequences as in [2, 5, 8], or for highly conserved regions in a multiple alignment of the target sequences as in [3], PrimerHunter ensures that selected primers amplify all target sequences and none of the non-target sequences by relying on accurate melting temperature computations based on the nearest-neighbor model of [7] and the fractional programming algorithm of [4].

• The open source C++ code, released under the GNU General Public License, as well as a web server for PrimerHunter are available at http://dna.engr.uconn.edu/software/PrimerHunter/

Problem Definition

Notations

• For a sequence s, we denote by |s| its length, and by s(l,i) the subsequence of length l ending at position i (i.e., s(l,i) = s_{i-l+1} … s_{i-1} s_i).

• We denote by T(p,t,i) the melting temperature of the duplex formed by a primer p and the Watson-Crick complement of t(|p|,i). In order to ensure sensitive amplification of target sequences, we require each selected primer p to have at least one position i within each target t such that T(p,t,i) is greater than or equal to a user-specified threshold Tmin_target. To avoid non-specific amplification, we further require a melting temperature T(p,t,i) below a user-specified threshold Tmax_nontarget at every position i of every non-target sequence t.

• Since mismatches at the 3’ end of the primer can significantly reduce amplification efficiency, we additionally require that the 3’ end of p match t(|p|,i) perfectly at a set of bases specified using a 0-1 perfect match mask M. For example, a mask M = 3’-1101-5’ specifies that the first, second, and fourth 3’-most bases of the primer must be matched exactly. For a primer p and a target sequence t, we denote by I(p,t,M) the set of positions i of t at which the 3’ end of p matches t(|p|,i) according to M. Thus, in order to ensure sensitive PCR amplification of target sequences, we require that a selected primer p have, for every target t, at least one position i ∈ I(p,t,M) for which T(p,t,i) ≥ Tmin_target.
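The set I(p,t,M) defined above can be computed by a direct scan of the target. The following is a minimal Python sketch, not PrimerHunter's C++ code; the function name `masked_positions` and the convention of writing the mask as a string read from the 3’ end (so `"1101"` stands for M = 3’-1101-5’) are assumptions made for illustration.

```python
def masked_positions(p, t, mask_3to5):
    """Return I(p,t,M) as a list of 1-based end positions i of t.

    p and t are 5'-3' strings. mask_3to5[k] == "1" means that the k-th base
    counted from the 3' end of p must equal the corresponding base of the
    window t(|p|, i). Assumes len(mask_3to5) <= len(p).
    """
    n = len(p)
    positions = []
    for i in range(n, len(t) + 1):
        window = t[i - n:i]  # t(|p|, i): the length-|p| substring ending at i
        if all(bit != "1" or p[-1 - k] == window[-1 - k]
               for k, bit in enumerate(mask_3to5)):
            positions.append(i)
    return positions


# With mask 1101 the third base from the 3' end is free to mismatch.
print(masked_positions("ACGT", "TTACGTAAGGT", "1101"))  # [6, 11]
```

Position 11 is accepted because the window AGGT differs from the primer only at the unmasked third base from the 3’ end.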

Primer design flowchart:

• Design forward primers and reverse primers

• Iterate over targets to build a hash table H of occurrences of seed patterns according to mask M

• Build candidates as suitable-length substrings of one or more target sequences

• Test each candidate p:
– Test GC content, GC clamp, single-base repeats, and self-complementarity
– For each target t, use H to build I(p,t,M) and test whether there exists i ∈ I(p,t,M) with T(p,t,i) ≥ Tmin_target
– For each complement of a non-target t’, test whether T(p,t’) < Tmax_nontarget

• Make pairs, filtering by product length, cross-dimerization, and Tm
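The hash table of seed patterns and the candidate enumeration from the flowchart can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the tool's actual data structure: the names `seed_at`, `build_seed_table`, `positions_from_table`, and `candidates` are hypothetical, the mask is written as a string read from the 3’ end, and the seed of a window is taken to be the bases at the masked offsets.

```python
from collections import defaultdict


def seed_at(seq, end, mask_3to5):
    """Bases of seq at the masked offsets, counted back from 1-based position end."""
    return "".join(seq[end - 1 - k] for k, bit in enumerate(mask_3to5) if bit == "1")


def build_seed_table(targets, mask_3to5):
    """Map each seed pattern to the (target index, end position) pairs where it occurs."""
    table = defaultdict(list)
    for t_idx, t in enumerate(targets):
        for end in range(len(mask_3to5), len(t) + 1):
            table[seed_at(t, end, mask_3to5)].append((t_idx, end))
    return table


def positions_from_table(table, p, t_idx, mask_3to5):
    """I(p,t,M) for target number t_idx, read off the table instead of rescanning t.

    Assumes len(mask_3to5) <= len(p).
    """
    key = seed_at(p, len(p), mask_3to5)
    return [end for idx, end in table.get(key, []) if idx == t_idx and end >= len(p)]


def candidates(targets, min_len, max_len):
    """All substrings of the targets with length in [min_len, max_len]."""
    return {t[s:s + n] for t in targets
            for n in range(min_len, max_len + 1)
            for s in range(len(t) - n + 1)}


table = build_seed_table(["TTACGTAAGGT"], "1101")
print(positions_from_table(table, "ACGT", 0, "1101"))  # [6, 11]
```

Building the table once per mask means each candidate costs one lookup per target rather than a full scan, which is the apparent purpose of the hash table step in the flowchart.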

Given

• Sets TARGETS and NONTARGETS of 5’ – 3’ DNA sequences, perfect match mask M, melting temperature thresholds Tmin_target and Tmax_nontarget, and constraints on primer length, GC content, self-complementarity, etc.

Find

• Primers p satisfying the given constraints on primer length, GC content, self-complementarity, etc., such that:
– For every t ∈ TARGETS, there exists i ∈ I(p,t,M) such that T(p,t,i) ≥ Tmin_target, and
– For every t ∈ NONTARGETS, T(p,t,i) < Tmax_nontarget for every i ∈ {|p|,…,|t|}
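The two conditions of the problem statement translate directly into a predicate over a candidate primer. The sketch below is in Python and assumes a pluggable melting temperature function; `toy_tm` is a deliberately crude stand-in (a 2/4 rule counted over matching positions) and is not the nearest-neighbor model of [7]. All function names are hypothetical.

```python
def masked_positions(p, t, mask_3to5):
    """I(p,t,M): 1-based end positions where the 3' end of p matches t per the mask."""
    n = len(p)
    return [i for i in range(n, len(t) + 1)
            if all(bit != "1" or p[-1 - k] == t[i - 1 - k]
                   for k, bit in enumerate(mask_3to5))]


def toy_tm(p, window):
    """Toy stand-in for T(p,t,i): 4 per matching G/C, 2 per matching A/T."""
    return sum(4 if a in "GC" else 2 for a, b in zip(p, window) if a == b)


def is_valid_primer(p, targets, nontargets, mask_3to5,
                    tmin_target, tmax_nontarget, tm=toy_tm):
    n = len(p)
    # Sensitivity: every target needs some i in I(p,t,M) with T(p,t,i) >= Tmin_target.
    for t in targets:
        if not any(tm(p, t[i - n:i]) >= tmin_target
                   for i in masked_positions(p, t, mask_3to5)):
            return False
    # Specificity: T(p,t,i) < Tmax_nontarget at every position of every non-target.
    for t in nontargets:
        if any(tm(p, t[i - n:i]) >= tmax_nontarget for i in range(n, len(t) + 1)):
            return False
    return True


targets = ["GGACGTACGTGG", "TTACGTACGTAA"]
nontargets = ["ACGAACTTTTTT"]
print(is_valid_primer("ACGTACGT", targets, nontargets, "11", 20, 20))  # True
```

Passing the temperature function as a parameter keeps the selection logic independent of the thermodynamic model, so a nearest-neighbor implementation could replace `toy_tm` without touching the predicate.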

Input Sequences

• We used all complete Avian Influenza HA sequences from North America available in the NCBI flu database [1] as of March 2008 (a total of 574 sequences spanning the 14 subtypes shown in the adjacent phylogenetic tree)

• For each subtype Hi, we used all available Hi sequences as targets, and all sequences of a different subtype as non-targets

[Figure: phylogenetic tree of HA subtypes; visible labels: H6, H16, H13]

Parameters

• Primer length 20 - 25 bases

• Amplicon length 75 - 200 bp

• GC content 25% - 75%

• Maximum mononucleotide repeat 5

• Mask M = 11

• No required 3’ GC clamp

• Primer concentration 0.8 μM

• Salt concentration 50 mM

• Tmin_target = Tmax_nontarget = 40 °C
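As an illustration of how the sequence-level parameters above might be applied to a candidate, here is a small Python sketch. The class and function names are hypothetical, and "maximum mononucleotide repeat 5" is interpreted as allowing runs of up to five identical bases, which is an assumption. The GC clamp, self-complementarity, and thermodynamic checks are left out.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class DesignParams:
    min_len: int = 20       # primer length 20 - 25
    max_len: int = 25
    min_gc: float = 0.25    # GC content 25% - 75%
    max_gc: float = 0.75
    max_repeat: int = 5     # longest allowed run of one base (assumed meaning)


def gc_fraction(p):
    """Fraction of G and C bases in the primer."""
    return sum(base in "GC" for base in p) / len(p)


def longest_run(p):
    """Length of the longest run of a single repeated base."""
    return max(len(list(group)) for _, group in groupby(p))


def passes_basic_filters(p, params=DesignParams()):
    return (params.min_len <= len(p) <= params.max_len
            and params.min_gc <= gc_fraction(p) <= params.max_gc
            and longest_run(p) <= params.max_repeat)


print(passes_basic_filters("ACGTACGTACGTACGTACGT"))  # True
print(passes_basic_filters("AAAAAAGTACGTACGTACGT"))  # False: run of six A's
```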

Results

• Suitable pairs for subtype-specific PCR amplification were found for every Hi (Table 1)

• The number of primers and pairs found decreases as the variability in the targets increases

• Primer design was performed under the same conditions for the nine known NA subtypes, finding suitable pairs for every Ni (Table 2)

Subtype        H1    H2    H3    H4    H5    H6    H7    H8    H9   H10   H11   H12   H13   H16
Targets        48    41    72    67    69   100    55     9    23    16    45    15    10     4
Non Targets   526   533   502   507   505   474   519   565   551   558   529   559   564   570
FP             51    42    41   265    68    36    77   489   140   243   267   472    41   367
RP             52    43    61   225    66    27    81   482   152   302   262   494    33   352
Pairs          70   187   135  3724   160     3   260 14415  1222  3712  4117 12895    98  7629

(FP: forward primers; RP: reverse primers)

Table 1: Number of primer pairs for HA subtypes of Avian Influenza

Subtype        N1    N2    N3    N4    N5    N6    N7    N8    N9
Targets       110   241    65    15    32    77    22    84    42
Non Targets   578   447   623   673   656   611   666   604   646
FP             97    77    45   370   355    29    97   140   292
RP             71    44    61   360   353    43   103   211   305
Pairs         553   234   113  9665  8380     7   480  1785  6310

Table 2: Number of primer pairs for NA subtypes of Avian Influenza

• Thermodynamic Alignment Problem: Given a DNA sequence s1 in 5’ – 3’ orientation and a DNA sequence s2 in 3’ – 5’ orientation, find the pairwise alignment of s1 and s2 that maximizes the melting temperature according to SantaLucia’s model [7].

• Fractional Programming: Given a finite set S and two functions f, g: S → ℝ, if g is positive, t = f(y)/g(y) for some y ∈ S, and max_{x∈S} (f(x) − t·g(x)) = 0, then t = max_{x∈S} (f(x)/g(x)). If the left-hand term can be efficiently maximized, an iterative process can be applied to find argmax_{x∈S} (f(x)/g(x)) [4].

• Tm Calculation Using Fractional Programming: S is the set of all possible alignments between s1 and s2. PrimerHunter uses the melting temperature formula of [7], which includes salt concentration effects not considered in [4]:

$$T_m(x) = \frac{\Delta H(x)}{\Delta S(x) + 0.368 \cdot \frac{N}{2} \cdot \ln(\mathrm{Na}^+) + R \ln(C)}$$

where $C$ is $c_1 - c_2/2$ if $c_1 \neq c_2$ and $(c_1 + c_2)/4$ if $c_1 = c_2$

• The figure below shows the distribution of differences (in degrees Celsius) between experimental melting temperatures and predictions obtained by fractional programming without salt correction [4] and with the salt correction of [7], for 812 duplexes without mismatches
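The fractional programming iteration and the salt-corrected formula can be combined in a short Python sketch. It rests on several assumptions: the finite set S is replaced by three hypothetical (ΔH, ΔS) pairs instead of real alignments, the inner maximization is a brute-force scan where the tool would need an efficient alignment procedure, ΔH is taken in cal/mol and ΔS in cal/(mol·K) so that the result is in kelvin, and N is treated as a plain input because this text does not define it. Signs are flipped (f = −ΔH, g = −denominator) so that g is positive, as the fractional programming statement requires.

```python
import math

R = 1.987  # gas constant in cal/(mol*K); the unit choice is an assumption


def effective_concentration(c1, c2):
    """C as given in the text: c1 - c2/2 if c1 != c2, else (c1 + c2)/4."""
    return (c1 + c2) / 4 if c1 == c2 else c1 - c2 / 2


def melting_temperature(dH, dS, N, na, c1, c2):
    """Tm = dH / (dS + 0.368*(N/2)*ln(Na+) + R*ln(C))."""
    C = effective_concentration(c1, c2)
    return dH / (dS + 0.368 * (N / 2) * math.log(na) + R * math.log(C))


def maximize_ratio(items, f, g, tol=1e-9, max_iter=100):
    """Iteratively find argmax f(x)/g(x) over a finite set, assuming g > 0."""
    items = list(items)
    best = items[0]
    for _ in range(max_iter):
        t = f(best) / g(best)
        challenger = max(items, key=lambda x: f(x) - t * g(x))
        if f(challenger) - t * g(challenger) <= tol:
            return best, t  # max of f - t*g is 0, so t is the optimal ratio
        best = challenger
    return best, f(best) / g(best)


# Hypothetical (dH, dS) values for three alignments; not measured data.
ALIGNMENTS = [(-120000.0, -340.0), (-150000.0, -420.0), (-90000.0, -260.0)]
N, NA, C1, C2 = 40, 0.05, 0.8e-6, 0.8e-6
CORRECTION = (0.368 * (N / 2) * math.log(NA)
              + R * math.log(effective_concentration(C1, C2)))

best, tm = maximize_ratio(ALIGNMENTS,
                          f=lambda x: -x[0],
                          g=lambda x: -(x[1] + CORRECTION))
print(best, round(tm, 2))
```

Each round replaces the current ratio t by the ratio of the element maximizing f(x) − t·g(x), and the loop stops once that maximum is zero, which is the optimality condition stated in the Fractional Programming bullet.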

• Three primer pairs were selected for H3, H5 and H7 subtypes, and HA segments from isolates of these 3 subtypes were cloned into plasmids

• Real time PCR was performed with each combination of primer pair and plasmid type. Amplification curves for an H3-specific primer against H3, H5 and H7 templates and the minimum, maximum, and average difference in Ct value compared to a no-template control for H3, H5 and H7-specific primers are presented below.

• PrimerHunter is a new tool to design primers for subtype identification via PCR experiments.

• Accurate melting temperature estimates allowing for mismatches are obtained using the nearest-neighbor model of [7] and the fractional programming approach of [4]; these are critical for achieving high primer sensitivity and specificity for rapidly mutating viruses

• Primers have been successfully designed for 14 Avian Influenza subtypes, and have been validated using real-time PCR experiments with virus isolates cloned into plasmids.

• In ongoing work we seek to incorporate group testing techniques for reducing the number of reactions needed for unambiguous subtype identification.


Introduction Algorithmic Details

• Triplicate real time PCR reactions were performed with ten different dilutions of each plasmid type. The plot below gives both on-target and off-target delta Ct values at different plasmid dilutions for H3, H5 and H7-specific primer pairs

Experimental Validation Primer Design Conclusions
