pairwise and multiple sequence alignment lesson 2
Post on 20-Jan-2016
Pairwise and Multiple Sequence Alignment
Lesson 2
|| || ||||| ||| || || |||||||||||||||||||MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFE…
ATGGTGAACCTGACCTCTGACGAGAAGACTGCCGTCCTTGCCCTGTGGAACAAGGTGGACGTGGAAGACTGTGGTGGTGAGGCCCTGGGCAGGTTTGTATGGAGGTTACAAGGCTGCTTAAGGAGGGAGGATGGAAGCTGGGCATGTGGAGACAGACCACCTCCTGGATTTATGACAGGAACTGATTGCTGTCTCCTGTGCTGCTTTCACCCCTCAGGCTGCTGGTCGTGTATCCCTGGACCCAGAGGTTCTTTGAAAGCTTTGGGGACTTGTCCACTCCTGCTGCTGTGTTCGCAAATGCTAAGGTAAAAGCCCATGGCAAGAAGGTGCTAACTTCCTTTGGTGAAGGTATGAATCACCTGGACAACCTCAAGGGCACCTTTGCTAAACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAATTTCAAGGTGAGTCAATATTCTTCTTCTTCCTTCTTTCTATGGTCAAGCTCATGTCATGGGAAAAGGACATAAGAGTCAGTTTCCAGTTCTCAATAGAAAAAAAAATTCTGTTTGCATCACTGTGGACTCCTTGGGACCATTCATTTCTTTCACCTGCTTTGCTTATAGTTATTGTTTCCTCTTTTTCCTTTTTCTCTTCTTCTTCATAAGTTTTTCTCTCTGTATTTTTTTAACACAATCTTTTAATTTTGTGCCTTTAAATTATTTTTAAGCTTTCTTCTTTTAATTACTACTCGTTTCCTTTCATTTCTATACTTTCTATCTAATCTTCTCCTTTCAAGAGAAGGAGTGGTTCACTACTACTTTGCTTGGGTGTAAAGAATAACAGCAATAGCTTAAATTCTGGCATAATGTGAATAGGGAGGACAATTTCTCATATAAGTTGAGGCTGATATTGGAGGATTTGCATTAGTAGTAGAGGTTACATCCAGTTACCGTCTTGCTCATAATTTGTGGGCACAACACAGGGCATATCTTGGAACAAGGCTAGAATATTCTGAATGCAAACTGGGGACCTGTGTTAACTATGTTCATGCCTGTTGTCTCTTCCTCTTCAGCTCCTGGGCAATATGCTGGTGGTTGTGCTGGCTCGCCACTTTGGCAAGGAATTCGACTGGCACATGCACGCTTGTTTTCAGAAGGTGGTGGCTGGTGTGGCTAATGCCCTGGCTCACAAGTACCATTGA
MVNLTSDEKTAVLALWNKVDVEDCGGEALGRLLVVYPWTQRFFE…
Motivation
What is sequence alignment?
Alignment: comparing two (pairwise) or more (multiple) sequences, searching for a series of identical or similar characters in the sequences.
MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE
Why perform a pairwise sequence alignment?
• Finding homology between two sequences
• e.g., predicting characteristics of a protein, premised on: similar sequence (or structure) → similar function
Local vs. Global
• Local alignment: finds regions of high similarity in parts of the sequences
• Global alignment: finds the best alignment across the entire two sequences
ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN-CDRYYQ

ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN CDRYYQ
Evolutionary changes in sequences
Three types of nucleotide changes:
1. Substitution: a replacement of one (or more) sequence characters by another
2. Insertion: an insertion of one (or more) sequence characters
3. Deletion: a deletion of one (or more) sequence characters
Insertion + Deletion → Indel
[Figure: example sequences for the three change types; fragments shown: T, A, AAGA, AACA, AAG, GA, AA]
Choosing an alignment
Many different alignments between two sequences are possible:
AAGCTGAATTCGAA
AGGCTCATTTCTGA

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

. . .
How do we determine which is the best alignment?
Toy exercise
Compute the score of each of the following alignments using this naïve scoring scheme:
Match: +1   Mismatch: -2   Indel: -1

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
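The toy exercise can be checked mechanically. The following is a minimal sketch, assuming that every column containing a gap character counts as one indel; the function name `score_alignment` is illustrative and not part of the lesson.

```python
def score_alignment(row1, row2, match=1, mismatch=-2, indel=-1):
    """Sum the column scores of a gapped pairwise alignment ('-' marks a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total += indel      # one residue facing a gap
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # two different residues
    return total

# The two candidate alignments from the exercise.
first = score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-")
second = score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-")
print(first, second)
```

Running it on both candidates shows which one this naïve scheme prefers; changing the three keyword arguments shows how sensitive that choice is to the scheme.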
Scoring scheme:
Substitution matrix
     A   C   G   T
A    1  -2  -2  -2
C   -2   1  -2  -2
G   -2  -2   1  -2
T   -2  -2  -2   1
Gap penalty (opening = extending)
Substitution matrices: accounting for biological context
Which best reflects the biological reality regarding nucleotide mismatch penalty?
1. Tr > Tv > 0
2. Tv > Tr > 0
3. 0 > Tr > Tv
4. 0 > Tv > Tr
Tr = Transition
Tv = Transversion
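For reference, a transition swaps two purines (A, G) or two pyrimidines (C, T), while a transversion swaps a purine for a pyrimidine or vice versa. A small sketch of that classification follows; the function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(a, b):
    """Label a nucleotide pair as 'match', 'transition' or 'transversion'."""
    a, b = a.upper(), b.upper()
    if a == b:
        return "match"
    pair = {a, b}
    # Both bases in the same chemical class -> transition.
    if pair.issubset(PURINES) or pair.issubset(PYRIMIDINES):
        return "transition"
    return "transversion"

for x, y in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(x, y, classify_substitution(x, y))
```

A nucleotide substitution matrix that distinguishes Tr from Tv can be built by assigning one score per label.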
Scoring schemes: accounting for biological context
Which best reflects the biological reality regarding these mismatch penalties?
1. Arg->Lys > Ala->Phe
2. Arg->Lys > Thr->Asp
3. Asp->Val > Asp->Glu
PAM matrices
• Family of matrices: PAM 80, PAM 120, PAM 250, …
• The number with a PAM matrix (the n in PAMn) represents the evolutionary distance between the sequences on which the matrix is based
• The (i-th, j-th) cell in a PAMn matrix denotes the probability that amino acid i will be replaced by amino acid j in time n: P_{i→j, n}
• Greater n numbers denote greater distances
PAM - limitations
• Based on only one original dataset
• Examines proteins with few differences (85% identity)
• Based mainly on small globular proteins, so the matrix is biased
BLOSUM matrices
• Different BLOSUMn matrices are calculated independently from BLOCKS (ungapped, manually created local alignments)
• BLOSUMn is based on a cluster of BLOCKS of sequences that share at least n percent identity
• The (i-th, j-th) cell in a BLOSUM matrix denotes the log-odds of the observed versus the expected frequency of amino acids i and j in the same position in the data: log(P_ij / (q_i * q_j))
• Higher n numbers denote higher identity between the sequences on which the matrix is based
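The log-odds formula can be tried on made-up numbers. This is a sketch only: the frequencies below are hypothetical, and base 2 is an assumption, since the slide writes "log" without a base.

```python
import math

def log_odds(p_ij, q_i, q_j):
    """log2 of observed pair frequency over the frequency expected by chance."""
    return math.log2(p_ij / (q_i * q_j))

# Hypothetical frequencies: each residue has background frequency 0.25.
print(log_odds(0.125, 0.25, 0.25))    # pair seen twice as often as expected
print(log_odds(0.03125, 0.25, 0.25))  # pair seen half as often as expected
```

A positive value means the pair co-occurs more often than chance predicts, a negative value means less often.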
PAM vs. BLOSUM
PAM100 = BLOSUM90
PAM120 = BLOSUM80
PAM160 = BLOSUM60
PAM200 = BLOSUM52
PAM250 = BLOSUM45
(moving down the list: more distant sequences)
• BLOSUM62 for general use, BLOSUM80 for close relations, BLOSUM45 for distant relations
• PAM120 for general use, PAM60 for close relations, PAM250 for distant relations
Substitution matrices exercise
Pick the best substitution matrix (PAM and BLOSUM) for each pairwise alignment:
• Human - chimp
• Human - yeast
• Human - fish
PAM options: PAM60 PAM120 PAM250
BLOSUM options: BLOSUM45 BLOSUM62 BLOSUM80
Substitution matrices
• Nucleic acids: transition-transversion
• Amino acids:
  - Evolutionary (empirical data) based: PAM, BLOSUM
  - Physico-chemical properties based: Grantham, McLachlan
Gap penalty
AAGCGAAATTCGAAC
A-G-GAA-CTCGAAC

AAGCGAAATTCGAAC
AGG---AACTCGAAC
• Which alignment has a higher score?
• Which alignment is more likely?
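One way to tell the two alignments apart is to count how many separate gaps each opens. The sketch below assumes an affine gap penalty (a cost to open a gap plus a smaller cost per extra position); the values -5 and -1 are hypothetical and not taken from the lesson.

```python
import re

def gap_runs(row):
    """Lengths of the maximal runs of '-' in one aligned row."""
    return [len(m.group()) for m in re.finditer(r"-+", row)]

def affine_gap_cost(row, gap_open=-5, gap_extend=-1):
    """Charge gap_open once per run and gap_extend for each further position."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(row))

scattered = "A-G-GAA-CTCGAAC"   # three separate one-position gaps
contiguous = "AGG---AACTCGAAC"  # one three-position gap
print(gap_runs(scattered), affine_gap_cost(scattered))
print(gap_runs(contiguous), affine_gap_cost(contiguous))
```

With opening equal to extending, both rows pay the same total gap cost; once opening is costlier than extending, the single contiguous gap is penalized less.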
Pairwise alignment algorithm matrix representation: formulation
2 sequences, S1 and S2, and a scoring scheme: match = 1, mismatch = -1, gap = -2
V[i,j] = value of the optimal alignment between S1[1…i] and S2[1…j]
$$
V[i+1,j+1] = \max \begin{cases}
V[i,j] + S(\text{S1}[i+1],\ \text{S2}[j+1]) \\
V[i+1,j] + S(\text{gap}) \\
V[i,j+1] + S(\text{gap})
\end{cases}
$$
Each cell V[i+1,j+1] is computed from its three neighbours: V[i,j], V[i+1,j] and V[i,j+1].
Pairwise alignment algorithm matrix representation: initialization
Scoring scheme: match = 1, mismatch = -1, indel (gap) = -2
(S1 and S2: the sequence AGC runs across the top, AAAC down the side)

             A    G    C
        0    1    2    3
   0    0   -2   -4   -6
A  1   -2
A  2   -4
A  3   -6
C  4   -8
Pairwise alignment algorithm matrix representation: filling the matrix
Scoring scheme: match = 1, mismatch = -1, indel (gap) = -2
(S1 and S2: the sequence AGC runs across the top, AAAC down the side)

             A    G    C
        0    1    2    3
   0    0   -2   -4   -6
A  1   -2    1   -1   -3
A  2   -4   -1    0   -2
A  3   -6   -3   -2   -1
C  4   -8   -5   -4   -1
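The initialization and filling steps can be reproduced in a few lines. This is a sketch of the recurrence (commonly known as Needleman-Wunsch global alignment); the function name and the choice to let the first argument index the rows are assumptions made for illustration.

```python
def fill_matrix(s1, s2, match=1, mismatch=-1, gap=-2):
    """Return V, where V[i][j] is the best score for s1[:i] versus s2[:j]."""
    rows, cols = len(s1) + 1, len(s2) + 1
    V = [[0] * cols for _ in range(rows)]
    # Initialization: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        V[i][0] = i * gap
    for j in range(1, cols):
        V[0][j] = j * gap
    # Filling: each cell takes the best of diagonal, upper and left neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + gap,
                          V[i][j - 1] + gap)
    return V

matrix = fill_matrix("AAAC", "AGC")
for row in matrix:
    print(row)
```

The printed rows can be compared cell by cell with the filled matrix on the slide; the bottom-right cell is the score of the optimal global alignment.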
Pairwise alignment algorithm matrix representation: trace back
(S1 and S2: the sequence AGC runs across the top, AAAC down the side)

             A    G    C
        0    1    2    3
   0    0   -2   -4   -6
A  1   -2    1   -1   -3
A  2   -4   -1    0   -2
A  3   -6   -3   -2   -1
C  4   -8   -5   -4   -1

Resulting alignment:
AAAC
AG-C
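The trace back can be sketched as a walk from the bottom-right cell to the top-left cell, at each step asking which neighbour could have produced the current value. Several optimal alignments can exist; the tie-breaking order used here (upper neighbour, then diagonal, then left) is an assumption chosen for illustration.

```python
def global_align(s1, s2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s1, aligned_s2) for one optimal global alignment."""
    n, m = len(s1), len(s2)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * gap
    for j in range(1, m + 1):
        V[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub, V[i - 1][j] + gap, V[i][j - 1] + gap)

    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and V[i][j] == V[i - 1][j] + gap:
            # Came from the upper cell: residue of s1 faces a gap.
            top.append(s1[i - 1]); bottom.append("-"); i -= 1
        elif i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + (
                match if s1[i - 1] == s2[j - 1] else mismatch):
            # Came from the diagonal cell: two residues are paired.
            top.append(s1[i - 1]); bottom.append(s2[j - 1]); i -= 1; j -= 1
        else:
            # Came from the left cell: residue of s2 faces a gap.
            top.append("-"); bottom.append(s2[j - 1]); j -= 1
    return V[n][m], "".join(reversed(top)), "".join(reversed(bottom))

score, row1, row2 = global_align("AAAC", "AGC")
print(score)
print(row1)
print(row2)
```

A different tie-breaking order can return a different alignment with the same score.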
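Assessing the significance of an alignment score
[Figure: the score of the true alignment is compared with the scores of alignments of randomized (shuffled) versions of the same sequences. Scores shown: 28.0, 26.0, 16.0. Labels: True, Random.]
True:
AAGCTGAATTCGAA
AGGCTCATTTCTGA
aligned as
AAGCTGAATTC-GAA
AGGCTCATTTCTGA-

Random:
AGATCAGTAGACTA
GAGTAGCTATCTCT
aligned as
AGATCAGTAGACTA-----
----GAGTAG-CTATCTCT
.
.
CGATAGATAGCATA
GCATGTCATGATTC
aligned as
CGATAGATAGCATA---------
---------GCATGTCATGATTC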
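The comparison of a true score against scores from randomized sequences can be sketched as a shuffling test. The scoring scheme (match = 1, mismatch = -1, gap = -2), the number of shuffles and the seed are all assumptions, so the scores produced here are not expected to match the numbers in the figure.

```python
import random

def global_score(s1, s2, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score, keeping only one matrix row in memory."""
    prev = [j * gap for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, start=1):
        curr = [i * gap]
        for j, b in enumerate(s2, start=1):
            sub = match if a == b else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def shuffled(seq, rng):
    """Return the same characters in random order (composition is preserved)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def empirical_p_value(s1, s2, n_shuffles=200, seed=0):
    """Fraction of shuffled pairs scoring at least as high as the true pair."""
    rng = random.Random(seed)
    true_score = global_score(s1, s2)
    random_scores = [global_score(shuffled(s1, rng), shuffled(s2, rng))
                     for _ in range(n_shuffles)]
    as_good = sum(score >= true_score for score in random_scores)
    # Add-one correction so the estimate is never exactly zero.
    return true_score, (as_good + 1) / (n_shuffles + 1)

true_score, p_value = empirical_p_value("AAGCTGAATTCGAA", "AGGCTCATTTCTGA")
print(true_score, p_value)
```

A small fraction suggests the true score is unlikely to arise from sequences of the same composition by chance alone.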
Web servers for pairwise alignment
• BLAST 2 sequences (bl2seq) at NCBI
• Produces the local alignment of two given sequences using the BLAST (Basic Local Alignment Search Tool) engine for local alignment
• Does not use an exact algorithm but a heuristic
Back to NCBI
BLAST - bl2seq
bl2seq - query
• blastn: nucleotide
• blastp: protein
bl2seq results
[Legend: Match, Dissimilarity, Similarity, Gaps, Low complexity]
Query type: AA or DNA?
For coding sequences, AA (protein) data are better:
• Selection operates most strongly at the protein level → the homology is more evident
• AA: 20-character alphabet; DNA: 4-character alphabet → lower chance of random homology for AA
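The alphabet-size argument can be put in numbers. The sketch below assumes independent positions and equal frequencies for every character, which real sequences do not satisfy, so it is only an order-of-magnitude illustration.

```python
def chance_of_identical_run(alphabet_size, run_length):
    """Probability that run_length aligned positions all match by chance."""
    return (1 / alphabet_size) ** run_length

for k in (1, 5, 10):
    dna = chance_of_identical_run(4, k)
    protein = chance_of_identical_run(20, k)
    print(k, dna, protein)
```

Under these assumptions a chance match is far less likely for every run length in the 20-character alphabet than in the 4-character one.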
BLAST - programs
Query: DNA or protein
Database: DNA or protein
BLAST - blastp
blastp - results
blastp - results (cont.)
BLAST scores:
• Bit score: a score for the alignment according to the number of similarities, identities, etc. It has a standard set of units and is thus independent of the scoring scheme
• Expect value (E-value): the number of alignments with the same or higher score one can "expect" to see by chance when searching a random database of a particular size with a random sequence of a particular length. The closer the E-value is to zero, the greater the confidence that the hit is really a homolog
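The two scores are linked by the standard BLAST statistics relation E = m × n × 2^(-S'), where S' is the bit score, m the query length and n the database length. The slide does not state this formula; the sketch below also uses raw lengths, whereas BLAST itself uses corrected "effective" lengths, so its reported E-values will differ.

```python
def expected_hits(bit_score, query_length, database_length):
    """E = m * n * 2**(-S'): chance hits expected at or above this bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)

# Hypothetical search: a 1,024-residue query against 1,024 database residues.
print(expected_hits(10, 1024, 1024))
print(expected_hits(40, 1024, 1024))
```

Each extra bit halves the E-value, and a larger database raises it proportionally.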
Multiple Sequence Alignment (MSA)
Seq1 VTISCTGSSSNIGAG-NHVKWYQQLPG
Seq2 VTISCTGTSSNIGS--ITVNWYQQLPG
Seq3 LRLSCSSSGFIFSS--YAMYWVRQAPG
Seq4 LSLTCTVSGTSFDD--YYSTWVRQPPG
Seq5 PEVTCVVVDVSHEDPQVKFNWYVDG--
Seq6 ATLVCLISDFYPGA--VTVAWKADS--
Seq7 AALGCLVKDYFPEP--VTVSWNSG---
Seq8 VSLTCLVKGFYPSD--IAVEWWSNG--
Similar to pairwise alignment, BUT n sequences are aligned instead of just 2
Multiple sequence alignment
• Each row represents an individual sequence
• Each column represents the 'same' position
Why perform an MSA?
MSAs are at the heart of comparative genomics studies which seek to study evolutionary histories, functional and structural aspects of sequences, and to understand phenotypic differences between species
[Figure: the eight-sequence multiple sequence alignment (Seq1 to Seq8) shown again, with variable and conserved columns marked]
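Conserved columns can be found by checking, column by column, whether every row carries the same residue. This sketch uses the eight rows from the lesson and the strictest possible definition of "conserved" (all rows identical, no gaps), which is an assumption; column indices are 0-based.

```python
MSA = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGTSSNIGS--ITVNWYQQLPG",
    "LRLSCSSSGFIFSS--YAMYWVRQAPG",
    "LSLTCTVSGTSFDD--YYSTWVRQPPG",
    "PEVTCVVVDVSHEDPQVKFNWYVDG--",
    "ATLVCLISDFYPGA--VTVAWKADS--",
    "AALGCLVKDYFPEP--VTVSWNSG---",
    "VSLTCLVKGFYPSD--IAVEWWSNG--",
]

def conserved_columns(rows):
    """0-based indices of columns where all rows share one non-gap residue."""
    return [
        idx
        for idx, column in enumerate(zip(*rows))
        if len(set(column)) == 1 and column[0] != "-"
    ]

columns = conserved_columns(MSA)
print(columns, [MSA[0][i] for i in columns])
```

Softer definitions (for example, a majority residue or a similarity group) would mark more columns as conserved.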
Alignment methods
No computationally feasible exact solution is available for MSA, so all practical methods are heuristics:
• Progressive/hierarchical alignment (ClustalX)
• Iterative alignment (MAFFT, MUSCLE)
Progressive alignment
First step: compute pairwise distances
Compute the pairwise alignments for all against all (for the five sequences A, B, C, D, E: 10 pairwise alignments). The similarities are converted to distances and stored in a table:
     A   B   C   D   E
A
B    8
C   15  17
D   16  14  10
E   32  31  31  32
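The all-against-all step can be sketched as a loop over every unordered pair. To keep the example short, the toy sequences below are hypothetical and already of equal length, and the distance is simply the fraction of differing positions; a real implementation would first align each pair and derive the distance from that alignment.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of positions that differ between two equal-length rows."""
    if len(a) != len(b):
        raise ValueError("rows must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical toy sequences standing in for A to E.
seqs = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACGA",
    "D": "TCGAACGA",
    "E": "TTTTACGA",
}

# One entry per unordered pair: 5 sequences give 5 * 4 / 2 = 10 pairs.
table = {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}
for pair, d in table.items():
    print(pair, d)
```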
Second step: build a guide tree
Cluster the sequences to create a tree (guide tree):
• represents the order in which pairs of sequences are to be aligned
• similar sequences are neighbors in the tree
• distant sequences are distant from each other in the tree
[Figure: guide tree over sequences A, B, C, D, E, built from the pairwise distance table]
The guide tree is imprecise and is NOT the tree which truly describes the evolutionary relationship between the sequences!
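A guide tree can be built from the distance table by repeatedly merging the two closest clusters. The sketch below uses average-linkage (UPGMA-style) clustering, which is an assumption: the lesson does not say which clustering method is used. The distances are the ones from the table in the first step.

```python
from itertools import combinations

# Pairwise distances from the lesson's table.
DIST = {
    ("A", "B"): 8,
    ("A", "C"): 15, ("B", "C"): 17,
    ("A", "D"): 16, ("B", "D"): 14, ("C", "D"): 10,
    ("A", "E"): 32, ("B", "E"): 31, ("C", "E"): 31, ("D", "E"): 32,
}

def dist(x, y):
    """Look up a distance regardless of the order of the two names."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]

def average_distance(c1, c2):
    """Mean distance over all member pairs of two clusters (tuples of names)."""
    return sum(dist(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

def build_guide_tree(names):
    """Merge the two closest clusters until one remains; return (tree, log)."""
    clusters = {(n,): n for n in names}  # member tuple -> nested-tuple subtree
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        subtree = (clusters.pop(c1), clusters.pop(c2))
        clusters[c1 + c2] = subtree
    (tree,) = clusters.values()
    return tree, merges

tree, merges = build_guide_tree("ABCDE")
print(tree)
for c1, c2, d in merges:
    print(c1, c2, d)
```

The merge log lists the pairs in the order they would later be aligned, closest first.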
Third step: align sequences in a bottom-up order
1. Align the most similar (neighboring) pairs
2. Align pairs of pairs
3. Align sequences clustered to pairs of pairs deeper in the tree
[Figure: guide tree with leaves A, D, C, B, E next to Sequence A, Sequence B, Sequence C, Sequence D, Sequence E]
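The bottom-up order follows from a post-order walk of the guide tree: both children of a node are handled before the node itself. The tree below is hard-coded as an assumption (it is what average-linkage clustering yields for the lesson's distance table); only the order of the alignment steps is computed, not the alignments themselves.

```python
def alignment_order(tree, steps=None):
    """Post-order walk of a nested-tuple tree; returns (label, list of steps)."""
    if steps is None:
        steps = []
    if isinstance(tree, str):
        return tree, steps              # a leaf is a single sequence
    left, _ = alignment_order(tree[0], steps)
    right, _ = alignment_order(tree[1], steps)
    steps.append((left, right))         # align the two finished sub-alignments
    return f"({left},{right})", steps

# Assumed guide tree: A with B, C with D, the two pairs together, then E.
guide_tree = ((("A", "B"), ("C", "D")), "E")
label, steps = alignment_order(guide_tree)
for number, (left, right) in enumerate(steps, start=1):
    print(number, "align", left, "with", right)
```

The first two steps align neighboring pairs, the third aligns the pair of pairs, and the last adds the most distant sequence.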
Main disadvantages of progressive alignments
• Guide-tree topology may be considerably wrong
• Globally aligning pairs of sequences may create errors that will propagate through to the final result
[Figure: the guide tree and Sequences A to E, as in the third step]
Iterative alignment
[Figure: cycle linking sequences A to E, the pairwise distance table, the guide tree, and the MSA]
Iterate until the MSA does not change (convergence)
blastp - acquiring sequences
MSA input: multiple-sequence FASTA file
>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVANALAHKYH
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLTSFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAIALAHKYH
>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTGVASALSSRYH
>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLTSLGDATKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTAVASALSSRYH
>gi|4885397|ref|NP_005323.1| hemoglobin, zeta [Homo sapiens]
MSLTKTERTIIVSMWAKISTQADTIGTETLERLFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYILRVDPVNFKLLSHCLLVTLAARFPADFTAEAHAAWDKFLSVVSSVLTEKYR
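A multiple-sequence FASTA file is simple to read: each record starts with a header line beginning with ">" and continues with one or more sequence lines. The following is a minimal parser sketch; the function name and the short example records are hypothetical, and real projects often rely on an established library instead.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Hypothetical two-record example with a wrapped sequence.
example = """>seq1 hypothetical record
MVHLTPEEK
TAVNALWGK
>seq2 another hypothetical record
MVHFTAEEK
"""
records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```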
MSA using ClustalX
Step 1: Load the sequences
A little unclear…
Edit FASTA headers… (the sequences themselves are unchanged)
>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]  →  > delta globin
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]  →  > beta globin
>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]  →  > epsilon globin
>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]  →  > G-gamma globin
>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]  →  > A-gamma globin
>gi|4885397|ref|NP_005323.1| hemoglobin, zeta [Homo sapiens]  →  > hemoglobin zeta
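The header edit can be automated. This sketch assumes every header follows the pattern seen in the lesson (identifier fields separated by "|", then a description, then the species in square brackets); the function name is illustrative, and headers in other layouts would need different rules.

```python
def simplify_header(header):
    """Keep only the description between the last '|' and the '[species]' tag."""
    description = header.rsplit("|", 1)[-1]       # drop the identifier fields
    description = description.split("[", 1)[0]    # drop the species tag
    description = description.replace(",", "").strip()
    return "> " + description

headers = [
    ">gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]",
    ">gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]",
    ">gi|4885397|ref|NP_005323.1| hemoglobin, zeta [Homo sapiens]",
]
for h in headers:
    print(simplify_header(h))
```

Short, unique names make the rows of the resulting alignment much easier to read in the viewer.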
Step 2: Perform alignment
MSA and conservation view
Messing-up alignment of HIV-1 env
MSA tools
• Progressive: CLUSTALX/CLUSTALW
• Iterative: MUSCLE, MAFFT, PRANK