
Post on 19-Dec-2015


Parametric Inference and Drosophila Alignments


http://species.flybase.net/

Female Male

Karyotype

A project to compare and contrast Drosophila

Sequencing & Assembly Status | Sequencing Center
~3-fold WGS of w501 strain & 1-fold coverage of 6 other strains complete (2 assemblies currently available; deeper coverage of w501 strain expected Fall '05) | Washington Univ. (WUGSC)
~3-fold WGS complete (assembly to be released by Sept 1) | Broad Institute
Release 4.2: 118.4 Mb with 23 gaps remaining (Release 5 in Fall 2005) | Celera/BDGP
~6-fold WGS complete (assembly in GenBank) (additional coverage - automated sequence improvement expected Fall '05) | Washington Univ. (WUGSC)
~12-fold WGS complete & assembled | Agencourt
~8-fold WGS complete & assembled | Agencourt
~9-fold WGS complete & assembled | Baylor College of Medicine (BCM)
~4-fold WGS complete & assembled | Broad Institute
~6-fold WGS (BAC paired ends currently being sequenced; assembly to be released by Sept 15) | Venter Institute (JCVI)
~8-fold WGS complete & assembled | Agencourt
~9-fold WGS complete & assembled | Agencourt
~8-fold WGS complete (assembly to be released by Sept 15) | Agencourt

http://rana.lbl.gov/drosophila/

DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of an exon

DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **

Alignment of an intron


Alignment of an intron

droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **


Alignment of an intron

droAna CTGAAGGAATT--CTATATTAAAGAAGATTTCTCATCATT-GGTTGAATCACTTAC----
droMel CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC----
droMoj CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAA-ATTCAAATATTTTATTGAC
droPse CTGGAAGAGTT--TTGATTAGTAGGGGATCCATGGGGGCG-AGGAGAGGCCATCATCGTG
droSim CTGCGGGATTAGGAGTCATTAGAGTGCGGAAAAGCGGGTT-ATTCTATGGACTCAC----
droVir CTGCAGCAGTTAAATA-ATTGTAATAAACAA--TTCTCTA-ATTTAAATATTTGGTCCAC
droYak CTGCGGGATTAGCGGTCATTGGTGTGAAGAATAGATCCTTTATTTCATAAACTCAC----
       ***    * *        *                           *     *

droAna -------
droMel -------
droMoj TCAC---
droPse TACTTAC
droSim -------
droVir TCAC---
droYak -------

[Slide: the three intron alignments above shown together, annotated with the figures 64%, 50% and 43%.]


DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroEre_20041028_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCA-TTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

simulans: ~3-fold WGS of w501 strain & 1-fold coverage of 6 other strains complete (2 assemblies currently available; deeper coverage of w501 strain expected Fall '05)

X


Core Promoter Sequences Contribute to ovo-B Regulation in the Drosophila melanogaster Germline


Beata Bielinska, Jining Lü, David Sturgill and Brian Oliver

Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892

Vol. 169, 161-172, January 2005


DroAna_20041206_ TTTCGGTGATTTTGAGTCT---------------CATATTGTATATTGTCTTCTT-----
DroEre_20041028_ TCCGGGTGATTTTCCGTTG---------------CTTTTT-TTTTTTGCCTGCTT-----
DroMel_4_        TC--GGTGATTTTCCGTTG---------------CTTTTT-TATTGTGTGTGCAC-----
DroMoj_20041206_ TTTCGTTGTTATTACATTCTATTTTAATTTCGGAGTAATCTTCGTT--------CTCTTG
DroPse_1_        TCTCGGCAGTTTTTCGTTGTAATATA-TTGGGGACTATTTGT------------------
DroSim_20040829_ ------------------------------------------------------------
DroVir_20041029_ TTTCGTTGTTATTTAATT--------ATTTAAGGCTCGTTTTCTTTTGCCCACCCCCCTA
DroYak_1_        TC--GGTGATTTTCCGTTG---------------TTTCTT-T-TTTCGCCCGCAC-----

DroAna_20041206_ ------CTCGAAAGTTCCTTGACTCCTAGCATCCA------TTACATTACATTAGA----
DroEre_20041028_ ------TCGAAAAGTTCTAT------TGGGTTCCACACGGTTTTCATATAGTTTGAA---
DroMel_4_        ------TCG-AAAGTTCTAT------TAGGTTCCACAGGGTTTTTATA-------CA---
DroMoj_20041206_ CGCTTTTCGC----TTTCGGGCAAGTGCCGTT----AACTTTTGCTTTACA--AGAATGT
DroPse_1_        --------GAAATTTTCT----------------------TTTAGATACAAAAATAC---
DroSim_20040829_ ------------------------------------------------------------
DroVir_20041029_ CCCTATTCGCTCGGTTTCGGGCAACTGCCGTTGCACATTTATAACGTAAC----GAATGT
DroYak_1_        ------TC-----GTTCTAT------TAGGTTCCACAAGGTTTTCATA-------TA---

DroAna_20041206_ ----------------------------------------TCTATTATT-------TCTA
DroEre_20041028_ ----------------------------------CATAAT--------------------
DroMel_4_        ----------------------------------TATGATT----AATT-------CGTA
DroMoj_20041206_ AAAACTTATG--------------------CGCGCATCAGTGCATACATACAAACATA--
DroPse_1_        ----------------------------------AAAGGATCGGT--TT-------TATC
DroSim_20040829_ ------------------------------------------------------------
DroVir_20041029_ AAAACTCATGATGCGCATGCAGCACTAACACATGCATACATGCATACATACATACATATA
DroYak_1_        ----------------------------------CATAGTTTGATAGTT-------TGTA



Karyotype

Differences among the Drosophila

Available Drosophila whole genome multiple alignments

• MAVID: http://hanuman.math.berkeley.edu/kbrowser

• MULTIZ: http://genome.ucsc.edu/ (currently no D. erecta)

DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroEre_20041028_ CTGCGGGATTAGGGGTCATTGGTGT---------GCCAAAAGTCGC---------GTTT
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroEre_20041028_ ACTTTATAGACTCAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **

N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004) p 693--699

MAVID


droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **

Blanchette et al., Aligning multiple sequences with the threaded blockset aligner, Genome Research 14 (2004) p 708--715

MULTIZ


droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-----------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAA----AGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA------CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGG----AGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAA----AGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA--------TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAAT----AGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     -----ACTTAC
dm2.chr2L           TATGGACTCAC
droMoj1.contig_2959 TATTGACTCAC
dp3.chr4_group3     TGT--ACTTAC
droSim1.chr2L       TATGGACTCAC
droVir1.scaffold_6  GGTCCACTCAC
droYak1.chr2L       CATAAACTCAC
                         *** **


droAna CTGAAGGAATT--CTATATTAAAGAAGATTTCTCATCATT-GGTTGAATCACTTAC----
droMel CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC----
droMoj CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAA-ATTCAAATATTTTATTGAC
droPse CTGGAAGAGTT--TTGATTAGTAGGGGATCCATGGGGGCG-AGGAGAGGCCATCATCGTG
droSim CTGCGGGATTAGGAGTCATTAGAGTGCGGAAAAGCGGGTT-ATTCTATGGACTCAC----
droVir CTGCAGCAGTTAAATA-ATTGTAATAAACAA--TTCTCTA-ATTTAAATATTTGGTCCAC
droYak CTGCGGGATTAGCGGTCATTGGTGTGAAGAATAGATCCTTTATTTCATAAACTCAC----
       ***    * *        *                           *     *

droAna -------
droMel -------
droMoj TCAC---
droPse TACTTAC
droSim -------
droVir TCAC---
droYak -------

Higgins et al.,CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Research 22 (1994) p 4673--4680

CLUSTAL W


dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC

How is an alignment made from the sequences?


DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ ------TGTACTTAC

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).

#M=31, #X=22, #G=3, #S=12

#M=27, #X=18, #G=3, #S=28

Since 2(#M + #X) + #S = n + m, where n and m are the lengths of the two sequences, #M is determined by the other counts, so #X, #G and #S suffice.
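The counting rule above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the talk: the names `summarize` and `count_runs` are hypothetical, an alignment is assumed to be given as two gapped rows of equal length, a "space" is taken to be any column containing '-', and a "gap" a maximal run of '-' within one row.

```python
def count_runs(row):
    """Number of maximal runs of '-' in one alignment row."""
    runs, in_gap = 0, False
    for c in row:
        if c == "-" and not in_gap:
            runs += 1
        in_gap = (c == "-")
    return runs


def summarize(row_a, row_b):
    """Return (#M, #X, #G, #S) for a pairwise alignment given as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = spaces = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            spaces += 1          # a column pairing a residue with '-'
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    gaps = count_runs(row_a) + count_runs(row_b)
    return matches, mismatches, gaps, spaces


# One of the mel/pse alignments shown later in the talk.
mel = "CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC"
pse = "CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC"
M, X, G, S = summarize(mel, pse)
n = len(mel.replace("-", ""))
m = len(pse.replace("-", ""))
print(M, X, G, S, 2 * (M + X) + S == n + m)
```

For this pair the counts come out as #X=24, #S=10, #G=2, the summary used for the mel/pse example later in the talk, and the identity 2(#M + #X) + #S = n + m holds.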

This notation follows Chapter 7 (Parametric Sequence Alignment) by Colin Dewey and Kevin Woods in the new book Algebraic Statistics for Computational Biology (edited by L. Pachter and B. Sturmfels).

We can mark a point in space for every alignment…

In the example of our two sequences there are 379522884096444556699773447791552717765633 different alignments, but only 53890 different summaries. So we don’t need to mark that many points.

But 53890 is still quite a large number. Fortunately, there are only 69 vertices on the convex hull.
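To get a feel for why the number of alignments is so large, here is a minimal sketch that counts pairwise alignments of sequences of lengths n and m. It assumes an alignment is a path through the alignment grid built from three step types (pair two residues, or put a space in either sequence), which gives the Delannoy numbers; the function name `count_alignments` is hypothetical. Whether it reproduces the exact figure quoted above depends on which pair of sequences and which counting convention was used on the slide.

```python
def count_alignments(n, m):
    """Count alignments of sequences of lengths n and m (Delannoy number D(n, m)).

    D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1), with D(0, j) = D(i, 0) = 1.
    """
    prev = [1] * (m + 1)                  # row i = 0
    for _ in range(n):
        cur = [1]                         # column j = 0
        for j in range(1, m + 1):
            cur.append(prev[j] + cur[j - 1] + prev[j - 1])
        prev = cur
    return prev[m]


for size in (1, 2, 3, 10, 50):
    print(size, count_alignments(size, size))
```

Python integers are arbitrary precision, so the same function works unchanged for sequences of the length used in the talk.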

That is something we can draw…

For the sequences:

>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

the polytope is:

[figure not preserved]

49 #X=24, #S=10, #G=2

There are eight alignments that have this summary.

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGAGT===------===GCCGAAAAGCGAGTTTATTCTA=TGGAC
pse CTGGAAGAGTTTTGATTAGTAG===GGGATCCATGGGGGCGAGGAGAGGCCATCATC==GTGTAC

Consensus at a vertex

The vertices of the polytope have special significance. They correspond to optimal alignments, that is, alignments that maximize

X*(#X) + G*(#G) + S*(#S)

for some choice of X, G and S.

Example: the eight alignments summarized by

49 #X=24, #S=10, #G=2

are optimal for the parameters M = 100, X = -100, S = -30, G = -400.

Vertex  (#X, #S, #G)  [#alignments]
40  (15,16,16)  [1080]
41  (17,30,2)   [4]
42  (18,14,5)   [4]
43  (18,16,4)   [56]
44  (20,10,6)   [16]
45  (20,10,7)   [24]
46  (23,8,6)    [6]
47  (23,8,8)    [165]
48  (24,8,3)    [38]
49  (24,10,2)   [8]
50  (25,8,2)    [24]
51  (25,62,3)   [2]
52  (28,48,2)   [1]
53  (29,8,1)    [6]
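As a check on the example, the listed vertices can be scored under the quoted parameters. This is a sketch under stated assumptions: only the 14 vertices listed above are compared (not all 69), #M is recovered from 2(#M + #X) + #S = n + m with n + m = 112 taken from the lengths of the mel and pse sequences shown (52 and 60), and the names `VERTICES` and `score` are hypothetical.

```python
# Vertex label -> (#X, #S, #G), copied from the list above.
VERTICES = {
    40: (15, 16, 16), 41: (17, 30, 2), 42: (18, 14, 5), 43: (18, 16, 4),
    44: (20, 10, 6), 45: (20, 10, 7), 46: (23, 8, 6), 47: (23, 8, 8),
    48: (24, 8, 3), 49: (24, 10, 2), 50: (25, 8, 2), 51: (25, 62, 3),
    52: (28, 48, 2), 53: (29, 8, 1),
}

N_PLUS_M = 52 + 60  # assumed: lengths of the mel and pse sequences shown


def score(summary, M=100, X=-100, S=-30, G=-400):
    """Alignment score M*#M + X*#X + S*#S + G*#G for a summary (#X, #S, #G)."""
    x, s, g = summary
    m = (N_PLUS_M - s) // 2 - x   # #M from 2(#M + #X) + #S = n + m
    return M * m + X * x + S * s + G * g


best = max(VERTICES, key=lambda label: score(VERTICES[label]))
print(best, VERTICES[best], score(VERTICES[best]))
```

Among the listed vertices, vertex 49 scores highest under these parameters, in agreement with the example.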

Finding the polytope is what we call parametric inference.

Colin Dewey’s polytope propagation software can find the vertices in 20 seconds.

The MAVID, MULTIZ and CLUSTALW multiple alignments did not contain a D. melanogaster -- D. pseudoobscura pairwise alignment corresponding to a vertex on the polytope.

Reasons:

• D. melanogaster and D. pseudoobscura are not neighbors on the tree and were therefore aligned during a heuristic “progressive alignment” step.

• The correct alignment requires a model that includes a parameter for the transition/transversion ratio.

Example where robust alignments are crucial:

Transcription-associated mutational asymmetry in mammalian evolution Green et al. Nature Genetics, 33 (2003).

Observation: A→G > T→C and G→A > C→T

in mammalian genes on the coding strand of transcribed regions. In fact, A→G transitions were 58% more frequent than T→C, and G→A transitions were 18% more frequent than C→T. This is established by examining human-chimpanzee-baboon alignments (with baboon the outgroup):

human chimp baboon
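The role of the outgroup can be illustrated with a small sketch. It assumes a simple parsimony rule (if one ingroup species agrees with the outgroup and the other differs, the change is assigned to the differing lineage) and ignores columns containing gaps. This is an illustration only, not the procedure of Green et al., and the function name `directed_substitutions` is hypothetical.

```python
from collections import Counter


def directed_substitutions(lineage, sister, outgroup):
    """Count substitutions (ancestral, derived) on `lineage`, polarized by `outgroup`.

    All three arguments are rows of one multiple alignment (equal length).
    """
    counts = Counter()
    for a, b, o in zip(lineage, sister, outgroup):
        if "-" in (a, b, o):
            continue                      # skip columns with gaps
        if b == o and a != o:
            counts[(o, a)] += 1           # ancestral base o changed to a
    return counts


# Toy rows, not real data.
human = "AGCTA"
chimp = "GGCTA"
baboon = "GGCCA"
print(directed_substitutions(human, chimp, baboon))
```

Comparing the count for ("A", "G") with the count for ("T", "C") over many aligned columns gives the kind of asymmetry ratio described above; the result is only as reliable as the underlying alignment.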

Peter Huggins has confirmed that there is transcription-associated mutational asymmetry in Drosophila (ratios of 40%).

D. mel D. sim D.yak


from Green et al.

But the real problem is testing non-coding DNA…

Associated to every pair of sequences is a polynomial built from the “summaries” of the alignments.

For example, for the sequences

>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC

the summary

49 #X=24, #S=10, #G=2

corresponds to the monomial 8 X^24 S^10 G^2 (the coefficient 8 is the number of alignments with this summary).

How do we build the polytope for ?

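NP_{i,j} = S*NP_{i-1,j} + S*NP_{i,j-1} + (X or M)*NP_{i-1,j-1}

Here NP_{i,j} is the Newton polytope for positions [1,i] and [1,j] in each sequence, "+" is the convex hull of the union, and "*" is the Minkowski sum.

[Figure: alignment grid; axis labels as extracted: AACATTAG and AAGATTACCACA]

Polytope propagation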
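The recursion can be sketched in two dimensions. This is a simplified illustration, not Colin Dewey's software: it tracks only (#X, #S) and ignores the gap parameter G, exactly as the recursion written on the slide does, so the result is a Newton polygon rather than the three-dimensional polytope. The names `hull`, `shift` and `newton_polygon` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull(points):
    """Vertices of the convex hull of 2D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shift(polygon, step):
    """Minkowski sum of a polygon with a single point (multiplying by a monomial)."""
    return [(x + step[0], y + step[1]) for x, y in polygon]


def newton_polygon(a, b):
    """Propagate Newton polygons in (#X, #S) coordinates over the alignment grid."""
    n, m = len(a), len(b)
    NP = [[None] * (m + 1) for _ in range(n + 1)]
    NP[0][0] = [(0, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            pts = []
            if i > 0:
                pts += shift(NP[i - 1][j], (0, 1))      # S * NP[i-1][j]
            if j > 0:
                pts += shift(NP[i][j - 1], (0, 1))      # S * NP[i][j-1]
            if i > 0 and j > 0:
                step = (0, 0) if a[i - 1] == b[j - 1] else (1, 0)
                pts += shift(NP[i - 1][j - 1], step)    # (X or M) * NP[i-1][j-1]
            NP[i][j] = hull(pts)                        # "+" = hull of the union
    return NP[n][m]


print(newton_polygon("AC", "AC"))
```

Because only hull vertices are stored at each cell, the work per cell depends on the number of vertices rather than on the number of alignments or summaries, which is the point of propagating polytopes instead of polynomials.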


Next Steps

Biology

• Align all introns and intergenic regions between all pairs of Drosophila species parametrically. This will result in thousands of polytopes.
• Distinguish robust alignments, which do not depend critically on changes in parameters, from unreliable alignments.
• Investigate biological questions parametrically.

Mathematics

• Optimize polytope propagation, and investigate other fast methods for building alignment polytopes.
• Study the structure of alignment polytopes.
• Develop a parametric framework for multiple alignment.