Genomics: reading genome sequences, assembly of the sequence, annotation of the sequence


Upload: ciara-rojas

Post on 02-Jan-2016


Page 1: Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence

For Bioinformatics, start with:

Genomics:

-READING genome sequences: carry out dideoxy sequencing

-ASSEMBLY of the sequence: connect seqs. to make whole chromosomes

-ANNOTATION of the sequence: find the genes!


Page 3:

2 ways to annotate eukaryotic genomes:

-ab initio gene finders: work on basic biological principles:
   Open reading frames
   Consensus splice sites
   Met start codons…

-Genes based on previous knowledge: EVIDENCE of message

Page 7:

2 ways to annotate eukaryotic genomes:

-ab initio gene finders: work on basic biological principles:
   Open reading frames
   Consensus splice sites
   Met start codons…

-Genes based on previous knowledge: EVIDENCE of message
   cDNA sequence of the gene’s message
   cDNA of a closely related gene’s message sequence
   Protein sequence of the known gene
   Same gene’s
   Same gene’s from another species
   Related gene’s protein…
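The ab initio principles listed above (a Met start codon opening a reading frame) can be illustrated with a minimal sketch. It only scans the three forward-strand frames for ATG ... stop stretches and ignores introns and splice sites, so it is not a eukaryotic gene finder. The function name `find_orfs`, the `min_codons` cutoff and the toy sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs: ATG ... stop codon.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    min_codons counts the ATG but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Hypothetical toy sequence, not taken from any genome.
toy = "CCATGGCTAAATGATTTATGAAACCCGGGTAGCC"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

A real ab initio gene finder would add splice-site consensus scoring and both strands; this sketch covers only the open-reading-frame idea.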

Page 8:

Information for ab initio gene finding

[Figure labels:]
-Homology-based exon predictions
-Consensus gene structure (both strands)
-Start and stop site predictions
-Splice site predictions
-Computational exon predictions
-Tracking information
-Unique identifiers

Page 9:

Automatically generated annotation

Page 10:

A zebrafish hit shows a gene model protein encoded by a 6 exon gene.

This gene structure (intron/exon) is seen in other species, as is the protein size.

The proteins, if corresponding to MSP in S. gal., must be heavily glycosylated (likely). At least some have a signal peptide.

Page 11:

The zebrafish hit can be viewed at higher resolution, and…

Page 12:

The zebrafish hit can be viewed down to nucleotide resolution

GO LIVE!

Page 13:

Sarin et al

Page 14:

Sarin et al

Page 15:

Is there linkage between a mutant gene/phenotype and a SNP?

USE standard genetic mapping technique, with SNP alternative sequences as “phenotype”

B = bad hair, Dominant

SNP1 ..ACGTC..   SNP1’ ..ACGCC..
SNP2 ..GCTAA..   SNP2’ ..GCAAA..
SNP3 ..GTAAC..   SNP3’ ..GTCAC..

START with inbred lines - SNPs are homozygosed

Line carrying B:  SNP1’/SNP1’   SNP2’/SNP2’   SNP3’/SNP3’
X
Other line:       SNP1/SNP1     SNP2/SNP2     SNP3/SNP3

F1

Page 16:

Is there linkage between a mutant gene/phenotype and a SNP?

USE standard genetic mapping technique, with SNP alternative sequences as “phenotype”

B = bad hair, Dominant

SNP1 ..ACGTC..   SNP1’ ..ACGCC..
SNP2 ..GCTAA..   SNP2’ ..GCAAA..
SNP3 ..GTAAC..   SNP3’ ..GTCAC..

Testcross: B/b (1’/1, 2’/2, 3’/3) X b/b (1/1, 2/2, 3/3)

Progeny:

Hair   SNP1          SNP2          SNP3
B/b    1’/1  25%     2’/2  47%     3’/3  25%
B/b    1/1   25%     2/2    3%     3/3   25%
b/b    1’/1  25%     2’/2   3%     3’/3  25%
b/b    1/1   25%     2/2   47%     3/3   25%

F1 arrangement: B 2’ / b 2

For SNP2 the recombinant classes (B/b with 2/2, and b/b with 2’/2) total 3% + 3% = 6%.

SO… B is 6 cM from SNP2, and is unlinked to SNP1 or SNP3
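The 6 cM figure comes from adding up the recombinant progeny classes. A minimal sketch of that arithmetic, using the class percentages from the slide; the function name and data layout are hypothetical, and it assumes the usual short-distance approximation that 1% recombinants corresponds to about 1 cM.

```python
def recombination_percent(classes, parental):
    """Percent recombinant progeny.

    classes  -- dict mapping (hair genotype, SNP genotype) to percent of progeny
    parental -- the two class keys that carry the parental (non-recombinant)
                allele combinations
    """
    total = sum(classes.values())
    recombinant = sum(pct for key, pct in classes.items() if key not in parental)
    return 100.0 * recombinant / total

# Class percentages as given on the slide for the testcross B/b x b/b.
snp1 = {("B/b", "1'/1"): 25, ("B/b", "1/1"): 25,
        ("b/b", "1'/1"): 25, ("b/b", "1/1"): 25}
snp2 = {("B/b", "2'/2"): 47, ("B/b", "2/2"): 3,
        ("b/b", "2'/2"): 3, ("b/b", "2/2"): 47}

# In the F1, B entered the cross together with the primed alleles (B 2' / b 2).
rf1 = recombination_percent(snp1, {("B/b", "1'/1"), ("b/b", "1/1")})
rf2 = recombination_percent(snp2, {("B/b", "2'/2"), ("b/b", "2/2")})
print(rf1)  # 50.0: independent assortment, so unlinked
print(rf2)  # 6.0: about 6 cM between B and SNP2
```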

Page 17:

Is there linkage between a mutant gene/phenotype and a SNP?

USE standard genetic mapping technique, with SNP alternative sequences as “phenotype”

B = bad hair, Dominant

SNP1 ..ACGTC..   SNP1’ ..ACGCC..
SNP2 ..GCTAA..   SNP2’ ..GCAAA..
SNP3 ..GTAAC..   SNP3’ ..GTCAC..

Testcross: B/b (1/1’, 2/2’, 3/3’) X b/b (1/1, 2/2, 3/3)

SO… B is 6 cM from SNP2, and is unlinked to SNP1 or SNP3

We have the ENTIRE genome sequence of mouse, so we know where the SNPs are

Now do this while checking the sequence of THOUSANDS of SNPs
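Scaling the same test to thousands of SNPs is the same count repeated for every marker. The sketch below assumes testcross progeny are recorded as a phenotype plus one boolean per SNP; the record layout, SNP names, simulated progeny and the 40% cutoff are all hypothetical, and a real analysis would use a statistical test rather than a fixed cutoff.

```python
def scan_snps(progeny, threshold=40.0):
    """Return {snp: percent recombinants} for SNPs below the threshold.

    progeny -- list of (has_bad_hair, genotypes) pairs from the testcross,
               where genotypes maps SNP name -> True if the animal carries
               the primed allele (the allele that entered the cross with B).
    A progeny is recombinant for a SNP when phenotype and allele disagree.
    """
    if not progeny:
        return {}
    linked = {}
    for snp in progeny[0][1]:
        recombinants = sum(1 for bad_hair, g in progeny if bad_hair != g[snp])
        percent = 100.0 * recombinants / len(progeny)
        if percent < threshold:
            linked[snp] = percent
    return linked

# 100 made-up progeny: SNP2 follows the slide's 47/3/3/47 classes,
# SNP1 and SNP3 are assigned independently of the hair phenotype.
progeny = []
for bad_hair, snp2, n in [(True, True, 47), (True, False, 3),
                          (False, True, 3), (False, False, 47)]:
    for k in range(n):
        progeny.append((bad_hair, {"SNP1": k % 2 == 0,
                                   "SNP2": snp2,
                                   "SNP3": k % 2 == 1}))

print(scan_snps(progeny))  # {'SNP2': 6.0}
```

Because the genome sequence gives the position of every SNP, the markers that come out linked point directly at the chromosomal region containing B.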

Page 18:

Genomics:

-READING genome sequences: carry out dideoxy sequencing

-ASSEMBLY of the sequence: connect seqs. to make whole chromosomes

-ANNOTATION of the sequence: find the genes!

But Bioinformatics is more…

Page 20:

TRANSCRIPTOMICS: cDNAs & ESTs (Expressed Sequence Tags)

[Figure labels: RNA target sample; cDNA Library; Primer; SEQUENCE; End Reads (Mates)]

Each cDNA provides sequence from the two ends: two ESTs

Page 22:

!!AA_SEQUENCE 1.0
ab025413 peptide
tenm4.pep Length: 2771 May 12, 1999 09:34 Type: P Check: 2254 ..

1 MDVKERKPYR SLTRRRDAER RYTSSSADSE EGKGPQKSYS SSETLKAYDQ

51 DARLAYGSRV KDMVPQEAEE FCRTGTNFTL RELGLGEMTP PHGTLYRTDI

101 GLPHCGYSMG ASSDADLEAD TVLSPEHPVR LWGRSTRSGR SSCLSSRANS

151 NLTLTDTEHE NTETDHPSSL QNHPRLRTPP PPLPHAHTPN QHHAASINSL

201 NRGNFTPRSN PSPAPTDHSL SGEPPAGSAQ EPTHAQDNWL LNSNIPLETR

251 NLGKQPFLGT LQDNLIEMDI LSASRHDGAY SDGHFLFKPG GTSPLFCTTS

301 PGYPLTSSTV YSPPPRPLPR STFSRPAFNL KKPSKYCNWK CAALSAILIS

351 ATLVILLAYF VAMHLFGLNW HLQPMEGQMQ MYEITEDTAS SWPVPTDVSL

401 YPSGGTGLET PDRKGKGAAE GKPSSLFPED SFIDSGEIDV GRRASQKIPP

Protein sequence: from peptide sequencing, or from translation of sequenced nucleic acids
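Translation of a sequenced nucleic acid into protein can be sketched with the standard genetic code. The DNA string below is hypothetical: it was chosen only so that it encodes the first six residues shown above (MDVKER) and is not the real gene sequence. The function name is also an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate coding DNA (frame 0) to one-letter protein code.

    Stops at the first stop codon; a trailing partial codon is ignored.
    Characters other than A, C, G, T raise KeyError.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

hypothetical_cds = "ATGGACGTGAAGGAGCGGTAA"
print(translate(hypothetical_cds))  # MDVKER
```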

Page 24:

Structural proteomics: coordinates, rather than 1D sequence, saved

Page 25:

TRANSCRIPTOMICS (Arrays)

Page 26:

RNA for ALL C. elegans genes

Where? When? Who? are the RNAs


Page 29:

MICROARRAY ANALYSIS


Page 31:

Figure 4.15 Microarray Technique

Page 33:

Array analysis: see animation from Griffiths

Page 34:

Figure 4.16(1) Microarray Analysis of Those Genes Whose Expression in the Early Xenopus Embryo Is Caused by the Activin-Like Protein Nodal-Related 1 (Xnr1)

Page 35:

Figure 4.16(2) Microarray Analysis of Those Genes Whose Expression in the Early Xenopus Embryo Is Caused by the Activin-Like Protein Nodal-Related 1 (Xnr1)


Page 38:

RNAi for every C. elegans gene too!

-results on the web

Projects to systematically knock out (or pseudo-knock out) every gene, in order to establish the phenotype of each gene -> function of each gene

Page 39:

Figure 4.23(1) Use of Antisense RNA to Examine the Roles of Genes in Development (here fly)

Page 40:

Figure 4.23(2) Use of Antisense RNA to Examine the Roles of Genes in Development (here fly)

Page 41:

RNAi for ALL C. elegans genes

Page 42:

Figure 4.24 Injection of dsRNA for E-Cadherin into the Mouse Zygote Blocks E-Cadherin Expression

Pages 43-51:

MODENCODE

Page 52:

MODENCODE was from the Drosophila paper:

Nègre N et al. A cis-regulatory map of the Drosophila genome. Nature. 2011 Mar 24;471(7339):527-31. doi: 10.1038/nature09990.

Page 53:

KNOCK-OUTS OF ALL ESSENTIAL GENES: RANDOM MUTAGENESIS ATTEMPT, using transposon mobilization

Followed by INVERSE PCR to recover the sequence adjacent to the insertion. Then compare it to the complete Drosophila genome sequence to know which ORF was “hit”
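The last step, comparing the recovered flanking sequence to the genome to see which ORF was hit, can be sketched as a lookup. This is only an illustration under strong assumptions: it uses exact forward-strand string matching, where real projects use alignment tools, and the genome string, ORF names and coordinates are made up.

```python
def find_hit_orf(genome, flank, orfs):
    """Locate a flanking sequence in the genome and report the ORF it falls in.

    genome -- reference sequence as a string
    flank  -- sequence recovered next to the insertion (e.g. by inverse PCR)
    orfs   -- list of (name, start, end), 0-based and end-exclusive
    Returns (position, orf_name); orf_name is None for an intergenic hit,
    and position is -1 when the flank is not found at all.
    """
    pos = genome.find(flank)
    if pos == -1:
        return -1, None
    for name, start, end in orfs:
        if start <= pos < end:
            return pos, name
    return pos, None

# Made-up reference and annotation for illustration only.
genome = "AAAACCCCGGGGTTTTACGTACGTAAAA"
orfs = [("orfA", 4, 12), ("orfB", 16, 24)]

print(find_hit_orf(genome, "GTACGTA", orfs))  # (18, 'orfB')
print(find_hit_orf(genome, "TTTTA", orfs))    # (12, None)
```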

Page 54:

About 10% of all assumed genes “hit” (~10/100 per interval) on the Drosophila X chromosome, from 1 series of random insertion experiments.

ALL insert sites known, thanks to INVERSE PCR

Page 55:

Figure 1 The two-hybrid assay carried out by screening a protein array. a, The array of 6,000 haploid yeast transformants plated on medium lacking leucine, which allows growth of all transformants. Each transformant expresses one of the yeast ORFs expressed as a fusion to the Gal4 activation domain. b, Two-hybrid positives from a screen of the array with a Gal4 DNA-binding domain fusion of the Pcf11 protein, a component of the pre-mRNA cleavage and polyadenylation factor IA, which also consists of four other polypeptides (ref. 36). Diploid colonies are shown after two weeks of growth on medium lacking tryptophan, leucine and histidine and supplemented with 3 mM 3-amino-1,2,4-triazole, thus allowing growth only of cells that express the HIS3 two-hybrid reporter gene. Three other components of factor IA, Rna14, Rna15 and Clp1, were identified as Pcf11 interactors. Positives that do not appear in Table 2 were either not reproducible or are false positives that occurred in many screens.

2-hybrid reaction between one protein and all 6000+ potential interactors in Yeast Genome

Page 56:

Figure 2 Visualization of combined, large-scale interaction data sets in yeast. A total of 14,000 physical interactions obtained from the GRID database were represented with the Osprey network visualization system (see http://biodata.mshri.on.ca/grid). Each edge in the graph represents an interaction between nodes, which are coloured according to Gene Ontology (GO) functional annotation. Highly connected complexes within the data set, shown at the perimeter of the central mass, are built from nodes that share at least three interactions within other complex members. The complete graph contains 4,543 nodes of 6,000 proteins encoded by the yeast genome, 12,843 interactions and an average connectivity of 2.82 per node. The 20 highly connected complexes contain 340 genes, 1,835 connections and an average connectivity of 5.39.

Osprey: integrate all 2-hybrid interactions between all 6000+ proteins in Yeast Genome (Proteome)
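The connectivity figures in the caption appear to be interactions divided by nodes (12,843 / 4,543 and 1,835 / 340), not the mean number of partners per protein, which would be twice as large. A minimal sketch of that ratio for an interaction list; the function name is hypothetical, and the toy edge list is illustrative only (the Rna14-Rna15 pair in particular is an assumption, not taken from the figure).

```python
def average_connectivity(edges):
    """Interactions per node for an undirected interaction list.

    edges -- iterable of (protein_a, protein_b) pairs
    """
    edge_set = {frozenset(pair) for pair in edges}   # ignore order and duplicates
    nodes = {protein for pair in edge_set for protein in pair}
    return len(edge_set) / len(nodes)

toy = [("Pcf11", "Rna14"), ("Pcf11", "Rna15"),
       ("Pcf11", "Clp1"), ("Rna14", "Rna15")]
print(average_connectivity(toy))  # 1.0: 4 interactions among 4 proteins

# The caption's own numbers, recomputed as interactions / nodes.
print(12843 / 4543)  # complete graph, about 2.827
print(1835 / 340)    # highly connected complexes, about 5.397
```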
