Prosite and UCSC Genome Browser Exercise 4

Post on 27-Jan-2016



Prosite and UCSC Genome Browser

Exercise 4

What is a motif?

A sequence motif = a certain sequence that is widespread and conjectured to have biological significance

Examples:
KDEL – ER-lumen retention signal
PKKKRKV – an NLS (nuclear localization signal)

More loosely defined motifs

KDEL (usually) + HDEL (rarely) = [HK]-D-E-L: H or K at the first position

This is called a pattern (in Biology), or a regular expression (in computer science)
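As a minimal sketch of that equivalence, the pattern [HK]-D-E-L can be written as the Python regular expression `[HK]DEL`. The function name and the test sequences are made up for illustration and are not taken from the slides.

```python
import re

# [HK]-D-E-L as a regular expression: H or K, followed by D, E, L.
KDEL_LIKE = re.compile(r"[HK]DEL")

def has_retention_motif(sequence: str) -> bool:
    """Return True if the sequence contains KDEL or HDEL anywhere."""
    return KDEL_LIKE.search(sequence) is not None

# Made-up sequences, for illustration only.
for seq in ["MSTAAKDEL", "MSTAAHDEL", "MSTAARDEL"]:
    print(seq, has_retention_motif(seq))
```

The first two sequences contain the motif; the third (RDEL) does not, because R is not allowed at the first position.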

Syntax of a pattern

Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

Patterns

W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

x(9,11): any amino acid, between 9 and 11 times
[FYV]: F or Y or V

WOPLASDFGYVWPPPLAWS
ROPLASDFGYVWPPPLAWS
WOPLASDFGYVWPPPLSQQQ

Patterns - syntax

The standard IUPAC one-letter codes.
‘x’ : any amino acid.
‘[]’ : residues allowed at the position.
‘{}’ : residues forbidden at the position.
‘()’ : repetition of a pattern element is indicated in parentheses: x(n) or x(n,m) gives the number or range of repetitions.
‘-’ : separates each pattern element.
‘<’ : indicates an N-terminal restriction of the pattern.
‘>’ : indicates a C-terminal restriction of the pattern.
‘.’ : the period ends the pattern.
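The syntax rules above map almost one-to-one onto regular-expression syntax. The sketch below is one possible translation, not an official Prosite tool: `prosite_to_regex` is a hypothetical helper, it covers only the elements listed on this slide, and the three candidate sequences are the sequence string from the Patterns slide split into three, which is an assumption about the slide layout.

```python
import re

# One pattern element: optional '<', a core (x, one residue, [..] or {..}),
# an optional repetition (n) or (n,m), and an optional '>'.
ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a pattern in the syntax above into a Python regex string."""
    pattern = pattern.strip().rstrip(".")        # the period ends the pattern
    pieces = []
    for element in pattern.split("-"):           # '-' separates the elements
        match = ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                            # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(pieces)

regex = prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE].")
print(regex)

# Assumed split of the sequence string shown on the Patterns slide.
candidates = ["WOPLASDFGYVWPPPLAWS", "ROPLASDFGYVWPPPLAWS", "WOPLASDFGYVWPPPLSQQQ"]
for seq in candidates:
    print(seq, bool(re.search(regex, seq)))
```

Only the first candidate matches: the second does not start with W, and the third has no G, S, T, N or E at the required distance after the [FYV]-[FYW] pair.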

Profile-pattern-consensus

multiple alignment

AACTTG
AAGTCG
CACTTC

profile

     1     2     3     4     5
A    0.66  1     0     0     ..
T    0     0     0     1     ..
C    0.33  0     0.66  0     ..
G    0     0     0.33  0     ..

consensus

AACTTG
NANTNN

pattern

[AC]-A-[GC]-T-[TC]-[GC]
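The three representations on this slide can all be derived from the same multiple alignment. The following is a rough sketch with hypothetical helper names; it assumes the alignment is the three six-letter sequences AACTTG, AAGTCG and CACTTC. Frequencies are rounded, so 2/3 prints as 0.67 where the slide shows 0.66, and letters inside brackets come out in alphabetical order, so [GC] appears as the equivalent [CG].

```python
from collections import Counter

alignment = ["AACTTG", "AAGTCG", "CACTTC"]

def profile(alignment, alphabet="ATCG"):
    """Per-column frequency of each letter (one row per letter)."""
    n = len(alignment)
    columns = [Counter(col) for col in zip(*alignment)]
    return {letter: [round(col[letter] / n, 2) for col in columns]
            for letter in alphabet}

def consensus(alignment):
    """Most frequent letter in each column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))

def strict_consensus(alignment):
    """The letter if the column is fully conserved, otherwise N."""
    return "".join(col[0] if len(set(col)) == 1 else "N"
                   for col in zip(*alignment))

def pattern(alignment):
    """One element per column: a single letter, or [..] with the letters seen."""
    elements = []
    for col in zip(*alignment):
        letters = "".join(sorted(set(col)))
        elements.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "-".join(elements)

for letter, row in profile(alignment).items():
    print(letter, row)
print(consensus(alignment))
print(strict_consensus(alignment))
print(pattern(alignment))
```

The profile keeps the most information (how often each letter occurs), the pattern keeps only which letters occur, and the consensus keeps a single letter per position.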

Prosite

A method for determining the function of uncharacterized translated protein sequences

DB of annotated protein families and functional sites as well as associated patterns and profiles to identify them

Prosite

Entries are represented with patterns or profiles

pattern

[AC]-A-[GC]-T-[TC]-[GC]

profile

     1     2     3     4     5
A    0.66  1     0     0     ..
T    0     0     0     1     ..
C    0.33  0     0.66  0     ..
G    0     0     0.33  0     ..

Profiles are used in Prosite when the motif is relatively divergent and it is difficult to represent as a pattern

Scanning Prosite

Query: sequence
Result: all patterns found in sequence

Query: pattern
Result: all sequences which adhere to this pattern
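The two query directions can be illustrated with a toy example. This is only a sketch of the idea, not how the Prosite scanning service is implemented: the two small dictionaries stand in for the pattern and sequence databases, and all names and sequences in them are made up.

```python
import re

# Toy stand-ins for the databases; names and sequences are made up.
PATTERNS = {
    "ER_retention": r"[HK]DEL",
    "NLS_example": r"PKKKRKV",
}
SEQUENCES = {
    "protA": "MAAPKKKRKVGG",
    "protB": "MSLLKDEL",
    "protC": "MGGSHDELPKKKRKV",
}

def scan_sequence(sequence, patterns=PATTERNS):
    """Query: sequence. Result: names of all patterns found in it."""
    return [name for name, rx in patterns.items() if re.search(rx, sequence)]

def scan_pattern(regex, sequences=SEQUENCES):
    """Query: pattern. Result: names of all sequences that adhere to it."""
    return [name for name, seq in sequences.items() if re.search(regex, seq)]

print(scan_sequence("MGGSHDELPKKKRKV"))
print(scan_pattern(r"[HK]DEL"))
```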

Searching Prosite with a sequence

Prosite results for Hemoglobin subunit beta

Prosite profile

Prosite profile sequence logo

Sequence logo

WebLogo

http://weblogo.berkeley.edu/logo.cgi

Searching Prosite with a sequence

Patterns with a high probability of occurrence

Entries describing commonly found post-translational modifications or compositionally biased regions.

Found in the majority of known protein sequences

High probability of occurrence
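A back-of-the-envelope calculation shows why short, permissive patterns turn up in most proteins. The sketch below assumes all 20 amino acids are equally likely and independent at every position, which real proteins are not, so the numbers are only indicative. The N-glycosylation site pattern N-{P}-[ST]-{P} is brought in here as an illustration of a post-translational modification pattern; it does not appear on the slides.

```python
def match_probability(allowed_counts, alphabet_size=20):
    """Chance that a random position starts a match, assuming every residue
    is equally likely and positions are independent (a simplification)."""
    p = 1.0
    for allowed in allowed_counts:
        p *= allowed / alphabet_size
    return p

def expected_matches(allowed_counts, sequence_length, alphabet_size=20):
    """Expected number of match start positions in a random sequence."""
    starts = max(sequence_length - len(allowed_counts) + 1, 0)
    return starts * match_probability(allowed_counts, alphabet_size)

# N-{P}-[ST]-{P}: 1, 19, 2 and 19 residues allowed at the four positions.
short_pattern = [1, 19, 2, 19]
# PKKKRKV: seven fixed residues, one allowed at each position.
long_pattern = [1] * 7

print(expected_matches(short_pattern, 300))
print(expected_matches(long_pattern, 300))
```

Under these assumptions a 300-residue random sequence is expected to contain more than one match to the short pattern, while a match to the seven-residue exact motif is extremely unlikely by chance.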

Searching Prosite with a pattern

[TAFR]-W-Q-Y

Searching Prosite with a Prosite AC

UCSC Genome Browser

Reset all settings of previous user

UCSC Genome Browser - Gateway

UCSC Genome Browser query results

UCSC Genome Browser Annotation tracks

Vertebrate conservation

mRNA (GenBank)

RefSeq

UCSC Genes

Base position

Single species compared

SNPs

Repeats

Direction of transcription (<)

CDS

Intron

UTR

EST based sequence

UCSC Gene

UCSC Genome Browser - movement

Zoom x3 + Center

mRNA annotation track option

Sickle-cell anemia distr.

Malaria distr.

BLAT

BLAT = Blast-Like Alignment Tool
BLAT is designed to find similarity of >95% on DNA, >80% for protein
Rapid search by indexing entire genome.
Good for:
1. Finding genomic coordinates of cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
4. Finding upstream regulatory regions
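The "rapid search by indexing" idea can be sketched as a k-mer lookup table: the genome is cut into non-overlapping words of length k once, and every overlapping word of the query is then looked up in that table to find seed hits. This is a toy illustration only; the function names, the tiny genome string and k = 4 are assumptions for the example, and the real tool adds hit extension, alignment and scoring on top of this step.

```python
from collections import defaultdict

def build_index(genome: str, k: int) -> dict:
    """Index the non-overlapping k-mers of the genome by start position."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def find_seed_hits(query: str, index: dict, k: int) -> list:
    """Return (query_pos, genome_pos) pairs where a query k-mer is indexed."""
    hits = []
    for q in range(len(query) - k + 1):       # overlapping k-mers of the query
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits

# Made-up toy data.
genome = "AAAACCCCGGGGTTTTACGT"
query = "CCGGGGTT"
k = 4

index = build_index(genome, k)
hits = find_seed_hits(query, index, k)
print(hits)

# Each hit suggests where the query starts in the genome: genome_pos - query_pos.
for q, g in hits:
    offset = g - q
    print(offset, genome[offset:offset + len(query)] == query)
```

Because the index is built once and each lookup is a dictionary access, the cost of a search depends mainly on the query length rather than on the genome length.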

BLAT on UCSC Genome Browser

BLAT Results

Match

Non-Match (mismatch/indel)

Indel boundaries

BLAT Results on the browser

Getting DNA sequence of region
