BIOLOGICAL DATABASES
UCSC GENOME BROWSER
Letizia Marullo
MSc, PhD
NCBI GENE, GENBANK, NUCLEOTIDE
FASTA format
UCSC GENOME BROWSER
• University of California Santa Cruz http://genome.ucsc.edu/
Santa Cruz, CA
THE UCSC HOME PAGE
navigate
General information
Specific information: new features, current status, etc.
Inga Prokopenko, MSc, PhD, Senior PostDoc, Senior Lecturer in Human Genomics
Imperial College of London
UCSC THE GENOME BROWSER GATEWAY
START PAGE
Make your Gateway choices:
1. Select Clade
2. Select species: search 1 species at a time
3. Assembly: the official backbone DNA sequence
4. Position: location in the genome to examine
5. Image width: how many pixels in display window; 5000 max
6. Configure: make fonts bigger + other choices
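The Gateway choices can also be expressed as a link. A minimal sketch, assuming the browser's `hgTracks` page accepts `db` (the assembly) and `position` as URL parameters; the function name `gateway_url` is hypothetical, and clade and species are left out on the assumption that the assembly name already implies them:

```python
from urllib.parse import urlencode

# Assumed entry point of the Genome Browser display page.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def gateway_url(db: str, position: str) -> str:
    """Build a browser link for one assembly (db) and one position."""
    return UCSC_HGTRACKS + "?" + urlencode({"db": db, "position": position})


# The ':' in the position is percent-encoded by urlencode.
print(gateway_url("hg19", "chr11:1038475-1075482"))
```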
Search “BRCA1”
UCSC THE GENOME BROWSER GATEWAY
START PAGE
text/ID searches
Use this Gateway to search by:
• Gene names, symbols
• Chromosome number: chr7, or region: chr11:1038475-1075482
• Keywords: kinase, receptor
• IDs: NP, NM, OMIM, and more…
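To make the two position formats above concrete, here is a small sketch that tells a chromosome or region apart from a free-text search term. The function `parse_position` and its regular expression are illustrative assumptions, not the browser's own parser:

```python
import re

# "chr7" or "chr11:1038475-1075482"; commas in coordinates are allowed.
_POSITION = re.compile(r"^(chr[0-9A-Za-z_]+)(?::([\d,]+)-([\d,]+))?$")


def parse_position(text):
    """Return (chrom, start, end), or None if text is not a position."""
    match = _POSITION.match(text.strip())
    if match is None:
        return None  # treat as a gene name, keyword, or ID instead
    chrom, start, end = match.groups()
    if start is None:
        return (chrom, None, None)  # whole chromosome
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError("start must not be greater than end")
    return (chrom, start_i, end_i)


print(parse_position("chr7"))
print(parse_position("chr11:1038475-1075482"))
print(parse_position("BRCA1"))
```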
EXERCISE 1: SEARCHING IN UCSC GENOME BROWSER
• Group: mammals
• Genome: human
• Assembly: hg19
• Search “LCT”
UCSC OVERVIEW OF THE WHOLE GENOME BROWSER PAGE
Genome viewer section
Track and image controls
UCSC MOVING IN THE BROWSER
Change your view or location with controls at the top
Use “base” to get right down to the nucleotides
Configure: to change font, window size, more…
• Specify a position
• Walk left or right
• Zoom in
• Zoom out
• Click to zoom 3x and re-center
• Configure: fonts, window, more
UCSC OVERVIEW OF THE WHOLE GENOME BROWSER PAGE
Genome viewer section
Groups of data
UCSC ANNOTATION TRACK DISPLAY OPTIONS
Some data is ON or OFF by default
Links to info and/or filters
• Menu links to info about the tracks: content, methods
• You change the view with pulldown menus
• After making changes, REFRESH to enforce the change
Change track view
Enforce changes
Data from the ENCODE project
Data lifted from other builds
EXERCISE 2: CONFIGURE VISUALISATION IN UCSC
• Mapping and Sequencing:
• Base position: DENSE
• Genes and gene predictions:
• UCSC genes: PACK
• RefSeq genes: DENSE
• GENCODE: SHOW
• Phenotype and literature:
• Publications: DENSE
• GWAS catalog: DENSE
• mRNAs and ESTs:
• Human mRNAs: DENSE
• Spliced ESTs: HIDE
• Expression: hide all
• Regulation:
• ENCODE regulation: SHOW
• Set what to show: Transcription FULL, DNaseI DENSE
• Comparative Genomics:
• Conservation: FULL
• Neandertal Assembly and Denisova Assembly: hide all
• Variation:
• Common SNPs: PACK
• Repeats:
• RepeatMasker: DENSE
UCSC WHAT WE WILL FIND...
UCSC ANNOTATION TRACK
informative
description
other resource links
microarray data
mRNA secondary structure
links to sequences
protein domains/structure
homologs in other species
Gene Ontology™ descriptions
mRNA descriptions
pathways
Not all genes have this much detail.
Different annotation tracks carry different levels of detail.
UCSC GET SEQUENCES
Click the line
Click the item
Sequence section on detail page
EXERCISE 3: GET THE SEQUENCE
• Get the first 15 lines of the LCT genomic sequence in FASTA format and save them in a text file.
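Once the FASTA text has been copied from the browser, trimming and saving it can be scripted. A minimal sketch, assuming "first 15 lines" counts the `>` header line; the helper `head_fasta`, the output file name, and the placeholder record are all hypothetical and do not contain real sequence data:

```python
import tempfile
from pathlib import Path


def head_fasta(fasta_text: str, n_lines: int = 15) -> str:
    """Keep the first n_lines of a single FASTA record, header included."""
    lines = fasta_text.strip().splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    return "\n".join(lines[:n_lines]) + "\n"


# Placeholder record: the header and bases below are made up for illustration.
example = ">placeholder_region\n" + "\n".join(["ACGT" * 5] * 20) + "\n"
snippet = head_fasta(example)

# Save as plain text; the temporary directory is only used for the demo.
out_path = Path(tempfile.gettempdir()) / "sequence_head.txt"
out_path.write_text(snippet)
```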
UCSC GET SEQUENCES WITH EXTENDED OPTIONS
• Use the DNA link at the top
• Plain or Extended options
• Change colors, fonts, etc.
EXERCISE 3: GET THE SEQUENCE
• Get the sequence chr1:128,284,800-128,290,000 in Mus musculus and save it in FASTA format in a text file.
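The same region can also be requested programmatically. The sketch below only builds the request and formats a sequence string as FASTA; it makes no network call. It assumes the UCSC REST API endpoint `getData/sequence` with `genome`, `chrom`, `start`, and `end` parameters, a 0-based start and 1-based end, and the mouse assembly `mm10` (the exercise does not name an assembly). The helper names are hypothetical:

```python
from urllib.parse import urlencode

# Assumed REST endpoint for sequence retrieval.
API = "https://api.genome.ucsc.edu/getData/sequence"


def sequence_request_url(genome, chrom, start_1based, end_1based):
    """Convert a browser-style 1-based range to an API request URL."""
    params = {
        "genome": genome,
        "chrom": chrom,
        "start": start_1based - 1,  # assumed: API start is 0-based
        "end": end_1based,          # assumed: API end is 1-based
    }
    return API + "?" + urlencode(params)


def to_fasta(header, dna, width=50):
    """Wrap a raw sequence string into FASTA text with fixed-width lines."""
    lines = [dna[i:i + width] for i in range(0, len(dna), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# mm10 is an assumption, not taken from the exercise.
url = sequence_request_url("mm10", "chr1", 128284800, 128290000)
print(url)
```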
UCSC BLAT TOOL
• Rapid searches by INDEXING the entire genome
• Works best with high similarity matches
BLAT = BLAST-like Alignment Tool
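The indexing idea can be illustrated with a toy example. This is a simplified sketch, not BLAT's actual implementation: it indexes non-overlapping k-mers of a made-up "genome" and looks up every overlapping k-mer of the query. Only exact k-mer matches produce hits, which hints at why high-similarity queries work best. All names and sequences are illustrative:

```python
from collections import defaultdict


def build_index(genome, k):
    """Map each non-overlapping k-mer of the genome to its start offsets."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index, query, k):
    """Return (query_offset, genome_offset) for each query k-mer in the index."""
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits


genome = "AAAACCCCGGGGTTTTACGT"  # toy sequence, not real data
query = "CCGGGGTT"
index = build_index(genome, 4)
hits = find_hits(index, query, 4)
print(hits)  # [(2, 8)]

# A hit at (q, g) suggests the query may align starting at g - q.
q, g = hits[0]
print(genome[g - q:g - q + len(query)] == query)  # True
```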
UCSC BLAT TOOL
Paste one or more sequences
Or upload
Make choices
Submit
UCSC BLAT RESULTS, ALIGNMENT DETAILS
Your query
Genomic match, color cues
Side-by-side alignment
EXERCISE 4: ALIGNMENT
• Align the two FASTA sequences to the Mus musculus genome.
UCSC WHOLE GENOMES DATA