July 2015 CSHL
Navigating data at the Saccharomyces Genome Database
SGD: www.yeastgenome.org
Rob Nash, Senior Biocuration
Outline
• History and background
• How to stay current
• Basic organization: home page, search, Locus Summary Page (LSP)
• Tabs: access to detailed information (sequence, Gene Ontology, phenotype, interaction, expression, and regulation)
• Data analysis: GO tools, YeastMine basics and use cases
About SGD
• Totally public, open, non-profit academic group
• Funded by the NIH (NHGRI)
• Mike Cherry at Stanford is the P.I. (since 1992). Most of SGD is housed at Stanford, with a few remote curators who work from home
Key early decisions
• People who understand the biology (Ph.D. biologists) are required to design the database, summarize the literature, etc.
• Full-time staff positions are needed for project stability.
• Our top priority is to serve the needs of the research community (yeast and other), so communication with users is critically important.
SGD Today
• Over 1.7 million visits from unique IP addresses over the past year; 175,000 page views per week; worldwide usage
• About 15 full-time staff (curators, programmers, system and db admins)
“Other” represents 30 countries with more than 100 visits, and 49 additional countries with 10-100 visits.
SGD Staff, Cherry lab
Basic organization of information on the home page
• Search
• YeastMine
• YouTube tutorials
• New data and updates
• Research spotlight
• Upcoming meetings
• Analysis and sequence tools
• Functional information
• Literature
• Community
• Colleague information
• Gene registry
• Wiki
• Newsletter
Social media: Facebook, Twitter, LinkedIn
Elasticsearch with autocomplete
• Gene names (ACT1) => Locus Summary page
• Other terms (actin; “act1 *”) => Instant Search page
• Some IDs go directly to their page: 5634, 25721128
• Single quotes (OR) vs. double quotes (AND)
Modify your search
Autocomplete (suggestions)
Instant search (predictive results)
Next iteration to include facets!
Website redesign: staying current and modern
• To store new data and leverage new web development tools, SGD was completely overhauled.
• Restructured pages, data transfer methods, and the underlying database schema, all while keeping the site live and actively curated. The goal was to make the website faster and easier to maintain.
• New visualization methods and a responsive layout.
Locus Summary Page
Responsive layout: better for all devices
Organization:
• Sequence information moved up, with improved graphics
• Some basic protein information
• Regulation summary
• Improved expression histogram
Navigation has changed:
• Sectional navigation bar with “back to top”
• Tabs and details links
• New tabs for sequence and locus history
What’s behind the tabs?
Sequence details
• S288C overview
  – map
  – subfeatures, with coordinates
  – sequence (genomic, coding, and protein)
• Alternative reference strains
  – map
  – subfeatures, with coordinates
  – sequence (genomic, coding, and protein)
• Other strains
[Screenshot panels: Other reference strains; Alternative reference strains]
Sequence tools
• BLASTN, BLASTP
• BLASTN vs fungi, BLASTP vs fungi
• Strain alignment (YRR1)
• Variant viewer (new)
Variant viewer
Access from:
1) Sequence (home page navigation bar) -> Strain and species
2) the Analyze Sequence section of the LSP
3) the Resources section of the Sequence tab
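To make the idea behind a variant viewer concrete, here is a minimal Python sketch that lists positions where strain sequences differ from a reference. It assumes the sequences are already aligned to equal length with '-' marking gaps, and it reports alignment columns rather than genomic coordinates. The function name and the example sequences are hypothetical, and this is not how SGD's Variant Viewer is implemented.

```python
"""Toy sketch: list differences between pre-aligned strain sequences
and a reference, in the spirit of a variant viewer.

Assumptions (not SGD's implementation): sequences are already aligned,
all have the reference's length, '-' marks a gap, and positions are
1-based alignment columns, not chromosome coordinates.
"""
from typing import Dict, List, Tuple


def list_variants(reference: str, strains: Dict[str, str]) -> List[Tuple[int, str, str, str]]:
    """Return (column, strain, reference_base, strain_base) for each mismatch."""
    variants = []
    for name, seq in strains.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: aligned length differs from reference")
        for column, (ref_base, alt_base) in enumerate(zip(reference, seq), start=1):
            # A substitution or a gap on either side counts as a difference.
            if ref_base != alt_base:
                variants.append((column, name, ref_base, alt_base))
    # Sort by column first, then strain name, for a position-ordered report.
    return sorted(variants)


if __name__ == "__main__":
    ref = "ATGGCTACGT"  # hypothetical reference fragment
    alleles = {"strainA": "ATGGCTACGT", "strainB": "ATGACTAC-T"}  # hypothetical strains
    for variant in list_variants(ref, alleles):
        print(variant)
```

Running it prints (4, 'strainB', 'G', 'A') and (9, 'strainB', 'G', '-'); strainA matches the reference and contributes nothing.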
Protein details
• Overview
• Domains table and location graphic
• Shared domains diagram
• Post-translational modifications
• Physico-chemical properties
• External IDs
• Resources
The Gene Ontology (GO) Project
A collaboration among model organism databases, initiated in 1998 by a consortium of researchers from FlyBase, SGD, and MGD, to improve queries within and across databases.
The problem across databases: “Biologists would rather share their toothbrush than share a gene name. Gene nomenclature is beyond redemption” - Michael Ashburner
Gene names: CDC25 (S. cerevisiae) = Son of Sevenless (D. melanogaster) = SOS1 (H. sapiens)
Common names: fructose-bisphosphate aldolase = 1,6-diphosphofructose aldolase = D-fructose-1,6-bisphosphate D-glyceraldehyde-3-phosphate-lyase = diphosphofructose aldolase = fructoaldolase = fructose 1,6-diphosphate aldolase = fructose 1-monophosphate aldolase = fructose 1-phosphate aldolase = fructose diphosphate aldolase = fructose-1,6-bisphosphate triosephosphate-lyase = ketose 1-phosphate aldolase = phosphofructoaldolase = zymohexase
Neither genetic names nor common names are used consistently.
The solution: GO, a set of three independent, structured, controlled vocabularies for describing the molecular function, biological process, and cellular component of gene products
Molecular function: the tasks performed by individual gene products, for example, fructose-bisphosphate aldolase activity or protein serine/threonine kinase activity.
Biological process: the broad biological goals, such as mitosis or DNA replication, that are accomplished by ordered assemblies of molecular functions.
Cellular component: subcellular structures, locations, and macromolecular complexes, such as nucleus, cellular bud tip, and origin recognition complex.
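A structured vocabulary links terms in a graph, so an annotation to a specific term also implies its more general parents, and many synonyms resolve to one controlled term. The Python sketch below shows both ideas on a small, hand-picked fragment of the molecular function ontology, using term names only. The term chain is simplified, the synonym table and function names are hypothetical, and real GO terms carry stable IDs and several relation types beyond is_a.

```python
"""Toy sketch of a structured, controlled vocabulary (not the real GO files).

Assumptions: a simplified fragment of the molecular function ontology,
term names instead of GO IDs, and only is_a links.
"""
from typing import Dict, List, Set

# child term -> parent terms (the structure allows several parents per term)
IS_A: Dict[str, List[str]] = {
    "fructose-bisphosphate aldolase activity": ["aldehyde-lyase activity"],
    "aldehyde-lyase activity": ["carbon-carbon lyase activity"],
    "carbon-carbon lyase activity": ["lyase activity"],
    "lyase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
}

# Hypothetical synonym table: many common names resolve to one controlled term.
SYNONYMS: Dict[str, str] = {
    "zymohexase": "fructose-bisphosphate aldolase activity",
    "fructoaldolase": "fructose-bisphosphate aldolase activity",
}


def resolve(name: str) -> str:
    """Map a synonym to its controlled term; unknown names pass through unchanged."""
    return SYNONYMS.get(name.lower(), name)


def ancestors(term: str, graph: Dict[str, List[str]] = IS_A) -> Set[str]:
    """Collect every term reachable by following is_a links upward."""
    seen: Set[str] = set()
    stack = list(graph.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, []))
    return seen


def annotation_matches(annotated: str, query: str) -> bool:
    """True if something annotated to `annotated` should be returned for `query`."""
    annotated, query = resolve(annotated), resolve(query)
    return query == annotated or query in ancestors(annotated)


if __name__ == "__main__":
    print(annotation_matches("zymohexase", "lyase activity"))  # True
```

The query for the general term finds the specific annotation even though the input used an old common name, which is the cross-database consistency problem the slides describe.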
GO Annotation Details / GO Summary
• Biological Process
• Molecular Function
• Cellular Component
Phenotype details
• Use SGD search to locate observables and search all text
• Browsable list of all phenotypes
Interaction details
Operations:
• sort
• filter
• analyze
Expression details
SPELL expression tool
• See the expression of an individual gene in selected dataset(s)
• Enter a set of genes and find genes with similar expression profiles (optional filtering by tags)
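Finding genes with similar expression profiles rests on comparing profiles numerically. As a simplified stand-in, the Python sketch below ranks genes in a single dataset by their mean Pearson correlation with the query genes. SPELL itself also weights many datasets by how coherently the query genes behave in each, which this sketch omits; the gene names, values, and function names are made up for illustration.

```python
"""Simplified sketch of expression-similarity search (not SPELL's algorithm):
rank genes by mean Pearson correlation with a set of query genes.

Assumptions: one complete expression matrix with no missing values,
hypothetical gene names and values, no per-dataset weighting.
"""
from math import sqrt
from statistics import mean
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def rank_similar(profiles: Dict[str, List[float]], query: List[str]) -> List[Tuple[str, float]]:
    """Score each non-query gene by its mean correlation to the query genes."""
    scores = []
    for gene, profile in profiles.items():
        if gene in query:
            continue
        score = mean(pearson(profile, profiles[q]) for q in query)
        scores.append((gene, score))
    # Highest mean correlation first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    data = {  # hypothetical expression values across four conditions
        "GENE_A": [1.0, 2.0, 3.0, 4.0],
        "GENE_B": [1.1, 2.1, 2.9, 4.2],
        "GENE_C": [4.0, 3.0, 2.0, 1.0],
        "GENE_D": [2.0, 2.0, 2.1, 1.9],
    }
    for gene, score in rank_similar(data, ["GENE_A"]):
        print(f"{gene}\t{score:.2f}")
```

Correlation rather than raw distance is used here so that genes rising and falling together score highly even if their absolute levels differ.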
Regulation details
Overview
Domains/classifications
Targets
Shared GO for targets
Regulators
Biochemical Pathways
GBrowse
Navigation: landmark, scrolling, zooming
Selecting: tracks, subtracks
Navigation:
• Search by region (chrVI:48,978..58,977), gene name (CDC28), or keyword (invasive growth)
• The highlighted rectangle in the overview marks the region of the genome displayed in the detail panel
• The region panel displays a portion of the genome surrounding the region of interest
• The detail panel displays a zoomed-in view that corresponds to the overview selection rectangle
Select tracks:
• SGD Annotations: sequence features
• Chromatin Structure: histone modifications, nucleosome organization
• Gene Structure: transcription start sites, 5’ and 3’ UTRs
• RNA Expression: mRNA, ncRNA, cell cycle
• Replication and Recombination: meiotic recombination, origins of replication
• Transcription Regulation: transcription factors, RNAPII, preinitiation factors
• Analysis: restriction sites
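The region notation used for GBrowse navigation packs a chromosome, a start, and an end into one string. The Python sketch below parses that notation and reports the span length, assuming 1-based inclusive coordinates with commas as thousands separators; it mirrors the example on the slide and is not GBrowse's own parser. The Region type and parse_region function are hypothetical names.

```python
"""Sketch: parse a GBrowse-style region string such as "chrVI:48,978..58,977".

Assumptions: 1-based inclusive coordinates, commas only as thousands
separators, and ".." between start and end. Not GBrowse's own parser.
"""
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive coordinates: both endpoints count.
        return self.end - self.start + 1


_REGION = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)\.\.(?P<end>[\d,]+)$")


def parse_region(text: str) -> Region:
    """Split a region string into chromosome, start, and end."""
    m = _REGION.match(text.strip())
    if not m:
        raise ValueError(f"not a region string: {text!r}")
    start = int(m["start"].replace(",", ""))
    end = int(m["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(m["chrom"], start, end)


if __name__ == "__main__":
    region = parse_region("chrVI:48,978..58,977")
    print(region, region.length)
```

For the slide's example, chrVI:48,978..58,977 parses to a 10,000 bp window. Gene names and keywords such as CDC28 or invasive growth do not match the pattern and raise ValueError here; resolving them would need a separate lookup.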
Data files for download
Search full text with Textpresso
Genome Snapshot: global questions about the genome and its annotation status