Post on 12-Jan-2016

Page 1: July 2015 CSHL Navigating data at the Saccharomyces Genome Database Rob Nash, Senior Biocuration Scientist rnash@stanford.edu

July 2015 CSHL

Navigating data at the Saccharomyces Genome Database

SGD: www.yeastgenome.org

[email protected]

Rob Nash, Senior Biocuration Scientist, rnash@stanford.edu

Outline

• History and background

• How to stay current

• Basic organization (homepage, search, Locus Summary Page)

• Tabs, access to detailed info (sequence, gene ontology, phenotype, interaction, expression and regulation)

• Data analysis: GO tools, YeastMine basics and use-cases

About SGD

• Totally public, open, non-profit academic group

• Funded by the NIH (NHGRI)

• Mike Cherry at Stanford is the P.I. (since 1992). Most of SGD is housed at Stanford, with a few remote curators who work from home

Key early decisions

• People who understand the biology (Ph.D. biologists) are required to design the database, summarize the literature, etc.

• Full-time staff positions are needed for project stability.

• Our top priority is to serve the needs of the research community (yeast and other), so communication with users is critically important.

SGD Today

• Over 1.7 million visits from unique IP addresses over the past year; 175,000 page views per week; worldwide usage

• About 15 full-time staff (curators, programmers, system and db admins)

“Other” represents 30 countries with more than 100 visits, and 49 additional countries with 10-100 visits.

SGD Staff, Cherry lab

Basic organization of information on the home page

• Search
• YeastMine
• YouTube tutorials

• New data and updates
• Research spotlight
• Upcoming meetings

• Analysis and seq. tools
• Functional information
• Literature
• Community

• Colleague Info.
• Gene registry
• Wiki
• Newsletter

Social Media:
• Facebook
• Twitter
• LinkedIn

Elastic search with autocomplete

Gene names (ACT1) => Locus Summary page

Other terms (actin; “act1 *”) => Instant Search page

Some IDs direct: 5634, 25721128

Single quote (OR) vs double quotes (AND)
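
The routing and quoting rules on this slide can be sketched roughly as follows. This is only an illustration of the behaviour described, not SGD's search code: `KNOWN_GENE_NAMES`, `route_query`, and `matches` are hypothetical names, treating every all-digit query as a direct ID is an assumption (the slide says only some IDs resolve directly), and the OR/AND handling is one reading of "single quote (OR) vs double quotes (AND)".

```python
import re

# Hypothetical stand-in for the site's index of gene names.
KNOWN_GENE_NAMES = {"ACT1", "CDC28", "YRR1"}


def route_query(query: str) -> str:
    """Return the kind of page a query would land on, per the slide's rules."""
    q = query.strip()
    if q.upper() in KNOWN_GENE_NAMES:
        return "locus summary page"
    if q.isdigit():
        # Assumption: any all-digit query is treated as a direct ID.
        return "direct ID lookup"
    return "instant search page"


def matches(query: str, text: str) -> bool:
    """Single-quoted terms are OR-ed; double-quoted terms are AND-ed."""
    q = query.strip()
    words = set(re.findall(r"\w+", text.lower()))
    terms = re.findall(r"\w+", q.lower())
    if q.startswith("'") and q.endswith("'"):
        return any(term in words for term in terms)
    # Double-quoted queries need every term; treating unquoted
    # queries the same way is an assumption of this sketch.
    return all(term in words for term in terms)
```

For example, `matches("'actin kinase'", "actin filament")` is true because one term is enough, while the double-quoted form of the same query is false because "kinase" is missing.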

Modify your search

Autocomplete (suggestions)

Instant search (predictive results)

Next iteration to include facets!

Website redesign: staying current and modern

• To store new data and leverage new web development tools, SGD was completely overhauled.

• Restructured pages, data transfer methods, and the underlying database schema, all while keeping the site live and actively curated. The goal was to make the website faster and easier to maintain.

• New visualization methods, and a responsive layout.

Locus Summary Page

Responsive layout: better for all devices

Organization:
• Moved seq. info up + improved graphics
• Some basic protein info.
• Regulation summary
• Improved expression histogram

Navigation has changed:
• Sectional nav. bar with back to top
• Tabs and details link
• New tabs for seq. and locus history

What’s behind the tabs?

Sequence details

• S288C overview
– map
– subfeatures, with coordinates
– sequence (genomic, coding and protein)

• Alternative reference strains
– map
– subfeatures, with coordinates
– sequence (genomic, coding and protein)

• Other strains

Other ref strains

Alternative ref strains

Sequence tools

• BLASTN, BLASTP

• BLASTN vs fungi, BLASTP vs fungi

• Strain alignment (YRR1)

• Variant viewer (new)

Variant viewer

Access from:
1) Sequence (home page navigation bar) -> Strain and species
2) Analyze sequence section of LSP
3) Resources section of sequence tab

Protein details

• Overview
• Domains table, and location graphic
• Shared domains diagram
• Post-translational modifications
• Physico-chemical properties
• External IDs
• Resources

The Gene Ontology (GO) Project

A collaboration among model organism databases, initiated in 1998 by a consortium of researchers from FlyBase, SGD, and MGD, to improve queries within and across databases.

The problem across databases: “Biologists would rather share their toothbrush than share a gene name. Gene nomenclature is beyond redemption” - Michael Ashburner

Neither genetic names nor common names are consistently used

CDC25 (S. cerevisiae) = Son of Sevenless (D. melanogaster) = SOS1 (H. sapiens)

fructose-bisphosphate aldolase = 1,6-diphosphofructose aldolase = D-fructose-1,6-bisphosphate D-glyceraldehyde-3-phosphate-lyase = diphosphofructose aldolase = fructoaldolase = fructose 1,6-diphosphate aldolase = fructose 1-monophosphate aldolase = fructose 1-phosphate aldolase = fructose diphosphate aldolase = fructose-1,6-bisphosphate triosephosphate-lyase = ketose 1-phosphate aldolase = phosphofructoaldolase = zymohexase
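
To make the naming problem concrete, here is a minimal sketch of collapsing synonyms onto one shared label, which is roughly what a controlled vocabulary provides. The synonyms are a subset of those on the slide; choosing the first name as the canonical label, and the helper names `build_index` and `normalize`, are assumptions made only for illustration.

```python
# Canonical label: simply the first name listed on the slide (an assumption,
# not an official identifier).
CANONICAL = "fructose-bisphosphate aldolase"

# A subset of the synonyms listed on the slide.
SYNONYMS = [
    "1,6-diphosphofructose aldolase",
    "diphosphofructose aldolase",
    "fructoaldolase",
    "fructose 1,6-diphosphate aldolase",
    "ketose 1-phosphate aldolase",
    "phosphofructoaldolase",
    "zymohexase",
]


def build_index(canonical, synonyms):
    """Map every known spelling (lower-cased) to the canonical label."""
    index = {canonical.lower(): canonical}
    for name in synonyms:
        index[name.lower()] = canonical
    return index


def normalize(name, index):
    """Return the canonical label for a name, or None if it is unknown."""
    return index.get(name.strip().lower())


INDEX = build_index(CANONICAL, SYNONYMS)
```

With this index, a query for "Zymohexase" and a query for "fructoaldolase" both resolve to the same label, so records written with either name can be retrieved together.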

The solution: GO, a set of three independent structured, controlled vocabularies for describing the molecular function, biological process, and cellular component of gene products

Molecular function: the tasks performed by individual gene products, for example, fructose-bisphosphate aldolase activity or protein serine/threonine kinase activity.

Biological process: the broad biological goals, such as mitosis or DNA replication, that are accomplished by ordered assemblies of molecular functions.

Cellular component: subcellular structures, locations, and macromolecular complexes, such as nucleus, cellular bud tip, and origin recognition complex.
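
As a small illustration of the three vocabularies, the example terms named above can be tagged with the aspect they belong to and grouped. This is a sketch only: the `Aspect` enum and `group_by_aspect` helper are hypothetical names, and the table holds just the example terms from this slide, not real GO identifiers or annotations.

```python
from collections import defaultdict
from enum import Enum


class Aspect(Enum):
    MOLECULAR_FUNCTION = "molecular function"
    BIOLOGICAL_PROCESS = "biological process"
    CELLULAR_COMPONENT = "cellular component"


# Example terms taken from the slide text, each tagged with its vocabulary.
EXAMPLE_TERMS = {
    "fructose-bisphosphate aldolase activity": Aspect.MOLECULAR_FUNCTION,
    "protein serine/threonine kinase activity": Aspect.MOLECULAR_FUNCTION,
    "mitosis": Aspect.BIOLOGICAL_PROCESS,
    "DNA replication": Aspect.BIOLOGICAL_PROCESS,
    "nucleus": Aspect.CELLULAR_COMPONENT,
    "cellular bud tip": Aspect.CELLULAR_COMPONENT,
    "origin recognition complex": Aspect.CELLULAR_COMPONENT,
}


def group_by_aspect(terms):
    """Group term names by the vocabulary (aspect) they belong to."""
    grouped = defaultdict(list)
    for term, aspect in terms.items():
        grouped[aspect].append(term)
    return dict(grouped)


GROUPED = group_by_aspect(EXAMPLE_TERMS)
```

Each term belongs to exactly one aspect, which is why a gene product is described with separate function, process, and component annotations.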

GO Annotation Details

GO Summary

Biological Process

Molecular Function

Cellular Component

Phenotype details

Use SGD search to locate observables and ALL text

Browsable list of all phenotypes

Interaction details

Operations
• sort
• filter
• analyze

Expression details

SPELL expression tool

• See expression of an individual gene in selected dataset(s)

• Enter a set of genes and find genes with similar expression profiles (optional filtering by tags)
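
The idea of "genes with similar expression profiles" can be illustrated with a deliberately simplified stand-in: rank candidate genes by their mean Pearson correlation with the query genes. This is not SPELL's actual algorithm, and the gene names and expression values below are made-up toy data used only to show the ranking step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_similar(query_genes, profiles):
    """Rank non-query genes by mean correlation with the query genes."""
    scores = []
    for gene, profile in profiles.items():
        if gene in query_genes:
            continue
        mean_r = sum(pearson(profiles[q], profile) for q in query_genes) / len(query_genes)
        scores.append((gene, mean_r))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy, invented profiles: one value per condition.
TOY_PROFILES = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

RANKED = rank_similar(["geneA", "geneB"], TOY_PROFILES)
```

In the toy data, geneD tracks the query genes fairly closely and ranks first, while geneC is anti-correlated with them and ranks last.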

Regulation details

Overview

Domains/classifications

Targets

Shared GO for targets

Regulators

Biochemical Pathways

GBrowse

Navigation: * landmark * scrolling * zooming

Selecting: * tracks * subtracks

Navigation:
• Region (chrVI:48,978..58,977), gene name (CDC28), keyword (invasive growth)
• Highlighted rectangle in overview is region of genome displayed in detail panel
• Region panel displays a portion of the genome surrounding the region of interest
• Detail panel displays zoomed in view that corresponds to the overview selection rectangle
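
The region syntax used in the navigation box (chromosome, colon, start and end separated by two dots, with optional thousands separators) can be parsed with a short helper. This is a sketch based only on the single example shown on the slide; `Region` and `parse_region` are hypothetical names, and the browser itself may accept other forms.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# Pattern inferred from the slide's example "chrVI:48,978..58,977".
_REGION_PATTERN = re.compile(
    r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)\.\.(?P<end>[\d,]+)$"
)


def parse_region(text: str) -> Region:
    """Parse 'chrVI:48,978..58,977' into (chromosome, start, end)."""
    match = _REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return Region(match["chrom"], start, end)


REGION = parse_region("chrVI:48,978..58,977")
# Inclusive length of the region in base pairs.
REGION_LENGTH = REGION.end - REGION.start + 1
```

For the slide's example, the parsed region spans 10,000 bases on chrVI when both coordinates are counted as inclusive.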

Select tracks:
• SGD Annotations: sequence features
• Chromatin structure: histone modifications, nucleosome organization
• Gene Structure: transcription start sites, 5’ and 3’ UTRs
• RNA expression: mRNA, ncRNA, cell cycle
• Replication and Recombination: meiotic recombination, origins of replication
• Transcription Regulation: transcription factors, RNAPII, preinitiation factors
• Analysis: restriction sites

Data files for download

Search full text with Textpresso

Genome Snapshot: global questions about the genome and its annotation status