
Page 1: Generation, and archiving of NGS Management Systems … ·  · 2015-03-16Generation, annotation and archiving of NGS data ... 1.Biologist fills in request form and sends it to service‐ings@ieo.eu

Generation, annotation and archiving of NGS data: Laboratory Information Management Systems (LIMS) and Distributed Annotation Server architecture

Advanced genome browsers: The Integrated Genome Browser

Heiko Muller
Computational Research IIT@SEMM
[email protected]

Genomic Computing, DEIB, 16-20 March 2015

Illumina HiSeq

Each lane contains more than one sample (multiplexing)

180 million clusters per lane

NGS data flow

The current situation:
1. Biologist fills in a request form and sends it to service-ings@ieo.eu
2. […] are inserted into LIMS and request IDs are sent back to the biologist
3. Samples are sequenced and run data are inserted into LIMS
4. LIMS prepares sample sheets that are used for demultiplexing and bcl->fastq conversion
5. FastQC is run for quality control
6. FASTQ data are saved on the IIT Isilon device and hard links are produced in user folders
7. Group bioinformaticians align and analyze data
8. Group bioinformaticians interact with biologists to interpret results

Flow: Request, LIMS -> FASTQ, bioinformaticians, elaborated data sets (labels: homogeneous, heterogeneous)
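Step 6 of the data flow delivers FASTQ files by hard-linking them into user folders, so the data exist only once on disk. A minimal sketch of that idea, assuming a POSIX file system where archive and user folders sit on the same volume; the folder layout and the function name `deliver_fastq` are hypothetical and not taken from the LIMS:

```python
import os
import tempfile


def deliver_fastq(fastq_path, user_dir):
    """Create a hard link to fastq_path inside user_dir and return the link path."""
    os.makedirs(user_dir, exist_ok=True)
    link_path = os.path.join(user_dir, os.path.basename(fastq_path))
    if not os.path.exists(link_path):
        # A hard link is a second directory entry for the same data blocks,
        # so no additional storage is consumed.
        os.link(fastq_path, link_path)
    return link_path


# Demo in a temporary directory (stands in for the storage device).
root = tempfile.mkdtemp()
archive = os.path.join(root, "fastq")
os.makedirs(archive)
source = os.path.join(archive, "sample_1.fastq")
with open(source, "w") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n")

link = deliver_fastq(source, os.path.join(root, "users", "alice"))
same_file = os.path.samefile(source, link)
link_count = os.stat(source).st_nlink
```

Because both names point at the same data, deleting the link in the user folder leaves the archived copy untouched.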

NGS usage on campus

LIMS 1.0: NGS requests

http://hilt.iit.ieo.eu:8080/NGSSampleInfo/

LIMS = Laboratory Information Management System

LIMS 1.0: NGS requests

http://hilt.iit.ieo.eu:8080/NGSSampleInfo/

filter

Data delivery

LIMS 1.0 LIMS 2.0

Data delivery

http://hilt.iit.ieo.eu/data/delivery_stats.xlsx

Infrastructure

Diagram: users; facility; Illumina HiSeq2000; LIMS frontend; SGE-HPC (blades, GPU blade); Isilon storage; LIMS DB; quality control (FastQC); data; genome browsers (UCSC; IGB, DAS/2, Quickload)

Application servers: Apache2, Glassfish, UCSC, DAS/2, Quickload, data listings

Application server, blades, GPU

Isilon storage (250 TB, 300,000 Euro)

NGS data flow: Can we improve it?

Flow: Request, LIMS -> FASTQ, bioinformaticians, elaborated data sets (labels: homogeneous, heterogeneous)

Raw data: 27.8 TB
FASTQ data: 25.5 TB
Elaborated data: > 57 TB
Scratch: 13 TB
Elaborated data and scratch combined: > 70 TB

Limitations of LIMS 1.0

• No roles
• Sample-to-lane relationship is N : 1; N : N is desirable
• No projects
• No sample annotation compatible with GMQL
• No workflows
• -> developed LIMS 2.0 together with PoliMi

Venco, Francesco, et al. "SMITH: A LIMS for handling next-generation sequencing workflows." BMC Bioinformatics 15.Suppl 14 (2014): S3.

GMQL Compatible Laboratory Information Management System

SMITH: Sequencing Machine Information Tracking and Handling
Francesco Venco, Yuriy Vaskin, Arnaud Ceol, Marco Masseroli, Stefano Ceri, Heiko Muller

Demo available: https://cru.genomics.iit.it/smith/

SMITH features

Architecture (diagram): web clients; Java EE7 web server with Controller (FacesServlet), Model (managed beans), View (xhtml facelets) and Hibernate (ORM); MySQL; SGE-HPC; file system

Features: sample submission, sample annotation, sample analysis, run folder monitor, reagent store, role-based access, virtual flow cell, index compatibility, email alerts, sample tracking, quality control, project awareness

SMITH Sample states

States: requested, queued, confirmed, analyzed

Diagram labels: user; technician; principal investigator; SMITH, HPC, Galaxy
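The four sample states can be read as a small state machine. The sketch below assumes a strictly linear order (requested, queued, confirmed, analyzed), taken only from the order of the labels on the slide; which actor triggers each transition is not modelled, and the function names are hypothetical:

```python
# Assumed linear order, based on the order of the labels on the slide.
STATES = ["requested", "queued", "confirmed", "analyzed"]


def next_state(current):
    """Return the state that follows `current`, or None if it is the last one."""
    index = STATES.index(current)  # raises ValueError for unknown states
    return STATES[index + 1] if index + 1 < len(STATES) else None


def advance(sample):
    """Move a sample dict to its next state; refuse to go past the final state."""
    following = next_state(sample["state"])
    if following is None:
        raise ValueError("sample is already in its final state")
    sample["state"] = following
    return sample


sample = {"id": 1, "state": "requested"}
history = [sample["state"]]
while next_state(sample["state"]) is not None:
    history.append(advance(sample)["state"])
```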

SMITH database schema

Request submission etc, stand‐alone DB client

SMITH Context parameter (configurable)

Request form

SMITH Sample search (role-based)

Roles:
Admin: everything
Guest: look, no sample details
Group leader: define projects, collaborators, track group samples
User: submit and track samples
Technician: start NGS runs
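Role-based access of this kind can be pictured as a lookup from role to allowed actions. A minimal sketch: the role names follow the slide, but the action names and the exact split of permissions (for example, that technicians may view sample details) are assumptions made for illustration, not SMITH's actual configuration:

```python
# Action names are invented for this sketch; only the role names come from the slide.
PERMISSIONS = {
    "admin": {"*"},  # "*" stands for every action
    "guest": {"browse"},
    "group_leader": {
        "browse", "view_sample_details",
        "define_projects", "define_collaborators", "track_group_samples",
    },
    "user": {"browse", "view_sample_details", "submit_samples", "track_own_samples"},
    "technician": {"browse", "view_sample_details", "start_ngs_run"},
}


def is_allowed(role, action):
    """Return True if `role` may perform `action`; unknown roles get nothing."""
    granted = PERMISSIONS.get(role, set())
    return "*" in granted or action in granted


checks = {
    "guest sees details": is_allowed("guest", "view_sample_details"),
    "user starts run": is_allowed("user", "start_ngs_run"),
    "technician starts run": is_allowed("technician", "start_ngs_run"),
    "admin defines projects": is_allowed("admin", "define_projects"),
}
```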

SMITH NGS runs

Mindex (Mindful Index) to support multiplexing in flow cell assembly
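The slides do not spell out how Mindex decides which samples can share a lane. One plausible notion of index compatibility, shown here purely as an assumption, is that all index sequences in a lane differ from each other in at least a minimum number of positions (Hamming distance). The threshold of 3 and the example indexes are illustrative:

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def incompatible_pairs(indexes, min_distance=3):
    """Return pairs of sample names whose index sequences are too similar."""
    pairs = combinations(sorted(indexes.items()), 2)
    return [
        (name1, name2)
        for (name1, seq1), (name2, seq2) in pairs
        if hamming(seq1, seq2) < min_distance
    ]


# Illustrative lane: sampleA and sampleC differ in a single base.
lane = {"sampleA": "ATCACG", "sampleB": "CGATGT", "sampleC": "ATCACC"}
clashes = incompatible_pairs(lane)
```

A lane whose `clashes` list is empty could then be accepted for multiplexing under this assumed rule.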

SMITH NGS runs assembly: Mindex

SMITH Samplesheets

SMITH NGS analysis trigger

From BCL (base call format) to FASTQ: Demultiplexing

Samplesheet

Script generator

Run on IIT blades (Process proc = Runtime.getRuntime().exec(command);)
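Demultiplexing assigns each read to a sample by comparing the read's index sequence with the indexes listed in the sample sheet. The toy sketch below illustrates the principle only: the three-column sheet layout is a simplification and not Illumina's real sample sheet format, and the one-mismatch tolerance is an assumed setting:

```python
import csv
import io

# Simplified, hypothetical sample sheet (not the real Illumina layout).
SAMPLE_SHEET = """Lane,SampleID,Index
1,sampleA,ATCACG
1,sampleB,CGATGT
"""


def load_indexes(sheet_text, lane):
    """Map index sequence -> sample ID for one lane of the sample sheet."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    return {row["Index"]: row["SampleID"] for row in reader if row["Lane"] == str(lane)}


def assign(read_index, indexes, max_mismatches=1):
    """Return the unique sample whose index matches, else 'Undetermined'."""
    hits = [
        sample
        for index, sample in indexes.items()
        if sum(a != b for a, b in zip(read_index, index)) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "Undetermined"


indexes = load_indexes(SAMPLE_SHEET, lane=1)
reads = [("read1", "ATCACG"), ("read2", "CGATGA"), ("read3", "GGGGGG")]
assignment = {name: assign(index, indexes) for name, index in reads}
```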

SMITH NGS reagents

SMITH Project aware

SMITH Sample annotation with attribute-value pairs -> GMQL

Attributes:
• search samples
• do statistics on attribute values (GQL)
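Both uses of attribute-value pairs, searching samples and computing statistics on attribute values, can be sketched with plain dictionaries. The sample records, attribute names and helper functions below are invented for illustration and do not reflect SMITH's schema or GMQL syntax:

```python
from collections import Counter

# Hypothetical samples annotated with free attribute-value pairs.
samples = [
    {"id": "S1", "attributes": {"organism": "mouse", "antibody": "PU1", "cell_line": "EML1"}},
    {"id": "S2", "attributes": {"organism": "mouse", "antibody": "Input", "cell_line": "EML1"}},
    {"id": "S3", "attributes": {"organism": "human", "antibody": "PU1"}},
]


def search(records, **criteria):
    """Return IDs of samples whose attributes match every given key=value pair."""
    return [
        record["id"]
        for record in records
        if all(record["attributes"].get(key) == value for key, value in criteria.items())
    ]


def value_counts(records, attribute):
    """Count how often each value of `attribute` occurs (samples lacking it are skipped)."""
    return Counter(
        record["attributes"][attribute]
        for record in records
        if attribute in record["attributes"]
    )


mouse_pu1 = search(samples, organism="mouse", antibody="PU1")
antibody_stats = value_counts(samples, "antibody")
```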

SMITH workflows (Data tab)

Path to BigWig/Bam data

Path to FASTQ data

SMITH News

SMITH users

Automatic email communications

By the end of analysis we get big files: fastq, bam, bigWig

SMITH simplifies analysis workflow

Flow: Request, LIMS -> FASTQ, CRU, elaborated data sets (labels: homogeneous, heterogeneous)

Previous situation

Diagram: FASTQ file; user folder; FASTQ folder (backed up)

Current situation for bam files

Diagram: bam file; user folder; BAM folder (backed up); Quickload, DAS2

Current situation for bigWig files

Diagram: bigWig file; user folder; bigWig folder (backed up); Quickload, DAS2

Advantages

Flow: Request, LIMS -> FASTQ, CRU, elaborated data sets (label: homogeneous)

• Homogeneous
• Less space consuming
• Accessible, sharable
• Bioinformaticians can do more science
• Biologists get tracks instantly
• GQL meta-analysis of ENCODE data
• Collaborative (analyses and pipelines)

View and share data immediately

Data sources

Data Sources

http://bioserver.iit.ieo.eu/genopub/
http://hilt.iit.ieo.eu/quickload/

Share your data, in the lab or worldwide, by setting access levels, use plug‐ins

DAS2 manages access levels

Plugins

View side‐by‐side with UCSC tracks

Visualizing NGS data: Genome Browsers

Visualizing NGS data: Genome Browsers

Visualizing genomic data: What is a "Genome Browser"?

• linear representation of a genome
• position-based annotations, each called a track
  – continuous annotations: e.g. conservation
  – interval annotations: e.g. gene, read alignment
  – point annotations: e.g. SNPs
• user specifies a subsection of genome to look at
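At its core, showing a track for a user-chosen subsection of the genome is an interval overlap query. A minimal sketch, assuming features are stored as (chromosome, start, end, name) tuples with half-open coordinates; the gene names and positions are made up, and a point annotation such as a SNP would be stored as an interval of length 1:

```python
def features_in_view(track, chrom, start, end):
    """Return features on `chrom` that overlap the half-open window [start, end)."""
    return [
        feature
        for feature in track
        if feature[0] == chrom and feature[1] < end and feature[2] > start
    ]


# Hypothetical interval track: (chromosome, start, end, name).
genes = [
    ("chr1", 100, 500, "geneA"),
    ("chr1", 450, 900, "geneB"),
    ("chr2", 100, 500, "geneC"),
]

visible = [f[3] for f in features_in_view(genes, "chr1", 480, 600)]
adjacent = [f[3] for f in features_in_view(genes, "chr1", 500, 600)]
```

A linear scan like this is fine for a handful of features; real browsers rely on indexed formats for large tracks.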

Comparison of Genome Browsers

            UCSC                      Ensembl                             IGV                                  IGB
Reference   http://genome.ucsc.edu/   http://www.ensembl.org/index.html   http://www.broadinstitute.org/igv/   http://bioviz.org/igb/
Model       Server                    Server                              Client                               Client

Further rows compared: Interactive, HTS support, Database of tracks, Plugins (legend: No support, Some support, Good support)

Server model: the server holds the central data store, renders images and sends them to the client; the client requests images and displays them.
Client model: the server stores data; the client keeps a local HTS store, renders images and displays them.

Limitations: genome browsers
• do not support multiple genomes simultaneously
• do not capture 3-dimensional conformation
• do not capture spatial or temporal information
• do not integrate well with analytics

About UCSC Genome Browser

• Browse many eukaryotic genomes (yeast to human)
• Most annotations are there
• Important evolutionary and variation data representation
• Very flexible and configurable views
• Graphical and table views
• Upload your data into custom tracks and share with colleagues
• Client/server application with its issues, but a great app!

http://genome.ucsc.edu

http://genome.ucsc.edu

Integrated Genome Browser and IIT DAS2 server

Integrated Genome Browser and published genome annotations

Genome browser view: ChIP-seq

.bam, .bed, .bigWig

Genome browser view: sequencing errors

Integrated Genome Browser and the Distributed Annotation System (DAS)

Outline

• Genome Browsing: Why was DAS developed?
• DAS: history, usage, and specification, reference implementation
• Integrated Genome Browser
• Examples

Frederick Sanger

Genbank: centralized repository, sequences owned by submitter

A Genbank entry

LOCUS       NM_053056               4304 bp    mRNA    linear   PRI 27-MAY-2012
DEFINITION  Homo sapiens cyclin D1 (CCND1), mRNA.
ACCESSION   NM_053056 NM_001758
VERSION     NM_053056.2  GI:77628152
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 4304)
  AUTHORS   Li,Q., Dong,Q. and Wang,E.
  TITLE     Rsf-1 is overexpressed in non-small cell lung cancers and regulates
            cyclinD1 expression and ERK activity
  JOURNAL   Biochem. Biophys. Res. Commun. 420 (1), 6-10 (2012)
   PUBMED   22387541
  REMARK    GeneRIF: Rsf-1 is overexpressed in non-small cell lung cancers and
            contributes to malignant cell growth by cyclin D1 and ERK
            modulation.
PRIMARY     REFSEQ_SPAN         PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-138               BM796500.1         1-138
            139-1278            BC001501.2         73-1212
            1279-4077           AP001888.4         12952-15750
            4078-4304           X59798.1           4018-4244
FEATURES             Location/Qualifiers
     source          1..4304
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
                     /map="11q13"
     gene            1..4304
                     /gene="CCND1"
                     /gene_synonym="BCL1; D11S287E; PRAD1; U21B31"
                     /note="cyclin D1"
                     /db_xref="GeneID:595"
                     /db_xref="HGNC:1582"
                     /db_xref="HPRD:01346"
                     /db_xref="MIM:168461"
     exon            1..407
                     /gene="CCND1"
                     /gene_synonym="BCL1; D11S287E; PRAD1; U21B31"
                     /inference="alignment:Splign"
                     /number=1
     CDS             210..1097
                     /gene="CCND1"
                     /gene_synonym="BCL1; D11S287E; PRAD1; U21B31"
                     /note="B-cell CLL/lymphoma 1; BCL-1 oncogene; PRAD1
                     oncogene; B-cell lymphoma 1 protein"
                     /codon_start=1
                     /product="G1/S-specific cyclin-D1"
                     /protein_id="NP_444284.1"
                     /db_xref="GI:16950655"
                     /db_xref="CCDS:CCDS8191.1"
                     /db_xref="GeneID:595"
                     /db_xref="HGNC:1582"
                     /db_xref="HPRD:01346"
                     /db_xref="MIM:168461"
                     /translation="MEHQLLCCEVETIRRAYPDANLLNDRVLRAMLKAEETCAPSVSYFKCVQKEVLPSMRKIVATWMLEVCEEQKCEEEVFPLAMNYLDRFLSLEPVKKSRLQLLGATCMFVASKMKETIPLTAEKLCIYTDNSIRPEELLQMELLLVNKLKWNLAAMTPHDFIEHFLSKMPEAEENKQIIRKHAQTFVALCATDVKFISNPPSMVAAGSVVAAVQGLNLRSPNNFLSYYRLTRFLSRVIKCDPDCLRACQEQIEALLESSLRQAQQNMDPKAAEEEEEEEEEVDLACTPTDVRDVDI"
     misc_feature    885..887
                     /gene="CCND1"
                     /gene_synonym="BCL1; D11S287E; PRAD1; U21B31"
                     /experiment="experimental evidence, no additional details
                     recorded"
                     /note="Phosphotyrosine; propagated from
                     UniProtKB/Swiss-Prot (P24385.1); phosphorylation site"
ORIGIN
        1 cacacggact acaggggagt tttgttgaag ttgcaaagtc ctggagcctc cagagggctg
       61 tcggcgcagt agcagcgagc agcagagtcc gcacgctccg gcgaggggca gaagagcgcg
      121 agggagcgcg gggcagcaga agcgagagcc gagcgcggac ccagccagga cccacagccc
      181 tccccagctg cccaggaaga gccccagcca tggaacacca gctcctgtgc tgcgaagtgg
      241 aaaccatccg ccgcgcgtac cccgatgcca acctcctcaa cgaccgggtg ctgcgggcca
      301 tgctgaaggc ggaggagacc tgcgcgccct cggtgtccta cttcaaatgt gtgcagaagg
      361 aggtcctgcc gtccatgcgg aagatcgtcg ccacctggat gctggaggtc tgcgaggaac
      421 agaagtgcga ggaggaggtc ttcccgctgg ccatgaacta cctggaccgc ttcctgtcgc
      481 tggagcccgt gaaaaagagc cgcctgcagc tgctgggggc cacttgcatg ttcgtggcct
      541 ctaagatgaa ggagaccatc cccctgacgg ccgagaagct gtgcatctac accgacaact
      601 ccatccggcc cgaggagctg ctgcaaatgg agctgctcct ggtgaacaag ctcaagtgga
      661 acctggccgc aatgaccccg cacgatttca ttgaacactt cctctccaaa atgccagagg
      721 cggaggagaa caaacagatc atccgcaaac acgcgcagac cttcgttgcc ctctgtgcca
//

By design, annotations are nearly impossible to incorporate
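The feature table of the entry uses 1-based, inclusive coordinates. As a small worked example, the sketch below parses the CDS location `210..1097`, derives its length, and checks that the bases starting at position 210 of the ORIGIN block spell the first residues of the `/translation` qualifier (MEHQ). The codon table is deliberately limited to the four codons needed here, and the helper name is hypothetical:

```python
def parse_location(location):
    """Parse a simple GenBank 'start..end' location (1-based, inclusive)."""
    start, end = location.split("..")
    return int(start), int(end)


start, end = parse_location("210..1097")
cds_length = end - start + 1      # inclusive coordinates
codons = cds_length // 3          # includes the terminal stop codon
protein_length = codons - 1

# ORIGIN row that starts at position 181 (copied from the entry, blanks removed).
row_start = 181
row = "tccccagctg cccaggaaga gccccagcca tggaacacca gctcctgtgc tgcgaagtgg".replace(" ", "")

offset = start - row_start        # 0-based offset of position 210 within the row
first_bases = row[offset:offset + 12]

# Only the four codons needed for this check, not a full genetic code table.
MINI_CODON_TABLE = {"atg": "M", "gaa": "E", "cac": "H", "cag": "Q"}
first_residues = "".join(
    MINI_CODON_TABLE[first_bases[i:i + 3]] for i in range(0, len(first_bases), 3)
)
```

The CDS spans 888 bases, that is 296 codons, or 295 amino acids once the stop codon is set aside.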

AceDB: A C. elegans database

Since 1989, centrally curated, annotations provided by the community -> curation bottleneck

HUGO

2001
2002

- To view massive amounts of sequencing data, genome browsers were developed.
- Annotations developed in "Annotation Jamborees"
- Human Genome Project Analysis Group: concept of annotation tracks
- Tracks produced and curated by different groups but stored on centralized server
-> Bandwidth bottleneck

The distributed annotation system

Distributed Annotation System: DAS
• Decentralized curation of annotation tracks
• Decentralized storage of annotation tracks

DAS basics

Components:
1. Reference genome server (provides coordinates and sequence)
2. Annotation server(s) (provide annotation tracks)
3. Client (views annotations mapped onto reference)

Diagram: reference; annotations; client (web or stand-alone)

Dowell et al. 2001

Geodesic: Standalone client by Dowell et al. 2001

Source code: http://www.biodas.org/geodesic/

Glyphs: Graphic elements used for track display

DAS registry (www.dasregistry.org)

Currently 1600 DAS/1 entries
Clients:

DAS/2 (not listed in registry)
http://india907.server4you.de:8080/das2/genome (epigenome.at)
http://www.bioviz.org/das2/genome (Bioviz)
http://bioserver.hci.utah.edu:8080/DAS2DB/genome (UofUtahBioinfoCore)
http://netaffxdas.affymetrix.com/das2/genome (NetAffx)

DAS/1 != DAS/2

http://www.biodas.org/wiki/DAS/2

Main difference: DAS/2 supports non-XML file formats
DAS/2 clients support DAS/1 but not vice versa

2004-2007: NIH grant for DAS/2 development; partners: Affymetrix, Cold Spring Harbor Lab, the EBI/Sanger Center, Dalke Scientific

DAS specification (www.biodas.org)

DAS: Basic query types: sources, segments, types, features

Sources: list available genomes
Segments: list chromosomes per genome
Types: list types of annotation (file format etc.)
Features: list annotation details in specific region
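The four query types map onto URLs below the server's genome root. The sketch below builds them with `urllib.parse.urljoin`, using the IIT server root and the genome version name `M_musculus_Jul_2007` that appear in the example responses of this deck; the helper name is hypothetical, and region filters for the features query are left out because they are not shown here:

```python
from urllib.parse import urljoin

# Genome root of the IIT DAS/2 server as shown in the slides; the sources
# query is answered at this root itself.
BASE = "http://bioserver.iit.ieo.eu/genopub/genome/"

PER_VERSION_QUERIES = ("segments", "types", "features")


def query_url(base, version, query_type):
    """Build the URL for a per-genome-version query (segments, types, features)."""
    if query_type not in PER_VERSION_QUERIES:
        raise ValueError(f"unknown query type: {query_type}")
    return urljoin(base, f"{version}/{query_type}")


segments_url = query_url(BASE, "M_musculus_Jul_2007", "segments")
types_url = query_url(BASE, "M_musculus_Jul_2007", "types")
```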

<?xml version="1.0" encoding="UTF-8"?>
<SOURCES xmlns="http://biodas.org/documents/das2" xml:base="http://rubidio.ifom-ieo-campus.it:8080/das2/genome/" >
  <MAINTAINER email="ivan.lago@ifom-ieo-campus.it" />
  <SOURCE uri="D_rerio" title="D_rerio" >
    <VERSION uri="danRer7" title="danRer7" created="2012-05-05T16:47:27+0200" >
      <CAPABILITY type="segments" query_uri="danRer7/segments" />
      <CAPABILITY type="types" query_uri="danRer7/types" />
      <CAPABILITY type="features" query_uri="danRer7/features" />
    </VERSION>
  </SOURCE>
  <SOURCE uri="H_sapiens" title="H_sapiens" >
    <VERSION uri="H_sapiens_Mar_2006" title="H_sapiens_Mar_2006" created="2012-05-05T16:47:27+0200" >
      <COORDINATES uri="http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/" authority="NCBI" taxid="9606" version="36" source="Chromosome" />
      <CAPABILITY type="segments" query_uri="H_sapiens_Mar_2006/segments" />
      <CAPABILITY type="types" query_uri="H_sapiens_Mar_2006/types" />
      <CAPABILITY type="features" query_uri="H_sapiens_Mar_2006/features" />
    </VERSION>
  </SOURCE>
  <SOURCE uri="M_musculus" title="M_musculus" >
    <VERSION uri="M_musculus_Jul_2007" title="M_musculus_Jul_2007" created="2012-05-05T16:47:27+0200" >
      <CAPABILITY type="segments" query_uri="M_musculus_Jul_2007/segments" />
      <CAPABILITY type="types" query_uri="M_musculus_Jul_2007/types" />
      <CAPABILITY type="features" query_uri="M_musculus_Jul_2007/features" />
    </VERSION>
    <VERSION uri="M_musculus_Mar_2006" title="M_musculus_Mar_2006" created="2012-05-05T16:47:27+0200" >
      <CAPABILITY type="segments" query_uri="M_musculus_Mar_2006/segments" />
      <CAPABILITY type="types" query_uri="M_musculus_Mar_2006/types" />
      <CAPABILITY type="features" query_uri="M_musculus_Mar_2006/features" />
    </VERSION>
  </SOURCE>
</SOURCES>

http://bioserver.iit.ieo.eu/genopub/genome
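A client can discover which genome versions a server offers, and where to send follow-up queries, by reading the CAPABILITY elements of the sources document. A minimal sketch using Python's standard `xml.etree.ElementTree` on a shortened copy of the response; the function name is hypothetical, and the XML declaration and most entries are omitted for brevity:

```python
import xml.etree.ElementTree as ET

# All DAS/2 elements live in this namespace, so lookups must be qualified.
NS = {"das": "http://biodas.org/documents/das2"}

# Shortened excerpt of a sources response.
SOURCES_XML = """<SOURCES xmlns="http://biodas.org/documents/das2">
  <SOURCE uri="M_musculus" title="M_musculus">
    <VERSION uri="M_musculus_Jul_2007" title="M_musculus_Jul_2007">
      <CAPABILITY type="segments" query_uri="M_musculus_Jul_2007/segments" />
      <CAPABILITY type="types" query_uri="M_musculus_Jul_2007/types" />
      <CAPABILITY type="features" query_uri="M_musculus_Jul_2007/features" />
    </VERSION>
    <VERSION uri="M_musculus_Mar_2006" title="M_musculus_Mar_2006">
      <CAPABILITY type="segments" query_uri="M_musculus_Mar_2006/segments" />
    </VERSION>
  </SOURCE>
</SOURCES>"""


def list_capabilities(xml_text):
    """Map each genome version URI to a dict of capability type -> query URI."""
    root = ET.fromstring(xml_text)
    result = {}
    for source in root.findall("das:SOURCE", NS):
        for version in source.findall("das:VERSION", NS):
            result[version.get("uri")] = {
                capability.get("type"): capability.get("query_uri")
                for capability in version.findall("das:CAPABILITY", NS)
            }
    return result


capabilities = list_capabilities(SOURCES_XML)
```

The relative `query_uri` values are meant to be resolved against the document's `xml:base`.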

<?xml version="1.0" encoding="UTF-8"?>
<SEGMENTS xmlns="http://biodas.org/documents/das2" xml:base="http://rubidio.ifom-ieo-campus.it:8080/das2/genome/M_musculus_Jul_2007/" uri="http://rubidio.ifom-ieo-campus.it:8080/das2/genome/M_musculus_Jul_2007/segments" >
  <SEGMENT uri="chr1" title="chr1" length="197195432" />
  <SEGMENT uri="chr2" title="chr2" length="181748087" />
  <SEGMENT uri="chr3" title="chr3" length="159599783" />
  <SEGMENT uri="chr4" title="chr4" length="155630120" />
  <SEGMENT uri="chr5" title="chr5" length="152537259" />
  <SEGMENT uri="chr6" title="chr6" length="149517037" />
  <SEGMENT uri="chr7" title="chr7" length="152524553" />
  <SEGMENT uri="chr8" title="chr8" length="131738871" />
  <SEGMENT uri="chr9" title="chr9" length="124076172" />
  <SEGMENT uri="chr10" title="chr10" length="129993255" />
  <SEGMENT uri="chr11" title="chr11" length="121843856" />
  <SEGMENT uri="chr12" title="chr12" length="121257530" />
  <SEGMENT uri="chr13" title="chr13" length="120284312" />
  <SEGMENT uri="chr14" title="chr14" length="125194864" />
  <SEGMENT uri="chr15" title="chr15" length="103494974" />
  <SEGMENT uri="chr16" title="chr16" length="98319150" />
  <SEGMENT uri="chr17" title="chr17" length="95272651" />
  <SEGMENT uri="chr18" title="chr18" length="90772031" />
  <SEGMENT uri="chr19" title="chr19" length="61342430" />
  <SEGMENT uri="chrX" title="chrX" length="166650296" />
  <SEGMENT uri="chrY" title="chrY" length="15902555" />
  <SEGMENT uri="chrM" title="chrM" length="16299" />
  <SEGMENT uri="chr1_random" title="chr1_random" length="1231697" />
  <SEGMENT uri="chr3_random" title="chr3_random" length="41899" />
  <SEGMENT uri="chr4_random" title="chr4_random" length="160594" />
  <SEGMENT uri="chr5_random" title="chr5_random" length="357350" />
  <SEGMENT uri="chr7_random" title="chr7_random" length="362490" />
  <SEGMENT uri="chr8_random" title="chr8_random" length="849593" />
  <SEGMENT uri="chr9_random" title="chr9_random" length="449403" />
  <SEGMENT uri="chr13_random" title="chr13_random" length="400311" />
  <SEGMENT uri="chr16_random" title="chr16_random" length="3994" />
  <SEGMENT uri="chr17_random" title="chr17_random" length="628739" />
  <SEGMENT uri="chrUn_random" title="chrUn_random" length="5900358" />
  <SEGMENT uri="chrX_random" title="chrX_random" length="1785075" />
  <SEGMENT uri="chrY_random" title="chrY_random" length="58682461" />
</SEGMENTS>

http://bioserver.iit.ieo.eu/genopub/genome/M_musculus_Jul_2007/segments

<?xml version="1.0" encoding="UTF-8"?>
<TYPES xmlns="http://biodas.org/documents/das2" xml:base="http://localhost:8080/genopub/genome/M_musculus_Jul_2007/" >
  <TYPE uri="EML1/PU1_ChIP/Input" title="EML1/PU1_ChIP/Input" >
    <FORMAT name="useq" />
    <PROP key="Normalization" value="N" />
    <PROP key="group" value="Alcalay" />
    <PROP key="group_contact" value="Myriam Alcalay" />
    <PROP key="group_email" value="myriam.alcalay@ifom-ieo-campus.it" />
    <PROP key="name" value="Input" />
    <PROP key="owner" value="Alcalay, Myriam" />
    <PROP key="owner_email" value="IEO" />
    <PROP key="owner_institute" value="myriam.alcalay@ifom-ieo-campus.it" />
    <PROP key="url" value="http://localhost:8080/genopub/genopub?idAnnotation=11" />
    <PROP key="visibility" value="Members" />
  </TYPE>
  <TYPE uri="EML1/PU1_ChIP/PU1_A3" title="EML1/PU1_ChIP/PU1_A3" >
    <FORMAT name="useq" />
    <PROP key="Normalization" value="N" />
    <PROP key="group" value="Alcalay" />
    <PROP key="group_contact" value="Myriam Alcalay" />
    <PROP key="group_email" value="myriam.alcalay@ifom-ieo-campus.it" />
    <PROP key="name" value="PU1_A3" />
    <PROP key="owner" value="Alcalay, Myriam" />
    <PROP key="owner_email" value="IEO" />
    <PROP key="owner_institute" value="myriam.alcalay@ifom-ieo-campus.it" />
    <PROP key="url" value="http://localhost:8080/genopub/genopub?idAnnotation=7" />
    <PROP key="visibility" value="Members" />
  </TYPE>
</TYPES>

http://bioserver.iit.ieo.eu/genopub/genome/M_musculus_Jul_2007/types
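
A minimal sketch of how a client might read such a types response, using only the Python standard library. The XML literal is an abbreviated copy of the response above; the function name `parse_types` and the dictionary layout are illustrative choices, not part of DAS/2 or GenoPub.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the response above.
NS = {"das2": "http://biodas.org/documents/das2"}

# Abbreviated copy of the types response (only a few PROP elements kept).
TYPES_XML = """<TYPES xmlns="http://biodas.org/documents/das2">
  <TYPE uri="EML1/PU1_ChIP/Input" title="EML1/PU1_ChIP/Input">
    <FORMAT name="useq" />
    <PROP key="name" value="Input" />
    <PROP key="visibility" value="Members" />
  </TYPE>
  <TYPE uri="EML1/PU1_ChIP/PU1_A3" title="EML1/PU1_ChIP/PU1_A3">
    <FORMAT name="useq" />
    <PROP key="name" value="PU1_A3" />
    <PROP key="visibility" value="Members" />
  </TYPE>
</TYPES>"""


def parse_types(xml_text):
    """Return one dict per TYPE element: uri, offered formats, PROP key/values."""
    root = ET.fromstring(xml_text)
    tracks = []
    for type_el in root.findall("das2:TYPE", NS):
        tracks.append({
            "uri": type_el.get("uri"),
            "formats": [f.get("name") for f in type_el.findall("das2:FORMAT", NS)],
            "props": {p.get("key"): p.get("value")
                      for p in type_el.findall("das2:PROP", NS)},
        })
    return tracks


tracks = parse_types(TYPES_XML)
for track in tracks:
    print(track["uri"], track["formats"], track["props"]["visibility"])
```

Each TYPE corresponds to one annotation track; the PROP elements carry the descriptive metadata (group, owner, visibility) that the server attaches to it.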

http://localhost:8080/genopub/genome/M_musculus_Jul_2007/features?segment=http%3A%2F%2Flocalhost%3A8080%2Fgenopub%2Fgenome%2FM_musculus_Jul_2007%2Fchr1;overlaps=79374747%3A81152999;type=http%3A%2F%2Flocalhost%3A8080%2Fgenopub%2Fgenome%2FM_musculus_Jul_2007%2FEML1%2FPU1_ChIP%2FPU1_B2;format=useq

Returns a file in useq format, essentially a zip file; this is the preferred format in IGB.

Contains an archiveReadMe.txt and one or more “slice files”.

Observations can be textual or numerical.

http://useq.sourceforge.net/useqArchiveFormat.html

Features
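
The features request above can be assembled programmatically. The sketch below rebuilds exactly that URL with `urllib.parse.quote`; the helper name `das2_features_url` and its parameter names are hypothetical, and the layout (filters joined by `;`, values percent-encoded) is simply read off the example URL.

```python
from urllib.parse import quote


def encode(value):
    """Percent-encode a filter value, including ':' and '/'."""
    return quote(value, safe="")


def das2_features_url(base, segment, start, end, type_path, fmt):
    """Build a features query like the one shown above.

    base      -- genome version URL ending in '/'
    segment   -- segment name, e.g. 'chr1'
    start,end -- region passed to the 'overlaps' filter
    type_path -- annotation type path relative to base
    fmt       -- requested response format, e.g. 'useq'
    """
    filters = [
        "segment=" + encode(base + segment),
        "overlaps=" + encode(f"{start}:{end}"),
        "type=" + encode(base + type_path),
        "format=" + fmt,
    ]
    return base + "features?" + ";".join(filters)


url = das2_features_url(
    "http://localhost:8080/genopub/genome/M_musculus_Jul_2007/",
    "chr1", 79374747, 81152999,
    "EML1/PU1_ChIP/PU1_B2", "useq",
)
print(url)
```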

A BED file (.bed) is a tab‐delimited text file that defines a feature track. File extension .bed is recommended. The BED file format is described on the UCSC Genome Bioinformatics web site: http://genome.ucsc.edu/FAQ/FAQformat. Tracks in the UCSC Genome Browser (http://genome.ucsc.edu/) can be downloaded to BED files and loaded into IGB/IGV.

Notes: Zero-based index: Start and end positions are identified using a zero-based index. The end position is excluded. For example, setting start-end to 1-2 describes exactly one base, the second base in the sequence (ACGT).

track name=pairedReads description="Clone Paired Reads"
Chr22 1000 5000 cloneA
Chr22 2000 6000 cloneB

Other important file formats: BED (textual)
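
A short sketch of reading the BED example above and of the zero-based, end-exclusive convention. The function name `parse_bed` is illustrative, and only the first four columns are handled. Python slices follow the same convention, so `"ACGT"[1:2]` picks out the second base.

```python
def parse_bed(text):
    """Parse BED lines into (chrom, start, end, name) tuples.

    Coordinates are kept as given: zero-based start, excluded end.
    Track and browser lines and comments are skipped.
    """
    features = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        features.append((chrom, start, end, name))
    return features


BED_TEXT = """track name=pairedReads description="Clone Paired Reads"
Chr22 1000 5000 cloneA
Chr22 2000 6000 cloneB
"""

features = parse_bed(BED_TEXT)
# With an excluded end, the feature length is simply end - start.
lengths = [end - start for _chrom, start, end, _name in features]
# start=1, end=2 describes exactly one base: the second one.
second_base = "ACGT"[1:2]
print(features, lengths, second_base)
```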

The bedGraph format is line-oriented. bedGraph data are preceded by a track definition line, which adds a number of options for controlling the default display of this track. The track type is REQUIRED, and must be “bedGraph”.

bedGraph track data values can be integer or real, positive or negative values. Chromosome positions are specified as 0-relative. The first chromosome position is 0. The last position in a chromosome of length N would be N - 1. Only positions specified have data. Positions not specified do not have data and will not be graphed. All positions specified in the input data must be in numerical order. The bedGraph format has four columns of data:

track type=bedGraph name="BedGraph Format"
chr19 49302000 49302300 10
chr19 49302300 49302600 20
chr19 49302600 49302900 25

Intervals can be of any length and overlapping.

Other important file formats: bedGraph (numerical)
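
As an illustration of the four-column layout, the sketch below parses the bedGraph example, checks that positions are in numerical order, and computes a length-weighted mean of the signal. The helper names are hypothetical; the weighted mean is only one example of what can be done with the parsed rows.

```python
def parse_bedgraph(text):
    """Return (chrom, start, end, value) rows; the track line is skipped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("track"):
            continue
        chrom, start, end, value = line.split()
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


def is_sorted(rows):
    """True if start positions never decrease within each chromosome."""
    last_start = {}
    for chrom, start, _end, _value in rows:
        if start < last_start.get(chrom, 0):
            return False
        last_start[chrom] = start
    return True


def weighted_mean(rows):
    """Mean value over all covered bases (each interval weighted by its length)."""
    covered = sum(end - start for _c, start, end, _v in rows)
    total = sum((end - start) * value for _c, start, end, value in rows)
    return total / covered


BEDGRAPH_TEXT = """track type=bedGraph name="BedGraph Format"
chr19 49302000 49302300 10
chr19 49302300 49302600 20
chr19 49302600 49302900 25
"""

rows = parse_bedgraph(BEDGRAPH_TEXT)
print(is_sorted(rows), weighted_mean(rows))
```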

The wiggle (WIG) format is for display of dense, continuous data such as GC percent, probability scores, and transcriptome data. Wiggle data elements must be equally sized. If you need to display continuous data that is sparse or contains elements of varying size, use the bedGraph format instead. If you have a very large data set and you would like to keep it on your own server, you should use the bigWig data format. Chromosome positions are specified as 1-relative.

variableStep is for data with irregular intervals between new data points and is the more commonly used wiggle format. It begins with a declaration line and is followed by two columns containing chromosome positions and data values:

variableStep chrom=chrN [span=windowSize]
StartA dataValueA
StartB dataValueB

For example:

variableStep chrom=chr2
300701 12.5
300702 12.5
300703 12.5
300704 12.5
300705 12.5

is equivalent to:

variableStep chrom=chr2 span=5
300701 12.5

Both versions display a value of 12.5 at positions 300701-300705 on chromosome 2.

Other important file formats: Wig (“wiggle”)
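
The equivalence of the two variableStep versions can be checked with a small parser. This is a sketch under the assumption that only `chrom` and `span` appear on the declaration line; `parse_variable_step` and `per_base` are illustrative names.

```python
def parse_variable_step(text):
    """Return (chrom, first, last, value) records with 1-based inclusive positions."""
    chrom, span = None, 1
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "variableStep":
            # Declaration line: key=value options, span defaults to 1.
            options = dict(f.split("=", 1) for f in fields[1:])
            chrom = options["chrom"]
            span = int(options.get("span", 1))
        else:
            start, value = int(fields[0]), float(fields[1])
            records.append((chrom, start, start + span - 1, value))
    return records


def per_base(records):
    """Expand records into a {(chrom, position): value} mapping."""
    return {(chrom, pos): value
            for chrom, first, last, value in records
            for pos in range(first, last + 1)}


LONG_FORM = """variableStep chrom=chr2
300701 12.5
300702 12.5
300703 12.5
300704 12.5
300705 12.5
"""

SHORT_FORM = """variableStep chrom=chr2 span=5
300701 12.5
"""

same = per_base(parse_variable_step(LONG_FORM)) == per_base(parse_variable_step(SHORT_FORM))
print(same)
```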

fixedStep is for data with regular intervals between new data values and is the more compact wiggle format. It begins with a declaration line and is followed by a single column of data values.

The declaration line starts with the word fixedStep and includes specifications for chromosome, start coordinate, and step size. The span specification has the same meaning as in variableStep format. For example, this fixedStep specification:

fixedStep chrom=chr3 start=400601 step=100 span=5
11
22
33

displays the values 11, 22, and 33 as 5-base regions on chromosome 3 at positions 400601-400605, 400701-400705, and 400801-400805, respectively. Step and span are fixed for the entire data set.

Other important file formats: Wig (“wiggle”)
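
A sketch of how a fixedStep block expands into regions. It shows the effect of span: without it each value covers a single base, with span=5 each value covers five bases starting at the stepped position. The function name `parse_fixed_step` is illustrative, and `step` and `span` are assumed to default to 1 when omitted.

```python
def parse_fixed_step(text):
    """Return (chrom, first, last, value) records with 1-based inclusive positions."""
    chrom, pos, step, span = None, None, 1, 1
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "fixedStep":
            options = dict(f.split("=", 1) for f in fields[1:])
            chrom = options["chrom"]
            pos = int(options["start"])
            step = int(options.get("step", 1))
            span = int(options.get("span", 1))
        else:
            # One value per line; the position advances by 'step' each time.
            records.append((chrom, pos, pos + span - 1, float(fields[0])))
            pos += step
    return records


WITH_SPAN = """fixedStep chrom=chr3 start=400601 step=100 span=5
11
22
33
"""

WITHOUT_SPAN = """fixedStep chrom=chr3 start=400601 step=100
11
22
33
"""

five_base = parse_fixed_step(WITH_SPAN)
single_base = parse_fixed_step(WITHOUT_SPAN)
print(five_base)
print(single_base)
```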

Data transfer level: random access is required. At the lowest layer, we take advantage of the byte-range protocols of HTTP and HTTPS, and the protocols associated with resuming interrupted FTP transfers, to achieve random access to binary files over the web.

URL data cache layer: a cache layer on top of the data transfer layer. Data are fetched in blocks of 8 Kb, and each block is kept in a cache.

Indexing: based on a single dimensional version of the R tree that is commonly used for indexing geographical data. The index size is typically less than 1% of the size of the data itself. Because the stored data are sorted by chromosome and start position, not every item in the file must be indexed; in fact by default only every 512th item is indexed.

Compression: regions between indexed items (containing 512 items by default) are individually compressed (gzip).

BigWig and BigBed
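
The cache layer can be pictured with the following sketch. It is not the bigWig/bigBed library code: the class `BlockCache` and the callable `fetch_range` are hypothetical, 8 Kb is taken to mean 8192 bytes (an assumption), and an in-memory byte string stands in for the remote file. In real use `fetch_range` would issue an HTTP request with a `Range: bytes=first-last` header.

```python
BLOCK_SIZE = 8 * 1024  # assumed meaning of "blocks of 8 Kb"


class BlockCache:
    """Serve arbitrary byte ranges from fixed-size cached blocks."""

    def __init__(self, fetch_range, block_size=BLOCK_SIZE):
        self.fetch_range = fetch_range  # callable(start, end_exclusive) -> bytes
        self.block_size = block_size
        self.blocks = {}                # block index -> bytes
        self.fetches = 0                # number of transfers actually made

    def _block(self, index):
        if index not in self.blocks:
            start = index * self.block_size
            self.blocks[index] = self.fetch_range(start, start + self.block_size)
            self.fetches += 1
        return self.blocks[index]

    def read(self, offset, length):
        """Return 'length' bytes starting at 'offset', fetching only missing blocks."""
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        skip = offset - first * self.block_size
        return data[skip:skip + length]


# Stand-in for a remote binary file (25600 bytes).
REMOTE = bytes(range(256)) * 100


def fake_fetch(start, end):
    return REMOTE[start:end]


cache = BlockCache(fake_fetch)
chunk = cache.read(8000, 500)           # crosses the boundary between blocks 0 and 1
fetches_after_first = cache.fetches
again = cache.read(8100, 50)            # already cached, no new transfer
fetches_after_second = cache.fetches
print(fetches_after_first, fetches_after_second)
```

Because a genome browser tends to request neighbouring regions repeatedly while the user scrolls and zooms, most reads after the first are served from blocks that are already in the cache.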

Basic architecture

Object relational mapping via Hibernate

Flex

Apache Tomcat 6

Glassfish

MySQL

DAS/2 server reference implementation: http://sourceforge.net/projects/genoviz

Database tables

User table

Annotation table

User role table

Message Digest 5 (MD5) hashing (java.security package)

Table views

Each file gets its own folder (folder names are assigned automatically). File names, which may contain unsupported characters, do not need to be stored in the DB.

Data storage directory

Visibility levels:

DAS2 administration user interface

To access data with restricted visibility, you must be entered in the user table and be part of a group that is headed by the owner of the data.

Users and groups setup
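
The access rule can be written down as a small predicate. This is only a sketch of the rule as stated on the slide, not the server's implementation: the class and function names are hypothetical, "Members" is the visibility value seen in the types response, and both the "Public" level and the shortcut that owners can read their own data are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    head: str                                   # user who heads the group
    members: set = field(default_factory=set)


@dataclass
class Annotation:
    name: str
    owner: str
    visibility: str                             # "Members"; "Public" is assumed


def can_access(user, annotation, user_table, groups):
    """Restricted data need a registered user in a group headed by the data owner."""
    if annotation.visibility == "Public":       # assumed unrestricted level
        return True
    if user not in user_table:                  # must be in the user table
        return False
    if user == annotation.owner:                # assumption: owners see their own data
        return True
    return any(g.head == annotation.owner and user in g.members for g in groups)


# Hypothetical users and groups for illustration.
user_table = {"owner_a", "member_b", "outsider_c"}
groups = [Group(head="owner_a", members={"member_b"})]
restricted = Annotation(name="PU1_A3", owner="owner_a", visibility="Members")
public = Annotation(name="RefSeq", owner="owner_a", visibility="Public")

print(can_access("member_b", restricted, user_table, groups))
print(can_access("outsider_c", restricted, user_table, groups))
```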

Every user, admin or non-admin, can change their password, load new data, add data descriptions, and set visibility levels.

Non‐administrator users interface

IGB user identification

jdbcRealm

ldapRealm

Both work

NetAffx and UCSC hg19 annotations

All these annotations are one click away from the user

Conclusions

DAS2 servers provide distributed genome annotations

Support fine grained security model

Perform parsing of data for custom genome views

List of Genome Browsers

Alamut
Annmap
Apollo Genome Annotation Curation Tool
Argo Genome Browser
Artemis Genome Browser
Avadis NGS
BugView
Celera Genome Browser
Dalliance Javascript-based genome browser
DiProGB
DNAnexus Flash-based interactive genome browser
Ensembl The Ensembl Genome Browser
Gaggle Genome Browser
GBrowse
Genome Wowser
The Genomic HyperBrowser
Integrative Genomics Viewer
Genostar GenoBrowser
Genoverse interactive genome browser
GenPlay
Golden Helix GenomeBrowse
Integrated Genome Browser
Integrated Microbial Genomes
JBrowse
MGV - Microbial Genome Viewer
MochiView Genome Browser
NextBio Genome Browser
Pathway Tools Genome Browser
Savant Genome Browser
SEED viewer
UCSC Genome Bioinformatics Genome Browser
Viral Genome Organizer (VGO)
VISTA genome browser
WashU Genome Browser

Integrated Genome Browser: reference implementation of a DAS/2 client

IGB: Integrated Genome Browser (http://www.bioviz.org/igb/)

The Integrated Genome Browser (IGB, pronounced Ig-Bee) is an interactive, zoomable, scrollable software program you can use to visualize and explore genome-scale data sets, such as tiling array data, next-generation sequencing results, genome annotations, microarray designs, and the sequence itself. IGB is implemented using the Java programming language and should run on any computer.

IGB is an open source, publicly-funded project, but it did not start out that way. Initial development of the software was largely funded by Affymetrix, Inc., which donated the IGB software to the community in 2005. Since then, community developers have continued to contribute their time and efforts to improving the software. In 2008, funding from the National Science Foundation allowed us to speed up the pace of development.

IGB interacts with DAS (Distributed Annotation System) servers

The Distributed Annotation System (DAS, http://www.biodas.org/wiki/Main_Page) defines a communication protocol used to exchange annotations on genomic or protein sequences. It is motivated by the idea that such annotations should not be provided by single centralized databases, but should instead be spread over multiple sites.

DAS/2 was built to address the needs of distributing massive genomic data sets derived from high density microarray applications and Next (and Next Next) Generation Sequencing. Unlike DAS/1, DAS/2 does not require data exchange through text based XML but allows for data distribution using any text or binary format.

Genometry model

Central concept: SeqSymmetry: breadth (SeqSpans) and depth (hierarchy, parents, children)

Hierarchical annotations

URL: http://www.bioviz.org/igb/download.shtml

How to launch IGB

RefSeq and cytoband annotations are loaded automatically from the NetAffx DAS2 server

IGB after startup

Data access tab

Search tab

Selection info tab

Sliced view to interrogate alternative splice variants, ORF analysis.

Sliced view tab

Graph adjuster tab

External view tab

To load data: click the desired data set, choose the region in view or the whole chromosome, then click refresh data.

Data access tab

Load Affy probesets in View

NetAffx and UCSC mm8 annotations

NetAffx and UCSC mm9 annotations

ChIP-chip, ChIP-seq, Exon array, DNAseI, RNA-seq, ChIP-PET, RNA-PET, Methyl-seq, CAGE tags ... Km of data

new server

NetAffx and UCSC hg18 annotations

Data sources: Quickload, DAS, DAS2

Server registration (data source) tab

1. Single files

File type: extension
BAM: .bam
BED: .bed
Binary: .bps, .bgn, .brs, .bsnp, .brpt, .bnib, .bp1, .bp2, .ead, .useq
GFF: .gff, .gtf, .gff3
FASTA: .fa, .fasta, .fas
PSL: .psl, .psl3
DAS: .das, .dasxml, .das2xml
Graph: .gr, .bgr, .sgr, .bar, .chp, .wig
Scored Interval: .sin, .egr, .egr.txt
Copy number: .cnt
Copy number chp: .cnchp, .lohchp
Genomic variation (Toronto DB): .var
Region (genotype console segmenter): SegmenterRptParser.CN_REGION_FILE_EXT, SegmenterRptParser.LOH_REGION_FILE_EXT
FishClones: .fsh, FishClonesParser.FILE_EXT
Scored map: .map

2. Quickload (local directory with auxiliary files)

Easy to set up, but data can only be loaded for the entire genome.

(example: http://www.bioviz.org/quickload/)

Four types of data sources (files)

3. DAS(1) (example: UCSC; software: http://code.google.com/p/mydas/)

Can load data into view of interest. The response is XML (problematic for large datasets).

4. DAS2 (example: NetAffx; software: http://genoviz.sourceforge.net/)

Unlike DAS1, DAS2 does not require data exchange through text based XML but allows for data distribution using any text or binary format. The two versions are not natively compatible.

Can load data into view of interest in a range of different formats.

Four types of data sources (servers)

Loading BAM files from http listing (no need to move them)

1. Graph based annotations (.gr, .bgr, .sgr, .bar, .chp, .wig, .sin, .egr, .egr.txt)

2. Text based annotations (e.g. .bed, .bam, .psl, .gff, .fasta files)

Permit different types of operations

Two basic types of annotation

Logical: intersect, union, A not B, B not A, Xor, Not

antisense transcription

all transcribed regions

Select tracks, right‐click to access context‐menu

Operations on text based annotations
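
The logical operations can be illustrated on plain interval lists. The sketch below is not IGB code: it assumes each track is a list of half-open (start, end) intervals on a single chromosome, and the function names are illustrative. "Not" (the complement) would additionally need the chromosome length and is left out.

```python
def merge(intervals):
    """Sort intervals and fuse those that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def union(a, b):
    return merge(list(a) + list(b))


def intersect(a, b):
    """Regions covered by both tracks (two-pointer sweep over merged lists)."""
    a, b = merge(a), merge(b)
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def a_not_b(a, b):
    """Regions covered by track a but not by track b."""
    b = merge(b)
    out = []
    for start, end in merge(a):
        pos = start
        for b_start, b_end in b:
            if b_end <= pos or b_start >= end:
                continue
            if b_start > pos:
                out.append((pos, b_start))
            pos = max(pos, b_end)
        if pos < end:
            out.append((pos, end))
    return out


def xor(a, b):
    """Regions covered by exactly one of the two tracks."""
    return merge(a_not_b(a, b) + a_not_b(b, a))


track_a = [(1000, 5000)]
track_b = [(2000, 6000)]
print(intersect(track_a, track_b), union(track_a, track_b))
print(a_not_b(track_a, track_b), a_not_b(track_b, track_a), xor(track_a, track_b))
```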

Scale: filter displayed values by value or by percentile

Height: adjust display height

Style: bar, line, dot, min/max/avg, heatmap, stairstep, color

Transform: log10, log2, loge, and inverses thereof

Join/split: display all graphs as one

Arithmetic (requires identical X-values): sum, difference, product, division

Thresholding: transforms regions meeting a given criterion into a text-based annotation (which can then be used in logical operations)

Operations on graph based annotations
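
Three of these operations, sketched on a graph stored as a list of (position, value) points. This is an illustration, not IGB's implementation: the function names are hypothetical, and the thresholding rule used here (a run ends at the first point below the threshold or at a gap between positions) is an assumption about one reasonable criterion.

```python
import math


def log2_transform(graph):
    """Apply log2 to every value; non-positive values are dropped."""
    return [(pos, math.log2(value)) for pos, value in graph if value > 0]


def add_graphs(a, b):
    """Point-wise sum; both graphs must have identical X-values."""
    if [pos for pos, _ in a] != [pos for pos, _ in b]:
        raise ValueError("graphs must share identical X-values")
    return [(pos, va + vb) for (pos, va), (_, vb) in zip(a, b)]


def threshold_to_intervals(graph, min_value, max_gap=1):
    """Turn runs of points with value >= min_value into half-open intervals."""
    intervals = []
    current = None  # [start, end) of the run being built
    for pos, value in sorted(graph):
        passes = value >= min_value
        # Close the current run at a failing point or at a positional gap.
        if current is not None and (not passes or pos - current[1] >= max_gap):
            intervals.append(tuple(current))
            current = None
        if passes:
            if current is None:
                current = [pos, pos + 1]
            else:
                current[1] = pos + 1
    if current is not None:
        intervals.append(tuple(current))
    return intervals


graph = [(100, 1), (101, 8), (102, 16), (103, 2), (104, 32), (110, 64)]
regions = threshold_to_intervals(graph, min_value=8)
logged = log2_transform(graph)
doubled = add_graphs(graph, graph)
print(regions)
```

The intervals returned by the thresholding step have the same shape as a text-based annotation, so they can be fed into interval operations such as intersect or union.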

Plugins

Based on Open Services Gateway initiative (OSGI)

Implement Activator interface

Needed to display plugin in tab

Access tracks from Genometry model

Can perform arbitrary manipulations on tracks

Eμ-myc mouse model, Amati/Faga

Example: myc bound and differentially regulated gene

External view

Molecular Interaction plugin

Molecular Interaction plugin: visualize molecular interactions

Molecular Interaction plugin: visualize interactions with small molecules (drugs)

Our plugin repository (by Arnaud Ceol): http://cru.genomics.iit.it/igb/plugins‐test/

Highly interactive

Excellent logarithmic zooming around hairline

Integrated with UCSC/campus browser

Can do logical/arithmetic operations on annotations

Can create custom annotations on the fly

Can incorporate distributed annotations

Easily customizable display options

Open‐source: new features can be added according to our needs

IGB summary

Acknowledgements

Arnaud Ceol, Luca Zammataro, Jole Costanza, Anna Biressi, Sofia Cappellari, Bruno Amati

Francesco Venco, Yuriy Vaskin, Marco Masseroli, Stefano Ceri et al.

Pier Giuseppe Pelicci, Domenico Triarico, Roberta Carbone, Annalisa Ariesi, Daniela Rossi