CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline -A Part of the CyVerse Data Commons Effort

Upload: shavonne-gibson

Post on 18-Jan-2018


DESCRIPTION

Scientific Project Management in the CyVerse Data Commons
Support collaboration, reproducibility, data publication, discovery, and reuse

Projects Interface
– Planned 4th DE tab
– View/Edit custom or standardized metadata
– Track analysis history
– Enhanced tools to share data and analyses with collaborators over a project lifecycle

Staging Area
– Prepare data/metadata for publication
– Request permanent identifiers
– Submit to the CyVerse Data Commons Repository
– Submit to canonical repositories

Data Commons Repository
– Within the CyVerse Data Store
– Public, static, searchable, discoverable
– Disseminate published data with permanent identifiers
– Browse, search, discover, and reuse datasets

Data Commons services are in development

• Slide is meant to provide ‘big picture’ of services and value. Not all services are iDC-centric. Not all planned services are represented (searching within Mirrors or custom Mirrors UIs, for example). May not be familiar with Mirrors. Slide heading and box headings are take-homes.
• iDCr data will be significantly more discoverable and useful to users after metadata curation, not languishing in canonical repositories
– Will enable enhanced public dissemination, discoverability, and reuse of data
– Will be able to go from discovery in iDCr to direct reuse in iPlant without having to download in between

• Projects Interface: development started Q4 2015
• Bulk Metadata Association: Q4 2015 release
• Mirrors 1.0: in QA-testing for Q4 2015 release
• DOI Pipeline: can issue DOIs, release pending landing pages, next step = automation
• Mirrors 2.0: landing pages for datasets with DOIs, custom UIs, in development
• NCBI SRA Submission: pipeline released Q3 2015
• NCBI Genome Assembly Submission Pipeline: started Q4 2015

TRANSCRIPT

Page 1: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

-A Part of the CyVerse Data Commons Effort

Page 2: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

Scientific Project Management in the CyVerse Data Commons

Support collaboration, reproducibility, data publication, discovery, and reuse

Projects Interface
– Planned 4th DE tab
– View/Edit custom or standardized metadata
– Track analysis history
– Enhanced tools to share data and analyses with collaborators over a project lifecycle

Staging Area
– Prepare data/metadata for publication
– Request permanent identifiers
– Submit to the CyVerse Data Commons Repository
– Submit to canonical repositories

Data Commons Repository
– Within the CyVerse Data Store
– Public, static, searchable, discoverable
– Disseminate published data with permanent identifiers
– Browse, search, discover, and reuse datasets

Data Commons services are in development

Page 3: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

SRA Submission Overview

• SRA Home Page – safe to assume familiarity?

• SRA Submission Quick Start Guide
– Create BioProject and BioSample(s)
– SRA Submissions = compressed seq files and metadata for ‘Experiments’ and ‘Runs’ associated with BioProject and BioSample(s)
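The relationship between these pieces can be pictured as a nested structure. The following is a minimal sketch, not the SRA schema: the class and field names are illustrative assumptions, and the strict nesting is a simplification of how Experiments and Runs are associated with a BioProject and its BioSample(s).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    """One sequencing run and the compressed sequence files it produced."""
    name: str
    files: List[str] = field(default_factory=list)


@dataclass
class Experiment:
    """One sequencing library (hypothetical fields, not SRA's)."""
    title: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    """The biological material that was sequenced."""
    name: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class BioProject:
    """Top-level container grouping the samples of one study."""
    title: str
    samples: List[BioSample] = field(default_factory=list)

    def all_files(self) -> List[str]:
        """Collect every sequence file referenced anywhere in the project."""
        return [
            f
            for sample in self.samples
            for experiment in sample.experiments
            for run in experiment.runs
            for f in run.files
        ]


# Illustrative data only; the names below are made up for the example.
project = BioProject(
    title="Example project",
    samples=[
        BioSample(
            name="sample_1",
            experiments=[
                Experiment(
                    title="library_1",
                    runs=[Run(name="run_1",
                              files=["reads_1.fastq.gz", "reads_2.fastq.gz"])],
                )
            ],
        )
    ],
)
print(project.all_files())
```

Walking the structure from the top, as `all_files` does, is one way to check that every file in a package is tied to a sample before anything is submitted.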

Page 4: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

Submitting to the NCBI Sequence Read Archive can be Time Consuming

• Submission Package = compressed seq files and metadata for associated BioProject and BioSample(s) and sequencing libraries

• Pain Points:
– Independent BioProject and BioSample creation
– Data compression
– Checksum generation
– Copy/paste errors
– Finding the correct metadata templates and formats
– Slow and/or interrupted uploads
– Error correction

• Worked with SRA to create interoperable submission workflow in the Discovery Environment
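To make the compression and checksum pain points concrete, here is a rough sketch of what a submitter would otherwise script by hand. It assumes uncompressed files ending in `.fastq`, gzip compression, and MD5 checksums; the function names are hypothetical and this is not the CyVerse implementation.

```python
import gzip
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def compress_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return its path."""
    dest = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def md5_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prepare_directory(directory: Path) -> Dict[str, str]:
    """Compress every *.fastq file in a directory and checksum the results.

    Returns a mapping of compressed file name to MD5 hex digest.
    """
    checksums = {}
    for src in sorted(directory.glob("*.fastq")):
        compressed = compress_file(src)
        checksums[compressed.name] = md5_checksum(compressed)
    return checksums
```

Calling `prepare_directory(Path("my_reads"))` on a hypothetical folder of reads would return one checksum per compressed file. Copying those values into a metadata template by hand is where copy/paste errors tend to creep in.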

Page 5: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

CyVerse Users Asked for Help

• Explored browser-based and bulk submissions

• Developed pipeline in collaboration with SRA staff

• Set up Aspera Connect on CyVerse systems and linked to SRA test servers

• Built submission pipeline in the Discovery Environment

Page 6: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

CyVerse-Enabled SRA Submissions Remove Roadblocks

1. Upload data into the CyVerse Discovery Environment
– Efficient tools already in place within the DE
– Batch compress data (if not already compressed before upload)

2. Create and organize submission package
– Eliminate need for independent BioProject and BioSample creation and checksums

3. Enter metadata from templates and save package metadata to file
– Eliminate metadata burden with dropdown menus, instructions, and metadata copying
– CyVerse system generates single metadata XML file for submission

4. Submit package with metadata with the CyVerse SRA submission App
– Eliminate data transfer problems with CyVerse transfers via Aspera Connect

5. Receive submission notification from SRA
– SRA validates submissions and communicates to users as usual; CyVerse is not ‘in the way’

6. If needed, correct package errors and resubmit
– Correct errors and resubmit from CyVerse, without having to recreate submission package
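As a rough illustration of step 3, the sketch below gathers package metadata into a single XML document. The element and attribute names (`Submission`, `BioProject`, `BioSample`, `File`, `md5`) are placeholders chosen for this example; they are not the actual SRA submission schema or the file the CyVerse system produces.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_metadata_xml(project_title: str, samples: List[Dict]) -> str:
    """Serialize package metadata into one XML string.

    Each sample is a dict with a "name" and a list of "files", where each
    file is a dict with "name" and "md5" keys (a hypothetical layout).
    """
    root = ET.Element("Submission")
    ET.SubElement(root, "BioProject", title=project_title)
    for sample in samples:
        sample_el = ET.SubElement(root, "BioSample", name=sample["name"])
        for f in sample["files"]:
            ET.SubElement(sample_el, "File", name=f["name"], md5=f["md5"])
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
xml_text = build_metadata_xml(
    "Example project",
    [{"name": "sample_1",
      "files": [{"name": "reads_1.fastq.gz", "md5": "0" * 32}]}],
)
print(xml_text)
```

Keeping all metadata in one file is what makes step 6 cheap: a rejected submission can be fixed by editing and regenerating this file rather than rebuilding the whole package.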

Page 7: CyVerse-enabled NCBI Sequence Read Archive (SRA) Submission Pipeline

CyVerse SRA Submission Workflow

• BioProject Creation App
– For creating a new BioProject with submission

• BioProject Update App
– For updating an existing BioProject for submission

• Submission Report Retrieval App

• Submission Tutorial
– Example Data in the DE at: /iplant/home/shared/iplantcollaborative/example_data/SRA_submission

Demo – SRA Submission Example Package
