Submitting Sequenced BACs Robert Buels SGN

Post on 18-Jan-2016

Submitting Sequenced BACs

Robert Buels, SGN

Purpose and Scope

Provide well-organized archive of sequencing data for the tomato genome sequencing project, including:

finished BAC sequences

chromatograms

assembly files

annotations

BAC Submission Guidelines

A comprehensive BAC submission format exists (see handout, pp. 4-5).

If we all follow this format, the data will be much easier to work with.

Problems

Not all submissions follow the submission guidelines.

Irregularities in naming, directory structure, and submission file contents.

FTP site does not separate finished and unfinished sequences.

Result: data is harder to work with than it should be!

Problems

To address these problems:

We improved the directory structure of the FTP site.

We corrected existing submissions to follow the submission guidelines.

We built a new submission system that checks each submission against the guidelines.

New FTP Structure

tomato_genome/bacs/
    chr01/
    chr02/
    ...
    chr12/
    bacs.seq                  (all seqs)
    finished_bacs.seq         (finished seqs)
    README
    validate_submission.pl    (validation script)
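As a rough illustration of how the two top-level sequence files might be used together, the Python sketch below lists BACs that appear in bacs.seq but not in finished_bacs.seq. It assumes both files are multi-record FASTA files with the BAC name as the first word of each header line, which the slides do not state explicitly. The helper names `fasta_ids` and `unfinished_bacs` are hypothetical.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first word of every FASTA header line ('>name ...')."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def unfinished_bacs(all_lines: Iterable[str], finished_lines: Iterable[str]) -> Set[str]:
    """BACs present in the 'all seqs' file but absent from the 'finished seqs' file."""
    return fasta_ids(all_lines) - fasta_ids(finished_lines)
```

With local copies of the two files, open file handles can be passed directly, for example `unfinished_bacs(open("bacs.seq"), open("finished_bacs.seq"))`.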

New FTP Structure

tomato_genome/bacs/
    chr01/
        unfinished/                   (finished and unfinished kept separate)
        finished/
            C01HBa0088L02.all.xml     (annotations)
            C01HBa0088L02.seq         (sequence)
            C01HBa0088L02.tar.gz      (submission)
            C01HBa0216G16.all.xml
            C01HBa0216G16.seq
            C01HBa0216G16.tar.gz
            ...
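The per-chromosome layout above can be sketched as a small path builder. This is only an illustration: it assumes that the two digits after the leading "C" in a BAC name give the chromosome directory (as C01HBa0088L02 under chr01/ suggests) and that unfinished/ holds the same three file types as finished/. Neither point is stated in the slides, and `bac_file_paths` is a hypothetical name.

```python
import re
from typing import Dict

# Assumed naming pattern: 'C' + two-digit chromosome number + rest of the clone name.
BAC_NAME = re.compile(r"^C(\d{2})\w+$")


def bac_file_paths(bac_name: str, finished: bool = True) -> Dict[str, str]:
    """Return the expected FTP paths for one BAC's annotation, sequence, and submission files."""
    match = BAC_NAME.match(bac_name)
    if match is None:
        raise ValueError(f"unrecognized BAC name: {bac_name!r}")
    status = "finished" if finished else "unfinished"
    base = f"tomato_genome/bacs/chr{match.group(1)}/{status}/{bac_name}"
    return {
        "annotations": base + ".all.xml",
        "sequence": base + ".seq",
        "submission": base + ".tar.gz",
    }
```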

New Submission System

Developed to automatically check submission formatting, analyze submissions, and publish them on the FTP site.

Also improves turn-around time since it is mostly automatic.

New Submission System

Checks submission guidelines:

C01HBa0001A01.tar.gz              (submission is tar.gz file, named for BAC)
    C01HBa0001A01/                (self-named subdirectory)
        chromat_dir/
        edit_dir/
        phd_dir/
        seq_dir/
        C01HBa0001A01.seq         (self-named sequence file)
            >C01HBa0001A01        (self-named sequence)
            ATGCCTACGAT...
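The four naming checks annotated above can be approximated in a few lines of Python. This is a minimal sketch, not the actual validate_submission.pl, whose real checks are defined by the submission guidelines in the handout. It assumes the sequence file sits directly inside the self-named subdirectory, and it does not check chromat_dir/, edit_dir/, phd_dir/, or seq_dir/. The function name `check_submission` is hypothetical.

```python
import os
import posixpath
import tarfile
from typing import List


def check_submission(tar_path: str) -> List[str]:
    """Return a list of problems; an empty list means the four naming checks passed."""
    filename = os.path.basename(tar_path)
    if not filename.endswith(".tar.gz"):
        return [f"{filename}: submission must be a .tar.gz file named for the BAC"]
    bac = filename[: -len(".tar.gz")]
    problems = []

    with tarfile.open(tar_path, "r:gz") as tar:
        # Normalize member names so './BAC/x' and 'BAC/x' compare equal.
        members = {posixpath.normpath(m.name): m for m in tar.getmembers()}
        top_level = {name.split("/", 1)[0] for name in members if name != "."}
        if top_level != {bac}:
            problems.append(f"expected a single top-level directory {bac}/")

        seq_name = f"{bac}/{bac}.seq"
        if seq_name not in members:
            problems.append(f"missing sequence file {seq_name}")
        else:
            handle = tar.extractfile(members[seq_name])
            first_line = handle.readline().decode("ascii", "replace") if handle else ""
            # The FASTA header's first word must be '>' followed by the BAC name.
            if first_line.split()[:1] != [">" + bac]:
                problems.append(f"sequence in {seq_name} is not named {bac}")
    return problems
```

Calling `check_submission("C01HBa0001A01.tar.gz")` on a correctly laid out tarball would return an empty list under these assumptions.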

validate_submission.pl

Checks the formatting of your submission.

Avoids having to retry uploads to correct formatting.

On your machine:

    ~$ ./validate_submission.pl C01HBa0001A01.tar.gz
    C01HBa0088L02.tar.gz passed.
    ~$

Submission Process

1. Use validate_submission.pl to verify that your submission is properly formatted.

2. Upload it with scp to upload.sgn.cornell.edu, using your login and password.

3. When the upload finishes, the submission cron job will notice the new file and process it.

4. If the submission is properly formatted, it will appear on the FTP site within 3-4 hours.

5. Otherwise, you will receive an email asking you to correct the format and retry the submission.
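Steps 1 and 2 can be scripted on the submitter's side. The sketch below only builds the two command lines and leaves running them to the caller. The remote directory is an assumption (the slides name only the host), and the helper names `validate_command` and `upload_command` are hypothetical.

```python
from typing import List

UPLOAD_HOST = "upload.sgn.cornell.edu"  # host named in step 2


def validate_command(tarball: str) -> List[str]:
    """Command line for step 1: run the validation script on the tarball."""
    return ["./validate_submission.pl", tarball]


def upload_command(tarball: str, login: str, remote_dir: str = "") -> List[str]:
    """Command line for step 2: copy the tarball to the upload host with scp.

    remote_dir is an assumption; an empty value means the login's home directory.
    """
    return ["scp", tarball, f"{login}@{UPLOAD_HOST}:{remote_dir}"]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Whether validate_submission.pl reports failure through its exit status is not stated in the slides, so the printed result should be checked before uploading.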

Questions?

Suggestions for improvement of FTP site?

Suggestions for submission process?