Friday Seminar 15 10 2004


Upload: pierre-marguerite

Post on 04-Jun-2015


Page 1: Friday Seminar 15 10 2004

Funded by:

Facilitating Standardization and Exchange of Array Design

ADF MAGE-ML Tool

Pierre Marguerite – Friday Seminar

EBI – Microarray Informatics Team

15 October 2004

Page 2: Friday Seminar 15 10 2004

15/10/2004 - Friday Seminar Pierre Marguerite

Microarray Informatics Team2

ADF MAGE-ML Tool

• Application
  – stand-alone
  – platform independent
• Supports:
  – Simple/complex microarray layouts
  – Different microarray applications
    • gene_expression
    • snp_detection
    • comparative_genomic_hybridization
    • binding_site_identification
    • Others (minimal)
• Respects good practices

Page 3: Friday Seminar 15 10 2004

conversion tool

Page 4: Friday Seminar 15 10 2004

MAGE-ML (MAGE-OM)

[Diagram labels: Description, Biosequence, Array, Array Design, DesignElement]

Page 5: Friday Seminar 15 10 2004

MAGE-ML (next)

Page 6: Friday Seminar 15 10 2004

ADF (previous)

Page 7: Friday Seminar 15 10 2004

Array Design File

adh

adr

adc

Header

contacts

Technical Information

Page 8: Friday Seminar 15 10 2004

Array Design File

adr

adc

Reporters

Features

Feature/Reporter

Page 9: Friday Seminar 15 10 2004

Array Design File

Composite

Characteristics

Map to reporters

Page 10: Friday Seminar 15 10 2004

ADF version differences

• 3 parts (files) instead of 1
• As workbook or text files
• No Reporter Identifier item
• No Reporter Group [role] item
• New Chromosome item
• New Chromosome_band item
• New Species item

Page 11: Friday Seminar 15 10 2004

• 2 mandatory steps:
  – Validation
  – Conversion

Page 12: Friday Seminar 15 10 2004

Validation

• File format validation
• File content validation
  – Validation of controlled vocabulary
    • MGED ontology terms
    • Approved databases (tags, accession numbers)
  – Automatic curation (when possible)
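Controlled-vocabulary validation with automatic curation could look roughly like the sketch below. It is only an illustration: the function name `curate_term`, the normalisation rule, and the use of the application types from page 2 as the vocabulary are assumptions, not the tool's actual behaviour.

```python
# Example vocabulary: the application types listed on page 2.
# The tool itself validates against the MGED Ontology.
VOCABULARY = {
    "gene_expression",
    "snp_detection",
    "comparative_genomic_hybridization",
    "binding_site_identification",
}


def curate_term(value, vocabulary=VOCABULARY):
    """Return (term, curated); raise ValueError if no term matches.

    Assumed curation rule: ignore case, surrounding blanks, and the
    difference between spaces, hyphens and underscores.
    """
    if value in vocabulary:
        return value, False  # already valid, nothing to curate
    normalised = "_".join(value.strip().lower().replace("-", " ").split())
    if normalised in vocabulary:
        return normalised, True  # valid after automatic curation
    raise ValueError(f"not a controlled vocabulary term: {value!r}")
```

A value such as `" Gene Expression "` would be curated to `gene_expression`, while an unknown value is rejected so that it can be reported to the submitter.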

Page 13: Friday Seminar 15 10 2004

Validation

• Two levels of checking:
  – Relaxed
  – Strict
• Two execution modes:
  – A complete mode
  – A step-by-step mode
• Error log: for correction
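One way to picture the two checking levels and the error log is the minimal sketch below. The slide does not say how relaxed and strict differ, so the rule used here (relaxed checking fails only on issues flagged as critical) is an assumption, as are the names `Level`, `Issue` and `ErrorLog`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    RELAXED = "relaxed"
    STRICT = "strict"


@dataclass
class Issue:
    line: int        # line number in the checked file
    message: str
    critical: bool   # assumed flag: critical issues fail at every level


@dataclass
class ErrorLog:
    level: Level
    issues: list = field(default_factory=list)

    def report(self, line, message, critical=False):
        """Record an issue so that the submitter can correct it later."""
        self.issues.append(Issue(line, message, critical))

    def errors(self):
        # Assumption: strict checking treats every issue as an error,
        # relaxed checking only the critical ones.
        if self.level is Level.STRICT:
            return list(self.issues)
        return [issue for issue in self.issues if issue.critical]

    def passed(self):
        return not self.errors()
```

With this design every issue is always logged; the level only decides which logged issues make the validation fail.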

Page 14: Friday Seminar 15 10 2004

Checking lists (header)

• File/data structure checklist:
  1. The header file is a tab-delimited file
  2. Item names are correct or can be identified (an item that cannot be identified is skipped)
  3. All mandatory items are present in the header
• Data/file content checklist:
  1. Field values are in the correct format. Possible value types:
     • "Integer"
     • "Free Text"
     • "Controlled vocabulary"
     • "MGED ontology term"
     • "DatabaseEntry"
     • "Sequence"
     • "Species"
  2. Single versus multiple values are checked
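The header checks can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: the item names in `HEADER_ITEMS` and the function `check_header` are hypothetical, and only the "Integer" value type is checked.

```python
import csv
import io

# Hypothetical item definitions: name -> (mandatory, value type, multi-valued).
HEADER_ITEMS = {
    "Array Design Name": (True, "Free Text", False),
    "Number of Features": (True, "Integer", False),
    "Description": (False, "Free Text", True),
}


def check_header(text):
    """Check a tab-delimited header given as text; return (items, problems)."""
    items, problems = {}, []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # blank line
        name, values = cells[0], [v for v in cells[1:] if v]
        if name not in HEADER_ITEMS:
            continue  # an item that cannot be identified is skipped
        _, value_type, multi = HEADER_ITEMS[name]
        if len(values) > 1 and not multi:
            problems.append(f"{name}: single value expected")
        if value_type == "Integer" and not all(v.isdigit() for v in values):
            problems.append(f"{name}: integer expected")
        items[name] = values
    for name, (mandatory, _, _) in HEADER_ITEMS.items():
        if mandatory and name not in items:
            problems.append(f"{name}: mandatory item missing")
    return items, problems
```

Keeping the item definitions in one table means a new item or value type only needs a new entry, not a new code path.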

Page 15: Friday Seminar 15 10 2004

Checking lists (feature reporter)

• Feature Reporter file
• File/data structure checklist:
  1. Header file is correct (structure and data)
  2. FeatureReporter file is a tab-delimited file
  3. Header item names are correct (unknown items are skipped)
  4. All mandatory items are present; item cardinalities and dependencies are correct
  5. Database tags are approved and database accession numbers are correct
  6. Item order is correct (optional; does not fail the check)
  7. Field dependencies are correct
• Data/file content checklist:
  1. FeatureReporter file structure must be correct
  2. Mandatory fields are present; field cardinalities and field value multiplicities must be correct
  3. Field values are in the mandatory format:
     • Database tags are approved by ArrayExpress and are supplied in lower case and between square brackets
     • Database IDs are correct
     • Ontology terms are correct (MGED ontology)
     • Sequences are correct for the associated polymer type (DNA, RNA, protein)
     • Integer field values are correct
  4. Duplicate features must not exist
  5. Duplicate reporters (equal names) must have the same characteristics
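Three of the content checks above (database tags, sequences, duplicate features) lend themselves to a short sketch. Everything named here is an assumption for illustration: the approved tag list, the residue alphabets, the column name used in the example, and the idea that a feature is identified by a tuple of coordinates.

```python
import re
from collections import Counter

# Assumed alphabets; the tool's actual rules may differ (e.g. ambiguity codes).
ALPHABETS = {
    "DNA": set("ACGTN"),
    "RNA": set("ACGUN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
}
# Hypothetical approved list; the real one is maintained by ArrayExpress.
APPROVED_TAGS = {"embl", "ensembl"}
# A tag is lower case and enclosed in square brackets at the end of the name.
TAG_PATTERN = re.compile(r"\[([a-z0-9_]+)\]$")


def check_database_tag(column_name):
    """True if the column name ends with an approved tag such as '[embl]'."""
    match = TAG_PATTERN.search(column_name)
    return bool(match) and match.group(1) in APPROVED_TAGS


def check_sequence(sequence, polymer_type):
    """True if every residue belongs to the alphabet of the polymer type."""
    return bool(sequence) and set(sequence.upper()) <= ALPHABETS[polymer_type]


def duplicate_features(coordinates):
    """Return the coordinates that occur more than once, in sorted order."""
    counts = Counter(coordinates)
    return sorted(c for c, n in counts.items() if n > 1)
```

For example, `check_database_tag("Database Entry [EMBL]")` is rejected because the tag is not in lower case, and `check_sequence("ACGU", "DNA")` is rejected because `U` is not in the assumed DNA alphabet.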

Page 16: Friday Seminar 15 10 2004

Checking lists (composite)

• CompositeSequence
• File/data structure checklist:
  1. Feature Reporter file must be correct (structure and data)
  2. CompositeSequence file is a tab-delimited file
  3. Header item names are correct (unknown items are skipped)
  4. All mandatory items are present; header item cardinalities and dependencies are correct
  5. Column order is correct (not mandatory)
• Data/file content checklist:
  1. Composite file structure must be correct
  2. All mandatory fields are present; field cardinalities are correct
  3. Field values are in the expected format; field multiplicity is correct (same as Feature/Reporter)
  4. Names in map are reporter or composite sequence names
  5. No duplicate CompositeSequences (same names)
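The last two composite checks (map targets must be known names, no duplicate CompositeSequence names) can be sketched as follows. The function name `check_composites` and the input shape are assumptions chosen for the example.

```python
def check_composites(composites, reporter_names):
    """Check composite sequences; return a list of problem messages.

    composites: list of (name, [mapped names]) pairs, in file order.
    reporter_names: names defined in the Feature/Reporter file.
    """
    problems = []
    seen = set()
    for name, _ in composites:
        if name in seen:
            problems.append(f"duplicate CompositeSequence: {name}")
        seen.add(name)
    # A map entry may point to a reporter or to another composite sequence.
    known = set(reporter_names) | seen
    for name, mapped in composites:
        for target in mapped:
            if target not in known:
                problems.append(f"{name}: unknown name in map: {target}")
    return problems
```

The names are collected in a first pass so that a composite sequence may map to another one defined later in the file.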

Page 17: Friday Seminar 15 10 2004

Checking lists

• Header item names are correct
• All mandatory items are present
• All mandatory fields are present
• No duplicate features
• Duplicate reporters (equal names) must have the same characteristics
• No duplicate CompositeSequences (same names)
• Names in map are reporter or composite sequence names
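The duplicate-reporter rule can be illustrated with a small sketch. It reads the rule as "reporters with equal names must share the same characteristics", which is an interpretation of the slide; the function name `inconsistent_reporters` and the input shape are hypothetical.

```python
from collections import defaultdict


def inconsistent_reporters(rows):
    """Return sorted names of reporters whose duplicates disagree.

    rows: iterable of (reporter_name, characteristics) pairs, where
    characteristics is a dict with hashable values (e.g. strings).
    """
    variants = defaultdict(set)
    for name, characteristics in rows:
        # Freeze the dict so that each distinct variant is stored once.
        variants[name].add(frozenset(characteristics.items()))
    return sorted(name for name, seen in variants.items() if len(seen) > 1)
```

A reporter spotted several times with identical characteristics passes; one that appears with two different sequences is reported.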

Page 18: Friday Seminar 15 10 2004

Page 19: Friday Seminar 15 10 2004

Page 20: Friday Seminar 15 10 2004

MGED Ontology / DAML+OIL

Page 21: Friday Seminar 15 10 2004

Approved Databases

Page 22: Friday Seminar 15 10 2004

User modes

Page 23: Friday Seminar 15 10 2004

Implementation - technical choices

– MAGE-stk
– JAXB
– Configuration (default parameters)

Performance: 4000 features in ~10 minutes

Page 24: Friday Seminar 15 10 2004

Installer: IzPack (http://www.izforge.com/izpack/)

Page 25: Friday Seminar 15 10 2004

http://www.ebi.ac.uk/adf

Page 26: Friday Seminar 15 10 2004
