GUS Plugin System

Michael Saffitz

Genomics Unified Schema Workshop

July 6-8th, Philadelphia, Pennsylvania



Plugin Overview

Small Perl programs that load and manipulate data within GUS

Written using the GUS Plugin API and Perl Object Layer

Provide automatic support for data provenance, object-layer and database connectivity, standardized documentation, command-line argument processing, logging, and error handling

GUS ships with two kinds of plugins: “Supported” and “Community”

Supported Plugins

Have been tested in Oracle and Postgres and are confirmed to work

Portable

Useful beyond the site that developed them

Meet the GUS Plugin Standard

Community Plugins

Fail to meet one or more of the criteria above

Have not been tested

Provided as a general resource to the community

Plugin Life Cycle

Plugin Initialization: documentation, command-line arguments

Data Loading: reading, parsing, querying

Data Manipulation: deciding whether to insert or update; restart logic

Data Submission

GUS Supported Plugins

InsertArrayDesignControl.pm
InsertAssayControl.pm
InsertBlastSimilarities.pm
InsertExternalDatabase.pm
InsertExternalDatabaseRls.pm
InsertGOEvidenceCode.pm
InsertGeneOntology.pm
InsertGeneOntologyAssoc.pm
InsertRadAnalysis.pm
InsertReviewStatus.pm
InsertSecondaryStructure.pm
InsertSequenceOntology.pm
LoadArrayDesign.pm
LoadArrayResults.pm
LoadFastaSequences.pm
LoadGusXml.pm
LoadNRDB.pm
LoadRow.pm
LoadTaxon.pm

Plugin Shell

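package GUS::Supported::Plugin::LoadRow;

use strict;
use GUS::PluginMgr::Plugin;

# Inherit from the plugin base class ('our' is required under 'use strict')
our @ISA = qw(GUS::PluginMgr::Plugin);

sub new { … }

sub run { … }

1;  # a Perl module must return a true value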

Plugin Initialization

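sub new {
    my ($class) = @_;
    my $self = {};
    bless($self, $class);

    # $argsDeclaration and $documentation are file-scoped variables declared
    # earlier in the module (see Declaring Arguments and Declaring Documentation)
    $self->initialize({ requiredDbVersion => 3.5,
                        cvsRevision       => '$Revision: 2934 $',
                        name              => ref($self),
                        argsDeclaration   => $argsDeclaration,
                        documentation     => $documentation });

    return $self;
}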

Declaring Arguments

my $argsDeclaration = [
    stringArg({ name           => 'externalDatabaseVersion',
                descr          => 'sres.externaldatabaserelease.version for this instance of NRDB',
                constraintFunc => undef,
                reqd           => 1,
                isList         => 0
              }),

    fileArg({ name           => 'gitax',
              descr          => 'pathname for the gi_taxid_prot.dmp file',
              constraintFunc => undef,
              reqd           => 1,
              isList         => 0,
              mustExist      => 1,
              format         => 'Text'
            }),
];
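Declaring arguments as data lets the plugin manager, rather than each plugin, parse the command line and enforce flags such as reqd and mustExist. The sketch below imitates that idea with core Getopt::Long; it reuses the field names shown above, but the parse_args helper and its spec-building rule are hypothetical, not GUS::PluginMgr's implementation.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical declarations that reuse the field names shown above.
my @decls = (
    { name => 'externalDatabaseVersion', reqd => 1, isList => 0 },
    { name => 'gitax', reqd => 1, isList => 0, mustExist => 1 },
);

# Hypothetical helper (not GUS::PluginMgr code): turn the declarations into a
# Getopt::Long spec, parse @argv, then enforce 'reqd' and 'mustExist'.
sub parse_args {
    my ($decls, @argv) = @_;
    my %args;
    my @spec = map { $_->{name} . ($_->{isList} ? '=s@' : '=s') } @$decls;
    GetOptionsFromArray(\@argv, \%args, @spec)
        or die "could not parse the command line\n";
    for my $d (@$decls) {
        my $value = $args{ $d->{name} };
        die "--$d->{name} is required\n" if $d->{reqd} && !defined $value;
        die "--$d->{name}: file '$value' does not exist\n"
            if $d->{mustExist} && defined $value && !-e $value;
    }
    return \%args;
}

# Usage: a temporary file stands in for gi_taxid_prot.dmp.
my (undef, $gitax_path) = tempfile(UNLINK => 1);
my $args = parse_args(\@decls,
    '--externalDatabaseVersion', '1.0', '--gitax', $gitax_path);
print "externalDatabaseVersion = $args->{externalDatabaseVersion}\n";
```

Because the checks are driven by the declarations, adding an argument means adding one hash rather than new parsing code.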

Argument Types

String, Integer, Boolean, Table Name, Float, File, Enumeration, Controlled Vocab

Controlled Vocab arguments use local/database term pairs, suited to small (“dinky”) CVs
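One way to picture a controlled-vocabulary argument for a “dinky” CV is a small table of (local term, database term) pairs used to translate input values. The term pairs and the translate_term helper below are made up for illustration; the slide does not show how GUS declares or stores them.

```perl
use strict;
use warnings;

# Hypothetical "dinky" CV: local term => database term (made-up values).
my %term_map = (
    fwd => 'forward',
    rev => 'reverse',
);

# Translate a term found in an input file into the term stored in the
# database, failing loudly on anything outside the vocabulary.
sub translate_term {
    my ($map, $local) = @_;
    die "unknown term '$local'\n" unless exists $map->{$local};
    return $map->{$local};
}

print translate_term(\%term_map, 'fwd'), "\n";    # prints "forward"
```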

Declaring Documentation

my $tablesDependedOn = [
    ['GUS::Model::DoTS::NRDBEntry',
     'pulls aa_sequence_id from here when id and extDbId match requested'],
];

my $documentation = { purposeBrief     => $purposeBrief,
                      purpose          => $purpose,
                      tablesAffected   => $tablesAffected,
                      tablesDependedOn => $tablesDependedOn,
                      howToRestart     => $howToRestart,
                      failureCases     => $failureCases,
                      notes            => $notes
                    };
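Since the documentation is plain data, it can be printed in one standard layout for every plugin. A minimal sketch, assuming a hypothetical render_doc helper and placeholder values; only the key names and the NRDBEntry entry come from the slide.

```perl
use strict;
use warnings;

# Key names follow the $documentation hash above; the values are placeholders.
my $documentation = {
    purposeBrief     => 'Placeholder one-line purpose',
    purpose          => 'Placeholder longer description.',
    tablesAffected   => [],
    tablesDependedOn => [
        [ 'GUS::Model::DoTS::NRDBEntry',
          'pulls aa_sequence_id from here when id and extDbId match requested' ],
    ],
    howToRestart     => 'Placeholder restart instructions.',
    failureCases     => 'Placeholder failure cases.',
    notes            => '',
};

# Hypothetical renderer: a fixed key order gives every plugin the same layout;
# [table, reason] lists print one table per line; empty strings are skipped.
sub render_doc {
    my ($doc) = @_;
    my $text = '';
    for my $key (qw(purposeBrief purpose tablesAffected tablesDependedOn
                    howToRestart failureCases notes)) {
        my $val = $doc->{$key};
        next unless defined $val;
        if (ref $val eq 'ARRAY') {
            $text .= "$key:\n";
            $text .= "  $_->[0]: $_->[1]\n" for @$val;
        }
        elsif ($val ne '') {
            $text .= "$key: $val\n";
        }
    }
    return $text;
}

print render_doc($documentation);
```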


Run Method

“Entry point” for the plugin

Concise overview, or “table of contents”, of the plugin:

sub run {
    my ($self) = @_;
    my $rows = 0;

    # readData() and parseData() stand for the plugin's own helper methods
    my $rawData = $self->readData();
    my @parsedData = $self->parseData($rawData);

    foreach my $data (@parsedData) {
        $data->submit();
        $rows++;
    }

    return "Inserted $rows rows";
}
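The run() outline can be exercised outside GUS to see the control flow. In this sketch FakeRecord is a hypothetical stand-in for a GUS model object, and MiniPlugin's readData and parseData are made-up helpers that read tab-separated lines from an in-memory string rather than a real file.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a GUS::Model object: submit() only counts calls.
    package FakeRecord;
    my $submitted = 0;
    sub new       { my ($class, %attr) = @_; return bless {%attr}, $class }
    sub submit    { $submitted++; return 1 }
    sub submitted { return $submitted }
}

{
    # Hypothetical plugin; readData/parseData are its own helper methods.
    package MiniPlugin;
    sub new { my ($class, %opt) = @_; return bless {%opt}, $class }

    # Stands in for reading a file: returns the configured text.
    sub readData { my ($self) = @_; return $self->{input} }

    # One FakeRecord per non-blank "id<TAB>name" line.
    sub parseData {
        my ($self, $raw) = @_;
        my @records;
        for my $line (grep { /\S/ } split /\n/, $raw) {
            my ($id, $name) = split /\t/, $line, 2;
            push @records, FakeRecord->new(id => $id, name => $name);
        }
        return @records;
    }

    # Entry point, following the run() outline above.
    sub run {
        my ($self) = @_;
        my $rows = 0;
        my $rawData = $self->readData();
        my @parsedData = $self->parseData($rawData);
        foreach my $data (@parsedData) {
            $data->submit();
            $rows++;
        }
        return "Inserted $rows rows";
    }
}

my $plugin = MiniPlugin->new(input => "1\talpha\n2\tbeta\n");
my $result = $plugin->run();
print "$result\n";    # prints "Inserted 2 rows"
```

Keeping run() this short is the point of the “table of contents” advice: the details live in the helpers.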

Accessing Data

Command-line arguments: $self->getArg('nrdbFile');

Through objects:

my $preExtAASeq = GUS::Model::DoTS::ExternalAASequence->new({'aa_sequence_id' => $aa_seq_id});
$preExtAASeq->retrieveFromDB();

Direct database access:

my $dbh = $self->getQueryHandle();
my $sth = $dbh->prepare(…);
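The object route works by setting the key attribute in the constructor and letting retrieveFromDB fill in the rest. The stand-in class below imitates that pattern against an in-memory hash; its table, the source_id column, and the get accessor are hypothetical, and it assumes retrieveFromDB returns true only when a matching row exists.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for GUS::Model::DoTS::ExternalAASequence.
    # %TABLE plays the database table, keyed by aa_sequence_id; the
    # source_id column and its value are made up for the example.
    package FakeExternalAASequence;
    our %TABLE = (
        101 => { aa_sequence_id => 101, source_id => 'example-1' },
    );

    sub new { my ($class, $attrs) = @_; return bless { %$attrs }, $class }

    # Look up the row by aa_sequence_id and copy its columns into the
    # object; returns true only if a row was found.
    sub retrieveFromDB {
        my ($self) = @_;
        my $row = $TABLE{ $self->{aa_sequence_id} } or return 0;
        %$self = (%$self, %$row);
        return 1;
    }

    sub get { my ($self, $column) = @_; return $self->{$column} }
}

my $seq = FakeExternalAASequence->new({ 'aa_sequence_id' => 101 });
print "found ", $seq->get('source_id'), "\n" if $seq->retrieveFromDB();

my $missing = FakeExternalAASequence->new({ 'aa_sequence_id' => 999 });
print "no row for 999\n" unless $missing->retrieveFromDB();
```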

Persisting Data

Saving and updating: $obj->submit(); this cascades and submits the object's children as well

Delete: mark the object, then submit it:

$obj->markDeleted(1);
$obj->submit();
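The cascade means one submit() on a parent also writes its children, and deletion reuses the same call after markDeleted(1). FakeRow below is a hypothetical stand-in that logs actions instead of touching a database; its addChild method and its parent-before-children order are assumptions of the sketch.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for a GUS object. submit() "writes" this object
    # to @LOG and then cascades to its children; addChild is this stand-in's
    # own method.
    package FakeRow;
    our @LOG;

    sub new {
        my ($class, $name) = @_;
        return bless { name => $name, children => [], deleted => 0 }, $class;
    }
    sub addChild    { my ($self, $child) = @_; push @{ $self->{children} }, $child; return $child }
    sub markDeleted { my ($self, $flag) = @_; $self->{deleted} = $flag; return }
    sub submit {
        my ($self) = @_;
        push @LOG, ($self->{deleted} ? 'delete ' : 'save ') . $self->{name};
        $_->submit() for @{ $self->{children} };
        return 1;
    }
}

my $parent = FakeRow->new('sequence');
$parent->addChild(FakeRow->new('feature-1'));
$parent->addChild(FakeRow->new('feature-2'));
$parent->submit();                   # one call saves the parent and both children

my $obsolete = FakeRow->new('old-feature');
$obsolete->markDeleted(1);
$obsolete->submit();                 # deletion also goes through submit()

print "$_\n" for @FakeRow::LOG;
```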

Logging and Error Handling

For general logging, use the logging functions, which print to STDERR: $self->log("message");

For error handling, either die() immediately or, for recoverable errors, write the errors to a file

Restart functionality:

Check for object existence

Check, but ensure the object was loaded by a valid, proper invocation

Store data from the previous run and use it as a filter
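One way to implement the last restart strategy (store data from the previous run and use it as a filter) is a file of finished ids that each run reads first and appends to as it goes. The sketch also writes recoverable errors to a file and logs to STDERR; the file layout, the ids, and the log_msg helper are hypothetical stand-ins for a real plugin's data and $self->log.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for $self->log("message"): messages go to STDERR.
sub log_msg { print STDERR "@_\n" }

# Read the ids a previous run recorded as finished, one per line.
sub load_done_ids {
    my ($path) = @_;
    my %done;
    open(my $fh, '<', $path) or return \%done;   # no file yet: skip nothing
    while (my $line = <$fh>) {
        chomp $line;
        $done{$line} = 1 if length $line;
    }
    close $fh;
    return \%done;
}

# Simulate an earlier run that finished ids A and B before it stopped.
my ($prev_fh, $done_path) = tempfile(UNLINK => 1);
print {$prev_fh} "A\nB\n";
close $prev_fh;

my ($err_fh, $err_path) = tempfile(UNLINK => 1);
my $done = load_done_ids($done_path);
open(my $done_fh, '>>', $done_path) or die "cannot append to $done_path: $!";

my @loaded;
for my $id (qw(A B C D)) {
    next if $done->{$id};                       # restart filter
    if ($id eq 'D') {                           # pretend D is malformed
        print {$err_fh} "$id: could not parse record\n";   # recoverable: note it, go on
        next;
    }
    push @loaded, $id;                          # the real insert would happen here
    print {$done_fh} "$id\n";                   # remember it for the next restart
    log_msg("loaded $id");
}
close $done_fh;
close $err_fh;
print "loaded this run: @loaded\n";             # prints "loaded this run: C"
```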

Clearing the Cache

Historical: Perl previously had poor garbage-collection support

The cache has a default capacity of 10,000 objects

At the bottom of the outermost loop, call: $self->undefPointerCache();
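Clearing the cache at the bottom of the outermost loop keeps the number of cached objects bounded by one iteration's worth, however many records are processed in total. FakeObjectCache is a hypothetical stand-in (its clear method plays the role of undefPointerCache), and dying when the cache is full is only this sketch's behavior.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for the object layer's pointer cache.
    package FakeObjectCache;
    sub new {
        my ($class, $capacity) = @_;
        return bless { capacity => $capacity, objects => [] }, $class;
    }
    sub add {
        my ($self, $obj) = @_;
        die "object cache full ($self->{capacity} objects)\n"
            if @{ $self->{objects} } >= $self->{capacity};
        push @{ $self->{objects} }, $obj;
        return $obj;
    }
    sub size  { my ($self) = @_; return scalar @{ $self->{objects} } }
    # Analogue of $self->undefPointerCache(): drop every cached reference.
    sub clear { my ($self) = @_; $self->{objects} = []; return }
}

my $cache = FakeObjectCache->new(10000);     # default capacity from the slide
my $peak  = 0;
for my $record (1 .. 25000) {                # more records than the capacity
    $cache->add({ id => $record });          # e.g. a sequence object
    $cache->add({ parent => $record });      # and one child object
    $peak = $cache->size if $cache->size > $peak;
    $cache->clear();                         # bottom of the outermost loop
}
print "peak cache size: $peak\n";            # prints "peak cache size: 2"
```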

Data Provenance

Tracks plugin revisions: name, checksum, revision

Tracks the parameters each plugin run is executed with

Algorithm

AlgorithmImplementation

AlgorithmInvocation

AlgorithmParamKey

AlgorithmParamKeyType

AlgorithmParam
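The provenance idea in miniature: identify the plugin version by name, revision, and a checksum of its code, and keep a copy of the parameters each run used. The provenance_record helper, its keys, and the choice of MD5 (core Digest::MD5) as the checksum are assumptions of the sketch; it does not write to the Algorithm tables listed above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: gather what the slide says is tracked for one run.
# The hash keys are illustrative and are not the Core table column names.
sub provenance_record {
    my (%p) = @_;
    return {
        implementation => {
            name     => $p{name},
            revision => $p{revision},
            checksum => md5_hex($p{source}),   # changes whenever the code changes
        },
        invocation => {
            params => { %{ $p{args} } },       # copy of this run's arguments
        },
    };
}

my $source = "package GUS::Supported::Plugin::LoadRow;\n# ...\n";   # placeholder text
my $rec = provenance_record(
    name     => 'GUS::Supported::Plugin::LoadRow',
    revision => '$Revision: 2934 $',
    source   => $source,
    args     => { externalDatabaseVersion => '1.0' },
);
printf "%s %s\n", $rec->{implementation}{name}, $rec->{implementation}{checksum};
```

A checksum catches edits that were never committed, which a revision string alone would miss.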

Plugin Evolution

Changes abound: both data file formats and the schema evolve

Be flexible in writing plugins: use command-line configuration

Be clear about what schema objects you use

Plugin Standard

See Developer’s Guide: http://gusdb.org/documentation/3.5/developers/developersguide.html