Perl Object Layer & Pipelines
Pipelines – Steve Fischer, John Iodice, Deborah Pinney, Mark Heiges, Ed Robinson
Perl Object Layer – Brian Brunk, Mark Gibson, Dave Barkan
Pipeline Introduction
• Sequential steps of
  – Plugin calls
  – Script calls
  – Cluster jobs
• Purpose
  – Codifies the process of creating the data set
  – Reduces human resources
  – Reduces human error and omissions
Two Pipeline Types
• Resources pipeline
  – Downloads resources from external sources
  – Loads resources into database
  – Example: NRDB files
• Analysis pipeline
  – Extracts data from database
  – Runs analysis programs on data
    • On main or cluster server
  – Puts value-added data back into database
Resource Pipeline
• Invoked by:
  – loadresources xmlfile propfile
• Take a tour of a resources XML file
Resources Repository
• Destination of downloads
• Houses files in a file system
• Serves as a cache for files
• Has API to access files by name and version
• If you request an existing file by name and version, repository returns it without downloading
  – But the wget arguments must match (these are remembered by the repository)
• Particularly useful if multiple projects want to synchronize their data input
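The caching rule above can be sketched roughly as follows. This is a Python model for illustration only, not the repository's real (Perl) API: the class name, the `fetch` callback standing in for wget, and the choice to raise an error when the wget arguments differ are all assumptions, since the slides only say that the arguments must match.

```python
from typing import Callable, Dict, Tuple

Args = Tuple[str, ...]


class ResourceRepository:
    """Toy cache keyed by (name, version); a hypothetical model, not GUS code."""

    def __init__(self, fetch: Callable[[str, Args], bytes]):
        self._fetch = fetch  # hypothetical downloader standing in for wget
        self._files: Dict[Tuple[str, str], Tuple[Args, bytes]] = {}
        self.downloads = 0   # counts real downloads, to make cache hits visible

    def get(self, name: str, version: str, url: str, wget_args: Args) -> bytes:
        key = (name, version)
        if key in self._files:
            remembered_args, data = self._files[key]
            if remembered_args != wget_args:
                # Assumption: a mismatch is reported instead of re-downloading.
                raise ValueError(f"wget arguments differ for {name} {version}")
            return data  # cache hit, nothing is downloaded
        data = self._fetch(url, wget_args)
        self.downloads += 1
        self._files[key] = (wget_args, data)  # remember the arguments used
        return data
```

Two projects that ask for the same name and version with the same arguments would receive the same bytes, which is the synchronization benefit the slide mentions.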
Analysis Pipeline
• Take a tour of the analysis pipeline file
• Take a tour of the Steps.pm file
• Take a tour of the property file
Pipeline Directory Structure
• The directory which houses all the information for the pipeline including:
  – Input data
  – Logs
  – Result data
  – Pipeline control information:
    • Which steps have been completed
    • Property files to control cluster
• Structured for easy comprehension
• Take a tour of the directory structure
Analysis Pipeline API
• GUS::Pipeline::Manager.pm
  – Declares properties
  – Prevents steps from being rerun
  – Calls plugins
  – Executes commands
  – Eases communication with cluster
• GUS::Pipeline::MakeTaskDirs.pm
  – Helps make directories expected by distribjob on the cluster
• GUS::Pipeline::TaskRunAndValidate.pm
  – Helps run a series of tasks on the cluster
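One way a manager can prevent steps from being rerun is to record each completed step in the pipeline directory and check for that record before running. The Python sketch below illustrates the idea only; the `StepManager` class, the `signals` subdirectory, and the marker-file convention are assumptions, not the interface of GUS::Pipeline::Manager.

```python
import os
from typing import Callable


class StepManager:
    """Runs each named step at most once, using marker files (hypothetical)."""

    def __init__(self, pipeline_dir: str):
        # Assumed layout: one empty file per completed step.
        self.signal_dir = os.path.join(pipeline_dir, "signals")
        os.makedirs(self.signal_dir, exist_ok=True)

    def run_step(self, name: str, action: Callable[[], None]) -> bool:
        """Return True if the step ran, False if it was skipped."""
        marker = os.path.join(self.signal_dir, name)
        if os.path.exists(marker):
            return False  # completed in an earlier run, so skip it
        action()          # if this raises, no marker is written
        with open(marker, "w"):
            pass          # create the empty marker file
        return True
```

Because the record lives on disk, restarting the pipeline after a failure resumes at the first step that has no marker.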
DJob
• Manages the distribution of tasks across a compute cluster
• Handles the case of a very large number of inputs which are processed independently and uniformly
• For example, blasting a set of ESTs against a genome
• Now available for clusters using the PBS cluster scheduler
• http://core.pcbi.upenn.edu/tools/liniactools.html
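The core idea of distributing many independent, uniform inputs can be sketched as splitting them into fixed-size subtasks that cluster nodes process separately. The function below is a Python illustration under that assumption; `make_subtasks` and `subtask_size` are hypothetical names, and DJob's actual interface is not shown in these slides.

```python
from typing import Iterator, List, Sequence


def make_subtasks(inputs: Sequence[str], subtask_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most subtask_size inputs."""
    if subtask_size < 1:
        raise ValueError("subtask_size must be at least 1")
    for start in range(0, len(inputs), subtask_size):
        # Each chunk is independent, so chunks can run on different nodes.
        yield list(inputs[start:start + subtask_size])
```

For the EST example, each chunk would hold a batch of EST identifiers to blast against the same genome.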
Perl Object Layer
http://www.cbil.upenn.edu/~brunkb/PERL_Objects.html
Perl Object Layer
• Simplifies database interactivity
• Manages parent-child relationships
• Manages submits (inserts, updates and deletes)
  – Submits children recursively
  – Automatic versioning
  – Sets default attributes (Ex. row_user_id)
• Enforces read/write permissions
• Code generator - objects consistent with db
• Extracts meta data from db
• Prints to XML and parses XML into objects
DbiDatabase Module
• Creates login to the database
• Allows use of all database objects
• Has methods to get meta information
  – Ex: getTable(tableName) returns a DbiTable for access of FK and PK attributes
• DbiDatabase object automatically instantiated by plugins
• DbiDatabase objects must be explicitly instantiated in scripts
Object Constructor
• TableName->new($hashRef)
Retrieving objects from DB
• retrieveFromDB(\@attributesToNotRetrieve)
• Returns 1 if successful
  – Uses the attribute values already set on the object as constraints
• Returns 0 if not successful
  – No rows or multiple rows
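The return-value rule (1 only when exactly one row matches the constraints) can be modeled with a small Python function over SQLite. This is a sketch of the behavior described on the slide, not the Perl implementation: `retrieve_from_db`, the dictionary of attributes, and the table used in the usage note are all hypothetical, and the list of attributes to skip is not modeled.

```python
import sqlite3
from typing import Dict


def retrieve_from_db(conn: sqlite3.Connection, table: str, attrs: Dict[str, object]) -> int:
    """Fill attrs from the single matching row; return 1 on success, else 0."""
    # Table and column names are interpolated directly, so in this sketch
    # they must come from trusted code; only the values are bound parameters.
    where = " AND ".join(f"{col} = ?" for col in attrs) or "1 = 1"
    cur = conn.execute(f"SELECT * FROM {table} WHERE {where}", tuple(attrs.values()))
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if len(rows) != 1:
        return 0  # no rows or multiple rows: the object is left unchanged
    attrs.update(zip(columns, rows[0]))
    return 1
```

A caller would set the attributes that identify the row, call the function, and read the remaining attributes only when it returns 1.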
Getting and Setting Attributes
• Attributes can be set using the individual object
  – Preferred, for additional functionality
  – Ex: setRowUserId($userId);
• Attributes can be set using the superclass
  – set('row_user_id',$userId);
• Get methods use similar syntax
  – getRowUserId()
  – get('row_user_id')
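The relationship between the two styles can be illustrated in Python: generic `set`/`get` take a column name, while the per-attribute methods map a camel-case name such as `setRowUserId` onto the column `row_user_id`. In the object layer these methods come from generated classes; the sketch below resolves them dynamically instead, purely as an assumption for brevity, and the `Row` class is hypothetical.

```python
import re


class Row:
    """Toy row object with generic and per-attribute accessors (hypothetical)."""

    def __init__(self, attrs=None):
        self._attrs = dict(attrs or {})

    def set(self, name, value):
        self._attrs[name] = value

    def get(self, name):
        return self._attrs.get(name)

    def __getattr__(self, method):
        # Reached only when normal lookup fails, e.g. for setRowUserId.
        match = re.fullmatch(r"(get|set)([A-Z]\w*)", method)
        if not match:
            raise AttributeError(method)
        # RowUserId -> row_user_id
        column = re.sub(r"(?<!^)(?=[A-Z])", "_", match.group(2)).lower()
        if match.group(1) == "set":
            return lambda value: self.set(column, value)
        return lambda: self.get(column)
```

Both routes end up reading and writing the same stored value, which is why the slide can describe them as alternatives.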
Managing submits to database
• submit($notDeep, $noTran)
  – $notDeep = 1 only submits self but not children
  – $noTran = 1 does not begin or commit a transaction
• addToSubmitList($object)
  – Additional $object gets submitted after main object and its children are submitted
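The ordering implied by these flags can be sketched in Python with a log that records what would be sent to the database. The `Node` class is hypothetical, and two details are assumptions not stated on the slide: that objects on the submit list share the caller's transaction, and that they are still submitted when `not_deep` is set.

```python
class Node:
    """Toy object that logs the order of submits (hypothetical model)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log           # shared list recording the submit order
        self.children = []
        self.submit_list = []

    def add_child(self, child):
        self.children.append(child)

    def add_to_submit_list(self, obj):
        self.submit_list.append(obj)

    def submit(self, not_deep=False, no_tran=False):
        if not no_tran:
            self.log.append("BEGIN")
        self.log.append(self.name)             # self first
        if not not_deep:
            for child in self.children:
                child.submit(no_tran=True)     # children join this transaction
        for extra in self.submit_list:
            extra.submit(no_tran=True)         # extras go after self and children
        if not no_tran:
            self.log.append("COMMIT")
```

Passing `no_tran=True` at the top level fits the case where the caller manages one larger transaction around several submits.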
Managing Parents
• setParent($p)
• getParent($className, $retrieveIfNoParent, \@doNotRetrieveAttributes)
• retrieveParentFromDB($className, \@doNotRetrieveAttributes)
Managing Memory
• undefPointerCache()
  – MUST be called in each loop to allow garbage collection.
  – Removes all child and parent pointers so they cannot be retrieved.
• All other methods are automatic
  – addToPointerCache($ob)
  – getFromPointerCache($object_reference)
  – removeFromPointerCache($ob)
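Why clearing the cache matters can be shown with a Python model in which related objects refer to each other only through keys into a shared cache. While the cache holds them they stay alive; once it is cleared they can be freed, and the parent and child links no longer resolve. The classes below are hypothetical and only imitate the behavior described on the slide; how the Perl layer stores its pointers internally is not shown here.

```python
import gc
import weakref


class PointerCache:
    """Holds the strong references between related objects (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def add(self, obj):
        self._objects[id(obj)] = obj
        return id(obj)

    def get(self, key):
        return self._objects.get(key)

    def undef(self):
        self._objects.clear()  # drop every parent and child pointer


class CachedRow:
    def __init__(self, cache):
        self.cache = cache
        self.child_keys = []
        self.parent_key = None

    def add_child(self, child):
        self.child_keys.append(self.cache.add(child))
        child.parent_key = self.cache.add(self)

    def get_children(self):
        found = (self.cache.get(key) for key in self.child_keys)
        return [child for child in found if child is not None]
```

In a loop that builds many objects, calling `undef()` at the end of each iteration keeps memory from growing with the number of iterations.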
Managing deletes
• Deletes occur in two steps
• markDeleted($doChildren)
  – Mark self deleted
  – If $doChildren = 1 then does this recursively
• Deletes occur with submit
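The two-step behavior can be sketched in Python: marking changes only the in-memory object, and the row disappears when submit is called. The `Record` class and the set standing in for database rows are hypothetical; submitting children before the parent is an assumption made so that dependent rows go first.

```python
class Record:
    """Toy object whose deletion takes effect only on submit (hypothetical)."""

    def __init__(self, name, store):
        self.name = name
        self.store = store   # a set standing in for rows in the database
        self.children = []
        self.deleted = False
        store.add(name)

    def mark_deleted(self, do_children=False):
        self.deleted = True  # step 1: mark only, nothing is removed yet
        if do_children:
            for child in self.children:
                child.mark_deleted(do_children=True)

    def submit(self):
        for child in self.children:
            child.submit()   # assumption: children are handled first
        if self.deleted:
            self.store.discard(self.name)  # step 2: the delete happens here
```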
Managing Children
• getChildren($className, $retrieveIfNoChildren, $getDeletedToo, $where, \@doNotRetrieveAttributes)
• getAllChildren($retrieve, $getDeletedToo, $where)
• retrieveChildrenFromDB($className, $resetIfHave, $where, \@doNotRetrieveAttributes)
• retrieveAllChildrenFromDB($recursive, $resetIfHave)
Methods for dealing with sequence
• getSequence()
• setSequence($sequence)
  – removes returns and non-sequence characters and then sets.
• GetFeatureSequence()
  – retrieves substring of sequence to which that feature points
• toFasta($type)
  – If $type = 1 id used is the aa(or na)_sequence_id - otherwise it is the source_id
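The cleaning and FASTA steps can be sketched in Python as below. The function names are hypothetical, and two details are assumptions: "non-sequence characters" is taken to mean anything other than letters and `*`, and the FASTA body is wrapped at 60 characters per line.

```python
import re


def clean_sequence(raw: str) -> str:
    """Strip line breaks and anything that is not a letter or '*'."""
    # Assumption: letters and '*' are the only sequence characters.
    return re.sub(r"[^A-Za-z*]", "", raw)


def fasta_id(use_sequence_id: bool, sequence_id: str, source_id: str) -> str:
    """Pick the defline id the way the $type flag is described."""
    return sequence_id if use_sequence_id else source_id


def to_fasta(seq_id: str, sequence: str, width: int = 60) -> str:
    """Format one record: a '>' defline followed by wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{seq_id}\n" + "\n".join(lines) + "\n"
```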
Printing
• ToString()
• toXML($indent, $suppressDef, $doXmlIds, $family)
  – $suppressDef = 1 default attributes below modification_date are suppressed
  – $doXmlIds = 1 will print XML ids in the object tags
  – $family = 1 will print parent/child relationships in object tags rather than nesting children
Checking read and write permissions
• checkReadPermission()
• checkWritePermission()