Using Oracle interMedia for Large Scale Image Storage


Page 1

www.sanger.ac.uk

The Wellcome Trust Sanger Institute

Using Oracle interMedia for Large Scale Image Storage

Martin Widlake, Database Manager
Tony Webb, Real DBA

The Wellcome Trust Sanger Institute
[email protected]

Page 2

Abstract

• The Wellcome Trust Sanger Institute have several projects under development which require the handling and annotation of very large quantities of image data. With Oracle Corp, we are implementing an Oracle 10G solution using the interMedia functionality, which has been extensively developed for 10G and continues to be enhanced.

This talk will cover WTSI’s need for image handling, the proof of concept and the system being implemented now. It also covers the planned development of both the system and interMedia.

The presentation will conclude with a more technical overview of what interMedia is capable of and will be followed by a workshop.

Page 3

Format

• Martin Widlake from WTSI will introduce the talk and explain why we need to handle so much image data and why we are using Oracle interMedia.

• Tony Webb from WTSI will explain how we implemented the proof-of-concept, the issues we hit and the plans for the Gene Atlas project.

• Tony will then cover the current plans and also the development areas being looked at with Oracle.

• Melli Annamalai from Oracle will then present the general technical capabilities of Oracle 10G interMedia.

• Following the presentation, Melli will host a workshop.

Page 4

• The Sanger Institute is a research centre funded primarily by the Wellcome Trust. WTSI is located at Hinxton Hall, Cambridge (UK) in 55 acres of parkland.

• Founded in 1993; Currently over 800 staff members at WTSI, about 170 are IT, 430 are scientific and the boundary can be rather blurred.

• Our purpose is to further the knowledge of the biology of organisms, particularly through large scale sequencing, analysis of their genomes and post-genomic large-scale studies.

• Our lead project has been to sequence a third of the human genome as part of the international Human Genome Project.

Page 5

June 2000

“In the last decade of the twentieth century, scientists from around the world initiated one of the most significant scientific projects of all time: to determine the DNA sequence of the entire human genome, the human genetic blueprint.” Joint statement, Clinton and Blair

Bill Clinton, Photo Rod Edmonds
Tony Blair, Photo Christine Nesbitt
Sir John Sulston, Photo Matthew Fearn

Page 6

Why the Wellcome Trust Sanger Institute needs to handle so much image data

Page 7

Gene Atlas Project

• Develop a set of antibodies to every known gene in the mouse genome.

• Link the antibodies to dye molecules in order to stain cytological samples.

• Identify where the gene is expressed in the mouse embryo during the development process.

• Create an atlas of where each gene is expressed within the developing mouse, when, and the level of expression.

• An information resource available to the world scientific community.

Page 8

Gene Atlas Project

Immunostaining

Use antibodies to detect any of the 30k genes.

Highlight where the proteins (and thus genes) are expressed

Not just where, but when.

Page 9

Gene Atlas Project

• Images are composed of camera frames that may overlap.

[Figure: four overlapping camera frames, numbered 1 to 4]

Page 10

Gene Atlas Project: Data Volumes

[Figure: workflow diagram. Stages shown: antigen selection; creation of protein expression constructs; protein production; antibody generation and selection; tissue collection; generation of tissue arrays; high throughput ICC and image capture; image analysis and annotation. Throughput labels on the diagram: 100/month, 1000/month, 1000/month, 500,000/month!]

Page 11

Gene Atlas Project

Tissue microarray cores (1mm) = 6 MB x 400 = 2.4 GB x 60 slides = 144 GB

Tissue sections (10x15mm) = 900 MB x 6 sections = 5.4 GB x 60 slides = 324 GB

…so 5 x 10^6 files of ~6 MB = ~30 x 10^12 bytes = ~30 TB
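The storage estimates on this slide can be checked with a few lines of arithmetic. The sketch below uses Python and assumes decimal units (1 MB = 10^6 bytes), which is what the slide's round figures imply. The function and variable names are illustrative only and are not from the talk. Note that 900 MB x 6 sections comes to 5.4 GB per slide, which is the value consistent with the 324 GB total.

```python
MB = 10**6   # assumption: decimal megabytes, matching the slide's round numbers
GB = 10**9
TB = 10**12


def total_bytes(item_bytes, items_per_slide, slides):
    """Bytes needed for `slides` slides, each holding `items_per_slide` images."""
    return item_bytes * items_per_slide * slides


# Tissue microarray cores: 6 MB per core image, 400 cores per slide, 60 slides.
cores = total_bytes(6 * MB, 400, 60)

# Tissue sections: 900 MB per section image, 6 sections per slide, 60 slides.
section_slide = 900 * MB * 6
sections = total_bytes(900 * MB, 6, 60)

# Whole archive: 5 x 10^6 files of roughly 6 MB each.
archive = 5 * 10**6 * 6 * MB

print(cores // GB)         # 144
print(section_slide / GB)  # 5.4
print(sections // GB)      # 324
print(archive // TB)       # 30
```

Keeping everything in integer bytes until the final division avoids rounding surprises when the same helper is reused for other slide counts.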

Page 12

Image Handling – GeneTrap Project

• Interrupt known genes by inserting a new piece of DNA into the gene.

• Grow tissue samples with interrupted genes to confluence and take an image of the cell culture.

• Also, during the cell growth process, images of the growing colonies are taken each day to assess colony size and when the culture sample can be taken.

• Keep a record of the plate images to assess culture growth and where satellite colonies are occurring.

• 2-3 TB of images will be gathered and stored for the usual, scientific “period of time”.

Page 13

Image analysis e.g. AD0697

Low power (2.5 x), High power (20 x)

• level of expression
• subcellular localisation
• cell type specificity
• proportion of staining cells (e.g. 15%)

Page 14

X-Gal staining shows that trapped gene expression is variable

Page 15

Wellscan names cell lines

AD0333 AD0334 AD0335 AD0336 AD0337 AD0338 AD0339 AD0340

AD0341 AD0342 AD0343 AD0344 AD0345 AD0346 AD0347 AD0348

AD0365 AD0366 AD0367 AD0368 AD0369 AD0370 AD0371 AD0372

AD0373 AD0374 AD0375 AD0376 AD0377 AD0378 AD0379 AD0380

AD0349 AD0350 AD0351 AD0352 AD0353 AD0354 AD0355 AD0356

AD0357 AD0358 AD0359 AD0360 AD0361 AD0362 AD0363 AD0364

Page 16

Image Handling – GeneTrap Project

• 2-3TB of images will be generated by the automated image capturing system.

• A further system will need to capture images of colonies growing on a plate to judge when the colonies are mature.

• The captured and image-processed data will need to link to the LIMS systems we build using Oracle.

• The images also need to link to data held in MySQL databases that the group brought with them to the institute.

• Generate a resource for all 30,000 mouse genes, with samples available for each gene that can be ordered and used by any research group.

Page 17

Image Handling – C. elegans Phenotyping

• C. elegans are small nematode worms used as model organisms in studying development.

• Mutations alter the manner in which C. elegans either develops or grows.

• Images are captured via standard light microscopy and then processed by in-house algorithms.

• Relatively simple image processing allows generation of numerical phenotypic data.

• Now the system works they want to keep the images. There will be lots of them…

Page 18

Image Handling – C. elegans Phenotyping

Page 19

Image Handling – C. elegans Phenotyping

Page 20

Image Requirements at WTSI

• Gene Trap project grows tissue to confluence and wishes to keep the images as a reference - 3 TB.

• Gene Atlas is producing 20 plus TB of image data over the next two years.

• C. elegans phenotype project – modest data requirements but interesting image processing.

• All need to be linked into our LIMS, sample storage, robotics systems and gene variation databases, all developed in Oracle.

• Systems need to hold large data volumes, be protected against any data loss and potentially allow large numbers of interactive users.

Page 21

Why use Oracle interMedia?

• Could have handled the images in a file system with gatekeeper software, but backup and control become a major issue.

• Want multi-user, interactive access.

• We store massive data volumes in Oracle, our LIMS is in Oracle… Maybe use BLOBs.

• Already testing Oracle 10 Beta, and Oracle Corp came and asked us if we had any interest in images…

• Altered Beta test program to include the interMedia suite.

Page 22

And they get bigger… Development, Genes and Tissue Patterns

• Will use similar system to Gene Atlas but with much larger samples.

• Images of 10-100GB made up of many tiles.

• Multiple slices through one sample, say 20 through a mouse brain.

• Repeat for 30,000 genes, for 100’s of stages of development.

• Predicted data volume?

1.6 petabytes

Page 23

Page 24

Implementing interMedia

Tony Webb
Senior DBA at The Wellcome Trust Sanger Institute
[email protected]

Page 25

The Pilot Scheme

• The Project chosen as the pilot was The Gene ATLAS Project headed up by Dr. Gareth Maslen.

• O/S chosen was RedHat Linux Advanced Server (appropriately patched).

• Initial hardware used was a DL380 Dual Xeon CPU server with 4 Gigs of RAM and 100 Gigs of Enterprise Virtual Array (EVA) disk space.

Page 26

The Pilot Scheme

• Recently we moved onto non-BETA releases of interMedia and Oracle (Oracle 10.1.0.2) running on new hardware.

• Earlier this month the original server was decommissioned. This talk, however, covers elements from both environments.

• One of the new machines is a cluster pair of DL380s (atlasdb1a and atlasdb1b) still using 4 Gig of RAM and running OCFS on RedHat Linux Advanced Server.

Page 27

The Pilot Scheme

• Disk setup on the new cluster is very different from the original machine.

• There is more of it: Terabytes not Gigabytes!

• Disk space has changed to include a mix of the EVA store (expensive and fast SAN storage) and Bladestore (cheap IDE disks).

• Bladestore is fine for backups, bulk writes, bulk reads etc., although it should be used with caution for datafiles.

• OCFS has been installed as a prerequisite for RAC, although we are not currently using RAC on this machine.

Page 28

Getting The Data into interMedia

• This Is What Currently Happens…

[Figure: data flow diagram. Labelled elements: Workstation; Image capture; Image Files (AITIFF); XML Creation; XML Files; SQLServer2000 Cluster; SQLServer DB; SAN; Linux Cluster; Image Import; interMedia write; Oracle (HSV); Oracle (Bladestore).]

Page 29

Installing interMedia

• For the original environment, the db server s/w was downloaded from the Oracle beta tester website and installed error free. Two databases (BETA and TBETA) were then created.

• Installation of the interMedia s/w was a little troublesome. This was largely attributable to user (or rather dba!) error.

• Installation on the new environment was, however, trouble free, thanks largely to use of the Database Configuration Assistant (dbca).

Page 30

Installing interMedia

• Installation via dbca is strongly recommended

Page 31

Installing interMedia

• interMedia installation should be verified by running the following script:

@<ORACLE_HOME>/ord/im/admin/imchk.sql

Component                                          Status
-------------------------------------------------- ---------
COLORFREQUENCIESLIST Public Grant                  VALID
COLORFREQUENCIESLIST Varray                        VALID

……
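With a long component list it is easy to miss a single bad row, so the check can be automated once the script output has been spooled to a file. The Python sketch below is only an illustration. It assumes the report keeps the two-column layout shown on the slide, with the status as the last word on each line. The function name and the INVALID row in the sample are made up for the example and are not output from a real run.

```python
def non_valid_components(report: str) -> list[tuple[str, str]]:
    """Return (component, status) pairs whose status is not VALID.

    Assumes each data row ends with a one-word status, as on the slide.
    """
    problems = []
    for line in report.splitlines():
        line = line.strip()
        # Skip blank lines, the column header and the dashed ruler.
        if not line or line.startswith("Component") or set(line) <= {"-", " "}:
            continue
        parts = line.rsplit(None, 1)  # split off the last word as the status
        if len(parts) != 2:
            continue
        component, status = parts
        if status != "VALID":
            problems.append((component, status))
    return problems


# Hypothetical sample: the second row is altered to INVALID for illustration.
sample = """\
Component                                          Status
-------------------------------------------------- ---------
COLORFREQUENCIESLIST Public Grant                  VALID
COLORFREQUENCIESLIST Varray                        INVALID
"""

print(non_valid_components(sample))
# [('COLORFREQUENCIESLIST Varray', 'INVALID')]
```

An empty list means every component that was parsed reported VALID.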

Page 32

Adding Go-faster Stripes

• The interMedia Image Accelerators are not technically essential but we would certainly recommend their use.

• Installing them made a huge difference to performance of one java program, specifically use of:

OrdImage.processCopy(String, OrdImage)

where performance improved dramatically.

Page 33

Adding Go-faster Stripes

• Software installation in general is much easier, partly due to a slicker installation process.

Page 34

Database Considerations

• Tablespace organisation needs careful consideration (BIGFILEs; disk types).

• DB Initialisation parameters should be checked against recommendations in the interMedia User Guide (db_nk_block_size; shared_pool_size).

• How are backups going to be implemented? (OS; RMAN).

• Table creation also needs to be fully understood (in-line storage; chunk size; LOB location)

• Will partitioning be used? (Range; Hash, List). This is not an interMedia specific consideration.

Page 35

interMedia Demos

• Additional s/w from the OTN website, http://otn.oracle.com, was also downloaded and installed.

e.g. Imgdemo.c

Page 36

The Photo Album Servlet

• Another example - The PhotoAlbum Java Servlet

Page 37

The Photo Album Servlet

Page 38

The Java Environment

• Sample code was ‘tinkered with’ and recompiled into class files, e.g.: ($ORACLE_HOME/oc4j/j2ee/home/default-web-app/WEB-INF/classes/MelliPhotoRequest.class)

• Some initial problems setting up environment to run Java but nothing specific required for interMedia beyond changes to CLASSPATH.

• CLASSPATH:
$ORACLE_HOME/ord/jlib/ordhttp.jar:$ORACLE_HOME/ord/jlib/ordim.jar:$ORACLE_HOME/jdbc/lib/classes12.jar:$ORACLE_HOME/sqlj/lib/runtime12.jar:$ORACLE_HOME/oc4j/j2ee/home/lib/servlet.jar:/usr/opt/j2sdk1.4.2_04/lib/rt.jar:$ORACLE_HOME/jdbc/lib/classes12.jar
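The CLASSPATH on the slide lists classes12.jar twice. That is harmless, since the first match on a classpath wins, but long hand-edited paths are easier to maintain once repeats are removed. A minimal Python sketch follows. The helper name is hypothetical, and $ORACLE_HOME is left unexpanded exactly as on the slide.

```python
def dedupe_classpath(classpath: str, sep: str = ":") -> str:
    """Drop empty and repeated entries, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for entry in classpath.split(sep):
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return sep.join(kept)


# Entries as listed on the slide, in the same order.
slide_classpath = ":".join([
    "$ORACLE_HOME/ord/jlib/ordhttp.jar",
    "$ORACLE_HOME/ord/jlib/ordim.jar",
    "$ORACLE_HOME/jdbc/lib/classes12.jar",
    "$ORACLE_HOME/sqlj/lib/runtime12.jar",
    "$ORACLE_HOME/oc4j/j2ee/home/lib/servlet.jar",
    "/usr/opt/j2sdk1.4.2_04/lib/rt.jar",
    "$ORACLE_HOME/jdbc/lib/classes12.jar",
])

cleaned = dedupe_classpath(slide_classpath)
print(len(cleaned.split(":")))  # 6 (seven entries on the slide, one repeated)
```

Because the first occurrence is kept, class resolution order is unchanged by the clean-up.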

Page 39

Prototyping

Page 40

Prototyping

Page 41

Prototyping

Page 42

Going Off At A Tangent..

• But what does interMedia data look like?

CREATE TABLE PHOTOS
( ID              NUMBER NOT NULL,
  DESCRIPTION     VARCHAR2(40) NOT NULL,
  LOCATION        VARCHAR2(40),
  IMAGE           ORDSYS.ORDIMAGE,
  THUMB           ORDSYS.ORDIMAGE,
  DISPLAYORDER    VARCHAR2(2),
  UPLOADTIMESTAMP TIMESTAMP(9))

Page 43

Going Off At A Tangent..

• So what is ORDSYS.ORDIMAGE?

• It is an object type (database type).
• Owned by database user ORDSYS.
• Has attributes and methods.
• Attributes are:
    source (type: ORDSource)
    height, width and contentLength (type: INTEGER)
    fileFormat, contentFormat, compressionFormat and mimeType (type: VARCHAR2(4000))
• Methods include: setProperties, checkProperties, getHeight, getFileFormat.

Page 44

Going Off At A Tangent..

…and what is ORDSYS.ORDSOURCE?

• It is another object type.
• Also owned by database user ORDSYS.
• Again, it has attributes and methods.
• Attributes are:
    localData (type: BLOB)
    srcType, srcLocation, srcName (type: VARCHAR2(4000))
    updateTime (type: DATE)
    local (type: NUMBER)
• Methods include: getSourceInformation, import, export.
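To make the nesting of the two object types concrete, here is a rough model of their attributes as Python dataclasses. This is only a sketch of the structure listed on the slides. The class names are invented, the Python types are approximate stand-ins for the Oracle types, and none of the methods are modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class OrdSourceSketch:
    """Illustrative stand-in for ORDSYS.ORDSOURCE (attributes only)."""
    localData: bytes = b""                 # BLOB holding the image bytes
    srcType: Optional[str] = None          # VARCHAR2(4000)
    srcLocation: Optional[str] = None      # VARCHAR2(4000)
    srcName: Optional[str] = None          # VARCHAR2(4000)
    updateTime: Optional[datetime] = None  # DATE
    local: Optional[int] = None            # NUMBER


@dataclass
class OrdImageSketch:
    """Illustrative stand-in for ORDSYS.ORDIMAGE (attributes only)."""
    source: OrdSourceSketch = field(default_factory=OrdSourceSketch)
    height: Optional[int] = None
    width: Optional[int] = None
    contentLength: Optional[int] = None
    fileFormat: Optional[str] = None
    contentFormat: Optional[str] = None
    compressionFormat: Optional[str] = None
    mimeType: Optional[str] = None


# The image bytes sit two levels down, at image.source.localData.
image = OrdImageSketch()
image.source.localData = b"II*\x00"  # first four bytes of a little-endian TIFF
print(len(image.source.localData))   # 4
```

The same dotted path, column.source.localData, is how the LOB inside an ORDIMAGE column is named in the SQL*Loader control file and in the LOB storage clause in the appendix.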

Page 45

What Else Besides interMedia?

• Tools Currently Being Used By Our Developers include:

• Eclipse (Java IDE)
• Apache Webserver
• Tomcat (Servlet Container)
• OC4J (Oracle Containers for Java)
• Xerces (parses XML files -> export files)
• JDeveloper
• PL/SQL
• SQL*Loader

Page 46

Implementing interMedia

• File Formats Tested In The Pilot:

• TIFF
• AITIFF (3rd Party Variant of TIFF)
• JPEG
• JPEG2000 (although not currently an interMedia datatype)

Page 47

Implementing interMedia

“Warts And All”

• <Metalink Bug:3292143> JAVAVM STILL DOESN'T RELEASE MEMORY IN SOME CIRCUMSTANCES WITHIN A SESSION. Seems to be related to specific images but once it happens all subsequent images in that java session will fail.

• Cursors not being closed when running JDeveloper and OC4J. Same code not erroring in TOMCAT.

Page 48

Future Plans for ATLAS

• Overall tests have been encouraging.

• Live system to be created in July with a 400 Gig data load followed by Web Deployment from August?

• Will it scale? Need to investigate partitioning and backup options.

• Probable use of XMLDB, RAC and Oracle Application Server.

• More Automation for ATLAS (SQLServer replaced by Oracle?).

• Working with Oracle to develop new interMedia functionality.

Page 49

Future Plans For interMedia

• The next few slides cover some of the features (in no particular order) that we would like to see incorporated into interMedia and we are working with Oracle in these areas.

Page 50

Future Plans: Image Stitching

• Automated microscopy systems produce tiles of pictures, often several tiles per whole slide.

• Image Stitching is not a simple process and different microscopy systems will implement different algorithms or may not do it at all.

• Carrying out the stitching with one piece of code gives consistency.

• RDBMS products do not support this today.

Page 51

Future Plans: Image Stitching

Page 52

Future Plans: Image Stitching Requirements

Oracle is currently collecting requirements:

• Stitching: Done by the RDBMS or externally.
  – It’s more flexible if the same algorithm is available both ways.
  – A single algorithm limits introduced artifacts in the final image and any auto-annotation stands a good chance of not annotating them!

• Storage options: store tiles and stitch in real-time, or stitch and store the final image.

• Annotations: Support for stitched images.

Page 53

Future Plans: Metadata Requirements

• As seen earlier, Oracle interMedia associates physically descriptive metadata with images, e.g. size, creation date, etc.

• interMedia updates the associated metadata when an image format transformation occurs.

Requirements:

• Metadata association: When images are stitched or cut, the association should be maintained.

• User-defined metadata support: extend interMedia metadata to include user-defined attributes.

Page 54

Future Plans: Layers Requirements

• Layers can be used to add annotations over an existing image, leaving the original data unaltered.
  – e.g., a cytological slide can be “drawn over” to highlight those areas that are of interest.

• A reference template can be laid over an image.
  – e.g., an idealised embryo outline laid over a 7-day mouse embryo slide, opening a way to automated annotation.

Requirement:

• Allow one or more annotation layers to be associated with a single image to create an annotation set.

Page 55

Future Plans: New Image Format Requirements

interMedia supports popular image formats today.

Requirements:

• JPEG 2000: A new format that allows many interesting features - lossless compression, supported by standard browsers (unlike TIFF), extends to ‘movie’ data, allows encryption…

• DICOM: a format being used by medical organisations. Includes specifications for metadata.

Page 56

Future Plans: Automated Annotation

• The Gene Atlas project will be able to produce several thousand images a day.

• An experienced cytologist can annotate a small number of hundreds of images a day…

• Charitable organisations can’t afford to employ 183 cytologists…

• The solution is automated annotation….

… Which is as likely as 3D protein structure prediction to actually work ☺

Page 57

Future Plans: Automated Annotation Requirements

• Screening Criteria: Support multiple criteria such as colour depth and distribution to reduce the results set.

• Remote Annotation: Augment annotation resources by allowing the wider scientific community to annotate images remotely.

• Life science community call to action:
  – Develop image annotation standards.
  – Encourage 3rd party tool and application vendors to adopt these standards and support Oracle Database as a storage option for images and annotations.

• WTSI will have a massive data set, which may well encourage 3rd parties to develop automated systems.

Page 58

End Of My Bit

• Over to Melli.

Page 59

This Page Deliberately Left Blank

Page 60

• ..But Not This One ☺

Page 61

Implementing interMedia - Appendix

• Loading data via SQL*Loader:
sqlldr parfile=sangerex.par

• sangerex.par:
userid=intermedia/intermedia
control=/oracle/home/oracle/intermedia/melli/sangerex.ctl
log=/oracle/home/oracle/intermedia/melli/sangerex.log
direct=y

• sample table:
create table stockphotos
(photo_id integer, image ordsys.ordimage);

Page 62

Implementing interMedia - Appendix

• Sample ctl file:
LOAD DATA
INFILE *
INTO TABLE stockphotos
REPLACE
FIELDS TERMINATED BY ','
(photo_id,
 image column object
  (source column object
    (localData_fname FILLER CHAR(12),
     localData LOBFILE(image.source.localData_fname) raw terminated by EOF
    )
  )
)
BEGINDATA
1,core1.tif,
2,core2.tif
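With hundreds of thousands of image files, the rows after BEGINDATA would be generated rather than typed. The Python sketch below shows one way to do that, under stated assumptions. The function name is hypothetical. The 12-character limit mirrors the FILLER CHAR(12) field in the sample control file. The trailing comma on every row follows the first data row on the slide. Whether SQL*Loader needs it has not been checked here.

```python
def begindata_rows(filenames, start_id=1, max_name_len=12):
    """Build 'photo_id,filename,' rows for the BEGINDATA section.

    max_name_len mirrors FILLER CHAR(12) in the sample control file.
    """
    rows = []
    for photo_id, name in enumerate(filenames, start=start_id):
        if "," in name:
            # A comma would be read as a field terminator by the loader.
            raise ValueError(f"comma in file name: {name!r}")
        if len(name) > max_name_len:
            raise ValueError(f"file name longer than {max_name_len} chars: {name!r}")
        rows.append(f"{photo_id},{name},")
    return rows


for row in begindata_rows(["core1.tif", "core2.tif"]):
    print(row)
# 1,core1.tif,
# 2,core2.tif,
```

Rejecting over-long names up front is a design choice for the sketch; the alternative is to widen the CHAR field in the control file.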

Page 63

Implementing interMedia - Appendix

• Storing lobs in a different tablespace:
CREATE TABLE NEW_IMAGE(
  ID          NUMBER,
  NAME        VARCHAR2(256),
  DESCRIPTION VARCHAR2(255),
  IMG         ORDSYS.ORDIMAGE,
  CREATOR     NUMBER NOT NULL,
  ATLAS_GROUP NUMBER NOT NULL,
  CREATED     DATE NOT NULL,
  INSERTED    DATE NOT NULL,
  PATH        VARCHAR2(256),
  HOST        VARCHAR2(256),
  URL         VARCHAR2(256),
  THUMBNAIL   ORDSYS.ORDIMAGE
)
TABLESPACE DATA_01
STORAGE (INITIAL 504K
         MAXEXTENTS UNLIMITED
         PCTINCREASE 0)
LOB (IMG.source.localData, THUMBNAIL.source.localData) STORE AS
  (TABLESPACE DATA_02 STORAGE (INITIAL 1M NEXT 1M) CHUNK 4);

Page 64

Implementing interMedia - Appendix

select tablespace_name from user_tables where table_name = 'NEW_IMAGE';

TABLESPACE_NAME
------------------------------
DATA_01

select column_name, tablespace_name from user_lobs where table_name = 'NEW_IMAGE';

COLUMN_NAME
--------------------------------------------------------------------------------
TABLESPACE_NAME
------------------------------
"IMG"."SOURCE"."LOCALDATA"
DATA_02

"THUMBNAIL"."SOURCE"."LOCALDATA"
DATA_02