Integration of heterogeneous data sources for high content screening data exploitation and exchange

Institut Curie, 6th January 2015

Elton Rexhepaj, MSc, PhD

Presentation layout

• High content screening and the visualisation functionalities needed

• Institut Curie framework for microscopy imaging content management

• Tools developed to integrate heterogeneous data

• Conclusion and perspectives

Who are we?

BioPhenics is a technological HCS platform that supports research teams in their high-content screening needs.

Translational research projects involve clinicians working on cancer-related cell models for target validation and/or drug repositioning.

Academic research projects involve research groups working on cancer-related topics.

We support researchers with (1) assay development, (2) image data acquisition and (3) data analysis.

To deal with the complexity of the analysis and of the underlying biological questions, software tools are needed for data sharing and communication.

High content screening workflow

• (1) Assay development. Which antibodies (Abs) to use? Incubation time?

• (2) Robotics for scaling up the screening process

• (3) Image acquisition

High throughput removes subjectivity from the data but is still prone to technical bias that needs to be validated.

High-content screening (HCS), also known as high-content analysis (HCA) or cellomics, is a method that is used in biological research and drug discovery to identify substances such as small molecules, peptides, or RNAi that alter the phenotype of a cell in a desired manner.

High Content Screening Analysis pipeline

Typical screen: 1000 molecules, 2 replicates = 1 TB of data, ~40000 images

Sharing imaging data can enrich phenotyping and allow better exploitation of the HCS data.
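As a rough back-of-the-envelope check of the typical screen figures quoted on this slide, the short sketch below derives the average image size and the number of images per molecule and replicate. It assumes a decimal terabyte (10^12 bytes) and images spread evenly over all molecules and replicates; controls and plate layout are ignored, and the variable names are illustrative only.

```python
# Figures quoted on the slide (the image count is approximate)
molecules = 1000
replicates = 2
total_images = 40_000
total_bytes = 1e12  # assumption: 1 TB taken as 10**12 bytes

# Images acquired per molecule and per replicate, assuming an even spread
images_per_condition = total_images / (molecules * replicates)

# Average size of one image in megabytes (10**6 bytes)
mb_per_image = total_bytes / total_images / 1e6

print(f"{images_per_condition:.0f} images per molecule and replicate")
print(f"{mb_per_image:.0f} MB per image on average")
```

Under these assumptions a screen produces about 20 images per molecule and replicate, at roughly 25 MB each, which gives a sense of why full downloads to a client are impractical.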

CID iManage > Institut Curie

Figure (Avadis® iManage diagram), labels: Images, Annotations, Apps, Results, Settings; Acquisition UI, Access UI; Access APIs SOAP/XML/RPC; Prepare, Acquire, Analyze, Share, Disseminate, Visualize.

BioImaging Cell and Tissue Core Facility

http://pict-ibisa.curie.fr/

Figure (architecture diagram), components:

• Image server + metadata + annotations (manual or analysis results) / attachments (publications, xls file…)

• Acquisition client, web client interface, web admin for project management

• Computing cluster

• Image storage

Functionalities listed on the diagram:

• Dynamic organisation, visual search or advanced search functionalities

• Metadata (pixel size, acquisition time,…), annotations

• Automatic analysis without full download

• Data fusion, advanced visualisation

Underlying hardware infrastructure

Software tool needs to be addressed

Figure (diagram), labels: Avadis® iManage (Images, Annotations, Apps, Results, Settings); Access APIs SOAP/XML/RPC; Collaborator; User; "?".

Real-time Unified Bio-Imaging Exploitation System > RUBIES

As part of the France Bio-Imaging network, we developed a Real-time Unified Bio-Imaging Exploitation System to answer data sharing needs.

Liferay technology was selected as an open source platform to develop the RUBIES portal.

RUBIES uses open source tools and widely known paradigms to create a collaborative platform to visualise and exchange HCS data.

Heterogeneous data from HCS are analysed in real time in order to enrich databases, which are exposed through a web service layer.

WebLab integration framework: a Service Oriented Architecture where individual components are encapsulated as web services with common interface and communication protocols.

Liferay technology choice for development

Heterogeneous components (algorithms, data sources, indexes, external tools)

• Internal components are separated from the communication layer.

• Web services use standardized WebLab interfaces and data exchange protocols.

• Ensures compatibility between old and new services.
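The separation between internal components and the communication layer described on this slide can be illustrated with a minimal sketch. This is not the WebLab or Liferay API: the `Service` base class, the JSON envelope and the two example components (`CellCounter`, `TextSearch`) are hypothetical names chosen for illustration. The point is only that each component implements its own logic, while a shared wrapper handles decoding and encoding.

```python
import json
from abc import ABC, abstractmethod


class Service(ABC):
    """Hypothetical common interface shared by every component."""

    name: str

    @abstractmethod
    def process(self, payload: dict) -> dict:
        """Internal logic of the component; knows nothing about transport."""

    def handle(self, request_json: str) -> str:
        """Communication layer: decode the request, delegate, encode the reply."""
        payload = json.loads(request_json)
        result = self.process(payload)
        return json.dumps({"service": self.name, "result": result})


class CellCounter(Service):
    """Example algorithm component: counts the cells listed in a request."""

    name = "cell-counter"

    def process(self, payload: dict) -> dict:
        return {"cell_count": len(payload["cells"])}


class TextSearch(Service):
    """Example index component: case-insensitive substring search."""

    name = "text-search"

    def __init__(self, documents):
        self.documents = list(documents)

    def process(self, payload: dict) -> dict:
        query = payload["query"].lower()
        return {"hits": [d for d in self.documents if query in d.lower()]}


counter_reply = CellCounter().handle('{"cells": [1, 2, 3]}')
search_reply = TextSearch(["Lysosome plate 1", "DAPI plate 2"]).handle(
    '{"query": "lysosome"}'
)
```

Because both components are called through the same `handle` method, a new service can be added, or an old one replaced, without touching the others.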

Data exchange protocols use RDF/JSON for annotation storage and communication

• Any kind of information may be created by a service without needing to change other services.

(subject, predicate, object) => (Project URI / hasId / “016”)

• Visualization is done through portal and portlet technology. Each portlet is able to call user-specific types of data or processing.

• Portlets are loosely coupled elements, easing maintenance.

• Pages can be easily created by users in order to be tailored to a specific task or work methodology.
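The (Project URI / hasId / “016”) statement from this slide can be sketched as a small RDF/JSON-style structure, in which each subject maps to its predicates and each predicate to a list of object values. The URIs, the `add_triple` helper, the `cellCount` predicate and its value are made up for illustration; the actual RUBIES vocabulary and storage layout are not given in the slides.

```python
import json


def add_triple(graph, subject, predicate, value, value_type="literal"):
    """Add one (subject, predicate, object) statement to an RDF/JSON-style dict."""
    graph.setdefault(subject, {}).setdefault(predicate, []).append(
        {"type": value_type, "value": value}
    )
    return graph


# Hypothetical URIs, used only for this example
PROJECT = "http://example.org/project/016"
HAS_ID = "http://example.org/schema#hasId"
CELL_COUNT = "http://example.org/schema#cellCount"

graph = {}

# Statement from the slide: (Project URI, hasId, "016")
add_triple(graph, PROJECT, HAS_ID, "016")

# A second service adds its own statement (made-up value);
# the existing statement is left untouched
add_triple(graph, PROJECT, CELL_COUNT, "1523")

# Serialise for storage or exchange, then read it back
serialized = json.dumps(graph, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Since every statement is an independent triple, a service can attach a new kind of annotation to a project without any schema change on the side of the other services.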

Data pre-processing prior to visualisation

1) Image normalisation and contrast enhancement 2) JPEG compression

Compression is necessary to reduce the volume of client/server communication (10-fold decrease), and the compressed images are still fine for cell and organelle segmentation.

Figure labels: Raw tiff, JPEG2000 (Q=80); Cell count, Lysosome count, IF Granularity; DAPI, Endosome, Lysosome.

Plate robust z-score normalisation of average cell population IF intensity prior to visualisation

Figure: IF average intensity per well for lysosome staining (endocytosis screening) / Raw data (384-well plates). Axis labels: Experimental plates in acquisition order; Raw well intensity; Normalized well intensity.
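The plate robust z-score normalisation mentioned on this slide can be sketched as follows. The slides do not give the exact formula used, so this example assumes the common definition based on the plate median and the median absolute deviation (MAD), with the usual 1.4826 scaling constant; the function name and the well intensities are invented for illustration.

```python
from statistics import median


def robust_z_scores(values):
    """Return robust z-scores: (x - median) / (1.4826 * MAD).

    Assumption: MAD-based definition; 1.4826 makes the MAD comparable
    to a standard deviation for normally distributed data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


# Made-up average IF intensities for six wells of one plate;
# the last well stands in for a strong hit or an artefact
plate = [100.0, 102.0, 98.0, 101.0, 99.0, 250.0]
z = robust_z_scores(plate)
```

Because the median and the MAD are barely affected by a few extreme wells, the outlying well gets a very large score while the remaining wells stay close to zero, which keeps plates comparable when they are displayed side by side.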

Interface for dynamic search of content

Background services make it possible to search for elements through plain-text queries.

Export of selected annotation/images

Selected fields (and corresponding metadata, in the lower-right corner) can be exported to external applications, in this case Avadis iManage (commercial), through specialized portlets.

Export of selected annotation/images / multiple selection

Export of mixed automated and experimental annotations with images

Conclusions

RDF is a web standard suited to representing any kind of information statement as a collection of triples for HCS data (NoSQL approach).

Image preprocessing (normalisation/compression) is a critical step for HCS data visualisation (i.e. throughput) and visual assessment of results.

RUBIES can interoperate with other data management systems and hence integrate HCS image and annotation data with other experimental output.

Concurrent access to the portal and user management are supported by job parallelisation on the computing cluster.

RUBIES is still a work in progress; however, the visualisation interfaces are fully functional and we are happy to share our work with the community.

Further work: user-added annotations and exploitation of image and metadata in order to create cell image dictionaries for pattern recognition.

Acknowledgments

Franck Perez

Philippe Benaroche

Jacque Chamonix

Elaine Del Nery

Aurianne Lescure

Sarah Tessier

Dmitry Vjostockolevic

Elodie Anthony

Curie Institute - BIOPHENICS

Curie Institute – UMR 144

Jean Salamero

Perrine Paul-Gilloteaux