Integration of heterogeneous
data sources for high content
screening data exploitation and
exchange
Institut Curie, 6 January 2015
Elton Rexhepaj, MSc, PhD
Presentation layout
• High content screening and the visualisation
functionalities it requires
• Institut Curie framework for microscopy
imaging content management
• Tools developed to integrate heterogeneous
data
• Conclusion and perspectives
Who are we?
BioPhenics is an HCS technology platform that supports
research teams in carrying out high-content screens.
Translational research projects involve clinicians working on
cancer-related cell models for target validation and/or drug
repositioning.
Academic research projects involve research groups
working on cancer-related questions.
We support researchers with (1) assay development, (2) image data
acquisition and (3) data analysis.
To deal with the complexity of the analysis and of the underlying biological
questions, software tools are needed for data sharing and communication.
High content screening
workflow
• (1) Assay development: which antibodies (Abs)
to use? Which incubation time?
• (2) Robotics for scaling up the
screening process
• (3) Image acquisition
High throughput removes subjectivity from the data but is
still prone to technical biases that need to be checked.
High-content screening (HCS), also
known as high-content analysis (HCA)
or cellomics, is a method that is used in
biological research and drug discovery
to identify substances such as small
molecules, peptides, or RNAi that alter
the phenotype of a cell in a desired
manner.
Typical screen: 1,000 molecules × 2 replicates = ~1 TB of data, ~40,000 images
High Content Screening Analysis pipeline
Sharing imaging data can enrich phenotyping and allow better
exploitation of the HCS data
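To put the "typical screen" figures in perspective, this short Python sketch derives per-well and per-image averages. It assumes one molecule per well (not stated on the slide) and decimal units (1 TB = 10^12 bytes); the function name `screen_volume` is hypothetical.

```python
def screen_volume(molecules: int, replicates: int,
                  total_bytes: float, total_images: int) -> dict:
    """Average data volume per well and per image for an HCS screen.

    Assumes one molecule per well, so wells = molecules * replicates.
    """
    wells = molecules * replicates
    return {
        "wells": wells,
        "images_per_well": total_images / wells,
        "mb_per_image": total_bytes / total_images / 1e6,  # decimal megabytes
    }


# Figures quoted for a typical screen: 1,000 molecules, 2 replicates,
# ~1 TB of data and ~40,000 images.
print(screen_volume(1_000, 2, 1e12, 40_000))
# {'wells': 2000, 'images_per_well': 20.0, 'mb_per_image': 25.0}
```

Under these assumptions each well accounts for about 20 images of roughly 25 MB each, which is the scale that image pre-processing and compression have to handle before interactive visualisation.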
CID iManage > Institut Curie
Avadis® iManage: Images, Annotations, Apps, Results, Settings
Interfaces: Access UI, Acquisition UI, Access APIs (SOAP/XML/RPC)
Prepare Acquire Analyze Share Disseminate Visualize
• Image server + metadata + annotations (manual or analysis results) /
attachments (publications, xls files…)
• Acquisition client, web client interface, web admin for project
management
• Computing cluster
• Image storage
• Dynamic organisation, visual search or advanced search functionalities
• Metadata (pixel size, acquisition time,…), annotations
• Automatic analysis without full download
• Data fusion, advanced visualisation
BioImaging Cell and Tissue Core Facility
http://pict-ibisa.curie.fr/
Underlying hardware infrastructure
[Diagram: Avadis® iManage (Images, Annotations, Apps, Results, Settings; Access APIs SOAP/XML/RPC) alongside a Collaborator, a User and a question mark]
Software tool needs to be addressed
Real-time Unified Bio-Imaging Exploitation
System > RUBIES
As part of the France-BioImaging network, we developed a Real-time Unified
Bio-Imaging Exploitation System to answer data sharing needs.
Liferay technology was selected as an open source platform to develop the
RUBIES portal.
RUBIES uses open source tools and widely known paradigms to create a
collaborative platform to visualise and exchange HCS data.
Heterogeneous data from HCS are analysed in real time to enrich
databases, which are exposed through a web service layer.
WebLab integration framework: a Service-Oriented Architecture in which individual
components are encapsulated as web services with common interfaces and communication protocols.
Heterogeneous components (algorithms, data sources, indexes, external tools)
• Internal components are separated from the communication layer
• Web services use standardized WebLab interfaces and data exchange protocols.
• Ensures compatibility between old and new services.
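The separation between internal components and a shared communication interface can be sketched in Python. This is not the WebLab API: the `Resource` record, the `Service` protocol and the two stand-in components are hypothetical, chosen only to show that an orchestrator depending on one common `process` call can chain, add or swap heterogeneous components without touching the others.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Resource:
    """Hypothetical exchange document passed between services:
    an identifier plus annotations added along the way."""
    uri: str
    annotations: Dict[str, str] = field(default_factory=dict)


class Service(Protocol):
    """Hypothetical common interface: every component exposes the same
    `process` call, whatever algorithm or data source it wraps."""
    def process(self, resource: Resource) -> Resource: ...


class CellCounter:
    """Stand-in for an image-analysis component (illustrative only)."""
    def __init__(self, counts: Dict[str, int]):
        self._counts = counts  # internal state, hidden behind `process`

    def process(self, resource: Resource) -> Resource:
        resource.annotations["cellCount"] = str(self._counts.get(resource.uri, 0))
        return resource


class PlateLookup:
    """Stand-in for a data-source component (illustrative only)."""
    def __init__(self, plate_of: Dict[str, str]):
        self._plate_of = plate_of

    def process(self, resource: Resource) -> Resource:
        resource.annotations["plate"] = self._plate_of.get(resource.uri, "unknown")
        return resource


def run_chain(services: List[Service], resource: Resource) -> Resource:
    """Orchestrator: relies only on the shared interface, so services can be
    added, removed or replaced without changing the others."""
    for service in services:
        resource = service.process(resource)
    return resource


# Made-up inputs for illustration.
well = Resource("urn:example:well/A01")
chain = [PlateLookup({"urn:example:well/A01": "P1"}),
         CellCounter({"urn:example:well/A01": 412})]
print(run_chain(chain, well).annotations)
# {'plate': 'P1', 'cellCount': '412'}
```

Because the orchestrator never inspects a component's internals, replacing `CellCounter` with a newer implementation only requires that it keep the same `process` signature, which is the kind of old/new compatibility the bullet points describe.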
Liferay technology choice for development
• Any kind of information may be created by a service without needing to change
other services.
(subject, predicate, object) => (Project URI, hasId, “016”)
• Visualization is done through portal and portlet technology. Each portlet is able to call user-specific types of data or processing.
• Portlets are loosely coupled elements, easing maintenance.
• Pages can be easily created by users in order to be tailored to a specific task or work methodology.
Data exchange protocols use RDF/JSON
for annotation storage and communication
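The triple model and its RDF/JSON encoding can be illustrated with the standard library alone. The sketch groups triples as subject, then predicate, then a list of objects, following the layout of the W3C RDF/JSON note; the `example.org` URIs and the `hasPlate` predicate are hypothetical placeholders, not the vocabulary used by RUBIES.

```python
import json
from collections import defaultdict
from typing import Dict, List, NamedTuple

RdfJson = Dict[str, Dict[str, List[Dict[str, str]]]]


class Triple(NamedTuple):
    subject: str              # URI of the described resource
    predicate: str            # URI of the property
    obj: str                  # object value
    obj_is_uri: bool = False  # False means the object is a plain literal


def to_rdf_json(triples: List[Triple]) -> RdfJson:
    """Group triples as subject -> predicate -> [objects] (RDF/JSON layout)."""
    graph: Dict[str, Dict[str, List[Dict[str, str]]]] = defaultdict(lambda: defaultdict(list))
    for t in triples:
        graph[t.subject][t.predicate].append(
            {"type": "uri" if t.obj_is_uri else "literal", "value": t.obj}
        )
    # Convert the nested defaultdicts to plain dicts before serialising.
    return {s: dict(preds) for s, preds in graph.items()}


EX = "http://example.org/"  # hypothetical namespace
triples = [
    # (Project URI, hasId, "016"), the example given on the slide
    Triple(EX + "project/016", EX + "vocab#hasId", "016"),
    # Another service can add a new kind of statement without any schema change
    Triple(EX + "project/016", EX + "vocab#hasPlate", EX + "plate/P1", obj_is_uri=True),
]
print(json.dumps(to_rdf_json(triples), indent=2))
```

Because every statement has the same three-part shape, a new predicate is just another key under its subject; consumers that ignore predicates they do not know keep working unchanged.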
[Figure panels: Cell count, Lysosome count, IF Granularity; Raw TIFF vs. JPEG 2000 (Q=80)]
IF average intensity per well for lysosome staining (endocytosis
screen) / raw data (384-well plates)
Plate-wise robust z-score normalisation of the average IF intensity of the cell population,
prior to visualisation
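The slide does not give the exact formula. A common definition of the robust z-score, assumed here, replaces the mean and standard deviation with the plate median and the median absolute deviation (MAD), scaled by 1.4826 so that it matches the standard deviation for normally distributed data: z = (x − median) / (1.4826 × MAD). The function names below are hypothetical.

```python
from statistics import median
from typing import Dict, List

MAD_SCALE = 1.4826  # makes the MAD consistent with the SD for normal data


def robust_z_scores(values: List[float]) -> List[float]:
    """Robust z-score of each well relative to the other wells of its plate."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero: robust z-score is undefined for this plate")
    return [(v - med) / (MAD_SCALE * mad) for v in values]


def normalise_plates(plates: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply the robust z-score separately to each plate (keyed by plate ID)."""
    return {plate_id: robust_z_scores(wells) for plate_id, wells in plates.items()}


# Made-up well intensities: plate P2 is P1 shifted by +100 (e.g. drift between plates).
scores = normalise_plates({
    "P1": [10.0, 12.0, 11.0, 13.0, 50.0],
    "P2": [110.0, 112.0, 111.0, 113.0, 150.0],
})
print(scores["P1"] == scores["P2"])  # True: the plate offset is removed
```

Normalising per plate removes offsets between plates acquired at different times, and the outlying well (50) barely moves the median and MAD, so a strong hit does not mask itself as it would with a mean/SD z-score.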
1) Image normalisation and contrast enhancement; 2) JPEG compression
Data pre-processing prior to visualisation
Compression is necessary to reduce client/server communication throughput (10-fold
decrease) and remains adequate for cell and organelle segmentation
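Step 1 can be sketched as a percentile-based contrast stretch that maps raw (for example 16-bit) intensities to 8-bit display values. The 1st/99th percentile cut-offs and the function names are assumptions, not parameters taken from the slide; step 2, the compression itself, would be delegated to an imaging library and is not shown.

```python
from typing import List


def percentile(sorted_vals: List[int], p: float) -> int:
    """Nearest-index percentile of an already sorted list (0 <= p <= 100)."""
    if not sorted_vals:
        raise ValueError("empty image")
    k = round(p / 100 * (len(sorted_vals) - 1))
    return sorted_vals[k]


def stretch_to_8bit(pixels: List[int], low_p: float = 1.0, high_p: float = 99.0) -> List[int]:
    """Linearly map the [low_p, high_p] percentile range to 0..255, clipping outliers."""
    s = sorted(pixels)
    lo, hi = percentile(s, low_p), percentile(s, high_p)
    if hi <= lo:
        return [0] * len(pixels)  # flat image: nothing to stretch
    out = []
    for v in pixels:
        x = (v - lo) / (hi - lo)   # map [lo, hi] onto [0, 1]
        x = min(max(x, 0.0), 1.0)  # clip dark and saturated outliers
        out.append(round(x * 255))
    return out


raw = [100, 120, 4000, 130, 65535, 110]  # made-up 16-bit pixel values
print(stretch_to_8bit(raw))
```

Clipping at percentiles rather than at the minimum and maximum keeps a few saturated or dead pixels from squeezing the useful dynamic range, and going from 16 to 8 bits per pixel already halves the data size before compression.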
[Figure: normalized and raw well intensity for experimental plates in acquisition order; example images in the DAPI, Endosome and Lysosome channels]
Background services provide capability to search for elements through plain text queries
Interface for dynamic search of content
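The slides do not say how the background search services are implemented. A minimal way to answer plain-text queries is an inverted index that maps each term to the elements containing it; the sketch below, with a hypothetical `InvertedIndex` class and made-up element identifiers, returns the elements that match all query terms. A production portal would more likely rely on a dedicated full-text search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)  # term -> element IDs

    def add(self, element_id: str, text: str) -> None:
        """Index the searchable text (annotations, metadata) of one element."""
        for term in tokenize(text):
            self._postings[term].add(element_id)

    def search(self, query: str) -> Set[str]:
        """Return the elements containing every term of the query (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


idx = InvertedIndex()
idx.add("plate1/A01", "Lysosome staining, endocytosis screen, DAPI")
idx.add("plate1/B02", "DAPI nuclei, control well")
print(idx.search("dapi lysosome"))  # {'plate1/A01'}
```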
Selected fields (and the corresponding metadata, in the lower-right corner) can be exported to external applications, in this case Avadis iManage (commercial), through specialized portlets.
Export of selected annotation/images
Export of selected annotation/images /
multiple selection
Export of mixed automated and
experimental annotations with images
Conclusions
RDF, a web standard that represents any kind of information as statements
(collections of triples), is suitable for HCS data (NoSQL approach).
Image preprocessing (normalisation/compression) is a critical step for HCS
data visualisation (throughput) and for the visual assessment of results.
RUBIES can interoperate with other data management systems and hence
integrate HCS image and annotation data with other experimental output.
Concurrent access to the portal and user management are supported, with
job parallelisation on the computing cluster.
RUBIES is still a work in progress; however, the visualisation interfaces are
fully functional and we are happy to share our work with the community.
Further work: user-added annotations and exploitation of image and
metadata in order to create cell image dictionaries for pattern recognition.
Franck Perez
Philippe Benaroche
Jacque Chamonix
Elaine Del Nery
Aurianne Lescure
Sarah Tessier
Dmitry Vjostockolevic
Elodie Anthony
Acknowledgments Curie Institute - BIOPHENICS
Curie Institute – UMR 144
Jean Salamero
Perrine Paul-Gilloteaux