clinical data eav
Introduction Relational Database Clinical Annotation data CHCDB CHCDBWEB Conclusions
CHCDB & CHCDBWEB
Clinical Annotation Database and web interface
Thomas Burguiere
INSERM Unite 674
May 5th, 2011
Thomas Burguiere (INSERM Unite 674) CHCDB & CHCDBWEB May 5th, 2011 1 / 35
1 Introduction
2 Relational Database
   Principle
   Benefits
   Interface
3 Clinical Annotation data
   Specificities
   E.A.V.
4 CHCDB
5 CHCDBWEB
6 Conclusions
• 1500 liver tumor samples
• Malignant (HCC) and benign (HCA) tumors
• Normal Tissue
Existing Data
• Clinical annotations of malignant tumors (4D)
• Excel files which contain:
  • Clinical annotations of malignant & benign tumors
  • Other annotations (mutations, clinical studies, etc.)
  • Tissue extraction listings (concentrations / quantities)
Problems
• Clinical annotations of malignant tumors can only be accessed on a single machine
• Redundant data among different files
• Duplicated files on different machines
→ Discrepancies between different files
→ Cross-checking data between the different data sources is cumbersome
Principle
Relational Database : Software
• Relational Database Management System: software which contains and organizes data (Oracle™, MySQL, DB2™, SQL Server™, etc.)
• Client Server architecture:
  • Server software, which manages data, installed on a single machine
  • Client software, which queries the server, installed on any machine used to consult the database
Client Server architecture
[Figure: client-server architecture diagram]
Relational Database : Data
• The data is stored in a set of tables
• One can define a set of constraints on the data contained in the tables
• The tables can be linked to one another by logical links: integrity constraints
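The three kinds of constraints can be tried out in a few lines. The following is only a sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the server; the table names tissue and histology, the column names, the adenoma count and the IDs T2 and T999 are hypothetical and chosen for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Hypothetical two-table layout: table and column names are illustrative.
conn.executescript("""
CREATE TABLE tissue (
    tissue_id  TEXT PRIMARY KEY,                                    -- uniqueness constraint
    tumor_type TEXT NOT NULL CHECK (tumor_type IN ('HCC', 'HCA')),  -- constraint on values
    sex        TEXT CHECK (sex IN ('F', 'M'))
);
CREATE TABLE histology (
    tissue_id   TEXT PRIMARY KEY REFERENCES tissue (tissue_id),     -- integrity constraint
    nb_adenomas INTEGER CHECK (nb_adenomas >= 0)
);
""")

conn.execute("INSERT INTO tissue VALUES ('CHC358T', 'HCA', 'F')")
conn.execute("INSERT INTO histology VALUES ('CHC358T', 2)")  # made-up count


def rejected(sql):
    """Return True if the database refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Same tissue identifier twice: refused by the primary key.
duplicate_rejected = rejected("INSERT INTO tissue VALUES ('CHC358T', 'HCC', 'M')")
# Tumor type outside the allowed list: refused by the CHECK constraint.
bad_value_rejected = rejected("INSERT INTO tissue VALUES ('T2', 'unknown', 'F')")
# Histology row for a tissue that does not exist: refused by the foreign key.
orphan_rejected = rejected("INSERT INTO histology VALUES ('T999', 1)")

print(duplicate_rejected, bad_value_rejected, orphan_rejected)
```

Each refused insert is an inconsistency that would have gone unnoticed in a spreadsheet.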
Relational Database : Example
[Table: one row per tissue (e.g. CHC358T), with columns TissueID, TumorType, Sex, Age, Steatosis, nb_adenomas, CRP]
Classical table (e.g. Excel sheet)
Relational Database : Example
1 Breaking down data
2 Typing constraints
3 Uniqueness constraints
4 Integrity constraints
[Tables: the classical table broken down into Table 1 (TissueID, TumorType, Sex, Age) and Table 2 (TissueID, Steatosis, nb_adenomas, CRP); columns are typed as VARCHAR, INT, BOOLEAN or FLOAT]
Benefits
Relational Database : Benefits
• Data centralisation on the server side
• Constraints help prevent some data inconsistencies
→ Consistent data
• Efficient: tables containing millions of rows can be easily manipulated
• Querying a correctly structured database allows one to cross-check data very rapidly
Interface
A database requires a graphical interface
• Data manipulation in a database is done exclusively with queries written in SQL (Structured Query Language)
• For example, retrieving the classical table row of tissue CHC358T (TissueID, TumorType, Sex, Age, Steatosis, nb_adenomas, CRP) from table1 and table2:
    SELECT table1.TissueID, table1.TumorType, table1.Sex, table1.Age,
           table2.Steatosis, table2.nb_adenomas, table2.CRP
    FROM table1
    INNER JOIN table2 ON (table1.TissueID = table2.TissueID)
    WHERE table1.TissueID = 'CHC358T';
• Powerful language, albeit counterintuitive
→ A graphical interface must be associated with the database
Graphical interface : principle
Mechanism
1 The interface receives instructions from the user, and transforms them into SQL queries sent to the server
2 The server receives the SQL queries, and sends back results
3 The interface receives the results from the server, and displays them to the user
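The three steps above can be sketched as three small functions. This is an illustration only, not the CHCDBWEB code: the function names, the tissue table layout and the use of an in-memory SQLite database in place of the MySQL server are all assumptions.

```python
import sqlite3


def build_query(tissue_id):
    """Step 1: turn a user request into a parameterized SQL query."""
    sql = "SELECT tissue_id, tumor_type, sex FROM tissue WHERE tissue_id = ?"
    return sql, (tissue_id,)


def run_query(conn, sql, params):
    """Step 2: the server executes the query and sends back rows."""
    return conn.execute(sql, params).fetchall()


def render(rows):
    """Step 3: format the rows for display to the user."""
    if not rows:
        return "No matching tissue"
    return "\n".join(
        f"{tissue_id}: {tumor_type} ({sex})" for tissue_id, tumor_type, sex in rows
    )


# Stand-in for the database server (hypothetical table layout).
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE tissue (tissue_id TEXT PRIMARY KEY, tumor_type TEXT, sex TEXT)")
server.execute("INSERT INTO tissue VALUES ('CHC358T', 'HCA', 'F')")

sql, params = build_query("CHC358T")
page = render(run_query(server, sql, params))
print(page)
```

Passing the tissue identifier as a query parameter, rather than pasting it into the SQL string, keeps user input from being interpreted as SQL.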
Interface types
• 2 types of interface : desktop program or web interface
• In our case, we decided to develop a web interface
Web interface
• Software installed on a single machine : the web server
• Accessing the interface only requires a web browser
→ Avoids installation and maintenance issues on the client machines
→ Avoids OS compatibility issues (Mac, Windows, Linux, etc.)
Web client / server architecture
[Figure: web client / server architecture diagram]
Specificities
Specific features of clinical annotation data
Specificities
• New variables are frequently added
• Data regarding the same variable can be entered differently, depending on sample provenance and type (malignant or benign tumor)
Consequences in a database
• Frequent addition of new columns or sub-tables
• Tables contain a lot of columns, with sparsely filled rows
→ Constant maintenance of the database
Clinical annotation data must be stored in a specific database structure: the E.A.V. structure
E.A.V.
Principle
• E.A.V. = Entity Attribute Value
• An E.A.V. is a subset of tables in a relational database, with a specific organization
• This data organization is particularly suitable for clinical annotation data
• In the E.A.V., all clinical annotation data is stored in one three-column table:
  • Entity: contains the identifier of the entity for which an annotation is stored (in our case, an entity is a tissue)
  • Attribute: contains the identifier (e.g. the name) of the annotation variable
  • Value: contains the value of the annotation, for a given entity
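A minimal sketch of such a three-column table, assuming SQLite through Python's sqlite3 module rather than the actual CHCDB schema. The table name annotation, the column names and the composite primary key (one value per tissue and variable) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One three-column table holds every annotation (hypothetical names).
conn.execute("""
CREATE TABLE annotation (
    tissue_id   TEXT NOT NULL,   -- Entity
    variable_id TEXT NOT NULL,   -- Attribute
    value       TEXT,            -- Value, always stored as text
    PRIMARY KEY (tissue_id, variable_id)
)
""")


def annotate(tissue_id, variable_id, value):
    """Store one annotation as one row."""
    conn.execute(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        (tissue_id, variable_id, str(value)),
    )


annotate("CHC358T", "Sex", "F")
annotate("CHC358T", "TumorType", "HCA")
annotate("CHC358T", "Steatosis", "TRUE")
# A variable never used before is just one more row: no ALTER TABLE is needed.
annotate("CHC358T", "OralContraception", "TRUE")


def annotations_of(tissue_id):
    """Return all annotations of a tissue as a {variable: value} dictionary."""
    rows = conn.execute(
        "SELECT variable_id, value FROM annotation WHERE tissue_id = ?",
        (tissue_id,),
    )
    return dict(rows.fetchall())


print(annotations_of("CHC358T"))
```

Tissues with few annotations simply have few rows, so no empty cells are stored.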
Example
[Tables: the same data in classical form, split into one table with columns TissueID, TumorType, Sex, Age and one with columns TissueID, Steatosis, nb_adenomas, CRP]
[Table: the same data in E.A.V. form, one row per annotation, with columns TissueID (entities), VariableID (attributes), Value (values); e.g. (CHC358T, Sex, F), (CHC358T, TumorType, HCA), (CHC358T, Steatosis, TRUE)]
Pros & Cons
Pros
• New annotation = New line in the table
• No structural modifications
• No sparsely filled tables
Cons
• A lot of lines in the table
• Very complex queries
• Columns are no longer typed
[Table: the E.A.V. table with new annotations added as new rows, e.g. (CHC358T, OralContraception, TRUE) and an Edmonson value of III for another tissue; the whole Value column is typed VARCHAR]
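The "very complex queries" drawback shows up as soon as one wants the classical one-row-per-tissue view back. The sketch below assumes SQLite and a hypothetical annotation table; the second tissue ID T2 and its values are made up for illustration. Rebuilding the view takes one conditional aggregate per variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (tissue_id TEXT, variable_id TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?)",
    [
        ("CHC358T", "Sex", "F"),
        ("CHC358T", "TumorType", "HCA"),
        ("CHC358T", "Steatosis", "TRUE"),
        ("T2", "Sex", "M"),          # hypothetical second tissue
        ("T2", "TumorType", "HCC"),  # no Steatosis annotation for T2
    ],
)


def pivot_query(variables):
    """Build a query returning one row per tissue and one column per variable."""
    columns = ",\n  ".join(
        f"MAX(CASE WHEN variable_id = ? THEN value END) AS col{i}"
        for i, _ in enumerate(variables)
    )
    return (
        f"SELECT tissue_id,\n  {columns}\n"
        "FROM annotation\nGROUP BY tissue_id\nORDER BY tissue_id"
    )


variables = ["Sex", "TumorType", "Steatosis"]
sql = pivot_query(variables)
rows = conn.execute(sql, variables).fetchall()

print(sql)
for row in rows:
    print(row)
```

The query grows with every requested variable, which is one reason to hide it behind an interface that generates it.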
Metadata
• Losing information regarding variable data types is problematic
→ This information is stored in an ancillary table, the metadata table
[Table: the E.A.V. table (TissueID, VariableID, Value) shown next to its metadata table]
VariableID          DataType
Sex                 VARCHAR
TumorType           VARCHAR
Age                 INT
Steatosis           BOOLEAN
nb_adenomas         INT
CRP                 FLOAT
OralContraception   BOOLEAN
Edmonson            VARCHAR
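One way an application could use the metadata table is to convert each stored text value back to its declared type. The sketch below is an assumption about how this might be done, not the CHCDBWEB implementation: the metadata is held in a plain dictionary, the helper names are hypothetical, and the sample values passed in the usage lines are made up.

```python
# Declared type of each variable, as listed in the metadata table.
METADATA = {
    "Sex": "VARCHAR",
    "TumorType": "VARCHAR",
    "Age": "INT",
    "Steatosis": "BOOLEAN",
    "nb_adenomas": "INT",
    "CRP": "FLOAT",
    "OralContraception": "BOOLEAN",
    "Edmonson": "VARCHAR",
}


def parse_boolean(text):
    """Accept only TRUE or FALSE (case-insensitive)."""
    upper = text.upper()
    if upper not in ("TRUE", "FALSE"):
        raise ValueError(f"not a boolean: {text!r}")
    return upper == "TRUE"


# Hypothetical mapping from declared type to a Python conversion function.
CASTS = {"VARCHAR": str, "INT": int, "FLOAT": float, "BOOLEAN": parse_boolean}


def typed_value(variable_id, raw):
    """Convert the text stored in the Value column to the variable's declared type."""
    return CASTS[METADATA[variable_id]](raw)


# Made-up sample values, for illustration only.
print(typed_value("Steatosis", "TRUE"))
print(typed_value("Age", "42"))
print(typed_value("Edmonson", "III"))
```

The same lookup can validate input before it is stored: a value that raises ValueError here does not match the declared type of its variable.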
CHCDB
Software
• R.D.B.M.S.: MySQL
  • Open source & free
  • Most widely used open-source R.D.B.M.S.
    → Actively maintained
    → Lots of maintenance and development tools
• The machine hosting the R.D.B.M.S. has yet to be bought
Data
• CHCDB's tables fall into one of three categories:
  • Tissue listings
  • Clinical annotation data, in the E.A.V. structure
  • Extraction data
Database structure
A web interface
Peculiarities
• Installed on the server hosting the R.D.B.M.S.
• Can be reached from any machine on the CEPH network
Features
• Consultation and modification of clinical annotations for a given tissue
• Listing of tissues and their annotations
• Listing of tissue extractions
• Management (add/modify/delete) of annotation variables
• Batch import of annotations
• Batch import of extraction data
Consultation & modification of the annotations of a given tissue
Listing of tissues and annotations
Listing of tissue extractions
Annotation variables management
Missing features in CHCDBWEB
• The tissue management interface is not yet complete
• The batch import interface for annotations and extraction is missing
CHCDB
• Defining a starting set of variables
• Importing existing data into CHCDB
Material
• Acquiring and configuring the machine which will host the database and the web server
CHCDB and CHCDBWEB should enter production phase in June 2011.