Clinical data EAV


Introduction Relational Database Clinical Annotation data CHCDB CHCDBWEB Conclusions

CHCDB & CHCDBWEB

Clinical Annotation Database and web interface

Thomas Burguiere

INSERM Unité 674

May 5th, 2011

Thomas Burguiere (INSERM Unite 674) CHCDB & CHCDBWEB May 5th, 2011 1 / 35


1 Introduction

2 Relational Database: Principle, Benefits, Interface

3 Clinical Annotation data: Specificities, E.A.V.

4 CHCDB

5 CHCDBWEB

6 Conclusions


• 1500 liver tumor samples

• Malignant (HCC) and benign (HCA) tumors

• Normal Tissue

Existing Data

• Clinical annotations of malignant tumors (stored in 4D)

• Excel files which contain:
  • Clinical annotations of malignant & benign tumors
  • Other annotations (mutations, clinical studies, etc.)
  • Tissue extraction listings (concentrations / quantities)


Problems

• Clinical annotations of malignant tumors can only be accessed on a single machine

• Redundant data among different files

• Duplicated files on different machines

→ Discrepancies between different files

→ Cross-checking data between the different data sources is cumbersome


2 Relational Database


Principle

Relational Database : Software

• Relational Database Management System (R.D.B.M.S.): software which contains and organizes data (Oracle™, MySQL, DB2™, SQL Server™, etc.)

• Client server architecture:
  • Server software, which manages the data, installed on a single machine
  • Client software, which queries the server, installed on any machine used to consult the database


Principle

Client Server architecture



Principle

Relational Database : Data

• The data is stored in a set of tables

• One can define a set of constraints on the data contained in the tables

• Tables can be linked to one another by logical links: integrity constraints (foreign keys)


Principle

Relational Database : Example

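[Figure: a single table with columns TissueID, TumorType, Sex, Age, Steatosis, nb_adenomas and CRP, one row per tissue (e.g. CHC358T, an HCA); for the other two tissues, both HCCs, the Steatosis, nb_adenomas and CRP cells are empty.]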

Classical table (e.g. Excel sheet)


Principle

Relational Database : Example

1 Breaking down data

2 Typing constraints

3 Uniqueness constraints

4 Integrity constraints

[Figure: the single table is split into Table1 (TissueID, TumorType, Sex, Age) and Table2 (TissueID, Steatosis, nb_adenomas, CRP); columns are annotated with types such as VARCHAR, INT, BOOLEAN and FLOAT, and the two tables are linked through TissueID.]
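The four steps can be sketched on the example tables with Python's built-in sqlite3 module, used here only as a self-contained stand-in for MySQL. Table and column names come from the slides; the column sizes, the made-up identifier CHC999T and the CHECK on Age are assumptions (SQLite applies declared types loosely, so the typing constraint is emulated with a CHECK).

```python
import sqlite3


def make_split_schema() -> sqlite3.Connection:
    """Create the Table1/Table2 split in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE table1 (
            TissueID  VARCHAR(20) PRIMARY KEY,   -- uniqueness constraint
            TumorType VARCHAR(3)  NOT NULL,
            Sex       VARCHAR(1),
            -- typing constraint, emulated: SQLite would otherwise store 'abc' as text
            Age       INTEGER CHECK (Age IS NULL OR typeof(Age) = 'integer')
        );
        CREATE TABLE table2 (
            -- integrity constraint: every tissue here must exist in table1
            TissueID    VARCHAR(20) PRIMARY KEY REFERENCES table1 (TissueID),
            Steatosis   BOOLEAN,
            nb_adenomas INTEGER,
            CRP         REAL
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_split_schema()
    insert1 = "INSERT INTO table1 VALUES (?, ?, ?, ?)"
    insert2 = "INSERT INTO table2 VALUES (?, ?, ?, ?)"
    print(try_insert(db, insert1, ("CHC358T", "HCA", "F", None)))   # True
    print(try_insert(db, insert1, ("CHC358T", "HCA", "F", None)))   # False: duplicate ID
    print(try_insert(db, insert1, ("CHC999T", "HCC", "M", "abc")))  # False: Age not an integer
    print(try_insert(db, insert2, ("CHC999T", 1, None, None)))      # False: unknown tissue
```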


Benefits

Relational Database : Benefits

• Data centralisation on the server side

• Constraints can, in some cases, prevent data inconsistencies

→ Consistent data

• Efficient: tables containing millions of rows can be manipulated easily

• Querying a correctly structured database allows one to cross-check data very rapidly*
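For instance, assuming hypothetical tissue and extraction tables (these names are not CHCDB's), a single query lists the extractions that refer to a tissue missing from the tissue listing, the kind of discrepancy that is tedious to find across separate Excel files. A minimal sketch in SQLite via Python; CHC000T is a made-up identifier.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical tissue listing and extraction listing with one mismatch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tissue (TissueID TEXT PRIMARY KEY, TumorType TEXT);
        CREATE TABLE extraction (ExtractionID INTEGER PRIMARY KEY, TissueID TEXT);
        INSERT INTO tissue VALUES ('CHC358T', 'HCA');
        -- 'CHC000T' is a made-up identifier absent from the tissue listing
        INSERT INTO extraction (TissueID) VALUES ('CHC358T'), ('CHC000T');
        """
    )
    return conn


def orphan_extractions(conn: sqlite3.Connection) -> list:
    """Extractions whose tissue does not appear in the tissue listing."""
    return conn.execute(
        """
        SELECT x.ExtractionID, x.TissueID
        FROM extraction AS x
        LEFT JOIN tissue AS t ON t.TissueID = x.TissueID
        WHERE t.TissueID IS NULL  -- no matching tissue row
        ORDER BY x.ExtractionID
        """
    ).fetchall()


if __name__ == "__main__":
    print(orphan_extractions(make_demo_db()))  # [(2, 'CHC000T')]
```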


Interface

A database requires a graphical interface

• Data manipulation in a database is done exclusively with queries, written in SQL (Structured Query Language)

• SELECT table1.TissueID, table1.TumorType, table1.Sex, table1.Age,
         table2.Steatosis, table2.nb_adenomas, table2.CRP
  FROM table1 INNER JOIN table2 ON (table1.TissueID = table2.TissueID)
  WHERE table1.TissueID = 'CHC358T';

• Powerful language, albeit counterintuitive

→ A graphical interface must be associated with the database
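The query can be run unchanged against a small SQLite copy of the example tables. In this sketch (Python's sqlite3 standing in for MySQL) the tissue identifier is passed as a bound parameter instead of being pasted into the SQL text. Only CHC358T's TumorType, Sex and Steatosis come from the slides; the other cells are left NULL, and CHC000T is made up.

```python
import sqlite3

QUERY = """
    SELECT table1.TissueID, table1.TumorType, table1.Sex, table1.Age,
           table2.Steatosis, table2.nb_adenomas, table2.CRP
    FROM table1 INNER JOIN table2 ON (table1.TissueID = table2.TissueID)
    WHERE table1.TissueID = ?
"""


def make_example_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (TissueID TEXT PRIMARY KEY, TumorType TEXT,
                             Sex TEXT, Age INTEGER);
        CREATE TABLE table2 (TissueID TEXT PRIMARY KEY, Steatosis BOOLEAN,
                             nb_adenomas INTEGER, CRP REAL);
        INSERT INTO table1 VALUES ('CHC358T', 'HCA', 'F', NULL),
                                  ('CHC000T', 'HCC', 'M', NULL);
        -- Only CHC358T has a row in table2
        INSERT INTO table2 VALUES ('CHC358T', 1, NULL, NULL);
        """
    )
    return conn


def annotations_for(conn: sqlite3.Connection, tissue_id: str) -> list:
    # The '?' placeholder is filled in by the driver, not by string formatting.
    return conn.execute(QUERY, (tissue_id,)).fetchall()


if __name__ == "__main__":
    db = make_example_db()
    print(annotations_for(db, "CHC358T"))  # [('CHC358T', 'HCA', 'F', None, 1, None, None)]
    print(annotations_for(db, "CHC000T"))  # []: INNER JOIN drops tissues absent from table2
```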


Interface

Graphical interface : principle

Mechanism

1 The interface receives instructions from the user and transforms them into SQL queries sent to the server

2 The server receives the SQL queries and sends back the results

3 The interface receives the results from the server and displays them to the user
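The three steps can be sketched as one request handler. This is a hypothetical illustration, not CHCDBWEB's code: the form field names, the column whitelist and the HTML rendering are assumptions, and sqlite3 stands in for the MySQL server.

```python
import html
import sqlite3

# Hypothetical form fields -> column names. Only whitelisted columns reach the
# SQL text; user-supplied values are always passed as parameters.
ALLOWED_FILTERS = {"tumor_type": "TumorType", "sex": "Sex"}


def build_query(form: dict) -> tuple:
    """Step 1: turn the user's instructions into an SQL query and parameters."""
    clauses, params = [], []
    for field, value in form.items():
        clauses.append(f"{ALLOWED_FILTERS[field]} = ?")  # KeyError if not allowed
        params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT TissueID, TumorType, Sex FROM table1{where} ORDER BY TissueID", params


def render(rows: list) -> str:
    """Step 3: display the results, here as a bare HTML table."""
    cells = (
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return "<table>" + "".join(cells) + "</table>"


def handle_request(conn: sqlite3.Connection, form: dict) -> str:
    sql, params = build_query(form)
    rows = conn.execute(sql, params).fetchall()  # step 2: the server runs the query
    return render(rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE table1 (TissueID TEXT, TumorType TEXT, Sex TEXT)")
    db.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?)",
        [("CHC358T", "HCA", "F"), ("CHC000T", "HCC", "M")],  # CHC000T is made up
    )
    print(handle_request(db, {"tumor_type": "HCA"}))
    # <table><tr><td>CHC358T</td><td>HCA</td><td>F</td></tr></table>
```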

Interface types

• Two types of interface: desktop program or web interface

• In our case, we decided to develop a web interface


Interface

Web interface

• Software installed on a single machine : the web server

• Accessing the interface only requires a web browser

→ Avoids installation and maintenance issues on the client machines

→ Avoids OS compatibility issues (Mac, Windows, Linux, etc.)


Interface

Web client / server architecture



3 Clinical Annotation data


Specificities

Specific features of clinical annotation data

Specificities

• New variables are frequently added

• Data regarding the same variable can be input differently, depending on sample provenance and type (malignant or benign tumor)

Consequences in a database

• Frequent addition of new columns or sub-tables

• Tables contain many columns, with sparsely filled rows

→ Constant maintenance of the database

Clinical annotation data must be stored in a specific database structure: the E.A.V. structure


E.A.V.

Principle

• E.A.V. = Entity Attribute Value[?]

• An E.A.V. is a subset of tables in a relational database, with a specific organization

• This data organization is particularly suitable for clinical annotation data

• In the E.A.V., all clinical annotation data is stored in one 3-column table:

  • Entity: contains the identifier of the entity for which an annotation is stored (in our case, an entity is a tissue)

  • Attribute: contains the identifier (e.g. the name) of the annotation variable

  • Value: contains the value of the annotation, for a given entity
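A minimal sketch of the three-column table and of how one wide record is split into E.A.V. rows, in SQLite via Python. The column names TissueID, VariableID and Value follow the slides' example; the composite primary key (one value per tissue and variable) and the helper store_record are assumptions.

```python
import sqlite3


def make_eav() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE eav (
            TissueID   TEXT NOT NULL,  -- Entity
            VariableID TEXT NOT NULL,  -- Attribute
            Value      TEXT,           -- Value, whatever the variable's type
            PRIMARY KEY (TissueID, VariableID)
        )
        """
    )
    return conn


def store_record(conn: sqlite3.Connection, tissue_id: str, annotations: dict) -> int:
    """Store one wide record as one row per non-empty annotation."""
    rows = [
        (tissue_id, variable, str(value))
        for variable, value in annotations.items()
        if value is not None  # empty cells simply produce no row
    ]
    conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    db = make_eav()
    n = store_record(db, "CHC358T",
                     {"TumorType": "HCA", "Sex": "F", "Steatosis": "TRUE", "CRP": None})
    print(n)  # 3
    print(db.execute("SELECT * FROM eav ORDER BY VariableID").fetchall())
```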


E.A.V.

Example

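[Figure: the two example tables, Table1 (TissueID, TumorType, Sex, Age) and Table2 (TissueID, Steatosis, nb_adenomas, CRP); the tissue identifiers are the entities, the column names the attributes and the cell contents the values.]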


[Figure: the same data stored in a single E.A.V. table with columns TissueID (entities), VariableID (attributes) and Value (values); each annotation becomes one row, e.g. (CHC358T, Sex, F), (CHC358T, TumorType, HCA), (CHC358T, Steatosis, TRUE), followed by the Sex, Age and TumorType rows of the two HCC tissues.]


E.A.V.

Pros & Cons

Pros

• New annotation = new row in the table

• No structural modifications

• No sparsely filled tables

Cons

• Many rows in the table

• Very complex queries

• Columns are no longer typed (all values share one VARCHAR column)

[Figure: the E.A.V. table after two new annotations are added as new rows, OralContraception = TRUE for CHC358T and an Edmonson value for one of the HCC tissues; the Value column is labelled VARCHAR.]
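The "very complex queries" drawback appears as soon as the wide view is needed again: every variable becomes one conditional aggregate. The sketch below (SQLite via Python; the helper pivot_query and the tissue CHC000T are hypothetical) also shows the advantage: adding OralContraception is a plain INSERT, with no ALTER TABLE.

```python
import sqlite3


def pivot_query(variables: list) -> str:
    """One MAX(CASE ...) column per variable; the query grows with each one."""
    # Variable names are used as column aliases, so they must come from a
    # trusted list (e.g. the metadata table), never from user input.
    columns = ", ".join(
        f'MAX(CASE WHEN VariableID = ? THEN Value END) AS "{v}"' for v in variables
    )
    return f"SELECT TissueID, {columns} FROM eav GROUP BY TissueID ORDER BY TissueID"


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE eav (TissueID TEXT, VariableID TEXT, Value TEXT)")
    db.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
        ("CHC358T", "Sex", "F"), ("CHC358T", "TumorType", "HCA"),
        ("CHC000T", "Sex", "M"), ("CHC000T", "TumorType", "HCC"),
    ])
    # A new annotation is just a new row: the table structure is unchanged.
    db.execute("INSERT INTO eav VALUES ('CHC358T', 'OralContraception', 'TRUE')")
    variables = ["Sex", "TumorType", "OralContraception"]
    for row in db.execute(pivot_query(variables), variables):
        print(row)
```

Tissues without a given annotation come back with NULL in that column, which is exactly the sparse wide table the E.A.V. avoids storing.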


E.A.V.

Metadata

• Losing information about variable data types is problematic

→ This information is stored in an ancillary table, the metadata table


VariableID          DataType
Sex                 VARCHAR
TumorType           VARCHAR
Age                 INT
Steatosis           BOOLEAN
nb_adenomas         INT
CRP                 FLOAT
OralContraception   BOOLEAN
Edmonson            VARCHAR
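A sketch of how the metadata table can restore typing around the untyped Value column, assuming SQLite via Python: a value is checked against its DataType before being stored, and converted back when read. The VariableID/DataType pairs are those of the slide; the converter table, the helper names and the Age value used in the demo are assumptions.

```python
import sqlite3

# How each DataType of the metadata table is parsed from the stored text.
CONVERTERS = {
    "VARCHAR": str,
    "INT": int,
    "FLOAT": float,
    "BOOLEAN": lambda s: {"TRUE": True, "FALSE": False}[s.upper()],
}

METADATA = [
    ("Sex", "VARCHAR"), ("TumorType", "VARCHAR"), ("Age", "INT"),
    ("Steatosis", "BOOLEAN"), ("nb_adenomas", "INT"), ("CRP", "FLOAT"),
    ("OralContraception", "BOOLEAN"), ("Edmonson", "VARCHAR"),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadata (VariableID TEXT PRIMARY KEY, DataType TEXT NOT NULL);
        CREATE TABLE eav (TissueID TEXT, VariableID TEXT, Value TEXT,
                          PRIMARY KEY (TissueID, VariableID));
        """
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?)", METADATA)
    return conn


def data_type(conn: sqlite3.Connection, variable: str) -> str:
    row = conn.execute(
        "SELECT DataType FROM metadata WHERE VariableID = ?", (variable,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown variable {variable!r}")
    return row[0]


def set_annotation(conn, tissue_id: str, variable: str, raw: str) -> None:
    """Reject values that do not parse as the variable's DataType."""
    dtype = data_type(conn, variable)
    try:
        CONVERTERS[dtype](raw)
    except (ValueError, KeyError):
        raise ValueError(f"{raw!r} is not a valid {dtype} for {variable}") from None
    conn.execute("INSERT OR REPLACE INTO eav VALUES (?, ?, ?)", (tissue_id, variable, raw))


def get_annotations(conn, tissue_id: str) -> dict:
    """Read a tissue's annotations back as typed Python values."""
    rows = conn.execute(
        """
        SELECT e.VariableID, e.Value, m.DataType
        FROM eav AS e JOIN metadata AS m ON m.VariableID = e.VariableID
        WHERE e.TissueID = ?
        """,
        (tissue_id,),
    ).fetchall()
    return {variable: CONVERTERS[dtype](value) for variable, value, dtype in rows}


if __name__ == "__main__":
    db = make_db()
    set_annotation(db, "CHC358T", "Steatosis", "TRUE")
    set_annotation(db, "CHC358T", "Age", "42")  # made-up value
    print(get_annotations(db, "CHC358T"))
    try:
        set_annotation(db, "CHC358T", "CRP", "high")
    except ValueError as err:
        print(err)  # 'high' is not a valid FLOAT for CRP
```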


4 CHCDB


CHCDB

Software

• R.D.B.M.S.: MySQL
  • Open source & free
  • Most widely used open-source R.D.B.M.S.

→ Actively maintained

→ Lots of maintenance and development tools

• The machine hosting the R.D.B.M.S. has yet to be bought

Data

• CHCDB's tables fall into one of three categories:
  • Tissue listings
  • Clinical annotation data, in the E.A.V. structure
  • Extraction data


Database structure


5 CHCDBWEB


A web interface

Peculiarities

• Installed on the server hosting the R.D.B.M.S.

• Can be reached from any machine on the CEPH network

Features

• Consultation and modification of clinical annotations for a given tissue

• Listing of tissues and their annotations

• Listing of tissue extractions

• Management (add/modify/delete) of annotation variables

• Batch import of annotations

• Batch import of extraction data
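The batch import of annotations could start from the existing Excel sheets exported as CSV. A hypothetical sketch using Python's csv module: each wide row (one tissue, one column per variable) becomes (TissueID, VariableID, Value) triples, empty cells are skipped, and columns unknown to the metadata are reported instead of being imported. The function name, the error messages and the sample sheet are made up.

```python
import csv
import io


def wide_csv_to_eav(text: str, known_variables: set) -> tuple:
    """Split a wide sheet into (TissueID, VariableID, Value) triples plus errors."""
    triples, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header; line numbers assume no multi-line cells.
    for line_no, record in enumerate(reader, start=2):
        tissue_id = record.pop("TissueID")
        for variable, value in record.items():
            if variable is None:  # DictReader's key for surplus fields
                errors.append(f"line {line_no}: too many fields")
                continue
            if value is None or not value.strip():
                continue  # empty cell: nothing to store
            if variable not in known_variables:
                errors.append(f"line {line_no}: unknown variable {variable!r}")
                continue
            triples.append((tissue_id, variable, value.strip()))
    return triples, errors


if __name__ == "__main__":
    sheet = "TissueID,Sex,TumorType,Steatosis,Foo\nCHC358T,F,HCA,TRUE,\nCHC000T,M,HCC,,x\n"
    triples, errors = wide_csv_to_eav(sheet, {"Sex", "TumorType", "Steatosis"})
    print(triples)
    print(errors)  # ["line 3: unknown variable 'Foo'"]
```

The resulting triples could then go through a type check against the metadata table before insertion, so that a whole sheet is validated before anything is written.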


Consultation & modification of the annotations of a given tissue


Listing of tissues and annotations


Listing of tissue extractions


Annotation variables management


6 Conclusions


Missing features in CHCDBWEB

• The tissue management interface is not yet complete

• The batch import interface for annotations and extraction data is missing

CHCDB

• Defining a starting set of variables

• Importing existing data into CHCDB

Material

• Acquiring and configuring the machine which will host the database and the web server

CHCDB and CHCDBWEB should enter production phase in June 2011.
