tranSMART 17.1 technical overview

27th of October 2016 Piotr Zakrzewski – The Hyve TranSMART Pro 17.1 project Technical Overview

Post on 10-Feb-2017

Page 1: tranSMART 17.1 technical overview

27th of October 2016
Piotr Zakrzewski – The Hyve

TranSMART Pro 17.1 project Technical Overview

Page 2: tranSMART 17.1 technical overview

What does 17.1 mean for future development?

Improved ease of development
● Clean-up of repositories (single repo)
● One-step build
● Dependencies update
● REST API improvements
● Consolidation and extension of the star schema to better fit tranSMART and new data types
● Documentation

Page 4: tranSMART 17.1 technical overview

Repository Structure

Before you can deploy it here ...

Page 5: tranSMART 17.1 technical overview

Repository Structure

[Diagram labels: core-api, core-db, rest-api, R modules, core-api, transmart data, legacy db]

you need all of these ...

...and these...

Page 6: tranSMART 17.1 technical overview

Repository Structure

16.2:
- TranSMART 16.2 spans 10 core repositories
- Building & testing tranSMART requires a special setup (that resides in yet another repository)

17.1:
- Single repository with all core components necessary for building a working tranSMART WAR file

Page 8: tranSMART 17.1 technical overview

Versioning of Artifacts

16.2:
- Most components are versioned as SNAPSHOTs
- core-api, core-db, rest-api, transmartApp and all other core components need to match strictly in revision in order to work

17.1:
- Single repository: all changes to different components come in a single PR

Page 9: tranSMART 17.1 technical overview

Build Process

16.2:
- tranSMART 16.2 (Grails 2) uses Gant scripts for building
- git-repo used for fetching all repositories
- Custom Groovy script (dependency manager) needed for dev setup

17.1:
- Gradle build system (comes with Grails 3)
- One-step build (also with database setup)
- Just git clone && ./gradlew build

Page 10: tranSMART 17.1 technical overview

Test Setup

16.2:
- Custom script matching branches during Travis run
- Different ways to run tests locally and on Travis
- No reliable way to run tests for all components
- Tested on H2 in-memory database

17.1:
- ./gradlew test both locally and on Travis
- Tested against Oracle and Postgres
- BDD Spock framework for testing

Page 11: tranSMART 17.1 technical overview

Gradle

- Default option for Grails 3.X
- Very versatile build system
- Also very popular (gained momentum due to adoption by Android)
- Especially suitable for multi-project, multi-language builds like tranSMART

Page 13: tranSMART 17.1 technical overview

Java 7 to Java 8

tranSMART is still running on Java 7, which has not been supported, even with security updates, since April 2015.

Java 7 reached its end of life

Page 14: tranSMART 17.1 technical overview

Groovy 2.4 and Grails 3

- Java 8 supports invokedynamic, which should increase the performance of many dynamic Groovy calls

- Many workarounds for bugs in old Grails and Hibernate versions are no longer necessary

- The upgrade allowed us to adopt a better build system: Gradle

Page 16: tranSMART 17.1 technical overview

REST-API versioning

● TranSMART REST API is used in production
● Several clients and third-party apps
● But development needs to continue …

Page 17: tranSMART 17.1 technical overview

REST-API versioning

- In 17.1 REST API versioning is introduced
- Versioning is done on the URL level
- GET /studies becomes GET /v1/studies
- Only minor influence on existing clients (change of base URL configuration to include version)
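
URL-level versioning amounts to placing a version segment in front of each resource path. The following is a minimal Python sketch of that idea, not tranSMART client code: the helper name `versioned_url` and the host are hypothetical, and only the `/studies` to `/v1/studies` mapping comes from the slide.

```python
from urllib.parse import urlsplit, urlunsplit


def versioned_url(base_url: str, resource: str, version: str = "v1") -> str:
    """Build a URL with the API version as the first segment after the base path."""
    parts = urlsplit(base_url)
    # Join base path, version, and resource while avoiding doubled slashes.
    segments = [s for s in (parts.path.strip("/"), version, resource.strip("/")) if s]
    path = "/" + "/".join(segments)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical host; an existing client would only change its base URL setting.
print(versioned_url("https://transmart.example.org", "/studies"))
```

Keeping the version in one helper (or in the configured base URL) means a later move to another version touches a single place in the client.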

Page 18: tranSMART 17.1 technical overview

Current REST-API documentation

Page 19: tranSMART 17.1 technical overview

OpenAPI (previously Swagger)

Page 21: tranSMART 17.1 technical overview

Db schema as of now (16.2)

Page 22: tranSMART 17.1 technical overview

Db schema as of now (16.2)

Some facts about the current schema:
- Study exists only as string IDs sprinkled around the star schema (no table for study)
- Concepts and patients belong to a study (cannot be shared)
- Combination of patient-concept yields a single observation

Page 23: tranSMART 17.1 technical overview

Db schema of 17.1

Page 24: tranSMART 17.1 technical overview

Db schema of 17.1

Most important consequences of 17.1 changes:
- Concepts and patients can be shared between studies
- More straightforward cross-trial comparison (trial-visit dimension) and longitudinal data (start date) support
- Much redundancy and many inconsistencies removed

Page 25: tranSMART 17.1 technical overview

Hypercube

- Introduction of longitudinal data requires a whole different approach
- Modifiers used to store time point. Both relative and absolute allowed
- Each observation has effectively an additional dimension (hence the Hypercube)
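
One way to picture the extra dimension is to treat every observation as a point addressed by several dimensions rather than by patient and concept alone. The Python sketch below is only an illustration under stated assumptions: the class, field names, and `select` helper are hypothetical and do not reflect the actual tranSMART schema or query API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Observation:
    # Hypothetical field names; each one stands for a dimension of the cube.
    patient: str
    concept: str
    study: str
    trial_visit: Optional[str]      # relative time point, e.g. "Week 4"
    start_date: Optional[datetime]  # absolute time point
    value: float


def select(observations, **fixed):
    """Return the observations whose dimensions match all fixed values."""
    return [o for o in observations
            if all(getattr(o, dim) == val for dim, val in fixed.items())]


observations = [
    Observation("P1", "heart_rate", "STUDY_A", "Baseline", datetime(2016, 1, 4), 72.0),
    Observation("P1", "heart_rate", "STUDY_A", "Week 4", datetime(2016, 2, 1), 68.0),
    Observation("P2", "heart_rate", "STUDY_B", "Baseline", None, 80.0),
]

# Same patient and concept, but two observations distinguished by time.
series = select(observations, patient="P1", concept="heart_rate")
print([(o.trial_visit, o.value) for o in series])
```

Fixing some dimensions and leaving the others free returns a slice of the cube: fixing patient and concept gives a time series, while fixing only the trial visit gives a cross-study view.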

Page 26: tranSMART 17.1 technical overview

How to query a Hypercube?

Page 27: tranSMART 17.1 technical overview

Impact on backwards compatibility

- Old UI will work only with old data; new data (especially longitudinal) will not be supported
- Old UI will not make use of new cross-trial functionality
- Migration path will be provided between 16.2 and 17.1

Page 28: tranSMART 17.1 technical overview

The new UI, however, will support longitudinal data and other features

Page 30: tranSMART 17.1 technical overview

Documentation

- One of the project deliverables is documentation on the database schema
- REST API documented with OpenAPI
- Documentation as part of the git repository

Page 31: tranSMART 17.1 technical overview

Conclusion

17.1, aside from many new features, is also a major clean-up that will make future development easier

Page 33: tranSMART 17.1 technical overview

Backup slides

Page 34: tranSMART 17.1 technical overview

Arvados Keep

Page 35: tranSMART 17.1 technical overview

Performance Benchmarks

- Goal: safeguarding performance of REST API
- Implemented as a Gradle task (single command)
- Should help developers spot drops in performance after new changes
- Reference setup on Amazon will be available to make benchmarks comparable
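
The project implements its benchmarks as a Gradle task. The Python sketch below only illustrates the general idea of comparing measured timings with a reference value: the function names, the 20% tolerance, and the stand-in workload are all assumptions, not part of tranSMART.

```python
import statistics
import time


def time_calls(func, repeats=5):
    """Return wall-clock durations in seconds for repeated calls of func."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def is_regression(durations, reference_seconds, tolerance=0.2):
    """Flag a regression when the median exceeds the reference by more than tolerance."""
    return statistics.median(durations) > reference_seconds * (1 + tolerance)


# Stand-in workload; a real benchmark would issue REST API requests instead.
durations = time_calls(lambda: sum(range(10_000)))
print(is_regression(durations, reference_seconds=1.0))
```

The reference value only means something when it was measured on comparable hardware, which is the role of the shared reference setup mentioned above.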

Page 36: tranSMART 17.1 technical overview

Other changes

- Multiple observations per concept-patient support
- Categorical variables no longer loaded per value (e.g. variable Treated being two variables: yes and no)
- Several new tables to accommodate new HDD data type (RNAseq measurement per transcript) and table to store generic links to external resources (files)
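
The change to categorical variables can be pictured as collapsing one variable per category value into a single variable that carries the value. This Python sketch is illustrative only: the `variable/value` path format and the helper name are assumptions, not how tranSMART stores or loads data.

```python
# Hypothetical pre-17.1 style rows: one "variable" per category value.
per_value_rows = [
    ("P1", "Treated/yes"),
    ("P2", "Treated/no"),
    ("P3", "Treated/yes"),
]


def to_single_variable(rows):
    """Collapse (patient, 'variable/value') rows into {variable: {patient: value}}."""
    result = {}
    for patient, path in rows:
        variable, value = path.rsplit("/", 1)
        result.setdefault(variable, {})[patient] = value
    return result


print(to_single_variable(per_value_rows))
```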