Researching Cancer in the Cloud - Using Spring, Neo4j, Mongo and Redis in the Cloud

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission. Redbasin Networks: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis By Smita Kulkarni Gudur and Manoj Joshi Friday, September 6, 13


DESCRIPTION

Speakers: Smitha Gudur and Manoj Joshi

Cancer and life science drug research models are rich in relationships, relationship heterogeneity and entity inter-dependencies. Most entity metadata is dynamic and unpredictable, which makes such models difficult to fit into a traditional relational landscape. Redbasin Networks uses a hybrid NoSQL strategy that supports composite, rich document metadata that is pervasively interconnected. Cancer and life science data is deeply nested. You will find this useful if you are building complex engineering or scientific applications and need insight into how to merge data from many diverse data sets and map it to an intuitive and effective graph database model. We will show, using code examples, how complex metadata can be engineered with Spring, Neo4J and Mongo to create useful insights for the drug researcher, and to provide a platform on which technologists can build sophisticated life science applications.

TRANSCRIPT

Page 1: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Redbasin Networks: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis

By Smita Kulkarni Gudur and Manoj Joshi

Friday, September 6, 13

Page 2: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Introduction

Smitha Kulkarni Gudur, CEO

Manoj Joshi, CTO

Allan Grimes, VP Business

Neeta Potdar, HR & Admin


Page 3: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redbasin Networks Overview

Redbasin Networks provides a cloud-based platform for cancer drug researchers in pharma and biotech.

Redbasin is a scalable technology and platform that allows life science researchers to gain insights into viable drug molecules and pathways.

Page 4: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Cancer Ecosystem Today (It’s complex!)

Participants shown in the diagram: EPA, CDC, Universities, NIH/NLM, Hospitals, Treatment Centers, Biotech, Labs, Legal, Instrument vendors, Certification, Approval, Lab tests, Patients, Insurance, Pharma, Contract Research Organization, Drug Labs, FDA

Page 5: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Cancer Market Research

US cancer spending $108b

89m deaths 2005-2015

Redbasin Networks

10% of top 200 drugs cancer related, generate $1b/yr

1.5m new cancer deaths

Page 6: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data: Redbasin Cancer Research

Spring Data

Protein Gene Disease Drug Antibody Ligand Complex Epigenetics

MongoDB Neo4j Redis HBase Lucene

Cloud

REST API

XML JSON

Page 7: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Typical Drug Life Cycle Costs


Page 8: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Why Not Go Relational?

Oncological meta-data is multi-dimensional

Pervasive joins are a drag on performance

Unpredictable schemas during mining

Temporality is difficult to represent


Page 9: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redbasin Core Data Technologies

• Mongo
• Neo4J
• Redis
• Lucene
• HBase/Hadoop

Page 10: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Why Mongo?

Lots of XML and JSON documents

Very easy to use

High performance and scalability

Strong Java & REST Support


Page 11: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Why Neo4j?

Neo4j is a modern graph database

Very easy to use

Complex features that are used less often have been dropped

Strong Java & REST Support


Page 12: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

How does Redbasin use Neo4J?

We have 225 oncology dimensions

Everything is either a node, a relationship or a property

We use indexes liberally

Page 13: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Numerous dimensions and sub-dimensions in Redbasin’s big data

Protein Gene Disease Drug Antibody Ligand

Epigenetics Ontology

Amino acid

Structure PD/PK Physicochemical

Research Experiment

Interaction

Researcher Institute

Pathway

Organism Instrument Method

Enzyme

Time Location FDA Pharma ClinicalTrial

Page 14: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Dimensions have sub-dimensions

Principal Dimension: Pharmacodynamics (What drug does to body?)

Sub-dimensions: Absorption, Distribution, Metabolism, Elimination, Toxicity

Page 15: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Data is Logical. But Big Data is not.

Real-time lookups

Understands human idiosyncrasies

Logical

Impressive computational abilities

Data is more than just data

Asymptotic convergence to human

Page 16: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

No enterprise! Just plain cloud...


Page 17: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Perhaps a Nebula(e), but why?

• Contextual correlation
• Ontology driven
• Multi-dimensional
• Hierarchical
• Fractal like
• Clustering
• Dynamic/Evolving
• Stars (facts) are born
• Zoom for details
• Humongous
• Transparency
• Dynamic metadata*
• Interconnected
• Graph like
• Complexity

Page 18: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

How does Redbasin use Spring Data

Redbasin Cloud connects to hundreds of cancer data sources

Redbasin uses contextual mining to create dynamic models

We map nodes, relationships and attributes to the Redbasin Object Model

We separate analytics from queries

Page 19: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Neo4J Node Index Example

IndexHits<Node> pNodeHits = drugIdIndex.get(DRUG_ID, drugConceptCode);
if (pNodeHits != null && pNodeHits.size() > 0) {
    // if node already exists
    drugNode = pNodeHits.getSingle();
    if (drugNode != null) {
        if (!drugNode.hasProperty(DRUG_CONCEPT_CODE)) {
            drugNode.setProperty(DRUG_CONCEPT_CODE, drugConceptCode);
        }
        if (!drugNode.hasProperty(BioEntityTypes.NODE_TYPE)) {
            drugNode.setProperty(BioEntityTypes.NODE_TYPE, BioEntityTypes.RB_DRUG);
        }
    }
}
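The index lookup on this slide follows a common "find by key, then fill in whatever is missing" pattern. The sketch below imitates it with plain JDK collections so it can run without Neo4j. The class `DrugNodeIndexSketch`, the nested `SketchNode`, the property key strings and the sample code `C1234` are all assumptions made for illustration, and creating the node on a miss is also an assumption, since the slide shows only the branch where the node already exists.

```java
import java.util.HashMap;
import java.util.Map;

class DrugNodeIndexSketch {
    // Property keys: the constant names follow the slide, the string values are assumptions.
    static final String DRUG_CONCEPT_CODE = "drugConceptCode";
    static final String NODE_TYPE = "nodeType";
    static final String RB_DRUG = "RB_DRUG";

    // Hypothetical stand-in for a graph node: just a bag of properties.
    static class SketchNode {
        final Map<String, Object> props = new HashMap<>();

        boolean hasProperty(String key) {
            return props.containsKey(key);
        }

        void setProperty(String key, Object value) {
            props.put(key, value);
        }
    }

    // Stand-in for the node index: drug concept code -> node.
    private final Map<String, SketchNode> drugIdIndex = new HashMap<>();

    // Look the node up by concept code, create it on a miss (an assumption,
    // the slide shows only the hit branch), then fill in missing properties.
    SketchNode getOrCreate(String drugConceptCode) {
        SketchNode drugNode = drugIdIndex.get(drugConceptCode);
        if (drugNode == null) {
            drugNode = new SketchNode();
            drugIdIndex.put(drugConceptCode, drugNode);
        }
        if (!drugNode.hasProperty(DRUG_CONCEPT_CODE)) {
            drugNode.setProperty(DRUG_CONCEPT_CODE, drugConceptCode);
        }
        if (!drugNode.hasProperty(NODE_TYPE)) {
            drugNode.setProperty(NODE_TYPE, RB_DRUG);
        }
        return drugNode;
    }

    public static void main(String[] args) {
        DrugNodeIndexSketch sketch = new DrugNodeIndexSketch();
        SketchNode first = sketch.getOrCreate("C1234");  // made-up concept code
        SketchNode second = sketch.getOrCreate("C1234");
        System.out.println(first == second);             // same node on the second lookup
        System.out.println(first.props.get(NODE_TYPE));
    }
}
```

Because the properties are only set when absent, calling `getOrCreate` repeatedly for the same code leaves an existing node untouched.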

Page 20: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Stack: Spring Data with Mongo JSON

"@molecule_type" : "complex",
"@id" : "208314",
"Name" : {
    "@name_type" : "PF",
    "@long_name_type" : "preferred symbol",
    "@value" : "TXA2/TP beta/beta Arrestin3/RAB11/GDP"
},
"ComplexComponentList" : [
    { "@molecule_idref" : "202489" },
    {
        "@molecule_idref" : "202493",
        "PTMExpression" : [
            {
                "@protein" : "O75228",
                "@position" : "239",
                "@aa" : "C",
                "@modification" : "palmitoylation"
            }

Page 21: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data with Mongo Objects

Redbasin data grows and changes over time

Collection ideal for Redbasin’s unstructured data

Retrieve nested objects with ease

"participantList.experimentalRoleList.experimentalRole.xref.secondaryRef.@db" : "pubmed"

DBObject utilities well suited for mapping to BioEntities
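The dotted path on this slide reaches a value several levels down inside a nested document. As a rough illustration of what such a lookup involves, the sketch below walks a dot-separated path over nested `Map` and `List` values using only the JDK. The class `DottedPathSketch` and its `resolve` method are hypothetical, and returning the first match inside a list is a simplification chosen here, not a description of how MongoDB evaluates such paths.

```java
import java.util.List;
import java.util.Map;

class DottedPathSketch {
    // Walk a nested document along a dot-separated path.
    // Returns null as soon as a segment cannot be resolved.
    static Object resolve(Object doc, String path) {
        Object current = doc;
        for (String key : path.split("\\.")) {
            current = step(current, key);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    // Resolve one path segment. For a list, try each element in order
    // and return the first non-null result (a simplification).
    private static Object step(Object current, String key) {
        if (current instanceof Map) {
            return ((Map<?, ?>) current).get(key);
        }
        if (current instanceof List) {
            for (Object item : (List<?>) current) {
                Object found = step(item, key);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A tiny made-up document shaped like the tail of the slide's path.
        Object doc = Map.of(
                "xref", Map.of(
                        "secondaryRef", List.of(Map.of("@db", "pubmed"))));
        System.out.println(resolve(doc, "xref.secondaryRef.@db"));
        System.out.println(resolve(doc, "xref.missing.@db"));
    }
}
```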

Page 22: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data: Redis

Key

Value

Usage: Ontologies & Taxonomy, stored as unique key value pairs. Also used in auto completion, as our data is “N” column based

Page 23: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Ontology Lookups

Ontology Lookups Can Be Very Handy


Page 24: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Analytics Cache

MineBot and Multi-entity Analytics is Nifty


Page 25: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Managing Aliases

Gene Aliases for Instance are Numerous
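One plausible way to manage aliases in a key-value store is to keep one entry per alias, each pointing at the canonical symbol. The sketch below shows that idea with a `HashMap` standing in for Redis, so it runs on the JDK alone. The class `GeneAliasSketch` and its methods are hypothetical, and the upper-casing of keys is an assumption. ERBB2 with its well-known aliases HER2 and NEU is used only as sample data.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class GeneAliasSketch {
    // Stand-in for the key-value store: normalized alias -> canonical symbol.
    private final Map<String, String> aliasToSymbol = new HashMap<>();

    // Register a canonical symbol together with any number of aliases.
    void register(String symbol, String... aliases) {
        aliasToSymbol.put(normalize(symbol), symbol);
        for (String alias : aliases) {
            aliasToSymbol.put(normalize(alias), symbol);
        }
    }

    // Return the canonical symbol for a name, or null if the name is unknown.
    String canonical(String name) {
        return aliasToSymbol.get(normalize(name));
    }

    // Assumption: lookups ignore case and surrounding whitespace.
    private static String normalize(String s) {
        return s.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        GeneAliasSketch aliases = new GeneAliasSketch();
        aliases.register("ERBB2", "HER2", "NEU");
        System.out.println(aliases.canonical("her2"));    // ERBB2
        System.out.println(aliases.canonical("unknown")); // null
    }
}
```

Each lookup is a single key read, which is the kind of access a key-value store handles well even when the number of aliases is large.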


Page 26: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Key Value Pairs

Large Number of Key Value Pairs

Key | Value
ATP | Adenosine triphosphate

Page 27: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Slaves

Redis Slaves Simply Work


Page 28: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Monitor

https://github.com/nkrode/RedisLive


Page 29: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Subgraph Caching

• Subgraph Similarity Analytics
• Pathway Rules Cache

Page 30: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redis - Spring data

• Using connection package Jedis
• Spring’s data access exception for Redis driver
• Built abstraction - RedisTemplate
• Not using pubsub support
• Using our own JSON/XML mapping serializers
• Atomic counter for Redis - useful
• Sorting (using) and pipelining (not using)
• Not using Spring 3.1 cache abstraction

Page 31: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data: Redis Usage

Key | Value
NCBI_TAXONOMY_ID, Key: 9606 | Homo sapiens
DISEASE_CODE, Key: x46859 | Metastases from colorectal carcinoma
HGNC_ID (Human Gene Identifier), Key: 1817 | CEACAM5

Page 32: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redbasin vs Other BioModels

Redbasin | Other BioModels
Focused on Oncology | No focus on any specific Disease
Commercial/public domain correlations | Focused on academic knowledge
Information density is “infinite” | Information size is “infinite”
Temporality/pathway dependent | No time element
Hybrid vendor strategy | No co-existence scenario
One cloud for all Oncology | Typically downloadable software

Page 33: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Neo4J Node Validation

Nodes: Beclin 1 Gene, Bcl-2 Protein, Apoptosis

Relationships: binds-to, inhibits

Biologically aware nodes and relationships
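One way to read "biologically aware" is that a relationship is only accepted when its type makes sense for the kinds of nodes it connects. The sketch below checks a relationship against a small table of allowed (start type, relationship, end type) combinations. Everything in it is an assumption for illustration: the class `NodeValidationSketch`, the `BioType` values, and in particular the two rules in the table, which are not taken from the slide and make no claim about which node binds to or inhibits which.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class NodeValidationSketch {
    // Hypothetical node categories.
    enum BioType { GENE, PROTEIN, PROCESS }

    // Illustrative rule table: which relationship types may connect which node types.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            key(BioType.PROTEIN, "binds-to", BioType.GENE),
            key(BioType.PROTEIN, "inhibits", BioType.PROCESS)));

    // Encode one combination as a single lookup key.
    private static String key(BioType start, String relType, BioType end) {
        return start + "|" + relType + "|" + end;
    }

    // True when the rule table permits this relationship between these node types.
    static boolean isAllowed(BioType start, String relType, BioType end) {
        return ALLOWED.contains(key(start, relType, end));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(BioType.PROTEIN, "inhibits", BioType.PROCESS)); // true
        System.out.println(isAllowed(BioType.GENE, "inhibits", BioType.GENE));       // false
    }
}
```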

Page 34: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation { }

Annotation for @RelationshipEntity

Metadata for recognition of a relationship class

Convenient relationship abstraction

Page 35: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Relationships always have start/end nodes

@RelationshipEntity
public class BioRelation {
    @EndNode
    private Object endNode;

    @StartNode
    private Object startNode;
}

• A unique field must be marked as @EndNode
• A unique field must be marked as @StartNode
• Field can be any variable name
• Flexibility for the programmer
• Must be @BioEntity class
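The rule that exactly one field carries @StartNode and exactly one carries @EndNode can be checked with reflection. The sketch below is a minimal, JDK-only illustration of such a check. The annotations are declared locally as stand-ins for the Redbasin ones, and `RelationshipCheckSketch`, `BrokenRelation` and `isValid` are hypothetical names. The @BioEntity requirement from the last bullet is not checked here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class RelationshipCheckSketch {
    // Local stand-ins for the annotations named on the slide.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface RelationshipEntity {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface StartNode {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface EndNode {}

    @RelationshipEntity
    static class BioRelation {
        @EndNode private Object endNode;
        @StartNode private Object startNode;
    }

    // Deliberately wrong: two @EndNode fields and no @StartNode field.
    @RelationshipEntity
    static class BrokenRelation {
        @EndNode private Object first;
        @EndNode private Object second;
    }

    // True when the class is a @RelationshipEntity with exactly one
    // @StartNode field and exactly one @EndNode field.
    static boolean isValid(Class<?> type) {
        if (!type.isAnnotationPresent(RelationshipEntity.class)) {
            return false;
        }
        int starts = 0;
        int ends = 0;
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(StartNode.class)) {
                starts++;
            }
            if (field.isAnnotationPresent(EndNode.class)) {
                ends++;
            }
        }
        return starts == 1 && ends == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid(BioRelation.class));    // true
        System.out.println(isValid(BrokenRelation.class)); // false
    }
}
```

The check relies on `RetentionPolicy.RUNTIME`; without it the annotations would not be visible to reflection.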

Page 36: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    .....
    @GraphId
    private Long id;
    .....
}

• Id of the relationship
• This is an unreliable field
• But we have it here for reference

Page 37: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    .....
    @RelProperty
    private String name;
    ....
}

• @RelProperty tells if this is a property
• There could be non-property fields
• The property here is “name”
• It’s always a String

Page 38: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    ....
    @RelType
    private String relType;

    @RelProperty
    private String message;
}

• @RelType is the actual relation
• message is a default @RelProperty

Page 39: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data Relationship Entity

@RelationshipEntity
public class BioRelation {
    @EndNode
    private Object endNode;

    @StartNode
    private Object startNode;

    @GraphId
    private Long id;

    @RelProperty
    private String name;

    @RelType
    private String relType;

    @RelProperty
    private String message;
}
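With the fields annotated this way, a mapper can tell which fields are relationship properties and which field holds the relationship type. The sketch below shows one possible way to read them back with reflection. It is an illustration only: the annotations are local stand-ins, `RelPropertySketch`, `properties` and `relType` are hypothetical names, and the `scratch` field and the sample values are made up.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class RelPropertySketch {
    // Local stand-ins for the annotations named on the slides.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RelProperty {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RelType {}

    static class BioRelation {
        @RelType private final String relType;
        @RelProperty private final String name;
        @RelProperty private final String message;
        private final String scratch = "ignored"; // not annotated, so not a property

        BioRelation(String relType, String name, String message) {
            this.relType = relType;
            this.name = name;
            this.message = message;
        }
    }

    // Collect every non-null @RelProperty field as field name -> value.
    static Map<String, Object> properties(Object relation) {
        Map<String, Object> out = new TreeMap<>();
        for (Field field : relation.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(RelProperty.class)) {
                Object value = read(field, relation);
                if (value != null) {
                    out.put(field.getName(), value);
                }
            }
        }
        return out;
    }

    // Return the value of the first @RelType field, or null if there is none.
    static String relType(Object relation) {
        for (Field field : relation.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(RelType.class)) {
                return (String) read(field, relation);
            }
        }
        return null;
    }

    // Read a private field, converting the checked exception.
    private static Object read(Field field, Object target) {
        try {
            field.setAccessible(true);
            return field.get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BioRelation relation = new BioRelation("binds-to", "example", "demo");
        System.out.println(relType(relation));
        System.out.println(properties(relation));
    }
}
```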

Page 40: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data-isms

@Retention(RetentionPolicy.RUNTIME)
public @interface BioEntity {
    public BioTypes bioType();
}

@Retention(RetentionPolicy.RUNTIME)
public @interface RelationshipEntity { }

Page 41: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Data-isms Neo4j

@Retention(RetentionPolicy.RUNTIME)
public @interface RelatedTo {

    public Direction direction() default Direction.BOTH;

    BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;

    public Class<?> elementClass() default Object.class;

    public BioTypes endNodeBioType() default BioTypes.UNKNOWN;

    public BioTypes startNodeBioType() default BioTypes.UNKNOWN;
}
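Annotation elements with defaults, as in @RelatedTo, let a field state only what differs from the default. The sketch below shows how those values, including the defaults, can be read back at runtime. It is self-contained and therefore declares its own stand-in enums and annotation: `Direction` here is a local enum rather than Neo4j's, and the enum constants other than `BOTH`, `DEFAULT_RELATION` and `UNKNOWN`, as well as the `Protein` class, are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class RelatedToSketch {
    // Local stand-ins for the types referenced on the slide.
    enum Direction { INCOMING, OUTGOING, BOTH }
    enum BioRelTypes { DEFAULT_RELATION, BINDS_TO }
    enum BioTypes { UNKNOWN, GENE, PROTEIN }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RelatedTo {
        Direction direction() default Direction.BOTH;
        BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;
        Class<?> elementClass() default Object.class;
        BioTypes endNodeBioType() default BioTypes.UNKNOWN;
        BioTypes startNodeBioType() default BioTypes.UNKNOWN;
    }

    // Hypothetical entity with one customized and one all-defaults relation.
    static class Protein {
        @RelatedTo(relType = BioRelTypes.BINDS_TO, endNodeBioType = BioTypes.GENE)
        private Object bindsTo;

        @RelatedTo
        private Object other;
    }

    // Fetch the @RelatedTo metadata declared on a named field.
    static RelatedTo metadata(Class<?> type, String fieldName) {
        try {
            return type.getDeclaredField(fieldName).getAnnotation(RelatedTo.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        RelatedTo custom = metadata(Protein.class, "bindsTo");
        RelatedTo defaults = metadata(Protein.class, "other");
        System.out.println(custom.relType() + " -> " + custom.endNodeBioType());
        System.out.println(defaults.relType() + " / " + defaults.direction());
    }
}
```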

Page 43: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

End Node annotation

package com.redbasin.bio.meta;

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
public @interface Reference {}

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.FIELD, ElementType.METHOD })
@Reference
public @interface EndNode {}

• There is no concept of start and end nodes in Neo4J
• This is a Redbasin abstraction
• The @Reference can be used by annotation types and fields only
• The annotation @EndNode can be used by methods and fields only
• It cannot be used by classes or other elements
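Because @EndNode is itself annotated with @Reference, code can treat any field marked @EndNode as a reference without knowing about @EndNode specifically. The sketch below illustrates that meta-annotation check with the JDK reflection API. The two annotations are re-declared locally so the example is self-contained, and `MetaAnnotationSketch`, `isReference` and the `note` field are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class MetaAnnotationSketch {
    // Same shape as the declarations on the slide, nested here for self-containment.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
    @interface Reference {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ ElementType.FIELD, ElementType.METHOD })
    @Reference
    @interface EndNode {}

    static class BioRelation {
        @EndNode private Object endNode;
        private Object note; // plain field, not a reference
    }

    // True if the field carries @Reference directly, or carries an
    // annotation that is itself annotated with @Reference.
    static boolean isReference(Field field) {
        if (field.isAnnotationPresent(Reference.class)) {
            return true;
        }
        for (Annotation annotation : field.getAnnotations()) {
            if (annotation.annotationType().isAnnotationPresent(Reference.class)) {
                return true;
            }
        }
        return false;
    }

    // Convenience overload that looks the field up by name.
    static boolean isReference(Class<?> type, String fieldName) {
        try {
            return isReference(type.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isReference(BioRelation.class, "endNode")); // true
        System.out.println(isReference(BioRelation.class, "note"));    // false
    }
}
```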

Page 44: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redbasin Open Doc Share

https://github.com/redbasin/redbasin-org

• It’s our “social taxonomy” for scientific documents
• github community project
• Scientists can collaborate over zillions of documents and media
• Downloadable code, can run in cloud mode
• Can be modified to support any data access
• Redbasin.org uses it for collaboration in schools
• A Spring champion cause, underprivileged schools

Page 45: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

What can developers do?

• Help us with development of our public domain API
• We support jQuery, D3.js, JSON/XML, REST and more
• We support Android, iOS on mobiles/tablets
• Spring Data integration - developer plugins

Page 46: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Redbasin Cloud Projects

OpenStack Project

Cloud Foundry Integration

AWS Project

Page 47: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Why have Java developers chosen Spring?

DI

AOP

TX

Core Model

J(2)EE usability

Testable, lightweight model for programming

Application Portability

Powerful Service Abstractions

Deployment Flexibility

Page 48: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring

Deploy to Cloud or on premise

Big, Fast, Flexible Data

Web, Integration, Batch

Core Model

GemFire

Page 49: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Spring Stack

DI AOP TX JMS JDBC

MVC Testing

ORM OXM Scheduling

JMX REST Caching Profiles Expression

Spring Framework

HATEOAS

JPA 2.0 JSF 2.0 JSR-250 JSR-330 JSR-303 JTA JDBC 4.1

Java EE 1.4+/SE5+

JMX 1.0+

WebSphere 6.1+

WebLogic 9+

GlassFish 2.1+

Tomcat 5+

OpenShift

Google App Eng.

Heroku

AWS Beanstalk

Cloud Foundry

Spring Web Flow Spring Security

Spring Batch Spring Integration

Spring Security OAuth

Spring Social

Twitter LinkedIn Facebook

Spring Web Services

Spring AMQP

Spring Data

Redis HBase

MongoDB JDBC

JPA QueryDSL

Neo4j

GemFire

Solr Splunk

HDFS MapReduce Hive

Pig Cascading

Spring for Apache Hadoop

SI/Batch

Spring XD


Page 50: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud

Learn More. Stay Connected.

Contact Redbasin: bit.ly/redbasin

Talk to us on Twitter: @springcentral

Find session replays on YouTube: spring.io/video