Reproducible Workflow with Cytoscape and Jupyter Notebook

65
Reproducible Workflows with Jupyter Notebook and Cytoscape
Keiichiro Ono
Cytoscape Core Developer Team
UC San Diego, Trey Ideker Lab / National Resource for Network Biology
5/19/2016, Advanced Cytoscape Workshop

Uploaded by keiichiro-ono on 19-Jan-2017

Course Materials: Clone/Fork/Download this repository!

https://github.com/idekerlab/tsri-lecture

Setup Guide:

https://github.com/idekerlab/tsri-lecture/blob/master/documents/Setup%20Guide.pdf

Cytoscape 3.4.0:

http://www.cytoscape.org/download.php

Keiichiro Ono

Cytoscape core developer since 2005, Trey Ideker Lab, UCSD

Area of interest: biological data integration and visualization

Agenda

• Reproducible Analysis & Visualization

• Introduction to Jupyter Notebook

• Creating reproducible network visualization workflows with Python

Review: Cytoscape Core Features

- Network analysis and visualization are powerful ways to gain biological insight from screening results

- Cytoscape is the de facto standard tool for this type of analysis

- Core features of Cytoscape:

  - Navigation (pan/zoom/select)

  - Network and table data import

  - Automatic layout

  - Visual styles

Drawing Biological Networks vs. Drawing Tools

In a drawing tool, you must specify the color of each node, the width of each edge, the shape of each node, and so on, by hand.

There is one huge difference between Cytoscape and drawing tools such as Illustrator…

In Cytoscape, your data controls the view

Creating Visualizations in Cytoscape

Name      Type
BRCA1     gene
MAP2K1    gene
C05981    compound

• Mapping from Type to node shape

• Mapping from Type to node color

[Figure: network view of nodes C05981, BRCA1, and MAP2K1, with shape and color set by Type]

Creating mappings from data points to visual properties
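A discrete mapping such as "Type → node shape" is, at its core, a lookup table applied to a column. The sketch below illustrates the idea in plain Python using the three-row table above. It is not Cytoscape's implementation: `discrete_mapping` is a hypothetical helper, and the shape names and hex colors are arbitrary choices.

```python
from typing import Dict, List

# Node table from the slide: each row has a Name and a Type column.
NODE_TABLE: List[Dict[str, str]] = [
    {"Name": "BRCA1", "Type": "gene"},
    {"Name": "MAP2K1", "Type": "gene"},
    {"Name": "C05981", "Type": "compound"},
]


def discrete_mapping(rows, column, mapping, default):
    """Return {node name: visual value} by looking up each row's `column`
    value in `mapping`; values without an entry fall back to `default`.
    (Hypothetical helper illustrating a discrete mapping.)"""
    return {row["Name"]: mapping.get(row[column], default) for row in rows}


# Illustrative choices: genes drawn as ellipses, compounds as diamonds.
shapes = discrete_mapping(NODE_TABLE, "Type",
                          {"gene": "ELLIPSE", "compound": "DIAMOND"}, "RECTANGLE")
colors = discrete_mapping(NODE_TABLE, "Type",
                          {"gene": "#1F77B4", "compound": "#FF7F0E"}, "#CCCCCC")

for name in shapes:
    print(name, shapes[name], colors[name])
```

Because the visual values are derived from the table, editing a node's Type and re-applying the mapping is enough to update its appearance; nothing is styled by hand.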

Reproducibility

Recap

Cytoscape session files (.cys): for sharing results

But what about the process?

http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/

https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938

http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

Reproducibility…it’s a known issue

Problems

- Reproducibility of biological research, especially for in vivo and in vitro experiments, is a hard problem

- But this is true even for in silico analysis! Sources of variation include:

  - OS version

  - Revision of scripts

  - Versions of data analysis software

  - Versions of data files

  - Command-line parameters written on a paper napkin

  - "Black magic" only a grad student knows

- This is something we need to fix, using the latest technologies and best practices
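Several of the items above (OS version, software versions, data file versions, napkin parameters) can be captured automatically at the top of a notebook. The following is a minimal standard-library sketch of that idea; the function names, the package list, and the `cutoff` parameter are hypothetical examples, not part of any Cytoscape or Jupyter tool.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def file_sha256(path, chunk_size=65536):
    """SHA-256 digest of a data file, so the exact input version is recorded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_version(name):
    """Installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def capture_environment(packages, data_files=(), parameters=None):
    """Collect OS, Python, package, data-file, and parameter information."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": {name: package_version(name) for name in packages},
        "data_files": {path: file_sha256(path) for path in data_files},
        "parameters": dict(parameters or {}),  # instead of a paper napkin
    }


if __name__ == "__main__":
    record = capture_environment(["pandas", "numpy", "networkx"],
                                 parameters={"cutoff": 0.05})
    print(json.dumps(record, indent=2))
```

Saving this record next to the notebook's outputs (for example as JSON) gives later readers a concrete starting point for re-running the analysis.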

Typical Workflow

Data Preparation → Analysis → Visualization

Data Preparation

- Cleansing

- Normalization

- Missing values

- Corrupted values

- Reformat

- Conversion
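With pandas, each preparation step above is typically one or two lines. This sketch uses a small, made-up screening table (gene symbols and scores are illustrative only) and a z-score as one possible normalization; the right cleaning rules depend on your data.

```python
import pandas as pd

# Hypothetical screening table: gene symbols with raw scores. Some values are
# missing (None) or corrupted (non-numeric text), as in real exports.
raw = pd.DataFrame({
    "gene": ["brca1", "MAP2K1 ", "KRAS", "TP53", None],
    "score": ["2.5", "n/a", "1.0", "4.0", "3.0"],
})

clean = raw.copy()
# Reformat: strip whitespace and upper-case gene symbols.
clean["gene"] = clean["gene"].str.strip().str.upper()
# Conversion: coerce scores to numbers; corrupted entries become NaN.
clean["score"] = pd.to_numeric(clean["score"], errors="coerce")
# Missing values: drop rows without a gene symbol or a usable score.
clean = clean.dropna(subset=["gene", "score"]).reset_index(drop=True)
# Normalization: z-score (sample standard deviation) so screens are comparable.
clean["z"] = (clean["score"] - clean["score"].mean()) / clean["score"].std()

print(clean)
```

Keeping these steps in a notebook cell, rather than editing the spreadsheet by hand, is what makes the preparation stage repeatable.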

Analysis

- Filtering

- Standard graph statistics

- Density

- Betweenness centrality

- Clustering

- Community Detection

- GO enrichment analysis
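Most of the graph analyses listed above are available in NetworkX, one of the Python libraries mentioned later in these slides. The sketch below runs filtering, density, betweenness centrality, and modularity-based community detection on a tiny made-up network (two triangles joined by one edge). Greedy modularity maximization is just one community-detection method among many, and GO enrichment analysis needs annotation data and dedicated tools, so it is not shown.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy interaction network (hypothetical): two triangles joined by one bridge edge.
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("B", "C"), ("A", "C"),  # first cluster
    ("D", "E"), ("E", "F"), ("D", "F"),  # second cluster
    ("C", "D"),                          # bridge between the clusters
])

# Filtering: keep only nodes with degree >= 2 (here every node passes).
H = G.subgraph(n for n, d in G.degree() if d >= 2).copy()

density = nx.density(H)
betweenness = nx.betweenness_centrality(H)
communities = [sorted(c) for c in greedy_modularity_communities(H)]

print("density:", round(density, 3))
print("top betweenness:", max(betweenness, key=betweenness.get))
print("communities:", sorted(communities))
```

The bridge nodes C and D get the highest betweenness because every shortest path between the two clusters passes through them.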

Visualization

- Mapping

- Data points to visual variables

- Layout

- For graphs:

- Force-directed

- Tree
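The two visualization steps above can be sketched in Python as well: a continuous mapping that interpolates a numeric data value onto a visual variable (node size here), and a force-directed layout. The scores, the 20 to 80 pixel size range, and `continuous_mapping` are hypothetical; NetworkX's `spring_layout` (Fruchterman-Reingold) stands in for a force-directed layout, and a tree layout is not shown.

```python
import networkx as nx


def continuous_mapping(value, vmin, vmax, out_min, out_max):
    """Linearly interpolate `value` from [vmin, vmax] onto [out_min, out_max],
    clamping values outside the data range (a simple continuous mapping)."""
    if vmax == vmin:
        return out_min
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    return out_min + t * (out_max - out_min)


# Hypothetical network whose nodes carry an expression score.
G = nx.path_graph(4)
scores = {0: 0.0, 1: 1.0, 2: 2.0, 3: 4.0}
lo, hi = min(scores.values()), max(scores.values())

# Mapping: data value -> node size (20 to 80 pixels, an arbitrary range).
sizes = {n: continuous_mapping(scores[n], lo, hi, 20.0, 80.0) for n in G}

# Layout: force-directed positions; a fixed seed makes the result repeatable.
positions = nx.spring_layout(G, seed=42)

for n in G:
    x, y = positions[n]
    print(n, sizes[n], round(float(x), 3), round(float(y), 3))
```

Fixing the layout seed matters for reproducibility: without it, force-directed layouts start from random positions and give a different picture on every run.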

Cytoscape for Interactive Visualization

Python for Data Manipulation / Analysis

Lab Notebook for in silico Experiments

Interactive command line + Markdown-based documents

IPython Notebook? Jupyter?

IPython Notebook = Notebook UI + Python kernel

Jupyter = Notebook UI + language kernel (R/Julia/etc.)

Language-Agnostic

- Starting with version 4.x, the IPython Notebook is part of Project Jupyter (IPython continues as the Python kernel)

- You can switch to other language kernels

- In this lecture we will use Python, but you can use any language you like to control Cytoscape

Question

• Cytoscape is a desktop application

• Point-and-click GUI operations

• Easy to use, but how can we make our workflow reproducible?

What is cyREST?

- A platform-independent, RESTful API module for Cytoscape

- You can access basic Cytoscape data objects programmatically

- Now a Cytoscape core feature!
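Because cyREST speaks plain HTTP, any HTTP client can talk to Cytoscape. A minimal standard-library sketch, assuming Cytoscape is open on the local machine with cyREST on port 1234 (the port used in these slides) and that `GET /v1/networks` returns the list of network SUIDs; `cyrest_get` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: Cytoscape is running locally with cyREST on port 1234.
BASE_URL = "http://localhost:1234/v1"


def cyrest_get(path, base_url=BASE_URL, timeout=5.0):
    """GET a cyREST resource and decode the JSON body.
    Returns None if Cytoscape cannot be reached."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except OSError:  # includes URLError (e.g. connection refused) and timeouts
        return None


if __name__ == "__main__":
    # Lists the SUIDs of networks in the current session, or None if not running.
    print(cyrest_get("networks"))
```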

[Figure: cyREST architecture. Cytoscape Desktop on your workstation (Core, Apps, and the REST module) and the Cytoscape App Store exchange data over the Internet ("data bus") with interactive data analysis environments (IPython Notebook, RStudio) and libraries such as NumPy, SciPy, pandas, NetworkX, igraph, and RCurl; command-line tools (sed, awk, grep, curl); web browsers; in-house databases; external computing resources (graph layout, statistical analysis, data pre-processing); file/code hosting services; public data repositories; data repository and collaboration services; PSICQUIC services; the EBI RDF Platform; and other bioinformatics web applications and services.]

REST API?

curl http://mygene.info/v2/query?q=kras

{
  "hits": [
    { "taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845", "name": "Kirsten rat sarcoma viral oncogene homolog" },
    { "taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653", "name": "Kirsten rat sarcoma viral oncogene homolog" },
    { "taxid": 10116, "entrezgene": 24525, "symbol": "Kras", "_id": "24525", "name": "Kirsten rat sarcoma viral oncogene" },
    { "taxid": 10090, "entrezgene": 110836, "symbol": "Kras2-rs2", "_id": "110836", "name": "Kirsten rat sarcoma oncogene 2, related sequence 2" },
    { "taxid": 10090, "entrezgene": 110832, "symbol": "Kras2-rs1", "_id": "110832", "name": "Kirsten rat sarcoma oncogene 2, related sequence 1" },
    { "taxid": 10090, "entrezgene": 111117, "symbol": "Kras1-ps", "_id": "111117", "name": "Kirsten rat sarcoma oncogene 1, pseudogene" }
  ],
  "max_score": 391.5175,
  "took": 4,
  "total": 6
}
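The same query can be made from Python with the standard library, and the JSON reply becomes ordinary dictionaries and lists. This sketch assumes the `v2` endpoint from the curl command still responds (the service may have moved to a newer API version since), so the part that runs offline parses two hits copied from the sample response instead of calling the service. `query_mygene` and `hits_for_taxon` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def query_mygene(term, timeout=10.0):
    """Send the same query as the curl command above and decode the JSON reply."""
    url = "http://mygene.info/v2/query?" + urllib.parse.urlencode({"q": term})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def hits_for_taxon(result, taxid):
    """Keep only hits from one species (9606 = human, 10090 = mouse)."""
    return [hit for hit in result["hits"] if hit["taxid"] == taxid]


# Two hits copied from the sample response above, so this part runs offline.
sample = {
    "hits": [
        {"taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845"},
        {"taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653"},
    ],
    "total": 6,
}

for hit in hits_for_taxon(sample, 9606):
    print(hit["symbol"], hit["entrezgene"])
```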

[Figure: clients send POST, PUT, DELETE, and GET requests to Cytoscape 3.1+ through cyREST]

How cyREST Works

Mapping Cytoscape API to HTTP Methods

Cytoscape Operation    HTTP Method
Create                 POST
Read                   GET
Update                 PUT
Delete                 DELETE
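In client code, the table above becomes a lookup from operation to HTTP method. The sketch below only *builds* `urllib` requests without sending them (sending the DELETE would remove network 52). It assumes cyREST accepts DELETE on `/v1/networks/{id}`, as the table suggests; `CRUD_TO_HTTP` and `build_request` are hypothetical names.

```python
import json
import urllib.request

# CRUD operation -> HTTP method, following the table above.
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, url, payload=None):
    """Build (but do not send) a urllib Request for a CRUD operation.
    A JSON payload, if given, becomes the request body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers,
                                  method=CRUD_TO_HTTP[operation])


read_req = build_request("read", "http://localhost:1234/v1/networks/52")
delete_req = build_request("delete", "http://localhost:1234/v1/networks/52")
print(read_req.get_method(), read_req.full_url)
print(delete_req.get_method(), delete_req.full_url)
```

Passing a request to `urllib.request.urlopen` would actually send it.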

Get full network with unique ID 52 as JSON

GET http://localhost:1234/v1/networks/52
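The response to that GET is JSON that Python can walk directly. This sketch assumes the body follows the Cytoscape.js-style layout, with nodes and edges under `elements` and each element's fields under `data`; check a real response before relying on it. The example network in the code is hand-written for illustration, and `fetch_network` / `summarize` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_network(suid, base_url="http://localhost:1234/v1", timeout=10.0):
    """GET one network by SUID and decode the JSON body (needs Cytoscape running)."""
    url = f"{base_url}/networks/{suid}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(network_json):
    """Return sorted node names and (source, target) name pairs,
    assuming Cytoscape.js-style JSON."""
    elements = network_json.get("elements", {})
    names = {n["data"]["id"]: n["data"].get("name", n["data"]["id"])
             for n in elements.get("nodes", [])}
    edges = [(names[e["data"]["source"]], names[e["data"]["target"]])
             for e in elements.get("edges", [])]
    return sorted(names.values()), edges


# Hand-written example in the assumed format, so the parser runs offline.
example = {
    "data": {"name": "example"},
    "elements": {
        "nodes": [{"data": {"id": "1", "name": "A"}},
                  {"data": {"id": "2", "name": "B"}}],
        "edges": [{"data": {"id": "3", "source": "1", "target": "2"}}],
    },
}
print(summarize(example))
```

With Cytoscape running, `summarize(fetch_network(52))` would apply the same parsing to a live network.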

Language-Specific Shims

• For Python

• For R
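A language-specific shim turns raw URLs into ordinary function calls. The class below is a hypothetical, deliberately tiny illustration of that idea in Python. It is not py2cytoscape's API (see the py2cytoscape link under Further Reading for the real library). The `opener` argument is an assumption added so the class can be exercised without a running Cytoscape.

```python
import json
import urllib.request


class CyRestShim:
    """Hypothetical minimal wrapper over cyREST: Python methods instead of
    hand-written URLs."""

    def __init__(self, host="localhost", port=1234, version="v1", opener=None):
        self.base_url = f"http://{host}:{port}/{version}"
        # `opener` takes a Request and returns decoded JSON; injectable for testing.
        self._open = opener or self._default_open

    @staticmethod
    def _default_open(request):
        with urllib.request.urlopen(request, timeout=10.0) as response:
            body = response.read()
            return json.loads(body.decode("utf-8")) if body else None

    def _call(self, method, path):
        request = urllib.request.Request(f"{self.base_url}/{path}", method=method)
        return self._open(request)

    def network_suids(self):
        """SUIDs of all networks in the current session."""
        return self._call("GET", "networks")

    def network(self, suid):
        """One network as decoded JSON."""
        return self._call("GET", f"networks/{suid}")

    def delete_network(self, suid):
        """Remove a network from the session."""
        return self._call("DELETE", f"networks/{suid}")
```

With Cytoscape running, `CyRestShim().network_suids()` would list the open networks; an R shim would wrap the same endpoints in R functions.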

- Lab notebook to record your workflow

- Make Cytoscape controllable via scripts

- Manage multiple versions of your notebooks and other scripts

Hands-On:

Using Cytoscape from Jupyter Notebook

Where should we go from here?

Missing: an environment to execute your workflow

Python 3.5.0

Ubuntu 15.04

pandas, NumPy, SciPy, Jupyter…

Docker as Portable Data Analysis Environment

[Figure: a virtual machine stack (bare metal machine, OS, virtual machine, frameworks, your app) compared with a Docker stack (bare metal machine, Linux OS, Docker, and several isolated containers, each holding its own frameworks and application)]

What is Docker?

- A container runs applications in an isolated environment

- An application is built from layers of images

- Shareable environments

- Environments as code

Docker Hub

- Sharing environments as code!

- Dockerfile - Definition of your container

- “GitHub of Images”

Jupyter Official Images

Resources

- https://www.dataquest.io/blog/docker-data-science/

- https://try.jupyter.org/

Getting Help

- Two Google Groups

  - [email protected]

  - [email protected]

- ANY question is OK!

Further Reading

• My presentation slides

• http://www.slideshare.net/keiono

• cyREST web sites

• http://apps.cytoscape.org/apps/cyrest

• https://github.com/idekerlab/cyREST/wiki

• py2cytoscape — https://github.com/idekerlab/py2cytoscape