CYTO 2012

Web-based SPADE for extracting a cellular hierarchy from multidimensional fluorescence and mass cytometry datasets

Jonathan Irish1,2, Zach Bjørnson3, Robert Bruggner3, Michael Linderman4, Chad Rosenberg1, Nikesh Kotecha1,3

1Cytobank, Inc, Mountain View, CA 94040, 2Vanderbilt University, Nashville, TN 37220, 3Stanford University, Stanford, CA 94305, 4Mount Sinai School of Medicine, New York, NY 10029

Cloud Computing & SPADE

Interpreting SPADE Trees

Introduction and Aims

Overall goal: Develop a SPADE algorithm for discovery of cell populations in fluorescence and mass cytometry datasets and deploy it in the Cytobank web-based cloud computing environment.

SPADE = Spanning-tree Progression Analysis of Density-normalized Events, a technique for automated population identification and visualization.

Aims:

(1) Automatically identify populations of cells in heterogeneous samples, including rare subsets.

(2) Visualize markers measured across subsets of cells and samples, and organize populations by phenotype.

(3) Provide an interface that connects investigators to datasets and cloud computing resources.

SPADE Interface

SPADE Analysis of Fluorescence Cytometry

Optimizing Scales for Computational Analysis

SPADE Analysis of Mass Cytometry


Conclusions

1) Cloud computing enables analysis on a remote server without monopolizing local resources & links analysis results with raw data files.

2) Channel-specific scaling is critical prior to analysis by SPADE or other computational tools, especially for fluorescence cytometry.

3) These results create a cloud-based framework for integrating flow cytometry analysis algorithms, tools, and visualizations.

For More Information

Irish Lab

SPADE

Mass Cytometry

[email protected]/irishlab

SPADE on Cytobank [email protected]

www.cytobank.org

Imagine the “OK sign” as a 3D shape

SPADE will cluster and then arrange the clusters in a 2D tree

Some 2D trees may be intuitive, while others may not be

Steps shown: Clustering, Minimum Spanning Tree

Tree A: 2D tree “close” to original shape
Tree B: Different branch positioning
Tree C: Different breakpoint

Appropriate channel-specific scaling is essential

Selecting nodes or a bubble displays those events in a 2D plot

Rare populations are automatically identified

SPADE trees depict multidimensional similarity in two dimensions
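
To make the idea of arranging clusters in a tree concrete, here is a minimal sketch, not the SPADE implementation itself. It connects a handful of made-up cluster centroids with a minimum spanning tree using Prim's algorithm. The centroid values and the function name `minimum_spanning_tree` are illustrative assumptions and do not come from the poster.

```python
import math


def minimum_spanning_tree(points):
    """Return MST edges (i, j, distance) over points using Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    # best[j] = (distance from j to the growing tree, nearest tree node)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        # Attach the node that is currently closest to the tree.
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        # The new tree node may offer shorter links to the remaining nodes.
        for k in best:
            d_new = math.dist(points[j], points[k])
            if d_new < best[k][0]:
                best[k] = (d_new, j)
    return edges


# Hypothetical cluster centroids in a 3-marker space (transformed units).
centroids = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.0, 3.0, 0.0),
]
tree = minimum_spanning_tree(centroids)
for i, j, d in tree:
    print(f"cluster {i} -- cluster {j}: distance {d:.2f}")
```

A tree over n clusters always has n - 1 edges. The edges record which clusters are most similar in the full marker space; how the tree is drawn in 2D is a separate layout choice, which is why the same tree can be drawn with different branch positions.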

SPADE Controls on Cytobank

Set target clusters - target number of nodes in the tree
Set downsampling - by percentile or # of cells
Optional: pre-gating - remove dead cells, set scales
Choose channels - can cluster using all or a subset
Optional: group samples - used to set ‘basal’ for fold change calculations

Pre-gating PBMC for CD45+ single cells

For fluorescent flow cytometry data a biexponential or arcsinh transformation corrects the scale near zero.

A 50:50 mix of + and - events stained only for PerCP-Cy5.5 is shown using different scales.

Min      Cofactor   Result
-3000    2500       Good
-3000    500        Poor
-3000    150        Bad
-3000    1          Very Bad
1        2500       Bad
-3000    10,000     Poor

Issue labels shown: Under-transformed, Off scale, Over-transformed

Qiu P et al., Nature Biotechnology 2011

1) Get High-D Cytometry Data

2) Downsample (preserves rare subsets)

3) Cluster (group by similarity)

4) Project into 2D (minimum spanning tree)
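
The downsampling step can be illustrated with a small sketch. This is a simplified stand-in for density-dependent downsampling, not the published SPADE code. Each event is kept with probability min(1, target density / local density), so crowded regions are thinned while sparse regions are left intact. The function names, the neighbor-counting radius, the target density, and the toy data are all assumptions chosen for illustration.

```python
import random


def local_density(events, radius):
    """Count, for each event, the events within `radius` (including itself)."""
    r2 = radius * radius
    counts = []
    for a in events:
        n = sum(
            1 for b in events
            if sum((x - y) ** 2 for x, y in zip(a, b)) <= r2
        )
        counts.append(n)
    return counts


def density_downsample(events, radius, target_density, seed=0):
    """Keep each event with probability min(1, target_density / density)."""
    rng = random.Random(seed)
    densities = local_density(events, radius)
    kept = []
    for event, density in zip(events, densities):
        if rng.random() < min(1.0, target_density / density):
            kept.append(event)
    return kept


# Toy data (hypothetical): 500 events in an abundant population, 10 in a rare one.
data_rng = random.Random(1)
abundant = [(data_rng.gauss(0.0, 0.1), data_rng.gauss(0.0, 0.1)) for _ in range(500)]
rare = [(data_rng.gauss(5.0, 0.1), data_rng.gauss(5.0, 0.1)) for _ in range(10)]
kept = density_downsample(abundant + rare, radius=0.5, target_density=10)

kept_rare = [e for e in kept if e[0] > 2.5]
kept_abundant = [e for e in kept if e[0] <= 2.5]
print(len(kept_abundant), "abundant events kept;", len(kept_rare), "rare events kept")
```

In this toy case every rare event has a local density at or below the target, so all of them survive, while the abundant population is cut to a small fraction of its original size. After downsampling the two populations contribute comparable numbers of events to clustering.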

Dataset from Engelhardt BG et al., Blood 2012


Cloud computing is needed as cytometry experiments increase in power

Growing needs:

1) Connect flow cytometry tools and knowledge

2) Communicate results and share data files

3) Provide key details (compensation, scales) to computational collaborators

4) Apply new algorithms & visualizations that run best with significant compute resources

[Figure: Future of Flow. Axes: Sample Number, Features Measured. Labels: Routine, Cutting Edge, Bleeding Edge.]

[Figure: SPADE tree panels for Markers 1 to 11, with the populations labeled in each panel.]

Marker 1: Nucleic Acid. Singlet PBMCs.

Marker 2: CD45. CD45+ PBMCs.

Marker 3: CD3. CD3+ CD45+; CD3- CD45+.

Marker 4: CD4. CD3+ CD45+; CD3- CD45+; CD4+ T cells (CD4+ CD3+ CD45+); Monocytes & DCs (CD4lo CD3- CD45+).

Marker 5: CD8. CD3- CD45+; CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); Monocytes & DCs (CD4lo CD3- CD45+).

Marker 6: CD16. CD3- CD45+; CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); NK cells (CD16+ CD3- CD45+); MDC Group A (CD16lo CD4lo CD3- CD45+); MDC Group B (CD16+ CD4lo CD3- CD45+).

Marker 7: CD19. CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); NK cells (CD16+ CD3- CD45+); B cells (CD19+ CD3- CD16- CD4- CD45+); MDC Group A (CD16lo CD4lo CD3- CD45+); MDC Group B (CD16+ CD4lo CD3- CD45+).

Marker 8: CD20. CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); NK cells (CD16+ CD3- CD45+); B cells (CD19+ CD20+ CD3- CD16- CD4- CD45+); MDC Group A (CD16lo CD4lo CD3- CD45+); MDC Group B (CD16+ CD4lo CD3- CD45+).

Marker 9: CD14. CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); NK cells (CD16+ CD3- CD45+); B cells (CD19+ CD20+ CD3- CD16- CD4- CD45+); Monocytes (CD14+ CD16lo CD4lo CD3- CD45+); MDC Group B (CD14- CD16+ CD4lo CD3- CD45+).

Marker 10: CD33. CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); NK cells (CD16+ CD3- CD45+); B cells (CD19+ CD20+ CD3- CD16- CD4- CD45+); Monocytes (CD33+ CD14+ CD16lo CD4lo CD3- CD45+); MDC Group B (CD33lo CD14- CD16+ CD4lo CD3- CD45+).

Marker 11: CD123. CD4+ T cells (CD4+ CD3+ CD45+ CD8-); CD8+ T cells (CD8+ CD3+ CD45+ CD4-); NK cells (CD16+ CD3- CD45+); B cells (CD19+ CD20+ CD3- CD16- CD4- CD45+); Monocytes (CD33+ CD14+ CD16lo CD123lo CD4lo CD3- CD45+); Dendritic Cells (CD123+ CD33lo CD14- CD16+ CD4lo CD3- CD45+).

Summary of Major Cell Types Grouped by SPADE (Human PBMC)

[Figure labels: NK cells, B cells, CD4+ T cells, CD8+ T cells, Dendritic Cells, Monocytes. Merged Nodes labels: Mono, DC, NK, B, CD8+ T, CD4+ T.]

20 Markers: Nucleic Acid (Ir), CD45, CD3, CD4, CD8, CD16, CD19, CD20, CD11b, CD123, CD45RA, CD33, CD11c, CD14, CD56, CD38, CD15, CD10, CD44, & HLA-DR

All available commercially from DVS Sciences

Gating labels: All Human PBMC Events; Singlet PBMCs; CD45+ PBMCs

Nucleic Acid+ CD45+ singlets from human PBMC were clustered using 18 measured surface markers and arranged into a 400-node SPADE tree. Eleven of the markers are shown (Markers 1 to 11).

Major populations included CD4+ and CD8+ T cells, B cells, NK cells, monocytes, and dendritic cells.

Here, CD8 was under-scaled, leaving an artificial ‘hole’ in the graph around zero. This created the false impression of two CD8 populations in a sample gated as CD8 negative, and SPADE treated the split as a significant difference.

Comparison of scaling for CD8 on cells gated as CD8- (panels: Corrected Scale, Incorrect Scale)

CD8 was measured on PE-Cy5

Computational analysis techniques compare distances much as a person does when looking at a plot, so they can identify artificial populations near zero if data are not appropriately transformed prior to analysis.

arcsinh (inverse hyperbolic sine) scale: http://mathworld.wolfram.com/InverseHyperbolicSine.html

To analyze fluorescent datasets with SPADE, Cytobank applies the channel-specific scales set during routine analysis (gating, making figures).

Without channel-specific scaling, population artifacts can arise (see CD8 example to the right).
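
The effect of the cofactor can be seen with a short sketch. It assumes the arcsinh scale is computed as arcsinh(value / cofactor), which is the usual convention but is not spelled out in this transcript. The raw intensities are hypothetical, and the cofactors 1, 150, and 2500 are reused from the scale comparison on this poster. The helper names `arcsinh_scale` and `zero_spread_fraction` are illustrative.

```python
import math


def arcsinh_scale(value, cofactor):
    """Transform a raw intensity as arcsinh(value / cofactor)."""
    return math.asinh(value / cofactor)


def zero_spread_fraction(cofactor):
    """Fraction of the displayed range taken up by values from -300 to 300.

    The range runs from a raw value of -300 up to a bright event at 20000.
    """
    lo, hi, top = (arcsinh_scale(v, cofactor) for v in (-300.0, 300.0, 20000.0))
    return (hi - lo) / (top - lo)


# Hypothetical raw intensities: noise around zero plus one bright event.
raw = [-300.0, -50.0, 0.0, 50.0, 300.0, 20000.0]

for cofactor in (1, 150, 2500):
    scaled = [round(arcsinh_scale(v, cofactor), 2) for v in raw]
    fraction = zero_spread_fraction(cofactor)
    print(f"cofactor {cofactor}: {scaled}, near-zero spread {fraction:.2f}")
```

With a very small cofactor, values that are only measurement noise around zero are stretched across most of the axis, so a distance-based method can read them as distinct populations. A larger cofactor keeps the same values compressed into a narrow band. Which cofactor is appropriate depends on the channel, which is why the scales are set per channel.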

Analysis of CD4+ T cell subsets in post-transplant diabetes mellitus patients based on SPADE analysis of Foxp3, CD25, CD127, CD45RO, α4β7 integrin (gut homing), and CLA (skin homing).

Rare gut and skin homing subpopulations of Foxp3+ CD25+ CD4+ T cells (~0.02% of total; 200 cells in 1 million) were identified by SPADE.

After the SPADE analysis completes, you can interact with the tree by selecting one or more nodes (populations of cells) and naming them according to phenotype.

The tree to the right shows CD4 expression on CD45+ PBMC and the user creating a population named “CD4 T cells”.

A small analysis of 18 features on <100,000 cells (as with the mass cytometry file below) takes ~3 min. Larger experiments with millions of cells and multiple samples can take 1 hour or more (as with the fluorescent dataset to the right). SPADE runs on a server, so you can close the browser window after initiating the analysis. When the run is complete, Cytobank sends an email with a link to the results.

arcsinh(x) with cofactor c: $\operatorname{arcsinh}(x/c) = \ln\left(\frac{x}{c} + \sqrt{\left(\frac{x}{c}\right)^2 + 1}\right)$

[Figure panels: CD45RO, CLA (skin homing), Foxp3, α4β7 integrin (gut homing), CD25, CD127, CD4]

No nodes selected, all cells plotted

User selects Foxp3+ CD45RO+ CD4 Treg nodes

Gating labels: All Human PBMC Events; Intact PBMCs; CD14- Viable PBMCs; CD3+ CD4+
