
CYTO 2012

Web-based SPADE for extracting a cellular hierarchy from multidimensional fluorescence and mass cytometry datasets

Jonathan Irish (1,2), Zach Bjørnson (3), Robert Bruggner (3), Michael Linderman (4), Chad Rosenberg (1), Nikesh Kotecha (1,3)

(1) Cytobank, Inc, Mountain View, CA 94040; (2) Vanderbilt University, Nashville, TN 37220; (3) Stanford University, Stanford, CA 94305; (4) Mount Sinai School of Medicine, New York, NY 10029

Cloud Computing & SPADE

Interpreting SPADE Trees

Introduction and Aims

Overall goal: Develop a SPADE algorithm for discovery of cell populations in fluorescence and mass cytometry datasets and deploy it in the Cytobank web-based cloud computing environment.

SPADE = Spanning-tree Progression Analysis of Density-normalized Events, a technique for automated population identification and visualization.

Aims:

(1) Automatically identify populations of cells in heterogeneous samples, including rare subsets.

(2) Visualize markers measured across subsets of cells and samples and organize populations by phenotype.

(3) Provide an interface that connects investigators to datasets and cloud computing resources.

SPADE Interface

SPADE Analysis of Fluorescence Cytometry

Optimizing Scales for Computational Analysis

SPADE Analysis of Mass Cytometry

Conclusions

1) Cloud computing enables analysis on a remote server without monopolizing local resources and links analysis results with raw data files.

2) Channel-specific scaling is critical prior to analysis by SPADE or other computational tools, especially for fluorescence cytometry.

3) These results create a cloud-based framework for integrating flow cytometry analysis algorithms, tools, and visualizations.

For More Information

Irish Lab (SPADE, mass cytometry): [email protected], my.vanderbilt.edu/irishlab

SPADE on Cytobank: [email protected], www.cytobank.org

Imagine the “OK sign” as a 3D shape.

SPADE will cluster and then arrange the clusters in a 2D tree.

Some 2D trees may be intuitive, while others may not be.

Figure panels: Clustering; Minimum Spanning Tree.

Appropriate channel-specific scaling is essential

Selecting nodes or a bubble displays those events in a 2D plot

Rare populations are automatically identified

Tree A: 2D tree “close” to original shape. Tree B: different branch positioning. Tree C: different breakpoint.

SPADE trees depict multidimensional similarity in two dimensions.
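To make the tree-building step concrete, here is a minimal Python sketch of a minimum spanning tree over cluster centers. It assumes Euclidean distance and Prim's algorithm; the poster does not say which distance or MST algorithm SPADE uses, and the cluster coordinates below are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of Euclidean distances.

    Returns a list of (parent_index, child_index, distance) edges.
    """
    if not points:
        return []
    # best[j] = (shortest known distance from j to the tree, tree node giving it)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, len(points))}
    edges = []
    while best:
        # Attach the node that is currently closest to the growing tree.
        j = min(best, key=lambda k: best[k][0])
        distance, parent = best.pop(j)
        edges.append((parent, j, distance))
        # The newly attached node may offer shorter links to the remaining nodes.
        for k in best:
            candidate = math.dist(points[j], points[k])
            if candidate < best[k][0]:
                best[k] = (candidate, j)
    return edges

# Hypothetical cluster medians in three marker dimensions (illustration only).
clusters = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
tree_edges = minimum_spanning_tree(clusters)
for parent, child, distance in tree_edges:
    print(f"cluster {parent} -- cluster {child}: {distance:.2f}")
```

The tree only records which clusters are joined and how far apart they are. Where the branches are drawn in 2D is a layout choice, which is one reason the same data can yield trees that look different.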

SPADE Controls on Cytobank

Set target clusters
- target number of nodes in the tree

Set downsampling
- by percentile or # of cells

Optional: pre-gating
- remove dead cells, set scales

Choose channels
- can cluster using all or a subset

Optional: group samples
- used to set ‘basal’ for fold change calculations

Pre-gating PBMC for CD45+ single cells

For fluorescent flow cytometry data a biexponential or arcsinh transformation corrects the scale near zero.

A 50:50 mix of + and - events stained only for PerCP-Cy5.5 is shown using different scales.

Min      Cofactor   Result
-3000    2500       Good
-3000    500        Poor
-3000    150        Bad
-3000    1          Very Bad
1        2500       Bad
-3000    10,000     Poor

Issue labels shown in the figure: Under-transformed, Off scale, Over-transformed.
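The effect of the cofactor can be sketched in a few lines of Python. This assumes the common cytometry convention of computing arcsinh(x / c) for cofactor c, which the poster does not spell out; the raw intensities are hypothetical, and the cofactors are the ones listed in the scale comparison.

```python
import math

def arcsinh_scale(value, cofactor):
    """Assumed convention: arcsinh(value / cofactor).

    Roughly linear while |value| is small relative to the cofactor,
    roughly logarithmic once |value| is much larger than the cofactor.
    """
    return math.asinh(value / cofactor)

# Hypothetical raw intensities: noise on either side of zero and one bright event.
raw_values = [-300, 0, 300, 30000]

for cofactor in (1, 150, 500, 2500, 10000):
    scaled = [round(arcsinh_scale(v, cofactor), 2) for v in raw_values]
    print(f"cofactor {cofactor:>5}: {scaled}")
```

A small cofactor pushes values just below and just above zero far apart, while a very large one squeezes the whole range toward a straight line, so the choice has to be made per channel.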

Qiu P et al., Nature Biotechnology 2011

1) Get high-D cytometry data

2) Downsample (preserves rare subsets)

3) Cluster (group by similarity)

4) Project into 2D (minimum spanning tree)
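The downsampling step is the one that protects rare subsets, and the idea can be sketched as follows. This is a simplified illustration, not the published rule from Qiu et al.: the density estimate (neighbor count within a radius), the keep probability, and every parameter name and value here are assumptions.

```python
import math
import random

def local_density(events, radius):
    """For each event, count the events within `radius` of it (itself included)."""
    return [sum(1 for q in events if math.dist(p, q) <= radius) for p in events]

def density_dependent_downsample(events, radius, target_density, seed=0):
    """Keep each event with probability min(1, target_density / local density).

    Events in sparse regions are always kept; events in crowded regions are thinned.
    """
    rng = random.Random(seed)
    kept = []
    for event, density in zip(events, local_density(events, radius)):
        keep_probability = min(1.0, target_density / density)
        if rng.random() < keep_probability:
            kept.append(event)
    return kept

# Hypothetical data: an abundant population of 500 events and a rare one of 5.
rng = random.Random(1)
abundant = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(500)]
rare = [(5 + rng.gauss(0, 0.1), 5 + rng.gauss(0, 0.1)) for _ in range(5)]

sample = density_dependent_downsample(abundant + rare, radius=1.0, target_density=5)
rare_kept = sum(1 for event in sample if event in rare)
print(f"kept {len(sample)} of {len(abundant) + len(rare)} events; rare events kept: {rare_kept}")
```

Uniform random downsampling at the same rate would keep about 1% of every population and would usually lose the rare one entirely. Thinning by density keeps it intact so that the clustering step can still find it.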

Dataset from Engelhardt BG et al., Blood 2012


Cloud computing is needed as cytometry experiments increase in power

Growing needs:

1) Connect flow cytometry tools and knowledge

2) Communicate results and share data files

3) Provide key details (compensation, scales) to computational collaborators

4) Apply new algorithms & visualizations that run best with significant compute resources

[Figure: Sample Number (10, 50, 100, 1000+) versus Features Measured (4, 10, 20, 50+), with regions labeled Routine, Cutting Edge, Bleeding Edge, and Future of Flow]

Marker 1: Nucleic Acid (3.51)
- Singlet PBMCs

Marker 2: CD45 (5.15)
- CD45+ PBMCs

Marker 3: CD3 (3.05)
- CD3+ CD45+
- CD3- CD45+

Marker 4: CD4 (3.01)
- CD3+ CD45+
- CD3- CD45+
- Monocytes & DCs: CD4lo CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+

Marker 5: CD8 (3.38)
- CD3- CD45+
- Monocytes & DCs: CD4lo CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Marker 6: CD16 (2.55)
- CD3- CD45+
- MDC Group B: CD16+ CD4lo CD3- CD45+
- MDC Group A: CD16lo CD4lo CD3- CD45+
- NK cells: CD16+ CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Marker 7: CD19 (4.03)
- B cells: CD19+ CD3- CD16- CD4- CD45+
- MDC Group B: CD16+ CD4lo CD3- CD45+
- MDC Group A: CD16lo CD4lo CD3- CD45+
- NK cells: CD16+ CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Marker 8: CD20 (4.02)
- B cells: CD19+ CD20+ CD3- CD16- CD4- CD45+
- MDC Group B: CD16+ CD4lo CD3- CD45+
- MDC Group A: CD16lo CD4lo CD3- CD45+
- NK cells: CD16+ CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Marker 9: CD14 (2.13)
- Monocytes: CD14+ CD16lo CD4lo CD3- CD45+
- MDC Group B: CD14- CD16+ CD4lo CD3- CD45+
- B cells: CD19+ CD20+ CD3- CD16- CD4- CD45+
- NK cells: CD16+ CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Marker 10: CD33 (1.10)
- Monocytes: CD33+ CD14+ CD16lo CD4lo CD3- CD45+
- MDC Group B: CD33lo CD14- CD16+ CD4lo CD3- CD45+
- B cells: CD19+ CD20+ CD3- CD16- CD4- CD45+
- NK cells: CD16+ CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Marker 11: CD123 (1.99)
- Dendritic Cells: CD123+ CD33lo CD14- CD16+ CD4lo CD3- CD45+
- Monocytes: CD33+ CD14+ CD16lo CD123lo CD4lo CD3- CD45+
- B cells: CD19+ CD20+ CD3- CD16- CD4- CD45+
- NK cells: CD16+ CD3- CD45+
- CD4+ T cells: CD4+ CD3+ CD45+ CD8-
- CD8+ T cells: CD8+ CD3+ CD45+ CD4-

Summary of Major Cell Types Grouped by SPADE (Human PBMC)

Populations labeled on the tree: NK cells, B cells, CD4+ T cells, CD8+ T cells, Dendritic Cells (two groups), Monocytes.

Merged Nodes: Mono, DC, DC, NK, B, CD8+ T, CD4+ T.

20 Markers: Nucleic Acid (Ir), CD45, CD3, CD4, CD8, CD16, CD19, CD20, CD11b, CD123, CD45RA, CD33, CD11c, CD14, CD56, CD38, CD15, CD10, CD44, & HLA-DR

All available commercially from DVS Sciences

Gates shown: All Human PBMC Events; Singlet PBMCs; CD45+ PBMCs

Nucleic Acid+ CD45+ singlets from human PBMC were clustered using 18 measured surface markers and arranged into a 400-node SPADE tree. 11 markers are shown below.

Major populations included CD4+ and CD8+ T cells, B cells, NK cells, monocytes, and dendritic cells.

Here, CD8 was under-scaled so that an artificial ‘hole’ in the graph existed around zero. This created the false impression of two CD8 populations in this sample, which was gated as CD8 negative. SPADE treated this as a significant difference.

Comparison of CD8 scaling on cells gated as CD8-. CD8 was measured on PE-Cy5.

Figure panels: Corrected Scale; Incorrect Scale.

Because computational analysis techniques compare distances, much as a person does when looking at a plot, they can identify artificial populations near zero if data are not appropriately transformed prior to analysis.
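The artifact can be reproduced with simulated data. The Python sketch below assumes that the ‘hole’ comes from an arcsinh cofactor that is too small for the noise around zero. The negative population, its spread, and the bin edges are hypothetical; the cofactor 2500 is borrowed from the "Good" row of the scale comparison.

```python
import math
import random

def histogram(values, low, high, bins):
    """Count values in equal-width bins spanning [low, high)."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = math.floor((v - low) / width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts

# Hypothetical negative population: a single cluster of noise centered on zero.
rng = random.Random(0)
negative_events = [rng.gauss(0, 100) for _ in range(1000)]

def scaled_histogram(cofactor):
    """Histogram of arcsinh(value / cofactor) in 7 bins over [-7, 7)."""
    scaled = [math.asinh(v / cofactor) for v in negative_events]
    return histogram(scaled, -7.0, 7.0, 7)

under_scaled = scaled_histogram(1)     # cofactor far too small for this noise
well_scaled = scaled_histogram(2500)   # one well-chosen cofactor for comparison
print("cofactor 1:   ", under_scaled)
print("cofactor 2500:", well_scaled)
```

With the small cofactor the central bin is nearly empty and the events pile up on either side of zero, so a distance-based method sees two groups where only one population exists. With the larger cofactor the same events fall into a single central bin.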

arcsinh (inverse hyperbolic sine) scale: http://mathworld.wolfram.com/InverseHyperbolicSine.html

In order to analyze fluorescent datasets with SPADE, Cytobank applies channel-specific scales set during routine analysis (gating, making figures).

Without channel-specific scaling, population artifacts can arise (see CD8 example to the right).

Analysis of CD4+ T cell subsets in post-transplant diabetes mellitus patients based on SPADE analysis of Foxp3, CD25, CD127, CD45R0, α4β7 integrin (gut homing), and CLA (skin homing).

Rare gut and skin homing subpopulations of Foxp3+ CD25+ CD4+ T cells (~0.02% of total; 200 cells in 1 million) were identified by SPADE.

After the SPADE analysis completes you can interact with the tree by selecting one or more nodes (populations of cells) and naming them according to phenotype.

The tree to the right shows CD4 expression on CD45+ PBMC and the user creating a population named “CD4 T cells”.

A small analysis of 18 features on <100,000 cells (as with the mass cytometry file below) takes ~3 min. Larger experiments with millions of cells and multiple samples can take 1 hour or more (as with the fluorescent dataset to the right). SPADE runs on a server and you can close the browser window after initiating the analysis. When the run is complete Cytobank sends an email with a link to the results.

arcsinh(x) with cofactor c =

Tree panels: CD45R0, CLA (skin homing), Foxp3, α4β7 integrin (gut homing), CD25, CD127, CD4

Plot captions: No nodes selected, all cells plotted; User selects Foxp3+ CD45R0+ CD4 Treg nodes

Gates shown: All Human PBMC Events; Intact PBMCs; CD14- Viable PBMCs; CD3+ CD4+