multiplex single-molecule interaction profiling of dna...

14
LETTER doi:10.1038/nature13761 Multiplex single-molecule interaction profiling of DNA-barcoded proteins Liangcai Gu 1 , Chao Li 2 , John Aach 1 , David E. Hill 1,3 , Marc Vidal 1,3 & George M. Church 1,2 In contrast with advances in massively parallel DNA sequencing 1 , high-throughput protein analyses 2–4 are often limited by ensemble measurements, individual analyte purification and hence compro- mised quality and cost-effectiveness. Single-molecule protein detec- tion using optical methods 5 is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular- interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display 6 or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) 7 and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square mil- limetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables ‘library versus library’ screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. To analyse proteins in a massively parallel single-molecule format, we generated proteins that are molecularly coupled to a DNA bearing a barcoding sequence. One barcoding approach is to translate and display proteins on protein–ribosome–messenger-RNA–complementary-DNA (PRMC) complexes in vitro, in which the cDNA contains a synthetic barcode at the 59 end of protein open reading frames (ORFs) (Fig. 1a). Specifically, the ribosome display was performed by using mRNA–cDNA hybrids as templates and an in vitro translation (IVT) system recon- stituted with purified components 8 that was shown to stabilize PRMC complexes (Extended Data Fig. 1). PRMC complexes bearing full-length proteins of interest were enriched by Flag-tag affinity purification. No- tably, this approach is applicable to a library of proteins of various sizes and size-related biases during decoding can be avoided by using uni- formly sized barcoding DNAs. Alternatively, some proteins that can only be functionally expressed in vivo require individual barcoding. Thus, fu- sion proteins were constructed with an engineered enzyme tag, HaloTag 9 , which mediates an efficient covalent conjugation to a HaloTag-ligand- modified double-stranded DNA (Fig. 1b). Our method is adaptable to a microtitre plate format for automated parallel protein production (Ex- tended Data Fig. 2). A complex mixture of barcoded proteins can be identified and quan- tified by in situ sequencing of their barcodes (Fig. 2a). The proteins were immobilized into an ultrathin layer of crosslinked polyacrylamide gel attached to a microscopic slide, and their barcoding DNAs bearing a59-acrydite modification (Fig. 1) were covalently crosslinked to the gel matrix to prevent template drifting (Extended Data Fig. 3). A solid-phase PCR, with two gel-anchored primers, was performed according to an adapted isothermal bridge amplification protocol 10 in an assembled flow cell. This amplification showed a high efficiency of ,80% barcode detection (Extended Data Fig. 4a), and resulted in polonies of ,1 mm diameter (Fig. 2b), similar to the clusters generated on an Illumina platform 10 . Polonies were identified by hybridization with fluorescent probes, single-base extension (SBE) or ligation-based sequencing 11 . To test the accuracy of our method, we selected nine immunoglob- ulin and non-immunoglobulin binding proteins and three antigens (for example, human, bacterial and viral proteins) of a molecular weight ranging from 3.4 to 120 kDa (Extended Data Table 1). Mixed PRMC complexes were prepared in six barcoded dilutions, with concentra- tions spanning six orders of magnitude, pooled together and subjected to the single-molecule quantification. Barcode detection efficiencies of different proteins were found to be almost identical at various concen- trations (Extended Data Fig. 4). The in situ single-molecule quantifi- cation can avoid PCR amplification bias 12 and shows high reproducibility; the Pearson correlation coefficient r was above 0.99 when over 1,000 protein polonies were detected (Fig. 2c). Because proteins were highly diluted (at less than picomolar concentrations) before array deposition, protein monomers should be the predominant form. Interacting barcoded proteins can be indirectly detected by joining their barcoding DNAs via ligation 13,14 or primer extension 15 . Here, direct observation and counting of single-molecule protein complexes should 1 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 2 Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts 02115, USA. 3 Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA. a b HaloTag-labelled protein Protein–DNA conjugate Individual protein expression and purification 2) In vitro transcription mRNA–cDNA hybrids 1) In vitro translation 2) Flag tag affinity purification 1) PCR Reverse transcription Barcoded ORF library Barcode T7 promoter RBS TolA spacer Flag tag SecM peptide 555333Protein of interest mRNAs Universal priming site for reverse transcription PRMC complexes Flag or His tag Desthiobiotin modification Acrydite modification 555333Figure 1 | Schematics of protein barcoding methods. a, Collective barcoding via ribosome display. A short synthetic barcoding sequence is joined to the 59 end of DNA templates via PCR. PRMC complexes are formed via ribosome stalling triggered by a carboxy-terminal Escherichia coli SecM peptide. Displayed proteins bearing a C-terminal Flag tag are separated from the ribosomes by an E. coli TolA spacer domain. RBS, ribosomal binding site. b, Individual barcoding via a HaloTag-mediated conjugation of proteins to a 220-base-pair (bp) double-stranded barcoding DNA with a HaloTag ligand modification (black triangle). Modifications are introduced to barcoding DNAs by PCR with modified primers. 00 MONTH 2014 | VOL 000 | NATURE | 1 Macmillan Publishers Limited. All rights reserved ©2014

Upload: trinhmien

Post on 17-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

LETTERdoi:10.1038/nature13761

Multiplex single-molecule interaction profiling ofDNA-barcoded proteinsLiangcai Gu1, Chao Li2, John Aach1, David E. Hill1,3, Marc Vidal1,3 & George M. Church1,2

In contrast with advances in massively parallel DNA sequencing1,high-throughput protein analyses2–4 are often limited by ensemblemeasurements, individual analyte purification and hence compro-mised quality and cost-effectiveness. Single-molecule protein detec-tion using optical methods5 is limited by the number of spectrallynon-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel proteininteraction profiling leveraging single-molecule advantages. DNAbarcodes are attached to proteins collectively via ribosome display6

or individually via enzymatic conjugation. Barcoded proteins areassayed en masse in aqueous solution and subsequently immobilizedin a polyacrylamide thin film to construct a random single-moleculearray, where barcoding DNAs are amplified into in situ polymerasecolonies (polonies)7 and analysed by DNA sequencing. This methodallows precise quantification of various proteins with a theoreticalmaximum array density of over one million polonies per square mil-limetre. Furthermore, protein interactions can be measured on thebasis of the statistics of colocalized polonies arising from barcodingDNAs of interacting proteins. Two demanding applications, G-proteincoupled receptor and antibody-binding profiling, are demonstrated.SMI-seq enables ‘library versus library’ screening in a one-pot assay,simultaneously interrogating molecular binding affinity and specificity.

To analyse proteins in a massively parallel single-molecule format,we generated proteins that are molecularly coupled to a DNA bearing abarcoding sequence. One barcoding approach is to translate and displayproteins on protein–ribosome–messenger-RNA–complementary-DNA(PRMC) complexes in vitro, in which the cDNA contains a syntheticbarcode at the 59 end of protein open reading frames (ORFs) (Fig. 1a).Specifically, the ribosome display was performed by using mRNA–cDNAhybrids as templates and an in vitro translation (IVT) system recon-stituted with purified components8 that was shown to stabilize PRMCcomplexes (Extended Data Fig. 1). PRMC complexes bearing full-lengthproteins of interest were enriched by Flag-tag affinity purification. No-tably, this approach is applicable to a library of proteins of various sizesand size-related biases during decoding can be avoided by using uni-formly sized barcoding DNAs. Alternatively, some proteins that can onlybe functionally expressed in vivo require individual barcoding. Thus, fu-sion proteins were constructed with an engineered enzyme tag, HaloTag9,which mediates an efficient covalent conjugation to a HaloTag-ligand-modified double-stranded DNA (Fig. 1b). Our method is adaptable toa microtitre plate format for automated parallel protein production (Ex-tended Data Fig. 2).

A complex mixture of barcoded proteins can be identified and quan-tified by in situ sequencing of their barcodes (Fig. 2a). The proteinswere immobilized into an ultrathin layer of crosslinked polyacrylamidegel attached to a microscopic slide, and their barcoding DNAs bearinga 59-acrydite modification (Fig. 1) were covalently crosslinked to the gelmatrix to prevent template drifting (Extended Data Fig. 3). A solid-phasePCR, with two gel-anchored primers, was performed according to anadapted isothermal bridge amplification protocol10 in an assembled flowcell. This amplification showed a high efficiency of ,80% barcode

detection (Extended Data Fig. 4a), and resulted in polonies of ,1 mmdiameter (Fig. 2b), similar to the clusters generated on an Illuminaplatform10. Polonies were identified by hybridization with fluorescentprobes, single-base extension (SBE) or ligation-based sequencing11.

To test the accuracy of our method, we selected nine immunoglob-ulin and non-immunoglobulin binding proteins and three antigens (forexample, human, bacterial and viral proteins) of a molecular weightranging from 3.4 to 120 kDa (Extended Data Table 1). Mixed PRMCcomplexes were prepared in six barcoded dilutions, with concentra-tions spanning six orders of magnitude, pooled together and subjectedto the single-molecule quantification. Barcode detection efficiencies ofdifferent proteins were found to be almost identical at various concen-trations (Extended Data Fig. 4). The in situ single-molecule quantifi-cation can avoid PCR amplification bias12 and shows high reproducibility;the Pearson correlation coefficient r was above 0.99 when over 1,000protein polonies were detected (Fig. 2c). Because proteins were highlydiluted (at less than picomolar concentrations) before array deposition,protein monomers should be the predominant form.

Interacting barcoded proteins can be indirectly detected by joiningtheir barcoding DNAs via ligation13,14 or primer extension15. Here, directobservation and counting of single-molecule protein complexes should

1Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA. 2Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, Massachusetts 02115, USA.3Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.

a

b HaloTag-labelled protein

Protein–DNA conjugate Individual protein

expression and purification

2) In vitro transcription

mRNA–cDNA hybrids 1) In vitro

translation

2) Flag tag affinity

purification

1) PCR

Reverse

transcription

Barcoded ORF library

Barcode

T7 promoter

RBS

TolA spacer

Flag tag

SecM peptide

5′5′5′

3′3′3′

Protein of interest

mRNAs

Universal priming site for reverse transcription

PRMC complexes

Flag or His tag Desthiobiotin modification Acrydite modification

5′5′5′

3′3′3′

Figure 1 | Schematics of protein barcoding methods. a, Collective barcodingvia ribosome display. A short synthetic barcoding sequence is joined to the 59

end of DNA templates via PCR. PRMC complexes are formed via ribosomestalling triggered by a carboxy-terminal Escherichia coli SecM peptide.Displayed proteins bearing a C-terminal Flag tag are separated from theribosomes by an E. coli TolA spacer domain. RBS, ribosomal binding site.b, Individual barcoding via a HaloTag-mediated conjugation of proteins to a220-base-pair (bp) double-stranded barcoding DNA with a HaloTag ligandmodification (black triangle). Modifications are introduced to barcodingDNAs by PCR with modified primers.

0 0 M O N T H 2 0 1 4 | V O L 0 0 0 | N A T U R E | 1

Macmillan Publishers Limited. All rights reserved©2014

Page 2: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

be possible if barcoding DNAs of interacting proteins can be amplifiedinto colocalized polonies. To test this, we generated DsRed, which natu-rally forms a tetramer, with monomers each bearing one of two differ-ent barcodes. To avoid dissociation of any complexes during the arrayanalysis, we crosslinked them with an amine-reactive crosslinker, bis-N-succinimidyl-(pentaethylene glycol) ester (BS(PEG)5). The cross-linking was shown to be efficient due to the presence of a lysine-richTolA spacer domain (Fig. 1a and Extended Data Fig. 5). It is evidentthat barcoding DNAs of the colocalized monomers (DsReda and DsRedb)were co-amplified into overlapping polonies (Fig. 3a), providing a solidbasis for further applications.

Unlike other methods that only detect affinity-enriched proteins (forexample, PLATO16), our approach simultaneously counts polonies ofboth unbound and bound proteins in a single solution. Thus, we soughtto determine if it can provide a measure of protein binding affinities.We chose a model system, the GTP-dependent binding of human H-Ras (Ras) to Ras-binding domain of c-Raf-1 (Raf-RBD)17. A Raf-RBDpolony colocalization ratio—the percentage of Raf-RBD polonies colo-calized with Ras polonies—was measured for wild-type Ras and Raf-RBD and eight Raf-RBD mutants; the Ras protein concentration wastitrated over three orders of magnitude (Fig. 3b). Although the coloca-lization ratio is dependent on protein concentration and crosslinkingefficiency and can be affected by experimental variables (protein quality,crosslinking conditions, polony array density, etc.), all the proteins with-in a single assay are under the same reaction conditions. Given a similarproportion of active protein and crosslinking efficiency, polony colo-calization ratios could be correlated with ratios of bound proteins atequilibrium and thus their binding affinities. To test this, the coloca-lization ratios were plotted against previously reported dissociation con-stants (Kd values) ranging from nanomolar to micromolar values18,19

(Fig. 3b and Supplementary Table 1), and fitted by using a one-site-specific binding model (dashed curves). The fitted and observed average

colocalization ratios show relatively high agreement (r . 0.96), exceptfor the A85K mutant which displayed significantly lower experimentalvalues than predicted by the model, probably owing to the disruptionof Lys 85-mediated interactions19 by the crosslinking. Therefore, thismethod could be useful for high-throughput screening of protein bind-ing affinities.

As a first high-throughput screening application, we investigated smallmolecule-mediated protein–protein interactions. An advantage of ourmethod over traditional solid-phase techniques such as protein micro-arrays3 is that we store and assay proteins in an aqueous solution. Toexploit this, we decided to address the challenges in screening G-proteincoupled receptors (GPCRs), the largest membrane protein family andpremier drug targets20. Current GPCR–ligand screening techniques mainlyrely on cell-based assays21, which are subject to limitations such as theheterogeneous nature of the samples, the presence of other cellularcomponents that can cause false positives or negatives, and limited min-iaturization and multiplexing capability (for example, one receptor per

a b

50 μm In situ amplification

of barcoding DNA to

polonies

Immobilization of

barcoded proteins in

PAA gel

Polony fluorescent

sequencing

< 5 μm

100 101 102 103 104 105 106 107100

101

102

103

104

105

106

107

scFv

Po

lony n

um

ber

(rep

licate

2)

Polony number (replicate 1)

Nanobody

Adnectin

Affibody

DARPin

Anticalin

Knottin

TUBE

HB36

GP120

IFNB1

Toxin A

1

1

0.67

0.84

0.99

r

1

c

Figure 2 | Amplification and quantification of barcoding DNAs.a, Schematic of in situ polony amplification and sequencing. Barcoded proteinswere immobilized in a polyacrylamide (PAA) gel matrix attached to aBind-Silane-treated glass slide. The slide was assembled into a flow cell, wherebarcoding DNAs were amplified in situ into polonies for DNA sequencing.b, Representative merged images of polonies hybridized with Cy5 (red),Cy3 (green) and fluorescein (blue)-labelled oligonucleotides (320 objectivemagnification). c, Polony quantification of mixed protein binders and antigens.The Pearson correlation coefficient r was calculated for different coveragesgrouped by dotted lines.

a

10–8 10–7 10–6 10–5

0

10

20

30

Kd (M)

Raf-

RB

D p

olo

ny

co

localiz

atio

n r

atio

(%

)

2 μM Ras

200 nM

20 nM

WT

R59A

A85K

N64A

V

69A

T

68A

K84A

R67A

Q66A

Raf-RBD

DsReda

Cy5 DsRedb

Cy3 Merged

20 μm

b

c

d

e

0

1

2

3

AD

RB

2 p

olo

ny

co

localiz

atio

n r

atio

(%

)

β-arr2 (nM)

Phosphorylated Non-phosphorylated

Barcoded GPCR nanodiscs Barcode A

β-arr2 P P

A

Ligand: 1 2 3

Barcoded β-arr2: A B C

Agonist binding

Alpre

nolo

l

Pindol

ol

Isop

rote

reno

l

Atropin

e

Xanom

eline

Car

bacho

l0.0

0.5

1.0

1.5

2.0

2.5

0

1

2

3

4

5

0

2

4

6

GP

CR

po

lony c

o-l

ocaliz

atio

n r

atio

(%

)

ADRB2

CHRM1

CHRM2

***

*** **

***

***

0.5 1.0 2.5 5.0 10 15

Figure 3 | Analyses of protein interactions via polony colocalization.a, Interaction of DsRed subunits resulted in colocalized polonies. Polonies ofthe differently barcoded subunits, DsReda and DsRedb, were identified by SBEwith, respectively, Cy5 (red) or Cy3 (green)-labelled dideoxynucleotidetriphosphates (ddNTPs). b, Correlation between the polony colocalizationratios and Kd values of Ras–Raf-RBD complexes. Means of measurements at100 imaging positions 6 95% confidence level (CL; refer to SupplementaryTable 1). Fitting equation, R~Rmax|P= KdzPð Þ, where R is the predictedRaf-RBD polony colocalization ratio, Rmax is the maximum polonycolocalization ratio when Raf-RBD is saturated by Ras, and P is the Rasconcentration. WT, wild type. c, Schematic of multiplex GPCR screening andcompound profiling by the binding assay of mixed barcoded GPCRs withbarcoded b-arr2. d, Comparison of b-arr2 binding to isoproterenol-activatedb2-adrenergic receptor with or without GRK2-mediated phosphorylation.Titration data of b-arr2 were fitted by the one-site-specific model usingGraphPad Prism 6. e, Parallel GPCR binding profiling. Data represent meanvalues of 50 measurements; error bars, 95% CL, highlighted in red for agonists(refer to Supplementary Table 2). **P , 0.01, ***P , 0.001, one-tailedpaired Student’s t-test.

RESEARCH LETTER

2 | N A T U R E | V O L 0 0 0 | 0 0 M O N T H 2 0 1 4

Macmillan Publishers Limited. All rights reserved©2014

Page 3: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

assay). To prepare a homogeneous single-molecule GPCR sample com-patible with our approach, receptors were stabilized in phospholipidbilayer nanodiscs22 by assembling detergent-solubilized GPCRs, phos-pholipids and a membrane scaffold protein, MSP1E3D1, into GPCR–nanodisc complexes23,24. GPCR activation upon ligand binding can befunctionally assessed byb-arrestin binding to activated receptors, whichis a G-protein-independent assay applicable to almost all GPCRs,including orphan receptors25.

A compound library can be screened in multi-well plates, and ineach well one compound is assayed with many barcoded GPCRs and ab-arrestin-2 (b-arr2) protein bearing a well-position-associated barcode(Fig. 3c). All the samples were pooled and deposited on one slide, andGPCR agonists were detected by measuring GPCR polony colocaliza-tion with correspondingb-arr2 polonies. Our efforts to obtain functionalGPCRs using IVT systems were not successful, so they were expressedin baculovirus-infected insect cells, purified in nanodiscs and individ-ually barcoded (Fig. 1b). To establish assay conditions, we examinedb-arr2 binding to an agonist (isoproterenol)-saturated b2-adrenergicreceptor (ADRB2), with and without GPCR kinase 2 (GRK2)-mediatedreceptor phosphorylation and under varied b-arr2 protein concentra-tions (Fig. 3d). The colocalization ratios were measured at 50 imagingpositions on the array for statistical analysis. As expected, coupling thereceptor phosphorylation to the assay improves the b-arr2 binding; 3-to 11-fold increases (largest P 5 0.002) of the average colocalizationratios after phosphorylation were observed. The fitting of b-arr2 titra-tion data for the phosphorylated receptor yielded an apparent Kd of0.95 nM, which is close to the Kd of 0.23 nM obtained from traditionalbinding assays using radiolabelled b-arr226.

To test the screening performance, we assayed three GPCRs, ADRB2,M1 and M2 muscarinic acetylcholine receptors (CHRM1 and CHRM2),with six compounds including full, partial, subtype-selective and non-selective agonists and two antagonists (Fig. 3e and Supplementary Table 2).The colocalization statistical analysis based on measurements of ,13,000–17,000 polonies for each receptor precisely identified the full agonists(isoproterenol and carbachol) from the others (largest P , 2.7 3 10210).Moreover, different types of agonists can be distinguished by compar-ing their polony colocalization ratios, for example, the full and partialagonists of ADRB2 (isoproterenol and pindolol, respectively; P , 0.004),and the orthosteric and allosteric agonists of CHRM1 (carbachol andxanomeline, respectively; P , 3 3 1026). Thus, our method could allowparallel GPCR screening and compound profiling.

An intriguing feature of this approach is the ability to screen twobarcoded libraries in a single binding assay. Established techniques (forexample, yeast two-hybrid systems2) for library versus library screen-ing are cell-based and require pairing both genes from two libraries toidentify positive clones by performing individual PCR reactions27. Todemonstrate this capability, we prototyped a test of a demanding appli-cation, the binding profiling of an antibody repertoire. The screeningof natural or semisynthetic monoclonal antibody libraries essentially in-cludes binding affinity selection and specificity profiling, which have tobe conducted separately with current techniques. The traditional spe-cificity profiling is costly, usually requiring at least one protein chip fora single antibody test28, and thus has only been commercially applied totherapeutic antibodies. However, both processes could be integrated onour platform by screening an antibody library with a target-protein library.

Specifically, we performed a one-pot assay containing 200 ribosome-displayed single-chain variable fragments (scFvs) and 55 human pro-teins including cytokines, growth factors and receptors synthesizedin vitro (Extended Data Table 2). Twenty scFvs were derived by randommutagenesis from each of ten scFvs, the genes of which were previouslysynthesized from a programmable DNA microchip29. We sequenced,0.64 million polonies and measured the colocalization ratios for 11,000scFv–target protein (probe) pairs at 100 imaging positions (Fig. 4a andSupplementary Table 3). Of 200 scFvs, 147 were found with the highestcolocalization ratios, 95 of which are significantly above the secondhighest (P , 0.05), and thus the highest specificity, to their predicted

targets; the others failed probably because the construction of scFv frag-ments and mutations inhibit target binding. Substantial cross-reactivitycan be sensitively detected, for example, 3,474 scFv–probe pairs showedtenfold higher polony colocalization than random distribution (P , 0.01).scFv mutants of a same scFv, grouped by their numbers, exhibit similarbut not identical binding patterns to the probes. Next, we confirmed theresults of 40 scFv–probe pairs by individual immunoprecipitation assaysand the colocalization statistics were consistent with relative fluor-escence intensities of the probe protein bands (Fig. 4b). Moreover, tofurther assess multiplexing potential, we developed a mathematicalmodel that integrated parameters including Kd values of protein–probecomplexes to be detected and numbers of proteins and probes that canbe assayed simultaneously (Supplementary Notes). The model suggeststhat tens of thousands of proteins and probes can be quantifiably ana-lysed within a single assay.

The protein barcoding requirement imposes limitations on SMI-seq. First, it cannot directly analyse proteins from biological samples.However, non-barcoded proteins can be detected in a similar fashion byusing barcoded antibodies or aptamers as part of a proximity ligationassay13,14. In addition, PRMC complexes are susceptible to nuclease con-tamination, thus limiting the choice of IVT systems. Finally, barcodingDNA can non-specifically bind to proteins bearing nucleic-acid-bindingdomains. Although in the present study DNA templates were individ-ually barcoded, a large library can be prepared by introducing millionsof chip-synthesized29 or random barcoding sequences to an open read-ing frame (ORF) library by a single PCR reaction and later matchingthem to ORF sequences by next-generation sequencing. SMI-seq enablessingle-molecule counting of proteins and complexes in situ, fundament-ally improving sensitivity, accuracy and multiplexity (Extended DataTable 3 and Supplementary Discussion), and thus the demonstratedapplications are difficult or impossible to perform with other high-throughput techniques (for example, PLATO). It is readily adaptableto industrial next-generation sequencing platforms and translated into

a

10 20 30 40 50

20

40

60

80

100

120

140

160

180

200

Human proteins (probes)

scF

vs

< 0.2 > 2 0.6 1.8 1.0 1.4

Average scFv polony

colocalization ratio (%)

b

0

10

20

0

10

20

0

10

20

0

10

20IL1B

TNFRSF10B

VEGFA

NGF

8

24

48

71

86

106

134

146

168

182

scFvs

scF

v p

olo

ny c

olo

caliz

atio

n r

atio

(%

)

***

***

***

Figure 4 | Parallel antibody binding profiling. a, Heat map of themean colocalization ratios measured at 100 imaging positions (refer toSupplementary Table 3). ScFvs sharing the same origins were grouped by theirnumbers (Extended Data Table 2). b, Correlation between the polonycolocalization statistics and the scFv immunoprecipitation results. For theimmunoprecipitation assay, selected scFvs were fused to a C-terminalstreptavidin binding peptide tag and bound to streptavidin magnetic beads topull down human protein probes bearing a HaloTag, which can be labelled byHalo- tetramethylrhodamine (TMR) for fluorescent gel imaging. Error bars,95% CL, highlighted in red for specific scFv–probe binding. ***P , 0.001,one-tailed paired Student’s t-test.

LETTER RESEARCH

0 0 M O N T H 2 0 1 4 | V O L 0 0 0 | N A T U R E | 3

Macmillan Publishers Limited. All rights reserved©2014

Page 4: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

many applications. In addition to natural and recombinant proteins, itwill be applicable to de novo proteins (for example, with unnatural aminoacids or modifications), nucleic acids and barcoded small molecules30.

Online Content Methods, along with any additional Extended Data display itemsandSourceData, are available in the online version of the paper; references uniqueto these sections appear only in the online paper.

Received 11 February; accepted 8 August 2014.

Published online 21 September 2014.

1. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nature Biotechnol. 26,1135–1145 (2008).

2. Dreze, M. et al. High-quality binary interactome mapping. Methods Enzymol. 470,281–315 (2010).

3. MacBeath, G. & Schreiber, S. L. Printing proteins as microarrays forhigh-throughput function determination. Science 289, 1760–1763 (2000).

4. Gavin, A. C. et al. Functional organization of the yeast proteome by systematicanalysis of protein complexes. Nature 415, 141–147 (2002).

5. Weiss, S. Fluorescence spectroscopy of single biomolecules. Science 283,1676–1683 (1999).

6. Hanes, J. & Pluckthun, A. In vitro selection and evolution of functional proteins byusing ribosome display. Proc. Natl Acad. Sci. USA 94, 4937–4942 (1997).

7. Mitra, R. D. & Church, G. M. In situ localized amplification and contact replication ofmany individual DNA molecules. Nucleic Acids Res. 27, e34–e39 (1999).

8. Shimizu, Y. et al. Cell-free translation reconstituted with purified components.Nature Biotechnol. 19, 751–755 (2001).

9. Los, G. V. et al. HatoTag: A novel protein labeling technology for cell imaging andprotein analysis. ACS Chem. Biol. 3, 373–382 (2008).

10. Bentley, D. R. et al. Accurate whole human genome sequencing using reversibleterminator chemistry. Nature 456, 53–59 (2008).

11. Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterialgenome. Science 309, 1728–1732 (2005).

12. Aird, D. et al. Analyzing and minimizing PCR amplification bias in Illuminasequencing libraries. Genome Biol. 12, R18 (2011).

13. Fredriksson, S. et al. Protein detection using proximity-dependent DNA ligationassays. Nature Biotechnol. 20, 473–477 (2002).

14. Hammond,M., Nong,R.Y., Ericsson,O., Pardali, K.& Landegren,U.Profiling cellularprotein complexes by proximity ligation with dual tag microarray readout.PLoS ONE 7, e40405 (2012).

15. Lundberg, M., Eriksson, A., Tran, B., Assarsson, E. & Fredriksson, S. Homogeneousantibody-based proximity extension assays provide sensitive and specificdetection of low-abundant proteins in human blood. Nucleic Acids Res. 39, e102(2011).

16. Zhu, J. et al. Protein interaction discovery using parallel analysis of translated ORFs(PLATO). Nature Biotechnol. 31, 331–334 (2013).

17. Vetter, I. R. & Wittinghofer, A. The guanine nucleotide-binding switch in threedimensions. Science 294, 1299–1304 (2001).

18. Block, C., Janknecht, R., Herrmann, C., Nassar, N. & Wittinghofer, A. Quantitativestructure-activity analysis correlating Ras/Raf interaction in vitro to Raf activationin vivo. Nature Struct. Biol. 3, 244–251 (1996).

19. Kiel, C. et al. Improved binding of Raf to Ras?GDP is correlated with biologicalactivity. J. Biol. Chem. 284, 31893–31902 (2009).

20. Overington, J. P., Al-Lazikani, B. & Hopkins, A. L. How many drug targets are there?Nature Rev. Drug Discov. 5, 993–996 (2006).

21. Zhang, R. & Xie, X. Tools for GPCR drug discovery. Acta Pharmacol. Sin. 33,372–384 (2012).

22. Denisov, I.G., Grinkova, Y. V., Lazarides, A.A.& Sligar, S.G.Directedself-assemblyofmonodisperse phospholipid bilayer nanodiscs with controlled size. J. Am. Chem.Soc. 126, 3477–3487 (2004).

23. Leitz, A. J., Bayburt, T. H., Barnakov, A. N., Springer, B. A. & Sligar, S. G. Functionalreconstitution of b2-adrenergic receptors utilizing self-assembling nanodisctechnology. Biotechniques 40, 601–612 (2006).

24. Whorton, M. R. et al. A monomeric G protein-coupled receptor isolated in ahigh-density lipoprotein particle efficiently activates its G protein. Proc. NatlAcad. Sci. USA 104, 7682–7687 (2007).

25. Luttrell, L. M. & Lefkowitz, R. J. The role of b-arrestins in the termination andtransduction of G-protein-coupled receptor signals. J. Cell Sci. 115, 455–465(2002).

26. Gurevich, V. V. et al. Arrestin interactions with G-protein-coupled receptors - Directbinding-studies of wild-type and mutant arrestins with rhodopsin, b2-adrenergic,and m2-muscarinic cholinergic receptors. J. Biol. Chem. 270, 720–731 (1995).

27. Yu, H. et al. Next-generation sequencing to generate interactome datasets. NatureMethods 8, 478–480 (2011).

28. Michaud, G. A. et al. Analyzing antibody specificity with whole proteomemicroarrays. Nature Biotechnol. 21, 1509–1512 (2003).

29. Kosuri, S. et al. Scalable gene synthesis by selective amplification of DNA poolsfrom high-fidelity microchips. Nature Biotechnol. 28, 1295–1299 (2010).

30. McGregor, L. M., Jain, T. & Liu, D. R. Identification of ligand-target pairs fromcombined libraries of small molecules and unpurified protein targets in celllysates. J. Am. Chem. Soc. 136, 3264–3270 (2014).

Supplementary Information is available in the online version of the paper.

Acknowledgements This work was supported by a grant from the US Department ofEnergy (DE-FG02-02ER63445) toG.M.C. andagrant fromtheNIHNHGRI (HG001715)to M.V. and D.E.H. L.G. was supported by a postdoctoral fellowship from the Jane CoffinChilds Fund for Medical Research and a grant from Harvard Origins of Life Initiative.We thank W. Harper, L. Pontano Vaites, S. Elledge and J. Zhu for providing plasmids,F. Vigneault for comments on the manuscript, J. Lai and D. Breslau for assistance onimaging setup, and members of the Church and Vidal groups for constructivediscussions.

Author Contributions L.G. andG.M.C. conceived the technique;L.G. and C.L. performedexperiments and analysed data; J.A. built the mathematical model and assisted thecolocalization analyses; D.E.H. and M.V. assisted the production of barcoded proteins;L.G., J.A. and G.M.C. wrote the manuscript with help from the other authors.

Author Information MATLAB scripts for imaging analyses, colocalization statistics andmathematical modelling can be found at http://arep.med.harvard.edu/SMI-Seq/.Reprints and permissions information is available at www.nature.com/reprints.The authors declare competing financial interests: details are available in the onlineversion of the paper. Readers are welcome to comment on the online version ofthe paper. Correspondence and requests for materials should be addressed toL.G. ([email protected]) or G.M.C. ([email protected]).

RESEARCH LETTER

4 | N A T U R E | V O L 0 0 0 | 0 0 M O N T H 2 0 1 4

Macmillan Publishers Limited. All rights reserved©2014

Page 5: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

METHODSDNA construction. Protein coding sequences were synthesized by Genewiz andIntegrated DNA Technologies, PCR amplified from plasmids or genomic DNA, ortransferred from Gateway-adapted human ORF clones31 (refer to SupplementaryTable 4 for DNA sources, sequences and construction methods) and inserted intoa multiple cloning site or Gateway recombination sites of expression vectors (referto Supplementary Fig. 1 for plasmid construction and Supplementary Table 5 forplasmid and primer sequences) for in vitro or in vivo protein expression.Ribosome-display-based protein barcoding. To barcode protein libraries of rel-atively small size (# 200 in this work), synthetic barcoding sequences were intro-duced to DNA templates via individual PCRs. Barcoded linear DNA templates werepooled and transcribed in vitro using a HiScribe T7 kit (New England Biolabs). Togenerate mRNA–cDNA hybrids, cDNAs were synthesized by incubating ,0.10mMmRNA, 1mM 59-acrydite and desthiobiotin-modified primer, 0.5 mM of each de-oxynucleotide triphosphate (dNTP), 10 Uml21 SuperScript III, 2 Uml21 RNaseOUT(Invitrogen) and 5 mM dithiothreitol (DTT) in a buffer (50 mM Tris-HCl, pH 8.3,75 mM KCl, and 5 mM MgCl2) at 50 uC for 30 min. Synthesized mRNA–cDNAhybrids serve as templates for ribosome display using a PURExpressD ribosome kit(New England Biolabs). Typically, a 100ml IVT reaction with ,0.40mM mRNA–cDNA hybrids and ,0.30mM ribosome was incubated at 37 uC for 30 min, quenchedby addition of 100ml ice-cold buffer HKM (50 mM HEPES, pH 7.0, 250 mM KOAc,25 mM Mg(OAc)2, 0.25 U ml21 RNasin (Promega), 0.5 mg ml21 chloramphenicol,5 mM 2-mercaptoethanol and 0.1% (v/v) Tween 20) and centrifuged (14,000g,4 uC) for 10 min to remove insoluble components. PRMC complexes, always kepton ice or in a cold room, were subjected to two-step Flag tag and desthiobiotin tagaffinity purification to enrich full-length and barcoded target proteins. In brief, pro-teins were sequentially purified using anti-Flag M2 magnetic beads (Sigma-Aldrich)and streptavidin-coated magnetic beads (Dynabeads M-270 Streptavidin, Life Tech-nologies) blocked with the buffer HKM supplemented with 0.1 mg ml21 yeast trans-fer RNA and 10 mg ml21 BSA. The bound proteins were eluted with the bufferHKM containing 0.1 mg ml21 Flag peptide or 5 mM biotin, and their barcodingDNAs were quantified by real-time PCR.Protein expression and purification and HaloTag-based barcoding. His-taggedHaloTag–TolA, HaloTag–DsRed–TolA, HaloTag–mCherry–TolA, Ras–TolA–HaloTagandb-arr2–TolA–HaloTag were expressed in E. coli or using an E. coli IVT system.Proteins were expressed in an OverExpress C41(DE3) strain (Lucigen) with 1 mMisopropyl-b-D-galactopyranoside (IPTG) induction at 30 uC for 8–10 h, and puri-fied using immobilized metal affinity chromatography (IMAC) at 4 uC. In brief, har-vested cells were resuspended in 50 mM sodium phosphate, pH 8.0, 300 mM NaCl,10 mM imidazole and 20% glycerol, and disrupted by French press. Supernatantsof cell lysates were loaded on a 5 ml HisTrap column (GE Healthcare) and non-specifically bound components were washed off with 50 mM sodium phosphate,pH 8.0, 300 mM NaCl, 20 mM imidazole and 10% glycerol. Proteins were eluted with50 mM sodium phosphate, pH 8.0, 300 mM NaCl, 250 mM imidazole and 10% glyc-erol, concentrated with Amicon Ultra-15 centrifugal filter units (Millipore), bufferexchanged to a storage buffer (50 mM HEPES, pH 7.0, 150 mM KOAc and 20% glyc-erol) using a PD10 desalting column (GE Healthcare), flash frozen in 100–500 mlaliquots by liquid N2 and stored at 280 uC. Relatively small amounts of proteinswere typically synthesized in an E. coli crude extract (RTS 100 E. coli HY, 5 PRIME)at 30 uC for ,4 h, and similarly purified with His-tag magnetic beads (DynabeadsHis-tag, Life Technologies).

Human ADRB2, CHRM1 and CHRM2 were expressed in baculovirus-infectedSf9 cells (Life Technologies), solubilized with detergents as previously described32,33,and assembled into GPCR–nanodisc complexes followed by affinity purification34.In brief, GPCR genes were synthesized and inserted into a pBac–NFlagHA vectorfor the expression of the fusion proteins bearing amino-terminal Flag and hemag-glutinin (HA) epitope tags and a HaloTag domain. Cells were harvested at 2 daysafter transfection, homogenized in a 50 mM Tris-HCl, pH 7.4, 50 mM NaCl and1 mM EDTA with a protease inhibitor cocktail (Roche), and centrifuged to collectmembrane fractions. The membranes were solubilized in 50 mM Tris-HCl, pH 7.4,150 mM NaCl, 5 mM CaCl2, 5 mM MgCl2, 2 mM EDTA, 10% glycerol, 1% N-dodecyl-b-D-maltopyranoside (DDM) and a protease inhibitor cocktail (Set III, EMD Bio-sciences), and centrifuged at 15,000g for 15 min; the supernatants were subjectedto a bicinchoninic acid assay (Thermo Scientific) to determine the protein concen-tration. The nanodiscs were assembled by incubating 90 mM MSP1E3D1 (Sigma-Aldrich), 8 mM POPC, 40 mM DDM and 180mg total membrane protein in 50 mMTris pH 7.4, 150 mM NaCl, 5 mM CaCl2, 5 mM MgCl2, 2 mM EDTA and 2% glyc-erol on ice for 45 min, followed by removal of the detergent using Bio-Beads SM-2(Bio-Rad)34. GPCR–nanodisc complexes were bound to anti-Flag M1 agarose resin(Sigma-Aldrich) and eluted with a conjugation buffer (50 mM HEPES, pH 7.5,150 mM NaCl, 2 mM EDTA and 5% glycerol) in the presence of 0.2 mg ml21 Flagpeptide.

Human proteins were synthesized in vitro with a human IVT kit (Thermo Sci-entific). Proteins were individually translated at 30 uC for 2 h and purified with theanti-Flag M2 or His-tag magnetic beads. Membrane proteins were stabilized by ad-dition of preassembled nanodiscs (2ml MembraneMax reagent per 50ml reaction,Life Technologies). To semi-quantifiably analyse HaloTag fusion proteins, theirHaloTag domains were covalently labelled with a fluorescent reporter Halo-TMR(Promega) and analysed by SDS–PAGE and the fluorescent gel imaging with aTyphoon Trio Imager (GE Healthcare).

Barcoding DNAs of 220 bp in length were prepared in parallel by adding bar-coding sequences via PCR with a universal template and barcoded primers, andintroducing the modifications by a secondary PCR with the modified primers (In-tegrated DNA Technologies). A HaloTag ligand was conjugated to the primer byincubating 100mM amino-modified oligonucleotide and 10 mM succinimidyl ester(O4) ligand (Promega) in 50 mM Na2HPO4, pH 8.0, 150 mM NaCl and 50% for-mamide at room temperature for 1 h; the modified oligonucleotide was purified byreverse-phase high-performance liquid chromatography using a Zorbax EclipseXDB-C18 column (5mm, 9.4 3 250 mm, Agilent Technologies) and an elution gra-dient of 5–70% CH3CN/H2O (0.1 M triethylammonium acetate). To generate protein–DNA conjugates, we typically incubated ,0.5–2mM modified barcoding DNAs and,2–5mM HaloTag-labelled proteins in the conjugation buffer with gentle shakingat room temperature for 2–4 h; the conjugates were purified with the anti-Flag M2or His-tag magnetic beads and the streptavidin magnetic beads. Barcoded proteinswere eluted in assay buffers (see below) in the presence of 5 mM biotin.Ras–Raf-RBD binding assay. Prior to the barcoding, the E. coli-expressed and pu-rified Ras protein was saturated with a non-hydrolysable GTP analogue, Gpp(NH)p,by EDTA-enhanced nucleotide exchange as previously described35. Mixed wild-typeand mutant Raf-RBD (2 nM) displayed on PRMC complexes were incubated withdifferent concentrations of barcoded Ras in 50 mM HEPES, pH 7.5, 100 mM NaCl,10 mM MgCl2, 0.5 mM DTT, 0.1% (v/v) Tween 20 and 0.5 mM Gpp(NH)p for 1 h.After reaching equilibrium, Ras–Raf-RBD complexes were crosslinked with 0.5 mMBS(PEG)5 at 4 uC for 1 h. The reaction was quenched by adding Tris-HCl (pH 8.0)to a final concentration of 50 mM. As unbound Ras can contribute to random Raf-RBD polony colocalization, it was removed by HA-tag affinity purification to en-rich HA-tagged Raf-RBD and bound Ras. Thus, the samples were incubated withanti-HA magnetic beads (Thermo Scientific) at 4 uC for ,2 h and eluted with anarray deposition buffer (20 mM HEPES, pH 7.0, 50 mM KOAc, 6 mM Mg(OAc)2,0.25 U ml21 RNasin (Promega) and 0.1% Tween 20) in the presence of 2 mg ml21

HA peptide.GPCR profiling assay. Mixed barcoded GPCRs were assayed with 100mM alpre-nolol, pindolol, isoproterenol, atropine and carbachol (Sigma-Aldrich) and 100 nMxanomeline (Tocris Bioscience). The GPCR–b-arr2 binding assay was performedby adding a ligand to ,1 nM GPCR–nanodisc complexes in 20 mM HEPES, pH 7.5,50 mM KOAc, 2 mM EDTA and 5 mM MgCl2, followed by addition of 10 nM GRK2(Life Technologies), 0.1 mM ATP, 10 nM G proteinb1c2 subunits (KeraFAST) and5 nM barcoded b-arr2 to a total volume of 25ml. Compounds were assayed in par-allel, and reactions were incubated at 30 uC for 30 min followed by the crosslinkingand the HA-tag affinity purification described above. Proteins from multiple wellswere pooled and analysed on a single array.ScFv binding profiling and immunoprecipitation assay. To diversify scFvs, error-prone PCR was performed for the ten scFv genes by using a random mutagenesiskit (Clontech Laboratories) under the condition of 3.5 mutations per 1,000 bp.Twenty mutants for each scFv were randomly picked and barcoded to construct ascFvs library. Ribosome display of the scFv library was specifically performed withthe PURExpress D ribosome kit supplemented with disulphide bond enhancers(New England Biolabs, 4ml of the enhancer 1 and 2 per 100ml reaction). The bind-ing assay was performed by incubating 200 scFvs (,5.5 nM in total) and 55 bar-coded human proteins (,1.8mM in total) in an assay buffer (50 mM HEPES,pH 7.5, 100 mM NaCl, 10 mM MgCl2 and 0.1% (v/v) Tween 20) at 4 uC for 4 h.Similarly as above, samples were subjected to the crosslinking and the HA-tagaffinity purification.

For the immunoprecipitation assay, selected scFv genes were subcloned intopEco–CSBP to express scFv fusions bearing a C-terminal streptavidin binding pep-tide tag. Proteins were synthesized in vitro using a PURExpress IVT kit supple-mented with the disulphide bond enhancers. In each binding assay, a 10ml IVTreaction typically containing 0.1–0.4mM translated scFvs was incubated with 2mlhuman proteins (4.6–9.5 nM) labelled by Halo-TMR in the assay buffer at 4 uC for4 h. Bound human proteins were pulled down with the streptavidin magnetic beadsand analysed by SDS–PAGE and the fluorescent gel imaging.Array deposition. Barcoded proteins were diluted with the deposition buffer to atenfold deposition concentration between 0.1 and 1 nM. Because the presence ofoxygen can inhibit the gel polymerization, a gel-casting solution (6.66% acrylamide/bis-acrylamide (19:1, molecular grade, Ambion) and two 59-acrydite-modified bridgeamplification primers (278mM each) in the deposition buffer) were degassed with

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Page 6: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

argon and mixed with diluted proteins by a 9:1 volume ratio in an anaerobic cham-ber (Coy Lab). To form a gel layer of less than 5-mm thickness, $ 20ml gel-castingmix, immediately after addition of 0.1% (v/v) TEMED and 0.05% (w/v) ammo-nium persulfate, was applied to a glass slide surface pretreated with Bind-Silane (GEHealthcare)7,36, and a coverslip was placed on the top of the liquid and tightly pressedto form a liquid layer evenly spreading over the glass surface. The gel was polymer-ized in the chamber for 4 h. After removal of the coverslip, the slide was washed withMilli-Q H2O and dried by a quick spin.Polony amplification, linearization and blocking. A protein-deposited slide wasassembled in a FC 81 transmission flow cell containing a 1.85-mm-thick polycar-bonate flow channel (BioSurface Technologies) for polony amplification, linear-ization and blocking. Flow cell components including the channel, a coverslip andtubing were cleaned by sonication in 5% Contrad 70 and Milli-Q H2O, and airdried in an AirClean PCR hood. Prior to the amplification, samples containingmRNA2cDNA hybrids can be digested with 10 U ml21 RNase H (New EnglandBiolabs) in 50 mM Tris-HCl, pH 8.3, 75 mM KCl, 3 mM MgCl2 and 0.1% (v/v)Triton X-100 at 37 uC for 20 min. Polony amplification, linearization and blockingwere performed by using an adapted cluster generation protocols10. In brief, immo-bilized barcoding DNAs were subjected to 32–35 cycles of isothermal bridge ampli-fication at 60 uC. For each cycle, the flow cell was washed with deionized formamide(Ambion) and an amplification buffer (20 mM Tris-HCl, pH 8.8, 10 mM ammo-nium sulphate, 2 mM magnesium sulphate, 0.1% (v/v) Triton X-100, 1.3% (v/v)dimethylsulphoxide and 2 M betaine) and incubated with 200mM dNTPs and80 U ml21 Bst polymerase (New England Biolabs) in the amplification buffer for5 min. Resulted double-stranded polonies were linearized by adding 10 U ml21

USER enzyme (New England Biolabs) in a linearization buffer (20 mM Tris-HCl,pH 8.8, 10 mM KCl, 10 mM ammonium sulphate, 2 mM magnesium sulphate and0.1% (v/v) Triton X-100) followed by incubating the flow cell at 37 uC for 1 h. Ex-cised strands were eluted with a wash buffer W1 (1 3 SSC and 70% formamide). Ex-posed 39-OH ends of polonies and primers were blocked by adding 250 U ml21

terminal transferase (New England Biolabs) and 10mM ddNTPs in a blocking buffer(20 mM Tris-acetate, pH 7.9, 50 mM KOAc, 10 mM Mg(OAc)2 and 0.25 mM CoCl2)followed by incubating the flow cell at 37 uC for 10 min.DNA sequencing. Linearized and 39-OH blocked polonies were analysed by hy-bridization with fluorescently labelled oligonucleotides, SBE or sequencing-by-ligation as previously described11,36. The assays can be performed within the flowcell or a gasket chamber assembled with the polony slide taken out of the flow celland a microarray gasket slide (Agilent Technologies). Polonies were probed witholigonucleotides or ddNTPs (PerkinElmer) labelled by fluorescein/FAM, Cy3/Ty563or Cy5/Ty665 and subjected to three-colour fluorescence imaging. In brief, the hy-bridization was performed by incubating polonies with oligonucleotides (2mM each)in a hybridization buffer (5 3 SSC and 0.1% (v/v) Tween 20) at 60 uC for 10 minfollowed by decreasing the temperature to 40 uC and washing off unbound oligo-nucleotides with a wash buffer W2 (0.3 3 SSC and 0.1% (v/v) Tween 20). The SBEwas performed by incubating primer-bound polonies with fluorescently labelledddNTPs (1mM each) and 0.32 Uml21 Thermo Sequenase (GE Healthcare) in 26 mMTris-HCl, pH 9.5, 6.5 mM MgCl2 and 0.05% (v/v) Tween 20 at 60 uC for 5 min, fol-lowed by washing off excess ddNTPs with the wash buffer W2. For each sequencing-by-ligation cycle, polonies were hybridized to a sequencing primer and probed witha query oligonucleotide set (fluorescent nonamers, 2mM each subpool) in a ligationbuffer (50 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 1 mM ATP and 5 mM DTT) in thepresence of 30 Uml21 T4 DNA ligase (Enzymatics). The ligation was incubated atroom temperature for 20 min followed by increasing the temperature and main-taining it at 35 uC for 40 min. Prior to a next cycle, hybridized primers were strippedwith the buffer W1 at 60 uC. To facilitate the deconvolution of colocalized poloniesfrom two protein libraries, each library was separately sequenced using a differentsequencing primer.Image acquisition, processing and base calling. Fluorescence imaging was con-ducted with a Leica AM TIRF MC system including a DMI6000 B inverted micro-scope, a motorized scanning stage and a Hamamatsu C9100-02 electron multiplyingCCD camera (1,000 3 1,000 pixels, Hamamatsu Photonics). Polony images wereacquired under an epi-illumination mode by using a 320 (HCX PL Fluotar L, N.A.0.40, Leica) or 340 objective (HCX PL APO, N.A. 0.85, Leica) and from three chan-nels (fluorescein, Cy3 and Cy5) using 488, 561 and 635 nm lasers and excitation–emission filter pairs (490/20–525/50, 552/24–605/65 and 635/10–720/60, respectively).Raw images were exported by LAS AF Lite software (Leica) and processed usingImageJ and MATLAB (R2011a) scripts to remove background fluorescence and

exclude small-size impurities and large-scale structures. Image analyses and basecalling were conducted similarly as previously described11. In brief, MATLAB scriptswere applied to identify polony coordinates by finding local maxima or weightedcentroids, construct a reference image containing all detected polonies by super-imposing images taken in the first cycle, and then align images from later cycles tothe reference image. Thus, a set of fluorescence values for each acquisition cycle aswell as the coordinates were obtained for polony identification and the colocaliza-tion analysis. No more than five sequencing cycles were required to analyse eachlibrary used in this work.Colocalization analysis and statistics. To align reference images for protein andprobe libraries, polonies were hybridized with both sequencing primers labelled byCy3 or Cy5, and then their images were superimposed to generate a cross-library ref-erence. MATLAB scripts calculated the offset of reference images generated fromtwo sequencing rounds, measured distances between all polony positions identifiedfrom the two libraries, and compared them to a defined threshold to determine thecolocalization. We considered a polony exclusion effect usually observed for com-petitive co-amplification of colocalized templates37,38, and set an optimized thresh-old distance to be 0.7mm. Total and colocalized polony numbers were computedfor each paired polony species at each imaging position. Colocalization statisticswere calculated using Student’s t-tests based on measurements at all imaged posi-tions. In addition, a pair cross-correlation function statistic39 was applied to ana-lyse polony colocalization patterns (Supplementary Notes).

31. Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs.Nature Methods 8, 659–661 (2011).

32. Parker, E. M., Kameyama, K., Higashijima, T. & Ross, E. M. Reconstitutively active Gprotein-coupled receptors purified from baculovirus-infected insect cells. J. Biol.Chem. 266, 519–527 (1991).

33. Kobilka, B. K. Amino and carboxyl terminal modifications to facilitate theproduction and purification of a G protein-coupled receptor. Anal. Biochem. 231,269–271 (1995).

34. Mitra, N.et al. Calcium-dependent ligandbinding and G-protein signaling of familyB GPCR parathyroid hormone 1 receptor purified in nanodiscs. ACS Chem. Biol. 8,617–625 (2013).

35. John, J., Frech,M.&Wittinghofer, A.BiochemicalpropertiesofHa-rasencodedp21mutants and mechanism of the autophosphorylation reaction. J. Biol. Chem. 263,11792–11799 (1988).

36. Mitra, R. D. et al. Digital genotyping and haplotyping with polymerase colonies.Proc. Natl Acad. Sci. USA 100, 5926–5931 (2003).

37. Mitra, R. D., Shendure, J., Olejnik, J., Edyta Krzymanska, O. & Church, G. M.Fluorescent in situ sequencing onpolymerasecolonies. Anal. Biochem. 320,55–65(2003).

38. Aach, J. & Church, G. M. Mathematical models of diffusion-constrainedpolymerase chain reactions: basis of high-throughput nucleic acid assays andsimple self-organizing systems. J. Theor. Biol. 228, 31–46 (2004).

39. Philimonenko, A. A., Janacek, J. & Hozak, P. Statistical evaluation of colocalizationpatterns in immunogold labeling experiments. J. Struct. Biol. 132, 201–210(2000).

40. Lewis, J.D.et al.Quantitative interactor screeningwithnext-generationsequencing(QIS-Seq) identifies Arabidopsis thaliana MLO2 as a target of the Pseudomonassyringae type III effector HopZ2. BMC Genomics 13, 8 (2012).

41. Georgiou, G. et al. The promise and challenge of high-throughput sequencing ofthe antibody repertoire. Nature Biotechnol. 32, 158–168 (2014).

42. Larman, H. B. et al. Autoantigen discovery with a synthetic human peptidome.Nature Biotechnol. 29, 535–541 (2011).

43. Fujimori, S. et al. Next-generation sequencing coupled with a cell-free displaytechnology for high-throughput production of reliable interactome data. Sci. Rep.2, 691 (2012).

44. Darmanis, S. et al. ProteinSeq:high-performanceproteomicanalyses byproximityligation and next generation sequencing. PLoS ONE 6, e25583 (2011).

45. Young, K. et al. Identification of a calcium channel modulator using a highthroughput yeast two-hybrid screen. Nature Biotechnol. 16, 946–950 (1998).

46. Nishihara, T. et al.Estrogenic activities of517chemicals byyeast two-hybrid assay.J. Health Sci. 46, 282–298 (2000).

47. Chidley, C., Haruki, H., Pedersen, M. G., Muller, E. & Johnsson, K. A yeast-basedscreen reveals that sulfasalazine inhibits tetrahydrobiopterin biosynthesis. NatureChem. Biol. 7, 375–383 (2011).

48. Wrighton, N. C. et al. Small peptides as potent mimetics of the protein hormoneerythropoietin. Science 273, 458–463 (1996).

49. Lowman, H. B. Bacteriophage display and discovery of peptide leads for drugdevelopment. Annu. Rev. Biophys. Biomol. Struct. 26, 401–424 (1997).

50. Leuchowius, K. J. et al. High content screening for inhibitors of protein interactionsand post-translational modifications in primary cells by proximity ligation.Mol. Cell. Proteomics 9, 178–183 (2010).

RESEARCH LETTER

Macmillan Publishers Limited. All rights reserved©2014

Page 7: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Figure 1 | Improved stability of PRMC complexes generatedin a reconstituted E. coli IVT system. a, Schematic of PRMC complexstability analysis by measuring the relative ratio of barcoding DNA toHaloTag-labelled protein. b, Comparison of PRMC complex stabilities in theE. coli recombinant-factor-reconstituted (PURE) and an E. coli crudeextract (S30) IVT system. Nucleic acid degradation or ribosome dissociationcan result in the loss of barcoding DNAs. IVT reactions were performed at37 uC for 30 min and PRMC complexes were further incubated at roomtemperature for indicated periods of time before the affinity purificationand the stability analysis. Means of three independent experiments 6 standarddeviations.

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Page 8: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Figure 2 | HaloTag-based protein–DNA conjugation.a, Schematic of the individual barcoding method adaptable to an automaticplatform. Fusion proteins bearing an N- or C-terminal HaloTag and the affinity

tags were purified and conjugated to a barcoding DNA bearing threedifferent modifications. b, Agarose gel electrophoresis of the barcoding DNAand selected protein–DNA conjugates.

RESEARCH LETTER

Macmillan Publishers Limited. All rights reserved©2014

Page 9: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Figure 3 | Covalent immobilization of barcoding DNAs isrequired for in situ polony amplification. a, b, Representative imagesof polonies amplified from barcoding DNA templates without (a) or with

(b) 59-acrydite modifications. Oversized polonies or polony clusters shown ina resulted from template-drifting-induced multiple seeding events duringthe amplification.

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Page 10: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Figure 4 | Polony quantification of various barcodedproteins. a, Plot showing the average number of polonies detected at a singleimaging position against the average number of barcoding DNA templatespredicted by real-time PCR quantification. Data represent mean values of 100

measurements; error bars, 95% CL. b, Log–log plot of total numbers of poloniesdetected against dilution factors. Data represent mean values of twotechnical replicates.

RESEARCH LETTER

Macmillan Publishers Limited. All rights reserved©2014

Page 11: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Figure 5 | Crosslinking efficiency of DsRed is improved by alysine-rich TolA domain. a, SDS–PAGE analysis of purified DsRed (ClontechLaboratories) and HaloTag–DsRed–TolA proteins before (lanes 1 and 3)and after (lanes 2 and 4) the crosslinking. 10mM purified proteins werecrosslinked by 1 mM BS(PEG)5 in 20 mM HEPES buffer, pH 8.0, 150 mMKOAc at 4 uC for 1 h. Proteins were stained with Coomassie blue. Only a minorband of the crosslinked dimer was observed for DsRed (lane 2); in contrast,HaloTag–DsRed–TolA was all crosslinked as a tetramer or a trimer (lane 4).Co-purified E. coli proteins (some protein bands below the major band in thelane 3), probably bound to TolA during the purification, and degradationproducts (due to the hydrolysis of an acylimine bond in the DsRedchromophore) were efficiently crosslinked to HaloTag–DsRed–TolA.

b, Comparison of HaloTag-labelled DsRed and mCherry, a monomericfluorescence protein, crosslinked at different conditions. Proteins labelled withHalo-TMR were analysed by fluorescent gel imaging. Only a minor fraction ofHaloTag–mCherry–TolA, a control to show non-specific crosslinking, wascrosslinked at increased protein concentrations. Intramolecularly crosslinkedproteins show multiple bands or smears corresponding to different quaternarystructures of the multidomain proteins stabilized by crosslinking. Bands ofthe crosslinked trimers show an increased intensity at higher BS(PEG)5

concentrations probably because the primary amine groups on surface aremore quickly modified by BS(PEG)5, thus preventing the further crosslinkingto form the tetramer.

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Page 12: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Table 1 | Polony quantification of titrated binder proteins and antigens

Polony counts of two technical replicates were shown. a.a., amino acids.

RESEARCH LETTER

Macmillan Publishers Limited. All rights reserved©2014

Page 13: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Table 2 | ScFvs and human proteins used in the one-pot binding profiling

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Page 14: Multiplex single-molecule interaction profiling of DNA ...arep.med.harvard.edu/pdf/Gu_Nature_2014.pdf · Multiplex single-molecule interaction profiling of ... via ribosome display

Extended Data Table 3 | Comparison of protein interaction profiling technologies based on nucleic acid barcoding and high-throughputsequencing

Refer to Supplementary Discussion for a comparison of the above technologies. CDS, protein coding sequence; NGS, next-generation sequencing. *References 14, 16, 27, 30, 40–50 are cited in this table.**Mathematical model suggests that $ 105 3 105 interactions may be analysed in a single binding assay (refer to Supplementary Notes).

RESEARCH LETTER

Macmillan Publishers Limited. All rights reserved©2014