POSTER ABSTRACTS

ABRF 2014 TEAM SCIENCE AND BIG DATA: CORES AT THE FRONTIER

Albuquerque Convention Center, Albuquerque, New Mexico

March 22-25, 2014


ABRF RESEARCH GROUP

1 Development and Characterization of a Proteomics Normalization Standard Consisting of 1,000 Stable Isotope Labeled Peptides

Christopher M. Colangelo1, Craig Dufresne2, David Hawke3, Alexander R. Ivanov4, Antonius Koller5, Brendan MacLean6, Brett Phinney7, Kristie Rose8, Paul Rudnick9, Brian Searle10, Scott Shaffer11

1Keck Biotechnology Resource, Yale University, New Haven, CT, 2Thermo Fisher Scientific, West Palm Beach, FL, 3UT MD Anderson Cancer Center, Houston, TX, 4Northeastern University, Boston, MA, 5Stony Brook University, Stony Brook, NY, 6University of Washington, Seattle, WA, 7University of California, Davis, CA, 8Vanderbilt University, Nashville, TN, 9Spectragen Informatics, Rockville, MD, 10Proteome Software, Portland, OR, 11University of Massachusetts Medical School, Worcester, MA

The ABRF Proteomics Standards Research Group (sPRG) is reporting the progress of a two-year study (2012-2014) focused on the generation of an interassay, interspecies, and interlaboratory peptide standard that can be used for normalization of protein abundance measurements in mass spectrometry-based quantitative proteomics analyses. The standard has been formulated as two mixtures: 1,000 stable isotope 13C/15N-labeled (SIL) synthetic peptides alone, and the same peptides mixed with a tryptic digest of a HEK 293 cell lysate. The sequences of the synthetic peptides were derived from 552 proteins conserved across the proteomes of commonly analyzed species: Homo sapiens, Mus musculus, and Rattus norvegicus. The selected peptides represent a full range of hydrophobicities and isoelectric points, typical of tryptic peptides derived from complex proteomic samples. The standard was designed to represent proteins of various concentrations, spanning three orders of magnitude. First-year efforts were focused on selection of appropriate protein and peptide candidates, peptide synthesis, quality assessment, and LC-MS/MS evaluation conducted in the laboratories of sPRG members. Using a variety of instrumental configurations and bioinformatics approaches, a thorough characterization of all 1,000 peptides was established. In the second year, the group opened the study to the entire proteomics community. A lyophilized mixture of HEK 293 tryptic digest cell lysate spiked with the 1,000 SIL peptide standards was provided to each participant. Also provided were a Skyline tutorial, tutorial datasets, three MS/MS spectral libraries generated from linear ion-trap (CID), Q-TOF/QQQ (CID), or Orbitrap (HCD) instrumentation, and a Panorama data repository. Participants were asked to analyze the sample in triplicate and to calculate ratios of the spiked SIL to endogenous peptides and coefficients of variation for each peptide. Over 40 datasets were returned, and results from the thorough characterization of the standard across instrumental configurations will be reported.
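The per-peptide summary statistics participants were asked to report reduce to simple arithmetic over replicate peak areas. Below is a minimal sketch assuming hypothetical arrays of endogenous ("light") and spiked SIL ("heavy") peak areas; the study itself used Skyline rather than custom code.

```python
import numpy as np

def sil_ratios_and_cv(light, heavy):
    """light, heavy: peak areas shaped (n_peptides, n_replicates).
    Returns per-peptide mean SIL:endogenous ratio and % coefficient of variation."""
    ratios = heavy / light                      # SIL : endogenous, per replicate
    mean_ratio = ratios.mean(axis=1)
    cv_percent = ratios.std(axis=1, ddof=1) / mean_ratio * 100.0
    return mean_ratio, cv_percent

# Example: 2 peptides measured in triplicate (hypothetical peak areas)
light = np.array([[1.0e6, 1.1e6, 0.9e6], [2.0e5, 2.2e5, 1.8e5]])
heavy = np.array([[2.0e6, 2.1e6, 1.9e6], [1.0e5, 1.3e5, 0.9e5]])
print(sil_ratios_and_cv(light, heavy))
```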

2 Evaluating Effects of Cell Sorting on Cellular Integrity

M. DeLay1, P. Lopez2, A. Bergeron3, A. Box4, K. Brundage5, S. Chittur 6, M. Cochran7, E. M. Meyer8, S. Tighe9

1Research Flow Cytometry Core, Cincinnati Children’s Hospital, 2Office of Collaborative Science Cytometry Core, New York University Langone Medical Center, 3DartLab, Dartmouth College, 4Cytometry Facility, Stowers Institute for Medical Research, 5Flow Cytometry Core Facility, West Virginia University, 6DNA Microarray Core Facility, SUNY Albany, 7Flow Cytometry Core, University of Rochester Medical Center, 8Cytometry Facility, University of Pittsburgh Cancer Institute, 9Advanced Genome Technologies Core, Vermont Cancer Center

During the past year the Flow Cytometry Research Group has continued to pursue its goal of establishing best-practice guidelines for cell sorting conditions that minimize stress, perturbation, or injury to the sorted cells. Toward this goal, the group followed up on an observation from our initial study, which showed poor cell recovery when a clonal population of cells (Jurkat) was sorted aggressively under intentionally adverse conditions (excessive pressure as well as an undersized sorting orifice). In this follow-up study we sought to identify unique qualities of the cells that survived the adverse sorting conditions, in the hope that this may prove to be a useful test method for assessing deleterious effects of cell sorting across a wide variety of cell types.

To address this question, six FCRG member-sites received a distribution of the same Jurkat cell population and using different instrumentation and sorting conditions, sorted these cells for subsequent cell cycle analysis, post-sort viability, and recovered cell counts. In addition, one site submitted parallel samples for microarray analysis. The results of these studies and future planned studies will be presented.

3 The ABRF Next Generation Sequencing Study: Multi-platform and Cross-methodological Reproducibility of Transcriptome Profiling by RNA-seq

Christopher E. Mason1,2, Sheng Li1,2, Scott Tighe3, Charles Nicolet4, Don Baldwin5, George Grills6, and the ABRF-NGS Consortium

Next generation sequencing (NGS) has dramatically expanded the potential for novel genomics discoveries, but the wide variety of platforms, protocols, and performance has created the need for reference data sets to understand the sources of variation in results. The goals of the ABRF-NGS Study are to use standard references to evaluate the performance of NGS platforms and to identify optimal methods and best practices. For the first phase of this study, over 20 core facility laboratories performed replicate RNA-seq experiments, using titrated reference RNA standards and a set of synthetic RNA spike-ins, evaluated over a wide range of methods: polyA-enriched, ribo-depleted, size-specific fractionations, and degraded RNA, on six NGS platforms (Illumina HiSeq 2000/2500 and MiSeq, Life Technologies PGM and Proton, Roche 454 GS FLX+, and PacBio RS). Two RT-qPCR data sets were used as orthogonal tools to gauge the RNA-seq results. The results show high intra-platform consistency and inter-platform concordance for expression measures, but also demonstrate highly variable rates of efficiency and costs for splice isoform detection between platforms. The data also add evidence that ribosomal RNA depletion can both salvage degraded RNA samples and be readily compared to polyA-enriched fractions. Comparisons of alternative aligners for each platform show that algorithm choice affects mapping rates and transcript coverage more than gene quantification. Surrogate variable analysis (SVA) proved to be an optimal method to combine data within and between platforms, increasing sensitivity and reducing false positives by over 90%. Taken together, these data represent a broad cross-platform characterization of RNA standards and provide a comprehensive comparison of results from degraded, full-length, and size-selected RNA across the latest NGS platforms. The next phase of this study is focusing on the use of DNA reference standards. Results of the ABRF-NGS Study provide a broad foundation for cross-platform standardization, evaluation, and improvement of NGS applications.
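The core idea behind SVA, as used above to combine data across laboratories and platforms, is to estimate hidden sources of variation from the residuals left after known covariates are modeled, then include those estimates as adjustment covariates. A simplified numpy sketch of that idea (not the sva package's actual algorithm, which adds iterative reweighting and significance testing):

```python
import numpy as np

def surrogate_variables(expr, design, n_sv=2):
    """expr: genes x samples expression matrix.
    design: samples x covariates model matrix for known factors.
    Returns a samples x n_sv matrix of surrogate variable estimates."""
    # Regress out known covariates gene-by-gene (least squares in one shot)
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    residuals = expr.T - design @ beta          # samples x genes
    # Surrogate variables ~ leading left singular vectors of the residuals
    u, s, vt = np.linalg.svd(residuals, full_matrices=False)
    return u[:, :n_sv]

# The returned columns would then be appended to `design` before
# differential-expression testing to absorb hidden batch/platform effects.
```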

4 Antibody Research Group

David L. Blum1, Dennis Bagarozzi2, Kathryn Brundage3, Robert Carnahan4, Dan L. Crimmins5, Frances Weis-Garcia6

1University of Georgia, 2Centers for Disease Control, 3West Virginia University, 4Vanderbilt University Medical Center, 5Washington University, 6Memorial Sloan Kettering Cancer Center

To obtain an overview of how scientists develop new antibodies and find existing ones, the Antibody Technology Research Group (ARG) used a web-based survey to query academic antibody core facilities/resource centers and researchers. A total of four surveys were conducted using SurveyMonkey. The first sought to learn the immunization strategies employed by our peers when trying to make antibodies against a weakly immunogenic target, covering variables such as host, adjuvant, and immunogen form. The second catalogued the screening approaches cores employ to find hybridomas secreting the desired antibodies, i.e., antibodies that bind the target in whatever their final application may be (e.g., flow cytometry, immunofluorescence, and in vivo). The third survey focused on the molecular biology side of the field, specifically antibody sequencing and recombinant antibody production. As a follow-up to a round table discussion entitled "The Perfect Antibody", the fourth was conducted to understand how researchers find antibodies that meet the needs of their research and whether they utilize websites focused on end-user reviews and publication citations.

5 Genomics Research Group Session

N.G. Reyero1, D. Baldwin2, S. V. Chittur3, N. Raghavachari4, N. Jafari5, C. Aquino6, A. Perera7

1Mississippi State University, 2Pathonomics, 3SUNY Albany, 4NIH, 5Northwestern University, 6Functional Genomics Center Zurich, 7Stowers Institute for Medical Research

The Genomics Research Group (GRG) presentation describes the current activities of the group in applying the latest tools and technologies for transcriptome analysis to determine the advantages and disadvantages of each platform. We will present three ongoing projects. In the first project, we evaluated microarray, qPCR, and Next Generation Sequencing (NGS) platforms for the sensitivity and specificity of microRNA detection using synthetic miRNA standards. In a second study, we used total RNA from Atlantic oyster samples to analyze the transcriptome and study the effect of an oil spill, with hypoxia as a secondary stressor, using next generation sequencing. Furthermore, we are now in the process of assembling the Atlantic oyster genome in collaboration with the Genomics Bioinformatics Research Group (GBIRG). We will also present our new project, which will compare the gene expression profiles of individual SUM149PT cells treated with the histone deacetylase inhibitor TSA vs. untreated controls. The goals of this project are to demonstrate RNA sequencing (RNA-seq) methods for profiling the ultra-low amounts of RNA present in individual cells, and to demonstrate the use of the Fluidigm system for cell capture and RNA amplification in C1 Single-Cell Auto Prep Integrated Fluidic Circuits (IFCs). In this session, we will discuss the technical challenges and the results obtained and expected from each of these projects.

6 Overview of Production of Protein Using Cell-free Systems

Fei Philip Gao, Protein Expression Research Group, University of Kansas

One of the most important steps in protein research is production of the target protein. Cell-based systems are mature tools that have long been used to express recombinant proteins by manipulation of the expression organisms. However, it is often challenging to find suitable cell systems that allow for rapid screening of conditions and constructs to produce properly folded, functional proteins in a cost-effective manner. As a result, cell-free protein production has emerged as an attractive alternative to cell-based protein expression methods because of its advantages, including speed, simplicity, and adaptability to various formats. Efforts have been made in recent years to overcome a few major obstacles that had been preventing the system from being more widely used. These advances have led to the revitalization of cell-free expression systems to meet the increasing demands for protein production, and many research institutions and companies have developed unique and innovative cell-free systems. This poster will present the history and development of the cell-free method, and updated techniques of various cell-free systems. Examples will be presented to demonstrate that the cell-free system can be a true alternative to cell-based protein expression systems and offers a powerful technology for accelerating the production of recombinant protein.


7 Nucleic Acid Research Group 2013-2014 Study: Evaluating Library Synthesis Protocols for Sub-Nanogram ChIP-Seq Samples

Herbert Auer1, J. Russ Carmical2, Christian H Lytle3, Vijayanand Nadella4, Nicholas Beckloff5, Anoja Perera6, Scott Tighe7, Sridar V Chittur8, Zach Herbert9

1Functional Genomics Core, IRB Barcelona, Barcelona, Spain, 2The University of Texas Medical Branch, Galveston, TX, 3Geisel School of Medicine at Dartmouth / Norris Cotton Cancer Center, Hanover, NH, 4Genomics Facility, Ohio University, Athens, OH, 5RTSF Genomics Core, Michigan State University, East Lansing, MI, 6Stowers Institute for Medical Research, Kansas City, MO, 7University of Vermont, 8University at Albany, SUNY, 9Molecular Biology Core Facilities, Dana-Farber Cancer Institute, Boston, MA

Chromatin immunoprecipitation followed by sequencing of the precipitated DNA (ChIP-Seq) is the state-of-the-art method to study protein-DNA interactions. ChIP-Seq allows identification of the binding sites of proteins across the entire genome in an unbiased manner. One of the major limitations of currently available ChIP-Seq protocols is the necessity to isolate sufficient amounts of immunoprecipitated DNA for subsequent sequencing. The NARG 2013/14 study evaluated library preparation alternatives starting from one and two orders of magnitude less DNA than standard protocols. Library preparation kits from seven different commercial providers were utilized in this project. Some of these kits were intended for low input, while other kits were used outside the manufacturers' specifications. Aliquots of the same preparation of ChIPed DNA were processed using the standard protocol for 10 ng of input DNA and the evaluated library preparation alternatives for 1 ng and 100 pg of input DNA. Each library type was prepared at two different ABRF member labs, and sequencing was done on a single Illumina HiSeq flow cell.

The results of the low-input preparations compared to the standard protocol will be presented. The NARG 2013/14 study provides information on how ChIP-Seq can be performed with 100 times less DNA than standard methods while minimally compromising the quality of results.

8 Metabolomics Investigation of Spiked Compound Differences in Human Plasma

Amrita K. Cheema1, John Asara2, Thomas Neubert3, and Chris Turck4

1Georgetown University, 2Harvard Medical School, 3New York University, 4Max Planck Institute of Psychiatry

Metabolomics is an emerging field that involves qualitative and quantitative measurements of small molecule metabolites in a biological system. This information is useful for developing biomarkers for diagnosis, prognosis, or predicting response to therapy. The Metabolomics Research Group has organized a multi-laboratory study wherein human plasma samples were spiked with different amounts of metabolite standards in two groups of biological samples (A and B). The participants were asked to report back metabolites that were deemed to be significantly different between the two biological groups with the help of the analytical platforms and bioinformatics analyses that are routinely used in their laboratories. The participants were given the option to carry out the analysis blinded (non-targeted metabolomics) or with knowledge of the spiked-in compounds (targeted metabolomics). The returned data show that the use of multiple platforms provides complementary information that helps increase metabolome coverage. Quantitation for spiked-in metabolites with high endogenous plasma levels was not as accurate as for compounds present at low endogenous levels. Participant data confirm that metabolite identification remains an important bottleneck in the field. We will summarize the data and provide some benchmarks that laboratories can use to compare their methodologies.

9 Longer Reads Do Not Significantly Improve RNA-seq Results

Jeffrey Rosenfeld1,2, Sagar Chhangawala3,4, Gabe Rudy5, Po-Yen Wu6, Scott Tighe7, May D. Wang6, Don A. Baldwin8, George Grills9, Christopher E. Mason3,4, and the ABRF-NGS Consortium

1Rutgers-New Jersey Medical School, Newark, NJ 07101; 2American Museum of Natural History, New York, NY; 3The Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021; 4Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10021; 5Golden Helix, 203 Enterprise Blvd, Suite 1, Bozeman, MT 59718; 6The Wallace H. Coulter Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA; 7Vermont Cancer Center, University of Vermont, Burlington, VT 05405; 8Pathonomics LLC, Philadelphia, PA 19104; 9Institute of Biotechnology, Cornell University, Ithaca, NY 14853

The initial next-generation sequencing technologies produced reads of 25 or 36 base pairs and read only from a single end of the library sequence. Currently, it is possible to reliably produce 300bp paired-end sequences for RNA-expression analysis. As these read lengths have consistently increased, it has generally been assumed that longer reads are better and that paired-end reads produce better results than single-end reads. These assumptions have been based upon intuition rather than hard experimentation. Using the RNA-seq standards from the Association of Biomolecular Resource Facilities Next Generation Sequencing (ABRF-NGS) Study, we were able to evaluate the impact of read length on RNA-seq results. We started with paired-end 100bp reads and then trimmed them to simulate different read lengths, along with separating the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. We found that, with the exception of reads trimmed to 25bp, there is little difference in the detection of differential expression regardless of read length. Once single-end 50bp reads are used, the results do not change substantially for any level up to and including 100bp paired-end reads. Thus, a researcher could save substantial resources by using 50bp single-end reads for expression analysis. We replicated these results using multiple computational pipelines to confirm that they were not an artifact of a particular algorithm. Additionally, we performed the same analysis on two ENCODE samples and found consistent results, affirming that our conclusions have broad application.
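Simulating shorter reads from existing data, as described above, amounts to truncating each read and its quality string. A minimal sketch assuming standard 4-line FASTQ input (the study's actual trimming tooling is not specified in the abstract):

```python
from itertools import islice

def trim_fastq(in_path, out_path, length=50):
    """Trim every read (and its quality string) to `length` bases
    to simulate a shorter sequencing run."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            record = list(islice(fin, 4))       # header, seq, '+', quals
            if len(record) < 4:
                break
            header, seq, plus, qual = record
            fout.write(header)
            fout.write(seq.rstrip("\n")[:length] + "\n")
            fout.write(plus)
            fout.write(qual.rstrip("\n")[:length] + "\n")

# Usage (hypothetical file names):
# trim_fastq("sample_R1_100bp.fastq", "sample_R1_50bp.fastq", length=50)
```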


ANTIBODY TECHNOLOGY

10 An Investigation of Vesicular Trafficking in Type III Neuromuscular Junctions of D. melanogaster

C Innocent1 and DL Deitcher1,2

1Molecular Biology & Genetics, Cornell University; 2Neurobiology & Behavior, Cornell University

Bursicon is a neurohormone packaged in and secreted by Type III synaptic contacts. In comparison to the depth of information describing Type I and Type II neuromuscular junctions (NMJs), Type III NMJs are notably ill-defined. As such, this lab has provided evidence identifying bursicon as a reliable marker in an effort to characterize Type III NMJs. First, a comprehensive registry of proteins defining Type III NMJs can be compiled through antibody-labeled colocalizations using bursicon as the Type III marker. Candidates to be co-stained with bursicon include established presynaptic markers such as the SNARE proteins. Positional overlap in these dual antibody-staining profiles will be confirmed by sequential-excitation confocal microscopy. This proteomic approach establishes an immunohistological staining profile characterizing Type III NMJs. Second, this effort also produces a directory of proteins involved in vesicle delivery in Type III NMJs that can be used to dissect the machinery by which neuropeptides are trafficked and secreted. While knowledge of the vesicle trafficking mechanisms employed by neurotransmitter-releasing Type I synapses continues to expand, information on these same properties in peptidergic Type III NMJs lags behind. Therefore, using a GFP-variant fusion protein able to mimic bursicon's packaging and transport in vesicles, we can perturb in vivo delivery of a fluorescent neuropeptide to describe Type III synaptic release. This experimental design elucidates the dynamic vesicle trafficking system specifically employed by Type III NMJs and adds to the breadth of knowledge about the mechanism of peptide release. This study is funded by the 2013 Cornell Graduate School Provost Fellowship.

BIOINFORMATICS

11 Characterizing Sequencing Samples with k-mer Spectra for Good and Evil

W. L. Trimble1, S. Owens2, K. Keegan1, F. Meyer1

1Computation Institute, Argonne National Laboratory, 2NGS core facility, Institute for Genomics and Systems Biology, University of Chicago

Despite the many diverse applications of multiplexed, high-throughput sequencing, sequence data always share a common format, although for different applications the content can differ greatly. K-mer spectra afford diagnostics that characterize sequence-level diversity and have proven useful in revealing coverage bias, sequencing adapter contamination, and even inadvertently mixed and contaminated samples. We present how summarizing datasets using spectra of long (15+ base pair) k-mers can help recognize commonalities in, and manage, sequencing datasets.
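To illustrate the summary being described: a k-mer spectrum is the histogram of k-mer multiplicities across a dataset. A minimal in-memory sketch (real tools stream the data and use compact hashing to handle 15+ bp k-mers at scale):

```python
from collections import Counter

def kmer_spectrum(reads, k=15):
    """Return {multiplicity m: number of distinct k-mers occurring m times}."""
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return dict(sorted(Counter(counts.values()).items()))

# Toy usage: a peak near the genome coverage suggests clean data, while a
# heavy low-multiplicity tail suggests sequencing errors or contamination.
print(kmer_spectrum(["ACGTACGTACGTACGTACGT"], k=15))
```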

12 A Practical Evaluation of Next Generation Sequencing & Molecular Cloning Software

C. Olsen, K. Qaadri, R. Moir, M. Kearse, S. Buxton, A. Wilson, M. Cheung, B. Milicevic, W. Hengjie, J. Kuhn, S. Stones-Havas, Biomatters

Research biologists increasingly face the arduous process of assessing and implementing a combination of freeware, commercial software, and web services for the management and analysis of data from high-throughput experiments. Laboratories can spend a remarkable amount of their research budgets on software, data analysis, and data management systems. The National Institutes of Health (NIH) and the National Science Foundation (NSF) have emphasized the need for contemporary software to be well-documented, interoperable, and extensible. However, laboratories often invest significant resources for personnel to build bespoke bioinformatics tools. This can have a marked impact on productivity and ROI, because these software tools often do not perform as needed, or hidden costs arise unexpectedly from inefficiencies in the software. In this poster, we present a framework that Biomatters developed as a practical evaluation process to assist core facility managers and principal investigators in determining the best tools for DNA/RNA/protein sequence analysis and molecular cloning. The evaluation was performed on commercial and open source software packages using six criteria: 1) user interface, 2) data management, 3) data analysis, 4) feature availability, 5) extensibility, and 6) support. Thirteen software packages were evaluated (six commercial and seven free packages) using our six-tiered framework. This framework efficiently discriminates the strengths and weaknesses of the various packages, standardizes the process, and helps reduce the amount of time spent on evaluation.

13 Secondary Structures of Plasmodium falciparum Associated with Recombination in Variable Surface Antigen Gene Families

C. Olsen, K. Qaadri, Biomatters

Many pathogens undergo antigenic variation to evade and counter host immune defense systems. Plasmodium falciparum is the causative agent of malaria in humans and has developed antigenic strategies for host evasion. One of these strategies is to manipulate var gene expression, which results in alternating expression of the erythrocyte membrane protein 1 class on the infected erythrocyte surface. Recombination has been shown to generate diversity in the var gene, but the control and nature of the genetic changes remain opaque. A bioinformatic and experimental approach was used to identify recombination events in var genes. We confirm that during a parasite’s sexual stages, recombination between isogenous var paralogs occurs near low-folding free energy DNA 50-mers. These sequences are concentrated at the boundaries of regions encoding individual erythrocyte membrane protein 1 structural domains. This poster aims to illustrate the nature and the location of the structural variants of the var gene in Plasmodium falciparum.


14 A Practical Example of Bringing Computation to Data

D. Kalra1, B.N. Downs1, D.M. Opheim1, W. Hale1, L. Xi1, L.A. Donehower1,2

1Human Genome Sequencing Center, Baylor College of Medicine, 2Department of Molecular Virology & Microbiology, Baylor College of Medicine

The rapid decline in sequencing costs has resulted in an ever-increasing number of data sets being generated by next-gen sequencing technologies. Downloading this data from a repository can consume much of an institute's network resources and severely impact overall analysis time. Storing and analyzing this "big data" is becoming a challenge not just for independent researchers but for large-scale sequencing centers as well. At Baylor College of Medicine, we had a need to analyze a large amount of data from The Cancer Genome Atlas (TCGA) dataset hosted at the Cancer Genomics Hub (CGHub) of The University of California Santa Cruz. We performed the initial reduction of the data at a compute farm co-located with CGHub. This significantly reduced the time required to download the data, while preserving the final results. We processed 6302 BAM files corresponding to over 4508 samples. The initial data reduction was achieved by using samtools at the co-located compute farm to extract the exon data in question. This reduction was accelerated by combining gtfuse with samtools.

Using this approach, we were able to shrink nearly 98TB of data to 6.5GB. Downloading the original data set of ~98TB would have taken us 8.5 weeks. Our approach reduced the transfer time to 5.5 mins (assuming a transfer rate of 20MB/s). By greatly reducing both the download time and the storage size of our data set, we have demonstrated one way in which the big data paradigm of moving computation to the data can be a practical reality.
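A sketch of the reduction step, using pysam as a stand-in for the samtools command-line calls described above (the exon list, file names, and locally-indexed-BAM assumption are hypothetical; in the actual setup the remote BAMs were exposed through gtfuse):

```python
import pysam

def extract_exons(bam_path, exons, out_path):
    """Copy only the reads overlapping the given exons into a reduced BAM.
    exons: iterable of (contig, start, end); bam_path must have an index."""
    with pysam.AlignmentFile(bam_path, "rb") as bam, \
         pysam.AlignmentFile(out_path, "wb", template=bam) as out:
        for contig, start, end in exons:
            for read in bam.fetch(contig, start, end):
                out.write(read)

# Back-of-envelope transfer times at 20 MB/s, as in the abstract:
#   98 TB:  98e12 B / 20e6 B/s ~= 4.9e6 s ~= 8 weeks
#   6.5 GB: 6.5e9 B / 20e6 B/s ~= 325 s  ~= 5.5 minutes
```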


15 Keeping Genomic Data Safe on the Cloud

I. Bogicevic1, Z. Rilak1, S. Wernicke1

1Seven Bridges Genomics

With rapidly improving technology and decreasing costs, the amount of genetic data generated through Next Generation Sequencing (NGS) continues to grow at an exponential pace. Managing and processing this data requires significant compute power and storage. Cloud-based solutions and platforms offer virtually unlimited compute power and storage to meet this requirement, but raise concerns about data security. Standards such as HIPAA in the United States attempt to establish best practices for dealing with genetic information, but have not been written with the cloud in mind and leave room for interpretation on crucial security issues such as data encryption.

We present a comprehensive and fully HIPAA-compliant security framework that we have implemented for our genomics cloud platform. Key technology elements of this framework are: one, keeping data encrypted at all times: during transfer, in storage, and during computation; two, in addition to server-side encryption, requiring that all user data, file transfers, and platform services communicate through encrypted SSL channels; and three, setting computation environments within a separate Amazon Web Services VPC (Virtual Private Cloud) to provide a high level of isolation, security, and monitoring.

From a user perspective, analyses can be run in a secure isolation mode that isolates the environments from any access or activities except for the data owner’s computation. Additionally, data owners completely control all access rights to data and analysis pipelines using a fine-grained permission scheme. For example, access to files and pipelines can be set individually on a project-by-project and user-by-user basis.
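To illustrate the first element, client-side symmetric encryption keeps data opaque to the storage provider; only the key holder can decrypt. A minimal sketch using the Python cryptography library (an illustration of the general idea only, not Seven Bridges' actual implementation):

```python
from cryptography.fernet import Fernet

# The data owner generates and retains the key; it never leaves their control.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt before upload: the cloud provider stores only ciphertext.
ciphertext = cipher.encrypt(b"sample-123\tACGTACGTACGT")

# Decrypt only inside the owner's controlled computation environment.
plaintext = cipher.decrypt(ciphertext)
assert plaintext.startswith(b"sample-123")
```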

16 The Galaxy Framework as a Unifying Bioinformatics Solution for ‘omics’ Core Facilities

P.D. Jagtap1, J. Johnson2, B. Gottschalk2, G. Onsongo2, S. Bandhakavi3, E.P. deJong4, J.A. Kooren5, T.W. Markowski1, L. Higgins1, J.D. Rudney6, T.J. Griffin5

1Center for Mass Spectrometry and Proteomics, University of Minnesota, 2Minnesota Supercomputing Institute, 3Bio-Rad Laboratories, Inc., Hercules, CA, 4Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 5Johns Hopkins Medical School, Baltimore, MD, 6School of Dentistry, University of Minnesota

Integration of different omics data (genomic, transcriptomic, proteomic) yields novel insights into biological systems. However, the use of multiple, disparate software tools in a sequential manner makes the integration of multi-omic data a serious challenge. We describe the extension of Galaxy for mass spectrometry-based proteomics software, enabling advanced multi-omic applications in proteogenomics and metaproteomics. We will demonstrate the benefits of Galaxy for these analyses, as well as its value for software developers seeking to publish new software. We will also share insights on the benefits of the Galaxy framework as a bioinformatics solution for proteomic/metabolomic core facilities. Multiple datasets were used for proteogenomics research (a 3D-fractionated salivary dataset and an oral pre-malignant lesion (OPML) dataset) and metaproteomics research (the OPML dataset and a Severe Early Childhood Caries (SECC) dataset). Software required for analytical steps such as peaklist generation, database generation (RNA-Seq derived and others), database search (ProteinPilot and X! Tandem), and quantitative proteomics was deployed, tested, and optimized for use in workflows. The software is shared in the Galaxy Tool Shed (http://toolshed.g2.bx.psu.edu/). Usage of the analytical workflows resulted in reliable identification of novel proteoforms (proteogenomics) or microorganisms (metaproteomics). Proteogenomics analysis identified novel proteoforms in the salivary dataset (51) and the OPML dataset (38). Metaproteomics analysis led to microbial identification in the OPML and SECC datasets using MEGAN software. As examples, workflows for proteogenomics analysis (http://z.umn.edu/pg140) and metaproteomic analysis (http://z.umn.edu/mp65) are available at the usegalaxyp.org website. Tutorials for workflow usage within the Galaxy-P framework are also available (http://z.umn.edu/ppingp). We demonstrate the use of Galaxy for integrated analysis of multi-omic data in an accessible, transparent, and reproducible manner. Our results and experiences using this framework demonstrate the potential for Galaxy to be a unifying bioinformatics solution for 'omics' core facilities.

17 Computing Multi-Level Clustered Alignments of Gene-Expression Time-Series

Deborah Muganda-Rippchen1,2 and Mark Craven1,2

1Department of Computer Sciences, University of Wisconsin-Madison, 2Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison

Identifying similarities and differences in expression patterns across multiple time series can provide a better understanding of the relationships among various normal biological and experimentally induced conditions, such as chemical treatments or the effects of gene knockout/suppression. We consider the task of identifying sets of genes that have a high degree of similarity both in (i) their expression profiles within each condition, and (ii) their changes in expression responses across conditions. Previously, we developed an approach for aligning time series that computes clustered alignments. In this approach, an alignment represents the correspondences between two gene expression time series. Portions of one of the time series may be compressed or stretched to maximize the similarities between the two series. A clustered alignment groups genes such that the genes within a cluster share a common alignment, but each cluster is aligned independently of the others. Unlike standard gene-expression clustering, which groups genes according to the similarity of their expression profiles, the clustered-alignment approach clusters together genes that have similar changes in expression responses across treatments. We have now extended the clustered-alignment approach to produce multi-level clusterings that identify subsets of genes with a high degree of similarity both in (i) their expression profiles within each treatment, and (ii) their changes in expression responses across treatments.
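The compress/stretch correspondence described above is the same idea as classic dynamic time warping (DTW). The sketch below shows that underlying alignment recurrence, not the authors' clustered-alignment algorithm itself:

```python
import numpy as np

def dtw_cost(x, y):
    """Alignment cost between two expression time series; compressing or
    stretching one series against the other is allowed at each step."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # stretch y (skip a point of x)
                                 D[i, j - 1],      # stretch x (skip a point of y)
                                 D[i - 1, j - 1])  # match the two points
    return D[n, m]

# Genes whose per-condition profiles align cheaply under a *shared* warping
# would be grouped into the same cluster in a clustered alignment.
print(dtw_cost([0.0, 1.0, 2.0, 1.0], [0.0, 0.5, 1.0, 2.0, 1.0]))
```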

18 An Integrative Approach for Interpretation of Clinical NGS Genomic Variant Data

E. L. Crowgey1, D. L. Stabley2, C. Chen1, H. Huang1, K. Robbins2, S. Polson1, K. Sol-Church2, C. Wu1

1Center for Bioinformatics & Computational Biology, University of Delaware, 2Biomolecular Core Lab, Nemours Alfred I. DuPont Hospital for Children

Antibody (Ab) discovery research has accelerated as monoclonal Ab (mAb)-based biologic strategies have proved efficacious in the treatment of many human diseases, ranging from cancer to autoimmunity. Initial steps in the discovery of therapeutic mAb require epitope characterization and preclinical studies in vitro and in animal models, often using limited quantities of Ab. To facilitate this research, our Shared Resource Laboratory (SRL) offers microscale Ab conjugation. Ab submitted for conjugation may or may not be commercially produced, but have not been characterized for use in immunofluorescence applications. Purified mAb and even polyclonal Ab (pAb) can be efficiently conjugated, although the advantages of direct conjugation are more obvious for mAb. To improve consistency of results in microscale (<100 ug) conjugation reactions, we chose to utilize several different varieties of commercial kits. Kits tested were limited to covalent fluorophore labeling. Established quality control (QC) processes to validate fluorophore labeling either rely solely on spectrophotometry or utilize flow cytometry of cells expected to express the target antigen. This methodology is not compatible with microscale reactions using uncharacterized Ab. We developed a novel method for cell-free QC of our conjugates that reflects conjugation quality but is independent of the biological properties of the Ab itself. QC is critical, as amine-reactive chemistry relies on the absence of even trace quantities of competing amine moieties such as those found in the Good buffers (HEPES, MOPS, TES, etc.) or irrelevant proteins. Herein, we present data used to validate our method of assessing the extent of labeling and the removal of free dye by using flow cytometric analysis of polystyrene Ab capture beads to verify product quality. This microscale custom conjugation and QC allows for the rapid development and validation of high quality reagents, specific to the needs of our colleagues and clientele.

Next generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor in clinical applications of genomic NGS is downstream bioinformatics analysis. Most analysis pipelines do not connect genomic variants to disease- and protein-specific information during the initial filtering and selection of relevant variants. Robust bioinformatics pipelines were implemented for trimming, genome alignment, and SNP, INDEL, or structural variation detection in whole genome or exon-capture sequencing data from Illumina. Quality control metrics were analyzed at each step of the pipeline to ensure data integrity for clinical applications. We further annotate the variants with statistics regarding the diseased population and variant impact. Custom algorithms were developed to analyze the variant data by filtering variants based upon criteria such as variant quality, inheritance pattern (e.g., dominant, recessive, X-linked), and variant impact. The resulting variants and their associated genes are linked to the Integrative Genomics Viewer (IGV) in a genome context, and to the PIR iProXpress system for rich protein and disease information. This poster will present a detailed analysis of whole exome sequencing performed on patients with facio-skeletal anomalies. We will compare and contrast data analysis methods and report on potential clinically relevant leads discovered by implementing our new clinical variant pipeline.

Our variant analysis of these patients and their unaffected family members resulted in more than 500,000 variants. By applying our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for further filtering of disease-relevant variants that impact protein coding genes. Taken together, the integrative approach allows better selection of disease-relevant genomic variants by using both genomic and disease/protein-centric information. This type of clustering approach can help clinicians better understand the association of variants with the disease phenotype, enabling application to personalized medicine approaches.
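A minimal sketch of the kind of criteria-based variant filtering described above, with a hypothetical record schema (keys such as qual, pop_af, impact, and inheritance are illustrative, not the authors' actual pipeline fields):

```python
def filter_variants(variants, min_qual=30.0, max_pop_af=0.01,
                    impacts=("HIGH", "MODERATE"),
                    inheritance=("dominant", "recessive", "X-linked")):
    """Keep variants passing quality, population-rarity, impact,
    and inheritance-pattern criteria."""
    return [v for v in variants
            if v["qual"] >= min_qual
            and v["pop_af"] <= max_pop_af
            and v["impact"] in impacts
            and v["inheritance"] in inheritance]

# Example with two hypothetical variant records:
variants = [
    {"qual": 55.0, "pop_af": 0.002, "impact": "HIGH", "inheritance": "recessive"},
    {"qual": 12.0, "pop_af": 0.150, "impact": "LOW",  "inheritance": "dominant"},
]
print(filter_variants(variants))   # only the first record passes
```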


19 Automated Big Data Analysis in Bottom-up and Targeted Proteomics

Yassene Mohammed1,2, Suzanne van der Plas-Duivesteijn1, Dominik Domański3, Derek Smith2, Christoph Borchers2, Magnus Palmblad1

1Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, The Netherlands, 2University of Victoria - Genome British Columbia Proteomics Centre, University of Victoria, Canada, 3Mass Spectrometry Laboratory, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland

Similar to other data-intensive sciences, analyzing mass spectrometry-based proteomics data involves multiple steps and diverse software using different algorithms, data formats, and data sizes. Besides the distributed and evolving nature of the data in online repositories, another challenge is that scientists have to deal with many steps of the analysis pipelines. Documented data processing is also becoming an essential part of the overall reproducibility of results. Thanks to different e-Science initiatives, scientific workflow engines have become a means for automated, sharable, and reproducible data processing. While these are designed as general tools, they can be employed to solve different challenges that we face in handling our big data. Here we present three use cases: improving the performance of different spectral search engines by decomposing input data and recomposing the resulting files, building spectral libraries from more than 20 million spectra, and integrating information from multiple resources to select the most appropriate peptides for targeted proteomics analyses. The three use cases demonstrate different challenges in proteomics data analysis. In the first, we integrate local and cloud processing resources to obtain better performance, resulting in a more than 30-fold speed improvement. By treating search engines as legacy software, our solution is applicable to multiple search algorithms. The second use case is an example of automated processing of many data files of different sizes and locations, starting with raw data and ending with the final, ready-to-use library. This demonstrates robustness and fault tolerance when dealing with huge amounts of data stored in multiple files. The third use case demonstrates retrieval and integration of information and data from multiple online repositories. In addition to the diversity of data formats and Web interfaces, this use case also illustrates how to deal with incomplete data.
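The decompose/recompose pattern of the first use case is a classic scatter-gather. A minimal sketch, treating the search engine as a black-box command (the "search-engine" executable name and its flags are placeholders, not a real tool's interface):

```python
from concurrent.futures import ProcessPoolExecutor
import subprocess

def search_chunk(chunk_path):
    """Run the (placeholder) search engine on one chunk of spectra."""
    out_path = chunk_path + ".result"
    subprocess.run(["search-engine", chunk_path, "-o", out_path], check=True)
    return out_path

def scatter_gather(chunk_paths, workers=8):
    """Scatter chunks across worker processes and gather the result files;
    a final merge step would then recompose them into one output."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search_chunk, chunk_paths))
```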

BIOMARKERS

20 Optimization of 384-well Luminex Immunoassays on the FlexMAP 3D System for Routine Analyte Quantitation from Commercial Kits

Cristina Fhied1, Melissa Pergande1, Ravi Pithadia1, Jeffrey A. Borgia1

1Department of Biochemistry and Pathology, RUSH University Medical Center

INTRODUCTION: The Luminex immunobead platform offers the benefits of ELISA-based assays, but also permits higher throughput, increased flexibility, reduced sample volume, and lower cost when evaluation of multiple analytes is necessary. The FlexMap 3D system is a compact, high bead throughput instrument which can read up to 500 different bead regions from a single sample in a 96 or 384-well plate format. Commercially available kits for the measurement of protein analytes are currently only available in the 96-well plate format. The objective of this study was to adapt and validate the commonly used 96-well plate protocols to the 384-well plate format with equivalent or improved test performance characteristics. METHODS: A laboratory-developed 96-well plate assay protocol for Vimentin autoantibody quantitation was translated and optimized for a 384-well plate format. Assay sensitivity was evaluated by comparing the results from two commercially available kits (Human IGF Binding Proteins 1-7 and Human IGF I/II), first employing the traditional 96-well plate format and subsequently using the newly optimized 384-well plate format. RESULTS: Conversion from the 96 to 384-well plate format for our Vimentin autoantibody assay revealed comparable sensitivity, while consuming only about a quarter of the reagents and specimen. Results from the IGF-I assay revealed approximately equivalent sensitivity (±2.4%), whereas IGF-II showed a significant increase in sensitivity (17.7%), when the 96 and 384-well plate formats were run concomitantly. No significant alterations in assay range, recovery, or precision were otherwise noted. CONCLUSION: In this study, we demonstrated that the conversion from a 96 to a 384-well plate format was developed and validated successfully. When employing our optimized 384-well plate format, nearly 4 times as many samples can be run with one commercially available kit, saving not only cost but also specimen volume.

21 Development of a Luminex Immunobead-Based Method for Autoantibody Discovery

Melissa Pergande1, Cristina Fhied1, Clint Piper1, Ravi Pithadia1, Michael J. Liptay1, and Jeffrey A. Borgia1

1Rush University Medical Center

Circulating autoantibodies show much promise for a range of diagnostic applications that are currently of very high priority. The unambiguous identification of novel targets for this application typically relies on immunoproteomic methods that have historically been very low throughput, with poor resolution. In this study, we combine advancements in multi-dimensional HPLC with the strengths of Luminex-based immunoassays to develop a 'fluidic array' for immunoproteomic discovery. Our objective was extensive and high-throughput novel autoantigen discovery for lung cancer detection. Whole cell lysates were prepared from resected tissue specimens relevant to lung cancer screening (n=13 stage I adenocarcinomas; 13 granulomas) and fractionated via two-dimensional high-performance liquid chromatography (2D-HPLC) using reversed-phase (C4) followed by mixed bed ion-exchange chromatography. Resulting fractions were conjugated to Luminex immunodetection beads using carbodiimide/NHS chemistry, assays pooled, and the multiplex (direct-capture) assay used to screen patient sera (n=30 malignant; 30 benign) for differential immunoreactivity. Fractions positive by the Mann-Whitney rank sum test were further resolved by SDS-PAGE and subjected to protein identification via mass spectrometry. The intact proteins acquired from the tissue lysates were distributed into 140 fractions (70 sub-fractions per tissue cohort) using 2D-HPLC. Direct screening of patient sera with the pooled Luminex bead sets resulted in the identification of 30 fractions with significant (p<0.05) differential immunoreactivity. Autoantigen identification via mass spectrometry provided targets for the development of specific autoantibody assays and the formation of optimized diagnostic panels. Herein, we present a novel and efficient method which can be used in the discovery of diagnostic autoantigens. This high-throughput method allows for the partial purification of autoantigens via 2D-HPLC as well as the direct screening of patient sera via an immunoreactive fluidic assay.


22 SOMAscan™ v3.0: A Sensitive and Reproducible Multiplex Proteomic Platform that Measures over 1100 Analytes in Complex Matrices

B. Lollo, S. Kraemer, G. Sanders, E. Katilius, E. King, T. Bauer, D. Zichi, N. Saccomano, SomaLogic

The translation of biomarkers from pre-clinical model systems to human conditions has proven to be challenging. One approach to this issue is the use of a rapid, highly multiplexed, quantitative assay targeted at human proteins and cross-species orthologs. SOMAscan™, a high-throughput, multiplexed proteomic assay (v3.0, >1100 protein analytes), simultaneously measures native folded proteins directly from small volumes of biological samples such as plasma, tissue, or cell culture. A series of early experiments in Non-Small Cell Lung Cancer (NSCLC) was examined to identify translatable proteins in cell culture and mouse xenograft model systems, as well as across human NSCLC tumor and serum. Drug-sensitive and drug-resistant NSCLC cell lines demonstrated profound differences in protein concentrations in response to drug treatment. Comparison of cell findings with tumor xenografts showed 22 (of 93) proteins in common. Experiments conducted in NSCLC clinical patients found 11 (of 44) proteins in common between human serum (n=291) and tumor tissue (n=8). Features of both known and novel biology were uncovered, providing implications for drug action and resistance as well as novel targets for drug development. Multiplexed profiling of large numbers of analytes using SOMAscan can successfully enable biomarker discovery and translation of findings between model systems and to clinical systems.

CARBOHYDRATE ANALYSIS

23 Separation of N-Glycans with Different Sialic Acid Linkages by Using Novel Superficially Porous Particle HILIC Column

S. Tao1, Y. Huang1, B. Boyes1,2, R. Orlando1

1Complex Carbohydrate Research Center, Department of Chemistry, University of Georgia, Athens, GA, 2Advanced Materials Technology Inc, Wilmington, DE

The study of N-linked glycans is among the most challenging analytical tasks due to their complexity and variety. This difficulty is further compounded by the fact that glycans of the same composition are typically found as mixtures of multiple glycoforms differing only in branching and linkages, which can have profound biological implications. For example, previous studies have shown that the ratio of α2-3 to α2-6 linked sialic acid plays an important role in cancer biology. Consequently, differential and quantitative detection of these isomeric structures could serve as a valuable early diagnostic in oncology, something that cannot yet be effectively accomplished by traditional glycomic profiling. Here, we studied the ability of a novel superficially porous particle (Fused-Core) Penta-HILIC column to chromatographically resolve isomeric N-glycans. We found that these columns are capable of resolving N-glycans with different sialic acid linkages, as demonstrated by isolating tentative glycan isomers from model glycoproteins such as fetuin. Structural characterization was conducted with exoglycosidase digestions and LC-MS analysis. The separation achieved by the Penta-HILIC columns on N-glycan isomers with α2-3 and α2-6 sialic acid linkages can ultimately facilitate individual glycoform identification and quantitation.

CORE FACILITIES

24 Business Planning Core Facilities

G.N. Itzkowitz, School of Medicine, Stony Brook Medicine

Thoughtful business planning is pivotal to the success of any business or operational venture. When planning is thoughtful and detailed, there are very few operational or financial surprises for an institution or facility (service center) to contend with. At Stony Brook Medicine we include a SWOT analysis and a detailed market analysis as part of the process. This is bolstered by an initiative to ensure institutional policies are met so that facilities remain in compliance throughout their lifecycle. Because we operate 14 facilities, we have had the opportunity to become creative in our approach: coordinating activities, virtualizing services, integrating new business-to-business software partners, and coordinating plans for phased consolidation instead of outright termination of services when required. As the Associate Dean for Scientific Operations and Research Facilities, the shared research facilities (cores) of the Medical School are in my direct line of sight, and we understand their value in meeting our overall research mission. We have found that an active process of monitoring to predict trouble as much as possible is the best approach for facilities. Case analyses of this type of interaction will be presented as well.

25 Easing the Pain of Next-Gen Sequencing Data Evaluation and Delivery

C.M. Nicolet1, N. Shatokhina1, Z. Ramjan1, and B.P. Berman2

1Data Production Facility, USC Epigenome Center, Norris CCC, Keck School of Medicine, University of Southern California, 2Dept. of Preventive Medicine, Norris CCC, Keck School of Medicine, University of Southern California

High throughput genome biology has created the need for a robust and scalable software system able to handle massive amounts of sequencing data. At the Data Production Facility in the USC Epigenome Center we have addressed this problem by developing an online sequence data access site called the Epigenome Center Data Portal: ECDP. This scalable portal allows researchers to explore and download their datasets in a secure fashion. From the initial LIMS sample entry (currently using Genologics) through sequencing and downstream analysis on our supercomputing cluster, all characteristics of a sample are parsed and tracked allowing for the presentation of these metrics on a single integrated interface. The QC metrics data generated by the analyses can be visualized in a number of ways. Metrics can be viewed for multiple samples side by side, plotted on an interactive plot, and exported in spreadsheet format. The most important summary data are presented initially with the additional option to drill down into further detail. This allows for the rapid assessment of library quality before proceeding further with in-depth analyses. ECDP also serves as the primary means of client sequence data delivery. Clients can securely download analyzed data such as fastq files, bam files, and visualization tracks. Recently we have been collecting information on the client usage of ECDP in order to more effectively tailor the site to the needs of the user community. The initial results of this tracking will be presented.


26 Opening the Doors to Pre-Commercial Technology

Reed A. George, Scientific Services, Howard Hughes Medical Institute, Janelia Research Campus

Over three hundred years after Leeuwenhoek first applied a microscope to the study of life, new developments in optical systems technology and applications to science continue to emerge on a regular basis. However, most new instrumentation is developed behind closed doors, in the laboratories of optical physicists who collaborate with a small number of scientists to prove out and evaluate the potential impact. Most scientists, including those who stand to benefit the most from the innovations, must wait until the instruments become commercially available before they can evaluate and utilize the latest imaging approaches in their research. The Gordon and Betty Moore Foundation and the Howard Hughes Medical Institute (HHMI) are collaborating to change that situation with the development of an Advanced Imaging Center (AIC) at HHMI’s Janelia Research Campus (Janelia) in Northern Virginia. The AIC will provide access to pre-commercial imaging systems developed in the laboratories at Janelia, free of charge to researchers who submit compelling proposals for their use. The AIC will open in the second quarter of 2014, and will initially provide access to five novel imaging systems. The poster will describe the planned organization, proposal review process, initial technology set, and metrics for evaluating the utility of the facility. This new model of inter-institution collaboration and focus on getting important technology into the hands of motivated scientists promises to benefit both instrument developers and a wide range of users.

27 Next Generation Sequencing Analysis of Circulating RNAs in Human Plasma

Kanhav Khanna1, Xiaoping Su2, Hongli Tang1, Louis Ramagli1, Siping Sun1, and Erika Thompson1 1Sequencing and Microarray Facility, The University of Texas MD Anderson Cancer Center, 2Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center

Management of metastases, a major challenge in cancer therapy, requires constant monitoring of tumor burden in response to treatment. The importance of circulating tumor DNA (ctDNA) as a biomarker for metastatic cancer is increasingly evident, as new studies have shown that changes in tumor burden correlate better with ctDNA levels than with traditional markers like cancer antigen 15-3. The presence of tumor-associated RNA in the plasma of cancer patients has also brought attention to the use of circulating RNA as a marker. Recently, expression profiling of circulating RNAs has identified differentially expressed transcripts in cancer patients. These tumor-associated transcripts can potentially provide a non-invasive approach to identifying new targets for cancer detection. To characterize the performance of RNA-seq methods with circulating RNAs, we evaluated the efficacy of three commercial library preparation protocols. Using two different plasma RNA samples, we constructed and sequenced five libraries and compared them to libraries from frozen tissue and FFPE samples. Our data show that over 40% of genes were detected in plasma samples at a mean coverage depth of 30X, compared to 70% of genes in the frozen tissue sample at a mean coverage depth of 44X. Although 64% of genes were detected in the FFPE sample, the mean coverage was only 10X. More than 50% of lncRNAs detected in frozen tissue were also detected in plasma samples. We theorize that the lower percentage of detected genes in plasma samples is probably due to the incomplete RNA spectrum in circulating RNAs. Among the library preparation kits we used for plasma RNAs, the TruSeq Stranded Total RNA kit performed best, with up to 60% of reads uniquely mapping to mRNA exonic regions. Our preliminary analysis suggests that current plasma RNA-seq methods are robust and that the data generated from plasma RNA are comparable to those from frozen tissue.
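The detection percentages quoted above follow from a simple per-gene threshold over a count table. The sketch below shows one way to compute such a figure; the counts and the detection criterion (at least one mapped read) are assumptions for illustration, as the abstract does not state the exact cutoff used.

def pct_genes_detected(counts, threshold=1):
    # Percentage of genes with at least `threshold` mapped reads.
    detected = sum(1 for c in counts.values() if c >= threshold)
    return 100.0 * detected / len(counts)

# Toy per-gene read counts from a plasma RNA-seq library.
plasma_counts = {"GAPDH": 412, "ACTB": 980, "FOXP2": 0, "TP53": 35}
print(f"{pct_genes_detected(plasma_counts):.1f}% of genes detected")  # 75.0%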

28 Evaluation of Billing and Tracking Programs Leads to a Hybrid Approach

M. DeLay1, A. Rupert2, A.N. White3, M. Wagner2, S. Thornton1 1Flow Cytometry Core, Division of Rheumatology, Cincinnati Children’s Hospital Medical Center, 2Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, 3Shared Facilities, Office of Research, Cincinnati Children’s Hospital Medical Center

Analysis of metrics such as usage and billing to assess core need and growth is paramount to running a successful shared facility. Determining these metrics has long been performed with basic tools such as spreadsheets. Fortunately, several platforms have emerged in the last decade to aid in the tracking of core metrics; however, these may not be as adaptable as in-house software that can be customized to fit core needs. Our flow cytometry shared facility recently compared these platforms and found that a hybrid approach, adopting the most useful aspects of usage-time acquisition alongside new billing and tracking systems, best fits our goals and is efficient for the shared facility. The aspects compared will be described, including tracking of usage hours, ensuring instrumentation training, billing, and capturing experimental parameters. In addition, we will describe the software revisions used to adapt to a hybrid system.

29 Using Surveys to Assess Core Needs

M. DeLay1, S. Thornton1 1Flow Cytometry Core, Division of Rheumatology, Cincinnati Children’s Hospital Medical Center

Assessment of core needs and collection of client feedback are key to tailoring core development to support the research of your clientele. To determine our users’ needs, we have developed surveys at both the technical and the customer-service level. For our flow cytometry core, technical surveys documented a rise in demand for greater-than-six-color cytometry, from 17.3% of users surveyed in 2012 to 50% in 2013. Comparison with instrument usage confirmed and strengthened this need. Furthermore, by assessing which fluors/lasers are most heavily utilized by our clients, these statistics can be used to request instrumentation with the appropriate capabilities to meet user demand and can aid in tailoring educational initiatives. On the customer-service level, we have also monitored client satisfaction with our services and acted appropriately to maintain successful programs and address deficiencies.

30 Alternate Cluster Generation for Picomolar NGS Libraries

J.W. Podnar1, G. Huerta1, T. Heckmann1, H. Deiderick1, M. Barnette 1, S.P. Hunicke-Smith1

1Genetic Sequencing and Analysis Facility, University of Texas at Austin

While advancements in library preparation techniques now enable NGS library preparation from single cells and from nano- to picogram quantities of DNA or RNA, it can still be difficult to reach Illumina’s recommended final library concentration of 2 nM for DNA template preparation prior to cluster generation. This can be an acute issue for ChIP-seq, RIP-seq, and LCM-based samples. We have developed a simple and robust alternative method that maximizes the number of reads from an Illumina library without sacrificing precious sample or performing excessive amplification. This new method requires 10- to 100-fold less library material than the standard Illumina protocol. By direct comparison to the standard Illumina DNA template preparation method, we demonstrate that cluster density, Q-scores, raw and PF reads, and data quality are not affected by this new protocol. We have successfully used this method on over 115 HiSeq runs and 100 MiSeq runs of both single- and paired-end varieties.
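The 2 nM target above is a molar concentration, derived from a library’s mass concentration and mean fragment length using the standard dsDNA conversion of roughly 660 g/mol per base pair. The sketch below shows that generic conversion; it is not part of the authors’ protocol.

def library_molarity_nM(ng_per_ul, mean_fragment_bp):
    # Convert a dsDNA library's ng/ul to nM, assuming ~660 g/mol per bp.
    grams_per_mole = mean_fragment_bp * 660.0
    return ng_per_ul * 1e6 / grams_per_mole

# A 0.5 ng/ul library with a 400 bp mean fragment length:
print(f"{library_molarity_nM(0.5, 400):.2f} nM")  # ~1.89 nM, below the 2 nM spec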

31 Case Study: Regulatory Considerations in the Analysis of Human Patient Samples in an Academic Core Lab

A.S. Chien1, K.M. Krasinska1, T.M. McLaughlin1, J. Ingrande2 1Stanford University Mass Spectrometry, Stanford University, 2Department of Anesthesia, Stanford School of Medicine

ANALYTICAL CHALLENGE: Absolute quantitation of propofol in 1,600 human plasma samples. REGULATORY CONSIDERATIONS: Before our core lab could accept this project, regulatory and safety aspects had to be taken into account. For clinical samples, would CLIA or GLP certification be required? Would HIPAA apply? Is propofol (infamous as the drug that killed Michael Jackson) a controlled substance? What about biosafety regulations? These issues will be defined and explored. In brief, for this particular case study, CLIA was not required because results would not be reported back to patients or used for diagnosis. GLP was not required because data would not be used in FDA or other regulatory submissions. HIPAA did not apply because samples and data were not personally identifiable. Propofol is not classified as a controlled substance; even if it were, many controlled substances can be obtained as analytical standards (in solution at low concentration) without triggering federal DEA regulations. As for biosafety, the patients were not known to have any infectious diseases, and samples were handled under universal precautions. SCIENTIFIC METHOD: Human plasma samples were spiked with D17-propofol (internal standard) and cleaned up via liquid-liquid extraction. The heptane extracts were analyzed without derivatization on a Scion TQ GC-triple quadrupole mass spectrometer (Bruker Daltonics) using electron ionization in SRM mode. An isocratic oven program (195 °C) minimized cycle time: total injection-to-injection time was 2.5 minutes. Four transitions were monitored, two each for propofol and D17-propofol. RESULTS: The method proved reliable for the analysis of 1,600 plasma samples spread over 2 months. Each 100-vial autosampler tray of samples took just over 4 hours to analyze. Calibration curves and QCs were run with each batch of samples and demonstrated consistent method performance over time. The LLOQ was 4 nM (400 amol on column) and the LLOD 2 nM; response between 1 nM and 4 mM was linear.
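Internal-standard quantitation of the kind described works by fitting the analyte/IS peak-area ratio against calibrator concentrations and inverting the fit for unknowns. The sketch below uses invented calibration values purely for illustration; it is not the authors’ processing software.

def fit_line(xs, ys):
    # Ordinary least-squares slope and intercept.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

cal_conc_nM = [4, 20, 100, 500, 2000]              # calibrator levels (invented)
cal_area_ratio = [0.012, 0.061, 0.304, 1.49, 6.1]  # area(propofol)/area(D17-IS)

m, b = fit_line(cal_conc_nM, cal_area_ratio)

def quantify(area_ratio):
    # Back-calculate an unknown's concentration (nM) from its area ratio.
    return (area_ratio - b) / m

print(f"{quantify(0.75):.0f} nM")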

32 Antibody Fluorophore Conjugation: A Simple Cell-Free Method for Quality Control

K. Acklin MS1, K. Ruisaard1, K. Ramirez1, D. Dwyer1, D. Bonilla PhD1, R. Jewell1, and K. Clise-Dwyer PhD1

1South Campus Flow Cytometry and Cell Sorting Core Facility, University of Texas MD Anderson Cancer Center

Antibody (Ab) discovery research has accelerated as monoclonal Ab (mAb)-based biologic strategies have proved efficacious in the treatment of many human diseases, ranging from cancer to autoimmunity. Initial steps in the discovery of therapeutic mAbs require epitope characterization and preclinical studies in vitro and in animal models, often using limited quantities of Ab. To facilitate this research, our Shared Resource Laboratory (SRL) offers microscale Ab conjugation. Abs submitted for conjugation may or may not be commercially produced, but they have not been characterized for use in immunofluorescence applications. Purified mAbs and even polyclonal Abs (pAbs) can be efficiently conjugated, although the advantages of direct conjugation are more obvious for mAbs. To improve the consistency of results in microscale (<100 µg) conjugation reactions, we chose to utilize several varieties of commercial kits, limited to covalent fluorophore labeling. Established quality control (QC) processes to validate fluorophore labeling either rely solely on spectrophotometry or utilize flow cytometry of cells expected to express the target antigen. Such methodology is not compatible with microscale reactions using uncharacterized Abs. We developed a novel method for cell-free QC of our conjugates that reflects conjugation quality but is independent of the biological properties of the Ab itself. QC is critical, as amine-reactive chemistry relies on the absence of even trace quantities of competing amine moieties such as those found in the Good buffers (HEPES, MOPS, TES, etc.) or in irrelevant proteins. Herein, we present data used to validate our method of assessing the extent of labeling and the removal of free dye by flow cytometric analysis of polystyrene Ab-capture beads to verify product quality. This microscale custom conjugation and QC allows the rapid development and validation of high-quality reagents, specific to the needs of our colleagues and clientele.
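For comparison, the spectrophotometric QC that the abstract contrasts with typically computes a degree of labeling (DOL), i.e. moles of dye per mole of antibody, from a UV/Vis scan. The sketch below shows the standard textbook calculation; the extinction coefficients and the 280 nm correction factor are dye- and protein-specific values assumed for illustration.

def degree_of_labeling(a_max, a280, eps_dye, eps_protein=203_000, cf=0.0):
    # DOL = [dye] / [protein]; a280 is corrected for the dye's own
    # absorbance at 280 nm via the dye-specific correction factor `cf`.
    protein_molar = (a280 - cf * a_max) / eps_protein
    dye_molar = a_max / eps_dye
    return dye_molar / protein_molar

# Example: a green dye with eps ~71,000 M^-1 cm^-1 and cf ~0.11 on IgG.
print(f"DOL = {degree_of_labeling(0.35, 0.80, 71_000, cf=0.11):.2f}")  # ~1.3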

33 Operation of the Advanced Instruments at the Flow Cytometry Resource Center: Extensive User Training vs. Staff-Assistant Service

S. Mazel1 Flow Cytometry Resource Center (FCRC) at the Rockefeller University

Flow cytometry analyzers are becoming more sophisticated and more complex as time goes on; operating 5-laser/16-fluorescence-detector instruments is now our everyday routine. Every time the Flow Cytometry Resource Center (FCRC) acquires a new instrument, it strikes us how many errors an improperly trained operator could make. How many acquisition errors could sneak in, and how many wrong decisions could be made? Wrongly analyzed data can be re-analyzed, but wrongly acquired data can almost never be re-acquired. This raises a dilemma: what level of user training is appropriate for the efficient operation of advanced analyzers (BD LSRII, BD LSR-Fortessa, etc.)? We have found it good practice to re-evaluate the training scheme at least yearly.

User training is customized to each user’s needs, taking into account the user’s background and previous experience, and consists of several essential steps:

1. Basic Flow Cytometry training – On-line Fluorescence Tutorials from Life Technologies Corp. (web-based; 1.5h)

2. “Beyond the Basics” Flow Cytometry Class (group of 5-15 researchers; 5.5h)

3. Pre-hands-on consultation (one-to-one; 1.5-2.5h)

4. Hands-on training on the instrument using cells of interest (Experiment I; one-to-one; 2-4h)

5. Introduction to the analysis programs (one-to-one; 1-2h)

6. FCRC Staff help session with the instrument setup (Experiment II; one-to-one; 2-4h)

7. Semi-independent user run of the data analysis (with the FCRC Staff “on call” for questions and tips)

8. Semi-independent user run of the next experiment (Experiment III) with the FCRC Staff “on call” for questions and easy solutions

We would like to share with the community the diverse scheme of user training we have developed and implemented at The Rockefeller University FCRC and discuss in detail the importance of each essential step, including some business aspects of this scheme.

34 Biomedical Engineer’s Role in Improving the Management of Devices Used for Genomic Medicine Research

B. Hernandez1, O. Gutierrez1 1National Institute of Genomic Medicine

Today, biomedical engineers are very important in health institutions around the world, yet the same impact has not been seen in research. All devices used in biotechnology research have the same life cycle as those in health care, so research units can learn from biomedical engineers how to improve the use of their instruments. The roles of biomedical engineers at the National Institute of Genomic Medicine (INMEGEN) are to: (1) establish policies and guidelines ensuring that the technology used in research protocols is reliable and safe throughout development, (2) implement and evaluate technology management programs ensuring quality processes, (3) cover demand, and (4) comply with national and international rules for biomedical devices. Additionally, we have developed and implemented tools to define the needs for new technologies to be incorporated into the Institute, and have established technical guidelines for acquisition through technology assessments and cost/benefit analysis. Furthermore, we designed and implemented training programs for research staff in order to optimize resources. Many tools exist for the management of medical devices; we developed new management tools for research devices based upon them. Our results are: (1) increased use of the technology, (2) improved security, (3) increased confidence in the research protocols, (4) less researcher time spent managing technology, (5) higher-quality results, and (6) lower costs. We have found it necessary and useful to promote the participation of biomedical engineers in the management of research devices, bringing their knowledge and tools to improve resources in biotechnology research.

35 Lessons Learned: A Guide for Implementing and Activating a Centralized Billing System

A. N. White, L. Mays Cincinnati Children’s Research Foundation, Cincinnati, OH

An increasing number of institutions and universities are adopting and implementing centralized billing and tracking systems for their research shared facilities. This is driven by the need to increase financial compliance with grant funding, create transparency in the ordering process, aid in usage tracking, and centralize billing and reporting processes. Cincinnati Children’s Research Foundation (CCRF) elected to implement the Core Ordering and Reporting Enterprise System (C.O.R.E.S) developed by Vanderbilt University. The CCRF C.O.R.E.S billing team has successfully implemented its first group of shared facilities in the centralized billing system. In the process, valuable lessons were learned about the practical and technical implications of incorporating multiple workflows from the shared facility, user, and finance perspectives. Here, we identify important points of consideration and a strategic process to help identify potential issues and concerns that can arise from the investigator, business, and core manager roles while implementing a centralized billing system.

DNA SEQUENCING

36 A Next Generation Sequencing PCR Primer Design Tool for Sanger Sequencing Confirmation by CE

A. Karger, S. Berosik, M. Wenz, P. Brzoska, E. Schreiber, F. Hu, X. You, H. Lam, S. Sovan, S.-H. Lim, K. Gordeeva, A. Ho Life Technologies Corp.

The advent of next generation sequencing (NGS) technology has made DNA sequencing of the entire human exome of individuals in search of genetic variants a viable proposition. As a consequence, the demand for confirmatory Sanger sequencing of PCR products containing genetic variants found in NGS experiments has grown. In confirmatory DNA sequencing, the bioinformatics time and effort required to design efficient and specific PCR-Sequencing (PCR-Seq) primers around the candidate variants is a bottleneck. Primer Designer is an online tool that makes the selection of PCR-Seq primers flanking candidate variants fast and easy. It gives access to a database of over 300,000 pre-designed PCR primer pairs covering 95% of the sequence of human coding exons and 75% of the sequence of non-coding exons. Wet-lab validation on a set of 192 Primer Designer assays showed a better than 98% success rate for amplification and sequencing reactions. We illustrate the Primer Designer user interface, which offers several search input options, including upload of an NGS .vcf file, and outputs order-ready PCR-Seq primer pairs designed to generate PCR products of 500 bp average length. Details of the data analysis using a web-based version of the Variant Reporter® software and results are discussed for a representative subset of 28 variants (SNPs, indels) found by NGS.
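As an illustration of the .vcf search-input option mentioned above, the sketch below pulls variant coordinates out of a plain-text VCF file, the kind of loci one would feed to a primer-search tool. This is generic VCF parsing, not Primer Designer’s actual implementation; the file name is hypothetical.

def vcf_loci(path):
    # Yield (chrom, pos, ref, alt) for each record in a plain-text VCF.
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):   # skip meta and header lines
                continue
            chrom, pos, _id, ref, alt = line.split("\t")[:5]
            yield chrom, int(pos), ref, alt

for chrom, pos, ref, alt in vcf_loci("variants.vcf"):   # hypothetical file
    print(f"design primers around {chrom}:{pos} ({ref}>{alt})")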

37 Accurate Genome Analysis of Single Cells

C. Korfhage1, E. Fisch1, E. Fricke1, S. Baedker1, U. Deutsch1, R. Ahmed1 and D. Loeffert1 QIAGEN Sciences Inc.

Whole genome analysis can be performed by next-generation sequencing (NGS) techniques, microarrays, or parallel real-time PCR addressing multiple genomic regions. These analyses require a minimum of 100-1000 ng of gDNA, equivalent to 16,000–160,000 cells. For analysis of genomic differences between individual cells, accurate replication of the single-cell genome is required. Here, we describe the reliability of single-cell whole genome amplification (WGA) and its application in NGS and real-time PCR. Single cells were obtained by picking cells under the microscope or by dilution. Amplification was performed for 8 hours at 30°C, generating up to 40 µg of DNA. WGA methods from other suppliers were applied to single-cell samples in parallel. For NGS, 2 µg of DNA was used for shearing, and library preparation was done using the TruSeq® DNA Sample Preparation Kit. The libraries were quantified and sequenced (paired-end) on a MiSeq® instrument. For real-time PCR, 100 pg (bacterial cells) or 10 ng (mammalian cells) of WGA DNA was analyzed using SYBR Green reagents or RT2 qPCR Arrays. A real-time PCR analysis of 267 loci across the entire genome was performed using 10 ng of WGA DNA for each primer assay. Results showed low and consistent CT values in real-time PCR for all loci, with no dropout of any marker, indicating successful amplification of DNA from all areas of the genome and high suitability for single-cell genomics. Whole genome sequencing of the Bacillus subtilis genome was performed on the MiSeq from 2 µg of non-amplified genomic DNA or DNA amplified from cells by REPLI-g Single Cell WGA. Comparable sequence coverage was observed for gDNA and REPLI-g Single Cell amplified DNA, and a comparison of non-amplified and REPLI-g amplified DNA revealed error rates in a similarly very low percentage range. The representation of regions of different GC content matched the representation of the genome.

38 Optimization of Library Construction Protocol to Sequence Large Fragment Libraries on PacBio

S. Shanker1, N.G. Panayotova1, X.H. Zhou1, G Yuan2, and D.A. Moraga1 1Genomic Division/ICBR University of Florida, 2Pacific Biosciences

The PacBio RS single-molecule long-read sequencing platform offers long sequence reads and robust coverage, and thus the potential to produce improved, finished-quality genome assemblies containing fewer gaps and longer contigs. However, in order to take full advantage of PacBio technology, a high-quality library with large fragment sizes is essential. The University of Florida is an early adopter of PacBio technology and has successfully supported a wide range of projects. Following our RS150 (v2.1) upgrade and the use of the P5/C3 chemistry, we began experiencing issues with sequencing runs using large fragment libraries. The problems were more pronounced for libraries that had been BluePippin size-selected. To overcome these obstacles, we optimized the library construction protocol in collaboration with PacBio. Here we present the challenges and improvements with long-fragment PacBio libraries.

39 Detecting and Quantifying Low Level Gene Variants in Sanger Sequencing Traces Using the ab1 Peak Reporter Tool

E. Schreiber1, S. Roy1

1Life Technologies, 200 Oyster Point Blvd., South San Francisco, CA 94080

Automated fluorescent dye-terminator DNA sequencing using capillary electrophoresis (a.k.a. CE or Sanger sequencing) has been instrumental in the detailed characterization of the human genome and is now widely used as the gold-standard method for verification of germline mutations. The primary information from the DNA sequencing process is the identification of the nucleotides and of possible sequence variants. A largely unexplored feature of fluorescent Sanger sequencing traces is the quantitative information embedded therein. With the growing need to quantify somatic mutations in tumor tissue, it is desirable to exploit the potential of the quantitative information obtained from sequencing traces. To this end, we have developed the ab1PeakReporter tool, which converts Sanger sequencing trace files into comma-separated value (.csv) files containing numerical peak characteristics that can be explored and analyzed using conventional spreadsheet software. The web-based tool can be accessed after logging in to a user account at http://apps.lifetechnologies.com/ab1peakreporter. The output file contains the peak height and quality values for each nucleotide and peak height ratios for all 4 bases at any given locus, allowing the detection and assessment of subtle changes at any given allele. We demonstrated the utility of this tool by analyzing samples with known amounts of spiked-in variant alleles (2.5%, 5%, 7.5%, 10%, 15%, and 25%) and show that rare alleles could be convincingly detected around the 5-7.5% level. In conclusion, enabling high-sensitivity detection of low-level variants using Sanger sequencing will be useful as an orthogonal verification method for next generation sequencing projects attempting to detect minor variants.
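As a sketch of the kind of per-locus arithmetic the .csv output enables, the snippet below estimates a minor-allele fraction from the four base peak heights at one trace position. The peak heights are invented, and this illustrates the idea rather than the tool’s own code.

def minor_allele_fraction(peaks):
    # peaks: dict mapping base -> peak height at one trace position.
    total = sum(peaks.values())
    major = max(peaks, key=peaks.get)
    minor = max((b for b in peaks if b != major), key=peaks.get)
    return peaks[minor] / total, major, minor

peaks = {"A": 1850, "C": 120, "G": 35, "T": 28}   # illustrative heights
frac, major, minor = minor_allele_fraction(peaks)
print(f"{minor} at {frac:.1%} against a {major} background")  # C at 5.9%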

40 Characterization of FKH-8 as a Critical Regulator of Dopaminergic Neuron Function in C. elegans

E.M. Tross, B.L. Nelms, Ph.D. Fisk University

Dopaminergic neurons are specialized cells in the brain that produce the neurotransmitter dopamine. This neurotransmitter is important for many functions in humans and other animals, including sleep, mood, learning, and movement. In order for dopaminergic neurons to function properly, they must commit to a cell fate and express precise levels of genes encoding signaling molecules and receptors for their response to stimuli. The choices neurons make to become a specific neuron type and express certain molecules are thought to be largely controlled by transcription factors, but only a handful of these factors are known. We have recently identified a likely role for a member of the conserved and developmentally important forkhead transcription factor family, fkh-8, in regulating the function of dopaminergic neurons. Our preliminary data show that deletion mutants of fkh-8, which is expressed in dopaminergic neurons, exhibit a swimming-induced paralysis phenotype indicative of altered levels of dopamine transport or production. To test our hypothesis that fkh-8 regulates critical aspects of DA neuron function, I am carrying out pharmacological, genetic, and phenotypic analyses. This is significant because it may be possible to use fkh-8 to help define a program for dopamine neuron development, which could aid future studies directing stem cells to a dopaminergic fate and potentially help prevent or reverse diseases like Parkinson’s.

41 Comparison of Illumina and Ion Torrent RNA-Sequencing and Microarray-based approaches for Profiling the Transcriptome

Yuriy Alekseyev1, Rebecca Kusko2, Ashley LeClerc1, Samantha Swahn1, Marc Lenburg2, Avrum Spira2 1Boston University School of Medicine, Microarray Resource, 72 E. Concord E605, Boston, MA, 02118, USA 2 Boston University School of Medicine, Computational Biomedicine, 72 E. Concord, Boston, MA, 02118, United States

Currently, most RNA-seq experiments are performed on the Illumina platform, but other companies are competing for market share. In this highly competitive environment, cross-platform comparisons and/or validations are becoming increasingly critical. We present the results of several comparisons in which the same samples were studied using Illumina and Ion Torrent RNA-seq and different microarray-based approaches. To prepare the libraries, the RNA samples were processed using the Illumina TruSeq protocol (a protocol capturing polyadenylated RNA) and sequenced on an Illumina HiSeq 2500, producing 100x100-nt paired-end reads. The same samples were processed using the Ion Torrent Total RNA-Seq V2 protocol, which captures non-coding RNA and preserves strand specificity. These libraries were sequenced on the Ion Proton using the P1 chip and produced reads of up to 200 nt. The data obtained with both platforms were compared for quality, alignment statistics, error rates, evenness and continuity of coverage, RNA biotype representation, and accuracy of expression profiling. Additionally, a detailed comparison of technical aspects including input amount, throughput, experimental time, and reagent costs is presented. Lastly, the same samples were interrogated using Agilent V2 Human Whole Genome arrays, Affymetrix Gene ST arrays (1.0 and 2.0), and the newly commercialized Affymetrix Human Transcriptome Arrays. There was a significant correlation between the Illumina and Ion Torrent RNA-Seq gene expression data and the microarray data generated from the same samples; however, RNA-Seq detected additional transcripts whose expression was either not interrogated or not detected by microarrays.

42 Deep Sequencing of a Recent Bacterium Found in the Human Oral Cavity

F. Ghadiri1, J. Ghadiri1, S. Garcia1, A. Caldwell1, I. Vinnichenco1, C. Ouverney1

1San Jose State University, Department of Biological Sciences, San Jose, Ca, 95192

The human oral cavity harbors 500-700 species of bacteria, depending on many factors, including diet, hormone levels, health status, and genetics. We know very little about most of these bacteria, and it has been reported that many of them have yet to be cultured. An uncultured bacterium is one for which no isolated pure culture exists; one example is Candidatus Saccharibacteria, or TM7, which has been associated with the disease periodontitis, an advanced form of gingivitis that, if left untreated, leads to gum loss and tooth decay. Among the TM7 bacteria, a subgroup called TM7a has been identified not only in the human oral cavity, but also on human skin and in a range of environmental sites including soil, activated wastewater (sludge), and the termite and rumen gut. We aim to better understand the prevalence and diversity of TM7a in the human oral cavity. Human subgingival plaque samples were taken from six healthy adults, covering healthy and mild gingivitis sites. These samples were analyzed for the presence of total TM7 as well as TM7a using Illumina next generation sequencing based on the 16S ribosomal DNA (rDNA) gene sequence. A total of 134,888 TM7 sequences were obtained, of which 13,149 (10% of all TM7 sequences) were TM7a. This is the largest number of TM7 rDNA sequences reported from a single study and the only exclusively oral TM7 project carried out on the Illumina platform. New phylotypes were discovered based on 97% similarity; an association of TM7a with human disease was not determined.

43 Precise Quantification of Next Generation Sequencing Ion Torrent™ and Illumina Libraries using the QuantStudio™ 3D Digital PCR Platform

L. Degoricija1, P. Schweitzer2, A. Harris1, D. Mandelman1, S. Jackson1 and F. Cifuentes1

1Life Technologies, South San Francisco, CA, 2Cornell University, Ithaca, NY

Current methods for quantifying NGS libraries do not provide precise measurements of the functionally relevant molecules. Accurate absolute quantification of NGS libraries is critical for obtaining optimal throughput from each sequencing run on the Ion Torrent PGM™ and Proton™ platforms and the Illumina HiSeq and MiSeq® platforms; thus a need exists to precisely measure library concentration prior to clonal amplification for these sequencing systems. The QuantStudio™ 3D Digital PCR System (QS3D) offers a highly precise method for quantifying libraries prior to sequencing. Compared to alternative methods, Life Technologies’ chip-based dPCR approach is particularly attractive for accurate library quantification because of its simple workflow and because it eliminates the need for the standard curves required by traditional qPCR methods. TaqMan® gene expression assays were specifically designed to span the P1 and A adapters for Ion Torrent™ libraries and the P5 and P7 adapters for Illumina libraries. A reaction mix was formulated by mixing diluted library with the QS3D Master Mix and TaqMan® assay and then loaded onto a QS3D chip. The chip was then sealed and the mixture amplified on a thermal cycler. Lastly, the data were visualized using the QuantStudio™ 3D AnalysisSuite Cloud Software, which reports the target concentration. For the Ion Torrent™ libraries, the concentration obtained from the digital platform correlated with the percentage of (pre-enriched) template beads, which fell within a tight 10-14% range as determined by flow cytometry. The digital data obtained from the Illumina libraries showed a tight correlation with cluster density on an Illumina platform. We therefore present a simple and accurate workflow for quantifying NGS libraries using the QuantStudio™ 3D Digital PCR Platform.
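Digital PCR derives absolute concentration from the fraction of positive partitions via Poisson statistics, which is what removes the need for a standard curve. The sketch below shows that generic calculation; the partition volume and counts are assumptions for illustration, not QS3D specifications.

import math

def dpcr_copies_per_ul(n_positive, n_total, partition_nl, dilution=1.0):
    # Poisson correction: mean copies per partition from fraction positive.
    lam = -math.log(1.0 - n_positive / n_total)
    copies_per_nl = lam / partition_nl
    return copies_per_nl * 1000.0 * dilution   # 1 uL = 1000 nL

# 9,000 of 20,000 partitions positive, assumed ~0.8 nL partitions,
# library diluted 1e6-fold before loading:
print(f"{dpcr_copies_per_ul(9000, 20000, 0.8, dilution=1e6):.3g} copies/uL")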

44 SMRT® Sequencing Solutions for Large Genomes and Transcriptomes

J. Gu1, J. Chin1, P. Peluso1, D. Rank1, K. Kim1, J. Landolin1, S. Koren2, A.M. Phillippy2, E. Tseng1, S. Wang1, P. Baybayan1

1Pacific Biosciences, Menlo Park, CA, 2National Biodefense Analysis and Countermeasures Center; Center for Bioinformatics and Computational Biology; University of Maryland

Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in large-genome complexity, such as long, highly repetitive, low-complexity regions and duplication events, and for differentiating between transcript isoforms that are difficult to resolve with short-read technologies. We present solutions available for both reference genome improvement (>100 Mb) and transcriptome research to best leverage long reads that have exceeded 20 kb in length. Benefits for these applications are further realized with consistent use of size selection of input samples using the BluePippin™ device from Sage Science. Highlights from our genome improvement projects using the latest P5-C3 chemistry on model organisms, with contig N50 exceeding 6 Mb, longest contig exceeding 12.5 Mb, and average base quality of QV50, will be shared. Additionally, the value of long, intact reads as a no-assembly approach to investigating transcript isoforms using our Iso-Seq protocol will be presented.

45 Strand-Specific Transcriptome Sequencing for Challenging Samples

Craig Betts, Rachel Fish, Suvarna Gandlur, Andrew Farmer Clontech Laboratories

Next Generation Sequencing (NGS) has empowered a deeper understanding of biology by enabling RNA expression analysis over the entire transcriptome with high sensitivity and dynamic range. A powerful application within this field is stranded RNA-Seq, which is necessary to distinguish closely related genes and non-coding RNAs (e.g., lincRNA) or to define genes in poorly annotated, coding-rich genomes, such as many bacteria. Commonly used methods to generate strand-specific RNA-Seq libraries require several rounds of enzymatic treatments and cleanup steps, making them time-intensive and insensitive and making it challenging to process several samples simultaneously. Here we present a novel, single-tube method, based on Clontech’s patented SMART technology, which generates strand-specific RNA-Seq libraries from minute sample quantities in under four hours. This approach eliminates the multitude of labor-intensive enzymatic steps required by other stranded RNA-Seq methods while maintaining the sensitivity and reproducibility characteristic of SMART. We have successfully tested our technology with input levels from 100 pg to 100 ng of poly(A)-selected RNA, as well as ribosomally depleted FFPE RNA, with outstanding reproducibility within and across input levels. Spike-in of ERCC controls showed linear detection over six orders of magnitude and strand specificity of over 99%. The increased sensitivity achieved with SMART requires a more sensitive rRNA removal method, as most methods typically require microgram amounts of total RNA. With this in mind, we have developed an rRNA depletion method that effectively removes 28S, 18S, 5.8S, 5S, and 12S transcripts from mammalian samples down to 10 ng of input. The remaining RNA can easily be used in downstream sequencing applications, with fewer than 5% of reads mapping back to rRNA. With these tools, researchers can more confidently apply NGS to challenging samples.

46 An Improved cDNA Library Generation Protocol for Transcriptome Analysis from a Single Cell

Andrew Farmer, Rachel Fish, Sally Zhang, Magnolia Bostick, Cynthia Chang, Suvarna Gandlur Clontech Laboratories, Inc.

As Next Generation Sequencing (NGS) technologies and NGS-based transcriptome profiling mature, they are increasingly being used for more sensitive applications in which sample availability is limited. The ability to analyze the transcriptome of a single cell consistently and meaningfully has only recently been realized. SMART™ technology is a powerful method for cDNA preparation that enables library preparation from very small amounts of starting material. Indeed, the SMARTer® Ultra™ Low RNA method allows researchers to readily obtain high-quality data from a single cell or 10 pg of total RNA – the approximate amount of total RNA in a single cell. Recent studies have used this method to investigate heterogeneity among individual cells based on RNA expression patterns (Deng, Science 343, 2014 and Shalek, Nature 498, 2013). A new SMARTer protocol has been developed that is simpler and faster while improving the quality and yield of the cDNA produced. The full-length cDNA produced with this method may be used as a template for library preparation for the Ion Torrent™ and Illumina next generation sequencing platforms.

Sequencing results for libraries created from single cells or from equivalent amounts of total RNA demonstrate that approximately 90% of the reads map to RefSeq, less than 0.5% of the total reads map to rRNA, and the average transcript coverage is uniform. Improvements in the protocol following first strand synthesis and during cDNA amplification show higher sensitivity with an increase in gene counts and improved representation from GC-rich genes. These data indicate that the improved SMART cDNA protocol is an ideal choice for single cell transcriptome analysis.

47 Use of Next Generation Sequencing (NGS) Platforms for CE Sequencing on Thousands of Samples

H. Koshinsky1,3, M. Shin1, P. Dier1, J. Kyle1, R. O’Callahan2, M. Helmrick3, H. Phan3, D.L. Hirchberg4, V.Y. Fofanov2, J. Curry1 1Eureka Genomics – California, 2Eureka Genomics – Texas, 3Investigen, 4Columbia University

Sequence information for specific targeted regions of interest has traditionally been generated by capillary electrophoresis (CE)-based Sanger sequencing, despite its limitations in throughput, scalability, speed, labor intensiveness, and cost. Next-Generation Sequencing (NGS) enables inexpensive (per base), massive, rapid, parallel sequencing of entire eukaryotic genomes. However, these attributes do not translate to single targets, where the cost of NGS lies in the sample preparation more than in the per-base cost of data generation. We have developed a method that inexpensively and in high throughput creates libraries of single (or up to 1,000-plex) targets over thousands of samples for NGS. This method, HELSR (hybridization extension ligation sequencing reaction), allows sequencing resources to be focused on single or multiple informative loci. Custom HELSR probes can be designed to generate overlapping amplicons providing complete coverage of entire targeted regions, single or multiple. With the use of Eureka Genomics’ EasySeq indices, all the desired targets from any sample are assigned a unique sample ID sequence (index); this can occur simultaneously in thousands of samples. The indexed samples are pooled into a single sequencing library and NGS data are generated. Based on the sample ID index, the reads are sorted by sample and provided to the biologist. This method (with Eureka Genomics’ EasySeq indices) has been used in a Mycobacterium tuberculosis (MTB) multi-drug resistance screening assay (Investigen) to assess 70 target regions across 16 genes containing 317 known mutations. The method is being implemented in Eureka Genomics’ assays for human colorectal cancer (261 targets, 7 genes spanning 21,337 bp of coding regions) and prostate cancer (129 loci), and, in collaboration with Columbia University, for human respiratory viruses (98 targets, 37 viruses).
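The index-based read sorting described above can be sketched as simple binning on a leading sample-ID sequence. The index length, its position in the read, and the sample map below are assumptions for illustration; EasySeq’s actual index design is not described here.

from collections import defaultdict

INDEX_LEN = 8   # assumed index length at the start of each read

def demultiplex(reads, sample_by_index):
    # Bin reads by their leading index; unmatched reads go to 'undetermined'.
    bins = defaultdict(list)
    for read in reads:
        idx, insert = read[:INDEX_LEN], read[INDEX_LEN:]
        bins[sample_by_index.get(idx, "undetermined")].append(insert)
    return bins

samples = {"ACGTACGT": "sample_01", "TGCATGCA": "sample_02"}
reads = ["ACGTACGTTTGACCA", "TGCATGCAGGTCAAT", "NNNNNNNNACGTACG"]
for sample, binned in demultiplex(reads, samples).items():
    print(sample, len(binned))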

48 Enabling High-Throughput Discovery of the RNA Transcription Landscape Using a Directional RNA Workflow and a Combinatorial Multiplexing Approach

Fiona J. Stewart, Daniela B. Munafo, Pingfang Liu, Christine J. Sumner, Bradley W. Langhorst, Eileen T. Dimalanta, Theodore B. Davis New England Biolabs, Inc.

Massively parallel next generation cDNA sequencing (RNA-Seq) has enabled many advances in the characterization and quantification of transcriptomes. In addition to enabling the detection of non-canonical transcription start and termination sites, alternative splice isoforms, transcript mutations, and edits can be identified. The ability to obtain information on the originating strand is also useful for many reasons, including identification of antisense transcripts, determination of the transcribed strand of noncoding RNAs, and determination of expression levels of coding or noncoding overlapping transcripts. Overall, the ability to determine the originating strand can substantially enhance the value of an RNA-seq experiment. However, standard methods for sequencing RNA do not provide information on the DNA strand from which the RNA was transcribed, and methods for strand-specific library preparation can be inefficient and time-consuming. Our objective was to address this challenge by developing a streamlined, low-input method for directional RNA sequencing that retains strand-orientation information with high fidelity while maintaining even coverage of transcript expression. The method is based on second-strand labeling and excision after adaptor ligation, allowing differential tagging of the first-strand cDNA ends. We have also extended the utility of this method by developing additional adaptor and primer reagents, including a dual-barcoding approach that allows multiplexing of up to 96 samples. As a result, we have enabled highly multiplexed, strand-specific mRNA sequencing, as well as whole transcriptome sequencing (total RNA-seq) from ribosomal-depleted samples, enabling the discovery of a much broader picture of expression dynamics, including antisense transcripts. This work presents a streamlined, fast solution for complete RNA sequencing, with high-quality data that illustrate the complexity and diversity of the RNA transcription landscape.

49 DNA Sequence and Analysis of the O-antigen Gene Clusters of Escherichia coli Serogroups O62, O68, O131, O140, O142, and O163 and Serogroup-Specific PCR Assays

Y. Liu1, P. Fratamico1, C. DebRoy2, X. Yan1, D. S. Needleman3, R. Li4, W. Wang5, L. Losada5, L. Brinkac5, D. Radune5, M. Toro6, and J. Meng6 1Molecular Characterization of Foodborne Pathogens Research Unit, Eastern Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, Wyndmoor, PA, 2E. coli Reference Center, Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA, 3Genetic Analysis Core Facility, Eastern Regional Research Center, Agricultural Research Service, U.S. Department of Agriculture, Wyndmoor, PA, 4Bovine Functional Genomics Laboratory, Animal and Natural Resources Institute, United States Department of Agriculture, Beltsville, MD,

5J. Craig Venter Institute, Rockville, MD, 6Department of Nutrition & Food Science, University of Maryland, College Park, MD

The DNA sequences of the O-antigen gene clusters of Escherichia coli serogroups O62, O68, O131, O140, O142, and O163 were determined. Nine to twelve open reading frames (ORFs) were identified, encoding the genes required for O-antigen sugar biosynthesis, transfer, and processing. Primers based on the wzx (O-antigen flippase) and wzy (O-antigen polymerase) genes within the O-antigen gene clusters were designed and used in PCR assays to identify each serogroup. Of interest, an NCBI BLAST alignment showed that the O-antigen gene cluster of O62 shares 99% sequence identity with that of O68, with the exception of an insertion sequence (IS) element between the rmlA and rmlC genes of O62. The IS insertion in O62 suggests that O68 may be the ancestor of O62. Specificity testing using strains belonging to each of the serogroups isolated from various sources, representative standard strains of 174 E. coli O serogroups, and 16 non-E. coli bacteria revealed that the PCR assays were specific for each serogroup. The PCR assay targeting O62 also gave positive results with strains serotyped as E. coli O68, confirming that the O-antigen gene cluster sequences of these two serogroups are very similar. The PCR assays developed in this study can be used for the detection and identification of E. coli O62, O68, O131, O140, O142, and O163 strains isolated from different sources.

50 Changing of the Guard: Migrating 16S and ITS Diversity Profiling from 454 Onto Alternate NGS Platforms

K. McGrath, R. McNally, M. Johnson, N. Kasinadhuni, M. Tinning The Australian Genome Research Facility

The Australian Genome Research Facility (AGRF) currently operates a microbial diversity profiling service on the 454 GS-FLX platform, targeting various regions of the 16S and ITS microbial genes. Roche’s recent announcement that support for the platform will be discontinued prompted us to investigate how this service can be transitioned to alternate NGS platforms. The objective of this study was to evaluate the accuracy and reproducibility of each platform/target combination for both artificial control samples (known gDNA pools) and natural microbial community samples (soils). Replicate PCR amplicons were generated for several bacterial and fungal regions (16S and ITS) and sequenced on the GS-FLX, MiSeq, and Ion Torrent PGM platforms. The resulting sequences were identified at different taxonomic levels (phylum to species), and the results for each sample, platform, and target were compiled. This comparison revealed distinct and reproducible differences between the same samples run on different platforms. While no platform was able to perfectly describe the proportions of the artificial pooled sample, each showed a high level of reproducibility (<1% variance). This suggests that PCR bias during amplification gives rise to consistent biases in the resulting microbial distributions; it confirms the method as a useful tool for comparing samples and monitoring changes, but demonstrates its weakness for absolute quantitation. Between platforms, the long reads of the GS-FLX gave the greatest resolution for microbial identification, followed by the MiSeq. Despite the shorter amplicon size, the Ion Torrent PGM was still able to provide high-resolution profiles of the microbial samples. These results show that, currently, the 454 still provides higher-resolution data sets for microbial diversity than the other available platforms. However, as read lengths improve and error rates decrease, these alternate platforms should provide a suitable transition pathway for researchers interested in profiling microbial diversity.

51 A Novel Analytical Pipeline for de novo Haplotype Phasing and Amplicon Analysis using SMRT® Sequencing Technology

R. Lleras1, B. Bowman1, S. Ranade1, J. Harting1 1Pacific Biosciences, Menlo Park, CA

While the identification of individual SNPs has been readily available for some time, the ability to accurately phase SNPs and structural variation across a haplotype has remained a challenge. With individual reads of up to 30 kb in length, SMRT® Sequencing technology allows the identification of combinations of mutations such as microdeletions, insertions, and substitutions without any predetermined reference sequence. Long amplicon analysis is a novel protocol that identifies and reports the abundance of differing clusters of sequencing reads within a single library. Graphs generated via hierarchical clustering of individual sequencing reads are used to build Markov models representing the consensus sequence of each cluster found to be significantly different. Long amplicon analysis is capable of differentiating between underlying sequences that are 99.9% similar, such as haplotypes and pseudogenes. This protocol allowed the identification of structural variation in the MUC5AC gene sequence, despite the presence of a gap in the current genome assembly. Long amplicon analysis thus allows the elucidation of complex regions missed by other sequencing technologies, which may contribute to the diagnosis and understanding of otherwise mysterious diseases.
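The published pipeline clusters full-length reads hierarchically and builds a Markov-model consensus per cluster; as a toy illustration of the clustering idea only, the sketch below greedily groups equal-length sequences by a pairwise-identity threshold.

def identity(a, b):
    # Fraction of matching positions (toy metric for equal-length reads).
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold=0.999):
    # Assign each read to the first cluster whose seed it matches.
    clusters = []
    for s in seqs:
        for cl in clusters:
            if identity(s, cl[0]) >= threshold:
                cl.append(s)
                break
        else:
            clusters.append([s])
    return clusters

reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
print([len(c) for c in greedy_cluster(reads, threshold=0.9)])  # [3, 1]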

52 Automated Assessment of Next Generation Sequencing Library Preparation Workflow for Quality and Quantity Using the Agilent 2200 TapeStation System

Melissa Huang Liu1, Solange Borg1, Ruediger Salowsky2, Arunkumar Padmanaban2, Deepak Saligrama2, Donna McDade Walker2, Adam Inche2 and Jim Elliott2

1Agilent Technologies, La Jolla, CA, 2Agilent Technologies

Quality and quantity assessment during Next Generation Sequencing (NGS) library preparation is critical to ensuring successful sequencing results. The Agilent Genomic DNA ScreenTape and the new D1000 ScreenTape assays have been developed to provide a reproducible QC method for analyzing samples in this library preparation workflow. The Genomic DNA ScreenTape assay, combined with the Agilent 2200 TapeStation system, automates the assessment of the starting genomic DNA with sample volumes as low as 1 µl, producing digital results in less than 2 minutes per sample. The D1000 ScreenTape assay provides a QC platform for analyzing library construction throughout the workflow by automating electrophoretic separation, sizing, and quantification. In addition, the ability to overlay and compare electropherograms within the analysis software enables discrimination of sample quality during the workflow. We present data showing application of the electrophoretic system for quantification and quality assessment across the NGS library preparation workflow, from starting-material QC to final library quality and quantity determination, for the Agilent SureSelect enrichment protocol.

53 Automated RNA Sample Quality Control

Melissa Huang Liu1, Solange Borg1, Ruediger Salowsky2, Adam Inche2, Matthew Connelly2, Dierdre Boland2, Arunkumar Padmanaban2 and Eva Graf2 1Agilent Technologies, La Jolla, CA, 2Agilent Technologies

The Agilent 2200 TapeStation system provides a flexible solution for automated analysis of up to 96 samples using pre-packaged reagents and minimal manual handling. Here we present a new assay – the RNA ScreenTape assay – enabling robust quantification and quality analysis of total RNA samples from both eukaryotic and prokaryotic sources, from 100 pg/µl to 500 ng/µl. The new assay additionally benefits from the ability to separate contaminating genomic DNA, allowing more accurate purity assessment of sample material. This study compares the performance of the new RNA assays against the market-leading Agilent 2100 Bioanalyzer and the NanoDrop for RNA quality and quantity determination. We conclude that the new RNA ScreenTape and High Sensitivity assays provide data that correlate well with these existing technologies, while exceeding them in flexibility and automation.

54 Rapid DNA-Seq to Achieve High Coverage Libraries from 1 ng – 1 µg in 2 Hours

J. Risinger1, J. Dickman1, M.M. Toloue1

1Bioo Scientific

As next-generation sequencing technology progresses, the need for highly efficient, unbiased library generation from a wide variety of inputs has grown. High-throughput sequencing operations require library preparation protocols that optimize many parameters, such as enzyme efficiency, low-input flexibility, and time efficiency, without sacrificing the quality of downstream bioinformatics analysis. We have recently developed a highly efficient adapter ligation procedure that significantly increases the number of adapters bound to DNA inserts. This should theoretically improve complexity, diversity, and alignment, and reduce the number of PCR duplicates in a library. Results: To test this, the enhanced adapter ligation step was used in our library prep protocol to prepare bacterial and human libraries alongside currently established DNA-seq protocols. Final libraries were then analyzed on both Illumina MiSeq and HiSeq platforms, followed by coverage and library complexity analysis of the sequence data. Depth of coverage as a function of input material was shown to outperform regular ligation, especially in the 1-10 ng DNA input range. Normalized coverage as a function of GC content revealed no intrinsic bias when compared to older ligation and amplification methods. The enhanced adapter ligation and several other modifications in this new protocol allow users, for the first time, to build complete genomic DNA libraries in as little as 2 hours from low-nanogram input material.
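The "normalized coverage as a function of GC content" analysis mentioned above amounts to binning genomic windows by GC fraction and dividing each bin's mean coverage by the genome-wide mean (an unbiased library stays near 1.0 across bins). The sketch below shows that generic calculation; the window data are invented for illustration.

from collections import defaultdict

def gc_bias_profile(windows):
    # windows: (gc_fraction, mean_coverage) pairs; bins GC to one decimal.
    genome_mean = sum(cov for _, cov in windows) / len(windows)
    bins = defaultdict(list)
    for gc, cov in windows:
        bins[round(gc, 1)].append(cov)
    return {b: (sum(v) / len(v)) / genome_mean for b, v in sorted(bins.items())}

windows = [(0.35, 31.0), (0.38, 29.5), (0.52, 30.2), (0.55, 28.8), (0.71, 25.0)]
for gc_bin, norm_cov in gc_bias_profile(windows).items():
    print(f"GC {gc_bin:.1f}: normalized coverage {norm_cov:.2f}")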

55 Sequencing Single Human and Bacterial Cells at Low Coverage for Aneuploidy, CNV, and Genotyping Applications

E. Kamberov, T. Tesmer, S. Yerramilli, M. Carey, J. Langmore, M. Carroll Rubicon Genomics, Inc.

Single-cell analysis using PCR and arrays is well established for determining aneuploidy and CNV and for genotyping single cells. NGS of single cells presents many barriers to complete and reproducible analysis, including incomplete genome coverage and irreproducible results. However, a surprising number of applications can be successfully executed using partial coverage of the genome, as long as the coverage is reproducible. These technical applications include (a) aneuploidy and copy number variation determination, (b) detection of SNPs or other single-nucleotide variations in a fraction of the genome, and (c) identification of complex populations of cells. They are important for commercial applications such as pre-implantation genetic screening and diagnosis (PGS and PGD), prenatal diagnostics from single or small numbers of fetal cells in the maternal circulation, cancer diagnostics from circulating tumor cells, and identification of infectious disease. We have sequenced human and bacterial cells using a version of the Rubicon PicoPLEX technology that is being developed as a single-cell NGS library kit. This PicoPLEX-scD single-cell NGS prep is as simple as the PicoPLEX WGA kit, which is currently used for microarray and PCR studies and diagnostics from single cells. The PicoPLEX-scD prototype kits were used for the technical applications above. Sequencing quality, genome coverage, and reproducibility were measured in flow-sorted and microdissected human cells. To verify that MiSeq NGS could be used for PGS/PGD applications, as many as 24 single human cells were multiplexed on a single lane. Megabase losses or gains of copy number were reproducibly measured with as few as 200,000 clusters per sample. Partial genotyping and variant identification from single cells were also demonstrated. Finally, single cells were studied in mixtures of other genomes.
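Low-coverage copy-number calling of the kind described reduces to counting reads in fixed genomic bins and scaling so that the sample median corresponds to two copies. The sketch below is a generic illustration with invented bin counts, not the PicoPLEX-scD analysis itself.

import statistics

def copy_number(bin_counts, ploidy=2):
    # Scale per-bin read counts to integer copy-number estimates.
    median = statistics.median(bin_counts)
    return [round(ploidy * c / median) for c in bin_counts]

counts = [980, 1020, 1505, 1490, 1010, 495, 990]   # e.g. reads per 1 Mb bin
print(copy_number(counts))   # -> [2, 2, 3, 3, 2, 1, 2]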

56 Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India

Malali Gowda*, Chandana S, Heikham Russiachand, Pradeep H, Shilpa S, Ashwini M, Sahana S, Jayanth B, Goutham Atla, Smita Jain and Nandini Arunkumar Centre for Cellular and Molecular Platforms (C-CAMP), National Centre for Biological Sciences, GKVK Post, Bangalore – 560 065

Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent life-sciences technological revolution that allows scientists to decode genomes and transcriptomes faster and at lower cost. Genomics-based studies have progressed at a relatively slow pace in India owing to the lack of genomics experts, trained personnel, and dedicated service providers. With NGS there is great potential to study India’s national diversity of all kinds. We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics services to scientists, to train researchers, and to work on national and international genomic projects. We have a HiSeq1000 from Illumina and a GS-FLX Plus from Roche 454. The long reads from the GS-FLX Plus and the high sequence depth from the HiSeq1000 make an ideal hybrid approach for de novo sequencing and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms, comprising more than 388 genomes and 615 transcriptomes from prokaryotes and eukaryotes (fungi, plants, and animals). In addition, we have optimized other applications such as small RNA (miRNA, siRNA, etc.), long mate-pair sequencing (2-20 kb), coding sequences (exome), methylome (ChIP-Seq), restriction mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes), and target amplicons. Translating DNA sequence data from an NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to dissect NGS data, covering genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP), comparative analysis, and more. Our sequencing and bioinformatics services have been used by more than 45 organizations (academia and industry) within India and outside, resulting in several publications in peer-reviewed journals, and several genomic/transcriptomic datasets are available at NCBI.

57 Improved Manager of Next Generation Sequencing Orders – MANGO

C. A. Fournier, J. K. Georgijevic, R. Schlapbach. Functional Genomics Center, ETHZ/UZH

The Functional Genomics Center Zurich (FGCZ) is a joint state-of-the-art research and training facility of the ETH Zurich and the University of Zurich. With the latest technologies and expert support in genomics, transcriptomics, and bioinformatics, the FGCZ carries out research projects and technology development in collaboration with the Zurich life science research community. The FGCZ offers services for different applications on the Illumina HiSeq2500, Illumina MiSeq, Ion Torrent, Ion Proton and PacBio RS. At the FGCZ we handle hundreds of NGS projects a year. We conceptualized, developed and implemented MANGO to help manage, track, monitor and document our varied and diverse NGS service orders. MANGO works on multiple levels. First, it is a web-accessible sample tracking system: it can be accessed, and sample data added, in real time from a computer, an Android tablet or an iPad. Second, it manages the multiplexing of sequencing runs, detecting sub-optimal index combinations among various popular commercial kits and custom indices. Third, MANGO creates well-formatted sample sheets for the various sequencers available at the FGCZ. Fourth, it accepts data in .csv format from instruments used for QC during library preparation. Lastly, it is flexible enough to adapt to ever-changing NGS workflows and instrumentation. In the poster, we will present the new features that have been implemented in MANGO.
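
One ingredient of such an index-compatibility check can be sketched as follows; the distance threshold and index sequences are hypothetical, and real checks (e.g. chemistry-specific color balance) involve more rules than shown here:

from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length indices."""
    return sum(x != y for x, y in zip(a, b))

def flag_index_collisions(indices, min_distance=3):
    """Return index pairs closer than min_distance (misassignment risk)."""
    return [(a, b) for a, b in combinations(indices, 2)
            if hamming(a, b) < min_distance]

pool = ["ATCACG", "ATCACT", "CGATGT", "TTAGGC"]
print(flag_index_collisions(pool))  # -> [('ATCACG', 'ATCACT')]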


FLOW CYTOMETRY

58 AcGFP and mCherry Calibration Beads for Flow Cytometry

M. Haugwitz1, M. Vierra1, T. Garachtchenko1, V. Gupta1, I. Schmid2, T. Hawley3, R. Cimbro4, A. Farmer1. 1Clontech Laboratories Inc., 2University of California Los Angeles, 3George Washington University, 4Johns Hopkins University

Analysis of cells via flow cytometry requires calibration of the instrument with the fluorophore used to label the cells of interest. Commercially available calibration beads are an indispensable tool when preparing flow cytometers for experiments using fluorescent dyes (e.g. FITC); however, due to the different spectral characteristics of fluorescent proteins versus fluorescent dyes, existing fluorescent dye beads are not suitable for instrument calibration if the cells being analyzed express fluorescent proteins. We therefore developed calibration beads labeled with either the red fluorescent protein mCherry or the green fluorescent protein AcGFP, which has spectral properties almost identical to EGFP.

Beads with a very low size deviation (CV 2.5–3%) were used to create distinct fluorescent bead populations by covalently linking specific amounts of the respective fluorescent proteins. The low size deviation of the beads, in conjunction with a very controlled labeling method, allowed us to create six bead populations with distinct fluorescent intensities for each of the two fluorescent proteins.

Here, we show that the fluorescent protein Flow Cytometer Calibration Beads are easy to use and perform equally well on a variety of flow cytometer platforms. We also present data showing that the mean fluorescence intensity of the beads and the calculated number of fluorescent proteins on each respective bead population are distinct from each other and linearly correlated. We also provide supporting data showing the signal stability of the calibration beads under different buffer and fixative conditions, as well as at different flow rates. The data show that these calibration beads are a very useful tool, enabling fast and reliable calibration of flow cytometers prior to analysis of cells expressing the corresponding fluorescent protein.
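
A sketch of how such a calibration curve can be used in practice, with invented bead values (not data from this poster): fit measured MFI against the assigned number of fluorescent proteins per bead, then convert a sample's MFI into an estimated molecule count:

import numpy as np

# Hypothetical six-bead calibration set: assigned AcGFP molecules per bead
# and the mean fluorescence intensity (MFI) measured on one cytometer.
molecules = np.array([5e3, 1.5e4, 5e4, 1.5e5, 5e5, 1.5e6])
mfi = np.array([210, 640, 2.1e3, 6.4e3, 2.1e4, 6.3e4])

# Fit in log-log space, where bead calibration data are close to linear.
slope, intercept = np.polyfit(np.log10(mfi), np.log10(molecules), 1)

def mfi_to_molecules(sample_mfi):
    """Convert a sample's MFI to an estimated fluorescent-protein count."""
    return 10 ** (slope * np.log10(sample_mfi) + intercept)

print(round(mfi_to_molecules(4.2e3)))  # roughly 1e5 molecules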

GENE ARRAYS

59 Effects of RNA Degradation on Performance of the Illumina Whole-Genome DASL HT Assay

D. Goerlitz1, X. Li2, MD Islam1, S.W. Byers1, and R. Riggins1 1Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington DC, USA, 2Department of Biostatistics, Bioinformatics and Biomathematics, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington DC, USA

RNA integrity is one of the most important factors influencing the generation of accurate quantitative results from gene expression experiments. Degraded RNA, such as that typically derived from formalin-fixed paraffin-embedded (FFPE) tissue, can hinder amplification and labeling of cDNA. In addition to the direct hybridization whole-genome expression method, Illumina offers the whole-genome cDNA-mediated annealing, selection, extension, and ligation (DASL) assay for expression profiling of partially degraded RNA. Here we purposely degraded RNA to demonstrate the effects of RNA degradation on the performance of the DASL versus the direct hybridization method. MCF7 cells were exposed to the ERRbeta/ERRgamma agonist DY131 (5 µM or 10 µM) or DMSO control. RNA was isolated, and fully intact RNA (495 ng, RIN = 10) was processed using the direct hybridization method and hybridized to a HumanHT-12 Expression BeadChip in duplicate. Treatment samples (5 µM or 10 µM DY131) were compared to the control (DMSO) sample and analyzed for differentially expressed genes (DEGs). The same RNAs (495 ng, RIN = 10) were processed in parallel using the DASL method and analyzed as above. The same RNA samples were then incubated for 5 minutes at 90°C to degrade them to an average RIN of 6.1, processed with the DASL method, and analyzed as above. Using the direct hybridization and DASL methods with fully intact RNA, a total of 138 and 165 genes, respectively, were found to be differentially expressed after 5 µM or 10 µM DY131 incubation versus control (DMSO) (p-value < 0.05, 2-fold change cutoff, one-way ANOVA); however, only 34 of these genes overlapped. Analysis of degraded RNA (DASL) revealed 165 DEGs, 98 of which overlapped with DEGs found in fully intact RNA (DASL). Overall, we found that although the DASL method increases sensitivity, it does not fully reproduce the expression patterns of the direct hybridization method for either fully intact or partially degraded RNA.
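
The comparison logic above reduces to set operations on filtered gene lists; a minimal sketch, with hypothetical column names and invented values, assuming per-gene p-values and fold changes from each platform:

import pandas as pd

def deg_set(df, p_cut=0.05, fc_cut=2.0):
    """Genes passing p-value and absolute fold-change cutoffs."""
    hits = df[(df["p_value"] < p_cut) & (df["fold_change"].abs() >= fc_cut)]
    return set(hits["gene"])

# Toy results tables for the two platforms (values invented).
direct = pd.DataFrame({"gene": ["A", "B", "C"],
                       "p_value": [0.01, 0.20, 0.03],
                       "fold_change": [2.5, 3.0, -2.1]})
dasl = pd.DataFrame({"gene": ["A", "B", "C"],
                     "p_value": [0.02, 0.04, 0.30],
                     "fold_change": [2.2, 2.8, -1.1]})

overlap = deg_set(direct) & deg_set(dasl)
print(sorted(overlap))  # -> ['A']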

60 Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq

Tim Hunter1, Meghann Palermo1, Heather Driscoll2, Scott Tighe1, Julie Dragon1, Jeff Bond1, Arti Shukla1, Mahesh Vangala2, James Vincent1. 1University of Vermont, Burlington, Vermont, 2Norwich University, Northfield, Vermont

The advent of microarrays and massively parallel sequencing has revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, the GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from the GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture (LP9-C1) and healthy mesothelial cells treated with asbestos (LP9-A1). For GeneChip HTA 2.0 sample preparation, we compared two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with the NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to the RNA-Seq results.


GENETIC VARIATION

61 Requirements for Bacteriophage Growth: Using High Throughput Sequencing to Determine Gene Essentiality

Joseph Pérez1, Rebekah Dedrick2, Michael Rubin1 and Graham Hatfull2. 1Natural Sciences Program, University of Puerto Rico at Cayey, 2Pittsburgh Bacteriophage Institute and Department of Biological Sciences, University of Pittsburgh

Bacteriophages represent an absolute majority of all organisms in the biosphere and offer a special perspective on the diversity, origins, and evolution of viruses. Over 500 mycobacteriophages have been isolated and sequenced thus far. These phages can be grouped into clusters, where genomes within a cluster have recognizable sequence similarity spanning more than 50% of their genome lengths. They are replete with novel genes that have no homologs in any database. To predict the function of these genes, it is useful to determine whether they are required for growth of the phage. Bacteriophage Recombineering of Electroporated DNA (BRED) was developed in the Hatfull laboratory to construct targeted mutations in mycobacteriophages and determine the essentiality of genes for lytic phage growth. However, this method is expensive and time-consuming; therefore, we are developing a high-throughput method for determining gene essentiality using ethyl methanesulfonate (EMS) to mutagenize mycobacteriophages. The survivors of this treatment will be pooled, DNA will be extracted, and the phages will be deep sequenced. By analyzing the sequencing results we will be able to determine the number of nonsense and missense mutations, and conclude which genes are essential (no survivors found) and nonessential (survivors sequenced) during lytic growth. Mycobacteriophage Giles was exposed to amounts of EMS ranging from 5 to 40 µl; any amount over 8 µl caused complete phage death, and 7 µl of EMS yielded a two-log decrease in phage growth. It was determined that multiple rounds of 7 µl EMS mutagenesis would be required to generate a large number of mutations. After each round of mutagenesis, DNA will be extracted and analyzed to determine gene essentiality. Determination of the set of essential genes under defined infection and growth conditions will provide important insights into the biology of the phage life cycle.
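
The essentiality call described above can be made concrete with a small sketch (gene names and counts invented; this is the general logic, not the authors' pipeline): genes with no nonsense-mutant survivors are provisionally flagged essential, while genes with sequenced nonsense-mutant survivors are called nonessential:

def classify_essentiality(nonsense_counts, min_survivors=1):
    """Classify genes from pooled survivor sequencing.

    nonsense_counts: mapping of gene -> number of surviving phage
    genomes carrying a nonsense mutation in that gene. Absence of
    nonsense-mutant survivors is consistent with essentiality.
    """
    calls = {}
    for gene, count in nonsense_counts.items():
        # Note: an "essential?" call is only informative once mutagenesis
        # depth is high enough that absence of survivors is meaningful.
        calls[gene] = "nonessential" if count >= min_survivors else "essential?"
    return calls

survivors = {"gene_12": 0, "gene_13": 7, "gene_14": 2, "gene_15": 0}
for gene, call in sorted(classify_essentiality(survivors).items()):
    print(gene, call)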

HPLC OF PROTEINS AND PEPTIDES

62 Isolation and Purification of Water Soluble Proteins from Ginger Root (Zingiber officinale) by Two Dimensional Liquid Chromatography

A. Ahmed1, A.O. Sandovall1,2, K. Andrews1, A. Wahab3, M.I. Choudhary3

1RI-INBRE Research Core Facility, College of Pharmacy, University of Rhode Island, Kingston, Rhode Island, USA,

2Biotechnology Program, Community College of Rhode Island, Warwick, Rhode Island, USA, 3International Center for Chemical & Biological Sciences, University of Karachi, Karachi, Pakistan.

The RI-INBRE Centralized Core Facility was established in 2003 and participates annually in an Undergraduate Summer Research Program that provides students hands-on research experience in key biomedical technologies. We present here the isolation and purification of water-soluble proteins from ginger, the rhizome of the plant Zingiber officinale. Ginger is an important spice in traditional South Asian cuisines, and in Indian, Pakistani and Chinese folk medicine it is used for gastro-intestinal disorders, nausea, vomiting, inflammatory diseases, and muscle and joint pain. Few studies have been reported on the bioactive proteins in ginger extract. The water-soluble proteins were extracted from ginger root and purified to homogeneity using a two-dimensional liquid chromatography (FPLC/RP-HPLC) approach. The ginger root was washed with distilled water, peeled, and emulsified using an electric blender. The sample was stirred for four days at 4°C with and without protease inhibitor. Purification of a 42 kDa protein was achieved by employing gel filtration, ion-exchange and reversed-phase HPLC. The homogeneity of the protein was confirmed by SDS-PAGE and MALDI-TOF mass spectrometry. Future work will characterize the protein using mass spectrometry and Edman sequencing. Supported by grant 5P20GM103430 from the National Institute of General Medical Sciences, NIH, USA.

LIGHT MICROSCOPY

63 The HVAC Challenges of Upgrading an Old Lab for High-end Light Microscopes

L.M. Callahan1,2, R. Richard3, P. Martone4

1University of Rochester Medical Center Light Microscopy Resource, 2Pathology and Laboratory Medicine, 3Medical Center Facilities Organization, Mechanical Engineering, 4Medical Center Facilities Organization, HVAC

The University of Rochester Medical Center forms the centerpiece of the University of Rochester's health research, teaching, patient care, and community outreach missions. Within this large facility of over 5 million square feet, demolition and remodeling of existing spaces is a constant activity. With more than $145 million in federal research funding, lab space is frequently repurposed and renovated to support this work. The URMC Medical Center Facilities Organization, which supports small to medium space renovations, is constantly challenged by existing mechanical infrastructure and budget constraints to deliver renovated spaces that function within equipment environmental parameters. One recent project, sponsored by the URMC Shared Resources Laboratory, demonstrates these points. The URMC Light Microscopy Shared Resource Laboratory requested renovation of a 121 sq. ft. room in a 40-year-old building to enable placement of a laser capture microdissection microscope and a Pascal 5 laser scanning confocal microscope, with the instruments separated by a blackout curtain. This poster discusses the engineering approach implemented to bring an older lab into the environmental specifications needed for proper operation of the high-end light microscopes.


64 Impact of the Development of a Light Microscopy Shared Resource for the University of Rochester Medical Center: A Quantitative Assessment

L.M. Callahan1,2, M. Jepson1,2, P. Jordan1,2, K. Kasischke3, E. Brown1,4, A. Reed5, M. Lentine5,6, T. Bushnell5,7,8, E. Puzas8,9,10

1URMC Light Microscopy Shared Resource, 2Pathology and Laboratory Medicine, 3Dept. of Neurology, Univ. Ulm Med. Ctr., Ulm, Germany, 4Biomedical Engineering, 5URMC Shared Resource Laboratories, 6URMC Financing Division, 7Department of Pediatrics and URMC Flow Cytometry Core, 8URMC Shared Resources Administration, 9Orthopedics and Center for Musculoskeletal Disease, 10URMC Senior Associate Deans. University of Rochester Medical Center, Rochester, New York 14620

The University of Rochester Medical Center (URMC) identified the need for a shared light microscopy facility to support researchers requiring high-end light microscopy for their research programs. URMC Shared Resource Laboratories (SRLs) represent a strategic investment in technology, targeted expertise, and space administration to systematically support and advance the research mission of the institution. A task force of senior researchers, investigators, and administrators developed a plan to create a light microscopy resource. Through strategic investment in instrument upgrades and acquisitions, as well as the hiring of additional staff, the LM resource has grown since its inception in 2008, with expanded capacity and capabilities to support the diverse needs and studies of URMC researchers. The data presented here address the impact of the LM Shared Resource on the URMC research community in quantitative areas such as publications, new grant funding, and training, as well as qualitative measures of success including impact on graduate education and new research avenues.

65 Machine Learning Algorithms Implemented in Image Analysis

A. Cornea1, J. Chen1,2, L. Renner1, M. Neuringer1

1Oregon National Primate Research Center, 2School of Science and Technology

A typical core facility is faced with a wide variety of experimental paradigms, samples, and images to be analyzed. These analyses usually have one thing in common: a need to segment features of interest from the rest of the image. In many cases, such as fluorescence images with good contrast and signal-to-noise ratio, intensity-based segmentation may be successful. Often, however, images are not acquired under optimal conditions, or features of interest are not distinguished by intensity alone; examples we have encountered include retina fundus photographs, histological stains, and DAB immunohistochemistry. We used machine learning algorithms as implemented in FIJI to isolate specific features in longitudinal retinal photographs of non-human primates. Images acquired over several years with different technologies, cameras and operator skill levels were analyzed to evaluate small changes with precision. The protocol comprises Scale-Invariant Feature Transform (SIFT) registration, Contrast Limited Adaptive Histogram Equalization (CLAHE) and Weka classifier training. Variance of results across different images of the same time point, and across different raters of the same images, was less than 10% in most cases.
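
The CLAHE-plus-trained-classifier pattern can be approximated outside FIJI; the sketch below uses scikit-image and scikit-learn as stand-ins for FIJI's CLAHE and Weka pixel classifier (SIFT registration omitted), on a synthetic image rather than the authors' retinal data:

import numpy as np
from skimage import exposure, filters
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # stand-in for a retinal photograph
image[20:40, 20:40] += 1.0            # a brighter "feature" region
image = image / image.max()

# CLAHE (Contrast Limited Adaptive Histogram Equalization), as in FIJI.
image = exposure.equalize_adapthist(image)

# Per-pixel features: raw intensity plus a smoothed intensity.
features = np.stack([image.ravel(),
                     filters.gaussian(image, sigma=2).ravel()], axis=1)

# Sparse user labels (1 = feature, 0 = background), like Weka training traces.
labels = np.full(image.size, -1)
labels[np.ravel_multi_index(([25, 30], [25, 30]), image.shape)] = 1
labels[np.ravel_multi_index(([5, 60], [5, 60]), image.shape)] = 0

mask = labels >= 0
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[mask], labels[mask])
segmentation = clf.predict(features).reshape(image.shape)
print(segmentation.sum(), "pixels classified as feature")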

MASS SPECTROMETRY

66 Combining a Chip-Mate nESI Ion Source with Make-Up Flow Strategy for Increased System Robustness and Sensitivity in nLC-nESI-MS

D. Eikel1, C. Wang1, J. Jones1, S.J. Prosser1

Advion Inc., 10 Brown Road, Ithaca, NY 14850, USA

Adding a post-column stream of solvent to the effluent of a nano-liquid chromatography system (make-up flow strategy, MUF) allows optimization of the solvent composition for nano-electrospray (nESI) ion formation and can positively influence analytical performance by affecting ionization efficiency, chromatographic peak width and nESI robustness. Here, we investigated the impact of various solvent additions post analytical column and upstream of a novel chip-based nano-electrospray ion source (Chip-Mate), using a simple splitting tee and a flow rate range of 200 to 800 nL/min. A standard six-protein digest sample was analyzed using a Waters nanoACQUITY nLC system, an ACE C18 75 µm ID analytical column, an Advion Chip-Mate nESI ion source and a Sciex 5500 QTRAP mass spectrometer with short, 30 min gradients. A selection of peptides from the mixture was analyzed for peak area and height and their dependence on the ratio of analytical flow rate to MUF rate as well as on the MUF solvent composition. Although solvent addition effectively dilutes the analyte concentration post column, and mass spectrometers are usually considered concentration-dependent detectors, we observed an actual increase in sensitivity with solvents containing dimethylsulfoxide (DMSO), iso-propanol (IPA) or increased vol% formic acid. So far, sensitivity gains of 2-10 fold have been observed, with some bias towards larger and hydrophobic peptides, at an optimal analytical-flow-to-MUF ratio of 2:1. Addition of a MUF is a simple set-up choice for any nESI system that only requires an additional nLC pumping channel with sufficient accuracy and stability at nano flow rates. MUF addition increases spray robustness, stability and longevity of the spray emitter and significantly increases system sensitivity, with few downsides observed so far.

67 Reference Proteome Extracts for Mass Spec Instrument Performance Validation and Method Development

Sergei Saveliev, Mike Rosenblatt, Marjeta Urh. Promega Corporation

Biological samples of high complexity are required to test protein mass spec sample preparation procedures and validate mass spec instrument performance. Total cell protein extracts provide the needed sample complexity. However, to be compatible with mass spec applications, such extracts should meet a number of design requirements:

• Compatibility with LC/MS (free of detergents, etc.)

• High protein integrity (minimal level of protein degradation and non-biological PTMs)

• Compatibility with common sample preparation methods such as proteolysis, PTM enrichment and mass-tag labeling

• Lot-to-lot reproducibility


Here we describe total protein extracts from yeast and human cells that meet the above criteria. Two extract formats have been developed:

• Intact protein extracts with primary use for sample preparation method development and optimization

• Pre-digested extracts (peptides) with primary use for instrument validation and performance monitoring

METABOLOMICS

68 Frataxin Deficiency Alters Assembly of Mitochondrial Electron Transport Supercomplexes

T. Wang1, B. Van Houten2

Friedreich's ataxia (FRDA), an inherited, progressive neurodegenerative disease, is caused by reduced expression of the mitochondrial iron-binding protein frataxin. To study the pathophysiological mechanism by which frataxin deficiency causes this devastating disease, we created a cellular model of FRDA by stably knocking down frataxin in the glioma LN428 cell line. The expression level of frataxin in the knockdown cell lines was about 25% of that of control. OXPHOS was significantly down-regulated in frataxin knockdown cells, by ~40%. Western analysis indicated that Fe/S-containing subunits in complexes I, II and III were decreased 1- to 2-fold in response to frataxin deficiency. Reduced protein levels of these subunits affected the assembly of the complexes and the formation of supercomplexes, as observed by Blue-Native gel electrophoresis. Aconitase activity was also reduced 2-fold in response to frataxin deficiency. PDK1, a protein kinase that phosphorylates and thus inactivates the pyruvate dehydrogenase alpha subunit, was overexpressed more than 2-fold in frataxin knockdown cells. Thus, at three key steps of respiration (pyruvate entry into the TCA cycle, the TCA cycle itself, and OXPHOS), cells down-regulated their activities in response to frataxin deficiency. We conclude that down-regulation of bioenergetics is an important mechanism in the pathological development of Friedreich's ataxia.

69 A Novel Approach for Processing LC – Ion Mobility – MS Metabolomics Data

Giorgis Isaac1, Giuseppe Astarita1, Martin Palmer2, Mark Bennett3, James I. Langridge3, John P. Shockcor1, Andy Borthwick3

1Waters Corporation, Milford, MA, 2Waters Corporation, Manchester, United Kingdom, 3Nonlinear Dynamics, Newcastle upon Tyne, United Kingdom

MS interfaced with LC and ion mobility (IM) is routinely used to measure the levels and variation of metabolites within biofluids, as data generated through metabolomics studies may yield insight into disease onset and progression. LC-IM-MS-based metabolomics generates large and complex data sets, with analysis and interpretation of the results being the rate-determining steps. This has led to a demand for improved data analysis, including processing and advanced multivariate approaches, which are described here for the large-scale analysis of metabolomics datasets. Urine from a healthy individual was centrifuged and the supernatant diluted. The urine was divided into control, low-dosed (LD) and high-dosed (HD) groups. To create a sample set, 11 different drugs were differentially spiked into LD and HD urine, contrasted with blank urine. A reversed-phase gradient was applied and MS data acquired in positive-ion data-independent (LC-DIA-MS) and mobility-assisted data-independent (LC-IM-DIA-MS) modes. Distinguishing biological variation and metabolic change from analytical interference is key to data processing and analysis. Samples were randomized and measured six times, including QC runs, to ensure statistically valid analysis. LC-MS data were retention-time aligned and deconvoluted to produce a feature list. Identified features were searched against compound databases and interrogated with multivariate statistics to yield marker ions of interest. Relatively high abundance levels of the standards were reported for LD and HD compared to controls, confirmed by trend plot analysis showing an increase in the LD and HD groups relative to control. The standards were identified with an average score of 91 and mass error of 1.2 ppm. Three sample clusters were produced, with the standards being the most differentiating features (top 20 based on q value) between groups. Functionality of the software will be demonstrated using biological samples.
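
As a generic illustration of the multivariate step (not the vendor software described above; group sizes, feature counts and effect sizes are invented), principal component analysis on a samples-by-features intensity matrix separates dosing groups and points to the most differentiating features:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Rows = samples (4 control, 4 low-dose, 4 high-dose); columns = features.
X = rng.normal(0, 1, size=(12, 50))
X[4:8, :3] += 3      # spiked standards elevated in the low-dose group
X[8:12, :3] += 6     # and further elevated in the high-dose group

pca = PCA(n_components=2)
scores = pca.fit_transform(X)

# Features with the largest PC1 loadings are the most differentiating.
top = np.argsort(np.abs(pca.components_[0]))[::-1][:5]
print("group means on PC1:",
      [round(scores[i:i + 4, 0].mean(), 1) for i in (0, 4, 8)])
print("top features by |PC1 loading|:", top)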

70 Quantitative Lipidomics in a Core Lab: Navigating a Slippery Slope to Success

Karin M. Green and Scott A. Shaffer. University of Massachusetts Medical School

Successful "global" metabolomics comes from a divide-and-conquer strategy. The lipidome represents one such partition of the metabolome and includes hydrophobic compounds such as phospholipids, neutral lipids, sphingolipids, glycolipids, fatty acids and steroids. Liquid chromatography coupled to electrospray ionization mass spectrometry is a practical way to interrogate the lipidome from cells, tissues, or biological fluids. While sample preparation and data acquisition methods are straightforward, the challenge and bottleneck come in data analysis and the subsequent task of reducing data to information. We have developed a general approach to profile and quantitate over twenty lipid classes, including 14 polar and 8 neutral lipid classes, and applied it to the study of brain tissue in a murine model of Huntington's disease. Targeted extraction of the LC-MS data using SIEVE (Thermo) provided identification using an accurate mass and retention time approach, and quantification of the total lipid for each lipid class was achieved by spiking the sample with a non-endogenous synthetic lipid. Broadly, using a 5 ppm tolerance, ~300 mono-, di-, and triacylated lipids were annotated for the number of carbon atoms and double bonds and were readily differentiated from ether-containing or oxidized lipids with high confidence. However, definitive acyl assignments were elusive, as most m/z values are mixtures of 2-6 isobaric compounds. Heterogeneity was measured using tandem mass spectrometry, but manual annotation was laborious and time-consuming. Alternatively, LipidSearch (Thermo) was able to define the fatty acid profiles efficiently, and by exporting and combining the results with MS data extracted from SIEVE, a comprehensive dataset resulted for ~1200 lipids. While we conclude that we have developed a working pipeline for complex lipid analysis, whether quantitative lipidomics is a tractable and viable venture for a core lab remains an open question and an over-arching theme.
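
The accurate-mass matching step can be sketched as follows; the 5 ppm window mirrors the tolerance quoted above, while the lipid entries and masses are illustrative rather than authoritative, and retention-time matching is omitted for brevity:

def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def annotate(mz, database, tol_ppm=5.0):
    """Return database lipids whose m/z matches within tol_ppm."""
    return [(name, round(ppm_error(mz, m), 2))
            for name, m in database
            if abs(ppm_error(mz, m)) <= tol_ppm]

# Tiny illustrative database of [M+H]+ values (not authoritative masses).
lipids = [("PC 34:1", 760.5851), ("PE 38:4", 768.5538), ("TG 52:2", 859.7749)]
print(annotate(760.5873, lipids))  # -> [('PC 34:1', 2.89)]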

MICROFLUIDICS

71 MudCHiP: Online 2D chip-LC Peptide Trapping as a Robust Approach for Proteomic Biomarker Applications

Christoph Krisp1, Nicole Hebert2, Matthew McKay1, Mark Yang2, Remco Van Soest2, Tina Settineri2, Mark P. Molloy1

1Australian Proteome Analysis Facility, Department of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, Australia, 2Eksigent part of AB SCIEX, 1201 Radio Road, Redwood City, CA 94065

Strong cation exchange (SCX) chromatography of tryptic peptides is commonly applied in the proteomics field to overcome limitations in mass spectrometry (MS)-based peptide detection due to high sample complexity. One of the most well-established approaches is multi-dimensional Protein Identification Technology (MudPIT), which couples SCX with reversed-phase (RP) chromatography in an online workflow. Our current research aims to develop a MudPIT-based chip trapping system appropriate for plasma-based biomarker detection. The MudPIT trap chip consists of two C18 RP phases and one SCX phase. The first RP phase is used for sample desalting and is followed by the SCX phase for charge-based separation of the peptides. The second RP phase is used to retain peptides eluting from the SCX phase. Each sample or salt elution step was followed by an acetonitrile gradient to transfer peptides either from the first RP phase to the SCX resin, or from the second RP phase to analytical separation on a 15 cm, 75 µm ID C18 RP chip column with SRM-MS on a 4000 QTRAP mass spectrometer. The MudPIT peptide trapping strategy reduces the complexity of each elution fraction, which benefits peptide ionization and reduces ion suppression. This will enable precise and multiplexed detection of less abundant biomarker proteins of interest for clinical applications. MudCHiP column applicability to discovery applications was assessed using a 5600 TripleTOF mass spectrometer. We tested up to 10 µg loads of tryptic peptides from cell lysates and plasma samples and achieved 4-fold increases in peptide-spectrum identifications and a more than 2-fold increase in protein identifications compared with conventional methods. Thus, the MudCHiP columns are applicable to both targeted and discovery proteomics, with clear advantages over conventionally used strategies.

72 Kuiqpick™: A Novel Instrument for Rapid Collection of Individual Live Cells from Adherent Cultures

Z. Ma, L.C. Kudo, S.L. Karsten. NeuroInDx, Inc., Signal Hill, California, USA

Single cell analysis is a rapidly developing field of biomedical science that is critical for the sound elucidation of specific cellular functions, which cannot be measured at the bulk population level. In spite of the rapid development of methods tailored to the analysis of a single cell and its contents, routine acquisition of individual cells from adherent or 3D cell cultures has remained challenging. Recently, we introduced a novel low-cost capillary-based vacuum-assisted cell and tissue acquisition device, Kuiqpick™ v.1.0, which allows rapid and accurate brain tissue microdissection under direct microscopic visualization. Here, we further tested its feasibility and efficiency for collecting single cells from in vitro cultures. Collection of individual live cells was performed from various types of adherent cultures, including primary neural progenitor cells established from embryonic and adult mouse and rat brains, neuroblastoma SH-SY5Y, Chinese hamster ovary (CHO) and human melanoma MDA-MB-435 cell cultures. Cells were collected based on both their morphology and fluorescent label (e.g. CellTracker probes). To test the viability of collected cells, Trypan blue exclusion tests and recultivation experiments were performed. The Trypan blue assay demonstrated 80% and 95% survival rates for SH-SY5Y and CHO cells, respectively. Clonal expansion of collected single SH-SY5Y and CHO cells was demonstrated within 6 and 25 days, respectively. In addition, the applicability of the instrument to the collection of single cells grown in a three-dimensional (3D) culture system was demonstrated using MDA-MB-435 cells. To summarize, the experimental data show that Kuiqpick is a convenient approach for collecting single and multiple cells (e.g. colonies) from adherent and 3D cell cultures based on their morphology or fluorescent label. Collected cells demonstrate high survival rates, permitting an array of downstream functional studies.

NUCLEIC ACID EXTRACTION/AMPLIFICATION

73 Transcriptome Analysis from Low Cell Numbers: Two RNA-Amplification Approaches

B. Fleharty, J. Vallandingham, A. Peak, J. Morrison, C. Bailey, K. Staehling, A. Perera, K. Zueckert-Gaudenz, P. Kulesa, H. Li. Stowers Institute for Medical Research

Laser capture microdissection (LCM) permits the precise isolation of small populations of cells from complex tissue. Without the need for purification, the limited quantities of total RNA in LCM samples can be amplified for global gene expression profiling by RNA-Seq. In this study, two RNA amplification methods were evaluated. As input, 10, 30 and 100 human metastatic melanoma cells (c8161) were harvested by LCM after transplantation into the chick embryonic neural crest microenvironment. For comparison, 10 to 300 cultured c8161 cells were processed in an identical manner. The amplified cDNA samples generated with either the SMARTer Ultra Low RNA Kit for Illumina Sequencing (Clontech) or the Ovation RNA-Seq System V2 kit (NuGEN) were fragmented prior to Illumina TruSeq library construction. As references, libraries were prepared from 10 ng of amplified and 1 µg of unamplified total RNA isolated from ~7 million cultured cells. We show that both kits can be used to perform quantitative transcriptome analysis with as few as 10 cells. The cDNA yields obtained with the NuGEN kit were significantly higher than the Clontech yields. The number of expressed genes (>0 FPKM) increased with higher cell numbers, particularly for the culture-derived samples. Gene length and transcript abundance positively affected correlation with the unamplified reference for both the LCM and culture-derived samples. Considerably different expression profiles were observed for samples amplified with the NuGEN versus Clontech methods, which may be attributed to their different chemistries. Therefore, caution is advised when directly comparing inter-kit data sets.
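
The correlation-with-reference comparison reduces to something like the following minimal sketch (FPKM vectors invented; Spearman rank correlation chosen here as one reasonable option for abundance data, not necessarily the authors' statistic):

from scipy.stats import spearmanr

# FPKM values for the same genes: unamplified reference vs. an
# amplified 10-cell sample (numbers invented for illustration).
reference = [120.0, 35.2, 0.8, 560.1, 12.4, 3.3]
amplified = [98.5, 41.0, 0.1, 610.7, 5.0, 9.8]

rho, p = spearmanr(reference, amplified)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")  # rho ~ 0.94 here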

OTHER

74 Creating Sustainable Biospecimen Resources in Medical Research

A. Brooks, Chief Operating Officer, RUCDR Infinite Biologics

BACKGROUND: Biospecimens collected during medical research and the data derived from these materials represent a valuable resource of genetic and genomic information that can be used in biopharmaceutical drug development. The growing industry trend towards personalized medicine combined with reductions in research funding is creating the need for organizations to focus on optimization of sample asset inventories and integration of high-quality sample bioprocessing methods in order to maximize future research and reduce costs.

Many organizations have turned to smartsourcing strategies, which leverage specialized outsourcing service partners to manage their sample assets, given the investment in staff, technology, equipment and facilities otherwise required. At its core, smartsourcing moves beyond the traditional "lift and shift" outsourcing philosophy to a more flexible "value by innovating" philosophy. Sample management smartsourcing strategies include offsite, onsite and hybrid models that can be integrated with sample bioprocessing to deliver incremental value. Biospecimen smartsourcing strategies are evolving to support organizations in achieving significant cost savings and research advancements. As such, the poster will detail the scientific and financial benefits of taking a strategic approach to sample management. Specifically, the poster will highlight:

• Various smartsourcing management models for centralized specimen storage, bioprocessing and data reconciliation

• How to assess existing sample management processes to identify opportunities for improvements and cost savings

• Financial and scientific benefits of integrating sample management and bioprocessing solutions through smartsourcing partnerships

CONCLUSION: Sample management smartsourcing strategies challenge research organizations not only to focus on best practices in biobanking, but also to seek improvements in bioprocessing methodology, operating procedures, technology and data management. Organizations that manage research samples as valuable scientific assets achieve improved sample optimization and utilization throughout the discovery and development process, which enables them to invest in their R&D core competencies.

75 A Microscope Evaluation Database for the Large Microscopy Core

Steven J Hoffman1, Winfried Wiegraebe1

1Stowers Microscopy Center, Stowers Institute for Medical Research, Kansas City, MO

Accurate and reliable light microscopy imaging from core lab systems is supported by recording periodic evaluations of microscope performance. The challenges increase with the number and specialization of instruments in a large core. We present a systematic approach that uses a three-tier maintenance schedule and a relational database for service notes, performance measures, and logon monitoring to help a large microscopy core staff maintain its microscopes. The system was developed to manage over 25 instruments, ranging from widefield and confocal to super-resolution, with a user base of over 200. As a result: 1) core staff workflow is more efficient because equipment status, location and documentation are centralized and instantly available; 2) instrument problem resolution is more effective using issue-tracking reports; and 3) equipment procurement decisions are improved by equipment use records.
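
A minimal sketch of the relational structure such a database implies; the table and column names below are hypothetical, not the Stowers schema:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instrument (
    id INTEGER PRIMARY KEY,
    name TEXT, modality TEXT, location TEXT);
CREATE TABLE service_note (
    id INTEGER PRIMARY KEY,
    instrument_id INTEGER REFERENCES instrument(id),
    noted_on DATE, tier TEXT, note TEXT);
CREATE TABLE logon (
    id INTEGER PRIMARY KEY,
    instrument_id INTEGER REFERENCES instrument(id),
    user TEXT, started_at TIMESTAMP);
""")
conn.execute("INSERT INTO instrument VALUES (1, 'Confocal-3', 'confocal', 'Rm 210')")
conn.execute("INSERT INTO service_note VALUES (1, 1, '2014-01-15', 'monthly', 'PSF check passed')")

# Centralized status lookup: latest service note per instrument.
row = conn.execute("""
SELECT i.name, s.noted_on, s.note FROM instrument i
JOIN service_note s ON s.instrument_id = i.id
ORDER BY s.noted_on DESC LIMIT 1""").fetchone()
print(row)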

76 Effect of Probiotic and Pathogenic Bacteria on Drosophila Intestinal Pathology

J. Miles, P. Hegan, M. Mooseker, and J. Handelsman. Yale University

Efforts to understand the microbial contribution to chronic conditions such as obesity, inflammatory bowel disease, and colon cancer have led to increased study of the human gut microbiome. With current interest in manipulating microbiome composition to treat disease, Drosophila has emerged as a model system for studying the principles that govern host-microbe interactions. We recently reported that both Drosophila-associated and human-administered probiotic strains protect Drosophila from infection. Based on these findings, we sought to use the Drosophila system to understand the effect of interactions between probiotic strains and infectious microbes on host intestinal pathology. For this work, we used entomopathogenic Serratia marcescens and the Drosophila symbiont and human probiotic Lactobacillus plantarum. To observe whether L. plantarum could lessen gut damage associated with S. marcescens infection, we imaged the ultrastructure of the Drosophila gut and changes in the localization of bacterial populations during S. marcescens infection in the presence and absence of L. plantarum. We also monitored the pH of the Drosophila intestine in response to colonization with L. plantarum, challenge with S. marcescens, or both conditions. This work provides a foundation for further study of the effects of probiotic consumption on human intestinal pathology.

Page 25: POSTER ABSTRACTS - ABRF protocols, and ... POSTER ABSTRACTS ABRF 2014 TEAM SCIENCE AND BIG DATA: CORES AT THE FRONTIER 2 ... N. Jafari 5, C. Aquino 6, A. Perera 7

ABRF 2014 TEAM SCIENCE AND BIG DATA: CORES AT THE FRONTIER 24POSTER ABSTRACTS

PCR METHODOLOGY

77 A Simple Multiplex PCR Approach for Target Enrichment in Next-Gen Sequencing

Xiaochun Zhou, Qi Zhu, Chris Hebel. LC Sciences

Multiplex PCR is a simple way to extract genomic regions of interest for various medical and genetic tests. Somatic mutations lead to various diseases, including cancer, and are unlikely to be detected well by regular whole-genome sequencing. Clinical samples often consist of disease cells, e.g. cancer cells, surrounded by normal cells; thus, deep sequencing at hundreds- to thousands-fold coverage is required to detect the mutations. In clinical research, many investigators are interested in specific genes or genomic regions and want to extract those regions from genomic DNA or RNA before sequencing. Many current clinical, forensic, and hereditary genetic test workflows start with multiplex PCR to extract genetic-marker-carrying regions from whole genomes before running hybridization, sequencing, or electrophoresis tests to identify the markers. Personalized medicine and prognosis mostly involve examining sequence variations in a number of targeted genes and metabolic pathway genes in order to predict drug efficacy and toxicity. We have developed a new multiplex PCR approach with a significantly simplified workflow and significantly improved robustness. When applied to sequencing target enrichment, the workflow for producing amplified targets involves only one hands-on step and one PCR run. The approach is designed to require low sample input and to produce superior amplicon uniformity and sequence specificity. It involves a novel primer design and a proprietary reaction composition. A PCR run consists of two functionally separated reaction phases, target capture and library amplification, without any hands-on step in between. The performance of the new approach will be demonstrated with cancer panel data.

78 Strategies for Profiling Single Mouse Intestinal Epithelial Cells by Targeted Gene Expression

K. Zueckert-Gaudenz, W. McDowell, A. Box, K. Staehling, F. Wang, L. Li. Stowers Institute for Medical Research

Targeted gene expression profiling of single cells permits the study of heterogeneity in cell populations. Here, a pool of mouse intestinal crypt-base CD44+/GRP78- cells was collected by fluorescence-activated cell sorting. Aliquots were either loaded onto Fluidigm's C1 System for microfluidic cell capture and cDNA synthesis in nanoliter volumes, or flow-sorted directly into individual PCR plate wells for cDNA synthesis in microliter volumes. The pre-amplified cDNAs were transferred to the BioMark System for EvaGreen real-time PCR. The two sample preparation methods were compared by expression analysis of 86 genes, using Fluidigm's SINGuLAR R-scripts. After outlier identification, gene expression values from 42% of the "C1" and 92% of the "flow" wells were retained. For 55 of the genes, expression was measured in both the "C1" and "flow" cells. Genes with high variance in expression, likely stemming from the sample preparation method and/or unspecific amplification, were removed. Hierarchical clustering of the remaining data revealed gene clusters that contributed to the expected Lgr5hi and Lgr5lo intestinal stem cell (ISC) populations, as well as a small population of differentiated cells. The subpopulations could be defined by either method. However, as ISCs quickly undergo apoptosis at room temperature, the use of the C1 System provided no clear advantage over directly sorting the fragile cells into lysis/RT reaction buffer; specifically, the C1 quality control step to verify the number of captured cells and cell viability was omitted to accelerate processing.

79 Precise and Accurate Determination of microRNA Precursors by Digital PCR

S. Patel1, E. Mead2, Y. Wang2, H. Patricia1, S. Jackson1, A. Pietrzykowski3. 1Life Technologies, 2Department of Animal Sciences, Rutgers University, 3Department of Animal Sciences, Department of Genetics, Rutgers University

There is mounting evidence that microRNAs are fundamental elements of almost every biological process. Most methodologies have concentrated on detection of mature microRNAs, which execute microRNA-dependent gene silencing. Much less work has been dedicated to microRNA precursors, despite the fact that they are the foundation of microRNA biogenesis and processing. MicroRNA precursor detection is challenging because of their low abundance, the necessity of using a reference transcript, and sequence similarity. We measured three different precursors of the same microRNA, miR-9 (pri-miR-9-1, -2 and -3), in primary neuronal cell cultures to show that these challenges can be readily overcome using the QuantStudio™ 3D Digital PCR System, the most recent advance in technology dedicated to quantification of nucleic acids. This chip-based digital PCR technology employs 20,000 solid partitions and allowed us to perform absolute quantification of each miR-9 precursor, without the use of a reference gene, with very high precision and very good reproducibility between technical and sample replicates. Next, we used short and long alcohol exposures to evoke changes in miR-9 precursors in cultured cells. Again, our measurements were highly reproducible, with precision beyond the 1-Ct (2-fold change) resolution limit of real-time PCR. Interestingly, one of the miR-9 precursors is a protein-encoding mRNA transcript, while the two others are long non-coding transcripts, indicating the versatility of digital PCR methodology. In summary, the chip-based QuantStudio™ 3D digital PCR technology provides a significant improvement in the precision and accuracy of measurement of the variety of RNA transcripts existing in a very diversified RNA world.
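
Absolute quantification in partition-based digital PCR rests on Poisson statistics; the sketch below applies the standard Poisson correction, where the 20,000 partitions match the chip described above but the positive count and partition volume are illustrative, not data from this study:

import math

def dpcr_concentration(positive, total, partition_volume_ul):
    """Copies per microliter from digital PCR partition counts.

    Assumes template molecules distribute into partitions following a
    Poisson distribution: lambda = -ln(1 - positives/total).
    """
    p = positive / total
    lam = -math.log(1.0 - p)          # mean copies per partition
    return lam / partition_volume_ul  # copies per microliter

# Illustrative run: 3,400 of 20,000 partitions positive, with a
# hypothetical 0.000755 µl partition volume -> ~247 copies/µl.
print(round(dpcr_concentration(3400, 20000, 0.000755)))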

PROTEIN ANALYSIS

80 Identification of the Free Cysteinyl Residue and Characterization of the N-glycan Structure in the C-terminal Domain of PCSK9

John O. Hui, John H. Robinson, Chris Spahr, Stone D.-H. Shi, Wei Wang and Hsieng S. Lu. Biologics Optimization, Therapeutic Discovery, Amgen Inc., Thousand Oaks, CA 91320

Elevated plasma low-density lipoprotein cholesterol (LDL-C) is a risk factor for the development of atherosclerosis and associated cardiovascular disorders. Studies by Brown and Goldstein firmly established the pivotal role of the LDL receptor (LDLR) in cholesterol metabolism. The more recently discovered proprotein convertase subtilisin/kexin type 9 (PCSK9) is another important regulator of LDL-C levels: the protein promotes degradation of the LDLR, abolishing receptor function. Specific inhibition of PCSK9 may therefore provide a novel therapeutic approach to the treatment of hypercholesterolemia. To evaluate human PCSK9 as a target for monoclonal antibody development, the recombinant protein was expressed in a CHO system. The mature protein consists of two tightly associated subunits: a pro-domain of ~14 kDa containing a single cysteine residue, and a C-terminal catalytic domain of ~62 kDa. The latter contains a total of 25 cysteinyl residues and a single N-linked glycosylation site. To show that the recombinant protein was properly folded, we sought to identify the location of the free cysteinyl residue(s) in the C-terminal domain. LC-MS analysis of the intact material after incubation with iodoacetamide in the presence of denaturant showed that only a single site was alkylated. The labeled protein was completely reduced and pyridylethylated prior to proteolytic digestion. Analysis by LC-MS/MS showed that Cys301 was the only residue S-carboxamidomethylated, indicating that this is the single free cysteine. The data therefore confirm the prior structural information obtained by X-ray crystallography. Peptide mapping also showed that Asn533 is glycosylated; the predominant glycoforms are biantennary glycans, followed by triantennary and a very low amount of tetraantennary oligosaccharides. It is interesting to note that Asn533 is located in a cysteine-rich region and is proximal to a cysteine residue that is disulfide cross-linked (NCS). Thus, N-glycosylation was probably complete prior to cystine bridge formation during the maturation of PCSK9.

81 Locked Nucleic Acid and TaqMan Probes to Detect KRAS Mutations by Droplet Digital PCR in Colorectal Carcinoma Formalin Fixed Paraffin Embedded Tissue Samples

D.C. Sullivan1, B. Chapman1, A. Brown1, S. Cooper2, W. Yang2, G. Karlin-Neumann2, S. Binder2 1University of Mississippi Medical Center, 2Digital Biology Center, Bio-Rad Laboratories

BACKGROUND: Metastatic colorectal carcinoma (CRC) tumors with KRAS mutations in exon 2, codons 12 and 13, do not benefit from anti-epidermal growth factor receptor (EGFR) antibody therapy. Current best practice recommends KRAS mutation analysis in patients with CRC as a prerequisite for treatment with cetuximab or panitumumab. We selected 50 CRC formalin-fixed paraffin-embedded (FFPE) samples to compare a qPCR assay (QIAGEN therascreen® KRAS RGQ PCR Kit) to droplet digital PCR (ddPCR). Wild type and six single-nucleotide mutations in KRAS codons 12 and 13 were tested. METHODS: DNA was extracted and used in assays employing primers paired with locked nucleic acid (LNA) or TaqMan probes. Droplets were generated and streamed through a droplet reader (Bio-Rad Laboratories, CA). QuantaSoft software (Bio-Rad Laboratories) was used to analyze the ddPCR data. RESULTS: In limit-of-detection studies, ddPCR detected 0.002 cpd of known mutant DNA in a background of 1 cpd of wild-type DNA. Specific KRAS mutations were detected in 16 of 50 samples using LNA probes. Of the 16 samples tested with LNA probes, 13 were also tested using TaqMan probes, confirming the results of the LNA assays. Results from ddPCR assays were compared to results from an outside commercial laboratory. The commercial lab detected and identified mutations in 10 of the 50 FFPE samples tested; ddPCR assays also detected mutations in 10 of these samples. An additional five samples were determined to be positive by commercial qPCR, but no specific mutation was reported; ddPCR detected and identified specific mutations in three of these, as well as in three samples not identified by commercial qPCR. CONCLUSIONS: ddPCR allele-specific assays compared well to current commercially available KRAS mutation detection assays. Some cross-reactivity was noted in ddPCR assays employing LNA probes but not TaqMan probes.

82 Accurate Absolute Quantitation of Endogenous AKT1 in Jurkat Cells Using Simple Western

P. Whaley, A. Tu, U. Nguyen, I. Kazakova, F. Ramirez, H. Xu, J. Proctor, A. Boge ProteinSimple

Why do researchers use Western blots? The Western blot has been the default technology for confirming the presence or absence of a protein, but it is a poor method for measuring the amount of that protein. Accurate quantitation of proteins using traditional Western blotting has been a goal since the technique was developed over thirty years ago; however, because the process requires many steps, each introducing variability, it has not been possible. Portions of the technique have been automated to try to improve consistency, but until now there has been no major leap in the technology that would propel this method of protein analysis from qualitative to quantitative. Simple Western is the modern evolution of traditional immunoassay techniques. Wes, the latest addition to the Simple Western platform, is an easy-to-use, fully automated system that removes the variability seen with traditional Westerns, so results are more reproducible run to run, between users and over time. The curve-fit feature in the software provided with Wes allows comparison of endogenous proteins in a sample against a standard curve. Researchers are able not only to identify their protein, but also to achieve reliable quantitation of their target proteins. To demonstrate how the precision and data reliability of Wes enable more accurate absolute quantitation, we spiked GST-tagged AKT1, an oncogene product that plays an integral role in triggering the anti-apoptotic response of the PI3K signaling pathway, into a Jurkat cell lysate to generate a standard curve. The spiked lysate was then run on both Wes and a traditional Western blot. When the two methods are compared, Wes not only reduces hands-on time and time to results, but the reproducibility of the data also produces a high degree of confidence in the accuracy of the reported concentration of endogenous AKT1.

PROTEOMICS

83 Fast and Efficient IMAC Protocol for Phosphopeptide Enrichment for Phosphoproteomic Studies via LC-MS/MS

H. Ding1, C. McKennan1, L. Spruce1, and S. Seeholzer1. 1Protein and Proteomics Core Facility, The Children's Hospital of Philadelphia, The Joseph Stokes Jr. Research Institute, 3615 Civic Center Boulevard, ARC/816A, Philadelphia, PA 19104

Recent developments in first-dimension high-performance liquid chromatography (HPLC) separation of complex peptide mixtures, followed by immobilized metal ion affinity chromatography (IMAC) for phosphopeptide enrichment, have shown great promise in both the selectivity and the quantification of phosphopeptides via LC-MS/MS analysis. The first-dimension HPLC, such as hydrophilic interaction chromatography (HILIC) or high-pH reversed-phase chromatography, is employed because it is orthogonal to the second-dimension chromatography gradient and/or distributes phosphopeptides evenly among the fractions. Subsequent IMAC enrichment can then achieve high specificity and superb quantification for SILAC LC-MS/MS approaches. However, first-dimension HPLC separation can generate a large number of fractions, each of which is handled in a microtube, such as a nylon filter tube or a stage tip device, for the subsequent IMAC procedure. Handling so many tubes makes the IMAC procedure more tedious and reduces reproducibility when biological replicates are required. In our improved IMAC protocol, we employ a 96-well glass fiber plate that replaces both the individual nylon filter tubes of the HILIC-IMAC workflow and the cumbersome stage tips of the RP-IMAC workflow. Our results indicate that we can identify ~20,000 unique phosphopeptides from 3 mg of mouse brain digest input with specificity above 90%. In addition, we are able to eliminate the contaminating nylon filter peaks seen in the HILIC-IMAC workflow. Our protocol shortens the IMAC step to ~1 hr instead of several hours, with minimal handling of individual microtubes and/or stage tips. In conclusion, our improved IMAC phosphopeptide enrichment method achieves a fast and effective workflow that retains high specificity and excellent quantification, providing a very useful tool for global phosphoproteomics studies.

84 A Workflow of MS/MS Data Analysis to Maximize the Peptide Identification

Zefeng Zhang1, Lei Xing1, Liang Yang2, Baozhen Shan1

1Bioinformatics Solutions Inc, Waterloo, Ontario, Canada, 2Department of Computer Science, University of Waterloo, Waterloo, Canada

INTRODUCTION. A key step in shotgun proteomics is peptide identification. There are two approaches to the analysis of MS/MS spectra: database search and de novo sequencing. Database search is preferred for peptides present in the database, including modified peptides; de novo sequencing is the only option for novel or homologous peptides. Unlike database search, which can be validated with the target-decoy approach, de novo sequencing lacks an established validation approach. Here we describe a workflow integrating database search and de novo sequencing, in which database-identified peptides are used to validate de novo peptides. The workflow maximizes peptide identification.

METHODS. 1. Let T1 be the set of MS/MS spectra. Perform de novo sequencing and database search on T1. 2. Let T2 be the set of spectra identified by database search at 1% FDR at the peptide-spectrum match level. For each spectrum in T2, the de novo peptide was validated against the database peptide at the amino acid residue level, and the local confidence score distributions were plotted for de novo residues that agree or disagree with the database residues. 3. For the de novo peptides in T3 = T1 − T2, the score distributions of correct and incorrect residues were estimated from the validated distributions in Step 2.
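A minimal sketch of this validation workflow in Python; the residue-agreement rule and score fields are simplifications of what tools such as PEAKS report, and all names here are illustrative assumptions rather than the authors' implementation.

```python
from collections import namedtuple

# One de novo result per spectrum: sequence plus per-residue local
# confidence scores (0-100), as reported by de novo sequencing tools.
DenovoHit = namedtuple("DenovoHit", "spectrum_id sequence residue_scores")

def split_spectra(denovo_hits, db_hit_ids_1pct_fdr):
    """Partition T1 into T2 (database-identified at 1% FDR) and T3 = T1 - T2."""
    t2_ids = set(db_hit_ids_1pct_fdr)
    t2 = [h for h in denovo_hits if h.spectrum_id in t2_ids]
    t3 = [h for h in denovo_hits if h.spectrum_id not in t2_ids]
    return t2, t3

def residue_score_distributions(t2, db_sequences):
    """Collect local confidence scores of de novo residues that agree or
    disagree with the database peptide for the same spectrum (Step 2)."""
    agree, disagree = [], []
    for hit in t2:
        db_seq = db_sequences[hit.spectrum_id]
        for res, db_res, score in zip(hit.sequence, db_seq, hit.residue_scores):
            (agree if res == db_res else disagree).append(score)
    return agree, disagree

def filter_t3(t3, threshold):
    """Keep T3 de novo peptides whose average local confidence exceeds a
    threshold chosen (from the Step 2 distributions) to give ~85% correct
    residues (Step 3)."""
    return [h for h in t3
            if sum(h.residue_scores) / len(h.residue_scores) >= threshold]
```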

PRELIMINARY RESULTS. Three datasets from complex protein samples acquired on LTQ-Orbitrap and 5600 TripleTOF instruments were tested. PEAKS was used for both de novo sequencing and database search. Average local confidence was used to filter de novo sequences, with the threshold set to give 85% residue correctness based on the score distributions. The peptide identifications were compared with those obtained from a consensus database search (PEAKS + MASCOT + X!Tandem). The results showed that 8% additional peptides were identified with this workflow. CONCLUSION. We present a workflow that maximizes peptide identification by integrating de novo sequencing with database search.

85 Investigation of Different Hierarchical Clustering Approaches for Protein Identification Directly from Tissue Sections in a MALDI Imaging Experiment

Mark Towers1, Laura Cole2, Malcolm Clench2, Emmanuelle Claude1, 1Waters Corporation, Manchester, United Kingdom, 2Sheffield Hallam University, Sheffield, United Kingdom

Mass spectrometry imaging (MSI) allows the correlation of spatial localization and chemical information directly from biological surfaces. A dataset can contain thousands of ion signals with varying degrees of co-localization. Ion mobility separation based on travelling-wave technology can be utilized to add specificity to the MSI experiment, but this leads to highly complex datasets that require advanced, automated computational processing. Here, we investigate the use of different hierarchical cluster analysis (HCA) methods to aid the analysis of digested tissue sections by clustering ion images based on correlation. Four protein digests, from BSA, phosphorylase B, ADH, and enolase, were spotted to form a 6x6 array comprising four overlapping 4x4 squares. Mouse fibrosarcoma model tissue sections were washed and digested on-tissue with trypsin overnight; matrix was applied evenly in several coats. Data were acquired using MALDI SYNAPT G1 and G2-S instruments in MS mode with Triwave ion guide optics to separate ions according to their ion mobility, over an acquisition mass range of 700-3,000 Da. Data were processed and visualized using High Definition Imaging MALDI software. Data reduction is initially achieved by peak picking using multidimensional (m/z and drift time) detection algorithms; a second step generates ion distribution images comprising x,y coordinates; and a third step correlates all processed ion distributions using the Pearson product-moment algorithm. Different HCA methods were assessed for their ability to cluster peaks from the tryptic-digest imaging dataset into related peptide groups, which can be used for PMF protein identification. The methods were also evaluated in terms of the time required to complete the analysis and the number of hierarchical levels created. Top-down K-medoid HCA was applied to the fibrosarcoma tissue data and successfully clustered tryptic peptides, yielding multiple protein identifications from a complex digested tissue section.
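As an illustration of the clustering step, the sketch below groups ion images by Pearson correlation using SciPy's agglomerative hierarchical clustering; the poster additionally evaluates a top-down K-medoid variant, which SciPy does not provide, so this is a simplified stand-in with invented data.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Hypothetical stack of peak-picked ion images: one row per m/z (and
# drift time) feature, flattened over the x,y pixel grid.
n_features, n_pixels = 200, 64 * 64
ion_images = rng.random((n_features, n_pixels))

# Distance = 1 - Pearson correlation, so co-localized ions are "close".
dist = pdist(ion_images, metric="correlation")

# Agglomerative HCA with average linkage over the correlation distances.
tree = linkage(dist, method="average")

# Cut the dendrogram so that peptides with highly correlated spatial
# distributions (e.g., from the same protein) fall into one cluster for PMF.
labels = fcluster(tree, t=0.3, criterion="distance")
print(f"{labels.max()} clusters from {n_features} ion images")
```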

86 Comparison of Commonly Used Methods for Protein Relative Quantification in Complex Samples

Yan Wang1, Waeowalee Choksawangkarn2, Avantika Dhabaria2

1Proteomics Core Facility, 2Department of Chemistry and Biochemistry, College of Computer, Mathematical, and Natural Sciences, University of Maryland, College Park, MD 20742

One major application of proteomics is to identify proteins with changed expression levels under different conditions (control vs. treatment, wild type vs. mutant, etc.) in a complex proteome. Over the last decade, dozens of tools, both chemical and computational, have been developed to aid this kind of analysis. In this presentation, we benchmark several commercial tools available at our facility for their effectiveness in identifying differentially expressed proteins in a model system using a shotgun approach. When looking for differentially expressed proteins, the underlying assumption is that expression of the majority of proteins (house-keeping proteins) remains unchanged. Based on this assumption, we developed a model system with a whole-cell lysate as a “base” that does not change, and 11 commercially available proteins spiked in at different levels and ratios as “targets”. Protein mixtures were digested with trypsin, and the tryptic peptides were analyzed in triplicate with a 4-hour gradient by nano-LC-MS/MS on an LTQ Orbitrap XL. The last version of the IPI human protein database (v3.75) was modified to include the 11 spiked-in proteins for both search engines (Proteome Discoverer and Mascot) prior to data processing. The methods fall into three categories: 1. amine-reactive chemical labeling (iTRAQ and TMT), 2. label-free spectral counting, and 3. chromatography-based label-free analysis. Data were analyzed with the different software tools available to us and evaluated for the number of proteins identified and the accuracy of protein relative abundance. While the focus is on the 11 “target” proteins, we also evaluate the number of “base” proteins reported to have altered levels (false discoveries).
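A minimal sketch of how such a spike-in benchmark can be scored: compare each protein's measured ratio to its expected ratio and count unchanged "base" proteins that are wrongly flagged. The fold-change cutoff and all numbers below are illustrative assumptions, not the poster's analysis.

```python
import numpy as np

def score_benchmark(measured, expected, is_target, fc_cutoff=1.5):
    """measured/expected: arrays of treatment/control ratios per protein;
    is_target: True for spiked-in proteins, False for 'base' proteins."""
    flagged = (measured > fc_cutoff) | (measured < 1 / fc_cutoff)
    # Accuracy on targets: median relative error of the measured ratio.
    target_err = np.median(
        np.abs(measured[is_target] - expected[is_target]) / expected[is_target])
    # False discoveries: base proteins (expected ratio 1) that get flagged.
    false_disc = int(np.sum(flagged & ~is_target))
    return target_err, false_disc

# Invented example: 8 base proteins at ratio 1, 3 targets at 2x, 4x, 0.5x.
expected = np.array([1] * 8 + [2, 4, 0.5], dtype=float)
rng = np.random.default_rng(1)
measured = expected * rng.normal(1.0, 0.15, size=expected.size)
is_target = np.array([False] * 8 + [True] * 3)
err, fd = score_benchmark(measured, expected, is_target)
print(f"median target ratio error: {err:.1%}; false discoveries: {fd}")
```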

87 Characterizing Low Stoichiometry Phosphorylation Events without the Requirement for Extensive Fractionation

E.J. Soderblom1, J.W. Thompson1, C.L. Farnsworth2, J.C. Silva2, M.A. Moseley1

1Proteomics Core Facility, Duke University School of Medicine, 2Cell Signaling Technology

Global characterization of a phosphoproteome using high-resolution LC-MS/MS is well described in the literature. These strategies most often incorporate phosphopeptide enrichment, typically TiO2 or IMAC, prior to LC-MS/MS analysis, and increased depth of coverage is almost exclusively accomplished by fractionation (typically SCX) prior to enrichment. To characterize lower-stoichiometry phosphorylation events without the requirement of extensive fractionation, we have utilized an alternative strategy using multiple kinase-motif-specific antibody pulldowns to enrich phosphopeptides targeted by biologically relevant kinases. Triplicate TiO2 and motif-antibody enrichments were performed on both mouse brain and embryo tissue samples following standardized urea-based solubilization and digestion protocols. Samples were analyzed in duplicate on a 1D NanoAcquity UPLC (Waters) coupled to an LTQ-Orbitrap XL (Thermo) mass spectrometer. Spectra were searched in Mascot against a SwissProt mouse database, and extracted-ion-chromatogram label-free quantitation was performed within Rosetta Elucidator v3.3 (Rosetta Biosoftware) following AMRT alignment within each enrichment-matrix pair. The analysis revealed that only 59 of the 1,922 phosphopeptides uniquely identified in mouse brain and only 67 of the 1,888 uniquely identified in mouse embryo were common to the two enrichment strategies. This high degree of uniqueness (greater than 95% in both tissues) highlights the orthogonality of the two strategies and was maintained when phosphopeptides from the brain motif-specific antibody enrichments were compared against a recently published dataset of 12,000+ unique phosphopeptides from mouse brain acquired in an SCX-fractionated/IMAC-enriched experiment. Interestingly, the unique coverage from each enrichment approach was also conserved within individual proteins, exemplifying the variation in phosphorylation stoichiometry even within the same protein. Achieving a more comprehensive characterization of an organism’s phosphoproteome without extensive fractionation is particularly attractive in core facility environments, where instrument availability and experimental costs are of critical importance.
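The orthogonality figure quoted above is simple set arithmetic; a quick check in Python using the counts reported in the abstract:

```python
def pct_unique(shared: int, total: int) -> float:
    """Fraction of identified phosphopeptides NOT shared between methods."""
    return 1 - shared / total

# Counts reported above: 59/1922 shared in brain, 67/1888 in embryo.
for tissue, shared, total in [("brain", 59, 1922), ("embryo", 67, 1888)]:
    print(f"{tissue}: {pct_unique(shared, total):.1%} unique to one method")
# brain: 96.9% unique; embryo: 96.5% unique -> "greater than 95%" in both.
```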

88 Comparison of Fractionation Strategies Post-Enrichment by Titania Used in the Study of the Phosphoproteome of a Green Alga

M.J. Naldrett, E.L. Marsh, S. Alvarez, B.S. Evans, Proteomics and Mass Spectrometry Facility, Donald Danforth Plant Science Center, St. Louis, MO

Phosphorylation of proteins plays a pivotal role in numerous cellular processes. Its analysis is non-trivial, but assorted enrichment protocols and fractionation strategies combined with mass spectrometry have opened up this field. Because phosphorylation occurs at low levels, attention to detail at all stages of sample processing and acquisition is imperative to prevent losses and maximize results: lability, adsorption to surfaces, and phosphatase activity, among many other factors, all contribute to poor results. Here, using protein extracted from Chlamydomonas reinhardtii and a standard enrichment protocol employing TiO2 with lactic acid, we compare post-enrichment subfractionation by hydrophilic interaction chromatography (HILIC) and strong cation exchange (SCX), and examine how the choice of approach affects the subset of phosphopeptides that is ultimately found. HILIC is often the route of first choice, as its fractions can simply be dried to remove organic solvent prior to injection onto the HPLC. However, our findings show that the overlap between SCX and HILIC can be as little as 40%. C. reinhardtii is a useful model for the study of oil production for biofuels; understanding changes in the phosphorylation of its proteins would ultimately lead to a greater understanding of the mechanisms regulating lipid production.

QUANTITATIVE PROTEOMICS

89 Verification of a Parkinson’s Disease Protein Signature by Multiple Reaction Monitoring

Tiziana Alberio1,2, Kelly McMahon3, Manuela Cuccurullo4, Lee A. Gethings3, Craig Lawless5, Maurizio Zibetti6, Leonardo Lopiano6, Johannes P.C. Vissers3, and Mauro Fasano1,2

1Division of Biomedical Sciences, Department of Theoretical and Applied Sciences, University of Insubria. Via Luciano Manara 7, Busto Arsizio, I-21052, Italy, 2Center of Neuroscience, University of Insubria. Via Alberto da Giussano 12, Busto Arsizio, I-21052, Italy, 3Waters Corporation, Atlas Park, Simonsway, Manchester, M22 5PP, United Kingdom, 4Waters Corporation, Viale dell’Innovazione 3, Milano, I-20126, Italy, 5Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, United Kingdom, 6Department of Neuroscience, University of Torino. Via Cherasco 15, Torino, I-10126, Italy

OBJECTIVE: Integration of different ‘omics data (genomic, transcriptomic, proteomic) yields novel insights into biological systems. However, the use of multiple, disparate software tools in a sequential manner makes the integration of multi-omic data a serious challenge. We describe the extension of Galaxy with mass spectrometry-based proteomics software, enabling advanced multi-omic applications in proteogenomics and metaproteomics. We will demonstrate the benefits of Galaxy for these analyses, as well as its value for software developers seeking to publish new software, and we will share insights on the benefits of the Galaxy framework as a bioinformatics solution for proteomic/metabolomic core facilities. METHODS: Multiple datasets were used for proteogenomics research (a 3D-fractionated salivary dataset and an oral pre-malignant lesion (OPML) dataset) and for metaproteomics research (the OPML dataset and a severe early childhood caries (SECC) dataset). Software required for analytical steps such as peak-list generation, database generation (RNA-Seq-derived and others), database search (ProteinPilot and X!Tandem), and quantitative proteomics was deployed, tested, and optimized for use in workflows. The software tools are shared in the Galaxy Tool Shed (http://toolshed.g2.bx.psu.edu/). RESULTS: Use of the analytical workflows resulted in reliable identification of novel proteoforms (proteogenomics) and microorganisms (metaproteomics). Proteogenomics analysis identified novel proteoforms in the salivary dataset (51) and the OPML dataset (38). Metaproteomics analysis led to microbial identification in the OPML and SECC datasets using MEGAN software. As examples, workflows for proteogenomics analysis (http://z.umn.edu/pg140) and metaproteomics analysis (http://z.umn.edu/mp65) are available at the usegalaxyp.org website. Tutorials for workflow usage within the Galaxy-P framework are also available (http://z.umn.edu/ppingp). CONCLUSIONS: We demonstrate the use of Galaxy for integrated analysis of multi-omic data in an accessible, transparent, and reproducible manner. Our results and experiences using this framework demonstrate the potential for Galaxy to be a unifying bioinformatics solution for ‘omics core facilities.
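Galaxy exposes its workflows through a REST API, so a core facility can script analyses against a server such as usegalaxyp.org; the sketch below simply lists the workflows visible to an account via the documented /api/workflows route, with the server URL and API key as placeholder assumptions.

```python
import requests

# Placeholders: point these at your Galaxy server and personal API key.
GALAXY_URL = "https://usegalaxyp.org"
API_KEY = "YOUR_API_KEY"

def list_workflows():
    """Return the workflows visible to this account via Galaxy's REST API."""
    resp = requests.get(
        f"{GALAXY_URL}/api/workflows",
        params={"key": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for wf in list_workflows():
        print(wf["id"], wf["name"])
```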

90 Reducing Sample Amounts for Isobaric Tagging Quantitative Proteomics Experiments

T. B. Abbott1, M. Loprest1, C. M. Colangelo1

1Yale University

Quantitative proteomics by mass spectrometry relies on the accurate comparison of multiple samples, using several methods including isobaric tagging reagents such as iTRAQ or TMT. These reagents enable multiplexing, in which the areas of the reporter ions present in the MS/MS spectrum allow comparison across samples within one run. However, the cost of the isobaric tagging kits deters many researchers. In addition, the iTRAQ labeling reaction depends on both the concentration and the absolute amount of sample present, and supplier documentation recommends reactions on 20 to 100 micrograms of protein. Since the current generation of high-resolution mass spectrometers needs only a few micrograms of protein for analysis, methods that use the isobaric reagents to label multiple sample sets at lower sample amounts would represent a significant savings per experiment. To test this, we used a single iTRAQ kit to label multiple sets of a quantified HEK 293 tryptic digest at 10, 5, and 1 µg. To label 10, 5, and 1 µg effectively, we used 16.3%, 8.1%, and 1.7% of the reagents in 17, 8.5, and 1.7 µl volumes, respectively. Differing percentages of each labeled sample were chosen to yield measurable iTRAQ ratios when mixed. Samples were desalted on C18 to remove undigested protein and excess iTRAQ reagent, dried, redissolved in 70% formic acid, 0.1% trifluoroacetic acid in water, and run on an AB SCIEX 5600 TripleTOF mass spectrometer. Preliminary results show efficient iTRAQ labeling at all concentrations. Additionally, the 10 and 5 µg loads showed minimal loss of protein identifications compared to 20 µg experiments, while 1 µg showed some loss due to inefficient sample recovery at the C18 step. TMT labeling reactions using similar conditions are currently in progress and will also be reported.
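Using the reagent fractions reported above, one can tabulate how many labelings a single reagent vial supports at each input amount; a small worked sketch (the per-vial economics beyond the reported fractions are our illustrative extrapolation):

```python
# Reagent fraction and reaction volume reported above for each input amount.
conditions = [
    # (protein input in ug, fraction of one reagent vial, volume in ul)
    (10, 0.163, 17.0),
    (5,  0.081, 8.5),
    (1,  0.017, 1.7),
]

for amount_ug, frac, vol_ul in conditions:
    labelings_per_vial = int(1 / frac)  # reactions one vial supports
    print(f"{amount_ug:>2} ug: {frac:.1%} of vial in {vol_ul} ul "
          f"-> ~{labelings_per_vial} labelings per vial")
# 10 ug -> ~6 labelings, 5 ug -> ~12, 1 ug -> ~58: the savings per
# experiment relative to one full-scale labeling per vial.
```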

SOFTWARE

91 Optimization of Capital Research Equipment Usage through RED-iness of Information

A. Villar Briones1, M. Shimanuki1, 1Biology Resources Section, Research Support Division, Okinawa Institute of Science and Technology Graduate University

Managing capital equipment in research laboratories is among the most challenging tasks core facilities face, requiring an intricate interplay among divisions and personnel to oversee a complex matrix of information. Numerous factors contribute to equipment challenges, but chief among them are fragmented management practices, such as depending on multiple individuals to track information in separate databases, spreadsheets, and paper records that are often neither well correlated nor kept up to date. Neither oversight administrators nor researchers, technicians, students, and new hires have a cohesive view of their equipment infrastructure, much less of their entire equipment diversity and the related safety and compliance information. It is necessary to track not only procurement-related information, current location, usage, calibration, and maintenance records, but also user manuals, standard operating procedures, safety regulations, and training, and even to link to a calendar reservation system. Here we present an open-source platform, the Research Equipment Database (RED), that centralizes all aspects of equipment information management, providing a common place where administrative and research personnel use and share all information related to the equipment. Streamlined access through a web interface facilitates higher productivity throughout the institution, helps protect equipment investments, and shields both the equipment itself and researchers from risk, ensuring that equipment is operated safely and productivity is maximized. This joint management and display of technical and administrative information significantly improved the productive time of the devices, as a consequence of well-informed users, smarter routine maintenance procedures, and, ultimately, more effective instrument purchasing decisions.
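The abstract does not specify RED's schema, but a minimal sketch of the kind of record such a system centralizes might look like the following Python dataclass; every field name here is our assumption based on the information types listed above, not RED's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class EquipmentRecord:
    """One instrument's entry in a RED-style central equipment database.
    Field names are illustrative, not RED's actual schema."""
    asset_id: str
    name: str
    location: str
    purchase_order: str             # procurement-related information
    calibration_due: str            # e.g. ISO date of next calibration
    maintenance_log: list = field(default_factory=list)
    sop_urls: list = field(default_factory=list)   # standard operating procedures
    manual_url: str = ""
    reservation_calendar: str = ""  # link to a calendar reservation system
    trained_users: list = field(default_factory=list)

# A single web view over such records gives administrators and researchers
# the shared, up-to-date picture the abstract argues for.
hplc = EquipmentRecord(
    asset_id="EQ-0042", name="HPLC system", location="Lab B204",
    purchase_order="PO-2013-118", calibration_due="2014-06-01",
)
```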

92 The erccdashboard: an R Package for Analysis of External Spike-in RNA Controls in Gene Expression Experiments

S.A. Munro1,2, S. Lund1,3, P.S. Pine1,2, M. Salit1,2, 1Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899, 2National Institute of Standards and Technology Advances in Biological and Medical Measurement Science Program at Stanford University, 348 Via Pueblo Mall, Stanford, CA 94305, 3Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899

External RNA spike-in control ratio abundance mixtures enable assessment of the technical performance of a gene expression experiment and comparison of performance between experiments. We have developed an R package, the “erccdashboard”, to analyze ratio abundance mixtures of the External RNA Control Consortium (ERCC) controls for gene expression experiment method validation. For RNA-Seq, analysis of sequence reads with existing QC metrics assesses sequencing and mapped-read quality; the analysis of ERCC control ratio mixtures is intended to provide QC metrics relevant to gene expression analysis methods, including transcript quantification and differential expression testing. The erccdashboard package addresses this critical analysis need with QC metrics for gene expression experiments, including dynamic range, biases and variability in ratio measurements, diagnostic performance (discrimination of true-positive and true-negative controls), and empirical determination of the limit of detection of ratios. Using the erccdashboard package, quantitative differences in these ratio measurement performance metrics were shown for experiments within the same laboratory, between different laboratories, and for different gene expression measurement technologies (sequencing platforms and microarrays). The ERCC control ratio measurements also quantify biases that may be addressed with normalization approaches, such as differences in the mRNA fraction of total RNA for a pair of gene expression samples, and batch effects. The erccdashboard R software package is freely available and may be adopted as part of the QC process for any gene expression measurement analysis pipeline.
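To make the ratio metrics concrete, the sketch below simulates observed log2 ratios for ERCC controls in two samples and compares them to the nominal subpool ratios of the ERCC ExFold design (4:1, 1:1, 0.67:1, 0.5:1); it is an illustration in Python, not the erccdashboard's R implementation, and all counts are invented.

```python
import numpy as np

# Nominal Mix1:Mix2 ratios for the four ERCC ExFold subpools.
NOMINAL = {"4:1": 4.0, "1:1": 1.0, "0.67:1": 0.67, "0.5:1": 0.5}

rng = np.random.default_rng(2)

def observed_log2_ratios(subpool, n_controls=12, noise_sd=0.3):
    """Simulate counts for ERCC controls in one subpool and return
    observed log2(Mix1/Mix2) per control."""
    base = rng.integers(50, 5000, size=n_controls).astype(float)
    mix1 = base * NOMINAL[subpool]
    mix2 = base * 2 ** rng.normal(0.0, noise_sd, size=n_controls)
    return np.log2(mix1 / mix2)

for subpool, ratio in NOMINAL.items():
    obs = observed_log2_ratios(subpool)
    # Shift vs. nominal: e.g. an mRNA-fraction difference between samples.
    bias = obs.mean() - np.log2(ratio)
    print(f"{subpool}: mean log2 ratio {obs.mean():+.2f} "
          f"(nominal {np.log2(ratio):+.2f}, bias {bias:+.2f})")
```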

93 Developing a Central “Point-of-Truth” Database for Core Facility Information

R.V. Pearse1, S. Cheng1, G. Webber1, D. Bourges-Waldegg1

1Harvard Catalyst, Clinical and Translational Sciences Department, Harvard Medical School.

Core facilities need customers, yet the average researcher is aware of only a small fraction of the facilities available nearby. Diligent facilities combat this knowledge gap by broadcasting information about their facility either locally or by publishing information on one or multiple public-facing websites or third-party repositories (e.g., the VGN cores database or Science Exchange). Each additional site of information about a facility increases visibility but also impairs the facility’s ability to maintain up-to-date information on all sites. Additionally, most third-party repositories house their data in traditional relational databases that are not indexable by common search engines. To begin addressing these problems, the eagle-i project (a free, open-source, open-access publication platform) has begun integrating its core facility database with external websites and web applications, allowing them to synchronize their information in real time. We present here two experimental integrations. The Harvard Catalyst Cores webpage originally required independent updates that were not within the direct control of the core directors themselves. The eagle-i linked open data architecture now allows the Catalyst cores page to pull information from the Harvard eagle-i server and update all data on its page accordingly. Additionally, Harvard’s “Profiles” web application references eagle-i data and links resource information from eagle-i to personnel information in the Profiles database. Because of these direct links, updating information in Harvard’s eagle-i server (which can be accessed directly by facility directors through an account on the eagle-i SWEET) automatically updates information on the Catalyst Cores webpage and updates resource counts linked to a researcher’s public profile. This functionality of the eagle-i platform as a central “point-of-truth” for information has the potential to drastically reduce the effort required to efficiently disseminate core facility information.
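A downstream site consuming such a point-of-truth might do no more than fetch and re-render the central record on each page load; a minimal hedged sketch, where the endpoint path and JSON fields are invented placeholders rather than the actual eagle-i API:

```python
import requests

# Placeholder endpoint: the actual eagle-i linked-open-data API differs.
EAGLE_I_SERVER = "https://harvard.eagle-i.net"

def fetch_core_resources(core_id: str) -> list:
    """Pull the current resource list for one core facility from the
    central server, so the consuming page never caches stale data."""
    resp = requests.get(f"{EAGLE_I_SERVER}/api/cores/{core_id}/resources",
                        timeout=30)
    resp.raise_for_status()
    return resp.json()

def render_cores_page(core_id: str) -> str:
    """Re-render a cores listing from the point-of-truth on every request."""
    rows = [f"<li>{r['name']}</li>" for r in fetch_core_resources(core_id)]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```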

94 Improvement of OMSSA for High Accuracy MS/MS Data

A. Kong1, R. Azencott1, D.H. Hawke2

1Department of Mathematics, University of Houston, Houston, TX, 2Department of Translational Molecular Pathology, UT MD Anderson Cancer Center, Houston, TX

PSM (peptide-spectrum match) scoring is a key step in peptide identification from MS/MS data. The development of high-accuracy mass spectrometers brings a challenge to PSM scoring, especially to score calibration: tightening the precursor mass tolerance from low to high accuracy reduces the number of candidate peptides per spectrum, so calibration techniques that rely on the empirical distribution of candidate PSM scores become questionable. Through examples, we show that OMSSA (Open Mass Spectrometry Search Algorithm) outperforms other open-source software on high-accuracy MS/MS data, most likely because its scoring method does not rely on empirical score distributions. To further improve its performance, we incorporated into OMSSA a new scoring method based on the matched intensities of candidate PSMs. The most important feature of the new method is that its score distribution, estimated by Monte Carlo simulation, is also independent of empirical PSM score distributions. With this new scoring method, the performance of OMSSA has been improved; on test datasets we have achieved results similar to or better than those of the gold-standard search tool, Mascot.
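To illustrate the calibration idea, the sketch below estimates a null distribution for a matched-intensity score by Monte Carlo simulation of random fragment matches, then converts an observed score to a p-value; the score definition is our simplification, not the actual OMSSA statistic.

```python
import numpy as np

rng = np.random.default_rng(3)

def matched_intensity_score(spectrum_intensities, matched_mask):
    """Toy score: fraction of total ion intensity explained by matched peaks."""
    return spectrum_intensities[matched_mask].sum() / spectrum_intensities.sum()

def monte_carlo_null(spectrum_intensities, n_matched, n_sim=10000):
    """Estimate the score's null distribution by repeatedly matching
    n_matched peaks at random -- no empirical PSM scores needed."""
    n_peaks = spectrum_intensities.size
    null = np.empty(n_sim)
    for i in range(n_sim):
        mask = np.zeros(n_peaks, dtype=bool)
        mask[rng.choice(n_peaks, size=n_matched, replace=False)] = True
        null[i] = matched_intensity_score(spectrum_intensities, mask)
    return null

# Invented spectrum: 60 peaks; a strong candidate PSM matches 12 of the
# most intense ones, mimicking a true hit.
intensities = rng.exponential(1.0, size=60)
observed_mask = np.zeros(60, dtype=bool)
observed_mask[np.argsort(intensities)[-12:]] = True
score = matched_intensity_score(intensities, observed_mask)
null = monte_carlo_null(intensities, n_matched=12)
p_value = (null >= score).mean()
print(f"score={score:.3f}, Monte Carlo p-value={p_value:.4f}")
```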