Family-Based Analysis Tutorial, Release 8.1

Golden Helix, Inc.

Feb 13, 2019

Contents

1. Data Preparation
   A. Import Pedigree Information
   B. Import Phenotype Information
   C. Import Genotypes
   D. Merge Spreadsheets

2. Quality Assurance
   A. Quality Control by Marker
   B. Quality Assurance by Sample

3. Association Analysis
   A. Run PBAT Genotype Analysis
   B. Plot Results

Updated: December 6, 2018

Level: Advanced

Packages: PBAT Analysis

This tutorial leads you through family-based association analysis using the PBAT statistical package incorporated into SNP & Variation Suite 8. Covered workflows include data preparation, quality assurance testing, association analysis, and basic visualization of results.

Golden Helix PBAT is developed in collaboration with Dr. Christoph Lange of Harvard University’s School of Public Health.

Note: The data used in this tutorial is for demonstration purposes only, as it consists of simulated phenotypic information for the CEU HapMap samples.

Requirements

To complete this tutorial you will need to download and unzip the following file, which includes several datasets.

Download

PBAT-Tutorial.zip

Files included in the above ZIP file:

• CEU - PED.csv - Actual pedigree information for the CEU HapMap samples (Phase III).

• CEU - SIM - PHENO.csv - Simulated phenotype and clinical data.

• CEU - GENO - Chr22.dsf - Actual chromosome 22 genotypes for the CEU HapMap samples (Phase III) generated from a combination of Affymetrix and Illumina arrays.

We hope you enjoy the experience and look forward to your feedback.

1. Data Preparation

To run PBAT in SVS 8 you need, at minimum, a spreadsheet containing pedigree information (including Family ID, Patient ID, Mother ID, Father ID, Sex, and Affection Status) and genetic data (either genotypes or continuous variables, such as log ratios). A fundamental change from previous versions of Golden Helix PBAT is how phenotype information is handled. To access phenotype data in SVS 8, you first need to join it with your pedigree and genetic data. The following steps lead you through importing each data type separately and then merging them into a single spreadsheet.

A. Import Pedigree Information

Before you can begin you need to create a new project.

• Open SVS and from the Welcome Screen select File > New Project.

• Name the project PBAT Tutorial, browse to a directory where you want the project saved, keep the default genome assembly Homo sapiens (Human) GRCh37 (hg19) (Feb 2009), and click OK. This will open the Project Navigator.

The first file to import is CEU - PED.csv, contained within the downloaded ZIP file. This is a comma-delimited file with pedigree information for the CEU HapMap samples (Phase III).

• Select Import > Family Pedigree > Text Pedigree.

• Browse to the directory where you saved CEU - PED.csv, select CEU - PED.csv, and click Open.

• Under Row Labels select Use column number: 1.

• Choose the Sex is encoded as 0/1/2 (or ?/1/2) radio button.

• Choose the Affection Status is encoded as 0/1/2 (or ?/1/2) radio button.

Note: If the default options (?/0/1) are used for encoding Sex and Affection Status, the resulting spreadsheet will not be recognized as a pedigree spreadsheet.

• Click OK.

This will create a new pedigree spreadsheet called CEU - PED Pedigree Dataset - Sheet 1 (Figure 2a).

Note: Pedigree spreadsheets are denoted as such by a pedigree icon in the Project Navigator as well as blue headers for the pedigree columns at the front of the spreadsheet. If your imported spreadsheet has neither of these, it has not been recognized as a pedigree spreadsheet, and so certain analysis options will not be present.
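Because the import only succeeds as a pedigree spreadsheet when Sex and Affection Status use the 0/1/2 (or ?/1/2) encoding, it can help to check a pedigree file before importing it. The following is a minimal sketch in Python, not part of SVS. The column names follow the pedigree fields listed in this tutorial, but the exact header row of CEU - PED.csv is an assumption, and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed column names; the real header row in CEU - PED.csv may differ.
CODED_COLUMNS = ("Sex", "Affection Status")

# Codes accepted by the 0/1/2 (or ?/1/2) encoding chosen in the import dialog.
VALID_CODES = {"0", "1", "2", "?"}


def check_pedigree_rows(text):
    """Return (line_number, problem) tuples for a pedigree CSV passed as text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in CODED_COLUMNS:
            value = (row.get(column) or "").strip()
            if value not in VALID_CODES:
                problems.append(
                    (line_number, f"{column} has unexpected code {value!r}")
                )
    return problems


# Made-up example rows, not taken from the tutorial data.
SAMPLE = """Family ID,Patient ID,Mother ID,Father ID,Sex,Affection Status
F1,P1,0,0,1,1
F1,P2,0,0,2,1
F1,P3,P2,P1,M,2
"""

if __name__ == "__main__":
    for line_number, problem in check_pedigree_rows(SAMPLE):
        print(f"line {line_number}: {problem}")
```

In this made-up sample the third data row uses M for Sex, so the check reports it. A file that passes this kind of check can still fail to import for other reasons.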

Figure 2a. Pedigree spreadsheet.

B. Import Phenotype Information

Figure 2b. Simulated phenotype spreadsheet

Next you need to import CEU - SIM - PHENO.csv. This is a comma-delimited file with simulated phenotype information. It is used for demonstration purposes only.

• From the Project Navigator select Import > Text.

• Browse to the directory where you saved CEU - SIM - PHENO.csv, select CEU - SIM - PHENO.csv, and click Open.

• Leave the rest of the parameters as defaults and click OK.

This will create a new spreadsheet called CEU - SIM - PHENO - Dataset - Sheet 1 (Figure 2b).

C. Import Genotypes

Last, you need to import CEU - GENO - Chr22.dsf. This file contains actual genotypes on chromosome 22 for the CEU samples, which were generated by a combination of Affymetrix and Illumina platforms.

• From the Project Navigator select Import > Golden Helix DSF.

• Browse to the directory where you saved CEU - GENO - Chr22.dsf, select CEU - GENO - Chr22.dsf, and click Open.

This will create a new marker-mapped spreadsheet called CEU - GENO - Chr22 - Sheet 1 (Figure 2c).

Figure 2c. Genotype spreadsheet.

D. Merge Spreadsheets

Now that you have all three spreadsheets in the project, you need to join them together. When joining spreadsheets it doesn’t matter which one you start from. However, if there is certain data you want located toward the front of your spreadsheet for easier viewing (e.g. phenotype data), you will want to initiate the join from that spreadsheet. When pedigree data is available (and denoted as such), this information will always be the first six columns of the spreadsheet.

• Open CEU - PED Pedigree Dataset - Sheet 1 and select File > Join or Merge Spreadsheets.

• From the spreadsheet chooser select CEU - SIM - PHENO - Dataset - Sheet 1 and click OK.

• Enter PED + PHENO for New dataset name:.

• Under Spreadsheet as Child of choose Current Spreadsheet.

• Leave all other parameters as the defaults and click OK.

This will create a new spreadsheet PED + PHENO - Sheet 1. Now join this one with the genotype spreadsheet.

• From PED + PHENO - Sheet 1 select File > Join or Merge Spreadsheets.

• Select CEU - GENO - Chr22 - Sheet 1 and click OK.

• Enter CEU All for New dataset name:.

• Under Spreadsheet as Child of choose Project root.

• Leave all other parameters as the defaults and click OK.

You now have all the data in one spreadsheet, CEU All - Sheet 1, and are ready for analysis.
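Conceptually, each join above matches rows from two spreadsheets by their sample label and places the columns side by side. The sketch below illustrates that idea in plain Python; it is not how SVS implements the join. The function name, the sample IDs, and the rule of keeping only samples present in both tables are assumptions made for illustration (the join dialog's defaults may treat unmatched rows differently).

```python
def join_on_sample_id(left, right):
    """Join two {sample_id: {column: value}} tables on shared sample IDs.

    Columns from `left` come first, followed by columns from `right`.
    Samples missing from either table are dropped (an assumption of this sketch).
    """
    joined = {}
    for sample_id, left_row in left.items():
        if sample_id in right:
            merged = dict(left_row)          # copy so the input is not modified
            merged.update(right[sample_id])  # append the right-hand columns
            joined[sample_id] = merged
    return joined


# Made-up rows for illustration only; not taken from the tutorial files.
pedigree = {
    "S1": {"Family ID": "F1", "Sex": "1"},
    "S2": {"Family ID": "F1", "Sex": "2"},
}
phenotype = {
    "S1": {"Age": 42},
    "S3": {"Age": 57},
}

if __name__ == "__main__":
    print(join_on_sample_id(pedigree, phenotype))
```

Starting the join from the pedigree table puts its columns first, which mirrors the advice above about initiating the join from the spreadsheet whose data you want toward the front.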

Note: In addition to performing family-based association testing using genotypes as covariates, you can also perform association with various CNV covariates. Though not covered in this tutorial, you would go about PBAT CNV Analysis in the same manner as PBAT Genotype Analysis, though instead of joining a genotype spreadsheet with your pedigree and phenotype information, you would join your CNV data. To learn more about processing CNV data, see the Copy Number Variation (CNV) Analysis Tutorial.

2. Quality Assurance

New in SVS are several quality control metrics to control for poor quality SNPs and samples. This tutorial focuses specifically on PBAT Family-Based QA, which enables the detection of Mendelian errors and samples with overall poor genotype quality.

Note: Though not covered in this tutorial, it is still appropriate to apply other non-family-based quality assurance metrics to exclude poor quality samples and markers from analysis. Several additional options are available under the Genotype > Quality Assurance and Utilities spreadsheet menu. For more information about these options, see the Genotype Data Quality Assessment and Utilities section of the SVS Manual.

A. Quality Control by Marker

• Open CEU All - Sheet 1 and select Genotype > PBAT Family-Based QA.

• Under Computation parameters check Use alternative rapid pedigree algorithm. This option needs to be checked in order for PBAT to report Mendelian errors.

• Under Output choose Output by marker.

• Leave all other parameters as the defaults and click Run.

Upon completion a new spreadsheet is created, PBAT QA Results (by Marker) (Figure 3a), with various quality control statistics. In this tutorial we’ll focus on removing SNPs that have one or more Mendelian errors.

• Right-click the Mendelian errors column and select Activate by Threshold.

• Select <= 0 and click OK.

This will inactivate every row (marker) that has one or more Mendelian errors. We will use the rows that remain active in this spreadsheet to activate their respective columns in the CEU All - Sheet 1 spreadsheet.

• From the PBAT QA Results (by Marker) spreadsheet go to Select > Apply Current Selection to Second Spreadsheet.

• Choose to apply filtered rows to CEU All - Sheet 1, then click OK.

This will create a new spreadsheet, CEU All - Sheet 2, with 19,090 active columns. This tool will also inactivate the pedigree and phenotype columns. To reactivate these, left-click once on the Family ID column header, then, while holding down the Shift key, click on the Age phenotype column header.
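To make the marker filter above more concrete: a Mendelian error occurs when a child's genotype cannot be formed by taking one allele from each parent. The sketch below shows a textbook trio check for autosomal markers and the "errors <= 0" filter in plain Python. It is an illustration only and is not PBAT's algorithm (in particular, it is not the alternative rapid pedigree algorithm). The function names, the two-letter genotype strings, and the use of "?" for a missing allele are all assumptions.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Check one autosomal trio; genotypes are two-letter strings such as "AG".

    Trios with any missing allele ("?") are treated as consistent, because
    nothing can be concluded from them in this simple sketch.
    """
    if "?" in father + mother + child:
        return True
    # Every genotype the child could inherit: one allele from each parent.
    possible = {"".join(sorted(pair)) for pair in product(father, mother)}
    return "".join(sorted(child)) in possible


def count_mendelian_errors(trios):
    """Count the trios at one marker whose child genotype is inconsistent."""
    return sum(1 for f, m, c in trios if not is_mendelian_consistent(f, m, c))


def markers_to_keep(trios_by_marker):
    """Keep markers whose error count is <= 0, mirroring the threshold above."""
    return [
        marker
        for marker, trios in trios_by_marker.items()
        if count_mendelian_errors(trios) <= 0
    ]


if __name__ == "__main__":
    # Made-up genotypes (father, mother, child) for two hypothetical markers.
    example = {
        "marker_1": [("AG", "AA", "AG"), ("AA", "AA", "AA")],
        "marker_2": [("AA", "AA", "AG")],  # child carries a G neither parent has
    }
    print(markers_to_keep(example))
```

Real pedigrees can include larger families and missing parents, which this trio-only sketch does not handle.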

Figure 3a. PBAT QA Results by marker

Figure 3b. PBAT QC Results by proband

B. Quality Assurance by Sample

The latest version of PBAT incorporates a novel test that assesses the genotyping quality of individual probands in family-based association studies. Published in PLoS Genetics [Fardo, 2009], these tests are “ideally suited as the final layer of quality assurance filters in the cleaning process of genome-wide association studies.”

• Open CEU All - Sheet 2 and select Genotype > PBAT Family-Based QA.

• Again, check Use alternative rapid pedigree algorithm under Computation parameters.

• This time select Output by proband under Output and click Run.

Another new spreadsheet is created, PBAT QA Results (by Proband) (Figure 3b), this time with quality control metrics for each proband. In the paper cited above, Fardo et al. suggest that, on a genome-wide scale, probands with a score greater than 30 should be considered to have poor genotyping quality.

• Right-click on the Tgw column header and select Sort Descending.

Notice there are 5 samples with a Tgw value greater than 30. However, this particular dataset only contains genotypes for chromosome 22, so the statistics reported do not necessarily translate to a whole genome scale. Therefore, for this tutorial we will not exclude any samples.
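On a genome-wide dataset, where the threshold of 30 would apply, the sort-and-inspect step could be expressed as a simple filter. The following Python sketch assumes the Tgw values have been exported to a mapping from proband ID to score. The function name, the IDs, and the scores are hypothetical and are not taken from the tutorial results.

```python
def flag_probands(scores, threshold=30.0):
    """Return (proband_id, score) pairs above the threshold, highest score first."""
    flagged = [(pid, score) for pid, score in scores.items() if score > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    example_scores = {"P1": 12.5, "P2": 41.0, "P3": 30.0, "P4": 33.2}
    for proband_id, score in flag_probands(example_scores):
        print(proband_id, score)
```

The comparison is strict, so a proband scoring exactly 30 is not flagged, matching the "greater than 30" wording above.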

3. Association Analysis

Now that important quality control metrics have been considered, you’re ready to run PBAT analysis on the remaining samples and SNPs. There are many different configurations of association tests and parameters one could run in PBAT. This tutorial covers a basic workflow. For more detailed information on the various options, please reference the PBAT Family-Based Analysis section of the SVS Manual.

A. Run PBAT Genotype Analysis

• Open the CEU All - Sheet 2 spreadsheet and select Genotype > PBAT Genotype Analysis.

This will open the PBAT Genotype Analysis window. The first window enables you to select various phenotypes, predictor variables, interactions, and more for analyses. For this tutorial we will only consider Affection Status.

Figure 4a. PBAT Results spreadsheet

• Select Affection Status in the upper-left box on the Select Phenotypes tab.

• Click the Test Statistic and Computational tab.

• Check Output -log 10 p-values under Output Format.

• Leave all other parameters as defaults and click Run.

Upon completion, a results spreadsheet, PBAT Results, is created (Figure 4a). This spreadsheet reports a number of statistics; the ones of greatest interest are -log10 pvalue(FBAT) and power(FBAT). For a complete description of these and the other statistics reported, please see the PBAT Family-Based Analysis section of the SVS Manual.
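The -log10 transformation requested in the Output Format option turns small p-values into large positive numbers, which is why stronger associations appear as taller points when plotted. As a quick illustration of the arithmetic only (this is not SVS code, and the function name is an assumption):

```python
import math


def neg_log10(p_value):
    """Convert a p-value in (0, 1] to its -log10 value."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -math.log10(p_value)


if __name__ == "__main__":
    for p in (0.05, 0.001, 1e-8):
        print(p, round(neg_log10(p), 2))
```

Each additional unit on the -log10 scale corresponds to a p-value ten times smaller, so a p-value of 0.001 maps to 3 and a p-value of 1 maps to 0.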

B. Plot Results

We will examine both the -log10 pvalue(FBAT) and power(FBAT) columns.

Figure 4b. Plot of -log10 pvalues (FBAT)

• From the PBAT Results spreadsheet, right-click on the -log10 pvalue(FBAT) column and select Plot Variable in GenomeBrowse.

• Zoom into chromosome 22 by copying and pasting 22: 13,501,202 - 51,304,566 into the address bar at the top of the GenomeBrowse window.

This opens the plot viewer with -log10 p-values displayed according to chromosome and position (Figure 4b). You can add additional plots to this view from the User Graphs node in the Graph Control Interface.

• Go to File > Plot, click the Project button, select the PBAT Results spreadsheet, check the power(FBAT) item, and then click Plot & Close.

You should now have two graphs in the plot viewer.
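The region string pasted into the address bar earlier in this section follows a chromosome: start - stop pattern with comma-separated digits. If you want to reuse the same coordinates outside SVS, for example to subset an exported results table, a small parser is enough. The sketch below is an assumption about the string's layout based only on the example in this tutorial; it is not GenomeBrowse's own parser, and the function name is hypothetical.

```python
import re

# Chromosome label, then start and stop positions that may contain commas.
REGION_PATTERN = re.compile(r"^\s*(\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_region(text):
    """Parse a string like "22: 13,501,202 - 51,304,566" into (chrom, start, stop)."""
    match = REGION_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized region string: {text!r}")
    chrom, start_text, stop_text = match.groups()
    start = int(start_text.replace(",", ""))
    stop = int(stop_text.replace(",", ""))
    if start > stop:
        raise ValueError("start position is greater than stop position")
    return chrom, start, stop


if __name__ == "__main__":
    print(parse_region("22: 13,501,202 - 51,304,566"))
```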
