stata_bc_plink.rjla.nov2007.ppt
:NEUROPSYCHIATRIC GENETICS
[BIOSTATISTICS|BIOINFORMATICS] CORE
BIOSTATISTIC/BIOINFORMATIC TOOLS FOR GENETICS DATA:
DATA MANAGEMENT AND ANALYSIS
RICHARD ANNEY
NEUROPSYCHIATRIC GENETICS RESEARCH GROUP
WORKSHEET, TUTORIALS AND SLIDES AVAILABLE ON
P:\Personal Folders\anneyr\stata9\talk
http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/
Overview
STATA9
• A STATISTICAL SOFTWARE PACKAGE
• LESS PRETTY THAN SPSS GUI
• POWERFUL AND “SCRIPT” FRIENDLY
• LESS CLICKING AND DROP-DOWN …MORE SCRIPTING
STATA9: SET UP FOLDER STRUCTURE
• SET UP FOLDERS TO STORE YOUR:
• DO-FILES
• CR FILE
• AN FILE
• DTA-FILES
• LOG-FILES
• INPUT-FILES (TXT)
• OUTPUT-FILES
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
• HOW DO I GET FILES INTO STATA?
• HOW DO I MERGE MY DATA WITH ANOTHER FILE?
• CAN I GENERATE A FEW BASIC STATISTICS ON MY MARKERS?
• CAN I PERFORM A CASE-CONTROL STUDY?
• IS MY QUANTITATIVE VARIABLE ASSOCIATED WITH A GENOTYPE?
STATA9: LOOK AT ME!! MAIN WINDOW
STATA9: LOOK AT ME!! DO-WINDOW
STATA9: LOOK AT ME!! DTA-EDITOR WINDOW
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
cr00 genotype_qtlsnp.do
1. IMPORT TAB-DELIMITED TEXT FILES INTO STATA USING THE INSHEET COMMAND, SORT ON THE KEY VARIABLE USING THE SORT COMMAND, AND SAVE AS *.DTA FILES USING THE SAVE COMMAND
2. CONVERT “STRING” VARIABLES TO NUMERIC VARIABLES USING THE GENERATE AND REPLACE COMMANDS
3. MERGE ON THE KEY VARIABLE USING THE MERGE COMMAND
4. TABULATE THE MERGE RESULT USING THE TABULATE COMMAND AND ORDER VARIABLES USING THE ORDER COMMAND
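The import, recode and sort steps listed above can be sketched outside Stata as well. The Python below is only an illustration of the idea, not the contents of the do-file: the column names (`id`, `status`), the sample rows and the numeric codes (1 = control, 2 = case) are all assumptions made for the example.

```python
import csv
import io

# Hypothetical tab-delimited input; column names and values are illustrative.
RAW = "id\tstatus\nS003\tcase\nS001\tcontrol\nS002\tcase\n"

# Assumed recoding of a string variable to a number (1 = control, 2 = case).
STATUS_CODES = {"control": 1, "case": 2}


def load_sorted(text, key="id"):
    """Read tab-delimited text, add a numeric copy of `status`, sort on the key."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        # Unrecognised strings become None rather than a guessed code.
        row["status_num"] = STATUS_CODES.get(row["status"])
    return sorted(rows, key=lambda row: row[key])


rows = load_sorted(RAW)
for row in rows:
    print(row["id"], row["status"], row["status_num"])
```

Sorting on the key before saving mirrors the slide's workflow, where each file is sorted on the key variable before the merge.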
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
• THE COMBINED *.DTA FILE
• TABULATE THE _MERGE VARIABLE CREATED BY THE MERGE COMMAND
• 1 = ONLY IN 1st (MASTER) FILE
• 2 = ONLY IN 2nd (USING) FILE
• 3 = IN BOTH 1st & 2nd FILES
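To make the 1/2/3 codes concrete, here is a minimal Python sketch of a key-based merge that records where each record came from. It imitates the idea of the merge indicator only; the function name `merge_on_key`, the `id` key and the sample records are hypothetical.

```python
from collections import Counter


def merge_on_key(master, using, key="id"):
    """Outer-join two lists of dicts on `key`, adding a `_merge` code per row.

    1 = key only in the 1st (master) list, 2 = only in the 2nd (using) list,
    3 = key present in both.
    """
    master_by_key = {row[key]: row for row in master}
    using_by_key = {row[key]: row for row in using}
    merged = []
    for k in sorted(set(master_by_key) | set(using_by_key)):
        in_master = k in master_by_key
        in_using = k in using_by_key
        # Values from the master list win if both lists share a column.
        row = {**using_by_key.get(k, {}), **master_by_key.get(k, {})}
        row["_merge"] = 3 if (in_master and in_using) else (1 if in_master else 2)
        merged.append(row)
    return merged


# Illustrative records only.
genotypes = [{"id": "S001", "snp1": "AG"}, {"id": "S002", "snp1": "GG"},
             {"id": "S004", "snp1": "AA"}]
phenotypes = [{"id": "S001", "status": 2}, {"id": "S002", "status": 1},
              {"id": "S003", "status": 2}]

merged = merge_on_key(genotypes, phenotypes)
print(Counter(row["_merge"] for row in merged))
```

Tabulating the code immediately after a merge is a quick check that the two files share the expected set of IDs; any 1s or 2s point to samples missing from one file.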
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
an00 genotype_qtlsnp.do
• CREATING THE LOG FILE USING THE LOG COMMAND
• OPENING THE *.DTA FILE USING THE USE COMMAND
• CREATING GENOTYPE VARIABLES FROM ALLELE VARIABLES USING GTYPE PROTOCOL
• TABULATING THE GENOTYPE VARIABLES USING THE TABULATE COMMAND
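As an illustration of building a genotype variable from two allele variables and then tabulating it, a small Python sketch follows. It is not the Stata routine named on the slide; the function name, the `/` separator and the use of `0` or an empty string for a missing allele are assumptions for the example.

```python
from collections import Counter


def make_genotype(allele1, allele2, missing=("", "0")):
    """Combine two allele calls into one unordered genotype label.

    Alleles are sorted so that A/G and G/A give the same label.
    Returns None if either allele is missing (assumed coded as "" or "0").
    """
    if allele1 in missing or allele2 in missing:
        return None
    return "/".join(sorted((allele1, allele2)))


# Illustrative allele pairs for one marker.
allele_pairs = [("A", "G"), ("G", "A"), ("G", "G"), ("A", "A"), ("0", "0")]
genotypes = [make_genotype(a1, a2) for a1, a2 in allele_pairs]

# Tabulate the genotype variable, keeping missing calls as their own row.
print(Counter(genotypes))
```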
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
1. TEST HWE USING GTAB COMMAND
2. TEST HWE USING GENHW COMMAND
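For readers who want to see what a Hardy-Weinberg test computes, the following Python sketch implements the textbook 1-degree-of-freedom chi-square test from genotype counts. It is an assumption that this is comparable to the output of the Stata commands above, which may also report exact or likelihood-ratio tests; the counts in the example are invented.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions.

    Takes observed counts of the three genotypes at a biallelic marker and
    returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic; test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = hwe_chi_square(298, 489, 213)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```

With small samples or rare alleles the chi-square approximation is poor, which is one reason an exact test is often preferred.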
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
1. TEST PAIR-WISE LINKAGE DISEQUILIBRIUM USING PWLD COMMAND
2. TEST ASSOCIATION WITH BINARY TRAIT USING GENCC COMMAND
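The two tests above can be illustrated with a short Python sketch. It assumes haplotype counts are already known for the LD measures (in practice they usually have to be estimated from unphased genotypes) and uses a simple allelic 2x2 chi-square for the case-control test. The function names and all counts are made up for illustration and are not the Stata commands' own algorithms.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) for two biallelic markers from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    if p_A in (0, 1) or p_B in (0, 1):
        raise ValueError("a monomorphic marker has no defined LD")
    d = n_AB / n - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, abs(d) / d_max, r2


def allelic_test(case_a1, case_a2, control_a1, control_a2):
    """Return (chi2, odds_ratio) for a 2x2 table of allele counts (1 df)."""
    a, b, c, d = case_a1, case_a2, control_a1, control_a2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    return chi2, odds_ratio


print(pairwise_ld(50, 0, 0, 50))     # two markers in complete LD
print(allelic_test(60, 40, 40, 60))  # allele 1 more common in cases
```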
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
• QTLSNP COMMAND MODELS
• CODOMINANT (THREE MODELS)
• DOMINANT
• RECESSIVE
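The inheritance models differ in how the three genotypes are grouped before testing. The Python sketch below shows one common coding convention, applied to the number of copies of a chosen allele; it is an assumption that this matches how the QTLSNP command codes its models, and the additive coding is shown only as one example of a codominant-type model. The trait values are invented.

```python
from statistics import mean


def code_genotype(copies, model):
    """Recode the number of copies (0, 1, 2) of an allele under a genetic model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    if model == "additive":
        return copies            # each extra copy counts once more
    if model == "dominant":
        return int(copies >= 1)  # carriers versus non-carriers
    if model == "recessive":
        return int(copies == 2)  # homozygotes versus everyone else
    raise ValueError(f"unknown model: {model}")


def trait_means(copies_list, trait_list, model):
    """Mean of a quantitative trait within each coded genotype group."""
    groups = {}
    for copies, value in zip(copies_list, trait_list):
        groups.setdefault(code_genotype(copies, model), []).append(value)
    return {code: mean(values) for code, values in sorted(groups.items())}


# Illustrative data: allele copies and a quantitative trait for six people.
copies = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 11.0, 13.0, 18.0, 20.0]
for model in ("additive", "dominant", "recessive"):
    print(model, trait_means(copies, trait, model))
```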
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING QTLSNP COMMAND - CODOMINANT
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING QTLSNP COMMAND - DOMINANT
2. NOT ASSOCIATED SO MINIMAL OUTPUT
PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY
1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING QTLSNP COMMAND - RECESSIVE
BC|SNPmax©
• DATABASE AND ANALYSIS PLATFORM
• MASTER DATABASE FOR STORING ALL OUR “MASTER” GENETIC AND PHENOTYPE DATASETS
• ONGOING PROCESS TO UPLOAD AND MANAGE DATA
BC|SNPmax: Structure
• FIVE DOMAINS:
1. GENOTYPES/SNPS
2. MAPS
3. PEDIGREES
4. AFFECTION
5. PHENOTYPES
FROM OUTPUT TO GEN-FILE (VIA STATA)
• TWO EXAMPLES
1. BASIC EXCEL FILE
2. TAQ-MAN FILE
FROM OUTPUT TO GEN-FILE (VIA STATA): BASIC EXCEL FILE
FROM OUTPUT TO GEN PED AFF-FILE (VIA STATA): BASIC EXCEL FILE
FROM OUTPUT TO GEN-FILE (VIA STATA): TAQ-MAN FILE
BC|SNPmax
BC|SNPmax: Types of Analysis
• QUALITY: PED-CHECK, MERLIN, BASIC MEASURES (MAF, HWE, CALL)
• FAMILY-BASED: MENDEL, MERLIN, GENEHUNTER, SIMWALK, FBAT/PBAT, TRANSMIT, QTDT, PLINK, HAPLOVIEW, R-PACKAGE
• CASE-CONTROL: ALLELE ASSOCIATION, MENDEL, PHASE, SNPHAP, PLINK, R-PACKAGE
BC|SNPmax: Types of Analysis
• FOR MOST ANALYSES YOU NEED TO SELECT MATCHED:
• GEN
• PED
• MAP – b128 NOW UPLOADED
• AFF
PLINK… GETTING STARTED
PLINK…
• RUNNING PLINK FROM YOUR OWN COMPUTER
• WHY?
1. MULTIPLE ANALYSES
2. KEEP A RECORD OF YOUR WORK IN BAT AND SCRPT
3. EASE OF USE
4. EASE OF REPEATING TASKS
5. SCRIPTS NOT DROP DOWN MENUS
6. RUNNING >1 CHROMOSOME (BC|SNPmax ADDRESSED)
7. POST-ANALYSIS INTEGRATION USING PERL AND STATA
PLINK…
• FOLDER STRUCTURE
• ANALYSIS
• DATASET
• OUTPUT
PLINK… DATASETS
• PED & MAP
• BINARY FILES
• BINARY PED (BED)
• BINARY MAP (BIM)
• FAMILY FILES (FAM)
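As a reminder of what the text formats contain: a MAP line has four fields (chromosome, SNP identifier, genetic distance, base-pair position), and a PED line has six leading fields (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two alleles per SNP. The FAM file holds those same six leading columns, and the BIM file is an extended MAP file with two extra allele columns. The Python sketch below parses one line of each text format; the function names and the sample lines are hypothetical.

```python
def parse_map_line(line):
    """Split one MAP line into its four whitespace-separated fields."""
    chrom, snp, genetic_dist, bp = line.split()
    return {"chrom": chrom, "snp": snp,
            "genetic_dist": float(genetic_dist), "bp": int(bp)}


def parse_ped_line(line, n_snps):
    """Split one PED line into the six ID/phenotype fields and allele pairs."""
    fields = line.split()
    if len(fields) != 6 + 2 * n_snps:
        raise ValueError(f"expected {6 + 2 * n_snps} fields, got {len(fields)}")
    fid, iid, father, mother, sex, phenotype = fields[:6]
    alleles = fields[6:]
    # Consecutive pairs of alleles belong to the same SNP.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return {"fid": fid, "iid": iid, "father": father, "mother": mother,
            "sex": sex, "phenotype": phenotype, "genotypes": genotypes}


# Illustrative lines only: one trio child typed at two SNPs.
map_record = parse_map_line("1 rs0001 0 1000500")
ped_record = parse_ped_line("FAM1 CHILD1 DAD1 MUM1 1 2 A G C C", n_snps=2)
print(map_record)
print(ped_record["genotypes"])
```

Checking the field count against the number of SNPs in the MAP file catches the most common PED formatting error before PLINK is run.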
EXAMPLE ANALYSES IN PLINK…
• DATA TRANSFORMATION
• DATA FILTERING AND PRUNING
• DATA MERGING
• SUMMARY STATS
• MISSINGNESS
• HWE
• MAF
• MENDEL ERRORS
• INCLUSION THRESHOLDS
• POPULATION STRATIFICATION
• ASSOCIATION
• CASE/CONTROL
• QTL
• GxE
• NEW MULTIPLE TESTING CORRECTION (--adjust)
• FAMILY-BASED
• TDT
• POO
• PERMUTATION
• EPISTASIS
• HAPLOTYPE ANALYSIS
• NEW PROXY-ASSOCIATION (FROM SNP TO HAPLOTYPE)
• R-PACKAGE
• NEW MODIFY OUTPUT
• PLOG10
• P<x
• GENOMIC CONTROL
• QQ-PLOT
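Three of the output modifications listed above are simple transformations of the association results, and a short Python sketch may help show what they do. This is an illustration under assumptions, not PLINK's own code: the function names and sample values are invented, and the genomic-control inflation factor is computed here as the median 1-df chi-square statistic divided by the median of the 1-df chi-square distribution (about 0.455).

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def minus_log10(p_value):
    """Transform a p-value to the -log10 scale used in plots."""
    return -math.log10(p_value)


def filter_below(results, threshold):
    """Keep only (snp, p) pairs with p below the threshold."""
    return [(snp, p) for snp, p in results if p < threshold]


def genomic_inflation(chi2_values):
    """Genomic-control lambda: observed median chi-square over expected median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN


# Illustrative results only.
results = [("rs0001", 0.20), ("rs0002", 0.001), ("rs0003", 0.04)]
print([(snp, round(minus_log10(p), 2)) for snp, p in results])
print(filter_below(results, 0.05))
print(round(genomic_inflation([0.10, 0.50, 0.91, 2.30, 0.46]), 3))
```

A lambda close to 1 suggests little inflation of the test statistics; values well above 1 are usually taken as a sign of stratification or other systematic bias.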
PLINK… : RUNNING TDT IN PLINK
• CAN RUN FROM COMMAND LINE AND USING gPLINK (GUI)
• RECOMMEND BAT AND SCRPT FILES
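For orientation, the basic TDT compares how often heterozygous parents transmit an allele to affected offspring with how often they do not. The Python sketch below computes the standard McNemar-type statistic from those two counts; it is a hedged illustration rather than PLINK's implementation, and the counts are invented.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic transmission disequilibrium test for one allele.

    `transmitted` and `untransmitted` are the numbers of times heterozygous
    parents did or did not pass the allele to an affected child.
    Returns (chi2, p_value, odds_ratio), with chi2 on 1 degree of freedom.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    odds_ratio = b / c if c else float("inf")
    return chi2, p_value, odds_ratio


chi2, p_value, odds_ratio = tdt(transmitted=60, untransmitted=40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, OR = {odds_ratio:.2f}")
```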
PLINK… : SUMMARY TABLES IN STATA
• INSHEET THE TDT.CLEAN FILE
• ADD GENE NAMES
• ADD CHROMOSOME POSITION
• ADJUST OR TO RISK
• GENERATE GRAPHS OF DATA
• GENERATE TABLES BY GENE
• GENERATE TABLES BY POSITION
• GENERATE TABLES BY P-VALUE
• SELECT COLUMNS FOR OTHER ANALYSES (GENMAPP)
THE END!