the shinysispa manual -...

Post on 04-Nov-2019

9 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

TheshinySISPAManual

IntroductionshinySISPA is a web-based tool intended for the researchers who are interested in definingsampleswithsimilargenesetenrichmentprofiles.TheshinySISPAtoolisbasedonanovelSISPAmethodpublishedinourpreviouspaper(Kowalski,etal.,2016).SISPAdefinessamplesaswithandwithoutprofileactivity,onaverage,amonggenesdefinedasaset,byapplyingachangepointmodel (Killick, et. al., 2016) to a composite, between features zscore obtained by adding orsubtracting individual sample zscores between features, depending on their correspondingprofiles(Fig1;seeKowalski,etal.,2016formethoddetails).TheindividualsamplezscoresarecomputedusingtheBioconductorRpackageGSVA(Hänzelmann,etal.,2013)formorethanorequaltothreegenes.

Figure1.SchematicrepresentationoftheSISPAmethod

Thetoolishostedona64bitCentOS6server(http://shinygispa.winship.emory.edu/shinySISPA/)running the Shiny Server program designed to host R Shiny applications. This tool has beenextensively testedonWindows7 andMacPro10operating systemwith firefox and chromebrowser.Givenadatasetof377samplesand16genesunderthetwo-featureanalysis,ittookthreesecondstoobtainshinySISPAdefinedsamplegroupsandlessthanasecondtogeneratethewaterfallplot.Thetimeittakestogeneratesampleprofilediagnosticplotsdependsonthenumberofgenesinaset;ittooklessthan10secondsfor16genesinbothsetsofatwo-featureanalysis.RunningshinySISPAonalocalcomputer:1) DownloadandinstallRorRStudio(version3.3.2.orlater)fromhttps://cran.r-

project.organd2) OpenRandinstallthebelowrequiredpackages:

>install.packages(c("shiny","GSVA",“genefilter”,"changepoint","data.table",“ggplot2”,"plyr"))

3) UserscanrunshinySISPAlocallyusingthesourcecodeavailablefromtheGitHub:https://github.com/BhaktiDwivedi/shinySISPA,bytypingthebelowcommandsinRconsole:

>library(shiny)>runApp("shinySISPA")4) UserscanalsodownloadandruntheappfromGitHubdirectlyusing:>shiny::runGitHub('shinySISPA','BhaktiDwivedi')ExampleSampleDataset We have used publically available multiple myeloma (MM) patient data from the ongoingMultiple Myeloma Research Foundation (MMRF) CoMMpass clinical trial(https://www.themmrf.org/) to identify sample groups using GISPA derived gene setcharacterizingIgHtranslocation,t(14;16)intheMMcelllines(Kowalski,et.al.,2016).Datafrom377patientswithavailableclinicaloutcome,RNA-seqexpression,andcopynumberchangeatpre-treatment was downloaded from the MMRF researcher gateway portal(https://research.themmrf.org) based on the IA6 release of this trial. The expression isnormalized log2transformedcountdata,whilecopynumber isthemeansegmentvalue.Thedata is available to download from GitHub (https://github.com/BhaktiDwivedi/shinyGISPA).Userscanalsoaccessandanalyzethisdatasetbychoosingthe“Exampledata”fromthe“DataInput”optionunderthetwo-featureAnalysisTypeontheweb-interface.GettingStarted1. Selecttheanalysistype

Clickonoptionsunder“AnalysisType”toselectasinglefeatureortwo-featureanalysis.Herefeatureisdefinedbyadatatype(e.g.,expression,somaticmutations).

An example of single-feature analysis could be identifying samples with increased (ordecreased) expression profilewithin the defined gene set signature, while a two-featureanalysiscouldbebasedonacombinationofanytwodatatypes,e.g.,identifyingsamplesthatexhibitdecreasedgeneexpressionanddecreasedcopychange.

2. UploadtheInputData

1

Uploadtheinputdatafilegiventheselecteddatatypefrom(1).Useruploadstheinputdatafileforgenesofinterestandprofiletodefinesamplesonwithinthedatatype.Thegenesetscanbedefinedbasedonpriorknowledgederivedfromeitherbiologicalprocesses,pathways,biomarkersdiscovery,genomicanalysis,orintegratedgenesetanalysis(e.g.,GISPA).Aprofileisagenomicchangeofeitherincrease(“up”)ordecrease(“down”)withinaspecificfeatureordatatype.

Fileformatrequirements:

• Maximumfilesizelimitofupto200MB.• ASCII formatted tab-delimited file, where each row represents a gene and each

columnasample.• Here isanexampleofauseruploaded ‘File Input’ foronedata typeanalysis.First

columnisgeneidfollowedbysamplesdataasshowninthescreenshotbelow:

• Hereisanexampleofuseruploaded‘FileInput1’andFileInput2’fortwodatatypeanalysis.Foreachinputdatafile,firstcolumnisgeneidfollowedbysamplesdataasshowninthescreenshotbelow:

s1 s2 s3 s4 s5g1 6.79 8.25 10.27 11.13 9.70g2 12.05 9.40 11.34 11.71 9.62g3 0.00 0.00 0.00 0.00 0.00g4 5.15 5.72 6.04 4.68 3.41g5 8.96 5.09 6.10 1.70 6.03

Genenames

Samplenames

DataPoints

s1 s2 s3 s4 s5v1 0.50 0.23 0.40 0.60 0.71v2 0.20 0.50 0.34 1.00 0.62v3 0.00 0.30 0.00 1.00 0.00v4 0.15 0.55 0.60 0.68 0.41v5 0.40 0.09 0.10 0.70 0.03v6 0.05 0.90 0.00 0.65 0.22

s1 s2 s3 s4 s5g1 6.79 8.25 10.27 11.13 9.70g2 12.05 9.40 11.34 11.71 9.62g3 0.00 0.00 0.00 0.00 0.00g4 5.15 5.72 6.04 4.68 3.41g5 8.96 5.09 6.10 1.70 6.03

Genenames

Samplenames

DataPoints

GenevariantID’s

Samplenames

DataPoints

• WhenrunningshinySISPAontwodatatypes,thesamplesmustoverlapbetweenthe

twoinputdatafiles.Thetwoinputdatafilesmusthavethesameexactnumberofsamples and sample (or column) names. Rows that corrospond to genes (probes,variants,oranyotherid)mayormaynotbethesame.

• Aminimumofatleastonegeneandtensamplesarerequired.• Noduplicatedcolumnnamesorduplicatedrownamesareallowedandanalysiswill

bestopped.• Genes (or rows) with zero variance across all samples will be excluded from the

analysis.

SnapshotofshinySISPAusingSingle-featureanalysis:

2

Click on “Choose File” option under “File Input 1” to upload a single data file

Select the desired “Sample Profile” to define the sample profile of interest. Here usercan select either “up” or “down” profile to define samples with increased or decreaseddata change in the user input gene set, i.e., to identify samples with increasedexpression within the gene set; or samples with increased variant change within thegene set.

SnapshotofSISPAusingTwo-featureanalysis:

3. SelecttheChangepointMethod

Usercanspecify/modifythechangepointdetectionmethod(KillickR,etal.,2016), i.e.,tofind the optimal break pointswithin the estimated profile sample score (Kowalski, et al.,2016).Thechangescanbefoundinmeanand/orvarianceusingtheuser-specifiedmethod(“AMOC”,“BinSeg”,“PELT”,or“SeqNeigh”)giventheallottedmaximumnumberofchangepoints.Notethat thenumberofchangepoints identifiedmaydiffer for thesamedatasetdependingonthechangepointRpackageversioninstalledonthesystem.Currentlywearerunningchangepointversion2.2.2onourhostingserver.

Click on “Choose File” option under “File Input 1” to upload the First data type file

Select the desired “Sample Profile” to define the sample profile for first data type.Here user can select either “up” or “down” profile to define samples with increasedor decreased data change in the user input data.

Click on “Choose File” option under “File Input 2” to upload the Second data type file

Select the desired “Sample Profile” to define the sample profiles from the seconddata type. Here user can select either “up” or “down” profile to define samples withincreased or decreased data change in the user input data.

2

4. Result

Theresultsareoutputinfourtabs:

4.1. InputData4.2. SISPAResults4.3. WaterfallPlot4.4. SampleProfile

4.1. InputData

Summarizestheuserinputdataintermsoftheinputnumberofgenes(orrows),numberofsamples (or columns), and number of genes (or rows) with zero variance among all thesamplestobeexcludedfromtheanalysis.Dataforonlythefirstfivegenesoverallsamplesisshownandvisualizedasboxplot.

3

Select method (“AMOC”, “PELT”, “SegNeigh”, or “BinSeg”) for change point identificationin the data setSelectmaximum number of change points to search for using the method

Select changes inmean, variance, or both for change point identification

4.2. SISPAResults

Tableofdefinedsampleprofileswiththeirgenesetenrichmentscorefortheselecteddatatypeandsampleprofileofinterest.Theenrichmentscoresforeachsamplearecomputedusingthezscoremethod(Hänzelmann,et.al.,2013).Thescorestatisticsisrankorderedbytheuserselectedprofile(e.g.,upordown)foreachsample.Achangepointmodel(Killick,et.al.,2016)isthenappliedtothesamplescorestoidentifygroupsofsamplesthatshowsimilargenesetprofile.

4.1

Samplesassignedavalueof1inthe‘sample_group’columnarethesampleswiththeuserselectedprofileactivity,whilesamplesassignedavalueof“0”arethesampleswithouttheprofile activity. The results table canbe searched, sorted, and filteredby anyof the fourcolumns.Thescatterplotontherightdisplaysallthechangepointsdetectedwithinthedata-set,samplesfallinginthetopmostchangepointarethesampleswiththeprofileactivity.Thefrequencyplotattherightmostbottomrepresentsthedistributionofthenumberofsampleswithandwithouttheprofileactivity.

4.3. WaterfallPlotVisualrepresentationofallsamplessortedbytheirzscores.Sampleswiththeprofileactivityhavethehighestscoreandareshowninorangefilledbars,whilesampleswithouttheprofileactivityare shownwithgrey-filledbars. Forany selected sampleprofile,whether “up”or“down”,theorange-filledbarswiththehighestscoresarethemostdesirableandrepresentsampleswithprofileactivity.

4.2

4.4. SampleProfileRepresents thedistributionof theuser-inputdataoverall by the samplegroupswithandwithouttheprofileactivity.Thisenablestoassesswhetherornot,onanaverage,theentiregenesetdefinesthesamplegroupsbytheuserselectedsampleprofile.Withinthe“SampleProfile” tab, user also have the option to select a “Gene to Display” from the “ChooseOptions”dropdownmenuontherightmostpanel,toseetheinputdatadistributionbyeachsamplesortedwithineachsampleprofile.Thisenablestheusertoidentifythegene(s)thataremostdistinguishingamongsampleswithandwithouttheprofileactivity.

4.3

5. SaveResultsUsercandownloadtheresultsasoneexcelfilebyclickingonthe“Save”buttonontheleftpanel.Thedownloadincludestableandpdfplotsoftheresults.

4.4

5

CitationPlease cite the SISPAmethodas: Kowalski J,Dwivedi B,NewmanS, Switchenko JM,PaulyR,GutmanDA,etal.Geneintegratedsetprofileanalysis:acontext-basedapproachforinferringbiologicalendpoints.NucleicAcidsResearch.2016;44(7):e69.ReferencesHänzelmann,S.,Castelo,R.andGuinney,A.GSVA:genesetvariationanalysisformicroarrayandRNA-seqdata.BMCBioinformatics2013;14(7).Killick R, Eckley IA. changepoint: An R Package for Changepoint Analysis. J Stat Softw.2014;58(3):1-19.KowalskiJ,DwivediB,NewmanS,SwitchenkoJM,PaulyR,GutmanDA,etal.Geneintegratedsetprofile analysis: a context-based approach for inferring biological endpoints. Nucleic AcidsResearch.2016;44(7):e69.

top related