Last updated: May, 2017

BCHM 6280 2017 Excel Tutorial Page 1 of 5

Tutorial 1: Using Excel to find unique values in a list

It is not uncommon to have a list of data that contains redundant values. Genes with multiple transcript isoforms are one example. If you are only interested in the genes and not the different transcripts, then you will probably want to filter the list to remove the redundant values. I did a search of the UCSC human genome browser with the query “colon cancer” and got back >500 matches. I created a text file listing the first 500 matches. You can download this data from the Exercise 1 home page by clicking on the link ListofGenesfromUCSC.txt. The file has 2 columns: Gene Name and Chromosome Location. You will filter on Gene Name. Once you’ve downloaded the text file, do the following:

• Open Excel and from within Excel open the text document. If the file you want to open is greyed out, change the dropdown menu to Enable: All Readable Documents.
• Double-click the file you want to open and this should bring up the Text Import Wizard.
• It should recognize it as delimited. Click the Next button to define the delimiters.
• By default, Excel assumes a .txt file is tab-delimited.
• Click Next and then Finish to finish the import.

Advanced filter:

• Select the column of gene names.
• Click on the Data menu and select Advanced filter (if you get a warning about being unable to determine which row contains column labels and you have a column header in row 1, just click OK).
• Check the radio button “Copy to another location”. This should move your cursor to the “Copy to” text box.
• Select a column (not Columns A-C).
• Check the box “Unique records only”.
• Click the OK button. This should produce a list of 208 genes from the original 500 genes.
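If you ever need to do the same filtering outside of Excel, the idea behind “Unique records only” can be sketched in a few lines of Python. This is only an illustration, not part of the exercise: the function name unique_in_order and the gene names below are made up for the example, and the comparison is exact (case-sensitive), which may not match Excel's behavior in every case.

```python
def unique_in_order(values):
    """Return the first occurrence of each value, keeping the original order."""
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


# Hypothetical gene names, not taken from the tutorial's data file
gene_names = ["APC", "APC", "MLH1", "KRAS", "MLH1", "TP53"]
print(unique_in_order(gene_names))
```

Like the Advanced filter, this keeps one copy of each gene name and leaves the original list untouched.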


Tutorial 2: Using Excel to manage text data

An issue common to gene names or gene identifiers is slight variations that can prevent their identification via a database lookup. An example is that as gene or transcript records are reviewed by curators, they are often given an appended number such as NM_0012345.1 or NM_0012345.3 indicating which version they are. The base identifier of NM_0012345 is the same between them, but if your list has the appended version number, the database lookup or Excel lookup won’t recognize the two as being the same record.

In this example, there are two Excel files available from the Exercise 2 home page: ExpressionData.xlsx and GeneInfo.xlsx. The ExpressionData file has two columns. The first has Ensembl Gene IDs with the version number. The second column contains gene expression information in the form of Log2 ratio of treatment/control. The GeneInfo file has four columns. The first has Ensembl Gene IDs, but as the stable identifier rather than as a version. The remaining columns have the gene symbol, NCBI Gene ID and gene description.

You want to be able to bring information from the GeneInfo file into the ExpressionData file, but at the moment they do not share the same identifiers. To correct this, you will use a text-related function called LEFT to change the Gene IDs in the ExpressionData file to match those in the GeneInfo file.
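To see what LEFT does to an identifier, here is a small Python sketch of the same trimming. It is an illustration only: the helper names left and strip_version and the Ensembl-style ID are hypothetical, not taken from the exercise files. The first helper mimics keeping a fixed number of leading characters; the second cuts at the first “.” instead, which also works when the base identifiers are not all the same length.

```python
def left(text, num_chars):
    """Keep the first num_chars characters, like a fixed-width LEFT."""
    return text[:num_chars]


def strip_version(identifier):
    """Drop everything from the first '.' onward, if there is one."""
    base, _, _ = identifier.partition(".")
    return base


# Hypothetical Ensembl-style ID: "ENSG" plus 11 digits, then a version suffix
versioned_id = "ENSG00000000001.4"
print(left(versioned_id, 15))
print(strip_version(versioned_id))
print(strip_version("NM_0012345.3"))
```

Both approaches give the same result for the 15-character ENSG identifiers; cutting at the “.” is a little more forgiving if a list mixes identifier types.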

1. Insert a column to the left of the Gene ID column in the ExpressionData file.
2. In cell A2, type = and select the LEFT function.
3. Select cell B2 for the text box in the Formula Builder dialog box.
4. Tab to the num_chars box and type in 15.
5. This should return the ENSG## ID up to the “.”, without the version number that was originally appended.
6. Select the newly generated ID in A2, then extend the selection down to the end of the column. Type Ctrl-D to copy the function down the rest of the column.
7. Then Edit->Copy the newly generated IDs and use Edit->Paste->Special->Values to replace the formula with values.
8. Now you can use the two files in the next section to bring the data from GeneInfo into the ExpressionData file.

Tutorial 3: Using Excel to compare lists of data

A very common problem in bioinformatics or information processing of any kind is having multiple lists of data that you want to compare to each other. Excel has a function called VLOOKUP that makes this easy to do. It is also useful for transferring data from one worksheet to another. For this part of the tutorial, you will use the GeneInfo file and your modified ExpressionData file from the previous section. You can delete the column from the ExpressionData file that had the Gene IDs with the version number in them. In this part of the tutorial, you will bring the Gene Name and NCBI Gene ID into the ExpressionData file.


Open both worksheets in Excel.

o In the ExpressionData file, insert a column between columns 1 and 2.
o In the second row of column 2 (cell B2), type an “=” sign. Then go to the dropdown menu in the upper left of the worksheet, find the function “VLOOKUP” and select it. If you do not see VLOOKUP on the main menu, scroll down to “more functions”, which opens a dialog box with all of the available Excel functions. Under “lookup and reference” you will find VLOOKUP.
o Once you’ve inserted the function, you must fill out the arguments for the function using the dialog box that opens up. Select cell A2 as the lookup value.
o Then click into the box “Table_array”. Go up to the Window menu and select GeneInfor_ExcelTutorial.xlsx as shown in Figure 2.
o This will activate GeneInfo.xlsx.

Figure 1: Inserting a VLOOKUP function into column 2 of the ExpressionData worksheet.

Figure 2: Selecting the second worksheet as the table_array in the VLOOKUP function.


o Select the first 2 columns of GeneInfo.xlsx.
o Tab or click on the box “Col_index_num.” This tells the function which column of data to bring over to the first worksheet. Type in a 2.
o In the final box, “Range_lookup,” type “false”. If the value in A2 of the ExpressionData worksheet is found in the first column of the selected GeneInfo range, then the value from column 2 of the matching GeneInfo row will be entered into cell B2 of ExpressionData. If no match is found, it will fill in “#N/A”.
o To fill in the rest of the column, select from cell B2 through the end of the data and, under the Edit menu, select Fill Down or use the keyboard shortcut “Ctrl+D”.
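The logic of an exact-match VLOOKUP (Range_lookup set to false) can be sketched in Python as follows. This is a simplified illustration, not how Excel implements it: the function name vlookup, its not_found parameter, and all of the rows below are hypothetical stand-ins for the GeneInfo and ExpressionData worksheets, and the comparison is an exact, case-sensitive match.

```python
def vlookup(lookup_value, table, col_index, not_found="#N/A"):
    """Search the first column of table for lookup_value.

    Returns the value in the 1-based column col_index of the first
    matching row, or not_found if no row matches.
    """
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return not_found


# Hypothetical rows standing in for GeneInfo: ID, gene symbol, NCBI Gene ID
gene_info = [
    ("ENSG00000000001", "GENE1", 1001),
    ("ENSG00000000002", "GENE2", 1002),
]

# Hypothetical rows standing in for ExpressionData: ID, log2 ratio
expression = [
    ("ENSG00000000002", 1.8),
    ("ENSG00000000009", -0.4),
]

# Add the gene symbol (column 2 of gene_info) between the ID and the ratio
annotated = [
    (gene_id, vlookup(gene_id, gene_info, 2), log2_ratio)
    for gene_id, log2_ratio in expression
]
for row in annotated:
    print(row)
```

The second ID has no entry in the hypothetical gene_info table, so it receives the “#N/A” placeholder, just as an unmatched row does in Excel. Passing 3 instead of 2 as col_index would bring over the NCBI Gene ID column.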

Figure 3: Filling in the rest of the column with the same function.

When you are done, your ExpressionData worksheet should look like that shown in Figure 4:

Figure 4: Gene Expression worksheet after completing VLOOKUP


At this point, the data in column 2 is still linked to the GeneInfo worksheet. You can see this if you click on one of the gene names and look at what is displayed in the text box at the top of the sheet. You do not want to leave your file like that; otherwise, every time you open it, it will go through the data lookup function again. To avoid this, select the entire column, copy it and then do an Edit->Paste Special and select “values” in the “Paste special” dialog box. This will replace the function with the value of the function. After you complete that, click on a gene name. You should see just the gene name displayed in the text box at the top.

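To bring in the NCBI gene ID, just insert another column in the ExpressionData worksheet and repeat the VLOOKUP process, bringing in column 3 data from GeneInfo rather than column 2. For this to work, the Table_array selection must include at least the first 3 columns of GeneInfo.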

Figure 5: Gene Expression worksheet after copying and paste special with values

