BMS353 Bioinformatics for Biomedical Science · opendsi.cc/bioinformatics/assets/lecture_wk7.pdf · part...
BMS353
BMS353 Bioinformatics for Biomedical Science
Module coordinator: Dr Marta Milo
Today’s Outline
Part A: Presentation of the module
Break - question answering
Part B: Introduction
Part A: Presentation of the module
What is it all about?
This module will describe fundamental concepts and technologies underlying computational biology and bioinformatics.
Computational Biology is the development and application of data-driven mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data. Bioinformatics combines computer science, statistics, mathematics, and engineering to analyse and interpret biological data. (adapted from Wikipedia)
What is a Bioinformatician?
What I do in my research:
• Next generation sequencing data analysis
• Noise deconvolution
• Modelling uncertainty
• Integration of data
• Modelling observed data for predictions
What am I going to teach you?
Some of that STUFF
What are the learning outcomes of this module?
This module aims to:
1. provide an understanding of the fundamental concepts and technologies underlying computational biology and bioinformatics
2. equip biology students with basic knowledge of mathematical concepts and with methods of Bioinformatics and Computational Biology
3. use a multidisciplinary approach integrated with programming tools and statistical concepts underpinning advanced data analysis and methods that are suitable for high-throughput data analysis
4. provide new transferable skills
How will you be learning?
• Lectures on theoretical concepts
• Online resources from open source software
• Writing simple scripts for data analysis during practical classes
• Self-marking and formative feedback
• Group discussion and forum through the module website
• Small research project on real data
• Banging your head on the computer..
• Giving yourself time to adapt to this new way of thinking…
What will you gain from BMS353?
• Training in data analysis and basic programming skills, with the aim of being aware of the effects of experimental design on the data analysis
• A good understanding of technologies and methods for Bioinformatics and of the use of workflows and pipelines for data analysis
• New qualifications that will increase your employability
• Deeper insight into the principles of conducting a research data analysis project
• A new set of transferable skills, like programming and awareness of cloud computing and data sharing
• Learning a new terminology and new interdisciplinary skills
Module Outline
The teaching consists of two hours of lectures and two hours of lab classes each week.
The lectures will be followed by practical classes.
In the lab classes we will use coding to turn theory into practice.
Lab classes are split into two groups to reduce class numbers.
Coding requires practice: the more, the better.
Module Outline (cont.)
• Courseworks are essential to learn the coding skills: do them.
• Hit the deadlines for the self-assessment to monitor your progress and highlight problems you might have.
• Make sure you participate actively in the interactive sessions in the class and in the labs.
• Use the resources on the module website and
• Read the notebook carefully and follow the instructions
• Avoid catch-up! Remember BMS353 is different from biology teaching and can be overwhelming if left all to the end.
• Please note, if you email a question that can be answered by reading the module handbook or the instructions on MOLE or the module website, you will not receive an answer.
BMS353 website
The tools we will use
Jupyter notebook (originally IPython notebook): combines computer programs (code), text, data and results into one interactive document
R: a popular programming language in areas such as bioinformatics, statistics and data analysis
We will use a cloud computing environment called CoCalc (formerly SageMathCloud)
We will use our brain to create new knowledge
Some mathematical concepts
BMS353 assessment
The exam for this module will be split into two parts:
Part A: a Multiple Choice Question test lasting 1 hour and 30 minutes, which will count for 30% of the final grade.
Part B: a notebook with the implementation of the allocated project, which will count for 70% of the final grade. The project will be a collection of all the tools experienced in the practical labs, implemented on a set of real data. It will be developed in groups of three students, but the notebook will have to be handed in individually.
The lab practical notebooks handed in every week during the module will constitute formative feedback that can be used for the final project.
MCQ assessment
Each question will have 4 possible responses: A, B, C or D. ONLY ONE RESPONSE IS CORRECT IN EACH CASE.
Each question is worth one mark: a correct answer will count as 1, an incorrect answer will count as -0.5. Unanswered questions will count as 0.
1. What is the main subject of BMS353: A. Phycology B. Statistics C. Computational biology D. Computer Science
2. What level of students is BMS353 aimed at: A. Level 3 - BMS B. Postgraduate C. Master students D. Computer Science students
3. There will be no mathematics in BMS353: A. TRUE B. FALSE C. TRUE only on odd days D. TRUE only on even days
Student 1: 1C, 2B, 3A mark = 0
Student 2: 1C, 2-, 3- mark = 1
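The marking rule above can be sketched in a few lines of code. This is only an illustration in Python (the module's practicals use R); the function name `mcq_mark` is hypothetical, and the answer key C, A, B is an assumption inferred from the two worked student examples, not something stated on the slide.

```python
def mcq_mark(answers, key, correct=1.0, wrong=-0.5):
    """Return the total mark for one student.

    answers: list of letters 'A'-'D', or None for an unanswered question.
    key:     list of correct letters, same length as answers.
    """
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    total = 0.0
    for given, expected in zip(answers, key):
        if given is None:
            continue  # unanswered question: counts as 0
        total += correct if given == expected else wrong
    return total


# Assumed answer key (inferred from the worked examples, not given on the slide).
KEY = ["C", "A", "B"]

student1 = mcq_mark(["C", "B", "A"], KEY)    # 1 - 0.5 - 0.5
student2 = mcq_mark(["C", None, None], KEY)  # 1 + 0 + 0
print(student1, student2)
```

Under this rule a wrong guess costs half a mark, so leaving a question blank can score higher than answering it incorrectly, as the two students show.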
Part B: Introduction
Cloud computing
Cloud computing, or simply “the cloud”, also known as on-demand computing, is a model for enabling on-demand access to a shared pool of configurable resources.
Sam Johnston, from Wikipedia
The cloud metaphor: the network elements representing the services are invisible to the user, as if obscured by a cloud.
Advantages
• Cost efficient
• Large storage space
• Backup and recovery
• Easy access
• Quick to gain functionality
• Incentivises collaboration and data sharing
Disadvantages
• Technical issues
• Security in the cloud
• Prone to attack
Cloud computing: an example
A very effective use of cloud resources and their commercial exploitation is given by Amazon.
They used cloud computing to create the concept of elastic computing with the Elastic Compute Cloud (EC2). It is a key part of Amazon Web Services (AWS) and is composed of scalable elastic compute units (ECU), which were introduced as an abstraction of computer resources. A user can create, launch, and terminate server usage as needed. It is based on “paying by the hour for active servers”, which is why it is called "elastic". Its global nature allows users to control the geographical location of instances (server usage), optimising latency and redundancy.
• First to allow companies to rent scalable computing resources
• Their retail e-commerce site is entirely based on cloud computing
Big Data and Data Sharing
Big data is a very generic term for data sets that are so large or complex that traditional data processing applications are inadequate for mining them.
• High volume
• High velocity
• High variety
• Highly variable
• High variation in quality
• High complexity
[Figure: visualization of daily Wikipedia edits created by IBM. At multiple terabytes in size, the text and images of Wikipedia are an example of big data.]
Big Data and Data Sharing (cont.)
There are many challenges when dealing with big data, some of them are:
• Data analysis
• Data curation
• Search engines
• Data sharing
• Data storage and transfer
• Data visualization
• Information privacy
However, big data has a high predictive power and its accuracy may lead to more confident decision making.
In biology: With the advent of high-throughput genomics, life scientists are starting to grapple with massive data sets, encountering big data challenges (Technology Feature, Nature 2013).
Analysing the large amount of genomic data with local infrastructure is impossible. The data is then moved to the cloud for analysis and storage. Data sharing is becoming crucial for biological data.
CoCalc www.cocalc.com
We are using the cloud to learn in BMS353. The resources on the cloud are used as a teaching tool.
Jupyter Notebooks on CoCalc Cloud
We will use Jupyter Notebooks and their kernels on CoCalc for all our practical classes.
A Jupyter Notebook kernel is a “computational engine” that executes the code written in the Notebook document.
In this module (BMS353) we will use R kernels to implement our data analysis in the notebooks.
There will be a folder and storage space allocated to our project: BMS353.
You will access your assignments and data using CoCalc with a web browser.
Everything will be stored in the CoCalc folder allocated to you.
The cloud will back up and secure our work, as well as giving us computational time for the data analysis.
All the lab practicals and the final project will be marked and assessed from notebooks saved in the CoCalc folders.
Basic programming terminology
Programming language = a language formally designed to communicate instructions to a machine, i.e. a computer, to control its behavior or to express a mathematical construct in numerical form (carry out operations, more or less complex)
Algorithm = a procedure or formula for solving a problem
Kernel = a computational engine that is activated by a specific language (e.g. R, Python, C etc.)
Script = a list of instructions that represent the commands needed to carry out a task. It has a logical structure and a defined structure for data input
Implementation = the process of putting into effect the list of instructions that are specified in the script. This is done by using numerical values as input. The implementation process will produce a final set of values.
Debug = the process of identifying and removing errors from scripts
Object = a virtual container of values stored in the workspace. It is used to implement the instructions and to store values during the implementation and as the final set.
Programming function = a procedure or routine that encapsulates a “task”. Many instructions are combined in one “word” (the name of the programming function), which will implement that “task” on a set of specified inputs.
Read and Write = the process of uploading data into the workspace and of downloading data from the workspace into a local or remote archive (folder)
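To make the terms above concrete, here is a minimal sketch of a script that reads data, stores it in objects, calls a programming function and writes a result. It is written in Python purely for illustration (the module's practicals use R kernels). The gene names and values are hypothetical, and an in-memory text buffer stands in for a file in a local or remote folder.

```python
import csv
import io


def mean(values):
    """A programming function: several instructions behind one name."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# "Read": load data into the workspace.
# The buffer below is a stand-in for a real file; its contents are hypothetical.
raw = io.StringIO("gene,expression\ngeneA,2.0\ngeneB,4.0\ngeneC,6.0\n")
rows = list(csv.DictReader(raw))

# Objects: named containers that hold values during the implementation.
expression = [float(row["expression"]) for row in rows]
average = mean(expression)

# "Write": send the final set of values out of the workspace.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["n_genes", "mean_expression"])
writer.writerow([len(expression), average])
result_text = out.getvalue()
print(result_text)
```

Running the script is the implementation step; if `mean` were given an empty list it would raise an error, and tracing that error back to the input is a small example of debugging.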
Basic mathematics
Basic mathematics notation
Single values and vectors
x and y are values from the real numbers: x, y ∈ ℜ
[Figure: a vector A ≡ (x, y, z) drawn in three-dimensional space with axes X, Y, Z]
In general x ≡ (x1, x2, ..., xN), with xi ∈ ℜ and i = 1, ..., N
The values xi are called variables since they can assume a range of values.
The parameters are fixed values that we indicate in mathematical notation with Greek letters: α, β, µ, σ, λ, ...
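A short sketch of how a vector, its indexed components and a parameter look in code. Python is used only for illustration (the module's practicals use R); the numbers and the helper name `scale` are hypothetical.

```python
# A vector x = (x1, ..., xN) of real values, stored as a tuple.
x = (0.5, 1.5, 3.0, 4.0)  # illustrative values only
N = len(x)

# Mathematical notation counts from 1, Python counts from 0,
# so the component x_i is stored at position i - 1.
x_1 = x[0]
x_N = x[N - 1]

# A parameter is a fixed value; here alpha plays that role.
alpha = 2.0


def scale(vector, parameter):
    """Multiply every component (variable) by the same fixed parameter."""
    return tuple(parameter * component for component in vector)


scaled = scale(x, alpha)
print(N, x_1, x_N, scaled)
```

The components of `x` change from data set to data set, while `alpha` stays the same for the whole calculation, which is the distinction between variables and parameters drawn above.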
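Basic mathematics notation (cont.)
Matrices are tables of values or letters that are organised in rows and columns. In common use they only have two dimensions; in more advanced use they can have three.
Vectors are special cases of matrices: they have N columns and only one row.
A = [3x4] (a matrix with 3 rows and 4 columns)
Operations with matrices
Sum and difference: the two matrices must have the same dimensions.
Multiplication: the number of columns of the first matrix needs to be the same as the number of rows of the second matrix.
Multiplication is done so that each entry of the product is the sum of the products of a row of the first matrix with a column of the second: $(AB)_{ij} = \sum_{k} A_{ik} B_{kj}$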
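The dimension rules for matrix operations can be checked with a small sketch. It uses plain Python lists of rows for illustration (the module's practicals use R); the helper names `shape`, `mat_add` and `mat_mul` and the example matrices are hypothetical.

```python
def shape(m):
    """Return (number of rows, number of columns) of a list-of-rows matrix."""
    return len(m), len(m[0])


def mat_add(a, b):
    """Element-wise sum; both matrices must have the same dimensions."""
    if shape(a) != shape(b):
        raise ValueError("sum needs matrices with the same dimensions")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_mul(a, b):
    """Row-by-column product; columns of a must equal rows of b."""
    n_rows_a, n_cols_a = shape(a)
    n_rows_b, n_cols_b = shape(b)
    if n_cols_a != n_rows_b:
        raise ValueError("columns of the first must equal rows of the second")
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_cols_a)) for j in range(n_cols_b)]
        for i in range(n_rows_a)
    ]


A = [[1, 2, 3],
     [4, 5, 6]]   # 2 rows x 3 columns
B = [[1, 0],
     [0, 1],
     [2, 2]]      # 3 rows x 2 columns

C = mat_mul(A, B)   # allowed: 3 columns of A match 3 rows of B
D = mat_add(A, A)   # allowed: same dimensions
print(C)
print(D)
```

Calling `mat_mul(A, A)` raises a `ValueError`, because A has 3 columns but only 2 rows.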
Basic mathematics notation (cont.)
A way of writing a notation for large sums or multiplications is to use the Greek symbols ∑ (for summing) and ∏ (for multiplying).
For summing N values we will use the following notation: $\sum_{i=1}^{N} x_i / \sigma$
For multiplying N values we will use the following notation: $\prod_{i=1}^{N} x_i / \sigma$
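The two operators correspond directly to a loop over the index i. The sketch below, in Python for illustration only (the module's practicals use R), applies them to a plain list of values x_i; the values are hypothetical and the σ that appears in the slide's expressions is left aside.

```python
import math

x = [1.0, 2.0, 3.0, 4.0]  # x_1 ... x_N, illustrative values only
N = len(x)

# Sum over i = 1..N, written as an explicit loop.
total = 0.0
for i in range(1, N + 1):
    total += x[i - 1]  # x_i is stored at position i - 1

# Product over i = 1..N, written as an explicit loop.
product = 1.0
for i in range(1, N + 1):
    product *= x[i - 1]

# The built-in equivalents give the same results.
assert total == sum(x)
assert product == math.prod(x)
print(total, product)
```

Note the starting values: a sum starts from 0 and a product from 1, so that the first term is left unchanged.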
Basic mathematics notation (cont.)
A function is a relation from a set of inputs to a set of possible outputs, where each input is related to exactly one output.
f(x) = x / 2, where x is the input (variable) and f(x) is the output
f(x) = 4x + 4
When there is one input variable we speak of a one-dimensional function.
f(x, y) = 2x + 2y
When the input is more than one variable we speak of a multi-dimensional function. With two variables we speak of a bi-dimensional function.
f(x, y / α) = 2x + 2yα
We can also have functions conditional on a parameter. In this case we call them conditional functions.
Here α takes a value from the set of even numbers between 0 and 10.
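The function types above translate naturally into code. This is a sketch in Python for illustration only (the module's practicals use R), with hypothetical function names. Two assumptions are made: the second term of the conditional function is read as 2yα, and "between 0 and 10" is taken to include both 0 and 10.

```python
def f_half(x):
    """One-dimensional function f(x) = x / 2."""
    return x / 2


def f_linear(x):
    """One-dimensional function f(x) = 4x + 4."""
    return 4 * x + 4


def f_two(x, y):
    """Bi-dimensional function f(x, y) = 2x + 2y."""
    return 2 * x + 2 * y


# Allowed parameter values: even numbers from 0 to 10 (endpoints assumed included).
ALPHAS = (0, 2, 4, 6, 8, 10)


def f_conditional(x, y, alpha):
    """Function of x and y conditional on the parameter alpha.

    The form 2x + 2*y*alpha is an assumed reading of the slide's formula.
    """
    if alpha not in ALPHAS:
        raise ValueError("alpha must be an even number between 0 and 10")
    return 2 * x + 2 * y * alpha


print(f_half(3), f_linear(1), f_two(1, 2), f_conditional(1, 2, 4))
```

For a fixed `alpha` the conditional function behaves like an ordinary function of x and y; changing `alpha` selects a different member of the same family.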
Summary
• What BMS353 is about and what you can expect to learn and gain after taking BMS353
• How to gain information about the module and where to find links to additional reading material, lecture content and practical classes (Website)
• How to interact for discussion and problem-solving
• How you will get assessed
• Tools we will be using in BMS353
• Cloud computing and Big Data
• Jupyter Notebooks and CoCalc Cloud
• Basic programming terminology
• Refreshed some basic mathematical notions and notations.