![Page 1: The Microsoft Biology Foundation and its Applicationssalsahpc.indiana.edu/ECMLS2010/presentation/Simon_Mercer...at Harvard and David Heckerman’s team (Microsoft Research) • Discovered](https://reader033.vdocument.in/reader033/viewer/2022052014/602b82c9aa682713be35f028/html5/thumbnails/1.jpg)
The Microsoft Biology Foundation and its Applications
Simon Mercer
Director for Health & Wellbeing
Microsoft External Research
MICROSOFT EXTERNAL RESEARCH - SOFTWARE
Ontology Add-in for Word
• Phil Bourne
• Lynn Fink
• John Wilbanks
Source code and binary: http://research.microsoft.com/ontology/
Relationships: Ontology browser
Intent: Term recognition & disambiguation
Services: Ontology download web service
NodeXL
Binary and source code: http://nodexl.codeplex.com
3D Molecule Viewer
Binary and source code: http://3dmoleculeviewer.codeplex.com/
• PDB File Viewer
• Written in C# using WPF
The Trident Scientific Workflow Workbench
• Built on top of Windows Workflow Foundation
• Write once, deploy and run anywhere…
• Visually program workflows
• Libraries of activities and workflows
• Automatic provenance capture
A visual workflow environment that allows researchers to better manage, evaluate and interact with even the most complex scientific datasets
Available at: http://research.microsoft.com/en-us/collaboration/tools/trident.aspx
Origins of a Platform
Previous bioinformatics project outputs
Jaroslav Pillardy, Computational Biology Service Unit, Cornell University
• BioHPC: Suite of 28 applications modified and adapted for efficient use in a Windows HPC environment with ASP.NET interface
• Currently supports the areas of DNA sequence analysis, protein structure prediction, population genetics and phylogenetics
Jim Hogan, SilverMap: Queensland University of Technology
• MQUTer supports research into bioinformatics, sensor networks, visualization and parallelism on the Microsoft platform
• Six new tools – the latest, under development using MBF and Silverlight 3, visualizes DNA sequence similarity and is integrated into MBF (and will shortly be available as an Excel plug-in)
Robin Gutell, Center for Computational Biology and Bioinformatics, UT Austin
• Suite of tools to explore evolutionary relationships and predict function of RNA molecules
• Available as a website – also a complementary open-source suite of Windows-based tools, under development using MBF (H1 FY11)
+ Cancer Bioinformatics in ER
Marty Humphrey, Department of Computer Science, University of Virginia
• The caBIG platform connects consumers, the care delivery system, and the research community. Close to 60 NCI-designated Cancer Centers are deploying caBIG® infrastructure and tools, as are 16 Community Cancer Centers that in the aggregate touch 20 million lives.
• This project pilots caBIG clients on Windows, leveraging and extending MBF, and tutorials demonstrating the value of Microsoft technologies to the caBIG developer and user community.
Fighting HIV and AIDS
• Four-year collaboration between Bruce Walker at Harvard and David Heckerman’s team (Microsoft Research)
• Discovered three key insights to fight HIV:
  – Immune system is led astray by decoy epitopes (Nature Medicine, 2006)
  – Frameshift epitopes exist (JEM, 2010)
  – Natural killer cells directly attack HIV (Nature Medicine, in review)
• 40+ publications, including Nature and Science
• Walker has obtained $110M+ subsequent funding
• PhyloD.Net, a tool for inferring HIV evolution in an individual, is used by 100+ HIV researchers and is now part of Microsoft Biology Foundation
• Numerous press stories including Business Week and NPR
Microsoft Biology Foundation
• Beta 1: Nov 5, 2009 (MS Connect)
• Beta 2: Feb 10, 2010 (CodePlex)
• V1 release: July 2010
• Early adopters from industry and academia
• Bio-IT Alliance partner
• Leveraging Microsoft assets: Pivot, NodeXL, Trident, IronPython, etc.
• Showcasing Microsoft products: Excel/Office, Visual Studio 2010, .NET 4.0, WPF, Silverlight
Convergence on a Strategic Platform for Bioinformatics
Azure engagement through XCG (Azure BLAST, PhyloD services)
Product engagement and prototyping use by TC, HSG
• V1 launch June 2010
• Keynote presentations planned
• Training course in prep
• Community ownership
• Foundation of future MSR genomics projects
• Foundation of all future ER genomics engagements with academia
What is The Microsoft Biology Foundation?
An open-source library of reusable bioinformatics algorithms, services and functions built on the .NET platform
Benefits:
• Easy to parallelize algorithms
• Easy to distribute computations and workflows
• Easy to visualize massive data sets
• Ability to leverage greater strength from existing use of other MS technologies
• Provides transition from local to cloud-based computation and data storage
Architecture: Namespaces
Bio
• Sequences
• Alphabets
• Alignments
• Genomic Intervals
• Phylogeny
Bio.IO
• FASTA / FASTQ
• GenBank
• NEXUS
• …
Bio.Algorithms
• Translation
• Alignment
• Sequence Assembly
• …
Bio.Web
• BLAST
• ClustalW
• BioHPC
• …
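The namespace layout above keeps file parsing (Bio.IO) apart from algorithms such as translation (Bio.Algorithms). The sketch below illustrates that separation in plain Python. It is not MBF's .NET API: the function names `parse_fasta` and `translate` are hypothetical, and the standard genetic code (NCBI translation table 1) is an assumption, since the slides do not say which table MBF uses.

```python
from typing import Dict, Iterator, Tuple

# Standard genetic code (NCBI table 1), laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """I/O layer (hypothetical): yield (header, sequence) pairs from FASTA text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequence lines may wrap; join them later
    if header is not None:
        yield header, "".join(chunks)


def translate(dna: str) -> str:
    """Algorithm layer (hypothetical): translate DNA in reading frame 0.

    Stop codons become '*', codons with unknown bases become 'X',
    and a trailing partial codon is ignored.
    """
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    sample = ">seq1 demo\nATGGCC\nTAA\n>seq2\nATGAAA\n"
    for name, seq in parse_fasta(sample):
        print(name, seq, translate(seq))
```

Because the parser only produces plain (header, sequence) pairs, the translation step knows nothing about FASTA. That is the kind of modularity the namespace split suggests.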
Objectives
• Modular by design
• Commonly used features
• Exceptionally well-documented
• Extensible
• Interoperable
Initial Areas of Focus
• Genomics
  – Sequencing
  – Analysis and Annotation
• Advanced Research
  – Phylogenetics
  – Genome Wide Association
  – Haplotype reconstruction
• Next Targets
  – Visualization
  – Large data sets
mbf.codeplex.com
• Open Source
Available free of charge for commercial and non-commercial use and modification under the MS-PL license (http://opensource.org/licenses/ms-pl.html)
• Community-Developed
Moved to CodePlex, creating advisory board and building a community
• Community-Curated
Modify code, find bugs, contribute new features
• V1 Release
Late June 2010
Different Styles of Usage
• Build executables
  – Visual Studio
• Office add-in
  – BioExcel
• Command-line scripting access
  – IronPython, PowerShell
• Workflow Activities
  – Trident, WF
• Services on the Cloud
  – Azure
mbf.codeplex.com
Selecting Restriction Endonucleases: DNA PReDuST (Aditi Technologies)
Fragment Size Distribution Graph
Restriction Map [Circular DNA]
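The two views named on this slide, a restriction map of circular DNA and a fragment size distribution, rest on a simple computation: find the recognition sites, then measure the distances between consecutive cuts around the circle. The sketch below shows that computation in plain Python. It is not DNA PReDuST's or MBF's code: all function names are hypothetical, and it scans only the given strand, which is enough for palindromic sites such as EcoRI's GAATTC.

```python
from collections import Counter
from typing import List


def find_sites(sequence: str, recognition_site: str) -> List[int]:
    """Return 0-based start positions of the site on a circular sequence."""
    n, m = len(sequence), len(recognition_site)
    if m == 0 or m > n:
        return []
    # Append the first m-1 bases so matches spanning the origin are found.
    wrapped = sequence + sequence[:m - 1]
    return [i for i in range(n) if wrapped[i:i + m] == recognition_site]


def circular_fragment_sizes(sequence: str, recognition_site: str,
                            cut_offset: int = 0) -> List[int]:
    """Fragment lengths after a complete digest of a circular molecule.

    cut_offset is the distance from the start of the site to the cut
    (1 for EcoRI, which cuts G^AATTC). An uncut circle yields no fragments.
    """
    n = len(sequence)
    sites = find_sites(sequence, recognition_site)
    cuts = sorted({(s + cut_offset) % n for s in sites})
    if not cuts:
        return []
    # Distances between consecutive cuts, plus the fragment spanning the origin.
    sizes = [cuts[i + 1] - cuts[i] for i in range(len(cuts) - 1)]
    sizes.append(n - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


def size_distribution(sizes: List[int], bin_width: int) -> Counter:
    """Count fragments per size bin, keyed by the bin's lower bound."""
    return Counter((size // bin_width) * bin_width for size in sizes)


if __name__ == "__main__":
    plasmid = "GAATTC" + "A" * 4 + "GAATTC" + "C" * 8  # toy 24 bp circle
    sizes = circular_fragment_sizes(plasmid, "GAATTC", cut_offset=1)
    print(sizes, dict(size_distribution(sizes, bin_width=5)))
```

On a circular molecule the fragment lengths always sum to the sequence length, and a single cut yields one linear fragment of full length. Both make useful sanity checks when comparing candidate enzymes.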
Computational Biology Applications Suite for High Performance Computing (BioHPC)
Computational Biology Service Unit
Acknowledgements
• MBF Team
  – Mike Zyskowski, Chris Wu
• Microsoft Research
  – David Heckerman, Bob Davidson, Carl Kadie, Yogesh Simmhan, Jennifer Listgarten, Jonathan Carlson
• Cornell University
  – Jarek Pillardy
• Queensland University of Technology
  – Jim Hogan
• University of Texas at Austin
  – Robin Gutell
• Aditi Technologies
  – Vivek Kumar
• Illumina Corporation
  – Scott Kahn
• Johnson & Johnson Pharmaceutical Research Division LLC.
  – Dimitris Agrafiotis, Victor Lobanov, Jeremy Kolpak
mbf.codeplex.com
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.