sequencing and supercomputers - intel · sequencing and supercomputers spain’s national center of...

4
Intel® Xeon® Processor E5 Family Big Data Analytics Healthcare & Life Sciences CASE STUDY Sequencing and Supercomputers Spain’s National Center of Genomic Analysis (CNAG) is involved in large- scale sequencing projects in areas as diverse as cancer genetics, rare diseases, host-pathogen interactions, preservation of endangered species, evolutionary studies, and the improvement of agriculturally useful species. Its mission is to deliver research and results that help make citizens’ lives better. CNAG has worked with Intel and Atos to build its latest analytics plat- form, which helps drive new insights faster, with wide-ranging applications. Challenges Billions of bases. CNAG sequences over 800 billion genomic bases every day and needs to conduct quick, accurate analysis to process large volumes of data as effi- ciently as possible Needles in haystacks. Identifying the small number of variations that drive break- through insights can be time-consuming and complex Solutions Strong combination. Atos Big Data & Security service line developed a tailor-made compute cluster, powered by the Intel® Xeon® processor E5 family, to conduct in-depth high-performance data analytics (HPDA) on genome sequences Impact Healthy results. CNAG plans to offer more in-depth and comprehensive genomic analy- ses to healthcare organizations to improve treatments Leading the way. The new platform maintains CNAG’s position as a leading genomic research facility, enabling it to achieve higher output while driving cost savings

Upload: others

Post on 14-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sequencing and Supercomputers - Intel · Sequencing and Supercomputers Spain’s National Center of Genomic Analysis (CNAG) is involved in large-scale sequencing projects in areas

Intel® Xeon® Processor E5 FamilyBig Data AnalyticsHealthcare & Life Sciences

CASE STUDY

Sequencing and Supercomputers

Spain’s National Center of Genomic Analysis (CNAG) is involved in large-scale sequencing projects in areas as diverse as cancer genetics, rarediseases, host-pathogen interactions, preservation of endangered species,evolutionary studies, and the improvement of agriculturally useful species.Its mission is to deliver research and results that help make citizens’ livesbetter. CNAG has worked with Intel and Atos to build its latest analytics plat-form, which helps drive new insights faster, with wide-ranging applications.

Challenges• Billions of bases. CNAG sequences over 800 billion genomic bases every day andneeds to conduct quick, accurate analysis to process large volumes of data as effi-ciently as possible

• Needles in haystacks. Identifying the small number of variations that drive break-through insights can be time-consuming and complex

Solutions • Strong combination. Atos Big Data & Security service line developed a tailor-madecompute cluster, powered by the Intel® Xeon® processor E5 family, to conduct in-depthhigh-performance data analytics (HPDA) on genome sequences

Impact • Healthy results. CNAG plans to offer more in-depth and comprehensive genomic analy-ses to healthcare organizations to improve treatments

• Leading the way. The new platform maintains CNAG’s position as a leading genomicresearch facility, enabling it to achieve higher output while driving cost savings

Page 2: Sequencing and Supercomputers - Intel · Sequencing and Supercomputers Spain’s National Center of Genomic Analysis (CNAG) is involved in large-scale sequencing projects in areas

Covering a lot of bases CNAG works with scientists from univer-sities, hospitals, research centers andbiotechnological and pharmaceuticalcompanies across Europe. It is a keyplayer in many international researchinitiatives, including the InternationalCancer Genome Consortium (ICGC), theInternational Rare Disease ResearchConsortium (IRDiRC) and the Interna-tional Human Epigenome Consortium(IHEC). CNAG supports around 120 re-searchers, who conduct around 300projects each year. This results in it se-quencing around 800 Gigabases perday, the equivalent of sequencing eightfull human genomes at 30-fold cover-age (the coverage required for reliableanalysis), making it one of the largest ca-pacity sequencing facilities in Europe.

Genome sequencing is a complex anddemanding task. “Each genome is madeup of over 3 billion bases. It’s not simplya case of starting at the first base in thegenome and then identifying them all inorder until you have the completestring,” says Ivo Gut, director of CNAG.“All genomes are 99.9 percent identical,but it’s the variations in that remaining0.1 percent that we’re looking for. Tofind them, we need to break eachgenome down into strings of a hundredor so bases, sequence the short stringsand rebuild it. It’s like doing a jigsawpuzzle with billions of pieces.”

Once found, these variations in thegenome bases can be used to identifycertain characteristics of an organism –anything from eye color, to disease sus-ceptibility, to tolerance to medication.

These genetic characteristics, known asphenotypes, can then be applied in de-veloping new medications and clinicalpractices, growing more disease-resis-tant crops, or even preserving endan-gered species. For example, CNAG hassupported the genome sequencing ofthe Iberian lynx, a highly endangered bigcat now found only in southern Spain,which is particularly susceptible to in-fection. With only around 200 left in thewild, it is critical that scientists are ableto identify the genetic traits that makethese animals so prone to illness, inorder to better protect them. The centerwas also the first organization in theworld to sequence the olive genome,using a 1,000-year-old tree near Madrid,and has participated in the genome se-quencing of many other agriculturalcrops including tomatoes, peaches, mel-ons, almonds and fungi.

New fields of studyBuilding on the success of its first fiveyears in operation, CNAG has nowopened two new departments, special-izing in biomedical genomics and popu-lation genomics. The BiomedicalGenomics Department takes data gener-ated within CNAG and combines it withother data sources to gain new insights.For example, it has investigated a partic-ular mutation related to cystic fibrosiswhich impacts the severity with whichthe disease manifests in patients. “Withaccess to a larger data set, we can studythese differences in more detail thanhas been previously possible, and spotmodifiers in the milder cases that could

CNAG conducts large-scale DNA sequencingand analysis, ensuring Spain’s competitivenessin the strategic field of genomics

“We’re certainly handling bigdata, but what we’re reallyafter is big information: the

ability to identify the valuableinsight from the sequences infront of us. To do this well, weneed good data, good analyticsand good tools. We quality con-trol these elements carefully.”

Ivo Gut,Director,

National Center of Genomic Analysis

Sequencing and Supercomputers

Page 3: Sequencing and Supercomputers - Intel · Sequencing and Supercomputers Spain’s National Center of Genomic Analysis (CNAG) is involved in large-scale sequencing projects in areas

Lessons Learned Supercomputing platforms like theone at CNAG can generate amazingresults. However, for them to scalewell as data volumes and the de-mand for insight increase, it’s essen-tial to have the flexibility to grow.CNAG worked with Intel and Atos todesign and implement a sophistica-ted analytics platform that can growseamlessly over time.

be targeted in the more severe cases tohelp deliver better treatment to peoplewith the condition,” says Gut.

Meanwhile, the Population GenomicsDepartment is seeking opportunities toanalyze the genomes of a given popula-tion or group of people. This is impor-tant when assessing the potentialimpact or effectiveness of new clinicaltreatments, and how this might vary de-pending on the population to whichthey are administered. “Say we’re look-ing at a new diabetes medication,” saysGut. “We might test it on a group ofFinnish diabetics versus a group ofSpanish control patients by adding acorrecting factor to remove the popula-tion effects, and we’ll be able to say forcertain whether any impact of the drugis influenced by a phenotype within ei-ther group for increased or reduced sus-ceptibility to the disease.”

CNAG also works on a number of proj-ects funded by the European Commis-sion – for example, collaborating withother institutions around the region toresearch lung disease, and conducting astudy of 20,000 breast cancer patients.

Data, analytics and toolsThe work that CNAG does has huge po-tential across a wide range of disci-plines. The main challenge facing Gutand his colleagues in reaching this po-tential is scale. In every genome of 3.2billion bases, there are many base varia-tions that are potentially responsible fordiseases, making them challenging andtime-consuming to spot. “At the mo-

ment we’re able to identify some ofthese variations, but the aim is to beable to locate and accurately predict theeffect of every one of them in whichevergenome we’re looking at,” says Gut.“This will enable us to make big-picturepredictions about, say, how a specifictype of cancer is likely to respond to agiven combination of medication. Oncewe get to this stage – and it’s not far off– we’ll be able to put the findings to usein a clinical environment and have animpact on how many diseases aretreated.”

With such complex analytics, a powerfulcomputing platform is essential. Analy-sis of each genomic sequence may takehundreds of CPU hours, which limits thenumber of projects that can be carriedout, and makes repeating an analysisdifficult. “This is fine for one-off re-search projects, but we want to be ableto offer sequence analysis on an indus-trial scale, with the ability to repeat oneanalysis a number of times if required,”says Gut. “We’re certainly handling bigdata now – and it’s growing all the time –but what we’re really after is big infor-mation: the ability to identify the reallyvaluable insight from the sequences infront of us. To do this well, we needgood data, good analytics and goodtools. We quality control all these ele-ments very carefully.”

When CNAG built its sequencing andanalytics environment, it issued a tenderand identified Atos as its solutionprovider of choice. Gut explains: “It wasnot only about increasing the sequenc-ing capacity by procuring new equip-

ment, but designing from scratch theappropriate computer infrastructure,with the assistance of a technologicalleader with expertise in the field of ge-nomics. Choosing a flexible core infra-structure enabling limitless growth,together with genomics projects, wasalso essential.”

Five years in, it refreshed its supercom-puting platform to ensure the scalabilityand performance it needs to supportthis continued up-scaling were in place.Specialized sequencers feed the ge-nomic sequence data to the analyticsplatform, which is run on a cluster of 52bullx R4222E2* servers powered byIntel Xeon processors E5 family.

Sequencing and Supercomputers

Page 4: Sequencing and Supercomputers - Intel · Sequencing and Supercomputers Spain’s National Center of Genomic Analysis (CNAG) is involved in large-scale sequencing projects in areas

1 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measuredusing specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information andperformance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go tohttp://www.intel.com/performance.

Intel does not control or audit the design or implementation of third party benchmark data or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Websites or others where similar performance benchmark data are reported and confirm whether the referenced benchmark data are accurate and reflect performance of systems available for purchase.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.Check with your system manufacturer or retailer or learn more at http://www.intel.com

Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

© 2015, Intel Corporation 0615/JNW/RLC/XX/PDF 332686-001EN

“We’ve found that working with Atos asour single point of contact for the wholesolution has made managing it very sim-ple for us, and enabled us to focus onour research rather than technology ad-ministration,” says Simon Heath, chieftechnology officer, CNAG. “We’ve alsobeen very impressed by the latest IntelXeon processors that we have deployed,which by our estimates have con-tributed to a ten-fold increase in analyt-ics software performance1.”

Diving deeperWith the latest platform in place, CNAGis looking forward to providing moregranular and varied insights to its endusers. For instance, hospitals will beable to use its genomic research in dif-ferent ways to combat different types ofillness. “When treating a rare disease, a

physician may never have seen anothercase of it in their whole career,” explainsGut. “But we can provide them with phe-notype information that could helpthem make an identification and diagno-sis faster. In many cases, this can lead toquick and effective treatment – such asputting the patient on a course of vita-min supplements. On the other hand, inthe oncology unit, diagnosis is not somuch of an issue. Here, it’s more aboutidentifying the right medication or com-bination of treatments to have thegreatest success against a given type oftumor – and this is where the phenotypedata will come in use here.”

Furthermore, CNAG will soon reinforceits capacity with a few new sequencers,each of them producing 3.2TBases perweek that corresponds to 32 full humangenomes at 30x. This new capacity willgo hand in hand with the expansion of

its supercomputing systems, with whichCNAG will expand not only its sequenc-ing capacity, but also the scope of its re-search. This in turn will help CNAGmaintain its position as a center of inter-national reference in genomic research,ensuring Spain’s leading position in thestrategic field of genomics.

www.cnag.crg.eu

Find the solution that’s right for your or-ganization. View success stories from yourpeers, learn more about server productsfor business and check out the IT Center,Intel’s resource for the IT Industry.

Sequencing and Supercomputers