GRID, Cloud Technologies, Big Data
V.V. Korenkov, Director of LIT JINR
Head of the Department of Distributed Information and Computing Systems, Dubna University
Grids, clouds, supercomputers, etc.
Slide: Ian Bird
Grids:
- Collaborative environment
- Distributed resources (political/sociological)
- Commodity hardware (also supercomputers)
- (HEP) data management
- Complex interfaces (bug, not feature)

Supercomputers:
- Expensive
- Low-latency interconnects
- Applications peer-reviewed
- Parallel/coupled applications
- Traditional interfaces (login)
- Also SC grids (DEISA, TeraGrid)

Clouds:
- Proprietary (implementation)
- Economies of scale in management
- Commodity hardware
- Virtualisation for service provision and for encapsulating the application environment
- Details of physical resources hidden
- Simple interfaces (too simple?)

Volunteer computing:
- Simple mechanism to access millions of CPUs
- Difficult if (much) data is involved
- Control of environment
- Community building: people involved in science
- Potential for huge amounts of real work
Many different problems, amenable to different solutions.
There is no single right answer.
The Cloud Computing Concept
Everything as a Service (XaaS):
- AaaS: Applications as a Service
- PaaS: Platform as a Service
- SaaS: Software as a Service
- DaaS: Data as a Service
- IaaS: Infrastructure as a Service
- HaaS: Hardware as a Service
The realization of the long-held dream of computing delivered like an ordinary utility:
- scalability
- pay-as-you-go pricing
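To make the pay-as-you-go idea concrete, here is a minimal metering sketch; the resource names and rates are invented for illustration and belong to no real provider:

```python
# Illustrative pay-as-you-go metering. All resource names and rates are
# hypothetical and belong to no real provider.
RATES = {
    "cpu_core_hour": 0.05,     # currency units per core-hour (assumed)
    "storage_gb_month": 0.02,  # per GB-month stored (assumed)
    "egress_gb": 0.09,         # per GB transferred out (assumed)
}

def bill(usage: dict) -> float:
    """Charge only for metered consumption, like a utility meter."""
    return sum(RATES[item] * amount for item, amount in usage.items())

# A task that used 200 core-hours, 50 GB-months of storage, 10 GB egress:
print(bill({"cpu_core_hour": 200, "storage_gb_month": 50, "egress_gb": 10}))
# -> 11.9 (scales linearly with use; idle resources cost nothing)
```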
What is cloud computing?
- Centralization of IT resources
- Virtualization of IT resources
- Dynamic management of IT resources
- Automation of IT processes
- Ubiquitous access to resources
- Simplification of IT services
- Standardization of IT infrastructure
Real-World Problems Taking Us BEYOND PETASCALE

[Chart: projected Top500 performance (sum of Top500 and #1 system), from 100 MFlops in 1993 toward 1 ZFlops by ~2029, with applications placed at the performance they require: Aerodynamic Analysis: 1 Petaflops; Laser Optics: 10 Petaflops; Molecular Dynamics in Biology: 20 Petaflops; Aerodynamic Design: 1 Exaflops; Computational Cosmology: 10 Exaflops; Turbulence in Physics: 100 Exaflops; Computational Chemistry: 1 Zettaflops. What we can just model today requires <100 TF.]

Source: Dr. Steve Chen, "The Growing HPC Momentum in China", June 30th, 2006, Dresden, Germany

Example real-world challenges:
- Full modeling of an aircraft in all conditions
- Green airplanes
- Genetically tailored medicine
- Understanding the origin of the universe
- Synthetic fuels everywhere
- Accurate extreme weather prediction
Reach Exascale by 2018: From GigaFlops to ExaFlops

[Chart: sustained GigaFlop ~1987, sustained TeraFlop ~1997, sustained PetaFlop 2008, sustained ExaFlop ~2018]

"The pursuit of each milestone has led to important breakthroughs in science and engineering."
Source: IDC, "In Pursuit of Petascale Computing: Initiatives Around the World," 2007
Note: numbers are based on the Linpack benchmark; dates are approximate.
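The arithmetic behind this cadence, as an illustrative check using the chart's approximate dates: each 1000x milestone step over roughly a decade implies sustained performance doubling about every year:

```python
# Milestones from the chart above (Linpack-based, dates approximate):
milestones = {1987: 1e9, 1997: 1e12, 2008: 1e15, 2018: 1e18}  # sustained FLOPS
years = sorted(milestones)
for y0, y1 in zip(years, years[1:]):
    factor = milestones[y1] / milestones[y0]
    annual = factor ** (1 / (y1 - y0))  # implied year-over-year growth
    print(f"{y0}->{y1}: {factor:.0e}x overall, ~{annual:.2f}x per year")
# Each 1000x decade-scale step corresponds to roughly 1.9-2.0x per year.
```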
Top 500

Rank | Site | System | Cores | Rmax (TFlop/s) | Rpeak (TFlop/s) | Power (kW)
1 | National University of Defense Technology, China | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200 GHz, TH Express-2, Intel Xeon Phi 31S1P (NUDT) | 3,120,000 | 33,862.7 | 54,902.4 | 17,808
2 | DOE/SC/Oak Ridge National Laboratory, United States | Titan - Cray XK7, Opteron 6274 16C 2.200 GHz, Cray Gemini interconnect, NVIDIA K20x (Cray Inc.) | 560,640 | 17,590.0 | 27,112.5 | 8,209
3 | DOE/NNSA/LLNL, United States | Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM) | 1,572,864 | 17,173.2 | 20,132.7 | 7,890
4 | RIKEN Advanced Institute for Computational Science (AICS), Japan | K computer, SPARC64 VIIIfx 2.0 GHz, Tofu interconnect (Fujitsu) | 705,024 | 10,510.0 | 11,280.4 | 12,660
5 | DOE/SC/Argonne National Laboratory, United States | Mira - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom (IBM) | 786,432 | 8,586.6 | 10,066.3 | 3,945
6 | Texas Advanced Computing Center/Univ. of Texas, United States | Stampede - PowerEdge C8220, Xeon E5-2680 8C 2.700 GHz, Infiniband FDR, Intel Xeon Phi SE10P (Dell) | 462,462 | 5,168.1 | 8,520.1 | 4,510
7 | Forschungszentrum Juelich (FZJ), Germany | JUQUEEN - BlueGene/Q, Power BQC 16C 1.600 GHz, Custom interconnect (IBM) | 458,752 | 5,008.9 | 5,872.0 | 2,301
8 | DOE/NNSA/LLNL, United States | Vulcan - BlueGene/Q, Power BQC 16C 1.600 GHz, Custom interconnect (IBM) | 393,216 | 4,293.3 | 5,033.2 | 1,972
9 | Leibniz Rechenzentrum, Germany | SuperMUC - iDataPlex DX360M4, Xeon E5-2680 8C 2.70 GHz, Infiniband FDR (IBM) | 147,456 | 2,897.0 | 3,185.1 | 3,423
10 | National Supercomputing Center in Tianjin, China | Tianhe-1A - NUDT YH MPP, Xeon X5670 6C 2.93 GHz, NVIDIA 2050 (NUDT) | 186,368 | 2,566.0 | 4,701.0 | 4,040
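Two figures of merit follow directly from the table columns: Linpack efficiency (Rmax/Rpeak) and energy efficiency (Rmax per watt). A small illustrative sketch over the top three entries:

```python
# (system, Rmax TFlop/s, Rpeak TFlop/s, power kW), taken from the table above.
systems = [
    ("Tianhe-2", 33862.7, 54902.4, 17808),
    ("Titan",    17590.0, 27112.5,  8209),
    ("Sequoia",  17173.2, 20132.7,  7890),
]
for name, rmax, rpeak, power_kw in systems:
    efficiency = rmax / rpeak             # fraction of peak sustained on Linpack
    gflops_per_watt = rmax * 1000 / (power_kw * 1000)  # TFlop/s->GFlop/s, kW->W
    print(f"{name}: {efficiency:.1%} of peak, {gflops_per_watt:.2f} GFlops/W")
# Tianhe-2 sustains ~62% of peak at ~1.9 GFlops/W; the BlueGene/Q
# machines run above 85% of peak at ~2.2 GFlops/W.
```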
"A grid is a system that:
- coordinates the use of resources in the absence of centralized control over those resources;
- uses standard, open, general-purpose protocols and interfaces;
- delivers non-trivial qualities of service."
(Ian Foster, "What is the Grid?", 2002)
The Grid Concept
Grid models:
- Distributed Computing
- High-Throughput Computing
- On-Demand Computing
- Data-Intensive Computing
- Collaborative Computing
The interdisciplinary character of the grid: the technologies being developed are applied in high-energy physics, space physics, microbiology, ecology, meteorology, and various engineering and business applications.
Virtual organizations (VOs)
[Diagram: GRID MIDDLEWARE connecting visualization, workstations, mobile access, supercomputers and PC clusters, the Internet and networks, and mass storage, sensors, and experiments]
The grid is a means of sharing computing power and data storage over the Internet.
Enter a New Era in Fundamental Science
The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever built, is the most exciting turning point in particle physics.
Exploration of a new energy frontier: proton-proton and heavy-ion collisions at a centre-of-mass energy (ECM) of up to 14 TeV.
LHC ring: 27 km circumference.
Large Hadron Collider
Experiments: ATLAS, CMS, ALICE, LHCb, TOTEM, LHCf, MOEDAL
Collisions of proton beams are observed in giant detectors.
Tier 0 – Tier 1 – Tier 2

Tier-0 (CERN): data recording; initial data reconstruction; data distribution.
Tier-1 (11 centres): permanent storage; re-processing; analysis.
Tier-2 (>200 centres): simulation; end-user analysis.
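As a compact restatement of this tier model, here is an illustrative sketch in plain Python (not WLCG software; the structure simply encodes the roles listed above):

```python
# A plain-Python restatement of the tier model above; not WLCG software.
TIERS = {
    "Tier-0": {"centres": 1,   "roles": ("data recording",
                                         "initial data reconstruction",
                                         "data distribution")},
    "Tier-1": {"centres": 11,  "roles": ("permanent storage",
                                         "re-processing", "analysis")},
    "Tier-2": {"centres": 200, "roles": ("simulation", "end-user analysis")},  # >200
}

ORDER = ["Tier-0", "Tier-1", "Tier-2"]

def downstream(tier):
    """Data flows outward: Tier-0 -> Tier-1 -> Tier-2."""
    i = ORDER.index(tier)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None

for tier in ORDER:
    roles = ", ".join(TIERS[tier]["roles"])
    print(f"{tier} ({TIERS[tier]['centres']} centres): {roles} -> {downstream(tier)}")
```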
The Worldwide LHC Computing Grid (WLCG)
The grid, as a fundamentally new understanding of the possibilities of computers and computer networks, envisages the global integration of information and computing resources.

At the initiative of CERN, the EU DataGrid project started in January 2001 with the purpose of testing and developing advanced grid technologies. JINR was involved in this project.

The LCG (LHC Computing Grid) project was a continuation of EU DataGrid. The main task of the new project was to build a global infrastructure of regional centres for processing, storing, and analysing the data of physics experiments at the Large Hadron Collider (LHC).

2003 – The Russian consortium RDIG (Russian Data Intensive Grid) was established to provide full-scale participation of JINR and Russia in the implementation of the LCG/EGEE project.

2004 – The EGEE (Enabling Grids for E-sciencE) project started, with CERN as its head organization and JINR as one of its executors.

2010 – The EGI-InSPIRE project (Integrated Sustainable Pan-European Infrastructure for Researchers in Europe).
Normalized CPU time by country (2013):
- All countries: 12,797,893,444; Russia: 345,722,044
Jobs (2013):
- All countries: 446,829,340; Russia: 12,836,045
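Russia's share follows directly from these counters; a quick arithmetic check, purely illustrative:

```python
cpu_all, cpu_ru = 12_797_893_444, 345_722_044  # normalized CPU time, 2013
jobs_all, jobs_ru = 446_829_340, 12_836_045    # job counts, 2013
print(f"Russia, CPU share: {cpu_ru / cpu_all:.1%}")   # ~2.7%
print(f"Russia, job share: {jobs_ru / jobs_all:.1%}") # ~2.9%
```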
JINR Central Information and Computing Complex (CICC)
JINR-LCG2 Tier2 Site
The CICC comprises 2582 cores and a disk storage capacity of 1800 TB.
Availability and reliability = 99%.
JINR covers 40% of the RDIG share to the LHC.
~18 million tasks were executed in 2010-2013.
RDIG normalized CPU time (HEPSPEC06) per site (2010-2013).
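For reference, the 99% availability/reliability figure quoted above is computed from monitored site status. A sketch using the WLCG-style convention that scheduled downtime is excused from reliability; the hours below are made up for illustration, not JINR's actual monitoring data:

```python
# WLCG-style convention (as commonly used in availability reports):
# availability counts all downtime; reliability excuses scheduled downtime.
def availability(up_hours, total_hours):
    return up_hours / total_hours

def reliability(up_hours, total_hours, scheduled_down_hours):
    return up_hours / (total_hours - scheduled_down_hours)

# Hypothetical month: 720 h total, 7 h unscheduled outage, 6 h scheduled.
up = 720 - 7 - 6
print(f"availability: {availability(up, 720):.1%}")    # ~98.2%
print(f"reliability:  {reliability(up, 720, 6):.1%}")  # ~99.0%
```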
Foreseen computing resources to be allocated for the JINR CICC:

                  | 2014-2015 | 2016
CPU (HEPSPEC06)   | 28,000    | 40,000
Disk storage (TB) | 4,000     | 8,000
Mass storage (TB) | 5,000     | 10,000
Multifunctional centre for data processing, analysis and storage:
- a Tier2-level grid infrastructure to support the experiments at the LHC (ATLAS, ALICE, CMS, LHCb) and FAIR (CBM, PANDA), as well as other large-scale experiments;
- a distributed infrastructure for the storage, processing and analysis of experimental data from the NICA accelerator complex;
- a cloud computing infrastructure;
- a hybrid-architecture supercomputer;
- an educational and research infrastructure for distributed and parallel computing.
Collaboration in the area of WLCG monitoring
The Worldwide LHC Computing Grid (WLCG) today includes more than 170 computing centers, where more than 2 million jobs are executed daily and petabytes of data are transferred between sites.
Monitoring of the LHC computing activities and of the health and performance of the distributed sites and services is a vital condition for the success of LHC data processing.
For several years, CERN (IT Department) and JINR have collaborated on the development of applications for WLCG monitoring:
- WLCG Transfer Dashboard
- Monitoring of the XRootD federations
- WLCG Google Earth Dashboard
- Tier3 monitoring toolkit
JINR distributed cloud grid-infrastructure for training and research
Main components of modern distributed computing and data management technologies.
Scheme of the distributed cloud grid-infrastructure.
There is a demand for a special infrastructure that could become a platform for training, research, development, testing, and evaluation of modern technologies in distributed computing and data management. Such an infrastructure was set up at LIT by integrating the JINR cloud and the educational grid infrastructure of sites located at the following organizations:
- Institute of High-Energy Physics (Protvino, Moscow region)
- Bogolyubov Institute for Theoretical Physics (Kiev, Ukraine)
- National Technical University of Ukraine "Kyiv Polytechnic Institute" (Kiev, Ukraine)
- L.N. Gumilyov Eurasian National University (Astana, Kazakhstan)
- B. Verkin Institute for Low Temperature Physics and Engineering of the National Academy of Sciences of Ukraine (Kharkov, Ukraine)
- Institute of Physics of the Azerbaijan National Academy of Sciences (Baku, Azerbaijan)
Big Data Has Arrived at an Almost Unimaginable Scale
- Business e-mails sent per year: 3,000 PBytes
- Content uploaded to Facebook each year: 182 PBytes
- Google search index: 98 PBytes
- Health records: 30 PBytes
- YouTube: 15 PBytes
- LHC annual data: 15 PBytes
- (also shown, for scale: climate data, the Library of Congress, Nasdaq, the US census)

Source: http://www.wired.com/magazine/2013/04/bigdata

- ATLAS annual data volume: 30 PBytes
- ATLAS managed data volume: 130 PBytes
The proposal titled "Next Generation Workload Management and Analysis System for Big Data" (Big PanDA) started in September 2012 (funded by DOE). It generalizes PanDA as a meta-application providing location transparency of processing and data management for HEP and other data-intensive sciences, as well as for a wider exascale community.

There are three dimensions to the evolution of PanDA:
- making PanDA available beyond ATLAS and high-energy physics;
- extending beyond the grid (leadership computing facilities, high-performance computing, Google Compute Engine (GCE), clouds, Amazon Elastic Compute Cloud (EC2), university clusters);
- integration of the network as a resource in workload management, as sketched below.
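PanDA is a pilot-based system: lightweight pilots start on a resource and pull suitable work from a central queue. The sketch below illustrates that idea with the network treated as a schedulable resource through a simple transfer-cost estimate; all names, sites, bandwidths, and the API are invented for illustration and are not PanDA's actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    input_gb: float
    data_site: str  # where the input dataset currently resides

# Hypothetical inter-site bandwidths in GB/s; in Big PanDA the measured
# network is fed into brokerage as a first-class resource.
BANDWIDTH = {("CERN", "ORNL"): 1.0, ("CERN", "GCE"): 0.3, ("GCE", "ORNL"): 0.2}

def transfer_cost(job, site):
    """Seconds to stage the job's input to `site` (0 if already local)."""
    if job.data_site == site:
        return 0.0
    bw = BANDWIDTH.get((job.data_site, site)) or BANDWIDTH.get((site, job.data_site))
    return job.input_gb / bw if bw else float("inf")

def pull_job(queue, pilot_site):
    """A pilot at `pilot_site` pulls the job with the cheapest data movement."""
    best = min(queue, key=lambda j: transfer_cost(j, pilot_site))
    queue.remove(best)
    return best

queue = [Job("reco-1", 500.0, "CERN"), Job("sim-7", 1.0, "GCE")]
print(pull_job(queue, "ORNL").name)  # sim-7: 5 s staging beats 500 s for reco-1
```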
Evolving PanDA for Advanced Scientific Computing
Leadership Computing Facilities: Titan
Slide from Ken Read
Tier1 center
The Federal Target Programme project: "Creation of the automated system of data processing for experiments at the LHC of Tier-1 level and maintenance of Grid services for a distributed analysis of these data"
Duration: 2011-2013
March 2011 – Proposal to create the LCG Tier1 center in Russia (an official letter by the Minister of Science and Education of Russia, A. Fursenko, was sent to CERN DG R. Heuer):
- NRC KI for ALICE, ATLAS, and LHCb;
- LIT JINR (Dubna) for the CMS experiment.
Full resources in 2014, to meet the start of the next working LHC session.
September 2012 – The proposal was reviewed by the WLCG OB, and the JINR and NRC KI Tier1 sites were accepted as new "Associate Tier1" sites.
JINR Tier1 Connectivity Scheme
JINR CMS Tier-1 progress
- Engineering infrastructure (uninterruptible power supply and climate control);
- High-speed, reliable network infrastructure with a dedicated reserved channel to CERN (LHCOPN);
- Computing system and storage system based on disk arrays and high-capacity tape libraries;
- 100% reliability and availability.
                | 2012 (done) | 2013   | 2014
CPU (HEPSpec06) | 14,400      | 28,800 | 57,600
Number of cores | 1,200       | 2,400  | 4,800
Disk (TB)       | 720         | 3,500  | 4,500
Tape (TB)       | 72          | 5,700  | 8,000
[Map: WLCG Tier-1 sites - Lyon/CCIN2P3, Barcelona/PIC, DE-FZK, US-FNAL, CA-TRIUMF, NDGF, CERN, US-BNL, UK-RAL, Taipei/ASGC, Amsterdam/NIKHEF-SARA, Bologna/CNAF, and Russia: NRC KI and JINR]
Frames for Grid cooperation of JINR
- Worldwide LHC Computing Grid (WLCG)
- Enabling Grids for E-sciencE (EGEE), now EGI-InSPIRE
- RDIG Development Project
- BNL, ANL, UTA: "Next Generation Workload Management and Analysis System for Big Data"
- Tier1 Center in Russia (NRC KI, LIT JINR)
- 6 projects at CERN
- BMBF grant "Development of the grid-infrastructure and tools to provide joint investigations performed with participation of JINR and German research centers"
- "Development of grid segment for the LHC experiments", supported in the frames of the JINR-South Africa cooperation agreement
- Development of a grid segment at Cairo University and its integration into the JINR GridEdu infrastructure
- JINR - FZU AS Czech Republic project "The grid for the physics experiments"
- NASU-RFBR project "Development and support of LIT JINR and NSC KIPT grid-infrastructures for distributed CMS data processing of the LHC operation"
- JINR-Romania cooperation: Hulubei-Meshcheryakov programme
- JINR-Moldova cooperation (MD-GRID, RENAM)
- JINR-Mongolia cooperation (Mongol-Grid)
- Project GridNNN (National Nanotechnological Net)
Summary

Training of highly qualified IT specialists:
- a modern infrastructure (supercomputers, grid, cloud) for education, training, and carrying out projects;
- centers of advanced IT technologies, on whose basis scientific schools are formed and high-level projects are carried out in broad international cooperation involving students and postgraduates;
- participation in megaprojects (LHC, FAIR, NICA, PIC), within which new IT technologies (WWW, Grid, Big Data) are created;
- international student schools and summer practice programmes hosted by high-technology organizations (CERN, JINR, NRC "Kurchatov Institute").