Scientific data cloud infrastructure and services in
Chinese Academy of Sciences
Jianhui LI (lijh@cnic.cn), Yuanke Wei (weiyuanke@cnic.cn),
Yuanchun Zhou ([email protected])
Computer Network Information Center
Chinese Academy of Sciences
Outline
• About us
– CAS (Chinese Academy of Sciences)
– CNIC (Computer Network Information Center), CAS
– SDC (Scientific Data Center), CNIC, CAS
• About Scientific Data Cloud of CAS
– Data Challenge
– Architecture
– Infrastructure Service
– Middleware Service
– Data Service
• Conclusion
• CAS is a leading academic institution and comprehensive research and development center in natural science, technological science and high-tech innovation in China.
• It was founded in Beijing on 1st November 1949 on the basis of the former Academia Sinica (Central Academy of Sciences) and Peiping Academy of Sciences.
• CNIC is a public support institution for the consistent construction, operation and service of the information infrastructure of CAS.
• CNIC is a pioneer, promoter and participant in the informatization of domestic scientific research and research management.
Operation and Services in CNIC
Provided by 7 business departments, respectively:
• Scientific Research Network Environment
• Scientific Data Environment
• Supercomputing Environment
• Informatization of Research Management
• Internet-based Science Popularization and Education
• Internet Fundamental Resource Services
Scientific Data Center
• Scientific Data Center (SDC) is the support facility in charge of the construction, management, operation and maintenance of the CAS Informatization Data Application Environment, and has led the implementation of the CAS Scientific Database Project for more than 20 years.
• SDC provides storage services, data services and related application technology services for the entire CAS.
• SDC hosts the Secretariat of the Committee on Data for Science and Technology (CODATA) and the CAS Secretariat for the World Wide Web Consortium (W3C).
• The vision of SDC is to become an important facilitator of the exchange and application of scientific data resources, a key technology supplier throughout the lifecycle of scientific data, and a leader in transforming scientific data into knowledge services.
Hotter and hotter in data research
• Mar. 29, 2012: the Obama Administration announced the "Big Data Research and Development Initiative" ($200 million): improving our ability to extract knowledge and insights from large and complex collections of digital data
• Feb. 11, 2011: 《Science》 issued a Special Online Collection, "Dealing with Data"
• Sep. 2009: 《Nature》 issued "Data's shameful neglect": research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.
• The Second International Symposium on Dataology & Data Science was held 3 days ago in China
• Difficult to discover; difficult to access; being lost
Data Driven Scientific Discovery
• Data is regarded as the most valuable thing.
• "The impact of Jim Gray's thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science."
- Bill Gates
• Scientific discovery based on data intensive computing is now considered the "fourth paradigm" after theoretical, experimental, and computational science.
Over Moore's Law in Data
• IDC: data doubles in less than 18 months
• Huge volume
• Rapid increase
• Various types and formats
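To make the doubling claim concrete, the short sketch below computes the growth factor implied by a fixed doubling period. It is only an illustration: the function name `growth_factor` and the choice of 18 months as the period are assumptions for the example, and since the slide suggests the doubling time is shorter than 18 months, the printed factors should be read as lower bounds.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Return how many times larger a quantity becomes after `months`
    if it doubles once every `doubling_period_months`."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    # One doubling per period: factor = 2 ** (elapsed / period)
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Hypothetical horizons, chosen only for illustration.
    for years in (1.5, 3, 5, 10):
        factor = growth_factor(years * 12)
        print(f"after {years:>4} years: x{factor:.1f}")
```

With an 18-month period, 3 years corresponds to two doublings, so the data volume is at least four times the starting volume.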
Data Challenge
• Scientists are being overwhelmed by exploding volumes of scientific data.
• Much scientific research needs data distributed across different locations.
• There is a growing gap between the capability of modern scientific instruments and that of scientists.
• It has been a great challenge to view, manipulate, store, move, share, and interpret the massive data.
Scientific Data Deluge in CAS
• Large scientific facilities produce huge volumes of data
– 20+ in operation
– 20+ under construction
• Long-term field observation stations
– 100+ stations, including Ecology, Environment, Space, etc.
• Long-term research data need to be archived and shared
– 100+ institutes
[Figures: large scientific facilities; field observation stations]
Advanced CI for Data Lifecycle in CAS
[Figure: data lifecycle diagram. Stages: Generation & Collection, Transmission, Computing & Analysis, Storage & Curation, Application, linked by information streams around the data.]
• Data sources: 1. field observation stations, 2. large scientific facilities, 3. others
• High speed network: CSTNET, CNGI, GLORIAD
• Data centers: storage & preservation, curation, sharing and service
• Supercomputing grid: computing, analysis, mining, visualization
• Data intensive e-Science activities and applications
Cloud Computing
• It is a mixed evolution of grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, etc.
• Its characteristics (large scale, virtualization, high reliability, generality, expandability, on-demand service, very low cost) make it a popular computing paradigm.
• It can bridge scientists and massive data.
• The Chinese Academy of Sciences Scientific Data Cloud (CASSDC) uses cloud technology to give scientists easier ways to make use of powerful information infrastructure, massive scientific data and rich scientific software.
Services of CASSDC
[Figure: service layers of CASSDC. Labels in the figure:]
• Integrated Service
• Data Service: Scientific Data
• Middleware: Job Scheduler, Data Publisher, MetaData Manager, Data Transport
• Infrastructure Service: Infrastructure, Network
Scientific Data Infrastructure
• Application enabled environments and typical e-Science practice
• Software and toolkits (scientific data collection, curation, and publishing; data analysis and visualization; …)
• Middleware (scientific data grid middleware, internet-based storage service middleware, …)
• Scientific databases
• Massive storage system; data-intensive computing facility
• High speed network
Data Centers Distribution of CASSDC
• Scientific Data
– ~1 PB
– Above 60 institutions
– Multiple disciplines
• Storage Capacity: ~22 PB (50 PB)
– 1 major center
– 1 archive center
– 12 middle-size centers
• Computing Capacity: ~5000 (10000) CPU cores
– Dedicated design for DIC (data-intensive computing)
System Architecture of Major Center
Enabling Technology: Infrastructure
Global File System of Cloud Storage
Enabling Technology: Infrastructure
On-the-fly provisioning of a computing cluster
[Figure: a Cluster Manager, DHCP Server and TFTP Server provision nodes from the Computing Nodes Pool (CPU, memory) using images held in Storage. Figure labels: WOL, IP, kernel (steps (1) to (3)); switch to root file system (step (4)).]
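The provisioning slide names WOL (Wake-on-LAN) as one of the steps used to bring nodes from the pool online. As a minimal sketch of what that step could look like, the code below builds the standard Wake-on-LAN "magic packet" (6 bytes of 0xFF followed by the target MAC address repeated 16 times) and broadcasts it over UDP. This is not CASSDC's actual Cluster Manager code: the function names, the MAC addresses, the broadcast address and the port are all assumptions for illustration.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC such as '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Standard format: 6 x 0xFF, then the MAC repeated 16 times (102 bytes).
    return b"\xff" * 6 + mac_bytes * 16


def wake_node(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet; address and port are assumed defaults."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical node pool; nothing is sent here, packets are only built.
    node_pool = ["00:11:22:33:44:55", "00-11-22-33-44-56"]
    for mac in node_pool:
        print(mac, len(build_magic_packet(mac)), "bytes")
```

After a node wakes, the remaining figure labels (IP, kernel, switch to root file system) would be handled by the DHCP server, the TFTP server and the stored image rather than by this code.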
Scientific Databases (SDB)
• A long-term mission started in 1986, funded by CAS
– many institutes involved
– long-term, large-scale collaboration
– data from research, for research
• Collecting multi-discipline research data and promoting data sharing
– More than 350 research databases and 500 datasets by 61 institutes
– Over 200 TB of data available for open access and download: http://www.csdb.cn
Scientific Databases (cont.)
• Focusing on data integration and on upgrading research databases into resource databases and even reference databases
[Figure: database tiers: research databases, resource databases, reference databases, application-oriented databases]
Scientific Databases (cont.)
• 8 resource databases
– Geo-Science
– Biodiversity
– Chemistry
– Astronomy
– Space Science
– Microbiology and virus
– Material science
– Environment
• 2 reference databases
– China Species
– Compound
• 4 application-oriented databases
– High Energy (ITER)
– Western Environment Research
– Ecology research
– Qinghai Lake Research
Scientific Databases (cont.)
• 37 research databases
– Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science
[Pie chart, share by discipline: GeoScience 43%, BioScience 18%, Chemistry 9%, ICT 6%, Physics 6%, Ocean 5%, Material 5%, Space 4%, Energy 3%, Astronomy 1%]
CAS Scientific Data Grid
• SDG is
– built upon the Scientific Databases, supporting uniform and convenient discovery of and access to large-scale, distributed and heterogeneous scientific data in a secure and proper way
• Building scientific data application grids according to domain requirements
– Integrate distributed data, analysis tools, and storage and computing facilities, providing a uniform data service interface
– 4 pilot grids
  • Bioscience grid
  • Geoscience grid
  • Chemistry grid
  • Astronomy and space science grid
Scientific Data Grid-Architecture
Organization Architecture of SDG
SDG Platform & Middleware
• Platform
– SDGIM: Information Management
– SDGOM: Operation Management
– SDGSA: Storage Service
– SDGMS: Monitoring & Statistics
• Middleware
– SDGDD: Data Publishing
– SDGDT: Data Transfer Toolkit
– SDGDC: Data Compression Toolkit
– SDGMM: Metadata Management
– SDGJS: Job Scheduler
Tools for data management and service
An Integrated Case on Geography Supported by CASSDC
• Project: High Precision Display of Earth Surface
• Data and computing resources are both distributed
• The model is from a CAS scientist
• Adopted middleware:
– Data search
– Data transport
– On-the-fly computing provision
– Job scheduler
• It handles massive-data computation that some commercial geometric software cannot
An Integrated Case on Biology Supported by CASSDC
• Project: Gene Alignment
• Data:
– Microbiology Institute
– World Data Center for Microorganisms
– Wuhan Virus Institute
• Computing:
– CNIC
– Microbiology Institute
• Adopted middleware:
– Data search
– Data transport
– Job scheduler
– User authentication
Cooperation
• International Organization Membership
Cooperation with Europe
• CSTNET provides network support for data transmission between Europe and China
• ITER
• Global Earth Observation System of Systems
• CERN LHC: ATLAS & CMS
• ARGO-Yangbajing
Challenges
• On-demand linking of multi-disciplinary data based on semantics
• Big Data processing
– Highly scalable, low cost, high throughput
– On-demand flexible data processing
• Integrate data, storage, computing, analysis models, etc. into a whole system driven by one specific scientific problem
– Making infrastructure invisible to scientists
Conclusion
• Scientific discovery has increasingly become data intensive, and it calls for reliable and easily accessible scientific data infrastructure
• CAS continues to promote the building of scientific data infrastructure and data-intensive e-Science practices
• Seeking potential cooperation in data-intensive e-Science and data cloud
Thank you!