Scientific data cloud infrastructure and services in Chinese Academy of Sciences


Upload: celine

Post on 21-Mar-2016

DESCRIPTION

Scientific data cloud infrastructure and services in Chinese Academy of Sciences. Jianhui LI ([email protected]), Yuanke Wei ([email protected]), Yuanchun Zhou ([email protected]), Computer Network Information Center, Chinese Academy of Sciences.

TRANSCRIPT

e-Science

Scientific data cloud infrastructure and services in Chinese Academy of Sciences

Jianhui LI ([email protected]), Yuanke Wei ([email protected]), Yuanchun Zhou ([email protected])
Computer Network Information Center
Chinese Academy of Sciences

Outline

About us
    CAS (Chinese Academy of Sciences)
    CNIC (Computer Network Information Center), CAS
    SDC (Scientific Data Center), CNIC, CAS
About Scientific Data Cloud of CAS
    Data Challenge
    Architecture
    Infrastructure Service
    Middleware Service
    Data Service
Conclusion

CAS is a leading academic institution and comprehensive research and development center in natural science, technological science and high-tech innovation in China.

It was founded in Beijing on 1 November 1949 on the basis of the former Academia Sinica (Central Academy of Sciences) and Peiping Academy of Sciences.


CNIC is a public support institution for the consistent construction, operation and services of the information infrastructure of CAS.

It is a pioneer, promoter and participant in the informatization of domestic scientific research and scientific research management.

ARP: CAS research management
Science education and public outreach

Collaboration Environment Research Center (CERC, http://www.cerc.cnic.cn/) is dedicated to research, construction and services in collaboration environments for e-Science.

Scientific Data Center (SDC) is the support facility in charge of the construction, management, operation and maintenance of the CAS Informatization Data Application Environment, and has been taking the lead in implementing the CAS Scientific Database Project for more than 20 years.

SDC provides storage services, data services and related application technology services for the entire CAS.

SDC hosts the Secretariat of Committee on Data for Science and Technology (CODATA) and the CAS Secretariat for World Wide Web Consortium (W3C).

The vision of SDC is to become an important facilitator of the exchange and application of scientific data resources, a key technology supplier throughout the lifecycle of scientific data, and a leader in transforming scientific data into knowledge services.

Scientific Data Center

Hotter and hotter in data research

Mar. 29, 2012: the Obama Administration announced the Big Data Research and Development Initiative ($200 million): improving our ability to extract knowledge and insights from large and complex collections of digital data.

Feb. 11, 2011: Science issued a special online collection: Dealing with Data.

Sep. 2009: Nature issued "Data's shameful neglect": Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.

The Second International Symposium on Dataology & Data Science was held 3 days ago in China.

Difficult to discover
Difficult to access
Being lost

Data Driven Scientific Discovery

Data is regarded as the most valuable thing.

"The impact of Jim Gray's thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science." (Bill Gates)

Scientific discovery based on data-intensive computing is now considered the "fourth paradigm" after theoretical, experimental, and computational science.

Beyond Moore's Law in Data

IDC: data doubles in less than 18 months

Huge volume
Rapid increase
Various types and formats
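To put the 18-month doubling figure in perspective: a quantity that doubles every d months grows by a factor of 2^(t/d) after t months. The short Python sketch below works that out. The function name and the chosen time horizons are illustrative assumptions, not taken from the slides; since the slides say data doubles in less than 18 months, these factors should be read as a lower bound.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Factor by which a quantity grows if it doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Illustrative horizons only; 18 months is the doubling period quoted above.
    for years in (1.5, 3, 5, 10):
        print(f"{years:>4} years: x{growth_factor(years * 12):.1f}")
```

At this rate a data holding grows roughly a hundredfold in ten years, which is the scale of growth the storage and computing infrastructure has to absorb.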

Data Challenge

Scientists are being overwhelmed by exploding volumes of scientific data.

Much scientific research needs data distributed across different locations.

There is a growing gap between the capabilities of modern scientific instruments and those of scientists.

It has been a great challenge to view, manipulate, store, move, share, and interpret the massive data.

Global issues research
Cross disciplines

Scientific Data Deluge in CAS

Large scientific facilities produce huge data
    20+ in operation
    20+ under construction
Long-term field observation stations
    100+ stations, including ecology, environment, space, etc.
Long-term research data need to be archived and shared
    100+ institutes

Advanced CI for Data Lifecycle in CAS

[Figure: data lifecycle stages connected by information streams: Generation & Collection, Transmission, Computing & Analysis, Storage & Curation, Application]
    Data sources: field observation stations, large scientific facilities, others
    High speed network: CSTNET, CNGI, GLORIAD
    Data centers: storage & preservation, curation, sharing and service
    Supercomputing grid: computing, analysis, mining, visualization
    Data intensive e-Science activities and applications

Cloud Computing

Cloud computing is a mixed evolution of grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, etc.

It is characterized by large scale, virtualization, high reliability, generality, expandability, on-demand service and very low cost, which make it a popular computing paradigm.

It can bridge the gap between scientists and massive data.

Chinese Academy of Sciences Scientific Data Cloud (CASSDC) focuses on cloud technology to give scientists easier ways to make use of powerful information infrastructure, massive scientific data and rich scientific software.

Services of CASSDC

Scientific data infrastructure
Middleware (scientific data grid middleware, internet-based storage service middleware)
Scientific databases
Massive storage system
Data-intensive computing facility
High speed network
Application-enabled environments and typical e-Science practice
Software and toolkits (scientific data collection, curation, and publishing; data analysis and visualization)

Data Centers Distribution of CASSDC

Scientific data: ~1 PB
    Above 60 institutions
    Multiple disciplines

Storage capacity: ~22 PB (50 PB)
    1 major center
    1 archive center
    12 mid-size centers

Computing capacity: ~5000 (10000) CPU cores
    Dedicated design for DIC (data-intensive computing)

System Architecture of the Major Center

Enabling Technology: Infrastructure
Global file system of cloud storage

Enabling Technology: Infrastructure
On-the-fly provisioning of a computing cluster

Scientific Databases (SDB)

A long-term mission started in 1986 and funded by CAS
    many institutes involved
    long-term, large-scale collaboration
    data from research, for research
Collecting multi-discipline research data and promoting data sharing
More than 350 research databases and 500 datasets from 61 institutes
Over 200 TB of data available for open access and download
http://www.csdb.cn

Scientific Databases (cont.)

Focusing on data integration and on improving research databases into resource databases and even reference databases
    Research database
    Resource database
    Reference database
    Application-oriented database

Scientific Databases (cont.)

8 resource databases
    Geo-Science
    Biodiversity
    Chemistry
    Astronomy
    Space Science
    Microbiology and virus
    Material science
    Environment
2 reference databases
    China Species
    Compound
4 application-oriented databases
    High Energy (ITER)
    Western Environment Research
    Ecology Research
    Qinghai Lake Research

Scientific Databases (cont.)

37 research databases
    Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science

CAS Scientific Data Grid

SDG is built upon the Scientific Databases and supports finding and accessing large-scale, distributed and heterogeneous scientific data uniformly and conveniently, in a secure and proper way
Building scientific data application grids according to domain requirements
Integrating distributed data, analysis tools, and storage and computing facilities, providing a uniform data service interface
4 pilot grids
    Bioscience grid
    Geoscience grid
    Chemistry grid
    Astronomy and space science grid

Scientific Data Grid: Architecture

Organization architecture of SDG
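The idea of a uniform data service interface over distributed, heterogeneous databases can be illustrated with a small Python sketch. This is not SDG's actual API, which the slides do not show: the class names (DataSource, InMemorySource, DataGrid), the search method and the sample records are all hypothetical, chosen only to show one adapter per backend sitting behind a single entry point.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str      # name of the backend that returned the record
    identifier: str  # backend-local identifier
    title: str


class DataSource(ABC):
    """Hypothetical adapter: one subclass per heterogeneous backend."""

    name: str

    @abstractmethod
    def search(self, keyword: str) -> list[Record]:
        ...


class InMemorySource(DataSource):
    """Stand-in backend used only for this example."""

    def __init__(self, name: str, titles: dict[str, str]) -> None:
        self.name = name
        self._titles = titles

    def search(self, keyword: str) -> list[Record]:
        kw = keyword.lower()
        return [Record(self.name, ident, title)
                for ident, title in self._titles.items()
                if kw in title.lower()]


class DataGrid:
    """Single entry point that fans a query out to every registered source."""

    def __init__(self) -> None:
        self._sources: list[DataSource] = []

    def register(self, source: DataSource) -> None:
        self._sources.append(source)

    def search(self, keyword: str) -> list[Record]:
        results: list[Record] = []
        for source in self._sources:
            results.extend(source.search(keyword))
        return results


if __name__ == "__main__":
    grid = DataGrid()
    # Made-up sample data, for illustration only.
    grid.register(InMemorySource("bio", {"b1": "Virus genome catalogue",
                                         "b2": "Plant species list"}))
    grid.register(InMemorySource("geo", {"g1": "Lake species survey"}))
    for record in grid.search("species"):
        print(record)
```

The caller only ever talks to DataGrid, so adding a new discipline database means writing one more adapter rather than changing client code. A real grid would also need authentication and remote transport, which this sketch leaves out.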

SDG: Platform && Middleware

Platform
    SDGIM: Information Management
    SDGOM: Operation Management
    SDGSA: Storage Service
    SDGMS: Monitor && Statistic
Middleware
    SDGDD: Data Publish
    SDGDT: Data Transfer Toolkit
    SDGDC: Data Compress Toolkit
    SDGMM: MetaData Management
    SDGJS: Job Scheduler

Tools for data management and service

An Integrated Case on Geography Supported by CASSDC

Data and computing resources are both distributed
Model is from a CAS scientist
Adopted middleware:
    Data search
    Data transport
    On-the-fly computing provision
    Job scheduler
It handles massive data computing where some commercial geometric software can't work
Project: High Precision Display of Earth Surface

An Integrated Case on Biology Supported by CASSDC

Data:
    Microbiology Institute
    World Data Center for Microorganisms
    Wuhan Virus Institute
Computing:
    CNIC
    Microbiology Institute
Adopted middleware:
    Data search
    Data transport
    Job scheduler
    User authentication
Project: Gene Alignment Project
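The integrated cases chain the same kinds of middleware steps: data search, data transport, computing provision and job scheduling. The Python sketch below shows how such a chain could fit together. It is a toy model under stated assumptions, not the CASSDC middleware: every function name, the round-robin placement policy, the paths and the catalogue entries are hypothetical.

```python
from __future__ import annotations


def search_data(catalog: dict[str, str], keyword: str) -> list[str]:
    """Step 1, data search: ids of datasets whose description matches."""
    kw = keyword.lower()
    return [ds for ds, desc in catalog.items() if kw in desc.lower()]


def transport(datasets: list[str], destination: str) -> list[str]:
    """Step 2, data transport: pretend to stage each dataset at the compute site."""
    return [f"{destination}/{ds}" for ds in datasets]


def provision_cluster(nodes: int) -> list[str]:
    """Step 3, on-the-fly provisioning: return placeholder node names."""
    if nodes < 1:
        raise ValueError("at least one node is required")
    return [f"node-{i}" for i in range(nodes)]


def schedule(staged: list[str], cluster: list[str]) -> dict[str, list[str]]:
    """Step 4, job scheduling: spread staged datasets round-robin over nodes."""
    plan: dict[str, list[str]] = {node: [] for node in cluster}
    for i, path in enumerate(staged):
        plan[cluster[i % len(cluster)]].append(path)
    return plan


def run_case(catalog: dict[str, str], keyword: str,
             destination: str, nodes: int) -> dict[str, list[str]]:
    """Chain the four steps and return the per-node work plan."""
    found = search_data(catalog, keyword)
    staged = transport(found, destination)
    cluster = provision_cluster(nodes)
    return schedule(staged, cluster)


if __name__ == "__main__":
    # Made-up catalogue entries, for illustration only.
    catalog = {
        "dem-a": "elevation tile A",
        "dem-b": "elevation tile B",
        "met-1": "weather series",
    }
    print(run_case(catalog, "elevation", "/scratch", nodes=2))
```

Keeping each step behind its own function mirrors the way the cases pick only the middleware they need: the biology case, for instance, lists no on-the-fly provisioning step and adds user authentication instead.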

Cooperation

International organization membership

Cooperation with Europe

CSTNET provides network support for data transmission between Europe and China

ITER
Global Earth Observation System of Systems
CERN LHC: ATLAS & CMS
ARGO-Yangbajing

ATLAS is a particle physics experiment at the Large Hadron Collider at CERN, with 3000 scientists from 38 countries involved.

Challenges

On-demand linking of multi-disciplinary data based on semantics
Big data processing
Highly scalable, low cost, high throughput
On-demand, flexible data processing
Integrating data, storage, computing, analysis models, etc. as a whole system driven by one specific scientific problem
Making infrastructure invisible to scientists

Conclusion

Scientific discovery has increasingly become data intensive, and it calls for reliable and easily accessible scientific data infrastructure
CAS is continually promoting the building of scientific data infrastructure and data-intensive e-Science practices
Seeking potential cooperation in data-intensive e-Science and data cloud

Thank you!

[Chart data by discipline (unit not stated in the source): GeoScience 146, Chemistry 31, BioScience 62, ICT 21, Space 13, Astronomy 3, Physics 19, Material 17, Ocean 16, Energy 9]