Post on 16-Jan-2016
San Diego Supercomputer Center
SDSC Storage Resource Broker

A Data Storage Language for the Requirements of Rebels and Misfits

Arun Jagatheesan
San Diego Supercomputer Center
University of California, San Diego

HPTS Workshop, Asilomar, California, 25-28 September 2005

Or: A talk on Data Grids and DGL

Talk Outline
• “Next Hype in Grids”
• My belief system before we begin
• Meet my friends – Rebels and Misfits
• File Systems, Databases, Datagrids
• Mapping physical data to logical view
• Mapping physical data and storage to logical view
• SRB Statistics
• Mapping physical data, storage and processes to logical view
• Data Grid Language
• Conclusion
• What Now = work and sacrifices; What Next = Vision

He has 44 slides and 20 minutes. No infotainment slides either – Boring!

Disclaimer and Warning
• My own opinion or thoughts
  • Arun says so… (can be wrong?)
• Based on my current knowledge and understanding
  • As of September 2005 – current knowledge and level of understanding (can change?)
• My belief system
  • I believe in Data Grids for Inter/Intra/Multi-Organizational Unstructured Data Management (biased?)
  • My belief might not be in sync with your belief, but it can co-exist with your favorite technology

Meet my friends – Rebels and Misfits
• Esoteric requirements from “high-end” users
  • To keep them alive, they need more… more of everything
  • Requirements not broadly felt or required in industry
  • They push the existing technology to its limits
• From the existing technology’s perspective…
  • These folks are nuts!
  • The existing technology was not designed for these requirements
  • My friends become rebels or misfits from the existing technology’s perspective

Mapping physical data to logical view
Hierarchical view, independent of network, disk, sector, track, fragments
Rule: Storage Abstraction – Hide storage resources

Mapping physical data to logical view
Relational view (assume it's a database), independent of network, disk, sector, track, fragments
Thanks to the rebels and misfits in the airline industry who wanted transactional capabilities

NIH BIRN SRB Data Grid
• Biomedical Informatics Research Network
  • Access and analyze biomedical image data
  • Data resources distributed throughout the country
  • Medical schools and research centers across the US
• Stable high-performance grid-based environment
  • Coordinate data sharing
  • Federate collections
  • Support data mining and analysis

Mapping distributed data & storage to logical view
25 universities or research hospitals, multiple heterogeneous storage resources

Approach we have taken in Data Grids
• Logical schema (view) is independent of physical schema
  • Just like databases or even file systems
• Physical resources are provided in the form of logical resources in the logical view
  • This is very different from databases (may be similar to tablespaces)
• A database is used for mapping
  • Data path, network, access permissions, metadata, storage type, logical storage resource, physical storage resources
  • Used for digital libraries, persistent archives and data grids
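
The mapping role of the catalog database can be sketched in Python. This is a hypothetical illustration, not the SRB catalog schema: the class names (`PhysicalResource`, `LogicalResource`, `CatalogEntry`, `Catalog`), their fields, and the simplification that every physical member stores the file under the same physical path are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalResource:
    """A concrete storage system (hypothetical fields)."""
    host: str
    storage_type: str  # e.g. "disk" or "tape"


@dataclass
class LogicalResource:
    """A named logical resource backed by one or more physical resources."""
    name: str
    members: list = field(default_factory=list)


@dataclass
class CatalogEntry:
    """One catalog row: what is needed to map a logical path to storage."""
    logical_path: str
    logical_resource: str
    physical_path: str
    readers: set = field(default_factory=set)      # access permissions
    metadata: dict = field(default_factory=dict)   # descriptive metadata


class Catalog:
    """In-memory stand-in for the database that holds the mappings."""

    def __init__(self):
        self.resources = {}
        self.entries = {}

    def add_resource(self, resource):
        self.resources[resource.name] = resource

    def register(self, entry):
        self.entries[entry.logical_path] = entry

    def locate(self, user, logical_path):
        """Return (host, storage_type, physical_path) for each physical copy."""
        entry = self.entries[logical_path]
        if user not in entry.readers:
            raise PermissionError(f"{user} may not read {logical_path}")
        resource = self.resources[entry.logical_resource]
        return [(p.host, p.storage_type, entry.physical_path)
                for p in resource.members]
```

The user only ever names the logical path; the host, storage type and physical path come out of the catalog lookup, which is the independence of logical and physical schema that the slide describes.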

The “Grid” Vision

Data Grid Resource Providers
Grid Resource Providers (GRP) providing content and/or storage
[Figure labels: GRP, /txt3.txt, GRP]

Data Grid Administrative Domain
• Administrative domain with one or more Grid Resource Providers
• Could include their data centers
[Figure labels: Research Lab, GRP, /txt3.txt, GRP]

Data Grid Administrative domains
[Figure labels: /…/text1.txt, /…//text2.txt, /txt3.txt, GRP nodes]
Storage-R-Us Resource Providers: data + storage (50)
Research lab: data + storage (40)
University: data + storage (10)

Data Grid: Logical view of data & resources
[Figure labels: /…/text1.txt, /…//text2.txt, /txt3.txt, GRP nodes]
Storage-R-Us Resource Providers: data + storage (50)
Research Lab: data + storage (40)
University: data + storage (10)
Logical Namespace (need not be the same as the physical view of resources):
/home/arun.sdsc/exp1
/home/arun.sdsc/exp1/text1.txt
/home/arun.sdsc/exp1/text2.txt
/home/arun.sdsc/exp1/text3.txt
data + storage (100)
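
A small Python sketch of the logical namespace on this slide: three domains contribute 50, 40 and 10 units, and the user sees one collection backed by 100. The placement of files on domains is an assumption made for illustration (the slide does not say which domain holds which file), and the physical paths of text1.txt and text2.txt are elided on the slide, so placeholders are used instead of guessed paths.

```python
# Capacity contributed by each administrative domain (units unspecified on the slide).
DOMAIN_CAPACITY = {
    "Storage-R-Us": 50,
    "Research Lab": 40,
    "University": 10,
}

# Logical path -> (domain, physical path).
# Assumption: the domain assignment below is invented for this example.
LOGICAL_NAMESPACE = {
    "/home/arun.sdsc/exp1/text1.txt": ("Storage-R-Us", "<elided physical path>"),
    "/home/arun.sdsc/exp1/text2.txt": ("Research Lab", "<elided physical path>"),
    "/home/arun.sdsc/exp1/text3.txt": ("University", "/txt3.txt"),
}


def total_capacity(capacity):
    """Capacity visible through the logical view: the sum over all domains."""
    return sum(capacity.values())


def list_collection(namespace, collection):
    """List the logical paths directly under a logical collection prefix."""
    prefix = collection.rstrip("/") + "/"
    return sorted(path for path in namespace if path.startswith(prefix))


def resolve(namespace, logical_path):
    """Map a logical path to the domain and physical path that hold it."""
    return namespace[logical_path]
```

Note that the logical name `text3.txt` resolves to the physical name `/txt3.txt`: the logical namespace does not have to mirror physical names or locations.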

BIRN: Inter-organizational Data

SDSC SRB User Community (Major US)
• BaBar, Stanford Linear Accelerator Center (SLAC)
• California Digital Library (CDL)
• Center for Integrated Space Weather Modeling (CISM)
• CVC, Visualization Portal
• LDC Data Storage
• NIH Biomedical Informatics Research Network (BIRN)
• NSF Southern California Earthquake Center (SCEC)
• National Archives and Records Administration (NARA)
• National Aeronautics and Space Administration Centers (NASA)
• National Virtual Observatory (NVO)
• Npackage, NSF Middleware Initiative (NMI)
• National Science Digital Library (NSDL)
• National Optical Astronomy Observatory (NOAO)
• ROADNet
• Purdue University
• SCCOOS, USA
• Scientific Rich Media Archive
• Salk Institute
• Strand Map Service, USA
• UC Berkeley Library
• UCSD Library
• University of Houston
• Persistent Archives Test bed
• University of Wisconsin, Madison
• WebBase, Stanford University
• Yale University Library

SDSC SRB User Community
• Academia Sinica, Taiwan
• Australian National University
• Bio-Lab, University of Genoa, Italy
• Council for the Central Laboratory of the Research Councils (CCLRC), UK
• CC-IN2P3, France
• Distributed Framework, Singapore
• Distributed Aircraft Maintenance Environment (DAME), UK
• eMinerals Project, UK
• eScience, Belfast Center
• Fraunhofer ITWM, Germany
• High Energy Accelerator Organization, KEK, Japan
• K* Grid Computing, Korea
• KEK Computing Center, Japan
• Lyon, France
• NorGrid, Norway
• Nanyang Data Grid, Singapore
• NCHC, Taiwan
• Queensland University of Technology (QUT), Australia
• Rutherford Appleton Laboratory (RAL), UK
• T-Systems, Germany
• UK eScience Project, UK
• UniGrid, Poland
• UMK, Poland
• Virtual Laboratory for eScience, Netherlands

Total data brokered by SDSC SRB
[Chart: counts (axis 0 to 14) by data volume: > 100 TB, > 10 TB, > 5 TB, > 1 TB, > 500 GB, < 200 GB. Labels: Response, Unique, Outside SDSC. Totals shown: 324 TB, 358 TB, 682 TB]

Mapping distributed data, storage and processes to logical view

Long-run Processes in Data Grid
• Data Grid ILM
• Data Grid Triggers
• Data Gridflows

Data Grid (Enterprise Utility)
[Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party]
Physical resources managed by autonomous administrative domains of the same enterprise (ABCZ.com)

Data Grid (Enterprise Utility)
[Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2]
Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department

Change is Constant
• Changes in access patterns
  • Based on the number of users accessing the data
  • Domains which want to access the data
• Data Value
  • The value of a data set (collection?) for a particular domain, based on its business model and users’ access patterns
  • Each domain will have a different value based on its users and its role in a data grid

“Data Value” based on users
[Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4]
When more users access a project’s data, its data value increases; move that data to a faster storage type
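
The rule on this slide can be sketched as a tiny placement policy in Python. Everything concrete here is an assumption for illustration: the tier names, the thresholds, and the choice to measure data value as the number of distinct users in an access log. The talk does not define a formula for data value.

```python
# Storage tiers ordered from slowest to fastest (hypothetical names).
TIERS = ["tape", "disk", "fast-disk"]

# Minimum number of distinct users required for each tier (made-up thresholds).
TIER_THRESHOLDS = {"tape": 0, "disk": 5, "fast-disk": 50}


def data_value(access_log):
    """Data value, taken here as the number of distinct users in the log."""
    return len(set(access_log))


def target_tier(value):
    """Fastest tier whose threshold the given data value reaches."""
    chosen = TIERS[0]
    for tier in TIERS:
        if value >= TIER_THRESHOLDS[tier]:
            chosen = tier
    return chosen


def plan_move(current_tier, access_log):
    """Return a faster tier to move the data to, or None if no move is needed."""
    wanted = target_tier(data_value(access_log))
    if TIERS.index(wanted) > TIERS.index(current_tier):
        return wanted
    return None
```

The sketch only ever promotes data, matching the slide; a fuller policy would also weigh storage cost and demote data whose value drops.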

“Data Value” based on domain
[Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4]
When more users from the same domain access the data, the data value for that particular data in that particular domain increases, so replicate the data to resources in that domain. (The converse is also true.)
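
A minimal Python sketch of the per-domain rule, including its converse. The threshold `REPLICATE_AT`, the function names, and the decision to always keep a copy in a designated home domain are assumptions made for this example; they are not taken from SRB.

```python
from collections import defaultdict

# Hypothetical threshold: distinct users in a domain needed to justify a replica.
REPLICATE_AT = 3


def value_by_domain(accesses):
    """accesses: iterable of (user, domain) pairs for one data object.

    Returns the number of distinct users per domain, used here as the
    data value of the object in that domain.
    """
    users = defaultdict(set)
    for user, domain in accesses:
        users[domain].add(user)
    return {domain: len(members) for domain, members in users.items()}


def plan_replicas(accesses, current_replicas, home_domain):
    """Return (domains to replicate to, domains whose replica can be dropped)."""
    values = value_by_domain(accesses)
    wanted = {d for d, v in values.items() if v >= REPLICATE_AT}
    wanted.add(home_domain)  # assumption: the home copy is never dropped
    to_add = sorted(wanted - set(current_replicas))
    to_remove = sorted(set(current_replicas) - wanted)
    return to_add, to_remove
```

A domain with a preservation role, such as a third-party archive, would need to be exempt from the removal list; that exemption is left out here to keep the sketch short.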

“Data Value” based on role
[Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4]
The 3rd party data center has no users who use the data, but is interested in having a replica of any data (or deleted data) for long-term preservation

Data Grid ILM
• ILM = Information Lifecycle Management (sales jargon)
• Dynamic re-orientation of data placement and data retention policies (rules)
• Based on “business value of data” and storage cost
• HSM = Hierarchical Storage Management, based on “data freshness”. ILM goes one step further
• Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules

Data Grid Triggers
• Similar to triggers in databases
• Based on ECA concepts
  • Event
  • Condition
  • Action
• Example
  • Event = Insert new file in collection (“/ourProject/data”)
  • Condition = (color = “blue” && galaxy = “Andromeda”)
  • Action = Run (selectiveDataReplicator.dgl)
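
The Event-Condition-Action example above can be sketched in Python. This is an illustration of the ECA pattern only, not the SRB trigger API: the `Event` and `Trigger` classes and the `run_flow` function are hypothetical, and `run_flow` merely returns a description where a real system would submit selectiveDataReplicator.dgl for execution.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Event:
    kind: str         # e.g. "insert"
    collection: str   # logical collection the event happened in
    path: str         # logical path of the affected file
    metadata: dict    # user-defined metadata of the file


@dataclass
class Trigger:
    event_kind: str
    collection: str
    condition: Callable[[dict], bool]
    action: Callable[[Event], str]

    def fire(self, event: Event) -> Optional[str]:
        """Run the action if the event matches and the condition holds."""
        if event.kind != self.event_kind or event.collection != self.collection:
            return None
        if not self.condition(event.metadata):
            return None
        return self.action(event)


def run_flow(event: Event) -> str:
    # Stand-in for handing selectiveDataReplicator.dgl to a DGL engine.
    return f"run selectiveDataReplicator.dgl on {event.path}"


trigger = Trigger(
    event_kind="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action=run_flow,
)
```

Calling `trigger.fire(...)` with an insert event in `/ourProject/data` whose metadata matches the condition returns the action's result; any other event returns `None`.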

Data Grid Language
• Requirement
  • Data Grid ILM process
    • The long-run process that has to be run is described in DGL
  • Data Grid Triggers
    • Action part of the ECA (Event-Condition-Action) logic
  • Data Gridflows
    • Step-by-step execution of a long-run process on the Data Grid
• Analogy of SQL in relational databases
  • Long-run procedures stored and executed in the Data Grid itself
  • Captures the “Infrastructure Execution Logic”

DGL Request
Annotations about the Data Grid Request
Can be either a Flow or a Status Query

DGL Requests (2 types)
• Data Grid Flow
  • An XML structure that describes the execution logic, associated procedural rules and DGL variables. Can be a synchronous or asynchronous flow
• Status Query
  • An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity. Status queries can be made for both synchronous and asynchronous flows

Flow
Scoped variables that can control the flow
Logic used by the sub-members
Sub-members that are the real execution statements
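
The three parts of a flow (scoped variables, logic, sub-members) and the idea of a status query at any level can be modeled in a few lines of Python. DGL itself is XML, and this sketch does not reproduce its schema or element names: the `Step` and `Flow` classes, the status strings, and the support for only a "sequential" logic are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """A leaf execution statement (hypothetical stand-in for a DGL step)."""
    name: str
    run: Callable[[dict], None]
    status: str = "pending"


@dataclass
class Flow:
    name: str
    variables: dict = field(default_factory=dict)  # scoped variables
    logic: str = "sequential"                      # how members are executed
    members: list = field(default_factory=list)    # Steps or nested Flows
    status: str = "pending"

    def execute(self, inherited=None):
        """Run all members; inner variables shadow those of enclosing flows."""
        if self.logic != "sequential":
            raise NotImplementedError("only sequential logic is sketched here")
        scope = {**(inherited or {}), **self.variables}
        self.status = "running"
        for member in self.members:
            if isinstance(member, Flow):
                member.execute(scope)
            else:
                member.status = "running"
                member.run(scope)
                member.status = "done"
        self.status = "done"

    def query_status(self, path):
        """Status of this flow (empty path) or of a nested member by name path."""
        node = self
        for name in path:
            node = next(m for m in node.members if m.name == name)
        return node.status
```

Here `query_status(["inner", "copy"])` would report on one step inside a sub-flow, which mirrors the slide's point that status can be queried at any granularity.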

Flow Logic (How a flow executes)

DGL-Response
Responses can be synchronous or asynchronous

Conclusion
• Data Grids are for real – they manage Inter/Intra/Multi-organizational unstructured data (files, streams, …)
• Data Grids extend the database concepts and internally use a database
• A language like Data Grid Language mentioned here is necessary for the proliferation and automation of Data Grid Management Systems (DGMS)
• Reference: Paper in VLDB Workshop on Data Management in Grids

We are SDSC SRB
Arun is here! - Shameless self-promotion
Not in picture: Many students

Additional Thanks (Ignorance is bliss)
• My Advisor: “You already graduated, and have a job at a research firm. Now why are you writing to MS Research? Whom did you write to?”
• Me: “I wrote to two people. The first person works on social communities; we can use service brokering for them. I have not got any response from him. But there is another person who did respond. His last name is of the color ‘Gray’ and his web page is very cheesy with music in the background. I guess he does not do much computer science – he works with astronomers.”

Contact Info
Arun Jagatheesan
arun@sdsc.edu
Or
srb@sdsc.edu
http://www.sdsc.edu/srb/