TRANSCRIPT
-
NAMED DATA NETWORKING IN
SCIENTIFIC APPLICATIONS
Christos Papadopoulos
Colorado State University
CC* NSF Workshop, Oct 20, 2016
Work supported by NSF #1345236 and #13410999
-
1
CMIP5 Data Retrieval
HTTP://someESGFnode:/CMIP5/output/MOHC/HadCM3/decadal1990/day/atmos/tas/r3i2p1/tas_Amon_HADCM3_historical_r1i1p1_185001-200512.nc
http://someesgfnode/
-
2
What if we only used the name?
HTTP://someESGFnode:/CMIP5/output/MOHC/HadCM3/decadal1990/day/atmos/tas/r3i2p1/tas_Amon_HADCM3_historical_r1i1p1_185001-200512.nc
http://someesgfnode/
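A minimal sketch of the idea on this slide: drop the scheme and host from the URL and keep only the hierarchical path as the data name. The helper `url_to_name` is hypothetical and only illustrates the string manipulation; it is not part of any NDN library.

```python
from urllib.parse import urlparse

def url_to_name(url: str) -> str:
    """Drop scheme and host; keep the hierarchical path as the data name."""
    path = urlparse(url).path
    components = [c for c in path.split("/") if c]
    return "/" + "/".join(components)

# The URL from the slide, joined onto one logical string.
url = ("HTTP://someESGFnode:/CMIP5/output/MOHC/HadCM3/decadal1990/day/atmos/tas/"
       "r3i2p1/tas_Amon_HADCM3_historical_r1i1p1_185001-200512.nc")
name = url_to_name(url)
print(name)
```

The resulting name starts at /CMIP5 and no longer mentions any host.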
-
3
A Named Data Network (NDN)
The main idea: Name the data, not the hosts!
..so you just tell the network what you want..
..and let the network find it for you
-
4
NDN Operation
Publishers push hierarchical name prefixes into the network
Users send Interests that follow path to published prefix
“Breadcrumbs” direct data back to the user
Data is cached into the network
Interest: /nytimes.com/today
Publisher prefix:
/nytimes.com/
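The steps above can be sketched as a toy simulation. This is an illustration under simplifying assumptions, not the NDN forwarder: the `Node` class and its fields are hypothetical, forwarding is synchronous, and the Python call stack stands in for the per-hop breadcrumb state that lets data retrace the path of the Interest.

```python
class Node:
    """Toy node: forwarding table, content store, optional published data."""

    def __init__(self, label):
        self.label = label
        self.fib = {}        # name prefix -> next-hop Node (from prefix announcements)
        self.cache = {}      # content store: full name -> data
        self.published = {}  # data this node itself publishes
        self.served = 0      # Interests answered from self.published

    def next_hop(self, name):
        # Longest-prefix match on whole name components.
        parts = name.strip("/").split("/")
        for i in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:i])
            if prefix in self.fib:
                return self.fib[prefix]
        return None

    def on_interest(self, name):
        if name in self.cache:            # answered from the in-network cache
            return self.cache[name]
        if name in self.published:        # answered by the publisher
            self.served += 1
            return self.published[name]
        hop = self.next_hop(name)
        if hop is None:
            return None
        data = hop.on_interest(name)      # the return path plays the role of breadcrumbs
        if data is not None:
            self.cache[name] = data       # cache the data on the way back
        return data

publisher = Node("publisher")
publisher.published["/nytimes.com/today"] = b"front page"
router = Node("router")
router.fib["/nytimes.com"] = publisher
alice, bob = Node("alice"), Node("bob")
alice.fib["/nytimes.com"] = router
bob.fib["/nytimes.com"] = router

first = alice.on_interest("/nytimes.com/today")
second = bob.on_interest("/nytimes.com/today")
print(first, second, publisher.served)
```

In this sketch the second Interest is satisfied from the router's cache, so the publisher answers only once.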
-
5
Security and Provenance
NDN data is unique and immutable
In NDN you sign the data with a digital signature..
..so the users know when they get bad data!
A digital certificate identifies the publisher
..so the users can verify provenance!
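A rough sketch of why a signature over name plus content lets a user detect bad data. Assumption, stated loudly: the Python standard library has no public-key signatures, so this uses a keyed HMAC as a symmetric stand-in. A real deployment would use a public-key signature tied to the publisher's certificate. The key, the packet layout, and the example name are all hypothetical.

```python
import hashlib
import hmac

PUBLISHER_KEY = b"hypothetical-demo-key"  # stand-in for the publisher's signing key

def _tag(name: str, content: bytes, key: bytes) -> bytes:
    # Bind the name and the content together under one tag.
    return hmac.new(key, name.encode() + b"\0" + content, hashlib.sha256).digest()

def make_packet(name: str, content: bytes, key: bytes) -> dict:
    return {"name": name, "content": content, "signature": _tag(name, content, key)}

def verify(packet: dict, key: bytes) -> bool:
    expected = _tag(packet["name"], packet["content"], key)
    return hmac.compare_digest(expected, packet["signature"])

packet = make_packet("/CMIP5/example/data", b"temperature values", PUBLISHER_KEY)
tampered = dict(packet, content=b"altered values")
print(verify(packet, PUBLISHER_KEY), verify(tampered, PUBLISHER_KEY))
```

Because the tag covers the name as well as the content, changing either one makes verification fail.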
-
6
The Power of Names
Data can be retrieved from the closest node
..because the network knows where it is
Data can be cached at the routers
..because the network can recognize similar requests
Data can be firewalled precisely
allow /CMIP5/*
Data names can be predicted
/CMIP5/…/metadata, /CMIP6/…/metadata
/CMIP5/…/simulator-params
/CMIP5/…/subset/[varA,varB,..]
..and much more!
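A small sketch of what "firewalled precisely" could look like for a rule such as allow /CMIP5/*. The rule format, the default-deny policy, and the function names are assumptions made for illustration. Matching is done on whole name components, so /CMIP5extra does not slip through a /CMIP5 rule.

```python
def components(name: str) -> list:
    return [c for c in name.split("/") if c]

def matches(pattern: str, name: str) -> bool:
    """True if name falls under a pattern like '/CMIP5/*' (the prefix itself also matches)."""
    prefix = pattern[:-2] if pattern.endswith("/*") else pattern
    want = components(prefix)
    return components(name)[:len(want)] == want

RULES = [("allow", "/CMIP5/*")]  # the single rule shown on the slide

def decide(name: str, rules=RULES, default: str = "deny") -> str:
    # First matching rule wins; anything unmatched gets the assumed default.
    for action, pattern in rules:
        if matches(pattern, name):
            return action
    return default

print(decide("/CMIP5/output/MOHC"), decide("/CMIP6/output"))
```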
-
7
A Skeleton ESGF
[Figure: Publisher, Scientist, ESGF nodes 1-3, and data storage, connected through NDN]
(1) Publish dataset names
(2) Sync changes
(3) Query for dataset names
(4) Retrieve data
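The four numbered steps in the figure can be sketched as a toy catalog. Everything here is hypothetical (class name, methods, dataset names). Sync is modeled as a plain set union between two nodes, and retrieval simply looks for a node that holds the data, which stands in for the network locating it by name.

```python
class CatalogNode:
    def __init__(self, label):
        self.label = label
        self.names = set()   # dataset names known to this node
        self.storage = {}    # dataset name -> data held locally

    def publish(self, name, data):        # (1) publish dataset names
        self.names.add(name)
        self.storage[name] = data

    def sync_with(self, other):           # (2) sync changes
        merged = self.names | other.names
        self.names, other.names = set(merged), set(merged)

    def query(self, prefix):              # (3) query for dataset names
        return sorted(n for n in self.names if n.startswith(prefix))

def retrieve(name, nodes):                # (4) retrieve data
    for node in nodes:
        if name in node.storage:
            return node.storage[name]
    return None

node1, node2, node3 = CatalogNode("node1"), CatalogNode("node2"), CatalogNode("node3")
node1.publish("/CMIP5/example/dataset-a", b"a")
node1.sync_with(node2)
node2.sync_with(node3)

found = node3.query("/CMIP5/")
data = retrieve(found[0], [node1, node2, node3])
print(found, data)
```

After the two sync calls, a scientist can discover the name at node 3 even though only node 1 stores the data.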
-
8
Science NDN Testbed
2012 NSF CC-NIE campus infrastructure award
10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)
Currently ~50TB of CMIP5, ~20TB of HEP data
-
9
ESGF Request Patterns
LBL node
Request aggregation: 6 hours
Up to eight simultaneous users request the same dataset
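A minimal sketch of this kind of aggregation: bucket a request log into 6-hour windows and count requests per dataset in each window. The log format and the sample entries are invented for illustration; they are not the LBL node data.

```python
from collections import defaultdict

WINDOW_SECONDS = 6 * 3600  # 6-hour aggregation window, as on the slide

def aggregate(log, window=WINDOW_SECONDS):
    """log: iterable of (timestamp_seconds, dataset_name).
    Returns {(window_index, dataset_name): request_count}."""
    buckets = defaultdict(int)
    for ts, dataset in log:
        buckets[(ts // window, dataset)] += 1
    return dict(buckets)

# Hypothetical sample log: (seconds since start, dataset name).
sample_log = [(0, "/CMIP5/a"), (3600, "/CMIP5/a"), (7200, "/CMIP5/b"), (30000, "/CMIP5/a")]
counts = aggregate(sample_log)

# Requests beyond the first in each bucket are candidates for aggregation or caching.
repeats = sum(c - 1 for c in counts.values())
print(counts, repeats)
```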
-
10
xrootd Request Patterns
Will caching work in xrootd requests?
Seven-day log of xrootd data access
• 115K unique records
• 10 min granularity
• Avg file size: 2GB
• Hits at dataset level
Up to 1000 duplicate hits!
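One way to explore the caching question is to replay a request log through a simulated cache and measure the hit ratio. The sketch below assumes an LRU replacement policy and a capacity counted in datasets; both are illustrative choices, and the request sequence is made up rather than taken from the xrootd log.

```python
from collections import OrderedDict

def lru_hit_ratio(requests, capacity):
    """Replay dataset names through an LRU cache; return hits / total requests."""
    cache = OrderedDict()
    hits = 0
    for name in requests:
        if name in cache:
            hits += 1
            cache.move_to_end(name)        # mark as most recently used
        else:
            cache[name] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(requests) if requests else 0.0

# Hypothetical dataset-level request sequence.
requests = ["d1", "d2", "d1", "d3", "d1", "d2"]
small = lru_hit_ratio(requests, capacity=2)
large = lru_hit_ratio(requests, capacity=3)
print(small, large)
```

Running the same replay with different capacities gives a first estimate of how much cache a router would need for a given workload.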
-
11
Our Vision
-
12
Conclusions
NDN encourages common data access methods where IP encourages common host access methods
NDN encourages interoperability at the content level
NDN unifies scientific data access methods
Eliminates repetition of functionality
Adds significant security leverage
Rewards structured naming
-
13
For More Info
http://named-data.net
http://github.com/named-data