named data networking in scientific applications · 2016. 10. 24. · cc* nsf workshop, oct 20,...

14
NAMED DATA NETWORKING IN SCIENTIFIC APPLICATIONS Christos Papadopoulos Colorado State University CC* NSF Workshop, Oct 20, 2016 Work supported by NSF #1345236 and #13410999

Upload: others

Post on 31-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • NAMED DATA NETWORKING IN

    SCIENTIFIC APPLICATIONS

    Christos Papadopoulos

    Colorado State University

    CC* NSF Workshop, Oct 20, 2016 Work supported by NSF #1345236 and #13410999

  • 1

    CMIP5 Data Retrieval

    HTTP://someESGFnode:/CMIP5/output/MOHC/HadCM3/dec

    adal1990/day/atmos/tas/r3i2p1/tas_Amon_HADCM3_

    historical_r1i1p1_185001-200512.nc

    http://someesgfnode/

  • 2

    What if only used the name?

    HTTP://someESGFnode:/CMIP5/output/MOHC/HadCM3/dec

    adal1990/day/atmos/tas/r3i2p1/tas_Amon_HADCM3_

    historical_r1i1p1_185001-200512.nc

    2

    http://someesgfnode/

  • 3

    A Named Data Network (NDN)

    The main idea: Name the data, not the hosts!

    ..so you just tell the network what you want..

    ..and let the network find it for you

  • 4

    NDN Operation

    Publishers push hierarchical name prefixes into the network

    Users send Interests that follow path to published prefix

    “Brea d crum b s” d irect data back to the user

    Data is cached into the network

    Interest: /nytimes.com/today

    Publisher prefix:

    /nytimes.com/

  • 5

    NDN data is unique and

    immutable

    In NDN you sign the data

    with a digital signature..

    ..so the users know when

    they get bad data!

    Security and Provenance

    A digital certificate

    identifies the publisher

    ..so the users can verify

    provenance!

  • 6

    The Power of Names

    Data can be retrieved from the closest node

    ..because the network knows where it is

    Data can be cached at the routers

    ..because the network can recognize similar requests

    Data can be firewalled precisely

    allow /CMIP5/*

    Data names can be predicted

    /CM IP5/… /m etadata, /CM IP6/… /m etadata

    /CM IP5/… /sim ulator -params

    /CM IP5/… /subset/[varA,varB,..]

    ..and much more!

    6

  • 7

    A Skeleton ESGF

    NDN

    ESGF node 1

    Data storage

    Data storage

    (1)Publish Dataset

    names

    (3) Query for

    Dataset names

    Publisher

    (4) Retrieve data

    ESGF node 2

    (2) Sync changes

    Scientist

    ESGF node 3

  • 8

    Science NDN Testbed

    2012 NSF CC-NIE campus infrastructure award

    10G testbed (courtesy of ESnet, UCAR, and CSU Research LAN)

    Currently ~50TB of CMIP5, ~20TB of HEP data

  • 9

    ESGF Request Patterns

    9

    LBL node

    Request aggregation:

    6hours

    Up to eight simultaneous

    users request the same

    dataset

  • 10

    xrootd Request Patterns

    10

    Will caching work in xrootd requests?

    Seven day log of xrootd

    data access • 115K unique records • 10 min granularity • Avg file size: 2GB • Hits at dataset level Up to 1000 duplicate

    hits!

  • 11

    Our Vision

    11

  • 12

    Conclusions

    NDN encourages common data access methods where

    IP encourages common host access methods

    NDN encourages interoperability at the content level

    NDN unifies scientific data access methods

    Eliminates repetition of functionality

    Adds significant security leverage

    Rewards structured naming

  • 13

    For More Info

    [email protected]

    http://named-data.net

    http://github.com/named-data

    http://named-data.net/http://named-data.net/http://named-data.net/http://github.com/named-datahttp://github.com/named-datahttp://github.com/named-data