
Shelter from the Storm

Building a Safe Archive in a Hostile World

SCOOP Goal

• SURA-funded Coastal Modeling Project

• Want to develop the community’s cutting-edge techniques so that they are ready for use in tomorrow’s production systems

• For example, automatic verification of storm/surge models against observed data, to help improve the models

CCT Goals

• One of CCT’s key research outputs is software

• Want this software to be of good quality and robust

• Want re-use of software across projects

• Also want software to be picked up by external users, as well as collaborators

The SCOOP Archive

• Need to archive lots of files
– Atmospheric models (MM5, GFDL)
– Hydrodynamic models (ADCIRC, SWAN, etc.)
– Observational data (sensor data, buoys)

• Requirements poorly defined:
– How much data? Don’t know
– How long should we keep it for? Don’t know

• Have to interface with bespoke data transport mechanisms (LDM, Unidata’s Local Data Manager)

• How to achieve our goals under these conditions?!

Basic Archive Operation

Upload:

1. Client signals they want to upload some files (names are given)

2. Archive tells the client where to upload them to (transaction handles)

3. Client uploads files (independent of archive)

4. Client tells archive it’s done

5. Archive creates the logical filenames

• Use “upload” tool for this
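
A minimal client-side sketch of this handshake; the types, stub bodies, file names, and URLs are assumptions, and only the step ordering and the fileUploadBegin/fileUploadEnd names (see “Operations on Service” below) come from the slides:

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical client-side stand-ins for the archive service.
struct Transaction {
    std::string handle;                   // transaction handle from the archive
    std::vector<std::string> uploadUrls;  // where the client should put each file
};

Transaction fileUploadBegin(const std::vector<std::string>& names) {
    Transaction tx;
    tx.handle = "tx-001";                 // pretend handle
    for (const std::string& n : names)
        tx.uploadUrls.push_back("http://archive.example.org/stage/" + n);
    return tx;
}

bool transferFile(const std::string& localFile, const std::string& url) {
    std::cout << "uploading " << localFile << " to " << url << "\n";
    return true;                          // pretend the transfer succeeded
}

bool fileUploadEnd(const std::string& handle) {
    std::cout << "closing transaction " << handle << "\n";
    return true;
}

int main() {
    std::vector<std::string> files = {"mm5_run1.nc", "adcirc_surge.nc"}; // made-up names

    // 1-2. Signal the upload; the archive answers with a transaction
    //      handle and the locations to upload to.
    Transaction tx = fileUploadBegin(files);

    // 3. Upload the files directly, independently of the archive service.
    for (std::size_t i = 0; i < files.size(); ++i)
        transferFile(files[i], tx.uploadUrls[i]);

    // 4-5. Tell the archive we are done; it creates the logical filenames.
    fileUploadEnd(tx.handle);
    return 0;
}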

Basic Archive Operation

Download:

1. Clients use the catalog service to discover/search for logical filenames

2. Clients talk to the RLS server to get physical URLs

3. Interact with physical URLs directly

• Can use “getdata” CLI tool to encapsulate this

• Also, there are portal pages...
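
A matching sketch of the download path; the catalog and RLS calls below are simulated stand-ins, not the real catalog or Globus RLS client APIs:

#include <iostream>
#include <string>
#include <vector>

// Simulated stand-ins for the catalog service and the RLS server.
std::vector<std::string> catalogSearch(const std::string& query) {
    (void)query;                                    // a real client would send this
    return {"SCOOP/adcirc/2005/surge_run1.nc"};     // hypothetical logical name
}
std::vector<std::string> rlsLookup(const std::string& logicalName) {
    return {"http://archive.example.org/data/" + logicalName}; // hypothetical URL
}
bool fetch(const std::string& url) {
    std::cout << "downloading " << url << "\n";
    return true;
}

int main() {
    // 1. Discover logical filenames via the catalog service.
    for (const std::string& lfn : catalogSearch("model=ADCIRC")) {
        // 2. Resolve each logical name to physical URLs via RLS.
        for (const std::string& url : rlsLookup(lfn)) {
            // 3. Interact with the physical URL directly (this is what “getdata” wraps).
            if (fetch(url)) break;                  // first working replica is enough
        }
    }
    return 0;
}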

Operations on Service

• fileUploadBegin - for starting an upload

• fileUploadEnd - for saying that an upload is completed

• logicalNameRetry

• removeDeadTransactions

• closeArchive
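
Gathered into one interface, the service might look like the sketch below; only the five operation names come from this slide, while the signatures and comments are assumptions:

#include <string>
#include <vector>

// Sketch of the archive's service interface (signatures are guesses).
class ArchiveServiceInterface {
public:
    virtual ~ArchiveServiceInterface() {}

    // Start an upload: return a transaction handle telling the
    // client where to put the named files.
    virtual std::string fileUploadBegin(const std::vector<std::string>& fileNames) = 0;

    // Finish an upload: the archive files the data and creates logical names.
    virtual bool fileUploadEnd(const std::string& transactionHandle) = 0;

    // Retry logical-name registrations that failed earlier (see "Robust Code").
    virtual void logicalNameRetry() = 0;

    // Reap transactions whose clients never called fileUploadEnd.
    virtual void removeDeadTransactions() = 0;

    // Shut the service down cleanly, persisting state for the next restart.
    virtual void closeArchive() = 0;
};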

Distributed Software

• Some services hosted externally
• Can’t assume our machine or s/w never fails
• Need to retain state of our service on restart

Robust Code

• Don’t assume our service will remain “up”
=> Keep all internal state in a database
=> Reload internal state on a restart

• Don’t assume external services are always “up”
=> Design loosely coupled services
=> Store pending interactions in the database
=> Retry these periodically

• Do “stress testing” on the service during the testing/debug cycle
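
A minimal sketch of the retry pattern, assuming a pending-interactions table in the database; the types and function names here are illustrative, not the archive's actual schema:

#include <iostream>
#include <string>
#include <vector>

struct PendingAction { long id; std::string description; };

// Stand-ins for the real database layer.
std::vector<PendingAction> loadPendingActions() {
    return { {1, "register logical name with RLS"} };   // pretend rows
}
bool performAction(const PendingAction& a) {
    std::cout << "retrying: " << a.description << "\n";
    return true;                                        // pretend it worked
}
void markActionDone(long) {}

// Periodic retry pass: every pending interaction lives in the database,
// so a crash or restart loses nothing; we just sweep the table again.
void retryPendingActions() {
    for (const PendingAction& a : loadPendingActions()) {
        if (performAction(a))        // external service may still be down...
            markActionDone(a.id);    // ...if so, the row waits for the next sweep
    }
}

int main() {
    retryPendingActions();           // in the service this would run on a timer
    return 0;
}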

int logname_initialize(void);

void logname_remove(void);

bool logname_create_logfile(std::string logical_name,
                            bool name_is_final,
                            const std::vector<std::string>& urls);

bool logname_delete_logfile(std::string logical_name);

ulong logname_upload_pending_lognames(ulong max_rows,
                                      ulong& total_found,
                                      ulong& max_rows_used);

Keep the internal APIs Simple
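
A usage sketch, assuming these declarations live in a header; the header name, return conventions, and values are guesses:

#include <string>
#include <vector>
#include "logname.h"   // assumed header holding the declarations above

int main() {
    if (logname_initialize() != 0)    // assumed: returns 0 on success
        return 1;

    std::vector<std::string> urls;
    urls.push_back("http://archive.example.org/data/surge_run1.nc"); // made up
    logname_create_logfile("SCOOP/adcirc/surge_run1.nc",             // made up
                           true /* name_is_final */, urls);

    logname_remove();                 // assumed teardown counterpart to initialize
    return 0;
}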

Encouraging Reuse

• SCOOP Archive has lots of strange rules about filenames and metadata

• During design and implementation, keep thinking:
– Is this for the SCOOP project, or
– Is this a generic feature?

• Use good O-O design to keep SCOOP code separate from archive code

Keeping SCOOP to one side...

class ArchiveFilingLogic {
public:
    // Called by the default moveFiles implementation
    virtual bool createPhysicalPath(std::string physicalPath);

    virtual bool moveFiles(std::vector<std::string>& fileNames,
                           std::vector<std::string>& missingFiles,
                           std::string stagePath,
                           std::string physicalPath);

    virtual void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                          std::map<std::string,std::string>& directories,
                                          std::map<std::string,std::string>& errors) = 0;

    virtual std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                          std::string physicalPath) = 0;
};
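
A project-specific subclass then holds all the SCOOP filename rules, while the generic archive only ever sees the ArchiveFilingLogic interface. A sketch, given the declaration above (the subclass name and the filing rules below are invented for illustration):

#include <map>
#include <string>
#include <vector>

// Hypothetical SCOOP subclass: project-specific rules live here only.
class ScoopFilingLogic : public ArchiveFilingLogic {
public:
    void physicalLocationForFiles(const std::vector<std::string>& filenames,
                                  std::map<std::string,std::string>& directories,
                                  std::map<std::string,std::string>& errors) {
        for (const std::string& f : filenames) {
            // Invented rule: SCOOP files sort into per-model directories.
            if (f.find("adcirc") != std::string::npos)
                directories[f] = "/archive/scoop/adcirc";
            else if (f.find("mm5") != std::string::npos)
                directories[f] = "/archive/scoop/mm5";
            else
                errors[f] = "unrecognised SCOOP filename";
        }
    }

    std::vector<std::string> logicalNamesForFiles(const std::vector<std::string>& filenames,
                                                  std::string physicalPath) {
        std::vector<std::string> names;
        for (const std::string& f : filenames)
            names.push_back("SCOOP" + physicalPath + "/" + f); // invented naming rule
        return names;
    }
};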

New Requirements

• Handling common compression formats
• Producing subsets of data (predictively)
• Tracking data before it is ingested
• Notifying people when data arrives
• Transforming data to other formats
• Generating analytical data “on the fly”
• Federating data across multiple locations

• Good initial design will simplify all this...

Highest Priority...

• Archive machine running out of space
• People have started to rely on the service

• So, currently we are uploading copies of all data to the SDSC data center, using SRB (the Storage Resource Broker)

• Now need to keep track of URLs on physically distributed resources

• But SRB can help with some of the other requirements...

Any Questions?