ADSM’s HSM at ECMWF (source: tsm-symposium.oucs.ox.ac.uk/1999/papers/francis.pdf)

F. Dequenne, September 1999



Introduction.

What is ECMWF?
• European Centre for Medium-Range Weather Forecasts.
• International organisation founded by 18 European national weather organisations.
• Our job:
– Daily production of 10-day global weather forecasts.
– Meteorological research in a strong computational environment.
– Provision of access to a large database of meteorological information.
– ...

[Figure: Estimated growth in stored data, in terabytes (axis from 0 to 1400), by year end, 1998 to 2003.]

Data Handling System
• Based on several ADSM servers running on RS/6000 and SP nodes.
• 2 main components:
– Meteorological Archive and Retrieval System.
  • Application geared towards efficient storage and retrieval of meteorological data. 90 TB + backups.
– ECFS.
  • General purpose file archival system.

DHS Services: ECFS
• Ad-hoc file archiving system.

ECFS set-up.
[Diagram: two ADSM servers, Dartagnan (R40, 6 CPUs) and Constance (R40, 4 CPUs), two 3494 robots with 16 3590-B drives each, and the ECFS clients. The HSM file systems shown are listed below.]

File system   Size      Inodes
Shared        8.3 TB    1,068 K
Season        2.7 TB    460 K
Eras          3.3 TB    152 K
Users         7.5 TB    745 K
Users2        1.8 TB    265 K
Tmp           0.7 TB    38 K
Ms            2.5 TB    276 K
Op            3.2 TB    558 K
Rdx           0.02 TB   78 K

Characteristics.
• 30 TB of data (+ backup copy)
• 3 million files
• File sizes: from a few bytes to 2 GB (avg. 10 MB).
• 2 RS/6000 R40
• 2 3494 libraries
• 32 Magstar (3590-B) tape drives
• 5500 Magstar tapes.

Characteristics.
• Some of the file systems are large.
• Active and dynamic file systems.
– 1000 transactions/hour, 15 GB/hour transfers.
– 50 GB added to the archives on a daily basis.
– Some of the data is volatile, some will stay forever.
• 24/7 availability.
• Intense retrieval activity.


HSM, A crash course.

What is it?
• An ADSM client, composed of a kernel extension and a set of utilities.
• Uses VFS to “enrich” the functionality of a standard JFS file system.
[Diagram: files are written and read in the file system; migration to ADSM is automatic or on demand; migrated files are recalled on demand or when accessed.]

Automatic migration.
[Figure: file system usage over time, with Low, High and Full levels marked. Migration to ADSM starts when the high mark is reached and ends when the low mark is reached.]
• For each HSM file system, a site can define high and low water marks that control automatic migration to ADSM.
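The water mark behaviour described above can be sketched in a few lines. This is only an illustration, not the HSM client's actual logic: the function name `plan_migration`, the percentage-based sizes and the sample file names are all assumptions made for the example.

```python
def plan_migration(used_pct, high_pct, low_pct, candidates):
    """Return the files to migrate for one automatic migration run.

    candidates: list of (path, size_pct) pairs, already in priority order;
    size_pct is the share of the file system each file occupies.
    """
    if used_pct < high_pct:
        return []              # high water mark not reached: nothing to do
    selected = []
    for path, size_pct in candidates:
        if used_pct <= low_pct:
            break              # low water mark reached: migration ends
        selected.append(path)
        used_pct -= size_pct   # space freed once the file is migrated
    return selected


if __name__ == "__main__":
    sample = [("a.dat", 10.0), ("b.dat", 8.0), ("c.dat", 5.0)]
    print(plan_migration(92.0, high_pct=90.0, low_pct=75.0, candidates=sample))
```

With usage at 92%, a high mark of 90% and a low mark of 75%, the sketch selects the first two files and stops once usage drops to 74%.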

Some HSM commands.
• dsmautomig
– Performs automatic migration to ADSM on high water mark or file system full conditions.
• dsmreconcile
– Creates a list of the files that can be migrated by dsmautomig.
• dsmmigrate
– Allows users or applications to migrate files on request.
• dsmrecall
– Allows users or applications to re-stage files without opening them.

Some HSM concepts.
• Stubs
– When a file is migrated to ADSM, a placeholder, or stub, containing metadata about the file as well as the beginning of the file, is left in the file system.
• “Small file”
– If a file is smaller than a stub, it will never be migrated.
• Candidates
– Files that have been flagged by dsmreconcile as being “migratable” by dsmautomig. They are sorted by weight and stored in a candidate list.
• Premigration
– Files with a copy in both the file system and ADSM.
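The interplay between stubs, small files and the weighted candidate list can be illustrated as follows. The stub size and the weighting formula are hypothetical: the slides give neither, so `STUB_SIZE` and `weight` below are placeholders chosen only to make the example concrete.

```python
from dataclasses import dataclass

STUB_SIZE = 4096  # bytes; hypothetical value, the real stub size is not given


@dataclass
class FileInfo:
    path: str
    size: int            # bytes
    idle_days: int       # days since last access
    migrated: bool = False


def weight(f):
    # Hypothetical weighting: favour large files that have been idle longest.
    return f.size * (1 + f.idle_days)


def build_candidate_list(files):
    """Keep resident files at least as large as a stub, heaviest first."""
    eligible = [f for f in files if not f.migrated and f.size >= STUB_SIZE]
    return sorted(eligible, key=weight, reverse=True)
```

A file smaller than the stub never enters the list, which is why many small files bring no space benefit in an HSM file system.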


HSM, The challenges.

Main benefits of HSM.
• Access to a virtually huge disk space.
• Easy to deploy… at least initially.
• Appears as a standard JFS file system
– but beware of catches (e.g. special characters).
• Applications using HSM only need to understand disk I/O.

Main issues.
• However, in our environment, we have been confronted with several challenges:
– Auto migration.
– Dsmreconcile challenges.
– Administration.

Dsmautomig issues
• Tool in charge of clearing space in a file system when the file system reaches its high water mark.
• Victim of the retrieval centric ADSM server.
• Single threaded.
• Requires a candidate list.

Retrieval centric ADSM server
• ADSM attaches “hardcoded” priorities to sessions and processes.
• Retrievals have higher priority than saves.
• If tape drives are involved, lower priority sessions can be cancelled to allow higher priority ones to run.
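A toy model of this pre-emption rule is sketched below. It is not ADSM's scheduler: the numeric priorities, the session kinds and the function `request_drive` are assumptions that only capture the two statements above (retrievals outrank saves, and a lower priority session can be cancelled to free a drive).

```python
PRIORITY = {"retrieve": 2, "store": 1}  # assumed values: retrievals outrank saves


def request_drive(running, new_kind, drives):
    """Try to give a tape drive to a new session.

    running: list of session kinds currently holding a drive.
    Returns (sessions holding drives afterwards, cancelled session or None).
    """
    if len(running) < drives:
        return running + [new_kind], None          # a drive is free
    victim = min(running, key=PRIORITY.get)        # lowest priority holder
    if PRIORITY[victim] < PRIORITY[new_kind]:
        remaining = list(running)
        remaining.remove(victim)                   # cancel it to free a drive
        return remaining + [new_kind], victim
    return running, None                           # the new session must wait
```

With every drive busy, an incoming retrieve cancels a running store, while an incoming store never displaces a retrieve.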

[Diagram: Retrieval centric server... Store requests fill the HSM file system; when its high water mark is reached, automig starts and sends data to the ADSM disk storage pool; when the disk pool reaches its high threshold, ADSM migrates data to tape.]

[Diagram: Retrieval centric server... Retrieve requests (Recall 1, Recall 2, Recall 3) arrive at ADSM while store requests, automig and the migration of the disk storage pool to tape are in progress.]

Retrieval centric server
• In a busy system,
• with a mixed load of HSM retrieves and archives,
• the file system will fill up, and automig will start.
• However, the underlying “mover to tapes”, sessions or processes, will be cancelled in order to allow retrievals (recalls) to take place…
• which will accelerate the filling of the file system.
• Eventually the whole file system will fill up.

Our workaround.
• We “solved” this by enforcing a low number of dsmrecall daemons, so that:
– the filling of the file system as a result of recalls is not overpowering;
– some tape drives remain available for writing.
• We also make use of “migrate on close” when big files are being retrieved.
• This can generate large retrieval queues.

Dsmautomig is single threaded.
• One file at a time is transferred from the HSM file system to ADSM. Rates seen are low (2-4 MB/s).
• At times of peak activity, automig cannot empty a file system fast enough when it is being bombarded by store and retrieve requests.
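A quick calculation shows why a single stream struggles. It uses the 2-4 MB/s rates above together with the 50 GB/day and 15 GB/hour figures quoted in the Characteristics slide, and assumes 1 GB = 1000 MB for simplicity. The 15 GB/hour figure mixes stores and retrieves, so the comparison is an order of magnitude only.

```python
def hours_to_migrate(gigabytes, rate_mb_per_s):
    """Hours needed to move the given volume over one stream."""
    return gigabytes * 1000 / rate_mb_per_s / 3600


def gb_per_hour(rate_mb_per_s):
    """Volume one stream can move in an hour."""
    return rate_mb_per_s * 3600 / 1000


if __name__ == "__main__":
    for rate in (2, 4):
        print(f"{rate} MB/s: {hours_to_migrate(50, rate):.1f} h for 50 GB, "
              f"{gb_per_hour(rate):.1f} GB/hour")
```

At 2 MB/s the daily 50 GB takes about 7 hours of uninterrupted migration, and even at 4 MB/s one stream moves about 14.4 GB/hour, just below the 15 GB/hour of transfers seen on the system.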

A multi threaded dsmautomig.
ECMWF wrote its own dsmautomig: 500 lines of Perl script.
[Diagram: inputs are the candidate list, the standard file system config table and a special config file. The candidate list is divided into mini candidate lists (Mini CL). Steps shown:]
> dsmls
> Spawn dsmmigrates
> dsmls
> Spawn dsmmigrate -p
> Rewrite what is left of the candidate list.
> Possibly start a dsmreconcile.

Multithreaded dsmautomig pros and cons
Pros:
• Up to seven concurrent streams.
• Automig runs really fast.
• We clear the candidate list of “dead” entries (e.g. deleted or manually migrated files).
• Efficient call to dsmreconcile easily integrated.
Cons:
• Requires disk staging at ADSM level.
• Uses unpublished interfaces and table structures.
• Needs to be maintained.
Humm
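The general shape of such a multi-stream migrator can be sketched as below. This is not ECMWF's Perl script, which is not reproduced in the slides: the sketch is in Python, and `migrate_one` and `still_exists` are hypothetical callables standing in for whatever actually migrates a file and checks that a candidate is still live.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_STREAMS = 7  # the slides quote up to seven concurrent streams


def split_candidates(candidates, streams=MAX_STREAMS):
    """Deal the candidate list round-robin into at most `streams` mini lists."""
    streams = max(1, min(streams, len(candidates)))
    return [candidates[i::streams] for i in range(streams)]


def migrate_all(candidates, migrate_one, still_exists):
    """Drop dead entries, then migrate the rest over concurrent streams."""
    live = [c for c in candidates if still_exists(c)]
    mini_lists = split_candidates(live)

    def run(mini_list):
        # One stream works through its own mini candidate list in order.
        return [migrate_one(path) for path in mini_list]

    with ThreadPoolExecutor(max_workers=MAX_STREAMS) as pool:
        batches = list(pool.map(run, mini_lists))
    return [result for batch in batches for result in batch]
```

Filtering out dead entries before splitting mirrors the clean-up of deleted or manually migrated files listed among the pros; dealing entries round-robin keeps the heaviest candidates spread across streams.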


HSM’s Achilles heel

Dsmreconcile
• Indispensable utility used to perform 2 critical services.
– 1) Create a candidate list.
  • Without it, automig does not know which files to migrate when the file system reaches the high threshold.

Dsmreconcile
• Indispensable utility used to perform 2 critical services.
– 1) Create a candidate list.
– 2) Ensure coherence between the metadata existing
  • at file system level (e.g. stubs, premig DB, inodes)
  • in the ADSM DB (migrated files exist, backup has been done).

Dsmreconcile
• Indispensable utility used to perform 2 critical services.
– 1) Create a candidate list.
– 2) Ensure coherence between file system and ADSM metadata.
  • Required to purge at ADSM level the data associated with files that have been deleted.
  • Needed to recover from disaster/accident scenarios.

[Diagram: dsmreconcile phases. DSMRECONCILE reads the premigration DB and the inodes (either stubs or files) from the file system, and the lists of migrated files and backed-up files from the ADSM DB. It deletes expired files in ADSM and produces the candidate list.]

Reconcile issues.
• Full reconcile of a big file system:
– Between 10 and 20 hours (sometimes over 1 day).
– Resource intensive.
– BLOCKS AUTO MIGRATION
• UNWORKABLE IN AN ENVIRONMENT WHERE
– SEVERAL RECONCILES MAY NEED TO RUN DURING A SINGLE DAY,
– MIGRATIONS NEED TO BE PERFORMED ON A REGULAR BASIS.

Candidate list reconciles
• dsmreconcile -c
• Candidate list reconciles are shorter, but
– IBM’s dsmautomig does not know about them. If you run out of candidates during a migration… good luck.
– Full reconciles need to be run from time to time.

[Figure: Reconciles timings. Two plots of run time in hours (0 to 24) against the number of files in the file system (0 to 1,200,000), one for candidate list reconciles and one for full reconciles. Dsmreconcile runs performed between 17/8/99 and 21/9/99.]

Full reconcile phases
• The premigration database is read.
  • Not very significant. A few minutes at the most.
• The ADSM list of migrated files is obtained.
  • This can take several hours.
• The list of ADSM backed-up files is obtained.
  • We do not back up to the same ADSM server, so negligible.
• The file system is traversed.
  • This can take several hours.
• Stale premigrated entries are removed.
  • Not very significant.
• Expired files are removed from ADSM.
  • This can take several hours.


Full reconcile timings
[Figure: three plots of elapsed time in minutes (0 to 600): getting the migrated files list, against the size of the retrieved list (0 to 1,000,000); traversing the file system, against millions of files in the file system (0 to 1.2); expiring files in ADSM, against the number of files deleted (0 to 40,000).]
Example: one file system of 750,000 files, of which 650,000 are migrated and 7,500 need to be deleted:
5.5 hours + 4.5 hours + 2 hours. 11 hours run.
1.2 files per second!
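The per-phase rates implied by this example can be worked out directly. The mapping of the three durations to the three phases, in the order the phases are listed, is an assumption. Note also that the quoted durations add up to 12 hours against the 11 hour run reported, and the derived deletion rate is about 1.0 files per second against the quoted 1.2, so the slide's figures are presumably rounded.

```python
def files_per_second(files, hours):
    return files / (hours * 3600)


# (number of files handled, hours taken); mapping to phases is assumed
phases = {
    "getting the migrated files list": (650_000, 5.5),
    "traversing the file system": (750_000, 4.5),
    "expiring files in ADSM": (7_500, 2.0),
}

rates = {name: files_per_second(n, h) for name, (n, h) in phases.items()}
total_hours = sum(h for _, h in phases.values())

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:.1f} files/s")
    print(f"total: {total_hours} hours")
```

Whatever the exact rounding, the expiry phase stands out: it handles roughly one file per second, some thirty to forty times slower than the other two phases.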

How to avoid the reconcile trap?
• Only perform a full reconcile once or twice a week, during less busy times (night or weekend).
• Otherwise, recreate the candidate list by using the short version of reconcile (-c option).
• Limit the need for automig runs by using “dsmmigrate on close” (only possible if the user application co-operates).

Starting reconciles.
• Use crontab?
– Dsmreconcile and dsmautomig cannot run at the same time.
– Often, reconcile runs are aborted because automig is already running.
– Reconcile starts just before an automig is due…
• Too often, the file systems get full before a candidate list is recreated!

[Figure: When to start reconcile? Two plots of file system usage over time, with Low, High and Full levels marked. The first plot is labelled with reconciles started from crontab. The second is labelled “Started after automig”, “Candidate list reconcile” and “Crontab”.]
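The difference between a crontab-started reconcile and one chained after automig can be sketched with a simple lock. The class below is illustrative only: `HsmScheduler`, its method names and the `run_automig`/`run_reconcile` callables are assumptions, and the real utilities coordinate in their own way.

```python
import threading


class HsmScheduler:
    """Serialises automig and reconcile for one file system (illustrative)."""

    def __init__(self, run_automig, run_reconcile):
        self._lock = threading.Lock()
        self._run_automig = run_automig
        self._run_reconcile = run_reconcile
        self.log = []

    def cron_reconcile(self):
        # Crontab style: fires at a fixed time and gives up if automig is busy.
        if not self._lock.acquire(blocking=False):
            self.log.append("reconcile aborted")
            return False
        try:
            self._run_reconcile()
            self.log.append("reconcile done")
            return True
        finally:
            self._lock.release()

    def automig_then_reconcile(self):
        # Chained style: rebuild the candidate list right after automig ends,
        # when the file system is at its low mark and the list is depleted.
        with self._lock:
            self._run_automig()
            self.log.append("automig done")
            self._run_reconcile()
            self.log.append("reconcile done")
```

Chaining the candidate list reconcile to the end of automig removes the race in which a timed reconcile either collides with a running migration or starts just before one is needed.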

Impacts
• Blocks auto-migration for long periods of time.
• Limits the number of files stored in a file system.
• Forces us to migrate files straight after they are used.
• Does not allow us to run “normal” dsmautomig.

Other problems linked to reconcile
• Forced to keep the high water mark low.
• In some conditions, recalls and migrates can be locked out.
• Reconciles are delayed until they are really needed. This results in loss of candidate list prioritisation.

In an ideal world...
• The candidate list should be maintained dynamically.
• ADSM DB and HSM file space should stay synchronised.
• Use of the dsmreconcile utility should be limited to recovery scenarios.

Maintain the candidate list dynamically
[Diagram: an indexed candidate list with columns filename, weight, info and bck. It is updated by file closes, mv, rm, dsmmigrate, dsmrecalls and dsmc backups. Automig sorts the list and migrates files to ADSM.]
Could it be as simple as extending the VFS implementation to trap file closes and operations that logically imply an update of the candidate list?
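A minimal sketch of such an event-driven candidate list follows. It models only the filename and weight columns, and the class and handler names are assumptions; how the kernel extension would actually deliver these events is left open, as it is in the slide.

```python
class CandidateList:
    """Indexed candidate list kept current by file events (illustrative)."""

    def __init__(self):
        self._entries = {}  # filename -> weight

    def on_close(self, name, weight):
        self._entries[name] = weight    # new or modified file becomes a candidate

    def on_rm(self, name):
        self._entries.pop(name, None)   # deleted files leave the list

    def on_mv(self, old, new):
        if old in self._entries:
            self._entries[new] = self._entries.pop(old)

    def on_migrate(self, name):
        self._entries.pop(name, None)   # already in ADSM, no longer a candidate

    def on_recall(self, name, weight):
        self._entries[name] = weight    # back on disk, candidate again

    def next_candidates(self, count):
        """Return the `count` heaviest candidates, as automig would want them."""
        ranked = sorted(self._entries.items(), key=lambda kv: kv[1], reverse=True)
        return [name for name, _ in ranked[:count]]
```

Because every update is a dictionary operation, the list stays current at negligible cost and the only sorting happens when automig asks for candidates.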

Dsmreconcile: reducing the need for full reconciles.
The main reason to run a full reconcile:
Purging in ADSM the files that were deleted/modified in the file system.
Instead, could the following concept be used?
[Diagram: on rm, file update, ... the kernel extension adds an entry (ADSM internal name, expdate) to a delete-able list; a crontab initiated deleter processes the list against ADSM.]
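The delete-able list idea can be sketched as follows. The class name, the `delete_in_adsm` callable and the use of calendar dates for `expdate` are assumptions made for the example; the slide only names the two columns and the crontab initiated deleter.

```python
from datetime import date


class DeletableList:
    """Records ADSM objects whose files were removed or updated (illustrative)."""

    def __init__(self):
        self._entries = []  # (adsm_internal_name, expdate)

    def record(self, adsm_name, expdate):
        # Would be called when an rm or a file update is trapped.
        self._entries.append((adsm_name, expdate))

    def pending(self):
        return [name for name, _ in self._entries]

    def run_deleter(self, today, delete_in_adsm):
        """Crontab-style pass: delete every object whose expdate has passed."""
        due = [entry for entry in self._entries if entry[1] <= today]
        for adsm_name, _ in due:
            delete_in_adsm(adsm_name)
        self._entries = [entry for entry in self._entries if entry[1] > today]
        return [name for name, _ in due]
```

The deleter only touches objects that were explicitly recorded, so it never needs the full file system traversal or the full ADSM listing that make a full reconcile so slow.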

When full reconcile is due.
• Reconcile problem: it blocks activity to get a perfect image of ADSM and the file system.
• 2 solutions:
– Partial reconciles, that only look at part of a file system (difficult).
– Accept that the result of reconcile is ESSENTIALLY correct.

“Fuzzy” reconciles.
• Leaving a few inconsistencies between an HSM file system and its ADSM server is OK…
AS LONG AS
– only safe deletes are performed,
– the inconsistencies are rectified at a later run.

Administration issues.
• Configuration
• Day to day work.

Configuration.
• Defining the right set-up from the start is crucial.
– How many file systems?
– On which platform?
– Using which ADSM servers?
• Extremely difficult to change a posteriori.

Number of file systems
• Just a few? Then their size will grow too large, and they will become un-reconcilable.
• Many?
– More difficult to manage.
– How to split them so that load is balanced?
– Requires division and possibly inefficient use of disk cache.

ECMWF’s approach.
• Initially:
– One file system for users’ private data.
– One file system for data shared between users.
– Very few file systems for special projects.
– Each of these with a very large disk cache associated with it.
• Dsmreconcile defeated this approach.

ECMWF’s approach:
• Now:
– Fewer than 500,000 files in a file system.
  • Try to organise the file systems to balance transfer activity.
  • Keep similar types of data together.
– The users and shared file systems had to be split.
• But…
– Requires a front end diverting user requests to the appropriate file system.
– The split operation is very resource and time consuming.
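The front end mentioned above essentially maps a logical path to the HSM file system that now holds it. A minimal sketch, assuming a longest-prefix routing table, is shown below; the prefixes and mount points in `ROUTES` are invented for the example and do not describe ECFS.

```python
ROUTES = {  # hypothetical logical prefix -> HSM mount point
    "/users": "/hsm/users1",
    "/users/projects": "/hsm/users2",
    "/shared": "/hsm/shared",
}


def route(logical_path, routes):
    """Translate a logical path using the longest matching prefix."""
    matches = [prefix for prefix in routes
               if logical_path == prefix or logical_path.startswith(prefix + "/")]
    if not matches:
        raise KeyError(f"no HSM file system serves {logical_path}")
    prefix = max(matches, key=len)
    return routes[prefix] + logical_path[len(prefix):]
```

Keeping the mapping in one table means a later split only changes the table, not the paths users see.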

To split a file system…
[Diagram: data moves between two ADSM-backed file systems via (R)CP, in four steps.]
1. Recall the data
2. Copy it
3. Migrate the data
4. Delete old data
A lot of data moves!
> How to optimise tape mounts during recalls.
> Extra long reconciles.
> Severe disruptions to normal services.
> Needs to be done with minimal denial of service.

Day to day Administration.
– System visibility.
  • Not always obvious why a recall is delayed.
– Tame large surges in activity.
  • “Let’s store these few hundred GB in ECFS”… (some users)
  – … and leave an HSM file system completely overwhelmed!
– Load balancing.
  • Today's super active file system could be dead quiet for the next two weeks.
– Reorganisation of the file systems.
  • Move directory sub-trees between file systems, plan and organise for new file systems to be used.
– Group many small files in one large tar file.
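Grouping small files can be done with standard tools. The sketch below uses Python's `tarfile` module; the function name and the 1 MB size threshold are assumptions for illustration, and the archive is expected to be written outside the directory being bundled.

```python
import tarfile
from pathlib import Path


def bundle_small_files(directory, archive_path, max_size=1_000_000):
    """Pack every file under `directory` smaller than `max_size` bytes
    into one uncompressed tar archive; return the archived relative names."""
    directory = Path(directory)
    small = sorted(p for p in directory.rglob("*")
                   if p.is_file() and p.stat().st_size < max_size)
    names = [p.relative_to(directory).as_posix() for p in small]
    with tarfile.open(archive_path, "w") as tar:
        for path, name in zip(small, names):
            tar.add(path, arcname=name)
    return names
```

One large archive replaces many inodes with one, which helps both the reconcile times and the migration efficiency discussed earlier.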


Conclusion

What we learned.
• Better served by a home-made dsmautomig.
• Reduce the use of full reconciles as much as possible.
• Any single file system becomes difficult to manage over 1/2 million files, and unmanageable over 800,000.
• Get it right the first time. Reorganising an HSM environment is no easy task.
• Avoid keeping small files in an HSM file system as much as possible.
• Do not back up to the same server.

HSM works well...
• If the file systems are not very big.
• In a light/medium load environment.
• If batch utilities can run during off-peak periods.
• If recalls are not too frequent.

Where we are..
• 3.5 million files, 30 terabytes…
– 3 years ago, we never thought that we could stretch HSM to reach these peaks!
• However, we have reached the limits of what can be done with our current hardware and the version of HSM that we run today.
• This requires a lot of Tender Loving Care.

Future.
• Our R40s will be replaced next year by more powerful machines, providing better I/O bandwidth.
• By the end of next year, after 5 years of use of ADSM-based solutions, we will also re-evaluate the various HSM and archival solutions existing on the market, with a view to expanding or replacing our existing environment in 2001.