
Page 1: Repack and Tape Label Options

Tim Bell, Charles Curran, Gordon Lee

June 27th 2008

CERN - IT Department, CH-1211 Genève 23, Switzerland

www.cern.ch/it

Page 2: The Bulk Repack Problem

• IBM and Sun have new drives coming
– Aim for production at CERN in January
– Higher capacity (1TB per tape)
– Faster drives (up to 160MBytes/s)

• Require repacking to avoid buying new media and robot slots

• Current dataset
– 104 million files
– 15PB storage
– 39000 tapes

Page 3: Why are we copying ?

Vendor     Current   Future    Tapes at CERN   Delta Capacity   Cost to purchase
IBM        700GB     1000GB    9692            2.9PB            0.5MCHF
Sun 513    500GB     1000GB    14890           7.4PB            1.3MCHF
Sun 613    500GB     1000GB    15408           7.7PB            1.4MCHF
Total                                          18.0PB           3.2MCHF

• Cost to purchase is the additional media and slots required if we write at new densities but do not copy and recycle old tapes

• Adds up to a saving of 3.2M CHF

• With higher density and repack, media requirements for 2009 are covered

• Without higher density, 2 new 10,000 slot robots would be required in 2009
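The delta capacity column can be reproduced from the other columns. The sketch below is only a cross-check of the slide's arithmetic, not the authors' calculation: it assumes decimal units (1 PB = 1,000,000 GB) and that every cartridge listed is rewritten at the new density. The variable and function names are illustrative.

```python
# Figures copied from the table on this slide (capacities are per cartridge, in GB).
# Assumption: decimal units, 1 PB = 1,000,000 GB.
TAPES = {
    "IBM":     {"current_gb": 700, "future_gb": 1000, "count": 9692,  "cost_mchf": 0.5},
    "Sun 513": {"current_gb": 500, "future_gb": 1000, "count": 14890, "cost_mchf": 1.3},
    "Sun 613": {"current_gb": 500, "future_gb": 1000, "count": 15408, "cost_mchf": 1.4},
}


def delta_capacity_pb(row):
    """Extra capacity gained if every cartridge is rewritten at the new density."""
    return (row["future_gb"] - row["current_gb"]) * row["count"] / 1_000_000


deltas = {name: delta_capacity_pb(row) for name, row in TAPES.items()}
total_delta_pb = sum(deltas.values())
total_cost_mchf = sum(row["cost_mchf"] for row in TAPES.values())

for name, pb in deltas.items():
    print(f"{name}: {pb:.2f} PB")
print(f"Total: {total_delta_pb:.2f} PB, {total_cost_mchf:.1f} MCHF")
```

The per-vendor values round to the 2.9, 7.4 and 7.7 PB shown in the table. The unrounded sum is slightly above 18.0 PB, which suggests the slide total was obtained by adding the rounded rows.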

Page 4: Per-VO file sizes

[Figure: Average File Size on Tape per VO. Bars for alice, atlas, cms, compass, lhcb, na48, other and user; vertical axis in MB, 0 to 3500.]

• Some improvements in file sizes from LHC experiments over the past 6 months but no major revolution expected

• Current average is 154MBytes per file

Page 5: Per Tape Distribution

• Long tail up to 154,000 files per tape
• Only 25% of tapes have average file size >1 GByte
• Projected year end 2008 based on LHC usage

[Figure: Distribution of files per tape. Tape count projected to end 2008; horizontal axis: files per tape in bins of 500, from 0-499 up to 29500-29999; vertical axis: tapes, 0 to 14000.]

Page 6: Castor tape formats

Castor files:  A  B  C
AUL:  H M A M T M   H M B M T M   H M C M T M
NL:   A M   B M   C M
(H = header labels, T = trailer labels, M = file mark)

File Marks: In AUL, these are written at the end of each label and each user data file. In NL, these are written at the end of each user file.

Labels: Meta data about the file contents. These are stored as full data files on the tape with a terminating file mark. Headers in front of the user data, trailers at the end.

Page 7: File size and performance

• AUL shows 7.3 seconds overhead per file
• NL shows 3.3 seconds overhead per file
• Tests using low level tape to tape copy are covered by read/cksum/write
• Figures confirmed by running repack2 and Castor to aul and nl tapes
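To see why a few seconds per file matters, the per-file overheads above can be turned into an effective drive rate. This is a minimal sketch under stated assumptions, not a model taken from the slides: the overhead is treated as a fixed delay added to the streaming time of each file, and the 80 MBytes/s streaming rate and 154 MB file size are borrowed from other slides in this deck purely as example inputs.

```python
# Per-file overheads quoted on this slide, in seconds.
OVERHEAD_S = {"aul": 7.3, "nl": 3.3}


def effective_rate_mb_s(file_mb, stream_mb_s, overhead_s):
    """Average rate when each file costs its streaming time plus a fixed overhead.

    Assumption: the overhead is simply additive and independent of file size.
    """
    return file_mb / (file_mb / stream_mb_s + overhead_s)


if __name__ == "__main__":
    for label, overhead in OVERHEAD_S.items():
        for size_mb in (154, 1000, 2000):
            rate = effective_rate_mb_s(size_mb, stream_mb_s=80.0, overhead_s=overhead)
            print(f"{label:>3} {size_mb:>5} MB files: {rate:5.1f} MB/s")
```

Under these assumptions the effective rate for average-sized files is a small fraction of the streaming rate for both label types, and it only approaches the streaming rate for files of a gigabyte or more.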

Page 8: Repack in a year

• This is the number of drives which would need to be dedicated to complete the repack within 1 year.

• The write performance varies with different output label types
• Includes projected data to year end 2008
• Drive costs around 35K CHF over 3 years

[Figure: Drives required per output tape label type (aul, nl, il), with separate Write and Read series; vertical axis: drives, 0 to 60.]
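A rough way to estimate the number of write drives is to add up the per-file overhead and the streaming time for the whole dataset and divide by one year. The sketch below is a simplified model, not the calculation behind the chart: it uses the current dataset (104 million files, 15PB) rather than the year-end projection, assumes decimal units, an 80 MBytes/s streaming rate and drives that are busy all year, so its results will be lower than the plotted values.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Inputs taken from earlier slides; the projection to year end 2008 is NOT included.
N_FILES = 104_000_000
TOTAL_MB = 15 * 1_000_000_000          # 15 PB, assuming 1 PB = 1e9 MB
OVERHEAD_S = {"aul": 7.3, "nl": 3.3}   # seconds of overhead per file


def write_drives_for_one_year(n_files, total_mb, stream_mb_s, overhead_s):
    """Drives needed if each drive is busy for a full year (simplified model)."""
    busy_seconds = n_files * overhead_s + total_mb / stream_mb_s
    return busy_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    for label, overhead in OVERHEAD_S.items():
        drives = write_drives_for_one_year(N_FILES, TOTAL_MB, 80.0, overhead)
        print(f"{label}: about {drives:.1f} write drives")
```

In this model the per-file overhead term dominates the streaming term for both aul and nl, which is consistent with the label format being the main lever on drive count.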

Page 9: Ignore worst cases

• Determine drive requirements if we ignore the projected 6000 tapes with >10000 files

• Leave worst cases in the robot without repacking (i.e. cost of 0.5MCHF for 3000 more tapes/slots/robots)

[Figure: Drives required per output tape label type (aul, nl, il) when the worst cases are ignored, with separate Write and Read series; vertical axis: drives, 0 to 40.]

Page 10: Repack using 20 drives

• Approach is to take easy tapes with large files first

• Repack using aul tapes would take over 3 years to complete

• Max80 figures reflect the performance if the engine is able to sustain reading at 80MBytes/s. Max50 for 50MBytes/s and Max25 for 25MBytes/s

• The ‘to migrate’ queue would be around 400,000 files at the end of processing if 20 drives are used.

[Figure: Repack completed (0% to 100%) against days taken using 20 drives (0 to 700), with curves for aul,max80; nl,max80; il,max80; aul,max50; aul,max25.]

Page 11: IL – Internal Label Format

• New format of data on tape to reduce the number of file marks

• Stores data located by block offset rather than file sequence number

• Tape mark only at the end of the migration stream rather than end of each file

• Simple prototype copy program has produced 85MBytes/s. Full drive speed can be achieved if shared buffers are used.

• This label format is new and is therefore not currently supported by Castor

Page 12: IL tape format

Castor files:  A  B  C
AUL:  H M A M T M   H M B M T M   H M C M T M
NL:   A M   B M   C M
IL:   A B C M
(H = header labels, T = trailer labels, M = file mark)

Internal Label: Contains the VID, checksum, Castor name, block number. These are stored in the first few kilobytes of each tape block written.

User Data: User data is stored after the internal label and completes a full tape block. The Castor file is split into smaller chunks to fit within a tape block such that chunk size + internal label size = tape block size. The unit is repeated until end of data. No tape mark is written at the end of a file since the internal label contains information about which file it is.

File Marks: In AUL, these are written at the end of each label and each user data file. In NL, these are written at the end of each user file. In IL, these are written at the end of the migration stream.

Labels: Meta data about the file contents. These are stored as full data files on the tape with a terminating file mark. Headers in front of the user data, trailers at the end.
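The block layout described for IL can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the slides do not give the label size, field order or field widths, so `BLOCK_SIZE`, `LABEL_SIZE`, the `struct` layout, the extra payload-length field, the adler32 checksum over each chunk and the zero padding of the last block are all hypothetical, as are the function names.

```python
import struct
import zlib

BLOCK_SIZE = 256 * 1024                 # assumed tape block size
LABEL_SIZE = 1024                       # assumed; the slide only says "first few kilobytes"
CHUNK_SIZE = BLOCK_SIZE - LABEL_SIZE    # chunk size + internal label size = tape block size

# Hypothetical label layout: 6-byte VID, adler32 of the chunk, block number,
# payload length, Castor name length. The name bytes follow, then zero padding.
_HEADER = struct.Struct(">6sIQIH")


def make_blocks(vid, castor_name, data, first_block=0):
    """Yield full tape blocks, each starting with an internal label."""
    name = castor_name.encode("utf-8")
    if _HEADER.size + len(name) > LABEL_SIZE:
        raise ValueError("Castor name does not fit in the label")
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        header = _HEADER.pack(vid.encode("ascii"), zlib.adler32(chunk),
                              first_block + offset // CHUNK_SIZE,
                              len(chunk), len(name))
        label = (header + name).ljust(LABEL_SIZE, b"\0")
        # Assumption: a short final chunk is zero padded to a full block.
        yield label + chunk.ljust(CHUNK_SIZE, b"\0")


def read_block(block):
    """Return (vid, castor_name, block_number, payload) and verify the checksum."""
    vid, checksum, number, length, name_len = _HEADER.unpack_from(block)
    name = block[_HEADER.size:_HEADER.size + name_len].decode("utf-8")
    payload = block[LABEL_SIZE:LABEL_SIZE + length]
    if zlib.adler32(payload) != checksum:
        raise ValueError(f"checksum mismatch in block {number}")
    return vid.decode("ascii"), name, number, payload
```

Because every block identifies its own file and block number, a reader can locate a file by block offset and no tape mark is needed between files, which is the property the IL format relies on.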

Page 13: Intermediate Conclusion

• Given the file sizes and drives currently being used, the label format is the limiting factor for performance

• The engine used for copying is a secondary performance factor. This factor becomes more important for label formats or file sizes which support higher speeds such as 50MB/s or more.

• Scanning tapes at full drive speed can be used to validate a completed repack before it is committed to the name server
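The scan-then-commit idea can be sketched as a checksum comparison. This is an illustration only and does not describe an existing Castor tool: `validate_scan`, the shape of its inputs and the use of adler32 for every file are assumptions (the backup slides list adler32 and cksum as the checksum algorithms recorded in the labels).

```python
import zlib


def validate_scan(scanned_files, nameserver_checksums):
    """Return the names that fail validation; an empty list means safe to commit.

    scanned_files: iterable of (name, iterable of byte chunks) read back from tape.
    nameserver_checksums: dict mapping name to the expected adler32 value.
    Both structures are hypothetical stand-ins for a tape scan and a name server query.
    """
    seen = set()
    mismatched = []
    for name, chunks in scanned_files:
        checksum = 1  # adler32 starting value
        for chunk in chunks:
            checksum = zlib.adler32(chunk, checksum)
        seen.add(name)
        if nameserver_checksums.get(name) != checksum:
            mismatched.append(name)
    # Files the name server expects but the scan never found also fail.
    missing = sorted(set(nameserver_checksums) - seen)
    return mismatched + missing
```

A repack driver would update the name server to point at the new tape only when the returned list is empty, and would leave the source tape untouched otherwise.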

Page 14: Option A – bulk repack

• Need a new low level label format using block addressing to write many castor files without tape marks

• Develop a new low level repack program which writes out in il format using direct tape to tape copy with two tape drives on a tape server

• Enhance Castor to support reading il format in the short term

• Writing il format requires modifications to rtcpd/rtcpclientd, as current writing is file-by-file and il requires a full stream. This is unlikely before the clustering implementation is done, which will require rework in this area, so continue to write new data in aul format until clustering is complete.

Page 15: Option B - clustering

• Architecture task force recommended to cluster related data onto tape.

• One possible implementation of this would be to merge many related Castor files into a single large file when migrating to tape and to recall them as a unit.

• Start using the repack2 engine at maximum speed and aul tapes on tapes with large files until clustering is available

• Once clustering is available, repack many tapes in parallel to allow related files to be grouped together on tape for more efficient recall.

• Need at least 30 disk servers for production repack service class to ensure reasonable clustering and drive performance.

• Cluster implementation needs to be architected and implemented, with policies defined and deployed, at the very latest by end 1Q 2009 to avoid delays in the repacking process.

Page 16: Option C – tape to tape copy

• Develop a new low level repack program which is able to write nl tape format output using direct tape to tape copy with two tape drives on a tape server

• Write in nl format and partially re-scan tapes on completion to validate contents

• 80% of tapes (giving 14PB additional space) can be completed in 1 year with 25 drives which may be sufficient for 2009 data

Page 17: Costing

• Option A – bulk repack
– Development for
   • Bulk repack tool
   • Support of new label format for read in Castor
   • Name server fields for block offset ?
– 22 drives for 1 year

• Option B – start repack2/aul then clustering
– Development for
   • 2nd level disk hierarchy
   • Legacy cluster definitions
– Hardware
   • 33 disk servers @ 8K CHF / disk server dedicated for one year
   • Fat tape servers purchase required ?
– 33 drives for 1 year

• Option C – copy only good cases to nl
– Development for
   • Bulk repack tool
– 25 drives for 1 year
– Purchase 3000 additional slots (0.5M CHF)
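The hardware side of the three options can be compared with simple arithmetic. This sketch is not a costing from the slides: it assumes the 35K CHF drive cost quoted earlier is spread evenly over 3 years, and it leaves out development effort and the possible fat tape server purchase, neither of which is quantified in the deck. The dictionary keys are illustrative labels.

```python
DRIVE_COST_KCHF_3Y = 35.0                      # "Drive costs around 35K CHF over 3 years"
DRIVE_KCHF_PER_YEAR = DRIVE_COST_KCHF_3Y / 3   # assumption: linear amortisation
DISK_SERVER_KCHF = 8.0                         # per disk server dedicated for one year

# Quantities from this slide; development effort is not quantified and is excluded.
OPTIONS = {
    "A": {"drives": 22, "disk_servers": 0,  "slots_kchf": 0.0},
    "B": {"drives": 33, "disk_servers": 33, "slots_kchf": 0.0},
    "C": {"drives": 25, "disk_servers": 0,  "slots_kchf": 500.0},
}


def hardware_cost_kchf(option):
    """One-year hardware cost in thousands of CHF under the assumptions above."""
    return (option["drives"] * DRIVE_KCHF_PER_YEAR
            + option["disk_servers"] * DISK_SERVER_KCHF
            + option["slots_kchf"])


costs = {name: hardware_cost_kchf(option) for name, option in OPTIONS.items()}

if __name__ == "__main__":
    for name, kchf in costs.items():
        print(f"Option {name}: about {kchf:.0f}K CHF of hardware")
```

On these assumptions Option A needs the least hardware and Option C the most, because of the additional slots; the ranking could change once development effort is included.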

Page 18: Points

• What tool for repack 2010/11?
– Must repack all of the 50PB data in 2011 to new media
– 10Gbit/s ethernet and drives at 160MBytes/s
– Do we still need a low level tool anyway even if clustering can be used ?
– Can we avoid the repack2 restrictions on number of concurrent files being processed and submitted to the stager ?

• What risk with new tape IL format ?
– Complete testing before EOY 2008
– Nameserver/stager changes for block offset

• What risk with nl format ?
– If tapes are appended to, tape drive malfunction may overwrite data
– Write to the tapes once only, scan and then commit to reduce nl risk
– Test recovery program based on name server checksums

• What risk with new bulk tool ?
– How can we test it ? Scanning tool is also required for validation

• What risk for clustering deliverable ?
– Architecture, will multiple user files per tape file be selected ?
– Additional hardware for disk layer / fat tape layer
– Define experiment and legacy clusters
– Schedule is critical for repack success .. Emergency orders for tape capacity

Page 19: Points (contd)

• How many drives can we spare ?
– Need to get underway during low data recording periods
– Further drive purchase ? Use old drives for reading ?
– More drives means more load on the stager as queues are longer

• Can we reduce read mounting in the future by repack/clustering ?
– Use repack as a rebalancing tool by reading in several tapes and re-clustering
– What is the access frequency for older LHC data ?
– Is the disk layer large enough to be able to effectively cluster on repack ?

• What are the relative efforts ?
– Developing new clustering solutions ? Needs to be done anyway but the repack requirements may bring time pressure
– Investment to tune repack2 to get the necessary throughput and robustness will need to continue and occupy substantial development resources
– The low level tool would require scripting and a method to track outstanding work similar to that used for repack-1

Page 20: Conclusion

• ?

Page 21: Backup Slides

Page 22: What is in an AUL label

Vol1 (Volume label): This contains the Volume Serial Number (VSN) in 6 bytes, which is not necessarily the same as the volume identifier or VID. The VID is site dependent and is normally the number on the cartridge sticker. Vol1 also specifies whether label information on the tape is coded in EBCDIC or ASCII.

Hdr1 (Header 1 label): This contains the last 17 chars of the filename and the date / time of writing the file.

Hdr2 (Header 2 label): This contains a 5 character field for the block size in bytes used for the file. The 5 characters limit the block size to 99999 and, for Castor tapes, the "real" block size is held in uhl1. Hdr2 also contains the tape format: F for fixed block and U for unformatted. Castor uses an FS format, which is fixed block with the option that the last block of the file can be truncated.

uhl1 (User header label 1): uhl labels can be defined to hold any non standard data such as the full file name. In Castor, uhl1 holds the real block size, which can be greater than the 99999 five character value. These can be repeated several times in 80 block chunks.

eof1 (Trailer label 1): A trailer label separated by tape mark from the data.

eof2 (Trailer label 2): A second trailer label.

utl1 (User trailer label 1): Like the user header labels, utl labels can be defined to hold any non standard data. In Castor, utl1 holds the "actual" number of blocks written to the file.

Page 23: What is in AUL / UHL 1 ?

Field                            Example
User Header label                UHL (UTL also possible)
Header Label Number              1
Site                             CERN
Actual files sequence number     000012345
Actual record length             000262144
Tape mover hostname              TPSRV201
Drive manufacturer               STK
Drive model                      T9940B
Drive serial number              456000001642

Page 24: What is in AUL / UHL 2 ?

Field                            Example
User Header label                UHL (UTL also possible)
Header Label Number              2
Bit file ID (64 bits)            00000000000000376975
Name Server hostname             CASTORNS1
Absolute mode                    0644
Uid                              0000000395
Gid                              0000001028
File size in bytes (64 bits)     00000000010031553895

Page 25: What is in AUL / UHL 3 ?

Field                            Example
User Header label                UHL (UTL also possible)
Header Label Number              3
User name                        timbell
Experiment/Project name
Checksum algorithm               AD (adler32) or CS (cksum)
File checksum (32 bits)
Last modification (UTC)          2001/04/04 08:51:30

Page 26: What is in AUL / UHL 4 ?

Field                            Example
User Header label                UHL (UTL also possible)
Header Label Number              4
Copy number                      00001
Segment number                   00001
Segment size in bytes            00000000010031553895
Checksum algorithm               AD (adler32) or CS (cksum)
Segment checksum (32 bits)
Tape write timestamp (UTC)       2001/04/04 08:51:30
Number of blocks                 0000002342

Page 27: Repack using 20 drives

• Full extended timeline showing aul,max25 to completion

[Figure: Repack completed (0% to 100%) against days taken using 20 drives (0 to 1600), with curves for aul,max80; nl,max80; il,max80; aul,max50; aul,max25.]

Page 28: Performance for large files

Page 29: Tape-to-Tape repack?

[Diagram: repack, Stager and Disk Server]

• Tape-to-tape copy rather than copying through the stager avoids network bottleneck

• Initial tests indicate that the tape writing overheads are larger for our typical files

Page 30: Tests to scale repack 2

• 3 disk servers
• 3 tape drives in
• 3 tape drives out
• File size of 2GB+
• Elapsed 3h for 1500GB, 46MBytes/s
• Around 60MBytes/s during steady state

• 6 disk servers
• 3 tape drives in
• 3 tape drives out
• File size of 500MB+
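The 46MBytes/s figure in the first test is consistent with dividing the aggregate rate across the three output drives. The short check below assumes decimal units (1 GB = 1000 MB) and that the quoted rate is per output drive; both are interpretations, not statements from the slide.

```python
def per_drive_rate_mb_s(total_gb, elapsed_hours, n_drives):
    """Average per-drive rate in MB/s, assuming 1 GB = 1000 MB."""
    aggregate_mb_s = total_gb * 1000 / (elapsed_hours * 3600)
    return aggregate_mb_s / n_drives


rate = per_drive_rate_mb_s(total_gb=1500, elapsed_hours=3, n_drives=3)
print(f"about {rate:.1f} MB/s per output drive")
```

The gap between this average and the roughly 60MBytes/s seen in steady state would then correspond to time spent outside steady state, such as mounting and positioning, although the slide does not break this down.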

Page 31: Tests to scale repack 2

• 3 disk servers
• 1 tape drive in
• 1 tape drive out
• Reaches Gigabit ethernet wire speeds

Page 32: c2public small files

• Migrated 400,000 files in 18 days
• Two drives
• Two disk servers
• Using a mixture of nl and aul tapes on IBM drives
• Corresponds to a file / drive every 8 seconds
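The "every 8 seconds" figure follows directly from the numbers on this slide. A minimal check, assuming both drives were available for the whole 18 days:

```python
def seconds_per_file_per_drive(n_files, days, n_drives):
    """Drive-seconds available divided by the number of files migrated."""
    return days * 24 * 3600 * n_drives / n_files


interval = seconds_per_file_per_drive(n_files=400_000, days=18, n_drives=2)
print(f"one file per drive every {interval:.1f} s")
```

The result is just under 8 seconds, in line with the 7.3 second per-file overhead measured for aul tapes once some streaming time is added.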

Page 33: File size and performance

Date            Alice     Atlas      CMS        LHCb
CCRC May ’08    322 MB    1291 MB    872 MB     1327 MB
March ’08       143 MB    230 MB     1490 MB    865 MB
CCRC Feb ’08    340 MB    320 MB     1470 MB    550 MB
Jan ’08         200 MB    250 MB     2000 MB    200 MB

[Figure: Typical Drive Performance. Drive speed (MBytes/s, 0 to 100) against file size (MB, 0 to 3000), with the alice, atlas, cms and lhcb file sizes marked.]

Page 34: Additional Information

• Repack Options
– https://twiki.cern.ch/twiki/bin/view/FIOgroup/TapeBulkRepack

• Repack Performance Analysis
– http://it-div-ds.web.cern.ch/it-div-ds/HO/repack_challenge.html

• Label Options
– https://twiki.cern.ch/twiki/bin/view/FIOgroup/TapeLabelOptions