Post on 24-Dec-2015


Astronomical Tiled Image Compression: How & Why

Authors:

Rob Seaman, NOAO
Bill Pence, NASA/GSFC
Rick White, STScI
Mark Dickinson, NOAO
Frank Valdes, NOAO
Nelson Zárate, NOAO

Statement of problem

No one compression algorithm is always best
New instruments and survey programs will dwarf data sets that have come before
Observatories' data storage costs
Transport latency & bandwidth challenge not just budgets, but technology and human patience
The bottom line is data handling throughput, not static storage

Host level compression

Per-file gzip compression
Contents of file are opaque
Speed of compression
Speed of decompression
Size of output
Limited support for on-the-fly decompression
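To illustrate "contents of file are opaque", here is a minimal sketch in Python using only the standard library. It builds a toy FITS-like byte string (one 2880-byte header block plus zero-valued pixels; the helper name `make_fits_like` and the keyword values are assumptions for illustration), gzips the whole file, and shows that the header can then only be reached by running the decompressor.

```python
import gzip

CARD = 80      # bytes per FITS header card
BLOCK = 2880   # FITS files are written in 2880-byte blocks

def make_fits_like(n_pixels: int) -> bytes:
    """Build a toy FITS-like file: one header block plus 16-bit zero pixels."""
    cards = [
        f"SIMPLE  = {'T':>20}",
        f"BITPIX  = {16:20d}",
        f"NAXIS   = {1:20d}",
        f"NAXIS1  = {n_pixels:20d}",
        "END",
    ]
    header = "".join(c.ljust(CARD) for c in cards).ljust(BLOCK).encode("ascii")
    data = bytes(2 * n_pixels)              # two bytes per 16-bit pixel
    padding = bytes(-len(data) % BLOCK)     # pad the data to a whole block
    return header + data + padding

raw = make_fits_like(10_000)
packed = gzip.compress(raw)

# The uncompressed file exposes its first header card at offset 0.
first_card_raw = raw[:CARD].decode("ascii").rstrip()

# The gzip'ed file starts with the gzip magic number, not with a FITS card;
# the header is only reachable by running the decompressor.
starts_with_magic = packed[:2] == b"\x1f\x8b"
first_card_unpacked = gzip.decompress(packed)[:CARD].decode("ascii").rstrip()

print(first_card_raw)
print(starts_with_magic, first_card_unpacked == first_card_raw)
```

A reader that wants only one keyword still has to start decompressing from the beginning of the stream, which is the cost the tiled convention avoids by leaving headers uncompressed.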

How

FITS tile compression convention

Provides a general framework

Supports any compression algorithm that can operate on multidimensional image sections

FITS headers remain readable

Access to individual FITS HDUs

Files are still FITS
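The idea of compressing image sections independently can be sketched as follows. This is not the FITS convention itself (which stores the compressed tiles in a binary table); it is a simplified stand-in that uses zlib on tiles made of whole rows, and the function names `compress_tiles` and `read_row` are hypothetical.

```python
import struct
import zlib

def compress_tiles(image, tile_rows):
    """Compress a 2-D list of 16-bit integers in independent tiles of whole rows."""
    tiles = []
    for start in range(0, len(image), tile_rows):
        rows = image[start:start + tile_rows]
        flat = [pixel for row in rows for pixel in row]
        payload = struct.pack(f">{len(flat)}h", *flat)   # big-endian, as in FITS
        tiles.append(zlib.compress(payload))
    return tiles

def read_row(tiles, tile_rows, width, row_index):
    """Decompress only the tile that holds row_index and return that row."""
    tile_number, row_in_tile = divmod(row_index, tile_rows)
    payload = zlib.decompress(tiles[tile_number])
    # unpack_from takes a byte offset; each pixel occupies 2 bytes
    byte_offset = row_in_tile * width * 2
    return list(struct.unpack_from(f">{width}h", payload, byte_offset))

width, height = 64, 32
image = [[(x + y) % 100 for x in range(width)] for y in range(height)]

tiles = compress_tiles(image, tile_rows=1)   # one tile per image row
row = read_row(tiles, tile_rows=1, width=width, row_index=17)
print(row == image[17])
```

Because each tile is self-contained, reading one row touches one compressed tile and leaves the rest untouched. Swapping zlib for another codec changes only the two compress/decompress calls, which is the sense in which the framework does not fix the algorithm.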

Limitations

Only partially supported by IRAF
Supported by CFITSIO, but caveats:
  Not idempotent, even a losslessly compressed file would suffer keyword changes
  Original convention covered only per-HDU issues, e.g., compressing a SIF produced same binary table as MEF original
  Only application was the limited imcopy example program
Unsupported algorithms

Improvements

fpack compression tool
  Compress images in-place
  Multi-image archives for efficiency
  Idempotent
  Supports FITS Checksum
Applications layered on CFITSIO access compressed files and file archives transparently
Support for Hcompress
General purpose option for adaptively scaling input data.

fpack / funpack

fpack, a FITS tile-compression engine. Version 0.8.2 (25 September 2006)
usage: fpack [-r|-p|-g|-h] [-w|-t <axes>] [-n <bits>] [-v] [-Etc] <FITS>

Flags must appear (separately) before filenames:
  -r          Rice compression [default], or
  -p          PLIO compression, or
  -g          GZIP (per-tile) compression
  -h          Hcompress compression
  -w          override tile size to be whole image, or
  -t <axes>   comma separated list of tile sizes [default=row]
  -n <bits>   noise bits to preserve for real pixels [default=4]
  -v          verbose
  -F          clobber output [default overwrites input in-place]
  -K          keep (don't delete, overwrite or change) input files
  -A <file>   write (append or clobber) output to single file, or
  -P <pre>    prepend <pre> to create separate output filenames
  -L          list and validate contents, files unchanged
  -H          print this message
  -V          print version number
  <FITS>      FITS files or extensions to pack
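As a small how-to, the usage text above can be turned into a command line programmatically. The sketch below only assembles an argument list from flags that appear in the version 0.8.2 usage message; the helper `fpack_argv` is hypothetical, it does not run fpack, and other fpack versions may accept different options.

```python
# Flags taken from the usage message above; the helper itself is hypothetical.
ALGORITHM_FLAGS = {"rice": "-r", "plio": "-p", "gzip": "-g", "hcompress": "-h"}

def fpack_argv(files, algorithm="rice", tile=None, noise_bits=None, verbose=False):
    """Assemble an fpack command line with each flag given separately, before the filenames."""
    argv = ["fpack", ALGORITHM_FLAGS[algorithm]]
    if tile is not None:
        argv += ["-t", ",".join(str(n) for n in tile)]   # comma separated tile sizes
    if noise_bits is not None:
        argv += ["-n", str(noise_bits)]                  # noise bits for real pixels
    if verbose:
        argv.append("-v")
    return argv + list(files)

command = fpack_argv(["image.fits"], algorithm="hcompress", tile=(100, 100), verbose=True)
print(" ".join(command))
```

If fpack is installed, the resulting list could be handed to `subprocess.run(command, check=True)`; the file name `image.fits` and the 100x100 tile size are placeholders.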

… & Why

Preserve the scientific integrity of processed astronomical data sets
Native integer data products permit lossless compression techniques for neutral effect, or
May benefit from lossy compression for high compression factors
Processing, pipeline or hands-on, often creates floating point
  Choose lossy compression, or
  Scale data into integers
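"Scale data into integers" can be sketched as a linear mapping, value ≈ zero + step × integer, in the spirit of the FITS BSCALE/BZERO keywords. This is a minimal illustration, not fpack's own quantization: the step size here is a free parameter chosen by hand, and how fpack derives it from the noise-bits option is not shown. The function names are hypothetical.

```python
def scale_to_integers(pixels, step):
    """Map floats to integers with a linear scale: value ~ zero + step * integer."""
    zero = min(pixels)
    ints = [round((p - zero) / step) for p in pixels]
    return ints, zero, step

def restore(ints, zero, step):
    """Invert the scaling; the result differs from the input by at most step / 2."""
    return [zero + step * i for i in ints]

pixels = [100.03, 100.41, 99.87, 101.22, 100.00, 99.95]
ints, zero, step = scale_to_integers(pixels, step=0.05)
restored = restore(ints, zero, step)
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print(ints)
print(max_error <= step / 2 + 1e-9)
```

The integers can then go through a lossless integer codec such as Rice, so the only loss is the bounded rounding error. Choosing a step well below the pixel noise keeps that loss scientifically negligible.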

Compression statistics

Additional cost for gzip’ed floating point output from pipeline is $2.86 per image versus Rice compressed integers.

Benefits

Reduced:
  Disk space
  Bandwidth
  Latency
Remove need to decompress
Pack multiple files for efficient transport
Headers remain readable
Individual HDUs are accessible
Choice of algorithm isn’t fixed

DMS architecture

Benefits NSA, NHPP, NVO portal

No need for ASCII header files
Smaller footprint
Faster replication
Files remain FITS throughout
Extends upstream into domes
Extends downstream to users
Compression can be free or better than free
