Processing 2.5 Terapixels of the Sky in 2 Days

IDIES Inaugural Symposium Aug 25-26, 2009 Processing 2.5 Terapixels of the Sky in 2 Days George Fekete, JHU


Page 1: Processing 2.5  Terapixels of the Sky in 2 Days

IDIES Inaugural Symposium Aug 25-26, 2009

Processing 2.5 Terapixels of the Sky in 2 Days

George Fekete, JHU

Page 4: Processing 2.5  Terapixels of the Sky in 2 Days

DR7 Visual Images

1,393 rows, 1,984 columns, 427,853 fields
1,184,462,470,336 pixels

3,553,387,411,008 FITS pixels in 3 bands
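The pixel totals above are a straight product of the image dimensions, the field count, and the number of bands. The sketch below reproduces that arithmetic; the constant and function names are illustrative and not from the talk. Multiplying the three factors as transcribed gives 1,182,462,470,336, slightly below the single-band figure on the slide, so one of the transcribed numbers may be off by a digit.

```python
# Figures as transcribed from the slide; names are illustrative only.
ROWS = 1_393
COLUMNS = 1_984
FIELDS = 427_853


def pixels_per_field(rows: int = ROWS, columns: int = COLUMNS) -> int:
    """Number of pixels in one field image."""
    return rows * columns


def total_pixels(fields: int = FIELDS, bands: int = 1) -> int:
    """Pixels across all fields, optionally counting several bands."""
    return pixels_per_field() * fields * bands


if __name__ == "__main__":
    print(f"{pixels_per_field():,} pixels per field")
    print(f"{total_pixels():,} pixels in one band")
    print(f"{total_pixels(bands=3):,} pixels in three bands")
```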

Page 5: Processing 2.5  Terapixels of the Sky in 2 Days

Goals
  Pretty images
    in the eyes of the beholder
  Ease of manipulation
    store pixels in DB
    on-demand cutout and mosaic initiated inside the DB
    store one entire colour image in < ¾ MB
      uncompressed TGA is 8 MB, good jpeg is 2 ¼ MB
  Important preconditions to good compressibility
    background should have little or no salt-and-pepper noise
    choose a good despeckler

Page 6: Processing 2.5  Terapixels of the Sky in 2 Days

Without And With Despeckling

Page 7: Processing 2.5  Terapixels of the Sky in 2 Days

Two Distinct Despecklers
Which seems better?

Page 8: Processing 2.5  Terapixels of the Sky in 2 Days

Same Two: Laplacian
What about now?

Page 9: Processing 2.5  Terapixels of the Sky in 2 Days

Same Two: Laplacian
WINNER!

[Image labels: Magick, Photoshop]

Page 10: Processing 2.5  Terapixels of the Sky in 2 Days

Process Raw Color Images
  Despeckle
    better visual experience
    better compressibility
  Photoshop (!)
    has the best despeckling filter we found
    can do the jpeg 2000 codec
    can do all other necessary tasks
  jpeg2000?
    compresses better than jpeg
    produces fewer undesirable visual artifacts
    j2k is 28% of jpeg or 8% of TGA
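As a quick consistency check, the two compression ratios can be applied to the file sizes quoted earlier in the talk (8 MB for an uncompressed TGA, 2 ¼ MB for a good jpeg) to see whether a j2k image meets the < ¾ MB goal. This is only a sketch of the arithmetic; the variable and function names are made up for illustration.

```python
# Sizes and ratios as quoted in the slides; names are illustrative.
TGA_MB = 8.0          # uncompressed TGA
JPEG_MB = 2.25        # "good jpeg"
TARGET_MB = 0.75      # goal: one colour image in under 3/4 MB
J2K_VS_TGA = 0.08     # "8% of TGA"
J2K_VS_JPEG = 0.28    # "28% of jpeg"


def j2k_from_tga(tga_mb: float = TGA_MB) -> float:
    """Estimated j2k size derived from the TGA size."""
    return tga_mb * J2K_VS_TGA


def j2k_from_jpeg(jpeg_mb: float = JPEG_MB) -> float:
    """Estimated j2k size derived from the jpeg size."""
    return jpeg_mb * J2K_VS_JPEG


if __name__ == "__main__":
    for label, size in (("from TGA", j2k_from_tga()),
                        ("from jpeg", j2k_from_jpeg())):
        verdict = "meets" if size < TARGET_MB else "misses"
        print(f"j2k estimate {label}: {size:.2f} MB ({verdict} the target)")
```

Both routes land near 0.63 to 0.64 MB, so the two quoted ratios agree with each other and with the storage goal.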

Page 11: Processing 2.5  Terapixels of the Sky in 2 Days

What's The Big Deal?
  500,000 images in 24 hours?
    doesn't seem like a lot
    especially if you can use a thousand-processor cluster
  2-step process
    FITS to TGA (formerly fits2jpeg)
      been there, done that
      about 2 s per field (without optimization)
    Use Photoshop (cont...)

Page 12: Processing 2.5  Terapixels of the Sky in 2 Days

What's The Big Deal?
  Tasks for Photoshop
    open a TGA
    add a little noise cleaning
      apply despeckle filter
    save as jpeg2000
    reduce size by ½ to make ½ size image
    save as jpeg2000
    reduce again to make ¼ size image
    save as jpeg2000
    reduce again to make 1/8 size image
    adjust contrast and brightness for small thumbnail
    save as jpeg2000
    delete TGA
    release all resources
  Do this about 500,000 times robustly
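The repeated "reduce and save" steps produce a small image pyramid: full size, ½, ¼ and 1/8. The sketch below lists the pixel dimensions of each level for a 1,984 × 1,393 field. It assumes that "reduce size by ½" means halving each linear dimension and that odd sizes are truncated; Photoshop may round differently, and the function name is hypothetical.

```python
def pyramid_sizes(width: int, height: int, levels: int = 3):
    """Return (width, height) for the full image and each halved level.

    Assumption: every reduction halves both dimensions, truncating
    odd values and never going below one pixel.
    """
    sizes = [(width, height)]
    for _ in range(levels):
        width = max(1, width // 2)
        height = max(1, height // 2)
        sizes.append((width, height))
    return sizes


if __name__ == "__main__":
    labels = ["full", "1/2", "1/4", "1/8"]
    for label, (w, h) in zip(labels, pyramid_sizes(1984, 1393)):
        print(f"{label:>4}: {w} x {h}")
```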

Page 13: Processing 2.5  Terapixels of the Sky in 2 Days

Unsupervised Photoshopping
  NECESSARY
    Photoshop runs under Windows XP
    Windows XP runs under qemu (virtual PC thing)
    qemu runs on the Linux cluster (HHPC)
    Photoshop can be controlled by a custom .net application
    Therefore ... Photoshop runs on the Linux cluster
  SUFFICIENT
    qemu/WinXP can see the file system
    qemu/WinXP/Photoshop can run without a physical display
    Therefore it is doable

Page 14: Processing 2.5  Terapixels of the Sky in 2 Days

Flow

FITS -> [FITS to TGA] -> TGA -> [TGA to j2k] -> j2k

Page 15: Processing 2.5  Terapixels of the Sky in 2 Days

Two Steps Decoupled

FITS -> [FITS to TGA] -> TGA -> [TGA to j2k] -> j2k

Runs asynchronously. Available resources can be added or removed at any time.
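One way to picture the decoupling is a producer/consumer arrangement in which the first step drops TGA files into a buffer and any number of second-step workers drain it at their own pace. The sketch below uses an in-process queue and threads purely for illustration; in the talk the hand-off goes through a shared file system and a job table, and the two conversion functions here are stand-ins, not the real converters.

```python
import queue
import threading

SENTINEL = None  # tells a consumer that no more work is coming


def fits_to_tga(field_id: str) -> str:
    """Stand-in for step 1; only derives an output name."""
    return f"{field_id}.tga"


def tga_to_j2k(tga_name: str) -> str:
    """Stand-in for step 2 (the Photoshop stage)."""
    return tga_name[: -len(".tga")] + ".j2k"


def stage_one(field_ids, tga_queue):
    """Producer: convert each field and hand the TGA to the buffer."""
    for field_id in field_ids:
        tga_queue.put(fits_to_tga(field_id))


def stage_two(tga_queue, results, lock):
    """Consumer: take TGAs from the buffer until the sentinel arrives."""
    while True:
        tga_name = tga_queue.get()
        if tga_name is SENTINEL:
            break
        converted = tga_to_j2k(tga_name)
        with lock:
            results.append(converted)


def run_pipeline(field_ids, n_consumers: int = 3):
    """Run both stages concurrently and return the sorted outputs."""
    tga_queue = queue.Queue()
    results, lock = [], threading.Lock()
    consumers = [
        threading.Thread(target=stage_two, args=(tga_queue, results, lock))
        for _ in range(n_consumers)
    ]
    for consumer in consumers:
        consumer.start()
    producer = threading.Thread(target=stage_one, args=(field_ids, tga_queue))
    producer.start()
    producer.join()
    for _ in consumers:          # one sentinel per consumer
        tga_queue.put(SENTINEL)
    for consumer in consumers:
        consumer.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_pipeline(["field-1", "field-2", "field-3", "field-4"]))
```

Changing `n_consumers` alters throughput of the second stage without touching the first, which is the property the slide highlights.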

Page 16: Processing 2.5  Terapixels of the Sky in 2 Days

FITS to TGA

[Diagram labels: jobtable, skydev/skyfits, WS, FITS to TGA, FITS, TGA]

Page 18: Processing 2.5  Terapixels of the Sky in 2 Days

TGA to jpeg2000

[Diagram labels: jobtable, skydev/skyfits, WS, TGA to j2k, TGA, j2k]

Page 19: Processing 2.5  Terapixels of the Sky in 2 Days

Image generation workflow

[Diagram labels: jobtable, skydev/skyfits, WS, Poller, Photoshop, TGA, j2k]

.net app controls Photoshop through exposed methods
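The Poller box in the diagram suggests a loop that watches for new TGA files and hands each one to Photoshop. The slides do not show how the poller is written (the real one is a .net application), so the following is only a rough Python sketch of the idea; `poll_once`, `poll_forever`, the `handle` callback and the directory layout are all hypothetical.

```python
import time
from pathlib import Path


def poll_once(tga_dir: Path, handle) -> list:
    """Process every TGA currently in tga_dir, then delete it.

    `handle` stands in for the call that drives Photoshop; the TGA is
    removed afterwards, as in the task list ("delete TGA").
    """
    processed = []
    for tga in sorted(tga_dir.glob("*.tga")):
        handle(tga)
        tga.unlink()
        processed.append(tga.name)
    return processed


def poll_forever(tga_dir: Path, handle, interval_s: float = 5.0,
                 max_idle_polls=None) -> None:
    """Keep polling; stop after max_idle_polls empty polls in a row.

    With max_idle_polls=None the loop never stops on its own.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        if poll_once(tga_dir, handle):
            idle = 0
        else:
            idle += 1
            time.sleep(interval_s)
```

A caller would pass a function that performs the despeckle, resize and save steps as `handle`.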

Page 20: Processing 2.5  Terapixels of the Sky in 2 Days

Image generation workflow

[Diagram labels: jobtable, skydev/skyfits, WS, Poller, Photoshop, Server, TGA, j2k, work nodes, shared file system, edge node(s)]

Page 21: Processing 2.5  Terapixels of the Sky in 2 Days

"Scheduler" is a DB

jobtable

  jobid, run, rerun, camcol, field
  status (ready, working, done)
  TGA path, output directory, nodeid
  grabbed (timestamp), finished (timestamp)
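Using a database table as the scheduler works because a worker can claim a job atomically: it marks one "ready" row as "working" and records its node id and a timestamp, and a second worker then sees a different row. The sketch below illustrates this with SQLite as a stand-in, since the slides do not name the database engine. Column names follow the slide, with `tga_path` and `output_dir` as assumed spellings; the function names are hypothetical.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobtable (
    jobid      INTEGER PRIMARY KEY,
    run        INTEGER,
    rerun      INTEGER,
    camcol     INTEGER,
    field      INTEGER,
    status     TEXT NOT NULL DEFAULT 'ready',  -- ready, working, done
    tga_path   TEXT,
    output_dir TEXT,
    nodeid     TEXT,
    grabbed    REAL,                           -- timestamp
    finished   REAL                            -- timestamp
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def grab_job(conn: sqlite3.Connection, nodeid: str):
    """Atomically claim one ready job; return its jobid, or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = conn.execute(
            "SELECT jobid FROM jobtable WHERE status = 'ready' "
            "ORDER BY jobid LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobtable SET status = 'working', nodeid = ?, "
                "grabbed = ? WHERE jobid = ?",
                (nodeid, time.time(), row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


def finish_job(conn: sqlite3.Connection, jobid: int) -> None:
    """Mark a claimed job as done and record when it finished."""
    conn.execute(
        "UPDATE jobtable SET status = 'done', finished = ? WHERE jobid = ?",
        (time.time(), jobid),
    )
```

Because workers pull jobs rather than being assigned them, nodes can join or leave at any time, and rows left in "working" with an old `grabbed` timestamp identify jobs that may need to be retried.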

Page 22: Processing 2.5  Terapixels of the Sky in 2 Days

Framework
  HHPC Cluster
    154 nodes, 1232 processors
    PBS job submission
    Linux
  Windows + Photoshop is run as a qemu job
    One time: make a C: disk image, install qemu
    All processors use the same C: disk image
    Each instance of qemu runs in snapshot mode
      C: read-only
      incremental change to disk image cached locally
      can kill qemu instead of graceful shutdown (PBS proof)
    qemu runs without a display window
      pixels are in /dev/null
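For orientation, here is roughly what a headless, snapshot-mode qemu launch could look like when assembled from a script. The slides do not give the actual command line, so the binary name, memory size and image path are assumptions. The options themselves are standard QEMU ones: `-snapshot` writes changes to temporary files instead of the disk image, and `-nographic` disables graphical output.

```python
import shlex


def qemu_command(disk_image: str, memory_mb: int = 512,
                 binary: str = "qemu-system-i386") -> list:
    """Build an argument list for a headless, snapshot-mode guest.

    All defaults are illustrative; the talk does not state them.
    """
    return [
        binary,
        "-hda", disk_image,    # the shared C: disk image
        "-m", str(memory_mb),  # guest memory in MB
        "-snapshot",           # leave the shared image untouched
        "-nographic",          # no display window
    ]


if __name__ == "__main__":
    # Hypothetical path; print the command rather than running it.
    print(shlex.join(qemu_command("/shared/images/winxp.img")))
```

Since the base image is never modified in snapshot mode, killing the process (for example when a PBS job is terminated) cannot corrupt it, which matches the "PBS proof" remark.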

Page 23: Processing 2.5  Terapixels of the Sky in 2 Days

Performance for DR7 images

  427,853 fields/jobs
  140 seconds total per job (measured)
    FITS to TGA: 2 s
    TGA to j2k: 136 s
  59,899,420 s = 693 days (one processor)
  0.56 day (1232 processors + leap of faith)
  add 60% fudge factor penalty
  Still does it in a day, with two hours to spare
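The estimate can be re-derived in a few lines. With 427,853 jobs at 140 s each, the serial total is 59,899,420 s, which is the value consistent with the 693 days quoted. Dividing by 1,232 processors and adding the 60% penalty gives the one-day figure. The sketch assumes perfect scaling across processors (the "leap of faith"); the names are illustrative.

```python
FIELDS = 427_853
SECONDS_PER_JOB = 140      # measured total per job
PROCESSORS = 1_232
FUDGE = 0.60               # 60% penalty
SECONDS_PER_DAY = 86_400


def serial_days(fields: int = FIELDS,
                seconds_per_job: int = SECONDS_PER_JOB) -> float:
    """Wall-clock days on a single processor."""
    return fields * seconds_per_job / SECONDS_PER_DAY


def parallel_days(processors: int = PROCESSORS,
                  fudge: float = FUDGE) -> float:
    """Wall-clock days assuming ideal scaling, inflated by the penalty."""
    return serial_days() / processors * (1.0 + fudge)


if __name__ == "__main__":
    print(f"serial total: {FIELDS * SECONDS_PER_JOB:,} s "
          f"= {serial_days():.0f} days")
    print(f"ideal parallel: {parallel_days(fudge=0.0):.2f} day")
    with_penalty = parallel_days()
    print(f"with penalty: {with_penalty:.2f} day, "
          f"{24 * (1 - with_penalty):.1f} hours to spare")
```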