processing 2.5 terapixels of the sky in 2 days
IDIES Inaugural Symposium Aug 25-26, 2009
Processing 2.5 Terapixels of the Sky in 2 Days
George Fekete, JHU
DR7 Visual Images
1,393 rows, 1,984 columns, 427,853 fields
1,184,462,470,336 pixels
3,553,387,411,008 FITS pixels (3 bands)
Goals
Pretty images
  in the eyes of the beholder
Ease of manipulation
  store pixels in DB
  on-demand cutout and mosaic initiated inside the DB
  store one entire colour image in < ¾ MB
    uncompressed TGA is 8 MB, good jpeg is 2 ¼ MB
Important preconditions to good compressibility
  background should have little or no salt-and-pepper noise
  choose a good despeckler
Without And With Despeckling
Two Distinct Despecklers: Which seems better?
Same Two (Laplacian): What about now?
Same Two (Laplacian): WINNER!
Magick Photoshop
Process Raw Color Images
Despeckle
  better visual experience
  better compressibility
Photoshop (!)
  has best despeckling filter we found
  can do jpeg 2000 codec
  can do all other necessary tasks
jpeg2000?
  compresses better than jpeg
  produces fewer undesirable visual artifacts
  j2k is 28% of jpeg or 8% of TGA
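As a rough check of the size goal, the quoted ratios can be applied to the quoted file sizes. This is only a back-of-the-envelope sketch: the 8 MB and 2 ¼ MB figures and the 28% / 8% ratios come from the slides, while the 3-bytes-per-pixel raw size is an assumption (24-bit RGB, header ignored).

```python
# Figures quoted on the slides.
ROWS, COLS = 1393, 1984
TGA_MB = 8.0       # "uncompressed TGA is 8M"
JPEG_MB = 2.25     # "good jpeg is 2 1/4 MB"
TARGET_MB = 0.75   # goal: one entire colour image in < 3/4 MB

# Assumption: 24-bit RGB, 3 bytes per pixel, file header ignored.
raw_bytes = ROWS * COLS * 3

# "j2k is 28% of jpeg or 8% of TGA"
j2k_from_jpeg = 0.28 * JPEG_MB
j2k_from_tga = 0.08 * TGA_MB

print(f"raw RGB size: {raw_bytes:,} bytes")
print(f"j2k estimate from jpeg: {j2k_from_jpeg:.2f} MB")
print(f"j2k estimate from TGA:  {j2k_from_tga:.2f} MB")
print("meets < 3/4 MB goal:", max(j2k_from_jpeg, j2k_from_tga) < TARGET_MB)
```

Both estimates land near 0.63 to 0.64 MB, which is consistent with the stated goal.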
What's The Big Deal?
500,000 images in 24 hours?
  doesn't seem like a lot, especially if you can use a thousand-processor cluster.
2-step process
  FITS to TGA (formerly fits2jpeg)
    been there, done that
    about 2 s per field (without optimization)
  Use Photoshop (cont...)
What's The Big Deal? Tasks for Photoshop
  open a TGA
  add a little noise cleaning
  apply despeckle filter
  save as jpeg2000
  reduce size by ½ to make ½ size image
  save as jpeg2000
  reduce again to make ¼ size image
  save as jpeg2000
  reduce again to make 1/8 size image
  adjust contrast and brightness for small thumbnail
  save as jpeg2000
  delete TGA
  release all resources
Do this about 500,000 times robustly
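The repeated halving in the task list produces a four-level image pyramid. A minimal sketch of the resulting pixel dimensions follows; the function name is hypothetical, and rounding down on odd sizes is an assumption (Photoshop's own resize may round differently).

```python
def pyramid_sizes(rows, cols, levels=3):
    """Return (label, rows, cols) for the full image and each halved copy.

    Assumption: each reduction halves both dimensions and rounds down.
    """
    sizes = [("full", rows, cols)]
    for level in range(1, levels + 1):
        rows, cols = max(1, rows // 2), max(1, cols // 2)
        sizes.append((f"1/{2 ** level}", rows, cols))
    return sizes


# A DR7 field is 1,393 rows by 1,984 columns.
for label, r, c in pyramid_sizes(1393, 1984):
    print(f"{label:>4}: {r} x {c}")
```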
Unsupervised Photoshopping
NECESSARY
  Photoshop runs under Windows XP
  Windows XP runs under qemu (virtual PC thing)
  qemu runs on the Linux cluster (HHPC)
  Photoshop can be controlled by a custom .net application
  Therefore ... Photoshop runs on the Linux cluster
SUFFICIENT
  qemu/WinXP can see the file system
  qemu/WinXP/Photoshop can run without a physical display
  Therefore it is doable
Flow
FITS → FITS to TGA → TGA → TGA to j2k → j2k
Two Steps Decoupled
FITS → FITS to TGA → TGA
TGA → TGA to j2k → j2k
Runs asynchronously. Available resources can be added or removed any time.
FITS to TGA
[Diagram labels: jobtable, skydev/skyfits, WS, FITS, FITS to TGA, TGA]
TGA to jpeg2000
[Diagram labels: jobtable, skydev/skyfits, WS, TGA, TGA to j2k, j2k]
Image generation workflow
[Diagram labels: jobtable, skydev/skyfits, WS, TGA, Poller, Server, Photoshop, j2k, work nodes, shared file system, edge node(s)]
.net app controls Photoshop through exposed methods
"Scheduler" is a DB
jobtable
  jobid, run, rerun, camcol, field
  status (ready, working, done)
  TGA path, output directory, nodeid
  grabbed (timestamp), finished (timestamp)
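A database table can act as a scheduler if each worker claims a row by flipping its status in a single guarded update. The sketch below illustrates that idea with Python's built-in sqlite3 module; the column names follow the slide, but the column types, the SQL, and the function names are assumptions, not the actual skydev/skyfits implementation.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE jobtable (
    jobid      INTEGER PRIMARY KEY,
    run        INTEGER, rerun INTEGER, camcol INTEGER, field INTEGER,
    status     TEXT NOT NULL DEFAULT 'ready',  -- ready, working, done
    tga_path   TEXT,
    output_dir TEXT,
    nodeid     TEXT,
    grabbed    REAL,   -- time a node claimed the job
    finished   REAL    -- time the job completed
)
"""


def grab_job(conn, nodeid):
    """Claim one 'ready' job for nodeid; return (jobid, tga_path, output_dir) or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT jobid, tga_path, output_dir FROM jobtable "
            "WHERE status = 'ready' ORDER BY jobid LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard makes the claim fail if another node got there first.
        cur = conn.execute(
            "UPDATE jobtable SET status = 'working', nodeid = ?, grabbed = ? "
            "WHERE jobid = ? AND status = 'ready'",
            (nodeid, time.time(), row[0]),
        )
        return row if cur.rowcount == 1 else None


def finish_job(conn, jobid):
    """Mark a claimed job as done and record the completion time."""
    with conn:
        conn.execute(
            "UPDATE jobtable SET status = 'done', finished = ? WHERE jobid = ?",
            (time.time(), jobid),
        )


def make_demo_db():
    """Build an in-memory table with two made-up jobs (hypothetical values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO jobtable (run, rerun, camcol, field, tga_path, output_dir) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 1, 11, "a.tga", "out"), (1, 1, 1, 12, "b.tga", "out")],
    )
    conn.commit()
    return conn
```

Because workers only pull rows marked ready, nodes can join or leave at any time; a job left in the working state by a killed node can be spotted from its grabbed timestamp.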
Framework
HHPC Cluster
  154 nodes, 1232 processors
  PBS job submission
  Linux
Windows + Photoshop is run as a qemu job
  One time: make a C: disk image, install qemu
  All processors use same C: disk image
  Each instance of qemu runs in snapshot mode
    C: read-only
    incremental change to disk image cached locally
    can kill qemu instead of graceful shutdown (PBS proof)
  qemu runs without a display window
    pixels are in /dev/null
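For illustration, a headless snapshot-mode launch might be assembled as below. This is a sketch, not the command used on HHPC: `-snapshot`, `-display none`, `-m`, and `-hda` are options of current QEMU releases, while the binary name, memory size, and image path are hypothetical placeholders.

```python
import shlex


def qemu_command(disk_image, memory_mb=1024, binary="qemu-system-i386"):
    """Build an argv list for a headless, snapshot-mode QEMU guest.

    -snapshot      writes go to a temporary file, so the shared image stays unmodified
    -display none  no display window is opened
    """
    return [
        binary,
        "-snapshot",
        "-display", "none",
        "-m", str(memory_mb),
        "-hda", disk_image,
    ]


# Hypothetical path to the shared C: disk image.
cmd = qemu_command("/shared/winxp_c.img")
print(shlex.join(cmd))
```

Since snapshot mode discards guest writes when the process exits, killing the process leaves the shared image untouched, which is what makes an abrupt PBS termination safe.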
Performance for DR7 images
427,853 fields, one job each
140 seconds total per job (measured)
  FITS to TGA 2 s
  TGA to j2k 136 s
59,899,420 s = 693 days (one processor)
0.56 day (1232 processors + leap of faith)
add 60% fudge factor penalty
Still does it in a day, with two hours to spare
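The estimate can be reproduced from the measured figures: 427,853 jobs at 140 s each, spread over 1232 processors, then padded by 60%. The sketch below assumes perfect parallel scaling (the "leap of faith"); the variable names are illustrative.

```python
FIELDS = 427_853          # one job per field
SECONDS_PER_JOB = 140     # measured total per job
PROCESSORS = 1232
FUDGE = 1.60              # 60% penalty
SECONDS_PER_DAY = 86_400

total_seconds = FIELDS * SECONDS_PER_JOB
serial_days = total_seconds / SECONDS_PER_DAY
parallel_days = serial_days / PROCESSORS      # assumes perfect scaling
padded_days = parallel_days * FUDGE
spare_hours = (1 - padded_days) * 24

print(f"total:    {total_seconds:,} s")
print(f"serial:   {serial_days:.0f} days")
print(f"parallel: {parallel_days:.2f} day")
print(f"padded:   {padded_days:.2f} day, {spare_hours:.1f} h to spare")
```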