
Tomography for Multi-guidestar Adaptive Optics: An Architecture for Real-Time Hardware Implementation

Donald Gavel, Marc Reinig, and Carlos Cabrera
UCO/Lick Observatory Laboratory for Adaptive Optics
University of California, Santa Cruz

Presentation at the SPIE Optics and Photonics Conference, 5903-15
San Diego, CA, June 3, 2005

Gavel, Tomography for Multi-guidestar AO, SPIE Optics and Photonics, San Diego, Aug. 2005

Outline of talk

• Introduction: the problem of real-time AO tomography for extremely large telescopes (ELTs)
  – Real-time calculations grow with $D^4$
• An alternative approach using a massively parallel processor (MPP) architecture
• Performance study results
  – Experiment
  – Simulation

• Conclusions


AO systems are growing in complexity, size, ambition

– MOAO
  • Up to 20 IFUs each with a DM
  • 8-9 LGS
  • 3-5 TTS
– MCAO
  • 2-3 conjugate DMs
  • 5-7 LGS
  • 3 TTS


Extrapolating the conventional vector-matrix-multiply AO reconstructor method to ELTs is not feasible

General form:

$$\hat{a} = K s$$

Minimum variance solution:

$$\hat{a} = \Sigma H^T \left( H \Sigma H^T + \Sigma_n \right)^{-1} s$$

Least-squares solution:

$$\hat{a} = \left( H^T H \right)^{-1} H^T s$$

H = actuator to sensor influence function matrix

• Online calculation requires P x M matrix multiply
  – M = 10,000 subaps x 9 LGS
  – P = 20,000 acts (MCAO) or 100,000 acts (MOAO)
  – $f_s$ = 1 kHz frame rate
  – ~$10^{11}$ calcs x 1 kHz = ~$10^5$ Gflops = ~$10^5$ Keck AO processors!
• Offline calculation requires $O(M^3)$ flops to (pre)compute the inverse: ~$10^{15}$ calcs, or $10^6$ sec (12 days) with a 1 Gflop machine
• "Moore's Law" of computation technology growth: processor capability doubles every 18 months. To get a $10^5$ improvement takes 25 years of growth. Let's say we use 100 x more processors; a $10^3$ improvement takes 15 years.
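The scaling arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the raw P x M product it computes counts one multiply-accumulate per matrix entry. That product comes out at roughly $10^9$ to $10^{10}$ per frame, somewhat below the slide's rounded ~$10^{11}$, which may include factors not spelled out on the slide (this is a guess, not something the source states).

```python
import math

def vmm_macs_per_frame(n_subaps, n_lgs, n_acts):
    """Multiply-accumulates for one P x M matrix-vector product."""
    m = n_subaps * n_lgs          # M: number of measurements
    return n_acts * m             # P x M

def moore_years(improvement, doubling_months=18.0):
    """Years of growth needed for a given capability factor."""
    return math.log2(improvement) * doubling_months / 12.0

mcao_macs = vmm_macs_per_frame(10_000, 9, 20_000)    # MCAO case
moao_macs = vmm_macs_per_frame(10_000, 9, 100_000)   # MOAO case

# Offline inverse: the slide's rounded 1e15 operations on a 1 Gflop machine.
offline_days = 1e15 / 1e9 / 86_400

years_1e5 = moore_years(1e5)   # close to the slide's 25 years
years_1e3 = moore_years(1e3)   # close to the slide's 15 years

print(mcao_macs, moao_macs)
print(round(offline_days, 1), round(years_1e5, 1), round(years_1e3, 1))
```

The Moore's-law figures follow directly from the 18-month doubling time: a factor of $10^5$ is about 16.6 doublings and a factor of $10^3$ is about 10 doublings.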


Alternative: massively parallel processing

• Advantages
  – Many small processors each do a small part of the task: not taxing to any one processor
  – Modularity: each processor has a stand-alone task, possibly specialized to one piece of hardware (WFS or DM)
  – Modularity makes the system easier to diagnose: each part has a "recognizable" task
  – Modularity makes system design easier: each subsection depends only on parameters associated with it, as opposed to global optimization of a monolithic design
• Requires
  – Lots of small processors, with high speed data paths
  – Iteration to solution. But what if 1 iteration took only 1 μs? Then we would have time for 1000 iterations per 1 ms data frame cycle!

[Figure: block diagram of the processing pipeline. Blocks: wavefront sensors, image processors, tomography unit, DM projection, DM fit, deformable mirrors. Parameter inputs: centroid algorithm, r0, guidestar brightness, guidestar position, Cn2 profile, DM conjugate altitude, actuator influence function.]

1. Wavefront sensor processing

• Hartmann sensor: s = Gy
  – s = vector of slopes
  – y = vector of phases
  – G = gradient operator
• Problem is overdetermined (more measurements than unknowns), assuming no branch points
• High speed algorithms are well known, e.g. the FFT-based algorithm by Poyneer et al., JOSA-A 2002, is O(n0 log(n0))


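Wiener solution of the wavefront sensor slope-to-phase problem in the Fourier domain

[Equation: Wiener estimate of the phase transform $\tilde{y}_{est}$ from the slope transform $\tilde{s}$, using the Kolmogorov spectrum (which scales as $r_0^{-5/3}$ and as the $-11/3$ power of spatial frequency) and the noise spectrum $C_{nn}$. The full expression is not recoverable from the extracted text.]

Symbols:
• spatial frequency (symbols not recoverable)
• ~ indicates Fourier transform
• r0 = Fried's parameter
• n = measurement noise
• da = subaperture diameter
• C = Kolmogorov spectrum
• Cnn = noise spectrum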
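As an illustration of the idea, here is a minimal one-dimensional sketch of a Wiener-type slope-to-phase filter in the Fourier domain. It is not the slide's exact expression. It assumes a periodic phase, slopes formed by first differences, and a noise-to-signal ratio that grows as the 11/3 power of spatial frequency (the inverse of the Kolmogorov scaling). The function names, the `level` parameter, and the naive DFT are all assumptions made for this example.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += x[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out

def kolmogorov_nsr(level):
    """Hypothetical noise-to-signal ratio C_nn / C_phi ~ level * kappa^(11/3)."""
    def nsr(k, n):
        kappa = min(k, n - k)      # fold to the lowest equivalent frequency
        return level * kappa ** (11.0 / 3.0)
    return nsr

def wiener_slopes_to_phase(slopes, noise_to_signal):
    """Estimate a periodic phase y from slopes s[j] = y[j+1] - y[j]."""
    n = len(slopes)
    s_hat = dft(slopes)
    y_hat = [0j] * n               # piston (k = 0) is unobservable, left at 0
    for k in range(1, n):
        g = cmath.exp(2j * math.pi * k / n) - 1.0   # transform of the difference
        y_hat[k] = g.conjugate() * s_hat[k] / (abs(g) ** 2 + noise_to_signal(k, n))
    return [v.real for v in dft(y_hat, inverse=True)]

n = 16
y_true = [math.sin(2 * math.pi * j / n) + 0.5 * math.cos(4 * math.pi * j / n)
          for j in range(n)]
slopes = [y_true[(j + 1) % n] - y_true[j] for j in range(n)]

y_exact = wiener_slopes_to_phase(slopes, kolmogorov_nsr(0.0))   # no regularization
y_reg = wiener_slopes_to_phase(slopes, kolmogorov_nsr(1e-2))    # noise-weighted
max_err = max(abs(a - b) for a, b in zip(y_true, y_exact))
```

With the noise term set to zero the filter reduces to a plain inverse of the difference operator and recovers the zero-mean phase. With a nonzero noise term every frequency is attenuated, the higher ones most strongly.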


2. Tomographic reconstruction

$y = A x$, where

• y = vector of all WFS phase measurements (an M-vector)
• x = value of OPD at each voxel in turbulent volume (an N-vector)
• A is a forward propagation operator (entries = 0 or 1), of size M x N

• The problem is underdetermined: there are more unknowns than measurements

• Guidestars probe the atmosphere
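To make the forward operator concrete, the sketch below builds a small 0/1 matrix A for a toy one-dimensional geometry and applies A and its transpose. The geometry (integer voxel shifts per guidestar and layer) and all function names are assumptions chosen for illustration. The slide only states that A has entries 0 or 1 and that M < N.

```python
def build_forward_operator(n_subaps, layer_shifts):
    """Build A (M x N, entries 0 or 1) for a toy 1-D geometry.

    layer_shifts[g][l] is the integer voxel shift of guidestar g's
    beam footprint at layer l (a hypothetical parameterization).
    """
    n_layers = len(layer_shifts[0])
    max_shift = max(abs(s) for row in layer_shifts for s in row)
    width = n_subaps + 2 * max_shift          # voxels per layer
    rows = []
    for shifts in layer_shifts:               # one block of rows per guidestar
        for i in range(n_subaps):             # one row per subaperture
            row = [0] * (n_layers * width)
            for l in range(n_layers):         # the ray crosses one voxel per layer
                row[l * width + max_shift + i + shifts[l]] = 1
            rows.append(row)
    return rows

def forward_propagate(a, x):
    """y = A x: sum the OPD of the voxels along each ray."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def back_propagate(a, y):
    """A^T y: spread each measurement back along its ray."""
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]

shifts = [[0, -1, -2], [0, 0, 0], [0, 1, 2]]  # 3 guidestars, 3 layers
A = build_forward_operator(4, shifts)
M, N = len(A), len(A[0])                      # 12 measurements, 24 voxels
```

In this toy case M = 12 and N = 24, so there are more unknowns than measurements, as stated on the slide.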



Inverse tomography algorithms

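Each iteration k computes:

$$e_k = y - \left( A P A^T + N \right) v_k$$

$$\Delta v_k = C e_k$$

$$v_{k+1} = v_k + g \, \Delta v_k \quad \text{(linear feedback)}$$

-or-

$$v_{k+1} = v_k + f(\Delta v_k) \quad \text{(preconditioned conjugate gradient)}$$

and the estimate is

$$x = P A^T v$$

• $A^T$ is the back propagation operator
• C is the "preconditioner": affects convergence rate only
• P, N is the "postconditioner": determines the type of solution
  – $P = I$, $N = 0$: least squares
  – $P = \langle x x^T \rangle$, $N = \langle n n^T \rangle$: minimum variance
• g = constant feedback gain
• f(.) = 1st order regression (and other hidden details of the CG algorithm)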
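A minimal sketch of the fixed-gain (linear feedback) variant is given below. It makes several simplifying assumptions that are not from the slides: the preconditioner C is the identity, P and N are diagonal (defaulting to P = I, N = 0, the least-squares case), and the matrix A is a small hand-written example with 2 guidestars, 2 layers and 2 subapertures. The names `linear_feedback_solve`, `p_diag` and `n_diag` are hypothetical.

```python
def matvec(a, x):
    """Forward propagate: A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def matvec_t(a, v):
    """Back propagate: A^T v."""
    return [sum(a[i][j] * v[i] for i in range(len(a))) for j in range(len(a[0]))]

def linear_feedback_solve(a, y, gain, n_iter, p_diag=None, n_diag=None):
    """Iterate v <- v + gain * (y - (A P A^T + N) v); return x = P A^T v."""
    m, n = len(a), len(a[0])
    p = p_diag if p_diag is not None else [1.0] * n   # P = I by default
    nz = n_diag if n_diag is not None else [0.0] * m  # N = 0 by default
    v = [0.0] * m
    for _ in range(n_iter):
        x = [pj * tj for pj, tj in zip(p, matvec_t(a, v))]   # x = P A^T v
        ax = matvec(a, x)                                    # forward propagate
        e = [yi - axi - ni * vi for yi, axi, ni, vi in zip(y, ax, nz, v)]
        v = [vi + gain * ei for vi, ei in zip(v, e)]         # C = I here
    return [pj * tj for pj, tj in zip(p, matvec_t(a, v))]

# Toy geometry: 4 rays through 6 voxels (3 per layer).
A = [
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
]
x_true = [1.0, -2.0, 0.5, 0.3, -1.0, 2.0]
y = matvec(A, x_true)
x_est = linear_feedback_solve(A, y, gain=0.5, n_iter=200)
residual = max(abs(yi - ai) for yi, ai in zip(y, matvec(A, x_est)))
```

Because the problem is underdetermined, `x_est` reproduces the measurements but is in general not equal to `x_true`. With P = I and N = 0 the iteration settles on the minimum-norm solution. The gain must stay below 2 divided by the largest eigenvalue of $A P A^T + N$ for the iteration to converge. In this toy case that eigenvalue is about 3.6, so a gain of 0.5 is safe.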


Compute count for inverse tomography

• A and $A^T$ are massively parallelizable over transverse dimension, guidestars
• $A^T$ is massively parallelizable over layers
• Optional Fourier domain preconditioning and postconditioning

[Figure: iteration loop with blocks Pre-condition, Back-propagate, Post-condition, Forward-propagate; Fourier transform (FT, FT^-1) and aperture-multiply (X) stages; input: WFS data; output: volumetric OPD estimates.]

Compute count per iteration:

Operation            CPU         MPPU
Fourier Transform    M log(M)    log(M)


Prototype implementation on an FPGA

[Figure: a single voxel processor, and an array of voxel processors. Labeled elements: voxel local registers, control logic, GS1 ... GSN error, current estimated value, Cn2, ALU ((word size + NGS) wide), cumulative value for GS1 ... GSn, forward propagation path, back propagation path, global system state information.]

Note: Because the forward propagation and back propagation paths are parallel, but are used at different times, they will actually be a single bus in the physical implementation.


Preliminary Results for MPP Timing and Resource Allocation on an FPGA

Timing
• Basic clock speed supported: 50 MHz (Xilinx Virtex-4)
• Total number of states per iteration: 36

Element                   Current value   Derived formula   Comment
Load measured value       12              3 n0              Done once per msec
Forward propagate         27              NGS (2L + 1)
Compare                   1               1
Back propagate            1               1
Calculate new estimate    7               3 NGS + 4

Parameters (current value)
• L = layers (4)
• NGS = guide stars (3)
• n0 = subapertures (4)

A single iteration takes T = 4 NGS + 2 L NGS + 6 clock cycles. Currently this is 36 clocks at 50 MHz = 720 nsec per iteration.

Note: the algorithm parallelizes over guidestars. For simplicity and ease of debugging in this first implementation, we have not done this yet.

Chip count
• This implementation: the Virtex-4 chip is 20% utilized (2996 of 15360 available logic cells employed)
• Scaling to a system with 10,000 subapertures (such as for the 30 meter telescope) would require 500 of these chips
• Standard packing density is ~50 chips/board, so this equates to 10 circuit boards
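The timing and chip-count figures can be reproduced with a short calculation. This is only a sketch, and the function and variable names are hypothetical. Note that evaluating the quoted formulas at NGS = 3 and L = 4 gives 13 states for "Calculate new estimate" and 42 states in total, whereas the table's current values give 7 and 36. The sketch reports both rather than choosing between them, and uses the tabulated 36 states for the timing, as the slide does.

```python
def states_from_formulas(n_gs, n_layers):
    """Per-iteration state count using the slide's derived formulas."""
    forward = n_gs * (2 * n_layers + 1)   # forward propagate
    compare = 1
    back = 1                              # back propagate
    new_estimate = 3 * n_gs + 4           # calculate new estimate
    return forward + compare + back + new_estimate

CLOCK_HZ = 50_000_000        # 50 MHz clock
FRAME_RATE_HZ = 1_000        # 1 kHz AO frame rate

formula_states = states_from_formulas(n_gs=3, n_layers=4)   # per the formulas
table_states = 27 + 1 + 1 + 7                                # per the table

iteration_ns = table_states * 1e9 / CLOCK_HZ
clocks_per_frame = CLOCK_HZ // FRAME_RATE_HZ
iterations_per_frame = clocks_per_frame // table_states

utilization = 2996 / 15360   # fraction of logic cells used
boards = 500 // 50           # chips needed / chips per board

print(formula_states, table_states, iteration_ns, iterations_per_frame)
print(round(utilization, 3), boards)
```

At 720 ns per iteration, roughly 1400 iterations fit in one 1 ms frame, in line with the earlier estimate of about 1000 iterations per frame at 1 μs each.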


Simulation: extrapolation to the full ELT spatial scale to estimate convergence rates

• 7800 subapertures per guidestar
• 5 guidestars
• 7 layer atmosphere
• Fixed feedback gain iteration
• A and $A^T$ implemented in the spatial domain
• Initial atmospheric realizations were random with a Kolmogorov spatial power spectrum

Convergence to 3 digits accuracy in 1 ms


3. Projection and fitting to DMs

• MCAO
  – Requires filtering and weighted integral over layers for each DM
  – Filters and weights chosen to minimize "Generalized Anisoplanatism" (Tokovinin et al., JOSA-A 2002)
  – Massively parallelizable over the Fourier domain and over DMs: L steps to integrate
• MOAO
  – Requires integral over layers for each science direction (DM)
  – Massively parallelizable over spatial or Fourier domain and over DMs: L steps to integrate
• DM fitting
  – Deconvolution: massively parallelizable given either spatially invariant or spatially localized actuator influence function
  – PCG suppresses aperture effects in 2-3 iterations
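The MOAO projection step (an integral over layers along one science direction) can be sketched as follows. This is a toy one-dimensional version. It assumes each layer is a list of OPD samples and that the line of sight is displaced by a non-negative integer number of samples at each layer. The name `project_direction` and this parameterization are assumptions for illustration. The MCAO case would additionally apply per-layer filters and weights, which are not shown.

```python
def project_direction(layers, shifts):
    """Sum layer OPD along one science direction.

    layers[l] holds the OPD samples of layer l; shifts[l] is the
    (non-negative, integer) sample offset of the line of sight there.
    The outer loop runs once per layer: L steps to integrate.
    """
    n_out = min(len(layer) - s for layer, s in zip(layers, shifts))
    opd = [0.0] * n_out
    for layer, s in zip(layers, shifts):      # one step per layer
        for i in range(n_out):                # independent per sample
            opd[i] += layer[i + s]
    return opd

layers = [
    [1.0, 2.0, 3.0, 4.0],          # ground layer
    [10.0, 20.0, 30.0, 40.0],      # middle layer
    [100.0, 200.0, 300.0, 400.0],  # top layer
]
on_axis = project_direction(layers, [0, 0, 0])
off_axis = project_direction(layers, [0, 1, 2])
```

The inner loop over samples has no dependence between iterations, which is what allows it to be spread across many processors. Only the loop over the L layers is sequential.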



Conclusions

• The architecture: massively parallel computation
• Conceptually simple
• Tested with a commercial FPGA; evaluated with simulations: it's feasible with today's technology
• Under study: FD-PCG, with extra computation per iteration traded off against faster convergence rate
