hardware acceleration of monte- carlo structural financial instrument pricing using a gaussian...
TRANSCRIPT
Hardware Acceleration of Monte-Carlo Structural Financial Instrument
Pricing Using a Gaussian Copula Model
Presenter: Alexander Kaganov
Supervisor: Paul Chow
Increasing Computational Requirements (1/3)
In recent years the financial industry has seen:
1. Increasing contract/model complexity Every year new models are developed Unavailability of closed-form solution Necessitate Monte-Carlo pricing
Increasing Computational Requirements (2/3)
2. Increasing portfolio sizes Increase in simple to price instruments Increase in complex derivate security
CDO issuance has increased from $157 billion in 2004 to $507 billion in 2007 (>3x)¹
3xN instruments
3xY time (at least)
N instruments
Y time
¹ SIFMA
Increasing Computational Requirements (3/3)
3. Ever-present need to make real-time decisions Market trends can change quickly Instruments traded electronically
1 ms in Latency is Worth $100 M in Stock Trading Business Value (AMD
Analyst Day-26 july 2007)
Trends in Financial Monte-Carlo Algorithms
1. Computationally intensive Accuracy Time2
2. Highly repetitive A large portion of the calculation time
is spent in a small portion of the code (~90% of the time is spent in ~10% of
the code)
3. High degree of coarse and fine-grain parallelism
Coarse-GrainFine-Grain
Typical MC Financial simulation
Collateralized Debt Obligation (CDO)
CDOProblem:
Banks typically hold portfolios with highly volatile assets.
Solution: Sell assets to an outside entity (SPV), which combines the
different assets together into one collateral pool Repackage the pool as CDO tranches. Sell tranches as form of protection to investors in return for
premium payments
CDO Structure/Pricing (1/2)
Investors
Sponsor (Bank)
Bonds
Loans
CDS
CDOs
Collateral Pool
SPV
Tranches
Super Senior: 12%-100%
Senior: 6% -12%
Mezzanine: 3% -6%
Equity: 0% -3%
Borrowers
(Credit Default Swap)
Each tranche has attachment and detachment points Losses below attachment point → the tranche is unaffected Losses above the detachment point → the tranche becomes inactive
Investor premium is paid based on the tranche width minus tranche losses
CDO Structure/Pricing (2/2)
Attachment (3%)
Detachment (6%)
Tranche Losses
Investor Premium Payments4%
Mezzanine Tranche:
Paid premium based on the full investment
Losses 1/3 of the principal investment. Premium paid based on 2/3 of the original investment
Investor Premium Payments
))())(1
11
T
iiii
T
iiii dLLEdLSsE
CDO Tranche Value = Premium Leg – Default Leg
S =tranche thickness si= Premium
di= Discount factor Li= Tranche loses at time interval i
iLE
Li’s Gaussian Copula Model
Monte-Carlo (MC) pricing model For each path:
2. Compare: )]([1 tPY ii
iiii ZXY 21 1. Generate: iiij
m
jiji ZXY
)(1
or
3. Record losses:
One-Factor Multi-Factor
Implementation
Multi-Core Architecture Three portions: Distributor, Pricing
cores, and Collector. Coarse Grain Parallelism: MC paths
divided among OFGC cores All cores have the same input data
except for random generator seeds Data transfer occurs in parallel to
calculations Double Buffering
Minimal required data transfer rate with an external processor of: 11.3MBytes/sec 1-Lane PCI express- 250 MBytes/sec Data transfer latency can be hidden
Pricing Core Design
Phase 4: Convert collateral pool losses to tranche losses
Phase 5: Accumulate tranche losses
Phase 3: Combine the partial sums, L(ti)’s.
Phase 1: Generate Yi
Phase 2: Compare Yi<Φ-1[P(τi<t)]. Record partial losses
Phase 1
One-Factor Multi-Factor
Four factors accumulated per clock cycle
Accumulation is performed in parallel to Phase 2 execution
Pricing Core Design
Phase 4: Convert collateral pool losses to tranche losses
Phase 5: Accumulate tranche losses
Phase 3: Combine the partial sums, L(ti)’s.
Phase 1: Generate Yi
Phase 2: Compare Yi<Φ-1[P(τi<t)]. Record partial losses
Phase 2 Compare Yi<Φ-1[P(τi<t)].
Record Losses Fine-grain parallelism:
parallelize over time 8 replicas
More replicas → higher speedup (potentially) However, large portions of
the hardware become underutilized
Pipelined adder latency creates multiple partial sums
Pricing Core Design
Phase 4: Convert collateral pool losses to tranche losses
Phase 3: Combine the partial sums, L(ti)’s.
Phase 1: Generate Yi
Phase 2: Compare Yi<Φ-1[P(τi<t)]. Record partial losses
Phase 4: Convert collateral pool losses to tranche losses
Phase 3: Combine the partial sums, L(ti)’s.
Phase 5: Accumulate tranche losses
Phase 5: Accumulate tranche losses
Results
One-Factor Utilization*Single
PrecisionDouble
PrecisionHybrid “Integer”
Flip-Flops 6530 9910 (+51.2%)
6721 (+2.9%)
4906
(-24.9%)
LUTs 7052 13325 (+89.0%)
7599 (+7.8%)
5224
(-25.9%)
BRAMs 15 31 (+106.7%)
15
(0%)
15
(0%)
DSP48Es 29 40 (+37.9%) 30 (+3.4%) 7 (-75.9%)
Frequency 248.8 190.9
(-23.3%)
244.8
(-1.6%)
268.2
(+7.8%)
Average Error (%)
0.37
[1.07]
0 3.02E-5
[5.27E-5]
0
*Utilization for Virtex 5 SX50T chip
Performance: Benchmarks# Based on Data From # of
Assets# of
Time Steps
# of Default Curves
1 CDX.NA.HY 100 15 5
2 CDX.NA.IG 125 35 5
3 CDX.NA.IG.HVOL 30 19 4
4 CDX.NA.XO 35 22 4
5 CDX.EM 14 6 4
6 CDX.DIVERSIFIED 40 23 5
7 CDX.NA.HY.BB 37 13 4
8 CDX.NA.HY.B 46 26 4
9 Semi-homogenous 400 24 2
Credit rating and number of instruments are based on Dow Jones CDX
Notionals obtained from Moody’s, range from $600,000 to $6.6 billion
α: uniformly distributed in [0, 1]
Recovery rate: Normally distributed, N (0.4,0.15)
# of Time Steps: Normally distributed, N (20,10)
# of Systemic Factors: varied from 1 to 16
Processor vs. FPGA setup
3.4 GHz Intel Xeon Processor
3GB RAM C++ program 100,000 Monte-Carlo paths
Virtex 5 SX50T speed grade -3
Connected to host through PCI express
100,000 Monte-Carlo paths
0
5
10
15
20
25
CD
X.N
A.H
Y
CD
X.N
A.IG
CD
X.N
A.IG
.HV
OL
CD
X.N
A.X
O
CD
X.E
M
CD
X.D
IVE
RS
IFIE
D
CD
X.N
A.H
Y.B
B
CD
X.N
A.H
Y.B
Sem
i-homogenous
AV
ER
AG
E
Benchmarks
Sp
eed
up Double Precision
Single Precision
Single/Double Hybrid
"Integer"
Performance: One-Factor Single Core Results (1/3)
Performance: One-Factor Single Core Results (2/3)
0
5
10
15
20
25
30
0 10 20 30 40 50 60 70
# of Time Steps
Sp
eed
up Single Precision
Double Precision
Integer
Performance: One-Factor Single Core Results (3/3)
Single Core Average Acceleration:
Double Precision: 10.8 X
Single Precision: 14.0 X
Single/Double Hybrid: 13.8 X
Integer: 15.8 X
Performance: One-Factor Multi-Core Results Monte-Carlo paths independence allows for a linear speedup as
more pricing cores are incorporated.
Double Single Single/Double Hybrid
Integer
Single Core
Acceleration
10.8X 14.0X 13.8X 15.8X
Maximum # of
Instantiations
2 4 4 5
Multi-Core Acceleration
15.9X 47.2X 47.5X 64.5X
Performance: Multi-Factor Results (1/3)
8
Steps Time of #TRF
4
Factors Systemic of #SFRF
• SFRF is the number of cycles before Phase 1 generates a new Yi
•TRF is the number of cycles before Phase 2 requires a new Yi
Consumer: Phase 2
Ratec: 1/TRF
Producer: Phase 1
Ratep: 1/SFRF
Performance: Multi-Factor Results (2/3)
Benchmark 1: CDX.NA.HY
0
5
10
15
20
25
30
0 2 4 6 8 10 12 14 16 18
# of Systemic Factors
Sp
eed
up
Double-Precision
Single-Precision
Integer
Ratec<= Ratep Ratec>Ratep
Performance: Multi-Factor Results (3/3)Ratep>Ratec Ratep= Ratec Ratep<Ratec Average
Double Precision
1 Core 13.3X 15.9X 12.0X 13.5X
2 Cores 18.7X 22.5X 16.9X 19.0X
Single Precision
1 Core 16.7X 20.1X 15.1X 17.0X
4 Cores 52.8X 63.4X 47.7X 53.6X
Hybrid
1 Core 16.5X 19.9X 14.9X 16.8X
4 Cores 50.3X 60.4X 45.4X 51.1X
Integer
1 Core 17.7X 22.4X 18.6X 18.9X
5 Cores 67.0X 84.5X 70.1X 71.5X
Summary Presented a hardware architecture for pricing
Collateralized Debt Obligations using Li’s models Demonstrating the flexibility of an FPGA in exploiting
both coarse- and fine-grain parallelism
Established that either a single/double hybrid or “integer” representations could be used to balance resource utilization and accuracy
Achieved a maximal acceleration of over 64-fold and 71-fold, for the one-factor and multi-factor models, respectively
Future Work
1. Expand to a multi-FPGA system
2. Attempt the algorithm on a different accelerator architectures
GPU
3. Incorporate into a financial software suite
Thank You(Questions?)