vpa-2box model diagnostics used in the …...vpa-2box model diagnostics used in the 2012 assessment...

SCRS/2013/030 Collect. Vol. Sci. Pap. ICCAT, 70(1): 190-201 (2014)

VPA-2BOX MODEL DIAGNOSTICS USED IN THE 2012 ASSESSMENT OF WESTERN ATLANTIC BLUEFIN TUNA (THUNNUS THYNNUS)

Shannon L. Cass-Calay and Matthew Lauretta1

SUMMARY This report documents the model diagnostics used in the assessment of West Atlantic bluefin tuna during 2012. Model diagnostics included model convergence statistics, sensitivity analyses of abundance indices and life history assumptions, and bootstrap analysis to evaluate model robustness and estimate central tendency and variance of parameter estimates. The usefulness of these diagnostics in evaluating model performance during the stock assessment is discussed, and recommendations for future improvement are provided.

RÉSUMÉ

Le présent rapport documente les diagnostics du modèle utilisés dans l'évaluation du thon rouge de l’Atlantique Ouest en 2012. Les diagnostics du modèle incluaient les statistiques de convergence du modèle, les analyses de sensibilité des indices d'abondance et les postulats sur le cycle vital et les analyses par bootstrap pour évaluer la solidité du modèle et estimer la tendance centrale, ainsi que la variance des estimations des paramètres. L'utilité de ces diagnostics dans l'évaluation des performances du modèle pendant l'évaluation des stocks est discutée et des recommandations sont formulées en vue d'améliorations futures.

RESUMEN

Este trabajo documenta los diagnósticos del modelo utilizados en la evaluación de atún rojo del Atlántico oeste durante 2012. Los diagnósticos del modelo incluían estadísticas de convergencia del modelo, análisis de sensibilidad de índices de abundancia y supuestos de ciclo vital y análisis de bootstrap para evaluar la robustez del modelo y estimar la tendencia central y la varianza de estimaciones de parámetros. Se debatía la utilidad de los diagnósticos a la hora de evaluar el funcionamiento del modelo durante la evaluación de stocks y se facilitaban recomendaciones para futuras mejoras.

KEYWORDS

Stock assessment, Model diagnostics, Bluefin tuna

1 NOAA Fisheries, Southeast Fisheries Science Center, Sustainable Fisheries Division, 75 Virginia Beach Drive, Miami, FL, 33149-1099, USA. E-mail: [email protected]

190

1. Introduction A number of model diagnostic tools were used in the 2012 assessment of West Atlantic bluefin tuna (Thunnus thynnus) to evaluate VPA-2BOX model performance and assess the robustness of stock biomass and fishing mortality estimates. These diagnostics were important for determining the validity of model estimates, given the assumptions of the data and life-history parameters of the stock. The diagnostics tools are outlined below, and figures are included to demonstrate the approach. 2. Methods Following is a list of the diagnostic tools used during the 2012 assessment of West Atlantic bluefin tuna. Relative Abundance Indices

A summary table was constructed to evaluate the appropriateness and validity of individual abundance indices based on predetermined criteria provided by ICCAT.

Plots were constructed to visually compare model fit to the standardized relative abundances. Individual outliers that were believed to be false signals of stock abundance trends (i.e. CAN GSL 2010

index) were removed to determine the influence of those data points on VPA results. Indices jackknife: Individual indices were iteratively removed from the VPA model to determine the

most influential data series and the effects of excluding individual indices on model results. Model Convergence

Model convergence statistics provided within the VPA-2BOX program were examined to verify model performance.

Model estimates were compared across a range of parameter starting values to assess model solution robustness.

Model Sensitivity Runs Selectivity sensitivity: Model sensitivity to selectivity assumptions were evaluated through alternative

model runs that included no constraints applied to selectivity schedules, and constraints applied to selectivity standard deviations and across different age classes. SSB, apical fishing mortality and recruitment estimates were compared for all selectivity sensitivity model runs to determine the effect of those assumptions on model results.

Maturity Sensitivity: Three maturity schedules were evaluated to determine the effects of stock maturity assumptions on VPA results. Alternative maturity assumptions included early maturation (i.e. eastern Atlantic oogive with 50% maturity at age 4 and full maturity of older age classes), late maturation (i.e. approximately logistic increase in maturity from 0 at age 8 to 1 at age 16), and intermediate maturation (fully mature at age 9). SSB and fishing mortality estimates under each alternative were compared.

Retrospective analysis: Annual indices and catch information were removed iteratively from the most recent year to up to five years prior to determine the influence of data from recent years on stock biomass and fishing mortality estimate trends.

Recruitment Sensitivity: Two alternative hypotheses of the functional form of the stock-recruitment relationship were modeled to determine model sensitivity to recruitment assumptions. Stock status future projections were compared between the alternative recruitment assumptions.

Bootstrap Iterations

500 bootstrap iterations of the VPA model were conducted to determine model stability and variation in SSB and fishing mortality estimates. Histograms of the bootstrap estimates were constructed to examine bias and normality of estimates across bootstraps.

191

Results and Discussion A summary table of abundance indices diagnostics (Table 1) constructed by ICCAT and filled in by Cooperating Parties was very useful for cross-indices comparisons of data assumptions and standardization methods. Plotting model fits to indices provided a visual comparison of model performance between the current assessment and the previous one (Figure 1), as well as between the base model and select sensitivity runs (Figure 2). Model convergence diagnostics Stock biomass and fishing mortality estimates across various sensitivity analyses indicated that historical stock trend patterns were fairly robust to the assumptions of parameter starting values, life-history parameters, particularly maturity assumptions (Figure 3), as well as selectivity assumptions (Figure 3) and the removal of individual abundance indices from the jack-knife analysis (Figure 4). Retrospective analyses indicated that model results were not largely influenced by the removal of indices and catch data from recent years. Contrastingly, the alternative hypotheses of the stock-recruitment relationship had a large effect on the perception of current stock status (Figure 5) and in the stock biomass projections (Figure 6). Bootstrap resampling provided a measure of model estimate uncertainty and histograms of stock biomass and fishing mortality estimates across bootstrap iterations demonstrated good central tendency and approximate normality of estimates. From these bootstraps, it was possible to estimate confidence intervals around parameter estimates (Figures 7 and 8) to characterize model uncertainty. Recommendations

192

Table 1. Summary table to evaluate the available Atlantic bluefin abundance indices. Reproduced from the 2012 Atlantic bluefin tuna stock assessment report. Fishery Independent Fishery Independent

SCRS doc SCRS/2012/100 SCRS‐12‐103 SCRS/2012/111 SCRS/2012/124 SCRS/2012/131 SCRS/2012/158 SCRS/2012/160 SCRS/2012/164 SCRS/2012/118 SCRS/2012/118Index Bay of Biscay Baitboat Morocco and Spanish traps Spanish traps Juvenile western Med Japanese LL US rod and reel US LL Larval survey southern Gulf of St.

LawrenceSouthwest Nova Scotia

DiagnosticsMost of the appropriate diagnostics appear to be included

Most of the appropriate diagnostics appear to be included

Most of the appropriate diagnostics appear to be included No diagnotics presented Yes diagnotics presented Few diagnostics presented

Presented ‐ Some Deviation for expectations

Most of the appropriate diagnostics appear to be included

all the appropriate diagnostics were included

all the appropriate diagnostics were included

Appropriateness of data exclusions and classifications (e.g. to identify targeted trips).

data exclusions/classifications are listed and justified, specific targeting factors included in standardisation

data exclusions not discussed, targetting not an issue

data exclusions not discussed, although data is classified

data described and method clearly explained with caveats and limitations

data exclusions are clearly identified and justified, alternate CPUE runs are attempted using additional exclusions. GLM includes factors that could be considered proxies for targetting

Data exclusions, if any are not mentioned. GLMM specifically includes a target factor

Data exclusions are described. Vessel exclusions applied to limit trips to those that caught BF in >= 2 years.

data collection method clearly explained, as is a survey, presumabely few data exclusions

data exclusions are indicated, classifications appropriate

data exclusions are indicated, classifications appropriate

Geographical Coverage Geographical coverage is limited to bay of Biscay, maps are provided

Coverage limited to the straits of Gibraltar

Coverage limited to the straits of Gibraltar

coverage limited to Med. Maps of surveys provided

covers west and north‐east Atlantic. Distribution maps are provided

Northeast US coast only, no distribution maps provided U.S. GOM

coverage limited to Med. No maps of surveys provided

coverage limited to Nafo area 4T

coverage limited to Nafo area 4X

Catch Fraction catch fraction is roughly 5% ? Not clear appears to be small NA significant ? Not discussed. NA

Length of Time Series relative to the history of exploitation.

longer series (starts in 1952) ; shorter series (starts in 2000)

time series starts at beginning of the 1980s

time series starts at beginning of the 1980s Starts 2000

for the west ; for the northeast

time series starts at beginning of the 1980s 1987‐2010 since 2001

since 1981, exploitation began in 1972‐73 since 1988

Are other indices available for the same time period? Yes (5) Yes (3) Yes (3) Yes (2) Yes (3) Yes (3) Yes (3) Yes (1) no no

Does the index standardization account for known factors that

influence catchability/selectivity?

analysis includes many factors that could affect fishing efficiency/selectivity. Multiple interactions included

factors included in the model, table 1, are not explained in the text and impossible to understand for those not immediately familiar with the fishery. It would appear only one factor was included that could influence catchability ‐ trap

due to the nature of the fishery, only the trap factor could explain catchability but the factors were explained. GLM contains only factors

Methodology for standardisation of the series appears to be appropriate for a survey

gear type is included as is a selectivity proxy. No interactions included

Model includes multiple factors that could influence cathability and selectivity

Methodology for standardisation of the series appears to be appropriate for a survey

factors are month, fleet, gear and hours fished

factors are month, fleet, gear and hours fished

Are there conflicts between the catch history and the CPUE

response?No conflict noted No conflict noted No conflict noted No conflict noted No conflict noted No conflict noted

Yes. Circle hooks, weak hooks and regulatory measures were discussed. The index was not adjusted for these factors. No conflict noted

response sensitive to management measures and shrinking quotas

response sensitive to management measures and shrinking quotas

Is the interannual variability within plausible bounds (e.g.

SCRS/2012/039) variability increases over the latter years of the series

there is a high degree of variability, but no formal tests conducted to see whether this is biologically plausible

There is a high degree of variability between the beginning of the series and the final few years, but no formal tests conducted to see whether this is biologically plausible

annual variability higher for west atlantic base case cpues, but northeast cpue has extreme increase in most recent years Unknown

variability does not impair interpretation of the trend. Increased variability in recent times.

variability does not impair interpretation of the trend. Increased variability in recent times.

Are biologically implausible interannual deviations severe? (e.g. SCRS/2012/039) moderate No tests condcuted No tests condcuted Unknown

deviations relate to known impacts

deviations relate to known impacts

Assessment of data quality and adequacy of data for

standardization purpose (e.g. sampling design, sample size,

factors considered)

information does not include length frequencies of catches in recent years. Multiple factors and interactions included. Model design takes into account effort distribution. Discussions of data quality touched on

document states LF data was recorded, but it is not presented. Document states series applied to spawners 10+, model is extremely low on factors

data is discussed and method has been adjusted from prior studies accordingly. Paper provides size distribution of sampled catch

data is presented and limitations of the method discussed

information includes length frequencies of catches. Multiple factors included. Sample design and sensitivity runs investigate effort distribution as well as data assumptions/concerns and effort is presented

information includes length frequencies of catches. Multiple factors included. Sample design and sensitivity runs investigate alternative data assumptions/concerns

data is presented and methodolgy for standardisation explicitly presented. Factors appear to be appropriate

data is presented and methodolgy for standardisation explicitly presented. Factors appear to be appropriate for a survey

includes trends in forage fish and recent changes in environmental variables. Shows weight frequencies and trends in condition.

includes trends in forage fish and recent changes in environmental variables. Shows weight frequencies and trends in weight.

Is this CPUE time series continuous?

Yes Yes Yes Yes Yes

No missing points, although series have been split into time periods Yes. yes yes

193

1970 1980 1990 2000 2010

01

23

45

6

CAN_GSL

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

CAN_SWNS

1970 1980 1990 2000 2010

0.0

0.5

1.0

1.5

2.0

US_RR_0_145

1970 1980 1990 2000 2010

0.0

1.0

2.0

US_RR_66_114

1970 1980 1990 2000 2010

0.0

1.0

2.0

US_RR_115_144

1970 1980 1990 2000 2010

0.0

1.0

2.0

US_RR_195plus

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

US_RR_177plus

1970 1980 1990 2000 2010

0.0

0.5

1.0

1.5

2.0

2.5

JPN_LL_AREA2

1970 1980 1990 2000 2010

01

23

45

6

US_LARVAL

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

US_LL_GOM

1970 1980 1990 2000 2010

0.0

0.5

1.0

1.5

JPN_LL_GOM

1970 1980 1990 2000 2010

0.0

1.0

2.0

TAGGING

Year

Ind

ex

of A

bu

nd

an

ce

Figure 1. Fits to the CPUE indices for 2012 western Atlantic BFT continuity VPA (observed shown as black points, predicted shown as blue lines), compared the 2010 base model (observed shown as open points, predicted shown as red lines). Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

194

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

CAN_GSL

1970 1980 1990 2000 2010

0.0

1.0

2.0

CAN_SWNS

1970 1980 1990 2000 2010

0.0

0.5

1.0

1.5

2.0

US_RR_0_145

1970 1980 1990 2000 2010

0.0

1.0

2.0

US_RR_66_114

1970 1980 1990 2000 2010

0.0

1.0

2.0

US_RR_115_144

1970 1980 1990 2000 2010

0.0

1.0

2.0

US_RR_195plus

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

US_RR_177plus

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

JPN_LL_AREA2

1970 1980 1990 2000 2010

01

23

45

6

US_LARVAL

1970 1980 1990 2000 2010

0.0

1.0

2.0

3.0

US_LL_GOM

1970 1980 1990 2000 2010

0.0

0.5

1.0

1.5

JPN_LL_GOM

1970 1980 1990 2000 2010

0.0

1.0

2.0

TAGGING

Year

Ind

ex

of A

bu

nd

an

ce

Figure 2. Fits to the CPUE indices (black points) for western Atlantic BFT base VPA runs without the Canadian GSL (red lines) and U.S RR >177 cm (turquoise lines) indices compared to the 2012 continuity model (dark blue lines). Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

195

Selectivity Sensitivity Models Maturity Sensitivity Models

Figure 3. Annual estimates of spawning stock biomass (SSB), depletion with regard to 1970 (SSB/SSB1970), apical fishing mortality and recruitment for the VPA continuity and select sensitivity runs. Sensitivity run 2 displayed poor model behavior (e.g. Apical F = 5.0 in 2011 – the upper bound on F allowable in VPA). Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

196

1970 1980 1990 2000 2010

010

000

3000

050

000

1970 1980 1990 2000

0e+

002e

+05

4e+

05

2000 2002 2004 2006 2008 2010

1000

014

000

1800

022

000

2000 2002 2004 2006 2008

010

0000

2000

0030

0000

ContinuityCAN_GSL_RemovedCAN_SWNS_RemovedUS_RR_66_114_RemovedUS_RR_115_144_RemovedUS_RR_145less_RemovedUSRR_177plus_RemovedUS_RR_195plus_RemovedJPN_LL_AREA2_RemovedUS_LL_GOM_RemovedUS_Larval_RemovedJPN_LL_GOM_RemovedTagging_Removed

SSB Recruits

Figure 4. Jack-knife analysis demonstrating the effects of iteratively removing individual relative abundance indices and associated partial catch-at-age matrices from the western BFT VPA continuity model. Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

197

Figure 5. Estimated status of stock relative to the Convention objectives (MSY) by year (1973 to 2011). The lines represent the time series of point estimates for each recruitment scenario. The estimated stock status in 2009 (the geometric mean fishing mortality during 2006-2008 is the proxy for F in 2009) in shown as a red “X”. Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

198

60% Probability – Low Recruitment Potential

60% Probability – High Recruitment Potential

Figure 6. WBFT: The projected SSB/SSBMSY and F/FMSY trajectories at various catch levels for the two recruitment scenarios. These trajectories correspond to a 60% probability of achieving a given level of SSB/SSBMSY or F/FMSY. Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

199

Figure 7. Median (solid line) estimates of spawning stock biomass, abundance of spawners (Age 9+), apical fishing mortality and recruitment. The 2009 to 2011 recruitment estimates were replaced with values from the two-line S-R relationship. Dashed lines indicate the 80% confidence interval. Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

200

Low Recruitment Scenario

High Recruitment Scenario

Figure 8. Histograms of bootstrap estimates of 2011 stock status at using FMSY and F0.1 references. The yellow bar contains the value corresponding to the continuity-case deterministic estimate. The cumulative frequency is shown as a solid red line. Reproduced from the 2012 Atlantic bluefin tuna stock assessment report.

201

vpa-2box model diagnostics used in the …...vpa-2box model diagnostics used in the 2012 assessment...

Documents