
Impacts of Phased-Array Radar Data on Forecaster Performance during Severe Hail and Wind Events

KATIE A. BOWDEN

Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, Norman, Oklahoma

PAMELA L. HEINSELMAN

NOAA/OAR/National Severe Storms Laboratory, Norman, Oklahoma

DARREL M. KINGFIELD

Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, and NOAA/OAR/National

Severe Storms Laboratory, Norman, Oklahoma

RICK P. THOMAS

School of Psychology, Georgia Institute of Technology, Atlanta, Georgia

(Manuscript received 29 August 2014, in final form 8 December 2014)

ABSTRACT

The ongoing Phased Array Radar Innovative Sensing Experiment (PARISE) investigates the impacts of

higher-temporal-resolution radar data on the warning decision process of NWS forecasters. Twelve NWS

forecasters participated in the 2013 PARISE and were assigned to either a control (5-min updates) or an

experimental (1-min updates) group. Participants worked two case studies in simulated real time. The first

case presented a marginally severe hail event, and the second case presented a severe hail and wind event.

While working each event, participants made decisions regarding the detection, identification, and re-

identification of severe weather. These three levels compose what has now been termed the compound

warning decision process. Decisions were verified with respect to the three levels of the compound warning

decision process and the experimental group obtained a lower mean false alarm ratio than the control group

throughout both cases. The experimental group also obtained a higher mean probability of detection than the

control group throughout the first case and at the detection level in the second case. Statistical significance

(p value = 0.0252) was established for the difference in median lead times obtained by the experimental

(21.5 min) and control (17.3 min) groups. A confidence-based assessment was used to categorize decisions

into four types: doubtful, uninformed, misinformed, and mastery. Although mastery (i.e., confident and

correct) decisions formed the largest category in both groups, the experimental group had a larger proportion

of mastery decisions, possibly because of their enhanced ability to observe and track individual storm char-

acteristics through the use of 1-min updates.

1. Introduction

During warning operations, weather forecasters rely

heavily on radar technology to observe and monitor

potentially severe thunderstorms (Andra et al. 2002).

The National Weather Service (NWS) currently utilizes

a network of 158 Weather Surveillance Radar-1988

Dopplers (WSR-88Ds) that are located across the

United States (Whiton et al. 1998). Given that the WSR-

88D was initially designed with a projected lifetime of

20 yr (Zrnić et al. 2007), continuous upgrades are re-

quired to maintain its functionality (e.g., Saffle et al.

2009; Crum et al. 2013). However, eventually the WSR-

88D network will have to be replaced. A replacement

candidate under consideration is phased-array radar

(PAR; Zrnić et al. 2007). To explore the suitability of

PAR for weather observation, a phased-array antenna

was loaned to the NOAA/National Severe Storms

Corresponding author address: Katie Bowden, 120 David L.

Boren Blvd., Norman, OK 73072.

E-mail: [email protected]


DOI: 10.1175/WAF-D-14-00101.1

© 2015 American Meteorological Society

Laboratory (Forsyth et al. 2005) in Norman, Oklahoma,

by the U.S. Navy. A key characteristic of this PAR is its

capability to provide volume updates in less than 1 min

(Heinselman and Torres 2011).

When exploring future replacement technologies to

the WSR-88D, an important consideration is forecaster

needs. In a survey conducted by LaDue et al. (2010),

forecasters expressed a need for higher-temporal-

resolution radar data during rapidly evolving weather

events. In particular, forecasters reported that the 4–6-min

updates provided by the WSR-88D are insufficient for

observing radar precursor signatures of thunderstorms

such as downbursts (LaDue et al. 2010). Fujita and

Wakimoto (1983) define a downburst as, ‘‘A strong

downdraft which induces an outburst of damaging winds

on or near the ground.’’ Radar precursor signatures,

such as a descending high-reflectivity core and strong

midlevel convergence, can be used to identify storms

capable of producing a downburst (e.g., Roberts and

Wilson 1989; Campbell and Isaminger 1990). Such pre-

cursor signatures, however, can evolve too quickly for

trends to be sampled sufficiently by the WSR-88D. Such

limitations may result in delayed warnings and therefore

reduced lead time or, worse, missed events. These lim-

itations are of concern because downbursts can produce

damaging winds at the surface, presenting a threat to life

and property. Therefore, for improvement in warning

operations, a future radar system should be capable of

sampling the atmosphere on a shorter time scale, which

PAR can provide.

Heinselman et al. (2008) examined the weather sur-

veillance capabilities of the PAR during severe weather

events. In particular, microburst precursor signatures

observed by the PAR were compared to those observed

by the WSR-88D. During a 13-min observation period

when a storm was sampled by both radars, the PAR and

WSR-88D collected 23 and 3.5 volume scans, respec-

tively. The considerably faster PAR sampling resulted in

an improved ability to observe and track microburst

precursor signatures, prior to the detection of divergent

outflow at the lowest scans. Additionally, Heinselman

et al. (2008) analyzed a hailstorm observed by PAR.

Although a comparison to the WSR-88D was not

available, the development of radar features indicative

of a hail threat (e.g., bounded weak-echo region and

three-body scatter spike) were clearly visible in PAR

data as the storm quickly evolved. These findings by

Heinselman et al. (2008) suggest that the use of PAR

data could provide forecasters with the ability to detect

impending severe weather earlier, which in turn may

provide the public with longer warning lead times.

The Phased Array Radar Innovative Sensing Exper-

iment (PARISE) was designed to assess the impacts of

higher-temporal-resolution radar data on the warning

decision process of forecasters (Heinselman et al. 2012;

Heinselman and LaDue 2013). The work of PARISE is

critical to ensuring that the implementation of PAR

technology would be beneficial to the NWS. The 2010

and 2012 PARISE focused on low-end tornado events

(Heinselman et al. 2012; Heinselman and LaDue 2013).

Both experiments reported enhanced forecaster per-

formance with the use of 1-min radar updates compared

to forecasters using traditional 5-min radar updates, as

demonstrated through warnings issued with longer tor-

nado lead times. The purpose of this study was to extend

the work of PARISE to include severe hail and wind

events, with a focus on downbursts (see section 3b for

the NWS definition of severe). Based on the findings of

Heinselman et al. (2012) and Heinselman and LaDue

(2013), we hypothesized that during such events, rapidly

updating radar data would positively impact the warning

decision process of NWS forecasters. To assess this hy-

pothesis, data collection focused on both quantitative

and qualitative aspects of the forecaster warning de-

cision process. In particular, details of warning products

were recorded so that forecaster performance could be

assessed from a verification standpoint. The data col-

lected revealed that the warning decision process com-

prised three key decision stages. For this reason,

verification was assessed with regard to what has been

termed the compound warning decision process, which

recognizes that forecasters detect, identify, and re-

identify severe weather (see section 3a). Additionally,

confidence ratings were obtained each time a forecaster

made a key decision, along with reasoning for each

confidence rating. Through the use of a confidence-

based assessment, these ratings were analyzed to ad-

dress the question of whether increasing the temporal

availability of radar data leads to better decisions. Spe-

cifically, decisions were categorized into four types:

doubtful, uninformed, misinformed, and mastery. The

reasoning for each confidence rating provides insight

into why each decision type occurred, and whether the

temporal resolution of radar data played a role.

2. Methods

a. Experimental design

From two NWS Weather Forecast Offices (WFOs),

12 forecasters were recruited to participate in the 2013

PARISE. The two WFOs were located in the NWS’s

Southern and Eastern Regions, and therefore given the

climatology of these regions, the 12 forecasters would

have had experience working severe hail and wind events

(Kelly et al. 1985). During each of the six experiment

weeks, one forecaster from each WFO visited Norman,


Oklahoma. The experiment adopted a two-independent-

group design, where each week forecasters were as-

signed to either a control or an experimental group.

The volume update time acted as the independent var-

iable, where the control group received 5-min updates

from temporally degraded PAR data, and the experi-

mental group received 1-min updates from full-

temporal-resolution PAR data.

To ensure balanced groups in terms of knowledge and

experience, matched random assignment was in-

corporated into the experiment design. Matching was

accomplished through an online survey that was issued

to participants prior to the experiment. Participants’

experience was measured by the number of years they

had worked in the NWS (Table 1, columns 1 and 3).

Although experience is important with respect to the

amount of exposure one has had in their work envi-

ronment, experience does not imply expertise. As de-

scribed by Jacoby et al. (1986), experience and expertise

are ‘‘conceptually orthogonal,’’ with a distinguishing

factor being that expertise is achieved through acquiring

a ‘‘qualitatively higher level of either knowledge or

skill.’’ Therefore, to assess aspects of forecaster exper-

tise relevant to this study, knowledge was measured

through four questions regarding familiarity (Table 1,

columns 2 and 3), understanding, knowledge of pre-

cursors, and training with respect to downburst events

(Table 1, columns 4–7). For knowledge, the three

questions requiring qualitative responses were com-

pared to criteria that were based on downburst con-

ceptual models (e.g., Atkins and Wakimoto 1991).

Based on their survey responses, all participants were

assigned an experience and knowledge score ranging

between 1 and 5 (Fig. 1). The experience score was

based on the single experience question, whereas the

knowledge score was generated by averaging the points

obtained from the four knowledge questions. Among

the participants, experience was spread fairly evenly,

and knowledge was clustered around the medium range

TABLE 1. Criteria for points assigned to questions from the preexperimental online survey. Columns 1 and 3 refer to how experience scores were assigned, and columns 2–7 refer to how knowledge scores were assigned. In column 2, a scale from 1 to 10 is used (where 1 indicates no familiarity and 10 indicates extensive familiarity).

Experience (yr) | Familiarity | Points | Understanding of a downburst | Precursors for forecasting a downburst | Training | Points
≤5  | 1 and 2  | 1 | Definition | Suspended core | Distance Learning Operations Course and Advanced Warning Operations Course | Assign one point for each topic discussed within the question category; total of five points for each category
≤10 | 3 and 4  | 2 | Wet and dry variety recognized | Midaltitude radial convergence | Seasonal familiarization training |
≤15 | 5 and 6  | 3 | Description of soundings | Storm-top divergence | Other courses (e.g., online/workshops) |
≤20 | 7 and 8  | 4 | Thermodynamic and dynamic mechanisms | Environment assessment | Exposure to literature/current forecasting techniques |
>20 | 9 and 10 | 5 | Demonstration of an understanding beyond that of a typical responder | Demonstration of an understanding beyond that of a typical responder | Personal experience (e.g., storm chasing) |

FIG. 1. Experience and knowledge scores for each participant. The group assignment of each participant was based on the control and experimental group combinations that yielded the smallest Mahalanobis distance. Participants assigned to the control or experimental groups are shown as open or filled circles, respectively.


(Fig. 1). For all possible group combinations, the

Mahalanobis distance was computed to assess the sim-

ilarity between groups by using experience and knowl-

edge scores as variables (McLachlan 1999). The smallest

distance represented the greatest similarity between

groups, which therefore determined the group assign-

ment for each participant (Fig. 1).
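
For concreteness, the matching step can be sketched in code. The following Python fragment is illustrative only (the paper describes the use of the Mahalanobis distance but not an implementation): it searches every 6/6 split of the 12 participants for the grouping whose group-mean (experience, knowledge) vectors are most similar.

```python
# Minimal sketch (not the authors' code) of matched assignment via the
# Mahalanobis distance (McLachlan 1999): search every 6/6 split of the
# 12 participants for the grouping whose group means are most similar.
from itertools import combinations

import numpy as np

def best_split(scores):
    """scores: (12, 2) array of (experience, knowledge) per participant."""
    scores = np.asarray(scores, dtype=float)
    cov_inv = np.linalg.inv(np.cov(scores.T))  # inverse covariance of the scores
    everyone = set(range(len(scores)))
    best = None
    for control in combinations(sorted(everyone), len(scores) // 2):
        experimental = sorted(everyone - set(control))
        d = (scores[list(control)].mean(axis=0)
             - scores[experimental].mean(axis=0))
        dist = float(np.sqrt(d @ cov_inv @ d))  # Mahalanobis distance
        if best is None or dist < best[0]:
            best = (dist, control, tuple(experimental))
    return best  # (smallest distance, control indices, experimental indices)
```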

Although efforts were made to match groups, the

limitations associated with the applied methodology

should be acknowledged. A limitation that arose fol-

lowing the distribution of the survey was that partici-

pants may not have always interpreted the questions

correctly, leading to discussions on tangential topics. For

example, participants were asked to explain their un-

derstanding of a downburst. Although most participants

perceived this question as intended (Table 1, column 4),

some responses focused on the type of damage observed

from downbursts. In addition, the amount of time and

effort that participants invested into the survey was

likely variable. For these reasons, it is possible that

survey responses did not provide a complete represen-

tation of participants’ knowledge. However, despite

this possibility, the consistent assessment of survey re-

sponses and the use of a similarity metric provided

a means to objectively match groups.

b. Case studies

The National Weather Radar Testbed located in

Norman, Oklahoma, is home to an S-band PAR that is

being evaluated and tested for weather applications.

Given that the PAR is a single flat-panel array, data

collection is limited to a 90° sector at any one time.

PAR’s electronic beam steering means that it operates

with a nonconformal beamwidth increasing from 1.5° to 2.1° as the beam is steered from boresight to ±45° (Zrnić et al. 2007). Additionally, the electronic beam steering

allows the atmosphere to be scanned noncontiguously,

enabling weather-focused observations, which further

reduce the volume update time to less than 1 min

(Heinselman and Torres 2011; Torres et al. 2012).

Based on the following criteria, two cases from ar-

chived PAR data were selected for the 2013 PARISE

(Table 2). First, the cases needed to be long enough to

allow participants to settle into their roles and demon-

strate their warning decision processes as the weather

evolved. Second, severe hail and/or wind reports needed

to be associated with the event, preferably toward the

end of the case to give participants an opportunity to

interrogate the storms beforehand and make warning

decisions as necessary. Third, for consistent low-level

sampling of the weather event, the PAR data needed to

be uninterrupted and within a range of 100 km from

the radar.

Case 1 presented multicell clusters of storms that oc-

curred at 0134–0210 UTC 20 April 2012 (Figs. 2a,b;

Table 2). This marginally severe (i.e., at or slightly

greater than the severe criteria) hail event was observed

by the PAR using an enhanced volume coverage pattern

(VCP) 12 strategy. Specifically, this VCP scanned 19

elevation angles ranging between 0.518 and 52.908. Al-

though only one severe hail report occurred during case

time, an additional six hail reports were associated with

the same storm 1 h after case end time.

Case 2 included multicellular storms with some rota-

tion that were sampled by PAR at 2053–2139 UTC 16

July 2009 (Figs. 2c,d; Table 2). PAR collected data using

a VCP that was composed of 14 elevation angles ranging

between 0.51° and 38.80°. Both severe hail and wind

events were reported and associated with a downburst

event that occurred in central Oklahoma. During case

time, there was one severe wind and two severe hail

reports. Within the hour after case end time, an addi-

tional 16 reports of severe hail and wind events were

associated with the same storm.

All storm reports were obtained from Storm Data, which

is logged in the NWS Performance Management System

(https://verification.nws.noaa.gov/). Because the spatial

and temporal accuracy of Storm Data is limited (e.g., Witt

et al. 1998; Trapp et al. 2006), it was important to ensure

consistency between the location and timing of storm re-

ports with the radar data. Additionally, weather reports

obtained during the Severe Hazards Analysis and Verifi-

cation Experiment (SHAVE; Ortega et al. 2009) were

examined to validate confidence in the storms that did not

produce severe weather. Both SHAVE and Storm Data

were in agreement with storms classified as null events.

The occurrence of both severe and nonsevere storms

during the cases provided a realistic scenario whereby

participants were challenged to differentiate between

storms that would and would not produce severe weather.

c. Working the cases

Before working each case, participants viewed

a weather briefing video that was prepared by J. LaDue

TABLE 2. Descriptions of cases 1 and 2.

              | Case 1                      | Case 2
Time and date | 0134–0210 UTC 20 Apr 2012   | 2053–2139 UTC 16 Jul 2009
Event type    | Multicell, severe hail      | Multicell, severe hail and wind
Storm reports | 0209 UTC, 1-in. hail        | 2135 UTC, 1.75-in. hail; 2135 UTC, estimated 56-kt wind gust; and 2138 UTC, 1.75-in. hail
VCP           | 19 elevations, 0.51°–52.90° | 14 elevations, 0.51°–38.80°


of the Warning Decision Training Branch. This video

provided participants with an overview of the envi-

ronmental conditions associated with the case, along

with satellite and radar imagery leading up to the case

start time. The weather briefing gave all participants

the same information from which they could form ex-

pectations. Participants were then told that they had

just come on shift, that no warnings were in progress,

and that it was their job to determine whether a warn-

ing was required for the storms they would encounter.

All participants worked independently in separate

rooms. They were reminded that the data collected

from their participation would remain anonymous, and

participants were encouraged to work as they would in

their usual WFOs. In this study, participants are

referred to as P1–P6 for the control group and P7–P12

for the experimental group.

Cases were played in simulated real time using the

next-generation Advanced Weather Interactive Pro-

cessing System-2 (AWIPS-2). Given that during the

summer of 2013 participants were using AWIPS-1

within their WFOs, a short familiarization session with

the newer software prior to working events was pro-

vided to increase the participants’ comfort level using

AWIPS-2 as their forecasting tool. During this session, participants

were able to view base velocity, reflectivity, and spec-

trum width products from the PAR. During the case,

participants received verbal notifications of storm re-

ports that were timed according to the details provided

in Storm Data. All warning products (e.g., special

FIG. 2. The (left) 0.51° reflectivity and (right) velocity for (a),(b) case 1 at 0140 UTC 20 Apr 2012 and (c),(d) case 2

at 2111 UTC 16 Jul 2009. Times were chosen to illustrate the variety of storms that participants encountered during

the cases. The storms that were later associated with severe weather reports from Storm Data are identified by the

white circles.


weather statements, severe thunderstorm warnings, and

severe weather statements) were issued using the

Warning Generation (WARNGEN) software. When-

ever participants issued a product, they were asked to

indicate their level of confidence on a scale that ranged

from not sure (0%), to partially sure (50%), to sure

(100%; Fig. 3). Following the case, participants were

asked a set of probing questions that targeted the rea-

sons for each decision and the decision maker’s associ-

ated confidence level.

3. Forecaster performance

a. The compound warning decision process

Decisions are oftentimes not a one-step procedure.

Rather, decision makers can find themselves in a com-

pound decision environment that consists of multiple

decision elements. For example, search and rescue op-

erations require locating a target followed by identifying

that the correct target has been recovered (Duncan

2006), and medical diagnoses can involve first detecting

an abnormality, and then correctly localizing the ab-

normality for treatment (Obuchowski et al. 2000). Ob-

servations of participants during the 2013 PARISE

revealed that weather forecasters also encounter mul-

tiple problems when working toward a solution. In

particular, these problems are focused on warning de-

cisions and are recognized as detection, identification,

and reidentification, together forming the compound

warning decision process (Fig. 4). Detection relates to

the decision to warn; a forecaster perceives and com-

prehends information that leads to the belief that severe

weather will occur. The decision to issue a warning

prompts the forecaster to open the WARNGEN soft-

ware, at which point the forecaster progresses to the

identification stage. For instance, when issuing a severe

thunderstorm warning, the forecaster must identify the

expected weather threats (i.e., hail and/or wind) from

the storm in question. Once the warning is issued, the

forecaster continues to monitor the storm’s evolution

and updates the warning by issuing severe weather

statements. It is through these updates that the

forecaster reidentifies the weather threats; the threat

may be maintained, changed in magnitude, changed in

type, or canceled.

Distinguishing severe hail and wind events from one

another is a challenge that NWS forecasters regularly

encounter during warning operations. Currently,

though, the NWS only assesses forecaster performance

at the detection level. The compound warning decision

process, however, allows for a more comprehensive as-

sessment of warning decisions. A correct decision at the

detection level does not necessarily mean that the

forecaster has accurately comprehended information

regarding the storm’s potential. For example, while

working case 2, P3 (control participant) issued a severe

thunderstorm warning, identifying only wind as the

weather threat. Although at the detection level P3 made

a correct decision, P3 had missed the hail threat during

identification. The participant maintained this threat

expectation through the issuance of two warnings, only

realizing after the first hail report at 2135 UTC 16 July

2009 that hail was also a threat. At this point, P3 issued

a severe weather statement to reidentify both hail and

wind as weather threats, but unfortunately had not

communicated the hail threat until after the event had

occurred. This example demonstrates that to fully un-

derstand the quality of a forecast, a more intricate

analysis of warning decisions is required.

FIG. 3. Tool used to indicate confidence related to the issuance of a warning, severe weather

statement (SVS), or special weather statement (SPS) product.

FIG. 4. The compound warning decision process is composed of

three decision stages: detection, identification, and reidentification.


b. Verification

To measure forecaster performance at the three levels

of the compound warning decision process, forecaster

decisions were verified using the NWS severe criteria. In

operations, a severe thunderstorm warning is verified by

the occurrence of 50 knots (kt; 1 kt = 0.51 m s⁻¹) or

higher winds and/or hail of at least 1-in. diameter,

whereas a tornado warning is verified by reports of

a tornado within the spatiotemporal limits of the warn-

ing polygon (NOAA 2011). Storm reports associated

with the severe weather events in cases 1 and 2 were

treated as instantaneous events (Table 2). Additionally,

since participants worked only a portion of a severe

weather event, there were occasions where warnings

were verified only by severe hail and/or wind events

after the case had ended. In these instances, storm re-

ports recorded 1 h after case end time were used for

verification purposes. For detection, individual warnings

were verified by assessing whether the warning encom-

passed an event both spatially and temporally. Each

event that was not warned for was recorded as a miss.

For identification, weather threats were first considered

individually. For example, in case 1, only hail reports

were associated with the severe storm. If both hail and

wind were identified as threats in the warning, then hail

was a hit, and wind was a false alarm. The results from

each weather threat were then combined for overall

identification statistics. Reidentification was verified in

a similar manner to identification, but this time for the

updated warning information detailed in severe weather

statements.
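
To make the threat-by-threat bookkeeping concrete, the following minimal Python sketch (function and threat names are illustrative, not from the paper) scores a single warning at the identification level:

```python
# Illustrative sketch (names are hypothetical) of identification-level
# scoring for a single warning: each identified threat is a hit or a
# false alarm, and each observed threat left unidentified is a miss.
def verify_identification(identified, observed):
    hits = identified & observed          # threats correctly identified
    false_alarms = identified - observed  # threats warned for that did not occur
    misses = observed - identified        # threats that occurred but were not warned for
    return hits, false_alarms, misses

# Case 1 example from the text: only hail verified, so a warning that
# identified both hail and wind scores hail as a hit and wind as a
# false alarm.
print(verify_identification({"hail", "wind"}, {"hail"}))
```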

Performance measures were calculated for detection,

identification, and reidentification. The NWS commonly

assesses forecaster performance using the probability of

detection (POD) and the false alarm ratio (FAR). Whereas the

POD represents the proportion of events that occurred

and were successfully warned for, the FAR represents

the proportion of warnings issued that were false alarms.

The POD and FAR can be calculated as follows (Wilks

2006):

POD = a/(a + c) and (1)

FAR = b/(a + b), (2)

where a, b, and c are the numbers of hits, false alarms,

and misses, respectively. For cases 1 and 2, group mean

POD and FAR scores were calculated for detection,

identification, and reidentification. With the exception

of two instances in case 2, the experimental group ob-

tained superior mean POD and FAR scores compared

to the control group (Table 3). Participants’ individual

results are plotted in Figs. 5 and 6, which illustrate the

underlying distribution in performance that led to the

different group averages.
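
Eqs. (1) and (2) reduce to simple ratios of contingency-table counts; as a minimal sketch (hypothetical helper, not the authors' code), with an example taken from the case 1 detection results below:

```python
# Minimal sketch of Eqs. (1) and (2); a, b, and c are the hit, false
# alarm, and miss counts.
def pod_far(a, b, c):
    pod = a / (a + c)  # proportion of events successfully warned for
    far = b / (a + b)  # proportion of warnings that were false alarms
    return pod, far

# Example from the text: P10 issued three warnings in case 1, two of
# which verified at the detection level, giving FAR = 0.33.
print(pod_far(a=2, b=1, c=0))  # (1.0, 0.3333...)
```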

1) CASE 1

All but one participant (control) successfully detected

the severe hail event in case 1, which resulted in POD

scores of 0.83 and 1 for the control and experimental

groups, respectively (Fig. 5a). All control and five ex-

perimental participants also decided to issue warnings

on a storm that was not associated with severe weather.

Although at the detection level the experimental

group’s performance scores were more variable, the

overall performance of the experimental group resulted

in a lower FAR score (0.45) than the control group (0.58;

Fig. 5a). Five control and two experimental participants

obtained FAR scores of 0.5 by warning once on the se-

vere storm and once on a nonsevere storm. Two warn-

ings were also issued by P5, but neither verified (FAR = 1). Three warnings were issued by P7 and P12, of which

only one verified at the detection level (FAR = 0.67).

Three warnings were also issued by P10, but two of his

warnings verified (FAR = 0.33). The only experimental

participant who did not incorrectly detect severe

weather (FAR = 0) was P11.

Following detection, participants identified the

weather threat associated with the storm that was being

warned. All participants identified hail in each of their

warnings, which only verified for warnings that were

successful at the detection level. Therefore, the POD

scores for identification match those for detection

(Fig. 5b). Participants were also assigned FAR scores

because of incorrect identifications (Fig. 5b). Incorrect

identifications were a result of two reasons: 1) a weather

threat was identified for a warning that did not verify at

the detection level and 2) incorrect identification of

TABLE 3. Mean POD and FAR statistics for the control and experimental groups for detection, identification, and reidentification.

                                         | Case 1: Mean control | Case 1: Mean experimental | Case 2: Mean control | Case 2: Mean experimental
Detection: POD                           | 0.83 | 1.00 | 0.95 | 1.00
Detection: FAR                           | 0.58 | 0.45 | 0.33 | 0.25
Identification (overall threat): POD     | 0.83 | 1.00 | 0.88 | 0.88
Identification (overall threat): FAR     | 0.79 | 0.70 | 0.36 | 0.22
Reidentification (overall threat): POD   | 0.60 | 0.83 | 1.00 | 0.90
Reidentification (overall threat): FAR   | 0.87 | 0.69 | 0.23 | 0.19


a wind threat was made on the severe storm that was

associated with only severe hail events. Both groups’

FAR scores increased from detection to identification,

though the experimental group continued to achieve

FIG. 5. POD (black circle) and FAR (open circle) scores for

(a) detection, (b) identification, and (c) reidentification in case 1.

The vertical dashed line separates the (left) control and the (right)

experimental participants, and the horizontal dashed line marks

the 0.5 values for POD and FAR scores.

FIG. 6. As in Fig. 5, but for case 2.


a lower FAR score than the control group (Table 3). We

surmise that the increase in FAR scores from the de-

tection to identification level is due to the added chal-

lenge of having to discern between potential weather

threats.

Reidentifications of weather threats during case 1 were

made while updating a warning. No updates were issued

by P6, and therefore statistics were not calculated for

this participant. Hail and wind threats were only re-

identified on warnings that were false alarms at the de-

tection level by P4, P5, and P7. These participants

therefore received POD and FAR scores of 0 and 1,

respectively (Fig. 5c). The remaining participants re-

identified a hail threat on the correct storm at least once,

achieving POD scores of 1. The variable FAR scores at

the reidentification level resulted from 1) what storms

participants decided to update (i.e., the severe storm or

the nonsevere storm) and 2) whether participants were

able to correctly reidentify hail as the only threat.

Whereas the mean FAR score from identification to re-

identification remained nearly steady for the experimen-

tal group, the control group’s mean FAR score increased

(Table 3). Overall, the experimental group was more

successful at reidentifying the correct weather threat on

the correct (i.e., severe) storm than the control group.

2) CASE 2

During case 2, all participants except P5 successfully

detected the three severe weather events (Fig. 6a). This

participant missed one event, which resulted in the

control group achieving a slightly lower mean POD

score of 0.95 compared to the experimental group’s

mean POD score of 1 (Table 3). In comparison to POD

scores, FAR scores were more variable among partici-

pants (Fig. 6a). For participants obtaining FAR scores of

0, each warning that was issued encompassed the severe

storm and therefore was verified with respect to de-

tection. Participants obtaining FAR scores of 0.5 typi-

cally issued two warnings, of which only one was verified

by severe weather, while the other targeted a storm to

the north that was not associated with severe weather

reports. Three warnings were issued by both P4 and P6.

For P4, one of these warnings verified (FAR = 0.67),

whereas for P6, two warnings verified (FAR = 0.33).

Overall, the experimental group had fewer false alarms,

as demonstrated by their lower mean FAR score of 0.25

compared to 0.33 for the control group (Table 3).

Unlike in case 1, case 2 presented a storm that pro-

duced both severe hail and wind. Of these events, all

participants identified the wind event successfully, and

four participants in each group identified the hail events

(Fig. 6b). This similar performance between groups led

to matching mean POD scores at the identification level

of 0.88 (Table 3). The experimental group, though,

performed better than the control group regarding false

alarms. In case 2, false alarms at the identification level

occurred mostly within warnings that did not verify at

the detection level. While all control participants in-

correctly identified weather threats within these warn-

ings, three experimental participants achieved an FAR

score of 0. Additionally, two control participants in-

correctly identified a tornado threat. The resulting mean

FAR scores for identification were 0.36 for the control

group and 0.22 for the experimental group (Table 3).

When participants began to reidentify weather

threats, group POD scores increased and the FAR

scores decreased (Table 3). As the severe storm evolved

over time, participants realized that the southern storm

had more potential than the storm to the north, which

was beginning to dissipate. The wind threat associated

with the severe storm was correctly reidentified by all

participants, while all control and four experimental

participants also correctly reidentified the hail threat

(Fig. 6c). Some participants in both groups also in-

correctly reidentified weather threats. Whereas the ex-

perimental group’s FAR score decreased slightly from

identification to reidentification, the control group’s

FAR score decreased more substantially (Table 3).

However, the accuracy of the control group’s decisions

during reidentification improved to a level of accuracy

similar to that demonstrated by the experimental group

during the identification stage.

c. Lead time

The lead time was calculated as the time of the severe

hail or wind event minus the time of warning issuance.

For events that were unwarned, a lead time of 0 min was

assigned. On occasions where multiple warnings en-

compassed one event, the earliest issued warning was

used to calculate lead time. Lead time was calculated for

all 12 participants for one event in case 1, and three

events in case 2.
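
A minimal sketch of this lead time rule follows (illustrative code assuming datetime inputs; the warning time in the example is hypothetical):

```python
from datetime import datetime

def lead_time_min(event_time, warning_times):
    """Lead time (min) = event time minus the earliest warning issued
    at or before the event; unwarned events receive 0 min."""
    prior = [w for w in warning_times if w <= event_time]
    if not prior:
        return 0.0
    return (event_time - min(prior)).total_seconds() / 60.0

# The case 1 hail report at 0209 UTC 20 Apr 2012 against a purely
# hypothetical warning issued at 0147 UTC gives a 22-min lead time.
report = datetime(2012, 4, 20, 2, 9)
print(lead_time_min(report, [datetime(2012, 4, 20, 1, 47)]))  # 22.0
```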

Participants’ lead times during case 1 ranged from 0 to

30 min (Fig. 7a). The experimental group, however,

demonstrated a tendency toward longer lead times.

With the exception of P7, all experimental participants

achieved a lead time of at least 20 min, compared to just

half of the control participants. Group mean lead times

were 16.4 and 22.0 min for the control and experimental

groups, respectively (Table 4). For case 2, lead time was

calculated for three events that both spatially and tem-

porally occurred close to one another. Therefore, often

one warning verified the three events. Within the ex-

perimental group, four participants achieved a lead time

of at least 20 min for all three events, compared to just

one control participant (Fig. 7b). Group mean lead


times for case 2 were 16.4 and 21.8 min for the control

and experimental groups, respectively.

Combining the lead time results of both cases, we find

that the control group’s mean lead time was 16.4 min

compared to 21.9 min for the experimental group.

Therefore, the experimental group’s mean lead time

exceeded the mean lead time of the control group by

5.5 min. While this difference in mean lead time is sim-

ilar to the temporal resolution provided to the control

group, the variability among participants’ lead time re-

sults within the same group suggests that factors in

addition to temporal resolution may be important for

explaining participant performance. Additionally, the

Wilcoxon rank sum nonparametric test (Wilks 2006) was

used to assess the difference between the median lead

times of the control (17.3 min) and experimental

(21.5 min) groups. The test yielded a p value of 0.0252,

indicating that the difference in median lead times was

statistically significant at the 95% confidence

level. Although the results from this study cannot be

generalized because of the small sample size, the per-

formance of the experimental group is encouraging and

the lead time results are in favor of the use of higher-

temporal-resolution radar data.
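
For reference, the same test can be run with SciPy's rank-sum implementation; the lead time arrays below are hypothetical stand-ins, not the study data:

```python
# Wilcoxon rank sum test on two independent samples of lead times.
# These arrays are hypothetical stand-ins, not the study data.
from scipy.stats import ranksums

control = [0, 12, 15, 17, 18, 20]        # lead times (min), illustrative
experimental = [17, 20, 21, 22, 24, 30]  # lead times (min), illustrative

stat, p = ranksums(control, experimental)
print(f"rank-sum statistic = {stat:.3f}, p value = {p:.4f}")
```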

4. Decision types

a. Confidence-based assessment

The increased flux of information provided by PAR

raises the question of how rapidly updating radar data

will impact forecaster confidence, and what the result-

ing effects will be on the decisions that are made. To

investigate this question, the relationship between con-

fidence and correctness was assessed using a two-

dimensional testing method (Bruno 1993). Referred to

as the confidence-based assessment behavioral model,

a decision maker is required to indicate their confidence

associated with each decision on a scale ranging from

‘‘not sure’’ (0%) to ‘‘partially sure’’ (50%) to ‘‘sure’’

(100%; Fig. 3). In particular, confidence-based assess-

ment can identify three states of mind, confidence,

doubt, and ignorance (e.g., lacking knowledge), and can

help categorize decisions into four types: doubtful, un-

informed, misinformed, and mastery (Fig. 8; Bruno et al.

2006; Adams and Ewen 2009). According to Bruno et al.

(2006), doubtful decisions, although correct, lack confi-

dence and are made with hesitance. Decisions that are

both incorrect and made without confidence are un-

informed. Decisions that are incorrect yet made with

confidence are misinformed, and perhaps are the riskiest

types of decisions. The most desirable type of decision is

mastery, which arises from smart and informed choices

that are both confident and correct.
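
The four categories follow directly from crossing the confidence judgment with decision correctness, as the following minimal sketch makes explicit:

```python
# The four decision types of Bruno et al. (2006), obtained by crossing
# confidence with correctness (cf. Fig. 8).
def decision_type(confident: bool, correct: bool) -> str:
    if confident and correct:
        return "mastery"      # confident and correct
    if confident:
        return "misinformed"  # confident but incorrect
    if correct:
        return "doubtful"     # correct but hesitant
    return "uninformed"       # incorrect and unsure

print(decision_type(confident=True, correct=True))  # mastery
```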

FIG. 7. (a) Case 1 and (b) case 2 warning lead times for each

participant. The vertical dashed line separates the (left) control and

(right) experimental participants.

TABLE 4. Mean lead times for control and experimental groups for cases 1 and 2, along with the group differences in mean lead time.

                                       | Case 1: Mean lead time (min) | Case 2: Mean lead time (min)
Control                                | 16.4 | 16.4
Experimental                           | 22.0 | 21.8
Δ lead time (experimental − control)   | 5.6  | 5.4


b. Categorizing decisions

When participants made a key decision (i.e., decision

to issue or update a warning), a corresponding confi-

dence rating was assigned. Since there was variability in

their confidence baselines, results (which ranged from

26% to 100%) were normalized by linear trans-

formation onto a new scale ranging from 0 to 7. Ratings

of at least 5 were considered confident decisions, since

this value indicated that the decision was closer to sure

than partially sure (i.e., ≥75%). The key decisions made

during cases 1 and 2 were combined, yielding a total of

N = 53 and 54 key decisions for the control and exper-

imental groups, respectively. Decisions were classified

as correct if the decision to issue or maintain a warning

corresponded with the occurrence of severe weather.

Similarly, decisions to not issue or to cancel a warning

were correct for instances when severe weather did not

occur.
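
A sketch of this rescaling and thresholding follows; because the paper does not state the endpoints of the linear transformation, the use of each participant's own minimum and maximum raw ratings is an assumption made here for illustration:

```python
# Sketch of the linear rescaling onto the 0-7 scale. The paper does not
# state the transformation's endpoints; using each participant's own
# minimum and maximum raw ratings is an assumption made here.
def normalize(ratings):
    lo, hi = min(ratings), max(ratings)
    return [7.0 * (r - lo) / (hi - lo) for r in ratings]

def is_confident(scaled):
    # Ratings of at least 5 (i.e., >= 75% of the scale) were treated
    # as confident decisions.
    return scaled >= 5.0

scaled = normalize([26, 60, 100])         # raw ratings in percent
print([round(s, 2) for s in scaled])      # [0.0, 3.22, 7.0]
print([is_confident(s) for s in scaled])  # [False, False, True]
```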

Of these key decisions, a larger proportion of the

decisions made by the experimental group were classi-

fied as mastery (63%) compared to those of the control

group (51%). Individual participants in the experimen-

tal group made a higher number of mastery decisions

and a lower number of uninformed and misinformed

decisions compared to individual participants in the

control group (Fig. 9a). The majority of the key de-

cisions in both groups were categorized as misinformed

and mastery. This result is unsurprising since one may

expect decisions to be made more frequently when

a decision maker is confident rather than unsure. The

Wilcoxon rank sum nonparametric test (Wilks 2006) was

used to assess the difference in the median number of

decisions made by the control and experimental groups

for all four decision types. The p values yielded were

0.862, 0.673, 0.802, and 0.325 for the doubtful, un-

informed, misinformed, and mastery decision types, re-

spectively. Therefore, although statistical significance

was not established, these results indicate that of the

four decision types, the control and experimental groups

differed most with respect to mastery decisions.

c. Explanations for decision types

Following each case, participants were questioned on

the reasons for the confidence ratings that they had

provided. The qualitative data collected from this

questioning gives insight into why doubtful, uninformed,

misinformed, and mastery decisions were made. Al-

though reasoning provided by participants varied

somewhat, common topics were discussed by several

participants.

The control and experimental groups made five and

four decisions, respectively, that were correct but made

without confidence (i.e., doubtful; Fig. 9b). Of these

hesitant decisions, the majority were made during case

2, with just one doubtful decision being recorded during

case 1 for both groups. Three control participants ex-

plained that their hesitation was due to their warning

criteria not being fully satisfied. For example, in case 1,

P3 said that she was ‘‘flirting with the criteria’’ since the

storm appeared ‘‘more marginal,’’ and P4 went ahead

with issuing a tornado warning in case 2 despite being

‘‘not sure [that the] environment was conducive’’ for

tornadogenesis. Similarly, some experimental partici-

pants found themselves making warning decisions

without confidence. During case 2, P10 questioned the

severe potential of a storm on which he had decided to

warn. His doubt arose because despite seeing that the

storm had a ‘‘good’’ and ‘‘healthy’’ core, he was ‘‘just not

sure [whether] the environment’’ was supportive of se-

vere storms. For P12 and P8, though, conflict arose as

a result of earlier warnings not being verified. For ex-

ample, P12 explained that during case 1 she wanted ‘‘any

kind of determination on previous storms.’’ Addition-

ally, P8 lacked confidence in case 2 after observing

a ‘‘downward trend in reflectivity and velocity data’’

while also having ‘‘not received reports at that time.’’

The absence of reports on storms that were already

warned on resulted in P12 and P8 being hesitant in their

subsequent warning decisions.

Decisions categorized as uninformed were made on

eight occasions in the control group and five occasions in

the experimental group (Fig. 9b). Participants that did

not make incorrect decisions without confidence (i.e.,

uninformed) also did not make incorrect decisions with

FIG. 8. The four types of decisions based on the relationship

between confidence and correctness. [Adapted from Adams and

Ewen (2009).]


confidence (i.e., doubtful). These participants are iden-

tified as P5 and P6 of the control group, and P7, P9, and

P11 of the experimental group (Fig. 9b). Of the eight

uninformed decisions recorded in the control group,

three control participants explained that they did not

have sufficient data to make a confident and informed

decision. In particular, P1 described going ‘‘off [his] gut’’

when he decided to warn during case 1, P4 projected that

a storm in case 2 would ‘‘continue to grow’’ despite

‘‘[not] having a lot of information,’’ and P2 decided to

issue a tornado warning in case 2 because she thought

that if she had waited for more information, it would

have been ‘‘too late.’’ Control participants made in-

correct decisions without confidence for other reasons

also, including the warning decision being the ‘‘first one

of the day’’ (P3; case 2), feeling that a warning could not

be canceled despite it ‘‘[not] look[ing] severe anymore’’

(P4; case 1), and maintaining a tornado warning because

it was ‘‘approaching a major interstate’’ despite having

‘‘reservations about [the] tornado aspects’’ of the storm

(P4; case 2). Experimental participants’ reasoning for

their lack of confidence varied, but unlike control par-

ticipants, their reasons were not associated with the

amount of radar data they had available. Furthermore,

all uninformed decisions made by experimental partic-

ipants were made during case 1. For P12, not having

‘‘reports of ground truth’’ led to an incorrect decision

being made without confidence on two occasions. Both

FIG. 9. The distribution of doubtful (yellow), uninformed (orange), misinformed (red), and

mastery (green) decisions made by (a) the control and experimental groups and (b) individual

participants for both cases 1 and 2. The sample size of each decision type is given for both

groups.


P8 and P10 reported that their lack of confidence in case

1 was due to the storm of interest appearing weaker than

a storm that they had already warned on. It was ex-

plained by P10 that, ‘‘reflectivity-wise, it did not seem as

robust as the southern storm.’’ Similarly, P8 noted that

the storm was not ‘‘as strong as the southern storm.’’ A

second decision was made by P8 in case 1 without con-

fidence as a result of observing an apparent weakening

in a storm of interest, which was evident by the ‘‘lowering

hail core to less than 20 kft.’’

Misinformed decisions, which were incorrect but

made with confidence, made up the second largest

decision-type category for both the control and experi-

mental groups. Whereas all control participants made at

least one incorrect decision with confidence, only four

experimental participants did so (Fig. 9b). No key de-

cisions made by P10 or P12 were categorized as mis-

informed. Across the two cases, the experimental

group’s misinformed decisions were distributed evenly,

whereas the control group’s occurred predominantly

during case 1. Most incorrect yet confident decisions

made by the control (N = 11 of 13) and experimental

(N = 10 of 11) groups were made with the belief that

severe weather was a threat. Typically within the forecast

office, warning criteria are established based on experi-

ence and climatology. Many participants applied their

usual warning criteria to the storms they encountered

during these cases. For example, in case 1, P7 reported

seeing ‘‘60 dBZ above 20 kft,’’ which she explained ‘‘fit

[her] conceptual model for [severe] hail.’’ This warning

criterion was common among participants, because, as P9

explained during case 2, ‘‘hail is very predictable when

the core is that high.’’ However, given that warning cri-

teria are established with respect to a certain location,

participants’ warning criteria may not have been as suited

to the environment in Oklahoma, ultimately leading to

participants making incorrect decisions with confidence.

Misinformed decisions were also recorded twice in the

control group and once in the experimental group for

participants who had decided to trim thewarning polygon

since the storm had moved ‘‘out of the county’’ (P1).

Although confidence was associated with the decision to

cancel a threat in some location, these three participants

chose to incorrectly maintain the severe threat elsewhere

in the warning polygon, resulting in false alarms at the

reidentification level.

More than half of the decisions made by both groups

fell into the mastery decision category. In total, the

control and experimental groups made 27 and 34 con-

fident and correct decisions, respectively (Fig. 9b). At

least three key decisions made by each participant were

categorized as mastery. A maximum of eight key de-

cisions were categorized as mastery for one participant

in each group (P1 and P11; Fig. 9b). Mastery decisions

were common in both cases, with approximately 40%

occurring during case 1 and 60% during case 2. Expla-

nations for confidence that was associated with correct

decisions revolved around two reasons. The first reason

was that participants compared storm characteristics on

radar. For example, in case 1, P4 noted that the severe

storm had a ‘‘much larger and deeper high-reflectivity

core’’ than other storms, and P8 described the severe

storm as being the ‘‘most intense’’ on radar. Similar to

these observations, in case 2, P6 explained that the se-

vere storm was ‘‘more impressive’’ than the storm to the

north that he had already warned on. Making compar-

ative observations of storms provided participants with

confidence in their warning decisions. This type of rea-

soning was provided on 12 occasions by the control

group, but only 4 occasions by the experimental group.

The second reason for mastery decisions was based on

perceived severe radar signatures of specific storms.

Participants observed features and trends of individual

storms that justified their warning decisions. The ex-

perimental group made confident and correct decisions

using this reasoning on 30 occasions compared to 15

occasions by the control group. One possible reason that the

experimental group provided this reasoning twice as

often as the control group is that the use of rapidly

updating radar data aided experimental participants in

obtaining more-detailed observations of storms. For

example, in case 2, P7 saw that the severe storm was

‘‘increasing in intensity aloft,’’ leading to concern that

there was ‘‘precipitation loading producing high winds

near the ground.’’ Another example of specific storm

interrogation was when P8 observed that the hail core

was ‘‘[continuing] to grow on upper-level reflectivity,’’

while the ‘‘midlevel convergence signature [was] getting

stronger and stronger.’’ Examples such as these dem-

onstrate the experimental group’s ability to track in-

dividual characteristics of storms, which was sufficient

for developing an understanding of the storm dynamics

and correctly projecting the occurrence of severe

weather.

5. Discussion and summary

The purpose of the 2013 PARISE was to extend the

work of earlier experiments (Heinselman et al. 2012;

Heinselman andLaDue 2013) to investigate whether the

use of higher-temporal-resolution radar data during se-

vere hail and wind events would be beneficial to the

warning decision process of NWS forecasters. The ex-

periment design allowed for a comparison between

control and experimental participants that utilized PAR

data with temporal updates of 5 and 1min, respectively.


While working two severe hail and/or wind case studies

in simulated real time, all participants exhibited a de-

cision process that was formed of multiple components.

Observing participants detecting, identifying, and re-

identifying severe weather led to the designation of the

compound warning decision process. This process in-

troduced a new verification approach, where the accu-

racy of warning decisions was considered with respect to

the detection, identification, and reidentification of se-

vere weather. This verification approach was important

for fully understanding and comparing each group’s

performance, since warning decisions and perceived

severe weather threats changed as storms evolved with

time. Given that the elements that compose the com-

pound warning decision process are a part of real-time

warning operations, we suggest that evaluating fore-

caster warning decisions beyond the detection level may

provide a more thorough assessment of forecaster per-

formance for the duration of a severe weather event,

rather than for the initial warning decision only.

The POD and FAR statistics were calculated for all

three stages of the compound warning decision process

(Table 3). Overall, the experimental group made more

accurate warning decisions than did the control group.

Additionally, the experimental group also made more

timely warnings (Table 4). More timely warnings were

demonstrated through the significantly higher (p value = 0.0252) median lead time obtained by the experimen-

tal group (21.5 min) compared to the control group

(17.3 min). The finding that the experimental group

made more accurate and timely warning decisions was

not necessarily expected, since earlier studies have

shown that the skill of operational meteorologists did

not increase notably with increased information

(Stewart et al. 1992; Heideman et al. 1993). Research has

also shown that increasing the amount of information

a decision maker receives may increase confidence and

satisfaction, yet decrease actual performance (O’Reilly

1980). This effect was not observed during the 2013

PARISE. Rather, the experimental group, aided by 1-min

radar updates, outperformed the control group.
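
For reference, the sketch below shows the standard POD and FAR definitions (e.g., Wilks 2006) together with a Wilcoxon rank-sum (Mann-Whitney) comparison of lead times. The text does not state which test produced the p value of 0.0252, so that choice, and the placeholder lead-time values, are assumptions for illustration only.

    from scipy.stats import mannwhitneyu

    def pod(hits, misses):
        # Probability of detection: fraction of severe events warned on.
        return hits / (hits + misses)

    def far(hits, false_alarms):
        # False alarm ratio: fraction of warnings that did not verify.
        return false_alarms / (hits + false_alarms)

    # Placeholder lead times (min); NOT the experiment's data.
    experimental = [23.0, 21.5, 20.0, 25.5, 18.0, 22.0]
    control = [16.0, 17.3, 15.0, 19.5, 18.0, 14.0]
    stat, p = mannwhitneyu(experimental, control, alternative="greater")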

The findings from each experiment support the use of

higher-temporal-resolution radar data during warning

operations, with increased lead time a consistent result

across all three experiments (Heinselman et al.

2012; Heinselman and LaDue 2013). However, a limi-

tation of the 2013 PARISE, along with the 2010

and 2012 PARISE, is the sample size. Given that a to-

tal of 12 participants were recruited for each experi-

ment, and that each PARISE focused on a particular

weather threat, the generalizability of the results to the

wider forecasting community is questionable. Future

experiments should include a wider variety of cases that

together are more representative of what forecasters

encounter in the real world.

Participants’ decisions were also assessed with respect

to both confidence and correctness. Rather than simply

identifying decisions as right or wrong, the goal of this

confidence-based assessment was to categorize de-

cisions into four different types, namely, doubtful, un-

informed, misinformed, and mastery (Bruno et al. 2006;

Adams and Ewen 2009). Both groups made decisions

that fell into each category. However, while the control

group made slightly more doubtful, uninformed, and

misinformed decisions than the experimental group, the

experimental group made more mastery decisions than

the control group. Qualitative reasoning for each con-

fidence rating was important for understanding the

factors that led to each decision type. The reasons

leading to uninformed and misinformed decisions

highlight some of the limitations associated with work-

ing in a simulated environment. The absence of radar

data prior to the case start time led control participants

to make incorrect decisions without confidence

(uninformed decisions), while an unfamiliar geographic

location, and the consequent misapplication of local

warning criteria, led both groups to make incorrect

decisions with confidence (misinformed decisions).

Avoiding limitations such as these could be

accomplished by experimenting with the use of PAR

data during real-time operations in the local forecast

office. Mastery decisions resulted from participants ei-

ther making a comparison between storms or observing

and tracking individual storm characteristics. While

both reasons explained the confident and correct de-

cisions made by the control group, the mastery decisions

in the experimental group were predominantly ex-

plained by the latter reason. As discussed previously,

LaDue et al. (2010) reported that forecasters expressed

a need for faster radar updates in order to observe

rapidly evolving weather. The qualitative reasoning

provided for mastery decisions suggests that the exper-

imental group’s ability to observe storm evolution on

a finer temporal scale was enhanced through the use of

1-min radar updates.
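
The four decision types reduce to a simple confidence-by-correctness grid. A minimal Python sketch of that mapping (the function name is illustrative) is:

    def decision_type(confident: bool, correct: bool) -> str:
        # Categories from the confidence-based assessment
        # (Bruno et al. 2006; Adams and Ewen 2009).
        if confident and correct:
            return "mastery"
        if confident and not correct:
            return "misinformed"
        if not confident and not correct:
            return "uninformed"
        return "doubtful"  # correct, but made without confidence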

Acknowledgments. Thank you to the 12 NWS fore-

casters for participating in this study, to the participating

WFOs’ MICs for supporting recruitment, and to

Michael Scotten for participating in the pilot experiment.

We also thank A/V specialist James Murnan, software

expert Eddie Forren, and GIS expert Ami Arthur. Ad-

vice from committee members Robert Palmer and David

Parsons, along with insightful discussion with Harold

Brooks and Lans Rothfusz, aided the development of

this study. We are grateful to Kurt Hondl, Michael


Scotten, and the two anonymous reviewers for providing

comments on this paper. Funding was provided by

NOAA/Office of Oceanic and Atmospheric Research

under NOAA–University of Oklahoma Cooperative

Agreement NA11OAR4320072, U.S. Department of

Commerce.

REFERENCES

Adams, T. M., and G. W. Ewen, 2009: The importance of confi-

dence in improving educational outcomes. 25th Annual Conf.

on Distance Teaching and Learning, Madison, WI, University

of Wisconsin–Madison, 1–5.

Andra, D. L., E. M. Quoetone, and W. F. Bunting, 2002: Warning

decision making: The relative roles of conceptual models, tech-

nology, strategy, and forecaster expertise on 3 May 1999. Wea.

Forecasting, 17, 559–566, doi:10.1175/1520-0434(2002)017<0559:

WDMTRR>2.0.CO;2.

Atkins, N. T., and R. M. Wakimoto, 1991: Wet microburst activity

over the southeastern United States: Implications for

forecasting. Wea. Forecasting, 6, 470–482, doi:10.1175/

1520-0434(1991)006<0470:WMAOTS>2.0.CO;2.

Bruno, J. E., 1993: Using testing to provide feedback to support

instruction: A reexamination of the role of assessment in ed-

ucational organizations. Item Banking: Interactive Testing and

Self-Assessments, D. A. Leclercq and J. E. Bruno, Eds.,

Springer-Verlag, 190–209.

——, C. J. Smith, P. G. Engstrom, T. M. Adams, K. D. Warr, M. J.

Cushman, B. D. Webster, and F. M. Bollin, 2006: Method and

system for knowledge assessment using confidence-based

measurement. U.S. Patent 2006/0029920 A1, filed 23 July

2005, issued 9 February 2006.

Campbell, S. D., and M. A. Isaminger, 1990: A prototype micro-

burst prediction product for the Terminal Doppler Weather

Radar. Preprints, 16th Conf. on Severe Local Storms, Kananaskis

Park, AB, Canada, Amer. Meteor. Soc., 393–396.

Crum, T., S. D. Smith, J. N. Chrisman, R. E. Saffle, R. W. Hall, and

R. J. Vogt, 2013: WSR-88D radar projects–Update 2013. Proc.

29th Conf. on Environmental Information Processing Tech-

nologies, Austin, TX, Amer. Meteor. Soc., 8.1. [Available

online at https://ams.confex.com/ams/93Annual/webprogram/

Paper221461.html.]

Duncan, M., 2006: A signal detection model of compound decision

tasks. Defense Research and Development Canada Tech.

Rep. TR2006–256, 56 pp.

Forsyth, D. E., and Coauthors, 2005: The National Weather Radar

Testbed (phased array). Preprints, 32nd Conf. on Radar Me-

teorology, Albuquerque, NM, Amer. Meteor. Soc., 12R.3.

[Available online at https://ams.confex.com/ams/pdfpapers/

96377.pdf.]

Fujita, T. T., and R. Wakimoto, 1983: Microbursts in JAWS de-

picted by Doppler radars, PAM and aerial photographs. Pre-

prints, 21st Conf. on Radar Meteorology, Edmonton, AB,

Canada, Amer. Meteor. Soc., 19–23.

Heideman, K. F., T. R. Stewart, W. R. Moninger, and P. Reagan-

Cirincione, 1993: The Weather Information and Skill Exper-

iment (WISE): The effect of varying levels of information on

forecast skill. Wea. Forecasting, 8, 25–36, doi:10.1175/1520-

0434(1993)008<0025:TWIASE>2.0.CO;2.

Heinselman, P. L., and S. M. Torres, 2011: High-temporal-

resolution capabilities of the National Weather Radar Testbed

Phased-Array Radar. J. Appl. Meteor. Climatol., 50, 579–593,

doi:10.1175/2010JAMC2588.1.

——, and D. S. LaDue, 2013: Supercell storm evolution observed

by forecasters using PAR data. Proc. 36th Conf. on Radar

Meteorology, Breckenridge, CO, Amer. Meteor. Soc., 3B.4.

[Available online at https://ams.confex.com/ams/36Radar/

webprogram/Paper228747.html.]

——, D. L. Priegnitz, K. L. Manross, T. M. Smith, and R. W.

Adams, 2008: Rapid sampling of severe storms by the National

Weather Radar Testbed Phased Array Radar. Wea. Fore-

casting, 23, 808–824, doi:10.1175/2008WAF2007071.1.

——, D. S. LaDue, and H. Lazrus, 2012: Exploring impacts of

rapid-scan radar data on NWS warning decisions. Wea. Fore-

casting, 27, 1031–1044, doi:10.1175/WAF-D-11-00145.1.

Jacoby, J., T. Troutman, A. Kuss, and D. Mazursky, 1986: Expe-

rience and expertise in complex decision making. Adv. Con-

sum. Res., 13, 469–472.

Kelly, D. L., J. T. Schaefer, and C. A. Doswell III, 1985: Clima-

tology of nontornadic severe thunderstorm events in the

United States. Mon. Wea. Rev., 113, 1997–2014, doi:10.1175/

1520-0493(1985)113<1997:CONSTE>2.0.CO;2.

LaDue, D. S., P. L. Heinselman, and J. F. Newman, 2010: Strengths

and limitations of current radar systems for two stakeholder

groups in the southern plains. Bull. Amer. Meteor. Soc., 91,

899–910, doi:10.1175/2009BAMS2830.1.

McLachlan, G. J., 1999: Mahalanobis distance. Resonance, 4, 20–26, doi:10.1007/BF02834632.

NOAA, 2011: Verification. NWS Rep. NWSI 10-1601, 100 pp.

[Available online at http://www.nws.noaa.gov/directives/sym/

pd01016001curr.pdf.]

Obuchowski, N. A., M. L. Lieber, and K. A. Powell, 2000: Data

analysis for detection and localization of multiple abnormali-

ties with application to mammography. Acad. Radiol., 7, 516–

525, doi:10.1016/S1076-6332(00)80324-4.

O’Reilly, C. A., 1980: Individuals and information overload in or-

ganizations: Is more necessarily better? Acad. Manage. J., 23,

684–696, doi:10.2307/255556.

Ortega, K. L., T. M. Smith, K. L. Manross, K. A. Scharfenberg,

A. Witt, A. G. Kolodziej, and J. J. Gourley, 2009: The Severe

Hazards Analysis and Verification Experiment. Bull. Amer.

Meteor. Soc., 90, 1519–1530, doi:10.1175/2009BAMS2815.1.

Roberts, R. D., and J. W. Wilson, 1989: A proposed microburst

nowcasting procedure using single-Doppler radar. J. Appl.

Meteor., 28, 285–303, doi:10.1175/1520-0450(1989)028<0285:

APMNPU>2.0.CO;2.

Saffle, R. E., M. J. Istok, and G. Cate, 2009: NEXRAD product

improvement—Update 2009. 25th Conf. on Interactive In-

formation and Processing Systems (IIPS) for Meteorology,

Oceanography, and Hydrology, Phoenix, AZ, Amer. Meteor.

Soc., 10B.1. [Available online at https://ams.confex.com/ams/

pdfpapers/147971.pdf.]

Stewart, T. R., K. F. Heideman, W. R. Moninger, and P. Reagan-

Cirincione, 1992: Effects of improved information on

the components of skill in weather forecasting. Organ.

Behav. Hum. Decis. Processes, 53, 107–134, doi:10.1016/

0749-5978(92)90058-F.

Torres, S. M., and Coauthors, 2012: ADAPTS Implementation:

Can we exploit phased-array radar’s electronic beam steering

capabilities to reduce update time? Extended Abstract, 28th

Conf. on Interactive Information and Processing Systems (IIPS)

for Meteorology, Oceanography, and Hydrology, New Orleans,

LA, Amer. Meteor. Soc., 6B.3. [Available online at https://ams.

confex.com/ams/92Annual/webprogram/Paper196416.html.]


Trapp, R. J., D. M. Wheatley, N. T. Atkins, R. W. Przybylinski,

and R. Wolf, 2006: Buyer beware: Some words of caution on

the use of severe wind reports in postevent assessment and

research. Wea. Forecasting, 21, 408–415, doi:10.1175/

WAF925.1.

Whiton, R. C., P. L. Smith, S. G. Bigler, K. E. Wilk, and A. C.

Harbuck, 1998: History of operational use of weather radar by

U.S. weather services. Part II: Development of operational

Doppler weather radars. Wea. Forecasting, 13, 244–252,

doi:10.1175/1520-0434(1998)013<0244:HOOUOW>2.0.CO;2.

Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences.

2nd ed. Academic Press, 467 pp.

Witt, A., M. D. Eilts, G. J. Stumpf, E. D. Mitchell, J. T. Johnson,

and K. W. Thomas, 1998: Evaluating the performance of

WSR-88D severe storm detection algorithms. Wea. Fore-

casting, 13, 513–518, doi:10.1175/1520-0434(1998)013<0513:

ETPOWS>2.0.CO;2.

Zrnić, D. S., and Coauthors, 2007: Agile beam phased array radar

for weather observations. Bull. Amer. Meteor. Soc., 88, 1753–

1766, doi:10.1175/BAMS-88-11-1753.
