real-time performance improvement and...
TRANSCRIPT
BSA improvement plan separate out time-critical and non-time-critical parts, and queue the non-time-critical
processing into a lower priority thread
implement thread level code instead of the record based processing to avoid overhead
and performance dragging of record processing.
Lessons from Prototype test prototype shows improvement
epics record global locking and lockset issue
need more test with the EPICS snapshot version for multi-core and parallel callbacks
K. H. Kim#, S. Allison, T. Straumann, E. Williams SLAC National Accelerator Laboratory, Menlo Park, CA 94025, U. S. A.
Abstract
Real-Time Performance Improvement and
Consideration of Parallel Processing for
Beam Synchronous Acquisition (BSA)*
Beam Synchronous Acquisition (BSA)
Definition / Requirements
BSA Improvement Plan
Summary and Discussion
*Work supported by the the U.S. Department of Energy, Office of Science
under Contract DE-AC02-76SF00515 for LCLS I and LCLS II. #[email protected]
Acquire all beam-dependent scalars across multiple IOCs on the same pulse
over multiple pulses of a certain kind up to 120Hz
Acquire up to 2800 values per scalar in one acquisition
Each value of the 2800 values can be an average/RMS of up to 1000 values
Configure and process 20 independent acquisitions simultaneously
Each acquisition request can specify: beam code (defines project)
machine conditions of interest – rate, timeslot, permit, and etc
BSA record processing
Improvement Required
BSA failures in operation
Beam Synchronous Acquisition (BSA) provides a common infrastructure
for aligning data to each individual beam pulse, as required by the Linac
Coherent Light Source (LCLS). BSA allows 20 independent acquisitions
simultaneously for the entire LCLS facility and is used extensively for beam
physics, machine diagnostics and operation. BSA is designed as a part of LCLS
timing system and is currently an EPICS record based implementation, allowing
timing receiver EPICS applications to easily add BSA functionality to their own
record processing. However the lack of real-time performance of EPICS record
processing and the increasing number of BSA devices has brought real-time
performance issues. We think the major reason for the performance problem is
due to the lack of separation between time-critical BSA upstream processing and
non-critical downstream processing. We are improving BSA with thread level
programming, breaking the global lock in each BSA device, adding a queue
between upstream and downstream processing, and moving out the non-critical
downstream to a lower priority worker thread. We are also investigating the use
of multiple worker threads for parallel processing in Symmetric Multi-Processor
(SMP) system.
BSA operation
Acquisition EDEF setup and request activation, done on the EVG IOC
360Hz checking on the EVG IOC with user notification when finished
360Hz requests (acquisition control) sent by the EVG IOC to all EVR IOCs
via timing fiber optical link
Data checking, averaging, and array update per scalar record per request on
the EVR IOCs
Figure 1: Data Acquisition across IOCs
Figure 2: BSA EDEF reservation, mask setup, and control screens
360Hz Task in EVG IOC
The 360Hz event task wakes up on interrupt which indicates timeslot change
on the AC powerline, and performs the following steps: Generates 196 bits timing pattern using pre-programmed scenario
Moves the timing pattern pipeline
Matches pattern for event codes and program the EVG sequence ram for next timeslot
Matches pattern for BSA
BSA Pattern Matching Checks for a match with the new pulse’s pattern, beam code and each EDEFs
Counts the number of measurements and number of values for the current average
per EDEF
The timing pattern is pipelined –the new pattern is actually for 3 pulses ahead
Figure 3: 360Hz tasks in EVG and EVR
360Hz Task in EVR IOC
EDEF bit masks included in 360Hz data sent by EVG to EVR
EVR IOC caches the data on data interrupt
EVR IOC 360Hz event task is activated on the next fiducial interrupt Copies pattern data to the end of pipeline
Moves the pipeline
Updates EDEF tables after the pipeline update preparing for BSA processing
done later in the same pulse
Figure 4: 360Hz tasks and EDEF table update
Data source PV triggers BSA record processing, after the EDEF table
updating by the 360Hz processing
The BSA record processing performs following: Matches timestamp to decide if the data is used in the BSA buffer
Calculates RMS and Average, if required
Checks Severity
Process BSA buffer (compress record)
Updates BSA management related information
Figure 5: BSA record processing
Figure 6: BSA fail scenario
LCLS-I has total ~1,150 BSA PVs ~982 PVs succeed for BSA (85.4%)
but, ~168 PVs fail (14.6%)
The reason for failure varies and depends on the situations misconfigured data source PVs
wrong trigger rates
data is ready too late due to slow device and communication, and does not meet
the time budget for BSA
Figure 6: Typical BSA fail scenario with unexpected delay,
resulting in wrong timestamp in record
Improvement Plan
Separate BSA processing into two parts: time-critical and non-time-critical
Provide C/C++ APIs for data source
device/driver code directly calls the API to put data into BSA
keep data receptor (record interface) for backward compatibility
The time-critical part (BSA upstream processing) is done by device/driver
context in the API method or it is done by high-priority callback thread in
the record interface method.
The BSA upstream processing quickly do the timestamp checks and if the
data matches, queues up the information for the lower priority agent thread
The lower priority agent thread processes the non-time-critical part (BSA
downstream processing) whenever it gets CPU time.
BSA fail scenario
BSA is based on epics record processing and is driven by the data source PV
Both time-critical and non-time-critical parts are processed by a high priority
callback in epics IOC: time-critical part: gather data from data source
and compare timestamp with EDEF table
non-time-critical part: calculate RMS, average, maintain counters,
and post out BSA buffers to epics PVs
The high priority callback is a single thread in epics IOC, the non-time-
critical part and other record processing, adds a delay on the time-critical part
The delay could make BSA fail due to timestamp mismatching.
The overloaded high priority callback thread could also delay the data source
PV processing and result it a late timestamp for the data.
Consideration for multi-core platform and parallel callback
for new EPICS base
Figure 7: Processing timeline for BSA improvement (for single core system)
Figure 8: Consideration for parallel callbacks for multi-core system
The improvement allows to make multiple agent threads which waits a
same queue to get BSA request from the upstream BSA processing. It
allows parallel processing for downstream part in the multi-core system
The improvement also allow to use parallel callbacks in new epics base
release. It allows parallel processing for the BSA upstreaming processing
Prototyping and Preliminary Test Results
Quick prototype code record processing based code
separates the upstream processing and the down stream processing
high priority callback processes the upstream part and queues downstream processing to
the lower priority agent thread.
Test for single-core system MVME6100/RTEMS/epics-R3.15.1
stress test under 100% CPU load (lowest priority dummy thread runs an infinite loop)
find spending ~ 5 usec for time-critical part and ~ 10 usec for non-time-critical
for each BSA cycle
capable for 64 BSA PVs @ 120Hz for 8 EDEFs (~61k BSA acquisition per sec)
improve almost x2 capacity compare to the original
Test for multi-core system x86/LinuxRT/epics-R3.15.1
stress test under 100% CPU load by lower priority scan threads run dummy
record processing
apply parallel callbacks and multiple agent threads
BSA capacity saturated around x2 compare to the original even having a large number
of cores and parallel callback threads
find there are record global locking and lock set issue to improve epics base itself
request EPICS core development team to solve the global locking and lock set issues.
Ongoing BSA test with the EPICS base snapshot version