Student Paper Competition of ASCE EMI SHMC Committee 1
Vision-based modal analysis using multiple video segments
Vedhus Hoskere¹, JongWoong Park², Hyungchul Yoon³, and Billie F. Spencer Jr.⁴
¹Department of Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL
²School of Civil and Environmental Engineering, Urban Design and Studies, Chung-Ang University, Seoul, Republic of Korea
³Department of Civil and Environmental Engineering, Michigan Technological University, Houghton, MI
⁴Department of Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL
ABSTRACT: Computer vision techniques for extracting structural displacements from video are
increasingly gaining acceptance for system identification and structural health monitoring. However, the
application of video-based techniques to modal analysis of full-scale civil infrastructure has been limited,
because the extracted displacements are useful only when the pixel displacement is sufficiently large, i.e.,
above the camera noise floor. Hence, measuring all points on a large structure within a single video frame
may not be feasible. In this study, a new method is presented to facilitate the automated extraction of mode
shapes of full-scale civil infrastructure from video segments obtained by panning a camera across the
structure. A preliminary evaluation of the method is presented using a 6-story shear building model excited
on a shake table. Additionally, the potential of using video from a camera-enabled unmanned aerial vehicle
(UAV) is demonstrated.
INTRODUCTION
Much of the critical infrastructure that serves our society today was erected several decades ago and is now
operating well past its design life; this is true for bridges, dams, highways, lifeline systems, and even
buildings. The economic implications of repair call for a systematic understanding of the current state of
infrastructure, and this need has driven research over the last two decades into technologies that enable
structural health monitoring (SHM). System identification and modal analysis are powerful tools for SHM
because they provide valuable information about the dynamic properties of structural systems.
Traditionally, system identification has been accomplished using wired or wireless accelerometers or strain
gages. Vision-based techniques offer the advantage of non-contact measurement over these traditional
approaches.
Recently, computer vision techniques for extracting structural displacements of civil infrastructure from
video have gained increased acceptance. Combined with the proliferation of low-cost cameras and increased
computational capabilities, video-based methods have become convenient approaches for displacement
measurement in structures. Several algorithms are available for displacement extraction; in principle, they
work by template matching or by tracking contours of constant phase or intensity through time. Early
applications of these algorithms focused on natural frequency estimation (Nogueira et al. 2005) and
displacement measurement (Chang and Xiao 2010; Fukuda et al. 2013). In the last few years, their
application has been extended to system identification of laboratory structures. Schumacher and Shariati
(2013) introduced the concept of a virtual visual sensor that could be used for modal analysis of a
structure. Yoon et al. (2016) implemented a Kanade-Lucas-Tomasi (KLT) tracker to identify a laboratory-scale
six-story building model. Cha et al. (2017) used a phase-based approach together with unscented Kalman
filters for system identification using noisy displacement measurements.
The computer vision techniques for system identification presented thus far have been demonstrated on
laboratory structures, but they are not readily scalable to full-scale civil infrastructure, for several
reasons. First, because accurate displacement measurement requires the pixel amplitude of the displacements
to be large enough to be measured, measuring displacements with the entire structure in a single video frame
will likely not be possible. Second, because civil engineering structures are typically large, recording
vibrations from one end of the structure can cause perspective distortion. Third, most computer vision
techniques for system identification require user input to specify the regions of interest from which to
extract the displacements. This manual process tends to reduce repeatability and introduce human error when
conducting system identification.
This paper addresses these three issues by combining multiple video segments of the vibrating structure.
The first issue is resolved by panning the camera (Figure 1) in a divide-and-conquer strategy that captures
the motion of a specified area of interest at each stage, allowing for larger pixel displacements and a
higher signal-to-noise ratio. The second issue is addressed by removing the effect of perspective
distortion to obtain the true displacements of the structure. In addition, the use of a UAV is presented as
an option to get close to the regions of interest, thereby circumventing the perspective problem
altogether. Finally, a method to automate the selection of regions of interest is presented to maintain
good repeatability.
PROPOSED PIPELINE
An automated pipeline is presented that relies on passive fiducial markers installed on the structure. A
fiducial marker system is composed of a set of predesigned markers and an algorithm that detects them. Each
marker encodes a natural number, so markers can carry information such as the location at which they are
placed. This capability is particularly useful when using video segments for modal analysis of civil
infrastructure, because civil infrastructure is usually highly regular, which makes it difficult to
determine which part of the structure the camera is focused on in any given frame. These markers were
originally designed for augmented reality applications to provide highly reliable camera pose estimation
(Garrido-Jurado et al. 2014). Here, the marker system was incorporated into the test structure to
demonstrate its applicability to SHM. For tracking applications where automated target recognition is
desired, markers provide a highly robust and reliable solution. Because the size and shape of each fiducial
marker are known precisely, we also use the markers to remove perspective distortion and compute an
accurate scale factor for every frame, and we show that the entire process of system identification from
video can be automated.
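The sketch below illustrates how a marker of known size can provide both scale and perspective correction. It is a swapped-in technique, not the authors' procedure (which, as described in the pipeline paragraph, projects the marker bounding box onto the image axes): it fits a planar homography by the direct linear transform (DLT) from the four detected marker corners in a reference frame to a metric square of known side length, then maps later corner positions into that plane. The function names are hypothetical, and the sketch assumes a stationary camera, consistent corner ordering (e.g., top-left, top-right, bottom-right, bottom-left), and marker motion confined to the marker plane.

```python
import numpy as np


def homography_dlt(src, dst):
    """Direct linear transform: 3x3 H (up to scale) with dst ~ H @ [x, y, 1]."""
    rows = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # H is the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)


def apply_homography(H, pts):
    """Map N x 2 points through H and return inhomogeneous N x 2 coordinates."""
    pts = np.asarray(pts, float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def marker_displacement_mm(ref_corners_px, cur_corners_px, side_mm):
    """In-plane displacement (mm) of a square marker between two frames.

    Assumes a stationary camera, corners ordered consistently in both frames,
    and marker motion that stays within the marker's own plane.
    """
    square = side_mm * np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
    # Image -> metric marker plane, estimated once from the reference frame.
    H = homography_dlt(ref_corners_px, square)
    # Average the rectified corners to get the marker centre in millimetres.
    ref_center = apply_homography(H, ref_corners_px).mean(axis=0)
    cur_center = apply_homography(H, cur_corners_px).mean(axis=0)
    return cur_center - ref_center
```

In practice, OpenCV's `cv2.findHomography` performs the same fit and adds optional robust estimation; coordinate normalization also improves the conditioning of the DLT when the corner positions are noisy.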
Figure 1. Schematic of a UAV capturing video segments of a structure part by part.
Figure 2. Proposed pipeline for automated mode shape extraction using video segments. Stages: input video and
camera calibration; automated displacement extraction (fiducial marker detection; if new markers are found,
detect features, track features with KLT, remove perspective, and compute scale; otherwise, increment the
frame); local mode shapes (cross-correlations and the Eigensystem Realization Algorithm); and global mode
shapes (union of local modes using an OLS fit):
$$\phi_{\Omega}^{m} = \bigcup_{i=1}^{n} R_i \, \phi_{\Omega_i}^{m}$$
The pipeline is summarized in Figure 2. Each frame of the input video is first calibrated using the
precomputed intrinsic camera matrix to remove lens distortion. The calibrated image is then fed into the
marker detector, which scans the entire frame for known fiducial markers. Each marker represents a different
location on the structure. If new markers are found compared with the previous frame, the bounding boxes
around these markers are passed to the tracking step. In the tracking step, a Harris corner detector (Harris
and Stephens 1988) is used to identify features, which are then tracked using the KLT algorithm (Tomasi
1991). The displacements are obtained by tracking the centroid of a homography-fit bounding box between each
set of points in consecutive frames, and the projection of the bounding box onto the horizontal and vertical
axes is computed. Because the size and shape of the bounding box are known (it is a fiducial marker), the
scale can be computed readily, and the perspective effect can be removed using the angles between the
bounding box edges and the axes. Once the displacements are extracted, the NExT-ERA method, i.e., the Natural
Excitation Technique (James III et al. 1993) combined with the Eigensystem Realization Algorithm, is employed
to identify local mode shapes. Other modal analysis methods may be used, but the ERA-based method was
preferred because it facilitates automation of the entire process. Further, for use with UAVs, correlations
of displacements are more reliable because they are less affected by the motion of the UAV; thus NExT-ERA,
which uses correlations as inputs, is preferred. Local mode shapes are then combined by the method outlined
in Sim et al. (2009) to obtain the global mode shapes; this approach essentially takes the union of all the
local modes, with a least-squares fit over the overlapping portions.
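A minimal sketch of the merging step follows. It assumes each segment yields a real-valued local mode shape over the floors it observes and that consecutive segments overlap by at least one floor. Each new segment is scaled by the scalar least-squares factor that best matches the already-assembled shape on the shared floors, and its remaining floors are appended. The function name and data layout are hypothetical, and keeping the earlier values on overlapping floors (rather than averaging them) is a simplifying choice; Sim et al. (2009) remains the reference for the exact formulation.

```python
import numpy as np


def merge_local_modes(segments, n_dof):
    """Assemble one global mode shape from overlapping local mode shapes.

    segments: list of (dof_indices, local_shape) pairs, ordered so that each
    segment after the first shares at least one DOF with the ones before it.
    """
    global_shape = np.full(n_dof, np.nan)
    first_dofs, first_shape = segments[0]
    global_shape[np.asarray(first_dofs)] = first_shape
    for dofs, shape in segments[1:]:
        dofs = np.asarray(dofs)
        shape = np.asarray(shape, float)
        known = ~np.isnan(global_shape[dofs])
        if not known.any():
            raise ValueError("segment shares no DOF with the assembled shape")
        a, b = shape[known], global_shape[dofs][known]
        # Scalar least-squares fit: R minimises ||R * a - b||^2 on the overlap.
        R = (a @ b) / (a @ a)
        # Keep earlier values on shared DOFs; append the rescaled new DOFs.
        global_shape[dofs[~known]] = R * shape[~known]
    # Normalise so the largest-magnitude component is +/-1.
    return global_shape / np.nanmax(np.abs(global_shape))
```

Because each local shape is identified only up to an arbitrary scale and sign, the least-squares factor absorbs both, which is why segments can be merged even when their local normalizations differ.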
ANALYSIS AND RESULTS
Experimental Setup
To test the proposed pipeline, a 6-story shear building model was used (Figure 3). One fiducial marker was
affixed to each floor, and the structure was subjected to band-limited white-noise excitation via a
uniaxial shaking table. Two cameras were used in the experiment: (i) a Nikon D3300 DSLR and (ii) the camera
of a DJI Phantom 3 Professional UAV. Two different tests were conducted. In the first test, the structure's
motion was captured only by the ground camera (Nikon D3300), starting with 3 floors in frame and then
panning up one floor every 90 s. In the second test, a reference video was obtained with the camera zoomed
out to keep the entire structure in view. In addition, the UAV was flown to record the motion by panning,
as in the first test.
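Because each marker encodes its floor, the frames of a panned video can in principle be split into segments by the set of marker IDs visible in each frame. The hypothetical helper below assumes that the marker detector has already produced a list of IDs per frame; runs shorter than `min_seconds` (an assumed threshold) are treated as panning transitions and discarded.

```python
from itertools import groupby


def split_segments(visible_ids, fps, min_seconds=10.0):
    """Group consecutive frames that show the same set of marker IDs.

    visible_ids: one iterable of detected marker IDs per frame.
    Returns (first_frame, last_frame, sorted_ids) for every run lasting at
    least min_seconds; shorter runs are treated as panning transitions.
    """
    segments = []
    frame = 0
    for ids, run in groupby(frozenset(v) for v in visible_ids):
        n_frames = sum(1 for _ in run)
        if n_frames >= min_seconds * fps:
            segments.append((frame, frame + n_frames - 1, sorted(ids)))
        frame += n_frames
    return segments
```

With the 90 s dwell time of the first test and the 60 Hz frame rate of the ground camera, each retained segment would contain about 5400 frames.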
Results
The proposed approach was evaluated by comparing the results from the video segments with those from the
reference video of the entire structure. The results include estimated natural frequencies and mode shapes
(Figure 5). For the segmented videos, the natural frequencies were obtained by averaging the natural
frequencies identified from each segment. The auto-power spectra of the first-floor displacements extracted
from the ground camera and UAV videos are also presented for reference (Figure 4). The last two modes are not
shown for the UAV, because their frequencies are close to the Nyquist frequency of the Phantom 3 Pro camera
(15 Hz at its 30 Hz frame rate), which makes them unreliable.
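For a quick check of spectra such as those in Figure 4, a Welch-type PSD estimate can be computed from a displacement time history. The sketch below is a plain NumPy version (averaged Hann-windowed periodograms with 50% overlap; `scipy.signal.welch` provides the same estimate with more options). The function names, segment length, and peak-picking bands are assumptions. The frequency axis ends at the Nyquist frequency $f_s/2$, which is 15 Hz for the 30 Hz UAV camera. Peak picking here is only a sanity check, not the ERA-based identification used in the paper.

```python
import numpy as np


def welch_psd(x, fs, nperseg=256):
    """One-sided PSD from averaged Hann-windowed periodograms, 50% overlap.

    nperseg is assumed to be even.
    """
    x = np.asarray(x, float) - np.mean(x)
    if len(x) < nperseg:
        raise ValueError("signal shorter than one segment")
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)
    spec = np.mean([np.abs(np.fft.rfft(x[i:i + nperseg] * win)) ** 2 for i in starts], axis=0)
    psd = spec / (fs * np.sum(win ** 2))
    psd[1:-1] *= 2.0  # fold in negative frequencies; DC and Nyquist bins excluded
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)  # last bin is the Nyquist frequency
    return freqs, psd


def peak_frequency(freqs, psd, fmin, fmax):
    """Frequency of the largest PSD ordinate inside [fmin, fmax]."""
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(psd[band])]
```

The frequency resolution is $f_s/N_{\mathrm{seg}}$, so for a 90 s segment at 30 Hz a 1024-sample window resolves peaks to about 0.03 Hz; longer windows sharpen the peaks at the cost of fewer averages.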
Figure 3. Experimental setup: 6-story shear building model; Nikon D3300 (1920x1080 at 60 Hz); DJI Phantom 3 Pro (2704x1520 at 30 Hz).
Figure 4. Power spectral density of first-floor displacements. The low-frequency hover of the UAV is visible, and the natural frequency peaks match well.
Figure 5. Extracted mode shapes.
CONCLUSIONS
A new method was presented for automated extraction of displacements and mode shapes from structures using
multiple video segments. Apart from enabling automation, the main benefit of the method is that it
leverages the camera resolution to the maximum extent possible by focusing on one segment of the structure
at a time. In a preliminary evaluation on a 6-story shear building model, the natural frequency errors
relative to the reference video were below 0.6% and the MAC values were at least 0.996 for the four modes
compared. The method was evaluated using both a ground camera and a UAV. Ongoing work includes application
of the proposed method to a full-scale bridge.
REFERENCES
Cha, Y.-J., Chen, J. G., and Büyüköztürk, O. (2017). “Output-only computer vision based damage
detection using phase-based optical flow and unscented Kalman filters.”
Chang, C. C., and Xiao, X. H. (2010). “An integrated visual-inertial technique for structural displacement
and velocity measurement.” Smart Structures and Systems, 6(9), 1025–1039.
Fukuda, Y., Feng, M. Q., Narita, Y., Kaneko, S., and Tanaka, T. (2013). “Vision-based displacement
sensor for monitoring dynamic response using robust object search algorithm.” IEEE Sensors
Journal, 13(12), 4725–4732.
Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F. J., and Marín-Jiménez, M. J. (2014).
“Automatic generation and detection of highly reliable fiducial markers under occlusion.” Pattern
Recognition, 47(6), 2280–2292.
Harris, C., and Stephens, M. (1988). “A Combined Corner and Edge Detector.” Proceedings of the Alvey
Vision Conference 1988, 147–151.
James III, G. H., Carne, T. G., and Lauffer, J. P. (1993). “The Natural Excitation
Technique (NExT) for Modal Parameter Extraction From Operating Wind Turbines.” The
International Journal of Analytical and Experimental Modal Analysis, 10(4), 260–277.
Nogueira, F. M. A., Barbosa, F. S., and Barra, L. P. S. (2005). “Evaluation of structural natural frequencies
using image processing.” EVACES 2005 - Experimental Vibration Analysis For Civil Engineering
Structures, 1–7.
Schumacher, T., and Shariati, A. (2013). “Monitoring of Structures and Mechanical Systems Using
Virtual Visual Sensors for Video Analysis: Fundamental Concept and Proof of Feasibility.” Sensors,
Multidisciplinary Digital Publishing Institute, 13(12), 16551–16564.
Sim, S. H., Spencer Jr., B. F., Zhang, M., and Xie, H. (2009). “Automated decentralized modal analysis using
smart sensors.” Structural Control and Health Monitoring.
Tomasi, C. (1991). “Detection and Tracking of Point Features.” School of Computer Science, Carnegie
Mellon Univ., 91(April), 1–22.
Yoon, H., Elanwar, H., Choi, H., Golparvar-Fard, M., and Spencer, B. F. (2016). “Target-free approach
for vision-based structural system identification using consumer-grade cameras.” Structural Control
and Health Monitoring.
Mode    Reference (Hz)  Ground pan (Hz)  UAV pan (Hz)  Ground pan error (%)  UAV pan error (%)  Ground pan MAC  UAV pan MAC
Mode 1  1.583           1.582            1.583         0.082                 0.012              0.999           0.998
Mode 2  4.920           4.932            4.925         0.241                 0.105              0.999           0.996
Mode 3  7.944           7.985            7.977         0.527                 0.424              0.998           0.999
Mode 4  10.733          10.702           10.700        0.287                 0.308              1.000           0.996
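For reference, the two comparison metrics in the table can be computed as shown below. This is a minimal sketch that assumes real-valued mode shapes sampled at the same floors, and the function names are illustrative. The MAC between shapes $\phi_a$ and $\phi_b$ is $(\phi_a^{T}\phi_b)^2 / ((\phi_a^{T}\phi_a)(\phi_b^{T}\phi_b))$, which equals 1 for shapes that differ only by a scale factor and 0 for orthogonal shapes.

```python
import numpy as np


def mac(phi_a, phi_b):
    """Modal Assurance Criterion between two real mode shapes (0 to 1)."""
    a = np.asarray(phi_a, float)
    b = np.asarray(phi_b, float)
    return (a @ b) ** 2 / ((a @ a) * (b @ b))


def percent_error(f_est, f_ref):
    """Absolute error of an identified natural frequency, in percent of f_ref."""
    return 100.0 * abs(f_est - f_ref) / f_ref
```

Recomputing the errors from the rounded three-decimal frequencies gives slightly different values (for example, 0.244% rather than 0.241% for Mode 2 of the ground pan), presumably because the tabulated errors were computed from unrounded estimates.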