vision-based modal analysis using multiple video segmentssstl.cee.illinois.edu/papers/hoskere et al...

Student Paper Competition of ASCE EMI SHMC Committee

Vision-based modal analysis using multiple video segments

Vedhus Hoskere1, JongWoong Park2, Hyungchul Yoon3, and Billie F. Spencer Jr.4

1Department of Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL, [email protected]

2School of Civil and Environmental Engineering, Urban Design and Studies, Chung-Ang University, Seoul, Republic of Korea, [email protected]

3Department of Civil and Environmental Engineering, Michigan Technological University, Houghton, MI, [email protected]

4Department of Civil and Environmental Engineering, University of Illinois, Urbana-Champaign, IL, [email protected]

ABSTRACT: Computer vision techniques for extracting structural displacements from video are gaining acceptance for system identification and structural health monitoring. However, the application of video-based techniques to modal analysis of full-scale civil infrastructure has been limited, because the extracted displacements are useful only when the pixel displacement is sufficiently large, i.e., above the camera noise floor. Hence, obtaining measurements of all points on a large structure with the entire structure in a single video frame may not be feasible. In this study, a new method is presented to facilitate the automated extraction of mode shapes of full-scale civil infrastructure from video segments obtained by panning a camera across the structure. A preliminary evaluation of the method is presented using a six-story shear building model excited on a shake table. Additionally, the potential of using video from a camera-equipped unmanned aerial vehicle (UAV) is presented.

INTRODUCTION

Much of the critical infrastructure that serves our society today was erected several decades ago and is now in service well past its design life; this is true for bridges, dams, highways, lifeline systems, and even buildings. The economic implications of repair thus call for a systematic understanding of the current state of infrastructure and have advanced research in technologies to enable structural health monitoring (SHM) over the last two decades. System identification and modal analysis are powerful tools for SHM, as they provide valuable information about the dynamic properties of structural systems. Traditionally, system identification has been accomplished using wired or wireless accelerometers or strain gages. Vision-based techniques offer the advantages of non-contact measurement over these traditional approaches.

Recently, computer vision techniques for extracting structural displacements of civil infrastructure from video have gained increased acceptance. Combined with the proliferation of low-cost cameras and increased computational capabilities, video-based methods have become convenient approaches for displacement measurement in structures. Several algorithms are available to accomplish displacement extraction, working in principle by template matching or by tracking contours of either constant phase or constant intensity through time. Early applications of these algorithms focused on natural frequency estimation (Nogueira et al. 2005) and displacement measurement (Chang and Xiao 2010; Fukuda et al. 2013). In the last few years, the application of these algorithms has been extended to system identification of laboratory structures. Schumacher and Shariati (2013) introduced the concept of a virtual visual sensor that could be used for modal analysis of a structure. Yoon et al. (2016) implemented a KLT tracker to identify a model of a laboratory-scale six-story building. Cha et al. (2017) used a phase-based approach together with unscented Kalman filters for system identification using noisy displacement measurements.

The computer vision techniques demonstrated thus far for system identification of laboratory structures are of limited practical use, because they are not scalable to full-scale civil infrastructure for a number of reasons. First, since accurate displacement measurement requires that the pixel amplitude of the displacements be large enough to be measured, measuring displacements with the entire structure in a single video frame will likely not be possible. Second, because civil engineering structures are typically large, recording vibrations from one end of the structure can cause perspective distortion. Third, most computer vision techniques for system identification require user input to specify the regions of interest from which to extract the displacements. This manual process tends to lower repeatability and introduce human error when conducting system identification.

This paper addresses these three issues by combining multiple video segments of the vibrating structure. The first issue is resolved by panning the camera in a divide-and-conquer strategy that captures the motion of a specified area of interest at each stage, allowing for a larger pixel displacement and signal-to-noise ratio. To address the second issue, the proposed method removes the effect of perspective distortion to obtain the true displacements of the structure. In addition, the use of a UAV is presented as an option to get close to the regions of interest, thereby circumventing the perspective problem altogether. Finally, a method to automate the selection of regions of interest is presented to maintain good repeatability.
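As a rough illustration of why narrowing the field of view helps, the sketch below converts a physical displacement into pixels for two views. The displacement amplitude, the structure and segment heights, and the helper name `pixel_amplitude` are hypothetical values chosen for illustration only; they are not taken from the experiment.

```python
def pixel_amplitude(displacement_mm: float, field_of_view_mm: float, pixels_across: int) -> float:
    """Displacement in pixels when field_of_view_mm is mapped onto pixels_across."""
    return displacement_mm * pixels_across / field_of_view_mm


# Hypothetical numbers, for illustration only.
DISPLACEMENT_MM = 2.0   # assumed vibration amplitude
PIXELS_ACROSS = 1080    # sensor pixels along the imaged dimension

# Entire (assumed 30 m) structure in one frame versus one (assumed 3 m) segment.
whole_structure_px = pixel_amplitude(DISPLACEMENT_MM, 30_000.0, PIXELS_ACROSS)
one_segment_px = pixel_amplitude(DISPLACEMENT_MM, 3_000.0, PIXELS_ACROSS)

print(f"whole structure: {whole_structure_px:.3f} px")
print(f"one segment:     {one_segment_px:.3f} px")
```

Under these assumed numbers, the same motion spans ten times as many pixels in the segment view, which is the gain the panning strategy aims for.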

PROPOSED PIPELINE

An automated pipeline is presented that relies on passive fiducial markers installed on the structure. A fiducial marker system is composed of a set of predesigned markers and an algorithm that detects them. Each marker encodes a natural number, so it can denote information such as the location at which it is placed. This capability is particularly useful when using video segments for modal analysis of civil infrastructure, because civil infrastructure is usually highly regular, which makes it difficult to determine which part of the structure the camera is focused on in any given frame. These markers were originally designed for augmented reality applications, to provide highly reliable camera pose estimation (Garrido-Jurado et al. 2014). This system was installed on the test structure to demonstrate its applicability to SHM. For tracking applications where automated target recognition is desired, markers provide a highly robust and reliable solution. Because the size and shape of each fiducial marker are known precisely, we also use them to remove the perspective distortion and compute an accurate scale factor for every frame, and we show that the entire process of system identification from videos can be automated.
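The idea of using a marker of known size to obtain a scale factor can be sketched as follows. This is a simplified stand-in, not the authors' correction: it handles only the scale and the in-plane angle of one marker edge, and the corner coordinates, the 100 mm marker size, and all function names are assumptions made for illustration.

```python
import math


def edge_scale_and_angle(p0, p1, side_mm):
    """Return (mm per pixel, edge angle in radians) for the marker edge p0 -> p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return side_mm / math.hypot(dx, dy), math.atan2(dy, dx)


def centroid(corners):
    """Mean of the corner coordinates."""
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def displacement_along_edge_mm(c_prev, c_curr, mm_per_px, angle):
    """Project the centroid motion onto the edge direction and convert to mm."""
    du, dv = c_curr[0] - c_prev[0], c_curr[1] - c_prev[1]
    return (du * math.cos(angle) + dv * math.sin(angle)) * mm_per_px


# Hypothetical 100 mm square marker, imaged 50 px wide and rotated in the image plane.
SIDE_MM = 100.0
frame_a = [(100.0, 100.0), (140.0, 130.0), (110.0, 170.0), (70.0, 140.0)]
frame_b = [(x + 4.0, y + 3.0) for x, y in frame_a]  # marker moved 5 px along its edge

mm_per_px, angle = edge_scale_and_angle(frame_a[0], frame_a[1], SIDE_MM)
moved_mm = displacement_along_edge_mm(centroid(frame_a), centroid(frame_b), mm_per_px, angle)
print(f"scale: {mm_per_px:.2f} mm/px, displacement: {moved_mm:.2f} mm")
```

Recomputing the scale from the marker in every frame, rather than once, is what lets the scale follow a camera whose distance to the structure changes, as with a UAV.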

The pipeline is summarized in Figure 2. Each frame in the input video is calibrated using the precomputed intrinsic camera matrix to remove distortion. The calibrated image is then fed into the marker detector, which scans the entire frame for known fiducial markers. Each marker represents a different location on the structure. If new markers are found compared to the previous frame, the bounding boxes around these markers are passed to the next step, where tracking takes place. In the tracking step, a Harris corner detector (Harris and Stephens 1988) is used to identify features, which are then tracked using the KLT algorithm (Tomasi 1991). The displacements are obtained by tracking the centroid of a homography-fit bounding box between each set of points in consecutive frames. The projection of the bounding box onto the horizontal and vertical axes is computed. Because the size and shape of the bounding box are known (as it is a fiducial marker), the scale can be computed readily, and the perspective effect can be removed using the angles of the bounding box lines with the axes. Once the displacements are extracted, the NExT-ERA method (James III et al. 1993) is employed to generate local mode shapes. Other modal analysis methods may be used, but the ERA-based method was preferred to facilitate automation of the entire process. Further, for use with UAVs, the correlations of displacements are more reliable because they are less affected by the motion of the UAV; thus NExT-ERA, which uses correlations as inputs, is preferred. Local mode shapes are then combined by the method outlined in Sim et al. (2009) to obtain the global mode shapes; this approach essentially takes the union of all the local modes with a least-squares fit at the overlapping portions.

Figure 1. Schematic of a UAV capturing video segments of a structure part by part.

Figure 2. Proposed pipeline for automated mode shape extraction using video segments. Automated displacement extraction: input video, camera calibration, fiducial marker detection, feature detection when new markers are found, KLT feature tracking with frame increment, perspective removal, and scale computation. Local mode shapes: cross-correlations and the Eigensystem Realization Algorithm. Global mode shapes: union of local modes using an OLS fit, $\phi_{\Omega}^{m} = \bigcup_{i=1}^{n} R_i \phi_{\Omega_i}^{m}$.
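The union of local modes with a least-squares fit at the overlaps can be illustrated with a minimal sketch. It is a simplified version written for this illustration, not the procedure of Sim et al. (2009): each new segment is rescaled by the single factor that best matches the floors it shares with the shape assembled so far, and shared floors are averaged. The function name, the dictionary layout, and the mode shape values are all hypothetical.

```python
def stitch_mode_shapes(segments):
    """Combine local mode shapes given as {floor: amplitude} dicts.

    Consecutive segments must share at least one floor. Each local shape has an
    arbitrary scale (and sign), which is resolved by a least-squares fit.
    """
    global_shape = dict(segments[0])
    for local in segments[1:]:
        overlap = [f for f in local if f in global_shape]
        if not overlap:
            raise ValueError("each segment must overlap the shape assembled so far")
        # alpha minimizes sum over overlap of (global - alpha * local)^2
        num = sum(global_shape[f] * local[f] for f in overlap)
        den = sum(local[f] ** 2 for f in overlap)
        alpha = num / den
        for floor, value in local.items():
            scaled = alpha * value
            if floor in global_shape:
                global_shape[floor] = 0.5 * (global_shape[floor] + scaled)
            else:
                global_shape[floor] = scaled
    return [global_shape[f] for f in sorted(global_shape)]


# Hypothetical local shapes: the same six-floor shape seen in four overlapping
# three-floor segments, each with a different arbitrary scale.
segments = [
    {1: 1.0, 2: 2.0, 3: 3.0},
    {2: -4.0, 3: -6.0, 4: -8.0},
    {3: 1.5, 4: 2.0, 5: 2.5},
    {4: 12.0, 5: 15.0, 6: 18.0},
]
stitched = stitch_mode_shapes(segments)
print([round(v, 6) for v in stitched])
```

Because every segment in the first test shows three floors and the camera pans up one floor at a time, consecutive segments share two floors, which gives the fit more than one point to work with.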

ANALYSIS AND RESULTS

Experimental Setup

To evaluate the proposed pipeline, a six-story shear building model was tested. One fiducial marker was affixed to each floor of the structure, which was subjected to band-limited white noise excitation via a uniaxial shaking table. Two cameras were used in the experiment: (i) a Nikon D3300 DSLR and (ii) a DJI Phantom 3 Professional camera. Two different tests were conducted. In the first test, the structure's motion was captured only by the ground camera (Nikon D3300), starting with three floors in frame and then panning one floor up every 90 s. In the second test, a reference video was obtained with the camera zoomed out to have the entire structure in view. In addition, the UAV was flown to record the motion by panning, similar to the first test.

Results

The proposed approach was evaluated by comparing the results from video segments to those from the video of the entire structure. The results include estimated natural frequencies and mode shapes. In the case of the segmented videos, the natural frequencies were obtained by averaging the natural frequencies from each of the segments. The auto-power spectra of the first-floor displacements extracted from the ground camera and the UAV are also presented for reference. The last two modes are not shown for the UAV, because their frequencies are close to the Nyquist frequency of the Phantom 3 Pro camera (15 Hz at its 30 Hz frame rate), making them unreliable.
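The Nyquist argument can be checked with a few lines of arithmetic. The sketch below uses the two frame rates of the experimental setup and the four reference frequencies reported in the results table; the 0.8 safety margin and the 14 Hz test frequency are assumptions for illustration, not values from the study.

```python
def nyquist_hz(frame_rate_hz: float) -> float:
    """Highest frequency representable without aliasing at this frame rate."""
    return frame_rate_hz / 2.0


def is_reliable(frequency_hz: float, frame_rate_hz: float, margin: float = 0.8) -> bool:
    """True if the frequency lies below margin * Nyquist (margin is an assumed threshold)."""
    return frequency_hz < margin * nyquist_hz(frame_rate_hz)


REFERENCE_MODES_HZ = [1.583, 4.920, 7.944, 10.733]  # reference values from the results table
UAV_FRAME_RATE_HZ = 30.0      # DJI Phantom 3 Pro
GROUND_FRAME_RATE_HZ = 60.0   # Nikon D3300

uav_usable = [f for f in REFERENCE_MODES_HZ if is_reliable(f, UAV_FRAME_RATE_HZ)]
print("UAV Nyquist frequency:", nyquist_hz(UAV_FRAME_RATE_HZ), "Hz")
print("Modes usable from UAV video:", uav_usable)

# A hypothetical 14 Hz mode would pass for the ground camera but not for the UAV.
print(is_reliable(14.0, GROUND_FRAME_RATE_HZ), is_reliable(14.0, UAV_FRAME_RATE_HZ))
```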

Figure 3. Experimental setup: six-story shear building model recorded with a Nikon D3300 (1920x1080 @ 60 Hz) and a DJI Phantom 3 Pro (2704x1520 @ 30 Hz).

Figure 4. Power spectral density of first floor displacements. Low frequency hover of the UAV is visible and natural frequency peaks match well.

Figure 5. Extracted mode shapes.


CONCLUSIONS

A new method was presented for automated extraction of displacements and mode shapes of structures using multiple video segments. Apart from allowing for automation, the main benefit of the method is that it leverages the camera resolution to the maximum extent possible by focusing on one segment of the structure at a time. Preliminary evaluation on a six-story laboratory model showed that, for the first four modes, the identified natural frequencies were within 0.53% of the reference values and the MAC values were at least 0.996. The method was evaluated using both a ground camera and a UAV. Ongoing work includes application of the proposed method to a full-scale bridge.

REFERENCES

Cha, Y.-J., Chen, J. G., and Büyüköztürk, O. (2017). “Output-only computer vision based damage

detection using phase-based optical flow and unscented Kalman filters.”

Chang, C. C., and Xiao, X. H. (2010). “An integrated visual-inertial technique for structural displacement

and velocity measurement.” Smart Structures and Systems, 6(9), 1025–1039.

Fukuda, Y., Feng, M. Q., Narita, Y., Kaneko, S., and Tanaka, T. (2013). “Vision-based displacement

sensor for monitoring dynamic response using robust object search algorithm.” IEEE Sensors

Journal, 13(12), 4725–4732.

Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F. J., and Marín-Jiménez, M. J. (2014).

“Automatic generation and detection of highly reliable fiducial markers under occlusion.” Pattern

Recognition, 47(6), 2280–2292.

Harris, C., and Stephens, M. (1988). “A Combined Corner and Edge Detector.” Proceedings of the Alvey Vision Conference 1988, 147–151.

James III, G. H., Carne, T. G., and Lauffer, J. P. (1993). “The Natural Excitation Technique (NExT) for Modal Parameter Extraction From Operating Wind Turbines.” The International Journal of Analytical and Experimental Modal Analysis, 10(4), 260–277.

Nogueira, F. M. A., Barbosa, F. S., and Barra, L. P. S. (2005). “Evaluation of structural natural frequencies using image processing.” EVACES 2005 - Experimental Vibration Analysis For Civil Engineering Structures, 1–7.

Schumacher, T., and Shariati, A. (2013). “Monitoring of Structures and Mechanical Systems Using

Virtual Visual Sensors for Video Analysis: Fundamental Concept and Proof of Feasibility.” Sensors,

Multidisciplinary Digital Publishing Institute, 13(12), 16551–16564.

Sim, S. H., Spencer, B. F., Jr., Zhang, M., and Xie, H. (2009). “Automated decentralized modal analysis using smart sensors.” Structural Control and Health Monitoring, 9999(9999), n/a.

Tomasi, C. (1991). “Detection and Tracking of Point Features.” School of Computer Science, Carnegie

Mellon Univ., 91(April), 1–22.

Yoon, H., Elanwar, H., Choi, H., Golparvar-Fard, M., and Spencer, B. F. (2016). “Target-free approach

for vision-based structural system identification using consumer-grade cameras.” Structural Control

and Health Monitoring.

Mode     Freq. Reference (Hz)   Freq. Ground Pan (Hz)   Freq. UAV Pan (Hz)   Error Ground Pan (%)   Error UAV Pan (%)   MAC Ground Pan   MAC UAV Pan
Mode 1   1.583                  1.582                   1.583                0.082                  0.012               0.999            0.998
Mode 2   4.920                  4.932                   4.925                0.241                  0.105               0.999            0.996
Mode 3   7.944                  7.985                   7.977                0.527                  0.424               0.998            0.999
Mode 4   10.733                 10.702                  10.700               0.287                  0.308               1.000            0.996
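The two comparison metrics in the table, percent error in natural frequency and the modal assurance criterion (MAC), can be computed as sketched below using their standard definitions. The mode shape vectors are hypothetical. Recomputing the error from the rounded Mode 2 frequencies gives about 0.244% rather than the tabulated 0.241%, presumably because the tabulated values were computed from unrounded frequencies; that explanation is an assumption.

```python
def percent_error(estimate: float, reference: float) -> float:
    """Absolute error of the estimate relative to the reference, in percent."""
    return abs(estimate - reference) / reference * 100.0


def mac(phi_a, phi_b) -> float:
    """Modal assurance criterion between two real-valued mode shape vectors."""
    cross = sum(a * b for a, b in zip(phi_a, phi_b))
    return cross ** 2 / (sum(a * a for a in phi_a) * sum(b * b for b in phi_b))


# Mode 2, ground pan versus reference, using the rounded values from the table.
mode2_error = percent_error(4.932, 4.920)
print(f"Mode 2 error from rounded values: {mode2_error:.3f} %")

# Hypothetical mode shapes: MAC ignores scale and sign, and is 0 for orthogonal shapes.
reference_shape = [1.0, 2.0, 3.0]
rescaled_shape = [-2.0, -4.0, -6.0]
print("MAC, rescaled copy:", mac(reference_shape, rescaled_shape))
print("MAC, orthogonal shapes:", mac([1.0, 0.0], [0.0, 1.0]))
```

Because MAC is insensitive to scaling, it compares the stitched global mode shapes with the reference shapes without requiring a common normalization.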
