icce_virtual3dshield

Virtual 3D Shield for Asset Protection

Daniel Moldovan, Oliver Zendel and Christian Zinner

Department of Safety & Security

AIT Austrian Institute of Technology, Vienna, [email protected]

Abstract: In this paper, we propose a practical system for detecting 3D volumetric intrusion in a predefined restricted area using depth images provided by a range camera. This system can be employed for the protection of valuable objects displayed in public areas, as well as for monitoring the space around private property assets. The system defines a virtual 3D shield around the asset that has to be protected, thus delimiting the protected boundaries in all three dimensions. Experiments performed with both a passive stereo camera and IR depth sensors confirmed that the proposed method effectively localizes the intrusion detection to the volume of the monitored object.

I. INTRODUCTION

Intrusion detection techniques (e.g., person-machine collision prevention, off-limits area observation, etc.) are important monitoring activities that are useful in establishing safe and secure environments. Present-day technology has reached a stage where mounting cameras to capture video imagery is cheap, but finding available human resources to sit and watch that imagery is expensive. In this paper, we introduce a practical system for detecting volumetric intrusion in a predefined restricted area by using depth-sensing cameras. Depth-based intrusion detection has the advantage of performing intruder tracking from a distance without any human intervention.

Intrusion detection systems (IDS) are an indispensable component of security infrastructure that detects potential threats before they inflict any damage. Video-based IDS offer many advantages for consumer applications. Firstly, and most importantly, vision-based security systems are unobtrusive and user-friendly. Secondly, setup is easy and inexpensive, as they only require simple low-cost vision devices of reasonable resolution such as consumer cameras, computers or servers, and other peripheral devices.

In this paper we propose a real-time surveillance and monitoring system that can be employed for the protection of valuable objects (such as paintings, sculptures, etc.) displayed in public areas, as well as for monitoring the space around private property assets. The system defines a virtual 3D shield around the asset that has to be protected, thus delimiting the protected boundaries in all three dimensions (see Figure 1).

The paper is structured as follows: Section II gives an overview of existing vision-based intrusion detection systems. Section III describes the underlying algorithm, while the experimental results are presented in Section IV. Finally, Section V concludes the paper.

II. RELATED WORK

Volumetric intrusion detection sensors are designed to detect intruder motion within the interior of a protected space. Current commercial intrusion detection systems employed for the monitoring of predefined 2D / 3D zones (e.g. in production premises or factories) are characterized by specific, dedicated monitoring tasks. The intrusion detector of Kiwi-Security [1] employs monocular views for defining the protected perimeter. Although this system is also operable with thermal and infrared cameras, it lacks the flexibility of precisely defining the 3D volume that has to be protected. In order to protect against encroachments from behind the protected area, such a limitation would require an intricate combination of multiple sensors.

(a) Protection of museum exhibits. (b) Monitoring private property assets.

Fig. 1. Volumetric indoor asset monitoring.

On the other hand, the IDS developed by Pilz GmbH [2] implements a 3D safety technology that surrounds the danger zone with a customized virtual protective volume. Any object that encroaches into the protected volume and exceeds a defined minimum size is automatically detected and reported. Although it employs an innovative safety technology, such a system is limited by the capability of the sensing device (which consists of three cameras that transmit grey-scale images). Vision-based technology is very sensitive to lighting conditions and cannot operate in poorly lit spaces. By comparison, our system allows vision-based and IR / ToF (Time of Flight) depth cameras to be used interchangeably.

A somewhat tedious approach for performing intrusion detection has been introduced in [3]. Such a system employs a multiple-camera setup in which the user generates a number of control points in the scene directly with a colored marker. Subsequently, the restricted area is generated automatically as the convex hull of all the specified points. Compared with this approach, the novelty of our method resides in its flexibility in defining the location of the protected volume (no need to use markers, user-friendly safety zone configurator) as well as in its ability to employ both low-cost stereo vision cameras and IR depth sensors like Kinect [5].

By exploiting depth information, the proposed system can automatically generate the sensitive screen without burdening the user with placing a marker in the real space in order to obtain the corresponding control points. Also, as our algorithm performs intruder detection by analysing only the depth images, it provides a way to protect privacy in applications that require such functionality.

III. PROPOSED METHOD

An intrusion detection system dynamically monitors the events taking place in an environment, and decides whether these events are symptomatic of an intrusion or constitute a legitimate use of the system. Figure 2 depicts the organization of our IDS, where solid lines indicate data/control flow, while dashed lines indicate responses to intrusive activities.

The administrator of the IDS is responsible for the configuration of the depth sensor as well as for defining the location of the virtual shield. This location is generated by employing an easy-to-use interface that allows the placement of a volumetric shape around the asset that has to be protected.

Fig. 2. Framework of the proposed IDS.

Figure 3(a) depicts a scene that has in its center a box that represents the protected object. By using the depth camera, a 3D point cloud of the scene can be generated and then visualized in a user-friendly GUI (see Figure 3(b)). Such a representation allows a precise placement of the virtual shield (currently a sphere) around the object, with the boundaries of the restricted area defined precisely in all three dimensions.
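The generation of a point cloud from a depth image, and the test of whether a point falls inside a spherical shield, can be sketched as follows. This is only a minimal illustration under a pinhole camera model; the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function names `depth_to_points` and `inside_sphere` are assumptions made for the example and are not taken from the system described here.

```python
import math

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of metres, 0 = no data) to 3D points.

    Returns a list of ((u, v), (X, Y, Z)) pairs in the camera frame.
    The intrinsics fx, fy, cx, cy are assumed to be known from calibration.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # invalid or missing depth measurement
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((u, v), (x, y, z)))
    return points

def inside_sphere(point, center, radius):
    """True if a 3D point lies within the spherical shield."""
    return math.dist(point, center) <= radius

# Tiny hypothetical 2x2 depth image with one missing measurement.
depth = [[0.0, 2.0],
         [2.0, 2.0]]
cloud = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
shielded = [p for _, p in cloud if inside_sphere(p, (0.0, 0.0, 2.0), 1.5)]
```

Keeping the pixel coordinates next to each 3D point makes it possible to map a decision taken in 3D back onto the depth image.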

Intrusion detection is performed by employing a background subtraction algorithm based on depth images. Once the location of the virtual shield has been defined, a 3D preprocessing step removes the 3D points located outside of the protected zone. In the first stage, the administrator generates the background image from the depth image that corresponds to the remaining 3D volume (which also includes the object to be protected). Subsequently, objects entering the predefined 3D volume are detected automatically by subtracting the background image from the subsequent depth images. In order to adjust the minimum size of the detectable intruding object, we use a virtual shield that exhibits two layers. An intrusion is considered valid only after the inner layer of the virtual shield has been penetrated. The distance between the two layers represents the minimum size of the detectable objects.

(a) Configuration of the depth sensor. The object to be protected (the empty box on the table) is placed in view.

(b) 3D point cloud of the scene (side view). Virtual shield (yellow sphere grid) around the object to be protected.

Fig. 3. Setting up the virtual shield.

(a) Separate intruding objects that are disconnected in the depth image.

(b) Due to a depth-based connectivity between the intruding objects, the IDS detects a single intrusion.

Fig. 4. Intrusion detection employs a depth-based CC labeling of the pixels.
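The depth-based background subtraction and the two-layer shield can be illustrated with the sketch below. It is one possible reading of the description, not the actual implementation: the spherical shield, the depth tolerance `tol`, the data layout and the function names are all assumptions introduced for the example.

```python
import math

def foreground_points(points, background, center, outer_radius, tol=0.05):
    """Keep points inside the outer shield whose depth differs from the background.

    points     : iterable of ((u, v), (X, Y, Z)) pairs in metres
    background : dict mapping (u, v) -> background depth in metres
    tol        : assumed depth-noise tolerance in metres
    """
    fg = []
    for pixel, p in points:
        if math.dist(p, center) > outer_radius:
            continue  # 3D preprocessing: drop points outside the protected zone
        bg = background.get(pixel)
        if bg is None or abs(p[2] - bg) > tol:
            fg.append((pixel, p))  # surface that was not there in the background
    return fg

def intrusion_detected(fg, center, inner_radius):
    """Report an intrusion only when a foreground point reaches the inner layer."""
    return any(math.dist(p, center) <= inner_radius for _, p in fg)

# Hypothetical shield: outer layer 1.0 m, inner layer 0.6 m around the asset.
center = (0.0, 0.0, 2.0)
background = {(0, 0): 2.0}  # depth of the protected object itself

grazing = [((0, 0), (0.0, 0.0, 2.0)), ((1, 0), (0.8, 0.0, 2.0))]
piercing = [((0, 0), (0.0, 0.0, 2.0)), ((1, 0), (0.5, 0.0, 2.0))]

fg_grazing = foreground_points(grazing, background, center, 1.0)
fg_piercing = foreground_points(piercing, background, center, 1.0)
alarm_grazing = intrusion_detected(fg_grazing, center, 0.6)
alarm_piercing = intrusion_detected(fg_piercing, center, 0.6)
```

In this sketch an object that only enters the gap between the two layers is registered as foreground but does not raise the alarm, which mirrors the role of the layer distance as a minimum-size setting.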

As we envision that the result of the intrusion detection will subsequently be integrated into an image recognition system, we decided on a unitary representation of the intruding points. As a consequence, we employed a connected-component (CC) labeling approach [4] that allowed us to identify the connected regions in the depth image. When an intrusion takes place, we are able to detect the group of points that belong to the same label as the points detected as intruders. Once an intrusion is identified, the perimeter of the area that corresponds to the group of pixels belonging to the identified label is marked on the reference input image.
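A depth-based connected-component labeling can be sketched as below. The sketch uses a breadth-first flood fill rather than the classical algorithm of [4], and the 4-neighbourhood, the depth-step threshold `max_step` and the function name are assumptions chosen for illustration only.

```python
from collections import deque

def label_components(depth, max_step=0.1):
    """Label 4-connected regions whose neighbouring depths differ by < max_step.

    depth : rows of depth values in metres, 0 = no data
    Returns a label image of the same shape (0 = no data).
    """
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for v0 in range(h):
        for u0 in range(w):
            if depth[v0][u0] <= 0 or labels[v0][u0]:
                continue  # no data, or already part of a region
            next_label += 1
            labels[v0][u0] = next_label
            queue = deque([(v0, u0)])
            while queue:
                v, u = queue.popleft()
                for dv, du in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nv, nu = v + dv, u + du
                    if (0 <= nv < h and 0 <= nu < w
                            and depth[nv][nu] > 0
                            and not labels[nv][nu]
                            and abs(depth[nv][nu] - depth[v][u]) < max_step):
                        labels[nv][nu] = next_label
                        queue.append((nv, nu))
    return labels

# Two surfaces that touch in the image but are one metre apart in depth.
depth = [[1.0, 1.02, 0.0, 2.0],
         [1.0, 1.05, 2.5, 2.0]]
labels = label_components(depth)
# labels == [[1, 1, 0, 2], [1, 1, 3, 2]]
```

Pixels that are adjacent in the image receive the same label only if their depths are also close, so objects that overlap in the 2D view but are separated in depth remain distinct.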

Figure 4(a) exemplifies the discriminative nature of our implementation, which allows the detection of separate intruding objects that are disconnected in the depth image. Connected objects, on the other hand (Figure 4(b)), are detected as a single intrusion. Furthermore, owing to the high flexibility in placing the virtual shield, we can precisely tune the conditions under which an intrusion alarm is generated (see Figure 5, in which the intruding object is placed in the proximity of the virtual shield).

Depth information is an important cue when humans recognize objects because the objects may not have consistent color and texture. In order to support a rapid and intuitive setup of the virtual shields, our system also implements a dominant plane detection functionality that helps the user navigate the virtual 3D model of the scene more easily.

(a) Object outside of the virtual shield. No intrusion detected.

(b) Object piercing the virtual shield. Intrusion alarm generated.

Fig. 5. Fine tuning the location of the virtual shield.

IV. EXPERIMENTAL RESULTS

We tested our application with both passive and active stereo cameras. For the passive stereo vision case we employed an in-house developed stereo camera head [7], [8], while for the active stereo vision we used the Asus Xtion PRO LIVE [6] IR depth sensor. Experiments have been performed on the premises of the Urban Mill [9] collaboration space in Espoo, Helsinki.

The first scenario involved a factory-like activity in which the safety zone had to be monitored from a bird’s eye perspective in order to guarantee both ergonomic processes and efficient results. In order to have everything in sight, the camera system was mounted at a height of 7 m above the ground floor. To generate a good-quality depth map we employed our customized passive stereo camera, which exhibited a canonical setup (two monochrome cameras) with a baseline of 100 cm. The USB2 board-level industrial cameras had a resolution of 1280x1024 pixels. At the given resolution, the stereo sensor delivered the depth data (see Figure 6) at a frame rate of approximately 6 fps on a modern PC (i7-2600 CPU @ 3.40 GHz, 4 GB RAM).

Fig. 6. Depth map generated by our in-house passive stereo camera for the factory-like scenario.

Subsequently, in order to test our IDS, we set up a safety zone by placing the spherical virtual shield in such a way that the encompassed volume would contain a section of the ground floor. The red rectangle in Figure 7(a) is defined by the minimum/maximum values of the planar coordinates of the 3D points that are included in our spherical virtual shield. This 2D representation is used solely for displaying the relative location of our safety zone. The actual shape of our virtual shield for this scenario was a cupola placed above the ground with a height of approx. 1.7 m and a circular base with a radius of approx. 1.5 m (Figure 7(b)).
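The construction of the 2D rectangle from the points inside the shield amounts to taking coordinate-wise extrema, as in the following sketch. It assumes that the planar coordinates are the X and Y components of the 3D points; the function name and the sample points are hypothetical.

```python
def shield_rectangle(points_in_shield):
    """Axis-aligned rectangle enclosing the planar coordinates of the points.

    points_in_shield : non-empty iterable of (X, Y, Z) tuples
    Returns (x_min, y_min, x_max, y_max).
    """
    xs, ys = zip(*((x, y) for x, y, _ in points_in_shield))
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical points that passed the inside-the-shield test.
inside = [(0.5, -1.0, 6.8), (-1.2, 0.3, 7.0), (1.4, 1.1, 6.5)]
rect = shield_rectangle(inside)
# rect == (-1.2, -1.0, 1.4, 1.1)
```

Such a rectangle is only a display aid, since it discards the third dimension that the shield itself uses for the actual decision.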

(a) The red rectangle marks the 2D location of the protected zone.

(b) Top view of the protected zone.

Fig. 7. Intrusion detection in a factory-like scenario.

Fig. 8. Safely monitoring and controlling work processes. Detection of an unauthorized entry in the safety zone (left side); corresponding depth map for the intruding objects (right side).

By using the same PC for running both the stereo matching and the intrusion detection applications, we obtained an intrusion detection rate of approx. 4 fps, which enabled a continuous monitoring of typical work processes (Figure 8).

Although the quality of the depth map provided by the passive stereo camera (a lateral output resolution of 6.25 mm at a depth of 7 m) proved to be more than sufficient for the required task, the lighting conditions as well as the texture of the objects in the scene influence the performance of depth map generation. Such a situation can also be seen in our test data, in which the stereo matching algorithm fails to generate depth data for shiny metal surfaces as well as for poorly textured ground floor areas. A solution for overcoming these limitations is to use active stereo cameras. However, the drawback of such an approach is the limited range imposed by the illumination strength of the projector. Also, in outdoor environments daylight is orders of magnitude stronger than the projected pattern, which becomes barely visible. As a consequence, choosing the right sensing device is a delicate task that depends not only on the technical features of the sensor but also on the specifics of the area to be secured.

For our second experiment we envisioned a scenario in which the focus was on object protection, such as controlling access to buildings or protecting exhibits in a museum. We tested a combination of passive and active sensors that simulated a situation in which an exhibit in a museum had to be protected from all directions, with the constraint that the cameras had to be placed on the side walls (Figure 9).

Fig. 9. Intrusion detection by using both passive and active sensors.

In our setup, the two cameras were placed opposite each other, both facing a candy plate (the target to be protected) on the white table (Figure 10(a)). This dual configuration makes the intrusion detection more robust in situations in which one of the cameras becomes blinded for a short period of time, and it compensates for occluded areas.

(a) The target to be protected: the plate on top of the white table.

(b) An intrusion event will light up the LED ring.

Fig. 10. Object protection scenario close ups.

In order to adapt the passive stereo camera to the new environment, we employed a shorter baseline (40 cm) that generated a lateral output resolution of 5 mm at a depth of 4 m. The active sensor exhibited a depth map size of 640x480 pixels and an acquisition frame rate of 16 fps. For each sensing device we allocated an individual analysis unit (equivalent to an i7 CPU notebook with 8 GB RAM), and for each one of them we generated an individual virtual shield.
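The lateral resolution figures quoted for the two setups can be related to the camera geometry with a short calculation. The sketch below assumes an ideal pinhole model, in which one pixel covers a footprint of Z / f at depth Z for a focal length f expressed in pixels. The focal lengths it prints are merely the values implied by that assumption, not parameters reported for our cameras, and the function names are hypothetical.

```python
def lateral_resolution_mm(depth_mm, focal_px):
    """Footprint of one pixel at the given depth under a pinhole model."""
    return depth_mm / focal_px

def implied_focal_px(depth_mm, resolution_mm):
    """Focal length in pixels that would yield the stated lateral resolution."""
    return depth_mm / resolution_mm

# Stated values: 6.25 mm at 7 m (first scenario), 5 mm at 4 m (second scenario).
f_factory = implied_focal_px(7000.0, 6.25)
f_museum = implied_focal_px(4000.0, 5.0)
print(f_factory, f_museum)  # 1120.0 800.0

# Under the same assumption, the footprint grows linearly with distance.
print(lateral_resolution_mm(3500.0, f_factory))  # 3.125
```

The linear dependence on depth is the reason why the same sensor resolves finer detail when it is mounted closer to the protected object.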

While the intrusion detection frame rate for the passive sensor remained unchanged, for the active sensor it reached 11 fps. A possible solution for increasing the frame rate of our passive stereo vision IDS is to restrict the stereo matching analysis to the ROI (Region Of Interest) obtained from the 2D location of the protected zone.

In order to present the intrusion event visually, we subsequently interconnected the two analysis units with an in-house developed device that lit up an LED ring whenever an intrusion was detected by either one of the sensing systems (Figure 10(b)). The LED switch was built around a RaspberryPI [11] computer and used the WebSocket standard [12] for communicating with the notebooks.
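The way the LED switch could combine the reports of the two analysis units can be sketched as follows. The JSON message layout, the sensor identifiers and the class and function names are assumptions made for this example; the sketch covers only the decision logic and omits the WebSocket transport and the LED driver.

```python
import json

def make_message(sensor_id, intrusion):
    """Serialise one status update as it might be sent by an analysis unit."""
    return json.dumps({"sensor": sensor_id, "intrusion": intrusion})

class LedRingState:
    """Tracks per-sensor intrusion flags; the LED is on if any flag is set."""

    def __init__(self, sensor_ids):
        self.flags = {sid: False for sid in sensor_ids}

    def update(self, message):
        """Apply one JSON status message and return the resulting LED state."""
        event = json.loads(message)
        self.flags[event["sensor"]] = bool(event["intrusion"])
        return any(self.flags.values())

led = LedRingState(["passive", "active"])
states = [
    led.update(make_message("passive", True)),   # passive unit sees an intruder
    led.update(make_message("active", False)),   # active unit sees nothing
    led.update(make_message("passive", False)),  # intruder has left
]
# states == [True, True, False]
```

Holding one flag per sensor lets the LED stay lit for as long as at least one unit still reports an intrusion.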

Directions for the future development of our IDS include the simultaneous control of multiple virtual shields as well as the setup of a single virtual shield shared by multiple sensing devices.

V. CONCLUSION

Depth information is an important cue when humans recognize objects because the objects may not have consistent color and texture but must occupy an integrated region in space. In this paper, we introduced a practical system for detecting volumetric intrusion in a predefined restricted area by using a depth-based camera. The proposed system performs intruder tracking from a distance without any human intervention. The results obtained from different experimental scenarios show great potential for a mixed passive/active stereo vision IDS.

VI. ACKNOWLEDGEMENT

Part of the research leading to these results has received funding from the European Union's Seventh Framework Programme managed by REA (Research Executive Agency) (FP7/2007-2013) under grant agreement no. FP7-SME-2012 - DSenS. The authors also gratefully acknowledge the companies RDnet [13] and XTrust [14] for their technical support.

REFERENCES

[1] KiwiSecurity Software GmbH - Intrusion Detector: http://kiwi-security.com/de/57-intrusion-detector-2/

[2] Pilz GmbH - SafetyEYE: https://shop.pilz.com/eshop/cat/en/DE/00014000337042/SafetyEYE-Safe-camera-system

[3] S. Kawabata, S. Hiura and K. Kato, 3D Intrusion Detection System with Uncalibrated Multiple Cameras, ACCV 2007, Tokyo, Japan, 2007.

[4] L. Shapiro and G. Stockman, Computer Vision, Prentice Hall, pp. 69-73, (2002).

[5] Microsoft Corp. Kinect for Windows, http://www.microsoft.com/en-us/kinectforwindows/.

[6] Asus Corp. Xtion PRO LIVE, http://www.asus.com/Multimedia/Xtion/Xtion PRO LIVE.

[7] S3E Evaluation Package, http://www.ait.ac.at/uploads/media/Broschuere S3E EN V4.1 Web 02.pdf

[8] Martin Humenberger, Christian Zinner, Michael Weber, Wilfried Kubinger, Markus Vincze, A fast stereo matching algorithm suitable for embedded real-time systems, Computer Vision and Image Understanding, vol. 114, 11/2010, pp. 1180-1202.

[9] Urban Mill - Building IntenCity, http://urbanmill.org/

[10] IDS Imaging Development Systems GmbH, http://en.ids-imaging.com/

[11] RaspberryPI, http://www.raspberrypi.org/

[12] Websocket standard, http://www.websocket.org/

[13] RDnet company, http://www.rdnet.fi/

[14] Xtrust company, http://www.xtrust.net/
