Robust Real-Time Lane and Road Detection in Critical Shadow Conditions*

Alberto Broggi
Dipartimento di Ingegneria dell'Informazione
Università di Parma
Parma, ITALY, I-43100

* This work was partially supported by the Italian CNR under the framework of the Eureka PROMETHEUS Project.

Abstract

This paper presents the vision-based road detection system currently installed on the MOB-LAB land vehicle. Based on a geometrical transform and on a fast morphological processing, the system is capable of detecting road markings even in extremely severe shadow conditions on flat and structured roads. The use of a special-purpose massively parallel architecture (PAPRICA) allows a processing rate of about 17 Hz.

1 Introduction

Many different vision-based road detection systems have been developed worldwide, each relying on different characteristics, such as different road models (2D or 3D), acquisition devices (color or monochrome camera), hardware systems (special- or general-purpose, serial or parallel), and computational techniques (template matching, neural networks, mono or stereo vision, ...).

The SCARF system [7] (tested on the NAVLAB vehicle at Carnegie Mellon University) uses two color cameras for a color-based image segmentation; the different regions are classified, and a Hough-like transform is used to vote for different binary road models. Although the image resolution is reduced from 480x512 to 60x64 pixels, a high-performing system (a 10-cell Warp [9]) has been chosen to speed up the processing. SCARF, capable of detecting even unstructured roads in slowly varying illumination conditions, reaches a processing rate of 1/3 Hz [8].

Also the VITS system (tested on the ALV vehicle, Martin Marietta) relies on two color cameras. It uses a combination of the red and blue color bands to reduce the artifacts caused by shadows; information on vehicle motion is also used to aid the segmentation process. Tested successfully on straight, single-lane roads, it runs faster than SCARF, sacrificing general capability for speed [24].

ALVINN [21] (tested on NAVLAB, CMU) is a neural-network-based 30x32 video retina designed to detect unstructured roads like SCARF, but it does not have any road model: it learns associations between visual patterns and steering wheel angles, without reasoning about the road location. It has been implemented on the Warp system as well, reaching a processing rate of about 10 Hz [22].

A different neural approach has been developed at CMU and tested on NAVLAB, too: a 256x256 color image is segmented on a 16k-processor MasPar MP-2 [19]. A trapezoidal road model is used together with the assumption of a constant-width road. A reduced version (128x128) runs at a rate of 2.5 Hz [14].

Due to the high amount of data (2 color images) and to the complex operations involved (clustering, Hough transform, non-linear neural functions), these systems have been implemented on extremely powerful hardware engines. Nevertheless, a number of different methods have been considered to speed up the processing. As an example, in VaMoRs (Universität der Bundeswehr, München) monochrome images are processed by custom hardware, focusing on the regions of interest only [11]. These windowing techniques are supported by strong road and vehicle models¹ to predict features in incoming images [10]. The use of a single monochrome camera together with these simple road models allows a fast processing, but unfortunately this approach is not successful in shadow conditions or when road imperfections are found [18].

¹ In this case, the vehicle was driven at high speeds (up to 100 kph) on German highways, which have constant-width lanes and where the road has specific shapes: straight, constant curvature, or clothoid.

Also the LANELOK system (General Motors) [15] relies on strong models: it estimates the location of lane boundaries with a curve-fitting method [17]. Unfortunately, the technique used to correct the shadow artifacts [16] relies on fixed brightness thresholds, which is far from being a robust and general approach.

Conversely, this paper presents a low-cost system capable of reaching real-time performance in the detection of structured roads (with painted lane markings), robust enough to tolerate critical shadow conditions. The limitation to the analysis of structured environments allows the use of simple road models, just as the processing of monocular monochrome images on special-purpose hardware allows high performance at a low cost. The system has been tested on the MOB-LAB land vehicle [1], taking advantage of the critical analysis of previous approaches [3, 4].

The following section motivates the approach used, while Sect. 3 details its theoretical basis; Sects. 4 and 5 present the low- and medium-level processing for road marking detection; Sect. 6 analyzes the results and discusses the current evolution; finally, Sect. 7 ends the paper with some concluding remarks.


2 The Underlying Approach

Due to its intrinsic nature, low-level image processing is efficiently performed on SIMD systems by means of a massively parallel computational paradigm. However, this approach is only meaningful in the case of generic filterings (such as noise reduction, edge enhancement, ...), which consider the image as a mere collection of pixels, independently of their semantic content. On the other hand, the implementation of more sophisticated filters requires some semantic knowledge. As an example, let us consider the specific problem of road marking detection. Due to the perspective effect induced by the acquisition conditions, the road marking width changes according to the distance from the camera. Thus, the correct detection of road markings should be based on matching with different-sized patterns, according to the specific position within the image. Unfortunately, this differentiated low-level processing cannot be efficiently performed on SIMD massively parallel systems, which by definition perform the same elaboration on each pixel of the image.

In fact, the perspective effect associates different meanings to different image pixels, depending on their position in the image. Conversely, after the removal of the perspective effect, each pixel represents the same portion of the road², allowing a homogeneous distribution of the information among all the image pixels; now the size and shape of the matching template can be independent of the pixel position.

² A pixel in the lower part of the original images of fig. 5 represents a few cm² of the road, while a pixel in the middle of the same images represents a few tens of cm², or even more.

To remove the perspective effect it is necessary to know the specific acquisition conditions (camera position, orientation, optics, ...) and the scene represented in the image (the road, which is now assumed to be flat), which constitute the a-priori knowledge. The processing can be conveniently divided into two steps: the first, exploiting the a-priori knowledge, is a transform (a non-uniform resampling similar to what happens in the human visual system [25, 26]) that generates an image in a new domain where the detection of the features of interest is extremely simplified; the second, exploiting the sensorial data, consists of a mere low-level morphological processing. In this way it is possible:

- to detect the road markings through an extremely simple and fast morphological processing;
- to overcome completely the annoying problems caused by a non-uniform illumination (shadows);
- to implement efficiently the detection step on massively parallel SIMD architectures, in order to obtain real-time performance.

3 Removing the Perspective Effect

The procedure aimed at removing the perspective effect reads the incoming image and resamples it, remapping each pixel toward a different position and producing a new two-dimensional array of pixels. The resulting image represents a top view of the road region in front of the vehicle, as if it were observed from above.

Two Euclidean spaces are defined:

- W = {(x, y, z)} ∈ E³, representing the 3D world space (world-coordinate system), where the real world is defined;
- I = {(u, v)} ∈ E², representing the 2D image space (screen-coordinate system), where the 3D scene is projected.

The image acquired by the camera belongs to the I space, while the reorganized image is defined as the z = 0 plane of the W space (according to the assumption of a flat road). Fig. 1 shows the relationships between the two spaces W and I. The reorganization process projects the acquired image onto the z = 0 plane of the 3D world space W, acting as the dual of a ray-tracing algorithm [20].

[Figure 1: The relationship between the two coordinate systems: the camera, the I space (with axes u, v and a point Q), and the z = 0 plane of the W space (with axes x, y, z and the corresponding point P).]

3.1 Mapping the W space to the I space

In order to generate a 2D view of a 3D scene, the following parameters must be specified [20]:

- viewpoint: the camera is placed in C = (l, d, h) ∈ W;
- viewing direction: the optical axis o is determined by the following angles:
  - γ: the angle formed by the projection (defined by versor η) of the optical axis o on the plane z = 0 and the x axis (as shown in fig. 2.a);
  - θ: the angle formed by the optical axis o and versor η (as shown in fig. 2.b);
- aperture: the camera angular aperture is 2α;
- resolution: the camera resolution is n x n pixels.


[Figure 2: (a) the xy plane in the W space and (b) the zη plane, assuming the origin translated onto the projection Cxy of C on z = 0; the panels show the camera position (l, d, h), the angles γ and θ, and the optical axis o.]

After simple algebraic and trigonometric manipulations [2], the final mapping f : I → W as a function of u and v is given by:

  x(u, v) = h · cot[(θ − α) + u · 2α/(n−1)] · cos[(γ − α) + v · 2α/(n−1)] + l
  y(u, v) = h · cot[(θ − α) + u · 2α/(n−1)] · sin[(γ − α) + v · 2α/(n−1)] + d
  z = 0                                                                     (1)

with u, v = 0, 1, ..., n − 1.

Given the coordinates (u, v) of a generic point Q in the I space, equations (1) return the coordinates (x, y, 0) of the corresponding point P in the W space (see fig. 1). As an example, fig. 3.a shows a synthetic computer-generated texture representing the z = 0 plane of the W space. Fig. 3.b presents the result of the application of the ray-tracing-like algorithm defined by equations (1), using the set of real parameters [23] of the camera installed on MOB-LAB and n = 512. This transform causes an information loss, higher in the central region of fig. 3.b (corresponding to the top of fig. 3.a) and lower in the peripheral region of the same image (corresponding to the bottom of fig. 3.a).

3.1.1 Mapping the I space to the W space

The inverse transform g : W → I (the dual mapping) is given as follows [2]:

  u(x, y, 0) = [θ̄(x, y, 0) − (θ − α)] / (2α/(n−1))
  v(x, y, 0) = [γ̄(x, y, 0) − (γ − α)] / (2α/(n−1))                          (2)

where θ̄(x, y, 0) = arctan(h / √((x−l)² + (y−d)²)) is the angle under which the point (x, y, 0) is seen below the horizon, and γ̄(x, y, 0) = arctan((y−d)/(x−l)) is its azimuth with respect to the x axis.
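To make the transform concrete, the following is a minimal NumPy sketch of the reorganization defined by equations (2), under the flat-road assumption. The parameter names (l, d, h, theta, gamma, alpha) follow the paper; the function name, the sampled road region, and the nearest-neighbour sampling are illustrative choices, not part of the original PAPRICA implementation.

```python
import numpy as np

def reorganize(image, l, d, h, theta, gamma, alpha, n=128,
               x_range=(0.0, 40.0), y_range=(-10.0, 10.0)):
    """Remap a camera frame (the I space) onto the z = 0 plane of W
    (inverse perspective mapping), following equations (2).

    image  : square grayscale frame, camera resolution n_src x n_src
    l,d,h  : camera position C = (l, d, h) in W
    theta  : angle between the optical axis o and its projection (versor eta)
    gamma  : angle between that projection and the x axis
    alpha  : half of the angular aperture (aperture = 2*alpha)
    n      : resolution of the reorganized (top-view) image
    x_range, y_range : road region to recover, in metres (illustrative)
    """
    n_src = image.shape[0]
    # World points (x, y, 0) forming the reorganized image.
    x = np.linspace(x_range[0], x_range[1], n).reshape(-1, 1)
    y = np.linspace(y_range[0], y_range[1], n).reshape(1, -1)

    # Angles under which each ground point is seen from the camera.
    rho = np.hypot(x - l, y - d)          # horizontal distance from C
    theta_bar = np.arctan2(h, rho)        # depression below the horizon
    gamma_bar = np.arctan2(y - d, x - l)  # azimuth w.r.t. the x axis

    # Equations (2): angles back to pixel coordinates in the I space.
    step = 2.0 * alpha / (n_src - 1)
    u = np.rint((theta_bar - (theta - alpha)) / step).astype(int)
    v = np.rint((gamma_bar - (gamma - alpha)) / step).astype(int)

    # Nearest-neighbour sampling; world points whose ray falls outside
    # the frame stay 0 (the undefined region mentioned in the text).
    out = np.zeros((n, n), dtype=image.dtype)
    ok = (u >= 0) & (u < n_src) & (v >= 0) & (v < n_src)
    out[ok] = image[u[ok], v[ok]]
    return out
```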

The reorganization process defined by equations (2) removes the perspective effect and recovers the texture of the z = 0 plane of the W space. It consists of scanning the array of pixels of coordinates (x, y, 0) ∈ W which form the reorganized image, in order to associate to each of them the corresponding value assumed by the point of coordinates (u(x, y, 0), v(x, y, 0)) ∈ I. Fig. 3.c shows the reorganization process (defined by equations (2)) applied to fig. 3.b. In this case the chosen resolution (n = 128) is determined as a good trade-off between information loss and processing time. Note that, as shown in fig. 3.c, the lower portion of the reorganized image is undefined: this is due both to the specific camera position with respect to the z axis, and to the camera angular aperture.

As an example, fig. 3.e shows the effect of the reorganization procedure applied to the original frame shown in fig. 3.d: it is clearly visible that in this case the road marking width is almost invariant within the whole image.

4 The Low-Level Processing

4.1 Identifying road markings

The assumption used in the definition of a "road marking" is the following: a road marking in the z = 0 plane of the W space (i.e. in the reorganized image) is represented by a quasi-vertical bright line of constant width surrounded by a dark region (the road). Thus the pixels belonging to a road marking have a brightness value higher than their left and right neighbors, and the detection reduces to the determination of horizontal black-white-black transitions. For each image line x = 0, 1, ..., n − 1, every pixel P = (x, y, 0) compares its brightness value b(x, y, 0) with those of its left and right neighbors at distance m, b(x, y − m, 0) and b(x, y + m, 0), with m ≥ 1. A new pixel value r(x, y, 0) is computed according to the following rule:

  r(x, y, 0) = [b(x, y, 0) − b(x, y − m, 0)] + [b(x, y, 0) − b(x, y + m, 0)],
               if b(x, y, 0) ≥ b(x, y − m, 0) and b(x, y, 0) ≥ b(x, y + m, 0);
  r(x, y, 0) = 0 otherwise.                                                 (3)

According to equation (3),

  r(x, y, 0) ≠ 0  ⟹  r(x, y − m, 0) = 0 and r(x, y + m, 0) = 0.             (4)

The choice of m depends on the road marking width, which is assumed to be in a known range. Considering m = 2, the resulting image³ is shown in fig. 4.a.

³ The rule described by equation (3) can be expressed in a more compact and simple way thanks to grey-tone mathematical morphology notations [12]: the new pixel value is r(x, y, 0) = b(x, y, 0) − d(x, y, 0) if b(x, y, 0) ≥ d(x, y, 0), and 0 otherwise, where d(x, y, 0) is the grey-tone dilation [12] of the original brightness image by the binary structuring element formed by the two horizontal neighbors at distance m.
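Equation (3) vectorizes directly; here is a sketch in the same NumPy setting, where rows index x and columns index y as in the paper. The border handling (columns closer than m to the edge are zeroed) is an illustrative choice.

```python
import numpy as np

def marking_filter(img, m=2):
    """Equation (3): a pixel survives only if it is at least as bright as
    both horizontal neighbours at distance m; its new value is the sum of
    the two brightness differences, otherwise 0."""
    b = img.astype(np.int32)
    left = np.zeros_like(b)    # b(x, y - m, 0)
    right = np.zeros_like(b)   # b(x, y + m, 0)
    left[:, m:] = b[:, :-m]
    right[:, :-m] = b[:, m:]
    peak = (b >= left) & (b >= right)
    r = np.where(peak, (b - left) + (b - right), 0)
    r[:, :m] = 0               # neighbours undefined near the borders
    r[:, -m:] = 0
    return r
```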


[Figure 3: a) the model image; b) the projection of the previous model on the I space; c) the result of the reorganization procedure applied to the previous image; d) the original image; e) the reorganized image.]

4.2 Image enhancement and binarization

Due to different illumination conditions (e.g. in the presence of shadows), the road markings may have different brightness values, yet they maintain their superiority relationship with their horizontal neighbors. Thus a simple threshold seldom gives a satisfactory binarization, and consequently an image enhancement is required, as well as an adaptive binarization. Exploiting the property expressed by equation (4), the left and right neighbors of a road marking line assume a zero value in the filtered image. Thus the execution of a few iterations (say h) of the following rule⁴

  e^(i+1)(x, y, 0) = max( e^(i)(x+1, y, 0), e^(i)(x−1, y, 0), e^(i)(x, y+1, 0), e^(i)(x, y−1, 0) ),
                     if e^(i)(x, y, 0) ≠ 0;
  e^(i+1)(x, y, 0) = 0 otherwise,                                           (5)

starting from e^(0)(x, y, 0) = r(x, y, 0), generates an enhanced image, as shown in fig. 4.b (with h = 8). The binarization is performed by means of an adaptive threshold:

  t(x, y, 0) = 1 if e^(h)(x, y, 0) ≥ m(x, y, 0) / k, and 0 otherwise,       (6)

where m(x, y, 0) is the maximum value computed in a given c x c neighborhood, and k is a constant. The result of the binarization of fig. 4.b, considering k = 2 and c = 11, is presented in fig. 4.c.

Fig. 4.d shows the representation in the I space of the binarized result of fig. 4.c, while fig. 4.e presents its final superimposition onto the original image. The high quantization visible in the lower region of these images is the effect of the choice of a medium resolution (128 x 128) for the reorganized image.

⁴ Note that equation (5) can be efficiently expressed as a dilation with the binary structuring element formed by the four direct neighbors, contextualized (see [13]) to the state of the central pixel.
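A sketch of the enhancement and binarization of equations (5) and (6). SciPy's maximum_filter stands in for the c x c neighbourhood maximum; the explicit guard on zero pixels in binarize is an added assumption, since a literal reading of equation (6) would mark empty regions whose neighbourhood maximum is itself zero.

```python
import numpy as np
from scipy.ndimage import maximum_filter  # c x c neighbourhood maximum

def enhance(r, h=8):
    """Equation (5): h iterations of a 4-neighbour maximum, applied only
    where the filtered image is non-zero (zero pixels stay zero).
    np.roll wraps at the borders; a real implementation would pad."""
    e = r.copy()
    for _ in range(h):
        nmax = np.maximum.reduce([
            np.roll(e, 1, axis=0), np.roll(e, -1, axis=0),   # x +/- 1
            np.roll(e, 1, axis=1), np.roll(e, -1, axis=1),   # y +/- 1
        ])
        e = np.where(e != 0, nmax, 0)
    return e

def binarize(e, c=11, k=2):
    """Equation (6): adaptive threshold against the maximum value m(x,y,0)
    over a c x c neighbourhood, divided by the constant k."""
    m = maximum_filter(e, size=c)
    return ((e > 0) & (e >= m / k)).astype(np.uint8)
```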

5 The Medium-Level Processing

The medium-level processing is reduced to the determination of the best concatenations of the pixels representing road markings. This is done by means of a serial scanning of the image: first a histogram is computed in the lower region of the binary image, and a threshold is applied; then, starting from the over-threshold positions, a neighborhood search is performed, keeping track of the search direction. Where no black neighbors are found (namely, a gap in the road boundary is met), the search is continued in the previously determined direction, according to the best-continuation Gestalt principle (see the sketch below). The result is then compared to the position of the road markings in normal conditions (vehicle in the center of the right lane), and the difference is used to warn the driver in dangerous situations.

Due to the high effectiveness of the low-level processing and to the high correlation between two subsequent frames of a sequence (thanks to the fast processing), the medium-level step is extremely fast; it is performed in pipeline by a sequential architecture during the low-level processing of the following frame. This system is currently integrated on the MOB-LAB land vehicle, and has been proven to be effective in a number of different road conditions, running at about 50 kph on very narrow rural roads.

6 Performance Analysis and Current Evolution

The discussed algorithm has been integrated on the PAPRICA [5, 6] massively parallel architecture, featuring a hardware extension for the efficient removal of the perspective effect (based on a look-up table). Table 1 presents the performance of the current implementation: the complete acquisition and processing of a single frame takes less than 55 ms, thus allowing the processing of about 17 frames per second.

Besides being easily implementable on any SIMD massively parallel processor based on a morphological computational paradigm, the presented approach is extremely robust with respect to the noise caused by sunny blobs on shaded roads. Fig. 5 presents a few results of the processing of images acquired under different conditions⁵.

⁵ A couple of sequences in MPEG format (200 and 1000 frames respectively) are available at http://WWW.CE.UniPR.IT/computer vision/applications.html, showing lane detection in particularly challenging conditions. It is possible to note that the vehicle's pitch does not disturb the processing, even though the reorganized image represents a very large road area in front of the vehicle (up to 40 meters).
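Returning to the medium-level step of Sect. 5, the search can be sketched roughly as follows; the histogram threshold, the search window, and the gap tolerance are hypothetical constants, and the merging of chains seeded by adjacent columns is omitted.

```python
import numpy as np

def follow_markings(t, hist_rows=20, hist_min=3, max_gap=4):
    """Trace road-marking chains bottom-up in the binary image t:
    a column histogram over the lowest rows seeds the search; each chain
    grows one row at a time toward the nearest set pixel and, across gaps,
    continues along the previous direction (best-continuation principle)."""
    n_rows, n_cols = t.shape
    hist = t[-hist_rows:].sum(axis=0)           # histogram of the lower region
    seeds = np.flatnonzero(hist >= hist_min)    # over-threshold columns

    chains = []
    for col in seeds:
        chain = [(n_rows - 1, int(col))]
        row, c, direction, gap = n_rows - 1, float(col), 0.0, 0
        while row > 0 and gap <= max_gap:
            row -= 1
            pred = int(round(c + direction))    # predicted column
            lo, hi = max(pred - 2, 0), min(pred + 3, n_cols)
            hits = np.flatnonzero(t[row, lo:hi])
            if hits.size:                       # a neighbour was found
                new_c = lo + hits[np.argmin(np.abs(hits + lo - pred))]
                direction, c, gap = new_c - c, float(new_c), 0
                chain.append((row, int(new_c)))
            else:                               # gap: keep the old direction
                c, gap = c + direction, gap + 1
        chains.append(chain)
    return chains
```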


[Figure 4: a) the filtered image; b) the enhanced image; c) the binarized image; d) the projection of the previous result on the I space; e) the superimposition of the previous image onto the original one.]

[Figure 5: Top: original images acquired from the MOB-LAB vehicle; Bottom: the superimposition of the results of road marking detection onto brighter versions of the original images (images taken during the demonstration of the MOB-LAB vehicle at the final meeting of the PROMETHEUS project, Mortefontaine track, Paris, Oct. 94).]

  Operation                                 Time
  ----------------------------------------  -----------
  Image acquisition (512 x 256)             20 ms
  Perspective effect removal (128 x 128)    2.8 ms
  Low-level processing (128 x 128)          30 ms
  Medium-level processing (128 x 128)       in pipeline
  Warnings to the driver                    negligible

  Table 1: Timings on the PAPRICA system

Unfortunately, this approach relies on the implicit assumption of visible road markings, namely when no obstacles are on the path. The worst case is represented by a white vehicle on the right lane, since it induces black-white-black transitions in the reorganized image that are erroneously identified as road markings. This problem can be solved by the application of the mentioned approach to a couple of stereo images (see fig. 6). The independent detection of the road markings on both images would be disturbed by the presence of vehicles. Conversely, the fusion of the two results (obtained with a simple logical intersection, thanks to the calibration of the two image reorganization processes) improves the detection and makes it more reliable (see fig. 6.g). In this case the problems of stereo vision are easily solved by the low-level portion of the processing, thus reaching a high computational efficiency.

Moreover, since on a flat road the two reorganized images are identical, as soon as an obstacle is approached the two reorganized images differ. The difference image can be used for obstacle detection (see fig. 6.i).
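Reusing the illustrative functions from the earlier sketches, the stereo fusion and the obstacle cue reduce to a few lines (both frames are assumed to have been reorganized, each with its own calibration, onto the same road grid):

```python
import numpy as np

def stereo_markings(top_l, top_r, m=2, h=8, c=11, k=2):
    """Fuse the two reorganized (top-view) images: the logical intersection
    of the two binary results keeps only markings seen in both views
    (fig. 6.g), while the absolute difference of the top views is an
    obstacle cue (fig. 6.i), since on a flat road the two coincide."""
    bin_l = binarize(enhance(marking_filter(top_l, m), h), c, k)
    bin_r = binarize(enhance(marking_filter(top_r, m), h), c, k)
    markings = bin_l & bin_r
    obstacle_cue = np.abs(top_l.astype(np.int32) - top_r.astype(np.int32))
    return markings, obstacle_cue
```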

7 Conclusion

This paper presented an approach to real-time road detection, working on flat roads with painted road markings. It has been demonstrated to be robust with respect to extremely critical shadow conditions and global illumination changes; an extension to obstacle detection is now under evaluation. The approach has been implemented on the special-purpose, low-cost massively parallel system PAPRICA and integrated onto the MOB-LAB land vehicle, reaching a processing rate of about 17 Hz.


[Figure 6: a) left image; b) right image; c) left reorganized image; d) right reorganized image; e) left binarized image; f) right binarized image; g) logical intersection between e and f; h) superimposition of the result onto the original image; i) difference between images c and d, used for obstacle detection.]

References

[1] G. Adorni, A. Broggi, V. D'Andrea, and G. Conte. Real-Time Image Processing for Automotive Applications. In P. A. Laplante and A. D. Stoyenko, eds., Real-Time Image Processing: Theory, Techniques and Applications. IEEE Press, 1996. In press.
[2] A. Broggi. An Image Reorganization Procedure for Automotive Road Following Systems. In Proc. 2nd IEEE Intl. Conf. on Image Processing, 1995. In press.
[3] A. Broggi. Parallel and Local Feature Extraction: a Real-Time Approach to Road Boundary Detection. IEEE Trans. on Image Processing, 4(2):217-223, February 1995.
[4] A. Broggi and S. Bertè. A Morphological Model-Driven Approach to Real-Time Road Boundary Detection for Vision-Based Automotive Systems. In Proc. 2nd IEEE Workshop on Applications of Computer Vision, pages 73-90, 1994.
[5] A. Broggi, G. Conte, F. Gregoretti, C. Sansoè, and L. M. Reyneri. The PAPRICA Massively Parallel Processor. In Proc. IEEE Intl. Conf. on Massively Parallel Computing Systems, pages 16-30, 1994.
[6] A. Broggi, G. Conte, F. Gregoretti, C. Sansoè, and L. M. Reyneri. The Evolution of the PAPRICA System. Integrated Computer-Aided Engineering Journal - Special Issue on Massively Parallel Computing, 1995. In press.
[7] J. Crisman and C. Thorpe. Color Vision for Road Following. In C. E. Thorpe, editor, Vision and Navigation: The Carnegie Mellon Navlab, pages 9-24. Kluwer Academic Publishers, 1990.
[8] J. Crisman and C. Thorpe. SCARF: A Color Vision System that Tracks Roads and Intersections. IEEE Trans. on Robotics and Automation, 9(1):49-58, February 1993.
[9] J. D. Crisman and J. A. Webb. The Warp Machine on Navlab. IEEE Trans. on PAMI, 13(5):451-465, May 1991.
[10] E. D. Dickmanns and B. D. Mysliwetz. Recursive 3-D Road and Relative Ego-State Recognition. IEEE Trans. on PAMI, 14:199-213, May 1992.
[11] V. Graefe and K.-D. Kuhnert. Vision-based Autonomous Road Vehicles. In I. Masaki, editor, Vision-based Vehicle Guidance, pages 1-29. Springer Verlag, 1991.
[12] R. M. Haralick, S. R. Sternberg, and X. Zhuang. Image Analysis Using Mathematical Morphology. IEEE Trans. on PAMI, 9(4):532-550, 1987.

[13] W. D. Hillis. The Connection Machine. MIT Press, Cambridge, MA, 1985.
[14] T. M. Jochem and S. Baluja. A Massively Parallel Road Follower. In M. A. Bayoumi, L. S. Davis, and K. P. Valavanis, editors, Proc. Computer Architectures for Machine Perception, pages 2-12, 1993.
[15] S. K. Kenue. LANELOK: An Algorithm for Extending the Lane Sensing Operating Range to 100 Feet. In Proc. of SPIE - Mobile Robots V, vol. 1388, pages 222-233, 1991.
[16] S. K. Kenue. Correction of Shadow Artifacts for Vision-based Vehicle Guidance. In Proc. of SPIE - Mobile Robots VIII, vol. 2058, pages 12-26, 1994.
[17] S. K. Kenue and S. Bajpayee. LANELOK: Robust Line and Curvature Fitting of Lane Boundaries. In Proc. of SPIE - Mobile Robots VII, vol. 1831, pages 491-503, 1993.
[18] K. Kluge and C. E. Thorpe. Explicit Models for Robot Road Following. In C. E. Thorpe, editor, Vision and Navigation: The Carnegie Mellon Navlab, pages 25-38. Kluwer Academic Publishers, 1990.
[19] MasPar Computer Corporation, Sunnyvale, California. MP-1 Family Data-Parallel Computers, 1990.
[20] W. M. Newman and R. F. Sproull. Principles of Interactive Computer Graphics. McGraw-Hill, Tokyo, 1981.
[21] D. A. Pomerleau. Neural Network Based Autonomous Navigation. In C. E. Thorpe, editor, Vision and Navigation: The Carnegie Mellon Navlab, pages 83-93. Kluwer Academic Publishers, 1990.
[22] D. A. Pomerleau. Neural Network Perception for Mobile Robot Guidance. Kluwer Academic Publishers, 1993.
[23] R. Tsai. An Efficient and Accurate Camera Calibration Technique for 3D Machine Vision. In Proc. IEEE Intl. Conf. on CVPR, pages 364-374, Miami Beach, FL, 1986.
[24] M. A. Turk, D. G. Morgenthaler, K. D. Gremban, and M. Marra. VITS - A Vision System for Autonomous Land Vehicle Navigation. IEEE Trans. on PAMI, 10(3), 1988.
[25] J. M. Wolfe and K. R. Cave. Deploying Visual Attention: the Guided Search Model. In A. Blake and T. Troscianko, editors, AI and the Eye, pages 79-103, 1990.
[26] B. Zavidovique and P. Fiorini. A Control View to Vision Architectures. In V. Cantoni, editor, Human and Machine Vision: Analogies and Divergencies, pages 13-56. Plenum Press, 1994.