Research Article

A Decision-Making Model for Autonomous Vehicles at Urban Intersections Based on Conflict Resolution

Zi-jia Wang,1 Xue-mei Chen,1,2 Pin Wang,3 Meng-xi Li,1 Yang-jia-xin Ou,1 and Han Zhang4

1Beijing Institute of Technology, School of Mechanical Engineering, Intelligent Vehicle Research Institute, 5 South Zhong Guan Cun Street, Haidian District, Beijing 100081, China
2Advanced Technology Research Institute, Beijing Institute of Technology, Jinan 250001, Shandong, China
3University of California, Berkeley, 1357 South 46th Street, Richmond, CA 94804, USA
4Shandong Hi-Speed Construction Management Group Co., Ltd., Jinan 250001, Shandong, China

Correspondence should be addressed to Xue-mei Chen; chenxue781@126.com

Received 26 March 2020; Revised 6 November 2020; Accepted 19 November 2020; Published 13 February 2021

Academic Editor: Wenqing Wu

Copyright © 2021 Zi-jia Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The decision-making models that are able to deal with complex and dynamic urban intersections are critical for the development of autonomous vehicles. A key challenge in operating autonomous vehicles robustly is to accurately detect the trajectories of other participants and to consider safety and efficiency concurrently in the interactions between vehicles. In this work, we propose an approach for developing a tactical decision-making model for vehicles which is capable of predicting the trajectories of incoming vehicles and employs the conflict resolution theory to model vehicle interactions. The proposed algorithm can help autonomous vehicles cross intersections safely. Firstly, Gaussian process regression models were trained with the data collected at intersections using subgrade sensors and a retrofit autonomous vehicle to predict the trajectories of incoming vehicles. Then, we proposed a multiobjective optimization problem (MOP) decision-making model based on efficient conflict resolution theory at intersections. After that, a nondominated sorting genetic algorithm (NSGA-II) and deep deterministic policy gradient (DDPG) are employed to select the optimal motions, and the two are compared with each other. Finally, a simulation and verification platform was built based on Matlab/Simulink and PreScan. The reliability and effectiveness of the tactical decision-making model were verified by simulations. The results indicate that DDPG is more reliable and effective than NSGA-II in solving the MOP model, which provides a theoretical basis for the in-depth study of decision-making in a complex and uncertain intersection environment.

1. Introduction

Today's driving-assistance systems have made traffic more efficient and safer and show considerable improvements towards the availability of autonomous driving. To develop the next generation of driver assistance systems or even self-driving systems, algorithms that are capable of handling complex situations are required. Many researchers have proposed approaches for perception [1], path planning [2], and control [3]. However, the decision-making of autonomous driving at intersections is still one of the major bottlenecks. The primary reason for the difficulty in analyzing crossing behavior is that most models may only work when given long-term accurate predictions of the trajectories of other participants. To address this problem, this paper will focus on developing a tactical decision-making model for autonomous vehicles in intersection crossing scenarios.

The problems of robust tactical decision-making for autonomous vehicles in a complex and dynamic urban environment have been investigated quite extensively by many organizations and researchers, such as Google [4], Carnegie Mellon University [5], Berkeley [6], and Baidu [7]. The UCB utilized a minimal future distance and a two-level dynamic threshold to perform collision prediction tasks at urban intersections [8]. BMW and the University of Munich came up with a decision-making model based on partially observable Markov decision processes [9]. NVIDIA used a deep convolutional neural network (DCNN) to establish an end-to-end driving model [10].

Hindawi, Journal of Advanced Transportation, Volume 2021, Article ID 8894563, 12 pages, https://doi.org/10.1155/2021/8894563

In recent years, more and more researchers have begun studying decision-making behavior. Chen [11] established a vehicle decision model in an urban environment using a hierarchical finite state machine method for different drivers and road environment characteristics. Liu et al. [12] adopted the control prediction theory and the reinforcement learning theory to obtain a decision model. However, these models cannot be adapted to urban intersections. Ma et al. [13] proposed a decision-making framework titled "Plan-Decision-Action" for autonomous vehicles at complex urban intersections. Zhong et al. [14] proposed a model-learning-based actor-critic algorithm with the Gaussian process approximator to solve problems with continuous state and action spaces. Xiong et al. [15] used a hidden Markov model to predict other vehicles' intentions and built a decision-making model for vehicles at intersections. Lv et al. [16] combined offline and online machine learning methods to establish a personalized decision model that could simulate the characteristics of driver behavior. Chen et al. [17] used the rough-set theory to extract different drivers' decision rules. Chen et al. [18] used a novel RSAN (rough-set artificial neural network) method to learn decisions made by excellent human drivers. Chen et al. [19] proposed a merging strategy based on the least squares policy iteration (LSPI) algorithm and selected a basis function that included the reciprocal of TTC, relative distance, and relative velocity to represent the state space and discretize the action space. However, these studies did not take the overall interaction scenarios into consideration and can only be adopted for short-term trajectory prediction.

This paper focuses on the decision-making process of autonomous vehicles in an urban environment and develops a vehicle trajectory prediction model based on Gaussian process regression (GPR) [20], which can generate long-term predictions of incoming vehicles. The problem of conflict resolution among vehicles at intersections is modeled as a multiobjective optimization problem (MOP), in which the acceleration, as the only decision variable, is used to control the vehicles. The main contribution of this work is the presentation of two solutions to the intersection multiobjective optimization problem. The first applies the nondominated sorting genetic algorithm (NSGA-II) to maximize the overall driving benefit of the system; the second uses the deep deterministic policy gradient (DDPG) algorithm of reinforcement learning with continuous actions. Because the deterministic policy gradient is the expected gradient of the action-value function, it can be estimated much more stably than the usual stochastic policy gradient. A simulation and verification platform was built based on Matlab/Simulink and PreScan to validate the results, and the proposed MOP decision-making method and calculation algorithms were verified in several typical scenarios.

The remainder of this paper is organized as follows. Section 2 elaborates upon the methodology used in this study, which includes an introduction of Gaussian process regression, the nondominated sorting genetic algorithm (NSGA-II), and the deep deterministic policy gradient algorithm of reinforcement learning. Section 3 describes data acquisition and data processing. Section 4 proposes the GPR models for trajectory prediction and the MOP decision-making model based on efficient conflict resolution at intersections, which is solved by NSGA-II and DDPG. The simulation verification platform used to evaluate the effectiveness and reliability of the proposed model and to compare the performance of the two algorithms is introduced in Section 5. In Section 6, conclusions and future work are presented.

2. Methodology

2.1. Gaussian Process Regression Model. Gaussian process regression (GPR) is a statistical method that can make full use of raw data by considering its temporal trends and periodic changes to establish a suitable predictive model. This model has been used to predict the trajectories of vehicles and has been proven to be efficient. Compared with LSTM, its main advantage is that it is more robust when dealing with noisy data, making it more suitable for urban intersections.

The log likelihood function of the sample data is shown as follows:

$$\log p(y \mid X) = -\frac{1}{2} y^{T} C^{-1} y - \frac{1}{2} \log |C| - \frac{N}{2} \log(2\pi). \tag{1}$$

The joint distribution of the training observations $y$ and the model output $y_*$ at the test data is shown as follows:

$$\begin{bmatrix} y \\ y_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} C & K_* \\ K_*^{T} & C(x_*, x_*) \end{bmatrix}\right), \tag{2}$$

where $K_* = [C(x_*, x_1), C(x_*, x_2), \ldots, C(x_*, x_n)]^{T}$ is the covariance matrix between the test data $x_*$ and the training data and $C(x_*, x_*)$ is the covariance matrix of the test data itself.

Therefore, the output of the model can be found with (3). By calculating the mean and variance of the output $y_*$ of the model, the predicted mean $\hat{y}_*$ and predictive confidence $\sigma_*^2$ of the model can be obtained separately:

$$\begin{cases} \hat{y}_* = E(y_*) = K_*^{T} C^{-1} y, \\ \sigma_*^2 = \operatorname{Var}(y_*) = C(x_*, x_*) - K_*^{T} C^{-1} K_*. \end{cases} \tag{3}$$
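As an illustration of equations (1)-(3), the following is a minimal NumPy sketch of GPR prediction. It assumes a squared exponential kernel with fixed, hand-picked hyperparameters (`length_scale`, `signal_var`) and a small noise term `noise_var` added to the diagonal of $C$ for numerical stability; the function names and all numerical values are illustrative assumptions, not the settings used in this study.

```python
import numpy as np

def se_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gpr_predict(x_train, y_train, x_test, noise_var=1e-4):
    """Predictive mean and variance following equation (3)."""
    # C: covariance of the training inputs (noise_var is an assumed jitter term).
    C = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = se_kernel(x_train, x_test)        # shape (n, m)
    C_star = se_kernel(x_test, x_test)         # shape (m, m)
    mean = K_star.T @ np.linalg.solve(C, y_train)
    cov = C_star - K_star.T @ np.linalg.solve(C, K_star)
    return mean, np.diag(cov)

def log_marginal_likelihood(x_train, y_train, noise_var=1e-4):
    """Log likelihood of the training data following equation (1)."""
    C = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    n = len(y_train)
    _, logdet = np.linalg.slogdet(C)
    fit = y_train @ np.linalg.solve(C, y_train)
    return -0.5 * fit - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)
```

In practice the hyperparameters would be chosen by maximizing `log_marginal_likelihood` rather than fixed by hand; far from the training data the predicted mean falls back to zero and the variance to `signal_var`, which is how the predictive confidence signals an unreliable extrapolation.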

2.2. Nondominated Sorting Genetic Algorithm. In 2000, a new nondominated sorting genetic algorithm (NSGA-II) was proposed by Deb et al. on the basis of the NSGA of Srinivas and Deb; it is a theory and method for handling the Pareto optima in multiobjective optimization problems. It is one of the most popular multiobjective genetic algorithms (GAs) for complex system analysis and for discovering diverse solutions. The structure of the algorithm is shown in Figure 1.

The step-by-step procedure shows that the NSGA-II algorithm is simple and straightforward. First, a combined population $R_t = P_t \cup Q_t$ is formed. The population $R_t$ is of size $2N$. Then, the population $R_t$ is sorted according to nondomination. Since all previous and current population members are included in $R_t$, elitism is ensured. Now, solutions belonging to the best nondominated set $F_1$ are the best solutions in the combined population and must be emphasized more than any other solution in the combined population. If the size of $F_1$ is smaller than $N$, we definitely choose all members of the set $F_1$ for the new population $P_{t+1}$. The remaining members of the population $P_{t+1}$ are chosen from subsequent nondominated fronts in the order of their ranking. Thus, solutions from the set $F_2$ are chosen next, followed by solutions from the set $F_3$, and so on. This procedure is continued until no more sets can be accommodated. Say that the set $F_l$ is the last nondominated set beyond which no other set can be accommodated [21]; as Figure 1 indicates, the members of this last set are ranked by crowding distance to fill the remaining slots, and the rest are rejected.
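The survivor selection described above can be sketched in plain Python as follows. This is a simplified illustration rather than the implementation used in this study: it assumes that every objective is minimized (a maximization objective can be negated), it uses a straightforward front-by-front sort instead of the fast nondominated sort, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_fronts(objs):
    """Split the indices of objs into fronts F1, F2, ... by repeated filtering."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Crowding distance of each member of one front (boundary points get inf)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[order[0]][m], objs[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(order) - 1):
            dist[order[k]] += (objs[order[k + 1]][m] - objs[order[k - 1]][m]) / (hi - lo)
    return dist

def select_next_population(objs, n):
    """Pick n survivors from the combined population R_t (given as objective vectors)."""
    survivors = []
    for front in nondominated_fronts(objs):
        if len(survivors) + len(front) <= n:
            survivors.extend(front)              # the whole front fits
        else:
            dist = crowding_distance(objs, front)
            ranked = sorted(front, key=lambda i: dist[i], reverse=True)
            survivors.extend(ranked[: n - len(survivors)])  # last front, truncated
            break
    return survivors
```

For example, with `objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (2, 6)]` the first front consists of indices 0, 1, and 2, so selecting three survivors keeps exactly that front.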

2.3. Deep Deterministic Policy Gradient. The interactive learning process of reinforcement learning is similar to human learning and can be represented as a Markov decision process consisting of $(s, a, P, r)$. In 2013, the DQN [22] algorithm was proposed by DeepMind, opening a new era of deep reinforcement learning. The core improvement of the algorithm is to use experience replay and build a second target network [23], which eliminate the correlation between the training samples and improve the stability of training. Some algorithms evolved from DQN have made great progress in discrete action control problems, but they have difficulty learning continuous control policies. In 2015, DeepMind proposed the DDPG algorithm based on the DPG and DQN algorithms [24], incorporating the normalization mechanism from deep learning [25]. Experiments show that the proposed algorithm performs well on multiple kinds of continuous control problems.

The DDPG algorithm is an improved actor-critic method. In the actor-critic algorithm, the actor function $\pi(s \mid \phi)$ generates an action given the current state. The critic evaluates an action-value function $Q(s, a \mid \theta)$ based on the output from the actor as well as the current state. The TD (temporal-difference) errors produced by the critic drive the learning in the critic network, and then the actor network is updated based on the policy gradient.

The DDPG algorithm combines the advantages of the actor-critic and DQN algorithms so that convergence becomes easier. In other words, DDPG introduces some concepts from DQN, namely, employing a target network and an estimate network for both the actor and the critic. Moreover, the policy of the DDPG algorithm is no longer stochastic but deterministic. This means that the actor network outputs a single real-valued action instead of the probabilities of different actions. The critic network is updated based on

$$L = \frac{1}{N} \sum_{i} \left( Q(s_i, a_i \mid \theta^{Q}) - y_i \right)^2, \tag{4}$$

where $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$ is the Q value estimated by the target networks and $N$ is the minibatch size. The actor network is updated by means of the gradient term

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}, \tag{5}$$

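where $Q(s, a \mid \theta^{Q})$ is from the critic estimate network. Furthermore, the DDPG algorithm solves the continuous action space problem by means of experience replay and asynchronous updating. The updates of the target critic and target actor networks are as follows:

$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau) \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau) \theta^{\mu'}. \tag{6}$$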
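The three update rules can be illustrated without any neural network library. The NumPy sketch below computes the TD targets, the critic loss of equation (4), and the soft target update of equation (6) on plain arrays; the function names, the discount factor `gamma`, and the rate `tau` are illustrative assumptions, and the Q values are passed in as numbers rather than produced by actual networks.

```python
import numpy as np

def td_targets(rewards, next_q, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), with Q' values given in next_q."""
    return np.asarray(rewards, dtype=float) + gamma * np.asarray(next_q, dtype=float)

def critic_loss(q_values, targets):
    """Mean squared TD error over the minibatch, as in equation (4)."""
    diff = np.asarray(q_values, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.mean(diff ** 2))

def soft_update(target_params, online_params, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta', applied to each parameter array."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

With a small `tau`, the target parameters trail the estimate networks slowly, which keeps the regression targets $y_i$ nearly stationary between updates.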

3. Data

The data were collected from the intersections of Wei Gong Cun Road using subgrade sensors and a retrofit autonomous vehicle and served as the training and testing samples of the trajectory prediction model. The details are discussed in the following sections.

3.1. Subgrade Data Acquisition. The camera for subgrade data acquisition was installed on the BIT Science and Technology Building. The vehicles' locations $(x, y, z)$, velocities $(v)$, and accelerations $(a)$ were extracted. The symmetric exponential moving average (SEMA) method [26] was adopted to smooth the training data.
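One common formulation of SEMA weights the neighbors of each sample by $\exp(-|k - i| / \Delta)$ inside a window that shrinks near the ends of the series. The sketch below follows that formulation; the window half-width of $3\Delta$ and the parameter name `delta` (the smoothing width expressed in samples) are assumptions made for illustration, since the exact settings used for the data in this study are not stated here.

```python
import numpy as np

def sema_smooth(values, delta):
    """Symmetric exponential moving average of a 1D series (delta in samples)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    smoothed = np.empty(n)
    for i in range(n):
        # Keep the window symmetric by shrinking it near both ends of the series.
        half_width = min(int(3 * delta), i, n - 1 - i)
        k = np.arange(i - half_width, i + half_width + 1)
        weights = np.exp(-np.abs(k - i) / delta)
        smoothed[i] = np.sum(weights * values[k]) / np.sum(weights)
    return smoothed
```

Because the window is symmetric, a linear trend passes through unchanged while isolated spikes are attenuated, which suits position and velocity traces before they are differentiated.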

3.2. Vehicle Data Acquisition. The vehicle data were collected with a BYD line-controlled (drive-by-wire) autonomous vehicle, which was retrofitted by the BIT Intelligent Vehicle Research Institute. The retrofit autonomous vehicle "Surui" [27] was equipped with several kinds of sensors, as shown in Figure 2(b).

The camera and LIDAR sensor were able to detect, track, and localize dynamic objects. The outputs of the fusion algorithm are the positions of vehicles.

4. Model

4.1. Analysis of Driving Behavior at Intersections. Due to the different driving directions and routes of vehicles at intersections, a collision may occur. As shown in Figure 3(a), a blue unmanned vehicle (UV) may collide with a yellow manned vehicle (MV) that may go straight, turn right, or turn left at the blue areas, which are called conflict zones. Therefore, a decision-making model was established to avoid collisions in these spaces based on vehicles crossing the intersection at different times. This paper focuses only on the impacts of motor vehicles.

Figure 1: Structure of NSGA-II. The combined population $R_t = P_t \cup Q_t$ is split by nondominated sorting into fronts $F_1, F_2, F_3, \ldots, F_n$; crowding distance sorting fills the new population $P_{t+1}$, and the remaining solutions are rejected.

Figure 2: (a) The raw and processed data: vertical position (m), lateral position (m), acceleration (m/s²), and velocity (km/h) versus t (s). (b) The layout of sensors on the BYD retrofit vehicle: 32-line laser radar, binocular camera (left and right), and GPS (front and behind).

Figure 3: Scenes on crossing intersection with no signal. (a) The UV and MV with states $(X_1, Y_1, V_1, \varphi_1)$ and $(X_2, Y_2, V_2, \varphi_2)$. (b) Conflict zones of the UV and MV.

4.2. Research on Vehicle Trajectory Prediction

4.2.1. Feature Motion Parameter Extraction. Vehicle course angle and azimuth were extracted to distinguish whether a vehicle turned or not, because these two parameters change linearly with time when vehicles turn. By utilizing vehicles' motion parameters to recognize driving patterns, incoming vehicles' trajectories were predicted effectively. Real-time acceleration was used to distinguish whether a vehicle kept driving or gave way to incoming vehicles, because vehicles' real-time accelerations for the two patterns are distributed across different ranges.

4.2.2. Trajectory Prediction Model. A trajectory prediction model based on the GPR model was used to predict the trajectories of MVs. The training process of GPR models [28] is shown in Figure 4(a).

In this paper, the data collected from the subgrade sensors were used for training the GPR models and optimizing their hyperparameters. The inputs were $X = (x(t), y(t), v_x(t), v_y(t))$, while $a_x(t)$ was the output. A squared exponential (SE) covariance function was adopted as the kernel function because it can accurately describe the nonlinear relationships between the inputs and outputs. A conjugate gradient optimization algorithm was then adopted to search for optimal parameters. When the error fell below 0.001, the results were regarded as convergent.

After training the prediction model, because this paper paid more attention to straight-driving MVs, the CA (constant acceleration) [29] kinematic formula was utilized to calculate the follow-up trajectories more accurately, as shown in Figure 4(b).
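The combination of an acceleration predictor with the CA kinematic update can be sketched as below. The state ordering $(x, y, v_x, v_y, a_x, a_y)$ and the transition matrix follow the usual CA model; `accel_fn` is a hypothetical stand-in for the trained GPR acceleration models, and the time step `dt` is an assumed value.

```python
import numpy as np

def ca_transition(dt):
    """State transition matrix of the CA model for (x, y, vx, vy, ax, ay)."""
    half = 0.5 * dt**2
    return np.array([
        [1, 0, dt, 0, half, 0],
        [0, 1, 0, dt, 0, half],
        [0, 0, 1, 0, dt, 0],
        [0, 0, 0, 1, 0, dt],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1],
    ], dtype=float)

def rollout(state, accel_fn, dt, steps):
    """Predict a trajectory by alternating acceleration prediction and CA update."""
    F = ca_transition(dt)
    state = np.asarray(state, dtype=float).copy()
    trajectory = [state.copy()]
    for _ in range(steps):
        # accel_fn maps (x, y, vx, vy) to (ax, ay); here it replaces the GPR model.
        state[4:] = accel_fn(state[:4])
        state = F @ state
        trajectory.append(state.copy())
    return np.array(trajectory)
```

For a vehicle starting at rest position 0 with a speed of 10 m/s and a constant predicted acceleration of 1 m/s², ten steps of 0.1 s give a position of 10.5 m and a speed of 11 m/s, which matches the closed-form kinematics.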

4.3. A Decision-Making Model Based on Efficient Conflict Resolution. An appropriate parameter should be selected to analyze the traffic conflict. TTC (time to collision) is a widely used parameter in traffic conflict research, but it is generally used for scenes such as highways and is not suited to evaluating the degree of collision danger at intersections. We use EPET (estimated post-encroachment time) as the safety indicator, which describes the time difference between vehicles passing through the center of the conflict zone and can effectively evaluate the collision danger between vehicles at any angle, as shown in Figure 3(b).

$$\mathrm{EPET} = \left| T_{uv} - T_{mv_i} \right|, \tag{7}$$

where $T_{uv}$ and $T_{mv_i}$ are, respectively, the times when the UV and MV($i$) arrive at the conflict zone. A larger EPET is preferred because it means a smaller risk of collision.
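A small sketch of equation (7) follows. How the arrival times are obtained is not specified at this point, so the example assumes, purely for illustration, that each vehicle covers its remaining distance to the conflict zone with constant acceleration; `arrival_time` and `epet` are hypothetical helper names.

```python
import math

def arrival_time(distance, speed, accel=0.0):
    """Time to cover `distance` (m) from `speed` (m/s) with constant `accel` (m/s^2).

    Returns math.inf if the vehicle stops before reaching the conflict zone.
    """
    if distance <= 0:
        return 0.0
    if abs(accel) < 1e-9:
        return distance / speed if speed > 0 else math.inf
    disc = speed**2 + 2.0 * accel * distance
    if disc < 0:
        return math.inf
    # Smallest positive root of 0.5 * accel * t^2 + speed * t - distance = 0.
    return (-speed + math.sqrt(disc)) / accel

def epet(t_uv, t_mv):
    """Estimated post-encroachment time of equation (7)."""
    return abs(t_uv - t_mv)
```

For instance, a UV that is 30 m away at 10 m/s arrives after 3.0 s, and an MV that is 32 m away at 40 km/h arrives after 2.88 s, giving an EPET of about 0.12 s, which would indicate a high collision risk.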

While ensuring safety, an appropriate speed is expected, which stands for efficiency during crossing the intersection. Using these criteria, we define the following measure combining safety and efficiency:

$$U = U_{\mathrm{safe}}(\mathrm{EPET}) + U_{\mathrm{efficiency}}(V) = -\exp(-\mathrm{EPET}) - \frac{(V_{MV} - V_{cri})^2}{V_{cri}^2}, \tag{8}$$

where $U$ is the profit function, and we expect a larger $U$, which represents more ideal motions during the crossing for vehicles. $V_{cri}$ is the expected speed for the MV, which is set to 40 km/h according to the driving rules at the intersections. $U$ is defined to be negative so that efficiency is encouraged in the following models, e.g., deep reinforcement learning.
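Equation (8) translates directly into a few lines of Python. In this sketch the speeds are expressed in m/s, so the expected speed of 40 km/h becomes `V_CRI = 40 / 3.6`; the function name `utility` is an illustrative choice.

```python
import math

V_CRI = 40.0 / 3.6  # expected speed of 40 km/h, converted to m/s

def utility(epet, speed, v_cri=V_CRI):
    """Profit U of equation (8): a safety term plus an efficiency term, both <= 0."""
    u_safe = -math.exp(-epet)
    u_efficiency = -((speed - v_cri) ** 2) / v_cri**2
    return u_safe + u_efficiency
```

The value approaches 0 from below as EPET grows and the speed approaches the expected speed; with EPET equal to 0 at the expected speed it equals -1, and a stopped vehicle with EPET equal to 0 scores -2.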

As the states and actions of vehicles are continuous, we use the acceleration $a$ as the control variable. A constrained multiobjective optimization problem (MOP) model is proposed based on conflict resolution at intersections, the goal of which is to maximize the profit of the system. The interaction between vehicles is quantified by introducing a variable parameter $p$: vehicles cross the intersection in competition when $p$ is 0, and they are in complete cooperation when $p$ is 1.

The mathematical model of MOP is usually expressed as follows:

$$\begin{cases} \min f(X), \\ h_i(X) = 0, & i = 1, 2, \ldots, m, \\ g_j(X) \ge 0, & j = 1, 2, \ldots, l, \end{cases} \tag{9}$$

where $f(X)$ is the objective function and $h_i(X) = 0$ and $g_j(X) \ge 0$ are the constraint conditions. Finding the maximum of $U$ can be transformed into finding the minimum of the negative function $-U$. Therefore, we can then establish

$$\begin{aligned} \max\ & U = U_1 + p \cdot U_2, \\ \text{s.t.}\ & a_{\min} \le a_i \le a_{\max}, \quad i = 1, 2, \\ & 0 \le v_i \le v_{\max}, \quad i = 1, 2, \end{aligned} \tag{10}$$

where $v_{\max}$ depends on the speed limit at the intersections and $a_{\max}$ and $a_{\min}$ represent the comfort requirement during driving, which are defined as ±2 m/s² in this paper.

4.4. The Calculation Method Based on NSGA-II

4.4.1. Constraint Condition. To ensure safety, a simplified circle model for vehicles is established, as shown in Figure 5.

We set a safety constraint for no overlap between the excircles of vehicles:

$$\sqrt{(x_1(t) - x_2(t))^2 + (y_1(t) - y_2(t))^2} \ge R_1 + R_2, \tag{11}$$

where $R_i = \sqrt{L_i^2 + W_i^2}/2$, and $L_i$ and $W_i$ are, respectively, the length and width of vehicle $i$. The formula for the motion state of vehicles is as follows:

$$\begin{cases} x_i(t) = x_i(0) - \left[ v_i(0) t + \frac{1}{2} a_i(0) t^2 \right] \cos\varphi_i, \\ y_i(t) = y_i(0) - \left[ v_i(0) t + \frac{1}{2} a_i(0) t^2 \right] \sin\varphi_i, \end{cases} \quad 0 \le t \le T, \tag{12}$$

where $(x_i(0), y_i(0))$ is the initial position and $\varphi_i$ is the orientation.
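The constraint can be checked numerically by sampling the motion of both vehicles over the horizon, as in the sketch below. It keeps the sign convention of equation (12), treats the no-overlap condition as "center distance at least $R_1 + R_2$", and represents each vehicle as a dictionary with hypothetical keys (`x0`, `y0`, `v0`, `a0`, `phi`, `length`, `width`); the sampling step `dt` is an assumed value.

```python
import math

def excircle_radius(length, width):
    """Radius of the circle circumscribing a length x width rectangle."""
    return math.sqrt(length**2 + width**2) / 2.0

def position(veh, t):
    """Vehicle center at time t, using the sign convention of equation (12)."""
    s = veh["v0"] * t + 0.5 * veh["a0"] * t**2
    return veh["x0"] - s * math.cos(veh["phi"]), veh["y0"] - s * math.sin(veh["phi"])

def min_gap(veh1, veh2, horizon, dt=0.1):
    """Smallest value of (center distance - (R1 + R2)) on the sampled horizon."""
    r_sum = (excircle_radius(veh1["length"], veh1["width"])
             + excircle_radius(veh2["length"], veh2["width"]))
    gaps = []
    for k in range(int(round(horizon / dt)) + 1):
        x1, y1 = position(veh1, k * dt)
        x2, y2 = position(veh2, k * dt)
        gaps.append(math.hypot(x1 - x2, y1 - y2) - r_sum)
    return min(gaps)

def is_safe(veh1, veh2, horizon, dt=0.1):
    """True if the excircles never overlap at any sampled time."""
    return min_gap(veh1, veh2, horizon, dt) >= 0.0
```

For the 4.8 m by 2.178 m vehicles used in the simulations, the excircle radius is about 2.64 m, so two such vehicles are treated as conflicting whenever their centers come within roughly 5.27 m of each other.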

4.4.2. Process of Decision Making. For the MOP model, we obtain an optimal solution based on NSGA-II, and the process is shown in Figure 6.

There are two stages in the solution process: the first stage is making a decision at the initial moment and performing the action with the known information, and the second stage is updating the position and velocity of vehicles with dynamic information and then regenerating the optimal motions.

Figure 4: (a) The training process of the GPR model (split the trajectory data into a training set and a test set, select a proper kernel function, set the initial values of the hyperparameters to determine the prior model, train the model and optimize the hyperparameters, determine the posterior model, and output the mean and variance of the predicted points). (b) The trajectory prediction model (the GPR acceleration prediction models take the input $X = (x(t), y(t), v_x(t), v_y(t))$ and output $Y_1 = a_x(t)$ and $Y_2 = a_y(t)$, and the CA kinematic formula updates the state $(x_t, y_t, v_{x,t}, v_{y,t}, a_{x,t}, a_{y,t})$ to the next time step).

Figure 5: Simplified circle model, showing the MV $(x_1, y_1, v_1, \varphi_1)$, the UV $(x_2, y_2, v_2, \varphi_2)$, and the center of the conflict zone $(x_t, y_t)$.

4.5. The Calculation Method Based on Deep Reinforcement Learning. If we assume that the process of crossing intersections is a Markov decision process (MDP), it is practical to apply deep reinforcement learning for continuous action spaces. The input state consists of the speeds of the vehicles and the distances from the centers of the vehicles to the center of the conflict zone, i.e., $S = (V_{MV}, D_{MV}, V_{UV}, D_{UV})$; if there is more than one UV on the scene, the speeds and distances of all UVs are appended to the observation state. The output action is the acceleration $a$ of the MV. In this study, the reward function is built in the same way as (8), $R = U$, which takes both safety and efficiency into consideration. We expect a larger total reward, that is, the sum of the rewards for each step, and make it converge through training based on the policy gradient, which is the reason why we set the reward to be negative. With a positive reward function, a larger total reward may result from more steps, which means more time to cross the intersection under an inefficient policy. With a negative reward function, however, a larger total reward means a safe and efficient policy.
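The argument about the sign of the reward can be checked with a toy calculation. The per-step values below are made up for illustration only: a slow episode of 45 steps and a fast episode of 25 steps each receive the same reward at every step, first with a negative reward and then with the same reward shifted to be positive.

```python
def total_reward(step_rewards):
    """Undiscounted return of one episode: the sum of the per-step rewards."""
    return sum(step_rewards)

# Hypothetical episodes: the slow policy needs 45 steps, the fast one 25 steps.
slow_negative = total_reward([-0.1] * 45)
fast_negative = total_reward([-0.1] * 25)

# The same per-step reward shifted by +1, so that every step is rewarded.
slow_positive = total_reward([0.9] * 45)
fast_positive = total_reward([0.9] * 25)

print(fast_negative > slow_negative)  # the negative reward favors the fast episode
print(slow_positive > fast_positive)  # the positive reward favors the slow episode
```

With the negative reward, every additional step costs the agent something, so maximizing the return pushes the policy towards crossing quickly as well as safely.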

5. Discussion and Evaluation

In this section, we trained DDPG on OpenAI Gym and then tested the algorithms on PreScan for comparison. This allowed us to verify the effectiveness and reliability of the proposed algorithms.

Simulation parameters are set as follows: we test the algorithms in single- or multiple-vehicle scenes, where one or more MVs drive straight from north to south and a UV controlled by the algorithms is expected to cross the intersection with no collision. The length and width of the MV and UV are 4800 mm and 2178 mm, respectively, the communication distance range is less than 200 m, and the speed limit at the intersection is 60 km/h.

5.1. Simulation and Verification Platform. PreScan is a simulation environment for developing advanced driving assistant systems (ADASs) and intelligent vehicle (IV) systems. It is a platform that can be used to build 3D virtual traffic scenes and generate vehicles, pedestrians, traffic lights, and other control modules, as shown in Figure 7(a). PreScan comes with a powerful graphics preprocessor, a high-end 3D visualization viewer, and a connection to standard MATLAB/Simulink. It is composed of various main modules. Some of these main modules represent a specific world. Multiple sensor readings were simulated and captured in the Sensor World.

We build a new task about an intersection with multiple vehicles on OpenAI Gym, as shown in Figure 7(b). The deterministic actor policy network and the critic network have the same architecture, which is a multilayer perceptron with two hidden layers (64-64). For the meta-exploration policy, we implemented a stochastic Gaussian policy with a mean network and a variance network, each represented by an MLP with two hidden layers (64-64).

5.2. Analysis of Experimental Results

5.2.1. Results of Prediction Model. In this paper, the predictions of steering-vehicle trajectories and straight-vehicle trajectories are verified separately. These trajectories are divided into several different pieces to evaluate the prediction performance. The prediction lengths of the straight vehicle are 3 s, 4 s, 5 s, and 6 s. The prediction lengths of the steering vehicle are 3 s, 4 s, and 5 s. There are 80 trajectories in each group.

Figure 6: Process of the MOP model at intersections by the NSGA-II algorithm (observation of the positions and speeds of the MV and UV at the current moment $t_0$, NSGA-II optimization of the MOP model, generation of the optimal motion, update of the vehicle states, and collision judgment by EPET at $t_1$).

Figure 8(a) shows the prediction error of the straight vehicle trajectories. It can be found that the GPR model has better performance than the commonly used model in the prediction of straight vehicle trajectories. Figure 8(b) shows the prediction error of the steering-vehicle trajectories. It can be found that the GPR model is more accurate than the constant turn rate and velocity (CTRV) model.

5.2.2. Effect of MOP Model. Scenario 1: single-vehicle scenario.

Figure 9(a) depicts the interaction between a UV and an incoming MV. Two experiments were carried out in the simulation platform. The difference between the two experiments was whether the UV was controlled by the tactical decision-making algorithm or not. In the first experiment, without the proposed algorithm, a collision between the MV and UV happened at t = 5.8 s. In the other experiment, the main vehicle was controlled by the proposed algorithm. When the two vehicles met at the intersection, the main vehicle predicted the trajectory of the other vehicle, which is shown in Figure 9(b). In this experiment, deceleration was the optimal choice. The desired velocities given by the decision-making algorithm and the actual velocity changes are shown in Figure 9(c). There was no collision because the algorithm chose to yield to the incoming vehicle.

Figure 9(c) shows that, with the decision-making algorithm, the main vehicle decelerates before entering the conflict zone, thus slowing down to give way to the incoming vehicle. Figures 9(d) and 9(e) show the distances and TTCs of the two vehicles. Before the algorithm is executed, both the distance and the TTC curves of the two vehicles pass through zero, indicating that a collision occurs at this time. After the algorithm is executed, the distance and the TTC remain within the safe range, indicating that no collision occurred.

5.2.3. Comparison of NSGA-II and DDPG Algorithm. Scenario 2: multiple-vehicle scenario.

To compare the performances of the DDPG and NSGA-II algorithms, we conducted two groups of experiments on the same scene, in which $D_{MV1}$ and $D_{MV2}$ were, respectively, 10 m and 32 m, and the initial position of the UV, i.e., $D_{UV}$, was 30 m. We set MV1 and MV2 to drive with a constant speed of 40 km/h. Subsequently, we trained the DDPG algorithm based on the MOP model, tested the performance in group B, and compared it with that of NSGA-II in group A, as shown in Figure 10.

For group A, the UV adopts a yield strategy wherein it slows down before t = 3 s to wait for MV1 and MV2 to cross the intersection and then accelerates after the MVs move away. As shown in Figure 10(a), as the speed becomes increasingly lower than the expected speed, the reward appears to decline until t = 3 s and increases thereafter. A higher crossing time means a higher accumulation of the negative reward, which leads to a lower total reward of -4.4184.

Figure 7: Simulation platform. (a) PreScan. (b) OpenAI Gym (MV1, MV2, and UV).

Figure 8: The trajectory prediction error (RMSE, in m) of the straight vehicle and the left-turn vehicle for prediction lengths of 3 s, 4 s, 5 s, and 6 s (legend: GPR, CA). (a) Straight vehicle. (b) Left-turn vehicle.

Figure 10(b) shows that the UV passes through the intersection between the two MVs with an efficient strategy in group B. As shown in the bottom image in Figure 10(b), the UV reaches the conflict zone at t = 2 s, approximately 0.5 s earlier than MV2. In the image, the shaded area represents the conflict zone in consideration of the size of the vehicles. With the efficient strategy of the DDPG, the UV maintains an acceleration of 2 m/s² during the entire process of passing through the intersection, thus achieving a much higher total reward than that in group A.

The comparison data in Table 1 show that the passing time of the UV in group B is approximately 1.5 s lower than that in group A, which means that the DDPG algorithm reduces the traffic delay and improves the efficiency with which the UV passes through the intersection. Moreover, the rate of change in the acceleration of the UV is lower in group B, which implies a lower energy consumption. In general, the DDPG algorithm is more efficient than NSGA-II.

The stability of the DDPG and NSGA-II algorithms was studied by performing a new task wherein the initial speed of the UV was varied from 30 km/h to 55 km/h.

We built a single-vehicle scene where there is only one UV and imported the trained actor policy of the DDPG to output the motions of the UV. We then imported the NSGA-II algorithm as a comparison group to observe the performance on the same task 10 times. As shown in Figure 11, because the NSGA-II algorithm was recalculated each time, the total reward is quite different at the same initial speed of the UV. On the other hand, the DDPG gives a more stable and efficient result, and the average of the total rewards of the DDPG is higher than that of NSGA-II. Furthermore, the averages of the total rewards of the two algorithms decrease when the initial speed is greater than 50 km/h, which indicates the possibility of a collision.

Figure 9: (a) Scenario 1. (b) Predicted trajectory (location (m) versus t (s); actual and predicted). (c) Velocity (v (m/s) versus t (s); before and after; straight and left-turn). (d) Distance (distance (m) versus t (s); before and after). (e) TTC (TTC (s) versus time (s)), marking the nearest distance between the two vehicles before and after executing the algorithm.

Figure 10: Comparison of the performance of the two algorithms: distance (m) of the UV, MV1, and MV2, acceleration (m/s²), and reward versus time (s). (a) NSGA-II (total reward -4.4184). (b) DDPG.

Table 1: Comparison data of two algorithms.

Algorithm   Time to cross the conflict zone for UV (s)   Total reward   $\dot{a}_{\max}$ (m/s³)
NSGA-II     3.75                                         -4.4184        5
DDPG        2.25                                         -1.8743        0

6. Conclusion and Future Work

To improve the safety and efficiency of autonomous vehicles, this paper proposed a MOP decision-making model based on efficient conflict resolution for autonomous vehicles at urban intersections, which considers the complexity of urban intersections and the uncertainties of vehicle behavior. The prediction algorithm for incoming vehicles was studied, and the performance of the UV at intersections under the decision-making model was compared for NSGA-II and DDPG. The main conclusions are as follows:

(1) The trajectory prediction model fits the predicted trajectory by learning the probability distribution of a large amount of trajectory data, and the accuracy of the model depends on the quantity and quality of the training data. The incoming-vehicle trajectory data collected in this paper were limited and could not cover all incoming-vehicle motion patterns.

(2) The MOP decision-making model performs well and can avoid collisions between vehicles at intersections. Compared with NSGA-II, a traditional evolutionary optimization algorithm, the DDPG algorithm is more stable and effective in solving the MOP model at intersections, and UVs perform more appropriate and efficient motions with DDPG.

The decision making of autonomous vehicles is influenced by human-vehicle-road (environmental) factors. Owing to limits on the length of this article, the impacts of pedestrians, nonmotor vehicles, road structure types, and traffic flow density on decision-making were not considered in this study. In the future, the impacts of these factors will be studied and discussed. The interactions between people and vehicles will also be considered to further improve the decision-making model of driving behavior under real road conditions.

Data Availability

The data used to support the findings of this study are provided in the Supplementary Materials section.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Youth Science Fund (no. 51705021) and the Automobile Industry Joint Fund (no. U1764261) of the National Natural Science Foundation of China, the Beijing Municipal Science and Technology Project (no. Z191100007419010), and the Key Laboratory for New Technology Application of Road Conveyance of Jiangsu Province (no. BM20082061706).

Supplementary Materials

The data collected from the intersections of Wei Gong Cun Road in Beijing using subgrade sensors and a retrofit autonomous vehicle were used as the training and testing samples of the trajectory prediction model. The data were divided into three categories: left-turn vehicles, right-turn vehicles, and straight-going vehicles. Each category includes vehicle information such as location, speed, and acceleration. (Supplementary Materials)

Figure 11: Stability and robustness of the two algorithms: (a) NSGA-II; (b) DDPG. Axes: initial speed (km/h), times (n), and total reward.

References

[1] M. Paul Muresan, I. Giosan, and S. Nedevschi, "Stabilization and validation of 3D object position using multimodal sensor fusion and semantic segmentation," Sensors, vol. 20, no. 4, 2020.

[2] H. Kim, J. Cho, D. Kim, et al., "Intervention minimized semi-autonomous control using decoupled model predictive control," in Proceedings of the Intelligent Vehicles Symposium, IEEE, Las Vegas, NV, USA, July 2017.

[3] M. R. Boukhari, A. Chaibet, M. Boukhnifer, et al., "Exteroceptive fault-tolerant control for autonomous and safe driving," in Automation Challenges of Socio-technical Systems, Wiley Online Library, New Jersey, NY, USA, 2019.

[4] S. Gibbs, Google Sibling Waymo Launches Fully Autonomous Ride-Hailing Service, The Guardian, London, UK, 2017.

[5] S. Zelinski, T. Koo, and S. Sastry, "Optimization-based formation reconfiguration planning for autonomous vehicles," in Proceedings of the 2003 IEEE International Conference on Robotics and Automation, September 2003.

[6] C. Urmson, J. C. Baker, B. P. Salesky Rybski, W. Whittaker, D. Ferguson, and M. Darms, "Autonomous driving in traffic: boss and the urban challenge," AI Magazine, vol. 30, no. 2, pp. 17–28, 2009.

[7] H. Whittaker, Z. Fan, C. Liu, et al., "Baidu apollo em motion planner," 2018, http://arxiv.org/abs/1807.08048.

[8] P. Wang and C.-Y. Chan, "Vehicle collision prediction at intersections based on comparison of minimal distance between vehicles and dynamic thresholds," IET Intelligent Transport Systems, vol. 11, no. 10, pp. 676–684, 2017.

[9] C. Hubmann, M. Becker, D. Althoff, et al., "Decision making for autonomous driving considering interaction and uncertain prediction of surrounding vehicles," in Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), June 2017.

[10] M. Bojarski, D. Del Testa, D. Dworakowski, et al., "End to end learning for self-driving cars," 2016, http://arxiv.org/abs/1604.07316.

[11] J. Chen, Research on Decision Making System of Autonomous Vehicle in Urban Environments, University of Science and Technology of China, Hefei, China, 2014.

[12] C. Liu, R. Zheng, and Q. Guo, "A decision-making method for autonomous vehicles based on simulation and reinforcement learning," in Proceedings of the International Conference on Machine Learning & Cybernetics, Tianjin, China, July 2013.

[13] Z. Ma, J. Sun, and Y. Wang, "A two-dimensional simulation model for modelling turning vehicles at mixed-flow intersections," Transportation Research Part C: Emerging Technologies, vol. 75, pp. 103–119, 2017.

[14] S. Zhong, J. Tan, H. Dong, et al., "Modeling-learning-based actor-critic algorithm with Gaussian process approximator," Journal of Grid Computing, vol. 18, pp. 181–195, 2020.

[15] G. Xiong, Y. Li, S. Wang, X. Li, and P. Liu, "HMM and HSS based social behavior of intelligent vehicles for freeway entrance ramp," International Journal of Control and Automation, vol. 7, no. 10, pp. 79–90, 2014.

[16] C. Lv, C. Li, Y. Xing, C. Lu, et al., "Hybrid-learning-based classification and quantitative inference of driver braking intensity of an electrified vehicle," IEEE Transactions on Vehicular Technology, vol. 99, no. 1, 2018.

[17] X. Chen, G. Tian, C.-Y. Chan, Y. Miao, J. Gong, and Y. Jiang, "Bionic lane driving of autonomous vehicles in complex urban environments: decision-making analysis," Transportation Research Record: Journal of the Transportation Research Board, vol. 2559, no. 1, pp. 120–130, 2016.

[18] X. Chen, M. Jin, M. Yi-song, and Q. Zhang, "Driving decision-making analysis of car-following for autonomous vehicle under complex urban environment," Journal of Central South University, vol. 24, pp. 1476–1482, 2017.

[19] X.-m. Chen, Q. Zhang, Z.-h. Zhang, G.-m. Liu, et al., "Research on intelligent merging decision-making of unmanned vehicles based on reinforcement learning," in 2018 IEEE Intelligent Vehicles Symposium (IV), pp. 91–96, Changshu, Suzhou, China, July 2018.

[20] M. Chen, C. Pan, B. Yin, et al., "Ship navigation trajectory prediction based on Gaussian process regression," Technology Innovation and Application, vol. 31, pp. 28-29, 2017.

[21] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, 2002.

[22] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Playing Atari with deep reinforcement learning," 2013, http://arxiv.org/abs/1312.5602.

[23] L. J. Lin, Reinforcement Learning for Robots Using Neural Networks, Carnegie-Mellon University, Pittsburgh, PA, USA, 1993.

[24] T. P. Lillicrap, J. J. Hunt, A. Pritzel, et al., "Continuous control with deep reinforcement learning," 2015, http://arxiv.org/abs/1509.02971.

[25] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," 2015, http://arxiv.org/abs/1502.03167.

[26] M. Stuart, "FSM design and verification," Electronic Engineering, vol. 71, pp. 17-18, 1999.

[27] Y. Gu, Y. Hashimoto, L. T. Hsu, et al., "Motion planning based on learning models of pedestrian and driver behaviors," in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, November 2016.

[28] M. Du, Trajectory Prediction Method of Surrounding Vehicles at Urban Intersections Based on Motion Modes Recognition, Beijing Institute of Technology, Beijing, China, 2019.

[29] N. Zhao, W. Chen, Y. Xuan, et al., "Focus and shift of visual attention in driving scenes," Ergonomics, vol. 17, no. 4, pp. 85–88, 2011.


deep convolutional neural network (DCNN) to establish anend-to-end driving model [10]

In recent years, more and more researchers have begun studying decision-making behavior. Chen [11] established a vehicle decision model in an urban environment using a hierarchical finite state machine method for different drivers and road environment characteristics. Liu et al. [12] adopted control prediction theory and reinforcement learning theory to obtain a decision model. However, these models cannot be adapted to urban intersections. Ma et al. [13] proposed a decision-making framework titled "Plan-Decision-Action" for autonomous vehicles at complex urban intersections. Zhong et al. [14] proposed a model-learning-based actor-critic algorithm with a Gaussian process approximator to solve problems with continuous state and action spaces. Xiong et al. [15] used a hidden Markov model to predict other vehicles' intentions and built a decision-making model for vehicles at intersections. Lv et al. [16] combined offline and online machine learning methods to establish a personalized decision model that could simulate the characteristics of driver behavior. Chen et al. [17] used rough-set theory to extract different drivers' decision rules. Chen et al. [18] used a novel RSAN (rough-set artificial neural network) method to learn decisions made by excellent human drivers. Chen et al. [19] proposed a merging strategy based on the least squares policy iteration (LSPI) algorithm and selected a basis function that included the reciprocal of TTC, relative distance, and relative velocity to represent the state space and discretize the action space. However, these studies did not take the overall interaction scenarios into consideration and can only be adopted for short-term trajectory prediction.

This paper focuses on the decision-making process of autonomous vehicles in an urban environment and develops a vehicle trajectory prediction model based on Gaussian process regression (GPR) [20], which can generate long-term predictions for incoming vehicles. The problem of conflict resolution among vehicles at intersections is modeled as a multiobjective optimization problem (MOP) in which acceleration, as the only decision variable, is used to control the vehicles. The main contribution of this work is the presentation of two solutions to the intersection multiobjective optimization problem. The first applies the nondominated sorting genetic algorithm (NSGA-II) to maximize the overall driving benefit of the system; the second uses the deep deterministic policy gradient (DDPG) algorithm of reinforcement learning with continuous actions. Because the deterministic policy gradient is the expected gradient of the action-value function, it can be estimated much more stably than the usual stochastic policy gradient. A simulation and verification platform based on Matlab/Simulink and PreScan was built to validate the results, and the proposed MOP decision-making method and solution algorithms were verified in several typical scenarios.

The remainder of this paper is organized as follows. Section 2 elaborates on the methodology used in this study, which includes an introduction to Gaussian process regression, the nondominated sorting genetic algorithm (NSGA-II), and the deep deterministic policy gradient algorithm of reinforcement learning. Section 3 describes data acquisition and data processing. Section 4 proposes the GPR models for trajectory prediction and the MOP decision-making model based on efficient conflict resolution at intersections, which is solved by NSGA-II and DDPG. The simulation verification platform used to evaluate the effectiveness and reliability of the proposed model and to compare the performance of the two algorithms is introduced in Section 5. In Section 6, conclusions and future work are presented.

2. Methodology

2.1. Gaussian Process Regression Model. Gaussian process regression (GPR) is a statistical method that can make full use of raw data by considering its temporal trends and periodic changes to establish a suitable predictive model. This model has been used to predict the trajectories of vehicles and has been proven to be efficient. Compared with LSTM, its main advantage is that it is more robust to noisy data, which makes it more suitable for urban intersections.

The log likelihood function of the sample data is as follows:

$$\log p(y \mid X) = -\frac{1}{2} y^{T} C^{-1} y - \frac{1}{2} \log |C| - \frac{N}{2} \log(2\pi). \tag{1}$$

The joint distribution of the model's observations and training data is as follows:

$$\begin{bmatrix} y \\ y_{*} \end{bmatrix} \sim N\left(0, \begin{bmatrix} C & K_{*} \\ K_{*}^{T} & C(x_{*}, x_{*}) \end{bmatrix}\right), \tag{2}$$

where $K_{*} = [C(x_{*}, x_{1}), C(x_{*}, x_{2}), \ldots, C(x_{*}, x_{n})]^{T}$ is the covariance matrix between the test data $x_{*}$ and the training data, and $C(x_{*}, x_{*})$ is the covariance matrix of the test data itself.

Therefore, the output of the model can be found with (3). By calculating the mean and variance of the output $y_{*}$ of the model, the predicted mean $\hat{y}_{*}$ and the predictive confidence $\sigma_{*}^{2}$ of the model can be obtained separately:

$$\begin{cases} \hat{y}_{*} = E(y_{*}) = K_{*}^{T} C^{-1} y, \\ \sigma_{*}^{2} = \mathrm{Var}(y_{*}) = C(x_{*}, x_{*}) - K_{*}^{T} C^{-1} K_{*}. \end{cases} \tag{3}$$
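Equations (1) to (3) can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the squared-exponential kernel, its hyperparameters, the noise term added to the diagonal of $C$, and all function names are assumptions chosen for illustration, and the inputs are scalars (for example, time stamps mapped to a position coordinate).

```python
import math


def sq_exp_kernel(a, b, length=1.0, signal=1.0):
    """Assumed covariance function C(a, b): squared exponential on scalars."""
    return signal ** 2 * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Return lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_fit_predict(X, y, x_star, noise=1e-2):
    """Return (log-likelihood (1), predictive mean (3), predictive variance (3))."""
    n = len(X)
    # C: training covariance, with an assumed noise variance on the diagonal.
    C = [[sq_exp_kernel(X[i], X[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(C)
    alpha = chol_solve(L, y)  # C^{-1} y
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|C|
    log_lik = (-0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
               - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi))
    k_star = [sq_exp_kernel(x_star, xi) for xi in X]  # K_*
    mean = sum(k * a for k, a in zip(k_star, alpha))  # K_*^T C^{-1} y
    v = chol_solve(L, k_star)  # C^{-1} K_*
    var = sq_exp_kernel(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))
    return log_lik, mean, var
```

Calling `gpr_fit_predict([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0], 1.5)` returns the log likelihood of the four samples together with the predicted mean and variance at the query point. The Cholesky factor is reused for both the solve and the log determinant, which avoids forming $C^{-1}$ explicitly.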

2.2. Nondominated Sorting Genetic Algorithm. In 2000, a new nondominated sorting genetic algorithm (NSGA-II) was proposed by Deb et al. [21] on the basis of the NSGA of Srinivas and Deb, which is a theory and method of handling the Pareto optima in multiobjective optimization problems. It is one of the most popular multiobjective genetic algorithms (GAs) for studying complex system analysis and discovering diverse results. The structure of the algorithm is shown in Figure 1.
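The nondominated sorting that gives NSGA-II its name can be sketched as follows. This is a minimal illustration rather than the authors' Matlab implementation: the function names are hypothetical, all objectives are assumed to be minimized, and the crowding-distance step used to truncate the last accepted front is omitted.

```python
def dominates(p, q):
    """True if objective vector p dominates q (every objective is minimized)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def fast_nondominated_sort(objs):
    """Split solutions into Pareto fronts; returns lists of indices into objs."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices of solutions that i dominates
    counts = [0] * n                    # number of solutions that dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)  # nobody dominates i: first front
    k = 0
    while fronts[k]:
        next_front = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
        k += 1
    return fronts[:-1]  # drop the trailing empty front
```

For two objectives such as a safety cost and a travel-time cost, `fast_nondominated_sort([(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)])` places the first three solutions in the first front because none of them is beaten on both objectives by another solution.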

The step-by-step procedure shows that the NSGA-II algorithm is simple and straightforward. First, a combined population $R_t = P_t \cup Q_t$ is formed. The population $R_t$ is of size $2N$. Then, the population $R_t$ is sorted according to nondomination. Since all previous and current population







$$L = \frac{1}{N} \sum_{i}^{N} \left( Q(s_t, a_t \mid \theta^{Q}) - y_i \right)^{2} \tag{4}$$

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i}^{N} \ldots \tag{5}$$
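The critic loss in (4) is a mean squared error over a minibatch of $N$ samples. The sketch below shows that computation in plain Python. The target definition $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}))$ and the soft target update are taken from the standard DDPG formulation [24] and are assumptions here, because the surviving text does not define them; all function and parameter names are hypothetical.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Assumed DDPG targets: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})).

    next_q_values are target-critic estimates for the next states; the
    bootstrap term is dropped on terminal transitions.
    """
    return [r + (0.0 if done else gamma * q_next)
            for r, q_next, done in zip(rewards, next_q_values, dones)]


def critic_loss(q_values, targets):
    """Equation (4): mean squared error between Q(s, a | theta_Q) and y_i."""
    n = len(q_values)
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / n


def soft_update(target_params, online_params, tau=0.001):
    """Assumed soft target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full implementation the Q values would come from neural networks and the loss would be minimized by gradient descent; the functions above only make the arithmetic of one update explicit.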

3. Data

4. Model



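[Figure 1: Structure of the NSGA-II algorithm. Labels: $P_t$, $Q_t$, $R_t$, nondominated fronts $F_1$, $F_2$, $F_3$, ..., $F_n$, $P_{t+1}$, Rejected.]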



[Figures: raw versus processed trajectory data plotted against t (s), with panels for vertical position (m), lateral position (m), acceleration (m/s²), and velocity (km/h); panels (a) and (b), one labeled "32-line laser radar".]


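[Figure: conflict between the UV and an MV at an intersection. (a) Labels: $(X_1, Y_1, V_1, \varphi_1)$, $(X_2, Y_2, V_2, \varphi_2)$, $S_1$, $S_2$, $v_1$, $v_2$, points A, B, C, D, and numbers 1 to 8. (b) Labels: UV, MV, conflict zone, $T_{mv2}$, $T_{uv1}$.]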










$$\mathrm{EPET} = \left| T_{uv} - T_{mv_i} \right| \tag{7}$$
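Equation (7) compares the time at which the UV reaches the conflict zone with the time at which each MV reaches it. A minimal sketch follows, under the assumption (not stated in the surviving text) that each arrival time is estimated from the remaining distance at constant speed; the function names are hypothetical.

```python
def arrival_time(distance_m, speed_mps):
    """Time to reach the conflict zone, assuming constant speed (illustrative)."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive for a finite arrival time")
    return distance_m / speed_mps


def epet(t_uv, t_mv_list):
    """Equation (7): |T_uv - T_mv_i| for every MV i."""
    return [abs(t_uv - t_mv) for t_mv in t_mv_list]
```

For example, a UV that is 30 m from the conflict zone at 10 m/s arrives after 3 s, so `epet(3.0, [2.0, 5.5])` gives gaps of 1.0 s and 2.5 s to the two MVs; a smaller gap indicates a tighter conflict.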






2

V2cri

(8)




$$\begin{cases} \min f(X), \\ h_i(X) = 0, & i = 1, 2, \ldots, m, \\ g_j(X) \ge 0, & j = 1, 2, \ldots, l. \end{cases} \tag{9}$$





(10)





$$\sqrt{\left( x_1(t) - x_2(t) \right)^{2} + \left( y_1(t) - y_2(t) \right)^{2}} \ne R_1 + R_2 \tag{11}$$


where Ri L2

i + W2i




21113876 1113877cosφi


21113876 1113877sinφi

⎧⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎩

0le tleT

(12)
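The separation condition in (11) can be checked along two sampled trajectories as sketched below. The interpretation is an assumption: the sketch treats the constraint as violated whenever the distance between the two vehicle centres is no larger than $R_1 + R_2$, and it takes the radii as given inputs because the definition of $R_i$ is only partly legible. The function name is hypothetical.

```python
import math


def violates_separation(traj1, traj2, r1, r2):
    """Return True if the centre distance drops to r1 + r2 or below.

    traj1 and traj2 are lists of (x, y) positions sampled at the same times.
    """
    return any(
        math.hypot(x1 - x2, y1 - y2) <= r1 + r2
        for (x1, y1), (x2, y2) in zip(traj1, traj2)
    )
```

A candidate acceleration profile can be rejected as soon as this check returns True for the predicted trajectory of any surrounding vehicle.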





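[Figure: trajectory prediction flowcharts. (a) Labels: trajectory data, the prior model, points. (b) Labels: Output, Update, $Y_1 = \alpha x(t)$, and the state update below.]

$$\begin{bmatrix} x_{t+1} \\ y_{t+1} \\ v_{x,t+1} \\ v_{y,t+1} \\ a_{x,t+1} \\ a_{y,t+1} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \Delta t & 0 & \Delta t^{2}/2 & 0 \\ 0 & 1 & 0 & \Delta t & 0 & \Delta t^{2}/2 \\ 0 & 0 & 1 & 0 & \Delta t & 0 \\ 0 & 0 & 0 & 1 & 0 & \Delta t \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{t} \\ y_{t} \\ v_{x,t} \\ v_{y,t} \\ a_{x,t} \\ a_{y,t} \end{bmatrix}$$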

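[Figure: MV $(x_1, y_1, v_1, \varphi_1)$ and UV $(x_2, y_2, v_2, \varphi_2)$ in the $xoy$ coordinate frame, with velocity vectors $v_1$, $v_2$, and $v_1 - v_2$.]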














[Figure: flowchart. Labels: MOP model, crowding distance, If termination (Yes/No), Combine populations, No collision, Collision.]












[Figure: RMSE (m) of GPR and CA at prediction horizons of 3 s, 4 s, 5 s, and 6 s; panels (a) and (b). Data labels: 0.268504896, 0.425707874, 0.681614217, 1.057928518.]


[Figure: panels (a) and (b) showing UV, MV1, and MV2.]




















































































































































































































































































































































































