

Escuela Politécnica Superior

Dpto. de Ingeniería Informática

Doctorado en Ingeniería Informática y Telecomunicación

Doctoral Thesis

Automated Methods for the Evaluation and Analysis of Training Operations in an Unmanned Aircraft System

Author

VÍCTOR RODRÍGUEZ FERNÁNDEZ

Advisor

Dr. D. DAVID CAMACHO FERNÁNDEZ

Co-Advisor

Dr. D. ANTONIO GONZÁLEZ PARDO

May 2019


Department: Ingeniería Informática, Escuela Politécnica Superior, Universidad Autónoma de Madrid (UAM), Spain

PhD Thesis: “Automated Methods for the Evaluation and Analysis of Training Operations in an Unmanned Aircraft System”

Author: Víctor Rodríguez Fernández, Ingeniero en Informática, Universidad Autónoma de Madrid, Spain

Advisor: David Camacho Fernández, Doctor Ingeniero en Informática, Universidad Autónoma de Madrid, Spain

Co-advisor: Antonio González Pardo, Doctor Ingeniero en Informática, Universidad Autónoma de Madrid, Spain

Year: 2019

Committee:

President: Prof. Carlos Cotta, Universidad de Málaga, Spain

Secretary: Dr. Raúl Lara Cabrera, Universidad Politécnica de Madrid, Spain

Vocal 1: Dr. Javier del Ser, Optima Area, Tecnalia Research & Innovation, 48160 Derio, Bizkaia, Spain

Vocal 2: Prof. Sancho Salcedo-Sanz, Universidad de Alcalá, Spain

Vocal 3: Prof. Grzegorz J. Nalepa, AGH University of Science and Technology / Jagiellonian University, Poland



This research work has been funded by Airbus Defence & Space under the SAVIER (Situational Awareness Virtual Environment) project, with code FUAM-076914.


Acknowledgements

Honestly, I do not even know how to begin these acknowledgements for what has been, broadly speaking, the main occupation of the last four years of my life. I recently saw an article whose headline claimed that doing a PhD seriously damages your mental health, and although it struck me as somewhat sensationalist, I can see some truth in it, given the level of commitment and responsibility one feels towards what one is presenting. Fortunately, all the people who accompany you during this “madness” are what makes it end well, and what turns it into a unique and indelible memory.

First of all, I would like to thank Airbus Defence & Space for funding this work under the SAVIER project (FUAM-076914), as well as the project members José Insenser, Juan Antonio Henríquez and Gemma Blasco for the information they provided. Of course, I am also grateful for the support of the rest of the project participants, the “savieritos”. I was truly impressed by the talent and effort behind some of the works that made up the project.

Secondly, I would like to thank the Departamento de Ingeniería Informática of the Universidad Autónoma de Madrid for giving me the opportunity to carry out this thesis, and the people who strive every day to improve the teaching and research standards of this department. Thanks also to Diego and Lorenzo for spreading their joy during all our coffee breaks, of which there were many.

It has been almost five years since I sent my first email to a member of this department, Dr. David Camacho, who would become the advisor of my Master’s thesis and of this dissertation. By now I have lost count of the hundreds of emails we must have exchanged, and I still feel fortunate to have found someone like him, not only for the opportunities he offers at the academic level, but also for the closeness one feels when working with him. I also want to thank Dr. Antonio González-Pardo for his work as co-advisor, providing help, guidance and energy throughout all these years. Both have become very good friends, and that is something I am deeply grateful for.

Part of the work contained in this thesis is also due to the supervision and support of Dr. Héctor Menéndez, with whom I am lucky to maintain a great friendship to this day. Life takes many turns, and although our relationship is now a purely beer-based one, I hope the day comes when we collaborate on something together again.

I would also like to acknowledge Professor Massimiliano Vasile for receiving and tutoring me during my stay at the University of Strathclyde. I hope that our future collaborations are successful. I would also like to thank the rest of the people working in his research group for their support and kindness, especially Romain, who guided me through the world of aerospace engineering and also became my close “gabacho friend”.


Returning to the department, and more specifically to the AIDA research group, I want to wholeheartedly thank Gema for her company and friendship throughout all these years. Her naturalness, her effusiveness and her forthrightness in expressing things have helped me learn how to survive in this little world while remaining oneself and defending what one believes. Many thanks also to the rest of the colleagues who are or have been in the group: Alejandro, Ángel, Félix, Javi, Raúl and Raquel. The research quality, and above all the human quality, that one breathes working here is not something found just anywhere.

A special mention goes to Cristian Ramírez Atencia, who has been like a brother to me during this thesis. When the years go by and I look back on this period, the first thing that will come to mind will be the thousands of car rides to and from the university, rambling on about any irrelevant matter as if it were an episode of The Big Bang Theory. Without a doubt, we still have topics to ramble about, trips to share, and of course laughter, lots and lots of laughter.

Leaving the academic side aside, I would like to thank the people who are part of my musical life, which has been present throughout this thesis. To my compadres from Morgans: Juanma, Jesu, Noel, Cristofer; and to those from Chanela: Jata, Luisín and Kike. Thank you for always delighting me with your art and your way of being. Thanks also to my lifelong friends from Alcobendas, the Canteros, especially Ruky and Dae. Blessed be your madness.

On the more personal side, I would like to acknowledge all the affection I receive from my family, from my brother Chubi, who is affectionate in his own way, to my father, Jacinto. Arriving at the village and talking with you over a glass of wine is healing for me. Nor do I want to forget little Iker, Borja, Rocío, or furry Kratos, even though he will never read this. To my family from La Rioja: Reyes, José and Kirita, a million thanks for all your support and for believing in me. And of course I want to thank my mother, my blessed mother, Julia, who deserves a temple for everything she has done and keeps doing for me, day after day.

Finally, to Reyes, the person who knows me best and who has undoubtedly also suffered the most from my emotional instabilities, both academic and otherwise. I cannot imagine the road travelled in this thesis without her. A thousand thanks for your support, your trust and your infinite love. I love you.


Summary and Conclusions

Summary

In recent years, Unmanned Aircraft Systems (UASs), also known as Remotely Piloted Aircraft Systems (RPASs), have become a popular topic in many research fields and industrial applications. These systems operate with one or multiple Unmanned Aerial Vehicles (UAVs), which reduces the human and economic risks of many critical tasks, such as infrastructure inspection, coastal surveillance, traffic and disaster management, agriculture or forestry, among many others.

Although modern UASs are designed to control UAVs autonomously, the role of the operators in a UAS remains a critical aspect for guaranteeing mission success. This is due to the high costs involved in any operation of this kind, especially when a single operator must supervise multiple UAVs. For this reason, operators are trained in simulation environments, where they face different situations and alerts, with the aim of becoming accustomed to them and being prepared to resolve them successfully in a real scenario.

Unfortunately, the increasing use of UASs has not been accompanied by an adequate integration of operator training science. Most of the evaluation and analysis tasks carried out by an instructor during the analysis of a training session are still performed rudimentarily and individually for each operator, due to the current lack of methods and tools capable of doing so automatically on a large scale. Nowadays, an expert instructor evaluates the behaviour of a single operator in each session, creating a report (usually by hand) covering different aspects such as their responsiveness to alerts or the evolution of their performance. The introduction of intelligent, automatic methods into these systems would therefore make it possible to scale up the number of operators taking part in a training session. Moreover, the instructor could then receive not only an individual report for each operator, but also a collective analysis of a group of operators, which can be used as a mechanism for operator selection, the development of adaptive training, and the analysis of behavioural patterns.

This thesis focuses on providing intelligence and automation for training operations in UASs, supporting instructors in tasks such as: 1. the analysis of operator performance; 2. the extraction of behavioural patterns; 3. the evaluation of procedure following. To achieve these objectives, we rely on techniques that depend, partially or exclusively, on the mission data logs produced during the training sessions. More specifically, time series clustering, Hidden Markov Models and process mining are studied.

Regarding the performance analysis task, we describe a method to discover a set of representative operator profiles from the data logged during training, in which the evolution of the operator’s performance during a mission is the main unit of measure. The temporal profile of the operator’s performance is defined as a combination of a set of numerical measures that quantify different facets of the operator’s response in a specific simulation environment. From these, time series clustering techniques are used to automatically obtain the most discriminant profiles describing the performance evolution of a group of operators.

The use of performance measures does not scale easily across different simulation environments, and it is therefore interesting to use the raw operator interactions directly as the basis for the behavioural pattern extraction task. In this regard, current methods based on Hidden Markov Models (HMMs) make it possible to create predictive models of operator behaviour. These methods have been extended in two different ways. First, the use of multichannel HMMs has been proposed to enrich the meaning of the model states using parallel sources of information extracted from the mission logs. Second, the inherent modelling limitations of HMMs are considered and, on that basis, the applicability of a more flexible approach based on high-order Double Chain Markov Models (DCMMs) is studied.

The methods proposed for performance analysis and behavioural pattern extraction rely exclusively on the data logs produced during the simulations, i.e., they have no prior knowledge about the nominal operator behaviour. However, for the procedure following evaluation task, the instructor must verify that the operator is correctly following the guidelines described in an operating procedure or checklist. To automate this task, conformance checking techniques (a branch of process mining) have been adapted to the use of time series and time-dependent processes, since the classical methods present some limitations in this regard.

To demonstrate the effectiveness of each of the advances proposed in this thesis, experiments have been carried out in different simulation environments. On the one hand, for the performance analysis and pattern extraction tasks, a lightweight gamification-based environment has been used. On the other hand, for the procedure following evaluation task, a case study in a realistic UAS is provided. In addition, to demonstrate the generality of the approach developed for this last task, another case study in an external domain (longwall coal mining) is also provided.

The automation of the instructor tasks mentioned above may lead to the development of a complete training analysis tool, used not only to carry out a deeper and more solid analysis of training sessions, but also to support operator selection, to adapt and improve the transfer of training, and to predict unexpected behaviours in time during real operations.


Conclusions and Future Work

The main contributions of this doctoral thesis can be summarized as the development of methods that automate the evaluation and analysis tasks carried out by an instructor during a training session in a UAS, with the aim of enabling large-scale training operations. More specifically, the tasks of performance analysis, behavioural pattern extraction and procedure following evaluation have been addressed with techniques that rely, partially or exclusively, on the mission data logged during the training sessions, such as time series clustering, Markovian modelling and process mining. Several approaches have been studied to achieve the main objectives of this thesis in relation to each of the aforementioned instructor tasks.

Regarding the performance analysis task, a methodology based on time series clustering has been presented to extract representative performance profiles of UAV operators during their training operations, in order to analyse and compare how their performance varies throughout a mission. The proposed methodology has been applied in a multi-UAV simulation environment (called DWR) with inexperienced operators, obtaining a reasonable description of the temporal behaviour patterns followed during the course of the simulation.

On the other hand, the task of extracting behavioural patterns among operators has been studied; it has recently been addressed in the literature through the development of predictive behavioural models based on HMMs. In this respect, these models have been applied to the data extracted from inexperienced operators in the DWR simulation environment, and extended in two ways. The first concerns the use of multichannel HMMs, which allow multiple data sequences, such as the combination of operator interactions and mission events, to be gathered in the same model. The resulting models present more robust and informative states than those of previous works in the field. The second concerns the use of DCMMs, which extend the possibilities of classical HMMs by combining two high-order Markov chains in the same model, one for the model states (hidden chain) and another for the observations (visible chain). The different processes for creating, evaluating and selecting these models in a UAS-like context have been designed and developed according to the needs of the problem and, after experimentation with DWR data, the resulting models show that adding a visible Markov chain for the observation layer improves the predictive capabilities of DCMMs over classical HMMs, while maintaining a reasonable level of interpretability.

Finally, in relation to the procedure following evaluation task, the solution proposed to automate it consists of bringing a new perspective to the field of conformance checking, and to process mining in general, by using time series as the basis for the analysis. This is crucial for analysing process models whose tasks depend on how one or more process variables fluctuate, as is the case with the operating procedures of a UAS. The formalization of the data, the process model and the algorithms required to perform conformance checking have been redesigned and adapted, providing a temporal perspective on the data and the process models. Two case studies have been carried out to demonstrate how these concepts can be applied, not only to operating procedures within a UAS, but also to more generic process models from a different industrial domain.

Taking into account all the experimental analyses performed for each method proposed in this thesis, it is possible to provide an answer to the main research questions formulated in Section 1.3. Below, these research questions are reviewed, and their answers are discussed based on the experimental conclusions drawn in this thesis:

• Q1: Is it possible to automate the analysis of the typical evolution of operator performance throughout a mission?

The approach presented in Section 3.1 to address the performance analysis task in a UAS aims to describe the behaviour of operators over time using a profile-based model. Assuming that the performance profile of a simulation has been defined as a multivariate time series composed of the evolution of several numerical performance measures, the proposed methodology starts by clustering the data for each measure separately, validating different clustering configurations. The clustering results for each measure are used to define the similarity between two profiles, which is then used in a final medoid-based clustering process to extract the most representative ones.

The proposed method has been applied in a multi-UAV mission simulation environment (called DWR) where a total of six numerical measures define the operator’s performance profile. To evaluate the results, a validation dataset was created based on the judgement of experts who rated the similarity between pairs of performance measure time series. The experimental results show that the proposed methodology achieves good accuracy values, especially from a general standpoint that takes into account the applicability of the method in other environments and domains. In this sense, the proposed approach stands out for its adaptability to different types of time series and for its use of different clustering methods. The representative operator performance profiles obtained in the experimentation have been analysed qualitatively, according to the observed relationships between changes in operator performance and the events occurring in the mission.
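To make the pipeline concrete, the following minimal sketch (in Python; not the thesis implementation) illustrates the last two steps of the methodology: a cluster-assignment distance between profiles, and the extraction of representative profiles as cluster medoids. The per-measure cluster labels, the use of scikit-learn's AgglomerativeClustering (version 1.2 or later, where the parameter is named "metric") as a stand-in for the final medoid-based clusterer, and all variable names are illustrative assumptions; the actual distance definition and clustering validation are detailed in Section 3.1.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical input: labels[m][i] = cluster assigned to operator i's time
# series for performance measure m, after clustering each measure separately.
labels = np.array([
    [0, 0, 1, 1, 2],   # e.g. score
    [1, 1, 1, 0, 0],   # e.g. agility
    [0, 1, 0, 1, 1],   # e.g. aggressiveness
])
M, n = labels.shape

# Cluster-assignment distance: fraction of measures on which two profiles
# fall into different clusters.
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dist[i, j] = np.mean(labels[:, i] != labels[:, j])

# Final clustering over the precomputed distance matrix; the medoid of each
# cluster (the member minimizing intra-cluster distances) is taken as a
# representative performance profile.
K2 = 2
assign = AgglomerativeClustering(
    n_clusters=K2, metric="precomputed", linkage="average").fit_predict(dist)
for k in range(K2):
    members = np.where(assign == k)[0]
    medoid = members[np.argmin(dist[np.ix_(members, members)].sum(axis=1))]
    print(f"cluster {k}: members={members.tolist()}, medoid profile={medoid}")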

• Q2: How can we extend the current HMM-based methodologies for modelling the behaviour of UAS operators?

Since a classical HMM can only model a single data sequence (the sequence of operator interactions), the resulting cognitive states lack additional information regarding the changes occurring in the course of the mission. By using multichannel HMMs (MC-HMMs), the model states are enriched with parallel sources of information: the interactions performed by the operators in the simulation environment and the events describing the course of the mission. In this work, the different steps for creating, selecting and analysing MC-HMMs in a UAS-like environment have been described, and an experiment using these models with inexperienced operators in the DWR simulation environment has been carried out.

The model resulting from this experiment is quite descriptive and reveals several complex behavioural patterns that reflect the inexperience of the test operators, such as the way they control the simulation speed, or their general tendency to speed up and change the pre-programmed mission plan, especially in the later parts of a mission. In short, by adding information to the model beyond the operator interactions, we obtain more robust and informative models than those of previous works in the field.

On the other hand, the first-order Markov assumption on which HMMs are based, together with the independence between operator actions over time assumed in the literature, limits the modelling possibilities. To overcome these problems, we analyse the applicability of DCMMs, which provide a flexible modelling framework combining two high-order Markov chains (one hidden and one visible). In this work, a method is proposed to rank and select DCMMs based on a set of evaluation measures that quantify the predictability and interpretability of the models.

After an experiment in the DWR simulation environment, the resulting models show that, despite the inclusion of higher-order hidden chains, the quality of the model does not substantially improve in terms of either predictability or interpretability. However, adding a visible Markov chain over the observations does improve the predictive capabilities of the model over classical HMMs, while maintaining a reasonable level of interpretability. In any case, these results only reflect the conclusions for one specific simulation environment, and what is really interesting for the state of the art is the flexibility and richness of the proposed modelling framework compared with current HMM-based methodologies.
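As a rough illustration of the value of the visible chain, the sketch below (Python; all data and names invented) fits an order-l visible Markov chain over a discrete interaction alphabet by counting transitions, and compares its next-symbol accuracy against the independence baseline of always predicting the most frequent symbol. This is only the visible half of a DCMM, evaluated in-sample for brevity; the full models, fitted and ranked as described in Section 3.2.2, also couple a hidden high-order chain.

from collections import Counter, defaultdict
import random

random.seed(0)
ALPHABET = ["select_uav", "change_speed", "set_waypoint", "ack_alert"]  # invented
seq = random.choices(ALPHABET, k=2000)  # stand-in for a logged interaction sequence

def fit_visible_chain(seq, order):
    """Count-based estimate of P(next interaction | last `order` interactions)."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return dict(counts)

def next_symbol_accuracy(counts, seq, order, fallback):
    hits = total = 0
    for i in range(order, len(seq)):
        ctx = counts.get(tuple(seq[i - order:i]))
        pred = ctx.most_common(1)[0][0] if ctx else fallback
        hits += pred == seq[i]
        total += 1
    return hits / total

baseline = Counter(seq).most_common(1)[0][0]  # independence assumption
for order in (1, 2, 3):
    counts = fit_visible_chain(seq, order)
    # On real operator logs, held-out sequences should be used instead.
    print(f"order {order}: accuracy {next_symbol_accuracy(counts, seq, order, baseline):.3f}")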

• Q3: Is it possible to automate procedure following evaluation?

Automating the procedure following evaluation task requires a methodology for comparing operating procedures against mission data. According to the literature, the family of techniques focused on comparing process models with data from the same process is known as conformance checking. However, classical conformance checking is not suitable for procedure following evaluation because, normally, the fulfilment of the tasks of an operating procedure depends on how one or more variables in the data log evolve, which cannot be implemented with a simple event log and an event-to-task matching analysis. In this work, this problem has been solved by introducing the time series perspective into conformance checking. This entails a paradigm shift in conformance checking in particular, and in process mining at a more general level.

To implement the temporal perspective within conformance checking, event logs have been replaced by time series logs, and a subtype of Petri nets, called Workflow Net with time series (TSWF-net), has been defined to represent a time series-based process model. An algorithm has been designed to detect discrepancies between a time series log and a TSWF-net, returning not only whether a task of the model has matched the log data or not, but also the matching time and additional information, including the possibility that a task matches in a time interval different from the expected one. To illustrate the effectiveness of our conformance checking approach as a solution for automating procedure following evaluation, a case study has been carried out by designing and modelling an emergency operating procedure within a real UAS, whose compliance has been evaluated using a battery of test operators. Although the algorithm used in the case study is basic and could be improved by returning more information about the discrepancies between the operator behaviour and the operating procedure, the process has been executed fully automatically. A graphical representation of the results of this automation can be seen in [RARFGPC17].
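The following toy sketch (Python; guard, scope, variable and data all invented for illustration) shows the flavour of the check behind a single ts-guard: it tests whether the guard "the variable starts increasing" holds within a time scope, returning the Time of First Fulfilment (TFF) or a non-conformance. The actual formalization of time series logs, ts-guards, time scopes and conformance categories is given in Chapter 4.

def check_ts_guard(times, values, scope):
    """Toy ts-guard 'the variable starts increasing': returns the Time of
    First Fulfilment (TFF) inside `scope`, or None if the guard never holds
    there (the task is then reported as non-conformant)."""
    lo, hi = scope
    for (t0, v0), (t1, v1) in zip(zip(times, values), zip(times[1:], values[1:])):
        if lo <= t1 <= hi and v1 > v0:
            return t1  # first sample where the variable increases
    return None

# Invented log fragment for a variable such as the engine bay temperature.
times = [0, 10, 20, 30, 40, 50]
values = [90, 90, 88, 95, 103, 110]

tff = check_ts_guard(times, values, scope=(0, 100))
print(f"conformant, TFF = {tff}" if tff is not None else "non-conformant")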

• Q4: Can any of the proposed methods be applied in a domain other than UASs?

Regarding the performance analysis task, the proposed method for extracting representative operator performance profiles has one main drawback: it depends on the existence of direct performance measures with which to define a profile. These measures are usually tied to the specific system being analysed, and may not exist or may not be meaningful. However, in purely methodological terms, the proposed approach is adaptable to measures of any nature, since it searches over and validates different time series distances and clustering configurations without the need to select any parameter a priori. Its application in new environments is therefore straightforward once a set of reliable performance measures is available.

As in the previous case, the methods proposed for modelling operator behaviour in a UAS through MC-HMMs and DCMMs have only been applied in the DWR simulation environment, but they are easily applicable to other similar environments, or even to other human-machine interaction systems, provided that certain conditions are met: 1. the system logs sequential data of both the user interactions and the process context; 2. the number of possible interactions the user can perform is not very high; 3. the resulting model is intended to present a combination of predictability and interpretability capabilities.

Regarding the procedure following evaluation task, the solution proposed in this work to automate it is based on an adaptation of conformance checking techniques to process models that take sequential variables or time series into account. In this sense, all the formalizations and algorithms have been approached in a general way, without considering the specific domain of UASs. Only in the first case study has the proposed approach been implemented specifically to evaluate a real operating procedure of a UAS. The second case study, however, has been carried out in a completely different domain (longwall mining), showing the generality of the proposal.

Finally, there are several lines of work related to the different methods and algorithms presented in this thesis that could be extended in the near future:

• A formal comparison of the proposed methods across different UAS simulation environments, and even across other human-machine interaction systems.

• Regarding HMM-based behavioural modelling, a study of how the two extensions used in this thesis (MC-HMMs and DCMMs) can be applied together, in the form of a multichannel double chain Markov model (MCDMM), would be of great interest for improving the behavioural patterns contained in the model. In this sense, it would also be interesting to base the proposed extensions on Hidden Semi-Markov Models (HSMMs) instead of HMMs, since semi-Markov chains allow an explicit modelling of the duration of the hidden states. This is useful for modern UASs, where the operator spends most of the mission time supervising the mission status and, therefore, the number of interactions is very low.

• The use of covariates in the creation of behavioural models, in order to compare patterns with respect to specific operator characteristics, such as age or previous experience in UAV operations.

• The use of state-of-the-art hyperparameter optimization techniques, such as Bayesian optimization, to efficiently search for the best configurations of both the proposed clustering-based methods and the Markov model-based ones.

• The development of a Markov model comparison tool that, given a database of (possibly multichannel) symbol sequences, provides a ranking of the models that score best on a set of user-selected evaluation measures. The ranking may include not only HMMs, but also HSMMs, DCMMs, simple Markov chains and multichannel versions of all of them.

• Both the clustering-based model obtained for analysing operator performance profiles and the HMM-based models created for extracting behavioural patterns have mainly been used to provide the instructor with valuable information to interpret and analyse. However, these models, like any model resulting from a machine learning process, can be used for prediction purposes. In fact, the selection of DCMMs in this work explicitly takes the predictive capabilities of the model into account when choosing it as a good candidate. It is therefore important to deploy the resulting models as a real-time predictive tool that detects, sufficiently in advance, abnormal operator behaviour during a real mission, as well as significant reductions in operator performance.

• The time series-aware conformance checking approach developed to automate procedure following evaluation in UASs paves the way for a deeper inclusion of time series within process mining research. Some related lines of work could include:

– A study of other possible conformance categories that may result from comparing time series and process models.

– An improvement of the current version of the conformance checking algorithm so that a task can represent a state instead of an event. This would resolve the inconsistencies between dependent tasks, such as the one shown in Figure 4.7b.

– An implementation for declarative process models (in this work, only procedural process models represented as Petri nets are considered).

– The exploration of visualization methods for the results of the proposed algorithm, for cases in which the number of variables in the time series log or the number of possible paths in the model is high.

– The development of automatic methods that choose the best parameters (time scopes, parameters of the transition guards...) for a specific process model, given a set of training data.


Abstract

In recent years, Unmanned Aircraft Systems (UASs), or Remotely Piloted Aircraft Systems (RPASs), have become a popular topic in many research fields and industrial applications. These systems operate with one or multiple Unmanned Aerial Vehicles (UAVs), reducing both the human and economic risks of many sensitive tasks, such as infrastructure inspection, monitoring coastal zones, traffic and disaster management, agriculture or forestry, among many others.

Although modern UASs are designed to control the UAVs autonomously, the role of UAS operators is still a critical aspect in guaranteeing mission success, due to the high costs involved in this kind of operation. For this reason, operators are trained in simulation environments, where they face different situations and alerts, take adequate decisions, and become prepared to solve them successfully in a real scenario.

Unfortunately, the increasing use of UASs has not been met with an appropriate integration of training science. Most of the tasks of evaluation and analysis carried out by an instructor during the debriefing of a training session are still performed rudimentarily and individually for each operator, due to the current lack of methods and tools capable of doing so automatically on a large scale. Nowadays, an expert instructor evaluates the behaviour of a single operator in each session, creating a report (usually handwritten) covering different aspects such as his/her responsiveness to alerts or the evolution of his/her performance. Thus, the introduction of intelligent and automatic methods in this regard would make it possible to scale up the number of operators taking part in a training session. Furthermore, the instructor would be provided not only with an individual report, but also with a collective analysis of a group of operators, which is a potential mechanism for enabling operator selection, adaptive training, and behavioural pattern analysis.

This dissertation is focused on providing intelligent and automated methods for training operations in a UAS by supporting instructors in several debriefing tasks, such as: 1. the analysis of operator performance; 2. the extraction of behavioural patterns; 3. procedure following evaluation. To achieve these main objectives, we base our approaches on techniques that rely, partially or exclusively, on the mission data logs produced during multiple training sessions. More specifically, we study the applicability of time series clustering, Markovian modelling and process mining.

Regarding the task of performance analysis, we describe a method to discover a set of representative operator profiles directly from the mission logs, where the evolution of the operator performance during a mission is the main unit of measure. The temporal profile of the operator performance is defined based on the combination of a set of numerical measures that quantify different aspects of the operator response in a specific simulation environment. Then, time series clustering techniques are used to automatically retrieve the most discriminant profiles that describe the performance evolution of a group of trainees.

The use of performance measures is not easily scalable across different simulation environments, and thus it is interesting to use raw operator interactions directly as the basis for the task of extracting behavioural patterns. In this regard, current methods based on Hidden Markov Models (HMMs) are used to create predictive models of the operator’s behaviour. These methods have been extended in two different ways: first, the use of multichannel HMMs is proposed in order to enrich the meaningfulness of the model states with the usage of parallel sources of information from the mission logs; secondly, the inner modelling limitations of HMMs are considered and, based on this, the applicability of a more flexible approach based on high-order Double Chain Markov Models (DCMMs) is studied.

The proposed methods for the analysis of performance and behavioural patterns rely exclusively on the simulation logs produced during the experimentation, i.e., there is no prior knowledge about the nominal operator behaviour. However, in the task of procedure following evaluation, an instructor is in charge of controlling that the operator is correctly following the guidelines described in an operating procedure, or checklist. In order to automate this task, conformance checking techniques (a family of process mining techniques) have been adapted to the use of time-based data and time-aware processes, in order to overcome some limitations of the classical methods.

In order to demonstrate the effectiveness of each of the proposed approaches, several experiments have been carried out in different simulation environments. On the one hand, the approaches for automating the tasks of performance analysis and behavioural pattern extraction have been tested in a lightweight and simple multi-UAV simulation environment, with inexperienced operators. On the other hand, for the task of procedure following evaluation, a case study in a realistic UAS has been provided. Additionally, in order to prove the generality of this last approach, another case study in an external domain (longwall mining) is also provided.

The automation of the instructor tasks mentioned above may lead to the development of an all-in-one training analysis tool, which is useful not only for carrying out a deeper and more robust debriefing of the training sessions, but also for performing operator selection, adapting and improving the transfer of training, and predicting abnormal behaviour in real operations.


Contents

Summary and Conclusions
  Summary
  Conclusions and Future Work

Abstract

Contents

List of Figures

List of Tables

List of Acronyms and Symbols

I Report

1 Introduction
  1.1 Motivation of the dissertation
  1.2 Problem statement
  1.3 Research Questions
  1.4 Structure of the thesis
  1.5 Publications of the compendium and Contributions
  1.6 Other Publications and Contributions

2 Backgrounds on training operations in a UAS
  2.1 UAS simulation training
  2.2 Simulation Environments: Drone Watch And Rescue (DWR) and SAVIER Demonstrator
  2.3 Debriefing process for a training session on a UAS
    2.3.1 Performance Analysis of operators in a UAS
    2.3.2 Behavioural patterns of operators in a UAS
    2.3.3 Operating Procedures & Procedure Following Evaluation

3 Data-driven learning approaches for the analysis of performance and behavioural patterns
  3.1 Automatic retrieval of representative performance profiles
    3.1.1 Method
      3.1.1.1 Applying time series clustering on every performance measure separately
      3.1.1.2 Cluster assignation distance between performance profiles
      3.1.1.3 Medoid-based clustering to extract the most representative performance profiles
    3.1.2 Experimentation
      3.1.2.1 Evaluation criteria
      3.1.2.2 Results from applying the proposed method in DWR
      3.1.2.3 Comparative study between the proposed method and other multivariate time series clustering approaches
      3.1.2.4 Interpretation of the performance profiles found
  3.2 Modelling hidden patterns through Markovian Models
    3.2.1 Enriching the meaningfulness of the model states via MC-HMMs
      3.2.1.1 Method
      3.2.1.2 Experimentation
    3.2.2 Analysing the applicability of high order double chain Markov models
      3.2.2.1 Method
      3.2.2.2 Experimentation

4 Automatic Procedure Following Evaluation through time series-aware conformance checking
  4.1 Adapting Workflow Nets (WF-Nets) to time-based data: The Workflow Net with time series (TSWF-net)
    4.1.1 Time series log
    4.1.2 TSWF-net
  4.2 Basic time series-aware conformance checking over a TSWF-net
  4.3 Complete time series-aware conformance checking over a TSWF-net
  4.4 Experimentation
    4.4.1 Case Study 1: Basic conformance checking over the “Engine Bay Overheating” operating procedure in an Unmanned Aircraft System (UAS)
      4.4.1.1 Modelling the Operating Procedure (OP) as a TSWF-net
      4.4.1.2 Testing the basic conformance checking algorithm in the TSWF-net-based OP
    4.4.2 Case Study 2: Complete conformance checking over the process model of a longwall mining shearer

5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future lines of work

II Publications

1 Analysing temporal performance profiles of UAV operators using time series clustering
2 Modelling Behaviour in UAV Operations Using Higher Order Double Chain Markov Models
3 Automatic Procedure Following Evaluation Using Petri Net-Based Workflows

Bibliography

A Additional resources for the experimental evaluation of time series-aware conformance checking on Longwall mining processes

List of Figures

1.1 Summary of the objectives and contributions of this dissertation.

2.1 Screenshot of a simulation in Drone Watch & Rescue.

2.2 Timeline for a training mission, as studied in this work. Orange boxes remark the concepts that are studied and automated in this thesis.

3.1 Finding the best discrimination for each of the M performance measures separately.

3.2 Extracting the most representative performance profiles based on the clustering results obtained by following the process described in Figure 3.1.

3.3 Plots of two of the representative performance profiles found. Red lines mark times when an incident was triggered, and the green line indicates the moment when the mission preparation phase finishes and the execution phase starts. Each subplot contains the evolution of the six performance measures (S, A, At, C, Ag, P) comprising a performance profile.

3.4 5-state HMM fitted for DWR, analysed in terms of the operator interactions.

3.5 General scheme of the method of analysis in DWR with Multichannel Hidden Markov Models (MC-HMMs).

3.6 6-state MC-HMM fitted for DWR, analysed in terms of the parallel occurrences of operator interactions and mission events.

3.7 General scheme of the analysis process developed to find the best DCMM for modelling behaviour in a training operation with UAVs.

3.8 Heatmaps showing the best position of a DCMM in the top-10 ranking in terms of the three hyperparameters of a DCMM: M, l and f. The x-axis shows how the results vary when we modify the importance of the predictability of the model, and thus the importance of the interpretability too. White color is used when no model with such hyperparameters is found in the top-10 ranking.

4.1 Firing a transition in a TSWF-net.

4.2 Example of each of the conformance categories detected using the proposed algorithm. For all the examples, the ts-guard of transition t is “foo starts increasing” and its time scope is [0, 100]. Where applicable, the reversing time (r) is set to 100 time units.

4.3 Example of situations with multiple untreated contexts. In the situation shown on the left, repairing one of the invalid contexts (e.g. ⟨t, x⟩) makes the other disappear, since t and t′ share input tokens. On the other hand, the situation on the right can be fully repaired in two iterations of Algorithm 3.

4.4 Workflow process definition for the different types of procedural steps defined for any OP. Transitions drawn as thin lines represent instantaneous tasks (checks). Transitions drawn as rectangles represent an implicit delay in their execution.

4.5 Description and representation of the “Engine Bay Overheating” alert.

4.6 The general model of a shearer cycle. The numbers in brackets refer to the phases of the process.

4.7 Results of the complete conformance checking algorithm over two cases of the longwall mining time series log.

A.1 Context of the experimentation.

A.2 Representation of the process model “First half cycle of a shearer” as a TSWF-net.

List of Tables

2.1 Summary of the interactions and events from the simulator DWR, used in some of the experiments carried out in this thesis.

2.2 Specification summary for the test missions (T.M.) designed in DWR.

3.1 Parameter tuning for all the variables involved in the experiment.

3.2 Summary of the best validation results of the time series clustering applied to each of the performance measures used, corresponding to Step 1 of the method proposed in this work.

3.3 Validation results for the final clustering process of the proposed method. Bolded cells represent the best results obtained.

3.4 Comparative results, in terms of Pairwise Accuracy (P-Acc), between the proposed method and direct clustering approaches based on multivariate time series distances. The results are compared for different numbers of clusters (K2). While the bolded cell indicates the result obtained for the proposed method, cells in italics show the results that surpass our best value.

3.5 Results for the MC-HMM model selection. The bolded row indicates that the 6-state model is chosen, since it obtains good values for both the BIC and NRS measures.

3.6 Summary of the evaluation measures used to assess the quality of a Double Chain Markov Model (DCMM) in this work. These measures are divided into two groups, namely predictability and interpretability, depending on the aspect of the model covered by each of them. The column “Best” indicates whether the best models are achieved by maximizing (Max.) or minimizing (Min.) the value of the evaluation measure.

3.7 Top-5 DCMMs for DWR, evaluated in terms of several predictability and interpretability measures.

4.1 Test cases defined for this case study.

4.2 Results of the APFE algorithm for each test case (Test Scenario + Test Operator) developed in this case study. Bolded cells mark the cases where the test scenario represents a real alert and the operator followed the procedure successfully. t and f denote true and false respectively.

A.1 Description of the time series guards of the transitions of the process model “First half cycle of a shearer”.

A.2 Parameter tuning for the experiments.

A.3 Expressions of the time series guards of the transitions of the process model “First half cycle of a shearer”. Guards are expressed in terms of a set of log entries Δ ∈ E* (E = X × V × D) from a specific case of a time series log L = (V, U, X, Y). For simplicity’s sake, we assume that the set of log entries is sorted by time index, i.e., Δ can be expressed as Δ = {e_i = ⟨x_i, v_i, d_i⟩}_{i∈ℕ}, where x_i is the time index of the i-th log entry (x_i > x_{i−1} ∀i), v_i is the variable name and d_i the data value.


List of Acronyms and Symbols

APFE Automatic Procedure Following Evaluation.

BIC Bayesian Information Criterion.

DCMM Double Chain Markov Model.

DWR Drone Watch And Rescue.

GA Genetic Algorithm.

GCS Ground Control Station.

HMM Hidden Markov Model.

HRI Human-Robot Interaction.

HRT Human-Robot Team.

HSC Human Supervisory Control.

HSMM Hidden Semi-Markov Model.

KPP Key Performance Parameter.

MC-HMM Multichannel Hidden Markov Model.

OP Operating Procedure.

PFE Procedure Following Evaluation.

PN Petri Net.

RPAS Remotely Piloted Aircraft System.

TFF Time of First Fulfilment.

ToC Time of Completion.


TSWF-net Workflow Net with time series.

UAM Universidad Autónoma de Madrid.

UAS Unmanned Aircraft System.

UAV Unmanned Aerial Vehicle.

WF-Net Workflow Net.


Part I

Report


Chapter 1

Introduction

This chapter presents the motivation and overview of this dissertation. Firstly, Section 1.1 motivates the research questions that will be addressed later. After that, Section 1.2 briefly describes the current state of training operations in an Unmanned Aircraft System (UAS), the main research focus of this work, and the need for automating some of the tasks carried out by the instructor, which provides a basic framework for the research questions described in Section 1.3. Then, the dissertation structure is described in Section 1.4 and, finally, the main contributions and the associated publications related to this thesis are presented in Sections 1.5 and 1.6.

1.1 Motivation of the dissertation

An Unmanned Aerial Vehicle (UAV), commonly known as a drone, and referred to as a Remotely Piloted Aircraft (RPA) by the International Civil Aviation Organization (ICAO), is an aircraft without a human pilot aboard. In the last decade, the use of UAVs has become very popular, and it is expected to grow even more over the coming years [DN04]. This growth is driven by the interest of both industry and the research community in this type of system. On the one hand, the different potential applications provided, such as surveillance [PBC+09], disaster and crisis management [WZ06], agriculture or forestry [MC06], have attracted the interest of industry. On the other hand, the research community is also interested in UAVs due to the challenging problems that must be faced in different fields such as human-machine interfaces [RGCGBC16], air-to-air refueling [MAHO16], integration into non-segregated airspace [CJN14], voice and gesture commanding [MdBJG16], stress monitoring [HGAR15], augmented reality [RGY+15] and minimum-time target detection [PCBPLOdlC16], amongst others. Typically, the whole system involved in an operation with UAVs is known as a UAS, or Remotely Piloted Aircraft System (RPAS), and it is mainly composed of the UAVs, as well as the associated launch, recovery, and control hardware and software. Although these systems are increasingly autonomous, the role of UAS operators is still a critical aspect in guaranteeing mission success, due to the high costs involved in any real mission, especially in future scenarios where one single operator will be responsible for supervising multiple UAVs. Thus, UAV operators are intensively trained in simulation environments, where they are asked by instructors to face different situations and alerts, to make appropriate decisions, and to become able to solve the situation successfully in a real scenario [HRCB12].

Unfortunately, the increasing use of UASs has not been met with an appropriate integration of training science [BJBR+16]. Although several researchers are contributing to defining a formal framework of Knowledge, Skills and Attitudes (KSAs) to improve the effectiveness of UAS training methods [ITV13], the expectations of UAS growth have raised alarm among training instructors, due to the current lack of tools capable of evaluating and analyzing the performance of operators on a large (or even massive) scale. Most of the tasks carried out by an instructor during the mission debriefing, such as the analysis of the performance and behaviour of an operator, or the evaluation of compliance with an established operating procedure, are still performed manually and individually. This dissertation focuses on solving these problems by providing UAS instructors with automated methods to monitor, evaluate and analyse training operations on a large scale. The automation of the analysis in training processes has barely been studied so far, and it can provide many direct benefits not only to UAS training but also to classical manned aviation and to any supervisory control system. Since modern training systems register all the mission information, including vehicle telemetry, mission events and operator commands, the methods developed in this thesis are mainly data-driven. Thus, we can take advantage of techniques from different Artificial Intelligence and Data Analysis fields, such as unsupervised machine learning (clustering, Hidden Markov Models...), in order to automate the analysis of the performance and behaviour of the operators.

However, in some high-risk situations, such as the emergence of an alert during the mission, the operator must follow an operating procedure carefully. Thus, the evaluation and analysis of his/her response in these cases does not depend exclusively on the stored data, but on the relation between the data and the procedure. Here, it is necessary to rely on process mining techniques in order to automate the evaluation, since they support the analysis of data related to a business process.

1.2 Problem statement

On the left side of Figure 1.1, a graphical representation of the current situation of a training operation in a UAS is shown. Nowadays, the tasks of analysis are still performed manually [DKL+13]. Typically, an expert instructor assesses the behavior of a single operator during a training session, measuring different aspects of the operator response [DKL+13]. As stated in the previous section, one of the future problems related to training in UASs (see the central part of Figure 1.1) is the increasing use of this type of system expected in the coming years, which raises the issue of large-scale training operations. In the same way as before, a single instructor will likely need to monitor and analyze the behavior of multiple operators, which makes it difficult and costly to produce an objective and robust evaluation and analysis of all of them.

The solution to this problem lies in making the training data accessible to the instructor by providing intelligent and automated methods that extract information and patterns about the operator response in the different aspects required (see the right side of Figure 1.1). Specifically, in this work we are interested in three aspects of the evaluation and analysis of operators: 1. Performance, understood as a set of direct measures that “score” the quality of the operator in the mission. 2. Behavioral patterns, useful to understand the hidden cognitive processes and decision making behind the operator interactions, and also to predict the consequences of high workload and time pressure. 3. Procedure following, i.e., the evaluation of how accurately and

[Figure 1.1: Summary of the objectives and contributions of this dissertation. Panels: 1. Current Situation (instructor performing a manual analysis of the operator(s)); 2. Future Problem (a single instructor facing multiple operators); 3. Thesis Contributions (data-driven analysis producing performance profiles and behavioural patterns, plus automatic procedure following evaluation over an operating procedure).]

effectively operators respond to mission incidents while following a specific operating procedure or checklist. Regarding performance analysis, some previous approaches are devoted to creating a general profile of the operator based on the combination of a set of numerical measures that quantify different facets of the operator response during the mission, such as the agility, the aggressiveness or the number of mission tasks successfully accomplished. In order to analyse the most representative profiles among many operators, clustering techniques have been applied [RFMC16]. However, those profiles are the result of an aggregation of the whole mission data, and thus the instructor has no information about the evolution of the performance during the mission. Here, we go a step further in profile-based performance analysis by describing a method, based on time series clustering, to define and analyse a set of representative temporal profiles, where the evolution of the operator performance during a mission is the main unit of measure.

The analysis of direct performance measures of quality is usually tied to the specific system being analysed, and thus the profile-based methods cannot be applied if the system does not provide a good set of performance measures, or at least enough information to post-process them. One way to overcome this problem is to use the raw operator interactions directly to create behavioural models, which in turn allow the inference of underlying cognitive processes in the operator response. In this regard, the works of Boussemart et al. [Bou11] set the usage of Hidden Markov Models (HMMs) and derivatives as the state of the art in modelling and predicting knowledge-based supervisory tasks such as the control of UAVs. The method defined in that work aims to create models that contain interesting patterns for the instructor about the behaviour of operators, but it opens the way for several improvements.
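To ground this discussion, the following minimal sketch (in Python, assuming the hmmlearn library in a version that provides CategoricalHMM, and a purely hypothetical symbol encoding) shows how such a classical single-channel HMM can be fitted to the interaction sequences of several training sessions:

    import numpy as np
    from hmmlearn import hmm  # assumption: hmmlearn >= 0.2.8, which provides CategoricalHMM

    # Hypothetical encoding of discrete operator interactions as integer symbols
    SYMBOLS = {"select_uav": 0, "change_speed": 1, "edit_waypoint": 2, "change_mode": 3}

    def fit_operator_hmm(sessions, n_states=4):
        """Fit a discrete-emission HMM to a list of interaction sequences (one per session)."""
        X = np.concatenate([[SYMBOLS[s] for s in seq] for seq in sessions]).reshape(-1, 1)
        lengths = [len(seq) for seq in sessions]
        model = hmm.CategoricalHMM(n_components=n_states, n_iter=200, random_state=0)
        model.fit(X, lengths)
        # model.transmat_ and model.emissionprob_ characterise the hidden "cognitive" states
        return model

The number of hidden states would typically be selected with an information criterion; the extensions discussed next replace this single-channel view of the operator behaviour.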

On the one hand, since a classical HMM only allows modelling a single sequence of data, in our case the sequence of operator interactions, the resulting cognitive states may lack additional information regarding the changes in the course of the mission, which reduces the robustness


of the conclusions extracted in the analysis. In this work, we will study the applicability of Multichannel (or Multivariate) Hidden Markov Models (MC-HMMs) to enrich the states of the model with the usage of parallel sources of information: the interactions performed by the operators in the simulation environment, and the events that describe the course of the mission. On the other hand, the Markov assumption of classical HMMs (i.e., the current state only depends on the previous state and not on earlier ones) may be insufficient to detect long hidden patterns, and thus it is interesting to study the applicability of more general models that relax this assumption. In this sense, high order Double Chain Markov Models (DCMMs) provide flexible and fully Markovian modelling capabilities which can expand the study of behavioural modelling in UASs. Both the performance profiling and the behavioural modelling approaches can be gathered under the field of data-driven analysis, since they rely exclusively on the mission data, without any prior knowledge about the nominal behaviour expected in any phase of the mission course. However, due to the high costs involved in any mission established in a UAS, every critical step or possible failure is controlled by following the guidelines of a complete action checklist, as happens in manned operations [Joh09]. In this work, we will use the term Operating Procedure (OP) to encompass different step-by-step guiding tools, such as checklists, action checklists and Emergency Operating Procedures.

An instructor is in charge of controlling that the operator is correctly following the steps described in the OP. This is what we will henceforth call Procedure Following Evaluation (PFE). In order to automate this task, we must rely on a family of process mining techniques, known as conformance checking, focused on comparing process models (the OP) with data from the same process. Unfortunately, classical conformance checking techniques are not suitable for PFE, because they assume that the data associated with a case cannot change over time [dLvdA13], whereas the procedural steps of an OP in a UAS must sometimes be checked according to flexible and complex conditions involving the state of one or many variables in the log during a certain period of time. Thus, the basic elements and algorithms of conformance checking must be adapted to the paradigm of time-based data and time-aware processes, in order to solve, among other issues, the task of automatic procedure following evaluation.
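As a toy illustration of the time-aware conditions involved, the following sketch (hypothetical step names, variables and thresholds; pandas assumed, with telemetry indexed by timestamp) evaluates a two-step procedure against a telemetry log, a check that a purely event-log-based conformance checker cannot express directly:

    import pandas as pd

    # Hypothetical do-list: after an engine alert, throttle must stay below 20%
    # for at least 5 consecutive seconds, and the landing mode must be engaged.
    def throttle_reduced(log: pd.DataFrame) -> bool:
        # time-based rolling windows require the log to have a DatetimeIndex
        below = (log["throttle"] < 0.2).astype(float)
        return bool((below.rolling("5s").min() == 1.0).any())

    def landing_engaged(log: pd.DataFrame) -> bool:
        return bool((log["mode"] == "LANDING").any())

    PROCEDURE = [("reduce_throttle", throttle_reduced),
                 ("engage_landing", landing_engaged)]

    def evaluate_procedure(log: pd.DataFrame) -> dict:
        """Evaluate each time-aware step condition over the whole telemetry log."""
        return {name: check(log) for name, check in PROCEDURE}

A full time series-aware conformance checker, as developed in Chapter 4, must additionally verify the ordering and timing of the steps against the process model; this snippet only evaluates the step conditions in isolation.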

The automation of the instructor tasks mentioned above (see the right side of Figure 1.1) may lead to the development of an all-in-one training analysis tool, where the instructor is provided with graphical representations of the information extracted from the proposed methods of evaluation and analysis. This can be useful not only for carrying out a deeper and more robust debriefing of the training sessions, but also for performing operator selection, adapting and improving the transfer of training, and predicting abnormal behaviour in real operations. An example of how some of these methods can be implemented in a tool inside a UAS can be seen in [RARFGPC17].

1.3 Research Questions

This PhD Thesis aims to provide intelligence and automation to training operations in a UAS by supporting instructors in several debriefing processes, such as the analysis of operator performance, the extraction of behavioural patterns and the procedure following evaluation. To achieve these objectives, the main research questions of this thesis can be described as follows:

• Q1: Is it possible to automate the analysis of the normal evolution of the operator performance throughout the course of a mission?

• Q2: How can we extend the current HMM-based methods for behavioural modelling in UASs?

• Q3: Is it possible to automate the procedure following evaluation?

• Q4: Can we apply any of the proposed methods in a different domain than UASs?

1.4 Structure of the thesis

This PhD dissertation has been divided into two parts (Part I and Part II). The first one is devoted to a summarised description of all the methods developed to provide intelligence and automation to training operations in a UAS. The second one collects the main scientific publications obtained as a result of the work developed.

Part I has been structured in five chapters. A brief description of the contents of each chapter is given as follows:

• Chapter 1: Introduction. It provides the general context and motivations related to this dissertation. In addition, the main objectives and research questions are introduced, as well as the main contributions and publications generated in this work.

• Chapter 2: Backgrounds on training operations in a UAS. It briefly describes the context of a training mission in a UAS, the roles of the instructor and the operator, and the processes that will be studied and automated in this work. Furthermore, a review of the state of the art in training systems for UAV operations is provided, as well as an introduction to the simulation environment Drone Watch And Rescue (DWR), which has been designed to be used as a testbed for some of the experiments of this work.

• Chapter 3: Data-driven learning approaches for the analysis of performance and behavioural patterns. This chapter is concerned with the improvement of the current methods for the instructor tasks related to performance analysis and behavioural modelling, from an exclusively data-driven perspective. Regarding performance analysis, an approach based on time series clustering is proposed for creating representative profiles of the evolution of the operator performance. Regarding behavioural modelling, the study of Markovian models goes further than classical HMMs, adding the usage of multiple sources of information through MC-HMMs and studying the applicability of high order extensions of a DCMM. All the approaches are tested in the simulation environment DWR.

• Chapter 4: Automatic Procedure Following Evaluation through time series-aware conformance checking. It presents the approach taken to automate the problem of Procedure Following Evaluation through time series-aware conformance checking. The data, the models and the algorithms from classical conformance checking have been extended and adapted in order to provide support for time series-aware systems. Finally, two case studies are included in order to demonstrate the effectiveness of the proposed approach.


• Chapter 5: Conclusions and Future Works. The Research Questions described in Chapter 1 are addressed in order to provide some answers, based on the results obtained from this research. Finally, taking into account all the analysis carried out, a summary of the possible future works is presented.

Part II details the work developed to accomplish the objectives stated above by assembling three scientific publications. These publications and their contributions are stated in the next section.

1.5 Publications of the compendium and Contributions

The publications that make up the compendium of this dissertation, along with the contribution of the PhD candidate and their general contributions to this thesis, are the following:

(IJ-1) Rodríguez-Fernåndez, Víctor, Héctor D. Menéndez, and David Camacho. 2017. “Analysing Temporal Performance Profiles of UAV Operators Using Time Series Clustering.” Expert Systems with Applications 70: 103–18.
DOI: 10.1016/j.eswa.2016.10.044.
Impact Factor = 3.768 (JCR, 2017) [Q1; 20/132; Computer Science, Artificial Intelligence].

– Contribution: The contribution of this paper is presented in Section 3.1, which presents a method based on multivariate time series clustering to analyse a set of representative operator performance profiles, where the evolution of the performance during a mission is the main unit of measure.

– Contribution of the PhD candidate:

* First author of the article.

* Co-authoring in the conception of the presented idea.

* Definition of the measures that comprise a temporal performance profile.

* Design, implementation and testing of the proposed method.

* Design and implementation of the webapp used to evaluate the results.

* Design and execution of the experiments.

* Co-authoring in the interpretation and discussion of results.

* Writing of the manuscript with inputs from all authors, and design of the figures.

(IJ-2) Rodríguez-Fernåndez, Víctor, Antonio Gonzålez-Pardo, and David Camacho. 2018. “Modelling Behaviour in UAV Operations Using Higher Order Double Chain Markov Models.” IEEE Computational Intelligence Magazine 12 (4): 28–37.
DOI: 10.1109/MCI.2017.2742738.
Impact factor = 6.611 (JCR, 2017) [Q1; 9/132; Computer Science, Artificial Intelligence].

– Contribution: The contribution of this paper is presented in Section 3.2.2, which presents an extension of the study of behavioural modelling in UAV operations by using high order Double Chain Markov Models (DCMMs).

– Contribution of the PhD candidate:


* First author of the article.

* Co-authoring in the conception of the presented idea.

* Design and implementation of the proposed method.

* Author of the simulation environment and dataset.

* Design and execution of the experiments.

* Co-authoring in the interpretation and discussion of results.

* Writing of most of the manuscript and design of the figures.

(IJ-3) Rodríguez-Fernåndez, Víctor, Antonio Gonzålez-Pardo, and David Camacho. 2018. “Automatic Procedure Following Evaluation Using Petri Net-Based Workflows.” IEEE Transactions on Industrial Informatics 14 (6): 2748–59.
DOI: 10.1109/TII.2017.2779177.
Impact factor = 5.430 (JCR, 2017) [Q1; 5/105; Computer Science, Interdisciplinary Applications].

– Contribution: The contribution of this paper is presented in Chapter 4, where the task of procedure following evaluation is automated and applied to the operating procedures of a realistic UAS.

– Contribution of the PhD candidate:

* First author of the article.

* Co-authoring in the conception of the presented idea.

* Formalization of the modelling concepts proposed.

* Implementation of the proposed algorithms.

* Design and simulation of the use cases.

* Co-authoring in the interpretation and discussion of results.

* Writing of the manuscript with inputs from all authors, and design of the figures.

1.6 Other Publications and Contributions

Besides the main publications of the compendium, other contributions that support the research of this work have been generated during the development of this thesis. These publications are organized by journals and conferences, and sorted by year.

International Journals

(IJ-4) Rodríguez-Fernåndez, Víctor, Héctor D. Menéndez, and David Camacho. 2016. “Automatic Profile Generation for UAV Operators Using a Simulation-Based Training Environment.” Progress in Artificial Intelligence 5 (1): 37–46. Ed. by Springer-Verlag.
DOI: 10.1007/s13748-015-0072-y.

• Contribution: This contribution is a previous work on clustering-based performance profiling that motivates the study of time-based performance profiles presented in Section 3.1.


(IJ-5) Rodríguez-Fernåndez, Víctor, Héctor D. Menéndez, and David Camacho. 2017. “A Study on Performance Metrics and Clustering Methods for Analyzing Behavior in UAV Operations.” Journal of Intelligent and Fuzzy Systems 32 (2): 1307–19. Ed. by IOS Press.
DOI: 10.3233/JIFS-169129.
Impact Factor = 1.426 (JCR, 2017) [Q3; 76/132; Computer Science, Interdisciplinary Applications].

• Contribution: This contribution is a previous work on clustering-based performance profiling that motivates the study of time-based performance profiles presented in Section 3.1.

International Conferences

(IC-1) Rodríguez-Fernåndez, Víctor, Antonio Gonzålez-Pardo, and David Camacho. 2015. “Modeling the Behavior of Unskilled Users in a Multi-UAV Simulation Environment.” In Intelligent Data Engineering and Automated Learning – IDEAL 2015: 16th International Conference, Wroclaw, Poland, October 14-16, 2015, Cham: Springer International Publishing, 441–48.
DOI: 10.1007/978-3-319-24834-9_51.
CORE Conference Ranking: C (CORE 2017).

• Contribution: This contribution is related to Section 3.2, and presents the HMM that provides the basis for improving the Markovian-based modelling methods in UASs, as applied in this work.

(IC-2) Rodríguez-Fernåndez, Víctor, Antonio Gonzålez-Pardo, and David Camacho. 2016. “Finding Behavioral Patterns of UAV Operators Using Multichannel Hidden Markov Models.” In 2016 IEEE Symposium Series on Computational Intelligence, SSCI 2016, Athens, Greece, December 6-9, 2016, IEEE, 1–8.
DOI: 10.1109/SSCI.2016.7850101.
CORE Conference Ranking: C (CORE 2017).

• Contribution: This contribution is related to Section 3.2.1, which presents the idea of using Multichannel Hidden Markov Models (MC-HMMs) to enrich the meaningfulness of the model states.

(IC-3) Rodríguez-Fernåndez, Víctor, Antonio Gonzålez-Pardo, and David Camacho. 2016. “A Method for Building Predictive HSMMs in Interactive Environments.” In 2016 IEEE Congress on Evolutionary Computation (CEC), IEEE, 3146–53.
DOI: 10.1109/CEC.2016.7744187.
CORE Conference Ranking: B (CORE 2017).

• Contribution: This contribution is related to Section 3.2, and studies the applicability of Hidden Semi-Markov Models (HSMMs) to the data extracted from DWR.

(IC-4) Ramírez-Atencia, Cristian, Víctor Rodríguez-Fernåndez, Antonio Gonzålez-Pardo, and David Camacho. 2017. “New Artificial Intelligence Approaches for Future UAV Ground Control Stations.” In 2017 IEEE Congress on Evolutionary Computation, CEC 2017, Donostia, San Sebastiån, Spain, June 5-8, 2017, IEEE, 2775–82.
DOI: 10.1109/CEC.2017.7969645.
CORE Conference Ranking: B (CORE 2018).


• Contribution: This contribution is related to Chapter 4, and presents an implementation of some of the Automatic Procedure Following Evaluation (APFE) approaches described in that chapter in a UAS-like simulation environment, including a graphical tool to represent the results of the procedure following.

Submitted International Journals

(SIJ-1) Rodríguez-Fernåndez, Víctor, et al. 2017. “Conformance Checking for Time Series-aware Processes.” Transactions on Industrial Informatics. Ed. by IEEE. Submitted in June 2019.
Impact factor = 5.430 (JCR, 2017) [Q1; 5/105; Computer Science, Interdisciplinary Applications].

• Contribution: This contribution is related to Chapter 4, and tackles the problem of conformance checking between a business process and the data produced by its execution in cases where the data is not given as an event log, but as a set of time series containing the evolution of the variables involved in the process. The algorithms are applied in the domain of longwall mining processes.


Chapter 2

Backgrounds on training operations in a UAS

This chapter provides the background related to the study of training operations in a UAS, placing particular emphasis on the tasks of evaluation and analysis carried out by the instructor that will be the subjects of research in this work: performance analysis, behavioural pattern analysis and procedure following evaluation.

The rest of the chapter is structured as follows: Section 2.1 provides a brief overview of training operations in UASs, including both research and industrial perspectives. Then, Section 2.2 presents the simulation environments that will be used as a testbed for some of the experiments of this thesis: DWR and the SAVIER demonstrator. Finally, Section 2.3 gives some background on each of the debriefing processes mentioned above.

2.1 UAS simulation training

UAS Simulation Training allows UAV operators to train in real time to operate UAVs in a virtual environment that is realistic and accurate, but without the risks and constraints of a real flight. UAS simulation may include:

• Simulated UAVs: The system allows training on both helicopter and fixed-wing UAVs.

• Training with the actual ground station: The system uses real data generated by the actual UAV autopilot to provide an extremely realistic simulation.

• Virtual world: The world in which the UAV flies is modeled in 3D with photo textures, and contains all the necessary features to simulate a mission in operational conditions.

• Payload simulation: The system generates real-time video to simulate the payload output in both visible and infrared modes. This video is piped to the actual video station as in the real system.

• Simulation features: Multilingual interface, simulated weather functions, display options (flight panel, UAV trajectory), and display of telemetry data [HS09].


Whereas the technological evolution of UASs is impressive and these systems are in high demand, the community is struggling to keep up with the training requirements associated with that demand. In [BJBR+16], the authors discuss the need for adaptive training experiences, and several current training issues for UASs from a Human System Integration perspective: 1. The need to define standardized training requirements, following a known job analysis process such as the Mission Essential Competency (MEC) methodology; 2. Training environment limitations, which refer to the importance of, on the one hand, having an accurate replication of the UAS environment for an optimal transfer of training, and, on the other hand, taking into account how the information is presented to the learner; 3. Human cognitive requirements, which involve leveraging the basic Knowledge, Skills and Abilities (KSA) required to learn the task at hand. These skills include comprehension of verbal, written, and visual information, decision making, spatial ability, attention, control precision, time sharing, multitasking and situational awareness [How11]; 4. Social limitations, which point to the need for collaborative skills due to the high impact that collaboration can have on performance after training.

The most advanced architecture and simulation software related to UAS training is used in the United States Armed Forces, in different branches of service such as the Air Force, the Army, the Navy and the Marine Corps [oDUSoA12]. However, UAS training also has a place in domestic/civil sectors such as weather research [Dar12], police forces [ST16] and firefighting [Rob14]. Even at the level of higher education, UAS training is found in relation to the training and certification of UAS operators, through official undergraduate programs with options for obtaining Bachelor, Master or PhD degrees [Uni19].

2.2 Simulation Environments: DWR and SAVIER Demonstrator

Computer simulations of UASs are an emerging topic. There are at least three motivations for this type of simulator: the role of simulators in the adoption of new technology, their potential for low-cost training, and their utility in research. The four criteria used to judge the quality of any virtual simulator are defined in [ABSW05]: 1. Physical Fidelity; 2. Functional Fidelity; 3. Ease of Development; 4. Cost.

In this thesis we are interested in the research utility of a simulator, its multi-UAV capabilities, and its potential for low-cost training, especially in terms of the data that can be retrieved during the training operations. For this reason, the simulation environment used as the basis for the experimentation of some methods presented here has been designed following the criterion of data availability. It has been named Drone Watch And Rescue (DWR), and its complete description can be found in [RFMC15]. A screenshot of a simulation in DWR is shown in Figure 2.1.

DWR gamifies the concept of a multi-UAV mission, challenging the operator to capture all mission targets while consuming the minimum amount of resources and avoiding the possible incidents that may occur during a mission. To avoid these incidents, an operator in DWR can perform multiple interactions to alter both the UAVs in the mission and the waypoints comprising their mission plan. The list of high-level interactions and mission events recorded in DWR is shown in Table 2.1.

The simulator DWR was deployed on a server located at Universidad Autónoma de Madrid (UAM). The testers of the simulator were computer engineering students of the same university,


Figure 2.1: Screenshot of a simulation in Drone Watch & Rescue.

all of them inexperienced in this type of system. A total of 4 test missions of varying complexity were designed for the training process. Table 2.2 summarizes the main features of each of them. As can be appreciated, the missions are ordered by increasing challenge. The last row of the table indicates the number of simulation logs stored in our database for the corresponding test mission. A total of 285 simulations make up the database, operated by more than 30 different operators.

The data logs from DWR have been used as the basis for all the approaches proposed in the next chapter (Chapter 3), because they need a relatively large amount of data to work properly, which is hard to find nowadays in a realistic and professional environment, given the futuristic nature of this type of system. However, with regard to the process mining-based approaches proposed in Chapter 4 for dealing with the automation of the task of procedure following evaluation, a more realistic simulation environment has been employed: the SAVIER demonstrator.

The SAVIER demonstrator is a realistic Ground Control Station (GCS) designed by Airbus Defence & Space for the command and control of one UAV. It features a Human Machine Interface (HMI) familiar to expert operators, and general GCS functionalities already validated. Its main goal is to serve as a product in which the research contributions from the SAVIER project¹ (minimum time search, training, stress, security, augmented reality, gesture and voice control...) can be integrated and validated. The data logs produced in this system follow the message formats and variables described in the NATO Standardization Agreement 4586 (STANAG 4586) [FP13], which record the state of the UAV and the GCS periodically during a mission.

1 SAVIER: Situational Awareness VIrtual EnviRonment.


Table 2.1: Summary of the interactions and events from the simulator DWR, used in some of the experiments carried out in this thesis.

Operator Interactions:
  SU  | Select UAV              | Allows the operator to focus, monitor and send commands to a specific UAV
  CSS | Change Simulation Speed | Increases or decreases the simulation speed. UAV missions usually last many hours, so it is sometimes desirable to accelerate the process to allow fast simulation-based training
  CUP | Change UAV Path         | Add/edit/remove waypoints of any UAV using the main screen
  MWT | Modify Waypoint Table   | Edit/remove waypoints of any UAV using the waypoint table
  CM  | Change Control Mode     | Control modes in DWR manage how a user can change a UAV path (Monitor mode, Add waypoints, Manual mode)
  CUS | Change UAV Speed        | Change the speed of a selected UAV

Mission Events:
  IS  | Incident Started        | A new incident starts
  IE  | Incident Ended          | An incident ends (either because the operator opposed it, or because it had a scheduled end time)
  TD  | Target Detected         | A UAV detects a target
  DD  | Drone Destroyed         | A UAV is destroyed
  AS  | Action Started          | A UAV starts an action (Refueling, Landing...) or a mission task (Surveillance)

On the other hand, the nominal responses to the warnings and alerts triggered by the system are described as do-list written operating procedures, all of which are gathered in the Airbus manual for emergencies in UAV missions².

2 References and specific details from these procedures have not been included due to a confidentiality agreement with Airbus Defence & Space.


Table 2.2: Specification summary for the test missions (T.M.) designed in DWR.

                               T.M.01       T.M.02       T.M.03       T.M.04
Map extension (km)             440 × 160    430 × 500    800 × 500    800 × 500
Objective                      Supervision  Supervision  Supervision  Planning & Supervision
# UAVs                         1            1            3            3
# Tasks                        1            1            4            0
# Targets                      1            1            4            4
# Incidents                    2            2            4            4
# No Flight Zones              1            2            4            4
# Refueling Stations           1            3            4            4
# Simulations in the database  142          31           81           31

[Figure 2.2: Timeline for a training mission, as studied in this work, divided into pre-execution (scenario and task setting, mission design, mission planning), execution (new task arrivals and mission replanning; triggered alerts and alert responses following an operating procedure) and debriefing (performance analysis, behavioural modelling, procedure following evaluation). Orange boxes remark the concepts that are studied and automated in this thesis.]

2.3 Debriefing process for a training session on a UAS

Figure 2.2 shows the timeline for a training mission in a UAS, as studied in this work. The responsibilities of the mission are divided between two roles:

• The operator(s), who are responsible for designing, planning and supervising the state of the mission to guarantee that the mission objectives are fulfilled.

• The instructor, who is responsible for defining the objectives to be accomplished in the mission. He/she can also act during the mission execution, triggering alerts so that the operator is forced to respond by following the corresponding operating procedures. After the mission has ended, he/she will provide information about the performance of the operator, the most common failure patterns and the level of procedure following.

In this thesis, we are focused on helping the instructor of a training operation by improving


the analysis (or debriefing) phase of the mission, with the automation of some tasks that are still performed rudimentarily: 1. Performance analysis; 2. Behavioural modelling; 3. Procedure Following Evaluation (PFE) over Operating Procedures or checklists.

2.3.1 Performance Analysis of operators in a UAS

The performance of operators in a UAS must be evaluated following the criteria of the field of Human Supervisory Control (HSC) in Human-Robot Interaction (HRI) systems. According to the research of Crandall et al. in [CC07], the different metric classes (sets of metrics) defining the effectiveness of a Human-Robot Team (HRT) should:

• Contain the Key Performance Parameters (KPPs): A KPP is a measurable quantity that, while often only measuring a sub-portion of the system, indicates the overall effectiveness of the team.

• Identify the limits of the agents in the team: It is necessary to measure the capacity of both the human operator and the robots in the team.

• Have predictive power: The metrics need to be capable of generalizing and predicting the effectiveness of the system under uncertain or untested conditions.

For the goals of this thesis, we focus on the metric class of human performance. The most common metrics to assess human performance in HRI systems focus on the operator workload and situational awareness [DSY03]. However, it is also interesting to define some metrics that capture the performance of an operator in a direct way, as a global score indicating the performance quality. These metrics, also known as direct measures of performance quality, create an operator profile, and are widely used, for example, in the world of videogames [Beg00].
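As a minimal illustration of such a direct measure, a hypothetical scoring rule (not the exact definitions used in DWR) can be accumulated over the mission event symbols of Table 2.1:

    def score_trace(events, n_targets, n_uavs):
        """Hypothetical direct performance measure: rewards detected targets (TD)
        and penalises destroyed UAVs (DD); returns its evolution over the events."""
        detected = destroyed = 0
        trace = []
        for symbol in events:  # event symbols ordered by time, e.g. ["TD", "DD", ...]
            detected += symbol == "TD"
            destroyed += symbol == "DD"
            trace.append(max(0.0, detected / n_targets - 0.5 * destroyed / n_uavs))
        return trace

    print(score_trace(["TD", "DD", "TD"], n_targets=4, n_uavs=3))  # [0.25, 0.0833..., 0.3333...]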

2.3.2 Behavioural patterns of operators in a UAS

The analysis of human behavioral patterns has been studied in different research fields and applied in a wide variety of applications. As an example, applied psychologists traditionally use behavioral models to understand the theoretical aspects of human decision making [PBR12]. Another application of computational models of human cognitive processes is to predict the consequences of high workload and time pressure [GJA92]. Recent approaches for modeling user interactions are exclusively data-driven, and rely on pattern recognition techniques to predict future behaviors from user interface events. Some of the most popular modeling techniques in this field are tree-based models [GO03], Bayesian Networks [HPJ10], Clustering [RFMC16] and Markov-based models [AK12].

In the field of UAV operations, the study of pattern recognition and operator modelling is undoubtedly led by the group headed by M.L. Cummings at the Massachusetts Institute of Technology. Their work to model and predict operator behavioral patterns consists of building HMMs [Vis11] and HSMMs [Yu10] representing behavioral states from the clicks that an operator makes during a simulation of a multi-UAV mission, where the operator is responsible for replanning tasks [BC11] or for the detection of UAV hacking [ZCE+19]. Apart from


the good results shown in these works, it is remarkable to notice the conclusions they reach when comparing supervised vs. unsupervised learning techniques for creating operator models for multi-UAV systems. They argue that, because multi-UAV systems are still futuristic developments, it is impossible to trust any expert trying to label the operator interactions in order to make an objective supervised analysis; hence we can only work in this field by using unsupervised learning techniques [BCFR11].

2.3.3 Operating Procedures & Procedure Following Evaluation

The number of human-dependent critical tasks is increasing every day in a large number of jobs. To reduce the risk involved in them, a step-by-step guiding tool is usually given, describing the different checks and actions that the person responsible for the task, namely the operator, must perform in order to solve it successfully. These guiding tools are referred to in this work as Operating Procedures (OPs).

An Operating Procedure (OP), also known as a checklist or action plan in many domains, is a list of actions or criteria arranged in a systematic way, commonly used in areas such as aviation [OP14] or healthcare [UGS+14] to ensure the success of critical tasks and to help decrease human errors. In these areas, operators are trained intensively to follow the OP carefully, but the evaluation of how they are following it is usually performed manually by an expert instructor [DKL+13]. This is what we will henceforth call Procedure Following Evaluation (PFE). Automating this evaluation process would lead to an objective and scalable analysis of the operator response, which is extremely important not only for the scalability of training operations, but also for extracting objective measures about the performance of the operators in this context.


Chapter 3

Data-driven learning approaches for the analysis of performance and behavioural patterns

In this chapter, we will study the applicability of automated methods to create behavioural models of UAV operators based exclusively on the sequential data that is recorded during a training mission in a simulation environment, i.e., we will not make use of any prior knowledge of the nominal behaviour that an operator must follow. In the domain of multi-UAV systems there is still a severe shortage of experts able to label the data objectively, which would be required for a supervised analysis. Thus, we can only work in this field by using unsupervised learning techniques [BCFR11].

This chapter is structured as follows: The first section describes a method that relies on a set of performance measures as a way to create a model of representative performance profiles through time series clustering. Then, the next section proposes some improvements to the state-of-the-art methods for cognitive task analysis in supervisory control systems, which try to discover hidden cognitive processes from the patterns of visible events extracted from human behaviour, via HMMs and derivatives. All the experiments will be carried out in the simulation environment DWR, which was presented in Section 2.2.

The contents of this chapter are related (mainly) to Publications 1 and 2 from the compendiumÂč.

3.1 Automatic retrieval of representative performance profiles

This section is focused on proposing a method to automatically extract the most representative performance profiles from a simulation environment. A performance profile comprises a set of time series (i.e., it is a multivariate time series) representing the evolution of a number of performance measures throughout the execution of a mission. Obtaining and analyzing the

1 The contents of Section 3.2.1 can be found in [RFGPC16].


most representative performance profiles is really useful for improving the quality of simulation-based training systems, since it can help not only to exploit general behavioral patterns among simulations, but also to detect off-nominal performances and to study whether the behavior of a specific operator changes when he/she encounters dangerous situations.

In previous works [RFMC16, RFMC17a], the performance measures were computed globally, in an aggregated (or static) way, hence every simulation was described as a numeric tuple $(m_1, m_2, \ldots, m_M)$ (assuming that $M$ metrics have been defined), where each metric $m_i$ was represented by a value in the range $[0, 1]$, with 0 being the worst performance for that metric, and 1 the best. However, in this work every performance measure is defined dynamically as a time series, so that not only can we analyze the general performance of a simulation, but also study its evolution and detect the time intervals where the values of a specific measure tend to increase or decrease.

3.1.1 Method

Given a log of simulations and a set of $M$ performance measures, this process blindly computes those measures for each simulation and extracts the most representative profiles using a two-step clustering-based process. At the end of the process, several representative profiles are generated, ready to be analyzed and described by a domain expert. The steps into which this method can be divided are detailed below. In Figures 3.1 and 3.2, a graphical overview of this process is shown. The method is composed of three main steps: 1. time series clustering on each performance measure separately; 2. cluster assignation distance between performance profiles; 3. medoid-based clustering to extract the most representative performance profiles.

3.1.1.1 Applying time series clustering on every performance measure separately

By using a set of $M$ time-dependent performance measures, every simulation is processed and transformed into $M$ time series, i.e., into an $M$-dimensional time series. Each dimension represents the evolution of a performance measure. This multivariate time series comprises the profile of that simulation, namely the performance profile.
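In code, a performance profile is then naturally a T × M array; a minimal sketch with synthetic data (the measure names anticipate the six measures defined in Section 3.1.2):

    import numpy as np

    MEASURES = ["S", "A", "At", "C", "Ag", "P"]  # one column per performance measure

    rng = np.random.default_rng(0)
    # Synthetic profile: a bounded random walk per measure, T = 300 samples in [0, 1]
    profile = np.clip(0.5 + np.cumsum(rng.normal(0, 0.02, size=(300, len(MEASURES))), axis=0), 0, 1)
    # profile[t, m] is the (cumulative) value of measure m at sample t;
    # profiles from different simulations may have different lengths T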

The first step in this method consists of extracting patterns among the $M$ performance measures (i.e., among the $M$ dimensions) separately. For this purpose, we will make use of time series clustering techniques. A graphical overview of this step of the method is shown in Figure 3.1.

In order to perform time series clustering, we need to fix three important parameters: 1. the time series dissimilarity measure ($\mu$); 2. the clustering method ($\xi$); 3. the number of clusters ($k_1$). Since we have no prior information about the different groups into which each performance measure can be discriminated, we will compute different clustering solutions using different values of $\mu$, $\xi$ and $k_1$. Then, in order to automatically decide which is the best discrimination for each performance measure, the results of all those clusterizations will be assessed by three internal validation indices, based on the works of Hennig et al. [HL13]ÂČ: 1. Average Silhouette Width (ASW); 2. Calinski and Harabasz index (CH); 3. Pearson version of Hubert's $\Gamma$ (PH).

2 More details are presented in Publication 1 [RFMC17b].


[Figure 3.1: Finding the best discrimination for each of the M performance measures separately. For each measure, clustering solutions over combinations of time series dissimilarity metrics (ÎŒ), clustering methods (Ο) and numbers of clusters (k) are assessed with the validation indices ASW, CH and PH, and the combination maximising the Validation Rating (VR) is kept.]

In order to automatically choose the best discrimination based on these validation indices, we define a final Validation Rating (VR), which balances the scores obtained for each of the indices defined above. Since all the indices defined denote better clusterizations when maximized, the Validation Rating (VR) is defined as:

\[
VR(\mu, \xi, k) = \frac{ASW(\mu, \xi, k)}{\max_{\mu,\xi,k} ASW} + \frac{CH(\mu, \xi, k)}{\max_{\mu,\xi,k} CH} + \frac{PH(\mu, \xi, k)}{\max_{\mu,\xi,k} PH}, \tag{3.1}
\]

where $k$ refers to a specific number of clusters tested in the validation process, $\mu$ refers to a time series dissimilarity metric and $\xi$ to a clustering method. Using the criterion of Eq. 3.1 allows us to choose a discrimination that may not be the best in one of the validation indices, but guarantees reasonable values in all of them. The combination of parameters $\mu$, $\xi$ and $k_1$ whose clustering


[Figure 3.2: Extracting the most representative performance profiles based on the clustering results obtained by following the process described in Figure 3.1. The per-measure cluster assignations form a cluster assignation matrix, from which the Cluster Assignation Distance is computed and fed to a PAM clustering, validated with ASW, CH and PH for each number of clusters (k2).]

result maximizes the value of $VR(\mu, \xi, k)$ will be chosen and passed to the next step.
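A minimal sketch of this selection step, given the raw validation indices computed for every tested combination (ÎŒ, Ο, k):

    def select_best_clustering(results):
        """results: dict mapping (mu, xi, k) -> {'ASW': ..., 'CH': ..., 'PH': ...}.
        Returns the parameter combination maximising the Validation Rating of Eq. 3.1."""
        maxima = {idx: max(r[idx] for r in results.values()) for idx in ("ASW", "CH", "PH")}
        vr = {params: sum(r[idx] / maxima[idx] for idx in maxima) for params, r in results.items()}
        return max(vr, key=vr.get)

    best = select_best_clustering({
        ("frechet", "PAM", 8): {"ASW": 0.586, "CH": 608.7, "PH": 0.389},
        ("dtw", "AGNES", 3): {"ASW": 0.42, "CH": 500.0, "PH": 0.30},
    })  # -> ("frechet", "PAM", 8)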

3.1.1.2 Cluster assignation distance between performance profiles

Once the previous step is finished, the $M$ performance measures for all the simulations in the dataset have been clustered into groups of shared temporal behaviour, sharing features such as the monotony, or the minimum and maximum values reached. The next step consists of defining the similarity between two performance profiles based on the definition of those clusters. This part of the method is based on the work of Menendez et al. [MVC14].

Let $\{C_i^m\}_{i=1}^{k_m}$ be the clusters obtained after applying time series clustering on the $m$-th performance measure. Note that the number of clusters, $k_m$, can vary depending on the measure referred to. Each of the $N$ performance profiles belongs to one cluster per measure. Denoting by $c_n^m$, $1 \le n \le N$, $1 \le m \le M$, the assignation of the $n$-th performance profile to a cluster of the $m$-th performance measure, with $c_n^m \in \{C_1^m, \ldots, C_{k_m}^m\}$, we can build an $N \times M$ matrix containing the cluster assignations for all the simulations in the dataset:

\[
\begin{pmatrix}
c_1^1 & \cdots & c_1^M \\
\vdots & \ddots & \vdots \\
c_N^1 & \cdots & c_N^M
\end{pmatrix} \tag{3.2}
\]

Rows in Eq. 3.2 represent different simulations and columns account for each of the $M$ performance measures used. Given this matrix, we define a dissimilarity measure between two


simulations (rows) based on the number of cluster assignations shared between them. Formally, the Cluster Assignation Distance (CAD) between two performance profiles $s_i$ and $s_j$ is defined as:

\[
CAD(s_i, s_j) = 1 - \frac{\sum_{m=1}^{M} \delta_{i,j}^m}{M}, \tag{3.3}
\]

where $M$ is the number of performance measures considered, and $\delta$ is the Kronecker delta (i.e., the coincidences) defined as:

\[
\delta_{i,j}^m =
\begin{cases}
1 & \text{if } c_i^m = c_j^m \\
0 & \text{otherwise}
\end{cases}
\]
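A direct NumPy sketch of Eq. 3.3, computing the full pairwise CAD matrix from the cluster assignation matrix of Eq. 3.2:

    import numpy as np

    def cad_matrix(assignments):
        """assignments: (N, M) integer array, assignments[n, m] = cluster of profile n
        for measure m. Returns the (N, N) matrix of pairwise CAD values (Eq. 3.3)."""
        N, M = assignments.shape
        D = np.empty((N, N))
        for i in range(N):
            coincidences = (assignments == assignments[i]).sum(axis=1)  # delta summed over m
            D[i] = 1.0 - coincidences / M
        return D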

3.1.1.3 Medoid-based clustering to extract the most representative performance profiles

Given the cluster assignation matrix (Eq. 3.2) and the cluster assignation distance (Eq. 3.3), the pairwise dissimilarity matrix among all the simulations can be computed and used as input for a conventional clustering algorithm. In this case, since we are interested in analyzing the most representative performance profiles, we will apply a medoid-based clustering algorithm to group the simulations based on the defined dissimilarity metric, and extract the medoids of each of the resulting clusters.

For this work, the medoid-based clustering algorithm used in this last step is the classical Partitioning Around Medoids (PAM) method [KR87]. However, as in the first round of clustering of this method, an optimal number of clusters (or medoids in this case), namely $k_2$, needs to be established. The process to select this value is the same as in the previous step, i.e., we assess several possibilities via a set of validation indices and keep the one maximizing a balanced rating among all of them (see Eq. 3.1). After that, the optimal medoids are obtained and constitute the most representative performance profiles in the dataset. The analysis of these medoids, carried out by a domain expert, provides helpful information about the behavioural patterns followed in the simulations and the causes that increase or decrease the performance of an operator over time.
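A sketch of this final step, assuming the scikit-learn-extra package for PAM; for brevity, only the silhouette index (ASW) is used here to select k2, whereas the actual method balances ASW, CH and PH through Eq. 3.1:

    import numpy as np
    from sklearn.metrics import silhouette_score
    from sklearn_extra.cluster import KMedoids  # assumption: scikit-learn-extra installed

    def representative_profiles(D, k_values=range(3, 9)):
        """D: (N, N) precomputed CAD matrix. Returns the medoid indices (the most
        representative performance profiles) of the best-scoring k2."""
        best_score, best_medoids = -np.inf, None
        for k in k_values:
            km = KMedoids(n_clusters=k, metric="precomputed", method="pam",
                          random_state=0).fit(D)
            score = silhouette_score(D, km.labels_, metric="precomputed")
            if score > best_score:
                best_score, best_medoids = score, km.medoid_indices_
        return best_medoids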

3.1.2 Experimentation

In this section, the proposed method is tested using a database of simulations from DWR. All the operators that participated were inexperienced in this type of system. The mission used in the experiment was Test Mission 3 (see Section 2.2 for more information about the test missions of DWR and the database of simulation logs).

The information retrieved by DWR throughout a mission is used to define a set of performance measures that assess the performance of an operator in a specific simulation, and thus comprise a performance profile in this simulation environment. A total of six performance measures have been defined: Score (S), Agility (A), Attention (At), Cooperation (C), Aggressiveness (Ag) and Precision (P). All of them take values in the range $[0, 1]$, and are defined cumulatively over time. This means that, given an instant $t$ in the simulation time, the value of a performance measure depends on the information retrieved from time 0 (simulation start time) to time $t$. Following this, a performance profile $s$ is defined as a multivariate time series with the 6-tuple:

$(S(s), A(s), At(s), C(s), Ag(s), P(s))$


A complete description of these performance measures can be found in Publication 1 [RFMC17b]. For the sake of reproducible research, the source code related to this experimentation is available on GitHubÂł.

3.1.2.1 Evaluation criteria

In order to perform an external evaluation of the clustering results obtained with this method, and to compare them objectively against other clustering approaches, we have created a ground truth dataset based on collective human judgement, inspired by the work of Afnan et al. in [ASSB+16], where the similarity of a set of mobile apps is rated manually by several users. The ground truth is created by asking users to rate the similarity of pairs of time series, corresponding to the evolution of a specific performance measure between two randomly selected simulations executed in DWR. Ratings are given on a 5-star rating system [PL05], where 1 star indicates the lowest possible similarity and 5 the highest.

Formally, let $S_p = \{(s_i, s_j)\}_{i,j=1}^{N}$ be the set of all possible pairs (ignoring order) of simulations in our simulation dataset $S$. The ground truth can then be defined as a function $ASR : S_p \to [1, 5]$, where ASR refers to the Average Similarity Rating between two simulations $s_i$ and $s_j$, computed by averaging the ratings between them.

To measure the accuracy of a clustering result against this type of ground truth, we check whether the pairs of simulations rated with high similarity are assigned to the same cluster or not. If the ASR between a pair of simulations in the ground truth is greater than a given match acceptance threshold ($\theta_A$) (a value between 1 and 5), then the ground truth is saying that they are very similar, so a given clustering solution should locate them in the same cluster. On the contrary, if the ASR between the two simulations is lower than a given match rejection threshold ($\theta_R$), the clustering solution is expected to place the simulations into different clusters. If the ASR falls between the two thresholds, then we consider that the human judgment is not decisive, and that pair is not taken into account. With this, we define the Pairwise Accuracy (P-Acc) as the percentage of concordance between the clustering solution and the ground truth over every pair of simulations⁎.
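A sketch of the P-Acc computation under these two thresholds (hypothetical data structures; the exact formulation is given in Publication 1, Algorithm 1):

    def pairwise_accuracy(cluster_of, asr, theta_a=3.5, theta_r=2.5):
        """cluster_of: dict sim_id -> cluster label; asr: dict (sim_i, sim_j) -> ASR in [1, 5].
        Returns the percentage of decisive pairs on which clustering and ground truth agree."""
        hits = total = 0
        for (i, j), rating in asr.items():
            if theta_r < rating < theta_a:
                continue  # human judgement not decisive for this pair
            same_cluster = cluster_of[i] == cluster_of[j]
            should_match = rating >= theta_a
            hits += same_cluster == should_match
            total += 1
        return 100.0 * hits / total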

A total of 3 raters have contributed to the creation of the ground truth data. A total of 1742 ratings were gathered, covering 82% of all possible cases. To measure the degree of consistency among raters, Kendall's Coefficient of Concordance ($W$) is used [KS39]. The achieved coefficient is 0.58 (p-value = 0.0018), and thus, according to the common criteria used to judge this value [Rem10], there is a moderate agreement among raters.
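Kendall's W can be computed directly from the matrix of ratings; a minimal sketch, ignoring the tie corrections that a full implementation would include:

    import numpy as np

    def kendalls_w(ratings):
        """ratings: (m, n) array, m raters rating n items. W in [0, 1]; 1 = full agreement."""
        m, n = ratings.shape
        ranks = ratings.argsort(axis=1).argsort(axis=1) + 1  # per-rater ranks (no tie handling)
        R = ranks.sum(axis=0)                                # rank sums per item
        S = ((R - R.mean()) ** 2).sum()                      # squared deviations of rank sums
        return 12.0 * S / (m ** 2 * (n ** 3 - n))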

3.1.2.2 Results from applying the proposed method in DWR

In this section, we detail the intermediate and final validation results of the proposed method, and check the evaluation results against the ground truth data. All the hyperparameters involved in the proposed method are assigned a value, or a set of values, in the context of this experiment. A summary of this parameter tuning is shown in Table 3.1.

3 https://github.com/vrodriguezf/ESWA-2017
4 A detailed explanation of this measure can be found in Publication 1, Algorithm 1 [RFMC17b].


Table 3.1: Parameter tuning for all the variables involved in the experiment.

Context                                 | Parameter                                                             | Value
DWR                                     | Number of performance measures (M)                                    | 6
                                        | Sampling resolution                                                   | 2000 ms
Proposed method                         | Time series dissimilarity metrics                                     | Frechet, DTWarp
                                        | Clustering methods                                                    | AGNES, DIANA, PAM
                                        | Possible numbers of clusters for step 1 (K1)                          | 2...8
                                        | Possible numbers of clusters for step 2 (K2)                          | 3...8
Ground truth and clustering evaluation  | Minimum number of performance measures rated per pair of simulations  | 4
                                        | Match acceptance threshold (ΞA)                                       | 3.5
                                        | Match rejection threshold (ΞR)                                        | 2.5

Table 3.2: Summary of the best validation results of the time series clustering applied to each of the performance measures used, corresponding to Step 1 of the method proposed in this work.

                         S        A        At        C        Ag       P
Dissimilarity metric     Frechet  Frechet  DTWarp    Frechet  Frechet  Frechet
Clustering method        PAM      PAM      AGNES     AGNES    DIANA    PAM
Number of clusters (k1)  8        8        3         7        8        8
ASW                      0.586    0.581    0.708     0.606    0.580    0.587
CH                       608.725  596.207  1354.357  529.529  510.083  611.322
PH                       0.389    0.397    0.423     0.436    0.443    0.398
Validation Rating (VR)   2.332    2.335    2.243     2.376    2.390    2.344

Due to the large number of parameter combinations tested to find a good cluster discrimination for each performance measure in the first step of the method, only the best results are summarized in Table 3.2, for legibility purposes. As can be seen, every dissimilarity metric and clustering method tested is selected as “best” at least once. The Validation Rating introduced in this work allows an easy comparison among clusterizations and avoids the differences in the range of each validation index. The optimal number of clusters chosen is, excluding the


Table 3.3: Validation results for the final clustering process of the proposed method. Bold cells represent the best results obtained.

        ASW    CH       PH     VR
K2 = 3  0.451  22.919   0.592  1.417
K2 = 4  0.628  44.298   0.828  2.086
K2 = 5  0.701  53.13    0.843  2.273
K2 = 6  0.73   67.724   0.897  2.498
K2 = 7  0.788  112.873  0.924  3.000
K2 = 8  0.782  103.035  0.807  2.779

Attention and Cooperation measures, always the minimum or maximum value of k tested. This gives us general information about the variance in the temporal evolution of each of the metrics, and must be taken into account when analyzing the performance profiles: those time series grouped into 8 different clusters define a richer set of behaviors and must be given more importance than those with only 2 different patterns detected (best k is 2).

Table 3.3 shows the validation results for each of the values of k2 tested in this last clustering process. As can be seen, the selected k2 not only obtains the best general rating (represented by the Validation Rating), but also maximizes each of the validation indices independently. The 7 medoids of this clusterization represent the most representative performance profiles for this dataset. Section 3.1.2.4 will focus on analyzing those profiles (medoids) and give some ideas about the typical behaviours followed by the operators of this experiment.

With regard to the external evaluation, the Pairwise Accuracy (P-Acc) reaches 84.09%, which is quite a good result taking into account the accuracy values usually obtained when using human judgement-based ground truth data. As an example, in the world of sentiment analysis, accuracy values above 70% are considered more than acceptable [PP10].

3.1.2.3 Comparative study between the proposed method and other multivariate time series clustering approaches

In this section, we are interested in finding out whether the proposed method performs better than other clustering approaches. Since the unit of analysis that we want to cluster is a performance profile, which is a multivariate time series composed of the evolution of several performance measures, we compare our approach against a PAM clustering applied over different multivariate time series distances from the literature. As a requisite, we need the distance to accept time series of different lengths, so that we can compare simulations with different durations. The multivariate time series distances used for this comparison are⁔: 1. Mean Frechet; 2. Mean Dynamic Time Warping (DTW); 3. Penrose Distance; 4. Mahalanobis Distance.
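For reference, a minimal sketch of the second distance in this list: classic DTW between two multivariate series of different lengths, together with the per-dimension averaging that yields the “Mean DTW” variant (the remaining distances are detailed in Publication 1):

    import numpy as np

    def dtw(x, y):
        """Classic O(Tx*Ty) dynamic time warping between series x: (Tx, d) and y: (Ty, d)."""
        Tx, Ty = len(x), len(y)
        D = np.full((Tx + 1, Ty + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, Tx + 1):
            for j in range(1, Ty + 1):
                cost = np.linalg.norm(x[i - 1] - y[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[Tx, Ty]

    def mean_dtw(x, y):
        """Average of the univariate DTW distances over the M shared dimensions."""
        return float(np.mean([dtw(x[:, m:m + 1], y[:, m:m + 1]) for m in range(x.shape[1])]))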

The results of this comparison are shown in Table 3.4. The value used to compare the clustering results is the Pairwise Accuracy (P-Acc) against our ground truth dataset. Results for the other clustering approaches are given for different numbers of clusters, ranging from 3 to 8, exactly the same range of values used in the last step of the proposed method. Note that the results of the proposed method for values of K2 other than K2 = 7 are of low interest, since

5 See more information about them in Publication 1.


Table 3.4: Comparative results, in terms of Pairwise Accuracy (P-Acc), between the proposedmethod against direct clustering approaches based on multivariate time series distances. The resultsare compared for different number of clusters (K2). While the bolded cell indicates the result obtainedfor the proposed method, cells in italics show the results that surpasses our best value.

          Penrose    Mahalanobis   Mean     Mean      Proposed
          Distance   Distance      DTW      Frechet   Method
K2 = 3     60.61      46.97        62.12     68.94       -
K2 = 4     66.67      73.48        73.48     71.97       -
K2 = 5     73.48      75.76        76.52     81.82       -
K2 = 6     75.00      78.03        77.27     84.09       -
K2 = 7     83.33     *84.85        78.79    *87.88     84.09
K2 = 8    *87.12     *88.64       *86.36    *89.39       -


From the results we can appreciate that, on a total of six occasions, another clustering result surpassed the P-Acc value obtained by the proposed method. It can also be noted that, in general, the Mean Frechet distance is the most suitable for this experiment: the nature of the performance measures defined for this specific experimental setup favours the Frechet distance, as can be seen in Table 3.2. However, these results must be analyzed with the sights set on a bigger picture, because the proposed method is clearly intended to be flexible, especially in cases where the nature of the performance measures comprising the performance profile is very diverse; there, the use of a different time series clustering configuration for each of them is extremely useful, not only for achieving higher accuracies than multivariate time series clustering methods, but also for gaining specific insights into the best discrimination of each measure separately. Even so, the P-Acc value of 84.09% is clearly above the mean accuracy of the rest of the methods, and it is achieved without the need of selecting any parameter a priori. In fact, since the proposed method is scalable, adding more clustering methods or more dissimilarity metrics to the first step of the method could lead to an increase in accuracy. In conclusion, summarizing the pros and cons of applying the proposed method, we conclude that it is quite accurate and interesting for new, open environments where the nature of the time series is unknown, and thus one does not know a priori which clustering configuration is optimal for the problem.

3.1.2.4 Interpretation of the performance profiles found

Figure 3.3 shows two of the representative performance profiles found in this experiment. In order to facilitate the analysis, red lines mark the instants when incidents are triggered, and a green line marks the moment when the operator started to accelerate the simulation speed for the first time, denoting the end of the mission preparation phase and the beginning of the mission execution phase. During the mission preparation phase, the simulation is paused, and the operator can spend some time overviewing the scenario and making changes in the initial mission plan. The two profiles of Figure 3.3 are described below; the rest of them can be found in Section 6 of Publication 1.


[Figure 3.3 panels: (a) Passive Monitoring; (b) Aggressiveness to incidents, single target tracking. Each panel plots the performance measures (range [0, 1]) against time (ms).]

Figure 3.3: Plots of two of the representative performance profiles found. Red lines mark times when an incident was triggered, and the green line indicates the moment when the mission preparation phase finishes and the execution phase starts. Each subplot contains the evolution of the six performance measures (S, A, At, C, Ag, P) comprising a performance profile.

1. Passive Monitoring (Figure 3.3a): This performance profile features a constant level of attention once the mission preparation phase has ended. This means that there have been scarcely any interactions during the mission execution phase. Thus, all the performance measures which depend directly on the interactions remain constant. Incident times are close to each other, which means that the simulation speed established at the beginning of the mission execution phase is high. Despite all this, the Score does not decrease until the end of the mission, which suggests that operators within this performance profile trust the pre-loaded mission plan in order to detect all targets.

2. Aggressiveness to overcome incidents (Figure 3.3b): This performance profile features a type of aggressive operation. After a soft mission preparation, dedicated only to overviewing the map (no paths are changed, because aggressiveness marks 0), the simulation begins at high speed, and, to overcome the incidents, the paths of the UAVs are completely redesigned (maximum aggressiveness). The rest of the simulation maintains the path of one single UAV, ensuring that it detects all targets. The mission finishes with maximum score, which indicates that all targets have been detected and none of the UAVs were destroyed.

3.2 Modelling hidden patterns through Markovian Models

The approach of modelling the behaviour of operators using performance profiles has one main drawback: it is not applicable to some simulation environments, because it relies heavily on the performance measures that define a profile. Those measures are usually related to the specific system being analysed, and thus we cannot define a performance profile if the system does not provide a good set of performance measures or, at least, enough information to post-process them.

One way to overcome this problem is to use the raw data from the operator interactions directly to create the behavioural model. In this regard, as mentioned in Section 2, the works of Boussemart et al. [Bou11] set the usage of Hidden Markov Models (HMMs) and Hidden Semi-Markov Models (HSMMs) as the state of the art in modelling and predicting knowledge-based supervisory tasks such as the control of UAVs. The particular applicability of these models can be explained by their hidden-visible structure, which allows us to infer underlying cognitive processes from the patterns of visible events such as the operator interactions. At the same time, these models provide valuable predictions which can be used to detect abnormal behaviour in time.

Discrete HMMs are stochastic models mainly used for modelling and predicting sequences of discrete values, or symbols. They are characterized by a set of N (hidden) states, which can be interpreted as phases in a cognitive process, each of them producing typical behaviour [Vis11]. The term Markov pertains to the time-dependence between consecutive states, which follow a Markov process. This means that the current state only depends on the previous state and not on earlier ones. The transition probabilities between the states of the model are denoted by a stochastic N × N square matrix, called the transition matrix. On the other hand, the term hidden in an HMM indicates that the underlying states cannot be observed directly during the process; what we see is the emission of each state. The model can emit only one of the K possible observation symbols in each state at each time step.
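As a minimal illustration of this hidden-visible structure (toy values, not fitted to DWR data), a discrete HMM reduces to three stochastic objects plus a sampling loop:

import numpy as np

rng = np.random.default_rng(0)

# N = 2 hidden states, K = 3 observation symbols.
A = np.array([[0.9, 0.1],       # N x N transition matrix (rows sum to 1)
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],  # N x K emission matrix: one symbol per step
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])       # initial state distribution

def sample(T):
    # Generate T (hidden state, visible symbol) pairs; the next state
    # depends only on the current one (first-order Markov assumption).
    s = rng.choice(len(pi), p=pi)
    out = []
    for _ in range(T):
        out.append((s, rng.choice(B.shape[1], p=B[s])))
        s = rng.choice(A.shape[0], p=A[s])
    return out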

The estimation of the parameters of an HMM based on recorded sequential data is usually solved by the so-called Baum-Welch algorithm [BP66], which is a form of expectation-maximization algorithm that tries to maximize the likelihood of a set of observation sequences being produced by the model. To choose an optimal number of states without prior knowledge about the model topology, Markovian models are usually compared with the Bayesian Information Criterion (BIC) [BA04]. The BIC penalizes the likelihood of a model by a complexity factor proportional to the number of parameters in the model and the number of training observations, thus favouring simple and general models.
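In its standard form (the thesis' exact variant may differ slightly), the BIC trades likelihood against a complexity penalty, so a one-liner suffices to compare candidate models:

import math

def bic(log_likelihood, n_params, n_obs):
    # Lower is better: the penalty grows with the number of free
    # parameters and (logarithmically) with the number of observations.
    return -2.0 * log_likelihood + n_params * math.log(n_obs)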

In a previous preliminary work [RFGPC15], the state-of-the-art method proposed by Boussemart et al. in [Bou11] was applied to the data extracted in the simulation environment DWR, in order to create and select an optimal HMM that exploits the behavioural patterns from the interactions of inexperienced operators. The selected model (via the BIC measure) is shown graphically in Figure 3.4. Low-probability transitions and self-transitions are omitted for legibility purposes. Slices in the pie charts represent the emission probabilities for each operator interaction. The edge width is proportional to the transition probability. State labels comprise the state name (assigned after the model analysis) along with the initial state probability (in brackets). Legend symbols make reference to the interactions described in Table 2.1.

The above model contains interesting insights about the behaviour of operators in DWR 6, but it opens the way for several improvements. On the one hand, a state like “Defining paths” has virtually no chance of being left once entered (it is an absorbing state), which makes it difficult to extract robust conclusions from the model if one has no extra information about what is happening in the mission at that moment. On the other hand, the Markov assumption may prove insufficient to detect long hidden patterns, and thus it is interesting to study the applicability of more general models that relax this assumption.

6See the model analysis in [RFGPC15].


Figure 3.4: 5-states HMM fitted for DWR, analysed in terms of the operator interactions.


3.2.1 Enriching the meaningfulness of the model states via MC-HMMs

Since a classical HMM only allows us to model a single sequence of data, in our case the sequence of operator interactions, the resulting cognitive states may lack additional information regarding the changes in the course of the mission. In this section, we will study the applicability of MC-HMMs to enrich the states of the model with the usage of parallel sources of information: the interactions performed by the operators in the simulation environment, and the events that describe the course of the mission.

MC-HMMs, or Multichannel HMMs, are an extension of classical HMMs in which the sequence data feeding the model is divided into C parallel sequences. The term “Multichannel” is adopted from the works of Helske et al. in [HH16], and makes reference to groups of categorical data sequences, rather than numerical or continuous time series, for which the term “Multivariate” is the most common [CWM08].

3.2.1.1 Method

A graphical overview of the method designed to build and analyze a MC-HMM exploiting the patterns found among UAV operators during a training session in the simulator DWR is shown in Figure 3.5. As mentioned in Section 2.2, DWR stores a simulation as a list of asynchronous and timestamped snapshots.


[Figure 3.5 diagram: simulations from the DWR database go through Step 1 (filter valid simulations), Step 2 (sequence processing into aligned Last Interaction / Last Mission Event sequences), Step 3 (building the model: MC-HMM learning over the number of states, scored by BIC and Number of Rare States), and Step 4 (analysis of the selected MC-HMM), where the automated process ends and the instructor intervenes.]

Figure 3.5: General scheme of the method of analysis in DWR with MC-HMMs.

Although HMMs model time information, they do not treat time as a continuous variable, but as a discrete one. For that reason, every log must be discretized into equidistant time steps, each containing at most one log entry. Then, the discretized log is divided into two aligned categorical data sequences:

‱ Last Interaction: It contains, for every time step, a symbol identifying the last command performed by the operator.

‱ Last Mission Event: It contains, for every time step, the last event that happened in the mission course.

By having this multi-channel sequence representation, not only are we able to analyze the operator response, but we can also relate the concepts of “What happened” and “What was done”.
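A possible discretization routine for this step is sketched below (illustrative Python; the entry format and channel names are assumptions, and the sketch simply carries the last seen symbol forward rather than enforcing at most one entry per step):

def to_channels(log, step_ms, t_end):
    # log: list of (timestamp_ms, channel, symbol) tuples, with channel
    # being 'interaction' or 'event'. Returns two aligned categorical
    # sequences where each time step carries the last observed symbol.
    last = {"interaction": "none", "event": "none"}
    entries = sorted(log)
    out = {"interaction": [], "event": []}
    i = 0
    for t in range(0, t_end, step_ms):
        while i < len(entries) and entries[i][0] <= t:
            _, channel, symbol = entries[i]
            last[channel] = symbol
            i += 1
        out["interaction"].append(last["interaction"])
        out["event"].append(last["event"])
    return out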

In order to perform the model selection, a pool of MC-HMM candidates with different numbers of states is trained and scored on two different measures:

1. Bayesian Information Criterion (BIC).

2. Number of Rare States (NRS): In addition to BIC, we add a second rating that counts the number of “rare” hidden states in an HMM, to ensure the simplicity and interpretability of the model 7.

The candidate model which minimizes both selection measures (BIC and NRS) will be selected as the best model to fit the dataset. Finally, the selected MC-HMM will be presented to the training instructor (see Figure 3.5, step 4), who is responsible for analyzing it, giving sense to each of the hidden states and interpreting the most representative patterns found in the underlying Markov chain.

7More details of this proposed measure are given in [RFGPC16].


Table 3.5: Results for the MC-HMM model selection. The row marked with an asterisk indicates that the 6-states model is chosen, since it obtains good values for both the BIC and NRS measures.

Nstates   LogLik      BIC        Number of Rare States (NRS)
2        -26326.87   52814.3     0
3        -23063.94   46401.76    1
4        -19846.67   40099.45    1
5        -17545.55   35648.31    2
6*       -15621.96   31971.14    0
7        -15207.23   31330.55    4
8        -13679.73   28483.32    6
9        -13235.95   27822.44    6
10       -12160.1    25916.29    7


3.2.1.2 Experimentation

In this section, the modelling process detailed above is applied to the data obtained after training a set of inexperienced operators in a training mission of DWR, namely Test Mission 3 (see Section 2.2, Table 2.2). After applying the filter to clean the useless simulations, only 55 simulations were considered as useful for this experiment. Since we want the MC-HMM to be interpretable, the maximum number of possible states is set to 10. In addition, a hidden state is considered “rare” when it is visited less than 5% of the time on average.
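Under the assumption that state occupancies are estimated from decoded state paths (e.g. via Viterbi), the NRS criterion can be sketched as follows (the exact computation in [RFGPC16] may differ):

import numpy as np

def n_rare_states(state_paths, n_states, threshold=0.05):
    # state_paths: list of integer arrays of decoded hidden states.
    # A state is 'rare' when its overall visit frequency is below 5%.
    visits = np.concatenate(state_paths)
    freq = np.bincount(visits, minlength=n_states) / len(visits)
    return int((freq < threshold).sum())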

The results for the model selection process are shown in Table 3.5. For every possible value of the number of hidden states, multiple models are trained, and from them we choose as candidate the one maximizing the (log-)likelihood on the training data. It can be seen that, from 2 states to 10, the BIC measure is always decreasing. However, the rate of decline is lower from models with 6 states onwards, which can be seen as an “elbow” in the BIC curve. Furthermore, the model with 6 states also minimizes the Number of Rare States (NRS), which is a sign that, with 6 states, we achieve a simple and fair description of the patterns hidden throughout the sequences.

A graphical presentation of the selected 6-state MC-HMM is shown in Figure 3.6. To allow a better model analysis, the state emission probabilities are combined across channels and drawn as a pie chart within each of the states (nodes) 8. A deep analysis of each of the hidden states is given in [RFGPC16].

Comparing this model with the equivalent one with a single data channel (see Figure 3.4), it is clear from the state labels that the model states now describe higher-level situations. As an example, the initial state of the previous model, Selecting UAVs, represents just a major presence of the interaction SU at the beginning of the mission.

8Interaction SU from the table has been renamed as SD (Select Drone). Interaction CSS from the table has been renamed as ST (Set Simulation Speed). Interactions CUP and MWT from the table have been joined as a single symbol CP (Change Path). Interaction CUS from the table has been removed from this setup.


Figure 3.6: 6-states MC-HMM fitted for DWR, analysed in terms of the parallel occurrences of operator interactions and mission events.

However, now the most likely initial state is Monitoring, which contextualizes the interaction SD (alias for SU) with moments when a task begins (event AS) or an incident ends (IE). This behaviour is characteristic of those parts of the course of the simulation where the operator does not need to alter anything in the mission and simply selects and monitors the status and the trajectory of the different UAVs. In sum, by using MC-HMMs to add extra information to the model apart from the operator interactions, we achieve more robust and informative models than those from previous experiments.

3.2.2 Analysing the applicability of high order double chain Markov models

There are two main drawbacks in the HMM-based modelling process proposed by Boussemart et al. for the supervisory control of UAVs [BC11]. On the one hand, these models rely on the first-order Markov assumption, which implies memoryless transitions from one hidden state to another, limiting their predictive capabilities and the possibilities of discovering more complex and longer patterns when interpreting the models. On the other hand, HMMs make the assumption of conditional independence between the visible observations, which is unlikely to hold for UAV operators. As an example, an operator may maintain the same cognitive (or hidden) state “Replanning mission” for a time, and meanwhile follow a typical interaction (or visible) pattern “Select UAV-Add Waypoint” that must be captured outside the hidden chain.

In this section, we extend the study of behavioural modelling in UAV operations by using a flexible, fully Markovian model called the Double Chain Markov Model (DCMM). The main characteristic of this model (proposed by Berchtold in [Ber99]) is that it combines two Markov chains: an observed non-homogeneous Markov chain and a hidden homogeneous one, whose state at each time step decides the transition matrix used in the visible process. In order to extend the possibilities of the model even further, we consider the high-order extensions of the DCMM [Ber02], which can handle high-order transitions between both a set of observations and a set of hidden states.


[Figure 3.7 diagram: operator simulations are split into training and test sequences (A. Extracting interaction sequences); DCMMs ”(M, l, f) are learnt over the grid (B. Learning DCMMs) and evaluated on predictability and interpretability measures (C. Evaluating DCMMs), yielding weighted rankings L^P_i/W^P_i and L^I_j/W^I_j; these are combined via D. Weighted Rank Aggregation, driven by the instructor's predictability/interpretability importance, into a top-Q DCMM ranking L*.]

Figure 3.7: General scheme of the analysis process developed to find the best DCMM for modelling behaviour in a training operation with UAVs.


3.2.2.1 Method

In the parameter estimation algorithm used to learn a single DCMM (”) 9, there are three hyperparameters that must be fixed: 1. M, the number of hidden states of the model. 2. l, the order of the hidden chain. 3. f, the order of the visible chain (0 for a classical HMM).

Since none of those hyperparameters can be assigned large values if the model is to remain analysable by a UAV instructor, we train DCMMs along the entire M × l × f grid space. With regard to the model selection, the use of just one measure (the BIC) for comparison is insufficient, due to the richness and the numerous options provided by such a general model as the DCMM. Here, we go beyond the idea of the predictability-interpretability balance present in the BIC by treating the two concepts as families, each gathering several evaluation measures designed to cover the corresponding concept in some way.

9See Section 2 of Publication 2.


Table 3.6: Summary of the evaluation measures used to assess the quality of a DCMM in this work. These measures are divided into two groups, namely predictability and interpretability, depending on the aspect of the model covered by each of them. The column “Best” indicates whether the best models are achieved by maximizing (Max.) or minimizing (Min.) the value of the evaluation measure.

Evaluation Measure                                       Type              Range      Best
Sequence (Log)-Likelihood (SL) [SM07]                    Predictability    (−∞, 0)    Max.
Accuracy of Generated Predictions (AGP)                  Predictability    [0, 1]     Max.
Minimum Precision of Generated Predictions (MPGP)        Predictability    [0, 1]     Max.
Bayesian Information Criterion (BIC) [KR95]              Both              (0, ∞)     Min.
Coeff. of High Probability Hidden Transitions (CHPHT)    Interpretability  (0, 1)     Max.

These measures enrich the evaluation process and provide opportunities for higher-order models, which are excessively penalized by the BIC. Table 3.6 shows a summary of the evaluation measures designed for this work. Some of them are a novelty in the field of Markovian modelling 10.

As can be seen in Figure 3.7, the evaluation measures are applied over every DCMM ”(M, l, f) trained. By sorting these results, we obtain one ranking of models for each measure. Let L^P_i be the ranking of learnt DCMMs sorted by the i-th predictability measure, and W^P_i the weights of that ranking, i.e., the value of the measure for each of the models in L^P_i. Similarly, we have L^I_j and W^I_j for the weighted ranking sorted by the j-th interpretability measure (see Figure 3.7, B-C).

The problem is to find one ranking of size Q, namely L*, which represents the optimal aggregation of the base rankings introduced above and thus gives the best Q DCMMs in terms of predictability and interpretability (see Figure 3.7, D). To do this, we apply Weighted Rank Aggregation techniques over the rankings {L^P_i}_i and {L^I_j}_j, using the corresponding weights {W^P_i}_i and {W^I_j}_j in the distance function between rankings. The problem of Rank Aggregation is an NP-hard optimization problem, and here we will apply a Genetic Algorithm (GA) in order to find an optimal solution 11.

One critical issue to address in this process is the importance given to each of the base rankings L^P_i and L^I_j. Here, we determine this value based on two factors: on the one hand, the dispersion of the weight vector associated to each ranking, which is computed in terms of the Coefficient of Variation [McA15]; on the other hand, the specific needs of the instructor in the analysis, which may lead to predictability being given more importance than interpretability, and vice versa (see the predictability-interpretability slider of Figure 3.7).
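As an illustration of these two ingredients (not the exact formulation of Publication 2), the dispersion-based importance and a weighted footrule-style distance, which a GA would minimize across all base rankings, can be sketched as:

import statistics

def coeff_of_variation(weights):
    # Dispersion of a ranking's weight vector: rankings whose measure
    # separates models more strongly receive more importance.
    return statistics.stdev(weights) / abs(statistics.mean(weights))

def weighted_footrule(candidate, base_pos, base_w):
    # Weighted Spearman-footrule-style distance between a candidate
    # top-Q list and one base ranking: position gaps are scaled by the
    # measure value (weight) of each model.
    return sum(base_w.get(m, 0.0) * abs(pos - base_pos.get(m, len(base_pos)))
               for pos, m in enumerate(candidate))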

Unlike the previously proposed method, here the instructor is provided with a final ranking of models instead of a single one. This is due to the large number of shapes that a DCMM can feature, which can lead to situations in which two models have similar ranking scores but different types of patterns inside, both of them interesting for extracting conclusions.

10See the complete description of the evaluation measures in Publication 2 [RFGPC18b].
11See backgrounds on Rank Aggregation in Publication 2 [RFGPC18b].


Table 3.7: Top-5 DCMMs for DWR, evaluated in terms of several predictability and interpretability measures.

#   SL (Cv = 0.076)       AGP (Cv = 0.025)    MPGP (Cv = 0.651)   BIC (Cv = 0.462)        CHPHT (Cv = 0.190)
1   ”(8, 2, 2) [−69.530]  ”(3, 1, 2) [0.837]  ”(7, 2, 0) [0.310]  ”(6, 1, 0) [11261.926]  ”(3, 1, 2) [0.451]
2   ”(8, 1, 2) [−70.174]  ”(3, 1, 1) [0.835]  ”(4, 2, 0) [0.301]  ”(3, 1, 1) [11589.492]  ”(4, 1, 2) [0.441]
3   ”(7, 1, 2) [−70.906]  ”(7, 1, 2) [0.834]  ”(4, 3, 0) [0.279]  ”(5, 1, 1) [11591.593]  ”(3, 3, 0) [0.436]
4   ”(7, 2, 2) [−71.362]  ”(5, 1, 1) [0.834]  ”(5, 1, 1) [0.279]  ”(8, 1, 0) [11621.068]  ”(5, 1, 2) [0.431]
5   ”(8, 3, 2) [−71.472]  ”(7, 1, 1) [0.834]  ”(8, 1, 1) [0.271]  ”(5, 1, 0) [11626.030]  ”(6, 1, 0) [0.427]

3.2.2.2 Experimentation

In this section, we will apply the method shown in Figure 3.7 to the data extracted from DWR, from the point of view of an instructor who wants to obtain a behavioural model. More specifically, we will focus on studying how changing the importance balance between predictability and interpretability affects the hyperparameters of the best DCMM found. Furthermore, we will check whether the capabilities of DCMMs over the currently used HMMs are worthwhile when it comes to finding better models. For the sake of reproducible research, the source code related to this experimentation is available on Github 12.

The dataset used comprises 85 simulation logs executed by a total of 27 operators on test mission 3 (see Table 2.2 in Section 2.2). 75% of them will be used for training DCMMs, and the remaining 25% will serve as test sequences to compute the predictability evaluation measures. The simplest DCMM in the grid search has (M = 3, l = 1, f = 0) (a 3-state HMM) and the most complex has (M = 8, l = 3, f = 2). We consider that any model with higher hyperparameters would be intractable.

Table 3.7 shows the resulting top-5 DCMM ranking for each evaluation measure used in this work, when applied to the data from DWR. Each cell contains an identifier of the DCMM based on its hyperparameters (M, l, f) and the associated numerical measure. For each measure, its associated coefficient of variation is shown (e.g. SL, Cv = 0.076), which influences the importance of that measure in the Rank Aggregation process.

As can be observed, the most heterogeneous measures, i.e., the ones achieving the highest coefficients of variation, are the BIC and the MPGP. Therefore, the models leading those rankings will likely be the ones leading the final ranking. Classical HMMs (l = 1, f = 0) are mostly found within the top-5 BIC ranking, which proves that the use of higher-order DCMMs only makes sense if we extend the model selection process with additional measures. Top positions of SL and AGP are occupied by complex DCMMs, which gives us an initial idea of the superiority of these models over classical HMMs. Finally, DCMMs with high values of l (the order of the hidden chain) are found within the top-5 CHPHT ranking, probably containing long hidden transition patterns.

In order to better understand the influence of each hyperparameter of a DCMM in terms of predictability and interpretability, Figure 3.8 shows three heatmaps (one per hyperparameter) with the best position reached by a DCMM in the top-10 aggregated ranking, as the importance given to the predictability and interpretability of the model varies.

12https://github.com/vrodriguezf/CIM-2017.



Figure 3.8: Heatmaps showing the best position of a DCMM in the top-10 ranking in terms of the three hyperparameters of a DCMM: M, l and f. The x-axis shows how the results vary when we modify the importance of the predictability of the model (and thus the importance of the interpretability too). White is used when no model with such hyperparameters is found in the top-10 ranking.

From these heatmaps, we can extract some important conclusions related to this experiment:

‱ There is a clear inverse relationship between the predictability importance and the value of l (see the second heatmap). This means that using higher-order hidden chains to link more than one previous cognitive operator state does not come with an improvement of the model's predictive capabilities, probably because the evolution of the operator's cognitive state is not complex enough in a simple simulation environment such as the one used.

‱ Models with visible chains (f > 0) outperform HMMs when it comes to predictability (see the third heatmap). Furthermore, models with f = 1 also occupy high positions in the ranking when more importance is given to interpretability. This is extremely important because it clearly shows the advantage of having a Markov chain among operator interactions in each state, which is not possible in the current state-of-the-art HMM-based modelling framework.

It is worth mentioning that the conclusions extracted here about the applicability of DCMMs in the context of UAV operations should be strengthened with a set of additional experiments in different simulation environments and, above all, with the availability of a bigger dataset, which is especially needed to fit models with such a large number of independent parameters, particularly when l or f is greater than 1 13.

13The total number of independent parameters for a DCMM is $\sum_{g=0}^{l-1} M^g (M - 1) + M^l (M - 1) + M K^{(f-1)} (K - 1)$, where M is the number of states and K is the number of observation symbols.


Chapter 4

Automatic Procedure Following Evaluation through time series-aware conformance checking

Due to the high costs involved in any mission established in a UAS, every critical step or possible failure is controlled by following the guidelines of a complete action checklist, as happens in manned operations [Joh09]. In this work, we will use the term Operating Procedure (OP) to gather different step-by-step guiding tools, such as checklists, action checklists, Emergency Operating Procedures, action plans, etc. OPs have been used consistently in the field of aviation and, more specifically, in UAV operations.

Given the critical nature of the tasks just mentioned, operators are trained to deal with them by following OPs strictly. In this training, an instructor is in charge of controlling that the operator correctly follows the steps described in the OP. This is what we will henceforth call Procedure Following Evaluation (PFE). Nowadays, in most of the applications involving OPs, the PFE is performed manually, i.e., the instructor must verify that an operator is following all the steps described in the OP [DKL+13]. This manual supervision is not appropriate if the number of operators increases, because instructors are not able to inspect whether each trainee is correctly performing all the steps described in the OP. For this reason, the development of systems that automatically evaluate and analyze how operators follow an OP is highly important, not only for the scalability of the training phase, but also to extract measures about the performance of the operators.

The analysis of data related to a business process is supported by a family of techniques known as process mining [VdA16]. Among them is conformance checking, which is the main topic of this work. The goal of conformance checking is to pinpoint differences between the behaviour observed in an event log and the behaviour captured in a process model [GBvBD+18]. On the one hand, the process model specifies the exact way(s) in which the tasks comprising a process can be executed, and it is often formalized as a Petri Net or, more specifically, a Workflow Net (WF-Net) [vTKB03]. On the other hand, an event log is a set of traces, each consisting of the sequence of event records produced by one execution of the process (one case) [GBvBD+18].

Most of the conformance checking techniques found in the recent literature focus on comparing just the order between tasks in the process model and events in the log [GBvBD+18, dLM17, BC17]. Some of them also take into account additional perspectives, such as the resources needed by a task and the data associated to a case [BMS16, dLvdA13]. However, to the best of our knowledge, all of them have two important restrictions: 1. At the most basic level, tasks and events are matched through a single identifier. 2. The data associated to a case cannot change over time, unless it is overwritten by a task in the process model (see [dLvdA13]).

The idea of this part of the thesis arises from the need to fill the gap left by those restrictions in areas where the data produced by the execution of a business process is not directly expressed in terms of events, but as a set of time series containing the evolution of the variables involved in the process. This usually happens when the process model is loosely coupled to the process data, as in the case of the operations manuals used in aviation [RFGPC18a], or the performance analysis of distributed data-intensive systems [YLZ+14]. More specifically, the field of PFE over OPs in UASs needs these features. On the one hand, some procedural steps must be checked according to flexible and complex conditions involving the state of one or many variables in the log during a certain period of time, for example, the step “Supervise that the altitude of the vehicle is below 10000 m for 1 minute”. On the other hand, the concept of time here is a critical factor, not only for evaluating the time spent to complete an OP, but also to define a priori the maximum step duration allowed for each procedural step.

The rest of the chapter is structured as follows: In Section 4.1, the basic elements of conformance checking are adapted to the paradigm of time-based data and time-aware processes. This serves as input for Section 4.2, where a basic algorithm for time series-aware conformance checking is presented, which just evaluates whether a case in the log fits the model or not as a whole. Then, in Section 4.3, the algorithm is improved with the inclusion of more conformance categories to evaluate the tasks individually. Finally, in order to demonstrate the effectiveness of the proposed approaches, two case studies are provided in Section 4.4. In the former, the basic algorithm is applied as a way of performing APFE over a real OP of a UAS. In the latter, the improved algorithm is applied to a more general process model different from an OP, obtained from a domain other than UASs: the study of cycles in the position of a longwall mining shearer.

The contents of this chapter are related to Publication 3 from the compendium 1.

4.1 Adapting WF-Nets to time-based data: The TSWF-net

The use of time series data as the basis for the analysis changes the paradigm of conformance checking, and of process mining at a more general level. Thus, we need to adapt its basic elements to support time series data and time series-aware processes. Instead of event logs, time series logs are defined and used. In the same way, the use of WF-Nets to represent the process is replaced by TSWF-nets.

1There are differences in this document with respect to Publication 3 (P.3) in the formalization of concepts such as the time series log (known as data log in P.3), the Workflow Net with time series (TSWF-net), and the basic conformance checking algorithm for time series (known as Automatic Procedure Following Evaluation in P.3). Furthermore, the contents of Sections 4.3 and 4.4.2 are based on a submitted international journal article (see SIJ-1 in the Introduction), not published yet.


4.1.1 Time series log

A time series log is a tuple L = (V, U, X, Y) that contains:

‱ a set V of variable names.

‱ a function U that defines the values admissible for each variable, i.e., U(v) is the domain of variable v for each v ∈ V. In this context, D = ⋃_{v∈V} U(v) represents the set of all possible data values of any variable.

‱ an index set X.

‱ a function Y that records, for a given case identifier n ∈ N, the values of each variable over the index set, i.e., Y : N × X × V → D âˆȘ {⊄}, with Y(n, x, v) ∈ U(v) âˆȘ {⊄} 2. In this work we will work at the level of individual cases. For readability purposes, we will use the notation Y^n(x, v) to refer to the records related to case n.

The data contained in a time series log for a specific case n ∈ N can be seen as a set of log entries ⟹x, v, Y^n(x, v)⟩ ∈ E, where E = X × V × D. In order to navigate through a time series log over time using an arbitrary index, we introduce the log state function, S^n : X × V → E, which represents the closest past indexed value of a variable at any moment of the process. Based on this, we introduce the set S^n[x1, x2] = {S^n(xâ€Č, v) | x1 ≀ xâ€Č ≀ x2, v ∈ V}, which gathers all the log entries between x1 and x2.
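As an illustration of these definitions, the following Python toy (names invented, single case) stores the records Y^n and answers the log state function S^n:

import bisect

class TimeSeriesLog:
    def __init__(self):
        self.series = {}                      # variable -> sorted (x, value)

    def record(self, x, v, value):            # plays the role of Y^n(x, v)
        self.series.setdefault(v, []).append((x, value))
        self.series[v].sort()

    def state(self, x, v):                    # log state function S^n(x, v)
        entries = self.series.get(v, [])
        i = bisect.bisect_right(entries, (x, float("inf")))
        return entries[i - 1] if i else None  # None stands for ⊄

log = TimeSeriesLog()
log.record(10, "altitude", 9500)
log.record(20, "altitude", 10200)
print(log.state(15, "altitude"))              # -> (10, 9500), closest past value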

4.1.2 TSWF-net

Formally, a classical (or low-level) Petri Net (PN) can be described as a tuple (P, T, R, M), where:

‱ P = {p1, ..., p_np} is a finite set of np places.

‱ T = {t1, ..., t_nt} is a finite set of nt transitions.

‱ R ⊆ (P × T) âˆȘ (T × P) is the flow relation describing the arcs that connect places and transitions. A place p is called an input place of a transition t iff there exists an arc from p to t. In the same way, p is called an output place of t iff there exists an arc from t to p.

‱ M : P → N is the marking of the net, which represents a distribution of tokens over places. Furthermore, M(p) represents the number of tokens of place p in marking M.

The dynamics of a classical PN follow some execution rules: 1. A transition t is said to be enabled in a marking M iff each input place of the transition contains at least one token, i.e., iff M(p) > 0, ∀p ∈ ‱t, where ‱t refers to the set of input places of the transition t, also called the preset of t. 2. When a transition t is enabled, it may fire, consuming one token from each input place p_in ∈ ‱t and producing one token in each output place p_out ∈ t‱. The set t‱ is also known as the postset.

2If a variable is not given a value, we use the special symbol ⊄.
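These execution rules can be captured in a few lines; the sketch below is a minimal classical Petri net (an illustration, not the thesis' implementation), with a token-count marking and explicit preset/postset maps:

class PetriNet:
    # Minimal classical Petri net: the marking maps place -> token count,
    # pre/post map each transition to its input/output places.
    def __init__(self, pre, post, marking):
        self.pre, self.post = pre, post
        self.m = dict(marking)

    def enabled(self, t):
        return all(self.m.get(p, 0) > 0 for p in self.pre[t])

    def fire(self, t):
        assert self.enabled(t)
        for p in self.pre[t]:
            self.m[p] -= 1                    # consume one token per input place
        for p in self.post[t]:
            self.m[p] = self.m.get(p, 0) + 1  # produce one token per output place

net = PetriNet(pre={"t": {"p1"}}, post={"t": {"p2"}}, marking={"p1": 1})
net.fire("t")
print(net.m)                                  # {'p1': 0, 'p2': 1}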

Page 62: Automated Methods for the Evaluation and Analysis of

44Chapter 4. Automatic Procedure Following Evaluation through time series-aware conformance

checking

The application of PNs to workflow management results in a class of PNs named WF-Nets [LDZY14], which satisfy the following: 1. There is one initial place (pi), where the process starts, and one output end place (pe), where the process ends. 2. Every task transition should be located on a path from the initial place to the end place.

A WF-Net can be used to describe the routing of cases throughout the workflow, with building blocks such as the AND-split, AND-join, OR-split and OR-join to model sequential, conditional, parallel, and iterative routing [dAAvD12]. A TSWF-net is a WF-Net in which transitions can have ts-guards that allow/block tokens depending on the fulfilment of a set of conditions expressed in terms of the entries of a time series log. Formally, a ts-guard is a boolean expression G_TS : E* → {true, false} that decides whether a set of log entries is admissible to fulfil a condition or not. Normally, a ts-guard does not need to verify a condition for the whole index set X. Thus, transitions in a TSWF-net have an associated closed interval [x1, x2], known as the time scope, where the ts-guard will be assessed.

Tokens in a TSWF-net relate to a time series log through the token time index. This is an attribute (or colour) that represents the “current progress” of a token through a time series log. The specification of the time scope of a transition must be relative to the time index of the input token(s) that enable it, in order to ensure the conditions are verified after the current token time.

Since the specification of a token is expanded with the inclusion of a time index, so is the description of a net marking. The definition for classical Petri Nets makes no distinction between tokens sharing the same place. Here, that distinction is crucial, because two tokens in the same place can have different time indices, which may cause that only one of them is able to pass a ts-guard. From now on, a marking is referred to as a function M : P → X* that describes the indexed tokens of each place in a TSWF-net. Furthermore, given a transition t, M[‱t] = ⋃_{p∈‱t} M(p) refers to all the input tokens of t, and M[t‱], defined analogously, to all its output tokens.

Using the previous concepts and definitions, a TSWF-net can be defined as a tuple W = (P, T, R, pi, pe, G, Σ) consisting of: 1. A WF-Net (P, T, R, pi, pe). 2. A guard function G : T → G_TS that associates a ts-guard with each transition. 3. A scope function Σ : T → X × X that associates a time scope with each transition.

Regarding the dynamics of a TSWF-net, the key issue lies in deciding when a transition must be fired and how the time index of its input tokens must be updated after the firing. Since the time index of a token represents its “progress” through the time series log, we want it to be updated during a firing to the moment when the conditions of the corresponding ts-guard are met for the first time. For this reason, we introduce the Time of First Fulfilment function (TFF) 3, TFF^n : T × X → X, such that TFF^n(t, xi) denotes the first time index when the ts-guard G(t) is fulfilled within the interval xi + Σ(t). If the ts-guard is never fulfilled within that interval, TFF returns −1.
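A direct way to realize this definition is a linear scan over the scope; the sketch below is one possible reading (the helper entries_between(a, b), standing for the entry set S^n[a, b], is an assumption of this illustration, not part of the thesis' formalization):

def tff(guard, scope, entries_between, x_i, step=1):
    # Scan the transition's time scope, relative to the input token's
    # time index x_i, and return the first index where the ts-guard
    # holds over the entries seen so far; -1 if it is never fulfilled.
    lo, hi = x_i + scope[0], x_i + scope[1]
    for x in range(lo, hi + 1, step):
        if guard(entries_between(lo, x)):
            return x
    return -1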

A marking M ∈ M* is said to be valid for a transition t ∈ T if every input place of t contains at least one token that fulfills its ts-guard (TFF(t, x) ≠ −1). A valid context ⟹t, M⟩ may lead to the firing of t. A firing will consume one token from each input place and produce one for each output place. To avoid ambiguities when choosing the input tokens that take part of the firing, we define a function I : P → X that establishes the input of the firing.

3Known as Time of Completion (ToC) in Publication 3


[Figure 4.1 panels: (a) Firing a single input place with a single token: with Σ(t) = [0, 10], G(t) = “foo is greater than 0” and input token xi = 15, the entry of the time series log that meets the guard for the first time (marked in red) gives xo = TFF(t, 15) = 19, so M(p1) = {15}, M(p2) = ∅ becomes M'(p1) = ∅, M'(p2) = {19} after M' = firing(t, M, {15}, 19). (b) Firing with multiple input places and input tokens, assuming x1 < x2 < x3 and that the colour of a token (green or red) represents its validity (or non-validity) for transition t.]

Figure 4.1: Firing a transition in a TSWF-net.

Given an output time index xo ∈ X, a firing is a function T × M* × I* × X → M* that produces a marking M' = firing(t, M, I, xo), where:

M'(p) =  M(p) \ I(p)             if p ∈ ‱t \ t‱
         M(p) âˆȘ {xo}             if p ∈ t‱ \ ‱t
         (M(p) \ I(p)) âˆȘ {xo}    if p ∈ ‱t ∩ t‱
         M(p)                    otherwise

Figure 4.1a shows a simple example of the firing of a transition in a TSWF-net with the concepts explained above. In the example, the transition t is associated with the ts-guard “Supervise that foo is greater than 0”, which is internally expressed as:

G_TS(Δ) =  true    if ∃ ⟹x, foo, y⟩ ∈ Δ | y > 0
           false   otherwise
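In code, such a guard is just a predicate over a set of log entries; a minimal Python rendering of this particular guard (illustrative only) is:

def guard_foo_positive(entries):
    # ts-guard 'supervise that foo is greater than 0': true as soon as
    # some entry (x, 'foo', y) in the evaluated set carries y > 0.
    return any(v == "foo" and y > 0 for (_x, v, y) in entries)

# With the (invented) log entries of the example:
print(guard_foo_positive([(15, "foo", -2), (19, "foo", 3)]))   # True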

In order to avoid the ambiguity of a firing with multiple input tokens and places, the rule applied is to set the time index of the output tokens to the maximum value of the TFF computed for the input tokens (see Figure 4.1b).


Algorithm 1 Execute Dynamics of a TSWF-net

Input: W = (P, T, R, pi, pe, G, ÎŁ) is a TSWF-net. L = (V, U, X, Y) is a time series log, and n is the case identifier to be used. M0 represents the initial marking for the net execution.
Output: Final marking of the TSWF-net at the end of the process (M). Record of net firings (φ ∈ F*).

1: function ExecuteDynamics(W, L, n, M0)
2:   M ← M0
3:   φ ← ∅
4:   IC ← ∅                                       â–· Invalid contexts
5:   while ∃t ∈ EnabledTransitions(W, M) | ∀p ∈ ‱t ∃x ∈ M(p) | ⟹t, x⟩ ∉ IC do
6:     τ ← ∅
7:     for all p ∈ ‱t do
8:       x ← max {xâ€Č ∈ M(p) | ⟹t, xâ€Č⟩ ∉ IC}
9:       I(p) ← x
10:      τ ← τ âˆȘ {⟹x, TFF^n(t, x)⟩}               â–· Times of first fulfilment
11:    end for
12:    if ∄x ∈ I[‱t] | ⟹x, −1⟩ ∈ τ then
13:      xo ← max {xâ€Č | x, xâ€Č ∈ X ∧ ⟹x, xâ€Č⟩ ∈ τ}
14:      M' ← Firing(t, M, I, xo)
15:      φ ← φ âˆȘ {⟹t, M, I, xo⟩}
16:      M ← M'
17:    else
18:      IC ← IC âˆȘ {⟹t, x⟩ | x ∈ I[‱t] ∧ ⟹x, −1⟩ ∈ τ}
19:    end if
20:  end while
21:  return ⟹M, φ⟩
22: end function


Algorithm 1 summarizes the execution semantics of a TSWF-net. The algorithm receives as input a TSWF-net with all its attributes (W) and an initial marking to begin the process (M0). For each iteration of the loop, the algorithm tries to fire an enabled transition with valid input contexts. When the net reaches a marking in which no transition can be fired (see Algorithm 1, line 5), that marking is returned as a way to express the final state of the execution. Note that a transition can be enabled but not fired. Also, the arguments of every firing are returned (see Algorithm 1, lines 15 and 21), which is useful for further analysis of the output.

4.2 Basic time series-aware conformance checking over a TSWF-net

Algorithm 2 summarizes the process of basic conformance checking over a TSWF-net, which consists in analysing whether an individual case of the time series log fits the process model completely or not, together with the timestamp when the case ended its progress through the process model.


Algorithm 2 Basic CC on a TSWF-net

Input: W = (P, T, R, pi, pe, G, ÎŁ) is an Operating Procedure modelled as a TSWF-net. L = (V, U, X, Y) is the time series log of the operation, and n is the case identifier to be used. x0 represents the start time for the evaluation.
Output: Tuple containing the following elements: a boolean value indicating whether the procedure has been successfully followed or not, the final marking of the Petri Net at the end of the process, and the time spent in the Procedure Following Evaluation.

1: function APFE(W, L, n, x0)
2:   M0 ← {⟹W.pi, x0⟩}
3:   ⟹Mf, φ⟩ ← ExecuteDynamics(W, L, n, M0)
4:   time ← max_{x ∈ Mf} (x − x0)
5:   if |Mf(W.pe)| > 0 then
6:     return ⟹true, Mf, time⟩
7:   else
8:     return ⟹false, Mf, time⟩
9:   end if
10: end function

Given a TSWF-net with all its attributes (W), a time series log (L) and a start time at which we want to begin the evaluation (x0), a single token is created in the initial place of the net (see Algorithm 2, line 2). This token comprises the initial marking of the TSWF-net, namely M0. After the initialization, the dynamics of the TSWF-net are executed (see the call on Algorithm 2, line 3). When this call ends, the information contained in Mf is used to judge whether the process has been successfully completed or not:

‱ If Mf contains a token placed in the end place (pe) of the net (see Algorithm 2, line 5), then we consider that the case in the time series log fits the model successfully.

‱ Otherwise, if the initial token has been locked in some intermediate place of the net, it means that some procedural step in the process model has not been performed correctly. Nevertheless, the algorithm returns the information of the marking Mf so that we can analyse which steps are causing trouble (see Algorithm 2, line 8).

4.3 Complete time series-aware conformance checking over a TSWF-net

In this section, we will extend the basic conformance checking algorithm proposed in the section above, so that it no longer returns just whether the process model has been completely fulfilled or not. Instead, every task analysed during the execution of the algorithm will now be classified into one of the following categories (see Figure 4.2):

‱ match: The conditions of the task are fulfilled in the current temporal context of the algorithm execution.

‱ time mismatch: The conditions of the task are not fulfilled in the current temporal context of the algorithm execution, but they are fulfilled in earlier contexts. This means that the log gathers the behaviour captured in the model, but in an unexpected time frame.


[Figure 4.2 panels: (a) Match; (b) Time mismatch; (c) Absence.]

Figure 4.2: Example of each of the conformance categories detected using the proposed algorithm. For all the examples, the ts-guard of transition t is “foo starts increasing” and its time scope is [0, 100]. Where applicable, the reversing time (r) is set to 100 time units.


‱ absence: The conditions of the task are not fulfilled in the current temporal context of the algorithm execution, nor in an earlier one.

In the above descriptions, the term “temporal context” refers to the time indices of the tokens that populate the TSWF-net at a certain point in the execution of the algorithm.

Formally, the result of executing the proposed method will be gathered in a conformance function, c : T × X → {match, time mismatch, absence}, that maps every analysed transition (or task) to one of the categories mentioned above, along with a time index that represents the time of first fulfilment in case of a match, and the time when the discrepancy was found in any other case.
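For illustration, the conformance function can be represented as a plain mapping from (transition, time index) pairs to categories; the following Python sketch (task names and indices invented for the example) shows the shape of such a result:

from enum import Enum

class Conformance(Enum):
    MATCH = "match"
    TIME_MISMATCH = "time mismatch"
    ABSENCE = "absence"

# Shape of a possible outcome of the complete conformance checking:
c = {("check_altitude", 19): Conformance.MATCH,
     ("reduce_throttle", 180): Conformance.TIME_MISMATCH,
     ("abort_mission", 240): Conformance.ABSENCE}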

Algorithm 3 describes part of the proposed method. The word “partial” in the name of the algorithm refers to the fact that only a branch of the TSWF-net will be analysed and, because of that, the conformance function returned will be defined partially, for a subset of the transitions of the net. Given a TSWF-net (W), a time series log (L), the identifier of the case to be analysed (n), and an initial marking (M0) from which to begin the evaluation, the main loop of the algorithm is run until the marking reached (M) does not contain any enabled transition (see line 17), which is normally the situation where all tokens are gathered in the ending place of the model (pe).

For each iteration of the main loop, the execution dynamics of the given TSWF-net are run (see Algorithm 3, line 5), and the current marking of the net (M) is updated. As was commented in the previous section, a call to the dynamics of a TSWF-net does not only return the final marking reached, but also the list of firings that took place in the process. From the perspective of conformance checking, these firings represent matches between the process model and the time series log. Thus, the corresponding transitions will be added to the conformance function with the category match, along with their corresponding time of fulfilment (see Algorithm 3, line 8).


Algorithm 3 Partial CC on a TSWF-net

Input: W = (P, T, R, pi, pe, G, Σ) is a TSWF-net, L = (V, U, X, Y) is a time series log, and n ∈ N is the identifier of the case to be analysed. M0 represents the initial marking for the evaluation.
Output: Conformance function c (partial function). Marking trace (” ∈ M*).

1: function PartialCCTS(W, L, n, M0)
2:   ” ← {M0}
3:   M ← M0
4:   do
5:     ⟹M, φ⟩ ← ExecuteDynamics(W, L, n, M)
6:     for all ⟹t, M', I, xo⟩ ∈ φ do
7:       ” ← ” âˆȘ M'
8:       c(t, xo) ← match
9:     end for
10:    ” ← ” âˆȘ M
11:    if ∃t ∈ EnabledTransitions(W, M) then
12:      ⟹M, c'⟩ ← EvaluateUntreatedContext(t, M)
13:      for all ⟹t', x⟩ ∈ dom(c') do
14:        c(t', x) ← c'(t', x)
15:      end for
16:    end if
17:  while EnabledTransitions(W, M) ≠ ∅
18:  return ⟹c, ”⟩
19: end function


If the execution of the net dynamics yields a marking where there is at least one enabled transition (t), that means that there is at least one untreated context ⟹t, x⟩ between t and any of its input tokens x ∈ M[‱t] (see Algorithm 3, lines 11-12). By calling a helper function (described in Algorithm 4), this situation is “repaired” in three possible ways:

1. First, we check whether the untreated context is valid, and if so, label the conformance of t with the category match (see Algorithm 4, line 13).

2. Secondly, in case the untreated context is invalid, we analyse whether the invalid pair ⟹t, x⟩ becomes valid at an earlier time index x − r, where r is a fixed parameter of the algorithm, named the reversing time. With this, we deal with the cases where a task is fulfilled in the time series log, but in a time interval different than expected (see Figure 4.2). In these cases, the conformance function labels t with the category time mismatch (see Algorithm 4, line 16).

3. In case the previous step cannot transform every invalid context ⟹t, x⟩ into a valid one, we consider that the log does not reflect the behaviour asked by the ts-guard in the local environment of x, and thus the conformance function labels t with the category absence (see Figure 4.2 and Algorithm 4, line 19).


Algorithm 4 Evaluate untreated context

Input: t ∈ T is a transition, M represents a marking. It is assumed that ⟹t, M⟩ is an invalid context. Also, it is assumed that the TSWF-net, the time series log and the case to be analysed have been fixed before calling this function.
Output: Marking M updated after the repairing. Conformance function c containing the evaluation of t.

1: function EvaluateUntreatedContext(t, M)
2:   for all p ∈ ‱t do
3:     if ∃x ∈ M(p) | ⟹t, x⟩ is a valid context then
4:       M'(p) ← {x}
5:       I(p) ← x
6:     else
7:       M'(p) ← {max(M(p)) − r}
8:       I(p) ← max(M(p)) − r
9:     end if
10:  end for
11:  if ⟹t, M⟩ is a valid context then
12:    xo ← max_{x ∈ I[‱t]} TFF(t, x)
13:    c(t, xo) ← match
14:  else if ⟹t, M'⟩ is a valid context then
15:    xo ← max M[‱t]
16:    c(t, max_{x ∈ I[‱t]} TFF(t, x)) ← time mismatch
17:  else
18:    xo ← max M[‱t]
19:    c(t, xo) ← absence
20:  end if
21:  M'' ← Firing(t, M', I, xo)
22:  return ⟹M'', c⟩
23: end function

In any of the situations described above, the lock caused by the presence of untreated contexts must be removed to make the algorithm continue. For this purpose, the firing of t will be simulated (see Algorithm 4, line 21), and the resulting marking will be used to continue the main loop of the algorithm. Note that the time index set on the output tokens during the firing (xo) is different depending on the conformance category that has been assigned to the task. More specifically, in case of a match evaluation, the value of xo is set as the maximum TFF of the input tokens, as was done in the dynamics shown in Algorithm 1. For any non-matching conformance category (i.e., time mismatch or absence), there is no fulfilment time, and thus xo keeps the maximum time index of the input tokens.

As noted above, Algorithm 3 performs a partial conformance checking. This may lead to the situation shown in Figure 4.3, where there are multiple untreated contexts to repair, and solving one of them (e.g. ⟹t, M⟩) makes the other disappear, since the associated transition (t') is no longer enabled. This, in turn, means that everything in the TSWF-net below t' is left unanalysed.

To solve this issue and make the proposed method complete, we introduce a new algorithm called CompleteCCTS (see Algorithm 5).



Figure 4.3: Example of situations with multiple untreated contexts. In the situation shown on the left, repairing one of the invalid contexts (e.g. ⟹t, x⟩) makes the other disappear, since t and t' share input tokens. On the other hand, the situation on the right can be fully repaired in two iterations of Algorithm 3.

Algorithm 5 Complete CC on a TSWF-net

Input: W = (P, T, R, pi, pe, G, ÎŁ) is a TSWF-net. L = (V, U, X, Y) is a time series log, and n ∈ N is the identifier of the case to be analysed. M0 represents the initial marking for the evaluation.
Output: Conformance function c (complete function).

1: function CompleteCCTS(W, L, n, M0)
2:   ⟹c, ”⟩ ← PartialCCTS(W, L, n, M0)
3:   for all M ∈ ” do
4:     for all t ∈ T | (t ∈ EnabledTransitions(W, M) ∧ ∀x ∈ X ⟹t, x⟩ ∉ dom(c)) do
5:       ⟹M', c'⟩ ← EvaluateUntreatedContext(t, M)
6:       for all ⟹t', x⟩ ∈ dom(c') do
7:         c(t', x) ← c'(t', x)
8:       end for
9:       c'' ← CompleteCCTS(W, L, n, M')
10:      for all ⟹t', x⟩ ∈ dom(c'') do
11:        c(t', x) ← c''(t', x)
12:      end for
13:    end for
14:  end for
15:  return c
16: end function

called completeCCTS (see Algorithm 5). This is a recursive method that starts by making a call to partialCCTS (see Algorithm 5, line 2), and uses the returned marking trace (”) in order to detect situations such as the one shown in Figure 4.3, where the same token is part of multiple untreated contexts. Formally, this situation is expressed in line 4 of Algorithm 5. If a transition t is enabled in a given marking (M) of the marking trace, but it is not present in the domain of the conformance function returned by partialCCTS, it means that 〈t, M〉 constitutes an untreated context that was not solved during the call to partialCCTS.

Once this situation is detected, the untreated invalid context is repaired (i.e., a call to Algorithm 4 is made), and the process starts again recursively, starting from the marking obtained after the repair (see Algorithm 5, line 9). The conformance results for both the repair function and the recursive call are gathered into a single conformance function, which is returned at the end of the algorithm.
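The recursion can be sketched in Python as below, purely as an illustration; partial_ccts, enabled_transitions and evaluate_untreated_context are hypothetical helpers standing in for Algorithms 3 and 4, and the conformance function is represented as a dict keyed by (transition, time index) pairs:

```python
def complete_ccts(net, log, case, marking):
    """Sketch of Algorithm 5: repair untreated contexts and recurse."""
    c, marking_trace = partial_ccts(net, log, case, marking)
    for m in marking_trace:
        for t in enabled_transitions(net, m):
            if all(key[0] != t for key in c):        # enabled but never evaluated
                m_next, c_repair = evaluate_untreated_context(t, m)
                c.update(c_repair)                   # gather the repair results...
                c.update(complete_ccts(net, log, case, m_next))  # ...and the recursive ones
    return c
    # Termination is assumed to come from each repair advancing the marking,
    # as in the thesis algorithm.
```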

4.4 Experimentation

The proposed approach for modelling and checking the conformance of time series-aware processes has been applied in two case studies from different domains, in order to validate its effectiveness. The former study is focused on the main domain of this thesis (UAV operations), and tackles specifically the use of TSWF-nets to model OPs, and the use of the basic time series-aware conformance checking algorithm as a way to perform PFE. The latter study is faced from a more general perspective than OPs and PFE: the theoretical cycle of the position of a longwall mining shearer is compared with its actual logged position, using the complete time series-aware conformance checking algorithm.

4.4.1 Case Study 1: Basic conformance checking over the “Engine Bay Overheating” operating procedure in an Unmanned Aircraft System (UAS)

In this section, we will study how to model and evaluate the response of an operator in a UAS while following an OP designed to face the alert “Engine Bay Overheating”. This alert is fired when the temperature inside the engine bay of a UAV is over the equipment operative limit. The risk of damage or loss of equipment creates a serious danger, and thus, in case the operator is unable to reduce the temperature, he/she must act immediately, aborting the mission and landing the UAV. Both the written OP and the time series log of this study come from the SAVIER UAS demonstrator 4, from Airbus Defence & Space.

4.4.1.1 Modelling the OP as a TSWF-net

Basically, an OP comprises an ordered sequence of procedural steps, each of them describing a sub-goal that the operator must accomplish to successfully complete the whole procedure. A procedural step may be either atomic or a container of a sequence of sub-steps, which must all be fulfilled in order to consider that the container step has been completed. Additionally, a procedural step may contain an attribute, namely ”, indicating the maximum step duration.

Based on the type of response expected from the operator in a procedural step, each atomic procedural step can be classified into one of three groups:

‱ Action: the operator is asked to do or command something that will alter in some way the status of the system. Actions are usually mandatory for the completion of the OP and consume a specific amount of time.

‱ Check: a check is a step in which the operator is asked to confirm/refuse that a specific property inside the operational environment is fulfilled. In this work, we consider that checks are instantaneous (” = 0), which means that the time employed by the operator is irrelevant.

4 More information can be found in https://bit.ly/2HYXmsY


[Figure 4.4 panels: (a) Check, with time scopes [0, 0]; (b) Action, with time scope [0, ”]; (c) Supervision, with time scopes [0, ”]; (d) Concurrent container, whose sub-steps are surrounded by AND-Split, AND-Join and OR-Join routing blocks. Checks and supervisions branch into fulfilled / !fulfilled paths.]

Figure 4.4: Workflow process definition for the different types of procedural steps defined for any OP. Transitions drawn as thin lines represent instantaneous tasks (checks). Transitions drawn as rectangles represent an implicit delay in their execution.

‱ Supervision: there are some procedural steps that ask the operator to supervise the status of some aspects of the system for a time period. Unlike check steps, the process of supervising is not instantaneous, but persistent during a time interval (” ≠ 0).

Figure 4.4 shows the definition of the different types of procedural steps described in terms of the elements of a TSWF-net. Check steps and supervision steps always offer two different ways to follow the execution of the OP, depending on whether the condition to check or supervise is fulfilled or not. On the contrary, when an OP requires the action of an operator, there is no other way to follow than waiting for the operator to execute that action successfully. The time scope of every transition is set to [0, ”] (” = 0 for check steps), so given an input time index, xi, the ts-guards will be evaluated from xi to xi + ”. A minimal sketch of this evaluation window is shown below.
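The following toy Python fragment only illustrates the time-scope semantics; the guard predicate and the data are made up, and the thesis ts-guards are richer than a per-sample condition:

```python
import numpy as np

def guard_fulfilled(series, x_i, mu, condition):
    """Evaluate a simple per-sample ts-guard over the time scope [x_i, x_i + mu].

    For a check step mu = 0, so only the sample at x_i is inspected; for a
    supervision step the condition must hold during the whole interval.
    """
    window = series[x_i : x_i + mu + 1]
    return len(window) > 0 and bool(np.all(condition(window)))

engine_temp = np.array([65.0, 68.0, 71.5, 74.0, 73.2, 69.0])
print(guard_fulfilled(engine_temp, 2, 0, lambda w: w > 70.0))  # check: True
print(guard_fulfilled(engine_temp, 2, 3, lambda w: w > 70.0))  # supervision: False
```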

Regarding container steps, we distinguish between concurrent and sequential containers, depending on whether the OP allows operators to perform the sub-steps in parallel or not. When sub-steps may run concurrently, we surround them with routing blocks (AND-Split, AND-Join, OR-Join) to ensure that step N+1 cannot start until every sub-step from step N has finished.

In Figure 4.5a, a snapshot of an OP document related to the alert “Engine Bay Overheating” is shown 5. The different checks comprising the first procedural step are carried out in parallel. If any of them is not fulfilled, it is considered that the alert is not real, and the operator should not continue with the next steps of the OP. Figure 4.5b shows a graphical representation of the TSWF-net that results from applying the modelling techniques proposed in Section 4.4.1.1. To get a better understanding of the model, and to see the formal definition of the time series log used and the ts-guards associated to each step, please refer to Publication 3 [RFGPC18a].

4.4.1.2 Testing the basic conformance checking algorithm in the TSWF-net-based OP

In order to carry out this case study, some test cases have been developed. On the one hand, Table 4.1a shows the different test scenarios defined for this case study.

5 The contents of the operating procedure shown here have been adapted from a real checklist used by the Airbus Defence & Space company. However, due to a confidentiality agreement with this company, all sensitive details have been removed.


ALERT NAME: ENGINE BAY OVERHEATING (EBO)
DESCRIPTION: Risk of damage or loss of equipment in engine bay.

DO-LIST OPERATING PROCEDURE:
1. In Main Control Panel, check:
   - Engine Temperature Panel is high-yellow
   - Engine Temperature is over 70ÂșC
2. Modify UAV altitude by using ALTITUDE command
3. Supervise that EBO caution disappears from the alert panel
4. IF EBO caution persists:
   4.1. Command GO-WP to waypoint IAF
   4.2. Command LANDING
5. If EBO caution disappears:
   - Resume Mission by changing the control mode to AUTOMATIC

(a) Written description of the OP.

(b) Corresponding TSWF-net.

Figure 4.5: Description and representation of the “Engine Bay Overheating” alert.

Table 4.1: Test cases defined for this case study.

(a) Test Scenarios (TS).

       Step 1.1   Step 1.2   Step 3
TS1    F          F          F
TS2    !F         F          F
TS3    F          !F         F
TS4    !F         !F         F
TS5    F          F          !F
TS6    !F         F          !F
TS7    F          !F         !F
TS8    !F         !F         !F

(b) Test Operators (TO).

       Action 2   Action 4.1   Action 4.2   Action 5
TO1    -          -            -            -
TO2    10         -            -            -
TO3    -          10           -            -
TO4    -          -            10           -
TO5    -          -            -            15
TO6    10         20           -            -
TO7    10         -            20           -
TO8    10         -            -            25
TO9    -          10           20           -
TO10   10         20           30           -

Each of them simulates a possible combination of Fulfilment (F) and not Fulfilment (!F) of the different checks & supervisions found in the Engine Bay Overheating OP, in order to cover every possible way of facing the alert. TS1 and TS5 are the only scenarios where the two initial checks are fulfilled, so we can consider that the alert is real. The rest of the cases represent spurious, i.e., false alerts.


Table 4.2: Results of the APFE algorithm for each test case (Test Scenario + Test Operator) developed in this case study. Cells marked with (*) (bolded in the original) are the cases where the test scenario represents a real alert and the operator followed the procedure successfully. t and f denote true and false, respectively.

       TS1              TS2                 TS3                 TS4         TS5              TS6                 TS7                 TS8
TO1  | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO2  | f, 1p5, 15     | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p4.1, 10   | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO3  | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO4  | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO5  | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO6  | f, 1p5, 15     | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p4.2, 20   | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO7  | f, 1p5, 15     | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p4.1, 10   | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO8  | t, 1pe, 25 (*) | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p4.1, 10   | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO9  | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | f, 1p2, 0      | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0
TO10 | f, 1p5, 15     | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0 | t, 1pe, 30 (*) | t, 1pe+1paj1.2, 0 | t, 1pe+1paj1.1, 0 | t, 2pe, 0

On the other hand, the actions performed by a battery of test operators in order to face the alert are recorded in Table 4.1b. Each cell represents the time of first fulfilment (TFF) of the action, measured in seconds since the beginning of the alert. The symbol “-” means that the action has not been performed by the operator.

The results of running the APFE algorithm for each test case are shown in Table 4.2. There are a total of 80 test cases, combining the test scenarios shown in Table 4.1a with the test operators of Table 4.1b. Regarding the test cases run in a real-alert test scenario (TS1 and TS5), we can see that for the operators TO1, TO3, TO4, TO5 and TO9 the algorithm returns 1p2 as last state, which means that they are missing action 2 of the procedure. That information could be used by an instructor to improve the performance of these operators during a training session. On the other hand, for every test case run in a spurious test scenario (i.e., TS2, TS3, TS4, TS6, TS7, TS8), the APFE algorithm returns TRUE as a general response and 0 as the time spent to complete the OP, regardless of the test operator. This means that, no matter what actions the operator performs, his/her response will always be considered correct with respect to this alert if the alert is not real. Thus, it seems necessary to improve the APFE algorithm so that it can evaluate when the operator “overreacts” to an alert with unnecessary actions.

4.4.2 Case Study 2: Complete conformance checking over the process model of a longwall mining shearer

Longwall mining is a specific production process that consists of a dozen sub-processes of particular machines and devices, carried out underground under characteristic natural hazards and geological conditions. The main machine in a longwall face is a shearer (Appendix A shows an image of this device). The shearer moves along a longwall face from side to side, mining the coal deposit. The general process model of the shearer cycle consists of the following phases: 1. working at the beginning of the longwall face - location increasing (phase 1); 2. technological operations - location constant (phase 2); 3. working at the beginning of the longwall face - location decreasing (phase 3); 4. working beyond the beginning of the longwall face - location highly increasing (phase 4).

The beginning and the end of the longwall face can be identified as places at a distance of 30-40 meters from both ends of the excavation.


Figure 4.6: The general model of a shearer cycle. The numbers in brackets refer to the phases of the process.

These phases repeat in the working cycle and change according to the shearer location in the longwall face (see Figure 4.6). The process model used in this experimentation represents the first half of the theoretical cycle shown in Figure 4.6 (the continuous black line). The numbers in brackets refer to the phases of the process, which follow the sequence: Phase 1 - Phase 2 - Phase 3 - Phase 4. The last phase is optional, since there are some cases in which the shearer does not reach the end of the excavation according to the model. A graphical representation of the TSWF-net that results from modelling the aforementioned sequence of phases can be seen in Appendix A (more specifically in Figure A.2). That appendix also shows the description of the ts-guards of every transition comprising the TSWF-net, in both a formal and a descriptive way (see Appendix A, Tables A.3 and A.1). All the ts-guards are expressed in terms of changes in the monotonicity of the location of the shearer, in order to match the theoretical cycle shown in Figure 4.6.

The analysed dataset contains data from a machinery monitoring system in an underground hard coal mine. The raw data includes 176 variables (mainly binary) in 2.66 million timestamped records from a one-month period (January 2018) 6. From all of them, only the location of the shearer on the longwall will be used for this experiment (V = {“Location”}), since the theoretical model is expressed exclusively in terms of that variable (see Figure 4.6). A description of the parameter tuning for this experimentation is shown in Appendix A, including parameters for dealing with noise and missing data in the ts-guards, such as the minimum fulfilment duration, which defines the minimum length of an interval where the condition of a ts-guard must hold in order to consider the ts-guard as fulfilled.
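As a rough illustration of a monotonicity-based ts-guard with a minimum fulfilment duration, consider the following Python sketch; the function, the toy signal and the threshold are illustrative assumptions, not the tuned parameters reported in Appendix A:

```python
import numpy as np

def fulfilled_intervals(location, condition, min_duration):
    """Intervals [start, end) where `condition` holds on the sample-to-sample
    changes of `location` for at least `min_duration` consecutive samples."""
    mask = condition(np.diff(location))
    intervals, start = [], None
    for i, ok in enumerate(mask):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_duration:
                intervals.append((start, i))
            start = None
    if start is not None and len(mask) - start >= min_duration:
        intervals.append((start, len(mask)))
    return intervals

# Phase 1 ("location increasing") on a toy location signal:
loc = np.array([0, 1, 2, 3, 3, 3, 2, 1, 0], dtype=float)
print(fulfilled_intervals(loc, lambda d: d > 0, min_duration=2))  # [(0, 3)]
```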

Since only one variable of the time series log is used, the results of the conformance checking can be easily validated by visualizing them on top of the time series data being analysed. The process model used can be fulfilled following different paths (depending on whether phase 4 is present or not), and hence we will visualize the results of each of them one on top of the other. Each entry of the conformance function returned by the proposed algorithm comprises the identifier of a transition, a time index and the conformance category assigned to that transition at that time.

6 The dataset is not published due to confidential agreements.


(a) Results for a normal case. Every task was assigned a match conformance category in at least one path.

(b) Results for an extreme case. A task was assigned the category time mismatch.

Figure 4.7: Results of the complete conformance checking algorithm over two cases of the longwall mining time series log.

To visualize all this information over the time series data, a mark (vertical line) is placed at the time index of the evaluation, labelled with the name of the transition and coloured according to the conformance category assigned: 1. green colour for the match category; 2. red colour for the time mismatch category; 3. entries with the category absence are written on top of the plot.
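A minimal matplotlib sketch of this visualization convention is given below, assuming a synthetic location series and made-up conformance entries (neither comes from the analysed dataset):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
location = np.cumsum(rng.normal(0.1, 1.0, 700))        # synthetic shearer location
entries = [("Phase 1 [begin]", 15, "match"),           # made-up conformance entries
           ("Phase 1 [end]", 240, "match"),
           ("Phase 4 [end]", 620, "time mismatch"),
           ("Phase 4 not present", None, "absence")]
colors = {"match": "green", "time mismatch": "red"}

fig, ax = plt.subplots()
ax.plot(location, color="black", linewidth=0.8)
absences = [name for name, _, cat in entries if cat == "absence"]
if absences:                                           # absences go on top of the plot
    ax.set_title("absence: " + ", ".join(absences))
for name, x, cat in entries:
    if cat in colors:                                  # vertical mark at the evaluation time
        ax.axvline(x, color=colors[cat])
        ax.text(x, ax.get_ylim()[1], name, rotation=90, va="top", fontsize=7)
plt.show()
```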

We will show the results of applying the proposed approach over two representative cases of the dataset: a normal case, i.e., one with a shape similar to the theoretical process, and an extreme case with an unexpected shape. Figure 4.7a shows the results for the normal case, where the algorithm managed to correctly detect the beginning and the end of every phase according to the theoretical model. The absence category assigned to Phase 4 not present on path 2 indicates that Phase 4 is actually present. It is remarkable that, although the case data has some noise (see the last part of the time series), the algorithm manages to avoid it and detect the phases correctly.

In Figure 4.7b, conversely, the shape of the location time series of the case shown is far from the theoretical cycle. Still, the algorithm matched almost every task in the process model, because the matching of tasks like Phase 1 [end] and Phase 2 [end] is done in short intervals, which may not be representative. That could be solved by tuning the minimum fulfilment duration parameter properly for the time series log, which may require sophisticated hyperparameter tuning algorithms. Note also that the end of Phase 4 is assigned the category time mismatch, while the beginning of the phase is matched correctly. This is consistent from a data perspective, since the conditions for the end of the phase are met, but it is inconsistent with the fact that the end of a phase cannot be assigned before its beginning. Further study is needed to deal with these cases with hard dependencies between tasks.


Chapter 5

Conclusions and Future Works

5.1 Conclusions

The main contribution of this PhD Thesis is the development of methods that automate the tasks of evaluation and analysis carried out by an instructor in a training mission on an Unmanned Aircraft System (UAS), in order to allow training operations on a large scale. More specifically, the tasks of performance analysis, extraction of behavioural patterns and procedure following evaluation have been addressed with methods that rely, partially or exclusively, on the mission data produced during multiple training sessions, such as time series clustering, Markovian modelling, or process mining. To achieve the main goals of this dissertation in relation to each of the aforementioned instructor tasks, several approaches have been studied.

Regarding the task of performance analysis, a method based on time series clustering to extract representative performance profiles from UAV operators during their training processes has been presented, in order to analyse and compare how different operators act throughout a mission. The proposed method is applied in a multi-UAV simulation environment, DWR, with inexperienced operators, obtaining a fair description of the temporal behavioural patterns followed during the course of the simulation.

On the other hand, the task of finding behavioural patterns among operators, which has recently been addressed with the development of predictive models of behaviour based on HMMs, has been applied to the data extracted from inexperienced operators in the simulation environment DWR, and extended in two ways. The former is related to the use of MC-HMMs, which allow gathering multiple data sequences, such as the combination of operator interactions and mission events, into the same model. Thus, the resulting models contain more robust and informative states than those from previous works in the field. The latter is concerned with the use of DCMMs, which extend the possibilities of the currently used HMMs by combining two higher-order Markov chains into the same model, one for the model states (hidden chain) and one for the observations (visible chain). The different processes for creating, evaluating and selecting these models in a UAS-like context have been detailed, and after an experimentation in DWR, the resulting models show that adding a visible Markov chain among the layer of observations improves the predictive capabilities of DCMMs over classical HMMs, while maintaining a fair level of interpretability.


Finally, and related to the task of procedure following evaluation, the solution to automate it lies in bringing a new perspective to the topic of conformance checking, and process mining in general, by introducing the use of time series data as the basis for the analysis. This is crucial for analysing process models whose tasks depend on how one, or multiple, variables of the process vary, which is the case of operating procedures in a UAS. The formalization of the data, the process model and the algorithms needed to perform conformance checking are here redesigned and adapted for the challenging perspective of time series data and time series-aware processes. A couple of case studies have been carried out to demonstrate how to apply these concepts, not only for the do-list operating procedures of a UAS, but also for a process model from a different domain.

Taking into account all the experimental analysis carried out for each method proposed in this thesis, it is now possible to provide an answer to the main research questions that were briefly presented in the introduction chapter. Below is a revision of these research questions, and the corresponding answers, which are given on the basis of the experimental conclusions reported in this dissertation:

‱ Q1: Is it possible to automate the analysis of the normal evolution of the operator performance throughout the course of a mission?

The approach presented in Section 3.1 to deal with the task of performance analysis in a UAS aims to describe the behaviour of operators over time using a profile-based model. Assuming that we have defined the performance profile of a simulation as a multivariate time series composed of the evolution of several performance measures, the proposed method begins by grouping the data for each measure separately, validating different clustering configurations. The clustering results for each measure are used to define the similarity between two profiles, which is used in a final medoid-based clustering process to extract the most representative ones.

This method has been applied in a lightweight multi-UAV environment, named DWR, where a total of 6 performance measures comprise the performance profile. To evaluate the results, a human judgement-based ground truth dataset has been created by asking users to rate the similarity between pairs of performance time series. The results obtained from the experimentation show that the proposed method gets good accuracy scores, especially from a general perspective where the applicability to different domains is taken into account. In this sense, the proposed approach stands out due to both its adaptability to different types of time series metrics and the use of different clustering methods. The representative operator performance profiles obtained in the experimentation have been qualitatively analyzed, according to the observed relationships between the changes in the operator performance and the events that occurred in the mission.

‱ Q2: How can we extend the current HMM-based methods for behavioural modelling in UASs?

Since a classical HMM only allows modelling a single sequence of data (the sequence of operator interactions), the resulting cognitive states may lack additional information regarding the changes in the course of the mission. By using Multichannel (or Multivariate) HMMs, the states of the model can be enriched with the usage of parallel sources of


information: the interactions performed by the operators in the simulation environment, and the events that describe the course of the mission. In this work, the different steps for creating, selecting and analysing MC-HMMs in a UAS-like environment are described, and an experiment with inexperienced operators in DWR has been carried out.

The resulting model for this experiment turns out to be fairly descriptive, and reveals several behavioural patterns that represent the inexperience of the operators tested, such as the way they control the simulation speed, or the general tendency they have to hasten and change the prescheduled mission plan, especially at the latter parts of a mission. In sum, by adding extra information to the model apart from the operator interactions, we achieve more robust and informative models than those from previous works in the field.

On the other hand, the first-order Markov assumption on which HMMs rely, and the assumed independence between the operator actions along time, limit their modelling capabilities. To overcome these issues, we analyse the applicability of DCMMs, which provide a flexible modelling framework in which two high-order Markov chains (one hidden and one visible) are combined. In this work we proposed a method to rank and select DCMMs based on a set of evaluation measures that quantify the predictability and the interpretability of the models.

The resulting models after an experiment in the simulation environment DWR show that the inclusion of higher-order hidden chains does not substantially improve the quality of the model, neither in terms of predictability nor interpretability. However, adding a visible Markov chain among the model observations improves the predictive capabilities of DCMMs over classical HMMs, while maintaining a fair level of interpretability. In any case, these results only show the conclusions for a specific simulation environment, and what is really interesting for the state of the art is the flexibility and richness of the proposed modelling framework over the current HMM-based methods.

‱ Q3: Is it possible to automate the procedure following evaluation?

Automating the task of Procedure Following Evaluation (PFE) requires a method for comparing Operating Procedures (OPs) and mission data. According to the literature, the family of techniques focused on comparing process models with data from the same process is known as conformance checking. However, classical conformance checking is not suitable for PFE because, normally, the fulfilment of the steps of an OP depends on how one or many variables in the log evolve, which cannot be handled with a simple event log and a task-event matching. In this work, this issue is overcome by introducing the perspective of time series data into conformance checking. This causes a paradigm shift in conformance checking, and in process mining at a more general level.

In order to implement this perspective of conformance checking, the use of event logs has been replaced by time series logs, and a subtype of Petri nets, namely a TSWF-net, has been defined to represent a time series-aware process model. An algorithm for detecting the discrepancies between a time series log and a TSWF-net has been presented, returning not only whether a task of the model has been matched in the log or not, but also the matching time and the possibility that the task is matched in a time interval different than expected. To illustrate the effectiveness of our approach as a solution to automatic PFE, a case study was carried out by designing and modelling an emergency OP of a real UAS, and evaluating it using the proposed approach against a battery of test operations. Although the algorithm used in the case study is basic and could be enhanced by returning more information about the discrepancies between the operator behaviour and


the OP, the process runs on a fully automatic basis. A graphical representation of the results of the developed automation can be seen in [RARFGPC17].

‱ Q4: Can we apply any of the proposed methods in a different domain than UASs?

With respect to the task of performance analysis, the proposed method for extracting representative profiles of operator performance has one main drawback: it depends on the existence of direct performance measures in order to define a profile. These measures are usually related to the specific system being analyzed, and may not exist or may not be significant. However, in purely methodological terms, the proposed approach is adaptable to measures of any nature, since it searches and validates different time series distances and clustering configurations without the need to select any parameter a priori. Therefore, its application in new domains is simple once a set of reliable performance measures is available.

In the same way as before, the proposed methods for modelling behaviour in a UAS through MC-HMMs and DCMMs have been applied only in the simulation environment DWR, but are easily scalable to other UAS-like simulation environments, or even to other human-machine interaction systems, provided that certain conditions are met: 1. the system logs sequential data, from both the interactions of the user and the context of the process; 2. the number of possible interactions that the user can perform is not very high; 3. the resulting model is meant to feature a combination of predictability and interpretability capabilities.

With respect to the task of procedure following evaluation, the solution proposed in this work relies on an adaptation of conformance checking techniques to time series-aware processes. In this sense, all the formalizations and algorithms have been designed in a general way, without taking into account the specific domain of UASs. Only in the first case study has the approach been implemented specifically for evaluating a real operating procedure of a UAS. Moreover, a second case study has been carried out in a completely different domain (longwall mining), proving the generality of the proposed approach.

5.2 Future lines of work

Finally, there are several lines of work related to the different methods and algorithms presented in this dissertation that could be extended in the near future:

‱ A formal comparison of the proposed methods across different UAS-like simulation environments, and even across other human-machine interaction systems.

‱ Related to HMM-based behavioural modelling, a study of how the two proposed extensions, i.e., the use of MC-HMMs and DCMMs, can be applied together, in the form of a Multichannel Double Chain Markov Model (MCDMM), would be of great interest for improving the behavioural patterns found in the model. In this sense, it would also be worth basing the proposed extensions on HSMMs instead of HMMs, since the use of Semi-Markov chains allows an explicit modelling of the duration of the hidden states, which is useful for modern UAS-like environments where the operator spends most of the mission time supervising the state of the mission, and thus the number of interactions is low.


‱ The use of covariates in the creation of behavioural models, in order to compare patterns with respect to specific operator features, such as age or previous experience with UAV operations.

‱ The use of state-of-the-art hyperparameter optimization techniques, such as model-based Bayesian optimization, for searching the best configurations for both the clustering-based approaches and the Markov models.

‱ The development of a Markovian benchmarking tool which provides, given a database of discrete sequences (possibly multivariate), a ranking of Markovian-based models that best score on a set of evaluation measures selected by the user. The ranking may include not only HMMs, but also HSMMs, DCMMs, plain Markov chains, and multichannel versions of all of them.

‱ Both the clustering-based model obtained for analysing the operator performance profiles and the HMM-based models created for extracting behavioural patterns have mainly been used for providing the instructor with valuable information to interpret and analyse. However, these models, as any machine learning result, can be used for prediction purposes. In fact, the selection of DCMMs in this work explicitly takes into account the predictive capabilities of the model in order to choose it as a good candidate. Thus, it is important to deploy the resulting models as an online predictive tool that detects, sufficiently in advance, abnormal behaviours of the operators during a real mission, as well as significant reductions of their performance.

‱ The approach of time series-aware conformance checking developed for automating the procedure following evaluation in UASs paves the way for a deeper inclusion of time series as a branch of research on process mining. In this sense, some related lines of work may include: 1. the study of other possible conformance categories that can result from comparing time series and process models; 2. the improvement of the current version of the algorithm, so that a task can represent a state instead of an event; this way, inconsistencies between dependent tasks such as the one shown in Figure 4.7b would be addressed; 3. the implementation for declarative models (only procedural models represented as Petri nets are considered in this work); 4. the exploration of visualisation methods for the results of the proposed algorithm, in cases where the number of variables in the time series log or the number of fulfilment paths in the model is high; 5. the development of automatic methods that choose the best parameters (e.g. time scopes, ts-guard parameters) for a specific process model, given a training dataset.


Part II

Publications



PUBLICATION 1

Analysing temporal performance profiles of UAV operators using time series clustering

‱ Rodríguez-Fernåndez, Víctor, Héctor D. Menéndez, and David Camacho. 2017. “Analysing Temporal Performance Profiles of UAV Operators Using Time Series Clustering.” Expert Systems with Applications 70: 103–118. DOI: 10.1016/j.eswa.2016.10.044

– State: In Press. Published Online.

– Impact Factor (JCR 2017): 3.768

– Category:

* Computer Science, Artificial Intelligence. Rank: 20/132 [Q1].

* Engineering, Electrical & Electronic. Rank: 42/260 [Q1].

* Operations Research & Management Science. Rank: 8/84 [Q1/D1].

– Contribution of the PhD candidate:

* First author of the article.

* Co-authoring in the conception of the presented idea.

* Definition of the measures that comprise a temporal performance profile.

* Design, implementation and testing of the proposed method.

* Design and implementation of the webapp used to evaluate the results.

* Design and execution of the experiments.

* Co-authoring in the interpretation and discussion of results.

* Writing of the manuscript with inputs from all authors, and design of the figures.



Expert Systems With Applications 70 (2017) 103–118

Analysing temporal performance profiles of UAV operators using time series clustering

VĂ­ctor RodrĂ­guez-FernĂĄndez a,∗, HĂ©ctor D. MenĂ©ndez b, David Camacho a

a Universidad AutĂłnoma de Madrid (UAM), 28049 Madrid, Spain
b University College London (UCL), London, UK

Article history: Received 27 July 2016; Revised 27 September 2016; Accepted 18 October 2016; Available online 28 October 2016.

Keywords: UAVs; UAV operators; Time Series Clustering; Performance measures; Simulation-Based Training

Abstract: The continuing growth in the use of Unmanned Aerial Vehicles (UAVs) is causing an important social step forward in the performance of many sensitive tasks, reducing both human and economical risks. The work of UAV operators is a key aspect to guarantee the success of this kind of tasks, and thus UAV operations are studied in many research fields, ranging from human factors to data analysis and machine learning. The present work aims to describe the behaviour of operators over time using a profile-based model where the evolution of the operator performance during a mission is the main unit of measure. In order to compare how different operators act throughout a mission, we describe a methodology based on multivariate time series clustering to define and analyse a set of representative temporal performance profiles. The proposed methodology is applied in a multi-UAV simulation environment with inexperienced operators, obtaining a fair description of the temporal behavioural patterns followed during the course of the simulation.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Unmanned Aerial Vehicles (UAVs) have become a relevant area in the last decade. The main goal of this field is to replace human supervision in several sensitive tasks using UAVs in an accurate way. The automation of these tasks supposes an important step forward in several areas of our societies such as: agriculture, traffic, infrastructure inspection and forestry, among others (Pereira et al., 2009).

In the current state of UAV research and development, there are some processes that can be almost totally automated with low risk, but others still require the role of the operator as a critical part of the entire system. A hard training of these operators is usually performed to guarantee that they have the appropriate attitudes to handle this technology, specially in risky situations. The training process can also help to describe different features of the trainee, not only technical but also psychological aspects that might help to prevent dangerous circumstances.

This study focuses on UAV operators and takes information about how they evolve during a specific simulation, paying special attention to how their performance changes during the process. With this information, we build a temporal performance profile of a simulation which will help to describe the decision abilities.

∗ Corresponding author. E-mail addresses: [email protected] (V. RodrĂ­guez-FernĂĄndez), [email protected] (H.D. MenĂ©ndez), [email protected] (D. Camacho).

In previous works we focused on describing a general profile of the operators, based on their behaviour during the whole simulation (Rodríguez-Fernåndez, Menéndez, & Camacho, 2015b). Also, the temporal interaction patterns during a mission were modelled through the use of Hidden Markov Models in Rodríguez-Fernåndez, Gonzalez-Pardo, and Camacho (2015). However, one of the most relevant aspects of the training process is the performance evolution during the simulation course. This work is focused on that attitude, creating temporal performance profiles for different simulations and then extracting and analysing the most representative of all.

In order to achieve the purposes of this work, we combined clustering techniques with time series analysis (Liao, 2005) to define a set of representative simulation profiles, based on the evolution of a set of performance measures that describe the attitude of the operator in specific moments of a simulation. To test the validity of the proposed methodology, an experiment with inexperienced operators is carried out, simulating a training mission in a lightweight multi-UAV simulation environment, developed as part of our previous work in the field (Rodriguez-Fernandez, Menendez, & Camacho, 2015a). Several experiments have been carried out to evaluate the quality of the results of the methodology and to compare those results against other clustering approaches. Furthermore, a qualitative analysis of the results has been made in the context of the experimental simulation environment.

In sum, this paper presents the following contributions:

‱ A new multivariate time series clustering methodology is defined in the context of performance analysis for UAV operations. The proposed methodology is divided into two steps: the first focused on finding patterns in each dimension of the multivariate time series, and the second focused on generating a multivariate distance using the patterns found in the previous step.
‱ The proposed methodology is scalable to the use of different time series dissimilarity metrics, different clustering methods and different numbers of clusters.
‱ A collective human judgement-based evaluation process is carried out to create ground truth information with which we are able to evaluate and compare the results of the proposed methodology.
‱ A quantitative and qualitative interpretation is given for the results obtained in a lightweight multi-UAV simulation environment.

The rest of the paper is structured as follows: the next section presents the related work; after that, Section 3 describes the proposed methodology, emphasizing its division into two steps. Then, Section 4 provides a description of how to apply the proposed methodology to a specific simulation environment, detailing the environment itself, the defined performance measures comprising a simulation profile, and the evaluation criteria used to judge whether the results are right in an objective way. In Section 5 we carry out some experiments to evaluate and compare quantitatively the quality of the proposed methodology, and afterwards Section 6 makes a qualitative analysis of the results obtained. Finally, Section 7 presents the conclusions and future work.

2. Related work

This section aims to provide a general overview of the two main fields of this work: UAV research and machine learning algorithms. We start by introducing the current problems that have been frequently studied in UAVs and, after that, we describe clustering models that can be found in the literature.

2.1. UAVs research

UAV research aims to solve different problems related to this area in order to create a competitive field that can help in societies' development, by automating complex human tasks. Several of the ideas are based on the design and development of these new vehicles; however, from this work's perspective, we are more interested in the intelligence and the autonomy of these systems, specially for the new multi-UAV systems.

Since the current state of the research does not allow fully independent and intelligent UAV operations, it is important to focus on the human factors associated with these technologies. Considering the importance of the operator's work and, specially, the sensitiveness of their tasks and the costs of these technologies from both human and economical perspectives, it is critical to have appropriate means to measure and monitor the operator performance. For this reason, there are several works focused on analysing behavioural features during UAV operations, specially in the fields of Human Supervisory Control (HSC) and Human-Robot Interaction (HRI) systems (McCarley & Wickens, 2004). These features are usually measured according to the performance standards on HRI systems, which focus on the operator workload and its Situational Awareness (Drury, Scholtz, & Yanco, 2003). In order to gather information related to direct measures of performance, as the ones used in this work, some ideas are taken from the video games field (Begis, 2000).

From a more general perspective, there are two main research lines in Unmanned Aircraft System (UAS) systems: those focused on the system design (Lemaire, Alami, & Lacroix, 2004) and those developing efficient training processes for the operators (McCarley & Wickens, 2004). The former is relevant according to the number of operators needed to manage a single UAV (typically the model is many-to-one, where several operators manage a single UAV). The latter, related to the former, is focused on how to prepare the new operators to deal with these complex tasks, ensuring that the trainee is highly qualified after this process. Because these systems are currently evolving fast, the training systems need to be redesigned frequently in order to meet the demands. Besides, in order to cope with the enormous future demand of UAV operators, it is interesting to extend the availability of these technologies to new inexperienced but promising users, such as video game players (McKinley, McIntire, & Funke, 2011).

2.2. Machine Learning and Clustering Analysis

Machine Learning is the process of extracting knowledge-based models from data, identifying different patterns (Larose, 2005). Machine Learning techniques have been successfully applied to several different fields, such as medicine (Lavrač, 1999), sports (Menéndez, Bello-Orgaz, & Camacho, 2013), security (Portnoy, Eskin, & Stolfo, 2001) and transport (Liao, Patterson, Fox, & Kautz, 2007), among others. There are several areas related to Machine Learning; however, in this work we focus on unsupervised learning, specifically clustering analysis (Larose, 2005).

Clustering is focused on discovering knowledge blindly, with no labelled information (Larose, 2005). This process groups the data according to some criteria defined by the analyser. The groups are named clusters and satisfy two main properties: the objects inside a cluster are related to each other, and objects of different clusters are different (Hruschka, Campello, Freitas, & de Carvalho, 2009). These properties make the evaluation process a difficult task (Schaeffer, 2007), and it is still an open problem. However, there are some validation methods based on evaluation indexes (such as the Silhouette or the Dunn index) that provide an objective quality measure of the clustering discrimination process. There are lots of clustering algorithms, some of them based on different perspectives of the clustering problem and the information that can be extracted from the search space. Good and relevant examples are the centroid-based approaches (Macqueen, 1967), where the algorithm optimizes the position of a set of centroids in a known search space, and medoid-based approaches (Kaufman & Rousseeuw, 1987), where the features of the search space are unknown and only the distance between the data instances is known. Using this distance, the most relevant data instances (the so-called medoids) are chosen as the most representative elements of each cluster.

The most classical clustering algorithms are K-means (Macqueen, 1967), Expectation Maximization (Dempster, Laird, & Rubin, 1977) and Hierarchical Clustering. The first two algorithms are based on statistical iterations over the parameters of a specific estimator, while Hierarchical Clustering nests the clusters by hierarchical levels, describing degrees of similarity by level. Modern algorithms are based on other properties that can be extracted from data, such as continuity (von Luxburg, 2007) (i.e., the shape defined by the data in the space) or density (Navarro, Frenk, & White, 1997). These different ways of dividing the space increase the analyst's choices when selecting the appropriate algorithm, and thus the validation process becomes a relevant step in order to determine which is the best solution for a given dataset with respect to the algorithm and metric. Furthermore, another important parameter that is commonly unknown during the clustering process is the number of clusters. Finding the optimum number of clusters is also an open problem, but nevertheless the validation process also provides a general idea about the quality of the cluster according to this parameter (Brock, Pihur, Datta, & Datta, 2008). In this work we are focused on developing a robust validation for the clustering results.

Clustering is also applied to time series. This area, also known as time series clustering (Liao, 2005), consists in finding similar time series, grouping them into clusters describing the general trends within the data, and predicting the evolution of a specific time series according to the group it belongs to. Authors working on these scenarios have been specially focused on solving missing values problems or large data volumes, as Iorio et al., who generate a simplified time series using P-splines, which are specially robust to missing values (Iorio, Frasso, D'Ambrosio, & Siciliano, 2016). Some application domains of time series clustering are: anomalous event detection (Piciarelli, Micheloni, & Foresti, 2008), social media trends (Yang & Leskovec, 2011), and video game-user profiling (Menéndez, Vindel, & Camacho, 2014).

The main goal of this work is to combine clustering algorithms and evaluation indexes to produce a robust process for clustering time series data. This algorithm will group UAV operators' profiles during their training process according to their evolution.

3. Proposed methodology for the automatic retrieval of representative simulation profiles

This section is focused on describing the methodology proposed in this work to automatically retrieve the most representative simulation profiles from a simulation environment. A simulation profile comprises a set of time series (i.e., it is a multivariate time series) representing the evolution of a number of performance measures throughout the execution of a mission. Obtaining and analysing the most representative simulation profiles is really useful for improving the quality of simulation-based training systems, since it can help not only to exploit general behavioural patterns among simulations, but also to detect off-nominal performances and to study whether the behaviour of a specific operator changes when he/she is encountered in dangerous situations.

Given a log of simulations and a set of M performance measures, this process will blindly compute those measures for each simulation and extract the most representative profiles using a two-step clustering-based process. At the end of the process, several representative profiles will be generated, ready to be analysed and described by a domain expert.

Below are detailed the two steps into which this methodology can be divided, namely the independent discrimination of each of the performance measures and the final extraction of the simulation profiles. In Figs. 1 and 2, a graphical overview of this process is shown.

3.1. Step 1: Applying Time Series Clustering on every performance measure separately

Suppose we have a dataset composed of N simulations, each of them containing all the interactions and events that happened during a simulation in a specific simulation environment. By using a set of M time-dependent performance measures, every simulation is processed and transformed into M time series, i.e., into an M-dimensional time series. Each dimension represents the evolution of a performance measure. This multivariate time series comprises the profile of that simulation, namely the simulation profile.
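For concreteness, a toy Python representation of such profiles could look as follows; the measure names and the random data are invented, not the measures used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
measures = ["score", "agility", "attention"]   # hypothetical performance measures (M = 3)

# Each simulation profile is an M-dimensional time series; note that two
# simulations may have different durations (here T = 120 and T = 95).
profile_1 = {m: rng.random(120) for m in measures}
profile_2 = {m: rng.random(95) for m in measures}
dataset = [profile_1, profile_2]               # N = 2 simulation profiles
```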

The first step in this methodology consists in extracting patterns among the M performance measures (i.e., among the M dimensions) separately. For this purpose, we will make use of time series clustering techniques. A graphical overview of this step of the methodology is shown in Fig. 1.

In order to perform time series clustering, we need to fix three important parameters:

‱ Time series dissimilarity metric (Ό): a crucial question in cluster analysis lies in establishing what we mean by “similar” data objects, i.e., determining a suitable similarity/dissimilarity metric between two objects. In the specific context of time series data, the concept of dissimilarity is particularly complex due to the dynamic character of the series. In this work, since the duration of two simulations usually differs, only those dissimilarity metrics which accept series of different length will be allowed to be part of the methodology. Once a dissimilarity metric is applied over a set of time series, a pairwise dissimilarity matrix is obtained and taken as a starting point for a conventional clustering algorithm.
‱ Clustering method (Ο): choosing the best clustering method a priori for given data is a difficult task. The only requirement imposed in this methodology on the algorithm to use is that it can be used with dissimilarities instead of raw data.
‱ Number of clusters (k1): another critical point in many clustering-based systems is to establish the optimal number of clusters, namely k1, given a dissimilarity matrix and a clustering method to use. A sketch combining these three parameters is shown after this list.
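As an illustration of these three parameters working together (a naive DTW as Ό, average-linkage hierarchical clustering as Ο, and k1 = 2), consider the following sketch; it is not the paper's implementation, and the toy series are invented:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Naive O(len(a)*len(b)) DTW, usable with series of different length."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy univariate series of different lengths (one per simulation).
series = [np.sin(np.linspace(0, 6, 100)),
          np.sin(np.linspace(0, 6, 80)),
          np.cos(np.linspace(0, 6, 90))]
dist = np.array([[dtw(a, b) for b in series] for a in series])  # mu: pairwise dissimilarities
Z = linkage(squareform(dist, checks=False), method="average")   # xi: average linkage
labels = fcluster(Z, t=2, criterion="maxclust")                 # k1 = 2 clusters
print(labels)                                                   # e.g. the two sines together
```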

Since we have no prior information about the different groups into which each performance measure can be discriminated, we will compute different clustering solutions using different values of Ό, Ο and k1. Then, in order to automatically decide which is the best discrimination for each performance measure, the results of all those clusterizations will be assessed by three internal validation indices, based on the works of Hennig and Liao (2013):

‱ Average Silhouette Width (ASW): the silhouette of an observation in a specific clusterization measures the degree of confidence with which we can ensure that the observation really belongs to the cluster it is assigned to (Rousseeuw, 1987). Given an observation i, the silhouette for that observation, s(i), is defined as:

$$s(i) = \frac{b_i - a_i}{\max(b_i, a_i)},$$

where $a_i$ is the average intra-cluster distance for i, and $b_i$ the average inter-cluster distance with respect to the nearest cluster to i, i.e.:

$$b_i = \min_{C_k \in C \setminus C(i)} \frac{\sum_{j \in C_k} dist(i, j)}{n(C_k)}, \qquad (1)$$

where C(i) represents the cluster to which i is assigned, and n(C_k) the number of observations contained in cluster C_k. The closer s(i) gets to 1, the more confidence we have of i as well-assigned, and vice versa if s(i) gets close to −1. Finally, to compute the silhouette width of a clusterization, we simply compute the average silhouette value for each observation:

$$S(C) = \frac{\sum_{C_k \in C} \sum_{i \in C_k} s(i)}{|C|} \qquad (2)$$

The result lies in [−1, 1], and should be maximized in order to achieve a good discrimination.

‱ Calinski and Harabasz index (CH): proposed in CaliƄski and Harabasz (1974) and popularized in Milligan and Cooper (1985), it establishes a ratio between the separation and cohesion of a partition, defined as:

$$\frac{B(k)(N - k)}{W(k)(k - 1)},$$


Fig. 1. Step 1: finding the best discrimination for each of the M performance measures separately.

where k denotes the number of clusters and B(k) and W(k) denote the between (separation) and within (cohesion) cluster sums of squares of the partition, respectively (see details in Hennig & Liao (2013)). An optimal clusterization maximizes this measure.

‱ Pearson version of Hubert's Γ (PH): this metric rates the Pearson correlation, ρ(d, v), between the vector d of pairwise dissimilarities and the binary vector v that is 0 for every pair of observations in the same cluster and 1 for every pair of observations in different clusters. It was proposed by Baker and Hubert (1975) and revised by Halkidi, Batistakis, and Vazirgiannis (2001) to overcome some computational problems. Best discriminations are obtained when this value is maximized.

In order to automatically choose the best discrimination based on these validation indices, we define a final Validation Rating (VR), which balances the scores obtained for each of the indices defined above. Since all the indices defined denote better clusterizations when maximized, the Validation Rating (VR) is defined as:

$$VR(\mu, \xi, k_1) = \frac{ASW(\mu, \xi, k_1)}{\max_{\mu, \xi, k_1} ASW} + \frac{CH(\mu, \xi, k_1)}{\max_{\mu, \xi, k_1} CH} + \frac{PH(\mu, \xi, k_1)}{\max_{\mu, \xi, k_1} PH}, \qquad (3)$$

where k_1 refers to a specific number of clusters tested in the validation process, Ό refers to a time series metric and Ο to a clustering method. Using the criterion of Eq. (3) allows us to choose a discrimination that may not be the best in one of the validation indices, but guarantees reasonable values in all of them. The combination of parameters Ό, Ο and k_1 whose clustering result maximizes the value of VR(Ό, Ο, k_1) will be chosen and passed to the next step (see the sketch below).

Fig. 2. Step 2: Extracting the most representative simulation profiles based on the clustering results obtained by following the process described in Fig. 1.
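The selection loop itself reduces to a grid search. The following R sketch illustrates it under the parameter ranges later fixed in Section 4.5; the helper validation_indices() is the one sketched above, while best_discrimination(), cluster_with() and the list-based input are illustrative assumptions rather than the published code (in particular, passing a list of unequal-length series to TSclust::diss is our assumption about how unequal durations were handled).

library(TSclust)   # time series dissimilarities (Montero & Vilar, 2014)
library(cluster)   # agnes, diana, pam

# Cluster a dissimilarity matrix `d` with method `xi` into `k` groups.
cluster_with <- function(d, xi, k) {
  switch(xi,
         AGNES = cutree(as.hclust(agnes(d, diss = TRUE)), k),
         DIANA = cutree(as.hclust(diana(d, diss = TRUE)), k),
         PAM   = pam(d, k, diss = TRUE)$clustering)
}

# Score every (mu, xi, k1) combination for one performance measure and
# return the row maximizing the Validation Rating of Eq. (3).
best_discrimination <- function(series, mus = c("FRECHET", "DTWARP"),
                                xis = c("AGNES", "DIANA", "PAM"), ks = 2:8) {
  grid <- expand.grid(mu = mus, xi = xis, k1 = ks, stringsAsFactors = FALSE)
  idx <- t(apply(grid, 1, function(g) {
    d <- diss(series, METHOD = g[["mu"]])
    validation_indices(d, cluster_with(d, g[["xi"]], as.integer(g[["k1"]])))
  }))
  vr <- rowSums(sweep(idx, 2, apply(idx, 2, max), "/"))  # Eq. (3)
  cbind(grid, idx, VR = vr)[which.max(vr), ]
}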

In this step, we have considered each performance measure as independent of the rest (i.e., we only cluster time series using a specific performance measure, in a univariate way). By applying the above validation process, we automatically obtain a set of M clusterizations containing the shared patterns found within each of the performance measures separately. Note that different performance measures may be grouped with different values of (Ό, Ο, k_1), depending on the nature of the data and the measure itself. In fact, the more values we try for each of these parameters, the more chances we have of finding a suitable discrimination of a given performance measure, which provides an easy and scalable framework for using this methodology in different simulation environments. In the next step, we will define a multivariate distance for a whole simulation profile using the results obtained in this step.

Once Step 1 is finished, the M performance measures of all the simulations in the dataset have been clustered into groups of shared temporal behaviour, sharing features such as monotonicity or the minimum and maximum values reached. The next step consists in using those clusters to define the similarity between two simulation profiles. This part of the methodology is based on the work of Menéndez et al. (2014).

Let $\{C_i^m\}_{i=1}^{k_m}$ be the clusters obtained after applying time series clustering on the m-th performance measure. Note that the number of clusters, $k_m$, can vary depending on the measure referred to. Each of the N simulation profiles will belong to one cluster per measure. Denoting by $c_n^m$, $1 \le n \le N$, $1 \le m \le M$, the assignation of the n-th simulation profile to a cluster of the m-th performance measure, with $c_n^m \in \{C_1^m, \ldots, C_{k_m}^m\}$, we can build an $N \times M$ matrix containing the cluster assignations for all the simulations in the dataset:

$$\begin{pmatrix} c_1^1 & \cdots & c_1^M \\ \vdots & \ddots & \vdots \\ c_N^1 & \cdots & c_N^M \end{pmatrix} \qquad (4)$$

Rows in Eq. (4) represent different simulation profiles and columns account for each of the M performance measures used. Given this matrix, we define a dissimilarity metric between two simulation profiles (rows) based on the number of cluster assignations they share. Formally, the Cluster Assignation Distance (CAS) between two simulation profiles $s_i$ and $s_j$ is defined as:

$$CAS(s_i, s_j) = 1 - \frac{\sum_{m=1}^{M} \delta_{i,j}^m}{M}, \qquad (5)$$

where M is the number of performance measures considered, and ÎŽ is the Kronecker delta (i.e., the coincidences) defined as:

$$\delta_{i,j}^m = \begin{cases} 1 & \text{if } c_i^m = c_j^m \\ 0 & \text{otherwise} \end{cases}$$
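A direct implementation of this distance is straightforward. The R sketch below builds the full pairwise CAS matrix from the assignment matrix of Eq. (4); the names assign_mat and cas_dist are illustrative choices of ours.

# Pairwise Cluster Assignation Distance (Eq. (5)) over the N x M matrix of
# Eq. (4): one cluster label per simulation (row) and measure (column).
cas_dist <- function(assign_mat) {
  N <- nrow(assign_mat)
  M <- ncol(assign_mat)
  D <- matrix(0, N, N)
  for (i in seq_len(N - 1)) {
    for (j in (i + 1):N) {
      shared <- sum(assign_mat[i, ] == assign_mat[j, ])  # sum of Kronecker deltas
      D[i, j] <- D[j, i] <- 1 - shared / M
    }
  }
  as.dist(D)
}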

Given the cluster assignation matrix from Eq. (4) and the dissimilarity metric from Eq. (5), the pairwise dissimilarity matrix among all the simulation profiles can be computed and used as input for a conventional clustering algorithm. In this case, since we are interested in analysing the most representative simulation profiles, we will apply a medoid-based clustering algorithm to group the simulation profiles according to the defined dissimilarity metric and extract the medoids of each of the resulting clusters.

For this work, the medoid-based clustering algorithm used in this last step is fixed: the classical Partitioning Around Medoids (PAM) method (Kaufman & Rousseeuw, 1987). However, as in the first round of clustering of this methodology, an optimal number of clusters (or medoids, in this case), namely k_2, needs to be established. The process to select this value is the same as in the previous step, i.e., we assess several possibilities via a set of validation indices and keep the one maximizing the balance ratio among all of them (see Eq. (3)). After that, the optimal medoids are obtained, and they constitute the most representative simulation profiles in the dataset. The analysis of these medoids, carried out by a domain expert, gives helpful information about the behavioural patterns followed in the simulations and the causes that increase or decrease the performance of an operator over time. A sketch of this final step is given below.

Fig. 3. Screenshot of Drone Watch And Rescue (DWR).
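A compact sketch of this final step in R, reusing the helpers sketched above; extract_rsp and its defaults are illustrative, and $id.med is used to recover the row indices of the medoid simulations.

library(cluster)

# Step 2: PAM over the CAS dissimilarities for several values of k2, keeping
# the solution with the best Validation Rating and returning its medoids.
extract_rsp <- function(assign_mat, ks = 3:8) {
  d <- cas_dist(assign_mat)
  idx <- t(sapply(ks, function(k)
    validation_indices(d, pam(d, k, diss = TRUE)$clustering)))
  vr <- rowSums(sweep(idx, 2, apply(idx, 2, max), "/"))   # Eq. (3) again
  best <- pam(d, ks[which.max(vr)], diss = TRUE)
  best$id.med   # indices of the most representative simulation profiles
}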

4. Experimental setup

In this section, the proposed methodology is tested using a lightweight multi-UAV simulation environment. Below are given all the necessary details to understand how the methodology has been applied, including a brief overview of the simulation environment used, a formal description of the 6 performance measures comprising a simulation profile in this environment, the process of creating ground truth information to evaluate the clustering results, the dataset used, and the different parameters fixed for the whole process.

4.1. DWR - a multi-UAV simulation environment

Retrieving data from the interactions of UAV operators during a multi-UAV simulation is a novel task, given the early state of the work in this field. This hinders efforts to make this kind of analysis more accessible, so that an inexperienced user could be trained to become a potential expert in UAV operations (Cooke, Pedersen, Connor, Gorman, & Andrews, 2006; McKinley et al., 2011).

For this reason, the simulation environment used as the basis for this work has been designed following the criteria of accessibility and usability. It is named Drone Watch And Rescue (DWR), and its complete description can be found in Rodriguez-Fernandez et al. (2015a). DWR gamifies the concept of a multi-UAV mission (see Fig. 3), challenging the operator to capture all mission targets consuming the minimum amount of resources, while avoiding at the same time the possible incidents that may occur during a mission. To avoid these incidents, an operator in DWR can perform multiple interactions to alter both the UAVs in the mission and the waypoints comprising their mission plan. One important aspect to remark about this type of simulation is that the level of user interaction is usually low. Operators are instructed to follow a restricted set of procedures in order to overcome incidents, but they are not supposed to interact with the system actively when the mission is going well (Boussemart & Cummings, 2011).

All the possible interactions that an operator can perform in DWR are listed below:

‱ Select UAV: Allows the operator to focus on, monitor and send commands to a specific UAV.
‱ Set UAV speed: Changes the speed of the selected UAV.
‱ Set simulation speed: Increases or decreases the simulation speed. UAV missions usually last many hours, so it is sometimes desirable to accelerate the process to allow fast simulation-based training. The minimum possible simulation speed is 1, which means that the simulation runs in real time. The maximum possible value is 1000, which means that the simulation runs 1000 times faster than real time.
‱ Change UAV path: Adds/edits/removes waypoints of any UAV. In the case of adding new waypoints, the behaviour of the simulator varies depending on the active control mode (see control modes below).
‱ Edit waypoint table: Waypoints can be rearranged in a waypoints table, increasing or decreasing their order.
‱ Set control mode: Control modes in DWR manage how an operator can change the current path of a UAV. There are three control modes:
1. Monitor: This is the default control mode. It allows the operator to see and edit the position and order of the waypoints of the selected UAV, but not to add new waypoints.
2. Add waypoints: This control mode allows the operator to view and edit the UAV waypoints, and also to add new waypoints at the beginning of the UAV path, keeping the rest of the waypoints unchanged.
3. Manual: This control mode allows the operator to define a new path, deleting the previous one.

Regarding the incidents that may occur during the execution of a simulation, three different types have been implemented in DWR:

‱ Danger area: Due to a heavy storm or any other weather threat, a new danger area appears somewhere on the map. Any UAV that overflies it is automatically destroyed. To overcome this incident, the operator must change the flying path of the UAVs at risk of flying over these areas.
‱ Payload breakdown: The sensors comprising a UAV's payload stop working. From this moment, the UAV is not able to detect any target. To overcome this incident, the operator must command the affected UAV to return to an airport, where it will be repaired.
‱ Low fuel: When the fuel level of a UAV drops below a predefined threshold, an alert is displayed notifying the operator of the incident. The operator must command the affected UAV to fly to the closest refueling station on the mission map.

Whenever an event occurs during a simulation, DWR stores the simulation status at that moment as a Simulation Snapshot. This snapshot contains information on the current status of every element taking part in the simulation. Storing the data in this way allows the entire simulation to be replayed, which is helpful for the analysis process.

4.2. Performance measures on DWR

The simulation environment DWR, introduced in Section 4.1, retrieves information about the events triggered and the interactions performed by an operator throughout the execution of a mission. In this section, all the retrieved information will be used to define a set of performance measures which assess the performance of a user in a specific simulation. These measures form the basis for the subsequent analysis.

In previous works (Rodríguez-Fernåndez et al., 2015b), the performance measures were computed globally, hence every simulation was described as a numeric tuple $(m_1, m_2, \ldots, m_M)$ (assuming that a number M of metrics has been defined), where each metric $m_i$ was represented by a value in the range [0, 1], 0 being the worst performance for that metric and 1 the best.

However, in this work every performance measure is defined as a time series, so we are able not only to analyse the general performance of a simulation, but also to study the performance evolution and to detect the time intervals where the values of a specific measure tend to increase or decrease.

A total of six performance measures have been defined: Score (S), Agility (A), Attention (At), Cooperation (C), Aggressiveness (Ag) and Precision (P). All of them take values in the range [0, 1], and are defined cumulatively over time. This means that, given an instant t in the simulation time, the value of a performance measure will depend on the information retrieved from time 0 (simulation start time) to time t. Following this, a simulation profile s is defined as a multivariate time series with the 6-tuple:

$$(S(s), A(s), At(s), C(s), Ag(s), P(s))$$

Each of the performance measures developed for this work is described below.

4.2.1. Score

The Score (S) measure gives a global success/failure rate of a simulation. The main goal for an operator in DWR is to detect the maximum number of targets, while keeping all the UAVs in the mission safe. Based on this description, we define the score of a simulation s as:

$$S(s, t) = \frac{1}{2}\left[\frac{|tD(s, t)|}{|T(s)|} + \left(1 - \frac{|dUAVs(s, t)|}{|U(s)|}\right)\right] \qquad (6)$$

where tD(s, t) and dUAVs(s, t) refer to the targets detected and the UAVs destroyed, respectively, up to time t, T(s) is the set of all mission targets and U(s) is the set of all UAVs participating in the mission.
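As an illustration, the Score time series can be computed directly from the event log. The R sketch below assumes two simple vectors of timestamps (t_detect, t_destroy) extracted from the simulation snapshots; all names are ours.

# Score (Eq. (6)) at instant t, given the detection/destruction timestamps.
score_at <- function(t, t_detect, t_destroy, n_targets, n_uavs) {
  detected  <- sum(t_detect  <= t)   # |tD(s, t)|
  destroyed <- sum(t_destroy <= t)   # |dUAVs(s, t)|
  0.5 * (detected / n_targets + (1 - destroyed / n_uavs))
}

# Evaluated on the sampling grid of Section 4.5, this yields the time series:
# sapply(seq(0, duration, by = 2000), score_at, t_detect, t_destroy, 4, 3)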

4.2.2. Agility

Agility (A) measures how the speed of the operator's interactions varies during a simulation. The speed of an interaction is given by the value of the simulation speed at the time when the interaction was performed. As mentioned in Section 4.1, the simulation speed in DWR can be set at any time to a value in the range [1, 1000], which accelerates or decelerates the simulation dynamics. An operator is considered agile if he/she can interact when things are happening fast. Let I(s, t) be the set of all interactions performed up to time t in a simulation s; the Agility at that time is computed as:

$$A(s, t) = \frac{1}{|I(s, t)|} \sum_{i \in I(s, t)} \frac{simulationSpeed(i)}{MAX\_SPEED} \qquad (7)$$

where MAX_SPEED = 1000 in this simulation environment and simulationSpeed(i) gives the speed at which the simulation was running at the moment when the interaction i was made. Note that computing Eq. (7) over time can be seen as calculating the average speed of the interactions cumulatively.

4.2.3. Attention

The Attention measure (At) assesses the evolution of the operator's intensity in terms of the number of interactions he/she performs throughout the simulation time. Let I(s, t) be, as in the previous section, the set of interactions performed from the beginning of simulation s until time t; the Attention at that time is defined as:

$$At(s, t) = 1 - \frac{1}{1 + \sqrt{|I(s, t)|}}. \qquad (8)$$

Note that the time series generated by computing Eq. (8) over time is monotonically increasing, since the number of interactions |I(s, t)| always grows as t rises. A square root is introduced in the equation in order to avoid a fast convergence to 1.
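Both interaction-based measures can be sketched in a few lines of R, assuming an interaction log with one timestamp (time) and the simulation speed active at that moment (sim_speed) per interaction; the data layout and names are illustrative assumptions.

MAX_SPEED <- 1000

# Agility (Eq. (7)): cumulative average of the normalized interaction speeds.
agility_at <- function(t, ilog) {
  speeds <- ilog$sim_speed[ilog$time <= t]
  if (length(speeds) == 0) return(0)  # convention before the first interaction
  mean(speeds / MAX_SPEED)
}

# Attention (Eq. (8)): saturating function of the number of interactions.
attention_at <- function(t, ilog) {
  n <- sum(ilog$time <= t)            # |I(s, t)|
  1 - 1 / (1 + sqrt(n))
}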

4.2.4. Cooperation

Since the simulations carried out in DWR are focused on multi-UAV missions, it is important to measure how the operator has interacted with every available UAV. This concept is addressed by the Cooperation measure, which is higher the more the interactions of a simulation are balanced among all UAVs. Assuming that a simulation s features a total of N UAVs ($U_i$), the set of interactions performed up to time t, I(s, t), can be split into N subsets, $\{I_{U_i}(s, t)\}_{i=1}^{N}$, depending on which of the N UAVs was being monitored when the interaction was performed (some interactions may belong to more than one subset). Let $I_U(s, t) = \{|I_{U_1}(s, t)|, \ldots, |I_{U_N}(s, t)|\}$ be the vector gathering the size of each of these subsets, i.e., the number of interactions per UAV; the Cooperation is defined as:

$$C(s, t) = \frac{1}{1 + \sqrt{Var(I_U(s, t))}},$$

where Var() denotes the variance of the sizes of the different interaction sets. When the variance is low, the user is interacting in a similar way with all the UAVs, and therefore the cooperation metric tends to 1.

4.2.5. Aggressiveness

The Aggressiveness measure analyses how the operator changes the strength of his/her interactions during a simulation, in terms of which control mode was active when changes to the path of a UAV were made. Recall that the simulation environment used in this work features three control modes (Monitor, Add waypoints and Manual), and each of them allows the operator to change the waypoints of a UAV in a different way. In Monitor mode, the user is only allowed to move an existing waypoint, which is considered a "soft" interaction. Add waypoints mode permits appending new waypoints to an existing path, while Manual mode allows the user to define a whole new path, which is an "aggressive" way of interacting with the simulation.

Since we measure the Aggressiveness according to the waypoint modifications in the three different modes, we define the sets $W_{MO}(s, t)$, $W_A(s, t)$ and $W_{MA}(s, t)$, which represent the sets of interactions with waypoints performed up to time t during the Monitor, Add waypoints and Manual modes, respectively. Using these variables, the measure at time t is defined as:

$$Ag(s, t) = \frac{\alpha |W_{MA}(s, t)| + \beta |W_A(s, t)| + \gamma |W_{MO}(s, t)|}{|W(s, t)|}, \qquad \alpha, \beta, \gamma < 1, \; \alpha > \beta > \gamma,$$

where $W(s, t) = W_{MO}(s, t) \cup W_A(s, t) \cup W_{MA}(s, t)$ represents the complete set of waypoint interactions until time t, used to normalize the metric to the range [0, 1]. Parameters α, ÎČ and Îł are weight coefficients used to balance the aggressiveness factor of each type of interaction. Values of this metric close to 1 indicate that the user is performing mostly aggressive interactions at that time, i.e., he/she is probably defining new paths. On the contrary, values close to 0 indicate moments of quick and soft waypoint handling.

4.2.6. Precision

The Precision (P) measure studies the replanning skills of an operator in a simulation, rating how he/she has reacted to the mission incidents. The design of this measure is based on the following assumption: a precise operator should only alter the path of a UAV when an incident occurs. Therefore, waypoints added when no incident is happening should penalize the precision rate. Based on this, we can divide the computation of this measure into two parts: the precision in times of incidents (Incident Precision, $P_I$) and the precision when nothing is altering the normal execution of the simulation, i.e., when the operator must only monitor the simulation status (Monitoring Precision, $P_M$).

The Incident Precision ($P_I$) assumes that every waypoint added/edited/removed during a specific time interval from the beginning of an incident (10 seconds for this work) is placed in order to avoid that incident, so it is considered a precise interaction. Let In(s, t) be the set of incidents triggered up to time t in simulation s; we compute $P_I(s, t)$ as follows:

$$P_I(s, t) = \frac{\sum_{i \in In(s, t)} p_I(i, s, t)}{|In(s, t)|}, \qquad p_I(i, s, t) = 1 - \frac{1}{1 + |W_i(s, t)|},$$

where $p_I(i, s, t)$ gives the precision for a specific incident i. In this last equation, $W_i(s, t)$ is the set of all waypoint interactions (add/edit/remove) performed from the start of incident i until a maximum of 10 seconds after, i.e., interactions within the interval $[startTime(i), \min(startTime(i) + t, startTime(i) + 10)]$. The more waypoints changed during this interval, the more the precision increases for that incident.

The Monitoring Precision ($P_M$) is conceptually contrary to $P_I$, in the sense that it penalizes the waypoint interactions performed outside the scope of incidents, i.e., during monitoring time. Here, the fewer the interactions, the higher the precision obtained. It is computed as

$$P_M(s, t) = \frac{1}{1 + |W_M(s, t)|}, \qquad W_M(s, t) = W(s, t) \setminus \bigcup_{i \in In(s, t)} W_i(s, t),$$

where $W_M(s, t)$ is the set of all waypoint interactions performed during monitoring time up to time t, i.e., the complement of the union of all waypoint interactions made to avoid any of the incidents triggered until that moment. Averaging the values of the Incident Precision and the Monitoring Precision, we finally obtain the expression for the Precision measure:

$$P(s, t) = \frac{P_I(s, t) + P_M(s, t)}{2} \qquad (9)$$
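The following R sketch combines both parts, assuming vectors wp_times (timestamps of all waypoint interactions) and inc_times (incident start times); the convention $P_I = 1$ before the first incident is our own assumption, since the definition above leaves that corner case unspecified.

# Precision (Eq. (9)) at instant t, from the waypoint-interaction and
# incident timestamps. The 10-second incident window follows the text.
precision_at <- function(t, wp_times, inc_times, window = 10) {
  wp  <- wp_times[wp_times <= t]
  inc <- inc_times[inc_times <= t]
  # Is each waypoint interaction inside the window of some incident?
  in_window <- vapply(wp, function(w) any(w >= inc & w <= inc + window),
                      logical(1))
  # Incident Precision: average of p_I(i, s, t) over the triggered incidents.
  p_i <- if (length(inc) == 0) 1 else   # assumed convention: no incidents yet
    mean(vapply(inc, function(s0) {
      n_wp <- sum(wp >= s0 & wp <= s0 + window)  # |W_i(s, t)|
      1 - 1 / (1 + n_wp)
    }, numeric(1)))
  # Monitoring Precision: penalizes interactions outside every window.
  p_m <- 1 / (1 + sum(!in_window))               # |W_M(s, t)|
  (p_i + p_m) / 2
}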

4.3. Evaluation criteria

In order to perform an external evaluation of the clustering results obtained in this work, and to compare them objectively against other clustering approaches, we have created a ground truth dataset based on collective human judgement, inspired by the work of Al-Subaihin et al. (2016), where the similarity of a set of mobile apps is rated manually by several users. Human judgement as a way to create ground truth data is also typical in the field of sentiment analysis, where a group of expert users categorize the opinion expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic is positive, negative or neutral (Liu, 2012).

In this work, the ground truth is created by asking users to rate the similarity of pairs of time series, corresponding to the evolution of a specific performance measure in two randomly selected simulations executed in DWR. Ratings are given on a 5-star rating system (Pang & Lee, 2005), where 1 star indicates the lowest possible similarity and 5 the highest. Note that, although in this work the unit of analysis is a simulation profile, which is a multivariate time series, the item rated by humans is a pair of 1-dimensional time series. This is because comparing a pair of multivariate time series is much more difficult, and thus the resulting ground truth would be less reliable.

To measure the degree of consistency among the evaluations from multiple raters, many statistical measures have been studied, depending on the number of participants and the type of scale used. Some examples are Cohen's Kappa and Weighted Kappa when there are two raters (Cohen, 1968), the Fleiss Kappa (Fleiss, 1971) when multiple raters use a nominal or categorical scale, or the Intraclass Correlation Coefficient (ICC) (Bartko, 1966) for semantic-differential scales. Since we use an ordinal scale with multiple raters, we select Kendall's Coefficient of Concordance (W) (Kendall & Smith, 1939). Kendall's W assigns a value of consistency among the raters that ranges between 0 and 1. Low values indicate high variation in the scores given to each item by the raters, and high values indicate more consensus.

The rating process has been automated through a web app, whose graphical user interface can be seen in Fig. 4. This app simply takes two random simulations from the simulations dataset and chooses a random performance measure to show (as a time series). Once the user has rated their similarity, the app stores the corresponding information and shows a new pair of time series. For every submitted similarity rating, we store the following information:

‱ Rater's name.
‱ Identifiers of the two simulations that have been compared in the rating process.
‱ Name of the performance measure that has been rated (Score, Agility, ...).
‱ Value of the similarity rating assigned (one of {1, 2, 3, 4, 5}).

Fig. 4. Screenshot of the app developed to create a ground truth dataset by labelling the similarity between pairs of time series. In the screenshot, the user is asked to rate the similarity between two time series representing the evolution of the agility performance measure in two randomly selected simulations executed in DWR.

Given this data, the Average Similarity Rating (ASR) between two simulations $s_i$ and $s_j$ is computed by averaging the rating values over all the stored data for that pair of simulations. The more evaluations we have for every possible pair of simulations, the more reliable the ground truth dataset will be. Formally, let $S_p = \{(s_i, s_j)\}_{i,j=1}^{N}$ be the set of all possible (unordered) pairs of simulations in our simulation dataset S; the ground truth of this work can then be defined as a function $ASR: S_p \to [1, 5]$.

To measure the accuracy of a clustering result against this type of ground truth, we check whether the pairs of simulations rated with high similarity are assigned to the same cluster or not. If the ASR between a pair of simulations in the ground truth is greater than a given match acceptance threshold ($\theta_A$, a value between 1 and 5), then the ground truth is saying that they are very similar, so a given clustering solution should place them in the same cluster. On the contrary, if the ASR between the two simulations is lower than a given match rejection threshold ($\theta_R$), the clustering solution is expected to place the simulations in different clusters. If the ASR falls between the two thresholds, then we consider that the human judgement is not decisive, and that pair is not taken into account.

With this, let $C = (C_{s_1}, \ldots, C_{s_N})$ be a given clustering solution for those simulations; we define the Pairwise Accuracy (P-Acc) as the percentage of concordance between the clustering solution and the ground truth over every decisive pair of simulations in $S_p$. Algorithm 1 shows in detail the process to calculate this value.

Algorithm 1 Pairwise Accuracy (P-Acc).
Input: $C = (C_{s_1}, \ldots, C_{s_N})$ is the clustering solution to evaluate; $ASR: S_p \to [1, 5]$ is the ground truth function; $\theta_A$ is the match acceptance threshold; $\theta_R$ is the match rejection threshold.
Output: Value between 0 and 100 indicating the pairwise accuracy of the clustering solution.

1: function P-Acc(C, ASR, ΞA, ΞR)
2:   P-Acc ← 0
3:   decisivePairs ← 0
4:   for (s_i, s_j) ∈ S_p (i ≠ j) do
5:     if ASR(s_i, s_j) ≄ ΞA or ASR(s_i, s_j) ≀ ΞR then    â–· the pair is decisive
6:       decisivePairs ← decisivePairs + 1
7:       if ASR(s_i, s_j) ≄ ΞA and C(s_i) = C(s_j) then
8:         P-Acc ← P-Acc + 1
9:       if ASR(s_i, s_j) ≀ ΞR and C(s_i) ≠ C(s_j) then
10:        P-Acc ← P-Acc + 1
11:  return (P-Acc / decisivePairs) × 100

Basically, it consists in looping over every pair of simulations, checking first whether the ASR value for that pair is decisive and, if so, whether the clustering solution places both elements of the pair in the same cluster.
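Since the rest of the pipeline is implemented in R, Algorithm 1 can be written as a short vectorized function. In this sketch, cl is the vector of cluster labels and asr a data frame with one row per rated pair (columns i, j and asr); this input layout is an illustrative assumption of ours.

# Pairwise Accuracy (Algorithm 1), vectorized over the rated pairs.
p_acc <- function(cl, asr, theta_A = 3.5, theta_R = 2.5) {
  decisive <- asr$asr >= theta_A | asr$asr <= theta_R
  same     <- cl[asr$i] == cl[asr$j]
  hits     <- (asr$asr >= theta_A & same) | (asr$asr <= theta_R & !same)
  100 * sum(hits & decisive) / sum(decisive)
}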

Table 1
Summary of the main features of the test mission used to carry out the simulations comprising the dataset of this work.

Parameter               Value
Map extension           800 × 500
UAVs                    3
Surveillance areas      2
Number of targets       4
Preplanned incidents    4 (2 danger area and 2 payload breakdown)
No-flight zones         2
Refueling stations      4

4.4. Dataset

In this work, the simulation environment (DWR) was tested by Computer Engineering students of the Autonomous University of Madrid (AUM), all of them inexperienced in this type of system. All users completed a brief tutorial before using the simulator, explaining the mission objectives and the basic controls. After that, they were told to execute a test mission prepared for this experiment. That mission (see Fig. 3) features a total of 3 UAVs performing 4 Surveillance Tasks in 2 different areas, in order to detect 4 mobile targets. The map also presented 4 No-Flight-Zones and 4 Refueling Stations. During the simulation, 4 scheduled incidents were triggered, affecting both the UAVs and the environment. Although all the incidents were planned to be triggered at the same simulation intervals, a user could receive an incident sooner or later depending on the speed at which he/she was running the simulation. A summary of the contents of this mission is given in Table 1. For more information about the mission elements involved in the simulation, see Rodriguez-Fernandez et al. (2015a).

The dataset resulting from this experiment comprises 87 distinct simulations, executed by a total of 40 users. To achieve a robust analysis of the extracted data, we must clean the dataset by removing those simulations which can be considered useless. Taking into account the difficulty level of the test mission, we have considered as useless those simulations aborted before 20 seconds of duration or those which presented fewer than 10 interactions. Of the 87 simulations composing our initial simulations dataset, only 55 are considered useful for this experiment, hence N = 55.

Regarding the ground truth dataset, a total of 3 raters, namely the authors of this paper, have contributed to the rating of time series pairwise similarities. Due to the abundance of possible combinations of the tuple (simulation 1, simulation 2, performance measure) to rate, we draw a random sample of 20 simulations, $\tilde{S}$, from the original dataset S. Thus, the set of possible simulation pairs $\tilde{S}_p$ to be rated contains 190 unique elements (20 · 19/2). Since there are a total of 6 performance measures for each simulation, the number of possible cases to rate amounts to 1140 (190 · 6). After several days of using the web app for creating the ground truth (see Fig. 4), a total of 1742 evaluations were gathered, covering 936 of the 1140 possible rating cases. That means that for many pairs of simulations there are some performance measures that have not been rated. Depending on the minimum number of rated performance measures that we establish as necessary to achieve an accurate analysis, and on the threshold values of $\theta_A$ and $\theta_R$ needed to decide whether the human judgement for a given pair of simulations is decisive or not, the amount of useful ground truth data will vary.

Table 2
Parameter tuning for all the variables involved in the experimental setup of this work.

Context                                   Parameter                                                                    Value
DWR                                       Number of performance measures (M)                                           6
                                          Sampling resolution                                                          2000 ms
Proposed methodology                      Time series metrics                                                          Frechet, DTW
                                          Clustering methods                                                           AGNES, DIANA, PAM
                                          Possible number of clusters for step 1 (K_1)                                 2 ... 8
                                          Possible number of clusters for step 2 (K_2)                                 3 ... 8
Ground truth and clustering evaluation    Minimum number of performance measures rated for each pair of simulations    4
                                          Match acceptance threshold (ΞA)                                              3.5
                                          Match rejection threshold (ΞR)                                               2.5

With regard to Kendall's coefficient of concordance (W), there are a total of 45 rating cases that have been evaluated by all the raters. Using these common cases, the achieved coefficient is 0.58 (p-value = 0.0018), and thus, according to the common criteria used to judge this value (RemĂžy, 2010), there is moderate agreement among the raters (a sketch for computing W is given below).
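For reference, Kendall's W can be computed directly from its definition. The R sketch below takes an n x m matrix ratings (one row per commonly rated case, one column per rater) and ignores the tie correction, so it is an approximation of the reported value.

# Kendall's coefficient of concordance: W = 12 S / (m^2 (n^3 - n)),
# where S is the squared deviation of the rank sums across cases.
kendall_w <- function(ratings) {
  n <- nrow(ratings)                     # rated cases (here 45)
  m <- ncol(ratings)                     # raters (here 3)
  R <- rowSums(apply(ratings, 2, rank))  # rank within each rater, sum per case
  S <- sum((R - mean(R))^2)
  12 * S / (m^2 * (n^3 - n))
}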

4.5. Parameter tuning

In this section, all the free parameters appearing in the proposed methodology are assigned a value or a set of values in the context of this experiment. A summary of this parameter tuning is shown in Table 2.

Once the dataset has been created, the simulation profile for every simulation in the dataset has to be processed. To do this, we use the set of performance measures defined in Section 4.2, so we have a total of M = 6 time series comprising each simulation profile.

In order to compute a simulation profile, the measures are computed at different time steps throughout the whole simulation duration. These time steps are obtained by sampling the whole simulation time into equidistant time slots, fixing a time slot resolution. That sampling resolution, in this experiment, is computed automatically as the average distance between subsequent interactions across the whole simulation dataset. For the dataset used in this work, this results in a sampling resolution of 2000 ms. Thus, every 2000 ms we compute the values of each of the performance measures defined in Section 4.2 and create the performance time series for each simulation.

In order to perform time series clustering for each of the performance measures (Step 1 of the proposed methodology), different dissimilarity metrics (values of Ό), different clustering methods (values of Ο) and different numbers of clusters (values of k_1) are tested during the validation process. Recall that the time series dissimilarity metrics used must accept series of different lengths, and the clustering methods must work with a pairwise dissimilarity matrix as input.

Regarding time series metrics, two examples are tested and compared. Both of them allow similar shapes among time series to be recognized, even in the presence of signal transformations such as shifting or scaling:

1. Fréchet Distance: This distance has been used extensively in the time series framework, in both continuous and discrete cases (Eiter & Mannila, 1994). It does not just treat the series as two point sets; it takes into account the ordering of the observations and can be computed on series of different lengths. Denote by X and Y two discrete time series and by P the set of all possible sequences of p pairs preserving the data order, of the form $((X_{a_1}, Y_{b_1}), \ldots, (X_{a_p}, Y_{b_p}))$; then the Fréchet distance is computed as:

$$Frechet(X, Y) = \min_{P} \left( \max_{i = 1, \ldots, p} |X_{a_i} - Y_{b_i}| \right)$$

2. Dynamic Time Warping Distance (DTW): This distance, very popular in the field of time series pattern recognition, is aimed at minimizing the sum of distances between the sequence of pairs as defined for the Fréchet distance. The DTW distance is given by:

$$DTW(X, Y) = \min_{P} \left( \sum_{i = 1, \ldots, p} |X_{a_i} - Y_{b_i}| \right)$$

Regarding the clustering methods to test, three classical algorithms are applied, all of them accepting dissimilarity matrices as input data (a usage sketch follows the list):

1. Agglomerative Nesting (AGNES): This is one of the most frequently used clustering algorithms (Kaufman & Rousseeuw, 2009). It is a bottom-up, non-parametric hierarchical algorithm. Each observation is initially placed in its own cluster, and the clusters are iteratively joined together according to their closeness. The closeness of any two clusters is measured by a dissimilarity matrix between sets of observations, usually obtained using an appropriate metric (Euclidean distance in this case). The results of this algorithm (and of all hierarchical methods) are usually presented in a dendrogram, which can be cut at a chosen height to produce the desired number of clusters.

2. DIvisive ANAlysis Clustering (DIANA) (Kaufman & Rousseeuw, 2009): This is a divisive hierarchical algorithm that constructs the hierarchy in the inverse order (top-down). It initially starts with all observations in a single cluster, and successively divides the clusters until each cluster contains a single observation. Although it is usually less efficient than agglomerative nesting, DIANA stands out as a competitive clustering algorithm in many fields (Datta & Datta, 2003).

3. Partitioning Around Medoids (PAM): Proposed by Kaufman and Rousseeuw (1987), this algorithm is similar to the popular k-means. In contrast to the k-means algorithm, PAM chooses data points as centres (called medoids) instead of centroids.
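A usage sketch with the packages reported in this work: the dissimilarity matrix for one performance measure is computed with TSclust and fed to the three methods. The variable names, and the use of a list as input for unequal-length series, are our assumptions rather than the published code.

library(TSclust)
library(cluster)

# `series`: list of numeric vectors, one per simulation (unequal lengths).
d <- diss(series, METHOD = "DTWARP")   # or METHOD = "FRECHET"

cl_agnes <- cutree(as.hclust(agnes(d, diss = TRUE)), k = 3)  # cut the dendrogram
cl_diana <- cutree(as.hclust(diana(d, diss = TRUE)), k = 3)
cl_pam   <- pam(d, 3, diss = TRUE)$clustering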

In order to choose the optimal number of clusters in the first step of the methodology, we test different values of K_1, from 2 to 8. For the second step, we search among values of K_2 from 3 to 8.

Regarding the creation of the ground truth dataset, we set the minimum number of rated performance measures for every pair of simulations to 4, out of a total of 6. This way, the number of useful rated simulation pairs in our dataset is reduced from 190 to 132. With regard to the values of $\theta_A$ and $\theta_R$, we establish that any pair of simulations with an average rating above $\theta_A = 3.5$ or below $\theta_R = 2.5$ will be decisive for the clustering evaluation process. On this basis, only 128 simulation pairs conform our decisive ground truth information.

Every process of clustering, validation and evaluation described in this work has been implemented in the R Statistical Environment, using the package TSclust for the computation of time series dissimilarities (Montero & Vilar, 2014) and the package fpc for the computation of the validation indices (Hennig, 2010). The final code is available on GitHub.1

1 The code will be published once this work is accepted, due to copyright issues.

Table 3
Summary of the best validation results for the time series clustering of each of the performance measures used, corresponding to Step 1 of the methodology proposed in this work.

                          Performance measures
                          S        A        At        C        Ag       P
Dissimilarity metric      Frechet  Frechet  DTW       Frechet  Frechet  Frechet
Clustering method         PAM      PAM      AGNES     AGNES    DIANA    PAM
Number of clusters (k_1)  8        8        3         7        8        8
ASW                       0.586    0.581    0.708     0.606    0.580    0.587
CH                        608.725  596.207  1354.357  529.529  510.083  611.322
PH                        0.389    0.397    0.423     0.436    0.443    0.398
Validation Rating (VR)    2.332    2.335    2.243     2.376    2.390    2.344

5. Experimentation

In this section, we delve into the results obtained after applying the proposed methodology to extract the most representative simulation profiles in DWR. First, we detail the intermediate and final validation results of our two-step methodology, and check the evaluation results against the ground truth data. Finally, a comparative study is carried out in order to compare the results of the proposed methodology against other clustering approaches.

5.1. Results from applying the proposed methodology in DWR

Due to the large number of parameter combinations tested to find a good cluster discrimination for each performance measure in the first step of the methodology, only the best results are summarized in Table 3, for legibility purposes. As can be seen, all the dissimilarity metrics and clustering methods tested are selected as "best" at least once. The Validation Rating introduced in this work allows an easy comparison among clusterizations and avoids the differences in range of each validation index. The optimal number of clusters chosen is, excluding the Attention and Cooperation measures, always the minimum or maximum value of k tested. This gives us general information about the variance in the temporal evolution of each of the metrics and must be taken into account when analysing the simulation profiles: those time series grouped into 8 different clusters will define a richer set of behaviours and must be given more importance than those with only 2 different patterns detected (best k is 2).

Based on the best clusterizations given by the results of Table 3, an N × M cluster assignation matrix is built following the structure described in Eq. (4). After applying Eq. (5) over this matrix and clustering the resulting dissimilarity matrix using the PAM algorithm with values of k_2 from 3 to 8, we select k_2 = 7 as the optimal number of clusters to separate the simulation profiles. Table 4 shows the validation results for each of the values of k_2 tested in this last clustering process. As can be seen, the selected k_2 not only gets the best general rating (represented by the Validation Rating), but also maximizes each of the validation indices independently.

The 7 medoids of this clusterization represent the most representative simulation profiles for this dataset. Section 6 will focus on analysing those profiles (medoids) and give some insight into the typical behaviours followed by the users of this experiment.

With regard to the external evaluation, we calculate the Pairwise Accuracy (P-Acc) of the clustering results as detailed in Algorithm 1. The result is 84.09%, which is quite good taking into account the accuracy values usually obtained when using human judgement-based ground truth data. As an example, in the world of sentiment analysis, accuracy values above 70% are considered more than acceptable (Pak & Paroubek, 2010).

Table 4
Validation results for the final clustering process of the proposed methodology, corresponding to Step 2. Bold cells represent the best results obtained.

        ASW    CH       PH     VR
K2 = 3  0.451  22.919   0.592  1.417
K2 = 4  0.628  44.298   0.828  2.086
K2 = 5  0.701  53.13    0.843  2.273
K2 = 6  0.73   67.724   0.897  2.498
K2 = 7  0.788  112.873  0.924  3.000
K2 = 8  0.782  103.035  0.807  2.779

5.2. Comparative study between the proposed methodology and other multivariate time series clustering approaches

In this section, we are interested in finding out whether the proposed methodology performs better than other clustering approaches. Because the unit of analysis that we want to cluster is a simulation profile, which is a multivariate time series composed of the evolution of several performance measures, we compare our approach against a PAM clustering applied over different multivariate time series distances from the literature. As a requisite, we need the distance to accept time series of different lengths, so that we can compare simulations with different durations. The list of multivariate time series metrics used for this comparison is detailed below:

‱ Mean Fréchet: This metric computes the Fréchet distance for each component of the multivariate time series and averages the results.
‱ Mean DTW: The same as Mean Fréchet, but using DTW as the metric.
‱ Penrose Distance: Proposed by Penrose (1952), it computes a distance based on the means, variances and covariances of each sample based on p variables. It takes into account within-population variation by weighting each variable by the inverse of its variance, but does not account for correlations among variables.
‱ Mahalanobis Distance: Described in De Maesschalck, Jouan-Rimbaud, and Massart (2000), this distance is very similar to the Penrose distance, except that in this case the contribution of each pair of variables is weighted by the inverse of their covariance.

The results of this comparison are shown in Table 5. The value used to compare the clustering results is the Pairwise Accuracy (P-Acc) against our ground truth dataset (see Algorithm 1). Results for the other clustering approaches are given for different numbers of clusters, ranging from 3 to 8, exactly the same range of values used in the last step of the proposed methodology. Note that the results of the proposed methodology for values of K_2 other than K_2 = 7 are of low interest, since those values were not chosen as best in the internal validation process.

Table 5
Comparative results, in terms of Pairwise Accuracy (P-Acc), between the proposed methodology and direct clustering approaches based on multivariate time series distances. The results are compared for different numbers of clusters (K_2). The bold cell indicates the result obtained for the proposed methodology, while cells in italics show the results that surpass our best value.

        Penrose distance  Mahalanobis distance  Mean DTW  Mean Frechet  Proposed methodology
K2 = 3  60.61             46.97                 62.12     68.94         –
K2 = 4  66.67             73.48                 73.48     71.97         –
K2 = 5  73.48             75.76                 76.52     81.82         –
K2 = 6  75.00             78.03                 77.27     84.09         –
K2 = 7  83.33             84.85                 78.79     87.88         84.09
K2 = 8  87.12             88.64                 86.36     89.39         –

Fig. 5. Plots of the most representative simulation profiles (I). Red lines mark times when an incident was triggered, and the green line indicates the moment when the mission preparation phase finishes and the execution phase starts. Each subplot contains the evolution of the six performance measures (S, A, At, C, Ag, P) comprising a simulation profile. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

From the results we can see that, on a total of six occasions, another clustering result has surpassed the P-Acc value obtained by the proposed methodology. It can also be noted that, in general, the Mean Fréchet distance is the best suited for this experiment, probably because of the nature of the performance measures defined for this specific experimental setup. However, these results have to be analysed with an eye on the bigger picture, since the proposed methodology is clearly intended to be applicable to any simulation environment. Thus, achieving a P-Acc value of 84.09% is clearly above the mean accuracy of the rest of the methods, and this is achieved without the need to select any parameter a priori. In fact, since the proposed methodology is scalable, it may be the case that adding more clustering methods or more dissimilarity metrics to the first step of the methodology would lead to an increase in accuracy. In conclusion, summarizing the pros and cons of applying the proposed methodology, we conclude that it is quite accurate and interesting for new and open environments where the nature of the time series is unknown, and where, therefore, one does not know a priori which clustering configuration is optimal for the problem.

Fig. 6. Plots of the most representative simulation profiles (II). Red lines mark times when an incident was triggered, and the green line indicates the moment when the mission preparation phase finishes and the execution phase starts. Each subplot contains the evolution of the six performance measures (S, A, At, C, Ag, P) comprising a simulation profile. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

6. Discussion - analysis of the most representative simulation profiles

Generally, the knowledge of a domain expert is required when developing a cluster analysis, especially when the clusters represent time series data. In our case, thanks to the experience with the simulation environment DWR gained in previous works (Rodriguez-Fernandez et al., 2015a), we are able to carry out an analysis of the most Representative Simulation Profiles (RSPs) obtained by applying the methodology proposed in this work. In other works, where static profiles were used, this analysis was automated by using a set of Fuzzy Control Systems (Rodríguez-Fernåndez, Menéndez, & Camacho, 2016), but due to the complexity introduced in this paper by the temporal nature of the performance measures, the analysis is carried out by examining each of the profiles in detail.

Figs. 5 and 6 show a grid with the evolution of the performance measures defined in this work for each RSP found in this experiment. In order to facilitate the analysis, red lines mark the instants when incidents are triggered, and a green line marks the moment when the operator started to accelerate the simulation speed for the first time, denoting the end of the mission preparation phase and the beginning of the mission execution phase. During the mission preparation phase, the simulation is paused, and the operator can spend some time overviewing the scenario and making changes to the initial mission plan.

Based on the plots of Figs. 5 and 6, the RSPs are described as follows:

1. Passive Monitoring (Fig. 5a): This simulation profile features a constant level of attention once the mission preparation phase has ended. This means that there have been scarcely any interactions during the mission execution phase. Thus, all the performance measures which depend directly on the interactions remain constant. Incident times are close to each other, which means that the simulation speed set to start the mission execution phase is high. Despite all this, the Score does not decrease until the end of the mission, which suggests that operators within this simulation profile trust the pre-loaded mission plan to detect all targets.

2. Aggressiveness to overcome incidents (Fig. 5b): This simulation profile features a type of aggressive operation. After a soft mission preparation, dedicated only to overviewing the map (no paths are changed, because aggressiveness stays at 0), the simulation begins at high speed and, to overcome the incidents, the paths of the UAVs are completely redesigned (maximum aggressiveness). For the rest of the simulation, the path of one single UAV is maintained, ensuring that it detects all targets. The mission finishes with the maximum score, which indicates that all targets have been detected and none of the UAVs were destroyed.

3. Well-balanced behaviour, cautious and relaxed after incidents (Fig. 5c): Unlike the previous simulation profile, this one features a more relaxed behaviour in terms of the way the operator acts to solve the incidents. The simulation speed is set lower, due to the wide time intervals between subsequent incidents, and whenever something alters the simulation, only a few soft interactions, possibly waypoint editions, are performed, maintaining the cooperation among all the UAVs in the mission. This behaviour also achieves the maximum possible score, but this time the process is carried out without taking drastic decisions and taking into account all the available UAVs and resources.

4. No waypoint interactions (Fig. 5d): This simulation profile is notable for having no interactions with the paths of the UAVs. Operators within this profile just monitor the mission execution during short periods of time, and abort the mission when some of the UAVs are lost. This is suggested by the decrease of the Score metric just before the end of the simulation time.

5. Fast operations (Fig. 6a): This simulation profile represents fast operations where the mission preparation phase is practically nonexistent. At the beginning of the simulation execution, when all incidents are triggered, the operator tries to manage them while altering the path of each UAV as little as possible.

6. Increasing Agility, constant single-UAV focus (Fig. 6b): In this simulation profile, we see how the agility metric constantly increases over time, which indicates that the operator is gradually taking control of the mission, allowing him/her to increase the simulation speed. It can also be seen that, from the very beginning of the mission preparation phase, the cooperation metric drops drastically, suggesting that the focus of control is always located on one single UAV.

7. Cautious, passive before incidents, single-UAV focus (Fig. 6c): This profile is very similar to the one in Fig. 5c, except that in this case the precision measure maintains low values throughout the simulation, which is a sign of passivity in the face of alerts.

7. Conclusions and Future Work

This work presents an analytical methodology based on time series clustering to extract representative simulation profiles from UAV operators during their training processes. Having defined the profile of a simulation as a multivariate time series composed of the evolution of several performance measures, the proposed methodology begins by grouping the data for each measure separately, validating different clustering configurations. The clustering results for each measure are used to define the similarity between two simulation profiles, which is used in a final medoid-based clustering process to extract the most representative profiles.

This methodology has been applied in a lightweight multi-UAV environment where a total of 6 performance measures comprise the profile of a simulation. To evaluate the results, a human judgement-based ground truth dataset has been created by asking users to rate the similarity between pairs of time series. The results obtained from the experimentation show that the proposed methodology achieves good accuracy scores, especially from a general perspective, due to the scalability offered by the use of different time series metrics and clustering methods. Furthermore, the different representative profiles obtained in the experimentation have been qualitatively analysed, according to the decisions that operators take during a training session. This shows how this methodology can be applied to describe real cases, where the performance needs to be evaluated with a high level of granularity.

Future work will focus on: 1. Applying the proposed methodology in different simulation environments using different performance measures, to objectively verify the scalability of the solution. 2. Using this information to predict significant reductions in the performance of an operator. 3. Developing different performance measures, ensuring that all of them offer valuable information. 4. Assessing the evolution of an operator not only during a single simulation, but throughout a whole training process. 5. Extending this methodology to large data using robust methods for missing values problems, such as P-splines (Iorio et al., 2016).

Acknowledgements

This work has been supported by the following research projects: EphemeCH (TIN2014-56494-C4-4-P), funded by the Spanish Ministry of Economy and Competitiveness, and CIBERDINE S2013/ICE-3095, both under the European Regional Development Fund (FEDER); SeMaMatch EP/K032623/1; and Airbus Defence & Space (FUAM-076914 and FUAM-076915). The authors would like to acknowledge the support obtained from Airbus Defence & Space, especially from the Savier Open Innovation project members: José Insenser, Gemma Blasco, Juan Antonio Henríquez and César Castro.

References

Al-Subaihin, A. A., Sarro, F., Black, S., Capra, L., Harman, M., Jia, Y., et al. (2016). Clustering mobile apps based on mined textual features. In Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (pp. 38:1–38:10). ACM.

Baker, F. B., & Hubert, L. J. (1975). Measuring the power of hierarchical cluster analysis. Journal of the American Statistical Association, 70(349), 31–38.

Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports, 19(1), 3–11.

Begis, G. (2000). Adaptive gaming behavior based on player profiling. US Patent 6,106,395.

Boussemart, Y., & Cummings, M. L. (2011). Predictive models of human supervisory control behavioral patterns using hidden semi-Markov models. Engineering Applications of Artificial Intelligence, 24(7), 1252–1262.

Brock, G., Pihur, V., Datta, S., & Datta, S. (2008). clValid: An R package for cluster validation. Journal of Statistical Software, 25(1), 1–22.

Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods, 3(1), 1–27.

Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.

Cooke, N. J., Pedersen, H. K., Connor, O., Gorman, J. C., & Andrews, D. (2006). Acquiring team-level command and control skill for UAV operation. Human Factors of Remotely Operated Vehicles, 7, 285–297.

Datta, S., & Datta, S. (2003). Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics, 19(4), 459–466.

De Maesschalck, R., Jouan-Rimbaud, D., & Massart, D. L. (2000). The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1), 1–18.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38. doi: 10.2307/2984875.

Drury, J. L., Scholtz, J., & Yanco, H. A. (2003). Awareness in human-robot interactions. In Systems, Man and Cybernetics, 2003. IEEE International Conference on: Vol. 1 (pp. 912–918). IEEE.

Eiter, T., & Mannila, H. (1994). Computing discrete Fréchet distance. Technical Report. Citeseer.

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of Intelligent Information Systems, 17(2), 107–145.

Hennig, C. (2010). fpc: Flexible procedures for clustering. R Package Version, 2, 0–3.

Hennig, C., & Liao, T. F. (2013). How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification. Journal of the Royal Statistical Society: Series C (Applied Statistics), 62(3), 309–369.

Hruschka, E., Campello, R., Freitas, A., & de Carvalho, A. (2009). A survey of evolutionary algorithms for clustering. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 39(2), 133–155. doi: 10.1109/TSMCC.2008.2007252.

Iorio, C., Frasso, G., D'Ambrosio, A., & Siciliano, R. (2016). Parsimonious time series clustering using P-splines. Expert Systems with Applications, 52, 26–38.

Kaufman, L., & Rousseeuw, P. (1987). Clustering by means of medoids.

Kaufman, L., & Rousseeuw, P. J. (2009). Finding groups in data: An introduction to cluster analysis: Vol. 344. John Wiley & Sons.

Kendall, M. G., & Smith, B. B. (1939). The problem of m rankings. The Annals of Mathematical Statistics, 10(3), 275–287.

Larose, D. T. (2005). Discovering knowledge in data. John Wiley & Sons.

Lavrač, N. (1999). Selected techniques for data mining in medicine. Artificial Intelligence in Medicine, 16(1), 3–23.


Lemaire, T., Alami, R., & Lacroix, S. (2004). A distributed tasks allocation scheme in multi-UAV context. In Robotics and Automation, 2004. Proceedings. ICRA'04. 2004 IEEE International Conference on: Vol. 4 (pp. 3622–3627). IEEE.

Liao, L., Patterson, D. J., Fox, D., & Kautz, H. (2007). Learning and inferring transportation routines. Artificial Intelligence, 171(5), 311–331.

Liao, T. W. (2005). Clustering of time series data – a survey. Pattern Recognition, 38(11), 1857–1874.

Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416. doi: 10.1007/s11222-007-9033-z.

MacQueen, J. B. (1967). Some methods of classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281–297).

McCarley, J. S., & Wickens, C. D. (2004). Human factors concerns in UAV flight. Urbana-Champaign, IL: Institute of Aviation, University of Illinois. Retrieved October 30, 2016 from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.551.6883&rep=rep1&type=pdf.

McKinley, R. A., McIntire, L. K., & Funke, M. A. (2011). Operator selection for unmanned aerial systems: Comparing video game players and pilots. Aviation, Space, and Environmental Medicine, 82(6), 635–642.

Menéndez, H., Bello-Orgaz, G., & Camacho, D. (2013). Extracting behavioural models from 2010 FIFA World Cup. Journal of Systems Science and Complexity, 26(1), 43–61.

Menéndez, H. D., Vindel, R., & Camacho, D. (2014). Combining time series and clustering to extract gamer profile evolution. In Computational Collective Intelligence. Technologies and Applications (pp. 262–271). Springer.

Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159–179.

Montero, P., & Vilar, J. A. (2014). TSclust: An R package for time series clustering. Journal of Statistical Software, 62(1), 1–43.

Navarro, J. F., Frenk, C. S., & White, S. D. (1997). A universal density profile from hierarchical clustering. The Astrophysical Journal, 490(2), 493.

Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10) (pp. 1320–1326). European Language Resources Association (ELRA).

Pang, B., & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115–124). Association for Computational Linguistics.

Penrose, L. S. (1952). Distance, size and shape. Annals of Eugenics, 17(1), 337–343.

Pereira, E., Bencatel, R., Correia, J., Félix, L., Gonçalves, G., Morgado, J., et al. (2009). Unmanned air vehicles for coastal and environmental research. Journal of Coastal Research, 56(56), 1557–1561.

Piciarelli, C., Micheloni, C., & Foresti, G. L. (2008). Trajectory-based anomalous event detection. IEEE Transactions on Circuits and Systems for Video Technology, 18(11), 1544–1554.

Portnoy, L., Eskin, E., & Stolfo, S. (2001). Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001). Citeseer.

Remøy, H. T. (2010). Out of office: A study on the cause of office vacancy and transformation as a means to cope and prevent. IOS Press.

Rodríguez-Fernández, V., Gonzalez-Pardo, A., & Camacho, D. (2015). Modeling the behavior of unskilled users in a multi-UAV simulation environment. In Intelligent Data Engineering and Automated Learning – IDEAL 2015 (pp. 441–448). Springer.

Rodriguez-Fernandez, V., Menendez, H. D., & Camacho, D. (2015a). Design and development of a lightweight multi-UAV simulator. In Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on (pp. 255–260). IEEE.

Rodríguez-Fernández, V., Menéndez, H. D., & Camacho, D. (2015b). User profile analysis for UAV operators in a simulation environment. In Computational Collective Intelligence (pp. 338–347). Springer.

Rodríguez-Fernández, V., Menéndez, H. D., & Camacho, D. (2016). Automatic profile generation for UAV operators using a simulation-based training environment. Progress in Artificial Intelligence, 5(1), 37–46.

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.

Schaeffer, S. E. (2007). Graph clustering. Computer Science Review, 1(1), 27–64.

Yang, J., & Leskovec, J. (2011). Patterns of temporal variation in online media. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (pp. 177–186). ACM.


PUBLICATION 2

Modelling Behaviour in UAV Operations Using Higher Order Double Chain Markov Models

• Rodríguez-Fernández, Víctor, Antonio Gonzalez-Pardo, and David Camacho. 2018. "Modelling Behaviour in UAV Operations Using Higher Order Double Chain Markov Models." IEEE Computational Intelligence Magazine 12 (4): 28–37. DOI: 10.1109/MCI.2017.2742738

– State: In Press. Published Online.

– Impact Factor (JCR 2017): 6.611

– Category:

* Computer Science, Artificial Intelligence. Rank: 9/132 [Q1/D1].

– Contribution of the PhD candidate:

* First author of the article.

* Co-authoring in the conception of the presented idea.

* Design and implementation of the proposed method.

* Author of the simulation environment and dataset.

* Design and execution of the experiments.

* Co-authoring in the interpretation and discussion of results.

* Writing of most of the manuscript and design of the figures.



Modelling Behaviour in UAV Operations Using Higher Order Double Chain Markov Models

Digital Object Identifier: 10.1109/MCI.2017.2742738. Date of publication: 12 October 2017.

Víctor Rodríguez-Fernández, Antonio Gonzalez-Pardo and David Camacho
Department of Computer Science, Universidad Autónoma de Madrid (UAM), Madrid, Spain

Corresponding authors: Victor Rodríguez-Fernández, David Camacho (Email: victor [email protected], [email protected])

Abstract—Creating behavioural models of human operators engaged in supervisory control tasks with UAVs is of great value due to the high cost of operator failures. Recent works in the field advocate the use of Hidden Markov Models (HMMs) and derivatives to model the operator behaviour, since they offer interpretable patterns for a domain expert and, at the same time, provide valuable predictions which can be used to detect abnormal behaviour in time. However, the first order Markov assumption on which HMMs rely, and the assumed independence between the operator actions along time, limit their modelling capabilities. In this work, we extend the study of behavioural modelling in UAV operations by using Double Chain Markov Models (DCMMs), which provide a flexible modelling framework in which two higher order Markov Chains (one hidden and one visible) are combined. This work is focused on the development of a process flow to rank and select DCMMs based on a set of evaluation measures that quantify the predictability and interpretability of the models. To evaluate and demonstrate the possibilities of this modelling strategy over the classical HMMs, the proposed process has been applied in a multi-UAV simulation environment.

I. Introduction

In the last decade, the study of Unmanned Aerial Vehicles (UAVs) has grown considerably and it is expected to grow even more by 2020 [1]. This growth is due to the interest of both the industry and the research community. On the one hand, the different potential applications of UAVs, such as infrastructure inspection, monitoring coastal zones, traffic and disaster management, agriculture or forestry, have attracted the interest of the industry [2]. On the other hand, the research community is also interested in UAVs due to the challenging problems that must be faced from different fields such as Machine Learning, Automated Planning, Multiagent Systems, Simulation, or Computer Vision, among others.

The role of a UAV operator is a critical aspect in this type of systems due to the high costs involved in any real mission. Thus, UAV operators are intensively trained using simulation environments, where they are asked to face different situations and alerts to get used to them, and thus, to be able to solve the situation successfully in a real scenario [3]. However, the increasing use of UAVs has not been met with appropriate integration of training science [4]. Although several researchers are contributing to improve the effectiveness of UAV training methods [5], the expectations of UAV growth have raised alarm among experts, due to the current lack of tools capable of evaluating and analysing the performance of operators on a large, or even massive, scale.

In this regard, some recent studies have examined the possibility of supporting UAV training instructors with models that exploit behavioural patterns among operators and provide alerts when likely anomalous behaviours are detected. Thus, the works of Boussemart et al. [6] set the usage of Hidden Markov Models (HMMs) and Hidden Semi-Markov Models (HSMMs) as the state of the art in modelling and predicting knowledge-based tasks such as supervisory control of UAVs. The application of these models can be performed due to their hidden-visible structure, which makes it possible to infer underlying cognitive processes from the patterns of visible events extracted from the operator interactions in training contexts. Nevertheless, there are two main drawbacks in the modelling process proposed in these works. On the one hand, these models rely on the first order Markov assumption, which implies memoryless transitions from one hidden state to another, limiting their predictive capabilities and the possibilities to discover more complex and long patterns when interpreting the models. On the other hand, HMMs make the assumption of conditional independence between the visible observations, which is unlikely to hold for UAV operators. As an example, an operator may maintain the same cognitive (or hidden) state "Replanning mission" for a time, and meanwhile follow a typical interaction (or visible) pattern "Select UAV–Add Waypoint" that must be captured outside the hidden chain.

This work extends the study of behavioural modelling in UAV operations by using a flexible fully Markovian model called the Double Chain Markov Model (DCMM). The main characteristic of this model (proposed by Berchtold in [7]) is that it combines two Markov chains: an observed non-homogeneous Markov chain and a hidden homogeneous one, whose state at each time decides the transition matrix used in the visible process. Furthermore, this work studies higher order Markovian dependencies among both chains in the model, and it presents a process flow to find the best-fit DCMMs for a particular training dataset. For the selection of the best DCMM, the process flow presented in this work uses Rank Aggregation techniques that look for a trade-off between the predictability and interpretability of the model. These two criteria are extremely relevant when working with Markovian models. On the one hand, predictability represents the ability to forecast the following output of the visible chain given the previous ones. On the other hand, interpretability represents a feature that allows any reader to understand and/or interpret a given model.

To test the application of DCMMs in this context, a multi-UAV simulation environment has been used. This simulation environment has already been used in some of our previous works to build Markovian-based behavioural models [8]–[10].

The main contributions of this paper can be summarized as follows:

❏ The application of DCMMs to model behaviour not only in UAV operations, but in the context of Human Supervisory Control systems.

❏ The development of a process flow for finding the best-fit DCMM. This process is based on the requirements of UAV instructors, who look for models that are easily interpretable and also provide fair predictions.

❏ The definition of the predictability and interpretability of a DCMM based on a set of evaluation measures, some of which are a novelty in this work.

❏ The application of Rank Aggregation techniques for aggregating and sorting the performance of DCMMs.

The rest of the paper is structured as follows: the next section introduces the theoretical background on DCMMs. In Section III we detail the process flow proposed for fitting this type of models for UAV operations, from the data preprocessing to the model selection. Then, in Section IV we introduce the different evaluation measures that quantify the predictability and the interpretability of a DCMM, which are later used in the model selection. Section V applies the proposed


methodology to a multi-UAV simulation environment and, finally, Section VI concludes the paper with some discussions and future lines of research.

II. Background

This section provides the background related to Higher Order Double Chain Markov Models and Rank Aggregation techniques. Both constitute the basic concepts of this work.

A. Double Chain Markov Models (DCMMs)

HMMs are one of the most popular Markovian models. They are stochastic models used to model and predict time series. In any HMM there is a sequence of hidden discrete states that follows a Markov process, i.e., the current state $X_t$ only depends on the previous state $X_{t-1}$ and not on earlier states.

The term hidden refers to the fact that the variable, or state, $X_t$ is not observed directly. Any HMM contains an observable random variable $Y_t$ taking values in the set $\{1, \ldots, K\}$. At any time $t$, it is possible to observe the value of $Y_t$, but the value of $X_t$ is hidden. The successive observations of $Y_t$ are independent from each other and there is a different probability distribution of the observed variable $Y_t$ associated to each possible state $X_t$.

DCMMs extend the classical HMMs, making the assumption that the observed variable $Y_t$ also follows a Markov process. Hence the name of the model, because there are two different levels of Markov chains: the first one affecting the hidden random variable $X_t$, and the second one affecting the observable random variable $Y_t$. Each hidden state in a DCMM has its own visible Markov chain.

This paper is focused on the study of higher order extensions of a DCMM. Higher order DCMMs are a generalization of DCMMs where there are two parameters to determine the number of hidden variables ($l \geq 1$) and the number of observed variables ($f \geq 1$) which affect the current hidden variable $X_t$ and the current observed variable $Y_t$. Both parameters $l$ and $f$ define the order of the hidden and visible chains, respectively. For this reason, any DCMM in this work can be described as a triple $\mu = (M, l, f)$, made up by three hyperparameters, where $M$ represents the number of hidden states, $l$ represents the order of the hidden chain, and $f$ represents the order of the visible chain. It is important to note that a classical first-order HMM is a specific case of DCMMs where $l = 1$ and $f = 0$. For the sake of simplicity, in this paper we will refer to DCMMs assuming that they can be of higher orders. Figure 1(a) shows an HMM, a DCMM can be observed in Figure 1(b), whereas Figure 1(c) provides a representation of a Higher Order DCMM with $l = 3$ and $f = 2$. As can be observed in the latter, the current state $X_t$ depends on the three previous states ($X_{t-1}$, $X_{t-2}$ and $X_{t-3}$) and the current observed variable depends only on the last two variables ($Y_{t-1}$ and $Y_{t-2}$).
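To make this structure concrete, the following minimal R sketch (R being the environment used in the experiments of Section V) simulates a DCMM with M = 2, l = 1 and f = 1; all matrices and values are illustrative, not taken from the paper.

```r
# Minimal, illustrative simulation of a DCMM with M = 2 hidden states,
# K = 3 observation symbols, l = 1 and f = 1. The hidden state at each
# time decides which visible transition matrix drives the observations.
set.seed(1)
M <- 2; K <- 3; T_len <- 10
A <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = M, byrow = TRUE)   # hidden chain transitions
B <- list(matrix(c(0.7, 0.2, 0.1,                  # visible chain of state 1
                   0.1, 0.8, 0.1,
                   0.3, 0.3, 0.4), nrow = K, byrow = TRUE),
          matrix(c(0.1, 0.1, 0.8,                  # visible chain of state 2
                   0.4, 0.4, 0.2,
                   0.2, 0.1, 0.7), nrow = K, byrow = TRUE))
x <- sample(1:M, 1); y <- sample(1:K, 1)           # initial state and symbol
for (t in 2:T_len) {
  x[t] <- sample(1:M, 1, prob = A[x[t - 1], ])             # hidden step
  y[t] <- sample(1:K, 1, prob = B[[x[t]]][y[t - 1], ])     # visible step
}
y  # a simulated sequence of operator interactions
```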

One of the problems related to DCMMs is the estimation of all the parameters included in the initial state probability matrix, the transition probability matrix for the hidden chain, and the transition probability matrix for the visible chain of each state. The estimation of these matrices is a complex task that is usually performed by iterative algorithms such as the popular Baum-Welch algorithm [11]. This approach is fast and ensures a solution, but reaching the global optimum is hard. In this work, the estimation of a DCMM is carried out by a Memetic Algorithm (MA) [12], which hybridizes a Genetic Algorithm (GA) with a local search procedure (the Baum-Welch algorithm) in order to expand the search space. More details about the estimation algorithm for DCMMs used in this work can be found in [13].

B. Rank Aggregation

Rank Aggregation can be defined as the task of combining several individual sorted lists (also called base rankers) in order to generate a better one which aggregates information from each of the input lists [14]. In recent years, Rank Aggregation techniques have become more sophisticated and have been used in different areas such as web meta-searching [15] or bioinformatics [16], among others.

Following the notation of [17], Rank Aggregation can be expressed as an optimization problem, where one would like to find an optimal ranking which would be as "close" as possible to all the base rankers simultaneously. In terms of an objective function, this can be formulated as follows:

$$\Phi(\delta) = \sum_{i=1}^{m} \omega_i \, d(\delta, L_i), \qquad (1)$$

where $L_i$ is the $i$th base ranker, $\delta$ is a proposed aggregated ranking of length $|L_i|$, $\omega_i$ is the importance associated to the base ranker $L_i$, and $d$ is a distance function between $\delta$ and $L_i$.


Figure 1: Figure (a) shows a basic representation of a first-order HMM (with l = 1 and f = 0). Figure (b) represents a Double Chain Markov Model with order 1 in both chains, i.e., l = 1 and f = 1. Figure (c) represents a Higher Order Double Chain Markov Model, with l = 3 and f = 2.


Using the previous formulation, the goal of any Rank Aggregation method can be defined as finding a ranking $\delta^*$ that minimizes the function $\Phi$, i.e.:

$$\delta^* = \arg\min_{\delta} \Phi(\delta). \qquad (2)$$

Two popular distance functions are typically used to measure the distance between rankings [18]: the Kendall distance (K), which counts the number of pairwise disagreements between two lists [19], and the Spearman footrule distance, which can be computed in linear time for two lists [20]. According to [19], [21], the Kendall distance can be approximated via the Spearman footrule distance, so in this work we will focus on the latter, due to its higher simplicity and computational efficiency.

There are situations in which not only the ranking position of an element is given, but also a weight associated to it (Weighted Rank Aggregation). The qualitative difference in terms of ranks has an underlying quantitative difference too. Let $W_i(1), \ldots, W_i(k)$ be the scores associated with the ranking $L_i$, where $W_i(1)$ is the best score (maximum or minimum depending on the context), $W_i(2)$ is the second best, and so on. To incorporate the score information into the distance function, the Weighted Spearman footrule distance is defined in [17] as:

$$WS(\delta, L_i) = \sum_{t \in L_i \cup \delta} |W_i(r_{L_i}(t)) - W_i(r_{\delta}(t))| \times |r_{L_i}(t) - r_{\delta}(t)|, \qquad (3)$$

where $r_{L_i}(t)$ is the position of element $t$ in the rank $L_i$. This metric can be understood in terms of a sum of penalties for moving an arbitrary element $t$ from the position $r_{\delta}(t)$ to the position $r_{L_i}(t)$, adjusted by the difference in scores between the two positions.
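As an illustration, Eq. (3) can be transcribed directly into R for two complete rankings over the same elements (a minimal sketch; partial lists would need extra handling of unmatched elements):

```r
# Weighted Spearman footrule distance (Eq. (3)) for complete rankings.
# W holds the scores attached to the *positions* of L (W[1] = best score).
ws_footrule <- function(delta, L, W) {
  elems <- union(delta, L)
  r_d <- match(elems, delta)   # r_delta(t): position of each element in delta
  r_L <- match(elems, L)       # r_Li(t): position of each element in L
  sum(abs(W[r_L] - W[r_d]) * abs(r_L - r_d))
}
# Toy usage: swapping the two best-ranked models costs 2 * |W[1] - W[2]|
ws_footrule(delta = c("B", "A", "C"), L = c("A", "B", "C"),
            W = c(0.9, 0.5, 0.4))
```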

The Rank Aggregation obtained by optimizing the Spearman footrule distance is also called footrule optimal aggregation, and it is identified as an NP-hard problem for partial lists and rankings [21]. Thus, different optimization algorithms have been used to tackle this problem, such as the Cross-Entropy Monte Carlo algorithm [22] and GAs [23].

In this work, a GA is used to solve the optimization problem related to Rank Aggregation. More information about how to apply GAs to this problem can be found in [17], [23].

III. Finding the Best DCMM for Human Supervisory Control Analysis in UAV Training Operations

This section provides a detailed description of the whole process flow designed to find the most suitable DCMM for modelling the behaviour of a set of UAV operators engaged in a training mission. This training mission is performed under the supervision of a domain expert, known as the instructor of the operation. The different parts of the process, which are shown in Figure 2, are described below.

A. Extracting Interaction Sequences

The first issue to overcome when modelling through the use of DCMMs is related to adapting the input data, i.e., mapping it into sequences of observation symbols that a DCMM can process. In our case, the data consists of a set of logs resulting from a training operation. For every operator participating in the training program, the resulting log is processed, and a sequence of operator interactions $Y = Y_1 Y_2 \ldots Y_T$ is extracted. For example, if a specific operator changes two given waypoints (waypoint 1 and waypoint 2) and specifies that the UAV must go to waypoint 1, the value of $Y$ could be: Y = {MoveWayPoint MoveWayPoint GoToWayPoint}. Every interaction $Y_t$ is characterized as a numerical identifier taking values in the finite set $\{1, \ldots, K\}$, where $K$ is the number of all possible interactions that the instructor wants to analyse. The value of $K$ is usually low in common UAV control systems, which makes the creation of behavioural models easier [6].

As in every problem involving predictive modelling, the interaction sequences are divided into a training/test set, which passes to the next steps (see Figure 2, A).
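A minimal R sketch of this preprocessing step might look as follows; the log layout (a data frame with columns `sim_id` and `interaction`) is an assumption made for illustration:

```r
# Map raw interaction names to numeric symbols in {1, ..., K} and build
# one sequence per simulation; then split the sequences into a 75%/25%
# training/test partition. `op_log` is an assumed data.frame with
# columns sim_id and interaction.
extract_sequences <- function(op_log) {
  symbols <- sort(unique(op_log$interaction))   # the K possible interactions
  ids <- match(op_log$interaction, symbols)     # names -> identifiers 1..K
  split(ids, op_log$sim_id)                     # one sequence per simulation
}
split_train_test <- function(seqs, train_frac = 0.75, seed = 42) {
  set.seed(seed)
  idx <- sample(seq_along(seqs), floor(train_frac * length(seqs)))
  list(train = seqs[idx], test = seqs[-idx])
}
```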

B. Learning DCMMs

In the parameter estimation algorithm used to learn a single DCMM (see Section II), there are three hyperparameters that must be fixed:
1) $M$, the number of hidden states of the model.
2) $l$, the order of the hidden chain.
3) $f$, the order of the visible chain (0 for a classical HMM).

The choice of these hyperparameters is crucial since they define the topology and the complexity of the model. Although in many applications they are known in advance [24], in this work we do not rely on any prior knowledge to create the models. For this reason, we train DCMMs along the entire $M \times l \times f$ search space. Once the training process is finished, those models that best fit our data, in terms of a set of evaluation measures that will be described in the next section, are selected. This method of hyperparameter optimization is commonly known as grid search [25], and it is only feasible when the domain range of each hyperparameter is not too broad. This is the case in this work, since both the number of hidden states and the order of the model chains cannot be high in order to allow the model to be analysable by a UAV instructor.
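The grid search itself amounts to enumerating the whole hyperparameter space; a sketch is given below, where `fit_dcmm` is a hypothetical wrapper around the memetic estimation algorithm of Section II (e.g., built on top of the march package) and `train_seqs` holds the training sequences:

```r
# Enumerate the full M x l x f grid (the ranges below follow the
# experimentation of Section V, giving 6 * 3 * 3 = 54 combinations)
# and fit one DCMM per combination. fit_dcmm() is a hypothetical
# estimation wrapper, not a real API.
grid <- expand.grid(M = 3:8, l = 1:3, f = 0:2)
models <- lapply(seq_len(nrow(grid)), function(i)
  fit_dcmm(train_seqs, M = grid$M[i], l = grid$l[i], f = grid$f[i]))
```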

C. Evaluating DCMMs

Given the richness of Markovian modelling and the numerous options provided in such a general model as the DCMM, it is essential to be able to compare the performance of different models. Generally, the use of the Bayesian Information Criterion (BIC) for comparison and selection is a standard procedure [26], since it provides a balance between the predictability and the interpretability of the model. In this work, we go beyond that idea of a predictability-interpretability balance by setting the two concepts as families, which gather several evaluation measures that are designed to cover the corresponding concept in some way.


In the context of this work, the need for the predictability-interpretability trade-off comes from the requirements of UAV instructors for future operations. Not only is it important that the instructor is able to understand the behavioural patterns among the operators after a training session, but it is also critical that the model could be deployed during a real mission to generate online predictions of the future behaviour of the operator, so that the instructor could detect possible operator failures in time [27]. This fits perfectly with the dual nature of DCMMs, which, on the one hand, can be seen as a forecasting black box, and on the other hand, can be seen as a probabilistic graphical model that can be analysed.

All the predictability and interpretability measures used or developed for this work are explained in the next section. As can be seen in Figure 2, C, we apply them over every DCMM $\mu(M, l, f)$ created in the previous step. By sorting the results on each evaluation measure, we obtain one ranking of models for each measure, which will be used in the next processing step.

D. Weighted Rank Aggregation

Let $L_i^P$ be the ranking of learnt DCMMs sorted by the $i$th predictability measure, and $W_i^P$ the weights of that ranking, i.e., the value of the measure for each of the models in $L_i^P$. Similarly, we have $L_j^I$ and $W_j^I$ for the weighted ranking sorted by the $j$th interpretability measure (see Figure 2, B-C).

The problem is to find one ranking of size $Q$, namely $L^*$, which represents the optimal aggregation of the base rankings introduced above and thus gives the best $Q$ DCMMs in terms of predictability and interpretability (see Figure 2, D). To do this, we apply Weighted Rank Aggregation techniques over the rankings $\{L_i^P\}_i$ and $\{L_j^I\}_j$, using the corresponding weights $\{W_i^P\}_i$ and $\{W_j^I\}_j$ in the distance function between rankings (see the Weighted Spearman Footrule Distance in Eq. (3)). The problem of Rank Aggregation is an NP-hard optimization problem, and here we will apply a GA in order to find an optimal solution.

One critical issue to address in this process is the importance given to each of the base rankings $L_i^P$ and $L_j^I$, i.e., how much or how little an evaluation measure will influence the aggregated result. This corresponds to the parameter $\omega$ in the objective function (see Eq. (1)), and in this work, we determine this value based on two factors. On the one hand, the dispersion of the weight vector $W_i$ associated to each ranking, which is computed in terms of the Coefficient of Variation [28].


Figure 2: General scheme of the analysis process developed in this work to find the best DCMM for modelling behaviour in a training operation with UAVs.


This measure is scale invariant and dimensionless, and it is defined as:

$$C_V(x) = \sigma(x) / \bar{x}, \qquad (4)$$

where $\sigma(x)$ is the standard deviation of $x$ and $\bar{x}$ is its mean. Thus, if the weights of a specific evaluation measure are very similar in every learnt DCMM, the corresponding ranking will have less effect in the aggregation process. On the other hand, the needs of the instructor in the analysis may cause the predictability to be given more importance than the interpretability, and vice versa. Thus, the importance of each ranking is expressed as:

$$\omega(L_i^P) = \alpha \, C_V(W_i^P), \qquad \omega(L_j^I) = (1 - \alpha) \, C_V(W_j^I), \qquad (5)$$

where $\alpha \in [0, 1]$ represents the predictability importance (and thus, $(1 - \alpha)$ the interpretability importance). At the end of this step, an aggregated ranking of size $Q$, $L^*$, is obtained, ready to be analysed by the UAV instructor in order to extract conclusions about the operator behavioural patterns, and to select a final model to be deployed for predicting behaviour in a real mission.
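A direct R transcription of Eqs. (4) and (5) might look as follows (a minimal sketch; `W_pred` and `W_interp` are assumed to be lists holding the weight vector of each predictability and interpretability ranking):

```r
# Importance of each base ranking (Eq. (5)) from the coefficient of
# variation (Eq. (4)) of its weight vector and the predictability
# importance alpha in [0, 1].
cv <- function(x) sd(x) / mean(x)
ranking_importance <- function(W_pred, W_interp, alpha) {
  c(sapply(W_pred,   function(w) alpha       * cv(w)),   # omega(L_i^P)
    sapply(W_interp, function(w) (1 - alpha) * cv(w)))   # omega(L_j^I)
}
# Toy usage: one predictability and one interpretability measure
ranking_importance(list(AGP = c(0.84, 0.83, 0.80)),
                   list(CHPHT = c(0.45, 0.44, 0.43)), alpha = 0.5)
```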

IV. Evaluation Measures for a DCMM

As introduced in the previous section, we make use of several evaluation measures to quantify the predictability and the interpretability of a DCMM separately. These measures enrich the evaluation process and provide opportunities to higher order models, which are excessively penalized by the BIC measure. Table 1 shows a summary of the evaluation measures designed for this work.

A. Predictability Measures

This family of evaluation measures focuses on the capabilities of a DCMM to predict and recognize sequences of observation symbols.

1) Sequence Likelihood (SL)

Let $Y = Y_1 Y_2 \ldots Y_T$ be a test sequence of observations (not used in the model training); the sequence likelihood is defined as the probability that a given model $\mu$ can generate that sequence, i.e., $SL = P(Y \mid \mu)$. This probability can be computed by the so-called forward-backward dynamic programming algorithm [11]. Normally, since this algorithm computes products of low probabilities close to zero, a logarithmic scale is used.
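For the first-order HMM corner of the grid (l = 1, f = 0), the sequence log-likelihood can be computed with the scaled forward recursion; a minimal R sketch, not tied to the march package:

```r
# Scaled forward algorithm: log P(Y | model) for a first-order HMM with
# initial distribution pi0, hidden transition matrix A (M x M) and
# emission matrix E (M x K). Y is a vector of symbols in {1, ..., K}.
hmm_loglik <- function(Y, pi0, A, E) {
  alpha <- pi0 * E[, Y[1]]
  loglik <- log(sum(alpha)); alpha <- alpha / sum(alpha)
  for (t in seq_along(Y)[-1]) {
    alpha <- as.vector(t(A) %*% alpha) * E[, Y[t]]   # one forward step
    loglik <- loglik + log(sum(alpha))               # accumulate scaling
    alpha <- alpha / sum(alpha)
  }
  loglik
}
```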

2) Accuracy of Generated Predictions (AGP)

Based on the Model Accuracy Score applied in [6] to evaluate Hidden Semi-Markov Models, this measure is intended to evaluate the model "in action", generating predictions of what an operator will do next, and evaluating whether that prediction is correct based on the real interaction made by the operator.

First of all, we must define a method for generating a prediction of the next observation symbol expected. Assume we have received in the system part of the observation sequence. Let $Y = Y_1 Y_2 \ldots Y_t$ be the sequence of $t$ symbols received so far. We must give a prediction of $Y_{t+1} \in \{1, \ldots, K\}$. Since the number $K$ of possible observation symbols to receive is not usually high in UAV operations [6], it is reasonable to calculate the sequence likelihood of the future sequence $Y_1 \ldots Y_t Y_{t+1}$ for every possible value of $Y_{t+1} \in \{1, \ldots, K\}$. The resulting vector of (log)-likelihoods $\{\rho_{t+1}^k\}_{k=1}^{K}$ is normalized to $[0, 1]$ and gives the expected probability for each observation symbol. We will call it the expectation vector.

When the real observation symbol arrives, we can score our prediction. Best accuracy scores are assigned when the observed value $Y_{t+1}$ is the most probable in the expectation vector $\{\rho_{t+1}^k\}_{k=1}^{K}$. Otherwise, penalties to the accuracy score will be applied, in proportion to how far we move away from the most expected observation. Based on our previous works [10], we define the accuracy of the generated prediction at time $t+1$ as:

$$\mathrm{AGP}_{t+1}(Y_{t+1}) = e^{-PR \, (\max_k \rho_{t+1}^{k} \; - \; \rho_{t+1}^{Y_{t+1}})}, \qquad Y_{t+1} \in \{1, \ldots, K\}. \qquad (6)$$

The closer this score gets to 1, the more accurate the prediction. Since we use an exponential function, high accuracies are only assigned when the expected probability of the observed value is very close to the maximum possible. The Penalty Rate (PR) is a fixed parameter that marks the steepness of the exponential function, which makes it harder or easier to obtain good accuracy scores.

Finally, in order to calculate the AGP measure for a whole test sequence $Y = Y_1 Y_2 \ldots Y_T$, we average the values of Eq. (6) over every time step $t \in \{1, 2, \ldots, T\}$.
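Assuming a helper `expectation_vector(model, Y[1:t])` that returns the normalized expectation vector over the K symbols (a hypothetical function, built from the sequence likelihoods described above), the AGP of Eq. (6) can be sketched in R as:

```r
# Accuracy of a generated prediction (Eq. (6)) and its average over a
# whole test sequence. PR = 2 follows the parameter tuning used later.
agp_step <- function(rho, y_next, PR = 2) {
  exp(-PR * (max(rho) - rho[y_next]))
}
agp_sequence <- function(Y, model, PR = 2) {
  mean(sapply(seq_len(length(Y) - 1), function(t) {
    rho <- expectation_vector(model, Y[1:t])   # hypothetical helper
    agp_step(rho, Y[t + 1], PR)
  }))
}
```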

Table 1: Summary of the evaluation measures used to assess the quality of a DCMM in this work. These measures are divided into two groups, namely predictability and interpretability, depending on the aspect of the model covered by each of them. The column "Best" indicates whether best models are achieved by maximizing (Max.) or minimizing (Min.) the value of the evaluation measure.

Aspect | Evaluation measure | Range | Best
Predictability | Sequence (log)-likelihood (SL) [29] | (−∞, 0) | Max.
Predictability | Accuracy of Generated Predictions (AGP) | [0, 1] | Max.
Predictability | Minimum Precision of Generated Predictions (MPGP) | [0, 1] | Max.
Both | Bayesian Information Criterion (BIC) [26] | (0, ∞) | Min.
Interpretability | Coeff. of High Probability Hidden Transitions (CHPHT) | (0, 1) | Max.


3) Minimum Precision of Generated Predictions (MPGP)

In the field of predictive analytics, it is common to measure not only the accuracy, but also the precision of a model, in order to validate whether all the possible predictable values are fairly predicted. In the context of this work, using a DCMM with high precision could lead to detecting in time when the operator is going to perform a "rare" interaction.

For this reason, the MPGP measure is created to evaluate which of the possible $K$ outputs of a DCMM is worst predicted in a test observation sequence. Predictions are generated using the same method as the AGP measure, but here, instead of scoring the expectation vector $\{\rho_{t+1}^k\}_{k=1}^{K}$, we just extract the most expected observation from it and check whether it matches the actual observation $Y_{t+1}$. Counting these matchings for every $t \in \{1, 2, \ldots, T\}$ leads to a $K \times K$ confusion matrix $CM$, where the precision for each possible observation symbol is computed as:

$$\mathrm{Precision}_k = \frac{CM_{kk}}{\sum_j CM_{kj}}, \qquad k \in \{1, \ldots, K\}, \qquad (7)$$

assuming that rows in $CM$ denote predictions and columns denote actual values. Finally, the MPGP measure is computed as the minimum value of Eq. (7) over the set $\{1, \ldots, K\}$.
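A compact R sketch of the MPGP computation, given vectors of most-expected and actual symbols:

```r
# MPGP: K x K confusion matrix of predicted vs. actual symbols, then
# the minimum per-symbol precision of Eq. (7). Rows denote predictions.
mpgp <- function(pred, actual, K) {
  CM <- table(factor(pred, levels = 1:K), factor(actual, levels = 1:K))
  precision <- diag(CM) / rowSums(CM)   # NaN for symbols never predicted
  min(precision, na.rm = TRUE)
}
mpgp(pred = c(1, 2, 2, 3, 1), actual = c(1, 2, 3, 3, 2), K = 3)  # 0.5
```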

B. Interpretability Measures

This family of evaluation measures intends to rate, automatically, different aspects around the interpretability of a DCMM, once it has been fit to a specific training dataset.

1) Bayesian Information Criterion (BIC)

The Bayesian Information Criterion (BIC) has become very popular in the field of statistical modelling, especially for comparing Markovian models [26]. It penalizes the likelihood of a model by a complexity factor proportional to the number of parameters in the model and the number of training observations, so it gives advantages to simple and general models and avoids overfitting. It is defined as:

$$\mathrm{BIC} = -2 \log L + p \log \tau, \qquad (8)$$

where $L$ is the likelihood of the model, $p$ is the number of independent parameters, and $\tau$ is the number of components in the likelihood, i.e., the number of observations used to train the model. The lower the BIC score, the better the model is considered. This measure makes a balance between the predictability and the interpretability of a DCMM, and thus, we will include it in both perspectives for the model selection (see Table 1).
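Eq. (8) is a one-liner; for instance, in R:

```r
# BIC of Eq. (8): penalized log-likelihood, where p is the number of
# independent parameters and tau the number of training observations.
bic <- function(loglik, p, tau) -2 * loglik + p * log(tau)
bic(loglik = -5000, p = 120, tau = 4000)   # illustrative values
```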

2) Coefficient of High Probability Hidden Transitions (CHPHT)

This measure favours those models in which every hidden state (or sequence of hidden states for higher order chains) has a clear leading transition. Let $\mu(M, l, f)$ be a DCMM and $A_\mu$ its transition matrix. $A_\mu$ is an $M^l \times M^l$ matrix, in which each row represents the probability of moving to a new sequence of states from a sequence of $l$ previous states. Let $a_i(\mu) = \max_j A_\mu^{ij}$ be the maximum probability of each row of the transition matrix; then the Coefficient of High Probability Hidden Transitions (CHPHT) can be defined as:

$$\mathrm{CHPHT}(\mu) = \frac{1}{M^l} \sum_{i=1}^{M^l} \frac{1}{1 + e^{-l \, (a_i(\mu) - TRT)}}. \qquad (9)$$

As can be seen, we have used a logistic function to score the values of $a_i(\mu)$, in order to favour the highest probabilities more remarkably and penalize the lowest. The Transition Relevancy Threshold (TRT) marks the sigmoid's midpoint and the order of the hidden chain, $l$, denotes the steepness of the curve, so that the same value of $a_i(\mu)$ will be scored better in a higher order model. This is because high probability transitions in higher order DCMMs usually offer more informative and longer patterns.
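Eq. (9) translates directly into R; a minimal sketch, using the TRT = 0.9 value chosen later in the experimentation:

```r
# CHPHT (Eq. (9)): average logistic score of the leading transition
# probability of each of the M^l rows of the hidden transition matrix A.
chpht <- function(A, l, TRT = 0.9) {
  a <- apply(A, 1, max)                 # a_i(mu): max probability per row
  mean(1 / (1 + exp(-l * (a - TRT))))
}
# Toy usage: a 2-state, first-order hidden chain with one clear transition
chpht(matrix(c(0.95, 0.05,
               0.30, 0.70), nrow = 2, byrow = TRUE), l = 1)
```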

V. Experimentation

In this section, the modelling strategy proposed in this work is applied to model the interactions extracted from a multi-UAV simulation environment. The simulation environment used, the experimental setup and the results obtained are described below.

A. Drone Watch & Rescue

The simulation environment used as a test bed for this work is called Drone Watch & Rescue (DWR), and its complete description can be found in [10]. DWR gamifies the concept of a multi-UAV mission, challenging the operator to capture all mission targets consuming the minimum amount of resources, while avoiding at the same time the possible incidents that may occur during a mission (e.g., danger areas, sensor breakdowns). To avoid these incidents, an operator in DWR can perform multiple interactions to modify both the UAVs in the mission and the waypoints composing their mission plan. These interactions are the following: 1) Change Control Mode (CCM); 2) Change UAV Path (CUP); 3) Modify Waypoints Table (MWT); 4) Select UAV (SU); 5) Change UAV Speed (CUS); and 6) Change Simulation Speed (CSS).

B. Experimental Setup

In this experiment, we will apply the process flow shown in Figure 2 to the data extracted from DWR, from the point of view of an instructor who wants to obtain a behavioural model. Due to the lack of space, we will not qualitatively analyse the best DCMMs found; instead, we will focus on studying how changing the importance balance between predictability and interpretability affects the hyperparameters of the best DCMM found. Furthermore, we will check whether the capabilities of DCMMs over the currently used HMMs are worthwhile when it comes to finding better models.

The choice of the parameters involved in the process flow described in the previous section is very important for the


success of the analysis. Table 2 gathers the chosen values for all the parameters needed. Below are some remarks about this parameter tuning:

❏ The simplest DCMM in the grid search has $(M = 3, l = 1, f = 0)$ (a 3-state HMM) and the most complex has $(M = 8, l = 3, f = 2)$. We consider that any model with higher hyperparameters would be intractable. Taking into account the range of each hyperparameter, we will train a total of 54 DCMMs in the grid search process.

❏ A transition probability is considered "relevant" when it is higher than 0.9 ($TRT = 0.9$ in measure CHPHT, Eq. (9)).

❏ With regard to the GA used to perform Rank Aggregation, we have used the default parameters set in [17]. The parameter "Rerun times" means that we will execute the algorithm 20 times and keep the result with minimum fitness value. The parameter "Convergence iterations" is the stopping criterion: if the best solution does not change in 15 iterations, the algorithm is stopped.

All the experiments have been implemented in the R Statistical Environment, making use of the package march [30] for dealing with DCMMs, and RankAggreg [17] for performing Rank Aggregation. The source code is available on GitHub (https://github.com/lordbitin/CIM-2017).
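For reference, a hedged sketch of how the aggregation step might be invoked through the RankAggreg package [17]; the toy rankings and weight values are illustrative, and the exact expectations on the `weights` matrix should be checked against the package documentation:

```r
library(RankAggreg)
# Rows of x are base rankings of model identifiers (best first); rows
# of w carry the corresponding measure values. Values are illustrative,
# not results from the paper.
x <- rbind(c("m(6,1,0)", "m(3,1,1)", "m(5,1,1)"),
           c("m(3,1,1)", "m(5,1,1)", "m(6,1,0)"))
w <- rbind(c(0.90, 0.60, 0.50),
           c(0.85, 0.70, 0.40))
res <- RankAggreg(x, k = 3, weights = w, method = "GA",
                  distance = "Spearman")
```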

C. Experimental Dataset

In this experiment, the simulation environment (DWR) was tested by Computer Engineering students of the Autonomous University of Madrid (AUM), all of them inexperienced in this type of systems. The mission prepared for this experiment featured a total of 3 UAVs performing 4 surveillance tasks in 2 different areas, in order to detect 4 mobile targets. The map also presented 4 No-Flight-Zones and 4 Refueling Stations. During the simulation, 4 scheduled incidents were triggered, affecting both the UAVs and the environment. For more information about the mission elements involved in the simulation, see [9].

The dataset resulting from this experiment comprises $N = 85$ distinct simulations, executed by a total of 27 users. 75% of them (63) are used for training DCMMs, and the remaining 25% (22) are used as test sequences to compute the predictability evaluation measures. After applying the data preprocessing steps detailed in Section III-A, we extract the corresponding interaction sequences for each of the simulations. The length of those sequences ranges from 15 interactions (the minimum allowed) to 210, with an average of 64.6 interactions per simulation.

D. Experimental Results

Table 3 shows the top-5 DCMM ranking for each evaluation measure used in this work, when applied to the data extracted from DWR. First of all, it is important to remark which are the most relevant evaluation measures for this experiment, i.e., which help us most to discriminate between models. The fact that an evaluation measure is relevant here does not mean that it will be so for other domains and configurations. As can be observed, the most heterogeneous measures, i.e., the ones that achieve the highest coefficients of variation, are BIC and MPGP. Therefore, the models leading the rankings of BIC and MPGP will likely be the ones leading the final ranking.

Table 2: Parameter tuning for all the variables involved in the experimentation carried out in this work. The "Context" column makes reference to the processes shown in Figure 2.

Context | Parameter | Value
A. Extracting interaction sequences | Number of possible interactions (K) | 6
A. Extracting interaction sequences | Min. number of interactions per sequence | 15
B. Learning DCMMs (hyperparameters) | Range of M (number of states) | [3, 8]
B. Learning DCMMs (hyperparameters) | Range of l (order of the hidden chain) | [1, 3]
B. Learning DCMMs (hyperparameters) | Range of f (order of the visible chain) | [0, 2]
B. Learning DCMMs (memetic algorithm) | Number of generations | 5
B. Learning DCMMs (memetic algorithm) | Population size | 5
B. Learning DCMMs (memetic algorithm) | Baum-Welch algorithm - max. iterations | 4
B. Learning DCMMs (memetic algorithm) | Baum-Welch algorithm - tolerance | 0.01
C. Evaluating DCMMs | CHPHT - Transition Relevancy Threshold (TRT) | 0.9
C. Evaluating DCMMs | AGP - Penalty Rate (PR) | 2
D. Weighted Rank Aggregation | Size of the aggregated ranking (Q) | 10
D. Rank Aggregation (GA) | Population size | 100
D. Rank Aggregation (GA) | Maximum number of iterations allowed | 2000
D. Rank Aggregation (GA) | Convergence iterations | 15
D. Rank Aggregation (GA) | Crossover probability | 0.4
D. Rank Aggregation (GA) | Mutation probability | 0.01
D. Rank Aggregation (GA) | Rerun times | 20


Classical HMMs ($l = 1$, $f = 0$) are mostly found within the top-5 BIC ranking, which is a sign that the use of higher order DCMMs only makes sense if we extend the model selection process with additional measures. If we focus on the SL and AGP rankings, which are both accuracy measures in some way, we see that the top positions are occupied by complex DCMMs with higher order chains and a high number of states, which gives us an initial idea of the superiority of these models over classical HMMs in terms of predictive capabilities. Also, it is remarkable that we find DCMMs with high values of $l$ (the order of the hidden chain) within the top-5 CHPHT ranking, probably containing long hidden transition patterns.

In order to have a better understanding of the influence of each hyperparameter of a DCMM in terms of predictability and interpretability, Figure 3 shows three heatmaps (one for each hyperparameter) with the best position of a DCMM in the top-10 aggregated ranking obtained after carrying out the whole process flow described in this work, while modifying the importance of the predictability and interpretability ($\alpha$ in Eq. (5)). From them, we can extract some important conclusions related to this experiment:

❏ The 6-state DCMMs lead the rankings when the predictability importance is low (and thus, the interpretability importance is high). This is because in this experiment there are $K = 6$ possibilities for an observation, and hence, each state in the model represents one of them, which makes the model states easily understandable. As a big picture, we can see that the top-right corner of the heatmap is darker than the bottom left, which confirms empirically that our predictive evaluation measures tend to favour models with a high number of states.

❏ Although in this experiment models with a first order hidden chain ($l = 1$) always get the best rank positions (see the second heatmap of Figure 3), there is a clear inverse relationship between the predictability importance and the value of $l$. This means that, at least for the simulation environment DWR, the use of higher order hidden chains in order to link more than one previous cognitive operator state does not come with an improvement of the model predictive capabilities, probably because the evolution of the operator cognitive state is not so complex in a simple simulation environment such as the one used.

❏ Regarding the order of the visible chain (third heatmap of Figure 3), we can see that the value of $f$ for the best DCMMs goes from 0 (the DCMM is an HMM) to 1 as the predictability importance increases, which means that models with visible chains outperform HMMs when it comes to predictability issues. Furthermore, models with $f = 1$ also occupy high positions in the ranking when more importance is given to interpretability (one can see dark colors in the whole heatmap when $f = 1$). This is extremely important for the motivation of this work, since it clearly shows the advantage of having a Markov chain among operator interactions in each state, which is not possible in the current state-of-the-art HMM-based modelling framework.


Figure 3: Heatmaps showing the best position of a DCMM in the top-10 ranking in terms of the three hyperparameters of a DCMM: M, l and f. The x-axis shows how the results vary when we modify the importance of the predictability of the model, and thus the importance of the interpretability too. Empty cells are used when no model with such hyperparameters is found in the top-10 ranking.

Table 3: Top-5 (of 54) ranking for each evaluation measure used in this work, for both the predictability and the interpretability aspects of a DCMM. Each cell contains an identifier of the DCMM based on its hyperparameters (M, l, f) and the associated numerical measure. For each measure its associated coefficient of variation is shown (e.g. SL, Cv = 0.076), which influences the importance of each measure in the Rank Aggregation process.

Aspect | Predictability | Predictability | Predictability | Both | Interpretability
# | SL (Cv = 0.076) | AGP (Cv = 0.025) | MPGP (Cv = 0.651) | BIC (Cv = 0.462) | CHPHT (Cv = 0.190)
1 | μ(8,2,2) [−69.530] | μ(3,1,2) [0.837] | μ(7,2,0) [0.310] | μ(6,1,0) [11261.926] | μ(3,1,2) [0.451]
2 | μ(8,1,2) [−70.174] | μ(3,1,1) [0.835] | μ(4,2,0) [0.301] | μ(3,1,1) [11589.492] | μ(4,1,2) [0.441]
3 | μ(7,1,2) [−70.906] | μ(7,1,2) [0.834] | μ(4,3,0) [0.279] | μ(5,1,1) [11591.593] | μ(3,3,0) [0.436]
4 | μ(7,2,2) [−71.362] | μ(5,1,1) [0.834] | μ(5,1,1) [0.279] | μ(8,1,0) [11621.068] | μ(5,1,2) [0.431]
5 | μ(8,3,2) [−71.472] | μ(7,1,1) [0.834] | μ(8,1,1) [0.271] | μ(5,1,0) [11626.030] | μ(6,1,0) [0.427]



VI. Conclusions & Future Work

This work has presented a new way to find and model behavioural patterns among UAV operators. It is based on Double Chain Markov Models (DCMMs), which extend the possibilities of the currently used HMMs by combining two higher order Markov chains into the same model, in order to improve the predictive capabilities of the model and the quality of the patterns that can be interpreted from it. The different processes for creating, evaluating and selecting the models in a UAV context have been detailed, and an experiment has been carried out using data from a set of inexperienced operators in a simple multi-UAV simulation environment.

The resulting models from this experiment show that, although the inclusion of higher order hidden chains does not substantially improve the quality of the model in terms of either predictability or interpretability, adding a visible Markov chain among the model observations improves the predictive capabilities of DCMMs over classical HMMs, while maintaining a fair level of interpretability. In any case, these results only show the conclusions for a specific simulation environment; what is really interesting for the state of the art is the flexibility and richness of the proposed modelling framework over the current HMM-based methodologies.

As future work, several issues will be extended and improved, including: 1) A study of the correlation among the evaluation measures used in this work across different experiments and domains, in order to extract conclusions about their usefulness and practical sense. 2) An extension of the DCMM which allows gathering multiple data sequences in the same model, such as the operator interactions and mission events. 3) An extension of the DCMM which allows modelling explicitly the duration of each hidden state (Double Chain Semi-Markov Model). 4) The use of covariates in the model creation to compare behavioural patterns with respect to specific operator features, such as age or previous experience with UAVs. 5) A formal comparison among different Markovian models in the context of UAV operations. 6) The use of the resulting model as an online predictive tool to detect abnormal behaviours during a mission.

Acknowledgment

This work has been supported by the following research projects: EphemeCH (TIN2014-56494-C4-4-P), Spanish Ministry of Economy and Competitiveness, and CIBERDINE (S2013/ICE-3095), both under the European Regional Development Fund (FEDER), and by Airbus Defence & Space (FUAM-076914 and FUAM-076915). The authors would like to acknowledge the support obtained from Airbus Defence & Space, especially from Savier Open Innovation project members: Jose Insenser, Gemma Blasco, and Juan Antonio Henriquez. Finally, we would like to thank the reviewers and the Editor-in-Chief for the different suggestions and comments made to this work.

References

[1] M. DeGarmo and G. M. Nelson, "Prospective unmanned aerial vehicle operations in the future national airspace system," in Proc. AIAA 4th Aviation Technology Integration and Operations Forum, Chicago, Sept. 20–22, 2004, pp. 20–23.
[2] V. V. Klemas, "Coastal and environmental remote sensing from unmanned aerial vehicles: An overview," J. Coastal Res., vol. 31, no. 5, pp. 1260–1267, Apr. 2015.
[3] T. Haritos, J. M. Robbins, S. Clyde, and M. Blvd, "The use of high fidelity simulators to train pilot and sensor operator skills for unmanned aerial systems," Proc. Society for Applied Learning Technologies Conf., Jan. 2012, pp. 1–6.
[4] W. Bennett, Jr., J. B. Bridewell, L. J. Rowe, S. D. Craig, and H. M. Poole, "Training issues for remotely piloted aircraft systems from a human systems integration perspective," in Proc. Remotely Piloted Aircraft Systems: A Human Systems Integration Perspective Conf., 2016, pp. 163–176.
[5] D. C. Ison, B. A. Terwilliger, and D. A. Vincenzi, "Designing simulation to meet UAS training needs," in Proc. Int. Conf. Human Interface and the Management of Information, Las Vegas, July 21–26, 2013, pp. 585–595.
[6] Y. Boussemart and M. L. Cummings, "Predictive models of human supervisory control behavioral patterns using hidden semi-Markov models," Eng. Applicat. Artif. Intell., vol. 24, no. 7, pp. 1252–1262, Oct. 2011.
[7] A. Berchtold, "High-order extensions of the double chain Markov model," Stochastic Models, vol. 18, no. 2, pp. 193–227, Jan. 2002.
[8] V. Rodríguez-Fernández, A. Gonzalez-Pardo, and D. Camacho, "Finding behavioral patterns of UAV operators using multichannel hidden Markov models," in Proc. IEEE Symp. Series on Computational Intelligence, Athens, Dec. 6–9, 2016, pp. 1–8.
[9] V. Rodríguez-Fernández, H. D. Menéndez, and D. Camacho, "Design and development of a lightweight multi-UAV simulator," in Proc. IEEE 2nd Int. Conf. Cybernetics, Gdynia, June 24–26, 2015, pp. 255–260.
[10] V. Rodríguez-Fernández, A. Gonzalez-Pardo, and D. Camacho, "A method for building predictive HSMMs in interactive environments," in Proc. IEEE Congr. Evolutionary Computation, Vancouver, July 24–29, 2016, pp. 3146–3153.
[11] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, Jan. 1989.
[12] P. Moscato, D. Corne, M. Dorigo, F. Glover, D. Dasgupta, P. Moscato, R. Poli, and K. V. Price, "Memetic algorithms: A short introduction," in New Ideas in Optimization. U.K.: McGraw-Hill, 1999, pp. 219–234.
[13] A. Berchtold, "Optimization of mixture models: Comparison of different strategies," Comput. Stat., vol. 19, no. 3, pp. 385–406, Sept. 2004.
[14] Y.-T. Liu, T.-Y. Liu, T. Qin, Z.-M. Ma, and H. Li, "Supervised rank aggregation," in Proc. 16th Int. Conf. World Wide Web, Alberta, May 8–12, 2007, pp. 481–490.
[15] K. W. Lam and C. H. Leung, "Rank aggregation for meta-search engines," in Proc. 13th Int. World Wide Web Conf. Alternate Track Papers & Posters, New York, May 19–21, 2004, pp. 384–385.
[16] V. Pihur, S. Datta, and S. Datta, "Finding cancer genes through meta-analysis of microarray experiments: Rank aggregation via the cross entropy algorithm," Genomics, vol. 92, no. 6, pp. 400–403, Dec. 2008.
[17] V. Pihur, S. Datta, and S. Datta, "RankAggreg, an R package for weighted rank aggregation," BMC Bioinform., vol. 10, no. 1, p. 62, Feb. 2009.
[18] D. E. Critchlow, Metric Methods for Analyzing Partially Ranked Data. Springer Science & Business Media, 2012, vol. 34.
[19] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar, "Rank aggregation methods for the web," in Proc. 10th Int. Conf. World Wide Web, Hong Kong, May 1–5, 2001, pp. 613–622.
[20] P. Diaconis and R. L. Graham, "Spearman's footrule as a measure of disarray," J. R. Stat. Soc. Ser. B (Methodological), vol. 39, no. 2, pp. 262–268, May 1977.
[21] M. S. Beg and N. Ahmad, "Soft computing techniques for rank aggregation on the world wide web," World Wide Web, vol. 6, no. 1, pp. 5–22, Mar. 2003.
[22] V. Pihur, S. Datta, and S. Datta, "Weighted rank aggregation of cluster validation measures: A Monte Carlo cross-entropy approach," Bioinformatics, vol. 23, no. 13, pp. 1607–1615, May 2007.
[23] J. A. Aledo, J. A. Gámez, and D. Molina, "Tackling the rank aggregation problem with evolutionary algorithms," Appl. Math. Comput., vol. 222, pp. 632–644, Oct. 2013.
[24] J. O'Connell, F. A. Tøgersen, N. C. Friggens, P. Løvendahl, and S. Højsgaard, "Combining cattle activity and progesterone measurements using hidden semi-Markov models," J. Agri. Biol. Environ. Stat., vol. 16, no. 1, pp. 1–16, Mar. 2011.
[25] C. B. Do, C.-S. Foo, and A. Y. Ng, "Efficient multiple hyperparameter learning for log-linear models," Adv. Neural Inform. Process. Syst., vol. 20, pp. 377–384, 2007.
[26] R. Kass and A. Raftery, "Bayes factors," J. Am. Stat. Assoc., vol. 90, no. 430, pp. 773–795, Mar. 1995.
[27] R. Castonia and Y. Boussemart, "The design of a HSMM-based operator state modeling display," in Proc. AIAA Infotech@Aerospace, Atlanta, Apr. 20–22, 2010, pp. 1–10.
[28] R. E. McAuliffe, "Coefficient of variation," in Wiley Encyclopedia of Management, vol. 8, Jan. 2015.
[29] F. Salfner and M. Malek, "Using hidden semi-Markov models for effective online failure prediction," Proc. IEEE Symp. Reliable Distributed Systems, Beijing, Oct. 10–12, 2007, pp. 161–174.
[30] O. Maitre (with contributions from A. Berchtold and O. Buschor). (2016). march: Markov chains, R package version 1.4. [Online]. Available: https://CRAN.R-project.org/package=march


PUBLICATION 3

Automatic Procedure Following Evaluation Using Petri Net-Based Workflows

‱ Rodríguez-Fernåndez, Víctor, Antonio Gonzålez-Pardo, and David Camacho. 2018. "Automatic Procedure Following Evaluation Using Petri Net-Based Workflows." IEEE Transactions on Industrial Informatics 14 (6): 2748–59. DOI: 10.1109/TII.2017.2779177.

– State: In press. Published online.

– Impact Factor (JCR 2017): 5.430

– Category: Automation & Control Systems. Rank: 4/61 [Q1/D1].

– Category: Computer Science, Interdisciplinary Applications. Rank: 5/105 [Q1/D1].

– Category: Engineering, Industrial. Rank: 1/47 [Q1/D1].

‱ Contribution of the PhD candidate:

– First author of the article.

– Co-authoring in the conception of the presented idea.

– Formalization of the proposed modelling concepts.

– Implementation of the proposed algorithms.

– Design and simulation of the use cases.

– Co-authoring in the interpretation and discussion of results.

– Writing of the manuscript with inputs from all authors, and design of the figures.




Automatic Procedure Following Evaluation Using Petri Net-Based Workflows

Víctor Rodríguez-Fernåndez, Antonio Gonzålez-Pardo, Member, IEEE, and David Camacho, Member, IEEE

Abstract—An operating procedure (OP), also known as a checklist or action plan, is a list of actions or criteria arranged in a systematic way, commonly used in areas such as aviation or healthcare to ensure the success of critical tasks and to help decrease human errors. In these areas, operators are rigorously trained to follow the OP carefully, but the evaluation of how well they follow it is usually performed manually by an expert instructor. Automating this evaluation process would lead to an objective and scalable analysis of operator performance, which is extremely important in areas where the number of operators to evaluate is high. This problem, which can be referred to as automatic procedure following evaluation, needs new techniques and formalizations, because current conformance checking methods do not fit well with some aspects of an OP. In this paper, OPs are modeled as Petri Net-based workflows that interact with the data log of the system to allow an automatic evaluation of the progress achieved and the time spent following the OP. In order to illustrate the contributions of this paper, a case study is carried out designing and modeling an emergency OP for an unmanned aircraft system, and evaluating the proposed approach with a battery of tests.

Index Terms—Checklist, conformance checking, operating procedure (OP), Petri Net (PN), procedure following evaluation (PFE), workflow.

I. INTRODUCTION

THE number of human-dependent critical tasks is increasing every day in a large number of professions. These tasks are considered critical because any failure during their execution could produce important consequences such as the loss of human life or high monetary losses. To reduce the risk involved in them, a step-by-step guiding tool, called an operating procedure (OP), is usually provided, describing the different checks and actions that the person responsible for the task, namely the operator, must perform in order to solve it successfully.

Manuscript received April 24, 2017; revised July 14, 2017, September 30, 2017, and November 17, 2017; accepted November 27, 2017. Date of publication December 4, 2017; date of current version June 1, 2018. This work has been supported by the following research projects: Airbus Defence & Space (FUAM-076914 and FUAM-076915), EphemeCH (TIN2014-56494-C4-4-P) and DeepBio (TIN2017-85727-C4-3-P) of the Spanish Ministry of Economy and Competitiveness, CIBERDINE (S2013/ICE-3095), both under the European Regional Development Fund (FEDER), and RiskTrack (JUST-2015-JCOO-AG-723180). Paper no. TII-17-0828. (Corresponding author: Víctor Rodríguez-Fernåndez.)

The authors are with the Departamento de Informåtica, Universidad Autónoma de Madrid, Madrid 28049, Spain (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TII.2017.2779177

The concept of an OP is also known as a checklist in many domain fields, although it can also be referred to as an action checklist or an emergency OP.

There is a plethora of applications that use OPs to define the different steps required to solve a critical task. The most widespread area of application is aviation, where pilots act in many high-risk situations such as takeoff emergencies, ejection procedures, landing emergencies, or fuel system failures, among others [1]. OPs are also used in medicine and healthcare, in order to provide surgical care [2], or to define how to transport patients [3].

Due to the critical nature of the tasks just mentioned, operators are trained in order to get used to following OPs strictly. In this training, an instructor is in charge of controlling that the operator is correctly following the steps described in the OP. This is what we will henceforth call procedure following evaluation (PFE). Nowadays, in most of the applications involving OPs, the PFE is performed manually, i.e., the instructor must verify that an operator is following all the steps described in the OPs [4]. This manual supervision is not appropriate if the number of operators increases, because instructors are not able to inspect whether each trainee is correctly performing all the steps described in the OP. For this reason, the development of systems that automatically evaluate and analyze how operators follow an OP is highly important, not only for the scalability of the training phase, but also to extract measures about the performance of the operators.

According to the literature, the family of techniques focused on comparing process models with data from the same process is known as conformance checking. In recent years, methods such as replay [5] or behavioral alignment [6] advocate the use of Petri Net-based workflows, or Workflow Nets (WF-Nets), to check the conformance between a business process model and an event log. However, these techniques are not suitable for PFE for two main reasons. On the one hand, business process models are event based, which implies that the condition to fulfil an activity in the WF-Net consists simply in checking the presence of a single event in the log. Instead, OPs are state-based processes, and some procedural steps must be checked according to flexible and complex conditions involving the state of one or many variables in the data log during a certain period of time, for example, the step "Supervise that the altitude of the vehicle is below 10 000 m for 1 min." On the other hand, the concept of time is secondary in most conformance checking methods, whereas here it is a critical factor, not only for evaluating the time spent to complete an OP, but also for defining a priori the maximum step duration allowed for each procedural step.




Fig. 1. General concept of the proposed approach.


Thus, this paper extends the use of WF-Nets with new formalizations and methodologies to model OPs and execute conformance checking (or PFE) on them. The problem of automating the PFE is solved by expressing the contents of an OP as a formal model into which we can introduce the data log resulting from the response of the operators in the system, in order to analyze the progress achieved and the time they spent on the process (see Fig. 1). The contributions of this paper can be summarized as follows:

1) We formalize and generalize the concepts and relations involved in an OP.

2) We describe a method to model OPs using WF-Nets; both the static and dynamic components of the model are adapted to the concepts and relations involved in the OP. We have selected Petri Nets (PNs) because they provide a formal mathematical framework that allows us to model both concurrent and sequential steps, which is extremely useful in cases where the operator is asked to perform more than one action at the same time.

3) We introduce an automatic procedure following evaluation (APFE) algorithm, which uses the WF-Net-based OP to automatically evaluate the response of an operator following a specific OP. This evaluation is not performed in real time; instead, it works on the system log that contains all the timestamped data resulting from the operator response.

4) In order to illustrate the ideas mentioned above, we develop a test case using a realistic OP belonging to the field of unmanned aerial vehicle (UAV) operations.

5) A comparison study is carried out to highlight the main advantages of this approach over previous PFE approaches.

The rest of the paper is structured as follows: Section II provides the basic background on PNs, Workflow Nets, and OPs needed to contextualize this paper. In Section III, the proposed modeling strategy is described, and Section IV presents the APFE algorithm based on an OP model. Then, Section V shows the case study of this paper, and in Section VI, we compare the proposed approach with the related work. Finally, the conclusions and future work can be found in Section VII.

II. BACKGROUND

A. OPs and Checklists

A checklist, or OP, is typically a list of action items or criteria arranged in a systematic manner, allowing a user to record the presence/absence of the individual items listed to ensure that all are considered or completed. Their efficacy as a cognitive aid lies in their ability to "chunk" item-specific information in an organized fashion, and in the fact that these lists of instructions are often better understood than information in paragraph format. As a consequence, the use of checklists usually results in an improvement of performance and memory recall, and a decrease in the human error rate [7].

Checklists have been widely and consistently used in at least the following two different areas.

1) Aviation: Checklists constitute a critical part of the flight protocol. Two types of checklists are usually defined in this field: a) normal checklists, which guide regular flight practices such as takeoff, landing, etc.; and b) emergency checklists, used to guide the correction of alert situations, such as landing emergencies, fuel system failures, etc.

2) Healthcare: Checklists have been demonstrated to be effective in high-intensity fields of medicine, such as trauma and anesthesiology [8].

Reijers et al. [9] discuss the design philosophy of a checklist and propose a checklist classification in terms of several characteristics.

1) The Device: Although the most common device is a piece of paper, we can also find scroll, mechanical, vocal, and electronic checklists [10].

2) The Method: There are two predominant methods of conducting a checklist: a) do-list: the checklist follows a step-by-step "cookbook" approach; and b) challenge-response: the checklist is a backup procedure that is used after a specific operation to verify that all the items listed on the checklist have been correctly accomplished.

3) The Items on the List: The type and the number of items included in a checklist are cardinal questions in the checklist philosophy.

4) Level of Automation: This concerns whether or not to include in the checklist steps that could be automatically verified by the system.

B. Petri Nets

A classical (or low-level) PN is a directed bipartite graph with two node types, called places and transitions, and a set of directed arcs that connect nodes of different types. Connections between nodes of the same type (i.e., place to place and transition to transition) are not allowed.

At any time of a process, a place in a PN can contain zero or more tokens, usually drawn as dots inside the place. The distribution of tokens over places is often referred to as the marking, and represents the status of the net at any (discrete) time.

Formally, a PN can be described as a quintuple (P, T, I, O, M), where the following statements hold.

1) P = {p1, . . . , pnp} is a finite set of np places.

2) T = {t1, . . . , tnt} is a finite set of nt transitions.



3) I ⊆ (P × T) is a set of input arcs, which connect places to transitions. A place p is called an input place of a transition t iff there exists an input arc from p to t.

4) O ⊆ (T × P) is a set of output arcs, which connect transitions to places. A place p is called an output place of a transition t iff there exists an output arc from t to p.

5) M : P → N is the marking of the net at a specific moment. We can represent a marking as follows: 1p1 + 3p2 + 5p4 denotes a state that has one token in place p1, 3 tokens in p2, 5 in p4, and 0 tokens in the rest of the places. Furthermore, M(p) represents the number of tokens of place p in marking M.

Transitions are the active components of a PN, used to change the position and number of tokens during the execution of the net. The dynamics of a classical PN follow the execution rules below (a minimal token-game sketch in Java follows the list).

1) A transition t is said to be enabled in a marking M iff each input place of the transition contains at least one token, i.e., iff M(p) > 0, ∀p ∈ ‱t, where ‱t refers to the set of input places of the transition t.

2) When a transition t is enabled, it may fire, consuming one token from each input place pin ∈ ‱t and producing one token in each output place pout ∈ t‱.
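To make the token-game semantics above concrete, the following minimal sketch implements a classical PN in Java (the implementation language used later in this paper). All class and method names here are ours, purely illustrative, and not taken from any existing library.

    import java.util.*;

    /** Minimal classical Petri Net: places hold token counts, and a transition
     *  fires by consuming one token per input place and producing one per output place. */
    public class ClassicalPetriNet {
        private final Map<String, Integer> marking = new HashMap<>();      // place -> token count
        private final Map<String, List<String>> inputs = new HashMap<>();  // transition -> input places
        private final Map<String, List<String>> outputs = new HashMap<>(); // transition -> output places

        public void addPlace(String p, int tokens) { marking.put(p, tokens); }

        public void addTransition(String t, List<String> in, List<String> out) {
            inputs.put(t, in);
            outputs.put(t, out);
        }

        // Rule 1: t is enabled iff every input place holds at least one token.
        public boolean isEnabled(String t) {
            for (String p : inputs.get(t)) if (marking.get(p) == 0) return false;
            return true;
        }

        // Rule 2: firing consumes one token per input place, produces one per output place.
        public void fire(String t) {
            if (!isEnabled(t)) throw new IllegalStateException(t + " is not enabled");
            for (String p : inputs.get(t)) marking.put(p, marking.get(p) - 1);
            for (String p : outputs.get(t)) marking.put(p, marking.get(p) + 1);
        }

        public static void main(String[] args) {
            ClassicalPetriNet net = new ClassicalPetriNet();
            net.addPlace("p1", 1);
            net.addPlace("p2", 0);
            net.addTransition("t1", Arrays.asList("p1"), Arrays.asList("p2"));
            net.fire("t1"); // moves the single token from p1 to p2: marking becomes p1=0, p2=1
        }
    }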

PNs have been widely used in multiple fields as a tool for describing and analyzing discrete-event dynamic systems [11], [12], due to the combination they offer of great expressive power, legibility, and mathematical formality.

However, PNs describing real processes tend to be complex and extremely large, so it is not possible to use the standard PN to model complex data or time-dependent problems. For that reason, many extensions of the classical PN have been proposed, known as high-level PNs. In this paper, we focus on a type of PN where the tokens contain a data value, usually known as the token color [13]. Examples of this type of PN are Colored PNs, E-nets, or First-In First-Out (FIFO) nets, among others. Transitions describe the relation between the colors of the input tokens and the colors of the output tokens. Also, a transition can establish "preconditions" (guards) to enable/disable itself depending on the color of the input tokens.

C. Workflow Management (WfM) and WF-Nets

WfM includes methods, techniques, and tools to support the design, enactment, management, and analysis of workflows [14]. Workflows are case based, i.e., every piece of work is executed for a specific case. Cases are handled by executing tasks in a specific order. Each task has "pre-" and "post-" conditions, which are described as TRUE/FALSE expressions.

It is important to use an established framework for modeling and analyzing workflow processes. The benefits of using PNs as a framework for the specification and modeling of workflows are the following.

1) The formal semantics.
2) The availability of many analysis techniques.
3) The fact that PNs are state based rather than event based.

Modeling a workflow process in terms of a PN is rather straightforward: tasks are modeled by transitions, conditions are modeled by places, and cases are modeled by tokens. The application of PNs to WfM results in a class of PNs named WF-Nets [15], which satisfy the following.

1) It has one initial place (i), where the process starts, and one end place (e), where the process ends.

2) Every task transition and place should be located on a path from the initial place to the end place. This means that if we virtually extend the PN by adding a transition which connects place e with i, the resulting PN should be strongly connected.

A WF-Net can be used to describe the routing of cases throughout the workflow, using building blocks such as the AND-split, AND-join, OR-split, and OR-join in order to model sequential, conditional, parallel, and iterative routing [5].
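As an aside, the first structural condition above can be checked mechanically. The following is a small sketch, with helper names of our own invention, that verifies the unique initial/end place condition over a place set and an arc map; strong connectivity of the extended net can be checked analogously with a standard graph traversal.

    import java.util.*;

    /** Checks the basic WF-Net structural condition: exactly one initial place
     *  (no incoming arcs) and exactly one end place (no outgoing arcs). */
    public class WfNetCheck {
        /** arcs: node -> set of successor nodes (places and transitions mixed). */
        public static boolean hasUniqueSourceAndSink(Set<String> places,
                                                     Map<String, Set<String>> arcs) {
            Set<String> withIncoming = new HashSet<>();
            arcs.values().forEach(withIncoming::addAll);

            long sources = places.stream()                       // candidates for place i
                    .filter(p -> !withIncoming.contains(p)).count();
            long sinks = places.stream()                         // candidates for place e
                    .filter(p -> arcs.getOrDefault(p, Collections.emptySet()).isEmpty())
                    .count();
            return sources == 1 && sinks == 1;
        }
    }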

III. MODELING AN OP AS A WF-NET

In this section, we provide a complete description of how to adapt the elements of a WF-Net to model the concepts and relationships present in a written OP. The resulting model, named a WF-Net-based operating procedure (OP), will "interact" with the data log of the system in order to evaluate the operator response to a critical task (see Fig. 1). To achieve this, first the concepts and relationships of an OP must be described, and then modeled in terms of WF-Net elements. Finally, the dynamics of the model must be defined in order to make it appropriate for evaluating the operator response.

A. Concepts and Relations for OPs

In this paper, we will use the term "operating procedure" to gather different step-by-step guiding tools, such as checklists, action checklists, emergency OPs, action plans, etc.

Here a description of the main concepts and relations for OPs is given. This description has been defined based on works from different fields where procedures and checklists have proven to be successful in real operational environments. This includes the use of the WHO Surgical Safety Checklist, the study of nuclear OPs such as the loss-of-coolant accident procedure [16], and finally, the analysis of OPs from the field of aviation, such as the Airbus manual for emergencies in UAV missions.1

Basically, an OP comprises an ordered sequence of procedural steps, each of them describing a subgoal that the operator must accomplish to successfully complete the whole procedure. A procedural step may be either atomic or a container of a sequence of substeps, which must all be fulfilled in order to consider that the container step has been completed. Additionally, a procedural step contains an attribute, namely Ό, indicating the maximum step duration, which in some cases can be provided a priori by a domain expert (the process modeler), and in other cases can be computed in terms of a step complexity measure.

Based on the type of response expected from the operator in a procedural step, each atomic procedural step can be classified according to three groups; a minimal data model in Java is sketched after the list.

1 References and specific details from these procedures have not been included due to a confidentiality agreement with Airbus Defence and Space Company.



1) Action: Represents a step in which the operator is asked to do something that will alter in some way the status of the system. Actions are usually mandatory for the completion of the OP and consume a specific amount of time. An example of an action step might be: "Command altitude change to 10 000 ft."

2) Check: A check is a step in which the operator is asked to confirm/refute that a specific property inside the operational environment is fulfilled. Usually, the process of checking a property is performed visually. In this paper, we consider that checks are instantaneous (Ό = 0), which means that the time employed by the operator is irrelevant. Examples of a check step would be: "Confirm the remaining fuel is higher than 50 L", or conditional sentences like "If the remaining fuel is higher than 50 L...".

3) Supervision: There are some procedural steps that ask the operator to supervise the status of some aspects of the system for a time period (e.g., "Verify that the UAV aborts autonomously and lands"). Unlike check steps, the process of supervising is not instantaneous, but persistent during a time interval (Ό ≠ 0). Conditional statements of an OP involving time, such as "While ..." and "When ...", can be modeled as supervision steps.
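The data model sketched below mirrors this taxonomy in Java; the type names and the Ό field are ours, chosen only to illustrate how the classification could be encoded.

    /** Illustrative data model for OP procedural steps (names are ours). */
    public abstract class ProceduralStep {
        public final String description;
        public final double maxDuration; // Ό, the maximum step duration, in seconds

        protected ProceduralStep(String description, double maxDuration) {
            this.description = description;
            this.maxDuration = maxDuration;
        }
    }

    class ActionStep extends ProceduralStep {      // the operator must alter the system state
        ActionStep(String d, double mu) { super(d, mu); }
    }

    class CheckStep extends ProceduralStep {       // instantaneous confirmation, so Ό = 0
        CheckStep(String d) { super(d, 0.0); }
    }

    class SupervisionStep extends ProceduralStep { // condition supervised over a time interval, Ό ≠ 0
        SupervisionStep(String d, double mu) { super(d, mu); }
    }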

B. Implementing the Concepts of the OP Using WF-Net Elements

Once we have formalized the different concepts and relations comprising an OP, it is possible to express each of those concepts in terms of classical PN elements, following common workflow design patterns.

As in every PN-based workflow, a WF-Net-based OP starts with a unique initial place and finishes with a unique end place. Between them, consecutive procedural steps follow one another, linked by arcs. However, unlike for common WF-Nets, there is not a direct mapping between atomic PN elements and procedural steps. Instead, the concepts of an OP become meaningful only in terms of blocks. The presence of cycles is allowed in a WF-Net-based OP, as long as it corresponds to a natural logic contained in the written procedure that is being translated (e.g., "If the conditions of step 3 are not fulfilled, go back to step 1").

Fig. 2 shows the definition of the different types of procedural steps described in terms of blocks of places, transitions, and arcs. Actions are the simplest steps to model; only a sequential "place-arc-transition" combination is needed. This is because when an OP requires the action of an operator, there is no other way to follow than waiting for the operator to execute that action successfully. On the other hand, check steps and supervision steps always offer two different ways to follow the execution of the OP, depending on whether the condition to check or supervise is fulfilled or not. This fact is modeled as a place connected to two mutually exclusive transitions, which can be seen as an XOR-split routing block.

Regarding container steps, we distinguish between concurrent and sequential containers, depending on whether the OP allows operators to perform the substeps in parallel or not.

Fig. 2. Workflow process definition for the different types of procedural steps defined for any OP.

In the case of substep concurrency, we surround the substeps with an AND-split routing block at the beginning and an AND-join at the end, to ensure that every substep is analyzed and that step N+1 cannot start until every substep of step N has finished. In addition, if there are check or supervision steps inside the concurrent container, we must link, on the one hand, the fulfilled transitions of each of them to the aforementioned AND-join block, while, on the other hand, the !fulfilled transitions must be directed to an OR-join block that gathers every possible nonfulfillment of the concurrent container (see Fig. 5 as an example).

C. Dynamic Behavior of the WF-Net-Based OP

In order to understand how the execution of an OP is implemented inside a PN in this paper, the tokens, markings, and firing policies of the net must be specified. In a WF-Net-based OP, the support for the movement of tokens comes from the data log of the operation.

Definition 1: A data log Δ = (V, U, R) consists of the following.
1) A set V of variables.
2) A function U that defines the values admissible for each variable, i.e., U(v) is the domain of variable v for each v ∈ V. In this context, D = ⋃v∈V U(v) represents the set of all possible data values of any variable.
3) A function R that records the values of each variable over time, i.e., R : [0,∞) × V ⇀ D âˆȘ {⊄}, with R(x, v) ∈ U(v) âˆȘ {⊄}.2

2 If a variable is not given a value, we use the special symbol ⊄.



Note that R is defined as a partial function, because the values of each variable are recorded only at specific timestamps. Thus, the domain of R, dom(R) ⊂ [0,∞) × V, specifies the times and variables for which the data log contains a recorded value. Whether the records are created synchronously (with a preset record rate) or asynchronously (whenever a variable changes), the key point of the above definition is that it allows us to track over time the variables governing the operation to be analyzed. This definition of a data log combines the ideas presented in [17], where the authors convert unstructured text into a structured log that shows the logged variables in (name, value) pairs, and the notation used for Data PNs in [18], to which we have added the notion of temporal dependency in the values of variables.

The records contained in a data log can be seen as a set of log entries 〈x, v, R(x, v)〉 ∈ E, where E = [0,∞) × V × D. In order to navigate through the log entries continuously over time, we introduce the concepts of closest past record time and log state, which represent the closest past record of a variable at any moment of the process.

Definition 2: Given a data log Δ = (V, U, R), the log state is a function that extracts, for any timestamp x ∈ [0,∞) and any variable v ∈ V, the log entry that contains the closest past record of that variable in the log, i.e.,

SΔ : [0,∞) × V → E
SΔ(x, v) = 〈CPRT(x, v), v, R(CPRT(x, v), v)〉

where CPRT is the closest past record time, defined as

CPRT : [0,∞) × V → [0,∞)
CPRT(x, v) = max {xâ€Č | 〈xâ€Č, v〉 ∈ dom(R) ∧ xâ€Č ≀ x}.

Based on this, we introduce the set SΔx = ⋃v∈V SΔ(x, v), which gathers the log entries of every variable in the data log at a given time. In other words, it represents the state of the process up to time x. In the same way, SΔ[x,y] = {SΔ(xâ€Č, v) | x ≀ xâ€Č ≀ y, v ∈ V} gathers all the log entries between x and y.

A token conceptually represents the operator response facing procedural steps. It is defined by two attributes.
1) Position (p): Represents the place on the net where the token is located.
2) Timestamp (x): Specifies the "age" of the token. This timestamp will be continuously changed by the dynamics of the net.

We can use the tuple 〈p, x〉 ∈ P × [0,∞) to denote a token in place p with timestamp x. A marking conceptually represents the current progress of the operator through the OP. Inside a marking, tokens are organized in places in the form of unordered sets, so if two tokens that enable a transition share the same place, the decision of using one over the other is made randomly. In order to better handle the tokens in this paper, the formal definition of a marking is modified with respect to that of classical PNs (see Section II). Let K = P × [0,∞) be the space of tokens; a marking is defined as a function M : P → K* that assigns a set of tokens to each place.3 Furthermore, given a transition t ∈ T, M(‱t) = ⋃p∈‱t M(p) refers to all the input tokens of t, and M(t‱), defined analogously, to all the output tokens.
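To make Definitions 1 and 2 concrete, the following Java sketch, with class and field names of our own, stores records 〈x, v, d〉 and implements the closest-past-record lookup SΔ(x, v); it is a toy linear scan, not the paper's actual implementation.

    import java.util.*;

    /** Illustrative data log (Definition 1) with the log-state lookup of Definition 2. */
    public class DataLog {
        /** One record 〈x, v, d〉: timestamp, variable name, and value. */
        public static class Entry {
            final double time; final String variable; final Object value;
            Entry(double time, String variable, Object value) {
                this.time = time; this.variable = variable; this.value = value;
            }
        }

        private final List<Entry> entries = new ArrayList<>();

        public void record(double time, String variable, Object value) {
            entries.add(new Entry(time, variable, value));
        }

        /** SΔ(x, v): the entry holding the closest past record of v at time x,
         *  or null when v has no record up to x (the ⊄ case). */
        public Entry logState(double x, String variable) {
            Entry best = null;
            for (Entry e : entries) {
                if (e.variable.equals(variable) && e.time <= x
                        && (best == null || e.time > best.time)) {
                    best = e; // CPRT(x, v) is the timestamp of this entry
                }
            }
            return best;
        }
    }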

Transitions are the active components of the net, since they cause changes in the net marking when they are fired. In terms of the dynamics of the OP, they are responsible for the following.

1) Deciding whether a procedural step has been performed correctly or not.

2) Moving the token(s) in case the procedural step has been performed.

3) Increasing the timestamp of the token to the exact moment at which the operator completed the step.

To formalize the firing policies of a WF-Net-based OP, we define, for each transition t ∈ T, a control function, namely the guard condition (GC), that is used to specify, in terms of the data log, whether the procedural step associated with the transition has been fulfilled or not.

Definition 3: Given a data log Δ = (V, U, R), let E* be the set of all possible finite sets of log entries 〈x, v, d〉 such that v ∈ V, d ∈ U(v) ⊂ D. A GC is a function GC : E* → {true, false} that decides whether a set of log entries is admissible to fulfil a condition or not.

By including temporal information in the definition of the guards (recall that a log entry contains a timestamp), we allow the creation of complex and flexible conditions that could not be expressed via event-log-based data guards [18]. This is especially important for expressing the statements associated with supervision steps in an OP (e.g., "Supervise that the altitude of the vehicle is below 10 000 m for 30 s"), because they usually involve the verification of the state of some variables over time, rather than the occurrence of an event. A sketch of such a guard is given below.
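For instance, building on the DataLog sketch above (and assuming the same package), a supervision-style guard over a set of log entries might look like the following; the variable name "altitude" and the 10 000 m threshold are taken from the example statement, while the class and method names are ours.

    import java.util.List;

    /** Illustrative guard condition (Definition 3): true iff every logged value
     *  of "altitude" among the supplied entries stays below 10 000 m. */
    public class AltitudeGuard {
        public static boolean evaluate(List<DataLog.Entry> entries) {
            return entries.stream()
                    .filter(e -> e.variable.equals("altitude"))
                    .allMatch(e -> ((Number) e.value).doubleValue() < 10_000.0);
        }
    }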

Based on the above definition, we can create a function, namely the time of completion (ToC), that receives an input token from an enabled transition t, and returns the earliest moment, after the timestamp of the token, at which the GC of t is fulfilled, i.e., when the associated procedural step was completed.

Definition 4: Let Δ = (V, U, R) be a data log, let xin be the timestamp of an input token at a given transition t ∈ T, and let Ό(t) be the maximum step duration of the procedural step to which the transition belongs. The ToC is defined as

ToC : T × [0,∞) → [0,∞)
ToC(t, xin) = min {x ∈ [xin, xin + Ό(t)] | GCt(SΔ[xin,x]) = true}

where GCt is the GC associated with transition t, and SΔ[xin,x] is the log state between xin and x.

The interval [xin, xin + Ό(t)] is called the dynamic fulfillment interval of the transition. It imposes the minimum and maximum timestamps at which GCt will be evaluated to decide the firing of t. If GCt(SΔ[xin,x]) = false ∀x ∈ [xin, xin + Ό(t)], then ToC returns −1.
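A direct, discretized reading of Definition 4 could be sketched as follows; the sampling step is our own simplification (in practice the candidate values of x would be the record timestamps themselves), and all names are illustrative.

    /** Illustrative time of completion (Definition 4): scans the dynamic
     *  fulfillment interval [xIn, xIn + mu] for the first time the guard holds. */
    public class TimeOfCompletion {
        public interface Guard { boolean holds(double from, double to); } // GC over SΔ[from,to]

        /** Returns the earliest x in [xIn, xIn + mu] where gc holds, or -1 otherwise. */
        public static double toc(Guard gc, double xIn, double mu, double step) {
            for (double x = xIn; x <= xIn + mu; x += step) {
                if (gc.holds(xIn, x)) return x;
            }
            return -1.0;
        }
    }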

3 K* is the set of all possible sets of tokens in the net.



Fig. 3. General scheme for the dynamics of a WF-Net-based OP. In this example, the guard condition GCt is associated with the procedural step "Supervise that foo is greater than 4." The entry of the data log that meets the condition for the first time (and gives the value of xout) is shaded.

Note that the situation may arise where two tokens 〈p, x1〉 and 〈p, x2〉 share the same place p ∈ ‱t, but have different timestamps. As a consequence, we may have ToC(t, x1) ≠ −1 and ToC(t, x2) = −1, i.e., only one of the tokens is valid for the transition guard. In this situation, a classical PN firing could choose the invalid token to be fired. Furthermore, during the transition firing, we want the timestamp of the output token(s) to be updated to the result of the function ToC, so that firings represent the concept of progress in the procedure following. For these reasons, the definition of a firing must be adapted for a WF-Net-based OP.

Definition 5: Let t ∈ T be a transition of a given WF-Net-based OP, let M ∈ M* be a marking of that net, and let xout ∈ [0,∞). Given a set of tokens Îș ∈ K* such that ∀k ∈ Îș: k.p ∈ ‱t ∧ ToC(t, k.x) ≠ −1, a firing is a function T × M* × K* × [0,∞) → M* that produces a marking Mâ€Č = firing(t, M, Îș, xout), where

Mâ€Č(p) = M(p) \ Îș                  if p ∈ ‱t \ t‱
Mâ€Č(p) = M(p) âˆȘ {〈p, xout〉}        if p ∈ t‱ \ ‱t
Mâ€Č(p) = (M(p) \ Îș) âˆȘ {〈p, xout〉}  if p ∈ ‱t ∩ t‱
Mâ€Č(p) = M(p)                      otherwise.

The third case of the above definition covers the situations where a place is both an input and an output of the fired transition (loops). Note that the firing is associated with a set of tokens instead of a single one. In this way, we cover the case of AND-join transitions, where there is more than one input place and thus the firing must consume one token from each of them. Also note that, although we use the concept of time as part of the tokens, we are not violating the atomicity property of transition firings from classical PNs, i.e., no time elapses between the enabling and the firing of a transition.
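Under the same simplified data structures, Definition 5 can be grounded with the following sketch; it covers the loop case p ∈ ‱t ∩ t‱ implicitly, because the consumed tokens are removed before the fresh output token is added. Again, the types and names are ours.

    import java.util.*;

    /** Illustrative firing (Definition 5): consumes the token set Îș from the
     *  input places and adds one token 〈p, xOut〉 to every output place. */
    public class Firing {
        /** A token 〈p, x〉: its place and its timestamp. */
        public static class Token {
            final String place; final double time;
            Token(String place, double time) { this.place = place; this.time = time; }
        }

        public static void fire(Map<String, Set<Token>> marking,
                                Set<String> inputPlaces, Set<String> outputPlaces,
                                Set<Token> kappa, double xOut) {
            // p ∈ ‱t: remove the consumed tokens (only those actually present are removed).
            for (String p : inputPlaces) marking.get(p).removeAll(kappa);
            // p ∈ t‱: produce one fresh token carrying the new timestamp xOut.
            for (String p : outputPlaces) marking.get(p).add(new Token(p, xOut));
        }
    }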

Fig. 3 represents a general scheme for the dynamics of a WF-Net-based OP with the concepts explained above. Once the transition t is enabled (i.e., there is at least one token in every input place of t), we compute xout = ToC(t, xin). If xout ≠ −1, the transition may fire, consuming the input token and producing the corresponding output tokens with a timestamp equal to xout (xout ≄ xin). Otherwise, the transition is not fired and the token remains in the input place, because the log state does not meet the GC of the transition in the interval [xin, xin + Ό(t)].

Given Definitions 3 and 4, we can formally describe the dynamic behavior of every type of procedural step in an OP.

1) Actions comprise one transition, which must be fired only if the action was performed within the associated maximum step duration (Ό). If the action is found in the data log within that interval, the transition is fired, and the ToC sets the timestamp of the output token to a value xout ∈ [xin, xin + Ό], indicating the moment at which the action was completed.

2) Checks comprise two transitions, one controlling the behavior of the OP in those cases where the condition to check has been fulfilled, and the other governing the contingency route for the cases where the condition has not been fulfilled (see Fig. 2). Let GCt be the GC for the "fulfilled" transition; the corresponding GC for the "!fulfilled" transition is the complement of GCt, i.e., a function that returns !GCt for every input. As a consequence, those transitions become mutually exclusive, resulting in an XOR-split block. Checks are considered to be instantaneous, thus Ό = 0 (the dynamic fulfillment interval becomes a point), and the ToCs of both transitions do not add time to the output tokens (xout = xin).

3) Supervisions combine two mutually exclusive transitions with complementary GCs, as in the case of checks (XOR-split block). Tokens are evaluated in a dynamic interval [xin, xin + Ό]. If the supervision is fulfilled, the "fulfilled" transition will be fired, setting a new timestamp for the token, xout, indicating the exact moment at which the supervision was fulfilled. Otherwise, the "!fulfilled" transition will fire, with xout = xin, because the conditions of the two transitions are complementary.

4) AND-split and AND-join transitions, also called invisible transitions because they are not directly related to a procedural step, have a trivial GC, i.e., GCt always returns true. Thus, they do not add time to the input tokens during the firing (ToC(t, xin) = xin).

Summarizing, based on the definitions provided in this section, a WF-Net-based OP is formally expressed as the tuple (P, T, I, O, G, Ό, pi, pe), where Ό : T → [0,∞) maps every transition to the associated maximum step duration, and G : T → GC* does the same with the GCs [GC* is the space of GCs (see Definition 3)].

One common issue to address when defining a workflow process is the formal verification of the soundness of the workflow. A sound workflow requires that, at the moment it terminates, there is a token in the end place and all the other places are empty. Since a WF-Net-based OP is meant to be used as an evaluator, soundness is not desired; instead, we want the process to be able to finish in different markings, so that different evaluations can be assigned based on the final marking. Formally, a WF-Net is sound if and only if the corresponding extended WF-Net, which connects the end place to the initial place, meets the following properties [19].

1) It is live, which means that for every reachable marking Mâ€Č and every transition t, there is a marking Mâ€Čâ€Č reachable from Mâ€Č which enables t.
2) It is bounded, which means that for each place p there is a natural number n such that for every reachable state the number of tokens in p is less than n.

In a WF-Net-based OP, we might have one or many concurrent containers holding check or supervision substeps. In such cases, there is the possibility that some of them are fulfilled and some others are not. Thus, the AND-join transition that concludes the fulfilled branch of the concurrent container would not be fired, leaving some tokens (the ones associated with the fulfilled substeps) blocked. Since in the extended net the end place is connected to the initial place, this in turn can result in a sink of tokens. Thus, such a WF-Net-based OP would not be bounded. Regarding the liveness property, it is easy to see that the condition of an action step can be fulfilled for a given token in a specific dynamic fulfillment interval, but when the token comes back from the end place to the initial place, it may have a higher timestamp, and the same condition may not be fulfilled, blocking the token and transforming some markings that were reachable before into unreachable ones. Thus, a WF-Net-based OP is not live, and hence it is not sound either.

IV. WF-NET-BASED AUTOMATIC PROCEDURE FOLLOWING EVALUATION (WF-NET APFE)

Although the idea of modeling an OP has many advantages and applications, the OP dynamics defined in this paper are focused on the application of the model to analyze the procedure following level of an operator. To the best of our knowledge, this concept, which can be named APFE, is new, and only a few works have addressed the topic with other techniques, such as sequence alignment methods [20].

Algorithm 1 summarizes the process of APFE: given a WF-Net-based OP with all its attributes (OP), a data log (Δ), and a start time at which we want to begin the evaluation (x0), a single token 〈OP.pi, x0〉 is created in the initial place of the OP (recall that this place always exists in a WF-Net). This token comprises the initial marking of the OP, namely M. During the execution of the algorithm, we will need to keep track of the invalid conditions. An invalid condition is a pair 〈t, k〉 ∈ T × K such that ToC(t, k.x) = −1, i.e., such that the token k does not meet the GC of the transition t. We denote by IC the set of tracked invalid conditions. At the beginning of the algorithm, IC is initialized to the empty set.

After the initialization, the dynamics of the WF-Net-based OP are executed in a loop (see line 5) while the following conditions are met.

1) There is at least one enabled transition in the current marking of the net, i.e., ∃t ∈ T | ∀p ∈ ‱t, M(p) ≠ ∅.
2) For every input place of t, there is at least one token that is not part of a tracked invalid condition, i.e., ∀p ∈ ‱t ∃k ∈ M(p) | 〈t, k〉 ∉ IC.

Algorithm 1: Workflow Net (WF-Net)-based Automatic Procedure Following Evaluation (WF-Net APFE).

Input: OP = (P, T, I, O, G, Ό, pi, pe) is an operating procedure modelled as a WF-Net. Δ = (V, U, R) is the data log of the operation. x0 represents the start time for the evaluation.
Output: A tuple containing the following elements: a boolean value indicating whether the procedure has been successfully followed or not, the final marking of the Petri Net at the end of the process, and the time spent in the PFE.

1:  function APFE(OP, Δ, x0)
2:    M ← {〈OP.pi, x0〉}
3:    OP.setMarking(M)
4:    IC ← ∅                                        â–· Invalid conditions
5:    while (∃t ∈ enabledTransitions(OP) | ∀p ∈ ‱t ∃k ∈ M(p) | 〈t, k〉 ∉ IC) do
6:      Îș ← ∅                                       â–· Valid tokens
7:      Îł ← ∅                                       â–· Times of completion
8:      for all p ∈ ‱t do
9:        k ← arg max {kâ€Č.x | kâ€Č ∈ M(p), 〈kâ€Č, t〉 ∉ IC}
10:       Îș ← Îș âˆȘ {k}
11:       Îł ← Îł âˆȘ {〈k, ToC(t, k.x)〉}                â–· See Def. 4
12:     end for
13:     if ∄k ∈ Îș | 〈k, −1〉 ∈ Îł then
14:       xout ← max {x | 〈k, x〉 ∈ Îł}
15:       Mâ€Č ← firing(t, M, Îș, xout)                 â–· See Def. 5
16:       M ← Mâ€Č
17:     else
18:       IC ← IC âˆȘ {〈t, k〉 | k ∈ Îș, 〈k, −1〉 ∈ Îł}
19:     end if
20:   end while
21:   Mf ← OP.getMarking()
22:   time ← max {k.x − x0 | k ∈ Mf}
23:   if |Mf(OP.pe)| > 0 then
24:     return 〈true, Mf, time〉
25:   else
26:     return 〈false, Mf, time〉
27:   end if
28: end function


For each iteration of the loop, the algorithm tries to fire a transition. A set of tokens to be consumed, namely Îș ⊂ M(‱t), is created by adding, from each input place, the "valid" token with the maximum timestamp. This rule is applied in order to avoid the ambiguity of the process when selecting input tokens that share the same place. Then, the time of completion (ToC) is computed for all the selected input tokens. The dynamic fulfillment interval [x, x + Ό(t)] used in the ToC depends on the step type to which the transition belongs (see Fig. 2). The ToC searches the data log (Δ) to determine whether the GC associated with the transition is fulfilled between x and x + Ό(t), and based on this it decides both the firing of the transition and the timestamp of the output token(s). In case the guard condition is met for every selected input token, i.e., in case the times of completion of all of them are different from −1, the transition is fired, consuming the tokens from Îș and producing the output tokens properly. The timestamp of the output tokens (xout) is set to the maximum ToC. Again, this rule is applied in order to be deterministic in the cases where |Îș| > 1, which, in the context of a WF-Net-based OP, only happens at an AND-join transition at the end of concurrent containers. Here, the input tokens represent the response of the operator in each of the branches of the container, and thus keeping the maximum timestamp among them is the correct way to track the overall procedure following time. In case some of the selected input tokens k ∈ Îș do not meet the GC of the transition, the corresponding pairs 〈t, k〉 are added to the set of invalid conditions IC.

When the above loop ends, the net reaches a marking Mf in which no transition can be fired (note that a transition can be enabled but not fired). The maximum timestamp of a token in Mf allows us to compute the time spent in the procedure following (see line 22 of Algorithm 1), and the rest of the information contained in Mf is used to judge whether the OP has been successfully completed or not.

1) If Mf contains a token placed in the end place (pe) of the net (see Algorithm 1, line 23), then we consider that the operator has followed the OP successfully.

2) Otherwise, if the initial token has been locked in some intermediate place of the net, it means that some procedural step in the OP has not been performed correctly. Nevertheless, the algorithm returns the information of the marking Mf so that we can analyze which steps are causing trouble for the operators.

V. CASE STUDY: EMERGENCY OP FOR THE "ENGINE BAY OVERHEATING" ALERT IN AN UNMANNED AIRCRAFT SYSTEM (UAS)

The use of UASs and their related technologies has become a hot topic in the last few years, from both the research and the industrial perspectives. From an industrial perspective, these technologies have proven to provide different complex applications in real and heterogeneous domains, such as coastal monitoring, traffic management, and agriculture, among others [21]. From the research point of view, UAS operations have been extensively studied for many years in the field of aeronautics and cooperative control [22], though lately more and more computer science fields are emerging on the scene, developing complex algorithms for mission planning [23] and for extracting behavioral patterns among operators [24], [25], to mention a few examples.

Typically, the whole system involved in an operation with a UAV is known as an unmanned aircraft system (UAS), and it is mainly composed of the UAV itself and a ground control station (GCS), from which a single operator (or a crew) remotely supervises and controls the route and the progress of the UAV in a specific mission. Due to the high costs involved in any mission of this kind, every critical step or possible failure is controlled by the guidelines of detailed action checklists, as happens in manned operations.

Fig. 4. Description of the OP for the "Engine Bay Overheating" alert, as described in an Airbus-based checklist document for UAV operations.4


In order to put into practice the theoretical aspects described in this paper, we will study how to model and evaluate the response of an operator in a UAS while following an OP designed to face the alert "Engine Bay Overheating." This alert is fired when the temperature inside the engine bay of a UAV exceeds the equipment's operative limit. The risk of damage or loss of equipment creates a serious danger, and thus, in case the operator is unable to reduce the temperature, he/she must act immediately, aborting the mission and landing the UAV.

A. Creating the WF-Net for the "Engine Bay Overheating" OP

In Fig. 4, a snapshot of an OP document related to the alert "Engine Bay Overheating" is shown. There are some important aspects to consider in this document.

1) The different checks comprising the first procedural step are carried out in parallel. These checks are considered to be performed visually and instantaneously.

2) If any of the checks comprising the first procedural step is not fulfilled, it is considered that the alert is not real, i.e., it is spurious, and the operator should not continue with the next steps of the OP.

Fig. 5 shows a graphical representation of the WF-Net that results from applying the modeling techniques proposed in this paper to the "Engine Bay Overheating" OP shown in Fig. 4. The type and maximum step duration (Ό) of every step are given in Table I. Based on this, each procedural step has been expressed as a set of places, transitions, and arcs, following the rules summarized in Fig. 2.

4 The contents of the OP shown here have been adapted from a real checklist used by Airbus Defence and Space Company. However, due to a confidentiality agreement with this company, all details and sensitive information have been removed.



Fig. 5. Petri Net-based workflow model for the "Engine Bay Overheating" OP shown in Fig. 4.

To allow a better understanding of the model, transitions are labeled according to the name of the step to which they belong, prefixed by the step type (e.g., CHECK:S_1.2) and suffixed by the labels "FULFILLED" or "NOT FULFILLED" for check and supervision steps. Some details related to the modeling process are remarked below.

1) Step 1 is modeled as a concurrent container, due to the parallel nature of the substeps contained in it. Thus, an AND-split/AND-join routing block is added and linked to the fulfilled transition of every substep, ensuring the fulfillment of all of them before continuing. Likewise, the place pe acts not only as an end place, but also as an OR-join routing block, since it gathers the nonfulfillment of any of the substeps.

2) In Step 2, it is mandatory to follow the order of the substeps; thus, it is modeled as a sequential container.

3) Step 3, "Supervise that EBO caution disappears ...", makes the operator wait for a certain period of time until the condition is fulfilled. This is modeled as a supervision step, whose associated maximum step duration (Ό) establishes the maximum waiting time.

The data logs used in this experimentation have been extracted using the message formats and variables described in the NATO Standardization Agreement 4586 (STANAG 4586) [26], which record the state of the UAV and the GCS periodically during a mission. The data log will be accessed by the GCs of every transition in the model, in order to route the token throughout the workflow. The formal definition of the GCs in terms of entries of the data log is shown in Table I.

B. Testing the APFE Algorithm in the "Engine Bay Overheating" Model

In order to test the model created in the previous section, and to provide a better understanding of how the APFE algorithm (shown in Section IV) actually works in practice, some test cases have been developed. These test cases have been designed by modifying not only the conditions in which the alert "Engine Bay Overheating" is triggered, but also looking for different responses from the operators.

Table II shows the different test scenarios defined for this case study. Each of them simulates a possible combination of fulfillment (F) and nonfulfillment (!F) of the different checks and supervisions found in the Engine Bay Overheating OP, in order to cover every possible way of facing the alert. The most interesting scenarios in terms of evaluating the operator response are TS1 and TS5, because they are the only scenarios where the two initial check steps (Steps 1.1 and 1.2) are fulfilled, and thus the entire Step 1 is completed and we can consider that the alert is real, so the operator needs to continue the procedure. The rest of the cases represent spurious, i.e., false, alerts.

On the other hand, the actions performed by a battery of test operators in order to face the alert are recorded in Table III. Each cell represents the time, measured in seconds since the beginning of the alert, at which the operator completed the action; in case the operator did not perform the action, a "-" symbol is shown.

With regard to the parameter tuning involved in this case study, the start time for the APFE algorithm, x0, must be set. For the sake of simplicity, it is set to the time when the alert was triggered. Regarding the implementation details: the design of the model shown in Fig. 5, the coding of the dynamics and the GCs of every transition, and the APFE algorithm itself have been developed in Java 8, based on the PN simulation engine PetriNetSim.5 The reason for using this tool is that it provides code generation capabilities, so we can draw a WF-Net and then translate it into a Java class, where it is easier to implement the GCs shown in Table I in terms of a data log. This way, the WF-Net becomes part of a bigger evaluation tool, which fits our needs, since we are not interested in simulating the WF-Net, but in using it externally to evaluate the operator response. The source code is available on GitHub.6

The results of running the APFE algorithm for each test case are shown in Table IV. There are a total of 80 test cases, combining the test scenarios shown in Table II with the test operators of Table III. Based on these results, we can extract some remarkable conclusions about this case study.

1) Regarding the test cases run in a real-alert test scenario (i.e., TS1 and TS5), only one test operator (TO8 in the case of TS1, and TO10 in TS5) was well evaluated by the algorithm, which makes sense because they are the only operators that performed all the necessary actions for that scenario. However, since the algorithm also returns the last marking of the PN, we can analyze which is the first missing action that causes an operator to fail in the procedure following. As an example, for the operators TO1, TO3, TO4, TO5, and TO9, the algorithm returns 1p2 as the last state, which means that they are obviously missing Action 2 of the procedure. That information could be used by an instructor to improve the performance of these operators during a training session.

5 PetriNetSim: https://github.com/zamzam/PetriNetSim.
6 Source code of this work: https://github.com/lordbitin/TII-2017.



TABLE I
INFORMATION FOR MODELLING THE OP "ENGINE BAY OVERHEATING"

Step      Type                  Ό   Guard Condition
Step 1    Concurrent container  –   –
Step 1.1  Check                 0   GC(Δ) = true if ∃x ∈ [0,∞) | 〈x, 58501.07, 0〉 ∈ Δ ∧ 〈x, 58501.15, 0〉 ∈ Δ; false otherwise
Step 1.2  Check                 0   GC(Δ) = true if ∃x ∈ [0,∞), d ∈ R | d ≄ 70 ∧ 〈x, 55253.02, d〉 ∈ Δ; false otherwise
Step 2    Action                15  GC(Δ) = true if ∃〈x, 2002.43.05, d〉, 〈xâ€Č, 2002.43.05, dâ€Č〉 ∈ Δ | d ≠ dâ€Č; false otherwise
Step 3    Supervision           10  GC(Δ) = true if ∃x ∈ [0,∞) | 〈x, 55254.04, 3〉 ∈ Δ; false otherwise
Step 4    Sequential container  –   –
Step 4.1  Action                15  GC(Δ) = true if ∃x ∈ [0,∞) | 〈x, 2002.43.15, 4〉 ∈ Δ ∧ 〈x, 2001.42.04, 33〉 ∈ Δ; false otherwise
Step 4.2  Action                15  GC(Δ) = true if ∃x ∈ [0,∞) | 〈x, 2001.42.04, 19〉 ∈ Δ; false otherwise
Step 5    Action                15  GC(Δ) = true if ∃x ∈ [0,∞) | 〈x, 2001.42.04, 33〉 ∈ Δ; false otherwise

The variable names follow the format [Message type].[Field number] from STANAG 4586 (NATO Standardization Agreement 4586). All GCs are expressed in terms of a set of log entries Δ ∈ E* (E = [0,∞) × V × D) from a given data log Δ = (V, U, R).
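To show how one of these guards maps onto code, the check of Step 1.2 (some recorded value d ≄ 70 for variable 55253.02) could be written as follows under the DataLog sketch of Section III; the class and method names are hypothetical, not the project's actual code.

    import java.util.List;

    /** Illustrative implementation of the Table I guard for Step 1.2:
     *  true iff some entry of STANAG variable 55253.02 carries a value d ≄ 70. */
    public class Step12Guard {
        public static boolean evaluate(List<DataLog.Entry> epsilon) {
            return epsilon.stream()
                    .filter(e -> e.variable.equals("55253.02"))
                    .anyMatch(e -> ((Number) e.value).doubleValue() >= 70.0);
        }
    }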

TABLE II
TEST SCENARIOS DEFINED FOR THIS CASE STUDY, WHERE F MEANS "fulfilled" AND !F "!fulfilled"

          TS1  TS2  TS3  TS4  TS5  TS6  TS7  TS8
Step 1.1  F    !F   F    !F   F    !F   F    !F
Step 1.2  F    F    !F   !F   F    F    !F   !F
Step 3    F    F    F    F    !F   !F   !F   !F

TABLE III
TEST OPERATORS DEFINED FOR THIS CASE STUDY

       Action 2  Action 4.1  Action 4.2  Action 5
TO1    –         –           –           –
TO2    10        –           –           –
TO3    –         10          –           –
TO4    –         –           10          –
TO5    –         –           –           15
TO6    10        20          –           –
TO7    10        –           20          –
TO8    10        –           –           25
TO9    –         10          20          –
TO10   10        20          30          –

Each cell contains the ToC of an action. The symbol "–" means that the action has not been performed by the operator.


2) For every test case run in a spurious test scenario (i.e., TS2, TS3, TS4, TS6, TS7, and TS8), the APFE algorithm returns TRUE as the general response and 0 as the time spent to complete the OP, regardless of the test operator. This means that no matter what actions the operator performs, he/she will always be well evaluated with respect to this alert if the alert is not real. Although this may be a problem of the OP itself, which should ask for an explicit "acknowledge" action as a last step, our intention for the future is to improve the APFE algorithm so that it can evaluate when the operator "overreacts" to an alert with unnecessary actions.

VI. RELATED WORK

Once the proposed approach has been described and applied in a case study, it is important to provide a comparison with previous works in the context of PFE. From a general perspective, there is a family of process mining techniques, called conformance checking, that compare the behavior of a business process model with an event log of the same process, and which are comparable to this approach [5], [6]. Additionally, if we focus specifically on procedural behavior in OPs, the main contributions devoted to evaluating it are found in [4] and [20]. Finally, the use of model checking and WfM for analyzing emergency OPs [27], [28] is closely related to this work, but tackles the problem from the point of view of the system itself, not from that of the operator who acts externally on it. Thus, we will not include them in this study.

In order to perform the comparison, we consider the following features.



TABLE IV
RESULTS OF THE APFE ALGORITHM FOR EACH TEST CASE (TEST SCENARIO + TEST OPERATOR) DEVELOPED IN THIS CASE STUDY

       TS1          TS2                TS3                TS4        TS5           TS6                TS7                TS8
TO1    f, 1p2, 0    t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p2, 0     t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO2    f, 1p5, 15   t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p4.1, 10  t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO3    f, 1p2, 0    t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p2, 0     t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO4    f, 1p2, 0    t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p2, 0     t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO5    f, 1p2, 0    t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p2, 0     t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO6    f, 1p5, 15   t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p4.2, 20  t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO7    f, 1p5, 15   t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p4.1, 10  t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO8    t, 1pe, 25*  t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p4.1, 10  t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO9    f, 1p2, 0    t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  f, 1p2, 0     t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0
TO10   f, 1p5, 15   t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0  t, 1pe, 30*   t, 1pe+1paj1.2, 0  t, 1pe+1paj1.1, 0  t, 2pe, 0

Cells marked with an asterisk (*) are the cases where the test scenario represents a real alert and the operator followed the procedure successfully. t and f denote true and false, respectively.

TABLE V
FEATURE COMPARISON BETWEEN PFE APPROACHES

                                Auto.  MS  TS  ST   Ό
EBAT [4]                        ✗      √   ✗   All  ✗
Sequence alignment [20]         √      ✗   ✗   A-C  ✗
Conformance checking [5], [6]   √      √   ✗   A-C  ✗
Proposed                        √      √   √   All  √

1) Automatic (Auto.): The approach can be executed automatically by a computer.

2) Multiscenario (MS): The approach takes into account the different scenarios in which the same procedure can be evaluated.

3) Time spent (TS): The approach records the time spent by the operator in the procedure following.

4) Step types (ST): The approach accepts OPs with Actions ("A"), Checks ("C"), Supervisions ("S"), or all of them ("All").

5) Maximum step duration (Ό): The approach allows us to constrain the maximum time spent in every procedural step.

A summary of the qualitative comparison based on these features is shown in Table V.

1) Dietz et al. describe in [4] an adaptation of the Event-Based Approach for Training (EBAT) framework to evaluate procedural behavior in UASs. Here, an alert in the system is seen as an “event” and the procedural steps are specified as “targeted responses” for that event. The instructor is responsible for manually marking YES/NO/unnecessary for each targeted response, depending on the response of the operator and the current training scenario. Neither timestamps nor maximum step durations are considered.

2) Kim describes in [20] an automatic quantification of the procedure-following level based on sequence alignment techniques. It needs a reference (or “standard”) sequence to compute the similarity between the operator response and that reference, so multiple references would have to be created to perform a multiscenario evaluation, which is not scalable. Furthermore, it only takes into account the order of the operator actions, not the time spent in each of them.

3) Van der Aalst et al. [5] and García-Bañuelos et al. [6] aim at the automatic detection of inconsistencies between a process model and its corresponding event log. The process model is also based on WF-Nets, which allows multiple execution contexts. Only action and check steps are allowed, since they can be expressed as events. In these cases, the event log does not include timestamps to keep track of the conformance time or to constrain the maximum step duration.

4) Our approach describes the procedural behavior as a PN-based workflow, allowing an automatic evaluation of the multiple scenarios and routes of the procedure, and keeping track of the time spent by the operator in each step. Different step types can be modeled thanks to the flexibility of the GCs, and a maximum step duration can be set for every step.
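As a simplified illustration of this last point, the sketch below (Python, with invented names) checks whether one procedural step is fulfilled by a timed data log, given a guard condition and a maximum step duration ÎŒ; it is a sketch of the idea, not the implementation used in the paper.

    # Minimal sketch: evaluating one procedural step of a PN-based workflow
    # against a timed data log. `guard` plays the role of a GC and `mu` the
    # maximum step duration; both names are illustrative.
    from typing import Callable

    LogEntry = dict  # e.g., {"event": "FUEL_TRANSFER", "t": 12.0}

    def step_fulfilled(entries: list[LogEntry],
                       guard: Callable[[LogEntry], bool],
                       t_enabled: float,
                       mu: float) -> bool:
        """True if some log entry satisfies the step's guard within
        mu seconds of the step becoming enabled."""
        return any(guard(e) and 0.0 <= e["t"] - t_enabled <= mu for e in entries)

    entries = [{"event": "FUEL_TRANSFER", "t": 12.0}]
    guard = lambda e: e["event"] == "FUEL_TRANSFER"
    print(step_fulfilled(entries, guard, t_enabled=0.0, mu=30.0))  # True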

VII. CONCLUSION

This paper has provided a new approach for modeling OPs based on PN-based workflows, or WF-Nets, focusing on the use of the model as a tool to allow an APFE. No restriction is imposed on the domain in which this approach can be used. Since the dynamics of a WF-Net-based operating procedure (OP) are based on the presence of a data log, the system in which the OP is executed must provide data-recording capabilities in some way. In other words, the concepts of OP models and APFE are closely linked to the data recorded by the system.

In order to test the proposed modeling methods and the APFE algorithm, an OP from the area of UAV operations has been modeled, specifically, the do-list associated with the alert “Engine Bay Overheating.” Several test cases have been designed to demonstrate the usefulness of the model and to show how the APFE algorithm works under different scenarios and operators.

Several future works and extensions are currently under consideration.

1) In order to allow the easy creation of WF-Net-based OPs, a Domain-Specific Language tool for translating the concepts and relations defined for an OP into the corresponding elements of a WF-Net would be very useful, especially for those domain experts who need to model a large number of OPs in a system.


2) The current APFE algorithm described in this paper could be enhanced by returning all the missing actions and overreactions of the operator.

3) Finally, in order to test the generality of the proposed method, more case studies are needed, comparing models among different domain areas (aviation, healthcare, etc.) in which the shape of the OPs would be different.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the support from Airbus Defence & Space, especially from the Savier Open Innovation project members: José Insenser, Gemma Blasco, Juan Antonio Henríquez, and César Castro. Finally, we would like to thank the reviewers and the Editor-in-Chief for the different suggestions and comments made on this work.

REFERENCES

[1] J. P. Ornato and M. A. Peberdy, “Applying lessons from commercial aviation safety and operations to resuscitation,” Resuscitation, vol. 85, no. 2, pp. 173–176, 2014.

[2] D. R. Urbach, A. Govindarajan, R. Saskin, A. S. Wilton, and N. N. Baxter, “Introduction of surgical safety checklists in Ontario, Canada,” New England J. Med., vol. 370, no. 11, pp. 1029–1038, 2014.

[3] R. Kelly, V. Monnelly, and B. Stenson, “G566(P) Checklists for time-critical equipment failure during patient transport,” Arch. Disease Childhood, vol. 100, pp. A255–A255, 2015.

[4] A. S. Dietz, J. R. Keebler, R. Lyons, E. Salas, and V. C. Ramesh, “Developing unmanned aerial system training: An event-based approach,” in Proc. Human Factors Ergonom. Soc. Annu. Meeting, vol. 57, 2013, pp. 1259–1262.

[5] W. van der Aalst, A. Adriansyah, and B. van Dongen, “Replaying history on process models for conformance checking and performance analysis,” Wiley Interdiscip. Rev.: Data Mining Knowl. Discovery, vol. 2, no. 2, pp. 182–192, 2012.

[6] L. García-Bañuelos, N. van Beest, M. Dumas, M. La Rosa, and W. Mertens, “Complete and interpretable conformance checking of business processes,” IEEE Trans. Softw. Eng., vol. 44, no. 3, pp. 262–290, 2018.

[7] A. F. Arriaga et al., “Simulation-based trial of surgical-crisis checklists,” New England J. Med., vol. 368, no. 3, pp. 246–253, 2013.

[8] H. S. Kramer and F. A. Drews, “Checking the lists: A systematic review of electronic checklist use in health care,” J. Biomed. Informat., vol. 71, pp. S6–S12, 2017.

[9] H. A. Reijers, H. Leopold, and J. Recker, “Towards a science of checklists,” in Proc. 50th Hawaii Int. Conf. Syst. Sci., 2017, pp. 5773–5782.

[10] C. Thongprayoon, A. M. Harrison, J. C. O’Horo, R. A. S. Berrios, B. W. Pickering, and V. Herasevich, “The effect of an electronic checklist on critical care provider workload, errors, and performance,” J. Intensive Care Med., vol. 31, no. 3, pp. 205–212, 2016.

[11] A. Ikram, A. Anjum, and N. Bessis, “A cloud resource management model for the creation and orchestration of social communities,” Simul. Model. Pract. Theory, vol. 50, pp. 130–150, 2015.

[12] A. Pla, P. Gay, J. Meléndez, and B. López, “Petri net-based process monitoring: A workflow management system for process modelling and monitoring,” J. Intell. Manuf., vol. 25, no. 3, pp. 539–554, 2014.

[13] K. Jensen, Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use, vol. 1. New York, NY, USA: Springer, 2013.

[14] J. Liu, E. Pacitti, P. Valduriez, and M. Mattoso, “A survey of data-intensive scientific workflow management,” J. Grid Comput., vol. 13, no. 4, pp. 457–493, 2015.

[15] W. Liu, Y. Du, M. Zhou, and C. Yan, “Transformation of logical workflow nets,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 10, pp. 1401–1412, Oct. 2014.

[16] A. Arkoma, M. HĂ€nninen, K. RantamĂ€ki, J. Kurki, and A. HĂ€mĂ€lĂ€inen, “Statistical analysis of fuel failures in large break loss-of-coolant accident (LBLOCA) in EPR type nuclear power plant,” Nuclear Eng. Des., vol. 285, pp. 1–14, 2014.

[17] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proc. ACM SIGOPS 22nd Symp. Oper. Syst. Principles, 2009, pp. 117–132.

[18] M. de Leoni and W. M. P. van der Aalst, “Data-aware process mining: Discovering decisions in processes using alignments,” in Proc. 28th Annu. ACM Symp. Appl. Comput., 2013, pp. 1454–1461.

[19] R. Klimek, “A system for deduction-based formal verification of workflow-oriented software models,” Int. J. Appl. Math. Comput. Sci., vol. 24, no. 4, pp. 941–956, 2014.

[20] Y. Kim, “Analyzing procedural behaviors of human-machine systems using sequence alignment techniques,” Jpn. J. Ergonom., vol. 49, Suppl., pp. S451–S454, 2013.

[21] I. Colomina and P. Molina, “Unmanned aerial systems for photogrammetry and remote sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 92, pp. 79–97, 2014.

[22] J. Ruiz, A. Viguria, J. Martínez-de Dios, and A. Ollero, “Immersive displays for building spatial knowledge in multi-UAV operations,” in Proc. 2015 IEEE Int. Conf. Unmanned Aircr. Syst., 2015, pp. 1043–1048.

[23] C. Ramírez-Atencia, S. Mostaghim, and D. Camacho, “A knee point based evolutionary multi-objective optimization for mission planning problems,” in Proc. Genetic Evol. Comput. Conf., 2017, pp. 1216–1223.

[24] V. Rodríguez-Fernåndez, A. Gonzålez-Pardo, and D. Camacho, “Modelling behaviour in UAV operations using higher order double chain Markov models,” IEEE Comput. Intell. Mag., vol. 12, no. 4, pp. 28–37, 2017.

[25] V. Rodríguez-Fernåndez, H. D. Menéndez, and D. Camacho, “Analysing temporal performance profiles of UAV operators using time series clustering,” Expert Syst. Appl., vol. 70, pp. 103–118, 2017.

[26] S. Frazzetta and M. Pacino, “A STANAG 4586 oriented approach to UAS navigation,” J. Intell. Robot. Syst., vol. 69, no. 1–4, pp. 21–31, 2013.

[27] C. Liu, Q. Zeng, H. Duan, M. Zhou, F. Lu, and J. Cheng, “E-Net modeling and analysis of emergency response processes constrained by resources and uncertain durations,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 45, no. 1, pp. 84–96, Jan. 2015.

[28] W. Tan and M. Zhou, Business and Scientific Workflows: A Web Service-Oriented Approach, vol. 5. Hoboken, NJ, USA: Wiley, 2013.

Víctor Rodríguez-Fernåndez received the B.Sc. degree in computer science and the B.Sc. degree in mathematics from the Universidad Autónoma de Madrid, Madrid, Spain, in 2013, and the M.Sc. degree in computer science from the Universidad Autónoma de Madrid in 2015. He is currently working toward the Ph.D. degree at the Escuela Politécnica Superior (EPS-UAM), Madrid, Spain.

He is currently involved with the AIDA research group at EPS-UAM. His research interests include conformance checking, behavioral modeling, pattern recognition, and unmanned aerial systems.

Antonio Gonzålez-Pardo (M’15) received the Ph.D. degree in computer science from the Universidad Autónoma de Madrid, Madrid, Spain, in 2014.

He is currently a Lecturer with the Universidad Autónoma de Madrid. His main research interests include computational intelligence (genetic algorithms, PSO, swarm intelligence, etc.) and machine learning techniques. The application domains for his research are constraint satisfaction problems, complex graph-based problems, optimization problems, and video games.

David Camacho (M’14) received the Ph.D. degree in computer science from the Universidad Carlos III de Madrid, Madrid, Spain, in 2001.

He is currently an Associate Professor with the Department of Computer Science, Universidad Autónoma de Madrid, Spain, and the Head of the Applied Intelligence and Data Analysis Group. He has authored or coauthored more than 200 journal, book, and conference papers. His research interests include data mining (clustering), evolutionary computation (GA and GP), multiagent systems and swarm intelligence (ant colonies), automated planning and machine learning, and video games, among others.


Appendix A

Additional resources for the experimental evaluation of time series-aware conformance checking on Longwall mining processes

The process model used in this experimentation represents the first half of the theoretical cycle followed by a longwall mining shearer (see Figure A.1a). This cycle is shown in Figure A.1b (the continuous black line). The numbers in brackets refer to the phases of the process, which follow the sequence: Phase 1 - Phase 2 - Phase 3 - Phase 1 - Phase 4. The last phase is optional, since there are some cases in which the shearer does not reach the end of the excavation according to the model.

Figure A.2 shows a graphical representation of the TSWF-net that results from modelling the aforementioned sequence of phases. As can be seen, every phase has been expressed as a sequence of two consecutive transitions, one identifying the start event and one identifying the end. Since Phase 4 is optional, an OR-block (in fact, a XOR-block) is introduced at place p35, and thus the process has two possible fulfilment paths. All the transitions are assigned a time scope equal to [0, 10000] seconds, so that, on the one hand, each event can be identified just after the previous one has been treated, and, on the other hand, its fulfilment can be checked in a wide interval (note that the whole cycle of the shearer can take several hours).
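For illustration, the two fulfilment paths can be spelled out as transition sequences. The Python sketch below assumes the phase order stated above (with each phase expanded into its [start]/[end] transitions); it is not the actual TSWF-net encoding.

    # Illustrative sketch of the two fulfilment paths of the model in Figure A.2.
    # The flat trace representation is an assumption made for this example.
    CORE_PHASES = ["Phase 1", "Phase 2", "Phase 3", "Phase 1"]  # cycle 1-2-3-1(-4)

    def expand(phases: list[str]) -> list[str]:
        """Expand each phase into its start/end transitions."""
        return [f"{p} [{e}]" for p in phases for e in ("start", "end")]

    PATH_WITH_PHASE4 = expand(CORE_PHASES + ["Phase 4"])
    PATH_WITHOUT_PHASE4 = expand(CORE_PHASES) + ["Phase 4 not present"]

    def is_valid_trace(trace: list[str]) -> bool:
        """A trace conforms if it matches one of the two fulfilment paths."""
        return trace in (PATH_WITH_PHASE4, PATH_WITHOUT_PHASE4)

    print(is_valid_trace(PATH_WITHOUT_PHASE4))  # True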

Table A.1: Description of the time series guards of the transitions of the process model “First half cycle of a shearer”.

Transition (Task)   | GTS description
Phase 1 [start]     | Location is less than 40 meters and starts increasing
Phase 1 [end]       | Location is greater than or equal to 40 meters, or stops increasing
Phase 2 [start]     | Location is less than 40 meters and starts being constant
Phase 2 [end]       | Location stops being constant
Phase 3 [start]     | Location is less than 40 meters and starts decreasing
Phase 3 [end]       | Location stops decreasing
Phase 4 [start]     | Location is greater than 40 meters and starts increasing
Phase 4 [end]       | Location stops increasing
Phase 4 not present | Location is less than 40 meters and does not increase

In Table A.1, a description of the ts-guards of every transition comprising the TSWF-net of Figure A.2 is provided, in both a formal and a descriptive way.

[Figure A.1: Context of the experimentation. (a) Longwall shearer. Source: FAMUR S.A. (b) The general model of a shearer (half) cycle, plotting the position in the longwall face (from 0 to max, with marks at 40 m and max-40 m) against time; the numbers in brackets refer to the phases (1) to (4) of the process.]

[Figure A.2: Representation of the process model “First half cycle of a shearer” as a TSWF-net: the transitions Phase 1 to Phase 4 [start]/[end] and “Phase 4 not present” connect the places between pi and pe. Phase 1: Location increases; Phase 2: Location remains constant; Phase 3: Location decreases; Phase 4 (optional): Location highly increases.]

As can be seen, all of them are expressed in terms of changes in the monotonicity of the location of the shearer, in order to match the theoretical cycle shown in Figure A.1b. To the best of our knowledge, this way of modelling guards is new, and it paves the way for the analysis of complex time-dependent process models.
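A minimal sketch of such monotonicity-based guards, written as Python predicates over consecutive values of the “Location” variable (the 3-meter equality threshold mirrors footnote 1 below; everything else is illustrative):

    # Sketch of some ts-guards of Table A.1 as predicates over consecutive
    # location values (d_prev, d_i). EQ_THRESHOLD mirrors footnote 1.
    EQ_THRESHOLD = 3.0  # meters

    def roughly_equal(d_i: float, d_prev: float) -> bool:
        return abs(d_i - d_prev) < EQ_THRESHOLD

    def phase1_start(d_i: float, d_prev: float) -> bool:
        """Location is less than 40 meters and starts increasing."""
        return d_i < 40 and d_i > d_prev

    def phase2_start(d_i: float, d_prev: float) -> bool:
        """Location is less than 40 meters and starts being constant."""
        return d_i < 40 and roughly_equal(d_i, d_prev)

    print(phase1_start(25.0, 20.0))  # True: below 40 m and increasing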

It is important to observe that, as noted above, the mining process itself is very variable. Thus, the information extracted from it is usually noisy and has missing data, and it is unlikely that an observed cycle of the shearer looks as neat as the curve of Figure A.1b. As a consequence, we may have a case where a ts-guard is fulfilled (e.g., Phase 1 [start]) but only for a singular and non-representative set of log entries, which may lead to wrong conformance evaluations and to an excess of sensitivity in the proposed algorithms. To prevent this, the conditions of the ts-guards shown in Table A.1 need to be complemented by constraints on the following parameters in order to be considered fulfilled:

‱ Minimum fulfilment duration (ÎŽ ∈ X): it defines the minimum length of an interval in which every subset of log entries must fulfil the condition of the ts-guard.


Table A.2: Parameter tuning for the experiments.

Context              | Parameter                                             | Value
Time series log      | Index space (X)                                       | N
                     | Variable set (V)                                      | {“Location”}
                     | Variable domain (U)                                   | U(“Location”) = R
TSWF-net             | Time scope                                            | [0, 100000] (s)
ts-guards            | Minimum fulfilment duration (ÎŽ)                       | 60 (s)
                     | Minimum fulfilment duration (ÎŽ) (Phase 4 [end])       | 180 (s)
                     | Minimum fulfilment duration (ÎŽ) (Phase 4 not present) | 900 (s)
                     | Maximum unfulfilment ratio (ρ)                        | 0.05
Conformance checking | Reversing time (R)                                    | 3600 (s)

‱ Maximum unfulfilment ratio (ρ ∈ [0, 1]): it defines the maximum allowed ratio of unfulfilment subintervals within a wider interval of analysis.

The parameters defined above add a layer of constraints on top of the nominal ts-guards of Table A.3. Formally, in this experimentation a ts-guard GTS is considered fulfilled if and only if the following condition holdsÂČ:

\[
G_{TS}(\varepsilon) = \text{true} \;\wedge\; \exists\, \varepsilon' \subset \varepsilon :\;
\big(\text{interval}(\varepsilon') \geq \delta\big) \;\wedge\;
\frac{1}{\delta} \sum_{\substack{\varepsilon'' \subset \varepsilon' \\ G_{TS}(\varepsilon'') = \text{false}}} \text{interval}(\varepsilon'') \;\leq\; \rho
\tag{A.1}
\]

A summary of the parameter tuning for this experimentation is presented in Table A.2, including the parameters discussed above and other related values, such as the reversing time for the evaluation of time mismatch events. Note that the values of ÎŽ for the ts-guards of Phase 4 [end] and Phase 4 not present are greater than the default value, since these are usually longer phases, more prone to errors and data noise. Note also that the value of R is not very high, in order to avoid situations where moving too far backward in time leads to a de-contextualized evaluation.

Âč Two location values di and di−1 are considered equal if their difference is less than a threshold value, which is set to 3 meters in this experiment.

ÂČ In the expression, interval(Δ) refers to the time between the last and first log entries in Δ.
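A hedged Python sketch of condition (A.1), which scans contiguous subintervals as an approximation of the existential quantifier over Δâ€Č; the entry format and guard signature follow Table A.3, and the parameter values come from Table A.2.

    # Sketch of the fulfilment condition (A.1): a ts-guard counts as fulfilled
    # only on some subinterval of length >= delta whose unfulfilment ratio is
    # at most rho. Entries are (time index x_i, location d_i) pairs.
    from typing import Callable, List, Tuple

    Entry = Tuple[float, float]

    def fulfilled(entries: List[Entry],
                  guard: Callable[[float, float], bool],
                  delta: float, rho: float) -> bool:
        for i in range(len(entries)):
            for j in range(i + 1, len(entries)):
                if entries[j][0] - entries[i][0] < delta:  # interval(eps') < delta
                    continue
                # time covered by consecutive pairs that violate the guard
                bad = sum(entries[k + 1][0] - entries[k][0]
                          for k in range(i, j)
                          if not guard(entries[k + 1][1], entries[k][1]))
                if bad / delta <= rho:
                    return True
        return False

    # Example with delta = 60 s and rho = 0.05 (values from Table A.2).
    entries = [(t, 10.0 + 0.1 * t) for t in range(0, 120, 10)]  # increasing
    increasing = lambda d_i, d_prev: d_i > d_prev
    print(fulfilled(entries, increasing, delta=60.0, rho=0.05))  # True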


Table A.3: Expressions of the time series guards of the transitions of the process model “First half cycle of a shearer”. Guards are expressed in terms of a set of log entries Δ ∈ E* (E = X × V × D) from a specific case of a time series log L = (V, U, X, Y). For simplicity's sake, we assume that the set of log entries is sorted by time index, i.e., Δ can be expressed as Δ = {ei = ⟹xi, vi, di⟩}i∈N, where xi is the time index of the i-th log entry (xi > xi−1 ∀i), vi the variable name, and di the data value.

For every transition of the model, the guard is defined over two consecutive entries of the “LOCATION” variable. That is, GTS(Δ) = true if there exist ⟹xi, v, di⟩, ⟹xi−1, v, di−1⟩ ∈ Δ with v = “LOCATION” satisfying the condition listed below, and GTS(Δ) = false otherwise:

Transition (Task)   | Condition on (di−1, di)
Phase 1 [start]     | di < 40 ∧ di > di−1
Phase 1 [end]       | di ≄ 40 √ di ≀ di−1
Phase 2 [start]     | di < 40 ∧ di ≈ di−1 Âč
Phase 2 [end]       | di ≠ di−1
Phase 3 [start]     | di < 40 ∧ di < di−1
Phase 3 [end]       | di ≄ di−1
Phase 4 [start]     | di > 40 ∧ di > di−1
Phase 4 [end]       | di ≀ di−1
Phase 4 not present | di < 40 ∧ di ≀ di−1