understanding occupants’ behaviour, engagement, emotion
TRANSCRIPT
Understanding occupants’ behaviour, engagement,1
emotion, and comfort indoors with heterogeneous2
sensors and wearables3
Nan Gao1, Max Marschall2, Jane Burry3, Simon Watkins4, and Flora D. Salim14
1RMIT University, School of Computing Technologies, Melbourne, 3000, Australia5
2RMIT University, School of Architecture and Urban Design, Melbourne, 3000, Australia6
3Swinburne University of Technology, School of Design, Melbourne, 3122, Australia7
4RMIT University, School of Engineering, Melbourne, 3000, Australia8
ABSTRACT9
We conducted a field study at a K-12 private school in the suburbs of Melbourne, Australia. The data capture contained twoelements: First, a 5-month longitudinal field study In-Gauge using two outdoor weather stations, as well as indoor weatherstations in 17 classrooms and temperature sensors on the vents of occupant-controlled room air-conditioners; these werecollated into individual datasets for each classroom at a 5-minute logging frequency, including additional data on occupantpresence. The dataset was used to derive predictive models of how occupants operate room air-conditioning units. Second,we tracked 23 students and 6 teachers in a 4-week cross-sectional study En-Gage, using wearable sensors to log physiologicaldata, as well as daily surveys to query the occupants’ thermal comfort, learning engagement, emotions and seating behaviours.This is the first publicly available dataset studying the daily behaviours and engagement of high school students usingheterogeneous methods. The combined data could be used to analyse the relationships between indoor climates and mentalstates of school students.
10
Background & Summary11
How can indoor spaces be designed in ways that increase occupant well-being while decreasing energy consumption? Answering12
this question requires a holistic understanding of indoor climates, occupant comfort and behaviour, as well as the dynamic13
relationships between these different aspects. The present study sits within a context of research that aims to gain insights14
by examining these themes using mixed methods of data capture within operational buildings. More specifically, the study15
contains two separate assays, each relating to a distinct body of existing research.16
The first assay is a 5-month longitudinal field study using outdoor and indoor weather stations as well as sensors to17
determine the use of occupant-controlled room air-conditioners. This assay was undertaken to contribute knowledge to the18
research field of occupant behaviour modelling in building performance simulation. During the design of buildings, engineers19
often use simulations to predict the indoor environmental quality and energy consumption of design options in order to inform20
decision-making. There are often large discrepancies between simulated and actual building performance1. One of the main21
factors driving this so-called ’performance gap’ is the current misrepresentation of occupant behaviour in the simulations2. The22
software is accurate at modelling deterministic systems like automated air-conditioning units that are governed by set point23
temperatures, but incapable of accurately modelling the probabilistic nature of human behaviour, for example, the manual24
operation of air-conditioners. Occupant behaviour tends to be modelled on simplistic, rule-of-thumb assumptions that are not25
backed by data3, usually by using the same set point approaches that are applied to automated systems (e.g. occupant switches26
on the air-conditioner when the indoor temperature exceeds 24 °C). Actual human behaviour is less responsive and more varied;27
thus, researchers have conducted field studies in operational buildings, by measuring various environmental and other variables28
alongside an observed behaviour (for example, the operation of air-conditioners, windows, lights, fans, etc.). They use this29
data to derive statistical models of the observed behaviour based on one or several of the observed independent variables4–6.30
The first assay of our study contributes data towards this endeavour, specifically enabling the creation of predictive models of31
occupants’ use of room air-conditioners in schools.32
The second assay is a 4-week cross-sectional study tracking 23 students and 6 teachers, using wearable sensors to log33
physiological data, as well as daily surveys to query the occupants’ thermal comfort, learning engagement, seating positions34
and emotions while at school. Buildings contribute about a third of world energy consumption, which is mainly due to indoor35
climate regulation using heating, ventilation and air-conditioning (HVAC) systems. Since we spend so much energy and effort36
on providing adequate environments to building occupants, it is worth investigating what exactly constitutes their comfort and37
Name Year Par Type Modalities Annotations Duration ScenarioDriving-stress9 2005 24 Field ECG, EDA, EMG, RESP Stress level >50 mins Real-world driving tasks
DEAP10 2011 32 LabVideos, EEG, EDA, BVP,RESP, ST, EMG and EOG
Arousal, valence, like/dislike, dominance,familarity
40 mins Watch music videos
Driving-work11 2013 10 Field EDA, HR, TEMP Mental workload 30 mins Drive a predefined routeStudentLife12 2014 48 Field Smartphone Stress, mood, happiness 10 weeks Real life, student exams
DECAF13 2015 30 LabECG, EMG, EOG, MEG,near-infrared face, video
Valence, arousal, anddominance
>1 hourWatch music video andmovie clips
Non-EEG14 2016 20 LabACC, EDA, HR, TEMP,SpO2
N/A <1 hoursFour types of stress(physical, emotional,cognitive, none)
Ascertain15 2016 58 LabECG, EDA, EEG, facialfeatures
Arousal, valence,engagement, liking,familarity, personality
90 mins Watch movie clips
Stress-math16 2017 21 Lab ACC, EDA, HR, TEMP Anxiety26 hours(total)
Solve math questionsunder different pressure
WESAD17 2018 15 LabACC, BVP, ECG, EDA,EMG, RESP, TEMP
Affect, anxiety, stress 2 hoursNeutral, amusement andstress conditions
Snake18 2020 23 Lab ACC, BVP, EDA, TEMPCognitive load,personality
>6 minsSmartphone games withthree difficulty levels
CogLoad18 2020 23 Lab ACC, BVP, EDA, TEMPCognitive load,personality
N/A 6 cognition load tasks
K-EmoCon19 2020 32 LabVideos, audio, ACC,EDA, EEG, ECG, BVP,TEMP
Arousal, valence,stress, affect
173 mins(total)
Social interactionscenario involvingtwo people
En-Gage 2021 29 FieldACC, EDA, BVP, TEMP,In. TEMP, HUMID., CO2,NOISE
Cognitive, behavioral,emotion engagement,thermal comfort, arousal,valence
4 weeks(1416 hoursin total)
Real-world courses ina high school
Table 1. Publicly available datasets in affective computing area
well-being. The above-mentioned building performance simulations tend to define comfort either by using deemed-to-satisfy38
temperature thresholds or by using comfort models, most commonly the predicted mean vote (PMV) model. However, the39
PMV model has not been updated since it was derived from laboratory experiments in the 1960s. It has been criticised for its40
poor predictive performance in real-world contexts7 and does not appear to apply for all age groups8. Furthermore, thermal41
acceptance is clearly only one of several metrics for assessing indoor well-being.42
On the other hand, studying students’ learning engagement, emotions, and daily behaviours has attracted increasing interests43
to address problems such as low academic performance and disaffection. Sensor-based physiological and behaviour recordings44
provide great opportunities to unobtrusively measure students’ behaviours and emotional changes in classroom settings20, 21.45
In previous studies, various physiological signals such as electrodermal activity (EDA), heart rate variability (HRV) and46
environmental data have been investigated to assess emotional arousal and engagement level. For example, EDA is generally47
regarded as a good indicator of psychological arousal, which has been increasingly studied in affective computing area, such as48
the detection of engagement20, 21, emotion22, and depression23. Existing datasets in affective computing area provide limited49
scope on understanding emotion responses in real-world settings. Or, they only consider a particular type of annotations to meet50
their research goals (e.g., stress level, mental workload). Table 1 shows how En-Gage dataset is distinguished from existing51
emotion datasets.52
Our second assay is the first publicly available dataset studying the daily behaviours and engagement of high school students53
using heterogeneous methods. Together with the first assay data set, it offers a unique opportunity to analyse the relationships54
between indoor climates and the mental states of school students – not only related to their thermal comfort but also their55
emotions, engagement and productivity while at school. Especially, it’s unusual to combine individual sensor data with building56
environmental data together, to study how indoor and outdoor environments influence the complex occupant behaviours and57
physiological responses, which will benefit building scientists, behaviour psychologists and affective computing researchers.58
2/15
(a) Empatica E4 wristbands (b) Netatmo indoor weather station (c) Classroom for Year 10 students
Figure 1. Devices and environments for collecting wearable and indoor data.
Group Room Participant
FormR1 P13, P14, P15, P16, P17, P18, P19, P20, P21, P22R2 P8, P9, P10, P11, P12, P23R3 P1, P2, P3, P4, P5, P6, P7
MathR1 P2, P4, P5, P10, P11, P14, P18R2 P3, P6, P7, P8, P9, P15, P16, P17, P20R3 P1, P12, P13, P19, P21, P22, P23
Language
R1 P1, P2, P4, P7, P10, P13, P15, P17, P19, P20, P21, P22, P23R2 P9, P14R3 P5, P6, P11, P12, P16R4 P3, P8 P18
Table 2. Distribution of student participants in different class groups.
Methods59
Ethics approval60
The data collection was approved by the Science, Engineering and Health College Human Ethics Advisory Network (SEH61
CHEAN) of RMIT University. SEH CHEAN also reviewed and approved the participant consent form which included62
information on the purpose and procedure of the research, the types of data that were collected, the compensation of the63
involvement, and the protocol of privacy protection and data storage. The project was furthermore approved by the principal of64
the school in which the study was conducted.65
Participants and recruitment66
For the cross-sectional assay in this study, we recruited 23 students (15-17 years old, 13 female and 10 male) in Year 10, and67
6 teachers (33-62 years old, 4 female and 2 male). The recruitment occurred between August and September 2019. First,68
we presented an introduction of the project to students and teachers in the school hall, and handed out information leaflets,69
recruitment letters and consent forms - for students, their guardians and teachers. In the following days, we received the signed70
consent forms from students and teachers who volunteered to participate. The volunteers were then asked to complete an71
online background survey which was accessible through a web page link that we shared with them. In this survey, we collected72
information on the participants’ age, gender, general thermal comfort and students’ classes. The students of year 10 at this73
school were taught in separate class groups. They were separated into 3 "form" groups (for English, Science, Global Politics,74
Physical Education, Health/Sport classes), 3 math groups and 4 language groups (see Table 2). Asking for each individual75
student’s class group in the background survey allowed us to determine which classroom they were in at any given time. Among76
the participating teachers, there were 3 math teachers, 1 English teacher, 1 Japanese teacher, and 1 science teacher.77
As a token of appreciation for their participation, we gave each of the participating students a certificate of participation and78
four movie vouchers - one for each week of successful participation. Participation in this research project was voluntary, and79
we communicated to the participants that they were free to withdraw from the project at any stage.80
3/15
Devices Collected data Sampling rate Time frame
Empatica E4 wristband
3-axis acceleration 32 Hz
4 weeksSkin temperature 4 HzElectrodermal activity 4 HzBlood volume pulse 64 Hz
Netamo indoor weatherstation
Humidity, temperature, noise level, CO2 5 minutes 5.5 months
DigiTech XC0422 outdoorweather station
Temperature, humidity, barometricpressure, wind speed, wind direction,solar radiation, UV, rainfall
5 minutes 5.5 months
PHILIO Z-wave (attachedto air-conditioning vents)
Humidity, temperature 5 minutes 5.5 months
Table 3. Data collected with sensors with respective sampling rate and time.
Figure 2. Data sample showing the indoor ambient temperature, the temperature reading at the air conditioning vent and theinferred air conditioning states.
Experiment setup81
We conducted our study at a mixed-gender K-12 private school.The longitudinal study was conducted for a 5.5-month period82
from 7th October 2019 to 23rd March 2020, using the indoor and outdoor weather stations as well as temperature sensors83
attached to air-conditioning outlets. The cross-sectional study included 4 weeks of data capture: the first two weeks of data84
were collected in early September 2019 (winter in the southern hemisphere), and the second two weeks in November 201985
(spring in the southern hemisphere). Totally, we have collected 1415.56 hours of wearable data from all participants.86
In our data collection, we tracked participants using Empatica E4 1 wristbands measuring physiological data, as well as daily87
surveys to query the occupants’ thermal comfort, learning engagement, and emotions while at school. 1 volunteer student was88
chosen as the representative for each of the three form classes. Their job was to distribute the wristband sensors each morning,89
collect them after school, and remind the participants to complete the online surveys at the appropriate times. We anonymised90
the student’s data by assigning each student an ID. Occupancy schedules were obtained from individual classroom schedules91
provided by the school. These schedules may be used to represent the actual occupancy patterns of the building, although92
slight deviations from the planned schedule are to be expected in a school setting due to sickness and other circumstances. The93
following is a description of the research instruments used in the study.94
DigiTech XC0422. We set up two outdoor weather stations on-site: one in the prevailing NNW windward direction located95
at some distance from the buildings, and one on the SSE leeward side. These logged the data types shown in Table 2 at 5-minute96
intervals via the school’s guest WIFI to WUnderground.com where it can be accessed remotely. Note that these weather stations97
log solar irradiance values in W/m2 but only have a luminosity sensor. The method of conversion from lux to W/m2 is unclear98
from the product’s datasheet, but we assumed that it was in line with a commonly used, simplified conversion rate (e.g. Michael,99
2019)24.100
Netatmo Healthy Home Coach. We collected indoor environmental data using Netatmo Healthy Home Coaches 2 installed101
in 17 classrooms as shown in Figure 1(b) and Figure 1(c). These devices measure indoor temperature, relative humidity, CO2102
and noise levels at a 5-minute logging frequency. The data is uploaded in real-time via the school’s guest WiFi to the Netatmo103
cloud platform from which we could access the data remotely through our Netatmo account login. The analysed classrooms104
1Empatica E4 wristband: https://www.empatica.com/en-int/research/e4/2Netatmo Healthy Home Coach: https://www.netatmo.com/en-eu/aircare/homecoach
4/15
Figure 3. Daily indoor environmental trends by month.
differed from one another in several aspects including the room geometry and orientation, as well as the number and location of105
windows. The placement of environmental sensor devices was therefore determined on a case-by-case basis, with the goal of106
finding an optimal trade-off between several, partly conflicting considerations, most of which were suggested by Wagner et107
al. (2017). For example, we tried placing the sensors close to the occupants but at the same time avoiding the sensors from108
being obstructed, biased or obtrusive due to their proximity to the occupants, furniture, heating elements, vents or appliances.109
ASHRAE Standard 55 recommends temperature sensor heights of 0.1, 0.6 and 1.1 m for ankles, waists and heads of seated110
occupants, respectively. Since in this study only one device per room was installed and the head height of children is lower than111
that of adults, we attempted to place the sensors at about 0.9 m height, while respecting the above considerations as well as112
possible. Three of the classrooms with a Netatmo sensor were classrooms frequented by year 10 students, therefore this data113
could be used in combination with the data captured by the Empatica E4 wristband sensors in the cross-sectional study assay. It114
should be noted that we started using these 3 devices and the outdoor weather stations before beginning the full longitudinal115
field study in all classrooms.116
Philio Temperature/Humidity Sensor. The classrooms had split-system remote-controlled air-conditioning units for117
heating and cooling. We inferred their usage by measuring temperature fluctuations with Philio Temperature/Humidity Sensors118
placed at the outlets of the vents of the remote-controlled room air conditioning units. The sensors logged data via Z-Wave to119
Vera Edge hubs, several of which were placed throughout the school due to their limited range. Data was logged at 5-minute120
intervals using custom LUA scripts via the VeraAlerts app within the Vera SmartHome app, which enabled sending the data to121
the Pushbullet online platform from where they could be accessed remotely.122
Empatica E4 wristband. These wristband sensors (see Figure 1(a)) were first proposed for use in studies by Garbarino et123
al.25. The watch-like devices have multiple sensors: an electrodermal activity (EDA) sensor, a photoplethysmography (PPG)124
sensor, a 3-axis accelerometer (ACC), and an optical thermometer. EDA refers to the constantly fluctuating changes in the125
electrical properties of the skin at 4 Hz, when the level of sweat increases, the conductivity of the skin increases. PPG sensors126
measure the blood volume pulse (BVP) at 64 Hz, from which the inter-beat interval (IBI) and heart rate variability (HRV)127
can be derived. The ACC records 3-axis acceleration in the range of [-2g, 2g] at 32Hz and captures motion-based activity,128
which has been widely used in smartphones, wearables, and other IoT devices26. The optical thermometer reads peripheral skin129
temperature (ST) at 4 Hz. In recording mode, E4 wristbands can store 60 hours of data in memory, with battery lifetimes of130
over 32 hours. They are light-weight, comfortable and water-proof, thus especially suitable for continuous and unobtrusive131
monitoring of the participants in our study. Before the data collection, all wristbands were synchronized with the E4 Manager132
App, using a single laptop to ensure that the internal clocks were accurate. Each student was assigned a wristband sensor133
marked with their unique study ID. The students were asked to wear the wristband on non-dominating hands, and to avoid134
pressing the button or performing any unnecessary movements during class. The teaching participants were only required to135
wear the wristbands while teaching the year 10 classes. Figure 4 shows the average wearable signals per school day for all136
participants from 369 traces.137
Daily surveys. On each school day, student participants were asked to complete online surveys (either through tablets that138
we placed in each of the study classrooms, or using their own digital devices) at 11:00, 13:25, 15:35 (directly after the 2nd ,139
4th, and 5th lesson). The length of the 2nd and 4th lesson was either 40 minutes or 80 minutes depending on the day of the140
week, and the 5th class always lasted 80 minutes. The class schedules at this school had a fortnightly rhythm. That is, week141
5/15
(a) Heart Rate (b) Electrodermal Activity
(c) Skin Temperature (d) 3-axis Acceleration (Magnitude)
Figure 4. Wearable signals per school day for all participants (369 traces in total).
1 and 2 had different timetables, but week 1 and 3 were identical, as were week 2 and 4 etc. The representative student was142
tasked to remind the student participants to complete the online surveys on time. The collected annotations (thermal comfort,143
learning engagement, emotion, etc.) are described in Table 4. Figure 6 shows the distribution of responses for the thermal144
sensation (from -3 to 3), thermal preference and clothing level. The distribution of multi-dimensional (behavioural, emotional145
and cognitive) engagement, and valence/arousal emotion are shown in the Figure 7.146
Data post-processing147
For the longitudinal study, we extracted the environmental data from their respective online platforms and rounded each data148
point’s timestamp to the nearest 5-minute step to enable the aggregation of data from different sources, and interpolated over149
missing data points. The data from the two weather stations were averaged for each time step. In cases where one of the150
stations had missing data, we used the other weather station’s data point. The outdoor wind direction was originally given151
in a 16-step scale of cardinal directions which we converted to numerical angle values in degrees. Within the context of this152
field study, there was no way to directly monitor when the air conditioning units were in use. Instead, we measured their use153
indirectly with the Philio temperature sensors mounted to the air conditioning outlets. Creating an algorithm that reliably154
distinguishes all four event types (cooling switched on, cooling switched off, heating switched on and heating switched off) is a155
Annotation categories Description Measurement scale
Thermal sensation Commonly used ASHRAE thermal sensation27 -3: cold, -2: cool, -1: slightly cool, 0: neutral,1: slightly warm, 2: warm, 3 = hot
Thermal preference Commonly used ASHRAE thermal preference27 Choose one (cooler, no change, warmer)Clothing level Commonly used ASHRAE clothing insulation27 Choose multipleSeating position Seating position in the classroom Click one point
Behavioural/Emotional/Cognitive engagement
Adapted In-class Student EngagementQuestionnaires (ISEQ)28
-2: strongly disagree, -1: somewhat disagree,0: neither agree nor disagree, 1: somewhat agree,2: strongly agree
Arousal/ValenceCommonly used affective dimensions fromthe Photographic Affect Meter (PAM)29 Choose one photo
Confidence level Confidence level of the response1: not confident, 2: slightly confident, 3: moderatelyconfident, 4: very confident, 5: extremely confident
Table 4. Collected annotations from the questionnaires.
6/15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23Participant ID
2
3
4
Ove
rall
Enga
gem
ent S
core
Low
High
Enga
gem
ent L
evel
Figure 5. Distribution of the overall engagement scores for 23 student participants.
task that would have exceeded the scope of our research. Instead, we used threshold values of the temperature slope to predict156
events. If the current state was off, then a sudden rise would be classified as switching on the heating; if the current state was157
cooling, the same rise in temperature would be classified as switching off the cooling. Since this crude method was limited158
in its predictive capability, we relied on a visual assessment of the data and manually overwrote time frames with states that159
appeared to have been incorrectly categorised by the algorithm. This is a potential source for error, but we assumed that the160
assessment was sufficiently accurate for this study - an assumption that proved correct when testing it on site. We aggregated161
all the data into spreadsheets for each classroom individually, and added several data, including columns that identified holiday162
periods, occupancy and time frames of insufficient data coverage.163
In the cross-sectional study, for the wearable data, we converted the timestamps of the wristband sensor readings from164
raw time intervals and Unix time to the local datetime format. Then we categorized the wearable data based on a different165
date. We also extracted the wearable data according to the scheduled length for the 2nd , 4th, 5th class which ends at 11:00,166
13:25, 15:35. For the online survey, we received a total of 488 valid online surveys from students with a response rate of 35.3%.167
We also received 22 online surveys from teachers. We then aligned the survey data to one of the three classes. We set the168
survey responses before 11:25 pm belong to the 2nd class, between 12:15 pm - 14:15 pm belong to the 4th class, after 14:15 pm169
belongs to the 5th class.170
Data Records171
Summary172
We are sharing 2 datasets: one for the longitudinal study and one for the cross-sectional study. For the latter, we have provided173
two versions: the original raw data by date and a dataset based on the different class groups of the participants.174
The longitudinal study dataset consists of comma-separated variable (CSV) files - one for each classroom. Each classroom’s175
spreadsheet contains time-related information and outdoor weather conditions (these are obviously identical for all classrooms).176
Furthermore, each classroom has information on its own indoor climate, whether or not it is occupied according to the class177
schedule, and information on whether its room air-conditioner is in heating or cooling mode.178
3 2 1 0 1 2 3General Thermal Sensation
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
Freq
uenc
y
(a) Thermal Sensation
Cooler No change WarmerThermal Preference
0
50
100
150
200
250
Num
bers
(b) Thermal preference
Shirt JumperJacket Pants Shorts Skirt Dress Other0
100
200
300
400
Num
bers
(c) Clothing level
Figure 6. Distribution of responses related to thermal comfort.
7/15
0 1 2 3 4 5 6Engagement Score
0.0
0.1
0.2
0.3
0.4
0.5
Den
sity
BehaviouralEmotionalCognitive
(a) Multi-dimensional Engagement
1 2 3 4Valence
43
21
Arou
sal
0.02 0.01 0.09 0.05
0.01 0.07 0.04 0.09
0.03 0.04 0.07 0.08
0.04 0.23 0.11 0.030.025
0.050
0.075
0.100
0.125
0.150
0.175
0.200
0.225
(b) Valence and Arousal
Figure 7. Distribution of responses related to the engagement and emotion.
Figure 8. Hourly outdoor climate (averaged between the two weather stations).
The cross-sectional study dataset includes physiological signals measured with the wristband sensors as well as self-reported179
engagement, thermal comfort, seating locations, and emotion data from the student and teacher participants. It also lists the180
outdoor and indoor environment data for the year 10 classrooms.181
Contents182
The In-Gauge and En-Gage datasets30 are available on Figshare (https://doi.org/10.25439/rmt.14578908). In the following, we183
describe the directories and files in our datasets.184
Longitudinal185
This folder contains all data pertaining to the longitudinal field study. It consists of a TXT file describing the dataset and 16186
CSV files - one for each classroom. The CSV file names correspond to the classroom names. Each CSV file has a single header187
line and each of the following rows contains the following timestamped data at a resolution of 5 minutes per row:188
• Timestamp: Local Datetime format e.g. ’2019-10-08 18:25:00’.189
• Year: An integer of either 2019 or 2020.190
• Month: An integer between 1 and 12.191
• DayOfYear: An integer between 1 and 365.192
• Occupied: ’0’ means that the room was not occupied at this time according to the classroom schedule; ’1’ means it was.193
• SchoolDay: ’0’ means that this day was not a school day; ’1’ means it was.194
• Hour: An integer representing the hour od day from 0 to 23.195
8/15
Figure 9. Multiple logistic regression modelling results for switching a room air-conditioner on (left) and off (right), based onindoor and outdoor air temperature.
Units Range Accuracy ResolutionDry Bulb Temperature °C -40 °C – 60 °C ±1 % 0.1 °CDew Point Temperature °C -40 °C – 60 °C ±1 % 0.1 °CRelative Humidity % 1 % - 99 % ±5 % 1%
Wind Speed m/s 0 m/s - 50 m/s±1 m/s (<5 m/s)±10 % (>= 5 m/s)
0.1 m/s
Gust Speed m/s 0 m/s - 50 m/s±1 m/s (<5 m/s)±10 % (>= 5 m/s)
0.1 m/s
Wind Direction ° 0 ° - 360 ° ±22.5 ° 22.5 °
Rainfall mm 0 mm - 9999 mm ±10 %0.3 mm (<1000 mm)1 mm (>= 1000 mm)
Light Lux 0k Lux - 400k Lux ±15 % 0.1 LuxSolar Radiation W/m2 - - -
Table 5. DigiTech XC0422 logging specifications.
• LessonNumber: An integer signifying which lesson is currently taking place (note that each school day started with a196
10-minute assembly referred to here as the ’0’th lesson): ’-1’ = outside of school hours; ’0’ = 8:50-9:00; ’1’ = 9:00-9:40;197
’2’ = 9:40-10:20; ’3’ = 10:20-11:00; ’4’ = 11:25-12:05; ’5’ = 12:05-12:45; ’6’ = 12:45-13:25; ’7’ = 14:15-14:55; ’8’ =198
14:55-15:35; ’9’ = Recess times or special "Breadth Studies" session on Wednesdays.199
• LessonPct: A fraction between 0.0 and 1.0 describing how much of the current lesson has passed.200
• IndoorTemperature: A decimal number representing the current indoor temperature in °C.201
• IndoorHumidity: An integer representing the current indoor relative humidity in %.202
• IndoorCO2: An integer representing the current indoor CO2 concentration in ppm.203
• IndoorNoise: An integer representing the current indoor noise level in dB.204
• OutdoorTemperature: A decimal number representing the current outdoor temperature in °C.205
• OutdoorHumidity: An integer representing the current outdoor relative humidity in %.206
• OutdoorDewpoint: A decimal number representing the current outdoor dewpoint temperature °C.207
• OutdoorWindDirection: An integer representing the current outdoor wind direction in degrees, from 0 to 360 (0° = north208
wind, 90° = east wind, etc.).209
• OutdoorWindSpeed: A decimal number representing the current outdoor wind speed in m/s.210
• OutdoorGustSpeed: A decimal number representing the current outdoor gust speed in m/s.211
• Precipitation: A decimal number representing the current outdoor precipitation in mm.212
• UvLevel: An integer between 0 and 11 representing the current outdoor Global Solar UV Index.213
• SolarRadiation: An integer representing the current outdoor solar radiation intensity in W/m2.214
• CoolingState: ’0’ means that the room air-conditioner was currently not cooling the room; ’1’ means it was.215
• HeatingState: ’0’ means that the room air-conditioner was currently not heating the room; ’1’ means it was.216
• UsabilityMask: For timeframes where too much data was missing, we set this UsabilityMask field to "False" for the217
entire day. During holidays, the UsabilityMask also reads "False".218
9/15
Units Range Accuracy ResolutionDry Bulb Temperature °C 0 °C to 50 °C ± 0,3 °C 0.1 °CRelative Humidity % 0 to 100 % ± 3 % 1 %
CO2 ppm 0 to 5,000 ppm±50 ppm (<1,000 ppm)±5 % (>= 1,000 ppm)
1 ppm
Noise dB 35 dB to 120 dB - 1 dB
Table 6. Netatmo Healthy Home Coach logging specifications.
Figure 10. Simple logistic regression modelling results for switching a room air-conditioner on (first figure) and off (secondfigure), based on indoor air temperature (third figure) or outdoor temperature (fourth figure).
Participant_class_info219
This folder contains demographic information on the background questionnaires participants, and the class table. Note that for220
several survey questions, we adopted the 5-point Likert scale: -2 = ’strongly disagree’, -1 = ’somewhat disagree’, 0 = ’neither221
agree nor disagree’, 1 = ’somewhat agree’ and 2 = ’strongly agree’. The Participant_class_info folder contains the following222
files:223
1. Student.csv. Each row in this file contains a participant ID (Column A), gender (Column B), age in years (Column C),224
form room, math room and language room (Columns D - F), and three background questions (Columns G - K) related to225
their general thermal comfort and engagement in class. Specifically, Columns G to I represent, respectively, the questions226
’What is your general feeling in the classroom?’ [-3 = cold, -2 = cool, -1 = slightly cool, 0 = neutral, 1 = slightly warm, 2227
= warm, 3 = hot], ’When I am engaged in class, I usually don’t feel too hot or too cold’ and ’When I am engaged in class,228
I could get distracted when the room is too hot or too cold’. For the latter 2 questions, we adopted the 5-point Likert229
scale.230
2. Teacher.csv. Each row in this file contains a participant ID (Column A), gender (Column B), age in years (Column C),231
teaching subject (Columns D), and three background questions similar to the student.csv file, except that we changed the232
last two questions slightly from ’When I am engaged in class, [...]’ to ’When I am engaged in teaching, [...]’.233
3. Class_table.csv. We generate this file from the class schedule obtained from the school. Each row in this file contains the234
information of one single class, including the unique class ID (Column A), classroom (Column B), date (Column C),235
start time of the current class (Column D), finish time of the current class (Column E), length of the class (Column F),236
week (Column G), weekday (Column H), the order of the class (Column I) and the course name (Column J). Specifically,237
Column K shows whether students take this class in a form group, where ’0’ indicates they are not in a form group, ’all’238
indicates all students take this class in one whole form group (i.e., Assembly, Chapel), the R1/R2/R3 means students take239
this class in form groups and their form room is R1, R2 or R3.240
Survey241
This folder contains 2 files: Student_survey.csv and Teacher_survey.csv.242
Student_survey.csv contains the 488 survey responses including 15 columns where Column A is participant ID and Column243
B is the recorded time. There are columns containing thermal comfort-related information (Columns C - G), multi-dimensional244
student engagement (Columns H - L), mood (Column M), and confidence level of the survey (Column N). The engagement245
questions were rated using the Likert-scale. To calculate the engagement score, users should reverse the responses in item 2 and246
item 4, then calculate the average of the 5-point Likert scale for each dimension of engagement. The specific columns relate to247
the following questions:248
10/15
Figure 11. Distribution of seating positions across different participants.
• Column C: Thermal_sensation: "How do you feel right now in the classroom?" [-3 = cold, -2 = cool, -1 = slightly cool, 0249
= neutral, 1 = slightly warm, 2 = warm, 3 = hot].250
• Column D: Thermal_preference: "Would you like to be?" [Cooler, No change, Warmer].251
• Column E: Clothing: "What are you wearing now? (multiple options allowed)" [Shirt, Jumper, Jacket, Pants, Shorts,252
Skirt, Dress, Other].253
• Columns F - G: Loc_x, Loc_y: "Where did you sit in the last class? (please click on the floorplan)" [x, y pixels in the254
400*321 room thumbnail where x = y = 0 at the upper left corner].255
• Columns H - L: Engage_1, 2, 3, 4, 5: "Please describe your engagement in the last class": [I paid attention in class], [I256
pretended to participate in class but actually not], [I enjoyed learning new things in class], [I felt discouraged when we257
worked on something], [I asked myself questions to make sure I understood the class content].258
• Columns M - N: Arousal, Valence: "Touch the photo that best captures how you feel right now (optional)" [We assigned259
the arousal and valence values from 1 - 4 to each picture. For instance, for the right bottom picture, valence = 4 and260
arousal = 1].261
• Column N: Confidence_level: "Please rate your confidence level for your answers in this survey (optional)" [5-point262
Likert scales where 1 = Not confident, 2 = Slightly confident, 3 = Moderately confident, 4 = Very confident, 5 = Extremely263
confident].264
Teacher_survey.csv contains the 22 survey responses by the teachers. The file includes 11 columns where Column A is the265
recorded time, Column B is the wristband ID, Columns C - E are the thermal comfort related information, Columns F - G are266
the engagement related information, and Column K is the confidence level of the survey. For the wristband ID in Column B,267
A/B/C/D represent the classrooms R1/R2/R3/R4. The specific columns relate to the following questions:268
• Column B: Wristband_id: "Please enter your wristband ID." [A, B, C, D].269
• Column C: Thermal_sensation: "How do you feel right now in the classroom?" [-3 = cold, -2 = cool, -1 = slightly cool, 0270
= neutral, 1 = slightly warm, 2 = warm, 3 = hot].271
• Column D: Thermal_preference: "Would you like to be?" [Cooler, No change, Warmer]272
• Column E: Clothing: "What are you wearing now? (multiple options allowed)" [Shirt, Jumper, Jacket, Pants, Shorts,273
Skirt, Dress, Other].274
• Column F - G: Engage_1, 2, 3, 4, 5: "Please describe your engagement in the last class": [I was excited about teaching],275
[I felt happy while teaching], [While teaching, I paid a lot of attention to my work], [I cared about the problems of my276
students], [I was aware of my students’ feelings].277
• Column K: Confidence_level: "Please rate your confidence level for your answers in this survey (optional)" [5-point278
Likert scales where 1 = Not confident, 3 = Somewhat confident, 5 = Very confident].279
11/15
Figure 12. Screenshot of the mood question.
Raw_wearable_data280
This folder contains 20 sub-folders from ’20190902’ to ’20191122’, containing the raw wearable data for each day during281
the 4-week data collection. In each sub-folder, there are multiple sessions from different participants. Some participants282
provided more than 1 session on the same day. The name of each session consists of two parts connected by an underscore: the283
unique session ID and the participant ID. For example, the session named ’1567380164_18’ indicates the data is provided by284
participant 18. There are 6 CSV files in each session, and each of these files (except IBI.csv) has the following format: the first285
row is the initial time of the session expressed as a Unix timestamp in UTC. The second row is the sample rate expressed in Hz.286
Specifically:287
1. ACC.csv contains data from a 3-axis accelerometer sampled at 32Hz which is configured to measure accelerations in the288
range of [-2g, 2g]. Acceleration is the rate of change of the velocity with respect to time, where SI (International System289
of Units)31 derived unit for acceleration is the metre per second squared (m ·s−2) where 1g is equal to 9.80665m ·s−2. The290
unit in this file is 1/64g where the raw value of 64 indicates 1g. The 3 columns refer to the x, y, and z-axis, respectively.291
2. BVP.csv contains Blood Volume Pulse (BVP) sampled at 64Hz which is the primary output from the photoplethysmograph292
(PPG) sensor. BVP signals can be used to compute the inter-beat-intervals (IBI) and heart rate (HR)32.293
3. EDA.csv contains data from an electrodermal activity (EDA) sensor expressed as microsiemens (µS) sampled at 4Hz. The294
variation of EDA values indicates the electrical changes of the skin surface and the EDA arises when the skin receives295
nerve signals from the brain and sweat level increases33.296
4. HR.csv contains the average heart rate data extracted from the BVP signal, calculated in spans of 10 seconds. The first297
row is the initial time of the session and it is 10 seconds after the beginning of the recording. The sampling rate of heart298
rate is 1Hz.299
5. IBI.csv contains the time intervals between a participant’s heartbeats extracted from the BVP signal. This file does not300
have a sampling rate. The first column is the time (in respect to the starting time) of the detected inter-beat interval301
expressed in seconds (s). The second column is the duration in seconds (s) of the detected inter-beat interval (i.e., the302
distance in seconds from the previous beat).303
6. TEMP.csv contains data from a temperature sensor expressed in degrees Celsius (◦C), sampled at 4Hz.304
Class_wearable_data305
The Class_wearable_data folder contains 221 sub-folders representing 221 different classes during which the wearable data306
were recorded. Each sub-folder is named by the unique ’Class_id’ as shown in the Class_table.csv. Each sub-folder includes307
further sub-folders named by the unique participant id or simply the label ’teacher’. These contain data from the wristband308
sensors for each participant of this class. There are 6 CSV files in each sub-folder: ACC.csv, EDA.csv, BVP.csv, HR.csv, IBI.csv,309
and TEMP.csv. The format of these files is identical to the ones in the Raw_wearable_data folder.310
Technical Validation311
Tables 5 and 6 show the specifications of the outdoor and indoor weather stations, respectively. Figure 2 shows a sample of312
data captured from these sensors and the air-conditioning states that we inferred from the data. Figure 8 shows data captured313
by the outdoor weather station; Figure 3 shows data captured by the indoor weather station. We were not able to find the314
12/15
accuracy of the Philio sensors in their datasheet, but assumed that it was sufficient for the purposes of our study. Figures 10315
and 9 demonstrate a potential use case for the longitudinal dataset; here, we fitted different logistic regression models to the316
air-conditioning usage data. Within the scope of our research, we were not able to validate these models. The figures serve317
merely as an illustration of what the dataset may be used for.318
Usage Notes319
Our datasets include the outdoor/indoor/wearable sensing data and the self-report occupants’ thermal comfort, learning320
engagement, and emotions while at school. This dataset is the first publicly available dataset for studying the daily behaviours321
and engagement of high school students with heterogeneous sensing. For the longitudinal outdoor and indoor sensing data, the322
most straightforward potential usage is to derive predictive models of how occupants operate room air-conditioning units34.323
Our dataset could potentially useful to examine the relationships between indoor/outdoor climates and physiological signals of324
occupants, which provide opportunities for the future design of intelligent feedback systems to benefit both students and staffs325
on campus.326
Specifically, various data mining (e.g., segmentation35, clustering36, 37) and modelling techniques38–40 could be applied to327
build prediction models for measuring occupants’ mental state with the sensor-based physiological and behaviour recordings in328
buildings. This could be further used for various applications, such as monitoring signs of frustration and disengagement20, 41,329
proving better seating arrangements42, improving teaching strategies by measuring the emotional climate in classrooms43,330
ventilating the classrooms timely to prevent excessive carbon dioxide from affecting students’ concentration44–46, providing a331
thermally comfort environments for both students and staffs47, etc.332
Code Availability333
Python code for prepossessing the data and implementing the segmentation based on different classes are available online334
https://github.com/cruiseresearchgroup/InGauge-and-EnGage-Datasets.335
References336
1. Haldi, F. & Robinson, D. The impact of occupants’ behaviour on building energy demand. J. Build. Perform. Simul. 4,337
323–338 (2011).338
2. Rijal, H. B., Tuohy, P., Humphreys, M. A., Nicol, J. F. & Samuel, A. An algorithm to represent occupant use of windows339
and fans including situation-specific motivations and constraints. In Building Simulation, vol. 4, 117–134 (Springer, 2011).340
3. Schiavon, S. & Lee, K. H. Dynamic predictive clothing insulation models based on outdoor air and indoor operative341
temperatures. Build. Environ. 59, 250–260 (2013).342
4. Schweiker, M., Haldi, F., Shukuya, M. & Robinson, D. Verification of stochastic models of window opening behaviour for343
residential buildings. J. Build. Perform. Simul. 5, 55–74 (2012).344
5. Langevin, J., Gurian, P. L. & Wen, J. Tracking the human-building interaction: A longitudinal field study of occupant345
behavior in air-conditioned offices. J. Environ. Psychol. 42, 94–115 (2015).346
6. Reinhart, C. F. & Voss, K. Monitoring manual control of electric lighting and blinds. Light. research & technology 35,347
243–258 (2003).348
7. Cheung, T., Schiavon, S., Parkinson, T., Li, P. & Brager, G. Analysis of the accuracy on pmv–ppd model using the ashrae349
global thermal comfort database ii. Build. Environ. 153, 205–217 (2019).350
8. Kim, J. & de Dear, R. Thermal comfort expectations and adaptive behavioural characteristics of primary and secondary351
school students. Build. Environ. 127, 13–22 (2018).352
9. Healey, J. A. & Picard, R. W. Detecting stress during real-world driving tasks using physiological sensors. IEEE353
Transactions on intelligent transportation systems 6, 156–166 (2005).354
10. Koelstra, S. et al. Deap: A database for emotion analysis; using physiological signals. IEEE transactions on affective355
computing 3, 18–31 (2011).356
11. Schneegass, S., Pfleging, B., Broy, N., Heinrich, F. & Schmidt, A. A data set of real world driving to assess driver workload.357
In Proceedings of the 5th international conference on automotive user interfaces and interactive vehicular applications,358
150–157 (2013).359
12. Wang, R. et al. Studentlife: assessing mental health, academic performance and behavioral trends of college students using360
smartphones. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing,361
3–14 (2014).362
13/15
13. Abadi, M. K. et al. Decaf: Meg-based multimodal database for decoding affective physiological responses. IEEE363
Transactions on Affect. Comput. 6, 209–222 (2015).364
14. Birjandtalab, J., Cogan, D., Pouyan, M. B. & Nourani, M. A non-eeg biosignals dataset for assessment and visualization of365
neurological status. In 2016 IEEE International Workshop on Signal Processing Systems (SiPS), 110–114 (IEEE, 2016).366
15. Subramanian, R. et al. Ascertain: Emotion and personality recognition using commercial sensors. IEEE Transactions on367
Affect. Comput. 9, 147–160 (2016).368
16. Gjoreski, M., Luštrek, M., Gams, M. & Gjoreski, H. Monitoring stress with a wrist device using context. J. biomedical369
informatics 73, 159–170 (2017).370
17. Schmidt, P., Reiss, A., Duerichen, R., Marberger, C. & Van Laerhoven, K. Introducing wesad, a multimodal dataset for371
wearable stress and affect detection. In Proceedings of the 20th ACM international conference on multimodal interaction,372
400–408 (2018).373
18. Gjoreski, M. et al. Datasets for cognitive load inference using wearable sensors and psychological traits. Appl. Sci. 10,374
3843 (2020).375
19. Park, C. Y. et al. K-emocon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations.376
Sci. Data 7, 1–16 (2020).377
20. Gao, N., Shao, W., Rahaman, M. S. & Salim, F. D. n-gage: Predicting in-class emotional, behavioural and cognitive378
engagement in the wild. Proc. ACM on Interactive, Mobile, Wearable Ubiquitous Technol. 4, 1–26 (2020).379
21. Di Lascio, E., Gashi, S. & Santini, S. Unobtrusive assessment of students’ emotional engagement during lectures using380
electrodermal activity sensors. Proc. ACM on Interactive, Mobile, Wearable Ubiquitous Technol. 2, 1–21 (2018).381
22. Bakker, J., Pechenizkiy, M. & Sidorova, N. What’s your current stress level? detection of stress patterns from gsr sensor382
data. In 2011 IEEE 11th international conference on data mining workshops, 573–580 (IEEE, 2011).383
23. Sarchiapone, M. et al. The association between electrodermal activity (eda), depression and suicidal behaviour: A384
systematic review and narrative synthesis. BMC psychiatry 18, 1–27 (2018).385
24. Michael, P. R., Johnston, D. E. & Moreno, W. A conversion guide: solar irradiance and lux illuminance. J. Meas. Eng. 8,386
153–166 (2020).387
25. Garbarino, M., Lai, M., Bender, D., Picard, R. W. & Tognetti, S. Empatica e3—a wearable wireless multi-sensor device388
for real-time computerized biofeedback and data acquisition. In 2014 4th International Conference on Wireless Mobile389
Communication and Healthcare-Transforming Healthcare Through Innovations in Mobile and Wireless Technologies390
(MOBIHEALTH), 39–42 (IEEE, 2014).391
26. Gao, N., Shao, W. & Salim, F. D. Predicting personality traits from physical activity intensity. Computer 52, 47–56 (2019).392
27. Handbook-Fundamentals, A. American society of heating. Refrig. Air-Conditioning Eng. (2009).393
28. Fuller, K. A. et al. Development of a self-report instrument for measuring in-class student engagement reveals that394
pretending to engage is a significant unrecognized problem. PloS one 13, e0205828 (2018).395
29. Pollak, J. P., Adams, P. & Gay, G. Pam: a photographic affect meter for frequent, in situ measurement of affect. In396
Proceedings of the SIGCHI conference on Human factors in computing systems, 725–734 (2011).397
30. Gao, N., Marschall, M., Burry, J., Watkins, S. & Salim, F. In-gauge and en-gage datasets. figshare https://doi.org/10.25439/398
rmt.14578908, 10.25439/rmt.14578908 (2021).399
31. of Weights, I. B., Measures, Taylor, B. N. & Thompson, A. The international system of units (SI) (US Department of400
Commerce, Technology Administration, National Institute of . . . , 2001).401
32. E4 data - bvp expected signal.402
33. Braithwaite, J. J., Watson, D. G., Jones, R. & Rowe, M. A guide for analysing electrodermal activity (eda) & skin403
conductance responses (scrs) for psychological experiments. Psychophysiology 49, 1017–1034 (2013).404
34. Schwee, J. H. et al. Room-level occupant counts and environmental quality from heterogeneous sensing modalities in a405
smart building. Sci. data 6, 1–11 (2019).406
35. Deldari, S., Smith, D. V., Sadri, A. & Salim, F. Espresso: Entropy and shape aware time-series segmentation for processing407
heterogeneous sensor data. Proc. ACM on Interactive, Mobile, Wearable Ubiquitous Technol. 4, 1–24 (2020).408
36. Shao, W., Salim, F. D., Song, A. & Bouguettaya, A. Clustering big spatiotemporal-interval data. IEEE Transactions on Big409
Data 2, 190–203 (2016).410
14/15
37. Shao, W. et al. Onlineairtrajclus: An online aircraft trajectory clustering for tarmac situation awareness. In 2019 IEEE411
International Conference on Pervasive Computing and Communications (PerCom, 192–201 (IEEE, 2019).412
38. Salim, F. D. et al. Modelling urban-scale occupant behaviour, mobility, and energy in buildings: A survey. Build. Environ.413
183, 106964 (2020).414
39. Carlucci, S. et al. Modeling occupant behavior in buildings. Build. Environ. 174, 106768 (2020).415
40. Kjærgaard, M. B. et al. Current practices and infrastructure for open data based research on occupant-centric design and416
operation of buildings. Build. Environ. 177, 106848 (2020).417
41. Rahaman, M. S. et al. An ambient–physical system to infer concentration in open-plan workplace. IEEE Internet Things J.418
7, 11576–11586 (2020).419
42. Rahaman, M. S., Kudo, S., Rawling, T., Ren, Y. & Salim, F. D. Seating preference analysis for hybrid workplaces. arXiv420
preprint arXiv:2007.15807 (2020).421
43. Gashi, S., Di Lascio, E. & Santini, S. Using students’ physiological synchrony to quantify the classroom emotional climate.422
In Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and423
Ubiquitous Computing and Wearable Computers, 698–701 (2018).424
44. Arief-Ang, I. B., Salim, F. D. & Hamilton, M. Da-hoc: semi-supervised domain adaptation for room occupancy prediction425
using co2 sensor data. In Proceedings of the 4th ACM International Conference on Systems for Energy-Efficient Built426
Environments, 1–10 (2017).427
45. Arief-Ang, I. B., Hamilton, M. & Salim, F. D. Rup: Large room utilisation prediction with carbon dioxide sensor. Pervasive428
Mob. Comput. 46, 49–72 (2018).429
46. Arief-Ang, I. B., Hamilton, M. & Salim, F. D. A scalable room occupancy prediction with transferable time series430
decomposition of co2 sensor data. ACM Transactions on Sens. Networks (TOSN) 14, 1–28 (2018).431
47. Gao, N. et al. Transfer learning for thermal comfort prediction in multiple cities. Build. Environ. 195, 107725 (2021).432
Acknowledgements433
This research is supported by the Australian Government through the Australian Research Council’s Linkage Projects funding434
scheme (project LP150100246). This paper is also a contribution to the IEA EBC Annex 79.435
Author contributions statement436
N.G. designed, prepared, and conducted the cross-sectional study for wearable data collection, analysed the wearable and indoor437
sensor data, and wrote the manuscript. M.M. designed, prepared and conducted the longitudinal data collection for outdoor and438
indoor sensing, and processed the dataset and wrote the manuscript. J.B, S.W., and F.D.S supervised the data collection, dataset439
design, and revised the manuscript. F.D.S advised N.G on the overall project and data analysis and modelling.440
Competing interests441
The authors declare no competing interests.442
15/15